* [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend
@ 2025-03-10 9:19 Juri Lelli
2025-03-10 9:19 ` [PATCH v3 1/8] sched/deadline: Ignore special tasks when rebuilding domains Juri Lelli
` (7 more replies)
0 siblings, 8 replies; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:19 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Waiman Long, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
Hello!
Jon reported [1] a suspend regression on a Tegra board configured to
boot with isolcpus and bisected it to commit 53916d5fd3c0
("sched/deadline: Check bandwidth overflow earlier for hotplug").
Root cause analysis showed that we currently fail to correctly clear
and restore bandwidth accounting on root domains after changes that
originate from partition_sched_domains(), as is the case for suspend
operations on that board.
This is v3 [2] of the proposed approach to fix the issue. With respect
to v2, the following implements the approach by:
- 01: filter out DEADLINE special tasks
- 02: preparatory wrappers to be able to grab sched_domains_mutex on
UP (added !SMP wrappers back as sched_rt_handler() needs them)
- 03: generalize unique visiting of root domains so that we can
re-use the mechanism elsewhere
- 04: the bulk of the approach, clean and rebuild after changes
- 05: clean up a now redundant call
- 06: remove partition_and_rebuild_sched_domains()
- 07: stop exposing partition_sched_domains_locked
I kept Jon and Waiman's Tested-by tags from v2 as there are no
functional changes in v3.
Please test and review. The set is also available at
git@github.com:jlelli/linux.git upstream/deadline/domains-suspend
Best,
Juri
1 - https://lore.kernel.org/lkml/ba51a43f-796d-4b79-808a-b8185905638a@nvidia.com/
2 - v1 https://lore.kernel.org/lkml/20250304084045.62554-1-juri.lelli@redhat.com
v2 https://lore.kernel.org/lkml/20250306141016.268313-1-juri.lelli@redhat.com/
Juri Lelli (8):
sched/deadline: Ignore special tasks when rebuilding domains
sched/topology: Wrappers for sched_domains_mutex
sched/deadline: Generalize unique visiting of root domains
sched/deadline: Rebuild root domain accounting after every update
sched/topology: Remove redundant dl_clear_root_domain call
cgroup/cpuset: Remove partition_and_rebuild_sched_domains
sched/topology: Stop exposing partition_sched_domains_locked
include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
include/linux/cpuset.h | 5 +++++
include/linux/sched.h | 5 +++++
include/linux/sched/deadline.h | 4 ++++
include/linux/sched/topology.h | 10 ---------
kernel/cgroup/cpuset.c | 27 +++++++++----------------
kernel/sched/core.c | 4 ++--
kernel/sched/deadline.c | 37 ++++++++++++++++++++--------------
kernel/sched/debug.c | 8 ++++----
kernel/sched/rt.c | 2 ++
kernel/sched/sched.h | 2 +-
kernel/sched/topology.c | 32 +++++++++++++----------------
11 files changed, 69 insertions(+), 67 deletions(-)
base-commit: 80e54e84911a923c40d7bee33a34c1b4be148d7a
--
2.48.1
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 1/8] sched/deadline: Ignore special tasks when rebuilding domains
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
@ 2025-03-10 9:19 ` Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:33 ` [PATCH v3 2/8] sched/topology: Wrappers for sched_domains_mutex Juri Lelli
` (6 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:19 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Waiman Long, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
SCHED_DEADLINE special tasks get a fake bandwidth that is only used to
make sure sleeping and priority inheritance 'work', but it is ignored
for runtime enforcement and admission control.
Be consistent with this behavior when rebuilding root domains as well.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
kernel/sched/deadline.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index ff4df16b5186..1a041c1fc0d1 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2956,7 +2956,7 @@ void dl_add_task_root_domain(struct task_struct *p)
struct dl_bw *dl_b;
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
- if (!dl_task(p)) {
+ if (!dl_task(p) || dl_entity_is_special(&p->dl)) {
raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
return;
}
--
2.48.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 2/8] sched/topology: Wrappers for sched_domains_mutex
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
2025-03-10 9:19 ` [PATCH v3 1/8] sched/deadline: Ignore special tasks when rebuilding domains Juri Lelli
@ 2025-03-10 9:33 ` Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:35 ` [PATCH v3 3/8] sched/deadline: Generalize unique visiting of root domains Juri Lelli
` (5 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:33 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Qais Yousef, Sebastian Andrzej Siewior, Swapnil Sapkal,
Shrikanth Hegde, Phil Auld, luca.abeni, tommaso.cucinotta,
Jon Hunter
Create wrappers for sched_domains_mutex so that it can be used
transparently on both CONFIG_SMP and !CONFIG_SMP, as some functions
will need to do.
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
v2 -> v3: Add wrappers back for the !SMP case as sched_rt_handler()
needs them
---
include/linux/sched.h | 5 +++++
kernel/cgroup/cpuset.c | 4 ++--
kernel/sched/core.c | 4 ++--
kernel/sched/debug.c | 8 ++++----
kernel/sched/topology.c | 12 ++++++++++--
5 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9c15365a30c0..4659898c0299 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -382,6 +382,11 @@ enum uclamp_id {
#ifdef CONFIG_SMP
extern struct root_domain def_root_domain;
extern struct mutex sched_domains_mutex;
+extern void sched_domains_mutex_lock(void);
+extern void sched_domains_mutex_unlock(void);
+#else
+static inline void sched_domains_mutex_lock(void) { }
+static inline void sched_domains_mutex_unlock(void) { }
#endif
struct sched_param {
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 0f910c828973..f87526edb2a4 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -994,10 +994,10 @@ static void
partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new)
{
- mutex_lock(&sched_domains_mutex);
+ sched_domains_mutex_lock();
partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
dl_rebuild_rd_accounting();
- mutex_unlock(&sched_domains_mutex);
+ sched_domains_mutex_unlock();
}
/*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 67189907214d..58593f4d09a1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8424,9 +8424,9 @@ void __init sched_init_smp(void)
* CPU masks are stable and all blatant races in the below code cannot
* happen.
*/
- mutex_lock(&sched_domains_mutex);
+ sched_domains_mutex_lock();
sched_init_domains(cpu_active_mask);
- mutex_unlock(&sched_domains_mutex);
+ sched_domains_mutex_unlock();
/* Move init over to a non-isolated CPU */
if (set_cpus_allowed_ptr(current, housekeeping_cpumask(HK_TYPE_DOMAIN)) < 0)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index ef047add7f9e..a0893a483d35 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -292,7 +292,7 @@ static ssize_t sched_verbose_write(struct file *filp, const char __user *ubuf,
bool orig;
cpus_read_lock();
- mutex_lock(&sched_domains_mutex);
+ sched_domains_mutex_lock();
orig = sched_debug_verbose;
result = debugfs_write_file_bool(filp, ubuf, cnt, ppos);
@@ -304,7 +304,7 @@ static ssize_t sched_verbose_write(struct file *filp, const char __user *ubuf,
sd_dentry = NULL;
}
- mutex_unlock(&sched_domains_mutex);
+ sched_domains_mutex_unlock();
cpus_read_unlock();
return result;
@@ -515,9 +515,9 @@ static __init int sched_init_debug(void)
debugfs_create_u32("migration_cost_ns", 0644, debugfs_sched, &sysctl_sched_migration_cost);
debugfs_create_u32("nr_migrate", 0644, debugfs_sched, &sysctl_sched_nr_migrate);
- mutex_lock(&sched_domains_mutex);
+ sched_domains_mutex_lock();
update_sched_domain_debugfs();
- mutex_unlock(&sched_domains_mutex);
+ sched_domains_mutex_unlock();
#endif
#ifdef CONFIG_NUMA_BALANCING
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c49aea8c1025..296ff2acfd32 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -6,6 +6,14 @@
#include <linux/bsearch.h>
DEFINE_MUTEX(sched_domains_mutex);
+void sched_domains_mutex_lock(void)
+{
+ mutex_lock(&sched_domains_mutex);
+}
+void sched_domains_mutex_unlock(void)
+{
+ mutex_unlock(&sched_domains_mutex);
+}
/* Protected by sched_domains_mutex: */
static cpumask_var_t sched_domains_tmpmask;
@@ -2791,7 +2799,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new)
{
- mutex_lock(&sched_domains_mutex);
+ sched_domains_mutex_lock();
partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
- mutex_unlock(&sched_domains_mutex);
+ sched_domains_mutex_unlock();
}
--
2.48.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 3/8] sched/deadline: Generalize unique visiting of root domains
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
2025-03-10 9:19 ` [PATCH v3 1/8] sched/deadline: Ignore special tasks when rebuilding domains Juri Lelli
2025-03-10 9:33 ` [PATCH v3 2/8] sched/topology: Wrappers for sched_domains_mutex Juri Lelli
@ 2025-03-10 9:35 ` Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:37 ` [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update Juri Lelli
` (4 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:35 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Qais Yousef, Sebastian Andrzej Siewior, Swapnil Sapkal,
Shrikanth Hegde, Phil Auld, luca.abeni, tommaso.cucinotta,
Jon Hunter
Bandwidth checks and updates that work on root domains currently employ
a cookie mechanism for efficiency. This mechanism is very much tied to
when root domains are first created and initialized.
Generalize the cookie mechanism so that it can also be used later at
runtime while updating root domains. Additionally, guard it with
sched_domains_mutex, since domains need to be stable while being
updated (and this will be required for further dynamic changes).
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
include/linux/sched/deadline.h | 3 +++
kernel/sched/deadline.c | 23 +++++++++++++----------
kernel/sched/rt.c | 2 ++
kernel/sched/sched.h | 2 +-
kernel/sched/topology.c | 2 +-
5 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 3a912ab42bb5..6ec578600b24 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -37,4 +37,7 @@ extern void dl_clear_root_domain(struct root_domain *rd);
#endif /* CONFIG_SMP */
+extern u64 dl_cookie;
+extern bool dl_bw_visited(int cpu, u64 cookie);
+
#endif /* _LINUX_SCHED_DEADLINE_H */
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 1a041c1fc0d1..3e05032e9e0e 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -166,14 +166,14 @@ static inline unsigned long dl_bw_capacity(int i)
}
}
-static inline bool dl_bw_visited(int cpu, u64 gen)
+static inline bool dl_bw_visited(int cpu, u64 cookie)
{
struct root_domain *rd = cpu_rq(cpu)->rd;
- if (rd->visit_gen == gen)
+ if (rd->visit_cookie == cookie)
return true;
- rd->visit_gen = gen;
+ rd->visit_cookie = cookie;
return false;
}
@@ -207,7 +207,7 @@ static inline unsigned long dl_bw_capacity(int i)
return SCHED_CAPACITY_SCALE;
}
-static inline bool dl_bw_visited(int cpu, u64 gen)
+static inline bool dl_bw_visited(int cpu, u64 cookie)
{
return false;
}
@@ -3171,15 +3171,18 @@ DEFINE_SCHED_CLASS(dl) = {
#endif
};
-/* Used for dl_bw check and update, used under sched_rt_handler()::mutex */
-static u64 dl_generation;
+/*
+ * Used for dl_bw check and update, used under sched_rt_handler()::mutex and
+ * sched_domains_mutex.
+ */
+u64 dl_cookie;
int sched_dl_global_validate(void)
{
u64 runtime = global_rt_runtime();
u64 period = global_rt_period();
u64 new_bw = to_ratio(period, runtime);
- u64 gen = ++dl_generation;
+ u64 cookie = ++dl_cookie;
struct dl_bw *dl_b;
int cpu, cpus, ret = 0;
unsigned long flags;
@@ -3192,7 +3195,7 @@ int sched_dl_global_validate(void)
for_each_online_cpu(cpu) {
rcu_read_lock_sched();
- if (dl_bw_visited(cpu, gen))
+ if (dl_bw_visited(cpu, cookie))
goto next;
dl_b = dl_bw_of(cpu);
@@ -3229,7 +3232,7 @@ static void init_dl_rq_bw_ratio(struct dl_rq *dl_rq)
void sched_dl_do_global(void)
{
u64 new_bw = -1;
- u64 gen = ++dl_generation;
+ u64 cookie = ++dl_cookie;
struct dl_bw *dl_b;
int cpu;
unsigned long flags;
@@ -3240,7 +3243,7 @@ void sched_dl_do_global(void)
for_each_possible_cpu(cpu) {
rcu_read_lock_sched();
- if (dl_bw_visited(cpu, gen)) {
+ if (dl_bw_visited(cpu, cookie)) {
rcu_read_unlock_sched();
continue;
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 4b8e33c615b1..8cebe71d2bb1 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2910,6 +2910,7 @@ static int sched_rt_handler(const struct ctl_table *table, int write, void *buff
int ret;
mutex_lock(&mutex);
+ sched_domains_mutex_lock();
old_period = sysctl_sched_rt_period;
old_runtime = sysctl_sched_rt_runtime;
@@ -2936,6 +2937,7 @@ static int sched_rt_handler(const struct ctl_table *table, int write, void *buff
sysctl_sched_rt_period = old_period;
sysctl_sched_rt_runtime = old_runtime;
}
+ sched_domains_mutex_unlock();
mutex_unlock(&mutex);
return ret;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c8512a9fb022..c978abe38c07 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -998,7 +998,7 @@ struct root_domain {
* Also, some corner cases, like 'wrap around' is dangerous, but given
* that u64 is 'big enough'. So that shouldn't be a concern.
*/
- u64 visit_gen;
+ u64 visit_cookie;
#ifdef HAVE_RT_PUSH_IPI
/*
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 296ff2acfd32..44093339761c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -568,7 +568,7 @@ static int init_rootdomain(struct root_domain *rd)
rd->rto_push_work = IRQ_WORK_INIT_HARD(rto_push_irq_work_func);
#endif
- rd->visit_gen = 0;
+ rd->visit_cookie = 0;
init_dl_bw(&rd->dl_bw);
if (cpudl_init(&rd->cpudl) != 0)
goto free_rto_mask;
--
2.48.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
` (2 preceding siblings ...)
2025-03-10 9:35 ` [PATCH v3 3/8] sched/deadline: Generalize unique visiting of root domains Juri Lelli
@ 2025-03-10 9:37 ` Juri Lelli
2025-03-10 18:54 ` Dietmar Eggemann
2025-03-10 9:38 ` [PATCH v3 5/8] sched/topology: Remove redundant dl_clear_root_domain call Juri Lelli
` (3 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:37 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Qais Yousef, Sebastian Andrzej Siewior, Swapnil Sapkal,
Shrikanth Hegde, Phil Auld, luca.abeni, tommaso.cucinotta,
Jon Hunter
Rebuilding of root domain accounting information (total_bw) is
currently broken in some cases, e.g. suspend/resume on aarch64. The
problem is that the way we keep track of domain changes and try to add
bandwidth back is convoluted and fragile.
Fix it by simplifying things: make sure bandwidth accounting is cleared
and completely restored after root domain changes (once root domains
are stable again).
Reported-by: Jon Hunter <jonathanh@nvidia.com>
Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
v2 -> v3: remove spurious dl_bw_visited declaration (Shrikanth)
---
include/linux/sched/deadline.h | 1 +
include/linux/sched/topology.h | 2 ++
kernel/cgroup/cpuset.c | 16 +++++++++-------
kernel/sched/deadline.c | 16 ++++++++++------
kernel/sched/topology.c | 1 +
5 files changed, 23 insertions(+), 13 deletions(-)
diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h
index 6ec578600b24..f9aabbc9d22e 100644
--- a/include/linux/sched/deadline.h
+++ b/include/linux/sched/deadline.h
@@ -34,6 +34,7 @@ static inline bool dl_time_before(u64 a, u64 b)
struct root_domain;
extern void dl_add_task_root_domain(struct task_struct *p);
extern void dl_clear_root_domain(struct root_domain *rd);
+extern void dl_clear_root_domain_cpu(int cpu);
#endif /* CONFIG_SMP */
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 7f3dbafe1817..1622232bd08b 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -166,6 +166,8 @@ static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
return to_cpumask(sd->span);
}
+extern void dl_rebuild_rd_accounting(void);
+
extern void partition_sched_domains_locked(int ndoms_new,
cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new);
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f87526edb2a4..f66b2aefdc04 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -954,10 +954,12 @@ static void dl_update_tasks_root_domain(struct cpuset *cs)
css_task_iter_end(&it);
}
-static void dl_rebuild_rd_accounting(void)
+void dl_rebuild_rd_accounting(void)
{
struct cpuset *cs = NULL;
struct cgroup_subsys_state *pos_css;
+ int cpu;
+ u64 cookie = ++dl_cookie;
lockdep_assert_held(&cpuset_mutex);
lockdep_assert_cpus_held();
@@ -965,11 +967,12 @@ static void dl_rebuild_rd_accounting(void)
rcu_read_lock();
- /*
- * Clear default root domain DL accounting, it will be computed again
- * if a task belongs to it.
- */
- dl_clear_root_domain(&def_root_domain);
+ for_each_possible_cpu(cpu) {
+ if (dl_bw_visited(cpu, cookie))
+ continue;
+
+ dl_clear_root_domain_cpu(cpu);
+ }
cpuset_for_each_descendant_pre(cs, pos_css, &top_cpuset) {
@@ -996,7 +999,6 @@ partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
{
sched_domains_mutex_lock();
partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
- dl_rebuild_rd_accounting();
sched_domains_mutex_unlock();
}
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 3e05032e9e0e..5dca336cdd7c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -166,7 +166,7 @@ static inline unsigned long dl_bw_capacity(int i)
}
}
-static inline bool dl_bw_visited(int cpu, u64 cookie)
+bool dl_bw_visited(int cpu, u64 cookie)
{
struct root_domain *rd = cpu_rq(cpu)->rd;
@@ -207,7 +207,7 @@ static inline unsigned long dl_bw_capacity(int i)
return SCHED_CAPACITY_SCALE;
}
-static inline bool dl_bw_visited(int cpu, u64 cookie)
+bool dl_bw_visited(int cpu, u64 cookie)
{
return false;
}
@@ -2981,18 +2981,22 @@ void dl_clear_root_domain(struct root_domain *rd)
rd->dl_bw.total_bw = 0;
/*
- * dl_server bandwidth is only restored when CPUs are attached to root
- * domains (after domains are created or CPUs moved back to the
- * default root doamin).
+ * dl_servers are not tasks. Since dl_add_task_root_domain ignores
+ * them, we need to account for them here explicitly.
*/
for_each_cpu(i, rd->span) {
struct sched_dl_entity *dl_se = &cpu_rq(i)->fair_server;
if (dl_server(dl_se) && cpu_active(i))
- rd->dl_bw.total_bw += dl_se->dl_bw;
+ __dl_add(&rd->dl_bw, dl_se->dl_bw, dl_bw_cpus(i));
}
}
+void dl_clear_root_domain_cpu(int cpu)
+{
+ dl_clear_root_domain(cpu_rq(cpu)->rd);
+}
+
#endif /* CONFIG_SMP */
static void switched_from_dl(struct rq *rq, struct task_struct *p)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 44093339761c..363ad268a25b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2791,6 +2791,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
ndoms_cur = ndoms_new;
update_sched_domain_debugfs();
+ dl_rebuild_rd_accounting();
}
/*
--
2.48.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 5/8] sched/topology: Remove redundant dl_clear_root_domain call
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
` (3 preceding siblings ...)
2025-03-10 9:37 ` [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update Juri Lelli
@ 2025-03-10 9:38 ` Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:39 ` [PATCH v3 6/8] cgroup/cpuset: Remove partition_and_rebuild_sched_domains Juri Lelli
` (2 subsequent siblings)
7 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:38 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Qais Yousef, Sebastian Andrzej Siewior, Swapnil Sapkal,
Shrikanth Hegde, Phil Auld, luca.abeni, tommaso.cucinotta,
Jon Hunter
We completely clear and restore root domain bandwidth accounting after
every root domain change, so the dl_clear_root_domain() call in
partition_sched_domains_locked() is redundant.
Remove it.
Reviewed-by: Waiman Long <llong@redhat.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
kernel/sched/topology.c | 15 +--------------
1 file changed, 1 insertion(+), 14 deletions(-)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 363ad268a25b..df2d94a57e84 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2720,21 +2720,8 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
for (i = 0; i < ndoms_cur; i++) {
for (j = 0; j < n && !new_topology; j++) {
if (cpumask_equal(doms_cur[i], doms_new[j]) &&
- dattrs_equal(dattr_cur, i, dattr_new, j)) {
- struct root_domain *rd;
-
- /*
- * This domain won't be destroyed and as such
- * its dl_bw->total_bw needs to be cleared.
- * Tasks contribution will be then recomputed
- * in function dl_update_tasks_root_domain(),
- * dl_servers contribution in function
- * dl_restore_server_root_domain().
- */
- rd = cpu_rq(cpumask_any(doms_cur[i]))->rd;
- dl_clear_root_domain(rd);
+ dattrs_equal(dattr_cur, i, dattr_new, j))
goto match1;
- }
}
/* No match - a current sched domain not in new doms_new[] */
detach_destroy_domains(doms_cur[i]);
--
2.48.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 6/8] cgroup/cpuset: Remove partition_and_rebuild_sched_domains
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
` (4 preceding siblings ...)
2025-03-10 9:38 ` [PATCH v3 5/8] sched/topology: Remove redundant dl_clear_root_domain call Juri Lelli
@ 2025-03-10 9:39 ` Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:40 ` [PATCH v3 7/8] sched/topology: Stop exposing partition_sched_domains_locked Juri Lelli
2025-03-10 9:40 ` [PATCH v3 8/8] include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h Juri Lelli
7 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:39 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Qais Yousef, Sebastian Andrzej Siewior, Swapnil Sapkal,
Shrikanth Hegde, Phil Auld, luca.abeni, tommaso.cucinotta,
Jon Hunter
partition_and_rebuild_sched_domains() and partition_sched_domains() are
now equivalent.
Remove the former as a nice cleanup.
Suggested-by: Waiman Long <llong@redhat.com>
Reviewed-by: Waiman Long <llong@redhat.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
kernel/cgroup/cpuset.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f66b2aefdc04..7995cd58a01b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -993,15 +993,6 @@ void dl_rebuild_rd_accounting(void)
rcu_read_unlock();
}
-static void
-partition_and_rebuild_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
- struct sched_domain_attr *dattr_new)
-{
- sched_domains_mutex_lock();
- partition_sched_domains_locked(ndoms_new, doms_new, dattr_new);
- sched_domains_mutex_unlock();
-}
-
/*
* Rebuild scheduler domains.
*
@@ -1063,7 +1054,7 @@ void rebuild_sched_domains_locked(void)
ndoms = generate_sched_domains(&doms, &attr);
/* Have scheduler rebuild the domains */
- partition_and_rebuild_sched_domains(ndoms, doms, attr);
+ partition_sched_domains(ndoms, doms, attr);
}
#else /* !CONFIG_SMP */
void rebuild_sched_domains_locked(void)
--
2.48.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 7/8] sched/topology: Stop exposing partition_sched_domains_locked
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
` (5 preceding siblings ...)
2025-03-10 9:39 ` [PATCH v3 6/8] cgroup/cpuset: Remove partition_and_rebuild_sched_domains Juri Lelli
@ 2025-03-10 9:40 ` Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:40 ` [PATCH v3 8/8] include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h Juri Lelli
7 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:40 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Qais Yousef, Sebastian Andrzej Siewior, Swapnil Sapkal,
Shrikanth Hegde, Phil Auld, luca.abeni, tommaso.cucinotta,
Jon Hunter
There are no callers of partition_sched_domains_locked() outside
topology.c.
Stop exposing the function.
Suggested-by: Waiman Long <llong@redhat.com>
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
include/linux/sched/topology.h | 10 ----------
kernel/sched/topology.c | 2 +-
2 files changed, 1 insertion(+), 11 deletions(-)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 1622232bd08b..96e69bfc3c8a 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -168,10 +168,6 @@ static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
extern void dl_rebuild_rd_accounting(void);
-extern void partition_sched_domains_locked(int ndoms_new,
- cpumask_var_t doms_new[],
- struct sched_domain_attr *dattr_new);
-
extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new);
@@ -212,12 +208,6 @@ extern void __init set_sched_topology(struct sched_domain_topology_level *tl);
struct sched_domain_attr;
-static inline void
-partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
- struct sched_domain_attr *dattr_new)
-{
-}
-
static inline void
partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index df2d94a57e84..95bde793651c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2688,7 +2688,7 @@ static int dattrs_equal(struct sched_domain_attr *cur, int idx_cur,
*
* Call with hotplug lock and sched_domains_mutex held
*/
-void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
+static void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new)
{
bool __maybe_unused has_eas = false;
--
2.48.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 8/8] include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
` (6 preceding siblings ...)
2025-03-10 9:40 ` [PATCH v3 7/8] sched/topology: Stop exposing partition_sched_domains_locked Juri Lelli
@ 2025-03-10 9:40 ` Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
7 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-10 9:40 UTC (permalink / raw)
To: linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Qais Yousef, Sebastian Andrzej Siewior, Swapnil Sapkal,
Shrikanth Hegde, Phil Auld, luca.abeni, tommaso.cucinotta,
Jon Hunter
dl_rebuild_rd_accounting() is defined in cpuset.c, so it makes more
sense to move related declarations to cpuset.h.
Implement the move.
Suggested-by: Waiman Long <llong@redhat.com>
Reviewed-by: Waiman Long <llong@redhat.com>
Tested-by: Waiman Long <longman@redhat.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
---
include/linux/cpuset.h | 5 +++++
include/linux/sched/topology.h | 2 --
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 835e7b793f6a..c414daa7d503 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -125,6 +125,7 @@ static inline int cpuset_do_page_mem_spread(void)
extern bool current_cpuset_is_being_rebound(void);
+extern void dl_rebuild_rd_accounting(void);
extern void rebuild_sched_domains(void);
extern void cpuset_print_current_mems_allowed(void);
@@ -259,6 +260,10 @@ static inline bool current_cpuset_is_being_rebound(void)
return false;
}
+static inline void dl_rebuild_rd_accounting(void)
+{
+}
+
static inline void rebuild_sched_domains(void)
{
partition_sched_domains(1, NULL, NULL);
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 96e69bfc3c8a..51f7b8169515 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -166,8 +166,6 @@ static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
return to_cpumask(sd->span);
}
-extern void dl_rebuild_rd_accounting(void);
-
extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
struct sched_domain_attr *dattr_new);
--
2.48.1
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-10 9:37 ` [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update Juri Lelli
@ 2025-03-10 18:54 ` Dietmar Eggemann
2025-03-10 19:18 ` Waiman Long
0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Eggemann @ 2025-03-10 18:54 UTC (permalink / raw)
To: Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steven Rostedt,
Ben Segall, Mel Gorman, Valentin Schneider, Waiman Long,
Tejun Heo, Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 10/03/2025 10:37, Juri Lelli wrote:
> Rebuilding of root domains accounting information (total_bw) is
> currently broken in some cases, e.g. suspend/resume on aarch64. Problem
Nit: Couldn't spot any arch dependency here. I guess it was just tested
on Arm64 platforms so far.
[...]
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 44093339761c..363ad268a25b 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -2791,6 +2791,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
> ndoms_cur = ndoms_new;
>
> update_sched_domain_debugfs();
> + dl_rebuild_rd_accounting();
Won't dl_rebuild_rd_accounting()'s lockdep_assert_held(&cpuset_mutex)
barf when called via cpuhp's:
sched_cpu_deactivate()
cpuset_cpu_inactive()
partition_sched_domains()
partition_sched_domains_locked()
dl_rebuild_rd_accounting()
?
[...]
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-10 18:54 ` Dietmar Eggemann
@ 2025-03-10 19:18 ` Waiman Long
2025-03-11 0:16 ` Waiman Long
0 siblings, 1 reply; 30+ messages in thread
From: Waiman Long @ 2025-03-10 19:18 UTC (permalink / raw)
To: Dietmar Eggemann, Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steven Rostedt,
Ben Segall, Mel Gorman, Valentin Schneider, Tejun Heo,
Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 3/10/25 2:54 PM, Dietmar Eggemann wrote:
> On 10/03/2025 10:37, Juri Lelli wrote:
>> Rebuilding of root domains accounting information (total_bw) is
>> currently broken in some cases, e.g. suspend/resume on aarch64. Problem
> Nit: Couldn't spot any arch dependency here. I guess it was just tested
> on Arm64 platforms so far.
>
> [...]
>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 44093339761c..363ad268a25b 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -2791,6 +2791,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
>> ndoms_cur = ndoms_new;
>>
>> update_sched_domain_debugfs();
>> + dl_rebuild_rd_accounting();
> Won't dl_rebuild_rd_accounting()'s lockdep_assert_held(&cpuset_mutex)
> barf when called via cpuhp's:
>
> sched_cpu_deactivate()
>
> cpuset_cpu_inactive()
>
> partition_sched_domains()
>
> partition_sched_domains_locked()
>
> dl_rebuild_rd_accounting()
>
> ?
>
> [...]
Right. If cpuhp_tasks_frozen is true, partition_sched_domains() will be
called without holding cpuset mutex.
Well, I think we will need an additional wrapper in cpuset.c that
acquires the cpuset_mutex before calling partition_sched_domains(),
and we should use the new wrapper in these cases.
Cheers,
Longman
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-10 19:18 ` Waiman Long
@ 2025-03-11 0:16 ` Waiman Long
2025-03-11 11:59 ` Juri Lelli
0 siblings, 1 reply; 30+ messages in thread
From: Waiman Long @ 2025-03-11 0:16 UTC (permalink / raw)
To: Dietmar Eggemann, Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Steven Rostedt,
Ben Segall, Mel Gorman, Valentin Schneider, Tejun Heo,
Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 3/10/25 3:18 PM, Waiman Long wrote:
>
> On 3/10/25 2:54 PM, Dietmar Eggemann wrote:
>> On 10/03/2025 10:37, Juri Lelli wrote:
>>> Rebuilding of root domains accounting information (total_bw) is
>>> currently broken in some cases, e.g. suspend/resume on aarch64. Problem
>> Nit: Couldn't spot any arch dependency here. I guess it was just tested
>> on Arm64 platforms so far.
>>
>> [...]
>>
>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>> index 44093339761c..363ad268a25b 100644
>>> --- a/kernel/sched/topology.c
>>> +++ b/kernel/sched/topology.c
>>> @@ -2791,6 +2791,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
>>> ndoms_cur = ndoms_new;
>>> update_sched_domain_debugfs();
>>> + dl_rebuild_rd_accounting();
>> Won't dl_rebuild_rd_accounting()'s lockdep_assert_held(&cpuset_mutex)
>> barf when called via cpuhp's:
>>
>> sched_cpu_deactivate()
>>
>> cpuset_cpu_inactive()
>>
>> partition_sched_domains()
>>
>> partition_sched_domains_locked()
>>
>> dl_rebuild_rd_accounting()
>>
>> ?
>>
>> [...]
>
> Right. If cpuhp_tasks_frozen is true, partition_sched_domains() will
> be called without holding cpuset mutex.
>
> Well, I think we will need an additional wrapper in cpuset.c that
> acquires the cpuset_mutex first before calling
> partition_sched_domains() and use the new wrapper in these cases.
Actually, partition_sched_domains() is called with the special arguments
(1, NULL, NULL) to reset the domain to a single one. So perhaps
something like the following will be enough to avoid this problem.
BTW, we can merge partition_sched_domains_locked() into
partition_sched_domains() as there is no other caller.
Cheers,
Longman
------------------------------------------------------------------------------------------------
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 95bde793651c..39b9ffa6a39a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2692,6 +2692,7 @@ static void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new
struct sched_domain_attr *dattr_new)
{
bool __maybe_unused has_eas = false;
+ bool reset_domain = false;
int i, j, n;
int new_topology;
@@ -2706,6 +2707,7 @@ static void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new
if (!doms_new) {
WARN_ON_ONCE(dattr_new);
n = 0;
+ reset_domain = true;
doms_new = alloc_sched_domains(1);
if (doms_new) {
n = 1;
@@ -2778,7 +2780,8 @@ static void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new
ndoms_cur = ndoms_new;
update_sched_domain_debugfs();
- dl_rebuild_rd_accounting();
+ if (!reset_domain)
+ dl_rebuild_rd_accounting();
}
/*
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-11 0:16 ` Waiman Long
@ 2025-03-11 11:59 ` Juri Lelli
2025-03-11 12:34 ` Waiman Long
0 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-11 11:59 UTC (permalink / raw)
To: Waiman Long
Cc: Dietmar Eggemann, linux-kernel, cgroups, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 10/03/25 20:16, Waiman Long wrote:
> On 3/10/25 3:18 PM, Waiman Long wrote:
> >
> > On 3/10/25 2:54 PM, Dietmar Eggemann wrote:
> > > On 10/03/2025 10:37, Juri Lelli wrote:
> > > > Rebuilding of root domains accounting information (total_bw) is
> > > > currently broken in some cases, e.g. suspend/resume on aarch64. Problem
> > > Nit: Couldn't spot any arch dependency here. I guess it was just tested
> > > on Arm64 platforms so far.
> > >
> > > [...]
> > >
> > > > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > > > index 44093339761c..363ad268a25b 100644
> > > > --- a/kernel/sched/topology.c
> > > > +++ b/kernel/sched/topology.c
> > > > @@ -2791,6 +2791,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
> > > > ndoms_cur = ndoms_new;
> > > > update_sched_domain_debugfs();
> > > > + dl_rebuild_rd_accounting();
> > > Won't dl_rebuild_rd_accounting()'s lockdep_assert_held(&cpuset_mutex)
> > > barf when called via cpuhp's:
> > >
> > > sched_cpu_deactivate()
> > >
> > > cpuset_cpu_inactive()
> > >
> > > partition_sched_domains()
> > >
> > > partition_sched_domains_locked()
> > >
> > > dl_rebuild_rd_accounting()
> > >
> > > ?
Good catch. Guess I didn't notice while testing with LOCKDEP as I was
never able to hit this call path on my systems.
> > Right. If cpuhp_tasks_frozen is true, partition_sched_domains() will be
> > called without holding cpuset mutex.
> >
> > Well, I think we will need an additional wrapper in cpuset.c that
> > acquires the cpuset_mutex first before calling partition_sched_domains()
> > and use the new wrapper in these cases.
>
> Actually, partition_sched_domains() is called with the special arguments (1,
> NULL, NULL) to reset the domain to a single one. So perhaps something like
> the following will be enough to avoid this problem.
I think this would work, as we will still rebuild the accounting after
the last CPU comes back from suspend. The thing I am still not sure about
is what we want to do in case we have DEADLINE tasks around, since with
this I believe we would be ignoring them and letting suspend proceed.
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-11 11:59 ` Juri Lelli
@ 2025-03-11 12:34 ` Waiman Long
2025-03-11 13:29 ` Dietmar Eggemann
0 siblings, 1 reply; 30+ messages in thread
From: Waiman Long @ 2025-03-11 12:34 UTC (permalink / raw)
To: Juri Lelli, Waiman Long
Cc: Dietmar Eggemann, linux-kernel, cgroups, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 3/11/25 7:59 AM, Juri Lelli wrote:
> On 10/03/25 20:16, Waiman Long wrote:
>> On 3/10/25 3:18 PM, Waiman Long wrote:
>>> On 3/10/25 2:54 PM, Dietmar Eggemann wrote:
>>>> On 10/03/2025 10:37, Juri Lelli wrote:
>>>>> Rebuilding of root domains accounting information (total_bw) is
>>>>> currently broken in some cases, e.g. suspend/resume on aarch64. Problem
>>>> Nit: Couldn't spot any arch dependency here. I guess it was just tested
>>>> on Arm64 platforms so far.
>>>>
>>>> [...]
>>>>
>>>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>>>> index 44093339761c..363ad268a25b 100644
>>>>> --- a/kernel/sched/topology.c
>>>>> +++ b/kernel/sched/topology.c
>>>>> @@ -2791,6 +2791,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
>>>>> ndoms_cur = ndoms_new;
>>>>> update_sched_domain_debugfs();
>>>>> + dl_rebuild_rd_accounting();
>>>> Won't dl_rebuild_rd_accounting()'s lockdep_assert_held(&cpuset_mutex)
>>>> barf when called via cpuhp's:
>>>>
>>>> sched_cpu_deactivate()
>>>>
>>>> cpuset_cpu_inactive()
>>>>
>>>> partition_sched_domains()
>>>>
>>>> partition_sched_domains_locked()
>>>>
>>>> dl_rebuild_rd_accounting()
>>>>
>>>> ?
> Good catch. Guess I didn't notice while testing with LOCKDEP as I was
> never able to hit this call path on my systems.
>
>>> Right. If cpuhp_tasks_frozen is true, partition_sched_domains() will be
>>> called without holding cpuset mutex.
>>>
>>> Well, I think we will need an additional wrapper in cpuset.c that
>>> acquires the cpuset_mutex first before calling partition_sched_domains()
>>> and use the new wrapper in these cases.
>> Actually, partition_sched_domains() is called with the special arguments (1,
>> NULL, NULL) to reset the domain to a single one. So perhaps something like
>> the following will be enough to avoid this problem.
> I think this would work, as we will still rebuild the accounting after
> the last CPU comes back from suspend. The thing I am still not sure about
> is what we want to do in case we have DEADLINE tasks around, since with
> this I believe we would be ignoring them and letting suspend proceed.
That is the current behavior. You can certainly create a test case to
trigger such a condition and see what to do about it. Alternatively, you
can document that and come up with a follow-up patch later on.
Cheers,
Longman
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-11 12:34 ` Waiman Long
@ 2025-03-11 13:29 ` Dietmar Eggemann
2025-03-11 14:51 ` Waiman Long
0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Eggemann @ 2025-03-11 13:29 UTC (permalink / raw)
To: Waiman Long, Juri Lelli
Cc: linux-kernel, cgroups, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 11/03/2025 13:34, Waiman Long wrote:
> On 3/11/25 7:59 AM, Juri Lelli wrote:
>> On 10/03/25 20:16, Waiman Long wrote:
>>> On 3/10/25 3:18 PM, Waiman Long wrote:
>>>> On 3/10/25 2:54 PM, Dietmar Eggemann wrote:
>>>>> On 10/03/2025 10:37, Juri Lelli wrote:
>>>>>> Rebuilding of root domains accounting information (total_bw) is
>>>>>> currently broken in some cases, e.g. suspend/resume on aarch64.
>>>>>> Problem
>>>>> Nit: Couldn't spot any arch dependency here. I guess it was just
>>>>> tested
>>>>> on Arm64 platforms so far.
>>>>>
>>>>> [...]
>>>>>
>>>>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>>>>> index 44093339761c..363ad268a25b 100644
>>>>>> --- a/kernel/sched/topology.c
>>>>>> +++ b/kernel/sched/topology.c
>>>>>> @@ -2791,6 +2791,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
>>>>>> ndoms_cur = ndoms_new;
>>>>>> update_sched_domain_debugfs();
>>>>>> + dl_rebuild_rd_accounting();
>>>>> Won't dl_rebuild_rd_accounting()'s lockdep_assert_held(&cpuset_mutex)
>>>>> barf when called via cpuhp's:
>>>>>
>>>>> sched_cpu_deactivate()
>>>>>
>>>>> cpuset_cpu_inactive()
>>>>>
>>>>> partition_sched_domains()
>>>>>
>>>>> partition_sched_domains_locked()
>>>>>
>>>>> dl_rebuild_rd_accounting()
>>>>>
>>>>> ?
>> Good catch. Guess I didn't notice while testing with LOCKDEP as I was
>> never able to hit this call path on my systems.
>>
>>>> Right. If cpuhp_tasks_frozen is true, partition_sched_domains() will be
>>>> called without holding cpuset mutex.
>>>>
>>>> Well, I think we will need an additional wrapper in cpuset.c that
>>>> acquires the cpuset_mutex first before calling
>>>> partition_sched_domains()
>>>> and use the new wrapper in these cases.
>>> Actually, partition_sched_domains() is called with the special
>>> arguments (1,
>>> NULL, NULL) to reset the domain to a single one. So perhaps something
>>> like
>>> the following will be enough to avoid this problem.
>> I think this would work, as we will still rebuild the accounting after
>> the last CPU comes back from suspend. The thing I am still not sure about
>> is what we want to do in case we have DEADLINE tasks around, since with
>> this I believe we would be ignoring them and letting suspend proceed.
>
> That is the current behavior. You can certainly create a test case to
> trigger such a condition and see what to do about it. Alternatively, you
> can document that and come up with a follow-up patch later on.
But don't we rely on partition_sched_domains_locked() calling
dl_rebuild_rd_accounting() even in the reset_domain=1 case?
Testcase: suspend/resume
on Arm64 big.LITTLE cpumask=[LITTLE][big]=[0,3-5][1-2]
plus cmd line option 'isolcpus=3,4'.
with Waiman's snippet:
https://lkml.kernel.org/r/fd4d6143-9bd2-4a7c-80dc-1e19e4d1b2d1@redhat.com
...
[ 234.831675] --- > partition_sched_domains_locked() reset_domain=1
[ 234.835966] psci: CPU4 killed (polled 0 ms)
[ 234.838912] Error taking CPU3 down: -16
[ 234.838952] Non-boot CPUs are not disabled
[ 234.838986] Enabling non-boot CPUs ...
...
IIRC, that's the old DL accounting issue.
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-11 13:29 ` Dietmar Eggemann
@ 2025-03-11 14:51 ` Waiman Long
2025-03-12 9:53 ` Dietmar Eggemann
0 siblings, 1 reply; 30+ messages in thread
From: Waiman Long @ 2025-03-11 14:51 UTC (permalink / raw)
To: Dietmar Eggemann, Waiman Long, Juri Lelli
Cc: linux-kernel, cgroups, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 3/11/25 9:29 AM, Dietmar Eggemann wrote:
> On 11/03/2025 13:34, Waiman Long wrote:
>> On 3/11/25 7:59 AM, Juri Lelli wrote:
>>> On 10/03/25 20:16, Waiman Long wrote:
>>>> On 3/10/25 3:18 PM, Waiman Long wrote:
>>>>> On 3/10/25 2:54 PM, Dietmar Eggemann wrote:
>>>>>> On 10/03/2025 10:37, Juri Lelli wrote:
>>>>>>> Rebuilding of root domains accounting information (total_bw) is
>>>>>>> currently broken in some cases, e.g. suspend/resume on aarch64.
>>>>>>> Problem
>>>>>> Nit: Couldn't spot any arch dependency here. I guess it was just
>>>>>> tested
>>>>>> on Arm64 platforms so far.
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>>>>>> index 44093339761c..363ad268a25b 100644
>>>>>>> --- a/kernel/sched/topology.c
>>>>>>> +++ b/kernel/sched/topology.c
>>>>>>> @@ -2791,6 +2791,7 @@ void partition_sched_domains_locked(int ndoms_new, cpumask_var_t doms_new[],
>>>>>>> ndoms_cur = ndoms_new;
>>>>>>> update_sched_domain_debugfs();
>>>>>>> + dl_rebuild_rd_accounting();
>>>>>> Won't dl_rebuild_rd_accounting()'s lockdep_assert_held(&cpuset_mutex)
>>>>>> barf when called via cpuhp's:
>>>>>>
>>>>>> sched_cpu_deactivate()
>>>>>>
>>>>>> cpuset_cpu_inactive()
>>>>>>
>>>>>> partition_sched_domains()
>>>>>>
>>>>>> partition_sched_domains_locked()
>>>>>>
>>>>>> dl_rebuild_rd_accounting()
>>>>>>
>>>>>> ?
>>> Good catch. Guess I didn't notice while testing with LOCKDEP as I was
>>> never able to hit this call path on my systems.
>>>
>>>>> Right. If cpuhp_tasks_frozen is true, partition_sched_domains() will be
>>>>> called without holding cpuset mutex.
>>>>>
>>>>> Well, I think we will need an additional wrapper in cpuset.c that
>>>>> acquires the cpuset_mutex first before calling
>>>>> partition_sched_domains()
>>>>> and use the new wrapper in these cases.
>>>> Actually, partition_sched_domains() is called with the special
>>>> arguments (1,
>>>> NULL, NULL) to reset the domain to a single one. So perhaps something
>>>> like
>>>> the following will be enough to avoid this problem.
>>> I think this would work, as we will still rebuild the accounting after
>>> last CPU comes back from suspend. The thing I am still not sure about is
>>> what we want to do in case we have DEADLINE tasks around, since with
>>> this I belive we would be ignoring them and let suspend proceed.
>> That is the current behavior. You can certainly create a test case to
>> trigger such condition and see what to do about it. Alternatively, you
>> can document that and come up with a follow-up patch later on.
> But don't we rely on partition_sched_domains_locked() calling
> dl_rebuild_rd_accounting() even in the reset_domain=1 case?
>
> Testcase: suspend/resume
>
> on Arm64 big.LITTLE cpumask=[LITTLE][big]=[0,3-5][1-2]
> plus cmd line option 'isolcpus=3,4'.
>
> with Waiman's snippet:
> https://lkml.kernel.org/r/fd4d6143-9bd2-4a7c-80dc-1e19e4d1b2d1@redhat.com
>
> ...
> [ 234.831675] --- > partition_sched_domains_locked() reset_domain=1
> [ 234.835966] psci: CPU4 killed (polled 0 ms)
> [ 234.838912] Error taking CPU3 down: -16
> [ 234.838952] Non-boot CPUs are not disabled
> [ 234.838986] Enabling non-boot CPUs ...
> ...
>
> IIRC, that's the old DL accounting issue.
You are right. cpuhp_tasks_frozen will be set in the suspend/resume
case. In that case, we do need to add a cpuset helper to acquire the
cpuset_mutex. A test patch follows (no testing done yet):
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index c414daa7d503..ef1ffb9c52b0 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -129,6 +129,7 @@ extern void dl_rebuild_rd_accounting(void);
extern void rebuild_sched_domains(void);
extern void cpuset_print_current_mems_allowed(void);
+extern void cpuset_reset_sched_domains(void);
/*
* read_mems_allowed_begin is required when making decisions involving
@@ -269,6 +270,11 @@ static inline void rebuild_sched_domains(void)
partition_sched_domains(1, NULL, NULL);
}
+static inline void cpuset_reset_sched_domains(void)
+{
+ partition_sched_domains(1, NULL, NULL);
+}
+
static inline void cpuset_print_current_mems_allowed(void)
{
}
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 7995cd58a01b..a51099e5d587 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1076,6 +1076,13 @@ void rebuild_sched_domains(void)
cpus_read_unlock();
}
+void cpuset_reset_sched_domains(void)
+{
+ mutex_lock(&cpuset_mutex);
+ partition_sched_domains(1, NULL, NULL);
+ mutex_unlock(&cpuset_mutex);
+}
+
/**
* cpuset_update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
* @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58593f4d09a1..dbf44ddbb6b4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8183,7 +8183,7 @@ static void cpuset_cpu_active(void)
* operation in the resume sequence, just build a single sched
* domain, ignoring cpusets.
*/
- partition_sched_domains(1, NULL, NULL);
+ cpuset_reset_sched_domains();
if (--num_cpus_frozen)
return;
/*
@@ -8202,7 +8202,7 @@ static void cpuset_cpu_inactive(unsigned int cpu)
cpuset_update_active_cpus();
} else {
num_cpus_frozen++;
- partition_sched_domains(1, NULL, NULL);
+ cpuset_reset_sched_domains();
}
}
Cheers,
Longman
>
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-11 14:51 ` Waiman Long
@ 2025-03-12 9:53 ` Dietmar Eggemann
2025-03-12 10:09 ` Juri Lelli
0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Eggemann @ 2025-03-12 9:53 UTC (permalink / raw)
To: Waiman Long, Juri Lelli
Cc: linux-kernel, cgroups, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 11/03/2025 15:51, Waiman Long wrote:
> On 3/11/25 9:29 AM, Dietmar Eggemann wrote:
>> On 11/03/2025 13:34, Waiman Long wrote:
>>> On 3/11/25 7:59 AM, Juri Lelli wrote:
>>>> On 10/03/25 20:16, Waiman Long wrote:
>>>>> On 3/10/25 3:18 PM, Waiman Long wrote:
>>>>>> On 3/10/25 2:54 PM, Dietmar Eggemann wrote:
>>>>>>> On 10/03/2025 10:37, Juri Lelli wrote:
[...]
>> Testcase: suspend/resume
>>
>> on Arm64 big.LITTLE cpumask=[LITTLE][big]=[0,3-5][1-2]
>> plus cmd line option 'isolcpus=3,4'.
>>
>> with Waiman's snippet:
>> https://lkml.kernel.org/r/fd4d6143-9bd2-4a7c-80dc-1e19e4d1b2d1@redhat.com
>>
>> ...
>> [ 234.831675] --- > partition_sched_domains_locked() reset_domain=1
>> [ 234.835966] psci: CPU4 killed (polled 0 ms)
>> [ 234.838912] Error taking CPU3 down: -16
>> [ 234.838952] Non-boot CPUs are not disabled
>> [ 234.838986] Enabling non-boot CPUs ...
>> ...
>>
>> IIRC, that's the old DL accounting issue.
>
> You are right. cpuhp_tasks_frozen will be set in the suspend/resume
> case. In that case, we do need to add a cpuset helper to acquire the
> cpuset_mutex. A test patch as follows (no testing done yet):
>
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index c414daa7d503..ef1ffb9c52b0 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -129,6 +129,7 @@ extern void dl_rebuild_rd_accounting(void);
> extern void rebuild_sched_domains(void);
>
> extern void cpuset_print_current_mems_allowed(void);
> +extern void cpuset_reset_sched_domains(void);
>
> /*
> * read_mems_allowed_begin is required when making decisions involving
> @@ -269,6 +270,11 @@ static inline void rebuild_sched_domains(void)
> partition_sched_domains(1, NULL, NULL);
> }
>
> +static inline void cpuset_reset_sched_domains(void)
> +{
> + partition_sched_domains(1, NULL, NULL);
> +}
> +
> static inline void cpuset_print_current_mems_allowed(void)
> {
> }
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 7995cd58a01b..a51099e5d587 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1076,6 +1076,13 @@ void rebuild_sched_domains(void)
> cpus_read_unlock();
> }
>
> +void cpuset_reset_sched_domains(void)
> +{
> + mutex_lock(&cpuset_mutex);
> + partition_sched_domains(1, NULL, NULL);
> + mutex_unlock(&cpuset_mutex);
> +}
> +
> /**
> * cpuset_update_tasks_cpumask - Update the cpumasks of tasks in the cpuset.
> * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 58593f4d09a1..dbf44ddbb6b4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8183,7 +8183,7 @@ static void cpuset_cpu_active(void)
> * operation in the resume sequence, just build a single sched
> * domain, ignoring cpusets.
> */
> - partition_sched_domains(1, NULL, NULL);
> + cpuset_reset_sched_domains();
> if (--num_cpus_frozen)
> return;
> /*
> @@ -8202,7 +8202,7 @@ static void cpuset_cpu_inactive(unsigned int cpu)
> cpuset_update_active_cpus();
> } else {
> num_cpus_frozen++;
> - partition_sched_domains(1, NULL, NULL);
> + cpuset_reset_sched_domains();
> }
> }
This seems to work. But what about a !CONFIG_CPUSETS build? In this case
we won't have this DL accounting update during suspend/resume, since
dl_rebuild_rd_accounting() is an empty stub.
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-12 9:53 ` Dietmar Eggemann
@ 2025-03-12 10:09 ` Juri Lelli
2025-03-12 13:55 ` Waiman Long
0 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-12 10:09 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: Waiman Long, linux-kernel, cgroups, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 12/03/25 10:53, Dietmar Eggemann wrote:
> On 11/03/2025 15:51, Waiman Long wrote:
...
> > You are right. cpuhp_tasks_frozen will be set in the suspend/resume
> > case. In that case, we do need to add a cpuset helper to acquire the
> > cpuset_mutex. A test patch as follows (no testing done yet):
...
> This seems to work.
Thanks for testing!
Waiman, how would you like to proceed? A separate patch (in which case,
can you please send it to me with a changelog etc.), or should I
incorporate your changes into my original patch and possibly, if you
like, add a Co-authored-by?
> But what about a !CONFIG_CPUSETS build? In this case we won't have
> this DL accounting update during suspend/resume, since
> dl_rebuild_rd_accounting() is an empty stub.
I unfortunately very much suspect !CPUSETS accounting is broken. But if
that is indeed the case, it has been broken for a while. :(
Will need to double check that, but I would probably do it later on,
separately from this set, which at least seems to cure the most common
cases. What do people think?
Thanks,
Juri
* Re: [PATCH v3 1/8] sched/deadline: Ignore special tasks when rebuilding domains
2025-03-10 9:19 ` [PATCH v3 1/8] sched/deadline: Ignore special tasks when rebuilding domains Juri Lelli
@ 2025-03-12 13:32 ` Valentin Schneider
0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2025-03-12 13:32 UTC (permalink / raw)
To: Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
Waiman Long, Tejun Heo, Johannes Weiner, Michal Koutný,
Qais Yousef, Sebastian Andrzej Siewior, Swapnil Sapkal,
Shrikanth Hegde, Phil Auld, luca.abeni, tommaso.cucinotta,
Jon Hunter
On 10/03/25 10:19, Juri Lelli wrote:
> SCHED_DEADLINE special tasks get a fake bandwidth that is only used to
> make sure sleeping and priority inheritance 'work', but it is ignored
> for runtime enforcement and admission control.
>
> Be consistent with it also when rebuilding root domains.
>
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
> Tested-by: Waiman Long <longman@redhat.com>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
* Re: [PATCH v3 2/8] sched/topology: Wrappers for sched_domains_mutex
2025-03-10 9:33 ` [PATCH v3 2/8] sched/topology: Wrappers for sched_domains_mutex Juri Lelli
@ 2025-03-12 13:32 ` Valentin Schneider
0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2025-03-12 13:32 UTC (permalink / raw)
To: Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Waiman Long, Tejun Heo,
Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 10/03/25 10:33, Juri Lelli wrote:
> Create wrappers for sched_domains_mutex so that it can transparently be
> used on both CONFIG_SMP and !CONFIG_SMP, as some function will need to
> do.
>
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
> Tested-by: Waiman Long <longman@redhat.com>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
* Re: [PATCH v3 3/8] sched/deadline: Generalize unique visiting of root domains
2025-03-10 9:35 ` [PATCH v3 3/8] sched/deadline: Generalize unique visiting of root domains Juri Lelli
@ 2025-03-12 13:32 ` Valentin Schneider
0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2025-03-12 13:32 UTC (permalink / raw)
To: Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Waiman Long, Tejun Heo,
Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 10/03/25 10:35, Juri Lelli wrote:
> Bandwidth checks and updates that work on root domains currently employ
> a cookie mechanism for efficiency. This mechanism is very much tied to
> when root domains are first created and initialized.
>
> Generalize the cookie mechanism so that it can also be used later at
> runtime while updating root domains. Also, guard it with
> sched_domains_mutex, since domains need to be stable while they are
> updated (and this will be required for further dynamic changes).
>
> Reported-by: Jon Hunter <jonathanh@nvidia.com>
> Fixes: 53916d5fd3c0 ("sched/deadline: Check bandwidth overflow earlier for hotplug")
> Tested-by: Waiman Long <longman@redhat.com>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 5/8] sched/topology: Remove redundant dl_clear_root_domain call
2025-03-10 9:38 ` [PATCH v3 5/8] sched/topology: Remove redundant dl_clear_root_domain call Juri Lelli
@ 2025-03-12 13:32 ` Valentin Schneider
0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2025-03-12 13:32 UTC (permalink / raw)
To: Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Waiman Long, Tejun Heo,
Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 10/03/25 10:38, Juri Lelli wrote:
> We now completely clear and restore root domain bandwidth accounting
> after every root domain change, so the dl_clear_root_domain() call in
> partition_sched_domains_locked() is redundant.
>
> Remove it.
>
> Reviewed-by: Waiman Long <llong@redhat.com>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Tested-by: Waiman Long <longman@redhat.com>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 6/8] cgroup/cpuset: Remove partition_and_rebuild_sched_domains
2025-03-10 9:39 ` [PATCH v3 6/8] cgroup/cpuset: Remove partition_and_rebuild_sched_domains Juri Lelli
@ 2025-03-12 13:32 ` Valentin Schneider
0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2025-03-12 13:32 UTC (permalink / raw)
To: Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Waiman Long, Tejun Heo,
Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 10/03/25 10:39, Juri Lelli wrote:
> partition_and_rebuild_sched_domains() and partition_sched_domains() are
> now equivalent.
>
> Remove the former as a nice clean up.
>
> Suggested-by: Waiman Long <llong@redhat.com>
> Reviewed-by: Waiman Long <llong@redhat.com>
> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
> Tested-by: Waiman Long <longman@redhat.com>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 7/8] sched/topology: Stop exposing partition_sched_domains_locked
2025-03-10 9:40 ` [PATCH v3 7/8] sched/topology: Stop exposing partition_sched_domains_locked Juri Lelli
@ 2025-03-12 13:32 ` Valentin Schneider
0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2025-03-12 13:32 UTC (permalink / raw)
To: Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Waiman Long, Tejun Heo,
Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 10/03/25 10:40, Juri Lelli wrote:
> There are no callers of partition_sched_domains_locked() outside
> topology.c.
>
> Stop exposing the function.
>
> Suggested-by: Waiman Long <llong@redhat.com>
> Tested-by: Waiman Long <longman@redhat.com>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 8/8] include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h
2025-03-10 9:40 ` [PATCH v3 8/8] include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h Juri Lelli
@ 2025-03-12 13:32 ` Valentin Schneider
0 siblings, 0 replies; 30+ messages in thread
From: Valentin Schneider @ 2025-03-12 13:32 UTC (permalink / raw)
To: Juri Lelli, linux-kernel, cgroups
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Mel Gorman, Waiman Long, Tejun Heo,
Johannes Weiner, Michal Koutný, Qais Yousef,
Sebastian Andrzej Siewior, Swapnil Sapkal, Shrikanth Hegde,
Phil Auld, luca.abeni, tommaso.cucinotta, Jon Hunter
On 10/03/25 10:40, Juri Lelli wrote:
> dl_rebuild_rd_accounting() is defined in cpuset.c, so it makes more
> sense to move related declarations to cpuset.h.
>
> Implement the move.
>
> Suggested-by: Waiman Long <llong@redhat.com>
> Reviewed-by: Waiman Long <llong@redhat.com>
> Tested-by: Waiman Long <longman@redhat.com>
> Tested-by: Jon Hunter <jonathanh@nvidia.com>
> Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-12 10:09 ` Juri Lelli
@ 2025-03-12 13:55 ` Waiman Long
2025-03-12 14:11 ` Juri Lelli
0 siblings, 1 reply; 30+ messages in thread
From: Waiman Long @ 2025-03-12 13:55 UTC (permalink / raw)
To: Juri Lelli, Dietmar Eggemann
Cc: Waiman Long, linux-kernel, cgroups, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 3/12/25 6:09 AM, Juri Lelli wrote:
> On 12/03/25 10:53, Dietmar Eggemann wrote:
>> On 11/03/2025 15:51, Waiman Long wrote:
> ...
>
>>> You are right. cpuhp_tasks_frozen will be set in the suspend/resume
>>> case. In that case, we do need to add a cpuset helper to acquire the
>>> cpuset_mutex. A test patch as follows (no testing done yet):
> ...
>
>> This seems to work.
> Thanks for testing!
>
> Waiman, how would you like to proceed? A separate patch (in that case,
> can you please send it to me with a changelog etc.) or should I
> incorporate your changes into my original patch and possibly, if you
> like, add Co-authored-by?
I think it will be better to merge them into a single patch to avoid
having a broken patch. It is up to you if you want me as a co-author; I
don't really mind.
>
>> But what about a !CONFIG_CPUSETS build. In this case we won't have
>> this DL accounting update during suspend/resume since
>> dl_rebuild_rd_accounting() is empty.
> I unfortunately very much suspect !CPUSETS accounting is broken. But if
> that is indeed the case, it has been broken for a while. :(
Without CONFIG_CPUSETS, there will be one and only one global sched
domain. Will this still be a problem?
>
> Will need to double check that, but I would probably do it later on
> separated from this set that at least seems to cure the most common
> cases. What do people think?
I am not aware of any distro that does not set CONFIG_CPUSETS, so this
is mostly a theoretical problem, if it is one at all. I would therefore
recommend going ahead with the current patch series instead of spending
more time investigating this issue.
Cheers,
Longman
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-12 13:55 ` Waiman Long
@ 2025-03-12 14:11 ` Juri Lelli
2025-03-12 16:29 ` Dietmar Eggemann
0 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-12 14:11 UTC (permalink / raw)
To: Waiman Long
Cc: Dietmar Eggemann, linux-kernel, cgroups, Ingo Molnar,
Peter Zijlstra, Vincent Guittot, Steven Rostedt, Ben Segall,
Mel Gorman, Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 12/03/25 09:55, Waiman Long wrote:
> On 3/12/25 6:09 AM, Juri Lelli wrote:
> > On 12/03/25 10:53, Dietmar Eggemann wrote:
> > > On 11/03/2025 15:51, Waiman Long wrote:
> > ...
> >
> > > > You are right. cpuhp_tasks_frozen will be set in the suspend/resume
> > > > case. In that case, we do need to add a cpuset helper to acquire the
> > > > cpuset_mutex. A test patch as follows (no testing done yet):
> > ...
> >
> > > This seems to work.
> > Thanks for testing!
> >
> > Waiman, how do you like to proceed. Separate patch (in this case can you
> > please send me that with changelog etc.) or incorporate your changes
> > into my original patch and possibly, if you like, add Co-authored-by?
> I think it will be better to merge into a single patch to avoid having a
> broken patch. It is up to you if you want me as a co-author. I don't really
> mind.
> >
> > > But what about a !CONFIG_CPUSETS build. In this case we won't have
> > > this DL accounting update during suspend/resume since
> > > dl_rebuild_rd_accounting() is empty.
> > I unfortunately very much suspect !CPUSETS accounting is broken. But if
> > that is indeed the case, it has been broken for a while. :(
> Without CONFIG_CPUSETS, there will be one and only one global sched domain.
> Will this still be a problem?
Still need to double check, but I have a feeling we don't restore
accounting correctly (at all?!) without CPUSETS. Orthogonal to this
issue, though: if we don't now, we didn't before either. :/
> > Will need to double check that, but I would probably do it later on
> > separated from this set that at least seems to cure the most common
> > cases. What do people think?
>
> I am not aware of any distros without setting CONFIG_CPUSETS. So it is
> mostly a theoretical problem if there is one. So I would recommend going
> ahead with the current patch series instead of spending more time
> investigating this issue.
And I would agree (and then find time to look better into !CPUSETS
case). If nobody objects, I will refresh the series including Waiman's
changes and repost.
Thanks!
Juri
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-12 14:11 ` Juri Lelli
@ 2025-03-12 16:29 ` Dietmar Eggemann
2025-03-12 16:51 ` Juri Lelli
0 siblings, 1 reply; 30+ messages in thread
From: Dietmar Eggemann @ 2025-03-12 16:29 UTC (permalink / raw)
To: Juri Lelli, Waiman Long
Cc: linux-kernel, cgroups, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 12/03/2025 15:11, Juri Lelli wrote:
> On 12/03/25 09:55, Waiman Long wrote:
>> On 3/12/25 6:09 AM, Juri Lelli wrote:
>>> On 12/03/25 10:53, Dietmar Eggemann wrote:
>>>> On 11/03/2025 15:51, Waiman Long wrote:
[...]
>>> I unfortunately very much suspect !CPUSETS accounting is broken. But if
>>> that is indeed the case, it has been broken for a while. :(
>> Without CONFIG_CPUSETS, there will be one and only one global sched domain.
>> Will this still be a problem?
>
> Still need to double check. But I have a feeling we don't restore
> accounting correctly (at all?!) without CPUSETS. Orthogonal to this
> issue though, as if we don't, we didn't so far. :/
As expected:
Since dl_rebuild_rd_accounting() is empty with !CONFIG_CPUSETS, the same
issue happens.
Testcase: suspend/resume
Test machine: Arm64 big.LITTLE cpumask=[LITTLE][big]=[0,3-5][1-2]
plus cmd line option 'isolcpus=3,4'.
...
[ 2250.898771] PM: suspend entry (deep)
[ 2250.902566] Filesystems sync: 0.000 seconds
[ 2250.908704] Freezing user space processes
[ 2250.914379] Freezing user space processes completed (elapsed 0.001 seconds)
[ 2250.921433] OOM killer disabled.
[ 2250.924702] Freezing remaining freezable tasks
[ 2250.930497] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
...
[ 2251.060052] Disabling non-boot CPUs ...
[ 2251.060426] CPU0 attaching NULL sched-domain.
[ 2251.060455] CPU1 attaching NULL sched-domain.
[ 2251.060478] CPU2 attaching NULL sched-domain.
[ 2251.060499] CPU5 attaching NULL sched-domain.
[ 2251.060712] CPU0 attaching sched-domain(s):
[ 2251.060723] domain-0: span=0-2 level=PKG
[ 2251.060750] groups: 0:{ span=0 cap=503 }, 1:{ span=1-2 cap=2048 }
[ 2251.060829] CPU1 attaching sched-domain(s):
[ 2251.060838] domain-0: span=1-2 level=MC
[ 2251.060859] groups: 1:{ span=1 }, 2:{ span=2 }
[ 2251.060906] domain-1: span=0-2 level=PKG
[ 2251.060926] groups: 1:{ span=1-2 cap=2048 }, 0:{ span=0 cap=503 }
[ 2251.061000] CPU2 attaching sched-domain(s):
[ 2251.061009] domain-0: span=1-2 level=MC
[ 2251.061030] groups: 2:{ span=2 }, 1:{ span=1 }
[ 2251.061077] domain-1: span=0-2 level=PKG
[ 2251.061097] groups: 1:{ span=1-2 cap=2048 }, 0:{ span=0 cap=503 }
[ 2251.061221] root domain span: 0-2
[ 2251.061270] root_domain 0-2: pd1:{ cpus=1-2 nr_pstate=5 } pd0:{ cpus=0,3-5 nr_pstate=5 }
[ 2251.064976] psci: CPU5 killed (polled 0 ms)
[ 2251.066211] Error taking CPU4 down: -16
[ 2251.066226] Non-boot CPUs are not disabled
[ 2251.066234] Enabling non-boot CPUs ...
[...]
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-12 16:29 ` Dietmar Eggemann
@ 2025-03-12 16:51 ` Juri Lelli
2025-03-13 9:09 ` Dietmar Eggemann
0 siblings, 1 reply; 30+ messages in thread
From: Juri Lelli @ 2025-03-12 16:51 UTC (permalink / raw)
To: Dietmar Eggemann
Cc: Waiman Long, linux-kernel, cgroups, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 12/03/25 17:29, Dietmar Eggemann wrote:
> On 12/03/2025 15:11, Juri Lelli wrote:
> > On 12/03/25 09:55, Waiman Long wrote:
> >> On 3/12/25 6:09 AM, Juri Lelli wrote:
> >>> On 12/03/25 10:53, Dietmar Eggemann wrote:
> >>>> On 11/03/2025 15:51, Waiman Long wrote:
>
> [...]
>
> >>> I unfortunately very much suspect !CPUSETS accounting is broken. But if
> >>> that is indeed the case, it has been broken for a while. :(
> >> Without CONFIG_CPUSETS, there will be one and only one global sched domain.
> >> Will this still be a problem?
> >
> > Still need to double check. But I have a feeling we don't restore
> > accounting correctly (at all?!) without CPUSETS. Orthogonal to this
> > issue though, as if we don't, we didn't so far. :/
>
> As expected:
>
> Since dl_rebuild_rd_accounting() is empty with !CONFIG_CPUSETS, the same
> issue happens.
Right, suspicion confirmed. :)
But, as I was saying, I believe it has been broken for a while, perhaps
forever; not only suspend/resume, but the accounting itself.
Would you be OK if we address the !CPUSETS case with a separate later
series?
Thanks!
Juri
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update
2025-03-12 16:51 ` Juri Lelli
@ 2025-03-13 9:09 ` Dietmar Eggemann
0 siblings, 0 replies; 30+ messages in thread
From: Dietmar Eggemann @ 2025-03-13 9:09 UTC (permalink / raw)
To: Juri Lelli
Cc: Waiman Long, linux-kernel, cgroups, Ingo Molnar, Peter Zijlstra,
Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman,
Valentin Schneider, Tejun Heo, Johannes Weiner,
Michal Koutný, Qais Yousef, Sebastian Andrzej Siewior,
Swapnil Sapkal, Shrikanth Hegde, Phil Auld, luca.abeni,
tommaso.cucinotta, Jon Hunter
On 12.03.25 17:51, Juri Lelli wrote:
> On 12/03/25 17:29, Dietmar Eggemann wrote:
>> On 12/03/2025 15:11, Juri Lelli wrote:
>>> On 12/03/25 09:55, Waiman Long wrote:
>>>> On 3/12/25 6:09 AM, Juri Lelli wrote:
>>>>> On 12/03/25 10:53, Dietmar Eggemann wrote:
>>>>>> On 11/03/2025 15:51, Waiman Long wrote:
>>
>> [...]
>>
>>>>> I unfortunately very much suspect !CPUSETS accounting is broken. But if
>>>>> that is indeed the case, it has been broken for a while. :(
>>>> Without CONFIG_CPUSETS, there will be one and only one global sched domain.
>>>> Will this still be a problem?
>>>
>>> Still need to double check. But I have a feeling we don't restore
>>> accounting correctly (at all?!) without CPUSETS. Orthogonal to this
>>> issue though, as if we don't, we didn't so far. :/
>>
>> As expected:
>>
>> Since dl_rebuild_rd_accounting() is empty with !CONFIG_CPUSETS, the same
>> issue happens.
>
> Right, suspicion confirmed. :)
>
> But, as I was saying, I believe it has been broken for a while/forever.
> Not only suspend/resume, the accounting itself.
>
> Would you be OK if we address the !CPUSETS case with a separate later
> series?
Yes, we can do that.
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2025-03-13 9:09 UTC | newest]
Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-03-10 9:19 [PATCH v3 0/8] Fix SCHED_DEADLINE bandwidth accounting during suspend Juri Lelli
2025-03-10 9:19 ` [PATCH v3 1/8] sched/deadline: Ignore special tasks when rebuilding domains Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:33 ` [PATCH v3 2/8] sched/topology: Wrappers for sched_domains_mutex Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:35 ` [PATCH v3 3/8] sched/deadline: Generalize unique visiting of root domains Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:37 ` [PATCH v3 4/8] sched/deadline: Rebuild root domain accounting after every update Juri Lelli
2025-03-10 18:54 ` Dietmar Eggemann
2025-03-10 19:18 ` Waiman Long
2025-03-11 0:16 ` Waiman Long
2025-03-11 11:59 ` Juri Lelli
2025-03-11 12:34 ` Waiman Long
2025-03-11 13:29 ` Dietmar Eggemann
2025-03-11 14:51 ` Waiman Long
2025-03-12 9:53 ` Dietmar Eggemann
2025-03-12 10:09 ` Juri Lelli
2025-03-12 13:55 ` Waiman Long
2025-03-12 14:11 ` Juri Lelli
2025-03-12 16:29 ` Dietmar Eggemann
2025-03-12 16:51 ` Juri Lelli
2025-03-13 9:09 ` Dietmar Eggemann
2025-03-10 9:38 ` [PATCH v3 5/8] sched/topology: Remove redundant dl_clear_root_domain call Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:39 ` [PATCH v3 6/8] cgroup/cpuset: Remove partition_and_rebuild_sched_domains Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:40 ` [PATCH v3 7/8] sched/topology: Stop exposing partition_sched_domains_locked Juri Lelli
2025-03-12 13:32 ` Valentin Schneider
2025-03-10 9:40 ` [PATCH v3 8/8] include/{topology,cpuset}: Move dl_rebuild_rd_accounting to cpuset.h Juri Lelli
2025-03-12 13:32 ` Valentin Schneider