* [PATCH -next 1/6] cpuset: add assert_cpuset_lock_held helper
2025-12-17 8:49 [PATCH -next 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
@ 2025-12-17 8:49 ` Chen Ridong
2025-12-17 17:02 ` Waiman Long
2025-12-17 8:49 ` [PATCH -next 2/6] cpuset: add cpuset1_online_css helper for v1-specific operations Chen Ridong
` (4 subsequent siblings)
5 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2025-12-17 8:49 UTC (permalink / raw)
To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
Add assert_cpuset_lock_held() to allow other subsystems to verify that
cpuset_mutex is held.
Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
include/linux/cpuset.h | 2 ++
kernel/cgroup/cpuset.c | 5 +++++
2 files changed, 7 insertions(+)
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index a98d3330385c..af0e76d10476 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -74,6 +74,7 @@ extern void inc_dl_tasks_cs(struct task_struct *task);
extern void dec_dl_tasks_cs(struct task_struct *task);
extern void cpuset_lock(void);
extern void cpuset_unlock(void);
+extern void assert_cpuset_lock_held(void);
extern void cpuset_cpus_allowed_locked(struct task_struct *p, struct cpumask *mask);
extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
@@ -195,6 +196,7 @@ static inline void inc_dl_tasks_cs(struct task_struct *task) { }
static inline void dec_dl_tasks_cs(struct task_struct *task) { }
static inline void cpuset_lock(void) { }
static inline void cpuset_unlock(void) { }
+static inline void assert_cpuset_lock_held(void) { }
static inline void cpuset_cpus_allowed_locked(struct task_struct *p,
struct cpumask *mask)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index fea577b4016a..a5ad124ea1cf 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -271,6 +271,11 @@ void cpuset_unlock(void)
mutex_unlock(&cpuset_mutex);
}
+void assert_cpuset_lock_held(void)
+{
+ lockdep_assert_held(&cpuset_mutex);
+}
+
/**
* cpuset_full_lock - Acquire full protection for cpuset modification
*
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH -next 1/6] cpuset: add assert_cpuset_lock_held helper
2025-12-17 8:49 ` [PATCH -next 1/6] cpuset: add assert_cpuset_lock_held helper Chen Ridong
@ 2025-12-17 17:02 ` Waiman Long
2025-12-18 0:37 ` Chen Ridong
0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2025-12-17 17:02 UTC (permalink / raw)
To: Chen Ridong, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 12/17/25 3:49 AM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Add assert_cpuset_lock_held() to allow other subsystems to verify that
> cpuset_mutex is held.
Sorry, I should have added the "lockdep_" prefix when I mentioned adding
this helper function to be consistent with the others. Could you update
the patch to add that?
Thanks,
Longman
>
> Suggested-by: Waiman Long <longman@redhat.com>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> include/linux/cpuset.h | 2 ++
> kernel/cgroup/cpuset.c | 5 +++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index a98d3330385c..af0e76d10476 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -74,6 +74,7 @@ extern void inc_dl_tasks_cs(struct task_struct *task);
> extern void dec_dl_tasks_cs(struct task_struct *task);
> extern void cpuset_lock(void);
> extern void cpuset_unlock(void);
> +extern void assert_cpuset_lock_held(void);
> extern void cpuset_cpus_allowed_locked(struct task_struct *p, struct cpumask *mask);
> extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
> extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
> @@ -195,6 +196,7 @@ static inline void inc_dl_tasks_cs(struct task_struct *task) { }
> static inline void dec_dl_tasks_cs(struct task_struct *task) { }
> static inline void cpuset_lock(void) { }
> static inline void cpuset_unlock(void) { }
> +static inline void assert_cpuset_lock_held(void) { }
>
> static inline void cpuset_cpus_allowed_locked(struct task_struct *p,
> struct cpumask *mask)
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index fea577b4016a..a5ad124ea1cf 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -271,6 +271,11 @@ void cpuset_unlock(void)
> mutex_unlock(&cpuset_mutex);
> }
>
> +void assert_cpuset_lock_held(void)
> +{
> + lockdep_assert_held(&cpuset_mutex);
> +}
> +
> /**
> * cpuset_full_lock - Acquire full protection for cpuset modification
> *
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH -next 1/6] cpuset: add assert_cpuset_lock_held helper
2025-12-17 17:02 ` Waiman Long
@ 2025-12-18 0:37 ` Chen Ridong
0 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2025-12-18 0:37 UTC (permalink / raw)
To: Waiman Long, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 2025/12/18 1:02, Waiman Long wrote:
> On 12/17/25 3:49 AM, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Add assert_cpuset_lock_held() to allow other subsystems to verify that
>> cpuset_mutex is held.
>
> Sorry, I should have added the "lockdep_" prefix when I mentioned adding this helper function to be
> consistent with the others. Could you update the patch to add that?
>
Thank you.
I was torn about whether to add the lockdep_ prefix, since lockdep_assert_cpus_held() has it,
but I named it as you originally suggested. I'll update the patch accordingly.
>
>>
>> Suggested-by: Waiman Long <longman@redhat.com>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> include/linux/cpuset.h | 2 ++
>> kernel/cgroup/cpuset.c | 5 +++++
>> 2 files changed, 7 insertions(+)
>>
>> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
>> index a98d3330385c..af0e76d10476 100644
>> --- a/include/linux/cpuset.h
>> +++ b/include/linux/cpuset.h
>> @@ -74,6 +74,7 @@ extern void inc_dl_tasks_cs(struct task_struct *task);
>> extern void dec_dl_tasks_cs(struct task_struct *task);
>> extern void cpuset_lock(void);
>> extern void cpuset_unlock(void);
>> +extern void assert_cpuset_lock_held(void);
>> extern void cpuset_cpus_allowed_locked(struct task_struct *p, struct cpumask *mask);
>> extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
>> extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
>> @@ -195,6 +196,7 @@ static inline void inc_dl_tasks_cs(struct task_struct *task) { }
>> static inline void dec_dl_tasks_cs(struct task_struct *task) { }
>> static inline void cpuset_lock(void) { }
>> static inline void cpuset_unlock(void) { }
>> +static inline void assert_cpuset_lock_held(void) { }
>> static inline void cpuset_cpus_allowed_locked(struct task_struct *p,
>> struct cpumask *mask)
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index fea577b4016a..a5ad124ea1cf 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -271,6 +271,11 @@ void cpuset_unlock(void)
>> mutex_unlock(&cpuset_mutex);
>> }
>> +void assert_cpuset_lock_held(void)
>> +{
>> + lockdep_assert_held(&cpuset_mutex);
>> +}
>> +
>> /**
>> * cpuset_full_lock - Acquire full protection for cpuset modification
>> *
>
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH -next 2/6] cpuset: add cpuset1_online_css helper for v1-specific operations
2025-12-17 8:49 [PATCH -next 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
2025-12-17 8:49 ` [PATCH -next 1/6] cpuset: add assert_cpuset_lock_held helper Chen Ridong
@ 2025-12-17 8:49 ` Chen Ridong
2025-12-17 8:49 ` [PATCH -next 3/6] cpuset: add cpuset1_init helper for v1 initialization Chen Ridong
` (3 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2025-12-17 8:49 UTC (permalink / raw)
To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
This commit introduces the cpuset1_online_css helper to centralize
v1-specific handling during cpuset online. It performs operations such as
updating the CS_SPREAD_PAGE, CS_SPREAD_SLAB, and CGRP_CPUSET_CLONE_CHILDREN
flags, which are unique to the cpuset v1 control group interface.
The helper is now placed in cpuset-v1.c to maintain clear separation
between v1 and v2 logic.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 2 ++
kernel/cgroup/cpuset-v1.c | 48 +++++++++++++++++++++++++++++++++
kernel/cgroup/cpuset.c | 39 +--------------------------
3 files changed, 51 insertions(+), 38 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 01976c8e7d49..6c03cad02302 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -293,6 +293,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
struct cpumask *new_cpus, nodemask_t *new_mems,
bool cpus_updated, bool mems_updated);
int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+void cpuset1_online_css(struct cgroup_subsys_state *css);
#else
static inline void fmeter_init(struct fmeter *fmp) {}
static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
@@ -303,6 +304,7 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
bool cpus_updated, bool mems_updated) {}
static inline int cpuset1_validate_change(struct cpuset *cur,
struct cpuset *trial) { return 0; }
+static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
#endif /* CONFIG_CPUSETS_V1 */
#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 12e76774c75b..650028ee250b 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -499,6 +499,54 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
return retval;
}
+void cpuset1_online_css(struct cgroup_subsys_state *css)
+{
+ struct cpuset *tmp_cs;
+ struct cgroup_subsys_state *pos_css;
+ struct cpuset *cs = css_cs(css);
+ struct cpuset *parent = parent_cs(cs);
+
+ lockdep_assert_cpus_held();
+ assert_cpuset_lock_held();
+
+ if (is_spread_page(parent))
+ set_bit(CS_SPREAD_PAGE, &cs->flags);
+ if (is_spread_slab(parent))
+ set_bit(CS_SPREAD_SLAB, &cs->flags);
+
+ if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
+ return;
+
+ /*
+ * Clone @parent's configuration if CGRP_CPUSET_CLONE_CHILDREN is
+ * set. This flag handling is implemented in cgroup core for
+ * historical reasons - the flag may be specified during mount.
+ *
+ * Currently, if any sibling cpusets have exclusive cpus or mem, we
+ * refuse to clone the configuration - thereby refusing the task to
+ * be entered, and as a result refusing the sys_unshare() or
+ * clone() which initiated it. If this becomes a problem for some
+ * users who wish to allow that scenario, then this could be
+ * changed to grant parent->cpus_allowed-sibling_cpus_exclusive
+ * (and likewise for mems) to the new cgroup.
+ */
+ rcu_read_lock();
+ cpuset_for_each_child(tmp_cs, pos_css, parent) {
+ if (is_mem_exclusive(tmp_cs) || is_cpu_exclusive(tmp_cs)) {
+ rcu_read_unlock();
+ return;
+ }
+ }
+ rcu_read_unlock();
+
+ cpuset_callback_lock_irq();
+ cs->mems_allowed = parent->mems_allowed;
+ cs->effective_mems = parent->mems_allowed;
+ cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
+ cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
+ cpuset_callback_unlock_irq();
+}
+
/*
* for the common functions, 'private' gives the type of file
*/
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index a5ad124ea1cf..f74da3086120 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3616,17 +3616,11 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
{
struct cpuset *cs = css_cs(css);
struct cpuset *parent = parent_cs(cs);
- struct cpuset *tmp_cs;
- struct cgroup_subsys_state *pos_css;
if (!parent)
return 0;
cpuset_full_lock();
- if (is_spread_page(parent))
- set_bit(CS_SPREAD_PAGE, &cs->flags);
- if (is_spread_slab(parent))
- set_bit(CS_SPREAD_SLAB, &cs->flags);
/*
* For v2, clear CS_SCHED_LOAD_BALANCE if parent is isolated
*/
@@ -3641,39 +3635,8 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
cs->effective_mems = parent->effective_mems;
}
spin_unlock_irq(&callback_lock);
+ cpuset1_online_css(css);
- if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
- goto out_unlock;
-
- /*
- * Clone @parent's configuration if CGRP_CPUSET_CLONE_CHILDREN is
- * set. This flag handling is implemented in cgroup core for
- * historical reasons - the flag may be specified during mount.
- *
- * Currently, if any sibling cpusets have exclusive cpus or mem, we
- * refuse to clone the configuration - thereby refusing the task to
- * be entered, and as a result refusing the sys_unshare() or
- * clone() which initiated it. If this becomes a problem for some
- * users who wish to allow that scenario, then this could be
- * changed to grant parent->cpus_allowed-sibling_cpus_exclusive
- * (and likewise for mems) to the new cgroup.
- */
- rcu_read_lock();
- cpuset_for_each_child(tmp_cs, pos_css, parent) {
- if (is_mem_exclusive(tmp_cs) || is_cpu_exclusive(tmp_cs)) {
- rcu_read_unlock();
- goto out_unlock;
- }
- }
- rcu_read_unlock();
-
- spin_lock_irq(&callback_lock);
- cs->mems_allowed = parent->mems_allowed;
- cs->effective_mems = parent->mems_allowed;
- cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
- cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
- spin_unlock_irq(&callback_lock);
-out_unlock:
cpuset_full_unlock();
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH -next 3/6] cpuset: add cpuset1_init helper for v1 initialization
2025-12-17 8:49 [PATCH -next 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
2025-12-17 8:49 ` [PATCH -next 1/6] cpuset: add assert_cpuset_lock_held helper Chen Ridong
2025-12-17 8:49 ` [PATCH -next 2/6] cpuset: add cpuset1_online_css helper for v1-specific operations Chen Ridong
@ 2025-12-17 8:49 ` Chen Ridong
2025-12-17 8:49 ` [PATCH -next 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c Chen Ridong
` (2 subsequent siblings)
5 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2025-12-17 8:49 UTC (permalink / raw)
To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
This patch introduces the cpuset1_init helper in cpuset_v1.c to initialize
v1-specific fields, including the fmeter and relax_domain_level members.
The relax_domain_level related code will be moved to cpuset_v1.c in a
subsequent patch. After this move, v1-specific members will only be
visible when CONFIG_CPUSETS_V1=y.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 10 ++++++----
kernel/cgroup/cpuset-v1.c | 7 ++++++-
kernel/cgroup/cpuset.c | 4 ++--
3 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 6c03cad02302..a32517da8231 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -144,8 +144,6 @@ struct cpuset {
*/
nodemask_t old_mems_allowed;
- struct fmeter fmeter; /* memory_pressure filter */
-
/*
* Tasks are being attached to this cpuset. Used to prevent
* zeroing cpus/mems_allowed between ->can_attach() and ->attach().
@@ -181,6 +179,10 @@ struct cpuset {
/* Used to merge intersecting subsets for generate_sched_domains */
struct uf_node node;
+
+#ifdef CONFIG_CPUSETS_V1
+ struct fmeter fmeter; /* memory_pressure filter */
+#endif
};
static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
@@ -285,7 +287,6 @@ void cpuset_full_unlock(void);
*/
#ifdef CONFIG_CPUSETS_V1
extern struct cftype cpuset1_files[];
-void fmeter_init(struct fmeter *fmp);
void cpuset1_update_task_spread_flags(struct cpuset *cs,
struct task_struct *tsk);
void cpuset1_update_tasks_flags(struct cpuset *cs);
@@ -293,9 +294,9 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
struct cpumask *new_cpus, nodemask_t *new_mems,
bool cpus_updated, bool mems_updated);
int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+void cpuset1_init(struct cpuset *cs);
void cpuset1_online_css(struct cgroup_subsys_state *css);
#else
-static inline void fmeter_init(struct fmeter *fmp) {}
static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
struct task_struct *tsk) {}
static inline void cpuset1_update_tasks_flags(struct cpuset *cs) {}
@@ -304,6 +305,7 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
bool cpus_updated, bool mems_updated) {}
static inline int cpuset1_validate_change(struct cpuset *cur,
struct cpuset *trial) { return 0; }
+static inline void cpuset1_init(struct cpuset *cs) {}
static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
#endif /* CONFIG_CPUSETS_V1 */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 650028ee250b..574df740f21a 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -62,7 +62,7 @@ struct cpuset_remove_tasks_struct {
#define FM_SCALE 1000 /* faux fixed point scale */
/* Initialize a frequency meter */
-void fmeter_init(struct fmeter *fmp)
+static void fmeter_init(struct fmeter *fmp)
{
fmp->cnt = 0;
fmp->val = 0;
@@ -499,6 +499,11 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
return retval;
}
+void cpuset1_init(struct cpuset *cs)
+{
+ fmeter_init(&cs->fmeter);
+}
+
void cpuset1_online_css(struct cgroup_subsys_state *css)
{
struct cpuset *tmp_cs;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f74da3086120..e836a1f2b951 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3602,7 +3602,7 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
return ERR_PTR(-ENOMEM);
__set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
- fmeter_init(&cs->fmeter);
+ cpuset1_init(cs);
cs->relax_domain_level = -1;
/* Set CS_MEMORY_MIGRATE for default hierarchy */
@@ -3836,7 +3836,7 @@ int __init cpuset_init(void)
cpumask_setall(top_cpuset.exclusive_cpus);
nodes_setall(top_cpuset.effective_mems);
- fmeter_init(&top_cpuset.fmeter);
+ cpuset1_init(&top_cpuset);
BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* [PATCH -next 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c
2025-12-17 8:49 [PATCH -next 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
` (2 preceding siblings ...)
2025-12-17 8:49 ` [PATCH -next 3/6] cpuset: add cpuset1_init helper for v1 initialization Chen Ridong
@ 2025-12-17 8:49 ` Chen Ridong
2025-12-17 17:09 ` Waiman Long
2025-12-17 8:49 ` [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2 Chen Ridong
2025-12-17 8:49 ` [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains Chen Ridong
5 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2025-12-17 8:49 UTC (permalink / raw)
To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
Since relax_domain_level is only applicable to v1, move
update_domain_attr_tree(), which solely updates relax_domain_level, to
cpuset-v1.c.
Additionally, relax_domain_level is now initialized in cpuset1_init().
Accordingly, the initialization of relax_domain_level in top_cpuset is
removed. The unnecessary remote_partition initialization in top_cpuset
is also cleaned up.
As a result, relax_domain_level is defined in struct cpuset only when
CONFIG_CPUSETS_V1=y.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 11 ++++++++---
kernel/cgroup/cpuset-v1.c | 28 ++++++++++++++++++++++++++++
kernel/cgroup/cpuset.c | 31 -------------------------------
3 files changed, 36 insertions(+), 34 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index a32517da8231..677053ffb913 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -150,9 +150,6 @@ struct cpuset {
*/
int attach_in_progress;
- /* for custom sched domain */
- int relax_domain_level;
-
/* partition root state */
int partition_root_state;
@@ -182,6 +179,9 @@ struct cpuset {
#ifdef CONFIG_CPUSETS_V1
struct fmeter fmeter; /* memory_pressure filter */
+
+ /* for custom sched domain */
+ int relax_domain_level;
#endif
};
@@ -296,6 +296,8 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
void cpuset1_init(struct cpuset *cs);
void cpuset1_online_css(struct cgroup_subsys_state *css);
+void update_domain_attr_tree(struct sched_domain_attr *dattr,
+ struct cpuset *root_cs);
#else
static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
struct task_struct *tsk) {}
@@ -307,6 +309,9 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
struct cpuset *trial) { return 0; }
static inline void cpuset1_init(struct cpuset *cs) {}
static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
+static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
+ struct cpuset *root_cs) {}
+
#endif /* CONFIG_CPUSETS_V1 */
#endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 574df740f21a..95de6f2a4cc5 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -502,6 +502,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
void cpuset1_init(struct cpuset *cs)
{
fmeter_init(&cs->fmeter);
+ cs->relax_domain_level = -1;
}
void cpuset1_online_css(struct cgroup_subsys_state *css)
@@ -552,6 +553,33 @@ void cpuset1_online_css(struct cgroup_subsys_state *css)
cpuset_callback_unlock_irq();
}
+static void
+update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
+{
+ if (dattr->relax_domain_level < c->relax_domain_level)
+ dattr->relax_domain_level = c->relax_domain_level;
+}
+
+void update_domain_attr_tree(struct sched_domain_attr *dattr,
+ struct cpuset *root_cs)
+{
+ struct cpuset *cp;
+ struct cgroup_subsys_state *pos_css;
+
+ rcu_read_lock();
+ cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
+ /* skip the whole subtree if @cp doesn't have any CPU */
+ if (cpumask_empty(cp->cpus_allowed)) {
+ pos_css = css_rightmost_descendant(pos_css);
+ continue;
+ }
+
+ if (is_sched_load_balance(cp))
+ update_domain_attr(dattr, cp);
+ }
+ rcu_read_unlock();
+}
+
/*
* for the common functions, 'private' gives the type of file
*/
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e836a1f2b951..88ca8b40e01a 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -215,8 +215,6 @@ static struct cpuset top_cpuset = {
.flags = BIT(CS_CPU_EXCLUSIVE) |
BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
.partition_root_state = PRS_ROOT,
- .relax_domain_level = -1,
- .remote_partition = false,
};
/*
@@ -755,34 +753,6 @@ static int cpusets_overlap(struct cpuset *a, struct cpuset *b)
return cpumask_intersects(a->effective_cpus, b->effective_cpus);
}
-static void
-update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
-{
- if (dattr->relax_domain_level < c->relax_domain_level)
- dattr->relax_domain_level = c->relax_domain_level;
- return;
-}
-
-static void update_domain_attr_tree(struct sched_domain_attr *dattr,
- struct cpuset *root_cs)
-{
- struct cpuset *cp;
- struct cgroup_subsys_state *pos_css;
-
- rcu_read_lock();
- cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
- /* skip the whole subtree if @cp doesn't have any CPU */
- if (cpumask_empty(cp->cpus_allowed)) {
- pos_css = css_rightmost_descendant(pos_css);
- continue;
- }
-
- if (is_sched_load_balance(cp))
- update_domain_attr(dattr, cp);
- }
- rcu_read_unlock();
-}
-
/* Must be called with cpuset_mutex held. */
static inline int nr_cpusets(void)
{
@@ -3603,7 +3573,6 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
__set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
cpuset1_init(cs);
- cs->relax_domain_level = -1;
/* Set CS_MEMORY_MIGRATE for default hierarchy */
if (cpuset_v2())
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH -next 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c
2025-12-17 8:49 ` [PATCH -next 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c Chen Ridong
@ 2025-12-17 17:09 ` Waiman Long
2025-12-18 0:44 ` Chen Ridong
0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2025-12-17 17:09 UTC (permalink / raw)
To: Chen Ridong, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 12/17/25 3:49 AM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Since relax_domain_level is only applicable to v1, move
> update_domain_attr_tree() to cpuset-v1.c, which solely updates
> relax_domain_level,
>
> Additionally, relax_domain_level is now initialized in cpuset1_inited.
> Accordingly, the initialization of relax_domain_level in top_cpuset is
> removed. The unnecessary remote_partition initialization in top_cpuset
> is also cleaned up.
>
> As a result, relax_domain_level can be defined in cpuset only when
> CONFIG_CPUSETS_V1=y.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset-internal.h | 11 ++++++++---
> kernel/cgroup/cpuset-v1.c | 28 ++++++++++++++++++++++++++++
> kernel/cgroup/cpuset.c | 31 -------------------------------
> 3 files changed, 36 insertions(+), 34 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index a32517da8231..677053ffb913 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -150,9 +150,6 @@ struct cpuset {
> */
> int attach_in_progress;
>
> - /* for custom sched domain */
> - int relax_domain_level;
> -
> /* partition root state */
> int partition_root_state;
>
> @@ -182,6 +179,9 @@ struct cpuset {
>
> #ifdef CONFIG_CPUSETS_V1
> struct fmeter fmeter; /* memory_pressure filter */
> +
> + /* for custom sched domain */
> + int relax_domain_level;
> #endif
> };
>
> @@ -296,6 +296,8 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
> void cpuset1_init(struct cpuset *cs);
> void cpuset1_online_css(struct cgroup_subsys_state *css);
> +void update_domain_attr_tree(struct sched_domain_attr *dattr,
> + struct cpuset *root_cs);
> #else
> static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
> struct task_struct *tsk) {}
> @@ -307,6 +309,9 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
> struct cpuset *trial) { return 0; }
> static inline void cpuset1_init(struct cpuset *cs) {}
> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
> +static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
> + struct cpuset *root_cs) {}
> +
> #endif /* CONFIG_CPUSETS_V1 */
>
> #endif /* __CPUSET_INTERNAL_H */
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index 574df740f21a..95de6f2a4cc5 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -502,6 +502,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
> void cpuset1_init(struct cpuset *cs)
> {
> fmeter_init(&cs->fmeter);
> + cs->relax_domain_level = -1;
> }
>
> void cpuset1_online_css(struct cgroup_subsys_state *css)
> @@ -552,6 +553,33 @@ void cpuset1_online_css(struct cgroup_subsys_state *css)
> cpuset_callback_unlock_irq();
> }
>
> +static void
> +update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
> +{
> + if (dattr->relax_domain_level < c->relax_domain_level)
> + dattr->relax_domain_level = c->relax_domain_level;
> +}
> +
> +void update_domain_attr_tree(struct sched_domain_attr *dattr,
> + struct cpuset *root_cs)
> +{
> + struct cpuset *cp;
> + struct cgroup_subsys_state *pos_css;
> +
> + rcu_read_lock();
> + cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
> + /* skip the whole subtree if @cp doesn't have any CPU */
> + if (cpumask_empty(cp->cpus_allowed)) {
> + pos_css = css_rightmost_descendant(pos_css);
> + continue;
> + }
> +
> + if (is_sched_load_balance(cp))
> + update_domain_attr(dattr, cp);
> + }
> + rcu_read_unlock();
> +}
> +
> /*
> * for the common functions, 'private' gives the type of file
> */
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index e836a1f2b951..88ca8b40e01a 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -215,8 +215,6 @@ static struct cpuset top_cpuset = {
> .flags = BIT(CS_CPU_EXCLUSIVE) |
> BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
> .partition_root_state = PRS_ROOT,
> - .relax_domain_level = -1,
As the cpuset1_init() function will not be called for top_cpuset, you
should not remove the initialization of relax_domain_level. Instead, put
it inside a "ifdef CONFIG_CPUSETS_V1 block.
> - .remote_partition = false,
Yes, this is not really needed and can be removed.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH -next 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c
2025-12-17 17:09 ` Waiman Long
@ 2025-12-18 0:44 ` Chen Ridong
2025-12-18 3:06 ` Waiman Long
0 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2025-12-18 0:44 UTC (permalink / raw)
To: Waiman Long, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 2025/12/18 1:09, Waiman Long wrote:
>
> On 12/17/25 3:49 AM, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Since relax_domain_level is only applicable to v1, move
>> update_domain_attr_tree() to cpuset-v1.c, which solely updates
>> relax_domain_level.
>>
>> Additionally, relax_domain_level is now initialized in cpuset1_init().
>> Accordingly, the initialization of relax_domain_level in top_cpuset is
>> removed. The unnecessary remote_partition initialization in top_cpuset
>> is also cleaned up.
>>
>> As a result, relax_domain_level can be defined in cpuset only when
>> CONFIG_CPUSETS_V1=y.
>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> kernel/cgroup/cpuset-internal.h | 11 ++++++++---
>> kernel/cgroup/cpuset-v1.c | 28 ++++++++++++++++++++++++++++
>> kernel/cgroup/cpuset.c | 31 -------------------------------
>> 3 files changed, 36 insertions(+), 34 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>> index a32517da8231..677053ffb913 100644
>> --- a/kernel/cgroup/cpuset-internal.h
>> +++ b/kernel/cgroup/cpuset-internal.h
>> @@ -150,9 +150,6 @@ struct cpuset {
>> */
>> int attach_in_progress;
>> - /* for custom sched domain */
>> - int relax_domain_level;
>> -
>> /* partition root state */
>> int partition_root_state;
>> @@ -182,6 +179,9 @@ struct cpuset {
>> #ifdef CONFIG_CPUSETS_V1
>> struct fmeter fmeter; /* memory_pressure filter */
>> +
>> + /* for custom sched domain */
>> + int relax_domain_level;
>> #endif
>> };
>> @@ -296,6 +296,8 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
>> void cpuset1_init(struct cpuset *cs);
>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>> +void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> + struct cpuset *root_cs);
>> #else
>> static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
>> struct task_struct *tsk) {}
>> @@ -307,6 +309,9 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
>> struct cpuset *trial) { return 0; }
>> static inline void cpuset1_init(struct cpuset *cs) {}
>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>> +static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> + struct cpuset *root_cs) {}
>> +
>> #endif /* CONFIG_CPUSETS_V1 */
>> #endif /* __CPUSET_INTERNAL_H */
>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>> index 574df740f21a..95de6f2a4cc5 100644
>> --- a/kernel/cgroup/cpuset-v1.c
>> +++ b/kernel/cgroup/cpuset-v1.c
>> @@ -502,6 +502,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
>> void cpuset1_init(struct cpuset *cs)
>> {
>> fmeter_init(&cs->fmeter);
>> + cs->relax_domain_level = -1;
>> }
>> void cpuset1_online_css(struct cgroup_subsys_state *css)
>> @@ -552,6 +553,33 @@ void cpuset1_online_css(struct cgroup_subsys_state *css)
>> cpuset_callback_unlock_irq();
>> }
>> +static void
>> +update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
>> +{
>> + if (dattr->relax_domain_level < c->relax_domain_level)
>> + dattr->relax_domain_level = c->relax_domain_level;
>> +}
>> +
>> +void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> + struct cpuset *root_cs)
>> +{
>> + struct cpuset *cp;
>> + struct cgroup_subsys_state *pos_css;
>> +
>> + rcu_read_lock();
>> + cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
>> + /* skip the whole subtree if @cp doesn't have any CPU */
>> + if (cpumask_empty(cp->cpus_allowed)) {
>> + pos_css = css_rightmost_descendant(pos_css);
>> + continue;
>> + }
>> +
>> + if (is_sched_load_balance(cp))
>> + update_domain_attr(dattr, cp);
>> + }
>> + rcu_read_unlock();
>> +}
>> +
>> /*
>> * for the common functions, 'private' gives the type of file
>> */
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index e836a1f2b951..88ca8b40e01a 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -215,8 +215,6 @@ static struct cpuset top_cpuset = {
>> .flags = BIT(CS_CPU_EXCLUSIVE) |
>> BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
>> .partition_root_state = PRS_ROOT,
>> - .relax_domain_level = -1,
>
> As the cpuset1_init() function will not be called for top_cpuset, you should not remove the
> initialization of relax_domain_level. Instead, put it inside an "#ifdef CONFIG_CPUSETS_V1" block.
>
In patch 3/6, I've made cpuset_init call cpuset1_init to initialize top_cpuset.fmeter. Thus, I think
we could remove the relax_domain_level initialization here.
>> - .remote_partition = false,
>
> Yes, this is not really needed and can be removed.
>
> Cheers,
> Longman
>
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH -next 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c
2025-12-18 0:44 ` Chen Ridong
@ 2025-12-18 3:06 ` Waiman Long
0 siblings, 0 replies; 21+ messages in thread
From: Waiman Long @ 2025-12-18 3:06 UTC (permalink / raw)
To: Chen Ridong, Waiman Long, tj, hannes, mkoutny
Cc: cgroups, linux-kernel, lujialin4
On 12/17/25 7:44 PM, Chen Ridong wrote:
>
> On 2025/12/18 1:09, Waiman Long wrote:
>> On 12/17/25 3:49 AM, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> Since relax_domain_level is only applicable to v1, move
>>> update_domain_attr_tree() to cpuset-v1.c, which solely updates
>>> relax_domain_level.
>>>
>>> Additionally, relax_domain_level is now initialized in cpuset1_init().
>>> Accordingly, the initialization of relax_domain_level in top_cpuset is
>>> removed. The unnecessary remote_partition initialization in top_cpuset
>>> is also cleaned up.
>>>
>>> As a result, relax_domain_level can be defined in cpuset only when
>>> CONFIG_CPUSETS_V1=y.
>>>
>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>> ---
>>> kernel/cgroup/cpuset-internal.h | 11 ++++++++---
>>> kernel/cgroup/cpuset-v1.c | 28 ++++++++++++++++++++++++++++
>>> kernel/cgroup/cpuset.c | 31 -------------------------------
>>> 3 files changed, 36 insertions(+), 34 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>>> index a32517da8231..677053ffb913 100644
>>> --- a/kernel/cgroup/cpuset-internal.h
>>> +++ b/kernel/cgroup/cpuset-internal.h
>>> @@ -150,9 +150,6 @@ struct cpuset {
>>> */
>>> int attach_in_progress;
>>> - /* for custom sched domain */
>>> - int relax_domain_level;
>>> -
>>> /* partition root state */
>>> int partition_root_state;
>>> @@ -182,6 +179,9 @@ struct cpuset {
>>> #ifdef CONFIG_CPUSETS_V1
>>> struct fmeter fmeter; /* memory_pressure filter */
>>> +
>>> + /* for custom sched domain */
>>> + int relax_domain_level;
>>> #endif
>>> };
>>> @@ -296,6 +296,8 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>>> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
>>> void cpuset1_init(struct cpuset *cs);
>>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>>> +void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> + struct cpuset *root_cs);
>>> #else
>>> static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
>>> struct task_struct *tsk) {}
>>> @@ -307,6 +309,9 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
>>> struct cpuset *trial) { return 0; }
>>> static inline void cpuset1_init(struct cpuset *cs) {}
>>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>>> +static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> + struct cpuset *root_cs) {}
>>> +
>>> #endif /* CONFIG_CPUSETS_V1 */
>>> #endif /* __CPUSET_INTERNAL_H */
>>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>>> index 574df740f21a..95de6f2a4cc5 100644
>>> --- a/kernel/cgroup/cpuset-v1.c
>>> +++ b/kernel/cgroup/cpuset-v1.c
>>> @@ -502,6 +502,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
>>> void cpuset1_init(struct cpuset *cs)
>>> {
>>> fmeter_init(&cs->fmeter);
>>> + cs->relax_domain_level = -1;
>>> }
>>> void cpuset1_online_css(struct cgroup_subsys_state *css)
>>> @@ -552,6 +553,33 @@ void cpuset1_online_css(struct cgroup_subsys_state *css)
>>> cpuset_callback_unlock_irq();
>>> }
>>> +static void
>>> +update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
>>> +{
>>> + if (dattr->relax_domain_level < c->relax_domain_level)
>>> + dattr->relax_domain_level = c->relax_domain_level;
>>> +}
>>> +
>>> +void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> + struct cpuset *root_cs)
>>> +{
>>> + struct cpuset *cp;
>>> + struct cgroup_subsys_state *pos_css;
>>> +
>>> + rcu_read_lock();
>>> + cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
>>> + /* skip the whole subtree if @cp doesn't have any CPU */
>>> + if (cpumask_empty(cp->cpus_allowed)) {
>>> + pos_css = css_rightmost_descendant(pos_css);
>>> + continue;
>>> + }
>>> +
>>> + if (is_sched_load_balance(cp))
>>> + update_domain_attr(dattr, cp);
>>> + }
>>> + rcu_read_unlock();
>>> +}
>>> +
>>> /*
>>> * for the common functions, 'private' gives the type of file
>>> */
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index e836a1f2b951..88ca8b40e01a 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -215,8 +215,6 @@ static struct cpuset top_cpuset = {
>>> .flags = BIT(CS_CPU_EXCLUSIVE) |
>>> BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
>>> .partition_root_state = PRS_ROOT,
>>> - .relax_domain_level = -1,
>> As the cpuset1_init() function will not be called for top_cpuset, you should not remove the
>> initialization of relax_domain_level. Instead, put it inside an "#ifdef CONFIG_CPUSETS_V1" block.
>>
> In patch 3/6, I've made cpuset_init call cpuset1_init to initialize top_cpuset.fmeter. Thus, I think
> we could remove the relax_domain_level initialization here.
I missed that. You are right. Removing the initialization here should be
all right.
Cheers,
Longman
>
>>> - .remote_partition = false,
>> Yes, this is not really needed and can be removed.
>>
>> Cheers,
>> Longman
>>
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2
2025-12-17 8:49 [PATCH -next 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
` (3 preceding siblings ...)
2025-12-17 8:49 ` [PATCH -next 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c Chen Ridong
@ 2025-12-17 8:49 ` Chen Ridong
2025-12-17 17:48 ` Waiman Long
2025-12-17 8:49 ` [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains Chen Ridong
5 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2025-12-17 8:49 UTC (permalink / raw)
To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
The generate_sched_domains() function currently handles both v1 and v2
logic. However, the underlying mechanisms for building scheduler domains
differ significantly between the two versions. For cpuset v2, scheduler
domains are straightforwardly derived from valid partitions, whereas
cpuset v1 employs a more complex union-find algorithm to merge overlapping
cpusets. Co-locating these implementations complicates maintenance.
This patch, along with subsequent ones, aims to separate the v1 and v2
logic. For ease of review, this patch first copies the
generate_sched_domains() function into cpuset-v1.c as
cpuset1_generate_sched_domains() and removes v2-specific code. Common
helpers and top_cpuset are declared in cpuset-internal.h. When operating
in v1 mode, the code now calls cpuset1_generate_sched_domains().
Currently there is some code duplication, which will be largely eliminated
once v1-specific code is removed from v2 in the following patch.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 24 +++++
kernel/cgroup/cpuset-v1.c | 167 ++++++++++++++++++++++++++++++++
kernel/cgroup/cpuset.c | 31 +-----
3 files changed, 195 insertions(+), 27 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 677053ffb913..bd767f8cb0ed 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -9,6 +9,7 @@
#include <linux/cpuset.h>
#include <linux/spinlock.h>
#include <linux/union_find.h>
+#include <linux/sched/isolation.h>
/* See "Frequency meter" comments, below. */
@@ -185,6 +186,8 @@ struct cpuset {
#endif
};
+extern struct cpuset top_cpuset;
+
static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
{
return css ? container_of(css, struct cpuset, css) : NULL;
@@ -242,6 +245,22 @@ static inline int is_spread_slab(const struct cpuset *cs)
return test_bit(CS_SPREAD_SLAB, &cs->flags);
}
+/*
+ * Helper routine for generate_sched_domains().
+ * Do cpusets a, b have overlapping effective cpus_allowed masks?
+ */
+static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b)
+{
+ return cpumask_intersects(a->effective_cpus, b->effective_cpus);
+}
+
+static inline int nr_cpusets(void)
+{
+ assert_cpuset_lock_held();
+ /* jump label reference count + the top-level cpuset */
+ return static_key_count(&cpusets_enabled_key.key) + 1;
+}
+
/**
* cpuset_for_each_child - traverse online children of a cpuset
* @child_cs: loop cursor pointing to the current child
@@ -298,6 +317,9 @@ void cpuset1_init(struct cpuset *cs);
void cpuset1_online_css(struct cgroup_subsys_state *css);
void update_domain_attr_tree(struct sched_domain_attr *dattr,
struct cpuset *root_cs);
+int cpuset1_generate_sched_domains(cpumask_var_t **domains,
+ struct sched_domain_attr **attributes);
+
#else
static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
struct task_struct *tsk) {}
@@ -311,6 +333,8 @@ static inline void cpuset1_init(struct cpuset *cs) {}
static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
struct cpuset *root_cs) {}
+static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
+ struct sched_domain_attr **attributes) { return 0; };
#endif /* CONFIG_CPUSETS_V1 */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 95de6f2a4cc5..5c0bded46a7c 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -580,6 +580,173 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr,
rcu_read_unlock();
}
+/*
+ * cpuset1_generate_sched_domains()
+ *
+ * Finding the best partition (set of domains):
+ * The double nested loops below over i, j scan over the load
+ * balanced cpusets (using the array of cpuset pointers in csa[])
+ * looking for pairs of cpusets that have overlapping cpus_allowed
+ * and merging them using a union-find algorithm.
+ *
+ * The union of the cpus_allowed masks from the set of all cpusets
+ * having the same root then form the one element of the partition
+ * (one sched domain) to be passed to partition_sched_domains().
+ */
+int cpuset1_generate_sched_domains(cpumask_var_t **domains,
+ struct sched_domain_attr **attributes)
+{
+ struct cpuset *cp; /* top-down scan of cpusets */
+ struct cpuset **csa; /* array of all cpuset ptrs */
+ int csn; /* how many cpuset ptrs in csa so far */
+ int i, j; /* indices for partition finding loops */
+ cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
+ struct sched_domain_attr *dattr; /* attributes for custom domains */
+ int ndoms = 0; /* number of sched domains in result */
+ int nslot; /* next empty doms[] struct cpumask slot */
+ struct cgroup_subsys_state *pos_css;
+ bool root_load_balance = is_sched_load_balance(&top_cpuset);
+ int nslot_update;
+
+ assert_cpuset_lock_held();
+
+ doms = NULL;
+ dattr = NULL;
+ csa = NULL;
+
+ /* Special case for the 99% of systems with one, full, sched domain */
+ if (root_load_balance) {
+single_root_domain:
+ ndoms = 1;
+ doms = alloc_sched_domains(ndoms);
+ if (!doms)
+ goto done;
+
+ dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
+ if (dattr) {
+ *dattr = SD_ATTR_INIT;
+ update_domain_attr_tree(dattr, &top_cpuset);
+ }
+ cpumask_and(doms[0], top_cpuset.effective_cpus,
+ housekeeping_cpumask(HK_TYPE_DOMAIN));
+
+ goto done;
+ }
+
+ csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
+ if (!csa)
+ goto done;
+ csn = 0;
+
+ rcu_read_lock();
+ if (root_load_balance)
+ csa[csn++] = &top_cpuset;
+ cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
+ if (cp == &top_cpuset)
+ continue;
+
+ /*
+ * v1:
+ * Continue traversing beyond @cp iff @cp has some CPUs and
+ * isn't load balancing. The former is obvious. The
+ * latter: All child cpusets contain a subset of the
+ * parent's cpus, so just skip them, and then we call
+ * update_domain_attr_tree() to calc relax_domain_level of
+ * the corresponding sched domain.
+ */
+ if (!cpumask_empty(cp->cpus_allowed) &&
+ !(is_sched_load_balance(cp) &&
+ cpumask_intersects(cp->cpus_allowed,
+ housekeeping_cpumask(HK_TYPE_DOMAIN))))
+ continue;
+
+ if (is_sched_load_balance(cp) &&
+ !cpumask_empty(cp->effective_cpus))
+ csa[csn++] = cp;
+
+ /* skip @cp's subtree */
+ pos_css = css_rightmost_descendant(pos_css);
+ continue;
+ }
+ rcu_read_unlock();
+
+ /*
+ * If there are only isolated partitions underneath the cgroup root,
+ * we can optimize out unneeded sched domains scanning.
+ */
+ if (root_load_balance && (csn == 1))
+ goto single_root_domain;
+
+ for (i = 0; i < csn; i++)
+ uf_node_init(&csa[i]->node);
+
+ /* Merge overlapping cpusets */
+ for (i = 0; i < csn; i++) {
+ for (j = i + 1; j < csn; j++) {
+ if (cpusets_overlap(csa[i], csa[j]))
+ uf_union(&csa[i]->node, &csa[j]->node);
+ }
+ }
+
+ /* Count the total number of domains */
+ for (i = 0; i < csn; i++) {
+ if (uf_find(&csa[i]->node) == &csa[i]->node)
+ ndoms++;
+ }
+
+ /*
+ * Now we know how many domains to create.
+ * Convert <csn, csa> to <ndoms, doms> and populate cpu masks.
+ */
+ doms = alloc_sched_domains(ndoms);
+ if (!doms)
+ goto done;
+
+ /*
+ * The rest of the code, including the scheduler, can deal with
+ * dattr==NULL case. No need to abort if alloc fails.
+ */
+ dattr = kmalloc_array(ndoms, sizeof(struct sched_domain_attr),
+ GFP_KERNEL);
+
+ for (nslot = 0, i = 0; i < csn; i++) {
+ nslot_update = 0;
+ for (j = i; j < csn; j++) {
+ if (uf_find(&csa[j]->node) == &csa[i]->node) {
+ struct cpumask *dp = doms[nslot];
+
+ if (i == j) {
+ nslot_update = 1;
+ cpumask_clear(dp);
+ if (dattr)
+ *(dattr + nslot) = SD_ATTR_INIT;
+ }
+ cpumask_or(dp, dp, csa[j]->effective_cpus);
+ cpumask_and(dp, dp, housekeeping_cpumask(HK_TYPE_DOMAIN));
+ if (dattr)
+ update_domain_attr_tree(dattr + nslot, csa[j]);
+ }
+ }
+ if (nslot_update)
+ nslot++;
+ }
+ BUG_ON(nslot != ndoms);
+
+done:
+ kfree(csa);
+
+ /*
+ * Fallback to the default domain if kmalloc() failed.
+ * See comments in partition_sched_domains().
+ */
+ if (doms == NULL)
+ ndoms = 1;
+
+ *domains = doms;
+ *attributes = dattr;
+ return ndoms;
+}
+
/*
* for the common functions, 'private' gives the type of file
*/
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 88ca8b40e01a..6bb0b201c34b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -211,7 +211,7 @@ static inline void notify_partition_change(struct cpuset *cs, int old_prs)
* If cpu_online_mask is used while a hotunplug operation is happening in
* parallel, we may leave an offline CPU in cpu_allowed or some other masks.
*/
-static struct cpuset top_cpuset = {
+struct cpuset top_cpuset = {
.flags = BIT(CS_CPU_EXCLUSIVE) |
BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
.partition_root_state = PRS_ROOT,
@@ -744,21 +744,6 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
}
#ifdef CONFIG_SMP
-/*
- * Helper routine for generate_sched_domains().
- * Do cpusets a, b have overlapping effective cpus_allowed masks?
- */
-static int cpusets_overlap(struct cpuset *a, struct cpuset *b)
-{
- return cpumask_intersects(a->effective_cpus, b->effective_cpus);
-}
-
-/* Must be called with cpuset_mutex held. */
-static inline int nr_cpusets(void)
-{
- /* jump label reference count + the top-level cpuset */
- return static_key_count(&cpusets_enabled_key.key) + 1;
-}
/*
* generate_sched_domains()
@@ -798,17 +783,6 @@ static inline int nr_cpusets(void)
* convenient format, that can be easily compared to the prior
* value to determine what partition elements (sched domains)
* were changed (added or removed.)
- *
- * Finding the best partition (set of domains):
- * The double nested loops below over i, j scan over the load
- * balanced cpusets (using the array of cpuset pointers in csa[])
- * looking for pairs of cpusets that have overlapping cpus_allowed
- * and merging them using a union-find algorithm.
- *
- * The union of the cpus_allowed masks from the set of all cpusets
- * having the same root then form the one element of the partition
- * (one sched domain) to be passed to partition_sched_domains().
- *
*/
static int generate_sched_domains(cpumask_var_t **domains,
struct sched_domain_attr **attributes)
@@ -826,6 +800,9 @@ static int generate_sched_domains(cpumask_var_t **domains,
bool cgrpv2 = cpuset_v2();
int nslot_update;
+ if (!cgrpv2)
+ return cpuset1_generate_sched_domains(domains, attributes);
+
doms = NULL;
dattr = NULL;
csa = NULL;
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread* Re: [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2
2025-12-17 8:49 ` [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2 Chen Ridong
@ 2025-12-17 17:48 ` Waiman Long
2025-12-18 1:28 ` Chen Ridong
0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2025-12-17 17:48 UTC (permalink / raw)
To: Chen Ridong, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 12/17/25 3:49 AM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The generate_sched_domains() function currently handles both v1 and v2
> logic. However, the underlying mechanisms for building scheduler domains
> differ significantly between the two versions. For cpuset v2, scheduler
> domains are straightforwardly derived from valid partitions, whereas
> cpuset v1 employs a more complex union-find algorithm to merge overlapping
> cpusets. Co-locating these implementations complicates maintenance.
>
> This patch, along with subsequent ones, aims to separate the v1 and v2
> logic. For ease of review, this patch first copies the
> generate_sched_domains() function into cpuset-v1.c as
> cpuset1_generate_sched_domains() and removes v2-specific code. Common
> helpers and top_cpuset are declared in cpuset-internal.h. When operating
> in v1 mode, the code now calls cpuset1_generate_sched_domains().
>
> Currently there is some code duplication, which will be largely eliminated
> once v1-specific code is removed from v2 in the following patch.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset-internal.h | 24 +++++
> kernel/cgroup/cpuset-v1.c | 167 ++++++++++++++++++++++++++++++++
> kernel/cgroup/cpuset.c | 31 +-----
> 3 files changed, 195 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index 677053ffb913..bd767f8cb0ed 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -9,6 +9,7 @@
> #include <linux/cpuset.h>
> #include <linux/spinlock.h>
> #include <linux/union_find.h>
> +#include <linux/sched/isolation.h>
>
> /* See "Frequency meter" comments, below. */
>
> @@ -185,6 +186,8 @@ struct cpuset {
> #endif
> };
>
> +extern struct cpuset top_cpuset;
> +
> static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
> {
> return css ? container_of(css, struct cpuset, css) : NULL;
> @@ -242,6 +245,22 @@ static inline int is_spread_slab(const struct cpuset *cs)
> return test_bit(CS_SPREAD_SLAB, &cs->flags);
> }
>
> +/*
> + * Helper routine for generate_sched_domains().
> + * Do cpusets a, b have overlapping effective cpus_allowed masks?
> + */
> +static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b)
> +{
> + return cpumask_intersects(a->effective_cpus, b->effective_cpus);
> +}
> +
> +static inline int nr_cpusets(void)
> +{
> + assert_cpuset_lock_held();
For a simple helper like this one which only does an atomic_read(), I
don't think you need to assert that cpuset_mutex is held.
> + /* jump label reference count + the top-level cpuset */
> + return static_key_count(&cpusets_enabled_key.key) + 1;
> +}
> +
> /**
> * cpuset_for_each_child - traverse online children of a cpuset
> * @child_cs: loop cursor pointing to the current child
> @@ -298,6 +317,9 @@ void cpuset1_init(struct cpuset *cs);
> void cpuset1_online_css(struct cgroup_subsys_state *css);
> void update_domain_attr_tree(struct sched_domain_attr *dattr,
> struct cpuset *root_cs);
> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> + struct sched_domain_attr **attributes);
> +
> #else
> static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
> struct task_struct *tsk) {}
> @@ -311,6 +333,8 @@ static inline void cpuset1_init(struct cpuset *cs) {}
> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
> static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
> struct cpuset *root_cs) {}
> +static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> + struct sched_domain_attr **attributes) { return 0; };
>
> #endif /* CONFIG_CPUSETS_V1 */
>
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index 95de6f2a4cc5..5c0bded46a7c 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -580,6 +580,173 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr,
> rcu_read_unlock();
> }
>
> +/*
> + * cpuset1_generate_sched_domains()
> + *
> + * Finding the best partition (set of domains):
> + * The double nested loops below over i, j scan over the load
> + * balanced cpusets (using the array of cpuset pointers in csa[])
> + * looking for pairs of cpusets that have overlapping cpus_allowed
> + * and merging them using a union-find algorithm.
> + *
> + * The union of the cpus_allowed masks from the set of all cpusets
> + * having the same root then form the one element of the partition
> + * (one sched domain) to be passed to partition_sched_domains().
> + */
> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> + struct sched_domain_attr **attributes)
> +{
> + struct cpuset *cp; /* top-down scan of cpusets */
> + struct cpuset **csa; /* array of all cpuset ptrs */
> + int csn; /* how many cpuset ptrs in csa so far */
> + int i, j; /* indices for partition finding loops */
> + cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
> + struct sched_domain_attr *dattr; /* attributes for custom domains */
> + int ndoms = 0; /* number of sched domains in result */
> + int nslot; /* next empty doms[] struct cpumask slot */
> + struct cgroup_subsys_state *pos_css;
> + bool root_load_balance = is_sched_load_balance(&top_cpuset);
> + int nslot_update;
> +
> + assert_cpuset_lock_held();
> +
> + doms = NULL;
> + dattr = NULL;
> + csa = NULL;
> +
> + /* Special case for the 99% of systems with one, full, sched domain */
> + if (root_load_balance) {
> +single_root_domain:
> + ndoms = 1;
> + doms = alloc_sched_domains(ndoms);
> + if (!doms)
> + goto done;
> +
> + dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
> + if (dattr) {
> + *dattr = SD_ATTR_INIT;
> + update_domain_attr_tree(dattr, &top_cpuset);
> + }
> + cpumask_and(doms[0], top_cpuset.effective_cpus,
> + housekeeping_cpumask(HK_TYPE_DOMAIN));
> +
> + goto done;
> + }
> +
> + csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
> + if (!csa)
> + goto done;
> + csn = 0;
> +
> + rcu_read_lock();
> + if (root_load_balance)
> + csa[csn++] = &top_cpuset;
> + cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
> + if (cp == &top_cpuset)
> + continue;
> +
> + /*
> + * v1:
Remove this v1 line.
> + * Continue traversing beyond @cp iff @cp has some CPUs and
> + * isn't load balancing. The former is obvious. The
> + * latter: All child cpusets contain a subset of the
> + * parent's cpus, so just skip them, and then we call
> + * update_domain_attr_tree() to calc relax_domain_level of
> + * the corresponding sched domain.
> + */
> + if (!cpumask_empty(cp->cpus_allowed) &&
> + !(is_sched_load_balance(cp) &&
> + cpumask_intersects(cp->cpus_allowed,
> + housekeeping_cpumask(HK_TYPE_DOMAIN))))
> + continue;
> +
> + if (is_sched_load_balance(cp) &&
> + !cpumask_empty(cp->effective_cpus))
> + csa[csn++] = cp;
> +
> + /* skip @cp's subtree */
> + pos_css = css_rightmost_descendant(pos_css);
> + continue;
> + }
> + rcu_read_unlock();
> +
> + /*
> + * If there are only isolated partitions underneath the cgroup root,
> + * we can optimize out unneeded sched domains scanning.
> + */
> + if (root_load_balance && (csn == 1))
> + goto single_root_domain;
This check is v2 specific and you can remove it as well as the
"single_root_domain" label.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2
2025-12-17 17:48 ` Waiman Long
@ 2025-12-18 1:28 ` Chen Ridong
2025-12-18 3:09 ` Waiman Long
0 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2025-12-18 1:28 UTC (permalink / raw)
To: Waiman Long, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 2025/12/18 1:48, Waiman Long wrote:
Thank you, Longman:
> On 12/17/25 3:49 AM, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> The generate_sched_domains() function currently handles both v1 and v2
>> logic. However, the underlying mechanisms for building scheduler domains
>> differ significantly between the two versions. For cpuset v2, scheduler
>> domains are straightforwardly derived from valid partitions, whereas
>> cpuset v1 employs a more complex union-find algorithm to merge overlapping
>> cpusets. Co-locating these implementations complicates maintenance.
>>
>> This patch, along with subsequent ones, aims to separate the v1 and v2
>> logic. For ease of review, this patch first copies the
>> generate_sched_domains() function into cpuset-v1.c as
>> cpuset1_generate_sched_domains() and removes v2-specific code. Common
>> helpers and top_cpuset are declared in cpuset-internal.h. When operating
>> in v1 mode, the code now calls cpuset1_generate_sched_domains().
>>
>> Currently there is some code duplication, which will be largely eliminated
>> once v1-specific code is removed from v2 in the following patch.
>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> kernel/cgroup/cpuset-internal.h | 24 +++++
>> kernel/cgroup/cpuset-v1.c | 167 ++++++++++++++++++++++++++++++++
>> kernel/cgroup/cpuset.c | 31 +-----
>> 3 files changed, 195 insertions(+), 27 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>> index 677053ffb913..bd767f8cb0ed 100644
>> --- a/kernel/cgroup/cpuset-internal.h
>> +++ b/kernel/cgroup/cpuset-internal.h
>> @@ -9,6 +9,7 @@
>> #include <linux/cpuset.h>
>> #include <linux/spinlock.h>
>> #include <linux/union_find.h>
>> +#include <linux/sched/isolation.h>
>> /* See "Frequency meter" comments, below. */
>> @@ -185,6 +186,8 @@ struct cpuset {
>> #endif
>> };
>> +extern struct cpuset top_cpuset;
>> +
>> static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
>> {
>> return css ? container_of(css, struct cpuset, css) : NULL;
>> @@ -242,6 +245,22 @@ static inline int is_spread_slab(const struct cpuset *cs)
>> return test_bit(CS_SPREAD_SLAB, &cs->flags);
>> }
>> +/*
>> + * Helper routine for generate_sched_domains().
>> + * Do cpusets a, b have overlapping effective cpus_allowed masks?
>> + */
>> +static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b)
>> +{
>> + return cpumask_intersects(a->effective_cpus, b->effective_cpus);
>> +}
>> +
>> +static inline int nr_cpusets(void)
>> +{
>> + assert_cpuset_lock_held();
>
> For a simple helper like this one which only does an atomic_read(), I don't think you need to assert
> that cpuset_mutex is held.
>
Will remove it.
I added the lock assertion because the location it was removed from already includes the comment:
/* Must be called with cpuset_mutex held. */
>> + /* jump label reference count + the top-level cpuset */
>> + return static_key_count(&cpusets_enabled_key.key) + 1;
>> +}
>> +
>> /**
>> * cpuset_for_each_child - traverse online children of a cpuset
>> * @child_cs: loop cursor pointing to the current child
>> @@ -298,6 +317,9 @@ void cpuset1_init(struct cpuset *cs);
>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>> void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> struct cpuset *root_cs);
>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>> + struct sched_domain_attr **attributes);
>> +
>> #else
>> static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
>> struct task_struct *tsk) {}
>> @@ -311,6 +333,8 @@ static inline void cpuset1_init(struct cpuset *cs) {}
>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>> static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> struct cpuset *root_cs) {}
>> +static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>> + struct sched_domain_attr **attributes) { return 0; };
>> #endif /* CONFIG_CPUSETS_V1 */
>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>> index 95de6f2a4cc5..5c0bded46a7c 100644
>> --- a/kernel/cgroup/cpuset-v1.c
>> +++ b/kernel/cgroup/cpuset-v1.c
>> @@ -580,6 +580,173 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> rcu_read_unlock();
>> }
>> +/*
>> + * cpuset1_generate_sched_domains()
>> + *
>> + * Finding the best partition (set of domains):
>> + * The double nested loops below over i, j scan over the load
>> + * balanced cpusets (using the array of cpuset pointers in csa[])
>> + * looking for pairs of cpusets that have overlapping cpus_allowed
>> + * and merging them using a union-find algorithm.
>> + *
>> + * The union of the cpus_allowed masks from the set of all cpusets
>> + * having the same root then form the one element of the partition
>> + * (one sched domain) to be passed to partition_sched_domains().
>> + */
>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>> + struct sched_domain_attr **attributes)
>> +{
>> + struct cpuset *cp; /* top-down scan of cpusets */
>> + struct cpuset **csa; /* array of all cpuset ptrs */
>> + int csn; /* how many cpuset ptrs in csa so far */
>> + int i, j; /* indices for partition finding loops */
>> + cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
>> + struct sched_domain_attr *dattr; /* attributes for custom domains */
>> + int ndoms = 0; /* number of sched domains in result */
>> + int nslot; /* next empty doms[] struct cpumask slot */
>> + struct cgroup_subsys_state *pos_css;
>> + bool root_load_balance = is_sched_load_balance(&top_cpuset);
>> + int nslot_update;
>> +
>> + assert_cpuset_lock_held();
>> +
>> + doms = NULL;
>> + dattr = NULL;
>> + csa = NULL;
>> +
>> + /* Special case for the 99% of systems with one, full, sched domain */
>> + if (root_load_balance) {
>> +single_root_domain:
>> + ndoms = 1;
>> + doms = alloc_sched_domains(ndoms);
>> + if (!doms)
>> + goto done;
>> +
>> + dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>> + if (dattr) {
>> + *dattr = SD_ATTR_INIT;
>> + update_domain_attr_tree(dattr, &top_cpuset);
>> + }
>> + cpumask_and(doms[0], top_cpuset.effective_cpus,
>> + housekeeping_cpumask(HK_TYPE_DOMAIN));
>> +
>> + goto done;
>> + }
>> +
>> + csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
>> + if (!csa)
>> + goto done;
>> + csn = 0;
>> +
>> + rcu_read_lock();
>> + if (root_load_balance)
>> + csa[csn++] = &top_cpuset;
>> + cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
>> + if (cp == &top_cpuset)
>> + continue;
>> +
>> + /*
>> + * v1:
> Remove this v1 line.
Will do.
>> + * Continue traversing beyond @cp iff @cp has some CPUs and
>> + * isn't load balancing. The former is obvious. The
>> + * latter: All child cpusets contain a subset of the
>> + * parent's cpus, so just skip them, and then we call
>> + * update_domain_attr_tree() to calc relax_domain_level of
>> + * the corresponding sched domain.
>> + */
>> + if (!cpumask_empty(cp->cpus_allowed) &&
>> + !(is_sched_load_balance(cp) &&
>> + cpumask_intersects(cp->cpus_allowed,
>> + housekeeping_cpumask(HK_TYPE_DOMAIN))))
>> + continue;
>> +
>> + if (is_sched_load_balance(cp) &&
>> + !cpumask_empty(cp->effective_cpus))
>> + csa[csn++] = cp;
>> +
>> + /* skip @cp's subtree */
>> + pos_css = css_rightmost_descendant(pos_css);
>> + continue;
>> + }
>> + rcu_read_unlock();
>> +
>> + /*
>> + * If there are only isolated partitions underneath the cgroup root,
>> + * we can optimize out unneeded sched domains scanning.
>> + */
>> + if (root_load_balance && (csn == 1))
>> + goto single_root_domain;
>
> This check is v2 specific and you can remove it as well as the "single_root_domain" label.
>
Thank you.
Will remove.
Just a note — I removed this code for cpuset v2. Please confirm if that's acceptable. If we drop the
v1-specific logic, handling this case wouldn’t take much extra work.
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2
2025-12-18 1:28 ` Chen Ridong
@ 2025-12-18 3:09 ` Waiman Long
2025-12-18 3:31 ` Chen Ridong
0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2025-12-18 3:09 UTC (permalink / raw)
To: Chen Ridong, Waiman Long, tj, hannes, mkoutny
Cc: cgroups, linux-kernel, lujialin4
On 12/17/25 8:28 PM, Chen Ridong wrote:
>
> On 2025/12/18 1:48, Waiman Long wrote:
> Thank you Longman:
>> On 12/17/25 3:49 AM, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> The generate_sched_domains() function currently handles both v1 and v2
>>> logic. However, the underlying mechanisms for building scheduler domains
>>> differ significantly between the two versions. For cpuset v2, scheduler
>>> domains are straightforwardly derived from valid partitions, whereas
>>> cpuset v1 employs a more complex union-find algorithm to merge overlapping
>>> cpusets. Co-locating these implementations complicates maintenance.
>>>
>>> This patch, along with subsequent ones, aims to separate the v1 and v2
>>> logic. For ease of review, this patch first copies the
>>> generate_sched_domains() function into cpuset-v1.c as
>>> cpuset1_generate_sched_domains() and removes v2-specific code. Common
>>> helpers and top_cpuset are declared in cpuset-internal.h. When operating
>>> in v1 mode, the code now calls cpuset1_generate_sched_domains().
>>>
>>> Currently there is some code duplication, which will be largely eliminated
>>> once v1-specific code is removed from v2 in the following patch.
>>>
>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>> ---
>>> kernel/cgroup/cpuset-internal.h | 24 +++++
>>> kernel/cgroup/cpuset-v1.c | 167 ++++++++++++++++++++++++++++++++
>>> kernel/cgroup/cpuset.c | 31 +-----
>>> 3 files changed, 195 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>>> index 677053ffb913..bd767f8cb0ed 100644
>>> --- a/kernel/cgroup/cpuset-internal.h
>>> +++ b/kernel/cgroup/cpuset-internal.h
>>> @@ -9,6 +9,7 @@
>>> #include <linux/cpuset.h>
>>> #include <linux/spinlock.h>
>>> #include <linux/union_find.h>
>>> +#include <linux/sched/isolation.h>
>>> /* See "Frequency meter" comments, below. */
>>> @@ -185,6 +186,8 @@ struct cpuset {
>>> #endif
>>> };
>>> +extern struct cpuset top_cpuset;
>>> +
>>> static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
>>> {
>>> return css ? container_of(css, struct cpuset, css) : NULL;
>>> @@ -242,6 +245,22 @@ static inline int is_spread_slab(const struct cpuset *cs)
>>> return test_bit(CS_SPREAD_SLAB, &cs->flags);
>>> }
>>> +/*
>>> + * Helper routine for generate_sched_domains().
>>> + * Do cpusets a, b have overlapping effective cpus_allowed masks?
>>> + */
>>> +static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b)
>>> +{
>>> + return cpumask_intersects(a->effective_cpus, b->effective_cpus);
>>> +}
>>> +
>>> +static inline int nr_cpusets(void)
>>> +{
>>> + assert_cpuset_lock_held();
>> For a simple helper like this one which only does an atomic_read(), I don't think you need to assert
>> that cpuset_mutex is held.
>>
> Will remove it.
>
> I added the lock because the location where it’s removed already includes the comment:
> /* Must be called with cpuset_mutex held. */
>
>>> + /* jump label reference count + the top-level cpuset */
>>> + return static_key_count(&cpusets_enabled_key.key) + 1;
>>> +}
>>> +
>>> /**
>>> * cpuset_for_each_child - traverse online children of a cpuset
>>> * @child_cs: loop cursor pointing to the current child
>>> @@ -298,6 +317,9 @@ void cpuset1_init(struct cpuset *cs);
>>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>>> void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> struct cpuset *root_cs);
>>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>> + struct sched_domain_attr **attributes);
>>> +
>>> #else
>>> static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
>>> struct task_struct *tsk) {}
>>> @@ -311,6 +333,8 @@ static inline void cpuset1_init(struct cpuset *cs) {}
>>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>>> static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> struct cpuset *root_cs) {}
>>> +static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>> + struct sched_domain_attr **attributes) { return 0; };
>>> #endif /* CONFIG_CPUSETS_V1 */
>>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>>> index 95de6f2a4cc5..5c0bded46a7c 100644
>>> --- a/kernel/cgroup/cpuset-v1.c
>>> +++ b/kernel/cgroup/cpuset-v1.c
>>> @@ -580,6 +580,173 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> rcu_read_unlock();
>>> }
>>> +/*
>>> + * cpuset1_generate_sched_domains()
>>> + *
>>> + * Finding the best partition (set of domains):
>>> + * The double nested loops below over i, j scan over the load
>>> + * balanced cpusets (using the array of cpuset pointers in csa[])
>>> + * looking for pairs of cpusets that have overlapping cpus_allowed
>>> + * and merging them using a union-find algorithm.
>>> + *
>>> + * The union of the cpus_allowed masks from the set of all cpusets
>>> + * having the same root then form the one element of the partition
>>> + * (one sched domain) to be passed to partition_sched_domains().
>>> + */
>>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>> + struct sched_domain_attr **attributes)
>>> +{
>>> + struct cpuset *cp; /* top-down scan of cpusets */
>>> + struct cpuset **csa; /* array of all cpuset ptrs */
>>> + int csn; /* how many cpuset ptrs in csa so far */
>>> + int i, j; /* indices for partition finding loops */
>>> + cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
>>> + struct sched_domain_attr *dattr; /* attributes for custom domains */
>>> + int ndoms = 0; /* number of sched domains in result */
>>> + int nslot; /* next empty doms[] struct cpumask slot */
>>> + struct cgroup_subsys_state *pos_css;
>>> + bool root_load_balance = is_sched_load_balance(&top_cpuset);
>>> + int nslot_update;
>>> +
>>> + assert_cpuset_lock_held();
>>> +
>>> + doms = NULL;
>>> + dattr = NULL;
>>> + csa = NULL;
>>> +
>>> + /* Special case for the 99% of systems with one, full, sched domain */
>>> + if (root_load_balance) {
>>> +single_root_domain:
>>> + ndoms = 1;
>>> + doms = alloc_sched_domains(ndoms);
>>> + if (!doms)
>>> + goto done;
>>> +
>>> + dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>>> + if (dattr) {
>>> + *dattr = SD_ATTR_INIT;
>>> + update_domain_attr_tree(dattr, &top_cpuset);
>>> + }
>>> + cpumask_and(doms[0], top_cpuset.effective_cpus,
>>> + housekeeping_cpumask(HK_TYPE_DOMAIN));
>>> +
>>> + goto done;
>>> + }
>>> +
>>> + csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
>>> + if (!csa)
>>> + goto done;
>>> + csn = 0;
>>> +
>>> + rcu_read_lock();
>>> + if (root_load_balance)
>>> + csa[csn++] = &top_cpuset;
>>> + cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
>>> + if (cp == &top_cpuset)
>>> + continue;
>>> +
>>> + /*
>>> + * v1:
>> Remove this v1 line.
> Will do.
>
>>> + * Continue traversing beyond @cp iff @cp has some CPUs and
>>> + * isn't load balancing. The former is obvious. The
>>> + * latter: All child cpusets contain a subset of the
>>> + * parent's cpus, so just skip them, and then we call
>>> + * update_domain_attr_tree() to calc relax_domain_level of
>>> + * the corresponding sched domain.
>>> + */
>>> + if (!cpumask_empty(cp->cpus_allowed) &&
>>> + !(is_sched_load_balance(cp) &&
>>> + cpumask_intersects(cp->cpus_allowed,
>>> + housekeeping_cpumask(HK_TYPE_DOMAIN))))
>>> + continue;
>>> +
>>> + if (is_sched_load_balance(cp) &&
>>> + !cpumask_empty(cp->effective_cpus))
>>> + csa[csn++] = cp;
>>> +
>>> + /* skip @cp's subtree */
>>> + pos_css = css_rightmost_descendant(pos_css);
>>> + continue;
>>> + }
>>> + rcu_read_unlock();
>>> +
>>> + /*
>>> + * If there are only isolated partitions underneath the cgroup root,
>>> + * we can optimize out unneeded sched domains scanning.
>>> + */
>>> + if (root_load_balance && (csn == 1))
>>> + goto single_root_domain;
>> This check is v2 specific and you can remove it as well as the "single_root_domain" label.
>>
> Thank you.
>
> Will remove.
>
> Just a note — I removed this code for cpuset v2. Please confirm if that's acceptable. If we drop the
> v1-specific logic, handling this case wouldn’t take much extra work.
This code is there because of the single dom check above that handles
both v1 and v2. With just one version to support, this extra code isn't
necessary.
Cheers,
Longman
>
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2
2025-12-18 3:09 ` Waiman Long
@ 2025-12-18 3:31 ` Chen Ridong
0 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2025-12-18 3:31 UTC (permalink / raw)
To: Waiman Long, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 2025/12/18 11:09, Waiman Long wrote:
> On 12/17/25 8:28 PM, Chen Ridong wrote:
>>
>> On 2025/12/18 1:48, Waiman Long wrote:
>> Thank you Longman:
>>> On 12/17/25 3:49 AM, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> The generate_sched_domains() function currently handles both v1 and v2
>>>> logic. However, the underlying mechanisms for building scheduler domains
>>>> differ significantly between the two versions. For cpuset v2, scheduler
>>>> domains are straightforwardly derived from valid partitions, whereas
>>>> cpuset v1 employs a more complex union-find algorithm to merge overlapping
>>>> cpusets. Co-locating these implementations complicates maintenance.
>>>>
>>>> This patch, along with subsequent ones, aims to separate the v1 and v2
>>>> logic. For ease of review, this patch first copies the
>>>> generate_sched_domains() function into cpuset-v1.c as
>>>> cpuset1_generate_sched_domains() and removes v2-specific code. Common
>>>> helpers and top_cpuset are declared in cpuset-internal.h. When operating
>>>> in v1 mode, the code now calls cpuset1_generate_sched_domains().
>>>>
>>>> Currently there is some code duplication, which will be largely eliminated
>>>> once v1-specific code is removed from v2 in the following patch.
>>>>
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>> ---
>>>> kernel/cgroup/cpuset-internal.h | 24 +++++
>>>> kernel/cgroup/cpuset-v1.c | 167 ++++++++++++++++++++++++++++++++
>>>> kernel/cgroup/cpuset.c | 31 +-----
>>>> 3 files changed, 195 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>>>> index 677053ffb913..bd767f8cb0ed 100644
>>>> --- a/kernel/cgroup/cpuset-internal.h
>>>> +++ b/kernel/cgroup/cpuset-internal.h
>>>> @@ -9,6 +9,7 @@
>>>> #include <linux/cpuset.h>
>>>> #include <linux/spinlock.h>
>>>> #include <linux/union_find.h>
>>>> +#include <linux/sched/isolation.h>
>>>> /* See "Frequency meter" comments, below. */
>>>> @@ -185,6 +186,8 @@ struct cpuset {
>>>> #endif
>>>> };
>>>> +extern struct cpuset top_cpuset;
>>>> +
>>>> static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
>>>> {
>>>> return css ? container_of(css, struct cpuset, css) : NULL;
>>>> @@ -242,6 +245,22 @@ static inline int is_spread_slab(const struct cpuset *cs)
>>>> return test_bit(CS_SPREAD_SLAB, &cs->flags);
>>>> }
>>>> +/*
>>>> + * Helper routine for generate_sched_domains().
>>>> + * Do cpusets a, b have overlapping effective cpus_allowed masks?
>>>> + */
>>>> +static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b)
>>>> +{
>>>> + return cpumask_intersects(a->effective_cpus, b->effective_cpus);
>>>> +}
>>>> +
>>>> +static inline int nr_cpusets(void)
>>>> +{
>>>> + assert_cpuset_lock_held();
>>> For a simple helper like this one which only does an atomic_read(), I don't think you need to assert
>>> that cpuset_mutex is held.
>>>
>> Will remove it.
>>
>> I added the lock because the location where it’s removed already includes the comment:
>> /* Must be called with cpuset_mutex held. */
>>
>>>> + /* jump label reference count + the top-level cpuset */
>>>> + return static_key_count(&cpusets_enabled_key.key) + 1;
>>>> +}
>>>> +
>>>> /**
>>>> * cpuset_for_each_child - traverse online children of a cpuset
>>>> * @child_cs: loop cursor pointing to the current child
>>>> @@ -298,6 +317,9 @@ void cpuset1_init(struct cpuset *cs);
>>>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>>>> void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>> struct cpuset *root_cs);
>>>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>>> + struct sched_domain_attr **attributes);
>>>> +
>>>> #else
>>>> static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
>>>> struct task_struct *tsk) {}
>>>> @@ -311,6 +333,8 @@ static inline void cpuset1_init(struct cpuset *cs) {}
>>>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>>>> static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>> struct cpuset *root_cs) {}
>>>> +static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>>> + struct sched_domain_attr **attributes) { return 0; };
>>>> #endif /* CONFIG_CPUSETS_V1 */
>>>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>>>> index 95de6f2a4cc5..5c0bded46a7c 100644
>>>> --- a/kernel/cgroup/cpuset-v1.c
>>>> +++ b/kernel/cgroup/cpuset-v1.c
>>>> @@ -580,6 +580,173 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>> rcu_read_unlock();
>>>> }
>>>> +/*
>>>> + * cpuset1_generate_sched_domains()
>>>> + *
>>>> + * Finding the best partition (set of domains):
>>>> + * The double nested loops below over i, j scan over the load
>>>> + * balanced cpusets (using the array of cpuset pointers in csa[])
>>>> + * looking for pairs of cpusets that have overlapping cpus_allowed
>>>> + * and merging them using a union-find algorithm.
>>>> + *
>>>> + * The union of the cpus_allowed masks from the set of all cpusets
>>>> + * having the same root then form the one element of the partition
>>>> + * (one sched domain) to be passed to partition_sched_domains().
>>>> + */
>>>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>>> + struct sched_domain_attr **attributes)
>>>> +{
>>>> + struct cpuset *cp; /* top-down scan of cpusets */
>>>> + struct cpuset **csa; /* array of all cpuset ptrs */
>>>> + int csn; /* how many cpuset ptrs in csa so far */
>>>> + int i, j; /* indices for partition finding loops */
>>>> + cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
>>>> + struct sched_domain_attr *dattr; /* attributes for custom domains */
>>>> + int ndoms = 0; /* number of sched domains in result */
>>>> + int nslot; /* next empty doms[] struct cpumask slot */
>>>> + struct cgroup_subsys_state *pos_css;
>>>> + bool root_load_balance = is_sched_load_balance(&top_cpuset);
>>>> + int nslot_update;
>>>> +
>>>> + assert_cpuset_lock_held();
>>>> +
>>>> + doms = NULL;
>>>> + dattr = NULL;
>>>> + csa = NULL;
>>>> +
>>>> + /* Special case for the 99% of systems with one, full, sched domain */
>>>> + if (root_load_balance) {
>>>> +single_root_domain:
>>>> + ndoms = 1;
>>>> + doms = alloc_sched_domains(ndoms);
>>>> + if (!doms)
>>>> + goto done;
>>>> +
>>>> + dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>>>> + if (dattr) {
>>>> + *dattr = SD_ATTR_INIT;
>>>> + update_domain_attr_tree(dattr, &top_cpuset);
>>>> + }
>>>> + cpumask_and(doms[0], top_cpuset.effective_cpus,
>>>> + housekeeping_cpumask(HK_TYPE_DOMAIN));
>>>> +
>>>> + goto done;
>>>> + }
>>>> +
>>>> + csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
>>>> + if (!csa)
>>>> + goto done;
>>>> + csn = 0;
>>>> +
>>>> + rcu_read_lock();
>>>> + if (root_load_balance)
>>>> + csa[csn++] = &top_cpuset;
>>>> + cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
>>>> + if (cp == &top_cpuset)
>>>> + continue;
>>>> +
>>>> + /*
>>>> + * v1:
>>> Remove this v1 line.
>> Will do.
>>
>>>> + * Continue traversing beyond @cp iff @cp has some CPUs and
>>>> + * isn't load balancing. The former is obvious. The
>>>> + * latter: All child cpusets contain a subset of the
>>>> + * parent's cpus, so just skip them, and then we call
>>>> + * update_domain_attr_tree() to calc relax_domain_level of
>>>> + * the corresponding sched domain.
>>>> + */
>>>> + if (!cpumask_empty(cp->cpus_allowed) &&
>>>> + !(is_sched_load_balance(cp) &&
>>>> + cpumask_intersects(cp->cpus_allowed,
>>>> + housekeeping_cpumask(HK_TYPE_DOMAIN))))
>>>> + continue;
>>>> +
>>>> + if (is_sched_load_balance(cp) &&
>>>> + !cpumask_empty(cp->effective_cpus))
>>>> + csa[csn++] = cp;
>>>> +
>>>> + /* skip @cp's subtree */
>>>> + pos_css = css_rightmost_descendant(pos_css);
>>>> + continue;
>>>> + }
>>>> + rcu_read_unlock();
>>>> +
>>>> + /*
>>>> + * If there are only isolated partitions underneath the cgroup root,
>>>> + * we can optimize out unneeded sched domains scanning.
>>>> + */
>>>> + if (root_load_balance && (csn == 1))
>>>> + goto single_root_domain;
>>> This check is v2 specific and you can remove it as well as the "single_root_domain" label.
>>>
>> Thank you.
>>
>> Will remove.
>>
>> Just a note — I removed this code for cpuset v2. Please confirm if that's acceptable. If we drop the
>> v1-specific logic, handling this case wouldn’t take much extra work.
>
> This code is there because of the single dom check above that handles both v1 and v2. With just one
> version to support, this extra code isn't necessary.
>
Thank you, got it.
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread
* [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains
2025-12-17 8:49 [PATCH -next 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
` (4 preceding siblings ...)
2025-12-17 8:49 ` [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2 Chen Ridong
@ 2025-12-17 8:49 ` Chen Ridong
2025-12-17 19:05 ` Waiman Long
5 siblings, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2025-12-17 8:49 UTC (permalink / raw)
To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong
From: Chen Ridong <chenridong@huawei.com>
Following the introduction of cpuset1_generate_sched_domains() for v1
in the previous patch, v1-specific logic can now be removed from the
generic generate_sched_domains(). This patch cleans up the v1-only
code and ensures uf_node is only visible when CONFIG_CPUSETS_V1=y.
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
kernel/cgroup/cpuset-internal.h | 10 +--
kernel/cgroup/cpuset-v1.c | 2 +-
kernel/cgroup/cpuset.c | 144 +++++---------------------------
3 files changed, 27 insertions(+), 129 deletions(-)
diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index bd767f8cb0ed..ef7b7c5afd4c 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -175,14 +175,14 @@ struct cpuset {
/* Handle for cpuset.cpus.partition */
struct cgroup_file partition_file;
- /* Used to merge intersecting subsets for generate_sched_domains */
- struct uf_node node;
-
#ifdef CONFIG_CPUSETS_V1
struct fmeter fmeter; /* memory_pressure filter */
/* for custom sched domain */
int relax_domain_level;
+
+ /* Used to merge intersecting subsets for generate_sched_domains */
+ struct uf_node node;
#endif
};
@@ -315,8 +315,6 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
void cpuset1_init(struct cpuset *cs);
void cpuset1_online_css(struct cgroup_subsys_state *css);
-void update_domain_attr_tree(struct sched_domain_attr *dattr,
- struct cpuset *root_cs);
int cpuset1_generate_sched_domains(cpumask_var_t **domains,
struct sched_domain_attr **attributes);
@@ -331,8 +329,6 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
struct cpuset *trial) { return 0; }
static inline void cpuset1_init(struct cpuset *cs) {}
static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
-static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
- struct cpuset *root_cs) {}
static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
struct sched_domain_attr **attributes) { return 0; };
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 5c0bded46a7c..0226350e704f 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -560,7 +560,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
dattr->relax_domain_level = c->relax_domain_level;
}
-void update_domain_attr_tree(struct sched_domain_attr *dattr,
+static void update_domain_attr_tree(struct sched_domain_attr *dattr,
struct cpuset *root_cs)
{
struct cpuset *cp;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 6bb0b201c34b..3e3468d928f3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -789,18 +789,13 @@ static int generate_sched_domains(cpumask_var_t **domains,
{
struct cpuset *cp; /* top-down scan of cpusets */
struct cpuset **csa; /* array of all cpuset ptrs */
- int csn; /* how many cpuset ptrs in csa so far */
int i, j; /* indices for partition finding loops */
cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
struct sched_domain_attr *dattr; /* attributes for custom domains */
int ndoms = 0; /* number of sched domains in result */
- int nslot; /* next empty doms[] struct cpumask slot */
struct cgroup_subsys_state *pos_css;
- bool root_load_balance = is_sched_load_balance(&top_cpuset);
- bool cgrpv2 = cpuset_v2();
- int nslot_update;
- if (!cgrpv2)
+ if (!cpuset_v2())
return cpuset1_generate_sched_domains(domains, attributes);
doms = NULL;
@@ -808,70 +803,25 @@ static int generate_sched_domains(cpumask_var_t **domains,
csa = NULL;
/* Special case for the 99% of systems with one, full, sched domain */
- if (root_load_balance && cpumask_empty(subpartitions_cpus)) {
-single_root_domain:
+ if (cpumask_empty(subpartitions_cpus)) {
ndoms = 1;
- doms = alloc_sched_domains(ndoms);
- if (!doms)
- goto done;
-
- dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
- if (dattr) {
- *dattr = SD_ATTR_INIT;
- update_domain_attr_tree(dattr, &top_cpuset);
- }
- cpumask_and(doms[0], top_cpuset.effective_cpus,
- housekeeping_cpumask(HK_TYPE_DOMAIN));
-
- goto done;
+ goto generate_doms;
}
csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
if (!csa)
goto done;
- csn = 0;
+ /* Find how many partitions and cache them to csa[] */
rcu_read_lock();
- if (root_load_balance)
- csa[csn++] = &top_cpuset;
cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
- if (cp == &top_cpuset)
- continue;
-
- if (cgrpv2)
- goto v2;
-
- /*
- * v1:
- * Continue traversing beyond @cp iff @cp has some CPUs and
- * isn't load balancing. The former is obvious. The
- * latter: All child cpusets contain a subset of the
- * parent's cpus, so just skip them, and then we call
- * update_domain_attr_tree() to calc relax_domain_level of
- * the corresponding sched domain.
- */
- if (!cpumask_empty(cp->cpus_allowed) &&
- !(is_sched_load_balance(cp) &&
- cpumask_intersects(cp->cpus_allowed,
- housekeeping_cpumask(HK_TYPE_DOMAIN))))
- continue;
-
- if (is_sched_load_balance(cp) &&
- !cpumask_empty(cp->effective_cpus))
- csa[csn++] = cp;
-
- /* skip @cp's subtree */
- pos_css = css_rightmost_descendant(pos_css);
- continue;
-
-v2:
/*
* Only valid partition roots that are not isolated and with
- * non-empty effective_cpus will be saved into csn[].
+ * non-empty effective_cpus will be saved into csa[].
*/
if ((cp->partition_root_state == PRS_ROOT) &&
!cpumask_empty(cp->effective_cpus))
- csa[csn++] = cp;
+ csa[ndoms++] = cp;
/*
* Skip @cp's subtree if not a partition root and has no
@@ -882,40 +832,18 @@ static int generate_sched_domains(cpumask_var_t **domains,
}
rcu_read_unlock();
- /*
- * If there are only isolated partitions underneath the cgroup root,
- * we can optimize out unneeded sched domains scanning.
- */
- if (root_load_balance && (csn == 1))
- goto single_root_domain;
-
- for (i = 0; i < csn; i++)
- uf_node_init(&csa[i]->node);
-
- /* Merge overlapping cpusets */
- for (i = 0; i < csn; i++) {
- for (j = i + 1; j < csn; j++) {
- if (cpusets_overlap(csa[i], csa[j])) {
+ for (i = 0; i < ndoms; i++) {
+ for (j = i + 1; j < ndoms; j++) {
+ if (cpusets_overlap(csa[i], csa[j]))
/*
* Cgroup v2 shouldn't pass down overlapping
* partition root cpusets.
*/
- WARN_ON_ONCE(cgrpv2);
- uf_union(&csa[i]->node, &csa[j]->node);
- }
+ WARN_ON_ONCE(1);
}
}
- /* Count the total number of domains */
- for (i = 0; i < csn; i++) {
- if (uf_find(&csa[i]->node) == &csa[i]->node)
- ndoms++;
- }
-
- /*
- * Now we know how many domains to create.
- * Convert <csn, csa> to <ndoms, doms> and populate cpu masks.
- */
+generate_doms:
doms = alloc_sched_domains(ndoms);
if (!doms)
goto done;
@@ -932,45 +860,19 @@ static int generate_sched_domains(cpumask_var_t **domains,
* to SD_ATTR_INIT. Also non-isolating partition root CPUs are a
* subset of HK_TYPE_DOMAIN housekeeping CPUs.
*/
- if (cgrpv2) {
- for (i = 0; i < ndoms; i++) {
- /*
- * The top cpuset may contain some boot time isolated
- * CPUs that need to be excluded from the sched domain.
- */
- if (csa[i] == &top_cpuset)
- cpumask_and(doms[i], csa[i]->effective_cpus,
- housekeeping_cpumask(HK_TYPE_DOMAIN));
- else
- cpumask_copy(doms[i], csa[i]->effective_cpus);
- if (dattr)
- dattr[i] = SD_ATTR_INIT;
- }
- goto done;
- }
-
- for (nslot = 0, i = 0; i < csn; i++) {
- nslot_update = 0;
- for (j = i; j < csn; j++) {
- if (uf_find(&csa[j]->node) == &csa[i]->node) {
- struct cpumask *dp = doms[nslot];
-
- if (i == j) {
- nslot_update = 1;
- cpumask_clear(dp);
- if (dattr)
- *(dattr + nslot) = SD_ATTR_INIT;
- }
- cpumask_or(dp, dp, csa[j]->effective_cpus);
- cpumask_and(dp, dp, housekeeping_cpumask(HK_TYPE_DOMAIN));
- if (dattr)
- update_domain_attr_tree(dattr + nslot, csa[j]);
- }
- }
- if (nslot_update)
- nslot++;
+ for (i = 0; i < ndoms; i++) {
+ /*
+ * The top cpuset may contain some boot time isolated
+ * CPUs that need to be excluded from the sched domain.
+ */
+ if (!csa || csa[i] == &top_cpuset)
+ cpumask_and(doms[i], top_cpuset.effective_cpus,
+ housekeeping_cpumask(HK_TYPE_DOMAIN));
+ else
+ cpumask_copy(doms[i], csa[i]->effective_cpus);
+ if (dattr)
+ dattr[i] = SD_ATTR_INIT;
}
- BUG_ON(nslot != ndoms);
done:
kfree(csa);
--
2.34.1
^ permalink raw reply related [flat|nested] 21+ messages in thread

* Re: [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains
2025-12-17 8:49 ` [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains Chen Ridong
@ 2025-12-17 19:05 ` Waiman Long
2025-12-18 1:39 ` Chen Ridong
2025-12-18 3:56 ` Chen Ridong
0 siblings, 2 replies; 21+ messages in thread
From: Waiman Long @ 2025-12-17 19:05 UTC (permalink / raw)
To: Chen Ridong, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 12/17/25 3:49 AM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Following the introduction of cpuset1_generate_sched_domains() for v1
> in the previous patch, v1-specific logic can now be removed from the
> generic generate_sched_domains(). This patch cleans up the v1-only
> code and ensures uf_node is only visible when CONFIG_CPUSETS_V1=y.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cpuset-internal.h | 10 +--
> kernel/cgroup/cpuset-v1.c | 2 +-
> kernel/cgroup/cpuset.c | 144 +++++---------------------------
> 3 files changed, 27 insertions(+), 129 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index bd767f8cb0ed..ef7b7c5afd4c 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -175,14 +175,14 @@ struct cpuset {
> /* Handle for cpuset.cpus.partition */
> struct cgroup_file partition_file;
>
> - /* Used to merge intersecting subsets for generate_sched_domains */
> - struct uf_node node;
> -
> #ifdef CONFIG_CPUSETS_V1
> struct fmeter fmeter; /* memory_pressure filter */
>
> /* for custom sched domain */
> int relax_domain_level;
> +
> + /* Used to merge intersecting subsets for generate_sched_domains */
> + struct uf_node node;
> #endif
> };
>
> @@ -315,8 +315,6 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
> void cpuset1_init(struct cpuset *cs);
> void cpuset1_online_css(struct cgroup_subsys_state *css);
> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
> - struct cpuset *root_cs);
> int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> struct sched_domain_attr **attributes);
>
> @@ -331,8 +329,6 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
> struct cpuset *trial) { return 0; }
> static inline void cpuset1_init(struct cpuset *cs) {}
> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
> -static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
> - struct cpuset *root_cs) {}
> static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> struct sched_domain_attr **attributes) { return 0; };
>
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index 5c0bded46a7c..0226350e704f 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -560,7 +560,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
> dattr->relax_domain_level = c->relax_domain_level;
> }
>
> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
> +static void update_domain_attr_tree(struct sched_domain_attr *dattr,
> struct cpuset *root_cs)
> {
> struct cpuset *cp;
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index 6bb0b201c34b..3e3468d928f3 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -789,18 +789,13 @@ static int generate_sched_domains(cpumask_var_t **domains,
> {
> struct cpuset *cp; /* top-down scan of cpusets */
> struct cpuset **csa; /* array of all cpuset ptrs */
> - int csn; /* how many cpuset ptrs in csa so far */
> int i, j; /* indices for partition finding loops */
> cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
> struct sched_domain_attr *dattr; /* attributes for custom domains */
> int ndoms = 0; /* number of sched domains in result */
> - int nslot; /* next empty doms[] struct cpumask slot */
> struct cgroup_subsys_state *pos_css;
> - bool root_load_balance = is_sched_load_balance(&top_cpuset);
> - bool cgrpv2 = cpuset_v2();
> - int nslot_update;
>
> - if (!cgrpv2)
> + if (!cpuset_v2())
> return cpuset1_generate_sched_domains(domains, attributes);
>
> doms = NULL;
> @@ -808,70 +803,25 @@ static int generate_sched_domains(cpumask_var_t **domains,
> csa = NULL;
>
> /* Special case for the 99% of systems with one, full, sched domain */
> - if (root_load_balance && cpumask_empty(subpartitions_cpus)) {
> -single_root_domain:
> + if (cpumask_empty(subpartitions_cpus)) {
> ndoms = 1;
> - doms = alloc_sched_domains(ndoms);
> - if (!doms)
> - goto done;
> -
> - dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
> - if (dattr) {
> - *dattr = SD_ATTR_INIT;
> - update_domain_attr_tree(dattr, &top_cpuset);
> - }
> - cpumask_and(doms[0], top_cpuset.effective_cpus,
> - housekeeping_cpumask(HK_TYPE_DOMAIN));
> -
> - goto done;
> + goto generate_doms;
That is not correct. The code under the generate_doms label will need to
access csa[0] which is not allocated yet and may cause panic. You either
need to keep the current code or move it after the csa allocation and
assign top_cpuset to csa[0].
> }
>
> csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
> if (!csa)
> goto done;
> - csn = 0;
>
> + /* Find how many partitions and cache them to csa[] */
> rcu_read_lock();
> - if (root_load_balance)
> - csa[csn++] = &top_cpuset;
> cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
The cpuset_for_each_descendant_pre() macro will visit the root
(top_cpuset) first and so it should be OK to remove the above 2 lines of
code.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains
2025-12-17 19:05 ` Waiman Long
@ 2025-12-18 1:39 ` Chen Ridong
2025-12-18 3:14 ` Waiman Long
2025-12-18 3:56 ` Chen Ridong
1 sibling, 1 reply; 21+ messages in thread
From: Chen Ridong @ 2025-12-18 1:39 UTC (permalink / raw)
To: Waiman Long, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 2025/12/18 3:05, Waiman Long wrote:
> On 12/17/25 3:49 AM, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Following the introduction of cpuset1_generate_sched_domains() for v1
>> in the previous patch, v1-specific logic can now be removed from the
>> generic generate_sched_domains(). This patch cleans up the v1-only
>> code and ensures uf_node is only visible when CONFIG_CPUSETS_V1=y.
>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> kernel/cgroup/cpuset-internal.h | 10 +--
>> kernel/cgroup/cpuset-v1.c | 2 +-
>> kernel/cgroup/cpuset.c | 144 +++++---------------------------
>> 3 files changed, 27 insertions(+), 129 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>> index bd767f8cb0ed..ef7b7c5afd4c 100644
>> --- a/kernel/cgroup/cpuset-internal.h
>> +++ b/kernel/cgroup/cpuset-internal.h
>> @@ -175,14 +175,14 @@ struct cpuset {
>> /* Handle for cpuset.cpus.partition */
>> struct cgroup_file partition_file;
>> - /* Used to merge intersecting subsets for generate_sched_domains */
>> - struct uf_node node;
>> -
>> #ifdef CONFIG_CPUSETS_V1
>> struct fmeter fmeter; /* memory_pressure filter */
>> /* for custom sched domain */
>> int relax_domain_level;
>> +
>> + /* Used to merge intersecting subsets for generate_sched_domains */
>> + struct uf_node node;
>> #endif
>> };
>> @@ -315,8 +315,6 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
>> void cpuset1_init(struct cpuset *cs);
>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> - struct cpuset *root_cs);
>> int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>> struct sched_domain_attr **attributes);
>> @@ -331,8 +329,6 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
>> struct cpuset *trial) { return 0; }
>> static inline void cpuset1_init(struct cpuset *cs) {}
>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>> -static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> - struct cpuset *root_cs) {}
>> static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>> struct sched_domain_attr **attributes) { return 0; };
>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>> index 5c0bded46a7c..0226350e704f 100644
>> --- a/kernel/cgroup/cpuset-v1.c
>> +++ b/kernel/cgroup/cpuset-v1.c
>> @@ -560,7 +560,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
>> dattr->relax_domain_level = c->relax_domain_level;
>> }
>> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> +static void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> struct cpuset *root_cs)
>> {
>> struct cpuset *cp;
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 6bb0b201c34b..3e3468d928f3 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -789,18 +789,13 @@ static int generate_sched_domains(cpumask_var_t **domains,
>> {
>> struct cpuset *cp; /* top-down scan of cpusets */
>> struct cpuset **csa; /* array of all cpuset ptrs */
>> - int csn; /* how many cpuset ptrs in csa so far */
>> int i, j; /* indices for partition finding loops */
>> cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
>> struct sched_domain_attr *dattr; /* attributes for custom domains */
>> int ndoms = 0; /* number of sched domains in result */
>> - int nslot; /* next empty doms[] struct cpumask slot */
>> struct cgroup_subsys_state *pos_css;
>> - bool root_load_balance = is_sched_load_balance(&top_cpuset);
>> - bool cgrpv2 = cpuset_v2();
>> - int nslot_update;
>> - if (!cgrpv2)
>> + if (!cpuset_v2())
>> return cpuset1_generate_sched_domains(domains, attributes);
>> doms = NULL;
>> @@ -808,70 +803,25 @@ static int generate_sched_domains(cpumask_var_t **domains,
>> csa = NULL;
>> /* Special case for the 99% of systems with one, full, sched domain */
>> - if (root_load_balance && cpumask_empty(subpartitions_cpus)) {
>> -single_root_domain:
>> + if (cpumask_empty(subpartitions_cpus)) {
>> ndoms = 1;
>> - doms = alloc_sched_domains(ndoms);
>> - if (!doms)
>> - goto done;
>> -
>> - dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>> - if (dattr) {
>> - *dattr = SD_ATTR_INIT;
>> - update_domain_attr_tree(dattr, &top_cpuset);
>> - }
>> - cpumask_and(doms[0], top_cpuset.effective_cpus,
>> - housekeeping_cpumask(HK_TYPE_DOMAIN));
>> -
>> - goto done;
>> + goto generate_doms;
>
> That is not correct. The code under the generate_doms label will need to access csa[0] which is not
> allocated yet and may cause panic. You either need to keep the current code or move it after the csa
> allocation and assign top_cpuset to csa[0].
>
Thank you, Longman.
Sorry, I should note that I made a small change. I added a !csa check: if csa is not allocated, then
ndoms should equal 1, and we only need the top_cpuset (no csa is indeed required). I think it's
cleaner to avoid allocating csa when there's no valid partition.
```
+ for (i = 0; i < ndoms; i++) {
+ /*
+ * The top cpuset may contain some boot time isolated
+ * CPUs that need to be excluded from the sched domain.
+ */
+ if (!csa || csa[i] == &top_cpuset)
+ cpumask_and(doms[i], top_cpuset.effective_cpus,
+ housekeeping_cpumask(HK_TYPE_DOMAIN));
+ else
+ cpumask_copy(doms[i], csa[i]->effective_cpus);
+ if (dattr)
+ dattr[i] = SD_ATTR_INIT;
}
```
Tested with single-domain generation — no panic or warning observed.
>> }
>> csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
>> if (!csa)
>> goto done;
>> - csn = 0;
>> + /* Find how many partitions and cache them to csa[] */
>> rcu_read_lock();
>> - if (root_load_balance)
>> - csa[csn++] = &top_cpuset;
>> cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
>
> The cpuset_for_each_descendant_pre() macro will visit the root (top_cpuset) first and so it should
> be OK to remove the above 2 lines of code.
>
> Cheers,
> Longman
>
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains
2025-12-18 1:39 ` Chen Ridong
@ 2025-12-18 3:14 ` Waiman Long
2025-12-18 3:32 ` Chen Ridong
0 siblings, 1 reply; 21+ messages in thread
From: Waiman Long @ 2025-12-18 3:14 UTC (permalink / raw)
To: Chen Ridong, Waiman Long, tj, hannes, mkoutny
Cc: cgroups, linux-kernel, lujialin4
On 12/17/25 8:39 PM, Chen Ridong wrote:
>
> On 2025/12/18 3:05, Waiman Long wrote:
>> On 12/17/25 3:49 AM, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> Following the introduction of cpuset1_generate_sched_domains() for v1
>>> in the previous patch, v1-specific logic can now be removed from the
>>> generic generate_sched_domains(). This patch cleans up the v1-only
>>> code and ensures uf_node is only visible when CONFIG_CPUSETS_V1=y.
>>>
>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>> ---
>>> kernel/cgroup/cpuset-internal.h | 10 +--
>>> kernel/cgroup/cpuset-v1.c | 2 +-
>>> kernel/cgroup/cpuset.c | 144 +++++---------------------------
>>> 3 files changed, 27 insertions(+), 129 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>>> index bd767f8cb0ed..ef7b7c5afd4c 100644
>>> --- a/kernel/cgroup/cpuset-internal.h
>>> +++ b/kernel/cgroup/cpuset-internal.h
>>> @@ -175,14 +175,14 @@ struct cpuset {
>>> /* Handle for cpuset.cpus.partition */
>>> struct cgroup_file partition_file;
>>> - /* Used to merge intersecting subsets for generate_sched_domains */
>>> - struct uf_node node;
>>> -
>>> #ifdef CONFIG_CPUSETS_V1
>>> struct fmeter fmeter; /* memory_pressure filter */
>>> /* for custom sched domain */
>>> int relax_domain_level;
>>> +
>>> + /* Used to merge intersecting subsets for generate_sched_domains */
>>> + struct uf_node node;
>>> #endif
>>> };
>>> @@ -315,8 +315,6 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>>> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
>>> void cpuset1_init(struct cpuset *cs);
>>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>>> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> - struct cpuset *root_cs);
>>> int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>> struct sched_domain_attr **attributes);
>>> @@ -331,8 +329,6 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
>>> struct cpuset *trial) { return 0; }
>>> static inline void cpuset1_init(struct cpuset *cs) {}
>>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>>> -static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> - struct cpuset *root_cs) {}
>>> static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>> struct sched_domain_attr **attributes) { return 0; };
>>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>>> index 5c0bded46a7c..0226350e704f 100644
>>> --- a/kernel/cgroup/cpuset-v1.c
>>> +++ b/kernel/cgroup/cpuset-v1.c
>>> @@ -560,7 +560,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
>>> dattr->relax_domain_level = c->relax_domain_level;
>>> }
>>> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> +static void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>> struct cpuset *root_cs)
>>> {
>>> struct cpuset *cp;
>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>> index 6bb0b201c34b..3e3468d928f3 100644
>>> --- a/kernel/cgroup/cpuset.c
>>> +++ b/kernel/cgroup/cpuset.c
>>> @@ -789,18 +789,13 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>> {
>>> struct cpuset *cp; /* top-down scan of cpusets */
>>> struct cpuset **csa; /* array of all cpuset ptrs */
>>> - int csn; /* how many cpuset ptrs in csa so far */
>>> int i, j; /* indices for partition finding loops */
>>> cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
>>> struct sched_domain_attr *dattr; /* attributes for custom domains */
>>> int ndoms = 0; /* number of sched domains in result */
>>> - int nslot; /* next empty doms[] struct cpumask slot */
>>> struct cgroup_subsys_state *pos_css;
>>> - bool root_load_balance = is_sched_load_balance(&top_cpuset);
>>> - bool cgrpv2 = cpuset_v2();
>>> - int nslot_update;
>>> - if (!cgrpv2)
>>> + if (!cpuset_v2())
>>> return cpuset1_generate_sched_domains(domains, attributes);
>>> doms = NULL;
>>> @@ -808,70 +803,25 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>> csa = NULL;
>>> /* Special case for the 99% of systems with one, full, sched domain */
>>> - if (root_load_balance && cpumask_empty(subpartitions_cpus)) {
>>> -single_root_domain:
>>> + if (cpumask_empty(subpartitions_cpus)) {
>>> ndoms = 1;
>>> - doms = alloc_sched_domains(ndoms);
>>> - if (!doms)
>>> - goto done;
>>> -
>>> - dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>>> - if (dattr) {
>>> - *dattr = SD_ATTR_INIT;
>>> - update_domain_attr_tree(dattr, &top_cpuset);
>>> - }
>>> - cpumask_and(doms[0], top_cpuset.effective_cpus,
>>> - housekeeping_cpumask(HK_TYPE_DOMAIN));
>>> -
>>> - goto done;
>>> + goto generate_doms;
>> That is not correct. The code under the generate_doms label will need to access csa[0] which is not
>> allocated yet and may cause panic. You either need to keep the current code or move it after the csa
>> allocation and assign top_cpuset to csa[0].
>>
> Thank you, Longman.
>
> Sorry, I should note that I made a small change. I added a !csa check: if csa is not allocated, then
> ndoms should equal 1, and we only need the top_cpuset (no csa is indeed required). I think it's
> cleaner to avoid allocating csa when there's no valid partition.
>
> ```
> + for (i = 0; i < ndoms; i++) {
> + /*
> + * The top cpuset may contain some boot time isolated
> + * CPUs that need to be excluded from the sched domain.
> + */
> + if (!csa || csa[i] == &top_cpuset)
> + cpumask_and(doms[i], top_cpuset.effective_cpus,
> + housekeeping_cpumask(HK_TYPE_DOMAIN));
> + else
> + cpumask_copy(doms[i], csa[i]->effective_cpus);
> + if (dattr)
> + dattr[i] = SD_ATTR_INIT;
> }
> ```
>
> Tested with single-domain generation — no panic or warning observed.
Yes, !csa check here should be good enough to handle the NULL csa case
here. Maybe adding a comment in the goto line saying that !csa will be
correctly handled.
Cheers,
Longman
^ permalink raw reply [flat|nested] 21+ messages in thread

* Re: [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains
2025-12-18 3:14 ` Waiman Long
@ 2025-12-18 3:32 ` Chen Ridong
0 siblings, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2025-12-18 3:32 UTC (permalink / raw)
To: Waiman Long, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 2025/12/18 11:14, Waiman Long wrote:
> On 12/17/25 8:39 PM, Chen Ridong wrote:
>>
>> On 2025/12/18 3:05, Waiman Long wrote:
>>> On 12/17/25 3:49 AM, Chen Ridong wrote:
>>>> From: Chen Ridong <chenridong@huawei.com>
>>>>
>>>> Following the introduction of cpuset1_generate_sched_domains() for v1
>>>> in the previous patch, v1-specific logic can now be removed from the
>>>> generic generate_sched_domains(). This patch cleans up the v1-only
>>>> code and ensures uf_node is only visible when CONFIG_CPUSETS_V1=y.
>>>>
>>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>>> ---
>>>> kernel/cgroup/cpuset-internal.h | 10 +--
>>>> kernel/cgroup/cpuset-v1.c | 2 +-
>>>> kernel/cgroup/cpuset.c | 144 +++++---------------------------
>>>> 3 files changed, 27 insertions(+), 129 deletions(-)
>>>>
>>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>>>> index bd767f8cb0ed..ef7b7c5afd4c 100644
>>>> --- a/kernel/cgroup/cpuset-internal.h
>>>> +++ b/kernel/cgroup/cpuset-internal.h
>>>> @@ -175,14 +175,14 @@ struct cpuset {
>>>> /* Handle for cpuset.cpus.partition */
>>>> struct cgroup_file partition_file;
>>>> - /* Used to merge intersecting subsets for generate_sched_domains */
>>>> - struct uf_node node;
>>>> -
>>>> #ifdef CONFIG_CPUSETS_V1
>>>> struct fmeter fmeter; /* memory_pressure filter */
>>>> /* for custom sched domain */
>>>> int relax_domain_level;
>>>> +
>>>> + /* Used to merge intersecting subsets for generate_sched_domains */
>>>> + struct uf_node node;
>>>> #endif
>>>> };
>>>> @@ -315,8 +315,6 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>>>> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
>>>> void cpuset1_init(struct cpuset *cs);
>>>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>>>> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>> - struct cpuset *root_cs);
>>>> int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>>> struct sched_domain_attr **attributes);
>>>> @@ -331,8 +329,6 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
>>>> struct cpuset *trial) { return 0; }
>>>> static inline void cpuset1_init(struct cpuset *cs) {}
>>>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>>>> -static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>> - struct cpuset *root_cs) {}
>>>> static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>>> struct sched_domain_attr **attributes) { return 0; };
>>>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>>>> index 5c0bded46a7c..0226350e704f 100644
>>>> --- a/kernel/cgroup/cpuset-v1.c
>>>> +++ b/kernel/cgroup/cpuset-v1.c
>>>> @@ -560,7 +560,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
>>>> dattr->relax_domain_level = c->relax_domain_level;
>>>> }
>>>> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>> +static void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>> struct cpuset *root_cs)
>>>> {
>>>> struct cpuset *cp;
>>>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>>>> index 6bb0b201c34b..3e3468d928f3 100644
>>>> --- a/kernel/cgroup/cpuset.c
>>>> +++ b/kernel/cgroup/cpuset.c
>>>> @@ -789,18 +789,13 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>>> {
>>>> struct cpuset *cp; /* top-down scan of cpusets */
>>>> struct cpuset **csa; /* array of all cpuset ptrs */
>>>> - int csn; /* how many cpuset ptrs in csa so far */
>>>> int i, j; /* indices for partition finding loops */
>>>> cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
>>>> struct sched_domain_attr *dattr; /* attributes for custom domains */
>>>> int ndoms = 0; /* number of sched domains in result */
>>>> - int nslot; /* next empty doms[] struct cpumask slot */
>>>> struct cgroup_subsys_state *pos_css;
>>>> - bool root_load_balance = is_sched_load_balance(&top_cpuset);
>>>> - bool cgrpv2 = cpuset_v2();
>>>> - int nslot_update;
>>>> - if (!cgrpv2)
>>>> + if (!cpuset_v2())
>>>> return cpuset1_generate_sched_domains(domains, attributes);
>>>> doms = NULL;
>>>> @@ -808,70 +803,25 @@ static int generate_sched_domains(cpumask_var_t **domains,
>>>> csa = NULL;
>>>> /* Special case for the 99% of systems with one, full, sched domain */
>>>> - if (root_load_balance && cpumask_empty(subpartitions_cpus)) {
>>>> -single_root_domain:
>>>> + if (cpumask_empty(subpartitions_cpus)) {
>>>> ndoms = 1;
>>>> - doms = alloc_sched_domains(ndoms);
>>>> - if (!doms)
>>>> - goto done;
>>>> -
>>>> - dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>>>> - if (dattr) {
>>>> - *dattr = SD_ATTR_INIT;
>>>> - update_domain_attr_tree(dattr, &top_cpuset);
>>>> - }
>>>> - cpumask_and(doms[0], top_cpuset.effective_cpus,
>>>> - housekeeping_cpumask(HK_TYPE_DOMAIN));
>>>> -
>>>> - goto done;
>>>> + goto generate_doms;
>>> That is not correct. The code under the generate_doms label will need to access csa[0] which is not
>>> allocated yet and may cause panic. You either need to keep the current code or move it after the csa
>>> allocation and assign top_cpuset to csa[0].
>>>
>> Thank you, Longman.
>>
>> Sorry, I should note that I made a small change. I added a !csa check: if csa is not allocated, then
>> ndoms should equal 1, and we only need the top_cpuset (no csa is indeed required). I think it's
>> cleaner to avoid allocating csa when there's no valid partition.
>>
>> ```
>> + for (i = 0; i < ndoms; i++) {
>> + /*
>> + * The top cpuset may contain some boot time isolated
>> + * CPUs that need to be excluded from the sched domain.
>> + */
>> + if (!csa || csa[i] == &top_cpuset)
>> + cpumask_and(doms[i], top_cpuset.effective_cpus,
>> + housekeeping_cpumask(HK_TYPE_DOMAIN));
>> + else
>> + cpumask_copy(doms[i], csa[i]->effective_cpus);
>> + if (dattr)
>> + dattr[i] = SD_ATTR_INIT;
>> }
>> ```
>>
>> Tested with single‑domain generation — no panic or warning observed.
>
> Yes, !csa check here should be good enough to handle the NULL csa case here. Maybe adding a comment
> in the goto line saying that !csa will be correctly handled.
>
Good idea, will add.
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH -next 6/6] cpuset: remove v1-specific code from generate_sched_domains
2025-12-17 19:05 ` Waiman Long
2025-12-18 1:39 ` Chen Ridong
@ 2025-12-18 3:56 ` Chen Ridong
1 sibling, 0 replies; 21+ messages in thread
From: Chen Ridong @ 2025-12-18 3:56 UTC (permalink / raw)
To: Waiman Long, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4
On 2025/12/18 3:05, Waiman Long wrote:
> On 12/17/25 3:49 AM, Chen Ridong wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> Following the introduction of cpuset1_generate_sched_domains() for v1
>> in the previous patch, v1-specific logic can now be removed from the
>> generic generate_sched_domains(). This patch cleans up the v1-only
>> code and ensures uf_node is only visible when CONFIG_CPUSETS_V1=y.
>>
>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>> ---
>> kernel/cgroup/cpuset-internal.h | 10 +--
>> kernel/cgroup/cpuset-v1.c | 2 +-
>> kernel/cgroup/cpuset.c | 144 +++++---------------------------
>> 3 files changed, 27 insertions(+), 129 deletions(-)
>>
>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>> index bd767f8cb0ed..ef7b7c5afd4c 100644
>> --- a/kernel/cgroup/cpuset-internal.h
>> +++ b/kernel/cgroup/cpuset-internal.h
>> @@ -175,14 +175,14 @@ struct cpuset {
>> /* Handle for cpuset.cpus.partition */
>> struct cgroup_file partition_file;
>> - /* Used to merge intersecting subsets for generate_sched_domains */
>> - struct uf_node node;
>> -
>> #ifdef CONFIG_CPUSETS_V1
>> struct fmeter fmeter; /* memory_pressure filter */
>> /* for custom sched domain */
>> int relax_domain_level;
>> +
>> + /* Used to merge intersecting subsets for generate_sched_domains */
>> + struct uf_node node;
>> #endif
>> };
>> @@ -315,8 +315,6 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
>> int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
>> void cpuset1_init(struct cpuset *cs);
>> void cpuset1_online_css(struct cgroup_subsys_state *css);
>> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> - struct cpuset *root_cs);
>> int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>> struct sched_domain_attr **attributes);
>> @@ -331,8 +329,6 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
>> struct cpuset *trial) { return 0; }
>> static inline void cpuset1_init(struct cpuset *cs) {}
>> static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>> -static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> - struct cpuset *root_cs) {}
>> static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>> struct sched_domain_attr **attributes) { return 0; };
>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>> index 5c0bded46a7c..0226350e704f 100644
>> --- a/kernel/cgroup/cpuset-v1.c
>> +++ b/kernel/cgroup/cpuset-v1.c
>> @@ -560,7 +560,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
>> dattr->relax_domain_level = c->relax_domain_level;
>> }
>> -void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> +static void update_domain_attr_tree(struct sched_domain_attr *dattr,
>> struct cpuset *root_cs)
>> {
>> struct cpuset *cp;
>> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
>> index 6bb0b201c34b..3e3468d928f3 100644
>> --- a/kernel/cgroup/cpuset.c
>> +++ b/kernel/cgroup/cpuset.c
>> @@ -789,18 +789,13 @@ static int generate_sched_domains(cpumask_var_t **domains,
>> {
>> struct cpuset *cp; /* top-down scan of cpusets */
>> struct cpuset **csa; /* array of all cpuset ptrs */
>> - int csn; /* how many cpuset ptrs in csa so far */
>> int i, j; /* indices for partition finding loops */
>> cpumask_var_t *doms; /* resulting partition; i.e. sched domains */
>> struct sched_domain_attr *dattr; /* attributes for custom domains */
>> int ndoms = 0; /* number of sched domains in result */
>> - int nslot; /* next empty doms[] struct cpumask slot */
>> struct cgroup_subsys_state *pos_css;
>> - bool root_load_balance = is_sched_load_balance(&top_cpuset);
>> - bool cgrpv2 = cpuset_v2();
>> - int nslot_update;
>> - if (!cgrpv2)
>> + if (!cpuset_v2())
>> return cpuset1_generate_sched_domains(domains, attributes);
>> doms = NULL;
>> @@ -808,70 +803,25 @@ static int generate_sched_domains(cpumask_var_t **domains,
>> csa = NULL;
>> /* Special case for the 99% of systems with one, full, sched domain */
>> - if (root_load_balance && cpumask_empty(subpartitions_cpus)) {
>> -single_root_domain:
>> + if (cpumask_empty(subpartitions_cpus)) {
>> ndoms = 1;
>> - doms = alloc_sched_domains(ndoms);
>> - if (!doms)
>> - goto done;
>> -
>> - dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>> - if (dattr) {
>> - *dattr = SD_ATTR_INIT;
>> - update_domain_attr_tree(dattr, &top_cpuset);
>> - }
>> - cpumask_and(doms[0], top_cpuset.effective_cpus,
>> - housekeeping_cpumask(HK_TYPE_DOMAIN));
>> -
>> - goto done;
>> + goto generate_doms;
>
> That is not correct. The code under the generate_doms label will need to access csa[0] which is not
> allocated yet and may cause panic. You either need to keep the current code or move it after the csa
> allocation and assign top_cpuset to csa[0].
>
>> }
>> csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
>> if (!csa)
>> goto done;
>> - csn = 0;
>> + /* Find how many partitions and cache them to csa[] */
>> rcu_read_lock();
>> - if (root_load_balance)
>> - csa[csn++] = &top_cpuset;
>> cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
>
> The cpuset_for_each_descendant_pre() macro will visit the root (top_cpuset) first and so it should
> be OK to remove the above 2 lines of code.
>
Yes, it is OK for v2, but we have to keep it in v1. If we remove it in v1, it will skip the whole tree.
--
Best regards,
Ridong
^ permalink raw reply [flat|nested] 21+ messages in thread