public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations
@ 2025-12-18  9:31 Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 1/6] cpuset: add lockdep_assert_cpuset_lock_held helper Chen Ridong
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Chen Ridong @ 2025-12-18  9:31 UTC (permalink / raw)
  To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong

From: Chen Ridong <chenridong@huawei.com>

Most of the v1-specific code has already been moved to cpuset-v1.c, but
some parts remain in cpuset.c, such as the handling of CS_SPREAD_PAGE,
CS_SPREAD_SLAB, and CGRP_CPUSET_CLONE_CHILDREN. These can also be moved
to cpuset-v1.c.

Additionally, several cpuset members are specific to v1, including
fmeter, relax_domain_level, and the uf_node node. These should only be
visible when v1 support is enabled (CONFIG_CPUSETS_V1).

This series relocates the remaining v1-specific code to cpuset-v1.c and
guards v1-only members with CONFIG_CPUSETS_V1.

The most significant change is the separation of generate_sched_domains()
into v1 and v2 versions. For v1, the original function is preserved
with v2-specific code removed, keeping it largely unchanged since v1 is
deprecated and receives minimal future updates. For v2, all v1-specific
code has been removed, resulting in a much simpler and more maintainable
implementation.

---

v2:
patch1: remame assert_cpuset_lock_held to lockdep_assert_cpuset_lock_held.
patch5: remove some unnecessary v1 code.
patch6: add comment before the goto generate_doms label in the v2 version.

Chen Ridong (6):
  cpuset: add lockdep_assert_cpuset_lock_held helper
  cpuset: add cpuset1_online_css helper for v1-specific operations
  cpuset: add cpuset1_init helper for v1 initialization
  cpuset: move update_domain_attr_tree to cpuset_v1.c
  cpuset: separate generate_sched_domains for v1 and v2
  cpuset: remove v1-specific code from generate_sched_domains

 include/linux/cpuset.h          |   2 +
 kernel/cgroup/cpuset-internal.h |  42 +++++-
 kernel/cgroup/cpuset-v1.c       | 241 +++++++++++++++++++++++++++++-
 kernel/cgroup/cpuset.c          | 253 +++++---------------------------
 4 files changed, 312 insertions(+), 226 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH -next v2 1/6] cpuset: add lockdep_assert_cpuset_lock_held helper
  2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
@ 2025-12-18  9:31 ` Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 2/6] cpuset: add cpuset1_online_css helper for v1-specific operations Chen Ridong
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-12-18  9:31 UTC (permalink / raw)
  To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong

From: Chen Ridong <chenridong@huawei.com>

Add lockdep_assert_cpuset_lock_held() to allow other subsystems to verify
that cpuset_mutex is held.

Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 include/linux/cpuset.h | 2 ++
 kernel/cgroup/cpuset.c | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index a98d3330385c..a634489c9f12 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -74,6 +74,7 @@ extern void inc_dl_tasks_cs(struct task_struct *task);
 extern void dec_dl_tasks_cs(struct task_struct *task);
 extern void cpuset_lock(void);
 extern void cpuset_unlock(void);
+extern void lockdep_assert_cpuset_lock_held(void);
 extern void cpuset_cpus_allowed_locked(struct task_struct *p, struct cpumask *mask);
 extern void cpuset_cpus_allowed(struct task_struct *p, struct cpumask *mask);
 extern bool cpuset_cpus_allowed_fallback(struct task_struct *p);
@@ -195,6 +196,7 @@ static inline void inc_dl_tasks_cs(struct task_struct *task) { }
 static inline void dec_dl_tasks_cs(struct task_struct *task) { }
 static inline void cpuset_lock(void) { }
 static inline void cpuset_unlock(void) { }
+static inline void lockdep_assert_cpuset_lock_held(void) { }
 
 static inline void cpuset_cpus_allowed_locked(struct task_struct *p,
 					struct cpumask *mask)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index fea577b4016a..4fa3080da3be 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -271,6 +271,11 @@ void cpuset_unlock(void)
 	mutex_unlock(&cpuset_mutex);
 }
 
+void lockdep_assert_cpuset_lock_held(void)
+{
+	lockdep_assert_held(&cpuset_mutex);
+}
+
 /**
  * cpuset_full_lock - Acquire full protection for cpuset modification
  *
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH -next v2 2/6] cpuset: add cpuset1_online_css helper for v1-specific operations
  2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 1/6] cpuset: add lockdep_assert_cpuset_lock_held helper Chen Ridong
@ 2025-12-18  9:31 ` Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 3/6] cpuset: add cpuset1_init helper for v1 initialization Chen Ridong
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-12-18  9:31 UTC (permalink / raw)
  To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong

From: Chen Ridong <chenridong@huawei.com>

This commit introduces the cpuset1_online_css helper to centralize
v1-specific handling during cpuset online. It performs operations such as
updating the CS_SPREAD_PAGE, CS_SPREAD_SLAB, and CGRP_CPUSET_CLONE_CHILDREN
flags, which are unique to the cpuset v1 control group interface.

The helper is now placed in cpuset-v1.c to maintain clear separation
between v1 and v2 logic.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset-internal.h |  2 ++
 kernel/cgroup/cpuset-v1.c       | 48 +++++++++++++++++++++++++++++++++
 kernel/cgroup/cpuset.c          | 39 +--------------------------
 3 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 01976c8e7d49..6c03cad02302 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -293,6 +293,7 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+void cpuset1_online_css(struct cgroup_subsys_state *css);
 #else
 static inline void fmeter_init(struct fmeter *fmp) {}
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
@@ -303,6 +304,7 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
 #endif /* CONFIG_CPUSETS_V1 */
 
 #endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 12e76774c75b..c296ce47616a 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -499,6 +499,54 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	return retval;
 }
 
+void cpuset1_online_css(struct cgroup_subsys_state *css)
+{
+	struct cpuset *tmp_cs;
+	struct cgroup_subsys_state *pos_css;
+	struct cpuset *cs = css_cs(css);
+	struct cpuset *parent = parent_cs(cs);
+
+	lockdep_assert_cpus_held();
+	lockdep_assert_cpuset_lock_held();
+
+	if (is_spread_page(parent))
+		set_bit(CS_SPREAD_PAGE, &cs->flags);
+	if (is_spread_slab(parent))
+		set_bit(CS_SPREAD_SLAB, &cs->flags);
+
+	if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
+		return;
+
+	/*
+	 * Clone @parent's configuration if CGRP_CPUSET_CLONE_CHILDREN is
+	 * set.  This flag handling is implemented in cgroup core for
+	 * historical reasons - the flag may be specified during mount.
+	 *
+	 * Currently, if any sibling cpusets have exclusive cpus or mem, we
+	 * refuse to clone the configuration - thereby refusing the task to
+	 * be entered, and as a result refusing the sys_unshare() or
+	 * clone() which initiated it.  If this becomes a problem for some
+	 * users who wish to allow that scenario, then this could be
+	 * changed to grant parent->cpus_allowed-sibling_cpus_exclusive
+	 * (and likewise for mems) to the new cgroup.
+	 */
+	rcu_read_lock();
+	cpuset_for_each_child(tmp_cs, pos_css, parent) {
+		if (is_mem_exclusive(tmp_cs) || is_cpu_exclusive(tmp_cs)) {
+			rcu_read_unlock();
+			return;
+		}
+	}
+	rcu_read_unlock();
+
+	cpuset_callback_lock_irq();
+	cs->mems_allowed = parent->mems_allowed;
+	cs->effective_mems = parent->mems_allowed;
+	cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
+	cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
+	cpuset_callback_unlock_irq();
+}
+
 /*
  * for the common functions, 'private' gives the type of file
  */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4fa3080da3be..5d9dbd1aeed3 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3616,17 +3616,11 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 {
 	struct cpuset *cs = css_cs(css);
 	struct cpuset *parent = parent_cs(cs);
-	struct cpuset *tmp_cs;
-	struct cgroup_subsys_state *pos_css;
 
 	if (!parent)
 		return 0;
 
 	cpuset_full_lock();
-	if (is_spread_page(parent))
-		set_bit(CS_SPREAD_PAGE, &cs->flags);
-	if (is_spread_slab(parent))
-		set_bit(CS_SPREAD_SLAB, &cs->flags);
 	/*
 	 * For v2, clear CS_SCHED_LOAD_BALANCE if parent is isolated
 	 */
@@ -3641,39 +3635,8 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
 		cs->effective_mems = parent->effective_mems;
 	}
 	spin_unlock_irq(&callback_lock);
+	cpuset1_online_css(css);
 
-	if (!test_bit(CGRP_CPUSET_CLONE_CHILDREN, &css->cgroup->flags))
-		goto out_unlock;
-
-	/*
-	 * Clone @parent's configuration if CGRP_CPUSET_CLONE_CHILDREN is
-	 * set.  This flag handling is implemented in cgroup core for
-	 * historical reasons - the flag may be specified during mount.
-	 *
-	 * Currently, if any sibling cpusets have exclusive cpus or mem, we
-	 * refuse to clone the configuration - thereby refusing the task to
-	 * be entered, and as a result refusing the sys_unshare() or
-	 * clone() which initiated it.  If this becomes a problem for some
-	 * users who wish to allow that scenario, then this could be
-	 * changed to grant parent->cpus_allowed-sibling_cpus_exclusive
-	 * (and likewise for mems) to the new cgroup.
-	 */
-	rcu_read_lock();
-	cpuset_for_each_child(tmp_cs, pos_css, parent) {
-		if (is_mem_exclusive(tmp_cs) || is_cpu_exclusive(tmp_cs)) {
-			rcu_read_unlock();
-			goto out_unlock;
-		}
-	}
-	rcu_read_unlock();
-
-	spin_lock_irq(&callback_lock);
-	cs->mems_allowed = parent->mems_allowed;
-	cs->effective_mems = parent->mems_allowed;
-	cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
-	cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
-	spin_unlock_irq(&callback_lock);
-out_unlock:
 	cpuset_full_unlock();
 	return 0;
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH -next v2 3/6] cpuset: add cpuset1_init helper for v1 initialization
  2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 1/6] cpuset: add lockdep_assert_cpuset_lock_held helper Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 2/6] cpuset: add cpuset1_online_css helper for v1-specific operations Chen Ridong
@ 2025-12-18  9:31 ` Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c Chen Ridong
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-12-18  9:31 UTC (permalink / raw)
  To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong

From: Chen Ridong <chenridong@huawei.com>

This patch introduces the cpuset1_init helper in cpuset_v1.c to initialize
v1-specific fields, including the fmeter and relax_domain_level members.

The relax_domain_level related code will be moved to cpuset_v1.c in a
subsequent patch. After this move, v1-specific members will only be
visible when CONFIG_CPUSETS_V1=y.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset-internal.h | 10 ++++++----
 kernel/cgroup/cpuset-v1.c       |  7 ++++++-
 kernel/cgroup/cpuset.c          |  4 ++--
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 6c03cad02302..a32517da8231 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -144,8 +144,6 @@ struct cpuset {
 	 */
 	nodemask_t old_mems_allowed;
 
-	struct fmeter fmeter;		/* memory_pressure filter */
-
 	/*
 	 * Tasks are being attached to this cpuset.  Used to prevent
 	 * zeroing cpus/mems_allowed between ->can_attach() and ->attach().
@@ -181,6 +179,10 @@ struct cpuset {
 
 	/* Used to merge intersecting subsets for generate_sched_domains */
 	struct uf_node node;
+
+#ifdef CONFIG_CPUSETS_V1
+	struct fmeter fmeter;		/* memory_pressure filter */
+#endif
 };
 
 static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
@@ -285,7 +287,6 @@ void cpuset_full_unlock(void);
  */
 #ifdef CONFIG_CPUSETS_V1
 extern struct cftype cpuset1_files[];
-void fmeter_init(struct fmeter *fmp);
 void cpuset1_update_task_spread_flags(struct cpuset *cs,
 					struct task_struct *tsk);
 void cpuset1_update_tasks_flags(struct cpuset *cs);
@@ -293,9 +294,9 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
 			    bool cpus_updated, bool mems_updated);
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
+void cpuset1_init(struct cpuset *cs);
 void cpuset1_online_css(struct cgroup_subsys_state *css);
 #else
-static inline void fmeter_init(struct fmeter *fmp) {}
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
 					struct task_struct *tsk) {}
 static inline void cpuset1_update_tasks_flags(struct cpuset *cs) {}
@@ -304,6 +305,7 @@ static inline void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 			    bool cpus_updated, bool mems_updated) {}
 static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
+static inline void cpuset1_init(struct cpuset *cs) {}
 static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
 #endif /* CONFIG_CPUSETS_V1 */
 
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index c296ce47616a..84f00ab9c81f 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -62,7 +62,7 @@ struct cpuset_remove_tasks_struct {
 #define FM_SCALE 1000		/* faux fixed point scale */
 
 /* Initialize a frequency meter */
-void fmeter_init(struct fmeter *fmp)
+static void fmeter_init(struct fmeter *fmp)
 {
 	fmp->cnt = 0;
 	fmp->val = 0;
@@ -499,6 +499,11 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 	return retval;
 }
 
+void cpuset1_init(struct cpuset *cs)
+{
+	fmeter_init(&cs->fmeter);
+}
+
 void cpuset1_online_css(struct cgroup_subsys_state *css)
 {
 	struct cpuset *tmp_cs;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5d9dbd1aeed3..8ef8b7511659 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3602,7 +3602,7 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 		return ERR_PTR(-ENOMEM);
 
 	__set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
-	fmeter_init(&cs->fmeter);
+	cpuset1_init(cs);
 	cs->relax_domain_level = -1;
 
 	/* Set CS_MEMORY_MIGRATE for default hierarchy */
@@ -3836,7 +3836,7 @@ int __init cpuset_init(void)
 	cpumask_setall(top_cpuset.exclusive_cpus);
 	nodes_setall(top_cpuset.effective_mems);
 
-	fmeter_init(&top_cpuset.fmeter);
+	cpuset1_init(&top_cpuset);
 
 	BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH -next v2 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c
  2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
                   ` (2 preceding siblings ...)
  2025-12-18  9:31 ` [PATCH -next v2 3/6] cpuset: add cpuset1_init helper for v1 initialization Chen Ridong
@ 2025-12-18  9:31 ` Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 5/6] cpuset: separate generate_sched_domains for v1 and v2 Chen Ridong
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-12-18  9:31 UTC (permalink / raw)
  To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong

From: Chen Ridong <chenridong@huawei.com>

Since relax_domain_level is only applicable to v1, move
update_domain_attr_tree() to cpuset-v1.c, which solely updates
relax_domain_level,

Additionally, relax_domain_level is now initialized in cpuset1_inited.
Accordingly, the initialization of relax_domain_level in top_cpuset is
removed. The unnecessary remote_partition initialization in top_cpuset
is also cleaned up.

As a result, relax_domain_level can be defined in cpuset only when
CONFIG_CPUSETS_V1=y.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset-internal.h | 11 ++++++++---
 kernel/cgroup/cpuset-v1.c       | 28 ++++++++++++++++++++++++++++
 kernel/cgroup/cpuset.c          | 31 -------------------------------
 3 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index a32517da8231..677053ffb913 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -150,9 +150,6 @@ struct cpuset {
 	 */
 	int attach_in_progress;
 
-	/* for custom sched domain */
-	int relax_domain_level;
-
 	/* partition root state */
 	int partition_root_state;
 
@@ -182,6 +179,9 @@ struct cpuset {
 
 #ifdef CONFIG_CPUSETS_V1
 	struct fmeter fmeter;		/* memory_pressure filter */
+
+	/* for custom sched domain */
+	int relax_domain_level;
 #endif
 };
 
@@ -296,6 +296,8 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
 void cpuset1_init(struct cpuset *cs);
 void cpuset1_online_css(struct cgroup_subsys_state *css);
+void update_domain_attr_tree(struct sched_domain_attr *dattr,
+				    struct cpuset *root_cs);
 #else
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
 					struct task_struct *tsk) {}
@@ -307,6 +309,9 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
 static inline void cpuset1_init(struct cpuset *cs) {}
 static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
+static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
+				    struct cpuset *root_cs) {}
+
 #endif /* CONFIG_CPUSETS_V1 */
 
 #endif /* __CPUSET_INTERNAL_H */
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index 84f00ab9c81f..a4f8f1c3cfaa 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -502,6 +502,7 @@ static int cpuset_write_u64(struct cgroup_subsys_state *css, struct cftype *cft,
 void cpuset1_init(struct cpuset *cs)
 {
 	fmeter_init(&cs->fmeter);
+	cs->relax_domain_level = -1;
 }
 
 void cpuset1_online_css(struct cgroup_subsys_state *css)
@@ -552,6 +553,33 @@ void cpuset1_online_css(struct cgroup_subsys_state *css)
 	cpuset_callback_unlock_irq();
 }
 
+static void
+update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
+{
+	if (dattr->relax_domain_level < c->relax_domain_level)
+		dattr->relax_domain_level = c->relax_domain_level;
+}
+
+void update_domain_attr_tree(struct sched_domain_attr *dattr,
+				    struct cpuset *root_cs)
+{
+	struct cpuset *cp;
+	struct cgroup_subsys_state *pos_css;
+
+	rcu_read_lock();
+	cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
+		/* skip the whole subtree if @cp doesn't have any CPU */
+		if (cpumask_empty(cp->cpus_allowed)) {
+			pos_css = css_rightmost_descendant(pos_css);
+			continue;
+		}
+
+		if (is_sched_load_balance(cp))
+			update_domain_attr(dattr, cp);
+	}
+	rcu_read_unlock();
+}
+
 /*
  * for the common functions, 'private' gives the type of file
  */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 8ef8b7511659..cf2363a9c552 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -215,8 +215,6 @@ static struct cpuset top_cpuset = {
 	.flags = BIT(CS_CPU_EXCLUSIVE) |
 		 BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
 	.partition_root_state = PRS_ROOT,
-	.relax_domain_level = -1,
-	.remote_partition = false,
 };
 
 /*
@@ -755,34 +753,6 @@ static int cpusets_overlap(struct cpuset *a, struct cpuset *b)
 	return cpumask_intersects(a->effective_cpus, b->effective_cpus);
 }
 
-static void
-update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
-{
-	if (dattr->relax_domain_level < c->relax_domain_level)
-		dattr->relax_domain_level = c->relax_domain_level;
-	return;
-}
-
-static void update_domain_attr_tree(struct sched_domain_attr *dattr,
-				    struct cpuset *root_cs)
-{
-	struct cpuset *cp;
-	struct cgroup_subsys_state *pos_css;
-
-	rcu_read_lock();
-	cpuset_for_each_descendant_pre(cp, pos_css, root_cs) {
-		/* skip the whole subtree if @cp doesn't have any CPU */
-		if (cpumask_empty(cp->cpus_allowed)) {
-			pos_css = css_rightmost_descendant(pos_css);
-			continue;
-		}
-
-		if (is_sched_load_balance(cp))
-			update_domain_attr(dattr, cp);
-	}
-	rcu_read_unlock();
-}
-
 /* Must be called with cpuset_mutex held.  */
 static inline int nr_cpusets(void)
 {
@@ -3603,7 +3573,6 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
 
 	__set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
 	cpuset1_init(cs);
-	cs->relax_domain_level = -1;
 
 	/* Set CS_MEMORY_MIGRATE for default hierarchy */
 	if (cpuset_v2())
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH -next v2 5/6] cpuset: separate generate_sched_domains for v1 and v2
  2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
                   ` (3 preceding siblings ...)
  2025-12-18  9:31 ` [PATCH -next v2 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c Chen Ridong
@ 2025-12-18  9:31 ` Chen Ridong
  2025-12-18  9:31 ` [PATCH -next v2 6/6] cpuset: remove v1-specific code from generate_sched_domains Chen Ridong
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-12-18  9:31 UTC (permalink / raw)
  To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong

From: Chen Ridong <chenridong@huawei.com>

The generate_sched_domains() function currently handles both v1 and v2
logic. However, the underlying mechanisms for building scheduler domains
differ significantly between the two versions. For cpuset v2, scheduler
domains are straightforwardly derived from valid partitions, whereas
cpuset v1 employs a more complex union-find algorithm to merge overlapping
cpusets. Co-locating these implementations complicates maintenance.

This patch, along with subsequent ones, aims to separate the v1 and v2
logic. For ease of review, this patch first copies the
generate_sched_domains() function into cpuset-v1.c as
cpuset1_generate_sched_domains() and removes v2-specific code. Common
helpers and top_cpuset are declared in cpuset-internal.h. When operating
in v1 mode, the code now calls cpuset1_generate_sched_domains().

Currently there is some code duplication, which will be largely eliminated
once v1-specific code is removed from v2 in the following patch.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset-internal.h |  23 +++++
 kernel/cgroup/cpuset-v1.c       | 158 ++++++++++++++++++++++++++++++++
 kernel/cgroup/cpuset.c          |  31 +------
 3 files changed, 185 insertions(+), 27 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 677053ffb913..8622a4666170 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -9,6 +9,7 @@
 #include <linux/cpuset.h>
 #include <linux/spinlock.h>
 #include <linux/union_find.h>
+#include <linux/sched/isolation.h>
 
 /* See "Frequency meter" comments, below. */
 
@@ -185,6 +186,8 @@ struct cpuset {
 #endif
 };
 
+extern struct cpuset top_cpuset;
+
 static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
 {
 	return css ? container_of(css, struct cpuset, css) : NULL;
@@ -242,6 +245,21 @@ static inline int is_spread_slab(const struct cpuset *cs)
 	return test_bit(CS_SPREAD_SLAB, &cs->flags);
 }
 
+/*
+ * Helper routine for generate_sched_domains().
+ * Do cpusets a, b have overlapping effective cpus_allowed masks?
+ */
+static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b)
+{
+	return cpumask_intersects(a->effective_cpus, b->effective_cpus);
+}
+
+static inline int nr_cpusets(void)
+{
+	/* jump label reference count + the top-level cpuset */
+	return static_key_count(&cpusets_enabled_key.key) + 1;
+}
+
 /**
  * cpuset_for_each_child - traverse online children of a cpuset
  * @child_cs: loop cursor pointing to the current child
@@ -298,6 +316,9 @@ void cpuset1_init(struct cpuset *cs);
 void cpuset1_online_css(struct cgroup_subsys_state *css);
 void update_domain_attr_tree(struct sched_domain_attr *dattr,
 				    struct cpuset *root_cs);
+int cpuset1_generate_sched_domains(cpumask_var_t **domains,
+			struct sched_domain_attr **attributes);
+
 #else
 static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
 					struct task_struct *tsk) {}
@@ -311,6 +332,8 @@ static inline void cpuset1_init(struct cpuset *cs) {}
 static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
 static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
 				    struct cpuset *root_cs) {}
+static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
+			struct sched_domain_attr **attributes) { return 0; };
 
 #endif /* CONFIG_CPUSETS_V1 */
 
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index a4f8f1c3cfaa..ffa7a8dc6c3a 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -580,6 +580,164 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr,
 	rcu_read_unlock();
 }
 
+/*
+ * cpuset1_generate_sched_domains()
+ *
+ * Finding the best partition (set of domains):
+ *	The double nested loops below over i, j scan over the load
+ *	balanced cpusets (using the array of cpuset pointers in csa[])
+ *	looking for pairs of cpusets that have overlapping cpus_allowed
+ *	and merging them using a union-find algorithm.
+ *
+ *	The union of the cpus_allowed masks from the set of all cpusets
+ *	having the same root then form the one element of the partition
+ *	(one sched domain) to be passed to partition_sched_domains().
+ */
+int cpuset1_generate_sched_domains(cpumask_var_t **domains,
+			struct sched_domain_attr **attributes)
+{
+	struct cpuset *cp;	/* top-down scan of cpusets */
+	struct cpuset **csa;	/* array of all cpuset ptrs */
+	int csn;		/* how many cpuset ptrs in csa so far */
+	int i, j;		/* indices for partition finding loops */
+	cpumask_var_t *doms;	/* resulting partition; i.e. sched domains */
+	struct sched_domain_attr *dattr;  /* attributes for custom domains */
+	int ndoms = 0;		/* number of sched domains in result */
+	int nslot;		/* next empty doms[] struct cpumask slot */
+	struct cgroup_subsys_state *pos_css;
+	bool root_load_balance = is_sched_load_balance(&top_cpuset);
+	int nslot_update;
+
+	lockdep_assert_cpuset_lock_held();
+
+	doms = NULL;
+	dattr = NULL;
+	csa = NULL;
+
+	/* Special case for the 99% of systems with one, full, sched domain */
+	if (root_load_balance) {
+		ndoms = 1;
+		doms = alloc_sched_domains(ndoms);
+		if (!doms)
+			goto done;
+
+		dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
+		if (dattr) {
+			*dattr = SD_ATTR_INIT;
+			update_domain_attr_tree(dattr, &top_cpuset);
+		}
+		cpumask_and(doms[0], top_cpuset.effective_cpus,
+			    housekeeping_cpumask(HK_TYPE_DOMAIN));
+
+		goto done;
+	}
+
+	csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
+	if (!csa)
+		goto done;
+	csn = 0;
+
+	rcu_read_lock();
+	if (root_load_balance)
+		csa[csn++] = &top_cpuset;
+	cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
+		if (cp == &top_cpuset)
+			continue;
+
+		/*
+		 * Continue traversing beyond @cp iff @cp has some CPUs and
+		 * isn't load balancing.  The former is obvious.  The
+		 * latter: All child cpusets contain a subset of the
+		 * parent's cpus, so just skip them, and then we call
+		 * update_domain_attr_tree() to calc relax_domain_level of
+		 * the corresponding sched domain.
+		 */
+		if (!cpumask_empty(cp->cpus_allowed) &&
+		    !(is_sched_load_balance(cp) &&
+		      cpumask_intersects(cp->cpus_allowed,
+					 housekeeping_cpumask(HK_TYPE_DOMAIN))))
+			continue;
+
+		if (is_sched_load_balance(cp) &&
+		    !cpumask_empty(cp->effective_cpus))
+			csa[csn++] = cp;
+
+		/* skip @cp's subtree */
+		pos_css = css_rightmost_descendant(pos_css);
+		continue;
+	}
+	rcu_read_unlock();
+
+	for (i = 0; i < csn; i++)
+		uf_node_init(&csa[i]->node);
+
+	/* Merge overlapping cpusets */
+	for (i = 0; i < csn; i++) {
+		for (j = i + 1; j < csn; j++) {
+			if (cpusets_overlap(csa[i], csa[j]))
+				uf_union(&csa[i]->node, &csa[j]->node);
+		}
+	}
+
+	/* Count the total number of domains */
+	for (i = 0; i < csn; i++) {
+		if (uf_find(&csa[i]->node) == &csa[i]->node)
+			ndoms++;
+	}
+
+	/*
+	 * Now we know how many domains to create.
+	 * Convert <csn, csa> to <ndoms, doms> and populate cpu masks.
+	 */
+	doms = alloc_sched_domains(ndoms);
+	if (!doms)
+		goto done;
+
+	/*
+	 * The rest of the code, including the scheduler, can deal with
+	 * dattr==NULL case. No need to abort if alloc fails.
+	 */
+	dattr = kmalloc_array(ndoms, sizeof(struct sched_domain_attr),
+			      GFP_KERNEL);
+
+	for (nslot = 0, i = 0; i < csn; i++) {
+		nslot_update = 0;
+		for (j = i; j < csn; j++) {
+			if (uf_find(&csa[j]->node) == &csa[i]->node) {
+				struct cpumask *dp = doms[nslot];
+
+				if (i == j) {
+					nslot_update = 1;
+					cpumask_clear(dp);
+					if (dattr)
+						*(dattr + nslot) = SD_ATTR_INIT;
+				}
+				cpumask_or(dp, dp, csa[j]->effective_cpus);
+				cpumask_and(dp, dp, housekeeping_cpumask(HK_TYPE_DOMAIN));
+				if (dattr)
+					update_domain_attr_tree(dattr + nslot, csa[j]);
+			}
+		}
+		if (nslot_update)
+			nslot++;
+	}
+	BUG_ON(nslot != ndoms);
+
+done:
+	kfree(csa);
+
+	/*
+	 * Fallback to the default domain if kmalloc() failed.
+	 * See comments in partition_sched_domains().
+	 */
+	if (doms == NULL)
+		ndoms = 1;
+
+	*domains    = doms;
+	*attributes = dattr;
+	return ndoms;
+}
+
 /*
  * for the common functions, 'private' gives the type of file
  */
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index cf2363a9c552..33c929b191e8 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -211,7 +211,7 @@ static inline void notify_partition_change(struct cpuset *cs, int old_prs)
  * If cpu_online_mask is used while a hotunplug operation is happening in
  * parallel, we may leave an offline CPU in cpu_allowed or some other masks.
  */
-static struct cpuset top_cpuset = {
+struct cpuset top_cpuset = {
 	.flags = BIT(CS_CPU_EXCLUSIVE) |
 		 BIT(CS_MEM_EXCLUSIVE) | BIT(CS_SCHED_LOAD_BALANCE),
 	.partition_root_state = PRS_ROOT,
@@ -744,21 +744,6 @@ static int validate_change(struct cpuset *cur, struct cpuset *trial)
 }
 
 #ifdef CONFIG_SMP
-/*
- * Helper routine for generate_sched_domains().
- * Do cpusets a, b have overlapping effective cpus_allowed masks?
- */
-static int cpusets_overlap(struct cpuset *a, struct cpuset *b)
-{
-	return cpumask_intersects(a->effective_cpus, b->effective_cpus);
-}
-
-/* Must be called with cpuset_mutex held.  */
-static inline int nr_cpusets(void)
-{
-	/* jump label reference count + the top-level cpuset */
-	return static_key_count(&cpusets_enabled_key.key) + 1;
-}
 
 /*
  * generate_sched_domains()
@@ -798,17 +783,6 @@ static inline int nr_cpusets(void)
  *	   convenient format, that can be easily compared to the prior
  *	   value to determine what partition elements (sched domains)
  *	   were changed (added or removed.)
- *
- * Finding the best partition (set of domains):
- *	The double nested loops below over i, j scan over the load
- *	balanced cpusets (using the array of cpuset pointers in csa[])
- *	looking for pairs of cpusets that have overlapping cpus_allowed
- *	and merging them using a union-find algorithm.
- *
- *	The union of the cpus_allowed masks from the set of all cpusets
- *	having the same root then form the one element of the partition
- *	(one sched domain) to be passed to partition_sched_domains().
- *
  */
 static int generate_sched_domains(cpumask_var_t **domains,
 			struct sched_domain_attr **attributes)
@@ -826,6 +800,9 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	bool cgrpv2 = cpuset_v2();
 	int nslot_update;
 
+	if (!cgrpv2)
+		return cpuset1_generate_sched_domains(domains, attributes);
+
 	doms = NULL;
 	dattr = NULL;
 	csa = NULL;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH -next v2 6/6] cpuset: remove v1-specific code from generate_sched_domains
  2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
                   ` (4 preceding siblings ...)
  2025-12-18  9:31 ` [PATCH -next v2 5/6] cpuset: separate generate_sched_domains for v1 and v2 Chen Ridong
@ 2025-12-18  9:31 ` Chen Ridong
  2025-12-18 16:30 ` [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Waiman Long
  2025-12-18 18:39 ` Tejun Heo
  7 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-12-18  9:31 UTC (permalink / raw)
  To: longman, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4, chenridong

From: Chen Ridong <chenridong@huawei.com>

Following the introduction of cpuset1_generate_sched_domains() for v1
in the previous patch, v1-specific logic can now be removed from the
generic generate_sched_domains(). This patch cleans up the v1-only
code and ensures uf_node is only visible when CONFIG_CPUSETS_V1=y.

Signed-off-by: Chen Ridong <chenridong@huawei.com>
---
 kernel/cgroup/cpuset-internal.h |  10 +--
 kernel/cgroup/cpuset-v1.c       |   2 +-
 kernel/cgroup/cpuset.c          | 145 ++++++--------------------------
 3 files changed, 28 insertions(+), 129 deletions(-)

diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
index 8622a4666170..e718a4f54360 100644
--- a/kernel/cgroup/cpuset-internal.h
+++ b/kernel/cgroup/cpuset-internal.h
@@ -175,14 +175,14 @@ struct cpuset {
 	/* Handle for cpuset.cpus.partition */
 	struct cgroup_file partition_file;
 
-	/* Used to merge intersecting subsets for generate_sched_domains */
-	struct uf_node node;
-
 #ifdef CONFIG_CPUSETS_V1
 	struct fmeter fmeter;		/* memory_pressure filter */
 
 	/* for custom sched domain */
 	int relax_domain_level;
+
+	/* Used to merge intersecting subsets for generate_sched_domains */
+	struct uf_node node;
 #endif
 };
 
@@ -314,8 +314,6 @@ void cpuset1_hotplug_update_tasks(struct cpuset *cs,
 int cpuset1_validate_change(struct cpuset *cur, struct cpuset *trial);
 void cpuset1_init(struct cpuset *cs);
 void cpuset1_online_css(struct cgroup_subsys_state *css);
-void update_domain_attr_tree(struct sched_domain_attr *dattr,
-				    struct cpuset *root_cs);
 int cpuset1_generate_sched_domains(cpumask_var_t **domains,
 			struct sched_domain_attr **attributes);
 
@@ -330,8 +328,6 @@ static inline int cpuset1_validate_change(struct cpuset *cur,
 				struct cpuset *trial) { return 0; }
 static inline void cpuset1_init(struct cpuset *cs) {}
 static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
-static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
-				    struct cpuset *root_cs) {}
 static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
 			struct sched_domain_attr **attributes) { return 0; };
 
diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
index ffa7a8dc6c3a..7303315fdba7 100644
--- a/kernel/cgroup/cpuset-v1.c
+++ b/kernel/cgroup/cpuset-v1.c
@@ -560,7 +560,7 @@ update_domain_attr(struct sched_domain_attr *dattr, struct cpuset *c)
 		dattr->relax_domain_level = c->relax_domain_level;
 }
 
-void update_domain_attr_tree(struct sched_domain_attr *dattr,
+static void update_domain_attr_tree(struct sched_domain_attr *dattr,
 				    struct cpuset *root_cs)
 {
 	struct cpuset *cp;
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 33c929b191e8..221da921b4f9 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -789,18 +789,13 @@ static int generate_sched_domains(cpumask_var_t **domains,
 {
 	struct cpuset *cp;	/* top-down scan of cpusets */
 	struct cpuset **csa;	/* array of all cpuset ptrs */
-	int csn;		/* how many cpuset ptrs in csa so far */
 	int i, j;		/* indices for partition finding loops */
 	cpumask_var_t *doms;	/* resulting partition; i.e. sched domains */
 	struct sched_domain_attr *dattr;  /* attributes for custom domains */
 	int ndoms = 0;		/* number of sched domains in result */
-	int nslot;		/* next empty doms[] struct cpumask slot */
 	struct cgroup_subsys_state *pos_css;
-	bool root_load_balance = is_sched_load_balance(&top_cpuset);
-	bool cgrpv2 = cpuset_v2();
-	int nslot_update;
 
-	if (!cgrpv2)
+	if (!cpuset_v2())
 		return cpuset1_generate_sched_domains(domains, attributes);
 
 	doms = NULL;
@@ -808,70 +803,26 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	csa = NULL;
 
 	/* Special case for the 99% of systems with one, full, sched domain */
-	if (root_load_balance && cpumask_empty(subpartitions_cpus)) {
-single_root_domain:
+	if (cpumask_empty(subpartitions_cpus)) {
 		ndoms = 1;
-		doms = alloc_sched_domains(ndoms);
-		if (!doms)
-			goto done;
-
-		dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
-		if (dattr) {
-			*dattr = SD_ATTR_INIT;
-			update_domain_attr_tree(dattr, &top_cpuset);
-		}
-		cpumask_and(doms[0], top_cpuset.effective_cpus,
-			    housekeeping_cpumask(HK_TYPE_DOMAIN));
-
-		goto done;
+		/* !csa will be checked and can be correctly handled */
+		goto generate_doms;
 	}
 
 	csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
 	if (!csa)
 		goto done;
-	csn = 0;
 
+	/* Find how many partitions and cache them to csa[] */
 	rcu_read_lock();
-	if (root_load_balance)
-		csa[csn++] = &top_cpuset;
 	cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
-		if (cp == &top_cpuset)
-			continue;
-
-		if (cgrpv2)
-			goto v2;
-
-		/*
-		 * v1:
-		 * Continue traversing beyond @cp iff @cp has some CPUs and
-		 * isn't load balancing.  The former is obvious.  The
-		 * latter: All child cpusets contain a subset of the
-		 * parent's cpus, so just skip them, and then we call
-		 * update_domain_attr_tree() to calc relax_domain_level of
-		 * the corresponding sched domain.
-		 */
-		if (!cpumask_empty(cp->cpus_allowed) &&
-		    !(is_sched_load_balance(cp) &&
-		      cpumask_intersects(cp->cpus_allowed,
-					 housekeeping_cpumask(HK_TYPE_DOMAIN))))
-			continue;
-
-		if (is_sched_load_balance(cp) &&
-		    !cpumask_empty(cp->effective_cpus))
-			csa[csn++] = cp;
-
-		/* skip @cp's subtree */
-		pos_css = css_rightmost_descendant(pos_css);
-		continue;
-
-v2:
 		/*
 		 * Only valid partition roots that are not isolated and with
-		 * non-empty effective_cpus will be saved into csn[].
+		 * non-empty effective_cpus will be saved into csa[].
 		 */
 		if ((cp->partition_root_state == PRS_ROOT) &&
 		    !cpumask_empty(cp->effective_cpus))
-			csa[csn++] = cp;
+			csa[ndoms++] = cp;
 
 		/*
 		 * Skip @cp's subtree if not a partition root and has no
@@ -882,40 +833,18 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	}
 	rcu_read_unlock();
 
-	/*
-	 * If there are only isolated partitions underneath the cgroup root,
-	 * we can optimize out unneeded sched domains scanning.
-	 */
-	if (root_load_balance && (csn == 1))
-		goto single_root_domain;
-
-	for (i = 0; i < csn; i++)
-		uf_node_init(&csa[i]->node);
-
-	/* Merge overlapping cpusets */
-	for (i = 0; i < csn; i++) {
-		for (j = i + 1; j < csn; j++) {
-			if (cpusets_overlap(csa[i], csa[j])) {
+	for (i = 0; i < ndoms; i++) {
+		for (j = i + 1; j < ndoms; j++) {
+			if (cpusets_overlap(csa[i], csa[j]))
 				/*
 				 * Cgroup v2 shouldn't pass down overlapping
 				 * partition root cpusets.
 				 */
-				WARN_ON_ONCE(cgrpv2);
-				uf_union(&csa[i]->node, &csa[j]->node);
-			}
+				WARN_ON_ONCE(1);
 		}
 	}
 
-	/* Count the total number of domains */
-	for (i = 0; i < csn; i++) {
-		if (uf_find(&csa[i]->node) == &csa[i]->node)
-			ndoms++;
-	}
-
-	/*
-	 * Now we know how many domains to create.
-	 * Convert <csn, csa> to <ndoms, doms> and populate cpu masks.
-	 */
+generate_doms:
 	doms = alloc_sched_domains(ndoms);
 	if (!doms)
 		goto done;
@@ -932,45 +861,19 @@ static int generate_sched_domains(cpumask_var_t **domains,
 	 * to SD_ATTR_INIT. Also non-isolating partition root CPUs are a
 	 * subset of HK_TYPE_DOMAIN housekeeping CPUs.
 	 */
-	if (cgrpv2) {
-		for (i = 0; i < ndoms; i++) {
-			/*
-			 * The top cpuset may contain some boot time isolated
-			 * CPUs that need to be excluded from the sched domain.
-			 */
-			if (csa[i] == &top_cpuset)
-				cpumask_and(doms[i], csa[i]->effective_cpus,
-					    housekeeping_cpumask(HK_TYPE_DOMAIN));
-			else
-				cpumask_copy(doms[i], csa[i]->effective_cpus);
-			if (dattr)
-				dattr[i] = SD_ATTR_INIT;
-		}
-		goto done;
-	}
-
-	for (nslot = 0, i = 0; i < csn; i++) {
-		nslot_update = 0;
-		for (j = i; j < csn; j++) {
-			if (uf_find(&csa[j]->node) == &csa[i]->node) {
-				struct cpumask *dp = doms[nslot];
-
-				if (i == j) {
-					nslot_update = 1;
-					cpumask_clear(dp);
-					if (dattr)
-						*(dattr + nslot) = SD_ATTR_INIT;
-				}
-				cpumask_or(dp, dp, csa[j]->effective_cpus);
-				cpumask_and(dp, dp, housekeeping_cpumask(HK_TYPE_DOMAIN));
-				if (dattr)
-					update_domain_attr_tree(dattr + nslot, csa[j]);
-			}
-		}
-		if (nslot_update)
-			nslot++;
+	for (i = 0; i < ndoms; i++) {
+		/*
+		 * The top cpuset may contain some boot time isolated
+		 * CPUs that need to be excluded from the sched domain.
+		 */
+		if (!csa || csa[i] == &top_cpuset)
+			cpumask_and(doms[i], top_cpuset.effective_cpus,
+				    housekeeping_cpumask(HK_TYPE_DOMAIN));
+		else
+			cpumask_copy(doms[i], csa[i]->effective_cpus);
+		if (dattr)
+			dattr[i] = SD_ATTR_INIT;
 	}
-	BUG_ON(nslot != ndoms);
 
 done:
 	kfree(csa);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations
  2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
                   ` (5 preceding siblings ...)
  2025-12-18  9:31 ` [PATCH -next v2 6/6] cpuset: remove v1-specific code from generate_sched_domains Chen Ridong
@ 2025-12-18 16:30 ` Waiman Long
  2025-12-18 18:39 ` Tejun Heo
  7 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2025-12-18 16:30 UTC (permalink / raw)
  To: Chen Ridong, tj, hannes, mkoutny; +Cc: cgroups, linux-kernel, lujialin4

On 12/18/25 4:31 AM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> Most of the v1-specific code has already been moved to cpuset-v1.c, but
> some parts remain in cpuset.c, such as the handling of CS_SPREAD_PAGE,
> CS_SPREAD_SLAB, and CGRP_CPUSET_CLONE_CHILDREN. These can also be moved
> to cpuset-v1.c.
>
> Additionally, several cpuset members are specific to v1, including
> fmeter, relax_domain_level, and the uf_node node. These should only be
> visible when v1 support is enabled (CONFIG_CPUSETS_V1).
>
> This series relocates the remaining v1-specific code to cpuset-v1.c and
> guards v1-only members with CONFIG_CPUSETS_V1.
>
> The most significant change is the separation of generate_sched_domains()
> into v1 and v2 versions. For v1, the original function is preserved
> with v2-specific code removed, keeping it largely unchanged since v1 is
> deprecated and receives minimal future updates. For v2, all v1-specific
> code has been removed, resulting in a much simpler and more maintainable
> implementation.
>
> ---
>
> v2:
> patch1: remame assert_cpuset_lock_held to lockdep_assert_cpuset_lock_held.
> patch5: remove some unnecessary v1 code.
> patch6: add comment before the goto generate_doms label in the v2 version.
>
> Chen Ridong (6):
>    cpuset: add lockdep_assert_cpuset_lock_held helper
>    cpuset: add cpuset1_online_css helper for v1-specific operations
>    cpuset: add cpuset1_init helper for v1 initialization
>    cpuset: move update_domain_attr_tree to cpuset_v1.c
>    cpuset: separate generate_sched_domains for v1 and v2
>    cpuset: remove v1-specific code from generate_sched_domains
>
>   include/linux/cpuset.h          |   2 +
>   kernel/cgroup/cpuset-internal.h |  42 +++++-
>   kernel/cgroup/cpuset-v1.c       | 241 +++++++++++++++++++++++++++++-
>   kernel/cgroup/cpuset.c          | 253 +++++---------------------------
>   4 files changed, 312 insertions(+), 226 deletions(-)
>
For the whole series,

Reviewed-by: Waiman Long <longman@redhat.com>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations
  2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
                   ` (6 preceding siblings ...)
  2025-12-18 16:30 ` [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Waiman Long
@ 2025-12-18 18:39 ` Tejun Heo
  2025-12-19  0:42   ` Chen Ridong
  7 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2025-12-18 18:39 UTC (permalink / raw)
  To: Chen Ridong; +Cc: longman, hannes, mkoutny, cgroups, linux-kernel, lujialin4

> Chen Ridong (6):
>   cpuset: add lockdep_assert_cpuset_lock_held helper
>   cpuset: add cpuset1_online_css helper for v1-specific operations
>   cpuset: add cpuset1_init helper for v1 initialization
>   cpuset: move update_domain_attr_tree to cpuset_v1.c
>   cpuset: separate generate_sched_domains for v1 and v2
>   cpuset: remove v1-specific code from generate_sched_domains

Applied 1-6 to cgroup/for-6.20.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations
  2025-12-18 18:39 ` Tejun Heo
@ 2025-12-19  0:42   ` Chen Ridong
  0 siblings, 0 replies; 10+ messages in thread
From: Chen Ridong @ 2025-12-19  0:42 UTC (permalink / raw)
  To: Tejun Heo, Waiman Long
  Cc: longman, hannes, mkoutny, cgroups, linux-kernel, lujialin4



On 2025/12/19 2:39, Tejun Heo wrote:
>> Chen Ridong (6):
>>   cpuset: add lockdep_assert_cpuset_lock_held helper
>>   cpuset: add cpuset1_online_css helper for v1-specific operations
>>   cpuset: add cpuset1_init helper for v1 initialization
>>   cpuset: move update_domain_attr_tree to cpuset_v1.c
>>   cpuset: separate generate_sched_domains for v1 and v2
>>   cpuset: remove v1-specific code from generate_sched_domains
> 
> Applied 1-6 to cgroup/for-6.20.
> 
> Thanks.
> 

Thank you.

-- 
Best regards,
Ridong


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-12-19  0:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-18  9:31 [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Chen Ridong
2025-12-18  9:31 ` [PATCH -next v2 1/6] cpuset: add lockdep_assert_cpuset_lock_held helper Chen Ridong
2025-12-18  9:31 ` [PATCH -next v2 2/6] cpuset: add cpuset1_online_css helper for v1-specific operations Chen Ridong
2025-12-18  9:31 ` [PATCH -next v2 3/6] cpuset: add cpuset1_init helper for v1 initialization Chen Ridong
2025-12-18  9:31 ` [PATCH -next v2 4/6] cpuset: move update_domain_attr_tree to cpuset_v1.c Chen Ridong
2025-12-18  9:31 ` [PATCH -next v2 5/6] cpuset: separate generate_sched_domains for v1 and v2 Chen Ridong
2025-12-18  9:31 ` [PATCH -next v2 6/6] cpuset: remove v1-specific code from generate_sched_domains Chen Ridong
2025-12-18 16:30 ` [PATCH -next v2 0/6] cpuset: further separate v1 and v2 implementations Waiman Long
2025-12-18 18:39 ` Tejun Heo
2025-12-19  0:42   ` Chen Ridong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox