Re: [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2

From: Waiman Long <llong@redhat.com>
To: Chen Ridong <chenridong@huaweicloud.com>,
	tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	lujialin4@huawei.com
Subject: Re: [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2
Date: Wed, 17 Dec 2025 12:48:53 -0500
Message-ID: <8d0ef5fc-f392-40f8-9803-50807c172800@redhat.com>
In-Reply-To: <20251217084942.2666405-6-chenridong@huaweicloud.com>

On 12/17/25 3:49 AM, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The generate_sched_domains() function currently handles both v1 and v2
> logic. However, the underlying mechanisms for building scheduler domains
> differ significantly between the two versions. For cpuset v2, scheduler
> domains are straightforwardly derived from valid partitions, whereas
> cpuset v1 employs a more complex union-find algorithm to merge overlapping
> cpusets. Co-locating these implementations complicates maintenance.
>
> This patch, along with subsequent ones, aims to separate the v1 and v2
> logic. For ease of review, this patch first copies the
> generate_sched_domains() function into cpuset-v1.c as
> cpuset1_generate_sched_domains() and removes v2-specific code. Common
> helpers and top_cpuset are declared in cpuset-internal.h. When operating
> in v1 mode, the code now calls cpuset1_generate_sched_domains().
>
> Currently there is some code duplication, which will be largely eliminated
> once v1-specific code is removed from v2 in the following patch.
>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
>   kernel/cgroup/cpuset-internal.h |  24 +++++
>   kernel/cgroup/cpuset-v1.c       | 167 ++++++++++++++++++++++++++++++++
>   kernel/cgroup/cpuset.c          |  31 +-----
>   3 files changed, 195 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
> index 677053ffb913..bd767f8cb0ed 100644
> --- a/kernel/cgroup/cpuset-internal.h
> +++ b/kernel/cgroup/cpuset-internal.h
> @@ -9,6 +9,7 @@
>   #include <linux/cpuset.h>
>   #include <linux/spinlock.h>
>   #include <linux/union_find.h>
> +#include <linux/sched/isolation.h>
>   
>   /* See "Frequency meter" comments, below. */
>   
> @@ -185,6 +186,8 @@ struct cpuset {
>   #endif
>   };
>   
> +extern struct cpuset top_cpuset;
> +
>   static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
>   {
>   	return css ? container_of(css, struct cpuset, css) : NULL;
> @@ -242,6 +245,22 @@ static inline int is_spread_slab(const struct cpuset *cs)
>   	return test_bit(CS_SPREAD_SLAB, &cs->flags);
>   }
>   
> +/*
> + * Helper routine for generate_sched_domains().
> + * Do cpusets a, b have overlapping effective cpus_allowed masks?
> + */
> +static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b)
> +{
> +	return cpumask_intersects(a->effective_cpus, b->effective_cpus);
> +}
> +
> +static inline int nr_cpusets(void)
> +{
> +	assert_cpuset_lock_held();

For a simple helper like this one which only does an atomic_read(), I 
don't think you need to assert that cpuset_mutex is held.
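
To illustrate the point, here is a minimal userspace sketch, not the kernel
code. It assumes the count behind cpusets_enabled_key can be modelled as a
single C11 atomic counter; model_cpusets_enabled, model_cpuset_inc(),
model_cpuset_dec() and model_nr_cpusets() are hypothetical names made up for
this example. The reader is one atomic load plus one, so it returns a
consistent snapshot without any lock assertion.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-in for the reference count behind cpusets_enabled_key:
 * the number of non-root cpusets currently enabled.
 */
static atomic_int model_cpusets_enabled;

static inline void model_cpuset_inc(void)
{
	atomic_fetch_add(&model_cpusets_enabled, 1);
}

static inline void model_cpuset_dec(void)
{
	int old = atomic_fetch_sub(&model_cpusets_enabled, 1);

	/* The count must never drop below zero. */
	assert(old > 0);
	(void)old;
}

/*
 * Model of nr_cpusets(): a single atomic load, plus one for the
 * top-level cpuset. No mutex is taken or asserted here.
 */
static inline int model_nr_cpusets(void)
{
	return atomic_load(&model_cpusets_enabled) + 1;
}
```

A caller that needs the value to stay stable across a larger operation still
has to hold the lock itself; the sketch only shows that the read on its own
does not depend on it.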

> +	/* jump label reference count + the top-level cpuset */
> +	return static_key_count(&cpusets_enabled_key.key) + 1;
> +}
> +
>   /**
>    * cpuset_for_each_child - traverse online children of a cpuset
>    * @child_cs: loop cursor pointing to the current child
> @@ -298,6 +317,9 @@ void cpuset1_init(struct cpuset *cs);
>   void cpuset1_online_css(struct cgroup_subsys_state *css);
>   void update_domain_attr_tree(struct sched_domain_attr *dattr,
>   				    struct cpuset *root_cs);
> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> +			struct sched_domain_attr **attributes);
> +
>   #else
>   static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
>   					struct task_struct *tsk) {}
> @@ -311,6 +333,8 @@ static inline void cpuset1_init(struct cpuset *cs) {}
>   static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>   static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>   				    struct cpuset *root_cs) {}
> +static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> +			struct sched_domain_attr **attributes) { return 0; };
>   
>   #endif /* CONFIG_CPUSETS_V1 */
>   
> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
> index 95de6f2a4cc5..5c0bded46a7c 100644
> --- a/kernel/cgroup/cpuset-v1.c
> +++ b/kernel/cgroup/cpuset-v1.c
> @@ -580,6 +580,173 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr,
>   	rcu_read_unlock();
>   }
>   
> +/*
> + * cpuset1_generate_sched_domains()
> + *
> + * Finding the best partition (set of domains):
> + *	The double nested loops below over i, j scan over the load
> + *	balanced cpusets (using the array of cpuset pointers in csa[])
> + *	looking for pairs of cpusets that have overlapping cpus_allowed
> + *	and merging them using a union-find algorithm.
> + *
> + *	The union of the cpus_allowed masks from the set of all cpusets
> + *	having the same root then form the one element of the partition
> + *	(one sched domain) to be passed to partition_sched_domains().
> + */
> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
> +			struct sched_domain_attr **attributes)
> +{
> +	struct cpuset *cp;	/* top-down scan of cpusets */
> +	struct cpuset **csa;	/* array of all cpuset ptrs */
> +	int csn;		/* how many cpuset ptrs in csa so far */
> +	int i, j;		/* indices for partition finding loops */
> +	cpumask_var_t *doms;	/* resulting partition; i.e. sched domains */
> +	struct sched_domain_attr *dattr;  /* attributes for custom domains */
> +	int ndoms = 0;		/* number of sched domains in result */
> +	int nslot;		/* next empty doms[] struct cpumask slot */
> +	struct cgroup_subsys_state *pos_css;
> +	bool root_load_balance = is_sched_load_balance(&top_cpuset);
> +	int nslot_update;
> +
> +	assert_cpuset_lock_held();
> +
> +	doms = NULL;
> +	dattr = NULL;
> +	csa = NULL;
> +
> +	/* Special case for the 99% of systems with one, full, sched domain */
> +	if (root_load_balance) {
> +single_root_domain:
> +		ndoms = 1;
> +		doms = alloc_sched_domains(ndoms);
> +		if (!doms)
> +			goto done;
> +
> +		dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
> +		if (dattr) {
> +			*dattr = SD_ATTR_INIT;
> +			update_domain_attr_tree(dattr, &top_cpuset);
> +		}
> +		cpumask_and(doms[0], top_cpuset.effective_cpus,
> +			    housekeeping_cpumask(HK_TYPE_DOMAIN));
> +
> +		goto done;
> +	}
> +
> +	csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
> +	if (!csa)
> +		goto done;
> +	csn = 0;
> +
> +	rcu_read_lock();
> +	if (root_load_balance)
> +		csa[csn++] = &top_cpuset;
> +	cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
> +		if (cp == &top_cpuset)
> +			continue;
> +
> +		/*
> +		 * v1:

Remove this "v1:" line. The function now lives in cpuset-v1.c, so the
tag is redundant.

> +		 * Continue traversing beyond @cp iff @cp has some CPUs and
> +		 * isn't load balancing.  The former is obvious.  The
> +		 * latter: All child cpusets contain a subset of the
> +		 * parent's cpus, so just skip them, and then we call
> +		 * update_domain_attr_tree() to calc relax_domain_level of
> +		 * the corresponding sched domain.
> +		 */
> +		if (!cpumask_empty(cp->cpus_allowed) &&
> +		    !(is_sched_load_balance(cp) &&
> +		      cpumask_intersects(cp->cpus_allowed,
> +					 housekeeping_cpumask(HK_TYPE_DOMAIN))))
> +			continue;
> +
> +		if (is_sched_load_balance(cp) &&
> +		    !cpumask_empty(cp->effective_cpus))
> +			csa[csn++] = cp;
> +
> +		/* skip @cp's subtree */
> +		pos_css = css_rightmost_descendant(pos_css);
> +		continue;
> +	}
> +	rcu_read_unlock();
> +
> +	/*
> +	 * If there are only isolated partitions underneath the cgroup root,
> +	 * we can optimize out unneeded sched domains scanning.
> +	 */
> +	if (root_load_balance && (csn == 1))
> +		goto single_root_domain;

This check is v2 specific and you can remove it as well as the 
"single_root_domain" label.
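
For reference, a small userspace sketch of the v1 flow without the label. It
is a model under stated assumptions, not the kernel implementation: CPU masks
are plain 64-bit integers, the housekeeping mask is ignored, and the
model_uf, model_find(), model_union() and model_generate_domains() names are
hypothetical. Once the root_load_balance case has returned early, the rest of
the function only runs with root_load_balance false, so a later
"root_load_balance && csn == 1" test has nothing to catch in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_CPUSETS 16

/* Hypothetical union-find over indices into the cpuset array. */
struct model_uf {
	int parent[MODEL_MAX_CPUSETS];
};

static int model_find(struct model_uf *uf, int x)
{
	while (uf->parent[x] != x) {
		uf->parent[x] = uf->parent[uf->parent[x]];	/* path halving */
		x = uf->parent[x];
	}
	return x;
}

static void model_union(struct model_uf *uf, int a, int b)
{
	int ra = model_find(uf, a);
	int rb = model_find(uf, b);

	if (ra != rb)
		uf->parent[rb] = ra;
}

/*
 * Model of v1 domain generation. masks[] holds the effective CPUs of the
 * load-balanced cpusets below the root; doms[] receives one merged mask
 * per resulting domain. Returns the number of domains written.
 */
static int model_generate_domains(uint64_t root_mask, bool root_load_balance,
				  const uint64_t *masks, int n, uint64_t *doms)
{
	struct model_uf uf;
	int i, j, ndoms = 0;

	assert(n >= 0 && n <= MODEL_MAX_CPUSETS);

	/* Common case: the root balances everything, one full domain. */
	if (root_load_balance) {
		doms[0] = root_mask;
		return 1;
	}

	for (i = 0; i < n; i++)
		uf.parent[i] = i;

	/* Merge every pair of cpusets whose masks overlap. */
	for (i = 0; i < n; i++)
		for (j = i + 1; j < n; j++)
			if (masks[i] & masks[j])
				model_union(&uf, i, j);

	/* Each union-find root yields one domain: the union of its members. */
	for (i = 0; i < n; i++) {
		uint64_t dom = 0;

		if (model_find(&uf, i) != i)
			continue;
		for (j = 0; j < n; j++)
			if (model_find(&uf, j) == i)
				dom |= masks[j];
		doms[ndoms++] = dom;
	}
	return ndoms;
}
```

With masks 0x3, 0x6, 0x30 and 0x40 and root_load_balance false, the first two
overlap on CPU 1 and merge, giving three domains.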

Cheers,
Longman
