Linux cgroups development
From: Waiman Long <llong@redhat.com>
To: Chen Ridong <chenridong@huaweicloud.com>,
	Waiman Long <llong@redhat.com>,
	tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	lujialin4@huawei.com
Subject: Re: [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2
Date: Wed, 17 Dec 2025 22:09:39 -0500
Message-ID: <08b26d6b-2a8b-491a-aa38-b93e21728445@redhat.com>
In-Reply-To: <3ca5c423-1b9e-4e59-acf0-ffe3f1086b7e@huaweicloud.com>

On 12/17/25 8:28 PM, Chen Ridong wrote:
>
> On 2025/12/18 1:48, Waiman Long wrote:
> Thank you Longman:
>> On 12/17/25 3:49 AM, Chen Ridong wrote:
>>> From: Chen Ridong <chenridong@huawei.com>
>>>
>>> The generate_sched_domains() function currently handles both v1 and v2
>>> logic. However, the underlying mechanisms for building scheduler domains
>>> differ significantly between the two versions. For cpuset v2, scheduler
>>> domains are straightforwardly derived from valid partitions, whereas
>>> cpuset v1 employs a more complex union-find algorithm to merge overlapping
>>> cpusets. Co-locating these implementations complicates maintenance.
>>>
>>> This patch, along with subsequent ones, aims to separate the v1 and v2
>>> logic. For ease of review, this patch first copies the
>>> generate_sched_domains() function into cpuset-v1.c as
>>> cpuset1_generate_sched_domains() and removes v2-specific code. Common
>>> helpers and top_cpuset are declared in cpuset-internal.h. When operating
>>> in v1 mode, the code now calls cpuset1_generate_sched_domains().
>>>
>>> Currently there is some code duplication, which will be largely eliminated
>>> once v1-specific code is removed from v2 in the following patch.
>>>
>>> Signed-off-by: Chen Ridong <chenridong@huawei.com>
>>> ---
>>>    kernel/cgroup/cpuset-internal.h |  24 +++++
>>>    kernel/cgroup/cpuset-v1.c       | 167 ++++++++++++++++++++++++++++++++
>>>    kernel/cgroup/cpuset.c          |  31 +-----
>>>    3 files changed, 195 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h
>>> index 677053ffb913..bd767f8cb0ed 100644
>>> --- a/kernel/cgroup/cpuset-internal.h
>>> +++ b/kernel/cgroup/cpuset-internal.h
>>> @@ -9,6 +9,7 @@
>>>    #include <linux/cpuset.h>
>>>    #include <linux/spinlock.h>
>>>    #include <linux/union_find.h>
>>> +#include <linux/sched/isolation.h>
>>>
>>>    /* See "Frequency meter" comments, below. */
>>>
>>> @@ -185,6 +186,8 @@ struct cpuset {
>>>    #endif
>>>    };
>>>
>>> +extern struct cpuset top_cpuset;
>>> +
>>>    static inline struct cpuset *css_cs(struct cgroup_subsys_state *css)
>>>    {
>>>        return css ? container_of(css, struct cpuset, css) : NULL;
>>> @@ -242,6 +245,22 @@ static inline int is_spread_slab(const struct cpuset *cs)
>>>        return test_bit(CS_SPREAD_SLAB, &cs->flags);
>>>    }
>>>
>>> +/*
>>> + * Helper routine for generate_sched_domains().
>>> + * Do cpusets a, b have overlapping effective cpus_allowed masks?
>>> + */
>>> +static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b)
>>> +{
>>> +    return cpumask_intersects(a->effective_cpus, b->effective_cpus);
>>> +}
>>> +
>>> +static inline int nr_cpusets(void)
>>> +{
>>> +    assert_cpuset_lock_held();
>> For a simple helper like this one which only does an atomic_read(), I don't think you need to assert
>> that cpuset_mutex is held.
>>
> Will remove it.
>
> I added the assertion because the code this helper was moved from already carries the comment:
> /* Must be called with cpuset_mutex held.  */
>
>>> +    /* jump label reference count + the top-level cpuset */
>>> +    return static_key_count(&cpusets_enabled_key.key) + 1;
>>> +}
>>> +
>>>    /**
>>>     * cpuset_for_each_child - traverse online children of a cpuset
>>>     * @child_cs: loop cursor pointing to the current child
>>> @@ -298,6 +317,9 @@ void cpuset1_init(struct cpuset *cs);
>>>    void cpuset1_online_css(struct cgroup_subsys_state *css);
>>>    void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>                        struct cpuset *root_cs);
>>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>> +            struct sched_domain_attr **attributes);
>>> +
>>>    #else
>>>    static inline void cpuset1_update_task_spread_flags(struct cpuset *cs,
>>>                        struct task_struct *tsk) {}
>>> @@ -311,6 +333,8 @@ static inline void cpuset1_init(struct cpuset *cs) {}
>>>    static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {}
>>>    static inline void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>                        struct cpuset *root_cs) {}
>>> +static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>> +            struct sched_domain_attr **attributes) { return 0; }
>>>
>>>    #endif /* CONFIG_CPUSETS_V1 */
>>>
>>> diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c
>>> index 95de6f2a4cc5..5c0bded46a7c 100644
>>> --- a/kernel/cgroup/cpuset-v1.c
>>> +++ b/kernel/cgroup/cpuset-v1.c
>>> @@ -580,6 +580,173 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr,
>>>        rcu_read_unlock();
>>>    }
>>>
>>> +/*
>>> + * cpuset1_generate_sched_domains()
>>> + *
>>> + * Finding the best partition (set of domains):
>>> + *    The double nested loops below over i, j scan over the load
>>> + *    balanced cpusets (using the array of cpuset pointers in csa[])
>>> + *    looking for pairs of cpusets that have overlapping cpus_allowed
>>> + *    and merging them using a union-find algorithm.
>>> + *
>>> + *    The union of the cpus_allowed masks from the set of all cpusets
>>> + *    having the same root then form the one element of the partition
>>> + *    (one sched domain) to be passed to partition_sched_domains().
>>> + */
>>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains,
>>> +            struct sched_domain_attr **attributes)
>>> +{
>>> +    struct cpuset *cp;    /* top-down scan of cpusets */
>>> +    struct cpuset **csa;    /* array of all cpuset ptrs */
>>> +    int csn;        /* how many cpuset ptrs in csa so far */
>>> +    int i, j;        /* indices for partition finding loops */
>>> +    cpumask_var_t *doms;    /* resulting partition; i.e. sched domains */
>>> +    struct sched_domain_attr *dattr;  /* attributes for custom domains */
>>> +    int ndoms = 0;        /* number of sched domains in result */
>>> +    int nslot;        /* next empty doms[] struct cpumask slot */
>>> +    struct cgroup_subsys_state *pos_css;
>>> +    bool root_load_balance = is_sched_load_balance(&top_cpuset);
>>> +    int nslot_update;
>>> +
>>> +    assert_cpuset_lock_held();
>>> +
>>> +    doms = NULL;
>>> +    dattr = NULL;
>>> +    csa = NULL;
>>> +
>>> +    /* Special case for the 99% of systems with one, full, sched domain */
>>> +    if (root_load_balance) {
>>> +single_root_domain:
>>> +        ndoms = 1;
>>> +        doms = alloc_sched_domains(ndoms);
>>> +        if (!doms)
>>> +            goto done;
>>> +
>>> +        dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL);
>>> +        if (dattr) {
>>> +            *dattr = SD_ATTR_INIT;
>>> +            update_domain_attr_tree(dattr, &top_cpuset);
>>> +        }
>>> +        cpumask_and(doms[0], top_cpuset.effective_cpus,
>>> +                housekeeping_cpumask(HK_TYPE_DOMAIN));
>>> +
>>> +        goto done;
>>> +    }
>>> +
>>> +    csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL);
>>> +    if (!csa)
>>> +        goto done;
>>> +    csn = 0;
>>> +
>>> +    rcu_read_lock();
>>> +    if (root_load_balance)
>>> +        csa[csn++] = &top_cpuset;
>>> +    cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) {
>>> +        if (cp == &top_cpuset)
>>> +            continue;
>>> +
>>> +        /*
>>> +         * v1:
>> Remove this v1 line.
> Will do.
>
>>> +         * Continue traversing beyond @cp iff @cp has some CPUs and
>>> +         * isn't load balancing.  The former is obvious.  The
>>> +         * latter: All child cpusets contain a subset of the
>>> +         * parent's cpus, so just skip them, and then we call
>>> +         * update_domain_attr_tree() to calc relax_domain_level of
>>> +         * the corresponding sched domain.
>>> +         */
>>> +        if (!cpumask_empty(cp->cpus_allowed) &&
>>> +            !(is_sched_load_balance(cp) &&
>>> +              cpumask_intersects(cp->cpus_allowed,
>>> +                     housekeeping_cpumask(HK_TYPE_DOMAIN))))
>>> +            continue;
>>> +
>>> +        if (is_sched_load_balance(cp) &&
>>> +            !cpumask_empty(cp->effective_cpus))
>>> +            csa[csn++] = cp;
>>> +
>>> +        /* skip @cp's subtree */
>>> +        pos_css = css_rightmost_descendant(pos_css);
>>> +        continue;
>>> +    }
>>> +    rcu_read_unlock();
>>> +
>>> +    /*
>>> +     * If there are only isolated partitions underneath the cgroup root,
>>> +     * we can optimize out unneeded sched domains scanning.
>>> +     */
>>> +    if (root_load_balance && (csn == 1))
>>> +        goto single_root_domain;
>> This check is v2 specific and you can remove it as well as the "single_root_domain" label.
>>
> Thank you.
>
> Will remove.
>
> Just a note: I removed this code for cpuset v2. Please confirm whether that's acceptable. If we
> drop the v1-specific logic, handling this case wouldn’t take much extra work.

This code is there because the single-domain check above handles both
v1 and v2. With just one version to support, this extra code isn't
necessary.

Cheers,
Longman


