From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3383327B348; Thu, 18 Dec 2025 03:31:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766028674; cv=none; b=FCHH5Mg3JsehBpdc1/HKB9D7rlZv9GURRr5qFa8zymrFCsxOB8ZouI0iKMBPgArzYeu5XJTtClUokvoQQpg1iLEhu/cNLhk3V4koEGtF2rtUD//tsTwf6/LYCa+EzNwcp6uNfVFnUaLnuD3OKC97TLVh5+EgZ+letJuNC+l1x7U= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1766028674; c=relaxed/simple; bh=+ZKD1QlAiGxx8Js8puTUzThZ4tn4WO6OItvQpyKc6Kw=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=LIv5ot64HptRglS0Eu9Gg8+nJNtEODfE7Rf3srDqmvCokAOoyw0xgzQxI/ye9v1PaKEdg0dksbfEu33BhGi8PktHQNeghJGyIPuLfQ1P5zw6RxXFJ+4gK+aiEMmkdP/g9XTziecVMqm8wpYDoSEsw11/BS/mGheniJ0rUEf4frI= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.198]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4dWx5D4L9mzYQtJN; Thu, 18 Dec 2025 11:30:40 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.75]) by mail.maildlp.com (Postfix) with ESMTP id 83CA140575; Thu, 18 Dec 2025 11:31:08 +0800 (CST) Received: from [10.67.111.176] (unknown [10.67.111.176]) by APP2 (Coremail) with SMTP id Syh0CgCnwYB6dUNpNSJNAg--.40020S2; Thu, 18 Dec 2025 11:31:08 +0800 (CST) Message-ID: Date: Thu, 18 Dec 2025 11:31:06 +0800 Precedence: bulk X-Mailing-List: cgroups@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH -next 5/6] cpuset: separate generate_sched_domains for v1 and v2 To: Waiman Long , tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, lujialin4@huawei.com References: <20251217084942.2666405-1-chenridong@huaweicloud.com> <20251217084942.2666405-6-chenridong@huaweicloud.com> <8d0ef5fc-f392-40f8-9803-50807c172800@redhat.com> <3ca5c423-1b9e-4e59-acf0-ffe3f1086b7e@huaweicloud.com> <08b26d6b-2a8b-491a-aa38-b93e21728445@redhat.com> Content-Language: en-US From: Chen Ridong In-Reply-To: <08b26d6b-2a8b-491a-aa38-b93e21728445@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-CM-TRANSID:Syh0CgCnwYB6dUNpNSJNAg--.40020S2 X-Coremail-Antispam: 1UD129KBjvJXoWfJr45JFykGFW8CFWUGw4rKrg_yoWDury5pF 1kGFWUArW5Jr1rGw1jgw1UXr9Iyw1UJ3WDJr18X3W8JrsrtF129r1UXFn09r18Ar4kGr1U Zr1jqr43uF13JrJanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUyGb4IE77IF4wAFF20E14v26r4j6ryUM7CY07I20VC2zVCF04k2 6cxKx2IYs7xG6rWj6s0DM7CIcVAFz4kK6r1j6r18M28lY4IEw2IIxxk0rwA2F7IY1VAKz4 vEj48ve4kI8wA2z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7Cj xVAFwI0_Gr1j6F4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x 0267AKxVW0oVCq3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG 6I80ewAv7VC0I7IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFV Cjc4AY6r1j6r4UM4x0Y48IcVAKI48JMxkF7I0En4kS14v26r126r1DMxAIw28IcxkI7VAK I48JMxC20s026xCaFVCjc4AY6r1j6r4UMI8I3I0E5I8CrVAFwI0_Jr0_Jr4lx2IqxVCjr7 xvwVAFwI0_JrI_JrWlx4CE17CEb7AF67AKxVWUAVWUtwCIc40Y0x0EwIxGrwCI42IY6xII jxv20xvE14v26r1j6r1xMIIF0xvE2Ix0cI8IcVCY1x0267AKxVWUJVW8JwCI42IY6xAIw2 0EY4v20xvaj40_Jr0_JF4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x02 67AKxVW8JVW8JrUvcSsGvfC2KfnxnUUI43ZEXa7IU1veHDUUUUU== X-CM-SenderInfo: hfkh02xlgr0w46kxt4xhlfz01xgou0bp/ On 2025/12/18 11:09, Waiman Long wrote: > On 12/17/25 8:28 PM, Chen Ridong wrote: >> >> On 2025/12/18 1:48, Waiman Long wrote: >> Thank you Longman: >>> On 12/17/25 3:49 AM, Chen Ridong wrote: >>>> From: Chen Ridong >>>> >>>> The generate_sched_domains() function currently handles both v1 and v2 >>>> logic. However, the underlying mechanisms for building scheduler domains >>>> differ significantly between the two versions. For cpuset v2, scheduler >>>> domains are straightforwardly derived from valid partitions, whereas >>>> cpuset v1 employs a more complex union-find algorithm to merge overlapping >>>> cpusets. Co-locating these implementations complicates maintenance. >>>> >>>> This patch, along with subsequent ones, aims to separate the v1 and v2 >>>> logic. For ease of review, this patch first copies the >>>> generate_sched_domains() function into cpuset-v1.c as >>>> cpuset1_generate_sched_domains() and removes v2-specific code. Common >>>> helpers and top_cpuset are declared in cpuset-internal.h. When operating >>>> in v1 mode, the code now calls cpuset1_generate_sched_domains(). >>>> >>>> Currently there is some code duplication, which will be largely eliminated >>>> once v1-specific code is removed from v2 in the following patch. >>>> >>>> Signed-off-by: Chen Ridong >>>> --- >>>>    kernel/cgroup/cpuset-internal.h |  24 +++++ >>>>    kernel/cgroup/cpuset-v1.c       | 167 ++++++++++++++++++++++++++++++++ >>>>    kernel/cgroup/cpuset.c          |  31 +----- >>>>    3 files changed, 195 insertions(+), 27 deletions(-) >>>> >>>> diff --git a/kernel/cgroup/cpuset-internal.h b/kernel/cgroup/cpuset-internal.h >>>> index 677053ffb913..bd767f8cb0ed 100644 >>>> --- a/kernel/cgroup/cpuset-internal.h >>>> +++ b/kernel/cgroup/cpuset-internal.h >>>> @@ -9,6 +9,7 @@ >>>>    #include >>>>    #include >>>>    #include >>>> +#include >>>>      /* See "Frequency meter" comments, below. */ >>>>    @@ -185,6 +186,8 @@ struct cpuset { >>>>    #endif >>>>    }; >>>>    +extern struct cpuset top_cpuset; >>>> + >>>>    static inline struct cpuset *css_cs(struct cgroup_subsys_state *css) >>>>    { >>>>        return css ? container_of(css, struct cpuset, css) : NULL; >>>> @@ -242,6 +245,22 @@ static inline int is_spread_slab(const struct cpuset *cs) >>>>        return test_bit(CS_SPREAD_SLAB, &cs->flags); >>>>    } >>>>    +/* >>>> + * Helper routine for generate_sched_domains(). >>>> + * Do cpusets a, b have overlapping effective cpus_allowed masks? >>>> + */ >>>> +static inline int cpusets_overlap(struct cpuset *a, struct cpuset *b) >>>> +{ >>>> +    return cpumask_intersects(a->effective_cpus, b->effective_cpus); >>>> +} >>>> + >>>> +static inline int nr_cpusets(void) >>>> +{ >>>> +    assert_cpuset_lock_held(); >>> For a simple helper like this one which only does an atomic_read(), I don't think you need to assert >>> that cpuset_mutex is held. >>> >> Will remove it. >> >> I added the lock because the location where it’s removed already includes the comment: >> /* Must be called with cpuset_mutex held.  */ >> >>>> +    /* jump label reference count + the top-level cpuset */ >>>> +    return static_key_count(&cpusets_enabled_key.key) + 1; >>>> +} >>>> + >>>>    /** >>>>     * cpuset_for_each_child - traverse online children of a cpuset >>>>     * @child_cs: loop cursor pointing to the current child >>>> @@ -298,6 +317,9 @@ void cpuset1_init(struct cpuset *cs); >>>>    void cpuset1_online_css(struct cgroup_subsys_state *css); >>>>    void update_domain_attr_tree(struct sched_domain_attr *dattr, >>>>                        struct cpuset *root_cs); >>>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains, >>>> +            struct sched_domain_attr **attributes); >>>> + >>>>    #else >>>>    static inline void cpuset1_update_task_spread_flags(struct cpuset *cs, >>>>                        struct task_struct *tsk) {} >>>> @@ -311,6 +333,8 @@ static inline void cpuset1_init(struct cpuset *cs) {} >>>>    static inline void cpuset1_online_css(struct cgroup_subsys_state *css) {} >>>>    static inline void update_domain_attr_tree(struct sched_domain_attr *dattr, >>>>                        struct cpuset *root_cs) {} >>>> +static inline int cpuset1_generate_sched_domains(cpumask_var_t **domains, >>>> +            struct sched_domain_attr **attributes) { return 0; }; >>>>      #endif /* CONFIG_CPUSETS_V1 */ >>>>    diff --git a/kernel/cgroup/cpuset-v1.c b/kernel/cgroup/cpuset-v1.c >>>> index 95de6f2a4cc5..5c0bded46a7c 100644 >>>> --- a/kernel/cgroup/cpuset-v1.c >>>> +++ b/kernel/cgroup/cpuset-v1.c >>>> @@ -580,6 +580,173 @@ void update_domain_attr_tree(struct sched_domain_attr *dattr, >>>>        rcu_read_unlock(); >>>>    } >>>>    +/* >>>> + * cpuset1_generate_sched_domains() >>>> + * >>>> + * Finding the best partition (set of domains): >>>> + *    The double nested loops below over i, j scan over the load >>>> + *    balanced cpusets (using the array of cpuset pointers in csa[]) >>>> + *    looking for pairs of cpusets that have overlapping cpus_allowed >>>> + *    and merging them using a union-find algorithm. >>>> + * >>>> + *    The union of the cpus_allowed masks from the set of all cpusets >>>> + *    having the same root then form the one element of the partition >>>> + *    (one sched domain) to be passed to partition_sched_domains(). >>>> + */ >>>> +int cpuset1_generate_sched_domains(cpumask_var_t **domains, >>>> +            struct sched_domain_attr **attributes) >>>> +{ >>>> +    struct cpuset *cp;    /* top-down scan of cpusets */ >>>> +    struct cpuset **csa;    /* array of all cpuset ptrs */ >>>> +    int csn;        /* how many cpuset ptrs in csa so far */ >>>> +    int i, j;        /* indices for partition finding loops */ >>>> +    cpumask_var_t *doms;    /* resulting partition; i.e. sched domains */ >>>> +    struct sched_domain_attr *dattr;  /* attributes for custom domains */ >>>> +    int ndoms = 0;        /* number of sched domains in result */ >>>> +    int nslot;        /* next empty doms[] struct cpumask slot */ >>>> +    struct cgroup_subsys_state *pos_css; >>>> +    bool root_load_balance = is_sched_load_balance(&top_cpuset); >>>> +    int nslot_update; >>>> + >>>> +    assert_cpuset_lock_held(); >>>> + >>>> +    doms = NULL; >>>> +    dattr = NULL; >>>> +    csa = NULL; >>>> + >>>> +    /* Special case for the 99% of systems with one, full, sched domain */ >>>> +    if (root_load_balance) { >>>> +single_root_domain: >>>> +        ndoms = 1; >>>> +        doms = alloc_sched_domains(ndoms); >>>> +        if (!doms) >>>> +            goto done; >>>> + >>>> +        dattr = kmalloc(sizeof(struct sched_domain_attr), GFP_KERNEL); >>>> +        if (dattr) { >>>> +            *dattr = SD_ATTR_INIT; >>>> +            update_domain_attr_tree(dattr, &top_cpuset); >>>> +        } >>>> +        cpumask_and(doms[0], top_cpuset.effective_cpus, >>>> +                housekeeping_cpumask(HK_TYPE_DOMAIN)); >>>> + >>>> +        goto done; >>>> +    } >>>> + >>>> +    csa = kmalloc_array(nr_cpusets(), sizeof(cp), GFP_KERNEL); >>>> +    if (!csa) >>>> +        goto done; >>>> +    csn = 0; >>>> + >>>> +    rcu_read_lock(); >>>> +    if (root_load_balance) >>>> +        csa[csn++] = &top_cpuset; >>>> +    cpuset_for_each_descendant_pre(cp, pos_css, &top_cpuset) { >>>> +        if (cp == &top_cpuset) >>>> +            continue; >>>> + >>>> +        /* >>>> +         * v1: >>> Remove this v1 line. >> Will do. >> >>>> +         * Continue traversing beyond @cp iff @cp has some CPUs and >>>> +         * isn't load balancing.  The former is obvious.  The >>>> +         * latter: All child cpusets contain a subset of the >>>> +         * parent's cpus, so just skip them, and then we call >>>> +         * update_domain_attr_tree() to calc relax_domain_level of >>>> +         * the corresponding sched domain. >>>> +         */ >>>> +        if (!cpumask_empty(cp->cpus_allowed) && >>>> +            !(is_sched_load_balance(cp) && >>>> +              cpumask_intersects(cp->cpus_allowed, >>>> +                     housekeeping_cpumask(HK_TYPE_DOMAIN)))) >>>> +            continue; >>>> + >>>> +        if (is_sched_load_balance(cp) && >>>> +            !cpumask_empty(cp->effective_cpus)) >>>> +            csa[csn++] = cp; >>>> + >>>> +        /* skip @cp's subtree */ >>>> +        pos_css = css_rightmost_descendant(pos_css); >>>> +        continue; >>>> +    } >>>> +    rcu_read_unlock(); >>>> + >>>> +    /* >>>> +     * If there are only isolated partitions underneath the cgroup root, >>>> +     * we can optimize out unneeded sched domains scanning. >>>> +     */ >>>> +    if (root_load_balance && (csn == 1)) >>>> +        goto single_root_domain; >>> This check is v2 specific and you can remove it as well as the "single_root_domain" label. >>> >> Thank you. >> >> Will remove. >> >> Just a note — I removed this code for cpuset v2. Please confirm if that's acceptable. If we drop the >> v1-specific logic, handling this case wouldn’t take much extra work. > > This code is there because of the single dom check above that handles both v1 and v2. With just one > version to support, this extra code isn't necessary. > Thank you, got. -- Best regards, Ridong