Linux cgroups development
 help / color / mirror / Atom feed
From: Chen Ridong <chenridong@huaweicloud.com>
To: Sun Shaojie <sunshaojie@kylinos.cn>, mkoutny@suse.com, llong@redhat.com
Cc: cgroups@vger.kernel.org, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	shuah@kernel.org, tj@kernel.org
Subject: Re: [PATCH v2] cpuset: relax the overlap check for cgroup-v2
Date: Sat, 15 Nov 2025 15:41:03 +0800	[thread overview]
Message-ID: <d9332ba1-2614-44c2-b2e8-eab213f196bc@huaweicloud.com> (raw)
In-Reply-To: <20251115060211.853449-1-sunshaojie@kylinos.cn>



On 2025/11/15 14:02, Sun Shaojie wrote:
> On 2015/11/15 08:58, Chen Ridong wrote:
>> On 2025/11/15 0:14, Michal Koutný wrote:
>>> On Fri, Nov 14, 2025 at 09:29:20AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>> After further consideration, I still suggest retaining this rule.
>>>
>>> Apologies, I'm slightly lost which rule. I hope the new iteration from
>>> Shaojie with both before/after tables will explain it.
>>>
>>
>> The rule has changed in this patch from "If either cpuset is exclusive, check if they are mutually
>> exclusive" to
>> "If both cpusets are exclusive, check if they are mutually exclusive"
>>
>>  -    /* If either cpuset is exclusive, check if they are mutually exclusive */
>>  -    if (is_cpu_exclusive(cs1) || is_cpu_exclusive(cs2))
>>  +    /* If both cpusets are exclusive, check if they are mutually exclusive */
>>  +    if (is_cpu_exclusive(cs1) && is_cpu_exclusive(cs2))
>>  +        return !cpusets_are_exclusive(cs1, cs2);
>>
>> I suggest not modifying this rule and keeping the original logic intact:
>>
>>>> For am example:
>>>>   Step                                       | A1's prstate | B1's prstate |
>>>>   #1> mkdir -p A1                            | member       |              |
>>>>   #2> echo "0-1" > A1/cpuset.cpus.exclusive  | member       |              |
>>>>   #3> echo "root" > A1/cpuset.cpus.partition | root         |              |
>>>>   #4> mkdir -p B1                            | root         | member       |
>>>>   #5> echo "0" > B1/cpuset.cpus              | root invalid | member       |
>>>>
>>>> Currently, we mark A1 as invalid. But similar to the logic in this patch, why must A1 be
>>>> invalidated?
>>>
>>> A1 is invalidated becase it doesn't have exclusive ownership of CPU 0
>>> anymore.
>>>
>>>> B1 could also use the parent's effective CPUs, right?
>>>
>>> Here you assume some ordering between siblings treating A1 more
>>> important than B1. But it's symmetrical in principle, no?
>>>
>>
>> I’m using an example to illustrate that if Shaojie’s patch is accepted, other rules could be relaxed
>> following the same logic—but I’m not in favor of doing so.
> 
> Hi, Ridong,
> 
> Thank you for pointing out the issue with the current patch; this is indeed
> not what our product intends. I must admit that I haven't thoroughly tested
> on such recent kernel versions.
> 
> Obviously, this patch is flawed. However, patch v3 is needed. Regarding the
> "other rules" you mentioned, we do not intend to relax them. On the 
> contrary, we aim to maintain them firmly.
> 
> Our product need ensure the following behavior: in cgroup-v2, user 
> modifications to one cpuset should not affect the partition state of its 
> sibling cpusets. This is justified and meaningful, as it aligns with the 
> isolation characteristics of cgroups.
> 

This is ideal in theory, but I don’t think it’s practical in reality.

> This can be divided into two scenarios:
> Scenario 1: Only one of A1 and B1 is "root".
> Scenario 2: Both A1 and B1 are "root".
> 
> We plan to implement Scenario 1 first. This is the goal of patch v2.
> However, patch v2 is flawed because it does not strictly adhere to the 
> following existing rule.
> 
> However, it is worth noting that the current cgroup v2 implementation does 
> not strictly adhere to the following rule either (which is also an 
> objective for patch v3 to address).
> 
> Rule 1: "cpuset.cpus" cannot be a subset of a sibling's "cpuset.cpus.exclusive".
> 
> Using your example to illustrate.
>  Step (refer to the steps in the table below)
>  #1> mkdir -p A1                           
>  #2> echo "0-1" > A1/cpuset.cpus.exclusive 
>  #3> echo "root" > A1/cpuset.cpus.partition
>  #4> mkdir -p B1               
>  #5> echo "0" > B1/cpuset.cpus 
> 
> Table 1: Current result
>  Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
>  #1   | 0      |                |           | member       |              |
>  #2   | 0      | 0-1            |           | member       |              |
>  #3   | 0      | 0-1            |           | root         |              |
>  #4   | 0      | 0-1            |           | root         | member       |
>  #5   | 0      | 0-1            | 0         | root invalid | member       |
> 

I think this what we expect.

> Table 2: Expected result
>  Step | return | A1's excl_cpus | B1's cpus | A1's prstate | B1's prstate |
>  #1   | 0      |                |           | member       |              |
>  #2   | 0      | 0-1            |           | member       |              |
>  #3   | 0      | 0-1            |           | root         |              |
>  #4   | 0      | 0-1            |           | root         | member       |
>  #5   | error  | 0-1            |           | root         | member       |
> 

Step 5 should not return an error. As Longman pointed out, in cgroup-v2, setting cpuset.cpus should
never fail.

> Currently, after step #5, the operation returns success, which clearly 
> violates Rule 1, as B1's "cpuset.cpus" is a subset of A1's 
> "cpuset.cpus.exclusive".
> 
> Therefore, after step #5, the operation should return error, with A1 
> remaining as "root". This better complies with the Rule 1.
> 

This is an exclusivity rule. Since it violates the exclusivity rules, A1 should be invalidated.

> ------
> The following content is provided for reference, and we hope it may be 
> adopted in the future.
> !!These are not part of what patch v3 will implement.
> 
> As for Scenario 2 (Both A1 and B1 are "root"), we will retain the current 
> cgroup v2 behavior. This patch series does not modify it, but we hope to 
> draw the maintainers' attention, as we indeed have plans for future 
> modifications. Our intent can be seen from the following examples.
> 
> For example:
>  Step (refer to the steps in the table below)
>  #1> mkdir -p A1                           
>  #2> echo "0-1"  > A1/cpuset.cpus 
>  #3> echo "root" > A1/cpuset.cpus.partition
>  #4> mkdir -p B1               
>  #5> echo "2-3"  > B1/cpuset.cpus 
>  #6> echo "root" > B1/cpuset.cpus.partition
>  #7> echo "1-2"  > B1/cpuset.cpus
> 
> Table 1: Current result
>  Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
>  #1   | from parent   |               | member       |              |
>  #2   | 0-1           |               | member       |              |
>  #3   | 0-1           |               | root         |              |
>  #4   | 0-1           | from parent   | root         | member       |
>  #5   | 0-1           | 2-3           | root         | member       |
>  #6   | 0-1           | 2-3           | root         | root         |
>  #7   | 0-1           | 1-2           | root invalid | root invalid |
> 
> Table 2: Expected result
>  Step | A1's eft_cpus | B1's eft_cpus | A1's prstate | B1's prstate |
>  #1   | from parent   |               | member       |              |
>  #2   | 0-1           |               | member       |              |
>  #3   | 0-1           |               | root         |              |
>  #4   | 0-1           | from parent   | root         | member       |
>  #5   | 0-1           | 2-3           | root         | member       |
>  #6   | 0-1           | 2-3           | root         | root         |
>  #7   | 0-1           | 2             | root         | root invalid |
> 
> After step #7, we expect A1 to remain "root" (unaffected), while only B1 
> becomes "root invalid".
> 

With the result you expect, would we observe the following behaviors:

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1"  > A1/cpuset.cpus
#4> echo "1-2"  > B1/cpuset.cpus
#5> echo "root" > A1/cpuset.cpus.partition
#6> echo "root" > B1/cpuset.cpus.partition # A1:root;B1:root invalid

#1> mkdir -p A1
#2> mkdir -p B1
#3> echo "0-1"  > A1/cpuset.cpus
#4> echo "1-2"  > B1/cpuset.cpus
#5> echo "root" > B1/cpuset.cpus.partition
#6> echo "root" > A1/cpuset.cpus.partition # A1:root invalid;B1:root

Do different operation orders yield different results? If so, this is not what we expect.

-- 
Best regards,
Ridong


  reply	other threads:[~2025-11-15  7:41 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-13 13:14 [PATCH v2] cpuset: relax the overlap check for cgroup-v2 Sun Shaojie
2025-11-13 14:57 ` Waiman Long
2025-11-13 17:07 ` Michal Koutný
2025-11-14  1:29   ` Chen Ridong
2025-11-14 16:14     ` Michal Koutný
2025-11-15  0:58       ` Chen Ridong
2025-11-14  0:50 ` Chen Ridong
2025-11-14  5:53   ` Sun Shaojie
2025-11-14  6:24   ` Sun Shaojie
2025-11-14 16:15     ` Michal Koutný
2025-11-15  2:01       ` Chen Ridong
2025-11-15  9:31         ` [PATCH -next] cpuset: treate root invalid trialcs as exclusive Chen Ridong
2025-11-16 14:08           ` [PATCH v2] cpuset: relax the overlap check for cgroup-v2 Sun Shaojie
2025-11-17  4:35           ` [PATCH -next] cpuset: treate root invalid trialcs as exclusive Sun Shaojie
2025-11-17  6:23             ` Chen Ridong
2025-11-17  6:53               ` Sun Shaojie
2025-11-17  7:30                 ` Chen Ridong
2025-11-17 15:56           ` Waiman Long
2025-11-15  9:51         ` [PATCH v2] cpuset: relax the overlap check for cgroup-v2 Chen Ridong
2025-11-15 11:24       ` Sun Shaojie
2025-11-14  6:33   ` Sun Shaojie
2025-11-14  6:59     ` Chen Ridong
2025-11-15  6:02   ` Sun Shaojie
2025-11-15  7:41     ` Chen Ridong [this message]
2025-11-17 18:43     ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d9332ba1-2614-44c2-b2e8-eab213f196bc@huaweicloud.com \
    --to=chenridong@huaweicloud.com \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=llong@redhat.com \
    --cc=mkoutny@suse.com \
    --cc=shuah@kernel.org \
    --cc=sunshaojie@kylinos.cn \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox