From: Ridong Chen <ridong.chen@linux.dev>
To: "Waiman Long" <longman@redhat.com>, "Tejun Heo" <tj@kernel.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Shuah Khan" <skhan@linuxfoundation.org>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v2 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks()
Date: Wed, 24 Jun 2026 13:51:46 +0800 [thread overview]
Message-ID: <2c59b7b7-a3c0-4799-8c39-2288299b117e@linux.dev> (raw)
In-Reply-To: <20260623230413.1984188-2-longman@redhat.com>
On 6/24/2026 7:04 AM, Waiman Long wrote:
> As reported by sashiko [1], cpuset_hotplug_update_tasks() may perform
> unnecessary task iteration and updating of tasks' CPU and node masks
> when mems_allowed and/or cpus_allowed are not set in cpuset v2. It is
> due to the fact that the temporary new_cpus and new_mems masks do not
> inherit parent's effective_cpus/mems when they are empty which is the
> expected behavior for cpuset v2 since commit 4ec22e9c5a90 ("cpuset:
> Enable cpuset controller in default hierarchy").
>
> Fix that and avoid unnecessay work by enhancing
> compute_effective_cpumask() to add the empty cpumask check
> and inheriting the parent's versions if empty when in v2. A new
> compute_effective_nodemask() helper is also added to perform similar
> function for new effective_mems.
>
> Add new test_cpuset_prs.sh test cases to confirm that effective_cpus
> will inherit the parent's version if cpuset.cpus is empty.
>
> [1] https://sashiko.dev/#/patchset/20260621032816.1806773-1-longman%40redhat.com
>
> Suggested-by: Ridong Chen <ridong.chen@linux.dev>
> Fixes: 4ec22e9c5a90 ("cpuset: Enable cpuset controller in default hierarchy")
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
> kernel/cgroup/cpuset.c | 45 +++++++++++--------
> .../selftests/cgroup/test_cpuset_prs.sh | 11 ++++-
> 2 files changed, 35 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index aff86acea701..044ddbf66f8e 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -1094,12 +1094,35 @@ void cpuset_update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
> * @cs: the cpuset the need to recompute the new effective_cpus mask
> * @parent: the parent cpuset
> *
> + * For v2, the parent's effective_cpus is inherited if cpumask empty.
> * The result is valid only if the given cpuset isn't a partition root.
> */
> static void compute_effective_cpumask(struct cpumask *new_cpus,
> struct cpuset *cs, struct cpuset *parent)
> {
> - cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus);
> + bool has_cpus;
> +
> + has_cpus = cpumask_and(new_cpus, cs->cpus_allowed, parent->effective_cpus);
> + if (!has_cpus && is_in_v2_mode())
> + cpumask_copy(new_cpus, parent->effective_cpus);
> +}
> +
> +/**
> + * compute_effective_nodemask - Compute the effective nodemask of the cpuset
> + * @new_cpus: the temp variable for the new effective_mems mask
> + * @cs: the cpuset the need to recompute the new effective_mems mask
> + * @parent: the parent cpuset
> + *
> + * For v2, the parent's effective_mems is inherited if nodemask empty.
> + */
> +static void compute_effective_nodemask(nodemask_t *new_mems,
> + struct cpuset *cs, struct cpuset *parent)
> +{
> + bool has_mems;
> +
> + has_mems = nodes_and(*new_mems, cs->mems_allowed, parent->effective_mems);
> + if (!has_mems && is_in_v2_mode())
> + nodes_copy(*new_mems, parent->effective_mems);
> }
>
> /*
> @@ -2148,15 +2171,6 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
> goto update_parent_effective;
> }
>
> - /*
> - * If it becomes empty, inherit the effective mask of the
> - * parent, which is guaranteed to have some CPUs unless
> - * it is a partition root that has explicitly distributed
> - * out all its CPUs.
> - */
> - if (is_in_v2_mode() && !remote && cpumask_empty(tmp->new_cpus))
> - cpumask_copy(tmp->new_cpus, parent->effective_cpus);
> -
> /*
> * Skip the whole subtree if
> * 1) the cpumask remains the same,
> @@ -2704,14 +2718,7 @@ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems)
> cpuset_for_each_descendant_pre(cp, pos_css, cs) {
> struct cpuset *parent = parent_cs(cp);
>
> - bool has_mems = nodes_and(*new_mems, cp->mems_allowed, parent->effective_mems);
> -
> - /*
> - * If it becomes empty, inherit the effective mask of the
> - * parent, which is guaranteed to have some MEMs.
> - */
> - if (is_in_v2_mode() && !has_mems)
> - *new_mems = parent->effective_mems;
> + compute_effective_nodemask(new_mems, cp, parent);
>
> /* Skip the whole subtree if the nodemask remains the same. */
> if (nodes_equal(*new_mems, cp->effective_mems)) {
> @@ -3923,7 +3930,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
>
> parent = parent_cs(cs);
> compute_effective_cpumask(&new_cpus, cs, parent);
> - nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
> + compute_effective_nodemask(&new_mems, cs, parent);
>
> if (!tmp || !cs->partition_root_state)
> goto update_tasks;
> diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> index 0d41aa0d343d..ca9bc38fdb95 100755
> --- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> +++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
> @@ -495,13 +495,20 @@ REMOTE_TEST_MATRIX=(
> # Narrowing cpuset.cpus to previously sibling-excluded CPUs should
> # not return CPUs that were never actually owned.
> " C1-4:P1 . C1-2:P1 C1-3:P2 . . \
> - . . . C3 . . p1:4|c11:1-2|c12:3 \
> + . . . C3 . . p1:4|c11:1-2|c12:3 \
> p1:P1|c11:P1|c12:P2 3"
> # Expanding cpuset.cpus to include a previously sibling-excluded CPU
> # after the sibling has become a member should correctly request it.
> " C1-4:P1 . C1-2:P1 C1-3:P2 . . \
> - . . P0 C2-3 . . p1:1,4|c11:1|c12:2-3 \
> + . . P0 C2-3 . . p1:1,4|c11:1|c12:2-3 \
> p1:P1|c11:P0|c12:P2 2-3"
> + # Cpusets with empty cpuset.cpus should inherit parent's effective_cpus
> + " C1-4:P1 C5-6 C1-2 . C5 . \
> + . P1 P1 . . . p1:3-4|p2:5-6|c11:1-2|c12:3-4|c21:5|c22:5-6 \
> + p1:P1|p2:P1|c11:P1"
> + " C1-4:P1 C5-6 C1-2 . C5 . \
> + . P1 P1 . O5=0 . p1:3-4|p2:6|c11:1-2|c12:3-4|c21:6|c22:6 \
> + p1:P1|p2:P1|c11:P1"
> )
>
> #
LGTM.
Reviewed-by: Ridong Chen <ridong.chen@linux.dev>
--
Best regards
Ridong
next prev parent reply other threads:[~2026-06-24 5:52 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-23 23:04 [PATCH v2 0/2] cgroup/cpuset: Miscellaneous fixes and cleanups Waiman Long
2026-06-23 23:04 ` [PATCH v2 1/2] cgroup/cpuset: Avoid unnecessary cpus & mems update in cpuset_hotplug_update_tasks() Waiman Long
2026-06-24 5:51 ` Ridong Chen [this message]
2026-06-24 8:40 ` Manuel Ebner
2026-06-23 23:04 ` [PATCH v2 2/2] cgroup/cpuset: Rebind/migrate mm only for threadgroup leader in cpuset_update_tasks_nodemask() Waiman Long
2026-06-24 8:27 ` Manuel Ebner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2c59b7b7-a3c0-4799-8c39-2288299b117e@linux.dev \
--to=ridong.chen@linux.dev \
--cc=cgroups@vger.kernel.org \
--cc=corbet@lwn.net \
--cc=hannes@cmpxchg.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mkoutny@suse.com \
--cc=skhan@linuxfoundation.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox