From: Waiman Long <longman@redhat.com>
To: "Tejun Heo" <tj@kernel.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Koutný" <mkoutny@suse.com>,
"Jonathan Corbet" <corbet@lwn.net>,
"Shuah Khan" <skhan@linuxfoundation.org>,
"Catalin Marinas" <catalin.marinas@arm.com>,
"Will Deacon" <will@kernel.org>,
"K. Y. Srinivasan" <kys@microsoft.com>,
"Haiyang Zhang" <haiyangz@microsoft.com>,
"Wei Liu" <wei.liu@kernel.org>,
"Dexuan Cui" <decui@microsoft.com>,
"Long Li" <longli@microsoft.com>,
"Guenter Roeck" <linux@roeck-us.net>,
"Frederic Weisbecker" <frederic@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
"Neeraj Upadhyay" <neeraj.upadhyay@kernel.org>,
"Joel Fernandes" <joelagnelf@nvidia.com>,
"Josh Triplett" <josh@joshtriplett.org>,
"Boqun Feng" <boqun@kernel.org>,
"Uladzislau Rezki" <urezki@gmail.com>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>,
"Lai Jiangshan" <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang@linux.dev>,
"Anna-Maria Behnsen" <anna-maria@linutronix.de>,
"Ingo Molnar" <mingo@kernel.org>,
"Thomas Gleixner" <tglx@kernel.org>,
"Chen Ridong" <chenridong@huaweicloud.com>,
"Peter Zijlstra" <peterz@infradead.org>,
"Juri Lelli" <juri.lelli@redhat.com>,
"Vincent Guittot" <vincent.guittot@linaro.org>,
"Dietmar Eggemann" <dietmar.eggemann@arm.com>,
"Ben Segall" <bsegall@google.com>, "Mel Gorman" <mgorman@suse.de>,
"Valentin Schneider" <vschneid@redhat.com>,
"K Prateek Nayak" <kprateek.nayak@amd.com>,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>
Cc: cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-hyperv@vger.kernel.org, linux-hwmon@vger.kernel.org,
rcu@vger.kernel.org, netdev@vger.kernel.org,
linux-kselftest@vger.kernel.org,
Costa Shulyupin <cshulyup@redhat.com>,
Qiliang Yuan <realwujing@gmail.com>,
Waiman Long <longman@redhat.com>
Subject: [PATCH 23/23] cgroup/cpuset: Documentation and kselftest updates
Date: Mon, 20 Apr 2026 23:03:51 -0400 [thread overview]
Message-ID: <20260421030351.281436-24-longman@redhat.com> (raw)
In-Reply-To: <20260421030351.281436-1-longman@redhat.com>
As CPU hotplug is now being used to enable runtime update to the list
of nohz_full and managed_irq CPUs, we should avoid using CPU 0 in the
formation of isolated partition as CPU 0 may not be able to be brought
offline like in the case of x86-64 architecture. So a number of the
test cases in test_cpuset_prs.sh will have to be updated accordingly.
A new test will also be run in offline isn't allowed in CPU 0 to verify
that using CPU 0 as part of an isolated partition will fail.
The cgroup-v2.rst is also updated to reflect the new capability of using
CPU hotplug to enable run time change to the nohz_full and managed_irq
CPU lists.
Since there is a slight performance overhead to enable runtime changes
to nohz_full CPU list, users have to explicitly opt in by adding a
"nohz_ful" kernel command line parameter with or without a CPU list.
Signed-off-by: Waiman Long <longman@redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 35 +++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 70 +++++++++++++++++--
2 files changed, 92 insertions(+), 13 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8ad0b2781317..e97fc031eb86 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2604,11 +2604,12 @@ Cpuset Interface Files
It accepts only the following input values when written to.
- ========== =====================================
+ ========== ===============================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
- ========== =====================================
+ "isolated" Partition root without load balancing and other
+ OS noises
+ ========== ===============================================
A cpuset partition is a collection of cpuset-enabled cgroups with
a partition root at the top of the hierarchy and its descendants
@@ -2652,11 +2653,29 @@ Cpuset Interface Files
partition or scheduling domain. The set of exclusive CPUs is
determined by the value of its "cpuset.cpus.exclusive.effective".
- When set to "isolated", the CPUs in that partition will be in
- an isolated state without any load balancing from the scheduler
- and excluded from the unbound workqueues. Tasks placed in such
- a partition with multiple CPUs should be carefully distributed
- and bound to each of the individual CPUs for optimal performance.
+ When set to "isolated", the CPUs in that partition will be in an
+ isolated state without any load balancing from the scheduler and
+ excluded from the unbound workqueues as well as other OS noises.
+ Tasks placed in such a partition with multiple CPUs should be
+ carefully distributed and bound to each of the individual CPUs
+ for optimal performance.
+
+ As CPU hotplug, if supported, is used to improve the degree of
+ CPU isolation close to the "nohz_full" kernel boot parameter.
+ In some architectures, like x86-64, the boot CPU (typically CPU
+ 0) cannot be brought offline, so the boot CPU should not be used
+ for forming isolated partitions. The "nohz_full" kernel boot
+ parameter needs to be present to enable full dynticks support
+ and RCU no-callback CPU mode for CPUs in isolated partitions
+ even if the optional cpu list isn't provided.
+
+ Using CPU hotplug for creating or destroying an isolated
+ partition can cause latency spike in applications running
+ in other isolated partitions. A reserved list of CPUs can
+ optionally be put in the "nohz_full" kernel boot parameter to
+ alleviate this problem. When these reserved CPUs are used for
+ isolated partitions, CPU hotplug won't need to be invoked and
+ so there won't be latency spike in other isolated partitions.
A partition root ("root" or "isolated") can be in one of the
two possible states - valid or invalid. An invalid partition
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index a56f4153c64d..eebb4122b581 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -67,6 +67,12 @@ then
echo Y > /sys/kernel/debug/sched/verbose
fi
+# Enable dynamic debug message if available
+DYN_DEBUG=/proc/dynamic_debug/control
+[[ -f $DYN_DEBUG ]] && {
+ echo "file kernel/cpu.c +p" > $DYN_DEBUG
+}
+
cd $CGROUP2
echo +cpuset > cgroup.subtree_control
@@ -84,6 +90,15 @@ echo member > test/cpuset.cpus.partition
echo "" > test/cpuset.cpus
[[ $RESULT -eq 0 ]] && skip_test "Child cgroups are using cpuset!"
+#
+# If nohz_full parameter is specified and nohz_full file exists, CPU hotplug
+# will be used to modify nohz_full cpumask to include all the isolated CPUs
+# in cpuset isolated partitions.
+#
+NOHZ_FULL=/sys/devices/system/cpu/nohz_full
+BOOT_NOHZ_FULL=$(fmt -1 /proc/cmdline | grep "^nohz_full")
+[[ "$BOOT_NOHZ_FULL" = nohz_full ]] && CHK_NOHZ_FULL=1
+
#
# If isolated CPUs have been reserved at boot time (as shown in
# cpuset.cpus.isolated), these isolated CPUs should be outside of CPUs 0-8
@@ -318,8 +333,8 @@ TEST_MATRIX=(
# Invalid to valid local partition direct transition tests
" C1-3:P2 X4:P2 . . . . . . 0 A1:1-3|XA1:1-3|A2:1-3:XA2: A1:P2|A2:P-2 1-3"
" C1-3:P2 X4:P2 . . . X3:P2 . . 0 A1:1-2|XA1:1-3|A2:3:XA2:3 A1:P2|A2:P2 1-3"
- " C0-3:P2 . . C4-6 C0-4 . . . 0 A1:0-4|B1:5-6 A1:P2|B1:P0"
- " C0-3:P2 . . C4-6 C0-4:C0-3 . . . 0 A1:0-3|B1:4-6 A1:P2|B1:P0 0-3"
+ " C1-3:P2 . . C4-6 C1-4 . . . 0 A1:1-4|B1:5-6 A1:P2|B1:P0"
+ " C1-3:P2 . . C4-6 C1-4:C1-3 . . . 0 A1:1-3|B1:4-6 A1:P2|B1:P0 1-3"
# Local partition invalidation tests
" C0-3:X1-3:P2 C1-3:X2-3:P2 C2-3:X3:P2 \
@@ -329,8 +344,8 @@ TEST_MATRIX=(
" C0-3:X1-3:P2 C1-3:X2-3:P2 C2-3:X3:P2 \
. . C4:X . . 0 A1:1-3|A2:1-3|A3:2-3|XA2:|XA3: A1:P2|A2:P-2|A3:P-2 1-3"
# Local partition CPU change tests
- " C0-5:P2 C4-5:P1 . . . C3-5 . . 0 A1:0-2|A2:3-5 A1:P2|A2:P1 0-2"
- " C0-5:P2 C4-5:P1 . . C1-5 . . . 0 A1:1-3|A2:4-5 A1:P2|A2:P1 1-3"
+ " C1-5:P2 C4-5:P1 . . . C3-5 . . 0 A1:1-2|A2:3-5 A1:P2|A2:P1 1-2"
+ " C1-5:P2 C4-5:P1 . . C2-5 . . . 0 A1:2-3|A2:4-5 A1:P2|A2:P1 2-3"
# cpus_allowed/exclusive_cpus update tests
" C0-3:X2-3 C1-3:X2-3 C2-3:X2-3 \
@@ -442,6 +457,21 @@ TEST_MATRIX=(
" C0-3 . . C4-5 X3-5 . . . 1 A1:0-3|B1:4-5"
)
+#
+# Test matrix to verify that using CPU 0 in isolated (local or remote) partition
+# will fail when offline isn't allowed for CPU 0.
+#
+CPU0_ISOLCPUS_MATRIX=(
+ # old-A1 old-A2 old-A3 old-B1 new-A1 new-A2 new-A3 new-B1 fail ECPUs Pstate ISOLCPUS
+ # ------ ------ ------ ------ ------ ------ ------ ------ ---- ----- ------ --------
+ " C0-3 . . C4-5 P2 . . . 0 A1:0-3|B1:4-5 A1:P-2"
+ " C1-3 . . . P2 . . . 0 A1:1-3 A1:P2"
+ " C1-3 . . . P2:C0-3 . . . 0 A1:0-3 A1:P-2"
+ " CX0-3 C0-3 . . . P2 . . 0 A1:0-3|A2:0-3 A2:P-2"
+ " CX0-3 C0-3:X1-3 . . . P2 . . 0 A1:0|A2:1-3 A2:P2"
+ " CX0-3 C0-3:X1-3 . . . P2:X0-3 . . 0 A1:0-3|A2:0-3 A2:P-2"
+)
+
#
# Cpuset controller remote partition test matrix.
#
@@ -513,7 +543,7 @@ write_cpu_online()
}
fi
echo $VAL > $CPUFILE
- pause 0.05
+ pause 0.10
}
#
@@ -654,6 +684,8 @@ dump_states()
[[ -e $PCPUS ]] && echo "$PCPUS: $(cat $PCPUS)"
[[ -e $ISCPUS ]] && echo "$ISCPUS: $(cat $ISCPUS)"
done
+ # Dump nohz_full
+ [[ -f $NOHZ_FULL ]] && echo "nohz_full: $(cat $NOHZ_FULL)"
}
#
@@ -789,6 +821,18 @@ check_isolcpus()
EXPECTED_SDOMAIN=$EXPECTED_ISOLCPUS
fi
+ #
+ # Check if nohz_full match cpuset.cpus.isolated if nohz_boot parameter
+ # specified with no parameter.
+ #
+ [[ -f $NOHZ_FULL && "$BOOT_NOHZ_FULL" = nohz_full ]] && {
+ NOHZ_FULL_CPUS=$(cat $NOHZ_FULL)
+ [[ "$ISOLCPUS" != "$NOHZ_FULL_CPUS" ]] && {
+ echo "nohz_full ($NOHZ_FULL_CPUS) does not match cpuset.cpus.isolated ($ISOLCPUS)"
+ return 1
+ }
+ }
+
#
# Appending pre-isolated CPUs
# Even though CPU #8 isn't used for testing, it can't be pre-isolated
@@ -1070,6 +1114,21 @@ run_remote_state_test()
echo "All $I tests of $TEST PASSED."
}
+#
+# Testing CPU 0 isolated partition test when offline is disabled
+#
+run_cpu0_isol_test()
+{
+ # Skip the test if CPU0 offline is allowed or if nohz_full kernel
+ # boot parameter is missing.
+ CPU0_ONLINE=/sys/devices/system/cpu/cpu0/online
+ [[ -f $CPU0_ONLINE ]] && return
+ grep -q -w nohz_full /proc/cmdline
+ [[ $? -ne 0 ]] && return
+
+ run_state_test CPU0_ISOLCPUS_MATRIX
+}
+
#
# Testing the new "isolated" partition root type
#
@@ -1207,6 +1266,7 @@ test_inotify()
trap cleanup 0 2 3 6
run_state_test TEST_MATRIX
run_remote_state_test REMOTE_TEST_MATRIX
+run_cpu0_isol_test
test_isolated
test_inotify
echo "All tests PASSED."
--
2.53.0
next prev parent reply other threads:[~2026-04-21 3:07 UTC|newest]
Thread overview: 66+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-21 3:03 [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs Waiman Long
2026-04-21 3:03 ` [PATCH 01/23] sched/isolation: Add HK_TYPE_KERNEL_NOISE_BOOT & HK_TYPE_MANAGED_IRQ_BOOT Waiman Long
2026-04-21 3:03 ` [PATCH 02/23] sched/isolation: Enhance housekeeping_update() to support updating more than one HK cpumask Waiman Long
2026-04-22 3:08 ` sashiko-bot
2026-04-22 6:39 ` Chen Ridong
2026-04-21 3:03 ` [PATCH 03/23] tick/nohz: Make nohz_full parameter optional Waiman Long
2026-04-21 8:32 ` Thomas Gleixner
2026-04-21 14:14 ` Waiman Long
2026-04-24 15:57 ` Frederic Weisbecker
2026-04-22 3:08 ` sashiko-bot
2026-04-21 3:03 ` [PATCH 04/23] tick/nohz: Allow runtime changes in full dynticks CPUs Waiman Long
2026-04-21 8:50 ` Thomas Gleixner
2026-04-21 14:24 ` Waiman Long
2026-05-13 13:04 ` Frederic Weisbecker
2026-04-22 3:08 ` sashiko-bot
2026-04-21 3:03 ` [PATCH 05/23] tick: Pass timer tick job to an online HK CPU in tick_cpu_dying() Waiman Long
2026-04-21 8:55 ` Thomas Gleixner
2026-04-21 14:22 ` Waiman Long
2026-04-21 3:03 ` [PATCH 06/23] rcu/nocbs: Allow runtime changes in RCU NOCBS cpumask Waiman Long
2026-04-22 3:08 ` sashiko-bot
2026-04-23 2:05 ` Waiman Long
2026-04-21 3:03 ` [PATCH 07/23] watchdog: Sync up with runtime change of isolated CPUs Waiman Long
2026-04-22 3:08 ` sashiko-bot
2026-04-23 2:14 ` Waiman Long
2026-04-21 3:03 ` [PATCH 08/23] arm64: topology: Use RCU to protect access to HK_TYPE_TICK cpumask Waiman Long
2026-04-22 3:08 ` sashiko-bot
2026-04-22 9:34 ` Chen Ridong
2026-05-13 16:19 ` Frederic Weisbecker
2026-04-21 3:03 ` [PATCH 09/23] workqueue: Use RCU to protect access of HK_TYPE_TIMER cpumask Waiman Long
2026-04-21 3:03 ` [PATCH 10/23] cpu: " Waiman Long
2026-04-21 8:57 ` Thomas Gleixner
2026-04-21 14:25 ` Waiman Long
2026-04-21 3:03 ` [PATCH 11/23] hrtimer: " Waiman Long
2026-04-21 8:59 ` Thomas Gleixner
2026-04-22 3:09 ` sashiko-bot
2026-04-21 3:03 ` [PATCH 12/23] net: Use boot time housekeeping cpumask settings for now Waiman Long
2026-04-21 3:03 ` [PATCH 13/23] sched/core: Use RCU to protect access of HK_TYPE_KERNEL_NOISE cpumask Waiman Long
2026-04-22 3:09 ` sashiko-bot
2026-04-23 14:37 ` Waiman Long
2026-04-21 3:03 ` [PATCH 14/23] hwmon/coretemp: Use RCU to protect access of HK_TYPE_MISC cpumask Waiman Long
2026-04-22 3:09 ` sashiko-bot
2026-04-21 3:03 ` [PATCH 15/23] Drivers: hv: Use RCU to protect access of HK_TYPE_MANAGED_IRQ cpumask Waiman Long
2026-04-22 3:09 ` sashiko-bot
2026-04-23 17:14 ` Waiman Long
2026-04-21 3:03 ` [PATCH 16/23] genirq/cpuhotplug: " Waiman Long
2026-04-21 9:02 ` Thomas Gleixner
2026-04-21 14:29 ` Waiman Long
2026-04-21 3:03 ` [PATCH 17/23] sched/isolation: Extend housekeeping_dereference_check() to cover changes in nohz_full or manged_irqs cpumasks Waiman Long
2026-04-22 3:09 ` sashiko-bot
2026-04-23 17:30 ` Waiman Long
2026-04-21 3:03 ` [PATCH 18/23] cpu/hotplug: Add a new cpuhp_offline_cb() API Waiman Long
2026-04-21 16:17 ` Thomas Gleixner
2026-04-21 17:29 ` Waiman Long
2026-04-21 18:43 ` Thomas Gleixner
2026-04-22 3:09 ` sashiko-bot
2026-04-21 3:03 ` [PATCH 19/23] cgroup/cpuset: Improve check for calling housekeeping_update() Waiman Long
2026-04-23 1:10 ` Chen Ridong
2026-04-24 18:32 ` Waiman Long
2026-04-21 3:03 ` [PATCH 20/23] cgroup/cpuset: Enable runtime update of HK_TYPE_{KERNEL_NOISE,MANAGED_IRQ} cpumasks Waiman Long
2026-04-22 3:09 ` sashiko-bot
2026-04-21 3:03 ` [PATCH 21/23] cgroup/cpuset: Limit the side effect of using CPU hotplug on isolated partition Waiman Long
2026-04-22 3:09 ` sashiko-bot
2026-04-21 3:03 ` [PATCH 22/23] cgroup/cpuset: Prevent offline_disabled CPUs from being used in " Waiman Long
2026-04-22 3:09 ` sashiko-bot
2026-04-21 3:03 ` Waiman Long [this message]
2026-04-22 3:09 ` [PATCH 23/23] cgroup/cpuset: Documentation and kselftest updates sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260421030351.281436-24-longman@redhat.com \
--to=longman@redhat.com \
--cc=anna-maria@linutronix.de \
--cc=boqun@kernel.org \
--cc=bsegall@google.com \
--cc=catalin.marinas@arm.com \
--cc=cgroups@vger.kernel.org \
--cc=chenridong@huaweicloud.com \
--cc=corbet@lwn.net \
--cc=cshulyup@redhat.com \
--cc=davem@davemloft.net \
--cc=decui@microsoft.com \
--cc=dietmar.eggemann@arm.com \
--cc=edumazet@google.com \
--cc=frederic@kernel.org \
--cc=haiyangz@microsoft.com \
--cc=hannes@cmpxchg.org \
--cc=horms@kernel.org \
--cc=jiangshanlai@gmail.com \
--cc=joelagnelf@nvidia.com \
--cc=josh@joshtriplett.org \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=kuba@kernel.org \
--cc=kys@microsoft.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-hwmon@vger.kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux@roeck-us.net \
--cc=longli@microsoft.com \
--cc=mathieu.desnoyers@efficios.com \
--cc=mgorman@suse.de \
--cc=mingo@kernel.org \
--cc=mkoutny@suse.com \
--cc=neeraj.upadhyay@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=qiang.zhang@linux.dev \
--cc=rcu@vger.kernel.org \
--cc=realwujing@gmail.com \
--cc=rostedt@goodmis.org \
--cc=skhan@linuxfoundation.org \
--cc=tglx@kernel.org \
--cc=tj@kernel.org \
--cc=urezki@gmail.com \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=wei.liu@kernel.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.