* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 14:00 [PATCH v2] doc: Add CPU Isolation documentation Frederic Weisbecker
@ 2026-03-26 19:17 ` Waiman Long
2026-03-26 21:42 ` Randy Dunlap
2026-03-27 16:01 ` Valentin Schneider
2 siblings, 0 replies; 10+ messages in thread
From: Waiman Long @ 2026-03-26 19:17 UTC
To: Frederic Weisbecker, LKML
Cc: Anna-Maria Behnsen, Gabriele Monaco, Ingo Molnar, Jonathan Corbet,
Marcelo Tosatti, Marco Crivellari, Michal Hocko,
Paul E . McKenney, Peter Zijlstra, Phil Auld, Steven Rostedt,
Thomas Gleixner, Valentin Schneider, Vlastimil Babka, linux-doc,
Sebastian Andrzej Siewior, Bagas Sanjaya
On 3/26/26 10:00 AM, Frederic Weisbecker wrote:
> nohz_full was introduced in v3.10 in 2013, which means this
> documentation is overdue for 13 years.
>
> Fortunately Paul wrote a part of the needed documentation a while ago,
> especially concerning nohz_full in Documentation/timers/no_hz.rst and
> also about per-CPU kthreads in
> Documentation/admin-guide/kernel-per-CPU-kthreads.rst
>
> Introduce a new page that gives an overview of CPU isolation in general.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> v2:
> - Fix links and code blocks (Bagas and Sebastian)
> - Isolation is not only about userspace, rephrase accordingly (Valentin)
> - Paste BIOS issues suggestion from Valentin
> - Include the whole rtla suite (Valentin)
> - Rephrase a few details (Waiman)
> - Talk about RCU induced overhead rather than slower RCU (Sebastian)
>
> Documentation/admin-guide/cpu-isolation.rst | 357 ++++++++++++++++++++
> Documentation/admin-guide/index.rst | 1 +
> 2 files changed, 358 insertions(+)
> create mode 100644 Documentation/admin-guide/cpu-isolation.rst
>
> diff --git a/Documentation/admin-guide/cpu-isolation.rst b/Documentation/admin-guide/cpu-isolation.rst
> new file mode 100644
> index 000000000000..886dec79b056
> --- /dev/null
> +++ b/Documentation/admin-guide/cpu-isolation.rst
> @@ -0,0 +1,357 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=============
> +CPU Isolation
> +=============
> +
> +Introduction
> +============
> +
> +"CPU Isolation" means leaving a CPU exclusive to a given workload
> +without any undesired code interference from the kernel.
> +
> +Those interferences, commonly pointed out as "noise", can be triggered
> +by asynchronous events (interrupts, timers, scheduler preemption by
> +workqueues and kthreads, ...) or synchronous events (syscalls and page
> +faults).
> +
> +Such noise usually goes unnoticed. After all synchronous events are a
> +component of the requested kernel service. And asynchronous events are
> +either sufficiently well distributed by the scheduler when executed
> +as tasks or reasonably fast when executed as interrupt. The timer
> +interrupt can even execute 1024 times per seconds without a significant
> +and measurable impact most of the time.
> +
> +However some rare and extreme workloads can be quite sensitive to
> +those kinds of noise. This is the case, for example, with high
> +bandwidth network processing that can't afford losing a single packet
> +or very low latency network processing. Typically those usecases
> +involve DPDK, bypassing the kernel networking stack and performing
> +direct access to the networking device from userscace.
As also pointed out by Sashiko, there is a typo: "userscace" ->
"userspace". Other typos are reported at
https://sashiko.dev/#/patchset/20260326140055.41555-1-frederic%40kernel.org
> +
> +In order to run a CPU without or with limited kernel noise, the
> +related housekeeping work needs to be either shutdown, migrated or
> +offloaded.
> +
> +Housekeeping
> +============
> +
> +In the CPU isolation terminology, housekeeping is the work, often
> +asynchronous, that the kernel needs to process in order to maintain
> +all its services. It matches the noises and disturbances enumerated
> +above except when at least one CPU is isolated. Then housekeeping may
> +make use of further coping mechanisms if CPU-tied work must be
> +offloaded.
> +
> +Housekeeping CPUs are the non-isolated CPUs where the kernel noise
> +is moved away from isolated CPUs.
> +
> +The isolation can be implemented in several ways depending on the
> +nature of the noise:
> +
> +- Unbound work, where "unbound" means not tied to any CPU, can be
> + simply migrated away from isolated CPUs to housekeeping CPUs.
> + This is the case of unbound workqueues, kthreads and timers.
> +
> +- Bound work, where "bound" means tied to a specific CPU, usually
> + can't be moved away as-is by nature. Either:
> +
> + - The work must switch to a locked implementation. Eg: This is
> + the case of RCU with CONFIG_RCU_NOCB_CPU.
> +
> + - The related feature must be shutdown and considered
> + incompatible with isolated CPUs. Eg: Lockup watchdog,
> + unreliable clocksources, etc...
> +
> + - An elaborated and heavyweight coping mechanism stands as a
> + replacement. Eg: the timer tick is shutdown on nohz_full but
"shutdown" should be two words ("shut down") here, since "shutdown" isn't
a verb. Should we add "CPU" after "nohz_full" to make it clearer?
> + with the constraint of running a single task on the CPU. A
> + significant cost penalty is added on kernel entry/exit and
> + a residual 1Hz scheduler tick is offloaded to housekeeping
> + CPUs.
> +
> +In any case, housekeeping work has to be handled, which is why there
> +must be at least one housekeeping CPU in the system, preferrably more
> +if the machine runs a lot of CPUs. For example one per node on NUMA
> +systems.
> +
> +Also CPU isolation often means a tradeoff between noise-free isolated
> +CPUs and added overhead on housekeeping CPUs, sometimes even on
> +isolated CPUs entering the kernel.
> +
> +Isolation features
> +==================
> +
> +Different levels of isolation can be configured in the kernel, each of
> +which having their own drawbacks and tradeoffs.
> +
> +Scheduler domain isolation
> +--------------------------
> +
> +This feature isolates a CPU from the scheduler topology. As a result,
> +the target isn't part of the load balancing. Tasks won't migrate
> +neither from nor to it unless affined explicitly.
> +
> +As a side effect the CPU is also isolated from unbound workqueues and
> +unbound kthreads.
> +
> +Requirements
> +~~~~~~~~~~~~
> +
> +- CONFIG_CPUSETS=y for the cpusets based interface
> +
> +Tradeoffs
> +~~~~~~~~~
> +
> +By nature, the system load is overall less distributed since some CPUs
> +are extracted from the global load balancing.
> +
> +Interface
> +~~~~~~~~~
> +
> +- Documentation/admin-guide/cgroup-v2.rst cpuset isolated partitions are recommended
> + because they are tunable at runtime.
> +
> +- The 'isolcpus=' kernel boot parameter with the 'domain' flag is a
> + less flexible alternative that doesn't allow for runtime
> + reconfiguration.
> +
> +IRQs isolation
> +--------------
> +
> +Isolate the IRQs whenever possible, so that they don't fire on the
> +target CPUs.
> +
> +Interface
> +~~~~~~~~~
> +
> +- The file /proc/irq/\*/smp_affinity as explained in detail in
> + Documentation/core-api/irq/irq-affinity.rst page.
> +
> +- The "irqaffinity=" kernel boot parameter for a default setting.
> +
> +- The "managed_irq" flag in the "isolcpus=" kernel boot parameter
> + tries a best effort affinity override for managed IRQs.
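For readers following along, a minimal sketch of how the smp_affinity
interface listed above might be driven from a shell. The helper name
cpulist_to_mask is hypothetical (it is not part of the patch), and the
sketch assumes fewer than 63 CPUs so that the mask fits in shell
arithmetic. Larger masks are shown by the kernel as comma-separated
32-bit groups, which this sketch does not handle.

```shell
# Hypothetical helper: convert a CPU list such as "0-6" or "0-2,4" into
# the hexadecimal bitmask format used by /proc/irq/<N>/smp_affinity.
# Assumption: fewer than 63 CPUs, so the mask fits in shell arithmetic.
cpulist_to_mask() {
    mask=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        # "7" yields first=7 last=7, "0-6" yields first=0 last=6
        first=${range%-*}
        last=${range#*-}
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$((mask | (1 << cpu)))
            cpu=$((cpu + 1))
        done
    done
    IFS=$old_ifs
    printf '%x\n' "$mask"
}

# Possible usage as root (left commented out on purpose; some IRQs,
# such as managed ones, are expected to reject the write):
#   for irq in /proc/irq/[0-9]*; do
#       cpulist_to_mask 0-6 > "$irq/smp_affinity" 2>/dev/null
#   done
```

With the example layout used later in the patch (housekeeping CPUs 0-6),
the helper prints 7f.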
> +
> +Full Dynticks (aka nohz_full)
> +-----------------------------
> +
> +Full dynticks extends the dynticks idle mode, which stop the tick when
> +the CPU is idle, to CPUs running a single task in userspace. That is,
> +the timer tick is stopped if the environment allows it.
> +
> +Global timer callbacks are also isolated from the nohz_full CPUs.
> +
> +Requirements
> +~~~~~~~~~~~~
> +
> +- CONFIG_NO_HZ_FULL=y
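One way to confirm at runtime that a CPU really runs in full dynticks
mode might be to read /sys/devices/system/cpu/nohz_full, which lists the
nohz_full CPUs in the usual cpulist format (for example "0-3,7"). A
minimal sketch; cpu_in_list is a hypothetical helper name, not something
the patch provides.

```shell
# Hypothetical helper: succeed if CPU number $1 appears in cpulist $2
# (ranges and commas, for example "0-3,7").
cpu_in_list() {
    cpu=$1
    old_ifs=$IFS
    IFS=,
    for range in $2; do
        # Skip empty or non-numeric entries such as "(null)"
        case "$range" in
            ''|*[!0-9-]*) continue ;;
        esac
        first=${range%-*}
        last=${range#*-}
        if [ "$cpu" -ge "$first" ] && [ "$cpu" -le "$last" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Possible usage (commented out; the file only exists with
# CONFIG_NO_HZ_FULL=y):
#   cpu_in_list 7 "$(cat /sys/devices/system/cpu/nohz_full)" \
#       && echo "CPU 7 is nohz_full"
```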
> +
> +Constraints
> +~~~~~~~~~~~
> +
> +- The isolated CPUs must run a single task only. Multitask requires
> + the tick to maintain preemption. This is usually fine since the
> + workload usually can't stand the latency of random context switches.
> +
> +- No call to the kernel from isolated CPUs, at the risk of triggering
> + random noise.
> +
> +- No use of posix CPU timers on isolated CPUs.
> +
> +- Architecture must have a stable and reliable clocksource (no
> + unreliable TSC that requires the watchdog).
> +
> +
> +Tradeoffs
> +~~~~~~~~~
> +
> +In terms of cost, this is the most invasive isolation feature. It is
> +assumed to be used when the workload spends most of its time in
> +userspace and doesn't rely on the kernel except for preparatory
> +work because:
> +
> +- RCU adds more overhead due to the locked, offloaded and threaded
> + callbacks processing (the same that would be obtained with "rcu_nocb"
> + boot parameter).
It should be "rcu_nocbs".
> +
> +- Kernel entry/exit through syscalls, exceptions and IRQs are more
> + costly due to fully ordered RmW operations that maintain userspace
> + as RCU extended quiescent state. Also the CPU time is accounted on
> + kernel boundaries instead of periodically from the tick.
> +
> +- Housekeeping CPUs must run a 1Hz residual remote scheduler tick
> + on behalf of the isolated CPUs.
> +
> +Checklist
> +=========
> +
> +You have set up each of the above isolation features but you still
> +observe jitters that trash your workload? Make sure to check a few
> +elements before proceeding.
> +
> +Some of these checklist items are similar to those of real time
> +workloads:
> +
> +- Use mlock() to prevent your pages from being swapped away. Page
> + faults are usually not compatible with jitter sensitive workloads.
> +
> +- Avoid SMT to prevent your hardware thread from being "preempted"
> + by another one.
> +
> +- CPU frequency changes may induce subtle sorts of jitter in a
> + workload. Cpufreq should be used and tuned with caution.
> +
> +- Deep C-states may result in latency issues upon wake-up. If this
> + happens to be a problem, C-states can be limited via kernel boot
> + parameters such as processor.max_cstate or intel_idle.max_cstate.
> + More finegrained tunings are described in
> + Documentation/admin-guide/pm/cpuidle.rst page
> +
> +- Your system may be subject to firmware-originating interrupts - x86 has
> + System Management Interrupts (SMIs) for example. Check your system BIOS
> + to disable such interference, and with some luck your vendor will have
> + a BIOS tuning guidance for low-latency operations.
> +
> +
> +Full isolation example
> +======================
> +
> +In this example, the system has 8 CPUs and the 8th is to be fully
> +isolated. Since CPUs start from 0, the 8th CPU is CPU 7.
> +
> +Kernel parameters
> +-----------------
> +
> +Set the following kernel boot parameters to disable SMT and setup tick
> +and IRQ isolation:
> +
> +- Full dynticks: nohz_full=7
> +
> +- IRQs isolation: irqaffinity=0-6
> +
> +- Managed IRQs isolation: isolcpus=managed_irq,7
> +
> +- Prevent from SMT: nosmt
> +
> +The full command line is then:
> +
> + nohz_full=7 irqaffinity=0-6 isolcpus=managed_irq,7 nosmt
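After rebooting, it may be worth checking that the bootloader really
passed all four parameters. A minimal sketch, assuming the running
command line is read from /proc/cmdline; missing_params is a
hypothetical helper name.

```shell
# Hypothetical helper: print every expected parameter that is absent
# from the given kernel command line string.
missing_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        # Pad with spaces so only whole parameters match
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "$param" ;;
        esac
    done
}

# Possible usage (commented out):
#   missing_params "$(cat /proc/cmdline)" \
#       nohz_full=7 irqaffinity=0-6 isolcpus=managed_irq,7 nosmt
```

An empty output means every parameter was found.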
> +
> +CPUSET configuration (cgroup v2)
> +--------------------------------
> +
> +Assuming cgroup v2 is mounted to /sys/fs/cgroup, the following script
> +isolates CPU 7 from scheduler domains.
> +
> +::
> +
> + cd /sys/fs/cgroup
> + # Activate the cpuset subsystem
> + echo +cpuset > cgroup.subtree_control
> + # Create partition to be isolated
> + mkdir test
> + cd test
> + echo +cpuset > cgroup.subtree_control
> + # Isolate CPU 7
> + echo 7 > cpuset.cpus
> + echo "isolated" > cpuset.cpus.partition
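The write to cpuset.cpus.partition can be accepted while the partition
still ends up invalid, so reading the file back is a useful sanity
check. A minimal sketch, assuming the cgroup v2 behaviour where the file
reads back either "isolated" or "isolated invalid (<reason>)";
check_partition is a hypothetical helper name.

```shell
# Hypothetical helper: read a cpuset.cpus.partition file and report
# whether the partition is a valid isolated one.
check_partition() {
    state=$(cat "$1") || return 2
    case "$state" in
        isolated)
            echo "ok"
            ;;
        *invalid*)
            # The kernel appends the reason in parentheses
            echo "invalid: $state"
            return 1
            ;;
        *)
            echo "not isolated: $state"
            return 1
            ;;
    esac
}

# Possible usage (commented out):
#   check_partition /sys/fs/cgroup/test/cpuset.cpus.partition
```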
> +
> +The userspace workload
> +----------------------
> +
> +Fake a pure userspace workload, the below program runs a dummy
> +userspace loop on the isolated CPU 7.
> +
> +::
> +
> + #include <stdio.h>
> + #include <fcntl.h>
> + #include <unistd.h>
> + #include <errno.h>
> + int main(void)
> + {
> + // Move the current task to the isolated cpuset (bind to CPU 7)
> + int fd = open("/sys/fs/cgroup/test/cgroup.procs", O_WRONLY);
> + if (fd < 0) {
> + perror("Can't open cpuset file...\n");
> + return 0;
> + }
> +
> + write(fd, "0\n", 2);
> + close(fd);
> +
> + // Run an endless dummy loop until the launcher kills us
> + while (1)
> + ;
> +
> + return 0;
> + }
> +
> +Build it and save for later step:
> +
> +::
> +
> + # gcc user_loop.c -o user_loop
> +
> +The launcher
> +------------
> +
> +The below launcher runs the above program for 10 seconds and traces
> +the noise resulting from preempting tasks and IRQs.
> +
> +::
> +
> + TRACING=/sys/kernel/tracing/
> + # Make sure tracing is off for now
> + echo 0 > $TRACING/tracing_on
> + # Flush previous traces
> + echo > $TRACING/trace
> + # Record disturbance from other tasks
> + echo 1 > $TRACING/events/sched/sched_switch/enable
> + # Record disturbance from interrupts
> + echo 1 > $TRACING/events/irq_vectors/enable
> + # Now we can start tracing
> + echo 1 > $TRACING/tracing_on
> + # Run the dummy user_loop for 10 seconds on CPU 7
> + ./user_loop &
> + USER_LOOP_PID=$!
> + sleep 10
> + kill $USER_LOOP_PID
> + # Disable tracing and save traces from CPU 7 in a file
> + echo 0 > $TRACING/tracing_on
> + cat $TRACING/per_cpu/cpu7/trace > trace.7
> +
> +If no specific problem arose, the output of trace.7 should look like
> +the following:
> +
> +::
> +
> + <idle>-0 [007] d..2. 1980.976624: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=user_loop next_pid=1553 next_prio=120
> + user_loop-1553 [007] d.h.. 1990.946593: reschedule_entry: vector=253
> + user_loop-1553 [007] d.h.. 1990.946593: reschedule_exit: vector=253
> +
> +That is, no specific noise triggered between the first trace and the
> +second during 10 seconds when user_loop was running.
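For longer runs, eyeballing trace.7 gets tedious. A minimal sketch of a
per-event counter, assuming the default trace line layout shown above
(task-pid, CPU, flags, timestamp, event name) and task names without
spaces; summarize_trace is a hypothetical helper name.

```shell
# Hypothetical helper: count trace events by name in a saved trace file.
# Assumes the event name is the fifth whitespace-separated field, which
# holds for the default layout when the task name contains no spaces.
summarize_trace() {
    awk '!/^#/ && NF >= 5 {
             ev = $5
             sub(/:$/, "", ev)      # strip the trailing colon
             count[ev]++
         }
         END { for (e in count) print count[e], e }' "$1" | sort -rn
}

# Possible usage (commented out):
#   summarize_trace trace.7
```

On a quiet run this would list a handful of events with a count of 1
each; large counts for a single event point at the subsystem to look at
first.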
> +
> +Debugging
> +=========
> +
> +Of course things are never so easy, especially on this matter.
> +Chances are that actual noise will be observed in the aforementioned
> +trace.7 file.
> +
> +The best way to investigate further is to enable finer grained
> +tracepoints such as those of subsystems producing asynchronous
> +events: workqueue, timer, irq_vector, etc... It also can be
> +interesting to enable the tick_stop event to diagnose why the tick is
> +retained when that happens.
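A minimal sketch of how the extra tracepoints mentioned above could be
switched on before rerunning the launcher. enable_events is a
hypothetical helper name; the event paths in the usage comment assume
that tick_stop lives under the timer event group and that the x86
interrupt events are named irq_vectors, as in the launcher script.

```shell
# Hypothetical helper: enable the given event groups (or single events,
# written as "group/event") under a tracing directory, skipping any
# that do not exist or are not writable on the running kernel.
enable_events() {
    tracing=$1
    shift
    for ev in "$@"; do
        f="$tracing/events/$ev/enable"
        if [ -w "$f" ]; then
            echo 1 > "$f"
        else
            echo "skipping $ev (no writable $f)" >&2
        fi
    done
}

# Possible usage as root (commented out):
#   enable_events /sys/kernel/tracing workqueue timer irq_vectors
#   # or only the tick_stop event instead of the whole timer group:
#   enable_events /sys/kernel/tracing timer/tick_stop
```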
> +
> +Some tools may also be useful for higher level analysis:
> +
> +- Documentation/tools/rtla/rtla.rst provides a suite of tools to analyze
> + latency and noise in the system. For example Documentation/tools/rtla/rtla-osnoise.rst
> + runs a kernel tracer that analyzes and output a summary of the noises.
> +
> +- dynticks-testing does something similar to rtla-osnoise but in userspace. It is available
> + at git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index b734f8a2a2c4..cd28dfe91b06 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -94,6 +94,7 @@ likely to be of interest on almost any system.
>
> cgroup-v2
> cgroup-v1/index
> + cpu-isolation
> cpu-load
> mm/index
> module-signing
Other than the minor nits mentioned above,
Acked-by: Waiman Long <longman@redhat.com>
* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 14:00 [PATCH v2] doc: Add CPU Isolation documentation Frederic Weisbecker
2026-03-26 19:17 ` Waiman Long
@ 2026-03-26 21:42 ` Randy Dunlap
2026-03-26 23:00 ` Steven Rostedt
2026-03-27 16:01 ` Valentin Schneider
2 siblings, 1 reply; 10+ messages in thread
From: Randy Dunlap @ 2026-03-26 21:42 UTC
To: Frederic Weisbecker, LKML
Cc: Anna-Maria Behnsen, Gabriele Monaco, Ingo Molnar, Jonathan Corbet,
Marcelo Tosatti, Marco Crivellari, Michal Hocko,
Paul E . McKenney, Peter Zijlstra, Phil Auld, Steven Rostedt,
Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
(Just some small comments -- take them or not.)
On 3/26/26 7:00 AM, Frederic Weisbecker wrote:
> nohz_full was introduced in v3.10 in 2013, which means this
> documentation is overdue for 13 years.
>
> Fortunately Paul wrote a part of the needed documentation a while ago,
> especially concerning nohz_full in Documentation/timers/no_hz.rst and
> also about per-CPU kthreads in
> Documentation/admin-guide/kernel-per-CPU-kthreads.rst
>
> Introduce a new page that gives an overview of CPU isolation in general.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
> v2:
> - Fix links and code blocks (Bagas and Sebastian)
> - Isolation is not only about userspace, rephrase accordingly (Valentin)
> - Paste BIOS issues suggestion from Valentin
> - Include the whole rtla suite (Valentin)
> - Rephrase a few details (Waiman)
> - Talk about RCU induced overhead rather than slower RCU (Sebastian)
>
> Documentation/admin-guide/cpu-isolation.rst | 357 ++++++++++++++++++++
> Documentation/admin-guide/index.rst | 1 +
> 2 files changed, 358 insertions(+)
> create mode 100644 Documentation/admin-guide/cpu-isolation.rst
>
> diff --git a/Documentation/admin-guide/cpu-isolation.rst b/Documentation/admin-guide/cpu-isolation.rst
> new file mode 100644
> index 000000000000..886dec79b056
> --- /dev/null
> +++ b/Documentation/admin-guide/cpu-isolation.rst
> @@ -0,0 +1,357 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=============
> +CPU Isolation
> +=============
> +
> +Introduction
> +============
> +
> +"CPU Isolation" means leaving a CPU exclusive to a given workload
> +without any undesired code interference from the kernel.
> +
> +Those interferences, commonly pointed out as "noise", can be triggered
nit: "noise,"
> +by asynchronous events (interrupts, timers, scheduler preemption by
> +workqueues and kthreads, ...) or synchronous events (syscalls and page
> +faults).
> +
> +Such noise usually goes unnoticed. After all synchronous events are a
maybe (I would): After all,
> +component of the requested kernel service. And asynchronous events are
> +either sufficiently well distributed by the scheduler when executed
well-distributed
> +as tasks or reasonably fast when executed as interrupt. The timer
> +interrupt can even execute 1024 times per seconds without a significant
> +and measurable impact most of the time.
> +
> +However some rare and extreme workloads can be quite sensitive to
> +those kinds of noise. This is the case, for example, with high
> +bandwidth network processing that can't afford losing a single packet
> +or very low latency network processing. Typically those usecases
use cases
> +involve DPDK, bypassing the kernel networking stack and performing
> +direct access to the networking device from userscace.
^^^^^^^^^ (already noted)
> +
> +In order to run a CPU without or with limited kernel noise, the
> +related housekeeping work needs to be either shutdown, migrated or
> +offloaded.
> +
> +Housekeeping
> +============
> +
> +In the CPU isolation terminology, housekeeping is the work, often
> +asynchronous, that the kernel needs to process in order to maintain
> +all its services. It matches the noises and disturbances enumerated
> +above except when at least one CPU is isolated. Then housekeeping may
> +make use of further coping mechanisms if CPU-tied work must be
> +offloaded.
> +
> +Housekeeping CPUs are the non-isolated CPUs where the kernel noise
> +is moved away from isolated CPUs.
> +
> +The isolation can be implemented in several ways depending on the
> +nature of the noise:
> +
> +- Unbound work, where "unbound" means not tied to any CPU, can be
> + simply migrated away from isolated CPUs to housekeeping CPUs.
> + This is the case of unbound workqueues, kthreads and timers.
> +
> +- Bound work, where "bound" means tied to a specific CPU, usually
> + can't be moved away as-is by nature. Either:
> +
> + - The work must switch to a locked implementation. Eg: This is
E.g.: This is
> + the case of RCU with CONFIG_RCU_NOCB_CPU.
> +
> + - The related feature must be shutdown and considered
> + incompatible with isolated CPUs. Eg: Lockup watchdog,
E.g.:
> + unreliable clocksources, etc...
> +
> + - An elaborated and heavyweight coping mechanism stands as a
elaborate
> + replacement. Eg: the timer tick is shutdown on nohz_full but
E.g.:
> + with the constraint of running a single task on the CPU. A
> + significant cost penalty is added on kernel entry/exit and
> + a residual 1Hz scheduler tick is offloaded to housekeeping
> + CPUs.
> +
> +In any case, housekeeping work has to be handled, which is why there
> +must be at least one housekeeping CPU in the system, preferrably more
preferably
> +if the machine runs a lot of CPUs. For example one per node on NUMA
> +systems.
> +
> +Also CPU isolation often means a tradeoff between noise-free isolated
> +CPUs and added overhead on housekeeping CPUs, sometimes even on
> +isolated CPUs entering the kernel.
> +
> +Isolation features
> +==================
> +
> +Different levels of isolation can be configured in the kernel, each of
> +which having their own drawbacks and tradeoffs.
has its own
> +
> +Scheduler domain isolation
> +--------------------------
> +
> +This feature isolates a CPU from the scheduler topology. As a result,
> +the target isn't part of the load balancing. Tasks won't migrate
> +neither from nor to it unless affined explicitly.
either from or to it
> +
> +As a side effect the CPU is also isolated from unbound workqueues and
> +unbound kthreads.
> +
> +Requirements
> +~~~~~~~~~~~~
> +
> +- CONFIG_CPUSETS=y for the cpusets based interface
cpusets-based
> +
> +Tradeoffs
> +~~~~~~~~~
> +
> +By nature, the system load is overall less distributed since some CPUs
> +are extracted from the global load balancing.
> +
> +Interface
> +~~~~~~~~~
Interfaces
~~~~~~~~~~
> +
> +- Documentation/admin-guide/cgroup-v2.rst cpuset isolated partitions are recommended
> + because they are tunable at runtime.
> +
> +- The 'isolcpus=' kernel boot parameter with the 'domain' flag is a
> + less flexible alternative that doesn't allow for runtime
> + reconfiguration.
> +
> +IRQs isolation
> +--------------
> +
> +Isolate the IRQs whenever possible, so that they don't fire on the
> +target CPUs.
> +
> +Interface
> +~~~~~~~~~
Interfaces
~~~~~~~~~~
> +
> +- The file /proc/irq/\*/smp_affinity as explained in detail in
> + Documentation/core-api/irq/irq-affinity.rst page.
> +
> +- The "irqaffinity=" kernel boot parameter for a default setting.
> +
> +- The "managed_irq" flag in the "isolcpus=" kernel boot parameter
> + tries a best effort affinity override for managed IRQs.
> +
> +Full Dynticks (aka nohz_full)
> +-----------------------------
> +
> +Full dynticks extends the dynticks idle mode, which stop the tick when
stops
> +the CPU is idle, to CPUs running a single task in userspace. That is,
> +the timer tick is stopped if the environment allows it.
> +
> +Global timer callbacks are also isolated from the nohz_full CPUs.
> +
> +Requirements
> +~~~~~~~~~~~~
> +
> +- CONFIG_NO_HZ_FULL=y
> +
> +Constraints
> +~~~~~~~~~~~
> +
> +- The isolated CPUs must run a single task only. Multitask requires
> + the tick to maintain preemption. This is usually fine since the
> + workload usually can't stand the latency of random context switches.
> +
> +- No call to the kernel from isolated CPUs, at the risk of triggering
> + random noise.
> +
> +- No use of posix CPU timers on isolated CPUs.
POSIX
> +
> +- Architecture must have a stable and reliable clocksource (no
> + unreliable TSC that requires the watchdog).
> +
> +
> +Tradeoffs
> +~~~~~~~~~
> +
> +In terms of cost, this is the most invasive isolation feature. It is
> +assumed to be used when the workload spends most of its time in
> +userspace and doesn't rely on the kernel except for preparatory
> +work because:
> +
> +- RCU adds more overhead due to the locked, offloaded and threaded
> + callbacks processing (the same that would be obtained with "rcu_nocb"
> + boot parameter).
> +
> +- Kernel entry/exit through syscalls, exceptions and IRQs are more
> + costly due to fully ordered RmW operations that maintain userspace
> + as RCU extended quiescent state. Also the CPU time is accounted on
> + kernel boundaries instead of periodically from the tick.
> +
> +- Housekeeping CPUs must run a 1Hz residual remote scheduler tick
> + on behalf of the isolated CPUs.
> +
> +Checklist
> +=========
> +
> +You have set up each of the above isolation features but you still
> +observe jitters that trash your workload? Make sure to check a few
> +elements before proceeding.
> +
> +Some of these checklist items are similar to those of real time
I prefer "real-time" but it seems that "real time," "realtime," or
"real-time" is OK in the kernel source tree.
> +workloads:
> +
> +- Use mlock() to prevent your pages from being swapped away. Page
> + faults are usually not compatible with jitter sensitive workloads.
> +
> +- Avoid SMT to prevent your hardware thread from being "preempted"
> + by another one.
> +
> +- CPU frequency changes may induce subtle sorts of jitter in a
> + workload. Cpufreq should be used and tuned with caution.
> +
> +- Deep C-states may result in latency issues upon wake-up. If this
> + happens to be a problem, C-states can be limited via kernel boot
> + parameters such as processor.max_cstate or intel_idle.max_cstate.
> + More finegrained tunings are described in
> + Documentation/admin-guide/pm/cpuidle.rst page
> +
> +- Your system may be subject to firmware-originating interrupts - x86 has
> + System Management Interrupts (SMIs) for example. Check your system BIOS
> + to disable such interference, and with some luck your vendor will have
> + a BIOS tuning guidance for low-latency operations.
> +
> +
> +Full isolation example
> +======================
> +
> +In this example, the system has 8 CPUs and the 8th is to be fully
> +isolated. Since CPUs start from 0, the 8th CPU is CPU 7.
> +
> +Kernel parameters
> +-----------------
> +
> +Set the following kernel boot parameters to disable SMT and setup tick
> +and IRQ isolation:
> +
> +- Full dynticks: nohz_full=7
> +
> +- IRQs isolation: irqaffinity=0-6
> +
> +- Managed IRQs isolation: isolcpus=managed_irq,7
> +
> +- Prevent from SMT: nosmt
Prevent SMT: nosmt
> +
> +The full command line is then:
> +
> + nohz_full=7 irqaffinity=0-6 isolcpus=managed_irq,7 nosmt
> +
> +CPUSET configuration (cgroup v2)
> +--------------------------------
> +
> +Assuming cgroup v2 is mounted to /sys/fs/cgroup, the following script
> +isolates CPU 7 from scheduler domains.
> +
> +::
> +
> + cd /sys/fs/cgroup
> + # Activate the cpuset subsystem
> + echo +cpuset > cgroup.subtree_control
> + # Create partition to be isolated
> + mkdir test
> + cd test
> + echo +cpuset > cgroup.subtree_control
> + # Isolate CPU 7
> + echo 7 > cpuset.cpus
> + echo "isolated" > cpuset.cpus.partition
> +
> +The userspace workload
> +----------------------
> +
> +Fake a pure userspace workload, the below program runs a dummy
workload: the program below
> +userspace loop on the isolated CPU 7.
> +
> +::
> +
> + #include <stdio.h>
> + #include <fcntl.h>
> + #include <unistd.h>
> + #include <errno.h>
> + int main(void)
> + {
> + // Move the current task to the isolated cpuset (bind to CPU 7)
> + int fd = open("/sys/fs/cgroup/test/cgroup.procs", O_WRONLY);
> + if (fd < 0) {
> + perror("Can't open cpuset file...\n");
> + return 0;
> + }
> +
> + write(fd, "0\n", 2);
> + close(fd);
> +
> + // Run an endless dummy loop until the launcher kills us
> + while (1)
> + ;
> +
> + return 0;
> + }
> +
> +Build it and save for later step:
> +
> +::
> +
> + # gcc user_loop.c -o user_loop
> +
> +The launcher
> +------------
> +
> +The below launcher runs the above program for 10 seconds and traces
> +the noise resulting from preempting tasks and IRQs.
> +
> +::
> +
> + TRACING=/sys/kernel/tracing/
> + # Make sure tracing is off for now
> + echo 0 > $TRACING/tracing_on
> + # Flush previous traces
> + echo > $TRACING/trace
> + # Record disturbance from other tasks
> + echo 1 > $TRACING/events/sched/sched_switch/enable
> + # Record disturbance from interrupts
> + echo 1 > $TRACING/events/irq_vectors/enable
> + # Now we can start tracing
> + echo 1 > $TRACING/tracing_on
> + # Run the dummy user_loop for 10 seconds on CPU 7
> + ./user_loop &
> + USER_LOOP_PID=$!
> + sleep 10
> + kill $USER_LOOP_PID
> + # Disable tracing and save traces from CPU 7 in a file
> + echo 0 > $TRACING/tracing_on
> + cat $TRACING/per_cpu/cpu7/trace > trace.7
> +
> +If no specific problem arose, the output of trace.7 should look like
> +the following:
> +
> +::
> +
> + <idle>-0 [007] d..2. 1980.976624: sched_switch: prev_comm=swapper/7 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=user_loop next_pid=1553 next_prio=120
> + user_loop-1553 [007] d.h.. 1990.946593: reschedule_entry: vector=253
> + user_loop-1553 [007] d.h.. 1990.946593: reschedule_exit: vector=253
> +
> +That is, no specific noise triggered between the first trace and the
> +second during 10 seconds when user_loop was running.
> +
> +Debugging
> +=========
> +
> +Of course things are never so easy, especially on this matter.
> +Chances are that actual noise will be observed in the aforementioned
> +trace.7 file.
> +
> +The best way to investigate further is to enable finer grained
> +tracepoints such as those of subsystems producing asynchronous
> +events: workqueue, timer, irq_vector, etc... It also can be
> +interesting to enable the tick_stop event to diagnose why the tick is
> +retained when that happens.
> +
> +Some tools may also be useful for higher level analysis:
> +
> +- Documentation/tools/rtla/rtla.rst provides a suite of tools to analyze
> + latency and noise in the system. For example Documentation/tools/rtla/rtla-osnoise.rst
> + runs a kernel tracer that analyzes and output a summary of the noises.
> +
> +- dynticks-testing does something similar to rtla-osnoise but in userspace. It is available
> + at git://git.kernel.org/pub/scm/linux/kernel/git/frederic/dynticks-testing.git
(No bot or AI help :)
--
~Randy
* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 21:42 ` Randy Dunlap
@ 2026-03-26 23:00 ` Steven Rostedt
2026-03-26 23:01 ` Steven Rostedt
2026-03-26 23:03 ` Randy Dunlap
0 siblings, 2 replies; 10+ messages in thread
From: Steven Rostedt @ 2026-03-26 23:00 UTC
To: Randy Dunlap
Cc: Frederic Weisbecker, LKML, Anna-Maria Behnsen, Gabriele Monaco,
Ingo Molnar, Jonathan Corbet, Marcelo Tosatti, Marco Crivellari,
Michal Hocko, Paul E . McKenney, Peter Zijlstra, Phil Auld,
Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
On Thu, 26 Mar 2026 14:42:32 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:
> > +Those interferences, commonly pointed out as "noise", can be triggered
>
> nit: "noise,"
Really?
Even if it's grammatically correct, it still looks horrible.
Those interferences, commonly pointed out as "noise", can be triggered
vs
Those interferences, commonly pointed out as "noise," can be triggered
As the quotes are going around the noun, and placing the comma into the
quotes makes it look like the ',' is part of that noun and not a
punctuation of the sentence.
Being read by C programmers, I would keep he ',' out of the quotes as it
looks like a string literal "noun,".
-- Steve
* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 23:00 ` Steven Rostedt
@ 2026-03-26 23:01 ` Steven Rostedt
2026-03-26 23:03 ` Randy Dunlap
1 sibling, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2026-03-26 23:01 UTC
To: Randy Dunlap
Cc: Frederic Weisbecker, LKML, Anna-Maria Behnsen, Gabriele Monaco,
Ingo Molnar, Jonathan Corbet, Marcelo Tosatti, Marco Crivellari,
Michal Hocko, Paul E . McKenney, Peter Zijlstra, Phil Auld,
Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
On Thu, 26 Mar 2026 19:00:22 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> Being read by C programmers, I would keep he ',' out of the quotes as it
> looks like a string literal "noun,".
Or was that supposed to be: a string literal "noun,." ;-)
-- Steve
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 23:00 ` Steven Rostedt
2026-03-26 23:01 ` Steven Rostedt
@ 2026-03-26 23:03 ` Randy Dunlap
2026-03-26 23:06 ` Steven Rostedt
1 sibling, 1 reply; 10+ messages in thread
From: Randy Dunlap @ 2026-03-26 23:03 UTC (permalink / raw)
To: Steven Rostedt
Cc: Frederic Weisbecker, LKML, Anna-Maria Behnsen, Gabriele Monaco,
Ingo Molnar, Jonathan Corbet, Marcelo Tosatti, Marco Crivellari,
Michal Hocko, Paul E . McKenney, Peter Zijlstra, Phil Auld,
Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
On 3/26/26 4:00 PM, Steven Rostedt wrote:
> On Thu, 26 Mar 2026 14:42:32 -0700
> Randy Dunlap <rdunlap@infradead.org> wrote:
>
>>> +Those interferences, commonly pointed out as "noise", can be triggered
>>
>> nit: "noise,"
>
> Really?
>
> Even if it's grammatically correct, it still looks horrible.
>
> Those interferences, commonly pointed out as "noise", can be triggered
>
> vs
>
> Those interferences, commonly pointed out as "noise," can be triggered
>
> As the quotes are going around the noun, and placing the comma into the
> quotes makes it look like the ',' is part of that noun and not a
> punctuation of the sentence.
>
> Being read by C programmers, I would keep he ',' out of the quotes as it
he ? :)
> looks like a string literal "noun,".
Stephen, all I can do is make recommendations. If it hurts your eyes, tough. ;)
--
~Randy
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 23:03 ` Randy Dunlap
@ 2026-03-26 23:06 ` Steven Rostedt
2026-03-26 23:09 ` Randy Dunlap
0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2026-03-26 23:06 UTC (permalink / raw)
To: Randy Dunlap
Cc: Frederic Weisbecker, LKML, Anna-Maria Behnsen, Gabriele Monaco,
Ingo Molnar, Jonathan Corbet, Marcelo Tosatti, Marco Crivellari,
Michal Hocko, Paul E . McKenney, Peter Zijlstra, Phil Auld,
Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
On Thu, 26 Mar 2026 16:03:18 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:
> Stephen, all I can do is make recommendations. If it hurts your eyes, tough. ;)
Who's this "Stephen" you are talking to?
-- Steve
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 23:06 ` Steven Rostedt
@ 2026-03-26 23:09 ` Randy Dunlap
2026-03-26 23:16 ` Steven Rostedt
0 siblings, 1 reply; 10+ messages in thread
From: Randy Dunlap @ 2026-03-26 23:09 UTC (permalink / raw)
To: Steven Rostedt
Cc: Frederic Weisbecker, LKML, Anna-Maria Behnsen, Gabriele Monaco,
Ingo Molnar, Jonathan Corbet, Marcelo Tosatti, Marco Crivellari,
Michal Hocko, Paul E . McKenney, Peter Zijlstra, Phil Auld,
Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
On 3/26/26 4:06 PM, Steven Rostedt wrote:
> On Thu, 26 Mar 2026 16:03:18 -0700
> Randy Dunlap <rdunlap@infradead.org> wrote:
>
>> Stephen, all I can do is make recommendations. If it hurts your eyes, tough. ;)
>
> Who's this "Stephen" you are talking to?
Oops. I guess it's the one who accidentally wrote "he" instead of "the".
(Do you like where that period ^ is,
even though it's incorrect? :)
--
~Randy
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 23:09 ` Randy Dunlap
@ 2026-03-26 23:16 ` Steven Rostedt
0 siblings, 0 replies; 10+ messages in thread
From: Steven Rostedt @ 2026-03-26 23:16 UTC (permalink / raw)
To: Randy Dunlap
Cc: Frederic Weisbecker, LKML, Anna-Maria Behnsen, Gabriele Monaco,
Ingo Molnar, Jonathan Corbet, Marcelo Tosatti, Marco Crivellari,
Michal Hocko, Paul E . McKenney, Peter Zijlstra, Phil Auld,
Thomas Gleixner, Valentin Schneider, Vlastimil Babka, Waiman Long,
linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
On Thu, 26 Mar 2026 16:09:15 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:
> On 3/26/26 4:06 PM, Steven Rostedt wrote:
> > On Thu, 26 Mar 2026 16:03:18 -0700
> > Randy Dunlap <rdunlap@infradead.org> wrote:
> >
> >> Stephen, all I can do is make recommendations. If it hurts your eyes, tough. ;)
> >
> > Who's this "Stephen" you are talking to?
>
> Oops. I guess it's the one who accidentally wrote "he" instead of "the".
> (Do you like where that period ^ is,
> even though it's incorrect? :)
>
Yes, Randi, I like where that period is. I also need to fix my 't' on my
keyboard as it seems to act shy sometimes.
-- Steve
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] doc: Add CPU Isolation documentation
2026-03-26 14:00 [PATCH v2] doc: Add CPU Isolation documentation Frederic Weisbecker
2026-03-26 19:17 ` Waiman Long
2026-03-26 21:42 ` Randy Dunlap
@ 2026-03-27 16:01 ` Valentin Schneider
2 siblings, 0 replies; 10+ messages in thread
From: Valentin Schneider @ 2026-03-27 16:01 UTC (permalink / raw)
To: Frederic Weisbecker, LKML
Cc: Frederic Weisbecker, Anna-Maria Behnsen, Gabriele Monaco,
Ingo Molnar, Jonathan Corbet, Marcelo Tosatti, Marco Crivellari,
Michal Hocko, Paul E . McKenney, Peter Zijlstra, Phil Auld,
Steven Rostedt, Thomas Gleixner, Vlastimil Babka, Waiman Long,
linux-doc, Sebastian Andrzej Siewior, Bagas Sanjaya
On 26/03/26 15:00, Frederic Weisbecker wrote:
> nohz_full was introduced in v3.10 in 2013, which means this
> documentation is overdue for 13 years.
>
> Fortunately Paul wrote a part of the needed documentation a while ago,
> especially concerning nohz_full in Documentation/timers/no_hz.rst and
> also about per-CPU kthreads in
> Documentation/admin-guide/kernel-per-CPU-kthreads.rst
>
> Introduce a new page that gives an overview of CPU isolation in general.
>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
With what's already been brought up:
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
^ permalink raw reply [flat|nested] 10+ messages in thread