From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: linux-kernel@vger.kernel.org, mingo@kernel.org,
peterz@infradead.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, yury.norov@gmail.com,
kprateek.nayak@amd.com, iii@linux.ibm.com, corbet@lwn.net
Cc: sshegde@linux.ibm.com, tglx@kernel.org,
gregkh@linuxfoundation.org, pbonzini@redhat.com,
seanjc@google.com, vschneid@redhat.com, huschle@linux.ibm.com,
rostedt@goodmis.org, dietmar.eggemann@arm.com,
maddy@linux.ibm.com, srikar@linux.ibm.com, hdanton@sina.com,
chleroy@kernel.org, vineeth@bitbyteword.org, frederic@kernel.org,
arighi@nvidia.com, pauld@redhat.com, christian.loehle@arm.com,
tj@kernel.org, tommaso.cucinotta@gmail.com, maz@kernel.org,
rafael@kernel.org, rdunlap@infradead.org, kernellwp@gmail.com,
linux-doc@vger.kernel.org
Subject: [PATCH v5 13/24] virt/steal_monitor: Add documentation
Date: Thu, 25 Jun 2026 18:16:37 +0530
Message-ID: <20260625124648.802832-14-sshegde@linux.ibm.com>
In-Reply-To: <20260625124648.802832-1-sshegde@linux.ibm.com>

Document the steal_monitor module and its parameters.

Signed-off-by: Shrikanth Hegde <sshegde@linux.ibm.com>
---
v4-v5:
- new patch
Please let me know if the placement is not right.
Documentation/driver-api/index.rst | 1 +
Documentation/driver-api/steal-monitor.rst | 93 ++++++++++++++++++++++
2 files changed, 94 insertions(+)
create mode 100644 Documentation/driver-api/steal-monitor.rst
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index eaf7161ff957..ec12f396a5e6 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -138,6 +138,7 @@ Subsystem-specific APIs
    sm501
    soundwire/index
    spi
+   steal-monitor
    surface_aggregator/index
    switchtec
    sync_file
diff --git a/Documentation/driver-api/steal-monitor.rst b/Documentation/driver-api/steal-monitor.rst
new file mode 100644
index 000000000000..997a22d0812c
--- /dev/null
+++ b/Documentation/driver-api/steal-monitor.rst
@@ -0,0 +1,93 @@
+.. SPDX-License-Identifier: GPL-2.0
+=============
+Steal Monitor
+=============
+
+:Author: Shrikanth Hegde
+
+Introduction:
+=============
+
+Steal monitor is a driver aimed at solving the Noisy Neighbour problem
+in virtualized environments, i.e. the performance of a workload
+running in one VM is significantly affected by other VMs, and
+combined they all make slower forward progress.
+
+When CPU resources are overcommitted, i.e. the sum of virtual CPUs (vCPUs)
+of all VMs is greater than the number of physical CPUs (pCPUs), and
+all or many VMs have high utilization, the hypervisor cannot
+satisfy the CPU demand and has to context switch within or
+across VMs, i.e. the hypervisor needs to preempt one vCPU to run
+another. This is called vCPU preemption, and it is
+more expensive than a task context switch within a vCPU.
+
+In such cases it is better to reduce the combined vCPU demand from all VMs
+by not using some of the vCPUs. The vCPUs on which a workload can be safely
+scheduled without increasing contention for pCPUs are called
+"Preferred CPUs".
+
+See more on "Preferred CPUs" in Documentation/scheduler/sched-arch.rst.
+
+This driver sets and clears CPUs in the "Preferred CPUs" list.
+This list is obtained using cpu_preferred_mask.
+
+Core idea:
+==========
+Steal time is an indication, available today in the guest, of contention
+for the underlying physical CPUs. Use it as a hint in the guest to fold the
+workload onto a reduced set of vCPUs. When there is contention, steal time
+shows up in all the guests. When each guest honors the hint and folds
+its workload onto a smaller set of vCPUs (Preferred CPUs), contention
+drops and with it vCPU preemption.
+This is achieved without any cross-guest communication.
+
+The steal monitor driver effectively does the following:
+
+1. Periodically computes steal time across the system.
+
+2. If steal time is greater than the high threshold, reduces the number of
+   preferred CPUs by 1 core. At least one core is always left preferred.
+   This avoids running into extreme cases.
+
+3. If steal time is lower than or equal to the low threshold, increases the
+   number of preferred CPUs by 1 core. If preferred is the same as active,
+   nothing needs to be done.
+
+4. Ensures preferred CPUs are always a subset of active CPUs.
+   When the feature is disabled, preferred is the same as active CPUs.
+
+Module Parameters:
+==================
+interval_ms
+-----------
+How often the steal monitor checks steal time, in milliseconds.
+(Default: 1000, i.e. 1 second)
+
+This controls how fast the steal monitor driver reacts to changes in
+contention for physical CPUs. Since each check does a fair amount of
+work, setting it too low adds overhead. If set to 0, it is reset
+to the default the next time the work runs.
+
+low_threshold
+-------------
+Lower threshold value, in percentage * 100.
+(Default: 200, i.e. 2% steal is considered the low threshold)
+
+This determines which values are treated as nil/no steal.
+When the steal monitor sees that steal time is below or equal to this value,
+it increases the preferred CPUs by 1 core. Setting the value to zero
+might cause too many oscillations.
+
+high_threshold
+--------------
+Higher threshold value, in percentage * 100.
+(Default: 500, i.e. 5% steal is considered the high threshold)
+
+This determines which values are treated as high steal.
+When the steal monitor sees that steal time is higher than this value, it
+reduces the preferred CPUs by 1 core.
+
+Notes:
+======
+This is available under CONFIG_PREFERRED_CPU. Selecting that option includes
+this module. The module is not loaded by default.
--
2.47.3
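
For illustration only, the four steps in the "Core idea" section of the
new document can be sketched in user-space C as below. This is not the
driver code from this series: the helper names steal_ratio() and
next_preferred_cores(), the nanosecond inputs, and the way the ratio is
derived are all assumptions. Only the default thresholds (200 and 500,
in percentage * 100) and the increase/decrease rules come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Defaults quoted from the document; units are percentage * 100. */
#define LOW_THRESHOLD_DEFAULT	200	/* 2% steal */
#define HIGH_THRESHOLD_DEFAULT	500	/* 5% steal */

/*
 * Hypothetical helper: steal over one sampling interval, expressed in
 * percentage * 100. steal_ns is the steal time summed over num_cpus
 * CPUs; interval_ns is the length of the sampling window.
 * Assumes steal_ns * 10000 fits in 64 bits.
 */
static unsigned int steal_ratio(uint64_t steal_ns, uint64_t interval_ns,
				unsigned int num_cpus)
{
	uint64_t capacity_ns = interval_ns * num_cpus;

	if (capacity_ns == 0)
		return 0;
	return (unsigned int)(steal_ns * 10000 / capacity_ns);
}

/*
 * Hypothetical helper: number of preferred cores for the next interval.
 * All counts are in cores, as the document adjusts by 1 core at a time.
 */
static unsigned int next_preferred_cores(unsigned int preferred,
					 unsigned int active,
					 unsigned int steal,
					 unsigned int low,
					 unsigned int high)
{
	assert(active >= 1);

	/* Step 4: preferred is always a subset of active. */
	if (preferred > active)
		preferred = active;
	if (preferred < 1)
		preferred = 1;

	if (steal > high) {
		/* Step 2: shrink by one core, but keep at least one. */
		if (preferred > 1)
			preferred--;
	} else if (steal <= low) {
		/* Step 3: grow by one core, up to the active count. */
		if (preferred < active)
			preferred++;
	}
	/* Between the thresholds: leave the set unchanged. */
	return preferred;
}
```

With the defaults, a guest with 4 active cores that sees 6% steal (600)
would drop from 4 to 3 preferred cores, and would grow back by one core
per interval once steal is at or below 2% (200).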