From: Shrikanth Hegde <sshegde@linux.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, kprateek.nayak@amd.com,
vschneid@redhat.com, iii@linux.ibm.com, huschle@linux.ibm.com,
rostedt@goodmis.org, dietmar.eggemann@arm.com, mgorman@suse.de,
bsegall@google.com, maddy@linux.ibm.com, srikar@linux.ibm.com,
hdanton@sina.com, chleroy@kernel.org, vineeth@bitbyteword.org,
joelagnelf@nvidia.com, mingo@kernel.org, peterz@infradead.org,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
tglx@linutronix.de, yury.norov@gmail.com,
gregkh@linuxfoundation.org
Subject: Re: [PATCH v2 00/17] sched/paravirt: Introduce cpu_preferred_mask and steal-driven vCPU backoff
Date: Fri, 10 Apr 2026 15:17:40 +0530
Message-ID: <fc05a2b4-a98b-4e7c-beed-05407886d55a@linux.ibm.com>
In-Reply-To: <20260407191950.643549-1-sshegde@linux.ibm.com>
On 4/8/26 12:49 AM, Shrikanth Hegde wrote:
> In virtualized environments, vCPU overcommit is common, i.e. the sum of
> CPUs across all guests (virtual CPUs, aka vCPUs) exceeds the number of
> underlying physical CPUs (managed by the host, aka pCPUs).
Here is a patch that allows writing a custom CPU list into the preferred
CPU mask. It lets one echo specific CPUs chosen according to the hardware
topology. That can be used to study the patterns seen across different
hardware and to work out which arch-specific hooks might be needed if the
generic STEAL_MONITOR can't cater to all needs.

Note: Writing a custom mask disables the generic steal monitor; echoing an
empty mask enables it again.
---
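For anyone who wants to drive the now-writable `preferred` attribute from a
test program rather than from the shell, the sketch below may help. It is
only an illustration: the helper names `write_cpulist()` and `read_cpulist()`
are made up here, and the path in `PREFERRED_PATH` is an assumption about
where the attribute shows up, not something this patch states.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Assumed location of the attribute; adjust to wherever it really appears. */
#define PREFERRED_PATH "/sys/devices/system/cpu/preferred"

/*
 * Write a CPU list such as "0-3,8" to @path; an empty string writes just a
 * newline, which is what a bare "echo > file" does.
 * Returns 0 on success or a negative errno value.
 */
static int write_cpulist(const char *path, const char *cpulist)
{
	FILE *f;
	int err = 0;

	assert(path && cpulist);

	f = fopen(path, "w");
	if (!f)
		return -errno;

	if (fprintf(f, "%s\n", cpulist) < 0)
		err = -EIO;

	/* The buffered write is flushed here, so a rejected list shows up now. */
	if (fclose(f) != 0 && !err)
		err = -errno;

	return err;
}

/*
 * Read the first line of @path into @buf with the trailing newline removed.
 * Returns 0 on success or a negative errno value.
 */
static int read_cpulist(const char *path, char *buf, size_t len)
{
	FILE *f;

	assert(path && buf && len > 0);

	f = fopen(path, "r");
	if (!f)
		return -errno;

	if (!fgets(buf, (int)len, f))
		buf[0] = '\0';
	fclose(f);

	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}
```

A caller would typically use `write_cpulist(PREFERRED_PATH, "0-3")` to pin the
preferred set and `write_cpulist(PREFERRED_PATH, "")` to hand control back to
the generic steal monitor, checking the return value each time since the
store handler can reject the list with -EINVAL.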
drivers/base/cpu.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
include/linux/sched.h | 3 +++
kernel/sched/core.c | 4 ++++
3 files changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 0a6cf37f2001..133f28b15906 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -392,12 +392,64 @@ static int cpu_uevent(const struct device *dev, struct kobj_uevent_env *env)
#endif
#ifdef CONFIG_PARAVIRT
+static ssize_t preferred_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ cpumask_var_t temp_mask;
+ int retval = 0;
+ int cpu;
+
+ if (!alloc_cpumask_var(&temp_mask, GFP_KERNEL))
+ return -ENOMEM;
+
+ retval = cpulist_parse(buf, temp_mask);
+ if (retval)
+ goto free_mask;
+
+ /* A mask equal to all online CPUs is rejected; echo an empty mask instead */
+ if (cpumask_equal(temp_mask, cpu_online_mask)) {
+ retval = -EINVAL;
+ goto free_mask;
+ }
+ if (cpumask_weight(temp_mask) > num_online_cpus()) {
+ retval = -EINVAL;
+ goto free_mask;
+ }
+
+ /* Empty mask: all CPUs are preferred; enable the generic steal monitor */
+ if (cpumask_empty(temp_mask)) {
+ static_branch_disable(&disable_generic_steal_mon);
+ cpumask_copy((struct cpumask *)&__cpu_preferred_mask, cpu_online_mask);
+
+ } else {
+ /*
+ * An explicit list of preferred CPUs was given, so disable the
+ * generic steal monitor.
+ */
+ static_branch_enable(&disable_generic_steal_mon);
+ cpumask_copy((struct cpumask *)&__cpu_preferred_mask, temp_mask);
+
+ /* Keep the tick running on non-preferred nohz_full CPUs */
+ for_each_cpu_andnot(cpu, cpu_online_mask, temp_mask) {
+ if (tick_nohz_full_cpu(cpu))
+ tick_nohz_dep_set_cpu(cpu, TICK_DEP_BIT_SCHED);
+ }
+ }
+
+ retval = count;
+
+free_mask:
+ free_cpumask_var(temp_mask);
+ return retval;
+}
+
static ssize_t preferred_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
return sysfs_emit(buf, "%*pbl\n", cpumask_pr_args(cpu_preferred_mask));
}
-static DEVICE_ATTR_RO(preferred);
+static DEVICE_ATTR_RW(preferred);
#endif
const struct bus_type cpu_subsys = {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6c0d5d36f21c..3760c8047ffe 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2515,4 +2515,7 @@ extern void migrate_enable(void);
DEFINE_LOCK_GUARD_0(migrate, migrate_disable(), migrate_enable())
+#ifdef CONFIG_PARAVIRT
+DECLARE_STATIC_KEY_FALSE(disable_generic_steal_mon);
+#endif
#endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cb9110f95ebf..680da55070f8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11339,6 +11339,7 @@ void sched_push_current_non_preferred_cpu(struct rq *rq)
}
struct steal_monitor_t steal_mon;
+DEFINE_STATIC_KEY_FALSE(disable_generic_steal_mon);
void sched_init_steal_monitor(void)
{
@@ -11428,6 +11429,9 @@ void sched_trigger_steal_computation(int cpu)
if (likely(cpu != first_hk_cpu))
return;
+ if (static_branch_unlikely(&disable_generic_steal_mon))
+ return;
+
/*
* Since everything is updated by first housekeeping CPU,
* There is no need for complex syncronization.
--
2.47.3
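
To make the accept/reject rules of preferred_store() easier to reason about
outside the kernel, here is a small userspace model of them. It is a sketch
under simplifying assumptions: CPUs are represented as bits in a 64-bit word,
and `struct preferred_model`, `weight64()` and `model_store()` are
hypothetical names that exist only in this example. The tick handling for
nohz_full CPUs is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the state the patch touches. */
struct preferred_model {
	uint64_t online;		/* bit n set: CPU n is online */
	uint64_t preferred;		/* bit n set: CPU n is preferred */
	bool generic_steal_disabled;	/* mirrors disable_generic_steal_mon */
};

/* Count the set bits in @m (plays the role of cpumask_weight()). */
static int weight64(uint64_t m)
{
	int n = 0;

	for (; m; m &= m - 1)
		n++;
	return n;
}

/*
 * Apply a requested mask using the same order of checks as preferred_store().
 * Returns 0 on success or -EINVAL if the mask is rejected; the state is left
 * unchanged on rejection.
 */
static int model_store(struct preferred_model *s, uint64_t requested)
{
	assert(s);

	/* A mask identical to the online mask is refused. */
	if (requested == s->online)
		return -EINVAL;

	/* So is a mask with more CPUs than are online. */
	if (weight64(requested) > weight64(s->online))
		return -EINVAL;

	if (requested == 0) {
		/* Empty mask: every online CPU is preferred, monitor runs. */
		s->generic_steal_disabled = false;
		s->preferred = s->online;
	} else {
		/* Custom mask: take it as-is and stop the generic monitor. */
		s->generic_steal_disabled = true;
		s->preferred = requested;
	}
	return 0;
}
```

Note that, as in the patch, the weight comparison does not check that the
requested CPUs are actually online; a mask naming an offline CPU is accepted
by this model as long as it is not larger than the online set.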