From: Srikar Dronamraju <srikar@linux.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org, Ben Segall <bsegall@google.com>,
Christophe Leroy <christophe.leroy@csgroup.eu>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Ingo Molnar <mingo@kernel.org>,
Juri Lelli <juri.lelli@redhat.com>,
Madhavan Srinivasan <maddy@linux.ibm.com>,
Mel Gorman <mgorman@suse.de>,
Michael Ellerman <mpe@ellerman.id.au>,
Nicholas Piggin <npiggin@gmail.com>,
Peter Zijlstra <peterz@infradead.org>,
Steven Rostedt <rostedt@goodmis.org>,
Thomas Gleixner <tglx@linutronix.de>,
Valentin Schneider <vschneid@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Srikar Dronamraju <srikar@linux.ibm.com>
Subject: [PATCH v2 1/2] sched/core: Option if steal should update CPU capacity
Date: Wed, 29 Oct 2025 11:37:56 +0530
Message-ID: <20251029060757.2007601-1-srikar@linux.ibm.com>

At present, the scheduler scales CPU capacity for fair tasks based on the
time spent in irq and steal time. If a CPU sees irq or steal time, its
capacity for fair tasks decreases, causing tasks to migrate to other CPUs
that are not affected by irq or steal time. All of this is gated by the
NONTASK_CAPACITY scheduler feature.

In virtualized setups, a CPU that reports steal time (time during which
the hypervisor did not run this vCPU even though it was runnable) can
cause tasks to migrate unnecessarily to sibling CPUs that appear to be
less busy, only for the situation to reverse shortly afterwards.
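
To mitigate this ping-pong behaviour, this change introduces a new
static branch, sched_acct_steal_cap, that controls whether steal time
contributes to the non-task capacity adjustments used for fair
scheduling. The key is enabled by default, and callers such as
architecture code can invoke sched_disable_steal_acct() to stop steal
time from reducing CPU capacity.
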
Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
---
Changelog v1->v2:
v1: https://lkml.kernel.org/r/20251028104255.1892485-1-srikar@linux.ibm.com
Use a static branch instead of a sched feature, as suggested by Peter.

 include/linux/sched/topology.h |  6 ++++++
 kernel/sched/core.c            | 15 +++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 198bb5cc1774..88e34c60cffd 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -285,4 +285,10 @@ static inline int task_node(const struct task_struct *p)
 	return cpu_to_node(task_cpu(p));
 }
 
+#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
+extern void sched_disable_steal_acct(void);
+#else
+static __always_inline void sched_disable_steal_acct(void) { }
+#endif
+
 #endif /* _LINUX_SCHED_TOPOLOGY_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df1..09884da6b085 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -738,6 +738,14 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 /*
  * RQ-clock updating methods:
  */
+#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
+static DEFINE_STATIC_KEY_TRUE(sched_acct_steal_cap);
+
+void sched_disable_steal_acct(void)
+{
+	static_branch_disable(&sched_acct_steal_cap);
+}
+#endif
 
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
@@ -792,8 +800,11 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 	rq->clock_task += delta;
 
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
-	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
-		update_irq_load_avg(rq, irq_delta + steal);
+	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY)) {
+		if (steal && static_branch_likely(&sched_acct_steal_cap))
+			irq_delta += steal;
+		update_irq_load_avg(rq, irq_delta);
+	}
 #endif
 	update_rq_clock_pelt(rq, delta);
 }
--
2.47.3

Thread overview: 6+ messages
2025-10-29 6:07 Srikar Dronamraju [this message]
2025-10-29 6:07 ` [PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU capacity Srikar Dronamraju
2025-10-29 7:43 ` Vincent Guittot
2025-10-29 8:31 ` Srikar Dronamraju
2025-11-03 8:46 ` Vincent Guittot
2025-11-06 5:22 ` Srikar Dronamraju