linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v2 1/2] sched/core: Option if steal should update CPU capacity
@ 2025-10-29  6:07 Srikar Dronamraju
  2025-10-29  6:07 ` [PATCH v2 2/2] powerpc/smp: Disable steal from updating " Srikar Dronamraju
  0 siblings, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2025-10-29  6:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linuxppc-dev, Ben Segall, Christophe Leroy, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Madhavan Srinivasan, Mel Gorman,
	Michael Ellerman, Nicholas Piggin, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Srikar Dronamraju

At present, the scheduler scales CPU capacity for fair tasks based on time
spent servicing IRQs and on steal time. If a CPU sees IRQ or steal time,
its capacity for fair tasks decreases, causing tasks to migrate to other
CPUs that are not affected by IRQ or steal time. All of this is gated by
the scheduler feature NONTASK_CAPACITY.

In virtualized setups, a CPU that reports steal time (time during which the
hypervisor ran something else) can cause tasks to migrate unnecessarily to
sibling CPUs that appear less busy, only for the situation to reverse
shortly after.

To mitigate this ping-pong behaviour, introduce a new static key,
sched_acct_steal_cap, which controls whether steal time contributes to the
non-task capacity adjustment used for fair scheduling.

Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
---
Changelog v1->v2:
v1: https://lkml.kernel.org/r/20251028104255.1892485-1-srikar@linux.ibm.com
Peter suggested using a static branch instead of a sched feature.

 include/linux/sched/topology.h |  6 ++++++
 kernel/sched/core.c            | 15 +++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 198bb5cc1774..88e34c60cffd 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -285,4 +285,10 @@ static inline int task_node(const struct task_struct *p)
 	return cpu_to_node(task_cpu(p));
 }
 
+#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
+extern void sched_disable_steal_acct(void);
+#else
+static __always_inline void sched_disable_steal_acct(void) { }
+#endif
+
 #endif /* _LINUX_SCHED_TOPOLOGY_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 81c6df746df1..09884da6b085 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -738,6 +738,14 @@ struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
 /*
  * RQ-clock updating methods:
  */
+#ifdef CONFIG_HAVE_SCHED_AVG_IRQ
+static DEFINE_STATIC_KEY_TRUE(sched_acct_steal_cap);
+
+void sched_disable_steal_acct(void)
+{
+	static_branch_disable(&sched_acct_steal_cap);
+}
+#endif
 
 static void update_rq_clock_task(struct rq *rq, s64 delta)
 {
@@ -792,8 +800,11 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 	rq->clock_task += delta;
 
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
-	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
-		update_irq_load_avg(rq, irq_delta + steal);
+	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY)) {
+		if (steal && static_branch_likely(&sched_acct_steal_cap))
+			irq_delta += steal;
+		update_irq_load_avg(rq, irq_delta);
+	}
 #endif
 	update_rq_clock_pelt(rq, delta);
 }
-- 
2.47.3



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU capacity
  2025-10-29  6:07 [PATCH v2 1/2] sched/core: Option if steal should update CPU capacity Srikar Dronamraju
@ 2025-10-29  6:07 ` Srikar Dronamraju
  2025-10-29  7:43   ` Vincent Guittot
  0 siblings, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2025-10-29  6:07 UTC (permalink / raw)
  To: linux-kernel
  Cc: linuxppc-dev, Ben Segall, Christophe Leroy, Dietmar Eggemann,
	Ingo Molnar, Juri Lelli, Madhavan Srinivasan, Mel Gorman,
	Michael Ellerman, Nicholas Piggin, Peter Zijlstra, Steven Rostedt,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Srikar Dronamraju

In a shared LPAR with SMT enabled, it has been observed that when a CPU
experiences steal time, it can trigger task migrations between sibling
CPUs. The idle CPU pulls a runnable task from its sibling that is
impacted by steal, making the previously busy CPU go idle. This reversal
can repeat continuously, resulting in ping-pong behavior between SMT
siblings.

To avoid migrations solely triggered by steal time, disable steal from
updating CPU capacity when running in shared processor mode.

lparstat
System Configuration
type=Shared mode=Uncapped smt=8 lcpu=72 mem=2139693696 kB cpus=64 ent=24.00

Noise case: (Ebizzy on 2 LPARs with similar configuration as above)
nr-ebizzy-threads  baseline  std-deviation  +patch    std-deviation
36                 1         (0.0345589)    1.01073   (0.0411082)
72                 1         (0.0387066)    1.12867   (0.029486)
96                 1         (0.013317)     1.05755   (0.0118292)
128                1         (0.028087)     1.04193   (0.027159)
144                1         (0.0103478)    1.07522   (0.0265476)
192                1         (0.0164666)    1.02177   (0.0164088)
256                1         (0.0241208)    0.977572  (0.0310648)
288                1         (0.0121516)    0.97529   (0.0263536)
384                1         (0.0128001)    0.967025  (0.0207603)
512                1         (0.0113173)    1.00975   (0.00753263)
576                1         (0.0126021)    1.01087   (0.0054196)
864                1         (0.0109194)    1.00369   (0.00987092)
1024               1         (0.0121474)    1.00338   (0.0122591)
1152               1         (0.013801)     1.0097    (0.0150391)

scaled perf stats for 72 thread case.
event         baseline  +patch
cycles        1         1.16993
instructions  1         1.14435
cs            1         0.913554
migrations    1         0.110884
faults        1         1.0005
cache-misses  1         1.68619

Observations:
- We see a drop in context switches and migrations, resulting in an
improvement in records per second.

No-noise case: (Ebizzy on 1 LPAR with the other LPAR idle)
nr-ebizzy-threads  baseline  std-deviation  +patch    std-deviation
36                 1         (0.0451482)    1.01243   (0.0434088)
72                 1         (0.0308503)    1.06175   (0.0373877)
96                 1         (0.0500514)    1.13143   (0.0718754)
128                1         (0.0602872)    1.09909   (0.0375227)
144                1         (0.0843502)    1.07494   (0.0240824)
192                1         (0.0255402)    0.992734  (0.0615166)
256                1         (0.00653372)   0.982841  (0.00751558)
288                1         (0.00318369)   0.99093   (0.00960287)
384                1         (0.00272681)   0.974312  (0.0112133)
512                1         (0.00528486)   0.981207  (0.0125443)
576                1         (0.00491385)   0.992027  (0.0104948)
864                1         (0.0087057)    0.994927  (0.0143434)
1024               1         (0.010002)     0.992463  (0.00429322)
1152               1         (0.00720965)   1.00393   (0.012553)

Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
---
Changelog v1->v2:
v1: https://lkml.kernel.org/r/20251028104255.1892485-2-srikar@linux.ibm.com
Peter suggested using a static branch instead of a sched feature.

 arch/powerpc/kernel/smp.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5ac7084eebc0..0f7fae0b4420 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1694,8 +1694,11 @@ static void __init build_sched_topology(void)
 {
 	int i = 0;
 
-	if (is_shared_processor() && has_big_cores)
-		static_branch_enable(&splpar_asym_pack);
+	if (is_shared_processor()) {
+		if (has_big_cores)
+			static_branch_enable(&splpar_asym_pack);
+		sched_disable_steal_acct();
+	}
 
 #ifdef CONFIG_SCHED_SMT
 	if (has_big_cores) {
-- 
2.47.3




* Re: [PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU capacity
  2025-10-29  6:07 ` [PATCH v2 2/2] powerpc/smp: Disable steal from updating " Srikar Dronamraju
@ 2025-10-29  7:43   ` Vincent Guittot
  2025-10-29  8:31     ` Srikar Dronamraju
  0 siblings, 1 reply; 6+ messages in thread
From: Vincent Guittot @ 2025-10-29  7:43 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: linux-kernel, linuxppc-dev, Ben Segall, Christophe Leroy,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Madhavan Srinivasan,
	Mel Gorman, Michael Ellerman, Nicholas Piggin, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Valentin Schneider

Hi Srikar,

On Wed, 29 Oct 2025 at 07:09, Srikar Dronamraju <srikar@linux.ibm.com> wrote:
>
> In a shared LPAR with SMT enabled, it has been observed that when a CPU
> experiences steal time, it can trigger task migrations between sibling
> CPUs. The idle CPU pulls a runnable task from its sibling that is
> impacted by steal, making the previously busy CPU go idle. This reversal

IIUC, the migration is triggered by the reduced capacity case when
there is 1 task on the CPU

> can repeat continuously, resulting in ping-pong behavior between SMT
> siblings.

Does it mean that the vCPU generates its own steal time, or is it
because other vCPUs are already running on the other CPU and start
to steal time from the sibling vCPU?

>
> To avoid migrations solely triggered by steal time, disable steal from
> updating CPU capacity when running in shared processor mode.

You are disabling the steal time accounting only for your arch. Does
it mean that only powerpc is impacted by this effect?

>
> lparstat
> System Configuration
> type=Shared mode=Uncapped smt=8 lcpu=72 mem=2139693696 kB cpus=64 ent=24.00
>
> Noise case: (Ebizzy on 2 LPARs with similar configuration as above)
> nr-ebizzy-threads  baseline  std-deviation  +patch    std-deviation
> 36                 1         (0.0345589)    1.01073   (0.0411082)
> 72                 1         (0.0387066)    1.12867   (0.029486)
> 96                 1         (0.013317)     1.05755   (0.0118292)
> 128                1         (0.028087)     1.04193   (0.027159)
> 144                1         (0.0103478)    1.07522   (0.0265476)
> 192                1         (0.0164666)    1.02177   (0.0164088)
> 256                1         (0.0241208)    0.977572  (0.0310648)
> 288                1         (0.0121516)    0.97529   (0.0263536)
> 384                1         (0.0128001)    0.967025  (0.0207603)
> 512                1         (0.0113173)    1.00975   (0.00753263)
> 576                1         (0.0126021)    1.01087   (0.0054196)
> 864                1         (0.0109194)    1.00369   (0.00987092)
> 1024               1         (0.0121474)    1.00338   (0.0122591)
> 1152               1         (0.013801)     1.0097    (0.0150391)
>
> scaled perf stats for 72 thread case.
> event         baseline  +patch
> cycles        1         1.16993
> instructions  1         1.14435
> cs            1         0.913554
> migrations    1         0.110884
> faults        1         1.0005
> cache-misses  1         1.68619
>
> Observations:
> - We see a drop in context-switches and migrations resulting in an
> improvement in the records per second.
>
> No-noise case: (Ebizzy on 1 LPARs with other LPAR being idle)
> nr-ebizzy-threads  baseline  std-deviation  +patch    std-deviation
> 36                 1         (0.0451482)    1.01243   (0.0434088)
> 72                 1         (0.0308503)    1.06175   (0.0373877)
> 96                 1         (0.0500514)    1.13143   (0.0718754)
> 128                1         (0.0602872)    1.09909   (0.0375227)
> 144                1         (0.0843502)    1.07494   (0.0240824)
> 192                1         (0.0255402)    0.992734  (0.0615166)
> 256                1         (0.00653372)   0.982841  (0.00751558)
> 288                1         (0.00318369)   0.99093   (0.00960287)
> 384                1         (0.00272681)   0.974312  (0.0112133)
> 512                1         (0.00528486)   0.981207  (0.0125443)
> 576                1         (0.00491385)   0.992027  (0.0104948)
> 864                1         (0.0087057)    0.994927  (0.0143434)
> 1024               1         (0.010002)     0.992463  (0.00429322)
> 1152               1         (0.00720965)   1.00393   (0.012553)
>
> Signed-off-by: Srikar Dronamraju <srikar@linux.ibm.com>
> ---
> Changelog v1->v2:
> v1: https://lkml.kernel.org/r/20251028104255.1892485-2-srikar@linux.ibm.com
> Peter suggested to use static branch instead of sched feat
>
>  arch/powerpc/kernel/smp.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 5ac7084eebc0..0f7fae0b4420 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1694,8 +1694,11 @@ static void __init build_sched_topology(void)
>  {
>         int i = 0;
>
> -       if (is_shared_processor() && has_big_cores)
> -               static_branch_enable(&splpar_asym_pack);
> +       if (is_shared_processor()) {
> +               if (has_big_cores)
> +                       static_branch_enable(&splpar_asym_pack);
> +               sched_disable_steal_acct();
> +       }
>
>  #ifdef CONFIG_SCHED_SMT
>         if (has_big_cores) {
> --
> 2.47.3
>



* Re: [PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU capacity
  2025-10-29  7:43   ` Vincent Guittot
@ 2025-10-29  8:31     ` Srikar Dronamraju
  2025-11-03  8:46       ` Vincent Guittot
  0 siblings, 1 reply; 6+ messages in thread
From: Srikar Dronamraju @ 2025-10-29  8:31 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, linuxppc-dev, Ben Segall, Christophe Leroy,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Madhavan Srinivasan,
	Mel Gorman, Michael Ellerman, Nicholas Piggin, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Valentin Schneider

* Vincent Guittot <vincent.guittot@linaro.org> [2025-10-29 08:43:34]:

> Hi Srikar,
> 
> On Wed, 29 Oct 2025 at 07:09, Srikar Dronamraju <srikar@linux.ibm.com> wrote:
> >
> > In a shared LPAR with SMT enabled, it has been observed that when a CPU
> > experiences steal time, it can trigger task migrations between sibling
> > CPUs. The idle CPU pulls a runnable task from its sibling that is
> > impacted by steal, making the previously busy CPU go idle. This reversal
> 
> IIUC, the migration is triggered by the reduced capacity case when
> there is 1 task on the CPU

Thanks Vincent for taking a look at the change.

Yes. Let's assume we have 3 threads running on 6 vCPUs backed by 2 physical
cores. Only 3 vCPUs (0,1,2) would be busy and the other 3 (3,4,5) would be
idle. The busy vCPUs will start seeing steal time of around 33% because
they can't run entirely on the physical CPUs. Without the change, they will
see their capacity decrease, while the idle vCPUs (3,4,5) keep their
capacity intact. So when the scheduler moves the 3 tasks to the idle vCPUs,
the newly busy vCPUs (3,4,5) start seeing steal and hence their CPU
capacity drops, while the newly idle vCPUs (0,1,2) see their capacity
increase since their steal time reduces. Hence the tasks will be migrated
again.

> 
> > can repeat continuously, resulting in ping-pong behavior between SMT
> > siblings.
> 
> Does it mean that the vCPU generates its own steal time or is it
> because other vcpus are already running on the other CPU and they
> starts to steal time on the sibling vCPU

There are other vCPUs running and sharing the same Physical CPU, and hence
these vCPUs are seeing steal time.

> 
> >
> > To avoid migrations solely triggered by steal time, disable steal from
> > updating CPU capacity when running in shared processor mode.
> 
> You are disabling the steal time accounting only for your arch. Does
> it mean that only powerpc are impacted by this effect ?

On PowerVM, the hypervisor schedules at core granularity. So in the above
scenario, if we assume SMT=2, then we have 3 vCores and 1 physical core.
Even if only 2 threads are running, they would be scheduled on 2 vCores and
hence we would start seeing 50% steal. So this steal accounting is more
pronounced on shared LPARs running on PowerVM.

However, the same mechanism can be used on other architectures too, since
the framework is arch-independent.

Does this clarify?

-- 
Thanks and Regards
Srikar Dronamraju



* Re: [PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU capacity
  2025-10-29  8:31     ` Srikar Dronamraju
@ 2025-11-03  8:46       ` Vincent Guittot
  2025-11-06  5:22         ` Srikar Dronamraju
  0 siblings, 1 reply; 6+ messages in thread
From: Vincent Guittot @ 2025-11-03  8:46 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: linux-kernel, linuxppc-dev, Ben Segall, Christophe Leroy,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Madhavan Srinivasan,
	Mel Gorman, Michael Ellerman, Nicholas Piggin, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Valentin Schneider

Hi Srikar,

On Wed, 29 Oct 2025 at 09:32, Srikar Dronamraju <srikar@linux.ibm.com> wrote:
>
> * Vincent Guittot <vincent.guittot@linaro.org> [2025-10-29 08:43:34]:
>
> > Hi Srikar,
> >
> > On Wed, 29 Oct 2025 at 07:09, Srikar Dronamraju <srikar@linux.ibm.com> wrote:
> > >
> > > In a shared LPAR with SMT enabled, it has been observed that when a CPU
> > > experiences steal time, it can trigger task migrations between sibling
> > > CPUs. The idle CPU pulls a runnable task from its sibling that is
> > > impacted by steal, making the previously busy CPU go idle. This reversal
> >
> > IIUC, the migration is triggered by the reduced capacity case when
> > there is 1 task on the CPU
>
> Thanks Vincent for taking a look at the change.
>
> Yes, Lets assume we have 3 threads running on 6 vCPUs backed by 2 Physical
> cores. So only 3 vCPUs (0,1,2) would be busy and other 3 (3,4,5) will be
> idle. The vCPUs that are busy will start seeing steal time of around 33%
> because they cant run completely on the Physical CPU. Without the change,
> they will start seeing their capacity decrease. While the idle vCPUs(3,4,5)
> ones will have their capacity intact. So when the scheduler switches the 3
> tasks to the idle vCPUs, the newer busy vCPUs (3,4,5) will start seeing steal
> and hence see their CPU capacity drops while the newer idle vCPUs (0,1,2)
> will see their capacity increase since their steal time reduces. Hence the
> tasks will be migrated again.

Thanks for the details.
This is probably even more visible when vCPUs are not pinned to separate CPUs.


>
> >
> > > can repeat continuously, resulting in ping-pong behavior between SMT
> > > siblings.
> >
> > Does it mean that the vCPU generates its own steal time or is it
> > because other vcpus are already running on the other CPU and they
> > starts to steal time on the sibling vCPU
>
> There are other vCPUs running and sharing the same Physical CPU, and hence
> these vCPUs are seeing steal time.
>
> >
> > >
> > > To avoid migrations solely triggered by steal time, disable steal from
> > > updating CPU capacity when running in shared processor mode.
> >
> > You are disabling the steal time accounting only for your arch. Does
> > it mean that only powerpc are impacted by this effect ?
>
> On PowerVM, the hypervisor schedules at a core granularity. So in the above
> scenario, if we assume SMT to be 2, then we have 3 vCores and 1 Physical
> core. So even if 2 threads are running, they would be scheduled on 2 vCores
> and hence we would start seeing 50% steal. So this steal accounting is more
> predominant on Shared LPARs running on PowerVM.
>
> However we can use this same mechanism on other architectures too since the
> framework is arch independent.
>
> Does this clarify?

yes, thanks
I see 2 problems in your use case: the idle CPU doesn't have steal
time even if the host CPU it will run on is already busy with other
things, and with unpinned vCPUs we can't estimate what the steal time
on the target host will be.
And I don't see a simple way other than disabling steal time

>
> --
> Thanks and Regards
> Srikar Dronamraju



* Re: [PATCH v2 2/2] powerpc/smp: Disable steal from updating CPU capacity
  2025-11-03  8:46       ` Vincent Guittot
@ 2025-11-06  5:22         ` Srikar Dronamraju
  0 siblings, 0 replies; 6+ messages in thread
From: Srikar Dronamraju @ 2025-11-06  5:22 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: linux-kernel, linuxppc-dev, Ben Segall, Christophe Leroy,
	Dietmar Eggemann, Ingo Molnar, Juri Lelli, Madhavan Srinivasan,
	Mel Gorman, Michael Ellerman, Nicholas Piggin, Peter Zijlstra,
	Steven Rostedt, Thomas Gleixner, Valentin Schneider

* Vincent Guittot <vincent.guittot@linaro.org> [2025-11-03 09:46:26]:

> On Wed, 29 Oct 2025 at 09:32, Srikar Dronamraju <srikar@linux.ibm.com> wrote:
> > * Vincent Guittot <vincent.guittot@linaro.org> [2025-10-29 08:43:34]:
> > > On Wed, 29 Oct 2025 at 07:09, Srikar Dronamraju <srikar@linux.ibm.com> wrote:
> > > >
> > > IIUC, the migration is triggered by the reduced capacity case when
> > > there is 1 task on the CPU
> >
> > Thanks Vincent for taking a look at the change.
> >
> > Yes, Lets assume we have 3 threads running on 6 vCPUs backed by 2 Physical
> > cores. So only 3 vCPUs (0,1,2) would be busy and other 3 (3,4,5) will be
> > idle. The vCPUs that are busy will start seeing steal time of around 33%
> > because they cant run completely on the Physical CPU. Without the change,
> > they will start seeing their capacity decrease. While the idle vCPUs(3,4,5)
> > ones will have their capacity intact. So when the scheduler switches the 3
> > tasks to the idle vCPUs, the newer busy vCPUs (3,4,5) will start seeing steal
> > and hence see their CPU capacity drops while the newer idle vCPUs (0,1,2)
> > will see their capacity increase since their steal time reduces. Hence the
> > tasks will be migrated again.
> 
> Thanks for the details
> This is probably even more visible when vcpu are not pinned to separate cpu

If the workload runs on vCPUs pinned to CPUs belonging to the same core,
then yes, steal may be less visible. However, if the workload runs unpinned,
or on vCPUs pinned to CPUs belonging to different cores, then it's more
visible.

> > >
> > > > can repeat continuously, resulting in ping-pong behavior between SMT
> > > > siblings.
> > >
> > > Does it mean that the vCPU generates its own steal time or is it
> > > because other vcpus are already running on the other CPU and they
> > > starts to steal time on the sibling vCPU
> >
> > There are other vCPUs running and sharing the same Physical CPU, and hence
> > these vCPUs are seeing steal time.
> >
> > >
> > > >
> > > > To avoid migrations solely triggered by steal time, disable steal from
> > > > updating CPU capacity when running in shared processor mode.
> > >
> > > You are disabling the steal time accounting only for your arch. Does
> > > it mean that only powerpc are impacted by this effect ?
> >
> > On PowerVM, the hypervisor schedules at a core granularity. So in the above
> > scenario, if we assume SMT to be 2, then we have 3 vCores and 1 Physical
> > core. So even if 2 threads are running, they would be scheduled on 2 vCores
> > and hence we would start seeing 50% steal. So this steal accounting is more
> > predominant on Shared LPARs running on PowerVM.
> >
> > However we can use this same mechanism on other architectures too since the
> > framework is arch independent.
> >
> > Does this clarify?
> 
> yes, thanks
> I see 2 problems in your use case, the idle cpu doesn't have steal
> time even if the host cpu on which it will run, is already busy with
> other things
> and with not pinned vcpu, we can't estimate what will be the steal
> time on the target host
> And I don't see a simple way other than disabling steal time
> 

Yes, we can neither account steal time for an idle sibling nor estimate
the steal time for the target CPU. Thanks for acknowledging the problem.

-- 
Thanks and Regards
Srikar Dronamraju



end of thread, other threads:[~2025-11-06  5:23 UTC | newest]

2025-10-29  6:07 [PATCH v2 1/2] sched/core: Option if steal should update CPU capacity Srikar Dronamraju
2025-10-29  6:07 ` [PATCH v2 2/2] powerpc/smp: Disable steal from updating " Srikar Dronamraju
2025-10-29  7:43   ` Vincent Guittot
2025-10-29  8:31     ` Srikar Dronamraju
2025-11-03  8:46       ` Vincent Guittot
2025-11-06  5:22         ` Srikar Dronamraju
