LinuxPPC-Dev Archive on lore.kernel.org
* [RFC PATCH v4 00/17] Paravirt CPUs and push task for less vCPU preemption
@ 2025-11-19  6:20 Shrikanth Hegde
  2025-11-19  6:20 ` [RFC PATCH v4 01/17] sched/docs: Document cpu_paravirt_mask and Paravirt CPU concept Shrikanth Hegde
                   ` (17 more replies)
  0 siblings, 18 replies; 25+ messages in thread
From: Shrikanth Hegde @ 2025-11-19  6:20 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev
  Cc: sshegde, mingo, peterz, juri.lelli, vincent.guittot, tglx,
	yury.norov, maddy, srikar, gregkh, pbonzini, seanjc,
	kprateek.nayak, vschneid, iii, huschle, rostedt, dietmar.eggemann,
	christophe.leroy

The detailed problem statement and some of the implementation choices were
discussed earlier[1].

[1]: https://lore.kernel.org/all/20250910174210.1969750-1-sshegde@linux.ibm.com/

This is likely the version that will be used for the LPC2025 discussion on
this topic. Please share your suggestions. I am hoping for a solution
that works for different architectures and their use cases.

All the existing alternatives, such as CPU hotplug or creating isolated
partitions, break user affinity. Since the number of CPUs to use changes
depending on steal time, the change is not driven by the user, so it would
be wrong to break the affinity. With this series, a task that is pinned
only to paravirt CPUs continues to run there.
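
The affinity rule described above can be sketched roughly as follows. This
is only an illustration that models CPU masks as Python sets; the name
`usable_cpus` is hypothetical and does not come from the series, and the
real logic lives in the scheduler patches.

```python
def usable_cpus(affinity, paravirt):
    """Return the CPUs a task may run on under the rule sketched above.

    affinity: set of CPU ids the user pinned the task to (never modified).
    paravirt: set of CPU ids currently marked as paravirt.
    Both the function name and the set-based model are assumptions.
    """
    preferred = affinity - paravirt
    # If every allowed CPU is paravirt, keep the task where the user put it
    # instead of breaking its affinity.
    return preferred if preferred else set(affinity)
```

For example, a task allowed on CPUs 0-3 with CPUs 2-3 marked paravirt is
steered to CPUs 0-1, while a task pinned to CPUs 2-3 stays on CPUs 2-3.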

Changes compared to v3[1]:

- Introduced computation of steal time in powerpc code.
- Derive number of CPUs to use and mark the remaining as paravirt based
  on steal values. 
- Provide debugfs knobs to alter how steal time values are used.
- Removed static key check for paravirt CPUs (Yury)
- Removed preempt_disable/enable while calling stopper (Prateek)
- Made select_idle_sibling and friends aware of paravirt CPUs.
- Removed 3 unused schedstat fields and introduced 2 related to paravirt
  handling.
- Handled the nohz_full case by enabling the tick on such a CPU when a
  CFS/RT task is on it.
- Updated helper patch to override arch behaviour for easier debugging
  during development.
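
As a rough illustration of the steal-based items in the change list above
(derive the number of CPUs to use, mark the rest as paravirt), here is a
minimal sketch. The thresholds, the step size, and the choice to mark the
highest-numbered CPUs are all assumptions made for the example. They are
not the values or the policy used by the powerpc patches.

```python
def next_cpu_count(current, total, steal_pct,
                   high=10.0, low=2.0, step=1, floor=1):
    """Pick how many CPUs to keep in use for the next interval.

    high, low, step and floor are illustrative assumptions only.
    """
    if steal_pct > high:
        # Too much steal: use fewer CPUs, but never fewer than `floor`.
        return max(floor, current - step)
    if steal_pct < low:
        # Little steal: grow back towards all CPUs.
        return min(total, current + step)
    return current


def paravirt_set(total, in_use):
    """Mark the CPUs beyond the in-use count as paravirt (assumed policy)."""
    return set(range(in_use, total))
```

With 8 CPUs and 25% steal, `next_cpu_count(8, 8, 25.0)` yields 7, and
`paravirt_set(8, 7)` marks CPU 7 as paravirt.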

TODO: 

- Get performance numbers on PowerPC, x86 and S390. Hopefully by next
  week. Didn't want to hold the series till then.

- The selection of CPUs to mark as paravirt is very simple and doesn't work
  when vCPUs aren't spread out uniformly across NUMA nodes. Ideally, the
  numbers would be split based on how many CPUs each NUMA node has. That is
  quite tricky to do, especially since a cpumask can be on the stack too,
  given that NR_CPUS can be 8192 and nr_possible_nodes 32. I haven't got my
  head around solving it yet. Maybe there is an easier way.

- DLPAR Add/Remove needs to call init of EC/VP cores (powerpc specific)

- Awareness in userspace tools such as irqbalance.

- Delve into the design of a hint from the Hypervisor (HW Hint), i.e. the
  host informs the guest which/how many CPUs it has to use at this moment.
  This interface should work across archs, with each arch doing its
  specific handling.

- Determine the default values for steal time related knobs
  empirically and document them.

- Need to check safety against CPU hotplug, especially in process_steal.
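
For the NUMA item in the TODO list above, one possible way to split the
number of CPUs to use in proportion to node size is the largest-remainder
method. The sketch below only shows the arithmetic. It ignores the cpumask
and stack-size concerns raised in that item, and `split_by_node` is a
hypothetical name, not something from the series.

```python
def split_by_node(node_sizes, in_use):
    """Distribute `in_use` CPUs across nodes in proportion to node size.

    node_sizes: list with the number of CPUs in each NUMA node.
    Returns a list with the number of CPUs to keep in use per node.
    The largest-remainder method makes the shares sum to `in_use`.
    """
    total = sum(node_sizes)
    if total == 0:
        return [0] * len(node_sizes)
    # Clamp the request to what actually exists.
    in_use = max(0, min(in_use, total))
    # Floor share per node, plus the remainder used to rank the nodes.
    shares = [size * in_use // total for size in node_sizes]
    remainders = [size * in_use % total for size in node_sizes]
    leftover = in_use - sum(shares)
    # Hand the leftover CPUs to the nodes with the largest remainders,
    # breaking ties by lower node index.
    order = sorted(range(len(node_sizes)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For nodes with 12 and 4 CPUs and 8 CPUs to use, this keeps 6 CPUs on the
first node and 2 on the second. The remaining CPUs of each node would be
the ones marked as paravirt.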


Applies cleanly on tip/master:
commit c2ef745151b21d4dcc4b29a1eabf1096f5ba544b


Thanks to Srikar for providing the initial powerpc steal time handling
code. Thanks to all who went through the series and provided reviews.

PS: I haven't found a better name. Please suggest if you have any.

Shrikanth Hegde (17):
  sched/docs: Document cpu_paravirt_mask and Paravirt CPU concept
  cpumask: Introduce cpu_paravirt_mask
  sched/core: Dont allow to use CPU marked as paravirt
  sched/debug: Remove unused schedstats
  sched/fair: Add paravirt movements for proc sched file
  sched/fair: Pass current cpu in select_idle_sibling
  sched/fair: Don't consider paravirt CPUs for wakeup and load balance
  sched/rt: Don't select paravirt CPU for wakeup and push/pull rt task
  sched/core: Add support for nohz_full CPUs
  sched/core: Push current task from paravirt CPU
  sysfs: Add paravirt CPU file
  powerpc: method to initialize ec and vp cores
  powerpc: enable/disable paravirt CPUs based on steal time
  powerpc: process steal values at fixed intervals
  powerpc: add debugfs file for controlling handling on steal values
  sysfs: Provide write method for paravirt
  helper: disable arch handling if paravirt file being written

 .../ABI/testing/sysfs-devices-system-cpu      |   9 +
 Documentation/scheduler/sched-arch.rst        |  37 +++
 arch/powerpc/include/asm/smp.h                |   1 +
 arch/powerpc/kernel/smp.c                     |   1 +
 arch/powerpc/platforms/pseries/lpar.c         | 223 ++++++++++++++++++
 arch/powerpc/platforms/pseries/pseries.h      |   1 +
 drivers/base/cpu.c                            |  60 ++++-
 include/linux/cpumask.h                       |  20 ++
 include/linux/sched.h                         |   9 +-
 kernel/sched/core.c                           | 106 ++++++++-
 kernel/sched/debug.c                          |   5 +-
 kernel/sched/fair.c                           |  42 +++-
 kernel/sched/rt.c                             |  11 +-
 kernel/sched/sched.h                          |   9 +
 14 files changed, 519 insertions(+), 15 deletions(-)

-- 
2.47.3




Thread overview: 25+ messages
2025-11-19  6:20 [RFC PATCH v4 00/17] Paravirt CPUs and push task for less vCPU preemption Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 01/17] sched/docs: Document cpu_paravirt_mask and Paravirt CPU concept Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 02/17] cpumask: Introduce cpu_paravirt_mask Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 03/17] sched/core: Dont allow to use CPU marked as paravirt Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 04/17] sched/debug: Remove unused schedstats Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 05/17] sched/fair: Add paravirt movements for proc sched file Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 06/17] sched/fair: Pass current cpu in select_idle_sibling Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 07/17] sched/fair: Don't consider paravirt CPUs for wakeup and load balance Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 08/17] sched/rt: Don't select paravirt CPU for wakeup and push/pull rt task Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 09/17] sched/core: Add support for nohz_full CPUs Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 10/17] sched/core: Push current task from paravirt CPU Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 11/17] sysfs: Add paravirt CPU file Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 12/17] powerpc: method to initialize ec and vp cores Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 13/17] powerpc: enable/disable paravirt CPUs based on steal time Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 14/17] powerpc: process steal values at fixed intervals Shrikanth Hegde
2025-11-19  6:20 ` [RFC PATCH v4 15/17] powerpc: add debugfs file for controlling handling on steal values Shrikanth Hegde
2025-11-19  6:20 ` [HELPER PATCH 1] sysfs: Provide write method for paravirt Shrikanth Hegde
2025-11-19  7:42   ` Greg KH
2025-11-19  8:08     ` Shrikanth Hegde
2025-11-19  8:20       ` Christophe Leroy
2025-11-19 10:01         ` Shrikanth Hegde
2025-11-19  8:23       ` Greg KH
2025-11-19  9:56         ` Shrikanth Hegde
2025-11-19  6:21 ` [HELPER PATCH 2] helper: disable arch handling if paravirt file being written Shrikanth Hegde
2025-11-19 12:53 ` [RFC PATCH v4 00/17] Paravirt CPUs and push task for less vCPU preemption Shrikanth Hegde
