* [PATCH v5 00/24] sched: Introduce cpu_preferred_mask and steal-driven vCPU backoff
@ 2026-06-25 12:46 Shrikanth Hegde
From: Shrikanth Hegde @ 2026-06-25 12:46 UTC
  To: linux-kernel, mingo, peterz, juri.lelli, vincent.guittot,
	yury.norov, kprateek.nayak, iii, corbet
  Cc: sshegde, tglx, gregkh, pbonzini, seanjc, vschneid, huschle,
	rostedt, dietmar.eggemann, maddy, srikar, hdanton, chleroy,
	vineeth, frederic, arighi, pauld, christian.loehle, tj,
	tommaso.cucinotta, maz, rafael, rdunlap, kernellwp, linux-doc

Very briefly:
- Maintain the set of CPUs that workloads may use. It is denoted as
  cpu_preferred_mask.
- Periodically compute the steal time. If steal time is above the high
  threshold, reduce the preferred CPUs; if it is below the low
  threshold, increase them. This is handled in a new driver called
  steal_monitor.
- If a CPU is marked as non-preferred, push the task running on it to
  another CPU if possible.
- Use this CPU state in wakeup and load balance to ensure tasks run
  within preferred CPUs.
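
To make the control loop above concrete, here is a minimal userspace
sketch. It is not the code from the series. The threshold values, the
64-bit masks standing in for cpu_active_mask and cpu_preferred_mask,
the names steal_pct/inc_preferred/dec_preferred/steal_tick, and the
"drop the highest CPU, add the lowest CPU" policy are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical thresholds, in percent of CPU time stolen per interval. */
#define HIGH_THRESH_PCT 10
#define LOW_THRESH_PCT   2

/* Bit n represents CPU n; stand-ins for the kernel cpumasks. */
struct cpu_masks {
	uint64_t active;
	uint64_t preferred;
};

/* Steal as a percentage of the CPU time available in one interval. */
static unsigned int steal_pct(uint64_t steal_ns, uint64_t interval_ns,
			      unsigned int nr_cpus)
{
	if (!interval_ns || !nr_cpus)
		return 0;
	return (unsigned int)(steal_ns * 100 / (interval_ns * nr_cpus));
}

/* Assumed policy: drop the highest-numbered preferred CPU. */
static void dec_preferred(struct cpu_masks *m)
{
	/* Zero or one bit set: keep it, preferred must not become empty. */
	if ((m->preferred & (m->preferred - 1)) == 0)
		return;
	for (int cpu = 63; cpu >= 0; cpu--) {
		if (m->preferred & (1ULL << cpu)) {
			m->preferred &= ~(1ULL << cpu);
			return;
		}
	}
}

/* Assumed policy: add the lowest-numbered active, non-preferred CPU. */
static void inc_preferred(struct cpu_masks *m)
{
	uint64_t candidates = m->active & ~m->preferred;

	/* Isolates the lowest set bit; a no-op when candidates is 0. */
	m->preferred |= candidates & (~candidates + 1);
}

/* One periodic step: shrink on high steal, grow on low steal. */
static void steal_tick(struct cpu_masks *m, unsigned int pct)
{
	if (pct > HIGH_THRESH_PCT)
		dec_preferred(m);
	else if (pct < LOW_THRESH_PCT)
		inc_preferred(m);

	/* preferred must stay a non-empty subset of active. */
	assert(m->preferred && (m->preferred & ~m->active) == 0);
}
```

With four active CPUs all preferred, a step at 15% steal would leave
CPUs 0-2 preferred, a step at 5% would change nothing, and a step at
0% would bring CPU 3 back.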

For more details on the idea, problem statement and performance numbers,
please refer to the cover letter of v2 [2] and the OSPM talk [1].

*** Please review and provide your feedback!! ***

[1] OSPM talk: https://youtu.be/adxUKFPlOp0
[2] v2: https://lore.kernel.org/all/20260407191950.643549-1-sshegde@linux.ibm.com/#t
[3] v4: https://lore.kernel.org/all/20260617174139.155540-1-sshegde@linux.ibm.com/#t

Thank you very much for the feedback so far. It has helped the code
evolve towards clear abstraction layers and become simpler (hopefully).
Apologies in advance if I have missed any comment.

base commit:
tip/sched/core at c095741713d1 ("sched/fair: Fix newidle vs core-sched")

v4->v5:
- Move the steal time computation and the decision on preferred CPU
  state to a driver. Drop those changes from the core scheduler.
  (Yury Norov, K Prateek Nayak)
- A new driver called steal_monitor is added in drivers/virt/ (K Prateek Nayak)
  (Please let me know if there is a better place for it. I can move it
  there.)
- The new driver periodically computes steal time and
  increments/decrements the preferred CPUs.
- Debug knobs can be changed via module parameters. (Yury Norov)
- Default implementations are weak symbols. Archs may override them by
  providing strong symbols in a new arch-specific file.
- Everything is centered around CONFIG_PREFERRED_CPU. There is no new
  config option for the new driver. The driver gets built with the
  kernel, but is not loaded by default.
- Load the driver to enable steal_monitor functionality. Unload it to
  disable the same.
- Make CONFIG_PREFERRED_CPU depend on PARAVIRT && SMP (Yury Norov)
- Move set_cpu_preferred to a macro. (Yury Norov)
  With CONFIG_PREFERRED_CPU=n it just acts on active CPUs, so it
  shouldn't alter any functionality.
- Do a simple encoding for has_preferred_cpu_state, which aims to avoid
  repeated cpumask_intersects in is_cpu_allowed.
  (Please let me know if a new variable-based approach to is_cpu_allowed
  should be used instead.)
- Move select_fallback_rq above the rq_lock. (sashiko)
- A few documentation nitpicks (Randy Dunlap, sashiko)
- Avoid any decision for is_cpu_allowed for other classes (sashiko)
- Don't pull load towards non-preferred CPUs in idle and newidle
  balance. (Inferred from sashiko's comments)
- Fix leaking of task_struct in push_work_done (K Prateek Nayak)
- Module parameters aren't checked for sane values. One should know
  what one is writing to them. If 0 is written to interval_ms, it is
  reset to the default value to avoid a workqueue lockup.
- Added a few design-invariant checks in the periodic work to ensure
  any future arch-specific implementations follow them:
  1. preferred is a subset of active.
  2. preferred cannot be empty.
- Added documentation of steal_monitor in Documentation/driver-api/
  (Let me know if there is a better place for it.)
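
As an illustration of the interval_ms fallback and the two design
checks listed above, a small userspace sketch follows. It is only a
model, not the driver code: the default of 1000 ms, the 64-bit masks,
and the names sanitize_interval_ms/preferred_mask_valid/periodic_check
are assumptions, and the real driver may well report a violation with
a warning rather than an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical default; the actual default is not stated here. */
#define DEFAULT_INTERVAL_MS 1000U

/* A zero interval would requeue the work back to back, so fall back. */
static unsigned int sanitize_interval_ms(unsigned int interval_ms)
{
	return interval_ms ? interval_ms : DEFAULT_INTERVAL_MS;
}

/* Bit n represents CPU n in both masks. */
static bool preferred_mask_valid(uint64_t active, uint64_t preferred)
{
	/* Check 2: preferred cannot be empty. */
	if (!preferred)
		return false;
	/* Check 1: preferred is a subset of active. */
	return (preferred & ~active) == 0;
}

/* Stand-in for the check done from the periodic work. */
static void periodic_check(uint64_t active, uint64_t preferred)
{
	assert(preferred_mask_valid(active, preferred));
}
```

For example, with CPUs 0-3 active, a preferred mask of CPUs 0-1 passes
both checks, while an empty mask or a mask containing CPU 4 does not.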

Performance numbers are expected to be the same as or slightly better
than v2. With the driver, one major overhead in sched_tick has been
removed, i.e. finding the first housekeeping CPU, which was O(N).

Apologies in advance if any critical information is missing regarding
the new driver, such as policy, documentation or missing
implementation. Please let me know, and I can make those changes.
I have ensured checkpatch --strict is happy.

Also, I think there should be a MAINTAINERS file entry for the new
driver. I don't see a drivers/virt/* entry. It could be either a new
entry for the driver or a few files in the SCHEDULER entry.
Let me know if and what I should add. I am a bit cautious about such a
change. I am willing to maintain this driver, but other than that
I don't know what else is going to be necessary for it. I don't have
any maintainer experience either :)

PS: Sorry for the long CC list. Please mail me privately if you want to
be dropped from the CC list.

Shrikanth Hegde (24):
  sched/debug: Remove unused schedstats
  sched/docs: Document cpu_preferred_mask and Preferred CPU concept
  kconfig: Provide PREFERRED_CPU option
  cpumask: Introduce cpu_preferred_mask
  sysfs: Add preferred CPU file
  sched/core: allow only preferred CPUs in is_cpu_allowed
  sched/fair: Select preferred CPU at wakeup when possible
  sched/fair: load balance only among preferred CPUs
  sched/fair: Pull the load on preferred CPU
  sched/core: Keep tick on non-preferred CPUs until tasks are out
  sched/core: Push current task from non preferred CPU
  sched/debug: Add migration stats due to non preferred CPUs
  virt/steal_monitor: Add documentation
  virt: Introduce steal monitor driver
  virt/steal_monitor: Restore to active on module disable
  virt/steal_monitor: Define steal_monitor structure
  virt/steal_monitor: Add control knobs for handling steal values
  virt/steal_monitor: Compute work at regular intervals
  virt/steal_monitor: Provide default method to get systemwide steal
    time
  virt/steal_monitor: Provide default method to inc/dec preferred CPUs
  virt/steal_monitor: Provide default method to get num of CPUs for
    steal ratio
  virt/steal_monitor: Act on steal values at regular intervals
  virt/steal_monitor: Add direction control
  virt/steal_monitor: Add design check of preferred subset of active

 .../ABI/testing/sysfs-devices-system-cpu      |  11 ++
 Documentation/driver-api/index.rst            |   1 +
 Documentation/driver-api/steal-monitor.rst    |  93 ++++++++++++
 Documentation/scheduler/sched-arch.rst        |  50 +++++++
 drivers/base/cpu.c                            |   8 ++
 drivers/virt/Makefile                         |   1 +
 drivers/virt/steal_monitor/Makefile           |  14 ++
 drivers/virt/steal_monitor/defaults.c         | 105 ++++++++++++++
 drivers/virt/steal_monitor/sm_core.c          | 124 ++++++++++++++++
 drivers/virt/steal_monitor/sm_core.h          |  32 +++++
 include/linux/cpumask.h                       |  21 ++-
 include/linux/sched.h                         |   5 +-
 kernel/Kconfig.preempt                        |  14 ++
 kernel/cpu.c                                  |   6 +
 kernel/sched/core.c                           | 133 +++++++++++++++++-
 kernel/sched/debug.c                          |   4 +-
 kernel/sched/fair.c                           |  11 +-
 kernel/sched/sched.h                          |  36 +++++
 18 files changed, 659 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/driver-api/steal-monitor.rst
 create mode 100644 drivers/virt/steal_monitor/Makefile
 create mode 100644 drivers/virt/steal_monitor/defaults.c
 create mode 100644 drivers/virt/steal_monitor/sm_core.c
 create mode 100644 drivers/virt/steal_monitor/sm_core.h

-- 
2.47.3

