public inbox for linux-kernel@vger.kernel.org
* [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Split %SCX_DSQ_GLOBAL per-node
@ 2024-09-25  0:06 Tejun Heo
From: Tejun Heo @ 2024-09-25  0:06 UTC
  To: void; +Cc: kernel-team, linux-kernel, sched-ext

In bypass mode, the global DSQ is used to schedule all tasks in simple
FIFO order. All tasks are queued into the global DSQ and all CPUs try to
execute tasks from it. This creates a lot of cross-node cacheline accesses
and scheduling across node boundaries, and can lead to livelock
conditions where the system takes tens of minutes to disable the BPF
scheduler while executing in bypass mode.

This patchset splits the global DSQ per NUMA node. Each node has its own
global DSQ. When a task is dispatched to SCX_DSQ_GLOBAL, it's put into the
global DSQ of the node that the task's CPU belongs to, and the CPUs in a
node consume only from that node's global DSQ.
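
To illustrate the idea, here is a minimal userspace sketch of per-node
routing. It is not the code in kernel/sched/ext.c. The topology constants,
the ring-buffer queue, and the names cpu_to_node_sketch(),
dispatch_global() and consume_global() are all hypothetical and exist only
for this example.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES	2	/* hypothetical: two NUMA nodes */
#define CPUS_PER_NODE	4	/* hypothetical: four CPUs per node */
#define DSQ_CAP		16	/* hypothetical queue capacity */

struct task {
	int pid;
	int cpu;		/* CPU the task is associated with */
};

/* A small FIFO ring standing in for a dispatch queue. */
struct dsq {
	struct task *slots[DSQ_CAP];
	size_t head;
	size_t len;
};

/* One "global" queue per node instead of a single shared one. */
static struct dsq global_dsqs[NR_NODES];

/* Illustrative CPU-to-node mapping; real topologies are not this regular. */
static int cpu_to_node_sketch(int cpu)
{
	assert(cpu >= 0 && cpu < NR_NODES * CPUS_PER_NODE);
	return cpu / CPUS_PER_NODE;
}

/* Queue @p on the global queue of the node its CPU belongs to. */
static int dispatch_global(struct task *p)
{
	struct dsq *q = &global_dsqs[cpu_to_node_sketch(p->cpu)];

	if (q->len == DSQ_CAP)
		return -1;	/* queue full */
	q->slots[(q->head + q->len) % DSQ_CAP] = p;
	q->len++;
	return 0;
}

/* A CPU only pulls from its own node's queue, never from a remote one. */
static struct task *consume_global(int cpu)
{
	struct dsq *q = &global_dsqs[cpu_to_node_sketch(cpu)];
	struct task *p;

	if (q->len == 0)
		return NULL;
	p = q->slots[q->head];
	q->head = (q->head + 1) % DSQ_CAP;
	q->len--;
	return p;
}
```

In this sketch, a task queued from CPU 5 is only visible to CPUs 4-7, so
CPUs on the other node never touch that queue's cachelines.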

This resolves a livelock condition which could be reliably triggered on a
2x EPYC 7642 system by running `stress-ng --race-sched 1024` together with
`stress-ng --workload 80 --workload-threads 10` while repeatedly enabling
and disabling an SCX scheduler.

This patchset contains the following patches:

 0001-scx_flatcg-Use-a-user-DSQ-for-fallback-instead-of-SC.patch
 0002-sched_ext-Allow-only-user-DSQs-for-scx_bpf_consume-s.patch
 0003-sched_ext-Relocate-find_user_dsq.patch
 0004-sched_ext-Split-the-global-DSQ-per-NUMA-node.patch
 0005-sched_ext-Use-shorter-slice-while-bypassing.patch

 0001-0003 are preparations.

 0004 splits %SCX_DSQ_GLOBAL per-node.

 0005 reduces the time slice used while bypassing. This can make e.g. unloading
 of the BPF scheduler complete faster under heavy contention.
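
As a rough model of why a shorter slice helps, the sketch below estimates
how long a task waits in a FIFO queue. It is back-of-the-envelope
arithmetic, not the kernel's logic. The 20ms default and the divide-by-four
bypass slice are assumptions made for this example, and pick_slice_ns() and
fifo_wait_ns() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define SLICE_DFL_NS	(20ULL * 1000 * 1000)	/* assumed 20ms default slice */
#define SLICE_BYPASS_NS	(SLICE_DFL_NS / 4)	/* hypothetical shorter slice */

/* Pick the slice length depending on whether bypass mode is active. */
static uint64_t pick_slice_ns(int bypassing)
{
	return bypassing ? SLICE_BYPASS_NS : SLICE_DFL_NS;
}

/*
 * Estimated wait before a queued task first runs, assuming:
 *  - @ahead tasks are queued in front of it in FIFO order,
 *  - @nr_cpus CPUs start idle and consume from the same queue,
 *  - every task runs for its full slice.
 * The first @nr_cpus tasks start at once, the next batch after one slice,
 * and so on, hence the integer division.
 */
static uint64_t fifo_wait_ns(uint64_t ahead, uint64_t nr_cpus,
			     uint64_t slice_ns)
{
	assert(nr_cpus > 0);
	return (ahead / nr_cpus) * slice_ns;
}
```

Under these assumptions the wait scales linearly with the slice, so a task
such as the one performing the unload gets to run proportionally sooner
when the slice is cut.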

This patchset can also be found in the following git branch:

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-split-global

diffstat follows. Thanks.

 kernel/sched/ext.c               |  109 ++++++++++++++++++++++++++++++++++++++++++-------------------
 tools/sched_ext/scx_flatcg.bpf.c |   17 +++++++--
 2 files changed, 89 insertions(+), 37 deletions(-)

--
tejun


Thread overview: 15+ messages
2024-09-25  0:06 [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Split %SCX_DSQ_GLOBAL per-node Tejun Heo
2024-09-25  0:06 ` [PATCH 1/5] scx_flatcg: Use a user DSQ for fallback instead of SCX_DSQ_GLOBAL Tejun Heo
2024-09-25 16:45   ` David Vernet
2024-09-25  0:06 ` [PATCH 2/5] sched_ext: Allow only user DSQs for scx_bpf_consume(), scx_bpf_dsq_nr_queued() and bpf_iter_scx_dsq_new() Tejun Heo
2024-09-25 17:09   ` David Vernet
2024-09-25 21:04     ` Tejun Heo
2024-09-26 21:36       ` David Vernet
2024-09-25  0:06 ` [PATCH 3/5] sched_ext: Relocate find_user_dsq() Tejun Heo
2024-09-26 21:46   ` David Vernet
2024-09-25  0:06 ` [PATCH 4/5] sched_ext: Split the global DSQ per NUMA node Tejun Heo
2024-09-26 21:56   ` David Vernet
2024-09-25  0:06 ` [PATCH 5/5] sched_ext: Use shorter slice while bypassing Tejun Heo
2024-09-26 22:07   ` David Vernet
2024-09-26 22:55     ` Tejun Heo
2024-09-26 23:00 ` [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Split %SCX_DSQ_GLOBAL per-node Tejun Heo
