From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>,
Andrea Righi <andrea.righi@linux.dev>,
Changwoo Min <changwoo@igalia.com>
Cc: Dan Schatzberg <schatzberg.dan@gmail.com>,
Emil Tsalapatis <etsal@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
Tejun Heo <tj@kernel.org>
Subject: [PATCH v2 02/14] sched_ext: Make slice values tunable and use shorter slice in bypass mode
Date: Mon, 10 Nov 2025 10:56:24 -1000
Message-ID: <20251110205636.405592-3-tj@kernel.org>
In-Reply-To: <20251110205636.405592-1-tj@kernel.org>

There have been reported cases of bypass mode not making forward progress fast
enough. The 20ms default slice is unnecessarily long for bypass mode where the
primary goal is ensuring all tasks can make forward progress.

Introduce SCX_SLICE_BYPASS set to 5ms and make the scheduler automatically
switch to it when entering bypass mode. Also make the bypass slice value
tunable through the slice_bypass_us module parameter (adjustable between 100us
and 100ms) to make it easier to test whether slice durations are a factor in
problem cases.

v2: Removed slice_dfl_us module parameter. Fixed typos (Andrea).

Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
Cc: Emil Tsalapatis <etsal@meta.com>
Cc: Andrea Righi <andrea.righi@linux.dev>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
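Illustrative note, not part of the patch: the range check and unit conversion
described above can be modelled in userspace roughly as follows. This is only
a sketch under stated assumptions. parse_slice_us() and slice_us_to_ns() are
hypothetical names, the bounds are copied from the set_slice_us() hunk, and
the error codes are this sketch's own choice rather than a statement about
what param_set_uint_minmax() returns.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Values mirrored from the patch; the kernel defines them elsewhere. */
#define NSEC_PER_USEC	1000ULL
#define USEC_PER_MSEC	1000U
#define SLICE_US_MIN	100U			/* 100us lower bound */
#define SLICE_US_MAX	(100U * USEC_PER_MSEC)	/* 100ms upper bound */

/*
 * Hypothetical userspace stand-in for set_slice_us(): parse a decimal
 * string (optionally ending in one newline, as a sysfs write from a shell
 * would) and accept it only if it lies within [100us, 100ms].
 * Returns 0 on success and -EINVAL otherwise (sketch-specific choice).
 */
static int parse_slice_us(const char *val, unsigned int *out)
{
	char *end;
	unsigned long num;

	if (!val || *val == '\0' || *val == '-')
		return -EINVAL;

	errno = 0;
	num = strtoul(val, &end, 10);
	if (errno || end == val)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	if (num < SLICE_US_MIN || num > SLICE_US_MAX)
		return -EINVAL;

	*out = (unsigned int)num;
	return 0;
}

/* Convert a validated microsecond value to the nanosecond slice. */
static uint64_t slice_us_to_ns(unsigned int us)
{
	assert(us >= SLICE_US_MIN && us <= SLICE_US_MAX);
	return (uint64_t)us * NSEC_PER_USEC;
}
```

With these helpers, "5000" parses to 5000us and converts to 5000000ns, which
equals SCX_SLICE_BYPASS, while "99" and "100001" are rejected as out of range.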
include/linux/sched/ext.h | 11 +++++++++++
kernel/sched/ext.c | 34 +++++++++++++++++++++++++++++++---
2 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
index eb776b094d36..60285c3d07cf 100644
--- a/include/linux/sched/ext.h
+++ b/include/linux/sched/ext.h
@@ -17,7 +17,18 @@
enum scx_public_consts {
SCX_OPS_NAME_LEN = 128,
+ /*
+ * %SCX_SLICE_DFL is used to refill slices when the BPF scheduler fails to
+ * set the slice for a task that is selected for execution.
+ * %SCX_EV_REFILL_SLICE_DFL counts the number of times the default slice
+ * refill has been triggered.
+ *
+ * %SCX_SLICE_BYPASS is used as the slice for all tasks in the bypass
+ * mode. As making forward progress for all tasks is the main goal of
+ * the bypass mode, a shorter slice is used.
+ */
SCX_SLICE_DFL = 20 * 1000000, /* 20ms */
+ SCX_SLICE_BYPASS = 5 * 1000000, /* 5ms */
SCX_SLICE_INF = U64_MAX, /* infinite, implies nohz */
};
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index cf8d86a2585c..abf2075f174f 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -143,6 +143,32 @@ static struct scx_dump_data scx_dump_data = {
/* /sys/kernel/sched_ext interface */
static struct kset *scx_kset;
+/*
+ * Parameters that can be adjusted through /sys/module/sched_ext/parameters.
+ * There usually is no reason to modify these as normal scheduler operation
+ * shouldn't be affected by them. The knobs are primarily for debugging.
+ */
+static u64 scx_slice_dfl = SCX_SLICE_DFL;
+static unsigned int scx_slice_bypass_us = SCX_SLICE_BYPASS / NSEC_PER_USEC;
+
+static int set_slice_us(const char *val, const struct kernel_param *kp)
+{
+ return param_set_uint_minmax(val, kp, 100, 100 * USEC_PER_MSEC);
+}
+
+static const struct kernel_param_ops slice_us_param_ops = {
+ .set = set_slice_us,
+ .get = param_get_uint,
+};
+
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "sched_ext."
+
+module_param_cb(slice_bypass_us, &slice_us_param_ops, &scx_slice_bypass_us, 0600);
+MODULE_PARM_DESC(slice_bypass_us, "bypass slice in microseconds, applied on [un]load (100us to 100ms)");
+
+#undef MODULE_PARAM_PREFIX
+
#define CREATE_TRACE_POINTS
#include <trace/events/sched_ext.h>
@@ -919,7 +945,7 @@ static void dsq_mod_nr(struct scx_dispatch_q *dsq, s32 delta)
static void refill_task_slice_dfl(struct scx_sched *sch, struct task_struct *p)
{
- p->scx.slice = SCX_SLICE_DFL;
+ p->scx.slice = scx_slice_dfl;
__scx_add_event(sch, SCX_EV_REFILL_SLICE_DFL, 1);
}
@@ -2892,7 +2918,7 @@ void init_scx_entity(struct sched_ext_entity *scx)
INIT_LIST_HEAD(&scx->runnable_node);
scx->runnable_at = jiffies;
scx->ddsp_dsq_id = SCX_DSQ_INVALID;
- scx->slice = SCX_SLICE_DFL;
+ scx->slice = scx_slice_dfl;
}
void scx_pre_fork(struct task_struct *p)
@@ -3770,6 +3796,7 @@ static void scx_bypass(bool bypass)
WARN_ON_ONCE(scx_bypass_depth <= 0);
if (scx_bypass_depth != 1)
goto unlock;
+ scx_slice_dfl = scx_slice_bypass_us * NSEC_PER_USEC;
bypass_timestamp = ktime_get_ns();
if (sch)
scx_add_event(sch, SCX_EV_BYPASS_ACTIVATE, 1);
@@ -3778,6 +3805,7 @@ static void scx_bypass(bool bypass)
WARN_ON_ONCE(scx_bypass_depth < 0);
if (scx_bypass_depth != 0)
goto unlock;
+ scx_slice_dfl = SCX_SLICE_DFL;
if (sch)
scx_add_event(sch, SCX_EV_BYPASS_DURATION,
ktime_get_ns() - bypass_timestamp);
@@ -4776,7 +4804,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
queue_flags |= DEQUEUE_CLASS;
scoped_guard (sched_change, p, queue_flags) {
- p->scx.slice = SCX_SLICE_DFL;
+ p->scx.slice = scx_slice_dfl;
p->sched_class = new_class;
}
}
--
2.51.2
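
Illustrative note, not part of the patch: the way the scx_bypass() hunks swap
the effective default slice can be modelled in userspace as below. This is a
simplified sketch, not the kernel implementation. struct bypass_model,
model_init() and model_bypass() are hypothetical names, locking and event
accounting are omitted, and the sketch assumes the depth counter is adjusted
just before the checks that appear in the hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from the patch. */
#define NSEC_PER_USEC		1000ULL
#define SCX_SLICE_DFL		(20 * 1000000ULL)	/* 20ms */
#define SCX_SLICE_BYPASS	(5 * 1000000ULL)	/* 5ms */

/* Hypothetical model of the state touched by the scx_bypass() hunks. */
struct bypass_model {
	int depth;			/* models scx_bypass_depth */
	unsigned int slice_bypass_us;	/* models the module parameter */
	uint64_t slice_dfl;		/* models scx_slice_dfl, in ns */
};

static void model_init(struct bypass_model *m)
{
	m->depth = 0;
	m->slice_bypass_us = SCX_SLICE_BYPASS / NSEC_PER_USEC;
	m->slice_dfl = SCX_SLICE_DFL;
}

/*
 * Only the outermost enter/exit changes the slice: nested calls return
 * early, so the tunable is sampled once per bypass episode.
 */
static void model_bypass(struct bypass_model *m, bool bypass)
{
	if (bypass) {
		m->depth++;
		assert(m->depth > 0);	/* stands in for WARN_ON_ONCE() */
		if (m->depth != 1)
			return;
		m->slice_dfl = (uint64_t)m->slice_bypass_us * NSEC_PER_USEC;
	} else {
		m->depth--;
		assert(m->depth >= 0);	/* stands in for WARN_ON_ONCE() */
		if (m->depth != 0)
			return;
		m->slice_dfl = SCX_SLICE_DFL;
	}
}
```

In this model, changing slice_bypass_us while bypass is already active has no
effect until bypass is fully exited and entered again, which appears to be
consistent with the "applied on [un]load" wording in MODULE_PARM_DESC().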