From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Dan Schatzberg <schatzberg.dan@gmail.com>,
Emil Tsalapatis <etsal@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 02/13] sched_ext: Make slice values tunable and use shorter slice in bypass mode
Date: Mon, 10 Nov 2025 08:59:56 +0100
Message-ID: <aRGbfO7COrEpijtd@gpd4>
In-Reply-To: <aRGOSSN9PiCBCoWy@gpd4>
On Mon, Nov 10, 2025 at 08:03:45AM +0100, Andrea Righi wrote:
> Hi Tejun,
>
> On Sun, Nov 09, 2025 at 08:31:01AM -1000, Tejun Heo wrote:
> > There have been reported cases of bypass mode not making forward progress fast
> > enough. The 20ms default slice is unnecessarily long for bypass mode where the
> > primary goal is ensuring all tasks can make forward progress.
> >
> > Introduce SCX_SLICE_BYPASS set to 5ms and make the scheduler automatically
> > switch to it when entering bypass mode. Also make both the default and bypass
> > slice values tunable through module parameters (slice_dfl_us and
> > slice_bypass_us, adjustable between 100us and 100ms) to make it easier to test
> > whether slice durations are a factor in problem cases. Note that the configured
> > values are applied through bypass mode switching and thus are guaranteed to
> > apply only during scheduler [un]load operations.
>
> IIRC, Changwoo suggested introducing a tunable to change the default time
> slice a while back.
>
> I agree that slice_bypass_us can be a tunable in sysfs, but I think it'd be
> nicer if the default time slice were a property of sched_ext_ops. Is
> there any reason not to do that?
Moreover (not necessarily for this patchset, we can add this later), should
we turn SCX_SLICE_DFL into a special value (e.g., 0) and have the
schedulers that currently rely on it automatically pick up the new global
default time slice internally?
Thanks,
-Andrea
>
> Thanks,
> -Andrea
>
> >
> > Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
> > Cc: Emil Tsalapatis <etsal@meta.com>
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> > ---
> > include/linux/sched/ext.h | 11 +++++++++++
> > kernel/sched/ext.c | 37 ++++++++++++++++++++++++++++++++++---
> > 2 files changed, 45 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
> > index eb776b094d36..9f5b0f2be310 100644
> > --- a/include/linux/sched/ext.h
> > +++ b/include/linux/sched/ext.h
> > @@ -17,7 +17,18 @@
> > enum scx_public_consts {
> > SCX_OPS_NAME_LEN = 128,
> >
> > + /*
> > + * %SCX_SLICE_DFL is used to refill slices when the BPF scheduler fails
> > + * to set the slice for a task that is selected for execution.
> > + * %SCX_EV_REFILL_SLICE_DFL counts the number of times the default slice
> > + * refill has been triggered.
> > + *
> > + * %SCX_SLICE_BYPASS is used as the slice for all tasks in bypass
> > + * mode. As making forward progress for all tasks is the main goal of
> > + * bypass mode, a shorter slice is used.
> > + */
> > SCX_SLICE_DFL = 20 * 1000000, /* 20ms */
> > + SCX_SLICE_BYPASS = 5 * 1000000, /* 5ms */
> > SCX_SLICE_INF = U64_MAX, /* infinite, implies nohz */
> > };
> >
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index cf8d86a2585c..2ce226018dbe 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> > @@ -143,6 +143,35 @@ static struct scx_dump_data scx_dump_data = {
> > /* /sys/kernel/sched_ext interface */
> > static struct kset *scx_kset;
> >
> > +/*
> > + * Parameters that can be adjusted through /sys/module/sched_ext/parameters.
> > + * There usually is no reason to modify these as normal scheduler operation
> > + * shouldn't be affected by them. The knobs are primarily for debugging.
> > + */
> > +static u64 scx_slice_dfl = SCX_SLICE_DFL;
> > +static unsigned int scx_slice_dfl_us = SCX_SLICE_DFL / NSEC_PER_USEC;
> > +static unsigned int scx_slice_bypass_us = SCX_SLICE_BYPASS / NSEC_PER_USEC;
> > +
> > +static int set_slice_us(const char *val, const struct kernel_param *kp)
> > +{
> > + return param_set_uint_minmax(val, kp, 100, 100 * USEC_PER_MSEC);
> > +}
> > +
> > +static const struct kernel_param_ops slice_us_param_ops = {
> > + .set = set_slice_us,
> > + .get = param_get_uint,
> > +};
> > +
> > +#undef MODULE_PARAM_PREFIX
> > +#define MODULE_PARAM_PREFIX "sched_ext."
> > +
> > +module_param_cb(slice_dfl_us, &slice_us_param_ops, &scx_slice_dfl_us, 0600);
> > +MODULE_PARM_DESC(slice_dfl_us, "default slice in microseconds, applied on [un]load (100us to 100ms)");
> > +module_param_cb(slice_bypass_us, &slice_us_param_ops, &scx_slice_bypass_us, 0600);
> > +MODULE_PARM_DESC(slice_bypass_us, "bypass slice in microseconds, applied on [un]load (100us to 100ms)");
> > +
> > +#undef MODULE_PARAM_PREFIX
> > +
> > #define CREATE_TRACE_POINTS
> > #include <trace/events/sched_ext.h>
> >
> > @@ -919,7 +948,7 @@ static void dsq_mod_nr(struct scx_dispatch_q *dsq, s32 delta)
> >
> > static void refill_task_slice_dfl(struct scx_sched *sch, struct task_struct *p)
> > {
> > - p->scx.slice = SCX_SLICE_DFL;
> > + p->scx.slice = scx_slice_dfl;
> > __scx_add_event(sch, SCX_EV_REFILL_SLICE_DFL, 1);
> > }
> >
> > @@ -2892,7 +2921,7 @@ void init_scx_entity(struct sched_ext_entity *scx)
> > INIT_LIST_HEAD(&scx->runnable_node);
> > scx->runnable_at = jiffies;
> > scx->ddsp_dsq_id = SCX_DSQ_INVALID;
> > - scx->slice = SCX_SLICE_DFL;
> > + scx->slice = scx_slice_dfl;
> > }
> >
> > void scx_pre_fork(struct task_struct *p)
> > @@ -3770,6 +3799,7 @@ static void scx_bypass(bool bypass)
> > WARN_ON_ONCE(scx_bypass_depth <= 0);
> > if (scx_bypass_depth != 1)
> > goto unlock;
> > + scx_slice_dfl = scx_slice_bypass_us * NSEC_PER_USEC;
> > bypass_timestamp = ktime_get_ns();
> > if (sch)
> > scx_add_event(sch, SCX_EV_BYPASS_ACTIVATE, 1);
> > @@ -3778,6 +3808,7 @@ static void scx_bypass(bool bypass)
> > WARN_ON_ONCE(scx_bypass_depth < 0);
> > if (scx_bypass_depth != 0)
> > goto unlock;
> > + scx_slice_dfl = scx_slice_dfl_us * NSEC_PER_USEC;
> > if (sch)
> > scx_add_event(sch, SCX_EV_BYPASS_DURATION,
> > ktime_get_ns() - bypass_timestamp);
> > @@ -4776,7 +4807,7 @@ static int scx_enable(struct sched_ext_ops *ops, struct bpf_link *link)
> > queue_flags |= DEQUEUE_CLASS;
> >
> > scoped_guard (sched_change, p, queue_flags) {
> > - p->scx.slice = SCX_SLICE_DFL;
> > + p->scx.slice = scx_slice_dfl;
> > p->sched_class = new_class;
> > }
> > }
> > --
> > 2.51.1
> >