public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Dan Schatzberg <schatzberg.dan@gmail.com>,
	Emil Tsalapatis <etsal@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 02/14] sched_ext: Use shorter slice in bypass mode
Date: Tue, 11 Nov 2025 19:07:32 +0100
Message-ID: <aRN7ZFrwY6OFleiE@gpd4>
In-Reply-To: <aRN1t_BT4IqYLdBj@slm.duckdns.org>

On Tue, Nov 11, 2025 at 07:43:19AM -1000, Tejun Heo wrote:
> There have been reported cases of bypass mode not making forward progress fast
> enough. The 20ms default slice is unnecessarily long for bypass mode where the
> primary goal is ensuring all tasks can make forward progress.
> 
> Introduce SCX_SLICE_BYPASS set to 5ms and make the scheduler automatically
> switch to it when entering bypass mode. Also make the bypass slice value
> tunable through the slice_bypass_us module parameter (adjustable between 100us
> and 100ms) to make it easier to test whether slice durations are a factor in
> problem cases.
> 
> v3: Use READ_ONCE/WRITE_ONCE for scx_slice_dfl access (Dan).
> 
> v2: Removed slice_dfl_us module parameter. Fixed typos (Andrea).
> 
> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
> Cc: Dan Schatzberg <schatzberg.dan@gmail.com>
> Cc: Andrea Righi <andrea.righi@linux.dev>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Looks good.

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> ---
>  include/linux/sched/ext.h |   11 +++++++++++
>  kernel/sched/ext.c        |   34 +++++++++++++++++++++++++++++++---
>  2 files changed, 42 insertions(+), 3 deletions(-)
> 
> --- a/include/linux/sched/ext.h
> +++ b/include/linux/sched/ext.h
> @@ -17,7 +17,18 @@
>  enum scx_public_consts {
>  	SCX_OPS_NAME_LEN	= 128,
>  
> +	/*
> +	 * %SCX_SLICE_DFL is used to refill slices when the BPF scheduler misses
> +	 * to set the slice for a task that is selected for execution.
> +	 * %SCX_EV_REFILL_SLICE_DFL counts the number of times the default slice
> +	 * refill has been triggered.
> +	 *
> +	 * %SCX_SLICE_BYPASS is used as the slice for all tasks in the bypass
> +	 * mode. As making forward progress for all tasks is the main goal of
> +	 * the bypass mode, a shorter slice is used.
> +	 */
>  	SCX_SLICE_DFL		= 20 * 1000000,	/* 20ms */
> +	SCX_SLICE_BYPASS	=  5 * 1000000, /*  5ms */
>  	SCX_SLICE_INF		= U64_MAX,	/* infinite, implies nohz */
>  };
>  
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -143,6 +143,32 @@ static struct scx_dump_data scx_dump_dat
>  /* /sys/kernel/sched_ext interface */
>  static struct kset *scx_kset;
>  
> +/*
> + * Parameters that can be adjusted through /sys/module/sched_ext/parameters.
> + * There usually is no reason to modify these as normal scheduler operation
> + * shouldn't be affected by them. The knobs are primarily for debugging.
> + */
> +static u64 scx_slice_dfl = SCX_SLICE_DFL;
> +static unsigned int scx_slice_bypass_us = SCX_SLICE_BYPASS / NSEC_PER_USEC;
> +
> +static int set_slice_us(const char *val, const struct kernel_param *kp)
> +{
> +	return param_set_uint_minmax(val, kp, 100, 100 * USEC_PER_MSEC);
> +}
> +
> +static const struct kernel_param_ops slice_us_param_ops = {
> +	.set = set_slice_us,
> +	.get = param_get_uint,
> +};
> +
> +#undef MODULE_PARAM_PREFIX
> +#define MODULE_PARAM_PREFIX	"sched_ext."
> +
> +module_param_cb(slice_bypass_us, &slice_us_param_ops, &scx_slice_bypass_us, 0600);
> +MODULE_PARM_DESC(slice_bypass_us, "bypass slice in microseconds, applied on [un]load (100us to 100ms)");
> +
> +#undef MODULE_PARAM_PREFIX
> +
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/sched_ext.h>
>  
> @@ -919,7 +945,7 @@ static void dsq_mod_nr(struct scx_dispat
>  
>  static void refill_task_slice_dfl(struct scx_sched *sch, struct task_struct *p)
>  {
> -	p->scx.slice = SCX_SLICE_DFL;
> +	p->scx.slice = READ_ONCE(scx_slice_dfl);
>  	__scx_add_event(sch, SCX_EV_REFILL_SLICE_DFL, 1);
>  }
>  
> @@ -2896,7 +2922,7 @@ void init_scx_entity(struct sched_ext_en
>  	INIT_LIST_HEAD(&scx->runnable_node);
>  	scx->runnable_at = jiffies;
>  	scx->ddsp_dsq_id = SCX_DSQ_INVALID;
> -	scx->slice = SCX_SLICE_DFL;
> +	scx->slice = READ_ONCE(scx_slice_dfl);
>  }
>  
>  void scx_pre_fork(struct task_struct *p)
> @@ -3774,6 +3800,7 @@ static void scx_bypass(bool bypass)
>  		WARN_ON_ONCE(scx_bypass_depth <= 0);
>  		if (scx_bypass_depth != 1)
>  			goto unlock;
> +		WRITE_ONCE(scx_slice_dfl, scx_slice_bypass_us * NSEC_PER_USEC);
>  		bypass_timestamp = ktime_get_ns();
>  		if (sch)
>  			scx_add_event(sch, SCX_EV_BYPASS_ACTIVATE, 1);
> @@ -3782,6 +3809,7 @@ static void scx_bypass(bool bypass)
>  		WARN_ON_ONCE(scx_bypass_depth < 0);
>  		if (scx_bypass_depth != 0)
>  			goto unlock;
> +		WRITE_ONCE(scx_slice_dfl, SCX_SLICE_DFL);
>  		if (sch)
>  			scx_add_event(sch, SCX_EV_BYPASS_DURATION,
>  				      ktime_get_ns() - bypass_timestamp);
> @@ -4780,7 +4808,7 @@ static int scx_enable(struct sched_ext_o
>  			queue_flags |= DEQUEUE_CLASS;
>  
>  		scoped_guard (sched_change, p, queue_flags) {
> -			p->scx.slice = SCX_SLICE_DFL;
> +			p->scx.slice = READ_ONCE(scx_slice_dfl);
>  			p->sched_class = new_class;
>  		}
>  	}


Thread overview: 28+ messages
2025-11-10 20:56 [PATCHSET v2 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
2025-11-10 20:56 ` [PATCH v2 01/14] sched_ext: Don't set ddsp_dsq_id during select_cpu in bypass mode Tejun Heo
2025-11-10 21:21   ` Emil Tsalapatis
2025-11-10 21:56   ` Tejun Heo
2025-11-10 20:56 ` [PATCH v2 02/14] sched_ext: Make slice values tunable and use shorter slice " Tejun Heo
2025-11-10 21:56   ` Emil Tsalapatis
2025-11-11 17:43   ` [PATCH v3 02/14] sched_ext: Use " Tejun Heo
2025-11-11 18:07     ` Andrea Righi [this message]
2025-11-10 20:56 ` [PATCH v2 03/14] sched_ext: Refactor do_enqueue_task() local and global DSQ paths Tejun Heo
2025-11-10 22:06   ` Emil Tsalapatis
2025-11-10 20:56 ` [PATCH v2 04/14] sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode Tejun Heo
2025-11-10 21:43   ` Emil Tsalapatis
2025-11-10 21:59     ` Tejun Heo
2025-11-10 23:26       ` Emil Tsalapatis
2025-11-10 20:56 ` [PATCH v2 05/14] sched_ext: Simplify breather mechanism with scx_aborting flag Tejun Heo
2025-11-11 16:34   ` Emil Tsalapatis
2025-11-10 20:56 ` [PATCH v2 06/14] sched_ext: Exit dispatch and move operations immediately when aborting Tejun Heo
2025-11-10 20:56 ` [PATCH v2 07/14] sched_ext: Make scx_exit() and scx_vexit() return bool Tejun Heo
2025-11-10 20:56 ` [PATCH v2 08/14] sched_ext: Refactor lockup handlers into handle_lockup() Tejun Heo
2025-11-10 20:56 ` [PATCH v2 09/14] sched_ext: Make handle_lockup() propagate scx_verror() result Tejun Heo
2025-11-10 20:56 ` [PATCH v2 10/14] sched_ext: Hook up hardlockup detector Tejun Heo
2025-11-11 18:33   ` [PATCH UPDATED " Tejun Heo
2025-11-11 18:39     ` Tejun Heo
2025-11-10 20:56 ` [PATCH v2 11/14] sched_ext: Add scx_cpu0 example scheduler Tejun Heo
2025-11-10 20:56 ` [PATCH v2 12/14] sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR Tejun Heo
2025-11-10 23:56   ` Emil Tsalapatis
2025-11-10 20:56 ` [PATCH v2 13/14] sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked() Tejun Heo
2025-11-10 20:56 ` [PATCH v2 14/14] sched_ext: Implement load balancer for bypass mode Tejun Heo