public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	sched-ext@lists.linux.dev, Emil Tsalapatis <emil@etsalapatis.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] sched_ext: Implement SCX_ENQ_IMMED
Date: Mon, 9 Mar 2026 18:35:37 +0100	[thread overview]
Message-ID: <aa8E6WauTRALRCqs@gpd4> (raw)
In-Reply-To: <20260307002817.1298341-3-tj@kernel.org>

Hi Tejun,

On Fri, Mar 06, 2026 at 02:28:16PM -1000, Tejun Heo wrote:
> Add SCX_ENQ_IMMED enqueue flag for inserting into local DSQs. It requests
> that the task be queued on the CPU's local DSQ only if it can execute
> immediately - the current task is done and no other tasks are waiting. If the
> CPU is busy, the task is re-enqueued back to the BPF scheduler with
> SCX_TASK_REENQ_IMMED so that it can be dispatched elsewhere. When multiple
> IMMED tasks are inserted, only the first one stays if the current task is
> done and the rest are re-enqueued.
> 
> One intended use case is enabling opportunistic CPU sharing across multiple
> sub-schedulers. Without this, a sub-scheduler can stuff the local DSQ of a
> shared CPU, making it difficult for others to use. More generally, multiple
> tasks on a local DSQ can cause high latencies, and stricter control can help.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
...
> @@ -3818,15 +3929,31 @@ static void process_deferred_reenq_locals(struct rq *rq)
>  			sch_pcpu = container_of(drl, struct scx_sched_pcpu,
>  						deferred_reenq_local);
>  			sch = sch_pcpu->sch;
> +
>  			reenq_flags = drl->flags;
>  			WRITE_ONCE(drl->flags, 0);
>  			list_del_init(&drl->node);
> +
> +			if (likely(drl->seq != seq)) {
> +				drl->seq = seq;
> +				drl->cnt = 0;
> +			} else {
> +				if (unlikely(++drl->cnt > SCX_REENQ_LOCAL_MAX_REPEAT)) {
> +					scx_error(sch, "SCX_ENQ_REENQ on SCX_DSQ_LOCAL repeated %u times",
> +						  drl->cnt);

Instead of triggering an error here, should we simply accept the task into
the local DSQ and ignore the fact that the CPU is busy?

I'm thinking of the SCX_OPS_ALWAYS_ENQ_IMMED scenario. In that case, the
scheduler can't clear the ENQ_IMMED flag, so it may hit this error-loop
limit unless it explicitly bounces the task to a non-local DSQ at some
point.

> +					skip = true;
> +				}
> +
> +				__scx_add_event(sch, SCX_EV_REENQ_LOCAL_REPEAT, 1);
> +			}
>  		}
>  
> -		/* see schedule_dsq_reenq() */
> -		smp_mb();
> +		if (!skip) {
> +			/* see schedule_dsq_reenq() */
> +			smp_mb();
>  
> -		reenq_local(sch, rq, reenq_flags);
> +			reenq_local(sch, rq, reenq_flags);
> +		}
>  	}
>  }
...

> diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> index f8df73044515..cd4272117be4 100644
> --- a/kernel/sched/ext_internal.h
> +++ b/kernel/sched/ext_internal.h
> @@ -31,6 +31,8 @@ enum scx_consts {
>  	SCX_BYPASS_LB_MIN_DELTA_DIV	= 4,
>  	SCX_BYPASS_LB_BATCH		= 256,
>  
> +	SCX_REENQ_LOCAL_MAX_REPEAT	= 256,

That's a lot of re-enqueues. What if we simply ignore SCX_ENQ_IMMED when
SCX_ENQ_REENQ is set?

This would solve the SCX_OPS_ALWAYS_ENQ_IMMED issue and naturally limit the
loop to a single retry:
 - first attempt (IMMED) fails -> task re-enqueued with REENQ flag,
 - second attempt sees REENQ -> ignores IMMED check -> queues normally on
   local DSQ.

This approach seems more robust and would avoid the latency overhead of
repeated failures (the re-enqueues were actually the reason for the
latency issues I was experiencing). If I don't use
SCX_OPS_ALWAYS_ENQ_IMMED and instead selectively use SCX_ENQ_IMMED with
just one retry, I see small but consistent benefits with scx_cosmos in
some latency benchmarks.

What do you think?

Thanks,
-Andrea

Thread overview: 11+ messages
2026-03-07  0:28 [PATCHSET sched_ext/for-7.1] sched_ext: Implement SCX_ENQ_IMMED Tejun Heo
2026-03-07  0:28 ` [PATCH 1/3] sched_ext: Disallow setting slice to zero via scx_bpf_task_set_slice() Tejun Heo
2026-03-07  0:28 ` [PATCH 2/3] sched_ext: Implement SCX_ENQ_IMMED Tejun Heo
2026-03-09 17:35   ` Andrea Righi [this message]
2026-03-13 10:40     ` Tejun Heo
2026-03-13 11:11       ` Andrea Righi
2026-03-13 11:32         ` Tejun Heo
2026-03-07  0:28 ` [PATCH 3/3] sched_ext: Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag Tejun Heo
2026-03-07 22:36 ` [PATCHSET sched_ext/for-7.1] sched_ext: Implement SCX_ENQ_IMMED Andrea Righi
2026-03-08  0:19   ` Tejun Heo
2026-03-08  8:54     ` Andrea Righi
