All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	sched-ext@lists.linux.dev, Emil Tsalapatis <emil@etsalapatis.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] sched_ext: Implement SCX_ENQ_IMMED
Date: Mon, 9 Mar 2026 18:35:37 +0100	[thread overview]
Message-ID: <aa8E6WauTRALRCqs@gpd4> (raw)
In-Reply-To: <20260307002817.1298341-3-tj@kernel.org>

Hi Tejun,

On Fri, Mar 06, 2026 at 02:28:16PM -1000, Tejun Heo wrote:
> Add SCX_ENQ_IMMED enqueue flag for inserting into local DSQs. It requests
> that the task be queued on the CPU's local DSQ only if it can execute
> immediately - the current task is done and no other tasks are waiting. If the
> CPU is busy, the task is re-enqueued back to the BPF scheduler with
> SCX_TASK_REENQ_IMMED so that it can be dispatched elsewhere. When multiple
> IMMED tasks are inserted, only the first one stays if the current task is
> done and the rest are re-enqueued.
> 
> One intended use case is enabling opportunistic CPU sharing across multiple
> sub-schedulers. Without this, a sub-scheduler can stuff the local DSQ of a
> shared CPU, making it difficult for others to use. More generally, multiple
> tasks on a local DSQ can cause high latencies, and stricter control can help.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
...
> @@ -3818,15 +3929,31 @@ static void process_deferred_reenq_locals(struct rq *rq)
>  			sch_pcpu = container_of(drl, struct scx_sched_pcpu,
>  						deferred_reenq_local);
>  			sch = sch_pcpu->sch;
> +
>  			reenq_flags = drl->flags;
>  			WRITE_ONCE(drl->flags, 0);
>  			list_del_init(&drl->node);
> +
> +			if (likely(drl->seq != seq)) {
> +				drl->seq = seq;
> +				drl->cnt = 0;
> +			} else {
> +				if (unlikely(++drl->cnt > SCX_REENQ_LOCAL_MAX_REPEAT)) {
> +					scx_error(sch, "SCX_ENQ_REENQ on SCX_DSQ_LOCAL repeated %u times",
> +						  drl->cnt);

Instead of triggering an error here, should we simply accept the task into
the local DSQ and ignore the fact that the CPU is busy?

I'm thinking at the SCX_OPS_ALWAYS_ENQ_IMMED scenario. In that case, the
scheduler can't clear the ENQ_IMMED flag, so it may hit this error loop
limitation, unless it explicitly bounces the task to a non-local DSQ at
some point.

> +					skip = true;
> +				}
> +
> +				__scx_add_event(sch, SCX_EV_REENQ_LOCAL_REPEAT, 1);
> +			}
>  		}
>  
> -		/* see schedule_dsq_reenq() */
> -		smp_mb();
> +		if (!skip) {
> +			/* see schedule_dsq_reenq() */
> +			smp_mb();
>  
> -		reenq_local(sch, rq, reenq_flags);
> +			reenq_local(sch, rq, reenq_flags);
> +		}
>  	}
>  }
...

> diff --git a/kernel/sched/ext_internal.h b/kernel/sched/ext_internal.h
> index f8df73044515..cd4272117be4 100644
> --- a/kernel/sched/ext_internal.h
> +++ b/kernel/sched/ext_internal.h
> @@ -31,6 +31,8 @@ enum scx_consts {
>  	SCX_BYPASS_LB_MIN_DELTA_DIV	= 4,
>  	SCX_BYPASS_LB_BATCH		= 256,
>  
> +	SCX_REENQ_LOCAL_MAX_REPEAT	= 256,

That's a lot of re-enqueues. What if we simply ignore SCX_ENQ_IMMED when
SCX_ENQ_REENQ is set?

This would solve the SCX_OPS_ALWAYS_ENQ_IMMED issue and naturally limit the
loop to a single retry:
 - first attempt (IMMED) fails -> task re-enqueued with REENQ flag,
 - second attempt sees REENQ -> ignores IMMED check -> queues normally on
   local DSQ.

This approach seems more robust and would avoid the latency overhead of
repeated failures (the re-enqueues were actually the reason of the latency
issues that I was experiencing). If I don't use SCX_OPS_ALWAYS_ENQ_IMMED
and I selectively use SCX_ENQ_IMMED with just one retry I can actually see
some small, but consistent, benefits with scx_cosmos running some latency
benchmarks.

What do you think?

Thanks,
-Andrea

  reply	other threads:[~2026-03-09 17:35 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-07  0:28 [PATCHSET sched_ext/for-7.1] sched_ext: Implement SCX_ENQ_IMMED Tejun Heo
2026-03-07  0:28 ` [PATCH 1/3] sched_ext: Disallow setting slice to zero via scx_bpf_task_set_slice() Tejun Heo
2026-03-07  0:28 ` [PATCH 2/3] sched_ext: Implement SCX_ENQ_IMMED Tejun Heo
2026-03-09 17:35   ` Andrea Righi [this message]
2026-03-13 10:40     ` Tejun Heo
2026-03-13 11:11       ` Andrea Righi
2026-03-13 11:32         ` Tejun Heo
2026-03-07  0:28 ` [PATCH 3/3] sched_ext: Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag Tejun Heo
2026-03-07 22:36 ` [PATCHSET sched_ext/for-7.1] sched_ext: Implement SCX_ENQ_IMMED Andrea Righi
2026-03-08  0:19   ` Tejun Heo
2026-03-08  8:54     ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aa8E6WauTRALRCqs@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=emil@etsalapatis.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.