public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Kuba Piecuch <jpiecuch@google.com>
Cc: Christian Loehle <christian.loehle@arm.com>,
	Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable()
Date: Thu, 9 Apr 2026 16:12:29 +0200	[thread overview]
Message-ID: <adezzWWsgrcGGD3j@gpd4> (raw)
In-Reply-To: <DHONT6QVA0PH.IP79IYRM8PKK@google.com>

On Thu, Apr 09, 2026 at 01:30:55PM +0000, Kuba Piecuch wrote:
> On Thu Apr 9, 2026 at 9:46 AM UTC, Christian Loehle wrote:
> ...
> >>>
> >>>     ops.init_task();            /* A new task is created */
> >>>     ops.enable();               /* Enable BPF scheduling for the task */
> >>>
> >>>     while (task in SCHED_EXT) {
> >>>         if (task can migrate)
> >>>             ops.select_cpu();   /* Called on wakeup (optimization) */
> >>>
> >>>         ops.runnable();         /* Task becomes ready to run */
> >>>
> >>>         while (task_is_runnable(task)) {
> >>>             if (task is not in a DSQ || task->scx.slice == 0) {
> >>>                 ops.enqueue();  /* Task can be added to a DSQ */
> >>>
> >>>                 /* Task property change (i.e., affinity, nice, etc.)? */
> >>>                 if (sched_change(task)) {
> >>>                     ops.dequeue(); /* Exiting BPF scheduler custody */
> >>>                     ops.quiescent();
> >>>
> >>>                     /* Property change callback, e.g. ops.set_weight() */
> >>>
> >>>                     ops.runnable();
> >>>                     continue;
> >>>                 }
> >>>
> >>>                 /* Any usable CPU becomes available */
> >>>
> >>>                 ops.dispatch();     /* Task is moved to a local DSQ */
> >>>                 ops.dequeue();      /* Exiting BPF scheduler custody */
> > Is this true here? Any dispatch followed by a dequeue?
> 
> The comment next to ops.dispatch() says the task is moved to a local DSQ,
> so if we assume that, then I think it will always be followed by ops.dequeue().
> Same if we move the task to the global DSQ.

So, ops.dispatch() is not a "task callback", it's a "CPU callback", invoked when
a CPU becomes available. So having ops.dispatch() here can be a bit confusing.

The intent was to describe the workflow where, once the task is enqueued to a
non-terminal DSQ, then it can be consumed by an ops.dispatch() event and, in
that case, ops.dequeue() is also invoked when the task reaches a terminal DSQ.
Not sure if there's a better way to express this concept in the pseudocode.

> 
> Of course, you could do something weird like dispatch the task to a user DSQ,
> in which case there won't be a dequeue and the task won't start running, but
> that's weird enough that I don't think we need to consider it.

Right. For the records, scx_rustland_core does something similar: from
ops.dispatch() it consumes a task from a BPF user ringbuffer, inserts it into a
user DSQ and then consumes the first task from the user DSQ via
scx_bpf_dsq_move_to_local().

But that's a bit of a special use case, due to the unusual user-space scheduling
part. But in this case the pseudocode is still accurate, since the
scx_bpf_dsq_move_to_local() triggers an ops.dequeue().

> 
> You could also have a property change racing with the dispatch which would make
> the dispatch fail and not be followed by a dequeue, but again, we need to draw
> the line somewhere.

Yeah, the thing is that sched_change() introduces a lot of different edge cases
that are difficult to represent in the pseudocode. I guess the best we can do is
document in a descriptive manner the concept of "BPF scheduler's custody" and
the fact that a task can temporarily leave the custody when a sched_change()
event happens.

> 
> So, in other words, any _successful_ dispatch to a _terminal_ DSQ is always
> followed by a dequeue.
> 
> Another case that isn't handled here is direct dispatch to a terminal DSQ from
> ops.enqueue(), where we don't get ops.dispatch() or ops.dequeue() and go
> straight to ops.running(). If any of the above cases should be handled in the
> pseudocode, I'd say it's this one.

Right, in fact it should be: any _successful_ dispatch to a _terminal DSQ_, if
the task was in the BPF scheduler's custody. A direct dispatch to a terminal DSQ
either from ops.select_cpu() or ops.enqueue() doesn't trigger ops.dequeue(),
because the task doesn't enter the BPF scheduler's custody.

Thanks,
-Andrea

  reply	other threads:[~2026-04-09 14:12 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-06 11:47 [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle Andrea Righi
2026-04-06 14:49 ` Emil Tsalapatis
2026-04-06 19:08   ` Andrea Righi
2026-04-06 18:09 ` Tejun Heo
2026-04-07  9:54 ` Kuba Piecuch
2026-04-07 16:31   ` Andrea Righi
2026-04-08  9:18     ` [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable() Kuba Piecuch
2026-04-08 11:28       ` Andrea Righi
2026-04-08 12:40         ` Kuba Piecuch
2026-04-08 13:49           ` Andrea Righi
2026-04-08 14:17             ` Kuba Piecuch
2026-04-08 14:54               ` Andrea Righi
2026-04-09  8:46                 ` Kuba Piecuch
2026-04-09  9:38                   ` Andrea Righi
2026-04-09  9:46                   ` Christian Loehle
2026-04-09 13:30                     ` Kuba Piecuch
2026-04-09 14:12                       ` Andrea Righi [this message]
2026-04-09 13:51                     ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adezzWWsgrcGGD3j@gpd4 \
    --to=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=christian.loehle@arm.com \
    --cc=emil@etsalapatis.com \
    --cc=jpiecuch@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=tj@kernel.org \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox