public inbox for linux-kernel@vger.kernel.org
From: Andrea Righi <arighi@nvidia.com>
To: Kuba Piecuch <jpiecuch@google.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	Emil Tsalapatis <emil@etsalapatis.com>
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
Date: Sat, 31 Jan 2026 10:02:19 +0100
Message-ID: <aX3FG00RNMv8VnQQ@gpd4>
In-Reply-To: <DG1WJEB6B0AC.151EBIUYXCR55@google.com>

On Fri, Jan 30, 2026 at 11:54:00AM +0000, Kuba Piecuch wrote:
> Hi Tejun,
> 
> On Wed Jan 28, 2026 at 9:21 PM UTC, Tejun Heo wrote:
> ...
> > 1. When to call ops.dequeue()?
> >
> > I'm not sure about deciding whether to call ops.dequeue() solely on whether
> > ops.enqueue() was called. Direct dispatch has been expanded to include other
> > DSQs but was originally added as a way to shortcut the dispatch path and
> > "dispatch directly" for execution from ops.select_cpu/enqueue() paths. ie.
> > When a task is dispatched directly to a local DSQ, the BPF scheduler is done
> > with that task - the task is now in the same state with tasks that get
> > dispatched to a local DSQ from ops.dispatch().
> >
> > ie. What effectively decides whether a task left the BPF scheduler is
> > whether the task reached a local DSQ or not, and direct dispatching into a
> > local DSQ shouldn't trigger ops.dequeue() - the task never really "queues"
> > on the BPF scheduler.
> 
> Is "local" short for "local or global", i.e. not user-created?
> Direct dispatching into the global DSQ also shouldn't trigger ops.dequeue(),
> since dispatch isn't necessary for the task to run. This follows from the last
> paragraph:
> 
>   Note that, this way, whether ops.dequeue() needs to be called agrees with
>   whether the task needs to be dispatched to run.
> 
> I agree with your points, just wanted to clarify this one thing.

I think this should be interpreted as local DSQs only
(SCX_DSQ_LOCAL / SCX_DSQ_LOCAL_ON), not any built-in DSQ. SCX_DSQ_GLOBAL is
essentially a built-in user DSQ, provided for convenience; it's not really
a "direct dispatch" DSQ.
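
To make the distinction concrete, here is a rough userspace sketch of the
rule as I read it. This is not kernel code: every model_* / MODEL_* name is
made up for illustration and does not reflect the kernel's DSQ ID encoding.
It only captures the assumption that SCX_DSQ_LOCAL and SCX_DSQ_LOCAL_ON count
as "local", while SCX_DSQ_GLOBAL is treated like a user DSQ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kinds of DSQ a task can be dispatched to. */
enum model_dsq_kind {
	MODEL_DSQ_LOCAL,	/* stands in for SCX_DSQ_LOCAL */
	MODEL_DSQ_LOCAL_ON,	/* stands in for SCX_DSQ_LOCAL_ON | cpu */
	MODEL_DSQ_GLOBAL,	/* stands in for SCX_DSQ_GLOBAL */
	MODEL_DSQ_USER,		/* a DSQ created by the BPF scheduler */
	MODEL_DSQ_NR,
};

/* Only the per-CPU local DSQs count as "local" under this interpretation. */
static bool model_is_local_dsq(enum model_dsq_kind kind)
{
	assert((unsigned int)kind < MODEL_DSQ_NR);
	return kind == MODEL_DSQ_LOCAL || kind == MODEL_DSQ_LOCAL_ON;
}

/*
 * A task direct-dispatched to a local DSQ never really queues on the BPF
 * scheduler, so no ops.dequeue() is expected for it. Any other target,
 * including the global DSQ, keeps the task on the BPF scheduler side.
 */
static bool model_needs_dequeue(enum model_dsq_kind kind)
{
	return !model_is_local_dsq(kind);
}
```

With this model, model_needs_dequeue(MODEL_DSQ_GLOBAL) is true, which is
exactly the point where my reading differs from "local or global".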

> 
> >
> > This creates another discrepancy - From ops.enqueue(), direct dispatching
> > into a non-local DSQ clearly makes the task enter the BPF scheduler and thus
> > its departure should trigger ops.dequeue(). What about a task which is
> > direct dispatched to a non-local DSQ from ops.select_cpu()? Superficially,
> > the right thing to do seems to skip ops.dequeue(). After all, the task has
> > never been ops.enqueue()'d. However, I think this is another case where
> > what's obvious doesn't agree with what's happening underneath.
> >
> > ops.select_cpu() cannot actually queue anything. It's too early. Direct
> > dispatch from ops.select_cpu() is a shortcut to schedule direct dispatch
> > once the enqueue path is invoked so that the BPF scheduler can avoid
> > invocation of ops.enqueue() when the decision has already been made. While
> > this shortcut was added for convenience (so that e.g. the BPF scheduler
> > doesn't have to pass a note from ops.select_cpu() to ops.enqueue()), it has
> > real performance implications as it does save a roundtrip through
> > ops.enqueue() and we know that such overheads do matter for some use cases
> > (e.g. maximizing FPS on certain games).
> >
> > So, while more subtle on the surface, I think the right thing to do is
> > basing the decision to call ops.dequeue() on the task's actual state -
> > ops.dequeue() should be called if the task is "on" the BPF scheduler - ie.
> > if the task ran ops.select_cpu/enqueue() paths and ended up in a non-local
> > DSQ or on the BPF side.
> >
> > The subtlety would need clear documentation and we probably want to allow
> > ops.dequeue() to distinguish different cases. If you boil it down to the
> > actual task state, I don't think it's that subtle - if a task is in the
> > custody of the BPF scheduler, ops.dequeue() will be called. Otherwise, not.
> > Note that, this way, whether ops.dequeue() needs to be called agrees with
> > whether the task needs to be dispatched to run.
> 
> Here's my attempt at documenting this behavior:
> 
> After ops.enqueue() is called on a task, the task is owned by the BPF
> scheduler, provided the task wasn't direct-dispatched to a local/global DSQ.
> When a task is owned by the BPF scheduler, the scheduler needs to dispatch the
> task to a local/global DSQ in order for it to run.
> When the BPF scheduler loses ownership of the task, either due to dispatching it
> to a local/global DSQ or due to external events (core-sched pick, CPU
> migration, scheduling property changes), the BPF scheduler is notified through
> ops.dequeue() with appropriate flags (TBD).

This looks good overall, except for the global DSQ part. Also, it might be
better to avoid the term "owned": internally the kernel already uses the
concept of "task ownership" with a different meaning (see
https://lore.kernel.org/all/aVHAZNbIJLLBHEXY@slm.duckdns.org), and reusing
it here could be misleading.

With that in mind, I'd probably rephrase your documentation along these
lines:

After ops.enqueue() is called, the task is considered *enqueued* by the BPF
scheduler, unless it is directly dispatched to a local DSQ (via
SCX_DSQ_LOCAL or SCX_DSQ_LOCAL_ON).

While a task is enqueued, the BPF scheduler must explicitly dispatch it to
a DSQ in order for it to run.

When a task leaves the enqueued state (either because it is dispatched to a
non-local DSQ, or due to external events such as a core-sched pick, CPU
migration, or scheduling property changes), ops.dequeue() is invoked to
notify the BPF scheduler, with flags indicating the reason for the dequeue:
regular dispatch dequeues have no flags set, whereas dequeues triggered by
scheduling property changes are reported with SCX_DEQ_SCHED_CHANGE.
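
As a sanity check of the wording above, the rules can be modeled with a tiny
userspace state machine. This is only a sketch under my own assumptions, not
the actual implementation: all model_* / MODEL_* names are hypothetical,
MODEL_DEQ_SCHED_CHANGE merely stands in for SCX_DEQ_SCHED_CHANGE, and only
the two flag cases named above are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical dequeue flags; the bit value is illustrative only. */
enum model_deq_flags {
	MODEL_DEQ_NONE		= 0,		/* regular dispatch dequeue */
	MODEL_DEQ_SCHED_CHANGE	= 1 << 0,	/* stands in for SCX_DEQ_SCHED_CHANGE */
};

/* Hypothetical reasons for a task to leave the enqueued state. */
enum model_event {
	MODEL_EV_DISPATCH,	/* the BPF scheduler dispatches the task */
	MODEL_EV_SCHED_CHANGE,	/* a scheduling property of the task changes */
};

struct model_task {
	bool enqueued;		/* is the task in the "enqueued" state? */
	int  nr_dequeue_calls;	/* how often ops.dequeue() would have run */
	int  last_deq_flags;	/* flags passed to the last modeled dequeue */
};

/* Models ops.enqueue(): a direct dispatch to a local DSQ skips the state. */
static void model_enqueue(struct model_task *t, bool direct_to_local_dsq)
{
	assert(t != NULL);
	t->enqueued = !direct_to_local_dsq;
}

/* Returns true if ops.dequeue() would be invoked for this event. */
static bool model_leave(struct model_task *t, enum model_event ev)
{
	assert(t != NULL);
	if (!t->enqueued)
		return false;	/* never enqueued, nothing to notify */

	t->enqueued = false;
	t->nr_dequeue_calls++;
	t->last_deq_flags = (ev == MODEL_EV_SCHED_CHANGE) ?
			    MODEL_DEQ_SCHED_CHANGE : MODEL_DEQ_NONE;
	return true;
}
```

The property I care about is that each model_enqueue() that puts the task in
the enqueued state is followed by at most one modeled ops.dequeue(), and that
a task direct-dispatched to a local DSQ never triggers one.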

What do you think?

Thanks,
-Andrea
