From: Kuba Piecuch <jpiecuch@google.com>
To: Andrea Righi <arighi@nvidia.com>, Tejun Heo <tj@kernel.org>,
David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>
Cc: Emil Tsalapatis <emil@etsalapatis.com>,
Daniel Hodges <hodgesd@meta.com>, <sched-ext@lists.linux.dev>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
Date: Thu, 22 Jan 2026 09:28:39 +0000
Message-ID: <DFV0FQUGMEVB.321XK903AC0B9@google.com>
In-Reply-To: <20260121123118.964704-2-arighi@nvidia.com>

[Resending with reply-all, messed up the first time, apologies.]

Hi Andrea,

On Wed Jan 21, 2026 at 12:25 PM UTC, Andrea Righi wrote:
> Currently, ops.dequeue() is only invoked when the sched_ext core knows
> that a task resides in BPF-managed data structures, which causes it to
> miss scheduling property change scenarios. As a result, BPF schedulers
> cannot reliably track task state.
>
> In addition, some ops.dequeue() callbacks can be skipped (e.g., during
> direct dispatch), so ops.enqueue() calls are not always paired with a
> corresponding ops.dequeue(), potentially breaking accounting logic.
>
> Fix this by guaranteeing that every ops.enqueue() is matched with a
> corresponding ops.dequeue(), and introduce the SCX_DEQ_ASYNC flag to
> distinguish dequeues triggered by scheduling property changes from those
> occurring in the normal dispatch workflow.
>
> New semantics:
> 1. ops.enqueue() is called when a task enters the BPF scheduler
> 2. ops.dequeue() is called when the task leaves the BPF scheduler,
> because it is dispatched to a DSQ (regular workflow)
> 3. ops.dequeue(SCX_DEQ_ASYNC) is called when the task leaves the BPF
> scheduler, because a task property is changed (sched_change)
What about the case where ops.dequeue() is called due to core-sched picking the
task through sched_core_find()? If I understand core-sched correctly, it can
happen without prior dispatch, so it doesn't fit case 2, and we're not changing
task properties, so it doesn't fit case 3 either.
> + /*
> + * Set when ops.dequeue() is called after successful dispatch; used to
> + * distinguish dispatch dequeues from async dequeues (property changes)
> + * and to prevent duplicate dequeue calls.
> + */
> + SCX_TASK_DISPATCH_DEQUEUED = 1 << 4,
I see this flag being set and cleared in several places, but I don't see it
actually being read. Is that intentional?
> @@ -1529,6 +1553,17 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
>
> switch (opss & SCX_OPSS_STATE_MASK) {
> case SCX_OPSS_NONE:
> + if (SCX_HAS_OP(sch, dequeue) &&
> + p->scx.flags & SCX_TASK_OPS_ENQUEUED) {
> + bool is_async_dequeue =
> + !(deq_flags & (DEQUEUE_SLEEP | SCX_DEQ_CORE_SCHED_EXEC));
> +
> + if (is_async_dequeue)
> + SCX_CALL_OP_TASK(sch, SCX_KF_REST, dequeue, rq,
> + p, deq_flags | SCX_DEQ_ASYNC);
> + p->scx.flags &= ~(SCX_TASK_OPS_ENQUEUED |
> + SCX_TASK_DISPATCH_DEQUEUED);
> + }
> break;
> case SCX_OPSS_QUEUEING:
> /*
> @@ -1537,9 +1572,17 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
> */
> BUG();
> case SCX_OPSS_QUEUED:
> - if (SCX_HAS_OP(sch, dequeue))
> + /*
> + * Task is in the enqueued state. This is a property change
> + * dequeue before dispatch completes. Notify the BPF scheduler
> + * with SCX_DEQ_ASYNC flag.
> + */
> + if (SCX_HAS_OP(sch, dequeue)) {
> SCX_CALL_OP_TASK(sch, SCX_KF_REST, dequeue, rq,
> - p, deq_flags);
> + p, deq_flags | SCX_DEQ_ASYNC);
> + p->scx.flags &= ~(SCX_TASK_OPS_ENQUEUED |
> + SCX_TASK_DISPATCH_DEQUEUED);
> + }
>
A core-sched pick of a task queued on the BPF scheduler will result in entering
the SCX_OPSS_QUEUED case, which in turn will call ops.dequeue(SCX_DEQ_ASYNC).
This seems to conflict with the is_async_dequeue check above, which treats
SCX_DEQ_CORE_SCHED_EXEC as a synchronous dequeue.
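
To make the mismatch concrete, here is a rough userspace model of the two
quoted branches. It is only a sketch: the flag bit values, the opss_state
enum, struct deq_result and model_ops_dequeue() are made up for illustration,
and it assumes ops.dequeue() is implemented and SCX_TASK_OPS_ENQUEUED is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for this model only; the kernel's values differ. */
#define DEQUEUE_SLEEP           (1ULL << 0)
#define SCX_DEQ_CORE_SCHED_EXEC (1ULL << 1)
#define SCX_DEQ_ASYNC           (1ULL << 2)

/* Only the two ops states touched by the quoted hunks are modelled. */
enum opss_state { OPSS_NONE, OPSS_QUEUED };

struct deq_result {
	bool called;    /* would ops.dequeue() be invoked? */
	uint64_t flags; /* flags it would be invoked with */
};

/*
 * Mirrors the decision logic of the quoted SCX_OPSS_NONE and
 * SCX_OPSS_QUEUED branches, nothing else.
 */
static struct deq_result model_ops_dequeue(enum opss_state state,
					   uint64_t deq_flags)
{
	struct deq_result res = { false, 0 };

	assert(state == OPSS_NONE || state == OPSS_QUEUED);

	if (state == OPSS_NONE) {
		/* Sleep and core-sched dequeues are treated as not async. */
		bool is_async_dequeue =
			!(deq_flags & (DEQUEUE_SLEEP | SCX_DEQ_CORE_SCHED_EXEC));

		if (is_async_dequeue) {
			res.called = true;
			res.flags = deq_flags | SCX_DEQ_ASYNC;
		}
	} else {
		/* The QUEUED branch adds SCX_DEQ_ASYNC unconditionally. */
		res.called = true;
		res.flags = deq_flags | SCX_DEQ_ASYNC;
	}

	return res;
}
```

With deq_flags = SCX_DEQ_CORE_SCHED_EXEC, the model skips the callback in the
NONE state but reports a callback carrying both SCX_DEQ_CORE_SCHED_EXEC and
SCX_DEQ_ASYNC in the QUEUED state.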
Thanks,
Kuba