public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Andrea Righi <arighi@nvidia.com>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
Date: Sun, 28 Dec 2025 13:28:04 -1000	[thread overview]
Message-ID: <aVG9BBWIQLL8uJrQ@slm.duckdns.org> (raw)
In-Reply-To: <aVFmstVjLW_QIQis@slm.duckdns.org>

Hello,

On Sun, Dec 28, 2025 at 07:19:46AM -1000, Tejun Heo wrote:
> While this works, from the BPF sched's POV, there's no way to tell whether
> an ops.dequeue() call is from the task being actually dequeued or the
> follow-up to the dispatch operation it just did. This won't make much
> difference if ops.dequeue() is just used for accounting purposes, but, a
> scheduler which uses an arena data structure for queueing would likely need
> to perform extra tests to tell whether the task needs to be dequeued from
> the arena side. I *think* hot path (ops.dequeue() following the task's
> dispatch) can be a simple lockless test, so this may be okay, but from API
> POV, it can probably be better.
> 
> The counter interlocking point is scx_bpf_dsq_insert(). If we can
> synchronize scx_bpf_dsq_insert() and dequeue so that ops.dequeue() is not
> called for a successfully inserted task, I think the semantics would be
> neater - an enqueued task is either dispatched or dequeued. Due to the async
> dispatch operation, this likely is difficult to do without adding extra sync
> operations in scx_bpf_dsq_insert(). However, I *think* we may be able to get
> rid of dspc and async inserting if we call ops.dispatch() w/ rq lock
> dropped. That may make the whole dispatch path simpler and the behavior
> neater too. What do you think?

I sat down and went through the code to see whether I was actually making
sense, and I wasn't:

The async dispatch buffering is necessary to avoid lock inversion between rq
lock and whatever locks the BPF scheduler might be using internally. This is
necessary because enqueue path runs with rq lock held. Thus, any lock that
BPF sched uses in tne enqueue path has to nest inside rq lock.

In dispatch, scx_bpf_dsq_insert() is likely to be called with the same BPF
sched side lock held. If we try to do rq lock dancing synchronously, we can
end up trying to grab rq lock while holding BPF side lock leading to
deadlock.

Kernel side has no control over BPF side locking, so the asynchronous
operation is there to side-step the issue. I don't see a good way to make
this synchronous.

So, please ignore that part. That's non-sense. I still wonder whether we can
create some interlocking between scx_bpf_dsq_insert() and ops.dequeue()
without making hot path slower. I'll think more about it.

Thanks.

-- 
tejun

  reply	other threads:[~2025-12-28 23:28 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-19 22:43 [PATCH 0/2] sched_ext: Implement proper ops.dequeue() semantics Andrea Righi
2025-12-19 22:43 ` [PATCH 1/2] sched_ext: Fix " Andrea Righi
2025-12-28  3:20   ` Emil Tsalapatis
2025-12-29 16:36     ` Andrea Righi
2025-12-29 18:35       ` Emil Tsalapatis
2025-12-28 17:19   ` Tejun Heo
2025-12-28 23:28     ` Tejun Heo [this message]
2025-12-28 23:38       ` Tejun Heo
2025-12-29 17:07         ` Andrea Righi
2025-12-29 18:55           ` Emil Tsalapatis
2025-12-28 23:42   ` Tejun Heo
2025-12-29 17:17     ` Andrea Righi
2025-12-29  0:06   ` Tejun Heo
2025-12-29 18:56     ` Andrea Righi
2025-12-19 22:43 ` [PATCH 2/2] selftests/sched_ext: Add test to validate ops.dequeue() Andrea Righi
2025-12-28  3:28   ` Emil Tsalapatis
2025-12-29 16:11     ` Andrea Righi
  -- strict thread matches above, loose matches on Subject: below --
2026-01-21 12:25 [PATCHSET v2 sched_ext/for-6.20] sched_ext: Fix ops.dequeue() semantics Andrea Righi
2026-01-21 12:25 ` [PATCH 1/2] " Andrea Righi
2026-01-21 12:54   ` Christian Loehle
2026-01-21 12:57     ` Andrea Righi
2026-01-22  9:28   ` Kuba Piecuch
2026-01-23 13:32     ` Andrea Righi
2026-01-26  8:41 [PATCHSET v3 sched_ext/for-6.20] " Andrea Righi
2026-01-26  8:41 ` [PATCH 1/2] " Andrea Righi
2026-01-27 16:38   ` Emil Tsalapatis
2026-01-27 16:41   ` Kuba Piecuch
2026-01-30  7:34     ` Andrea Righi
2026-01-30 13:14       ` Kuba Piecuch
2026-01-31  6:54         ` Andrea Righi
2026-01-31 16:45           ` Kuba Piecuch
2026-01-31 17:24             ` Andrea Righi
2026-01-28 21:21   ` Tejun Heo
2026-01-30 11:54     ` Kuba Piecuch
2026-01-31  9:02       ` Andrea Righi
2026-01-31 17:53         ` Kuba Piecuch
2026-01-31 20:26           ` Andrea Righi
2026-02-02 15:19             ` Tejun Heo
2026-02-02 15:30               ` Andrea Righi
2026-02-01 17:43       ` Tejun Heo
2026-02-02 15:52         ` Andrea Righi
2026-02-02 16:23           ` Kuba Piecuch
2026-02-01  9:08 [PATCHSET v4 sched_ext/for-6.20] " Andrea Righi
2026-02-01  9:08 ` [PATCH 1/2] " Andrea Righi
2026-02-01 22:47   ` Christian Loehle
2026-02-02  7:45     ` Andrea Righi
2026-02-02  9:26       ` Andrea Righi
2026-02-02 10:02         ` Christian Loehle
2026-02-02 15:32           ` Andrea Righi
2026-02-02 10:09       ` Christian Loehle
2026-02-02 13:59       ` Kuba Piecuch
2026-02-04  9:36         ` Andrea Righi
2026-02-04  9:51           ` Kuba Piecuch
2026-02-02 11:56   ` Kuba Piecuch
2026-02-04 10:11     ` Andrea Righi
2026-02-04 10:33       ` Kuba Piecuch
2026-02-04 16:05 [PATCHSET v5] " Andrea Righi
2026-02-04 16:05 ` [PATCH 1/2] " Andrea Righi
2026-02-04 22:14   ` Tejun Heo
2026-02-05  9:26     ` Andrea Righi
2026-02-05 15:32 [PATCHSET v6] " Andrea Righi
2026-02-05 15:32 ` [PATCH 1/2] " Andrea Righi
2026-02-05 19:29   ` Kuba Piecuch
2026-02-05 21:32     ` Andrea Righi
2026-02-06 13:54 [PATCHSET v7] " Andrea Righi
2026-02-06 13:54 ` [PATCH 1/2] " Andrea Righi
2026-02-06 20:35   ` Emil Tsalapatis
2026-02-07  9:26     ` Andrea Righi
2026-02-09 17:28       ` Tejun Heo
2026-02-09 19:06         ` Andrea Righi
2026-02-10 21:26 [PATCHSET v8] " Andrea Righi
2026-02-10 21:26 ` [PATCH 1/2] " Andrea Righi
2026-02-10 23:20   ` Tejun Heo
2026-02-11 16:06     ` Andrea Righi
2026-02-11 19:47       ` Tejun Heo
2026-02-11 22:34         ` Andrea Righi
2026-02-11 22:37           ` Tejun Heo
2026-02-11 22:48             ` Andrea Righi
2026-02-12 10:16             ` Andrea Righi
2026-02-12 14:32               ` Christian Loehle
2026-02-12 15:45                 ` Andrea Righi
2026-02-12 17:07                   ` Tejun Heo
2026-02-12 18:14                     ` Andrea Righi
2026-02-12 18:35                       ` Tejun Heo
2026-02-12 22:30                         ` Andrea Righi
2026-02-14 10:16                           ` Andrea Righi
2026-02-14 17:56                             ` Tejun Heo
2026-02-14 19:32                               ` Andrea Righi
2026-02-10 23:54   ` Tejun Heo
2026-02-11 16:07     ` Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aVG9BBWIQLL8uJrQ@slm.duckdns.org \
    --to=tj@kernel.org \
    --cc=arighi@nvidia.com \
    --cc=changwoo@igalia.com \
    --cc=emil@etsalapatis.com \
    --cc=hodgesd@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox