From: Andrea Righi <arighi@nvidia.com>
To: Emil Tsalapatis <emil@etsalapatis.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Kuba Piecuch <jpiecuch@google.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
Date: Sat, 7 Feb 2026 10:26:17 +0100
Message-ID: <aYcFOVlJhUU5huNd@gpd4>
In-Reply-To: <DG860JW64VVD.31BS2QTEB8XZQ@etsalapatis.com>

Hi Emil,

On Fri, Feb 06, 2026 at 03:35:34PM -0500, Emil Tsalapatis wrote:
> On Fri Feb 6, 2026 at 8:54 AM EST, Andrea Righi wrote:
...
> > diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
> > index bcb962d5ee7d8..c48f818eee9b8 100644
> > --- a/include/linux/sched/ext.h
> > +++ b/include/linux/sched/ext.h
> > @@ -84,6 +84,7 @@ struct scx_dispatch_q {
> >  /* scx_entity.flags */
> >  enum scx_ent_flags {
> >  	SCX_TASK_QUEUED		= 1 << 0, /* on ext runqueue */
> > +	SCX_TASK_NEED_DEQ	= 1 << 1, /* in BPF custody, needs ops.dequeue() when leaving */
> 
> Can we make this "SCX_TASK_IN_BPF"? Since we've now defined what it means to be
> in BPF custody vs the core scx scheduler (terminal DSQs) this is a more
> general property that can be useful to check in the future. An example:
> We can now assert that a task's BPF state is consistent with its actual 
> kernel state when using BPF-based data structures to manage tasks.

Ack. I like SCX_TASK_IN_BPF and I also like the idea of reusing the flag
for other purposes. It can be helpful for debugging as well.
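For example (rough sketch only, scx_debug_check_custody() is just a
made-up name here), we could add a check in places where the task is
known to be back under the core scheduler's control, e.g. right before
it starts running:

/*
 * Rough sketch: a task that is about to run should never still be
 * marked as being in BPF custody, since builtin DSQs are terminal and
 * clear the flag on dispatch.
 */
static inline void scx_debug_check_custody(struct task_struct *p)
{
	WARN_ON_ONCE(p->scx.flags & SCX_TASK_IN_BPF);
}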

> 
> >  	SCX_TASK_RESET_RUNNABLE_AT = 1 << 2, /* runnable_at should be reset */
> >  	SCX_TASK_DEQD_FOR_SLEEP	= 1 << 3, /* last dequeue was for SLEEP */
> >  
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index 0bb8fa927e9e9..d17fd9141adf4 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> > @@ -925,6 +925,27 @@ static void touch_core_sched(struct rq *rq, struct task_struct *p)
> >  #endif
> >  }
> >  
> > +/**
> > + * is_terminal_dsq - Check if a DSQ is terminal for ops.dequeue() purposes
> > + * @dsq_id: DSQ ID to check
> > + *
> > + * Returns true if @dsq_id is a terminal/builtin DSQ where the BPF
> > + * scheduler is considered "done" with the task.
> > + *
> > + * Builtin DSQs include:
> > + *  - Local DSQs (%SCX_DSQ_LOCAL or %SCX_DSQ_LOCAL_ON): per-CPU queues
> > + *    where tasks go directly to execution,
> > + *  - Global DSQ (%SCX_DSQ_GLOBAL): built-in fallback queue,
> > + *  - Bypass DSQ: used during bypass mode.
> > + *
> > + * Tasks dispatched to builtin DSQs exit BPF scheduler custody and do not
> > + * trigger ops.dequeue() when they are later consumed.
> > + */
> > +static inline bool is_terminal_dsq(u64 dsq_id)
> > +{
> > +	return dsq_id & SCX_DSQ_FLAG_BUILTIN;
> > +}
> > +
> >  /**
> >   * touch_core_sched_dispatch - Update core-sched timestamp on dispatch
> >   * @rq: rq to read clock from, must be locked
> > @@ -1008,7 +1029,8 @@ static void local_dsq_post_enq(struct scx_dispatch_q *dsq, struct task_struct *p
> >  		resched_curr(rq);
> >  }
> >  
> > -static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
> > +static void dispatch_enqueue(struct scx_sched *sch, struct rq *rq,
> > +			     struct scx_dispatch_q *dsq,
> >  			     struct task_struct *p, u64 enq_flags)
> >  {
> >  	bool is_local = dsq->id == SCX_DSQ_LOCAL;
> > @@ -1103,6 +1125,27 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
> >  	dsq_mod_nr(dsq, 1);
> >  	p->scx.dsq = dsq;
> >  
> > +	/*
> > +	 * Handle ops.dequeue() and custody tracking.
> > +	 *
> > +	 * Builtin DSQs (local, global, bypass) are terminal: the BPF
> > +	 * scheduler is done with the task. If it was in BPF custody, call
> > +	 * ops.dequeue() and clear the flag.
> > +	 *
> > +	 * User DSQs: Task is in BPF scheduler's custody. Set the flag so
> > +	 * ops.dequeue() will be called when it leaves.
> > +	 */
> > +	if (SCX_HAS_OP(sch, dequeue)) {
> > +		if (is_terminal_dsq(dsq->id)) {
> > +			if (p->scx.flags & SCX_TASK_NEED_DEQ)
> > +				SCX_CALL_OP_TASK(sch, SCX_KF_REST, dequeue,
> > +						 rq, p, 0);
> > +			p->scx.flags &= ~SCX_TASK_NEED_DEQ;
> > +		} else {
> > +			p->scx.flags |= SCX_TASK_NEED_DEQ;
> > +		}
> > +	}
> > +
> >  	/*
> >  	 * scx.ddsp_dsq_id and scx.ddsp_enq_flags are only relevant on the
> >  	 * direct dispatch path, but we clear them here because the direct
> > @@ -1323,7 +1366,7 @@ static void direct_dispatch(struct scx_sched *sch, struct task_struct *p,
> >  		return;
> >  	}
> >  
> > -	dispatch_enqueue(sch, dsq, p,
> > +	dispatch_enqueue(sch, rq, dsq, p,
> >  			 p->scx.ddsp_enq_flags | SCX_ENQ_CLEAR_OPSS);
> >  }
> >  
> > @@ -1407,13 +1450,22 @@ static void do_enqueue_task(struct rq *rq, struct task_struct *p, u64 enq_flags,
> >  	 * dequeue may be waiting. The store_release matches their load_acquire.
> >  	 */
> >  	atomic_long_set_release(&p->scx.ops_state, SCX_OPSS_QUEUED | qseq);
> > +
> > +	/*
> > +	 * Task is now in BPF scheduler's custody (queued on BPF internal
> > +	 * structures). Set %SCX_TASK_NEED_DEQ so ops.dequeue() is called
> > +	 * when it leaves custody (e.g. dispatched to a terminal DSQ or on
> > +	 * property change).
> > +	 */
> > +	if (SCX_HAS_OP(sch, dequeue))
> 
> Related to the rename: Can we remove the guards and track the flag
> regardless of whether ops.dequeue() is present?
> 
> There is no reason not to track whether a task is in BPF or the core, 
> and it is a property that's independent of whether we implement ops.dequeue(). 
> This also simplifies the code since we now just guard the actual ops.dequeue()
> call.

I was concerned about introducing overhead: with the guard we can save a
few memory writes to p->scx.flags. But I don't have numbers, and the
overhead is probably negligible.

Also, if we have a working ops.dequeue(), I guess more schedulers will
start implementing an ops.dequeue() callback, so the guard itself may
actually become the extra overhead.

So, I guess we can remove the guard and just set/clear the flag even
without an ops.dequeue() callback...
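IOW, something like this in dispatch_enqueue() (totally untested sketch,
using the proposed SCX_TASK_IN_BPF name):

	/*
	 * Untested sketch: track custody unconditionally, only the
	 * actual ops.dequeue() call stays behind the guard.
	 */
	if (is_terminal_dsq(dsq->id)) {
		if ((p->scx.flags & SCX_TASK_IN_BPF) &&
		    SCX_HAS_OP(sch, dequeue))
			SCX_CALL_OP_TASK(sch, SCX_KF_REST, dequeue,
					 rq, p, 0);
		p->scx.flags &= ~SCX_TASK_IN_BPF;
	} else {
		p->scx.flags |= SCX_TASK_IN_BPF;
	}

And in do_enqueue_task() we'd simply set SCX_TASK_IN_BPF unconditionally
after publishing the ops_state.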

Thanks,
-Andrea


Thread overview: 98+ messages
2026-02-06 13:54 [PATCHSET v7] sched_ext: Fix ops.dequeue() semantics Andrea Righi
2026-02-06 13:54 ` [PATCH 1/2] " Andrea Righi
2026-02-06 20:35   ` Emil Tsalapatis
2026-02-07  9:26     ` Andrea Righi [this message]
2026-02-09 17:28       ` Tejun Heo
2026-02-09 19:06         ` Andrea Righi
2026-02-06 13:54 ` [PATCH 2/2] selftests/sched_ext: Add test to validate " Andrea Righi
2026-02-06 20:10   ` Emil Tsalapatis
2026-02-07  9:16     ` Andrea Righi
2026-02-08  5:11       ` Emil Tsalapatis
2026-02-08  9:02         ` Andrea Righi
2026-02-08 10:26           ` Andrea Righi
2026-02-08 13:55             ` Andrea Righi
2026-02-08 17:59               ` Emil Tsalapatis
2026-02-08 20:08                 ` Andrea Righi
2026-02-09 10:20                   ` Andrea Righi
2026-02-09 15:00                     ` Emil Tsalapatis
2026-02-09 15:43                       ` Andrea Righi
2026-02-09 17:23                         ` Tejun Heo
2026-02-09 19:17                           ` Andrea Righi
2026-02-09 20:10                             ` Tejun Heo
2026-02-09 22:22                               ` Andrea Righi
2026-02-10  0:42                                 ` Tejun Heo
2026-02-10  7:29                                   ` Andrea Righi
  -- strict thread matches above, loose matches on Subject: below --
2026-02-10 21:26 [PATCHSET v8] sched_ext: Fix " Andrea Righi
2026-02-10 21:26 ` [PATCH 1/2] " Andrea Righi
2026-02-10 23:20   ` Tejun Heo
2026-02-11 16:06     ` Andrea Righi
2026-02-11 19:47       ` Tejun Heo
2026-02-11 22:34         ` Andrea Righi
2026-02-11 22:37           ` Tejun Heo
2026-02-11 22:48             ` Andrea Righi
2026-02-12 10:16             ` Andrea Righi
2026-02-12 14:32               ` Christian Loehle
2026-02-12 15:45                 ` Andrea Righi
2026-02-12 17:07                   ` Tejun Heo
2026-02-12 18:14                     ` Andrea Righi
2026-02-12 18:35                       ` Tejun Heo
2026-02-12 22:30                         ` Andrea Righi
2026-02-14 10:16                           ` Andrea Righi
2026-02-14 17:56                             ` Tejun Heo
2026-02-14 19:32                               ` Andrea Righi
2026-02-10 23:54   ` Tejun Heo
2026-02-11 16:07     ` Andrea Righi
2026-02-05 15:32 [PATCHSET v6] " Andrea Righi
2026-02-05 15:32 ` [PATCH 1/2] " Andrea Righi
2026-02-05 19:29   ` Kuba Piecuch
2026-02-05 21:32     ` Andrea Righi
2026-02-04 16:05 [PATCHSET v5] " Andrea Righi
2026-02-04 16:05 ` [PATCH 1/2] " Andrea Righi
2026-02-04 22:14   ` Tejun Heo
2026-02-05  9:26     ` Andrea Righi
2026-02-01  9:08 [PATCHSET v4 sched_ext/for-6.20] " Andrea Righi
2026-02-01  9:08 ` [PATCH 1/2] " Andrea Righi
2026-02-01 22:47   ` Christian Loehle
2026-02-02  7:45     ` Andrea Righi
2026-02-02  9:26       ` Andrea Righi
2026-02-02 10:02         ` Christian Loehle
2026-02-02 15:32           ` Andrea Righi
2026-02-02 10:09       ` Christian Loehle
2026-02-02 13:59       ` Kuba Piecuch
2026-02-04  9:36         ` Andrea Righi
2026-02-04  9:51           ` Kuba Piecuch
2026-02-02 11:56   ` Kuba Piecuch
2026-02-04 10:11     ` Andrea Righi
2026-02-04 10:33       ` Kuba Piecuch
2026-01-26  8:41 [PATCHSET v3 sched_ext/for-6.20] " Andrea Righi
2026-01-26  8:41 ` [PATCH 1/2] " Andrea Righi
2026-01-27 16:38   ` Emil Tsalapatis
2026-01-27 16:41   ` Kuba Piecuch
2026-01-30  7:34     ` Andrea Righi
2026-01-30 13:14       ` Kuba Piecuch
2026-01-31  6:54         ` Andrea Righi
2026-01-31 16:45           ` Kuba Piecuch
2026-01-31 17:24             ` Andrea Righi
2026-01-28 21:21   ` Tejun Heo
2026-01-30 11:54     ` Kuba Piecuch
2026-01-31  9:02       ` Andrea Righi
2026-01-31 17:53         ` Kuba Piecuch
2026-01-31 20:26           ` Andrea Righi
2026-02-02 15:19             ` Tejun Heo
2026-02-02 15:30               ` Andrea Righi
2026-02-01 17:43       ` Tejun Heo
2026-02-02 15:52         ` Andrea Righi
2026-02-02 16:23           ` Kuba Piecuch
2026-01-21 12:25 [PATCHSET v2 sched_ext/for-6.20] " Andrea Righi
2026-01-21 12:25 ` [PATCH 1/2] " Andrea Righi
2026-01-21 12:54   ` Christian Loehle
2026-01-21 12:57     ` Andrea Righi
2026-01-22  9:28   ` Kuba Piecuch
2026-01-23 13:32     ` Andrea Righi
2025-12-19 22:43 [PATCH 0/2] sched_ext: Implement proper " Andrea Righi
2025-12-19 22:43 ` [PATCH 1/2] sched_ext: Fix " Andrea Righi
2025-12-28  3:20   ` Emil Tsalapatis
2025-12-29 16:36     ` Andrea Righi
2025-12-29 18:35       ` Emil Tsalapatis
2025-12-28 17:19   ` Tejun Heo
2025-12-28 23:28     ` Tejun Heo
2025-12-28 23:38       ` Tejun Heo
2025-12-29 17:07         ` Andrea Righi
2025-12-29 18:55           ` Emil Tsalapatis
2025-12-28 23:42   ` Tejun Heo
2025-12-29 17:17     ` Andrea Righi
2025-12-29  0:06   ` Tejun Heo
2025-12-29 18:56     ` Andrea Righi
