public inbox for sched-ext@lists.linux.dev
From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Kuba Piecuch <jpiecuch@google.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	Christian Loehle <christian.loehle@arm.com>,
	Daniel Hodges <hodgesd@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] sched_ext: Fix ops.dequeue() semantics
Date: Tue, 17 Feb 2026 12:51:13 +0100
Message-ID: <aZRWMcY6dJe9UD_h@gpd4>
In-Reply-To: <aZQLFBEBkbF23-K8@slm.duckdns.org>

On Mon, Feb 16, 2026 at 08:30:44PM -1000, Tejun Heo wrote:
> Hello,
> 
> On Sun, Feb 15, 2026 at 08:16:55PM +0100, Andrea Righi wrote:
> > +/*
> > + * Return true if @p is moving due to an internal SCX migration, false
> > + * otherwise.
> > + */
> > +static inline bool task_scx_migrating(struct task_struct *p)
> > +{
> > +	return task_on_rq_migrating(p) && p->scx.sticky_cpu >= 0;
> > +}
> 
> Can you explain why testing task_on_rq_migrating() is necessary? What does
> just testing p->scx.sticky_cpu miss?

Yeah, we just need to check sticky_cpu here: whenever sticky_cpu >= 0 we
are, by definition, in the middle of an SCX-initiated migration. I'll
change that and add a comment.

> 
> > @@ -1106,6 +1153,12 @@ static void dispatch_enqueue(struct scx_sched *sch, struct rq *rq,
> >  	dsq_mod_nr(dsq, 1);
> >  	p->scx.dsq = dsq;
> >  
> > +	/*
> > +	 * Non-terminal DSQs: task enters BPF scheduler's custody.
> > +	 */
> > +	if (!is_terminal_dsq(dsq))
> > +		p->scx.flags |= SCX_TASK_IN_CUSTODY;
> 
> Can't this be done in the local_dsq_post_enq() else block?

Oh yes, this is better.

> 
> >  	/*
> >  	 * scx.ddsp_dsq_id and scx.ddsp_enq_flags are only relevant on the
> >  	 * direct dispatch path, but we clear them here because the direct
> > @@ -1122,10 +1175,23 @@ static void dispatch_enqueue(struct scx_sched *sch, struct rq *rq,
> >  	if (enq_flags & SCX_ENQ_CLEAR_OPSS)
> >  		atomic_long_set_release(&p->scx.ops_state, SCX_OPSS_NONE);
> >  
> > -	if (is_local)
> > +	if (is_local) {
> >  		local_dsq_post_enq(dsq, p, enq_flags);
> > -	else
> > +	} else {
> > +		if (dsq->id == SCX_DSQ_GLOBAL || dsq->id == SCX_DSQ_BYPASS) {
> > +			/*
> > +			 * Task is on the global or bypass DSQ: call
> > +			 * ops.dequeue() if the task was in BPF custody and
> > +			 * it's not an internal SCX migration.
> > +			 */
> > +			if ((p->scx.flags & SCX_TASK_IN_CUSTODY) &&
> > +			    !task_scx_migrating(p)) {
> > +				call_task_dequeue(sch, rq, p, 0);
> > +				p->scx.flags &= ~SCX_TASK_IN_CUSTODY;
> > +			}
> > +		}
> 
> If you add else {} here, that'd catch the non-terminal DSQs, right? If that
> works, I think that'd be more logical organization.

Agreed.

> 
> > @@ -1531,6 +1611,25 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
> >  
> >  	switch (opss & SCX_OPSS_STATE_MASK) {
> >  	case SCX_OPSS_NONE:
> > +		/*
> > +		 * If the task is still in BPF scheduler's custody
> > +		 * (%SCX_TASK_IN_CUSTODY is set) call ops.dequeue().
> > +		 *
> > +		 * The code that clears ops_state to %SCX_OPSS_NONE does
> > +		 * not always clear %SCX_TASK_IN_CUSTODY: in
> > +		 * dispatch_to_local_dsq(), when we're moving a task that
> > +		 * was in %SCX_OPSS_DISPATCHING to a remote CPU's local
> > +		 * DSQ, we only set ops_state to %SCX_OPSS_NONE so that a
> > +		 * concurrent dequeue can proceed, but we clear
> > +		 * %SCX_TASK_IN_CUSTODY only when we later enqueue or move
> > +		 * the task. So we can see NONE + IN_CUSTODY here and we
> > +		 * must handle it.
> > +		 */
> > +		if ((p->scx.flags & SCX_TASK_IN_CUSTODY) &&
> > +		    !task_scx_migrating(p)) {
> > +			call_task_dequeue(sch, rq, p, op_deq_flags);
> > +			p->scx.flags &= ~SCX_TASK_IN_CUSTODY;
> > +		}
> 
> Except for OPSS_QUEUED path, all call_task_dequeue() callers are using the
> same code block and OPSS_QUEUED can too. Can't we move the whole block into
> call_task_dequeue()?

Ack.

> 
> >  		break;
> >  	case SCX_OPSS_QUEUEING:
> >  		/*
> > @@ -1539,9 +1638,18 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
> >  		 */
> >  		BUG();
> >  	case SCX_OPSS_QUEUED:
> > -		if (SCX_HAS_OP(sch, dequeue))
> > -			SCX_CALL_OP_TASK(sch, SCX_KF_REST, dequeue, rq,
> > -					 p, deq_flags);
> > +		/*
> > +		 * Task is in BPF scheduler's custody (not dispatched yet).
> > +		 * Call ops.dequeue() unless this is an SCX-initiated
> > +		 * migration.
> > +		 *
> > +		 * A queued task must be always in BPF scheduler's custody.
> > +		 */
> > +		WARN_ON_ONCE(!(p->scx.flags & SCX_TASK_IN_CUSTODY));
> > +		if (!task_scx_migrating(p)) {
> > +			call_task_dequeue(sch, rq, p, op_deq_flags);
> > +			p->scx.flags &= ~SCX_TASK_IN_CUSTODY;
> > +		}
> 
> This placement is a bit odd as before the following try_cmpxchg, this path
> doesn't have the ownership of the task. Please see below.
> 
> >  		if (atomic_long_try_cmpxchg(&p->scx.ops_state, &opss,
> >  					    SCX_OPSS_NONE))
> > @@ -1563,6 +1671,16 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
> >  		 */
> >  		wait_ops_state(p, SCX_OPSS_DISPATCHING);
> >  		BUG_ON(atomic_long_read(&p->scx.ops_state) != SCX_OPSS_NONE);
> > +
> > +		 /*
> > +		  * After DISPATCHING completes, task may still be
> > +		  * IN_CUSTODY (see the NONE case).
> > +		  */
> > +		if ((p->scx.flags & SCX_TASK_IN_CUSTODY) &&
> > +		    !task_scx_migrating(p)) {
> > +			call_task_dequeue(sch, rq, p, op_deq_flags);
> > +			p->scx.flags &= ~SCX_TASK_IN_CUSTODY;
> > +		}
> >  		break;
> >  	}
> >  }
> 
> If you move QUEUED case after successful try_cmpxchg(), it's always calling
> dequeue before breaking out of the switch block. Might as well do it in a
> single spot right after the switch block?

Makes sense. I'll move this after the switch block.

Thanks!
-Andrea


Thread overview: 8+ messages
2026-02-15 19:16 [PATCHSET v9] sched_ext: Fix ops.dequeue() semantics Andrea Righi
2026-02-15 19:16 ` [PATCH 1/4] sched_ext: Properly mark SCX-internal migrations via sticky_cpu Andrea Righi
2026-02-15 19:16 ` [PATCH 2/4] sched_ext: Add rq parameter to dispatch_enqueue() Andrea Righi
2026-02-15 19:16 ` [PATCH 3/4] sched_ext: Fix ops.dequeue() semantics Andrea Righi
2026-02-17  6:30   ` Tejun Heo
2026-02-17 11:51     ` Andrea Righi [this message]
2026-02-15 19:16 ` [PATCH 4/4] selftests/sched_ext: Add test to validate " Andrea Righi
  -- strict thread matches above, loose matches on Subject: below --
2026-02-18  8:32 [PATCHSET v10] sched_ext: Fix " Andrea Righi
2026-02-18  8:32 ` [PATCH 3/4] " Andrea Righi
