Re: [PATCH 13/13] sched_ext: Implement load balancer for bypass mode

From: Tejun Heo <tj@kernel.org>
To: Andrea Righi <arighi@nvidia.com>
Cc: David Vernet <void@manifault.com>,
	Changwoo Min <changwoo@igalia.com>,
	Dan Schatzberg <schatzberg.dan@gmail.com>,
	Emil Tsalapatis <etsal@meta.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 13/13] sched_ext: Implement load balancer for bypass mode
Date: Mon, 10 Nov 2025 09:21:46 -1000
Message-ID: <aRI7SpAS_CQeS-Ph@slm.duckdns.org>
In-Reply-To: <aRGyo6M9AbInZTkb@gpd4>

Hello,

On Mon, Nov 10, 2025 at 10:38:43AM +0100, Andrea Righi wrote:
> > @@ -965,7 +980,9 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
> >  		     !RB_EMPTY_NODE(&p->scx.dsq_priq));
> >  
> >  	if (!is_local) {
> > -		raw_spin_lock(&dsq->lock);
> > +		raw_spin_lock_nested(&dsq->lock,
> > +			(enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
> > +
> >  		if (unlikely(dsq->id == SCX_DSQ_INVALID)) {
> >  			scx_error(sch, "attempting to dispatch to a destroyed dsq");
> >  			/* fall back to the global dsq */
> 
> Outside the context of the patch we're doing:
> 
> 			/* fall back to the global dsq */
> 			raw_spin_unlock(&dsq->lock);
> 			dsq = find_global_dsq(sch, p);
> 			raw_spin_lock(&dsq->lock);
> 
> I think we should preserve the nested lock annotation also when locking
> the global DSQ and do:
> 
> 		raw_spin_lock_nested(&dsq->lock,
> 			(enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
> 
> It seems correct either way, but without this I think we could potentially
> trigger false positive lockdep warnings.

That'd be a bug. I'll add an explicit WARN. I don't think falling back to
global DSQ quietly makes sense - e.g. global DSQ is not even consumed in
bypass mode anymore.

> > +		/*
> > +		 * Moving $p from one non-local DSQ to another. The source DSQ
> > +		 * is already locked. Do an abbreviated dequeue and then perform
> > +		 * enqueue without unlocking $donor_dsq.
> > +		 *
> > +		 * We don't want to drop and reacquire the lock on each
> > +		 * iteration as @donor_dsq can be very long and potentially
> > +		 * highly contended. Donee DSQs are less likely to be contended.
> > +		 * The nested locking is safe as only this LB moves tasks
> > +		 * between bypass DSQs.
> > +		 */
> > +		task_unlink_from_dsq(p, donor_dsq);
> > +		p->scx.dsq = NULL;
> > +		dispatch_enqueue(sch, donee_dsq, p, SCX_ENQ_NESTED);
> 
> Are we racing with dispatch_dequeue() and the holding_cpu dancing here?
> 
> If I read correctly, dispatch_dequeue() reads p->scx.dsq without holding
> the lock, then acquires the lock on that DSQ, but between the read and lock
> acquisition, the load balancer can move the task to a different DSQ.
> 
> Maybe we should change dispatch_dequeue() as well to verify after locking
> that we locked the correct DSQ, and retry if the task was moved.

Right, this is a bug. The LB should hold the source rq lock too. Let me
update the code and add a lockdep annotation.
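
For illustration only, here is a minimal userspace sketch of the race
described above and of the lock-then-verify retry that was suggested for
dispatch_dequeue(). It is not the fix proposed in this reply (holding the
source rq lock in the LB) and it is not kernel code. pthread mutexes stand in
for the raw spinlocks, and struct dsq, struct task, lb_move() and
dequeue_retry() are hypothetical names made up for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a dispatch queue: a lock plus a task count. */
struct dsq {
	pthread_mutex_t lock;
	int nr;
};

/* Hypothetical stand-in for a task: only tracks which DSQ it sits on. */
struct task {
	_Atomic(struct dsq *) dsq;	/* NULL when not queued */
};

/*
 * Move @p from @src to @dst while holding both locks, @src first. This
 * assumes a single mover (as the patch comment says of the LB), so the
 * nested acquisition cannot deadlock against another mover.
 */
static void lb_move(struct task *p, struct dsq *src, struct dsq *dst)
{
	pthread_mutex_lock(&src->lock);
	if (atomic_load(&p->dsq) == src) {
		pthread_mutex_lock(&dst->lock);
		assert(src->nr > 0);
		src->nr--;
		dst->nr++;
		atomic_store(&p->dsq, dst);
		pthread_mutex_unlock(&dst->lock);
	}
	pthread_mutex_unlock(&src->lock);
}

/*
 * Dequeue @p. The DSQ pointer is read without a lock, so after taking the
 * lock we re-check that @p is still on that DSQ and retry if it was moved
 * in between. Returns 1 if @p was dequeued, 0 if it was not queued.
 */
static int dequeue_retry(struct task *p)
{
	for (;;) {
		struct dsq *dsq = atomic_load(&p->dsq);	/* unlocked read */

		if (!dsq)
			return 0;

		pthread_mutex_lock(&dsq->lock);
		if (atomic_load(&p->dsq) == dsq) {
			/* Still on the DSQ we locked, safe to unlink. */
			assert(dsq->nr > 0);
			dsq->nr--;
			atomic_store(&p->dsq, NULL);
			pthread_mutex_unlock(&dsq->lock);
			return 1;
		}
		/* Moved between the read and the lock, try again. */
		pthread_mutex_unlock(&dsq->lock);
	}
}
```

The re-check is sufficient in this sketch because the task's DSQ pointer is
only ever changed while holding the lock of the DSQ it currently points to.
Holding the source rq lock in the LB removes the window in a different way,
without needing a retry loop in the dequeue path.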

Thanks.

-- 
tejun
