public inbox for linux-kernel@vger.kernel.org
* [PATCH sched_ext/for-7.1-fixes] sched_ext: Call wakeup_preempt() in local_dsq_post_enq()
@ 2026-04-24  9:22 Kuba Piecuch
  2026-04-24 17:17 ` Tejun Heo
  0 siblings, 1 reply; 4+ messages in thread
From: Kuba Piecuch @ 2026-04-24  9:22 UTC (permalink / raw)
  To: Tejun Heo, Andrea Righi, Changwoo Min, David Vernet
  Cc: linux-kernel, sched-ext, Peter Zijlstra, Kuba Piecuch

There are several edge cases (see linked thread) where an IMMED task
can be left lingering on a local DSQ if an RT task swoops in at the
wrong time. All of these edge cases are due to rq->next_class being idle
even after dispatching a task to rq's local DSQ. We should bump
rq->next_class to &ext_sched_class as soon as we've inserted a task into
the local DSQ.

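To optimize the common case of rq->next_class == &ext_sched_class,
only call wakeup_preempt() if rq->next_class is below EXT. If next_class
is EXT or above, wakeup_preempt() is a no-op anyway. When next_class is
below EXT, wakeup_preempt() also calls resched_curr() for us, which is
why the explicit resched_curr() for the sched_class_above() case is
removed.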

Link: https://lore.kernel.org/all/DHZPHUFXB4N3.2RY28MUEWBNYK@google.com/
Signed-off-by: Kuba Piecuch <jpiecuch@google.com>
---
 kernel/sched/ext.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 1f670028bf19..034df77e3af1 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1393,7 +1393,6 @@ static void local_dsq_post_enq(struct scx_sched *sch, struct scx_dispatch_q *dsq
 			       struct task_struct *p, u64 enq_flags)
 {
 	struct rq *rq = container_of(dsq, struct rq, scx.local_dsq);
-	bool preempt = false;
 
 	call_task_dequeue(sch, rq, p, 0);
 
@@ -1408,11 +1407,19 @@ static void local_dsq_post_enq(struct scx_sched *sch, struct scx_dispatch_q *dsq
 	if ((enq_flags & SCX_ENQ_PREEMPT) && p != rq->curr &&
 	    rq->curr->sched_class == &ext_sched_class) {
 		rq->curr->scx.slice = 0;
-		preempt = true;
+		resched_curr(rq);
 	}
 
-	if (preempt || sched_class_above(&ext_sched_class, rq->curr->sched_class))
-		resched_curr(rq);
+	/*
+	 * If @rq->next_class is currently idle, we need to bump it
+	 * to &ext_sched_class using wakeup_preempt(). Otherwise, if we drop
+	 * the rq lock later in the pick and an RT task wakes up on @rq,
+	 * wakeup_preempt_idle() will be called during RT task wakeup and
+	 * SCX won't have an opportunity to re-enqueue IMMED tasks from @rq's
+	 * local DSQ.
+	 */
+	if (sched_class_above(&ext_sched_class, rq->next_class))
+		wakeup_preempt(rq, p, 0);
 }
 
 static void dispatch_enqueue(struct scx_sched *sch, struct rq *rq,
-- 
2.54.0.rc2.544.gc7ae2d5bb8-goog


* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Call wakeup_preempt() in local_dsq_post_enq()
  2026-04-24  9:22 [PATCH sched_ext/for-7.1-fixes] sched_ext: Call wakeup_preempt() in local_dsq_post_enq() Kuba Piecuch
@ 2026-04-24 17:17 ` Tejun Heo
  2026-04-27 14:14   ` Kuba Piecuch
  0 siblings, 1 reply; 4+ messages in thread
From: Tejun Heo @ 2026-04-24 17:17 UTC (permalink / raw)
  To: Kuba Piecuch
  Cc: Andrea Righi, Changwoo Min, David Vernet, linux-kernel, sched-ext,
	Peter Zijlstra

Hello, Kuba.

On Fri, Apr 24, 2026 at 09:22:44AM +0000, Kuba Piecuch wrote:
> @@ -1408,11 +1407,19 @@ static void local_dsq_post_enq(struct scx_sched *sch, struct scx_dispatch_q *dsq
>  	if ((enq_flags & SCX_ENQ_PREEMPT) && p != rq->curr &&
>  	    rq->curr->sched_class == &ext_sched_class) {
>  		rq->curr->scx.slice = 0;
> -		preempt = true;
> +		resched_curr(rq);
>  	}
>  
> -	if (preempt || sched_class_above(&ext_sched_class, rq->curr->sched_class))
> -		resched_curr(rq);

Hmm... I don't quite understand this part of the change. sched_class_above()
got separated out into its own case but why is it dropping resched_curr() on
SCX_ENQ_PREEMPT?

> +	/*
> +	 * If @rq->next_class is currently idle, we need to bump it
> +	 * to &ext_sched_class using wakeup_preempt(). Otherwise, if we drop
> +	 * the rq lock later in the pick and an RT task wakes up on @rq,
> +	 * wakeup_preempt_idle() will be called during RT task wakeup and
> +	 * SCX won't have an opportunity to re-enqueue IMMED tasks from @rq's
> +	 * local DSQ.

As this was really subtle, I think it warrants documenting all cases here.

Thanks.

-- 
tejun


* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Call wakeup_preempt() in local_dsq_post_enq()
  2026-04-24 17:17 ` Tejun Heo
@ 2026-04-27 14:14   ` Kuba Piecuch
  2026-04-27 17:01     ` Tejun Heo
  0 siblings, 1 reply; 4+ messages in thread
From: Kuba Piecuch @ 2026-04-27 14:14 UTC (permalink / raw)
  To: Tejun Heo, Kuba Piecuch
  Cc: Andrea Righi, Changwoo Min, David Vernet, linux-kernel, sched-ext,
	Peter Zijlstra

Hi Tejun,

On Fri Apr 24, 2026 at 5:17 PM UTC, Tejun Heo wrote:
> Hello, Kuba.
>
> On Fri, Apr 24, 2026 at 09:22:44AM +0000, Kuba Piecuch wrote:
>> @@ -1408,11 +1407,19 @@ static void local_dsq_post_enq(struct scx_sched *sch, struct scx_dispatch_q *dsq
>>  	if ((enq_flags & SCX_ENQ_PREEMPT) && p != rq->curr &&
>>  	    rq->curr->sched_class == &ext_sched_class) {
>>  		rq->curr->scx.slice = 0;
>> -		preempt = true;
>> +		resched_curr(rq);
>>  	}
>>  
>> -	if (preempt || sched_class_above(&ext_sched_class, rq->curr->sched_class))
>> -		resched_curr(rq);
>
> Hmm... I don't quite understand this part of the change. sched_class_above()
> got separated out into its own case but why is it dropping resched_curr() on
> SCX_ENQ_PREEMPT?

In the SCX_ENQ_PREEMPT case we call resched_curr() where we previously set
preempt = true.

In the sched_class_above() case, wakeup_preempt() will call resched_curr()
for us:

	void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
	{
		[...]

		if (p->sched_class == rq->next_class) {
			rq->next_class->wakeup_preempt(rq, p, flags);

		} else if (sched_class_above(p->sched_class, rq->next_class)) {
			rq->next_class->wakeup_preempt(rq, p, flags);
		=====>	resched_curr(rq);  <=====
			rq->next_class = p->sched_class;
		}

		[...]
	}

>
>> +	/*
>> +	 * If @rq->next_class is currently idle, we need to bump it
>> +	 * to &ext_sched_class using wakeup_preempt(). Otherwise, if we drop
>> +	 * the rq lock later in the pick and an RT task wakes up on @rq,
>> +	 * wakeup_preempt_idle() will be called during RT task wakeup and
>> +	 * SCX won't have an opportunity to re-enqueue IMMED tasks from @rq's
>> +	 * local DSQ.
>
> As this was really subtle, I think it warrants documenting all cases here.

Yeah, I was trying to keep it concise. How about something like this:

	/*
	 * Note that @rq's lock may be dropped between this enqueue and @p
	 * actually getting on CPU. This gives higher-class tasks (e.g. RT)
	 * an opportunity to wake up on @rq and prevent @p from running.
	 * Here are some concrete examples:
	 *
	 * Example 1:
	 *
	 * We dispatch two tasks from a single ops.dispatch():
	 * - First, a local task to this CPU's local DSQ;
	 * - Second, a local/remote task to a remote CPU's local DSQ.
	 * We must drop the local rq lock in order to finish the second
	 * dispatch. In that time, an RT task can wake up on the local rq.
	 *
	 * Example 2:
	 *
	 * We dispatch a local/remote task to a remote CPU's local DSQ.
	 * We must drop the remote rq lock before the dispatched task can run,
	 * which gives an RT task an opportunity to wake up on the remote rq.
	 *
	 * Both examples work the same if we replace dispatching with moving
	 * the tasks from a user-created DSQ.
	 *
	 * We must detect these wakeups so that we can re-enqueue IMMED tasks
	 * from @rq's local DSQ. scx_wakeup_preempt() serves exactly this
	 * purpose, but for it to be invoked, we must ensure that we bump
	 * @rq->next_class to &ext_sched_class if it's currently idle.
	 *
	 * wakeup_preempt() does the bumping, and since we only invoke it if
	 * @rq->next_class is below &ext_sched_class, it will also
	 * resched_curr(rq).
	 */
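
For illustration only, here is a userspace toy model of the reasoning in
the comment above. It is a sketch under stated assumptions, not kernel
code: every toy_* name is hypothetical, the class ordering is reduced to
idle < ext < rt, and the per-class wakeup_preempt hooks are replaced by
counters. It only mirrors the two branches of wakeup_preempt() quoted
earlier and the next_class check added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy classes, ordered by priority: idle < ext < rt. */
enum toy_class { TOY_IDLE = 0, TOY_EXT = 1, TOY_RT = 2 };

struct toy_rq {
	enum toy_class next_class;	/* stands in for rq->next_class */
	bool need_resched;		/* set where resched_curr() would run */
	int ext_hook_calls;		/* times the EXT class hook was invoked */
	int idle_hook_calls;		/* times the idle class hook was invoked */
};

static bool toy_class_above(enum toy_class a, enum toy_class b)
{
	return a > b;
}

/* Stand-in for rq->next_class->wakeup_preempt(): just count the call. */
static void toy_class_hook(struct toy_rq *rq)
{
	if (rq->next_class == TOY_EXT)
		rq->ext_hook_calls++;
	else if (rq->next_class == TOY_IDLE)
		rq->idle_hook_calls++;
}

/* Mirrors only the two quoted branches; the elided parts are omitted. */
static void toy_wakeup_preempt(struct toy_rq *rq, enum toy_class p_class)
{
	assert(rq != NULL);

	if (p_class == rq->next_class) {
		toy_class_hook(rq);
	} else if (toy_class_above(p_class, rq->next_class)) {
		toy_class_hook(rq);
		rq->need_resched = true;
		rq->next_class = p_class;
	}
}

/* Models the tail of local_dsq_post_enq(); @bump selects patched behavior. */
static void toy_local_dsq_post_enq(struct toy_rq *rq, bool bump)
{
	if (bump && toy_class_above(TOY_EXT, rq->next_class))
		toy_wakeup_preempt(rq, TOY_EXT);
}

/*
 * Enqueue on an idle rq, then let an RT task wake up (as if the rq lock
 * had been dropped in between). Returns true if the EXT hook ran for the
 * RT wakeup, i.e. EXT got a chance to react to it.
 */
static bool toy_ext_sees_rt_wakeup(bool bump)
{
	struct toy_rq rq = { .next_class = TOY_IDLE };
	int before;

	toy_local_dsq_post_enq(&rq, bump);
	before = rq.ext_hook_calls;
	toy_wakeup_preempt(&rq, TOY_RT);

	return rq.ext_hook_calls > before;
}
```

In this model, toy_ext_sees_rt_wakeup(true) reports that the EXT hook
runs on the RT wakeup, while toy_ext_sees_rt_wakeup(false) reports that
only the idle hook runs. That is the toy analogue of scx_wakeup_preempt()
versus wakeup_preempt_idle() being invoked.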

Thanks,
Kuba


* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Call wakeup_preempt() in local_dsq_post_enq()
  2026-04-27 14:14   ` Kuba Piecuch
@ 2026-04-27 17:01     ` Tejun Heo
  0 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2026-04-27 17:01 UTC (permalink / raw)
  To: Kuba Piecuch
  Cc: Andrea Righi, Changwoo Min, David Vernet, linux-kernel, sched-ext,
	Peter Zijlstra

On Mon, Apr 27, 2026 at 02:14:33PM +0000, Kuba Piecuch wrote:
> > Hmm... I don't quite understand this part of the change. sched_class_above()
> > got separated out into its own case but why is it dropping resched_curr() on
> > SCX_ENQ_PREEMPT?
> 
> In the SCX_ENQ_PREEMPT case we call resched_curr() where we previously set
> preempt = true.
> 
> In the sched_class_above() case, wakeup_preempt() will call resched_curr()
> for us:

I see. It'd be helpful to note that in the patch description.

> >> +	/*
> >> +	 * If @rq->next_class is currently idle, we need to bump it
> >> +	 * to &ext_sched_class using wakeup_preempt(). Otherwise, if we drop
> >> +	 * the rq lock later in the pick and an RT task wakes up on @rq,
> >> +	 * wakeup_preempt_idle() will be called during RT task wakeup and
> >> +	 * SCX won't have an opportunity to re-enqueue IMMED tasks from @rq's
> >> +	 * local DSQ.
> >
> > As this was really subtle, I think it warrants documenting all cases here.
> 
> Yeah, I was trying to keep it concise. How about something like this:
> 
> 	/*
> 	 * Note that @rq's lock may be dropped between this enqueue and @p
> 	 * actually getting on CPU. This gives higher-class tasks (e.g. RT)
> 	 * an opportunity to wake up on @rq and prevent @p from running.
> 	 * Here are some concrete examples:
> 	 *
> 	 * Example 1:
> 	 *
> 	 * We dispatch two tasks from a single ops.dispatch():
> 	 * - First, a local task to this CPU's local DSQ;
> 	 * - Second, a local/remote task to a remote CPU's local DSQ.
> 	 * We must drop the local rq lock in order to finish the second
> 	 * dispatch. In that time, an RT task can wake up on the local rq.
> 	 *
> 	 * Example 2:
> 	 *
> 	 * We dispatch a local/remote task to a remote CPU's local DSQ.
> 	 * We must drop the remote rq lock before the dispatched task can run,
> 	 * which gives an RT task an opportunity to wake up on the remote rq.
> 	 *
> 	 * Both examples work the same if we replace dispatching with moving
> 	 * the tasks from a user-created DSQ.
> 	 *
> 	 * We must detect these wakeups so that we can re-enqueue IMMED tasks
> 	 * from @rq's local DSQ. scx_wakeup_preempt() serves exactly this
> 	 * purpose, but for it to be invoked, we must ensure that we bump
> 	 * @rq->next_class to &ext_sched_class if it's currently idle.
> 	 *
> 	 * wakeup_preempt() does the bumping, and since we only invoke it if
> 	 * @rq->next_class is below &ext_sched_class, it will also
> 	 * resched_curr(rq).
> 	 */

Looks good to me.

Thanks.

-- 
tejun
