public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
@ 2025-12-06  2:22 John Stultz
  2025-12-07 13:54 ` Andrea Righi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: John Stultz @ 2025-12-06  2:22 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar,
	Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
	Valentin Schneider, Steven Rostedt, Ben Segall, Zimuzo Ezeozue,
	Mel Gorman, Will Deacon, Waiman Long, Boqun Feng,
	Paul E. McKenney, Metin Kaya, Xuewen Yan, K Prateek Nayak,
	Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
	hupu, Tejun Heo, David Vernet, Andrea Righi, Changwoo Min,
	sched-ext, kernel-team

Early when trying to get sched_ext and proxy-exe working together,
I kept tripping over NULL ptr in put_prev_task_scx() on the line:
  if (sched_class_above(&ext_sched_class, next->sched_class)) {

Which was due to put_prev_task() passes a NULL next, calling:
  prev->sched_class->put_prev_task(rq, prev, NULL);

put_prev_task_scx() already guards for a NULL next in the
switch_class case, but doesn't seem to have a guard for
sched_class_above() check.

I can't say I understand why this doesn't trip usually without
proxy-exec. And in newer kernels there are way fewer
put_prev_task(), and I can't easily reproduce the issue now
even with proxy-exec.

But we still have one put_prev_task() call left in core.c that
seems like it could trip this, so I wanted to send this out for
consideration.

Signed-off-by: John Stultz <jstultz@google.com>
---
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>
Cc: Andrea Righi <arighi@nvidia.com>
Cc: Changwoo Min <changwoo@igalia.com>
Cc: sched-ext@lists.linux.dev
Cc: kernel-team@android.com
---
 kernel/sched/ext.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 446091cba4429..598552f58f5ec 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2402,7 +2402,7 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		 * ops.enqueue() that @p is the only one available for this cpu,
 		 * which should trigger an explicit follow-up scheduling event.
 		 */
-		if (sched_class_above(&ext_sched_class, next->sched_class)) {
+		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
 			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
 			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
 		} else {
-- 
2.52.0.223.gf5cc29aaa4-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
  2025-12-06  2:22 [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next John Stultz
@ 2025-12-07 13:54 ` Andrea Righi
  2025-12-08 10:10 ` Kuba Piecuch
  2025-12-08 18:23 ` Tejun Heo
  2 siblings, 0 replies; 6+ messages in thread
From: Andrea Righi @ 2025-12-07 13:54 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
	Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
	Will Deacon, Waiman Long, Boqun Feng, Paul E. McKenney,
	Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, Tejun Heo,
	David Vernet, Changwoo Min, sched-ext, kernel-team

On Sat, Dec 06, 2025 at 02:22:03AM +0000, John Stultz wrote:
> Early when trying to get sched_ext and proxy-exe working together,
> I kept tripping over NULL ptr in put_prev_task_scx() on the line:
>   if (sched_class_above(&ext_sched_class, next->sched_class)) {
> 
> Which was due to put_prev_task() passes a NULL next, calling:
>   prev->sched_class->put_prev_task(rq, prev, NULL);
> 
> put_prev_task_scx() already guards for a NULL next in the
> switch_class case, but doesn't seem to have a guard for
> sched_class_above() check.
> 
> I can't say I understand why this doesn't trip usually without
> proxy-exec. And in newer kernels there are way fewer
> put_prev_task(), and I can't easily reproduce the issue now
> even with proxy-exec.
> 
> But we still have one put_prev_task() call left in core.c that
> seems like it could trip this, so I wanted to send this out for
> consideration.
> 
> Signed-off-by: John Stultz <jstultz@google.com>

This looks like a valid fix to me. If the task changes any sched property
while it's running, we go through sched_change_begin() which calls
put_prev_task() that always passes NULL as the next parameter:

static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
{
        WARN_ON_ONCE(rq->donor != prev);
        prev->sched_class->put_prev_task(rq, prev, NULL);
}

This should be the code path(s) to trigger the bug:

  sys_setpriority() / sched_setaffinity() / sched_setscheduler()
    - set_user_nice() / __sched_setaffinity() / __sched_setscheduler()
      - scoped_guard(sched_change, p, DEQUEUE_SAVE)
        - sched_change_begin(p, DEQUEUE_SAVE)
          - if (ctx->running)
               put_prev_task(rq, p)
            - prev->sched_class->put_prev_task(rq, prev, NULL)
              - put_prev_task_scx(rq, prev, NULL)
                - if (sched_class_above(&ext_sched_class, next->sched_class))
	                                                  ^^^^
                                                          NULL dereference

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> ---
> Cc: Joel Fernandes <joelagnelf@nvidia.com>
> Cc: Qais Yousef <qyousef@layalina.io>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Valentin Schneider <vschneid@redhat.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Zimuzo Ezeozue <zezeozue@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Will Deacon <will@kernel.org>
> Cc: Waiman Long <longman@redhat.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Metin Kaya <Metin.Kaya@arm.com>
> Cc: Xuewen Yan <xuewen.yan94@gmail.com>
> Cc: K Prateek Nayak <kprateek.nayak@amd.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
> Cc: Suleiman Souhlal <suleiman@google.com>
> Cc: kuyo chang <kuyo.chang@mediatek.com>
> Cc: hupu <hupu.gm@gmail.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: David Vernet <void@manifault.com>
> Cc: Andrea Righi <arighi@nvidia.com>
> Cc: Changwoo Min <changwoo@igalia.com>
> Cc: sched-ext@lists.linux.dev
> Cc: kernel-team@android.com
> ---
>  kernel/sched/ext.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 446091cba4429..598552f58f5ec 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -2402,7 +2402,7 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		 * ops.enqueue() that @p is the only one available for this cpu,
>  		 * which should trigger an explicit follow-up scheduling event.
>  		 */
> -		if (sched_class_above(&ext_sched_class, next->sched_class)) {
> +		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
>  			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
>  			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
>  		} else {
> -- 
> 2.52.0.223.gf5cc29aaa4-goog
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
  2025-12-06  2:22 [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next John Stultz
  2025-12-07 13:54 ` Andrea Righi
@ 2025-12-08 10:10 ` Kuba Piecuch
  2025-12-08 11:15   ` Kuba Piecuch
  2025-12-08 18:23 ` Tejun Heo
  2 siblings, 1 reply; 6+ messages in thread
From: Kuba Piecuch @ 2025-12-08 10:10 UTC (permalink / raw)
  To: John Stultz, LKML
  Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
	Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
	Will Deacon, Waiman Long, Boqun Feng, Paul E. McKenney,
	Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, Tejun Heo,
	David Vernet, Andrea Righi, Changwoo Min, sched-ext, kernel-team

Hello John,

On Sat Dec 6, 2025 at 2:22 AM UTC, John Stultz wrote:
> I can't say I understand why this doesn't trip usually without
> proxy-exec. And in newer kernels there are way fewer
> put_prev_task(), and I can't easily reproduce the issue now
> even with proxy-exec.

That's probably because put_prev_task_scx() with next == NULL is always
preceded by a dequeue, clearing SCX_TASK_QUEUED from p->scx.flags, so we don't
reach the problematic sched_class_above() check because it only happens when
the flag is set.

> But we still have one put_prev_task() call left in core.c that
> seems like it could trip this, so I wanted to send this out for
> consideration.

I'm assuming you're referring to the one in sched_change_begin().
It looks like it's impossible for an outside observer holding a CPU's rq lock
to observe a task that is running on that CPU and isn't queued, i.e.
'running' implies 'queued' (I'm new to the scheduler so I may be wrong here).
That would explain why dequeue_task() is always called before put_prev_task().
Does proxy execution break that assumption?

Best,
Kuba

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
  2025-12-08 10:10 ` Kuba Piecuch
@ 2025-12-08 11:15   ` Kuba Piecuch
  2025-12-08 14:27     ` Kuba Piecuch
  0 siblings, 1 reply; 6+ messages in thread
From: Kuba Piecuch @ 2025-12-08 11:15 UTC (permalink / raw)
  To: Kuba Piecuch, John Stultz, LKML
  Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
	Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
	Will Deacon, Waiman Long, Boqun Feng, Paul E. McKenney,
	Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, Tejun Heo,
	David Vernet, Andrea Righi, Changwoo Min, sched-ext, kernel-team

On Mon Dec 8, 2025 at 10:10 AM UTC, Kuba Piecuch wrote:
> It looks like it's impossible for an outside observer holding a CPU's rq lock
> to observe a task that is running on that CPU and isn't queued, i.e.
> 'running' implies 'queued' (I'm new to the scheduler so I may be wrong here).

A task that blocks in __schedule() can drop the rq lock while picking the next
task, which is after try_to_block_task() dequeues prev. So it's very much
possible for a task on another CPU to grab the rq lock and observe prev as
dequeued but still running.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
  2025-12-08 11:15   ` Kuba Piecuch
@ 2025-12-08 14:27     ` Kuba Piecuch
  0 siblings, 0 replies; 6+ messages in thread
From: Kuba Piecuch @ 2025-12-08 14:27 UTC (permalink / raw)
  To: Kuba Piecuch, John Stultz, LKML
  Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
	Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
	Will Deacon, Waiman Long, Boqun Feng, Paul E. McKenney,
	Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, Tejun Heo,
	David Vernet, Andrea Righi, Changwoo Min, sched-ext, kernel-team

On Mon Dec 8, 2025 at 11:15 AM UTC, Kuba Piecuch wrote:
> On Mon Dec 8, 2025 at 10:10 AM UTC, Kuba Piecuch wrote:
>> It looks like it's impossible for an outside observer holding a CPU's rq lock
>> to observe a task that is running on that CPU and isn't queued, i.e.
>> 'running' implies 'queued' (I'm new to the scheduler so I may be wrong here).
>
> A task that blocks in __schedule() can drop the rq lock while picking the next
> task, which is after try_to_block_task() dequeues prev. So it's very much
> possible for a task on another CPU to grab the rq lock and observe prev as
> dequeued but still running.

Even with that, I'm not convinced that it's possible to do a NULL deref with
the current code.

In order for sched_change_begin() to do the NULL deref in put_prev_task_scx(),
we would need to have:

* rq->donor == p (for sched_change_begin() to call put_prev_task())
* p->on_rq != TASK_ON_RQ_QUEUED
  (for sched_change_begin() to not call dequeue_task() beforehand)
* p->scx.flags & SCX_TASK_QUEUED
  (for put_prev_task_scx() to enter the branch with the @next deref)

From a brief survey of the code, __assuming proxy execution is disabled__,
I don't think it's possible for a remote task holding @rq's lock to observe
the second and third condition to be true.

Every time p->on_rq is changed away from TASK_ON_RQ_QUEUED, it happens under
the rq lock and is paired with a dequeue (see block_task(),
deactivate_task()). dequeue_task_scx() always clears SCX_TASK_QUEUED from
p->scx.flags.

Every time SCX_TASK_QUEUED is set in p->scx.flags (i.e. enqueue_task_scx()
is called), it happens under the rq lock and is either gated by
p->on_rq == TASK_ON_RQ_QUEUED (see ttwu_runnable(), sched_change_end()) or is
paired with p->on_rq being set to TASK_ON_RQ_QUEUED (see activate_task()).
It also happens in proxy_tag_curr(), which is a no-op if proxy execution is
disabled. Even when it's enabled, proxy_tag_curr() does a dequeue-enqueue
cycle while holding the rq lock, which doesn't look dangerous.

I'm not trying to say that we shouldn't add a NULL check, all this is just
for my own understanding.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
  2025-12-06  2:22 [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next John Stultz
  2025-12-07 13:54 ` Andrea Righi
  2025-12-08 10:10 ` Kuba Piecuch
@ 2025-12-08 18:23 ` Tejun Heo
  2 siblings, 0 replies; 6+ messages in thread
From: Tejun Heo @ 2025-12-08 18:23 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
	Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
	Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Mel Gorman,
	Will Deacon, Waiman Long, Boqun Feng, Paul E. McKenney,
	Metin Kaya, Xuewen Yan, K Prateek Nayak, Thomas Gleixner,
	Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu, David Vernet,
	Andrea Righi, Changwoo Min, sched-ext, kernel-team

Applied to sched_ext/for-6.19-fixes with a note.

Thanks.

--
tejun

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-12-08 18:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-06  2:22 [RFC][PATCH] sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next John Stultz
2025-12-07 13:54 ` Andrea Righi
2025-12-08 10:10 ` Kuba Piecuch
2025-12-08 11:15   ` Kuba Piecuch
2025-12-08 14:27     ` Kuba Piecuch
2025-12-08 18:23 ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox