All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Righi <arighi@nvidia.com>
To: Alice Ryhl <aliceryhl@google.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>,
	Boqun Feng <boqun@kernel.org>, Changwoo Min <changwoo@igalia.com>,
	Clark Williams <clrkwllms@kernel.org>,
	David Vernet <void@manifault.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Ingo Molnar <mingo@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	Joel Fernandes <joelagnelf@nvidia.com>,
	Josh Triplett <josh@joshtriplett.org>,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>, Tejun Heo <tj@kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Zqiang <qiang.zhang@linux.dev>,
	io-uring@vger.kernel.org, rcu@vger.kernel.org,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-rt-devel@lists.linux.dev
Subject: Re: [PATCH] sched/task: always defer 'struct task_struct' destruction via RCU
Date: Sun, 10 May 2026 20:36:41 +0200	[thread overview]
Message-ID: <agDQOf8Qe11ozpuf@gpd4> (raw)
In-Reply-To: <agCLBxHEUqWIepx8@google.com>

On Sun, May 10, 2026 at 01:41:27PM +0000, Alice Ryhl wrote:
> On Fri, May 08, 2026 at 11:38:18PM +0200, Andrea Righi wrote:
> > Hi Alice,
> > 
> > On Fri, May 08, 2026 at 02:02:45PM +0000, Alice Ryhl wrote:
> > > The sched/task.h header file currently exposes a tryget_task_struct()
> > > function, but it is very risky to use it: If the last refcount of the
> > > task is dropped using put_task_struct_many(), then the task is freed
> > > right away without an RCU grace period.
> > > 
> > > This means that if the kernel contains a code path anywhere such that
> > > the last refcount of a task may be dropped with put_task_struct_many(),
> > > and it also contains a code path anywhere that tries to stash a task
> > > pointer under rcu and use tryget_task_struct() on it, then if they ever
> > > execute on the same 'struct task_struct', it results in a
> > > use-after-free.
> > > 
> > > The above applies even if the RCU user drops its own task reference with
> > > put_task_struct(), because if that is not the last reference, then it's
> > > possible for another thread to invoke put_task_struct_many() and free
> > > the task less than a grace period after the RCU user called
> > > put_task_struct().
> > > 
> > > There does not appear to be an actual problem in the kernel tree right
> > > now because there are no in-tree users of put_task_struct_many() where
> > > refcount_sub_and_test() might return 'true'. Io-uring invokes the
> > > function from task work while the task is still running, so it will not
> > > decrement it all the way to zero. (Note that if I'm wrong about this,
> > > then it's probably possible to trigger UAF by combining this codepath in
> > > io-uring with the tryget_task_struct() call in sched-ext.)
> > > 
> > > However, the current situation is fragile and error-prone.
> > > - If you look at put_task_struct_many() in isolation, it looks like it
> > >   would be okay to call it in a situation where refcount_sub_and_test()
> > >   might return 'true'.
> > > - Similarly, if you look at tryget_task_struct(), you would assume that
> > >   you are allowed to call this method for a grace period after 'users'
> > >   hitting zero. (If not, why does it exist?)
> > > But if two different kernel developers anywhere in the kernel make these
> > > conflicting assumptions at any point in the future, then the combination
> > > of their code may lead to a use-after-free if there is any way for them
> > > to interact via the same 'struct task_struct'.
> > > 
> > > Thus, as a defensive measure, we should either make
> > > put_task_struct_many() use call_rcu(), or we should delete
> > > tryget_task_struct(). This patch suggests the former because it does not
> > > change anything for any callers that exist today. (As argued previously,
> > > the body of the 'if' statement is dead code in the kernel today.)
> > > 
> > > The comment in put_task_struct() is also updated so that nobody changes
> > > its implementation to only use call_rcu() under PREEMPT_RT in the
> > > future. The current comment suggests that would be a legal change, but
> > > it is similarly incompatible with anyone using tryget_task_struct().
> > > 
> > > Signed-off-by: Alice Ryhl <aliceryhl@google.com>
> > > ---
> > > Including sched-ext and io-uring in the cc list as they are the only
> > > users of tryget_task_struct() and put_task_struct_many() respectively.
> > 
> > For sched_ext I think we should be already protected by scx_tasks_lock.
> > 
> > From kernel/sched/core.c:
> > 
> >   finish_task_switch():
> >       if (prev_state == TASK_DEAD) {
> >           prev->sched_class->task_dead(prev);
> >           sched_ext_dead(prev);
> >           cgroup_task_dead(prev);
> >           put_task_stack(prev);
> >           ...
> >           put_task_struct_rcu_user(prev);
> >       }
> > 
> >  And sched_ext_dead() in kernel/sched/ext.c:
> > 
> >   scoped_guard(raw_spinlock_irqsave, &scx_tasks_lock) {
> >       list_del_init(&p->scx.tasks_node);
> >       ...
> >   }
> > 
> > Now on the sched_ext iter side:
> > 
> >   scx_task_iter_start();                     /* takes scx_tasks_lock */
> >   while ((p = scx_task_iter_next_locked()))
> >       if (!tryget_task_struct(p))            /* still under scx_tasks_lock */
> >          ...
> > 
> > So, the locking gives us the invariant: while the iter holds scx_tasks_lock and
> > observes p on the list, sched_ext_dead(p) cannot have completed.
> 
> Correct my if I'm wrong, but this sounds like you don't need the tryget
> variant. The 'users' counter is guaranteed be non-zero for one grace
> period after put_task_struct_rcu_user(prev).

Correct, I think we can just get rid of tryget and use get_task_struct().
I'll run some stress tests with this change.

-Andrea

      reply	other threads:[~2026-05-10 18:36 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-08 14:02 [PATCH] sched/task: always defer 'struct task_struct' destruction via RCU Alice Ryhl
2026-05-08 20:01 ` Sebastian Andrzej Siewior
2026-05-09  7:18   ` Alice Ryhl
2026-05-08 21:38 ` Andrea Righi
2026-05-10 13:41   ` Alice Ryhl
2026-05-10 18:36     ` Andrea Righi [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agDQOf8Qe11ozpuf@gpd4 \
    --to=arighi@nvidia.com \
    --cc=aliceryhl@google.com \
    --cc=axboe@kernel.dk \
    --cc=bigeasy@linutronix.de \
    --cc=boqun@kernel.org \
    --cc=changwoo@igalia.com \
    --cc=clrkwllms@kernel.org \
    --cc=frederic@kernel.org \
    --cc=io-uring@vger.kernel.org \
    --cc=jiangshanlai@gmail.com \
    --cc=joelagnelf@nvidia.com \
    --cc=josh@joshtriplett.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@redhat.com \
    --cc=neeraj.upadhyay@kernel.org \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=qiang.zhang@linux.dev \
    --cc=rcu@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sched-ext@lists.linux.dev \
    --cc=tj@kernel.org \
    --cc=urezki@gmail.com \
    --cc=void@manifault.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.