* [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts()
@ 2025-10-26 14:31 Oleg Nesterov
2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Oleg Nesterov @ 2025-10-26 14:31 UTC (permalink / raw)
To: Alexey Gladkov, Andrew Morton, David Howells, Mateusz Guzik,
Paul E. McKenney
Cc: linux-kernel
rcu_read_lock() was added to shut RCU-lockdep up when this code used
__task_cred()->rcu_dereference(), but after the commit 21d1c5e386bc
("Reimplement RLIMIT_NPROC on top of ucounts") it is no longer needed:
task_ucounts()->task_cred_xxx() takes rcu_read_lock() itself.
NOTE: task_ucounts() returns the pointer to another rcu-protected data,
struct ucounts. So it should either be used when task->real_cred and thus
task->real_cred->ucounts is stable (release_task, copy_process, copy_creds),
or it should be called under rcu_read_lock(). In both cases it is pointless
to take rcu_read_lock() to read the cred->ucounts pointer.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/exit.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/kernel/exit.c b/kernel/exit.c
index 9f74e8f1c431..f041f0c05ebb 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -251,10 +251,8 @@ void release_task(struct task_struct *p)
memset(&post, 0, sizeof(post));
/* don't need to get the RCU readlock here - the process is dead and
- * can't be modifying its own credentials. But shut RCU-lockdep up */
- rcu_read_lock();
+ * can't be modifying its own credentials. */
dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
- rcu_read_unlock();
pidfs_exit(p);
cgroup_release(p);
--
2.25.1.362.g51ebf55
^ permalink raw reply related [flat|nested] 7+ messages in thread* [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() 2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov @ 2025-10-26 14:35 ` Oleg Nesterov 2025-10-26 14:49 ` Mateusz Guzik 2025-10-27 14:55 ` David Howells 2025-10-26 15:44 ` [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Alexey Gladkov 2025-10-27 14:51 ` David Howells 2 siblings, 2 replies; 7+ messages in thread From: Oleg Nesterov @ 2025-10-26 14:35 UTC (permalink / raw) To: Alexey Gladkov, Andrew Morton, David Howells, Mateusz Guzik, Paul E. McKenney Cc: linux-kernel On 10/26, Oleg Nesterov wrote: > > NOTE: task_ucounts() returns the pointer to another rcu-protected data, > struct ucounts. So it should either be used when task->real_cred and thus > task->real_cred->ucounts is stable (release_task, copy_process, copy_creds), > or it should be called under rcu_read_lock(). In both cases it is pointless > to take rcu_read_lock() to read the cred->ucounts pointer. So I think task_ucounts() can just do /* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */ #define task_ucounts(task) \ rcu_dereference_check((task)->real_cred, 1)->ucounts but this removes the lockdep checks altogether. But, otoh, task_cred_xxx(t, ucounts) (or, say, task_cred_xxx(task, user_ns)) can hide the problem. Lockdep won't complain if, for example, we remove rcu_read_lock() in task_sig() around get_rlimit_value(task_ucounts(p)). So perhaps something like below makes any sense? diff --git a/include/linux/cred.h b/include/linux/cred.h index 89ae50ad2ace..7078159486f0 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -347,7 +347,14 @@ DEFINE_FREE(put_cred, struct cred *, if (!IS_ERR_OR_NULL(_T)) put_cred(_T)) #define task_uid(task) (task_cred_xxx((task), uid)) #define task_euid(task) (task_cred_xxx((task), euid)) -#define task_ucounts(task) (task_cred_xxx((task), ucounts)) + +// ->real_cred must be stable +#define __task_ucounts(task) \ + rcu_dereference_protected((task)->real_cred, 1)->ucounts + +// needs rcu_read_lock() +#define task_ucounts(task) \ + rcu_dereference((task)->real_cred)->ucounts #define current_cred_xxx(xxx) \ ({ \ diff --git a/kernel/cred.c b/kernel/cred.c index dbf6b687dc5c..edddecec82e5 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -305,7 +305,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags) p->real_cred = get_cred_many(p->cred, 2); kdebug("share_creds(%p{%ld})", p->cred, atomic_long_read(&p->cred->usage)); - inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); + inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); return 0; } @@ -342,7 +342,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags) #endif p->cred = p->real_cred = get_cred(new); - inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); + inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); return 0; error_put: diff --git a/kernel/exit.c b/kernel/exit.c index f041f0c05ebb..80b0f1114bd3 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -252,7 +252,7 @@ void release_task(struct task_struct *p) /* don't need to get the RCU readlock here - the process is dead and * can't be modifying its own credentials. */ - dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); + dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); pidfs_exit(p); cgroup_release(p); diff --git a/kernel/fork.c b/kernel/fork.c index 3da0f08615a9..f2a6a3cd14ef 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2048,7 +2048,7 @@ __latent_entropy struct task_struct *copy_process( goto bad_fork_free; retval = -EAGAIN; - if (is_rlimit_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) { + if (is_rlimit_overlimit(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) { if (p->real_cred->user != INIT_USER && !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) goto bad_fork_cleanup_count; @@ -2486,7 +2486,7 @@ __latent_entropy struct task_struct *copy_process( bad_fork_cleanup_delayacct: delayacct_tsk_free(p); bad_fork_cleanup_count: - dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); + dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); exit_creds(p); bad_fork_free: WRITE_ONCE(p->__state, TASK_DEAD); ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() 2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov @ 2025-10-26 14:49 ` Mateusz Guzik 2025-10-27 14:55 ` David Howells 1 sibling, 0 replies; 7+ messages in thread From: Mateusz Guzik @ 2025-10-26 14:49 UTC (permalink / raw) To: Oleg Nesterov Cc: Alexey Gladkov, Andrew Morton, David Howells, Paul E. McKenney, linux-kernel On Sun, Oct 26, 2025 at 3:36 PM Oleg Nesterov <oleg@redhat.com> wrote: > > On 10/26, Oleg Nesterov wrote: > > > > NOTE: task_ucounts() returns the pointer to another rcu-protected data, > > struct ucounts. So it should either be used when task->real_cred and thus > > task->real_cred->ucounts is stable (release_task, copy_process, copy_creds), > > or it should be called under rcu_read_lock(). In both cases it is pointless > > to take rcu_read_lock() to read the cred->ucounts pointer. > > So I think task_ucounts() can just do > > /* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */ > #define task_ucounts(task) \ > rcu_dereference_check((task)->real_cred, 1)->ucounts > > but this removes the lockdep checks altogether. > > But, otoh, task_cred_xxx(t, ucounts) (or, say, task_cred_xxx(task, user_ns)) can > hide the problem. Lockdep won't complain if, for example, we remove rcu_read_lock() > in task_sig() around get_rlimit_value(task_ucounts(p)). So perhaps something like > below makes any sense? > > > diff --git a/include/linux/cred.h b/include/linux/cred.h > index 89ae50ad2ace..7078159486f0 100644 > --- a/include/linux/cred.h > +++ b/include/linux/cred.h > @@ -347,7 +347,14 @@ DEFINE_FREE(put_cred, struct cred *, if (!IS_ERR_OR_NULL(_T)) put_cred(_T)) > > #define task_uid(task) (task_cred_xxx((task), uid)) > #define task_euid(task) (task_cred_xxx((task), euid)) > -#define task_ucounts(task) (task_cred_xxx((task), ucounts)) > + > +// ->real_cred must be stable > +#define __task_ucounts(task) \ > + rcu_dereference_protected((task)->real_cred, 1)->ucounts > + While this indeed should be fine, this invites potential for misuse which perhaps can be warned about. It's not a big deal and can be ignored, but should you be willing to further massage this: As is, this is legally callable for tasks which are still under construction or are already dead. Maybe there is a WARN_ON to that effect which can be trivially slapped in for the non-rcu case? As a nit in a nit, lack of a debug-only general kernel WARN_ON/BUG_ON variants is discouraging frequent use and perhaps this could be used as an impetus to add something of the sort, or a justification for not bothering to add the new check. ;) I'm definitely not going to try to add something like that and I can't good conscience suggest anyone tries that either. tl;dr the patch LGTM, but consider the first nit. thanks. ;) > +// needs rcu_read_lock() > +#define task_ucounts(task) \ > + rcu_dereference((task)->real_cred)->ucounts > > #define current_cred_xxx(xxx) \ > ({ \ > diff --git a/kernel/cred.c b/kernel/cred.c > index dbf6b687dc5c..edddecec82e5 100644 > --- a/kernel/cred.c > +++ b/kernel/cred.c > @@ -305,7 +305,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags) > p->real_cred = get_cred_many(p->cred, 2); > kdebug("share_creds(%p{%ld})", > p->cred, atomic_long_read(&p->cred->usage)); > - inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > + inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > return 0; > } > > @@ -342,7 +342,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags) > #endif > > p->cred = p->real_cred = get_cred(new); > - inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > + inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > return 0; > > error_put: > diff --git a/kernel/exit.c b/kernel/exit.c > index f041f0c05ebb..80b0f1114bd3 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -252,7 +252,7 @@ void release_task(struct task_struct *p) > > /* don't need to get the RCU readlock here - the process is dead and > * can't be modifying its own credentials. */ > - dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > + dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > > pidfs_exit(p); > cgroup_release(p); > diff --git a/kernel/fork.c b/kernel/fork.c > index 3da0f08615a9..f2a6a3cd14ef 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -2048,7 +2048,7 @@ __latent_entropy struct task_struct *copy_process( > goto bad_fork_free; > > retval = -EAGAIN; > - if (is_rlimit_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) { > + if (is_rlimit_overlimit(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) { > if (p->real_cred->user != INIT_USER && > !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) > goto bad_fork_cleanup_count; > @@ -2486,7 +2486,7 @@ __latent_entropy struct task_struct *copy_process( > bad_fork_cleanup_delayacct: > delayacct_tsk_free(p); > bad_fork_cleanup_count: > - dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > + dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > exit_creds(p); > bad_fork_free: > WRITE_ONCE(p->__state, TASK_DEAD); > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() 2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov 2025-10-26 14:49 ` Mateusz Guzik @ 2025-10-27 14:55 ` David Howells 2025-10-27 15:18 ` Paul E. McKenney 1 sibling, 1 reply; 7+ messages in thread From: David Howells @ 2025-10-27 14:55 UTC (permalink / raw) To: Oleg Nesterov Cc: dhowells, Alexey Gladkov, Andrew Morton, Mateusz Guzik, Paul E. McKenney, linux-kernel Oleg Nesterov <oleg@redhat.com> wrote: > So I think task_ucounts() can just do > > /* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */ > #define task_ucounts(task) \ > rcu_dereference_check((task)->real_cred, 1)->ucounts Can you use rcu_access_pointer() within exit.c? E.g.: struct cred *pcreds = rcu_access_pointer(task->real_cred); dec_rlimit_ucounts(pcreds->ucounts, UCOUNT_RLIMIT_NPROC, 1); David ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() 2025-10-27 14:55 ` David Howells @ 2025-10-27 15:18 ` Paul E. McKenney 0 siblings, 0 replies; 7+ messages in thread From: Paul E. McKenney @ 2025-10-27 15:18 UTC (permalink / raw) To: David Howells Cc: Oleg Nesterov, Alexey Gladkov, Andrew Morton, Mateusz Guzik, linux-kernel On Mon, Oct 27, 2025 at 02:55:56PM +0000, David Howells wrote: > Oleg Nesterov <oleg@redhat.com> wrote: > > > So I think task_ucounts() can just do > > > > /* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */ > > #define task_ucounts(task) \ > > rcu_dereference_check((task)->real_cred, 1)->ucounts > > Can you use rcu_access_pointer() within exit.c? E.g.: > > struct cred *pcreds = rcu_access_pointer(task->real_cred); > dec_rlimit_ucounts(pcreds->ucounts, UCOUNT_RLIMIT_NPROC, 1); No go, unfortunately. You can only use rcu_access_pointer() if you are *not* dereferencing it. And here, dereferencing is happening. However, if there is some mutex that is preventing changes to ->real_cred, then something like this would work: struct cred *pcreds = rcu_dereference_protected(task->real_cred mutex_is_locked(&whatever)); Alternatively, if this is due to the running kthread being in some state, then a check for that state can be substituted for the above mutex_is_locked(). And so on. Thanx, Paul ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() 2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov 2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov @ 2025-10-26 15:44 ` Alexey Gladkov 2025-10-27 14:51 ` David Howells 2 siblings, 0 replies; 7+ messages in thread From: Alexey Gladkov @ 2025-10-26 15:44 UTC (permalink / raw) To: Oleg Nesterov Cc: Andrew Morton, David Howells, Mateusz Guzik, Paul E. McKenney, linux-kernel On Sun, Oct 26, 2025 at 03:31:40PM +0100, Oleg Nesterov wrote: > rcu_read_lock() was added to shut RCU-lockdep up when this code used > __task_cred()->rcu_dereference(), but after the commit 21d1c5e386bc > ("Reimplement RLIMIT_NPROC on top of ucounts") it is no longer needed: > task_ucounts()->task_cred_xxx() takes rcu_read_lock() itself. Yes, it makes sense. Acked-by: Alexey Gladkov <legion@kernel.org> > NOTE: task_ucounts() returns the pointer to another rcu-protected data, > struct ucounts. So it should either be used when task->real_cred and thus > task->real_cred->ucounts is stable (release_task, copy_process, copy_creds), > or it should be called under rcu_read_lock(). In both cases it is pointless > to take rcu_read_lock() to read the cred->ucounts pointer. > > Signed-off-by: Oleg Nesterov <oleg@redhat.com> > --- > kernel/exit.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > > diff --git a/kernel/exit.c b/kernel/exit.c > index 9f74e8f1c431..f041f0c05ebb 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -251,10 +251,8 @@ void release_task(struct task_struct *p) > memset(&post, 0, sizeof(post)); > > /* don't need to get the RCU readlock here - the process is dead and > - * can't be modifying its own credentials. But shut RCU-lockdep up */ > - rcu_read_lock(); > + * can't be modifying its own credentials. */ > dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1); > - rcu_read_unlock(); > > pidfs_exit(p); > cgroup_release(p); > -- > 2.25.1.362.g51ebf55 > > -- Rgrds, legion ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() 2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov 2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov 2025-10-26 15:44 ` [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Alexey Gladkov @ 2025-10-27 14:51 ` David Howells 2 siblings, 0 replies; 7+ messages in thread From: David Howells @ 2025-10-27 14:51 UTC (permalink / raw) To: Oleg Nesterov Cc: dhowells, Alexey Gladkov, Andrew Morton, Mateusz Guzik, Paul E. McKenney, linux-kernel Oleg Nesterov <oleg@redhat.com> wrote: > rcu_read_lock() was added to shut RCU-lockdep up when this code used > __task_cred()->rcu_dereference(), but after the commit 21d1c5e386bc > ("Reimplement RLIMIT_NPROC on top of ucounts") it is no longer needed: > task_ucounts()->task_cred_xxx() takes rcu_read_lock() itself. > > NOTE: task_ucounts() returns the pointer to another rcu-protected data, > struct ucounts. So it should either be used when task->real_cred and thus > task->real_cred->ucounts is stable (release_task, copy_process, copy_creds), > or it should be called under rcu_read_lock(). In both cases it is pointless > to take rcu_read_lock() to read the cred->ucounts pointer. Yeah, accessing the pointer that task_ucounts() gives you isn't RCU safe unless you're holding the rcu_read_lock() or are in a context where RCU safety is irrelevant. The task doing the dismantling in release_task() would seem to qualify for that. David ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-10-27 15:18 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov 2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov 2025-10-26 14:49 ` Mateusz Guzik 2025-10-27 14:55 ` David Howells 2025-10-27 15:18 ` Paul E. McKenney 2025-10-26 15:44 ` [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Alexey Gladkov 2025-10-27 14:51 ` David Howells
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox