public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts()
@ 2025-10-26 14:31 Oleg Nesterov
  2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Oleg Nesterov @ 2025-10-26 14:31 UTC (permalink / raw)
  To: Alexey Gladkov, Andrew Morton, David Howells, Mateusz Guzik,
	Paul E. McKenney
  Cc: linux-kernel

rcu_read_lock() was added to shut RCU-lockdep up when this code used
__task_cred()->rcu_dereference(), but after the commit 21d1c5e386bc
("Reimplement RLIMIT_NPROC on top of ucounts") it is no longer needed:
task_ucounts()->task_cred_xxx() takes rcu_read_lock() itself.

NOTE: task_ucounts() returns the pointer to another rcu-protected data,
struct ucounts. So it should either be used when task->real_cred and thus
task->real_cred->ucounts is stable (release_task, copy_process, copy_creds),
or it should be called under rcu_read_lock(). In both cases it is pointless
to take rcu_read_lock() to read the cred->ucounts pointer.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/exit.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 9f74e8f1c431..f041f0c05ebb 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -251,10 +251,8 @@ void release_task(struct task_struct *p)
 	memset(&post, 0, sizeof(post));
 
 	/* don't need to get the RCU readlock here - the process is dead and
-	 * can't be modifying its own credentials. But shut RCU-lockdep up */
-	rcu_read_lock();
+	 * can't be modifying its own credentials. */
 	dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
-	rcu_read_unlock();
 
 	pidfs_exit(p);
 	cgroup_release(p);
-- 
2.25.1.362.g51ebf55



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts()
  2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov
@ 2025-10-26 14:35 ` Oleg Nesterov
  2025-10-26 14:49   ` Mateusz Guzik
  2025-10-27 14:55   ` David Howells
  2025-10-26 15:44 ` [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Alexey Gladkov
  2025-10-27 14:51 ` David Howells
  2 siblings, 2 replies; 7+ messages in thread
From: Oleg Nesterov @ 2025-10-26 14:35 UTC (permalink / raw)
  To: Alexey Gladkov, Andrew Morton, David Howells, Mateusz Guzik,
	Paul E. McKenney
  Cc: linux-kernel

On 10/26, Oleg Nesterov wrote:
>
> NOTE: task_ucounts() returns the pointer to another rcu-protected data,
> struct ucounts. So it should either be used when task->real_cred and thus
> task->real_cred->ucounts is stable (release_task, copy_process, copy_creds),
> or it should be called under rcu_read_lock(). In both cases it is pointless
> to take rcu_read_lock() to read the cred->ucounts pointer.

So I think task_ucounts() can just do

	/* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */
	#define task_ucounts(task)	\
		rcu_dereference_check((task)->real_cred, 1)->ucounts

but this removes the lockdep checks altogether.

But, otoh, task_cred_xxx(t, ucounts) (or, say, task_cred_xxx(task, user_ns)) can
hide the problem. Lockdep won't complain if, for example, we remove rcu_read_lock()
in task_sig() around get_rlimit_value(task_ucounts(p)). So perhaps something like
below makes any sense?


diff --git a/include/linux/cred.h b/include/linux/cred.h
index 89ae50ad2ace..7078159486f0 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -347,7 +347,14 @@ DEFINE_FREE(put_cred, struct cred *, if (!IS_ERR_OR_NULL(_T)) put_cred(_T))
 
 #define task_uid(task)		(task_cred_xxx((task), uid))
 #define task_euid(task)		(task_cred_xxx((task), euid))
-#define task_ucounts(task)	(task_cred_xxx((task), ucounts))
+
+// ->real_cred must be stable
+#define __task_ucounts(task)	\
+	rcu_dereference_protected((task)->real_cred, 1)->ucounts
+
+// needs rcu_read_lock()
+#define task_ucounts(task)	\
+	rcu_dereference((task)->real_cred)->ucounts
 
 #define current_cred_xxx(xxx)			\
 ({						\
diff --git a/kernel/cred.c b/kernel/cred.c
index dbf6b687dc5c..edddecec82e5 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -305,7 +305,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags)
 		p->real_cred = get_cred_many(p->cred, 2);
 		kdebug("share_creds(%p{%ld})",
 		       p->cred, atomic_long_read(&p->cred->usage));
-		inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+		inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 		return 0;
 	}
 
@@ -342,7 +342,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags)
 #endif
 
 	p->cred = p->real_cred = get_cred(new);
-	inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+	inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 	return 0;
 
 error_put:
diff --git a/kernel/exit.c b/kernel/exit.c
index f041f0c05ebb..80b0f1114bd3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -252,7 +252,7 @@ void release_task(struct task_struct *p)
 
 	/* don't need to get the RCU readlock here - the process is dead and
 	 * can't be modifying its own credentials. */
-	dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+	dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 
 	pidfs_exit(p);
 	cgroup_release(p);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3da0f08615a9..f2a6a3cd14ef 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2048,7 +2048,7 @@ __latent_entropy struct task_struct *copy_process(
 		goto bad_fork_free;
 
 	retval = -EAGAIN;
-	if (is_rlimit_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
+	if (is_rlimit_overlimit(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
 		if (p->real_cred->user != INIT_USER &&
 		    !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
 			goto bad_fork_cleanup_count;
@@ -2486,7 +2486,7 @@ __latent_entropy struct task_struct *copy_process(
 bad_fork_cleanup_delayacct:
 	delayacct_tsk_free(p);
 bad_fork_cleanup_count:
-	dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+	dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
 	exit_creds(p);
 bad_fork_free:
 	WRITE_ONCE(p->__state, TASK_DEAD);


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts()
  2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov
@ 2025-10-26 14:49   ` Mateusz Guzik
  2025-10-27 14:55   ` David Howells
  1 sibling, 0 replies; 7+ messages in thread
From: Mateusz Guzik @ 2025-10-26 14:49 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Alexey Gladkov, Andrew Morton, David Howells, Paul E. McKenney,
	linux-kernel

On Sun, Oct 26, 2025 at 3:36 PM Oleg Nesterov <oleg@redhat.com> wrote:
>
> On 10/26, Oleg Nesterov wrote:
> >
> > NOTE: task_ucounts() returns the pointer to another rcu-protected data,
> > struct ucounts. So it should either be used when task->real_cred and thus
> > task->real_cred->ucounts is stable (release_task, copy_process, copy_creds),
> > or it should be called under rcu_read_lock(). In both cases it is pointless
> > to take rcu_read_lock() to read the cred->ucounts pointer.
>
> So I think task_ucounts() can just do
>
>         /* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */
>         #define task_ucounts(task)      \
>                 rcu_dereference_check((task)->real_cred, 1)->ucounts
>
> but this removes the lockdep checks altogether.
>
> But, otoh, task_cred_xxx(t, ucounts) (or, say, task_cred_xxx(task, user_ns)) can
> hide the problem. Lockdep won't complain if, for example, we remove rcu_read_lock()
> in task_sig() around get_rlimit_value(task_ucounts(p)). So perhaps something like
> below makes any sense?
>
>
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 89ae50ad2ace..7078159486f0 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -347,7 +347,14 @@ DEFINE_FREE(put_cred, struct cred *, if (!IS_ERR_OR_NULL(_T)) put_cred(_T))
>
>  #define task_uid(task)         (task_cred_xxx((task), uid))
>  #define task_euid(task)                (task_cred_xxx((task), euid))
> -#define task_ucounts(task)     (task_cred_xxx((task), ucounts))
> +
> +// ->real_cred must be stable
> +#define __task_ucounts(task)   \
> +       rcu_dereference_protected((task)->real_cred, 1)->ucounts
> +

While this indeed should be fine, this invites potential for misuse
which perhaps can be warned about. It's not a big deal and can be
ignored, but should you be willing to further massage this:

As is, this is legally callable for tasks which are still under
construction or are already dead. Maybe there is a WARN_ON to that
effect which can be trivially slapped in for the non-rcu case?

As a nit in a nit, lack of a debug-only general kernel WARN_ON/BUG_ON
variants is discouraging frequent use and perhaps this could be used
as an impetus to add something of the sort, or a justification for not
bothering to add the new check. ;) I'm definitely not going to try to
add something like that and I can't good conscience suggest anyone
tries that either.

tl;dr the patch LGTM, but consider the first nit. thanks. ;)

> +// needs rcu_read_lock()
> +#define task_ucounts(task)     \
> +       rcu_dereference((task)->real_cred)->ucounts
>
>  #define current_cred_xxx(xxx)                  \
>  ({                                             \
> diff --git a/kernel/cred.c b/kernel/cred.c
> index dbf6b687dc5c..edddecec82e5 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -305,7 +305,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags)
>                 p->real_cred = get_cred_many(p->cred, 2);
>                 kdebug("share_creds(%p{%ld})",
>                        p->cred, atomic_long_read(&p->cred->usage));
> -               inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> +               inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>                 return 0;
>         }
>
> @@ -342,7 +342,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags)
>  #endif
>
>         p->cred = p->real_cred = get_cred(new);
> -       inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> +       inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>         return 0;
>
>  error_put:
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f041f0c05ebb..80b0f1114bd3 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -252,7 +252,7 @@ void release_task(struct task_struct *p)
>
>         /* don't need to get the RCU readlock here - the process is dead and
>          * can't be modifying its own credentials. */
> -       dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> +       dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>
>         pidfs_exit(p);
>         cgroup_release(p);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3da0f08615a9..f2a6a3cd14ef 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2048,7 +2048,7 @@ __latent_entropy struct task_struct *copy_process(
>                 goto bad_fork_free;
>
>         retval = -EAGAIN;
> -       if (is_rlimit_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> +       if (is_rlimit_overlimit(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
>                 if (p->real_cred->user != INIT_USER &&
>                     !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
>                         goto bad_fork_cleanup_count;
> @@ -2486,7 +2486,7 @@ __latent_entropy struct task_struct *copy_process(
>  bad_fork_cleanup_delayacct:
>         delayacct_tsk_free(p);
>  bad_fork_cleanup_count:
> -       dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> +       dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>         exit_creds(p);
>  bad_fork_free:
>         WRITE_ONCE(p->__state, TASK_DEAD);
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts()
  2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov
  2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov
@ 2025-10-26 15:44 ` Alexey Gladkov
  2025-10-27 14:51 ` David Howells
  2 siblings, 0 replies; 7+ messages in thread
From: Alexey Gladkov @ 2025-10-26 15:44 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Andrew Morton, David Howells, Mateusz Guzik, Paul E. McKenney,
	linux-kernel

On Sun, Oct 26, 2025 at 03:31:40PM +0100, Oleg Nesterov wrote:
> rcu_read_lock() was added to shut RCU-lockdep up when this code used
> __task_cred()->rcu_dereference(), but after the commit 21d1c5e386bc
> ("Reimplement RLIMIT_NPROC on top of ucounts") it is no longer needed:
> task_ucounts()->task_cred_xxx() takes rcu_read_lock() itself.

Yes, it makes sense.

Acked-by: Alexey Gladkov <legion@kernel.org>

> NOTE: task_ucounts() returns the pointer to another rcu-protected data,
> struct ucounts. So it should either be used when task->real_cred and thus
> task->real_cred->ucounts is stable (release_task, copy_process, copy_creds),
> or it should be called under rcu_read_lock(). In both cases it is pointless
> to take rcu_read_lock() to read the cred->ucounts pointer.
> 
> Signed-off-by: Oleg Nesterov <oleg@redhat.com>
> ---
>  kernel/exit.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 9f74e8f1c431..f041f0c05ebb 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -251,10 +251,8 @@ void release_task(struct task_struct *p)
>  	memset(&post, 0, sizeof(post));
>  
>  	/* don't need to get the RCU readlock here - the process is dead and
> -	 * can't be modifying its own credentials. But shut RCU-lockdep up */
> -	rcu_read_lock();
> +	 * can't be modifying its own credentials. */
>  	dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> -	rcu_read_unlock();
>  
>  	pidfs_exit(p);
>  	cgroup_release(p);
> -- 
> 2.25.1.362.g51ebf55
> 
> 

-- 
Rgrds, legion


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts()
  2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov
  2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov
  2025-10-26 15:44 ` [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Alexey Gladkov
@ 2025-10-27 14:51 ` David Howells
  2 siblings, 0 replies; 7+ messages in thread
From: David Howells @ 2025-10-27 14:51 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Alexey Gladkov, Andrew Morton, Mateusz Guzik,
	Paul E. McKenney, linux-kernel

Oleg Nesterov <oleg@redhat.com> wrote:

> rcu_read_lock() was added to shut RCU-lockdep up when this code used
> __task_cred()->rcu_dereference(), but after the commit 21d1c5e386bc
> ("Reimplement RLIMIT_NPROC on top of ucounts") it is no longer needed:
> task_ucounts()->task_cred_xxx() takes rcu_read_lock() itself.
> 
> NOTE: task_ucounts() returns the pointer to another rcu-protected data,
> struct ucounts. So it should either be used when task->real_cred and thus
> task->real_cred->ucounts is stable (release_task, copy_process, copy_creds),
> or it should be called under rcu_read_lock(). In both cases it is pointless
> to take rcu_read_lock() to read the cred->ucounts pointer.

Yeah, accessing the pointer that task_ucounts() gives you isn't RCU safe
unless you're holding the rcu_read_lock() or are in a context where RCU safety
is irrelevant.  The task doing the dismantling in release_task() would seem to
qualify for that.

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts()
  2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov
  2025-10-26 14:49   ` Mateusz Guzik
@ 2025-10-27 14:55   ` David Howells
  2025-10-27 15:18     ` Paul E. McKenney
  1 sibling, 1 reply; 7+ messages in thread
From: David Howells @ 2025-10-27 14:55 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: dhowells, Alexey Gladkov, Andrew Morton, Mateusz Guzik,
	Paul E. McKenney, linux-kernel

Oleg Nesterov <oleg@redhat.com> wrote:

> So I think task_ucounts() can just do
> 
> 	/* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */
> 	#define task_ucounts(task)	\
> 		rcu_dereference_check((task)->real_cred, 1)->ucounts

Can you use rcu_access_pointer() within exit.c?  E.g.:

	struct cred *pcreds = rcu_access_pointer(task->real_cred);
	dec_rlimit_ucounts(pcreds->ucounts, UCOUNT_RLIMIT_NPROC, 1);

David


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts()
  2025-10-27 14:55   ` David Howells
@ 2025-10-27 15:18     ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2025-10-27 15:18 UTC (permalink / raw)
  To: David Howells
  Cc: Oleg Nesterov, Alexey Gladkov, Andrew Morton, Mateusz Guzik,
	linux-kernel

On Mon, Oct 27, 2025 at 02:55:56PM +0000, David Howells wrote:
> Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > So I think task_ucounts() can just do
> > 
> > 	/* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */
> > 	#define task_ucounts(task)	\
> > 		rcu_dereference_check((task)->real_cred, 1)->ucounts
> 
> Can you use rcu_access_pointer() within exit.c?  E.g.:
> 
> 	struct cred *pcreds = rcu_access_pointer(task->real_cred);
> 	dec_rlimit_ucounts(pcreds->ucounts, UCOUNT_RLIMIT_NPROC, 1);

No go, unfortunately.  You can only use rcu_access_pointer() if you are
*not* dereferencing it.  And here, dereferencing is happening.

However, if there is some mutex that is preventing changes to ->real_cred,
then something like this would work:

	struct cred *pcreds = rcu_dereference_protected(task->real_cred mutex_is_locked(&whatever));

Alternatively, if this is due to the running kthread being in some
state, then a check for that state can be substituted for the above
mutex_is_locked().  And so on.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-10-27 15:18 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov
2025-10-26 14:35 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Oleg Nesterov
2025-10-26 14:49   ` Mateusz Guzik
2025-10-27 14:55   ` David Howells
2025-10-27 15:18     ` Paul E. McKenney
2025-10-26 15:44 ` [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Alexey Gladkov
2025-10-27 14:51 ` David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox