From: Oleg Nesterov <oleg@redhat.com>
To: Alexey Gladkov <legion@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Howells <dhowells@redhat.com>,
Mateusz Guzik <mjguzik@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>
Cc: linux-kernel@vger.kernel.org
Subject: [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts()
Date: Sun, 26 Oct 2025 15:35:01 +0100 [thread overview]
Message-ID: <20251026143501.GA22472@redhat.com> (raw)
In-Reply-To: <20251026143140.GA22463@redhat.com>
On 10/26, Oleg Nesterov wrote:
>
> NOTE: task_ucounts() returns the pointer to another rcu-protected data,
> struct ucounts. So it should either be used when task->real_cred and thus
> task->real_cred->ucounts is stable (release_task, copy_process, copy_creds),
> or it should be called under rcu_read_lock(). In both cases it is pointless
> to take rcu_read_lock() to read the cred->ucounts pointer.
So I think task_ucounts() can just do
/* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */
#define task_ucounts(task) \
rcu_dereference_check((task)->real_cred, 1)->ucounts
but this removes the lockdep checks altogether.
But, otoh, task_cred_xxx(t, ucounts) (or, say, task_cred_xxx(task, user_ns)) can
hide the problem. Lockdep won't complain if, for example, we remove rcu_read_lock()
in task_sig() around get_rlimit_value(task_ucounts(p)). So perhaps something like
below makes any sense?
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 89ae50ad2ace..7078159486f0 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -347,7 +347,14 @@ DEFINE_FREE(put_cred, struct cred *, if (!IS_ERR_OR_NULL(_T)) put_cred(_T))
#define task_uid(task) (task_cred_xxx((task), uid))
#define task_euid(task) (task_cred_xxx((task), euid))
-#define task_ucounts(task) (task_cred_xxx((task), ucounts))
+
+// ->real_cred must be stable
+#define __task_ucounts(task) \
+ rcu_dereference_protected((task)->real_cred, 1)->ucounts
+
+// needs rcu_read_lock()
+#define task_ucounts(task) \
+ rcu_dereference((task)->real_cred)->ucounts
#define current_cred_xxx(xxx) \
({ \
diff --git a/kernel/cred.c b/kernel/cred.c
index dbf6b687dc5c..edddecec82e5 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -305,7 +305,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags)
p->real_cred = get_cred_many(p->cred, 2);
kdebug("share_creds(%p{%ld})",
p->cred, atomic_long_read(&p->cred->usage));
- inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+ inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
return 0;
}
@@ -342,7 +342,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags)
#endif
p->cred = p->real_cred = get_cred(new);
- inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+ inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
return 0;
error_put:
diff --git a/kernel/exit.c b/kernel/exit.c
index f041f0c05ebb..80b0f1114bd3 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -252,7 +252,7 @@ void release_task(struct task_struct *p)
/* don't need to get the RCU readlock here - the process is dead and
* can't be modifying its own credentials. */
- dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+ dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
pidfs_exit(p);
cgroup_release(p);
diff --git a/kernel/fork.c b/kernel/fork.c
index 3da0f08615a9..f2a6a3cd14ef 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2048,7 +2048,7 @@ __latent_entropy struct task_struct *copy_process(
goto bad_fork_free;
retval = -EAGAIN;
- if (is_rlimit_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
+ if (is_rlimit_overlimit(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
if (p->real_cred->user != INIT_USER &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
goto bad_fork_cleanup_count;
@@ -2486,7 +2486,7 @@ __latent_entropy struct task_struct *copy_process(
bad_fork_cleanup_delayacct:
delayacct_tsk_free(p);
bad_fork_cleanup_count:
- dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
+ dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
exit_creds(p);
bad_fork_free:
WRITE_ONCE(p->__state, TASK_DEAD);
next prev parent reply other threads:[~2025-10-26 14:36 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-26 14:31 [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Oleg Nesterov
2025-10-26 14:35 ` Oleg Nesterov [this message]
2025-10-26 14:49 ` [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts() Mateusz Guzik
2025-10-27 14:55 ` David Howells
2025-10-27 15:18 ` Paul E. McKenney
2025-10-26 15:44 ` [PATCH 1/1] release_task: kill unnecessary rcu_read_lock() around dec_rlimit_ucounts() Alexey Gladkov
2025-10-27 14:51 ` David Howells
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251026143501.GA22472@redhat.com \
--to=oleg@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dhowells@redhat.com \
--cc=legion@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mjguzik@gmail.com \
--cc=paulmck@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.