* [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
@ 2025-04-23 11:55 Jiayuan Chen
2025-04-23 13:51 ` Oleg Nesterov
0 siblings, 1 reply; 7+ messages in thread
From: Jiayuan Chen @ 2025-04-23 11:55 UTC (permalink / raw)
To: linux-kernel
Cc: mrpre, mkoutny, Jiayuan Chen, syzbot+adcaa842b762a1762e7d,
syzbot+fab52e3459fa2f95df57, syzbot+0718f65353d72efaac1e,
Andrew Morton, Christian Brauner, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Wei Yang, David Hildenbrand,
Al Viro, Oleg Nesterov, Mateusz Guzik, Joel Granados, Jens Axboe,
Wei Liu, Frederic Weisbecker
Suppress syzbot reports by annotating these accesses using
READ_ONCE() / WRITE_ONCE().
Reported-by: syzbot+adcaa842b762a1762e7d@syzkaller.appspotmail.com
Reported-by: syzbot+fab52e3459fa2f95df57@syzkaller.appspotmail.com
Reported-by: syzbot+0718f65353d72efaac1e@syzkaller.appspotmail.com
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
---
kernel/fork.c | 2 +-
kernel/pid.c | 7 ++++---
kernel/pid_namespace.c | 2 +-
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index c4b26cd8998b..1966ddea150d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2584,7 +2584,7 @@ __latent_entropy struct task_struct *copy_process(
rseq_fork(p, clone_flags);
/* Don't start children in a dying pid namespace */
- if (unlikely(!(ns_of_pid(pid)->pid_allocated & PIDNS_ADDING))) {
+ if (unlikely(!(READ_ONCE(ns_of_pid(pid)->pid_allocated) & PIDNS_ADDING))) {
retval = -ENOMEM;
goto bad_fork_core_free;
}
diff --git a/kernel/pid.c b/kernel/pid.c
index 4ac2ce46817f..47e74457572f 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
for (i = 0; i <= pid->level; i++) {
struct upid *upid = pid->numbers + i;
struct pid_namespace *ns = upid->ns;
- switch (--ns->pid_allocated) {
+ WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
+ switch (READ_ONCE(ns->pid_allocated)) {
case 2:
case 1:
/* When all that is left in the pid namespace
@@ -271,13 +272,13 @@ struct pid *alloc_pid(struct pid_namespace *ns, pid_t *set_tid,
upid = pid->numbers + ns->level;
idr_preload(GFP_KERNEL);
spin_lock(&pidmap_lock);
- if (!(ns->pid_allocated & PIDNS_ADDING))
+ if (!(READ_ONCE(ns->pid_allocated) & PIDNS_ADDING))
goto out_unlock;
pidfs_add_pid(pid);
for ( ; upid >= pid->numbers; --upid) {
/* Make the PID visible to find_pid_ns. */
idr_replace(&upid->ns->idr, pid, upid->nr);
- upid->ns->pid_allocated++;
+ WRITE_ONCE(upid->ns->pid_allocated, READ_ONCE(upid->ns->pid_allocated) + 1);
}
spin_unlock(&pidmap_lock);
idr_preload_end();
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
index 7098ed44e717..148f7789d6f3 100644
--- a/kernel/pid_namespace.c
+++ b/kernel/pid_namespace.c
@@ -268,7 +268,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns)
*/
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
- if (pid_ns->pid_allocated == init_pids)
+ if (READ_ONCE(pid_ns->pid_allocated) == init_pids)
break;
schedule();
}
--
2.47.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
2025-04-23 11:55 [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated Jiayuan Chen
@ 2025-04-23 13:51 ` Oleg Nesterov
2025-04-23 14:33 ` Jiayuan Chen
0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2025-04-23 13:51 UTC (permalink / raw)
To: Jiayuan Chen
Cc: linux-kernel, mrpre, mkoutny, syzbot+adcaa842b762a1762e7d,
syzbot+fab52e3459fa2f95df57, syzbot+0718f65353d72efaac1e,
Andrew Morton, Christian Brauner, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Wei Yang, David Hildenbrand,
Al Viro, Mateusz Guzik, Joel Granados, Jens Axboe, Wei Liu,
Frederic Weisbecker
On 04/23, Jiayuan Chen wrote:
>
> Suppress syzbot reports by annotating these accesses using
> READ_ONCE() / WRITE_ONCE().
...
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> for (i = 0; i <= pid->level; i++) {
> struct upid *upid = pid->numbers + i;
> struct pid_namespace *ns = upid->ns;
> - switch (--ns->pid_allocated) {
> + WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> + switch (READ_ONCE(ns->pid_allocated)) {
I keep forgetting how kcsan works, but we don't need
READ_ONCE(ns->pid_allocated) under pidmap_lock?
Same for other functions which read/modify ->pid_allocated with
this lock held.
Oleg.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
2025-04-23 13:51 ` Oleg Nesterov
@ 2025-04-23 14:33 ` Jiayuan Chen
2025-04-23 16:24 ` Michal Koutný
2025-04-23 16:38 ` Oleg Nesterov
0 siblings, 2 replies; 7+ messages in thread
From: Jiayuan Chen @ 2025-04-23 14:33 UTC (permalink / raw)
To: Oleg Nesterov
Cc: linux-kernel, mrpre, mkoutny, syzbot+adcaa842b762a1762e7d,
syzbot+fab52e3459fa2f95df57, syzbot+0718f65353d72efaac1e,
Andrew Morton, Christian Brauner, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Wei Yang, David Hildenbrand,
Al Viro, Mateusz Guzik, Joel Granados, Jens Axboe, Wei Liu,
Frederic Weisbecker
April 23, 2025 at 21:51, "Oleg Nesterov" <oleg@redhat.com> wrote:
>
> On 04/23, Jiayuan Chen wrote:
>
> >
> > Suppress syzbot reports by annotating these accesses using
> >
> > READ_ONCE() / WRITE_ONCE().
> >
>
> ...
>
> >
> > --- a/kernel/pid.c
> >
> > +++ b/kernel/pid.c
> >
> > @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> >
> > for (i = 0; i <= pid->level; i++) {
> >
> > struct upid *upid = pid->numbers + i;
> >
> > struct pid_namespace *ns = upid->ns;
> >
> > - switch (--ns->pid_allocated) {
> >
> > + WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> >
> > + switch (READ_ONCE(ns->pid_allocated)) {
> >
>
> I keep forgetting how kcsan works, but we don't need
>
> READ_ONCE(ns->pid_allocated) under pidmap_lock?
>
> Same for other functions which read/modify ->pid_allocated with
>
> this lock held.
>
> Oleg.
>
However, not all places that read/write pid_allocated are locked,
for example:
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
So, in fact, the pidmap_lock is not effective. And if we were to add locks
to all these places, it would be too heavy.
There's no actual impact on usage without locks, so I think it might be more
suitable to add these macros, KASAN can recognize READ_ONCE and WRITE_ONCE
and suppress warnings.
Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
2025-04-23 14:33 ` Jiayuan Chen
@ 2025-04-23 16:24 ` Michal Koutný
2025-04-23 16:38 ` Oleg Nesterov
1 sibling, 0 replies; 7+ messages in thread
From: Michal Koutný @ 2025-04-23 16:24 UTC (permalink / raw)
To: Jiayuan Chen
Cc: Oleg Nesterov, linux-kernel, mrpre, syzbot+adcaa842b762a1762e7d,
syzbot+fab52e3459fa2f95df57, syzbot+0718f65353d72efaac1e,
Andrew Morton, Christian Brauner, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Wei Yang, David Hildenbrand,
Al Viro, Mateusz Guzik, Joel Granados, Jens Axboe, Wei Liu,
Frederic Weisbecker
[-- Attachment #1: Type: text/plain, Size: 883 bytes --]
On Wed, Apr 23, 2025 at 02:33:37PM +0000, Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> However, not all places that read/write pid_allocated are locked,
> for example:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
>
> So, in fact, the pidmap_lock is not effective. And if we were to add locks
> to all these places, it would be too heavy.
>
> There's no actual impact on usage without locks, so I think it might be more
> suitable to add these macros, KASAN can recognize READ_ONCE and WRITE_ONCE
> and suppress warnings.
Wouldn't it be nicer to add data_race() to mark those places where the
race (presumably) doesn't matter? (Instead of _ONCE'ing places that are
under the lock.)
0.02€,
Michal
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
2025-04-23 14:33 ` Jiayuan Chen
2025-04-23 16:24 ` Michal Koutný
@ 2025-04-23 16:38 ` Oleg Nesterov
2025-04-24 9:38 ` Christian Brauner
1 sibling, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2025-04-23 16:38 UTC (permalink / raw)
To: Jiayuan Chen
Cc: linux-kernel, mrpre, mkoutny, syzbot+adcaa842b762a1762e7d,
syzbot+fab52e3459fa2f95df57, syzbot+0718f65353d72efaac1e,
Andrew Morton, Christian Brauner, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Wei Yang, David Hildenbrand,
Al Viro, Mateusz Guzik, Joel Granados, Jens Axboe, Wei Liu,
Frederic Weisbecker
On 04/23, Jiayuan Chen wrote:
>
> April 23, 2025 at 21:51, "Oleg Nesterov" <oleg@redhat.com> wrote:
>
>
>
> >
> > On 04/23, Jiayuan Chen wrote:
> >
> > >
> > > Suppress syzbot reports by annotating these accesses using
> > >
> > > READ_ONCE() / WRITE_ONCE().
> > >
> >
> > ...
> >
> > >
> > > --- a/kernel/pid.c
> > >
> > > +++ b/kernel/pid.c
> > >
> > > @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> > >
> > > for (i = 0; i <= pid->level; i++) {
> > >
> > > struct upid *upid = pid->numbers + i;
> > >
> > > struct pid_namespace *ns = upid->ns;
> > >
> > > - switch (--ns->pid_allocated) {
> > >
> > > + WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> > >
> > > + switch (READ_ONCE(ns->pid_allocated)) {
> > >
> >
> > I keep forgetting how kcsan works, but we don't need
> >
> > READ_ONCE(ns->pid_allocated) under pidmap_lock?
> >
> > Same for other functions which read/modify ->pid_allocated with
> >
> > this lock held.
> >
> > Oleg.
> >
>
> However, not all places that read/write pid_allocated are locked,
> for example:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
> https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
>
> So, in fact, the pidmap_lock is not effective. And if we were to add locks
> to all these places, it would be too heavy.
It seems you misunderstood me. I didn't argue with the lockless READ_ONCE()s
outside of pidmap_lock.
Oleg.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
2025-04-23 16:38 ` Oleg Nesterov
@ 2025-04-24 9:38 ` Christian Brauner
2025-04-25 5:37 ` Jiayuan Chen
0 siblings, 1 reply; 7+ messages in thread
From: Christian Brauner @ 2025-04-24 9:38 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Jiayuan Chen, linux-kernel, mrpre, mkoutny,
syzbot+adcaa842b762a1762e7d, syzbot+fab52e3459fa2f95df57,
syzbot+0718f65353d72efaac1e, Andrew Morton, Lorenzo Stoakes,
Liam R. Howlett, Suren Baghdasaryan, Wei Yang, David Hildenbrand,
Al Viro, Mateusz Guzik, Joel Granados, Jens Axboe, Wei Liu,
Frederic Weisbecker
On Wed, Apr 23, 2025 at 06:38:18PM +0200, Oleg Nesterov wrote:
> On 04/23, Jiayuan Chen wrote:
> >
> > April 23, 2025 at 21:51, "Oleg Nesterov" <oleg@redhat.com> wrote:
> >
> >
> >
> > >
> > > On 04/23, Jiayuan Chen wrote:
> > >
> > > >
> > > > Suppress syzbot reports by annotating these accesses using
> > > >
> > > > READ_ONCE() / WRITE_ONCE().
> > > >
> > >
> > > ...
> > >
> > > >
> > > > --- a/kernel/pid.c
> > > >
> > > > +++ b/kernel/pid.c
> > > >
> > > > @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> > > >
> > > > for (i = 0; i <= pid->level; i++) {
> > > >
> > > > struct upid *upid = pid->numbers + i;
> > > >
> > > > struct pid_namespace *ns = upid->ns;
> > > >
> > > > - switch (--ns->pid_allocated) {
> > > >
> > > > + WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> > > >
> > > > + switch (READ_ONCE(ns->pid_allocated)) {
> > > >
> > >
> > > I keep forgetting how kcsan works, but we don't need
> > >
> > > READ_ONCE(ns->pid_allocated) under pidmap_lock?
> > >
> > > Same for other functions which read/modify ->pid_allocated with
> > >
> > > this lock held.
> > >
> > > Oleg.
> > >
> >
> > However, not all places that read/write pid_allocated are locked,
> > for example:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
> >
> > So, in fact, the pidmap_lock is not effective. And if we were to add locks
> > to all these places, it would be too heavy.
>
> It seems you misunderstood me. I didn't argue with the lockless READ_ONCE()s
> outside of pidmap_lock.
Agreed. We should only add those annotations where they're really
needed (someone once taught me ;).
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated
2025-04-24 9:38 ` Christian Brauner
@ 2025-04-25 5:37 ` Jiayuan Chen
0 siblings, 0 replies; 7+ messages in thread
From: Jiayuan Chen @ 2025-04-25 5:37 UTC (permalink / raw)
To: Christian Brauner, Oleg Nesterov
Cc: linux-kernel, mrpre, mkoutny, syzbot+adcaa842b762a1762e7d,
syzbot+fab52e3459fa2f95df57, syzbot+0718f65353d72efaac1e,
Andrew Morton, Lorenzo Stoakes, Liam R. Howlett,
Suren Baghdasaryan, Wei Yang, David Hildenbrand, Al Viro,
Mateusz Guzik, Joel Granados, Jens Axboe, Wei Liu,
Frederic Weisbecker
April 24, 2025 at 17:38, "Christian Brauner" <brauner@kernel.org> wrote:
>
> On Wed, Apr 23, 2025 at 06:38:18PM +0200, Oleg Nesterov wrote:
>
> >
> > On 04/23, Jiayuan Chen wrote:
> >
> > April 23, 2025 at 21:51, "Oleg Nesterov" <oleg@redhat.com> wrote:
> >
> > >
> >
> > > On 04/23, Jiayuan Chen wrote:
> >
> > >
> >
> > > >
> >
> > > > Suppress syzbot reports by annotating these accesses using
> >
> > > >
> >
> > > > READ_ONCE() / WRITE_ONCE().
> >
> > > >
> >
> > >
> >
> > > ...
> >
> > >
> >
> > > >
> >
> > > > --- a/kernel/pid.c
> >
> > > >
> >
> > > > +++ b/kernel/pid.c
> >
> > > >
> >
> > > > @@ -122,7 +122,8 @@ void free_pid(struct pid *pid)
> >
> > > >
> >
> > > > for (i = 0; i <= pid->level; i++) {
> >
> > > >
> >
> > > > struct upid *upid = pid->numbers + i;
> >
> > > >
> >
> > > > struct pid_namespace *ns = upid->ns;
> >
> > > >
> >
> > > > - switch (--ns->pid_allocated) {
> >
> > > >
> >
> > > > + WRITE_ONCE(ns->pid_allocated, READ_ONCE(ns->pid_allocated) - 1);
> >
> > > >
> >
> > > > + switch (READ_ONCE(ns->pid_allocated)) {
> >
> > > >
> >
> > >
> >
> > > I keep forgetting how kcsan works, but we don't need
> >
> > >
> >
> > > READ_ONCE(ns->pid_allocated) under pidmap_lock?
> >
> > >
> >
> > > Same for other functions which read/modify ->pid_allocated with
> >
> > >
> >
> > > this lock held.
> >
> > >
> >
> > > Oleg.
> >
> > >
> >
> > However, not all places that read/write pid_allocated are locked,
> >
> > for example:
> >
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/pid_namespace.c#n271
> >
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/kernel/fork.c#n2602
> >
> > So, in fact, the pidmap_lock is not effective. And if we were to add locks
> >
> > to all these places, it would be too heavy.
> >
> >
> >
> > It seems you misunderstood me. I didn't argue with the lockless READ_ONCE()s
> >
> > outside of pidmap_lock.
> >
>
> Agreed. We should only add those annotations where they're really
>
> needed (someone once taught me ;).
>
Thank you for your suggestion, it make sense to me.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-04-25 5:38 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-23 11:55 [PATCH v1] pid: annotate data-races around pid_ns->pid_allocated Jiayuan Chen
2025-04-23 13:51 ` Oleg Nesterov
2025-04-23 14:33 ` Jiayuan Chen
2025-04-23 16:24 ` Michal Koutný
2025-04-23 16:38 ` Oleg Nesterov
2025-04-24 9:38 ` Christian Brauner
2025-04-25 5:37 ` Jiayuan Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox