From: Oleg Nesterov <oleg@redhat.com>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, Jeff Layton <jlayton@kernel.org>,
Lennart Poettering <lennart@poettering.net>,
Daan De Meyer <daan.j.demeyer@gmail.com>,
Mike Yuan <me@yhndnzj.com>,
linux-kernel@vger.kernel.org,
Peter Ziljstra <peterz@infradead.org>
Subject: Re: [RFC PATCH] pidfs: ensure consistent ENOENT/ESRCH reporting
Date: Thu, 10 Apr 2025 15:10:09 +0200 [thread overview]
Message-ID: <20250410131008.GB15280@redhat.com> (raw)
In-Reply-To: <20250410-barhocker-weinhandel-8ed2f619899b@brauner>
On 04/10, Christian Brauner wrote:
>
> On Thu, Apr 10, 2025 at 12:18:01PM +0200, Oleg Nesterov wrote:
> > On 04/09, Oleg Nesterov wrote:
> > >
> > > On 04/09, Christian Brauner wrote:
> > > >
> > > > The seqcounter might be
> > > > useful independent of pidfs.
> > >
> > > Are you sure? ;) to me the new pid->pid_seq needs more justification...
>
> Yeah, pretty much. I'd make use of this in other cases where we need to
> detect concurrent changes to struct pid without having to take any
> locks. Multi-threaded exec in de_exec() comes to mind as well.
Perhaps you are right, but so far I am still not sure it makes sense.
And we can always add it later if we have another (more convincing)
use-case.
> > To remind, detach_pid(pid, PIDTYPE_PID) does wake_up_all(&pid->wait_pidfd) and
> > takes pid->wait_pidfd->lock.
> >
> > So if pid_has_task(PIDTYPE_PID) succeeds, __unhash_process() -> detach_pid(TGID)
> > is not possible until we drop pid->wait_pidfd->lock.
> >
> > If detach_pid(PIDTYPE_PID) was already called and have passed wake_up_all(),
> > pid_has_task(PIDTYPE_PID) can't succeed.
>
> I know. I was trying to avoid having to take the lock and just make this
> lockless. But if you think we should use this lock here instead I'm
> willing to do this. I just find the sequence counter more elegant than
> the spin_lock_irq().
This is subjective, and quite possibly I am wrong. But yes, I'd prefer
to (ab)use pid->wait_pidfd->lock in pidfd_prepare() for now and not
penalize __unhash_process(). Simply because this is simpler.
If you really dislike taking wait_pidfd->lock, we can add mb() into
__unhash_process() or even smp_mb__after_spinlock() into __change_pid(),
but this will need a lengthy comment...
As for your patch... it doesn't apply on top of 3/4, but I guess it
is clear what does it do, and (unfortunately ;) it looks correct, so
I won't insist too much. See a couple of nits below.
> this imho and it would give pidfds a reliable way to detect relevant
> concurrent changes locklessly without penalizing other critical paths
> (e.g., under tasklist_lock) in the kernel.
Can't resist... Note that raw_seqcount_begin() in pidfd_prepare() will
take/drop tasklist_lock if it races with __unhash_process() on PREEMPT_RT.
Yes, this is unlikely case, but still...
Now. Unless I misread your patch, pidfd_prepare() does "err = 0" only
once before the main loop. And this is correct. But this means that
we do not need the do/while loop.
If read_seqcount_retry() returns true, we can safely return -ESRCH. So
we can do
seq = raw_seqcount_begin(&pid->pid_seq);
if (!PIDFD_THREAD && !pid_has_task(PIDTYPE_TGID))
err = -ENOENT;
if (!pid_has_task(PIDTYPE_PID))
err = -ESRCH;
if (read_seqcount_retry(pid->pid_seq, seq))
err = -ESRCH;
In fact we don't even need raw_seqcount_begin(), we could use
raw_seqcount_try_begin().
And why seqcount_rwlock_t? A plain seqcount_t can equally work.
And, if we use seqcount_rwlock_t,
lockdep_assert_held_write(&tasklist_lock);
...
raw_write_seqcount_begin(pid->pid_seq);
in __unhash_process() looks a bit strange. I'd suggest to use
write_seqcount_begin() which does seqprop_assert() and kill
lockdep_assert_held_write().
Oleg.
next prev parent reply other threads:[~2025-04-10 13:10 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-03 14:09 [PATCH RFC 0/4] pidfd: improve uapi when task isn't found Christian Brauner
2025-04-03 14:09 ` [PATCH RFC 1/4] selftests/pidfd: adapt to recent changes Christian Brauner
2025-04-03 14:09 ` [PATCH RFC 2/4] pidfd: remove unneeded NULL check from pidfd_prepare() Christian Brauner
2025-04-03 14:09 ` [PATCH RFC 3/4] pidfd: improve uapi when task isn't found Christian Brauner
2025-04-04 12:37 ` Oleg Nesterov
2025-04-04 13:38 ` Christian Brauner
2025-04-04 14:53 ` Oleg Nesterov
2025-04-09 15:38 ` Christian Brauner
2025-04-09 18:18 ` [RFC PATCH] pidfs: ensure consistent ENOENT/ESRCH reporting Christian Brauner
2025-04-09 18:40 ` Oleg Nesterov
2025-04-10 10:18 ` Oleg Nesterov
2025-04-10 10:43 ` Christian Brauner
2025-04-10 13:10 ` Oleg Nesterov [this message]
2025-04-10 20:05 ` Christian Brauner
2025-04-10 20:24 ` Christian Brauner
2025-04-11 11:08 ` Christian Brauner
2025-04-11 11:25 ` Oleg Nesterov
2025-04-11 11:41 ` Oleg Nesterov
2025-04-03 14:09 ` [PATCH RFC 4/4] selftest/pidfd: add test for thread-group leader pidfd open for thread Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250410131008.GB15280@redhat.com \
--to=oleg@redhat.com \
--cc=brauner@kernel.org \
--cc=daan.j.demeyer@gmail.com \
--cc=jlayton@kernel.org \
--cc=lennart@poettering.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=me@yhndnzj.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.