From: Matthew Bobrowski <repnop@google.com>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>,
Jan Kara <jack@suse.cz>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH 0/5] Add pidfd support to the fanotify API
Date: Wed, 2 Jun 2021 20:43:59 +1000 [thread overview]
Message-ID: <YLdg7wWQ/GTbe1eh@google.com> (raw)
In-Reply-To: <CAOQ4uxieRQ3s5rWA55ZBDr4xm6i9vXyWx-iErMgYzGCE5nYKcA@mail.gmail.com>
On Wed, Jun 02, 2021 at 10:18:36AM +0300, Amir Goldstein wrote:
> On Wed, Jun 2, 2021 at 9:30 AM Matthew Bobrowski <repnop@google.com> wrote:
> >
> > On Tue, Jun 01, 2021 at 01:46:28PM +0200, Christian Brauner wrote:
> > > On Tue, Jun 01, 2021 at 09:03:26PM +1000, Matthew Bobrowski wrote:
> > > > On Wed, May 26, 2021 at 08:05:29PM +0200, Christian Brauner wrote:
> > > > > On Wed, May 26, 2021 at 09:20:55AM +1000, Matthew Bobrowski wrote:
> > > > > > On Tue, May 25, 2021 at 12:31:33PM +0200, Christian Brauner wrote:
> > > > > > > On Mon, May 24, 2021 at 10:47:46AM +0200, Jan Kara wrote:
> > > > > > > > On Sat 22-05-21 09:32:36, Matthew Bobrowski wrote:
> > > > > > > > > On Fri, May 21, 2021 at 12:40:56PM +0200, Jan Kara wrote:
> > > > > > > > > > On Fri 21-05-21 20:15:35, Matthew Bobrowski wrote:
> > > > > > > > > > > On Thu, May 20, 2021 at 03:55:27PM +0200, Jan Kara wrote:
> > > > > > > > > > > There's one thing that I'd like to mention, and it's something in
> > > > > > > > > > > regards to the overall approach we've taken that I'm not particularly
> > > > > > > > > > > happy about and I'd like to hear all your thoughts. Basically, with
> > > > > > > > > > > this approach the pidfd creation is done only once an event has been
> > > > > > > > > > > queued and the notification worker wakes up and picks up the event
> > > > > > > > > > > from the queue processes it. There's a subtle latency introduced when
> > > > > > > > > > > taking such an approach which at times leads to pidfd creation
> > > > > > > > > > > failures. As in, by the time pidfd_create() is called the struct pid
> > > > > > > > > > > has already been reaped, which then results in FAN_NOPIDFD being
> > > > > > > > > > > returned in the pidfd info record.
> > > > > > > > > > >
> > > > > > > > > > > Having said that, I'm wondering what the thoughts are on doing pidfd
> > > > > > > > > > > creation earlier on i.e. in the event allocation stages? This way, the
> > > > > > > > > > > struct pid is pinned earlier on and rather than FAN_NOPIDFD being
> > > > > > > > > > > returned in the pidfd info record because the struct pid has been
> > > > > > > > > > > already reaped, userspace application will atleast receive a valid
> > > > > > > > > > > pidfd which can be used to check whether the process still exists or
> > > > > > > > > > > not. I think it'll just set the expectation better from an API
> > > > > > > > > > > perspective.
> > > > > > > > > >
> > > > > > > > > > Yes, there's this race. OTOH if FAN_NOPIDFD is returned, the listener can
> > > > > > > > > > be sure the original process doesn't exist anymore. So is it useful to
> > > > > > > > > > still receive pidfd of the dead process?
> > > > > > > > >
> > > > > > > > > Well, you're absolutely right. However, FWIW I was approaching this
> > > > > > > > > from two different angles:
> > > > > > > > >
> > > > > > > > > 1) I wanted to keep the pattern in which the listener checks for the
> > > > > > > > > existence/recycling of the process consistent. As in, the listener
> > > > > > > > > would receive the pidfd, then send the pidfd a signal via
> > > > > > > > > pidfd_send_signal() and check for -ESRCH which clearly indicates
> > > > > > > > > that the target process has terminated.
> > > > > > > > >
> > > > > > > > > 2) I didn't want to mask failed pidfd creation because of early
> > > > > > > > > process termination and other possible failures behind a single
> > > > > > > > > FAN_NOPIDFD. IOW, if we take the -ESRCH approach above, the
> > > > > > > > > listener can take clear corrective branches as what's to be done
> > > > > > > > > next if a race is to have been detected, whereas simply returning
> > > > > > > > > FAN_NOPIDFD at this stage can mean multiple things.
> > > > > > > > >
> > > > > > > > > Now that I've written the above and keeping in mind that we'd like to
> > > > > > > > > refrain from doing anything in the event allocation stages, perhaps we
> > > > > > > > > could introduce a different error code for detecting early process
> > > > > > > > > termination while attempting to construct the info record. WDYT?
> > > > > > > >
> > > > > > > > Sure, I wouldn't like to overengineer it but having one special fd value for
> > > > > > > > "process doesn't exist anymore" and another for general "creating pidfd
> > > > > > > > failed" looks OK to me.
> > > > > > >
> > > > > > > FAN_EPIDFD -> "creation failed"
> > > > > > > FAN_NOPIDFD -> "no such process"
> > > > > >
> > > > > > Yes, I was thinking something along the lines of this...
> > > > > >
> > > > > > With the approach that I've proposed in this series, the pidfd
> > > > > > creation failure trips up in pidfd_create() at the following
> > > > > > condition:
> > > > > >
> > > > > > if (!pid || !pid_has_task(pid, PIDTYPE_TGID))
> > > > > > return -EINVAL;
> > > > > >
> > > > > > Specifically, the following check:
> > > > > > !pid_has_task(pid, PIDTYPE_TGID)
> > > > > >
> > > > > > In order to properly report either FAN_NOPIDFD/FAN_EPIDFD to
> > > > > > userspace, AFAIK I'll have to do one of either two things to better
> > > > > > distinguish between why the pidfd creation had failed:
> > > > >
> > > > > Ok, I see. You already do have a reference to a struct pid and in that
> > > > > case we should just always return a pidfd to the caller. For
> > > > > pidfd_open() for example we only report an error when
> > > > > find_get_pid(<pidnr>) doesn't find a struct pid to refer to. But in your
> > > > > case here you already have a struct pid so I think we should just keep
> > > > > this simple and always return a pidfd to the caller and in fact do
> > > > > burden them with figuring out that the process is gone via
> > > > > pidfd_send_signal() instead of complicating our lives here.
> > > >
> > > > Ah, actually Christian... Before, I go ahead and send through the updated
> > > > series. Given what you've mentioned above I'm working with the assumption
> > > > that you're OK with dropping the pid_has_task() check from pidfd_create()
> > > > [0]. Is that right?
> > > >
> > > > If so, I don't know how I feel about this given that pidfd_create() is now
> > > > to be exposed to the rest of the kernel and the pidfd API, as it stands,
> > > > doesn't support the creation of pidfds for non-thread-group leaders. I
> > > > suppose what I don't want is other kernel subsystems, if any, thinking it's
> > > > OK to call pidfd_create() with an arbitrary struct pid and setting the
> > > > expectation that a fully functional pidfd will be returned.
> > > >
> > > > The way I see it, I think we've got two options here:
> > > >
> > > > 1) Leave the pid_has_task() check within pidfd_create() and perform another
> > > > explicit pid_has_task() check from the fanotify code before calling
> > > > pidfd_create(). If it returns false, we set something like FAN_NOPIDFD
> > > > indicating to userspace that there's no such process when the event was
> > > > created.
> > > >
> > > > 2) Scrap using pidfd_create() all together and implement a fanotify
> > > > specific pidfd creation wrapper which would allow for more
> > > > control. Something along the lines of what you've done in kernel/fork.c
> > > > [1]. Not the biggest fan of this idea just yet given the possibility of
> > > > it leading to an API drift over time.
> > > >
> > > > WDYT?
> > >
> > > Hm, why would you have to drop the pid_has_task() check again?
> >
> > Because of the race that I brielfy decscribed here [0]. The race exists
>
> Sorry for being thich.
You're not being thick at all Amir, and perhaps this is my fault for not
articulating the problem at hand correctly.
> I still don't understand what's racy about this. Won't the event reader
> get a valid pidfd?
I guess this depends, right?
As the logic/implementation currently stands in this specific patch series,
pidfd_create() will _NOT_ return a valid pidfd unless the struct pid still
holds reference to a task of type PIDTYPE_TGID. This is thanks to the extra
pid_hash_task() check that I thought was appropriate to incorporate into
pidfd_create() seeing as though:
- With the pidfd_create() declaration now being added to linux/pid.h, we
effectively are giving the implicit OK for it to be called from other
kernel subsystems, and hence why the caller should be subject to the
same restrictions/verifications imposed by the API specification
i.e. "Currently, the process identified by @pid must be a thread-group
leader...". Not enforcing the pid_has_task() check in pidfd_create()
effectively says that the pidfd creation can be done for any struct pid
types i.e. PIDTYPE_PID, PIDTYPE_PGID, etc. This leads to assumptions
being made by the callers, which effectively then could lead to
undefined/unexpected behavior.
There definitely can be cases whereby the underlying task(s) associated
with a struct pid have been freed as a result of process being killed
early. As in, the process is killed before the pid_has_task() check is
performed from within pidfd_create() when called from fanotify. This is
precisely the race that I'm referring to here, and in such cases as the
code currently stands, the event listener will _NOT_ receive a valid pidfd.
> Can't the event reader verify that the pidfd points to a dead process?
This was the idea, as in, the burden of checking whether a process has been
killed before the event listener receives the event should be on the event
listener. However, we're trying to come up with a good way to effectively
communicate that the above race I've attempted to articulate has actually
occurred. As in, do we:
a) Drop the pid_has_task() check in pidfd_create() so that a pidfd can be
returned for all passed struct pids? That way, even if the above race is
experienced the caller will still receive a pidfd and the event listener
can do whatever it needs to with it.
b) Before calling into pidfd_create(), perform an explicit pid_has_task()
check for PIDTYPE_TGID and if that returns false, then set FAN_NOPIDFD
and save ourselves from calling into pidfd_create() all together. The
event listener in this case doesn't have to perform the signal check to
determine whether the process has already been killed.
c) Scrap calling into pidfd_create() all together and write a simple
fanotify wrapper that contains the pidfd creation logic we need.
> I don't mind returning FAN_NOPIDFD for convenience, but user
> will have to check the pidfd that it got anyway, because process
> can die at any time between reading the event and acting on the
> pidfd.
Well sort of, as it depends on the approach that we decide to go ahead with
to report such early process termination cases. If we return FAN_NOPIDFD,
which signifies that the process died before a pidfd could be created, then
there's no point for the listener to step into checking the pidfd because
the pidfd already == FAN_NOPIDFD. If we simply return a pidfd regardless of
the early termination of the process, then sure the event listener will
need to check each pidfd that is supplied.
> > because we perform the pidfd creation during the notification queue
> > processing and not in the event allocation stages (for reasons that Jan has
> > already covered here [1]). So, tl;dr there is the case where the fanotify
> > calls pidfd_create() and the check for pid_has_task() fails because the
> > struct pid that we're hanging onto within an event no longer contains a
> > task of type PIDTYPE_TGID...
> >
> > [0] https://www.spinics.net/lists/linux-api/msg48630.html
> > [1] https://www.spinics.net/lists/linux-api/msg48632.html
Maybe I'm going down a rabbit hole and overthinking this whole thing,
IDK... :(
/M
next prev parent reply other threads:[~2021-06-02 10:44 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-20 2:09 [PATCH 0/5] Add pidfd support to the fanotify API Matthew Bobrowski
2021-05-20 2:10 ` [PATCH 1/5] kernel/pid.c: remove static qualifier from pidfd_create() Matthew Bobrowski
2021-05-20 2:10 ` [PATCH 2/5] kernel/pid.c: implement checks on parameters passed to pidfd_create() Matthew Bobrowski
2021-05-20 2:11 ` [PATCH 3/5] fanotify_user.c: minor cosmetic adjustments to fid labels Matthew Bobrowski
2021-05-20 2:11 ` [PATCH 4/5] fanotify/fanotify_user.c: introduce a generic info record copying function Matthew Bobrowski
2021-05-20 13:59 ` Amir Goldstein
2021-05-21 9:26 ` Matthew Bobrowski
2021-05-20 2:11 ` [PATCH 5/5] fanotify: Add pidfd info record support to the fanotify API Matthew Bobrowski
2021-05-20 8:17 ` Christian Brauner
2021-05-20 13:43 ` Amir Goldstein
2021-05-21 9:21 ` Matthew Bobrowski
2021-05-21 9:41 ` Amir Goldstein
2021-05-21 10:24 ` Jan Kara
2021-05-21 11:10 ` Amir Goldstein
2021-05-21 13:19 ` Jan Kara
2021-05-21 13:52 ` Amir Goldstein
2021-05-21 15:14 ` Jan Kara
2021-05-22 0:41 ` Matthew Bobrowski
2021-05-22 9:01 ` Amir Goldstein
2021-05-20 13:55 ` [PATCH 0/5] Add pidfd " Jan Kara
2021-05-21 10:15 ` Matthew Bobrowski
2021-05-21 10:40 ` Jan Kara
2021-05-21 23:32 ` Matthew Bobrowski
2021-05-24 8:47 ` Jan Kara
2021-05-25 10:31 ` Christian Brauner
2021-05-25 23:20 ` Matthew Bobrowski
2021-05-26 18:05 ` Christian Brauner
2021-05-26 22:54 ` Matthew Bobrowski
2021-06-01 11:03 ` Matthew Bobrowski
2021-06-01 11:46 ` Christian Brauner
2021-06-02 6:30 ` Matthew Bobrowski
2021-06-02 7:18 ` Amir Goldstein
2021-06-02 8:48 ` Christian Brauner
2021-06-02 10:56 ` Matthew Bobrowski
2021-06-02 12:46 ` Christian Brauner
2021-06-02 10:43 ` Matthew Bobrowski [this message]
2021-06-02 12:18 ` Amir Goldstein
2021-06-03 1:24 ` Matthew Bobrowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YLdg7wWQ/GTbe1eh@google.com \
--to=repnop@google.com \
--cc=amir73il@gmail.com \
--cc=christian.brauner@ubuntu.com \
--cc=jack@suse.cz \
--cc=linux-api@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.