linux-fsdevel.vger.kernel.org archive mirror
From: Jiri Slaby <jirislaby@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Seth Forshee <sforshee@kernel.org>,
	Tycho Andersen <tycho@tycho.pizza>
Subject: Re: [PATCH 2/2] pidfd: add pidfdfs
Date: Fri, 17 May 2024 09:09:03 +0200
Message-ID: <a65b573a-8573-4a17-a918-b5cf358c17d6@kernel.org>
In-Reply-To: <a15b1050-4b52-4740-a122-a4d055c17f11@kernel.org>

On 16. 05. 24, 7:28, Jiri Slaby wrote:
> On 15. 05. 24, 18:39, Christian Brauner wrote:
>> On Wed, May 15, 2024 at 01:10:49PM +0200, Jiri Slaby wrote:
>>> On 13. 02. 24, 17:45, Christian Brauner wrote:
>>>> This moves pidfds from the anonymous inode infrastructure to a tiny
>>>> pseudo filesystem. This has been on my todo for quite a while as it
>>>> will unblock further work that we weren't able to do simply because of
>>>> the very justified limitations of anonymous inodes. Moving pidfds to a
>>>> tiny pseudo filesystem allows:
>>>>
>>>> * statx() on pidfds becomes useful for the first time.
>>>> * pidfds can be compared simply via statx() and then comparing inode
>>>>     numbers.
>>>> * pidfds have unique inode numbers for the system lifetime.
>>>> * struct pid is now stashed in inode->i_private instead of
>>>>     file->private_data. This means it is now possible to introduce
>>>>     concepts that operate on a process once all file descriptors have
>>>>     been closed. A concrete example is kill-on-last-close.
>>>> * file->private_data is freed up for per-file options for pidfds.
>>>> * Each struct pid will refer to a different inode but the same struct
>>>>     pid will refer to the same inode if it's opened multiple times. In
>>>>     contrast to now where each struct pid refers to the same inode. Even
>>>>     if we were to move to anon_inode_create_getfile() which creates new
>>>>     inodes we'd still be associating the same struct pid with multiple
>>>>     different inodes.
>>>> * Pidfds now go through the regular dentry_open() path which means that
>>>>     all security hooks are called unblocking proper LSM management for
>>>>     pidfds. In addition fsnotify hooks are called and allow for listening
>>>>     to open events on pidfds.
>>>>
>>>> The tiny pseudo filesystem is not visible anywhere in userspace exactly
>>>> like e.g., pipefs and sockfs. There's no lookup, there's no complex
>>>> inode operations, nothing. Dentries and inodes are always deleted when
>>>> the last pidfd is closed.
>>>
>>> This breaks lsof and util-linux.
>>>
>>> Without the commit, lsof shows:
>>> systemd      ... 59 [pidfd:899]
>>>
>>>
>>> With the commit:
>>> systemd      ... 1187 pidfd
>>>
>>>
>>> And that user-visible change breaks a lot of stuff, incl. lsof tests.
>>>
>>> For util-linux, its tests fail with:
>>>
>>>> [  125s] --- tests/expected/lsfd/column-name-pidfd    2024-05-06 07:20:54.655845940 +0000
>>>> [  125s] +++ tests/output/lsfd/column-name-pidfd    2024-05-15 01:04:15.406666666 +0000
>>>> [  125s] @@ -1,2 +1,2 @@
>>>> [  125s] -3 anon_inode:[pidfd] pid=1 comm= nspid=1
>>>> [  125s] +3 pidfd:[INODENUM] pidfd:[INODENUM]
>>>> [  125s]  pidfd:ASSOC,KNAME,NAME: 0
>>>> [  125s]          lsfd: NAME and KNAME column: [02] pidfd             ... FAILED (lsfd/column-name-pidfd)
>>>
>>> And:
>>>> [  125s] --- tests/expected/lsfd/column-type-pidfd    2024-05-06 07:20:54.655845940 +0000
>>>> [  125s] +++ tests/output/lsfd/column-type-pidfd    2024-05-15 01:04:15.573333333 +0000
>>>> [  125s] @@ -1,2 +1,2 @@
>>>> [  125s] -3 UNKN pidfd
>>>> [  125s] +3 REG REG
>>>> [  125s]  pidfd:ASSOC,STTYPE,TYPE: 0
>>>> [  125s]          lsfd: TYPE and STTYPE column: [02] pidfd            ... FAILED (lsfd/column-type-pidfd)
>>>
>>> Any ideas?
>>
>> util-linux upstream is already handling that correctly now but it seems
>> that lsof is not. To fix this in the kernel we'll need something like
>> this. If you could test it, that'd be great, as I'm currently traveling:
>>
>> diff --git a/fs/pidfs.c b/fs/pidfs.c
>> index a63d5d24aa02..3da848a8a95e 100644
>> --- a/fs/pidfs.c
>> +++ b/fs/pidfs.c
>> @@ -201,10 +201,8 @@ static const struct super_operations pidfs_sops = {
>>
>>   static char *pidfs_dname(struct dentry *dentry, char *buffer, int buflen)
>>   {
>> -       struct inode *inode = d_inode(dentry);
>> -       struct pid *pid = inode->i_private;
>> -
>> -       return dynamic_dname(buffer, buflen, "pidfd:[%llu]", pid->ino);
>> +       /* Fake the old name as some userspace seems to rely on this. */
>> +       return dynamic_dname(buffer, buflen, "anon_inode:[pidfd]");
> 
> No, the lsof test runs "lsof -p $pid -a -d $fd -F pfn" and expects:
> "p${pid} f${fd} n[pidfd:$pid]"
> 
> But it gets now:
> p959 f3 nanon_inode
> 
> I.e. "anon_inode" instead of "n[pidfd:959]".
> 
> Or did you intend the patch above to fix the column-name-pidfd test of
> lsfd (util-linux) instead?
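The commit message quoted above says that, with pidfs, two pidfds can be compared by their inode numbers. A minimal C sketch of that comparison follows, under stated assumptions: open_pidfd() and same_inode() are hypothetical helpers, fstat() is used instead of statx() for brevity, and the fallback syscall number 434 is an assumption that holds on common architectures such as x86_64 and arm64. On kernels that still back pidfds with the shared anonymous inode, every pidfd has the same inode, so a match there says nothing about which process it refers to.

```c
#define _GNU_SOURCE
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
/* Assumption: generic syscall number (x86_64, arm64, ...); not valid everywhere. */
#define SYS_pidfd_open 434
#endif

/* Hypothetical helper: open a pidfd for @pid through the raw syscall. */
static int open_pidfd(pid_t pid)
{
	return (int)syscall(SYS_pidfd_open, pid, 0);
}

/*
 * Hypothetical helper: 1 if both descriptors refer to the same inode
 * (same st_dev and st_ino), 0 if not, -1 if either fstat() fails.
 * With pidfs, the same struct pid maps to the same inode; with the old
 * anon_inode scheme, all pidfds share one inode.
 */
static int same_inode(int fd_a, int fd_b)
{
	struct stat a, b;

	if (fstat(fd_a, &a) < 0 || fstat(fd_b, &b) < 0)
		return -1;
	return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

Opening two pidfds for the same process with open_pidfd(getpid()) and passing both to same_inode() is expected to return 1; comparing a pidfd with an unrelated descriptor such as /dev/null returns 0 because the devices differ.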

This is now discussed in https://github.com/lsof-org/lsof/issues/317 too.
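For reference, the lsof test quoted above only inspects the name ("n") field of the -F output. The C sketch below performs the same check on a captured output string, assuming lsof's default -F layout of one field per line with the field letter first; lsof_name_is_pidfd() is a hypothetical helper, not code from the lsof test suite.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan lsof -F output (one field per line, the first
 * character naming the field) and return 1 if an 'n' (name) field has the
 * "[pidfd:PID]" form the lsof test expects, storing PID in *pid.
 * Returns 0 if no such name field is found.
 */
static int lsof_name_is_pidfd(const char *output, int *pid)
{
	const char *line = output;

	while (line && *line) {
		const char *end = strchr(line, '\n');
		size_t len = end ? (size_t)(end - line) : strlen(line);
		char buf[128];
		char term;

		if (len > 0 && line[0] == 'n' && len < sizeof(buf)) {
			/* Copy the field value without its leading 'n'. */
			memcpy(buf, line + 1, len - 1);
			buf[len - 1] = '\0';
			if (sscanf(buf, "[pidfd:%d%c", pid, &term) == 2 && term == ']')
				return 1;
		}
		line = end ? end + 1 : NULL;
	}
	return 0;
}
```

Fed "p959\nf3\nn[pidfd:959]\n", the helper returns 1 and stores 959; fed the output reported above with "nanon_inode" as the name field, it returns 0, which is the failing case.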

So yeah,
# ll /proc/984/fd
total 0
lrwx------ 1 xslaby users 64 May 17 09:00 0 -> /dev/pts/1
lrwx------ 1 xslaby users 64 May 17 09:00 2 -> /dev/pts/1
lrwx------ 1 xslaby users 64 May 17 09:00 3 -> anon_inode:[pidfd]

looks good with the patch. But lsof checks whether the file is regular
(IS_REG()), and if it is, the pidfd handling is skipped.
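To illustrate the ordering problem, here is a C sketch of a descriptor classifier that consults the /proc/self/fd link name before trusting the st_mode type bits, so a pidfd is recognised whether the kernel reports it as anon_inode:[pidfd] or pidfd:[INODENUM] and regardless of its file type. classify_fd() and mode_type_name() are hypothetical helpers, not code from lsof or lsfd. A mode with no type bits maps to "UNKN", mirroring the STTYPE value lsfd reported for the old anonymous inode.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map the file-type bits of @mode to a short name. */
static const char *mode_type_name(mode_t mode)
{
	switch (mode & S_IFMT) {
	case S_IFREG:  return "REG";
	case S_IFDIR:  return "DIR";
	case S_IFCHR:  return "CHR";
	case S_IFBLK:  return "BLK";
	case S_IFIFO:  return "FIFO";
	case S_IFSOCK: return "SOCK";
	case S_IFLNK:  return "LINK";
	default:       return "UNKN"; /* no (or unknown) type bits */
	}
}

/*
 * Hypothetical helper: classify @fd. The /proc link name is checked first,
 * so a pidfd is not misfiled as a regular file; otherwise fall back to the
 * st_mode type. Returns NULL if @fd cannot be examined at all.
 */
static const char *classify_fd(int fd)
{
	char path[64], target[256];
	struct stat st;
	ssize_t n;

	snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
	n = readlink(path, target, sizeof(target) - 1);
	if (n >= 0) {
		target[n] = '\0';
		if (strcmp(target, "anon_inode:[pidfd]") == 0 ||
		    strncmp(target, "pidfd:[", 7) == 0)
			return "pidfd";
	}
	if (fstat(fd, &st) < 0)
		return NULL;
	return mode_type_name(st.st_mode);
}
```

The design point is only the order of the checks: testing for a regular file first, as described above, means a pidfd whose inode reports S_IFREG never reaches the pidfd branch.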

> thanks,
-- 
js
suse labs



Thread overview: 61+ messages
2024-02-13 16:45 [PATCH 0/2] Move pidfd to tiny pseudo fs Christian Brauner
2024-02-13 16:45 ` [PATCH 1/2] pidfd: move struct pidfd_fops Christian Brauner
2024-02-13 16:45 ` [PATCH 2/2] pidfd: add pidfdfs Christian Brauner
2024-02-13 17:17   ` Linus Torvalds
2024-02-14 14:40     ` Christian Brauner
2024-02-14 18:27       ` Christian Brauner
2024-02-14 18:37         ` Linus Torvalds
2024-02-15 16:11           ` Christian Brauner
2024-02-16 11:50             ` Christian Brauner
2024-02-16 16:41               ` Christian Brauner
2024-02-17 13:59               ` Oleg Nesterov
2024-02-17 17:30                 ` Linus Torvalds
2024-02-17 17:38                   ` Linus Torvalds
2024-02-18 11:15                   ` Christian Brauner
2024-02-18 11:33                     ` Christian Brauner
2024-02-18 17:54                       ` Christian Brauner
2024-02-18 18:08                         ` Linus Torvalds
2024-02-18 18:57                           ` Linus Torvalds
2024-02-19 18:05                             ` Christian Brauner
2024-02-19 18:34                               ` Linus Torvalds
2024-02-19 21:18                                 ` Christian Brauner
2024-02-19 23:24                                   ` Linus Torvalds
2024-02-18 14:27                     ` Oleg Nesterov
2024-02-18  9:30                 ` Christian Brauner
2024-02-22 19:03   ` Nathan Chancellor
2024-02-23 10:18     ` Heiko Carstens
2024-02-23 11:56       ` Christian Brauner
2024-02-23 11:55     ` Christian Brauner
2024-02-23 12:57       ` Heiko Carstens
2024-02-23 13:27         ` Christian Brauner
2024-02-23 13:35           ` Heiko Carstens
2024-02-23 13:41       ` Christian Brauner
2024-02-23 21:26       ` Christian Brauner
2024-02-23 21:58         ` Linus Torvalds
2024-02-24  5:52           ` Christian Brauner
2024-02-24  6:05             ` Christian Brauner
2024-02-24 18:48             ` Linus Torvalds
2024-02-24 19:15               ` Christian Brauner
2024-02-24 19:19                 ` Christian Brauner
2024-02-24 19:21                 ` Linus Torvalds
2024-02-27 19:26                 ` Nathan Chancellor
2024-02-27 22:13                   ` Christian Brauner
2024-03-12 10:35   ` Geert Uytterhoeven
2024-03-12 14:09     ` Christian Brauner
2024-05-15 11:10   ` Jiri Slaby
2024-05-15 16:39     ` Christian Brauner
2024-05-16  5:28       ` Jiri Slaby
2024-05-17  7:09         ` Jiri Slaby [this message]
2024-05-17  7:54           ` Jiri Slaby
2024-05-17 20:07             ` Linus Torvalds
2024-05-20  8:23               ` Jiri Slaby
2024-05-20 19:01                 ` Linus Torvalds
2024-05-20 19:15                   ` Linus Torvalds
2024-05-21  6:07                     ` Jiri Slaby
2024-05-21  6:13                       ` Jiri Slaby
2024-05-21 12:33                         ` Christian Brauner
2024-05-21 12:40                           ` Christian Brauner
2024-05-21 15:10                             ` Linus Torvalds
2024-05-25 11:57                               ` Christian Brauner
2024-05-21 12:16               ` Christian Brauner
2024-02-13 17:02 ` [PATCH 0/2] Move pidfd to tiny pseudo fs Christian Brauner
