All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jiri Slaby <jirislaby@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Seth Forshee <sforshee@kernel.org>,
	Tycho Andersen <tycho@tycho.pizza>
Subject: Re: [PATCH 2/2] pidfd: add pidfdfs
Date: Fri, 17 May 2024 09:54:19 +0200	[thread overview]
Message-ID: <84bc442d-c4dd-418e-8020-e1ff987cad13@kernel.org> (raw)
In-Reply-To: <a65b573a-8573-4a17-a918-b5cf358c17d6@kernel.org>

On 17. 05. 24, 9:09, Jiri Slaby wrote:
> On 16. 05. 24, 7:28, Jiri Slaby wrote:
>> On 15. 05. 24, 18:39, Christian Brauner wrote:
>>> On Wed, May 15, 2024 at 01:10:49PM +0200, Jiri Slaby wrote:
>>>> On 13. 02. 24, 17:45, Christian Brauner wrote:
>>>>> This moves pidfds from the anonymous inode infrastructure to a tiny
>>>>> pseudo filesystem. This has been on my todo for quite a while as it 
>>>>> will
>>>>> unblock further work that we weren't able to do simply because of the
>>>>> very justified limitations of anonymous inodes. Moving pidfds to a 
>>>>> tiny
>>>>> pseudo filesystem allows:
>>>>>
>>>>> * statx() on pidfds becomes useful for the first time.
>>>>> * pidfds can be compared simply via statx() and then comparing inode
>>>>>     numbers.
>>>>> * pidfds have unique inode numbers for the system lifetime.
>>>>> * struct pid is now stashed in inode->i_private instead of
>>>>>     file->private_data. This means it is now possible to introduce
>>>>>     concepts that operate on a process once all file descriptors 
>>>>> have been
>>>>>     closed. A concrete example is kill-on-last-close.
>>>>> * file->private_data is freed up for per-file options for pidfds.
>>>>> * Each struct pid will refer to a different inode but the same struct
>>>>>     pid will refer to the same inode if it's opened multiple times. In
>>>>>     contrast to now where each struct pid refers to the same inode. 
>>>>> Even
>>>>>     if we were to move to anon_inode_create_getfile() which creates 
>>>>> new
>>>>>     inodes we'd still be associating the same struct pid with multiple
>>>>>     different inodes.
>>>>> * Pidfds now go through the regular dentry_open() path which means 
>>>>> that
>>>>>     all security hooks are called unblocking proper LSM management for
>>>>>     pidfds. In addition fsnotify hooks are called and allow for 
>>>>> listening
>>>>>     to open events on pidfds.
>>>>>
>>>>> The tiny pseudo filesystem is not visible anywhere in userspace 
>>>>> exactly
>>>>> like e.g., pipefs and sockfs. There's no lookup, there's no complex
>>>>> inode operations, nothing. Dentries and inodes are always deleted when
>>>>> the last pidfd is closed.
>>>>
>>>> This breaks lsof and util-linux.
>>>>
>>>> Without the commit, lsof shows:
>>>> systemd      ... 59 [pidfd:899]
>>>>
>>>>
>>>> With the commit:
>>>> systemd      ... 1187 pidfd
>>>>
>>>>
>>>> And that user-visible change breaks a lot of stuff, incl. lsof tests.
>>>>
>>>> For util-linux, its test fail with:
>>>>
>>>>> [  125s] --- tests/expected/lsfd/column-name-pidfd    2024-05-06 
>>>>> 07:20:54.655845940 +0000
>>>>> [  125s] +++ tests/output/lsfd/column-name-pidfd    2024-05-15 
>>>>> 01:04:15.406666666 +0000
>>>>> [  125s] @@ -1,2 +1,2 @@
>>>>> [  125s] -3 anon_inode:[pidfd] pid=1 comm= nspid=1
>>>>> [  125s] +3 pidfd:[INODENUM] pidfd:[INODENUM]
>>>>> [  125s]  pidfd:ASSOC,KNAME,NAME: 0
>>>>> [  125s]          lsfd: NAME and KNAME column: [02] 
>>>>> pidfd             ... FAILED (lsfd/column-name-pidfd)
>>>>
>>>> And:
>>>>> [  125s] --- tests/expected/lsfd/column-type-pidfd    2024-05-06 
>>>>> 07:20:54.655845940 +0000
>>>>> [  125s] +++ tests/output/lsfd/column-type-pidfd    2024-05-15 
>>>>> 01:04:15.573333333 +0000
>>>>> [  125s] @@ -1,2 +1,2 @@
>>>>> [  125s] -3 UNKN pidfd
>>>>> [  125s] +3 REG REG
>>>>> [  125s]  pidfd:ASSOC,STTYPE,TYPE: 0
>>>>> [  125s]          lsfd: TYPE and STTYPE column: [02] 
>>>>> pidfd            ... FAILED (lsfd/column-type-pidfd)
>>>>
>>>> Any ideas?
>>>
>>> util-linux upstream is already handling that correctly now but it 
>>> seems that
>>> lsof is not. To fix this in the kernel we'll need something like. If 
>>> you could
>>> test this it'd be great as I'm currently traveling:
>>>
>>> diff --git a/fs/pidfs.c b/fs/pidfs.c
>>> index a63d5d24aa02..3da848a8a95e 100644
>>> --- a/fs/pidfs.c
>>> +++ b/fs/pidfs.c
>>> @@ -201,10 +201,8 @@ static const struct super_operations pidfs_sops = {
>>>
>>>   static char *pidfs_dname(struct dentry *dentry, char *buffer, int 
>>> buflen)
>>>   {
>>> -       struct inode *inode = d_inode(dentry);
>>> -       struct pid *pid = inode->i_private;
>>> -
>>> -       return dynamic_dname(buffer, buflen, "pidfd:[%llu]", pid->ino);
>>> +       /* Fake the old name as some userspace seems to rely on this. */
>>> +       return dynamic_dname(buffer, buflen, "anon_inode:[pidfd]");
>>
>> No, the lsof test runs "lsof -p $pid -a -d $fd -F pfn" and expects:
>> "p${pid} f${fd} n[pidfd:$pid]"
>>
>> But it gets now:
>> p959 f3 nanon_inode
>>
>> I.e. "anon_inode" instead of "n[pidfd:959]".
>>
>> Did you intend to fix by the patch the lsfd's (util-linux) 
>> column-name-pidfd test by this instead (the above)?
> 
> This is now discussed in https://github.com/lsof-org/lsof/issues/317 too.
> 
> So yeah,
> # ll /proc/984/fd
> total 0
> lrwx------ 1 xslaby users 64 May 17 09:00 0 -> /dev/pts/1
> lrwx------ 1 xslaby users 64 May 17 09:00 2 -> /dev/pts/1
> lrwx------ 1 xslaby users 64 May 17 09:00 3 -> anon_inode:[pidfd]
> 
> looks good with the patch. But lsof checks if this IS_REG(). And if it 
> is, pidfd handling is not done.

So this fixes it, of course:
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -2026,7 +2026,6 @@ static struct dentry *prepare_anon_dentry(struct 
dentry **stashed,
         }

         /* Notice when this is changed. */
-       WARN_ON_ONCE(!S_ISREG(inode->i_mode));
         WARN_ON_ONCE(!IS_IMMUTABLE(inode));

         dentry = d_alloc_anon(sb);
diff --git a/fs/pidfs.c b/fs/pidfs.c
index a63d5d24..f51a794f 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -201,10 +201,7 @@ static const struct super_operations pidfs_sops = {

  static char *pidfs_dname(struct dentry *dentry, char *buffer, int buflen)
  {
-       struct inode *inode = d_inode(dentry);
-       struct pid *pid = inode->i_private;
-
-       return dynamic_dname(buffer, buflen, "pidfd:[%llu]", pid->ino);
+       return dynamic_dname(buffer, buflen, "anon_inode:[pidfd]");
  }

  static const struct dentry_operations pidfs_dentry_operations = {
@@ -217,6 +214,7 @@ static int pidfs_init_inode(struct inode *inode, 
void *data)
  {
         inode->i_private = data;
         inode->i_flags |= S_PRIVATE;
+       inode->i_mode &= ~S_IFREG;
         inode->i_mode |= S_IRWXU;
         inode->i_op = &pidfs_inode_operations;
         inode->i_fop = &pidfs_file_operations;


> 
>> thanks,

-- 
js
suse labs


  reply	other threads:[~2024-05-17  7:54 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-13 16:45 [PATCH 0/2] Move pidfd to tiny pseudo fs Christian Brauner
2024-02-13 16:45 ` [PATCH 1/2] pidfd: move struct pidfd_fops Christian Brauner
2024-02-13 16:45 ` [PATCH 2/2] pidfd: add pidfdfs Christian Brauner
2024-02-13 17:17   ` Linus Torvalds
2024-02-14 14:40     ` Christian Brauner
2024-02-14 18:27       ` Christian Brauner
2024-02-14 18:37         ` Linus Torvalds
2024-02-15 16:11           ` Christian Brauner
2024-02-16 11:50             ` Christian Brauner
2024-02-16 16:41               ` Christian Brauner
2024-02-17 13:59               ` Oleg Nesterov
2024-02-17 17:30                 ` Linus Torvalds
2024-02-17 17:38                   ` Linus Torvalds
2024-02-18 11:15                   ` Christian Brauner
2024-02-18 11:33                     ` Christian Brauner
2024-02-18 17:54                       ` Christian Brauner
2024-02-18 18:08                         ` Linus Torvalds
2024-02-18 18:57                           ` Linus Torvalds
2024-02-19 18:05                             ` Christian Brauner
2024-02-19 18:34                               ` Linus Torvalds
2024-02-19 21:18                                 ` Christian Brauner
2024-02-19 23:24                                   ` Linus Torvalds
2024-02-18 14:27                     ` Oleg Nesterov
2024-02-18  9:30                 ` Christian Brauner
2024-02-22 19:03   ` Nathan Chancellor
2024-02-23 10:18     ` Heiko Carstens
2024-02-23 11:56       ` Christian Brauner
2024-02-23 11:55     ` Christian Brauner
2024-02-23 12:57       ` Heiko Carstens
2024-02-23 13:27         ` Christian Brauner
2024-02-23 13:35           ` Heiko Carstens
2024-02-23 13:41       ` Christian Brauner
2024-02-23 21:26       ` Christian Brauner
2024-02-23 21:58         ` Linus Torvalds
2024-02-24  5:52           ` Christian Brauner
2024-02-24  6:05             ` Christian Brauner
2024-02-24 18:48             ` Linus Torvalds
2024-02-24 19:15               ` Christian Brauner
2024-02-24 19:19                 ` Christian Brauner
2024-02-24 19:21                 ` Linus Torvalds
2024-02-27 19:26                 ` Nathan Chancellor
2024-02-27 22:13                   ` Christian Brauner
2024-03-12 10:35   ` Geert Uytterhoeven
2024-03-12 14:09     ` Christian Brauner
2024-05-15 11:10   ` Jiri Slaby
2024-05-15 16:39     ` Christian Brauner
2024-05-16  5:28       ` Jiri Slaby
2024-05-17  7:09         ` Jiri Slaby
2024-05-17  7:54           ` Jiri Slaby [this message]
2024-05-17 20:07             ` Linus Torvalds
2024-05-20  8:23               ` Jiri Slaby
2024-05-20 19:01                 ` Linus Torvalds
2024-05-20 19:15                   ` Linus Torvalds
2024-05-21  6:07                     ` Jiri Slaby
2024-05-21  6:13                       ` Jiri Slaby
2024-05-21 12:33                         ` Christian Brauner
2024-05-21 12:40                           ` Christian Brauner
2024-05-21 15:10                             ` Linus Torvalds
2024-05-25 11:57                               ` Christian Brauner
2024-05-21 12:16               ` Christian Brauner
2024-02-13 17:02 ` [PATCH 0/2] Move pidfd to tiny pseudo fs Christian Brauner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84bc442d-c4dd-418e-8020-e1ff987cad13@kernel.org \
    --to=jirislaby@kernel.org \
    --cc=brauner@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=sforshee@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tycho@tycho.pizza \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.