From: Jiri Slaby <jirislaby@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: linux-fsdevel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Seth Forshee <sforshee@kernel.org>,
Tycho Andersen <tycho@tycho.pizza>
Subject: Re: [PATCH 2/2] pidfd: add pidfdfs
Date: Fri, 17 May 2024 09:54:19 +0200 [thread overview]
Message-ID: <84bc442d-c4dd-418e-8020-e1ff987cad13@kernel.org> (raw)
In-Reply-To: <a65b573a-8573-4a17-a918-b5cf358c17d6@kernel.org>
On 17. 05. 24, 9:09, Jiri Slaby wrote:
> On 16. 05. 24, 7:28, Jiri Slaby wrote:
>> On 15. 05. 24, 18:39, Christian Brauner wrote:
>>> On Wed, May 15, 2024 at 01:10:49PM +0200, Jiri Slaby wrote:
>>>> On 13. 02. 24, 17:45, Christian Brauner wrote:
>>>>> This moves pidfds from the anonymous inode infrastructure to a tiny
>>>>> pseudo filesystem. This has been on my todo for quite a while as it
>>>>> will
>>>>> unblock further work that we weren't able to do simply because of the
>>>>> very justified limitations of anonymous inodes. Moving pidfds to a
>>>>> tiny
>>>>> pseudo filesystem allows:
>>>>>
>>>>> * statx() on pidfds becomes useful for the first time.
>>>>> * pidfds can be compared simply via statx() and then comparing inode
>>>>> numbers.
>>>>> * pidfds have unique inode numbers for the system lifetime.
>>>>> * struct pid is now stashed in inode->i_private instead of
>>>>> file->private_data. This means it is now possible to introduce
>>>>> concepts that operate on a process once all file descriptors
>>>>> have been
>>>>> closed. A concrete example is kill-on-last-close.
>>>>> * file->private_data is freed up for per-file options for pidfds.
>>>>> * Each struct pid will refer to a different inode but the same struct
>>>>> pid will refer to the same inode if it's opened multiple times. In
>>>>> contrast to now where each struct pid refers to the same inode.
>>>>> Even
>>>>> if we were to move to anon_inode_create_getfile() which creates
>>>>> new
>>>>> inodes we'd still be associating the same struct pid with multiple
>>>>> different inodes.
>>>>> * Pidfds now go through the regular dentry_open() path which means
>>>>> that
>>>>> all security hooks are called unblocking proper LSM management for
>>>>> pidfds. In addition fsnotify hooks are called and allow for
>>>>> listening
>>>>> to open events on pidfds.
>>>>>
>>>>> The tiny pseudo filesystem is not visible anywhere in userspace
>>>>> exactly
>>>>> like e.g., pipefs and sockfs. There's no lookup, there's no complex
>>>>> inode operations, nothing. Dentries and inodes are always deleted when
>>>>> the last pidfd is closed.
>>>>
>>>> This breaks lsof and util-linux.
>>>>
>>>> Without the commit, lsof shows:
>>>> systemd ... 59 [pidfd:899]
>>>>
>>>>
>>>> With the commit:
>>>> systemd ... 1187 pidfd
>>>>
>>>>
>>>> And that user-visible change breaks a lot of stuff, incl. lsof tests.
>>>>
>>>> For util-linux, its test fail with:
>>>>
>>>>> [ 125s] --- tests/expected/lsfd/column-name-pidfd 2024-05-06
>>>>> 07:20:54.655845940 +0000
>>>>> [ 125s] +++ tests/output/lsfd/column-name-pidfd 2024-05-15
>>>>> 01:04:15.406666666 +0000
>>>>> [ 125s] @@ -1,2 +1,2 @@
>>>>> [ 125s] -3 anon_inode:[pidfd] pid=1 comm= nspid=1
>>>>> [ 125s] +3 pidfd:[INODENUM] pidfd:[INODENUM]
>>>>> [ 125s] pidfd:ASSOC,KNAME,NAME: 0
>>>>> [ 125s] lsfd: NAME and KNAME column: [02]
>>>>> pidfd ... FAILED (lsfd/column-name-pidfd)
>>>>
>>>> And:
>>>>> [ 125s] --- tests/expected/lsfd/column-type-pidfd 2024-05-06
>>>>> 07:20:54.655845940 +0000
>>>>> [ 125s] +++ tests/output/lsfd/column-type-pidfd 2024-05-15
>>>>> 01:04:15.573333333 +0000
>>>>> [ 125s] @@ -1,2 +1,2 @@
>>>>> [ 125s] -3 UNKN pidfd
>>>>> [ 125s] +3 REG REG
>>>>> [ 125s] pidfd:ASSOC,STTYPE,TYPE: 0
>>>>> [ 125s] lsfd: TYPE and STTYPE column: [02]
>>>>> pidfd ... FAILED (lsfd/column-type-pidfd)
>>>>
>>>> Any ideas?
>>>
>>> util-linux upstream is already handling that correctly now but it
>>> seems that
>>> lsof is not. To fix this in the kernel we'll need something like. If
>>> you could
>>> test this it'd be great as I'm currently traveling:
>>>
>>> diff --git a/fs/pidfs.c b/fs/pidfs.c
>>> index a63d5d24aa02..3da848a8a95e 100644
>>> --- a/fs/pidfs.c
>>> +++ b/fs/pidfs.c
>>> @@ -201,10 +201,8 @@ static const struct super_operations pidfs_sops = {
>>>
>>> static char *pidfs_dname(struct dentry *dentry, char *buffer, int
>>> buflen)
>>> {
>>> - struct inode *inode = d_inode(dentry);
>>> - struct pid *pid = inode->i_private;
>>> -
>>> - return dynamic_dname(buffer, buflen, "pidfd:[%llu]", pid->ino);
>>> + /* Fake the old name as some userspace seems to rely on this. */
>>> + return dynamic_dname(buffer, buflen, "anon_inode:[pidfd]");
>>
>> No, the lsof test runs "lsof -p $pid -a -d $fd -F pfn" and expects:
>> "p${pid} f${fd} n[pidfd:$pid]"
>>
>> But it gets now:
>> p959 f3 nanon_inode
>>
>> I.e. "anon_inode" instead of "n[pidfd:959]".
>>
>> Did you intend to fix by the patch the lsfd's (util-linux)
>> column-name-pidfd test by this instead (the above)?
>
> This is now discussed in https://github.com/lsof-org/lsof/issues/317 too.
>
> So yeah,
> # ll /proc/984/fd
> total 0
> lrwx------ 1 xslaby users 64 May 17 09:00 0 -> /dev/pts/1
> lrwx------ 1 xslaby users 64 May 17 09:00 2 -> /dev/pts/1
> lrwx------ 1 xslaby users 64 May 17 09:00 3 -> anon_inode:[pidfd]
>
> looks good with the patch. But lsof checks if this IS_REG(). And if it
> is, pidfd handling is not done.
So this fixes it, of course:
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -2026,7 +2026,6 @@ static struct dentry *prepare_anon_dentry(struct
dentry **stashed,
}
/* Notice when this is changed. */
- WARN_ON_ONCE(!S_ISREG(inode->i_mode));
WARN_ON_ONCE(!IS_IMMUTABLE(inode));
dentry = d_alloc_anon(sb);
diff --git a/fs/pidfs.c b/fs/pidfs.c
index a63d5d24..f51a794f 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -201,10 +201,7 @@ static const struct super_operations pidfs_sops = {
static char *pidfs_dname(struct dentry *dentry, char *buffer, int buflen)
{
- struct inode *inode = d_inode(dentry);
- struct pid *pid = inode->i_private;
-
- return dynamic_dname(buffer, buflen, "pidfd:[%llu]", pid->ino);
+ return dynamic_dname(buffer, buflen, "anon_inode:[pidfd]");
}
static const struct dentry_operations pidfs_dentry_operations = {
@@ -217,6 +214,7 @@ static int pidfs_init_inode(struct inode *inode,
void *data)
{
inode->i_private = data;
inode->i_flags |= S_PRIVATE;
+ inode->i_mode &= ~S_IFREG;
inode->i_mode |= S_IRWXU;
inode->i_op = &pidfs_inode_operations;
inode->i_fop = &pidfs_file_operations;
>
>> thanks,
--
js
suse labs
next prev parent reply other threads:[~2024-05-17 7:54 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-13 16:45 [PATCH 0/2] Move pidfd to tiny pseudo fs Christian Brauner
2024-02-13 16:45 ` [PATCH 1/2] pidfd: move struct pidfd_fops Christian Brauner
2024-02-13 16:45 ` [PATCH 2/2] pidfd: add pidfdfs Christian Brauner
2024-02-13 17:17 ` Linus Torvalds
2024-02-14 14:40 ` Christian Brauner
2024-02-14 18:27 ` Christian Brauner
2024-02-14 18:37 ` Linus Torvalds
2024-02-15 16:11 ` Christian Brauner
2024-02-16 11:50 ` Christian Brauner
2024-02-16 16:41 ` Christian Brauner
2024-02-17 13:59 ` Oleg Nesterov
2024-02-17 17:30 ` Linus Torvalds
2024-02-17 17:38 ` Linus Torvalds
2024-02-18 11:15 ` Christian Brauner
2024-02-18 11:33 ` Christian Brauner
2024-02-18 17:54 ` Christian Brauner
2024-02-18 18:08 ` Linus Torvalds
2024-02-18 18:57 ` Linus Torvalds
2024-02-19 18:05 ` Christian Brauner
2024-02-19 18:34 ` Linus Torvalds
2024-02-19 21:18 ` Christian Brauner
2024-02-19 23:24 ` Linus Torvalds
2024-02-18 14:27 ` Oleg Nesterov
2024-02-18 9:30 ` Christian Brauner
2024-02-22 19:03 ` Nathan Chancellor
2024-02-23 10:18 ` Heiko Carstens
2024-02-23 11:56 ` Christian Brauner
2024-02-23 11:55 ` Christian Brauner
2024-02-23 12:57 ` Heiko Carstens
2024-02-23 13:27 ` Christian Brauner
2024-02-23 13:35 ` Heiko Carstens
2024-02-23 13:41 ` Christian Brauner
2024-02-23 21:26 ` Christian Brauner
2024-02-23 21:58 ` Linus Torvalds
2024-02-24 5:52 ` Christian Brauner
2024-02-24 6:05 ` Christian Brauner
2024-02-24 18:48 ` Linus Torvalds
2024-02-24 19:15 ` Christian Brauner
2024-02-24 19:19 ` Christian Brauner
2024-02-24 19:21 ` Linus Torvalds
2024-02-27 19:26 ` Nathan Chancellor
2024-02-27 22:13 ` Christian Brauner
2024-03-12 10:35 ` Geert Uytterhoeven
2024-03-12 14:09 ` Christian Brauner
2024-05-15 11:10 ` Jiri Slaby
2024-05-15 16:39 ` Christian Brauner
2024-05-16 5:28 ` Jiri Slaby
2024-05-17 7:09 ` Jiri Slaby
2024-05-17 7:54 ` Jiri Slaby [this message]
2024-05-17 20:07 ` Linus Torvalds
2024-05-20 8:23 ` Jiri Slaby
2024-05-20 19:01 ` Linus Torvalds
2024-05-20 19:15 ` Linus Torvalds
2024-05-21 6:07 ` Jiri Slaby
2024-05-21 6:13 ` Jiri Slaby
2024-05-21 12:33 ` Christian Brauner
2024-05-21 12:40 ` Christian Brauner
2024-05-21 15:10 ` Linus Torvalds
2024-05-25 11:57 ` Christian Brauner
2024-05-21 12:16 ` Christian Brauner
2024-02-13 17:02 ` [PATCH 0/2] Move pidfd to tiny pseudo fs Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=84bc442d-c4dd-418e-8020-e1ff987cad13@kernel.org \
--to=jirislaby@kernel.org \
--cc=brauner@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=sforshee@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tycho@tycho.pizza \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).