All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Mickaël Salaün" <mic@digikod.net>
To: Eric Van Hensbergen <ericvh@kernel.org>,
	 Dominique Martinet <asmadeus@codewreck.org>,
	Latchesar Ionkov <lucho@ionkov.net>,
	 Christian Schoenebeck <linux_oss@crudebyte.com>
Cc: "Tingmao Wang" <m@maowtm.org>,
	v9fs@lists.linux.dev, "Günther Noack" <gnoack@google.com>,
	linux-security-module@vger.kernel.org, "Jan Kara" <jack@suse.cz>,
	"Amir Goldstein" <amir73il@gmail.com>,
	"Matthew Bobrowski" <repnop@google.com>,
	linux-fsdevel@vger.kernel.org,
	"Christian Brauner" <brauner@kernel.org>
Subject: Re: [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)
Date: Fri, 4 Jul 2025 20:37:06 +0200	[thread overview]
Message-ID: <20250704.eek1caeveC0c@digikod.net> (raw)
In-Reply-To: <cover.1743971855.git.m@maowtm.org>

Hi,

I tested this patch series and I confirm that it fixes the issue for
Landlock.  Tingmao explained that it also fixes other subsystems such as
fanotify, and 9p itself.  I sent a patch to test this fix with Landlock
and I'll merge it once 9p is fixed:
https://lore.kernel.org/all/20250704171345.1393451-1-mic@digikod.net/

Could you please give it a try in linux-next and consider it for the
next merge window?

Regards,
 Mickaël

On Sun, Apr 06, 2025 at 09:43:01PM +0100, Tingmao Wang wrote:
> Hi 9p-fs maintainers,
> 
> (CC'ing Landlock and Fanotify/inotify people as this affects both use
> cases, although most of my investigation has been on Landlock)
> 
> Previously [1], I noticed that when using 9pfs filesystems, the Landlock
> LSM is blocking access even for files / directories allowed by rules, and
> that this has something to do with 9pfs creating new inodes despite
> Landlock holding a reference to the existing one.  Because Landlock uses
> inodes' in-memory state (i_security) to identify allowed fs
> objects/hierarchies, this causes Landlock to partially break on 9pfs, at
> least in uncached mode, which is the default:
> 
>     # mount -t 9p -o trans=virtio test /mnt
>     # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
>     Executing the sandboxed command...
>     bash: cannot set terminal process group (164): Inappropriate ioctl for device
>     bash: no job control in this shell
>     # cat /mnt/readme
>     cat: /mnt/readme: Permission denied
> 
> This, however, works if somebody is holding onto the dentry (and it also
> works with cache=loose), as in both cases the inode is reused:
> 
>     # tail -f /mnt/readme &
>     [1] 196
>     # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
>     Executing the sandboxed command...
>     bash: cannot set terminal process group (164): Inappropriate ioctl for device
>     bash: no job control in this shell
>     # cat /mnt/readme
>     aa
> 
> It also works on directories if one have a shell that cd into the
> directory.  Note that this means only certain usage of Landlock are
> affected - for example, sandboxing applications that takes a list of files
> to allow, landlocks itself, then evecve.  On the other hand, this does not
> affect applications that opens a file, then Landlocks itself while keeping
> the file it needs open.
> 
> While the above is a very simple example, this is problematic in
> real-world use cases if Landlock is used to sandox applications on system
> that has files mounted via 9pfs, or use 9pfs as the root filesystem.  In
> addition, this also affects fanotify / inotify when using inode mark (for
> local access):
> 
>     root@d8c28a676d72:/# ./fanotify-basic-open /readme & # on virtiofs
>     [1] 173
>     root@d8c28a676d72:/# cat readme
>     aa
>     FAN_OPEN: File /readme
>     root@d8c28a676d72:/# mount -t 9p -o trans=virtio test /mnt
>     root@d8c28a676d72:/# ./fanotify-basic-open /mnt/readme & # on 9pfs
>     [2] 176
>     root@d8c28a676d72:/# cat /mnt/readme
>     aa
>     root@d8c28a676d72:/#
> 
> Same can be demonstrated with inotifywait.  The source code for
> fanotify-basic-open, adopted from the fanotify man page, is available at
> the end of this email.
> 
> Note that this is not a security bug for Landlock since it can only cause
> legitimate access to be denied, but might be a problem for fanotify perm
> (although I do recognize that using perm on individual inodes is already
> perhaps a bit unreliable?)
> 
> It seems that there was an attempt at making 9pfs reuse inodes, based on
> qid.path, however it was reverted [2] due to issues with servers that
> present duplicate qids, for example on a QEMU host that has multiple
> filesystems mounted under a single 9pfs export without multidevs=remap, or
> in the case of other servers that doesn't necessarily support remapping
> qids ([3] and more).  I've done some testing on v6.12-rc4 which has the
> simplified 9pfs inode code before it was reverted, and found that Landlock
> works (however, we of course then have the issue demonstrated in [2]).
> 
> Unrelated to the above problem, it also seems like even with the revert in
> [2], because in cached mode inode are still reused based on qid (and type,
> version (aka mtime), etc), the setup mentioned in [2] still causes
> problems in th latest kernel with cache=loose:
> 
>     host # cd /tmp/linux-test
>     host # mkdir m1 m2
>     host # mount -t tmpfs tmpfs m1
>     host # mount -t tmpfs tmpfs m2
>     host # mkdir m1/dir m2/dir  # needs to be done together so that they have the same mtime
>     host # echo foo > m1/dir/foo
>     host # echo bar > m2/dir/bar
> 
>     guest # mount -t 9p -o trans=virtio,cache=loose test /mnt
>     guest # cd /mnt/m1/dir
>     qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!
>     guest # ls
>     foo
>     guest # ls /mnt/m2/dir
>     foo # <- should be bar
>     guest # uname -a
>     Linux d8c28a676d72 6.14.0-dev #92 SMP PREEMPT_DYNAMIC Sun Apr  6 18:47:54 BST 2025 x86_64 GNU/Linux
> 
> With the above in mind, I have a proposal for 9pfs to:
> 1. Reuse inodes even in uncached mode
> 2. However, reuse them based on qid.path AND the actual pathname, by doing
>    the appropriate testing in v9fs_test(_new)?_inode(_dotl)?
> 
> The main problem here is how to store the pathname in a sensible way and
> tie it to the inode.  For now I opted with an array of names acquired with
> take_dentry_name_snapshot, which reuses the same memory as the dcache to
> store the actual strings, but doesn't tie the lifetime of the dentry with
> the inode (I thought about holding a reference to the dentry in the
> v9fs_inode, but it seemed like a wrong approach and would cause dentries
> to not be evicted/released).
> 
> Maybe ideally there could be a general way for filesystems to tell
> VFS/dcache to "pin" a dentry as long as the inode is alive, but still
> allows the dentry and inode to be evicted based on memory pressure?  In
> fact, if the dentry is alive, we might not even need to do this in 9pfs,
> as we will automatically get the same inode pointed to by the dentry.

Christian, any though?

> 
> Currently this pathname is immutable once attached to an inode, and
> therefore it is not protected by locks or RCU, but this might have to
> change for us to support renames that preserve the inode on next access.
> This is not in this patchset yet.  Also let me know if I've missed any
> locks etc (I'm new to VFS, or for that matter, the kernel).
> 
> Storing one pathname per inode also means we don't reuse the same inode
> for hardlinks -- maybe this can be fixed as well in a future version, if
> this approach sounds good?
> 
> From some QEMU documentation I read [4] it seems like there is a plan to
> resolve these kind of problems in a new version of the protocol, by
> expanding the qid to include the filesystem identifier of a file on the
> host, so maybe this can be disabled after a successful protocol version
> check with the host?  For now this is implemented as a default option,
> inodeident=path, which can be set to 'none' to instead get the previous
> behaviour.
> 
> This patchset is still a bit of a work in progress, and I've not tested
> the performance impact.  It currently uses strncmp to compare paths but
> this might be able to be optimized into a hash comparison first.  It
> should also normally only need to do it for one pair of filenames, as the
> test is only done if qid.path matches in the first place.
> 
> Also, currently the inode is not reused in cached mode if mtime changed
> AND the dentry was evicted -- I considered removing the qid.version test
> in v9fs_test_inode_dotl but I think perhaps care needs to be taken to
> ensure we can refresh an inode that potentially has data cached?  This is
> a TODO for this patchset.
> 
> Another TODO is to maybe add support for case-insensitive matching?
> 
> For this patch series I've tested modifying files on both host and guest,
> changing a file from regular file to dir then back while preserving ino,
> keeping a file open in the guest with a fd, and using Landlock (which
> results in an ihold but does not keep the dentry) then trying to access
> the file from both inside and outside the Landlocked shell.
> 
> Let me know what's your thought on this -- if this is a viable approach
> I'm happy to work on it more and do more testing.  The main motivation
> behind this is getting Landlock to work "out of the box" on 9pfs.
> 
> This patch series was based on, and tested on v6.14 + [5]
> 
> Kind regards,
> Tingmao
> 
> [1]: https://github.com/landlock-lsm/linux/issues/45
> [2]: https://lore.kernel.org/all/20241024-revert_iget-v1-4-4cac63d25f72@codewreck.org/
> [3]: https://lore.kernel.org/all/20240923100508.GA32066@willie-the-truck/
> [4]: https://wiki.qemu.org/Documentation/9p#Protocol_Plans
> [5]: https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/
> 
> fanotify-basic-open.c:
> 
>     #define _GNU_SOURCE /* Needed to get O_LARGEFILE definition */
>     #include <errno.h>
>     #include <fcntl.h>
>     #include <limits.h>
>     #include <poll.h>
>     #include <stdio.h>
>     #include <stdlib.h>
>     #include <sys/fanotify.h>
>     #include <unistd.h>
> 
>     /* Read all available fanotify events from the file descriptor 'fd'. */
> 
>     static void handle_events(int fd)
>     {
>         const struct fanotify_event_metadata *metadata;
>         struct fanotify_event_metadata buf[200];
>         ssize_t size;
>         char path[PATH_MAX];
>         ssize_t path_len;
>         char procfd_path[PATH_MAX];
>         struct fanotify_response response;
> 
>         for (;;) {
>             size = read(fd, buf, sizeof(buf));
>             if (size == -1 && errno != EAGAIN) {
>                 perror("read");
>                 exit(EXIT_FAILURE);
>             }
> 
>             /* Check if end of available data reached. */
> 
>             if (size <= 0)
>                 break;
> 
>             /* Point to the first event in the buffer. */
> 
>             metadata = buf;
> 
>             /* Loop over all events in the buffer. */
> 
>             while (FAN_EVENT_OK(metadata, size)) {
>                 if (metadata->fd >= 0) {
>                     if (metadata->mask & FAN_OPEN) {
>                         printf("FAN_OPEN: ");
>                     }
> 
>                     /* Retrieve and print pathname of the accessed file. */
> 
>                     snprintf(procfd_path, sizeof(procfd_path),
>                         "/proc/self/fd/%d", metadata->fd);
>                     path_len = readlink(procfd_path, path,
>                                 sizeof(path) - 1);
>                     if (path_len == -1) {
>                         perror("readlink");
>                         exit(EXIT_FAILURE);
>                     }
> 
>                     path[path_len] = '\0';
>                     printf("File %s\n", path);
> 
>                     /* Close the file descriptor of the event. */
> 
>                     close(metadata->fd);
>                 }
> 
>                 /* Advance to next event. */
> 
>                 metadata = FAN_EVENT_NEXT(metadata, size);
>             }
>         }
>     }
> 
>     int main(int argc, char *argv[])
>     {
>         char buf;
>         int fd, poll_num;
>         nfds_t nfds;
>         struct pollfd fds[2];
> 
>         /* Check mount point is supplied. */
> 
>         if (argc != 2) {
>             fprintf(stderr, "Usage: %s FILE\n", argv[0]);
>             exit(EXIT_FAILURE);
>         }
> 
>         fd = fanotify_init(0, O_RDONLY | O_LARGEFILE);
>         if (fd == -1) {
>             perror("fanotify_init");
>             exit(EXIT_FAILURE);
>         }
> 
>         if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_INODE, FAN_OPEN, AT_FDCWD,
>                 argv[1]) == -1) {
>             perror("fanotify_mark");
>             exit(EXIT_FAILURE);
>         }
> 
>         while (1) {
>             handle_events(fd);
>         }
>     }
> 
> Tingmao Wang (6):
>   fs/9p: Add ability to identify inode by path for .L
>   fs/9p: add default option for path-based inodes
>   fs/9p: Hide inodeident=path from show_options as it is the default
>   fs/9p: Add ability to identify inode by path for non-.L
>   fs/9p: .L: Refresh stale inodes on reuse
>   fs/9p: non-.L: Refresh stale inodes on reuse
> 
>  fs/9p/Makefile         |   3 +-
>  fs/9p/ino_path.c       | 114 ++++++++++++++++++++++++++++++++++++++
>  fs/9p/v9fs.c           |  33 ++++++++++-
>  fs/9p/v9fs.h           |  63 +++++++++++++++------
>  fs/9p/vfs_inode.c      | 122 +++++++++++++++++++++++++++++++++++------
>  fs/9p/vfs_inode_dotl.c | 108 +++++++++++++++++++++++++++++++++---
>  fs/9p/vfs_super.c      |  10 +++-
>  7 files changed, 406 insertions(+), 47 deletions(-)
>  create mode 100644 fs/9p/ino_path.c
> 
> 
> base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
> prerequisite-patch-id: 12dc6676db52ff32eed082b1e5d273f297737f61
> prerequisite-patch-id: 93ab54c52a41fa44b8d0baf55df949d0ad27e99a
> prerequisite-patch-id: 5f558bf969e6eaa3d011c98de0806ca8ad369efe
> --
> 2.39.5
> 

  parent reply	other threads:[~2025-07-04 18:37 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
2025-07-05  0:19   ` Al Viro
2025-07-05  0:25   ` Al Viro
2025-07-11 19:11     ` Tingmao Wang
2025-08-08  8:32       ` Mickaël Salaün
2025-08-12 23:57         ` Tingmao Wang
2025-08-13  7:47           ` Mickaël Salaün
2025-07-11 19:11   ` Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 2/6] fs/9p: add default option for path-based inodes Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 3/6] fs/9p: Hide inodeident=path from show_options as it is the default Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 4/6] fs/9p: Add ability to identify inode by path for non-.L Tingmao Wang
2025-07-11 19:12   ` Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 5/6] fs/9p: .L: Refresh stale inodes on reuse Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 6/6] fs/9p: non-.L: " Tingmao Wang
2025-07-04 18:37 ` Mickaël Salaün [this message]
2025-08-08 10:27 ` [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Dominique Martinet
2025-08-08 10:52   ` Christian Schoenebeck
2025-08-12 23:53     ` Tingmao Wang
2025-08-13  5:30       ` Dominique Martinet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250704.eek1caeveC0c@digikod.net \
    --to=mic@digikod.net \
    --cc=amir73il@gmail.com \
    --cc=asmadeus@codewreck.org \
    --cc=brauner@kernel.org \
    --cc=ericvh@kernel.org \
    --cc=gnoack@google.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=linux_oss@crudebyte.com \
    --cc=lucho@ionkov.net \
    --cc=m@maowtm.org \
    --cc=repnop@google.com \
    --cc=v9fs@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.