[RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)
@ 2025-04-06 20:43 Tingmao Wang
  2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 20+ messages in thread
From: Tingmao Wang @ 2025-04-06 20:43 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: Tingmao Wang, v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

Hi 9p-fs maintainers,

(CC'ing Landlock and Fanotify/inotify people as this affects both use
cases, although most of my investigation has been on Landlock)

Previously [1], I noticed that when using 9pfs filesystems, the Landlock
LSM is blocking access even for files / directories allowed by rules, and
that this has something to do with 9pfs creating new inodes despite
Landlock holding a reference to the existing one.  Because Landlock uses
inodes' in-memory state (i_security) to identify allowed fs
objects/hierarchies, this causes Landlock to partially break on 9pfs, at
least in uncached mode, which is the default:

    # mount -t 9p -o trans=virtio test /mnt
    # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
    Executing the sandboxed command...
    bash: cannot set terminal process group (164): Inappropriate ioctl for device
    bash: no job control in this shell
    # cat /mnt/readme
    cat: /mnt/readme: Permission denied

This, however, works if somebody is holding onto the dentry (and it also
works with cache=loose), as in both cases the inode is reused:

    # tail -f /mnt/readme &
    [1] 196
    # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
    Executing the sandboxed command...
    bash: cannot set terminal process group (164): Inappropriate ioctl for device
    bash: no job control in this shell
    # cat /mnt/readme
    aa

It also works on directories if one have a shell that cd into the
directory.  Note that this means only certain usage of Landlock are
affected - for example, sandboxing applications that takes a list of files
to allow, landlocks itself, then evecve.  On the other hand, this does not
affect applications that opens a file, then Landlocks itself while keeping
the file it needs open.

While the above is a very simple example, this is problematic in
real-world use cases if Landlock is used to sandox applications on system
that has files mounted via 9pfs, or use 9pfs as the root filesystem.  In
addition, this also affects fanotify / inotify when using inode mark (for
local access):

    root@d8c28a676d72:/# ./fanotify-basic-open /readme & # on virtiofs
    [1] 173
    root@d8c28a676d72:/# cat readme
    aa
    FAN_OPEN: File /readme
    root@d8c28a676d72:/# mount -t 9p -o trans=virtio test /mnt
    root@d8c28a676d72:/# ./fanotify-basic-open /mnt/readme & # on 9pfs
    [2] 176
    root@d8c28a676d72:/# cat /mnt/readme
    aa
    root@d8c28a676d72:/#

Same can be demonstrated with inotifywait.  The source code for
fanotify-basic-open, adopted from the fanotify man page, is available at
the end of this email.

Note that this is not a security bug for Landlock since it can only cause
legitimate access to be denied, but might be a problem for fanotify perm
(although I do recognize that using perm on individual inodes is already
perhaps a bit unreliable?)

It seems that there was an attempt at making 9pfs reuse inodes, based on
qid.path, however it was reverted [2] due to issues with servers that
present duplicate qids, for example on a QEMU host that has multiple
filesystems mounted under a single 9pfs export without multidevs=remap, or
in the case of other servers that doesn't necessarily support remapping
qids ([3] and more).  I've done some testing on v6.12-rc4 which has the
simplified 9pfs inode code before it was reverted, and found that Landlock
works (however, we of course then have the issue demonstrated in [2]).

Unrelated to the above problem, it also seems like even with the revert in
[2], because in cached mode inode are still reused based on qid (and type,
version (aka mtime), etc), the setup mentioned in [2] still causes
problems in th latest kernel with cache=loose:

    host # cd /tmp/linux-test
    host # mkdir m1 m2
    host # mount -t tmpfs tmpfs m1
    host # mount -t tmpfs tmpfs m2
    host # mkdir m1/dir m2/dir  # needs to be done together so that they have the same mtime
    host # echo foo > m1/dir/foo
    host # echo bar > m2/dir/bar

    guest # mount -t 9p -o trans=virtio,cache=loose test /mnt
    guest # cd /mnt/m1/dir
    qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!
    guest # ls
    foo
    guest # ls /mnt/m2/dir
    foo # <- should be bar
    guest # uname -a
    Linux d8c28a676d72 6.14.0-dev #92 SMP PREEMPT_DYNAMIC Sun Apr  6 18:47:54 BST 2025 x86_64 GNU/Linux

With the above in mind, I have a proposal for 9pfs to:
1. Reuse inodes even in uncached mode
2. However, reuse them based on qid.path AND the actual pathname, by doing
   the appropriate testing in v9fs_test(_new)?_inode(_dotl)?

The main problem here is how to store the pathname in a sensible way and
tie it to the inode.  For now I opted with an array of names acquired with
take_dentry_name_snapshot, which reuses the same memory as the dcache to
store the actual strings, but doesn't tie the lifetime of the dentry with
the inode (I thought about holding a reference to the dentry in the
v9fs_inode, but it seemed like a wrong approach and would cause dentries
to not be evicted/released).

Maybe ideally there could be a general way for filesystems to tell
VFS/dcache to "pin" a dentry as long as the inode is alive, but still
allows the dentry and inode to be evicted based on memory pressure?  In
fact, if the dentry is alive, we might not even need to do this in 9pfs,
as we will automatically get the same inode pointed to by the dentry.

Currently this pathname is immutable once attached to an inode, and
therefore it is not protected by locks or RCU, but this might have to
change for us to support renames that preserve the inode on next access.
This is not in this patchset yet.  Also let me know if I've missed any
locks etc (I'm new to VFS, or for that matter, the kernel).

Storing one pathname per inode also means we don't reuse the same inode
for hardlinks -- maybe this can be fixed as well in a future version, if
this approach sounds good?

From some QEMU documentation I read [4] it seems like there is a plan to
resolve these kind of problems in a new version of the protocol, by
expanding the qid to include the filesystem identifier of a file on the
host, so maybe this can be disabled after a successful protocol version
check with the host?  For now this is implemented as a default option,
inodeident=path, which can be set to 'none' to instead get the previous
behaviour.

This patchset is still a bit of a work in progress, and I've not tested
the performance impact.  It currently uses strncmp to compare paths but
this might be able to be optimized into a hash comparison first.  It
should also normally only need to do it for one pair of filenames, as the
test is only done if qid.path matches in the first place.

Also, currently the inode is not reused in cached mode if mtime changed
AND the dentry was evicted -- I considered removing the qid.version test
in v9fs_test_inode_dotl but I think perhaps care needs to be taken to
ensure we can refresh an inode that potentially has data cached?  This is
a TODO for this patchset.

Another TODO is to maybe add support for case-insensitive matching?

For this patch series I've tested modifying files on both host and guest,
changing a file from regular file to dir then back while preserving ino,
keeping a file open in the guest with a fd, and using Landlock (which
results in an ihold but does not keep the dentry) then trying to access
the file from both inside and outside the Landlocked shell.

Let me know what's your thought on this -- if this is a viable approach
I'm happy to work on it more and do more testing.  The main motivation
behind this is getting Landlock to work "out of the box" on 9pfs.

This patch series was based on, and tested on v6.14 + [5]

Kind regards,
Tingmao

[1]: https://github.com/landlock-lsm/linux/issues/45
[2]: https://lore.kernel.org/all/20241024-revert_iget-v1-4-4cac63d25f72@codewreck.org/
[3]: https://lore.kernel.org/all/20240923100508.GA32066@willie-the-truck/
[4]: https://wiki.qemu.org/Documentation/9p#Protocol_Plans
[5]: https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/

fanotify-basic-open.c:

    #define _GNU_SOURCE /* Needed to get O_LARGEFILE definition */
    #include <errno.h>
    #include <fcntl.h>
    #include <limits.h>
    #include <poll.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/fanotify.h>
    #include <unistd.h>

    /* Read all available fanotify events from the file descriptor 'fd'. */

    static void handle_events(int fd)
    {
        const struct fanotify_event_metadata *metadata;
        struct fanotify_event_metadata buf[200];
        ssize_t size;
        char path[PATH_MAX];
        ssize_t path_len;
        char procfd_path[PATH_MAX];
        struct fanotify_response response;

        for (;;) {
            size = read(fd, buf, sizeof(buf));
            if (size == -1 && errno != EAGAIN) {
                perror("read");
                exit(EXIT_FAILURE);
            }

            /* Check if end of available data reached. */

            if (size <= 0)
                break;

            /* Point to the first event in the buffer. */

            metadata = buf;

            /* Loop over all events in the buffer. */

            while (FAN_EVENT_OK(metadata, size)) {
                if (metadata->fd >= 0) {
                    if (metadata->mask & FAN_OPEN) {
                        printf("FAN_OPEN: ");
                    }

                    /* Retrieve and print pathname of the accessed file. */

                    snprintf(procfd_path, sizeof(procfd_path),
                        "/proc/self/fd/%d", metadata->fd);
                    path_len = readlink(procfd_path, path,
                                sizeof(path) - 1);
                    if (path_len == -1) {
                        perror("readlink");
                        exit(EXIT_FAILURE);
                    }

                    path[path_len] = '\0';
                    printf("File %s\n", path);

                    /* Close the file descriptor of the event. */

                    close(metadata->fd);
                }

                /* Advance to next event. */

                metadata = FAN_EVENT_NEXT(metadata, size);
            }
        }
    }

    int main(int argc, char *argv[])
    {
        char buf;
        int fd, poll_num;
        nfds_t nfds;
        struct pollfd fds[2];

        /* Check mount point is supplied. */

        if (argc != 2) {
            fprintf(stderr, "Usage: %s FILE\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        fd = fanotify_init(0, O_RDONLY | O_LARGEFILE);
        if (fd == -1) {
            perror("fanotify_init");
            exit(EXIT_FAILURE);
        }

        if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_INODE, FAN_OPEN, AT_FDCWD,
                argv[1]) == -1) {
            perror("fanotify_mark");
            exit(EXIT_FAILURE);
        }

        while (1) {
            handle_events(fd);
        }
    }

Tingmao Wang (6):
  fs/9p: Add ability to identify inode by path for .L
  fs/9p: add default option for path-based inodes
  fs/9p: Hide inodeident=path from show_options as it is the default
  fs/9p: Add ability to identify inode by path for non-.L
  fs/9p: .L: Refresh stale inodes on reuse
  fs/9p: non-.L: Refresh stale inodes on reuse

 fs/9p/Makefile         |   3 +-
 fs/9p/ino_path.c       | 114 ++++++++++++++++++++++++++++++++++++++
 fs/9p/v9fs.c           |  33 ++++++++++-
 fs/9p/v9fs.h           |  63 +++++++++++++++------
 fs/9p/vfs_inode.c      | 122 +++++++++++++++++++++++++++++++++++------
 fs/9p/vfs_inode_dotl.c | 108 +++++++++++++++++++++++++++++++++---
 fs/9p/vfs_super.c      |  10 +++-
 7 files changed, 406 insertions(+), 47 deletions(-)
 create mode 100644 fs/9p/ino_path.c

base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
prerequisite-patch-id: 12dc6676db52ff32eed082b1e5d273f297737f61
prerequisite-patch-id: 93ab54c52a41fa44b8d0baf55df949d0ad27e99a
prerequisite-patch-id: 5f558bf969e6eaa3d011c98de0806ca8ad369efe
--
2.39.5

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
  2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
@ 2025-04-06 20:43 ` Tingmao Wang
  2025-07-05  0:19   ` Al Viro
                     ` (2 more replies)
  2025-04-06 20:43 ` [RFC PATCH 2/6] fs/9p: add default option for path-based inodes Tingmao Wang
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 20+ messages in thread
From: Tingmao Wang @ 2025-04-06 20:43 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: Tingmao Wang, v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

Creating a new file for shared code between vfs_inode.c and
vfs_inode_dotl.c.  Same change for non-.L will be added in a following
commit.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 fs/9p/Makefile         |   3 +-
 fs/9p/ino_path.c       | 114 +++++++++++++++++++++++++++++++++++++++++
 fs/9p/v9fs.h           |  54 +++++++++++++------
 fs/9p/vfs_inode.c      |  20 ++++++--
 fs/9p/vfs_inode_dotl.c |  91 +++++++++++++++++++++++++++++---
 fs/9p/vfs_super.c      |  10 +++-
 6 files changed, 262 insertions(+), 30 deletions(-)
 create mode 100644 fs/9p/ino_path.c

diff --git a/fs/9p/Makefile b/fs/9p/Makefile
index e7800a5c7395..38c3ceb26274 100644
--- a/fs/9p/Makefile
+++ b/fs/9p/Makefile
@@ -11,7 +11,8 @@ obj-$(CONFIG_9P_FS) := 9p.o
 	vfs_dentry.o \
 	v9fs.o \
 	fid.o  \
-	xattr.o
+	xattr.o \
+	ino_path.o
 
 9p-$(CONFIG_9P_FSCACHE) += cache.o
 9p-$(CONFIG_9P_FS_POSIX_ACL) += acl.o
diff --git a/fs/9p/ino_path.c b/fs/9p/ino_path.c
new file mode 100644
index 000000000000..a4e0aef81618
--- /dev/null
+++ b/fs/9p/ino_path.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Specific operations on the v9fs_ino_path structure.
+ *
+ * Copyright (C) 2025 by Tingmao Wang <m@maowtm.org>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/dcache.h>
+
+#include <linux/posix_acl.h>
+#include <net/9p/9p.h>
+#include <net/9p/client.h>
+#include "v9fs.h"
+
+/*
+ * Must hold rename_sem due to traversing parents
+ */
+struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
+{
+	struct v9fs_ino_path *path;
+	size_t path_components = 0;
+	struct dentry *curr = dentry;
+	ssize_t i;
+
+	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
+
+	rcu_read_lock();
+
+    /* Don't include the root dentry */
+	while (curr->d_parent != curr) {
+		path_components++;
+		curr = curr->d_parent;
+	}
+	if (WARN_ON(path_components > SSIZE_MAX)) {
+		rcu_read_unlock();
+		return NULL;
+	}
+
+	path = kmalloc(struct_size(path, names, path_components),
+		       GFP_KERNEL);
+	if (!path) {
+		rcu_read_unlock();
+		return NULL;
+	}
+
+	path->nr_components = path_components;
+	curr = dentry;
+	for (i = path_components - 1; i >= 0; i--) {
+		take_dentry_name_snapshot(&path->names[i], curr);
+		curr = curr->d_parent;
+	}
+	WARN_ON(curr != curr->d_parent);
+	rcu_read_unlock();
+	return path;
+}
+
+void free_ino_path(struct v9fs_ino_path *path)
+{
+	if (path) {
+		for (size_t i = 0; i < path->nr_components; i++)
+			release_dentry_name_snapshot(&path->names[i]);
+		kfree(path);
+	}
+}
+
+/*
+ * Must hold rename_sem due to traversing parents
+ */
+bool ino_path_compare(struct v9fs_ino_path *ino_path,
+			     struct dentry *dentry)
+{
+	struct dentry *curr = dentry;
+	struct qstr *curr_name;
+	struct name_snapshot *compare;
+	ssize_t i;
+
+	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
+
+	rcu_read_lock();
+	for (i = ino_path->nr_components - 1; i >= 0; i--) {
+		if (curr->d_parent == curr) {
+			/* We're supposed to have more components to walk */
+			rcu_read_unlock();
+			return false;
+		}
+		curr_name = &curr->d_name;
+		compare = &ino_path->names[i];
+		/*
+		 * We can't use hash_len because it is salted with the parent
+		 * dentry pointer.  We could make this faster by pre-computing our
+		 * own hashlen for compare and ino_path outside, probably.
+		 */
+		if (curr_name->len != compare->name.len) {
+			rcu_read_unlock();
+			return false;
+		}
+		if (strncmp(curr_name->name, compare->name.name,
+			    curr_name->len) != 0) {
+			rcu_read_unlock();
+			return false;
+		}
+		curr = curr->d_parent;
+	}
+	rcu_read_unlock();
+	if (curr != curr->d_parent) {
+		/* dentry is deeper than ino_path */
+		return false;
+	}
+	return true;
+}
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index f28bc763847a..5c85923aa2dd 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -10,6 +10,7 @@
 
 #include <linux/backing-dev.h>
 #include <linux/netfs.h>
+#include <linux/dcache.h>
 
 /**
  * enum p9_session_flags - option flags for each 9P session
@@ -31,16 +32,17 @@
 #define V9FS_ACL_MASK V9FS_POSIX_ACL
 
 enum p9_session_flags {
-	V9FS_PROTO_2000U    = 0x01,
-	V9FS_PROTO_2000L    = 0x02,
-	V9FS_ACCESS_SINGLE  = 0x04,
-	V9FS_ACCESS_USER    = 0x08,
-	V9FS_ACCESS_CLIENT  = 0x10,
-	V9FS_POSIX_ACL      = 0x20,
-	V9FS_NO_XATTR       = 0x40,
-	V9FS_IGNORE_QV      = 0x80, /* ignore qid.version for cache hints */
-	V9FS_DIRECT_IO      = 0x100,
-	V9FS_SYNC           = 0x200
+	V9FS_PROTO_2000U      = 0x01,
+	V9FS_PROTO_2000L      = 0x02,
+	V9FS_ACCESS_SINGLE    = 0x04,
+	V9FS_ACCESS_USER      = 0x08,
+	V9FS_ACCESS_CLIENT    = 0x10,
+	V9FS_POSIX_ACL        = 0x20,
+	V9FS_NO_XATTR         = 0x40,
+	V9FS_IGNORE_QV        = 0x80, /* ignore qid.version for cache hints */
+	V9FS_DIRECT_IO        = 0x100,
+	V9FS_SYNC             = 0x200,
+	V9FS_INODE_IDENT_PATH = 0x400,
 };
 
 /**
@@ -133,11 +135,27 @@ struct v9fs_session_info {
 /* cache_validity flags */
 #define V9FS_INO_INVALID_ATTR 0x01
 
+struct v9fs_ino_path {
+	size_t nr_components;
+	struct name_snapshot names[] __counted_by(nr_components);
+};
+
+extern struct v9fs_ino_path *make_ino_path(struct dentry *dentry);
+extern void free_ino_path(struct v9fs_ino_path *path);
+extern bool ino_path_compare(struct v9fs_ino_path *ino_path,
+	struct dentry *dentry);
+
 struct v9fs_inode {
 	struct netfs_inode netfs; /* Netfslib context and vfs inode */
 	struct p9_qid qid;
 	unsigned int cache_validity;
 	struct mutex v_mutex;
+
+	/*
+	 * Only for filesystems with inode_ident=path.  Lifetime is the same as
+	 * this inode
+	 */
+	struct v9fs_ino_path *path;
 };
 
 static inline struct v9fs_inode *V9FS_I(const struct inode *inode)
@@ -188,7 +206,8 @@ extern const struct inode_operations v9fs_symlink_inode_operations_dotl;
 extern const struct netfs_request_ops v9fs_req_ops;
 extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses,
 					      struct p9_fid *fid,
-					      struct super_block *sb, int new);
+					      struct super_block *sb,
+					      struct dentry *dentry, int new);
 
 /* other default globals */
 #define V9FS_PORT	564
@@ -217,6 +236,11 @@ static inline int v9fs_proto_dotl(struct v9fs_session_info *v9ses)
 	return v9ses->flags & V9FS_PROTO_2000L;
 }
 
+static inline int v9fs_inode_ident_path(struct v9fs_session_info *v9ses)
+{
+	return v9ses->flags & V9FS_INODE_IDENT_PATH;
+}
+
 /**
  * v9fs_get_inode_from_fid - Helper routine to populate an inode by
  * issuing a attribute request
@@ -227,10 +251,10 @@ static inline int v9fs_proto_dotl(struct v9fs_session_info *v9ses)
  */
 static inline struct inode *
 v9fs_get_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
-			struct super_block *sb)
+			struct super_block *sb, struct dentry *dentry)
 {
 	if (v9fs_proto_dotl(v9ses))
-		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, 0);
+		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, dentry, 0);
 	else
 		return v9fs_inode_from_fid(v9ses, fid, sb, 0);
 }
@@ -245,10 +269,10 @@ v9fs_get_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
  */
 static inline struct inode *
 v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
-			    struct super_block *sb)
+			    struct super_block *sb, struct dentry *dentry)
 {
 	if (v9fs_proto_dotl(v9ses))
-		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, 1);
+		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, dentry, 1);
 	else
 		return v9fs_inode_from_fid(v9ses, fid, sb, 1);
 }
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 1640765563e9..72fd72a2ff06 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -243,6 +243,7 @@ struct inode *v9fs_alloc_inode(struct super_block *sb)
 
 void v9fs_free_inode(struct inode *inode)
 {
+	free_ino_path(V9FS_I(inode)->path);
 	kmem_cache_free(v9fs_inode_cache, V9FS_I(inode));
 }
 
@@ -607,15 +608,17 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
 			goto error;
 		}
 		/*
-		 * instantiate inode and assign the unopened fid to the dentry
+		 * Instantiate inode.  On .L fs, pass in dentry for inodeident=path.
 		 */
-		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb,
+			v9fs_proto_dotl(v9ses) ? dentry : NULL);
 		if (IS_ERR(inode)) {
 			err = PTR_ERR(inode);
 			p9_debug(P9_DEBUG_VFS,
 				   "inode creation failed %d\n", err);
 			goto error;
 		}
+		/* Assign the unopened fid to the dentry */
 		v9fs_fid_add(dentry, &fid);
 		d_instantiate(dentry, inode);
 	}
@@ -733,14 +736,21 @@ struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 	name = dentry->d_name.name;
 	fid = p9_client_walk(dfid, 1, &name, 1);
 	p9_fid_put(dfid);
+
+	/*
+	 * On .L fs, pass in dentry to v9fs_get_inode_from_fid in case it is
+	 * needed by inodeident=path
+	 */
 	if (fid == ERR_PTR(-ENOENT))
 		inode = NULL;
 	else if (IS_ERR(fid))
 		inode = ERR_CAST(fid);
-	else if (v9ses->cache & (CACHE_META|CACHE_LOOSE))
-		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb);
+	else if (v9ses->cache & (CACHE_META | CACHE_LOOSE))
+		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb,
+			v9fs_proto_dotl(v9ses) ? dentry : NULL);
 	else
-		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb,
+			v9fs_proto_dotl(v9ses) ? dentry : NULL);
 	/*
 	 * If we had a rename on the server and a parallel lookup
 	 * for the new name, then make sure we instantiate with
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 2d025e561ba1..c1cc3553f2fb 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -52,10 +52,17 @@ static kgid_t v9fs_get_fsgid_for_create(struct inode *dir_inode)
 	return current_fsgid();
 }
 
+struct iget_data {
+	struct p9_stat_dotl *st;
+	struct dentry *dentry;
+};
+
 static int v9fs_test_inode_dotl(struct inode *inode, void *data)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	struct p9_stat_dotl *st = (struct p9_stat_dotl *)data;
+	struct p9_stat_dotl *st = ((struct iget_data *)data)->st;
+	struct dentry *dentry = ((struct iget_data *)data)->dentry;
+	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
 
 	/* don't match inode of different type */
 	if (inode_wrong_type(inode, st->st_mode))
@@ -74,22 +81,74 @@ static int v9fs_test_inode_dotl(struct inode *inode, void *data)
 
 	if (v9inode->qid.path != st->qid.path)
 		return 0;
+
+	if (v9fs_inode_ident_path(v9ses)) {
+		if (!ino_path_compare(v9inode->path, dentry)) {
+			p9_debug(P9_DEBUG_VFS, "Refusing to reuse inode %p based on path mismatch",
+				 inode);
+			return 0;
+		}
+	}
 	return 1;
 }
 
 /* Always get a new inode */
 static int v9fs_test_new_inode_dotl(struct inode *inode, void *data)
 {
+	struct v9fs_inode *v9inode = V9FS_I(inode);
+	struct p9_stat_dotl *st = ((struct iget_data *)data)->st;
+	struct dentry *dentry = ((struct iget_data *)data)->dentry;
+	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
+
+	/*
+	 * Don't reuse inode of different type, even if we have
+	 * inodeident=path and path matches.
+	 */
+	if (inode_wrong_type(inode, st->st_mode))
+		return 0;
+
+	/*
+	 * We're only getting here if QID2INO stays the same anyway, so
+	 * mirroring the qid checks in v9fs_test_inode_dotl
+	 * (but maybe that check is unnecessary anyway? at least on 64bit)
+	 */
+
+	if (v9inode->qid.type != st->qid.type)
+		return 0;
+
+	if (v9inode->qid.path != st->qid.path)
+		return 0;
+
+	if (v9fs_inode_ident_path(v9ses) && dentry && v9inode->path) {
+		if (ino_path_compare(V9FS_I(inode)->path, dentry)) {
+			p9_debug(P9_DEBUG_VFS,
+				 "Reusing inode %p based on path match", inode);
+			return 1;
+		}
+	}
+
 	return 0;
 }
 
 static int v9fs_set_inode_dotl(struct inode *inode,  void *data)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	struct p9_stat_dotl *st = (struct p9_stat_dotl *)data;
+	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
+	struct iget_data *idata = data;
+	struct p9_stat_dotl *st = idata->st;
+	struct dentry *dentry = idata->dentry;
 
 	memcpy(&v9inode->qid, &st->qid, sizeof(st->qid));
 	inode->i_generation = st->st_gen;
+	if (v9fs_inode_ident_path(v9ses)) {
+		if (dentry) {
+			v9inode->path = make_ino_path(dentry);
+			if (!v9inode->path)
+				return -ENOMEM;
+		} else {
+			v9inode->path = NULL;
+		}
+	}
 	return 0;
 }
 
@@ -97,19 +156,35 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 					struct p9_qid *qid,
 					struct p9_fid *fid,
 					struct p9_stat_dotl *st,
+					struct dentry *dentry,
 					int new)
 {
 	int retval;
 	struct inode *inode;
 	struct v9fs_session_info *v9ses = sb->s_fs_info;
 	int (*test)(struct inode *inode, void *data);
+	struct iget_data data = {
+		.st = st,
+		.dentry = dentry,
+	};
+
 
 	if (new)
 		test = v9fs_test_new_inode_dotl;
 	else
 		test = v9fs_test_inode_dotl;
 
-	inode = iget5_locked(sb, QID2INO(qid), test, v9fs_set_inode_dotl, st);
+	if (v9fs_inode_ident_path(v9ses) && dentry) {
+		/*
+		 * We have to take the rename_sem lock here as iget5_locked has
+		 * spinlock in it (inode_hash_lock)
+		 */
+		down_read(&v9ses->rename_sem);
+	}
+	inode = iget5_locked(sb, QID2INO(qid), test, v9fs_set_inode_dotl, &data);
+	if (v9fs_inode_ident_path(v9ses) && dentry)
+		up_read(&v9ses->rename_sem);
+
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 	if (!(inode->i_state & I_NEW))
@@ -142,7 +217,7 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 
 struct inode *
 v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, struct p9_fid *fid,
-			 struct super_block *sb, int new)
+			 struct super_block *sb, struct dentry *dentry, int new)
 {
 	struct p9_stat_dotl *st;
 	struct inode *inode = NULL;
@@ -151,7 +226,7 @@ v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	if (IS_ERR(st))
 		return ERR_CAST(st);
 
-	inode = v9fs_qid_iget_dotl(sb, &st->qid, fid, st, new);
+	inode = v9fs_qid_iget_dotl(sb, &st->qid, fid, st, dentry, new);
 	kfree(st);
 	return inode;
 }
@@ -305,7 +380,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
 		p9_debug(P9_DEBUG_VFS, "p9_client_walk failed %d\n", err);
 		goto out;
 	}
-	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	if (IS_ERR(inode)) {
 		err = PTR_ERR(inode);
 		p9_debug(P9_DEBUG_VFS, "inode creation failed %d\n", err);
@@ -400,7 +475,7 @@ static int v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
 	}
 
 	/* instantiate inode and assign the unopened fid to the dentry */
-	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	if (IS_ERR(inode)) {
 		err = PTR_ERR(inode);
 		p9_debug(P9_DEBUG_VFS, "inode creation failed %d\n",
@@ -838,7 +913,7 @@ v9fs_vfs_mknod_dotl(struct mnt_idmap *idmap, struct inode *dir,
 			 err);
 		goto error;
 	}
-	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	if (IS_ERR(inode)) {
 		err = PTR_ERR(inode);
 		p9_debug(P9_DEBUG_VFS, "inode creation failed %d\n",
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 489db161abc9..566d9ae6255f 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -139,7 +139,7 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags,
 	else
 		sb->s_d_op = &v9fs_dentry_operations;
 
-	inode = v9fs_get_new_inode_from_fid(v9ses, fid, sb);
+	inode = v9fs_get_new_inode_from_fid(v9ses, fid, sb, NULL);
 	if (IS_ERR(inode)) {
 		retval = PTR_ERR(inode);
 		goto release_sb;
@@ -151,6 +151,14 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags,
 		goto release_sb;
 	}
 	sb->s_root = root;
+
+	if (v9fs_inode_ident_path(v9ses)) {
+		/* Probably not necessary, just to satisfy lockdep_assert */
+		down_read(&v9ses->rename_sem);
+		V9FS_I(inode)->path = make_ino_path(root);
+		up_read(&v9ses->rename_sem);
+	}
+
 	retval = v9fs_get_acl(inode, fid);
 	if (retval)
 		goto release_sb;
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 2/6] fs/9p: add default option for path-based inodes
  2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
  2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
@ 2025-04-06 20:43 ` Tingmao Wang
  2025-04-06 20:43 ` [RFC PATCH 3/6] fs/9p: Hide inodeident=path from show_options as it is the default Tingmao Wang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Tingmao Wang @ 2025-04-06 20:43 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: Tingmao Wang, v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

Add an extra option to control the inode identification behaviour, and
make path-based default.  The reasoning for this default is that the
current 9pfs already does not re-use inodes even for same qid.path, but
because of the dcache, a file kept open by a process (by e.g. open, or
having the directory as CWD) means that other processes will re-use that
inode.  If we identify inode by path by default, this behaviour is
consistent regardless of whether files are kept open or not.

In a future version, if we can negotiate with the server and be sure that
it won't give us duplicate qid.path, the default for those cases can be
qid-based (possibly re-merging commit 724a08450f74 ("fs/9p: simplify iget
to remove unnecessary paths")...?)

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 fs/9p/v9fs.c | 33 ++++++++++++++++++++++++++++++++-
 fs/9p/v9fs.h |  2 ++
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 77e9c4387c1d..487532295a98 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -36,7 +36,7 @@ enum {
 	/* Options that take integer arguments */
 	Opt_debug, Opt_dfltuid, Opt_dfltgid, Opt_afid,
 	/* String options */
-	Opt_uname, Opt_remotename, Opt_cache, Opt_cachetag,
+	Opt_uname, Opt_remotename, Opt_cache, Opt_cachetag, Opt_inodeident,
 	/* Options that take no arguments */
 	Opt_nodevmap, Opt_noxattr, Opt_directio, Opt_ignoreqv,
 	/* Access options */
@@ -63,6 +63,7 @@ static const match_table_t tokens = {
 	{Opt_access, "access=%s"},
 	{Opt_posixacl, "posixacl"},
 	{Opt_locktimeout, "locktimeout=%u"},
+	{Opt_inodeident, "inodeident=%s"},
 	{Opt_err, NULL}
 };
 
@@ -149,6 +150,15 @@ int v9fs_show_options(struct seq_file *m, struct dentry *root)
 	if (v9ses->flags & V9FS_NO_XATTR)
 		seq_puts(m, ",noxattr");
 
+	switch (v9ses->flags & V9FS_INODE_IDENT_MASK) {
+	case 0:
+		seq_puts(m, ",inodeident=none");
+		break;
+	case V9FS_INODE_IDENT_PATH:
+		seq_puts(m, ",inodeident=path");
+		break;
+	}
+
 	return p9_show_client_options(m, v9ses->clnt);
 }
 
@@ -369,6 +379,25 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts)
 			v9ses->session_lock_timeout = (long)option * HZ;
 			break;
 
+		case Opt_inodeident:
+			s = match_strdup(&args[0]);
+			if (!s) {
+				ret = -ENOMEM;
+				p9_debug(P9_DEBUG_ERROR,
+					 "problem allocating copy of inodeident arg\n");
+				goto free_and_return;
+			}
+			if (strcmp(s, "none") == 0) {
+				v9ses->flags &= ~V9FS_INODE_IDENT_MASK;
+			} else if (strcmp(s, "path") == 0) {
+				v9ses->flags |= V9FS_INODE_IDENT_PATH;
+			} else {
+				ret = -EINVAL;
+				p9_debug(P9_DEBUG_ERROR, "Unknown inodeident argument %s\n", s);
+			}
+			kfree(s);
+			break;
+
 		default:
 			continue;
 		}
@@ -423,6 +452,8 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
 		v9ses->flags |= V9FS_PROTO_2000U;
 	}
 
+	v9ses->flags |= V9FS_INODE_IDENT_PATH;
+
 	rc = v9fs_parse_options(v9ses, data);
 	if (rc < 0)
 		goto err_clnt;
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 5c85923aa2dd..b4874fdd925e 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -31,6 +31,8 @@
 #define V9FS_ACCESS_MASK V9FS_ACCESS_ANY
 #define V9FS_ACL_MASK V9FS_POSIX_ACL
 
+#define V9FS_INODE_IDENT_MASK (V9FS_INODE_IDENT_PATH)
+
 enum p9_session_flags {
 	V9FS_PROTO_2000U      = 0x01,
 	V9FS_PROTO_2000L      = 0x02,
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 3/6] fs/9p: Hide inodeident=path from show_options as it is the default
  2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
  2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
  2025-04-06 20:43 ` [RFC PATCH 2/6] fs/9p: add default option for path-based inodes Tingmao Wang
@ 2025-04-06 20:43 ` Tingmao Wang
  2025-04-06 20:43 ` [RFC PATCH 4/6] fs/9p: Add ability to identify inode by path for non-.L Tingmao Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Tingmao Wang @ 2025-04-06 20:43 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: Tingmao Wang, v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

I think it makes sense either way, not sure whether we should do this or
not, but including this patch anyway.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 fs/9p/v9fs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 487532295a98..0070d1179021 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -155,7 +155,7 @@ int v9fs_show_options(struct seq_file *m, struct dentry *root)
 		seq_puts(m, ",inodeident=none");
 		break;
 	case V9FS_INODE_IDENT_PATH:
-		seq_puts(m, ",inodeident=path");
+		/* default */
 		break;
 	}
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 4/6] fs/9p: Add ability to identify inode by path for non-.L
  2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (2 preceding siblings ...)
  2025-04-06 20:43 ` [RFC PATCH 3/6] fs/9p: Hide inodeident=path from show_options as it is the default Tingmao Wang
@ 2025-04-06 20:43 ` Tingmao Wang
  2025-07-11 19:12   ` Tingmao Wang
  2025-04-06 20:43 ` [RFC PATCH 5/6] fs/9p: .L: Refresh stale inodes on reuse Tingmao Wang
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 20+ messages in thread
From: Tingmao Wang @ 2025-04-06 20:43 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: Tingmao Wang, v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

This replicates the earlier .L patch for non-.L, and removing some
previously inserted conditionals in shared code.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 fs/9p/v9fs.h      |   7 +--
 fs/9p/vfs_inode.c | 112 +++++++++++++++++++++++++++++++++++++---------
 2 files changed, 96 insertions(+), 23 deletions(-)

diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index b4874fdd925e..3199d516dc8a 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -201,7 +201,8 @@ extern int v9fs_vfs_rename(struct mnt_idmap *idmap,
 			   unsigned int flags);
 extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
 					 struct p9_fid *fid,
-					 struct super_block *sb, int new);
+					 struct super_block *sb,
+					 struct dentry *dentry, int new);
 extern const struct inode_operations v9fs_dir_inode_operations_dotl;
 extern const struct inode_operations v9fs_file_inode_operations_dotl;
 extern const struct inode_operations v9fs_symlink_inode_operations_dotl;
@@ -258,7 +259,7 @@ v9fs_get_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	if (v9fs_proto_dotl(v9ses))
 		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, dentry, 0);
 	else
-		return v9fs_inode_from_fid(v9ses, fid, sb, 0);
+		return v9fs_inode_from_fid(v9ses, fid, sb, dentry, 0);
 }
 
 /**
@@ -276,7 +277,7 @@ v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	if (v9fs_proto_dotl(v9ses))
 		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, dentry, 1);
 	else
-		return v9fs_inode_from_fid(v9ses, fid, sb, 1);
+		return v9fs_inode_from_fid(v9ses, fid, sb, dentry, 1);
 }
 
 #endif
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 72fd72a2ff06..1137a5960ac2 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -363,12 +363,18 @@ void v9fs_evict_inode(struct inode *inode)
 		clear_inode(inode);
 }
 
+struct iget_data {
+	struct p9_wstat *st;
+	struct dentry *dentry;
+};
+
 static int v9fs_test_inode(struct inode *inode, void *data)
 {
 	int umode;
 	dev_t rdev;
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	struct p9_wstat *st = (struct p9_wstat *)data;
+	struct p9_wstat *st = ((struct iget_data *)data)->st;
+	struct dentry *dentry = ((struct iget_data *)data)->dentry;
 	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
 
 	umode = p9mode2unixmode(v9ses, st, &rdev);
@@ -386,26 +392,81 @@ static int v9fs_test_inode(struct inode *inode, void *data)
 
 	if (v9inode->qid.path != st->qid.path)
 		return 0;
+
+	if (v9fs_inode_ident_path(v9ses)) {
+		if (!ino_path_compare(v9inode->path, dentry)) {
+			p9_debug(P9_DEBUG_VFS, "Refusing to reuse inode %p based on path mismatch",
+				 inode);
+			return 0;
+		}
+	}
+
 	return 1;
 }
 
 static int v9fs_test_new_inode(struct inode *inode, void *data)
 {
+	int umode;
+	dev_t rdev;
+	struct v9fs_inode *v9inode = V9FS_I(inode);
+	struct p9_wstat *st = ((struct iget_data *)data)->st;
+	struct dentry *dentry = ((struct iget_data *)data)->dentry;
+	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
+
+	umode = p9mode2unixmode(v9ses, st, &rdev);
+	/*
+	 * Don't reuse inode of different type, even if we have
+	 * inodeident=path and path matches.
+	 */
+	if (inode_wrong_type(inode, umode))
+		return 0;
+
+	/*
+	 * We're only getting here if QID2INO stays the same anyway, so
+	 * mirroring the qid checks in v9fs_test_inode
+	 * (but maybe that check is unnecessary anyway? at least on 64bit)
+	 */
+
+	if (v9inode->qid.path != st->qid.path)
+		return 0;
+
+	if (v9inode->qid.type != st->qid.type)
+		return 0;
+
+	if (v9fs_inode_ident_path(v9ses) && dentry && v9inode->path) {
+		if (ino_path_compare(V9FS_I(inode)->path, dentry)) {
+			p9_debug(P9_DEBUG_VFS, "Refusing to reuse inode %p based on path mismatch",
+				 inode);
+			return 1;
+		}
+	}
+
 	return 0;
 }
 
-static int v9fs_set_inode(struct inode *inode,  void *data)
+static int v9fs_set_inode(struct inode *inode, void *data)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	struct p9_wstat *st = (struct p9_wstat *)data;
+	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
+	struct iget_data *idata = data;
+	struct p9_wstat *st = idata->st;
+	struct dentry *dentry = idata->dentry;
 
 	memcpy(&v9inode->qid, &st->qid, sizeof(st->qid));
+	if (v9fs_inode_ident_path(v9ses)) {
+		if (dentry) {
+			v9inode->path = make_ino_path(dentry);
+			if (!v9inode->path)
+				return -ENOMEM;
+		} else {
+			v9inode->path = NULL;
+		}
+	}
 	return 0;
 }
 
-static struct inode *v9fs_qid_iget(struct super_block *sb,
-				   struct p9_qid *qid,
-				   struct p9_wstat *st,
+static struct inode *v9fs_qid_iget(struct super_block *sb, struct p9_qid *qid,
+				   struct p9_wstat *st, struct dentry *dentry,
 				   int new)
 {
 	dev_t rdev;
@@ -414,13 +475,27 @@ static struct inode *v9fs_qid_iget(struct super_block *sb,
 	struct inode *inode;
 	struct v9fs_session_info *v9ses = sb->s_fs_info;
 	int (*test)(struct inode *inode, void *data);
+	struct iget_data data = {
+		.st = st,
+		.dentry = dentry,
+	};
 
 	if (new)
 		test = v9fs_test_new_inode;
 	else
 		test = v9fs_test_inode;
 
-	inode = iget5_locked(sb, QID2INO(qid), test, v9fs_set_inode, st);
+	if (v9fs_inode_ident_path(v9ses) && dentry) {
+		/*
+		 * We have to take the rename_sem lock here as iget5_locked has
+		 * spinlock in it (inode_hash_lock)
+		 */
+		down_read(&v9ses->rename_sem);
+	}
+	inode = iget5_locked(sb, QID2INO(qid), test, v9fs_set_inode, &data);
+	if (v9fs_inode_ident_path(v9ses) && dentry)
+		up_read(&v9ses->rename_sem);
+
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 	if (!(inode->i_state & I_NEW))
@@ -447,9 +522,9 @@ static struct inode *v9fs_qid_iget(struct super_block *sb,
 
 }
 
-struct inode *
-v9fs_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
-		    struct super_block *sb, int new)
+struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
+				  struct p9_fid *fid, struct super_block *sb,
+				  struct dentry *dentry, int new)
 {
 	struct p9_wstat *st;
 	struct inode *inode = NULL;
@@ -458,7 +533,7 @@ v9fs_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	if (IS_ERR(st))
 		return ERR_CAST(st);
 
-	inode = v9fs_qid_iget(sb, &st->qid, st, new);
+	inode = v9fs_qid_iget(sb, &st->qid, st, dentry, new);
 	p9stat_free(st);
 	kfree(st);
 	return inode;
@@ -608,10 +683,9 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
 			goto error;
 		}
 		/*
-		 * Instantiate inode.  On .L fs, pass in dentry for inodeident=path.
+		 * Instantiate inode.  Pass in dentry for inodeident=path.
 		 */
-		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb,
-			v9fs_proto_dotl(v9ses) ? dentry : NULL);
+		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 		if (IS_ERR(inode)) {
 			err = PTR_ERR(inode);
 			p9_debug(P9_DEBUG_VFS,
@@ -738,19 +812,17 @@ struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 	p9_fid_put(dfid);
 
 	/*
-	 * On .L fs, pass in dentry to v9fs_get_inode_from_fid in case it is
-	 * needed by inodeident=path
+	 * Pass in dentry to v9fs_get_inode_from_fid in case it is needed by
+	 * inodeident=path
 	 */
 	if (fid == ERR_PTR(-ENOENT))
 		inode = NULL;
 	else if (IS_ERR(fid))
 		inode = ERR_CAST(fid);
 	else if (v9ses->cache & (CACHE_META | CACHE_LOOSE))
-		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb,
-			v9fs_proto_dotl(v9ses) ? dentry : NULL);
+		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	else
-		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb,
-			v9fs_proto_dotl(v9ses) ? dentry : NULL);
+		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	/*
 	 * If we had a rename on the server and a parallel lookup
 	 * for the new name, then make sure we instantiate with
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 5/6] fs/9p: .L: Refresh stale inodes on reuse
  2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (3 preceding siblings ...)
  2025-04-06 20:43 ` [RFC PATCH 4/6] fs/9p: Add ability to identify inode by path for non-.L Tingmao Wang
@ 2025-04-06 20:43 ` Tingmao Wang
  2025-04-06 20:43 ` [RFC PATCH 6/6] fs/9p: non-.L: " Tingmao Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 20+ messages in thread
From: Tingmao Wang @ 2025-04-06 20:43 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: Tingmao Wang, v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

Only for uncached mode for now.  We will need to revisit this for cached
mode once we sort out reusing an old inode with changed qid.version.

Note that v9fs_test(_new)?_inode_dotl already makes sure we don't reuse
inodes of the wrong type or different qid.

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 fs/9p/vfs_inode_dotl.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index c1cc3553f2fb..20434a25cb22 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -187,8 +187,23 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
-	if (!(inode->i_state & I_NEW))
+	if (!(inode->i_state & I_NEW)) {
+		/*
+		 * If we're returning an existing inode, we might as well refresh
+		 * it with the metadata we just got.  Refreshing the i_size also
+		 * prevents read errors.
+		 *
+		 * We only do this for uncached mode, since in cached move, any
+		 * change on the inode will bump qid.version, which will result in
+		 * us getting a new inode in the first place.  If we got an old
+		 * inode, let's not touch it for now.
+		 */
+		if (new) {
+			v9fs_stat2inode_dotl(st, inode,
+				(v9ses->cache & CACHE_LOOSE) ?  V9FS_STAT2INODE_KEEP_ISIZE : 0);
+		}
 		return inode;
+	}
 	/*
 	 * initialize the inode with the stat info
 	 * FIXME!! we may need support for stale inodes
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 6/6] fs/9p: non-.L: Refresh stale inodes on reuse
  2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (4 preceding siblings ...)
  2025-04-06 20:43 ` [RFC PATCH 5/6] fs/9p: .L: Refresh stale inodes on reuse Tingmao Wang
@ 2025-04-06 20:43 ` Tingmao Wang
  2025-07-04 18:37 ` [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Mickaël Salaün
  2025-08-08 10:27 ` Dominique Martinet
  7 siblings, 0 replies; 20+ messages in thread
From: Tingmao Wang @ 2025-04-06 20:43 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: Tingmao Wang, v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

This replicates the previous .L commit for non-.L

Signed-off-by: Tingmao Wang <m@maowtm.org>
---
 fs/9p/vfs_inode.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 1137a5960ac2..3f087b0bf1bf 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -498,8 +498,14 @@ static struct inode *v9fs_qid_iget(struct super_block *sb, struct p9_qid *qid,
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
-	if (!(inode->i_state & I_NEW))
+	if (!(inode->i_state & I_NEW)) {
+		/* See explanation in v9fs_qid_iget_dotl */
+		if (new) {
+			v9fs_stat2inode(st, inode, sb,
+				(v9ses->cache & CACHE_LOOSE) ?  V9FS_STAT2INODE_KEEP_ISIZE : 0);
+		}
 		return inode;
+	}
 	/*
 	 * initialize the inode with the stat info
 	 * FIXME!! we may need support for stale inodes
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)
  2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (5 preceding siblings ...)
  2025-04-06 20:43 ` [RFC PATCH 6/6] fs/9p: non-.L: " Tingmao Wang
@ 2025-07-04 18:37 ` Mickaël Salaün
  2025-08-08 10:27 ` Dominique Martinet
  7 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2025-07-04 18:37 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, linux-fsdevel,
	Christian Brauner

Hi,

I tested this patch series and I confirm that it fixes the issue for
Landlock.  Tingmao explained that it also fixes other subsystems such as
fanotify, and 9p itself.  I sent a patch to test this fix with Landlock
and I'll merge it once 9p is fixed:
https://lore.kernel.org/all/20250704171345.1393451-1-mic@digikod.net/

Could you please give it a try in linux-next and consider it for the
next merge window?

Regards,
 Mickaël

On Sun, Apr 06, 2025 at 09:43:01PM +0100, Tingmao Wang wrote:
> Hi 9p-fs maintainers,
> 
> (CC'ing Landlock and Fanotify/inotify people as this affects both use
> cases, although most of my investigation has been on Landlock)
> 
> Previously [1], I noticed that when using 9pfs filesystems, the Landlock
> LSM is blocking access even for files / directories allowed by rules, and
> that this has something to do with 9pfs creating new inodes despite
> Landlock holding a reference to the existing one.  Because Landlock uses
> inodes' in-memory state (i_security) to identify allowed fs
> objects/hierarchies, this causes Landlock to partially break on 9pfs, at
> least in uncached mode, which is the default:
> 
>     # mount -t 9p -o trans=virtio test /mnt
>     # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
>     Executing the sandboxed command...
>     bash: cannot set terminal process group (164): Inappropriate ioctl for device
>     bash: no job control in this shell
>     # cat /mnt/readme
>     cat: /mnt/readme: Permission denied
> 
> This, however, works if somebody is holding onto the dentry (and it also
> works with cache=loose), as in both cases the inode is reused:
> 
>     # tail -f /mnt/readme &
>     [1] 196
>     # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
>     Executing the sandboxed command...
>     bash: cannot set terminal process group (164): Inappropriate ioctl for device
>     bash: no job control in this shell
>     # cat /mnt/readme
>     aa
> 
> It also works on directories if one have a shell that cd into the
> directory.  Note that this means only certain usage of Landlock are
> affected - for example, sandboxing applications that takes a list of files
> to allow, landlocks itself, then evecve.  On the other hand, this does not
> affect applications that opens a file, then Landlocks itself while keeping
> the file it needs open.
> 
> While the above is a very simple example, this is problematic in
> real-world use cases if Landlock is used to sandox applications on system
> that has files mounted via 9pfs, or use 9pfs as the root filesystem.  In
> addition, this also affects fanotify / inotify when using inode mark (for
> local access):
> 
>     root@d8c28a676d72:/# ./fanotify-basic-open /readme & # on virtiofs
>     [1] 173
>     root@d8c28a676d72:/# cat readme
>     aa
>     FAN_OPEN: File /readme
>     root@d8c28a676d72:/# mount -t 9p -o trans=virtio test /mnt
>     root@d8c28a676d72:/# ./fanotify-basic-open /mnt/readme & # on 9pfs
>     [2] 176
>     root@d8c28a676d72:/# cat /mnt/readme
>     aa
>     root@d8c28a676d72:/#
> 
> Same can be demonstrated with inotifywait.  The source code for
> fanotify-basic-open, adopted from the fanotify man page, is available at
> the end of this email.
> 
> Note that this is not a security bug for Landlock since it can only cause
> legitimate access to be denied, but might be a problem for fanotify perm
> (although I do recognize that using perm on individual inodes is already
> perhaps a bit unreliable?)
> 
> It seems that there was an attempt at making 9pfs reuse inodes, based on
> qid.path, however it was reverted [2] due to issues with servers that
> present duplicate qids, for example on a QEMU host that has multiple
> filesystems mounted under a single 9pfs export without multidevs=remap, or
> in the case of other servers that doesn't necessarily support remapping
> qids ([3] and more).  I've done some testing on v6.12-rc4 which has the
> simplified 9pfs inode code before it was reverted, and found that Landlock
> works (however, we of course then have the issue demonstrated in [2]).
> 
> Unrelated to the above problem, it also seems like even with the revert in
> [2], because in cached mode inode are still reused based on qid (and type,
> version (aka mtime), etc), the setup mentioned in [2] still causes
> problems in th latest kernel with cache=loose:
> 
>     host # cd /tmp/linux-test
>     host # mkdir m1 m2
>     host # mount -t tmpfs tmpfs m1
>     host # mount -t tmpfs tmpfs m2
>     host # mkdir m1/dir m2/dir  # needs to be done together so that they have the same mtime
>     host # echo foo > m1/dir/foo
>     host # echo bar > m2/dir/bar
> 
>     guest # mount -t 9p -o trans=virtio,cache=loose test /mnt
>     guest # cd /mnt/m1/dir
>     qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!
>     guest # ls
>     foo
>     guest # ls /mnt/m2/dir
>     foo # <- should be bar
>     guest # uname -a
>     Linux d8c28a676d72 6.14.0-dev #92 SMP PREEMPT_DYNAMIC Sun Apr  6 18:47:54 BST 2025 x86_64 GNU/Linux
> 
> With the above in mind, I have a proposal for 9pfs to:
> 1. Reuse inodes even in uncached mode
> 2. However, reuse them based on qid.path AND the actual pathname, by doing
>    the appropriate testing in v9fs_test(_new)?_inode(_dotl)?
> 
> The main problem here is how to store the pathname in a sensible way and
> tie it to the inode.  For now I opted with an array of names acquired with
> take_dentry_name_snapshot, which reuses the same memory as the dcache to
> store the actual strings, but doesn't tie the lifetime of the dentry with
> the inode (I thought about holding a reference to the dentry in the
> v9fs_inode, but it seemed like a wrong approach and would cause dentries
> to not be evicted/released).
> 
> Maybe ideally there could be a general way for filesystems to tell
> VFS/dcache to "pin" a dentry as long as the inode is alive, but still
> allows the dentry and inode to be evicted based on memory pressure?  In
> fact, if the dentry is alive, we might not even need to do this in 9pfs,
> as we will automatically get the same inode pointed to by the dentry.

Christian, any though?

> 
> Currently this pathname is immutable once attached to an inode, and
> therefore it is not protected by locks or RCU, but this might have to
> change for us to support renames that preserve the inode on next access.
> This is not in this patchset yet.  Also let me know if I've missed any
> locks etc (I'm new to VFS, or for that matter, the kernel).
> 
> Storing one pathname per inode also means we don't reuse the same inode
> for hardlinks -- maybe this can be fixed as well in a future version, if
> this approach sounds good?
> 
> From some QEMU documentation I read [4] it seems like there is a plan to
> resolve these kind of problems in a new version of the protocol, by
> expanding the qid to include the filesystem identifier of a file on the
> host, so maybe this can be disabled after a successful protocol version
> check with the host?  For now this is implemented as a default option,
> inodeident=path, which can be set to 'none' to instead get the previous
> behaviour.
> 
> This patchset is still a bit of a work in progress, and I've not tested
> the performance impact.  It currently uses strncmp to compare paths but
> this might be able to be optimized into a hash comparison first.  It
> should also normally only need to do it for one pair of filenames, as the
> test is only done if qid.path matches in the first place.
> 
> Also, currently the inode is not reused in cached mode if mtime changed
> AND the dentry was evicted -- I considered removing the qid.version test
> in v9fs_test_inode_dotl but I think perhaps care needs to be taken to
> ensure we can refresh an inode that potentially has data cached?  This is
> a TODO for this patchset.
> 
> Another TODO is to maybe add support for case-insensitive matching?
> 
> For this patch series I've tested modifying files on both host and guest,
> changing a file from regular file to dir then back while preserving ino,
> keeping a file open in the guest with a fd, and using Landlock (which
> results in an ihold but does not keep the dentry) then trying to access
> the file from both inside and outside the Landlocked shell.
> 
> Let me know what's your thought on this -- if this is a viable approach
> I'm happy to work on it more and do more testing.  The main motivation
> behind this is getting Landlock to work "out of the box" on 9pfs.
> 
> This patch series was based on, and tested on v6.14 + [5]
> 
> Kind regards,
> Tingmao
> 
> [1]: https://github.com/landlock-lsm/linux/issues/45
> [2]: https://lore.kernel.org/all/20241024-revert_iget-v1-4-4cac63d25f72@codewreck.org/
> [3]: https://lore.kernel.org/all/20240923100508.GA32066@willie-the-truck/
> [4]: https://wiki.qemu.org/Documentation/9p#Protocol_Plans
> [5]: https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/
> 
> fanotify-basic-open.c:
> 
>     #define _GNU_SOURCE /* Needed to get O_LARGEFILE definition */
>     #include <errno.h>
>     #include <fcntl.h>
>     #include <limits.h>
>     #include <poll.h>
>     #include <stdio.h>
>     #include <stdlib.h>
>     #include <sys/fanotify.h>
>     #include <unistd.h>
> 
>     /* Read all available fanotify events from the file descriptor 'fd'. */
> 
>     static void handle_events(int fd)
>     {
>         const struct fanotify_event_metadata *metadata;
>         struct fanotify_event_metadata buf[200];
>         ssize_t size;
>         char path[PATH_MAX];
>         ssize_t path_len;
>         char procfd_path[PATH_MAX];
>         struct fanotify_response response;
> 
>         for (;;) {
>             size = read(fd, buf, sizeof(buf));
>             if (size == -1 && errno != EAGAIN) {
>                 perror("read");
>                 exit(EXIT_FAILURE);
>             }
> 
>             /* Check if end of available data reached. */
> 
>             if (size <= 0)
>                 break;
> 
>             /* Point to the first event in the buffer. */
> 
>             metadata = buf;
> 
>             /* Loop over all events in the buffer. */
> 
>             while (FAN_EVENT_OK(metadata, size)) {
>                 if (metadata->fd >= 0) {
>                     if (metadata->mask & FAN_OPEN) {
>                         printf("FAN_OPEN: ");
>                     }
> 
>                     /* Retrieve and print pathname of the accessed file. */
> 
>                     snprintf(procfd_path, sizeof(procfd_path),
>                         "/proc/self/fd/%d", metadata->fd);
>                     path_len = readlink(procfd_path, path,
>                                 sizeof(path) - 1);
>                     if (path_len == -1) {
>                         perror("readlink");
>                         exit(EXIT_FAILURE);
>                     }
> 
>                     path[path_len] = '\0';
>                     printf("File %s\n", path);
> 
>                     /* Close the file descriptor of the event. */
> 
>                     close(metadata->fd);
>                 }
> 
>                 /* Advance to next event. */
> 
>                 metadata = FAN_EVENT_NEXT(metadata, size);
>             }
>         }
>     }
> 
>     int main(int argc, char *argv[])
>     {
>         char buf;
>         int fd, poll_num;
>         nfds_t nfds;
>         struct pollfd fds[2];
> 
>         /* Check mount point is supplied. */
> 
>         if (argc != 2) {
>             fprintf(stderr, "Usage: %s FILE\n", argv[0]);
>             exit(EXIT_FAILURE);
>         }
> 
>         fd = fanotify_init(0, O_RDONLY | O_LARGEFILE);
>         if (fd == -1) {
>             perror("fanotify_init");
>             exit(EXIT_FAILURE);
>         }
> 
>         if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_INODE, FAN_OPEN, AT_FDCWD,
>                 argv[1]) == -1) {
>             perror("fanotify_mark");
>             exit(EXIT_FAILURE);
>         }
> 
>         while (1) {
>             handle_events(fd);
>         }
>     }
> 
> Tingmao Wang (6):
>   fs/9p: Add ability to identify inode by path for .L
>   fs/9p: add default option for path-based inodes
>   fs/9p: Hide inodeident=path from show_options as it is the default
>   fs/9p: Add ability to identify inode by path for non-.L
>   fs/9p: .L: Refresh stale inodes on reuse
>   fs/9p: non-.L: Refresh stale inodes on reuse
> 
>  fs/9p/Makefile         |   3 +-
>  fs/9p/ino_path.c       | 114 ++++++++++++++++++++++++++++++++++++++
>  fs/9p/v9fs.c           |  33 ++++++++++-
>  fs/9p/v9fs.h           |  63 +++++++++++++++------
>  fs/9p/vfs_inode.c      | 122 +++++++++++++++++++++++++++++++++++------
>  fs/9p/vfs_inode_dotl.c | 108 +++++++++++++++++++++++++++++++++---
>  fs/9p/vfs_super.c      |  10 +++-
>  7 files changed, 406 insertions(+), 47 deletions(-)
>  create mode 100644 fs/9p/ino_path.c
> 
> 
> base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
> prerequisite-patch-id: 12dc6676db52ff32eed082b1e5d273f297737f61
> prerequisite-patch-id: 93ab54c52a41fa44b8d0baf55df949d0ad27e99a
> prerequisite-patch-id: 5f558bf969e6eaa3d011c98de0806ca8ad369efe
> --
> 2.39.5
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
  2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
@ 2025-07-05  0:19   ` Al Viro
  2025-07-05  0:25   ` Al Viro
  2025-07-11 19:11   ` Tingmao Wang
  2 siblings, 0 replies; 20+ messages in thread
From: Al Viro @ 2025-07-05  0:19 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck, v9fs, Mickaël Salaün,
	Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel

On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:

> +struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> +{
> +	struct v9fs_ino_path *path;
> +	size_t path_components = 0;
> +	struct dentry *curr = dentry;
> +	ssize_t i;
> +
> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> +
> +	rcu_read_lock();
> +
> +    /* Don't include the root dentry */
> +	while (curr->d_parent != curr) {
> +		path_components++;
> +		curr = curr->d_parent;
> +	}
> +	if (WARN_ON(path_components > SSIZE_MAX)) {
> +		rcu_read_unlock();
> +		return NULL;
> +	}
> +
> +	path = kmalloc(struct_size(path, names, path_components),
> +		       GFP_KERNEL);

Blocking allocation under rcu_read_lock().

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
  2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
  2025-07-05  0:19   ` Al Viro
@ 2025-07-05  0:25   ` Al Viro
  2025-07-11 19:11     ` Tingmao Wang
  2025-07-11 19:11   ` Tingmao Wang
  2 siblings, 1 reply; 20+ messages in thread
From: Al Viro @ 2025-07-05  0:25 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck, v9fs, Mickaël Salaün,
	Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel

On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
> +			     struct dentry *dentry)
> +{
> +	struct dentry *curr = dentry;
> +	struct qstr *curr_name;
> +	struct name_snapshot *compare;
> +	ssize_t i;
> +
> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> +
> +	rcu_read_lock();
> +	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> +		if (curr->d_parent == curr) {
> +			/* We're supposed to have more components to walk */
> +			rcu_read_unlock();
> +			return false;
> +		}
> +		curr_name = &curr->d_name;
> +		compare = &ino_path->names[i];
> +		/*
> +		 * We can't use hash_len because it is salted with the parent
> +		 * dentry pointer.  We could make this faster by pre-computing our
> +		 * own hashlen for compare and ino_path outside, probably.
> +		 */
> +		if (curr_name->len != compare->name.len) {
> +			rcu_read_unlock();
> +			return false;
> +		}
> +		if (strncmp(curr_name->name, compare->name.name,
> +			    curr_name->len) != 0) {

... without any kind of protection for curr_name.  Incidentally,
what about rename()?  Not a cross-directory one, just one that
changes the name of a subdirectory within the same parent?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
  2025-07-05  0:25   ` Al Viro
@ 2025-07-11 19:11     ` Tingmao Wang
  2025-08-08  8:32       ` Mickaël Salaün
  0 siblings, 1 reply; 20+ messages in thread
From: Tingmao Wang @ 2025-07-11 19:11 UTC (permalink / raw)
  To: Al Viro
  Cc: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck, v9fs, Mickaël Salaün,
	Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, linux-fsdevel

Hi Al, thanks for the review :)  I haven't had the chance to properly
think about this until today, so apologies for the delay.

On 7/5/25 01:19, Al Viro wrote:
> On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> 
>> +struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
>> +{
>> +	struct v9fs_ino_path *path;
>> +	size_t path_components = 0;
>> +	struct dentry *curr = dentry;
>> +	ssize_t i;
>> +
>> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
>> +
>> +	rcu_read_lock();
>> +
>> +    /* Don't include the root dentry */
>> +	while (curr->d_parent != curr) {
>> +		path_components++;
>> +		curr = curr->d_parent;
>> +	}
>> +	if (WARN_ON(path_components > SSIZE_MAX)) {

(Looking at this again I think this check is a bit bogus.  I don't know
how would it be possible at all for us to have > SSIZE_MAX deep
directories especially since each level requires a dentry allocation, but
even if this check is actually useful, it should be in the while loop,
before each path_components++)

>> +		rcu_read_unlock();
>> +		return NULL;
>> +	}
>> +
>> +	path = kmalloc(struct_size(path, names, path_components),
>> +		       GFP_KERNEL);
> 
> Blocking allocation under rcu_read_lock().

I think my first instinct of how to fix this, if the original code is
correct barring this allocation issue, would be to take rcu read lock
twice (first walk to calculate how much to allocate, then second walk to
actually take the snapshots).  We should be safe to rcu_read_unlock() in
the middle as long as caller has a reference to the target dentry (this
needs to be true even if we just do one rcu_read_lock() anyway), and we
can start a parent walk again.  The v9fs rename_sem should ensure we see
the same path again.

Alternatively, we can use dget_parent to do the walk, and not lock RCU at
all.  We still need to walk twice tho, to know how much to allocate.  But
for now I will keep the current approach.

New version:

/*
 * Must hold rename_sem due to traversing parents.  Caller must hold
 * reference to dentry.
 */
struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
{
	struct v9fs_ino_path *path;
	size_t path_components = 0;
	struct dentry *curr = dentry;
	ssize_t i;

	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
	might_sleep(); /* Allocation below might block */

	rcu_read_lock();

	/* Don't include the root dentry */
	while (curr->d_parent != curr) {
		if (WARN_ON(path_components >= SSIZE_MAX)) {
			rcu_read_unlock();
			return NULL;
		}
		path_components++;
		curr = curr->d_parent;
	}

	/*
	 * Allocation can block so don't do it in RCU (and because the
	 * allocation might be large, since name_snapshot leaves space for
	 * inline str, not worth trying GFP_ATOMIC)
	 */
	rcu_read_unlock();

	path = kmalloc(struct_size(path, names, path_components), GFP_KERNEL);
	if (!path) {
		rcu_read_unlock();
		return NULL;
	}

	path->nr_components = path_components;
	curr = dentry;

	rcu_read_lock();
	for (i = path_components - 1; i >= 0; i--) {
		take_dentry_name_snapshot(&path->names[i], curr);
		curr = curr->d_parent;
	}
	WARN_ON(curr != curr->d_parent);
	rcu_read_unlock();
	return path;
}

How does this look?

On 7/5/25 01:25, Al Viro wrote:
> On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
>> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
>> +			     struct dentry *dentry)
>> +{
>> +	struct dentry *curr = dentry;
>> +	struct qstr *curr_name;
>> +	struct name_snapshot *compare;
>> +	ssize_t i;
>> +
>> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
>> +
>> +	rcu_read_lock();
>> +	for (i = ino_path->nr_components - 1; i >= 0; i--) {
>> +		if (curr->d_parent == curr) {
>> +			/* We're supposed to have more components to walk */
>> +			rcu_read_unlock();
>> +			return false;
>> +		}
>> +		curr_name = &curr->d_name;
>> +		compare = &ino_path->names[i];
>> +		/*
>> +		 * We can't use hash_len because it is salted with the parent
>> +		 * dentry pointer.  We could make this faster by pre-computing our
>> +		 * own hashlen for compare and ino_path outside, probably.
>> +		 */
>> +		if (curr_name->len != compare->name.len) {
>> +			rcu_read_unlock();
>> +			return false;
>> +		}
>> +		if (strncmp(curr_name->name, compare->name.name,
>> +			    curr_name->len) != 0) {
> 
> ... without any kind of protection for curr_name.  Incidentally,
> what about rename()?  Not a cross-directory one, just one that
> changes the name of a subdirectory within the same parent?

As far as I can tell, in v9fs_vfs_rename, v9ses->rename_sem is taken for
both same-parent and different parent renames, so I think we're safe here
(and hopefully for any v9fs dentries, nobody should be causing d_name to
change except for ourselves when we call d_move in v9fs_vfs_rename?  If
yes then because we also take v9ses->rename_sem, in theory we should be
fine here...?)

(Let me know if I missed anything.  I'm assuming only the filesystem
"owning" a dentry should d_move/d_exchange the dentry.)

However, I see that there is a d_same_name function in dcache.c which is
slightly more careful (but still requires the caller to check the dentry
seqcount, which we do not need to because of the reasoning above), and in
hindsight I think that is probably the more proper way to do this
comparison (and will also handle case-insensitivity, although I've not
explored if this is applicable to 9pfs).

New version:

/*
 * Must hold rename_sem due to traversing parents
 */
bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
{
	struct dentry *curr = dentry;
	struct name_snapshot *compare;
	ssize_t i;

	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);

	rcu_read_lock();
	for (i = ino_path->nr_components - 1; i >= 0; i--) {
		if (curr->d_parent == curr) {
			/* We're supposed to have more components to walk */
			rcu_read_unlock();
			return false;
		}
		compare = &ino_path->names[i];
		if (!d_same_name(curr, curr->d_parent, &compare->name)) {
			rcu_read_unlock();
			return false;
		}
		curr = curr->d_parent;
	}
	rcu_read_unlock();
	if (curr != curr->d_parent) {
		/* dentry is deeper than ino_path */
		return false;
	}
	return true;
}

If you think this is not enough, can you suggest what would be needed?
I'm thinking maybe we can check dentry seqcount to be safe, but from
earlier discussion in "bpf path iterator" my impression is that that is
VFS internal data - can we use it here (if needed)?

(I assume, from looking at the code, just having a reference does not
prevent d_name from changing)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
  2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
  2025-07-05  0:19   ` Al Viro
  2025-07-05  0:25   ` Al Viro
@ 2025-07-11 19:11   ` Tingmao Wang
  2 siblings, 0 replies; 20+ messages in thread
From: Tingmao Wang @ 2025-07-11 19:11 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

On 4/6/25 21:43, Tingmao Wang wrote:
> [...]
>  
> +struct iget_data {
> +	struct p9_stat_dotl *st;
> +	struct dentry *dentry;
> +};
> +
>  static int v9fs_test_inode_dotl(struct inode *inode, void *data)
>  {
>  	struct v9fs_inode *v9inode = V9FS_I(inode);
> -	struct p9_stat_dotl *st = (struct p9_stat_dotl *)data;
> +	struct p9_stat_dotl *st = ((struct iget_data *)data)->st;
> +	struct dentry *dentry = ((struct iget_data *)data)->dentry;
> +	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
>  
>  	/* don't match inode of different type */
>  	if (inode_wrong_type(inode, st->st_mode))
> @@ -74,22 +81,74 @@ static int v9fs_test_inode_dotl(struct inode *inode, void *data)
>  
>  	if (v9inode->qid.path != st->qid.path)
>  		return 0;
> +
> +	if (v9fs_inode_ident_path(v9ses)) {
> +		if (!ino_path_compare(v9inode->path, dentry)) {
> +			p9_debug(P9_DEBUG_VFS, "Refusing to reuse inode %p based on path mismatch",
> +				 inode);
> +			return 0;
> +		}
> +	}
>  	return 1;
>  }
>  
>  /* Always get a new inode */>  static int v9fs_test_new_inode_dotl(struct inode *inode, void *data)

Looking back, this function should probably be renamed to something like
"v9fs_test_inode_uncached" since it now no longer "always get a new
inode".

Actually, maybe this should be merged with v9fs_test_inode_dotl - the
behavior is basically the same when inodeident=path is enabled.  Maybe the
approach could be that v9fs always re-use inodes (as long as qid matches,
and when inodeident=path, the path matches as well), but if in uncached
mode, it will also always refresh metadata?  (Basically inodes has to be
re-used, even in uncached mode, for Landlock and Fanotify using inode
marks to work)

Doing so does mean that if one sets inodeident=none, in a pathological
9pfs where different file/dirs have same qids, the inode will mistakenly
be re-used (like before be2ca38253 (Revert "fs/9p: simplify iget to remove
unnecessary paths")), but given that the user has specifically set
inodeident=none (i.e. not the default as proposed in this patch), I wonder
if this is acceptable behaviour?

>  {
> +	struct v9fs_inode *v9inode = V9FS_I(inode);
> +	struct p9_stat_dotl *st = ((struct iget_data *)data)->st;
> +	struct dentry *dentry = ((struct iget_data *)data)->dentry;
> +	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
> +
> +	/*
> +	 * Don't reuse inode of different type, even if we have
> +	 * inodeident=path and path matches.
> +	 */
> +	if (inode_wrong_type(inode, st->st_mode))
> +		return 0;
> +
> +	/*
> +	 * We're only getting here if QID2INO stays the same anyway, so
> +	 * mirroring the qid checks in v9fs_test_inode_dotl
> +	 * (but maybe that check is unnecessary anyway? at least on 64bit)
> +	 */
> +
> +	if (v9inode->qid.type != st->qid.type)
> +		return 0;
> +
> +	if (v9inode->qid.path != st->qid.path)
> +		return 0;
> +
> +	if (v9fs_inode_ident_path(v9ses) && dentry && v9inode->path) {
> +		if (ino_path_compare(V9FS_I(inode)->path, dentry)) {
> +			p9_debug(P9_DEBUG_VFS,
> +				 "Reusing inode %p based on path match", inode);
> +			return 1;
> +		}
> +	}
> +
>  	return 0;
>  }
>  
>  static int v9fs_set_inode_dotl(struct inode *inode,  void *data)
>  {
>  	struct v9fs_inode *v9inode = V9FS_I(inode);
> -	struct p9_stat_dotl *st = (struct p9_stat_dotl *)data;
> +	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
> +	struct iget_data *idata = data;
> +	struct p9_stat_dotl *st = idata->st;
> +	struct dentry *dentry = idata->dentry;
>  
>  	memcpy(&v9inode->qid, &st->qid, sizeof(st->qid));
>  	inode->i_generation = st->st_gen;
> +	if (v9fs_inode_ident_path(v9ses)) {
> +		if (dentry) {
> +			v9inode->path = make_ino_path(dentry);
> +			if (!v9inode->path)
> +				return -ENOMEM;
> +		} else {
> +			v9inode->path = NULL;
> +		}
> +	}

I realized that this leaves v9inode->path uninitialized if
inodeident=none.  The proper way is

	v9inode->path = NULL;
	if (v9fs_inode_ident_path(v9ses) && dentry) {
		v9inode->path = make_ino_path(dentry);
		if (!v9inode->path)
			return -ENOMEM;
	}

Same change applies for the non-.L version.

>  	return 0;
>  }
>  
> [...]


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 4/6] fs/9p: Add ability to identify inode by path for non-.L
  2025-04-06 20:43 ` [RFC PATCH 4/6] fs/9p: Add ability to identify inode by path for non-.L Tingmao Wang
@ 2025-07-11 19:12   ` Tingmao Wang
  0 siblings, 0 replies; 20+ messages in thread
From: Tingmao Wang @ 2025-07-11 19:12 UTC (permalink / raw)
  To: Eric Van Hensbergen, Dominique Martinet, Latchesar Ionkov,
	Christian Schoenebeck
  Cc: v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

On 4/6/25 21:43, Tingmao Wang wrote:
> [...]
> -static int v9fs_set_inode(struct inode *inode,  void *data)
> +static int v9fs_set_inode(struct inode *inode, void *data)
>  {
>  	struct v9fs_inode *v9inode = V9FS_I(inode);
> -	struct p9_wstat *st = (struct p9_wstat *)data;
> +	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
> +	struct iget_data *idata = data;
> +	struct p9_wstat *st = idata->st;
> +	struct dentry *dentry = idata->dentry;
>  
>  	memcpy(&v9inode->qid, &st->qid, sizeof(st->qid));
> +	if (v9fs_inode_ident_path(v9ses)) {
> +		if (dentry) {
> +			v9inode->path = make_ino_path(dentry);
> +			if (!v9inode->path)
> +				return -ENOMEM;
> +		} else {
> +			v9inode->path = NULL;
> +		}
> +	}

Fix v9inode->path uninitialized if inodeident=none, similar to the .L
case:

	if (v9fs_inode_ident_path(v9ses) && dentry) {
		v9inode->path = make_ino_path(dentry);
		if (!v9inode->path)
			return -ENOMEM;
	}

>  	return 0;
>  }
>  
> [...]


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
  2025-07-11 19:11     ` Tingmao Wang
@ 2025-08-08  8:32       ` Mickaël Salaün
  2025-08-12 23:57         ` Tingmao Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Mickaël Salaün @ 2025-08-08  8:32 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Al Viro, Eric Van Hensbergen, Dominique Martinet,
	Latchesar Ionkov, Christian Schoenebeck, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

On Fri, Jul 11, 2025 at 08:11:44PM +0100, Tingmao Wang wrote:
> Hi Al, thanks for the review :)  I haven't had the chance to properly
> think about this until today, so apologies for the delay.
> 
> On 7/5/25 01:19, Al Viro wrote:
> > On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> > 
> >> +struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> >> +{
> >> +	struct v9fs_ino_path *path;
> >> +	size_t path_components = 0;
> >> +	struct dentry *curr = dentry;
> >> +	ssize_t i;
> >> +
> >> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >> +
> >> +	rcu_read_lock();
> >> +
> >> +    /* Don't include the root dentry */
> >> +	while (curr->d_parent != curr) {
> >> +		path_components++;
> >> +		curr = curr->d_parent;
> >> +	}
> >> +	if (WARN_ON(path_components > SSIZE_MAX)) {
> 
> (Looking at this again I think this check is a bit bogus.  I don't know
> how would it be possible at all for us to have > SSIZE_MAX deep
> directories especially since each level requires a dentry allocation, but
> even if this check is actually useful, it should be in the while loop,
> before each path_components++)

WARN_ON_ONCE() would be better, especially in a while loop.  I avoid
using WARN_ON(), especially when that can be triggered by users.

> 
> >> +		rcu_read_unlock();
> >> +		return NULL;
> >> +	}
> >> +
> >> +	path = kmalloc(struct_size(path, names, path_components),
> >> +		       GFP_KERNEL);
> > 
> > Blocking allocation under rcu_read_lock().
> 
> I think my first instinct of how to fix this, if the original code is
> correct barring this allocation issue, would be to take rcu read lock
> twice (first walk to calculate how much to allocate, then second walk to
> actually take the snapshots).  We should be safe to rcu_read_unlock() in
> the middle as long as caller has a reference to the target dentry (this
> needs to be true even if we just do one rcu_read_lock() anyway), and we
> can start a parent walk again.  The v9fs rename_sem should ensure we see
> the same path again.
> 
> Alternatively, we can use dget_parent to do the walk, and not lock RCU at
> all.  We still need to walk twice tho, to know how much to allocate.  But
> for now I will keep the current approach.
> 
> New version:
> 
> /*
>  * Must hold rename_sem due to traversing parents.  Caller must hold
>  * reference to dentry.
>  */
> struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> {
> 	struct v9fs_ino_path *path;
> 	size_t path_components = 0;
> 	struct dentry *curr = dentry;
> 	ssize_t i;
> 
> 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> 	might_sleep(); /* Allocation below might block */
> 
> 	rcu_read_lock();
> 
> 	/* Don't include the root dentry */
> 	while (curr->d_parent != curr) {
> 		if (WARN_ON(path_components >= SSIZE_MAX)) {
> 			rcu_read_unlock();
> 			return NULL;
> 		}
> 		path_components++;
> 		curr = curr->d_parent;
> 	}
> 
> 	/*
> 	 * Allocation can block so don't do it in RCU (and because the
> 	 * allocation might be large, since name_snapshot leaves space for
> 	 * inline str, not worth trying GFP_ATOMIC)
> 	 */
> 	rcu_read_unlock();
> 
> 	path = kmalloc(struct_size(path, names, path_components), GFP_KERNEL);
> 	if (!path) {
> 		rcu_read_unlock();

This unlock is wrong.

> 		return NULL;
> 	}
> 
> 	path->nr_components = path_components;
> 	curr = dentry;
> 
> 	rcu_read_lock();
> 	for (i = path_components - 1; i >= 0; i--) {
> 		take_dentry_name_snapshot(&path->names[i], curr);
> 		curr = curr->d_parent;
> 	}
> 	WARN_ON(curr != curr->d_parent);
> 	rcu_read_unlock();
> 	return path;
> }
> 
> How does this look?

Looks good to me overall.  Please sent a new patch series.

> 
> On 7/5/25 01:25, Al Viro wrote:
> > On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> >> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
> >> +			     struct dentry *dentry)
> >> +{
> >> +	struct dentry *curr = dentry;
> >> +	struct qstr *curr_name;
> >> +	struct name_snapshot *compare;
> >> +	ssize_t i;
> >> +
> >> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >> +
> >> +	rcu_read_lock();
> >> +	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> >> +		if (curr->d_parent == curr) {
> >> +			/* We're supposed to have more components to walk */
> >> +			rcu_read_unlock();
> >> +			return false;
> >> +		}
> >> +		curr_name = &curr->d_name;
> >> +		compare = &ino_path->names[i];
> >> +		/*
> >> +		 * We can't use hash_len because it is salted with the parent
> >> +		 * dentry pointer.  We could make this faster by pre-computing our
> >> +		 * own hashlen for compare and ino_path outside, probably.
> >> +		 */
> >> +		if (curr_name->len != compare->name.len) {
> >> +			rcu_read_unlock();
> >> +			return false;
> >> +		}
> >> +		if (strncmp(curr_name->name, compare->name.name,
> >> +			    curr_name->len) != 0) {
> > 
> > ... without any kind of protection for curr_name.  Incidentally,
> > what about rename()?  Not a cross-directory one, just one that
> > changes the name of a subdirectory within the same parent?
> 
> As far as I can tell, in v9fs_vfs_rename, v9ses->rename_sem is taken for
> both same-parent and different parent renames, so I think we're safe here
> (and hopefully for any v9fs dentries, nobody should be causing d_name to
> change except for ourselves when we call d_move in v9fs_vfs_rename?  If
> yes then because we also take v9ses->rename_sem, in theory we should be
> fine here...?)

A lockdep_assert_held() or similar and a comment would make this clear.

> 
> (Let me know if I missed anything.  I'm assuming only the filesystem
> "owning" a dentry should d_move/d_exchange the dentry.)
> 
> However, I see that there is a d_same_name function in dcache.c which is
> slightly more careful (but still requires the caller to check the dentry
> seqcount, which we do not need to because of the reasoning above), and in
> hindsight I think that is probably the more proper way to do this
> comparison (and will also handle case-insensitivity, although I've not
> explored if this is applicable to 9pfs).
> 
> New version:
> 
> /*
>  * Must hold rename_sem due to traversing parents
>  */
> bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
> {
> 	struct dentry *curr = dentry;
> 	struct name_snapshot *compare;
> 	ssize_t i;
> 
> 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> 
> 	rcu_read_lock();
> 	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> 		if (curr->d_parent == curr) {
> 			/* We're supposed to have more components to walk */
> 			rcu_read_unlock();
> 			return false;
> 		}
> 		compare = &ino_path->names[i];
> 		if (!d_same_name(curr, curr->d_parent, &compare->name)) {
> 			rcu_read_unlock();
> 			return false;
> 		}
> 		curr = curr->d_parent;
> 	}
> 	rcu_read_unlock();
> 	if (curr != curr->d_parent) {
> 		/* dentry is deeper than ino_path */
> 		return false;
> 	}
> 	return true;
> }

I like this new version.

> 
> If you think this is not enough, can you suggest what would be needed?
> I'm thinking maybe we can check dentry seqcount to be safe, but from
> earlier discussion in "bpf path iterator" my impression is that that is
> VFS internal data - can we use it here (if needed)?
> 
> (I assume, from looking at the code, just having a reference does not
> prevent d_name from changing)
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)
  2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (6 preceding siblings ...)
  2025-07-04 18:37 ` [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Mickaël Salaün
@ 2025-08-08 10:27 ` Dominique Martinet
  2025-08-08 10:52   ` Christian Schoenebeck
  7 siblings, 1 reply; 20+ messages in thread
From: Dominique Martinet @ 2025-08-08 10:27 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck,
	v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

Sorry for the delay...

Tingmao Wang wrote on Sun, Apr 06, 2025 at 09:43:01PM +0100:
> Unrelated to the above problem, it also seems like even with the revert in
> [2], because in cached mode inode are still reused based on qid (and type,
> version (aka mtime), etc), the setup mentioned in [2] still causes
> problems in th latest kernel with cache=loose:

cache=loose is "you're on your own", I think it's fine to keep as is,
especially given qemu can handle it with multidevs=remap if required

> With the above in mind, I have a proposal for 9pfs to:
> 1. Reuse inodes even in uncached mode
> 2. However, reuse them based on qid.path AND the actual pathname, by doing
>    the appropriate testing in v9fs_test(_new)?_inode(_dotl)?

I think that's fine for cache=none, but it breaks hardlinks on
cache=loose so I think this ought to only be done without cache
(I haven't really played with the cache flag bits, not check pathname if
any of loose, writeback or metadata are set?)

> The main problem here is how to store the pathname in a sensible way and
> tie it to the inode.  For now I opted with an array of names acquired with
> take_dentry_name_snapshot, which reuses the same memory as the dcache to
> store the actual strings, but doesn't tie the lifetime of the dentry with
> the inode (I thought about holding a reference to the dentry in the
> v9fs_inode, but it seemed like a wrong approach and would cause dentries
> to not be evicted/released).

That's pretty hard to get right and I wish we had more robust testing
there... But I guess that's appropriate enough.

I know Atos has done an implementation that keeps the full path
somewhere to re-open fids in case of server reconnections, but that code
has never been submitted upstream that I can see so I can't check how
they used to store the path :/ Ohwell.

> Storing one pathname per inode also means we don't reuse the same inode
> for hardlinks -- maybe this can be fixed as well in a future version, if
> this approach sounds good?

Ah, you pointed it out yourself. I don't see how we could fix that, as
we have no way other than the qid to identify hard links; so this really
ought to depend on cache level if you want to support landlock/*notify
in cache=none.

Frankly the *notify use-case is difficult to support properly, as files
can change from under us (e.g. modifying the file directly on the host
in qemu case, or just multiple mounts of the same directory), so it
can't be relied on in the general case anyway -- 9p doesn't have
anything like NFSv4 leases to get notified when other clients write a
file we "own", so whatever we do will always be limited...
But I guess it can make sense for limited monitoring e.g. rebuilding
something on change and things like that?

-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)
  2025-08-08 10:27 ` Dominique Martinet
@ 2025-08-08 10:52   ` Christian Schoenebeck
  2025-08-12 23:53     ` Tingmao Wang
  0 siblings, 1 reply; 20+ messages in thread
From: Christian Schoenebeck @ 2025-08-08 10:52 UTC (permalink / raw)
  To: Tingmao Wang, Dominique Martinet
  Cc: Eric Van Hensbergen, Latchesar Ionkov, v9fs,
	Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

On Friday, August 8, 2025 12:27:15 PM CEST Dominique Martinet wrote:
> Sorry for the delay...
> 
> Tingmao Wang wrote on Sun, Apr 06, 2025 at 09:43:01PM +0100:
> > Unrelated to the above problem, it also seems like even with the revert in
> > [2], because in cached mode inode are still reused based on qid (and type,
> > version (aka mtime), etc), the setup mentioned in [2] still causes
> > problems in th latest kernel with cache=loose:
> 
> cache=loose is "you're on your own", I think it's fine to keep as is,
> especially given qemu can handle it with multidevs=remap if required

As of QEMU 10.0, multidevs=remap (i.e. remapping inodes from host to guest) is
now the default behaviour, since inode collisions were constantly causing
issues and confusion among 9p users.

And yeah, cache=loose means 9p client is blind for whatever changes on 9p
server side.

/Christian



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)
  2025-08-08 10:52   ` Christian Schoenebeck
@ 2025-08-12 23:53     ` Tingmao Wang
  2025-08-13  5:30       ` Dominique Martinet
  0 siblings, 1 reply; 20+ messages in thread
From: Tingmao Wang @ 2025-08-12 23:53 UTC (permalink / raw)
  To: Dominique Martinet, Christian Schoenebeck
  Cc: Eric Van Hensbergen, Latchesar Ionkov, v9fs,
	Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, linux-fsdevel

On 8/8/25 11:27, Dominique Martinet wrote:
> Sorry for the delay...

No worries, thanks for picking this up!

> 
> Tingmao Wang wrote on Sun, Apr 06, 2025 at 09:43:01PM +0100:
>> Unrelated to the above problem, it also seems like even with the revert in
>> [2], because in cached mode inode are still reused based on qid (and type,
>> version (aka mtime), etc), the setup mentioned in [2] still causes
>> problems in th latest kernel with cache=loose:
> 
> cache=loose is "you're on your own", I think it's fine to keep as is,
> especially given qemu can handle it with multidevs=remap if required

On 8/8/25 11:52, Christian Schoenebeck wrote:
>> [...]
> 
> As of QEMU 10.0, multidevs=remap (i.e. remapping inodes from host to guest) is
> now the default behaviour, since inode collisions were constantly causing
> issues and confusion among 9p users.
> 
> And yeah, cache=loose means 9p client is blind for whatever changes on 9p
> server side.
> 
> /Christian
> 

My understanding of cache=loose was that it only assumes that the server
side can't change the fs under the client, but otherwise things like
inodes should work identically, even though currently it works differently
due to (only) cached mode re-using inodes - was this deliberate?

Basically, assuming no unexpected server side changes, I would think that
the user would not be totally "on their own" (even if there are colliding
qids).  But given the issue with breaking how hardlinks currently works in
cached mode, as Dominique mentioned below, and the fact that inode
collisions should be rare given the new QEMU default, maybe we do need to
handle this differently (i.e. keep the existing behaviour and working
hardlinks for cached mode)?  I'm happy either way (Landlock already works
currently with cached mode)

On 8/8/25 11:27, Dominique Martinet wrote:
> Tingmao Wang wrote on Sun, Apr 06, 2025 at 09:43:01PM +0100:
> [...]
>> With the above in mind, I have a proposal for 9pfs to:
>> 1. Reuse inodes even in uncached mode
>> 2. However, reuse them based on qid.path AND the actual pathname, by doing
>>    the appropriate testing in v9fs_test(_new)?_inode(_dotl)?
> 
> I think that's fine for cache=none, but it breaks hardlinks on
> cache=loose so I think this ought to only be done without cache
> (I haven't really played with the cache flag bits, not check pathname if
> any of loose, writeback or metadata are set?)
> 

I think currently 9pfs reuse inodes if cache is either loose or metadata
(but not writeback), given:

	else if (v9ses->cache & (CACHE_META|CACHE_LOOSE))
		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb);
	else
		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);

in v9fs_vfs_lookup, so maybe we keep this pattern and not check pathname
if (loose|metadata) is set (but not writeback)?

>> The main problem here is how to store the pathname in a sensible way and
>> tie it to the inode.  For now I opted with an array of names acquired with
>> take_dentry_name_snapshot, which reuses the same memory as the dcache to
>> store the actual strings

(Self correction: technically, the space is only shared if they are long
enough to not be inlined, which is DNAME_INLINE_LEN = 40 for 64bit or 20
for 32bit, so in most cases the names would probably be copied.  Maybe it
would be more compact in terms of memory usage to just store the path as a
string, with '/' separating components?  But then the code would be more
complex and we can't easily use d_same_name anymore, so maybe it's not
worth doing, unless this will prove useful for other purposes, like the
re-opening of fid mentioned below?)

>> , but doesn't tie the lifetime of the dentry with
>> the inode (I thought about holding a reference to the dentry in the
>> v9fs_inode, but it seemed like a wrong approach and would cause dentries
>> to not be evicted/released).
> 
> That's pretty hard to get right and I wish we had more robust testing
> there... But I guess that's appropriate enough.
> 
> I know Atos has done an implementation that keeps the full path
> somewhere to re-open fids in case of server reconnections, but that code
> has never been submitted upstream that I can see so I can't check how
> they used to store the path :/ Ohwell.
> 
>> Storing one pathname per inode also means we don't reuse the same inode
>> for hardlinks -- maybe this can be fixed as well in a future version, if
>> this approach sounds good?
> 
> Ah, you pointed it out yourself. I don't see how we could fix that, as
> we have no way other than the qid to identify hard links; so this really
> ought to depend on cache level if you want to support landlock/*notify
> in cache=none.

In that case, and as discussed above, I'm happy to change this patch
series to only affect uncached.

> 
> Frankly the *notify use-case is difficult to support properly, as files
> can change from under us (e.g. modifying the file directly on the host
> in qemu case, or just multiple mounts of the same directory), so it
> can't be relied on in the general case anyway -- 9p doesn't have
> anything like NFSv4 leases to get notified when other clients write a
> file we "own", so whatever we do will always be limited...
> But I guess it can make sense for limited monitoring e.g. rebuilding
> something on change and things like that?

One of the first use case I can think of here is IDE/code editors
reloading state (e.g. the file tree) on change, which I think didn't work
for 9pfs folders opened with vscode if I remembered correctly (but I
haven't tested this recently).  Even though we can't monitor for remote
changes, having this work for local changes would still be nice.

Note that aside from Mickaël's comments which I will apply in the next
version, I also realized that I haven't done the proper handling for
renames (it should probably update the v9fs_ino_path in the renamed
inode).  I will do that in the next version (in addition to changing the
cached behaviour).  Do let me know if you have any other comment on the
patch series.

Best,
Tingmao

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
  2025-08-08  8:32       ` Mickaël Salaün
@ 2025-08-12 23:57         ` Tingmao Wang
  2025-08-13  7:47           ` Mickaël Salaün
  0 siblings, 1 reply; 20+ messages in thread
From: Tingmao Wang @ 2025-08-12 23:57 UTC (permalink / raw)
  To: Mickaël Salaün, Dominique Martinet
  Cc: Al Viro, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

Thanks for the review :)  I will try to send a v2 in the coming weeks with
the two changes you suggested and the changes to cached mode as suggested
by Dominique (plus rename handling).  (will also try to figure out how to
test with xfstests)

On 8/8/25 09:32, Mickaël Salaün wrote:
> [...]
>> On 7/5/25 01:25, Al Viro wrote:
>>> On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
>>>> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
>>>> +			     struct dentry *dentry)
>>>> +{
>>>> +	struct dentry *curr = dentry;
>>>> +	struct qstr *curr_name;
>>>> +	struct name_snapshot *compare;
>>>> +	ssize_t i;
>>>> +
>>>> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
>>>> +
>>>> +	rcu_read_lock();
>>>> +	for (i = ino_path->nr_components - 1; i >= 0; i--) {
>>>> +		if (curr->d_parent == curr) {
>>>> +			/* We're supposed to have more components to walk */
>>>> +			rcu_read_unlock();
>>>> +			return false;
>>>> +		}
>>>> +		curr_name = &curr->d_name;
>>>> +		compare = &ino_path->names[i];
>>>> +		/*
>>>> +		 * We can't use hash_len because it is salted with the parent
>>>> +		 * dentry pointer.  We could make this faster by pre-computing our
>>>> +		 * own hashlen for compare and ino_path outside, probably.
>>>> +		 */
>>>> +		if (curr_name->len != compare->name.len) {
>>>> +			rcu_read_unlock();
>>>> +			return false;
>>>> +		}
>>>> +		if (strncmp(curr_name->name, compare->name.name,
>>>> +			    curr_name->len) != 0) {
>>>
>>> ... without any kind of protection for curr_name.  Incidentally,
>>> what about rename()?  Not a cross-directory one, just one that
>>> changes the name of a subdirectory within the same parent?
>>
>> As far as I can tell, in v9fs_vfs_rename, v9ses->rename_sem is taken for
>> both same-parent and different parent renames, so I think we're safe here
>> (and hopefully for any v9fs dentries, nobody should be causing d_name to
>> change except for ourselves when we call d_move in v9fs_vfs_rename?  If
>> yes then because we also take v9ses->rename_sem, in theory we should be
>> fine here...?)
> 
> A lockdep_assert_held() or similar and a comment would make this clear.

I can add a comment, but there is already a lockdep_assert_held_read of
the v9fs rename sem at the top of this function.

> [...]
>> /*
>>  * Must hold rename_sem due to traversing parents
>>  */
>> bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
>> {
>> 	struct dentry *curr = dentry;
>> 	struct name_snapshot *compare;
>> 	ssize_t i;
>>
>> 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
>>
>> 	rcu_read_lock();
>> 	for (i = ino_path->nr_components - 1; i >= 0; i--) {
>> 		if (curr->d_parent == curr) {
>> 			/* We're supposed to have more components to walk */
>> 			rcu_read_unlock();
>> 			return false;
>> 		}
>> 		compare = &ino_path->names[i];
>> 		if (!d_same_name(curr, curr->d_parent, &compare->name)) {
>> 			rcu_read_unlock();
>> 			return false;
>> 		}
>> 		curr = curr->d_parent;
>> 	}
>> 	rcu_read_unlock();
>> 	if (curr != curr->d_parent) {

Looking at this again I think this check probably needs to be done inside
RCU, will fix as below:

>> 		/* dentry is deeper than ino_path */
>> 		return false;
>> 	}
>> 	return true;
>> }

diff --git a/fs/9p/ino_path.c b/fs/9p/ino_path.c
index 0000b4964df0..7264003cb087 100644
--- a/fs/9p/ino_path.c
+++ b/fs/9p/ino_path.c
@@ -77,13 +77,15 @@ void free_ino_path(struct v9fs_ino_path *path)
 }
 
 /*
- * Must hold rename_sem due to traversing parents
+ * Must hold rename_sem due to traversing parents.  Returns whether
+ * ino_path matches with the path of a v9fs dentry.
  */
 bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
 {
 	struct dentry *curr = dentry;
 	struct name_snapshot *compare;
 	ssize_t i;
+	bool ret;
 
 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
 
@@ -101,10 +103,8 @@ bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
 		}
 		curr = curr->d_parent;
 	}
+	/* Comparison fails if dentry is deeper than ino_path */
+	ret = (curr == curr->d_parent);
 	rcu_read_unlock();
-	if (curr != curr->d_parent) {
-		/* dentry is deeper than ino_path */
-		return false;
-	}
-	return true;
+	return ret;
 }

> 
> I like this new version.
> 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)
  2025-08-12 23:53     ` Tingmao Wang
@ 2025-08-13  5:30       ` Dominique Martinet
  0 siblings, 0 replies; 20+ messages in thread
From: Dominique Martinet @ 2025-08-13  5:30 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
	v9fs, Mickaël Salaün, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, linux-fsdevel

Tingmao Wang wrote on Wed, Aug 13, 2025 at 12:53:37AM +0100:
> My understanding of cache=loose was that it only assumes that the server
> side can't change the fs under the client, but otherwise things like
> inodes should work identically, even though currently it works differently
> due to (only) cached mode re-using inodes - was this deliberate?

Right, generally speaking I agree things should work as long as the
mount is exclusive (can't change server side/no other client); but
I think it's fine to extend this to server being "sane" (as in,
protocol-wise we're supposed to be guaranteed that two files are
identical if and only if qid is identical)

> > I think that's fine for cache=none, but it breaks hardlinks on
> > cache=loose so I think this ought to only be done without cache
> > (I haven't really played with the cache flag bits, not check pathname if
> > any of loose, writeback or metadata are set?)
> 
> I think currently 9pfs reuse inodes if cache is either loose or metadata
> (but not writeback), given:
> 
> 	else if (v9ses->cache & (CACHE_META|CACHE_LOOSE))
> 		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb);
> 	else
> 		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
> 
> in v9fs_vfs_lookup, so maybe we keep this pattern and not check pathname
> if (loose|metadata) is set (but not writeback)?

This makes sense, let's stick to loose/metadata for this as well.

> >> The main problem here is how to store the pathname in a sensible way and
> >> tie it to the inode.  For now I opted with an array of names acquired with
> >> take_dentry_name_snapshot, which reuses the same memory as the dcache to
> >> store the actual strings
> 
> (Self correction: technically, the space is only shared if they are long
> enough to not be inlined, which is DNAME_INLINE_LEN = 40 for 64bit or 20
> for 32bit, so in most cases the names would probably be copied.  Maybe it
> would be more compact in terms of memory usage to just store the path as a
> string, with '/' separating components?  But then the code would be more
> complex and we can't easily use d_same_name anymore, so maybe it's not
> worth doing, unless this will prove useful for other purposes, like the
> re-opening of fid mentioned below?)

I think it's better to keep things simple first and check the actual
impact with a couple of simple benchmarks (re-opening a file in a loop
with cache=none should hit this path?)

If we want to improve this, rather than keeping the full string it might
make sense to carry a partial hash in each dentry so we can get away
with checking only the parent's dentry + current dentry, or something
like that?
But, simple and working is better than fast and broken, so let's have a
simple v1 first.

> > Frankly the *notify use-case is difficult to support properly, as files
> > can change from under us (e.g. modifying the file directly on the host
> > in qemu case, or just multiple mounts of the same directory), so it
> > can't be relied on in the general case anyway -- 9p doesn't have
> > anything like NFSv4 leases to get notified when other clients write a
> > file we "own", so whatever we do will always be limited...
> > But I guess it can make sense for limited monitoring e.g. rebuilding
> > something on change and things like that?
> 
> One of the first use case I can think of here is IDE/code editors
> reloading state (e.g. the file tree) on change, which I think didn't work
> for 9pfs folders opened with vscode if I remembered correctly (but I
> haven't tested this recently).  Even though we can't monitor for remote
> changes, having this work for local changes would still be nice.

Yeah, I also do this manually with entr[1], and git's fsmonitor (with
watchman) does that too -- so I guess people living in a 9p mount would
see it..
[1] https://github.com/eradman/entr

But I think 9p is just slow in general so such setups can probably be
improved more drastically by using something else :P

Anyway, thank you for your time/work, I'll try to be more timely in
later reviews.
-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
  2025-08-12 23:57         ` Tingmao Wang
@ 2025-08-13  7:47           ` Mickaël Salaün
  0 siblings, 0 replies; 20+ messages in thread
From: Mickaël Salaün @ 2025-08-13  7:47 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Dominique Martinet, Al Viro, Eric Van Hensbergen,
	Latchesar Ionkov, Christian Schoenebeck, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, linux-fsdevel

On Wed, Aug 13, 2025 at 12:57:49AM +0100, Tingmao Wang wrote:
> Thanks for the review :)  I will try to send a v2 in the coming weeks with
> the two changes you suggested and the changes to cached mode as suggested
> by Dominique (plus rename handling).  (will also try to figure out how to
> test with xfstests)
> 
> On 8/8/25 09:32, Mickaël Salaün wrote:
> > [...]
> >> On 7/5/25 01:25, Al Viro wrote:
> >>> On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> >>>> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
> >>>> +			     struct dentry *dentry)
> >>>> +{
> >>>> +	struct dentry *curr = dentry;
> >>>> +	struct qstr *curr_name;
> >>>> +	struct name_snapshot *compare;
> >>>> +	ssize_t i;
> >>>> +
> >>>> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >>>> +
> >>>> +	rcu_read_lock();
> >>>> +	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> >>>> +		if (curr->d_parent == curr) {
> >>>> +			/* We're supposed to have more components to walk */
> >>>> +			rcu_read_unlock();
> >>>> +			return false;
> >>>> +		}
> >>>> +		curr_name = &curr->d_name;
> >>>> +		compare = &ino_path->names[i];
> >>>> +		/*
> >>>> +		 * We can't use hash_len because it is salted with the parent
> >>>> +		 * dentry pointer.  We could make this faster by pre-computing our
> >>>> +		 * own hashlen for compare and ino_path outside, probably.
> >>>> +		 */
> >>>> +		if (curr_name->len != compare->name.len) {
> >>>> +			rcu_read_unlock();
> >>>> +			return false;
> >>>> +		}
> >>>> +		if (strncmp(curr_name->name, compare->name.name,
> >>>> +			    curr_name->len) != 0) {
> >>>
> >>> ... without any kind of protection for curr_name.  Incidentally,
> >>> what about rename()?  Not a cross-directory one, just one that
> >>> changes the name of a subdirectory within the same parent?
> >>
> >> As far as I can tell, in v9fs_vfs_rename, v9ses->rename_sem is taken for
> >> both same-parent and different parent renames, so I think we're safe here
> >> (and hopefully for any v9fs dentries, nobody should be causing d_name to
> >> change except for ourselves when we call d_move in v9fs_vfs_rename?  If
> >> yes then because we also take v9ses->rename_sem, in theory we should be
> >> fine here...?)
> > 
> > A lockdep_assert_held() or similar and a comment would make this clear.
> 
> I can add a comment, but there is already a lockdep_assert_held_read of
> the v9fs rename sem at the top of this function.

I wrote this comment before reading your new version beneath, which
already have this lockdep, so no need to change anything. :)

> 
> > [...]
> >> /*
> >>  * Must hold rename_sem due to traversing parents
> >>  */
> >> bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
> >> {
> >> 	struct dentry *curr = dentry;
> >> 	struct name_snapshot *compare;
> >> 	ssize_t i;
> >>
> >> 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >>
> >> 	rcu_read_lock();
> >> 	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> >> 		if (curr->d_parent == curr) {
> >> 			/* We're supposed to have more components to walk */
> >> 			rcu_read_unlock();
> >> 			return false;
> >> 		}
> >> 		compare = &ino_path->names[i];
> >> 		if (!d_same_name(curr, curr->d_parent, &compare->name)) {
> >> 			rcu_read_unlock();
> >> 			return false;
> >> 		}
> >> 		curr = curr->d_parent;
> >> 	}
> >> 	rcu_read_unlock();
> >> 	if (curr != curr->d_parent) {
> 
> Looking at this again I think this check probably needs to be done inside
> RCU, will fix as below:
> 
> >> 		/* dentry is deeper than ino_path */
> >> 		return false;
> >> 	}
> >> 	return true;
> >> }
> 
> diff --git a/fs/9p/ino_path.c b/fs/9p/ino_path.c
> index 0000b4964df0..7264003cb087 100644
> --- a/fs/9p/ino_path.c
> +++ b/fs/9p/ino_path.c
> @@ -77,13 +77,15 @@ void free_ino_path(struct v9fs_ino_path *path)
>  }
>  
>  /*
> - * Must hold rename_sem due to traversing parents
> + * Must hold rename_sem due to traversing parents.  Returns whether
> + * ino_path matches with the path of a v9fs dentry.
>   */
>  bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
>  {
>  	struct dentry *curr = dentry;
>  	struct name_snapshot *compare;
>  	ssize_t i;
> +	bool ret;
>  
>  	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
>  
> @@ -101,10 +103,8 @@ bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
>  		}
>  		curr = curr->d_parent;
>  	}
> +	/* Comparison fails if dentry is deeper than ino_path */
> +	ret = (curr == curr->d_parent);
>  	rcu_read_unlock();
> -	if (curr != curr->d_parent) {
> -		/* dentry is deeper than ino_path */
> -		return false;
> -	}
> -	return true;
> +	return ret;
>  }

Looks good

> 
> > 
> > I like this new version.
> > 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-08-13  7:54 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-06 20:43 [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L Tingmao Wang
2025-07-05  0:19   ` Al Viro
2025-07-05  0:25   ` Al Viro
2025-07-11 19:11     ` Tingmao Wang
2025-08-08  8:32       ` Mickaël Salaün
2025-08-12 23:57         ` Tingmao Wang
2025-08-13  7:47           ` Mickaël Salaün
2025-07-11 19:11   ` Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 2/6] fs/9p: add default option for path-based inodes Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 3/6] fs/9p: Hide inodeident=path from show_options as it is the default Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 4/6] fs/9p: Add ability to identify inode by path for non-.L Tingmao Wang
2025-07-11 19:12   ` Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 5/6] fs/9p: .L: Refresh stale inodes on reuse Tingmao Wang
2025-04-06 20:43 ` [RFC PATCH 6/6] fs/9p: non-.L: " Tingmao Wang
2025-07-04 18:37 ` [RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid) Mickaël Salaün
2025-08-08 10:27 ` Dominique Martinet
2025-08-08 10:52   ` Christian Schoenebeck
2025-08-12 23:53     ` Tingmao Wang
2025-08-13  5:30       ` Dominique Martinet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).