* [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
@ 2025-09-04  0:04 Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 1/7] fs/9p: Add ability to identify inode by path for .L in uncached mode Tingmao Wang
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-04  0:04 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mickaël Salaün
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	linux-fsdevel

Hi!

This is the second version of this series.  The individual commits
contain changelogs (most of them are in the first patch).  Overall, the
most significant changes are: cached mode (loose or metadata) is now left
unchanged, there is no longer a "don't reuse inodes at all" mode, various
bug fixes, use of the right helper functions, basic rename handling, and
new documentation.

Thanks in advance for the review effort :)

v1: https://lore.kernel.org/all/cover.1743971855.git.m@maowtm.org/

Background
----------

(This section has basically the same content as the v1 cover letter)

Previously [1], I noticed that when using 9pfs filesystems, the Landlock
LSM blocks access even to files / directories allowed by rules, and that
this has something to do with 9pfs creating new inodes despite Landlock
holding a reference to the existing one.  Because Landlock uses inodes'
in-memory state (i_security) to identify allowed fs objects/hierarchies,
this causes Landlock to partially break on 9pfs, at least in uncached
mode, which is the default:

    # mount -t 9p -o trans=virtio test /mnt
    # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
    Executing the sandboxed command...
    # cat /mnt/readme
    cat: /mnt/readme: Permission denied

This, however, works if somebody is holding onto the dentry (and it also
works with cache=loose), as in both cases the inode is reused:

    # tail -f /mnt/readme &
    [1] 196
    # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
    Executing the sandboxed command...
    # cat /mnt/readme
    aa

It also works on directories if a shell has cd'd into the directory.
Note that this means only certain usages of Landlock are affected - for
example, sandboxing applications that take a list of files to allow,
apply Landlock to themselves, then execve.  On the other hand, this does
not affect applications that open a file, then apply Landlock to
themselves while keeping the file they need open.

While the above is a very simple example, this is problematic in
real-world use cases if Landlock is used to sandbox applications on a
system that has files mounted via 9pfs or uses 9pfs as the root
filesystem.  In addition, this also affects fanotify / inotify when using
inode marks (for local access):

    root@d8c28a676d72:/# ./fanotify-basic-open /readme & # on virtiofs
    [1] 173
    root@d8c28a676d72:/# cat readme
    aa
    FAN_OPEN: File /readme
    root@d8c28a676d72:/# mount -t 9p -o trans=virtio test /mnt
    root@d8c28a676d72:/# ./fanotify-basic-open /mnt/readme & # on 9pfs
    [2] 176
    root@d8c28a676d72:/# cat /mnt/readme
    aa
    root@d8c28a676d72:/#

The same can be demonstrated with inotifywait.  The source code for
fanotify-basic-open, adapted from the fanotify man page, is available at
https://fileshare.maowtm.org/9pfs-landlock-fix/20250903/fanotify-basic-open.c [2].
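
For reference, the core of such a watcher looks roughly like the following
(an illustrative sketch in the spirit of the linked program, not that exact
file; error handling trimmed, and it needs to run as root):

    /* Minimal inode-mark fanotify watcher (illustration only). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/fanotify.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        struct fanotify_event_metadata buf[16];
        const struct fanotify_event_metadata *ev;
        ssize_t len;
        int fd;

        /* Mark the inode behind argv[1] and ask for FAN_OPEN events on it */
        fd = fanotify_init(FAN_CLASS_NOTIF, O_RDONLY);
        fanotify_mark(fd, FAN_MARK_ADD, FAN_OPEN, AT_FDCWD, argv[1]);

        while ((len = read(fd, buf, sizeof(buf))) > 0) {
            for (ev = buf; FAN_EVENT_OK(ev, len); ev = FAN_EVENT_NEXT(ev, len)) {
                printf("FAN_OPEN: File %s\n", argv[1]);
                if (ev->fd >= 0)
                    close(ev->fd); /* event carries an fd to the opened file */
            }
        }
        return 0;
    }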

Note that this is not a security bug for Landlock since it can only cause
legitimate access to be denied, but it might be a problem for fanotify
permission events (although I do recognize that using permission events on
individual inodes is already perhaps a bit unreliable?)

It seems that there was an attempt at making 9pfs reuse inodes in uncached
mode as well, based on qid.path, however it was reverted [3] due to issues
with servers that present duplicate qids, for example on a QEMU host that
has multiple filesystems mounted under a single 9pfs export without
multidevs=remap, or in the case of other servers that don't necessarily
support remapping qids ([4] and more).  I've done some testing on
v6.12-rc4, which has the simplified 9pfs inode code before it was
reverted, and found that Landlock works (however, we of course then have
the issue demonstrated in [3]).

What this series does
---------------------

(Changes since v1: added more reasoning for the ino_path struct)

With the above in mind, I have a proposal for 9pfs to:
1. Reuse inodes even in uncached mode.
2. However, reuse them based on qid.path AND the actual pathname, by doing
   the appropriate testing in v9fs_test_inode(_dotl).

The main problem here is how to store the pathname in a sensible way and
tie it to the inode.  For now I opted for an array of names acquired with
take_dentry_name_snapshot, which reuses the same memory as the dcache for
the actual strings, but doesn't tie the lifetime of the dentry to the
inode (I thought about holding a reference to the dentry in the
v9fs_inode, but it seemed like the wrong approach and would cause dentries
to not be evicted/released).
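
For reference, the shape of the data added in patch 1 is roughly the
following (condensed from the actual definitions in fs/9p/v9fs.h in the
first patch):

    /* Path components from the fs root down to the file (root excluded) */
    struct v9fs_ino_path {
        size_t nr_components;
        struct name_snapshot names[] __counted_by(nr_components);
    };

    struct v9fs_inode {
        /* ... existing fields (netfs inode, qid, v_mutex, ...) ... */

        /*
         * Only set when inodeident=path is in effect; freed together
         * with the inode in v9fs_free_inode().
         */
        struct v9fs_ino_path *path;
    };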

Additional discussions
----------------------

(New section)

From some QEMU documentation I read [5], it seems there is a plan to
resolve this kind of problem in a new version of the protocol, by
expanding the qid to include the filesystem identifier of a file on the
host, so maybe path-based identification can be disabled after a
successful protocol version check with the host.  For now, inodeident=path
is the default for uncached filesystems; it can be set to 'qid' to instead
reuse inodes based only on server-provided inode numbers.
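
Concretely, the default selection added in patch 2 boils down to the
following (a condensed sketch of the v9fs_session_init hunk below, error
handling omitted):

    cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);

    if (cached && (v9ses->flags & V9FS_INODE_IDENT_PATH))
        return -EINVAL;  /* inodeident=path is rejected in cached mode */

    if (!(v9ses->flags & V9FS_INODE_IDENT_MASK))
        /* no inodeident= given: qid when cached, path when uncached */
        v9ses->flags |= cached ? V9FS_INODE_IDENT_QID : V9FS_INODE_IDENT_PATH;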

This patchset currently compares paths component by component (via
d_same_name); this might be optimized by doing a hash comparison first
(not done yet).  Alternatively, the path could be stored more compactly as
a single string with `/` separators (like a normal path).  However, we
should normally only need to do this comparison for a single candidate
inode, as the test is only run if qid.path matches in the first place.
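
(For context, the comparison itself is just a walk up d_parent, checking
one stored name snapshot per component - condensed from ino_path_compare()
in the first patch:)

    for (i = ino_path->nr_components - 1; i >= 0; i--) {
        if (curr == curr->d_parent)  /* ran out of parents too early */
            return false;
        if (!d_same_name(curr, curr->d_parent, &ino_path->names[i].name))
            return false;
        curr = curr->d_parent;
    }
    /* the dentry must end exactly at the root for the paths to match */
    return curr == curr->d_parent;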

This patchset currently does not support enabling path-based inodes in
cached mode.  Additional care needs to be taken to ensure we can refresh
an inode that potentially has data cached, but since Dominique is happy
with cached mode behaving as-is (reusing inodes via qid only), this is not
done.

The current implementation will handle client-side renames of a single
file (or empty directory) correctly, but server-side renames, or renaming
a non-empty directory (client or server side), will cause the files being
renamed (or the files under the renamed directory) to use new inodes
(unless they are renamed back).  The decision not to update the children
of a client-renamed directory is purely to reduce the complexity of this
patch; doing so is in principle possible.
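
For client-side renames, patch 6 simply swaps the stored path under the
filesystem's rename_sem - roughly (condensed from the v9fs_vfs_rename
hunks, error handling omitted):

    down_write(&v9ses->rename_sem);
    if (v9fs_inode_ident_path(v9ses))
        new_ino_path = make_ino_path(new_dentry); /* allocate before renaming */

    /* ... issue the rename / renameat to the server ... */

    if (!retval && v9fs_inode_ident_path(v9ses)) {
        /* rename_sem (write) protects every v9inode->path on this mount */
        free_ino_path(old_v9inode->path);
        old_v9inode->path = new_ino_path;
    }
    up_write(&v9ses->rename_sem);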

Testing and explanations
------------------------

(New section)

    # mount -t 9p -o ... test /mnt
        with the following options:
        - trans=virtio
        - trans=virtio,inodeident=qid
        - trans=virtio,cache=loose
    # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
    Executing the sandboxed command...
    # cat /mnt/readme
    hi
    ^^ landlock works

    # mount -t 9p -o trans=virtio test /mnt
    # mkdir /mnt/dir
    # mv /mnt/readme /mnt/dir/readme
    # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/dir/readme LL_FS_RW= /sandboxer bash
    Executing the sandboxed command...
    # cat /mnt/dir/readme
    hi
    ^^ landlock works

    # # another terminal in guest: mv /mnt/dir/readme /mnt/dir/readme.2
    # cat /mnt/dir/readme.2
    hi
    ^^ ino_path is carried with renames

    # # host: mv 9pfs/dir/readme.2 9pfs/dir/readme
    # cat /mnt/dir/readme.2
    cat: /mnt/dir/readme.2: No such file or directory
    # cat /mnt/dir/readme
    cat: /mnt/dir/readme: Permission denied
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ we can't track renames on the server side
    # # host: mv 9pfs/dir/readme 9pfs/dir/readme.2
    # cat /mnt/dir/readme.2
    hi
    ^^ once the file is back at its original place it works as expected.

    # # another terminal in guest: mv /mnt/dir/readme.2 /mnt/dir/readme
    # cat /mnt/dir/readme
    hi
    ^^ we can track renames of the file directly...
    # # another terminal in guest: mv /mnt/dir /mnt/dir.2
    # cat /mnt/dir.2/readme
    cat: /mnt/dir.2/readme: Permission denied
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ but not renames of the parent directory, even if done client-side

    # # another terminal in guest: mv /mnt/dir.2 /mnt/dir
    # cat /mnt/dir/readme
    hi
    ^^ works once it's back
    # # another terminal in guest: mv /mnt/dir /mnt/dir.2 && mkdir /mnt/dir && echo hi2 > /mnt/dir/readme
    # cat /mnt/dir/readme
    cat: /mnt/dir/readme: Permission denied
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ a different file uses a different inode even if same path

    # # another terminal in guest: mv /mnt/dir.2/readme /mnt/dir/readme
    # cat /mnt/dir/readme
    hi
    # # host: rm 9pfs/dir/readme && echo hi3 > 9pfs/dir/readme
    # cat /mnt/dir/readme
    cat: /mnt/dir/readme: Permission denied
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ a different file (identified by server-side qid changes) uses different inode

fanotify also works, as tested with the program attached at the end.

In addition, I ran xfstests on an uncached 9pfs mount, and while there are
some test failures, it is the same set of failures as on current mainline.
Test logs at https://fileshare.maowtm.org/9pfs-landlock-fix/20250903/index.html

Also tested with Mickaël's new v9fs landlock tests [6] (not yet merged):

    #  RUN           layout3_fs.v9fs.tag_inode_dir_parent ...
    #            OK  layout3_fs.v9fs.tag_inode_dir_parent
    ok 129 layout3_fs.v9fs.tag_inode_dir_parent
    #  RUN           layout3_fs.v9fs.tag_inode_dir_mnt ...
    #            OK  layout3_fs.v9fs.tag_inode_dir_mnt
    ok 130 layout3_fs.v9fs.tag_inode_dir_mnt
    #  RUN           layout3_fs.v9fs.tag_inode_dir_child ...
    #            OK  layout3_fs.v9fs.tag_inode_dir_child
    ok 131 layout3_fs.v9fs.tag_inode_dir_child
    #  RUN           layout3_fs.v9fs.tag_inode_file ...
    #            OK  layout3_fs.v9fs.tag_inode_file
    ok 132 layout3_fs.v9fs.tag_inode_file
    #  RUN           layout3_fs.v9fs.release_inodes ...
    #            OK  layout3_fs.v9fs.release_inodes
    ok 133 layout3_fs.v9fs.release_inodes

This patch series is based on, and was mostly tested on, v6.17-rc1 + [7].

Kind regards,
Tingmao

[1]: https://github.com/landlock-lsm/linux/issues/45
[2]: https://fileshare.maowtm.org/9pfs-landlock-fix/20250903/fanotify-basic-open.c
[3]: https://lore.kernel.org/all/20241024-revert_iget-v1-4-4cac63d25f72@codewreck.org/
[4]: https://lore.kernel.org/all/20240923100508.GA32066@willie-the-truck/
[5]: https://wiki.qemu.org/Documentation/9p#Protocol_Plans
[6]: https://lore.kernel.org/all/20250704171345.1393451-1-mic@digikod.net/
[7]: https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/

Tingmao Wang (7):
  fs/9p: Add ability to identify inode by path for .L in uncached mode
  fs/9p: add option for path-based inodes
  fs/9p: Add ability to identify inode by path for non-.L in uncached
    mode
  fs/9p: .L: Refresh stale inodes on reuse
  fs/9p: non-.L: Refresh stale inodes on reuse
  fs/9p: update the target's ino_path on rename
  docs: fs/9p: Document the "inodeident" option

 Documentation/filesystems/9p.rst |  42 +++++++
 fs/9p/Makefile                   |   3 +-
 fs/9p/ino_path.c                 | 111 ++++++++++++++++++
 fs/9p/v9fs.c                     |  59 +++++++++-
 fs/9p/v9fs.h                     |  87 ++++++++++----
 fs/9p/vfs_inode.c                | 195 ++++++++++++++++++++++++++-----
 fs/9p/vfs_inode_dotl.c           | 171 +++++++++++++++++++++++----
 fs/9p/vfs_super.c                |  13 ++-
 8 files changed, 611 insertions(+), 70 deletions(-)
 create mode 100644 fs/9p/ino_path.c


base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
prerequisite-patch-id: 3dae487a4b3d676de7c20b269553e3e2176b1e36
prerequisite-patch-id: 93ab54c52a41fa44b8d0baf55df949d0ad27e99a
prerequisite-patch-id: 5f558bf969e6eaa3d011c98de0806ca8ad369efe
-- 
2.51.0


* [PATCH v2 1/7] fs/9p: Add ability to identify inode by path for .L in uncached mode
  2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
@ 2025-09-04  0:04 ` Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 2/7] fs/9p: add option for path-based inodes Tingmao Wang
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-04  0:04 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mickaël Salaün
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	linux-fsdevel

The intention of this patch is to allow features like Landlock and
fanotify (inode mark mode) to work on uncached 9pfs.  These features rely
on holding a specific inode and handling further access to the same file
(as identified by that inode); however, currently in uncached mode, we
always get a new inode on each access, due to concerns regarding
server-side inode number collisions.

In cached mode (either CACHE_LOOSE or CACHE_META), inodes are already
reused by looking only at the qid (server-side inode number).  Since
introducing this additional check would regress hard links (as they have
different paths, and thus the two ends of a hard link would no longer be
the same inode under this approach), this won't be done for cached mode.

Currently this patch doesn't actually have any effect - the next commit
will introduce a mount option to control inodeident=path enablement and
default it to on for uncached mode.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Cc: "Mickaël Salaün" <mic@digikod.net>
Cc: "Günther Noack" <gnoack@google.com>
Closes: https://github.com/landlock-lsm/linux/issues/45

---
Changes since v1:
- Assume inodeident=path will not be set in cached mode.

- Fix various issues (rcu usage etc) in ino_path.c with feedback from Al
  Viro and Mickaël Salaün

- Use d_same_name instead of strncmp

- Instead of changing v9fs_test_new_inode_dotl to add the path check (thus
  hijacking the meaning of "new" to actually mean "uncached"), we add the
  path check (conditional on the right flags in v9ses) to the cached test
  function (v9fs_test_inode_dotl) and use that function for both cached
  and uncached mode, by adding additional conditionals within it for the
  version/generation check.  The v9fs_test_new_inode_dotl function is thus
  used only for mknod, mkdir and atomic_open in the "need to create" case.

- Instead of never reusing inodes if path-based ident is not enabled, we
  always reuse in uncached mode, but if path-based ident is not enabled,
  we don't check the path.  This makes the code easier to reason about,
  and gets rid of the complexity of having to support two quite different
  modes of operation (re-using and not re-using inodes).

- Fix crash due to uninitialized v9inode->path when an inode is allocated
  then immediately deallocated in iget5_locked as a result of two igets
  racing with each other to insert the inode.  Spotted via xfstests.

- Don't allocate v9fs_ino_path within v9fs_set_inode_dotl, as iget5_locked
  specifies that it can't sleep.  Doing so means that we need to handle a
  special case of an inode being created and hashed into the inode list,
  and thus possibly being tested by another iget5_locked call before its
  v9inode->path has been populated.  This is resolved by waiting for
  iget5_locked to return before checking the path.  This edge case was
  spotted via xfstests.

 fs/9p/Makefile         |   3 +-
 fs/9p/ino_path.c       | 110 ++++++++++++++++++++++++++++++
 fs/9p/v9fs.h           |  74 +++++++++++++++-----
 fs/9p/vfs_inode.c      |  16 +++--
 fs/9p/vfs_inode_dotl.c | 149 +++++++++++++++++++++++++++++++++++------
 fs/9p/vfs_super.c      |  13 +++-
 6 files changed, 321 insertions(+), 44 deletions(-)
 create mode 100644 fs/9p/ino_path.c

diff --git a/fs/9p/Makefile b/fs/9p/Makefile
index e7800a5c7395..38c3ceb26274 100644
--- a/fs/9p/Makefile
+++ b/fs/9p/Makefile
@@ -11,7 +11,8 @@ obj-$(CONFIG_9P_FS) := 9p.o
 	vfs_dentry.o \
 	v9fs.o \
 	fid.o  \
-	xattr.o
+	xattr.o \
+	ino_path.o
 
 9p-$(CONFIG_9P_FSCACHE) += cache.o
 9p-$(CONFIG_9P_FS_POSIX_ACL) += acl.o
diff --git a/fs/9p/ino_path.c b/fs/9p/ino_path.c
new file mode 100644
index 000000000000..a03145e08a9d
--- /dev/null
+++ b/fs/9p/ino_path.c
@@ -0,0 +1,110 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Specific operations on the v9fs_ino_path structure.
+ *
+ * Copyright (C) 2025 by Tingmao Wang <m@maowtm.org>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/dcache.h>
+
+#include <linux/posix_acl.h>
+#include <net/9p/9p.h>
+#include <net/9p/client.h>
+#include "v9fs.h"
+
+/*
+ * Must hold rename_sem due to traversing parents.  Caller must hold
+ * reference to dentry.
+ */
+struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
+{
+	struct v9fs_ino_path *path;
+	size_t path_components = 0;
+	struct dentry *curr = dentry;
+	ssize_t i;
+
+	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
+	might_sleep(); /* Allocation below might block */
+
+	rcu_read_lock();
+
+	/* Don't include the root dentry */
+	while (curr->d_parent != curr) {
+		if (WARN_ON_ONCE(path_components >= SSIZE_MAX)) {
+			rcu_read_unlock();
+			return NULL;
+		}
+		path_components++;
+		curr = curr->d_parent;
+	}
+
+	/*
+	 * Allocation can block so don't do it in RCU (and because the
+	 * allocation might be large, since name_snapshot leaves space for
+	 * inline str, not worth trying GFP_ATOMIC)
+	 */
+	rcu_read_unlock();
+
+	path = kmalloc(struct_size(path, names, path_components), GFP_KERNEL);
+	if (!path)
+		return NULL;
+
+	path->nr_components = path_components;
+	curr = dentry;
+
+	rcu_read_lock();
+	for (i = path_components - 1; i >= 0; i--) {
+		take_dentry_name_snapshot(&path->names[i], curr);
+		curr = curr->d_parent;
+	}
+	WARN_ON(curr != curr->d_parent);
+	rcu_read_unlock();
+	return path;
+}
+
+void free_ino_path(struct v9fs_ino_path *path)
+{
+	if (path) {
+		for (size_t i = 0; i < path->nr_components; i++)
+			release_dentry_name_snapshot(&path->names[i]);
+		kfree(path);
+	}
+}
+
+/*
+ * Must hold rename_sem due to traversing parents.  Returns whether
+ * ino_path matches with the path of a v9fs dentry.  This function does
+ * not sleep.
+ */
+bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
+{
+	struct dentry *curr = dentry;
+	struct name_snapshot *compare;
+	ssize_t i;
+	bool ret;
+
+	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
+
+	rcu_read_lock();
+	for (i = ino_path->nr_components - 1; i >= 0; i--) {
+		if (curr->d_parent == curr) {
+			/* We're supposed to have more components to walk */
+			rcu_read_unlock();
+			return false;
+		}
+		compare = &ino_path->names[i];
+		if (!d_same_name(curr, curr->d_parent, &compare->name)) {
+			rcu_read_unlock();
+			return false;
+		}
+		curr = curr->d_parent;
+	}
+	/* Comparison fails if dentry is deeper than ino_path */
+	ret = (curr == curr->d_parent);
+	rcu_read_unlock();
+	return ret;
+}
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index f28bc763847a..134b55a605be 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -10,6 +10,7 @@
 
 #include <linux/backing-dev.h>
 #include <linux/netfs.h>
+#include <linux/dcache.h>
 
 /**
  * enum p9_session_flags - option flags for each 9P session
@@ -31,16 +32,17 @@
 #define V9FS_ACL_MASK V9FS_POSIX_ACL
 
 enum p9_session_flags {
-	V9FS_PROTO_2000U    = 0x01,
-	V9FS_PROTO_2000L    = 0x02,
-	V9FS_ACCESS_SINGLE  = 0x04,
-	V9FS_ACCESS_USER    = 0x08,
-	V9FS_ACCESS_CLIENT  = 0x10,
-	V9FS_POSIX_ACL      = 0x20,
-	V9FS_NO_XATTR       = 0x40,
-	V9FS_IGNORE_QV      = 0x80, /* ignore qid.version for cache hints */
-	V9FS_DIRECT_IO      = 0x100,
-	V9FS_SYNC           = 0x200
+	V9FS_PROTO_2000U      = 0x01,
+	V9FS_PROTO_2000L      = 0x02,
+	V9FS_ACCESS_SINGLE    = 0x04,
+	V9FS_ACCESS_USER      = 0x08,
+	V9FS_ACCESS_CLIENT    = 0x10,
+	V9FS_POSIX_ACL        = 0x20,
+	V9FS_NO_XATTR         = 0x40,
+	V9FS_IGNORE_QV        = 0x80, /* ignore qid.version for cache hints */
+	V9FS_DIRECT_IO        = 0x100,
+	V9FS_SYNC             = 0x200,
+	V9FS_INODE_IDENT_PATH = 0x400,
 };
 
 /**
@@ -133,11 +135,27 @@ struct v9fs_session_info {
 /* cache_validity flags */
 #define V9FS_INO_INVALID_ATTR 0x01
 
+struct v9fs_ino_path {
+	size_t nr_components;
+	struct name_snapshot names[] __counted_by(nr_components);
+};
+
+extern struct v9fs_ino_path *make_ino_path(struct dentry *dentry);
+extern void free_ino_path(struct v9fs_ino_path *path);
+extern bool ino_path_compare(struct v9fs_ino_path *ino_path,
+	struct dentry *dentry);
+
 struct v9fs_inode {
 	struct netfs_inode netfs; /* Netfslib context and vfs inode */
 	struct p9_qid qid;
 	unsigned int cache_validity;
 	struct mutex v_mutex;
+
+	/*
+	 * Stores the path of the file this inode is for, only for filesystems
+	 * with inode_ident=path.  Lifetime is the same as this inode.
+	 */
+	struct v9fs_ino_path *path;
 };
 
 static inline struct v9fs_inode *V9FS_I(const struct inode *inode)
@@ -188,7 +206,8 @@ extern const struct inode_operations v9fs_symlink_inode_operations_dotl;
 extern const struct netfs_request_ops v9fs_req_ops;
 extern struct inode *v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses,
 					      struct p9_fid *fid,
-					      struct super_block *sb, int new);
+					      struct super_block *sb,
+					      struct dentry *dentry, int new);
 
 /* other default globals */
 #define V9FS_PORT	564
@@ -217,38 +236,57 @@ static inline int v9fs_proto_dotl(struct v9fs_session_info *v9ses)
 	return v9ses->flags & V9FS_PROTO_2000L;
 }
 
+static inline int v9fs_inode_ident_path(struct v9fs_session_info *v9ses)
+{
+	return v9ses->flags & V9FS_INODE_IDENT_PATH;
+}
+
 /**
- * v9fs_get_inode_from_fid - Helper routine to populate an inode by
- * issuing a attribute request
+ * v9fs_get_inode_from_fid - Find or populate an inode by issuing an
+ * attribute request, reusing an existing inode by qid (and additionally
+ * by path, if inodeident=path is enabled).
  * @v9ses: session information
  * @fid: fid to issue attribute request for
  * @sb: superblock on which to create inode
+ * @dentry: dentry corresponding to @fid
  *
  */
 static inline struct inode *
 v9fs_get_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
-			struct super_block *sb)
+			struct super_block *sb, struct dentry *dentry)
 {
+	if (!v9fs_inode_ident_path(v9ses)) {
+		/* Only pass in a dentry if we use qid+path to identify inodes */
+		dentry = NULL;
+	} else {
+		WARN_ON_ONCE(!dentry);
+	}
 	if (v9fs_proto_dotl(v9ses))
-		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, 0);
+		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, dentry, 0);
 	else
 		return v9fs_inode_from_fid(v9ses, fid, sb, 0);
 }
 
 /**
  * v9fs_get_new_inode_from_fid - Helper routine to populate an inode by
- * issuing a attribute request
+ * issuing an attribute request.  Always gets a new inode.
  * @v9ses: session information
  * @fid: fid to issue attribute request for
  * @sb: superblock on which to create inode
+ * @dentry: dentry corresponding to @fid.  A snapshot of its path will be
+ * taken and stored in the inode, if in path identification mode.
  *
  */
 static inline struct inode *
 v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
-			    struct super_block *sb)
+			    struct super_block *sb, struct dentry *dentry)
 {
+	if (!v9fs_inode_ident_path(v9ses)) {
+		/* Only pass in a dentry if we use qid+path to identify inodes */
+		dentry = NULL;
+	}
 	if (v9fs_proto_dotl(v9ses))
-		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, 1);
+		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, dentry, 1);
 	else
 		return v9fs_inode_from_fid(v9ses, fid, sb, 1);
 }
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index caff65d8b2bb..5e56c13da733 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -232,6 +232,7 @@ struct inode *v9fs_alloc_inode(struct super_block *sb)
 	if (!v9inode)
 		return NULL;
 	v9inode->cache_validity = 0;
+	v9inode->path = NULL;
 	mutex_init(&v9inode->v_mutex);
 	return &v9inode->netfs.inode;
 }
@@ -243,6 +244,7 @@ struct inode *v9fs_alloc_inode(struct super_block *sb)
 
 void v9fs_free_inode(struct inode *inode)
 {
+	free_ino_path(V9FS_I(inode)->path);
 	kmem_cache_free(v9fs_inode_cache, V9FS_I(inode));
 }
 
@@ -607,15 +609,17 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
 			goto error;
 		}
 		/*
-		 * instantiate inode and assign the unopened fid to the dentry
+		 * Instantiate inode.  On .L fs, pass in dentry for inodeident=path.
 		 */
-		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb,
+			v9fs_proto_dotl(v9ses) ? dentry : NULL);
 		if (IS_ERR(inode)) {
 			err = PTR_ERR(inode);
 			p9_debug(P9_DEBUG_VFS,
 				   "inode creation failed %d\n", err);
 			goto error;
 		}
+		/* Assign the unopened fid to the dentry */
 		v9fs_fid_add(dentry, &fid);
 		d_instantiate(dentry, inode);
 	}
@@ -732,14 +736,16 @@ struct dentry *v9fs_vfs_lookup(struct inode *dir, struct dentry *dentry,
 	name = dentry->d_name.name;
 	fid = p9_client_walk(dfid, 1, &name, 1);
 	p9_fid_put(dfid);
+
 	if (fid == ERR_PTR(-ENOENT))
 		inode = NULL;
 	else if (IS_ERR(fid))
 		inode = ERR_CAST(fid);
-	else if (v9ses->cache & (CACHE_META|CACHE_LOOSE))
-		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb);
+	else if (v9ses->cache & (CACHE_META | CACHE_LOOSE))
+		/* Cached fs will not use inode path identification */
+		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb, NULL);
 	else
-		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	/*
 	 * If we had a rename on the server and a parallel lookup
 	 * for the new name, then make sure we instantiate with
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 0fafc603b64a..86adaf5bcc0e 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -52,44 +52,98 @@ static kgid_t v9fs_get_fsgid_for_create(struct inode *dir_inode)
 	return current_fsgid();
 }
 
+struct iget_data {
+	struct p9_stat_dotl *st;
+
+	/* May be NULL */
+	struct dentry *dentry;
+
+	bool need_double_check;
+};
+
 static int v9fs_test_inode_dotl(struct inode *inode, void *data)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	struct p9_stat_dotl *st = (struct p9_stat_dotl *)data;
+	struct p9_stat_dotl *st = ((struct iget_data *)data)->st;
+	struct dentry *dentry = ((struct iget_data *)data)->dentry;
+	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
+	bool cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);
 
-	/* don't match inode of different type */
+	/*
+	 * Don't reuse inode of different type, even if path matches.
+	 */
 	if (inode_wrong_type(inode, st->st_mode))
 		return 0;
 
-	if (inode->i_generation != st->st_gen)
-		return 0;
-
-	/* compare qid details */
-	if (memcmp(&v9inode->qid.version,
-		   &st->qid.version, sizeof(v9inode->qid.version)))
-		return 0;
-
 	if (v9inode->qid.type != st->qid.type)
 		return 0;
 
 	if (v9inode->qid.path != st->qid.path)
 		return 0;
+
+	if (cached) {
+		/*
+		 * Server side changes are not supposed to happen in cached mode.
+		 * If we fail this generation or version comparison on the inode,
+		 * we don't reuse it.
+		 */
+		if (inode->i_generation != st->st_gen)
+			return 0;
+
+		/* compare qid details */
+		if (memcmp(&v9inode->qid.version,
+			&st->qid.version, sizeof(v9inode->qid.version)))
+			return 0;
+	}
+
+	if (v9fs_inode_ident_path(v9ses) && dentry) {
+		if (v9inode->path) {
+			if (!ino_path_compare(v9inode->path, dentry)) {
+				p9_debug(
+					P9_DEBUG_VFS,
+					"Refusing to reuse inode %p based on path mismatch",
+					inode);
+				return 0;
+			}
+		} else if (inode->i_state & I_NEW) {
+			/*
+			 * iget5_locked may call this function with a still
+			 * initializing (I_NEW) inode, so we're now racing with the
+			 * code in v9fs_qid_iget_dotl that prepares v9inode->path.
+			 * Returning from this test function now with positive result
+			 * will cause us to wait for this inode to be ready, and we
+			 * can then re-check in v9fs_qid_iget_dotl.
+			 */
+			((struct iget_data *)data)->need_double_check = true;
+		} else {
+			WARN_ONCE(
+				1,
+				"Inode %p (ino %lu) does not have v9inode->path even though fs has path-based inode identification enabled?",
+				inode, inode->i_ino);
+		}
+	}
+
 	return 1;
 }
 
-/* Always get a new inode */
 static int v9fs_test_new_inode_dotl(struct inode *inode, void *data)
 {
 	return 0;
 }
 
-static int v9fs_set_inode_dotl(struct inode *inode,  void *data)
+static int v9fs_set_inode_dotl(struct inode *inode, void *data)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	struct p9_stat_dotl *st = (struct p9_stat_dotl *)data;
+	struct iget_data *idata = data;
+	struct p9_stat_dotl *st = idata->st;
 
 	memcpy(&v9inode->qid, &st->qid, sizeof(st->qid));
 	inode->i_generation = st->st_gen;
+	/*
+	 * We can't fill v9inode->path here, because allocating an ino_path
+	 * means that we might sleep, and we can't sleep here.
+	 */
+	v9inode->path = NULL;
 	return 0;
 }
 
@@ -97,19 +151,56 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 					struct p9_qid *qid,
 					struct p9_fid *fid,
 					struct p9_stat_dotl *st,
+					struct dentry *dentry,
 					int new)
 {
 	int retval;
 	struct inode *inode;
+	struct v9fs_inode *v9inode;
 	struct v9fs_session_info *v9ses = sb->s_fs_info;
 	int (*test)(struct inode *inode, void *data);
+	struct iget_data data = {
+		.st = st,
+		.dentry = dentry,
+		.need_double_check = false,
+	};
 
 	if (new)
 		test = v9fs_test_new_inode_dotl;
 	else
 		test = v9fs_test_inode_dotl;
 
-	inode = iget5_locked(sb, QID2INO(qid), test, v9fs_set_inode_dotl, st);
+	if (dentry) {
+		/*
+		 * If we need to compare paths to find the inode to reuse, we need
+		 * to take the rename_sem for this FS.  We need to take it here,
+		 * instead of inside ino_path_compare, as iget5_locked has
+		 * spinlock in it (inode_hash_lock)
+		 */
+		down_read(&v9ses->rename_sem);
+	}
+	while (true) {
+		data.need_double_check = false;
+		inode = iget5_locked(sb, QID2INO(qid), test, v9fs_set_inode_dotl, &data);
+		if (!data.need_double_check)
+			break;
+		/*
+		 * Need to double check path as it wasn't initialized yet when we
+		 * tested it
+		 */
+		if (!inode || (inode->i_state & I_NEW)) {
+			WARN_ONCE(
+				1,
+				"Expected iget5_locked to return an existing inode");
+			break;
+		}
+		if (ino_path_compare(V9FS_I(inode)->path, dentry))
+			break;
+		iput(inode);
+	}
+	if (dentry)
+		up_read(&v9ses->rename_sem);
+
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 	if (!(inode->i_state & I_NEW))
@@ -125,6 +216,17 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 	if (retval)
 		goto error;
 
+	v9inode = V9FS_I(inode);
+	if (dentry) {
+		down_read(&v9ses->rename_sem);
+		v9inode->path = make_ino_path(dentry);
+		up_read(&v9ses->rename_sem);
+		if (!v9inode->path) {
+			retval = -ENOMEM;
+			goto error;
+		}
+	}
+
 	v9fs_stat2inode_dotl(st, inode, 0);
 	v9fs_set_netfs_context(inode);
 	v9fs_cache_inode_get_cookie(inode);
@@ -140,9 +242,18 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 
 }
 
+/**
+ * Issues a getattr request and uses the result to look up the inode for
+ * the target pointed to by @fid.
+ * @v9ses: session information
+ * @fid: fid to issue attribute request for
+ * @sb: superblock on which to create inode
+ * @dentry: if not NULL, the path of the provided dentry is compared
+ * against the path stored in the inode, to determine reuse eligibility.
+ */
 struct inode *
 v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, struct p9_fid *fid,
-			 struct super_block *sb, int new)
+			 struct super_block *sb, struct dentry *dentry, int new)
 {
 	struct p9_stat_dotl *st;
 	struct inode *inode = NULL;
@@ -151,7 +262,7 @@ v9fs_inode_from_fid_dotl(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	if (IS_ERR(st))
 		return ERR_CAST(st);
 
-	inode = v9fs_qid_iget_dotl(sb, &st->qid, fid, st, new);
+	inode = v9fs_qid_iget_dotl(sb, &st->qid, fid, st, dentry, new);
 	kfree(st);
 	return inode;
 }
@@ -305,7 +416,7 @@ v9fs_vfs_atomic_open_dotl(struct inode *dir, struct dentry *dentry,
 		p9_debug(P9_DEBUG_VFS, "p9_client_walk failed %d\n", err);
 		goto out;
 	}
-	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	if (IS_ERR(inode)) {
 		err = PTR_ERR(inode);
 		p9_debug(P9_DEBUG_VFS, "inode creation failed %d\n", err);
@@ -400,7 +511,7 @@ static struct dentry *v9fs_vfs_mkdir_dotl(struct mnt_idmap *idmap,
 	}
 
 	/* instantiate inode and assign the unopened fid to the dentry */
-	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	if (IS_ERR(inode)) {
 		err = PTR_ERR(inode);
 		p9_debug(P9_DEBUG_VFS, "inode creation failed %d\n",
@@ -838,7 +949,7 @@ v9fs_vfs_mknod_dotl(struct mnt_idmap *idmap, struct inode *dir,
 			 err);
 		goto error;
 	}
-	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
+	inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 	if (IS_ERR(inode)) {
 		err = PTR_ERR(inode);
 		p9_debug(P9_DEBUG_VFS, "inode creation failed %d\n",
diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 795c6388744c..bb9e66f4631e 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -141,7 +141,7 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags,
 		sb->s_d_flags |= DCACHE_DONTCACHE;
 	}
 
-	inode = v9fs_get_new_inode_from_fid(v9ses, fid, sb);
+	inode = v9fs_get_new_inode_from_fid(v9ses, fid, sb, NULL);
 	if (IS_ERR(inode)) {
 		retval = PTR_ERR(inode);
 		goto release_sb;
@@ -153,6 +153,17 @@ static struct dentry *v9fs_mount(struct file_system_type *fs_type, int flags,
 		goto release_sb;
 	}
 	sb->s_root = root;
+
+	if (v9fs_inode_ident_path(v9ses)) {
+		/*
+		 * This down_read is probably not necessary, just to satisfy
+		 * lockdep_assert
+		 */
+		down_read(&v9ses->rename_sem);
+		V9FS_I(inode)->path = make_ino_path(root);
+		up_read(&v9ses->rename_sem);
+	}
+
 	retval = v9fs_get_acl(inode, fid);
 	if (retval)
 		goto release_sb;
-- 
2.51.0


* [PATCH v2 2/7] fs/9p: add option for path-based inodes
  2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 1/7] fs/9p: Add ability to identify inode by path for .L in uncached mode Tingmao Wang
@ 2025-09-04  0:04 ` Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 3/7] fs/9p: Add ability to identify inode by path for non-.L in uncached mode Tingmao Wang
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-04  0:04 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mickaël Salaün
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	linux-fsdevel

By this point we have two ways to test for inode reuse - qid and qid+path.
By default, uncached mode uses qid+path and cached mode uses qid (and in
fact does not support qid+path).  This patch adds the option to control
the behaviour for uncached mode.

In a future version, if we can negotiate with the server and be sure that
it won't give us duplicate qid.path, the default for those cases can be
qid-based.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Cc: "Mickaël Salaün" <mic@digikod.net>
Cc: "Günther Noack" <gnoack@google.com>

---
Changes since v1:
- Removed inodeident=none and instead supports inodeident=qid.  This means
  that there is no longer an option to not re-use inodes at all.

- No longer supports inodeident=path on cached mode, checks added at
  option init time.

- Added explicit bits for both V9FS_INODE_IDENT_PATH and
  V9FS_INODE_IDENT_QID, in order to set a default based on cache bits when
  neither are set explicitly by the user.

 fs/9p/v9fs.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/9p/v9fs.h |  3 +++
 2 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/fs/9p/v9fs.c b/fs/9p/v9fs.c
index 77e9c4387c1d..f87d6680b85a 100644
--- a/fs/9p/v9fs.c
+++ b/fs/9p/v9fs.c
@@ -36,7 +36,7 @@ enum {
 	/* Options that take integer arguments */
 	Opt_debug, Opt_dfltuid, Opt_dfltgid, Opt_afid,
 	/* String options */
-	Opt_uname, Opt_remotename, Opt_cache, Opt_cachetag,
+	Opt_uname, Opt_remotename, Opt_cache, Opt_cachetag, Opt_inodeident,
 	/* Options that take no arguments */
 	Opt_nodevmap, Opt_noxattr, Opt_directio, Opt_ignoreqv,
 	/* Access options */
@@ -63,6 +63,7 @@ static const match_table_t tokens = {
 	{Opt_access, "access=%s"},
 	{Opt_posixacl, "posixacl"},
 	{Opt_locktimeout, "locktimeout=%u"},
+	{Opt_inodeident, "inodeident=%s"},
 	{Opt_err, NULL}
 };
 
@@ -149,6 +150,21 @@ int v9fs_show_options(struct seq_file *m, struct dentry *root)
 	if (v9ses->flags & V9FS_NO_XATTR)
 		seq_puts(m, ",noxattr");
 
+	switch (v9ses->flags & V9FS_INODE_IDENT_MASK) {
+	case V9FS_INODE_IDENT_QID:
+		seq_puts(m, ",inodeident=qid");
+		break;
+	case V9FS_INODE_IDENT_PATH:
+		seq_puts(m, ",inodeident=path");
+		break;
+	default:
+		/*
+		 * Unspecified, will be set later in v9fs_session_init depending on
+		 * cache setting
+		 */
+		break;
+	}
+
 	return p9_show_client_options(m, v9ses->clnt);
 }
 
@@ -369,6 +385,26 @@ static int v9fs_parse_options(struct v9fs_session_info *v9ses, char *opts)
 			v9ses->session_lock_timeout = (long)option * HZ;
 			break;
 
+		case Opt_inodeident:
+			s = match_strdup(&args[0]);
+			if (!s) {
+				ret = -ENOMEM;
+				p9_debug(P9_DEBUG_ERROR,
+					 "problem allocating copy of inodeident arg\n");
+				goto free_and_return;
+			}
+			v9ses->flags &= ~V9FS_INODE_IDENT_MASK;
+			if (strcmp(s, "qid") == 0) {
+				v9ses->flags |= V9FS_INODE_IDENT_QID;
+			} else if (strcmp(s, "path") == 0) {
+				v9ses->flags |= V9FS_INODE_IDENT_PATH;
+			} else {
+				ret = -EINVAL;
+				p9_debug(P9_DEBUG_ERROR, "Unknown inodeident argument %s\n", s);
+			}
+			kfree(s);
+			break;
+
 		default:
 			continue;
 		}
@@ -393,6 +429,7 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
 {
 	struct p9_fid *fid;
 	int rc = -ENOMEM;
+	bool cached;
 
 	v9ses->uname = kstrdup(V9FS_DEFUSER, GFP_KERNEL);
 	if (!v9ses->uname)
@@ -427,6 +464,26 @@ struct p9_fid *v9fs_session_init(struct v9fs_session_info *v9ses,
 	if (rc < 0)
 		goto err_clnt;
 
+	cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);
+
+	if (cached && v9ses->flags & V9FS_INODE_IDENT_PATH) {
+		rc = -EINVAL;
+		p9_debug(P9_DEBUG_ERROR,
+			 "inodeident=path not supported in cached mode\n");
+		goto err_clnt;
+	}
+
+	if (!(v9ses->flags & V9FS_INODE_IDENT_MASK)) {
+		/* Unspecified - use default */
+		if (cached) {
+			/* which is qid in cached mode (path not supported) */
+			v9ses->flags |= V9FS_INODE_IDENT_QID;
+		} else {
+			/* ...or path in uncached mode */
+			v9ses->flags |= V9FS_INODE_IDENT_PATH;
+		}
+	}
+
 	v9ses->maxdata = v9ses->clnt->msize - P9_IOHDRSZ;
 
 	if (!v9fs_proto_dotl(v9ses) &&
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index 134b55a605be..b4e738c1bba5 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -43,8 +43,11 @@ enum p9_session_flags {
 	V9FS_DIRECT_IO        = 0x100,
 	V9FS_SYNC             = 0x200,
 	V9FS_INODE_IDENT_PATH = 0x400,
+	V9FS_INODE_IDENT_QID  = 0x800,
 };
 
+#define V9FS_INODE_IDENT_MASK (V9FS_INODE_IDENT_PATH | V9FS_INODE_IDENT_QID)
+
 /**
  * enum p9_cache_shortcuts - human readable cache preferences
  * @CACHE_SC_NONE: disable all caches
-- 
2.51.0


* [PATCH v2 3/7] fs/9p: Add ability to identify inode by path for non-.L in uncached mode
  2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 1/7] fs/9p: Add ability to identify inode by path for .L in uncached mode Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 2/7] fs/9p: add option for path-based inodes Tingmao Wang
@ 2025-09-04  0:04 ` Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 4/7] fs/9p: .L: Refresh stale inodes on reuse Tingmao Wang
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-04  0:04 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mickaël Salaün
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	linux-fsdevel

This replicates the earlier .L patch for non-.L, and removes some
previously inserted conditionals in shared code.

Signed-off-by: Tingmao Wang <m@maowtm.org>

---
Changes since v1:
- Reflect v2 changes to the .L counterpart of this.

 fs/9p/v9fs.h      |   7 ++-
 fs/9p/vfs_inode.c | 150 ++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 130 insertions(+), 27 deletions(-)

diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index b4e738c1bba5..bacd0052e22c 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -202,7 +202,8 @@ extern int v9fs_vfs_rename(struct mnt_idmap *idmap,
 			   unsigned int flags);
 extern struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
 					 struct p9_fid *fid,
-					 struct super_block *sb, int new);
+					 struct super_block *sb,
+					 struct dentry *dentry, int new);
 extern const struct inode_operations v9fs_dir_inode_operations_dotl;
 extern const struct inode_operations v9fs_file_inode_operations_dotl;
 extern const struct inode_operations v9fs_symlink_inode_operations_dotl;
@@ -267,7 +268,7 @@ v9fs_get_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	if (v9fs_proto_dotl(v9ses))
 		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, dentry, 0);
 	else
-		return v9fs_inode_from_fid(v9ses, fid, sb, 0);
+		return v9fs_inode_from_fid(v9ses, fid, sb, dentry, 0);
 }
 
 /**
@@ -291,7 +292,7 @@ v9fs_get_new_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	if (v9fs_proto_dotl(v9ses))
 		return v9fs_inode_from_fid_dotl(v9ses, fid, sb, dentry, 1);
 	else
-		return v9fs_inode_from_fid(v9ses, fid, sb, 1);
+		return v9fs_inode_from_fid(v9ses, fid, sb, dentry, 1);
 }
 
 #endif
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 5e56c13da733..606760f966fd 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -364,29 +364,76 @@ void v9fs_evict_inode(struct inode *inode)
 		clear_inode(inode);
 }
 
+struct iget_data {
+	struct p9_wstat *st;
+
+	/* May be NULL */
+	struct dentry *dentry;
+
+	bool need_double_check;
+};
+
 static int v9fs_test_inode(struct inode *inode, void *data)
 {
 	int umode;
 	dev_t rdev;
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	struct p9_wstat *st = (struct p9_wstat *)data;
+	struct p9_wstat *st = ((struct iget_data *)data)->st;
+	struct dentry *dentry = ((struct iget_data *)data)->dentry;
 	struct v9fs_session_info *v9ses = v9fs_inode2v9ses(inode);
+	bool cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);
 
 	umode = p9mode2unixmode(v9ses, st, &rdev);
-	/* don't match inode of different type */
+	/*
+	 * Don't reuse inode of different type, even if path matches.
+	 */
 	if (inode_wrong_type(inode, umode))
 		return 0;
 
-	/* compare qid details */
-	if (memcmp(&v9inode->qid.version,
-		   &st->qid.version, sizeof(v9inode->qid.version)))
-		return 0;
-
 	if (v9inode->qid.type != st->qid.type)
 		return 0;
 
 	if (v9inode->qid.path != st->qid.path)
 		return 0;
+
+	if (cached) {
+		/*
+		 * Server side changes are not supposed to happen in cached mode.
+		 * If we fail this version comparison on the inode, we don't reuse
+		 * it.
+		 */
+		if (memcmp(&v9inode->qid.version,
+			&st->qid.version, sizeof(v9inode->qid.version)))
+			return 0;
+	}
+
+	if (v9fs_inode_ident_path(v9ses) && dentry) {
+		if (v9inode->path) {
+			if (!ino_path_compare(v9inode->path, dentry)) {
+				p9_debug(
+					P9_DEBUG_VFS,
+					"Refusing to reuse inode %p based on path mismatch",
+					inode);
+				return 0;
+			}
+		} else if (inode->i_state & I_NEW) {
+			/*
+			 * iget5_locked may call this function with a still
+			 * initializing (I_NEW) inode, so we're now racing with the
+			 * code in v9fs_qid_iget that prepares v9inode->path.
+			 * Returning from this test function now with positive result
+			 * will cause us to wait for this inode to be ready, and we
+			 * can then re-check in v9fs_qid_iget.
+			 */
+			((struct iget_data *)data)->need_double_check = true;
+		} else {
+			WARN_ONCE(
+				1,
+				"Inode %p (ino %lu) does not have v9inode->path even though fs has path-based inode identification enabled?",
+				inode, inode->i_ino);
+		}
+	}
+
 	return 1;
 }
 
@@ -395,33 +442,74 @@ static int v9fs_test_new_inode(struct inode *inode, void *data)
 	return 0;
 }
 
-static int v9fs_set_inode(struct inode *inode,  void *data)
+static int v9fs_set_inode(struct inode *inode, void *data)
 {
 	struct v9fs_inode *v9inode = V9FS_I(inode);
-	struct p9_wstat *st = (struct p9_wstat *)data;
+	struct iget_data *idata = data;
+	struct p9_wstat *st = idata->st;
 
 	memcpy(&v9inode->qid, &st->qid, sizeof(st->qid));
+	/*
+	 * We can't fill v9inode->path here, because allocating an ino_path
+	 * means that we might sleep, and we can't sleep here.
+	 */
+	v9inode->path = NULL;
 	return 0;
 }
 
-static struct inode *v9fs_qid_iget(struct super_block *sb,
-				   struct p9_qid *qid,
-				   struct p9_wstat *st,
+static struct inode *v9fs_qid_iget(struct super_block *sb, struct p9_qid *qid,
+				   struct p9_wstat *st, struct dentry *dentry,
 				   int new)
 {
 	dev_t rdev;
 	int retval;
 	umode_t umode;
 	struct inode *inode;
+	struct v9fs_inode *v9inode;
 	struct v9fs_session_info *v9ses = sb->s_fs_info;
 	int (*test)(struct inode *inode, void *data);
+	struct iget_data data = {
+		.st = st,
+		.dentry = dentry,
+		.need_double_check = false,
+	};
 
 	if (new)
 		test = v9fs_test_new_inode;
 	else
 		test = v9fs_test_inode;
 
-	inode = iget5_locked(sb, QID2INO(qid), test, v9fs_set_inode, st);
+	if (dentry) {
+		/*
+		 * If we need to compare paths to find the inode to reuse, we need
+		 * to take the rename_sem for this FS.  We need to take it here,
+		 * instead of inside ino_path_compare, as iget5_locked has
+		 * spinlock in it (inode_hash_lock)
+		 */
+		down_read(&v9ses->rename_sem);
+	}
+	while (true) {
+		data.need_double_check = false;
+		inode = iget5_locked(sb, QID2INO(qid), test, v9fs_set_inode, &data);
+		if (!data.need_double_check)
+			break;
+		/*
+		 * Need to double check path as it wasn't initialized yet when we
+		 * tested it
+		 */
+		if (!inode || (inode->i_state & I_NEW)) {
+			WARN_ONCE(
+				1,
+				"Expected iget5_locked to return an existing inode");
+			break;
+		}
+		if (ino_path_compare(V9FS_I(inode)->path, dentry))
+			break;
+		iput(inode);
+	}
+	if (dentry)
+		up_read(&v9ses->rename_sem);
+
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 	if (!(inode->i_state & I_NEW))
@@ -437,6 +525,16 @@ static struct inode *v9fs_qid_iget(struct super_block *sb,
 	if (retval)
 		goto error;
 
+	v9inode = V9FS_I(inode);
+	if (dentry) {
+		down_read(&v9ses->rename_sem);
+		v9inode->path = make_ino_path(dentry);
+		up_read(&v9ses->rename_sem);
+		if (!v9inode->path) {
+			retval = -ENOMEM;
+			goto error;
+		}
+	}
 	v9fs_stat2inode(st, inode, sb, 0);
 	v9fs_set_netfs_context(inode);
 	v9fs_cache_inode_get_cookie(inode);
@@ -448,9 +546,18 @@ static struct inode *v9fs_qid_iget(struct super_block *sb,
 
 }
 
-struct inode *
-v9fs_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
-		    struct super_block *sb, int new)
+/**
+ * Issues a getattr request and uses the result to look up the inode for
+ * the target pointed to by @fid.
+ * @v9ses: session information
+ * @fid: fid to issue attribute request for
+ * @sb: superblock on which to create inode
+ * @dentry: if not NULL, the path of the provided dentry is compared
+ * against the path stored in the inode, to determine reuse eligibility.
+ */
+struct inode *v9fs_inode_from_fid(struct v9fs_session_info *v9ses,
+				  struct p9_fid *fid, struct super_block *sb,
+				  struct dentry *dentry, int new)
 {
 	struct p9_wstat *st;
 	struct inode *inode = NULL;
@@ -459,7 +566,7 @@ v9fs_inode_from_fid(struct v9fs_session_info *v9ses, struct p9_fid *fid,
 	if (IS_ERR(st))
 		return ERR_CAST(st);
 
-	inode = v9fs_qid_iget(sb, &st->qid, st, new);
+	inode = v9fs_qid_iget(sb, &st->qid, st, dentry, new);
 	p9stat_free(st);
 	kfree(st);
 	return inode;
@@ -608,18 +715,14 @@ v9fs_create(struct v9fs_session_info *v9ses, struct inode *dir,
 				   "p9_client_walk failed %d\n", err);
 			goto error;
 		}
-		/*
-		 * Instantiate inode.  On .L fs, pass in dentry for inodeident=path.
-		 */
-		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb,
-			v9fs_proto_dotl(v9ses) ? dentry : NULL);
+		/* instantiate inode and assign the unopened fid to the dentry */
+		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb, dentry);
 		if (IS_ERR(inode)) {
 			err = PTR_ERR(inode);
 			p9_debug(P9_DEBUG_VFS,
 				   "inode creation failed %d\n", err);
 			goto error;
 		}
-		/* Assign the unopened fid to the dentry */
 		v9fs_fid_add(dentry, &fid);
 		d_instantiate(dentry, inode);
 	}
@@ -1415,4 +1518,3 @@ static const struct inode_operations v9fs_symlink_inode_operations = {
 	.getattr = v9fs_vfs_getattr,
 	.setattr = v9fs_vfs_setattr,
 };
-
-- 
2.51.0


* [PATCH v2 4/7] fs/9p: .L: Refresh stale inodes on reuse
  2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (2 preceding siblings ...)
  2025-09-04  0:04 ` [PATCH v2 3/7] fs/9p: Add ability to identify inode by path for non-.L in uncached mode Tingmao Wang
@ 2025-09-04  0:04 ` Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 5/7] fs/9p: non-.L: " Tingmao Wang
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-04  0:04 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mickaël Salaün
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	linux-fsdevel

This uses the stat struct we already got as part of the lookup process to
refresh the inode "for free".

Only for uncached mode for now.  We will need to revisit this for cached
mode once we sort out reusing an old inode with a changed qid.version.
(Currently this is not done in this series, which is fine unless a
server-side change happens, which is not supposed to happen in cached mode
anyway.)

Note that v9fs_test_inode_dotl already makes sure we don't reuse
inodes of the wrong type or different qid.

Signed-off-by: Tingmao Wang <m@maowtm.org>

---
Changes since v1:
- Check cache bits instead of using `new` - uncached mode now also has
  new=0.

 fs/9p/vfs_inode_dotl.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 86adaf5bcc0e..d008e82256ac 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -164,6 +164,7 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 		.dentry = dentry,
 		.need_double_check = false,
 	};
+	bool cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);
 
 	if (new)
 		test = v9fs_test_new_inode_dotl;
@@ -203,8 +204,21 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
-	if (!(inode->i_state & I_NEW))
+	if (!(inode->i_state & I_NEW)) {
+		/*
+		 * If we're returning an existing inode, we might as well refresh
+		 * it with the metadata we just got.  Refreshing the i_size also
+		 * prevents read errors.
+		 *
+		 * We only do this for uncached mode, since in cached mode, any
+		 * change on the inode will bump qid.version, which will result in
+		 * us getting a new inode in the first place.  If we got an old
+		 * inode, let's not touch it for now.
+		 */
+		if (!cached)
+			v9fs_stat2inode_dotl(st, inode, 0);
 		return inode;
+	}
 	/*
 	 * initialize the inode with the stat info
 	 * FIXME!! we may need support for stale inodes
-- 
2.51.0


* [PATCH v2 5/7] fs/9p: non-.L: Refresh stale inodes on reuse
  2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (3 preceding siblings ...)
  2025-09-04  0:04 ` [PATCH v2 4/7] fs/9p: .L: Refresh stale inodes on reuse Tingmao Wang
@ 2025-09-04  0:04 ` Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 6/7] fs/9p: update the target's ino_path on rename Tingmao Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-04  0:04 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mickaël Salaün
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	linux-fsdevel

This replicates the previous .L commit for non-.L

Signed-off-by: Tingmao Wang <m@maowtm.org>

---
Changes since v1:
- Check cache bits instead of using `new` - uncached mode now also has
  new=0.

 fs/9p/vfs_inode.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 606760f966fd..4b4712eafe4d 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -473,6 +473,7 @@ static struct inode *v9fs_qid_iget(struct super_block *sb, struct p9_qid *qid,
 		.dentry = dentry,
 		.need_double_check = false,
 	};
+	bool cached = v9ses->cache & (CACHE_META | CACHE_LOOSE);
 
 	if (new)
 		test = v9fs_test_new_inode;
@@ -512,8 +513,12 @@ static struct inode *v9fs_qid_iget(struct super_block *sb, struct p9_qid *qid,
 
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
-	if (!(inode->i_state & I_NEW))
+	if (!(inode->i_state & I_NEW)) {
+		/* See explanation in v9fs_qid_iget_dotl */
+		if (!cached)
+			v9fs_stat2inode(st, inode, sb, 0);
 		return inode;
+	}
 	/*
 	 * initialize the inode with the stat info
 	 * FIXME!! we may need support for stale inodes
-- 
2.51.0


* [PATCH v2 6/7] fs/9p: update the target's ino_path on rename
  2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (4 preceding siblings ...)
  2025-09-04  0:04 ` [PATCH v2 5/7] fs/9p: non-.L: " Tingmao Wang
@ 2025-09-04  0:04 ` Tingmao Wang
  2025-09-04  0:04 ` [PATCH v2 7/7] docs: fs/9p: Document the "inodeident" option Tingmao Wang
  2025-09-14 21:25 ` [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
  7 siblings, 0 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-04  0:04 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mickaël Salaün
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	linux-fsdevel

This makes it possible for the inode to "move along" to the new location
when a file under an inodeident=path 9pfs is moved, and it will be reused
on the next access to the new location.

Modifying the ino_path of children when renaming a directory is currently
not handled.  Renaming non-empty directories still works, but the children
won't have their inodes reused after renaming.

Inodes will also not be reused on server-side rename, since there is no
way for us to know about it.  From our perspective this is
indistinguishable from a new file being created in the destination that
just happened to have the same qid, and the original file being deleted.

Signed-off-by: Tingmao Wang <m@maowtm.org>
Cc: "Mickaël Salaün" <mic@digikod.net>
Cc: "Günther Noack" <gnoack@google.com>

---
New patch in v2

 fs/9p/ino_path.c       |  3 ++-
 fs/9p/v9fs.h           |  3 +++
 fs/9p/vfs_inode.c      | 30 ++++++++++++++++++++++++++++++
 fs/9p/vfs_inode_dotl.c |  6 ++++++
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/fs/9p/ino_path.c b/fs/9p/ino_path.c
index a03145e08a9d..ee4752b9f796 100644
--- a/fs/9p/ino_path.c
+++ b/fs/9p/ino_path.c
@@ -27,7 +27,8 @@ struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
 	struct dentry *curr = dentry;
 	ssize_t i;
 
-	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
+	/* Either read or write lock held is ok */
+	lockdep_assert_held(&v9fs_dentry2v9ses(dentry)->rename_sem);
 	might_sleep(); /* Allocation below might block */
 
 	rcu_read_lock();
diff --git a/fs/9p/v9fs.h b/fs/9p/v9fs.h
index bacd0052e22c..c441fa8e757b 100644
--- a/fs/9p/v9fs.h
+++ b/fs/9p/v9fs.h
@@ -157,6 +157,9 @@ struct v9fs_inode {
 	/*
 	 * Stores the path of the file this inode is for, only for filesystems
 	 * with inode_ident=path.  Lifetime is the same as this inode.
+	 * Read/write to this pointer should be under the target v9fs's
+	 * rename_sem to protect against races (except when initializing or
+	 * freeing an inode, at which point nobody else has a reference to us)
 	 */
 	struct v9fs_ino_path *path;
 };
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 4b4712eafe4d..68a1837ff3dc 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -532,6 +532,12 @@ static struct inode *v9fs_qid_iget(struct super_block *sb, struct p9_qid *qid,
 
 	v9inode = V9FS_I(inode);
 	if (dentry) {
+		/*
+		 * In order to call make_ino_path, we need at least a read lock on
+		 * the rename_sem.  Since we are initializing a new inode, there is
+		 * no risk of races with another task trying to write to
+		 * v9inode->path, so we do not need an actual down_write.
+		 */
 		down_read(&v9ses->rename_sem);
 		v9inode->path = make_ino_path(dentry);
 		up_read(&v9ses->rename_sem);
@@ -983,18 +989,21 @@ v9fs_vfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 {
 	int retval;
 	struct inode *old_inode;
+	struct v9fs_inode *old_v9inode;
 	struct inode *new_inode;
 	struct v9fs_session_info *v9ses;
 	struct p9_fid *oldfid = NULL, *dfid = NULL;
 	struct p9_fid *olddirfid = NULL;
 	struct p9_fid *newdirfid = NULL;
 	struct p9_wstat wstat;
+	struct v9fs_ino_path *new_ino_path = NULL;
 
 	if (flags)
 		return -EINVAL;
 
 	p9_debug(P9_DEBUG_VFS, "\n");
 	old_inode = d_inode(old_dentry);
+	old_v9inode = V9FS_I(old_inode);
 	new_inode = d_inode(new_dentry);
 	v9ses = v9fs_inode2v9ses(old_inode);
 	oldfid = v9fs_fid_lookup(old_dentry);
@@ -1022,6 +1031,17 @@ v9fs_vfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	}
 
 	down_write(&v9ses->rename_sem);
+	if (v9fs_inode_ident_path(v9ses)) {
+		/*
+		 * Try to allocate this first, and don't actually do rename if
+		 * allocation fails.
+		 */
+		new_ino_path = make_ino_path(new_dentry);
+		if (!new_ino_path) {
+			retval = -ENOMEM;
+			goto error_locked;
+		}
+	}
 	if (v9fs_proto_dotl(v9ses)) {
 		retval = p9_client_renameat(olddirfid, old_dentry->d_name.name,
 					    newdirfid, new_dentry->d_name.name);
@@ -1061,6 +1081,15 @@ v9fs_vfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 		v9fs_invalidate_inode_attr(old_inode);
 		v9fs_invalidate_inode_attr(old_dir);
 		v9fs_invalidate_inode_attr(new_dir);
+		if (v9fs_inode_ident_path(v9ses)) {
+			/*
+			 * We currently have rename_sem write lock, which protects all
+			 * v9inode->path in this fs.
+			 */
+			free_ino_path(old_v9inode->path);
+			old_v9inode->path = new_ino_path;
+			new_ino_path = NULL;
+		}
 
 		/* successful rename */
 		d_move(old_dentry, new_dentry);
@@ -1068,6 +1097,7 @@ v9fs_vfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 	up_write(&v9ses->rename_sem);
 
 error:
+	free_ino_path(new_ino_path);
 	p9_fid_put(newdirfid);
 	p9_fid_put(olddirfid);
 	p9_fid_put(oldfid);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index d008e82256ac..a3f70dd422fb 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -232,6 +232,12 @@ static struct inode *v9fs_qid_iget_dotl(struct super_block *sb,
 
 	v9inode = V9FS_I(inode);
 	if (dentry) {
+		/*
+		 * In order to call make_ino_path, we need at least a read lock on
+		 * the rename_sem.  Since we are initializing a new inode, there is
+		 * no risk of races with another task trying to write to
+		 * v9inode->path, so we do not need an actual down_write.
+		 */
 		down_read(&v9ses->rename_sem);
 		v9inode->path = make_ino_path(dentry);
 		up_read(&v9ses->rename_sem);
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 7/7] docs: fs/9p: Document the "inodeident" option
  2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (5 preceding siblings ...)
  2025-09-04  0:04 ` [PATCH v2 6/7] fs/9p: update the target's ino_path on rename Tingmao Wang
@ 2025-09-04  0:04 ` Tingmao Wang
  2025-09-14 21:25 ` [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
  7 siblings, 0 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-04  0:04 UTC (permalink / raw)
  To: Dominique Martinet, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, Mickaël Salaün
  Cc: Tingmao Wang, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	linux-fsdevel

Add a row for this option in the Options table.

Signed-off-by: Tingmao Wang <m@maowtm.org>

---
New patch in v2

 Documentation/filesystems/9p.rst | 42 ++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/Documentation/filesystems/9p.rst b/Documentation/filesystems/9p.rst
index be3504ca034a..8b570a7ae698 100644
--- a/Documentation/filesystems/9p.rst
+++ b/Documentation/filesystems/9p.rst
@@ -238,6 +238,48 @@ Options
   cachetag	cache tag to use the specified persistent cache.
 		cache tags for existing cache sessions can be listed at
 		/sys/fs/9p/caches. (applies only to cache=fscache)
+
+  inodeident	this setting controls how inodes on this filesystem are
+		identified and reused.  This is most relevant when used
+		with features like Landlock and fanotify (in inode mark
+		mode), which hold a reference to a specific inode and rely
+		on further accesses to the same file resolving to that
+		same inode.
+
+		There are 2 possible values:
+			qid
+				This is the default and the only possible
+				option if loose or metadata cache is
+				enabled.  In this mode, 9pfs assumes that
+				the server will not present different
+				files with the same inode number, and will
+				use the presented inode number to look up
+				inodes.  For QEMU users, this can be
+				ensured by setting multidevs=remap.  If
+				the server does present inode number
+				collisions, this may lead to unpredictable
+				behaviour when both files are accessed.
+			path
+				This is the default if neither loose nor
+				metadata cache bits are enabled.  This
+				option causes 9pfs to internally track the
+				file path that an inode originated from, and
+				to only reuse an existing inode (instead of
+				allocating a new one) if the path matches,
+				even if the file's inode number matches that
+				of an existing inode.
+
+		.. note::
+			When a directory is renamed or moved,
+			inodeident=path mode currently does not update its
+			children's inodes to point to the new path.  Further
+			access to them via the new location will therefore
+			use newly allocated inodes, and existing inode marks
+			placed by Landlock or fanotify on them will no
+			longer work.
+
+			The inode path for the target being renamed itself
+			(not its children) *is* updated, however.
   ============= ===============================================================
 
 Behavior
-- 
2.51.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
                   ` (6 preceding siblings ...)
  2025-09-04  0:04 ` [PATCH v2 7/7] docs: fs/9p: Document the "inodeident" option Tingmao Wang
@ 2025-09-14 21:25 ` Tingmao Wang
  2025-09-15 12:53   ` Dominique Martinet
  7 siblings, 1 reply; 27+ messages in thread
From: Tingmao Wang @ 2025-09-14 21:25 UTC (permalink / raw)
  To: Dominique Martinet, Mickaël Salaün
  Cc: Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck,
	v9fs, Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, Al Viro, Christian Brauner,
	linux-fsdevel

Hi Dominique and others,

I had a chat with Mickaël earlier this week and some discussion following
that, and we thought of a potential alternative to what I was proposing
here that might work for Landlock: using the inode number (or more
correctly, qid.path) directly as the keys for Landlock rules when
accessing 9p files.  I'm not sure how sound this is from the perspective
of 9pfs (there are pros and caveats), and I would like to gather some
thoughts on this idea.

Technically a 9pfs server is not supposed to return colliding qid.paths
for different files.  In fact, according to [1], the qid must not be the
same even for files which are deleted then recreated using the same name
(whereas for e.g. ext4, inode number is reused if a file is deleted and
recreated, possibly with a different name, in the same directory).
However, this is in practice not the case for many actual 9pfs server
implementations (thus the reason for this patch series in the first
place).

This is a bad problem for the 9pfs client in Linux as it can lead to data
corruption if the wrong inode is used, but for Landlock, the only effect
of this is allowing access to more files than the sandboxing application
intended (and only in the presence of an "erroneous" 9pfs server).  Any
other alternative, including this patch series, has the opposite risk -
files that should be allowed might be denied (even if the server
implementation is fully correct in terms of not reusing qids).  In
particular, this patch cannot correctly handle server-side renames of an
allowed file, or renames of a directory with children in it from the client
(although this might be solved, at the expense of adding more complicated
code in the rename path to rewrite all the struct ino_paths).

In discussion with Mickaël he thought that it would be acceptable for
Landlock to assume that the server is well-behaved, and Landlock could
specialize for 9pfs to allow access if the qid matches what's previously
seen when creating the Landlock ruleset (by using the qid as the key of
the rule, instead of a pointer to the inode).

There are, however, several immediate issues with this approach:

1. The qid is 9pfs internal data, and we may need extra API for 9pfs to
   expose this to Landlock.  On 64bit, this is easy as it's just the inode
   number (offset by 2), which we can already get from the struct inode.
   But perhaps on 32bit we need a way to expose the full 64bit server-sent
   qid to Landlock (or other kernel subsystems), if we're going to do
   this.

2. Even though qids are supposed to be unique across the lifetime of a
   filesystem (including deleted files), this is not the case even for
   QEMU in multidevs=remap mode, when running on ext4, as tested on QEMU
   10.1.0.  And thus in practice a Landlock ruleset would need to hold a
   reference to the file to keep it open, so that the server will not
   re-use the qid for other files (having a reference to the struct inode
   alone doesn't seem to do that).

   Unfortunately, holding a dentry in Landlock prevents the filesystem
   from being unmounted (causes WARNs), with no (proper) chance for
   Landlock to release those dentries.  We might do it in
   security_sb_umount, but then at that point it is not guaranteed that
   the unmount will happen - perhaps we would need a new security_ hook
   in the umount path?

   Alternatively, I think if we could somehow tell 9pfs to keep a fid open
   (until either the Landlock domain is closed, or the filesystem is
   unmounted), it could also work.

   I'm not sure what the best way to do this is.  It seems like unless we
   can get a new pre_umount / pre_sb_delete hook in which we can free
   dentries, 9pfs would need to expose some new API, or alternatively, in
   uncached mode, have the v9fs inode itself hold a (strong) reference to
   the fid, so that if Landlock has a reference to the inode, the file is
   kept open server-side.

The advantage of doing this is that, for a server with reasonable
behaviour, Landlock users would not get incorrect denials (i.e. things
"just work") while security is still maintained (in particular, an
application sandboxed under Landlock would not get access to unrelated
files unless it has some way to get those files recreated with an allowed
inode number).  The current patch, by contrast, has problems with
server-side renames and directory renames (server or client side), and
also can't deal with hard links.

I'm not sure how attractive this solution is to various people here -
Mickaël is happy with special-casing 9pfs in Landlock, and in fact he
suggested this idea in the first place, but I think this has the potential
to be quite complicated (but technically more correct).  It would also
only work for Landlock, and if e.g. fsnotify wants to have the same
behaviour, that would need its own changes too.

Apologies for the long-winded explanation, any thoughts on this?


[1]: https://ericvh.github.io/9p-rfc/rfc9p2000.html#msgs
     "If a file is deleted and recreated with the same name in the same
     directory, the old and new path components of the qids should be
     different."

---

Note: Even with the above, there's another potential problem - QEMU does
not, for some reason (I've not really investigated this very deeply, but
it's an observation from /proc/.../fd), keep a directory open when the guest
has a fid to it.  This means that if a directory is deleted while we have
an active Landlock rule on it, a new file or directory may get the same
qid.  (However, at least this still correctly handles directory renames,
and the only effect is Landlock allowing more files than intended in the
presence of a buggy server.)

(The Hyper-V 9p server, used by WSL, seems to have the same problem, and a
bit worse since even client-side renames break open dir fds on the
WSL-to-Windows 9pfs (/mnt/c/...))

(Another challenge is that Landlock would have to know when a file is on a
9pfs in uncached mode - we probably don't need this behaviour for cached
mode filesystems, as we assume no server changes in that case and the
inode is reused already.  We can certainly determine the FS of a file, but
I'm not sure about the specific 9pfs cache options)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-14 21:25 ` [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
@ 2025-09-15 12:53   ` Dominique Martinet
  2025-09-15 13:44     ` Tingmao Wang
  2025-09-15 14:10     ` Christian Schoenebeck
  0 siblings, 2 replies; 27+ messages in thread
From: Dominique Martinet @ 2025-09-15 12:53 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Mickaël Salaün, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, Christian Brauner, linux-fsdevel

Hi Tingmao,

thanks for pushing this forward, I still have very little time so
haven't been able to review this properly

Tingmao Wang wrote on Sun, Sep 14, 2025 at 10:25:02PM +0100:
> I had a chat with Mickaël earlier this week and some discussion following
> that, and we thought of a potential alternative to what I was proposing
> here that might work for Landlock: using the inode number (or more
> correctly, qid.path) directly as the keys for Landlock rules when
> accessing 9p files.  I'm not sure how sound this is from the perspective
> of 9pfs (there are pros and caveats), and I would like to gather some
> thoughts on this idea.

I'm honestly split on this:
- I really don't like tracking the full path of each file around;
there are various corner cases with files being removed (possibly server
side!) or hard links; and it's potentially slowing down all operations a
bit...
- OTOH as you pointed out qid isn't as reliable, and having file paths
around opens the way to rebuilding fids on reconnect for non-local
servers, which could potentially be interesting (not that I ever see
myself having time to work on this as I no longer have any stake there,
I just know that would have interested my previous employer when they
were still using 9p/rdma...)

> In discussion with Mickaël he thought that it would be acceptable for
> Landlock to assume that the server is well-behaved, and Landlock could
> specialize for 9pfs to allow access if the qid matches what's previously
> seen when creating the Landlock ruleset (by using the qid as the key of
> the rule, instead of a pointer to the inode).

I'm not familiar at all with landlock so forgive this question: what is
this key about exactly?
When a program loads a ruleset, paths referred in that ruleset are
looked up by the kernel and the inodes involved kept around in some hash
table for lookup on further accesses?

I'm fuzzy on the details but I don't see how inode pointers would be
stable for other filesystems as well, what prevents
e.g. vm.drop_caches=3 to drop these inodes on ext4?

In general I'd see the file handle (as exposed to userspace by
name_to_handle_at) as a stable key, that works for all filesystems
supporting fhandles (... so, not 9p, right... But in general it's
something like inode number + generation, and we could expose that as
handle and "just" return ENOTSUP on open_by_handle_at if that helps)

Although looking at the patches what 9p seems to need isn't a new stable
handle, but "just" not allocating new inodes in iget5...
This was attempted in 724a08450f74 ("fs/9p: simplify iget to remove
unnecessary paths"), but later reverted in be2ca3825372 ("Revert "fs/9p:
simplify iget to remove unnecessary paths"") because it broke too many
users, but if you're comfortable with a new mount option for the lookup
by path I think we could make a new option saying
"yes_my_server_has_unique_qids"... Which I assume would work for
landlock/fsnotify?

If you'd like to try, you can re-revert these 4 patches:
Fixes: be2ca3825372 ("Revert "fs/9p: simplify iget to remove unnecessary paths"")
Fixes: 26f8dd2dde68 ("Revert "fs/9p: fix uaf in in v9fs_stat2inode_dotl"")
Fixes: fedd06210b14 ("Revert "fs/9p: remove redundant pointer v9ses"")
Fixes: f69999b5f9b4 ("Revert " fs/9p: mitigate inode collisions"")

If that works, and having this only work when a non-default option is
set is acceptable, I think that's as good a way forward as we'll find.

> 1. The qid is 9pfs internal data, and we may need extra API for 9pfs to
>    expose this to Landlock.  On 64bit, this is easy as it's just the inode
>    number (offset by 2), which we can already get from the struct inode.
>    But perhaps on 32bit we need a way to expose the full 64bit server-sent
>    qid to Landlock (or other kernel subsystems), if we're going to do
>    this.

I'm not sure how much effort we want to spend on 32bit: as far as I
know, if we have inode number collisions on 32 bit we're already in
trouble (tools like tar will consider such files to be hardlinks of each
other and happily skip reading data, producing corrupted archives);
this is not a happy state but I don't know how to do better in any
reasonable way, so we can probably keep a similar limitation for 32bit
and use inode number directly...

> 2. Even though qids are supposed to be unique across the lifetime of a
>    filesystem (including deleted files), this is not the case even for
>    QEMU in multidevs=remap mode, when running on ext4, as tested on QEMU
>    10.1.0.

I'm not familiar with the qid remap implementation in qemu, but I'm
curious in what case you hit that.
Deleting and recreating files? Or as you seem to say below the 'qid' is
"freed" when fd is closed qemu-side and re-used by later open of other
files?

If this is understood I think this can be improved; reusing the qid on
different files could yield problems with caching as well, so I think
it's something that warrants investigation.

>    Unfortunately, holding a dentry in Landlock prevents the filesystem
>    from being unmounted (causes WARNs), with no (proper) chance for
>    Landlock to release those dentries.  We might do it in
>    security_sb_umount, but then at that point it is not guaranteed that
>    the unmount will happen - perhaps we would need a new security_ hooks
>    in the umount path?

Hmm yeah that is problematic, I don't see how to take "weak" refs that
wouldn't cause a warning for the umount to free yet still prevent
recycling the inode, so another hook to free up resources when really
umounting sounds appropriate if we go that way... At least umount isn't
usually performance sensitive so it would probably be acceptable?


-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-15 12:53   ` Dominique Martinet
@ 2025-09-15 13:44     ` Tingmao Wang
  2025-09-15 23:31       ` Dominique Martinet
  2025-09-15 14:10     ` Christian Schoenebeck
  1 sibling, 1 reply; 27+ messages in thread
From: Tingmao Wang @ 2025-09-15 13:44 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Mickaël Salaün, Eric Van Hensbergen, Latchesar Ionkov,
	Christian Schoenebeck, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, Christian Brauner, linux-fsdevel

On 9/15/25 13:53, Dominique Martinet wrote:
> Hi Tingmao,
> 
> thanks for pushing this forward, I still have very little time so
> haven't been able to review this properly

No worries and thanks for the quick reply :)

> 
> Tingmao Wang wrote on Sun, Sep 14, 2025 at 10:25:02PM +0100:
>> I had a chat with Mickaël earlier this week and some discussion following
>> that, and we thought of a potential alternative to what I was proposing
>> here that might work for Landlock: using the inode number (or more
>> correctly, qid.path) directly as the keys for Landlock rules when
>> accessing 9p files.  I'm not sure how sound this is from the perspective
>> of 9pfs (there are pros and caveats), and I would like to gather some
>> thoughts on this idea.
> 
> I'm honestly split on this:
> - I really don't like tracking the full path of each file around;
> there are various corner cases with files being removed (possibly server
> side!) or hard links; and it's potentially slowing down all operations a
> bit...

The way I see it, this tracking is really a best-effort, pragmatic
solution, and in the presence of server-side changes, it's not going to
always be correct (but then, because of the possibility of qid collisions,
there is not really a fully fool-proof way in any case).  In some sense
hard links currently don't "work" anyway in uncached mode since each
inode is separate - this doesn't change that.  A file being removed is not
a problem, and if another file with the same name is recreated with a
different qid, we will create a new inode anyway (since this patch really
matches against path _and_ qid).  I've not measured the slowdown, but I
think in uncached mode the time is mostly dominated by communication
overhead in any case.

However, alternatives do exist (as discussed above and below)

> - OTOH as you pointed out qid isn't as reliable, and having file paths
> around opens the way to rebuilding fids on reconnect for non-local
> servers, which could potentially be interesting (not that I ever see
> myself having time to work on this as I no longer have any stake there,
> I just know that would have interested my previous employer when they
> were still using 9p/rdma...)

If you / anyone else would be interested in that I can implement it, but
that would probably be best done by landing the ino_path struct in this
patchset first.

> 
>> In discussion with Mickaël he thought that it would be acceptable for
>> Landlock to assume that the server is well-behaved, and Landlock could
>> specialize for 9pfs to allow access if the qid matches what's previously
>> seen when creating the Landlock ruleset (by using the qid as the key of
>> the rule, instead of a pointer to the inode).
> 
> I'm not familiar at all with landlock so forgive this question: what is
> this key about exactly?
> When a program loads a ruleset, paths referred in that ruleset are
> looked up by the kernel and the inodes involved kept around in some hash
> table for lookup on further accesses?

Yes, that is correct (but it currently uses an rbtree, not a hash table).

> 
> I'm fuzzy on the details but I don't see how inode pointers would be
> stable for other filesystems as well, what prevents
> e.g. vm.drop_caches=3 to drop these inodes on ext4?

Landlock holds a reference to the inodes in the ruleset, so they shouldn't
be dropped.  On security_sb_delete Landlock will iput those inodes so they
won't cause issues with unmounting.  There is a special mechanism
("landlock objects") to decouple the rulesets themselves from the actual
inodes, so that previously Landlocked things can keep running even after
the inode has disappeared as a result of unmounting.

> 
> In general I'd see the file handle (as exposed to userspace by
> name_to_handle_at) as a stable key, that works for all filesystems
> supporting fhandles (... so, not 9p, right... But in general it's
> something like inode number + generation, and we could expose that as
> handle and "just" return ENOTSUP on open_by_handle_at if that helps)

Hmm, I think this would be a good way for 9pfs to expose the qid to
Landlock, by exposing it as a handle, since that is standardized.
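
(As a very rough sketch of what that could look like on the 9pfs side --
entirely hypothetical, fs/9p has no export_operations today, and the
function name and fileid type below are made up:)

/* Hypothetical sketch: pack the server-sent qid into a file handle so
 * that name_to_handle_at() (or an in-kernel user like Landlock) can see
 * it.  *max_len is counted in 32-bit words, as for other encode_fh
 * implementations. */
#include <linux/exportfs.h>
#include "v9fs.h"

static int v9fs_encode_fh(struct inode *inode, __u32 *fh, int *max_len,
			  struct inode *parent)
{
	struct v9fs_inode *v9inode = V9FS_I(inode);

	if (*max_len < 3) {
		*max_len = 3;
		return FILEID_INVALID;
	}

	fh[0] = lower_32_bits(v9inode->qid.path);
	fh[1] = upper_32_bits(v9inode->qid.path);
	fh[2] = v9inode->qid.version;
	*max_len = 3;

	return 0x9f; /* made-up fileid type, for illustration only */
}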

> 
> Although looking at the patches what 9p seems to need isn't a new stable
> handle, but "just" not allocating new inodes in iget5...
> This was attempted in 724a08450f74 ("fs/9p: simplify iget to remove
> unnecessary paths"), but later reverted in be2ca3825372 ("Revert "fs/9p:
> simplify iget to remove unnecessary paths"") because it broke too many
> users, but if you're comfortable with a new mount option for the lookup
> by path I think we could make a new option saying
> "yes_my_server_has_unique_qids"... Which I assume would work for
> landlock/fsnotify?

I noticed that, but assumed that simply reverting them without additional
work (such as tracking the string path) would be a no-go given the reason
why they were reverted.

> 
> If you'd like to try, you can re-revert these 4 patches:
> Fixes: be2ca3825372 ("Revert "fs/9p: simplify iget to remove unnecessary paths"")
> Fixes: 26f8dd2dde68 ("Revert "fs/9p: fix uaf in in v9fs_stat2inode_dotl"")
> Fixes: fedd06210b14 ("Revert "fs/9p: remove redundant pointer v9ses"")
> Fixes: f69999b5f9b4 ("Revert " fs/9p: mitigate inode collisions"")
> 
> If that works, and having this only work when a non-default option is
> set is acceptable, I think that's as good a way forward as we'll find.

Well, if you think there is no other possibility for a default solution
(and tracking paths by default is not feasible) I think it might also be
alright if we expose the qid as a handle to Landlock (without any need for
mount options), and figure out a way for Landlock to keep a fid open.

> 
>> 1. The qid is 9pfs internal data, and we may need extra API for 9pfs to
>>    expose this to Landlock.  On 64bit, this is easy as it's just the inode
>>    number (offset by 2), which we can already get from the struct inode.
>>    But perhaps on 32bit we need a way to expose the full 64bit server-sent
>>    qid to Landlock (or other kernel subsystems), if we're going to do
>>    this.
> 
> I'm not sure how much effort we want to spend on 32bit: as far as I
> know, if we have inode number collision on 32 bit we're already in
> trouble (tools like tar will consider such files to be hardlink of each
> other and happily skip reading data, producing corrupted archives);
> this is not a happy state but I don't know how to do better in any
> reasonable way, so we can probably keep a similar limitation for 32bit
> and use inode number directly...

I think if 9pfs exports a handle it can be the full 64bit qid.path on any
platform, right?

> 
>> 2. Even though qids are supposed to be unique across the lifetime of a
>>    filesystem (including deleted files), this is not the case even for
>>    QEMU in multidevs=remap mode, when running on ext4, as tested on QEMU
>>    10.1.0.
> 
> I'm not familiar with the qid remap implementation in qemu, but I'm
> curious in what case you hit that.
> Deleting and recreating files? Or as you seem to say below the 'qid' is
> "freed" when fd is closed qemu-side and re-used by later open of other
> files?

I tried mounting a qemu-exported 9pfs backed on ext4, with
multidevs=remap, and created a file, used stat to note its inode number,
deleted the file, created another file (of the same OR different name),
and that new file will have the same inode number.

(If I don't delete the file, then a newly created file would of course
have a different ext4 inode number, and in that case QEMU exposes a
different qid)
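
(For concreteness, the observation boils down to roughly the following
check run inside the guest -- just a sketch; /mnt is assumed to be the
qemu-exported multidevs=remap 9p mount, /mnt/a and /mnt/b are made-up
paths, and error handling is omitted:)

/* Create a file, note its st_ino, delete it, create another file and
 * compare: on the setup described above the two inode numbers come out
 * the same. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	struct stat st1, st2;
	int fd;

	fd = open("/mnt/a", O_CREAT | O_WRONLY, 0644);
	fstat(fd, &st1);
	close(fd);
	unlink("/mnt/a");

	fd = open("/mnt/b", O_CREAT | O_WRONLY, 0644);
	fstat(fd, &st2);
	close(fd);

	printf("first ino %llu, second ino %llu (%s)\n",
	       (unsigned long long)st1.st_ino,
	       (unsigned long long)st2.st_ino,
	       st1.st_ino == st2.st_ino ? "reused" : "distinct");
	return 0;
}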

> 
> If this is understood I think this can be improved, reusing the qid on
> different files could yield problems with caching as well so I think
> it's something that warrants investigations.
> 
>>    Unfortunately, holding a dentry in Landlock prevents the filesystem
>>    from being unmounted (causes WARNs), with no (proper) chance for
>>    Landlock to release those dentries.  We might do it in
>>    security_sb_umount, but then at that point it is not guaranteed that
>>    the unmount will happen - perhaps we would need a new security_ hooks
>>    in the umount path?
> 
> Hmm yeah that is problematic, I don't see how to take "weak" refs that
> wouldn't cause a warning for the umount to free yet still prevent
> recycling the inode, so another hook to free up resources when really
> umounting sounds appropriate if we go that way... At least umount isn't
> usually performance sensitive so it would probably be acceptable?
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-15 12:53   ` Dominique Martinet
  2025-09-15 13:44     ` Tingmao Wang
@ 2025-09-15 14:10     ` Christian Schoenebeck
  2025-09-17 15:00       ` Mickaël Salaün
  1 sibling, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2025-09-15 14:10 UTC (permalink / raw)
  To: Tingmao Wang, Dominique Martinet
  Cc: Mickaël Salaün, Eric Van Hensbergen, Latchesar Ionkov,
	v9fs, Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, Al Viro, Christian Brauner,
	linux-fsdevel

On Monday, September 15, 2025 2:53:14 PM CEST Dominique Martinet wrote:
[...]
> > 1. The qid is 9pfs internal data, and we may need extra API for 9pfs to
> > 
> >    expose this to Landlock.  On 64bit, this is easy as it's just the inode
> >    number (offset by 2), which we can already get from the struct inode.
> >    But perhaps on 32bit we need a way to expose the full 64bit server-sent
> >    qid to Landlock (or other kernel subsystems), if we're going to do
> >    this.
> 
> I'm not sure how much effort we want to spend on 32bit: as far as I
> know, if we have inode number collision on 32 bit we're already in
> trouble (tools like tar will consider such files to be hardlink of each
> other and happily skip reading data, producing corrupted archives);
> this is not a happy state but I don't know how to do better in any
> reasonable way, so we can probably keep a similar limitation for 32bit
> and use inode number directly...

I agree, on 32-bit the game is lost.

One way that comes to mind though: exposing the full qid path as an xattr
on 32-bit, e.g. via "system.9pfs_qid" or something like that.
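
(A consumer could then read it with a plain getxattr(2), something along
these lines -- sketch only; neither the attribute name nor its binary
format exists today:)

/* Sketch: read a hypothetical "system.9pfs_qid" xattr carrying the full
 * 64-bit qid path. */
#include <stdio.h>
#include <stdint.h>
#include <sys/xattr.h>

int main(int argc, char *argv[])
{
	uint64_t qid_path;
	ssize_t ret;

	if (argc != 2)
		return 1;

	ret = getxattr(argv[1], "system.9pfs_qid", &qid_path,
		       sizeof(qid_path));
	if (ret != (ssize_t)sizeof(qid_path)) {
		perror("getxattr");
		return 1;
	}

	printf("qid.path = 0x%llx\n", (unsigned long long)qid_path);
	return 0;
}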

> > 2. Even though qids are supposed to be unique across the lifetime of a
> > 
> >    filesystem (including deleted files), this is not the case even for
> >    QEMU in multidevs=remap mode, when running on ext4, as tested on QEMU
> >    10.1.0.
> 
> I'm not familiar with the qid remap implementation in qemu, but I'm
> curious in what case you hit that.
> Deleting and recreating files? Or as you seem to say below the 'qid' is
> "freed" when fd is closed qemu-side and re-used by later open of other
> files?

The inode remap algorithm in QEMU's 9p server was designed to prevent inode
number collisions between equally numbered inodes of *different* *devices*
on the host, exposed to the guest via the same 9p mount (which appears as
only one 9pfs device on the guest).  The basis for this, however, is still
the underlying filesystem's inode number on the host.

So yes, ext4 re-uses inode numbers of deleted files, and when that happens, a 
new file appears with the same qid path as the previously deleted file with 
QEMU.

/Christian



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-15 13:44     ` Tingmao Wang
@ 2025-09-15 23:31       ` Dominique Martinet
  2025-09-16 12:44         ` Tingmao Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Dominique Martinet @ 2025-09-15 23:31 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Christian Schoenebeck, Mickaël Salaün,
	Eric Van Hensbergen, Latchesar Ionkov, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, Christian Brauner, linux-fsdevel

(Thanks Christian, replying just once but your reply was helpful)

Tingmao Wang wrote on Mon, Sep 15, 2025 at 02:44:44PM +0100:
> > I'm fuzzy on the details but I don't see how inode pointers would be
> > stable for other filesystems as well, what prevents
> > e.g. vm.drop_caches=3 to drop these inodes on ext4?
> 
> Landlock holds a reference to the inode in the ruleset, so they shouldn't
> be dropped.  On security_sb_delete Landlock will iput those inodes so they
> won't cause issue with unmounting.  There is some special mechanism
> ("landlock objects") to decouple the ruleset themselves from the actual
> inodes, so that previously Landlocked things can keep running even after
> the inode has disappeared as a result of unmounting.

Thank you for the explanation, that makes more sense.
iirc even in cacheless mode 9p should keep inodes around if there's an
open fd somewhere

> > Although looking at the patches what 9p seems to need isn't a new stable
> > handle, but "just" not allocating new inodes in iget5...
> > This was attempted in 724a08450f74 ("fs/9p: simplify iget to remove
> > unnecessary paths"), but later reverted in be2ca3825372 ("Revert "fs/9p:
> > simplify iget to remove unnecessary paths"") because it broke too many
> > users, but if you're comfortable with a new mount option for the lookup
> > by path I think we could make a new option saying
> > "yes_my_server_has_unique_qids"... Which I assume would work for
> > landlock/fsnotify?
> 
> I noticed that, but assumed that simply reverting them without additional
> work (such as tracking the string path) would be a no go given the reason
> why they are reverted.

Yes, just reverting and using that as the default broke too many things, so
this is unfortunately not acceptable... And 9p has no "negotiation"
phase on mount to say "okay this is qemu with remap mode so we can do
that" to enable or disable the behaviour automatically, which has been
annoying in the past too.

I understand you'd prefer something that works by default.

> > I'm not sure how much effort we want to spend on 32bit: as far as I
> > know, if we have inode number collision on 32 bit we're already in
> > trouble (tools like tar will consider such files to be hardlink of each
> > other and happily skip reading data, producing corrupted archives);
> > this is not a happy state but I don't know how to do better in any
> > reasonable way, so we can probably keep a similar limitation for 32bit
> > and use inode number directly...
> 
> I think if 9pfs export a handle it can be the full 64bit qid.path on any
> platform, right?

yes, file handle can be an arbitrary size.

> > I'm not familiar with the qid remap implementation in qemu, but I'm
> > curious in what case you hit that.
> > Deleting and recreating files? Or as you seem to say below the 'qid' is
> > "freed" when fd is closed qemu-side and re-used by later open of other
> > files?
> 
> I tried mounting a qemu-exported 9pfs backed on ext4, with
> multidevs=remap, and created a file, used stat to note its inode number,
> deleted the file, created another file (of the same OR different name),
> and that new file will have the same inode number.
> 
> (If I don't delete the file, then a newly created file would of course
> have a different ext4 inode number, and in that case QEMU exposes a
> different qid)

Ok so from Christian's reply this is just ext4 reusing the same inode..
I briefly hinted at this above, but in this case ext4 will give the
inode a different generation number (so the ext4 file handle will be
different, and accessing the old one will get ESTALE); but that's not
something qemu currently tracks and it'd be a bit of an overhaul...
In theory qemu could hash mount_id + file handle to get a properly
unique qid, if we need to improve that, but that'd be limited to root
users (and to filesystems that support name_to_handle_at) so I don't
think it's really appropriate either... hmm..

(I also thought of checking if nlink is 0 when getting a new inode, but
that's technically legitimate from /proc/x/fd opens so I don't think we
can do that either)

And then there's also all the servers that don't give unique qids at
all, so we'll just get weird landlock/fsnotify behaviours for them if we
go that way...

-----------------

Okay, you've convinced me something like path tracking seems more
appropriate; I'll just struggle one last time first with a few more open
questions:
 - afaiu this (reusing inodes) works in cached mode because the dentry is
kept around; I don't understand the vfs well enough but could the inode
hold its dentry and dentries hold their parent dentry alive somehow?
So in cacheless mode, if you have a tree like this:
a
└── b
    ├── c
    └── d
with c 'open' (or a reference held by landlock), then dentries for a/b/c
would be kept, but d could be droppable?

My understanding is that in cacheless mode we're dropping dentries
aggressively so that things like readdir() are refreshed, but I'm
thinking this should work even if we keep some dentries alive when their
inode is held up.

 - if that doesn't work (or is too complicated), I'm thinking tracking
path is probably better than qid-based filtering based on what we
discussed as it only affects uncached mode.. I'll need to spend some
time testing but I think we can move forward with the current patchset
rather than try something new.

Thanks!
-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-15 23:31       ` Dominique Martinet
@ 2025-09-16 12:44         ` Tingmao Wang
  2025-09-16 13:35           ` Dominique Martinet
  2025-09-16 13:43           ` Christian Schoenebeck
  0 siblings, 2 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-16 12:44 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Christian Schoenebeck, Mickaël Salaün,
	Eric Van Hensbergen, Latchesar Ionkov, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, Christian Brauner, linux-fsdevel

On 9/16/25 00:31, Dominique Martinet wrote:
> (Thanks Christian, replying just once but your reply was helpful)
> 
> Tingmao Wang wrote on Mon, Sep 15, 2025 at 02:44:44PM +0100:
>>> I'm fuzzy on the details but I don't see how inode pointers would be
>>> stable for other filesystems as well, what prevents
>>> e.g. vm.drop_caches=3 to drop these inodes on ext4?
>>
>> Landlock holds a reference to the inode in the ruleset, so they shouldn't
>> be dropped.  On security_sb_delete Landlock will iput those inodes so they
>> won't cause issue with unmounting.  There is some special mechanism
>> ("landlock objects") to decouple the ruleset themselves from the actual
>> inodes, so that previously Landlocked things can keep running even after
>> the inode has disappeared as a result of unmounting.
> 
> Thank you for the explanation, that makes more sense.
> iirc even in cacheless mode 9p should keep inode arounds if there's an
> open fd somewhere

Yes, because there is a dentry that has a reference to it.  Similarly if
there is a Landlock rule referencing it, the inode will also be kept
around (but not the dentry, Landlock only references the inode).  The
problem is that when another application (that is being Landlocked)
accesses the file, 9pfs will create a new inode in uncached mode,
regardless of whether one already exists.

> 
>> [...]
>>
>> I tried mounting a qemu-exported 9pfs backed on ext4, with
>> multidevs=remap, and created a file, used stat to note its inode number,
>> deleted the file, created another file (of the same OR different name),
>> and that new file will have the same inode number.
>>
>> (If I don't delete the file, then a newly created file would of course
>> have a different ext4 inode number, and in that case QEMU exposes a
>> different qid)
> 
> Ok so from Christian's reply this is just ext4 reusing the same inode..
> I briefly hinted at this above, but in this case ext4 will give the
> inode a different generation number (so the ext4 file handle will be
> different, and accessing the old one will get ESTALE); but that's not
> something qemu currently tracks and it'd be a bit of an overhaul...
> In theory qemu could hash mount_id + file handle to get a properly
> unique qid, if we need to improve that, but that'd be limited to root
> users (and to filesystems that support name_to_handle_at) so I don't
> think it's really appropriate either... hmm..

Actually I think I forgot that there is also qid.version, which in the
case of a QEMU-exported 9pfs might just be the file modification time?  In
9pfs currently we do reject an inode match if that version changed
server-side in cached mode:

v9fs_test_inode_dotl:
	/* compare qid details */
	if (memcmp(&v9inode->qid.version,
		   &st->qid.version, sizeof(v9inode->qid.version)))
		return 0;

(not tested whether QEMU correctly sets this version yet)

> 
> (I also thought of checking if nlink is 0 when getting a new inode, but
> that's technically legimitate from /proc/x/fd opens so I don't think we
> can do that either)
> 
> And then there's also all the servers that don't give unique qids at
> all, so we'll just get weird landlock/fsnotify behaviours for them if we
> go that way...
> 
> -----------------
> 
> Okay, you've convinced me something like path tracking seems more
> appropriate; I'll just struggle one last time first with a few more open
> questions:
>  - afaiu this (reusing inodes) work in cached mode because the dentry is
> kept around;

Based on my understanding, I think this isn't really to do with whether
the dentry is around or not.  In cached mode, 9pfs will use iget5_locked
to look up an existing inode based on the qid, if one exists, and use
that, even if no cached dentry points to it.  However, in uncached mode,
currently if the vfs asks 9pfs to find an inode (e.g. because the dentry is
no longer in cache), it always gets a new one:

v9fs_vfs_lookup:
	...
	else if (v9ses->cache & (CACHE_META|CACHE_LOOSE))
		inode = v9fs_get_inode_from_fid(v9ses, fid, dir->i_sb);
	else
		inode = v9fs_get_new_inode_from_fid(v9ses, fid, dir->i_sb);
	...
v9fs_qid_iget_dotl:
	...
	if (new)
		test = v9fs_test_new_inode_dotl;
	else
		test = v9fs_test_inode_dotl;
	...
v9fs_test_new_inode_dotl:
	static int v9fs_test_new_inode_dotl(struct inode *inode, void *data)
	{
		return 0;
	}


> I don't understand the vfs well enough but could the inodes
> hold its dentry and dentries hold their parent dentry alive somehow?
> So in cacheless mode, if you have a tree like this:
> a
> └── b
>     ├── c
>     └── d
> with c 'open' (or a reference held by landlock), then dentries for a/b/c
> would be kept, but d could be droppable?

I think, based on my understanding, a child dentry does always have a
reference to its parent, so the parent won't be dropped before the child,
if the child dentry is alive.  However, holding a proper dentry reference
in an inode might be tricky as the dentry holds a reference to its inode.

> 
> My understanding is that in cacheless mode we're dropping dentries
> aggressively so that things like readdir() are refreshed, but I'm
> thinking this should work even if we keep some dentries alive when their
> inode is held up.

If we have some way of keeping the dentry alive (without introducing
circular reference problems) then I guess that would work and we don't
have to track paths ourselves.

> 
>  - if that doesn't work (or is too complicated), I'm thinking tracking
> path is probably better than qid-based filtering based on what we
> discussed as it only affects uncached mode.. I'll need to spend some
> time testing but I think we can move forward with the current patchset
> rather than try something new.
> 
> Thanks!

Note that in discussion with Mickaël (maintainer of Landlock) he indicated
that he would be comfortable for Landlock to track a qid, instead of
holding an inode, specifically for 9pfs.

(This doesn't solve the problem for fsnotify though)

Kind regards,
Tingmao

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-16 12:44         ` Tingmao Wang
@ 2025-09-16 13:35           ` Dominique Martinet
  2025-09-16 14:01             ` Tingmao Wang
  2025-09-16 13:43           ` Christian Schoenebeck
  1 sibling, 1 reply; 27+ messages in thread
From: Dominique Martinet @ 2025-09-16 13:35 UTC (permalink / raw)
  To: Tingmao Wang
  Cc: Christian Schoenebeck, Mickaël Salaün,
	Eric Van Hensbergen, Latchesar Ionkov, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, Christian Brauner, linux-fsdevel

Tingmao Wang wrote on Tue, Sep 16, 2025 at 01:44:27PM +0100:
> > iirc even in cacheless mode 9p should keep inode arounds if there's an
> > open fd somewhere
> 
> Yes, because there is a dentry that has a reference to it.  Similarly if
> there is a Landlock rule referencing it, the inode will also be kept
> around (but not the dentry, Landlock only references the inode).  The
> problem is that when another application (that is being Landlocked)
> accesses the file, 9pfs will create a new inode in uncached mode,
> regardless of whether an existing inode exists.
> [...]
> Based on my understanding, I think this isn't really to do with whether
> the dentry is around or not.  In cached mode, 9pfs will use iget5_locked
> to look up an existing inode based on the qid, if one exists, and use
> that, even if no cached dentry points to it.  However, in uncached mode,
> currently if vfs asks 9pfs to find an inode (e.g. because the dentry is no
> longer in cache), it always get a new one:
> [...]
> v9fs_qid_iget_dotl:
> 	...
> 	if (new)
> 		test = v9fs_test_new_inode_dotl;
> 	else
> 		test = v9fs_test_inode_dotl;

Right, if we get all the way to iget, uncached mode will get a new inode,
but if the file is opened (I tried `cat > foo` and `tail -f foo`) then
re-opening it with cat will not issue a lookup at all -- v9fs_vfs_lookup()
is not called in the first place.
Likewise, in cached mode, just having the file in cache makes a new open
not call v9fs_vfs_lookup for that file (even if it's not currently
open), so that `if (new)` is not actually what matters here afaiu.

What's the condition to make it happen? Can we make that happen with
landlock?

In practice that'd make landlock partially negate cacheless mode, as
we'd "pin" landlocked paths, but as long as readdirs aren't cached and
other metadata is refreshed on e.g. stat() calls I think that's fine
if we can make it happen.

(That's a big if)

> > So in cacheless mode, if you have a tree like this:
> > a
> > └── b
> >     ├── c
> >     └── d
> > with c 'open' (or a reference held by landlock), then dentries for a/b/c
> > would be kept, but d could be droppable?
> 
> I think, based on my understanding, a child dentry does always have a
> reference to its parent, and so parent won't be dropped before child, if
> child dentry is alive.

I'd be tempted to agree here

> However holding a proper dentry reference in an
> inode might be tricky as dentry holds the reference to its inode.

Hmm, yeah, that's problematic.
Could it be held in landlock and not by the inode?
Just thinking out loud.

> > My understanding is that in cacheless mode we're dropping dentries
> > aggressively so that things like readdir() are refreshed, but I'm
> > thinking this should work even if we keep some dentries alive when their
> > inode is held up.
> 
> If we have some way of keeping the dentry alive (without introducing
> circular reference problems) then I guess that would work and we don't
> have to track paths ourselves.

Yes, that's the idea - the dentry basically already contains the path, so
we wouldn't be reinventing the wheel...

> >  - if that doesn't work (or is too complicated), I'm thinking tracking
> > path is probably better than qid-based filtering based on what we
> > discussed as it only affects uncached mode.. I'll need to spend some
> > time testing but I think we can move forward with the current patchset
> > rather than try something new.
> 
> Note that in discussion with Mickaël (maintainer of Landlock) he indicated
> that he would be comfortable for Landlock to track a qid, instead of
> holding a inode, specifically for 9pfs.

Yes, I saw that, but what you pointed out about qid reuse makes me
somewhat uncomfortable with that direction -- you could allow a
directory, delete it, create a new one somewhere else, and if the
underlying fs reuses the same inode number the rule would allow an
unintended directory instead, so I'd rather not rely on qid for this
either.
But if you think that's not a problem in practice (because e.g. landlock
would somehow detect the dir got deleted or another good reason it's not
a problem) then I agree it's probably the simplest way forward
implementation-wise.

-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-16 12:44         ` Tingmao Wang
  2025-09-16 13:35           ` Dominique Martinet
@ 2025-09-16 13:43           ` Christian Schoenebeck
  1 sibling, 0 replies; 27+ messages in thread
From: Christian Schoenebeck @ 2025-09-16 13:43 UTC (permalink / raw)
  To: Dominique Martinet, Tingmao Wang
  Cc: Mickaël Salaün, Eric Van Hensbergen, Latchesar Ionkov,
	v9fs, Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, Al Viro, Christian Brauner,
	linux-fsdevel

On Tuesday, September 16, 2025 2:44:27 PM CEST Tingmao Wang wrote:
> On 9/16/25 00:31, Dominique Martinet wrote:
[...]
> >> I tried mounting a qemu-exported 9pfs backed on ext4, with
> >> multidevs=remap, and created a file, used stat to note its inode number,
> >> deleted the file, created another file (of the same OR different name),
> >> and that new file will have the same inode number.
> >> 
> >> (If I don't delete the file, then a newly created file would of course
> >> have a different ext4 inode number, and in that case QEMU exposes a
> >> different qid)
> > 
> > Ok so from Christian's reply this is just ext4 reusing the same inode..
> > I briefly hinted at this above, but in this case ext4 will give the
> > inode a different generation number (so the ext4 file handle will be
> > different, and accessing the old one will get ESTALE); but that's not
> > something qemu currently tracks and it'd be a bit of an overhaul...
> > In theory qemu could hash mount_id + file handle to get a properly
> > unique qid, if we need to improve that, but that'd be limited to root
> > users (and to filesystems that support name_to_handle_at) so I don't
> > think it's really appropriate either... hmm..
> 
> Actually I think I forgot that there is also qid.version, which in the
> case of a QEMU-exported 9pfs might just be the file modification time?  In
> 9pfs currently we do reject a inode match if that version changed server
> side in cached mode:
> 
> v9fs_test_inode_dotl:
> 	/* compare qid details */
> 	if (memcmp(&v9inode->qid.version,
> 		   &st->qid.version, sizeof(v9inode->qid.version)))
> 		return 0;
> 
> (not tested whether QEMU correctly sets this version yet)

Define "correctly". ;-) QEMU sets it like this since 2010:

  qidp->version = stbuf->st_mtime ^ (stbuf->st_size << 8);

https://github.com/qemu/qemu/blob/190d5d7fd725ff754f94e8e0cbfb69f279c82b5d/hw/9pfs/9p.c#L1020

/Christian



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-16 13:35           ` Dominique Martinet
@ 2025-09-16 14:01             ` Tingmao Wang
  2025-09-16 19:22               ` Christian Schoenebeck
  0 siblings, 1 reply; 27+ messages in thread
From: Tingmao Wang @ 2025-09-16 14:01 UTC (permalink / raw)
  To: Dominique Martinet, Christian Schoenebeck
  Cc: Mickaël Salaün, Eric Van Hensbergen, Latchesar Ionkov,
	v9fs, Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, Al Viro, Christian Brauner,
	linux-fsdevel

On 9/16/25 14:35, Dominique Martinet wrote:
> Tingmao Wang wrote on Tue, Sep 16, 2025 at 01:44:27PM +0100:
>> [...]
>>
>> Note that in discussion with Mickaël (maintainer of Landlock) he indicated
>> that he would be comfortable for Landlock to track a qid, instead of
>> holding a inode, specifically for 9pfs.
> 
> Yes, I saw that, but what you pointed out about qid reuse make me
> somewhat uncomfortable with that direction -- you could allow a
> directory, delete it, create a new one somewhere else and if the
> underlying fs reuse the same inode number the rule would allow an
> intended directory instead so I'd rather not rely on qid for this
> either.
> But if you think that's not a problem in practice (because e.g. landlock
> would somehow detect the dir got deleted or another good reason it's not
> a problem) then I agree it's probably the simplest way forward
> implementation-wise.
> 

Sorry, I forgot to add that this idea would also involve Landlock holding
a reference to the fid (or dentry, but that's problematic due to breaking
unmount unless we can have a new hook) to keep the file open on the host
side so that the qid won't be reused (ignoring collisions caused by
different filesystems mounted under one 9pfs export when multidev mapping
is not enabled)

(There's the separate issue of QEMU seemingly not keeping a directory open
on the host when the guest has a fid to it, though.  I checked that if the
dir is renamed on the host side, any process in the guest that has an fd to
it (checked via cd in a shell) will not be able to use that fd to read it
anymore.  This also means that another directory might be created with the
same qid.path)

(I've not looked at the code yet but Christian, feel free to point out if
I missed anything or if you disagree :) I didn't realize earlier that
you're also the recent author of the 9pfs server in QEMU)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-16 14:01             ` Tingmao Wang
@ 2025-09-16 19:22               ` Christian Schoenebeck
  2025-09-16 23:59                 ` Tingmao Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2025-09-16 19:22 UTC (permalink / raw)
  To: Dominique Martinet, Tingmao Wang
  Cc: Mickaël Salaün, Eric Van Hensbergen, Latchesar Ionkov,
	v9fs, Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, Al Viro, Christian Brauner,
	linux-fsdevel

On Tuesday, September 16, 2025 4:01:40 PM CEST Tingmao Wang wrote:
> On 9/16/25 14:35, Dominique Martinet wrote:
> > Tingmao Wang wrote on Tue, Sep 16, 2025 at 01:44:27PM +0100:
> >> [...]
> >> 
> >> Note that in discussion with Mickaël (maintainer of Landlock) he
> >> indicated
> >> that he would be comfortable for Landlock to track a qid, instead of
> >> holding a inode, specifically for 9pfs.
> > 
> > Yes, I saw that, but what you pointed out about qid reuse make me
> > somewhat uncomfortable with that direction -- you could allow a
> > directory, delete it, create a new one somewhere else and if the
> > underlying fs reuse the same inode number the rule would allow an
> > intended directory instead so I'd rather not rely on qid for this
> > either.
> > But if you think that's not a problem in practice (because e.g. landlock
> > would somehow detect the dir got deleted or another good reason it's not
> > a problem) then I agree it's probably the simplest way forward
> > implementation-wise.
> 
> Sorry, I forgot to add that this idea would also involve Landlock holding
> a reference to the fid (or dentry, but that's problematic due to breaking
> unmount unless we can have a new hook) to keep the file open on the host
> side so that the qid won't be reused (ignoring collisions caused by
> different filesystems mounted under one 9pfs export when multidev mapping
> is not enabled)

I see that you are proposing an option for your proposed qid-based reuse
of dentries.  I don't think it should be on by default though, considering
what we already discussed (e.g. inodes recycled by ext4, but also not all
9p servers handling inode collisions).

> (There's the separate issue of QEMU not seemingly keeping a directory open
> on the host when the guest has a fid to it tho.  I checked that if the dir
> is renamed on the host side, any process in the guest that has a fd to it
> (checked via cd in a shell) will not be able to use that fd to read it
> anymore.  This also means that another directory might be created with the
> same qid.path)

For all open FIDs QEMU retains a descriptor to the file/directory.

Which 9p message do you see sent to server, Trename or Trenameat?

Does this always happen to you, or just sometimes, i.e. under heavy load?
Because even though QEMU retains descriptors of open FIDs, when the QEMU
process approaches the host system's max. allowed number of open file
descriptors, v9fs_reclaim_fd() [hw/9pfs/9p.c] is called, which closes some
descriptors of older FIDs to (at least) keep the QEMU process alive.

BTW: to prevent these descriptor reclaims from happening too often, I plan
to do what many other file servers do: ask the host system on process start
to increase the max. number of file descriptors.
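
(For illustration only, not QEMU code: the usual pattern is something like
the following at process start.)

    /* Minimal sketch of raising the soft open-file limit to the hard
     * limit at startup, as many file servers do, to delay fd reclaim. */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        rl.rlim_cur = rl.rlim_max;  /* bump soft limit up to the hard limit */
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
            perror("setrlimit");
            return 1;
        }
        printf("open fd limit is now %llu\n", (unsigned long long)rl.rlim_cur);
        return 0;
    }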

/Christian



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-16 19:22               ` Christian Schoenebeck
@ 2025-09-16 23:59                 ` Tingmao Wang
  2025-09-17  9:52                   ` Christian Schoenebeck
  0 siblings, 1 reply; 27+ messages in thread
From: Tingmao Wang @ 2025-09-16 23:59 UTC (permalink / raw)
  To: Christian Schoenebeck, Dominique Martinet
  Cc: Mickaël Salaün, Eric Van Hensbergen, Latchesar Ionkov,
	v9fs, Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, Al Viro, Christian Brauner,
	linux-fsdevel

On 9/16/25 20:22, Christian Schoenebeck wrote:
> On Tuesday, September 16, 2025 4:01:40 PM CEST Tingmao Wang wrote:
>> On 9/16/25 14:35, Dominique Martinet wrote:
>>> Tingmao Wang wrote on Tue, Sep 16, 2025 at 01:44:27PM +0100:
>>>> [...]
>>>>
>>>> Note that in discussion with Mickaël (maintainer of Landlock) he
>>>> indicated
>>>> that he would be comfortable for Landlock to track a qid, instead of
>>>> holding a inode, specifically for 9pfs.
>>>
>>> Yes, I saw that, but what you pointed out about qid reuse make me
>>> somewhat uncomfortable with that direction -- you could allow a
>>> directory, delete it, create a new one somewhere else and if the
>>> underlying fs reuse the same inode number the rule would allow an
>>> intended directory instead so I'd rather not rely on qid for this
>>> either.
>>> But if you think that's not a problem in practice (because e.g. landlock
>>> would somehow detect the dir got deleted or another good reason it's not
>>> a problem) then I agree it's probably the simplest way forward
>>> implementation-wise.
>>
>> Sorry, I forgot to add that this idea would also involve Landlock holding
>> a reference to the fid (or dentry, but that's problematic due to breaking
>> unmount unless we can have a new hook) to keep the file open on the host
>> side so that the qid won't be reused (ignoring collisions caused by
>> different filesystems mounted under one 9pfs export when multidev mapping
>> is not enabled)
> 
> I see that you are proposing an option for your proposed qid based re-using of 
> dentries. I don't think it should be on by default though, considering what we 
> already discussed (e.g. inodes recycled by ext4, but also not all 9p servers 
> handling inode collisions).

Just to be clear, this approach (Landlock holding a fid reference, then
using the qid as a key to search for rules when a Landlocked process
accesses the previously remembered file, possibly after the file has been
moved on the server) would only be in Landlock, and would only affect
Landlock, not 9pfs (so I'm not sure what you meant by "re-using of
dentries").

The idea behind holding a fid reference within Landlock is that, because
we have the file open, the inode would not get recycled in ext4, and thus
no other file will reuse the qid, until we close that reference (when the
Landlock domain terminates, or when the 9p filesystem is unmounted).
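
To make the shape of this concrete, something like the following (a purely
hypothetical sketch, not actual Landlock code: the struct and function
names are made up, and locking/refcounting details are elided):

    /* Hypothetical sketch: a 9p-specific rule keyed by the server-provided
     * qid, pinning a fid so that the server-side inode (and hence the qid)
     * is not recycled while the Landlock domain is alive. */
    #include <linux/types.h>
    #include <net/9p/9p.h>      /* struct p9_qid */
    #include <net/9p/client.h>  /* struct p9_fid, p9_client_clunk() */

    struct landlock_9p_rule {              /* hypothetical name */
            struct p9_qid qid;             /* key used to match later accesses */
            struct p9_fid *held_fid;       /* keeps the file referenced server-side */
            __u64 allowed_access;          /* LANDLOCK_ACCESS_FS_* bits */
    };

    /* Dropped when the domain terminates or the 9p fs is unmounted
     * (a real implementation would go through the fid refcount helpers). */
    static void landlock_9p_rule_release(struct landlock_9p_rule *rule)
    {
            p9_client_clunk(rule->held_fid);
    }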

> 
>> (There's the separate issue of QEMU not seemingly keeping a directory open
>> on the host when the guest has a fid to it tho.  I checked that if the dir
>> is renamed on the host side, any process in the guest that has a fd to it
>> (checked via cd in a shell) will not be able to use that fd to read it
>> anymore.  This also means that another directory might be created with the
>> same qid.path)
> 
> For all open FIDs QEMU retains a descriptor to the file/directory.
> 
> Which 9p message do you see sent to server, Trename or Trenameat?
> 
> Does this always happen to you or just sometimes, i.e. under heavy load? 

It always happens, see the log below (no Trename, since the rename is done
on the host):

    qemu flags: -virtfs "local,path=/tmp/test,mount_tag=test,security_model=passthrough,readonly=off,multidevs=remap"
    qemu version: QEMU emulator version 10.1.0 (Debian 1:10.1.0+ds-5)
    guest kernel version: 6.17.0-rc5 (for the avoidance of doubt, this is clean 6.17-rc5 with no patches)
    qemu pid: 511476

    root@host # ls -la /proc/511476/fd | grep test
    lr-x------ 1 root root 64 Sep 17 00:35 41 -> /tmp/test

    root@guest # mount --mkdir -t 9p -o trans=virtio,cache=none,inodeident=qid,debug=13 test /tmp/test
    root@guest # mkdir /tmp/test/dir1
    root@guest # cd /tmp/test/dir1
     9pnet: -- v9fs_vfs_getattr_dotl (183): dentry: ffff888102ed4d38
     9pnet: -- v9fs_fid_find (183):  dentry: / (ffff888102ed4d38) uid 0 any 0
     9pnet: (00000183) >>> TGETATTR fid 1, request_mask 16383
     9pnet: (00000183) >>> size=19 type: 24 tag: 0
     9pnet: (00000183) <<< size=160 type: 25 tag: 0
     9pnet: (00000183) <<< RGETATTR st_result_mask=6143
     <<< qid=80.3.68c9c8a3
     <<< st_mode=000043ff st_nlink=3
     <<< st_uid=0 st_gid=0
     <<< st_rdev=0 st_size=3c st_blksize=131072 st_blocks=0
     <<< st_atime_sec=1758065706 st_atime_nsec=857221735
     <<< st_mtime_sec=1758065827 st_mtime_nsec=745359877
     <<< st_ctime_sec=1758065827 st_ctime_nsec=745359877
     <<< st_btime_sec=0 st_btime_nsec=0
     <<< st_gen=0 st_data_version=0
     9pnet: -- v9fs_vfs_lookup (183): dir: ffff8881090e0000 dentry: (dir1) ffff888102e458f8 flags: 1
     9pnet: -- v9fs_fid_find (183):  dentry: / (ffff888102ed4d38) uid 0 any 0
     9pnet: (00000183) >>> TWALK fids 1,2 nwname 1d wname[0] dir1
     9pnet: (00000183) >>> size=23 type: 110 tag: 0
     9pnet: (00000183) <<< size=22 type: 111 tag: 0
     9pnet: (00000183) <<< RWALK nwqid 1:
     9pnet: (00000183) <<<     [0] 80.5.68c9dca3
     9pnet: (00000183) >>> TGETATTR fid 2, request_mask 6143
     9pnet: (00000183) >>> size=19 type: 24 tag: 0
     9pnet: (00000183) <<< size=160 type: 25 tag: 0
     9pnet: (00000183) <<< RGETATTR st_result_mask=6143
     <<< qid=80.5.68c9dca3
     <<< st_mode=000041ed st_nlink=2
     <<< st_uid=0 st_gid=0
     <<< st_rdev=0 st_size=28 st_blksize=131072 st_blocks=0
     <<< st_atime_sec=1758065827 st_atime_nsec=745359877
     <<< st_mtime_sec=1758065827 st_mtime_nsec=745359877
     <<< st_ctime_sec=1758065827 st_ctime_nsec=749830521
     <<< st_btime_sec=0 st_btime_nsec=0
     <<< st_gen=0 st_data_version=0
     9pnet: -- v9fs_vfs_getattr_dotl (183): dentry: ffff888102e458f8
     9pnet: -- v9fs_fid_find (183):  dentry: dir1 (ffff888102e458f8) uid 0 any 0
     9pnet: (00000183) >>> TGETATTR fid 2, request_mask 16383
     9pnet: (00000183) >>> size=19 type: 24 tag: 0
     9pnet: (00000183) <<< size=160 type: 25 tag: 0
     9pnet: (00000183) <<< RGETATTR st_result_mask=6143
     <<< qid=80.5.68c9dca3
     <<< st_mode=000041ed st_nlink=2
     <<< st_uid=0 st_gid=0
     <<< st_rdev=0 st_size=28 st_blksize=131072 st_blocks=0
     <<< st_atime_sec=1758065827 st_atime_nsec=745359877
     <<< st_mtime_sec=1758065827 st_mtime_nsec=745359877
     <<< st_ctime_sec=1758065827 st_ctime_nsec=749830521
     <<< st_btime_sec=0 st_btime_nsec=0
     <<< st_gen=0 st_data_version=0
     9pnet: -- v9fs_dentry_release (183):  dentry: dir1 (ffff888102e458f8)
     9pnet: (00000183) >>> TCLUNK fid 2 (try 0)
     9pnet: (00000183) >>> size=11 type: 120 tag: 0
     9pnet: (00000183) <<< size=7 type: 121 tag: 0
     9pnet: (00000183) <<< RCLUNK fid 2
     9pnet: -- v9fs_vfs_lookup (183): dir: ffff8881090e0000 dentry: (dir1) ffff888102e45a70 flags: 3
     9pnet: -- v9fs_fid_find (183):  dentry: / (ffff888102ed4d38) uid 0 any 0
     9pnet: (00000183) >>> TWALK fids 1,2 nwname 1d wname[0] dir1
     9pnet: (00000183) >>> size=23 type: 110 tag: 0
     9pnet: (00000183) <<< size=22 type: 111 tag: 0
     9pnet: (00000183) <<< RWALK nwqid 1:
     9pnet: (00000183) <<<     [0] 80.5.68c9dca3
     9pnet: (00000183) >>> TGETATTR fid 2, request_mask 6143
     9pnet: (00000183) >>> size=19 type: 24 tag: 0
     9pnet: (00000183) <<< size=160 type: 25 tag: 0
     9pnet: (00000183) <<< RGETATTR st_result_mask=6143
     <<< qid=80.5.68c9dca3
     <<< st_mode=000041ed st_nlink=2
     <<< st_uid=0 st_gid=0
     <<< st_rdev=0 st_size=28 st_blksize=131072 st_blocks=0
     <<< st_atime_sec=1758065827 st_atime_nsec=745359877
     <<< st_mtime_sec=1758065827 st_mtime_nsec=745359877
     <<< st_ctime_sec=1758065827 st_ctime_nsec=749830521
     <<< st_btime_sec=0 st_btime_nsec=0
     <<< st_gen=0 st_data_version=0

     (fid 2 is now a persistent handle pointing to /dir1, not sure why the
     walk was done twice)

    root@host # ls -la /proc/511476/fd | grep test
    lr-x------ 1 root root 64 Sep 17 00:35 41 -> /tmp/test
    (no fd points to dir1)

    root@host # mv -v /tmp/test/dir1 /tmp/test/dir2
    renamed '/tmp/test/dir1' -> '/tmp/test/dir2'

    root@guest:/tmp/test/dir1# ls
     9pnet: -- v9fs_vfs_getattr_dotl (183): dentry: ffff888102e45a70
     9pnet: -- v9fs_fid_find (183):  dentry: dir1 (ffff888102e45a70) uid 0 any 0
     9pnet: (00000183) >>> TGETATTR fid 2, request_mask 16383
     9pnet: (00000183) >>> size=19 type: 24 tag: 0
     9pnet: (00000183) <<< size=11 type: 7 tag: 0
     9pnet: (00000183) <<< RLERROR (-2)
     9pnet: -- v9fs_file_open (188): inode: ffff888102e80640 file: ffff88810af45340
     9pnet: -- v9fs_fid_find (188):  dentry: dir1 (ffff888102e45a70) uid 0 any 0
     9pnet: (00000188) >>> TWALK fids 2,3 nwname 0d wname[0] (null)
     9pnet: (00000188) >>> size=17 type: 110 tag: 0
     9pnet: (00000188) <<< size=11 type: 7 tag: 0
     9pnet: (00000188) <<< RLERROR (-2)
    ls: cannot open directory '.': No such file or directory

It looks like as soon as the directory was moved on the host, TGETATTR on
the guest-opened fid 2 fails, even though I would expect that, if QEMU
opened a fd to the dir and used that fd whenever fid 2 is used, the
TGETATTR would succeed.  The fact that I can't see anything pointing to
dir1 in /proc/511476/fd is also suspicious.

Also, if I move the dir back on the host, and then reopen it in the guest,
ls starts working again:

    root@host # mv -v /tmp/test/dir2 /tmp/test/dir1
    renamed '/tmp/test/dir2' -> '/tmp/test/dir1'

    root@guest:/tmp/test/dir1# ls
     9pnet: -- v9fs_file_open (189): inode: ffff888102e80640 file: ffff88810af47100
     9pnet: -- v9fs_fid_find (189):  dentry: dir1 (ffff888102e45a70) uid 0 any 0
     9pnet: (00000189) >>> TWALK fids 2,3 nwname 0d wname[0] (null)
     9pnet: (00000189) >>> size=17 type: 110 tag: 0
     9pnet: (00000189) <<< size=9 type: 111 tag: 0
     9pnet: (00000189) <<< RWALK nwqid 0:
     9pnet: (00000189) >>> TLOPEN fid 3 mode 100352
     9pnet: (00000189) >>> size=15 type: 12 tag: 0
     9pnet: (00000189) <<< size=24 type: 13 tag: 0
     9pnet: (00000189) <<< RLOPEN qid 80.5.68c9dca3 iounit 0
     9pnet: -- v9fs_vfs_getattr_dotl (189): dentry: ffff888102e45a70
     9pnet: -- v9fs_fid_find (189):  dentry: dir1 (ffff888102e45a70) uid 0 any 0
     9pnet: (00000189) >>> TGETATTR fid 2, request_mask 16383
     9pnet: (00000189) >>> size=19 type: 24 tag: 0
     9pnet: (00000189) <<< size=160 type: 25 tag: 0
     9pnet: (00000189) <<< RGETATTR st_result_mask=6143
     <<< qid=80.5.68c9dca3
     <<< st_mode=000041ed st_nlink=2
     <<< st_uid=0 st_gid=0
     <<< st_rdev=0 st_size=28 st_blksize=131072 st_blocks=0
     <<< st_atime_sec=1758065827 st_atime_nsec=745359877
     <<< st_mtime_sec=1758065827 st_mtime_nsec=745359877
     <<< st_ctime_sec=1758066075 st_ctime_nsec=497687251
     <<< st_btime_sec=0 st_btime_nsec=0
     <<< st_gen=0 st_data_version=0
     9pnet: -- v9fs_dir_readdir_dotl (189): name dir1
     9pnet: (00000189) >>> TREADDIR fid 3 offset 0 count 131072
     9pnet: (00000189) >>> size=23 type: 40 tag: 0
     9pnet: (00000189) <<< size=62 type: 41 tag: 0
     9pnet: (00000189) <<< RREADDIR count 51
     9pnet: (00000189) >>> TREADDIR fid 3 offset 2147483647 count 131072
     9pnet: (00000189) >>> size=23 type: 40 tag: 0
     9pnet: (00000189) <<< size=11 type: 41 tag: 0
     9pnet: (00000189) <<< RREADDIR count 0
     9pnet: -- v9fs_dir_readdir_dotl (189): name dir1
     9pnet: (00000189) >>> TREADDIR fid 3 offset 2147483647 count 131072
     9pnet: (00000189) >>> size=23 type: 40 tag: 0
     9pnet: (00000189) <<< size=11 type: 41 tag: 0
     9pnet: (00000189) <<< RREADDIR count 0
     9pnet: -- v9fs_dir_release (189): inode: ffff888102e80640 filp: ffff88810af47100 fid: 3
     9pnet: (00000189) >>> TCLUNK fid 3 (try 0)
     9pnet: (00000189) >>> size=11 type: 120 tag: 0
     9pnet: (00000189) <<< size=7 type: 121 tag
    root@guest:/tmp/test/dir1# echo $?
    0

Somehow, if I do the rename in the guest instead, it all works, even
though it's using the same fid 2 (and it didn't ask QEMU to walk the new
path)

    root@guest:/tmp/test/dir1# mv /tmp/test/dir1 /tmp/test/dir2
     9pnet: -- v9fs_vfs_getattr_dotl (183): dentry: ffff888102e45a70
     9pnet: -- v9fs_fid_find (183):  dentry: dir1 (ffff888102e45a70) uid 0 any 0
     9pnet: (00000183) >>> TGETATTR fid 2, request_mask 16383
     9pnet: (00000183) >>> size=19 type: 24 tag: 0
     9pnet: (00000183) <<< size=160 type: 25 tag: 0
     9pnet: (00000183) <<< RGETATTR st_result_mask=6143
     <<< qid=80.5.68c9dca3
     <<< st_mode=000041ed st_nlink=2
     <<< st_uid=0 st_gid=0
     <<< st_rdev=0 st_size=28 st_blksize=131072 st_blocks=0
     <<< st_atime_sec=1758066561 st_atime_nsec=442431580
     <<< st_mtime_sec=1758065827 st_mtime_nsec=745359877
     <<< st_ctime_sec=1758066559 st_ctime_nsec=570428555
     <<< st_btime_sec=0 st_btime_nsec=0
     <<< st_gen=0 st_data_version=0
     9pnet: -- v9fs_vfs_lookup (194): dir: ffff8881090e0000 dentry: (dir2) ffff888102edca48 flags: e0000
     9pnet: -- v9fs_fid_find (194):  dentry: / (ffff888102ed4d38) uid 0 any 0
     9pnet: (00000194) >>> TWALK fids 1,3 nwname 1d wname[0] dir2
     9pnet: (00000194) >>> size=23 type: 110 tag: 0
     9pnet: (00000194) <<< size=11 type: 7 tag: 0
     9pnet: (00000194) <<< RLERROR (-2)
     9pnet: -- v9fs_dentry_release (194):  dentry: dir2 (ffff888102edca48)
     9pnet: -- v9fs_vfs_lookup (194): dir: ffff8881090e0000 dentry: (dir2) ffff888102edcbc0 flags: 0
     9pnet: -- v9fs_fid_find (194):  dentry: / (ffff888102ed4d38) uid 0 any 0
     9pnet: (00000194) >>> TWALK fids 1,3 nwname 1d wname[0] dir2
     9pnet: (00000194) >>> size=23 type: 110 tag: 0
     9pnet: (00000194) <<< size=11 type: 7 tag: 0
     9pnet: (00000194) <<< RLERROR (-2)
     9pnet: -- v9fs_dentry_release (194):  dentry: dir2 (ffff888102edcbc0)
     9pnet: -- v9fs_vfs_lookup (194): dir: ffff8881090e0000 dentry: (dir2) ffff888102edcd38 flags: a0000
     9pnet: -- v9fs_fid_find (194):  dentry: / (ffff888102ed4d38) uid 0 any 0
     9pnet: (00000194) >>> TWALK fids 1,3 nwname 1d wname[0] dir2
     9pnet: (00000194) >>> size=23 type: 110 tag: 0
     9pnet: (00000194) <<< size=11 type: 7 tag: 0
     9pnet: (00000194) <<< RLERROR (-2)
     9pnet: -- v9fs_vfs_rename (194): 
     9pnet: -- v9fs_fid_find (194):  dentry: dir1 (ffff888102e45a70) uid 0 any 0
     9pnet: -- v9fs_fid_find (194):  dentry: / (ffff888102ed4d38) uid 0 any 0
     9pnet: (00000194) >>> TWALK fids 1,3 nwname 0d wname[0] (null)
     9pnet: (00000194) >>> size=17 type: 110 tag: 0
     9pnet: (00000194) <<< size=9 type: 111 tag: 0
     9pnet: (00000194) <<< RWALK nwqid 0:
     9pnet: -- v9fs_fid_find (194):  dentry: / (ffff888102ed4d38) uid 0 any 0
     9pnet: (00000194) >>> TWALK fids 1,4 nwname 0d wname[0] (null)
     9pnet: (00000194) >>> size=17 type: 110 tag: 0
     9pnet: (00000194) <<< size=9 type: 111 tag: 0
     9pnet: (00000194) <<< RWALK nwqid 0:
     9pnet: (00000194) >>> TRENAMEAT olddirfid 3 old name dir1 newdirfid 4 new name dir2
     9pnet: (00000194) >>> size=27 type: 74 tag: 0
     9pnet: (00000194) <<< size=7 type: 75 tag: 0
     9pnet: (00000194) <<< RRENAMEAT newdirfid 4 new name dir2
     9pnet: (00000194) >>> TCLUNK fid 4 (try 0)
     9pnet: (00000194) >>> size=11 type: 120 tag: 0
     9pnet: (00000194) <<< size=7 type: 121 tag: 0
     9pnet: (00000194) <<< RCLUNK fid 4
     9pnet: (00000194) >>> TCLUNK fid 3 (try 0)
     9pnet: (00000194) >>> size=11 type: 120 tag: 0
     9pnet: (00000194) <<< size=7 type: 121 tag: 0
     9pnet: (00000194) <<< RCLUNK fid 3
     9pnet: -- v9fs_dentry_release (194):  dentry: dir2 (ffff888102edcd38)
    root@guest:/tmp/test/dir1# ls
     9pnet: -- v9fs_file_open (195): inode: ffff888102e80640 file: ffff88810b2b1500
     9pnet: -- v9fs_fid_find (195):  dentry: dir2 (ffff888102e45a70) uid 0 any 0
     9pnet: (00000195) >>> TWALK fids 2,3 nwname 0d wname[0] (null)
     9pnet: (00000195) >>> size=17 type: 110 tag: 0
     9pnet: (00000195) <<< size=9 type: 111 tag: 0
     9pnet: (00000195) <<< RWALK nwqid 0:
     9pnet: (00000195) >>> TLOPEN fid 3 mode 100352
     9pnet: (00000195) >>> size=15 type: 12 tag: 0
     9pnet: (00000195) <<< size=24 type: 13 tag: 0
     9pnet: (00000195) <<< RLOPEN qid 80.5.68c9dca3 iounit 0
     9pnet: -- v9fs_vfs_getattr_dotl (195): dentry: ffff888102e45a70
     9pnet: -- v9fs_fid_find (195):  dentry: dir2 (ffff888102e45a70) uid 0 any 0
     9pnet: (00000195) >>> TGETATTR fid 2, request_mask 16383
     9pnet: (00000195) >>> size=19 type: 24 tag: 0
     9pnet: (00000195) <<< size=160 type: 25 tag: 0
     9pnet: (00000195) <<< RGETATTR st_result_mask=6143
     <<< qid=80.5.68c9dca3
     <<< st_mode=000041ed st_nlink=2
     <<< st_uid=0 st_gid=0
     <<< st_rdev=0 st_size=28 st_blksize=131072 st_blocks=0
     <<< st_atime_sec=1758066561 st_atime_nsec=442431580
     <<< st_mtime_sec=1758065827 st_mtime_nsec=745359877
     <<< st_ctime_sec=1758066568 st_ctime_nsec=562443096
     <<< st_btime_sec=0 st_btime_nsec=0
     <<< st_gen=0 st_data_version=0
     9pnet: -- v9fs_dir_readdir_dotl (195): name dir2
     9pnet: (00000195) >>> TREADDIR fid 3 offset 0 count 131072
     9pnet: (00000195) >>> size=23 type: 40 tag: 0
     9pnet: (00000195) <<< size=62 type: 41 tag: 0
     9pnet: (00000195) <<< RREADDIR count 51
     9pnet: (00000195) >>> TREADDIR fid 3 offset 2147483647 count 131072
     9pnet: (00000195) >>> size=23 type: 40 tag: 0
     9pnet: (00000195) <<< size=11 type: 41 tag: 0
     9pnet: (00000195) <<< RREADDIR count 0
     9pnet: -- v9fs_dir_readdir_dotl (195): name dir2
     9pnet: (00000195) >>> TREADDIR fid 3 offset 2147483647 count 131072
     9pnet: (00000195) >>> size=23 type: 40 tag: 0
     9pnet: (00000195) <<< size=11 type: 41 tag: 0
     9pnet: (00000195) <<< RREADDIR count 0
     9pnet: -- v9fs_dir_release (195): inode: ffff888102e80640 filp: ffff88810b2b1500 fid: 3
     9pnet: (00000195) >>> TCLUNK fid 3 (try 0)
     9pnet: (00000195) >>> size=11 type: 120 tag: 0
     9pnet: (00000195) <<< size=7 type: 121 tag: 0
     9pnet: (00000195) <<< RCLUNK fid 3

If this is surprising, I'm happy to take a deeper look over the weekend
(but I've never tried to debug QEMU itself :D)

> Because even though QEMU retains descriptors of open FIDs; when the QEMU 
> process approaches host system's max. allowed number of open file descriptors 
> then v9fs_reclaim_fd() [hw/9pfs/9p.c] is called, which closes some descriptors 
> of older FIDs to (at least) keep the QEMU process alive.
> 
> BTW: to prevent these descriptor reclaims to happen too often, I plan to do 
> what many other files servers do: asking the host system on process start to 
> increase the max. number of file descriptors.

Note that the above is reproduced with only one file open (the dir being
renamed around).

Kind regards,
Tingmao

> 
> /Christian
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-16 23:59                 ` Tingmao Wang
@ 2025-09-17  9:52                   ` Christian Schoenebeck
  2025-09-17 15:00                     ` Mickaël Salaün
  0 siblings, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2025-09-17  9:52 UTC (permalink / raw)
  To: Dominique Martinet, Tingmao Wang
  Cc: Mickaël Salaün, Eric Van Hensbergen, Latchesar Ionkov,
	v9fs, Günther Noack, linux-security-module, Jan Kara,
	Amir Goldstein, Matthew Bobrowski, Al Viro, Christian Brauner,
	linux-fsdevel

On Wednesday, September 17, 2025 1:59:21 AM CEST Tingmao Wang wrote:
> On 9/16/25 20:22, Christian Schoenebeck wrote:
> > On Tuesday, September 16, 2025 4:01:40 PM CEST Tingmao Wang wrote:
[...]
> > I see that you are proposing an option for your proposed qid based
> > re-using of dentries. I don't think it should be on by default though,
> > considering what we already discussed (e.g. inodes recycled by ext4, but
> > also not all 9p servers handling inode collisions).
> 
> Just to be clear, this approach (Landlock holding a fid reference, then
> using the qid as a key to search for rules when a Landlocked process
> accesses the previously remembered file, possibly after the file has been
> moved on the server) would only be in Landlock, and would only affect
> Landlock, not 9pfs (so not sure what you meant by "re-using of dentries").
> 
> The idea behind holding a fid reference within Landlock is that, because
> we have the file open, the inode would not get recycled in ext4, and thus
> no other file will reuse the qid, until we close that reference (when the
> Landlock domain terminates, or when the 9p filesystem is unmounted)

So far I have only had a glance at your kernel patches, and my impression
was that they change behaviour for all users, since you are touching dentry
lookup.

> > For all open FIDs QEMU retains a descriptor to the file/directory.
> > 
> > Which 9p message do you see sent to server, Trename or Trenameat?
> > 
> > Does this always happen to you or just sometimes, i.e. under heavy load?
> 
> Always happen, see log: (no Trename since the rename is done on the host)
[...]
> Somehow if I rename in the guest, it all works, even though it's using the
> same fid 2 (and it didn't ask QEMU to walk the new path)

Got it. Even though QEMU *should* hold a file descriptor (or a DIR* stream,
which should imply a file descriptor), there is still a path string stored
in V9fsFidState, and that path is processed in some places, probably because
there are path-based and FID-based variants (e.g. Trename vs. Trenameat).
Maybe that clashes somewhere, not sure. So I fear you would need to debug
this.

/Christian



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-15 14:10     ` Christian Schoenebeck
@ 2025-09-17 15:00       ` Mickaël Salaün
  0 siblings, 0 replies; 27+ messages in thread
From: Mickaël Salaün @ 2025-09-17 15:00 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Tingmao Wang, Dominique Martinet, Eric Van Hensbergen,
	Latchesar Ionkov, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	Christian Brauner, linux-fsdevel

On Mon, Sep 15, 2025 at 04:10:07PM +0200, Christian Schoenebeck wrote:
> On Monday, September 15, 2025 2:53:14 PM CEST Dominique Martinet wrote:
> [...]
> > > 1. The qid is 9pfs internal data, and we may need extra API for 9pfs to
> > > 
> > >    expose this to Landlock.  On 64bit, this is easy as it's just the inode
> > >    number (offset by 2), which we can already get from the struct inode.
> > >    But perhaps on 32bit we need a way to expose the full 64bit server-sent
> > >    qid to Landlock (or other kernel subsystems), if we're going to do
> > >    this.
> > 
> > I'm not sure how much effort we want to spend on 32bit: as far as I
> > know, if we have inode number collision on 32 bit we're already in
> > trouble (tools like tar will consider such files to be hardlink of each
> > other and happily skip reading data, producing corrupted archives);
> > this is not a happy state but I don't know how to do better in any
> > reasonable way, so we can probably keep a similar limitation for 32bit
> > and use inode number directly...
> 
> I agree, on 32-bit the game is lost.
> 
> One way that would come to my mind though: exposing the full qid path as xattr 
> on 32-bit, e.g. via "system.9pfs_qid" or something like that.

Another way to always deal with 64-bit values, even on 32-bit
architectures, would be to implement inode->i_op->getattr(), but that
could have side effects for 9p users expecting the current behavior.

> 
> > > 2. Even though qids are supposed to be unique across the lifetime of a
> > > 
> > >    filesystem (including deleted files), this is not the case even for
> > >    QEMU in multidevs=remap mode, when running on ext4, as tested on QEMU
> > >    10.1.0.
> > 
> > I'm not familiar with the qid remap implementation in qemu, but I'm
> > curious in what case you hit that.
> > Deleting and recreating files? Or as you seem to say below the 'qid' is
> > "freed" when fd is closed qemu-side and re-used by later open of other
> > files?
> 
> The inode remap algorithm in QEMU's 9p server was designed to prevent inode 
> number collisions of equally numbered inodes of *different* *devices* on host, 
> exposed to guest via the same 9p mount (which appears as only one 9pfs device 
> on guest). Basis for this however is still the underlying filesystem's inode 
> number on host.
> 
> So yes, ext4 re-uses inode numbers of deleted files, and when that happens, a 
> new file appears with the same qid path as the previously deleted file with 
> QEMU.
> 
> /Christian
> 
> 
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-17  9:52                   ` Christian Schoenebeck
@ 2025-09-17 15:00                     ` Mickaël Salaün
  2025-09-21 16:24                       ` Tingmao Wang
  0 siblings, 1 reply; 27+ messages in thread
From: Mickaël Salaün @ 2025-09-17 15:00 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Dominique Martinet, Tingmao Wang, Eric Van Hensbergen,
	Latchesar Ionkov, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	Christian Brauner, linux-fsdevel

On Wed, Sep 17, 2025 at 11:52:35AM +0200, Christian Schoenebeck wrote:
> On Wednesday, September 17, 2025 1:59:21 AM CEST Tingmao Wang wrote:
> > On 9/16/25 20:22, Christian Schoenebeck wrote:
> > > On Tuesday, September 16, 2025 4:01:40 PM CEST Tingmao Wang wrote:
> [...]
> > > I see that you are proposing an option for your proposed qid based
> > > re-using of dentries. I don't think it should be on by default though,
> > > considering what we already discussed (e.g. inodes recycled by ext4, but
> > > also not all 9p servers handling inode collisions).
> > 
> > Just to be clear, this approach (Landlock holding a fid reference, then
> > using the qid as a key to search for rules when a Landlocked process
> > accesses the previously remembered file, possibly after the file has been
> > moved on the server) would only be in Landlock, and would only affect
> > Landlock, not 9pfs (so not sure what you meant by "re-using of dentries").
> > 
> > The idea behind holding a fid reference within Landlock is that, because
> > we have the file open, the inode would not get recycled in ext4, and thus
> > no other file will reuse the qid, until we close that reference (when the
> > Landlock domain terminates, or when the 9p filesystem is unmounted)
> 
> So far I only had a glimpse on your kernel patches and had the impression that 
> they are changing behaviour for all users, since you are touching dentry 
> lookup.

I think we should not hold dentries because:
- they reference other dentries (i.e. a file hierarchy),
- they block umount and I'm convinced the VFS (and users) are not going
  to like long-lived dentries,
- Landlock and inotify don't need dentries, just inodes.

I'm wondering why fids are referenced by dentries instead of inodes.

The need for Landlock is to be able to match an inode with a previously
seen one.  Not all LSM hooks (nor VFS internals) always have access to
dentries, but they do have access to inodes.

> 
> > > For all open FIDs QEMU retains a descriptor to the file/directory.
> > > 
> > > Which 9p message do you see sent to server, Trename or Trenameat?
> > > 
> > > Does this always happen to you or just sometimes, i.e. under heavy load?
> > 
> > Always happen, see log: (no Trename since the rename is done on the host)
> [...]
> > Somehow if I rename in the guest, it all works, even though it's using the
> > same fid 2 (and it didn't ask QEMU to walk the new path)
> 
> Got it. Even though QEMU *should* hold a file descriptor (or a DIR* stream, 

It's reasonable to assume that QEMU and other servers should hold opened
fids.  In practice, this might not always be the case, but let's move on
and consider that a 9p server bug.

Landlock and fanotify need some guarantees on opened files, and we
cannot account for every server bug.  For Landlock, inodes may get an
"ephemeral tag" (with the Landlock object mechanism) to match previously
seen inodes.  In a perfect world, Landlock could keep a reference on 9p
inodes (as for other filesystems) and these inodes would always match
the same file.  In practice this is not the case, but the 9p client
requirements and the Landlock requirements are not exactly the same.

A 9p client (the kernel) wants to safely deal with duplicated qids, which
should not happen but still does happen in practice, as explained before.
On the other side, Landlock wants to not deny access to allowed files
(currently identified by their inodes), but I think it would be
reasonable to allow access to theoretically denied files (to be precise,
not-allowed files, because of the deny-by-default mechanism) when a
9p server bug mishandles qids (e.g. maps them to recycled ext4
inodes).

All that to say that it looks reasonable for Landlock to trust the
filesystem, and by that I mean all its dependencies, including the 9p
server, not to have bugs.

Another advantage of relying on qids and server-side opened files is that
we get (in theory) the same semantics as when Landlock is used with local
filesystems (e.g. files moved on the server should still be correctly
identified by Landlock on the client).

> which should imply a file descriptor), there is still a path string stored at 
> V9fsFidState and that path being processed at some places, probably because 
> there are path based and FID based variants (e.g Trename vs. Trenameat). Maybe 
> that clashes somewhere, not sure. So I fear you would need to debug this.

Good to know that it is not a legitimate behavior for a 9p client.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-17 15:00                     ` Mickaël Salaün
@ 2025-09-21 16:24                       ` Tingmao Wang
  2025-09-27 18:27                         ` Mickaël Salaün
  2025-09-29 13:06                         ` Christian Schoenebeck
  0 siblings, 2 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-21 16:24 UTC (permalink / raw)
  To: Christian Schoenebeck, Mickaël Salaün,
	Dominique Martinet
  Cc: Eric Van Hensbergen, Latchesar Ionkov, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, Christian Brauner, linux-fsdevel,
	qemu-devel

On 9/17/25 16:00, Mickaël Salaün wrote:
> On Wed, Sep 17, 2025 at 11:52:35AM +0200, Christian Schoenebeck wrote:
>> On Wednesday, September 17, 2025 1:59:21 AM CEST Tingmao Wang wrote:
>>> On 9/16/25 20:22, Christian Schoenebeck wrote:
>>>> On Tuesday, September 16, 2025 4:01:40 PM CEST Tingmao Wang wrote:
>> [...]
>>>> I see that you are proposing an option for your proposed qid based
>>>> re-using of dentries. I don't think it should be on by default though,
>>>> considering what we already discussed (e.g. inodes recycled by ext4, but
>>>> also not all 9p servers handling inode collisions).
>>>
>>> Just to be clear, this approach (Landlock holding a fid reference, then
>>> using the qid as a key to search for rules when a Landlocked process
>>> accesses the previously remembered file, possibly after the file has been
>>> moved on the server) would only be in Landlock, and would only affect
>>> Landlock, not 9pfs (so not sure what you meant by "re-using of dentries").
>>>
>>> The idea behind holding a fid reference within Landlock is that, because
>>> we have the file open, the inode would not get recycled in ext4, and thus
>>> no other file will reuse the qid, until we close that reference (when the
>>> Landlock domain terminates, or when the 9p filesystem is unmounted)
>>
>> So far I only had a glimpse on your kernel patches and had the impression that 
>> they are changing behaviour for all users, since you are touching dentry 
>> lookup.
> 
> I think we should not hold dentries because:
> - they reference other dentries (i.e. a file hierarchy),
> - they block umount and I'm convinced the VFS (and users) are not going
>   to like long-lived dentries,
> - Landlock and inotify don't need dentries, just inodes.
> 
> I'm wondering why fid are referenced by dentries instead of inodes.
> 
> The need for Landlock is to be able to match an inode with a previously
> seen one.  Not all LSM hooks (nor VFS internals) always have access to
> dentries, but they do have access to inodes.
> 
>>
>>>> For all open FIDs QEMU retains a descriptor to the file/directory.
>>>>
>>>> Which 9p message do you see sent to server, Trename or Trenameat?
>>>>
>>>> Does this always happen to you or just sometimes, i.e. under heavy load?
>>>
>>> Always happen, see log: (no Trename since the rename is done on the host)
>> [...]
>>> Somehow if I rename in the guest, it all works, even though it's using the
>>> same fid 2 (and it didn't ask QEMU to walk the new path)
>>
>> Got it. Even though QEMU *should* hold a file descriptor (or a DIR* stream, 
> 
> It's reasonable to assume that QEMU and other should hold opened fid In
> practice, this might not always be the case, but let's move on and
> consider that a 9p server bug.
> 
> Landlock and fanotify need some guarantees on opened files, and we
> cannot consider every server bug.  For Landlock, inode may get an
> "ephemeral tag" (with the Landlock object mechanism) to match previously
> seen inodes.  In a perfect world, Landlock could keep a reference on 9p
> inodes (as for other filesystems) and these inodes would always match
> the same file.  In practice this is not the case, but the 9p client
> requirements and the Landlock requirements are not exactly the same.
> 
> A 9p client (the kernel) wants to safely deal with duplicated qid, which
> should not happen but still happen in practice as explained before.
> On the other side, Landlock wants to not deny access to allowed files
> (currently identified by their inodes), but I think it would be
> reasonable to allow access theoretically denied (i.e. not allowed to be
> precise, because of the denied by default mechanism) files because of a
> 9p server bug mishandling qid (e.g. mapping them to recycled ext4
> inodes).
> 
> All that to say that it looks reasonable for Landlock to trust the
> filesystem, and by that I mean all its dependencies, including the 9p
> server, to not have bugs.
> 
> Another advantage to rely on qid and server-side opened files is that we
> get (in theory) the same semantic as when Landlock is used with local
> filesystems (e.g. files moved on the server should still be correctly
> identified by Landlock on the client).
> 
>> which should imply a file descriptor), there is still a path string stored at 
>> V9fsFidState and that path being processed at some places, probably because 
>> there are path based and FID based variants (e.g Trename vs. Trenameat). Maybe 
>> that clashes somewhere, not sure. So I fear you would need to debug this.
> 
> Good to know that it is not a legitimate behavior for a 9p client.

So I did some quick debugging and realized that I had a wrong
understanding of how fids relate to opened files on the host, under QEMU.
It turns out that in QEMU's 9p server implementation, a fid does not
actually correspond to any opened file descriptor - it merely represents
a (string-based) path that QEMU stores internally.  It only opens the
actual file if the client actually does a T(l)open, which is in fact
separate from acquiring the fid with T(l)walk.  The reason why renaming
files/dirs from the client doesn't break those fids is that QEMU will
actually fix up those paths when a rename request is processed - c.f.
v9fs_fix_fid_paths [1].

It turns out that even if a guest process opens the file with O_PATH, that
file descriptor does not cause an actual Topen, and therefore QEMU does
not open the file on the host, and later on reopening that fd with another
mode (via e.g. open("/proc/self/fd/...", O_RDONLY)) will fail if the file
has moved on the host without QEMU's knowledge.  Also, openat will fail if
provided with a dir fd that "points" to a moved directory, regardless of
whether the fd is opened with O_PATH or not, since the path walk in QEMU is
completely string-based and does not actually issue openat on the host fs
[2].
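
Roughly, the checks look like this (a simplified sketch, not the exact
code in the test programs linked at the end [3][4]; "somefile" is just a
placeholder name):

    /* Simplified sketch of the O_PATH checks described above: hold an
     * O_PATH fd to a directory on the 9p mount, let the directory be moved
     * on the host, then try to use the fd again. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char proc_path[64];
        int dirfd;

        if (argc != 2)
            return 1;
        dirfd = open(argv[1], O_PATH | O_DIRECTORY);
        if (dirfd < 0) {
            perror("open(O_PATH)");
            return 1;
        }
        printf("holding fd %d to %s; move it on the host, then press enter\n",
               dirfd, argv[1]);
        getchar();

        /* Re-open through /proc/self/fd: fails if the server lost track. */
        snprintf(proc_path, sizeof(proc_path), "/proc/self/fd/%d", dirfd);
        if (open(proc_path, O_RDONLY | O_DIRECTORY) < 0)
            perror("re-open via /proc/self/fd");

        /* openat relative to the held fd: also fails with a string-based walk. */
        if (openat(dirfd, "somefile", O_RDONLY) < 0)
            perror("openat(dirfd, \"somefile\")");
        return 0;
    }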

I'm not sure if this is intentional in QEMU - it would seem to me that
a fid should translate to a fd (maybe opened with just O_PATH) on the
host, and path walks based on that fid should be done via openat with this
fd, which would also "automatically" handle renames without QEMU needing to
fix up the string paths?
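
(Something along these lines on the server side, purely as a sketch of the
idea, not actual QEMU code:)

    /* Sketch of the suggested approach: back each fid with an O_PATH fd
     * and walk each path component relative to it, so that host-side
     * renames of ancestors do not invalidate the fid. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Walk one path component relative to the fd backing the parent fid. */
    static int walk_component(int parent_fd, const char *name)
    {
        return openat(parent_fd, name, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    }

    int main(int argc, char **argv)
    {
        int fd;

        if (argc < 2)
            return 1;
        /* argv[1] plays the role of the export root, argv[2..] the walk. */
        fd = open(argv[1], O_PATH | O_DIRECTORY);
        for (int i = 2; i < argc && fd >= 0; i++) {
            int next = walk_component(fd, argv[i]);

            close(fd);
            fd = next;
        }
        if (fd < 0) {
            perror("walk");
            return 1;
        }
        printf("walk succeeded, fd %d\n", fd);
        return 0;
    }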

In any case, this probably means that even if Landlock were to hold a fid
reference, and QEMU does qid remapping, that's still not enough to
guarantee that we won't have a different, unrelated file ending up with
the same qid, at least under ext4.
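
(The underlying inode number reuse is easy to demonstrate directly on an
ext4 host with a quick test like this; the file names are arbitrary:)

    /* Create a file, note its inode number, delete it, create another one.
     * On ext4 the second file frequently gets the same st_ino back. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static ino_t create_and_stat(const char *name)
    {
        struct stat st = { 0 };
        int fd = open(name, O_CREAT | O_WRONLY, 0644);

        if (fd < 0 || fstat(fd, &st) != 0)
            perror(name);
        if (fd >= 0)
            close(fd);
        return st.st_ino;
    }

    int main(void)
    {
        ino_t a = create_and_stat("a.tmp");

        unlink("a.tmp");                /* frees the inode for reuse */
        ino_t b = create_and_stat("b.tmp");
        unlink("b.tmp");
        printf("first inode %lu, second inode %lu%s\n",
               (unsigned long)a, (unsigned long)b,
               a == b ? " (reused)" : "");
        return 0;
    }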

I'm not sure what the way forward is - would Landlock need to actually
"open" the files (or do something that will cause a Topen to be issued by
v9fs)?  Alternatively, if we believe this to be a QEMU issue, maybe
Landlock doesn't need to work around it and should just hold fids (and use
qids to key the rules) anyway, despite server quirks like these.  This
could perhaps then be fixed in QEMU?

(I guess the fact that QEMU is doing path tracking in the first place does
give more precedent for justifying doing path tracking in v9fs as well,
but maybe that's the wrong way to think about it)

Test programs: openat.c [3], open_procselffd.c [4]


[1]: https://gitlab.com/qemu-project/qemu/-/blob/44f51c1a3cf435daa82eb757740b59b1fd4fe71c/hw/9pfs/9p.c#L3403
[2]: https://gitlab.com/qemu-project/qemu/-/blob/371a269ff8ce561c28e4fa03bb49e4940f990637/hw/9pfs/9p-local.c#L1243
[3]: https://fileshare.maowtm.org/9pfs-landlock-fix/20250921/openat.c
[4]: https://fileshare.maowtm.org/9pfs-landlock-fix/20250921/open_procselffd.c

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-21 16:24                       ` Tingmao Wang
@ 2025-09-27 18:27                         ` Mickaël Salaün
  2025-09-27 22:53                           ` Tingmao Wang
  2025-09-29 13:06                         ` Christian Schoenebeck
  1 sibling, 1 reply; 27+ messages in thread
From: Mickaël Salaün @ 2025-09-27 18:27 UTC (permalink / raw)
  To: Tingmao Wang, Greg Kurz
  Cc: Christian Schoenebeck, Dominique Martinet, Eric Van Hensbergen,
	Latchesar Ionkov, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	Christian Brauner, linux-fsdevel, qemu-devel

Adding Greg Kurz too.

On Sun, Sep 21, 2025 at 05:24:49PM +0100, Tingmao Wang wrote:
> On 9/17/25 16:00, Mickaël Salaün wrote:
> > On Wed, Sep 17, 2025 at 11:52:35AM +0200, Christian Schoenebeck wrote:
> >> On Wednesday, September 17, 2025 1:59:21 AM CEST Tingmao Wang wrote:
> >>> On 9/16/25 20:22, Christian Schoenebeck wrote:
> >>>> On Tuesday, September 16, 2025 4:01:40 PM CEST Tingmao Wang wrote:
> >> [...]
> >>>> I see that you are proposing an option for your proposed qid based
> >>>> re-using of dentries. I don't think it should be on by default though,
> >>>> considering what we already discussed (e.g. inodes recycled by ext4, but
> >>>> also not all 9p servers handling inode collisions).
> >>>
> >>> Just to be clear, this approach (Landlock holding a fid reference, then
> >>> using the qid as a key to search for rules when a Landlocked process
> >>> accesses the previously remembered file, possibly after the file has been
> >>> moved on the server) would only be in Landlock, and would only affect
> >>> Landlock, not 9pfs (so not sure what you meant by "re-using of dentries").
> >>>
> >>> The idea behind holding a fid reference within Landlock is that, because
> >>> we have the file open, the inode would not get recycled in ext4, and thus
> >>> no other file will reuse the qid, until we close that reference (when the
> >>> Landlock domain terminates, or when the 9p filesystem is unmounted)
> >>
> >> So far I only had a glimpse on your kernel patches and had the impression that 
> >> they are changing behaviour for all users, since you are touching dentry 
> >> lookup.
> > 
> > I think we should not hold dentries because:
> > - they reference other dentries (i.e. a file hierarchy),
> > - they block umount and I'm convinced the VFS (and users) are not going
> >   to like long-lived dentries,
> > - Landlock and inotify don't need dentries, just inodes.
> > 
> > I'm wondering why fid are referenced by dentries instead of inodes.
> > 
> > The need for Landlock is to be able to match an inode with a previously
> > seen one.  Not all LSM hooks (nor VFS internals) always have access to
> > dentries, but they do have access to inodes.
> > 
> >>
> >>>> For all open FIDs QEMU retains a descriptor to the file/directory.
> >>>>
> >>>> Which 9p message do you see sent to server, Trename or Trenameat?
> >>>>
> >>>> Does this always happen to you or just sometimes, i.e. under heavy load?
> >>>
> >>> Always happen, see log: (no Trename since the rename is done on the host)
> >> [...]
> >>> Somehow if I rename in the guest, it all works, even though it's using the
> >>> same fid 2 (and it didn't ask QEMU to walk the new path)
> >>
> >> Got it. Even though QEMU *should* hold a file descriptor (or a DIR* stream, 
> > 
> > It's reasonable to assume that QEMU and other should hold opened fid In
> > practice, this might not always be the case, but let's move on and
> > consider that a 9p server bug.
> > 
> > Landlock and fanotify need some guarantees on opened files, and we
> > cannot consider every server bug.  For Landlock, inode may get an
> > "ephemeral tag" (with the Landlock object mechanism) to match previously
> > seen inodes.  In a perfect world, Landlock could keep a reference on 9p
> > inodes (as for other filesystems) and these inodes would always match
> > the same file.  In practice this is not the case, but the 9p client
> > requirements and the Landlock requirements are not exactly the same.
> > 
> > A 9p client (the kernel) wants to safely deal with duplicated qid, which
> > should not happen but still happen in practice as explained before.
> > On the other side, Landlock wants to not deny access to allowed files
> > (currently identified by their inodes), but I think it would be
> > reasonable to allow access theoretically denied (i.e. not allowed to be
> > precise, because of the denied by default mechanism) files because of a
> > 9p server bug mishandling qid (e.g. mapping them to recycled ext4
> > inodes).
> > 
> > All that to say that it looks reasonable for Landlock to trust the
> > filesystem, and by that I mean all its dependencies, including the 9p
> > server, to not have bugs.
> > 
> > Another advantage to rely on qid and server-side opened files is that we
> > get (in theory) the same semantic as when Landlock is used with local
> > filesystems (e.g. files moved on the server should still be correctly
> > identified by Landlock on the client).
> > 
> >> which should imply a file descriptor), there is still a path string stored at 
> >> V9fsFidState and that path being processed at some places, probably because 
> >> there are path based and FID based variants (e.g Trename vs. Trenameat). Maybe 
> >> that clashes somewhere, not sure. So I fear you would need to debug this.
> > 
> > Good to know that it is not a legitimate behavior for a 9p client.
> 
> So I did some quick debugging and realized that I had a wrong
> understanding of how fids relates to opened files on the host, under QEMU.
> It turns out that in QEMU's 9p server implementation, a fid does not
> actually correspond to any opened file descriptors - it merely represents
> a (string-based) path that QEMU stores internally.  It only opens the
> actual file if the client actually does an T(l)open, which is in fact
> separate from acquiring the fid with T(l)walk.  The reason why renaming
> file/dirs from the client doesn't break those fids is because QEMU will
> actually fix those paths when a rename request is processed - c.f.
> v9fs_fix_fid_paths [1].
> 
> It turns out that even if a guest process opens the file with O_PATH, that
> file descriptor does not cause an actual Topen, and therefore QEMU does
> not open the file on the host, and later on reopening that fd with another
> mode (via e.g. open("/proc/self/fd/...", O_RDONLY)) will fail if the file
> has moved on the host without QEMU's knowledge.  Also, openat will fail if
> provided with a dir fd that "points" to a moved directory, regardless of
> whether the fd is opened with O_PATH or not, since path walk in QEMU is
> completely string-based and does not actually issue openat on the host fs
> [2].
> 
> I'm not sure if this was is intentional in QEMU - it would seem to me that
> a fid should translate to a fd (maybe opened with just O_PATH) on the
> host, and path walks based on that fid should be done via openat with this
> fd, which will also "automatically" handle renames without QEMU needing to
> fixup the string paths?

I agree, it would make sense for QEMU to map fids to FD+O_PATH.  That
would avoid the kinds of issues you mentioned.

Christian, Greg, what do you think?

> 
> In any case, this probably means that even if Landlock were to hold a fid
> reference, and QEMU does qid remapping, that's still not enough to
> guarantees that we won't have a different, unrelated file ending up with
> the same qid, at least under ext4.
> 
> I'm not sure what's the way forward - would Landlock need to actually
> "open" the files (or do something that will cause a Topen to be issued by
> v9fs)?

> Alternatively if we believe this to be a QEMU issue, maybe
> Landlock don't need to work around it and should just hold fids (and use
> QIDs to key the rules) anyway despite server quirks like these.  This can
> perhaps then be fixed in QEMU?

Yes, I think it would make sense for Landlock to open and keep open a
fid (and hopefully the related remote file).  However, the v9fs umount
should be handled gracefully, in the same way that Landlock-tagged inodes
are handled.  This should come with a QEMU patch to fix the consistency
issue.

> 
> (I guess the fact that QEMU is doing path tracking in the first place does
> gives more precedent for justifying doing path tracking in v9fs as well,
> but maybe that's the wrong way to think about it)

Anyway, if QEMU does it, wouldn't it be the same for Landlock to just
rely on fids?  If QEMU uses FD+O_PATH, then Landlock would work even for
server-moved files.

> 
> Test programs: openat.c [3], open_procselffd.c [4]
> 
> 
> [1]: https://gitlab.com/qemu-project/qemu/-/blob/44f51c1a3cf435daa82eb757740b59b1fd4fe71c/hw/9pfs/9p.c#L3403
> [2]: https://gitlab.com/qemu-project/qemu/-/blob/371a269ff8ce561c28e4fa03bb49e4940f990637/hw/9pfs/9p-local.c#L1243
> [3]: https://fileshare.maowtm.org/9pfs-landlock-fix/20250921/openat.c
> [4]: https://fileshare.maowtm.org/9pfs-landlock-fix/20250921/open_procselffd.c
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-27 18:27                         ` Mickaël Salaün
@ 2025-09-27 22:53                           ` Tingmao Wang
  0 siblings, 0 replies; 27+ messages in thread
From: Tingmao Wang @ 2025-09-27 22:53 UTC (permalink / raw)
  To: Mickaël Salaün, Greg Kurz
  Cc: Christian Schoenebeck, Dominique Martinet, Eric Van Hensbergen,
	Latchesar Ionkov, v9fs, Günther Noack, linux-security-module,
	Jan Kara, Amir Goldstein, Matthew Bobrowski, Al Viro,
	Christian Brauner, linux-fsdevel, qemu-devel

On 9/27/25 19:27, Mickaël Salaün wrote:
> Adding Greg Kurz too.
> 
> On Sun, Sep 21, 2025 at 05:24:49PM +0100, Tingmao Wang wrote:
>> On 9/17/25 16:00, Mickaël Salaün wrote:
>>> [...]
>>
>> Alternatively if we believe this to be a QEMU issue, maybe
>> Landlock don't need to work around it and should just hold fids (and use
>> QIDs to key the rules) anyway despite server quirks like these.  This can
>> perhaps then be fixed in QEMU?
> 
> Yes, I think it would make sense for Landlock to open and keep open a
> fid (and hopefully the related remote file).  However, the v9fs umount
> should be handled gracefully the same way Landlock tag inodes are
> handled.  This should come with a QEMU patch to fix the consistency
> issue.
> 
>>
>> (I guess the fact that QEMU is doing path tracking in the first place does
>> gives more precedent for justifying doing path tracking in v9fs as well,
>> but maybe that's the wrong way to think about it)
> 
> Anyway, if QEMU does it, wouldn't it be the same for Landlock to just
> rely on fid?

The fid can't be relied on because it's just a handle.  The client can
open multiple fids pointing to the same file (and in fact this is what
v9fs does - a new fid for each open()).

> If QEMU uses FD+O_PATH, then Landlock would work even for
> server-moved files.

(With this new approach, Landlock would have to key the rules based on
qid, but it also needs to hold an open fid to prevent that qid from being
reused (due to ext4 inode number reuse, etc))

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-21 16:24                       ` Tingmao Wang
  2025-09-27 18:27                         ` Mickaël Salaün
@ 2025-09-29 13:06                         ` Christian Schoenebeck
  2025-10-13  9:24                           ` Greg Kurz
  1 sibling, 1 reply; 27+ messages in thread
From: Christian Schoenebeck @ 2025-09-29 13:06 UTC (permalink / raw)
  To: Mickaël Salaün, Dominique Martinet, qemu-devel,
	Greg Kurz
  Cc: Eric Van Hensbergen, Latchesar Ionkov, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, Christian Brauner, linux-fsdevel,
	qemu-devel, Tingmao Wang

On Sunday, September 21, 2025 6:24:49 PM CEST Tingmao Wang wrote:
> On 9/17/25 16:00, Mickaël Salaün wrote:
[...]

Hi Greg,

I'd appreciate comments from your side as well, as you have been on the
QEMU 9p front much longer than me.

I know you won't have the time to read up on the entire thread, so I'll try
to summarize: basically this is yet another use-after-unlink issue, this
time on directories instead of files.

> So I did some quick debugging and realized that I had a wrong
> understanding of how fids relates to opened files on the host, under QEMU.
> It turns out that in QEMU's 9p server implementation, a fid does not
> actually correspond to any opened file descriptors - it merely represents
> a (string-based) path that QEMU stores internally.  It only opens the
> actual file if the client actually does an T(l)open, which is in fact
> separate from acquiring the fid with T(l)walk.  The reason why renaming
> file/dirs from the client doesn't break those fids is because QEMU will
> actually fix those paths when a rename request is processed - c.f.
> v9fs_fix_fid_paths [1].

Correct, that's based on what the 9p protocols define: a FID does not
exactly translate to what a file handle is on a local system.  Even after
acquiring a new FID by sending a Twalk request, the client would
subsequently still need to send a Topen for the server to actually open
that file/directory.

And yes, QEMU's 9p server "fixes" the path string of a FID if it was moved
upon client request.  If the move happened on the host side, outside of the
server's knowledge, then this won't happen ATM, and hence it would break
your use case.

> It turns out that even if a guest process opens the file with O_PATH, that
> file descriptor does not cause an actual Topen, and therefore QEMU does
> not open the file on the host, and later on reopening that fd with another
> mode (via e.g. open("/proc/self/fd/...", O_RDONLY)) will fail if the file
> has moved on the host without QEMU's knowledge.  Also, openat will fail if
> provided with a dir fd that "points" to a moved directory, regardless of
> whether the fd is opened with O_PATH or not, since path walk in QEMU is
> completely string-based and does not actually issue openat on the host fs
> [2].

I don't think the problem here is the string-based walk per se, but rather
that the string-based walk always starts from the export root:

https://github.com/qemu/qemu/blob/4975b64efb5aa4248cbc3760312bbe08d6e71638/hw/9pfs/9p-local.c#L64

I guess that's something that could be changed in QEMU, such that the walk
starts from the FID's point in the fs, as the code already uses openat() to
walk relative to a file descriptor (for security reasons, actually), Greg?

That alone would still not fix your use case though: things being moved on
the host side.  For this to work, it would require already having a fd open
on the host for the FID.  This could be done by the server for each FID as
you suggested, or it could be done by the client by opening the FID.

Also keep in mind: once the open file descriptor limit on the host is
exhausted, QEMU is forced to close older open file descriptors to keep the
QEMU process alive.  So this might still break what you are trying to
achieve there.

Having said that, I wonder whether it'd be simpler for the server to watch
for file tree changes (via the inotify API) and fix the paths accordingly
for host-side changes as well?
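
(Just to illustrate the idea, not QEMU code: watching one directory for
renames with inotify looks roughly like this; a real server would have to
do this recursively for the whole export and remap its stored fid paths.)

    #include <stdio.h>
    #include <sys/inotify.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        char buf[4096] __attribute__((aligned(8)));
        ssize_t len;
        int fd;

        if (argc != 2)
            return 1;
        fd = inotify_init1(IN_CLOEXEC);
        if (fd < 0 || inotify_add_watch(fd, argv[1],
                                        IN_MOVED_FROM | IN_MOVED_TO) < 0) {
            perror("inotify");
            return 1;
        }
        while ((len = read(fd, buf, sizeof(buf))) > 0) {
            ssize_t off = 0;

            while (off < len) {
                struct inotify_event *ev = (struct inotify_event *)(buf + off);

                printf("%s: %s\n",
                       (ev->mask & IN_MOVED_FROM) ? "moved away" : "moved in",
                       ev->len ? ev->name : "(watched dir)");
                off += sizeof(*ev) + ev->len;
            }
        }
        return 0;
    }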

/Christian



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid)
  2025-09-29 13:06                         ` Christian Schoenebeck
@ 2025-10-13  9:24                           ` Greg Kurz
  0 siblings, 0 replies; 27+ messages in thread
From: Greg Kurz @ 2025-10-13  9:24 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Mickaël Salaün, Dominique Martinet, qemu-devel,
	Eric Van Hensbergen, Latchesar Ionkov, v9fs, Günther Noack,
	linux-security-module, Jan Kara, Amir Goldstein,
	Matthew Bobrowski, Al Viro, Christian Brauner, linux-fsdevel,
	Tingmao Wang

On Mon, 29 Sep 2025 15:06:59 +0200
Christian Schoenebeck <linux_oss@crudebyte.com> wrote:

> On Sunday, September 21, 2025 6:24:49 PM CEST Tingmao Wang wrote:
> > On 9/17/25 16:00, Mickaël Salaün wrote:
> [...]
> 
> Hi Greg,
> 

Hi Christian,

> I'd appreciate comments from your side as well, as you have been on the
> QEMU 9p front much longer than me.
> 
> I know you won't have the time to read up on the entire thread, so I'll try
> to summarize: basically this is yet another use-after-unlink issue, this
> time on directories instead of files.
> 

A thread that never actually landed in my mailbox, and it is quite hard to
understand the root problem from the content of this e-mail alone ;-)

> > So I did some quick debugging and realized that I had a wrong
> > understanding of how fids relates to opened files on the host, under QEMU.
> > It turns out that in QEMU's 9p server implementation, a fid does not
> > actually correspond to any opened file descriptors - it merely represents
> > a (string-based) path that QEMU stores internally.  It only opens the
> > actual file if the client actually does an T(l)open, which is in fact
> > separate from acquiring the fid with T(l)walk.  The reason why renaming
> > file/dirs from the client doesn't break those fids is because QEMU will
> > actually fix those paths when a rename request is processed - c.f.
> > v9fs_fix_fid_paths [1].
> 
> Correct, that's based on what the 9p protocols define: a FID does not exactly
> translate to what a file handle is on a local system. Even after acquiring a
> new FID by sending a Twalk request, the client would subsequently still need
> to send a Topen for the server to actually open that file/directory.
> 
> And yes, QEMU's 9p server "fixes" the path string of a FID if it was moved
> upon a client request. If the move happened on the host side, outside of the
> server's knowledge, then this won't happen ATM and hence it would break your
> use case.
> 
> > It turns out that even if a guest process opens the file with O_PATH, that
> > file descriptor does not cause an actual Topen, and therefore QEMU does
> > not open the file on the host, and later on reopening that fd with another
> > mode (via e.g. open("/proc/self/fd/...", O_RDONLY)) will fail if the file
> > has moved on the host without QEMU's knowledge.  Also, openat will fail if
> > provided with a dir fd that "points" to a moved directory, regardless of
> > whether the fd is opened with O_PATH or not, since path walk in QEMU is
> > completely string-based and does not actually issue openat on the host fs
> > [2].
> 
> I don't think the problem here is the string based walk per se, but rather
> that the string based walk always starts from the export root:
> 
> https://github.com/qemu/qemu/blob/4975b64efb5aa4248cbc3760312bbe08d6e71638/hw/9pfs/9p-local.c#L64
> 
> I guess that's something that could be changed in QEMU such that the walk
> starts from the FID's own point in the fs, as the code already uses openat()
> to walk relative to a file descriptor (for security reasons actually), Greg?
> 

Yes this was introduced for security reasons. In a nutshell, the idea is
to *not* follow symlinks in any element of the path being opened. It thus
naturally starts at the export root for which we have an fd.
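
For illustration, a component-by-component walk in that spirit could look
like the sketch below (simplified; not QEMU's actual implementation): resolve
the path relative to a starting directory fd, refusing to follow symlinks at
every step.  Starting the same walk from a per-FID fd instead of the export
root is essentially the change discussed above.

    #include <fcntl.h>
    #include <limits.h>
    #include <string.h>
    #include <unistd.h>

    /* Simplified sketch, not QEMU's actual code: open "path" relative to
     * dirfd one component at a time, never following symlinks. */
    static int open_nofollow(int dirfd, const char *path, int flags)
    {
        char buf[PATH_MAX];
        strncpy(buf, path, sizeof(buf) - 1);
        buf[sizeof(buf) - 1] = '\0';

        int fd = dup(dirfd);  /* today dirfd would be the export root's fd */
        char *save = NULL;
        char *comp = strtok_r(buf, "/", &save);
        while (fd >= 0 && comp) {
            char *next = strtok_r(NULL, "/", &save);
            /* Intermediate components are opened O_PATH|O_DIRECTORY, the
             * final one with the caller's flags; O_NOFOLLOW everywhere. */
            int nfd = openat(fd, comp,
                             next ? O_PATH | O_DIRECTORY | O_NOFOLLOW
                                  : flags | O_NOFOLLOW);
            close(fd);
            fd = nfd;
            comp = next;
        }
        return fd;
    }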

> That alone would still not fix your use case though: things being moved on
> the host side. For this to work, it would require already having an fd open
> on the host for the FID. This could be done by the server for each FID as you
> suggested, or by the client by opening the FID.
> 

Can you elaborate on the "things being moved on the host side"? With
an example of code that breaks on the client side?

> Also keep in mind: once the open file descriptor limit on the host is
> exhausted, QEMU is forced to close older open file descriptors to keep the
> QEMU process alive. So this might still break what you are trying to achieve
> there.
> 

Correct.

> Having said that, I wonder whether it'd be simpler for the server to track
> file tree changes (inotify API) and fix the paths accordingly for host-side
> changes as well?
> 

The problem is how to let the guest know about such changes, e.g. in
order to invalidate a stale cache entry. 9P doesn't provide any way for
host->client notification.

> /Christian
> 
> 

Cheers,

-- 
Greg

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2025-10-13  9:24 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-04  0:04 [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
2025-09-04  0:04 ` [PATCH v2 1/7] fs/9p: Add ability to identify inode by path for .L in uncached mode Tingmao Wang
2025-09-04  0:04 ` [PATCH v2 2/7] fs/9p: add option for path-based inodes Tingmao Wang
2025-09-04  0:04 ` [PATCH v2 3/7] fs/9p: Add ability to identify inode by path for non-.L in uncached mode Tingmao Wang
2025-09-04  0:04 ` [PATCH v2 4/7] fs/9p: .L: Refresh stale inodes on reuse Tingmao Wang
2025-09-04  0:04 ` [PATCH v2 5/7] fs/9p: non-.L: " Tingmao Wang
2025-09-04  0:04 ` [PATCH v2 6/7] fs/9p: update the target's ino_path on rename Tingmao Wang
2025-09-04  0:04 ` [PATCH v2 7/7] docs: fs/9p: Document the "inodeident" option Tingmao Wang
2025-09-14 21:25 ` [PATCH v2 0/7] fs/9p: Reuse inode based on path (in addition to qid) Tingmao Wang
2025-09-15 12:53   ` Dominique Martinet
2025-09-15 13:44     ` Tingmao Wang
2025-09-15 23:31       ` Dominique Martinet
2025-09-16 12:44         ` Tingmao Wang
2025-09-16 13:35           ` Dominique Martinet
2025-09-16 14:01             ` Tingmao Wang
2025-09-16 19:22               ` Christian Schoenebeck
2025-09-16 23:59                 ` Tingmao Wang
2025-09-17  9:52                   ` Christian Schoenebeck
2025-09-17 15:00                     ` Mickaël Salaün
2025-09-21 16:24                       ` Tingmao Wang
2025-09-27 18:27                         ` Mickaël Salaün
2025-09-27 22:53                           ` Tingmao Wang
2025-09-29 13:06                         ` Christian Schoenebeck
2025-10-13  9:24                           ` Greg Kurz
2025-09-16 13:43           ` Christian Schoenebeck
2025-09-15 14:10     ` Christian Schoenebeck
2025-09-17 15:00       ` Mickaël Salaün

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).