linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/8] fs: clean up internal i_version handling
@ 2022-10-21 13:05 Jeff Layton
  2022-10-21 13:05 ` [PATCH v8 1/8] fs: uninline inode_query_iversion Jeff Layton
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:05 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs

The main consumer of i_version field (knfsd) has to jump through a
number of hoops to fetch it, depending on what sort of inode it is.
Rather than do this, we want to offload the responsibility for
presenting this field to the filesystem's ->getattr operation, which is
a more natural way to deal with a field that may be implemented
differently.

The focus of this patchset is to clean up these internal interfaces.
This should also make it simple to present this attribute to userland in
the future, which should be possible once the semantics are a bit more
consistent across different backing filesystems.

The change are fairly small, but they cross several subsystems. I'd
appreciate R-b's and A-b's from maintainers whose subsystems I'm
touching (Chuck, Al, Trond, and Xiubo in particular).

For now, I'm leaving out more siginificant behavioral changes to
i_version handling so that we can keep the focus on this set rather
narrow. The next stap is to get this into linux-next with an aim toward
merge in v6.2.

Thanks!

Jeff Layton (8):
  fs: uninline inode_query_iversion
  fs: clarify when the i_version counter must be updated
  vfs: plumb i_version handling into struct kstat
  nfs: report the inode version in getattr if requested
  ceph: report the inode version in getattr if requested
  nfsd: move nfsd4_change_attribute to nfsfh.c
  nfsd: use the getattr operation to fetch i_version
  nfsd: remove fetch_iversion export operation

 fs/ceph/inode.c          | 16 +++++++----
 fs/libfs.c               | 36 +++++++++++++++++++++++++
 fs/nfs/export.c          |  7 -----
 fs/nfs/inode.c           | 16 ++++++++---
 fs/nfsd/nfs4xdr.c        |  4 ++-
 fs/nfsd/nfsfh.c          | 42 +++++++++++++++++++++++++++++
 fs/nfsd/nfsfh.h          | 29 +-------------------
 fs/nfsd/vfs.h            |  7 ++++-
 fs/stat.c                | 17 ++++++++++--
 include/linux/exportfs.h |  1 -
 include/linux/iversion.h | 58 ++++++++++++++--------------------------
 include/linux/stat.h     |  9 +++++++
 12 files changed, 155 insertions(+), 87 deletions(-)

-- 
2.37.3


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v8 1/8] fs: uninline inode_query_iversion
  2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
@ 2022-10-21 13:05 ` Jeff Layton
  2022-10-21 13:05 ` [PATCH v8 2/8] fs: clarify when the i_version counter must be updated Jeff Layton
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:05 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/libfs.c               | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/iversion.h | 38 ++------------------------------------
 2 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/fs/libfs.c b/fs/libfs.c
index 682d56345a1c..5ae81466a422 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -1566,3 +1566,39 @@ bool inode_maybe_inc_iversion(struct inode *inode, bool force)
 	return true;
 }
 EXPORT_SYMBOL(inode_maybe_inc_iversion);
+
+/**
+ * inode_query_iversion - read i_version for later use
+ * @inode: inode from which i_version should be read
+ *
+ * Read the inode i_version counter. This should be used by callers that wish
+ * to store the returned i_version for later comparison. This will guarantee
+ * that a later query of the i_version will result in a different value if
+ * anything has changed.
+ *
+ * In this implementation, we fetch the current value, set the QUERIED flag and
+ * then try to swap it into place with a cmpxchg, if it wasn't already set. If
+ * that fails, we try again with the newly fetched value from the cmpxchg.
+ */
+u64 inode_query_iversion(struct inode *inode)
+{
+	u64 cur, new;
+
+	cur = inode_peek_iversion_raw(inode);
+	do {
+		/* If flag is already set, then no need to swap */
+		if (cur & I_VERSION_QUERIED) {
+			/*
+			 * This barrier (and the implicit barrier in the
+			 * cmpxchg below) pairs with the barrier in
+			 * inode_maybe_inc_iversion().
+			 */
+			smp_mb();
+			break;
+		}
+
+		new = cur | I_VERSION_QUERIED;
+	} while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new));
+	return cur >> I_VERSION_QUERIED_SHIFT;
+}
+EXPORT_SYMBOL(inode_query_iversion);
diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index e27bd4f55d84..6755d8b4f20b 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -234,42 +234,6 @@ inode_peek_iversion(const struct inode *inode)
 	return inode_peek_iversion_raw(inode) >> I_VERSION_QUERIED_SHIFT;
 }
 
-/**
- * inode_query_iversion - read i_version for later use
- * @inode: inode from which i_version should be read
- *
- * Read the inode i_version counter. This should be used by callers that wish
- * to store the returned i_version for later comparison. This will guarantee
- * that a later query of the i_version will result in a different value if
- * anything has changed.
- *
- * In this implementation, we fetch the current value, set the QUERIED flag and
- * then try to swap it into place with a cmpxchg, if it wasn't already set. If
- * that fails, we try again with the newly fetched value from the cmpxchg.
- */
-static inline u64
-inode_query_iversion(struct inode *inode)
-{
-	u64 cur, new;
-
-	cur = inode_peek_iversion_raw(inode);
-	do {
-		/* If flag is already set, then no need to swap */
-		if (cur & I_VERSION_QUERIED) {
-			/*
-			 * This barrier (and the implicit barrier in the
-			 * cmpxchg below) pairs with the barrier in
-			 * inode_maybe_inc_iversion().
-			 */
-			smp_mb();
-			break;
-		}
-
-		new = cur | I_VERSION_QUERIED;
-	} while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new));
-	return cur >> I_VERSION_QUERIED_SHIFT;
-}
-
 /*
  * For filesystems without any sort of change attribute, the best we can
  * do is fake one up from the ctime:
@@ -283,6 +247,8 @@ static inline u64 time_to_chattr(struct timespec64 *t)
 	return chattr;
 }
 
+u64 inode_query_iversion(struct inode *inode);
+
 /**
  * inode_eq_iversion_raw - check whether the raw i_version counter has changed
  * @inode: inode to check
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v8 2/8] fs: clarify when the i_version counter must be updated
  2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
  2022-10-21 13:05 ` [PATCH v8 1/8] fs: uninline inode_query_iversion Jeff Layton
@ 2022-10-21 13:05 ` Jeff Layton
  2022-10-21 13:05 ` [PATCH v8 3/8] vfs: plumb i_version handling into struct kstat Jeff Layton
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:05 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs, Colin Walters

The i_version field in the kernel has had different semantics over
the decades, but NFSv4 has certain expectations. Update the comments
in iversion.h to describe when the i_version must change.

Cc: Colin Walters <walters@verbum.org>
Cc: NeilBrown <neilb@suse.de>
Cc: Trond Myklebust <trondmy@hammerspace.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/iversion.h | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/linux/iversion.h b/include/linux/iversion.h
index 6755d8b4f20b..94f4dc620d01 100644
--- a/include/linux/iversion.h
+++ b/include/linux/iversion.h
@@ -9,8 +9,24 @@
  * ---------------------------
  * The change attribute (i_version) is mandated by NFSv4 and is mostly for
  * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
- * appear different to observers if there was a change to the inode's data or
- * metadata since it was last queried.
+ * appear larger to observers if there was an explicit change to the inode's
+ * data or metadata since it was last queried.
+ *
+ * An explicit change is one that would ordinarily result in a change to the
+ * inode status change time (aka ctime). i_version must appear to change, even
+ * if the ctime does not (since the whole point is to avoid missing updates due
+ * to timestamp granularity). If POSIX mandates that the ctime must change due
+ * to an operation, then the i_version counter must be incremented as well.
+ *
+ * Making the i_version update completely atomic with the operation itself would
+ * be prohibitively expensive. Traditionally the kernel has updated the times on
+ * directories after an operation that changes its contents. For regular files,
+ * the ctime is usually updated before the data is copied into the cache for a
+ * write. This means that there is a window of time when an observer can
+ * associate a new timestamp with old file contents. Since the purpose of the
+ * i_version is to allow for better cache coherency, the i_version must always
+ * be updated after the results of the operation are visible. Updating it before
+ * and after a change is also permitted.
  *
  * Observers see the i_version as a 64-bit number that never decreases. If it
  * remains the same since it was last checked, then nothing has changed in the
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v8 3/8] vfs: plumb i_version handling into struct kstat
  2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
  2022-10-21 13:05 ` [PATCH v8 1/8] fs: uninline inode_query_iversion Jeff Layton
  2022-10-21 13:05 ` [PATCH v8 2/8] fs: clarify when the i_version counter must be updated Jeff Layton
@ 2022-10-21 13:05 ` Jeff Layton
  2022-10-21 13:05 ` [PATCH v8 4/8] nfs: report the inode version in getattr if requested Jeff Layton
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:05 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs, Jeff Layton

From: Jeff Layton <jlayton@redhat.com>

The NFS server has a lot of special handling for different types of
change attribute access, depending on the underlying filesystem. In
most cases, it's doing a getattr anyway and then fetching that value
after the fact.

Rather that do that, add a new STATX_CHANGE_COOKIE flag that is a
kernel-only symbol (for now). If requested and getattr can implement it,
it can fill out this field. For IS_I_VERSION inodes, add a generic
implementation in vfs_getattr_nosec. Take care to mask
STATX_CHANGE_COOKIE off in requests from userland and in the result
mask.

Since not all filesystems can give the same guarantees of monotonicity,
claim a STATX_ATTR_CHANGE_MONOTONIC flag that filesystems can set to
indicate that they offer an i_version value that can never go backward.

Eventually if we decide to make the i_version available to userland, we
can just designate a field for it in struct statx, and move the
STATX_CHANGE_COOKIE definition to the uapi header.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/stat.c            | 17 +++++++++++++++--
 include/linux/stat.h |  9 +++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/stat.c b/fs/stat.c
index ef50573c72a2..06fd3fc1ab84 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -18,6 +18,7 @@
 #include <linux/syscalls.h>
 #include <linux/pagemap.h>
 #include <linux/compat.h>
+#include <linux/iversion.h>
 
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
@@ -119,6 +120,11 @@ int vfs_getattr_nosec(const struct path *path, struct kstat *stat,
 	stat->attributes_mask |= (STATX_ATTR_AUTOMOUNT |
 				  STATX_ATTR_DAX);
 
+	if ((request_mask & STATX_CHANGE_COOKIE) && IS_I_VERSION(inode)) {
+		stat->result_mask |= STATX_CHANGE_COOKIE;
+		stat->change_cookie = inode_query_iversion(inode);
+	}
+
 	mnt_userns = mnt_user_ns(path->mnt);
 	if (inode->i_op->getattr)
 		return inode->i_op->getattr(mnt_userns, path, stat,
@@ -599,9 +605,11 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
 
 	memset(&tmp, 0, sizeof(tmp));
 
-	tmp.stx_mask = stat->result_mask;
+	/* STATX_CHANGE_COOKIE is kernel-only for now */
+	tmp.stx_mask = stat->result_mask & ~STATX_CHANGE_COOKIE;
 	tmp.stx_blksize = stat->blksize;
-	tmp.stx_attributes = stat->attributes;
+	/* STATX_ATTR_CHANGE_MONOTONIC is kernel-only for now */
+	tmp.stx_attributes = stat->attributes & ~STATX_ATTR_CHANGE_MONOTONIC;
 	tmp.stx_nlink = stat->nlink;
 	tmp.stx_uid = from_kuid_munged(current_user_ns(), stat->uid);
 	tmp.stx_gid = from_kgid_munged(current_user_ns(), stat->gid);
@@ -640,6 +648,11 @@ int do_statx(int dfd, struct filename *filename, unsigned int flags,
 	if ((flags & AT_STATX_SYNC_TYPE) == AT_STATX_SYNC_TYPE)
 		return -EINVAL;
 
+	/* STATX_CHANGE_COOKIE is kernel-only for now. Ignore requests
+	 * from userland.
+	 */
+	mask &= ~STATX_CHANGE_COOKIE;
+
 	error = vfs_statx(dfd, filename, flags, &stat, mask);
 	if (error)
 		return error;
diff --git a/include/linux/stat.h b/include/linux/stat.h
index ff277ced50e9..52150570d37a 100644
--- a/include/linux/stat.h
+++ b/include/linux/stat.h
@@ -52,6 +52,15 @@ struct kstat {
 	u64		mnt_id;
 	u32		dio_mem_align;
 	u32		dio_offset_align;
+	u64		change_cookie;
 };
 
+/* These definitions are internal to the kernel for now. Mainly used by nfsd. */
+
+/* mask values */
+#define STATX_CHANGE_COOKIE		0x40000000U	/* Want/got stx_change_attr */
+
+/* file attribute values */
+#define STATX_ATTR_CHANGE_MONOTONIC	0x8000000000000000ULL /* version monotonically increases */
+
 #endif
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v8 4/8] nfs: report the inode version in getattr if requested
  2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
                   ` (2 preceding siblings ...)
  2022-10-21 13:05 ` [PATCH v8 3/8] vfs: plumb i_version handling into struct kstat Jeff Layton
@ 2022-10-21 13:05 ` Jeff Layton
  2022-10-21 13:05 ` [PATCH v8 5/8] ceph: " Jeff Layton
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:05 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs

Allow NFS to report the i_version in getattr requests. Since the cost to
fetch it is relatively cheap, do it unconditionally and just set the
flag if it looks like it's valid. Also, conditionally enable the
MONOTONIC flag when the server reports its change attr type as such.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfs/inode.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 6b2cfa59a1a2..f70f11df1ee0 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -825,6 +825,8 @@ static u32 nfs_get_valid_attrmask(struct inode *inode)
 		reply_mask |= STATX_UID | STATX_GID;
 	if (!(cache_validity & NFS_INO_INVALID_BLOCKS))
 		reply_mask |= STATX_BLOCKS;
+	if (!(cache_validity & NFS_INO_INVALID_CHANGE))
+		reply_mask |= STATX_CHANGE_COOKIE;
 	return reply_mask;
 }
 
@@ -843,7 +845,8 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 
 	request_mask &= STATX_TYPE | STATX_MODE | STATX_NLINK | STATX_UID |
 			STATX_GID | STATX_ATIME | STATX_MTIME | STATX_CTIME |
-			STATX_INO | STATX_SIZE | STATX_BLOCKS;
+			STATX_INO | STATX_SIZE | STATX_BLOCKS | STATX_BTIME |
+			STATX_CHANGE_COOKIE;
 
 	if ((query_flags & AT_STATX_DONT_SYNC) && !force_sync) {
 		if (readdirplus_enabled)
@@ -851,8 +854,8 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		goto out_no_revalidate;
 	}
 
-	/* Flush out writes to the server in order to update c/mtime.  */
-	if ((request_mask & (STATX_CTIME | STATX_MTIME)) &&
+	/* Flush out writes to the server in order to update c/mtime/version.  */
+	if ((request_mask & (STATX_CTIME | STATX_MTIME | STATX_CHANGE_COOKIE)) &&
 	    S_ISREG(inode->i_mode))
 		filemap_write_and_wait(inode->i_mapping);
 
@@ -872,7 +875,8 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 	/* Is the user requesting attributes that might need revalidation? */
 	if (!(request_mask & (STATX_MODE|STATX_NLINK|STATX_ATIME|STATX_CTIME|
 					STATX_MTIME|STATX_UID|STATX_GID|
-					STATX_SIZE|STATX_BLOCKS)))
+					STATX_SIZE|STATX_BLOCKS|
+					STATX_CHANGE_COOKIE)))
 		goto out_no_revalidate;
 
 	/* Check whether the cached attributes are stale */
@@ -910,6 +914,10 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 
 	generic_fillattr(&init_user_ns, inode, stat);
 	stat->ino = nfs_compat_user_ino64(NFS_FILEID(inode));
+	stat->change_cookie = inode_peek_iversion_raw(inode);
+	stat->attributes_mask |= STATX_ATTR_CHANGE_MONOTONIC;
+	if (server->change_attr_type != NFS4_CHANGE_TYPE_IS_UNDEFINED)
+		stat->attributes |= STATX_ATTR_CHANGE_MONOTONIC;
 	if (S_ISDIR(inode->i_mode))
 		stat->blksize = NFS_SERVER(inode)->dtsize;
 out:
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v8 5/8] ceph: report the inode version in getattr if requested
  2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
                   ` (3 preceding siblings ...)
  2022-10-21 13:05 ` [PATCH v8 4/8] nfs: report the inode version in getattr if requested Jeff Layton
@ 2022-10-21 13:05 ` Jeff Layton
  2022-10-21 13:06 ` [PATCH v8 6/8] nfsd: move nfsd4_change_attribute to nfsfh.c Jeff Layton
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:05 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs

When getattr requests the STX_CHANGE_COOKIE, request the full gamut of
caps (similarly to how ctime is handled). When the change attribute
seems to be valid, return it in the change_cookie field and set the flag
in the reply mask. Also, unconditionally enable
STATX_ATTR_CHANGE_MONOTONIC.

Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/ceph/inode.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
index 4af5e55abc15..0e2e6ca399a9 100644
--- a/fs/ceph/inode.c
+++ b/fs/ceph/inode.c
@@ -2417,10 +2417,10 @@ static int statx_to_caps(u32 want, umode_t mode)
 {
 	int mask = 0;
 
-	if (want & (STATX_MODE|STATX_UID|STATX_GID|STATX_CTIME|STATX_BTIME))
+	if (want & (STATX_MODE|STATX_UID|STATX_GID|STATX_CTIME|STATX_BTIME|STATX_CHANGE_COOKIE))
 		mask |= CEPH_CAP_AUTH_SHARED;
 
-	if (want & (STATX_NLINK|STATX_CTIME)) {
+	if (want & (STATX_NLINK|STATX_CTIME|STATX_CHANGE_COOKIE)) {
 		/*
 		 * The link count for directories depends on inode->i_subdirs,
 		 * and that is only updated when Fs caps are held.
@@ -2431,11 +2431,10 @@ static int statx_to_caps(u32 want, umode_t mode)
 			mask |= CEPH_CAP_LINK_SHARED;
 	}
 
-	if (want & (STATX_ATIME|STATX_MTIME|STATX_CTIME|STATX_SIZE|
-		    STATX_BLOCKS))
+	if (want & (STATX_ATIME|STATX_MTIME|STATX_CTIME|STATX_SIZE|STATX_BLOCKS|STATX_CHANGE_COOKIE))
 		mask |= CEPH_CAP_FILE_SHARED;
 
-	if (want & (STATX_CTIME))
+	if (want & (STATX_CTIME|STATX_CHANGE_COOKIE))
 		mask |= CEPH_CAP_XATTR_SHARED;
 
 	return mask;
@@ -2478,6 +2477,11 @@ int ceph_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		valid_mask |= STATX_BTIME;
 	}
 
+	if (request_mask & STATX_CHANGE_COOKIE) {
+		stat->change_cookie = inode_peek_iversion_raw(inode);
+		valid_mask |= STATX_CHANGE_COOKIE;
+	}
+
 	if (ceph_snap(inode) == CEPH_NOSNAP)
 		stat->dev = sb->s_dev;
 	else
@@ -2519,6 +2523,8 @@ int ceph_getattr(struct user_namespace *mnt_userns, const struct path *path,
 			stat->nlink = 1 + 1 + ci->i_subdirs;
 	}
 
+	stat->attributes_mask |= STATX_ATTR_CHANGE_MONOTONIC;
+	stat->attributes |= STATX_ATTR_CHANGE_MONOTONIC;
 	stat->result_mask = request_mask & valid_mask;
 	return err;
 }
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v8 6/8] nfsd: move nfsd4_change_attribute to nfsfh.c
  2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
                   ` (4 preceding siblings ...)
  2022-10-21 13:05 ` [PATCH v8 5/8] ceph: " Jeff Layton
@ 2022-10-21 13:06 ` Jeff Layton
  2022-10-21 13:06 ` [PATCH v8 7/8] nfsd: use the getattr operation to fetch i_version Jeff Layton
  2022-10-21 13:06 ` [PATCH v8 8/8] nfsd: remove fetch_iversion export operation Jeff Layton
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:06 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs

This is a pretty big function for inlining. Move it to being
non-inlined.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfsfh.c | 27 +++++++++++++++++++++++++++
 fs/nfsd/nfsfh.h | 29 +----------------------------
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index d73434200df9..7030d9209903 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -748,3 +748,30 @@ enum fsid_source fsid_source(const struct svc_fh *fhp)
 		return FSIDSOURCE_UUID;
 	return FSIDSOURCE_DEV;
 }
+
+/*
+ * We could use i_version alone as the change attribute.  However,
+ * i_version can go backwards after a reboot.  On its own that doesn't
+ * necessarily cause a problem, but if i_version goes backwards and then
+ * is incremented again it could reuse a value that was previously used
+ * before boot, and a client who queried the two values might
+ * incorrectly assume nothing changed.
+ *
+ * By using both ctime and the i_version counter we guarantee that as
+ * long as time doesn't go backwards we never reuse an old value.
+ */
+u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode)
+{
+	if (inode->i_sb->s_export_op->fetch_iversion)
+		return inode->i_sb->s_export_op->fetch_iversion(inode);
+	else if (IS_I_VERSION(inode)) {
+		u64 chattr;
+
+		chattr =  stat->ctime.tv_sec;
+		chattr <<= 30;
+		chattr += stat->ctime.tv_nsec;
+		chattr += inode_query_iversion(inode);
+		return chattr;
+	} else
+		return time_to_chattr(&stat->ctime);
+}
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index c3ae6414fc5c..4c223a7a91d4 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -291,34 +291,7 @@ static inline void fh_clear_pre_post_attrs(struct svc_fh *fhp)
 	fhp->fh_pre_saved = false;
 }
 
-/*
- * We could use i_version alone as the change attribute.  However,
- * i_version can go backwards after a reboot.  On its own that doesn't
- * necessarily cause a problem, but if i_version goes backwards and then
- * is incremented again it could reuse a value that was previously used
- * before boot, and a client who queried the two values might
- * incorrectly assume nothing changed.
- *
- * By using both ctime and the i_version counter we guarantee that as
- * long as time doesn't go backwards we never reuse an old value.
- */
-static inline u64 nfsd4_change_attribute(struct kstat *stat,
-					 struct inode *inode)
-{
-	if (inode->i_sb->s_export_op->fetch_iversion)
-		return inode->i_sb->s_export_op->fetch_iversion(inode);
-	else if (IS_I_VERSION(inode)) {
-		u64 chattr;
-
-		chattr =  stat->ctime.tv_sec;
-		chattr <<= 30;
-		chattr += stat->ctime.tv_nsec;
-		chattr += inode_query_iversion(inode);
-		return chattr;
-	} else
-		return time_to_chattr(&stat->ctime);
-}
-
+u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode);
 extern void fh_fill_pre_attrs(struct svc_fh *fhp);
 extern void fh_fill_post_attrs(struct svc_fh *fhp);
 extern void fh_fill_both_attrs(struct svc_fh *fhp);
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v8 7/8] nfsd: use the getattr operation to fetch i_version
  2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
                   ` (5 preceding siblings ...)
  2022-10-21 13:06 ` [PATCH v8 6/8] nfsd: move nfsd4_change_attribute to nfsfh.c Jeff Layton
@ 2022-10-21 13:06 ` Jeff Layton
  2022-10-21 13:06 ` [PATCH v8 8/8] nfsd: remove fetch_iversion export operation Jeff Layton
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:06 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs

Now that we can call into vfs_getattr to get the i_version field, use
that facility to fetch it instead of doing it in nfsd4_change_attribute.

Neil also pointed out recently that IS_I_VERSION directory operations
are always logged, and so we only need to mitigate the rollback problem
on regular files. Also, we don't need to factor in the ctime when
reexporting NFS or Ceph.

Set the STATX_CHANGE_COOKIE (and BTIME) bits in the request when we're
dealing with a v4 request. Then, instead of looking at IS_I_VERSION when
generating the change attr, look at the result mask and only use it if
STATX_CHANGE_COOKIE is set.

Change nfsd4_change_attribute to only factor in the ctime if it's a
regular file and the fs doesn't advertise STATX_ATTR_CHANGE_MONOTONIC.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfsd/nfs4xdr.c |  4 +++-
 fs/nfsd/nfsfh.c   | 54 +++++++++++++++++++++++++++++++----------------
 fs/nfsd/vfs.h     |  7 +++++-
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index bcfeb1a922c0..06eb1aa7846b 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2906,7 +2906,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 			goto out;
 	}
 
-	err = vfs_getattr(&path, &stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
+	err = vfs_getattr(&path, &stat,
+			  STATX_BASIC_STATS | STATX_BTIME | STATX_CHANGE_COOKIE,
+			  AT_STATX_SYNC_AS_STAT);
 	if (err)
 		goto out_nfserr;
 	if (!(stat.result_mask & STATX_BTIME))
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 7030d9209903..3e09129db340 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -628,6 +628,10 @@ void fh_fill_pre_attrs(struct svc_fh *fhp)
 		stat.mtime = inode->i_mtime;
 		stat.ctime = inode->i_ctime;
 		stat.size  = inode->i_size;
+		if (v4 && IS_I_VERSION(inode)) {
+			stat.change_cookie = inode_query_iversion(inode);
+			stat.result_mask |= STATX_CHANGE_COOKIE;
+		}
 	}
 	if (v4)
 		fhp->fh_pre_change = nfsd4_change_attribute(&stat, inode);
@@ -659,6 +663,10 @@ void fh_fill_post_attrs(struct svc_fh *fhp)
 	if (err) {
 		fhp->fh_post_saved = false;
 		fhp->fh_post_attr.ctime = inode->i_ctime;
+		if (v4 && IS_I_VERSION(inode)) {
+			fhp->fh_post_attr.change_cookie = inode_query_iversion(inode);
+			fhp->fh_post_attr.result_mask |= STATX_CHANGE_COOKIE;
+		}
 	} else
 		fhp->fh_post_saved = true;
 	if (v4)
@@ -750,28 +758,38 @@ enum fsid_source fsid_source(const struct svc_fh *fhp)
 }
 
 /*
- * We could use i_version alone as the change attribute.  However,
- * i_version can go backwards after a reboot.  On its own that doesn't
- * necessarily cause a problem, but if i_version goes backwards and then
- * is incremented again it could reuse a value that was previously used
- * before boot, and a client who queried the two values might
- * incorrectly assume nothing changed.
+ * We could use i_version alone as the change attribute.  However, i_version
+ * can go backwards on a regular file after an unclean shutdown.  On its own
+ * that doesn't necessarily cause a problem, but if i_version goes backwards
+ * and then is incremented again it could reuse a value that was previously
+ * used before boot, and a client who queried the two values might incorrectly
+ * assume nothing changed.
+ *
+ * By using both ctime and the i_version counter we guarantee that as long as
+ * time doesn't go backwards we never reuse an old value. If the filesystem
+ * advertises STATX_ATTR_CHANGE_MONOTONIC, then this mitigation is not
+ * needed.
  *
- * By using both ctime and the i_version counter we guarantee that as
- * long as time doesn't go backwards we never reuse an old value.
+ * We only need to do this for regular files as well. For directories, we
+ * assume that the new change attr is always logged to stable storage in some
+ * fashion before the results can be seen.
  */
 u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode)
 {
+	u64 chattr;
+
 	if (inode->i_sb->s_export_op->fetch_iversion)
 		return inode->i_sb->s_export_op->fetch_iversion(inode);
-	else if (IS_I_VERSION(inode)) {
-		u64 chattr;
-
-		chattr =  stat->ctime.tv_sec;
-		chattr <<= 30;
-		chattr += stat->ctime.tv_nsec;
-		chattr += inode_query_iversion(inode);
-		return chattr;
-	} else
-		return time_to_chattr(&stat->ctime);
+	if (stat->result_mask & STATX_CHANGE_COOKIE) {
+		chattr = stat->change_cookie;
+
+		if (S_ISREG(inode->i_mode) &&
+		    !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
+			chattr += (u64)stat->ctime.tv_sec << 30;
+			chattr += stat->ctime.tv_nsec;
+		}
+	} else {
+		chattr = time_to_chattr(&stat->ctime);
+	}
+	return chattr;
 }
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 120521bc7b24..1b205a27f961 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -168,9 +168,14 @@ static inline void fh_drop_write(struct svc_fh *fh)
 
 static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat)
 {
+	u32 request_mask = STATX_BASIC_STATS;
 	struct path p = {.mnt = fh->fh_export->ex_path.mnt,
 			 .dentry = fh->fh_dentry};
-	return nfserrno(vfs_getattr(&p, stat, STATX_BASIC_STATS,
+
+	if (fh->fh_maxsize == NFS4_FHSIZE)
+		request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE);
+
+	return nfserrno(vfs_getattr(&p, stat, request_mask,
 				    AT_STATX_SYNC_AS_STAT));
 }
 
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v8 8/8] nfsd: remove fetch_iversion export operation
  2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
                   ` (6 preceding siblings ...)
  2022-10-21 13:06 ` [PATCH v8 7/8] nfsd: use the getattr operation to fetch i_version Jeff Layton
@ 2022-10-21 13:06 ` Jeff Layton
  7 siblings, 0 replies; 9+ messages in thread
From: Jeff Layton @ 2022-10-21 13:06 UTC (permalink / raw)
  To: tytso, adilger.kernel, djwong, david, trondmy, neilb, viro, zohar,
	xiubli, chuck.lever, lczerner, jack, bfields, brauner, fweimer
  Cc: linux-btrfs, linux-fsdevel, linux-kernel, ceph-devel, linux-ext4,
	linux-nfs, linux-xfs

Now that the i_version counter is reported in struct kstat, there is no
need for this export operation.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/nfs/export.c          | 7 -------
 fs/nfsd/nfsfh.c          | 3 ---
 include/linux/exportfs.h | 1 -
 3 files changed, 11 deletions(-)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 01596f2d0a1e..1a9d5aa51dfb 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -145,17 +145,10 @@ nfs_get_parent(struct dentry *dentry)
 	return parent;
 }
 
-static u64 nfs_fetch_iversion(struct inode *inode)
-{
-	nfs_revalidate_inode(inode, NFS_INO_INVALID_CHANGE);
-	return inode_peek_iversion_raw(inode);
-}
-
 const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
 	.fh_to_dentry = nfs_fh_to_dentry,
 	.get_parent = nfs_get_parent,
-	.fetch_iversion = nfs_fetch_iversion,
 	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|
 		EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS|
 		EXPORT_OP_NOATOMIC_ATTR,
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 3e09129db340..bd450b141ca4 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -778,11 +778,8 @@ u64 nfsd4_change_attribute(struct kstat *stat, struct inode *inode)
 {
 	u64 chattr;
 
-	if (inode->i_sb->s_export_op->fetch_iversion)
-		return inode->i_sb->s_export_op->fetch_iversion(inode);
 	if (stat->result_mask & STATX_CHANGE_COOKIE) {
 		chattr = stat->change_cookie;
-
 		if (S_ISREG(inode->i_mode) &&
 		    !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
 			chattr += (u64)stat->ctime.tv_sec << 30;
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index fe848901fcc3..9f4d4bcbf251 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -213,7 +213,6 @@ struct export_operations {
 			  bool write, u32 *device_generation);
 	int (*commit_blocks)(struct inode *inode, struct iomap *iomaps,
 			     int nr_iomaps, struct iattr *iattr);
-	u64 (*fetch_iversion)(struct inode *);
 #define	EXPORT_OP_NOWCC			(0x1) /* don't collect v3 wcc data */
 #define	EXPORT_OP_NOSUBTREECHK		(0x2) /* no subtree checking */
 #define	EXPORT_OP_CLOSE_BEFORE_UNLINK	(0x4) /* close files before unlink */
-- 
2.37.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-10-21 13:07 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-21 13:05 [PATCH v8 0/8] fs: clean up internal i_version handling Jeff Layton
2022-10-21 13:05 ` [PATCH v8 1/8] fs: uninline inode_query_iversion Jeff Layton
2022-10-21 13:05 ` [PATCH v8 2/8] fs: clarify when the i_version counter must be updated Jeff Layton
2022-10-21 13:05 ` [PATCH v8 3/8] vfs: plumb i_version handling into struct kstat Jeff Layton
2022-10-21 13:05 ` [PATCH v8 4/8] nfs: report the inode version in getattr if requested Jeff Layton
2022-10-21 13:05 ` [PATCH v8 5/8] ceph: " Jeff Layton
2022-10-21 13:06 ` [PATCH v8 6/8] nfsd: move nfsd4_change_attribute to nfsfh.c Jeff Layton
2022-10-21 13:06 ` [PATCH v8 7/8] nfsd: use the getattr operation to fetch i_version Jeff Layton
2022-10-21 13:06 ` [PATCH v8 8/8] nfsd: remove fetch_iversion export operation Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).