linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] FUSE consistency improvements
@ 2023-07-11  4:34 Jiachen Zhang
  2023-07-11  4:34 ` [PATCH 1/5] fuse: check attributes staleness on fuse_iget() Jiachen Zhang
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Jiachen Zhang @ 2023-07-11  4:34 UTC (permalink / raw)
  To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel, linux-doc,
	linux-kernel
  Cc: me, Jiachen Zhang

This patchset resends some patches that related to FUSE consistency
improvements in the mailing list.

The 1st patch fixes a staleness-checking issue, which is the v2 version
of the patch[1].

The 2nd patch is a resend version of the patch[2] with its commit message
rewritten.

The 3rd and 4th patches are new versions of the patch[3] and the patch[4],
FUSE filesystems are able to implement the close-to-open (CTO) consistency
semantics with the help of these two patches. The 5th patch is a new
patch which improves the explanation of FUSE cache mode and consistency
models in the documentation.


[1] [PATCH] fuse: initialize attr_version of new fuse inodes by fc->attr_version,
https://lore.kernel.org/lkml/20221111093702.80975-1-zhangjiachen.jaycee@bytedance.com/
[2] [PATCH] fuse: invalidate dentry on EEXIST creates or ENOENT deletes,
https://lore.kernel.org/lkml/20220805131823.83544-1-zhangjiachen.jaycee@bytedance.com/
[3] [PATCH] fuse: add FOPEN_INVAL_ATTR,
https://lore.kernel.org/lkml/20220608104202.19461-1-zhangjiachen.jaycee@bytedance.com/
[4] [PATCH] fuse: writeback_cache consistency enhancement (writeback_cache_v2),
https://lore.kernel.org/lkml/20220624055825.29183-1-zhangjiachen.jaycee@bytedance.com/


Jiachen Zhang (5):
  fuse: check attributes staleness on fuse_iget()
  fuse: invalidate dentry on EEXIST creates or ENOENT deletes
  fuse: add FOPEN_INVAL_ATTR
  fuse: writeback_cache consistency enhancement (writeback_cache_v2)
  docs: fuse: improve FUSE consistency explanation

 Documentation/filesystems/fuse-io.rst | 32 +++++++++++++++--
 fs/fuse/dir.c                         | 11 +++---
 fs/fuse/file.c                        | 35 +++++++++++++++++++
 fs/fuse/fuse_i.h                      |  6 ++++
 fs/fuse/inode.c                       | 49 +++++++++++++++++++++++++--
 include/uapi/linux/fuse.h             | 11 +++++-
 6 files changed, 135 insertions(+), 9 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] fuse: check attributes staleness on fuse_iget()
  2023-07-11  4:34 [PATCH 0/5] FUSE consistency improvements Jiachen Zhang
@ 2023-07-11  4:34 ` Jiachen Zhang
  2023-08-23  8:57   ` Miklos Szeredi
  2023-07-11  4:34 ` [PATCH 2/5] fuse: invalidate dentry on EEXIST creates or ENOENT deletes Jiachen Zhang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jiachen Zhang @ 2023-07-11  4:34 UTC (permalink / raw)
  To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel, linux-doc,
	linux-kernel
  Cc: me, Jiachen Zhang

Function fuse_direntplus_link() might call fuse_iget() to initialize a new
fuse_inode and change its attributes. If fi->attr_version is always
initialized with 0, even if the attributes returned by the FUSE_READDIR
request is staled, as the new fi->attr_version is 0, fuse_change_attributes
will still set the staled attributes to inode. This wrong behaviour may
cause file size inconsistency even when there is no changes from
server-side.

To reproduce the issue, consider the following 2 programs (A and B) are
running concurrently,

        A                                               B
----------------------------------      --------------------------------
{ /fusemnt/dir/f is a file path in a fuse mount, the size of f is 0. }

readdir(/fusemnt/dir) start
//Daemon set size 0 to f direntry
                                        fallocate(f, 1024)
                                        stat(f) // B see size 1024
                                        echo 2 > /proc/sys/vm/drop_caches
readdir(/fusemnt/dir) reply to kernel
Kernel set 0 to the I_NEW inode

                                        stat(f) // B see size 0

In the above case, only program B is modifying the file size, however, B
observes file size changing between the 2 'readonly' stat() calls. To fix
this issue, we should make sure readdirplus still follows the rule of
attr_version staleness checking even if the fi->attr_version is lost due to
inode eviction. So this patch increases fc->attr_version on inode eviction,
and compares request attr_version and the fc->attr_version when a
FUSE_READDIRPLUS request is finished.

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
---
 fs/fuse/inode.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 660be31aaabc..3e0b1fb1db17 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -115,6 +115,7 @@ static void fuse_free_inode(struct inode *inode)
 
 static void fuse_evict_inode(struct inode *inode)
 {
+	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
 
 	/* Will write inode on close/munmap and in all other dirtiers */
@@ -137,6 +138,8 @@ static void fuse_evict_inode(struct inode *inode)
 		WARN_ON(!list_empty(&fi->write_files));
 		WARN_ON(!list_empty(&fi->queued_writes));
 	}
+
+	atomic64_inc(&fc->attr_version);
 }
 
 static int fuse_reconfigure(struct fs_context *fsc)
@@ -409,6 +412,10 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 	fi->nlookup++;
 	spin_unlock(&fi->lock);
 	fuse_change_attributes(inode, attr, attr_valid, attr_version);
+	spin_lock(&fi->lock);
+	if (attr_version < atomic64_read(&fc->attr_version))
+		fuse_invalidate_attr(inode);
+	spin_unlock(&fi->lock);
 
 	return inode;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] fuse: invalidate dentry on EEXIST creates or ENOENT deletes
  2023-07-11  4:34 [PATCH 0/5] FUSE consistency improvements Jiachen Zhang
  2023-07-11  4:34 ` [PATCH 1/5] fuse: check attributes staleness on fuse_iget() Jiachen Zhang
@ 2023-07-11  4:34 ` Jiachen Zhang
  2023-08-16 12:27   ` Miklos Szeredi
  2023-07-11  4:34 ` [PATCH 3/5] fuse: add FOPEN_INVAL_ATTR Jiachen Zhang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jiachen Zhang @ 2023-07-11  4:34 UTC (permalink / raw)
  To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel, linux-doc,
	linux-kernel
  Cc: me, Jiachen Zhang

The EEXIST errors returned from server are strong sign that a local
negative dentry should be invalidated. Similarly, The ENOENT errors from
server can also be a sign of revalidate failure. This commit invalidates
dentries on EEXIST creates and ENOENT deletes by calling
fuse_invalidate_entry(), which improves the consistency with no
performance degradation.

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
---
 fs/fuse/dir.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 5a4a7155cf1c..cfe38ee91ffd 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -755,7 +755,8 @@ static int fuse_atomic_open(struct inode *dir, struct dentry *entry,
 	if (err == -ENOSYS) {
 		fc->no_create = 1;
 		goto mknod;
-	}
+	} else if (err == -EEXIST)
+		fuse_invalidate_entry(entry);
 out_dput:
 	dput(res);
 	return err;
@@ -835,6 +836,8 @@ static int create_new_entry(struct fuse_mount *fm, struct fuse_args *args,
 	return 0;
 
  out_put_forget_req:
+	if (err == -EEXIST)
+		fuse_invalidate_entry(entry);
 	kfree(forget);
 	return err;
 }
@@ -986,7 +989,7 @@ static int fuse_unlink(struct inode *dir, struct dentry *entry)
 	if (!err) {
 		fuse_dir_changed(dir);
 		fuse_entry_unlinked(entry);
-	} else if (err == -EINTR)
+	} else if (err == -EINTR || err == -ENOENT)
 		fuse_invalidate_entry(entry);
 	return err;
 }
@@ -1009,7 +1012,7 @@ static int fuse_rmdir(struct inode *dir, struct dentry *entry)
 	if (!err) {
 		fuse_dir_changed(dir);
 		fuse_entry_unlinked(entry);
-	} else if (err == -EINTR)
+	} else if (err == -EINTR || err == -ENOENT)
 		fuse_invalidate_entry(entry);
 	return err;
 }
@@ -1050,7 +1053,7 @@ static int fuse_rename_common(struct inode *olddir, struct dentry *oldent,
 		/* newent will end up negative */
 		if (!(flags & RENAME_EXCHANGE) && d_really_is_positive(newent))
 			fuse_entry_unlinked(newent);
-	} else if (err == -EINTR) {
+	} else if (err == -EINTR || err == -ENOENT) {
 		/* If request was interrupted, DEITY only knows if the
 		   rename actually took place.  If the invalidation
 		   fails (e.g. some process has CWD under the renamed
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] fuse: add FOPEN_INVAL_ATTR
  2023-07-11  4:34 [PATCH 0/5] FUSE consistency improvements Jiachen Zhang
  2023-07-11  4:34 ` [PATCH 1/5] fuse: check attributes staleness on fuse_iget() Jiachen Zhang
  2023-07-11  4:34 ` [PATCH 2/5] fuse: invalidate dentry on EEXIST creates or ENOENT deletes Jiachen Zhang
@ 2023-07-11  4:34 ` Jiachen Zhang
  2023-08-23  9:01   ` Miklos Szeredi
  2023-07-11  4:34 ` [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2) Jiachen Zhang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Jiachen Zhang @ 2023-07-11  4:34 UTC (permalink / raw)
  To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel, linux-doc,
	linux-kernel
  Cc: me, Jiachen Zhang

Add FOPEN_INVAL_ATTR so that the fuse daemon can ask kernel to invalidate
the attr cache on file open.

The fi->attr_version should be increased when handling FOPEN_INVAL_ATTR.
Because if a FUSE request returning attributes (getattr, setattr, lookup,
and readdirplus) starts before a FUSE_OPEN replying FOPEN_INVAL_ATTR, but
finishes after the FUSE_OPEN, staled attributes will be set to the inode
and falsely clears the inval_mask.

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
---
 fs/fuse/file.c            | 10 ++++++++++
 include/uapi/linux/fuse.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index de37a3a06a71..412824a11b7b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -215,6 +215,16 @@ void fuse_finish_open(struct inode *inode, struct file *file)
 		file_update_time(file);
 		fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
 	}
+
+	if (ff->open_flags & FOPEN_INVAL_ATTR) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		spin_lock(&fi->lock);
+		fi->attr_version = atomic64_inc_return(&fc->attr_version);
+		fuse_invalidate_attr(inode);
+		spin_unlock(&fi->lock);
+	}
+
 	if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
 		fuse_link_write_file(file);
 }
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index b3fcab13fcd3..1a24c11637a4 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -315,6 +315,7 @@ struct fuse_file_lock {
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_INVAL_ATTR: invalidate the attr cache on open
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -323,6 +324,7 @@ struct fuse_file_lock {
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
 #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
+#define FOPEN_INVAL_ATTR	(1 << 7)
 
 /**
  * INIT request/reply flags
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2)
  2023-07-11  4:34 [PATCH 0/5] FUSE consistency improvements Jiachen Zhang
                   ` (2 preceding siblings ...)
  2023-07-11  4:34 ` [PATCH 3/5] fuse: add FOPEN_INVAL_ATTR Jiachen Zhang
@ 2023-07-11  4:34 ` Jiachen Zhang
  2023-08-23  9:07   ` Miklos Szeredi
  2023-07-11  4:34 ` [PATCH 5/5] docs: fuse: improve FUSE consistency explanation Jiachen Zhang
  2023-07-14  5:50 ` [PATCH 0/5] FUSE consistency improvements Jingbo Xu
  5 siblings, 1 reply; 17+ messages in thread
From: Jiachen Zhang @ 2023-07-11  4:34 UTC (permalink / raw)
  To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel, linux-doc,
	linux-kernel
  Cc: me, Jiachen Zhang

Some users may want both the high performance of the writeback_cahe mode
and a little bit more consistency among FUSE mounts. Current
writeback_cache mode never updates attributes from server, so can never
see the file attributes changed by other FUSE mounts, which means
'zero-consisteny'.

This commit introduces writeback_cache_v2 mode, which allows the attributes
to be updated from server to kernel when the inode is clean and no
writeback is in-progressing. FUSE daemons can select this mode by the
FUSE_WRITEBACK_CACHE_V2 init flag.

In writeback_cache_v2 mode, the server generates official attributes.
Therefore,

    1. For the cmtime, the cmtime generated by kernel are just temporary
    values that are never flushed to server by fuse_write_inode(), and they
    could be eventually updated by the official server cmtime. The
    mtime-based revalidation of the fc->auto_inval_data mode is also
    skipped, as the kernel-generated temporary cmtime are likely not equal
    to the offical server cmtime.

    2. For the file size, we expect server updates its file size on
    FUSE_WRITEs. So we increase fi->attr_version in fuse_writepage_end() to
    check the staleness of the returning file size.

Together with FOPEN_INVAL_ATTR, a FUSE daemon is able to implement
close-to-open (CTO) consistency like NFS client implementations.

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
---
 fs/fuse/file.c            | 25 +++++++++++++++++++++++
 fs/fuse/fuse_i.h          |  6 ++++++
 fs/fuse/inode.c           | 42 +++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/fuse.h |  9 ++++++++-
 4 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 412824a11b7b..09416caea575 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1914,6 +1914,10 @@ static void fuse_writepage_end(struct fuse_mount *fm, struct fuse_args *args,
 		 */
 		fuse_send_writepage(fm, next, inarg->offset + inarg->size);
 	}
+
+	if (fc->writeback_cache_v2)
+		fi->attr_version = atomic64_inc_return(&fc->attr_version);
+
 	fi->writectr--;
 	fuse_writepage_finish(fm, wpa);
 	spin_unlock(&fi->lock);
@@ -1943,10 +1947,18 @@ static struct fuse_file *fuse_write_file_get(struct fuse_inode *fi)
 
 int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 {
+	struct fuse_conn *fc = get_fuse_conn(inode);
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_file *ff;
 	int err;
 
+	/*
+	 * Kernel c/mtime should not be updated to the server in the
+	 * writeback_cache_v2 mode as server c/mtime are official.
+	 */
+	if (fc->writeback_cache_v2)
+		return 0;
+
 	/*
 	 * Inode is always written before the last reference is dropped and
 	 * hence this should not be reached from reclaim.
@@ -2375,11 +2387,14 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
 {
 	pgoff_t index = pos >> PAGE_SHIFT;
 	struct fuse_conn *fc = get_fuse_conn(file_inode(file));
+	struct fuse_inode *fi = get_fuse_inode(file_inode(file));
 	struct page *page;
 	loff_t fsize;
 	int err = -ENOMEM;
 
 	WARN_ON(!fc->writeback_cache);
+	if (fc->writeback_cache_v2)
+		mutex_lock(&fi->wb_attr_mutex);
 
 	page = grab_cache_page_write_begin(mapping, index);
 	if (!page)
@@ -2411,6 +2426,9 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,
 	unlock_page(page);
 	put_page(page);
 error:
+	if (fc->writeback_cache_v2)
+		mutex_unlock(&fi->wb_attr_mutex);
+
 	return err;
 }
 
@@ -2419,6 +2437,7 @@ static int fuse_write_end(struct file *file, struct address_space *mapping,
 		struct page *page, void *fsdata)
 {
 	struct inode *inode = page->mapping->host;
+	struct fuse_conn *fc = get_fuse_conn(inode);
 
 	/* Haven't copied anything?  Skip zeroing, size extending, dirtying. */
 	if (!copied)
@@ -2442,6 +2461,12 @@ static int fuse_write_end(struct file *file, struct address_space *mapping,
 	unlock_page(page);
 	put_page(page);
 
+	if (fc->writeback_cache_v2) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		mutex_unlock(&fi->wb_attr_mutex);
+	}
+
 	return copied;
 }
 
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9b7fc7d3c7f1..200be199eb93 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -155,6 +155,9 @@ struct fuse_inode {
 	 */
 	struct fuse_inode_dax *dax;
 #endif
+
+	/** Lock for serializing size updates in writeback_cache_v2 mode */
+	struct mutex wb_attr_mutex;
 };
 
 /** FUSE inode state bits */
@@ -656,6 +659,9 @@ struct fuse_conn {
 	/* show legacy mount options */
 	unsigned int legacy_opts_show:1;
 
+	/* Improved writeback cache policy */
+	unsigned writeback_cache_v2:1;
+
 	/*
 	 * fs kills suid/sgid/cap on write/chown/trunc. suid is killed on
 	 * write/trunc only if caller did not have CAP_FSETID.  sgid is killed
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 3e0b1fb1db17..958f8534a585 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -84,6 +84,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 	fi->orig_ino = 0;
 	fi->state = 0;
 	mutex_init(&fi->mutex);
+	mutex_init(&fi->wb_attr_mutex);
 	spin_lock_init(&fi->lock);
 	fi->forget = fuse_alloc_forget();
 	if (!fi->forget)
@@ -246,14 +247,36 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 	u32 cache_mask;
 	loff_t oldsize;
 	struct timespec64 old_mtime;
+	bool try_wb_update = false;
+
+	if (fc->writeback_cache_v2 && S_ISREG(inode->i_mode)) {
+		mutex_lock(&fi->wb_attr_mutex);
+		try_wb_update = true;
+	}
 
 	spin_lock(&fi->lock);
 	/*
 	 * In case of writeback_cache enabled, writes update mtime, ctime and
 	 * may update i_size.  In these cases trust the cached value in the
 	 * inode.
+	 *
+	 * In writeback_cache_v2 mode, if all the following conditions are met,
+	 * then we allow the attributes to be refreshed:
+	 *
+	 * - inode is not in the process of being written (I_SYNC)
+	 * - inode has no dirty pages (I_DIRTY_PAGES)
+	 * - inode data-related attributes are clean (I_DIRTY_DATASYNC)
+	 * - inode does not have any page writeback in progress
+	 *
+	 * Note: checking PAGECACHE_TAG_WRITEBACK is not sufficient in fuse,
+	 * since inode can appear to have no PageWriteback pages, yet still have
+	 * outstanding write request.
 	 */
 	cache_mask = fuse_get_cache_mask(inode);
+	if (try_wb_update && !(inode->i_state & (I_DIRTY_PAGES | I_SYNC |
+	    I_DIRTY_DATASYNC)) && RB_EMPTY_ROOT(&fi->writepages))
+		cache_mask &= ~(STATX_MTIME | STATX_CTIME | STATX_SIZE);
+
 	if (cache_mask & STATX_SIZE)
 		attr->size = i_size_read(inode);
 
@@ -269,6 +292,8 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 	if ((attr_version != 0 && fi->attr_version > attr_version) ||
 	    test_bit(FUSE_I_SIZE_UNSTABLE, &fi->state)) {
 		spin_unlock(&fi->lock);
+		if (try_wb_update)
+			mutex_unlock(&fi->wb_attr_mutex);
 		return;
 	}
 
@@ -292,7 +317,13 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 			truncate_pagecache(inode, attr->size);
 			if (!fc->explicit_inval_data)
 				inval = true;
-		} else if (fc->auto_inval_data) {
+		} else if (!fc->writeback_cache_v2 && fc->auto_inval_data) {
+			/*
+			 * When fc->writeback_cache_v2 is set, the old_mtime
+			 * can be generated by kernel and must not equal to
+			 * new_mtime generated by server. So skip in such
+			 * case.
+			 */
 			struct timespec64 new_mtime = {
 				.tv_sec = attr->mtime,
 				.tv_nsec = attr->mtimensec,
@@ -312,6 +343,9 @@ void fuse_change_attributes(struct inode *inode, struct fuse_attr *attr,
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
 		fuse_dax_dontcache(inode, attr->flags);
+
+	if (try_wb_update)
+		mutex_unlock(&fi->wb_attr_mutex);
 }
 
 static void fuse_init_inode(struct inode *inode, struct fuse_attr *attr,
@@ -1179,6 +1213,10 @@ static void process_init_reply(struct fuse_mount *fm, struct fuse_args *args,
 				fc->async_dio = 1;
 			if (flags & FUSE_WRITEBACK_CACHE)
 				fc->writeback_cache = 1;
+			if (flags & FUSE_WRITEBACK_CACHE_V2) {
+				fc->writeback_cache = 1;
+				fc->writeback_cache_v2 = 1;
+			}
 			if (flags & FUSE_PARALLEL_DIROPS)
 				fc->parallel_dirops = 1;
 			if (flags & FUSE_HANDLE_KILLPRIV)
@@ -1262,7 +1300,7 @@ void fuse_send_init(struct fuse_mount *fm)
 		FUSE_NO_OPENDIR_SUPPORT | FUSE_EXPLICIT_INVAL_DATA |
 		FUSE_HANDLE_KILLPRIV_V2 | FUSE_SETXATTR_EXT | FUSE_INIT_EXT |
 		FUSE_SECURITY_CTX | FUSE_CREATE_SUPP_GROUP |
-		FUSE_HAS_EXPIRE_ONLY;
+		FUSE_HAS_EXPIRE_ONLY | FUSE_WRITEBACK_CACHE_V2;
 #ifdef CONFIG_FUSE_DAX
 	if (fm->fc->dax)
 		flags |= FUSE_MAP_ALIGNMENT;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 1a24c11637a4..850a3c0f87fb 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -207,6 +207,9 @@
  *  - add FUSE_EXT_GROUPS
  *  - add FUSE_CREATE_SUPP_GROUP
  *  - add FUSE_HAS_EXPIRE_ONLY
+ *
+ *  7.39
+ *  - add FUSE_WRITEBACK_CACHE_V2 init flag
  */
 
 #ifndef _LINUX_FUSE_H
@@ -242,7 +245,7 @@
 #define FUSE_KERNEL_VERSION 7
 
 /** Minor version number of this interface */
-#define FUSE_KERNEL_MINOR_VERSION 38
+#define FUSE_KERNEL_MINOR_VERSION 39
 
 /** The node ID of the root inode */
 #define FUSE_ROOT_ID 1
@@ -373,6 +376,9 @@ struct fuse_file_lock {
  * FUSE_CREATE_SUPP_GROUP: add supplementary group info to create, mkdir,
  *			symlink and mknod (single group that matches parent)
  * FUSE_HAS_EXPIRE_ONLY: kernel supports expiry-only entry invalidation
+ * FUSE_WRITEBACK_CACHE_V2:
+ *			allow time/size to be refreshed if no pending write
+ *			c/mtime not updated from kernel to server
  */
 #define FUSE_ASYNC_READ		(1 << 0)
 #define FUSE_POSIX_LOCKS	(1 << 1)
@@ -411,6 +417,7 @@ struct fuse_file_lock {
 #define FUSE_HAS_INODE_DAX	(1ULL << 33)
 #define FUSE_CREATE_SUPP_GROUP	(1ULL << 34)
 #define FUSE_HAS_EXPIRE_ONLY	(1ULL << 35)
+#define FUSE_WRITEBACK_CACHE_V2	(1ULL << 36)
 
 /**
  * CUSE INIT request/reply flags
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] docs: fuse: improve FUSE consistency explanation
  2023-07-11  4:34 [PATCH 0/5] FUSE consistency improvements Jiachen Zhang
                   ` (3 preceding siblings ...)
  2023-07-11  4:34 ` [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2) Jiachen Zhang
@ 2023-07-11  4:34 ` Jiachen Zhang
  2023-07-11  4:42   ` Randy Dunlap
  2023-07-14  5:50 ` [PATCH 0/5] FUSE consistency improvements Jingbo Xu
  5 siblings, 1 reply; 17+ messages in thread
From: Jiachen Zhang @ 2023-07-11  4:34 UTC (permalink / raw)
  To: Miklos Szeredi, Jonathan Corbet, linux-fsdevel, linux-doc,
	linux-kernel
  Cc: me, Jiachen Zhang

Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
---
 Documentation/filesystems/fuse-io.rst | 32 +++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/fuse-io.rst b/Documentation/filesystems/fuse-io.rst
index 255a368fe534..cdd292dd2e9c 100644
--- a/Documentation/filesystems/fuse-io.rst
+++ b/Documentation/filesystems/fuse-io.rst
@@ -10,6 +10,10 @@ Fuse supports the following I/O modes:
 - cached
   + write-through
   + writeback-cache
+  + writeback-cache-v2
+
+Direct-io Mode
+==============
 
 The direct-io mode can be selected with the FOPEN_DIRECT_IO flag in the
 FUSE_OPEN reply.
@@ -17,6 +21,9 @@ FUSE_OPEN reply.
 In direct-io mode the page cache is completely bypassed for reads and writes.
 No read-ahead takes place. Shared mmap is disabled.
 
+Cached Modes and Cache Coherence
+================================
+
 In cached mode reads may be satisfied from the page cache, and data may be
 read-ahead by the kernel to fill the cache.  The cache is always kept consistent
 after any writes to the file.  All mmap modes are supported.
@@ -24,7 +31,8 @@ after any writes to the file.  All mmap modes are supported.
 The cached mode has two sub modes controlling how writes are handled.  The
 write-through mode is the default and is supported on all kernels.  The
 writeback-cache mode may be selected by the FUSE_WRITEBACK_CACHE flag in the
-FUSE_INIT reply.
+FUSE_INIT reply. In either modes, if the FOPEN_KEEP_CACHE flag is not set in
+the FUSE_OPEN, cached pages of the file will be invalidated immediatedly.
 
 In write-through mode each write is immediately sent to userspace as one or more
 WRITE requests, as well as updating any cached pages (and caching previously
@@ -38,7 +46,27 @@ reclaim on memory pressure) or explicitly (invoked by close(2), fsync(2) and
 when the last ref to the file is being released on munmap(2)).  This mode
 assumes that all changes to the filesystem go through the FUSE kernel module
 (size and atime/ctime/mtime attributes are kept up-to-date by the kernel), so
-it's generally not suitable for network filesystems.  If a partial page is
+it's generally not suitable for network filesystems (you can consider the
+writeback-cache-v2 mode mentioned latter for them).  If a partial page is
 written, then the page needs to be first read from userspace.  This means, that
 even for files opened for O_WRONLY it is possible that READ requests will be
 generated by the kernel.
+
+Writeback-cache-v2 mode (enabled by the FUSE_WRITEBACK_CACHE_V2 flag) retains
+the dirty page management logic of the writeback-cache mode, which provides
+great write performance.  Furthermore, the v2 mode improves cache coherence for
+multiple FUSE mounts scenarios, especially for network filesystems. The kernel
+a/c/mtime and size attributes are allowed to be updated from the filesystem
+either on timeout or when they have been explicitly invalidated. Meanwhile, if
+ever updated by kernel locally, the attributes will not be propagated to the
+filesystem. In other words, the filesystem rather than kernel is considered the
+official source for generating these attributes.
+
+By combining the writeback-cache-v2 mode with the appropriate open flags
+(FOPEN_KEEP_CACHE and FOPEN_INVAL_ATTR for keeping page cache and invalidating
+attributes on FUSE_OPEN respectively), filesystems are able to implement the
+close-to-open (CTO) consistency semantics, which is widely supported by NFS
+client implementations. This allows for maintaining the writeback manner of
+dirty pages while ensuring cache coherence of attributes and file data if the
+operations among different FUSE mounts on a file are properly serialized by
+users using the open-after-close manner.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] docs: fuse: improve FUSE consistency explanation
  2023-07-11  4:34 ` [PATCH 5/5] docs: fuse: improve FUSE consistency explanation Jiachen Zhang
@ 2023-07-11  4:42   ` Randy Dunlap
  2023-07-11  7:15     ` Jiachen Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Randy Dunlap @ 2023-07-11  4:42 UTC (permalink / raw)
  To: Jiachen Zhang, Miklos Szeredi, Jonathan Corbet, linux-fsdevel,
	linux-doc, linux-kernel
  Cc: me

Hi--

On 7/10/23 21:34, Jiachen Zhang wrote:
> Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
> ---
>  Documentation/filesystems/fuse-io.rst | 32 +++++++++++++++++++++++++--
>  1 file changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/fuse-io.rst b/Documentation/filesystems/fuse-io.rst
> index 255a368fe534..cdd292dd2e9c 100644
> --- a/Documentation/filesystems/fuse-io.rst
> +++ b/Documentation/filesystems/fuse-io.rst

> @@ -24,7 +31,8 @@ after any writes to the file.  All mmap modes are supported.
>  The cached mode has two sub modes controlling how writes are handled.  The
>  write-through mode is the default and is supported on all kernels.  The
>  writeback-cache mode may be selected by the FUSE_WRITEBACK_CACHE flag in the
> -FUSE_INIT reply.
> +FUSE_INIT reply. In either modes, if the FOPEN_KEEP_CACHE flag is not set in

                       either mode,

> +the FUSE_OPEN, cached pages of the file will be invalidated immediatedly.

                                                               immediately.

>  
>  In write-through mode each write is immediately sent to userspace as one or more
>  WRITE requests, as well as updating any cached pages (and caching previously
> @@ -38,7 +46,27 @@ reclaim on memory pressure) or explicitly (invoked by close(2), fsync(2) and
>  when the last ref to the file is being released on munmap(2)).  This mode
>  assumes that all changes to the filesystem go through the FUSE kernel module
>  (size and atime/ctime/mtime attributes are kept up-to-date by the kernel), so
> -it's generally not suitable for network filesystems.  If a partial page is
> +it's generally not suitable for network filesystems (you can consider the
> +writeback-cache-v2 mode mentioned latter for them).  If a partial page is

                                     later

>  written, then the page needs to be first read from userspace.  This means, that
>  even for files opened for O_WRONLY it is possible that READ requests will be
>  generated by the kernel.


-- 
~Randy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] docs: fuse: improve FUSE consistency explanation
  2023-07-11  4:42   ` Randy Dunlap
@ 2023-07-11  7:15     ` Jiachen Zhang
  0 siblings, 0 replies; 17+ messages in thread
From: Jiachen Zhang @ 2023-07-11  7:15 UTC (permalink / raw)
  To: Randy Dunlap, Miklos Szeredi, Jonathan Corbet, linux-fsdevel,
	linux-doc, linux-kernel
  Cc: me


On 2023/7/11 12:42, Randy Dunlap wrote:
> Hi--
> 
> On 7/10/23 21:34, Jiachen Zhang wrote:
>> Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
>> ---
>>   Documentation/filesystems/fuse-io.rst | 32 +++++++++++++++++++++++++--
>>   1 file changed, 30 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/filesystems/fuse-io.rst b/Documentation/filesystems/fuse-io.rst
>> index 255a368fe534..cdd292dd2e9c 100644
>> --- a/Documentation/filesystems/fuse-io.rst
>> +++ b/Documentation/filesystems/fuse-io.rst
> 
>> @@ -24,7 +31,8 @@ after any writes to the file.  All mmap modes are supported.
>>   The cached mode has two sub modes controlling how writes are handled.  The
>>   write-through mode is the default and is supported on all kernels.  The
>>   writeback-cache mode may be selected by the FUSE_WRITEBACK_CACHE flag in the
>> -FUSE_INIT reply.
>> +FUSE_INIT reply. In either modes, if the FOPEN_KEEP_CACHE flag is not set in
> 
>                         either mode,
> 
>> +the FUSE_OPEN, cached pages of the file will be invalidated immediatedly.
> 
>                                                                 immediately.
> 
>>   
>>   In write-through mode each write is immediately sent to userspace as one or more
>>   WRITE requests, as well as updating any cached pages (and caching previously
>> @@ -38,7 +46,27 @@ reclaim on memory pressure) or explicitly (invoked by close(2), fsync(2) and
>>   when the last ref to the file is being released on munmap(2)).  This mode
>>   assumes that all changes to the filesystem go through the FUSE kernel module
>>   (size and atime/ctime/mtime attributes are kept up-to-date by the kernel), so
>> -it's generally not suitable for network filesystems.  If a partial page is
>> +it's generally not suitable for network filesystems (you can consider the
>> +writeback-cache-v2 mode mentioned latter for them).  If a partial page is
> 
>                                       later
> 
>>   written, then the page needs to be first read from userspace.  This means, that
>>   even for files opened for O_WRONLY it is possible that READ requests will be
>>   generated by the kernel.
> 
> 


Thanks, Randy. I will fix them in the next version.

Jiachen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/5] FUSE consistency improvements
  2023-07-11  4:34 [PATCH 0/5] FUSE consistency improvements Jiachen Zhang
                   ` (4 preceding siblings ...)
  2023-07-11  4:34 ` [PATCH 5/5] docs: fuse: improve FUSE consistency explanation Jiachen Zhang
@ 2023-07-14  5:50 ` Jingbo Xu
  5 siblings, 0 replies; 17+ messages in thread
From: Jingbo Xu @ 2023-07-14  5:50 UTC (permalink / raw)
  To: Jiachen Zhang, Miklos Szeredi, Jonathan Corbet, linux-fsdevel,
	linux-doc, linux-kernel
  Cc: me



On 7/11/23 12:34 PM, Jiachen Zhang wrote:
> This patchset resends some patches that related to FUSE consistency
> improvements in the mailing list.
> 
> The 1st patch fixes a staleness-checking issue, which is the v2 version
> of the patch[1].
> 
> The 2nd patch is a resend version of the patch[2] with its commit message
> rewritten.
> 
> The 3rd and 4th patches are new versions of the patch[3] and the patch[4],
> FUSE filesystems are able to implement the close-to-open (CTO) consistency
> semantics with the help of these two patches. The 5th patch is a new
> patch which improves the explanation of FUSE cache mode and consistency
> models in the documentation.
> 

Yeah our internal production environment will also benefit from this
cache consistency enhancement.  It would be great if this feature could
be improved and finally gets merged.


-- 
Thanks,
Jingbo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/5] fuse: invalidate dentry on EEXIST creates or ENOENT deletes
  2023-07-11  4:34 ` [PATCH 2/5] fuse: invalidate dentry on EEXIST creates or ENOENT deletes Jiachen Zhang
@ 2023-08-16 12:27   ` Miklos Szeredi
  0 siblings, 0 replies; 17+ messages in thread
From: Miklos Szeredi @ 2023-08-16 12:27 UTC (permalink / raw)
  To: Jiachen Zhang; +Cc: Jonathan Corbet, linux-fsdevel, linux-doc, linux-kernel, me

On Tue, 11 Jul 2023 at 06:36, Jiachen Zhang
<zhangjiachen.jaycee@bytedance.com> wrote:
>
> The EEXIST errors returned from server are strong sign that a local
> negative dentry should be invalidated. Similarly, The ENOENT errors from
> server can also be a sign of revalidate failure. This commit invalidates
> dentries on EEXIST creates and ENOENT deletes by calling
> fuse_invalidate_entry(), which improves the consistency with no
> performance degradation.

Applied, thanks.

Miklos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] fuse: check attributes staleness on fuse_iget()
  2023-07-11  4:34 ` [PATCH 1/5] fuse: check attributes staleness on fuse_iget() Jiachen Zhang
@ 2023-08-23  8:57   ` Miklos Szeredi
  0 siblings, 0 replies; 17+ messages in thread
From: Miklos Szeredi @ 2023-08-23  8:57 UTC (permalink / raw)
  To: Jiachen Zhang; +Cc: Jonathan Corbet, linux-fsdevel, linux-doc, linux-kernel, me

On Tue, 11 Jul 2023 at 06:36, Jiachen Zhang
<zhangjiachen.jaycee@bytedance.com> wrote:
>
> Function fuse_direntplus_link() might call fuse_iget() to initialize a new
> fuse_inode and change its attributes. If fi->attr_version is always
> initialized with 0, even if the attributes returned by the FUSE_READDIR
> request is staled, as the new fi->attr_version is 0, fuse_change_attributes
> will still set the staled attributes to inode. This wrong behaviour may
> cause file size inconsistency even when there is no changes from
> server-side.
>
> To reproduce the issue, consider the following 2 programs (A and B) are
> running concurrently,
>
>         A                                               B
> ----------------------------------      --------------------------------
> { /fusemnt/dir/f is a file path in a fuse mount, the size of f is 0. }
>
> readdir(/fusemnt/dir) start
> //Daemon set size 0 to f direntry
>                                         fallocate(f, 1024)
>                                         stat(f) // B see size 1024
>                                         echo 2 > /proc/sys/vm/drop_caches
> readdir(/fusemnt/dir) reply to kernel
> Kernel set 0 to the I_NEW inode
>
>                                         stat(f) // B see size 0
>
> In the above case, only program B is modifying the file size, however, B
> observes file size changing between the 2 'readonly' stat() calls. To fix
> this issue, we should make sure readdirplus still follows the rule of
> attr_version staleness checking even if the fi->attr_version is lost due to
> inode eviction. So this patch increases fc->attr_version on inode eviction,
> and compares request attr_version and the fc->attr_version when a
> FUSE_READDIRPLUS request is finished.

Thanks for the report.  It's really interesting that you are the first
to notice this misbehavior, even though it's been there since the
beginning.

The fix looks correct but suboptimal: fc->attr_version will get
incremented due to non-evict events as well, which could lead to false
positives.   I'd add another counter (e.g. fc->evict_ctr) and manage
that separately for lookup type operations (LOOKUP/READDIRPLUS).

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] fuse: add FOPEN_INVAL_ATTR
  2023-07-11  4:34 ` [PATCH 3/5] fuse: add FOPEN_INVAL_ATTR Jiachen Zhang
@ 2023-08-23  9:01   ` Miklos Szeredi
  2023-08-23 11:12     ` Jiachen Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Miklos Szeredi @ 2023-08-23  9:01 UTC (permalink / raw)
  To: Jiachen Zhang; +Cc: Jonathan Corbet, linux-fsdevel, linux-doc, linux-kernel, me

On Tue, 11 Jul 2023 at 06:36, Jiachen Zhang
<zhangjiachen.jaycee@bytedance.com> wrote:
>
> Add FOPEN_INVAL_ATTR so that the fuse daemon can ask kernel to invalidate
> the attr cache on file open.
>
> The fi->attr_version should be increased when handling FOPEN_INVAL_ATTR.
> Because if a FUSE request returning attributes (getattr, setattr, lookup,
> and readdirplus) starts before a FUSE_OPEN replying FOPEN_INVAL_ATTR, but
> finishes after the FUSE_OPEN, staled attributes will be set to the inode
> and falsely clears the inval_mask.
>
> Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
> ---
>  fs/fuse/file.c            | 10 ++++++++++
>  include/uapi/linux/fuse.h |  2 ++
>  2 files changed, 12 insertions(+)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index de37a3a06a71..412824a11b7b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -215,6 +215,16 @@ void fuse_finish_open(struct inode *inode, struct file *file)
>                 file_update_time(file);
>                 fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
>         }
> +
> +       if (ff->open_flags & FOPEN_INVAL_ATTR) {
> +               struct fuse_inode *fi = get_fuse_inode(inode);
> +
> +               spin_lock(&fi->lock);
> +               fi->attr_version = atomic64_inc_return(&fc->attr_version);

No need to add locking or change fi->attr_version.  This will be done
next time the attributes are updated.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2)
  2023-07-11  4:34 ` [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2) Jiachen Zhang
@ 2023-08-23  9:07   ` Miklos Szeredi
  2023-08-23 10:35     ` Bernd Schubert
  0 siblings, 1 reply; 17+ messages in thread
From: Miklos Szeredi @ 2023-08-23  9:07 UTC (permalink / raw)
  To: Jiachen Zhang; +Cc: Jonathan Corbet, linux-fsdevel, linux-doc, linux-kernel, me

On Tue, 11 Jul 2023 at 06:36, Jiachen Zhang
<zhangjiachen.jaycee@bytedance.com> wrote:
>
> Some users may want both the high performance of the writeback_cahe mode
> and a little bit more consistency among FUSE mounts. Current
> writeback_cache mode never updates attributes from server, so can never
> see the file attributes changed by other FUSE mounts, which means
> 'zero-consisteny'.
>
> This commit introduces writeback_cache_v2 mode, which allows the attributes
> to be updated from server to kernel when the inode is clean and no
> writeback is in-progressing. FUSE daemons can select this mode by the
> FUSE_WRITEBACK_CACHE_V2 init flag.
>
> In writeback_cache_v2 mode, the server generates official attributes.
> Therefore,
>
>     1. For the cmtime, the cmtime generated by kernel are just temporary
>     values that are never flushed to server by fuse_write_inode(), and they
>     could be eventually updated by the official server cmtime. The
>     mtime-based revalidation of the fc->auto_inval_data mode is also
>     skipped, as the kernel-generated temporary cmtime are likely not equal
>     to the offical server cmtime.
>
>     2. For the file size, we expect server updates its file size on
>     FUSE_WRITEs. So we increase fi->attr_version in fuse_writepage_end() to
>     check the staleness of the returning file size.
>
> Together with FOPEN_INVAL_ATTR, a FUSE daemon is able to implement
> close-to-open (CTO) consistency like NFS client implementations.

What I'd prefer is mode similar to NFS: getattr flushes pending writes
so that server ctime/mtime are always in sync with client.  FUSE
probably should have done that from the beginning, but at that time I
wasn't aware of the NFS solution.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2)
  2023-08-23  9:07   ` Miklos Szeredi
@ 2023-08-23 10:35     ` Bernd Schubert
  2023-08-23 10:59       ` Jiachen Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Bernd Schubert @ 2023-08-23 10:35 UTC (permalink / raw)
  To: Miklos Szeredi, Jiachen Zhang
  Cc: Jonathan Corbet, linux-fsdevel, linux-doc, linux-kernel, me

On 8/23/23 11:07, Miklos Szeredi wrote:
> On Tue, 11 Jul 2023 at 06:36, Jiachen Zhang
> <zhangjiachen.jaycee@bytedance.com> wrote:
>>
>> Some users may want both the high performance of the writeback_cahe mode
>> and a little bit more consistency among FUSE mounts. Current
>> writeback_cache mode never updates attributes from server, so can never
>> see the file attributes changed by other FUSE mounts, which means
>> 'zero-consisteny'.
>>
>> This commit introduces writeback_cache_v2 mode, which allows the attributes
>> to be updated from server to kernel when the inode is clean and no
>> writeback is in-progressing. FUSE daemons can select this mode by the
>> FUSE_WRITEBACK_CACHE_V2 init flag.
>>
>> In writeback_cache_v2 mode, the server generates official attributes.
>> Therefore,
>>
>>      1. For the cmtime, the cmtime generated by kernel are just temporary
>>      values that are never flushed to server by fuse_write_inode(), and they
>>      could be eventually updated by the official server cmtime. The
>>      mtime-based revalidation of the fc->auto_inval_data mode is also
>>      skipped, as the kernel-generated temporary cmtime are likely not equal
>>      to the offical server cmtime.
>>
>>      2. For the file size, we expect server updates its file size on
>>      FUSE_WRITEs. So we increase fi->attr_version in fuse_writepage_end() to
>>      check the staleness of the returning file size.
>>
>> Together with FOPEN_INVAL_ATTR, a FUSE daemon is able to implement
>> close-to-open (CTO) consistency like NFS client implementations.
> 
> What I'd prefer is mode similar to NFS: getattr flushes pending writes
> so that server ctime/mtime are always in sync with client.  FUSE
> probably should have done that from the beginning, but at that time I
> wasn't aware of the NFS solution.


I think it would be good to have flush-on-getattr configurable - systems 
with a distributed lock manager (DLM) and notifications from 
server/daemon to kernel should not need it.


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2)
  2023-08-23 10:35     ` Bernd Schubert
@ 2023-08-23 10:59       ` Jiachen Zhang
  2023-08-23 22:55         ` Bernd Schubert
  0 siblings, 1 reply; 17+ messages in thread
From: Jiachen Zhang @ 2023-08-23 10:59 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: Jonathan Corbet, linux-fsdevel, linux-doc, linux-kernel, me

On 2023/8/23 18:35, Bernd Schubert wrote:
> On 8/23/23 11:07, Miklos Szeredi wrote:
>> On Tue, 11 Jul 2023 at 06:36, Jiachen Zhang
>> <zhangjiachen.jaycee@bytedance.com> wrote:
>>>
>>> Some users may want both the high performance of the writeback_cahe mode
>>> and a little bit more consistency among FUSE mounts. Current
>>> writeback_cache mode never updates attributes from server, so can never
>>> see the file attributes changed by other FUSE mounts, which means
>>> 'zero-consisteny'.
>>>
>>> This commit introduces writeback_cache_v2 mode, which allows the 
>>> attributes
>>> to be updated from server to kernel when the inode is clean and no
>>> writeback is in-progressing. FUSE daemons can select this mode by the
>>> FUSE_WRITEBACK_CACHE_V2 init flag.
>>>
>>> In writeback_cache_v2 mode, the server generates official attributes.
>>> Therefore,
>>>
>>>      1. For the cmtime, the cmtime generated by kernel are just 
>>> temporary
>>>      values that are never flushed to server by fuse_write_inode(), 
>>> and they
>>>      could be eventually updated by the official server cmtime. The
>>>      mtime-based revalidation of the fc->auto_inval_data mode is also
>>>      skipped, as the kernel-generated temporary cmtime are likely not 
>>> equal
>>>      to the offical server cmtime.
>>>
>>>      2. For the file size, we expect server updates its file size on
>>>      FUSE_WRITEs. So we increase fi->attr_version in 
>>> fuse_writepage_end() to
>>>      check the staleness of the returning file size.
>>>
>>> Together with FOPEN_INVAL_ATTR, a FUSE daemon is able to implement
>>> close-to-open (CTO) consistency like NFS client implementations.
>>
>> What I'd prefer is mode similar to NFS: getattr flushes pending writes
>> so that server ctime/mtime are always in sync with client.  FUSE
>> probably should have done that from the beginning, but at that time I
>> wasn't aware of the NFS solution.
> 
> 
> I think it would be good to have flush-on-getattr configurable - systems 
> with a distributed lock manager (DLM) and notifications from 
> server/daemon to kernel should not need it.
> 
> 
> Thanks,
> Bernd

Hi Miklos and Bernd,

I agree that flush-on-getattr is a good solution to keep the c/mtime
consistency for the view of userspace applications.

Maybe in the next version, we can add the flush-on-getattr just for the
writeback_cache_v2 mode, as daemons replying on reverse notifications
are likely not need the writeback_cache_v2 mode. What do you think?

Thanks,
Jiachen



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 3/5] fuse: add FOPEN_INVAL_ATTR
  2023-08-23  9:01   ` Miklos Szeredi
@ 2023-08-23 11:12     ` Jiachen Zhang
  0 siblings, 0 replies; 17+ messages in thread
From: Jiachen Zhang @ 2023-08-23 11:12 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Jonathan Corbet, linux-fsdevel, linux-doc, linux-kernel, me

On 2023/8/23 17:01, Miklos Szeredi wrote:
> On Tue, 11 Jul 2023 at 06:36, Jiachen Zhang
> <zhangjiachen.jaycee@bytedance.com> wrote:
>>
>> Add FOPEN_INVAL_ATTR so that the fuse daemon can ask kernel to invalidate
>> the attr cache on file open.
>>
>> The fi->attr_version should be increased when handling FOPEN_INVAL_ATTR.
>> Because if a FUSE request returning attributes (getattr, setattr, lookup,
>> and readdirplus) starts before a FUSE_OPEN replying FOPEN_INVAL_ATTR, but
>> finishes after the FUSE_OPEN, staled attributes will be set to the inode
>> and falsely clears the inval_mask.
>>
>> Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
>> ---
>>   fs/fuse/file.c            | 10 ++++++++++
>>   include/uapi/linux/fuse.h |  2 ++
>>   2 files changed, 12 insertions(+)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index de37a3a06a71..412824a11b7b 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -215,6 +215,16 @@ void fuse_finish_open(struct inode *inode, struct file *file)
>>                  file_update_time(file);
>>                  fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
>>          }
>> +
>> +       if (ff->open_flags & FOPEN_INVAL_ATTR) {
>> +               struct fuse_inode *fi = get_fuse_inode(inode);
>> +
>> +               spin_lock(&fi->lock);
>> +               fi->attr_version = atomic64_inc_return(&fc->attr_version);
> 
> No need to add locking or change fi->attr_version.  This will be done
> next time the attributes are updated.
> 
> Thanks,
> Miklos

Hi Miklos,

Thanks for the review! As said in the commit message, increasing the
attr version here is to prevent the attr updated by staled operations. 
If such cases happen, inval_mask will be falsely cleared, and 
FOPEN_INVAL_ATTR takes no effect from the user's point of view.

The increasing of attr_version here can also be explained as the
attr has been updated from the server side.

Thanks,
Jiachen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2)
  2023-08-23 10:59       ` Jiachen Zhang
@ 2023-08-23 22:55         ` Bernd Schubert
  0 siblings, 0 replies; 17+ messages in thread
From: Bernd Schubert @ 2023-08-23 22:55 UTC (permalink / raw)
  To: Jiachen Zhang, Miklos Szeredi
  Cc: Jonathan Corbet, linux-fsdevel, linux-doc, linux-kernel, me



On 8/23/23 12:59, Jiachen Zhang wrote:
> On 2023/8/23 18:35, Bernd Schubert wrote:
>> On 8/23/23 11:07, Miklos Szeredi wrote:
>>> On Tue, 11 Jul 2023 at 06:36, Jiachen Zhang
>>> <zhangjiachen.jaycee@bytedance.com> wrote:
>>>>
>>>> Some users may want both the high performance of the writeback_cahe 
>>>> mode
>>>> and a little bit more consistency among FUSE mounts. Current
>>>> writeback_cache mode never updates attributes from server, so can never
>>>> see the file attributes changed by other FUSE mounts, which means
>>>> 'zero-consisteny'.
>>>>
>>>> This commit introduces writeback_cache_v2 mode, which allows the 
>>>> attributes
>>>> to be updated from server to kernel when the inode is clean and no
>>>> writeback is in-progressing. FUSE daemons can select this mode by the
>>>> FUSE_WRITEBACK_CACHE_V2 init flag.
>>>>
>>>> In writeback_cache_v2 mode, the server generates official attributes.
>>>> Therefore,
>>>>
>>>>      1. For the cmtime, the cmtime generated by kernel are just 
>>>> temporary
>>>>      values that are never flushed to server by fuse_write_inode(), 
>>>> and they
>>>>      could be eventually updated by the official server cmtime. The
>>>>      mtime-based revalidation of the fc->auto_inval_data mode is also
>>>>      skipped, as the kernel-generated temporary cmtime are likely 
>>>> not equal
>>>>      to the offical server cmtime.
>>>>
>>>>      2. For the file size, we expect server updates its file size on
>>>>      FUSE_WRITEs. So we increase fi->attr_version in 
>>>> fuse_writepage_end() to
>>>>      check the staleness of the returning file size.
>>>>
>>>> Together with FOPEN_INVAL_ATTR, a FUSE daemon is able to implement
>>>> close-to-open (CTO) consistency like NFS client implementations.
>>>
>>> What I'd prefer is mode similar to NFS: getattr flushes pending writes
>>> so that server ctime/mtime are always in sync with client.  FUSE
>>> probably should have done that from the beginning, but at that time I
>>> wasn't aware of the NFS solution.
>>
>>
>> I think it would be good to have flush-on-getattr configurable - 
>> systems with a distributed lock manager (DLM) and notifications from 
>> server/daemon to kernel should not need it.
>>
>>
>> Thanks,
>> Bernd
> 
> Hi Miklos and Bernd,
> 
> I agree that flush-on-getattr is a good solution to keep the c/mtime
> consistency for the view of userspace applications.
> 
> Maybe in the next version, we can add the flush-on-getattr just for the
> writeback_cache_v2 mode, as daemons replying on reverse notifications
> are likely not need the writeback_cache_v2 mode. What do you think?

Hi Jiachen,

isn't Miklos' idea that we can avoid writeback_cache_v2 mode?


Thanks,
Bernd

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-08-23 22:56 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-11  4:34 [PATCH 0/5] FUSE consistency improvements Jiachen Zhang
2023-07-11  4:34 ` [PATCH 1/5] fuse: check attributes staleness on fuse_iget() Jiachen Zhang
2023-08-23  8:57   ` Miklos Szeredi
2023-07-11  4:34 ` [PATCH 2/5] fuse: invalidate dentry on EEXIST creates or ENOENT deletes Jiachen Zhang
2023-08-16 12:27   ` Miklos Szeredi
2023-07-11  4:34 ` [PATCH 3/5] fuse: add FOPEN_INVAL_ATTR Jiachen Zhang
2023-08-23  9:01   ` Miklos Szeredi
2023-08-23 11:12     ` Jiachen Zhang
2023-07-11  4:34 ` [PATCH 4/5] fuse: writeback_cache consistency enhancement (writeback_cache_v2) Jiachen Zhang
2023-08-23  9:07   ` Miklos Szeredi
2023-08-23 10:35     ` Bernd Schubert
2023-08-23 10:59       ` Jiachen Zhang
2023-08-23 22:55         ` Bernd Schubert
2023-07-11  4:34 ` [PATCH 5/5] docs: fuse: improve FUSE consistency explanation Jiachen Zhang
2023-07-11  4:42   ` Randy Dunlap
2023-07-11  7:15     ` Jiachen Zhang
2023-07-14  5:50 ` [PATCH 0/5] FUSE consistency improvements Jingbo Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).