linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/4] fuse: inode IO modes and mmap
@ 2023-12-24 10:49 Bernd Schubert
  2023-12-24 10:49 ` [PATCH 1/4] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Bernd Schubert
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Bernd Schubert @ 2023-12-24 10:49 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: bernd.schubert, miklos, dsingh, amir73il, Bernd Schubert

This series is mostly about mmap and inode IO modes.

Patch 1/4 was already posted before
https://patchwork.kernel.org/project/linux-fsdevel/patch/20231213150703.6262-1-bschubert@ddn.com/
but is included here again, since patch 4/4 in particular depends on
it. Amir has also spotted a typo in the commit message of the initial
patch, which is corrected here.

Patch 2/4 and 3/4 add helper functions, which are needed by the main
patch (4/4) in this series and are also needed by another fuse
direct-IO series. That series needs the helper functions in
fuse_cache_write_iter, so the new helpers are placed above that
function.

Patch 4/4 is the main patch in the series. It adds inode
IO modes, which are needed to re-enable shared DIO write locks
when FUSE_DIRECT_IO_ALLOW_MMAP is set. Furthermore, these IO modes
are also needed by Amir's WIP fuse passthrough work.

The conflict between FUSE_DIRECT_IO_ALLOW_MMAP and
FOPEN_PARALLEL_DIRECT_WRITES was detected by xfstest generic/095.
This patch series was tested by running that test in a loop
and by multiple runs of the complete xfstests suite.
For testing with libfuse, the branch from this pull request is needed:
https://github.com/libfuse/libfuse/pull/870

Amir Goldstein (1):
  fuse: introduce inode io modes

Bernd Schubert (3):
  fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
  fuse: Create helper function if DIO write needs exclusive lock
  fuse: Add fuse_dio_lock/unlock helper functions

 fs/fuse/file.c            | 268 ++++++++++++++++++++++++++++++++------
 fs/fuse/fuse_i.h          |  76 ++++++++++-
 include/uapi/linux/fuse.h |   2 +
 3 files changed, 302 insertions(+), 44 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
  2023-12-24 10:49 [PATCH 0/4] fuse: inode IO modes and mmap Bernd Schubert
@ 2023-12-24 10:49 ` Bernd Schubert
  2023-12-24 10:49 ` [PATCH 2/4] fuse: Create helper function if DIO write needs exclusive lock Bernd Schubert
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Bernd Schubert @ 2023-12-24 10:49 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: bernd.schubert, miklos, dsingh, amir73il, Bernd Schubert, Hao Xu,
	stable

There were multiple issues with direct_io_allow_mmap:
- fuse_link_write_file() was missing, resulting in warnings in
  fuse_write_file_get() and EIO from msync()
- "vma->vm_ops = &fuse_file_vm_ops" was not set, although
  fuse_page_mkwrite in particular is needed.

The semantics of invalidate_inode_pages2() in fuse_file_mmap have so
far not been clearly defined. The call dates back to
commit 3121bfe76311 ("fuse: fix "direct_io" private mmap"),
which, as direct_io_allow_mmap is a new feature, covered MAP_PRIVATE
only. Since invalidate_inode_pages2() calls into fuse_launder_folio()
and writes out dirty pages, it should be safe to call
invalidate_inode_pages2 for MAP_PRIVATE and MAP_SHARED as well.
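For illustration, a user-space sketch along these lines reproduced the
msync() failure when run against a file on a FUSE mount served with
FOPEN_DIRECT_IO and direct_io_allow_mmap (the helper name, path handling
and 4k size are made up for the example; on a regular filesystem, or
with this fix applied, it simply succeeds):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 when the MAP_SHARED write-back succeeds, -1 otherwise. */
static int write_and_msync(const char *path)
{
	int ret = -1;
	int fd = open(path, O_RDWR | O_CREAT, 0644);
	char *p;

	if (fd < 0)
		return -1;
	if (ftruncate(fd, 4096) != 0)
		goto out_close;

	/* A write through a MAP_SHARED mapping faults into page_mkwrite */
	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		goto out_close;

	memcpy(p, "hello", 5);

	/* Without the fix, this failed with EIO on direct_io FUSE files */
	ret = msync(p, 4096, MS_SYNC);

	munmap(p, 4096);
out_close:
	close(fd);
	return ret;
}
```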

Cc: Hao Xu <howeyxu@tencent.com>
Cc: stable@vger.kernel.org
Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a660f1f21540a..174aa16407c4b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2475,7 +2475,10 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 
 		invalidate_inode_pages2(file->f_mapping);
 
-		return generic_file_mmap(file, vma);
+		if (!(vma->vm_flags & VM_MAYSHARE)) {
+			/* MAP_PRIVATE */
+			return generic_file_mmap(file, vma);
+		}
 	}
 
 	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
-- 
2.40.1



* [PATCH 2/4] fuse: Create helper function if DIO write needs exclusive lock
  2023-12-24 10:49 [PATCH 0/4] fuse: inode IO modes and mmap Bernd Schubert
  2023-12-24 10:49 ` [PATCH 1/4] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Bernd Schubert
@ 2023-12-24 10:49 ` Bernd Schubert
  2023-12-24 11:14   ` Amir Goldstein
  2023-12-24 10:49 ` [PATCH 3/4] fuse: Add fuse_dio_lock/unlock helper functions Bernd Schubert
  2023-12-24 10:49 ` [PATCH 4/4] fuse: introduce inode io modes Bernd Schubert
  3 siblings, 1 reply; 7+ messages in thread
From: Bernd Schubert @ 2023-12-24 10:49 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: bernd.schubert, miklos, dsingh, amir73il, Bernd Schubert

This makes the code a bit easier to read and makes it easier to
add further conditions for when an exclusive lock is needed.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/fuse/file.c | 64 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 19 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 174aa16407c4b..546254aaab19f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1298,6 +1298,45 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
 	return res;
 }
 
+static bool fuse_io_past_eof(struct kiocb *iocb, struct iov_iter *iter)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
+}
+
+/*
+ * @return true if an exclusive lock for direct IO writes is needed
+ */
+static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from)
+{
+	struct file *file = iocb->ki_filp;
+	struct fuse_file *ff = file->private_data;
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	/* server side has to advise that it supports parallel dio writes */
+	if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
+		return true;
+
+	/* append will need to know the eventual eof - always needs an
+	 * exclusive lock
+	 */
+	if (iocb->ki_flags & IOCB_APPEND)
+		return true;
+
+	/* Combining page access and direct-io is difficult; shared
+	 * locks actually introduce a conflict.
+	 */
+	if (get_fuse_conn(inode)->direct_io_allow_mmap)
+		return true;
+
+	/* parallel dio beyond eof is at least for now not supported */
+	if (fuse_io_past_eof(iocb, from))
+		return true;
+
+	return false;
+}
+
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
@@ -1557,26 +1596,12 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
 	return res;
 }
 
-static bool fuse_direct_write_extending_i_size(struct kiocb *iocb,
-					       struct iov_iter *iter)
-{
-	struct inode *inode = file_inode(iocb->ki_filp);
-
-	return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
-}
-
 static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
-	struct file *file = iocb->ki_filp;
-	struct fuse_file *ff = file->private_data;
 	struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
 	ssize_t res;
-	bool exclusive_lock =
-		!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
-		get_fuse_conn(inode)->direct_io_allow_mmap ||
-		iocb->ki_flags & IOCB_APPEND ||
-		fuse_direct_write_extending_i_size(iocb, from);
+	bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
 
 	/*
 	 * Take exclusive lock if
@@ -1590,10 +1615,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	else {
 		inode_lock_shared(inode);
 
-		/* A race with truncate might have come up as the decision for
-		 * the lock type was done without holding the lock, check again.
+		/*
+		 * Previous check was without any lock and might have raced.
 		 */
-		if (fuse_direct_write_extending_i_size(iocb, from)) {
+		if (fuse_dio_wr_exclusive_lock(iocb, from)) {
 			inode_unlock_shared(inode);
 			inode_lock(inode);
 			exclusive_lock = true;
@@ -2467,7 +2492,8 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 		return fuse_dax_mmap(file, vma);
 
 	if (ff->open_flags & FOPEN_DIRECT_IO) {
-		/* Can't provide the coherency needed for MAP_SHARED
+		/*
+		 * Can't provide the coherency needed for MAP_SHARED
 		 * if FUSE_DIRECT_IO_ALLOW_MMAP isn't set.
 		 */
 		if ((vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_allow_mmap)
-- 
2.40.1



* [PATCH 3/4] fuse: Add fuse_dio_lock/unlock helper functions
  2023-12-24 10:49 [PATCH 0/4] fuse: inode IO modes and mmap Bernd Schubert
  2023-12-24 10:49 ` [PATCH 1/4] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Bernd Schubert
  2023-12-24 10:49 ` [PATCH 2/4] fuse: Create helper function if DIO write needs exclusive lock Bernd Schubert
@ 2023-12-24 10:49 ` Bernd Schubert
  2023-12-24 11:15   ` Amir Goldstein
  2023-12-24 10:49 ` [PATCH 4/4] fuse: introduce inode io modes Bernd Schubert
  3 siblings, 1 reply; 7+ messages in thread
From: Bernd Schubert @ 2023-12-24 10:49 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: bernd.schubert, miklos, dsingh, amir73il, Bernd Schubert

So far these are just helpers to move the complex locking
logic out of fuse_direct_write_iter. They are especially needed
by the next patch in the series, which adds the fuse inode
cache IO mode and brings in even more locking complexity.

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/fuse/file.c | 61 ++++++++++++++++++++++++++++----------------------
 1 file changed, 34 insertions(+), 27 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 546254aaab19f..abc93415ec7e3 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1337,6 +1337,37 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
 	return false;
 }
 
+static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
+			  bool *exclusive)
+{
+	struct inode *inode = file_inode(iocb->ki_filp);
+
+	*exclusive = fuse_dio_wr_exclusive_lock(iocb, from);
+	if (*exclusive) {
+		inode_lock(inode);
+	} else {
+		inode_lock_shared(inode);
+		/*
+		 * Previous check was without inode lock and might have raced,
+		 * check again.
+		 */
+		if (fuse_io_past_eof(iocb, from)) {
+			inode_unlock_shared(inode);
+			inode_lock(inode);
+			*exclusive = true;
+		}
+	}
+}
+
+static void fuse_dio_unlock(struct inode *inode, bool exclusive)
+{
+	if (exclusive) {
+		inode_unlock(inode);
+	} else {
+		inode_unlock_shared(inode);
+	}
+}
+
 static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct file *file = iocb->ki_filp;
@@ -1601,30 +1632,9 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
 	ssize_t res;
-	bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
-
-	/*
-	 * Take exclusive lock if
-	 * - Parallel direct writes are disabled - a user space decision
-	 * - Parallel direct writes are enabled and i_size is being extended.
-	 * - Shared mmap on direct_io file is supported (FUSE_DIRECT_IO_ALLOW_MMAP).
-	 *   This might not be needed at all, but needs further investigation.
-	 */
-	if (exclusive_lock)
-		inode_lock(inode);
-	else {
-		inode_lock_shared(inode);
-
-		/*
-		 * Previous check was without any lock and might have raced.
-		 */
-		if (fuse_dio_wr_exclusive_lock(iocb, from)) {
-			inode_unlock_shared(inode);
-			inode_lock(inode);
-			exclusive_lock = true;
-		}
-	}
+	bool exclusive;
 
+	fuse_dio_lock(iocb, from, &exclusive);
 	res = generic_write_checks(iocb, from);
 	if (res > 0) {
 		if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) {
@@ -1635,10 +1645,7 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 			fuse_write_update_attr(inode, iocb->ki_pos, res);
 		}
 	}
-	if (exclusive_lock)
-		inode_unlock(inode);
-	else
-		inode_unlock_shared(inode);
+	fuse_dio_unlock(inode, exclusive);
 
 	return res;
 }
-- 
2.40.1



* [PATCH 4/4] fuse: introduce inode io modes
  2023-12-24 10:49 [PATCH 0/4] fuse: inode IO modes and mmap Bernd Schubert
                   ` (2 preceding siblings ...)
  2023-12-24 10:49 ` [PATCH 3/4] fuse: Add fuse_dio_lock/unlock helper functions Bernd Schubert
@ 2023-12-24 10:49 ` Bernd Schubert
  3 siblings, 0 replies; 7+ messages in thread
From: Bernd Schubert @ 2023-12-24 10:49 UTC (permalink / raw)
  To: linux-fsdevel; +Cc: bernd.schubert, miklos, dsingh, amir73il, Bernd Schubert

From: Amir Goldstein <amir73il@gmail.com>

The fuse inode io mode is determined by the mode of its open files/mmaps
and parallel dio.

- caching io mode - files open in caching mode or mmap on direct_io file
- direct io mode - no files open in caching mode and no files mmapped
- parallel dio mode - direct io mode with parallel dio in progress

We use a new FOPEN_CACHE_IO flag to explicitly mark a file that was open
in caching mode.

direct_io mmap uses page cache, so first mmap will mark the file as
FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter
the caching io mode.

If the server opens the file with flags FOPEN_DIRECT_IO|FOPEN_CACHE_IO,
the inode enters caching io mode already on open.

This allows executing parallel dio when inode is not in caching mode
even if shared mmap is allowed, but no mmaps have been performed on
the inode in question.

An mmap on a direct_io file now waits for in-progress parallel dio writes,
so FOPEN_PARALLEL_DIRECT_WRITES is enabled again by this commit.

Open in caching mode falls back to direct io mode if parallel dio is
in progress.
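The io modes map to a per-inode counter (iocachectr, added by this
patch): positive while open files or mmaps use the page cache, negative
while parallel dio writes are in flight. A simplified user-space model
of that protocol, mirroring the fuse_inode_get_io_cache() /
fuse_inode_deny_io_cache() helpers below (not the kernel code itself,
and without the fi->lock serialization the real helpers rely on):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the relevant part of struct fuse_inode */
struct io_mode {
	int iocachectr;	/* >0: caching users, <0: parallel dio writers */
};

/* A caching open succeeds unless parallel dio is in progress */
static bool get_io_cache(struct io_mode *m)
{
	if (m->iocachectr < 0)
		return false;
	m->iocachectr++;
	return true;
}

/* A caching file/mmap went away */
static void put_io_cache(struct io_mode *m)
{
	m->iocachectr--;
}

/* Parallel dio may start only while no caching users exist */
static bool deny_io_cache(struct io_mode *m)
{
	if (m->iocachectr > 0)
		return false;
	m->iocachectr--;
	return true;
}

/* Returns true when the last dio writer left and caching may resume */
static bool allow_io_cache(struct io_mode *m)
{
	return ++m->iocachectr == 0;
}
```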

Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/fuse/file.c            | 160 ++++++++++++++++++++++++++++++++++++--
 fs/fuse/fuse_i.h          |  76 +++++++++++++++++-
 include/uapi/linux/fuse.h |   2 +
 3 files changed, 230 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index abc93415ec7e3..fb0b571daaf55 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -104,10 +104,100 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
 	kfree(ra);
 }
 
+static bool fuse_file_is_direct_io(struct file *file)
+{
+	struct fuse_file *ff = file->private_data;
+
+	return ff->open_flags & FOPEN_DIRECT_IO || file->f_flags & O_DIRECT;
+}
+
+/* Request access to submit new io to inode via open file */
+static bool fuse_file_io_open(struct file *file, struct inode *inode)
+{
+	struct fuse_file *ff = file->private_data;
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	bool ok = true;
+
+	if (!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode))
+		return true;
+
+	/* Set explicit FOPEN_CACHE_IO flag for file open in caching mode */
+	if (!fuse_file_is_direct_io(file))
+		ff->open_flags |= FOPEN_CACHE_IO;
+
+	spin_lock(&fi->lock);
+	/* First caching file open enters caching inode io mode */
+	if (ff->open_flags & FOPEN_CACHE_IO) {
+		ok = fuse_inode_get_io_cache(fi);
+		if (!ok) {
+			/* fallback to open in direct io mode */
+			pr_debug("failed to open file in caching mode; falling back to direct io mode.\n");
+			ff->open_flags &= ~FOPEN_CACHE_IO;
+			ff->open_flags |= FOPEN_DIRECT_IO;
+		}
+	}
+	spin_unlock(&fi->lock);
+
+	return ok;
+}
+
+/* Request access to submit new io to inode via mmap */
+static int fuse_file_io_mmap(struct fuse_file *ff, struct inode *inode)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	if (WARN_ON(!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode)))
+		return -ENODEV;
+
+	spin_lock(&fi->lock);
+	/*
+	 * First mmap of direct_io file enters caching inode io mode, blocks
+	 * new parallel dio writes and waits for the in-progress parallel dio
+	 * writes to complete.
+	 */
+	if (!(ff->open_flags & FOPEN_CACHE_IO)) {
+		while (!fuse_inode_get_io_cache(fi)) {
+			/*
+			 * Setting the bit advises new direct-io writes
+			 * to use an exclusive lock - without it the wait below
+			 * might be forever.
+			 */
+			set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+			spin_unlock(&fi->lock);
+			wait_event_interruptible(fi->direct_io_waitq,
+						 fuse_is_io_cache_allowed(fi));
+			spin_lock(&fi->lock);
+		}
+		ff->open_flags |= FOPEN_CACHE_IO;
+	}
+	spin_unlock(&fi->lock);
+
+	return 0;
+}
+
+/* No more pending io and no new io possible to inode via open/mmapped file */
+static void fuse_file_io_release(struct fuse_file *ff, struct inode *inode)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+
+	if (!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode))
+		return;
+
+	spin_lock(&fi->lock);
+	/* Last caching file close exits caching inode io mode */
+	if (ff->open_flags & FOPEN_CACHE_IO)
+		fuse_inode_put_io_cache(fi);
+	spin_unlock(&fi->lock);
+}
+
 static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
 {
 	if (refcount_dec_and_test(&ff->count)) {
 		struct fuse_args *args = &ff->release_args->args;
+		struct inode *inode = ff->release_args->inode;
+
+		if (inode)
+			fuse_file_io_release(ff, inode);
 
 		if (isdir ? ff->fm->fc->no_opendir : ff->fm->fc->no_open) {
 			/* Do nothing when client does not implement 'open' */
@@ -199,6 +289,9 @@ void fuse_finish_open(struct inode *inode, struct file *file)
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
+	/* The file open mode determines the inode io mode */
+	fuse_file_io_open(file, inode);
+
 	if (ff->open_flags & FOPEN_STREAM)
 		stream_open(inode, file);
 	else if (ff->open_flags & FOPEN_NONSEEKABLE)
@@ -1305,6 +1398,37 @@ static bool fuse_io_past_eof(struct kiocb *iocb, struct iov_iter *iter)
 	return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
 }
 
+/*
+ * New parallel dio allowed only if inode is not in caching mode and
+ * denies new opens in caching mode.
+ */
+static bool fuse_file_shared_dio_start(struct inode *inode)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	bool ok;
+
+	if (WARN_ON(!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode)))
+		return false;
+
+	spin_lock(&fi->lock);
+	ok = fuse_inode_deny_io_cache(fi);
+	spin_unlock(&fi->lock);
+	return ok;
+}
+
+/* Allow new opens in caching mode after last parallel dio end */
+static void fuse_file_shared_dio_end(struct inode *inode)
+{
+	struct fuse_inode *fi = get_fuse_inode(inode);
+	bool allow_cached_io;
+
+	spin_lock(&fi->lock);
+	allow_cached_io = fuse_inode_allow_io_cache(fi);
+	spin_unlock(&fi->lock);
+	if (allow_cached_io)
+		wake_up(&fi->direct_io_waitq);
+}
+
 /*
  * @return true if an exclusive lock for direct IO writes is needed
  */
@@ -1313,6 +1437,7 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
 	struct file *file = iocb->ki_filp;
 	struct fuse_file *ff = file->private_data;
 	struct inode *inode = file_inode(iocb->ki_filp);
+	struct fuse_inode *fi = get_fuse_inode(inode);
 
 	/* server side has to advise that it supports parallel dio writes */
 	if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
@@ -1324,11 +1449,9 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
 	if (iocb->ki_flags & IOCB_APPEND)
 		return true;
 
-	/* Combining page access and direct-io is difficult; shared
-	 * locks actually introduce a conflict.
-	 */
-	if (get_fuse_conn(inode)->direct_io_allow_mmap)
-		return true;
+	/* shared locks are not allowed with parallel page cache IO */
+	if (test_bit(FUSE_I_CACHE_IO_MODE, &fi->state))
+		return false;
 
 	/* parallel dio beyond eof is at least for now not supported */
 	if (fuse_io_past_eof(iocb, from))
@@ -1349,9 +1472,11 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
 		inode_lock_shared(inode);
 		/*
 		 * Previous check was without inode lock and might have raced,
-		 * check again.
+		 * check again. fuse_file_shared_dio_start() should be performed
+		 * only after taking shared inode lock.
 		 */
-		if (fuse_io_past_eof(iocb, from)) {
+		if (fuse_io_past_eof(iocb, from) ||
+		    !fuse_file_shared_dio_start(inode)) {
 			inode_unlock_shared(inode);
 			inode_lock(inode);
 			*exclusive = true;
@@ -1364,6 +1489,7 @@ static void fuse_dio_unlock(struct inode *inode, bool exclusive)
 	if (exclusive) {
 		inode_unlock(inode);
 	} else {
+		fuse_file_shared_dio_end(inode);
 		inode_unlock_shared(inode);
 	}
 }
@@ -2493,11 +2619,16 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fm->fc;
+	int rc;
 
 	/* DAX mmap is superior to direct_io mmap */
 	if (FUSE_IS_DAX(file_inode(file)))
 		return fuse_dax_mmap(file, vma);
 
+	/*
+	 * FOPEN_DIRECT_IO handling is special compared to O_DIRECT,
+	 * as it does not allow MAP_SHARED mmap without FUSE_DIRECT_IO_ALLOW_MMAP.
+	 */
 	if (ff->open_flags & FOPEN_DIRECT_IO) {
 		/*
 		 * Can't provide the coherency needed for MAP_SHARED
@@ -2508,10 +2639,23 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
 
 		invalidate_inode_pages2(file->f_mapping);
 
+		/*
+		 * First mmap of direct_io file enters caching inode io mode.
+		 * Also waits for parallel dio writers to go into serial mode
+		 * (exclusive instead of shared lock).
+		 */
+		rc = fuse_file_io_mmap(ff, file_inode(file));
+		if (rc)
+			return rc;
+
 		if (!(vma->vm_flags & VM_MAYSHARE)) {
 			/* MAP_PRIVATE */
 			return generic_file_mmap(file, vma);
 		}
+	} else if (file->f_flags & O_DIRECT) {
+		rc = fuse_file_io_mmap(ff, file_inode(file));
+		if (rc)
+			return rc;
 	}
 
 	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
@@ -3280,7 +3424,9 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
 	INIT_LIST_HEAD(&fi->write_files);
 	INIT_LIST_HEAD(&fi->queued_writes);
 	fi->writectr = 0;
+	fi->iocachectr = 0;
 	init_waitqueue_head(&fi->page_waitq);
+	init_waitqueue_head(&fi->direct_io_waitq);
 	fi->writepages = RB_ROOT;
 
 	if (IS_ENABLED(CONFIG_FUSE_DAX))
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1df83eebda927..5774585f6de3e 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -111,7 +111,7 @@ struct fuse_inode {
 	u64 attr_version;
 
 	union {
-		/* Write related fields (regular file only) */
+		/* read/write io cache (regular file only) */
 		struct {
 			/* Files usable in writepage.  Protected by fi->lock */
 			struct list_head write_files;
@@ -123,9 +123,15 @@ struct fuse_inode {
 			 * (FUSE_NOWRITE) means more writes are blocked */
 			int writectr;
 
+			/** Number of files/maps using page cache */
+			int iocachectr;
+
 			/* Waitq for writepage completion */
 			wait_queue_head_t page_waitq;
 
+			/* waitq for direct-io completion */
+			wait_queue_head_t direct_io_waitq;
+
 			/* List of writepage requestst (pending or sent) */
 			struct rb_root writepages;
 		};
@@ -187,6 +193,8 @@ enum {
 	FUSE_I_BAD,
 	/* Has btime */
 	FUSE_I_BTIME,
+	/* Wants or already has page cache IO */
+	FUSE_I_CACHE_IO_MODE,
 };
 
 struct fuse_conn;
@@ -1349,6 +1357,72 @@ int fuse_fileattr_set(struct mnt_idmap *idmap,
 		      struct dentry *dentry, struct fileattr *fa);
 
 /* file.c */
+/*
+ * Request an open in caching mode.
+ * Return true if in caching mode.
+ */
+static inline bool fuse_inode_get_io_cache(struct fuse_inode *fi)
+{
+	assert_spin_locked(&fi->lock);
+	if (fi->iocachectr < 0)
+		return false;
+	fi->iocachectr++;
+	if (fi->iocachectr == 1)
+		set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+
+	return true;
+}
+
+/*
+ * Release an open in caching mode.
+ * Return true if no more files open in caching mode.
+ */
+static inline bool fuse_inode_put_io_cache(struct fuse_inode *fi)
+{
+	assert_spin_locked(&fi->lock);
+	if (WARN_ON(fi->iocachectr <= 0))
+		return false;
+
+	if (--fi->iocachectr == 0) {
+		clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+		return true;
+	}
+
+	return false;
+}
+
+/*
+ * Request to deny new opens in caching mode.
+ * Return true if denying new opens in caching mode.
+ */
+static inline bool fuse_inode_deny_io_cache(struct fuse_inode *fi)
+{
+	assert_spin_locked(&fi->lock);
+	if (fi->iocachectr > 0)
+		return false;
+	fi->iocachectr--;
+	return true;
+}
+
+/*
+ * Release a request to deny open in caching mode.
+ * Return true if allowing new opens in caching mode.
+ */
+static inline bool fuse_inode_allow_io_cache(struct fuse_inode *fi)
+{
+	assert_spin_locked(&fi->lock);
+	if (WARN_ON(fi->iocachectr >= 0))
+		return false;
+	return ++(fi->iocachectr) == 0;
+}
+
+/*
+ * Return true if allowing new opens in caching mode.
+ */
+static inline bool fuse_is_io_cache_allowed(struct fuse_inode *fi)
+{
+	return READ_ONCE(fi->iocachectr) >= 0;
+}
 
 struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
 				 unsigned int open_flags, bool isdir);
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index e7418d15fe390..66a4bd8d767d4 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -353,6 +353,7 @@ struct fuse_file_lock {
  * FOPEN_STREAM: the file is stream-like (no file position at all)
  * FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
  * FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_CACHE_IO: using cache for this open file (incl. mmap on direct_io)
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
@@ -361,6 +362,7 @@ struct fuse_file_lock {
 #define FOPEN_STREAM		(1 << 4)
 #define FOPEN_NOFLUSH		(1 << 5)
 #define FOPEN_PARALLEL_DIRECT_WRITES	(1 << 6)
+#define FOPEN_CACHE_IO		(1 << 7)
 
 /**
  * INIT request/reply flags
-- 
2.40.1



* Re: [PATCH 2/4] fuse: Create helper function if DIO write needs exclusive lock
  2023-12-24 10:49 ` [PATCH 2/4] fuse: Create helper function if DIO write needs exclusive lock Bernd Schubert
@ 2023-12-24 11:14   ` Amir Goldstein
  0 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2023-12-24 11:14 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-fsdevel, bernd.schubert, miklos, dsingh

On Sun, Dec 24, 2023 at 12:49 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> This makes the code a bit easier to read and makes it easier to
> add further conditions for when an exclusive lock is needed.
>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/fuse/file.c | 64 +++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 45 insertions(+), 19 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 174aa16407c4b..546254aaab19f 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1298,6 +1298,45 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
>         return res;
>  }
>
> +static bool fuse_io_past_eof(struct kiocb *iocb, struct iov_iter *iter)
> +{
> +       struct inode *inode = file_inode(iocb->ki_filp);
> +
> +       return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
> +}
> +
> +/*
> + * @return true if an exclusive lock for direct IO writes is needed
> + */
> +static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from)
> +{
> +       struct file *file = iocb->ki_filp;
> +       struct fuse_file *ff = file->private_data;
> +       struct inode *inode = file_inode(iocb->ki_filp);
> +
> +       /* server side has to advise that it supports parallel dio writes */
> +       if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
> +               return true;
> +
> +       /* append will need to know the eventual eof - always needs an
> +        * exclusive lock
> +        */
> +       if (iocb->ki_flags & IOCB_APPEND)
> +               return true;
> +
> +       /* Combining page access and direct-io is difficult; shared
> +        * locks actually introduce a conflict.
> +        */
> +       if (get_fuse_conn(inode)->direct_io_allow_mmap)
> +               return true;
> +
> +       /* parallel dio beyond eof is at least for now not supported */
> +       if (fuse_io_past_eof(iocb, from))
> +               return true;
> +
> +       return false;
> +}
> +
>  static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>         struct file *file = iocb->ki_filp;
> @@ -1557,26 +1596,12 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
>         return res;
>  }
>
> -static bool fuse_direct_write_extending_i_size(struct kiocb *iocb,
> -                                              struct iov_iter *iter)
> -{
> -       struct inode *inode = file_inode(iocb->ki_filp);
> -
> -       return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
> -}
> -
>  static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>         struct inode *inode = file_inode(iocb->ki_filp);
> -       struct file *file = iocb->ki_filp;
> -       struct fuse_file *ff = file->private_data;
>         struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
>         ssize_t res;
> -       bool exclusive_lock =
> -               !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
> -               get_fuse_conn(inode)->direct_io_allow_mmap ||
> -               iocb->ki_flags & IOCB_APPEND ||
> -               fuse_direct_write_extending_i_size(iocb, from);
> +       bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
>
>         /*
>          * Take exclusive lock if
> @@ -1590,10 +1615,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>         else {
>                 inode_lock_shared(inode);
>
> -               /* A race with truncate might have come up as the decision for
> -                * the lock type was done without holding the lock, check again.
> +               /*
> +                * Previous check was without any lock and might have raced.
>                  */
> -               if (fuse_direct_write_extending_i_size(iocb, from)) {
> +               if (fuse_dio_wr_exclusive_lock(iocb, from)) {
>                         inode_unlock_shared(inode);
>                         inode_lock(inode);
>                         exclusive_lock = true;
> @@ -2467,7 +2492,8 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
>                 return fuse_dax_mmap(file, vma);
>
>         if (ff->open_flags & FOPEN_DIRECT_IO) {
> -               /* Can't provide the coherency needed for MAP_SHARED
> +               /*
> +                * Can't provide the coherency needed for MAP_SHARED
>                  * if FUSE_DIRECT_IO_ALLOW_MMAP isn't set.
>                  */
>                 if ((vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_allow_mmap)
> --
> 2.40.1
>


* Re: [PATCH 3/4] fuse: Add fuse_dio_lock/unlock helper functions
  2023-12-24 10:49 ` [PATCH 3/4] fuse: Add fuse_dio_lock/unlock helper functions Bernd Schubert
@ 2023-12-24 11:15   ` Amir Goldstein
  0 siblings, 0 replies; 7+ messages in thread
From: Amir Goldstein @ 2023-12-24 11:15 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: linux-fsdevel, bernd.schubert, miklos, dsingh

On Sun, Dec 24, 2023 at 12:49 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> So far these are just helpers to move the complex locking
> logic out of fuse_direct_write_iter. They are especially needed
> by the next patch in the series, which adds the fuse inode
> cache IO mode and brings in even more locking complexity.
>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>

> ---
>  fs/fuse/file.c | 61 ++++++++++++++++++++++++++++----------------------
>  1 file changed, 34 insertions(+), 27 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 546254aaab19f..abc93415ec7e3 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1337,6 +1337,37 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
>         return false;
>  }
>
> +static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
> +                         bool *exclusive)
> +{
> +       struct inode *inode = file_inode(iocb->ki_filp);
> +
> +       *exclusive = fuse_dio_wr_exclusive_lock(iocb, from);
> +       if (*exclusive) {
> +               inode_lock(inode);
> +       } else {
> +               inode_lock_shared(inode);
> +               /*
> +                * Previous check was without inode lock and might have raced,
> +                * check again.
> +                */
> +               if (fuse_io_past_eof(iocb, from)) {
> +                       inode_unlock_shared(inode);
> +                       inode_lock(inode);
> +                       *exclusive = true;
> +               }
> +       }
> +}
> +
> +static void fuse_dio_unlock(struct inode *inode, bool exclusive)
> +{
> +       if (exclusive) {
> +               inode_unlock(inode);
> +       } else {
> +               inode_unlock_shared(inode);
> +       }
> +}
> +
>  static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  {
>         struct file *file = iocb->ki_filp;
> @@ -1601,30 +1632,9 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>         struct inode *inode = file_inode(iocb->ki_filp);
>         struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
>         ssize_t res;
> -       bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
> -
> -       /*
> -        * Take exclusive lock if
> -        * - Parallel direct writes are disabled - a user space decision
> -        * - Parallel direct writes are enabled and i_size is being extended.
> -        * - Shared mmap on direct_io file is supported (FUSE_DIRECT_IO_ALLOW_MMAP).
> -        *   This might not be needed at all, but needs further investigation.
> -        */
> -       if (exclusive_lock)
> -               inode_lock(inode);
> -       else {
> -               inode_lock_shared(inode);
> -
> -               /*
> -                * Previous check was without any lock and might have raced.
> -                */
> -               if (fuse_dio_wr_exclusive_lock(iocb, from)) {
> -                       inode_unlock_shared(inode);
> -                       inode_lock(inode);
> -                       exclusive_lock = true;
> -               }
> -       }
> +       bool exclusive;
>
> +       fuse_dio_lock(iocb, from, &exclusive);
>         res = generic_write_checks(iocb, from);
>         if (res > 0) {
>                 if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) {
> @@ -1635,10 +1645,7 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>                         fuse_write_update_attr(inode, iocb->ki_pos, res);
>                 }
>         }
> -       if (exclusive_lock)
> -               inode_unlock(inode);
> -       else
> -               inode_unlock_shared(inode);
> +       fuse_dio_unlock(inode, exclusive);
>
>         return res;
>  }
> --
> 2.40.1
>


end of thread, other threads:[~2023-12-24 12:23 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-24 10:49 [PATCH 0/4] fuse: inode IO modes and mmap Bernd Schubert
2023-12-24 10:49 ` [PATCH 1/4] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Bernd Schubert
2023-12-24 10:49 ` [PATCH 2/4] fuse: Create helper function if DIO write needs exclusive lock Bernd Schubert
2023-12-24 11:14   ` Amir Goldstein
2023-12-24 10:49 ` [PATCH 3/4] fuse: Add fuse_dio_lock/unlock helper functions Bernd Schubert
2023-12-24 11:15   ` Amir Goldstein
2023-12-24 10:49 ` [PATCH 4/4] fuse: introduce inode io modes Bernd Schubert
