* [PATCH v3 1/9] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
@ 2024-02-08 17:05 ` Amir Goldstein
2024-02-08 17:05 ` [PATCH v3 2/9] fuse: Create helper function if DIO write needs exclusive lock Amir Goldstein
` (7 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:05 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
From: Bernd Schubert <bschubert@ddn.com>
There were multiple issues with direct_io_allow_mmap:
- fuse_link_write_file() was missing, resulting in warnings in
fuse_write_file_get() and EIO from msync()
- vma->vm_ops was not set to &fuse_file_vm_ops, although
fuse_page_mkwrite in particular is needed.
The semantics of invalidate_inode_pages2() in fuse_file_mmap have so far
not been clearly defined. They date back to
commit 3121bfe76311 ("fuse: fix "direct_io" private mmap"),
although at that time, before direct_io_allow_mmap existed, mmap was
possible for MAP_PRIVATE only. As invalidate_inode_pages2() calls into
fuse_launder_folio() and writes out dirty pages, it should be safe to
call it for MAP_PRIVATE and MAP_SHARED alike.
Cc: Hao Xu <howeyxu@tencent.com>
Cc: stable@vger.kernel.org
Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/file.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 148a71b8b4d0..243f469cac07 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2476,7 +2476,10 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
invalidate_inode_pages2(file->f_mapping);
- return generic_file_mmap(file, vma);
+ if (!(vma->vm_flags & VM_MAYSHARE)) {
+ /* MAP_PRIVATE */
+ return generic_file_mmap(file, vma);
+ }
}
if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
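The mmap decision that patch 1/9 arrives at for FOPEN_DIRECT_IO files can be sketched as a small userspace predicate. Everything below is an illustrative stand-in (the flag values, enum, and function names are not the kernel's); only the ordering of the VM_MAYSHARE checks mirrors the patch:

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration only */
#define VM_MAYSHARE 0x1UL

enum mmap_action { MMAP_DENY, MMAP_GENERIC_PRIVATE, MMAP_CACHED_SHARED };

/*
 * Sketch of the fuse_file_mmap() flow for FOPEN_DIRECT_IO:
 * - a shared mapping without FUSE_DIRECT_IO_ALLOW_MMAP is refused
 * - MAP_PRIVATE keeps the legacy generic_file_mmap() path
 * - otherwise fall through to the cached path, which now also sets
 *   vma->vm_ops = &fuse_file_vm_ops and links the write file
 */
static enum mmap_action direct_io_mmap_action(unsigned long vm_flags,
					      bool direct_io_allow_mmap)
{
	if ((vm_flags & VM_MAYSHARE) && !direct_io_allow_mmap)
		return MMAP_DENY;
	if (!(vm_flags & VM_MAYSHARE))
		return MMAP_GENERIC_PRIVATE;	/* MAP_PRIVATE */
	return MMAP_CACHED_SHARED;
}
```

The key point of the fix is that only the MAP_PRIVATE case may return early; shared mappings must reach the cached path so that fuse_page_mkwrite and friends are wired up.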
* [PATCH v3 2/9] fuse: Create helper function if DIO write needs exclusive lock
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
2024-02-08 17:05 ` [PATCH v3 1/9] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Amir Goldstein
@ 2024-02-08 17:05 ` Amir Goldstein
2024-02-09 9:50 ` Miklos Szeredi
2024-02-08 17:05 ` [PATCH v3 3/9] fuse: Add fuse_dio_lock/unlock helper functions Amir Goldstein
` (6 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:05 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
From: Bernd Schubert <bschubert@ddn.com>
This makes the code a bit easier to read and makes it simpler to add
further conditions under which an exclusive lock is needed.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/file.c | 64 +++++++++++++++++++++++++++++++++++---------------
1 file changed, 45 insertions(+), 19 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 243f469cac07..a64ee1392c77 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1299,6 +1299,45 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
return res;
}
+static bool fuse_io_past_eof(struct kiocb *iocb, struct iov_iter *iter)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
+}
+
+/*
+ * @return true if an exclusive lock for direct IO writes is needed
+ */
+static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ struct fuse_file *ff = file->private_data;
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ /* server side has to advise that it supports parallel dio writes */
+ if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
+ return true;
+
+ /*
+ * Append will need to know the eventual EOF - always needs an
+ * exclusive lock.
+ */
+ if (iocb->ki_flags & IOCB_APPEND)
+ return true;
+
+ /*
+ * Combination of page access and direct-io is difficult; shared
+ * locks actually introduce a conflict.
+ */
+ if (get_fuse_conn(inode)->direct_io_allow_mmap)
+ return true;
+
+ /* Parallel dio beyond EOF is not supported, at least for now. */
+ if (fuse_io_past_eof(iocb, from))
+ return true;
+
+ return false;
+}
+
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
@@ -1558,26 +1597,12 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
return res;
}
-static bool fuse_direct_write_extending_i_size(struct kiocb *iocb,
- struct iov_iter *iter)
-{
- struct inode *inode = file_inode(iocb->ki_filp);
-
- return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
-}
-
static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct inode *inode = file_inode(iocb->ki_filp);
- struct file *file = iocb->ki_filp;
- struct fuse_file *ff = file->private_data;
struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
ssize_t res;
- bool exclusive_lock =
- !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
- get_fuse_conn(inode)->direct_io_allow_mmap ||
- iocb->ki_flags & IOCB_APPEND ||
- fuse_direct_write_extending_i_size(iocb, from);
+ bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
/*
* Take exclusive lock if
@@ -1591,10 +1616,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
else {
inode_lock_shared(inode);
- /* A race with truncate might have come up as the decision for
- * the lock type was done without holding the lock, check again.
+ /*
+ * Previous check was without any lock and might have raced.
*/
- if (fuse_direct_write_extending_i_size(iocb, from)) {
+ if (fuse_io_past_eof(iocb, from)) {
inode_unlock_shared(inode);
inode_lock(inode);
exclusive_lock = true;
@@ -2468,7 +2493,8 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
return fuse_dax_mmap(file, vma);
if (ff->open_flags & FOPEN_DIRECT_IO) {
- /* Can't provide the coherency needed for MAP_SHARED
+ /*
+ * Can't provide the coherency needed for MAP_SHARED
* if FUSE_DIRECT_IO_ALLOW_MMAP isn't set.
*/
if ((vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_allow_mmap)
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
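The lock-mode decision that fuse_dio_wr_exclusive_lock() centralizes can be mirrored in plain userspace C. The struct below is a hypothetical stand-in for the kiocb/fuse_file/connection state, not a kernel interface; only the order of the checks is taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for the kernel state consulted by the helper */
struct dio_ctx {
	bool parallel_direct_writes;	/* FOPEN_PARALLEL_DIRECT_WRITES */
	bool append;			/* IOCB_APPEND */
	bool direct_io_allow_mmap;	/* connection flag */
	uint64_t pos, count, i_size;
};

static bool io_past_eof(const struct dio_ctx *c)
{
	return c->pos + c->count > c->i_size;
}

/* Same ordering of checks as fuse_dio_wr_exclusive_lock() */
static bool dio_write_needs_exclusive(const struct dio_ctx *c)
{
	if (!c->parallel_direct_writes)
		return true;	/* server did not opt in */
	if (c->append)
		return true;	/* append must know the eventual EOF */
	if (c->direct_io_allow_mmap)
		return true;	/* page access conflicts with shared locking */
	return io_past_eof(c);	/* extending writes still serialize */
}
```

Collecting the conditions in one predicate is also what lets a later patch in the series (9/9) relax the direct_io_allow_mmap case without touching fuse_direct_write_iter() again.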
* Re: [PATCH v3 2/9] fuse: Create helper function if DIO write needs exclusive lock
2024-02-08 17:05 ` [PATCH v3 2/9] fuse: Create helper function if DIO write needs exclusive lock Amir Goldstein
@ 2024-02-09 9:50 ` Miklos Szeredi
0 siblings, 0 replies; 30+ messages in thread
From: Miklos Szeredi @ 2024-02-09 9:50 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Thu, 8 Feb 2024 at 18:08, Amir Goldstein <amir73il@gmail.com> wrote:
> @@ -2468,7 +2493,8 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
> return fuse_dax_mmap(file, vma);
>
> if (ff->open_flags & FOPEN_DIRECT_IO) {
> - /* Can't provide the coherency needed for MAP_SHARED
> + /*
> + * Can't provide the coherency needed for MAP_SHARED
> * if FUSE_DIRECT_IO_ALLOW_MMAP isn't set.
> */
> if ((vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_allow_mmap)
This last hunk seems to belong in the previous patch.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 3/9] fuse: Add fuse_dio_lock/unlock helper functions
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
2024-02-08 17:05 ` [PATCH v3 1/9] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Amir Goldstein
2024-02-08 17:05 ` [PATCH v3 2/9] fuse: Create helper function if DIO write needs exclusive lock Amir Goldstein
@ 2024-02-08 17:05 ` Amir Goldstein
2024-02-08 17:05 ` [PATCH v3 4/9] fuse: factor out helper fuse_truncate_update_attr() Amir Goldstein
` (5 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:05 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
From: Bernd Schubert <bschubert@ddn.com>
So far this is just a helper to move the complex locking
logic out of fuse_direct_write_iter. It is especially needed
by the next patch in the series, which adds the fuse inode
cache IO mode and brings in even more locking complexity.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/file.c | 61 ++++++++++++++++++++++++++++----------------------
1 file changed, 34 insertions(+), 27 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index a64ee1392c77..3062f4b5a34b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1338,6 +1338,37 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
return false;
}
+static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
+ bool *exclusive)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ *exclusive = fuse_dio_wr_exclusive_lock(iocb, from);
+ if (*exclusive) {
+ inode_lock(inode);
+ } else {
+ inode_lock_shared(inode);
+ /*
+ * Previous check was without inode lock and might have raced,
+ * check again.
+ */
+ if (fuse_io_past_eof(iocb, from)) {
+ inode_unlock_shared(inode);
+ inode_lock(inode);
+ *exclusive = true;
+ }
+ }
+}
+
+static void fuse_dio_unlock(struct inode *inode, bool exclusive)
+{
+ if (exclusive) {
+ inode_unlock(inode);
+ } else {
+ inode_unlock_shared(inode);
+ }
+}
+
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
@@ -1602,30 +1633,9 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file_inode(iocb->ki_filp);
struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
ssize_t res;
- bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
-
- /*
- * Take exclusive lock if
- * - Parallel direct writes are disabled - a user space decision
- * - Parallel direct writes are enabled and i_size is being extended.
- * - Shared mmap on direct_io file is supported (FUSE_DIRECT_IO_ALLOW_MMAP).
- * This might not be needed at all, but needs further investigation.
- */
- if (exclusive_lock)
- inode_lock(inode);
- else {
- inode_lock_shared(inode);
-
- /*
- * Previous check was without any lock and might have raced.
- */
- if (fuse_io_past_eof(iocb, from)) {
- inode_unlock_shared(inode);
- inode_lock(inode);
- exclusive_lock = true;
- }
- }
+ bool exclusive;
+ fuse_dio_lock(iocb, from, &exclusive);
res = generic_write_checks(iocb, from);
if (res > 0) {
if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) {
@@ -1636,10 +1646,7 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
fuse_write_update_attr(inode, iocb->ki_pos, res);
}
}
- if (exclusive_lock)
- inode_unlock(inode);
- else
- inode_unlock_shared(inode);
+ fuse_dio_unlock(inode, exclusive);
return res;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
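The take-shared, re-check, upgrade-to-exclusive dance inside fuse_dio_lock() can be sketched with a toy lock state machine. No real locking happens here (a plain enum stands in for inode_lock()/inode_lock_shared()); the sketch only shows the control flow, and all names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

enum lock_state { UNLOCKED, SHARED, EXCLUSIVE };

struct dio_inode {
	enum lock_state lock;	/* stand-in for the inode rwsem */
	uint64_t i_size;
};

static bool past_eof(const struct dio_inode *in, uint64_t pos, uint64_t count)
{
	return pos + count > in->i_size;
}

/*
 * Mirrors fuse_dio_lock(): take the exclusive lock when the unlocked
 * pre-check asked for it; otherwise take the shared lock and re-check
 * past-EOF, upgrading (drop shared, take exclusive) if the unlocked
 * pre-check raced with a size change. Returns the final exclusivity.
 */
static bool dio_lock(struct dio_inode *in, uint64_t pos, uint64_t count,
		     bool want_exclusive)
{
	if (want_exclusive) {
		in->lock = EXCLUSIVE;		/* inode_lock() */
		return true;
	}
	in->lock = SHARED;			/* inode_lock_shared() */
	if (past_eof(in, pos, count)) {		/* re-check under the lock */
		in->lock = UNLOCKED;		/* inode_unlock_shared() */
		in->lock = EXCLUSIVE;		/* inode_lock() */
		return true;
	}
	return false;
}

static void dio_unlock(struct dio_inode *in, bool exclusive)
{
	(void)exclusive;	/* kernel picks shared vs exclusive unlock */
	in->lock = UNLOCKED;
}
```

The re-check under the shared lock is what closes the race: the unlocked pre-check in fuse_dio_wr_exclusive_lock() may have run against a stale i_size.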
* [PATCH v3 4/9] fuse: factor out helper fuse_truncate_update_attr()
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
` (2 preceding siblings ...)
2024-02-08 17:05 ` [PATCH v3 3/9] fuse: Add fuse_dio_lock/unlock helper functions Amir Goldstein
@ 2024-02-08 17:05 ` Amir Goldstein
2024-02-08 17:05 ` [PATCH v3 5/9] fuse: allocate ff->release_args only if release is needed Amir Goldstein
` (4 subsequent siblings)
8 siblings, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:05 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel
fuse_finish_open() is called from fuse_open_common() and from
fuse_create_open(). In the latter case, the O_TRUNC flag is always
cleared in finish_open() before calling into fuse_finish_open().
Move the bits that update attribute cache post O_TRUNC open into a
helper and call this helper from fuse_open_common() directly.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/file.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3062f4b5a34b..151658692eda 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -205,30 +205,31 @@ void fuse_finish_open(struct inode *inode, struct file *file)
else if (ff->open_flags & FOPEN_NONSEEKABLE)
nonseekable_open(inode, file);
- if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) {
- struct fuse_inode *fi = get_fuse_inode(inode);
-
- spin_lock(&fi->lock);
- fi->attr_version = atomic64_inc_return(&fc->attr_version);
- i_size_write(inode, 0);
- spin_unlock(&fi->lock);
- file_update_time(file);
- fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
- }
if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
fuse_link_write_file(file);
}
+static void fuse_truncate_update_attr(struct inode *inode, struct file *file)
+{
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ spin_lock(&fi->lock);
+ fi->attr_version = atomic64_inc_return(&fc->attr_version);
+ i_size_write(inode, 0);
+ spin_unlock(&fi->lock);
+ file_update_time(file);
+ fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
+}
+
int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
{
struct fuse_mount *fm = get_fuse_mount(inode);
struct fuse_conn *fc = fm->fc;
int err;
- bool is_wb_truncate = (file->f_flags & O_TRUNC) &&
- fc->atomic_o_trunc &&
- fc->writeback_cache;
- bool dax_truncate = (file->f_flags & O_TRUNC) &&
- fc->atomic_o_trunc && FUSE_IS_DAX(inode);
+ bool is_truncate = (file->f_flags & O_TRUNC) && fc->atomic_o_trunc;
+ bool is_wb_truncate = is_truncate && fc->writeback_cache;
+ bool dax_truncate = is_truncate && FUSE_IS_DAX(inode);
if (fuse_is_bad(inode))
return -EIO;
@@ -251,15 +252,18 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
fuse_set_nowrite(inode);
err = fuse_do_open(fm, get_node_id(inode), file, isdir);
- if (!err)
+ if (!err) {
fuse_finish_open(inode, file);
+ if (is_truncate)
+ fuse_truncate_update_attr(inode, file);
+ }
if (is_wb_truncate || dax_truncate)
fuse_release_nowrite(inode);
if (!err) {
struct fuse_file *ff = file->private_data;
- if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC))
+ if (is_truncate)
truncate_pagecache(inode, 0);
else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
invalidate_inode_pages2(inode->i_mapping);
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
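The truncate-related predicates that patch 4/9 derives from a single is_truncate can be shown as a tiny userspace helper (the struct and field names are illustrative stand-ins, not the kernel's):

```c
#include <stdbool.h>

/* Illustrative stand-in for the file/connection state */
struct open_ctx {
	bool o_trunc;		/* file->f_flags & O_TRUNC */
	bool atomic_o_trunc;	/* server handles truncation on open */
	bool writeback_cache;
	bool is_dax;
};

struct trunc_flags {
	bool is_truncate, is_wb_truncate, dax_truncate;
};

/* Same derivation as the reworked fuse_open_common() */
static struct trunc_flags truncate_flags(const struct open_ctx *c)
{
	struct trunc_flags f;

	f.is_truncate = c->o_trunc && c->atomic_o_trunc;
	f.is_wb_truncate = f.is_truncate && c->writeback_cache;
	f.dax_truncate = f.is_truncate && c->is_dax;
	return f;
}
```

With one shared predicate, the new fuse_truncate_update_attr() call and the later truncate_pagecache() key off the same condition instead of each re-testing fc->atomic_o_trunc && (file->f_flags & O_TRUNC).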
* [PATCH v3 5/9] fuse: allocate ff->release_args only if release is needed
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
` (3 preceding siblings ...)
2024-02-08 17:05 ` [PATCH v3 4/9] fuse: factor out helper fuse_truncate_update_attr() Amir Goldstein
@ 2024-02-08 17:05 ` Amir Goldstein
2024-02-09 9:57 ` Miklos Szeredi
2024-02-08 17:06 ` [PATCH v3 6/9] fuse: break up fuse_open_common() Amir Goldstein
` (3 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:05 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel
This removes the need to pass the isdir argument to fuse_file_put().
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/dir.c | 2 +-
fs/fuse/file.c | 69 +++++++++++++++++++++++++++---------------------
fs/fuse/fuse_i.h | 2 +-
3 files changed, 41 insertions(+), 32 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d19cbf34c634..b8fc3a6b87fe 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -630,7 +630,7 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
goto out_err;
err = -ENOMEM;
- ff = fuse_file_alloc(fm);
+ ff = fuse_file_alloc(fm, true);
if (!ff)
goto out_put_forget_req;
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 151658692eda..0ca471c5d184 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -56,7 +56,7 @@ struct fuse_release_args {
struct inode *inode;
};
-struct fuse_file *fuse_file_alloc(struct fuse_mount *fm)
+struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release)
{
struct fuse_file *ff;
@@ -65,11 +65,13 @@ struct fuse_file *fuse_file_alloc(struct fuse_mount *fm)
return NULL;
ff->fm = fm;
- ff->release_args = kzalloc(sizeof(*ff->release_args),
- GFP_KERNEL_ACCOUNT);
- if (!ff->release_args) {
- kfree(ff);
- return NULL;
+ if (release) {
+ ff->release_args = kzalloc(sizeof(*ff->release_args),
+ GFP_KERNEL_ACCOUNT);
+ if (!ff->release_args) {
+ kfree(ff);
+ return NULL;
+ }
}
INIT_LIST_HEAD(&ff->write_entry);
@@ -105,14 +107,14 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
kfree(ra);
}
-static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
+static void fuse_file_put(struct fuse_file *ff, bool sync)
{
if (refcount_dec_and_test(&ff->count)) {
- struct fuse_args *args = &ff->release_args->args;
+ struct fuse_release_args *ra = ff->release_args;
+ struct fuse_args *args = (ra ? &ra->args : NULL);
- if (isdir ? ff->fm->fc->no_opendir : ff->fm->fc->no_open) {
- /* Do nothing when client does not implement 'open' */
- fuse_release_end(ff->fm, args, 0);
+ if (!args) {
+ /* Do nothing when server does not implement 'open' */
} else if (sync) {
fuse_simple_request(ff->fm, args);
fuse_release_end(ff->fm, args, 0);
@@ -132,15 +134,16 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
struct fuse_conn *fc = fm->fc;
struct fuse_file *ff;
int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
+ int noopen = isdir ? fc->no_opendir : fc->no_open;
- ff = fuse_file_alloc(fm);
+ ff = fuse_file_alloc(fm, !noopen);
if (!ff)
return ERR_PTR(-ENOMEM);
ff->fh = 0;
/* Default for no-open */
ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
- if (isdir ? !fc->no_opendir : !fc->no_open) {
+ if (!noopen) {
struct fuse_open_out outarg;
int err;
@@ -148,11 +151,13 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
if (!err) {
ff->fh = outarg.fh;
ff->open_flags = outarg.open_flags;
-
} else if (err != -ENOSYS) {
fuse_file_free(ff);
return ERR_PTR(err);
} else {
+ /* No release needed */
+ kfree(ff->release_args);
+ ff->release_args = NULL;
if (isdir)
fc->no_opendir = 1;
else
@@ -278,7 +283,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
}
static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
- unsigned int flags, int opcode)
+ unsigned int flags, int opcode, bool sync)
{
struct fuse_conn *fc = ff->fm->fc;
struct fuse_release_args *ra = ff->release_args;
@@ -296,6 +301,9 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
wake_up_interruptible_all(&ff->poll_wait);
+ if (!ra)
+ return;
+
ra->inarg.fh = ff->fh;
ra->inarg.flags = flags;
ra->args.in_numargs = 1;
@@ -305,6 +313,13 @@ static void fuse_prepare_release(struct fuse_inode *fi, struct fuse_file *ff,
ra->args.nodeid = ff->nodeid;
ra->args.force = true;
ra->args.nocreds = true;
+
+ /*
+ * Hold inode until release is finished.
+ * From fuse_sync_release() the refcount is 1 and everything's
+ * synchronous, so we are fine with not doing igrab() here.
+ */
+ ra->inode = sync ? NULL : igrab(&fi->inode);
}
void fuse_file_release(struct inode *inode, struct fuse_file *ff,
@@ -314,14 +329,12 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
struct fuse_release_args *ra = ff->release_args;
int opcode = isdir ? FUSE_RELEASEDIR : FUSE_RELEASE;
- fuse_prepare_release(fi, ff, open_flags, opcode);
+ fuse_prepare_release(fi, ff, open_flags, opcode, false);
- if (ff->flock) {
+ if (ra && ff->flock) {
ra->inarg.release_flags |= FUSE_RELEASE_FLOCK_UNLOCK;
ra->inarg.lock_owner = fuse_lock_owner_id(ff->fm->fc, id);
}
- /* Hold inode until release is finished */
- ra->inode = igrab(inode);
/*
* Normally this will send the RELEASE request, however if
@@ -332,7 +345,7 @@ void fuse_file_release(struct inode *inode, struct fuse_file *ff,
* synchronous RELEASE is allowed (and desirable) in this case
* because the server can be trusted not to screw up.
*/
- fuse_file_put(ff, ff->fm->fc->destroy, isdir);
+ fuse_file_put(ff, ff->fm->fc->destroy);
}
void fuse_release_common(struct file *file, bool isdir)
@@ -367,12 +380,8 @@ void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
unsigned int flags)
{
WARN_ON(refcount_read(&ff->count) > 1);
- fuse_prepare_release(fi, ff, flags, FUSE_RELEASE);
- /*
- * iput(NULL) is a no-op and since the refcount is 1 and everything's
- * synchronous, we are fine with not doing igrab() here"
- */
- fuse_file_put(ff, true, false);
+ fuse_prepare_release(fi, ff, flags, FUSE_RELEASE, true);
+ fuse_file_put(ff, true);
}
EXPORT_SYMBOL_GPL(fuse_sync_release);
@@ -929,7 +938,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
put_page(page);
}
if (ia->ff)
- fuse_file_put(ia->ff, false, false);
+ fuse_file_put(ia->ff, false);
fuse_io_free(ia);
}
@@ -1703,7 +1712,7 @@ static void fuse_writepage_free(struct fuse_writepage_args *wpa)
__free_page(ap->pages[i]);
if (wpa->ia.ff)
- fuse_file_put(wpa->ia.ff, false, false);
+ fuse_file_put(wpa->ia.ff, false);
kfree(ap->pages);
kfree(wpa);
@@ -1945,7 +1954,7 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
ff = __fuse_write_file_get(fi);
err = fuse_flush_times(inode, ff);
if (ff)
- fuse_file_put(ff, false, false);
+ fuse_file_put(ff, false);
return err;
}
@@ -2343,7 +2352,7 @@ static int fuse_writepages(struct address_space *mapping,
fuse_writepages_send(&data);
}
if (data.ff)
- fuse_file_put(data.ff, false, false);
+ fuse_file_put(data.ff, false);
kfree(data.orig_pages);
out:
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1df83eebda92..daf7036cd692 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1036,7 +1036,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
*/
int fuse_open_common(struct inode *inode, struct file *file, bool isdir);
-struct fuse_file *fuse_file_alloc(struct fuse_mount *fm);
+struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release);
void fuse_file_free(struct fuse_file *ff);
void fuse_finish_open(struct inode *inode, struct file *file);
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
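The allocate-only-when-needed scheme of patch 5/9 can be sketched in userspace: after the patch, a NULL release_args encodes "no RELEASE will be sent", which is what fuse_file_put() and fuse_prepare_release() test. The structs below are simplified stand-ins, not the kernel's:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel structs, illustration only */
struct release_args { int fh; };

struct file_handle {
	struct release_args *release_args;	/* NULL: no RELEASE needed */
};

/*
 * Mirrors fuse_file_alloc(fm, release): only allocate the release
 * arguments when a RELEASE request will actually be sent.
 */
static struct file_handle *file_alloc(bool release)
{
	struct file_handle *ff = calloc(1, sizeof(*ff));

	if (!ff)
		return NULL;
	if (release) {
		ff->release_args = calloc(1, sizeof(*ff->release_args));
		if (!ff->release_args) {
			free(ff);
			return NULL;
		}
	}
	return ff;
}

/*
 * Mirrors fuse_file_put(): a NULL release_args short-circuits the
 * RELEASE request, replacing the old no_open/no_opendir checks.
 */
static bool file_put_sends_release(struct file_handle *ff)
{
	bool sends = ff->release_args != NULL;

	free(ff->release_args);
	free(ff);
	return sends;
}
```

A single NULL test at put time replaces the per-call "isdir ? fc->no_opendir : fc->no_open" dispatch.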
* Re: [PATCH v3 5/9] fuse: allocate ff->release_args only if release is needed
2024-02-08 17:05 ` [PATCH v3 5/9] fuse: allocate ff->release_args only if release is needed Amir Goldstein
@ 2024-02-09 9:57 ` Miklos Szeredi
2024-02-09 11:26 ` Bernd Schubert
0 siblings, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2024-02-09 9:57 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel
On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> @@ -132,15 +134,16 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
> struct fuse_conn *fc = fm->fc;
> struct fuse_file *ff;
> int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
> + int noopen = isdir ? fc->no_opendir : fc->no_open;
bool?
>
> - ff = fuse_file_alloc(fm);
> + ff = fuse_file_alloc(fm, !noopen);
> if (!ff)
> return ERR_PTR(-ENOMEM);
>
> ff->fh = 0;
> /* Default for no-open */
> ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
> - if (isdir ? !fc->no_opendir : !fc->no_open) {
> + if (!noopen) {
I think this would be more readable without the double negation, i.e
if the bool variable was called e.g. "open".
Thanks,
Miklos
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 5/9] fuse: allocate ff->release_args only if release is needed
2024-02-09 9:57 ` Miklos Szeredi
@ 2024-02-09 11:26 ` Bernd Schubert
0 siblings, 0 replies; 30+ messages in thread
From: Bernd Schubert @ 2024-02-09 11:26 UTC (permalink / raw)
To: Miklos Szeredi, Amir Goldstein; +Cc: linux-fsdevel
On 2/9/24 10:57, Miklos Szeredi wrote:
> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> @@ -132,15 +134,16 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
>> struct fuse_conn *fc = fm->fc;
>> struct fuse_file *ff;
>> int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
>> + int noopen = isdir ? fc->no_opendir : fc->no_open;
>
> bool?
>
>>
>> - ff = fuse_file_alloc(fm);
>> + ff = fuse_file_alloc(fm, !noopen);
>> if (!ff)
>> return ERR_PTR(-ENOMEM);
>>
>> ff->fh = 0;
>> /* Default for no-open */
>> ff->open_flags = FOPEN_KEEP_CACHE | (isdir ? FOPEN_CACHE_DIR : 0);
>> - if (isdir ? !fc->no_opendir : !fc->no_open) {
>> + if (!noopen) {
>
> I think this would be more readable without the double negation, i.e
> if the bool variable was called e.g. "open".
We can change to bool. I had _thought_ about suggesting, in the internal
review round, to use
bool is_open = isdir ? !fc->no_opendir : !fc->no_open;
But then we have double negations in the initialization, and to me that
didn't seem much better than having them below (though twice in that
case). No issue if you prefer it that way, I guess.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 6/9] fuse: break up fuse_open_common()
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
` (4 preceding siblings ...)
2024-02-08 17:05 ` [PATCH v3 5/9] fuse: allocate ff->release_args only if release is needed Amir Goldstein
@ 2024-02-08 17:06 ` Amir Goldstein
2024-02-09 9:59 ` Miklos Szeredi
2024-02-08 17:06 ` [PATCH v3 7/9] fuse: prepare for failing open response Amir Goldstein
` (2 subsequent siblings)
8 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:06 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel
fuse_open_common() has a lot of code relevant only for regular files and
O_TRUNC in particular.
Copy the little bit of remaining code into fuse_dir_open() and stop using
this common helper for directory open.
Also split out fuse_dir_finish_open() from fuse_finish_open() before we
add inode io modes to fuse_finish_open().
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/dir.c | 26 +++++++++++++++++++++++++-
fs/fuse/file.c | 9 ++-------
fs/fuse/fuse_i.h | 2 --
3 files changed, 27 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index b8fc3a6b87fe..ff324be72abd 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1628,9 +1628,33 @@ static const char *fuse_get_link(struct dentry *dentry, struct inode *inode,
return ERR_PTR(err);
}
+static void fuse_dir_finish_open(struct inode *inode, struct file *file)
+{
+ struct fuse_file *ff = file->private_data;
+
+ if (ff->open_flags & FOPEN_STREAM)
+ stream_open(inode, file);
+ else if (ff->open_flags & FOPEN_NONSEEKABLE)
+ nonseekable_open(inode, file);
+}
+
static int fuse_dir_open(struct inode *inode, struct file *file)
{
- return fuse_open_common(inode, file, true);
+ struct fuse_mount *fm = get_fuse_mount(inode);
+ int err;
+
+ if (fuse_is_bad(inode))
+ return -EIO;
+
+ err = generic_file_open(inode, file);
+ if (err)
+ return err;
+
+ err = fuse_do_open(fm, get_node_id(inode), file, true);
+ if (!err)
+ fuse_dir_finish_open(inode, file);
+
+ return err;
}
static int fuse_dir_release(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0ca471c5d184..84b35bbf22ac 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -227,7 +227,7 @@ static void fuse_truncate_update_attr(struct inode *inode, struct file *file)
fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
}
-int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
+static int fuse_open(struct inode *inode, struct file *file)
{
struct fuse_mount *fm = get_fuse_mount(inode);
struct fuse_conn *fc = fm->fc;
@@ -256,7 +256,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
if (is_wb_truncate || dax_truncate)
fuse_set_nowrite(inode);
- err = fuse_do_open(fm, get_node_id(inode), file, isdir);
+ err = fuse_do_open(fm, get_node_id(inode), file, false);
if (!err) {
fuse_finish_open(inode, file);
if (is_truncate)
@@ -354,11 +354,6 @@ void fuse_release_common(struct file *file, bool isdir)
(fl_owner_t) file, isdir);
}
-static int fuse_open(struct inode *inode, struct file *file)
-{
- return fuse_open_common(inode, file, false);
-}
-
static int fuse_release(struct inode *inode, struct file *file)
{
struct fuse_conn *fc = get_fuse_conn(inode);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index daf7036cd692..5fe096820e97 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1034,8 +1034,6 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
/**
* Send OPEN or OPENDIR request
*/
-int fuse_open_common(struct inode *inode, struct file *file, bool isdir);
-
struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release);
void fuse_file_free(struct fuse_file *ff);
void fuse_finish_open(struct inode *inode, struct file *file);
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v3 6/9] fuse: break up fuse_open_common()
2024-02-08 17:06 ` [PATCH v3 6/9] fuse: break up fuse_open_common() Amir Goldstein
@ 2024-02-09 9:59 ` Miklos Szeredi
0 siblings, 0 replies; 30+ messages in thread
From: Miklos Szeredi @ 2024-02-09 9:59 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel
On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -1034,8 +1034,6 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
> /**
> * Send OPEN or OPENDIR request
> */
> -int fuse_open_common(struct inode *inode, struct file *file, bool isdir);
> -
Comment above declaration needs to be removed as well.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 7/9] fuse: prepare for failing open response
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
` (5 preceding siblings ...)
2024-02-08 17:06 ` [PATCH v3 6/9] fuse: break up fuse_open_common() Amir Goldstein
@ 2024-02-08 17:06 ` Amir Goldstein
2024-02-08 17:06 ` [PATCH v3 8/9] fuse: introduce inode io modes Amir Goldstein
2024-02-08 17:06 ` [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP Amir Goldstein
8 siblings, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:06 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel
In preparation for inode io modes, a server open response could fail
due to conflicting inode io modes.
Allow returning an error from fuse_finish_open() and handle the error
in the callers.
fuse_finish_open() is used as the callback of finish_open(), so that
FMODE_OPENED will not be set if fuse_finish_open() fails.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/dir.c | 8 +++++---
fs/fuse/file.c | 15 ++++++++++-----
fs/fuse/fuse_i.h | 2 +-
3 files changed, 16 insertions(+), 9 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index ff324be72abd..ea635c17572a 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -692,13 +692,15 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
d_instantiate(entry, inode);
fuse_change_entry_timeout(entry, &outentry);
fuse_dir_changed(dir);
- err = finish_open(file, entry, generic_file_open);
+ err = generic_file_open(inode, file);
+ if (!err) {
+ file->private_data = ff;
+ err = finish_open(file, entry, fuse_finish_open);
+ }
if (err) {
fi = get_fuse_inode(inode);
fuse_sync_release(fi, ff, flags);
} else {
- file->private_data = ff;
- fuse_finish_open(inode, file);
if (fm->fc->atomic_o_trunc && trunc)
truncate_pagecache(inode, 0);
else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 84b35bbf22ac..52181c69a527 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -200,7 +200,7 @@ static void fuse_link_write_file(struct file *file)
spin_unlock(&fi->lock);
}
-void fuse_finish_open(struct inode *inode, struct file *file)
+int fuse_finish_open(struct inode *inode, struct file *file)
{
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = get_fuse_conn(inode);
@@ -212,6 +212,8 @@ void fuse_finish_open(struct inode *inode, struct file *file)
if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
fuse_link_write_file(file);
+
+ return 0;
}
static void fuse_truncate_update_attr(struct inode *inode, struct file *file)
@@ -230,7 +232,9 @@ static void fuse_truncate_update_attr(struct inode *inode, struct file *file)
static int fuse_open(struct inode *inode, struct file *file)
{
struct fuse_mount *fm = get_fuse_mount(inode);
+ struct fuse_inode *fi = get_fuse_inode(inode);
struct fuse_conn *fc = fm->fc;
+ struct fuse_file *ff;
int err;
bool is_truncate = (file->f_flags & O_TRUNC) && fc->atomic_o_trunc;
bool is_wb_truncate = is_truncate && fc->writeback_cache;
@@ -258,16 +262,17 @@ static int fuse_open(struct inode *inode, struct file *file)
err = fuse_do_open(fm, get_node_id(inode), file, false);
if (!err) {
- fuse_finish_open(inode, file);
- if (is_truncate)
+ ff = file->private_data;
+ err = fuse_finish_open(inode, file);
+ if (err)
+ fuse_sync_release(fi, ff, file->f_flags);
+ else if (is_truncate)
fuse_truncate_update_attr(inode, file);
}
if (is_wb_truncate || dax_truncate)
fuse_release_nowrite(inode);
if (!err) {
- struct fuse_file *ff = file->private_data;
-
if (is_truncate)
truncate_pagecache(inode, 0);
else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 5fe096820e97..9ad5f882bd0a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1036,7 +1036,7 @@ void fuse_read_args_fill(struct fuse_io_args *ia, struct file *file, loff_t pos,
*/
struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release);
void fuse_file_free(struct fuse_file *ff);
-void fuse_finish_open(struct inode *inode, struct file *file);
+int fuse_finish_open(struct inode *inode, struct file *file);
void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
unsigned int flags);
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* [PATCH v3 8/9] fuse: introduce inode io modes
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
` (6 preceding siblings ...)
2024-02-08 17:06 ` [PATCH v3 7/9] fuse: prepare for failing open response Amir Goldstein
@ 2024-02-08 17:06 ` Amir Goldstein
2024-02-09 10:21 ` Miklos Szeredi
2024-02-08 17:06 ` [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP Amir Goldstein
8 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:06 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
The fuse inode io mode is determined by the mode of its open files/mmaps
and parallel dio opens, and is expressed in the value of fi->iocachectr:
> 0 - caching io: files open in caching mode or mmap on direct_io file
< 0 - parallel dio: direct io mode with parallel dio writes enabled
== 0 - direct io: no files open in caching mode and no files mmapped
Note that an iocachectr value of 0 may become positive or negative while
non-parallel dio is being processed.
We use an internal FOPEN_CACHE_IO flag to mark a file that was opened in
caching mode. This flag cannot be set by the server.
direct_io mmap uses the page cache, so the first mmap will mark the file
as FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and the inode will
enter the caching io mode.
If the server opens the file in caching mode while it is already open
for parallel dio, or vice versa, the open fails.
This allows executing parallel dio when the inode is not in caching mode
and no mmaps have been performed on the inode in question.
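The three iocachectr states and their transitions can be sketched as a
small userspace model (illustrative, unlocked code; the kernel performs
these transitions under fi->lock and returns -ETXTBSY on conflict):

```c
#include <assert.h>

/* Modeled fi->iocachectr: > 0 caching io, < 0 parallel dio, == 0 neutral. */
struct io_mode { int iocachectr; };

/* A caching open is refused while parallel dio writes are enabled. */
static int get_io_cache(struct io_mode *m)
{
	if (m->iocachectr < 0)
		return -1;	/* stands in for -ETXTBSY */
	m->iocachectr++;
	return 0;
}

static void put_io_cache(struct io_mode *m)
{
	m->iocachectr--;
}

/* Parallel dio is refused while any caching open exists. */
static int deny_io_cache(struct io_mode *m)
{
	if (m->iocachectr > 0)
		return -1;
	m->iocachectr--;
	return 0;
}

static void allow_io_cache(struct io_mode *m)
{
	m->iocachectr++;
}
```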
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/Makefile | 1 +
fs/fuse/file.c | 17 +++
fs/fuse/fuse_i.h | 21 +++-
fs/fuse/iomode.c | 218 ++++++++++++++++++++++++++++++++++++++
include/uapi/linux/fuse.h | 2 +
5 files changed, 257 insertions(+), 2 deletions(-)
create mode 100644 fs/fuse/iomode.c
diff --git a/fs/fuse/Makefile b/fs/fuse/Makefile
index 0c48b35c058d..b734cc2a5e65 100644
--- a/fs/fuse/Makefile
+++ b/fs/fuse/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CUSE) += cuse.o
obj-$(CONFIG_VIRTIO_FS) += virtiofs.o
fuse-y := dev.o dir.o file.o inode.o control.o xattr.o acl.o readdir.o ioctl.o
+fuse-y += iomode.o
fuse-$(CONFIG_FUSE_DAX) += dax.o
virtiofs-y := virtio_fs.o
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 52181c69a527..29e18e5a6f6c 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -113,6 +113,9 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
struct fuse_release_args *ra = ff->release_args;
struct fuse_args *args = (ra ? &ra->args : NULL);
+ if (ra && ra->inode)
+ fuse_file_io_release(ff, ra->inode);
+
if (!args) {
/* Do nothing when server does not implement 'open' */
} else if (sync) {
@@ -204,6 +207,11 @@ int fuse_finish_open(struct inode *inode, struct file *file)
{
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = get_fuse_conn(inode);
+ int err;
+
+ err = fuse_file_io_open(file, inode);
+ if (err)
+ return err;
if (ff->open_flags & FOPEN_STREAM)
stream_open(inode, file);
@@ -2507,6 +2515,7 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
{
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = ff->fm->fc;
+ int rc;
/* DAX mmap is superior to direct_io mmap */
if (FUSE_IS_DAX(file_inode(file)))
@@ -2522,6 +2531,13 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
invalidate_inode_pages2(file->f_mapping);
+ /*
+ * First mmap of direct_io file enters caching inode io mode.
+ */
+ rc = fuse_file_io_mmap(ff, file_inode(file));
+ if (rc)
+ return rc;
+
if (!(vma->vm_flags & VM_MAYSHARE)) {
/* MAP_PRIVATE */
return generic_file_mmap(file, vma);
@@ -3294,6 +3310,7 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
INIT_LIST_HEAD(&fi->write_files);
INIT_LIST_HEAD(&fi->queued_writes);
fi->writectr = 0;
+ fi->iocachectr = 0;
init_waitqueue_head(&fi->page_waitq);
fi->writepages = RB_ROOT;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 9ad5f882bd0a..5e5465f6a1ac 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -111,7 +111,7 @@ struct fuse_inode {
u64 attr_version;
union {
- /* Write related fields (regular file only) */
+ /* read/write io cache (regular file only) */
struct {
/* Files usable in writepage. Protected by fi->lock */
struct list_head write_files;
@@ -123,6 +123,9 @@ struct fuse_inode {
* (FUSE_NOWRITE) means more writes are blocked */
int writectr;
+ /** Number of files/maps using page cache */
+ int iocachectr;
+
/* Waitq for writepage completion */
wait_queue_head_t page_waitq;
@@ -187,6 +190,8 @@ enum {
FUSE_I_BAD,
/* Has btime */
FUSE_I_BTIME,
+ /* Wants or already has page cache IO */
+ FUSE_I_CACHE_IO_MODE,
};
struct fuse_conn;
@@ -246,6 +251,9 @@ struct fuse_file {
/** Has flock been performed on this file? */
bool flock:1;
+
+ /** Was file opened for io? */
+ bool io_opened:1;
};
/** One input argument of a request */
@@ -1346,8 +1354,17 @@ int fuse_fileattr_get(struct dentry *dentry, struct fileattr *fa);
int fuse_fileattr_set(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa);
-/* file.c */
+/* iomode.c */
+int fuse_file_cached_io_start(struct inode *inode);
+void fuse_file_cached_io_end(struct inode *inode);
+int fuse_file_uncached_io_start(struct inode *inode);
+void fuse_file_uncached_io_end(struct inode *inode);
+int fuse_file_io_open(struct file *file, struct inode *inode);
+int fuse_file_io_mmap(struct fuse_file *ff, struct inode *inode);
+void fuse_file_io_release(struct fuse_file *ff, struct inode *inode);
+
+/* file.c */
struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
unsigned int open_flags, bool isdir);
void fuse_file_release(struct inode *inode, struct fuse_file *ff,
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
new file mode 100644
index 000000000000..13faae77aec4
--- /dev/null
+++ b/fs/fuse/iomode.c
@@ -0,0 +1,218 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * FUSE inode io modes.
+ *
+ * Copyright (c) 2024 CTERA Networks.
+ */
+
+#include "fuse_i.h"
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+
+/*
+ * Request an open in caching mode.
+ * Return 0 if in caching mode.
+ */
+static int fuse_inode_get_io_cache(struct fuse_inode *fi)
+{
+ assert_spin_locked(&fi->lock);
+ if (fi->iocachectr < 0)
+ return -ETXTBSY;
+ if (fi->iocachectr++ == 0)
+ set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+ return 0;
+}
+
+/*
+ * Release an open in caching mode.
+ * Return 0 if in neutral (direct io) mode.
+ */
+static int fuse_inode_put_io_cache(struct fuse_inode *fi)
+{
+ assert_spin_locked(&fi->lock);
+ if (WARN_ON(fi->iocachectr <= 0))
+ return -EIO;
+ if (--fi->iocachectr == 0)
+ clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+ return fi->iocachectr;
+}
+
+/*
+ * Request to deny new opens in caching mode.
+ * Return 0 if denying new opens in caching mode.
+ */
+static int fuse_inode_deny_io_cache(struct fuse_inode *fi)
+{
+ assert_spin_locked(&fi->lock);
+ if (fi->iocachectr > 0)
+ return -ETXTBSY;
+ fi->iocachectr--;
+ return 0;
+}
+
+/*
+ * Release a request to deny open in caching mode.
+ * Return 0 if allowing new opens in caching mode.
+ */
+static int fuse_inode_allow_io_cache(struct fuse_inode *fi)
+{
+ assert_spin_locked(&fi->lock);
+ if (WARN_ON(fi->iocachectr >= 0))
+ return -EIO;
+ fi->iocachectr++;
+ return -fi->iocachectr;
+}
+
+/* Start cached io mode where parallel dio writes are not allowed */
+int fuse_file_cached_io_start(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ int err;
+
+ spin_lock(&fi->lock);
+ err = fuse_inode_get_io_cache(fi);
+ spin_unlock(&fi->lock);
+ return err;
+}
+
+void fuse_file_cached_io_end(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ spin_lock(&fi->lock);
+ fuse_inode_put_io_cache(get_fuse_inode(inode));
+ spin_unlock(&fi->lock);
+}
+
+/* Start strictly uncached io mode where cache access is not allowed */
+int fuse_file_uncached_io_start(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ int err;
+
+ spin_lock(&fi->lock);
+ err = fuse_inode_deny_io_cache(fi);
+ spin_unlock(&fi->lock);
+ return err;
+}
+
+void fuse_file_uncached_io_end(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ spin_lock(&fi->lock);
+ fuse_inode_allow_io_cache(fi);
+ spin_unlock(&fi->lock);
+}
+
+/* Open flags to determine regular file io mode */
+#define FOPEN_IO_MODE_MASK \
+ (FOPEN_DIRECT_IO | FOPEN_CACHE_IO)
+
+/* Request access to submit new io to inode via open file */
+int fuse_file_io_open(struct file *file, struct inode *inode)
+{
+ struct fuse_file *ff = file->private_data;
+ int iomode_flags = ff->open_flags & FOPEN_IO_MODE_MASK;
+ int err;
+
+ err = -EBADF;
+ if (WARN_ON(!S_ISREG(inode->i_mode)))
+ goto fail;
+
+ err = -EBUSY;
+ if (WARN_ON(ff->io_opened))
+ goto fail;
+
+ /*
+ * io modes are not relevant with DAX and with server that does not
+ * implement open.
+ */
+ err = -EINVAL;
+ if (FUSE_IS_DAX(inode) || !ff->release_args) {
+ if (iomode_flags)
+ goto fail;
+ return 0;
+ }
+
+ /*
+ * FOPEN_CACHE_IO is an internal flag that is set on a file not opened
+ * in direct io mode; it cannot be set explicitly by the server.
+ * This includes a file opened with O_DIRECT where the server did not
+ * specify FOPEN_DIRECT_IO. A later fcntl() could remove O_DIRECT,
+ * so we put the inode in caching mode to prevent parallel dio.
+ * FOPEN_PARALLEL_DIRECT_WRITES requires FOPEN_DIRECT_IO.
+ */
+ if (ff->open_flags & FOPEN_CACHE_IO) {
+ goto fail;
+ } else if (!(ff->open_flags & FOPEN_DIRECT_IO)) {
+ ff->open_flags |= FOPEN_CACHE_IO;
+ ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES;
+ }
+
+ /*
+ * First caching file open enters caching inode io mode.
+ * First parallel dio open denies caching inode io mode.
+ */
+ err = 0;
+ if (ff->open_flags & FOPEN_CACHE_IO)
+ err = fuse_file_cached_io_start(inode);
+ else if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)
+ err = fuse_file_uncached_io_start(inode);
+ if (err)
+ goto fail;
+
+ ff->io_opened = true;
+ return 0;
+
+fail:
+ pr_debug("failed to open file in requested io mode (open_flags=0x%x, err=%i).\n",
+ ff->open_flags, err);
+ /*
+ * The file open mode determines the inode io mode.
+ * Using incorrect open mode is a server mistake, which results in
+ * user visible failure of open() with EIO error.
+ */
+ return -EIO;
+}
+
+/* Request access to submit new io to inode via mmap */
+int fuse_file_io_mmap(struct fuse_file *ff, struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ int err = 0;
+
+ /* There are no io modes if server does not implement open */
+ if (!ff->release_args)
+ return 0;
+
+ if (WARN_ON(!ff->io_opened))
+ return -ENODEV;
+
+ spin_lock(&fi->lock);
+ /* First mmap of direct_io file enters caching inode io mode */
+ if (!(ff->open_flags & FOPEN_CACHE_IO)) {
+ err = fuse_inode_get_io_cache(fi);
+ if (!err)
+ ff->open_flags |= FOPEN_CACHE_IO;
+ }
+ spin_unlock(&fi->lock);
+
+ return err;
+}
+
+/* No more pending io and no new io possible to inode via open/mmapped file */
+void fuse_file_io_release(struct fuse_file *ff, struct inode *inode)
+{
+ if (!ff->io_opened)
+ return;
+
+ /* Last caching file close exits caching inode io mode */
+ if (ff->open_flags & FOPEN_CACHE_IO)
+ fuse_file_cached_io_end(inode);
+
+ ff->io_opened = false;
+}
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index e7418d15fe39..1f452d9a5024 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -353,6 +353,7 @@ struct fuse_file_lock {
* FOPEN_STREAM: the file is stream-like (no file position at all)
* FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
* FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_CACHE_IO: internal flag for mmap of direct_io (reserved for future use)
*/
#define FOPEN_DIRECT_IO (1 << 0)
#define FOPEN_KEEP_CACHE (1 << 1)
@@ -361,6 +362,7 @@ struct fuse_file_lock {
#define FOPEN_STREAM (1 << 4)
#define FOPEN_NOFLUSH (1 << 5)
#define FOPEN_PARALLEL_DIRECT_WRITES (1 << 6)
+#define FOPEN_CACHE_IO (1 << 7)
/**
* INIT request/reply flags
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v3 8/9] fuse: introduce inode io modes
2024-02-08 17:06 ` [PATCH v3 8/9] fuse: introduce inode io modes Amir Goldstein
@ 2024-02-09 10:21 ` Miklos Szeredi
2024-02-09 10:35 ` Amir Goldstein
0 siblings, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2024-02-09 10:21 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> We use an internal FOPEN_CACHE_IO flag to mark a file that was open in
> caching mode. This flag can not be set by the server.
>
> direct_io mmap uses page cache, so first mmap will mark the file as
> FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter
> the caching io mode.
I think using FOPEN_CACHE_IO for this is wrong. Why not just an internal flag?
> +/* No more pending io and no new io possible to inode via open/mmapped file */
> +void fuse_file_io_release(struct fuse_file *ff, struct inode *inode)
> +{
> + if (!ff->io_opened)
> + return;
> +
> + /* Last caching file close exits caching inode io mode */
> + if (ff->open_flags & FOPEN_CACHE_IO)
> + fuse_file_cached_io_end(inode);
The above is the only place where io_opened is checked without a
WARN_ON. So can't this be incorporated into the flag indicating
whether caching is on?
Thanks,
Miklos
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 8/9] fuse: introduce inode io modes
2024-02-09 10:21 ` Miklos Szeredi
@ 2024-02-09 10:35 ` Amir Goldstein
2024-02-09 10:56 ` Miklos Szeredi
0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-02-09 10:35 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Fri, Feb 9, 2024 at 12:21 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > We use an internal FOPEN_CACHE_IO flag to mark a file that was open in
> > caching mode. This flag can not be set by the server.
> >
> > direct_io mmap uses page cache, so first mmap will mark the file as
> > FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter
> > the caching io mode.
>
> I think using FOPEN_CACHE_IO for this is wrong. Why not just an internal flag?
>
No reason apart from "symmetry" with the other io mode FOPEN flags.
I will use an internal flag.
> > +/* No more pending io and no new io possible to inode via open/mmapped file */
> > +void fuse_file_io_release(struct fuse_file *ff, struct inode *inode)
> > +{
> > + if (!ff->io_opened)
> > + return;
> > +
> > + /* Last caching file close exits caching inode io mode */
> > + if (ff->open_flags & FOPEN_CACHE_IO)
> > + fuse_file_cached_io_end(inode);
>
> The above is the only place where io_opened is checked without a
> WARN_ON. So can't this be incorporated into the flag indicating
> whether caching is on?
I added io_opened to fix a bug in an earlier version where fuse_file_release()
was called on the error path after fuse_file_io_open() failed.
The simple change would be to replace FOPEN_CACHE_IO flag
with ff->io_cache_opened bit.
I assume you meant to change ff->io_opened to an enum:
{ FUSE_IO_NONE, FUSE_IO_CACHE, FUSE_IO_DIRECT,
FUSE_IO_PASSTHROUGH }
Is that what you mean?
Thanks,
Amir.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 8/9] fuse: introduce inode io modes
2024-02-09 10:35 ` Amir Goldstein
@ 2024-02-09 10:56 ` Miklos Szeredi
2024-02-09 11:50 ` Amir Goldstein
0 siblings, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2024-02-09 10:56 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Fri, 9 Feb 2024 at 11:35, Amir Goldstein <amir73il@gmail.com> wrote:
> No reason apart from "symmetry" with the other io mode FOPEN flags.
> I will use an internal flag.
Okay.
Other ff->open_flags are set at open time and never changed.
FOPEN_CACHE_IO doesn't fit into that pattern.
> I added io_opened to fix a bug in an earlier version where fuse_file_release()
> was called on the error path after fuse_file_io_open() failed.
>
> The simple change would be to replace FOPEN_CACHE_IO flag
> with ff->io_cache_opened bit.
Right.
>
> I assume you meant to change ff->io_opened to an enum:
> { FUSE_IO_NONE, FUSE_IO_CACHE, FUSE_IO_DIRECT,
> FUSE_IO_PASSTHROUGH }
>
> Is that what you mean?
I just meant ff->io_cache_opened is only set when the cache counter
needs decrementing. The logic is simpler and is trivially right.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 8/9] fuse: introduce inode io modes
2024-02-09 10:56 ` Miklos Szeredi
@ 2024-02-09 11:50 ` Amir Goldstein
0 siblings, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2024-02-09 11:50 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Fri, Feb 9, 2024 at 12:56 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 9 Feb 2024 at 11:35, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > No reason apart from "symmetry" with the other io mode FOPEN flags.
> > I will use an internal flag.
>
> Okay.
>
> Other ff->open_flags are set at open time and never changed.
> FOPEN_CACHE_IO doesn't fit into that pattern.
>
> > I added io_open to fix a bug in an earlier version where fuse_file_release()
> > was called on the error path after fuse_file_io_open() failed.
> >
> > The simple change would be to replace FOPEN_CACHE_IO flag
> > with ff->io_cache_opened bit.
>
> Right.
>
> >
> > I assume you meant to change ff->io_opened to an enum:
> > { FUSE_IO_NONE, FUSE_IO_CACHE, FUSE_IO_DIRECT,
> > FUSE_IO_PASSTHROUGH }
> >
> > Is that what you mean?
>
> I just meant ff->io_cache_opened is only set when the cache counter
> needs decrementing. The logic is simpler and is trivially right.
>
Ok. but I'll leave it named io_opened because with fuse passthrough
io_opened means either positive or negative refcount:
void fuse_file_io_release(struct fuse_file *ff, struct inode *inode)
{
if (!ff->io_opened)
return;
/*
* Last caching file close allows passthrough open of inode and
* Last passthrough file close allows caching open of inode.
*/
if (ff->open_flags & FOPEN_PASSTHROUGH)
fuse_file_uncached_io_end(inode);
else
fuse_file_cached_io_end(inode);
ff->io_opened = false;
}
Thanks,
Amir.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-08 17:05 [PATCH v3 0/9] fuse: inode IO modes and mmap + parallel dio Amir Goldstein
` (7 preceding siblings ...)
2024-02-08 17:06 ` [PATCH v3 8/9] fuse: introduce inode io modes Amir Goldstein
@ 2024-02-08 17:06 ` Amir Goldstein
2024-02-09 10:50 ` Miklos Szeredi
8 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-02-08 17:06 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
Instead of denying caching mode on parallel dio open, deny caching
opens only while parallel dio writes are in progress, and wait for
in-progress parallel dio writes to complete before entering inode
caching io mode.
This allows executing parallel dio when the inode is not in caching
mode, even if shared mmap is allowed, as long as no mmaps have been
performed on the inode in question.
An mmap on a direct_io file now waits for all in-progress parallel dio
writes to complete, so parallel dio writes together with
FUSE_DIRECT_IO_ALLOW_MMAP are enabled by this commit.
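The waitq handshake described above can be sketched with a userspace
pthread model (the names and the condition variable are illustrative;
the kernel uses fi->lock, fi->direct_io_waitq and wait_event_killable()
instead):

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t dio_done = PTHREAD_COND_INITIALIZER;
static int iocachectr;		/* < 0: that many parallel dio writes in flight */
static int cache_io_mode;	/* models the FUSE_I_CACHE_IO_MODE bit */

/* Last parallel dio writer brings the counter back to 0 and wakes waiters. */
static void uncached_io_end(void)
{
	pthread_mutex_lock(&lock);
	if (++iocachectr == 0)
		pthread_cond_broadcast(&dio_done);
	pthread_mutex_unlock(&lock);
}

static void *dio_writer(void *arg)
{
	(void)arg;
	usleep(10000);		/* simulate an in-flight dio write */
	uncached_io_end();
	return NULL;
}

/* A caching open (e.g. first mmap) waits for in-flight dio to drain. */
static int get_io_cache(void)
{
	pthread_mutex_lock(&lock);
	cache_io_mode = 1;	/* advise new dio writes to take the exclusive lock */
	while (iocachectr < 0)
		pthread_cond_wait(&dio_done, &lock);
	iocachectr++;		/* enter caching mode */
	pthread_mutex_unlock(&lock);
	return 0;
}
```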
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/file.c | 28 ++++++++++++++++++++--------
fs/fuse/fuse_i.h | 3 +++
fs/fuse/iomode.c | 45 ++++++++++++++++++++++++++++++++++++---------
3 files changed, 59 insertions(+), 17 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 29e18e5a6f6c..eb226457c4bd 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1335,6 +1335,7 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
struct file *file = iocb->ki_filp;
struct fuse_file *ff = file->private_data;
struct inode *inode = file_inode(iocb->ki_filp);
+ struct fuse_inode *fi = get_fuse_inode(inode);
/* server side has to advise that it supports parallel dio writes */
if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
@@ -1346,11 +1347,9 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
if (iocb->ki_flags & IOCB_APPEND)
return true;
- /* combination opf page access and direct-io difficult, shared
- * locks actually introduce a conflict.
- */
- if (get_fuse_conn(inode)->direct_io_allow_mmap)
- return true;
+ /* shared locks are not allowed with parallel page cache IO */
+ if (test_bit(FUSE_I_CACHE_IO_MODE, &fi->state))
+ return false;
/* parallel dio beyond eof is at least for now not supported */
if (fuse_io_past_eof(iocb, from))
@@ -1370,10 +1369,14 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
} else {
inode_lock_shared(inode);
/*
- * Previous check was without inode lock and might have raced,
- * check again.
+ * New parallel dio is allowed only if the inode is not in caching
+ * mode and it denies new opens in caching mode. This check
+ * should be performed only after taking the shared inode lock.
+ * The previous past-eof check was done without the inode lock and
+ * might have raced, so check it again.
*/
- if (fuse_io_past_eof(iocb, from)) {
+ if (fuse_io_past_eof(iocb, from) ||
+ fuse_file_uncached_io_start(inode) != 0) {
inode_unlock_shared(inode);
inode_lock(inode);
*exclusive = true;
@@ -1386,6 +1389,8 @@ static void fuse_dio_unlock(struct inode *inode, bool exclusive)
if (exclusive) {
inode_unlock(inode);
} else {
+ /* Allow opens in caching mode after last parallel dio end */
+ fuse_file_uncached_io_end(inode);
inode_unlock_shared(inode);
}
}
@@ -2521,6 +2526,10 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
if (FUSE_IS_DAX(file_inode(file)))
return fuse_dax_mmap(file, vma);
+ /*
+ * FOPEN_DIRECT_IO handling is special compared to O_DIRECT,
+ * as it does not allow MAP_SHARED mmap without FUSE_DIRECT_IO_ALLOW_MMAP.
+ */
if (ff->open_flags & FOPEN_DIRECT_IO) {
/*
* Can't provide the coherency needed for MAP_SHARED
@@ -2533,6 +2542,8 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
/*
* First mmap of direct_io file enters caching inode io mode.
+ * Also waits for parallel dio writers to go into serial mode
+ * (exclusive instead of shared lock).
*/
rc = fuse_file_io_mmap(ff, file_inode(file));
if (rc)
@@ -3312,6 +3323,7 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
fi->writectr = 0;
fi->iocachectr = 0;
init_waitqueue_head(&fi->page_waitq);
+ init_waitqueue_head(&fi->direct_io_waitq);
fi->writepages = RB_ROOT;
if (IS_ENABLED(CONFIG_FUSE_DAX))
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 5e5465f6a1ac..dede4378c719 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -129,6 +129,9 @@ struct fuse_inode {
/* Waitq for writepage completion */
wait_queue_head_t page_waitq;
+ /* waitq for direct-io completion */
+ wait_queue_head_t direct_io_waitq;
+
/* List of writepage requestst (pending or sent) */
struct rb_root writepages;
};
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index 13faae77aec4..acd0833ae873 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -12,18 +12,45 @@
#include <linux/file.h>
#include <linux/fs.h>
+/*
+ * Return true if need to wait for new opens in caching mode.
+ */
+static inline bool fuse_is_io_cache_wait(struct fuse_inode *fi)
+{
+ return READ_ONCE(fi->iocachectr) < 0;
+}
+
/*
* Request an open in caching mode.
+ * Blocks new parallel dio writes and waits for the in-progress parallel dio
+ * writes to complete.
* Return 0 if in caching mode.
*/
static int fuse_inode_get_io_cache(struct fuse_inode *fi)
{
+ int err = 0;
+
assert_spin_locked(&fi->lock);
- if (fi->iocachectr < 0)
- return -ETXTBSY;
- if (fi->iocachectr++ == 0)
- set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
- return 0;
+ /*
+ * Setting the bit advises new direct-io writes to use an exclusive
+ * lock - without it the wait below might be forever.
+ */
+ set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+ while (!err && fuse_is_io_cache_wait(fi)) {
+ spin_unlock(&fi->lock);
+ err = wait_event_killable(fi->direct_io_waitq,
+ !fuse_is_io_cache_wait(fi));
+ spin_lock(&fi->lock);
+ }
+ /*
+ * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
+ * failed to enter caching mode and no other caching open exists.
+ */
+ if (!err)
+ fi->iocachectr++;
+ else if (fi->iocachectr <= 0)
+ clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+ return err;
}
/*
@@ -102,10 +129,13 @@ int fuse_file_uncached_io_start(struct inode *inode)
void fuse_file_uncached_io_end(struct inode *inode)
{
struct fuse_inode *fi = get_fuse_inode(inode);
+ int uncached_io;
spin_lock(&fi->lock);
- fuse_inode_allow_io_cache(fi);
+ uncached_io = fuse_inode_allow_io_cache(fi);
spin_unlock(&fi->lock);
+ if (!uncached_io)
+ wake_up(&fi->direct_io_waitq);
}
/* Open flags to determine regular file io mode */
@@ -155,13 +185,10 @@ int fuse_file_io_open(struct file *file, struct inode *inode)
/*
* First caching file open enters caching inode io mode.
- * First parallel dio open denies caching inode io mode.
*/
err = 0;
if (ff->open_flags & FOPEN_CACHE_IO)
err = fuse_file_cached_io_start(inode);
- else if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)
- err = fuse_file_uncached_io_start(inode);
if (err)
goto fail;
--
2.34.1
^ permalink raw reply related [flat|nested] 30+ messages in thread
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-08 17:06 ` [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP Amir Goldstein
@ 2024-02-09 10:50 ` Miklos Szeredi
2024-02-09 11:21 ` Bernd Schubert
0 siblings, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2024-02-09 10:50 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> {
> + int err = 0;
> +
> assert_spin_locked(&fi->lock);
> - if (fi->iocachectr < 0)
> - return -ETXTBSY;
> - if (fi->iocachectr++ == 0)
> - set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> - return 0;
> + /*
> + * Setting the bit advises new direct-io writes to use an exclusive
> + * lock - without it the wait below might be forever.
> + */
> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> + while (!err && fuse_is_io_cache_wait(fi)) {
> + spin_unlock(&fi->lock);
> + err = wait_event_killable(fi->direct_io_waitq,
> + !fuse_is_io_cache_wait(fi));
> + spin_lock(&fi->lock);
> + }
> + /*
> + * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> + * failed to enter caching mode and no other caching open exists.
> + */
> + if (!err)
> + fi->iocachectr++;
> + else if (fi->iocachectr <= 0)
> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
This seems wrong: if the current task is killed, and there's another
task trying to get cached open mode, then clearing
FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
logic.
I'd suggest fixing this by not making the wait killable. It's just a
nice to have, but without all the other waits being killable (e.g.
filesystem locks) it's just a drop in the ocean.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-09 10:50 ` Miklos Szeredi
@ 2024-02-09 11:21 ` Bernd Schubert
2024-02-09 11:48 ` Bernd Schubert
0 siblings, 1 reply; 30+ messages in thread
From: Bernd Schubert @ 2024-02-09 11:21 UTC (permalink / raw)
To: Miklos Szeredi, Amir Goldstein; +Cc: linux-fsdevel, Bernd Schubert
On 2/9/24 11:50, Miklos Szeredi wrote:
> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
>
>> static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>> {
>> + int err = 0;
>> +
>> assert_spin_locked(&fi->lock);
>> - if (fi->iocachectr < 0)
>> - return -ETXTBSY;
>> - if (fi->iocachectr++ == 0)
>> - set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>> - return 0;
>> + /*
>> + * Setting the bit advises new direct-io writes to use an exclusive
>> + * lock - without it the wait below might be forever.
>> + */
>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>> + while (!err && fuse_is_io_cache_wait(fi)) {
>> + spin_unlock(&fi->lock);
>> + err = wait_event_killable(fi->direct_io_waitq,
>> + !fuse_is_io_cache_wait(fi));
>> + spin_lock(&fi->lock);
>> + }
>> + /*
>> + * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>> + * failed to enter caching mode and no other caching open exists.
>> + */
>> + if (!err)
>> + fi->iocachectr++;
>> + else if (fi->iocachectr <= 0)
>> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>
> This seems wrong: if the current task is killed, and there's another
> task trying to get cached open mode, then clearing
> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
> logic.
This is called holding a spin lock, so another task cannot enter here?
Neither can direct-IO, because it is also locked out. The bit helps DIO
code to avoid trying to do parallel DIO without the need to take a spin
lock. When DIO decides it wants to do parallel IO, it first has to get
past fi->iocachectr < 0 - if there is another task trying to do cache
IO, either DIO gets < 0 first and the other cache task has to wait, or
the cache task gets > 0 and dio will continue with the exclusive lock. Or
am I missing something?
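The two interleavings described here can be checked with a tiny
single-threaded model of the counter (illustrative; the kernel
serializes both transitions under fi->lock):

```c
#include <assert.h>

/* Modeled fi->iocachectr transitions; -1 stands in for -ETXTBSY. */
static int try_cache(int *ctr) { if (*ctr < 0) return -1; ++*ctr; return 0; }
static int try_dio(int *ctr)   { if (*ctr > 0) return -1; --*ctr; return 0; }
```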
>
> I'd suggest fixing this by not making the wait killable. It's just a
> nice to have, but without all the other waits being killable (e.g.
> filesystem locks) it's just a drop in the ocean.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-09 11:21 ` Bernd Schubert
@ 2024-02-09 11:48 ` Bernd Schubert
2024-02-09 12:12 ` Amir Goldstein
0 siblings, 1 reply; 30+ messages in thread
From: Bernd Schubert @ 2024-02-09 11:48 UTC (permalink / raw)
To: Miklos Szeredi, Amir Goldstein; +Cc: linux-fsdevel, Bernd Schubert
On 2/9/24 12:21, Bernd Schubert wrote:
>
>
> On 2/9/24 11:50, Miklos Szeredi wrote:
>> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
>>
>>> static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>>> {
>>> + int err = 0;
>>> +
>>> assert_spin_locked(&fi->lock);
>>> - if (fi->iocachectr < 0)
>>> - return -ETXTBSY;
>>> - if (fi->iocachectr++ == 0)
>>> - set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>> - return 0;
>>> + /*
>>> + * Setting the bit advises new direct-io writes to use an exclusive
>>> + * lock - without it the wait below might be forever.
>>> + */
>>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>> + while (!err && fuse_is_io_cache_wait(fi)) {
>>> + spin_unlock(&fi->lock);
>>> + err = wait_event_killable(fi->direct_io_waitq,
>>> + !fuse_is_io_cache_wait(fi));
>>> + spin_lock(&fi->lock);
>>> + }
>>> + /*
>>> + * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>>> + * failed to enter caching mode and no other caching open exists.
>>> + */
>>> + if (!err)
>>> + fi->iocachectr++;
>>> + else if (fi->iocachectr <= 0)
>>> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>
>> This seems wrong: if the current task is killed, and there's another
>> task trying to get cached open mode, then clearing
>> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
>> logic.
>
> This is called holding a spin lock, another task cannot enter here?
> Neither can direct-IO, because it is also locked out. The bit helps DIO
> code to avoid trying to do parallel DIO without the need to take a spin
> lock. When DIO decides it wants to do parallel IO, it first has to get
> past fi->iocachectr < 0 - if there is another task trying to do cache
> IO, either DIO gets < 0 first and the other cache task has to wait, or
> cache tasks gets > 0 and dio will continue with the exclusive lock. Or
> do I miss something?
Now I see what you mean: there is an unlock window, and another task
might have also already set the bit.
I think this should do:
diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
index acd0833ae873..7c22edd674cb 100644
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -41,6 +41,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
err = wait_event_killable(fi->direct_io_waitq,
!fuse_is_io_cache_wait(fi));
spin_lock(&fi->lock);
+ if (!err)
+ /* Another interrupted task might have unset it */
+ set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
}
/*
* Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-09 11:48 ` Bernd Schubert
@ 2024-02-09 12:12 ` Amir Goldstein
2024-02-09 12:19 ` Amir Goldstein
` (2 more replies)
0 siblings, 3 replies; 30+ messages in thread
From: Amir Goldstein @ 2024-02-09 12:12 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Miklos Szeredi, linux-fsdevel, Bernd Schubert
On Fri, Feb 9, 2024 at 1:48 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 2/9/24 12:21, Bernd Schubert wrote:
> >
> >
> > On 2/9/24 11:50, Miklos Szeredi wrote:
> >> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> >>
> >>> static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> >>> {
> >>> + int err = 0;
> >>> +
> >>> assert_spin_locked(&fi->lock);
> >>> - if (fi->iocachectr < 0)
> >>> - return -ETXTBSY;
> >>> - if (fi->iocachectr++ == 0)
> >>> - set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >>> - return 0;
> >>> + /*
> >>> + * Setting the bit advises new direct-io writes to use an exclusive
> >>> + * lock - without it the wait below might be forever.
> >>> + */
> >>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >>> + while (!err && fuse_is_io_cache_wait(fi)) {
> >>> + spin_unlock(&fi->lock);
> >>> + err = wait_event_killable(fi->direct_io_waitq,
> >>> + !fuse_is_io_cache_wait(fi));
> >>> + spin_lock(&fi->lock);
> >>> + }
> >>> + /*
> >>> + * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> >>> + * failed to enter caching mode and no other caching open exists.
> >>> + */
> >>> + if (!err)
> >>> + fi->iocachectr++;
> >>> + else if (fi->iocachectr <= 0)
> >>> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >>
> >> This seems wrong: if the current task is killed, and there's another
> >> task trying to get cached open mode, then clearing
> >> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
> >> logic.
> >
> > This is called holding a spin lock, another task cannot enter here?
> > Neither can direct-IO, because it is also locked out. The bit helps DIO
> > code to avoid trying to do parallel DIO without the need to take a spin
> > lock. When DIO decides it wants to do parallel IO, it first has to get
> > past fi->iocachectr < 0 - if there is another task trying to do cache
> > IO, either DIO gets < 0 first and the other cache task has to wait, or
> > cache tasks gets > 0 and dio will continue with the exclusive lock. Or
> > do I miss something?
>
> Now I see what you mean, there is an unlock and another task might have also already set the bit
>
> I think this should do
>
> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> index acd0833ae873..7c22edd674cb 100644
> --- a/fs/fuse/iomode.c
> +++ b/fs/fuse/iomode.c
> @@ -41,6 +41,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> err = wait_event_killable(fi->direct_io_waitq,
> !fuse_is_io_cache_wait(fi));
> spin_lock(&fi->lock);
> + if (!err)
> + /* Another interrupted task might have unset it */
> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> }
> /*
> * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
I think this race can happen even if we remove _killable -
not sure. Anyway, with fuse passthrough there is another error
condition:
/*
* Check if inode entered passthrough io mode while waiting for parallel
* dio write completion.
*/
if (fuse_inode_backing(fi))
err = -ETXTBSY;
But in this condition, all waiting tasks should abort the wait,
so it does not seem a problem to clean the flag.
Anyway, IMO it is better to set the flag before every wait and on
success. Like below.
Thanks,
Amir.
--- a/fs/fuse/iomode.c
+++ b/fs/fuse/iomode.c
@@ -35,8 +35,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
* Setting the bit advises new direct-io writes to use an exclusive
* lock - without it the wait below might be forever.
*/
- set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
while (!err && fuse_is_io_cache_wait(fi)) {
+ set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
spin_unlock(&fi->lock);
err = wait_event_killable(fi->direct_io_waitq,
!fuse_is_io_cache_wait(fi));
@@ -53,8 +53,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
* Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
* failed to enter caching mode and no other caching open exists.
*/
- if (!err)
- fi->iocachectr++;
+ if (!err && fi->iocachectr++ == 0)
+ set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
else if (fi->iocachectr <= 0)
clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
return err;
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-09 12:12 ` Amir Goldstein
@ 2024-02-09 12:19 ` Amir Goldstein
2024-02-09 12:19 ` Bernd Schubert
2024-02-09 13:26 ` Miklos Szeredi
2 siblings, 0 replies; 30+ messages in thread
From: Amir Goldstein @ 2024-02-09 12:19 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Miklos Szeredi, linux-fsdevel, Bernd Schubert
On Fri, Feb 9, 2024 at 2:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Feb 9, 2024 at 1:48 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
> >
> >
> >
> > On 2/9/24 12:21, Bernd Schubert wrote:
> > >
> > >
> > > On 2/9/24 11:50, Miklos Szeredi wrote:
> > >> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
> > >>
> > >>> static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> > >>> {
> > >>> + int err = 0;
> > >>> +
> > >>> assert_spin_locked(&fi->lock);
> > >>> - if (fi->iocachectr < 0)
> > >>> - return -ETXTBSY;
> > >>> - if (fi->iocachectr++ == 0)
> > >>> - set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > >>> - return 0;
> > >>> + /*
> > >>> + * Setting the bit advises new direct-io writes to use an exclusive
> > >>> + * lock - without it the wait below might be forever.
> > >>> + */
> > >>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > >>> + while (!err && fuse_is_io_cache_wait(fi)) {
> > >>> + spin_unlock(&fi->lock);
> > >>> + err = wait_event_killable(fi->direct_io_waitq,
> > >>> + !fuse_is_io_cache_wait(fi));
> > >>> + spin_lock(&fi->lock);
> > >>> + }
> > >>> + /*
> > >>> + * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> > >>> + * failed to enter caching mode and no other caching open exists.
> > >>> + */
> > >>> + if (!err)
> > >>> + fi->iocachectr++;
> > >>> + else if (fi->iocachectr <= 0)
> > >>> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > >>
> > >> This seems wrong: if the current task is killed, and there's another
> > >> task trying to get cached open mode, then clearing
> > >> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
> > >> logic.
> > >
> > > This is called holding a spin lock, another task cannot enter here?
> > > Neither can direct-IO, because it is also locked out. The bit helps DIO
> > > code to avoid trying to do parallel DIO without the need to take a spin
> > > lock. When DIO decides it wants to do parallel IO, it first has to get
> > > past fi->iocachectr < 0 - if there is another task trying to do cache
> > > IO, either DIO gets < 0 first and the other cache task has to wait, or
> > > cache tasks gets > 0 and dio will continue with the exclusive lock. Or
> > > do I miss something?
> >
> > Now I see what you mean, there is an unlock and another task might have also already set the bit
> >
> > I think this should do
> >
> > diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
> > index acd0833ae873..7c22edd674cb 100644
> > --- a/fs/fuse/iomode.c
> > +++ b/fs/fuse/iomode.c
> > @@ -41,6 +41,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> > err = wait_event_killable(fi->direct_io_waitq,
> > !fuse_is_io_cache_wait(fi));
> > spin_lock(&fi->lock);
> > + if (!err)
> > + /* Another interrupted task might have unset it */
> > + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > }
> > /*
> > * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>
> I think this race can happen even if we remove _killable -
> not sure. Anyway, with fuse passthrough there is another error
> condition:
>
> /*
> * Check if inode entered passthrough io mode while waiting for parallel
> * dio write completion.
> */
> if (fuse_inode_backing(fi))
> err = -ETXTBSY;
>
> But in this condition, all waiting tasks should abort the wait,
> so it does not seem a problem to clean the flag.
>
> Anyway, IMO it is better to set the flag before every wait and on
> success. Like below.
>
> Thanks,
> Amir.
>
> --- a/fs/fuse/iomode.c
> +++ b/fs/fuse/iomode.c
> @@ -35,8 +35,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> * Setting the bit advises new direct-io writes to use an exclusive
> * lock - without it the wait below might be forever.
> */
> - set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> while (!err && fuse_is_io_cache_wait(fi)) {
> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> spin_unlock(&fi->lock);
> err = wait_event_killable(fi->direct_io_waitq,
> !fuse_is_io_cache_wait(fi));
> @@ -53,8 +53,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> * failed to enter caching mode and no other caching open exists.
> */
> - if (!err)
> - fi->iocachectr++;
> + if (!err && fi->iocachectr++ == 0)
> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> else if (fi->iocachectr <= 0)
> clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> return err;
Oops, it should be:
if (!err) {
fi->iocachectr++;
set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
} else if (fi->iocachectr <= 0) {
clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
}
return err;
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-09 12:12 ` Amir Goldstein
2024-02-09 12:19 ` Amir Goldstein
@ 2024-02-09 12:19 ` Bernd Schubert
2024-02-09 13:26 ` Miklos Szeredi
2 siblings, 0 replies; 30+ messages in thread
From: Bernd Schubert @ 2024-02-09 12:19 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Miklos Szeredi, linux-fsdevel, Bernd Schubert
On 2/9/24 13:12, Amir Goldstein wrote:
> On Fri, Feb 9, 2024 at 1:48 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 2/9/24 12:21, Bernd Schubert wrote:
>>>
>>>
>>> On 2/9/24 11:50, Miklos Szeredi wrote:
>>>> On Thu, 8 Feb 2024 at 18:09, Amir Goldstein <amir73il@gmail.com> wrote:
>>>>
>>>>> static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>>>>> {
>>>>> + int err = 0;
>>>>> +
>>>>> assert_spin_locked(&fi->lock);
>>>>> - if (fi->iocachectr < 0)
>>>>> - return -ETXTBSY;
>>>>> - if (fi->iocachectr++ == 0)
>>>>> - set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>>>> - return 0;
>>>>> + /*
>>>>> + * Setting the bit advises new direct-io writes to use an exclusive
>>>>> + * lock - without it the wait below might be forever.
>>>>> + */
>>>>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>>>> + while (!err && fuse_is_io_cache_wait(fi)) {
>>>>> + spin_unlock(&fi->lock);
>>>>> + err = wait_event_killable(fi->direct_io_waitq,
>>>>> + !fuse_is_io_cache_wait(fi));
>>>>> + spin_lock(&fi->lock);
>>>>> + }
>>>>> + /*
>>>>> + * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>>>>> + * failed to enter caching mode and no other caching open exists.
>>>>> + */
>>>>> + if (!err)
>>>>> + fi->iocachectr++;
>>>>> + else if (fi->iocachectr <= 0)
>>>>> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>>>
>>>> This seems wrong: if the current task is killed, and there's another
>>>> task trying to get cached open mode, then clearing
>>>> FUSE_I_CACHE_IO_MODE will allow new parallel writes, breaking this
>>>> logic.
>>>
>>> This is called holding a spin lock, another task cannot enter here?
>>> Neither can direct-IO, because it is also locked out. The bit helps DIO
>>> code to avoid trying to do parallel DIO without the need to take a spin
>>> lock. When DIO decides it wants to do parallel IO, it first has to get
>>> past fi->iocachectr < 0 - if there is another task trying to do cache
>>> IO, either DIO gets < 0 first and the other cache task has to wait, or
>>> cache tasks gets > 0 and dio will continue with the exclusive lock. Or
>>> do I miss something?
>>
>> Now I see what you mean, there is an unlock and another task might have also already set the bit
>>
>> I think this should do
>>
>> diff --git a/fs/fuse/iomode.c b/fs/fuse/iomode.c
>> index acd0833ae873..7c22edd674cb 100644
>> --- a/fs/fuse/iomode.c
>> +++ b/fs/fuse/iomode.c
>> @@ -41,6 +41,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
>> err = wait_event_killable(fi->direct_io_waitq,
>> !fuse_is_io_cache_wait(fi));
>> spin_lock(&fi->lock);
>> + if (!err)
>> + /* Another interrupted task might have unset it */
>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>> }
>> /*
>> * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
>
> I think this race can happen even if we remove _killable -
> not sure. Anyway, with fuse passthrough there is another error
> condition:
>
> /*
> * Check if inode entered passthrough io mode while waiting for parallel
> * dio write completion.
> */
> if (fuse_inode_backing(fi))
> err = -ETXTBSY;
>
> But in this condition, all waiting tasks should abort the wait,
> so it does not seem a problem to clean the flag.
>
> Anyway, IMO it is better to set the flag before every wait and on
> success. Like below.
>
> Thanks,
> Amir.
>
> --- a/fs/fuse/iomode.c
> +++ b/fs/fuse/iomode.c
> @@ -35,8 +35,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> * Setting the bit advises new direct-io writes to use an exclusive
> * lock - without it the wait below might be forever.
> */
> - set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> while (!err && fuse_is_io_cache_wait(fi)) {
> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> spin_unlock(&fi->lock);
> err = wait_event_killable(fi->direct_io_waitq,
> !fuse_is_io_cache_wait(fi));
> @@ -53,8 +53,8 @@ static int fuse_inode_get_io_cache(struct fuse_inode *fi)
> * Enter caching mode or clear the FUSE_I_CACHE_IO_MODE bit if we
> * failed to enter caching mode and no other caching open exists.
> */
> - if (!err)
> - fi->iocachectr++;
> + if (!err && fi->iocachectr++ == 0)
> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> else if (fi->iocachectr <= 0)
> clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> return err;
Yeah, that is fine with me. Basically every wait signals "please let me
in", instead of doing it a single time only.
Thanks,
Bernd
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-09 12:12 ` Amir Goldstein
2024-02-09 12:19 ` Amir Goldstein
2024-02-09 12:19 ` Bernd Schubert
@ 2024-02-09 13:26 ` Miklos Szeredi
2024-02-09 15:28 ` Amir Goldstein
2 siblings, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2024-02-09 13:26 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Fri, 9 Feb 2024 at 13:12, Amir Goldstein <amir73il@gmail.com> wrote:
> I think this race can happen even if we remove _killable
Without _killable, the loop will only exit with iocachectr >= 0, hence
FUSE_I_CACHE_IO_MODE will not be cleared.
> not sure - anyway, with fuse passthrough there is another error
> condition:
>
> /*
> * Check if inode entered passthrough io mode while waiting for parallel
> * dio write completion.
> */
> if (fuse_inode_backing(fi))
> err = -ETXTBSY;
>
> But in this condition, all waiting tasks should abort the wait,
> so it does not seem a problem to clean the flag.
Ah, this complicates things. But I think it's safe to clear
FUSE_I_CACHE_IO_MODE in this case, since other
fuse_inode_get_io_cache() calls will also fail.
> Anyway, IMO it is better to set the flag before every wait and on
> success. Like below.
This would still have the race, since there will be a window during
which the FUSE_I_CACHE_IO_MODE flag has been cleared and new parallel
writes can start, even though there are one or more waiters for cached
open.
Not that this would be a problem in practice, but I also don't see
removing the _killable being a big issue.
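For reference, with the _killable removed the function would look roughly like this (a sketch assembled from the hunks quoted in this thread, not the final merged code, so details may differ):

```c
/*
 * Sketch: uninterruptible variant of fuse_inode_get_io_cache().
 * Without _killable there is no early return, so the wait only ends
 * when fuse_is_io_cache_wait() turns false: either iocachectr came
 * back to >= 0, or passthrough claimed the inode while we slept,
 * which becomes -ETXTBSY below.
 */
static int fuse_inode_get_io_cache(struct fuse_inode *fi)
{
	int err = 0;

	assert_spin_locked(&fi->lock);
	/* Advise new direct-io writes to take the exclusive lock. */
	set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
	while (fuse_is_io_cache_wait(fi)) {
		spin_unlock(&fi->lock);
		wait_event(fi->direct_io_waitq, !fuse_is_io_cache_wait(fi));
		spin_lock(&fi->lock);
	}
	/* Passthrough may have won the race while we slept. */
	if (fuse_inode_backing(fi))
		err = -ETXTBSY;
	if (!err)
		fi->iocachectr++;
	else if (fi->iocachectr <= 0)
		clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
	return err;
}
```

In the -ETXTBSY case clearing the bit is safe, since every other waiter will fail the same fuse_inode_backing() check.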
Thanks,
Miklos
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-09 13:26 ` Miklos Szeredi
@ 2024-02-09 15:28 ` Amir Goldstein
2024-03-17 13:54 ` Amir Goldstein
0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-02-09 15:28 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Fri, Feb 9, 2024 at 3:27 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Fri, 9 Feb 2024 at 13:12, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > I think this race can happen even if we remove _killable
>
> Without _killable, the loop will exit with iocachectr >= 0, hence the
> FUSE_I_CACHE_IO_MODE will not be cleared.
>
> > not sure - anyway, with fuse passthrough there is another error
> > condition:
> >
> > /*
> > * Check if inode entered passthrough io mode while waiting for parallel
> > * dio write completion.
> > */
> > if (fuse_inode_backing(fi))
> > err = -ETXTBSY;
> >
> > But in this condition, all waiting tasks should abort the wait,
> > so it does not seem a problem to clean the flag.
>
> Ah, this complicates things. But I think it's safe to clear
> FUSE_I_CACHE_IO_MODE in this case, since other
> fuse_inode_get_io_cache() calls will also fail.
>
Right.
> > Anyway, IMO it is better to set the flag before every wait and on
> > success. Like below.
>
> This would still have the race, since there will be a window during
> which the FUSE_I_CACHE_IO_MODE flag has been cleared and new parallel
> writes can start, even though there are one or more waiters for cached
> open.
>
> Not that this would be a problem in practice, but I also don't see
> removing the _killable being a big issue.
OK, removed _killable.
Pushed branches fuse_io_mode-090224 and fuse-backing-fd-090224
with requested fixes.
Note that I had to update libfuse fuse-backing-fd branch, because when
removing FOPEN_CACHE_IO, I changed the constant value of
FOPEN_PASSTHROUGH.
Passes my sanity tests.
Bernd, please verify that I did not break anything on your end.
Thanks,
Amir.
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-02-09 15:28 ` Amir Goldstein
@ 2024-03-17 13:54 ` Amir Goldstein
2024-03-17 19:31 ` Miklos Szeredi
0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-03-17 13:54 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Fri, Feb 9, 2024 at 5:28 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Fri, Feb 9, 2024 at 3:27 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Fri, 9 Feb 2024 at 13:12, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > I think this race can happen even if we remove _killable
> >
> > Without _killable, the loop will exit with iocachectr >= 0, hence the
> > FUSE_I_CACHE_IO_MODE will not be cleared.
> >
> > > not sure - anyway, with fuse passthrough there is another error
> > > condition:
> > >
> > > /*
> > > * Check if inode entered passthrough io mode while waiting for parallel
> > > * dio write completion.
> > > */
> > > if (fuse_inode_backing(fi))
> > > err = -ETXTBSY;
> > >
> > > But in this condition, all waiting tasks should abort the wait,
> > > so it does not seem a problem to clean the flag.
> >
> > Ah, this complicates things. But I think it's safe to clear
> > FUSE_I_CACHE_IO_MODE in this case, since other
> > fuse_inode_get_io_cache() calls will also fail.
> >
>
> Right.
>
> > > Anyway, IMO it is better to set the flag before every wait and on
> > > success. Like below.
> >
> > This would still have the race, since there will be a window during
> > which the FUSE_I_CACHE_IO_MODE flag has been cleared and new parallel
> > writes can start, even though there are one or more waiters for cached
> > open.
> >
> > Not that this would be a problem in practice, but I also don't see
> > removing the _killable being a big issue.
>
> ok. Remove killable.
>
> Pushed branches fuse_io_mode-090224 and fuse-backing-fd-090224
> with requested fixes.
>
> Note that I had to update libfuse fuse-backing-fd branch, because when
> removing FOPEN_CACHE_IO, I changed the constant value of
> FOPEN_PASSTHROUGH.
>
> Passes my sanity tests.
> Bernd, please verify that I did not break anything on your end.
>
Miklos,
I see that you decided to drop the waiting for parallel dio logic
in the final version. Decided to simplify, or found a problem?
Also, FUSE passthrough patch 9/9 ("fuse: auto-invalidate inode
attributes in passthrough mode") was not included in the final version,
so these fstests are failing on non uptodate size/mtime/ctime after
mapped write:
Failures: generic/080 generic/120 generic/207 generic/215
Was it left out by mistake or for a reason?
Thanks,
Amir.
[1] https://lore.kernel.org/linux-fsdevel/20240206142453.1906268-10-amir73il@gmail.com/
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-03-17 13:54 ` Amir Goldstein
@ 2024-03-17 19:31 ` Miklos Szeredi
2024-03-17 20:00 ` Amir Goldstein
0 siblings, 1 reply; 30+ messages in thread
From: Miklos Szeredi @ 2024-03-17 19:31 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Sun, 17 Mar 2024 at 14:54, Amir Goldstein <amir73il@gmail.com> wrote:
>
> I see that you decided to drop the waiting for parallel dio logic
> in the final version. Decided to simplify, or found a problem?
I don't think I dropped this. Which one are you thinking?
> Also, FUSE passthrough patch 9/9 ("fuse: auto-invalidate inode
> attributes in passthrough mode") was not included in the final version,
> so these fstests are failing on non uptodate size/mtime/ctime after
> mapped write:
>
> Failures: generic/080 generic/120 generic/207 generic/215
>
> Was it left out by mistake or for a reason?
Yes, this was deliberate. For plain read/write it's simply
unnecessary, since it should get 100% hit rate on the c/mtime tests.
For mmap we do need something, and that something might be best done
by looking at backing inode times. But even in that case I'd just
compare the times from a previous version of the backing inode to the
current version, instead of comparing with the cached values.
Thanks,
Miklos
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-03-17 19:31 ` Miklos Szeredi
@ 2024-03-17 20:00 ` Amir Goldstein
2024-03-17 20:02 ` Miklos Szeredi
0 siblings, 1 reply; 30+ messages in thread
From: Amir Goldstein @ 2024-03-17 20:00 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Sun, Mar 17, 2024 at 9:31 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Sun, 17 Mar 2024 at 14:54, Amir Goldstein <amir73il@gmail.com> wrote:
> >
>
> > I see that you decided to drop the waiting for parallel dio logic
> > in the final version. Decided to simplify, or found a problem?
>
> I don't think I dropped this. Which one are you thinking?
My bad, I just looked at the wrong diff.
>
> > Also, FUSE passthrough patch 9/9 ("fuse: auto-invalidate inode
> > attributes in passthrough mode") was not included in the final version,
> > so these fstests are failing on non uptodate size/mtime/ctime after
> > mapped write:
> >
> > Failures: generic/080 generic/120 generic/207 generic/215
> >
> > Was it left out by mistake or for a reason?
>
> Yes, this was deliberate. For plain read/write it's simply
> unnecessary, since it should get 100% hit rate on the c/mtime tests.
>
> For mmap we do need something, and that something might be best done
> by looking at backing inode times. But even in that case I'd just
> compare the times from a previous version of the backing inode to the
> current version, instead of comparing with the cached values.
>
Well, looking ahead to implementing more passthrough ops, I have
a use case for passthrough of getattr(), so if that will be a possible
configuration in the future, maybe we won't need to worry about
the cached values at all?
Thanks,
Amir.
* Re: [PATCH v3 9/9] fuse: allow parallel dio writes with FUSE_DIRECT_IO_ALLOW_MMAP
2024-03-17 20:00 ` Amir Goldstein
@ 2024-03-17 20:02 ` Miklos Szeredi
0 siblings, 0 replies; 30+ messages in thread
From: Miklos Szeredi @ 2024-03-17 20:02 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, Bernd Schubert
On Sun, 17 Mar 2024 at 21:00, Amir Goldstein <amir73il@gmail.com> wrote:
> Well, looking ahead to implementing more passthrough ops, I have
> a use case for passthrough of getattr(), so if that will be a possible
> configuration in the future, maybe we won't need to worry about
> the cached values at all?
Yes, if attributes are passed through, then things become simpler. So
introducing that mode first and seeing if that covers all the use
cases is probably a good first step.
Thanks,
Miklos