* [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-01-31 23:08 [PATCH v2 0/5] fuse: inode IO modes and mmap Bernd Schubert
@ 2024-01-31 23:08 ` Bernd Schubert
2024-02-01 8:45 ` Miklos Szeredi
2024-01-31 23:08 ` [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock Bernd Schubert
` (4 subsequent siblings)
5 siblings, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-01-31 23:08 UTC (permalink / raw)
To: miklos
Cc: linux-fsdevel, dsingh, Bernd Schubert, Hao Xu, stable,
Amir Goldstein
There were multiple issues with direct_io_allow_mmap:
- fuse_link_write_file() was missing, resulting in warnings in
fuse_write_file_get() and EIO from msync()
- "vma->vm_ops = &fuse_file_vm_ops" was not set, but especially
fuse_page_mkwrite is needed.
The semantics of invalidate_inode_pages2() is so far not clearly defined
in fuse_file_mmap. It dates back to
commit 3121bfe76311 ("fuse: fix "direct_io" private mmap")
Though, as direct_io_allow_mmap is a new feature, that was for MAP_PRIVATE
only. As invalidate_inode_pages2() is calling into fuse_launder_folio()
and writes out dirty pages, it should be safe to call
invalidate_inode_pages2 for MAP_PRIVATE and MAP_SHARED as well.
Cc: Hao Xu <howeyxu@tencent.com>
Cc: stable@vger.kernel.org
Fixes: e78662e818f9 ("fuse: add a new fuse init flag to relax restrictions in no cache mode")
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/file.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 148a71b8b4d0..243f469cac07 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2476,7 +2476,10 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
invalidate_inode_pages2(file->f_mapping);
- return generic_file_mmap(file, vma);
+ if (!(vma->vm_flags & VM_MAYSHARE)) {
+ /* MAP_PRIVATE */
+ return generic_file_mmap(file, vma);
+ }
}
if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-01-31 23:08 ` [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Bernd Schubert
@ 2024-02-01 8:45 ` Miklos Szeredi
2024-02-01 14:36 ` Bernd Schubert
0 siblings, 1 reply; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-01 8:45 UTC (permalink / raw)
To: Bernd Schubert; +Cc: linux-fsdevel, dsingh, Hao Xu, stable, Amir Goldstein
On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
>
> There were multiple issues with direct_io_allow_mmap:
> - fuse_link_write_file() was missing, resulting in warnings in
> fuse_write_file_get() and EIO from msync()
> - "vma->vm_ops = &fuse_file_vm_ops" was not set, but especially
> fuse_page_mkwrite is needed.
>
> The semantics of invalidate_inode_pages2() is so far not clearly defined
> in fuse_file_mmap. It dates back to
> commit 3121bfe76311 ("fuse: fix "direct_io" private mmap")
> Though, as direct_io_allow_mmap is a new feature, that was for MAP_PRIVATE
> only. As invalidate_inode_pages2() is calling into fuse_launder_folio()
> and writes out dirty pages, it should be safe to call
> invalidate_inode_pages2 for MAP_PRIVATE and MAP_SHARED as well.
Did you test with fsx (various versions can be found in LTP/xfstests)?
It's very good at finding mapped vs. non-mapped bugs.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-02-01 8:45 ` Miklos Szeredi
@ 2024-02-01 14:36 ` Bernd Schubert
2024-02-01 14:52 ` Miklos Szeredi
0 siblings, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-02-01 14:36 UTC (permalink / raw)
To: Miklos Szeredi, Bernd Schubert
Cc: linux-fsdevel, dsingh, Hao Xu, stable, Amir Goldstein
On 2/1/24 09:45, Miklos Szeredi wrote:
> On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> There were multiple issues with direct_io_allow_mmap:
>> - fuse_link_write_file() was missing, resulting in warnings in
>> fuse_write_file_get() and EIO from msync()
>> - "vma->vm_ops = &fuse_file_vm_ops" was not set, but especially
>> fuse_page_mkwrite is needed.
>>
>> The semantics of invalidate_inode_pages2() is so far not clearly defined
>> in fuse_file_mmap. It dates back to
>> commit 3121bfe76311 ("fuse: fix "direct_io" private mmap")
>> Though, as direct_io_allow_mmap is a new feature, that was for MAP_PRIVATE
>> only. As invalidate_inode_pages2() is calling into fuse_launder_folio()
>> and writes out dirty pages, it should be safe to call
>> invalidate_inode_pages2 for MAP_PRIVATE and MAP_SHARED as well.
>
> Did you test with fsx (various versions can be found in LTP/xfstests)?
> It's very good at finding mapped vs. non-mapped bugs.
I tested with xfstest, but not with fsx yet. I can look into that. Do
you have by any chance an exact command I should run?
Thanks,
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-02-01 14:36 ` Bernd Schubert
@ 2024-02-01 14:52 ` Miklos Szeredi
2024-02-01 15:39 ` Bernd Schubert
0 siblings, 1 reply; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-01 14:52 UTC (permalink / raw)
To: Bernd Schubert
Cc: Bernd Schubert, linux-fsdevel, dsingh, Hao Xu, stable,
Amir Goldstein
On Thu, 1 Feb 2024 at 15:36, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>
>
>
> On 2/1/24 09:45, Miklos Szeredi wrote:
> > On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
> >>
> >> There were multiple issues with direct_io_allow_mmap:
> >> - fuse_link_write_file() was missing, resulting in warnings in
> >> fuse_write_file_get() and EIO from msync()
> >> - "vma->vm_ops = &fuse_file_vm_ops" was not set, but especially
> >> fuse_page_mkwrite is needed.
> >>
> >> The semantics of invalidate_inode_pages2() is so far not clearly defined
> >> in fuse_file_mmap. It dates back to
> >> commit 3121bfe76311 ("fuse: fix "direct_io" private mmap")
> >> Though, as direct_io_allow_mmap is a new feature, that was for MAP_PRIVATE
> >> only. As invalidate_inode_pages2() is calling into fuse_launder_folio()
> >> and writes out dirty pages, it should be safe to call
> >> invalidate_inode_pages2 for MAP_PRIVATE and MAP_SHARED as well.
> >
> > Did you test with fsx (various versions can be found in LTP/xfstests)?
> > It's very good at finding mapped vs. non-mapped bugs.
>
> I tested with xfstest, but not with fsx yet. I can look into that. Do
> you have by any chance an exact command I should run?
Just specifying a filename should be good. Make sure you test with
various open modes.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-02-01 14:52 ` Miklos Szeredi
@ 2024-02-01 15:39 ` Bernd Schubert
2024-02-01 15:43 ` Miklos Szeredi
0 siblings, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-02-01 15:39 UTC (permalink / raw)
To: Miklos Szeredi, Bernd Schubert
Cc: linux-fsdevel@vger.kernel.org, Dharmendra Singh, Hao Xu,
stable@vger.kernel.org, Amir Goldstein
On 2/1/24 15:52, Miklos Szeredi wrote:
> On Thu, 1 Feb 2024 at 15:36, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 2/1/24 09:45, Miklos Szeredi wrote:
>>> On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
>>>>
>>>> There were multiple issues with direct_io_allow_mmap:
>>>> - fuse_link_write_file() was missing, resulting in warnings in
>>>> fuse_write_file_get() and EIO from msync()
>>>> - "vma->vm_ops = &fuse_file_vm_ops" was not set, but especially
>>>> fuse_page_mkwrite is needed.
>>>>
>>>> The semantics of invalidate_inode_pages2() is so far not clearly defined
>>>> in fuse_file_mmap. It dates back to
>>>> commit 3121bfe76311 ("fuse: fix "direct_io" private mmap")
>>>> Though, as direct_io_allow_mmap is a new feature, that was for MAP_PRIVATE
>>>> only. As invalidate_inode_pages2() is calling into fuse_launder_folio()
>>>> and writes out dirty pages, it should be safe to call
>>>> invalidate_inode_pages2 for MAP_PRIVATE and MAP_SHARED as well.
>>>
>>> Did you test with fsx (various versions can be found in LTP/xfstests)?
>>> It's very good at finding mapped vs. non-mapped bugs.
>>
>> I tested with xfstest, but not with fsx yet. I can look into that. Do
>> you have by any chance an exact command I should run?
>
> Just specifying a filename should be good. Make sure you test with
> various open modes.
fsx immediately fails in FOPEN_DIRECT_IP mode ("passthrough_hp
--direct-io ...") on an unpatched kernel, but continues to run in
patched mode.
Given
-N numops: total # operations to do (default infinity)
How long do you think I should run it? Maybe something like 3 hours
(--duration=$(3 * 60 * 60))?
Thanks,
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-02-01 15:39 ` Bernd Schubert
@ 2024-02-01 15:43 ` Miklos Szeredi
2024-02-01 15:48 ` Amir Goldstein
2024-02-02 14:47 ` Bernd Schubert
0 siblings, 2 replies; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-01 15:43 UTC (permalink / raw)
To: Bernd Schubert
Cc: Bernd Schubert, linux-fsdevel@vger.kernel.org, Dharmendra Singh,
Hao Xu, stable@vger.kernel.org, Amir Goldstein
On Thu, 1 Feb 2024 at 16:40, Bernd Schubert <bschubert@ddn.com> wrote:
> Given
> -N numops: total # operations to do (default infinity)
>
> How long do you think I should run it? Maybe something like 3 hours
> (--duration=$(3 * 60 * 60))?
I used -N1000000. If there were any issues they usually triggered much earlier.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-02-01 15:43 ` Miklos Szeredi
@ 2024-02-01 15:48 ` Amir Goldstein
2024-02-01 16:16 ` Bernd Schubert
2024-02-02 14:47 ` Bernd Schubert
1 sibling, 1 reply; 40+ messages in thread
From: Amir Goldstein @ 2024-02-01 15:48 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Bernd Schubert, Bernd Schubert, linux-fsdevel@vger.kernel.org,
Dharmendra Singh, Hao Xu, stable@vger.kernel.org
On Thu, Feb 1, 2024 at 5:43 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 1 Feb 2024 at 16:40, Bernd Schubert <bschubert@ddn.com> wrote:
>
> > Given
> > -N numops: total # operations to do (default infinity)
> >
> > How long do you think I should run it? Maybe something like 3 hours
> > (--duration=$(3 * 60 * 60))?
>
> I used -N1000000. If there were any issues they usually triggered much earlier.
>
Note that at least these fstests run fsx in several configurations:
$ grep begin `git grep -l run_fsx tests/generic/`
tests/generic/091:_begin_fstest rw auto quick
tests/generic/263:_begin_fstest rw auto quick
tests/generic/469:_begin_fstest auto quick punch zero prealloc
tests/generic/521:_begin_fstest soak long_rw smoketest
tests/generic/522:_begin_fstest soak long_rw smoketest
tests/generic/616:_begin_fstest auto rw io_uring stress soak
tests/generic/617:_begin_fstest auto rw io_uring stress soak
Bernd, you've probably already ran them if you are running auto, quick
or rw test groups.
Possibly you want to try and run also the -g soak.long_rw tests.
They use nr_ops=$((1000000 * TIME_FACTOR))
Thanks,
Amir.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-02-01 15:48 ` Amir Goldstein
@ 2024-02-01 16:16 ` Bernd Schubert
0 siblings, 0 replies; 40+ messages in thread
From: Bernd Schubert @ 2024-02-01 16:16 UTC (permalink / raw)
To: Amir Goldstein, Miklos Szeredi
Cc: Bernd Schubert, linux-fsdevel@vger.kernel.org, Dharmendra Singh,
Hao Xu, stable@vger.kernel.org
On 2/1/24 16:48, Amir Goldstein wrote:
> On Thu, Feb 1, 2024 at 5:43 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Thu, 1 Feb 2024 at 16:40, Bernd Schubert <bschubert@ddn.com> wrote:
>>
>>> Given
>>> -N numops: total # operations to do (default infinity)
>>>
>>> How long do you think I should run it? Maybe something like 3 hours
>>> (--duration=$(3 * 60 * 60))?
>>
>> I used -N1000000. If there were any issues they usually triggered much earlier.
>>
>
> Note that at least these fstests run fsx in several configurations:
>
> $ grep begin `git grep -l run_fsx tests/generic/`
> tests/generic/091:_begin_fstest rw auto quick
> tests/generic/263:_begin_fstest rw auto quick
> tests/generic/469:_begin_fstest auto quick punch zero prealloc
> tests/generic/521:_begin_fstest soak long_rw smoketest
> tests/generic/522:_begin_fstest soak long_rw smoketest
> tests/generic/616:_begin_fstest auto rw io_uring stress soak
> tests/generic/617:_begin_fstest auto rw io_uring stress soak
>
> Bernd, you've probably already ran them if you are running auto, quick
> or rw test groups.
>
> Possibly you want to try and run also the -g soak.long_rw tests.
>
> They use nr_ops=$((1000000 * TIME_FACTOR))
I didn't check yet what is the actual value, but TIME_FACTOR must be
smaller than 1 - using "-N1000000" is taking quite some time. I should
have started in screen. Some of the tests are marked as failed, like
generic/263:
+main: filesystem does not support fallocate mode
FALLOC_FL_KEEP_SIZE, disabling!
+main: filesystem does not support fallocate mode
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, disabling!
+main: filesystem does not support fallocate mode
FALLOC_FL_ZERO_RANGE, disabling!
Although the test runs
generic/469 7s ... [not run] xfs_io falloc -k failed (old kernel/wrong fs?)
generic/521
generic/522
--> Somehow not in the output list at all. Ah I see, that needs "-g
soak.long_rw"
Going to add the the soak.long option to the tests and will run again,
once current fsx is over.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap
2024-02-01 15:43 ` Miklos Szeredi
2024-02-01 15:48 ` Amir Goldstein
@ 2024-02-02 14:47 ` Bernd Schubert
1 sibling, 0 replies; 40+ messages in thread
From: Bernd Schubert @ 2024-02-02 14:47 UTC (permalink / raw)
To: Miklos Szeredi
Cc: Bernd Schubert, linux-fsdevel@vger.kernel.org, Dharmendra Singh,
Hao Xu, stable@vger.kernel.org, Amir Goldstein
On 2/1/24 16:43, Miklos Szeredi wrote:
> On Thu, 1 Feb 2024 at 16:40, Bernd Schubert <bschubert@ddn.com> wrote:
>
>> Given
>> -N numops: total # operations to do (default infinity)
>>
>> How long do you think I should run it? Maybe something like 3 hours
>> (--duration=$(3 * 60 * 60))?
>
> I used -N1000000. If there were any issues they usually triggered much earlier.
Forgot to post, it succeeds both, with and without FOPEN_DIRECT_IO with
bernd@squeeze1 ~>/home/fusetests/src/xfstests-dev/ltp/fsx -N1000000
/scratch/dest/testfile
Seed set to 1
main: filesystem does not support fallocate mode FALLOC_FL_KEEP_SIZE,
disabling!
main: filesystem does not support fallocate mode FALLOC_FL_PUNCH_HOLE |
FALLOC_FL_KEEP_SIZE, disabling!
main: filesystem does not support fallocate mode FALLOC_FL_ZERO_RANGE,
disabling!
main: filesystem does not support fallocate mode
FALLOC_FL_COLLAPSE_RANGE, disabling!
main: filesystem does not support fallocate mode FALLOC_FL_INSERT_RANGE,
disabling!
main: filesystem does not support clone range, disabling!
main: filesystem does not support dedupe range, disabling!
main: filesystem does not support exchange range, disabling!
truncating to largest ever: 0x3aea7
copying to largest ever: 0x3e19b
copying to largest ever: 0x3e343
fallocating to largest ever: 0x40000
skipping zero length fallocate
skipping zero size write
All 1000000 operations completed A-OK!
(I always tested the entire patch series).
Thanks,
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock
2024-01-31 23:08 [PATCH v2 0/5] fuse: inode IO modes and mmap Bernd Schubert
2024-01-31 23:08 ` [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Bernd Schubert
@ 2024-01-31 23:08 ` Bernd Schubert
2024-02-06 9:20 ` Jingbo Xu
2024-01-31 23:08 ` [PATCH v2 3/5] fuse: Add fuse_dio_lock/unlock helper functions Bernd Schubert
` (3 subsequent siblings)
5 siblings, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-01-31 23:08 UTC (permalink / raw)
To: miklos; +Cc: linux-fsdevel, dsingh, Bernd Schubert, Amir Goldstein
This makes the code a bit easier to read and allows to more
easily add more conditions when an exclusive lock is needed.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/file.c | 64 +++++++++++++++++++++++++++++++++++---------------
1 file changed, 45 insertions(+), 19 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 243f469cac07..0c4d93293eac 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1299,6 +1299,45 @@ static ssize_t fuse_perform_write(struct kiocb *iocb, struct iov_iter *ii)
return res;
}
+static bool fuse_io_past_eof(struct kiocb *iocb, struct iov_iter *iter)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
+}
+
+/*
+ * @return true if an exclusive lock for direct IO writes is needed
+ */
+static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from)
+{
+ struct file *file = iocb->ki_filp;
+ struct fuse_file *ff = file->private_data;
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ /* server side has to advise that it supports parallel dio writes */
+ if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
+ return true;
+
+ /* append will need to know the eventual eof - always needs an
+ * exclusive lock
+ */
+ if (iocb->ki_flags & IOCB_APPEND)
+ return true;
+
+ /* combination opf page access and direct-io difficult, shared
+ * locks actually introduce a conflict.
+ */
+ if (get_fuse_conn(inode)->direct_io_allow_mmap)
+ return true;
+
+ /* parallel dio beyond eof is at least for now not supported */
+ if (fuse_io_past_eof(iocb, from))
+ return true;
+
+ return false;
+}
+
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
@@ -1558,26 +1597,12 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
return res;
}
-static bool fuse_direct_write_extending_i_size(struct kiocb *iocb,
- struct iov_iter *iter)
-{
- struct inode *inode = file_inode(iocb->ki_filp);
-
- return iocb->ki_pos + iov_iter_count(iter) > i_size_read(inode);
-}
-
static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct inode *inode = file_inode(iocb->ki_filp);
- struct file *file = iocb->ki_filp;
- struct fuse_file *ff = file->private_data;
struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
ssize_t res;
- bool exclusive_lock =
- !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
- get_fuse_conn(inode)->direct_io_allow_mmap ||
- iocb->ki_flags & IOCB_APPEND ||
- fuse_direct_write_extending_i_size(iocb, from);
+ bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
/*
* Take exclusive lock if
@@ -1591,10 +1616,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
else {
inode_lock_shared(inode);
- /* A race with truncate might have come up as the decision for
- * the lock type was done without holding the lock, check again.
+ /*
+ * Previous check was without any lock and might have raced.
*/
- if (fuse_direct_write_extending_i_size(iocb, from)) {
+ if (fuse_dio_wr_exclusive_lock(iocb, from)) {
inode_unlock_shared(inode);
inode_lock(inode);
exclusive_lock = true;
@@ -2468,7 +2493,8 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
return fuse_dax_mmap(file, vma);
if (ff->open_flags & FOPEN_DIRECT_IO) {
- /* Can't provide the coherency needed for MAP_SHARED
+ /*
+ * Can't provide the coherency needed for MAP_SHARED
* if FUSE_DIRECT_IO_ALLOW_MMAP isn't set.
*/
if ((vma->vm_flags & VM_MAYSHARE) && !fc->direct_io_allow_mmap)
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock
2024-01-31 23:08 ` [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock Bernd Schubert
@ 2024-02-06 9:20 ` Jingbo Xu
2024-02-07 13:38 ` Bernd Schubert
0 siblings, 1 reply; 40+ messages in thread
From: Jingbo Xu @ 2024-02-06 9:20 UTC (permalink / raw)
To: Bernd Schubert, miklos; +Cc: linux-fsdevel, dsingh, Amir Goldstein
On 2/1/24 7:08 AM, Bernd Schubert wrote:
> @@ -1591,10 +1616,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
> else {
> inode_lock_shared(inode);
>
> - /* A race with truncate might have come up as the decision for
> - * the lock type was done without holding the lock, check again.
> + /*
> + * Previous check was without any lock and might have raced.
> */
> - if (fuse_direct_write_extending_i_size(iocb, from)) {
> + if (fuse_dio_wr_exclusive_lock(iocb, from)) {
^
The overall is good. Maybe fuse_io_past_eof() is better to make it a
solely cleanup or refactoring. Actually it's already changed back to
fuse_io_past_eof() in patch 3/5.
--
Thanks,
Jingbo
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock
2024-02-06 9:20 ` Jingbo Xu
@ 2024-02-07 13:38 ` Bernd Schubert
2024-02-07 13:44 ` Jingbo Xu
0 siblings, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-02-07 13:38 UTC (permalink / raw)
To: Jingbo Xu, Bernd Schubert, miklos; +Cc: linux-fsdevel, dsingh, Amir Goldstein
On 2/6/24 10:20, Jingbo Xu wrote:
>
>
> On 2/1/24 7:08 AM, Bernd Schubert wrote:
>> @@ -1591,10 +1616,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>> else {
>> inode_lock_shared(inode);
>>
>> - /* A race with truncate might have come up as the decision for
>> - * the lock type was done without holding the lock, check again.
>> + /*
>> + * Previous check was without any lock and might have raced.
>> */
>> - if (fuse_direct_write_extending_i_size(iocb, from)) {
>> + if (fuse_dio_wr_exclusive_lock(iocb, from)) {
> ^
>
> The overall is good. Maybe fuse_io_past_eof() is better to make it a
> solely cleanup or refactoring. Actually it's already changed back to
> fuse_io_past_eof() in patch 3/5.
So I'm bit confused what you would like to see improved. Patch 2/5
renames "fuse_direct_write_extending_i_size" to "fuse_io_past_eof" and
also moves it up in the file. (The latter is a preparation for my
direct-write consolidation patches.). It also creates the helper
function fuse_dio_wr_exclusive_lock(). None of that is changed in 3/5,
which just moves the locking/unlocking from fuse_cache_write_iter() into
the functions fuse_dio_lock/fuse_dio_unlock.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock
2024-02-07 13:38 ` Bernd Schubert
@ 2024-02-07 13:44 ` Jingbo Xu
2024-02-07 14:13 ` Bernd Schubert
0 siblings, 1 reply; 40+ messages in thread
From: Jingbo Xu @ 2024-02-07 13:44 UTC (permalink / raw)
To: Bernd Schubert, Bernd Schubert, miklos
Cc: linux-fsdevel, dsingh, Amir Goldstein
On 2/7/24 9:38 PM, Bernd Schubert wrote:
>
>
> On 2/6/24 10:20, Jingbo Xu wrote:
>>
>>
>> On 2/1/24 7:08 AM, Bernd Schubert wrote:
>>> @@ -1591,10 +1616,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>> else {
>>> inode_lock_shared(inode);
>>>
>>> - /* A race with truncate might have come up as the decision for
>>> - * the lock type was done without holding the lock, check again.
>>> + /*
>>> + * Previous check was without any lock and might have raced.
>>> */
>>> - if (fuse_direct_write_extending_i_size(iocb, from)) {
>>> + if (fuse_dio_wr_exclusive_lock(iocb, from)) {
>> ^
I mean, in patch 2/5
> - if (fuse_direct_write_extending_i_size(iocb, from)) {
> + if (fuse_io_past_eof(iocb, from)) {
is better, otherwise it's not an equivalent change.
Thanks,
Jingbo
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock
2024-02-07 13:44 ` Jingbo Xu
@ 2024-02-07 14:13 ` Bernd Schubert
2024-02-08 0:04 ` Jingbo Xu
0 siblings, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-02-07 14:13 UTC (permalink / raw)
To: Jingbo Xu, Bernd Schubert, miklos; +Cc: linux-fsdevel, dsingh, Amir Goldstein
On 2/7/24 14:44, Jingbo Xu wrote:
>
>
> On 2/7/24 9:38 PM, Bernd Schubert wrote:
>>
>>
>> On 2/6/24 10:20, Jingbo Xu wrote:
>>>
>>>
>>> On 2/1/24 7:08 AM, Bernd Schubert wrote:
>>>> @@ -1591,10 +1616,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>>> else {
>>>> inode_lock_shared(inode);
>>>>
>>>> - /* A race with truncate might have come up as the decision for
>>>> - * the lock type was done without holding the lock, check again.
>>>> + /*
>>>> + * Previous check was without any lock and might have raced.
>>>> */
>
>
>>>> - if (fuse_direct_write_extending_i_size(iocb, from)) {
>>>> + if (fuse_dio_wr_exclusive_lock(iocb, from)) {
>>> ^
>
> I mean, in patch 2/5
>
>> - if (fuse_direct_write_extending_i_size(iocb, from)) {
>> + if (fuse_io_past_eof(iocb, from)) {
>
> is better, otherwise it's not an equivalent change.
Ah thanks, good catch! Now I see what you mean. Yeah, we can switch to
fuse_io_past_eof() here. And yeah, 3/5 changes it back.
Fortunately there is actually not much harm, as
fuse_dio_wr_exclusive_lock also calls into fuse_io_past_eof.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock
2024-02-07 14:13 ` Bernd Schubert
@ 2024-02-08 0:04 ` Jingbo Xu
0 siblings, 0 replies; 40+ messages in thread
From: Jingbo Xu @ 2024-02-08 0:04 UTC (permalink / raw)
To: Bernd Schubert, Bernd Schubert, miklos
Cc: linux-fsdevel, dsingh, Amir Goldstein
On 2/7/24 10:13 PM, Bernd Schubert wrote:
>
>
> On 2/7/24 14:44, Jingbo Xu wrote:
>>
>>
>> On 2/7/24 9:38 PM, Bernd Schubert wrote:
>>>
>>>
>>> On 2/6/24 10:20, Jingbo Xu wrote:
>>>>
>>>>
>>>> On 2/1/24 7:08 AM, Bernd Schubert wrote:
>>>>> @@ -1591,10 +1616,10 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
>>>>> else {
>>>>> inode_lock_shared(inode);
>>>>>
>>>>> - /* A race with truncate might have come up as the decision for
>>>>> - * the lock type was done without holding the lock, check again.
>>>>> + /*
>>>>> + * Previous check was without any lock and might have raced.
>>>>> */
>>
>>
>>>>> - if (fuse_direct_write_extending_i_size(iocb, from)) {
>>>>> + if (fuse_dio_wr_exclusive_lock(iocb, from)) {
>>>> ^
>>
>> I mean, in patch 2/5
>>
>>> - if (fuse_direct_write_extending_i_size(iocb, from)) {
>>> + if (fuse_io_past_eof(iocb, from)) {
>>
>> is better, otherwise it's not an equivalent change.
>
> Ah thanks, good catch! Now I see what you mean. Yeah, we can switch to
> fuse_io_past_eof() here. And yeah, 3/5 changes it back.
> Fortunately there is actually not much harm, as
> fuse_dio_wr_exclusive_lock also calls into fuse_io_past_eof.
>
Yeah fortunately we needn't retest it as patch 3/5 changes it back.
The whole series is good. The comment is just from per-patch basis.
Thanks,
Jingbo
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v2 3/5] fuse: Add fuse_dio_lock/unlock helper functions
2024-01-31 23:08 [PATCH v2 0/5] fuse: inode IO modes and mmap Bernd Schubert
2024-01-31 23:08 ` [PATCH v2 1/5] fuse: Fix VM_MAYSHARE and direct_io_allow_mmap Bernd Schubert
2024-01-31 23:08 ` [PATCH v2 2/5] fuse: Create helper function if DIO write needs exclusive lock Bernd Schubert
@ 2024-01-31 23:08 ` Bernd Schubert
2024-01-31 23:08 ` [PATCH v2 4/5] fuse: prepare for failing open response Bernd Schubert
` (2 subsequent siblings)
5 siblings, 0 replies; 40+ messages in thread
From: Bernd Schubert @ 2024-01-31 23:08 UTC (permalink / raw)
To: miklos; +Cc: linux-fsdevel, dsingh, Bernd Schubert, Amir Goldstein
So far this is just a helper to remove complex locking
logic out of fuse_direct_write_iter. Especially needed
by the next patch in the series to that adds the fuse inode
cache IO mode and adds in even more locking complexity.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/file.c | 61 ++++++++++++++++++++++++++++----------------------
1 file changed, 34 insertions(+), 27 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0c4d93293eac..3062f4b5a34b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1338,6 +1338,37 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
return false;
}
+static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
+ bool *exclusive)
+{
+ struct inode *inode = file_inode(iocb->ki_filp);
+
+ *exclusive = fuse_dio_wr_exclusive_lock(iocb, from);
+ if (*exclusive) {
+ inode_lock(inode);
+ } else {
+ inode_lock_shared(inode);
+ /*
+ * Previous check was without inode lock and might have raced,
+ * check again.
+ */
+ if (fuse_io_past_eof(iocb, from)) {
+ inode_unlock_shared(inode);
+ inode_lock(inode);
+ *exclusive = true;
+ }
+ }
+}
+
+static void fuse_dio_unlock(struct inode *inode, bool exclusive)
+{
+ if (exclusive) {
+ inode_unlock(inode);
+ } else {
+ inode_unlock_shared(inode);
+ }
+}
+
static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
{
struct file *file = iocb->ki_filp;
@@ -1602,30 +1633,9 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
struct inode *inode = file_inode(iocb->ki_filp);
struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
ssize_t res;
- bool exclusive_lock = fuse_dio_wr_exclusive_lock(iocb, from);
-
- /*
- * Take exclusive lock if
- * - Parallel direct writes are disabled - a user space decision
- * - Parallel direct writes are enabled and i_size is being extended.
- * - Shared mmap on direct_io file is supported (FUSE_DIRECT_IO_ALLOW_MMAP).
- * This might not be needed at all, but needs further investigation.
- */
- if (exclusive_lock)
- inode_lock(inode);
- else {
- inode_lock_shared(inode);
-
- /*
- * Previous check was without any lock and might have raced.
- */
- if (fuse_dio_wr_exclusive_lock(iocb, from)) {
- inode_unlock_shared(inode);
- inode_lock(inode);
- exclusive_lock = true;
- }
- }
+ bool exclusive;
+ fuse_dio_lock(iocb, from, &exclusive);
res = generic_write_checks(iocb, from);
if (res > 0) {
if (!is_sync_kiocb(iocb) && iocb->ki_flags & IOCB_DIRECT) {
@@ -1636,10 +1646,7 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
fuse_write_update_attr(inode, iocb->ki_pos, res);
}
}
- if (exclusive_lock)
- inode_unlock(inode);
- else
- inode_unlock_shared(inode);
+ fuse_dio_unlock(inode, exclusive);
return res;
}
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread* [PATCH v2 4/5] fuse: prepare for failing open response
2024-01-31 23:08 [PATCH v2 0/5] fuse: inode IO modes and mmap Bernd Schubert
` (2 preceding siblings ...)
2024-01-31 23:08 ` [PATCH v2 3/5] fuse: Add fuse_dio_lock/unlock helper functions Bernd Schubert
@ 2024-01-31 23:08 ` Bernd Schubert
2024-02-01 9:23 ` Miklos Szeredi
2024-02-02 15:55 ` Miklos Szeredi
2024-01-31 23:08 ` [PATCH v2 5/5] fuse: introduce inode io modes Bernd Schubert
2024-02-01 10:30 ` [PATCH v2 0/5] fuse: inode IO modes and mmap Amir Goldstein
5 siblings, 2 replies; 40+ messages in thread
From: Bernd Schubert @ 2024-01-31 23:08 UTC (permalink / raw)
To: miklos; +Cc: linux-fsdevel, dsingh, Amir Goldstein, Bernd Schubert
From: Amir Goldstein <amir73il@gmail.com>
In preparation for inode io modes, a server open response could fail
due to conflicting inode io modes.
Allow returning an error from fuse_finish_open() and handle the error in
the callers. fuse_dir_open() can now call fuse_sync_release(), so handle
the isdir case correctly.
fuse_finish_open() is used as the callback of finish_open(), so that
FMODE_OPENED will not be set if fuse_finish_open() fails.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
---
fs/fuse/dir.c | 8 +++++---
fs/fuse/file.c | 18 ++++++++++++------
fs/fuse/fuse_i.h | 2 +-
3 files changed, 18 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d19cbf34c634..d45d4a678351 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -692,13 +692,15 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
d_instantiate(entry, inode);
fuse_change_entry_timeout(entry, &outentry);
fuse_dir_changed(dir);
- err = finish_open(file, entry, generic_file_open);
+ err = generic_file_open(inode, file);
+ if (!err) {
+ file->private_data = ff;
+ err = finish_open(file, entry, fuse_finish_open);
+ }
if (err) {
fi = get_fuse_inode(inode);
fuse_sync_release(fi, ff, flags);
} else {
- file->private_data = ff;
- fuse_finish_open(inode, file);
if (fm->fc->atomic_o_trunc && trunc)
truncate_pagecache(inode, 0);
else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 3062f4b5a34b..7d2f4b0eb36a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -195,7 +195,7 @@ static void fuse_link_write_file(struct file *file)
spin_unlock(&fi->lock);
}
-void fuse_finish_open(struct inode *inode, struct file *file)
+int fuse_finish_open(struct inode *inode, struct file *file)
{
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = get_fuse_conn(inode);
@@ -217,12 +217,16 @@ void fuse_finish_open(struct inode *inode, struct file *file)
}
if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache)
fuse_link_write_file(file);
+
+ return 0;
}
int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
{
struct fuse_mount *fm = get_fuse_mount(inode);
+ struct fuse_inode *fi = get_fuse_inode(inode);
struct fuse_conn *fc = fm->fc;
+ struct fuse_file *ff;
int err;
bool is_wb_truncate = (file->f_flags & O_TRUNC) &&
fc->atomic_o_trunc &&
@@ -251,14 +255,16 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
fuse_set_nowrite(inode);
err = fuse_do_open(fm, get_node_id(inode), file, isdir);
- if (!err)
- fuse_finish_open(inode, file);
+ if (!err) {
+ ff = file->private_data;
+ err = fuse_finish_open(inode, file);
+ if (err)
+ fuse_sync_release(fi, ff, file->f_flags);
+ }
if (is_wb_truncate || dax_truncate)
fuse_release_nowrite(inode);
if (!err) {
- struct fuse_file *ff = file->private_data;
-
if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC))
truncate_pagecache(inode, 0);
else if (!(ff->open_flags & FOPEN_KEEP_CACHE))
@@ -368,7 +374,7 @@ void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
* iput(NULL) is a no-op and since the refcount is 1 and everything's
* synchronous, we are fine with not doing igrab() here"
*/
- fuse_file_put(ff, true, false);
+ fuse_file_put(ff, true, fi && S_ISDIR(fi->inode.i_mode));
}
EXPORT_SYMBOL_GPL(fuse_sync_release);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1df83eebda92..1c0cde4022f0 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1038,7 +1038,7 @@ int fuse_open_common(struct inode *inode, struct file *file, bool isdir);
struct fuse_file *fuse_file_alloc(struct fuse_mount *fm);
void fuse_file_free(struct fuse_file *ff);
-void fuse_finish_open(struct inode *inode, struct file *file);
+int fuse_finish_open(struct inode *inode, struct file *file);
void fuse_sync_release(struct fuse_inode *fi, struct fuse_file *ff,
unsigned int flags);
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-01-31 23:08 ` [PATCH v2 4/5] fuse: prepare for failing open response Bernd Schubert
@ 2024-02-01 9:23 ` Miklos Szeredi
2024-02-01 10:16 ` Amir Goldstein
2024-02-02 15:55 ` Miklos Szeredi
1 sibling, 1 reply; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-01 9:23 UTC (permalink / raw)
To: Bernd Schubert; +Cc: linux-fsdevel, dsingh, Amir Goldstein
On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
>
> From: Amir Goldstein <amir73il@gmail.com>
>
> In preparation for inode io modes, a server open response could fail
> due to conflicting inode io modes.
>
> Allow returning an error from fuse_finish_open() and handle the error in
> the callers. fuse_dir_open() can now call fuse_sync_release(), so handle
> the isdir case correctly.
While that's true, it may be better to just decouple the dir/regular
paths completely, since there isn't much sharing anyway and becoming
even less.
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index d19cbf34c634..d45d4a678351 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -692,13 +692,15 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> d_instantiate(entry, inode);
> fuse_change_entry_timeout(entry, &outentry);
> fuse_dir_changed(dir);
> - err = finish_open(file, entry, generic_file_open);
> + err = generic_file_open(inode, file);
> + if (!err) {
> + file->private_data = ff;
> + err = finish_open(file, entry, fuse_finish_open);
Need to be careful with moving fuse_finish_open() call inside
finish_open() since various fields will be different.
In particular O_TRUNC in f_flags will not be cleared and in this case
it looks undesirable.
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-02-01 9:23 ` Miklos Szeredi
@ 2024-02-01 10:16 ` Amir Goldstein
2024-02-01 10:28 ` Miklos Szeredi
0 siblings, 1 reply; 40+ messages in thread
From: Amir Goldstein @ 2024-02-01 10:16 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Thu, Feb 1, 2024 at 11:23 AM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
> >
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > In preparation for inode io modes, a server open response could fail
> > due to conflicting inode io modes.
> >
> > Allow returning an error from fuse_finish_open() and handle the error in
> > the callers. fuse_dir_open() can now call fuse_sync_release(), so handle
> > the isdir case correctly.
>
> While that's true, it may be better to just decouple the dir/regular
> paths completely, since there isn't much sharing anyway and becoming
> even less.
>
I can look into it, but for now the fix to fuse_sync_release() is a simple
one liner, so I would rather limit the changes in this series.
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index d19cbf34c634..d45d4a678351 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -692,13 +692,15 @@ static int fuse_create_open(struct inode *dir, struct dentry *entry,
> > d_instantiate(entry, inode);
> > fuse_change_entry_timeout(entry, &outentry);
> > fuse_dir_changed(dir);
> > - err = finish_open(file, entry, generic_file_open);
> > + err = generic_file_open(inode, file);
> > + if (!err) {
> > + file->private_data = ff;
> > + err = finish_open(file, entry, fuse_finish_open);
>
> Need to be careful with moving fuse_finish_open() call inside
> finish_open() since various fields will be different.
>
> In particular O_TRUNC in f_flags will not be cleared and in this case
> it looks undesirable.
Why? coming from fuse_open_common(), fuse_finish_open() is
called before clearing O_TRUNC.
Is fuse_finish_open() supposed to be called after clearing O_TRUNC
in fuse_create_open()?
I realize that this is what the code is doing in upstream, but it does not
look correct.
Probably, nobody could notice it, because server would probably have
truncated the file before the CREATE response anyway?
Am I missing something?
Thanks,
Amir.
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-02-01 10:16 ` Amir Goldstein
@ 2024-02-01 10:28 ` Miklos Szeredi
2024-02-01 10:41 ` Amir Goldstein
0 siblings, 1 reply; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-01 10:28 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Thu, 1 Feb 2024 at 11:16, Amir Goldstein <amir73il@gmail.com> wrote:
> I can look into it, but for now the fix to fuse_sync_release() is a simple
> one liner, so I would rather limit the changes in this series.
Not urgent, but it might be a good idea to add a cleanup patch as a
prep, which would make this patch just that one line less complex.
> Is fuse_finish_open() supposed to be called after clearing O_TRUNC
> in fuse_create_open()?
This will invalidate size/modification time, which we've just updated
with the correct post open values.
> I realize that this is what the code is doing in upstream, but it does not
> look correct.
I think it's correct, because it deals with the effect of
FUSE_OPEN/O_TRUNC on attributes that weren't refreshed in contrast to
fuse_create_open() where the attributes were refreshed.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-02-01 10:28 ` Miklos Szeredi
@ 2024-02-01 10:41 ` Amir Goldstein
2024-02-01 10:51 ` Miklos Szeredi
0 siblings, 1 reply; 40+ messages in thread
From: Amir Goldstein @ 2024-02-01 10:41 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Thu, Feb 1, 2024 at 12:29 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 1 Feb 2024 at 11:16, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > I can look into it, but for now the fix to fuse_sync_release() is a simple
> > one liner, so I would rather limit the changes in this series.
>
> Not urgent, but it might be a good idea to add a cleanup patch as a
> prep, which would make this patch just that one line less complex.
>
I will see if I can get something done quickly.
> > Is fuse_finish_open() supposed to be called after clearing O_TRUNC
> > in fuse_create_open()?
>
> This will invalidate size/modification time, which we've just updated
> with the correct post open values.
>
> > I realize that this is what the code is doing in upstream, but it does not
> > look correct.
>
> I think it's correct, because it deals with the effect of
> FUSE_OPEN/O_TRUNC on attributes that weren't refreshed in contrast to
> fuse_create_open() where the attributes were refreshed.
>
I was considering splitting fuse_finish_open() to the first part that
can fail and the "finally" part that deals with attributes, but seeing
that this entire chunk of atomic_o_trunc code in fuse_finish_open()
is never relevant to atomic_open(), I'd rather just move it out
into fuse_open_common() which has loads of other code related to
atomic_o_trunc anyway?
Thanks,
Amir.
> Thanks,
> Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-02-01 10:41 ` Amir Goldstein
@ 2024-02-01 10:51 ` Miklos Szeredi
2024-02-01 16:46 ` Amir Goldstein
0 siblings, 1 reply; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-01 10:51 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Thu, 1 Feb 2024 at 11:41, Amir Goldstein <amir73il@gmail.com> wrote:
> I was considering splitting fuse_finish_open() to the first part that
> can fail and the "finally" part that deals with attributes, but seeing
> that this entire chunk of atomic_o_trunc code in fuse_finish_open()
> is never relevant to atomic_open(), I'd rather just move it out
> into fuse_open_common() which has loads of other code related to
> atomic_o_trunc anyway?
Yep.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-02-01 10:51 ` Miklos Szeredi
@ 2024-02-01 16:46 ` Amir Goldstein
2024-02-02 12:03 ` Amir Goldstein
0 siblings, 1 reply; 40+ messages in thread
From: Amir Goldstein @ 2024-02-01 16:46 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Thu, Feb 1, 2024 at 12:51 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 1 Feb 2024 at 11:41, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > I was considering splitting fuse_finish_open() to the first part that
> > can fail and the "finally" part that deals with attributes, but seeing
> > that this entire chunk of atomic_o_trunc code in fuse_finish_open()
> > is never relevant to atomic_open(), I'd rather just move it out
> > into fuse_open_common() which has loads of other code related to
> > atomic_o_trunc anyway?
>
> Yep.
FWIW, I pushed some cleanups to:
https://github.com/amir73il/linux/commits/fuse_io_mode-wip/
* e71b0c0356c8 - (github/fuse_io_mode-wip) fuse: introduce inode io modes
* 081ddd63a9ff - fuse: prepare for failing open response
* 437b84a47a8a - fuse: allocate ff->release_args only if release is needed
* e2df18f9a3d6 - fuse: factor out helper fuse_truncate_update_attr()
e2df18f9a3d6 is the O_TRUNC change discussed above.
437b84a47a8a gets rid of the isdir argument to fuse_file_put(), so this
one liner that you disliked is gone.
I will see if I can also get the opendir separation cleanup done.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-02-01 16:46 ` Amir Goldstein
@ 2024-02-02 12:03 ` Amir Goldstein
2024-02-02 15:49 ` Miklos Szeredi
0 siblings, 1 reply; 40+ messages in thread
From: Amir Goldstein @ 2024-02-02 12:03 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Thu, Feb 1, 2024 at 6:46 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Feb 1, 2024 at 12:51 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Thu, 1 Feb 2024 at 11:41, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > > I was considering splitting fuse_finish_open() to the first part that
> > > can fail and the "finally" part that deals with attributes, but seeing
> > > that this entire chunk of atomic_o_trunc code in fuse_finish_open()
> > > is never relevant to atomic_open(), I'd rather just move it out
> > > into fuse_open_common() which has loads of other code related to
> > > atomic_o_trunc anyway?
> >
> > Yep.
>
> FWIW, I pushed some cleanups to:
>
> https://github.com/amir73il/linux/commits/fuse_io_mode-wip/
>
> * e71b0c0356c8 - (github/fuse_io_mode-wip) fuse: introduce inode io modes
> * 081ddd63a9ff - fuse: prepare for failing open response
> * 437b84a47a8a - fuse: allocate ff->release_args only if release is needed
> * e2df18f9a3d6 - fuse: factor out helper fuse_truncate_update_attr()
>
> e2df18f9a3d6 is the O_TRUNC change discussed above.
> 437b84a47a8a gets rid of the isdir argument to fuse_file_put(), so this
> one liner that you disliked is gone.
>
> I will see if I can also get the opendir separation cleanup done.
>
This is what I have in WIP branch - not sure if that is what you meant:
* 285a83f439d8 - (github/fuse_io_mode-wip) fuse: introduce inode io modes
* cf7e1707a319 - fuse: break up fuse_open_common()
* d8fcee9252ca - fuse: prepare for failing open response
* 5e4786da5d6e - fuse: allocate ff->release_args only if release is needed
* 64a6a415239c - fuse: factor out helper fuse_truncate_update_attr()
Thanks,
Amir.
commit cf7e1707a31990ed5df1294047909fce60cc3ec1
Author: Amir Goldstein <amir73il@gmail.com>
Date: Fri Feb 2 13:30:30 2024 +0200
fuse: break up fuse_open_common()
fuse_open_common() has a lot of code relevant only for regular files and
O_TRUNC in particular.
Copy the little bit of remaining code into fuse_dir_open() and stop using
this common helper for directory open.
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 27daf0bf84ad..3498255402fe 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1632,7 +1632,27 @@ static const char *fuse_get_link(struct dentry
*dentry, struct inode *inode,
static int fuse_dir_open(struct inode *inode, struct file *file)
{
- return fuse_open_common(inode, file, true);
+ struct fuse_mount *fm = get_fuse_mount(inode);
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ int err;
+
+ if (fuse_is_bad(inode))
+ return -EIO;
+
+ err = generic_file_open(inode, file);
+ if (err)
+ return err;
+
+ err = fuse_do_open(fm, get_node_id(inode), file, true);
+ if (!err) {
+ struct fuse_file *ff = file->private_data;
+
+ err = fuse_finish_open(inode, file);
+ if (err)
+ fuse_sync_release(fi, ff, file->f_flags);
+ }
+
+ return err;
}
static int fuse_dir_release(struct inode *inode, struct file *file)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 891bfa8a6724..1d6b3499c069 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -229,7 +229,7 @@ static void fuse_truncate_update_attr(struct inode
*inode, struct file *file)
fuse_invalidate_attr_mask(inode, FUSE_STATX_MODSIZE);
}
-int fuse_open_common(struct inode *inode, struct file *file, bool isdir)
+int fuse_open(struct inode *inode, struct file *file)
{
struct fuse_mount *fm = get_fuse_mount(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
@@ -260,7 +260,7 @@ int fuse_open_common(struct inode *inode, struct
file *file, bool isdir)
if (is_wb_truncate || dax_truncate)
fuse_set_nowrite(inode);
- err = fuse_do_open(fm, get_node_id(inode), file, isdir);
+ err = fuse_do_open(fm, get_node_id(inode), file, false);
if (!err) {
ff = file->private_data;
err = fuse_finish_open(inode, file);
@@ -359,11 +359,6 @@ void fuse_release_common(struct file *file, bool isdir)
(fl_owner_t) file, isdir);
}
-static int fuse_open(struct inode *inode, struct file *file)
-{
- return fuse_open_common(inode, file, false);
-}
-
static int fuse_release(struct inode *inode, struct file *file)
{
struct fuse_conn *fc = get_fuse_conn(inode);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 536b4515c2c8..9ad5f882bd0a 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -1034,8 +1034,6 @@ void fuse_read_args_fill(struct fuse_io_args
*ia, struct file *file, loff_t pos,
/**
* Send OPEN or OPENDIR request
*/
-int fuse_open_common(struct inode *inode, struct file *file, bool isdir);
-
struct fuse_file *fuse_file_alloc(struct fuse_mount *fm, bool release);
void fuse_file_free(struct fuse_file *ff);
int fuse_finish_open(struct inode *inode, struct file *file);
--
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-02-02 12:03 ` Amir Goldstein
@ 2024-02-02 15:49 ` Miklos Szeredi
0 siblings, 0 replies; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-02 15:49 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Fri, 2 Feb 2024 at 13:03, Amir Goldstein <amir73il@gmail.com> wrote:
> static int fuse_dir_open(struct inode *inode, struct file *file)
> {
> - return fuse_open_common(inode, file, true);
> + struct fuse_mount *fm = get_fuse_mount(inode);
> + struct fuse_inode *fi = get_fuse_inode(inode);
> + int err;
> +
> + if (fuse_is_bad(inode))
> + return -EIO;
> +
> + err = generic_file_open(inode, file);
> + if (err)
> + return err;
> +
> + err = fuse_do_open(fm, get_node_id(inode), file, true);
> + if (!err) {
> + struct fuse_file *ff = file->private_data;
> +
> + err = fuse_finish_open(inode, file);
I'd prefer fuse_finish_open() to be expanded as well. FMODE_WRITE is
always false for directories. The other two FOPEN_ flags don't make
sense for directories, but let's just leave them for a later cleanup.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-01-31 23:08 ` [PATCH v2 4/5] fuse: prepare for failing open response Bernd Schubert
2024-02-01 9:23 ` Miklos Szeredi
@ 2024-02-02 15:55 ` Miklos Szeredi
2024-02-02 16:07 ` Amir Goldstein
1 sibling, 1 reply; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-02 15:55 UTC (permalink / raw)
To: Bernd Schubert; +Cc: linux-fsdevel, dsingh, Amir Goldstein
On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
>
> From: Amir Goldstein <amir73il@gmail.com>
>
> In preparation for inode io modes, a server open response could fail
> due to conflicting inode io modes.
>
> Allow returning an error from fuse_finish_open() and handle the error in
> the callers. fuse_dir_open() can now call fuse_sync_release(), so handle
> the isdir case correctly.
Another question: will fuse_finish_open() fail for any reason other
than due to an interrupted wait?
If not, then maybe this belongs into a separate update which deals
with killability/interruptibility in a comprehensive way.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 4/5] fuse: prepare for failing open response
2024-02-02 15:55 ` Miklos Szeredi
@ 2024-02-02 16:07 ` Amir Goldstein
0 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2024-02-02 16:07 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Fri, Feb 2, 2024 at 5:55 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
> >
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > In preparation for inode io modes, a server open response could fail
> > due to conflicting inode io modes.
> >
> > Allow returning an error from fuse_finish_open() and handle the error in
> > the callers. fuse_dir_open() can now call fuse_sync_release(), so handle
> > the isdir case correctly.
>
> Another question: will fuse_finish_open() fail for any reason other
> than due to an interrupted wait?
>
Yes. Definitely. With fuse passthrough.
Wait for parallel dio is Benrd's share of the patch set.
My interest was always to deny cached open and passthrough
open of the same inode and passthrough file open to a conflicting backing file
as we have agreed [1]. See this passthough open commit for example:
https://github.com/amir73il/linux/commit/9422b02931ca4be5471230645a2b4a6723f99d0e
I will also split fuse_finish_dir_open() as you suggested.
Thanks,
Amir.
[1] https://lore.kernel.org/linux-fsdevel/CAJfpegsoHtp_VthZRGfcoBREZ0pveb4wYYiKVEnCxaTgGEaeWw@mail.gmail.com/
^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v2 5/5] fuse: introduce inode io modes
2024-01-31 23:08 [PATCH v2 0/5] fuse: inode IO modes and mmap Bernd Schubert
` (3 preceding siblings ...)
2024-01-31 23:08 ` [PATCH v2 4/5] fuse: prepare for failing open response Bernd Schubert
@ 2024-01-31 23:08 ` Bernd Schubert
2024-02-01 14:47 ` Miklos Szeredi
2024-02-01 10:30 ` [PATCH v2 0/5] fuse: inode IO modes and mmap Amir Goldstein
5 siblings, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-01-31 23:08 UTC (permalink / raw)
To: miklos; +Cc: linux-fsdevel, dsingh, Amir Goldstein, Bernd Schubert
From: Amir Goldstein <amir73il@gmail.com>
The fuse inode io mode is determined by the mode of its open files/mmaps
and parallel dio.
- caching io mode - files open in caching mode or mmap on direct_io file
- direct io mode - no files open in caching mode and no files mmaped
- parallel dio mode - direct io mode with parallel dio in progress
We use a new FOPEN_CACHE_IO flag to explicitly mark a file that was open
in caching mode.
direct_io mmap uses page cache, so first mmap will mark the file as
FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter
the caching io mode.
If the server opens the file with flags FOPEN_DIRECT_IO|FOPEN_CACHE_IO,
the inode enters caching io mode already on open.
This allows executing parallel dio when inode is not in caching mode
even if shared mmap is allowed, but no mmaps have been performed on
the inode in question.
An open in caching mode and mmap on direct_io file now waits for all
in-progress parallel dio writes to complete, so paralle dio writes
together with FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
fs/fuse/file.c | 215 ++++++++++++++++++++++++++++++++++++--
fs/fuse/fuse_i.h | 79 +++++++++++++-
include/uapi/linux/fuse.h | 2 +
3 files changed, 286 insertions(+), 10 deletions(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7d2f4b0eb36a..eb9929ff9f60 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -105,10 +105,177 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
kfree(ra);
}
+static bool fuse_file_is_direct_io(struct file *file)
+{
+ struct fuse_file *ff = file->private_data;
+
+ return ff->open_flags & FOPEN_DIRECT_IO || file->f_flags & O_DIRECT;
+}
+
+/*
+ * Wait for cached io to be allowed -
+ * Blocks new parallel dio writes and waits for the in-progress parallel dio
+ * writes to complete.
+ */
+static int fuse_inode_wait_for_cached_io(struct fuse_inode *fi)
+{
+ int err = 0;
+
+ assert_spin_locked(&fi->lock);
+
+ while (!err && !fuse_inode_get_io_cache(fi)) {
+ /*
+ * Setting the bit advises new direct-io writes
+ * to use an exclusive lock - without it the wait below
+ * might be forever.
+ */
+ set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+ spin_unlock(&fi->lock);
+ err = wait_event_killable(fi->direct_io_waitq,
+ fuse_is_io_cache_allowed(fi));
+ spin_lock(&fi->lock);
+ }
+ /* Clear FUSE_I_CACHE_IO_MODE flag if failed to enter caching mode */
+ if (err && fi->iocachectr <= 0)
+ clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+
+ return err;
+}
+
+/* Start cached io mode where parallel dio writes are not allowed */
+static int fuse_file_cached_io_start(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ int err;
+
+ spin_lock(&fi->lock);
+ err = fuse_inode_wait_for_cached_io(fi);
+ spin_unlock(&fi->lock);
+ return err;
+}
+
+static void fuse_file_cached_io_end(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+
+ spin_lock(&fi->lock);
+ fuse_inode_put_io_cache(get_fuse_inode(inode));
+ spin_unlock(&fi->lock);
+}
+
+/* Start strictly uncached io mode where cache access is not allowed */
+static int fuse_file_uncached_io_start(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ bool ok;
+
+ spin_lock(&fi->lock);
+ ok = fuse_inode_deny_io_cache(fi);
+ spin_unlock(&fi->lock);
+ return ok ? 0 : -ETXTBSY;
+}
+
+static void fuse_file_uncached_io_end(struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ bool allow_cached_io;
+
+ spin_lock(&fi->lock);
+ allow_cached_io = fuse_inode_allow_io_cache(fi);
+ spin_unlock(&fi->lock);
+ if (allow_cached_io)
+ wake_up(&fi->direct_io_waitq);
+}
+
+/* Open flags to determine regular file io mode */
+#define FOPEN_IO_MODE_MASK \
+ (FOPEN_DIRECT_IO | FOPEN_CACHE_IO)
+
+/* Request access to submit new io to inode via open file */
+static int fuse_file_io_open(struct file *file, struct inode *inode)
+{
+ struct fuse_file *ff = file->private_data;
+ int iomode_flags = ff->open_flags & FOPEN_IO_MODE_MASK;
+ int err;
+
+ err = -EBUSY;
+ if (WARN_ON(ff->io_opened))
+ goto fail;
+
+ if (!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode)) {
+ err = -EINVAL;
+ if (iomode_flags)
+ goto fail;
+ return 0;
+ }
+
+ /* Set explicit FOPEN_CACHE_IO flag for file open in caching mode */
+ if (!fuse_file_is_direct_io(file))
+ ff->open_flags |= FOPEN_CACHE_IO;
+
+ /* First caching file open enters caching inode io mode */
+ if (ff->open_flags & FOPEN_CACHE_IO) {
+ err = fuse_file_cached_io_start(inode);
+ if (err)
+ goto fail;
+ }
+
+ ff->io_opened = true;
+ return 0;
+
+fail:
+ pr_debug("failed to open file in requested io mode (open_flags=0x%x, err=%i).\n",
+ ff->open_flags, err);
+ /*
+ * The file open mode determines the inode io mode.
+ * Using incorrect open mode is a server mistake, which results in
+ * user visible failure of open() with EIO error.
+ */
+ return -EIO;
+}
+
+/* Request access to submit new io to inode via mmap */
+static int fuse_file_io_mmap(struct fuse_file *ff, struct inode *inode)
+{
+ struct fuse_inode *fi = get_fuse_inode(inode);
+ int err = 0;
+
+ if (WARN_ON(!ff->io_opened))
+ return -ENODEV;
+
+ spin_lock(&fi->lock);
+ /* First mmap of direct_io file enters caching inode io mode */
+ if (!(ff->open_flags & FOPEN_CACHE_IO)) {
+ err = fuse_inode_wait_for_cached_io(fi);
+ if (!err)
+ ff->open_flags |= FOPEN_CACHE_IO;
+ }
+ spin_unlock(&fi->lock);
+
+ return err;
+}
+
+/* No more pending io and no new io possible to inode via open/mmapped file */
+static void fuse_file_io_release(struct fuse_file *ff, struct inode *inode)
+{
+ if (!ff->io_opened)
+ return;
+
+ /* Last caching file close exits caching inode io mode */
+ if (ff->open_flags & FOPEN_CACHE_IO)
+ fuse_file_cached_io_end(inode);
+
+ ff->io_opened = false;
+}
+
static void fuse_file_put(struct fuse_file *ff, bool sync, bool isdir)
{
if (refcount_dec_and_test(&ff->count)) {
struct fuse_args *args = &ff->release_args->args;
+ struct inode *inode = ff->release_args->inode;
+
+ if (inode)
+ fuse_file_io_release(ff, inode);
if (isdir ? ff->fm->fc->no_opendir : ff->fm->fc->no_open) {
/* Do nothing when client does not implement 'open' */
@@ -161,7 +328,7 @@ struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
}
if (isdir)
- ff->open_flags &= ~FOPEN_DIRECT_IO;
+ ff->open_flags &= ~(FOPEN_DIRECT_IO | FOPEN_CACHE_IO);
ff->nodeid = nodeid;
@@ -199,6 +366,11 @@ int fuse_finish_open(struct inode *inode, struct file *file)
{
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = get_fuse_conn(inode);
+ int err;
+
+ err = fuse_file_io_open(file, inode);
+ if (err)
+ return err;
if (ff->open_flags & FOPEN_STREAM)
stream_open(inode, file);
@@ -1320,6 +1492,7 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
struct file *file = iocb->ki_filp;
struct fuse_file *ff = file->private_data;
struct inode *inode = file_inode(iocb->ki_filp);
+ struct fuse_inode *fi = get_fuse_inode(inode);
/* server side has to advise that it supports parallel dio writes */
if (!(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES))
@@ -1331,11 +1504,9 @@ static bool fuse_dio_wr_exclusive_lock(struct kiocb *iocb, struct iov_iter *from
if (iocb->ki_flags & IOCB_APPEND)
return true;
- /* combination opf page access and direct-io difficult, shared
- * locks actually introduce a conflict.
- */
- if (get_fuse_conn(inode)->direct_io_allow_mmap)
- return true;
+ /* shared locks are not allowed with parallel page cache IO */
+ if (test_bit(FUSE_I_CACHE_IO_MODE, &fi->state))
+ return false;
/* parallel dio beyond eof is at least for now not supported */
if (fuse_io_past_eof(iocb, from))
@@ -1355,10 +1526,14 @@ static void fuse_dio_lock(struct kiocb *iocb, struct iov_iter *from,
} else {
inode_lock_shared(inode);
/*
- * Previous check was without inode lock and might have raced,
- * check again.
+ * New parallal dio allowed only if inode is not in caching
+ * mode and denies new opens in caching mode. This check
+ * should be performed only after taking shared inode lock.
+ * Previous past eof check was without inode lock and might
+ * have raced, so check it again.
*/
- if (fuse_io_past_eof(iocb, from)) {
+ if (fuse_io_past_eof(iocb, from) ||
+ fuse_file_uncached_io_start(inode) != 0) {
inode_unlock_shared(inode);
inode_lock(inode);
*exclusive = true;
@@ -1371,6 +1546,8 @@ static void fuse_dio_unlock(struct inode *inode, bool exclusive)
if (exclusive) {
inode_unlock(inode);
} else {
+ /* Allow opens in caching mode after last parallel dio end */
+ fuse_file_uncached_io_end(inode);
inode_unlock_shared(inode);
}
}
@@ -2500,11 +2677,16 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
{
struct fuse_file *ff = file->private_data;
struct fuse_conn *fc = ff->fm->fc;
+ int rc;
/* DAX mmap is superior to direct_io mmap */
if (FUSE_IS_DAX(file_inode(file)))
return fuse_dax_mmap(file, vma);
+ /*
+ * FOPEN_DIRECT_IO handling is special compared to O_DIRECT,
+ * as does not allow MAP_SHARED mmap without FUSE_DIRECT_IO_ALLOW_MMAP.
+ */
if (ff->open_flags & FOPEN_DIRECT_IO) {
/*
* Can't provide the coherency needed for MAP_SHARED
@@ -2515,10 +2697,23 @@ static int fuse_file_mmap(struct file *file, struct vm_area_struct *vma)
invalidate_inode_pages2(file->f_mapping);
+ /*
+ * First mmap of direct_io file enters caching inode io mode.
+ * Also waits for parallel dio writers to go into serial mode
+ * (exclusive instead of shared lock).
+ */
+ rc = fuse_file_io_mmap(ff, file_inode(file));
+ if (rc)
+ return rc;
+
if (!(vma->vm_flags & VM_MAYSHARE)) {
/* MAP_PRIVATE */
return generic_file_mmap(file, vma);
}
+ } else if (file->f_flags & O_DIRECT) {
+ rc = fuse_file_io_mmap(ff, file_inode(file));
+ if (rc)
+ return rc;
}
if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
@@ -3287,7 +3482,9 @@ void fuse_init_file_inode(struct inode *inode, unsigned int flags)
INIT_LIST_HEAD(&fi->write_files);
INIT_LIST_HEAD(&fi->queued_writes);
fi->writectr = 0;
+ fi->iocachectr = 0;
init_waitqueue_head(&fi->page_waitq);
+ init_waitqueue_head(&fi->direct_io_waitq);
fi->writepages = RB_ROOT;
if (IS_ENABLED(CONFIG_FUSE_DAX))
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 1c0cde4022f0..cb961f9a13c3 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -111,7 +111,7 @@ struct fuse_inode {
u64 attr_version;
union {
- /* Write related fields (regular file only) */
+ /* read/write io cache (regular file only) */
struct {
/* Files usable in writepage. Protected by fi->lock */
struct list_head write_files;
@@ -123,9 +123,15 @@ struct fuse_inode {
* (FUSE_NOWRITE) means more writes are blocked */
int writectr;
+ /** Number of files/maps using page cache */
+ int iocachectr;
+
/* Waitq for writepage completion */
wait_queue_head_t page_waitq;
+ /* waitq for direct-io completion */
+ wait_queue_head_t direct_io_waitq;
+
/* List of writepage requestst (pending or sent) */
struct rb_root writepages;
};
@@ -187,6 +193,8 @@ enum {
FUSE_I_BAD,
/* Has btime */
FUSE_I_BTIME,
+ /* Wants or already has page cache IO */
+ FUSE_I_CACHE_IO_MODE,
};
struct fuse_conn;
@@ -246,6 +254,9 @@ struct fuse_file {
/** Has flock been performed on this file? */
bool flock:1;
+
+ /** Was file opened for io? */
+ bool io_opened:1;
};
/** One input argument of a request */
@@ -1349,6 +1360,72 @@ int fuse_fileattr_set(struct mnt_idmap *idmap,
struct dentry *dentry, struct fileattr *fa);
/* file.c */
+/*
+ * Request an open in caching mode.
+ * Return true if in caching mode.
+ */
+static inline bool fuse_inode_get_io_cache(struct fuse_inode *fi)
+{
+ assert_spin_locked(&fi->lock);
+ if (fi->iocachectr < 0)
+ return false;
+ fi->iocachectr++;
+ if (fi->iocachectr == 1)
+ set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+
+ return true;
+}
+
+/*
+ * Release an open in caching mode.
+ * Return true if no more files open in caching mode.
+ */
+static inline bool fuse_inode_put_io_cache(struct fuse_inode *fi)
+{
+ assert_spin_locked(&fi->lock);
+ if (WARN_ON(fi->iocachectr <= 0))
+ return false;
+
+ if (--fi->iocachectr == 0) {
+ clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Requets to deny new opens in caching mode.
+ * Return true if denying new opens in caching mode.
+ */
+static inline bool fuse_inode_deny_io_cache(struct fuse_inode *fi)
+{
+ assert_spin_locked(&fi->lock);
+ if (fi->iocachectr > 0)
+ return false;
+ fi->iocachectr--;
+ return true;
+}
+
+/*
+ * Release a request to deny open in caching mode.
+ * Return true if allowing new opens in caching mode.
+ */
+static inline bool fuse_inode_allow_io_cache(struct fuse_inode *fi)
+{
+ assert_spin_locked(&fi->lock);
+ if (WARN_ON(fi->iocachectr >= 0))
+ return false;
+ return ++(fi->iocachectr) == 0;
+}
+
+/*
+ * Return true if allowing new opens in caching mode.
+ */
+static inline bool fuse_is_io_cache_allowed(struct fuse_inode *fi)
+{
+ return READ_ONCE(fi->iocachectr) >= 0;
+}
struct fuse_file *fuse_file_open(struct fuse_mount *fm, u64 nodeid,
unsigned int open_flags, bool isdir);
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index e7418d15fe39..66a4bd8d767d 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -353,6 +353,7 @@ struct fuse_file_lock {
* FOPEN_STREAM: the file is stream-like (no file position at all)
* FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
* FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on the same inode
+ * FOPEN_CACHE_IO: using cache for this open file (incl. mmap on direct_io)
*/
#define FOPEN_DIRECT_IO (1 << 0)
#define FOPEN_KEEP_CACHE (1 << 1)
@@ -361,6 +362,7 @@ struct fuse_file_lock {
#define FOPEN_STREAM (1 << 4)
#define FOPEN_NOFLUSH (1 << 5)
#define FOPEN_PARALLEL_DIRECT_WRITES (1 << 6)
+#define FOPEN_CACHE_IO (1 << 7)
/**
* INIT request/reply flags
--
2.40.1
^ permalink raw reply related [flat|nested] 40+ messages in thread* Re: [PATCH v2 5/5] fuse: introduce inode io modes
2024-01-31 23:08 ` [PATCH v2 5/5] fuse: introduce inode io modes Bernd Schubert
@ 2024-02-01 14:47 ` Miklos Szeredi
2024-02-01 16:33 ` Amir Goldstein
0 siblings, 1 reply; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-01 14:47 UTC (permalink / raw)
To: Bernd Schubert; +Cc: linux-fsdevel, dsingh, Amir Goldstein
On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
>
> From: Amir Goldstein <amir73il@gmail.com>
>
> The fuse inode io mode is determined by the mode of its open files/mmaps
> and parallel dio.
>
> - caching io mode - files open in caching mode or mmap on direct_io file
> - direct io mode - no files open in caching mode and no files mmaped
> - parallel dio mode - direct io mode with parallel dio in progress
Specifically if iocachectr is:
> 0 -> caching io
== 0 -> direct io
< 0 -> parallel io
>
> We use a new FOPEN_CACHE_IO flag to explicitly mark a file that was open
> in caching mode.
This is really confusing. FOPEN_CACHE_IO is apparently an internally
used flag, but it's defined on the userspace API.
a) what is the meaning of this flag on the external API?
b) what is the purpose of this flag internally?
>
> direct_io mmap uses page cache, so first mmap will mark the file as
> FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter
> the caching io mode.
>
> If the server opens the file with flags FOPEN_DIRECT_IO|FOPEN_CACHE_IO,
> the inode enters caching io mode already on open.
>
> This allows executing parallel dio when inode is not in caching mode
> even if shared mmap is allowed, but no mmaps have been performed on
> the inode in question.
>
> An open in caching mode and mmap on direct_io file now waits for all
> in-progress parallel dio writes to complete, so paralle dio writes
> together with FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit.
I think the per file state is wrong, the above can be done with just
the per-inode state.
>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> fs/fuse/file.c | 215 ++++++++++++++++++++++++++++++++++++--
> fs/fuse/fuse_i.h | 79 +++++++++++++-
> include/uapi/linux/fuse.h | 2 +
> 3 files changed, 286 insertions(+), 10 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 7d2f4b0eb36a..eb9929ff9f60 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -105,10 +105,177 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
> kfree(ra);
> }
>
> +static bool fuse_file_is_direct_io(struct file *file)
> +{
> + struct fuse_file *ff = file->private_data;
> +
> + return ff->open_flags & FOPEN_DIRECT_IO || file->f_flags & O_DIRECT;
> +}
This is one of the issues with the per-file state. O_DIRECT can be
changed with fcntl(fd, F_SETFL, ...), so state calculated at open can
go stale.
> +
> +/*
> + * Wait for cached io to be allowed -
> + * Blocks new parallel dio writes and waits for the in-progress parallel dio
> + * writes to complete.
> + */
> +static int fuse_inode_wait_for_cached_io(struct fuse_inode *fi)
> +{
> + int err = 0;
> +
> + assert_spin_locked(&fi->lock);
> +
> + while (!err && !fuse_inode_get_io_cache(fi)) {
> + /*
> + * Setting the bit advises new direct-io writes
> + * to use an exclusive lock - without it the wait below
> + * might be forever.
> + */
> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> + spin_unlock(&fi->lock);
> + err = wait_event_killable(fi->direct_io_waitq,
> + fuse_is_io_cache_allowed(fi));
> + spin_lock(&fi->lock);
> + }
> + /* Clear FUSE_I_CACHE_IO_MODE flag if failed to enter caching mode */
> + if (err && fi->iocachectr <= 0)
> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> +
> + return err;
> +}
I suggest moving all infrastructure, including the inline helpers in
fuse_i.h into a separate source file.
> +
> +/* Start cached io mode where parallel dio writes are not allowed */
> +static int fuse_file_cached_io_start(struct inode *inode)
> +{
> + struct fuse_inode *fi = get_fuse_inode(inode);
> + int err;
> +
> + spin_lock(&fi->lock);
> + err = fuse_inode_wait_for_cached_io(fi);
> + spin_unlock(&fi->lock);
> + return err;
> +}
> +
> +static void fuse_file_cached_io_end(struct inode *inode)
> +{
> + struct fuse_inode *fi = get_fuse_inode(inode);
> +
> + spin_lock(&fi->lock);
> + fuse_inode_put_io_cache(get_fuse_inode(inode));
> + spin_unlock(&fi->lock);
> +}
> +
> +/* Start strictly uncached io mode where cache access is not allowed */
> +static int fuse_file_uncached_io_start(struct inode *inode)
> +{
> + struct fuse_inode *fi = get_fuse_inode(inode);
> + bool ok;
> +
> + spin_lock(&fi->lock);
> + ok = fuse_inode_deny_io_cache(fi);
> + spin_unlock(&fi->lock);
> + return ok ? 0 : -ETXTBSY;
> +}
> +
> +static void fuse_file_uncached_io_end(struct inode *inode)
> +{
> + struct fuse_inode *fi = get_fuse_inode(inode);
> + bool allow_cached_io;
> +
> + spin_lock(&fi->lock);
> + allow_cached_io = fuse_inode_allow_io_cache(fi);
> + spin_unlock(&fi->lock);
> + if (allow_cached_io)
> + wake_up(&fi->direct_io_waitq);
> +}
> +
> +/* Open flags to determine regular file io mode */
> +#define FOPEN_IO_MODE_MASK \
> + (FOPEN_DIRECT_IO | FOPEN_CACHE_IO)
> +
> +/* Request access to submit new io to inode via open file */
> +static int fuse_file_io_open(struct file *file, struct inode *inode)
> +{
> + struct fuse_file *ff = file->private_data;
> + int iomode_flags = ff->open_flags & FOPEN_IO_MODE_MASK;
> + int err;
> +
> + err = -EBUSY;
> + if (WARN_ON(ff->io_opened))
> + goto fail;
> +
> + if (!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode)) {
This S_ISREG check can also go away with separating the directory opens.
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 5/5] fuse: introduce inode io modes
2024-02-01 14:47 ` Miklos Szeredi
@ 2024-02-01 16:33 ` Amir Goldstein
2024-02-01 17:53 ` Bernd Schubert
2024-02-06 12:39 ` Amir Goldstein
0 siblings, 2 replies; 40+ messages in thread
From: Amir Goldstein @ 2024-02-01 16:33 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Thu, Feb 1, 2024 at 4:47 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
> >
> > From: Amir Goldstein <amir73il@gmail.com>
> >
> > The fuse inode io mode is determined by the mode of its open files/mmaps
> > and parallel dio.
> >
> > - caching io mode - files open in caching mode or mmap on direct_io file
> > - direct io mode - no files open in caching mode and no files mmaped
> > - parallel dio mode - direct io mode with parallel dio in progress
>
> Specifically if iocachectr is:
>
> > 0 -> caching io
> == 0 -> direct io
> < 0 -> parallel io
>
> >
> > We use a new FOPEN_CACHE_IO flag to explicitly mark a file that was open
> > in caching mode.
>
> This is really confusing. FOPEN_CACHE_IO is apparently an internally
> used flag, but it's defined on the userspace API.
>
> a) what is the meaning of this flag on the external API?
> b) what is the purpose of this flag internally?
The purpose is to annotate the state of direct io file that was mmaped
as FOPEN_DIRECT_IO | FOPEN_CACHE_IO.
An fd like this puts inode in caching mode and its release may get inode
out of caching mode.
I did not manage to do refcoutning with fuse_vma_close(), because those
calls are not balances with fuse_file_mmap() calls.
The first mmap() of an FOPEN_DIRECT_IO file may incur wait for completion
of parallel dio.
The only use of exporting FOPEN_CACHE_IO to the server is that it could
force incurring this wait at open() time instead of mmap() time.
I also considered for servers that advertise FUSE_PASSTHROUGH
capability to not allow open without specifying an explicit io mode,
that is one of FOPEN_DIRECT_IO | FOPEN_CACHE_IO |
FOPEN_PASSTHROUGH, but I did not actually implement those
semantics ATM.
>
> >
> > direct_io mmap uses page cache, so first mmap will mark the file as
> > FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter
> > the caching io mode.
> >
> > If the server opens the file with flags FOPEN_DIRECT_IO|FOPEN_CACHE_IO,
> > the inode enters caching io mode already on open.
> >
> > This allows executing parallel dio when inode is not in caching mode
> > even if shared mmap is allowed, but no mmaps have been performed on
> > the inode in question.
> >
> > An open in caching mode and mmap on direct_io file now waits for all
> > in-progress parallel dio writes to complete, so paralle dio writes
> > together with FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit.
>
> I think the per file state is wrong, the above can be done with just
> the per-inode state.
>
The per-file state is to taint the file is "has been mmaped".
I did not find another way of doing this.
Please suggest another way.
>
> >
> > Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> > fs/fuse/file.c | 215 ++++++++++++++++++++++++++++++++++++--
> > fs/fuse/fuse_i.h | 79 +++++++++++++-
> > include/uapi/linux/fuse.h | 2 +
> > 3 files changed, 286 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 7d2f4b0eb36a..eb9929ff9f60 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -105,10 +105,177 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
> > kfree(ra);
> > }
> >
> > +static bool fuse_file_is_direct_io(struct file *file)
> > +{
> > + struct fuse_file *ff = file->private_data;
> > +
> > + return ff->open_flags & FOPEN_DIRECT_IO || file->f_flags & O_DIRECT;
> > +}
>
> This is one of the issues with the per-file state. O_DIRECT can be
> changed with fcntl(fd, F_SETFL, ...), so state calculated at open can
> go stale.
>
Ouch!
Actually, it may not matter much if we just ignore O_DIRECT
completely for the purpose of io modes.
The io mode could be determined solely from FMODE_* flags.
Worst case is that if the server opens an O_DIRECT file without
FOPEN_DIRECT_IO, then this inode cannot do parallel dio
and this inode cannot be opened in passthrough mode.
I don't think that is such a big problem.
Bernd, what do you think?
BTW, server can and probably should open O_DIRECT files
with FOPEN_PASSTHROUGH and that means something
completely different than O_DIRECT && FOPEN_DIRECT_IO -
it means that io is passed through to the backing file without
doing buffered io on the backing file.
> > +
> > +/*
> > + * Wait for cached io to be allowed -
> > + * Blocks new parallel dio writes and waits for the in-progress parallel dio
> > + * writes to complete.
> > + */
> > +static int fuse_inode_wait_for_cached_io(struct fuse_inode *fi)
> > +{
> > + int err = 0;
> > +
> > + assert_spin_locked(&fi->lock);
> > +
> > + while (!err && !fuse_inode_get_io_cache(fi)) {
> > + /*
> > + * Setting the bit advises new direct-io writes
> > + * to use an exclusive lock - without it the wait below
> > + * might be forever.
> > + */
> > + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > + spin_unlock(&fi->lock);
> > + err = wait_event_killable(fi->direct_io_waitq,
> > + fuse_is_io_cache_allowed(fi));
> > + spin_lock(&fi->lock);
> > + }
> > + /* Clear FUSE_I_CACHE_IO_MODE flag if failed to enter caching mode */
> > + if (err && fi->iocachectr <= 0)
> > + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> > +
> > + return err;
> > +}
>
> I suggest moving all infrastructure, including the inline helpers in
> fuse_i.h into a separate source file.
>
OK. I can make those changes.
> > +
> > +/* Start cached io mode where parallel dio writes are not allowed */
> > +static int fuse_file_cached_io_start(struct inode *inode)
> > +{
> > + struct fuse_inode *fi = get_fuse_inode(inode);
> > + int err;
> > +
> > + spin_lock(&fi->lock);
> > + err = fuse_inode_wait_for_cached_io(fi);
> > + spin_unlock(&fi->lock);
> > + return err;
> > +}
> > +
> > +static void fuse_file_cached_io_end(struct inode *inode)
> > +{
> > + struct fuse_inode *fi = get_fuse_inode(inode);
> > +
> > + spin_lock(&fi->lock);
> > + fuse_inode_put_io_cache(get_fuse_inode(inode));
> > + spin_unlock(&fi->lock);
> > +}
> > +
> > +/* Start strictly uncached io mode where cache access is not allowed */
> > +static int fuse_file_uncached_io_start(struct inode *inode)
> > +{
> > + struct fuse_inode *fi = get_fuse_inode(inode);
> > + bool ok;
> > +
> > + spin_lock(&fi->lock);
> > + ok = fuse_inode_deny_io_cache(fi);
> > + spin_unlock(&fi->lock);
> > + return ok ? 0 : -ETXTBSY;
> > +}
> > +
> > +static void fuse_file_uncached_io_end(struct inode *inode)
> > +{
> > + struct fuse_inode *fi = get_fuse_inode(inode);
> > + bool allow_cached_io;
> > +
> > + spin_lock(&fi->lock);
> > + allow_cached_io = fuse_inode_allow_io_cache(fi);
> > + spin_unlock(&fi->lock);
> > + if (allow_cached_io)
> > + wake_up(&fi->direct_io_waitq);
> > +}
> > +
> > +/* Open flags to determine regular file io mode */
> > +#define FOPEN_IO_MODE_MASK \
> > + (FOPEN_DIRECT_IO | FOPEN_CACHE_IO)
> > +
> > +/* Request access to submit new io to inode via open file */
> > +static int fuse_file_io_open(struct file *file, struct inode *inode)
> > +{
> > + struct fuse_file *ff = file->private_data;
> > + int iomode_flags = ff->open_flags & FOPEN_IO_MODE_MASK;
> > + int err;
> > +
> > + err = -EBUSY;
> > + if (WARN_ON(ff->io_opened))
> > + goto fail;
> > +
> > + if (!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode)) {
>
> This S_ISREG check can also go away with separating the directory opens.
OK. I will see if I can make that cleanup as well.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 5/5] fuse: introduce inode io modes
2024-02-01 16:33 ` Amir Goldstein
@ 2024-02-01 17:53 ` Bernd Schubert
2024-02-02 19:40 ` Amir Goldstein
2024-02-06 12:39 ` Amir Goldstein
1 sibling, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-02-01 17:53 UTC (permalink / raw)
To: Amir Goldstein, Miklos Szeredi
Cc: linux-fsdevel@vger.kernel.org, Dharmendra Singh
On 2/1/24 17:33, Amir Goldstein wrote:
> On Thu, Feb 1, 2024 at 4:47 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
>>>
>>> From: Amir Goldstein <amir73il@gmail.com>
>>>
>>> The fuse inode io mode is determined by the mode of its open files/mmaps
>>> and parallel dio.
>>>
>>> - caching io mode - files open in caching mode or mmap on direct_io file
>>> - direct io mode - no files open in caching mode and no files mmaped
>>> - parallel dio mode - direct io mode with parallel dio in progress
>>
>> Specifically if iocachectr is:
>>
>>> 0 -> caching io
>> == 0 -> direct io
>> < 0 -> parallel io
>>
>>>
>>> We use a new FOPEN_CACHE_IO flag to explicitly mark a file that was open
>>> in caching mode.
>>
>> This is really confusing. FOPEN_CACHE_IO is apparently an internally
>> used flag, but it's defined on the userspace API.
>>
>> a) what is the meaning of this flag on the external API?
>> b) what is the purpose of this flag internally?
>
> The purpose is to annotate the state of direct io file that was mmaped
> as FOPEN_DIRECT_IO | FOPEN_CACHE_IO.
> An fd like this puts inode in caching mode and its release may get inode
> out of caching mode.
>
> I did not manage to do refcoutning with fuse_vma_close(), because those
> calls are not balances with fuse_file_mmap() calls.
>
> The first mmap() of an FOPEN_DIRECT_IO file may incur wait for completion
> of parallel dio.
>
> The only use of exporting FOPEN_CACHE_IO to the server is that it could
> force incurring this wait at open() time instead of mmap() time.
>
> I also considered for servers that advertise FUSE_PASSTHROUGH
> capability to not allow open without specifying an explicit io mode,
> that is one of FOPEN_DIRECT_IO | FOPEN_CACHE_IO |
> FOPEN_PASSTHROUGH, but I did not actually implement those
> semantics ATM.
>
>>
>>>
>>> direct_io mmap uses page cache, so first mmap will mark the file as
>>> FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter
>>> the caching io mode.
>>>
>>> If the server opens the file with flags FOPEN_DIRECT_IO|FOPEN_CACHE_IO,
>>> the inode enters caching io mode already on open.
>>>
>>> This allows executing parallel dio when inode is not in caching mode
>>> even if shared mmap is allowed, but no mmaps have been performed on
>>> the inode in question.
>>>
>>> An open in caching mode and mmap on direct_io file now waits for all
>>> in-progress parallel dio writes to complete, so paralle dio writes
>>> together with FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit.
>>
>> I think the per file state is wrong, the above can be done with just
>> the per-inode state.
>>
>
> The per-file state is to taint the file is "has been mmaped".
> I did not find another way of doing this.
> Please suggest another way.
>
>>
>>>
>>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
>>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>>> ---
>>> fs/fuse/file.c | 215 ++++++++++++++++++++++++++++++++++++--
>>> fs/fuse/fuse_i.h | 79 +++++++++++++-
>>> include/uapi/linux/fuse.h | 2 +
>>> 3 files changed, 286 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>>> index 7d2f4b0eb36a..eb9929ff9f60 100644
>>> --- a/fs/fuse/file.c
>>> +++ b/fs/fuse/file.c
>>> @@ -105,10 +105,177 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
>>> kfree(ra);
>>> }
>>>
>>> +static bool fuse_file_is_direct_io(struct file *file)
>>> +{
>>> + struct fuse_file *ff = file->private_data;
>>> +
>>> + return ff->open_flags & FOPEN_DIRECT_IO || file->f_flags & O_DIRECT;
>>> +}
>>
>> This is one of the issues with the per-file state. O_DIRECT can be
>> changed with fcntl(fd, F_SETFL, ...), so state calculated at open can
>> go stale.
>>
>
> Ouch!
> Actually, it may not matter much if we just ignore O_DIRECT
> completely for the purpose of io modes.
I had also missed that fcntl option :/
>
> The io mode could be determined solely from FMODE_* flags.
>
> Worst case is that if the server opens an O_DIRECT file without
> FOPEN_DIRECT_IO, then this inode cannot do parallel dio
> and this inode cannot be opened in passthrough mode.
> I don't think that is such a big problem.
>
> Bernd, what do you think?
Yeah, currently FOPEN_PARALLEL_DIRECT_WRITES does not have an effect
without FOPEN_DIRECT_IO. In my fuse-dio consolidation branch that
changes. For me it is ok that parallel dio requires
FOPEN_DIRECT_IO, easy enough to set on the server side in open methods
when O_DIRECT is set. We should just document it in fuse.h.
>
> BTW, server can and probably should open O_DIRECT files
> with FOPEN_PASSTHROUGH and that means something
> completely different than O_DIRECT && FOPEN_DIRECT_IO -
> it means that io is passed through to the backing file without
> doing buffered io on the backing file.
>
>>> +
>>> +/*
>>> + * Wait for cached io to be allowed -
>>> + * Blocks new parallel dio writes and waits for the in-progress parallel dio
>>> + * writes to complete.
>>> + */
>>> +static int fuse_inode_wait_for_cached_io(struct fuse_inode *fi)
>>> +{
>>> + int err = 0;
>>> +
>>> + assert_spin_locked(&fi->lock);
>>> +
>>> + while (!err && !fuse_inode_get_io_cache(fi)) {
>>> + /*
>>> + * Setting the bit advises new direct-io writes
>>> + * to use an exclusive lock - without it the wait below
>>> + * might be forever.
>>> + */
>>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>> + spin_unlock(&fi->lock);
>>> + err = wait_event_killable(fi->direct_io_waitq,
>>> + fuse_is_io_cache_allowed(fi));
>>> + spin_lock(&fi->lock);
>>> + }
>>> + /* Clear FUSE_I_CACHE_IO_MODE flag if failed to enter caching mode */
>>> + if (err && fi->iocachectr <= 0)
>>> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
>>> +
>>> + return err;
>>> +}
>>
>> I suggest moving all infrastructure, including the inline helpers in
>> fuse_i.h into a separate source file.
>>
>
> OK. I can make those changes.
>
>>> +
>>> +/* Start cached io mode where parallel dio writes are not allowed */
>>> +static int fuse_file_cached_io_start(struct inode *inode)
>>> +{
>>> + struct fuse_inode *fi = get_fuse_inode(inode);
>>> + int err;
>>> +
>>> + spin_lock(&fi->lock);
>>> + err = fuse_inode_wait_for_cached_io(fi);
>>> + spin_unlock(&fi->lock);
>>> + return err;
>>> +}
>>> +
>>> +static void fuse_file_cached_io_end(struct inode *inode)
>>> +{
>>> + struct fuse_inode *fi = get_fuse_inode(inode);
>>> +
>>> + spin_lock(&fi->lock);
>>> + fuse_inode_put_io_cache(get_fuse_inode(inode));
>>> + spin_unlock(&fi->lock);
>>> +}
>>> +
>>> +/* Start strictly uncached io mode where cache access is not allowed */
>>> +static int fuse_file_uncached_io_start(struct inode *inode)
>>> +{
>>> + struct fuse_inode *fi = get_fuse_inode(inode);
>>> + bool ok;
>>> +
>>> + spin_lock(&fi->lock);
>>> + ok = fuse_inode_deny_io_cache(fi);
>>> + spin_unlock(&fi->lock);
>>> + return ok ? 0 : -ETXTBSY;
>>> +}
>>> +
>>> +static void fuse_file_uncached_io_end(struct inode *inode)
>>> +{
>>> + struct fuse_inode *fi = get_fuse_inode(inode);
>>> + bool allow_cached_io;
>>> +
>>> + spin_lock(&fi->lock);
>>> + allow_cached_io = fuse_inode_allow_io_cache(fi);
>>> + spin_unlock(&fi->lock);
>>> + if (allow_cached_io)
>>> + wake_up(&fi->direct_io_waitq);
>>> +}
>>> +
>>> +/* Open flags to determine regular file io mode */
>>> +#define FOPEN_IO_MODE_MASK \
>>> + (FOPEN_DIRECT_IO | FOPEN_CACHE_IO)
>>> +
>>> +/* Request access to submit new io to inode via open file */
>>> +static int fuse_file_io_open(struct file *file, struct inode *inode)
>>> +{
>>> + struct fuse_file *ff = file->private_data;
>>> + int iomode_flags = ff->open_flags & FOPEN_IO_MODE_MASK;
>>> + int err;
>>> +
>>> + err = -EBUSY;
>>> + if (WARN_ON(ff->io_opened))
>>> + goto fail;
>>> +
>>> + if (!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode)) {
>>
>> This S_ISREG check can also go away with separating the directory opens.
>
> OK. I will see if I can make that cleanup as well.
Note to myself, I need to make double sure that our (Dharmendras/mine)
atomic open branch gets updated.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 5/5] fuse: introduce inode io modes
2024-02-01 17:53 ` Bernd Schubert
@ 2024-02-02 19:40 ` Amir Goldstein
0 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2024-02-02 19:40 UTC (permalink / raw)
To: Bernd Schubert
Cc: Miklos Szeredi, linux-fsdevel@vger.kernel.org, Dharmendra Singh
On Thu, Feb 1, 2024 at 7:53 PM Bernd Schubert <bschubert@ddn.com> wrote:
>
> On 2/1/24 17:33, Amir Goldstein wrote:
> > On Thu, Feb 1, 2024 at 4:47 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >>
> >> On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
> >>>
> >>> From: Amir Goldstein <amir73il@gmail.com>
> >>>
> >>> The fuse inode io mode is determined by the mode of its open files/mmaps
> >>> and parallel dio.
> >>>
> >>> - caching io mode - files open in caching mode or mmap on direct_io file
> >>> - direct io mode - no files open in caching mode and no files mmaped
> >>> - parallel dio mode - direct io mode with parallel dio in progress
> >>
> >> Specifically if iocachectr is:
> >>
> >>> 0 -> caching io
> >> == 0 -> direct io
> >> < 0 -> parallel io
> >>
> >>>
> >>> We use a new FOPEN_CACHE_IO flag to explicitly mark a file that was open
> >>> in caching mode.
> >>
> >> This is really confusing. FOPEN_CACHE_IO is apparently an internally
> >> used flag, but it's defined on the userspace API.
> >>
> >> a) what is the meaning of this flag on the external API?
> >> b) what is the purpose of this flag internally?
> >
> > The purpose is to annotate the state of direct io file that was mmaped
> > as FOPEN_DIRECT_IO | FOPEN_CACHE_IO.
> > An fd like this puts inode in caching mode and its release may get inode
> > out of caching mode.
> >
> > I did not manage to do refcoutning with fuse_vma_close(), because those
> > calls are not balances with fuse_file_mmap() calls.
> >
> > The first mmap() of an FOPEN_DIRECT_IO file may incur wait for completion
> > of parallel dio.
> >
> > The only use of exporting FOPEN_CACHE_IO to the server is that it could
> > force incurring this wait at open() time instead of mmap() time.
> >
> > I also considered for servers that advertise FUSE_PASSTHROUGH
> > capability to not allow open without specifying an explicit io mode,
> > that is one of FOPEN_DIRECT_IO | FOPEN_CACHE_IO |
> > FOPEN_PASSTHROUGH, but I did not actually implement those
> > semantics ATM.
> >
> >>
> >>>
> >>> direct_io mmap uses page cache, so first mmap will mark the file as
> >>> FOPEN_DIRECT_IO|FOPEN_CACHE_IO (i.e. mixed mode) and inode will enter
> >>> the caching io mode.
> >>>
> >>> If the server opens the file with flags FOPEN_DIRECT_IO|FOPEN_CACHE_IO,
> >>> the inode enters caching io mode already on open.
> >>>
> >>> This allows executing parallel dio when inode is not in caching mode
> >>> even if shared mmap is allowed, but no mmaps have been performed on
> >>> the inode in question.
> >>>
> >>> An open in caching mode and mmap on direct_io file now waits for all
> >>> in-progress parallel dio writes to complete, so paralle dio writes
> >>> together with FUSE_DIRECT_IO_ALLOW_MMAP is enabled by this commit.
> >>
> >> I think the per file state is wrong, the above can be done with just
> >> the per-inode state.
> >>
> >
> > The per-file state is to taint the file is "has been mmaped".
> > I did not find another way of doing this.
> > Please suggest another way.
> >
> >>
> >>>
> >>> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> >>> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >>> ---
> >>> fs/fuse/file.c | 215 ++++++++++++++++++++++++++++++++++++--
> >>> fs/fuse/fuse_i.h | 79 +++++++++++++-
> >>> include/uapi/linux/fuse.h | 2 +
> >>> 3 files changed, 286 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> >>> index 7d2f4b0eb36a..eb9929ff9f60 100644
> >>> --- a/fs/fuse/file.c
> >>> +++ b/fs/fuse/file.c
> >>> @@ -105,10 +105,177 @@ static void fuse_release_end(struct fuse_mount *fm, struct fuse_args *args,
> >>> kfree(ra);
> >>> }
> >>>
> >>> +static bool fuse_file_is_direct_io(struct file *file)
> >>> +{
> >>> + struct fuse_file *ff = file->private_data;
> >>> +
> >>> + return ff->open_flags & FOPEN_DIRECT_IO || file->f_flags & O_DIRECT;
> >>> +}
> >>
> >> This is one of the issues with the per-file state. O_DIRECT can be
> >> changed with fcntl(fd, F_SETFL, ...), so state calculated at open can
> >> go stale.
> >>
> >
> > Ouch!
> > Actually, it may not matter much if we just ignore O_DIRECT
> > completely for the purpose of io modes.
>
> I had also missed that fcntl option :/
>
> >
> > The io mode could be determined solely from FMODE_* flags.
> >
> > Worst case is that if the server opens an O_DIRECT file without
> > FOPEN_DIRECT_IO, then this inode cannot do parallel dio
> > and this inode cannot be opened in passthrough mode.
> > I don't think that is such a big problem.
> >
> > Bernd, what do you think?
>
> Yeah, currently FOPEN_PARALLEL_DIRECT_WRITES does not have an effect
> without FOPEN_DIRECT_IO. In my fuse-dio consolidation branch that
> changes. For me it is ok that parallel dio requires
> FOPEN_DIRECT_IO, easy enough to set on the server side in open methods
> when O_DIRECT is set. We should just document it in fuse.h.
>
Ok. I removed the O_DIRECT code from the last version I pushed.
Please verify that it does not break anything in your dio tests.
> >
> > BTW, server can and probably should open O_DIRECT files
> > with FOPEN_PASSTHROUGH and that means something
> > completely different than O_DIRECT && FOPEN_DIRECT_IO -
> > it means that io is passed through to the backing file without
> > doing buffered io on the backing file.
> >
> >>> +
> >>> +/*
> >>> + * Wait for cached io to be allowed -
> >>> + * Blocks new parallel dio writes and waits for the in-progress parallel dio
> >>> + * writes to complete.
> >>> + */
> >>> +static int fuse_inode_wait_for_cached_io(struct fuse_inode *fi)
> >>> +{
> >>> + int err = 0;
> >>> +
> >>> + assert_spin_locked(&fi->lock);
> >>> +
> >>> + while (!err && !fuse_inode_get_io_cache(fi)) {
> >>> + /*
> >>> + * Setting the bit advises new direct-io writes
> >>> + * to use an exclusive lock - without it the wait below
> >>> + * might be forever.
> >>> + */
> >>> + set_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >>> + spin_unlock(&fi->lock);
> >>> + err = wait_event_killable(fi->direct_io_waitq,
> >>> + fuse_is_io_cache_allowed(fi));
> >>> + spin_lock(&fi->lock);
> >>> + }
> >>> + /* Clear FUSE_I_CACHE_IO_MODE flag if failed to enter caching mode */
> >>> + if (err && fi->iocachectr <= 0)
> >>> + clear_bit(FUSE_I_CACHE_IO_MODE, &fi->state);
> >>> +
> >>> + return err;
> >>> +}
> >>
> >> I suggest moving all infrastructure, including the inline helpers in
> >> fuse_i.h into a separate source file.
> >>
> >
> > OK. I can make those changes.
> >
moved to iomode.c
> >>> +
> >>> +/* Start cached io mode where parallel dio writes are not allowed */
> >>> +static int fuse_file_cached_io_start(struct inode *inode)
> >>> +{
> >>> + struct fuse_inode *fi = get_fuse_inode(inode);
> >>> + int err;
> >>> +
> >>> + spin_lock(&fi->lock);
> >>> + err = fuse_inode_wait_for_cached_io(fi);
> >>> + spin_unlock(&fi->lock);
> >>> + return err;
> >>> +}
> >>> +
> >>> +static void fuse_file_cached_io_end(struct inode *inode)
> >>> +{
> >>> + struct fuse_inode *fi = get_fuse_inode(inode);
> >>> +
> >>> + spin_lock(&fi->lock);
> >>> + fuse_inode_put_io_cache(get_fuse_inode(inode));
> >>> + spin_unlock(&fi->lock);
> >>> +}
> >>> +
> >>> +/* Start strictly uncached io mode where cache access is not allowed */
> >>> +static int fuse_file_uncached_io_start(struct inode *inode)
> >>> +{
> >>> + struct fuse_inode *fi = get_fuse_inode(inode);
> >>> + bool ok;
> >>> +
> >>> + spin_lock(&fi->lock);
> >>> + ok = fuse_inode_deny_io_cache(fi);
> >>> + spin_unlock(&fi->lock);
> >>> + return ok ? 0 : -ETXTBSY;
> >>> +}
> >>> +
> >>> +static void fuse_file_uncached_io_end(struct inode *inode)
> >>> +{
> >>> + struct fuse_inode *fi = get_fuse_inode(inode);
> >>> + bool allow_cached_io;
> >>> +
> >>> + spin_lock(&fi->lock);
> >>> + allow_cached_io = fuse_inode_allow_io_cache(fi);
> >>> + spin_unlock(&fi->lock);
> >>> + if (allow_cached_io)
> >>> + wake_up(&fi->direct_io_waitq);
> >>> +}
> >>> +
> >>> +/* Open flags to determine regular file io mode */
> >>> +#define FOPEN_IO_MODE_MASK \
> >>> + (FOPEN_DIRECT_IO | FOPEN_CACHE_IO)
> >>> +
> >>> +/* Request access to submit new io to inode via open file */
> >>> +static int fuse_file_io_open(struct file *file, struct inode *inode)
> >>> +{
> >>> + struct fuse_file *ff = file->private_data;
> >>> + int iomode_flags = ff->open_flags & FOPEN_IO_MODE_MASK;
> >>> + int err;
> >>> +
> >>> + err = -EBUSY;
> >>> + if (WARN_ON(ff->io_opened))
> >>> + goto fail;
> >>> +
> >>> + if (!S_ISREG(inode->i_mode) || FUSE_IS_DAX(inode)) {
> >>
> >> This S_ISREG check can also go away with separating the directory opens.
> >
> > OK. I will see if I can make that cleanup as well.
>
done.
Bernd,
I pushed branch with latest cleanup patches and addressed Miklos'
simpler review comments to:
https://github.com/amir73il/linux/commits/fuse_io_mode-020224/
The one thing I did not change is the FOPEN_CACHE_IO flag
and per-file state, because I have no idea how to fix this differently.
I can change the external flag to an internal state if that is the main
concern, but I had a feeling there was more to it.
If you get the chance, please review and re-run your tests.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 5/5] fuse: introduce inode io modes
2024-02-01 16:33 ` Amir Goldstein
2024-02-01 17:53 ` Bernd Schubert
@ 2024-02-06 12:39 ` Amir Goldstein
2024-02-06 12:53 ` Miklos Szeredi
1 sibling, 1 reply; 40+ messages in thread
From: Amir Goldstein @ 2024-02-06 12:39 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Thu, Feb 1, 2024 at 6:33 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Feb 1, 2024 at 4:47 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > On Thu, 1 Feb 2024 at 00:09, Bernd Schubert <bschubert@ddn.com> wrote:
> > >
> > > From: Amir Goldstein <amir73il@gmail.com>
> > >
> > > The fuse inode io mode is determined by the mode of its open files/mmaps
> > > and parallel dio.
> > >
> > > - caching io mode - files open in caching mode or mmap on direct_io file
> > > - direct io mode - no files open in caching mode and no files mmaped
> > > - parallel dio mode - direct io mode with parallel dio in progress
> >
> > Specifically if iocachectr is:
> >
> > > 0 -> caching io
> > == 0 -> direct io
> > < 0 -> parallel io
> >
> > >
> > > We use a new FOPEN_CACHE_IO flag to explicitly mark a file that was open
> > > in caching mode.
> >
> > This is really confusing. FOPEN_CACHE_IO is apparently an internally
> > used flag, but it's defined on the userspace API.
> >
> > a) what is the meaning of this flag on the external API?
> > b) what is the purpose of this flag internally?
>
> The purpose is to annotate the state of direct io file that was mmaped
> as FOPEN_DIRECT_IO | FOPEN_CACHE_IO.
> An fd like this puts inode in caching mode and its release may get inode
> out of caching mode.
>
> I did not manage to do refcoutning with fuse_vma_close(), because those
> calls are not balances with fuse_file_mmap() calls.
>
> The first mmap() of an FOPEN_DIRECT_IO file may incur wait for completion
> of parallel dio.
>
> The only use of exporting FOPEN_CACHE_IO to the server is that it could
> force incurring this wait at open() time instead of mmap() time.
>
Miklos,
I have played with this rebranding of
FOPEN_CACHE_IO => FOPEN_NO_PARALLEL_DIO_WRITES
The meaning of the rebranded flag is:
Prevent parallel dio on inode for as long as this file is kept open.
The io modes code sets this flag implicitly on the first shared mmap.
Let me know if this makes the external flag easier to swallow.
Of course I can make this flag internal and not and FOPEN_ flag
at all, but IMO, the code is easier to understand when the type of
iocachectl refcount held by any file is specified by its FOPEN_ flags.
Let me know what you think.
Thanks,
Amir.
https://github.com/amir73il/linux/commits/fuse_io_mode-wip/
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -353,7 +353,7 @@ struct fuse_file_lock {
* FOPEN_STREAM: the file is stream-like (no file position at all)
* FOPEN_NOFLUSH: don't flush data cache on close (unless FUSE_WRITEBACK_CACHE)
* FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on
the same inode
- * FOPEN_CACHE_IO: using cache for this open file (incl. mmap on direct_io)
+ * FOPEN_NO_PARALLEL_DIO_WRITES: Deny concurrent direct writes on the
same inode
*/
#define FOPEN_DIRECT_IO (1 << 0)
#define FOPEN_KEEP_CACHE (1 << 1)
@@ -362,7 +362,7 @@ struct fuse_file_lock {
#define FOPEN_STREAM (1 << 4)
#define FOPEN_NOFLUSH (1 << 5)
#define FOPEN_PARALLEL_DIRECT_WRITES (1 << 6)
-#define FOPEN_CACHE_IO (1 << 7)
+#define FOPEN_NO_PARALLEL_DIO_WRITES (1 << 7)
...
- /* Set explicit FOPEN_CACHE_IO flag for file open in caching mode */
- if (!fuse_file_is_direct_io(file))
- ff->open_flags |= FOPEN_CACHE_IO;
+ /*
+ * FOPEN_CACHE_IO is an internal flag that is set on file not open in
+ * direct io mode and it cannot be set explicitly by the server.
+ * FOPEN_NO_PARALLEL_DIO_WRITES is set on file open in caching mode and
+ * is not allowed together with FOPEN_PARALLEL_DIRECT_WRITES.
+ * This includes a file open with O_DIRECT, but server did not specify
+ * FOPEN_DIRECT_IO. In this case, a later fcntl() could
remove O_DIRECT,
+ * so we put the inode in caching mode to prevent parallel dio.
+ * FOPEN_PARALLEL_DIRECT_WRITES requires FOPEN_DIRECT_IO.
+ */
+ if (ff->open_flags & FOPEN_NO_PARALLEL_DIO_WRITES) {
+ if (ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES)
+ goto fail;
+ } else if (!(ff->open_flags & FOPEN_DIRECT_IO)) {
+ ff->open_flags |= FOPEN_NO_PARALLEL_DIO_WRITES;
+ ff->open_flags &= ~FOPEN_PARALLEL_DIRECT_WRITES;
+ }
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 5/5] fuse: introduce inode io modes
2024-02-06 12:39 ` Amir Goldstein
@ 2024-02-06 12:53 ` Miklos Szeredi
2024-02-06 13:07 ` Amir Goldstein
0 siblings, 1 reply; 40+ messages in thread
From: Miklos Szeredi @ 2024-02-06 12:53 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Tue, 6 Feb 2024 at 13:39, Amir Goldstein <amir73il@gmail.com> wrote:
> I have played with this rebranding of
> FOPEN_CACHE_IO => FOPEN_NO_PARALLEL_DIO_WRITES
>
> The meaning of the rebranded flag is:
> Prevent parallel dio on inode for as long as this file is kept open.
>
> The io modes code sets this flag implicitly on the first shared mmap.
>
> Let me know if this makes the external flag easier to swallow.
> Of course I can make this flag internal and not and FOPEN_ flag
> at all, but IMO, the code is easier to understand when the type of
> iocachectl refcount held by any file is specified by its FOPEN_ flags.
If there's no clear use case that would benefit from having this flag
on the userspace interface, then I'd recommend not to export it for
now.
I understand the need for clarifying the various states that the
kernel can be, but I think that's a bigger project involving e.g. data
and metadata cache validity, where the current rules are pretty
convoluted.
So for now I'd just stick with the implicit state change by mmap.
BTW, I started looking at the fuse-backing-fd branch and really hope
we can get this into shape for the next merge window.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 5/5] fuse: introduce inode io modes
2024-02-06 12:53 ` Miklos Szeredi
@ 2024-02-06 13:07 ` Amir Goldstein
0 siblings, 0 replies; 40+ messages in thread
From: Amir Goldstein @ 2024-02-06 13:07 UTC (permalink / raw)
To: Miklos Szeredi; +Cc: Bernd Schubert, linux-fsdevel, dsingh
On Tue, Feb 6, 2024 at 2:53 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Tue, 6 Feb 2024 at 13:39, Amir Goldstein <amir73il@gmail.com> wrote:
>
> > I have played with this rebranding of
> > FOPEN_CACHE_IO => FOPEN_NO_PARALLEL_DIO_WRITES
> >
> > The meaning of the rebranded flag is:
> > Prevent parallel dio on inode for as long as this file is kept open.
> >
> > The io modes code sets this flag implicitly on the first shared mmap.
> >
> > Let me know if this makes the external flag easier to swallow.
> > Of course I can make this flag internal and not and FOPEN_ flag
> > at all, but IMO, the code is easier to understand when the type of
> > iocachectl refcount held by any file is specified by its FOPEN_ flags.
>
> If there's no clear use case that would benefit from having this flag
> on the userspace interface, then I'd recommend not to export it for
> now.
>
> I understand the need for clarifying the various states that the
> kernel can be, but I think that's a bigger project involving e.g. data
> and metadata cache validity, where the current rules are pretty
> convoluted.
>
> So for now I'd just stick with the implicit state change by mmap.
>
Understood.
Do you object to reserving the flag in uapi, but disallowing the
server to set it?
This is how it is in my branch after addressing you other review comments:
https://github.com/amir73il/linux/commits/fuse_io_mode-030224
* FOPEN_PARALLEL_DIRECT_WRITES: Allow concurrent direct writes on
the same inode
+ * FOPEN_CACHE_IO: internal flag for mmap of direct_io (reserved for
future use)
*/
> BTW, I started looking at the fuse-backing-fd branch and really hope
> we can get this into shape for the next merge window.
>
As far as I am concerned, those patches are good to go.
I was only waiting for fuse_io_mode to stabilize, in case you wanted
bigger changes there.
I will post the FUSE_PASSTHROUGH patches, based on fuse_io_mode OTM
so you could review them on-list as well.
Thanks,
Amir.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 0/5] fuse: inode IO modes and mmap
2024-01-31 23:08 [PATCH v2 0/5] fuse: inode IO modes and mmap Bernd Schubert
` (4 preceding siblings ...)
2024-01-31 23:08 ` [PATCH v2 5/5] fuse: introduce inode io modes Bernd Schubert
@ 2024-02-01 10:30 ` Amir Goldstein
2024-02-01 14:30 ` Bernd Schubert
5 siblings, 1 reply; 40+ messages in thread
From: Amir Goldstein @ 2024-02-01 10:30 UTC (permalink / raw)
To: Bernd Schubert; +Cc: miklos, linux-fsdevel, dsingh, Bernd Schubert
On Thu, Feb 1, 2024 at 1:08 AM Bernd Schubert <bschubert@ddn.com> wrote:
>
> This series is mostly about mmap, direct-IO and inode IO modes.
> (new in this series is FOPEN_CACHE_IO).
> It brings back the shared lock for FOPEN_DIRECT_IO when
> FUSE_DIRECT_IO_ALLOW_MMAP is set and is also preparation
> work for Amirs work on fuse-passthrough and also for
For the interested:
https://github.com/amir73il/linux/commits/fuse-backing-fd-010224/
Bernd,
Can you push this series to your v2 branch so that I can rebase
my branch on top of it?
> shared lock O_DIRECT and direct-IO code consolidation I have
> patches for.
>
> Patch 1/5 was already posted before
> https://patchwork.kernel.org/project/linux-fsdevel/patch/20231213150703.6262-1-bschubert@ddn.com/
> but is included here again, as especially patch 5/5 has a
> dependency on it. Amir has also spotted a typo in the commit message
> of the initial patch, which is corrected here.
>
> Patches 2/5 and 3/5 add helper functions, which are needed by the
> main patch (5/5) in this series and are be also needed by another
> fuse direct-IO series. That series needs the helper functions in
> fuse_cache_write_iter, thus, these new helpers are above that
> function.
>
> Patch 4/5 allows to fail fuse_finish_open and is a preparation
> to handle conflicting IO modes from the server side and will also be
> needed for fuse passthrough.
>
> Patch 5/5 is the main patch in the series, which adds inode
> IO modes, which is needed to re-enable shared DIO writes locks
> when FUSE_DIRECT_IO_ALLOW_MMAP is set. Furthermore, these IO modes
> are also needed by Amirs WIP fuse passthrough work.
>
> The conflict of FUSE_DIRECT_IO_ALLOW_MMAP and
> FOPEN_PARALLEL_DIRECT_WRITES was detected by xfstest generic/095.
> This patch series was tested by running a loop of that test
> and also by multiple runs of the complete xfstest suite.
> For testing with libfuse a version is needed that includes this
> pull request
> https://github.com/libfuse/libfuse/pull/870
Heh, this is already merged :)
For the record, I understand that you ran this test with passthrough_hp.
In which configurations --direct-io? --nocache? only default?
Thanks for pushing this through!
Amir.
^ permalink raw reply [flat|nested] 40+ messages in thread* Re: [PATCH v2 0/5] fuse: inode IO modes and mmap
2024-02-01 10:30 ` [PATCH v2 0/5] fuse: inode IO modes and mmap Amir Goldstein
@ 2024-02-01 14:30 ` Bernd Schubert
2024-02-01 15:56 ` Amir Goldstein
0 siblings, 1 reply; 40+ messages in thread
From: Bernd Schubert @ 2024-02-01 14:30 UTC (permalink / raw)
To: Amir Goldstein, Bernd Schubert; +Cc: miklos, linux-fsdevel, dsingh
Hi Amir,
sorry for a bit late reply (*).
On 2/1/24 11:30, Amir Goldstein wrote:
> On Thu, Feb 1, 2024 at 1:08 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>
>> This series is mostly about mmap, direct-IO and inode IO modes.
>> (new in this series is FOPEN_CACHE_IO).
>> It brings back the shared lock for FOPEN_DIRECT_IO when
>> FUSE_DIRECT_IO_ALLOW_MMAP is set and is also preparation
>> work for Amirs work on fuse-passthrough and also for
>
> For the interested:
> https://github.com/amir73il/linux/commits/fuse-backing-fd-010224/
>
> Bernd,
>
> Can you push this series to your v2 branch so that I can rebase
> my branch on top of it?
Do you mind if I push this to a different branch to keep the branch clean?
>
>> shared lock O_DIRECT and direct-IO code consolidation I have
>> patches for.
>>
>> Patch 1/5 was already posted before
>> https://patchwork.kernel.org/project/linux-fsdevel/patch/20231213150703.6262-1-bschubert@ddn.com/
>> but is included here again, as especially patch 5/5 has a
>> dependency on it. Amir has also spotted a typo in the commit message
>> of the initial patch, which is corrected here.
>>
>> Patches 2/5 and 3/5 add helper functions, which are needed by the
>> main patch (5/5) in this series and are be also needed by another
>> fuse direct-IO series. That series needs the helper functions in
>> fuse_cache_write_iter, thus, these new helpers are above that
>> function.
>>
>> Patch 4/5 allows to fail fuse_finish_open and is a preparation
>> to handle conflicting IO modes from the server side and will also be
>> needed for fuse passthrough.
>>
>> Patch 5/5 is the main patch in the series, which adds inode
>> IO modes, which is needed to re-enable shared DIO writes locks
>> when FUSE_DIRECT_IO_ALLOW_MMAP is set. Furthermore, these IO modes
>> are also needed by Amirs WIP fuse passthrough work.
>>
>> The conflict of FUSE_DIRECT_IO_ALLOW_MMAP and
>> FOPEN_PARALLEL_DIRECT_WRITES was detected by xfstest generic/095.
>> This patch series was tested by running a loop of that test
>> and also by multiple runs of the complete xfstest suite.
>> For testing with libfuse a version is needed that includes this
>> pull request
>> https://github.com/libfuse/libfuse/pull/870
>
> Heh, this is already merged :)
Yeah, just not in any libfuse release yet. Btw, for some reasons I need
to comment out "_require_aio" in tests/generic/095 to get the test to do
something, I need to investigate why.
>
> For the record, I understand that you ran this test with passthrough_hp.
> In which configurations --direct-io? --nocache? only default?
I always test with both, which is why testing takes a bit time with all
tests. Also both modes for generic/095, but that one with an additional
loop (10 iterations), as in loop mode it found some issues in the
initial patches. For the --direct-io I actually need a slight patch to
get that passed through from local.config to passthrough_hp via
/sbin/mount.fuse.passthrough. Will submit that to libfuse later on.
>
> Thanks for pushing this through!
Well, thanks a lot for all of your work on it!
Cheers,
Bernd
PS (*): _Another_ broken bicycle in the middle of way to the office....
Half of the spokes of my rear wheel broke, out of the nowhere.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 0/5] fuse: inode IO modes and mmap
2024-02-01 14:30 ` Bernd Schubert
@ 2024-02-01 15:56 ` Amir Goldstein
2024-02-01 16:01 ` Bernd Schubert
0 siblings, 1 reply; 40+ messages in thread
From: Amir Goldstein @ 2024-02-01 15:56 UTC (permalink / raw)
To: Bernd Schubert; +Cc: Bernd Schubert, miklos, linux-fsdevel, dsingh
On Thu, Feb 1, 2024 at 4:30 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi Amir,
>
> sorry for a bit late reply (*).
>
> On 2/1/24 11:30, Amir Goldstein wrote:
> > On Thu, Feb 1, 2024 at 1:08 AM Bernd Schubert <bschubert@ddn.com> wrote:
> >>
> >> This series is mostly about mmap, direct-IO and inode IO modes.
> >> (new in this series is FOPEN_CACHE_IO).
> >> It brings back the shared lock for FOPEN_DIRECT_IO when
> >> FUSE_DIRECT_IO_ALLOW_MMAP is set and is also preparation
> >> work for Amirs work on fuse-passthrough and also for
> >
> > For the interested:
> > https://github.com/amir73il/linux/commits/fuse-backing-fd-010224/
> >
> > Bernd,
> >
> > Can you push this series to your v2 branch so that I can rebase
> > my branch on top of it?
>
> Do you mind if I push this to a different branch to keep the branch clean?
>
I don't mind, I just thought that it would be nice if the branch fuse_io_mode-v2
actually contained the patches that were posted as V2 to the mailing list...
Thanks,
Amir.
^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v2 0/5] fuse: inode IO modes and mmap
2024-02-01 15:56 ` Amir Goldstein
@ 2024-02-01 16:01 ` Bernd Schubert
0 siblings, 0 replies; 40+ messages in thread
From: Bernd Schubert @ 2024-02-01 16:01 UTC (permalink / raw)
To: Amir Goldstein; +Cc: Bernd Schubert, miklos, linux-fsdevel, dsingh
On 2/1/24 16:56, Amir Goldstein wrote:
> On Thu, Feb 1, 2024 at 4:30 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>> Hi Amir,
>>
>> sorry for a bit late reply (*).
>>
>> On 2/1/24 11:30, Amir Goldstein wrote:
>>> On Thu, Feb 1, 2024 at 1:08 AM Bernd Schubert <bschubert@ddn.com> wrote:
>>>>
>>>> This series is mostly about mmap, direct-IO and inode IO modes.
>>>> (new in this series is FOPEN_CACHE_IO).
>>>> It brings back the shared lock for FOPEN_DIRECT_IO when
>>>> FUSE_DIRECT_IO_ALLOW_MMAP is set and is also preparation
>>>> work for Amirs work on fuse-passthrough and also for
>>>
>>> For the interested:
>>> https://github.com/amir73il/linux/commits/fuse-backing-fd-010224/
>>>
>>> Bernd,
>>>
>>> Can you push this series to your v2 branch so that I can rebase
>>> my branch on top of it?
>>
>> Do you mind if I push this to a different branch to keep the branch clean?
>>
>
> I don't mind, I just thought that it would be nice if the branch fuse_io_mode-v2
> actually contained the patches that were posted as V2 to the mailing list...
Ah sorry, I misread. I thought you wanted me to pull
fuse-backing-fd-010224 onto the fuse_io_mode-v2 branch. I just pushed
posted v2 to my branch.
Thanks,
Bernd
^ permalink raw reply [flat|nested] 40+ messages in thread