[PATCH] fuse: clarify extending writes handling

linux-fsdevel.vger.kernel.org archive mirror

* [PATCH] fuse: clarify extending writes handling
@ 2025-08-18 13:29 Chunsheng Luo
  2025-08-19 14:07 ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Chunsheng Luo @ 2025-08-18 13:29 UTC
  To: miklos; +Cc: linux-fsdevel, linux-kernel, Chunsheng Luo

Only flush extending writes (up to LLONG_MAX) for files with upcoming
write operations, and fix confusing 'end' parameter usage.

Signed-off-by: Chunsheng Luo <luochunsheng@ustc.edu>
---
 fs/fuse/file.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 95275a1e2f54..d2b8e3a7d4a4 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2851,7 +2851,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
 
 static int fuse_writeback_range(struct inode *inode, loff_t start, loff_t end)
 {
-	int err = filemap_write_and_wait_range(inode->i_mapping, start, LLONG_MAX);
+	int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
 
 	if (!err)
 		fuse_sync_writes(inode);
@@ -2894,9 +2894,8 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 	}
 
 	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)) {
-		loff_t endbyte = offset + length - 1;
-
-		err = fuse_writeback_range(inode, offset, endbyte);
+		/* flush extending writes for upcoming write operations */
+		err = fuse_writeback_range(inode, offset, LLONG_MAX);
 		if (err)
 			goto out;
 	}
@@ -3017,7 +3016,8 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in,
 	 * To fix this a mapping->invalidate_lock could be used to prevent new
 	 * faults while the copy is ongoing.
 	 */
-	err = fuse_writeback_range(inode_out, pos_out, pos_out + len - 1);
+	/* flush extending writes for upcoming write operations */
+	err = fuse_writeback_range(inode_out, pos_out, LLONG_MAX);
 	if (err)
 		goto out;
 
-- 
2.43.0



* Re: [PATCH] fuse: clarify extending writes handling
  2025-08-18 13:29 [PATCH] fuse: clarify extending writes handling Chunsheng Luo
@ 2025-08-19 14:07 ` Miklos Szeredi
  2025-08-20  2:11   ` Chunsheng Luo
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2025-08-19 14:07 UTC
  To: Chunsheng Luo; +Cc: linux-fsdevel, linux-kernel

On Mon, 18 Aug 2025 at 15:29, Chunsheng Luo <luochunsheng@ustc.edu> wrote:
>
> Only flush extending writes (up to LLONG_MAX) for files with upcoming
> write operations, and fix confusing 'end' parameter usage.

Patch looks correct, but it changes behavior on input file of
copy_file_range(), which is not explained here.

Thanks,
Miklos


* Re: [PATCH] fuse: clarify extending writes handling
  2025-08-19 14:07 ` Miklos Szeredi
@ 2025-08-20  2:11   ` Chunsheng Luo
  2025-08-20  5:20     ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Chunsheng Luo @ 2025-08-20  2:11 UTC
  To: miklos; +Cc: linux-fsdevel, linux-kernel, luochunsheng

Tue, 19 Aug 2025 16:07:19 Miklos Szeredi wrote:

>>
>> Only flush extending writes (up to LLONG_MAX) for files with upcoming
>> write operations, and fix confusing 'end' parameter usage.
>
> Patch looks correct, but it changes behavior on input file of
> copy_file_range(), which is not explained here.

Thank you for your review.

For the copy_file_range input file, since it only involves read operations,
I think it is not necessary to flush to LLONG_MAX. Therefore, for the input file,
flushing up to the end of the requested range is sufficient.

If you think my understanding is correct, I can resend a revised version of
the patch to update the commit log and include a clear explanation regarding
the behavior changes in 'fuse_copy_file_range' and 'fuse_file_fallocate' operations.

Thanks
Chunsheng Luo


* Re: [PATCH] fuse: clarify extending writes handling
  2025-08-20  2:11   ` Chunsheng Luo
@ 2025-08-20  5:20     ` Darrick J. Wong
  2025-08-20  6:52       ` Miklos Szeredi
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-08-20  5:20 UTC
  To: Chunsheng Luo; +Cc: miklos, linux-fsdevel, linux-kernel

On Wed, Aug 20, 2025 at 10:11:43AM +0800, Chunsheng Luo wrote:
> Tue, 19 Aug 2025 16:07:19 Miklos Szeredi wrote:
> 
> >>
> >> Only flush extending writes (up to LLONG_MAX) for files with upcoming
> >> write operations, and fix confusing 'end' parameter usage.
> >
> > Patch looks correct, but it changes behavior on input file of
> > copy_file_range(), which is not explained here.
> 
> Thank you for your review.
> 
> For the copy_file_range input file, since it only involves read operations,
> I think it is not necessary to flush to LLONG_MAX. Therefore, for the input file, 
> flushing to the end is sufficient.
> 
> If you think my understanding is correct, I can resend a revised version of
> the patch to update the commit log and include a clear explanation regarding
> the behavior changes in 'fuse_copy_file_range' and 'fuse_file_fallocate' operations.

I don't understand the current behavior at all -- why do the callers of
fuse_writeback_range pass an @end parameter when it ignores @end in
favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
fallocate and copy_file_range both take i_rwsem, so what could they be
racing with?  Or am I missing something here?

fuse-iomap flushes and unmaps only the given file range, and afaict
that's just fine ... but there is this pesky generic/551 failure I keep
seeing, so I might actually be missing some subtlety. :)

--D

> Thanks
> Chunsheng Luo
> 


* Re: [PATCH] fuse: clarify extending writes handling
  2025-08-20  5:20     ` Darrick J. Wong
@ 2025-08-20  6:52       ` Miklos Szeredi
  2025-08-20 16:27         ` Darrick J. Wong
  0 siblings, 1 reply; 7+ messages in thread
From: Miklos Szeredi @ 2025-08-20  6:52 UTC
  To: Darrick J. Wong; +Cc: Chunsheng Luo, linux-fsdevel, linux-kernel

On Wed, 20 Aug 2025 at 07:20, Darrick J. Wong <djwong@kernel.org> wrote:

> I don't understand the current behavior at all -- why do the callers of
> fuse_writeback_range pass an @end parameter when it ignores @end in
> favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
> fallocate and copy_file_range both take i_rwsem, so what could they be
> racing with?  Or am I missing something here?

commit 59bda8ecee2f ("fuse: flush extending writes")

The issue AFAICS is that if writes beyond the range end are not
flushed, then EOF on backing file could be below range end (if pending
writes create a hole), hence copy_file_range() will stop copying at
the start of that hole.

So this patch is incorrect, since not flushing copy_file_range input
file could result in a short copy.
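
To make the failure mode concrete, here is a toy userspace model in C. It
is only a sketch and not kernel code: struct dirty_range and both function
names are hypothetical, and the model assumes that any dirty range
overlapping the flush window is written back whole and that the server can
copy only up to the backing file's current EOF.

```c
#include <assert.h>
#include <limits.h>

/* One dirty (not yet written back) cached range, inclusive byte offsets. */
struct dirty_range {
	long long start;
	long long end;
};

/*
 * Toy model of writeback over [flush_start, flush_end]: every dirty range
 * that overlaps the window reaches the backing file, which may move the
 * backing file's EOF forward.  Returns the backing file size afterwards.
 */
long long backing_size_after_flush(long long backing_size,
				   const struct dirty_range *dirty, int n,
				   long long flush_start, long long flush_end)
{
	assert(flush_start <= flush_end);

	for (int i = 0; i < n; i++) {
		if (dirty[i].end < flush_start || dirty[i].start > flush_end)
			continue;	/* outside the window: stays cached */
		if (dirty[i].end + 1 > backing_size)
			backing_size = dirty[i].end + 1;
	}
	return backing_size;
}

/* Bytes a server-side copy can read from [pos, pos + len) given the EOF. */
long long copyable_bytes(long long backing_size, long long pos, long long len)
{
	assert(pos >= 0 && len >= 0);

	if (pos >= backing_size)
		return 0;
	return backing_size - pos < len ? backing_size - pos : len;
}
```

For a backing file of 4096 bytes with one dirty cached write at 1 MiB,
flushing only [0, 65535] leaves the server able to copy 4096 of the
requested 65536 bytes, while flushing [0, LLONG_MAX] allows all 65536.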

Thanks,
Miklos


* Re: [PATCH] fuse: clarify extending writes handling
  2025-08-20  6:52       ` Miklos Szeredi
@ 2025-08-20 16:27         ` Darrick J. Wong
  2025-08-21  6:25           ` Chunsheng Luo
  0 siblings, 1 reply; 7+ messages in thread
From: Darrick J. Wong @ 2025-08-20 16:27 UTC
  To: Miklos Szeredi; +Cc: Chunsheng Luo, linux-fsdevel, linux-kernel

On Wed, Aug 20, 2025 at 08:52:35AM +0200, Miklos Szeredi wrote:
> On Wed, 20 Aug 2025 at 07:20, Darrick J. Wong <djwong@kernel.org> wrote:
> 
> > I don't understand the current behavior at all -- why do the callers of
> > fuse_writeback_range pass an @end parameter when it ignores @end in
> > favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
> > fallocate and copy_file_range both take i_rwsem, so what could they be
> > racing with?  Or am I missing something here?
> 
> commit 59bda8ecee2f ("fuse: flush extending writes")
> 
> The issue AFAICS is that if writes beyond the range end are not
> flushed, then EOF on backing file could be below range end (if pending
> writes create a hole), hence copy_file_range() will stop copying at
> the start of that hole.
> 
> So this patch is incorrect, since not flushing copy_file_range input
> file could result in a short copy.

<nod> As far as Mr. Luo's patch is concerned, I agree that a strict "no
behavior changes" patch should have changed the inode_in writeback_range
call to:

	err = fuse_writeback_range(inode_in, pos_in, LLONG_MAX);

Though if all callsites are going to pass LLONG_MAX in as @end, then
why not eliminate the parameter entirely?

What I'm (still) wondering is why was it necessary to flush the source
and destination ranges between (pos + len - 1) and LLONG_MAX?  But let's
see, what did 59bda8ecee2f have to say?

| fuse: flush extending writes
|
| Callers of fuse_writeback_range() assume that the file is ready for
| modification by the server in the supplied byte range after the call
| returns.

Ok, so far so good.

| If there's a write that extends the file beyond the end of the supplied
| range, then the file needs to be extended to at least the end of the range,
| but currently that's not done.
|
| There are at least two cases where this can cause problems:
|
|  - copy_file_range() will return short count if the file is not extended
|    up to end of the source range.

That suggests to me

filemap_write_and_wait_range(inode_in->i_mapping, pos_in, pos_in + len - 1)

but I don't see why we need to flush more bytes than that?  The server's
CFR implementation has all the bytes it needs to read the source data.

Hum.  But what if CFR is actually reflink?  I guess you'd want to
buffer-copy the unaligned head and tail regions, and reflink the
allocation units in the middle, but I still don't see why the fuse
server needs more of the source file than (pos, pos + len - 1)?

|  - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
|    hence the region may not be fully allocated.

Hrm, ZERO | KEEP_SIZE is supposed to allow preallocation of blocks
beyond EOF, or at least that's what XFS does:

$ truncate -s 10m /mnt/test
$ xfs_io -c 'fzero -k 100m 64k' /mnt/test
$ filefrag -v /mnt/test
Filesystem type is: 58465342
File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:    25600..   25615:         24..        39:     16:      25600: last,unwritten,eof
/mnt/test: 1 extent found

as does ext4:

$ truncate -s 10m /mnt/test
$ xfs_io -c 'fzero -k 100m 64k' /mnt/test
$ filefrag -v /mnt/test
Filesystem type is: ef53
File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:    25600..   25615:      33808..     33823:     16:      25600: last,unwritten,eof
/mnt/test: 1 extent found

(Notice that the 10M file has one extent starting at 100M)
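
The block numbers in the filefrag output follow from plain division by the
4096-byte block size.  A small sketch of that arithmetic in C (the helper
names are made up for illustration and are not part of filefrag or xfs_io):

```c
#include <assert.h>

/* Index of the filesystem block that contains a byte offset. */
long long byte_to_block(long long offset, long long block_size)
{
	assert(offset >= 0 && block_size > 0);
	return offset / block_size;
}

/* Index of the last block touched by the byte range [offset, offset + length). */
long long last_block(long long offset, long long length, long long block_size)
{
	assert(length > 0);
	return byte_to_block(offset + length - 1, block_size);
}
```

With an offset of 100 MiB and a length of 64 KiB this gives blocks 25600
through 25615, and the 10 MiB file size corresponds to 2560 blocks, which
matches the output above.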

I can see why you'd want to flush the target range in case the fuse
server has a better trick up its sleeve to zero the already-written
region that isn't the punch-and-realloc behavior that xfs and ext4 have.
But here too I don't see why the fuse server would need more than the
target region.

Though I think for both cases we end up flushing more than the target
region, because the page cache rounds start down and end up to PAGE_SIZE
boundaries.
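
A rough illustration of that rounding, assuming 4 KiB pages
(SKETCH_PAGE_SIZE and both helper names are hypothetical, and the real
page cache works on folio indices rather than on helpers like these):

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096LL	/* assumption: 4 KiB pages */

/* First byte of the page that contains pos. */
long long page_round_down(long long pos)
{
	assert(pos >= 0);
	return pos & ~(SKETCH_PAGE_SIZE - 1);
}

/* Last byte (inclusive) of the page that contains pos. */
long long page_last_byte(long long pos)
{
	assert(pos >= 0);
	return pos | (SKETCH_PAGE_SIZE - 1);
}
```

For example, a request covering bytes 5000..9000 would touch cached data
in 4096..12287 under this model.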

| Fix by flushing writes from the start of the range up to the end of the
| file.  This could be optimized if the writes are non-extending, etc, but
| it's probably not worth the trouble.

<shrug> Was there a bug report associated with this commit?  I couldn't
find any hits on the subject line in lore.  Was this simply a big
hammer that solved whatever corruption problems were occurring?  Or
something found in code inspection?

<confused>

--D

> Thanks,
> Miklos
> 


* Re: [PATCH] fuse: clarify extending writes handling
  2025-08-20 16:27         ` Darrick J. Wong
@ 2025-08-21  6:25           ` Chunsheng Luo
  0 siblings, 0 replies; 7+ messages in thread
From: Chunsheng Luo @ 2025-08-21  6:25 UTC
  To: djwong; +Cc: linux-fsdevel, linux-kernel, luochunsheng, miklos

On Wed, 20 Aug 2025 09:27:24 Darrick J. Wong wrote:

> On Wed, Aug 20, 2025 at 08:52:35AM +0200, Miklos Szeredi wrote:
> > On Wed, 20 Aug 2025 at 07:20, Darrick J. Wong <djwong@kernel.org> wrote:
> > 
> > > I don't understand the current behavior at all -- why do the callers of
> > > fuse_writeback_range pass an @end parameter when it ignores @end in
> > > favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
> > > fallocate and copy_file_range both take i_rwsem, so what could they be
> > > racing with?  Or am I missing something here?
> > 
> > commit 59bda8ecee2f ("fuse: flush extending writes")
> > 
> > The issue AFAICS is that if writes beyond the range end are not
> > flushed, then EOF on backing file could be below range end (if pending
> > writes create a hole), hence copy_file_range() will stop copying at
> > the start of that hole.
> > 
> > So this patch is incorrect, since not flushing copy_file_range input
> > file could result in a short copy.
> 

Thanks to Miklos for the review and explanation.

> <nod> As far as Mr. Luo's patch is concerned, I agree that a strict "no
> behavior changes" patch should have changed the inode_in writeback_range
> call to:
> 
> 	err = fuse_writeback_range(inode_in, pos_in, LLONG_MAX);
> 
> Though if all callsites are going to pass LLONG_MAX in as @end, then
> why not eliminate the parameter entirely?
> 

Thanks for your reply.

OK, understood. Until we fully understand why we need to flush up to LLONG_MAX,
let's first make sure the logic remains unchanged.

Rather than removing the end parameter from fuse_writeback_range and hardcoding
LLONG_MAX inside the function, I suggest keeping the end parameter, passing
LLONG_MAX at each call site, and adding comments. That makes the flushed range
visible at the call site. Also, we cannot rule out future callers that need a
real end offset.

> What I'm (still) wondering is why was it necessary to flush the source
> and destination ranges between (pos + len - 1) and LLONG_MAX?  But let's
> see, what did 59bda8ecee2f have to say?
> 
> | fuse: flush extending writes
> |
> | Callers of fuse_writeback_range() assume that the file is ready for
> | modification by the server in the supplied byte range after the call
> | returns.
> 
> Ok, so far so good.
> 
> | If there's a write that extends the file beyond the end of the supplied
> | range, then the file needs to be extended to at least the end of the range,
> | but currently that's not done.
> |
> | There are at least two cases where this can cause problems:
> |
> |  - copy_file_range() will return short count if the file is not extended
> |    up to end of the source range.
> 
> That suggests to me
> 
> filemap_write_and_wait_range(inode_in, pos_in, pos_in + pos_len - 1)
> 
> but I don't see why we need to flush more bytes than that?  The server's
> CFR implementation has all the bytes it needs to read the source data.
> 
> Hum.  But what if CFR is actually reflink?  I guess you'd want to
> buffer-copy the unaligned head and tail regions, and reflink the
> allocation units in the middle, but I still don't see why the fuse
> server needs more of the source file than (pos, pos + len - 1)?
> 
> |  - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
> |    hence the region may not be fully allocated.
> 
> Hrm, ZERO | KEEP_SIZE is supposed to allow preallocation of blocks
> beyond EOF, or at least that's what XFS does:
> 
> $ truncate -s 10m /mnt/test
> $ xfs_io -c 'fzero -k 100m 64k' /mnt/test
> $ filefrag -v /mnt/test
> Filesystem type is: 58465342
> File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:    25600..   25615:         24..        39:     16:      25600: last,unwritten,eof
> /mnt/test: 1 extent found
> 
> as does ext4:
> 
> $ truncate -s 10m /mnt/test
> $ xfs_io -c 'fzero -k 100m 64k' /mnt/test
> $ filefrag -v /mnt/test
> Filesystem type is: ef53
> File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:    25600..   25615:      33808..     33823:     16:      25600: last,unwritten,eof
> /mnt/test: 1 extent found
> 
> (Notice that the 10M file has one extent starting at 100M)
> 
> I can see why you'd want to flush the target range in case the fuse
> server has a better trick up its sleeve to zero the already-written
> region that isn't the punch-and-realloc behavior that xfs and ext4 have.
> But here too I don't see why the fuse server would need more than the
> target region.
> 
> Though I think for both cases we end up flushing more than the target
> region, because the page cache rounds start down and end up to PAGE_SIZE
> boundaries.
> 
> | Fix by flushing writes from the start of the range up to the end of the
> | file.  This could be optimized if the writes are non-extending, etc, but
> | it's probably not worth the trouble.
> 
> <shrug> Was there a bug report associated with this commit?  I couldn't
> find any hits on the subject line in lore.  Was this simply a big
> hammer that solved whatever corruption problems were occurring?  Or
> something found in code inspection?
> 
> <confused>
> 
> --D
> 
> > Thanks,
> > Miklos
> > 

Regarding "The issue AFAICS is that if writes beyond the range end are not flushed, 
then EOF on backing file could be below range end (if pending writes create a hole), 
hence copy_file_range() will stop copying at the start of that hole."

I looked up some information in the man page and the code.

1. The copy_file_range(2) man page says:

"If fd_in is a sparse file, then copy_file_range() may expand any holes existing 
in the requested range. Users may benefit from calling copy_file_range() in a loop, 
and using the lseek(2) SEEK_DATA and SEEK_HOLE operations to find the locations of
data segments."

The man page's 'If fd_in is a sparse file' clearly refers to the source file being
sparse (i.e., containing holes). In this case, copy_file_range may expand holes
(logical zero-byte regions) in the source file into actual written zero bytes in
the destination file (physically occupying disk space), so the destination file
loses its sparseness. This applies to holes that lie within the requested source
range of fd_in.

2. Looking at the corresponding code:
copy_file_range() -> do_splice_direct -> splice_direct_to_actor -> do_splice_read

Simplified read loop (this is filemap_splice_read(), which do_splice_read
reaches through ->splice_read):
do {
    if (*ppos >= i_size_read(in->f_mapping->host))
        break;  /* hit end of file, exit */

    /*
     * filemap_get_pages() fills holes in the file with zeros.
     * Or is there a case where the filesystem returns failure when it
     * encounters a hole?
     */
    error = filemap_get_pages(&iocb, len, &fbatch, true);
    if (error < 0)
        break;

    /* process each folio, copy to pipe */
    for (i = 0; i < folio_batch_count(&fbatch); i++) {
        n = splice_folio_into_pipe(pipe, folio, *ppos, n);
        if (!n)
            goto out;
        ...
    }
} while (len);

I can understand that the [pos, pos+len) range needs to be flushed to the backing
file so that the FUSE userspace server does not mistakenly see holes in the backing
file (file_in) or a file size that is too small. Otherwise, when the server executes
copy_file_range(back_file_in, back_file_out), it could return a short copy or
overwrite holes with zeros.

But I am also unsure why we need to flush beyond the [pos, pos+len) range.

Are there any test cases or earlier mailing-list discussions of the problem that
would make the reason easier to understand?

I'll continue to look at the code in detail combined with testing later.

Thanks
Chunsheng Luo
