* [PATCH] fuse: clarify extending writes handling
From: Chunsheng Luo @ 2025-08-18 13:29 UTC
To: miklos; +Cc: linux-fsdevel, linux-kernel, Chunsheng Luo

Only flush extending writes (up to LLONG_MAX) for files with upcoming
write operations, and Fix confusing 'end' parameter usage.

Signed-off-by: Chunsheng Luo <luochunsheng@ustc.edu>
---
 fs/fuse/file.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 95275a1e2f54..d2b8e3a7d4a4 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2851,7 +2851,7 @@ fuse_direct_IO(struct kiocb *iocb, struct iov_iter *iter)

 static int fuse_writeback_range(struct inode *inode, loff_t start, loff_t end)
 {
-	int err = filemap_write_and_wait_range(inode->i_mapping, start, LLONG_MAX);
+	int err = filemap_write_and_wait_range(inode->i_mapping, start, end);

 	if (!err)
 		fuse_sync_writes(inode);
@@ -2894,9 +2894,8 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 	}

 	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE)) {
-		loff_t endbyte = offset + length - 1;
-
-		err = fuse_writeback_range(inode, offset, endbyte);
+		/* flush extending writes for upcoming write operations */
+		err = fuse_writeback_range(inode, offset, LLONG_MAX);
 		if (err)
 			goto out;
 	}
@@ -3017,7 +3016,8 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in,
 	 * To fix this a mapping->invalidate_lock could be used to prevent new
 	 * faults while the copy is ongoing.
 	 */
-	err = fuse_writeback_range(inode_out, pos_out, pos_out + len - 1);
+	/* flush extending writes for upcoming write operations */
+	err = fuse_writeback_range(inode_out, pos_out, LLONG_MAX);
 	if (err)
 		goto out;

--
2.43.0

* Re: [PATCH] fuse: clarify extending writes handling
From: Miklos Szeredi @ 2025-08-19 14:07 UTC
To: Chunsheng Luo; +Cc: linux-fsdevel, linux-kernel

On Mon, 18 Aug 2025 at 15:29, Chunsheng Luo <luochunsheng@ustc.edu> wrote:
>
> Only flush extending writes (up to LLONG_MAX) for files with upcoming
> write operations, and Fix confusing 'end' parameter usage.

Patch looks correct, but it changes behavior on input file of
copy_file_range(), which is not explained here.

Thanks,
Miklos

* Re: [PATCH] fuse: clarify extending writes handling
From: Chunsheng Luo @ 2025-08-20 2:11 UTC
To: miklos; +Cc: linux-fsdevel, linux-kernel, luochunsheng

Tue, 19 Aug 2025 16:07:19 Miklos Szeredi wrote:

>>
>> Only flush extending writes (up to LLONG_MAX) for files with upcoming
>> write operations, and Fix confusing 'end' parameter usage.
>
> Patch looks correct, but it changes behavior on input file of
> copy_file_range(), which is not explained here.

Thank you for your review.

For the copy_file_range input file, since it only involves read operations,
I think it is not necessary to flush to LLONG_MAX. Therefore, for the input file,
flushing to the end is sufficient.

If you think my understanding is correct, I can resend a revised version of
the patch to update the commit log and include a clear explanation regarding
the behavior changes in 'fuse_copy_file_range' and 'fuse_file_fallocate' operations.

Thanks
Chunsheng Luo

* Re: [PATCH] fuse: clarify extending writes handling
From: Darrick J. Wong @ 2025-08-20 5:20 UTC
To: Chunsheng Luo; +Cc: miklos, linux-fsdevel, linux-kernel

On Wed, Aug 20, 2025 at 10:11:43AM +0800, Chunsheng Luo wrote:
> Tue, 19 Aug 2025 16:07:19 Miklos Szeredi wrote:
>
> >>
> >> Only flush extending writes (up to LLONG_MAX) for files with upcoming
> >> write operations, and Fix confusing 'end' parameter usage.
> >
> > Patch looks correct, but it changes behavior on input file of
> > copy_file_range(), which is not explained here.
>
> Thank you for your review.
>
> For the copy_file_range input file, since it only involves read operations,
> I think it is not necessary to flush to LLONG_MAX. Therefore, for the input file,
> flushing to the end is sufficient.
>
> If you think my understanding is correct, I can resend a revised version of
> the patch to update the commit log and include a clear explanation regarding
> the behavior changes in 'fuse_copy_file_range' and 'fuse_file_fallocate' operations.

I don't understand the current behavior at all -- why do the callers of
fuse_writeback_range pass an @end parameter when it ignores @end in
favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
fallocate and copy_file_range both take i_rwsem, so what could they be
racing with?  Or am I missing something here?

fuse-iomap flushes and unmaps only the given file range, and afaict
that's just fine ... but there is this pesky generic/551 failure I keep
seeing, so I might actually be missing some subtlety. :)

--D

> Thanks
> Chunsheng Luo
>

* Re: [PATCH] fuse: clarify extending writes handling
From: Miklos Szeredi @ 2025-08-20 6:52 UTC
To: Darrick J. Wong; +Cc: Chunsheng Luo, linux-fsdevel, linux-kernel

On Wed, 20 Aug 2025 at 07:20, Darrick J. Wong <djwong@kernel.org> wrote:

> I don't understand the current behavior at all -- why do the callers of
> fuse_writeback_range pass an @end parameter when it ignores @end in
> favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
> fallocate and copy_file_range both take i_rwsem, so what could they be
> racing with?  Or am I missing something here?

commit 59bda8ecee2f ("fuse: flush extending writes")

The issue AFAICS is that if writes beyond the range end are not
flushed, then EOF on backing file could be below range end (if pending
writes create a hole), hence copy_file_range() will stop copying at
the start of that hole.

So this patch is incorrect, since not flushing copy_file_range input
file could result in a short copy.

Thanks,
Miklos

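One way to read the scenario above is as a small userspace model. The sketch below is a toy, not kernel code: the function names, the single dirty extent, and the idea of tracking only the backing file's size are all assumptions made for illustration. It shows how a pending write beyond the requested range can leave EOF on the backing file below the range end unless the flush runs past that range.

```c
#include <assert.h>
#include <limits.h>

/*
 * Toy model (not kernel code); every name here is hypothetical.
 * The page cache holds one dirty extent [dirty_start, dirty_end] (inclusive),
 * and backing_size is EOF as the FUSE server sees it on the backing file.
 */

/* Backing-file size after writing back the dirty bytes inside [start, end]. */
static long long size_after_writeback(long long backing_size,
				      long long dirty_start, long long dirty_end,
				      long long start, long long end)
{
	assert(start <= end && dirty_start <= dirty_end);
	assert(dirty_end < LLONG_MAX);

	/* No dirty byte falls inside the flushed range: nothing reaches the server. */
	if (dirty_end < start || dirty_start > end)
		return backing_size;

	/* The last flushed byte may extend the backing file. */
	long long last = dirty_end < end ? dirty_end : end;
	return last + 1 > backing_size ? last + 1 : backing_size;
}

/* Bytes a server-side copy can read from [pos, pos + len) before hitting EOF. */
static long long copyable_bytes(long long backing_size, long long pos, long long len)
{
	if (pos >= backing_size)
		return 0;
	long long avail = backing_size - pos;
	return avail < len ? avail : len;
}
```

Take an empty backing file, a 4 KiB dirty extent at offset 1 MiB, and a 64 KiB copy from offset 0. Flushing only up to byte 65535 leaves the backing size at 0, so copyable_bytes() yields 0 (a short copy). Flushing up to LLONG_MAX raises the size to 1052672, and all 65536 bytes become readable as a hole.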
* Re: [PATCH] fuse: clarify extending writes handling
From: Darrick J. Wong @ 2025-08-20 16:27 UTC
To: Miklos Szeredi; +Cc: Chunsheng Luo, linux-fsdevel, linux-kernel

On Wed, Aug 20, 2025 at 08:52:35AM +0200, Miklos Szeredi wrote:
> On Wed, 20 Aug 2025 at 07:20, Darrick J. Wong <djwong@kernel.org> wrote:
>
> > I don't understand the current behavior at all -- why do the callers of
> > fuse_writeback_range pass an @end parameter when it ignores @end in
> > favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
> > fallocate and copy_file_range both take i_rwsem, so what could they be
> > racing with?  Or am I missing something here?
>
> commit 59bda8ecee2f ("fuse: flush extending writes")
>
> The issue AFAICS is that if writes beyond the range end are not
> flushed, then EOF on backing file could be below range end (if pending
> writes create a hole), hence copy_file_range() will stop copying at
> the start of that hole.
>
> So this patch is incorrect, since not flushing copy_file_range input
> file could result in a short copy.

<nod> As far as Mr. Luo's patch is concerned, I agree that a strict "no
behavior changes" patch should have changed the inode_in writeback_range
call to:

	err = fuse_writeback_range(inode_in, pos_in, LLONG_MAX);

Though if all callsites are going to pass LLONG_MAX in as @end, then
why not eliminate the parameter entirely?

What I'm (still) wondering is why was it necessary to flush the source
and destination ranges between (pos + len - 1) and LLONG_MAX?  But let's
see, what did 59bda8ecee2f have to say?

| fuse: flush extending writes
|
| Callers of fuse_writeback_range() assume that the file is ready for
| modification by the server in the supplied byte range after the call
| returns.

Ok, so far so good.

| If there's a write that extends the file beyond the end of the supplied
| range, then the file needs to be extended to at least the end of the range,
| but currently that's not done.
|
| There are at least two cases where this can cause problems:
|
|  - copy_file_range() will return short count if the file is not extended
|    up to end of the source range.

That suggests to me

	filemap_write_and_wait_range(inode_in, pos_in, pos_in + pos_len - 1)

but I don't see why we need to flush more bytes than that?  The server's
CFR implementation has all the bytes it needs to read the source data.

Hum.  But what if CFR is actually reflink?  I guess you'd want to
buffer-copy the unaligned head and tail regions, and reflink the
allocation units in the middle, but I still don't see why the fuse
server needs more of the source file than (pos, pos + len - 1)?

|  - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
|    hence the region may not be fully allocated.

Hrm, ZERO | KEEP_SIZE is supposed to allow preallocation of blocks
beyond EOF, or at least that's what XFS does:

$ truncate -s 10m /mnt/test
$ xfs_io -c 'fzero -k 100m 64k' /mnt/test
$ filefrag -v /mnt/test
Filesystem type is: 58465342
File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:    25600..   25615:         24..        39:     16:      25600: last,unwritten,eof
/mnt/test: 1 extent found

as does ext4:

$ truncate -s 10m /mnt/test
$ xfs_io -c 'fzero -k 100m 64k' /mnt/test
$ filefrag -v /mnt/test
Filesystem type is: ef53
File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:    25600..   25615:      33808..     33823:     16:      25600: last,unwritten,eof
/mnt/test: 1 extent found

(Notice that the 10M file has one extent starting at 100M)

I can see why you'd want to flush the target range in case the fuse
server has a better trick up its sleeve to zero the already-written
region that isn't the punch-and-realloc behavior that xfs and ext4 have.
But here too I don't see why the fuse server would need more than the
target region.

Though I think for both cases we end up flushing more than the target
region, because the page cache rounds start down and end up to PAGE_SIZE
boundaries.

| Fix by flushing writes from the start of the range up to the end of the
| file.  This could be optimized if the writes are non-extending, etc, but
| it's probably not worth the trouble.

<shrug> Was there a bug report associated with this commit?  I couldn't
find any hits on the subject line in lore.  Was this simply a big
hammer that solved whatever corruption problems were occurring?  Or
something found in code inspection?

<confused>

--D

> Thanks,
> Miklos
>

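The remark about rounding to PAGE_SIZE boundaries, and the block numbers in the filefrag output, can be checked with a little arithmetic. This is a minimal sketch assuming a 4096-byte page and block size (matching the "blocks of 4096 bytes" line above). The helper names are hypothetical, and real writeback works on folios, which may be larger than one page.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096LL	/* assumed; real systems may use other sizes */

/* Index of the page (or 4096-byte block) that contains byte offset 'off'. */
static long long page_index(long long off)
{
	assert(off >= 0);
	return off / TOY_PAGE_SIZE;
}

/* First byte of the page containing 'off' (start rounded down). */
static long long page_first_byte(long long off)
{
	return page_index(off) * TOY_PAGE_SIZE;
}

/* Last byte, inclusive, of the page containing 'off' (end rounded up). */
static long long page_last_byte(long long off)
{
	return page_first_byte(off) + TOY_PAGE_SIZE - 1;
}
```

Under these assumptions, a request to flush bytes 5000 through 9000 would cover bytes 4096 through 12287 at page granularity. Likewise, 'fzero -k 100m 64k' starts at byte 104857600, which is block 25600, and its last byte, 104923135, is in block 25615, matching the single extent 25600..25615 shown by filefrag.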
* Re: [PATCH] fuse: clarify extending writes handling
From: Chunsheng Luo @ 2025-08-21 6:25 UTC
To: djwong; +Cc: linux-fsdevel, linux-kernel, luochunsheng, miklos

On Wed, 20 Aug 2025 09:27:24 Darrick J. Wong wrote:
> On Wed, Aug 20, 2025 at 08:52:35AM +0200, Miklos Szeredi wrote:
> > On Wed, 20 Aug 2025 at 07:20, Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > > I don't understand the current behavior at all -- why do the callers of
> > > fuse_writeback_range pass an @end parameter when it ignores @end in
> > > favor of LLONG_MAX?  And why is it necessary to flush to EOF at all?
> > > fallocate and copy_file_range both take i_rwsem, so what could they be
> > > racing with?  Or am I missing something here?
> >
> > commit 59bda8ecee2f ("fuse: flush extending writes")
> >
> > The issue AFAICS is that if writes beyond the range end are not
> > flushed, then EOF on backing file could be below range end (if pending
> > writes create a hole), hence copy_file_range() will stop copying at
> > the start of that hole.
> >
> > So this patch is incorrect, since not flushing copy_file_range input
> > file could result in a short copy.
>

Thanks to Miklos for the review and explanation.

> <nod> As far as Mr. Luo's patch is concerned, I agree that a strict "no
> behavior changes" patch should have changed the inode_in writeback_range
> call to:
>
> 	err = fuse_writeback_range(inode_in, pos_in, LLONG_MAX);
>
> Though if all callsites are going to pass LLONG_MAX in as @end, then
> why not eliminate the parameter entirely?
>

Thanks for your reply. Ok, understood.

Before fully understanding why we need to flush up to the end, let's
first ensure the logic remains unchanged.

Rather than removing the end parameter from fuse_writeback_range and
putting LLONG_MAX inside the function, I suggest keeping the end
parameter, modifying the input argument to LLONG_MAX, and adding some
comments. This way we can more clearly see the range scope. Also, we
cannot guarantee whether there will be other scenarios that need the
real_end in the future.

> What I'm (still) wondering is why was it necessary to flush the source
> and destination ranges between (pos + len - 1) and LLONG_MAX?  But let's
> see, what did 59bda8ecee2f have to say?
>
> | fuse: flush extending writes
> |
> | Callers of fuse_writeback_range() assume that the file is ready for
> | modification by the server in the supplied byte range after the call
> | returns.
>
> Ok, so far so good.
>
> | If there's a write that extends the file beyond the end of the supplied
> | range, then the file needs to be extended to at least the end of the range,
> | but currently that's not done.
> |
> | There are at least two cases where this can cause problems:
> |
> |  - copy_file_range() will return short count if the file is not extended
> |    up to end of the source range.
>
> That suggests to me
>
> 	filemap_write_and_wait_range(inode_in, pos_in, pos_in + pos_len - 1)
>
> but I don't see why we need to flush more bytes than that?  The server's
> CFR implementation has all the bytes it needs to read the source data.
>
> Hum.  But what if CFR is actually reflink?  I guess you'd want to
> buffer-copy the unaligned head and tail regions, and reflink the
> allocation units in the middle, but I still don't see why the fuse
> server needs more of the source file than (pos, pos + len - 1)?
>
> |  - FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE will not extend the file,
> |    hence the region may not be fully allocated.
>
> Hrm, ZERO | KEEP_SIZE is supposed to allow preallocation of blocks
> beyond EOF, or at least that's what XFS does:
>
> $ truncate -s 10m /mnt/test
> $ xfs_io -c 'fzero -k 100m 64k' /mnt/test
> $ filefrag -v /mnt/test
> Filesystem type is: 58465342
> File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:    25600..   25615:         24..        39:     16:      25600: last,unwritten,eof
> /mnt/test: 1 extent found
>
> as does ext4:
>
> $ truncate -s 10m /mnt/test
> $ xfs_io -c 'fzero -k 100m 64k' /mnt/test
> $ filefrag -v /mnt/test
> Filesystem type is: ef53
> File size of /mnt/test is 10485760 (2560 blocks of 4096 bytes)
>  ext:     logical_offset:        physical_offset: length:   expected: flags:
>    0:    25600..   25615:      33808..     33823:     16:      25600: last,unwritten,eof
> /mnt/test: 1 extent found
>
> (Notice that the 10M file has one extent starting at 100M)
>
> I can see why you'd want to flush the target range in case the fuse
> server has a better trick up its sleeve to zero the already-written
> region that isn't the punch-and-realloc behavior that xfs and ext4 have.
> But here too I don't see why the fuse server would need more than the
> target region.
>
> Though I think for both cases we end up flushing more than the target
> region, because the page cache rounds start down and end up to PAGE_SIZE
> boundaries.
>
> | Fix by flushing writes from the start of the range up to the end of the
> | file.  This could be optimized if the writes are non-extending, etc, but
> | it's probably not worth the trouble.
>
> <shrug> Was there a bug report associated with this commit?  I couldn't
> find any hits on the subject line in lore.  Was this simply a big
> hammer that solved whatever corruption problems were occurring?  Or
> something found in code inspection?
>
> <confused>
>
> --D
>
> > Thanks,
> > Miklos
> >

Regarding "The issue AFAICS is that if writes beyond the range end are not
flushed, then EOF on backing file could be below range end (if pending
writes create a hole), hence copy_file_range() will stop copying at
the start of that hole."

I looked up some information in the man page and the code.

1. The man copy_file_range description:

   "If fd_in is a sparse file, then copy_file_range() may expand any
   holes existing in the requested range. Users may benefit from
   calling copy_file_range() in a loop, and using the lseek(2)
   SEEK_DATA and SEEK_HOLE operations to find the locations of data
   segments."

   The man page description of 'If fd_in is a sparse file' clearly refers
   to the source file being a sparse file (i.e., containing holes). In this
   case, copy_file_range may expand holes (logical zero-byte regions) in the
   source file into actual written zero bytes in the destination file
   (physically occupying disk space), causing the destination file to lose
   its sparseness. This should refer to the case where holes exist within
   the copy_from range of fd_in.

2. Looking at the corresponding code:

   copy_file_range() -> do_splice_direct -> splice_direct_to_actor -> do_splice_read

   do_splice_read:
	do {
		if (*ppos >= i_size_read(in->f_mapping->host))
			break;	// Hit end of file, exit

		// filemap_get_pages encountering file holes will fill with zeros
		// Or is there a case where the filesystem returns failure when it encounters a hole?
		error = filemap_get_pages(&iocb, len, &fbatch, true);
		if (error < 0)
			break;

		// Process each page, copy to pipe
		for (i = 0; i < folio_batch_count(&fbatch); i++) {
			n = splice_folio_into_pipe(pipe, folio, *ppos, n);
			if (!n)
				goto out;
			...
		}
	} while (len);

I can understand that the [pos, pos+len) range needs to be flushed to the
backing file to avoid the FUSE userspace program mistakenly thinking that
there are holes in the backing file (file_in) or that the size is
insufficient, which would cause the FUSE userspace program to execute
copy_file_range(back_file_in, back_file_out) and return short copy or
overwrite holes with zeros.

But I'm also confused why we need to flush beyond the [pos, pos+len) range?

Yes, are there any testcases or problem email discussions that would make
it easier to understand the reason?

I'll continue to look at the code in detail combined with testing later.

Thanks
Chunsheng Luo

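The man page advice quoted above, to call copy_file_range() in a loop, amounts to retrying after a short count until the request is satisfied or the source is exhausted. The following is a minimal sketch of that loop only. The callback type and the fake source are hypothetical stand-ins so the logic can be exercised without real files. In practice the callback would wrap copy_file_range(2), whose return convention it mirrors: bytes copied, 0 at end of input, or -1 on error.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callback: copy up to 'len' bytes; returns bytes copied,
 * 0 at end of input, or -1 on error (same convention as copy_file_range(2)). */
typedef ssize_t (*copy_chunk_fn)(void *ctx, size_t len);

/* Keep calling copy_chunk until 'len' bytes are copied or the source ends. */
static ssize_t copy_until_done(copy_chunk_fn copy_chunk, void *ctx, size_t len)
{
	size_t total = 0;

	assert(copy_chunk != NULL);
	while (total < len) {
		ssize_t n = copy_chunk(ctx, len - total);

		if (n < 0)
			return -1;	/* propagate the error */
		if (n == 0)
			break;		/* source EOF: report a short total */
		total += (size_t)n;
	}
	return (ssize_t)total;
}

/* Fake source for illustration: *ctx bytes remain, at most 10 per call. */
static ssize_t fake_chunk(void *ctx, size_t len)
{
	size_t *remaining = ctx;
	size_t n = *remaining < len ? *remaining : len;

	if (n > 10)
		n = 10;
	*remaining -= n;
	return (ssize_t)n;
}
```

Note that the loop cannot turn a short copy caused by a stale EOF into a full one: once the source reports 0, the caller still ends up with fewer bytes than requested. That is the outcome the thread is concerned about when the backing file has not been extended.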