[PATCH v2] 9p: fix i_size update race in getattr with writeback caching

public inbox for v9fs@lists.linux.dev
 help / color / mirror / Atom feed

* [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
@ 2025-12-27 18:01 Pierre Barre
  2026-01-05 11:35 ` Christian Schoenebeck
  2026-02-18 13:38 ` David Howells
  0 siblings, 2 replies; 11+ messages in thread
From: Pierre Barre @ 2025-12-27 18:01 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus; +Cc: linux_oss, v9fs, linux-kernel, Pierre Barre

With writeback caching (cache=mmap), v9fs_stat2inode() and
v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
response, even when dirty pages may exist locally. This causes processes
using lseek(SEEK_END) to see incorrect file sizes, leading to data
corruption when extending files while concurrent stat() calls occur.

Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
enabled to preserve the client's authoritative i_size.

Signed-off-by: Pierre Barre <pierre@barre.sh>
---
 fs/9p/vfs_inode.c      | 7 ++++++-
 fs/9p/vfs_inode_dotl.c | 7 ++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 97abe65bf7c1..bfba21f5d8a9 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -993,7 +993,12 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct path *path,
 	if (IS_ERR(st))
 		return PTR_ERR(st);
 
-	v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
+	/*
+	 * With writeback caching, the client is authoritative for i_size.
+	 * Don't let the server overwrite it with a potentially stale value.
+	 */
+	v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb,
+		(v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
 	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
 
 	p9stat_free(st);
diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
index 643e759eacb2..67a0ded2e223 100644
--- a/fs/9p/vfs_inode_dotl.c
+++ b/fs/9p/vfs_inode_dotl.c
@@ -453,7 +453,12 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
 	if (IS_ERR(st))
 		return PTR_ERR(st);
 
-	v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
+	/*
+	 * With writeback caching, the client is authoritative for i_size.
+	 * Don't let the server overwrite it with a potentially stale value.
+	 */
+	v9fs_stat2inode_dotl(st, d_inode(dentry),
+		(v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
 	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
 	/* Change block size to what the server returned */
 	stat->blksize = st->st_blksize;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2025-12-27 18:01 [PATCH v2] 9p: fix i_size update race in getattr with writeback caching Pierre Barre
@ 2026-01-05 11:35 ` Christian Schoenebeck
  2026-01-07  0:27   ` Pierre Barre
                     ` (2 more replies)
  2026-02-18 13:38 ` David Howells
  1 sibling, 3 replies; 11+ messages in thread
From: Christian Schoenebeck @ 2026-01-05 11:35 UTC (permalink / raw)
  To: ericvh, lucho, asmadeus, Pierre Barre, David Howells
  Cc: v9fs, linux-kernel, Pierre Barre

On Saturday, 27 December 2025 19:01:37 CET Pierre Barre wrote:
> With writeback caching (cache=mmap), v9fs_stat2inode() and
> v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
> response, even when dirty pages may exist locally. This causes processes
> using lseek(SEEK_END) to see incorrect file sizes, leading to data
> corruption when extending files while concurrent stat() calls occur.
> 
> Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
> enabled to preserve the client's authoritative i_size.
> 
> Signed-off-by: Pierre Barre <pierre@barre.sh>
> ---

Adding David Howells to this discussion.

David, I read today that you suggested [1] to go for a wait approach instead?
So somewhat similar to what Pierre suggested in v1 [2].

[1] https://lore.kernel.org/all/1696785.1767599663@warthog.procyon.org.uk/
[2] https://lore.kernel.org/all/20251227083751.715152-1-pierre@barre.sh/

I understand the point about cifs and nfs, where you must expect foreign
changes on server side by another client.

For 9pfs though IMO it would make sense to distinguish:

for cache=loose (and cache=fscache?) we are not expecting host side changes,
so we could safely go for the approach suggested by this patch here and not
wait for flushing dirty data and just use the locally cached file size (for
performance reasons).

And for all other cache modes going for the cifs/nfs approach and wait for
flushing? Does this make sense?

I just tested this reported issue with the following reproducer:

--------------------------

#!/bin/bash

rm -f foo.txt

for i in {1..50}
do
  echo $i >> foo.txt &
  ls -lha foo.txt &
done

sync

cat foo.txt

echo

wc -l foo.txt

--------------------------

For cache=loose I couldn't reproduce, but with cache=mmap I could reproduce
data corruption, folio writeback hangup and the following error message:

[   60.847671] [append] R=134d: No submit

...

[  243.153292] INFO: task sync:986 blocked for more than 120 seconds.
[  243.161516]       Tainted: G            E       6.19.0-rc1+ #107
[  243.168847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.178702] task:sync            state:D stack:0     pid:986   tgid:986   ppid:884    task_flags:0x400000 flags:0x00080002
[  243.190379] Call Trace:
[  243.192285]  <TASK>

[  243.193891]  __schedule (kernel/sched/core.c:5256 kernel/sched/core.c:6863)
[  243.195999]  schedule (kernel/sched/core.c:6946 kernel/sched/core.c:6960)
[  243.197730]  io_schedule (kernel/sched/core.c:7764 (discriminator 1) kernel/sched/core.c:7790 (discriminator 1))
[  243.199418]  folio_wait_bit_common (mm/filemap.c:1312)
[  243.201451]  ? __pfx_wake_page_function (mm/filemap.c:1134)
[  243.203399]  folio_wait_writeback (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/page-flags.h:597 mm/page-writeback.c:3087)
[  243.205371]  __filemap_fdatawait_range (mm/filemap.c:529 (discriminator 1))
[  243.207377]  ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3))
[  243.209305]  ? finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1570 kernel/sched/core.c:4995 kernel/sched/core.c:5112)
[  243.212030]  ? __schedule (kernel/sched/core.c:5259)
[  243.214083]  ? wb_wait_for_completion (fs/fs-writeback.c:227)
[  243.215925]  filemap_fdatawait_keep_errors (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 mm/filemap.c:363 mm/filemap.c:627)
[  243.217628]  sync_inodes_sb (./include/linux/sched.h:2062 fs/fs-writeback.c:2777 fs/fs-writeback.c:2897)
[  243.218914]  ? __pfx_sync_inodes_one_sb (fs/sync.c:75)
[  243.220186]  __iterate_supers (fs/super.c:64 fs/super.c:925)
[  243.221601]  ksys_sync (fs/sync.c:103)
[  243.222746]  __do_sys_sync (fs/sync.c:115)
[  243.223950]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[  243.226140]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
[  243.228375] RIP: 0033:0x7f51e4613707
[  243.230260] RSP: 002b:00007ffdb2fccad8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
[  243.234092] RAX: ffffffffffffffda RBX: 00007ffdb2fccc38 RCX: 00007f51e4613707
[  243.238153] RDX: 00007f51e46f3501 RSI: 00007ffdb2fcef3e RDI: 00007f51e46ae557
[  243.241063] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[  243.244028] R10: 00007f51e45be0d0 R11: 0000000000000246 R12: 00005568f2d350fb
[  243.247586] R13: 0000000000000000 R14: 0000000000000000 R15: 00005568f2d37b00
[  243.251622]  </TASK>

BTW running this script with cache=loose made me realize that Linux 9p client
is always requesting server for every "ls foo.txt", which makes it extremely
slow unnecessarily.

>  fs/9p/vfs_inode.c      | 7 ++++++-
>  fs/9p/vfs_inode_dotl.c | 7 ++++++-
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> index 97abe65bf7c1..bfba21f5d8a9 100644
> --- a/fs/9p/vfs_inode.c
> +++ b/fs/9p/vfs_inode.c
> @@ -993,7 +993,12 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct
> path *path, if (IS_ERR(st))
>  		return PTR_ERR(st);
> 
> -	v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
> +	/*
> +	 * With writeback caching, the client is authoritative for i_size.
> +	 * Don't let the server overwrite it with a potentially stale value.
> +	 */
> +	v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb,
> +		(v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
>  	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> 
>  	p9stat_free(st);
> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
> index 643e759eacb2..67a0ded2e223 100644
> --- a/fs/9p/vfs_inode_dotl.c
> +++ b/fs/9p/vfs_inode_dotl.c
> @@ -453,7 +453,12 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
>  	if (IS_ERR(st))
>  		return PTR_ERR(st);
> 
> -	v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
> +	/*
> +	 * With writeback caching, the client is authoritative for i_size.
> +	 * Don't let the server overwrite it with a potentially stale value.
> +	 */
> +	v9fs_stat2inode_dotl(st, d_inode(dentry),
> +		(v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
>  	generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
>  	/* Change block size to what the server returned */
>  	stat->blksize = st->st_blksize;




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2026-01-05 11:35 ` Christian Schoenebeck
@ 2026-01-07  0:27   ` Pierre Barre
  2026-02-04 11:48     ` Christian Schoenebeck
  2026-02-17 14:48   ` David Howells
  2026-02-17 15:05   ` David Howells
  2 siblings, 1 reply; 11+ messages in thread
From: Pierre Barre @ 2026-01-07  0:27 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: ericvh, lucho, asmadeus, David Howells, v9fs, linux-kernel



> On 5 Jan 2026, at 12:35, Christian Schoenebeck <linux_oss@crudebyte.com> wrote:
> 
> On Saturday, 27 December 2025 19:01:37 CET Pierre Barre wrote:
>> With writeback caching (cache=mmap), v9fs_stat2inode() and
>> v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
>> response, even when dirty pages may exist locally. This causes processes
>> using lseek(SEEK_END) to see incorrect file sizes, leading to data
>> corruption when extending files while concurrent stat() calls occur.
>> 
>> Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
>> enabled to preserve the client's authoritative i_size.
>> 
>> Signed-off-by: Pierre Barre <pierre@barre.sh>
>> ---
> 
> Adding David Howells to this discussion.
> 
> David, I read today that you suggested [1] to go for a wait approach instead?
> So somewhat similar to what Pierre suggested in v1 [2].

FWIW, I tested the initial approach I described and this seems to result in no real performance improvements compared to not using any writeback caching at all.

Best,
Pierre

> [1] https://lore.kernel.org/all/1696785.1767599663@warthog.procyon.org.uk/
> [2] https://lore.kernel.org/all/20251227083751.715152-1-pierre@barre.sh/
> 
> I understand the point about cifs and nfs, where you must expect foreign
> changes on server side by another client.
> 
> For 9pfs though IMO it would make sense to distinguish:
> 
> for cache=loose (and cache=fscache?) we are not expecting host side changes,
> so we could safely go for the approach suggested by this patch here and not
> wait for flushing dirty data and just use the locally cached file size (for
> performance reasons).
> 
> And for all other cache modes going for the cifs/nfs approach and wait for
> flushing? Does this make sense?
> 
> I just tested this reported issue with the following reproducer:
> 
> --------------------------
> 
> #!/bin/bash
> 
> rm -f foo.txt
> 
> for i in {1..50}
> do
>  echo $i >> foo.txt &
>  ls -lha foo.txt &
> done
> 
> sync
> 
> cat foo.txt
> 
> echo
> 
> wc -l foo.txt
> 
> --------------------------
> 
> For cache=loose I couldn't reproduce, but with cache=mmap I could reproduce
> data corruption, folio writeback hangup and the following error message:
> 
> [   60.847671] [append] R=134d: No submit
> 
> ...
> 
> [  243.153292] INFO: task sync:986 blocked for more than 120 seconds.
> [  243.161516]       Tainted: G            E       6.19.0-rc1+ #107
> [  243.168847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.178702] task:sync            state:D stack:0     pid:986   tgid:986   ppid:884    task_flags:0x400000 flags:0x00080002
> [  243.190379] Call Trace:
> [  243.192285]  <TASK>
> 
> [  243.193891]  __schedule (kernel/sched/core.c:5256 kernel/sched/core.c:6863)
> [  243.195999]  schedule (kernel/sched/core.c:6946 kernel/sched/core.c:6960)
> [  243.197730]  io_schedule (kernel/sched/core.c:7764 (discriminator 1) kernel/sched/core.c:7790 (discriminator 1))
> [  243.199418]  folio_wait_bit_common (mm/filemap.c:1312)
> [  243.201451]  ? __pfx_wake_page_function (mm/filemap.c:1134)
> [  243.203399]  folio_wait_writeback (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 ./include/linux/page-flags.h:597 mm/page-writeback.c:3087)
> [  243.205371]  __filemap_fdatawait_range (mm/filemap.c:529 (discriminator 1))
> [  243.207377]  ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143 (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3))
> [  243.209305]  ? finish_task_switch.isra.0 (./arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1570 kernel/sched/core.c:4995 kernel/sched/core.c:5112)
> [  243.212030]  ? __schedule (kernel/sched/core.c:5259)
> [  243.214083]  ? wb_wait_for_completion (fs/fs-writeback.c:227)
> [  243.215925]  filemap_fdatawait_keep_errors (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232 ./include/asm-generic/bitops/instrumented-non-atomic.h:142 mm/filemap.c:363 mm/filemap.c:627)
> [  243.217628]  sync_inodes_sb (./include/linux/sched.h:2062 fs/fs-writeback.c:2777 fs/fs-writeback.c:2897)
> [  243.218914]  ? __pfx_sync_inodes_one_sb (fs/sync.c:75)
> [  243.220186]  __iterate_supers (fs/super.c:64 fs/super.c:925)
> [  243.221601]  ksys_sync (fs/sync.c:103)
> [  243.222746]  __do_sys_sync (fs/sync.c:115)
> [  243.223950]  do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
> [  243.226140]  entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:131)
> [  243.228375] RIP: 0033:0x7f51e4613707
> [  243.230260] RSP: 002b:00007ffdb2fccad8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a2
> [  243.234092] RAX: ffffffffffffffda RBX: 00007ffdb2fccc38 RCX: 00007f51e4613707
> [  243.238153] RDX: 00007f51e46f3501 RSI: 00007ffdb2fcef3e RDI: 00007f51e46ae557
> [  243.241063] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> [  243.244028] R10: 00007f51e45be0d0 R11: 0000000000000246 R12: 00005568f2d350fb
> [  243.247586] R13: 0000000000000000 R14: 0000000000000000 R15: 00005568f2d37b00
> [  243.251622]  </TASK>
> 
> BTW running this script with cache=loose made me realize that Linux 9p client
> is always requesting server for every "ls foo.txt", which makes it extremely
> slow unnecessarily.
> 
>> fs/9p/vfs_inode.c      | 7 ++++++-
>> fs/9p/vfs_inode_dotl.c | 7 ++++++-
>> 2 files changed, 12 insertions(+), 2 deletions(-)
>> 
>> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
>> index 97abe65bf7c1..bfba21f5d8a9 100644
>> --- a/fs/9p/vfs_inode.c
>> +++ b/fs/9p/vfs_inode.c
>> @@ -993,7 +993,12 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const struct
>> path *path, if (IS_ERR(st))
>> return PTR_ERR(st);
>> 
>> - v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
>> + /*
>> + * With writeback caching, the client is authoritative for i_size.
>> + * Don't let the server overwrite it with a potentially stale value.
>> + */
>> + v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb,
>> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
>> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
>> 
>> p9stat_free(st);
>> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
>> index 643e759eacb2..67a0ded2e223 100644
>> --- a/fs/9p/vfs_inode_dotl.c
>> +++ b/fs/9p/vfs_inode_dotl.c
>> @@ -453,7 +453,12 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
>> if (IS_ERR(st))
>> return PTR_ERR(st);
>> 
>> - v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
>> + /*
>> + * With writeback caching, the client is authoritative for i_size.
>> + * Don't let the server overwrite it with a potentially stale value.
>> + */
>> + v9fs_stat2inode_dotl(st, d_inode(dentry),
>> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
>> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
>> /* Change block size to what the server returned */
>> stat->blksize = st->st_blksize;
> 
> 
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2026-01-07  0:27   ` Pierre Barre
@ 2026-02-04 11:48     ` Christian Schoenebeck
  2026-02-15  9:21       ` Dominique Martinet
  0 siblings, 1 reply; 11+ messages in thread
From: Christian Schoenebeck @ 2026-02-04 11:48 UTC (permalink / raw)
  To: Pierre Barre, asmadeus; +Cc: ericvh, lucho, David Howells, v9fs, linux-kernel

On Wednesday, 7 January 2026 01:27:50 CET Pierre Barre wrote:
> > On 5 Jan 2026, at 12:35, Christian Schoenebeck <linux_oss@crudebyte.com>
> > wrote:> 
> > On Saturday, 27 December 2025 19:01:37 CET Pierre Barre wrote:
> >> With writeback caching (cache=mmap), v9fs_stat2inode() and
> >> v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
> >> response, even when dirty pages may exist locally. This causes processes
> >> using lseek(SEEK_END) to see incorrect file sizes, leading to data
> >> corruption when extending files while concurrent stat() calls occur.
> >> 
> >> Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
> >> enabled to preserve the client's authoritative i_size.
> >> 
> >> Signed-off-by: Pierre Barre <pierre@barre.sh>
> >> ---
> > 
> > Adding David Howells to this discussion.
> > 
> > David, I read today that you suggested [1] to go for a wait approach
> > instead? So somewhat similar to what Pierre suggested in v1 [2].
> 
> FWIW, I tested the initial approach I described and this seems to result in
> no real performance improvements compared to not using any writeback
> caching at all.

Right, surprisingly I don't measure a performance penalty either.

Dominique, suggestions how to proceed on this issue?

/Christian

> > [1] https://lore.kernel.org/all/1696785.1767599663@warthog.procyon.org.uk/
> > [2] https://lore.kernel.org/all/20251227083751.715152-1-pierre@barre.sh/
> > 
> > I understand the point about cifs and nfs, where you must expect foreign
> > changes on server side by another client.
> > 
> > For 9pfs though IMO it would make sense to distinguish:
> > 
> > for cache=loose (and cache=fscache?) we are not expecting host side
> > changes, so we could safely go for the approach suggested by this patch
> > here and not wait for flushing dirty data and just use the locally cached
> > file size (for performance reasons).
> > 
> > And for all other cache modes going for the cifs/nfs approach and wait for
> > flushing? Does this make sense?
> > 
> > I just tested this reported issue with the following reproducer:
> > 
> > --------------------------
> > 
> > #!/bin/bash
> > 
> > rm -f foo.txt
> > 
> > for i in {1..50}
> > do
> > 
> >  echo $i >> foo.txt &
> >  ls -lha foo.txt &
> > 
> > done
> > 
> > sync
> > 
> > cat foo.txt
> > 
> > echo
> > 
> > wc -l foo.txt
> > 
> > --------------------------
> > 
> > For cache=loose I couldn't reproduce, but with cache=mmap I could
> > reproduce
> > data corruption, folio writeback hangup and the following error message:
> > 
> > [   60.847671] [append] R=134d: No submit
> > 
> > ...
> > 
> > [  243.153292] INFO: task sync:986 blocked for more than 120 seconds.
> > [  243.161516]       Tainted: G            E       6.19.0-rc1+ #107
> > [  243.168847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message. [  243.178702] task:sync            state:D stack:0    
> > pid:986   tgid:986   ppid:884    task_flags:0x400000 flags:0x00080002 [ 
> > 243.190379] Call Trace:
> > [  243.192285]  <TASK>
> > 
> > [  243.193891]  __schedule (kernel/sched/core.c:5256
> > kernel/sched/core.c:6863) [  243.195999]  schedule
> > (kernel/sched/core.c:6946 kernel/sched/core.c:6960) [  243.197730] 
> > io_schedule (kernel/sched/core.c:7764 (discriminator 1)
> > kernel/sched/core.c:7790 (discriminator 1)) [  243.199418] 
> > folio_wait_bit_common (mm/filemap.c:1312)
> > [  243.201451]  ? __pfx_wake_page_function (mm/filemap.c:1134)
> > [  243.203399]  folio_wait_writeback (./arch/x86/include/asm/bitops.h:202
> > ./arch/x86/include/asm/bitops.h:232
> > ./include/asm-generic/bitops/instrumented-non-atomic.h:142
> > ./include/linux/page-flags.h:597 mm/page-writeback.c:3087) [  243.205371]
> >  __filemap_fdatawait_range (mm/filemap.c:529 (discriminator 1)) [ 
> > 243.207377]  ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143
> > (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3)) [ 
> > 243.209305]  ? finish_task_switch.isra.0
> > (./arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1570
> > kernel/sched/core.c:4995 kernel/sched/core.c:5112) [  243.212030]  ?
> > __schedule (kernel/sched/core.c:5259)
> > [  243.214083]  ? wb_wait_for_completion (fs/fs-writeback.c:227)
> > [  243.215925]  filemap_fdatawait_keep_errors
> > (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232
> > ./include/asm-generic/bitops/instrumented-non-atomic.h:142
> > mm/filemap.c:363 mm/filemap.c:627) [  243.217628]  sync_inodes_sb
> > (./include/linux/sched.h:2062 fs/fs-writeback.c:2777
> > fs/fs-writeback.c:2897) [  243.218914]  ? __pfx_sync_inodes_one_sb
> > (fs/sync.c:75)
> > [  243.220186]  __iterate_supers (fs/super.c:64 fs/super.c:925)
> > [  243.221601]  ksys_sync (fs/sync.c:103)
> > [  243.222746]  __do_sys_sync (fs/sync.c:115)
> > [  243.223950]  do_syscall_64 (arch/x86/entry/syscall_64.c:63
> > (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 
> > 243.226140]  entry_SYSCALL_64_after_hwframe
> > (arch/x86/entry/entry_64.S:131) [  243.228375] RIP: 0033:0x7f51e4613707
> > [  243.230260] RSP: 002b:00007ffdb2fccad8 EFLAGS: 00000246 ORIG_RAX:
> > 00000000000000a2 [  243.234092] RAX: ffffffffffffffda RBX:
> > 00007ffdb2fccc38 RCX: 00007f51e4613707 [  243.238153] RDX:
> > 00007f51e46f3501 RSI: 00007ffdb2fcef3e RDI: 00007f51e46ae557 [ 
> > 243.241063] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000000000000 [  243.244028] R10: 00007f51e45be0d0 R11:
> > 0000000000000246 R12: 00005568f2d350fb [  243.247586] R13:
> > 0000000000000000 R14: 0000000000000000 R15: 00005568f2d37b00 [ 
> > 243.251622]  </TASK>
> > 
> > BTW running this script with cache=loose made me realize that Linux 9p
> > client is always requesting server for every "ls foo.txt", which makes it
> > extremely slow unnecessarily.
> > 
> >> fs/9p/vfs_inode.c      | 7 ++++++-
> >> fs/9p/vfs_inode_dotl.c | 7 ++++++-
> >> 2 files changed, 12 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> >> index 97abe65bf7c1..bfba21f5d8a9 100644
> >> --- a/fs/9p/vfs_inode.c
> >> +++ b/fs/9p/vfs_inode.c
> >> @@ -993,7 +993,12 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const
> >> struct
> >> path *path, if (IS_ERR(st))
> >> return PTR_ERR(st);
> >> 
> >> - v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
> >> + /*
> >> + * With writeback caching, the client is authoritative for i_size.
> >> + * Don't let the server overwrite it with a potentially stale value.
> >> + */
> >> + v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb,
> >> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
> >> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> >> 
> >> p9stat_free(st);
> >> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
> >> index 643e759eacb2..67a0ded2e223 100644
> >> --- a/fs/9p/vfs_inode_dotl.c
> >> +++ b/fs/9p/vfs_inode_dotl.c
> >> @@ -453,7 +453,12 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
> >> if (IS_ERR(st))
> >> return PTR_ERR(st);
> >> 
> >> - v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
> >> + /*
> >> + * With writeback caching, the client is authoritative for i_size.
> >> + * Don't let the server overwrite it with a potentially stale value.
> >> + */
> >> + v9fs_stat2inode_dotl(st, d_inode(dentry),
> >> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
> >> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> >> /* Change block size to what the server returned */
> >> stat->blksize = st->st_blksize;





^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2026-02-04 11:48     ` Christian Schoenebeck
@ 2026-02-15  9:21       ` Dominique Martinet
  0 siblings, 0 replies; 11+ messages in thread
From: Dominique Martinet @ 2026-02-15  9:21 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: Pierre Barre, ericvh, lucho, David Howells, v9fs, linux-kernel

Christian Schoenebeck wrote on Wed, Feb 04, 2026 at 12:48:55PM +0100:
> > FWIW, I tested the initial approach I described and this seems to result in
> > no real performance improvements compared to not using any writeback
> > caching at all.
> 
> Right, surprisingly I don't measure a performance penalty either.
> 
> Dominique, suggestions how to proceed on this issue?

Thanks for the ping and sorry for lack of activity lately.

I've just pinged David on IRC (and will remind him monday if no news) as
I don't have enough opinion here..

I'd be tempted to agree with your initial review that this v2 should be
enough though, so unless he has any reason to disagree I'll pick it up
and send to Linus after rc1

-- 
Dominique Martinet | Asmadeus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2026-01-05 11:35 ` Christian Schoenebeck
  2026-01-07  0:27   ` Pierre Barre
@ 2026-02-17 14:48   ` David Howells
  2026-02-17 15:05   ` David Howells
  2 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2026-02-17 14:48 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: dhowells, ericvh, lucho, asmadeus, Pierre Barre, v9fs,
	linux-kernel

Christian Schoenebeck <linux_oss@crudebyte.com> wrote:

> do
>   echo $i >> foo.txt &
>   ls -lha foo.txt &
> done
> 
> sync

Did you want a wait somewhere?  There's no guarantee that the asynchronous
processes thereby launched have done anything at the point sync happens.

David


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2026-01-05 11:35 ` Christian Schoenebeck
  2026-01-07  0:27   ` Pierre Barre
  2026-02-17 14:48   ` David Howells
@ 2026-02-17 15:05   ` David Howells
  2026-02-17 16:54     ` David Howells
  2 siblings, 1 reply; 11+ messages in thread
From: David Howells @ 2026-02-17 15:05 UTC (permalink / raw)
  To: Christian Schoenebeck
  Cc: dhowells, ericvh, lucho, asmadeus, Pierre Barre, v9fs,
	linux-kernel

Christian Schoenebeck <linux_oss@crudebyte.com> wrote:

> For cache=loose I couldn't reproduce, but with cache=mmap I could reproduce
> data corruption, folio writeback hangup and the following error message:
> 
> [   60.847671] [append] R=134d: No submit

Can you get me some tracing?  If you can turn on:

echo 65536 >/sys/kernel/tracing/buffer_size_kb
echo 1 > /sys/kernel/tracing/events/netfs/netfs_read/enable
echo 1 > /sys/kernel/tracing/events/netfs/netfs_write/enable
echo 1 > /sys/kernel/tracing/events/netfs/netfs_rreq/enable
echo 1 > /sys/kernel/tracing/events/netfs/netfs_sreq/enable
echo 1 > /sys/kernel/tracing/events/netfs/netfs_failure/enable

run the test to generate the issue, then get the trace:

cat /sys/kernel/tracing/trace | bzip2 -9 >trace.txt

and grab whatever it said in dmesg, particularly if it mentions R=<hex-number>.

Thanks,
David


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2026-02-17 15:05   ` David Howells
@ 2026-02-17 16:54     ` David Howells
  0 siblings, 0 replies; 11+ messages in thread
From: David Howells @ 2026-02-17 16:54 UTC (permalink / raw)
  Cc: dhowells, Christian Schoenebeck, ericvh, lucho, asmadeus,
	Pierre Barre, v9fs, linux-kernel

David Howells <dhowells@redhat.com> wrote:

> > [   60.847671] [append] R=134d: No submit
> 
> Can you get me some tracing?  If you can turn on:

Nevermind, I've managed to reproduce it - but only with the virtio transport
and not the tcp one.  It also doesn't seem to show anything with afs.

     9p_repro.sh-1399 : netfs_folio: i=d306b44 ix=00000-00000 mod-streamw
     9p_repro.sh-1401 : netfs_folio: i=d306b44 ix=00000-00000 flush
     9p_repro.sh-1401 : netfs_write: R=000001be WRITEBACK c=00000000 i=d306b44 by=0-ffffffffffffffff
     9p_repro.sh-1401 : netfs_folio: i=d306b44 ix=00000-00000 store
   kworker/u32:2-63   : netfs_rreq: R=000001be WB COLLECT f=903
   kworker/u32:2-63   : netfs_rreq: R=000001be WB WR-DONE f=903
   kworker/u32:2-63   : netfs_rreq: R=000001be WB WAKE-IP f=902
   kworker/u32:2-63   : netfs_rreq: R=000001be WB FREE    f=902

Looks like netfs failed to flush an incompatible streaming write.  The folio
was moved to the 'store' state, but no subrequests were generated for some
reason.

*Why* it produced an incompatible streaming write is an interesting question.
The writes are all done with O_APPEND, so even if the processes write out of
order, there shouldn't be a problem...

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2025-12-27 18:01 [PATCH v2] 9p: fix i_size update race in getattr with writeback caching Pierre Barre
  2026-01-05 11:35 ` Christian Schoenebeck
@ 2026-02-18 13:38 ` David Howells
  2026-02-18 13:41   ` David Howells
  1 sibling, 1 reply; 11+ messages in thread
From: David Howells @ 2026-02-18 13:38 UTC (permalink / raw)
  To: Pierre Barre
  Cc: dhowells, ericvh, lucho, asmadeus, linux_oss, v9fs, linux-kernel

Pierre Barre <pierre@barre.sh> wrote:

> With writeback caching (cache=mmap), v9fs_stat2inode() and
> v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
> response, even when dirty pages may exist locally. This causes processes
> using lseek(SEEK_END) to see incorrect file sizes, leading to data
> corruption when extending files while concurrent stat() calls occur.
> 
> Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
> enabled to preserve the client's authoritative i_size.

I think this might be the wrong approach.

A better way is probably, in v9fs_stat2inode() and v9fs_stat2inode_dotl(), if
the inode isn't new, compare the values for stat->mtime to inode->i_mtime and
stat->length to v9inode->netfs.remote_i_size and if they differ mark the inode
as being remotely modified, invalidate the pagecache and reset inode->i_size.

If stat->mtime == inode->i_mtime and stat->length ==
v9inode->netfs.remote_i_size, then don't alter inode->i_size.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2026-02-18 13:38 ` David Howells
@ 2026-02-18 13:41   ` David Howells
  2026-02-18 14:11     ` Dominique Martinet
  0 siblings, 1 reply; 11+ messages in thread
From: David Howells @ 2026-02-18 13:41 UTC (permalink / raw)
  To: Pierre Barre
  Cc: dhowells, ericvh, lucho, asmadeus, linux_oss, v9fs, linux-kernel

David Howells <dhowells@redhat.com> wrote:

> A better way is probably, in v9fs_stat2inode() and v9fs_stat2inode_dotl(), if
> the inode isn't new, compare the values for stat->mtime to inode->i_mtime and
> stat->length to v9inode->netfs.remote_i_size and if they differ mark the inode
> as being remotely modified, invalidate the pagecache and reset inode->i_size.
> 
> If stat->mtime == inode->i_mtime and stat->length ==
> v9inode->netfs.remote_i_size, then don't alter inode->i_size.

Also, you'd need to update remote_i_size upon successful completion of a write
RPC call to increase it up to the end of the write.

David


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
  2026-02-18 13:41   ` David Howells
@ 2026-02-18 14:11     ` Dominique Martinet
  0 siblings, 0 replies; 11+ messages in thread
From: Dominique Martinet @ 2026-02-18 14:11 UTC (permalink / raw)
  To: David Howells; +Cc: Pierre Barre, ericvh, lucho, linux_oss, v9fs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2501 bytes --]

David Howells wrote on Wed, Feb 18, 2026 at 01:41:25PM +0000:
> > A better way is probably, in v9fs_stat2inode() and v9fs_stat2inode_dotl(), if
> > the inode isn't new, compare the values for stat->mtime to inode->i_mtime and
> > stat->length to v9inode->netfs.remote_i_size and if they differ mark the inode
> > as being remotely modified, invalidate the pagecache and reset inode->i_size.
> > 
> > If stat->mtime == inode->i_mtime and stat->length ==
> > v9inode->netfs.remote_i_size, then don't alter inode->i_size.
> 
> Also, you'd need to update remote_i_size upon successful completion of a write
> RPC call to increase it up to the end of the write.

I pointed out a couple of problems with that on IRC, so replying here
for everyone:
- for msize, it's set by the server (using server clock) anywhere
between our write request and the reply, so we can't predict it (we
could, however, remember that there's been an io or not, and if mtime
changes when there was no io we could use it to invalidate our cache or
something)
- for size, we'd need to flush (like the v1 of this patch) -- apparently
this is fs specific but it might make sense to flush unless
AT_STATX_DONT_SYNC is specified similarly to what nfs_getattr() does


This part is just me thinking out loud (not discussed properly), but I
kind of like what nfs does in fs/nfs/inode.c nfs_wcc_update_inode() and
nfs_check_inode_attributes(): if there are IOs in progress, they ignore
the server i_size:
                if (fattr->valid & NFS_ATTR_FATTR_SIZE) {
                        cur_size = i_size_read(inode);
                        new_isize = nfs_size_to_loff_t(fattr->size);
                        if (cur_size != new_isize)
                                invalid |= NFS_INO_INVALID_SIZE;
                }

If cache is used I agree with Christian that the client can be
considered correct, but I don't like the idea that we'd never catch a
server update that happens much later, but that might need work...


Right now I'm a bit torn between "it'd be good to fix this sooner than
later" (and want to take this as a stop-gap) and "it's scary to never
notice size changes I'm sure it'll break some CI again", so since David
said he'd look some more I'm thinking of giving him a bit more time
first but if there's no clear solution emerging in a week or two let's
first get this v2 in as a first step, then we can do better as a second
iteration...


I've attached irc logs for today, if anyone wants to read up

-- 
Dominique

[-- Attachment #2: #linuxfs.26-02-18.log --]
[-- Type: text/plain, Size: 8065 bytes --]

--- Log opened Wed Feb 18 00:00:52 2026
15:09 -!- jelly is now known as Guest3038
17:48 < dhowells> Asmadeus: okay, the problem is that netfs_perform_write() moves i_size after netfs_write_folio() has sampled it
17:49 < dhowells> but I'm not sure how because both functions access i_size with the folio locked
18:04 < Asmadeus> Are you sure it's not as he said in the commit? e.g. ls triggering a stat fetch from server, and outdated server value overriding the local one
18:05 < dhowells> Asmadeus: weird - I'm seeing i_size revert somehow
18:06 < Asmadeus> v9fs_vfs_getattr() calls v9fs_stat2inode() which in turn sets i_size (v9fs_i_size_write) if not told not to
18:07 < dhowells> 9p_repro.sh-1580 : netfs_write_iter: WRITE-ITER i=d306b44 s=1e l=3 f=10
18:07 < dhowells> 9p_repro.sh-1580 : netfs_modify: i=d306b44 ix=00000 o=1e l=3
18:07 < dhowells> 9p_repro.sh-1580 : netfs_folio: i=d306b44 ix=00000-00000 mod-streamw
18:07 < dhowells> 9p_repro.sh-1580 : netfs_i_size: i=d306b44 iz=21
18:07 < dhowells> 9p_repro.sh-1582 : netfs_write_iter: WRITE-ITER i=d306b44 s=1e l=3 f=10
18:07 < dhowells> 9p_repro.sh-1582 : netfs_finfo: i=d306b44 ix=00000 dirty=1e-21
18:07 < dhowells> 9p_repro.sh-1582 : netfs_flush: i=d306b44 ix=00000 o=1e l=3
18:07 < dhowells> 9p_repro.sh-1582 : netfs_folio: i=d306b44 ix=00000-00000 flush
18:07 < dhowells> 9p_repro.sh-1582 : netfs_write: R=000001fd WRITEBACK c=00000000 i=d306b44 by=0-ffffffffffffffff
18:07 < dhowells> 9p_repro.sh-1582 : netfs_i_size: i=d306b44 iz=1e
18:07 < dhowells> 9p_repro.sh-1582 : netfs_wback: i=d306b44 ix=00000 o=1e l=3 iz=1e
18:07 < dhowells> 9p_repro.sh-1582 : netfs_folio: i=d306b44 ix=00000-00000 store
18:07 < dhowells> hmmm
18:08 < dhowells> maybe 9p is reverting the inode size?
18:08 < Asmadeus> Yes, that's what his patch prevents (if cache is enabled)
18:08 < Asmadeus> Well, v2
18:09 < dhowells> ah - there's a patch at the beginning of the message sequence - I started in the middle of it
18:10 < dhowells> I have to go out for a bit, will try that patch when I get back
18:10 < Asmadeus> I'm not sure why v1 wanted to wait (and worked), so I'm tempted to just go with v2: if cache is enabled then the client view is authoritative... But 9p doesn't have any mechanism like nfs to lease or just re-check a file after 60s, so it's pretty permanent
18:11 < Asmadeus> Ah, sorry if you hadn't seen the patch. Christian says it works so I trust his test here, mostly wanted your opinion on this -- how is netfs supposed to handle remote file changes
20:56 -!- jelly is now known as Guest3067
21:30 < dhowells> Asmadeus: the 9P write RPC call doesn't return the new file attributes, does it?
21:41 -!- Guest3067 is now known as jelly
21:51 < dhowells> Asmadeus: I think your options are a bit limited with 9P as it doesn't have any equivalent of AFS's data version or NFS's change attribute
21:51 < dhowells> if a third party change happens, you're kind of stuffed
21:53 < dhowells> I think what we need to do is, check to see whether the size differs from that which we expect (ie. ->remote_i_size)
22:03 < Asmadeus> right, write returns the amount of data written, so we can keep track of what we wrote but it doesn't tell more
22:06 < Asmadeus> I didn't remember remote_i_size, we actually set it in a couple of places... but I don't see where it's actually used?
22:10 < Asmadeus> oh, in fs/nfs/inode.c they have an interesting check: if the new_isize obtained from server is different from the cur_isize, they only update the size if !nfs_have_writebacks(inode)
22:11 < Asmadeus> Is there a way to tell if there are writebacks in progress with netfs?
22:12 < Asmadeus> It apparently just checks if there are outstanding requests (atomic_long_read(&NFS_I(inode)->nrequests) != 0)
22:13 < Asmadeus> It seems to me that if there's no outstanding request/unflushed data and size from the server doesn't match we could invalidate the cache and set the new size -- it's not perfect (won't catch in-place writes) but better than just ignoring the size
22:14 < Asmadeus> If there's unflushed data or pending writes then keeping the local size is the sane thing to do
22:38 < dhowells> Asmadeus: see the reply I just sent
22:41 < Asmadeus> We can't check mtime, it's set by the server when it processes the IO but we don't know exactly when (won't exactly match whenever we update our's)
22:42 < Asmadeus> likewise remote_i_size is the last size we got, so if we just wrote to the file it's expected to have changed?
22:43 < dhowells> by "we don't know exactly when" - do you mean you don't know at what point the server will update it or do you mean that you don't know to what the server will update it?
22:43 < dhowells> "... to what the server will set it?" I should say
22:43 < Asmadeus> we don't know to what the server will set it, right
22:44 < Asmadeus> It should set it somewhere between a write call and it's reply, but the value will be any timestamp in between
22:44 < dhowells> wrt to i_size, I sent a follow up to my reply - we'd have to update remote_i_size on completion of the write
22:44 < dhowells> of the write RPC
22:45 < dhowells> hmmm
22:45 < dhowells> so we can predict the new i_size, but not the new mtime
22:45 < Asmadeus> Well I guess we could bound it, but then there's also the can of worms of the server and client not being synchronized properly.. it'll be set at the *server time* between these two, not our's
22:45 < dhowells> however, we can note that we expect the mtime to have changed
22:46 < dhowells> and forego the check
22:46 < Asmadeus> Right we could note that we expect it to not have changed if we didn't do any IO
22:46 < Asmadeus> (and thus any change means remote got updated)
22:47 < dhowells> I presume there's no way to "compound" a status fetch with a write?
22:47 < dhowells> (like you can do in cifs and nfs3+)
22:47 < Asmadeus> For size, just updating the size when write is done might not be enough for the same reason... Picture this: write starts -> sent to server, server updates size, meanwhile we send a stat() in parallel and get updated size.. but write is not complete so remote i_size isn't updated yet
22:48 < Asmadeus> Correct, we have no compound/chained IO either
22:48 < dhowells> actually, stat() is less of a problem
22:49 < Asmadeus> I think there's no way around either serializing stat() with other IOs, or ignoring the size if there are IOs in flight like nfs does
22:49 < dhowells> unless userspace states AT_STATX_DONT_SYNC, you're going to have to flush
22:49 < iokill> 
22:51 < Asmadeus> Would a flush make any further IO wait for the stat to complete as well?
22:51 < dhowells> I suppose not
22:51 < Asmadeus> (I don't think we're flushing at all right now.. that might look like the v1)
22:52 < dhowells> no, you call filemap_fdatawrite(), but that doesn't wait
22:52 < Asmadeus>        AT_STATX_SYNC_AS_STAT
22:52 < Asmadeus>               Do whatever stat(2) does.  This is the default and is very much filesystem-specific.
22:53 < dhowells> yes - because this is a flag for statx(), not stat()
22:53 < Asmadeus> Yes, but I meant according to this there is not standard about whether we need to sync or not?
22:53 < dhowells> for stat(), correct
22:53 < Asmadeus> (although we shouldn't ignore AT_STATX_FORCE_SYNC..)
22:54 < dhowells> I think you need to call filemap_write_and_wait()
22:54 < dhowells> unless AT_STATX_DONT_SYNC is specified
22:54 < dhowells> see nfs_getattr()
22:55 < Asmadeus> So that'd be his v1 - https://lore.kernel.org/all/20251227083751.715152-1-pierre@barre.sh/T/#u
22:55 < Asmadeus> (well, it actually does both waiting and ignores the value in cached mode)
22:56 < dhowells> yeah - it needs some work, I think
22:56 < dhowells> I can have a play with it
22:57 < dhowells> give me a little while to poke at it
22:59 < Asmadeus> Sure, thanks -- I'll reply to the mail anyway with what we discussed (perhaps just quote it all as well...) for others
23:00 < Asmadeus> And I still think something like nfs update-i_size-unless-nfs_have_writebacks() might be interesting if we can do it reasonably

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2026-02-18 14:11 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-12-27 18:01 [PATCH v2] 9p: fix i_size update race in getattr with writeback caching Pierre Barre
2026-01-05 11:35 ` Christian Schoenebeck
2026-01-07  0:27   ` Pierre Barre
2026-02-04 11:48     ` Christian Schoenebeck
2026-02-15  9:21       ` Dominique Martinet
2026-02-17 14:48   ` David Howells
2026-02-17 15:05   ` David Howells
2026-02-17 16:54     ` David Howells
2026-02-18 13:38 ` David Howells
2026-02-18 13:41   ` David Howells
2026-02-18 14:11     ` Dominique Martinet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox