Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Christian Schoenebeck <linux_oss@crudebyte.com>
To: Pierre Barre <pierre@barre.sh>, asmadeus <asmadeus@codewreck.org>
Cc: ericvh@kernel.org, lucho@ionkov.net,
	David Howells <dhowells@redhat.com>,
	v9fs@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
Date: Wed, 04 Feb 2026 12:48:55 +0100	[thread overview]
Message-ID: <1946294.tdWV9SEqCh@weasel> (raw)
In-Reply-To: <64008E3D-7EEE-4E6C-9392-986C1D8E8858@barre.sh>

On Wednesday, 7 January 2026 01:27:50 CET Pierre Barre wrote:
> > On 5 Jan 2026, at 12:35, Christian Schoenebeck <linux_oss@crudebyte.com>
> > wrote:> 
> > On Saturday, 27 December 2025 19:01:37 CET Pierre Barre wrote:
> >> With writeback caching (cache=mmap), v9fs_stat2inode() and
> >> v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
> >> response, even when dirty pages may exist locally. This causes processes
> >> using lseek(SEEK_END) to see incorrect file sizes, leading to data
> >> corruption when extending files while concurrent stat() calls occur.
> >> 
> >> Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
> >> enabled to preserve the client's authoritative i_size.
> >> 
> >> Signed-off-by: Pierre Barre <pierre@barre.sh>
> >> ---
> > 
> > Adding David Howells to this discussion.
> > 
> > David, I read today that you suggested [1] to go for a wait approach
> > instead? So somewhat similar to what Pierre suggested in v1 [2].
> 
> FWIW, I tested the initial approach I described and this seems to result in
> no real performance improvements compared to not using any writeback
> caching at all.

Right, surprisingly I don't measure a performance penalty either.

Dominique, suggestions how to proceed on this issue?

/Christian

> > [1] https://lore.kernel.org/all/1696785.1767599663@warthog.procyon.org.uk/
> > [2] https://lore.kernel.org/all/20251227083751.715152-1-pierre@barre.sh/
> > 
> > I understand the point about cifs and nfs, where you must expect foreign
> > changes on server side by another client.
> > 
> > For 9pfs though IMO it would make sense to distinguish:
> > 
> > for cache=loose (and cache=fscache?) we are not expecting host side
> > changes, so we could safely go for the approach suggested by this patch
> > here and not wait for flushing dirty data and just use the locally cached
> > file size (for performance reasons).
> > 
> > And for all other cache modes going for the cifs/nfs approach and wait for
> > flushing? Does this make sense?
> > 
> > I just tested this reported issue with the following reproducer:
> > 
> > --------------------------
> > 
> > #!/bin/bash
> > 
> > rm -f foo.txt
> > 
> > for i in {1..50}
> > do
> > 
> >  echo $i >> foo.txt &
> >  ls -lha foo.txt &
> > 
> > done
> > 
> > sync
> > 
> > cat foo.txt
> > 
> > echo
> > 
> > wc -l foo.txt
> > 
> > --------------------------
> > 
> > For cache=loose I couldn't reproduce, but with cache=mmap I could
> > reproduce
> > data corruption, folio writeback hangup and the following error message:
> > 
> > [   60.847671] [append] R=134d: No submit
> > 
> > ...
> > 
> > [  243.153292] INFO: task sync:986 blocked for more than 120 seconds.
> > [  243.161516]       Tainted: G            E       6.19.0-rc1+ #107
> > [  243.168847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message. [  243.178702] task:sync            state:D stack:0    
> > pid:986   tgid:986   ppid:884    task_flags:0x400000 flags:0x00080002 [ 
> > 243.190379] Call Trace:
> > [  243.192285]  <TASK>
> > 
> > [  243.193891]  __schedule (kernel/sched/core.c:5256
> > kernel/sched/core.c:6863) [  243.195999]  schedule
> > (kernel/sched/core.c:6946 kernel/sched/core.c:6960) [  243.197730] 
> > io_schedule (kernel/sched/core.c:7764 (discriminator 1)
> > kernel/sched/core.c:7790 (discriminator 1)) [  243.199418] 
> > folio_wait_bit_common (mm/filemap.c:1312)
> > [  243.201451]  ? __pfx_wake_page_function (mm/filemap.c:1134)
> > [  243.203399]  folio_wait_writeback (./arch/x86/include/asm/bitops.h:202
> > ./arch/x86/include/asm/bitops.h:232
> > ./include/asm-generic/bitops/instrumented-non-atomic.h:142
> > ./include/linux/page-flags.h:597 mm/page-writeback.c:3087) [  243.205371]
> >  __filemap_fdatawait_range (mm/filemap.c:529 (discriminator 1)) [ 
> > 243.207377]  ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143
> > (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3)) [ 
> > 243.209305]  ? finish_task_switch.isra.0
> > (./arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1570
> > kernel/sched/core.c:4995 kernel/sched/core.c:5112) [  243.212030]  ?
> > __schedule (kernel/sched/core.c:5259)
> > [  243.214083]  ? wb_wait_for_completion (fs/fs-writeback.c:227)
> > [  243.215925]  filemap_fdatawait_keep_errors
> > (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232
> > ./include/asm-generic/bitops/instrumented-non-atomic.h:142
> > mm/filemap.c:363 mm/filemap.c:627) [  243.217628]  sync_inodes_sb
> > (./include/linux/sched.h:2062 fs/fs-writeback.c:2777
> > fs/fs-writeback.c:2897) [  243.218914]  ? __pfx_sync_inodes_one_sb
> > (fs/sync.c:75)
> > [  243.220186]  __iterate_supers (fs/super.c:64 fs/super.c:925)
> > [  243.221601]  ksys_sync (fs/sync.c:103)
> > [  243.222746]  __do_sys_sync (fs/sync.c:115)
> > [  243.223950]  do_syscall_64 (arch/x86/entry/syscall_64.c:63
> > (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 
> > 243.226140]  entry_SYSCALL_64_after_hwframe
> > (arch/x86/entry/entry_64.S:131) [  243.228375] RIP: 0033:0x7f51e4613707
> > [  243.230260] RSP: 002b:00007ffdb2fccad8 EFLAGS: 00000246 ORIG_RAX:
> > 00000000000000a2 [  243.234092] RAX: ffffffffffffffda RBX:
> > 00007ffdb2fccc38 RCX: 00007f51e4613707 [  243.238153] RDX:
> > 00007f51e46f3501 RSI: 00007ffdb2fcef3e RDI: 00007f51e46ae557 [ 
> > 243.241063] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000000000000 [  243.244028] R10: 00007f51e45be0d0 R11:
> > 0000000000000246 R12: 00005568f2d350fb [  243.247586] R13:
> > 0000000000000000 R14: 0000000000000000 R15: 00005568f2d37b00 [ 
> > 243.251622]  </TASK>
> > 
> > BTW running this script with cache=loose made me realize that Linux 9p
> > client is always requesting server for every "ls foo.txt", which makes it
> > extremely slow unnecessarily.
> > 
> >> fs/9p/vfs_inode.c      | 7 ++++++-
> >> fs/9p/vfs_inode_dotl.c | 7 ++++++-
> >> 2 files changed, 12 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> >> index 97abe65bf7c1..bfba21f5d8a9 100644
> >> --- a/fs/9p/vfs_inode.c
> >> +++ b/fs/9p/vfs_inode.c
> >> @@ -993,7 +993,12 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const
> >> struct
> >> path *path, if (IS_ERR(st))
> >> return PTR_ERR(st);
> >> 
> >> - v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
> >> + /*
> >> + * With writeback caching, the client is authoritative for i_size.
> >> + * Don't let the server overwrite it with a potentially stale value.
> >> + */
> >> + v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb,
> >> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
> >> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> >> 
> >> p9stat_free(st);
> >> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
> >> index 643e759eacb2..67a0ded2e223 100644
> >> --- a/fs/9p/vfs_inode_dotl.c
> >> +++ b/fs/9p/vfs_inode_dotl.c
> >> @@ -453,7 +453,12 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
> >> if (IS_ERR(st))
> >> return PTR_ERR(st);
> >> 
> >> - v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
> >> + /*
> >> + * With writeback caching, the client is authoritative for i_size.
> >> + * Don't let the server overwrite it with a potentially stale value.
> >> + */
> >> + v9fs_stat2inode_dotl(st, d_inode(dentry),
> >> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
> >> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> >> /* Change block size to what the server returned */
> >> stat->blksize = st->st_blksize;

next prev parent reply	other threads:[~2026-02-04 11:49 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-27 18:01 [PATCH v2] 9p: fix i_size update race in getattr with writeback caching Pierre Barre
2026-01-05 11:35 ` Christian Schoenebeck
2026-01-07  0:27   ` Pierre Barre
2026-02-04 11:48     ` Christian Schoenebeck [this message]
2026-02-15  9:21       ` Dominique Martinet
2026-02-17 14:48   ` David Howells
2026-02-17 15:05   ` David Howells
2026-02-17 16:54     ` David Howells
2026-02-18 13:38 ` David Howells
2026-02-18 13:41   ` David Howells
2026-02-18 14:11     ` Dominique Martinet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1946294.tdWV9SEqCh@weasel \
    --to=linux_oss@crudebyte.com \
    --cc=asmadeus@codewreck.org \
    --cc=dhowells@redhat.com \
    --cc=ericvh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucho@ionkov.net \
    --cc=pierre@barre.sh \
    --cc=v9fs@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox