All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Schoenebeck <linux_oss@crudebyte.com>
To: Pierre Barre <pierre@barre.sh>, asmadeus <asmadeus@codewreck.org>
Cc: ericvh@kernel.org, lucho@ionkov.net,
	David Howells <dhowells@redhat.com>,
	v9fs@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] 9p: fix i_size update race in getattr with writeback caching
Date: Wed, 04 Feb 2026 12:48:55 +0100	[thread overview]
Message-ID: <1946294.tdWV9SEqCh@weasel> (raw)
In-Reply-To: <64008E3D-7EEE-4E6C-9392-986C1D8E8858@barre.sh>

On Wednesday, 7 January 2026 01:27:50 CET Pierre Barre wrote:
> > On 5 Jan 2026, at 12:35, Christian Schoenebeck <linux_oss@crudebyte.com>
> > wrote:> 
> > On Saturday, 27 December 2025 19:01:37 CET Pierre Barre wrote:
> >> With writeback caching (cache=mmap), v9fs_stat2inode() and
> >> v9fs_stat2inode_dotl() unconditionally overwrite i_size from the server
> >> response, even when dirty pages may exist locally. This causes processes
> >> using lseek(SEEK_END) to see incorrect file sizes, leading to data
> >> corruption when extending files while concurrent stat() calls occur.
> >> 
> >> Fix by passing V9FS_STAT2INODE_KEEP_ISIZE when CACHE_WRITEBACK is
> >> enabled to preserve the client's authoritative i_size.
> >> 
> >> Signed-off-by: Pierre Barre <pierre@barre.sh>
> >> ---
> > 
> > Adding David Howells to this discussion.
> > 
> > David, I read today that you suggested [1] to go for a wait approach
> > instead? So somewhat similar to what Pierre suggested in v1 [2].
> 
> FWIW, I tested the initial approach I described and this seems to result in
> no real performance improvements compared to not using any writeback
> caching at all.

Right, surprisingly I don't measure a performance penalty either.

Dominique, suggestions how to proceed on this issue?

/Christian

> > [1] https://lore.kernel.org/all/1696785.1767599663@warthog.procyon.org.uk/
> > [2] https://lore.kernel.org/all/20251227083751.715152-1-pierre@barre.sh/
> > 
> > I understand the point about cifs and nfs, where you must expect foreign
> > changes on server side by another client.
> > 
> > For 9pfs though IMO it would make sense to distinguish:
> > 
> > for cache=loose (and cache=fscache?) we are not expecting host side
> > changes, so we could safely go for the approach suggested by this patch
> > here and not wait for flushing dirty data and just use the locally cached
> > file size (for performance reasons).
> > 
> > And for all other cache modes going for the cifs/nfs approach and wait for
> > flushing? Does this make sense?
> > 
> > I just tested this reported issue with the following reproducer:
> > 
> > --------------------------
> > 
> > #!/bin/bash
> > 
> > rm -f foo.txt
> > 
> > for i in {1..50}
> > do
> > 
> >  echo $i >> foo.txt &
> >  ls -lha foo.txt &
> > 
> > done
> > 
> > sync
> > 
> > cat foo.txt
> > 
> > echo
> > 
> > wc -l foo.txt
> > 
> > --------------------------
> > 
> > For cache=loose I couldn't reproduce, but with cache=mmap I could
> > reproduce
> > data corruption, folio writeback hangup and the following error message:
> > 
> > [   60.847671] [append] R=134d: No submit
> > 
> > ...
> > 
> > [  243.153292] INFO: task sync:986 blocked for more than 120 seconds.
> > [  243.161516]       Tainted: G            E       6.19.0-rc1+ #107
> > [  243.168847] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> > this message. [  243.178702] task:sync            state:D stack:0    
> > pid:986   tgid:986   ppid:884    task_flags:0x400000 flags:0x00080002 [ 
> > 243.190379] Call Trace:
> > [  243.192285]  <TASK>
> > 
> > [  243.193891]  __schedule (kernel/sched/core.c:5256
> > kernel/sched/core.c:6863) [  243.195999]  schedule
> > (kernel/sched/core.c:6946 kernel/sched/core.c:6960) [  243.197730] 
> > io_schedule (kernel/sched/core.c:7764 (discriminator 1)
> > kernel/sched/core.c:7790 (discriminator 1)) [  243.199418] 
> > folio_wait_bit_common (mm/filemap.c:1312)
> > [  243.201451]  ? __pfx_wake_page_function (mm/filemap.c:1134)
> > [  243.203399]  folio_wait_writeback (./arch/x86/include/asm/bitops.h:202
> > ./arch/x86/include/asm/bitops.h:232
> > ./include/asm-generic/bitops/instrumented-non-atomic.h:142
> > ./include/linux/page-flags.h:597 mm/page-writeback.c:3087) [  243.205371]
> >  __filemap_fdatawait_range (mm/filemap.c:529 (discriminator 1)) [ 
> > 243.207377]  ? _raw_spin_unlock (./include/linux/spinlock_api_smp.h:143
> > (discriminator 3) kernel/locking/spinlock.c:186 (discriminator 3)) [ 
> > 243.209305]  ? finish_task_switch.isra.0
> > (./arch/x86/include/asm/paravirt.h:671 kernel/sched/sched.h:1570
> > kernel/sched/core.c:4995 kernel/sched/core.c:5112) [  243.212030]  ?
> > __schedule (kernel/sched/core.c:5259)
> > [  243.214083]  ? wb_wait_for_completion (fs/fs-writeback.c:227)
> > [  243.215925]  filemap_fdatawait_keep_errors
> > (./arch/x86/include/asm/bitops.h:202 ./arch/x86/include/asm/bitops.h:232
> > ./include/asm-generic/bitops/instrumented-non-atomic.h:142
> > mm/filemap.c:363 mm/filemap.c:627) [  243.217628]  sync_inodes_sb
> > (./include/linux/sched.h:2062 fs/fs-writeback.c:2777
> > fs/fs-writeback.c:2897) [  243.218914]  ? __pfx_sync_inodes_one_sb
> > (fs/sync.c:75)
> > [  243.220186]  __iterate_supers (fs/super.c:64 fs/super.c:925)
> > [  243.221601]  ksys_sync (fs/sync.c:103)
> > [  243.222746]  __do_sys_sync (fs/sync.c:115)
> > [  243.223950]  do_syscall_64 (arch/x86/entry/syscall_64.c:63
> > (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) [ 
> > 243.226140]  entry_SYSCALL_64_after_hwframe
> > (arch/x86/entry/entry_64.S:131) [  243.228375] RIP: 0033:0x7f51e4613707
> > [  243.230260] RSP: 002b:00007ffdb2fccad8 EFLAGS: 00000246 ORIG_RAX:
> > 00000000000000a2 [  243.234092] RAX: ffffffffffffffda RBX:
> > 00007ffdb2fccc38 RCX: 00007f51e4613707 [  243.238153] RDX:
> > 00007f51e46f3501 RSI: 00007ffdb2fcef3e RDI: 00007f51e46ae557 [ 
> > 243.241063] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000000000000 [  243.244028] R10: 00007f51e45be0d0 R11:
> > 0000000000000246 R12: 00005568f2d350fb [  243.247586] R13:
> > 0000000000000000 R14: 0000000000000000 R15: 00005568f2d37b00 [ 
> > 243.251622]  </TASK>
> > 
> > BTW running this script with cache=loose made me realize that Linux 9p
> > client is always requesting server for every "ls foo.txt", which makes it
> > extremely slow unnecessarily.
> > 
> >> fs/9p/vfs_inode.c      | 7 ++++++-
> >> fs/9p/vfs_inode_dotl.c | 7 ++++++-
> >> 2 files changed, 12 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
> >> index 97abe65bf7c1..bfba21f5d8a9 100644
> >> --- a/fs/9p/vfs_inode.c
> >> +++ b/fs/9p/vfs_inode.c
> >> @@ -993,7 +993,12 @@ v9fs_vfs_getattr(struct mnt_idmap *idmap, const
> >> struct
> >> path *path, if (IS_ERR(st))
> >> return PTR_ERR(st);
> >> 
> >> - v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb, 0);
> >> + /*
> >> + * With writeback caching, the client is authoritative for i_size.
> >> + * Don't let the server overwrite it with a potentially stale value.
> >> + */
> >> + v9fs_stat2inode(st, d_inode(dentry), dentry->d_sb,
> >> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
> >> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> >> 
> >> p9stat_free(st);
> >> diff --git a/fs/9p/vfs_inode_dotl.c b/fs/9p/vfs_inode_dotl.c
> >> index 643e759eacb2..67a0ded2e223 100644
> >> --- a/fs/9p/vfs_inode_dotl.c
> >> +++ b/fs/9p/vfs_inode_dotl.c
> >> @@ -453,7 +453,12 @@ v9fs_vfs_getattr_dotl(struct mnt_idmap *idmap,
> >> if (IS_ERR(st))
> >> return PTR_ERR(st);
> >> 
> >> - v9fs_stat2inode_dotl(st, d_inode(dentry), 0);
> >> + /*
> >> + * With writeback caching, the client is authoritative for i_size.
> >> + * Don't let the server overwrite it with a potentially stale value.
> >> + */
> >> + v9fs_stat2inode_dotl(st, d_inode(dentry),
> >> + (v9ses->cache & CACHE_WRITEBACK) ? V9FS_STAT2INODE_KEEP_ISIZE : 0);
> >> generic_fillattr(&nop_mnt_idmap, request_mask, d_inode(dentry), stat);
> >> /* Change block size to what the server returned */
> >> stat->blksize = st->st_blksize;





  reply	other threads:[~2026-02-04 11:49 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-27 18:01 [PATCH v2] 9p: fix i_size update race in getattr with writeback caching Pierre Barre
2026-01-05 11:35 ` Christian Schoenebeck
2026-01-07  0:27   ` Pierre Barre
2026-02-04 11:48     ` Christian Schoenebeck [this message]
2026-02-15  9:21       ` Dominique Martinet
2026-02-17 14:48   ` David Howells
2026-02-17 15:05   ` David Howells
2026-02-17 16:54     ` David Howells
2026-02-18 13:38 ` David Howells
2026-02-18 13:41   ` David Howells
2026-02-18 14:11     ` Dominique Martinet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1946294.tdWV9SEqCh@weasel \
    --to=linux_oss@crudebyte.com \
    --cc=asmadeus@codewreck.org \
    --cc=dhowells@redhat.com \
    --cc=ericvh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucho@ionkov.net \
    --cc=pierre@barre.sh \
    --cc=v9fs@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.