[BUG] 9p: data corruption with cache=mmap under concurrent stat/write

public inbox for v9fs@lists.linux.dev

* [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
@ 2025-12-24 14:29 Pierre Barre
  2025-12-24 22:33 ` Dominique Martinet
  0 siblings, 1 reply; 6+ messages in thread
From: Pierre Barre @ 2025-12-24 14:29 UTC
  To: ericvh, lucho, asmadeus; +Cc: linux_oss, v9fs, linux-kernel

Hi,

I'm hitting data corruption using 9p with cache=mmap when stat() is called concurrently with writes.

Environment:
- Kernel: v6.18.1-061801
- Mount options: cache=mmap
- Transport: unix

Reproducer:
1. Mount 9p filesystem with cache=mmap
2. Run PostgreSQL with data directory on 9p mount
3. Run pgbench workload
4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory

PostgreSQL reports:
  ERROR: unexpected data beyond EOF in block N of relation "..."
  HINT: This has been seen to occur with buggy kernels

Analysis:

The issue appears to be caused by race conditions in getattr/setattr when
writeback caching is used:

1. The condition in v9fs_vfs_getattr_dotl() checks `v9ses->cache` instead of
   `v9ses->cache & CACHE_WRITEBACK`, so a writeback flush is triggered for
   any cache mode.
2. Both getattr and setattr call filemap_fdatawrite(), which initiates
   writeback but doesn't wait for completion, so the subsequent server
   stat/wstat may see a stale file size.

Would using filemap_write_and_wait() instead be the correct fix?

Thanks,
Pierre Barre

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
  2025-12-24 14:29 [BUG] 9p: data corruption with cache=mmap under concurrent stat/write Pierre Barre
@ 2025-12-24 22:33 ` Dominique Martinet
  2025-12-25 10:23   ` Christian Schoenebeck
  0 siblings, 1 reply; 6+ messages in thread
From: Dominique Martinet @ 2025-12-24 22:33 UTC
  To: Pierre Barre; +Cc: ericvh, lucho, linux_oss, v9fs, linux-kernel, David Howells

Hi Pierre,

Pierre Barre wrote on Wed, Dec 24, 2025 at 03:29:01PM +0100:
> I'm hitting data corruption using 9p with cache=mmap when stat() is called concurrently with writes.

Thanks for the report.

> Environment:
> - Kernel: v6.18.1-061801
> - Mount options: cache=mmap
> - Transport: unix
> 
> Reproducer:
> 1. Mount 9p filesystem with cache=mmap
> 2. Run PostgreSQL with data directory on 9p mount
> 3. Run pgbench workload
> 4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory
> 
> PostgreSQL reports:
>   ERROR: unexpected data beyond EOF in block N of relation "..."

"Unexpected data beyond EOF" looks a lot like the issue addressed by this patch:
https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk

Could you try it?

If it doesn't help, we'll need to take a closer look.

>   HINT: This has been seen to occur with buggy kernels
> 
> Analysis:
> 
> The issue appears to be race conditions in getattr/setattr when using
> writeback caching:
> 
> 1. v9fs_vfs_getattr_dotl() condition checks `v9ses->cache` instead of
>    `v9ses->cache & CACHE_WRITEBACK`, triggering writeback flush for
>    any cache mode
> 2. Both getattr and setattr call filemap_fdatawrite() which initiates
>    writeback but doesn't wait for completion. The subsequent server
>    stat/wstat sees stale file size.
> 
> Would using filemap_write_and_wait() instead be the correct fix?

-- 
Dominique Martinet | Asmadeus


* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
  2025-12-24 22:33 ` Dominique Martinet
@ 2025-12-25 10:23   ` Christian Schoenebeck
  2025-12-25 14:52     ` Pierre Barre
  2026-01-05  7:54     ` David Howells
  0 siblings, 2 replies; 6+ messages in thread
From: Christian Schoenebeck @ 2025-12-25 10:23 UTC
  To: Pierre Barre, Dominique Martinet
  Cc: ericvh, lucho, v9fs, linux-kernel, David Howells

On Wednesday, 24 December 2025 23:33:58 CET Dominique Martinet wrote:
> Hi Pierre,
> 
> Pierre Barre wrote on Wed, Dec 24, 2025 at 03:29:01PM +0100:
> > I'm hitting data corruption using 9p with cache=mmap when stat() is called
> > concurrently with writes.
> Thanks for the report
> 
> > Environment:
> > - Kernel: v6.18.1-061801
> > - Mount options: cache=mmap
> > - Transport: unix
> > 
> > Reproducer:
> > 1. Mount 9p filesystem with cache=mmap
> > 2. Run PostgreSQL with data directory on 9p mount
> > 3. Run pgbench workload
> > 4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory
> > 
> > PostgreSQL reports:
> >   ERROR: unexpected data beyond EOF in block N of relation "..."
> 
> unexpected data beyond EOF looks a lot like
> https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk
> 
> could you try with this patch?

Pierre, I am also confident that this patch will fix the EOF data issue you 
encountered with PostgreSQL. However ...

> >   HINT: This has been seen to occur with buggy kernels
> > 
> > Analysis:
> > 
> > The issue appears to be race conditions in getattr/setattr when using
> > writeback caching:
> > 
> > 1. v9fs_vfs_getattr_dotl() condition checks `v9ses->cache` instead of
> >    `v9ses->cache & CACHE_WRITEBACK`, triggering writeback flush for
> >    any cache mode
> > 2. Both getattr and setattr call filemap_fdatawrite() which initiates
> >    writeback but doesn't wait for completion. The subsequent server
> >    stat/wstat sees stale file size.
> > 
> > Would using filemap_write_and_wait() instead be the correct fix?

... are you seeing a second issue? getattr() output should not be related to
mmap() access.

/Christian




* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
  2025-12-25 10:23   ` Christian Schoenebeck
@ 2025-12-25 14:52     ` Pierre Barre
  2025-12-26 13:13       ` Pierre Barre
  2026-01-05  7:54     ` David Howells
  1 sibling, 1 reply; 6+ messages in thread
From: Pierre Barre @ 2025-12-25 14:52 UTC
  To: Christian Schoenebeck, asmadeus
  Cc: ericvh, lucho, v9fs, linux-kernel, David Howells

Hi Christian, Dominique,

Thank you for your reply, and merry Christmas.

>> unexpected data beyond EOF looks a lot like
>> https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk
>> 
>> could you try with this patch?

I will try this patch and report back.

> ... you are seeing a 2nd issue? getattr() output should not be related to 
> mmap() access.

What's strange is that this issue doesn't occur during normal Postgres operation or while just running benchmarks. I first encountered it while running `du -hs` during a pgbench benchmark, and I've since been able to reproduce it consistently with `watch -n 0.1 tree -ah`. Running the benchmarks alone for hours never triggers the bug, but it occurs almost immediately once `du -hs` or `tree -ah` runs alongside them.

Best,
Pierre.



* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
  2025-12-25 14:52     ` Pierre Barre
@ 2025-12-26 13:13       ` Pierre Barre
  0 siblings, 0 replies; 6+ messages in thread
From: Pierre Barre @ 2025-12-26 13:13 UTC
  To: Christian Schoenebeck, asmadeus
  Cc: ericvh, lucho, v9fs, linux-kernel, David Howells

To clarify: the issue isn't with mmap access specifically. With cache=mmap (which enables CACHE_WRITEBACK), PostgreSQL uses regular write() calls that go through the page cache with writeback caching.

Best,
Pierre



* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
  2025-12-25 10:23   ` Christian Schoenebeck
  2025-12-25 14:52     ` Pierre Barre
@ 2026-01-05  7:54     ` David Howells
  1 sibling, 0 replies; 6+ messages in thread
From: David Howells @ 2026-01-05  7:54 UTC
  To: Christian Schoenebeck
  Cc: dhowells, Pierre Barre, Dominique Martinet, ericvh, lucho, v9fs,
	linux-kernel

Christian Schoenebeck <linux_oss@crudebyte.com> wrote:

> > > 2. Both getattr and setattr call filemap_fdatawrite() which initiates
> > >    writeback but doesn't wait for completion. The subsequent server
> > >    stat/wstat sees stale file size.
> > > 
> > > Would using filemap_write_and_wait() instead be the correct fix?
> 
> ... you are seeing a 2nd issue? getattr() output should not be related to 
> mmap() access.

getattr() may flush outstanding dirty data on an inode so that the stats are
correct, but if it does, it should wait for completion.  If you look at cifs and
nfs, they use filemap_fdatawait() or filemap_write_and_wait() rather than
filemap_fdatawrite().

You might also want to check flags & AT_STATX_FORCE_SYNC and flags &
AT_STATX_DONT_SYNC.

I really ought to make afs honour AT_STATX_FORCE_SYNC.

David


end of thread

Thread overview: 6+ messages
2025-12-24 14:29 [BUG] 9p: data corruption with cache=mmap under concurrent stat/write Pierre Barre
2025-12-24 22:33 ` Dominique Martinet
2025-12-25 10:23   ` Christian Schoenebeck
2025-12-25 14:52     ` Pierre Barre
2025-12-26 13:13       ` Pierre Barre
2026-01-05  7:54     ` David Howells
