* [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Pierre Barre @ 2025-12-24 14:29 UTC
To: ericvh, lucho, asmadeus; +Cc: linux_oss, v9fs, linux-kernel
Hi,
I'm hitting data corruption using 9p with cache=mmap when stat() is called concurrently with writes.
Environment:
- Kernel: v6.18.1-061801
- Mount options: cache=mmap
- Transport: unix
Reproducer:
1. Mount 9p filesystem with cache=mmap
2. Run PostgreSQL with data directory on 9p mount
3. Run pgbench workload
4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory
PostgreSQL reports:
ERROR: unexpected data beyond EOF in block N of relation "..."
HINT: This has been seen to occur with buggy kernels
Analysis:
The issue appears to be race conditions in getattr/setattr when using
writeback caching:
1. v9fs_vfs_getattr_dotl() condition checks `v9ses->cache` instead of
`v9ses->cache & CACHE_WRITEBACK`, triggering writeback flush for
any cache mode
2. Both getattr and setattr call filemap_fdatawrite(), which initiates
writeback but doesn't wait for completion, so the subsequent server
stat/wstat sees a stale file size.
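To make point 1 concrete, here is a small userspace sketch that mirrors the 9p cache bits and compares the two conditions. The bit and shortcut values are an assumption taken from fs/9p/v9fs.h in recent kernels (please double-check them against your tree), and the two predicate functions are hypothetical illustrations, not the kernel's code.

```c
#include <stdbool.h>

/* Assumed to mirror enum p9_cache_bits in fs/9p/v9fs.h; verify locally. */
enum p9_cache_bits {
	CACHE_NONE      = 0x00,
	CACHE_FILE      = 0x01,
	CACHE_META      = 0x02,
	CACHE_WRITEBACK = 0x04,
	CACHE_LOOSE     = 0x08,
	CACHE_FSCACHE   = 0x80,
};

/* Assumed mount-option shortcuts (cache=none/readahead/mmap/loose/fscache). */
enum p9_cache_shortcuts {
	CACHE_SC_NONE      = CACHE_NONE,
	CACHE_SC_READAHEAD = CACHE_FILE,
	CACHE_SC_MMAP      = CACHE_FILE | CACHE_WRITEBACK,
	CACHE_SC_LOOSE     = CACHE_FILE | CACHE_META | CACHE_WRITEBACK | CACHE_LOOSE,
	CACHE_SC_FSCACHE   = CACHE_SC_LOOSE | CACHE_FSCACHE,
};

/* Hypothetical: the condition as described in the report (any cache mode). */
static bool flush_as_reported(unsigned int cache)
{
	return cache != CACHE_NONE;
}

/* Hypothetical: only modes that can hold dirty pages need a flush. */
static bool flush_as_proposed(unsigned int cache)
{
	return (cache & CACHE_WRITEBACK) != 0;
}
```

If those values are right, the two predicates only disagree for cache=readahead; for cache=mmap both return true, so point 1 on its own would not change behaviour on this particular mount.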
Would using filemap_write_and_wait() instead be the correct fix?
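Point 2 and the question above can be illustrated with a deliberately simplified userspace model. Everything in it (struct toy_inode and the toy_* functions) is hypothetical, and it hard-codes the unlucky interleaving in which the server answers the getattr request before the already-started writeback reaches it; in the real kernel that ordering is a race, not a certainty.

```c
/* Hypothetical toy model of one regular file on a writeback-cached 9p mount. */
struct toy_inode {
	long cached_size;   /* size the client knows about (dirty page cache) */
	long inflight_size; /* size carried by writeback that has been started */
	long server_size;   /* size the 9p server would report right now */
	int  inflight;      /* non-zero while writeback is started but unfinished */
};

/* Stands in for filemap_fdatawrite(): start writeback, do not wait. */
static void toy_fdatawrite(struct toy_inode *ino)
{
	ino->inflight_size = ino->cached_size;
	ino->inflight = 1;
}

/* Stands in for filemap_fdatawait(): block until started writeback lands. */
static void toy_fdatawait(struct toy_inode *ino)
{
	if (ino->inflight) {
		ino->server_size = ino->inflight_size;
		ino->inflight = 0;
	}
}

/* Stands in for filemap_write_and_wait(): start writeback, then wait. */
static void toy_write_and_wait(struct toy_inode *ino)
{
	toy_fdatawrite(ino);
	toy_fdatawait(ino);
}

/* getattr: run the chosen flush, then return what the server reports. In
 * this model the reply is produced before any un-waited writeback lands. */
static long toy_getattr(struct toy_inode *ino, void (*flush)(struct toy_inode *))
{
	flush(ino);
	return ino->server_size;
}
```

Starting from cached_size = 16384 and server_size = 8192, toy_getattr(&ino, toy_fdatawrite) returns the stale 8192, while the same call on a fresh inode with toy_write_and_wait returns 16384.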
Thanks,
Pierre Barre
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Dominique Martinet @ 2025-12-24 22:33 UTC
To: Pierre Barre; +Cc: ericvh, lucho, linux_oss, v9fs, linux-kernel, David Howells
Hi Pierre,
Pierre Barre wrote on Wed, Dec 24, 2025 at 03:29:01PM +0100:
> I'm hitting data corruption using 9p with cache=mmap when stat() is called concurrently with writes.
Thanks for the report
> Environment:
> - Kernel: v6.18.1-061801
> - Mount options: cache=mmap
> - Transport: unix
>
> Reproducer:
> 1. Mount 9p filesystem with cache=mmap
> 2. Run PostgreSQL with data directory on 9p mount
> 3. Run pgbench workload
> 4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory
>
> PostgreSQL reports:
> ERROR: unexpected data beyond EOF in block N of relation "..."
"Unexpected data beyond EOF" looks a lot like the issue addressed by
https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk
Could you try with this patch?
If it doesn't help, we'll need to take a closer look.
> HINT: This has been seen to occur with buggy kernels
>
> Analysis:
>
> The issue appears to be race conditions in getattr/setattr when using
> writeback caching:
>
> 1. v9fs_vfs_getattr_dotl() condition checks `v9ses->cache` instead of
> `v9ses->cache & CACHE_WRITEBACK`, triggering writeback flush for
> any cache mode
> 2. Both getattr and setattr call filemap_fdatawrite() which initiates
> writeback but doesn't wait for completion. The subsequent server
> stat/wstat sees stale file size.
>
> Would using filemap_write_and_wait() instead be the correct fix?
--
Dominique Martinet | Asmadeus
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Christian Schoenebeck @ 2025-12-25 10:23 UTC
To: Pierre Barre, Dominique Martinet
Cc: ericvh, lucho, v9fs, linux-kernel, David Howells
On Wednesday, 24 December 2025 23:33:58 CET Dominique Martinet wrote:
> Hi Pierre,
>
> Pierre Barre wrote on Wed, Dec 24, 2025 at 03:29:01PM +0100:
> > I'm hitting data corruption using 9p with cache=mmap when stat() is called
> > concurrently with writes.
> Thanks for the report
>
> > Environment:
> > - Kernel: v6.18.1-061801
> > - Mount options: cache=mmap
> > - Transport: unix
> >
> > Reproducer:
> > 1. Mount 9p filesystem with cache=mmap
> > 2. Run PostgreSQL with data directory on 9p mount
> > 3. Run pgbench workload
> > 4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory
> >
> > PostgreSQL reports:
> > ERROR: unexpected data beyond EOF in block N of relation "..."
>
> unexpected data beyond EOF looks a lot like
> https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk
>
> could you try with this patch?
Pierre, I am also confident that this patch will fix the EOF data issue you
encountered with PostgreSQL. However ...
> > HINT: This has been seen to occur with buggy kernels
> >
> > Analysis:
> >
> > The issue appears to be race conditions in getattr/setattr when using
> > writeback caching:
> >
> > 1. v9fs_vfs_getattr_dotl() condition checks `v9ses->cache` instead of
> > `v9ses->cache & CACHE_WRITEBACK`, triggering writeback flush for
> > any cache mode
> >
> > 2. Both getattr and setattr call filemap_fdatawrite() which initiates
> > writeback but doesn't wait for completion. The subsequent server
> > stat/wstat sees stale file size.
> >
> > Would using filemap_write_and_wait() instead be the correct fix?
... are you seeing a second issue? getattr() output should not be related to
mmap() access.
/Christian
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Pierre Barre @ 2025-12-25 14:52 UTC
To: Christian Schoenebeck, asmadeus
Cc: ericvh, lucho, v9fs, linux-kernel, David Howells
Hi Christian, Dominique,
Thank you for your replies, and Merry Christmas.
>> unexpected data beyond EOF looks a lot like
>> https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk
>>
>> could you try with this patch?
I will try this patch and report back.
> ... you are seeing a 2nd issue? getattr() output should not be related to
> mmap() access.
What's strange is that this issue doesn't occur during normal Postgres operation or while only running benchmarks. I first encountered it while running `du -hs` during a pgbench benchmark, and I've since been able to reproduce it consistently with `watch -n 0.1 tree -ah`. Running the benchmarks for hours never triggers the bug, but it occurs almost immediately once `du -hs` or `tree -ah` is running.
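Since the trigger seems to be stat() traffic running alongside file extension, a PostgreSQL-free stress test might help narrow this down. The sketch below is hypothetical and has not been shown to reproduce this bug: it extends a file one 8 KiB block at a time (8 KiB being PostgreSQL's default block size) while a forked child calls stat() on the same path in a loop, and counts how often lseek(SEEK_END) reports less than what was just written, which is roughly the situation PostgreSQL complains about. The function name run_check and the parameters are illustrative.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BLCKSZ 8192 /* PostgreSQL's default block size */

/*
 * Hypothetical stress check: extend `path` by `nblocks` blocks while a child
 * process stat()s it in a loop. Returns the number of times lseek(SEEK_END)
 * reported a size smaller than the data already written, or -1 on error.
 */
static long run_check(const char *path, int nblocks)
{
	static char buf[BLCKSZ];
	long anomalies = 0;
	int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);

	if (fd < 0)
		return -1;

	pid_t child = fork();
	if (child < 0) {
		close(fd);
		return -1;
	}
	if (child == 0) {
		/* Child: mimic `tree -ah` / `du -hs` hammering stat(). */
		struct stat st;
		for (;;)
			(void)stat(path, &st);
	}

	memset(buf, 0xab, sizeof(buf));
	for (int i = 0; i < nblocks; i++) {
		off_t off = (off_t)i * BLCKSZ;

		if (pwrite(fd, buf, BLCKSZ, off) != BLCKSZ) {
			anomalies = -1;
			break;
		}
		/* After writing block i, EOF must be at least (i + 1) * BLCKSZ. */
		if (lseek(fd, 0, SEEK_END) < off + BLCKSZ)
			anomalies++;
	}

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	close(fd);
	return anomalies;
}
```

A small main() could call, for example, run_check("/mnt/9p/stress.dat", 100000) with a path on the 9p mount (the path is a placeholder). On a local filesystem the result should always be 0; a non-zero count on the 9p mount would point at the getattr path rather than at PostgreSQL.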
Best,
Pierre.
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Pierre Barre @ 2025-12-26 13:13 UTC
To: Christian Schoenebeck, asmadeus
Cc: ericvh, lucho, v9fs, linux-kernel, David Howells
To clarify: the issue isn't specific to mmap() access. PostgreSQL uses regular write() calls, and with cache=mmap (which enables CACHE_WRITEBACK) those go through the page cache with writeback caching.
Best,
Pierre
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: David Howells @ 2026-01-05 7:54 UTC
To: Christian Schoenebeck
Cc: dhowells, Pierre Barre, Dominique Martinet, ericvh, lucho, v9fs,
linux-kernel
Christian Schoenebeck <linux_oss@crudebyte.com> wrote:
> > > 2. Both getattr and setattr call filemap_fdatawrite() which initiates
> > > writeback but doesn't wait for completion. The subsequent server
> > > stat/wstat sees stale file size.
> > > stat/wstat sees stale file size.
> > >
> > > Would using filemap_write_and_wait() instead be the correct fix?
>
> ... you are seeing a 2nd issue? getattr() output should not be related to
> mmap() access.
getattr() may flush outstanding dirty data on an inode so that the stats are
correct, but if it does, it should wait for completion. If you look at cifs
and nfs, they use filemap_fdatawait() or filemap_write_and_wait() rather than
filemap_fdatawrite().
You might also want to check flags & AT_STATX_FORCE_SYNC and flags &
AT_STATX_DONT_SYNC.
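One way to read this suggestion is sketched below as a hypothetical decision helper; it is not existing 9p code. In the kernel, ->getattr() receives these bits in its query_flags argument. AT_STATX_FORCE_SYNC and AT_STATX_DONT_SYNC are two values of the two-bit AT_STATX_SYNC_TYPE field (statx() rejects both set at once with EINVAL), so the helper masks with AT_STATX_SYNC_TYPE and compares. The constant values come from include/uapi/linux/fcntl.h; the plan names and the policy itself are assumptions.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>

/* Fallbacks if the libc headers lack them (values from uapi linux/fcntl.h). */
#ifndef AT_STATX_SYNC_TYPE
#define AT_STATX_SYNC_TYPE    0x6000
#endif
#ifndef AT_STATX_SYNC_AS_STAT
#define AT_STATX_SYNC_AS_STAT 0x0000
#endif
#ifndef AT_STATX_FORCE_SYNC
#define AT_STATX_FORCE_SYNC   0x2000
#endif
#ifndef AT_STATX_DONT_SYNC
#define AT_STATX_DONT_SYNC    0x4000
#endif

/* Hypothetical outcomes for a 9p getattr. */
enum getattr_plan {
	PLAN_LOCAL,       /* fill from the cached inode, no server round trip */
	PLAN_FETCH,       /* ask the server; nothing dirty can be pending */
	PLAN_FLUSH_FETCH, /* write back dirty pages and wait, then ask the server */
};

/*
 * meta_cached: metadata is cached locally (e.g. cache=loose)
 * writeback:   dirty pages may exist (CACHE_WRITEBACK, e.g. cache=mmap)
 */
static enum getattr_plan getattr_plan(bool meta_cached, bool writeback,
				      unsigned int query_flags)
{
	unsigned int sync = query_flags & AT_STATX_SYNC_TYPE;

	if (sync == AT_STATX_DONT_SYNC)
		return PLAN_LOCAL;	/* caller accepts possibly stale attributes */
	if (meta_cached && sync != AT_STATX_FORCE_SYNC)
		return PLAN_LOCAL;	/* default for cached metadata */
	return writeback ? PLAN_FLUSH_FETCH : PLAN_FETCH;
}
```

Under this policy a plain stat() on a cache=mmap mount maps to PLAN_FLUSH_FETCH (write back and wait before asking the server), while callers that pass AT_STATX_DONT_SYNC skip the round trip entirely.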
I really ought to make afs honour AT_STATX_FORCE_SYNC.
David