* [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Pierre Barre @ 2025-12-24 14:29 UTC
To: ericvh, lucho, asmadeus; +Cc: linux_oss, v9fs, linux-kernel

Hi,

I'm hitting data corruption using 9p with cache=mmap when stat() is called
concurrently with writes.

Environment:
- Kernel: v6.18.1-061801
- Mount options: cache=mmap
- Transport: unix

Reproducer:
1. Mount 9p filesystem with cache=mmap
2. Run PostgreSQL with data directory on 9p mount
3. Run pgbench workload
4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory

PostgreSQL reports:
  ERROR: unexpected data beyond EOF in block N of relation "..."
  HINT: This has been seen to occur with buggy kernels

Analysis:

The issue appears to be race conditions in getattr/setattr when using
writeback caching:

1. v9fs_vfs_getattr_dotl() condition checks `v9ses->cache` instead of
   `v9ses->cache & CACHE_WRITEBACK`, triggering writeback flush for
   any cache mode
2. Both getattr and setattr call filemap_fdatawrite() which initiates
   writeback but doesn't wait for completion. The subsequent server
   stat/wstat sees stale file size.

Would using filemap_write_and_wait() instead be the correct fix?

Thanks,
Pierre Barre
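Editorial sketch: point 2 of the analysis can be illustrated with a small userspace toy model. This is not kernel code and not the 9p implementation; every `toy_*` name is hypothetical, and the model assumes the worst case where writeback only reaches the server once someone waits for it. It only shows why "start the flush, then ask the server for the size" could return a stale size, while "flush and wait, then ask" would not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, no relation to real kernel structures. */
struct toy_inode {
    size_t local_size;          /* size as seen through the client page cache */
    size_t server_size;         /* size the server would report on stat       */
    bool   writeback_in_flight; /* a flush was started but has not finished   */
};

/* Models a buffered write() that extends the file in the page cache only. */
void toy_write(struct toy_inode *ino, size_t new_size)
{
    assert(ino != NULL);
    if (new_size > ino->local_size)
        ino->local_size = new_size;
}

/* Models filemap_fdatawrite(): start writeback and return immediately. */
void toy_start_writeback(struct toy_inode *ino)
{
    ino->writeback_in_flight = true;
}

/* Models waiting for writeback: only now does the server see the data. */
void toy_wait_writeback(struct toy_inode *ino)
{
    if (ino->writeback_in_flight) {
        ino->server_size = ino->local_size;
        ino->writeback_in_flight = false;
    }
}

/* getattr that starts a flush but does not wait (the suspected behaviour). */
size_t toy_getattr_nowait(struct toy_inode *ino)
{
    toy_start_writeback(ino);
    return ino->server_size;    /* may be stale */
}

/* getattr that flushes and waits, as filemap_write_and_wait() would. */
size_t toy_getattr_wait(struct toy_inode *ino)
{
    toy_start_writeback(ino);
    toy_wait_writeback(ino);
    return ino->server_size;    /* matches local_size */
}
```

In this model, after `toy_write(&ino, 8192)` on an empty inode, `toy_getattr_nowait()` still reports 0 while `toy_getattr_wait()` reports 8192. Whether the real race behaves this way is exactly the open question in the report.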
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Dominique Martinet @ 2025-12-24 22:33 UTC
To: Pierre Barre; +Cc: ericvh, lucho, linux_oss, v9fs, linux-kernel, David Howells

Hi Pierre,

Pierre Barre wrote on Wed, Dec 24, 2025 at 03:29:01PM +0100:
> I'm hitting data corruption using 9p with cache=mmap when stat() is called
> concurrently with writes.

Thanks for the report

> Environment:
> - Kernel: v6.18.1-061801
> - Mount options: cache=mmap
> - Transport: unix
>
> Reproducer:
> 1. Mount 9p filesystem with cache=mmap
> 2. Run PostgreSQL with data directory on 9p mount
> 3. Run pgbench workload
> 4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory
>
> PostgreSQL reports:
>   ERROR: unexpected data beyond EOF in block N of relation "..."

unexpected data beyond EOF looks a lot like
https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk

could you try with this patch?
if it doesn't work we need a better look

>   HINT: This has been seen to occur with buggy kernels
>
> Analysis:
>
> The issue appears to be race conditions in getattr/setattr when using
> writeback caching:
>
> 1. v9fs_vfs_getattr_dotl() condition checks `v9ses->cache` instead of
>    `v9ses->cache & CACHE_WRITEBACK`, triggering writeback flush for
>    any cache mode
> 2. Both getattr and setattr call filemap_fdatawrite() which initiates
>    writeback but doesn't wait for completion. The subsequent server
>    stat/wstat sees stale file size.
>
> Would using filemap_write_and_wait() instead be the correct fix?

--
Dominique Martinet | Asmadeus
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Christian Schoenebeck @ 2025-12-25 10:23 UTC
To: Pierre Barre, Dominique Martinet
Cc: ericvh, lucho, v9fs, linux-kernel, David Howells

On Wednesday, 24 December 2025 23:33:58 CET Dominique Martinet wrote:
> Hi Pierre,
>
> Pierre Barre wrote on Wed, Dec 24, 2025 at 03:29:01PM +0100:
> > I'm hitting data corruption using 9p with cache=mmap when stat() is called
> > concurrently with writes.
>
> Thanks for the report
>
> > Environment:
> > - Kernel: v6.18.1-061801
> > - Mount options: cache=mmap
> > - Transport: unix
> >
> > Reproducer:
> > 1. Mount 9p filesystem with cache=mmap
> > 2. Run PostgreSQL with data directory on 9p mount
> > 3. Run pgbench workload
> > 4. Simultaneously run `watch -n 0.1 tree -ah` on the data directory
> >
> > PostgreSQL reports:
> >   ERROR: unexpected data beyond EOF in block N of relation "..."
>
> unexpected data beyond EOF looks a lot like
> https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk
>
> could you try with this patch?

Pierre, I am also confident that this patch will fix the EOF data issue you
encountered with PostgreSQL. However ...

> >   HINT: This has been seen to occur with buggy kernels
> >
> > Analysis:
> >
> > The issue appears to be race conditions in getattr/setattr when using
> > writeback caching:
> >
> > 1. v9fs_vfs_getattr_dotl() condition checks `v9ses->cache` instead of
> >    `v9ses->cache & CACHE_WRITEBACK`, triggering writeback flush for
> >    any cache mode
> > 2. Both getattr and setattr call filemap_fdatawrite() which initiates
> >    writeback but doesn't wait for completion. The subsequent server
> >    stat/wstat sees stale file size.
> >
> > Would using filemap_write_and_wait() instead be the correct fix?

... you are seeing a 2nd issue? getattr() output should not be related to
mmap() access.

/Christian
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Pierre Barre @ 2025-12-25 14:52 UTC
To: Christian Schoenebeck, asmadeus
Cc: ericvh, lucho, v9fs, linux-kernel, David Howells

Hi Christian, Dominique,

Thank you for your reply and Merry Christmas.

>> unexpected data beyond EOF looks a lot like
>> https://lkml.kernel.org/r/938162.1766233900@warthog.procyon.org.uk
>>
>> could you try with this patch?

I will try this patch and report back.

> ... you are seeing a 2nd issue? getattr() output should not be related to
> mmap() access.

What's strange is that this issue doesn't occur during normal Postgres
operation or while just running benchmarks. I initially encountered it while
running du -hs during a pgbench benchmark, and I've since been able to
reproduce it consistently with watch -n 0.1 tree -ah. Running the benchmarks
for hours never triggers this bug, but it (almost) immediately occurs during
du -hs / tree -ah.

Best,
Pierre.
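Editorial sketch: the observation that the failure only shows up once something stats the files during writes suggests a narrower check than the full pgbench setup. The helper below is an assumption, not the reproducer from this thread: it appends 8 KiB blocks (PostgreSQL's default block size) and calls stat() after every write, counting how often the reported size disagrees with the bytes written so far. The function name and constant are hypothetical, and whether it would actually trigger the 9p problem is untested here.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PG_BLOCK_SIZE 8192  /* PostgreSQL's default block size */

/*
 * Hypothetical helper: append nblocks blocks to path, calling stat() after
 * each write. Returns how many times stat() reported a size different from
 * the number of bytes written so far, or -1 on an I/O error.
 */
int count_size_mismatches(const char *path, int nblocks)
{
    char block[PG_BLOCK_SIZE];
    struct stat st;
    off_t expected = 0;
    int mismatches = 0;

    assert(path != NULL && nblocks >= 0);
    memset(block, 'x', sizeof(block));

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    for (int i = 0; i < nblocks; i++) {
        /* Buffered write: with a writeback cache this only dirties pages. */
        if (write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)) {
            close(fd);
            return -1;
        }
        expected += (off_t)sizeof(block);

        /* stat() by path, as du or tree would do from another process. */
        if (stat(path, &st) != 0) {
            close(fd);
            return -1;
        }
        if (st.st_size != expected)
            mismatches++;
    }

    close(fd);
    return mismatches;
}
```

On a local filesystem this is expected to return 0. A non-zero result on a cache=mmap 9p mount would be consistent with the stale-size theory, but a zero result would not rule the bug out, since the reported reproducer runs stat from a separate process under heavy concurrent load.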
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: Pierre Barre @ 2025-12-26 13:13 UTC
To: Christian Schoenebeck, asmadeus
Cc: ericvh, lucho, v9fs, linux-kernel, David Howells

To clarify: the issue isn't with mmap access specifically. With cache=mmap
(which enables CACHE_WRITEBACK), PostgreSQL uses regular write() calls that go
through the page cache with writeback caching.

Best,
Pierre
* Re: [BUG] 9p: data corruption with cache=mmap under concurrent stat/write
From: David Howells @ 2026-01-05 7:54 UTC
To: Christian Schoenebeck
Cc: dhowells, Pierre Barre, Dominique Martinet, ericvh, lucho, v9fs, linux-kernel

Christian Schoenebeck <linux_oss@crudebyte.com> wrote:

> > > 2. Both getattr and setattr call filemap_fdatawrite() which initiates
> > >    writeback but doesn't wait for completion. The subsequent server
> > >    stat/wstat sees stale file size.
> > >
> > > Would using filemap_write_and_wait() instead be the correct fix?
>
> ... you are seeing a 2nd issue? getattr() output should not be related to
> mmap() access.

getattr() may flush outstanding dirty data on an inode so that the stats are
correct - but if so, it should wait for completion. If you look at cifs and
nfs, those use filemap_fdatawait() or filemap_write_and_wait(), rather than
filemap_fdatawrite().

You might also want to check flags & AT_STATX_FORCE_SYNC and flags &
AT_STATX_DONT_SYNC. I really ought to make afs honour AT_STATX_FORCE_SYNC.

David
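Editorial sketch: the AT_STATX_FORCE_SYNC and AT_STATX_DONT_SYNC flags mentioned above are what a userspace caller passes to statx(2) to ask for fresh or cached attributes. The example shows only the caller's side, using the glibc statx() wrapper (glibc 2.28 or later is assumed); the helper name is hypothetical, and whether 9p's getattr honours these flags is the point under discussion, not something this code demonstrates.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* AT_FDCWD and the AT_STATX_* flags */
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>   /* statx(), struct statx, STATX_SIZE */

/*
 * Hypothetical helper: return the size statx() reports for path, or -1 on
 * error. sync_flag is one of:
 *   AT_STATX_SYNC_AS_STAT  behave like stat() (filesystem default)
 *   AT_STATX_FORCE_SYNC    ask a network filesystem to sync with the server
 *   AT_STATX_DONT_SYNC     accept cached attributes, avoid a round trip
 */
int64_t size_with_sync_mode(const char *path, int sync_flag)
{
    struct statx stx;

    assert(path != NULL);
    if (statx(AT_FDCWD, path, sync_flag, STATX_SIZE, &stx) != 0)
        return -1;
    return (int64_t)stx.stx_size;
}
```

Comparing `size_with_sync_mode(path, AT_STATX_FORCE_SYNC)` with `size_with_sync_mode(path, AT_STATX_DONT_SYNC)` on a file being written could help show whether a filesystem treats the two modes differently, under the assumption that it implements them at all.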