Possible regression after NFS eof page pollution fix (ext4 checksum errors)

public inbox for linux-nfs@vger.kernel.org

* Possible regression after NFS eof page pollution fix (ext4 checksum errors)
@ 2026-01-04  9:16 Mark Bloch
  2026-01-04 15:36 ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Bloch @ 2026-01-04  9:16 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb

Hi Trond,

We’ve recently started seeing filesystem issues in our internal
regression runs, and we were able to bisect the problem down to
the following commit:

commit b1817b18ff20e69f5accdccefaf78bf5454bede2
Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Thu Sep 4 18:46:16 2025 -0400

    NFS: Protect against 'eof page pollution'

    This commit fixes the failing xfstest 'generic/363'.

    When the user mmaps() an area that extends beyond the end of file, and
    proceeds to write data into the folio that straddles that eof, we're
    required to discard that folio data if the user calls some function that
    extends the file length.

    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
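
For readers unfamiliar with the term, the scenario in the commit message
above can be sketched from userspace roughly as follows. This is only an
illustration under stated assumptions, not the xfstest and not the kernel
fix: the helper name count_polluted_bytes() and its path argument are made
up for the example, and the file is assumed to start out smaller than one
page.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): create `path` with `old_size`
 * bytes (less than one page), dirty the rest of the page through a shared
 * mapping, extend the file to a full page, and count the non-zero bytes in
 * the newly exposed range.
 * Returns that count (0 means no pollution), or -1 if setup fails.
 */
long count_polluted_bytes(const char *path, off_t old_size)
{
	long page = sysconf(_SC_PAGESIZE);
	long polluted = 0;
	unsigned char byte;
	char *map;
	off_t off;
	int fd;

	assert(path != NULL);
	if (page <= 0 || old_size <= 0 || old_size >= page)
		return -1;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, old_size) < 0)
		goto fail;

	/* Map the whole page even though EOF falls in the middle of it. */
	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto fail;
	/* Write only to the part of the page that lies beyond EOF. */
	memset(map + old_size, 0xaa, (size_t)(page - old_size));
	if (munmap(map, (size_t)page) < 0)
		goto fail;

	/* Extend the file so the range past the old EOF becomes readable. */
	if (ftruncate(fd, page) < 0)
		goto fail;
	for (off = old_size; off < page; off++) {
		if (pread(fd, &byte, 1, off) != 1)
			goto fail;
		if (byte != 0)
			polluted++;
	}
	close(fd);
	unlink(path);
	return polluted;
fail:
	close(fd);
	unlink(path);
	return -1;
}
```

Calling count_polluted_bytes() with a path on the filesystem under test and
an old size such as 100 should return 0 if the stale mapped data is
discarded when the file grows. A positive count would mean the bytes
written past the old EOF leaked into the extended file.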

After this change, we intermittently see EXT4 checksum-related errors during boot.
A representative dmesg excerpt is below:

 [ 1908.365537] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #263414: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1908.375449] EXT4-fs error (device vda2): __ext4_find_entry:1624: inode #263414: comm updatedb: checksumming directory block 0
 [ 1908.382985] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #263414: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1908.389289] EXT4-fs error (device vda2): __ext4_find_entry:1624: inode #263414: comm updatedb: checksumming directory block 0
 [ 1909.598811] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #423753: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1909.604308] EXT4-fs error (device vda2): htree_dirblock_to_tree:1051: inode #423753: comm updatedb: Directory block failed checksum
 [ 1909.958470] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #423759: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1909.963825] EXT4-fs error (device vda2): htree_dirblock_to_tree:1051: inode #423759: comm updatedb: Directory block failed checksum
 [ 1909.985956] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #303617: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1909.991371] EXT4-fs error (device vda2): __ext4_find_entry:1624: inode #303617: comm updatedb: checksumming directory block 0
 [ 1910.156415] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #423761: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1910.161959] EXT4-fs error (device vda2): htree_dirblock_to_tree:1051: inode #423761: comm updatedb: Directory block failed checksum
 [ 1910.171364] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #423735: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1910.177292] EXT4-fs error (device vda2): htree_dirblock_to_tree:1051: inode #423735: comm updatedb: Directory block failed checksum
 [ 1910.267721] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #423744: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1910.281838] EXT4-fs error (device vda2): htree_dirblock_to_tree:1051: inode #423744: comm updatedb: Directory block failed checksum
 [ 1910.476906] EXT4-fs warning (device vda2): ext4_dirblock_csum_verify:375: inode #423751: comm updatedb: No space for directory leaf checksum. Please run e2fsck -D.
 [ 1910.482403] EXT4-fs error (device vda2): htree_dirblock_to_tree:1051: inode #423751: comm updatedb: Directory block failed checksum

The issue has so far only been observed in tests that use a nested VM setup.
It does not reproduce deterministically; roughly half of the nested
VM boots trigger the problem.

Would you mind taking a look or pointing us in the right direction?
Please let us know if additional information, testing,
or instrumentation would be helpful.

Thanks,
Mark

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-04  9:16 Possible regression after NFS eof page pollution fix (ext4 checksum errors) Mark Bloch
@ 2026-01-04 15:36 ` Trond Myklebust
  2026-01-05 14:00   ` Mark Bloch
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2026-01-04 15:36 UTC (permalink / raw)
  To: Mark Bloch; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb

On Sun, 2026-01-04 at 11:16 +0200, Mark Bloch wrote:
> Hi Trond,
> 
> We’ve recently started seeing filesystem issues in our internal
> regression runs, and we were able to bisect the problem down to
> the following commit:
> 
> commit b1817b18ff20e69f5accdccefaf78bf5454bede2
> Author: Trond Myklebust <trond.myklebust@hammerspace.com>
> Date:   Thu Sep 4 18:46:16 2025 -0400
> 
>     NFS: Protect against 'eof page pollution'
> 
>     This commit fixes the failing xfstest 'generic/363'.
> 
>     When the user mmaps() an area that extends beyond the end of
> file, and
>     proceeds to write data into the folio that straddles that eof,
> we're
>     required to discard that folio data if the user calls some
> function that
>     extends the file length.
> 
>     Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> 
> After this change, we intermittently see EXT4 checksum-related errors
> during boot.
> A representative dmesg excerpt is below:
> 
>  [ 1908.365537] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #263414: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1908.375449] EXT4-fs error (device vda2): __ext4_find_entry:1624:
> inode #263414: comm updatedb: checksumming directory block 0
>  [ 1908.382985] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #263414: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1908.389289] EXT4-fs error (device vda2): __ext4_find_entry:1624:
> inode #263414: comm updatedb: checksumming directory block 0
>  [ 1909.598811] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #423753: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1909.604308] EXT4-fs error (device vda2):
> htree_dirblock_to_tree:1051: inode #423753: comm updatedb: Directory
> block failed checksum
>  [ 1909.958470] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #423759: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1909.963825] EXT4-fs error (device vda2):
> htree_dirblock_to_tree:1051: inode #423759: comm updatedb: Directory
> block failed checksum
>  [ 1909.985956] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #303617: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1909.991371] EXT4-fs error (device vda2): __ext4_find_entry:1624:
> inode #303617: comm updatedb: checksumming directory block 0
>  [ 1910.156415] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #423761: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1910.161959] EXT4-fs error (device vda2):
> htree_dirblock_to_tree:1051: inode #423761: comm updatedb: Directory
> block failed checksum
>  [ 1910.171364] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #423735: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1910.177292] EXT4-fs error (device vda2):
> htree_dirblock_to_tree:1051: inode #423735: comm updatedb: Directory
> block failed checksum
>  [ 1910.267721] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #423744: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1910.281838] EXT4-fs error (device vda2):
> htree_dirblock_to_tree:1051: inode #423744: comm updatedb: Directory
> block failed checksum
>  [ 1910.476906] EXT4-fs warning (device vda2):
> ext4_dirblock_csum_verify:375: inode #423751: comm updatedb: No space
> for directory leaf checksum. Please run e2fsck -D.
>  [ 1910.482403] EXT4-fs error (device vda2):
> htree_dirblock_to_tree:1051: inode #423751: comm updatedb: Directory
> block failed checksum
> 
> The issue has so far only been observed in tests that use a nested VM
> setup.
> It does not reproduce deterministically, roughly half of the nested
> VM boots trigger the problem.
> 
> Would you mind taking a look or pointing us in the right direction?
> Please let us know if additional information, testing,
> or instrumentation would be helpful.
> 
> Thanks,
> Mark

I'm having trouble seeing how those issues can be related unless ext4
and NFS are somehow sharing the same folios. Does reverting just
commits b1817b18ff20 and b2036bb65114 actually fix the ext4 problem?

What does "nested VM" mean in this situation, and what is the storage
for the ext4 filesystem that is being corrupted?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-04 15:36 ` Trond Myklebust
@ 2026-01-05 14:00   ` Mark Bloch
  2026-01-05 15:20     ` Trond Myklebust
  0 siblings, 1 reply; 10+ messages in thread
From: Mark Bloch @ 2026-01-05 14:00 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb



On 04/01/2026 17:36, Trond Myklebust wrote:
> On Sun, 2026-01-04 at 11:16 +0200, Mark Bloch wrote:
>> Hi Trond,
>>
>> We’ve recently started seeing filesystem issues in our internal
>> regression runs, and we were able to bisect the problem down to
>> the following commit:
>>
>> commit b1817b18ff20e69f5accdccefaf78bf5454bede2
>> Author: Trond Myklebust <trond.myklebust@hammerspace.com>
>> Date:   Thu Sep 4 18:46:16 2025 -0400
>>
>>     NFS: Protect against 'eof page pollution'
>>
>>     This commit fixes the failing xfstest 'generic/363'.
>>
>>     When the user mmaps() an area that extends beyond the end of
>> file, and
>>     proceeds to write data into the folio that straddles that eof,
>> we're
>>     required to discard that folio data if the user calls some
>> function that
>>     extends the file length.
>>
>>     Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>>
>>
>> After this change, we intermittently see EXT4 checksum-related errors
>> during boot.
>> A representative dmesg excerpt is below:
>>
>>  [ 1908.365537] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #263414: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1908.375449] EXT4-fs error (device vda2): __ext4_find_entry:1624:
>> inode #263414: comm updatedb: checksumming directory block 0
>>  [ 1908.382985] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #263414: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1908.389289] EXT4-fs error (device vda2): __ext4_find_entry:1624:
>> inode #263414: comm updatedb: checksumming directory block 0
>>  [ 1909.598811] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #423753: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1909.604308] EXT4-fs error (device vda2):
>> htree_dirblock_to_tree:1051: inode #423753: comm updatedb: Directory
>> block failed checksum
>>  [ 1909.958470] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #423759: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1909.963825] EXT4-fs error (device vda2):
>> htree_dirblock_to_tree:1051: inode #423759: comm updatedb: Directory
>> block failed checksum
>>  [ 1909.985956] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #303617: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1909.991371] EXT4-fs error (device vda2): __ext4_find_entry:1624:
>> inode #303617: comm updatedb: checksumming directory block 0
>>  [ 1910.156415] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #423761: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1910.161959] EXT4-fs error (device vda2):
>> htree_dirblock_to_tree:1051: inode #423761: comm updatedb: Directory
>> block failed checksum
>>  [ 1910.171364] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #423735: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1910.177292] EXT4-fs error (device vda2):
>> htree_dirblock_to_tree:1051: inode #423735: comm updatedb: Directory
>> block failed checksum
>>  [ 1910.267721] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #423744: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1910.281838] EXT4-fs error (device vda2):
>> htree_dirblock_to_tree:1051: inode #423744: comm updatedb: Directory
>> block failed checksum
>>  [ 1910.476906] EXT4-fs warning (device vda2):
>> ext4_dirblock_csum_verify:375: inode #423751: comm updatedb: No space
>> for directory leaf checksum. Please run e2fsck -D.
>>  [ 1910.482403] EXT4-fs error (device vda2):
>> htree_dirblock_to_tree:1051: inode #423751: comm updatedb: Directory
>> block failed checksum
>>
>> The issue has so far only been observed in tests that use a nested VM
>> setup.
>> It does not reproduce deterministically, roughly half of the nested
>> VM boots trigger the problem.
>>
>> Would you mind taking a look or pointing us in the right direction?
>> Please let us know if additional information, testing,
>> or instrumentation would be helpful.
>>
>> Thanks,
>> Mark
> 
> I'm having trouble seeing how those issues can be related unless ext4
> and NFS are somehow sharing the same folios. Does reverting just 
> commit b1817b18ff20 and b2036bb65114 actually fix the ext4 problem?

Yes, after reverting those two commits we can no longer reproduce it.

> 
> What does "nested VM" mean in this situation, and what is the storage
> for the ext4 filesystem that is being corrupted?
> 

I should have explained this better; let me do that now.
Say we have host A.
On host A we run VM B.
Inside VM B we run VM C.

Inside VM B we have an NFS mount:
X:/images/.libvirt on /images/.libvirt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=Y,local_lock=none,addr=X)

which holds the .img files. We launch QEMU with something like this:

{"driver":"file","filename":"/images/.libvirt/linux-VAGRANTSLASH-upstream_Z.img","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2","file":"libvirt-2-storage","backing":null} -blockdev {"driver":"file","filename":"/images/.libvirt/Y.img","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}

Inside VM C, it's a regular ext4 mount:
/dev/vda2 on / type ext4 (rw,relatime)

Mark



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-05 14:00   ` Mark Bloch
@ 2026-01-05 15:20     ` Trond Myklebust
  2026-01-06 10:12       ` Bar Friedman
  2026-01-11  0:24       ` Trond Myklebust
  0 siblings, 2 replies; 10+ messages in thread
From: Trond Myklebust @ 2026-01-05 15:20 UTC (permalink / raw)
  To: Mark Bloch; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb

On Mon, 2026-01-05 at 16:00 +0200, Mark Bloch wrote:
> 
> 
> On 04/01/2026 17:36, Trond Myklebust wrote:
> > On Sun, 2026-01-04 at 11:16 +0200, Mark Bloch wrote:
> > > Hi Trond,
> > > 
> > > We’ve recently started seeing filesystem issues in our internal
> > > regression runs, and we were able to bisect the problem down to
> > > the following commit:
> > > 
> > > commit b1817b18ff20e69f5accdccefaf78bf5454bede2
> > > Author: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > Date:   Thu Sep 4 18:46:16 2025 -0400
> > > 
> > >     NFS: Protect against 'eof page pollution'
> > > 
> > >     This commit fixes the failing xfstest 'generic/363'.
> > > 
> > >     When the user mmaps() an area that extends beyond the end of
> > > file, and
> > >     proceeds to write data into the folio that straddles that
> > > eof,
> > > we're
> > >     required to discard that folio data if the user calls some
> > > function that
> > >     extends the file length.
> > > 
> > >     Signed-off-by: Trond Myklebust
> > > <trond.myklebust@hammerspace.com>
> > > 
> > > 
> > > After this change, we intermittently see EXT4 checksum-related
> > > errors
> > > during boot.
> > > A representative dmesg excerpt is below:
> > > 
> > >  [ 1908.365537] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #263414: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1908.375449] EXT4-fs error (device vda2):
> > > __ext4_find_entry:1624:
> > > inode #263414: comm updatedb: checksumming directory block 0
> > >  [ 1908.382985] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #263414: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1908.389289] EXT4-fs error (device vda2):
> > > __ext4_find_entry:1624:
> > > inode #263414: comm updatedb: checksumming directory block 0
> > >  [ 1909.598811] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #423753: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1909.604308] EXT4-fs error (device vda2):
> > > htree_dirblock_to_tree:1051: inode #423753: comm updatedb:
> > > Directory
> > > block failed checksum
> > >  [ 1909.958470] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #423759: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1909.963825] EXT4-fs error (device vda2):
> > > htree_dirblock_to_tree:1051: inode #423759: comm updatedb:
> > > Directory
> > > block failed checksum
> > >  [ 1909.985956] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #303617: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1909.991371] EXT4-fs error (device vda2):
> > > __ext4_find_entry:1624:
> > > inode #303617: comm updatedb: checksumming directory block 0
> > >  [ 1910.156415] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #423761: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1910.161959] EXT4-fs error (device vda2):
> > > htree_dirblock_to_tree:1051: inode #423761: comm updatedb:
> > > Directory
> > > block failed checksum
> > >  [ 1910.171364] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #423735: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1910.177292] EXT4-fs error (device vda2):
> > > htree_dirblock_to_tree:1051: inode #423735: comm updatedb:
> > > Directory
> > > block failed checksum
> > >  [ 1910.267721] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #423744: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1910.281838] EXT4-fs error (device vda2):
> > > htree_dirblock_to_tree:1051: inode #423744: comm updatedb:
> > > Directory
> > > block failed checksum
> > >  [ 1910.476906] EXT4-fs warning (device vda2):
> > > ext4_dirblock_csum_verify:375: inode #423751: comm updatedb: No
> > > space
> > > for directory leaf checksum. Please run e2fsck -D.
> > >  [ 1910.482403] EXT4-fs error (device vda2):
> > > htree_dirblock_to_tree:1051: inode #423751: comm updatedb:
> > > Directory
> > > block failed checksum
> > > 
> > > The issue has so far only been observed in tests that use a
> > > nested VM
> > > setup.
> > > It does not reproduce deterministically, roughly half of the
> > > nested
> > > VM boots trigger the problem.
> > > 
> > > Would you mind taking a look or pointing us in the right
> > > direction?
> > > Please let us know if additional information, testing,
> > > or instrumentation would be helpful.
> > > 
> > > Thanks,
> > > Mark
> > 
> > I'm having trouble seeing how those issues can be related unless
> > ext4
> > and NFS are somehow sharing the same folios. Does reverting just 
> > commit b1817b18ff20 and b2036bb65114 actually fix the ext4 problem?
> 
> Yes, after reverting those two commits we no longer can reproduce it.
> 
> > 
> > What does "nested VM" mean in this situation, and what is the
> > storage
> > for the ext4 filesystem that is being corrupted?
> > 
> 
> Probably should have explained better, let me do that now.
> Say we have host A.
> On host A we run VM B.
> Inside VM B we run VM C.
> 
> Inside VM B we have a mount (nfs one)
> X:/images/.libvirt on /images/.libvirt type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,fatal_neterrors=none,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=Y,local_lock=none,addr=X)
> 
> which holds the .img files. We launch QEMU with something like this:
> 
> {"driver":"file","filename":"/images/.libvirt/linux-VAGRANTSLASH-upstream_Z.img","node-name":"libvirt-2-storage","auto-read-only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-format","read-only":true,"driver":"qcow2","file":"libvirt-2-storage","backing":null} -blockdev {"driver":"file","filename":"/images/.libvirt/Y.img","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}
> 
> inside VM C, it's a regular ext4 mount:
> /dev/vda2 on / type ext4 (rw,relatime)
> 
> Mark
> 

OK, so if I'm understanding correctly, this is organised as ext4
partitions that are stored in qcow2 images, which are in turn stored on
an NFSv4.2 partition.

Do these qcow2 images have a file size that is fixed at creation time,
or is the file size dynamic?
Also, does changing the "discard" option from "unmap" to "ignore" make
any difference to the outcome?
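
As background for both questions, here is a rough userspace sketch. It
rests on an assumption that is not stated in this thread: that QEMU's file
driver services "discard":"unmap" by punching holes in the image file with
fallocate(), which an NFSv4.2 client would be expected to forward to the
server as a deallocate request. The names get_image_usage() and
demo_punch() are hypothetical. The sketch reports the apparent size
(st_size) and the allocated size (st_blocks) of a scratch file before and
after punching a hole, which is also one way to see whether an image file
is sparse or fully allocated.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for fallocate() and the FALLOC_FL_* flags */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct image_usage {
	off_t apparent;  /* st_size: the length readers see */
	off_t allocated; /* st_blocks * 512: space actually backed */
};

/* Fill `u` from fstat(); returns 0 or a negative errno value. */
int get_image_usage(int fd, struct image_usage *u)
{
	struct stat st;

	assert(u != NULL);
	if (fstat(fd, &st) < 0)
		return -errno;
	u->apparent = st.st_size;
	u->allocated = (off_t)st.st_blocks * 512;
	return 0;
}

/*
 * Hypothetical demo: write 1 MiB to a scratch file at `path`, then punch a
 * 64 KiB hole at offset 0 while keeping the file size. Returns 0, or a
 * negative errno value (for example -EOPNOTSUPP if the filesystem cannot
 * punch holes). The scratch file is removed before returning.
 */
int demo_punch(const char *path, struct image_usage *before,
	       struct image_usage *after)
{
	static char chunk[65536];
	int fd, i, rc, rc2;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;
	memset(chunk, 0x5a, sizeof(chunk));
	for (i = 0; i < 16; i++) {
		if (write(fd, chunk, sizeof(chunk)) != (ssize_t)sizeof(chunk)) {
			rc = -EIO;
			goto out;
		}
	}
	rc = get_image_usage(fd, before);
	if (rc < 0)
		goto out;
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, sizeof(chunk)) < 0)
		rc = -errno;
	/* Report usage even if the punch failed, so the caller can compare. */
	rc2 = get_image_usage(fd, after);
	if (rc2 < 0)
		rc = rc2;
out:
	close(fd);
	unlink(path);
	return rc;
}
```

With FALLOC_FL_KEEP_SIZE the apparent size should stay the same, and only
the allocated size is expected to shrink, by an amount that depends on the
filesystem.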

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-05 15:20     ` Trond Myklebust
@ 2026-01-06 10:12       ` Bar Friedman
  2026-01-11  0:24       ` Trond Myklebust
  1 sibling, 0 replies; 10+ messages in thread
From: Bar Friedman @ 2026-01-06 10:12 UTC (permalink / raw)
  To: Trond Myklebust, Mark Bloch, Riad Abu Ra Ed
  Cc: Linoy Ganti, linux-nfs@vger.kernel.org, Maor Gottlieb

+@Riad Abu Ra Ed





^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-05 15:20     ` Trond Myklebust
  2026-01-06 10:12       ` Bar Friedman
@ 2026-01-11  0:24       ` Trond Myklebust
  2026-01-11  7:03         ` Mark Bloch
  1 sibling, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2026-01-11  0:24 UTC (permalink / raw)
  To: Mark Bloch; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb

Hi Mark,

On Mon, 2026-01-05 at 10:20 -0500, Trond Myklebust wrote:
> 
> OK so if I'm understanding correctly, this is organised as ext4
> partitions that are stored in qcow2 images that are again stored on a
> NFSv4.2 partition.
> 
> Do these qcow2 images have a file size that is fixed at creation
> time,
> or is the file size dynamic?
> Also, does changing the "discard" option from "unmap" to "ignore"
> make
> any difference to the outcome?

I've been staring at this for several days now, and the only candidate
for a bug in the NFS client that I can see is this one. Can you please
check if the following patch helps?

Thanks
  Trond
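
To make the suspected race easier to follow, here is a small
single-threaded model of the interleaving described in the patch below. It
is not kernel code: struct model_file, zero_tail() and simulate_extend()
are hypothetical stand-ins, and zero_tail() only approximates the zeroing
that nfs_truncate_last_folio() is described as doing. The point it
illustrates is that a file size sampled before the lock is taken can be
stale by the time the tail of the last folio is zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FOLIO_SIZE 4096

/* Toy model of a file whose data fits in a single folio. */
struct model_file {
	unsigned char data[FOLIO_SIZE];
	size_t size;
};

/* Stand-in for zeroing the last folio from the old EOF up to the new EOF. */
static void zero_tail(struct model_file *f, size_t from, size_t to)
{
	assert(to <= FOLIO_SIZE);
	if (from < to)
		memset(f->data + from, 0, to - from);
}

/*
 * Replay one fixed interleaving: a writer appends 50 bytes, then a
 * size-extending operation zeroes everything past the "old" size.
 * If sample_before_lock is non-zero, the old size is read before the
 * writer runs (the racy ordering). Otherwise it is read after the point
 * where the lock would be held and writers quiesced.
 * Returns the number of the writer's bytes that were wiped.
 */
int simulate_extend(int sample_before_lock)
{
	struct model_file f;
	size_t oldsize = 0;
	size_t i;
	int lost = 0;

	memset(f.data, 0, sizeof(f.data));
	memset(f.data, 'A', 100);
	f.size = 100;

	if (sample_before_lock)
		oldsize = f.size; /* racy: may be stale once we hold the lock */

	/* A concurrent writer appends 50 bytes before the lock is taken. */
	memset(f.data + f.size, 'W', 50);
	f.size += 50;

	/* From here on, model the inode as locked with writes quiesced. */
	if (!sample_before_lock)
		oldsize = f.size;

	zero_tail(&f, oldsize, FOLIO_SIZE);
	f.size = FOLIO_SIZE;

	for (i = 100; i < 150; i++)
		if (f.data[i] != 'W')
			lost++;
	return lost;
}
```

In this model, simulate_extend(1) wipes all 50 appended bytes, while
simulate_extend(0) preserves them. That is the motivation for reading the
old size only after the inode is locked.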

8<------------------------------------------------------------------
From 18acd9e2652d44bcb8a48bc4643ab006787b809a Mon Sep 17 00:00:00 2001
Message-ID: <18acd9e2652d44bcb8a48bc4643ab006787b809a.1768091015.git.trond.myklebust@hammerspace.com>
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Sat, 10 Jan 2026 18:53:34 -0500
Subject: [PATCH] NFSv4.2: Fix size read races in fallocate and copy offload

If the pre-operation file size is read before locking the inode and
quiescing O_DIRECT writes, then nfs_truncate_last_folio() might end up
overwriting valid file data.

Fixes: b1817b18ff20 ("NFS: Protect against 'eof page pollution'")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/io.c        |  2 ++
 fs/nfs/nfs42proc.c | 29 +++++++++++++++++++----------
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/io.c b/fs/nfs/io.c
index d275b0a250bf..8337f0ae852d 100644
--- a/fs/nfs/io.c
+++ b/fs/nfs/io.c
@@ -84,6 +84,7 @@ nfs_start_io_write(struct inode *inode)
 		nfs_file_block_o_direct(NFS_I(inode));
 	return err;
 }
+EXPORT_SYMBOL_GPL(nfs_start_io_write);
 
 /**
  * nfs_end_io_write - declare that the buffered write operation is done
@@ -97,6 +98,7 @@ nfs_end_io_write(struct inode *inode)
 {
 	up_write(&inode->i_rwsem);
 }
+EXPORT_SYMBOL_GPL(nfs_end_io_write);
 
 /* Call with exclusively locked inode->i_rwsem */
 static void nfs_block_buffered(struct nfs_inode *nfsi, struct inode *inode)
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index d537fb0c230e..c08520828708 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -114,7 +114,6 @@ static int nfs42_proc_fallocate(struct rpc_message *msg, struct file *filep,
 	exception.inode = inode;
 	exception.state = lock->open_context->state;
 
-	nfs_file_block_o_direct(NFS_I(inode));
 	err = nfs_sync_inode(inode);
 	if (err)
 		goto out;
@@ -138,13 +137,17 @@ int nfs42_proc_allocate(struct file *filep, loff_t offset, loff_t len)
 		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_ALLOCATE],
 	};
 	struct inode *inode = file_inode(filep);
-	loff_t oldsize = i_size_read(inode);
+	loff_t oldsize;
 	int err;
 
 	if (!nfs_server_capable(inode, NFS_CAP_ALLOCATE))
 		return -EOPNOTSUPP;
 
-	inode_lock(inode);
+	err = nfs_start_io_write(inode);
+	if (err)
+		return err;
+
+	oldsize = i_size_read(inode);
 
 	err = nfs42_proc_fallocate(&msg, filep, offset, len);
 
@@ -155,7 +158,7 @@ int nfs42_proc_allocate(struct file *filep, loff_t offset, loff_t len)
 		NFS_SERVER(inode)->caps &= ~(NFS_CAP_ALLOCATE |
 					     NFS_CAP_ZERO_RANGE);
 
-	inode_unlock(inode);
+	nfs_end_io_write(inode);
 	return err;
 }
 
@@ -170,7 +173,9 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
 	if (!nfs_server_capable(inode, NFS_CAP_DEALLOCATE))
 		return -EOPNOTSUPP;
 
-	inode_lock(inode);
+	err = nfs_start_io_write(inode);
+	if (err)
+		return err;
 
 	err = nfs42_proc_fallocate(&msg, filep, offset, len);
 	if (err == 0)
@@ -179,7 +184,7 @@ int nfs42_proc_deallocate(struct file *filep, loff_t offset, loff_t len)
 		NFS_SERVER(inode)->caps &= ~(NFS_CAP_DEALLOCATE |
 					     NFS_CAP_ZERO_RANGE);
 
-	inode_unlock(inode);
+	nfs_end_io_write(inode);
 	return err;
 }
 
@@ -189,14 +194,17 @@ int nfs42_proc_zero_range(struct file *filep, loff_t offset, loff_t len)
 		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_ZERO_RANGE],
 	};
 	struct inode *inode = file_inode(filep);
-	loff_t oldsize = i_size_read(inode);
+	loff_t oldsize;
 	int err;
 
 	if (!nfs_server_capable(inode, NFS_CAP_ZERO_RANGE))
 		return -EOPNOTSUPP;
 
-	inode_lock(inode);
+	err = nfs_start_io_write(inode);
+	if (err)
+		return err;
 
+	oldsize = i_size_read(inode);
 	err = nfs42_proc_fallocate(&msg, filep, offset, len);
 	if (err == 0) {
 		nfs_truncate_last_folio(inode->i_mapping, oldsize,
@@ -205,7 +213,7 @@ int nfs42_proc_zero_range(struct file *filep, loff_t offset, loff_t len)
 	} else if (err == -EOPNOTSUPP)
 		NFS_SERVER(inode)->caps &= ~NFS_CAP_ZERO_RANGE;
 
-	inode_unlock(inode);
+	nfs_end_io_write(inode);
 	return err;
 }
 
@@ -416,7 +424,7 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 	struct nfs_server *src_server = NFS_SERVER(src_inode);
 	loff_t pos_src = args->src_pos;
 	loff_t pos_dst = args->dst_pos;
-	loff_t oldsize_dst = i_size_read(dst_inode);
+	loff_t oldsize_dst;
 	size_t count = args->count;
 	ssize_t status;
 
@@ -461,6 +469,7 @@ static ssize_t _nfs42_proc_copy(struct file *src,
 		&src_lock->open_context->state->flags);
 	set_bit(NFS_CLNT_DST_SSC_COPY_STATE,
 		&dst_lock->open_context->state->flags);
+	oldsize_dst = i_size_read(dst_inode);
 
 	status = nfs4_call_sync(dst_server->client, dst_server, &msg,
 				&args->seq_args, &res->seq_res, 0);
-- 
2.52.0
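
For readers following the thread, the race described in the commit message above can be modelled in user space. This is a minimal sketch, not kernel code: every toy_* name is hypothetical, the "lock" is a plain flag standing in for inode->i_rwsem, and the competing append is forced into the window deterministically rather than by a real second thread. It assumes, as the commit message suggests, that the zeroing step clears cached bytes between the pre-operation size and the new size.

```c
#include <assert.h>
#include <string.h>

#define TOY_FOLIO_SIZE 4096

/* Hypothetical stand-in for an inode plus the one folio that straddles EOF. */
struct toy_file {
	int locked;                          /* models inode->i_rwsem held for write */
	long size;                           /* models i_size */
	unsigned char data[TOY_FOLIO_SIZE];  /* models the cached last folio */
};

static void toy_lock(struct toy_file *f)
{
	assert(!f->locked);
	f->locked = 1;
}

static void toy_unlock(struct toy_file *f)
{
	assert(f->locked);
	f->locked = 0;
}

/* A writer that appends len bytes of valid data (0x42) while holding the lock. */
static void toy_append(struct toy_file *f, long len)
{
	toy_lock(f);
	assert(f->size + len <= TOY_FOLIO_SIZE);
	memset(f->data + f->size, 0x42, (size_t)len);
	f->size += len;
	toy_unlock(f);
}

/* Zero the cached bytes in [from, to), clamped to the folio. */
static void toy_truncate_last_folio(struct toy_file *f, long from, long to)
{
	if (to > TOY_FOLIO_SIZE)
		to = TOY_FOLIO_SIZE;
	if (from >= to)
		return;
	memset(f->data + from, 0, (size_t)(to - from));
}

/*
 * Extend the file to new_size. A competing 100-byte append is forced into
 * the window before the lock is taken, so the outcome depends only on
 * where the old size is sampled.
 */
static void toy_extend(struct toy_file *f, long new_size, int read_size_early)
{
	long oldsize = 0;

	if (read_size_early)
		oldsize = f->size;	/* racy: sampled without the lock */

	toy_append(f, 100);		/* competing writer gets in first */

	toy_lock(f);
	if (!read_size_early)
		oldsize = f->size;	/* fixed: sampled under the lock */
	if (new_size > f->size)
		f->size = new_size;
	toy_truncate_last_folio(f, oldsize, new_size);
	toy_unlock(f);
}

/* Returns 1 if the 100 appended bytes survive the extension, 0 otherwise. */
int toy_appended_data_survives(int read_size_early)
{
	struct toy_file f;
	long i;

	memset(&f, 0, sizeof(f));
	f.size = 1000;
	memset(f.data, 0x11, 1000);

	toy_extend(&f, 2000, read_size_early);

	for (i = 1000; i < 1100; i++)
		if (f.data[i] != 0x42)
			return 0;
	return 1;
}
```

Calling toy_appended_data_survives(1) models the pre-patch ordering, where the stale size makes the zeroing start too early and the appended bytes are lost. Calling toy_appended_data_survives(0) models the ordering after the patch, where the size is sampled only once the lock is held, and the appended bytes survive.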



* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-11  0:24       ` Trond Myklebust
@ 2026-01-11  7:03         ` Mark Bloch
  2026-01-11 14:59           ` Trond Myklebust
  2026-01-22 10:00           ` Mark Bloch
  0 siblings, 2 replies; 10+ messages in thread
From: Mark Bloch @ 2026-01-11  7:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb

Hi Trond,

On 11/01/2026 2:24, Trond Myklebust wrote:
> Hi Mark,
> 
> On Mon, 2026-01-05 at 10:20 -0500, Trond Myklebust wrote:
>>
>> OK so if I'm understanding correctly, this is organised as ext4
>> partitions that are stored in qcow2 images that are again stored on a
>> NFSv4.2 partition.
>>
>> Do these qcow2 images have a file size that is fixed at creation
>> time,
>> or is the file size dynamic?

The file size is dynamic (with a fixed maximum of 35 GB).

>> Also, does changing the "discard" option from "unmap" to "ignore"
>> make
>> any difference to the outcome?

The discard option is already set to "ignore" in the image.
Do you want us to test the other options just to see if it makes
a difference?

> 
> I've been staring at this for several days now, and the only candidate
> for a bug in the NFS client that I can see is this one. Can you please
> check if the following patch helps?

Thanks for the patch, I'll let the team dealing with the issue know
and let them test the patch.
I'll update once I know anything.

Mark




* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-11  7:03         ` Mark Bloch
@ 2026-01-11 14:59           ` Trond Myklebust
  2026-01-22 10:00           ` Mark Bloch
  1 sibling, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2026-01-11 14:59 UTC (permalink / raw)
  To: Mark Bloch; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb

On Sun, 2026-01-11 at 09:03 +0200, Mark Bloch wrote:
> Hi Trond,
> 
> On 11/01/2026 2:24, Trond Myklebust wrote:
> > Hi Mark,
> > 
> > On Mon, 2026-01-05 at 10:20 -0500, Trond Myklebust wrote:
> > > 
> > > OK so if I'm understanding correctly, this is organised as ext4
> > > partitions that are stored in qcow2 images that are again stored
> > > on a
> > > NFSv4.2 partition.
> > > 
> > > Do these qcow2 images have a file size that is fixed at creation
> > > time,
> > > or is the file size dynamic?
> 
> The file size is dynamic (with a fixed maximum of 35 GB).
> 
> > > Also, does changing the "discard" option from "unmap" to "ignore"
> > > make
> > > any difference to the outcome?
> 
> The discard option is already set to "ignore" in the image.
> Do you want us to test the other options just to see if it makes
> a difference?

I believe in your previous email you had it set as "unmap":

{"driver":"file","filename":"/images/.libvirt/linux-VAGRANTSLASH-
upstream_Z.img","node-name":"libvirt-2-storage","auto-read-
only":true,"discard":"unmap"} -blockdev {"node-name":"libvirt-2-
format","read-only":true,"driver":"qcow2","file":"libvirt-2-
storage","backing":null} -blockdev
{"driver":"file","filename":"/images/.libvirt/Y.img","node-
name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}

However if you've already tested with "ignore" and it didn't make a
difference then let's not worry about it.

> 
> > 
> > I've been staring at this for several days now, and the only
> > candidate
> > for a bug in the NFS client that I can see is this one. Can you
> > please
> > check if the following patch helps?
> 
> Thanks for the patch, I'll let the team dealing with the issue know
> and let them test the patch.
> I'll update once I know anything.

Thanks!
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com


* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-11  7:03         ` Mark Bloch
  2026-01-11 14:59           ` Trond Myklebust
@ 2026-01-22 10:00           ` Mark Bloch
  2026-01-22 21:13             ` Trond Myklebust
  1 sibling, 1 reply; 10+ messages in thread
From: Mark Bloch @ 2026-01-22 10:00 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb



On 11/01/2026 9:03, Mark Bloch wrote:
> Hi Trond,
> 
> On 11/01/2026 2:24, Trond Myklebust wrote:
>> Hi Mark,
>>
>> On Mon, 2026-01-05 at 10:20 -0500, Trond Myklebust wrote:
>>>
>>> OK so if I'm understanding correctly, this is organised as ext4
>>> partitions that are stored in qcow2 images that are again stored on a
>>> NFSv4.2 partition.
>>>
>>> Do these qcow2 images have a file size that is fixed at creation
>>> time,
>>> or is the file size dynamic?
> 
> The file size is dynamic (with a fixed maximum of 35 GB).
> 
>>> Also, does changing the "discard" option from "unmap" to "ignore"
>>> make
>>> any difference to the outcome?
> 
> The discard option is already set to "ignore" in the image.
> Do you want us to test the other options just to see if it makes
> a difference?
> 
>>
>> I've been staring at this for several days now, and the only candidate
>> for a bug in the NFS client that I can see is this one. Can you please
>> check if the following patch helps?
> 
> Thanks for the patch, I'll let the team dealing with the issue know
> and let them test the patch.
> I'll update once I know anything.

We've been testing your patch for some time now and didn't hit the issue.
Feel free to add Bar's Tested-by tag, as she was the one
who actually tested the fix. Thanks for looking into this.

Tested-by: Bar Friedman <bfriedman@nvidia.com>

Mark




* Re: Possible regression after NFS eof page pollution fix (ext4 checksum errors)
  2026-01-22 10:00           ` Mark Bloch
@ 2026-01-22 21:13             ` Trond Myklebust
  0 siblings, 0 replies; 10+ messages in thread
From: Trond Myklebust @ 2026-01-22 21:13 UTC (permalink / raw)
  To: Mark Bloch; +Cc: Linoy Ganti, Bar Friedman, linux-nfs, Maor Gottlieb

On Thu, 2026-01-22 at 12:00 +0200, Mark Bloch wrote:
> 
> 
> On 11/01/2026 9:03, Mark Bloch wrote:
> > Hi Trond,
> > 
> > On 11/01/2026 2:24, Trond Myklebust wrote:
> > > Hi Mark,
> > > 
> > > On Mon, 2026-01-05 at 10:20 -0500, Trond Myklebust wrote:
> > > > 
> > > > OK so if I'm understanding correctly, this is organised as ext4
> > > > partitions that are stored in qcow2 images that are again
> > > > stored on a
> > > > NFSv4.2 partition.
> > > > 
> > > > Do these qcow2 images have a file size that is fixed at
> > > > creation
> > > > time,
> > > > or is the file size dynamic?
> > 
> > The file size is dynamic (with a fixed maximum of 35 GB).
> > 
> > > > Also, does changing the "discard" option from "unmap" to
> > > > "ignore"
> > > > make
> > > > any difference to the outcome?
> > 
> > The discard option is already set to "ignore" in the image.
> > Do you want us to test the other options just to see if it makes
> > a difference?
> > 
> > > 
> > > I've been staring at this for several days now, and the only
> > > candidate
> > > for a bug in the NFS client that I can see is this one. Can you
> > > please
> > > check if the following patch helps?
> > 
> > Thanks for the patch, I'll let the team dealing with the issue know
> > and let them test the patch.
> > I'll update once I know anything.
> 
> We've been testing your patch for some time now and didn't hit the
> issue.
> Feel free to add Bar's tested by tag as she was the one
> that actually tested the fix. Thanks for looking into this.
> 
> Tested-by: Bar Friedman <bfriedman@nvidia.com>
> 
> Mark
> 


Thank you very much for testing, Bar! I unfortunately already sent the
patch upstream as it was clearly a necessary fix (even though it was
not obvious to me that it would be sufficient to fix your reported
problem). I'm therefore hoping it will hit the 6.18.x stable kernels
soon.

> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trondmy@kernel.org, trond.myklebust@hammerspace.com



Thread overview: 10+ messages
2026-01-04  9:16 Possible regression after NFS eof page pollution fix (ext4 checksum errors) Mark Bloch
2026-01-04 15:36 ` Trond Myklebust
2026-01-05 14:00   ` Mark Bloch
2026-01-05 15:20     ` Trond Myklebust
2026-01-06 10:12       ` Bar Friedman
2026-01-11  0:24       ` Trond Myklebust
2026-01-11  7:03         ` Mark Bloch
2026-01-11 14:59           ` Trond Myklebust
2026-01-22 10:00           ` Mark Bloch
2026-01-22 21:13             ` Trond Myklebust
