[PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
@ 2026-05-12 17:21 Dai Ngo
  2026-05-12 17:34 ` Darrick J. Wong
  2026-05-13  7:01 ` Christoph Hellwig
  0 siblings, 2 replies; 13+ messages in thread
From: Dai Ngo @ 2026-05-12 17:21 UTC (permalink / raw)
  To: cem; +Cc: linux-xfs, linux-nfs

xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(),
which causes the bmap code to expand the mapping to cover the entire
extent rather than the requested range.

A single LAYOUTGET request from the client can cause the server to
issue multiple calls to xfs_fs_map_blocks() for different offsets
within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
these calls can produce overlapping mappings.

As a result, the LAYOUTGET reply sent to the NFS client may contain
overlapping extents. This creates ambiguity in extent selection for a
given file range, which can lead to incorrect device selection,
inconsistent handling of datastate, and ultimately data corruption or
protocol violations on the client side.

Problem discovered with xfstest generic/075 test using NFSv4.2 mount
with SCSI layout.

Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that
xfs_bmapi_read() returns only the mapping for the requested range.

Also drop the check for (!error) since it was checked after call to
xfs_bmapi_read().

Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/xfs/xfs_pnfs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

- This patch is based on top of the patch:
  xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path

diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index f7c6dba3d21e..697bf3e4ad7e 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -118,7 +118,7 @@ xfs_fs_map_blocks(
 	struct xfs_bmbt_irec	imap;
 	xfs_fileoff_t		offset_fsb, end_fsb;
 	loff_t			limit;
-	int			bmapi_flags = XFS_BMAPI_ENTIRE;
+	int			bmapi_flags;
 	int			nimaps = 1;
 	uint			lock_flags;
 	int			error = 0;
@@ -172,6 +172,7 @@ xfs_fs_map_blocks(
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);

 	lock_flags = xfs_ilock_data_map_shared(ip);
+	bmapi_flags = 0;	/* return map for requested range only */
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				&imap, &nimaps, bmapi_flags);
 	if (error) {
@@ -182,8 +183,7 @@ xfs_fs_map_blocks(

 	ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);

-	if (!error && write &&
-	    (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
+	if (write && (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
 		if (offset + length > XFS_ISIZE(ip))
 			end_fsb = xfs_iomap_eof_align_last_fsb(ip, end_fsb);
 		else if (nimaps && imap.br_startblock == HOLESTARTBLOCK)
-- 
2.47.3

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-12 17:21 [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET Dai Ngo
@ 2026-05-12 17:34 ` Darrick J. Wong
  2026-05-12 19:21   ` Dai Ngo
  2026-05-13  7:01 ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2026-05-12 17:34 UTC (permalink / raw)
  To: Dai Ngo; +Cc: cem, linux-xfs, linux-nfs

On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
> xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(),
> which causes the bmap code to expand the mapping to cover the entire
> extent rather than the requested range.

Nitpicking: _ENTIRE causes bmapi_read to return the whole extent instead
of trimming it down to the requested range.

> A single LAYOUTGET request from the client can cause the server to
> issue multiple calls to xfs_fs_map_blocks() for different offsets
> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
> these calls can produce overlapping mappings.
> 
> As a result, the LAYOUTGET reply sent to the NFS client may contain
> overlapping extents. This creates ambiguity in extent selection for a
> given file range, which can lead to incorrect device selection,
> inconsistent handling of datastate, and ultimately data corruption or
> protocol violations on the client side.
> 
> Problem discovered with xfstest generic/075 test using NFSv4.2 mount
> with SCSI layout.

Might be helpful to provide an example of the request vs. the
overlapping layouts.  IIRC the client asks for a layout for the first
32 fsblocks of the file.  On the first call to xfs_fs_map_blocks, block
0 is a single unwritten mapping, so that gets returned.

Meanwhile, another thread fallocates block 2 and gets lucky in that an
adjacent block is free, so the first mapping in the file is now 2
unwritten fsblocks starting at 0.  This can happen because we don't hold
i_rwsem (or the ILOCK) between calls to ->map_blocks.

Returning to the first thread, it calls xfs_fs_map_blocks again to map
block 1.  However, the mapping's been changed, so we now return the
entire 2-fsblock  mapping.  What gets sent to the client is

{.offset = 0, .length = 4096, .addr = X, .dev = Y},
{.offset = 0, .length = 8192, .addr = X, .dev = Y},

and the client rejects that as overlapping.  Right?

> Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that
> xfs_bmapi_read() returns only the mapping for the requested range.
> 
> Also drop the check for (!error) since it was checked after call to
> xfs_bmapi_read().

Cc: <stable@vger.kernel.org> # v6.19

> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/xfs/xfs_pnfs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> - This patch is based on top of the patch:
>   xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
> 
> diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
> index f7c6dba3d21e..697bf3e4ad7e 100644
> --- a/fs/xfs/xfs_pnfs.c
> +++ b/fs/xfs/xfs_pnfs.c
> @@ -118,7 +118,7 @@ xfs_fs_map_blocks(
>  	struct xfs_bmbt_irec	imap;
>  	xfs_fileoff_t		offset_fsb, end_fsb;
>  	loff_t			limit;
> -	int			bmapi_flags = XFS_BMAPI_ENTIRE;
> +	int			bmapi_flags;

Why not just replace the argument to xfs_bmapi_read with a constant
zero?

--D

>  	int			nimaps = 1;
>  	uint			lock_flags;
>  	int			error = 0;
> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  
>  	lock_flags = xfs_ilock_data_map_shared(ip);
> +	bmapi_flags = 0;	/* return map for requested range only */
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>  				&imap, &nimaps, bmapi_flags);
>  	if (error) {
> @@ -182,8 +183,7 @@ xfs_fs_map_blocks(
>  
>  	ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);
>  
> -	if (!error && write &&
> -	    (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
> +	if (write && (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
>  		if (offset + length > XFS_ISIZE(ip))
>  			end_fsb = xfs_iomap_eof_align_last_fsb(ip, end_fsb);
>  		else if (nimaps && imap.br_startblock == HOLESTARTBLOCK)
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-12 17:34 ` Darrick J. Wong
@ 2026-05-12 19:21   ` Dai Ngo
  0 siblings, 0 replies; 13+ messages in thread
From: Dai Ngo @ 2026-05-12 19:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, linux-nfs

Thank you for your review!

On 5/12/26 10:34 AM, Darrick J. Wong wrote:
> On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
>> xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(),
>> which causes the bmap code to expand the mapping to cover the entire
>> extent rather than the requested range.
> Nitpicking: _ENTIRE causes bmapi_read to return the whole extent instead
> of trimming it down to the requested range.

Fix in v2.

>
>> A single LAYOUTGET request from the client can cause the server to
>> issue multiple calls to xfs_fs_map_blocks() for different offsets
>> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
>> these calls can produce overlapping mappings.
>>
>> As a result, the LAYOUTGET reply sent to the NFS client may contain
>> overlapping extents. This creates ambiguity in extent selection for a
>> given file range, which can lead to incorrect device selection,
>> inconsistent handling of datastate, and ultimately data corruption or
>> protocol violations on the client side.
>>
>> Problem discovered with xfstest generic/075 test using NFSv4.2 mount
>> with SCSI layout.
> Might be helpful to provide an example of the request vs. the
> overlapping layouts.  IIRC the client asks for a layout for the first
> 32 fsblocks of the file.  On the first call to xfs_fs_map_blocks, block
> 0 is a single unwritten mapping, so that gets returned.
>
> Meanwhile, another thread fallocates block 2 and gets lucky in that an
> adjacent block is free, so the first mapping in the file is now 2
> unwritten fsblocks starting at 0.  This can happen because we don't hold
> i_rwsem (or the ILOCK) between calls to ->map_blocks.
>
> Returning to the first thread, it calls xfs_fs_map_blocks again to map
> block 1.  However, the mapping's been changed, so we now return the
> entire 2-fsblock  mapping.

I'm not sure why the file map gets change between the successive calls
from the same thread that services the LAYOUTGET. This test does lots
of ALLOCATE and DEALLOCATE ops prior to the LAYOUTGET.

>    What gets sent to the client is
>
> {.offset = 0, .length = 4096, .addr = X, .dev = Y},
> {.offset = 0, .length = 8192, .addr = X, .dev = Y},

Yes, I will include on-the-wire capture of a LAYOUTGET operation and
reply showing the overlapping extents in v2.

> and the client rejects that as overlapping.  Right?

The Linux client uses a red-black tree to maintain these extents.
Overlapping extents can cause the client to select the wrong extent,
resulting in intermittent test failures.

>
>> Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that
>> xfs_bmapi_read() returns only the mapping for the requested range.
>>
>> Also drop the check for (!error) since it was checked after call to
>> xfs_bmapi_read().
> Cc: <stable@vger.kernel.org> # v6.19

Fix in v2.

>
>> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>>   fs/xfs/xfs_pnfs.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> - This patch is based on top of the patch:
>>    xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
>>
>> diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
>> index f7c6dba3d21e..697bf3e4ad7e 100644
>> --- a/fs/xfs/xfs_pnfs.c
>> +++ b/fs/xfs/xfs_pnfs.c
>> @@ -118,7 +118,7 @@ xfs_fs_map_blocks(
>>   	struct xfs_bmbt_irec	imap;
>>   	xfs_fileoff_t		offset_fsb, end_fsb;
>>   	loff_t			limit;
>> -	int			bmapi_flags = XFS_BMAPI_ENTIRE;
>> +	int			bmapi_flags;
> Why not just replace the argument to xfs_bmapi_read with a constant
> zero?

This makes the code easier to understand since there is no comment
describing the meaning of the '0' argument passed toxfs_bmapi_read().

Thanks,
-Dai

>
> --D
>
>>   	int			nimaps = 1;
>>   	uint			lock_flags;
>>   	int			error = 0;
>> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>>   	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>>   
>>   	lock_flags = xfs_ilock_data_map_shared(ip);
>> +	bmapi_flags = 0;	/* return map for requested range only */
>>   	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>>   				&imap, &nimaps, bmapi_flags);
>>   	if (error) {
>> @@ -182,8 +183,7 @@ xfs_fs_map_blocks(
>>   
>>   	ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);
>>   
>> -	if (!error && write &&
>> -	    (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
>> +	if (write && (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
>>   		if (offset + length > XFS_ISIZE(ip))
>>   			end_fsb = xfs_iomap_eof_align_last_fsb(ip, end_fsb);
>>   		else if (nimaps && imap.br_startblock == HOLESTARTBLOCK)
>> -- 
>> 2.47.3
>>
>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-12 17:21 [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET Dai Ngo
  2026-05-12 17:34 ` Darrick J. Wong
@ 2026-05-13  7:01 ` Christoph Hellwig
  2026-05-13 15:50   ` Dai Ngo
  1 sibling, 1 reply; 13+ messages in thread
From: Christoph Hellwig @ 2026-05-13  7:01 UTC (permalink / raw)
  To: Dai Ngo; +Cc: cem, linux-xfs, linux-nfs

On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
> A single LAYOUTGET request from the client can cause the server to
> issue multiple calls to xfs_fs_map_blocks() for different offsets
> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
> these calls can produce overlapping mappings.
> 
> As a result, the LAYOUTGET reply sent to the NFS client may contain
> overlapping extents. This creates ambiguity in extent selection for a
> given file range, which can lead to incorrect device selection,
> inconsistent handling of datastate, and ultimately data corruption or
> protocol violations on the client side.

Please also add a check to the client that catches this and doesn't
use the layout that has extents outside the requested range.  And maybe
warn about it as well.

> Also drop the check for (!error) since it was checked after call to
> xfs_bmapi_read().
> 
> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/xfs/xfs_pnfs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> - This patch is based on top of the patch:
>   xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path

The error changes should go into that patch, so please resend it with
that fixes.  Maybe as a series together with this patch to keep them
together.

> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  
>  	lock_flags = xfs_ilock_data_map_shared(ip);
> +	bmapi_flags = 0;	/* return map for requested range only */

Just remove the variable and hard code the 0 in the xfs_bmapi_read call.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-13  7:01 ` Christoph Hellwig
@ 2026-05-13 15:50   ` Dai Ngo
  2026-05-13 17:28     ` Dai Ngo
  2026-05-15 11:49     ` Christoph Hellwig
  0 siblings, 2 replies; 13+ messages in thread
From: Dai Ngo @ 2026-05-13 15:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs, linux-nfs


On 5/13/26 12:01 AM, Christoph Hellwig wrote:
> On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
>> A single LAYOUTGET request from the client can cause the server to
>> issue multiple calls to xfs_fs_map_blocks() for different offsets
>> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
>> these calls can produce overlapping mappings.
>>
>> As a result, the LAYOUTGET reply sent to the NFS client may contain
>> overlapping extents. This creates ambiguity in extent selection for a
>> given file range, which can lead to incorrect device selection,
>> inconsistent handling of datastate, and ultimately data corruption or
>> protocol violations on the client side.
> Please also add a check to the client that catches this and doesn't
> use the layout that has extents outside the requested range.  And maybe
> warn about it as well.

The returned extents cover exactly the range requested in the LAYOUTGET
op. However these extents are overlapping. For example, here is the
on-the-wire capture of the LAYOUTGET operation and reply showing the
overlapping extents:

     Network File System, Ops(3): SEQUENCE, PUTFH, LAYOUTGET
         [Program Version: 4]
         [V4 Procedure: COMPOUND (1)]
         Tag: <EMPTY>
         minorversion: 2
         Operations (count: 3): SEQUENCE, PUTFH, LAYOUTGET
             Opcode: SEQUENCE (53)
             Opcode: PUTFH (22)
             Opcode: LAYOUTGET (50)
                 layout available?: No
                 layout type: LAYOUT4_SCSI (5)
                 IO mode: IOMODE_RW (2)
                 offset: 122880
                 length: 65536
                 min length: 4096
                 StateID
                 maxcount: 4096
         [Main Opcode: LAYOUTGET (50)]
     
     Network File System, Ops(3): SEQUENCE PUTFH LAYOUTGET
         [Program Version: 4]
         [V4 Procedure: COMPOUND (1)]
         Status: NFS4_OK (0)
         Tag: <EMPTY>
         Operations (count: 3)
             Opcode: SEQUENCE (53)
             Opcode: PUTFH (22)
             Opcode: LAYOUTGET (50)
                 Status: NFS4_OK (0)
                 return on close?: Yes
                 StateID
                 Layout Segment (count: 1)
                     offset: 122880
                     length: 77824
                     IO mode: IOMODE_RW (2)
                     layout type: LAYOUT4_SCSI (5)
                     SCSI Extents (count: 2)
                         extent 0
                             device ID: 01000000000000000000000000000000
                             file offset: 122880
                             length: 53248
                             volume offset: 339460096
                             extent state: INVALID_DATA (2)
                         extent 1
                             device ID: 01000000000000000000000000000000
                             file offset: 122880
                             length: 77824
                             volume offset: 339460096
                             extent state: INVALID_DATA (2)
         [Main Opcode: LAYOUTGET (50)]

-Dai

>
>> Also drop the check for (!error) since it was checked after call to
>> xfs_bmapi_read().
>>
>> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>>   fs/xfs/xfs_pnfs.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> - This patch is based on top of the patch:
>>    xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
> The error changes should go into that patch, so please resend it with
> that fixes.  Maybe as a series together with this patch to keep them
> together.
>
>> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>>   	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>>   
>>   	lock_flags = xfs_ilock_data_map_shared(ip);
>> +	bmapi_flags = 0;	/* return map for requested range only */
> Just remove the variable and hard code the 0 in the xfs_bmapi_read call.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-13 15:50   ` Dai Ngo
@ 2026-05-13 17:28     ` Dai Ngo
  2026-05-14  0:25       ` Darrick J. Wong
  2026-05-15 11:50       ` Christoph Hellwig
  2026-05-15 11:49     ` Christoph Hellwig
  1 sibling, 2 replies; 13+ messages in thread
From: Dai Ngo @ 2026-05-13 17:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cem, linux-xfs, linux-nfs

Hi Christoph,

On 5/13/26 8:50 AM, Dai Ngo wrote:
>
> On 5/13/26 12:01 AM, Christoph Hellwig wrote:
>> On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
>>> A single LAYOUTGET request from the client can cause the server to
>>> issue multiple calls to xfs_fs_map_blocks() for different offsets
>>> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
>>> these calls can produce overlapping mappings.
>>>
>>> As a result, the LAYOUTGET reply sent to the NFS client may contain
>>> overlapping extents. This creates ambiguity in extent selection for a
>>> given file range, which can lead to incorrect device selection,
>>> inconsistent handling of datastate, and ultimately data corruption or
>>> protocol violations on the client side.
>> Please also add a check to the client that catches this and doesn't
>> use the layout that has extents outside the requested range. And maybe
>> warn about it as well.
>
> The returned extents cover exactly the range requested in the LAYOUTGET
> op. However these extents are overlapping. For example, here is the
> on-the-wire capture of the LAYOUTGET operation and reply showing the
> overlapping extents:
>
>     Network File System, Ops(3): SEQUENCE, PUTFH, LAYOUTGET
>         [Program Version: 4]
>         [V4 Procedure: COMPOUND (1)]
>         Tag: <EMPTY>
>         minorversion: 2
>         Operations (count: 3): SEQUENCE, PUTFH, LAYOUTGET
>             Opcode: SEQUENCE (53)
>             Opcode: PUTFH (22)
>             Opcode: LAYOUTGET (50)
>                 layout available?: No
>                 layout type: LAYOUT4_SCSI (5)
>                 IO mode: IOMODE_RW (2)
>                 offset: 122880
>                 length: 65536
>                 min length: 4096
>                 StateID
>                 maxcount: 4096
>         [Main Opcode: LAYOUTGET (50)]
>         Network File System, Ops(3): SEQUENCE PUTFH LAYOUTGET
>         [Program Version: 4]
>         [V4 Procedure: COMPOUND (1)]
>         Status: NFS4_OK (0)
>         Tag: <EMPTY>
>         Operations (count: 3)
>             Opcode: SEQUENCE (53)
>             Opcode: PUTFH (22)
>             Opcode: LAYOUTGET (50)
>                 Status: NFS4_OK (0)
>                 return on close?: Yes
>                 StateID
>                 Layout Segment (count: 1)
>                     offset: 122880
>                     length: 77824
>                     IO mode: IOMODE_RW (2)
>                     layout type: LAYOUT4_SCSI (5)
>                     SCSI Extents (count: 2)
>                         extent 0
>                             device ID: 01000000000000000000000000000000
>                             file offset: 122880
>                             length: 53248
>                             volume offset: 339460096
>                             extent state: INVALID_DATA (2)
>                         extent 1
>                             device ID: 01000000000000000000000000000000
>                             file offset: 122880
>                             length: 77824
>                             volume offset: 339460096
>                             extent state: INVALID_DATA (2)
>         [Main Opcode: LAYOUTGET (50)]

After reviewing ext_tree_insert(), with assist from Codex, I think this
function handles overlapping extents properly. The only issue I see in
ext_tree_insert() is the accuracy of the return error code, EINVAL instead
of ENOMEM, when kmemdup() fails.

Since ext_tree_insert seems to handle overlapping extents fine, do you
think it's worth it to fix xfs_fs_map_blocks() to avoid returning overlap
extents?

IMHO, I think we still should fix xfs_fs_map_blocks() to avoid any overhead
and complication in ext_tree_insert having to handle overlapping extents.

-Dai

>
> -Dai
>
>>
>>> Also drop the check for (!error) since it was checked after call to
>>> xfs_bmapi_read().
>>>
>>> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents 
>>> per LAYOUTGET").
>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>> ---
>>>   fs/xfs/xfs_pnfs.c | 6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> - This patch is based on top of the patch:
>>>    xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
>> The error changes should go into that patch, so please resend it with
>> that fixes.  Maybe as a series together with this patch to keep them
>> together.
>>
>>> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>>>       offset_fsb = XFS_B_TO_FSBT(mp, offset);
>>>         lock_flags = xfs_ilock_data_map_shared(ip);
>>> +    bmapi_flags = 0;    /* return map for requested range only */
>> Just remove the variable and hard code the 0 in the xfs_bmapi_read call.
>>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-13 17:28     ` Dai Ngo
@ 2026-05-14  0:25       ` Darrick J. Wong
  2026-05-14 17:19         ` Dai Ngo
  2026-05-15 11:50       ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Darrick J. Wong @ 2026-05-14  0:25 UTC (permalink / raw)
  To: Dai Ngo; +Cc: Christoph Hellwig, cem, linux-xfs, linux-nfs

On Wed, May 13, 2026 at 10:28:31AM -0700, Dai Ngo wrote:
> Hi Christoph,
> 
> On 5/13/26 8:50 AM, Dai Ngo wrote:
> > 
> > On 5/13/26 12:01 AM, Christoph Hellwig wrote:
> > > On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
> > > > A single LAYOUTGET request from the client can cause the server to
> > > > issue multiple calls to xfs_fs_map_blocks() for different offsets
> > > > within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
> > > > these calls can produce overlapping mappings.
> > > > 
> > > > As a result, the LAYOUTGET reply sent to the NFS client may contain
> > > > overlapping extents. This creates ambiguity in extent selection for a
> > > > given file range, which can lead to incorrect device selection,
> > > > inconsistent handling of datastate, and ultimately data corruption or
> > > > protocol violations on the client side.
> > > Please also add a check to the client that catches this and doesn't
> > > use the layout that has extents outside the requested range. And maybe
> > > warn about it as well.
> > 
> > The returned extents cover exactly the range requested in the LAYOUTGET
> > op. However these extents are overlapping. For example, here is the
> > on-the-wire capture of the LAYOUTGET operation and reply showing the
> > overlapping extents:
> > 
> >     Network File System, Ops(3): SEQUENCE, PUTFH, LAYOUTGET
> >         [Program Version: 4]
> >         [V4 Procedure: COMPOUND (1)]
> >         Tag: <EMPTY>
> >         minorversion: 2
> >         Operations (count: 3): SEQUENCE, PUTFH, LAYOUTGET
> >             Opcode: SEQUENCE (53)
> >             Opcode: PUTFH (22)
> >             Opcode: LAYOUTGET (50)
> >                 layout available?: No
> >                 layout type: LAYOUT4_SCSI (5)
> >                 IO mode: IOMODE_RW (2)
> >                 offset: 122880
> >                 length: 65536
> >                 min length: 4096
> >                 StateID
> >                 maxcount: 4096
> >         [Main Opcode: LAYOUTGET (50)]
> >         Network File System, Ops(3): SEQUENCE PUTFH LAYOUTGET
> >         [Program Version: 4]
> >         [V4 Procedure: COMPOUND (1)]
> >         Status: NFS4_OK (0)
> >         Tag: <EMPTY>
> >         Operations (count: 3)
> >             Opcode: SEQUENCE (53)
> >             Opcode: PUTFH (22)
> >             Opcode: LAYOUTGET (50)
> >                 Status: NFS4_OK (0)
> >                 return on close?: Yes
> >                 StateID
> >                 Layout Segment (count: 1)
> >                     offset: 122880
> >                     length: 77824
> >                     IO mode: IOMODE_RW (2)
> >                     layout type: LAYOUT4_SCSI (5)
> >                     SCSI Extents (count: 2)
> >                         extent 0
> >                             device ID: 01000000000000000000000000000000
> >                             file offset: 122880
> >                             length: 53248
> >                             volume offset: 339460096
> >                             extent state: INVALID_DATA (2)
> >                         extent 1
> >                             device ID: 01000000000000000000000000000000
> >                             file offset: 122880
> >                             length: 77824
> >                             volume offset: 339460096
> >                             extent state: INVALID_DATA (2)
> >         [Main Opcode: LAYOUTGET (50)]
> 
> After reviewing ext_tree_insert(), with assist from Codex, I think this
> function handles overlapping extents properly. The only issue I see in
> ext_tree_insert() is the accuracy of the return error code, EINVAL instead
> of ENOMEM, when kmemdup() fails.
> 
> Since ext_tree_insert seems to handle overlapping extents fine, do you
> think it's worth it to fix xfs_fs_map_blocks() to avoid returning overlap
> extents?
> 
> IMHO, I think we still should fix xfs_fs_map_blocks() to avoid any overhead
> and complication in ext_tree_insert having to handle overlapping extents.

I don't know enough about the nfs blocklayout code to say for sure, but
it seems like you want to upsert the mapping returned by
xfs_fs_map_blocks into the "ext_tree" right?

And by "upsert" I mean "clear out any mappings for the (offset, length)
range, then insert the new mapping", sort of like what the fuse iomap
cache does:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tree/fs/fuse/fuse_iomap_cache.c?h=fuse-iomap-cache_2026-05-07#n1682

or I guess the xfs scrub bitmap support code does when you set a range:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tree/fs/xfs/scrub/bitmap.c?h=fuse-iomap-cache_2026-05-07#n395

But as I said before, I don't know if "two mappings retrieved in rapid
succession that overlap" is actually an NFS error.

--D

> -Dai
> 
> > 
> > -Dai
> > 
> > > 
> > > > Also drop the check for (!error) since it was checked after call to
> > > > xfs_bmapi_read().
> > > > 
> > > > Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple
> > > > extents per LAYOUTGET").
> > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > ---
> > > >   fs/xfs/xfs_pnfs.c | 6 +++---
> > > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > > 
> > > > - This patch is based on top of the patch:
> > > >    xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
> > > The error changes should go into that patch, so please resend it with
> > > that fixes.  Maybe as a series together with this patch to keep them
> > > together.
> > > 
> > > > @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
> > > >       offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > > >         lock_flags = xfs_ilock_data_map_shared(ip);
> > > > +    bmapi_flags = 0;    /* return map for requested range only */
> > > Just remove the variable and hard code the 0 in the xfs_bmapi_read call.
> > > 
> > 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-14  0:25       ` Darrick J. Wong
@ 2026-05-14 17:19         ` Dai Ngo
  2026-05-14 17:49           ` Darrick J. Wong
  2026-05-15 21:39           ` Dave Chinner
  0 siblings, 2 replies; 13+ messages in thread
From: Dai Ngo @ 2026-05-14 17:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, cem, linux-xfs, linux-nfs


On 5/13/26 5:25 PM, Darrick J. Wong wrote:
> On Wed, May 13, 2026 at 10:28:31AM -0700, Dai Ngo wrote:
>> Hi Christoph,
>>
>> On 5/13/26 8:50 AM, Dai Ngo wrote:
>>> On 5/13/26 12:01 AM, Christoph Hellwig wrote:
>>>> On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
>>>>> A single LAYOUTGET request from the client can cause the server to
>>>>> issue multiple calls to xfs_fs_map_blocks() for different offsets
>>>>> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
>>>>> these calls can produce overlapping mappings.
>>>>>
>>>>> As a result, the LAYOUTGET reply sent to the NFS client may contain
>>>>> overlapping extents. This creates ambiguity in extent selection for a
>>>>> given file range, which can lead to incorrect device selection,
>>>>> inconsistent handling of datastate, and ultimately data corruption or
>>>>> protocol violations on the client side.
>>>> Please also add a check to the client that catches this and doesn't
>>>> use the layout that has extents outside the requested range. And maybe
>>>> warn about it as well.
>>> The returned extents cover exactly the range requested in the LAYOUTGET
>>> op. However these extents are overlapping. For example, here is the
>>> on-the-wire capture of the LAYOUTGET operation and reply showing the
>>> overlapping extents:
>>>
>>>      Network File System, Ops(3): SEQUENCE, PUTFH, LAYOUTGET
>>>          [Program Version: 4]
>>>          [V4 Procedure: COMPOUND (1)]
>>>          Tag: <EMPTY>
>>>          minorversion: 2
>>>          Operations (count: 3): SEQUENCE, PUTFH, LAYOUTGET
>>>              Opcode: SEQUENCE (53)
>>>              Opcode: PUTFH (22)
>>>              Opcode: LAYOUTGET (50)
>>>                  layout available?: No
>>>                  layout type: LAYOUT4_SCSI (5)
>>>                  IO mode: IOMODE_RW (2)
>>>                  offset: 122880
>>>                  length: 65536
>>>                  min length: 4096
>>>                  StateID
>>>                  maxcount: 4096
>>>          [Main Opcode: LAYOUTGET (50)]
>>>          Network File System, Ops(3): SEQUENCE PUTFH LAYOUTGET
>>>          [Program Version: 4]
>>>          [V4 Procedure: COMPOUND (1)]
>>>          Status: NFS4_OK (0)
>>>          Tag: <EMPTY>
>>>          Operations (count: 3)
>>>              Opcode: SEQUENCE (53)
>>>              Opcode: PUTFH (22)
>>>              Opcode: LAYOUTGET (50)
>>>                  Status: NFS4_OK (0)
>>>                  return on close?: Yes
>>>                  StateID
>>>                  Layout Segment (count: 1)
>>>                      offset: 122880
>>>                      length: 77824
>>>                      IO mode: IOMODE_RW (2)
>>>                      layout type: LAYOUT4_SCSI (5)
>>>                      SCSI Extents (count: 2)
>>>                          extent 0
>>>                              device ID: 01000000000000000000000000000000
>>>                              file offset: 122880
>>>                              length: 53248
>>>                              volume offset: 339460096
>>>                              extent state: INVALID_DATA (2)
>>>                          extent 1
>>>                              device ID: 01000000000000000000000000000000
>>>                              file offset: 122880
>>>                              length: 77824
>>>                              volume offset: 339460096
>>>                              extent state: INVALID_DATA (2)
>>>          [Main Opcode: LAYOUTGET (50)]
>> After reviewing ext_tree_insert(), with assist from Codex, I think this
>> function handles overlapping extents properly. The only issue I see in
>> ext_tree_insert() is the accuracy of the return error code, EINVAL instead
>> of ENOMEM, when kmemdup() fails.
>>
>> Since ext_tree_insert seems to handle overlapping extents fine, do you
>> think it's worth it to fix xfs_fs_map_blocks() to avoid returning overlap
>> extents?
>>
>> IMHO, I think we still should fix xfs_fs_map_blocks() to avoid any overhead
>> and complication in ext_tree_insert having to handle overlapping extents.
> I don't know enough about the nfs blocklayout code to say for sure, but
> it seems like you want to upsert the mapping returned by
> xfs_fs_map_blocks into the "ext_tree" right?

This is currently done on the NFS client side by ext_tree_insert(). The
question I have is should we enhance the server side by passing '0' to
xfs_fs_map_blocks() so the client does not have to do the work of
handling the overlap extents.

>
> And by "upsert" I mean "clear out any mappings for the (offset, length)
> range, then insert the new mapping", sort of like what the fuse iomap
> cache does:
> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tree/fs/fuse/fuse_iomap_cache.c?h=fuse-iomap-cache_2026-05-07*n1682__;Iw!!ACWV5N9M2RV99hQ!LP7Lgbj10myml6rtWPCyBcfoKlvvpS39fLQ4Dy8puJ9c8ZQbxV6ToyYupyVa8TrICy--mS-sUwGxrA$
>
> or I guess the xfs scrub bitmap support code does when you set a range:
> https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tree/fs/xfs/scrub/bitmap.c?h=fuse-iomap-cache_2026-05-07*n395__;Iw!!ACWV5N9M2RV99hQ!LP7Lgbj10myml6rtWPCyBcfoKlvvpS39fLQ4Dy8puJ9c8ZQbxV6ToyYupyVa8TrICy--mS9n7W3nLw$
>
> But as I said before, I don't know if "two mappings retrieved in rapid
> succession that overlap" is actually an NFS error.

As far as I can tell,|nfsd4_block_proc_layoutget()| correctly advances the
offset and decrements the length after each call to|nfsd4_block_map_extent()|,
which passes the arguments through verbatim to|xfs_fs_map_blocks()|.

-Dai

>
> --D
>
>> -Dai
>>
>>> -Dai
>>>
>>>>> Also drop the check for (!error) since it was checked after call to
>>>>> xfs_bmapi_read().
>>>>>
>>>>> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple
>>>>> extents per LAYOUTGET").
>>>>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>>>>> ---
>>>>>    fs/xfs/xfs_pnfs.c | 6 +++---
>>>>>    1 file changed, 3 insertions(+), 3 deletions(-)
>>>>>
>>>>> - This patch is based on top of the patch:
>>>>>     xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
>>>> The error changes should go into that patch, so please resend it with
>>>> that fixes.  Maybe as a series together with this patch to keep them
>>>> together.
>>>>
>>>>> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>>>>>        offset_fsb = XFS_B_TO_FSBT(mp, offset);
>>>>>          lock_flags = xfs_ilock_data_map_shared(ip);
>>>>> +    bmapi_flags = 0;    /* return map for requested range only */
>>>> Just remove the variable and hard code the 0 in the xfs_bmapi_read call.
>>>>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-14 17:19         ` Dai Ngo
@ 2026-05-14 17:49           ` Darrick J. Wong
  2026-05-15 21:39           ` Dave Chinner
  1 sibling, 0 replies; 13+ messages in thread
From: Darrick J. Wong @ 2026-05-14 17:49 UTC (permalink / raw)
  To: Dai Ngo; +Cc: Christoph Hellwig, cem, linux-xfs, linux-nfs

On Thu, May 14, 2026 at 10:19:14AM -0700, Dai Ngo wrote:
> 
> On 5/13/26 5:25 PM, Darrick J. Wong wrote:
> > On Wed, May 13, 2026 at 10:28:31AM -0700, Dai Ngo wrote:
> > > Hi Christoph,
> > > 
> > > On 5/13/26 8:50 AM, Dai Ngo wrote:
> > > > On 5/13/26 12:01 AM, Christoph Hellwig wrote:
> > > > > On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
> > > > > > A single LAYOUTGET request from the client can cause the server to
> > > > > > issue multiple calls to xfs_fs_map_blocks() for different offsets
> > > > > > within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
> > > > > > these calls can produce overlapping mappings.
> > > > > > 
> > > > > > As a result, the LAYOUTGET reply sent to the NFS client may contain
> > > > > > overlapping extents. This creates ambiguity in extent selection for a
> > > > > > given file range, which can lead to incorrect device selection,
> > > > > > inconsistent handling of datastate, and ultimately data corruption or
> > > > > > protocol violations on the client side.
> > > > > Please also add a check to the client that catches this and doesn't
> > > > > use the layout that has extents outside the requested range. And maybe
> > > > > warn about it as well.
> > > > The returned extents cover exactly the range requested in the LAYOUTGET
> > > > op. However these extents are overlapping. For example, here is the
> > > > on-the-wire capture of the LAYOUTGET operation and reply showing the
> > > > overlapping extents:
> > > > 
> > > >      Network File System, Ops(3): SEQUENCE, PUTFH, LAYOUTGET
> > > >          [Program Version: 4]
> > > >          [V4 Procedure: COMPOUND (1)]
> > > >          Tag: <EMPTY>
> > > >          minorversion: 2
> > > >          Operations (count: 3): SEQUENCE, PUTFH, LAYOUTGET
> > > >              Opcode: SEQUENCE (53)
> > > >              Opcode: PUTFH (22)
> > > >              Opcode: LAYOUTGET (50)
> > > >                  layout available?: No
> > > >                  layout type: LAYOUT4_SCSI (5)
> > > >                  IO mode: IOMODE_RW (2)
> > > >                  offset: 122880
> > > >                  length: 65536
> > > >                  min length: 4096
> > > >                  StateID
> > > >                  maxcount: 4096
> > > >          [Main Opcode: LAYOUTGET (50)]
> > > >          Network File System, Ops(3): SEQUENCE PUTFH LAYOUTGET
> > > >          [Program Version: 4]
> > > >          [V4 Procedure: COMPOUND (1)]
> > > >          Status: NFS4_OK (0)
> > > >          Tag: <EMPTY>
> > > >          Operations (count: 3)
> > > >              Opcode: SEQUENCE (53)
> > > >              Opcode: PUTFH (22)
> > > >              Opcode: LAYOUTGET (50)
> > > >                  Status: NFS4_OK (0)
> > > >                  return on close?: Yes
> > > >                  StateID
> > > >                  Layout Segment (count: 1)
> > > >                      offset: 122880
> > > >                      length: 77824
> > > >                      IO mode: IOMODE_RW (2)
> > > >                      layout type: LAYOUT4_SCSI (5)
> > > >                      SCSI Extents (count: 2)
> > > >                          extent 0
> > > >                              device ID: 01000000000000000000000000000000
> > > >                              file offset: 122880
> > > >                              length: 53248
> > > >                              volume offset: 339460096
> > > >                              extent state: INVALID_DATA (2)
> > > >                          extent 1
> > > >                              device ID: 01000000000000000000000000000000
> > > >                              file offset: 122880
> > > >                              length: 77824
> > > >                              volume offset: 339460096
> > > >                              extent state: INVALID_DATA (2)
> > > >          [Main Opcode: LAYOUTGET (50)]
> > > After reviewing ext_tree_insert(), with assist from Codex, I think this
> > > function handles overlapping extents properly. The only issue I see in
> > > ext_tree_insert() is the accuracy of the return error code, EINVAL instead
> > > of ENOMEM, when kmemdup() fails.
> > > 
> > > Since ext_tree_insert seems to handle overlapping extents fine, do you
> > > think it's worth it to fix xfs_fs_map_blocks() to avoid returning overlap
> > > extents?
> > > 
> > > IMHO, I think we still should fix xfs_fs_map_blocks() to avoid any overhead
> > > and complication in ext_tree_insert having to handle overlapping extents.
> > I don't know enough about the nfs blocklayout code to say for sure, but
> > it seems like you want to upsert the mapping returned by
> > xfs_fs_map_blocks into the "ext_tree" right?
> 
> This is currently done on the NFS client side by ext_tree_insert(). The
> question I have is should we enhance the server side by passing '0' to
> xfs_fs_map_blocks() so the client does not have to do the work of
> handling the overlap extents.

Oh, ok.

Not passing XFS_BMAPI_ENTIRE into bmapi_read sounds fine to me.

> > 
> > And by "upsert" I mean "clear out any mappings for the (offset, length)
> > range, then insert the new mapping", sort of like what the fuse iomap
> > cache does:
> > https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tree/fs/fuse/fuse_iomap_cache.c?h=fuse-iomap-cache_2026-05-07*n1682__;Iw!!ACWV5N9M2RV99hQ!LP7Lgbj10myml6rtWPCyBcfoKlvvpS39fLQ4Dy8puJ9c8ZQbxV6ToyYupyVa8TrICy--mS-sUwGxrA$
> > 
> > or I guess the xfs scrub bitmap support code does when you set a range:
> > https://urldefense.com/v3/__https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tree/fs/xfs/scrub/bitmap.c?h=fuse-iomap-cache_2026-05-07*n395__;Iw!!ACWV5N9M2RV99hQ!LP7Lgbj10myml6rtWPCyBcfoKlvvpS39fLQ4Dy8puJ9c8ZQbxV6ToyYupyVa8TrICy--mS9n7W3nLw$
> > 
> > But as I said before, I don't know if "two mappings retrieved in rapid
> > succession that overlap" is actually an NFS error.
> 
> As far as I can tell,|nfsd4_block_proc_layoutget()| correctly advances the
> offset and decrements the length after each call to|nfsd4_block_map_extent()|,
> which passes the arguments through verbatim to|xfs_fs_map_blocks()|.

<nod>

--D

> -Dai
> 
> > 
> > --D
> > 
> > > -Dai
> > > 
> > > > -Dai
> > > > 
> > > > > > Also drop the check for (!error) since it was checked after call to
> > > > > > xfs_bmapi_read().
> > > > > > 
> > > > > > Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple
> > > > > > extents per LAYOUTGET").
> > > > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > > > ---
> > > > > >    fs/xfs/xfs_pnfs.c | 6 +++---
> > > > > >    1 file changed, 3 insertions(+), 3 deletions(-)
> > > > > > 
> > > > > > - This patch is based on top of the patch:
> > > > > >     xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
> > > > > The error changes should go into that patch, so please resend it with
> > > > > that fixes.  Maybe as a series together with this patch to keep them
> > > > > together.
> > > > > 
> > > > > > @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
> > > > > >        offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > > > > >          lock_flags = xfs_ilock_data_map_shared(ip);
> > > > > > +    bmapi_flags = 0;    /* return map for requested range only */
> > > > > Just remove the variable and hard code the 0 in the xfs_bmapi_read call.
> > > > > 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-13 15:50   ` Dai Ngo
  2026-05-13 17:28     ` Dai Ngo
@ 2026-05-15 11:49     ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2026-05-15 11:49 UTC (permalink / raw)
  To: Dai Ngo; +Cc: Christoph Hellwig, cem, linux-xfs, linux-nfs

On Wed, May 13, 2026 at 08:50:29AM -0700, Dai Ngo wrote:
> 
> On 5/13/26 12:01 AM, Christoph Hellwig wrote:
> > On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
> > > A single LAYOUTGET request from the client can cause the server to
> > > issue multiple calls to xfs_fs_map_blocks() for different offsets
> > > within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
> > > these calls can produce overlapping mappings.
> > > 
> > > As a result, the LAYOUTGET reply sent to the NFS client may contain
> > > overlapping extents. This creates ambiguity in extent selection for a
> > > given file range, which can lead to incorrect device selection,
> > > inconsistent handling of datastate, and ultimately data corruption or
> > > protocol violations on the client side.
> > Please also add a check to the client that catches this and doesn't
> > use the layout that has extents outside the requested range.  And maybe
> > warn about it as well.
> 
> The returned extents cover exactly the range requested in the LAYOUTGET
> op. However these extents are overlapping. For example, here is the
> on-the-wire capture of the LAYOUTGET operation and reply showing the
> overlapping extents:

Now that is really weird.  How do we end up not using the remainder
of the previous extent from a previous nfsd4_block_map_extent call
in nfsd4_block_proc_layoutget?  Looks like there is another bug hiding
in the nfsd code somewhere.

And the client should probably also validate that extents of the same
kind do not overlap (read vs write extents are allowed to overlap).


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-13 17:28     ` Dai Ngo
  2026-05-14  0:25       ` Darrick J. Wong
@ 2026-05-15 11:50       ` Christoph Hellwig
  1 sibling, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2026-05-15 11:50 UTC (permalink / raw)
  To: Dai Ngo; +Cc: Christoph Hellwig, cem, linux-xfs, linux-nfs

On Wed, May 13, 2026 at 10:28:31AM -0700, Dai Ngo wrote:
> After reviewing ext_tree_insert(), with assist from Codex, I think this
> function handles overlapping extents properly. The only issue I see in
> ext_tree_insert() is the accuracy of the return error code, EINVAL instead
> of ENOMEM, when kmemdup() fails.
> 
> Since ext_tree_insert seems to handle overlapping extents fine, do you
> think it's worth it to fix xfs_fs_map_blocks() to avoid returning overlap
> extents?

Oh, we absolutely should not return overlapping extents of the same
class.  So the bug fix itself is good, I was just hoping we could also
improve the sanity checking on the client side.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-14 17:19         ` Dai Ngo
  2026-05-14 17:49           ` Darrick J. Wong
@ 2026-05-15 21:39           ` Dave Chinner
  2026-05-16  2:14             ` Dai Ngo
  1 sibling, 1 reply; 13+ messages in thread
From: Dave Chinner @ 2026-05-15 21:39 UTC (permalink / raw)
  To: Dai Ngo; +Cc: Darrick J. Wong, Christoph Hellwig, cem, linux-xfs, linux-nfs

On Thu, May 14, 2026 at 10:19:14AM -0700, Dai Ngo wrote:
> On 5/13/26 5:25 PM, Darrick J. Wong wrote:
> > On Wed, May 13, 2026 at 10:28:31AM -0700, Dai Ngo wrote:
> > > IMHO, I think we still should fix xfs_fs_map_blocks() to avoid any overhead
> > > and complication in ext_tree_insert having to handle overlapping extents.
> > I don't know enough about the nfs blocklayout code to say for sure, but
> > it seems like you want to upsert the mapping returned by
> > xfs_fs_map_blocks into the "ext_tree" right?
> 
> This is currently done on the NFS client side by ext_tree_insert(). The
> question I have is should we enhance the server side by passing '0' to
> xfs_fs_map_blocks() so the client does not have to do the work of
> handling the overlap extents.

I think you've all missed the optimal solution to the problem. 

The problem is not the use of XFS_BMAPI_ENTIRE on the first mapping
call, it's the use of it on the -second- after the first call didn't
return a range that mapped the -entire- request range.

Hence the second and subsequent calls need range trimming so that
they don't overlap with the first range that was returned.

IOWs, we keep the use XFS_BMAPI_ENTIRE for the first mapping call
so we retain the optimisation that minimises the number of pNFS
client mapping calls needed, but if it needs to make a second call
we drop the ENTIRE flag and append extents trimmed to the range
being asked for (which doesn't include the first extent already
returned).

That was we get large extents reported most of the time, and in the
corner cases where we have a race like this one or an extent
boundary lies in the middle of the requested range, we will get
correct, non-overlapping behaviour.

Best of both worlds, yes?

-Dave.
-- 
Dave Chinner
dgc@kernel.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-15 21:39           ` Dave Chinner
@ 2026-05-16  2:14             ` Dai Ngo
  0 siblings, 0 replies; 13+ messages in thread
From: Dai Ngo @ 2026-05-16  2:14 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Darrick J. Wong, Christoph Hellwig, cem, linux-xfs, linux-nfs

Hi Dave,

Thank you for your feedback.

On 5/15/26 2:39 PM, Dave Chinner wrote:
> On Thu, May 14, 2026 at 10:19:14AM -0700, Dai Ngo wrote:
>> On 5/13/26 5:25 PM, Darrick J. Wong wrote:
>>> On Wed, May 13, 2026 at 10:28:31AM -0700, Dai Ngo wrote:
>>>> IMHO, I think we still should fix xfs_fs_map_blocks() to avoid any overhead
>>>> and complication in ext_tree_insert having to handle overlapping extents.
>>> I don't know enough about the nfs blocklayout code to say for sure, but
>>> it seems like you want to upsert the mapping returned by
>>> xfs_fs_map_blocks into the "ext_tree" right?
>> This is currently done on the NFS client side by ext_tree_insert(). The
>> question I have is should we enhance the server side by passing '0' to
>> xfs_fs_map_blocks() so the client does not have to do the work of
>> handling the overlap extents.
> I think you've all missed the optimal solution to the problem.
>
> The problem is not the use of XFS_BMAPI_ENTIRE on the first mapping
> call, it's the use of it on the -second- after the first call didn't
> return a range that mapped the -entire- request range.

Currently the map_blocks() API between the NFS server and XFS does not
provide a way to specify whether XFS should use XFS_BMAPI_ENTIRE or '0'.
xfs_fs_map_blocks() just uses XFS_BMAPI_ENTIRE.

On the first mapping call, NFS server always specify the whole file
range that requested by the client in the LAYOUTGET.

So if xfs_fs_map_blocks() can not return the requested mapping range
with '0' on the first mapping call then I think using XFS_BMAPI_ENTIRE
in the first mapping call makes any different.

-Dai

>
> Hence the second and subsequent calls need range trimming so that
> they don't overlap with the first range that was returned.
>
> IOWs, we keep the use XFS_BMAPI_ENTIRE for the first mapping call
> so we retain the optimisation that minimises the number of pNFS
> client mapping calls needed, but if it needs to make a second call
> we drop the ENTIRE flag and append extents trimmed to the range
> being asked for (which doesn't include the first extent already
> returned).
>
> That was we get large extents reported most of the time, and in the
> corner cases where we have a race like this one or an extent
> boundary lies in the middle of the requested range, we will get
> correct, non-overlapping behaviour.
>
> Best of both worlds, yes?
>
> -Dave.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-05-16  2:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-12 17:21 [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET Dai Ngo
2026-05-12 17:34 ` Darrick J. Wong
2026-05-12 19:21   ` Dai Ngo
2026-05-13  7:01 ` Christoph Hellwig
2026-05-13 15:50   ` Dai Ngo
2026-05-13 17:28     ` Dai Ngo
2026-05-14  0:25       ` Darrick J. Wong
2026-05-14 17:19         ` Dai Ngo
2026-05-14 17:49           ` Darrick J. Wong
2026-05-15 21:39           ` Dave Chinner
2026-05-16  2:14             ` Dai Ngo
2026-05-15 11:50       ` Christoph Hellwig
2026-05-15 11:49     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.