Linux NFS development
 help / color / mirror / Atom feed
* [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
@ 2026-05-12 17:21 Dai Ngo
  2026-05-12 17:34 ` Darrick J. Wong
  0 siblings, 1 reply; 3+ messages in thread
From: Dai Ngo @ 2026-05-12 17:21 UTC (permalink / raw)
  To: cem; +Cc: linux-xfs, linux-nfs

xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(),
which causes the bmap code to expand the mapping to cover the entire
extent rather than the requested range.

A single LAYOUTGET request from the client can cause the server to
issue multiple calls to xfs_fs_map_blocks() for different offsets
within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
these calls can produce overlapping mappings.

As a result, the LAYOUTGET reply sent to the NFS client may contain
overlapping extents. This creates ambiguity in extent selection for a
given file range, which can lead to incorrect device selection,
inconsistent handling of datastate, and ultimately data corruption or
protocol violations on the client side.

Problem discovered with xfstest generic/075 test using NFSv4.2 mount
with SCSI layout.

Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that
xfs_bmapi_read() returns only the mapping for the requested range.

Also drop the check for (!error) since it was checked after call to
xfs_bmapi_read().

Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
---
 fs/xfs/xfs_pnfs.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

- This patch is based on top of the patch:
  xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path

diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
index f7c6dba3d21e..697bf3e4ad7e 100644
--- a/fs/xfs/xfs_pnfs.c
+++ b/fs/xfs/xfs_pnfs.c
@@ -118,7 +118,7 @@ xfs_fs_map_blocks(
 	struct xfs_bmbt_irec	imap;
 	xfs_fileoff_t		offset_fsb, end_fsb;
 	loff_t			limit;
-	int			bmapi_flags = XFS_BMAPI_ENTIRE;
+	int			bmapi_flags;
 	int			nimaps = 1;
 	uint			lock_flags;
 	int			error = 0;
@@ -172,6 +172,7 @@ xfs_fs_map_blocks(
 	offset_fsb = XFS_B_TO_FSBT(mp, offset);
 
 	lock_flags = xfs_ilock_data_map_shared(ip);
+	bmapi_flags = 0;	/* return map for requested range only */
 	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
 				&imap, &nimaps, bmapi_flags);
 	if (error) {
@@ -182,8 +183,7 @@ xfs_fs_map_blocks(
 
 	ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);
 
-	if (!error && write &&
-	    (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
+	if (write && (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
 		if (offset + length > XFS_ISIZE(ip))
 			end_fsb = xfs_iomap_eof_align_last_fsb(ip, end_fsb);
 		else if (nimaps && imap.br_startblock == HOLESTARTBLOCK)
-- 
2.47.3


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-12 17:21 [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET Dai Ngo
@ 2026-05-12 17:34 ` Darrick J. Wong
  2026-05-12 19:21   ` Dai Ngo
  0 siblings, 1 reply; 3+ messages in thread
From: Darrick J. Wong @ 2026-05-12 17:34 UTC (permalink / raw)
  To: Dai Ngo; +Cc: cem, linux-xfs, linux-nfs

On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
> xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(),
> which causes the bmap code to expand the mapping to cover the entire
> extent rather than the requested range.

Nitpicking: _ENTIRE causes bmapi_read to return the whole extent instead
of trimming it down to the requested range.

> A single LAYOUTGET request from the client can cause the server to
> issue multiple calls to xfs_fs_map_blocks() for different offsets
> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
> these calls can produce overlapping mappings.
> 
> As a result, the LAYOUTGET reply sent to the NFS client may contain
> overlapping extents. This creates ambiguity in extent selection for a
> given file range, which can lead to incorrect device selection,
> inconsistent handling of datastate, and ultimately data corruption or
> protocol violations on the client side.
> 
> Problem discovered with xfstest generic/075 test using NFSv4.2 mount
> with SCSI layout.

Might be helpful to provide an example of the request vs. the
overlapping layouts.  IIRC the client asks for a layout for the first
32 fsblocks of the file.  On the first call to xfs_fs_map_blocks, block
0 is a single unwritten mapping, so that gets returned.

Meanwhile, another thread fallocates block 2 and gets lucky in that an
adjacent block is free, so the first mapping in the file is now 2
unwritten fsblocks starting at 0.  This can happen because we don't hold
i_rwsem (or the ILOCK) between calls to ->map_blocks.

Returning to the first thread, it calls xfs_fs_map_blocks again to map
block 1.  However, the mapping's been changed, so we now return the
entire 2-fsblock  mapping.  What gets sent to the client is

{.offset = 0, .length = 4096, .addr = X, .dev = Y},
{.offset = 0, .length = 8192, .addr = X, .dev = Y},

and the client rejects that as overlapping.  Right?

> Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that
> xfs_bmapi_read() returns only the mapping for the requested range.
> 
> Also drop the check for (!error) since it was checked after call to
> xfs_bmapi_read().

Cc: <stable@vger.kernel.org> # v6.19

> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> ---
>  fs/xfs/xfs_pnfs.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> - This patch is based on top of the patch:
>   xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
> 
> diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
> index f7c6dba3d21e..697bf3e4ad7e 100644
> --- a/fs/xfs/xfs_pnfs.c
> +++ b/fs/xfs/xfs_pnfs.c
> @@ -118,7 +118,7 @@ xfs_fs_map_blocks(
>  	struct xfs_bmbt_irec	imap;
>  	xfs_fileoff_t		offset_fsb, end_fsb;
>  	loff_t			limit;
> -	int			bmapi_flags = XFS_BMAPI_ENTIRE;
> +	int			bmapi_flags;

Why not just replace the argument to xfs_bmapi_read with a constant
zero?

--D

>  	int			nimaps = 1;
>  	uint			lock_flags;
>  	int			error = 0;
> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>  	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>  
>  	lock_flags = xfs_ilock_data_map_shared(ip);
> +	bmapi_flags = 0;	/* return map for requested range only */
>  	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>  				&imap, &nimaps, bmapi_flags);
>  	if (error) {
> @@ -182,8 +183,7 @@ xfs_fs_map_blocks(
>  
>  	ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);
>  
> -	if (!error && write &&
> -	    (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
> +	if (write && (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
>  		if (offset + length > XFS_ISIZE(ip))
>  			end_fsb = xfs_iomap_eof_align_last_fsb(ip, end_fsb);
>  		else if (nimaps && imap.br_startblock == HOLESTARTBLOCK)
> -- 
> 2.47.3
> 
> 

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
  2026-05-12 17:34 ` Darrick J. Wong
@ 2026-05-12 19:21   ` Dai Ngo
  0 siblings, 0 replies; 3+ messages in thread
From: Dai Ngo @ 2026-05-12 19:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: cem, linux-xfs, linux-nfs

Thank you for your review!

On 5/12/26 10:34 AM, Darrick J. Wong wrote:
> On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
>> xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(),
>> which causes the bmap code to expand the mapping to cover the entire
>> extent rather than the requested range.
> Nitpicking: _ENTIRE causes bmapi_read to return the whole extent instead
> of trimming it down to the requested range.

Fix in v2.

>
>> A single LAYOUTGET request from the client can cause the server to
>> issue multiple calls to xfs_fs_map_blocks() for different offsets
>> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
>> these calls can produce overlapping mappings.
>>
>> As a result, the LAYOUTGET reply sent to the NFS client may contain
>> overlapping extents. This creates ambiguity in extent selection for a
>> given file range, which can lead to incorrect device selection,
>> inconsistent handling of datastate, and ultimately data corruption or
>> protocol violations on the client side.
>>
>> Problem discovered with xfstest generic/075 test using NFSv4.2 mount
>> with SCSI layout.
> Might be helpful to provide an example of the request vs. the
> overlapping layouts.  IIRC the client asks for a layout for the first
> 32 fsblocks of the file.  On the first call to xfs_fs_map_blocks, block
> 0 is a single unwritten mapping, so that gets returned.
>
> Meanwhile, another thread fallocates block 2 and gets lucky in that an
> adjacent block is free, so the first mapping in the file is now 2
> unwritten fsblocks starting at 0.  This can happen because we don't hold
> i_rwsem (or the ILOCK) between calls to ->map_blocks.
>
> Returning to the first thread, it calls xfs_fs_map_blocks again to map
> block 1.  However, the mapping's been changed, so we now return the
> entire 2-fsblock  mapping.

I'm not sure why the file map gets change between the successive calls
from the same thread that services the LAYOUTGET. This test does lots
of ALLOCATE and DEALLOCATE ops prior to the LAYOUTGET.

>    What gets sent to the client is
>
> {.offset = 0, .length = 4096, .addr = X, .dev = Y},
> {.offset = 0, .length = 8192, .addr = X, .dev = Y},

Yes, I will include on-the-wire capture of a LAYOUTGET operation and
reply showing the overlapping extents in v2.

> and the client rejects that as overlapping.  Right?

The Linux client uses a red-black tree to maintain these extents.
Overlapping extents can cause the client to select the wrong extent,
resulting in intermittent test failures.

>
>> Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that
>> xfs_bmapi_read() returns only the mapping for the requested range.
>>
>> Also drop the check for (!error) since it was checked after call to
>> xfs_bmapi_read().
> Cc: <stable@vger.kernel.org> # v6.19

Fix in v2.

>
>> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>>   fs/xfs/xfs_pnfs.c | 6 +++---
>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> - This patch is based on top of the patch:
>>    xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
>>
>> diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
>> index f7c6dba3d21e..697bf3e4ad7e 100644
>> --- a/fs/xfs/xfs_pnfs.c
>> +++ b/fs/xfs/xfs_pnfs.c
>> @@ -118,7 +118,7 @@ xfs_fs_map_blocks(
>>   	struct xfs_bmbt_irec	imap;
>>   	xfs_fileoff_t		offset_fsb, end_fsb;
>>   	loff_t			limit;
>> -	int			bmapi_flags = XFS_BMAPI_ENTIRE;
>> +	int			bmapi_flags;
> Why not just replace the argument to xfs_bmapi_read with a constant
> zero?

This makes the code easier to understand since there is no comment
describing the meaning of the '0' argument passed toxfs_bmapi_read().

Thanks,
-Dai

>
> --D
>
>>   	int			nimaps = 1;
>>   	uint			lock_flags;
>>   	int			error = 0;
>> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>>   	offset_fsb = XFS_B_TO_FSBT(mp, offset);
>>   
>>   	lock_flags = xfs_ilock_data_map_shared(ip);
>> +	bmapi_flags = 0;	/* return map for requested range only */
>>   	error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>>   				&imap, &nimaps, bmapi_flags);
>>   	if (error) {
>> @@ -182,8 +183,7 @@ xfs_fs_map_blocks(
>>   
>>   	ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);
>>   
>> -	if (!error && write &&
>> -	    (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
>> +	if (write && (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
>>   		if (offset + length > XFS_ISIZE(ip))
>>   			end_fsb = xfs_iomap_eof_align_last_fsb(ip, end_fsb);
>>   		else if (nimaps && imap.br_startblock == HOLESTARTBLOCK)
>> -- 
>> 2.47.3
>>
>>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2026-05-12 19:21 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-12 17:21 [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET Dai Ngo
2026-05-12 17:34 ` Darrick J. Wong
2026-05-12 19:21   ` Dai Ngo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox