From: Dai Ngo <dai.ngo@oracle.com>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: cem@kernel.org, linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
Date: Tue, 12 May 2026 12:21:07 -0700 [thread overview]
Message-ID: <bee02ca2-ca85-4098-b025-21532d1317e9@oracle.com> (raw)
In-Reply-To: <20260512173402.GO9555@frogsfrogsfrogs>
Thank you for your review!
On 5/12/26 10:34 AM, Darrick J. Wong wrote:
> On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
>> xfs_fs_map_blocks() currently passes XFS_BMAPI_ENTIRE to xfs_bmapi_read(),
>> which causes the bmap code to expand the mapping to cover the entire
>> extent rather than the requested range.
> Nitpicking: _ENTIRE causes bmapi_read to return the whole extent instead
> of trimming it down to the requested range.
Fix in v2.
>
>> A single LAYOUTGET request from the client can cause the server to
>> issue multiple calls to xfs_fs_map_blocks() for different offsets
>> within the same extent. Because the use of XFS_BMAPI_ENTIRE flag,
>> these calls can produce overlapping mappings.
>>
>> As a result, the LAYOUTGET reply sent to the NFS client may contain
>> overlapping extents. This creates ambiguity in extent selection for a
>> given file range, which can lead to incorrect device selection,
>> inconsistent handling of datastate, and ultimately data corruption or
>> protocol violations on the client side.
>>
>> Problem discovered with xfstest generic/075 test using NFSv4.2 mount
>> with SCSI layout.
> Might be helpful to provide an example of the request vs. the
> overlapping layouts. IIRC the client asks for a layout for the first
> 32 fsblocks of the file. On the first call to xfs_fs_map_blocks, block
> 0 is a single unwritten mapping, so that gets returned.
>
> Meanwhile, another thread fallocates block 2 and gets lucky in that an
> adjacent block is free, so the first mapping in the file is now 2
> unwritten fsblocks starting at 0. This can happen because we don't hold
> i_rwsem (or the ILOCK) between calls to ->map_blocks.
>
> Returning to the first thread, it calls xfs_fs_map_blocks again to map
> block 1. However, the mapping's been changed, so we now return the
> entire 2-fsblock mapping.
I'm not sure why the file map gets change between the successive calls
from the same thread that services the LAYOUTGET. This test does lots
of ALLOCATE and DEALLOCATE ops prior to the LAYOUTGET.
> What gets sent to the client is
>
> {.offset = 0, .length = 4096, .addr = X, .dev = Y},
> {.offset = 0, .length = 8192, .addr = X, .dev = Y},
Yes, I will include on-the-wire capture of a LAYOUTGET operation and
reply showing the overlapping extents in v2.
> and the client rejects that as overlapping. Right?
The Linux client uses a red-black tree to maintain these extents.
Overlapping extents can cause the client to select the wrong extent,
resulting in intermittent test failures.
>
>> Fix this by replacing the XFS_BMAPI_ENTIRE flag with '0' so that
>> xfs_bmapi_read() returns only the mapping for the requested range.
>>
>> Also drop the check for (!error) since it was checked after call to
>> xfs_bmapi_read().
> Cc: <stable@vger.kernel.org> # v6.19
Fix in v2.
>
>> Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET").
>> Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
>> ---
>> fs/xfs/xfs_pnfs.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> - This patch is based on top of the patch:
>> xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
>>
>> diff --git a/fs/xfs/xfs_pnfs.c b/fs/xfs/xfs_pnfs.c
>> index f7c6dba3d21e..697bf3e4ad7e 100644
>> --- a/fs/xfs/xfs_pnfs.c
>> +++ b/fs/xfs/xfs_pnfs.c
>> @@ -118,7 +118,7 @@ xfs_fs_map_blocks(
>> struct xfs_bmbt_irec imap;
>> xfs_fileoff_t offset_fsb, end_fsb;
>> loff_t limit;
>> - int bmapi_flags = XFS_BMAPI_ENTIRE;
>> + int bmapi_flags;
> Why not just replace the argument to xfs_bmapi_read with a constant
> zero?
This makes the code easier to understand since there is no comment
describing the meaning of the '0' argument passed toxfs_bmapi_read().
Thanks,
-Dai
>
> --D
>
>> int nimaps = 1;
>> uint lock_flags;
>> int error = 0;
>> @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
>> offset_fsb = XFS_B_TO_FSBT(mp, offset);
>>
>> lock_flags = xfs_ilock_data_map_shared(ip);
>> + bmapi_flags = 0; /* return map for requested range only */
>> error = xfs_bmapi_read(ip, offset_fsb, end_fsb - offset_fsb,
>> &imap, &nimaps, bmapi_flags);
>> if (error) {
>> @@ -182,8 +183,7 @@ xfs_fs_map_blocks(
>>
>> ASSERT(!nimaps || imap.br_startblock != DELAYSTARTBLOCK);
>>
>> - if (!error && write &&
>> - (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
>> + if (write && (!nimaps || imap.br_startblock == HOLESTARTBLOCK)) {
>> if (offset + length > XFS_ISIZE(ip))
>> end_fsb = xfs_iomap_eof_align_last_fsb(ip, end_fsb);
>> else if (nimaps && imap.br_startblock == HOLESTARTBLOCK)
>> --
>> 2.47.3
>>
>>
prev parent reply other threads:[~2026-05-12 19:21 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-12 17:21 [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET Dai Ngo
2026-05-12 17:34 ` Darrick J. Wong
2026-05-12 19:21 ` Dai Ngo [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=bee02ca2-ca85-4098-b025-21532d1317e9@oracle.com \
--to=dai.ngo@oracle.com \
--cc=cem@kernel.org \
--cc=djwong@kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox