Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET

From: "Darrick J. Wong" <djwong@kernel.org>
To: Dai Ngo <dai.ngo@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	cem@kernel.org, linux-xfs@vger.kernel.org,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/1] xfs: fix overlapping extents returned for pNFS LAYOUTGET
Date: Wed, 13 May 2026 17:25:13 -0700
Message-ID: <20260514002513.GQ9555@frogsfrogsfrogs>
In-Reply-To: <06d9b1ae-e46f-459c-bcb4-1a5ca4ded4b0@oracle.com>

On Wed, May 13, 2026 at 10:28:31AM -0700, Dai Ngo wrote:
> Hi Christoph,
> 
> On 5/13/26 8:50 AM, Dai Ngo wrote:
> > 
> > On 5/13/26 12:01 AM, Christoph Hellwig wrote:
> > > On Tue, May 12, 2026 at 10:21:53AM -0700, Dai Ngo wrote:
> > > > A single LAYOUTGET request from the client can cause the server to
> > > > issue multiple calls to xfs_fs_map_blocks() for different offsets
> > > > within the same extent. Because of the XFS_BMAPI_ENTIRE flag,
> > > > these calls can produce overlapping mappings.
> > > > 
> > > > As a result, the LAYOUTGET reply sent to the NFS client may contain
> > > > overlapping extents. This creates ambiguity in extent selection for a
> > > > given file range, which can lead to incorrect device selection,
> > > > inconsistent handling of datastate, and ultimately data corruption or
> > > > protocol violations on the client side.
> > > Please also add a check to the client that catches this and doesn't
> > > use the layout that has extents outside the requested range. And maybe
> > > warn about it as well.
> > 
> > The returned extents cover exactly the range requested in the LAYOUTGET
> > op. However, these extents overlap. For example, here is the
> > on-the-wire capture of the LAYOUTGET operation and reply showing the
> > overlapping extents:
> > 
> >     Network File System, Ops(3): SEQUENCE, PUTFH, LAYOUTGET
> >         [Program Version: 4]
> >         [V4 Procedure: COMPOUND (1)]
> >         Tag: <EMPTY>
> >         minorversion: 2
> >         Operations (count: 3): SEQUENCE, PUTFH, LAYOUTGET
> >             Opcode: SEQUENCE (53)
> >             Opcode: PUTFH (22)
> >             Opcode: LAYOUTGET (50)
> >                 layout available?: No
> >                 layout type: LAYOUT4_SCSI (5)
> >                 IO mode: IOMODE_RW (2)
> >                 offset: 122880
> >                 length: 65536
> >                 min length: 4096
> >                 StateID
> >                 maxcount: 4096
> >         [Main Opcode: LAYOUTGET (50)]
> >         Network File System, Ops(3): SEQUENCE PUTFH LAYOUTGET
> >         [Program Version: 4]
> >         [V4 Procedure: COMPOUND (1)]
> >         Status: NFS4_OK (0)
> >         Tag: <EMPTY>
> >         Operations (count: 3)
> >             Opcode: SEQUENCE (53)
> >             Opcode: PUTFH (22)
> >             Opcode: LAYOUTGET (50)
> >                 Status: NFS4_OK (0)
> >                 return on close?: Yes
> >                 StateID
> >                 Layout Segment (count: 1)
> >                     offset: 122880
> >                     length: 77824
> >                     IO mode: IOMODE_RW (2)
> >                     layout type: LAYOUT4_SCSI (5)
> >                     SCSI Extents (count: 2)
> >                         extent 0
> >                             device ID: 01000000000000000000000000000000
> >                             file offset: 122880
> >                             length: 53248
> >                             volume offset: 339460096
> >                             extent state: INVALID_DATA (2)
> >                         extent 1
> >                             device ID: 01000000000000000000000000000000
> >                             file offset: 122880
> >                             length: 77824
> >                             volume offset: 339460096
> >                             extent state: INVALID_DATA (2)
> >         [Main Opcode: LAYOUTGET (50)]
> 
> After reviewing ext_tree_insert(), with assistance from Codex, I think
> this function handles overlapping extents properly. The only issue I see
> in ext_tree_insert() is the error code returned when kmemdup() fails:
> EINVAL instead of ENOMEM.
> 
> Since ext_tree_insert() seems to handle overlapping extents fine, do you
> think it's still worth fixing xfs_fs_map_blocks() so that it doesn't
> return overlapping extents?
> 
> IMHO, we should still fix xfs_fs_map_blocks() so that ext_tree_insert()
> doesn't incur the overhead and complication of handling overlapping
> extents.

I don't know enough about the NFS blocklayout code to say for sure, but
it seems like you want to upsert the mapping returned by
xfs_fs_map_blocks() into the "ext_tree", right?

And by "upsert" I mean "clear out any mappings for the (offset, length)
range, then insert the new mapping", sort of like what the fuse iomap
cache does:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tree/fs/fuse/fuse_iomap_cache.c?h=fuse-iomap-cache_2026-05-07#n1682

or, I guess, what the xfs scrub bitmap support code does when you set a
range:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/tree/fs/xfs/scrub/bitmap.c?h=fuse-iomap-cache_2026-05-07#n395
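
A rough userspace sketch of that upsert, in case it helps. None of this
is the blocklayout, fuse iomap, or scrub bitmap code: struct mapping,
struct map_cache, and the cache_*() helpers are made-up names, and a
small sorted array stands in for whatever tree the real code uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MAPPINGS 16

/* Hypothetical record: a file range backed by a volume range. */
struct mapping {
	uint64_t file_off;	/* start of the file range, in bytes */
	uint64_t len;		/* length, in bytes */
	uint64_t vol_off;	/* volume offset backing file_off */
};

/* Hypothetical cache: non-overlapping records sorted by file_off. */
struct map_cache {
	struct mapping m[MAX_MAPPINGS];
	size_t nr;
};

/* Open a slot at index pos and store *new there. */
static void cache_insert_at(struct map_cache *c, size_t pos,
			    const struct mapping *new)
{
	assert(pos <= c->nr && c->nr < MAX_MAPPINGS);
	memmove(&c->m[pos + 1], &c->m[pos],
		(c->nr - pos) * sizeof(c->m[0]));
	c->m[pos] = *new;
	c->nr++;
}

/* Drop everything in [off, off + len), trimming records at the edges. */
static void cache_clear_range(struct map_cache *c, uint64_t off,
			      uint64_t len)
{
	uint64_t end = off + len;
	size_t i = 0;

	while (i < c->nr) {
		struct mapping *m = &c->m[i];
		uint64_t m_end = m->file_off + m->len;

		if (m_end <= off || m->file_off >= end) {
			i++;				/* no overlap */
		} else if (m->file_off < off && m_end > end) {
			/* Record straddles both edges: split it in two. */
			struct mapping right = {
				end, m_end - end,
				m->vol_off + (end - m->file_off)
			};

			m->len = off - m->file_off;
			cache_insert_at(c, i + 1, &right);
			return;
		} else if (m->file_off < off) {
			m->len = off - m->file_off;	/* keep left part */
			i++;
		} else if (m_end > end) {
			/* Keep the right part, shifting the volume offset. */
			m->vol_off += end - m->file_off;
			m->len = m_end - end;
			m->file_off = end;
			i++;
		} else {
			/* Fully covered: delete the record. */
			memmove(m, m + 1, (c->nr - i - 1) * sizeof(*m));
			c->nr--;
		}
	}
}

/* Upsert: clear the new mapping's range, then insert it in order. */
int cache_upsert(struct map_cache *c, const struct mapping *new)
{
	size_t pos = 0;

	/* Worst case needs two free slots: one split plus the insert. */
	if (c->nr + 2 > MAX_MAPPINGS)
		return -1;

	cache_clear_range(c, new->file_off, new->len);
	while (pos < c->nr && c->m[pos].file_off < new->file_off)
		pos++;
	cache_insert_at(c, pos, new);
	return 0;
}
```

Feeding this the two extents from the capture quoted above (file offset
122880, lengths 53248 and 77824) leaves a single 77824-byte mapping,
because the second upsert clears the first before inserting.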

But as I said before, I don't know if "two mappings retrieved in rapid
succession that overlap" is actually an NFS error.
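
Error or not, the overlap itself is easy to confirm from the capture
quoted above: both extents start at file offset 122880, the first ends
at 122880 + 53248 = 176128 and the second at 122880 + 77824 = 200704.
A minimal sketch of that check, with struct ext and ext_overlap() as
made-up names rather than anything from the NFS or XFS code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extent: the half-open file range [file_off, file_off + len). */
struct ext {
	uint64_t file_off;
	uint64_t len;
};

/* Two half-open ranges overlap iff each starts before the other ends. */
static bool ext_overlap(const struct ext *a, const struct ext *b)
{
	return a->file_off < b->file_off + b->len &&
	       b->file_off < a->file_off + a->len;
}
```

Extents that merely touch (one ends exactly where the next begins) do
not count as overlapping under this definition.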

--D

> -Dai
> 
> > 
> > -Dai
> > 
> > > 
> > > > Also drop the check for (!error), since it was already checked
> > > > after the call to xfs_bmapi_read().
> > > > 
> > > > Fixes: cc6c40e09d7b1 ("NFSD/blocklayout: Support multiple extents per LAYOUTGET")
> > > > Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
> > > > ---
> > > >   fs/xfs/xfs_pnfs.c | 6 +++---
> > > >   1 file changed, 3 insertions(+), 3 deletions(-)
> > > > 
> > > > - This patch is based on top of the patch:
> > > >    xfs: fix use of uninitialized imap in xfs_fs_map_blocks error path
> > > The error changes should go into that patch, so please resend it with
> > > those fixes.  Maybe as a series with this patch, to keep them
> > > together.
> > > 
> > > > @@ -172,6 +172,7 @@ xfs_fs_map_blocks(
> > > >       offset_fsb = XFS_B_TO_FSBT(mp, offset);
> > > >         lock_flags = xfs_ilock_data_map_shared(ip);
> > > > +    bmapi_flags = 0;    /* return map for requested range only */
> > > Just remove the variable and hard code the 0 in the xfs_bmapi_read call.
> > > 
> > 
>
