* [PATCH 0/2] NFSv4/flexfiles: fix unwanted in-band IO fallback from DS to MDS
@ 2026-06-04 20:24 Mike Snitzer
  2026-06-04 20:24 ` [PATCH 1/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors Mike Snitzer
  2026-06-04 20:24 ` [PATCH 2/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS in pg_get_mirror_count_write Mike Snitzer
  0 siblings, 2 replies; 4+ messages in thread
From: Mike Snitzer @ 2026-06-04 20:24 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

If the FF_FLAGS_NO_IO_THRU_MDS flag is set, NFSv4 flexfiles should
never fall back to issuing IO directly to the MDS.

These patches fix 2 locations where FF_FLAGS_NO_IO_THRU_MDS should
inform the flexfiles client's behavior but isn't considered.

A Hammerspace test that made the time between layout return and layout
get much longer widened the exposure for unwanted in-band IO to be
issued by flexfiles. That test made it possible to develop these fixes,
and it now passes.

All review appreciated.

Thanks,
Mike

Mike Snitzer (2):
  NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors
  NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS in pg_get_mirror_count_write

 fs/nfs/flexfilelayout/flexfilelayout.c | 29 ++++++++++++++++++++++++++
 fs/nfs/flexfilelayout/flexfilelayout.h | 16 ++++++++++++++
 2 files changed, 45 insertions(+)

-- 
2.44.0

^ permalink raw reply	[flat|nested] 4+ messages in thread
* [PATCH 1/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors
  2026-06-04 20:24 [PATCH 0/2] NFSv4/flexfiles: fix unwanted in-band IO fallback from DS to MDS Mike Snitzer
@ 2026-06-04 20:24 ` Mike Snitzer
  2026-06-09 17:14   ` Mkrtchyan, Tigran
  2026-06-04 20:24 ` [PATCH 2/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS in pg_get_mirror_count_write Mike Snitzer
  1 sibling, 1 reply; 4+ messages in thread
From: Mike Snitzer @ 2026-06-04 20:24 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Commit f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS
errors") teaches ff_layout_{read,write}_pagelist() to return
PNFS_NOT_ATTEMPTED when nfs4_ff_layout_prepare_ds() fails with a
nfs_error_is_fatal() errno (e.g. -ETIMEDOUT from a SOFTCONN connect
deadline, -ENOMEM, -ERESTARTSYS), so that the client gives up instead
of spinning. pnfs_do_{read,write}() then dispatches the I/O through
pnfs_{read,write}_through_mds() → nfs_pageio_reset_{read,write}_mds().

That fallback is unconditional and silently violates FF_FLAGS_NO_IO_THRU_MDS:
when the layout segment carries the flag (typically single-mirror
appliance layouts where MDS I/O is explicitly forbidden), the
out_failed: path's `&& !ds_fatal_error` clause overrides the flag's
short-circuit through ff_layout_avoid_mds_available_ds() and routes
the I/O to the MDS file handle anyway.

This is reachable in practice during a data-server restart: SOFTCONN
exhaustion produces -ETIMEDOUT, which is fatal per nfs_error_is_fatal(),
which triggers PNFS_NOT_ATTEMPTED, which silently goes to MDS.

Preserve the upstream "don't spin on fatal errors" intent for layouts
that permit MDS fallback. For layouts with FF_FLAGS_NO_IO_THRU_MDS
set, mark the layout for return and request PNFS_TRY_AGAIN instead;
if the server cannot supply a usable layout the failure now surfaces
cleanly via pnfs_update_layout(), rather than via silent MDS I/O that
contradicts the flag.

Fixes: f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors")
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/flexfilelayout/flexfilelayout.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 4d142f1fdf61a..38bcd260e0a91 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -2204,6 +2204,14 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
 out_failed:
 	if (ff_layout_avoid_mds_available_ds(lseg) && !ds_fatal_error)
 		return PNFS_TRY_AGAIN;
+	if (ff_layout_no_fallback_to_mds(lseg)) {
+		/*
+		 * FF_FLAGS_NO_IO_THRU_MDS: force fresh LAYOUTGET,
+		 * never fall through to MDS I/O.
+		 */
+		pnfs_error_mark_layout_for_return(hdr->inode, lseg);
+		return PNFS_TRY_AGAIN;
+	}
 	trace_pnfs_mds_fallback_read_pagelist(hdr->inode,
 		hdr->args.offset, hdr->args.count,
 		IOMODE_READ, NFS_I(hdr->inode)->layout, lseg);
@@ -2289,6 +2297,14 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
 out_failed:
 	if (ff_layout_avoid_mds_available_ds(lseg) && !ds_fatal_error)
 		return PNFS_TRY_AGAIN;
+	if (ff_layout_no_fallback_to_mds(lseg)) {
+		/*
+		 * FF_FLAGS_NO_IO_THRU_MDS: force fresh LAYOUTGET,
+		 * never fall through to MDS I/O.
+		 */
+		pnfs_error_mark_layout_for_return(hdr->inode, lseg);
+		return PNFS_TRY_AGAIN;
+	}
 	trace_pnfs_mds_fallback_write_pagelist(hdr->inode,
 		hdr->args.offset, hdr->args.count,
 		IOMODE_RW, NFS_I(hdr->inode)->layout, lseg);
-- 
2.44.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread
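As a reading aid for the out_failed: change in patch 1/2, the decision order
can be sketched in plain user-space C. This is only a model under stated
assumptions, not kernel code: enum model_try_status, struct model_lseg and
model_out_failed() are hypothetical names, and the two booleans stand in for
the results of ff_layout_avoid_mds_available_ds() and
ff_layout_no_fallback_to_mds() on the real lseg.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PNFS_* try-status results. */
enum model_try_status {
	MODEL_NOT_ATTEMPTED,	/* caller redirects the I/O to the MDS */
	MODEL_TRY_AGAIN,	/* caller retries through pNFS */
};

/* Hypothetical, simplified view of a flexfiles layout segment. */
struct model_lseg {
	bool no_io_thru_mds;		/* FF_FLAGS_NO_IO_THRU_MDS is set */
	bool avoid_mds_available_ds;	/* ff_layout_avoid_mds_available_ds() result */
	bool marked_for_return;		/* layout was marked for return */
};

/* Models the out_failed: tail of ff_layout_{read,write}_pagelist(). */
static enum model_try_status
model_out_failed(struct model_lseg *lseg, bool ds_fatal_error)
{
	assert(lseg);

	/* Pre-existing check: skipped whenever the DS error is fatal. */
	if (lseg->avoid_mds_available_ds && !ds_fatal_error)
		return MODEL_TRY_AGAIN;

	/* Check added by the patch: the flag wins even on fatal errors. */
	if (lseg->no_io_thru_mds) {
		lseg->marked_for_return = true;
		return MODEL_TRY_AGAIN;
	}

	/* Only layouts that permit MDS fallback reach this point. */
	return MODEL_NOT_ATTEMPTED;
}
```

In this model a fatal DS error on a layout without the flag still yields
MODEL_NOT_ATTEMPTED, which matches the commit message's stated intent of
keeping the "don't spin on fatal errors" behavior for layouts that permit MDS
fallback.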
* Re: [PATCH 1/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors
  2026-06-04 20:24 ` [PATCH 1/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors Mike Snitzer
@ 2026-06-09 17:14   ` Mkrtchyan, Tigran
  0 siblings, 0 replies; 4+ messages in thread
From: Mkrtchyan, Tigran @ 2026-06-09 17:14 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 3853 bytes --]

Just a heads-up: I built the proposed patches on top of 7.1.0-rc6 and
ran a bunch of tests against the dCache NFS server using the flexfiles
layout with tightly coupled NFSv4.1 DSes. No smoking guns detected.

Thanks, Mike.

Best,
   Tigran.

----- Original Message -----
> From: "Mike Snitzer" <snitzer@kernel.org>
> To: "Trond Myklebust" <trondmy@kernel.org>, "Anna Schumaker" <anna@kernel.org>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> Sent: Thursday, 4 June, 2026 22:24:02
> Subject: [PATCH 1/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors

> Commit f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS
> errors") teaches ff_layout_{read,write}_pagelist() to return
> PNFS_NOT_ATTEMPTED when nfs4_ff_layout_prepare_ds() fails with a
> nfs_error_is_fatal() errno (e.g. -ETIMEDOUT from a SOFTCONN connect
> deadline, -ENOMEM, -ERESTARTSYS), so that the client gives up instead
> of spinning. pnfs_do_{read,write}() then dispatches the I/O through
> pnfs_{read,write}_through_mds() → nfs_pageio_reset_{read,write}_mds().
> 
> That fallback is unconditional and silently violates FF_FLAGS_NO_IO_THRU_MDS:
> when the layout segment carries the flag (typically single-mirror
> appliance layouts where MDS I/O is explicitly forbidden), the
> out_failed: path's `&& !ds_fatal_error` clause overrides the flag's
> short-circuit through ff_layout_avoid_mds_available_ds() and routes
> the I/O to the MDS file handle anyway.
> 
> This is reachable in practice during a data-server restart: SOFTCONN
> exhaustion produces -ETIMEDOUT, which is fatal per nfs_error_is_fatal(),
> which triggers PNFS_NOT_ATTEMPTED, which silently goes to MDS.
> 
> Preserve the upstream "don't spin on fatal errors" intent for layouts
> that permit MDS fallback. For layouts with FF_FLAGS_NO_IO_THRU_MDS
> set, mark the layout for return and request PNFS_TRY_AGAIN instead;
> if the server cannot supply a usable layout the failure now surfaces
> cleanly via pnfs_update_layout(), rather than via silent MDS I/O that
> contradicts the flag.
> 
> Fixes: f06bedfa62d5 ("pNFS/flexfiles: don't attempt pnfs on fatal DS errors")
> Assisted-by: Claude:claude-opus-4-7
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/flexfilelayout/flexfilelayout.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
> 
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c
> b/fs/nfs/flexfilelayout/flexfilelayout.c
> index 4d142f1fdf61a..38bcd260e0a91 100644
> --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> @@ -2204,6 +2204,14 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
> out_failed:
> 	if (ff_layout_avoid_mds_available_ds(lseg) && !ds_fatal_error)
> 		return PNFS_TRY_AGAIN;
> +	if (ff_layout_no_fallback_to_mds(lseg)) {
> +		/*
> +		 * FF_FLAGS_NO_IO_THRU_MDS: force fresh LAYOUTGET,
> +		 * never fall through to MDS I/O.
> +		 */
> +		pnfs_error_mark_layout_for_return(hdr->inode, lseg);
> +		return PNFS_TRY_AGAIN;
> +	}
> 	trace_pnfs_mds_fallback_read_pagelist(hdr->inode,
> 		hdr->args.offset, hdr->args.count,
> 		IOMODE_READ, NFS_I(hdr->inode)->layout, lseg);
> @@ -2289,6 +2297,14 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int
> sync)
> out_failed:
> 	if (ff_layout_avoid_mds_available_ds(lseg) && !ds_fatal_error)
> 		return PNFS_TRY_AGAIN;
> +	if (ff_layout_no_fallback_to_mds(lseg)) {
> +		/*
> +		 * FF_FLAGS_NO_IO_THRU_MDS: force fresh LAYOUTGET,
> +		 * never fall through to MDS I/O.
> +		 */
> +		pnfs_error_mark_layout_for_return(hdr->inode, lseg);
> +		return PNFS_TRY_AGAIN;
> +	}
> 	trace_pnfs_mds_fallback_write_pagelist(hdr->inode,
> 		hdr->args.offset, hdr->args.count,
> 		IOMODE_RW, NFS_I(hdr->inode)->layout, lseg);
> --
> 2.44.0

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 2309 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread
* [PATCH 2/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS in pg_get_mirror_count_write
  2026-06-04 20:24 [PATCH 0/2] NFSv4/flexfiles: fix unwanted in-band IO fallback from DS to MDS Mike Snitzer
  2026-06-04 20:24 ` [PATCH 1/2] NFSv4/flexfiles: honor FF_FLAGS_NO_IO_THRU_MDS on fatal DS connect errors Mike Snitzer
@ 2026-06-04 20:24 ` Mike Snitzer
  1 sibling, 0 replies; 4+ messages in thread
From: Mike Snitzer @ 2026-06-04 20:24 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

The FF_FLAGS_NO_IO_THRU_MDS flag lives on each lseg, so any fallback
decision made when there is no current lseg (e.g. between LAYOUTRETURN
and the next LAYOUTGET) cannot run the per-lseg check.

Introduce a sticky hdr-level mirror of FF_FLAGS_NO_IO_THRU_MDS in
struct nfs4_flexfile_layout::flags (the NFS4_FF_HDR_NO_IO_THRU_MDS bit),
set whenever ff_layout_alloc_lseg() parses an lseg with the flag. The
bit is never cleared for the lifetime of the layout hdr; the server is
assumed to be consistent in its no-fallback policy per file. kzalloc()
in ff_layout_alloc_layout_hdr() zero-initializes the field.

Use the new ff_layout_hdr_no_fallback_to_mds() helper to gate
ff_layout_pg_get_mirror_count_write(): when pnfs_update_layout()
returns NULL (e.g. NFS_LAYOUT_BULK_RECALL, pnfs_layout_io_test_failed,
pnfs_layoutgets_blocked) the existing code unconditionally calls
nfs_pageio_reset_write_mds(). This is a source of unwanted WRITEs to
the MDS. Fix it by checking the NFS4_FF_HDR_NO_IO_THRU_MDS bit and, if
it is set, surfacing -EAGAIN instead; the writepage-side caller
(nfs_do_writepage() for buffered, nfs_direct_write_reschedule() for
O_DIRECT) then redirties the request so writeback retries via pNFS.

Fixes: 260074cd8413 ("pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS")
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/flexfilelayout/flexfilelayout.c | 13 +++++++++++++
 fs/nfs/flexfilelayout/flexfilelayout.h | 16 ++++++++++++++++
 2 files changed, 29 insertions(+)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 38bcd260e0a91..a63f90be11dfd 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -636,6 +636,9 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
 	if (!p)
 		goto out_sort_mirrors;
 	fls->flags = be32_to_cpup(p);
+	if (fls->flags & FF_FLAGS_NO_IO_THRU_MDS)
+		set_bit(NFS4_FF_HDR_NO_IO_THRU_MDS,
+			&FF_LAYOUT_FROM_HDR(lh)->flags);
 
 	p = xdr_inline_decode(&stream, 4);
 	if (!p)
@@ -1185,6 +1188,16 @@ ff_layout_pg_get_mirror_count_write(struct nfs_pageio_descriptor *pgio,
 			0, NFS4_MAX_UINT64, IOMODE_RW,
 			NFS_I(pgio->pg_inode)->layout,
 			pgio->pg_lseg);
+	if (NFS_I(pgio->pg_inode)->layout &&
+	    ff_layout_hdr_no_fallback_to_mds(NFS_I(pgio->pg_inode)->layout)) {
+		/*
+		 * FF_FLAGS_NO_IO_THRU_MDS: no current lseg but the server's
+		 * policy forbids MDS fallback. Surface -EAGAIN so writeback
+		 * retries rather than silently issuing the WRITE via MDS.
+		 */
+		pgio->pg_error = -EAGAIN;
+		goto out;
+	}
 	/* no lseg means that pnfs is not in use, so no mirroring here */
 	nfs_pageio_reset_write_mds(pgio);
 out:
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index 17a008c8e97ce..a5bd00f69e824 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -112,12 +112,16 @@ struct nfs4_ff_layout_segment {
 	struct nfs4_ff_layout_mirror	*mirror_array[] __counted_by(mirror_array_cnt);
 };
 
+/* nfs4_flexfile_layout::flags bit indices */
+#define NFS4_FF_HDR_NO_IO_THRU_MDS	0	/* any lseg has had FF_FLAGS_NO_IO_THRU_MDS */
+
 struct nfs4_flexfile_layout {
 	struct pnfs_layout_hdr generic_hdr;
 	struct pnfs_ds_commit_info commit_info;
 	struct list_head	mirrors;
 	struct list_head	error_list; /* nfs4_ff_layout_ds_err */
 	ktime_t			last_report_time; /* Layoutstat report times */
+	unsigned long		flags;
 };
 
 struct nfs4_flexfile_layoutreturn_args {
@@ -184,6 +188,18 @@ ff_layout_no_fallback_to_mds(struct pnfs_layout_segment *lseg)
 	return FF_LAYOUT_LSEG(lseg)->flags & FF_FLAGS_NO_IO_THRU_MDS;
 }
 
+/*
+ * Sticky hdr-level mirror of FF_FLAGS_NO_IO_THRU_MDS so callers that have
+ * no current lseg (e.g. between LAYOUTRETURN and the next LAYOUTGET) can
+ * still honor the no-MDS-fallback policy.
+ */
+static inline bool
+ff_layout_hdr_no_fallback_to_mds(struct pnfs_layout_hdr *lo)
+{
+	return test_bit(NFS4_FF_HDR_NO_IO_THRU_MDS,
+			&FF_LAYOUT_FROM_HDR(lo)->flags);
+}
+
 static inline bool
 ff_layout_no_read_on_rw(struct pnfs_layout_segment *lseg)
 {
-- 
2.44.0

^ permalink raw reply related	[flat|nested] 4+ messages in thread
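The sticky header bit and the -EAGAIN gate from patch 2/2 can be modelled
outside the kernel as follows. This is a hedged sketch, not the patch itself:
every model_* and MODEL_* name is hypothetical, the flag value 0x2 is only an
illustrative constant, and plain bit arithmetic replaces the kernel's atomic
set_bit()/test_bit() helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative per-lseg flag value (assumption; see the real headers). */
#define MODEL_FF_FLAGS_NO_IO_THRU_MDS	0x2u
/* Bit index in the header's flags word, as in the patch. */
#define MODEL_HDR_NO_IO_THRU_MDS	0

/* Hypothetical stand-in for struct nfs4_flexfile_layout. */
struct model_layout_hdr {
	unsigned long flags;	/* zero-initialized, never cleared */
};

/* Models the hook added to ff_layout_alloc_lseg(): latch the policy. */
static void
model_alloc_lseg(struct model_layout_hdr *hdr, unsigned int lseg_flags)
{
	assert(hdr != NULL);
	if (lseg_flags & MODEL_FF_FLAGS_NO_IO_THRU_MDS)
		hdr->flags |= 1UL << MODEL_HDR_NO_IO_THRU_MDS;
}

/* Models ff_layout_hdr_no_fallback_to_mds(). */
static bool
model_hdr_no_fallback_to_mds(const struct model_layout_hdr *hdr)
{
	return (hdr->flags >> MODEL_HDR_NO_IO_THRU_MDS) & 1UL;
}

/*
 * Models the no-lseg branch of ff_layout_pg_get_mirror_count_write():
 * returns -EAGAIN when the latched policy forbids MDS fallback, or 0
 * when the write may be reset to go through the MDS.
 */
static int
model_write_without_lseg(const struct model_layout_hdr *hdr)
{
	if (hdr != NULL && model_hdr_no_fallback_to_mds(hdr))
		return -EAGAIN;
	return 0;
}
```

Because the bit is only ever set, a later lseg without the flag does not
re-enable MDS fallback in this model, mirroring the commit message's
assumption that the server's no-fallback policy is consistent per file.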