From: Jeff Layton <jlayton@poochiereds.net>
To: Trond Myklebust <trondmy@primarydata.com>,
Anna Schumaker <Anna.Schumaker@netapp.com>
Cc: Thomas Haynes <loghyr@primarydata.com>,
linux-nfs@vger.kernel.org, hch@lst.de
Subject: [PATCH v4 00/13] pnfs: layout pipelining and related fixes
Date: Tue, 17 May 2016 12:28:35 -0400
Message-ID: <1463502528-11519-1-git-send-email-jeff.layton@primarydata.com>
v4:
- fix error handling in send_layoutget/pnfs_update_layout
v3:
- move more of the LAYOUTGET error handling out of the state machine
- better tracepoints
- cleanup and fixes in pnfs_layout_process
- patches to prevent client from flipping to in-band I/O
- bugfixes
v2:
- rework of LAYOUTGET retry handling.
This is v4 of the set, with a fix for the error handling bug that Trond
spotted in send_layoutget. It also fixes a minor comment typo that Tom
spotted.
Original cover letter from the RFC patchset follows:
--------------------------[snip]----------------------------
At Primary Data, one of the things we're most interested in is data
mobility. IOW, we want to be able to change the layout for an inode
seamlessly, with little interruption to I/O patterns.
The problem we have now is that CB_LAYOUTRECALLs interrupt I/O. When one
comes in, most pNFS servers refuse to hand out new layouts until the
recalled ones have been returned (or the client indicates that it no
longer knows about them). It doesn't have to be this way though. RFC5661
allows for concurrent LAYOUTGET and LAYOUTRETURN calls.
Furthermore, servers are expected to deal with old stateids in
LAYOUTRETURN. From RFC5661, section 18.44.3:
    If the client returns the layout in response to a CB_LAYOUTRECALL
    where the lor_recalltype field of the clora_recall field was
    LAYOUTRECALL4_FILE, the client should use the lor_stateid value from
    CB_LAYOUTRECALL as the value for lrf_stateid. Otherwise, it should
    use logr_stateid (from a previous LAYOUTGET result) or lorr_stateid
    (from a previous LAYRETURN result). This is done to indicate the
    point in time (in terms of layout stateid transitions) when the
    recall was sent.
The way I'm interpreting this is that we can treat a LAYOUTRETURN with
an old stateid as returning all layouts that matched the given iomode,
at the time that that seqid was current.
With that, we can allow a LAYOUTGET on the same fh to proceed even when
there are still recalled layouts outstanding. This should allow the
client to pivot to a new layout while it's still draining I/Os
that are pinning the ones to be returned.
This patchset is a first draft of the client side piece that allows
this. Basically whenever we get a new layout segment, we'll tag it with
the seqid that was in the LAYOUTGET stateid that grants it.
When a CB_LAYOUTRECALL comes in, we tag the return seqid in the layout
header with the one that was in the request. When we do a LAYOUTRETURN
in response to a CB_LAYOUTRECALL, we craft the seqid such that we're
only returning the layouts that were recalled. Nothing that has been
granted since then will be returned.
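The tagging scheme described above can be sketched in userspace C roughly as follows. This is only an illustration, not the code from the patches: struct lseg_sketch, seqid_is_newer() and mark_recalled() are hypothetical names, and treating a segment granted at exactly the recalled seqid as covered by the recall is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a layout segment. */
struct lseg_sketch {
	uint32_t granted_seq;	/* seqid from the LAYOUTGET stateid that granted it */
	bool returned;		/* set once the segment is marked for return */
};

/*
 * Serial-number style comparison so that seqid wraparound is tolerated.
 * Returns true if a is newer than b. Relies on the usual two's-complement
 * conversion from uint32_t to int32_t.
 */
static bool seqid_is_newer(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(a - b) > 0;
}

/*
 * Mark only the segments that were granted at or before recall_seq.
 * Anything granted after the recall was sent is left alone.
 * Returns the number of segments newly marked.
 */
static size_t mark_recalled(struct lseg_sketch *segs, size_t n,
			    uint32_t recall_seq)
{
	size_t marked = 0;

	assert(segs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (segs[i].returned)
			continue;
		if (seqid_is_newer(segs[i].granted_seq, recall_seq))
			continue;	/* granted after the recall: keep it */
		segs[i].returned = true;
		marked++;
	}
	return marked;
}
```

With segments granted at seqids 5, 7 and 9 and a recall carrying seqid 7, this sketch would mark the first two for return and leave the third usable for new I/O.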
I think I've done this in a way that the existing behavior is preserved
in the case where the server enforces the serialization of these
operations, but please do have a look and let me know if you see any
potential problems here. Testing this is still a WIP...
Jeff Layton (10):
  pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit
    set
  pnfs: record sequence in pnfs_layout_segment when it's created
  pnfs: keep track of the return sequence number in pnfs_layout_hdr
  pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args
  flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED
  flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds
  pnfs: fix bad error handling in send_layoutget
  pnfs: lift retry logic from send_layoutget to pnfs_update_layout
  pnfs: rework LAYOUTGET retry handling
  pnfs: make pnfs_layout_process more robust
Tom Haynes (2):
  pNFS/flexfiles: When checking for available DSes, conditionally check
    for MDS io
  pNFS/flexfiles: When initing reads or writes, we might have to retry
    connecting to DSes
Trond Myklebust (1):
  pNFS/flexfile: Fix erroneous fall back to read/write through the MDS
 fs/nfs/callback_proc.c                    |   3 +-
 fs/nfs/flexfilelayout/flexfilelayout.c    |  63 ++++---
 fs/nfs/flexfilelayout/flexfilelayout.h    |   1 +
 fs/nfs/flexfilelayout/flexfilelayoutdev.c |  32 +++-
 fs/nfs/nfs42proc.c                        |   2 +-
 fs/nfs/nfs4proc.c                         | 120 +++++-------
 fs/nfs/nfs4trace.h                        |  10 +-
 fs/nfs/pnfs.c                             | 300 +++++++++++++++++-------------
 fs/nfs/pnfs.h                             |  14 +-
 include/linux/errno.h                     |   1 +
 include/linux/nfs4.h                      |   2 +
 include/linux/nfs_xdr.h                   |   2 -
 12 files changed, 299 insertions(+), 251 deletions(-)
--
2.5.5
Thread overview: 19+ messages
2016-05-17 16:28 Jeff Layton [this message]
2016-05-17 16:28 ` [PATCH v4 01/13] pNFS/flexfile: Fix erroneous fall back to read/write through the MDS Jeff Layton
2016-05-17 16:28 ` [PATCH v4 02/13] pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io Jeff Layton
2016-05-17 16:28 ` [PATCH v4 03/13] pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes Jeff Layton
2016-05-17 16:28 ` [PATCH v4 04/13] pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set Jeff Layton
2016-05-17 16:28 ` [PATCH v4 05/13] pnfs: record sequence in pnfs_layout_segment when it's created Jeff Layton
2016-05-17 16:28 ` [PATCH v4 06/13] pnfs: keep track of the return sequence number in pnfs_layout_hdr Jeff Layton
2016-05-17 16:28 ` [PATCH v4 07/13] pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args Jeff Layton
2016-05-17 16:28 ` [PATCH v4 08/13] flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED Jeff Layton
2016-05-17 16:28 ` [PATCH v4 09/13] flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds Jeff Layton
2016-05-17 16:28 ` [PATCH v4 10/13] pnfs: fix bad error handling in send_layoutget Jeff Layton
2016-05-17 16:28 ` [PATCH v4 11/13] pnfs: lift retry logic from send_layoutget to pnfs_update_layout Jeff Layton
2016-05-17 16:28 ` [PATCH v4 12/13] pnfs: rework LAYOUTGET retry handling Jeff Layton
2016-06-28 12:10 ` Andrew W Elble
2016-06-28 12:22 ` Jeff Layton
2016-06-28 12:53 ` Andrew W Elble
2016-06-28 12:55 ` Trond Myklebust
2016-06-28 13:09 ` Jeff Layton
2016-05-17 16:28 ` [PATCH v4 13/13] pnfs: make pnfs_layout_process more robust Jeff Layton