From: Jeff Layton <jlayton@poochiereds.net>
To: Andrew W Elble <aweits@rit.edu>
Cc: Trond Myklebust <trondmy@primarydata.com>,
Anna Schumaker <Anna.Schumaker@netapp.com>,
Thomas Haynes <loghyr@primarydata.com>,
linux-nfs@vger.kernel.org, hch@lst.de
Subject: Re: [PATCH v4 12/13] pnfs: rework LAYOUTGET retry handling
Date: Tue, 28 Jun 2016 08:22:20 -0400 [thread overview]
Message-ID: <1467116540.32374.5.camel@poochiereds.net> (raw)
In-Reply-To: <m260stv7hp.fsf@discipline.rit.edu>
On Tue, 2016-06-28 at 08:10 -0400, Andrew W Elble wrote:
> Jeff Layton <jlayton@poochiereds.net> writes:
>
> >
> > There are several problems in the way a stateid is selected for a
> > LAYOUTGET operation:
> >
> > We pick a stateid to use in the RPC prepare op, but that makes
> > it difficult to serialize LAYOUTGETs that use the open stateid. That
> > serialization is done in pnfs_update_layout, which occurs well before
> > the rpc_prepare operation.
> >
> > Between those two events, the i_lock is dropped and reacquired.
> > pnfs_update_layout can find that the list has lsegs in it and not do any
> > serialization, but then later pnfs_choose_layoutget_stateid ends up
> > choosing the open stateid.
> >
> > This patch changes the client to select the stateid to use in the
> > LAYOUTGET earlier, when we're searching for a usable layout segment.
> > This way we can do it all while holding the i_lock the first time, and
> > ensure that we serialize any LAYOUTGET call that uses a non-layout
> > stateid.
> >
> > This also means a rework of how LAYOUTGET replies are handled, as we
> > must now get the latest stateid if we want to retransmit in response
> > to a retryable error.
> >
> > Most of those errors boil down to the fact that the layout state has
> > changed in some fashion. Thus, what we really want to do is to re-search
> > for a layout when it fails with a retryable error, so that we can avoid
> > reissuing the RPC at all if possible.
> >
> > While the LAYOUTGET RPC is async, the initiating thread always waits for
> > it to complete, so it's effectively synchronous anyway. Currently, when
> > we need to retry a LAYOUTGET because of an error, we drive that retry
> > via the rpc state machine.
> >
> > This means that once the call has been submitted, it runs until it
> > completes. So, we must move the error handling for this RPC out of the
> > rpc_call_done operation and into the caller.
> >
> > In order to handle errors like NFS4ERR_DELAY properly, we must also
> > pass a pointer to the sliding timeout, which is now moved to the stack
> > in pnfs_update_layout.
> >
> > The complicating errors are -NFS4ERR_RECALLCONFLICT and
> > -NFS4ERR_LAYOUTTRYLATER, as those involve a timeout after which we give
> > up and return NULL back to the caller. So, there is some special
> > handling for those errors to ensure that the layers driving the retries
> > can handle that appropriately.
> >
> > Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> > ---
> > fs/nfs/nfs4proc.c | 115 ++++++++++++++++----------------------
> > fs/nfs/nfs4trace.h | 10 +++-
> > fs/nfs/pnfs.c | 144 +++++++++++++++++++++++++-----------------------
> > fs/nfs/pnfs.h | 6 +-
> > include/linux/errno.h | 1 +
> > include/linux/nfs4.h | 2 +
> > include/linux/nfs_xdr.h | 2 -
> > 7 files changed, 136 insertions(+), 144 deletions(-)
> >
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index c0d75be8cb69..c2583ca6c8b6 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -416,6 +416,7 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
> > case -NFS4ERR_DELAY:
> > nfs_inc_server_stats(server, NFSIOS_DELAY);
> > case -NFS4ERR_GRACE:
> > + case -NFS4ERR_RECALLCONFLICT:
> > exception->delay = 1;
> > return 0;
> >
> > @@ -7824,40 +7825,34 @@ nfs4_layoutget_prepare(struct rpc_task *task, void *calldata)
> > struct nfs4_layoutget *lgp = calldata;
> > struct nfs_server *server = NFS_SERVER(lgp->args.inode);
> > struct nfs4_session *session = nfs4_get_session(server);
> > - int ret;
> >
> > dprintk("--> %s\n", __func__);
> > - /* Note the is a race here, where a CB_LAYOUTRECALL can come in
> > - * right now covering the LAYOUTGET we are about to send.
> > - * However, that is not so catastrophic, and there seems
> > - * to be no way to prevent it completely.
> > - */
> > - if (nfs41_setup_sequence(session, &lgp->args.seq_args,
> > - &lgp->res.seq_res, task))
> > - return;
> > - ret = pnfs_choose_layoutget_stateid(&lgp->args.stateid,
> > - NFS_I(lgp->args.inode)->layout,
> > - &lgp->args.range,
> > - lgp->args.ctx->state);
> > - if (ret < 0)
> > - rpc_exit(task, ret);
> > + nfs41_setup_sequence(session, &lgp->args.seq_args,
> > + &lgp->res.seq_res, task);
> > + dprintk("<-- %s\n", __func__);
> > }
> >
> > static void nfs4_layoutget_done(struct rpc_task *task, void *calldata)
> > {
> > struct nfs4_layoutget *lgp = calldata;
> > +
> > + dprintk("--> %s\n", __func__);
> > + nfs41_sequence_done(task, &lgp->res.seq_res);
> > + dprintk("<-- %s\n", __func__);
> > +}
> > +
> > +static int
> > +nfs4_layoutget_handle_exception(struct rpc_task *task,
> > + struct nfs4_layoutget *lgp, struct nfs4_exception *exception)
> > +{
> > struct inode *inode = lgp->args.inode;
> > struct nfs_server *server = NFS_SERVER(inode);
> > struct pnfs_layout_hdr *lo;
> > - struct nfs4_state *state = NULL;
> > - unsigned long timeo, now, giveup;
> > + int status = task->tk_status;
> >
> > dprintk("--> %s tk_status => %d\n", __func__, -task->tk_status);
> >
> > - if (!nfs41_sequence_done(task, &lgp->res.seq_res))
> > - goto out;
> > -
> > - switch (task->tk_status) {
> > + switch (status) {
> > case 0:
> > goto out;
> >
> > @@ -7867,57 +7862,43 @@ static void nfs4_layoutget_done(struct rpc_task *task, void *calldata)
> > * retry go inband.
> > */
> > case -NFS4ERR_LAYOUTUNAVAILABLE:
> > - task->tk_status = -ENODATA;
> > + status = -ENODATA;
> > goto out;
> > /*
> > * NFS4ERR_BADLAYOUT means the MDS cannot return a layout of
> > * length lgp->args.minlength != 0 (see RFC5661 section 18.43.3).
> > */
> > case -NFS4ERR_BADLAYOUT:
> > - goto out_overflow;
> > + status = -EOVERFLOW;
> > + goto out;
> > /*
> > * NFS4ERR_LAYOUTTRYLATER is a conflict with another client
> > * (or clients) writing to the same RAID stripe except when
> > * the minlength argument is 0 (see RFC5661 section 18.43.3).
> > + *
> > + * Treat it like we would RECALLCONFLICT -- we retry for a little
> > + * while, and then eventually give up.
> > */
> > case -NFS4ERR_LAYOUTTRYLATER:
> > - if (lgp->args.minlength == 0)
> > - goto out_overflow;
> > - /*
> > - * NFS4ERR_RECALLCONFLICT is when conflict with self (must recall
> > - * existing layout before getting a new one).
> > - */
> > - case -NFS4ERR_RECALLCONFLICT:
> > - timeo = rpc_get_timeout(task->tk_client);
> > - giveup = lgp->args.timestamp + timeo;
> > - now = jiffies;
> > - if (time_after(giveup, now)) {
> > - unsigned long delay;
> > -
> > - /* Delay for:
> > - * - Not less then NFS4_POLL_RETRY_MIN.
> > - * - One last time a jiffie before we give up
> > - * - exponential backoff (time_now minus start_attempt)
> > - */
> > - delay = max_t(unsigned long, NFS4_POLL_RETRY_MIN,
> > - min((giveup - now - 1),
> > - now - lgp->args.timestamp));
> > -
> > - dprintk("%s: NFS4ERR_RECALLCONFLICT waiting %lu\n",
> > - __func__, delay);
> > - rpc_delay(task, delay);
> > - /* Do not call nfs4_async_handle_error() */
> > - goto out_restart;
> > + if (lgp->args.minlength == 0) {
> > + status = -EOVERFLOW;
> > + goto out;
> > }
> > - break;
> > + /* Fallthrough */
> > + case -NFS4ERR_RECALLCONFLICT:
> > + nfs4_handle_exception(server, -NFS4ERR_RECALLCONFLICT,
> > + exception);
> > + status = -ERECALLCONFLICT;
> > + goto out;
> > case -NFS4ERR_EXPIRED:
> > case -NFS4ERR_BAD_STATEID:
> > + exception->timeout = 0;
> > spin_lock(&inode->i_lock);
> > if (nfs4_stateid_match(&lgp->args.stateid,
> > &lgp->args.ctx->state->stateid)) {
> > spin_unlock(&inode->i_lock);
> > /* If the open stateid was bad, then recover it. */
> > - state = lgp->args.ctx->state;
> > + exception->state = lgp->args.ctx->state;
> > break;
> > }
> > lo = NFS_I(inode)->layout;
> > @@ -7935,20 +7916,16 @@ static void nfs4_layoutget_done(struct rpc_task *task, void *calldata)
> > pnfs_free_lseg_list(&head);
> > } else
> > spin_unlock(&inode->i_lock);
> > - goto out_restart;
> > + status = -EAGAIN;
> > + goto out;
> > }
> > - if (nfs4_async_handle_error(task, server, state, &lgp->timeout) == -EAGAIN)
> > - goto out_restart;
> > +
> > + status = nfs4_handle_exception(server, status, exception);
> > + if (exception->retry)
> > + status = -EAGAIN;
> > out:
> > dprintk("<-- %s\n", __func__);
> > - return;
> > -out_restart:
> > - task->tk_status = 0;
> > - rpc_restart_call_prepare(task);
> > - return;
> > -out_overflow:
> > - task->tk_status = -EOVERFLOW;
> > - goto out;
> > + return status;
> > }
> >
> > static size_t max_response_pages(struct nfs_server *server)
> > @@ -8017,7 +7994,7 @@ static const struct rpc_call_ops nfs4_layoutget_call_ops = {
> > };
> >
> > struct pnfs_layout_segment *
> > -nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags)
> > +nfs4_proc_layoutget(struct nfs4_layoutget *lgp, long *timeout, gfp_t gfp_flags)
> > {
> > struct inode *inode = lgp->args.inode;
> > struct nfs_server *server = NFS_SERVER(inode);
> > @@ -8037,6 +8014,7 @@ nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags)
> > .flags = RPC_TASK_ASYNC,
> > };
> > struct pnfs_layout_segment *lseg = NULL;
> > + struct nfs4_exception exception = { .timeout = *timeout };
> > int status = 0;
> >
> > dprintk("--> %s\n", __func__);
> > @@ -8050,7 +8028,6 @@ nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags)
> > return ERR_PTR(-ENOMEM);
> > }
> > lgp->args.layout.pglen = max_pages * PAGE_SIZE;
> > - lgp->args.timestamp = jiffies;
> >
> > lgp->res.layoutp = &lgp->args.layout;
> > lgp->res.seq_res.sr_slot = NULL;
> > @@ -8060,13 +8037,17 @@ nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags)
> > if (IS_ERR(task))
> > return ERR_CAST(task);
> > status = nfs4_wait_for_completion_rpc_task(task);
> > - if (status == 0)
> > - status = task->tk_status;
> > + if (status == 0) {
> > + status = nfs4_layoutget_handle_exception(task, lgp, &exception);
> > + *timeout = exception.timeout;
> > + }
> > +
> > trace_nfs4_layoutget(lgp->args.ctx,
> > &lgp->args.range,
> > &lgp->res.range,
> > &lgp->res.stateid,
> > status);
> > +
> > /* if layoutp->len is 0, nfs4_layoutget_prepare called rpc_exit */
> > if (status == 0 && lgp->res.layoutp->len)
> > lseg = pnfs_layout_process(lgp);
> > diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
> > index 2c8d05dae5b1..9c150b153782 100644
> > --- a/fs/nfs/nfs4trace.h
> > +++ b/fs/nfs/nfs4trace.h
> > @@ -1520,6 +1520,8 @@ DEFINE_NFS4_INODE_EVENT(nfs4_layoutreturn_on_close);
> > { PNFS_UPDATE_LAYOUT_FOUND_CACHED, "found cached" }, \
> > { PNFS_UPDATE_LAYOUT_RETURN, "layoutreturn" }, \
> > { PNFS_UPDATE_LAYOUT_BLOCKED, "layouts blocked" }, \
> > + { PNFS_UPDATE_LAYOUT_INVALID_OPEN, "invalid open" }, \
> > + { PNFS_UPDATE_LAYOUT_RETRY, "retrying" }, \
> > { PNFS_UPDATE_LAYOUT_SEND_LAYOUTGET, "sent layoutget" })
> >
> > TRACE_EVENT(pnfs_update_layout,
> > @@ -1528,9 +1530,10 @@ TRACE_EVENT(pnfs_update_layout,
> > u64 count,
> > enum pnfs_iomode iomode,
> > struct pnfs_layout_hdr *lo,
> > + struct pnfs_layout_segment *lseg,
> > enum pnfs_update_layout_reason reason
> > ),
> > - TP_ARGS(inode, pos, count, iomode, lo, reason),
> > + TP_ARGS(inode, pos, count, iomode, lo, lseg, reason),
> > TP_STRUCT__entry(
> > __field(dev_t, dev)
> > __field(u64, fileid)
> > @@ -1540,6 +1543,7 @@ TRACE_EVENT(pnfs_update_layout,
> > __field(enum pnfs_iomode, iomode)
> > __field(int, layoutstateid_seq)
> > __field(u32, layoutstateid_hash)
> > + __field(long, lseg)
> > __field(enum pnfs_update_layout_reason, reason)
> > ),
> > TP_fast_assign(
> > @@ -1559,11 +1563,12 @@ TRACE_EVENT(pnfs_update_layout,
> > __entry->layoutstateid_seq = 0;
> > __entry->layoutstateid_hash = 0;
> > }
> > + __entry->lseg = (long)lseg;
> > ),
> > TP_printk(
> > "fileid=%02x:%02x:%llu fhandle=0x%08x "
> > "iomode=%s pos=%llu count=%llu "
> > - "layoutstateid=%d:0x%08x (%s)",
> > + "layoutstateid=%d:0x%08x lseg=0x%lx (%s)",
> > MAJOR(__entry->dev), MINOR(__entry->dev),
> > (unsigned long long)__entry->fileid,
> > __entry->fhandle,
> > @@ -1571,6 +1576,7 @@ TRACE_EVENT(pnfs_update_layout,
> > (unsigned long long)__entry->pos,
> > (unsigned long long)__entry->count,
> > __entry->layoutstateid_seq, __entry->layoutstateid_hash,
> > + __entry->lseg,
> > show_pnfs_update_layout_reason(__entry->reason)
> > )
> > );
> > diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> > index bc2c3ec98d32..e97895b427c0 100644
> > --- a/fs/nfs/pnfs.c
> > +++ b/fs/nfs/pnfs.c
> > @@ -796,45 +796,18 @@ pnfs_layoutgets_blocked(const struct pnfs_layout_hdr *lo)
> > test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags);
> > }
> >
> > -int
> > -pnfs_choose_layoutget_stateid(nfs4_stateid *dst, struct pnfs_layout_hdr *lo,
> > - const struct pnfs_layout_range *range,
> > - struct nfs4_state *open_state)
> > -{
> > - int status = 0;
> > -
> > - dprintk("--> %s\n", __func__);
> > - spin_lock(&lo->plh_inode->i_lock);
> > - if (pnfs_layoutgets_blocked(lo)) {
> > - status = -EAGAIN;
> > - } else if (!nfs4_valid_open_stateid(open_state)) {
> > - status = -EBADF;
> > - } else if (list_empty(&lo->plh_segs) ||
> > - test_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags)) {
> > - int seq;
> > -
> > - do {
> > - seq = read_seqbegin(&open_state->seqlock);
> > - nfs4_stateid_copy(dst, &open_state->stateid);
> > - } while (read_seqretry(&open_state->seqlock, seq));
> > - } else
> > - nfs4_stateid_copy(dst, &lo->plh_stateid);
> > - spin_unlock(&lo->plh_inode->i_lock);
> > - dprintk("<-- %s\n", __func__);
> > - return status;
> > -}
> > -
> > /*
> > -* Get layout from server.
> > -* for now, assume that whole file layouts are requested.
> > -* arg->offset: 0
> > -* arg->length: all ones
> > -*/
> > + * Get layout from server.
> > + * for now, assume that whole file layouts are requested.
> > + * arg->offset: 0
> > + * arg->length: all ones
> > + */
> > static struct pnfs_layout_segment *
> > send_layoutget(struct pnfs_layout_hdr *lo,
> > struct nfs_open_context *ctx,
> > + nfs4_stateid *stateid,
> > const struct pnfs_layout_range *range,
> > - gfp_t gfp_flags)
> > + long *timeout, gfp_t gfp_flags)
> > {
> > struct inode *ino = lo->plh_inode;
> > struct nfs_server *server = NFS_SERVER(ino);
> > @@ -868,10 +841,11 @@ send_layoutget(struct pnfs_layout_hdr *lo,
> > lgp->args.type = server->pnfs_curr_ld->id;
> > lgp->args.inode = ino;
> > lgp->args.ctx = get_nfs_open_context(ctx);
> > + nfs4_stateid_copy(&lgp->args.stateid, stateid);
> > lgp->gfp_flags = gfp_flags;
> > lgp->cred = lo->plh_lc_cred;
> >
> > - return nfs4_proc_layoutget(lgp, gfp_flags);
> > + return nfs4_proc_layoutget(lgp, timeout, gfp_flags);
> > }
> >
> > static void pnfs_clear_layoutcommit(struct inode *inode,
> > @@ -1511,27 +1485,30 @@ pnfs_update_layout(struct inode *ino,
> > .offset = pos,
> > .length = count,
> > };
> > - unsigned pg_offset;
> > + unsigned pg_offset, seq;
> > struct nfs_server *server = NFS_SERVER(ino);
> > struct nfs_client *clp = server->nfs_client;
> > - struct pnfs_layout_hdr *lo;
> > + struct pnfs_layout_hdr *lo = NULL;
> > struct pnfs_layout_segment *lseg = NULL;
> > + nfs4_stateid stateid;
> > + long timeout = 0;
> > + unsigned long giveup = jiffies + rpc_get_timeout(server->client);
> > bool first;
> >
> > if (!pnfs_enabled_sb(NFS_SERVER(ino))) {
> > - trace_pnfs_update_layout(ino, pos, count, iomode, NULL,
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > PNFS_UPDATE_LAYOUT_NO_PNFS);
> > goto out;
> > }
> >
> > if (iomode == IOMODE_READ && i_size_read(ino) == 0) {
> > - trace_pnfs_update_layout(ino, pos, count, iomode, NULL,
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > PNFS_UPDATE_LAYOUT_RD_ZEROLEN);
> > goto out;
> > }
> >
> > if (pnfs_within_mdsthreshold(ctx, ino, iomode)) {
> > - trace_pnfs_update_layout(ino, pos, count, iomode, NULL,
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > PNFS_UPDATE_LAYOUT_MDSTHRESH);
> > goto out;
> > }
> > @@ -1542,14 +1519,14 @@ lookup_again:
> > lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);
> > if (lo == NULL) {
> > spin_unlock(&ino->i_lock);
> > - trace_pnfs_update_layout(ino, pos, count, iomode, NULL,
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > PNFS_UPDATE_LAYOUT_NOMEM);
> > goto out;
> > }
> >
> > /* Do we even need to bother with this? */
> > if (test_bit(NFS_LAYOUT_BULK_RECALL, &lo->plh_flags)) {
> > - trace_pnfs_update_layout(ino, pos, count, iomode, lo,
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > PNFS_UPDATE_LAYOUT_BULK_RECALL);
> > dprintk("%s matches recall, use MDS\n", __func__);
> > goto out_unlock;
> > @@ -1557,14 +1534,34 @@ lookup_again:
> >
> > /* if LAYOUTGET already failed once we don't try again */
> > if (pnfs_layout_io_test_failed(lo, iomode)) {
> > - trace_pnfs_update_layout(ino, pos, count, iomode, lo,
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > PNFS_UPDATE_LAYOUT_IO_TEST_FAIL);
> > goto out_unlock;
> > }
> >
> > - first = list_empty(&lo->plh_segs);
> > - if (first) {
> > - /* The first layoutget for the file. Need to serialize per
> > + lseg = pnfs_find_lseg(lo, &arg);
> > + if (lseg) {
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > + PNFS_UPDATE_LAYOUT_FOUND_CACHED);
> > + goto out_unlock;
> > + }
> > +
> > + if (!nfs4_valid_open_stateid(ctx->state)) {
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > + PNFS_UPDATE_LAYOUT_INVALID_OPEN);
> > + goto out_unlock;
> > + }
> > +
> > + /*
> > + * Choose a stateid for the LAYOUTGET. If we don't have a layout
> > + * stateid, or it has been invalidated, then we must use the open
> > + * stateid.
> > + */
> > + if (lo->plh_stateid.seqid == 0 ||
> > + test_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags)) {
> > +
> > + /*
> > + * The first layoutget for the file. Need to serialize per
> > * RFC 5661 Errata 3208.
> > */
> > if (test_and_set_bit(NFS_LAYOUT_FIRST_LAYOUTGET,
> > @@ -1573,18 +1570,17 @@ lookup_again:
> > wait_on_bit(&lo->plh_flags, NFS_LAYOUT_FIRST_LAYOUTGET,
> > TASK_UNINTERRUPTIBLE);
> > pnfs_put_layout_hdr(lo);
> > + dprintk("%s retrying\n", __func__);
> > goto lookup_again;
> > }
> > +
> > + first = true;
> > + do {
> > + seq = read_seqbegin(&ctx->state->seqlock);
> > + nfs4_stateid_copy(&stateid, &ctx->state->stateid);
> > + } while (read_seqretry(&ctx->state->seqlock, seq));
> > } else {
> > - /* Check to see if the layout for the given range
> > - * already exists
> > - */
> > - lseg = pnfs_find_lseg(lo, &arg);
> > - if (lseg) {
> > - trace_pnfs_update_layout(ino, pos, count, iomode, lo,
> > - PNFS_UPDATE_LAYOUT_FOUND_CACHED);
> > - goto out_unlock;
> > - }
> > + nfs4_stateid_copy(&stateid, &lo->plh_stateid);
> > }
> >
> > /*
> > @@ -1599,15 +1595,17 @@ lookup_again:
> > pnfs_clear_first_layoutget(lo);
> > pnfs_put_layout_hdr(lo);
> > dprintk("%s retrying\n", __func__);
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo,
> > + lseg, PNFS_UPDATE_LAYOUT_RETRY);
> > goto lookup_again;
> > }
> > - trace_pnfs_update_layout(ino, pos, count, iomode, lo,
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > PNFS_UPDATE_LAYOUT_RETURN);
> > goto out_put_layout_hdr;
> > }
> >
> > if (pnfs_layoutgets_blocked(lo)) {
> > - trace_pnfs_update_layout(ino, pos, count, iomode, lo,
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > PNFS_UPDATE_LAYOUT_BLOCKED);
> > goto out_unlock;
> > }
> > @@ -1632,26 +1630,36 @@ lookup_again:
> > if (arg.length != NFS4_MAX_UINT64)
> > arg.length = PAGE_ALIGN(arg.length);
> >
> > - lseg = send_layoutget(lo, ctx, &arg, gfp_flags);
> > + lseg = send_layoutget(lo, ctx, &stateid, &arg, &timeout, gfp_flags);
> > + trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
> > + PNFS_UPDATE_LAYOUT_SEND_LAYOUTGET);
> > if (IS_ERR(lseg)) {
> > - if (lseg == ERR_PTR(-EAGAIN)) {
> > + switch(PTR_ERR(lseg)) {
> > + case -ERECALLCONFLICT:
> > + if (time_after(jiffies, giveup))
> > + lseg = NULL;
> > + /* Fallthrough */
> > + case -EAGAIN:
> > + pnfs_put_layout_hdr(lo);
> > if (first)
> > pnfs_clear_first_layoutget(lo);
> > - pnfs_put_layout_hdr(lo);
> > - goto lookup_again;
> > - }
> > -
> > - if (!nfs_error_is_fatal(PTR_ERR(lseg))) {
> > - pnfs_layout_clear_fail_bit(lo, pnfs_iomode_to_fail_bit(iomode));
> > - lseg = NULL;
> > + if (lseg) {
> > + trace_pnfs_update_layout(ino, pos, count,
> > + iomode, lo, lseg, PNFS_UPDATE_LAYOUT_RETRY);
> > + goto lookup_again;
> > + }
> > + /* Fallthrough */
> > + default:
> > + if (!nfs_error_is_fatal(PTR_ERR(lseg))) {
> > + pnfs_layout_clear_fail_bit(lo, pnfs_iomode_to_fail_bit(iomode));
> > + lseg = NULL;
> > + }
> Seems like in the past, a non-fatal-error used to trigger the opposite
> behavior, where this would set the fail_bit? Shouldn't that be the
> behavior for -NFS4ERR_LAYOUTUNAVAILABLE (which is mapped to -ENODATA)
> etc...?
>
Yes, and I think that was a bug. See commit
d03ab29dbbe905c6c7c5abd78e02547fa954ec07 for where that actually
changed.
You do have a good point about LAYOUTUNAVAILABLE though. That probably
should stop further attempts to get a layout. That said, the error
mapping here is fiendishly complex. I always have to wonder whether
there other errors that get turned into ENODATA? It may be safest to
change the error mapping there to something more specific.
> >
> > }
> > } else {
> > pnfs_layout_clear_fail_bit(lo, pnfs_iomode_to_fail_bit(iomode));
> > }
> >
> > atomic_dec(&lo->plh_outstanding);
> > - trace_pnfs_update_layout(ino, pos, count, iomode, lo,
> > - PNFS_UPDATE_LAYOUT_SEND_LAYOUTGET);
> > out_put_layout_hdr:
> > if (first)
> > pnfs_clear_first_layoutget(lo);
> > diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> > index 971068b58647..f9f3331bef49 100644
> > --- a/fs/nfs/pnfs.h
> > +++ b/fs/nfs/pnfs.h
> > @@ -228,7 +228,7 @@ extern void pnfs_unregister_layoutdriver(struct pnfs_layoutdriver_type *);
> > extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
> > struct pnfs_device *dev,
> > struct rpc_cred *cred);
> > -extern struct pnfs_layout_segment* nfs4_proc_layoutget(struct nfs4_layoutget *lgp, gfp_t gfp_flags);
> > +extern struct pnfs_layout_segment* nfs4_proc_layoutget(struct nfs4_layoutget *lgp, long *timeout, gfp_t gfp_flags);
> > extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync);
> >
> > /* pnfs.c */
> > @@ -260,10 +260,6 @@ void pnfs_put_layout_hdr(struct pnfs_layout_hdr *lo);
> > void pnfs_set_layout_stateid(struct pnfs_layout_hdr *lo,
> > const nfs4_stateid *new,
> > bool update_barrier);
> > -int pnfs_choose_layoutget_stateid(nfs4_stateid *dst,
> > - struct pnfs_layout_hdr *lo,
> > - const struct pnfs_layout_range *range,
> > - struct nfs4_state *open_state);
> > int pnfs_mark_matching_lsegs_invalid(struct pnfs_layout_hdr *lo,
> > struct list_head *tmp_list,
> > const struct pnfs_layout_range *recall_range,
> > diff --git a/include/linux/errno.h b/include/linux/errno.h
> > index 89627b9187f9..7ce9fb1b7d28 100644
> > --- a/include/linux/errno.h
> > +++ b/include/linux/errno.h
> > @@ -28,5 +28,6 @@
> > #define EBADTYPE 527 /* Type not supported by server */
> > #define EJUKEBOX 528 /* Request initiated, but will not complete before timeout */
> > #define EIOCBQUEUED 529 /* iocb queued, will get completion event */
> > +#define ERECALLCONFLICT 530 /* conflict with recalled state */
> >
> > #endif
> > diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
> > index 011433478a14..f4870a330290 100644
> > --- a/include/linux/nfs4.h
> > +++ b/include/linux/nfs4.h
> > @@ -621,7 +621,9 @@ enum pnfs_update_layout_reason {
> > PNFS_UPDATE_LAYOUT_IO_TEST_FAIL,
> > PNFS_UPDATE_LAYOUT_FOUND_CACHED,
> > PNFS_UPDATE_LAYOUT_RETURN,
> > + PNFS_UPDATE_LAYOUT_RETRY,
> > PNFS_UPDATE_LAYOUT_BLOCKED,
> > + PNFS_UPDATE_LAYOUT_INVALID_OPEN,
> > PNFS_UPDATE_LAYOUT_SEND_LAYOUTGET,
> > };
> >
> > diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> > index cb9982d8f38f..a4cb8a33ae2c 100644
> > --- a/include/linux/nfs_xdr.h
> > +++ b/include/linux/nfs_xdr.h
> > @@ -233,7 +233,6 @@ struct nfs4_layoutget_args {
> > struct inode *inode;
> > struct nfs_open_context *ctx;
> > nfs4_stateid stateid;
> > - unsigned long timestamp;
> > struct nfs4_layoutdriver_data layout;
> > };
> >
> > @@ -251,7 +250,6 @@ struct nfs4_layoutget {
> > struct nfs4_layoutget_res res;
> > struct rpc_cred *cred;
> > gfp_t gfp_flags;
> > - long timeout;
> > };
> >
> > struct nfs4_getdeviceinfo_args {
--
Jeff Layton <jlayton@poochiereds.net>
next prev parent reply other threads:[~2016-06-28 12:22 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-05-17 16:28 [PATCH v4 00/13] pnfs: layout pipelining and related fixes Jeff Layton
2016-05-17 16:28 ` [PATCH v4 01/13] pNFS/flexfile: Fix erroneous fall back to read/write through the MDS Jeff Layton
2016-05-17 16:28 ` [PATCH v4 02/13] pNFS/flexfiles: When checking for available DSes, conditionally check for MDS io Jeff Layton
2016-05-17 16:28 ` [PATCH v4 03/13] pNFS/flexfiles: When initing reads or writes, we might have to retry connecting to DSes Jeff Layton
2016-05-17 16:28 ` [PATCH v4 04/13] pnfs: don't merge new ff lsegs with ones that have LAYOUTRETURN bit set Jeff Layton
2016-05-17 16:28 ` [PATCH v4 05/13] pnfs: record sequence in pnfs_layout_segment when it's created Jeff Layton
2016-05-17 16:28 ` [PATCH v4 06/13] pnfs: keep track of the return sequence number in pnfs_layout_hdr Jeff Layton
2016-05-17 16:28 ` [PATCH v4 07/13] pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args Jeff Layton
2016-05-17 16:28 ` [PATCH v4 08/13] flexfiles: remove pointless setting of NFS_LAYOUT_RETURN_REQUESTED Jeff Layton
2016-05-17 16:28 ` [PATCH v4 09/13] flexfiles: add kerneldoc header to nfs4_ff_layout_prepare_ds Jeff Layton
2016-05-17 16:28 ` [PATCH v4 10/13] pnfs: fix bad error handling in send_layoutget Jeff Layton
2016-05-17 16:28 ` [PATCH v4 11/13] pnfs: lift retry logic from send_layoutget to pnfs_update_layout Jeff Layton
2016-05-17 16:28 ` [PATCH v4 12/13] pnfs: rework LAYOUTGET retry handling Jeff Layton
2016-06-28 12:10 ` Andrew W Elble
2016-06-28 12:22 ` Jeff Layton [this message]
2016-06-28 12:53 ` Andrew W Elble
2016-06-28 12:55 ` Trond Myklebust
2016-06-28 13:09 ` Jeff Layton
2016-05-17 16:28 ` [PATCH v4 13/13] pnfs: make pnfs_layout_process more robust Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1467116540.32374.5.camel@poochiereds.net \
--to=jlayton@poochiereds.net \
--cc=Anna.Schumaker@netapp.com \
--cc=aweits@rit.edu \
--cc=hch@lst.de \
--cc=linux-nfs@vger.kernel.org \
--cc=loghyr@primarydata.com \
--cc=trondmy@primarydata.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).