From: Trond Myklebust <trondmy@hammerspace.com>
To: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"schumaker.anna@gmail.com" <schumaker.anna@gmail.com>
Cc: "Anna.Schumaker@Netapp.com" <Anna.Schumaker@Netapp.com>
Subject: Re: [PATCH v1 1/1] NFS: Fix interrupted slots by sending a solo SEQUENCE operation
Date: Wed, 8 Jul 2020 15:59:59 +0000
Message-ID: <25e89e208bd3c6e44f8041d64c96be238b78c3b6.camel@hammerspace.com>
In-Reply-To: <20200708155018.110150-2-Anna.Schumaker@Netapp.com>
On Wed, 2020-07-08 at 11:50 -0400, schumaker.anna@gmail.com wrote:
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>
> We used to do this before 3453d5708b33, but this was changed to better
> handle the NFS4ERR_SEQ_MISORDERED error code. This commit fixed the slot
> re-use case when the server doesn't receive the interrupted operation,
> but if the server does receive the operation then it could still end up
> replying to the client with mis-matched operations from the reply cache.
>
> We can fix this by sending a SEQUENCE to the server while recovering from
> a SEQ_MISORDERED error when we detect that we are in an interrupted slot
> situation.
>
> Fixes: 3453d5708b33 (NFSv4.1: Avoid false retries when RPC calls are interrupted)
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
> fs/nfs/nfs4proc.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index e32717fd1169..5de41a5772f0 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -774,6 +774,14 @@ static void nfs4_slot_sequence_acked(struct nfs4_slot *slot,
> slot->seq_nr_last_acked = seqnr;
> }
>
> +static void nfs4_probe_sequence(struct nfs_client *client, const struct cred *cred,
> + struct nfs4_slot *slot)
> +{
> + struct rpc_task *task = _nfs41_proc_sequence(client, cred, slot, true);
> + if (!IS_ERR(task))
> + rpc_wait_for_completion_task(task);
Hmm... I am a little concerned about the wait here, since we don't know
what kind of thread this is.

Any chance we could kick off a _nfs41_proc_sequence asynchronously, and
then perhaps requeue the original task to wait for the next free slot?
I suppose one issue there would be if the original task is itself an
earlier call to _nfs41_proc_sequence, but perhaps that can be worked
around?
> +}
> +
> static int nfs41_sequence_process(struct rpc_task *task,
> struct nfs4_sequence_res *res)
> {
> @@ -790,6 +798,7 @@ static int nfs41_sequence_process(struct rpc_task *task,
> goto out;
>
> session = slot->table->session;
> + clp = session->clp;
>
> trace_nfs4_sequence_done(session, res);
>
> @@ -804,7 +813,6 @@ static int nfs41_sequence_process(struct rpc_task *task,
> nfs4_slot_sequence_acked(slot, slot->seq_nr);
> /* Update the slot's sequence and clientid lease timer */
> slot->seq_done = 1;
> - clp = session->clp;
> do_renew_lease(clp, res->sr_timestamp);
> /* Check sequence flags */
> nfs41_handle_sequence_flag_errors(clp, res->sr_status_flags,
> @@ -852,10 +860,15 @@ static int nfs41_sequence_process(struct rpc_task *task,
> /*
> * Were one or more calls using this slot interrupted?
> * If the server never received the request, then our
> - * transmitted slot sequence number may be too high.
> + * transmitted slot sequence number may be too high. However,
> + * if the server did receive the request then it might
> + * accidentally give us a reply with a mismatched operation.
> + * We can sort this out by sending a lone sequence operation
> + * to the server on the same slot.
> */
> if ((s32)(slot->seq_nr - slot->seq_nr_last_acked) > 1) {
> slot->seq_nr--;
> + nfs4_probe_sequence(clp, task->tk_msg.rpc_cred, slot);
> goto retry_nowait;
> }
> /*
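
The signed comparison in the last hunk is easier to follow in isolation.
What follows is a minimal standalone sketch, not kernel code: struct
demo_slot and demo_slot_was_interrupted() are hypothetical names that copy
only the two nfs4_slot fields the check reads, and the kernel's s32 is
modelled with int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct nfs4_slot, reduced to the two
 * fields that the interrupted-slot check in the patch looks at.
 */
struct demo_slot {
	uint32_t seq_nr;            /* sequence number of the request just sent */
	uint32_t seq_nr_last_acked; /* last sequence number the server acked */
};

/*
 * Same condition as in the quoted hunk. The subtraction is done in
 * unsigned 32-bit arithmetic, so it wraps cleanly; the result is then
 * read as a signed value (two's complement assumed, as on every
 * platform Linux supports). A gap larger than 1 means at least one
 * earlier call on this slot was never acknowledged.
 */
static bool demo_slot_was_interrupted(const struct demo_slot *slot)
{
	return (int32_t)(slot->seq_nr - slot->seq_nr_last_acked) > 1;
}
```

With seq_nr = 5 and seq_nr_last_acked = 4 the gap is 1 and the check is
false. With seq_nr = 6 it is true, and it stays true across wraparound,
for example seq_nr = 1 with seq_nr_last_acked = 0xffffffff.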
--
Trond Myklebust
CTO, Hammerspace Inc
4984 El Camino Real, Suite 208
Los Altos, CA 94022
www.hammer.space