From: schumaker.anna@gmail.com
To: linux-nfs@vger.kernel.org
Cc: Anna.Schumaker@Netapp.com
Subject: [PATCH v2 0/1] NFS: Fix -EREMOTEIO error on interrupted slots
Date: Thu,  9 Jul 2020 14:05:44 -0400
Message-ID: <20200709180545.903715-1-Anna.Schumaker@Netapp.com>

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

The scenario is as follows:
 - The client attempts to remove a file on the server, but the remove is
   interrupted AFTER the server receives it.
 - At the same time, another thread removes the same file on the server
   before NFSD has a chance to remove it.
 - The client then attempts another NFS operation with the same slot.

Because another thread removed the file, the VFS returns -ENOENT to NFSD,
which causes NFSD to reply to the next operation on the same slot with
the result of the REMOVE (even if we asked for an OPEN). The client
detects the mismatched operations during decoding and returns
-EREMOTEIO to the application.
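
To make the failure mode concrete, here is a toy model of a slot's reply
cache. It is a sketch only, not NFSD or client code: every toy_* name is
hypothetical, and it assumes the follow-up request reuses the slot's
sequence ID because the interrupted REMOVE was never acknowledged. It also
ignores out-of-order sequence IDs, which a real server would reject.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121 /* Linux value, used only if errno.h lacks it */
#endif

/* Hypothetical, simplified opcodes; not the kernel's nfs_opnum4 values. */
enum toy_op { TOY_OP_NONE, TOY_OP_REMOVE, TOY_OP_OPEN };

/* One session slot as the toy server sees it. */
struct toy_slot {
	uint32_t seqid;        /* seqid of the last request executed */
	enum toy_op cached_op; /* operation whose reply is cached */
	int cached_status;     /* status stored with that reply */
};

struct toy_reply {
	enum toy_op op;
	int status;
};

/* Server side: execute a new request, or replay the cache on a repeated seqid. */
struct toy_reply toy_server_handle(struct toy_slot *slot, uint32_t seqid,
				   enum toy_op op, int exec_status)
{
	struct toy_reply reply;

	assert(slot != NULL);
	if (seqid == slot->seqid) {
		/* Same seqid: treated as a retransmission, so replay the cache. */
		reply.op = slot->cached_op;
		reply.status = slot->cached_status;
		return reply;
	}
	/* New seqid: run the operation and cache its result. */
	slot->seqid = seqid;
	slot->cached_op = op;
	slot->cached_status = exec_status;
	reply.op = op;
	reply.status = exec_status;
	return reply;
}

/* Client side: decoding fails if the reply is for a different operation. */
int toy_client_decode(enum toy_op sent, struct toy_reply reply)
{
	if (reply.op != sent)
		return -EREMOTEIO;
	return reply.status;
}
```

Sending a REMOVE that fails with -ENOENT and then an OPEN with the same
seqid makes toy_server_handle() hand back the cached REMOVE reply, and
toy_client_decode() turns that mismatch into -EREMOTEIO.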

The race is hard to hit naturally, so I added a 3-second sleep to
nfsd4_remove() just before the call to nfsd_unlink():

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a09c35f0f6f0..bd93be50eaa8 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -851,6 +851,8 @@ nfsd4_remove(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 
 	if (opens_in_grace(SVC_NET(rqstp)))
 		return nfserr_grace;
+
+	ssleep(3);
 	status = nfsd_unlink(rqstp, &cstate->current_fh, 0,
 			     remove->rm_name, remove->rm_namelen);
 	if (!status) {


I'm able to hit this every time using the following script combined with
the artificial delay on the server:

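#!/bin/bash
SERVER=192.168.111.200
SERVER_DIR=/srv/test
CLIENT_DIR=/mnt/test

# Create the file on the server, then start removing it from the client.
ssh $SERVER "echo test > $SERVER_DIR/test1"
rm -v $CLIENT_DIR/test1 &
# Kill the client-side rm while the server is still inside the ssleep(3).
sleep 1
killall -9 rm
# Remove the file directly on the server, then issue another NFS operation.
ssh $SERVER "rm $SERVER_DIR/test1"
echo "test2" > $CLIENT_DIR/test2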


I was able to solve the issue by sending a solo SEQUENCE on the same slot.
The server replies to this with NFS4ERR_SEQ_FALSE_RETRY instead of an
operation from the reply cache, and we are able to recover from there.
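
As a rough illustration of that recovery step, here is a client-side
sketch. It is not the code from the patch: the toy_* names, the callback,
and the choice to move past the sequence ID on NFS4ERR_SEQ_FALSE_RETRY are
all assumptions made for the example. Only the error name and its value
(10076, from RFC 5661) are real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value assigned to NFS4ERR_SEQ_FALSE_RETRY in RFC 5661. */
#define TOY_NFS4ERR_SEQ_FALSE_RETRY 10076

/* Hypothetical client-side view of a slot; not the kernel's struct nfs4_slot. */
struct toy_client_slot {
	uint32_t seq_nr;  /* sequence ID the next request will carry */
	bool interrupted; /* a request on this slot was killed before its reply */
};

/* Hypothetical callback: sends a lone SEQUENCE, returns the server status. */
typedef int (*toy_send_sequence_fn)(uint32_t seq_nr);

/* Stand-in for a server that has already cached a reply for this seqid. */
int toy_fake_false_retry(uint32_t seq_nr)
{
	(void)seq_nr;
	return TOY_NFS4ERR_SEQ_FALSE_RETRY;
}

/*
 * Probe an interrupted slot before reusing it for a real request.
 * Returns true once the slot is considered safe to use again.
 */
bool toy_probe_interrupted_slot(struct toy_client_slot *slot,
				toy_send_sequence_fn send_sequence)
{
	int status;

	assert(slot != NULL && send_sequence != NULL);
	if (!slot->interrupted)
		return true;

	status = send_sequence(slot->seq_nr);
	if (status == TOY_NFS4ERR_SEQ_FALSE_RETRY || status == 0) {
		/*
		 * Assumption: either the server already used this seqid for the
		 * interrupted request, or the probe itself just consumed it.
		 * In both cases the next request needs the following seqid.
		 */
		slot->seq_nr++;
		slot->interrupted = false;
		return true;
	}
	/* Any other status is left to the caller's normal error handling. */
	return false;
}
```

The point of probing with a lone SEQUENCE is that a mismatch shows up as a
sequence error on the probe rather than as a stale cached reply delivered
to an unrelated operation.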

Thoughts?
Anna

Anna Schumaker (1):
  NFS: Fix interrupted slots by sending a solo SEQUENCE operation

 fs/nfs/nfs4proc.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

-- 
2.27.0

