linux-nfs.vger.kernel.org archive mirror
* NFS4 patch 08/20 (BAD_SEQID recovery)
From: Ben Taylor @ 2014-03-07  9:41 UTC (permalink / raw)
  To: linux-nfs

Hi

We've been getting weird occasional failures on our NFS systems where
our processing grid nodes gradually grind to a halt (we lose a couple
of machines a day, each requiring a reboot, or a hard reboot if left
long enough). Hunting through Wireshark dumps, we found that the NFS
client is making repeated requests to open the same file on our
fileserver, and every one has the same owner ID and a sequence ID of 0
(which the server rejects each time as a bad sequence ID). I've got a
dump I can give you if you want it.

I am convinced that the problem is that described in patch 08/20 from
Chuck Lever (see http://www.spinics.net/lists/linux-nfs/msg29413.html),
where in this case the client gets the same open owner ID from the
server and retries with that, which makes the server think it's the same
request and throw it out again. In that patch Chuck added a uniqifier to
the owner ID to avoid this problem.
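
To make the suspected failure mode concrete, here is a toy model in Python.
It is only a sketch under stated assumptions, not kernel code and not the
patch itself: ToyServer, client_open and the (base_owner, counter) owner
format are hypothetical names invented for illustration. It assumes the
server tracks one expected sequence ID per open owner and that a fresh
client state owner always starts at sequence ID 0.

```python
# Toy model of the retry loop. All names here are hypothetical.

class ToyServer:
    def __init__(self):
        # open owner -> sequence ID the server expects next
        self.next_seqid = {}

    def open(self, owner, seqid):
        # Assumption: an owner the server has never seen may start anywhere.
        expected = self.next_seqid.get(owner, seqid)
        if seqid != expected:
            return "NFS4ERR_BAD_SEQID"
        self.next_seqid[owner] = seqid + 1
        return "NFS4_OK"


def client_open(server, base_owner, use_uniquifier, max_tries=5):
    """Retry an OPEN, optionally changing the owner ID on every attempt."""
    results = []
    for attempt in range(max_tries):
        # Without a uniquifier, every retry presents the same owner ID.
        owner = (base_owner, attempt if use_uniquifier else 0)
        # A brand-new state owner on the client starts again at seqid 0.
        status = server.open(owner, 0)
        results.append(status)
        if status == "NFS4_OK":
            break
    return results
```

If the server still holds state for the owner and expects, say, sequence ID
7, the variant without a uniquifier is rejected on every attempt, which
matches what we see in the dumps. The variant with a uniquifier is rejected
once and then succeeds, because the second attempt looks like a new owner.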

The problem is that we can't find any kernel version that includes that
patch. An easy way to check is to look for the text "therefore safely
retry using a new one. We should still warn the user though...": if the
"warn the user" part is there, the code has not been patched (we did
check other bits of the patch too). We're running both Fedora 17 and
Fedora 19 at the moment (yes, I know 17 is EOL), neither of which
includes the patch. We also can't see it in the NFS client or server
trees at

http://git.linux-nfs.org/?p=trondmy/nfs-2.6.git;a=blob;f=fs/nfs/nfs4proc.c;h=2da6a698b8f7719c14eefec65e6148a48d030bb3;hb=HEAD#l2327

...and nor does Chuck appear to have it in his merging tree:
http://git.linux-nfs.org/?p=cel/cel-2.6.git;a=blob;f=fs/nfs/nfs4proc.c;h=15052b81df4245e4f797adb0d0b2e523338b23cc;hb=HEAD#l2327
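
For anyone who wants to repeat the "warn the user" check on their own
source tree, something like the following Python sketch should do. The
function name has_warn_comment and the MARKER constant are my own, and the
script only reports whether the text is present. A missing marker does not
prove that this particular patch was applied, so the other parts of the
patch still need checking by hand.

```python
import re
import sys

# Text quoted above, lower-cased; it may be wrapped across comment lines.
MARKER = "we should still warn the user though"


def has_warn_comment(source_text):
    """Return True if the marker text appears, ignoring comment wrapping."""
    # Collapse whitespace, '*' and '/' runs so " * " line prefixes vanish.
    flattened = re.sub(r"[\s*/]+", " ", source_text).lower()
    return MARKER in flattened


if __name__ == "__main__":
    # Usage: python check_marker.py path/to/fs/nfs/nfs4proc.c
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            found = has_warn_comment(f.read())
        verdict = "marker present (looks unpatched)" if found else "marker absent"
        print(f"{path}: {verdict}")
```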

Can anyone tell me what happened to this patch please? Was it lost or
superseded?

TIA
Ben

-- 
Ben Taylor <benj@pml.ac.uk>, http://rsg.pml.ac.uk/
Remote Sensing Group, Plymouth Marine Laboratory
Tel: +44 (0)1752 633432, Fax: +44 (0)1752 633101



