Linux NFS development
From: Michel Lespinasse <walken-Y93EPB1FQwg@public.gmane.org>
To: linux-nfs@vger.kernel.org
Subject: Fwd: NFS 5-minute hangs upon S3 resume using 2.6.27 client
Date: Wed, 22 Oct 2008 21:02:31 -0700
Message-ID: <20081023040231.GA13512@zoy.org>

I sent this out to LKML earlier but was told this would be a better list.
This has been mentioned in bugzilla already, but I'd like to draw attention
to it before it gets too late for 2.6.28 (or is it already too late?)

The following is a common cause of 5-minute NFS hangs here:

* Client has TCP connections to the NFS server, goes to S3 sleep for a few hours.
* TCP connections die on the server side.
  (not 100% sure why, does the server use some kind of keepalive?)
* Client resumes from S3.
* Client sends NFS requests down its TCP connections, gets back an RST packet.
* [Client hangs for exactly 300 seconds here]
* Client establishes new TCP connections to the NFS server,
  and recovers from the hang.
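
To illustrate the keepalive guess in the second step: a minimal user-space
sketch of how a socket with TCP keepalive enabled gives up on a silent peer.
I have not confirmed that the NFS server does this. The function names and
the timing values are made up for illustration, and the TCP_KEEP* options
are Linux-specific.

```python
import socket

def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive; the timing values are illustrative only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The three options below are Linux-specific; skip them elsewhere.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)

def seconds_until_drop(idle, interval, count):
    """Approximate time before an unresponsive peer is declared dead."""
    return idle + interval * count
```

With any settings of this order, a client that sleeps for a few hours would
stop answering probes long enough for the server to drop the connection.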

A tcpdump trace is attached at the end of bugzilla bug 11154:
http://bugzilla.kernel.org/show_bug.cgi?id=11154

Should the client immediately try to reconnect when its existing connection
receives an RST packet? (The 5 minute delay would make sense to me if the
RST was received in reply to a SYN, but I'm not sure about it in the case
of an existing open TCP connection.)
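
The distinction I have in mind can be sketched in user space as follows.
This is only an illustration of the policy being asked about, not the
kernel RPC client code. The names reconnect_delay, send_with_reconnect and
REFUSED_BACKOFF are hypothetical, and the sketch simplifies things by
assuming the reset shows up as an error on the send itself.

```python
import errno
import socket

# Seconds; mirrors the delay observed above, not an actual kernel constant.
REFUSED_BACKOFF = 300

def reconnect_delay(err):
    """Return how long to wait before reconnecting after a socket error."""
    if err in (errno.ECONNRESET, errno.EPIPE):
        # RST on an established connection: the server forgot us, retry now.
        return 0
    if err == errno.ECONNREFUSED:
        # RST in reply to a SYN: nothing is listening, so back off.
        return REFUSED_BACKOFF
    return None  # not a connection-level error this sketch handles

def send_with_reconnect(addr, payload, connect=socket.create_connection, sock=None):
    """Send payload, reconnecting once immediately if the old socket was reset."""
    if sock is None:
        sock = connect(addr)
    try:
        sock.sendall(payload)
        return sock
    except OSError as e:
        if reconnect_delay(e.errno) != 0:
            raise
        # Stale connection: drop it and resend on a fresh one right away.
        sock.close()
        sock = connect(addr)
        sock.sendall(payload)
        return sock
```

The point of separating the two cases is that a reset on an open connection
says nothing about whether the server is accepting new connections.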

If the 5 minute delay after an RST is necessary, could the client avoid it
by explicitly closing/reopening its connections using suspend/resume hooks?

(I cannot work around the issue locally by unmounting/remounting my NFS
shares around the suspend/resume because my rootfs is also on NFS.)

This NFS setup was working fine in 2.6.24. There have been issues with
2.6.25 and 2.6.26, but I did not confirm whether they are the same bug.
2.6.25 usually recovers after some variable delay and 2.6.26 usually
does not recover. Bugs 11154 and 11061 have more details about this.
Ian Campbell has also been tracking an NFS issue under load that
appeared at around the same time.

Hope this helps,

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.

