From: Trond Myklebust <Trond.Myklebust@netapp.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Max Matveev <makc@redhat.com>, linux-nfs@vger.kernel.org
Subject: Re: NFS/TCP timeout sequence
Date: Thu, 07 Jul 2011 10:59:12 -0400
Message-ID: <1310050752.3863.51.camel@lade.trondhjem.org>
In-Reply-To: <7464E63E-E02E-4E4F-95A4-D4CF235DAFB3@oracle.com>
On Thu, 2011-07-07 at 10:44 -0400, Chuck Lever wrote:
> On Jul 7, 2011, at 10:16 AM, Trond Myklebust wrote:
>
> > On Thu, 2011-07-07 at 10:04 -0400, Chuck Lever wrote:
> >> On Jul 7, 2011, at 9:47 AM, Trond Myklebust wrote:
> >>
> >>> On Thu, 2011-07-07 at 18:11 +1000, Max Matveev wrote:
> >>>> I've had to look at the way NFS/TCP does its timeouts and backoff
> >>>> and it does not make a lot of sense to me: according to the
> > >>>> following paragraph from nfs(5) on Fedora 14 (I'm using Fedora 14
> > >>>> because it has more text than the same page in nfs-utils):
> >>>>
> >>>> timeo=n The time (in tenths of a second) the NFS client waits
> >>>> for a response before it retries an NFS request. If this
> >>>> option is not specified, requests are retried every 60
> > >>>> seconds for NFS over TCP. The NFS client does not perform
> > >>>> any kind of timeout backoff for NFS over TCP.
> >>>>
> >>>> but if I try the mount with timeo=20,retrans=7 then I'm getting
> >>>> retransmits which are 2, 4, 6, 8, 2, 4, 6, 8 seconds apart, i.e.
> >>>> there is a) linear backoff and b) the backoff is not long enough to
> >>>> let the complete sequence of 7 retransmits run its course.
> >>>
> >>> Sigh... Firstly, 2 second timeouts are complete lunacy when using a
> >>> protocol that guarantees reliable delivery, such as TCP does. Anyone who
> >>> tries it deserves exactly what they get: poor unreliable performance.
> >>
> >> We shouldn't allow such low settings.
> >>
> >>> Secondly, the _other_ fix for this problem is to fix the documentation.
> >>
> >> How is the documentation incorrect? We do not want any kind of back-off for stream transports.
> >
> > The documentation states that we don't do back off, but as Max points
> > out, in practice the kernel does a linear back off (and has always done
> > so).
>
> I question that parenthetical assertion. When I've looked at this behavior in the past, it has not backed off. It has retried every 60 seconds. That's why I wrote that in nfs(5). I've had many discussions about this with you in the past. We agreed: no back-off for TCP. The default settings for TCP transports are timeo=600,retrans=2, which means try three times at fixed 60 second intervals.
Looking at the code:
v2.6.0: exponential back off
v2.6.4: exponential back off
v2.6.9: exponential back off
v2.6.16: linear back off
v2.6.18: linear back off
v2.6.24: linear back off
v2.6.32: linear back off
....
So I've no idea what you were testing.
> So it seems to me the kernel has diverged (perhaps long ago) from the documentation, not the other way around.
Nope. The documentation has simply always been inaccurate afaics from
the above inspection.
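To make the distinction concrete, here is a small model of the three retransmit schedules under discussion: no back off, linear back off and exponential back off, for a given timeo and retrans. It is a simplified sketch, not the kernel's timeout code. It assumes that the linear increment equals timeo, it ignores any upper cap the kernel places on the timeout, and it does not reproduce the early reset Max observed. The function name retransmit_intervals is hypothetical.

```python
def retransmit_intervals(timeo_ds, retrans, policy):
    """Return the wait (in seconds) before each transmission times out.

    timeo_ds is the timeo= mount option in tenths of a second; retrans is
    the number of retransmissions before a major timeout, so the request
    is sent retrans + 1 times in total. Simplified model only.
    """
    base = timeo_ds / 10.0
    intervals = []
    for attempt in range(retrans + 1):
        if policy == "fixed":
            wait = base                      # no back off: same timeout every time
        elif policy == "linear":
            wait = base * (attempt + 1)      # add timeo after each minor timeout
        elif policy == "exponential":
            wait = base * (2 ** attempt)     # double after each minor timeout
        else:
            raise ValueError(f"unknown policy: {policy}")
        intervals.append(wait)
    return intervals


if __name__ == "__main__":
    # Max's mount options: timeo=20,retrans=7
    for policy in ("fixed", "linear", "exponential"):
        print(policy, retransmit_intervals(20, 7, policy))
    # The TCP defaults quoted from nfs(5): timeo=600,retrans=2
    print("default fixed", retransmit_intervals(600, 2, "fixed"))
```

In this model the first four linear intervals for timeo=20 are 2, 4, 6 and 8 seconds, which matches the start of the sequence Max reported, while the "fixed" schedule with the defaults gives three 60-second waits, which is what nfs(5) describes.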
> > Anyway, why shouldn't we back off if the server is failing to respond?
>
> Because the Solaris NFS client behaves this way, and we want to keep the syntax and semantics of our admin interfaces aligned between these implementations unless there is a good reason not to, because these mount options are published in automounter maps.
>
> More importantly, a 60 second wait is not an onerous workload for either the network or the server. Back-offs are usually used to provide quick recovery but then reduce network traffic if the server is down for a long while. If we start at 60 seconds, there's already no onerous workload; plus we already have a slow recovery anyway...
>
> In fact, for a long time we've wanted to make server restart recovery _faster_ not slower. Thus using back-off with already lengthy retransmit timeouts seems like a step in the wrong direction. After a server restart, our users want the client talking to the server again as quickly as possible. At a guess, quicker recovery time after a server reboot is probably the number one reason why people try using a smaller timeo= setting for TCP.
'retrans=1' will do this for you.
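One way to read this suggestion, assuming a simplified linear model in which the wait starts at timeo, grows by timeo after every minor timeout, and returns to timeo at a major timeout: a smaller retrans stops the per-retransmit wait from growing as large before it is reset. The hypothetical helper linear_schedule compares retrans=1 and retrans=2 at the default timeo=600; it ignores any kernel cap on the maximum timeout.

```python
def linear_schedule(timeo_ds, retrans):
    """Per-transmission waits (seconds) up to one major timeout.

    Simplified model (an assumption, not kernel code): the wait starts at
    timeo and grows by timeo after every minor timeout; the request is sent
    retrans + 1 times before a major timeout resets the wait to timeo.
    """
    base = timeo_ds / 10.0
    return [base * (n + 1) for n in range(retrans + 1)]


for retrans in (1, 2):
    waits = linear_schedule(600, retrans)
    # Longest single wait and total time until the major timeout.
    print(f"retrans={retrans}: waits={waits} longest={max(waits)} total={sum(waits)}")
```

Under these assumptions the longest single wait drops from 180 seconds with retrans=2 to 120 seconds with retrans=1.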
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
Thread overview: 15+ messages
2011-07-07 8:11 NFS/TCP timeout sequence Max Matveev
2011-07-07 13:47 ` Trond Myklebust
2011-07-07 14:04 ` Chuck Lever
2011-07-07 14:16 ` Trond Myklebust
2011-07-07 14:44 ` Chuck Lever
2011-07-07 14:59 ` Trond Myklebust [this message]
2011-08-04 5:54 ` Max Matveev
2011-08-04 5:42 ` [PATCH] NFS: allow enough time for timeouts to run Max Matveev
2011-08-04 5:47 ` [PATCH] Update nfs(5) manpage - timeo for NFS/TCP Max Matveev
2011-08-04 12:04 ` Jim Rees
2011-08-05 0:57 ` Max Matveev
2011-08-05 1:39 ` Jim Rees
2011-08-05 2:14 ` Max Matveev
2011-07-08 6:05 ` NFS/TCP timeout sequence Max Matveev
2011-07-08 0:20 ` Max Matveev