Man page update for timeo= and retrans= options.

Linux NFS development

* Man page update for timeo= and retrans= options.
From: Neil Brown @ 2008-01-04 2:32 UTC
  To: linux-nfs

I've been trying to understand exactly how timeouts work in the NFS
client and find that the man page in nfs-utils is not correct.

In particular, the implementation differentiates between TCP and UDP,
while the man page does not make that distinction.

I have attempted an update to the man page as you can see below.  It
is entirely possible that I have not got it completely correct (or
comprehensible) so I'm asking for people to check that what I have
written is correct and clear.

These are the points I would particularly like comments on:

1/ I have left

 Better overall performance may be achieved by increasing the
 timeout when mounting on a busy network, to a slow server, or through
 several routers or gateways.

 unchanged.  Is it still a reasonable thing to say?

2/ I have moved the documentation about major timeouts into the retrans=
   section.  Does that break the description up too much?

3/ The old text seems to say that after the first major timeout, a
  slightly different sequence of timeouts is used.  I couldn't find
  evidence of this in the code.  Did I miss something, or is my text
  correct?

4/ Did this change in some ancient kernel version, and should the
  version number of the change be documented?  e.g. is it a 2.4 / 2.6
  difference?

5/ As the behaviour is quite different for UDP and TCP, should we
  introduce a major_timeo= option which calculates an appropriate
  retrans= based on the actual timeo= and proto= used?

and anything else that occurs to anyone.
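
To make question 5 concrete, here is a minimal C sketch of the calculation such an option might perform. It assumes the growth rules from the proposed text below (UDP doubles up to 60 seconds, TCP adds timeo= each time up to 10 minutes); the function name retrans_for_major_timeo and its parameters are hypothetical and do not correspond to anything in nfs-utils or the kernel.

```c
/* Hypothetical helper illustrating the proposed major_timeo= option:
 * find the smallest retrans= value for which the first transmission plus
 * all retransmissions together wait at least major_timeo.
 * All times are in tenths of a second, like timeo=.  The growth rules are
 * taken from the proposed man page text, not from kernel code. */
static unsigned int retrans_for_major_timeo(int is_udp, unsigned long timeo,
                                            unsigned long major_timeo)
{
    const unsigned long max = is_udp ? 600UL : 6000UL; /* 60 s or 10 min */
    unsigned long t, total;
    unsigned int retrans = 0;

    if (timeo == 0)
        return 0;                 /* no meaningful answer without timeo= */
    t = timeo < max ? timeo : max;
    total = t;                    /* waiting time for the first transmission */
    while (total < major_timeo) {
        t = is_udp ? t * 2 : t + timeo;  /* next minor timeout */
        if (t > max)
            t = max;
        total += t;
        retrans++;
    }
    return retrans;
}
```

For example, with the default UDP timeo=11 the sketch returns 2 for a major_timeo of 77 tenths (1.1 + 2.2 + 4.4 = 7.7 seconds), which matches the default retrans=2.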

Thanks,
NeilBrown

diff --git a/utils/mount/nfs.man b/utils/mount/nfs.man
index d92da19..0142075 100644
--- a/utils/mount/nfs.man
+++ b/utils/mount/nfs.man
@@ -83,24 +83,50 @@ Note: Setting this size to a value less than the largest supported
 block size will adversely affect performance.
 .TP 1.5i
 .I timeo=n
-The value in tenths of a second before sending the
-first retransmission after an RPC timeout.
-The default value is 7 tenths of a second.  After the first timeout,
-the timeout is doubled after each successive timeout until a maximum
-timeout of 60 seconds is reached or the enough retransmissions
-have occured to cause a major timeout.  Then, if the filesystem
-is hard mounted, each new timeout cascade restarts at twice the
-initial value of the previous cascade, again doubling at each
-retransmission.  The maximum timeout is always 60 seconds.
+The value in tenths of a second for the first RPC timeout.  If no
+reply has been received in this much time, the message is
+retransmitted.
+Further timeouts are handled differently depending on the connection
+type.
+
+For UDP (which is unreliable and lacks congestion control),
+each successive timeout is twice the previous timeout.  As the default
+is 11 tenths of a second, the timeouts used if
+.I timeo=
+is not specified are 1.1, 2.2, 4.4, 8.8,... seconds.  The timeout for
+each retransmission is limited to 60 seconds, so the next few numbers
+in the above sequence would be 17.6, 35.2, 60, 60.
+
+For reliable protocols such as TCP and RDMA, the successive timeouts
+grow linearly rather than exponentially to a maximum of 10 minutes.
+The default is 1 minute, so the default successive timeouts are 1,
+2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10 minutes.
+
+It is unwise to set
+.I timeo=
+explicitly without also setting the protocol to use, as it has a
+significantly different effect depending on protocol.
+
 Better overall performance may be achieved by increasing the
 timeout when mounting on a busy network, to a slow server, or through
 several routers or gateways.
 .TP 1.5i
 .I retrans=n
 The number of minor timeouts and retransmissions that must occur before
-a major timeout occurs.  The default is 3 timeouts.  When a major timeout
-occurs, the file operation is either aborted or a "server not responding"
-message is printed on the console.
+a major timeout occurs.  The default is 2, yielding a total of 3
+attempts (1 transmission and 2 retransmissions).  When a major timeout
+occurs, the behaviour depends on whether the filesystem was mounted
+.I hard
+or
+.IR soft .
+In the case of a
+.I soft
+mount, the operation will abort and typically return an IO error to
+the application.  In the case of a
+.I hard
+mount, a "server not responding" message will be printed on the
+console, and the request will be retried with the original series of
+timeouts.
 .TP 1.5i
 .I acregmin=n
 The minimum time in seconds that attributes of a regular file should
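
For checking the proposed wording, the following C sketch models the per-attempt timeouts as the new text describes them: doubling for UDP with a 60 second cap, and adding timeo= each time for TCP/RDMA with a 10 minute cap. It is a simplified model of the documented behaviour, not the kernel's code, and the names (nth_timeout, print_cascade, PROTO_UDP, UDP_MAX_TIMEO and so on) are illustrative. Later messages in this thread question whether the TCP timeout grows at all, so treat the TCP branch as an assumption.

```c
#include <stdio.h>

/* Simplified model of the per-attempt timeouts described by the proposed
 * nfs(5) text; times are in tenths of a second, like timeo=.
 * The enum and limit names are illustrative, not kernel identifiers. */
enum nfs_proto { PROTO_UDP, PROTO_TCP };

#define UDP_MAX_TIMEO  600UL   /* 60 seconds */
#define TCP_MAX_TIMEO 6000UL   /* 10 minutes */

/* Timeout used for attempt n of one cascade (n = 0 is the first
 * transmission).  UDP doubles each time; TCP/RDMA add timeo each time. */
static unsigned long nth_timeout(enum nfs_proto p, unsigned long timeo,
                                 unsigned int n)
{
    const unsigned long max = (p == PROTO_UDP) ? UDP_MAX_TIMEO : TCP_MAX_TIMEO;
    unsigned long t = timeo < max ? timeo : max;

    for (unsigned int i = 0; i < n && t < max; i++) {
        t = (p == PROTO_UDP) ? t * 2 : t + timeo;
        if (t > max)
            t = max;
    }
    return t;
}

/* Print the first `count` timeouts in seconds, e.g. for timeo=11 over UDP:
 * 1.1 2.2 4.4 8.8 17.6 35.2 60.0 60.0 */
static void print_cascade(enum nfs_proto p, unsigned long timeo,
                          unsigned int count)
{
    for (unsigned int n = 0; n < count; n++) {
        unsigned long t = nth_timeout(p, timeo, n);
        printf("%lu.%lu ", t / 10, t % 10);
    }
    putchar('\n');
}
```

Calling print_cascade(PROTO_UDP, 11, 8) prints the UDP sequence given in the text, and print_cascade(PROTO_TCP, 600, 12) prints 60.0 through 600.0 followed by two more 600.0 values, i.e. 1 to 10 minutes and then 10, 10.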


* Re: Man page update for timeo= and retrans= options.
From: Trond Myklebust @ 2008-01-04 21:31 UTC
  To: Neil Brown; +Cc: linux-nfs


On Fri, 2008-01-04 at 13:32 +1100, Neil Brown wrote:
> I've been trying to understand exactly how timeouts work in the NFS
> client and find that the man page in nfs-utils is not correct.
> 
> In particular, the implementation differentiates between TCP and UDP,
> while the man page does not make that distinction.
> 
> I have attempted an update to the man page as you can see below.  It
> is entirely possible that I have not got it completely correct (or
> comprehensible) so I'm asking for people to check that what I have
> written is correct and clear.
> 
> This I would particularly like comment on:
> 
> 1/ I have left
> 
>  Better overall performance may be achieved by increasing the
>  timeout when mounting on a busy network, to a slow server, or through
>  several routers or gateways.
> 
>  unchanged.  Is it still a reasonable thing to say?

I suppose so, however it might be worth stating that a better solution
is to use TCP.

It is also worth pointing out that for TCP, the timeo mount option is
deprecated.

> 2/ I have moved the documentation about major timeouts into the retrans=
>    section.  Does that break the description up too much?

No, that sounds like a good idea.

> 3/ the old text seems to say that after the first major-timeout, a
>   slightly different sequence of timeouts are used.  I couldn't find
>   evidence of this in the code.  Did I miss something, or is my text
>   correct?

The text stating that 'each new timeout cascade restarts at twice the
initial value of the previous cascade' is wrong. AFAIK, we restart at
the initial value...
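
A small C sketch may help compare the two readings on a hard UDP mount that never receives a reply. It assumes a cascade of retrans + 1 attempts with doubling timeouts capped at 60 seconds, and contrasts the old man page rule (each cascade starts at twice the previous start) with restarting at the initial value as described above. The helper names are hypothetical and the code models the documented behaviour, not the sunrpc implementation.

```c
/* Simplified model of a hard UDP mount that never gets a reply.  Each
 * cascade is retrans + 1 attempts; when the last one times out, that is a
 * major timeout and a new cascade begins.  Times are tenths of a second.
 * old_rule != 0 follows the old man page (each cascade starts at twice the
 * previous start); old_rule == 0 restarts every cascade at timeo.
 * All names are illustrative only. */
#define UDP_MAX_TIMEO 600UL   /* 60 seconds */

/* Double t up to `times` times, never exceeding UDP_MAX_TIMEO. */
static unsigned long double_capped(unsigned long t, unsigned int times)
{
    while (times > 0 && t < UDP_MAX_TIMEO) {
        t *= 2;
        times--;
    }
    return t > UDP_MAX_TIMEO ? UDP_MAX_TIMEO : t;
}

/* Timeout of the i-th transmission overall (i = 0, 1, 2, ...). */
static unsigned long hard_udp_timeout(unsigned long timeo, unsigned int retrans,
                                      unsigned int i, int old_rule)
{
    unsigned int attempts = retrans + 1;
    unsigned int cascade = i / attempts;  /* major timeouts so far */
    unsigned int step = i % attempts;     /* position inside the cascade */
    unsigned long start = old_rule ? double_capped(timeo, cascade) : timeo;

    return double_capped(start, step);
}

/* Nonzero if the i-th timeout expiring is a major timeout: a soft mount
 * would return an error here, a hard mount logs "server not responding". */
static int is_major_timeout(unsigned int retrans, unsigned int i)
{
    return i % (retrans + 1) == retrans;
}
```

With the old default timeo=7 and retrans=2, the old rule gives 0.7, 1.4, 2.8, then 1.4, 2.8, 5.6, then 2.8, and so on; restarting at the initial value instead gives 0.7, 1.4, 2.8, 0.7, 1.4, 2.8, and so on.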

> 4/ Did this change in some ancient kernel version, and should the
>   version number of the change be documented?  e.g. is it a 2.4 / 2.6
>   difference?

I'd have to check.

> 5/ As the behaviour is quite different for UDP and TCP, should we
>   introduce a major_timeo= option which calculates an appropriate
>   retrans= based on the actual timeo= and proto= used.

No. We should deprecate use of retrans/timeo altogether for TCP except
possibly for the case of 'soft' mounts (and even then you need to be
careful). It is far too easy to flood the server with redundant RPC
requests...

Cheers
  Trond



* Re: Man page update for timeo= and retrans= options.
From: Chuck Lever @ 2008-01-07 18:15 UTC
  To: Neil Brown, Steve Dickson; +Cc: linux-nfs

Hi Neil-

I just spent two months and rewrote all of nfs(5).  It should appear  
in the next release of nfs-utils.  Steve, when can we expect to see  
the updated man page?


--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





* Re: Man page update for timeo= and retrans= options.
From: Neil Brown @ 2008-01-08 1:32 UTC
  To: Chuck Lever; +Cc: Steve Dickson, linux-nfs

On Monday January 7, chuck.lever@oracle.com wrote:
> Hi Neil-
> 
> I just spent two months and rewrote all of nfs(5).  It should appear  
> in the next release of nfs-utils.  Steve, when can we expect to see  
> the updated man page?

I thought I had seen some rewrite go past, but it wasn't in my inbox
any more and also not in Steve's git, so I just went ahead...

I see it is in the .git now (as of Friday).

Comments:
 - It says UDP defaults to 7/10 of a second, but
      nfs_init_timeout_values()
   says:
		if (!to->to_initval)
			to->to_initval = 11 * HZ / 10;

   which suggests 11/10 of a second.

 - It says 
    If the retrans option is not specified, the NFS client retries
    each request three times.

  but nfs_init_timeout_values() says

	if (!to->to_retries)
		to->to_retries = 2;

   which suggests it retries 2 times (or tries 3 times).


 - It says:
      After each retransmission, the NFS client doubles the timeout
      for that request, up to a maximum timeout length of 60 seconds.

   but doesn't (to me) make it clear that only applies to UDP.  For
   TCP, the timeouts appear to increase linearly up to 600 seconds.
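
The two default rules quoted above can be checked with a short C sketch. Only the field names to_initval and to_retries come from the quoted fragments; the struct timeout_sketch type, the apply_udp_defaults function, the hz parameter standing in for the kernel's HZ, and the timeo-to-jiffies conversion are assumptions made for illustration.

```c
/* Hypothetical stand-in for the kernel's timeout structure; only the two
 * fields quoted above are modelled, and `hz` replaces the kernel's HZ
 * constant so the arithmetic can be checked for different tick rates. */
struct timeout_sketch {
    unsigned long to_initval;   /* first timeout, in jiffies (ticks) */
    unsigned int  to_retries;   /* retransmissions before a major timeout */
};

/* Apply the UDP defaults quoted from nfs_init_timeout_values(): a value of
 * zero means the option was not given on the mount command line.  The
 * timeo-to-jiffies conversion is an assumption for illustration. */
static void apply_udp_defaults(struct timeout_sketch *to, unsigned long hz,
                               unsigned int timeo, unsigned int retrans)
{
    to->to_initval = timeo * hz / 10;   /* timeo= is in tenths of a second */
    to->to_retries = retrans;
    if (!to->to_retries)
        to->to_retries = 2;             /* 2 retries = 3 tries in total */
    if (!to->to_initval)
        to->to_initval = 11 * hz / 10;  /* about 1.1 seconds of ticks */
}
```

At hz = 100 an unset timeo= yields 110 jiffies (1.1 seconds) and an unset retrans= yields 2 retries, i.e. 3 tries in total, which is the reading argued above.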

Thanks,
NeilBrown


* Re: Man page update for timeo= and retrans= options.
From: Chuck Lever @ 2008-01-08 12:38 UTC
  To: Neil Brown; +Cc: Steve Dickson, linux-nfs

Hi Neil-

On Jan 7, 2008, at 8:32 PM, Neil Brown wrote:
> On Monday January 7, chuck.lever@oracle.com wrote:
>> Hi Neil-
>>
>> I just spent two months and rewrote all of nfs(5).  It should appear
>> in the next release of nfs-utils.  Steve, when can we expect to see
>> the updated man page?
>
> I thought I had seem some rewrite go past, but it wasn't in my inbox
> any more and also not it Steve's git so I just went ahead...

> I see it is in the .git now (as of Friday).

Good.  I hope others will also have a chance to look it over.  And  
thanks for your scrutiny, btw.

> Comments:
>  - It says UDP defaults to 7/10 of a second, but
>       nfs_init_timeout_values()
>    says:
> 		if (!to->to_initval)
> 			to->to_initval = 11 * HZ / 10;
>
>    which suggests 11/10 of a second.

Yup.  I forgot about that code change, which I believe was to make  
UDP on Linux work more like Solaris does.

>  - It says
>     If the retrans option is not specified, the NFS client retries
>     each request three times.
>
>   but nfs_init_timeout_values() says
>
> 	if (!to->to_retries)
> 		to->to_retries = 2;
>
>    which suggests it retries 2 time (or tries 3 times).

Yes, nfs(5) should be changed to say "tries each request 3 times."

>  - It says:
>       After each retransmission, the NFS client doubles the timeout
>       for that request, up to a maximum timeout length of 60 seconds.
>
>    but doesn't (to me) make it clear that only applies to UDP.

It follows "However, for NFS over UDP" .... But perhaps the UDP part  
can be wholly split into a separate paragraph to make the distinction  
more clear.

I'll post a patch with these updates to nfs(5).

>   For TCP, the timeouts appear to increase linearly up to 600 seconds.

The TCP RTT should not change after a timeout.  At least that was the  
way it worked when I modified it a few years ago.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com


* Re: Man page update for timeo= and retrans= options.
From: Steve Dickson @ 2008-01-08 18:54 UTC
  To: Chuck Lever; +Cc: Linux NFS Mailing list



Chuck Lever wrote:
> Hi Neil-
> 
> I just spent two months and rewrote all of nfs(5).  It should appear in
> the next release of nfs-utils.  Steve, when can we expect to see the
> updated man page?
I committed the update a few days ago...

steved.
