* [PATCH] Timeouts gone wild on ia64
@ 2003-05-09 12:41 Steve Dickson
  2003-05-10 13:50 ` Trond Myklebust
  0 siblings, 1 reply; 19+ messages in thread
From: Steve Dickson @ 2003-05-09 12:41 UTC
  To: nfs

[-- Attachment #1: Type: text/plain, Size: 704 bytes --]

Here is a patch that greatly reduces the number of
timeouts (and EIO errors with soft mounts) that
occur when a fast client is talking to a slow server.

We were noticing a large number of EIO errors when
an ia64 client was talking to an x86 server with
soft mounts (a la autofs)....

True, EIO errors should be expected with soft mounts, but
it turns out that thousands of timeouts were occurring on
an ia64 client compared to 50 to 60 timeouts with
an x86 client when talking to the same slow server and
generating the same traffic.

What this patch does is make the minimum round-trip
timeout value (RPC_RTO_MIN) relative to HZ. So when HZ is greater (as
in the case of ia64) the minimum value goes up.
 
Comments?

SteveD.


[-- Attachment #2: linux-2.4.20-nfs-ia64-EIO.patch --]
[-- Type: text/plain, Size: 333 bytes --]

--- linux-2.4.20/net/sunrpc/timer.c.orig	2003-05-08 15:10:24.000000000 -0400
+++ linux-2.4.20/net/sunrpc/timer.c	2003-05-08 15:40:01.000000000 -0400
@@ -8,7 +8,7 @@
 
 #define RPC_RTO_MAX (60*HZ)
 #define RPC_RTO_INIT (HZ/5)
-#define RPC_RTO_MIN (2)
+#define RPC_RTO_MIN (HZ/30)
 
 void
 rpc_init_rtt(struct rpc_rtt *rt, long timeo)
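
The effect of the one-line change can be checked with a little user-space arithmetic. The following is only a sketch: the helper names (jiffies_to_usec, rto_min_before, rto_min_after, print_rto_min) are invented for illustration and are not kernel functions, and the tick rates 100 and 1024 are the ones quoted in this thread for x86 and ia64. It computes only the floor constant, not the full retransmit timeout the RPC client ends up using.

```c
#include <stdio.h>

/* Convert a tick count to microseconds for a given tick rate (HZ). */
long jiffies_to_usec(long jiffies, long hz)
{
    return jiffies * 1000000L / hz;
}

/* Floor before the patch: a fixed 2 ticks, whatever HZ is. */
long rto_min_before(long hz)
{
    (void)hz;   /* deliberately unused: the old constant ignores HZ */
    return 2;
}

/* Floor after the patch: HZ/30, using C integer division. */
long rto_min_after(long hz)
{
    return hz / 30;
}

/* Print both floors, in ticks and microseconds, for one tick rate. */
void print_rto_min(long hz)
{
    long before = rto_min_before(hz);
    long after = rto_min_after(hz);

    printf("HZ=%ld: before %ld ticks (%ld us), after %ld ticks (%ld us)\n",
           hz, before, jiffies_to_usec(before, hz),
           after, jiffies_to_usec(after, hz));
}
```

Calling print_rto_min(100) and print_rto_min(1024) shows why the two clients behaved differently: the fixed floor of 2 ticks is 20 ms at HZ=100 but just under 2 ms at HZ=1024, whereas HZ/30 is 30 ms and roughly 33 ms respectively.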

* RE: [PATCH] Timeouts gone wild on ia64
@ 2003-05-09 13:40 Lever, Charles
  2003-05-09 14:12 ` Steve Dickson
  0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2003-05-09 13:40 UTC
  To: Steve Dickson, nfs

[-- Attachment #1: Type: text/plain, Size: 1059 bytes --]

steve-
 
can you explain why there are more timeouts for ia64?  do you
have a network trace you can share?

* RE: [PATCH] Timeouts gone wild on ia64
@ 2003-05-09 15:24 Lever, Charles
  2003-05-09 17:19 ` Steve Dickson
  0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2003-05-09 15:24 UTC
  To: Steve Dickson; +Cc: nfs

ok, i was confused.  your initial explanation
suggested you were changing the behavior of
the RTO estimator, whereas in fact, you are
correcting it.  on ia64, the minimum timeout
value is incorrect because HZ is an order of
magnitude larger.

however, equivalent behavior would be:

  #define RPC_RTO_MIN (HZ/50)

not HZ/30.  why did you choose 30?

> -----Original Message-----
> From: Steve Dickson [mailto:SteveD@redhat.com]
> Sent: Friday, May 09, 2003 10:12 AM
> To: Lever, Charles
> Cc: nfs@lists.sourceforge.net
> Subject: Re: [NFS] [PATCH] Timeouts gone wild on ia64
>
>
> It has to do with the value of HZ.... On an ia64, HZ is
> 1024 and on an x86 machine it's 100. Not taking this
> difference into account when figuring the minimal
> timeout values was causing timeouts to occur every 4ms
> instead of 40ms.
>
> The network trace didn't show anything substantial,
> but here is the debugging trail.
>
> By logging (and counting) the number of times call_status() was called
> with a -ETIMEDOUT status, it became very apparent that ia64 machines
> were timing out thousands of times more often than an x86 machine.
> The actual numbers were something like 1400 to 50 when I generated
> traffic by doing  md5sum /nfs/mounted/*.rpm > /dev/null.
>
> Next I took a look at what task->tk_timeout was being set to in
> do_xprt_transmit(). On an x86 it was being set to ~40ms. On
> an ia64 machine it was being set to ~4ms.
>
> That led me to how rpc_calc_rto() was figuring out
> the RTOs... I noticed that RPC_RTO_MIN was the only
> constant that was not relative to HZ. So I did some
> experiments and found out that by making it relative to HZ
> the timeouts decreased substantially...
>
> SteveD.
>
> Lever, Charles wrote:
>
> >steve-
> >
> >can you explain why there are more timeouts for ia64?  do you
> >have a network trace you can share?
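
Charles's point in this message is that HZ/50, not HZ/30, reproduces the old x86 behaviour, since 100/50 is the original constant of 2 ticks. A small sketch makes the comparison concrete. The function names here are hypothetical, and the HZ values 100 and 1024 are the ones quoted in this thread; nothing below is taken from the kernel source.

```c
#include <stdio.h>

/* Timeout floor in ticks when the constant is written as HZ/divisor. */
long rto_min_ticks(long hz, long divisor)
{
    return hz / divisor;   /* integer division, as in the C macro */
}

/* The same floor expressed in microseconds. */
long rto_min_usec(long hz, long divisor)
{
    return rto_min_ticks(hz, divisor) * 1000000L / hz;
}

/* Print the floor for one (HZ, divisor) pair. */
void print_floor(long hz, long divisor)
{
    printf("HZ=%ld, HZ/%ld: %ld ticks = %ld us\n",
           hz, divisor, rto_min_ticks(hz, divisor),
           rto_min_usec(hz, divisor));
}
```

With a divisor of 50 the floor stays at 2 ticks (20 ms) for HZ=100 and becomes 20 ticks (about 19.5 ms) for HZ=1024, so only the ia64 value changes. With a divisor of 30 the x86 floor also rises, from 20 ms to 30 ms, which is the part of the patch being questioned.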


-------------------------------------------------------
Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
The only event dedicated to issues related to Linux enterprise solutions
www.enterpriselinuxforum.com

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs

* RE: [PATCH] Timeouts gone wild on ia64
@ 2003-05-15  5:46 Lever, Charles
  2003-05-15 14:10 ` Steve Dickson
  0 siblings, 1 reply; 19+ messages in thread
From: Lever, Charles @ 2003-05-15  5:46 UTC
  To: Steve Dickson, Trond Myklebust, nfs

[-- Attachment #1: Type: text/plain, Size: 1905 bytes --]

steve-
 
i think there must be an underlying problem here.  soft mounts
to slow servers should work correctly without raising the RTO
minimum.  
 
the RTO estimator should automatically raise the RTO values to
avoid extra timeouts.  if not, there is a problem with the RTO
estimator that needs to be addressed.
 

-----Original Message----- 
From: Steve Dickson [mailto:SteveD@RedHat.com] 
Sent: Wed 5/14/2003 8:34 PM 
To: Trond Myklebust; nfs@lists.sourceforge.net 
Cc: 
Subject: Re: [NFS] [PATCH] Timeouts gone wild on ia64



Hi Trond, 

Trond Myklebust wrote: 

>>>>>>" " == Steve Dickson <SteveD@RedHat.com> writes:
>
>     > What this patch does is make the minimal Round Trip time value 
>     > relative to HZ. So When HZ is greater (as in the case of ia64) 
>     > the minimal value goes up. 
>     >  Comments? 
> 
>1/30 of a second is a long time. Why that particular choice? 
> 
>Note: "It works" won't cut ice as that is completely a function of the 
>choice of hardware. 
> 
I realize this but... I figured waiting a few extra ticks before
timing out was probably a good thing with respect to soft mounts to
slow servers...
Especially since EIO errors can be pretty disruptive...

And as I see it, it's a no-op with a fast (or slow) client talking to a fast
server because the timeout should never happen... (i.e. the waiting
thread will get the response before the timeout occurs)....

SteveD. 





* RE: [PATCH] Timeouts gone wild on ia64
@ 2003-05-15 14:26 Lever, Charles
  2003-05-15 14:41 ` Trond Myklebust
  2003-05-15 15:16 ` Steve Dickson
  0 siblings, 2 replies; 19+ messages in thread
From: Lever, Charles @ 2003-05-15 14:26 UTC
  To: Steve Dickson; +Cc: Trond Myklebust, nfs

[-- Attachment #1: Type: text/plain, Size: 3603 bytes --]

you want to keep the retransmit timeout as short as possible,
just before things start timing out.  this means you get the fastest
possible recovery when the server drops a request.  a 4ms
timeout is probably reasonable for a fast server on a fast
network for small requests like GETATTR and LOOKUP.
 
but what i'm hearing is the starting RTO is probably not
optimal for slow servers.  right now the initial value is:
 
  #define RPC_RTO_INIT (HZ/5)
 
(200ms) which is perhaps too small.  a better value for
general use might be HZ/2 (half a second).  then the
estimator can adjust downward for faster servers while
behaving practically for slow ones.
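
To make the "start high, let the estimator adjust downward" idea concrete, here is a minimal user-space sketch of a Jacobson-style smoothed RTT estimator with a floor and a ceiling. It is an assumption-laden illustration, not the code in net/sunrpc/timer.c: the struct, the function names, and the 1/8 and 1/4 gains are the textbook choices, and rto_after() is a hypothetical helper added only to make the behaviour easy to probe. All values are in clock ticks.

```c
/* Hypothetical estimator state; every field is in clock ticks. */
struct rtt_est {
    long sa;       /* smoothed RTT, scaled by 8 */
    long sv;       /* smoothed mean deviation, scaled by 4 */
    long rto_min;  /* floor applied to the computed RTO */
    long rto_max;  /* ceiling applied to the computed RTO */
};

/* Seed the estimator so that the first computed RTO equals rto_init. */
void rtt_est_init(struct rtt_est *e, long rto_init, long rto_min, long rto_max)
{
    e->sa = 0;
    e->sv = rto_init;
    e->rto_min = rto_min;
    e->rto_max = rto_max;
}

/* Fold one measured round-trip time m into the estimate. */
void rtt_est_sample(struct rtt_est *e, long m)
{
    m -= e->sa >> 3;    /* error against the current smoothed RTT */
    e->sa += m;         /* srtt moves 1/8 of the way toward the sample */
    if (m < 0)
        m = -m;
    m -= e->sv >> 2;
    e->sv += m;         /* deviation moves 1/4 of the way toward |error| */
}

/* RTO = srtt + 4 * deviation, clamped to [rto_min, rto_max]. */
long rtt_est_rto(const struct rtt_est *e)
{
    long rto = (e->sa >> 3) + e->sv;

    if (rto < e->rto_min)
        rto = e->rto_min;
    if (rto > e->rto_max)
        rto = e->rto_max;
    return rto;
}

/* Illustration helper: RTO after n identical samples of `sample` ticks. */
long rto_after(long rto_init, long rto_min, long rto_max, long sample, int n)
{
    struct rtt_est e;
    int i;

    rtt_est_init(&e, rto_init, rto_min, rto_max);
    for (i = 0; i < n; i++)
        rtt_est_sample(&e, sample);
    return rtt_est_rto(&e);
}
```

With HZ=1024, rto_after(1024 / 5, 2, 60 * 1024, 6, n) starts at the initial value for n = 0 and decays toward the measured round-trip time as n grows, while the same call with a floor of 1024 / 30 stops at that floor. In this sketch a larger initial value only costs a few extra samples of convergence, whereas a larger floor permanently bounds how short the timeout can get, which is the distinction being drawn in this message.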

-----Original Message----- 
From: Steve Dickson [mailto:SteveD@redhat.com] 
Sent: Thu 5/15/2003 10:10 AM 
To: Lever, Charles 
Cc: Trond Myklebust; nfs@lists.sourceforge.net 
Subject: Re: [NFS] [PATCH] Timeouts gone wild on ia64




It appears the RTO code does seem to be working, but when the
minimums start so low (like at 4ms instead of 40ms) it takes some
time for the timeout value to build up, and with soft mounts
there is no time...

Maybe I'm missing something... increasing the timeout value should
not have any effect on performance, since with a well-tuned
client and server these timeouts will never occur: the
responses from the server will be returned before the
timeout expires... right? Also, decreasing the number of timeouts
will decrease the number of retransmits, which is another good
thing... True?

SteveD. 

* RE: [PATCH] Timeouts gone wild on ia64
@ 2003-05-15 15:34 Lever, Charles
  0 siblings, 0 replies; 19+ messages in thread
From: Lever, Charles @ 2003-05-15 15:34 UTC
  To: Steve Dickson; +Cc: nfs

> >you want to keep the retransmit timeout as short as possible,
> >just before things start timing out.  this means you get the fastest
> >possible recovery when the server drops a request.
> >
> That's assuming the server drops the request... now if the server is
> simply busy because it's serving hundreds of clients and it
> takes 6ms to respond, you now have hundreds of clients retransmitting
> every 4ms (for basically no reason), which is just adding to the
> problem...
> I'm sure the RTO code would eventually increase the timeout, which
> would smooth everything out, but before that happens you would be
> blasting the network with a ton of unnecessary retransmits... True?

we're agreeing vehemently.  the RTO estimator should
*start* at a larger timeout value to prevent this.
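
The cost of a timeout that is shorter than the server's response time can be estimated with a rough model. The sketch below assumes that the timer doubles after every expiry and that the reply to the first transmission arrives after reply_ms regardless of any retransmissions; both are simplifying assumptions, and the function names are hypothetical. It also ignores the retransmit limit that makes a soft mount give up with EIO.

```c
/*
 * Count the retransmissions sent before a reply that takes reply_ms
 * arrives, starting from timeout_ms and doubling the timer after every
 * expiry (an assumed backoff policy). A reply that arrives exactly at a
 * deadline is treated as being on time.
 */
int retransmits_before_reply(long reply_ms, long timeout_ms)
{
    int count = 0;
    long deadline = timeout_ms;

    if (timeout_ms <= 0)
        return 0;               /* guard against a non-advancing timer */

    while (deadline < reply_ms) {
        count++;                /* timer fired: one more retransmission */
        timeout_ms *= 2;        /* back off */
        deadline += timeout_ms;
    }
    return count;
}

/* Extra requests the server sees when every client behaves the same way. */
long total_retransmits(long clients, long reply_ms, long timeout_ms)
{
    return clients * retransmits_before_reply(reply_ms, timeout_ms);
}
```

For the figures in this exchange, a 6 ms reply with a 4 ms timeout costs one retransmission per request, and none with a 40 ms timeout. Multiplied across hundreds of clients, that is the burst of unnecessary traffic Steve describes until the estimator has raised the timeout.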

> >but what i'm hearing is the starting RTO is probably not
> >optimal for slow servers.  right now the initial value is:
> >
> >  #define RPC_RTO_INIT (HZ/5)
> >
> >(200ms) which is perhaps too small.  a better value for
> >general use might be HZ/2 (half a second).  then the
> >estimator can adjust downward for faster servers while
> >behaving practically for slow ones.

i agree with trond that fixing mount is a good idea...
however, the mount command's initial RTO value is up
in the hundreds of msec.  so why does the estimator
allow the RTO values to drop for slow servers?

the default retransmit count is too low for UDP.  but
i think we all agree on that.

> By increasing the initial timeout, ISTM that the client
> is assuming a slower server versus a fast one... which will
> probably work as well... It's just that I thought making
> all of the RTO constants' values relative to HZ was a good idea...

yes, making the RTO constants relative to HZ is a good
idea.  i think the objection is to raising the minimum
RTO at the same time.



end of thread, other threads:[~2003-09-18 12:14 UTC | newest]

Thread overview: 19+ messages
2003-05-09 12:41 [PATCH] Timeouts gone wild on ia64 Steve Dickson
2003-05-10 13:50 ` Trond Myklebust
2003-05-15  0:34   ` Steve Dickson
  -- strict thread matches above, loose matches on Subject: below --
2003-05-09 13:40 Lever, Charles
2003-05-09 14:12 ` Steve Dickson
2003-05-09 15:24 Lever, Charles
2003-05-09 17:19 ` Steve Dickson
2003-05-15  5:46 Lever, Charles
2003-05-15 14:10 ` Steve Dickson
2003-05-15 14:31   ` Trond Myklebust
2003-05-15 15:33     ` Steve Dickson
2003-09-17  4:14     ` Yusuf Goolamabbas
2003-09-17 13:46       ` Trond Myklebust
2003-09-18  7:03         ` Yusuf Goolamabbas
2003-09-18 12:13           ` Trond Myklebust
2003-05-15 14:26 Lever, Charles
2003-05-15 14:41 ` Trond Myklebust
2003-05-15 15:16 ` Steve Dickson
2003-05-15 15:34 Lever, Charles
