* NFS client problems in 2.4.18 to 2.4.20
@ 2003-09-05 20:44 Joshua Weage
2003-09-05 21:51 ` Trond Myklebust
0 siblings, 1 reply; 13+ messages in thread
From: Joshua Weage @ 2003-09-05 20:44 UTC
To: linux-kernel
I hope this was not discussed previously; I couldn't find anything
relevant in the archives.
I am having problems with NFS clients getting stuck after reporting an
"nfs server not responding" message. The majority of the time the
mount starts working again when the NFS server load goes down.
However, sometimes the mount on one client becomes completely
unresponsive, while all of the other clients still work correctly. Even
after letting it sit for 2-3+ hours it still doesn't come back up. I can
ping the server from the locked client and that works. If I do a lazy
unmount and then remount the NFS disk it works again for a while, but
it tends to lock up again. A standard umount doesn't work when the client
is hung.
This happens with all RedHat kernel releases 2.4.18 to 2.4.20.
I have tried tuning the NFS server by going to nfs-utils 1.0.3 and by
increasing the number of nfsd's and the socket buffer sizes. I have also increased
the timeout on the clients to 2.0. One thing that seems to help is to
enable async mode on the NFS server. However, I've still seen the same
client hang with async turned on.
Machine Details:
12x Cluster nodes 2xAMD Athlon MP's, 100 MbEthernet
1x server 2xPentium III 1.13GHz, Adaptec 39160, Promise RM8000,
GigEthernet
1x Cisco 2924-T switch.
I'm running 8 CPU jobs; each CPU occasionally writes 120MB files to the
NFS disk. The client lockup always occurs during these file writes.
The lockups have occurred on several of the cluster nodes.
Any suggestions on what could be causing this?
Thanks,
Joshua Weage
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-05 20:44 NFS client problems in 2.4.18 to 2.4.20 Joshua Weage
@ 2003-09-05 21:51 ` Trond Myklebust
2003-09-06 16:29 ` Joshua Weage
0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2003-09-05 21:51 UTC
To: Joshua Weage; +Cc: linux-kernel
>>>>> " " == Joshua Weage <weage98@yahoo.com> writes:
> Any suggestions on what could be causing this?
Have you read through the sections pertaining to these problems in the
NFS HOWTO and NFS FAQ? If not, see http://nfs.sourceforge.net
Cheers,
Trond
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-05 21:51 ` Trond Myklebust
@ 2003-09-06 16:29 ` Joshua Weage
2003-09-06 17:09 ` Trond Myklebust
2003-09-10 18:37 ` Wouter Vlothuizen
0 siblings, 2 replies; 13+ messages in thread
From: Joshua Weage @ 2003-09-06 16:29 UTC
To: Trond Myklebust; +Cc: linux-kernel
I have read through them and did some tuning a few months ago. The
only thing that seems relevant to this problem is UDP buffer overflows,
but I'm using the 2.4.20 kernel. I'm using 8192 byte read and write
sizes and have increased the number of nfsd's and socket buffer size on
the server. Everything was working fine until about 4 weeks ago when
the mounts on the clients started locking up. The server is dropping
packets at times, but increasing the number of nfsd's only manages to load the disk
further - it doesn't seem to reduce timeouts. However, I don't
understand why a mount on a single client will go AWOL - and never come
back - while all the others will continue to work properly.
Are there any commands that would allow me to figure out why the mount
has stopped working? I've looked at nfsstat and the kernel seems to
have stopped sending any data to the server, or it may send one packet
every couple of seconds. If I start up another shell and try to do an
ls on the problem filesystem, the command locks up and can't be
interrupted. I think I've also mounted the same filesystem in another
location, on the same machine, and it works fine.
Josh
--- Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> >>>>> " " == Joshua Weage <weage98@yahoo.com> writes:
>
> > Any suggestions on what could be causing this?
>
> Have you read through the sections pertaining to these problems in
> the
> NFS HOWTO and NFS FAQ? If not, see http://nfs.sourceforge.net
>
> Cheers,
> Trond
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-06 16:29 ` Joshua Weage
@ 2003-09-06 17:09 ` Trond Myklebust
2003-09-06 21:22 ` Joshua Weage
2003-09-10 18:37 ` Wouter Vlothuizen
1 sibling, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2003-09-06 17:09 UTC
To: Joshua Weage; +Cc: linux-kernel
>>>>> " " == Joshua Weage <weage98@yahoo.com> writes:
> Are there any commands that would allow me to figure out why
> the mount has stopped working? I've looked at nfsstat and the
> kernel seems to have stopped sending any data to the server, or
> it may send one packet every couple of seconds. If I start up
> another shell and try to do an ls on the problem filesystem,
> the command locks up and can't be interrupted. I think I've
> also mounted the same filesystem in another location, on the
> same machine, and it works fine.
Does 'dmesg' produce any clues as to what is going on?
How about tcpdump?
Cheers,
Trond
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-06 17:09 ` Trond Myklebust
@ 2003-09-06 21:22 ` Joshua Weage
2003-09-06 23:14 ` Jamie Lokier
0 siblings, 1 reply; 13+ messages in thread
From: Joshua Weage @ 2003-09-06 21:22 UTC
To: trond.myklebust; +Cc: linux-kernel
There aren't any clues in the kernel logs, except that the kernel does
report "nfs server not responding" and never comes back with "nfs
server OK". I've enabled kernel debugging on all of the cluster nodes,
but the above message is all that I get in the logs.
I'll have to try out tcpdump the next time this happens.
Thanks,
Joshua Weage
>
> Does 'dmesg' produce any clues as to what is going on?
>
> How about tcpdump?
>
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-06 21:22 ` Joshua Weage
@ 2003-09-06 23:14 ` Jamie Lokier
2003-09-07 1:54 ` Trond Myklebust
2003-09-07 2:02 ` Trond Myklebust
0 siblings, 2 replies; 13+ messages in thread
From: Jamie Lokier @ 2003-09-06 23:14 UTC
To: Joshua Weage; +Cc: trond.myklebust, linux-kernel
Joshua Weage wrote:
> There aren't any clues in the kernel logs, except that the kernel does
> report "nfs server not responding" and never comes back with "nfs
> server OK". I've enabled kernel debugging on all of the cluster nodes,
> but the above message is all that I get in the logs.
>
> I'll have to try out tcpdump the next time this happens.
Look for lots of retransmits from the client. This might be the bug
where it adjusts the retransmit timeout to a ridiculously small
sub-millisecond value, because of a sequence of fast cached responses
from the server, then when the server responds slowly due to a disk
access the client times out within milliseconds. Repeatedly.
-- Jamie
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-06 23:14 ` Jamie Lokier
@ 2003-09-07 1:54 ` Trond Myklebust
2003-09-07 2:02 ` Trond Myklebust
1 sibling, 0 replies; 13+ messages in thread
From: Trond Myklebust @ 2003-09-07 1:54 UTC
To: Jamie Lokier; +Cc: Joshua Weage, linux-kernel
>>>>> " " == Jamie Lokier <jamie@shareable.org> writes:
> Look for lots of retransmits from the client. This might be
> the bug where it adjusts the retransmit timeout to a
> ridiculously small sub-millisecond value, because of a sequence
> of fast cached responses from the server, then when the server
> responds slowly due to a disk access the client times out
> within milliseconds. Repeatedly.
Nope. He said 2.4.18 to 2.4.20...
Cheers,
Trond
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-06 23:14 ` Jamie Lokier
2003-09-07 1:54 ` Trond Myklebust
@ 2003-09-07 2:02 ` Trond Myklebust
2003-09-07 14:27 ` Jamie Lokier
1 sibling, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2003-09-07 2:02 UTC
To: Jamie Lokier; +Cc: linux-kernel
>>>>> " " == Jamie Lokier <jamie@shareable.org> writes:
> This might be the bug where it adjusts the retransmit timeout
> to a ridiculously small sub-millisecond value, because of a
> sequence of fast cached responses from the server
BTW: this should be fixed now in 2.6.x. I've set a minimum value on
the estimated error on the round-trip time to 1/10 sec.
Cheers,
Trond
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-07 2:02 ` Trond Myklebust
@ 2003-09-07 14:27 ` Jamie Lokier
2003-09-07 15:18 ` Trond Myklebust
0 siblings, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2003-09-07 14:27 UTC
To: Trond Myklebust; +Cc: linux-kernel
Trond Myklebust wrote:
> > This might be the bug where it adjusts the retransmit timeout
> > to a ridiculously small sub-millisecond value, because of a
> > sequence of fast cached responses from the server
>
> BTW: this should be fixed now in 2.6.x. I've set a minimum value on
> the estimated error on the round-trip time to 1/10 sec.
I don't think the round-trip estimate was ever a real problem,
although setting a low bound does make sense.
A real problem is the rule of having a fixed number of retransmits
before an operation fails with a "soft" mount. This is wrong for
NFS, now that rtt is estimated dynamically.
It is wrong because NFS response times are not due to network
congestion - they are due mainly to I/O on the server, and I/O times
don't have the same properties as networks at all.
The "soft operation fail" timeout should have a minimum absolute time,
like 30 seconds or so. It should also have a maximum (for systems
where the estimated rtt is 10 seconds). This should be independent of
the rtt estimate.
Think of a worst case:
- server responds to cached requests within 10 microseconds
- uncached requests take 10 seconds to respond (spinning up CD,
seeking on tape HFS, or just ordinary disk/swap contention).
This should never timeout, as long as the server is responding within
a fixed absolute time, although it's fine to issue lots of retransmits
until that time.
The fundamental error is assuming that all NFS requests take about the
same time to serve, and delays are caused by the network. This isn't
true especially on a LAN. Delays for NFS are typically caused by I/O,
and vary by 6 orders of magnitude from request to request.
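The worst case above can be sketched numerically. This is only an illustration, not the kernel's retransmit code: simulate_soft_request and its parameters are hypothetical names, the starting timeout of 10 microseconds assumes the rto has adapted to the cached responses, and doubling the timeout on every retransmit ignores any upper cap a real client applies.

```python
def simulate_soft_request(response_time, rto, max_transmissions=None, deadline=None):
    """Model one request on a soft mount with a doubling retransmit timeout.

    response_time     -- seconds the server needs to answer this request
    rto               -- initial retransmit timeout in seconds
    max_transmissions -- give up after this many transmissions (fixed-count rule)
    deadline          -- give up once this many seconds have elapsed (absolute rule)

    Returns (succeeded, transmissions_sent).
    """
    elapsed = 0.0
    timeout = rto
    sent = 0
    while True:
        sent += 1
        # The reply arrives before the current timeout expires.
        if elapsed + timeout >= response_time:
            return True, sent
        elapsed += timeout
        timeout *= 2  # geometric backoff, uncapped in this sketch
        if max_transmissions is not None and sent >= max_transmissions:
            return False, sent
        if deadline is not None and elapsed >= deadline:
            return False, sent


# Server normally answers in 10 microseconds, but this request takes 10 seconds.
fixed_count = simulate_soft_request(10.0, 10e-6, max_transmissions=3)
absolute = simulate_soft_request(10.0, 10e-6, deadline=30.0)
print(fixed_count, absolute)
```

Under these assumptions the fixed-count rule gives up after roughly 70 microseconds, while the 30 second absolute limit lets the request complete after a series of retransmits.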
-- Jamie
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-07 14:27 ` Jamie Lokier
@ 2003-09-07 15:18 ` Trond Myklebust
2003-09-07 15:42 ` Jamie Lokier
0 siblings, 1 reply; 13+ messages in thread
From: Trond Myklebust @ 2003-09-07 15:18 UTC
To: Jamie Lokier; +Cc: Trond Myklebust, linux-kernel
>>>>> " " == Jamie Lokier <jamie@shareable.org> writes:
> A real problem is the rule of having a fixed number of
> retransmits before an operation fails with a "soft" mount.
> This is wrong for NFS, now that rtt is estimated dynamically.
> The "soft operation fail" timeout should have a minimum absolute
> time, like 30 seconds or so. It should also have a maximum
> (for systems where the estimated rtt is 10 seconds). This
> should be independent of the rtt estimate.
> - server responds to cached requests within 10 microseconds
> - uncached requests take 10 seconds to respond (spinning up CD,
> seeking on tape HFS, or just ordinary disk/swap
> contention).
This is not an issue for tapes, etc. NFS has an alternative mechanism
for dealing with this in the form of the NFSERR_JUKEBOX error.
However for disks I agree that you do have 'large' variations between
the cached and uncached case. Should latency really be much larger
than 1/10 second for a 32k read though?
> The fundamental error is assuming that all NFS requests take
> about the same time to serve, and delays are caused by the
> network. This isn't true especially on a LAN. Delays for NFS
> are typically caused by I/O, and vary by 6 orders of magnitude
> from request to request.
If it was merely a case of random error, then we wouldn't have a
problem at all. The RTT code does make an estimate of the error on the
measurement. The problem is that there is a large tail in the
graph of round trip time vs. number of events due to these disk
spinups, etc...
However retransmissions compensate somewhat because they impose a
geometric increase in the timeout value, i.e. for the first
transmission the timeout == the rto, then the retransmissions follow
2*rto, 4*rto, 8*rto,...
Part of the problem in the Linux case is therefore that we have too
low a default value for 'retrans'. The kernel default is '5' (same as
for Solaris); however, the mount program still overrides that default
with a value of '3'. This implies that for soft mounts, we never wait
longer than 15*rto before we time out (well - 7*rto actually, since the
code in xdr_adjust_timeout() appears to confuse the number of
retransmissions with the number of transmissions).
By setting 'retrans=6' (5 + 1 to compensate for the bug), therefore,
people can ensure that we retry for at least 6 seconds before timing
out. The question is: is this an adequate default?
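To make the arithmetic above concrete, here is a small sketch (not kernel code; total_wait is a hypothetical helper). It assumes the timeout doubles on every transmission with no upper cap, and it takes rto = 0.1 s for the last case, on the assumption that the 1/10 second floor mentioned earlier in the thread applies.

```python
def total_wait(rto, transmissions):
    """Total time spent waiting when each transmission's timeout doubles.

    The timeouts are rto, 2*rto, 4*rto, ..., so the sum over n
    transmissions is rto * (2**n - 1).
    """
    return rto * (2 ** transmissions - 1)


# retrans=3 as intended: 1 initial send + 3 retransmissions = 4 transmissions.
print(total_wait(1, 4))    # 15, i.e. 15*rto

# retrans=3 counted as 3 transmissions in total (the off-by-one described above).
print(total_wait(1, 3))    # 7, i.e. 7*rto

# retrans=6 counted as 6 transmissions, with an assumed rto of 0.1 s.
print(total_wait(0.1, 6))  # roughly 6.3 seconds
```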
Cheers,
Trond
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-07 15:18 ` Trond Myklebust
@ 2003-09-07 15:42 ` Jamie Lokier
2003-09-07 16:03 ` Trond Myklebust
0 siblings, 1 reply; 13+ messages in thread
From: Jamie Lokier @ 2003-09-07 15:42 UTC
To: Trond Myklebust; +Cc: linux-kernel
Trond Myklebust wrote:
> This is not an issue for tapes, etc. NFS has an alternative mechanism
> for dealing with this in the form of the NFSERR_JUKEBOX error.
Oh, cool. Perhaps the server should send these automatically, when
I/O operations are taking a little bit too long?
> However for disks I agree that you do have 'large' variations between
> the cached and uncached case. Should latency really be much larger
> than 1/10 second for a 32k read though?
Yes, it often is. I see 32k reads taking several seconds when the
disk is busy with other tasks.
> > The fundamental error is assuming that all NFS requests take
> > about the same time to serve, and delays are caused by the
> > network. This isn't true especially on a LAN. Delays for NFS
> > are typically caused by I/O, and vary by 6 orders of magnitude
> > from request to request.
>
> If it was merely a case of random error, then we wouldn't have a
> problem at all. The RTT code does make an estimate of the error on the
> measurement. The problem is that there is a large tail in the
> graph of round trip time vs. number of events due to these disk
> spinups, etc...
Yes, exactly. Though I saw the tail due to normal disk seek + access
time is quite significant, compared with cached access time.
> However retransmissions compensate somewhat because they impose a
> geometric increase in the timeout value, i.e. for the first
> transmission the timeout == the rto, then the retransmissions follow
> 2*rto, 4*rto, 8*rto,...
>
> Part of the problem in the Linux case is therefore that we have too
> low a default value for 'retrans'. The kernel default is '5' (same as
> for Solaris); however, the mount program still overrides that default
> with a value of '3'. This implies that for soft mounts, we never wait
> longer than 15*rto before we time out (well - 7*rto actually, since the
> code in xdr_adjust_timeout() appears to confuse the number of
> retransmissions with the number of transmissions).
>
> By setting 'retrans=6' (5 + 1 to compensate for the bug), therefore,
> people can ensure that we retry for at least 6 seconds before timing
> out. The question is: is this an adequate default?
That would be a big improvement. I take it you have effectively
clamped the retransmit time at a minimum of 1/10 second, then? (I
didn't understand what you meant earlier).
Last time I used a soft mount, I was seeing the first retransmit after
some time smaller than a millisecond. (I don't remember, but 0.1ms
sounds about right). If that is the retransmit time, then retrans=6
won't be enough - retrans=16 would be needed. I don't think a
correct retrans=xxx setting should depend on the network like that.
Setting a minimum retransmit time is one way to fix that.
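The retrans=16 figure can be checked with a short sketch. It is an illustration under assumptions, not the client's actual logic: min_transmissions is a hypothetical helper, the 6 second target is the figure from the quoted message, and the timeout is assumed to double on every transmission without an upper cap.

```python
def min_transmissions(rto, target):
    """Smallest number of transmissions whose doubling timeouts sum to >= target.

    With timeouts rto, 2*rto, 4*rto, ... the total after n transmissions
    is rto * (2**n - 1).
    """
    n = 1
    while rto * (2 ** n - 1) < target:
        n += 1
    return n


# First retransmit after about 0.1 ms, aiming to keep trying for 6 seconds.
print(min_transmissions(0.0001, 6.0))

# The same target when the timeout starts at 1/10 second.
print(min_transmissions(0.1, 6.0))
```

With a 0.1 ms starting timeout the answer is 16, against 6 when the timeout starts at 1/10 second, which is why the required retrans value depends so strongly on the starting timeout.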
-- Jamie
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-07 15:42 ` Jamie Lokier
@ 2003-09-07 16:03 ` Trond Myklebust
0 siblings, 0 replies; 13+ messages in thread
From: Trond Myklebust @ 2003-09-07 16:03 UTC
To: Jamie Lokier; +Cc: Trond Myklebust, linux-kernel
>>>>> " " == Jamie Lokier <jamie@shareable.org> writes:
>> This is not an issue for tapes, etc. NFS has an alternative
>> mechanism for dealing with this in the form of the
>> NFSERR_JUKEBOX error.
> Oh, cool. Perhaps the server should send these automatically,
> when I/O operations are taking a little bit too long?
Yes. Needs a patch to knfsd, but would be very useful for people that
want to export tapes, CD exchangers, etc...
>> By setting 'retrans=6' (5 + 1 to compensate for the bug),
>> therefore, people can ensure that we retry for at least 6
>> seconds before timing out. The question is: is this an adequate
>> default?
> That would be a big improvement. I take it you have
> effectively clamped the retransmit time at a minimum of 1/10
> second, then? (I didn't understand what you meant earlier).
Yes. When we calculate the timeout value, we add the estimated error*4
to the estimated round trip time. I've set a floor on the former value
so that the minimum timeout will be 1/10 second.
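A minimal sketch of that calculation, as one reading of the description above rather than the actual kernel code. The function and constant names are hypothetical, and the floor is applied to the 4*error term, which is how "the former value" is interpreted here.

```python
MIN_ERROR_TERM = 0.1  # seconds; the floor described above (assumed to apply to 4*error)


def retransmit_timeout(rtt_estimate, error_estimate, floor=MIN_ERROR_TERM):
    """Timeout = estimated round trip time + max(4 * estimated error, floor)."""
    return rtt_estimate + max(4 * error_estimate, floor)


# Fast cached responses: rtt about 10 microseconds, error about 2 microseconds.
print(retransmit_timeout(10e-6, 2e-6))             # just over 0.1 s with the floor
print(retransmit_timeout(10e-6, 2e-6, floor=0.0))  # a few microseconds without it
```

Without the floor, a run of fast cached replies drives the timeout down to microseconds; with it, the first retransmit cannot happen sooner than 1/10 second.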
> Last time I used a soft mount, I was seeing the first
> retransmit after some time smaller than a millisecond. (I
> don't remember, but 0.1ms sounds about right). If that is the
> retransmit time, then retrans=6 won't be enough - retrans=16
> would be needed. I don't think a correct retrans=xxx
> setting should depend on the network like that. Setting a
> minimum retransmit time is one way to fix that.
It is already in 2.6.0. I'm expecting to put it into 2.4.23 too, but I
want to know that this (together with a patch to 'mount' to change the
retrans default) really does solve the problem for people...
Cheers,
Trond
* Re: NFS client problems in 2.4.18 to 2.4.20
2003-09-06 16:29 ` Joshua Weage
2003-09-06 17:09 ` Trond Myklebust
@ 2003-09-10 18:37 ` Wouter Vlothuizen
1 sibling, 0 replies; 13+ messages in thread
From: Wouter Vlothuizen @ 2003-09-10 18:37 UTC
To: Joshua Weage; +Cc: linux-kernel
Joshua Weage wrote:
>
> has stopped working? I've looked at nfsstat and the kernel seems to
> have stopped sending any data to the server, or it may send one packet
> every couple of seconds. If I start up another shell and try to do an
> ls on the problem filesystem, the command locks up and can't be
> interrupted. I think I've also mounted the same filesystem in another
> location, on the same machine, and it works fine.
>
I am experiencing similar problems with 2.4.18 as a client (the NFS
server is on Solaris). When the client freezes I see the nfsstat 'client
rpc retrans' counter climbing fast. I found a rather strange way to unlock
the machine: performing a port scan with nmap from elsewhere.
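The retransmit counter that nfsstat reports can also be sampled directly to see how fast it climbs. A minimal sketch, assuming the client statistics are exposed in /proc/net/rpc/nfs with a line of the form "rpc <calls> <retrans> <authrefrsh>" (worth checking on your own kernel); the function names are hypothetical.

```python
import time


def parse_rpc_counters(text):
    """Return (calls, retrans) from the 'rpc' line of the client stats text."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "rpc":
            return int(fields[1]), int(fields[2])
    raise ValueError("no 'rpc' line found")


def retrans_rate(path="/proc/net/rpc/nfs", interval=5.0):
    """Sample the counters twice and return retransmissions per second."""
    with open(path) as f:
        _, before = parse_rpc_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        _, after = parse_rpc_counters(f.read())
    return (after - before) / interval


# Parsing an example of the assumed format (values are made up).
sample = "net 0 0 0 0\nrpc 1200 37 0\nproc3 22 0 5 0\n"
print(parse_rpc_counters(sample))
```

Calling retrans_rate() on a hung client and on a healthy one would show whether the hung client is retransmitting continuously or has gone quiet.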
BTW, which Gigabit Ethernet card do you use? I use tg3; could there be a
relation to the network card?
Cheers,
Wouter
Thread overview: 13+ messages
2003-09-05 20:44 NFS client problems in 2.4.18 to 2.4.20 Joshua Weage
2003-09-05 21:51 ` Trond Myklebust
2003-09-06 16:29 ` Joshua Weage
2003-09-06 17:09 ` Trond Myklebust
2003-09-06 21:22 ` Joshua Weage
2003-09-06 23:14 ` Jamie Lokier
2003-09-07 1:54 ` Trond Myklebust
2003-09-07 2:02 ` Trond Myklebust
2003-09-07 14:27 ` Jamie Lokier
2003-09-07 15:18 ` Trond Myklebust
2003-09-07 15:42 ` Jamie Lokier
2003-09-07 16:03 ` Trond Myklebust
2003-09-10 18:37 ` Wouter Vlothuizen