* nfs: server not responding, timed out
@ 2010-03-18 21:06 Dennis Nezic
0 siblings, 1 reply; 11+ messages in thread
From: Dennis Nezic @ 2010-03-18 21:06 UTC (permalink / raw)
To: linux-nfs
After upgrading my server (kernel 2.6.19 to 2.6.33, nfs-utils 1.1.0 to
1.2.1/1.1.4/1.1.6, and probably other stuff too), and possibly my
client laptop's kernel, I have suddenly started to get these "server X
not responding, timed out" errors (on my client), especially (only?)
when doing large file transfers. These lead to input/output
errors, and the transfers fail.
I never noticed any such problems for over two years, using the older
versions. The networking (wifi link) hasn't changed.
Usually the file transfer trips and falls over itself near the end --
i.e. it will do 600MB out of 800MB just fine, and then suddenly start
giving these "timed out" errors, and then crash and burn. At this
point, I am forced to "umount -fl" the mount. If I then try to remount
it, the server acknowledges my "authenticated mount requests" perfectly
fine, but my client (laptop) still appears "hung". After a few minutes,
I am able to remount it.
I tried playing with the rsize/wsize/timeo/retrans variables, but none
of it seemed to fix the problem.
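One way to confirm which of these options are actually in effect is to read them back from /proc/mounts on the client. Below is a minimal Python sketch, assuming a Linux client. The helper name `parse_nfs_options` and the sample line are illustrative only and are not taken from the machines in this thread.

```python
def parse_nfs_options(mounts_text):
    """Return {mount_point: {option: value}} for NFS entries in /proc/mounts text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = {}
        for opt in fields[3].split(","):
            key, sep, value = opt.partition("=")
            # Flag-style options such as "soft" or "rw" have no value.
            options[key] = value if sep else True
        result[fields[1]] = options
    return result


# Hypothetical sample line; on a real client, pass open("/proc/mounts").read().
SAMPLE = (
    "server:/export /mnt/nfs nfs "
    "rw,vers=3,rsize=32768,wsize=32768,soft,proto=tcp,timeo=600,retrans=2 0 0\n"
)

if __name__ == "__main__":
    for mount_point, opts in parse_nfs_options(SAMPLE).items():
        wanted = {k: opts.get(k) for k in ("rsize", "wsize", "timeo", "retrans")}
        print(mount_point, "soft" if "soft" in opts else "hard", wanted)
```

Comparing this output before and after a remount shows whether a changed rsize/wsize/timeo/retrans value was really applied.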
Any ideas about what has changed? Maybe this is/was a well-known
problem? :P
* Re: nfs: server not responding, timed out
From: Bian Naimeng @ 2010-03-19 2:21 UTC (permalink / raw)
To: Dennis Nezic; +Cc: linux-nfs
I do not know what the reason is, and I am not sure the following discussion
can fix this problem, but maybe it can help you.
http://marc.info/?l=linux-nfs&m=123478426412524&w=2
Best Regards
Bian
* Re: nfs: server not responding, timed out
From: Dennis Nezic @ 2010-03-19 4:27 UTC (permalink / raw)
To: linux-nfs
On Fri, 19 Mar 2010 10:21:57 +0800, Bian Naimeng wrote:
> I do not know what the reason is, and I am not sure the following
> discussion can fix this problem, but maybe it can help you.
> http://marc.info/?l=linux-nfs&m=123478426412524&w=2
Both the patches mentioned in that thread already seem to have been
applied to my kernels. So, although the problem seems related, it
wasn't that bug in particular. The person in that thread was talking
about mounts dying after 5-15 minutes, which doesn't happen for me --
my problem only seems to occur under intense activity.
* Re: nfs: server not responding, timed out
From: Dennis Nezic @ 2010-03-19 22:10 UTC (permalink / raw)
To: linux-nfs
Hrm. I just noticed that my scp transfers are stalling -- which also
didn't use to happen with my old kernel. No error messages. FTP
transfers work fine. Eek. :S. (Despite the freezing/stalling, my
*actual* network connection works perfectly.)
Ideas?
* Re: nfs: server not responding, timed out
From: Dennis Nezic @ 2010-03-20 14:52 UTC (permalink / raw)
To: linux-nfs
Changing the mount options from "soft" to "hard" seems to
"work" -- at least the transfers eventually finish! There are
still stalls of 6-8 minutes, though, between the 16 syslog error messages
"nfs: server XYZ not responding, still trying" and the 16 subsequent
messages "nfs: server XYZ OK". The key difference is that
with "hard", it is "still trying" rather than "timed out".
Now why is it stalling for so long?
* Re: nfs: server not responding, timed out
From: Krzysztof Adamski @ 2010-03-20 15:42 UTC (permalink / raw)
To: Linux NFS Mailing list
On Sat, 2010-03-20 at 10:52 -0400, Dennis Nezic wrote:
> Now why is it stalling for so long?
I can't tell you why, but I had the same problem with NFS server in
2.6.32.*. Try 2.6.31.something to see if the problem goes away.
K
* Re: nfs: server not responding, timed out
From: Dennis Nezic @ 2010-03-20 20:28 UTC (permalink / raw)
To: linux-nfs
On Sat, 20 Mar 2010 11:42:33 -0400, Krzysztof Adamski wrote:
> I can't tell you why, but I had the same problem with NFS server in
> 2.6.32.*. Try 2.6.31.something to see if the problem goes away.
I'll try that.
(By the way, do you also access your nfs server over wifi? It wouldn't
happen to be the b43 driver on the client side? I only ask because
somehow (by setting timeo=10) I managed to get my client into a state
where the transfer (actually mplayer streaming) seemed frozen, but
the wifi activity still appeared to be streaming. Although, this
didn't happen before when timeo was the default 10 minutes, so it's
probably unrelated.)
Here is a gratuitous graph from when I tried to transfer a ~2G file from my
nfs server to my wifi nfs client. The plateaus are where it stalls (no
net traffic, although the network still works fine), usually for about
16 minutes, which includes the default timeo=600(s) plus the ~6min
delay between the "server not responding" and "OK" messages.
http://dennisn.dyndns.org/guest/pubstuff/nfs-debug/nfs-stalling-2g-file-transfer.jpg
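A note on units may help when reading the timings above: nfs(5) documents timeo in tenths of a second, so timeo=600 would correspond to 60 seconds rather than 10 minutes, and timeo=10 to 1 second. The Python sketch below does that conversion and, as a rough model only, lists the per-attempt waits under the linear backoff that nfs(5) describes for TCP (each retry waits one more timeo, capped at 600 seconds). The function names and the retrans value used are illustrative, and the number of attempts made before "not responding" is logged is an assumption here, not something confirmed for these kernels.

```python
def timeo_seconds(timeo):
    """Convert an NFS timeo mount option (tenths of a second) to seconds."""
    return timeo / 10.0


def linear_backoff_waits(timeo, retrans, cap_seconds=600.0):
    """Rough model: wait for the first attempt plus `retrans` retries.

    Assumes linear backoff (wait grows by one timeo per retry) with a cap,
    as nfs(5) describes for TCP. The attempt count is an assumption.
    """
    base = timeo_seconds(timeo)
    return [min(base * (attempt + 1), cap_seconds) for attempt in range(retrans + 1)]


if __name__ == "__main__":
    print("timeo=600 ->", timeo_seconds(600), "seconds")
    print("timeo=10  ->", timeo_seconds(10), "seconds")
    waits = linear_backoff_waits(600, 2)  # retrans=2 is an illustrative value
    print("per-attempt waits:", waits, "total:", sum(waits), "seconds")
```

Under this reading, the stall lengths would need to be compared against sums of such waits rather than against a single 10-minute timeout.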
Maybe I should also note that during the "stalls", "rpcinfo -t server
1000XY 3" (I use nfs3) all report "ready and waiting". Maybe there are
other things I can check to pinpoint the fault?
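Since rpcinfo answers during the stalls, another cheap check is to time a plain TCP connect from the client to the NFS port while a transfer is stalled. Below is a minimal Python sketch, assuming the server accepts NFS over TCP on the standard port 2049. The helper name `tcp_connect_time` is hypothetical.

```python
import socket
import time


def tcp_connect_time(host, port=2049, timeout=5.0):
    """Return the seconds taken to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts and DNS errors)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

Calling `tcp_connect_time("server")` every second or so during a stall (with "server" replaced by the real host name) and logging the result with a timestamp would show whether new connections are also delayed, or only the existing NFS traffic.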
* Re: nfs: server not responding, timed out
From: Krzysztof Adamski @ 2010-03-21 4:16 UTC (permalink / raw)
To: linux-nfs
On Sat, 2010-03-20 at 16:28 -0400, Dennis Nezic wrote:
> (By the way, do you also access your nfs server over wifi? It wouldn't
> happen to be the b43 driver on the client side? I only ask because
> somehow (by setting timeo=10) I managed to get my client in a state
> where the transfer (actually an mplayer streaming) seemed frozen, but
> the wifi activity still appeared to be streaming. Although, this
> didn't happen before when timeo was the default 10minutes, so it's
> probably unrelated.)
No, no wifi, just gigabit network.
K
* Re: nfs: server not responding, timed out
From: Dennis Nezic @ 2010-03-27 15:04 UTC (permalink / raw)
To: linux-nfs
On Sun, 21 Mar 2010 00:16:03 -0400, Krzysztof Adamski wrote:
> No, no wifi, just gigabit network.
Hrrm. With my wired ethernet connection, I haven't (yet) been able to
reproduce the problem. It looks like some kind of low-level networking
(driver) problem :\.
* Re: nfs: server not responding, timed out
From: J. Bruce Fields @ 2010-03-30 14:44 UTC (permalink / raw)
To: Dennis Nezic; +Cc: linux-nfs
On Fri, Mar 19, 2010 at 06:10:38PM -0400, Dennis Nezic wrote:
> On Fri, 19 Mar 2010 00:27:20 -0400, Dennis Nezic wrote:
> >
> > Both the patches mentioned in that thread already seem to have been
> > applied to my kernels. So, although the problem seems related, it
> > wasn't that bug in particular. The person in that thread was talking
> > about mounts dying after 5-15minutes, which doesn't happen with me --
> > my problem only seems to occur under intense activity.
>
> Hrm. I just noticed that my scp transfers are stalling -- which also
> didn't used to happen before with my old kernel. No error messages. Ftp
> transfers work fine. Eek. :S. (Despite the freezing/stalling, my
> *actual* network connection works perfectly.)
That also suggests some network problem... Is the scp problem
reproducible? Are packets getting dropped?
Also: is the kernel dumping any backtraces into the server's logs?
--b.
* Re: nfs: server not responding, timed out
From: Dennis Nezic @ 2010-03-31 15:59 UTC (permalink / raw)
To: linux-nfs
On Tue, 30 Mar 2010 10:44:38 -0400, J. Bruce Fields wrote:
> That also suggests some network problem... Is the scp problem
> reproducible? Are packets getting dropped?
The scp problem is quite reproducible -- when it decides to act up, it
quite consistently freezes/stalls at the same point (+/- a few (dozen)
MB) -- at least when testing at roughly the same time. I usually give
up after about 10 attempts. Sometimes, when it's a full moon, it will
work after ten attempts. (Restarting sshd has no effect.)
I haven't done any tcpdump yet.
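Short of a full tcpdump, the per-interface counters in /proc/net/dev on a Linux machine give a first hint about errors and drops. Below is a minimal Python sketch that parses that file's text. The sample is made up for illustration, `wlan0` is a hypothetical interface name, and `parse_net_dev` is not an existing tool.

```python
def parse_net_dev(text):
    """Return {iface: error/drop counters} parsed from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue
        values = [int(f) for f in fields[:16]]
        # Receive columns:  bytes packets errs drop fifo frame compressed multicast
        # Transmit columns: bytes packets errs drop fifo colls carrier compressed
        stats[name.strip()] = {
            "rx_errs": values[2],
            "rx_drop": values[3],
            "tx_errs": values[10],
            "tx_drop": values[11],
        }
    return stats


# Made-up sample; on a real machine, pass open("/proc/net/dev").read().
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
 wlan0: 1000 10 1 2 0 0 0 0 2000 20 3 4 0 0 0 0
"""

if __name__ == "__main__":
    for iface, counters in parse_net_dev(SAMPLE).items():
        print(iface, counters)
```

Taking one snapshot before a transfer and another during a stall, then comparing the counters, would show whether drops or errors grow while the transfer is frozen.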
I suspect the nfs stalls are also similarly reproducible -- except
it's harder to tell since it doesn't display the progress as nicely as
scp :b. However, I did notice that nfs transfers very often stall at the
very beginning (I'm not sure if it's at byte 0, or a few MB in) -- as
well as at various points in the middle. (It "feels" like a bursting
buffer problem -- I remember many times mplayer playing songs over
nfs, and having it stall before beginning the next song -- it buffered
about 32KB, but was waiting for a few more before it could start
playing. Very annoying :|.)
>
> Also: is the kernel dumping any backtraces into the server's logs?
Nothing suspicious in the system logs, even in verbose mode.
I'll try a different wifi driver soon, and see if the problem persists.