* Timeout issue (similar to bugs 11061 and 11154), bisected
@ 2009-02-16 11:11 Arto Jantunen
2009-02-16 13:04 ` Trond Myklebust
0 siblings, 1 reply; 3+ messages in thread
From: Arto Jantunen @ 2009-02-16 11:11 UTC
To: linux-nfs
(I'm not subscribed, so please CC me on any replies)
I seem to have hit an NFS bug while upgrading a machine from Debian
Etch to Debian Lenny. I have an NFS server running FreeBSD 7.0 RC1 and
a bunch of clients running Linux. The ones running kernel 2.6.18 work
perfectly, as do the ones running 2.6.24. The one I upgraded to 2.6.26
fails. After 5-15 minutes of working normally, the mount dies and I get
the usual "nfs: server <server> not responding, still trying" in
dmesg. The only way I have found to get the mount back is umount -f &&
mount; waiting does not bring it back.
I have tested quite a few different kernel versions, and every one from
2.6.25 up to last week's git tree fails in the same way. Bisecting
tracks the problem to commit
e06799f958bf7f9f8fae15f0c6f519953fb0257c
I originally thought that it was the same as bug 11154, but the
patches attached to that bug do not fix this issue.
Any thoughts, patches, ideas?
--
Arto Jantunen
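For anyone trying to spot the same stall on their own clients, here is a
minimal sketch that scans captured dmesg output for the message quoted in
the report. The function name stalled_servers and the optional
"[timestamp]" prefix are assumptions made for illustration; only the
message text itself comes from the report.

```python
import re

# Message text as quoted in the report. The optional "[ 123.456]" prefix is
# an assumption about how the local dmesg formats its lines.
STALL_RE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?nfs: server (\S+) not responding, still trying$"
)


def stalled_servers(dmesg_text):
    """Return the server names reported as not responding, in first-seen order."""
    servers = []
    for line in dmesg_text.splitlines():
        match = STALL_RE.match(line.strip())
        if match and match.group(1) not in servers:
            servers.append(match.group(1))
    return servers
```

Feeding it the text of a saved dmesg dump returns each affected server
once, which is enough to tell which mounts need the umount -f && mount
treatment.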
* Re: Timeout issue (similar to bugs 11061 and 11154), bisected
From: Trond Myklebust @ 2009-02-16 13:04 UTC
To: Arto Jantunen; +Cc: linux-nfs
That looks like the known problem with the NFS server failing to close
connections in a timely manner. There is a fix for this in
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a
There is also a client-side patch that makes the client more robust
when it hits a buggy server by having it do the equivalent of a
linger2 timeout. That patch has not yet been merged into mainline, but
I've attached it below together with a follow-up patch that makes the
timeout configurable.
Cheers
Trond
[-- Attachment #2: linux-2.6.28-100-add_tcp_linger.dif --]
[-- Type: application/x-dif, Size: 9185 bytes --]
[-- Attachment #3: linux-2.6.28-101-add_tcp_linger_sysctl.dif --]
[-- Type: application/x-dif, Size: 1904 bytes --]
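For readers unfamiliar with the term, "linger2" refers to the Linux
TCP_LINGER2 socket option, which bounds how long an orphaned socket may
sit in FIN_WAIT2 waiting for the peer to finish closing. The sketch below
is only a user-space illustration of that option on an ordinary socket. It
is not the attached kernel patch, whose contents are not shown in this
thread, and the helper name set_fin_wait2_timeout is made up for the
example. It assumes a Linux host, since TCP_LINGER2 is Linux-specific.

```python
import socket


def set_fin_wait2_timeout(sock, seconds):
    """Set TCP_LINGER2 on sock and return the value the kernel reports back.

    TCP_LINGER2 limits how long an orphaned socket stays in FIN_WAIT2,
    i.e. how long the local side waits for a peer that never sends its FIN.
    """
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2, seconds)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2)
```

Called on a fresh TCP socket with a value such as 15, the helper should
hand back the same number of seconds. Without such a bound, a peer that
never closes its end can leave the connection stuck, which seems to be
the situation the client-side patch works around.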
* Re: Timeout issue (similar to bugs 11061 and 11154), bisected
From: Arto Jantunen @ 2009-02-17 10:38 UTC
To: linux-nfs
Trond Myklebust <trond.myklebust@fys.uio.no> writes:
> That looks like the known problem with the NFS server failing to close
> connections in a timely manner. There is a fix for this in
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git&a=commitdiff&h=69b6ba3712b796a66595cfaf0a5ab4dfe1cf964a
>
> There is also a client-side patch that makes the client more robust
> when it hits a buggy server by having it do the equivalent of a
> linger2 timeout. That patch has not yet been merged into mainline, but
> I've attached it below together with a follow-up patch that makes the
> timeout configurable.
The client-side patch you attached hides the problem on the server;
after applying it, the mount sticks around. As previously discussed,
the server is running an apparently buggy version of FreeBSD, and I'd
rather not touch it right now since it is in production.
Thanks for your fast response.
--
Arto Jantunen