* nfsd: terminating on error 104 problem
@ 2004-02-25 16:55 Johan van den Dorpe
0 siblings, 0 replies; 5+ messages in thread
From: Johan van den Dorpe @ 2004-02-25 16:55 UTC (permalink / raw)
To: linux-kernel
Hi all
We are currently using quite a number of HP DL380 servers within our
company that use the 2.4.25 kernel. These are primarily used for heavy
NFS access, so we keep a large number of nfsd processes concurrently
running. We have noticed over time however that nfsd processes
periodically die. From inspection of the system logs, we get numerous
entries:
Feb 22 12:25:24 ps29 kernel: nfsd: recvfrom returned errno 104
Feb 22 12:25:24 ps29 kernel: nfsd: terminating on error 104
At the moment we cron a script that counts the number of nfsds and
restarts rpc.nfsd if they drop below a threshold. Although this is a
working solution, it's not ideal and we would really like to get this
problem patched up properly.
So from my limited knowledge of the kernel source I can see that
"terminating on error 104" corresponds to line 221 of
/usr/src/linux-2.4.25/fs/nfsd/nfssvc.c. So svc_recv on line 191 is
obviously returning -104.
I've noticed that in the 2.6.0 kernel there are quite a few changes to
nfssvc.c, and I wondered if they dealt with this situation.
In the meantime, are there any quick hacks I could add to nfssvc.c to
make it tolerate error -104? Could I safely alter the main request loop
to simply continue execution if svc_recv returns this code?
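As a rough illustration of the question above, the following userspace C
sketch treats a connection reset as a non-fatal result and keeps looping.
It is only an analogy under stated assumptions: classify_recv_error and
run_loop are hypothetical names, the receive results are simulated, and
nothing here shows whether the same change would be safe inside the real
nfsd request loop in nfssvc.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical: what a service loop should do with a receive result. */
enum loop_action { LOOP_CONTINUE, LOOP_EXIT };

/*
 * Classify a negative errno-style return value.
 * Assumption for this sketch: only "try again" and "peer reset the
 * connection" are tolerated; every other error ends the loop.
 */
static enum loop_action classify_recv_error(int err)
{
	switch (err) {
	case -EAGAIN:		/* nothing to receive right now */
	case -ECONNRESET:	/* one client reset its connection */
		return LOOP_CONTINUE;
	default:
		return LOOP_EXIT;
	}
}

/*
 * Walk an array of simulated receive results (>= 0 means a request
 * arrived, < 0 is a negative errno). Returns the number of iterations
 * completed before a fatal error stopped the loop.
 */
static size_t run_loop(const int *results, size_t n)
{
	size_t iterations = 0;

	for (size_t i = 0; i < n; i++) {
		int r = results[i];

		if (r < 0 && classify_recv_error(r) == LOOP_EXIT)
			break;
		iterations++;
	}
	return iterations;
}
```

Given the simulated results {10, -EAGAIN, -ECONNRESET, 5, -EIO, 7},
run_loop gets through the first four entries and stops at -EIO.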
Any help would be much appreciated.
many thanks
--
Johan van den Dorpe
* nfsd: terminating on error 104 problem
@ 2004-03-04 10:41 Johan van den Dorpe
2004-03-04 11:11 ` Olaf Kirch
0 siblings, 1 reply; 5+ messages in thread
From: Johan van den Dorpe @ 2004-03-04 10:41 UTC (permalink / raw)
To: nfs
Hi all
We are currently using quite a number of HP DL380 servers within our
company that use the 2.4.25 kernel. These are primarily used for heavy
NFS access, so we keep a large number of nfsd processes concurrently
running. We have noticed over time however that single instances of nfsd
processes periodically die. From inspection of the system logs, we get
numerous entries:
Feb 22 12:25:24 ps29 kernel: nfsd: recvfrom returned errno 104
Feb 22 12:25:24 ps29 kernel: nfsd: terminating on error 104
At the moment we cron a script that counts the number of nfsds and
restarts rpc.nfsd if they drop below a threshold. Although this is a
working solution, it's not ideal and we would really like to get this
problem patched up properly.
So from my limited knowledge of the kernel source I can see that
"terminating on error 104" corresponds to line 221 of
/usr/src/linux-2.4.25/fs/nfsd/nfssvc.c. So svc_recv on line 191 is
obviously returning -104.
I've noticed that in the 2.6 kernel there are quite a few changes to
nfssvc.c, and I wondered if they dealt with this situation.
In the meantime, are there any quick hacks I could add to nfssvc.c to
make it tolerate error -104? Could I safely alter the main request loop
to simply continue execution if svc_recv returns this code?
Any help would be much appreciated.
Many thanks,
--
Johan van den Dorpe
-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: nfsd: terminating on error 104 problem
2004-03-04 10:41 Johan van den Dorpe
@ 2004-03-04 11:11 ` Olaf Kirch
2004-03-04 12:59 ` Johan van den Dorpe
0 siblings, 1 reply; 5+ messages in thread
From: Olaf Kirch @ 2004-03-04 11:11 UTC (permalink / raw)
To: Johan van den Dorpe; +Cc: nfs
[-- Attachment #1: Type: text/plain, Size: 654 bytes --]
On Thu, Mar 04, 2004 at 10:41:02AM +0000, Johan van den Dorpe wrote:
> So from my limited knowledge of the kernel source I can see that
> "terminating on error 104" corresponds to line 221 of
> /usr/src/linux-2.4.25/fs/nfsd/nfssvc.c. So svc_recv on line 191 is
> obviously returning -104.
104 is ECONNRESET, this means the client reset the connection.
The server should really clean up the socket in this case.
Does the attached patch help?
Beware - totally untested, better not try this on a production
machine :)
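The errno mapping mentioned above can be checked from userspace. This is
a minimal sketch, assuming a Linux system whose <errno.h> uses the
generic numbering (where ECONNRESET is 104); errname is a hypothetical
helper, and the ENOTCONN and EAGAIN entries are included only for
comparison.

```c
#include <errno.h>

/*
 * Hypothetical helper: turn a negative kernel-style return value into
 * its symbolic errno name. Only a few codes are listed; everything else
 * is reported as "unknown".
 */
static const char *errname(int ret)
{
	switch (-ret) {
	case ECONNRESET:	/* 104 with the generic Linux numbering */
		return "ECONNRESET";
	case ENOTCONN:		/* 107 with the generic Linux numbering */
		return "ENOTCONN";
	case EAGAIN:
		return "EAGAIN";
	default:
		return "unknown";
	}
}
```

Calling errname(-ECONNRESET) returns "ECONNRESET". The numeric values in
the comments apply to the generic Linux numbering and differ on some
other platforms.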
Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
okir@suse.de | tempfile names today!
---------------+
[-- Attachment #2: nfsd-handle-econnreset --]
[-- Type: text/plain, Size: 483 bytes --]
--- svcsock.c.orig 2004-03-04 12:08:51.000000000 +0100
+++ svcsock.c 2004-03-04 12:10:08.000000000 +0100
@@ -904,6 +904,10 @@
 	if (len == -EAGAIN) {
 		dprintk("RPC: TCP recvfrom got EAGAIN\n");
 		svc_sock_received(svsk);
+	} else if (len == -ECONNRESET) {
+		dprintk("RPC: TCP recvfrom got ECONNRESET\n");
+		svc_sock_received(svsk);
+		set_bit(SK_DEAD, &svsk->sk_flags);
 	} else {
 		printk(KERN_NOTICE "%s: recvfrom returned errno %d\n",
 					svsk->sk_server->sv_name, -len);
* Re: nfsd: terminating on error 104 problem
2004-03-04 11:11 ` Olaf Kirch
@ 2004-03-04 12:59 ` Johan van den Dorpe
2004-03-05 18:01 ` Johan van den Dorpe
0 siblings, 1 reply; 5+ messages in thread
From: Johan van den Dorpe @ 2004-03-04 12:59 UTC (permalink / raw)
To: Olaf Kirch; +Cc: nfs
Olaf Kirch wrote:
> On Thu, Mar 04, 2004 at 10:41:02AM +0000, Johan van den Dorpe wrote:
>
>>So from my limited knowledge of the kernel source I can see that
>>"terminating on error 104" corresponds to line 221 of
>>/usr/src/linux-2.4.25/fs/nfsd/nfssvc.c. So svc_recv on line 191 is
>>obviously returning -104.
>
>
> 104 is ECONNRESET, this means the client reset the connection.
> The server should really clean up the socket in this case.
>
> Does the attached patch help?
>
> Beware - totally untested, better not try this on a production
> machine :)
Thanks very much for this... I'm going to run a test with this patch and
get back to you with results after.
--
Johan van den Dorpe
* Re: nfsd: terminating on error 104 problem
2004-03-04 12:59 ` Johan van den Dorpe
@ 2004-03-05 18:01 ` Johan van den Dorpe
0 siblings, 0 replies; 5+ messages in thread
From: Johan van den Dorpe @ 2004-03-05 18:01 UTC (permalink / raw)
To: Johan van den Dorpe; +Cc: Olaf Kirch, nfs
I tried out the patch today and rebooted with the new kernel at 9:42am.
We then started getting these messages (which we had been seeing before,
but which didn't cause any noticeable problems):
Mar 5 10:05:00 ps30 kernel: nfsd: peername failed (err 107)!
After a few hours, we started getting the terminating on error 104 errors:
Mar 5 13:56:49 ps30 kernel: nfsd: terminating on error 104
Until finally the server crashed, refusing input from the console. There
were no messages on the console apart from the above nfsd errors. This
is the last syslog entry:
Mar 5 17:10:10 ps30 rpc.mountd: Caught signal 15, un-registering and exiting.
Mar 5 17:10:10 ps30 nfs: rpc.mountd shutdown succeeded
While the server was up we received 112 peername failed (err 107)
messages, and 5 terminating on error 104 messages.
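Counts like these can be produced by scanning the syslog text for the two
message strings. The following is a minimal sketch in C, assuming the
kernel messages appear exactly as quoted in this thread; struct
nfsd_tally and tally_line are hypothetical names, and the caller is
expected to feed in one log line at a time.

```c
#include <string.h>

/* Hypothetical running totals for the two nfsd messages seen here. */
struct nfsd_tally {
	int peername_failed;	/* "nfsd: peername failed (err N)!" */
	int terminating;	/* "nfsd: terminating on error N" */
};

/*
 * Update the tally for a single syslog line. Lines that contain
 * neither message are ignored.
 */
static void tally_line(struct nfsd_tally *t, const char *line)
{
	if (strstr(line, "nfsd: peername failed") != NULL)
		t->peername_failed++;
	else if (strstr(line, "nfsd: terminating on error") != NULL)
		t->terminating++;
}
```

A caller would zero-initialise a struct nfsd_tally, pass each line of the
log to tally_line, and read the two counters at the end.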
If you want any more info please tell me what it is you need.
Otherwise have a good weekend!
Thanks,
Johan.
Johan van den Dorpe wrote:
> Olaf Kirch wrote:
>
>> On Thu, Mar 04, 2004 at 10:41:02AM +0000, Johan van den Dorpe wrote:
>>
>>> So from my limited knowledge of the kernel source I can see that
>>> "terminating on error 104" corresponds to line 221 of
>>> /usr/src/linux-2.4.25/fs/nfsd/nfssvc.c. So svc_recv on line 191 is
>>> obviously returning -104.
>>
>>
>>
>> 104 is ECONNRESET, this means the client reset the connection.
>> The server should really clean up the socket in this case.
>>
>> Does the attached patch help?
>> Beware - totally untested, better not try this on a production
>> machine :)
>
>
> Thanks very much for this... I'm going to run a test with this patch and
> get back to you with results after.
>
>
--
Johan van den Dorpe