All of lore.kernel.org
* nfsd: terminating on error 104 problem
@ 2004-02-25 16:55 Johan van den Dorpe
  0 siblings, 0 replies; 5+ messages in thread
From: Johan van den Dorpe @ 2004-02-25 16:55 UTC
  To: linux-kernel

Hi all

We are currently using quite a number of HP DL380 servers within our 
company that use the 2.4.25 kernel. These are primarily used for heavy 
NFS access, so we keep a large number of nfsd processes concurrently 
running. We have noticed over time, however, that nfsd processes
periodically die. Inspecting the system logs, we find numerous
entries:

Feb 22 12:25:24 ps29 kernel: nfsd: recvfrom returned errno 104
Feb 22 12:25:24 ps29 kernel: nfsd: terminating on error 104

At the moment we have cron run a script that counts the number of nfsd
processes and restarts rpc.nfsd if the count drops below a threshold.
Although this is a working solution, it's not ideal, and we would really
like to get this problem patched up properly.
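The cron workaround described above could be sketched in C roughly as follows. This is a minimal, hypothetical example, not the script actually in use: it assumes a Linux /proc that exposes /proc/<pid>/comm (2.6.33 and later; a 2.4 kernel would need /proc/<pid>/stat instead), and `read_comm`, `count_procs_named`, `nfsd_needs_restart` and `NFSD_MIN_THREADS` are illustrative names. It only decides whether a restart is needed; actually restarting rpc.nfsd is left to the calling script.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical threshold below which the calling script restarts rpc.nfsd. */
#define NFSD_MIN_THREADS 64

/* Read /proc/<pid>/comm into out with the newline stripped. 0 on success. */
int read_comm(const char *pid, char *out, size_t n)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof path, "/proc/%s/comm", pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(out, (int)n, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	out[strcspn(out, "\n")] = '\0';
	return 0;
}

/* Count processes (kernel threads included) whose command name is `name`.
 * Returns -1 if /proc cannot be opened. */
int count_procs_named(const char *name)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	char comm[64];
	int count = 0;

	if (!proc)
		return -1;
	while ((de = readdir(proc)) != NULL) {
		/* Only numeric entries are process directories. */
		if (!isdigit((unsigned char)de->d_name[0]))
			continue;
		/* Processes that exit mid-scan simply fail to open and are skipped. */
		if (read_comm(de->d_name, comm, sizeof comm) == 0 &&
		    strcmp(comm, name) == 0)
			count++;
	}
	closedir(proc);
	return count;
}

/* Nonzero if /proc was readable and fewer than NFSD_MIN_THREADS nfsd
 * threads are running. */
int nfsd_needs_restart(void)
{
	int n = count_procs_named("nfsd");

	return n >= 0 && n < NFSD_MIN_THREADS;
}
```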

From my limited knowledge of the kernel source, I can see that
"terminating on error 104" corresponds to line 221 of
/usr/src/linux-2.4.25/fs/nfsd/nfssvc.c, so svc_recv on line 191 must be
returning -104.

I've noticed that in the 2.6.0 kernel there are quite a few changes to 
nfssvc.c, and I wondered if they dealt with this situation.

In the meantime, are there any quick hacks I could add to nfssvc.c to
make it tolerate error -104? Could I safely alter the main request loop
to simply continue if svc_recv returns this code?
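The change being asked about can be pictured with a small user-space model of the request loop. This is a hypothetical sketch, not the nfssvc.c code: `mock_svc_recv` replays a scripted list of return values standing in for svc_recv(), and `run_request_loop` either leaves the loop on any negative value (roughly the behaviour the "terminating on error" log line describes) or, when `tolerate_reset` is set, skips -ECONNRESET and keeps serving. Treating an exhausted script as -EINTR is likewise an assumption made only to end the model cleanly.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for svc_recv(): replays scripted return values.
 * A positive value means "one request received"; negative values are -errno. */
struct mock_recv {
	const int *results;
	size_t n, pos;
};

static int mock_svc_recv(struct mock_recv *m)
{
	if (m->pos == m->n)
		return -EINTR;	/* script exhausted: pretend the thread was signalled */
	return m->results[m->pos++];
}

/* Model of the nfsd main loop. Returns the number of requests served and
 * stores the (positive) errno that ended the loop in *exit_err. */
int run_request_loop(struct mock_recv *m, int tolerate_reset, int *exit_err)
{
	int served = 0;
	int err;

	for (;;) {
		err = mock_svc_recv(m);
		if (err == -ECONNRESET && tolerate_reset)
			continue;	/* ignore the reset and keep this thread alive */
		if (err < 0)
			break;		/* unmodified behaviour: any error ends the thread */
		served++;		/* a real server would dispatch the request here */
	}
	*exit_err = -err;
	return served;
}
```

With `tolerate_reset` set, the thread survives the reset instead of exiting, but the model says nothing about the reset connection itself; that socket cleanup is what Olaf Kirch's reply in this thread argues the server should really do.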

Any help would be much appreciated.

Many thanks,

-- 
Johan van den Dorpe


* nfsd: terminating on error 104 problem
@ 2004-03-04 10:41 Johan van den Dorpe
  2004-03-04 11:11 ` Olaf Kirch
  0 siblings, 1 reply; 5+ messages in thread
From: Johan van den Dorpe @ 2004-03-04 10:41 UTC
  To: nfs

Hi all

We are currently using quite a number of HP DL380 servers within our 
company that use the 2.4.25 kernel. These are primarily used for heavy 
NFS access, so we keep a large number of nfsd processes concurrently 
running. We have noticed over time, however, that single instances of nfsd
processes periodically die. Inspecting the system logs, we find
numerous entries:

Feb 22 12:25:24 ps29 kernel: nfsd: recvfrom returned errno 104
Feb 22 12:25:24 ps29 kernel: nfsd: terminating on error 104

At the moment we have cron run a script that counts the number of nfsd
processes and restarts rpc.nfsd if the count drops below a threshold.
Although this is a working solution, it's not ideal, and we would really
like to get this problem patched up properly.

From my limited knowledge of the kernel source, I can see that
"terminating on error 104" corresponds to line 221 of
/usr/src/linux-2.4.25/fs/nfsd/nfssvc.c, so svc_recv on line 191 must be
returning -104.

I've noticed that in the 2.6 kernel there are quite a few changes to 
nfssvc.c, and I wondered if they dealt with this situation.

In the meantime, are there any quick hacks I could add to nfssvc.c to
make it tolerate error -104? Could I safely alter the main request loop
to simply continue if svc_recv returns this code?

Any help would be much appreciated.

Many thanks,

-- 
Johan van den Dorpe


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* Re: nfsd: terminating on error 104 problem
  2004-03-04 10:41 Johan van den Dorpe
@ 2004-03-04 11:11 ` Olaf Kirch
  2004-03-04 12:59   ` Johan van den Dorpe
  0 siblings, 1 reply; 5+ messages in thread
From: Olaf Kirch @ 2004-03-04 11:11 UTC
  To: Johan van den Dorpe; +Cc: nfs

[-- Attachment #1: Type: text/plain, Size: 654 bytes --]

On Thu, Mar 04, 2004 at 10:41:02AM +0000, Johan van den Dorpe wrote:
> So from my limited knowledge of the kernel source I can see that 
> "terminating on error 104" corresponds to line 221 of
> /usr/src/linux-2.4.25/fs/nfsd/nfssvc.c. So svc_recv on line 191 is 
> obviously returning -104.

104 is ECONNRESET; this means the client reset the connection.
The server should really clean up the socket in this case.
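The mapping from 104 to ECONNRESET can be checked in user space on Linux. This minimal sketch (the function name is hypothetical) connects over loopback, has the client close with SO_LINGER enabled and a zero timeout, which makes the kernel send a RST instead of a normal FIN, and then shows that the server's recv() fails with ECONNRESET. It illustrates the error code only, not how nfsd's sockets reach that state. The value 104 holds on most Linux architectures; other systems number ECONNRESET differently.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno that recv() reports on the server side after the client
 * aborts the connection with a RST, 0 if recv() did not fail, or -1 if the
 * loopback setup failed (descriptors are not closed on that path). */
int recv_errno_after_peer_reset(void)
{
	struct sockaddr_in addr;
	socklen_t alen = sizeof addr;
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };
	char buf[16];
	int lsn, cli, srv, result = 0;

	memset(&addr, 0, sizeof addr);
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;		/* let the kernel pick a free port */

	lsn = socket(AF_INET, SOCK_STREAM, 0);
	if (lsn < 0 || bind(lsn, (struct sockaddr *)&addr, sizeof addr) < 0 ||
	    listen(lsn, 1) < 0 ||
	    getsockname(lsn, (struct sockaddr *)&addr, &alen) < 0)
		return -1;

	cli = socket(AF_INET, SOCK_STREAM, 0);
	if (cli < 0 || connect(cli, (struct sockaddr *)&addr, sizeof addr) < 0)
		return -1;
	srv = accept(lsn, NULL, NULL);
	if (srv < 0)
		return -1;

	/* Linger enabled with a zero timeout: close() resets the connection. */
	setsockopt(cli, SOL_SOCKET, SO_LINGER, &lg, sizeof lg);
	close(cli);

	/* Blocks until the RST arrives if it has not been processed yet. */
	if (recv(srv, buf, sizeof buf, 0) < 0)
		result = errno;

	close(srv);
	close(lsn);
	return result;
}
```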

Does the attached patch help? 

Beware - totally untested, better not try this on a production
machine :)

Olaf
-- 
Olaf Kirch     |  Stop wasting entropy - start using predictable
okir@suse.de   |  tempfile names today!
---------------+ 

[-- Attachment #2: nfsd-handle-econnreset --]
[-- Type: text/plain, Size: 483 bytes --]

--- svcsock.c.orig	2004-03-04 12:08:51.000000000 +0100
+++ svcsock.c	2004-03-04 12:10:08.000000000 +0100
@@ -904,6 +904,10 @@
 	if (len == -EAGAIN) {
 		dprintk("RPC: TCP recvfrom got EAGAIN\n");
 		svc_sock_received(svsk);
+	} else if (len == -ECONNRESET) {
+		dprintk("RPC: TCP recvfrom got ECONNRESET\n");
+		svc_sock_received(svsk);
+		set_bit(SK_DEAD, &svsk->sk_flags);
 	} else {
 		printk(KERN_NOTICE "%s: recvfrom returned errno %d\n",
 					svsk->sk_server->sv_name, -len);
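The branch added by the patch above can be modelled in user space as a small classifier. This is a hypothetical sketch, not kernel code: `struct fake_svc_sock`, `FAKE_SK_DEAD`, `received_calls` and `classify_recv_error` are illustrative names, and a plain bitwise OR stands in for the kernel's atomic set_bit(). It mirrors only the three cases visible in the hunk: -EAGAIN calls svc_sock_received(), -ECONNRESET calls it and also marks the socket dead, and anything else is logged.

```c
#include <errno.h>
#include <stdio.h>

/* Illustrative bit number standing in for the kernel's SK_DEAD flag. */
#define FAKE_SK_DEAD 0

struct fake_svc_sock {
	unsigned long flags;
	int received_calls;	/* how often svc_sock_received() would have run */
};

enum recv_action { RECV_RETRY, RECV_DEAD, RECV_LOGGED };

/* Mirrors the if / else-if / else chain from the patched error path. */
enum recv_action classify_recv_error(struct fake_svc_sock *svsk, int len)
{
	if (len == -EAGAIN) {
		svsk->received_calls++;
		return RECV_RETRY;
	} else if (len == -ECONNRESET) {
		svsk->received_calls++;
		svsk->flags |= 1UL << FAKE_SK_DEAD;	/* set_bit() stand-in */
		return RECV_DEAD;
	}
	fprintf(stderr, "nfsd: recvfrom returned errno %d\n", -len);
	return RECV_LOGGED;
}
```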


* Re: nfsd: terminating on error 104 problem
  2004-03-04 11:11 ` Olaf Kirch
@ 2004-03-04 12:59   ` Johan van den Dorpe
  2004-03-05 18:01     ` Johan van den Dorpe
  0 siblings, 1 reply; 5+ messages in thread
From: Johan van den Dorpe @ 2004-03-04 12:59 UTC
  To: Olaf Kirch; +Cc: nfs

Olaf Kirch wrote:
> On Thu, Mar 04, 2004 at 10:41:02AM +0000, Johan van den Dorpe wrote:
> 
>>So from my limited knowledge of the kernel source I can see that 
>>"terminating on error 104" corresponds to line 221 of
>>/usr/src/linux-2.4.25/fs/nfsd/nfssvc.c. So svc_recv on line 191 is 
>>obviously returning -104.
> 
> 
> 104 is ECONNRESET, this means the client reset the connection.
> The server should really clean up the socket in this case.
> 
> Does the attached patch help? 
> 
> Beware - totally untested, better not try this on a production
> machine :)

Thanks very much for this. I'm going to run a test with this patch and
get back to you with the results.


-- 
Johan van den Dorpe



* Re: nfsd: terminating on error 104 problem
  2004-03-04 12:59   ` Johan van den Dorpe
@ 2004-03-05 18:01     ` Johan van den Dorpe
  0 siblings, 0 replies; 5+ messages in thread
From: Johan van den Dorpe @ 2004-03-05 18:01 UTC
  To: Johan van den Dorpe; +Cc: Olaf Kirch, nfs

I tried out the patch today and rebooted into the new kernel at 9:42am.

Then we started getting these messages (which we had been seeing before,
but which hadn't caused any noticeable problems):

Mar  5 10:05:00 ps30 kernel: nfsd: peername failed (err 107)!
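Applying the same decoding that Olaf Kirch used for 104, err 107 on Linux is ENOTCONN ("Transport endpoint is not connected"), the error getpeername() returns when a socket has no connected peer. The quick check below (the function name is hypothetical) shows that value on a fresh, unconnected TCP socket; it does not explain why nfsd hits the error.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the errno from getpeername() on a fresh, unconnected TCP socket,
 * 0 if the call unexpectedly succeeded, or -1 if socket() failed. */
int peername_errno_unconnected(void)
{
	struct sockaddr_in peer;
	socklen_t len = sizeof peer;
	int fd = socket(AF_INET, SOCK_STREAM, 0);
	int result = 0;

	if (fd < 0)
		return -1;
	if (getpeername(fd, (struct sockaddr *)&peer, &len) < 0)
		result = errno;
	close(fd);
	return result;
}
```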

After a few hours, we started getting the "terminating on error 104" errors:

Mar  5 13:56:49 ps30 kernel: nfsd: terminating on error 104

Finally, the server crashed and stopped accepting input from the console.
There were no messages on the console apart from the above nfsd errors.
This is the last syslog entry:

Mar  5 17:10:10 ps30 rpc.mountd: Caught signal 15, un-registering and exiting.
Mar  5 17:10:10 ps30 nfs: rpc.mountd shutdown succeeded

While the server was up, we received 112 "peername failed (err 107)"
messages and 5 "terminating on error 104" messages.

If you want any more information, please tell me what you need.

Otherwise have a good weekend!

Thanks,
Johan.

Johan van den Dorpe wrote:
> Olaf Kirch wrote:
> 
>> On Thu, Mar 04, 2004 at 10:41:02AM +0000, Johan van den Dorpe wrote:
>>
>>> So from my limited knowledge of the kernel source I can see that 
>>> "terminating on error 104" corresponds to line 221 of
>>> /usr/src/linux-2.4.25/fs/nfsd/nfssvc.c. So svc_recv on line 191 is 
>>> obviously returning -104.
>>
>>
>>
>> 104 is ECONNRESET, this means the client reset the connection.
>> The server should really clean up the socket in this case.
>>
>> Does the attached patch help?
>> Beware - totally untested, better not try this on a production
>> machine :)
> 
> 
> Thanks very much for this... I'm going to run a test with this patch and 
> get back to you with results after.
> 
> 


-- 
Johan van den Dorpe



end of thread

Thread overview: 5+ messages
2004-02-25 16:55 nfsd: terminating on error 104 problem Johan van den Dorpe
  -- strict thread matches above, loose matches on Subject: below --
2004-03-04 10:41 Johan van den Dorpe
2004-03-04 11:11 ` Olaf Kirch
2004-03-04 12:59   ` Johan van den Dorpe
2004-03-05 18:01     ` Johan van den Dorpe
