* problems with lockd in 2.6.22.6
@ 2007-09-07 15:49 Wolfgang Walter
2007-09-07 16:19 ` [NFS] " J. Bruce Fields
0 siblings, 1 reply; 4+ messages in thread
From: Wolfgang Walter @ 2007-09-07 15:49 UTC (permalink / raw)
To: neilb; +Cc: netdev, nfs
Hello,
we upgraded the kernel of an nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the messages
lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from ^\\236^\É^D
1) These random characters in the second line are caused by a bug in svc_tcp_accept.
I already posted this patch on netdev@vger.kernel.org:
Signed-off-by: Wolfgang Walter <wolfgang.walter@studentenwerk.mhn.de>
--- linux-2.6.22.6/net/sunrpc/svcsock.c 2007-08-27 18:10:14.000000000 +0200
+++ linux-2.6.22.6w/net/sunrpc/svcsock.c 2007-09-03 18:27:30.000000000 +0200
@@ -1090,7 +1090,7 @@
 			serv->sv_name);
 		printk(KERN_NOTICE
 			"%s: last TCP connect from %s\n",
-			serv->sv_name, buf);
+			serv->sv_name, __svc_print_addr(sin, buf, sizeof(buf)));
 	}
 	/*
 	 * Always select the oldest socket. It's not fair,
With this patch applied, one gets something like:
lockd: too many open TCP sockets, consider increasing the number of nfsd threads
lockd: last TCP connect from 10.11.0.12, port=784
2) The number of nfsd threads we are running on the machine is 1024, so this is not
the problem. It seems, though, that in the case of lockd, svc_tcp_accept does not
check the number of nfsd threads but the number of lockd threads, which is one.
As soon as the number of open lockd sockets surpasses 80, this message gets logged.
This usually happens every evening when a lot of people shut down their workstations.
3) For an unknown reason these sockets then remain open. In the morning when people
start their workstations again, we therefore not only get a lot of these messages
again, but often the nfs-server does not work properly any more. Restarting the
nfs-daemon is a workaround.
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [NFS] problems with lockd in 2.6.22.6
2007-09-07 15:49 problems with lockd in 2.6.22.6 Wolfgang Walter
@ 2007-09-07 16:19 ` J. Bruce Fields
2007-09-07 18:05 ` Wolfgang Walter
2007-09-08 18:20 ` [NFS] " Wolfgang Walter
0 siblings, 2 replies; 4+ messages in thread
From: J. Bruce Fields @ 2007-09-07 16:19 UTC (permalink / raw)
To: Wolfgang Walter; +Cc: neilb, netdev, nfs
On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> Hello,
>
> we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since then we get the message
>
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads
> lockd: last TCP connect from ^\\236^\É^D
>
> 1) These random characters in the second line are caused by a bug in svc_tcp_accept.
> I already posted this patch on netdev@vger.kernel.org:
Thanks, I've applied that. (The bug is a little subtle: there are
actually two previous __svc_print_addr() calls that might have
initialized "buf" correctly, and it's not obvious that the second isn't
always called, since it's in a dprintk, which is a macro that expands
into a printk inside a conditional.)
> with this patch applied one gets something like
>
> lockd: too many open TCP sockets, consider increasing the number of nfsd threads
> lockd: last TCP connect from 10.11.0.12, port=784
>
>
> 2) The number of nfsd threads we are running on the machine is 1024.
> So this is not the problem. It seems, though, that in the case of
> lockd svc_tcp_accept does not check the number of nfsd threads but the
> number of lockd threads, which is one. As soon as the number of open
> lockd sockets surpasses 80, this message gets logged. This usually
> happens every evening when a lot of people shut down their workstations.
So to be clear: there's not an actual problem here other than that the
logs are getting spammed? (Not that that isn't a problem in itself.)
> 3) For an unknown reason these sockets then remain open. In the morning
> when people start their workstations again, we therefore not only get a
> lot of these messages again, but often the nfs-server does not work
> properly any more. Restarting the nfs-daemon is a workaround.
Hm, thanks.
--b.
* Re: problems with lockd in 2.6.22.6
2007-09-07 16:19 ` [NFS] " J. Bruce Fields
@ 2007-09-07 18:05 ` Wolfgang Walter
2007-09-08 18:20 ` [NFS] " Wolfgang Walter
1 sibling, 0 replies; 4+ messages in thread
From: Wolfgang Walter @ 2007-09-07 18:05 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: neilb, netdev, nfs
On Friday, 7 September 2007 18:19, you wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > we upgraded the kernel of a nfs-server from 2.6.17.11 to 2.6.22.6. Since
> > then we get the message
> >
> > lockd: too many open TCP sockets, consider increasing the number of nfsd threads
> > lockd: last TCP connect from ^\\236^\É^D
> >
> > 2) The number of nfsd threads we are running on the machine is 1024.
> > So this is not the problem. It seems, though, that in the case of
> > lockd svc_tcp_accept does not check the number of nfsd threads but the
> > number of lockd threads, which is one. As soon as the number of open
> > lockd sockets surpasses 80, this message gets logged. This usually
> > happens every evening when a lot of people shut down their workstations.
>
> So to be clear: there's not an actual problem here other than that the
> logs are getting spammed? (Not that that isn't a problem in itself.)
>
If more than 80 nfs clients tried to lock files at the same time, then
there probably would be.
> > 3) For an unknown reason these sockets then remain open. In the morning
> > when people start their workstations again, we therefore not only get a
> > lot of these messages again, but often the nfs-server does not work
> > properly any more. Restarting the nfs-daemon is a workaround.
>
> Hm, thanks.
>
I don't know if the lockd thing is the reason, though.
2.6.22.6 per se runs stably (no oops, no crash, etc.), but kernel nfs seems
to be a little unstable. 2.6.17.11 ran for months without any nfsd-related
problems, whereas with 2.6.22.6 nfs needs to be restarted almost every day.
Sometimes this fails with
lockd_down: lockd failed to exit, clearing pid
nfsd: last server has exited
nfsd: unexporting all filesystems
lockd_up: makesock failed, error=-98
after which the server must be rebooted.
I think there is something wrong with lockd, because there are no problems during
the day. The trouble starts in the morning, when a lot of people log into their
machines and start their desktops (I think KDE locks its config files when it reads them).
Regards
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
* Re: [NFS] problems with lockd in 2.6.22.6
2007-09-07 16:19 ` [NFS] " J. Bruce Fields
2007-09-07 18:05 ` Wolfgang Walter
@ 2007-09-08 18:20 ` Wolfgang Walter
1 sibling, 0 replies; 4+ messages in thread
From: Wolfgang Walter @ 2007-09-08 18:20 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: neilb, netdev, nfs
On Friday 07 September 2007, J. Bruce Fields wrote:
> On Fri, Sep 07, 2007 at 05:49:55PM +0200, Wolfgang Walter wrote:
> > Hello,
> >
> > 3) For an unknown reason these sockets then remain open. In the morning
> > when people start their workstations again, we therefore not only get a
> > lot of these messages again, but often the nfs-server does not work
> > properly any more. Restarting the nfs-daemon is a workaround.
>
I wonder why these sockets remain open, by the way, even when they aren't used
for days. Such a socket only gets deleted when the 81st socket must be opened.
If I do not misunderstand the idea, then temporary sockets should be destroyed
by svc_age_temp_sockets after some time without activity.
Now I wonder how svc_age_temp_sockets works. Does it ever close and delete a
temporary socket at all?
static void
svc_age_temp_sockets(unsigned long closure)
{
	struct svc_serv *serv = (struct svc_serv *)closure;
	struct svc_sock *svsk;
	struct list_head *le, *next;
	LIST_HEAD(to_be_aged);

	dprintk("svc_age_temp_sockets\n");

	if (!spin_trylock_bh(&serv->sv_lock)) {
		/* busy, try again 1 sec later */
		dprintk("svc_age_temp_sockets: busy\n");
		mod_timer(&serv->sv_temptimer, jiffies + HZ);
		return;
	}

	list_for_each_safe(le, next, &serv->sv_tempsocks) {
		svsk = list_entry(le, struct svc_sock, sk_list);

		if (!test_and_set_bit(SK_OLD, &svsk->sk_flags))
			continue;
		if (atomic_read(&svsk->sk_inuse) || test_bit(SK_BUSY, &svsk->sk_flags))
			continue;

####
Doesn't this mean that svsk->sk_inuse must be zero here, which means that SK_DEAD is set?
And wouldn't that mean that svc_delete_socket has already been called for that socket
(and it probably is already closed)?
And wouldn't that mean that the svc_sock_enqueue call further down does not make any
sense (it checks for SK_DEAD)?
####

		atomic_inc(&svsk->sk_inuse);
		list_move(le, &to_be_aged);
		set_bit(SK_CLOSE, &svsk->sk_flags);
		set_bit(SK_DETACHED, &svsk->sk_flags);
	}
	spin_unlock_bh(&serv->sv_lock);

	while (!list_empty(&to_be_aged)) {
		le = to_be_aged.next;
		/* fiddling the sk_list node is safe 'cos we're SK_DETACHED */
		list_del_init(le);
		svsk = list_entry(le, struct svc_sock, sk_list);

		dprintk("queuing svsk %p for closing, %lu seconds old\n",
			svsk, get_seconds() - svsk->sk_lastrecv);

		/* a thread will dequeue and close it soon */
		svc_sock_enqueue(svsk);
		svc_sock_put(svsk);
	}

	mod_timer(&serv->sv_temptimer, jiffies + svc_conn_age_period * HZ);
}
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts