* [PATCH] nfsd: just keep single lockd reference for nfsd @ 2010-06-18 11:02 Jeff Layton 2010-06-18 14:18 ` Chris Vine 0 siblings, 1 reply; 6+ messages in thread From: Jeff Layton @ 2010-06-18 11:02 UTC (permalink / raw) To: bfields; +Cc: linux-nfs, chris-TF6qbakwsgc2epGFuHBODCp2UmYkHbXO This patch should replace the other patches that I proposed to make sure that each sv_permsock entry holds a lockd refrence. Right now, nfsd keeps a lockd reference for each socket that it has open. This is unnecessary and complicates the error handling on startup and shutdown. Change it to just do a lockd_up when creating the nfsd_serv and just do a single lockd_down when taking down the last nfsd thread. This patch also changes the error handling in nfsd_create_serv a bit too. There doesn't seem to be any need to reset the nfssvc_boot time if the nfsd startup failed. Note though that this does change the user-visible behavior slightly. Today when someone writes a text socktype and port to the portlist file prior to starting nfsd, lockd is not started when nfsd threads are brought up. With this change, it will be started any time that the nfsd_serv is created. I'm making the assumption that that's not a problem. If it is then we'll need to take a different approach to fixing this. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfsd/nfsctl.c | 10 ---------- fs/nfsd/nfssvc.c | 25 ++++++++++--------------- 2 files changed, 10 insertions(+), 25 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 9e8645a..b1c5be8 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -949,15 +949,8 @@ static ssize_t __write_ports_addfd(char *buf) if (err != 0) return err; - err = lockd_up(); - if (err != 0) { - svc_destroy(nfsd_serv); - return err; - } - err = svc_addsock(nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT); if (err < 0) { - lockd_down(); svc_destroy(nfsd_serv); return err; } @@ -982,9 +975,6 @@ static ssize_t __write_ports_delfd(char *buf) if (nfsd_serv != NULL) len = svc_sock_names(nfsd_serv, buf, SIMPLE_TRANSACTION_LIMIT, toclose); - if (len >= 0) - lockd_down(); - kfree(toclose); return len; } diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index a06ea99..6b59d32 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -183,9 +183,7 @@ int nfsd_nrthreads(void) static void nfsd_last_thread(struct svc_serv *serv) { /* When last nfsd thread exits we need to do some clean-up */ - struct svc_xprt *xprt; - list_for_each_entry(xprt, &serv->sv_permsocks, xpt_list) - lockd_down(); + lockd_down(); nfsd_serv = NULL; nfsd_racache_shutdown(); nfs4_state_shutdown(); @@ -264,13 +262,18 @@ int nfsd_create_serv(void) nfsd_max_blksize /= 2; } + err = lockd_up(); + if (err != 0) + return err; + nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, nfsd_last_thread, nfsd, THIS_MODULE); - if (nfsd_serv == NULL) - err = -ENOMEM; - else - set_max_drc(); + if (nfsd_serv == NULL) { + lockd_down(); + return -ENOMEM; + } + set_max_drc(); do_gettimeofday(&nfssvc_boot); /* record boot time */ return err; } @@ -286,19 +289,11 @@ static int nfsd_init_socks(int port) if (error < 0) return error; - error = lockd_up(); - if (error < 0) - return error; - error = svc_create_xprt(nfsd_serv, "tcp", PF_INET, port, SVC_SOCK_DEFAULTS); if (error < 0) return error; - error = lockd_up(); - if (error < 0) - return error; - return 0; } -- 1.5.5.6 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] nfsd: just keep single lockd reference for nfsd 2010-06-18 11:02 [PATCH] nfsd: just keep single lockd reference for nfsd Jeff Layton @ 2010-06-18 14:18 ` Chris Vine [not found] ` <20100618151808.6e40a0ca-PlFwYbz7hoIyinQH9hAyzg@public.gmane.org> 0 siblings, 1 reply; 6+ messages in thread From: Chris Vine @ 2010-06-18 14:18 UTC (permalink / raw) To: Jeff Layton; +Cc: bfields, linux-nfs On Fri, 18 Jun 2010 07:02:20 -0400 Jeff Layton <jlayton@redhat.com> wrote: > > This patch should replace the other patches that I proposed to make > sure that each sv_permsock entry holds a lockd refrence. > > Right now, nfsd keeps a lockd reference for each socket that it has > open. This is unnecessary and complicates the error handling on > startup and shutdown. Change it to just do a lockd_up when creating > the nfsd_serv and just do a single lockd_down when taking down the > last nfsd thread. > > This patch also changes the error handling in nfsd_create_serv a > bit too. There doesn't seem to be any need to reset the nfssvc_boot > time if the nfsd startup failed. > > Note though that this does change the user-visible behavior slightly. > Today when someone writes a text socktype and port to the portlist > file prior to starting nfsd, lockd is not started when nfsd threads > are brought up. With this change, it will be started any time that > the nfsd_serv is created. I'm making the assumption that that's not a > problem. If it is then we'll need to take a different approach to > fixing this. [snip] With this (and all the other patches in nfsd-error) applied, this eliminates the kernel bug/oops. However, on my netbook nfsd now always hangs when starting up, no matter how much in advance I start portmap. (The race condition has been traded for a hang in very case.) dmesg reports this: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory NFSD: starting 90-second grace period [... hang here... ] [... continuing after killall rpc.nfsd ...] svc: failed to register lockdv1 RPC service (errno 512). lockd_up: makesock failed, error=-512 portmap is definitely running. 'ps axc | grep rpc' gives: 1767 ? Ss 0:00 rpc.portmap 1771 ? Ss 0:00 rpc.statd 2412 ? S 0:00 rpciod/0 2413 ? S 0:00 rpciod/1 3075 ? Ss 0:00 rpc.rquotad Chris ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20100618151808.6e40a0ca-PlFwYbz7hoIyinQH9hAyzg@public.gmane.org>]
* Re: [PATCH] nfsd: just keep single lockd reference for nfsd [not found] ` <20100618151808.6e40a0ca-PlFwYbz7hoIyinQH9hAyzg@public.gmane.org> @ 2010-06-18 14:30 ` Jeff Layton [not found] ` <20100618103042.080aa597-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org> 0 siblings, 1 reply; 6+ messages in thread From: Jeff Layton @ 2010-06-18 14:30 UTC (permalink / raw) To: Chris Vine; +Cc: bfields, linux-nfs On Fri, 18 Jun 2010 15:18:08 +0100 Chris Vine <chris-TF6qbakwsgc2epGFuHBODCp2UmYkHbXO@public.gmane.org> wrote: > On Fri, 18 Jun 2010 07:02:20 -0400 > Jeff Layton <jlayton@redhat.com> wrote: > > > > > This patch should replace the other patches that I proposed to make > > sure that each sv_permsock entry holds a lockd refrence. > > > > Right now, nfsd keeps a lockd reference for each socket that it has > > open. This is unnecessary and complicates the error handling on > > startup and shutdown. Change it to just do a lockd_up when creating > > the nfsd_serv and just do a single lockd_down when taking down the > > last nfsd thread. > > > > This patch also changes the error handling in nfsd_create_serv a > > bit too. There doesn't seem to be any need to reset the nfssvc_boot > > time if the nfsd startup failed. > > > > Note though that this does change the user-visible behavior slightly. > > Today when someone writes a text socktype and port to the portlist > > file prior to starting nfsd, lockd is not started when nfsd threads > > are brought up. With this change, it will be started any time that > > the nfsd_serv is created. I'm making the assumption that that's not a > > problem. If it is then we'll need to take a different approach to > > fixing this. > > [snip] > > With this (and all the other patches in nfsd-error) applied, this > eliminates the kernel bug/oops. > > However, on my netbook nfsd now always hangs when starting up, no matter > how much in advance I start portmap. (The race condition has been > traded for a hang in very case.) > > dmesg reports this: > > NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory > NFSD: starting 90-second grace period > [... hang here... ] > [... continuing after killall rpc.nfsd ...] > svc: failed to register lockdv1 RPC service (errno 512). > lockd_up: makesock failed, error=-512 > > portmap is definitely running. 'ps axc | grep rpc' gives: > > 1767 ? Ss 0:00 rpc.portmap > 1771 ? Ss 0:00 rpc.statd > 2412 ? S 0:00 rpciod/0 > 2413 ? S 0:00 rpciod/1 > 3075 ? Ss 0:00 rpc.rquotad > > Chris > Thanks for testing them. No oops == improvement! ...but it would still be good to know what's wrong here. It sounds like something is really odd with loopback communications on this box. Is the ipv4 loopback interface up at this time? Do you have any iptables stuff set up that might be filtering out portmap registration requests from the kernel? What happens if you run "rpcinfo"? Does it also hang? The kernel uses TCP for talking to portmap these days, so it might also be good to see whether you can use rpcinfo to talk to it with TCP too... Cheers, -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20100618103042.080aa597-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>]
* Re: [PATCH] nfsd: just keep single lockd reference for nfsd [not found] ` <20100618103042.080aa597-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org> @ 2010-06-18 15:04 ` Chris Vine [not found] ` <20100618160411.14d65b2b-PlFwYbz7hoIyinQH9hAyzg@public.gmane.org> 0 siblings, 1 reply; 6+ messages in thread From: Chris Vine @ 2010-06-18 15:04 UTC (permalink / raw) To: Jeff Layton; +Cc: bfields, linux-nfs On Fri, 18 Jun 2010 10:30:42 -0400 Jeff Layton <jlayton@redhat.com> wrote: [snip] > Thanks for testing them. No oops == improvement! > > ...but it would still be good to know what's wrong here. It sounds > like something is really odd with loopback communications on this > box. Is the ipv4 loopback interface up at this time? Do you have any > iptables stuff set up that might be filtering out portmap > registration requests from the kernel? What happens if you run > "rpcinfo"? Does it also hang? > > The kernel uses TCP for talking to portmap these days, so it might > also be good to see whether you can use rpcinfo to talk to it with > TCP too... You are right. Although the lo interface is up, localhost is broken in 2.6.35-rc3. External networking works OK which is why I hadn't noticed. It looks like some other unrelated bug in 2.6.35 is at work. Quite why it doesn't strike my Pentium machine I can't say. The netbook takes its network address from DHCP (my Pentium does not) but that shouldn't make any difference to localhost. I shall do a bit more looking around later today or tomorrow. Chris ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20100618160411.14d65b2b-PlFwYbz7hoIyinQH9hAyzg@public.gmane.org>]
* Re: [PATCH] nfsd: just keep single lockd reference for nfsd [not found] ` <20100618160411.14d65b2b-PlFwYbz7hoIyinQH9hAyzg@public.gmane.org> @ 2010-06-18 15:11 ` Jeff Layton [not found] ` <20100618111132.20fb518a-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org> 0 siblings, 1 reply; 6+ messages in thread From: Jeff Layton @ 2010-06-18 15:11 UTC (permalink / raw) To: Chris Vine; +Cc: bfields, linux-nfs On Fri, 18 Jun 2010 16:04:11 +0100 Chris Vine <chris-TF6qbakwsgc2epGFuHBODCp2UmYkHbXO@public.gmane.org> wrote: > On Fri, 18 Jun 2010 10:30:42 -0400 > Jeff Layton <jlayton@redhat.com> wrote: > [snip] > > Thanks for testing them. No oops == improvement! > > > > ...but it would still be good to know what's wrong here. It sounds > > like something is really odd with loopback communications on this > > box. Is the ipv4 loopback interface up at this time? Do you have any > > iptables stuff set up that might be filtering out portmap > > registration requests from the kernel? What happens if you run > > "rpcinfo"? Does it also hang? > > > > The kernel uses TCP for talking to portmap these days, so it might > > also be good to see whether you can use rpcinfo to talk to it with > > TCP too... > > You are right. Although the lo interface is up, localhost is broken in > 2.6.35-rc3. External networking works OK which is why I hadn't noticed. > > It looks like some other unrelated bug in 2.6.35 is at work. Quite why > it doesn't strike my Pentium machine I can't say. The netbook takes its > network address from DHCP (my Pentium does not) but that shouldn't make > any difference to localhost. > > I shall do a bit more looking around later today or tomorrow. > > Chris > > Ok, good to know. I'll still plan to pursue these fixes. The error handling in nfsd obviously an area that needs improvement. I think Bruce's current plan is to take the earlier patches in that stack for 2.6.36. If this one is OK, it's probably reasonable to go in at the same time. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 6+ messages in thread
[parent not found: <20100618111132.20fb518a-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>]
* Re: [PATCH] nfsd: just keep single lockd reference for nfsd [not found] ` <20100618111132.20fb518a-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org> @ 2010-06-18 22:45 ` Chris Vine 0 siblings, 0 replies; 6+ messages in thread From: Chris Vine @ 2010-06-18 22:45 UTC (permalink / raw) To: Jeff Layton; +Cc: bfields, linux-nfs On Fri, 18 Jun 2010 11:11:32 -0400 Jeff Layton <jlayton@redhat.com> wrote: > On Fri, 18 Jun 2010 16:04:11 +0100 > Chris Vine <chris-TF6qbakwsgc2epGFuHBODCp2UmYkHbXO@public.gmane.org> wrote: > > > On Fri, 18 Jun 2010 10:30:42 -0400 > > Jeff Layton <jlayton@redhat.com> wrote: > > [snip] > > > Thanks for testing them. No oops == improvement! > > > > > > ...but it would still be good to know what's wrong here. It sounds > > > like something is really odd with loopback communications on this > > > box. Is the ipv4 loopback interface up at this time? Do you have > > > any iptables stuff set up that might be filtering out portmap > > > registration requests from the kernel? What happens if you run > > > "rpcinfo"? Does it also hang? > > > > > > The kernel uses TCP for talking to portmap these days, so it might > > > also be good to see whether you can use rpcinfo to talk to it with > > > TCP too... > > > > You are right. Although the lo interface is up, localhost is broken > > in 2.6.35-rc3. External networking works OK which is why I hadn't > > noticed. > > > > It looks like some other unrelated bug in 2.6.35 is at work. Quite > > why it doesn't strike my Pentium machine I can't say. The netbook > > takes its network address from DHCP (my Pentium does not) but that > > shouldn't make any difference to localhost. > > > > I shall do a bit more looking around later today or tomorrow. > > > > Chris > > > > > > Ok, good to know. I'll still plan to pursue these fixes. The error > handling in nfsd obviously an area that needs improvement. I think > Bruce's current plan is to take the earlier patches in that stack for > 2.6.36. If this one is OK, it's probably reasonable to go in at the > same time. The problem is indeed a loopback bug and for anyone else suffering this with nfsd, the fix is a one liner, here: http://marc.info/?l=linux-kernel&m=127646140827821 At least this had the benefit of improving the error handling of nfsd! Chris ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-06-18 22:49 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-06-18 11:02 [PATCH] nfsd: just keep single lockd reference for nfsd Jeff Layton
2010-06-18 14:18 ` Chris Vine
[not found] ` <20100618151808.6e40a0ca-PlFwYbz7hoIyinQH9hAyzg@public.gmane.org>
2010-06-18 14:30 ` Jeff Layton
[not found] ` <20100618103042.080aa597-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2010-06-18 15:04 ` Chris Vine
[not found] ` <20100618160411.14d65b2b-PlFwYbz7hoIyinQH9hAyzg@public.gmane.org>
2010-06-18 15:11 ` Jeff Layton
[not found] ` <20100618111132.20fb518a-4QP7MXygkU+dMjc06nkz3ljfA9RmPOcC@public.gmane.org>
2010-06-18 22:45 ` Chris Vine
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox