* [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. [not found] <20080619101025.24263.patches@notabene> @ 2008-06-19 0:11 ` NeilBrown 2008-06-19 1:09 ` Jeff Layton 2008-06-20 17:34 ` [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls J. Bruce Fields 0 siblings, 2 replies; 18+ messages in thread From: NeilBrown @ 2008-06-19 0:11 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, linux-kernel OCFS2 can return -ERESTARTSYS from write requests (and possibly elsewhere) if there is a signal pending. If nfsd is shutdown (by sending a signal to each thread) while there is still an IO load from the client, each thread could handle one last request with a signal pending. This can result in -ERESTARTSYS which is not understood by nfserrno() and so is reflected back to the client as nfserr_io aka -EIO. This is wrong. Instead, interpret ERESTARTSYS to mean "try again later" by returning nfserr_jukebox. The client will resend and - if the server is restarted - the write will (hopefully) be successful and everyone will be happy. The symptom that I narrowed down to this was: copy a large file via NFS to an OCFS2 filesystem, and restart the nfs server during the copy. The 'cp' might get an -EIO, and the file will be corrupted - presumably holes in the middle where writes appeared to fail. Signed-off-by: Neil Brown <neilb@suse.de> ### Diffstat output ./fs/nfsd/nfsproc.c | 1 + 1 file changed, 1 insertion(+) diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c --- .prev/fs/nfsd/nfsproc.c 2008-06-19 10:06:36.000000000 +1000 +++ ./fs/nfsd/nfsproc.c 2008-06-19 10:07:58.000000000 +1000 @@ -614,6 +614,7 @@ nfserrno (int errno) #endif { nfserr_stale, -ESTALE }, { nfserr_jukebox, -ETIMEDOUT }, + { nfserr_jukebox, -ERESTARTSYS }, { nfserr_dropit, -EAGAIN }, { nfserr_dropit, -ENOMEM }, { nfserr_badname, -ESRCH }, ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. 2008-06-19 0:11 ` [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls NeilBrown @ 2008-06-19 1:09 ` Jeff Layton 2008-06-19 2:29 ` Neil Brown 2008-06-20 17:34 ` [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls J. Bruce Fields 1 sibling, 1 reply; 18+ messages in thread From: Jeff Layton @ 2008-06-19 1:09 UTC (permalink / raw) To: NeilBrown; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Thu, 19 Jun 2008 10:11:09 +1000 NeilBrown <neilb@suse.de> wrote: > > OCFS2 can return -ERESTARTSYS from write requests (and possibly > elsewhere) if there is a signal pending. > > If nfsd is shutdown (by sending a signal to each thread) while there > is still an IO load from the client, each thread could handle one last > request with a signal pending. This can result in -ERESTARTSYS > which is not understood by nfserrno() and so is reflected back to > the client as nfserr_io aka -EIO. This is wrong. > > Instead, interpret ERESTARTSYS to mean "try again later" by returning > nfserr_jukebox. The client will resend and - if the server is > restarted - the write will (hopefully) be successful and everyone will > be happy. > > The symptom that I narrowed down to this was: > copy a large file via NFS to an OCFS2 filesystem, and restart > the nfs server during the copy. > The 'cp' might get an -EIO, and the file will be corrupted - > presumably holes in the middle where writes appeared to fail. > > > Signed-off-by: Neil Brown <neilb@suse.de> > > ### Diffstat output > ./fs/nfsd/nfsproc.c | 1 + > 1 file changed, 1 insertion(+) > > diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c > --- .prev/fs/nfsd/nfsproc.c 2008-06-19 10:06:36.000000000 +1000 > +++ ./fs/nfsd/nfsproc.c 2008-06-19 10:07:58.000000000 +1000 > @@ -614,6 +614,7 @@ nfserrno (int errno) > #endif > { nfserr_stale, -ESTALE }, > { nfserr_jukebox, -ETIMEDOUT }, > + { nfserr_jukebox, -ERESTARTSYS }, > { nfserr_dropit, -EAGAIN }, > { nfserr_dropit, -ENOMEM }, > { nfserr_badname, -ESRCH }, > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > No objection to the patch, but what signal was being sent to nfsd when you saw this? If it's anything but a SIGKILL, then I wonder if we have a race that we need to deal with. My understanding is that we have nfsd flip between 2 sigmasks to prevent anything but a SIGKILL from being delivered while we're handling the local filesystem operation. >From nfsd(): ----------[snip]----------- sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); /* * Find a socket with data available and call its * recvfrom routine. */ while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN) ; if (err < 0) break; update_thread_usage(atomic_read(&nfsd_busy)); atomic_inc(&nfsd_busy); /* Lock the export hash tables for reading. */ exp_readlock(); /* Process request with signals blocked. */ sigprocmask(SIG_SETMASK, &allowed_mask, NULL); svc_process(rqstp); ----------[snip]----------- What happens if this catches a SIGINT after the err<0 check, but before the mask is set to allowed_mask? Does svc_process() then get called with a signal pending? -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. 2008-06-19 1:09 ` Jeff Layton @ 2008-06-19 2:29 ` Neil Brown 2008-06-19 10:38 ` Jeff Layton 0 siblings, 1 reply; 18+ messages in thread From: Neil Brown @ 2008-06-19 2:29 UTC (permalink / raw) To: Jeff Layton; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Wednesday June 18, jlayton@redhat.com wrote: > > No objection to the patch, but what signal was being sent to nfsd when > you saw this? If it's anything but a SIGKILL, then I wonder if we have > a race that we need to deal with. My understanding is that we have nfsd > flip between 2 sigmasks to prevent anything but a SIGKILL from being > delivered while we're handling the local filesystem operation. SuSE /etc/init.d/nfsserver does killproc -n -KILL nfsd so it looks like a SIGKILL. > > From nfsd(): > > ----------[snip]----------- > sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > > /* > * Find a socket with data available and call its > * recvfrom routine. > */ > while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN) > ; > if (err < 0) > break; > update_thread_usage(atomic_read(&nfsd_busy)); > atomic_inc(&nfsd_busy); > > /* Lock the export hash tables for reading. */ > exp_readlock(); > > /* Process request with signals blocked. */ > sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > > svc_process(rqstp); > > ----------[snip]----------- > > What happens if this catches a SIGINT after the err<0 check, but before > the mask is set to allowed_mask? Does svc_process() then get called with > a signal pending? Yes, I suspect it does. I wonder why we have all this mucking about this signal masks anyway. Anyone have any ideas about what it actually achieves? NeilBrown ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. 2008-06-19 2:29 ` Neil Brown @ 2008-06-19 10:38 ` Jeff Layton [not found] ` <20080619063824.00ca6381-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org> 2008-06-23 0:20 ` Neil Brown 0 siblings, 2 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-19 10:38 UTC (permalink / raw) To: Neil Brown; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Thu, 19 Jun 2008 12:29:16 +1000 Neil Brown <neilb@suse.de> wrote: > On Wednesday June 18, jlayton@redhat.com wrote: > > > > No objection to the patch, but what signal was being sent to nfsd when > > you saw this? If it's anything but a SIGKILL, then I wonder if we have > > a race that we need to deal with. My understanding is that we have nfsd > > flip between 2 sigmasks to prevent anything but a SIGKILL from being > > delivered while we're handling the local filesystem operation. > > SuSE /etc/init.d/nfsserver does > > killproc -n -KILL nfsd > > so it looks like a SIGKILL. > > > > > > From nfsd(): > > > > ----------[snip]----------- > > sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > > > > /* > > * Find a socket with data available and call its > > * recvfrom routine. > > */ > > while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN) > > ; > > if (err < 0) > > break; > > update_thread_usage(atomic_read(&nfsd_busy)); > > atomic_inc(&nfsd_busy); > > > > /* Lock the export hash tables for reading. */ > > exp_readlock(); > > > > /* Process request with signals blocked. */ > > sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > > > > svc_process(rqstp); > > > > ----------[snip]----------- > > > > What happens if this catches a SIGINT after the err<0 check, but before > > the mask is set to allowed_mask? Does svc_process() then get called with > > a signal pending? > > Yes, I suspect it does. > > I wonder why we have all this mucking about this signal masks anyway. > Anyone have any ideas about what it actually achieves? > HCH asked me the same question when I did the conversion to kthreads. My interpretation (based on guesswork here) was that we wanted to distinguish between SIGKILL and other allowed signals. A SIGKILL is allowed to interrupt the underlying I/O, but other signals should not. The question to answer here, I suppose, is whether masking a pending signal is sufficient to make signal_pending() return false. If I'm looking correctly then the answer should be "yes". So I don't think we have a race here after all. I suspect that if SuSE used a different signal here, that would prevent this from happening. For the record, both RHEL and Fedora's init scripts use SIGINT for this. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20080619063824.00ca6381-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>]
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. 2008-06-19 10:38 ` Jeff Layton @ 2008-06-20 17:50 ` J. Bruce Fields 2008-06-23 0:20 ` Neil Brown 1 sibling, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-06-20 17:50 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, linux-nfs, linux-kernel On Thu, Jun 19, 2008 at 06:38:24AM -0400, Jeff Layton wrote: > On Thu, 19 Jun 2008 12:29:16 +1000 > Neil Brown <neilb@suse.de> wrote: > > > On Wednesday June 18, jlayton@redhat.com wrote: > > > > > > No objection to the patch, but what signal was being sent to nfsd when > > > you saw this? If it's anything but a SIGKILL, then I wonder if we have > > > a race that we need to deal with. My understanding is that we have nfsd > > > flip between 2 sigmasks to prevent anything but a SIGKILL from being > > > delivered while we're handling the local filesystem operation. > > > > SuSE /etc/init.d/nfsserver does > > > > killproc -n -KILL nfsd > > > > so it looks like a SIGKILL. > > > > > > > > > > From nfsd(): > > > > > > ----------[snip]----------- > > > sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > > > > > > /* > > > * Find a socket with data available and call its > > > * recvfrom routine. > > > */ > > > while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN) > > > ; > > > if (err < 0) > > > break; > > > update_thread_usage(atomic_read(&nfsd_busy)); > > > atomic_inc(&nfsd_busy); > > > > > > /* Lock the export hash tables for reading. */ > > > exp_readlock(); > > > > > > /* Process request with signals blocked. */ > > > sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > > > > > > svc_process(rqstp); > > > > > > ----------[snip]----------- > > > > > > What happens if this catches a SIGINT after the err<0 check, but before > > > the mask is set to allowed_mask? Does svc_process() then get called with > > > a signal pending? > > > > Yes, I suspect it does. > > > > I wonder why we have all this mucking about this signal masks anyway. > > Anyone have any ideas about what it actually achieves? > > > > HCH asked me the same question when I did the conversion to kthreads. > My interpretation (based on guesswork here) was that we wanted to > distinguish between SIGKILL and other allowed signals. A SIGKILL is > allowed to interrupt the underlying I/O, but other signals should not. > > The question to answer here, I suppose, is whether masking a pending > signal is sufficient to make signal_pending() return false. If I'm > looking correctly then the answer should be "yes". Just looking out of curiosity: signal_pending() checks whether some thread_info->flags has TIF_SIGPENDING set. sigprocmask() sets current->blocked to the given set, then calls recalc_sigpending(), which (ignoring some freezer and SIGSTOP code that I don't understand), clears TIF_SIGPENDING if any pending signals are in the newly blocked set. So, yes. --b. > So I don't think we > have a race here after all. I suspect that if SuSE used a different > signal here, that would prevent this from happening. For the record, > both RHEL and Fedora's init scripts use SIGINT for this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. @ 2008-06-20 17:50 ` J. Bruce Fields 0 siblings, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-06-20 17:50 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, linux-nfs, linux-kernel On Thu, Jun 19, 2008 at 06:38:24AM -0400, Jeff Layton wrote: > On Thu, 19 Jun 2008 12:29:16 +1000 > Neil Brown <neilb@suse.de> wrote: > > > On Wednesday June 18, jlayton@redhat.com wrote: > > > > > > No objection to the patch, but what signal was being sent to nfsd when > > > you saw this? If it's anything but a SIGKILL, then I wonder if we have > > > a race that we need to deal with. My understanding is that we have nfsd > > > flip between 2 sigmasks to prevent anything but a SIGKILL from being > > > delivered while we're handling the local filesystem operation. > > > > SuSE /etc/init.d/nfsserver does > > > > killproc -n -KILL nfsd > > > > so it looks like a SIGKILL. > > > > > > > > > > From nfsd(): > > > > > > ----------[snip]----------- > > > sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > > > > > > /* > > > * Find a socket with data available and call its > > > * recvfrom routine. > > > */ > > > while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN) > > > ; > > > if (err < 0) > > > break; > > > update_thread_usage(atomic_read(&nfsd_busy)); > > > atomic_inc(&nfsd_busy); > > > > > > /* Lock the export hash tables for reading. */ > > > exp_readlock(); > > > > > > /* Process request with signals blocked. */ > > > sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > > > > > > svc_process(rqstp); > > > > > > ----------[snip]----------- > > > > > > What happens if this catches a SIGINT after the err<0 check, but before > > > the mask is set to allowed_mask? Does svc_process() then get called with > > > a signal pending? > > > > Yes, I suspect it does. > > > > I wonder why we have all this mucking about this signal masks anyway. > > Anyone have any ideas about what it actually achieves? > > > > HCH asked me the same question when I did the conversion to kthreads. > My interpretation (based on guesswork here) was that we wanted to > distinguish between SIGKILL and other allowed signals. A SIGKILL is > allowed to interrupt the underlying I/O, but other signals should not. > > The question to answer here, I suppose, is whether masking a pending > signal is sufficient to make signal_pending() return false. If I'm > looking correctly then the answer should be "yes". Just looking out of curiosity: signal_pending() checks whether some thread_info->flags has TIF_SIGPENDING set. sigprocmask() sets current->blocked to the given set, then calls recalc_sigpending(), which (ignoring some freezer and SIGSTOP code that I don't understand), clears TIF_SIGPENDING if any pending signals are in the newly blocked set. So, yes. --b. > So I don't think we > have a race here after all. I suspect that if SuSE used a different > signal here, that would prevent this from happening. For the record, > both RHEL and Fedora's init scripts use SIGINT for this. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. 2008-06-19 10:38 ` Jeff Layton [not found] ` <20080619063824.00ca6381-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org> @ 2008-06-23 0:20 ` Neil Brown [not found] ` <18526.60523.584503.68076-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 1 sibling, 1 reply; 18+ messages in thread From: Neil Brown @ 2008-06-23 0:20 UTC (permalink / raw) To: Jeff Layton; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Thursday June 19, jlayton@redhat.com wrote: > > > > I wonder why we have all this mucking about this signal masks anyway. > > Anyone have any ideas about what it actually achieves? > > > > HCH asked me the same question when I did the conversion to kthreads. > My interpretation (based on guesswork here) was that we wanted to > distinguish between SIGKILL and other allowed signals. A SIGKILL is > allowed to interrupt the underlying I/O, but other signals should not. Yes, that seems to be the intent of the code, but does it really make any sense? Traditionally, filesystem IO is immune to signals so any signal arriving while nfsd is doing filesystem IO is irrelevant. I guess network IO can be interrupted, and nfsd is expected to write to the network during this time when all-but-SIGKILL is ignored. So maybe the intent was to allow e.g. SIGINT to shut down cleanly, and SIGKILL to kill more promptly even if the network was sluggish... However I don't think we ever blocked on network write. And currently we try very hard to ensure that there is enough memory to hold a reply before we start on a request, so it should definitely never block. So until OCFS2 came along, the distinction seems to have been meaningless, which is why I questioned it. Even with OCFS2 I'm not sure it makes much sense. Any time that we shut down NFSD, we really want it to shut down promptly. If SIGINT isn't certain to achieve that, we should just be sending SIGKILL. Conversely, if it does make sense to keep the two signals, we should be educating init.d scripts to make use of them. e.g. killall -2 nfsd sleep 0.1 if [ `cat /proc/fs/nfsd/threads` -gt 0 ] then sleep 5 killall -9 nfsd fi or something more clever. I think my leaning is just to remove the distinction. About the worst that can happen if the filesystem decides to respond to the signal is that you get a short write or a short read, both of which should be correctly handled by the client. This brings up an interesting issue for your kthread conversion. The new code uses kthread_stop instead of sending a signal. That means it will block until the threads have completed a request and exited. If some thread is in a filesystem that is waiting on network IO and expects to get a signal if it needs to stop in a hurry, then the shutdown could block for a long time. That might not be what we want. Particularly as if we are blocked in kthread_stop, then no-one else can do a kthread stop.... I think there are issues here that don't yet have obvious answers. Other opinions most welcome. > > The question to answer here, I suppose, is whether masking a pending > signal is sufficient to make signal_pending() return false. If I'm > looking correctly then the answer should be "yes". So I don't think we > have a race here after all. I suspect that if SuSE used a different > signal here, that would prevent this from happening. For the record, > both RHEL and Fedora's init scripts use SIGINT for this. Yep. Using SIGINT or "rpc.nfsd 0" would avoid the problem. So does mounted the OCFS2 filesystem with -o nointr. NeilBrown ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <18526.60523.584503.68076-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. 2008-06-23 0:20 ` Neil Brown @ 2008-06-23 0:52 ` Jeff Layton 0 siblings, 0 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-23 0:52 UTC (permalink / raw) To: Neil Brown; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Mon, 23 Jun 2008 10:20:59 +1000 Neil Brown <neilb@suse.de> wrote: > On Thursday June 19, jlayton@redhat.com wrote: > > > > > > I wonder why we have all this mucking about this signal masks anyway. > > > Anyone have any ideas about what it actually achieves? > > > > > > > HCH asked me the same question when I did the conversion to kthreads. > > My interpretation (based on guesswork here) was that we wanted to > > distinguish between SIGKILL and other allowed signals. A SIGKILL is > > allowed to interrupt the underlying I/O, but other signals should not. > > Yes, that seems to be the intent of the code, but does it really make > any sense? > > Traditionally, filesystem IO is immune to signals so any signal > arriving while nfsd is doing filesystem IO is irrelevant. > I guess network IO can be interrupted, and nfsd is expected to write > to the network during this time when all-but-SIGKILL is ignored. So > maybe the intent was to allow e.g. SIGINT to shut down cleanly, and > SIGKILL to kill more promptly even if the network was sluggish... > > However I don't think we ever blocked on network write. And currently > we try very hard to ensure that there is enough memory to hold a reply > before we start on a request, so it should definitely never block. > > So until OCFS2 came along, the distinction seems to have been > meaningless, which is why I questioned it. > > Even with OCFS2 I'm not sure it makes much sense. > > Any time that we shut down NFSD, we really want it to shut down > promptly. If SIGINT isn't certain to achieve that, we should just be > sending SIGKILL. > Conversely, if it does make sense to keep the two signals, we should > be educating init.d scripts to make use of them. e.g. > > killall -2 nfsd > sleep 0.1 > if [ `cat /proc/fs/nfsd/threads` -gt 0 ] > then sleep 5 > killall -9 nfsd > fi > > or something more clever. > > I think my leaning is just to remove the distinction. About the worst > that can happen if the filesystem decides to respond to the signal is > that you get a short write or a short read, both of which should be > correctly handled by the client. > Good points all around. If the sigmask juggling makes no sense, then I'm OK with removing it. I'd prefer that we simplify the code rather than trying to get clever with init scripts anyway... The fact that SuSE (and possibly others) have used SIGKILL to take down nfsd probably means that removing it is a non-issue (modulo the OCFS issue your patch fixes). > > This brings up an interesting issue for your kthread conversion. > The new code uses kthread_stop instead of sending a signal. That > means it will block until the threads have completed a request and > exited. > If some thread is in a filesystem that is waiting on network IO and > expects to get a signal if it needs to stop in a hurry, then the > shutdown could block for a long time. That might not be what we > want. > > Particularly as if we are blocked in kthread_stop, then no-one else can > do a kthread stop.... > > I think there are issues here that don't yet have obvious answers. > Other opinions most welcome. > The latest patch doesn't use kthread_stop with nfsd. I figured it was best to avoid having multiple takedown methods, and since we're using signals here anyway, I just kept signaling as the only way to take down nfsd. Plus, as you mention -- all kthread_stop's are serialized, which could have meant that nfsd's would take a long time to come down on a busy box with a lot of them. As a side note, I'm not terribly thrilled with how kthread_stop is implemented. I worry that a stuck kthread would block unrelated kthreads from coming down. I did a patch 6 mos ago or so to make it so that kthread_stop's could be done in parallel. The downside was that it added a new field to the task_struct (which is already too bloated, IMO). It might be worth resurrecting that at some point (possibly making the new field a union with another field that's unused in kthreads), or considering a different approach to parallelize kthread_stop. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. @ 2008-06-23 0:52 ` Jeff Layton 0 siblings, 0 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-23 0:52 UTC (permalink / raw) To: Neil Brown; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Mon, 23 Jun 2008 10:20:59 +1000 Neil Brown <neilb@suse.de> wrote: > On Thursday June 19, jlayton@redhat.com wrote: > > > > > > I wonder why we have all this mucking about this signal masks anyway. > > > Anyone have any ideas about what it actually achieves? > > > > > > > HCH asked me the same question when I did the conversion to kthreads. > > My interpretation (based on guesswork here) was that we wanted to > > distinguish between SIGKILL and other allowed signals. A SIGKILL is > > allowed to interrupt the underlying I/O, but other signals should not. > > Yes, that seems to be the intent of the code, but does it really make > any sense? > > Traditionally, filesystem IO is immune to signals so any signal > arriving while nfsd is doing filesystem IO is irrelevant. > I guess network IO can be interrupted, and nfsd is expected to write > to the network during this time when all-but-SIGKILL is ignored. So > maybe the intent was to allow e.g. SIGINT to shut down cleanly, and > SIGKILL to kill more promptly even if the network was sluggish... > > However I don't think we ever blocked on network write. And currently > we try very hard to ensure that there is enough memory to hold a reply > before we start on a request, so it should definitely never block. > > So until OCFS2 came along, the distinction seems to have been > meaningless, which is why I questioned it. > > Even with OCFS2 I'm not sure it makes much sense. > > Any time that we shut down NFSD, we really want it to shut down > promptly. If SIGINT isn't certain to achieve that, we should just be > sending SIGKILL. > Conversely, if it does make sense to keep the two signals, we should > be educating init.d scripts to make use of them. e.g. > > killall -2 nfsd > sleep 0.1 > if [ `cat /proc/fs/nfsd/threads` -gt 0 ] > then sleep 5 > killall -9 nfsd > fi > > or something more clever. > > I think my leaning is just to remove the distinction. About the worst > that can happen if the filesystem decides to respond to the signal is > that you get a short write or a short read, both of which should be > correctly handled by the client. > Good points all around. If the sigmask juggling makes no sense, then I'm OK with removing it. I'd prefer that we simplify the code rather than trying to get clever with init scripts anyway... The fact that SuSE (and possibly others) have used SIGKILL to take down nfsd probably means that removing it is a non-issue (modulo the OCFS issue your patch fixes). > > This brings up an interesting issue for your kthread conversion. > The new code uses kthread_stop instead of sending a signal. That > means it will block until the threads have completed a request and > exited. > If some thread is in a filesystem that is waiting on network IO and > expects to get a signal if it needs to stop in a hurry, then the > shutdown could block for a long time. That might not be what we > want. > > Particularly as if we are blocked in kthread_stop, then no-one else can > do a kthread stop.... > > I think there are issues here that don't yet have obvious answers. > Other opinions most welcome. > The latest patch doesn't use kthread_stop with nfsd. I figured it was best to avoid having multiple takedown methods, and since we're using signals here anyway, I just kept signaling as the only way to take down nfsd. Plus, as you mention -- all kthread_stop's are serialized, which could have meant that nfsd's would take a long time to come down on a busy box with a lot of them. As a side note, I'm not terribly thrilled with how kthread_stop is implemented. I worry that a stuck kthread would block unrelated kthreads from coming down. I did a patch 6 mos ago or so to make it so that kthread_stop's could be done in parallel. The downside was that it added a new field to the task_struct (which is already too bloated, IMO). It might be worth resurrecting that at some point (possibly making the new field a union with another field that's unused in kthreads), or considering a different approach to parallelize kthread_stop. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) 2008-06-23 0:52 ` Jeff Layton (?) @ 2008-06-23 23:55 ` Neil Brown [not found] ` <18528.14347.525397.917553-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> -1 siblings, 1 reply; 18+ messages in thread From: Neil Brown @ 2008-06-23 23:55 UTC (permalink / raw) To: Jeff Layton; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Sunday June 22, jlayton@redhat.com wrote: > On Mon, 23 Jun 2008 10:20:59 +1000 > Neil Brown <neilb@suse.de> wrote: > > > > I think my leaning is just to remove the distinction. About the worst > > that can happen if the filesystem decides to respond to the signal is > > that you get a short write or a short read, both of which should be > > correctly handled by the client. > > > > Good points all around. If the sigmask juggling makes no sense, then > I'm OK with removing it. I'd prefer that we simplify the code rather > than trying to get clever with init scripts anyway... > Let's do it? diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 941041f..914e579 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -36,12 +36,7 @@ #define NFSDDBG_FACILITY NFSDDBG_SVC -/* these signals will be delivered to an nfsd thread - * when handling a request - */ -#define ALLOWED_SIGS (sigmask(SIGKILL)) -/* these signals will be delivered to an nfsd thread - * when not handling a request. i.e. when waiting +/* These signals can be used to shutdown an nfsd thread. */ #define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) /* if the last thread dies with SIGHUP, then the exports table is @@ -396,7 +391,7 @@ nfsd(struct svc_rqst *rqstp) { struct fs_struct *fsp; int err; - sigset_t shutdown_mask, allowed_mask; + sigset_t shutdown_mask; /* Lock module and set up kernel thread */ lock_kernel(); @@ -415,7 +410,8 @@ nfsd(struct svc_rqst *rqstp) current->fs->umask = 0; siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); - siginitsetinv(&allowed_mask, ALLOWED_SIGS); + /* Block all but the shutdown signals */ + sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); nfsdstats.th_cnt++; @@ -435,8 +431,6 @@ nfsd(struct svc_rqst *rqstp) * The main request loop */ for (;;) { - /* Block all but the shutdown signals */ - sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); /* * Find a socket with data available and call its @@ -452,9 +446,6 @@ nfsd(struct svc_rqst *rqstp) /* Lock the export hash tables for reading. */ exp_readlock(); - /* Process request with signals blocked. */ - sigprocmask(SIG_SETMASK, &allowed_mask, NULL); - svc_process(rqstp); /* Unlock export hash tables */ > > The latest patch doesn't use kthread_stop with nfsd. I figured it was > best to avoid having multiple takedown methods, and since we're using > signals here anyway, I just kept signaling as the only way to take down > nfsd. Plus, as you mention -- all kthread_stop's are serialized, which > could have meant that nfsd's would take a long time to come down on a > busy box with a lot of them. Oh, good. I must have been looking at an old patch. > > As a side note, I'm not terribly thrilled with how kthread_stop is > implemented. I worry that a stuck kthread would block unrelated > kthreads from coming down. I did a patch 6 mos ago or so to make it so > that kthread_stop's could be done in parallel. The downside was that it > added a new field to the task_struct (which is already too bloated, > IMO). It might be worth resurrecting that at some point (possibly > making the new field a union with another field that's unused in > kthreads), or considering a different approach to parallelize > kthread_stop. One the one hand, kthreads are expected to not block and to check for kthread_should_stop very often, so this wouldn't be a problem. On the other hand, expectations like this are quickly invalidated, and the code probably should be fixed. I suspect we could do without adding anything to task_struct. Instead of just one kthread_stop_info, have a linked list, one entry for each thread being stopped at the moment. kthread_stop adds to the list, sets PF_EXITING (I hope that is safe) and wakes the process. kthread_should_stop tests PF_EXITING. When kthread() finishes, it does a linear search (the list should be short) and calls complete and the relevant .done. Something like this? NeilBrown diff --git a/kernel/kthread.c b/kernel/kthread.c index bd1b9ea..107290a 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -39,12 +39,13 @@ struct kthread_stop_info struct task_struct *k; int err; struct completion done; + struct kthread_stp_info *next; }; /* Thread stopping is done by setthing this var: lock serializes * multiple kthread_stop calls. */ static DEFINE_MUTEX(kthread_stop_lock); -static struct kthread_stop_info kthread_stop_info; +static struct kthread_stop_info *kthread_stop_info; /** * kthread_should_stop - should this kthread return now? @@ -55,7 +56,7 @@ static struct kthread_stop_info kthread_stop_info; */ int kthread_should_stop(void) { - return (kthread_stop_info.k == current); + return test_bit(PF_EXITING, ¤t.flags); } EXPORT_SYMBOL(kthread_should_stop); @@ -80,8 +81,17 @@ static int kthread(void *_create) /* It might have exited on its own, w/o kthread_stop. Check. */ if (kthread_should_stop()) { - kthread_stop_info.err = ret; - complete(&kthread_stop_info.done); + struct kthread_stop_info **ksip; + mutex_lock(&kthread_stop_lock); + for (ksip = &kthread_stop_info ; *ksip ; ksip = &(*ksip)->next) + if ((*ksip)->k == current) + break; + *ksip = (*ksip)->next; + (*ksip)->next = NULL; + mutex_unlock(&kthread_stop_lock); + + ksi->err = ret; + complete(&ksi->done); } return 0; } @@ -199,26 +209,20 @@ EXPORT_SYMBOL(kthread_bind); int kthread_stop(struct task_struct *k) { int ret; + struct kthread_stop_info ksi; mutex_lock(&kthread_stop_lock); - /* It could exit after stop_info.k set, but before wake_up_process. */ - get_task_struct(k); + init_completion(&ksi->done); + ksi->k = k; - /* Must init completion *before* thread sees kthread_stop_info.k */ - init_completion(&kthread_stop_info.done); - smp_wmb(); - - /* Now set kthread_should_stop() to true, and wake it up. */ - kthread_stop_info.k = k; + set_bit(PF_EXITING, &k->flags); wake_up_process(k); - put_task_struct(k); + mutex_unlock(&kthread_stop_lock); - /* Once it dies, reset stop ptr, gather result and we're done. */ - wait_for_completion(&kthread_stop_info.done); - kthread_stop_info.k = NULL; + /* Once it dies gather result and we're done. */ + wait_for_completion(&ksi->done); ret = kthread_stop_info.err; - mutex_unlock(&kthread_stop_lock); return ret; } ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <18528.14347.525397.917553-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) 2008-06-23 23:55 ` [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) Neil Brown @ 2008-06-30 12:35 ` Jeff Layton 0 siblings, 0 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-30 12:35 UTC (permalink / raw) To: Neil Brown; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Tue, 24 Jun 2008 09:55:55 +1000 Neil Brown <neilb@suse.de> wrote: > On Sunday June 22, jlayton@redhat.com wrote: > > On Mon, 23 Jun 2008 10:20:59 +1000 > > Neil Brown <neilb@suse.de> wrote: > > > > > > I think my leaning is just to remove the distinction. About the worst > > > that can happen if the filesystem decides to respond to the signal is > > > that you get a short write or a short read, both of which should be > > > correctly handled by the client. > > > > > > > Good points all around. If the sigmask juggling makes no sense, then > > I'm OK with removing it. I'd prefer that we simplify the code rather > > than trying to get clever with init scripts anyway... > > > > Let's do it? > The idea looks reasonable, but the patch above doesn't apply to Bruce's for-2.6.27 tree. The nfsd() function has changed a bit. Actually, I think we can make this even simpler. kthreads start with all signals ignored and the new patch enables all of the ones in SHUTDOWN_SIGS. So I don't think we need any signal masking at all. How about this patch? It should apply on top of the patchset already in Bruce's tree. ------------[snip]-------------- Subject: [PATCH] knfsd: treat all shutdown signals as equivalent knfsd currently uses 2 signal masks when processing requests. A "loose" mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it allows when doing the actual operation on the local storage. This is apparently unnecessarily complicated. The underlying filesystem should be able to sanely handle a signal in the middle of an operation. This patch removes the signal mask handling from knfsd altogether. When knfsd is started as a kthread, all signals are ignored. It then allows all of the signals in SHUTDOWN_SIGS. There's no need to set the mask as well. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfsd/nfssvc.c | 20 ++------------------ 1 files changed, 2 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 96fdbca..d20a087 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -37,13 +37,7 @@ #define NFSDDBG_FACILITY NFSDDBG_SVC -/* these signals will be delivered to an nfsd thread - * when handling a request - */ -#define ALLOWED_SIGS (sigmask(SIGKILL)) -/* these signals will be delivered to an nfsd thread - * when not handling a request. i.e. when waiting - */ +/* These signals can be used to shutdown an nfsd thread. */ #define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) extern struct svc_program nfsd_program; @@ -414,7 +408,6 @@ nfsd(void *vrqstp) { struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; struct fs_struct *fsp; - sigset_t shutdown_mask, allowed_mask; int err, preverr = 0; unsigned int signo; @@ -433,15 +426,12 @@ nfsd(void *vrqstp) current->fs = fsp; current->fs->umask = 0; - siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); - siginitsetinv(&allowed_mask, ALLOWED_SIGS); - /* * thread is spawned with all signals set to SIG_IGN, re-enable * the ones that matter */ for (signo = 1; signo <= _NSIG; signo++) { - if (!sigismember(&shutdown_mask, signo)) + if (siginmask(signo, SHUTDOWN_SIGS)) allow_signal(signo); } @@ -460,9 +450,6 @@ nfsd(void *vrqstp) * The main request loop */ for (;;) { - /* Block all but the shutdown signals */ - sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); - /* * Find a socket with data available and call its * recvfrom routine. @@ -487,9 +474,6 @@ nfsd(void *vrqstp) /* Lock the export hash tables for reading. */ exp_readlock(); - /* Process request with signals blocked. */ - sigprocmask(SIG_SETMASK, &allowed_mask, NULL); - svc_process(rqstp); /* Unlock export hash tables */ -- 1.5.5.1 > > > > The latest patch doesn't use kthread_stop with nfsd. I figured it was > > best to avoid having multiple takedown methods, and since we're using > > signals here anyway, I just kept signaling as the only way to take down > > nfsd. Plus, as you mention -- all kthread_stop's are serialized, which > > could have meant that nfsd's would take a long time to come down on a > > busy box with a lot of them. > > Oh, good. I must have been looking at an old patch. > > > > > As a side note, I'm not terribly thrilled with how kthread_stop is > > implemented. I worry that a stuck kthread would block unrelated > > kthreads from coming down. I did a patch 6 mos ago or so to make it so > > that kthread_stop's could be done in parallel. The downside was that it > > added a new field to the task_struct (which is already too bloated, > > IMO). It might be worth resurrecting that at some point (possibly > > making the new field a union with another field that's unused in > > kthreads), or considering a different approach to parallelize > > kthread_stop. > > One the one hand, kthreads are expected to not block and to check for > kthread_should_stop very often, so this wouldn't be a problem. > On the other hand, expectations like this are quickly invalidated, and > the code probably should be fixed. > I suspect we could do without adding anything to task_struct. > Instead of just one kthread_stop_info, have a linked list, one entry > for each thread being stopped at the moment. kthread_stop adds to > the list, sets PF_EXITING (I hope that is safe) and wakes the process. > kthread_should_stop tests PF_EXITING. > When kthread() finishes, it does a linear search (the list should be > short) and calls complete and the relevant .done. > > Something like this? > > NeilBrown > > > diff --git a/kernel/kthread.c b/kernel/kthread.c > index bd1b9ea..107290a 100644 > --- a/kernel/kthread.c > +++ b/kernel/kthread.c > @@ -39,12 +39,13 @@ struct kthread_stop_info > struct task_struct *k; > int err; > struct completion done; > + struct kthread_stp_info *next; > }; > > /* Thread stopping is done by setthing this var: lock serializes > * multiple kthread_stop calls. */ > static DEFINE_MUTEX(kthread_stop_lock); > -static struct kthread_stop_info kthread_stop_info; > +static struct kthread_stop_info *kthread_stop_info; > > /** > * kthread_should_stop - should this kthread return now? > @@ -55,7 +56,7 @@ static struct kthread_stop_info kthread_stop_info; > */ > int kthread_should_stop(void) > { > - return (kthread_stop_info.k == current); > + return test_bit(PF_EXITING, ¤t.flags); > } > EXPORT_SYMBOL(kthread_should_stop); > > @@ -80,8 +81,17 @@ static int kthread(void *_create) > > /* It might have exited on its own, w/o kthread_stop. Check. */ > if (kthread_should_stop()) { > - kthread_stop_info.err = ret; > - complete(&kthread_stop_info.done); > + struct kthread_stop_info **ksip; > + mutex_lock(&kthread_stop_lock); > + for (ksip = &kthread_stop_info ; *ksip ; ksip = &(*ksip)->next) > + if ((*ksip)->k == current) > + break; > + *ksip = (*ksip)->next; > + (*ksip)->next = NULL; > + mutex_unlock(&kthread_stop_lock); > + > + ksi->err = ret; > + complete(&ksi->done); > } > return 0; > } > @@ -199,26 +209,20 @@ EXPORT_SYMBOL(kthread_bind); > int kthread_stop(struct task_struct *k) > { > int ret; > + struct kthread_stop_info ksi; > > mutex_lock(&kthread_stop_lock); > > - /* It could exit after stop_info.k set, but before wake_up_process. */ > - get_task_struct(k); > + init_completion(&ksi->done); > + ksi->k = k; > > - /* Must init completion *before* thread sees kthread_stop_info.k */ > - init_completion(&kthread_stop_info.done); > - smp_wmb(); > - > - /* Now set kthread_should_stop() to true, and wake it up. */ > - kthread_stop_info.k = k; > + set_bit(PF_EXITING, &k->flags); > wake_up_process(k); > - put_task_struct(k); > + mutex_unlock(&kthread_stop_lock); > > - /* Once it dies, reset stop ptr, gather result and we're done. */ > - wait_for_completion(&kthread_stop_info.done); > - kthread_stop_info.k = NULL; > + /* Once it dies gather result and we're done. */ > + wait_for_completion(&ksi->done); > ret = kthread_stop_info.err; > - mutex_unlock(&kthread_stop_lock); > > return ret; > } That looks reasonable to me, though I tend to find standard list functions easier to follow... The main reason I was going to add a field to the task_struct was because I wanted to make sure that kthread_should_stop() was a quick operation. If we can just use PF_EXITING then that's definitely a better scheme. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) @ 2008-06-30 12:35 ` Jeff Layton 0 siblings, 0 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-30 12:35 UTC (permalink / raw) To: Neil Brown; +Cc: J. Bruce Fields, linux-nfs, linux-kernel On Tue, 24 Jun 2008 09:55:55 +1000 Neil Brown <neilb@suse.de> wrote: > On Sunday June 22, jlayton@redhat.com wrote: > > On Mon, 23 Jun 2008 10:20:59 +1000 > > Neil Brown <neilb@suse.de> wrote: > > > > > > I think my leaning is just to remove the distinction. About the worst > > > that can happen if the filesystem decides to respond to the signal is > > > that you get a short write or a short read, both of which should be > > > correctly handled by the client. > > > > > > > Good points all around. If the sigmask juggling makes no sense, then > > I'm OK with removing it. I'd prefer that we simplify the code rather > > than trying to get clever with init scripts anyway... > > > > Let's do it? > The idea looks reasonable, but the patch above doesn't apply to Bruce's for-2.6.27 tree. The nfsd() function has changed a bit. Actually, I think we can make this even simpler. kthreads start with all signals ignored and the new patch enables all of the ones in SHUTDOWN_SIGS. So I don't think we need any signal masking at all. How about this patch? It should apply on top of the patchset already in Bruce's tree. ------------[snip]-------------- Subject: [PATCH] knfsd: treat all shutdown signals as equivalent knfsd currently uses 2 signal masks when processing requests. A "loose" mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it allows when doing the actual operation on the local storage. This is apparently unnecessarily complicated. The underlying filesystem should be able to sanely handle a signal in the middle of an operation. This patch removes the signal mask handling from knfsd altogether. When knfsd is started as a kthread, all signals are ignored. It then allows all of the signals in SHUTDOWN_SIGS. There's no need to set the mask as well. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfsd/nfssvc.c | 20 ++------------------ 1 files changed, 2 insertions(+), 18 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 96fdbca..d20a087 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -37,13 +37,7 @@ #define NFSDDBG_FACILITY NFSDDBG_SVC -/* these signals will be delivered to an nfsd thread - * when handling a request - */ -#define ALLOWED_SIGS (sigmask(SIGKILL)) -/* these signals will be delivered to an nfsd thread - * when not handling a request. i.e. when waiting - */ +/* These signals can be used to shutdown an nfsd thread. */ #define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) extern struct svc_program nfsd_program; @@ -414,7 +408,6 @@ nfsd(void *vrqstp) { struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; struct fs_struct *fsp; - sigset_t shutdown_mask, allowed_mask; int err, preverr = 0; unsigned int signo; @@ -433,15 +426,12 @@ nfsd(void *vrqstp) current->fs = fsp; current->fs->umask = 0; - siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); - siginitsetinv(&allowed_mask, ALLOWED_SIGS); - /* * thread is spawned with all signals set to SIG_IGN, re-enable * the ones that matter */ for (signo = 1; signo <= _NSIG; signo++) { - if (!sigismember(&shutdown_mask, signo)) + if (siginmask(signo, SHUTDOWN_SIGS)) allow_signal(signo); } @@ -460,9 +450,6 @@ nfsd(void *vrqstp) * The main request loop */ for (;;) { - /* Block all but the shutdown signals */ - sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); - /* * Find a socket with data available and call its * recvfrom routine. @@ -487,9 +474,6 @@ nfsd(void *vrqstp) /* Lock the export hash tables for reading. */ exp_readlock(); - /* Process request with signals blocked. */ - sigprocmask(SIG_SETMASK, &allowed_mask, NULL); - svc_process(rqstp); /* Unlock export hash tables */ -- 1.5.5.1 > > > > The latest patch doesn't use kthread_stop with nfsd. I figured it was > > best to avoid having multiple takedown methods, and since we're using > > signals here anyway, I just kept signaling as the only way to take down > > nfsd. Plus, as you mention -- all kthread_stop's are serialized, which > > could have meant that nfsd's would take a long time to come down on a > > busy box with a lot of them. > > Oh, good. I must have been looking at an old patch. > > > > > As a side note, I'm not terribly thrilled with how kthread_stop is > > implemented. I worry that a stuck kthread would block unrelated > > kthreads from coming down. I did a patch 6 mos ago or so to make it so > > that kthread_stop's could be done in parallel. The downside was that it > > added a new field to the task_struct (which is already too bloated, > > IMO). It might be worth resurrecting that at some point (possibly > > making the new field a union with another field that's unused in > > kthreads), or considering a different approach to parallelize > > kthread_stop. > > One the one hand, kthreads are expected to not block and to check for > kthread_should_stop very often, so this wouldn't be a problem. > On the other hand, expectations like this are quickly invalidated, and > the code probably should be fixed. > I suspect we could do without adding anything to task_struct. > Instead of just one kthread_stop_info, have a linked list, one entry > for each thread being stopped at the moment. kthread_stop adds to > the list, sets PF_EXITING (I hope that is safe) and wakes the process. > kthread_should_stop tests PF_EXITING. > When kthread() finishes, it does a linear search (the list should be > short) and calls complete and the relevant .done. > > Something like this? > > NeilBrown > > > diff --git a/kernel/kthread.c b/kernel/kthread.c > index bd1b9ea..107290a 100644 > --- a/kernel/kthread.c > +++ b/kernel/kthread.c > @@ -39,12 +39,13 @@ struct kthread_stop_info > struct task_struct *k; > int err; > struct completion done; > + struct kthread_stp_info *next; > }; > > /* Thread stopping is done by setthing this var: lock serializes > * multiple kthread_stop calls. */ > static DEFINE_MUTEX(kthread_stop_lock); > -static struct kthread_stop_info kthread_stop_info; > +static struct kthread_stop_info *kthread_stop_info; > > /** > * kthread_should_stop - should this kthread return now? > @@ -55,7 +56,7 @@ static struct kthread_stop_info kthread_stop_info; > */ > int kthread_should_stop(void) > { > - return (kthread_stop_info.k == current); > + return test_bit(PF_EXITING, ¤t.flags); > } > EXPORT_SYMBOL(kthread_should_stop); > > @@ -80,8 +81,17 @@ static int kthread(void *_create) > > /* It might have exited on its own, w/o kthread_stop. Check. */ > if (kthread_should_stop()) { > - kthread_stop_info.err = ret; > - complete(&kthread_stop_info.done); > + struct kthread_stop_info **ksip; > + mutex_lock(&kthread_stop_lock); > + for (ksip = &kthread_stop_info ; *ksip ; ksip = &(*ksip)->next) > + if ((*ksip)->k == current) > + break; > + *ksip = (*ksip)->next; > + (*ksip)->next = NULL; > + mutex_unlock(&kthread_stop_lock); > + > + ksi->err = ret; > + complete(&ksi->done); > } > return 0; > } > @@ -199,26 +209,20 @@ EXPORT_SYMBOL(kthread_bind); > int kthread_stop(struct task_struct *k) > { > int ret; > + struct kthread_stop_info ksi; > > mutex_lock(&kthread_stop_lock); > > - /* It could exit after stop_info.k set, but before wake_up_process. */ > - get_task_struct(k); > + init_completion(&ksi->done); > + ksi->k = k; > > - /* Must init completion *before* thread sees kthread_stop_info.k */ > - init_completion(&kthread_stop_info.done); > - smp_wmb(); > - > - /* Now set kthread_should_stop() to true, and wake it up. */ > - kthread_stop_info.k = k; > + set_bit(PF_EXITING, &k->flags); > wake_up_process(k); > - put_task_struct(k); > + mutex_unlock(&kthread_stop_lock); > > - /* Once it dies, reset stop ptr, gather result and we're done. */ > - wait_for_completion(&kthread_stop_info.done); > - kthread_stop_info.k = NULL; > + /* Once it dies gather result and we're done. */ > + wait_for_completion(&ksi->done); > ret = kthread_stop_info.err; > - mutex_unlock(&kthread_stop_lock); > > return ret; > } That looks reasonable to me, though I tend to find standard list functions easier to follow... The main reason I was going to add a field to the task_struct was because I wanted to make sure that kthread_should_stop() was a quick operation. If we can just use PF_EXITING then that's definitely a better scheme. -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <20080630083536.680b093a-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>]
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) 2008-06-30 12:35 ` Jeff Layton @ 2008-06-30 17:10 ` J. Bruce Fields -1 siblings, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-06-30 17:10 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, linux-nfs, linux-kernel On Mon, Jun 30, 2008 at 08:35:36AM -0400, Jeff Layton wrote: > The idea looks reasonable, but the patch above doesn't apply to Bruce's > for-2.6.27 tree. The nfsd() function has changed a bit. Actually, I > think we can make this even simpler. kthreads start with all signals > ignored and the new patch enables all of the ones in SHUTDOWN_SIGS. So > I don't think we need any signal masking at all. > > How about this patch? It should apply on top of the patchset already > in Bruce's tree. Looks reasonable to me; applied for 2.6.27 barring any test failures or objections. --b. > > ------------[snip]-------------- > > Subject: [PATCH] knfsd: treat all shutdown signals as equivalent > > knfsd currently uses 2 signal masks when processing requests. A "loose" > mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and > then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it > allows when doing the actual operation on the local storage. > > This is apparently unnecessarily complicated. The underlying filesystem > should be able to sanely handle a signal in the middle of an operation. > This patch removes the signal mask handling from knfsd altogether. When > knfsd is started as a kthread, all signals are ignored. It then allows > all of the signals in SHUTDOWN_SIGS. There's no need to set the mask > as well. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/nfsd/nfssvc.c | 20 ++------------------ > 1 files changed, 2 insertions(+), 18 deletions(-) > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > index 96fdbca..d20a087 100644 > --- a/fs/nfsd/nfssvc.c > +++ b/fs/nfsd/nfssvc.c > @@ -37,13 +37,7 @@ > > #define NFSDDBG_FACILITY NFSDDBG_SVC > > -/* these signals will be delivered to an nfsd thread > - * when handling a request > - */ > -#define ALLOWED_SIGS (sigmask(SIGKILL)) > -/* these signals will be delivered to an nfsd thread > - * when not handling a request. i.e. when waiting > - */ > +/* These signals can be used to shutdown an nfsd thread. */ > #define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) > > extern struct svc_program nfsd_program; > @@ -414,7 +408,6 @@ nfsd(void *vrqstp) > { > struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; > struct fs_struct *fsp; > - sigset_t shutdown_mask, allowed_mask; > int err, preverr = 0; > unsigned int signo; > > @@ -433,15 +426,12 @@ nfsd(void *vrqstp) > current->fs = fsp; > current->fs->umask = 0; > > - siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); > - siginitsetinv(&allowed_mask, ALLOWED_SIGS); > - > /* > * thread is spawned with all signals set to SIG_IGN, re-enable > * the ones that matter > */ > for (signo = 1; signo <= _NSIG; signo++) { > - if (!sigismember(&shutdown_mask, signo)) > + if (siginmask(signo, SHUTDOWN_SIGS)) > allow_signal(signo); > } > > @@ -460,9 +450,6 @@ nfsd(void *vrqstp) > * The main request loop > */ > for (;;) { > - /* Block all but the shutdown signals */ > - sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > - > /* > * Find a socket with data available and call its > * recvfrom routine. > @@ -487,9 +474,6 @@ nfsd(void *vrqstp) > /* Lock the export hash tables for reading. */ > exp_readlock(); > > - /* Process request with signals blocked. */ > - sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > - > svc_process(rqstp); > > /* Unlock export hash tables */ > -- > 1.5.5.1 > > > > > > > > > The latest patch doesn't use kthread_stop with nfsd. I figured it was > > > best to avoid having multiple takedown methods, and since we're using > > > signals here anyway, I just kept signaling as the only way to take down > > > nfsd. Plus, as you mention -- all kthread_stop's are serialized, which > > > could have meant that nfsd's would take a long time to come down on a > > > busy box with a lot of them. > > > > Oh, good. I must have been looking at an old patch. > > > > > > > > As a side note, I'm not terribly thrilled with how kthread_stop is > > > implemented. I worry that a stuck kthread would block unrelated > > > kthreads from coming down. I did a patch 6 mos ago or so to make it so > > > that kthread_stop's could be done in parallel. The downside was that it > > > added a new field to the task_struct (which is already too bloated, > > > IMO). It might be worth resurrecting that at some point (possibly > > > making the new field a union with another field that's unused in > > > kthreads), or considering a different approach to parallelize > > > kthread_stop. > > > > One the one hand, kthreads are expected to not block and to check for > > kthread_should_stop very often, so this wouldn't be a problem. > > On the other hand, expectations like this are quickly invalidated, and > > the code probably should be fixed. > > I suspect we could do without adding anything to task_struct. > > Instead of just one kthread_stop_info, have a linked list, one entry > > for each thread being stopped at the moment. kthread_stop adds to > > the list, sets PF_EXITING (I hope that is safe) and wakes the process. > > kthread_should_stop tests PF_EXITING. > > When kthread() finishes, it does a linear search (the list should be > > short) and calls complete and the relevant .done. > > > > Something like this? > > > > NeilBrown > > > > > > diff --git a/kernel/kthread.c b/kernel/kthread.c > > index bd1b9ea..107290a 100644 > > --- a/kernel/kthread.c > > +++ b/kernel/kthread.c > > @@ -39,12 +39,13 @@ struct kthread_stop_info > > struct task_struct *k; > > int err; > > struct completion done; > > + struct kthread_stp_info *next; > > }; > > > > /* Thread stopping is done by setthing this var: lock serializes > > * multiple kthread_stop calls. */ > > static DEFINE_MUTEX(kthread_stop_lock); > > -static struct kthread_stop_info kthread_stop_info; > > +static struct kthread_stop_info *kthread_stop_info; > > > > /** > > * kthread_should_stop - should this kthread return now? > > @@ -55,7 +56,7 @@ static struct kthread_stop_info kthread_stop_info; > > */ > > int kthread_should_stop(void) > > { > > - return (kthread_stop_info.k == current); > > + return test_bit(PF_EXITING, ¤t.flags); > > } > > EXPORT_SYMBOL(kthread_should_stop); > > > > @@ -80,8 +81,17 @@ static int kthread(void *_create) > > > > /* It might have exited on its own, w/o kthread_stop. Check. */ > > if (kthread_should_stop()) { > > - kthread_stop_info.err = ret; > > - complete(&kthread_stop_info.done); > > + struct kthread_stop_info **ksip; > > + mutex_lock(&kthread_stop_lock); > > + for (ksip = &kthread_stop_info ; *ksip ; ksip = &(*ksip)->next) > > + if ((*ksip)->k == current) > > + break; > > + *ksip = (*ksip)->next; > > + (*ksip)->next = NULL; > > + mutex_unlock(&kthread_stop_lock); > > + > > + ksi->err = ret; > > + complete(&ksi->done); > > } > > return 0; > > } > > @@ -199,26 +209,20 @@ EXPORT_SYMBOL(kthread_bind); > > int kthread_stop(struct task_struct *k) > > { > > int ret; > > + struct kthread_stop_info ksi; > > > > mutex_lock(&kthread_stop_lock); > > > > - /* It could exit after stop_info.k set, but before wake_up_process. */ > > - get_task_struct(k); > > + init_completion(&ksi->done); > > + ksi->k = k; > > > > - /* Must init completion *before* thread sees kthread_stop_info.k */ > > - init_completion(&kthread_stop_info.done); > > - smp_wmb(); > > - > > - /* Now set kthread_should_stop() to true, and wake it up. */ > > - kthread_stop_info.k = k; > > + set_bit(PF_EXITING, &k->flags); > > wake_up_process(k); > > - put_task_struct(k); > > + mutex_unlock(&kthread_stop_lock); > > > > - /* Once it dies, reset stop ptr, gather result and we're done. */ > > - wait_for_completion(&kthread_stop_info.done); > > - kthread_stop_info.k = NULL; > > + /* Once it dies gather result and we're done. */ > > + wait_for_completion(&ksi->done); > > ret = kthread_stop_info.err; > > - mutex_unlock(&kthread_stop_lock); > > > > return ret; > > } > > > That looks reasonable to me, though I tend to find standard list > functions easier to follow... > > The main reason I was going to add a field to the task_struct was > because I wanted to make sure that kthread_should_stop() was a quick > operation. If we can just use PF_EXITING then that's definitely a > better scheme. > > -- > Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) @ 2008-06-30 17:10 ` J. Bruce Fields 0 siblings, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-06-30 17:10 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, linux-nfs, linux-kernel On Mon, Jun 30, 2008 at 08:35:36AM -0400, Jeff Layton wrote: > The idea looks reasonable, but the patch above doesn't apply to Bruce's > for-2.6.27 tree. The nfsd() function has changed a bit. Actually, I > think we can make this even simpler. kthreads start with all signals > ignored and the new patch enables all of the ones in SHUTDOWN_SIGS. So > I don't think we need any signal masking at all. > > How about this patch? It should apply on top of the patchset already > in Bruce's tree. Looks reasonable to me; applied for 2.6.27 barring any test failures or objections. --b. > > ------------[snip]-------------- > > Subject: [PATCH] knfsd: treat all shutdown signals as equivalent > > knfsd currently uses 2 signal masks when processing requests. A "loose" > mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and > then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it > allows when doing the actual operation on the local storage. > > This is apparently unnecessarily complicated. The underlying filesystem > should be able to sanely handle a signal in the middle of an operation. > This patch removes the signal mask handling from knfsd altogether. When > knfsd is started as a kthread, all signals are ignored. It then allows > all of the signals in SHUTDOWN_SIGS. There's no need to set the mask > as well. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/nfsd/nfssvc.c | 20 ++------------------ > 1 files changed, 2 insertions(+), 18 deletions(-) > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > index 96fdbca..d20a087 100644 > --- a/fs/nfsd/nfssvc.c > +++ b/fs/nfsd/nfssvc.c > @@ -37,13 +37,7 @@ > > #define NFSDDBG_FACILITY NFSDDBG_SVC > > -/* these signals will be delivered to an nfsd thread > - * when handling a request > - */ > -#define ALLOWED_SIGS (sigmask(SIGKILL)) > -/* these signals will be delivered to an nfsd thread > - * when not handling a request. i.e. when waiting > - */ > +/* These signals can be used to shutdown an nfsd thread. */ > #define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) > > extern struct svc_program nfsd_program; > @@ -414,7 +408,6 @@ nfsd(void *vrqstp) > { > struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; > struct fs_struct *fsp; > - sigset_t shutdown_mask, allowed_mask; > int err, preverr = 0; > unsigned int signo; > > @@ -433,15 +426,12 @@ nfsd(void *vrqstp) > current->fs = fsp; > current->fs->umask = 0; > > - siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); > - siginitsetinv(&allowed_mask, ALLOWED_SIGS); > - > /* > * thread is spawned with all signals set to SIG_IGN, re-enable > * the ones that matter > */ > for (signo = 1; signo <= _NSIG; signo++) { > - if (!sigismember(&shutdown_mask, signo)) > + if (siginmask(signo, SHUTDOWN_SIGS)) > allow_signal(signo); > } > > @@ -460,9 +450,6 @@ nfsd(void *vrqstp) > * The main request loop > */ > for (;;) { > - /* Block all but the shutdown signals */ > - sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > - > /* > * Find a socket with data available and call its > * recvfrom routine. > @@ -487,9 +474,6 @@ nfsd(void *vrqstp) > /* Lock the export hash tables for reading. */ > exp_readlock(); > > - /* Process request with signals blocked. */ > - sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > - > svc_process(rqstp); > > /* Unlock export hash tables */ > -- > 1.5.5.1 > > > > > > > > > The latest patch doesn't use kthread_stop with nfsd. I figured it was > > > best to avoid having multiple takedown methods, and since we're using > > > signals here anyway, I just kept signaling as the only way to take down > > > nfsd. Plus, as you mention -- all kthread_stop's are serialized, which > > > could have meant that nfsd's would take a long time to come down on a > > > busy box with a lot of them. > > > > Oh, good. I must have been looking at an old patch. > > > > > > > > As a side note, I'm not terribly thrilled with how kthread_stop is > > > implemented. I worry that a stuck kthread would block unrelated > > > kthreads from coming down. I did a patch 6 mos ago or so to make it so > > > that kthread_stop's could be done in parallel. The downside was that it > > > added a new field to the task_struct (which is already too bloated, > > > IMO). It might be worth resurrecting that at some point (possibly > > > making the new field a union with another field that's unused in > > > kthreads), or considering a different approach to parallelize > > > kthread_stop. > > > > One the one hand, kthreads are expected to not block and to check for > > kthread_should_stop very often, so this wouldn't be a problem. > > On the other hand, expectations like this are quickly invalidated, and > > the code probably should be fixed. > > I suspect we could do without adding anything to task_struct. > > Instead of just one kthread_stop_info, have a linked list, one entry > > for each thread being stopped at the moment. kthread_stop adds to > > the list, sets PF_EXITING (I hope that is safe) and wakes the process. > > kthread_should_stop tests PF_EXITING. > > When kthread() finishes, it does a linear search (the list should be > > short) and calls complete and the relevant .done. > > > > Something like this? > > > > NeilBrown > > > > > > diff --git a/kernel/kthread.c b/kernel/kthread.c > > index bd1b9ea..107290a 100644 > > --- a/kernel/kthread.c > > +++ b/kernel/kthread.c > > @@ -39,12 +39,13 @@ struct kthread_stop_info > > struct task_struct *k; > > int err; > > struct completion done; > > + struct kthread_stp_info *next; > > }; > > > > /* Thread stopping is done by setthing this var: lock serializes > > * multiple kthread_stop calls. */ > > static DEFINE_MUTEX(kthread_stop_lock); > > -static struct kthread_stop_info kthread_stop_info; > > +static struct kthread_stop_info *kthread_stop_info; > > > > /** > > * kthread_should_stop - should this kthread return now? > > @@ -55,7 +56,7 @@ static struct kthread_stop_info kthread_stop_info; > > */ > > int kthread_should_stop(void) > > { > > - return (kthread_stop_info.k == current); > > + return test_bit(PF_EXITING, ¤t.flags); > > } > > EXPORT_SYMBOL(kthread_should_stop); > > > > @@ -80,8 +81,17 @@ static int kthread(void *_create) > > > > /* It might have exited on its own, w/o kthread_stop. Check. */ > > if (kthread_should_stop()) { > > - kthread_stop_info.err = ret; > > - complete(&kthread_stop_info.done); > > + struct kthread_stop_info **ksip; > > + mutex_lock(&kthread_stop_lock); > > + for (ksip = &kthread_stop_info ; *ksip ; ksip = &(*ksip)->next) > > + if ((*ksip)->k == current) > > + break; > > + *ksip = (*ksip)->next; > > + (*ksip)->next = NULL; > > + mutex_unlock(&kthread_stop_lock); > > + > > + ksi->err = ret; > > + complete(&ksi->done); > > } > > return 0; > > } > > @@ -199,26 +209,20 @@ EXPORT_SYMBOL(kthread_bind); > > int kthread_stop(struct task_struct *k) > > { > > int ret; > > + struct kthread_stop_info ksi; > > > > mutex_lock(&kthread_stop_lock); > > > > - /* It could exit after stop_info.k set, but before wake_up_process. */ > > - get_task_struct(k); > > + init_completion(&ksi->done); > > + ksi->k = k; > > > > - /* Must init completion *before* thread sees kthread_stop_info.k */ > > - init_completion(&kthread_stop_info.done); > > - smp_wmb(); > > - > > - /* Now set kthread_should_stop() to true, and wake it up. */ > > - kthread_stop_info.k = k; > > + set_bit(PF_EXITING, &k->flags); > > wake_up_process(k); > > - put_task_struct(k); > > + mutex_unlock(&kthread_stop_lock); > > > > - /* Once it dies, reset stop ptr, gather result and we're done. */ > > - wait_for_completion(&kthread_stop_info.done); > > - kthread_stop_info.k = NULL; > > + /* Once it dies gather result and we're done. */ > > + wait_for_completion(&ksi->done); > > ret = kthread_stop_info.err; > > - mutex_unlock(&kthread_stop_lock); > > > > return ret; > > } > > > That looks reasonable to me, though I tend to find standard list > functions easier to follow... > > The main reason I was going to add a field to the task_struct was > because I wanted to make sure that kthread_should_stop() was a quick > operation. If we can just use PF_EXITING then that's definitely a > better scheme. > > -- > Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) 2008-06-30 17:10 ` J. Bruce Fields (?) @ 2008-06-30 18:09 ` Jeff Layton [not found] ` <20080630140946.34154d4c-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org> -1 siblings, 1 reply; 18+ messages in thread From: Jeff Layton @ 2008-06-30 18:09 UTC (permalink / raw) To: J. Bruce Fields; +Cc: Neil Brown, linux-nfs, linux-kernel On Mon, 30 Jun 2008 13:10:19 -0400 "J. Bruce Fields" <bfields@fieldses.org> wrote: > On Mon, Jun 30, 2008 at 08:35:36AM -0400, Jeff Layton wrote: > > The idea looks reasonable, but the patch above doesn't apply to Bruce's > > for-2.6.27 tree. The nfsd() function has changed a bit. Actually, I > > think we can make this even simpler. kthreads start with all signals > > ignored and the new patch enables all of the ones in SHUTDOWN_SIGS. So > > I don't think we need any signal masking at all. > > > > How about this patch? It should apply on top of the patchset already > > in Bruce's tree. > > Looks reasonable to me; applied for 2.6.27 barring any test failures or > objections. > > --b. > Actually, here's an even simpler version. How about we just get rid of the loop and just have a series of allow_signal() calls. We can get rid of a local variable that way and since we're only referencing SHUTDOWN_SIGS once, I don't see any reason to keep it around. -------[snip]----------- knfsd: treat all shutdown signals as equivalent knfsd currently uses 2 signal masks when processing requests. A "loose" mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it allows when doing the actual operation on the local storage. This is apparently unnecessarily complicated. The underlying filesystem should be able to sanely handle a signal in the middle of an operation. This patch removes the signal mask handling from knfsd altogether. When knfsd is started as a kthread, all signals are ignored. It then allows all of the signals in SHUTDOWN_SIGS. There's no need to set the mask as well. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfsd/nfssvc.c | 30 +++++------------------------- 1 files changed, 5 insertions(+), 25 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 96fdbca..80292ff 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -37,15 +37,6 @@ #define NFSDDBG_FACILITY NFSDDBG_SVC -/* these signals will be delivered to an nfsd thread - * when handling a request - */ -#define ALLOWED_SIGS (sigmask(SIGKILL)) -/* these signals will be delivered to an nfsd thread - * when not handling a request. i.e. when waiting - */ -#define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) - extern struct svc_program nfsd_program; static int nfsd(void *vrqstp); struct timeval nfssvc_boot; @@ -414,9 +405,7 @@ nfsd(void *vrqstp) { struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; struct fs_struct *fsp; - sigset_t shutdown_mask, allowed_mask; int err, preverr = 0; - unsigned int signo; /* Lock module and set up kernel thread */ mutex_lock(&nfsd_mutex); @@ -433,17 +422,14 @@ nfsd(void *vrqstp) current->fs = fsp; current->fs->umask = 0; - siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); - siginitsetinv(&allowed_mask, ALLOWED_SIGS); - /* * thread is spawned with all signals set to SIG_IGN, re-enable - * the ones that matter + * the ones that will bring down the thread */ - for (signo = 1; signo <= _NSIG; signo++) { - if (!sigismember(&shutdown_mask, signo)) - allow_signal(signo); - } + allow_signal(SIGKILL); + allow_signal(SIGHUP); + allow_signal(SIGINT); + allow_signal(SIGQUIT); nfsdstats.th_cnt++; mutex_unlock(&nfsd_mutex); @@ -460,9 +446,6 @@ nfsd(void *vrqstp) * The main request loop */ for (;;) { - /* Block all but the shutdown signals */ - sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); - /* * Find a socket with data available and call its * recvfrom routine. @@ -487,9 +470,6 @@ nfsd(void *vrqstp) /* Lock the export hash tables for reading. */ exp_readlock(); - /* Process request with signals blocked. */ - sigprocmask(SIG_SETMASK, &allowed_mask, NULL); - svc_process(rqstp); /* Unlock export hash tables */ -- 1.5.5.1 ^ permalink raw reply related [flat|nested] 18+ messages in thread
[parent not found: <20080630140946.34154d4c-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>]
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) 2008-06-30 18:09 ` Jeff Layton @ 2008-06-30 19:29 ` J. Bruce Fields 0 siblings, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-06-30 19:29 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, linux-nfs, linux-kernel On Mon, Jun 30, 2008 at 02:09:46PM -0400, Jeff Layton wrote: > On Mon, 30 Jun 2008 13:10:19 -0400 > "J. Bruce Fields" <bfields@fieldses.org> wrote: > > > On Mon, Jun 30, 2008 at 08:35:36AM -0400, Jeff Layton wrote: > > > The idea looks reasonable, but the patch above doesn't apply to Bruce's > > > for-2.6.27 tree. The nfsd() function has changed a bit. Actually, I > > > think we can make this even simpler. kthreads start with all signals > > > ignored and the new patch enables all of the ones in SHUTDOWN_SIGS. So > > > I don't think we need any signal masking at all. > > > > > > How about this patch? It should apply on top of the patchset already > > > in Bruce's tree. > > > > Looks reasonable to me; applied for 2.6.27 barring any test failures or > > objections. > > > > --b. > > > > Actually, here's an even simpler version. How about we just get rid of > the loop and just have a series of allow_signal() calls. We can get > rid of a local variable that way and since we're only referencing > SHUTDOWN_SIGS once, I don't see any reason to keep it around. Yep, it's one less definition to track down when you read the code.... Thanks! --b. > > -------[snip]----------- > > knfsd: treat all shutdown signals as equivalent > > knfsd currently uses 2 signal masks when processing requests. A "loose" > mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and > then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it > allows when doing the actual operation on the local storage. > > This is apparently unnecessarily complicated. The underlying filesystem > should be able to sanely handle a signal in the middle of an operation. > This patch removes the signal mask handling from knfsd altogether. When > knfsd is started as a kthread, all signals are ignored. It then allows > all of the signals in SHUTDOWN_SIGS. There's no need to set the mask > as well. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/nfsd/nfssvc.c | 30 +++++------------------------- > 1 files changed, 5 insertions(+), 25 deletions(-) > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > index 96fdbca..80292ff 100644 > --- a/fs/nfsd/nfssvc.c > +++ b/fs/nfsd/nfssvc.c > @@ -37,15 +37,6 @@ > > #define NFSDDBG_FACILITY NFSDDBG_SVC > > -/* these signals will be delivered to an nfsd thread > - * when handling a request > - */ > -#define ALLOWED_SIGS (sigmask(SIGKILL)) > -/* these signals will be delivered to an nfsd thread > - * when not handling a request. i.e. when waiting > - */ > -#define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) > - > extern struct svc_program nfsd_program; > static int nfsd(void *vrqstp); > struct timeval nfssvc_boot; > @@ -414,9 +405,7 @@ nfsd(void *vrqstp) > { > struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; > struct fs_struct *fsp; > - sigset_t shutdown_mask, allowed_mask; > int err, preverr = 0; > - unsigned int signo; > > /* Lock module and set up kernel thread */ > mutex_lock(&nfsd_mutex); > @@ -433,17 +422,14 @@ nfsd(void *vrqstp) > current->fs = fsp; > current->fs->umask = 0; > > - siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); > - siginitsetinv(&allowed_mask, ALLOWED_SIGS); > - > /* > * thread is spawned with all signals set to SIG_IGN, re-enable > - * the ones that matter > + * the ones that will bring down the thread > */ > - for (signo = 1; signo <= _NSIG; signo++) { > - if (!sigismember(&shutdown_mask, signo)) > - allow_signal(signo); > - } > + allow_signal(SIGKILL); > + allow_signal(SIGHUP); > + allow_signal(SIGINT); > + allow_signal(SIGQUIT); > > nfsdstats.th_cnt++; > mutex_unlock(&nfsd_mutex); > @@ -460,9 +446,6 @@ nfsd(void *vrqstp) > * The main request loop > */ > for (;;) { > - /* Block all but the shutdown signals */ > - sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > - > /* > * Find a socket with data available and call its > * recvfrom routine. > @@ -487,9 +470,6 @@ nfsd(void *vrqstp) > /* Lock the export hash tables for reading. */ > exp_readlock(); > > - /* Process request with signals blocked. */ > - sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > - > svc_process(rqstp); > > /* Unlock export hash tables */ > -- > 1.5.5.1 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) @ 2008-06-30 19:29 ` J. Bruce Fields 0 siblings, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-06-30 19:29 UTC (permalink / raw) To: Jeff Layton; +Cc: Neil Brown, linux-nfs, linux-kernel On Mon, Jun 30, 2008 at 02:09:46PM -0400, Jeff Layton wrote: > On Mon, 30 Jun 2008 13:10:19 -0400 > "J. Bruce Fields" <bfields@fieldses.org> wrote: > > > On Mon, Jun 30, 2008 at 08:35:36AM -0400, Jeff Layton wrote: > > > The idea looks reasonable, but the patch above doesn't apply to Bruce's > > > for-2.6.27 tree. The nfsd() function has changed a bit. Actually, I > > > think we can make this even simpler. kthreads start with all signals > > > ignored and the new patch enables all of the ones in SHUTDOWN_SIGS. So > > > I don't think we need any signal masking at all. > > > > > > How about this patch? It should apply on top of the patchset already > > > in Bruce's tree. > > > > Looks reasonable to me; applied for 2.6.27 barring any test failures or > > objections. > > > > --b. > > > > Actually, here's an even simpler version. How about we just get rid of > the loop and just have a series of allow_signal() calls. We can get > rid of a local variable that way and since we're only referencing > SHUTDOWN_SIGS once, I don't see any reason to keep it around. Yep, it's one less definition to track down when you read the code.... Thanks! --b. > > -------[snip]----------- > > knfsd: treat all shutdown signals as equivalent > > knfsd currently uses 2 signal masks when processing requests. A "loose" > mask (SHUTDOWN_SIGS) that it uses when receiving network requests, and > then a more "strict" mask (ALLOWED_SIGS, which is just SIGKILL) that it > allows when doing the actual operation on the local storage. > > This is apparently unnecessarily complicated. The underlying filesystem > should be able to sanely handle a signal in the middle of an operation. > This patch removes the signal mask handling from knfsd altogether. When > knfsd is started as a kthread, all signals are ignored. It then allows > all of the signals in SHUTDOWN_SIGS. There's no need to set the mask > as well. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > --- > fs/nfsd/nfssvc.c | 30 +++++------------------------- > 1 files changed, 5 insertions(+), 25 deletions(-) > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > index 96fdbca..80292ff 100644 > --- a/fs/nfsd/nfssvc.c > +++ b/fs/nfsd/nfssvc.c > @@ -37,15 +37,6 @@ > > #define NFSDDBG_FACILITY NFSDDBG_SVC > > -/* these signals will be delivered to an nfsd thread > - * when handling a request > - */ > -#define ALLOWED_SIGS (sigmask(SIGKILL)) > -/* these signals will be delivered to an nfsd thread > - * when not handling a request. i.e. when waiting > - */ > -#define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) > - > extern struct svc_program nfsd_program; > static int nfsd(void *vrqstp); > struct timeval nfssvc_boot; > @@ -414,9 +405,7 @@ nfsd(void *vrqstp) > { > struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; > struct fs_struct *fsp; > - sigset_t shutdown_mask, allowed_mask; > int err, preverr = 0; > - unsigned int signo; > > /* Lock module and set up kernel thread */ > mutex_lock(&nfsd_mutex); > @@ -433,17 +422,14 @@ nfsd(void *vrqstp) > current->fs = fsp; > current->fs->umask = 0; > > - siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); > - siginitsetinv(&allowed_mask, ALLOWED_SIGS); > - > /* > * thread is spawned with all signals set to SIG_IGN, re-enable > - * the ones that matter > + * the ones that will bring down the thread > */ > - for (signo = 1; signo <= _NSIG; signo++) { > - if (!sigismember(&shutdown_mask, signo)) > - allow_signal(signo); > - } > + allow_signal(SIGKILL); > + allow_signal(SIGHUP); > + allow_signal(SIGINT); > + allow_signal(SIGQUIT); > > nfsdstats.th_cnt++; > mutex_unlock(&nfsd_mutex); > @@ -460,9 +446,6 @@ nfsd(void *vrqstp) > * The main request loop > */ > for (;;) { > - /* Block all but the shutdown signals */ > - sigprocmask(SIG_SETMASK, &shutdown_mask, NULL); > - > /* > * Find a socket with data available and call its > * recvfrom routine. > @@ -487,9 +470,6 @@ nfsd(void *vrqstp) > /* Lock the export hash tables for reading. */ > exp_readlock(); > > - /* Process request with signals blocked. */ > - sigprocmask(SIG_SETMASK, &allowed_mask, NULL); > - > svc_process(rqstp); > > /* Unlock export hash tables */ > -- > 1.5.5.1 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. 2008-06-19 0:11 ` [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls NeilBrown 2008-06-19 1:09 ` Jeff Layton @ 2008-06-20 17:34 ` J. Bruce Fields 1 sibling, 0 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-06-20 17:34 UTC (permalink / raw) To: NeilBrown; +Cc: linux-nfs, linux-kernel Thanks, applied (and replaced the earlier patch in the for-2.6.27 branch at git://linux-nfs.org/~bfields/linux.git for-2.6.27 I saw sort of curious if I could convince myself to go through this cycle only appending to that branch. I guess not. Maybe next time.) --b. On Thu, Jun 19, 2008 at 10:11:09AM +1000, NeilBrown wrote: > > OCFS2 can return -ERESTARTSYS from write requests (and possibly > elsewhere) if there is a signal pending. > > If nfsd is shutdown (by sending a signal to each thread) while there > is still an IO load from the client, each thread could handle one last > request with a signal pending. This can result in -ERESTARTSYS > which is not understood by nfserrno() and so is reflected back to > the client as nfserr_io aka -EIO. This is wrong. > > Instead, interpret ERESTARTSYS to mean "try again later" by returning > nfserr_jukebox. The client will resend and - if the server is > restarted - the write will (hopefully) be successful and everyone will > be happy. > > The symptom that I narrowed down to this was: > copy a large file via NFS to an OCFS2 filesystem, and restart > the nfs server during the copy. > The 'cp' might get an -EIO, and the file will be corrupted - > presumably holes in the middle where writes appeared to fail. > > > Signed-off-by: Neil Brown <neilb@suse.de> > > ### Diffstat output > ./fs/nfsd/nfsproc.c | 1 + > 1 file changed, 1 insertion(+) > > diff .prev/fs/nfsd/nfsproc.c ./fs/nfsd/nfsproc.c > --- .prev/fs/nfsd/nfsproc.c 2008-06-19 10:06:36.000000000 +1000 > +++ ./fs/nfsd/nfsproc.c 2008-06-19 10:07:58.000000000 +1000 > @@ -614,6 +614,7 @@ nfserrno (int errno) > #endif > { nfserr_stale, -ESTALE }, > { nfserr_jukebox, -ETIMEDOUT }, > + { nfserr_jukebox, -ERESTARTSYS }, > { nfserr_dropit, -EAGAIN }, > { nfserr_dropit, -ENOMEM }, > { nfserr_badname, -ESRCH }, ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2008-06-30 19:30 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20080619101025.24263.patches@notabene>
2008-06-19 0:11 ` [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls NeilBrown
2008-06-19 1:09 ` Jeff Layton
2008-06-19 2:29 ` Neil Brown
2008-06-19 10:38 ` Jeff Layton
[not found] ` <20080619063824.00ca6381-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-06-20 17:50 ` J. Bruce Fields
2008-06-20 17:50 ` J. Bruce Fields
2008-06-23 0:20 ` Neil Brown
[not found] ` <18526.60523.584503.68076-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-06-23 0:52 ` Jeff Layton
2008-06-23 0:52 ` Jeff Layton
2008-06-23 23:55 ` [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls. (and possible kthread_stop changes) Neil Brown
[not found] ` <18528.14347.525397.917553-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2008-06-30 12:35 ` Jeff Layton
2008-06-30 12:35 ` Jeff Layton
[not found] ` <20080630083536.680b093a-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
2008-06-30 17:10 ` J. Bruce Fields
2008-06-30 17:10 ` J. Bruce Fields
2008-06-30 18:09 ` Jeff Layton
[not found] ` <20080630140946.34154d4c-xSBYVWDuneFaJnirhKH9O4GKTjYczspe@public.gmane.org>
2008-06-30 19:29 ` J. Bruce Fields
2008-06-30 19:29 ` J. Bruce Fields
2008-06-20 17:34 ` [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls J. Bruce Fields
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.