* [PATCH 0/4] Convert knfsd to kthread API and fix startup/shutdown races (try #2) @ 2008-06-04 15:03 Jeff Layton 2008-06-04 15:03 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking Jeff Layton 0 siblings, 1 reply; 18+ messages in thread From: Jeff Layton @ 2008-06-04 15:03 UTC (permalink / raw) To: linux-nfs, nfsv4; +Cc: gnb This patchset is the second attempt to change nfsd to the kthread API. The main changes from the first patchset are: - it includes the patch by Neil Brown to take the BKL out of nfsd startup/shutdown codepath. This should hopefully plug the main race condition that could lead to oopses when nfsd was signaled and then rapidly restarted before all of the existing nfsd's had gone down. I've also added some comments to this patch to try and clarify the locking for future work. - It preserves signaling as the main method for taking down nfsd. While kthread_stop is easier to deal with and less racy than signals, a lot of distros do the equivalent of "pkill nfsd" to take nfsd down. Changing that would be painful. Also, using kthread_stop is slower than using signals, since taking down each thread would be serialized. - it removes the special handling where if the last thread caught a SIGHUP, the kernel would leave the export cache intact I've tested this by running I/O against the server w/ iozone while restarting nfs service in a loop. Everything seems to work as expected. As always, comments and suggestions are appreciated... Signed-off-by: Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-04 15:03 [PATCH 0/4] Convert knfsd to kthread API and fix startup/shutdown races (try #2) Jeff Layton @ 2008-06-04 15:03 ` Jeff Layton 2008-06-04 15:03 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Jeff Layton ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-04 15:03 UTC (permalink / raw) To: linux-nfs, nfsv4; +Cc: gnb From: Neil Brown <neilb@suse.de> This removes the BKL from the RPC service creation codepath. The BKL really isn't adequate for this job since some of this info needs protection across sleeps. Also, add some comments to try and clarify how the locking should work and to make it clear that the BKL isn't necessary as long as there is adequate locking between tasks when touching the svc_serv fields. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfsd/nfsctl.c | 39 +++++++++++++++++++++++++-------------- fs/nfsd/nfssvc.c | 45 ++++++++++++++++++++++++++++++++------------- net/sunrpc/svc.c | 15 +++++++++------ 3 files changed, 66 insertions(+), 33 deletions(-) diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 5ac00c4..d601a77 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -441,6 +441,8 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size) return strlen(buf); } +extern struct mutex nfsd_mutex; + static ssize_t write_pool_threads(struct file *file, char *buf, size_t size) { /* if size > 0, look for an array of number of threads per node @@ -450,22 +452,26 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size) int i; int rv; int len; - int npools = nfsd_nrpools(); + int npools; int *nthreads; + mutex_lock(&nfsd_mutex); + npools = nfsd_nrpools(); if (npools == 0) { /* * NFS is shut down. The admin can start it by * writing to the threads file but NOT the pool_threads * file, sorry. Report zero threads. */ + mutex_unlock(&nfsd_mutex); strcpy(buf, "0\n"); return strlen(buf); } nthreads = kcalloc(npools, sizeof(int), GFP_KERNEL); + rv = -ENOMEM; if (nthreads == NULL) - return -ENOMEM; + goto out_free; if (size > 0) { for (i = 0; i < npools; i++) { @@ -496,10 +502,12 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size) mesg += len; } + mutex_unlock(&nfsd_mutex); return (mesg-buf); out_free: kfree(nthreads); + mutex_unlock(&nfsd_mutex); return rv; } @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size) return len; } -static ssize_t write_ports(struct file *file, char *buf, size_t size) +static ssize_t __write_ports(struct file *file, char *buf, size_t size) { if (size == 0) { int len = 0; - lock_kernel(); + if (nfsd_serv) len = svc_xprt_names(nfsd_serv, buf, 0); - unlock_kernel(); return len; } /* Either a single 'fd' number is written, in which @@ -603,9 +610,7 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size) /* Decrease the count, but don't shutdown the * the service */ - lock_kernel(); nfsd_serv->sv_nrthreads--; - unlock_kernel(); } return err < 0 ? err : 0; } @@ -614,10 +619,8 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size) int len = 0; if (!toclose) return -ENOMEM; - lock_kernel(); if (nfsd_serv) len = svc_sock_names(buf, nfsd_serv, toclose); - unlock_kernel(); if (len >= 0) lockd_down(); kfree(toclose); @@ -655,7 +658,6 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size) if (sscanf(&buf[1], "%15s %4d", transport, &port) == 2) { if (port == 0) return -EINVAL; - lock_kernel(); if (nfsd_serv) { xprt = svc_find_xprt(nfsd_serv, transport, AF_UNSPEC, port); @@ -666,13 +668,22 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size) } else err = -ENOTCONN; } - unlock_kernel(); return err < 0 ? err : 0; } } return -EINVAL; } +static ssize_t write_ports(struct file *file, char *buf, size_t size) +{ + ssize_t rv; + mutex_lock(&nfsd_mutex); + rv = __write_ports(file, buf, size); + mutex_unlock(&nfsd_mutex); + return rv; +} + + int nfsd_max_blksize; static ssize_t write_maxblksize(struct file *file, char *buf, size_t size) @@ -691,13 +702,13 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size) if (bsize > NFSSVC_MAXBLKSIZE) bsize = NFSSVC_MAXBLKSIZE; bsize &= ~(1024-1); - lock_kernel(); + mutex_lock(&nfsd_mutex); if (nfsd_serv && nfsd_serv->sv_nrthreads) { - unlock_kernel(); + mutex_unlock(&nfsd_mutex); return -EBUSY; } nfsd_max_blksize = bsize; - unlock_kernel(); + mutex_unlock(&nfsd_mutex); } return sprintf(buf, "%d\n", nfsd_max_blksize); } diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 941041f..040b3c2 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -53,11 +53,27 @@ extern struct svc_program nfsd_program; static void nfsd(struct svc_rqst *rqstp); struct timeval nfssvc_boot; - struct svc_serv *nfsd_serv; static atomic_t nfsd_busy; static unsigned long nfsd_last_call; static DEFINE_SPINLOCK(nfsd_call_lock); +/* + * nfsd_mutex protects nfsd_serv -- both the pointer itself and the members + * of the svc_serv struct. In particular, ->sv_nrthreads but also to some + * extent ->sv_temp_socks and ->sv_permsocks. It also protects nfsdstats.th_cnt + * + * If (out side the lock) nfsd_serv is non-NULL, then it must point to a + * properly initialised 'struct svc_serv' with ->sv_nrthreads > 0. That number + * of nfsd threads must exist and each must listed in ->sp_all_threads in each + * entry of ->sv_pools[]. + * + * Transitions of the thread count between zero and non-zero are of particular + * interest since the svc_serv needs to be created and initialized at that + * point, or freed. + */ +struct svc_serv *nfsd_serv; +DEFINE_MUTEX(nfsd_mutex); + #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) static struct svc_stat nfsd_acl_svcstats; static struct svc_version * nfsd_acl_version[] = { @@ -190,13 +206,14 @@ void nfsd_reset_versions(void) } } + int nfsd_create_serv(void) { int err = 0; - lock_kernel(); + + WARN_ON(!mutex_is_locked(&nfsd_mutex)); if (nfsd_serv) { svc_get(nfsd_serv); - unlock_kernel(); return 0; } if (nfsd_max_blksize == 0) { @@ -223,7 +240,7 @@ int nfsd_create_serv(void) nfsd, SIG_NOCLEAN, THIS_MODULE); if (nfsd_serv == NULL) err = -ENOMEM; - unlock_kernel(); + do_gettimeofday(&nfssvc_boot); /* record boot time */ return err; } @@ -282,6 +299,8 @@ int nfsd_set_nrthreads(int n, int *nthreads) int tot = 0; int err = 0; + WARN_ON(!mutex_is_locked(&nfsd_mutex)); + if (nfsd_serv == NULL || n <= 0) return 0; @@ -316,7 +335,6 @@ int nfsd_set_nrthreads(int n, int *nthreads) nthreads[0] = 1; /* apply the new numbers */ - lock_kernel(); svc_get(nfsd_serv); for (i = 0; i < n; i++) { err = svc_set_num_threads(nfsd_serv, &nfsd_serv->sv_pools[i], @@ -325,7 +343,6 @@ int nfsd_set_nrthreads(int n, int *nthreads) break; } svc_destroy(nfsd_serv); - unlock_kernel(); return err; } @@ -334,8 +351,8 @@ int nfsd_svc(unsigned short port, int nrservs) { int error; - - lock_kernel(); + + mutex_lock(&nfsd_mutex); dprintk("nfsd: creating service\n"); error = -EINVAL; if (nrservs <= 0) @@ -363,7 +380,7 @@ nfsd_svc(unsigned short port, int nrservs) failure: svc_destroy(nfsd_serv); /* Release server */ out: - unlock_kernel(); + mutex_unlock(&nfsd_mutex); return error; } @@ -399,7 +416,7 @@ nfsd(struct svc_rqst *rqstp) sigset_t shutdown_mask, allowed_mask; /* Lock module and set up kernel thread */ - lock_kernel(); + mutex_lock(&nfsd_mutex); daemonize("nfsd"); /* After daemonize() this kernel thread shares current->fs @@ -417,11 +434,13 @@ nfsd(struct svc_rqst *rqstp) siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); siginitsetinv(&allowed_mask, ALLOWED_SIGS); + nfsdstats.th_cnt++; rqstp->rq_task = current; - unlock_kernel(); + mutex_unlock(&nfsd_mutex); + /* * We want less throttling in balance_dirty_pages() so that nfs to @@ -477,7 +496,7 @@ nfsd(struct svc_rqst *rqstp) /* Clear signals before calling svc_exit_thread() */ flush_signals(current); - lock_kernel(); + mutex_lock(&nfsd_mutex); nfsdstats.th_cnt --; @@ -486,7 +505,7 @@ out: svc_exit_thread(rqstp); /* Release module */ - unlock_kernel(); + mutex_unlock(&nfsd_mutex); module_put_and_exit(0); } diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 01c7e31..7bffaff 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -461,7 +461,8 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, EXPORT_SYMBOL(svc_create_pooled); /* - * Destroy an RPC service. Should be called with the BKL held + * Destroy an RPC service. Should be called with appropriate locking to + * protect the sv_nrthreads, sv_permsocks and sv_tempsocks. */ void svc_destroy(struct svc_serv *serv) @@ -578,9 +579,10 @@ out_enomem: EXPORT_SYMBOL(svc_prepare_thread); /* - * Create a thread in the given pool. Caller must hold BKL. - * On a NUMA or SMP machine, with a multi-pool serv, the thread - * will be restricted to run on the cpus belonging to the pool. + * Create a thread in the given pool. Caller must hold BKL or another lock to + * serialize access to the svc_serv struct. On a NUMA or SMP machine, with a + * multi-pool serv, the thread will be restricted to run on the cpus belonging + * to the pool. */ static int __svc_create_thread(svc_thread_fn func, struct svc_serv *serv, @@ -674,7 +676,7 @@ found_pool: * of threads the given number. If `pool' is non-NULL, applies * only to threads in that pool, otherwise round-robins between * all pools. Must be called with a svc_get() reference and - * the BKL held. + * the BKL or another lock to protect access to svc_serv fields. * * Destroying threads relies on the service threads filling in * rqstp->rq_task, which only the nfs ones do. Assumes the serv @@ -722,7 +724,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) EXPORT_SYMBOL(svc_set_num_threads); /* - * Called from a server thread as it's exiting. Caller must hold BKL. + * Called from a server thread as it's exiting. Caller must hold the BKL or + * the "service mutex", whichever is appropriate for the service. */ void svc_exit_thread(struct svc_rqst *rqstp) -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/4] knfsd: remove special handling for SIGHUP 2008-06-04 15:03 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking Jeff Layton @ 2008-06-04 15:03 ` Jeff Layton 2008-06-04 15:03 ` [PATCH 3/4] knfsd: convert knfsd to kthread API Jeff Layton 2008-06-05 0:59 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Greg Banks 2008-06-04 21:02 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking J. Bruce Fields 2008-06-05 0:47 ` Greg Banks 2 siblings, 2 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-04 15:03 UTC (permalink / raw) To: linux-nfs, nfsv4; +Cc: gnb The special handling for SIGHUP in knfsd is a holdover from much earlier versions of Linux where reloading the export table was more expensive. That facility is not really needed anymore and to my knowledge, is seldom-used. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfsd/nfssvc.c | 33 ++++++++------------------------- 1 files changed, 8 insertions(+), 25 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 040b3c2..582acb1 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -44,11 +44,6 @@ * when not handling a request. i.e. when waiting */ #define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) -/* if the last thread dies with SIGHUP, then the exports table is - * left unchanged ( like 2.4-{0-9} ). Any other signal will clear - * the exports table (like 2.2). - */ -#define SIG_NOCLEAN SIGHUP extern struct svc_program nfsd_program; static void nfsd(struct svc_rqst *rqstp); @@ -167,7 +162,6 @@ int nfsd_nrthreads(void) return nfsd_serv->sv_nrthreads; } -static int killsig; /* signal that was used to kill last nfsd */ static void nfsd_last_thread(struct svc_serv *serv) { /* When last nfsd thread exits we need to do some clean-up */ @@ -178,11 +172,9 @@ static void nfsd_last_thread(struct svc_serv *serv) nfsd_racache_shutdown(); nfs4_state_shutdown(); - printk(KERN_WARNING "nfsd: last server has exited\n"); - if (killsig != SIG_NOCLEAN) { - printk(KERN_WARNING "nfsd: unexporting all filesystems\n"); - nfsd_export_flush(); - } + printk(KERN_WARNING "nfsd: last server has exited, flushing export " + "cache\n"); + nfsd_export_flush(); } void nfsd_reset_versions(void) @@ -234,10 +226,9 @@ int nfsd_create_serv(void) } atomic_set(&nfsd_busy, 0); - nfsd_serv = svc_create_pooled(&nfsd_program, - nfsd_max_blksize, - nfsd_last_thread, - nfsd, SIG_NOCLEAN, THIS_MODULE); + nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, + nfsd_last_thread, nfsd, SIGINT, + THIS_MODULE); if (nfsd_serv == NULL) err = -ENOMEM; @@ -482,17 +473,9 @@ nfsd(struct svc_rqst *rqstp) atomic_dec(&nfsd_busy); } - if (err != -EINTR) { + if (err != -EINTR) printk(KERN_WARNING "nfsd: terminating on error %d\n", -err); - } else { - unsigned int signo; - - for (signo = 1; signo <= _NSIG; signo++) - if (sigismember(¤t->pending.signal, signo) && - !sigismember(¤t->blocked, signo)) - break; - killsig = signo; - } + /* Clear signals before calling svc_exit_thread() */ flush_signals(current); -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/4] knfsd: convert knfsd to kthread API 2008-06-04 15:03 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Jeff Layton @ 2008-06-04 15:03 ` Jeff Layton 2008-06-04 15:03 ` [PATCH 4/4] sunrpc: remove unneeded field from svc_serv struct Jeff Layton 2008-06-05 1:28 ` [PATCH 3/4] knfsd: convert knfsd to kthread API Greg Banks 2008-06-05 0:59 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Greg Banks 1 sibling, 2 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-04 15:03 UTC (permalink / raw) To: linux-nfs, nfsv4; +Cc: neilb, gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf This patch is rather large, but I couldn't figure out a way to break it up that would remain bisectable. It does several things: - change svc_thread_fn typedef to better match what kthread_create expects - change svc_pool_map_set_cpumask to be more kthread friendly. Make it take a task arg and and get rid of the "oldmask" - have svc_set_num_threads call kthread_create directly - eliminate __svc_create_thread Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfsd/nfssvc.c | 52 ++++++++++++++-------- include/linux/sunrpc/svc.h | 2 +- net/sunrpc/svc.c | 102 +++++++++++++++----------------------------- 3 files changed, 68 insertions(+), 88 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 582acb1..825936b 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -21,6 +21,7 @@ #include <linux/smp_lock.h> #include <linux/freezer.h> #include <linux/fs_struct.h> +#include <linux/kthread.h> #include <linux/sunrpc/types.h> #include <linux/sunrpc/stats.h> @@ -46,7 +47,7 @@ #define SHUTDOWN_SIGS (sigmask(SIGKILL) | sigmask(SIGHUP) | sigmask(SIGINT) | sigmask(SIGQUIT)) extern struct svc_program nfsd_program; -static void nfsd(struct svc_rqst *rqstp); +static int nfsd(void *vrqstp); struct timeval nfssvc_boot; static atomic_t nfsd_busy; static unsigned long nfsd_last_call; @@ -399,18 +400,19 @@ update_thread_usage(int busy_threads) /* * This is the NFS server kernel thread */ -static void -nfsd(struct svc_rqst *rqstp) +static int +nfsd(void *vrqstp) { + struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; struct fs_struct *fsp; - int err; sigset_t shutdown_mask, allowed_mask; + int err, preverr = 0; + unsigned int signo; /* Lock module and set up kernel thread */ mutex_lock(&nfsd_mutex); - daemonize("nfsd"); - /* After daemonize() this kernel thread shares current->fs + /* At this point, the thread shares current->fs * with the init process. We need to create files with a * umask of 0 instead of init's umask. */ fsp = copy_fs_struct(current->fs); @@ -425,14 +427,18 @@ nfsd(struct svc_rqst *rqstp) siginitsetinv(&shutdown_mask, SHUTDOWN_SIGS); siginitsetinv(&allowed_mask, ALLOWED_SIGS); + /* + * thread is spawned with all signals set to SIG_IGN, re-enable + * the ones that matter + */ + for (signo = 1; signo <= _NSIG; signo++) { + if (!sigismember(&shutdown_mask, signo)) + allow_signal(signo); + } nfsdstats.th_cnt++; - - rqstp->rq_task = current; - mutex_unlock(&nfsd_mutex); - /* * We want less throttling in balance_dirty_pages() so that nfs to * localhost doesn't cause nfsd to lock up due to all the client's @@ -454,15 +460,25 @@ nfsd(struct svc_rqst *rqstp) */ while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN) ; - if (err < 0) + if (err == -EINTR) break; + else if (err < 0) { + if (err != preverr) { + printk(KERN_WARNING "%s: unexpected error " + "from svc_recv (%d)\n", __func__, -err); + preverr = err; + } + schedule_timeout_uninterruptible(HZ); + continue; + } + update_thread_usage(atomic_read(&nfsd_busy)); atomic_inc(&nfsd_busy); /* Lock the export hash tables for reading. */ exp_readlock(); - /* Process request with signals blocked. */ + /* Process request with signals blocked. */ sigprocmask(SIG_SETMASK, &allowed_mask, NULL); svc_process(rqstp); @@ -471,25 +487,23 @@ nfsd(struct svc_rqst *rqstp) exp_readunlock(); update_thread_usage(atomic_read(&nfsd_busy)); atomic_dec(&nfsd_busy); - } - if (err != -EINTR) - printk(KERN_WARNING "nfsd: terminating on error %d\n", -err); + /* reset err in case kthread_stop is called */ + err = 0; + } /* Clear signals before calling svc_exit_thread() */ flush_signals(current); mutex_lock(&nfsd_mutex); - nfsdstats.th_cnt --; out: /* Release the thread */ svc_exit_thread(rqstp); - - /* Release module */ mutex_unlock(&nfsd_mutex); - module_put_and_exit(0); + + return 0; } static __be32 map_new_errors(u32 vers, __be32 nfserr) diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 4b54c5f..011d6d8 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -22,7 +22,7 @@ /* * This is the RPC server thread function prototype */ -typedef void (*svc_thread_fn)(struct svc_rqst *); +typedef int (*svc_thread_fn)(void *); /* * diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 7bffaff..f461da2 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -18,6 +18,7 @@ #include <linux/mm.h> #include <linux/interrupt.h> #include <linux/module.h> +#include <linux/kthread.h> #include <linux/sunrpc/types.h> #include <linux/sunrpc/xdr.h> @@ -291,15 +292,14 @@ svc_pool_map_put(void) /* - * Set the current thread's cpus_allowed mask so that it + * Set the given thread's cpus_allowed mask so that it * will only run on cpus in the given pool. - * - * Returns 1 and fills in oldmask iff a cpumask was applied. */ -static inline int -svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask) +static inline void +svc_pool_map_set_cpumask(struct task_struct *task, unsigned int pidx) { struct svc_pool_map *m = &svc_pool_map; + unsigned int node = m->pool_to[pidx]; /* * The caller checks for sv_nrpools > 1, which @@ -307,26 +307,17 @@ svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask) */ BUG_ON(m->count == 0); - switch (m->mode) - { - default: - return 0; + switch (m->mode) { case SVC_POOL_PERCPU: { - unsigned int cpu = m->pool_to[pidx]; - - *oldmask = current->cpus_allowed; - set_cpus_allowed_ptr(current, &cpumask_of_cpu(cpu)); - return 1; + set_cpus_allowed_ptr(task, &cpumask_of_cpu(node)); + break; } case SVC_POOL_PERNODE: { - unsigned int node = m->pool_to[pidx]; node_to_cpumask_ptr(nodecpumask, node); - - *oldmask = current->cpus_allowed; - set_cpus_allowed_ptr(current, nodecpumask); - return 1; + set_cpus_allowed_ptr(task, nodecpumask); + break; } } } @@ -579,47 +570,6 @@ out_enomem: EXPORT_SYMBOL(svc_prepare_thread); /* - * Create a thread in the given pool. Caller must hold BKL or another lock to - * serialize access to the svc_serv struct. On a NUMA or SMP machine, with a - * multi-pool serv, the thread will be restricted to run on the cpus belonging - * to the pool. - */ -static int -__svc_create_thread(svc_thread_fn func, struct svc_serv *serv, - struct svc_pool *pool) -{ - struct svc_rqst *rqstp; - int error = -ENOMEM; - int have_oldmask = 0; - cpumask_t uninitialized_var(oldmask); - - rqstp = svc_prepare_thread(serv, pool); - if (IS_ERR(rqstp)) { - error = PTR_ERR(rqstp); - goto out; - } - - if (serv->sv_nrpools > 1) - have_oldmask = svc_pool_map_set_cpumask(pool->sp_id, &oldmask); - - error = kernel_thread((int (*)(void *)) func, rqstp, 0); - - if (have_oldmask) - set_cpus_allowed(current, oldmask); - - if (error < 0) - goto out_thread; - svc_sock_update_bufs(serv); - error = 0; -out: - return error; - -out_thread: - svc_exit_thread(rqstp); - goto out; -} - -/* * Choose a pool in which to create a new thread, for svc_set_num_threads */ static inline struct svc_pool * @@ -688,7 +638,9 @@ found_pool: int svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) { - struct task_struct *victim; + struct svc_rqst *rqstp; + struct task_struct *task; + struct svc_pool *chosen_pool; int error = 0; unsigned int state = serv->sv_nrthreads-1; @@ -704,18 +656,32 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) /* create new threads */ while (nrservs > 0) { nrservs--; - __module_get(serv->sv_module); - error = __svc_create_thread(serv->sv_function, serv, - choose_pool(serv, pool, &state)); - if (error < 0) { - module_put(serv->sv_module); + chosen_pool = choose_pool(serv, pool, &state); + + rqstp = svc_prepare_thread(serv, chosen_pool); + if (IS_ERR(rqstp)) { + error = PTR_ERR(rqstp); break; } + + task = kthread_create(serv->sv_function, rqstp, serv->sv_name); + if (IS_ERR(task)) { + error = PTR_ERR(task); + svc_exit_thread(rqstp); + break; + } + + rqstp->rq_task = task; + if (serv->sv_nrpools > 1) + svc_pool_map_set_cpumask(task, chosen_pool->sp_id); + + svc_sock_update_bufs(serv); + wake_up_process(task); } /* destroy old threads */ while (nrservs < 0 && - (victim = choose_victim(serv, pool, &state)) != NULL) { - send_sig(serv->sv_kill_signal, victim, 1); + (task = choose_victim(serv, pool, &state)) != NULL) { + send_sig(serv->sv_kill_signal, task, 1); nrservs++; } -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/4] sunrpc: remove unneeded field from svc_serv struct 2008-06-04 15:03 ` [PATCH 3/4] knfsd: convert knfsd to kthread API Jeff Layton @ 2008-06-04 15:03 ` Jeff Layton 2008-06-05 1:30 ` Greg Banks 2008-06-05 1:28 ` [PATCH 3/4] knfsd: convert knfsd to kthread API Greg Banks 1 sibling, 1 reply; 18+ messages in thread From: Jeff Layton @ 2008-06-04 15:03 UTC (permalink / raw) To: linux-nfs, nfsv4; +Cc: neilb, gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf Remove the sv_module field from the svc_serv struct since svc_set_num_threads doesn't bother with module reference counts anymore. Signed-off-by: Jeff Layton <jlayton@redhat.com> --- fs/nfsd/nfssvc.c | 3 +-- include/linux/sunrpc/svc.h | 8 ++------ net/sunrpc/svc.c | 3 +-- 3 files changed, 4 insertions(+), 10 deletions(-) diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 825936b..a87f7f9 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -228,8 +228,7 @@ int nfsd_create_serv(void) atomic_set(&nfsd_busy, 0); nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, - nfsd_last_thread, nfsd, SIGINT, - THIS_MODULE); + nfsd_last_thread, nfsd, SIGINT); if (nfsd_serv == NULL) err = -ENOMEM; diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 011d6d8..6d69165 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -72,13 +72,9 @@ struct svc_serv { unsigned int sv_nrpools; /* number of thread pools */ struct svc_pool * sv_pools; /* array of thread pools */ + /* Callback to use when last thread exits */ void (*sv_shutdown)(struct svc_serv *serv); - /* Callback to use when last thread - * exits. - */ - struct module * sv_module; /* optional module to count when - * adding threads */ svc_thread_fn sv_function; /* main function for threads */ int sv_kill_signal; /* signal to kill threads */ }; @@ -389,7 +385,7 @@ struct svc_rqst *svc_prepare_thread(struct svc_serv *serv, void svc_exit_thread(struct svc_rqst *); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, void (*shutdown)(struct svc_serv*), - svc_thread_fn, int sig, struct module *); + svc_thread_fn, int); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); void svc_destroy(struct svc_serv *); int svc_process(struct svc_rqst *); diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index f461da2..9eabb3f 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -434,7 +434,7 @@ EXPORT_SYMBOL(svc_create); struct svc_serv * svc_create_pooled(struct svc_program *prog, unsigned int bufsize, void (*shutdown)(struct svc_serv *serv), - svc_thread_fn func, int sig, struct module *mod) + svc_thread_fn func, int sig) { struct svc_serv *serv; unsigned int npools = svc_pool_map_get(); @@ -444,7 +444,6 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, if (serv != NULL) { serv->sv_function = func; serv->sv_kill_signal = sig; - serv->sv_module = mod; } return serv; -- 1.5.3.6 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 4/4] sunrpc: remove unneeded field from svc_serv struct 2008-06-04 15:03 ` [PATCH 4/4] sunrpc: remove unneeded field from svc_serv struct Jeff Layton @ 2008-06-05 1:30 ` Greg Banks 0 siblings, 0 replies; 18+ messages in thread From: Greg Banks @ 2008-06-05 1:30 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs, nfsv4 Jeff Layton wrote: > Remove the sv_module field from the svc_serv struct since > svc_set_num_threads doesn't bother with module reference > counts anymore. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > Acked-by: Greg Banks <gnb@melbourne.sgi.com> -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 3/4] knfsd: convert knfsd to kthread API 2008-06-04 15:03 ` [PATCH 3/4] knfsd: convert knfsd to kthread API Jeff Layton 2008-06-04 15:03 ` [PATCH 4/4] sunrpc: remove unneeded field from svc_serv struct Jeff Layton @ 2008-06-05 1:28 ` Greg Banks 1 sibling, 0 replies; 18+ messages in thread From: Greg Banks @ 2008-06-05 1:28 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs, nfsv4 Jeff Layton wrote: > This patch is rather large, but I couldn't figure out a way to break it > up that would remain bisectable. It does several things: > > - change svc_thread_fn typedef to better match what kthread_create expects > - change svc_pool_map_set_cpumask to be more kthread friendly. Make it > take a task arg and and get rid of the "oldmask" > - have svc_set_num_threads call kthread_create directly > - eliminate __svc_create_thread > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > > Acked-by: Greg Banks <gnb@melbourne.sgi.com> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > index 582acb1..825936b 100644 > --- a/fs/nfsd/nfssvc.c > +++ b/fs/nfsd/nfssvc.c > @@ -21,6 +21,7 @@ > @@ -46,7 +47,7 @@ > @@ -399,18 +400,19 @@ update_thread_usage(int busy_threads) > /* > * This is the NFS server kernel thread > */ > -static void > -nfsd(struct svc_rqst *rqstp) > +static int > +nfsd(void *vrqstp) > { > + struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp; > Not that it matters, but you don't actually need a cast here. > @@ -425,14 +427,18 @@ nfsd(struct svc_rqst *rqstp) > @@ -454,15 +460,25 @@ nfsd(struct svc_rqst *rqstp) > @@ -471,25 +487,23 @@ nfsd(struct svc_rqst *rqstp) > --- a/include/linux/sunrpc/svc.h > +++ b/include/linux/sunrpc/svc.h > @@ -22,7 +22,7 @@ > --- a/net/sunrpc/svc.c > +++ b/net/sunrpc/svc.c > @@ -18,6 +18,7 @@ > @@ -291,15 +292,14 @@ svc_pool_map_put(void) > @@ -307,26 +307,17 @@ svc_pool_map_set_cpumask(unsigned int pidx, cpumask_t *oldmask) > @@ -579,47 +570,6 @@ out_enomem: > @@ -688,7 +638,9 @@ found_pool: > @@ -704,18 +656,32 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) > All good. -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/4] knfsd: remove special handling for SIGHUP 2008-06-04 15:03 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Jeff Layton 2008-06-04 15:03 ` [PATCH 3/4] knfsd: convert knfsd to kthread API Jeff Layton @ 2008-06-05 0:59 ` Greg Banks 1 sibling, 0 replies; 18+ messages in thread From: Greg Banks @ 2008-06-05 0:59 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs, nfsv4, neilb Jeff Layton wrote: > The special handling for SIGHUP in knfsd is a holdover from much > earlier versions of Linux where reloading the export table was > more expensive. That facility is not really needed anymore and > to my knowledge, is seldom-used. > > Signed-off-by: Jeff Layton <jlayton@redhat.com> > Acked-by: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org> > @@ -234,10 +226,9 @@ int nfsd_create_serv(void) > } > > atomic_set(&nfsd_busy, 0); > - nfsd_serv = svc_create_pooled(&nfsd_program, > - nfsd_max_blksize, > - nfsd_last_thread, > - nfsd, SIG_NOCLEAN, THIS_MODULE); > + nfsd_serv = svc_create_pooled(&nfsd_program, nfsd_max_blksize, > + nfsd_last_thread, nfsd, SIGINT, > + THIS_MODULE); > Now that there's no special interpretation of signals, you could probably also remove the `sig' argument to svc_create_pooled() and the svc_serv->sv_kill_signal field. -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-04 15:03 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking Jeff Layton 2008-06-04 15:03 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Jeff Layton @ 2008-06-04 21:02 ` J. Bruce Fields 2008-06-04 21:27 ` Jeff Layton 2008-06-05 0:47 ` Greg Banks 2 siblings, 1 reply; 18+ messages in thread From: J. Bruce Fields @ 2008-06-04 21:02 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs, nfsv4, gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote: > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > index 5ac00c4..d601a77 100644 > --- a/fs/nfsd/nfsctl.c > +++ b/fs/nfsd/nfsctl.c ... > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size) > return len; > } > > -static ssize_t write_ports(struct file *file, char *buf, size_t size) > +static ssize_t __write_ports(struct file *file, char *buf, size_t size) > { > if (size == 0) { > int len = 0; > - lock_kernel(); > + > if (nfsd_serv) > len = svc_xprt_names(nfsd_serv, buf, 0); > - unlock_kernel(); svc_xprt_names() has to be prepared to accept NULL as a first parameter (since we've got nothing here any longer to guarantee that nfsd_serv won't change after we've checked it). And, indeed, it does check for that (with its local copy, which won't change. So that's OK. But then could we just ditch this redundant check here? It's confusing. Oops, but: what happens if something like this races with svc_destroy, so svc_xprt_names() is passed a pointer to freed memory? --b. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-04 21:02 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking J. Bruce Fields @ 2008-06-04 21:27 ` Jeff Layton [not found] ` <20080604172752.31686797-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org> 0 siblings, 1 reply; 18+ messages in thread From: Jeff Layton @ 2008-06-04 21:27 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, nfsv4, gnb On Wed, 4 Jun 2008 17:02:35 -0400 "J. Bruce Fields" <bfields@fieldses.org> wrote: > On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote: > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > > index 5ac00c4..d601a77 100644 > > --- a/fs/nfsd/nfsctl.c > > +++ b/fs/nfsd/nfsctl.c > ... > > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size) > > return len; > > } > > > > -static ssize_t write_ports(struct file *file, char *buf, size_t size) > > +static ssize_t __write_ports(struct file *file, char *buf, size_t size) > > { > > if (size == 0) { > > int len = 0; > > - lock_kernel(); > > + > > if (nfsd_serv) > > len = svc_xprt_names(nfsd_serv, buf, 0); > > - unlock_kernel(); > > svc_xprt_names() has to be prepared to accept NULL as a first parameter > (since we've got nothing here any longer to guarantee that nfsd_serv > won't change after we've checked it). And, indeed, it does check for > that (with its local copy, which won't change. So that's OK. But then > could we just ditch this redundant check here? It's confusing. > > Oops, but: what happens if something like this races with svc_destroy, > so svc_xprt_names() is passed a pointer to freed memory? > We do have a guarantee that nfsd_serv won't change after it's checked here. The new nfsd_mutex protects it. write_ports has been renamed to __write_ports, and write_ports has been turned into a wrapper that runs the entire original function under the nfsd_mutex. We also have nfsd hold the nfsd_mutex when svc_exit_thread is called, so svc_destroy should also be called while holding it. That should serialize access to the nfsd_serv. I think you're correct that we can get rid of the redundant null pointer check in __write_ports here though. Cheers, -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <20080604172752.31686797-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>]
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. [not found] ` <20080604172752.31686797-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org> @ 2008-06-04 21:58 ` J. Bruce Fields 2008-06-04 22:41 ` J. Bruce Fields 0 siblings, 1 reply; 18+ messages in thread From: J. Bruce Fields @ 2008-06-04 21:58 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs, nfsv4, gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf On Wed, Jun 04, 2008 at 05:27:52PM -0400, Jeff Layton wrote: > On Wed, 4 Jun 2008 17:02:35 -0400 > "J. Bruce Fields" <bfields@fieldses.org> wrote: > > > On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote: > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > > > index 5ac00c4..d601a77 100644 > > > --- a/fs/nfsd/nfsctl.c > > > +++ b/fs/nfsd/nfsctl.c > > ... > > > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size) > > > return len; > > > } > > > > > > -static ssize_t write_ports(struct file *file, char *buf, size_t size) > > > +static ssize_t __write_ports(struct file *file, char *buf, size_t size) > > > { > > > if (size == 0) { > > > int len = 0; > > > - lock_kernel(); > > > + > > > if (nfsd_serv) > > > len = svc_xprt_names(nfsd_serv, buf, 0); > > > - unlock_kernel(); > > > > svc_xprt_names() has to be prepared to accept NULL as a first parameter > > (since we've got nothing here any longer to guarantee that nfsd_serv > > won't change after we've checked it). And, indeed, it does check for > > that (with its local copy, which won't change. So that's OK. But then > > could we just ditch this redundant check here? It's confusing. > > > > Oops, but: what happens if something like this races with svc_destroy, > > so svc_xprt_names() is passed a pointer to freed memory? > > > > We do have a guarantee that nfsd_serv won't change after it's checked > here. The new nfsd_mutex protects it. write_ports has been renamed to > __write_ports, and write_ports has been turned into a wrapper that runs > the entire original function under the nfsd_mutex. We also have nfsd > hold the nfsd_mutex when svc_exit_thread is called, so svc_destroy > should also be called while holding it. That should serialize access > to the nfsd_serv. Of course, you're right; thanks for setting me straight! > > I think you're correct that we can get rid of the redundant null > pointer check in __write_ports here though. OK.--b. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-04 21:58 ` J. Bruce Fields @ 2008-06-04 22:41 ` J. Bruce Fields 2008-06-05 0:07 ` Jeff Layton 0 siblings, 1 reply; 18+ messages in thread From: J. Bruce Fields @ 2008-06-04 22:41 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs, nfsv4, gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf On Wed, Jun 04, 2008 at 05:58:15PM -0400, bfields wrote: > On Wed, Jun 04, 2008 at 05:27:52PM -0400, Jeff Layton wrote: > > On Wed, 4 Jun 2008 17:02:35 -0400 > > "J. Bruce Fields" <bfields@fieldses.org> wrote: > > > > > On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote: > > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > > > > index 5ac00c4..d601a77 100644 > > > > --- a/fs/nfsd/nfsctl.c > > > > +++ b/fs/nfsd/nfsctl.c > > > ... > > > > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size) > > > > return len; > > > > } > > > > > > > > -static ssize_t write_ports(struct file *file, char *buf, size_t size) > > > > +static ssize_t __write_ports(struct file *file, char *buf, size_t size) > > > > { > > > > if (size == 0) { > > > > int len = 0; > > > > - lock_kernel(); > > > > + > > > > if (nfsd_serv) > > > > len = svc_xprt_names(nfsd_serv, buf, 0); > > > > - unlock_kernel(); > > > > > > svc_xprt_names() has to be prepared to accept NULL as a first parameter > > > (since we've got nothing here any longer to guarantee that nfsd_serv > > > won't change after we've checked it). And, indeed, it does check for > > > that (with its local copy, which won't change. So that's OK. But then > > > could we just ditch this redundant check here? It's confusing. > > > > > > Oops, but: what happens if something like this races with svc_destroy, > > > so svc_xprt_names() is passed a pointer to freed memory? > > > > > > > We do have a guarantee that nfsd_serv won't change after it's checked > > here. The new nfsd_mutex protects it. write_ports has been renamed to > > __write_ports, and write_ports has been turned into a wrapper that runs > > the entire original function under the nfsd_mutex. We also have nfsd > > hold the nfsd_mutex when svc_exit_thread is called, so svc_destroy > > should also be called while holding it. That should serialize access > > to the nfsd_serv. > > Of course, you're right; thanks for setting me straight! One more random point of confusion: is write_versions racy? It assigns to nfsd_versions, which is used in svc_process() to decide whether a version is supported or not, without doing the adjustment of rq_argp and rq_resp which a comment in write_versions() says is necessary. And there's no locking around the nfsd_serv check there. So in theory could a write_versions() at the wrong time result in an nfsd that accepted nfs versions that it shouldn't (and hence could overflow some buffer)? That'd be a preexisting problem, nothing to do with your work--I was just grepping for uses of nfsd_serv.... --b. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-04 22:41 ` J. Bruce Fields @ 2008-06-05 0:07 ` Jeff Layton 0 siblings, 0 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-05 0:07 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, nfsv4, gnb On Wed, 4 Jun 2008 18:41:20 -0400 "J. Bruce Fields" <bfields@fieldses.org> wrote: > On Wed, Jun 04, 2008 at 05:58:15PM -0400, bfields wrote: > > On Wed, Jun 04, 2008 at 05:27:52PM -0400, Jeff Layton wrote: > > > On Wed, 4 Jun 2008 17:02:35 -0400 > > > "J. Bruce Fields" <bfields@fieldses.org> wrote: > > > > > > > On Wed, Jun 04, 2008 at 11:03:13AM -0400, Jeff Layton wrote: > > > > > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > > > > > index 5ac00c4..d601a77 100644 > > > > > --- a/fs/nfsd/nfsctl.c > > > > > +++ b/fs/nfsd/nfsctl.c > > > > ... > > > > > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size) > > > > > return len; > > > > > } > > > > > > > > > > -static ssize_t write_ports(struct file *file, char *buf, size_t size) > > > > > +static ssize_t __write_ports(struct file *file, char *buf, size_t size) > > > > > { > > > > > if (size == 0) { > > > > > int len = 0; > > > > > - lock_kernel(); > > > > > + > > > > > if (nfsd_serv) > > > > > len = svc_xprt_names(nfsd_serv, buf, 0); > > > > > - unlock_kernel(); > > > > > > > > svc_xprt_names() has to be prepared to accept NULL as a first parameter > > > > (since we've got nothing here any longer to guarantee that nfsd_serv > > > > won't change after we've checked it). And, indeed, it does check for > > > > that (with its local copy, which won't change. So that's OK. But then > > > > could we just ditch this redundant check here? It's confusing. > > > > > > > > Oops, but: what happens if something like this races with svc_destroy, > > > > so svc_xprt_names() is passed a pointer to freed memory? > > > > > > > > > > We do have a guarantee that nfsd_serv won't change after it's checked > > > here. The new nfsd_mutex protects it. write_ports has been renamed to > > > __write_ports, and write_ports has been turned into a wrapper that runs > > > the entire original function under the nfsd_mutex. We also have nfsd > > > hold the nfsd_mutex when svc_exit_thread is called, so svc_destroy > > > should also be called while holding it. That should serialize access > > > to the nfsd_serv. > > > > Of course, you're right; thanks for setting me straight! > > One more random point of confusion: is write_versions racy? It assigns > to nfsd_versions, which is used in svc_process() to decide whether a > version is supported or not, without doing the adjustment of rq_argp and > rq_resp which a comment in write_versions() says is necessary. And > there's no locking around the nfsd_serv check there. So in theory could > a write_versions() at the wrong time result in an nfsd that accepted nfs > versions that it shouldn't (and hence could overflow some buffer)? > Hmm. You may be right, though I'd think the race is pretty unlikely in normal usage. I guess the comment you're referring to is this one: if (nfsd_serv) /* Cannot change versions without updating * nfsd_serv->sv_xdrsize, and reallocing * rq_argp and rq_resp */ return -EBUSY; ...so the race would have to be: nfsd is down write versions is called and gets past nfsd_serv NULL ptr check nfsd accepts a call write versions disables the NFS version that was in the call A pretty unlikely race, I think, but might be possible. Holding the nfsd_mutex over the life of write_versions is probably the right thing to do here. I'll plan a respin to add that (and I'll check that it doesn't cause any problems). > That'd be a preexisting problem, nothing to do with your work--I was > just grepping for uses of nfsd_serv.... > This is actually Neil's work...I only added the signed-off-by since I added and cleaned up some comments. ;-) -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-04 15:03 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking Jeff Layton 2008-06-04 15:03 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Jeff Layton 2008-06-04 21:02 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking J. Bruce Fields @ 2008-06-05 0:47 ` Greg Banks 2008-06-05 20:03 ` J. Bruce Fields 2 siblings, 1 reply; 18+ messages in thread From: Greg Banks @ 2008-06-05 0:47 UTC (permalink / raw) To: Jeff Layton; +Cc: linux-nfs, nfsv4 Jeff Layton wrote: > From: Neil Brown <neilb@suse.de> > > > Sorry; I know this was posted before but now that I'm back home I've had a chance to review it in slightly more depth so I have some more comments. > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c > index 5ac00c4..d601a77 100644 > --- a/fs/nfsd/nfsctl.c > +++ b/fs/nfsd/nfsctl.c > @@ -441,6 +441,8 @@ static ssize_t write_threads(struct file *file, char *buf, size_t size) > @@ -450,22 +452,26 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size) > @@ -496,10 +502,12 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size) > These are fine. > @@ -566,14 +574,13 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size) > I notice there's no change to write_versions(), even though that function checks nfsd_serv for NULL and implicitly expects the svc_serv to be created while it's running. > @@ -603,9 +610,7 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size) > @@ -614,10 +619,8 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size > @@ -655,7 +658,6 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size) > @@ -666,13 +668,22 @@ static ssize_t write_ports(struct file *file, char *buf, size_t size) > @@ -691,13 +702,13 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size) > All good. I notice there's no change to write_leasetime(). That calls nfs4_reset_lease(), which seems to want to only be called when nfsd is not started (according to my reading of the comment preceding the function), and which uses the BKL to protect the variable user_lease_time. Perhaps that path should be changed to use the new nfsd_mutex also? Similarly with write_recoverydir(). That calls nfs4_reset_recoverydir() which uses client_mutex to protect user_recovery_dirname[], but that variable is only used during nfsd startup. Perhaps that path should use the new nfsd_mutex also? > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > index 941041f..040b3c2 100644 > --- a/fs/nfsd/nfssvc.c > +++ b/fs/nfsd/nfssvc.c > @@ -53,11 +53,27 @@ > @@ -190,13 +206,14 @@ void nfsd_reset_versions(void) > @@ -223,7 +240,7 @@ int nfsd_create_serv(void) > @@ -282,6 +299,8 @@ int nfsd_set_nrthreads(int n, int *nthreads) > @@ -316,7 +335,6 @@ int nfsd_set_nrthreads(int n, int *nthreads) > @@ -325,7 +343,6 @@ int nfsd_set_nrthreads(int n, int *nthreads) > @@ -334,8 +351,8 @@ int > @@ -363,7 +380,7 @@ nfsd_svc(unsigned short port, int nrservs) > @@ -399,7 +416,7 @@ nfsd(struct svc_rqst *rqstp) > @@ -417,11 +434,13 @@ nfsd(struct svc_rqst *rqstp) > @@ -477,7 +496,7 @@ nfsd(struct svc_rqst *rqstp) > @@ -486,7 +505,7 @@ out: > All good. > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > index 01c7e31..7bffaff 100644 > --- a/net/sunrpc/svc.c > +++ b/net/sunrpc/svc.c > @@ -461,7 +461,8 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, > @@ -578,9 +579,10 @@ out_enomem: > @@ -674,7 +676,7 @@ found_pool: > @@ -722,7 +724,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) > All good. -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-05 0:47 ` Greg Banks @ 2008-06-05 20:03 ` J. Bruce Fields 2008-06-05 20:15 ` Jeff Layton 2008-06-05 23:35 ` Greg Banks 0 siblings, 2 replies; 18+ messages in thread From: J. Bruce Fields @ 2008-06-05 20:03 UTC (permalink / raw) To: Greg Banks; +Cc: linux-nfs, nfsv4, Jeff Layton On Thu, Jun 05, 2008 at 10:47:37AM +1000, Greg Banks wrote: > I notice there's no change to write_leasetime(). That calls > nfs4_reset_lease(), which seems to want to only be called when nfsd is > not started (according to my reading of the comment preceding the > function), and which uses the BKL to protect the variable > user_lease_time. Perhaps that path should be changed to use the new > nfsd_mutex also? write_leasetime() just results in setting the user_lease_time, which is copied (once, on server startup) to lease_time, which is what is actually used by the running server. So I don't think there's a race here (and the existing BKL use in write_leasetime() isn't really needed). --b. > > Similarly with write_recoverydir(). That calls nfs4_reset_recoverydir() > which uses client_mutex to protect user_recovery_dirname[], but that > variable is only used during nfsd startup. Perhaps that path should use > the new nfsd_mutex also? > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > > index 941041f..040b3c2 100644 > > --- a/fs/nfsd/nfssvc.c > > +++ b/fs/nfsd/nfssvc.c > > @@ -53,11 +53,27 @@ > > @@ -190,13 +206,14 @@ void nfsd_reset_versions(void) > > @@ -223,7 +240,7 @@ int nfsd_create_serv(void) > > @@ -282,6 +299,8 @@ int nfsd_set_nrthreads(int n, int *nthreads) > > @@ -316,7 +335,6 @@ int nfsd_set_nrthreads(int n, int *nthreads) > > @@ -325,7 +343,6 @@ int nfsd_set_nrthreads(int n, int *nthreads) > > @@ -334,8 +351,8 @@ int > > @@ -363,7 +380,7 @@ nfsd_svc(unsigned short port, int nrservs) > > @@ -399,7 +416,7 @@ nfsd(struct svc_rqst *rqstp) > > @@ -417,11 +434,13 @@ nfsd(struct svc_rqst *rqstp) > > @@ -477,7 +496,7 @@ nfsd(struct svc_rqst *rqstp) > > @@ -486,7 +505,7 @@ out: > > > All good. > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > > index 01c7e31..7bffaff 100644 > > --- a/net/sunrpc/svc.c > > +++ b/net/sunrpc/svc.c > > @@ -461,7 +461,8 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, > > @@ -578,9 +579,10 @@ out_enomem: > > @@ -674,7 +676,7 @@ found_pool: > > @@ -722,7 +724,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) > > > > All good. > > -- > Greg Banks, P.Engineer, SGI Australian Software Group. > The cake is *not* a lie. > I don't speak for SGI. > > _______________________________________________ > NFSv4 mailing list > NFSv4@linux-nfs.org > http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-05 20:03 ` J. Bruce Fields @ 2008-06-05 20:15 ` Jeff Layton 2008-06-05 23:35 ` Greg Banks 1 sibling, 0 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-05 20:15 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, nfsv4, Greg Banks On Thu, 5 Jun 2008 16:03:58 -0400 "J. Bruce Fields" <bfields@fieldses.org> wrote: > On Thu, Jun 05, 2008 at 10:47:37AM +1000, Greg Banks wrote: > > I notice there's no change to write_leasetime(). That calls > > nfs4_reset_lease(), which seems to want to only be called when nfsd is > > not started (according to my reading of the comment preceding the > > function), and which uses the BKL to protect the variable > > user_lease_time. Perhaps that path should be changed to use the new > > nfsd_mutex also? > > write_leasetime() just results in setting the user_lease_time, which is > copied (once, on server startup) to lease_time, which is what is > actually used by the running server. So I don't think there's a race > here (and the existing BKL use in write_leasetime() isn't really > needed). > > --b. > I think write_recoverydir might be a similar situation. It just sets user_recovery_dirname, and that is also only read once when nfsd is starting. We can probably eliminate the client_mutex locking around that too. Presuming that Bruce and I are correct, then I think I only need to make sure that we add some nfsd_mutex locking to write_versions. -- Jeff > > > > Similarly with write_recoverydir(). That calls nfs4_reset_recoverydir() > > which uses client_mutex to protect user_recovery_dirname[], but that > > variable is only used during nfsd startup. Perhaps that path should use > > the new nfsd_mutex also? > > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c > > > index 941041f..040b3c2 100644 > > > --- a/fs/nfsd/nfssvc.c > > > +++ b/fs/nfsd/nfssvc.c > > > @@ -53,11 +53,27 @@ > > > @@ -190,13 +206,14 @@ void nfsd_reset_versions(void) > > > @@ -223,7 +240,7 @@ int nfsd_create_serv(void) > > > @@ -282,6 +299,8 @@ int nfsd_set_nrthreads(int n, int *nthreads) > > > @@ -316,7 +335,6 @@ int nfsd_set_nrthreads(int n, int *nthreads) > > > @@ -325,7 +343,6 @@ int nfsd_set_nrthreads(int n, int *nthreads) > > > @@ -334,8 +351,8 @@ int > > > @@ -363,7 +380,7 @@ nfsd_svc(unsigned short port, int nrservs) > > > @@ -399,7 +416,7 @@ nfsd(struct svc_rqst *rqstp) > > > @@ -417,11 +434,13 @@ nfsd(struct svc_rqst *rqstp) > > > @@ -477,7 +496,7 @@ nfsd(struct svc_rqst *rqstp) > > > @@ -486,7 +505,7 @@ out: > > > > > All good. > > > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c > > > index 01c7e31..7bffaff 100644 > > > --- a/net/sunrpc/svc.c > > > +++ b/net/sunrpc/svc.c > > > @@ -461,7 +461,8 @@ svc_create_pooled(struct svc_program *prog, unsigned int bufsize, > > > @@ -578,9 +579,10 @@ out_enomem: > > > @@ -674,7 +676,7 @@ found_pool: > > > @@ -722,7 +724,8 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs) > > > > > > > All good. > > > > -- > > Greg Banks, P.Engineer, SGI Australian Software Group. > > The cake is *not* a lie. > > I don't speak for SGI. > > > > _______________________________________________ > > NFSv4 mailing list > > NFSv4@linux-nfs.org > > http://linux-nfs.org/cgi-bin/mailman/listinfo/nfsv4 > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. 2008-06-05 20:03 ` J. Bruce Fields 2008-06-05 20:15 ` Jeff Layton @ 2008-06-05 23:35 ` Greg Banks [not found] ` <48487857.4030706-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org> 1 sibling, 1 reply; 18+ messages in thread From: Greg Banks @ 2008-06-05 23:35 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, nfsv4, Jeff Layton J. Bruce Fields wrote: > On Thu, Jun 05, 2008 at 10:47:37AM +1000, Greg Banks wrote: > >> I notice there's no change to write_leasetime(). That calls >> nfs4_reset_lease(), which seems to want to only be called when nfsd is >> not started (according to my reading of the comment preceding the >> function), and which uses the BKL to protect the variable >> user_lease_time. Perhaps that path should be changed to use the new >> nfsd_mutex also? >> > > write_leasetime() just results in setting the user_lease_time, which is > copied (once, on server startup) to lease_time, which is what is > actually used by the running server. Yes, understood. > So I don't think there's a race > here (and the existing BKL use in write_leasetime() isn't really > needed). > The race is pretty harmless. The only read of user_lease_time happens during the critical section currently guarded by the BKL in nfsd_svc(); the only write of the variable is not guarded by that lock. If you call write_leasetime() many times with different values during nfsd startup, the actual value used is not predictable and you can't tell from userspace whether your write() succeeded in changing the behaviour of the server or not. Also, you can do write_leasetime() after nfsd startup without useful effect; other parameters which can't be changed from /proc after the service is running will fail the write() with EBUSY. -- Greg Banks, P.Engineer, SGI Australian Software Group. The cake is *not* a lie. I don't speak for SGI. ^ permalink raw reply [flat|nested] 18+ messages in thread
[parent not found: <48487857.4030706-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>]
* Re: [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking. [not found] ` <48487857.4030706-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org> @ 2008-06-06 15:05 ` Jeff Layton 0 siblings, 0 replies; 18+ messages in thread From: Jeff Layton @ 2008-06-06 15:05 UTC (permalink / raw) To: Greg Banks; +Cc: J. Bruce Fields, linux-nfs, nfsv4 On Fri, 06 Jun 2008 09:35:51 +1000 Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org> wrote: > J. Bruce Fields wrote: > > On Thu, Jun 05, 2008 at 10:47:37AM +1000, Greg Banks wrote: > > > >> I notice there's no change to write_leasetime(). That calls > >> nfs4_reset_lease(), which seems to want to only be called when nfsd is > >> not started (according to my reading of the comment preceding the > >> function), and which uses the BKL to protect the variable > >> user_lease_time. Perhaps that path should be changed to use the new > >> nfsd_mutex also? > >> > > > > write_leasetime() just results in setting the user_lease_time, which is > > copied (once, on server startup) to lease_time, which is what is > > actually used by the running server. > Yes, understood. > > So I don't think there's a race > > here (and the existing BKL use in write_leasetime() isn't really > > needed). > > > The race is pretty harmless. The only read of user_lease_time happens > during the critical section currently guarded by the BKL in nfsd_svc(); > the only write of the variable is not guarded by that lock. If you call > write_leasetime() many times with different values during nfsd startup, > the actual value used is not predictable and you can't tell from > userspace whether your write() succeeded in changing the behaviour of > the server or not. Also, you can do write_leasetime() after nfsd > startup without useful effect; other parameters which can't be changed > from /proc after the service is running will fail the write() with EBUSY. > The race there is pretty harmless, but fixing it sounds like it'll be low impact. I don't expect that the nfsd_mutex will be highly contended, so adding some locking around these interfaces really shouldn't harm anything. Also, as you point out, many of these interfaces should return -EBUSY if nfsd is up (to be consistent with how write_versions behaves), and we'll need proper locking to do that. I've got a new patchset that should hopefully address these problems and some of your other comments that I'll post in a little while... Cheers, -- Jeff Layton <jlayton@redhat.com> ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2008-06-06 15:05 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-06-04 15:03 [PATCH 0/4] Convert knfsd to kthread API and fix startup/shutdown races (try #2) Jeff Layton
2008-06-04 15:03 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking Jeff Layton
2008-06-04 15:03 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Jeff Layton
2008-06-04 15:03 ` [PATCH 3/4] knfsd: convert knfsd to kthread API Jeff Layton
2008-06-04 15:03 ` [PATCH 4/4] sunrpc: remove unneeded field from svc_serv struct Jeff Layton
2008-06-05 1:30 ` Greg Banks
2008-06-05 1:28 ` [PATCH 3/4] knfsd: convert knfsd to kthread API Greg Banks
2008-06-05 0:59 ` [PATCH 2/4] knfsd: remove special handling for SIGHUP Greg Banks
2008-06-04 21:02 ` [PATCH 1/4] knfsd: Replace lock_kernel with a mutex for nfsd thread startup/shutdown locking J. Bruce Fields
2008-06-04 21:27 ` Jeff Layton
[not found] ` <20080604172752.31686797-RtJpwOs3+0O+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2008-06-04 21:58 ` J. Bruce Fields
2008-06-04 22:41 ` J. Bruce Fields
2008-06-05 0:07 ` Jeff Layton
2008-06-05 0:47 ` Greg Banks
2008-06-05 20:03 ` J. Bruce Fields
2008-06-05 20:15 ` Jeff Layton
2008-06-05 23:35 ` Greg Banks
[not found] ` <48487857.4030706-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
2008-06-06 15:05 ` Jeff Layton
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.