* VM issue causing high CPU loads
@ 2009-08-24 14:23 Yohan
2009-08-24 23:21 ` Andrew Morton
0 siblings, 1 reply; 18+ messages in thread
From: Yohan @ 2009-08-24 14:23 UTC (permalink / raw)
To: linux-kernel
Hi,
Does anyone have an idea about this:
http://bugzilla.kernel.org/show_bug.cgi?id=14024
Thanks
Yohan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-08-24 14:23 VM issue causing high CPU loads Yohan
@ 2009-08-24 23:21 ` Andrew Morton
2009-08-26 11:08 ` Mel Gorman
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Andrew Morton @ 2009-08-24 23:21 UTC (permalink / raw)
To: Yohan; +Cc: linux-kernel, linux-mm

On Mon, 24 Aug 2009 16:23:22 +0200 Yohan <kernel@yohan.staff.proxad.net> wrote:

> Hi,
>
> Does anyone have an idea about this:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=14024
>

Please generate a kernel profile to work out where all the CPU time is
being spent. Documentation/basic_profiling.txt is a starting point.

Thanks.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-08-24 23:21 ` Andrew Morton
@ 2009-08-26 11:08 ` Mel Gorman
2009-08-26 11:55 ` Yohan
2009-08-26 11:53 ` Yohan
2009-08-27 8:39 ` Yohan
2 siblings, 1 reply; 18+ messages in thread
From: Mel Gorman @ 2009-08-26 11:08 UTC (permalink / raw)
To: Andrew Morton; +Cc: Yohan, linux-kernel, linux-mm

On Mon, Aug 24, 2009 at 04:21:55PM -0700, Andrew Morton wrote:
> On Mon, 24 Aug 2009 16:23:22 +0200
> Yohan <kernel@yohan.staff.proxad.net> wrote:
>
> > Hi,
> >
> > Does anyone have an idea about this:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=14024
> >
>
> Please generate a kernel profile to work out where all the CPU time is
> being spent. Documentation/basic_profiling.txt is a starting point.
>

In the absence of a profile, here is a total stab in the dark. Is this a
NUMA machine? If so, is /proc/sys/vm/zone_reclaim_mode set to 1 and does
setting it to 0 help? This is based on a relatively recent bug where
malloc() could stall for long times with large amounts of CPU usage due
to useless scanning in page reclaim.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-08-26 11:08 ` Mel Gorman
@ 2009-08-26 11:55 ` Yohan
0 siblings, 0 replies; 18+ messages in thread
From: Yohan @ 2009-08-26 11:55 UTC (permalink / raw)
To: Mel Gorman; +Cc: Andrew Morton, linux-kernel, linux-mm

Mel Gorman wrote:
> On Mon, Aug 24, 2009 at 04:21:55PM -0700, Andrew Morton wrote:
>
>> On Mon, 24 Aug 2009 16:23:22 +0200
>> Yohan <kernel@yohan.staff.proxad.net> wrote:
>>
>>> Hi,
>>>
>>> Does anyone have an idea about this:
>>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=14024
>>>
>> Please generate a kernel profile to work out where all the CPU time is
>> being spent. Documentation/basic_profiling.txt is a starting point.
>>
> In the absence of a profile, here is a total stab in the dark. Is this a
> NUMA machine?

This is an Intel(R) Xeon(R) CPU E5520 on a Dell R610.

> If so, is /proc/sys/vm/zone_reclaim_mode set to 1 and does
> setting it to 0 help?
>

The value is already 0...

Thanks

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-08-24 23:21 ` Andrew Morton
2009-08-26 11:08 ` Mel Gorman
@ 2009-08-26 11:53 ` Yohan
2009-08-27 8:39 ` Yohan
2 siblings, 0 replies; 18+ messages in thread
From: Yohan @ 2009-08-26 11:53 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm

Andrew Morton wrote:
> On Mon, 24 Aug 2009 16:23:22 +0200
> Yohan <kernel@yohan.staff.proxad.net> wrote:
>
>> Hi,
>>
>> Does anyone have an idea about this:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=14024
>>
> Please generate a kernel profile to work out where all the CPU time is
> being spent. Documentation/basic_profiling.txt is a starting point.

I did, and posted the profiles on the bug tracker.
I did it with a 2.6.31-rc7-git2 kernel
(it needs at least 2 weekdays after a reboot/drop_caches to really show the
bug).

Thanks

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-08-24 23:21 ` Andrew Morton
2009-08-26 11:08 ` Mel Gorman
2009-08-26 11:53 ` Yohan
@ 2009-08-27 8:39 ` Yohan
2009-08-31 20:39 ` Yohan
2 siblings, 1 reply; 18+ messages in thread
From: Yohan @ 2009-08-27 8:39 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel, linux-mm

Andrew Morton wrote:
> On Mon, 24 Aug 2009 16:23:22 +0200
> Yohan <kernel@yohan.staff.proxad.net> wrote:
>
>> Hi,
>>
>> Does anyone have an idea about this:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=14024
>>
> Please generate a kernel profile to work out where all the CPU time is
> being spent. Documentation/basic_profiling.txt is a starting point.
>

I posted some new reports, it seems that the problem is in
rpcauth_lookup_credcache ...

for information, this is an imap mail server that mounts ~10 netapp
over ~300 mountpoints..

Thanks
Yohan

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-08-27 8:39 ` Yohan
@ 2009-08-31 20:39 ` Yohan
2009-09-03 0:06 ` Andrew Morton
0 siblings, 1 reply; 18+ messages in thread
From: Yohan @ 2009-08-31 20:39 UTC (permalink / raw)
To: Yohan; +Cc: Andrew Morton, linux-kernel

Yohan wrote:
> Andrew Morton wrote:
>> On Mon, 24 Aug 2009 16:23:22 +0200
>> Yohan <kernel@yohan.staff.proxad.net> wrote:
>>> Hi,
>>>
>>> Does anyone have an idea about this:
>>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=14024
>>>
>> Please generate a kernel profile to work out where all the CPU time is
>> being spent. Documentation/basic_profiling.txt is a starting point.
>>
> I posted some new reports, it seems that the problem is in
> rpcauth_lookup_credcache ...
>
> for information, this is an imap mail server that mounts ~10 netapp
> over ~300 mountpoints..

I saw that: http://patchwork.kernel.org/patch/24747/

I did only:

--- linux-2.6.27.21/include/linux/sunrpc/auth.h 2009-03-23 23:04:09.000000000 +0100
+++ linux-2.6.27.21/include/linux/sunrpc/auth.h 2009-05-19 16:02:35.000000000 +0200
@@ -62,8 +62,12 @@
  */
- #define RPC_CREDCACHE_HASHBITS 4
+ #define RPC_CREDCACHE_HASHBITS 12

And I have been testing it in prod since Sunday: I only have 36% of one
core used by system,
versus more than 3 cores used by system on another server that did a
drop_caches in the morning...

Yohan

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-08-31 20:39 ` Yohan
@ 2009-09-03 0:06 ` Andrew Morton
2009-09-03 13:01 ` Trond Myklebust
0 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2009-09-03 0:06 UTC (permalink / raw)
To: Yohan
Cc: ytordjman, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, Trond Myklebust, mikevs

On Mon, 31 Aug 2009 22:39:20 +0200
Yohan <ytordjman@corp.free.fr> wrote:

> Yohan wrote:
> > Andrew Morton wrote:
> >> On Mon, 24 Aug 2009 16:23:22 +0200
> >> Yohan <kernel@yohan.staff.proxad.net> wrote:
> >>> Hi,
> >>>
> >>> Does anyone have an idea about this:
> >>>
> >>> http://bugzilla.kernel.org/show_bug.cgi?id=14024
> >>>
> >> Please generate a kernel profile to work out where all the CPU time is
> >> being spent. Documentation/basic_profiling.txt is a starting point.
> >>
> > I posted some new reports, it seems that the problem is in
> > rpcauth_lookup_credcache ...

Thanks, that helps a lot.

> > for information, this is an imap mail server that mounts ~10 netapp
> > over ~300 mountpoints..
> I saw that: http://patchwork.kernel.org/patch/24747/

I wonder what happened with Miquel's patch?

> I did only:
>
> --- linux-2.6.27.21/include/linux/sunrpc/auth.h 2009-03-23 23:04:09.000000000 +0100
> +++ linux-2.6.27.21/include/linux/sunrpc/auth.h 2009-05-19 16:02:35.000000000 +0200
> @@ -62,8 +62,12 @@
>   */
> - #define RPC_CREDCACHE_HASHBITS 4
> + #define RPC_CREDCACHE_HASHBITS 12
>
>
> And I have been testing it in prod since Sunday: I only have 36% of one
> core used by system,
> versus more than 3 cores used by system on another server that did a
> drop_caches in the morning...
>

OK, but it's still pretty bad. Let's tell the NFS guys.

In http://bugzilla.kernel.org/show_bug.cgi?id=14024 we appear to have a
major meltdown caused by the linear search in
rpcauth_lookup_credcache() with Yohan's workload.

^ permalink raw reply [flat|nested] 18+ messages in thread
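[Editor's illustration] The "linear search" mentioned above is the walk along one hash chain. The following is a minimal userspace sketch of that pattern, not the kernel's rpcauth_lookup_credcache(): every name in it (cred_entry, cred_cache, bucket_of, cache_lookup_cost) is hypothetical, and the multiplicative hash is only a stand-in for the kernel's hash_long(). It counts how many entries a lookup has to examine, which grows with chain length when many uids share few buckets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one cached credential, chained per bucket. */
struct cred_entry {
	uint32_t uid;
	struct cred_entry *next;
};

/* Hypothetical cache with 2^hashbits chain heads. */
struct cred_cache {
	unsigned int hashbits;
	struct cred_entry **buckets;
};

/* Illustrative multiplicative hash keeping the top hashbits bits. */
static unsigned int bucket_of(uint32_t uid, unsigned int hashbits)
{
	return (uint32_t)(uid * 2654435761u) >> (32 - hashbits);
}

static int cache_init(struct cred_cache *c, unsigned int hashbits)
{
	assert(hashbits >= 1 && hashbits <= 24);
	c->hashbits = hashbits;
	c->buckets = calloc((size_t)1 << hashbits, sizeof(*c->buckets));
	return c->buckets ? 0 : -1;
}

/* Push a new entry on the front of its chain. */
static int cache_insert(struct cred_cache *c, uint32_t uid)
{
	struct cred_entry *e = malloc(sizeof(*e));
	unsigned int nr;

	if (!e)
		return -1;
	nr = bucket_of(uid, c->hashbits);
	e->uid = uid;
	e->next = c->buckets[nr];
	c->buckets[nr] = e;
	return 0;
}

/*
 * Walk one chain linearly. Returns the number of entries examined and
 * sets *found to 1 if the uid is present, 0 otherwise.
 */
static unsigned long cache_lookup_cost(const struct cred_cache *c,
				       uint32_t uid, int *found)
{
	const struct cred_entry *e;
	unsigned long steps = 0;

	*found = 0;
	for (e = c->buckets[bucket_of(uid, c->hashbits)]; e; e = e->next) {
		steps++;
		if (e->uid == uid) {
			*found = 1;
			break;
		}
	}
	return steps;
}

/* Release every chain and the bucket array. */
static void cache_free(struct cred_cache *c)
{
	size_t i, n = (size_t)1 << c->hashbits;

	for (i = 0; i < n; i++) {
		while (c->buckets[i]) {
			struct cred_entry *e = c->buckets[i];

			c->buckets[i] = e->next;
			free(e);
		}
	}
	free(c->buckets);
	c->buckets = NULL;
}
```

Filling one such cache with hashbits = 4 and another with hashbits = 12 from the same set of uids, then summing cache_lookup_cost() over all of them, shows the total work dropping sharply for the larger table.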
* Re: VM issue causing high CPU loads
2009-09-03 0:06 ` Andrew Morton
@ 2009-09-03 13:01 ` Trond Myklebust
2009-09-03 13:39 ` Yohan
0 siblings, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2009-09-03 13:01 UTC (permalink / raw)
To: Andrew Morton
Cc: Yohan, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

On Wed, 2009-09-02 at 17:06 -0700, Andrew Morton wrote:
> On Mon, 31 Aug 2009 22:39:20 +0200
> Yohan <ytordjman@corp.free.fr> wrote:
>
> > Yohan wrote:
> > > Andrew Morton wrote:
> > >> On Mon, 24 Aug 2009 16:23:22 +0200
> > >> Yohan <kernel@yohan.staff.proxad.net> wrote:
> > >>> Hi,
> > >>>
> > >>> Does anyone have an idea about this:
> > >>>
> > >>> http://bugzilla.kernel.org/show_bug.cgi?id=14024
> > >>>
> > >> Please generate a kernel profile to work out where all the CPU time is
> > >> being spent. Documentation/basic_profiling.txt is a starting point.
> > >>
> > > I posted some new reports, it seems that the problem is in
> > > rpcauth_lookup_credcache ...
>
> Thanks, that helps a lot.
>
> > > for information, this is an imap mail server that mounts ~10 netapp
> > > over ~300 mountpoints..
> > I saw that: http://patchwork.kernel.org/patch/24747/
>
> I wonder what happened with Miquel's patch?

At the time, I asked him to split out the various changes into several
patches. His patch did a lot of different things that would impact
workloads in different ways.
For instance, while increasing the hash table size is not likely to
cause a huge performance degradation for most people, the change that
decreases the garbage collection timeout is very likely to cause issues
(particularly with RPCSEC_GSS setups)...

> > I did only:
> >
> > --- linux-2.6.27.21/include/linux/sunrpc/auth.h 2009-03-23 23:04:09.000000000 +0100
> > +++ linux-2.6.27.21/include/linux/sunrpc/auth.h 2009-05-19 16:02:35.000000000 +0200
> > @@ -62,8 +62,12 @@
> >   */
> > - #define RPC_CREDCACHE_HASHBITS 4
> > + #define RPC_CREDCACHE_HASHBITS 12
> >
> >
> > And I have been testing it in prod since Sunday: I only have 36% of one
> > core used by system,
> > versus more than 3 cores used by system on another server that did a
> > drop_caches in the morning...
> >
>
> OK, but it's still pretty bad. Let's tell the NFS guys.
>
> In http://bugzilla.kernel.org/show_bug.cgi?id=14024 we appear to have a
> major meltdown caused by the linear search in
> rpcauth_lookup_credcache() with Yohan's workload.
>

OK. Could we please have some more details about the actual workload
involved here?

As far as I can see, there is no RPCSEC_GSS involved, so credentials
should never expire. They will be reused as long as processes aren't
switching between thousands and thousands of different combinations of
uid, gid and groups.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-09-03 13:01 ` Trond Myklebust
@ 2009-09-03 13:39 ` Yohan
2009-09-03 14:02 ` Trond Myklebust
0 siblings, 1 reply; 18+ messages in thread
From: Yohan @ 2009-09-03 13:39 UTC (permalink / raw)
To: Trond Myklebust
Cc: Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

>>> I did only:
>>>
>>> --- linux-2.6.27.21/include/linux/sunrpc/auth.h 2009-03-23 23:04:09.000000000 +0100
>>> +++ linux-2.6.27.21/include/linux/sunrpc/auth.h 2009-05-19 16:02:35.000000000 +0200
>>> @@ -62,8 +62,12 @@
>>>   */
>>> - #define RPC_CREDCACHE_HASHBITS 4
>>> + #define RPC_CREDCACHE_HASHBITS 12
>>>
>>>
>>> And I have been testing it in prod since Sunday: I only have 36% of one
>>> core used by system,
>>> versus more than 3 cores used by system on another server that did a
>>> drop_caches in the morning...
>>>
>> OK, but it's still pretty bad. Let's tell the NFS guys.
>>
>> In http://bugzilla.kernel.org/show_bug.cgi?id=14024 we appear to have a
>> major meltdown caused by the linear search in
>> rpcauth_lookup_credcache() with Yohan's workload.
>>
> OK. Could we please have some more details about the actual workload
> involved here?
>

I added a new server CPU graph and a 60s readprofile to the bugzilla.

> As far as I can see, there is no RPCSEC_GSS involved, so credentials
> should never expire. They will be reused as long as processes aren't
> switching between thousands and thousands of different combinations of
> uid, gid and groups.

My servers are IMAP servers.
Each user (~15 million) has a specific uid, over ~10 NFS NetApp
storage servers.

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-09-03 13:39 ` Yohan
@ 2009-09-03 14:02 ` Trond Myklebust
2009-09-03 14:08 ` Yohan
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Trond Myklebust @ 2009-09-03 14:02 UTC (permalink / raw)
To: Yohan
Cc: Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

On Thu, 2009-09-03 at 15:39 +0200, Yohan wrote:
> > As far as I can see, there is no RPCSEC_GSS involved, so credentials
> > should never expire. They will be reused as long as processes aren't
> > switching between thousands and thousands of different combinations of
> > uid, gid and groups.
> My servers are IMAP servers.
> Each user (~15 million) has a specific uid, over ~10 NFS NetApp
> storage servers.

OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
I can see that might be a performance issue...

So afaics, you did try adjusting the hashtable size. How much larger
does it have to be before you start to get acceptable performance? If it
solves your problem we could make hash table sizes adjustable via a
module parameter, for instance.

Cheers
Trond

^ permalink raw reply [flat|nested] 18+ messages in thread
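[Editor's illustration] The "~10^6 entries each" estimate above is simply the number of cached uids divided by the number of buckets. The sketch below works that arithmetic for the two RPC_CREDCACHE_HASHBITS values discussed in the thread. It assumes, as an upper bound, that all ~15 million uids are cached at once and that the hash spreads them uniformly; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A table indexed by hashbits bits has 2^hashbits buckets. */
static uint64_t bucket_count(unsigned int hashbits)
{
	assert(hashbits < 64);
	return (uint64_t)1 << hashbits;
}

/* Average chain length (rounded down) under a uniform hash. */
static uint64_t avg_chain_len(uint64_t entries, unsigned int hashbits)
{
	return entries / bucket_count(hashbits);
}
```

With 15,000,000 entries, hashbits = 4 gives 16 buckets and an average chain of 937,500 entries; hashbits = 12 gives 4096 buckets and an average chain of about 3,662 entries, a 256-fold reduction in the list each lookup may have to walk.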
* Re: VM issue causing high CPU loads
2009-09-03 14:02 ` Trond Myklebust
@ 2009-09-03 14:08 ` Yohan
2009-09-03 14:35 ` sunrpc: dynamically allocate credcache hashtables [was: Re: VM issue causing high CPU loads] Miquel van Smoorenburg
2009-09-03 20:05 ` VM issue causing high CPU loads Simon Kirby
2 siblings, 0 replies; 18+ messages in thread
From: Yohan @ 2009-09-03 14:08 UTC (permalink / raw)
To: Trond Myklebust
Cc: Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

Trond Myklebust wrote:
> On Thu, 2009-09-03 at 15:39 +0200, Yohan wrote:
>
>>> As far as I can see, there is no RPCSEC_GSS involved, so credentials
>>> should never expire. They will be reused as long as processes aren't
>>> switching between thousands and thousands of different combinations of
>>> uid, gid and groups.
>>>
>> My servers are IMAP servers.
>> Each user (~15 million) has a specific uid, over ~10 NFS NetApp
>> storage servers.
>>
> OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> I can see that might be a performance issue...
>
> So afaics, you did try adjusting the hashtable size. How much larger
> does it have to be before you start to get acceptable performance? If it
> solves your problem we could make hash table sizes adjustable via a
> module parameter, for instance.
>

I now run with a value of 12, and it works great for me...

^ permalink raw reply [flat|nested] 18+ messages in thread
* sunrpc: dynamically allocate credcache hashtables [was: Re: VM issue causing high CPU loads]
2009-09-03 14:02 ` Trond Myklebust
2009-09-03 14:08 ` Yohan
@ 2009-09-03 14:35 ` Miquel van Smoorenburg
2009-09-03 20:05 ` VM issue causing high CPU loads Simon Kirby
2 siblings, 0 replies; 18+ messages in thread
From: Miquel van Smoorenburg @ 2009-09-03 14:35 UTC (permalink / raw)
To: Trond Myklebust
Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

[-- Attachment #1: Type: text/plain, Size: 2128 bytes --]

On Thu, 2009-09-03 at 10:02 -0400, Trond Myklebust wrote:
> On Thu, 2009-09-03 at 15:39 +0200, Yohan wrote:
> > > As far as I can see, there is no RPCSEC_GSS involved, so credentials
> > > should never expire. They will be reused as long as processes aren't
> > > switching between thousands and thousands of different combinations of
> > > uid, gid and groups.
> > My servers are IMAP servers.
> > Each user (~15 million) has a specific uid, over ~10 NFS NetApp
> > storage servers.
>
> OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> I can see that might be a performance issue...
>
> So afaics, you did try adjusting the hashtable size. How much larger
> does it have to be before you start to get acceptable performance? If it
> solves your problem we could make hash table sizes adjustable via a
> module parameter, for instance.

That is *exactly* what my patch does :) I ported it to 2.6.31-rc8-bk2
this afternoon, that was trivial.

What I wanted to discuss was finding out if there was another solution,
or if we should build something that auto-tunes hashtable sizes, or
if there was a way to limit the size of the cache in another way.

I have the same usage pattern as Yohan (also an IMAP server for
potentially a few million different uids) - lots of uids are used,
but not simultaneously (maybe a few hundred or a thousand at the same
time). It's just that the inode/dentry/cred caches never expire
because modern boxes have lots and lots of memory.

Due to personal circumstances though I haven't been able to work on
anything much for the last few months. I apologize for keeping quiet.

Patch attached. I've removed the debugging stuff, this is only the
"dynamically allocate credcache hashtables" patch.

Patch description:

auth.h:
	increase RPC_CREDCACHE_HASHBITS from 4 to 12
	(16 hashtable entries -> 4096). This is just the default.
auth.c:
	allocate hashtables dynamically
	add sysctl for credcache_hashsize
auth_generic.c:
	use rpcauth_init_credcache
auth_unix.c:
	use rpcauth_init_credcache
sunrpc_syms.c:
	add hashsize module parameter

Mike.

[-- Attachment #2: linux-2.6.31-rc8-git2-sunprc-credcache_hashsize.patch --]
[-- Type: text/x-patch, Size: 9129 bytes --]

diff -ruN linux-2.6.31-rc8-git2.orig/include/linux/sunrpc/auth.h linux-2.6.31-rc8-git2/include/linux/sunrpc/auth.h
--- linux-2.6.31-rc8-git2.orig/include/linux/sunrpc/auth.h	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/include/linux/sunrpc/auth.h	2009-09-03 12:29:45.000000000 +0200
@@ -60,10 +60,14 @@
 /*
  * Client authentication handle
  */
-#define RPC_CREDCACHE_HASHBITS	4
+#define RPC_CREDCACHE_HASHBITS	12
 #define RPC_CREDCACHE_NR	(1 << RPC_CREDCACHE_HASHBITS)
+#define RPC_CREDCACHE_MIN	4
+#define RPC_CREDCACHE_MAX	16384
 struct rpc_cred_cache {
-	struct hlist_head	hashtable[RPC_CREDCACHE_NR];
+	int			hashsize;
+	int			hashbits;
+	struct hlist_head	*hashtable;
 	spinlock_t		lock;
 };

@@ -124,9 +128,8 @@
 extern const struct rpc_authops authunix_ops;
 extern const struct rpc_authops authnull_ops;

-void __init		rpc_init_authunix(void);
-void __init		rpc_init_generic_auth(void);
-void __init		rpcauth_init_module(void);
+int __init		rpc_init_generic_auth(void);
+int __init		rpcauth_init_module(int);
 void __exit		rpcauth_remove_module(void);
 void __exit		rpc_destroy_generic_auth(void);

diff -ruN linux-2.6.31-rc8-git2.orig/net/sunrpc/auth.c linux-2.6.31-rc8-git2/net/sunrpc/auth.c
--- linux-2.6.31-rc8-git2.orig/net/sunrpc/auth.c	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/net/sunrpc/auth.c	2009-09-03 13:59:01.000000000 +0200
@@ -14,6 +14,8 @@
 #include <linux/hash.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/spinlock.h>
+#include <linux/vmalloc.h>
+#include <linux/sysctl.h>

 #ifdef RPC_DEBUG
 # define RPCDBG_FACILITY	RPCDBG_AUTH
@@ -28,6 +30,7 @@

 static LIST_HEAD(cred_unused);
 static unsigned long number_cred_unused;
+int credcache_hashsize = RPC_CREDCACHE_NR;

 static u32
 pseudoflavor_to_flavor(u32 flavor) {
@@ -147,7 +150,14 @@
 	new = kmalloc(sizeof(*new), GFP_KERNEL);
 	if (!new)
 		return -ENOMEM;
-	for (i = 0; i < RPC_CREDCACHE_NR; i++)
+	new->hashsize = credcache_hashsize;
+	new->hashbits = ilog2(new->hashsize);
+	new->hashtable = vmalloc(new->hashsize * sizeof(struct hlist_head));
+	if (!new->hashtable) {
+		kfree(new);
+		return -ENOMEM;
+	}
+	for (i = 0; i < new->hashsize; i++)
 		INIT_HLIST_HEAD(&new->hashtable[i]);
 	spin_lock_init(&new->lock);
 	auth->au_credcache = new;
@@ -184,7 +194,7 @@

 	spin_lock(&rpc_credcache_lock);
 	spin_lock(&cache->lock);
-	for (i = 0; i < RPC_CREDCACHE_NR; i++) {
+	for (i = 0; i < cache->hashsize; i++) {
 		head = &cache->hashtable[i];
 		while (!hlist_empty(head)) {
 			cred = hlist_entry(head->first, struct rpc_cred, cr_hash);
@@ -213,6 +223,8 @@
 	if (cache) {
 		auth->au_credcache = NULL;
 		rpcauth_clear_credcache(cache);
+		if (cache->hashtable)
+			vfree(cache->hashtable);
 		kfree(cache);
 	}
 }
@@ -291,7 +303,7 @@
 			*entry, *new;
 	unsigned int nr;

-	nr = hash_long(acred->uid, RPC_CREDCACHE_HASHBITS);
+	nr = hash_long(acred->uid, cache->hashbits);

 	rcu_read_lock();
 	hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) {
@@ -568,19 +580,87 @@
 		test_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags) != 0;
 }

+#ifdef RPC_DEBUG
+static int proc_credcache_hashsize(struct ctl_table *table, int write,
+			struct file *file, void __user *buffer,
+			size_t *length, loff_t *ppos)
+{
+	int tmp = credcache_hashsize;
+
+	table->data = &tmp;
+	table->maxlen = sizeof(int);
+	proc_dointvec(table, write, file, buffer, length, ppos);
+	if (write) {
+		if (tmp < RPC_CREDCACHE_MIN ||
+		    tmp > RPC_CREDCACHE_MAX ||
+		    !is_power_of_2(tmp))
+			return -EINVAL;
+		credcache_hashsize = tmp;
+	}
+	return 0;
+}
+
+static ctl_table sunrpc_credcache_knobs_table [] = {
+	{
+		.procname	= "credcache_hashsize",
+		.data		= NULL,
+		.mode		= 0644,
+		.proc_handler	= &proc_credcache_hashsize,
+	},
+	{
+		.ctl_name	= 0,
+	}
+};
+
+static ctl_table sunrpc_credcache_table[] = {
+	{
+		.ctl_name	= CTL_SUNRPC,
+		.procname	= "sunrpc",
+		.mode		= 0555,
+		.child		= sunrpc_credcache_knobs_table,
+	},
+	{
+		.ctl_name	= 0,
+	}
+};
+
+static struct ctl_table_header *sunrpc_credcache_table_header;
+#endif
+
 static struct shrinker rpc_cred_shrinker = {
 	.shrink = rpcauth_cache_shrinker,
 	.seeks = DEFAULT_SEEKS,
 };

-void __init rpcauth_init_module(void)
+int __init rpcauth_init_module(int hashsize)
 {
-	rpc_init_authunix();
-	rpc_init_generic_auth();
+	int err;
+
+	if (hashsize) {
+		hashsize = min(hashsize, RPC_CREDCACHE_MAX);
+		hashsize = max(hashsize, RPC_CREDCACHE_MIN);
+		credcache_hashsize = rounddown_pow_of_two(hashsize);
+		printk(KERN_INFO "RPC: credcache hashtable size %d\n",
+			credcache_hashsize);
+	}
+
+	err = rpc_init_generic_auth();
+	if (err)
+		goto out;
+#ifdef RPC_DEBUG
+	sunrpc_credcache_table_header =
+		register_sysctl_table(sunrpc_credcache_table);
+#endif
 	register_shrinker(&rpc_cred_shrinker);
+out:
+	return err;
 }

 void __exit rpcauth_remove_module(void)
 {
+#ifdef RPC_DEBUG
+	if (sunrpc_credcache_table_header)
+		unregister_sysctl_table(sunrpc_credcache_table_header);
+#endif
 	unregister_shrinker(&rpc_cred_shrinker);
 }
diff -ruN linux-2.6.31-rc8-git2.orig/net/sunrpc/auth_generic.c linux-2.6.31-rc8-git2/net/sunrpc/auth_generic.c
--- linux-2.6.31-rc8-git2.orig/net/sunrpc/auth_generic.c	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/net/sunrpc/auth_generic.c	2009-09-03 12:29:45.000000000 +0200
@@ -26,7 +26,6 @@
 };

 static struct rpc_auth generic_auth;
-static struct rpc_cred_cache generic_cred_cache;
 static const struct rpc_credops generic_credops;

 /*
@@ -158,20 +157,16 @@
 	return 0;
 }

-void __init rpc_init_generic_auth(void)
+int __init rpc_init_generic_auth(void)
 {
-	spin_lock_init(&generic_cred_cache.lock);
+	return rpcauth_init_credcache(&generic_auth);
 }

 void __exit rpc_destroy_generic_auth(void)
 {
-	rpcauth_clear_credcache(&generic_cred_cache);
+	rpcauth_destroy_credcache(&generic_auth);
 }

-static struct rpc_cred_cache generic_cred_cache = {
-	{{ NULL, },},
-};
-
 static const struct rpc_authops generic_auth_ops = {
 	.owner = THIS_MODULE,
 	.au_name = "Generic",
@@ -182,7 +177,6 @@
 static struct rpc_auth generic_auth = {
 	.au_ops = &generic_auth_ops,
 	.au_count = ATOMIC_INIT(0),
-	.au_credcache = &generic_cred_cache,
 };

 static const struct rpc_credops generic_credops = {
diff -ruN linux-2.6.31-rc8-git2.orig/net/sunrpc/auth_unix.c linux-2.6.31-rc8-git2/net/sunrpc/auth_unix.c
--- linux-2.6.31-rc8-git2.orig/net/sunrpc/auth_unix.c	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/net/sunrpc/auth_unix.c	2009-09-03 12:29:45.000000000 +0200
@@ -28,15 +28,23 @@
 #endif

 static struct rpc_auth		unix_auth;
-static struct rpc_cred_cache	unix_cred_cache;
 static const struct rpc_credops	unix_credops;

 static struct rpc_auth *
 unx_create(struct rpc_clnt *clnt, rpc_authflavor_t flavor)
 {
+	int err;
+
 	dprintk("RPC: creating UNIX authenticator for client %p\n",
 			clnt);
 	atomic_inc(&unix_auth.au_count);
+	if (!unix_auth.au_credcache) {
+		err = rpcauth_init_credcache(&unix_auth);
+		if (err) {
+			atomic_dec(&unix_auth.au_count);
+			return ERR_PTR(err);
+		}
+	}
 	return &unix_auth;
 }

@@ -202,11 +210,6 @@
 	return p;
 }

-void __init rpc_init_authunix(void)
-{
-	spin_lock_init(&unix_cred_cache.lock);
-}
-
 const struct rpc_authops authunix_ops = {
 	.owner = THIS_MODULE,
 	.au_flavor = RPC_AUTH_UNIX,
@@ -218,17 +221,12 @@
 };

 static
-struct rpc_cred_cache	unix_cred_cache = {
-};
-
-static
 struct rpc_auth		unix_auth = {
 	.au_cslack = UNX_WRITESLACK,
 	.au_rslack = 2, /* assume AUTH_NULL verf */
 	.au_ops = &authunix_ops,
 	.au_flavor = RPC_AUTH_UNIX,
 	.au_count = ATOMIC_INIT(0),
-	.au_credcache = &unix_cred_cache,
 };

 static
diff -ruN linux-2.6.31-rc8-git2.orig/net/sunrpc/sunrpc_syms.c linux-2.6.31-rc8-git2/net/sunrpc/sunrpc_syms.c
--- linux-2.6.31-rc8-git2.orig/net/sunrpc/sunrpc_syms.c	2009-08-28 02:59:04.000000000 +0200
+++ linux-2.6.31-rc8-git2/net/sunrpc/sunrpc_syms.c	2009-09-03 12:29:45.000000000 +0200
@@ -23,6 +23,7 @@
 #include <linux/sunrpc/xprtsock.h>

 extern struct cache_detail ip_map_cache, unix_gid_cache;
+static int hashsize;

 static int __init
 init_sunrpc(void)
@@ -31,13 +32,14 @@
 	if (err)
 		goto out;
 	err = rpc_init_mempool();
-	if (err) {
-		unregister_rpc_pipefs();
-		goto out;
-	}
+	if (err)
+		goto out_err1;
 #ifdef RPC_DEBUG
 	rpc_register_sysctl();
 #endif
+	err = rpcauth_init_module(hashsize);
+	if (err)
+		goto out_err2;
 #ifdef CONFIG_PROC_FS
 	rpc_proc_init();
 #endif
@@ -45,7 +47,14 @@
 	cache_register(&unix_gid_cache);
 	svc_init_xprt_sock();	/* svc sock transport */
 	init_socket_xprt();	/* clnt sock transport */
-	rpcauth_init_module();
+	goto out;
+out_err2:
+	rpc_destroy_mempool();
+#ifdef RPC_DEBUG
+	rpc_unregister_sysctl();
+#endif
+out_err1:
+	unregister_rpc_pipefs();
 out:
 	return err;
 }
@@ -68,6 +77,8 @@
 #endif
 	rcu_barrier(); /* Wait for completion of call_rcu()'s */
 }
+module_param(hashsize, int, 0);
+MODULE_PARM_DESC(hashsize, "size of hashtables for credential caches");
 MODULE_LICENSE("GPL");
 module_init(init_sunrpc);
 module_exit(cleanup_sunrpc);

^ permalink raw reply [flat|nested] 18+ messages in thread
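[Editor's illustration] The patch's rpcauth_init_module() turns the hashsize module parameter into a table size by clamping it to [RPC_CREDCACHE_MIN, RPC_CREDCACHE_MAX] and rounding down to a power of two, with 0 meaning "keep the default". The following is a userspace sketch of that rule only, assuming the constants shown in the patch (4, 16384, default 1 << 12). The function names are hypothetical and the loop stands in for the kernel's rounddown_pow_of_two().

```c
#include <assert.h>

#define CREDCACHE_MIN		4	/* RPC_CREDCACHE_MIN in the patch */
#define CREDCACHE_MAX		16384	/* RPC_CREDCACHE_MAX in the patch */
#define CREDCACHE_DEFAULT	4096	/* 1 << RPC_CREDCACHE_HASHBITS (12) */

/* Largest power of two that is <= n, for n >= 1. */
static int rounddown_pow2(int n)
{
	int p = 1;

	assert(n >= 1);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Table size that would result from a requested hashsize:
 * 0 keeps the default, anything else is clamped and rounded down.
 */
static int effective_hashsize(int requested)
{
	if (requested == 0)
		return CREDCACHE_DEFAULT;
	if (requested > CREDCACHE_MAX)
		requested = CREDCACHE_MAX;
	if (requested < CREDCACHE_MIN)
		requested = CREDCACHE_MIN;
	return rounddown_pow2(requested);
}
```

Under these assumptions a request of 5000 yields 4096 buckets, a request of 1 yields 4, and anything above 16384 yields 16384.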
* Re: VM issue causing high CPU loads
2009-09-03 14:02 ` Trond Myklebust
2009-09-03 14:08 ` Yohan
2009-09-03 14:35 ` sunrpc: dynamically allocate credcache hashtables [was: Re: VM issue causing high CPU loads] Miquel van Smoorenburg
@ 2009-09-03 20:05 ` Simon Kirby
2009-09-03 20:49 ` Trond Myklebust
2009-09-03 21:21 ` Muntz, Daniel
2 siblings, 2 replies; 18+ messages in thread
From: Simon Kirby @ 2009-09-03 20:05 UTC (permalink / raw)
To: Trond Myklebust
Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

On Thu, Sep 03, 2009 at 10:02:06AM -0400, Trond Myklebust wrote:

> OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> I can see that might be a performance issue...

We have a similar setup with millions of UIDs over NFS (currently NFSv3).
I _wish_ there were a way to use NFSv4 without having to use name-mapped
UIDs and GIDs, since our user and group names come from MySQL anyway, and
are guaranteed to be consistent across machines.

Why on earth does NFSv4 force the use of names?

I was considering hacking the code to stick IDs in there anyway, but I
haven't looked at the feasibility of this. I suspect this would break or
complicate other things, but the current NFSv4 design just seems like an
incredible waste for this case.

Simon-

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-09-03 20:05 ` VM issue causing high CPU loads Simon Kirby
@ 2009-09-03 20:49 ` Trond Myklebust
2009-09-03 22:22 ` Simon Kirby
2009-09-03 21:21 ` Muntz, Daniel
1 sibling, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2009-09-03 20:49 UTC (permalink / raw)
To: Simon Kirby
Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

On Thu, 2009-09-03 at 13:05 -0700, Simon Kirby wrote:
> On Thu, Sep 03, 2009 at 10:02:06AM -0400, Trond Myklebust wrote:
>
> > OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> > I can see that might be a performance issue...
>
> We have a similar setup with millions of UIDs over NFS (currently NFSv3).
> I _wish_ there were a way to use NFSv4 without having to use name-mapped
> UIDs and GIDs, since our user and group names come from MySQL anyway, and
> are guaranteed to be consistent across machines.

That's a separate issue.

I'm working on increasing the idmapper scalability, however another
project is currently taking up most of my time. I can't guarantee that
the revised idmapper code will be finished in time to allow for
inclusion in 2.6.32.

> Why on earth does NFSv4 force the use of names?

NFSv4 aspires to be an internet-wide protocol, and so you cannot use
uids/gids: they just aren't guaranteed to represent a unique user
outside your local LDAP/NIS or /etc/passwd domain. Furthermore, uids and
gids are a posix construct. They simply don't work in environments where
you may have lots of non-posix systems.

Trond

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-09-03 20:49 ` Trond Myklebust
@ 2009-09-03 22:22 ` Simon Kirby
2009-09-04 12:31 ` Trond Myklebust
0 siblings, 1 reply; 18+ messages in thread
From: Simon Kirby @ 2009-09-03 22:22 UTC (permalink / raw)
To: Trond Myklebust
Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

On Thu, Sep 03, 2009 at 04:49:25PM -0400, Trond Myklebust wrote:

> I'm working on increasing the idmapper scalability, however another
> project is currently taking up most of my time. I can't guarantee that
> the revised idmapper code will be finished in time to allow for
> inclusion in 2.6.32.

Sure, improving it would be nice for cases where it's needed, but in
environments where all IDs are consistent (by design), it just seems
silly to force this extra work for zero gain.

> NFSv4 aspires to be an internet-wide protocol, and so you cannot use
> uids/gids: they just aren't guaranteed to represent a unique user
> outside your local LDAP/NIS or /etc/passwd domain. Furthermore, uids and
> gids are a posix construct. They simply don't work in environments where
> you may have lots of non-posix systems.

So, for environments with all POSIX systems, what do you think about
perhaps a mount or export flag that violates the spec on purpose to allow
numeric IDs to be used?

I can understand that the quiet use of IDs if name-to-user mapping fails
will cause security issues in environments without consistent users, so
it would now be unsafe to turn this on silently. However, making this an
option seems reasonable to me.

(Not that I know what I'm doing.)

Simon-

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: VM issue causing high CPU loads
2009-09-03 22:22 ` Simon Kirby
@ 2009-09-04 12:31 ` Trond Myklebust
0 siblings, 0 replies; 18+ messages in thread
From: Trond Myklebust @ 2009-09-04 12:31 UTC (permalink / raw)
To: Simon Kirby
Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

On Thu, 2009-09-03 at 15:22 -0700, Simon Kirby wrote:
> So, for environments with all POSIX systems, what do you think about
> perhaps a mount or export flag that violates the spec on purpose to allow
> numeric IDs to be used?

No! I'm not interested in starting a LinuxPrivateNFSv4 protocol on top
of everything else we've got...

Trond

^ permalink raw reply [flat|nested] 18+ messages in thread
* RE: VM issue causing high CPU loads
2009-09-03 20:05 ` VM issue causing high CPU loads Simon Kirby
2009-09-03 20:49 ` Trond Myklebust
@ 2009-09-03 21:21 ` Muntz, Daniel
1 sibling, 0 replies; 18+ messages in thread
From: Muntz, Daniel @ 2009-09-03 21:21 UTC
To: Simon Kirby, Trond Myklebust
Cc: Yohan, Andrew Morton, linux-kernel, linux-nfs, Neil Brown, J. Bruce Fields, mikevs

Amen. I understand that v4 wants to extend across domains, etc., but it
goes out of its way to prevent the use of uids/gids, which in the vast
majority of installations would work just fine and wouldn't incur the
overhead of the mapping/unmapping operations. There's no reason
uids/gids couldn't coexist with string names.

If the 4.0 spec had a slightly different version of this paragraph:

   To provide a greater degree of compatibility with previous versions
   of NFS (i.e., v2 and v3), which identified users and groups by 32-bit
   unsigned uid's and gid's, owner and group strings that consist of
   decimal numeric values with no leading zeros can be given a special
   interpretation by clients and servers which choose to provide such
   support. The receiver may treat such a user or group string as
   representing the same user as would be represented by a v2/v3 uid or
   gid having the corresponding numeric value. A server is not obligated
   to accept such a string, but may return an NFS4ERR_BADOWNER instead.
   To avoid this mechanism being used to subvert user and group
   translation, so that a client might pass all of the owners and groups
   in numeric form, a server SHOULD return an NFS4ERR_BADOWNER error
   when there is a valid translation for the user or owner designated in
   this way. In that case, the client must use the appropriate
   name@domain string and not the special form for compatibility.

i.e., take out the "subvert" portion, and just plain allow string
representations of uids/gids, then at least the conversion would just be
an atoi and itoa.
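As a rough illustration of the "atoi and itoa" conversion described above, here is a minimal sketch of the rule in the quoted paragraph: decimal digits only, no leading zeros, value fits in 32 bits unsigned. It is not code from any NFS client or server. The function names `numeric_owner_to_id` and `id_to_numeric_owner` are hypothetical, and Python is used only to keep the sketch short.

```python
from typing import Optional

UINT32_MAX = 2**32 - 1  # v2/v3 uids and gids are 32-bit unsigned


def numeric_owner_to_id(owner: str) -> Optional[int]:
    """Hypothetical helper: return the uid/gid for a purely numeric
    owner string, or None if the string needs name@domain handling."""
    # Accept ASCII decimal digits only; "name@domain" strings fall through.
    if not owner or not owner.isascii() or not owner.isdigit():
        return None
    # The quoted paragraph requires no leading zeros ("0" itself is fine).
    if len(owner) > 1 and owner[0] == "0":
        return None
    value = int(owner)
    # Reject values that cannot be a 32-bit unsigned uid/gid.
    return value if value <= UINT32_MAX else None


def id_to_numeric_owner(value: int) -> str:
    """Hypothetical inverse: the itoa half of the conversion."""
    if not 0 <= value <= UINT32_MAX:
        raise ValueError("uid/gid must fit in 32 bits unsigned")
    return str(value)
```

A receiver following this sketch would fall back to its normal name@domain translation (or return NFS4ERR_BADOWNER) whenever `numeric_owner_to_id` returns None.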
Even better, allow the uids/gids to be used directly and avoid the
atoi/itoa, perhaps with a flag. Either case is better than idmapd and
getting EDELAY and an X-second pause in odd places because NFS has to go
to userspace for a translation.

-Dan Quixote

> -----Original Message-----
> From: Simon Kirby [mailto:sim@hostway.ca]
> Sent: Thursday, September 03, 2009 1:06 PM
> To: Trond Myklebust
> Cc: Yohan; Andrew Morton; linux-kernel@vger.kernel.org;
> linux-nfs@vger.kernel.org; Neil Brown; J. Bruce Fields;
> mikevs@xs4all.net
> Subject: Re: VM issue causing high CPU loads
>
> On Thu, Sep 03, 2009 at 10:02:06AM -0400, Trond Myklebust wrote:
>
> > OK, so 16 hash buckets are likely to be filled with ~10^6 entries each.
> > I can see that might be a performance issue...
>
> We have a similar setup with millions of UIDs over NFS (currently NFSv3).
> I _wish_ there were a way to use NFSv4 without having to use
> name-mapped UIDs and GIDs, since our user and group names
> come from MySQL anyway, and are guaranteed to be consistent
> across machines.
>
> Why on earth does NFSv4 force the use of names?
>
> I was considering hacking the code to stick IDs in there
> anyway, but I haven't looked at the feasibility of this. I
> suspect this would break or complicate other things, but the
> current NFSv4 design just seems like an incredible waste for
> this case.
>
> Simon-
Thread overview: 18+ messages
2009-08-24 14:23 VM issue causing high CPU loads Yohan
2009-08-24 23:21 ` Andrew Morton
2009-08-26 11:08 ` Mel Gorman
2009-08-26 11:55 ` Yohan
2009-08-26 11:53 ` Yohan
2009-08-27 8:39 ` Yohan
2009-08-31 20:39 ` Yohan
2009-09-03 0:06 ` Andrew Morton
2009-09-03 13:01 ` Trond Myklebust
2009-09-03 13:39 ` Yohan
2009-09-03 14:02 ` Trond Myklebust
2009-09-03 14:08 ` Yohan
2009-09-03 14:35 ` sunrpc: dynamically allocate credcache hashtables [was: Re: VM issue causing high CPU loads] Miquel van Smoorenburg
2009-09-03 20:05 ` VM issue causing high CPU loads Simon Kirby
2009-09-03 20:49 ` Trond Myklebust
2009-09-03 22:22 ` Simon Kirby
2009-09-04 12:31 ` Trond Myklebust
2009-09-03 21:21 ` Muntz, Daniel