* Issues using new idmapper in large sites
From: Brian De Wolf @ 2013-06-19 3:12 UTC
To: Linux NFS list
Hello,
I've been having some problems after upgrading to 3.4.44 that seem to
stem from the new idmapper. We've got a site with ~36k users and our
interactive login servers pretty quickly started identifying users as
nfsnobody (-2). Looking at /proc/key-users, we had exhausted the
available space for "keys". After tuning these variables up to large
values, though, it still fails to cache more than ~500 users.
I made a directory with a file owned by every user and started testing
with the /proc/sys/kernel/keys values. To test, my script prints
/proc/key-users, times an "ls -ln", and checks the output for
wrong uids.
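A rough sketch of what such a test script could look like follows. It is not the script used for the numbers in this message. The function names are made up for illustration, and the set of uids treated as "nobody" is an assumption (65534 is a common nfsnobody uid, and 4294967294 is -2 read as an unsigned 32-bit value).

```python
import subprocess
import time

# Assumption: unmapped owners show up as one of these numeric uids.
NOBODY_UIDS = {65534, 4294967294}


def count_nobody_entries(ls_output, nobody_uids=NOBODY_UIDS):
    """Count entries in `ls -ln` output whose numeric owner is a nobody uid."""
    missing = 0
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip the "total N" line and anything too short to be an entry.
        if len(fields) < 9 or not fields[2].isdigit():
            continue
        if int(fields[2]) in nobody_uids:
            missing += 1
    return missing


def read_key_users(path="/proc/key-users"):
    """Return the raw contents of /proc/key-users."""
    with open(path) as f:
        return f.read()


def run_once(directory):
    """Print key usage, time `ls -ln`, report unmapped owners, print key usage again."""
    print(read_key_users(), end="")
    start = time.monotonic()
    result = subprocess.run(
        ["ls", "-ln", directory], capture_output=True, text=True, check=True
    )
    elapsed = time.monotonic() - start
    print(f"elapsed: {elapsed:.3f}s")
    print(f"Missing users: {count_nobody_entries(result.stdout)}")
    print(read_key_users(), end="")
```

Calling run_once() on the directory of per-user files, as root so that /proc/key-users is readable, would give one round of output in roughly the shape shown below.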
Before tweaking values:
3.4.44-gentoo
0: 620 619/514 615/1000 19995/20000
real 0m52.758s
user 0m0.370s
sys 0m7.020s
Missing users: 35784
0: 620 619/514 615/1000 19995/20000
After tweaking values (and with a hot cache):
3.4.44-gentoo
0: 620 619/514 615/1000000 19995/536870912
real 0m17.198s
user 0m0.410s
sys 0m5.020s
Missing users: 35784
0: 72188 72187/514 72183/1000000 1964565/536870912
It's fast but...it also missed most of my users (it only has 503
cached, there are 36287 total). The refcount and number of keys
skyrocket even further on repeated runs but the number of missing
users remains the same.
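For anyone reading the /proc/key-users lines above, here is a small sketch that splits one line into named fields. The field order (uid, structure refcount, total/instantiated keys, key count quota, key byte quota) follows the kernel keys documentation; the class and function names are only illustrative.

```python
from typing import NamedTuple


class KeyUsers(NamedTuple):
    uid: int
    usage: int      # structure refcount
    nkeys: int      # total number of keys
    nikeys: int     # number of instantiated keys
    qnkeys: int     # keys counted against the quota
    maxkeys: int    # key count quota
    qnbytes: int    # bytes counted against the quota
    maxbytes: int   # key byte quota


def parse_key_users_line(line):
    """Parse one line of /proc/key-users into a KeyUsers tuple."""
    uid_part, usage, keys, key_quota, byte_quota = line.split()
    nkeys, nikeys = keys.split("/")
    qnkeys, maxkeys = key_quota.split("/")
    qnbytes, maxbytes = byte_quota.split("/")
    return KeyUsers(
        int(uid_part.rstrip(":")), int(usage),
        int(nkeys), int(nikeys),
        int(qnkeys), int(maxkeys),
        int(qnbytes), int(maxbytes),
    )


after = parse_key_users_line("0: 72188 72187/514 72183/1000000 1964565/536870912")
print(after.nkeys, after.nikeys)
```

On the post-run line from the tuned 3.4.44 test, this shows the total key count at 72187 while the instantiated count stays at 514.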
After testing with 3.9.6, I'm really wondering about the number of keys
instantiated being so low. It seems to hit the same ~500 limit but does
something so that it can keep working:
3.9.6-gentoo
0: 13 12/12 8/1000000 239/536870912
real 12m3.462s
user 0m0.440s
sys 0m10.720s
Missing users: 0
0: 519 518/518 513/1000000 17276/536870912
The key numbers settle at ~500 and refuse to settle any higher, even on
repeated runs (although if I watch /proc/key-users while it runs, it
sometimes jumps to ~700 and goes back to ~500. Aggressive GC?). It
would be nice to be able to give it a bit more room to cache.
Is there anything else I should test? Is there a tunable I missed? It
looks like idmapping in 3.4.44 is problematic with several hundred
users and slow in 3.9.6. Solaris performs the same test in 1 minute (1
second with a hot cache, though the cache quickly dissipates).
Thanks,
Brian
* Re: Issues using new idmapper in large sites
From: Myklebust, Trond @ 2013-06-19 13:41 UTC
To: Brian De Wolf, David Howells; +Cc: Linux NFS list
On Tue, 2013-06-18 at 20:12 -0700, Brian De Wolf wrote:
> Hello,
>
> I've been having some problems after upgrading to 3.4.44 that seem to
> stem from the new idmapper. We've got a site with ~36k users and our
> interactive login servers pretty quickly started identifying users as
> nfsnobody (-2). Looking at /proc/key-users, we had exhausted the
> available space for "keys". After tuning these variables up to large
> values, though, it still fails to cache more than ~500 users.
>
> I made a directory with a file owned by every user and started testing
> with the /proc/sys/kernel/keys values. To test, my script prints
> /proc/key-users, times an "ls -ln", and checks the output for
> wrong uids.
>
> Before tweaking values:
>
> 3.4.44-gentoo
> 0: 620 619/514 615/1000 19995/20000
> real 0m52.758s
> user 0m0.370s
> sys 0m7.020s
> Missing users: 35784
> 0: 620 619/514 615/1000 19995/20000
>
> After tweaking values (and with a hot cache):
>
> 3.4.44-gentoo
> 0: 620 619/514 615/1000000 19995/536870912
> real 0m17.198s
> user 0m0.410s
> sys 0m5.020s
> Missing users: 35784
> 0: 72188 72187/514 72183/1000000 1964565/536870912
>
> It's fast but...it also missed most of my users (it only has 503
> cached, there are 36287 total). The refcount and number of keys
> skyrocket even further on repeated runs but the number of missing
> users remains the same.
>
> After testing with 3.9.6, I'm really wondering about the number of keys
> instantiated being so low. It seems to hit the same ~500 limit but does
> something so that it can keep working:
>
> 3.9.6-gentoo
> 0: 13 12/12 8/1000000 239/536870912
> real 12m3.462s
> user 0m0.440s
> sys 0m10.720s
> Missing users: 0
> 0: 519 518/518 513/1000000 17276/536870912
>
> The key numbers settle at ~500 and refuse to settle any higher, even on
> repeated runs (although if I watch /proc/key-users while it runs, it
> sometimes jumps to ~700 and goes back to ~500. Aggressive GC?). It
> would be nice to be able to give it a bit more room to cache.
>
> Is there anything else I should test? Is there a tunable I missed? It
> looks like idmapping in 3.4.44 is problematic with several hundred
> users and slow in 3.9.6. Solaris performs the same test in 1 minute (1
> second with a hot cache, though the cache quickly dissipates).
>
> Thanks,
> Brian
Hi David,
What happened to your keyring patchset that was supposed to fix the
problem that Brian is reporting?
Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
* Re: Issues using new idmapper in large sites
From: David Howells @ 2013-06-19 14:07 UTC
To: Myklebust, Trond; +Cc: dhowells, Brian De Wolf, Linux NFS list
Myklebust, Trond <Trond.Myklebust@netapp.com> wrote:
> > It's fast but...it also missed most of my users (it only has 503
> > cached, there are 36287 total). The refcount and number of keys
> > skyrocket even further on repeated runs but the number of missing
> > users remains the same.
Okay... I need to improve keyring capacity.
Current max keyring capacity is
(PAGE_SIZE - header) / sizeof(key *)
in other words, 500-ish on a 64-bit machine.
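As a worked instance of that formula, here is a small sketch. The 4096-byte page and 8-byte pointer are the usual values on x86-64; the 24-byte header is an assumption chosen for illustration, not a figure taken from the kernel source.

```python
def max_keyring_capacity(page_size=4096, header_size=24, pointer_size=8):
    """Number of key pointers that fit in one page after the list header.

    header_size is an assumed value for illustration only.
    """
    return (page_size - header_size) // pointer_size


print(max_keyring_capacity())
```

Under those assumptions the result is 509 keys per keyring, in line with the ~500 ceiling seen in the /proc/key-users output.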
Let me have a play - in theory, it shouldn't be too hard; the main trick is
retaining RCU-based reading and, ideally, tree size. Plus if I can sort it
too...
David