* Re: rpcauth_lookup_credcache() lock contentions [not found] <20120623122604.GA10887@localhost> @ 2012-06-23 15:07 ` Fengguang Wu 2012-06-24 21:34 ` Andi Kleen 1 sibling, 0 replies; 11+ messages in thread From: Fengguang Wu @ 2012-06-23 15:07 UTC (permalink / raw) To: Linux-NFS; +Cc: Andi Kleen On Sat, Jun 23, 2012 at 08:26:04PM +0800, Fengguang Wu wrote: > Hi, > > When doing parallel kernel builds on an NFSROOT system with 32 logical > CPUs, I see half CPU time spent in kernel: > > top - 20:08:28 up 5:01, 8 users, load average: 62.97, 55.82, 54.56 > Tasks: 751 total, 80 running, 671 sleeping, 0 stopped, 0 zombie > Cpu(s): 0.1%us, 53.5%sy, 46.4%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st > Mem: 65401132k total, 36455604k used, 28945528k free, 0k buffers > Swap: 0k total, 0k used, 0k free, 31858244k cached > > The attached lock_stat shows that rpcauth_lookup_credcache() has the > outstanding contentions. Aha, I got some progress: 20% kernel time can be saved when increasing the hash bits from 4 to 10 with sunrpc.auth_hashtable_size=10: Cpu(s): 0.0%us, 38.3%sy, 61.6%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- &(&dentry->d_lock)->rlock: 268601736 276206929 0.07 26627.70 496747630.48 2001328443 5557869248 0.06 37208.89 1755583747.41 ------------------------- &(&dentry->d_lock)->rlock 24861581 [<ffffffff8116de34>] unlazy_walk+0x8a/0x1bb &(&dentry->d_lock)->rlock 148863110 [<ffffffff81177f50>] dput+0x46/0x17c &(&dentry->d_lock)->rlock 13725203 [<ffffffff81178c3c>] __d_lookup+0xa2/0x19d &(&dentry->d_lock)->rlock 46173902 [<ffffffff811770a9>] dget_parent+0x63/0xd5 ------------------------- &(&dentry->d_lock)->rlock 86386730 [<ffffffff811770a9>] dget_parent+0x63/0xd5 &(&dentry->d_lock)->rlock 20746860 [<ffffffff8116dd9a>] path_get+0x2c/0x3c &(&dentry->d_lock)->rlock 12672320 [<ffffffff81178c3c>] __d_lookup+0xa2/0x19d &(&dentry->d_lock)->rlock 32752994 [<ffffffff8116de34>] unlazy_walk+0x8a/0x1bb ............................................................................................................................................................................................... &(&new->lock)->rlock: 230350869 230621514 0.08 6929.01 361504869.99 965086516 1052051247 0.07 5783.47 369851400.65 -------------------- &(&new->lock)->rlock 230621514 [<ffffffff81d026c0>] rpcauth_lookup_credcache+0xa7/0x244 -------------------- &(&new->lock)->rlock 230621514 [<ffffffff81d026c0>] rpcauth_lookup_credcache+0xa7/0x244 Thanks, Fengguang ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions [not found] <20120623122604.GA10887@localhost> 2012-06-23 15:07 ` rpcauth_lookup_credcache() lock contentions Fengguang Wu @ 2012-06-24 21:34 ` Andi Kleen 2012-06-25 1:21 ` Myklebust, Trond 2012-06-25 2:42 ` Fengguang Wu 1 sibling, 2 replies; 11+ messages in thread From: Andi Kleen @ 2012-06-24 21:34 UTC (permalink / raw) To: Fengguang Wu; +Cc: Linux-NFS On Sat, Jun 23, 2012 at 08:26:04PM +0800, Fengguang Wu wrote: > Hi, > > When doing parallel kernel builds on an NFSROOT system with 32 logical > CPUs, I see half CPU time spent in kernel: > > top - 20:08:28 up 5:01, 8 users, load average: 62.97, 55.82, 54.56 > Tasks: 751 total, 80 running, 671 sleeping, 0 stopped, 0 zombie > Cpu(s): 0.1%us, 53.5%sy, 46.4%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st > Mem: 65401132k total, 36455604k used, 28945528k free, 0k buffers > Swap: 0k total, 0k used, 0k free, 31858244k cached > > The attached lock_stat shows that rpcauth_lookup_credcache() has the > outstanding contentions. Hmm, the cache is already using RCU for the read case, just takes a lock for writing. Or when checking an entry. So you either have a lot of lookups on the same entry, or you thrash the cache. When you look up the look address, is it the one in the first loop? If yes then it's the first. I'm not fully sure why it takes the lock in the read case anyways. Neither test_bit nor atomic_inc under RCU should need a lock, so I suppose it can be just removed. 367 spin_lock(&cache->lock); 368 if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) { 369 spin_unlock(&cache->lock); 370 continue; 371 } 372 cred = get_rpccred(entry); 373 spin_unlock(&cache->lock); and get_rpccred is just 154 static inline 155 struct rpc_cred * get_rpccred(struct rpc_cred *cred) 156 { 157 atomic_inc(&cred->cr_count); 158 return cred; 159 } Can you try this patch? -andi commit fa1eef2ec22f2fc31e0381b864044fbb753dd572 Author: Andi Kleen <ak@linux.intel.com> Date: Sun Jun 24 14:31:06 2012 -0700 sunrpc: remove useless spinlocks in credential lookup path Fengguang noticed that rpcauth_lookup_credcache has high lock contention on the nfs server when doing kernel builds on nfsroot. There is no reason to take the spinlock in the read loop: the RCU makes sure the object does not go away, and either test_bit nor atomic_inc in get_rpccred() needs a lock. So just remove the spinlock in the read lookup path. Reported-by: Fengguang Wu Signed-off-by: Andi Kleen <ak@linux.intel.com> diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index 727e506..13c2b58 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -364,13 +364,10 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred, hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) { if (!entry->cr_ops->crmatch(acred, entry, flags)) continue; - spin_lock(&cache->lock); if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) { - spin_unlock(&cache->lock); continue; } cred = get_rpccred(entry); - spin_unlock(&cache->lock); break; } rcu_read_unlock(); ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-06-24 21:34 ` Andi Kleen @ 2012-06-25 1:21 ` Myklebust, Trond 2012-06-25 2:45 ` Andi Kleen 2012-06-25 2:42 ` Fengguang Wu 1 sibling, 1 reply; 11+ messages in thread From: Myklebust, Trond @ 2012-06-25 1:21 UTC (permalink / raw) To: Andi Kleen; +Cc: Fengguang Wu, Linux-NFS T24gU3VuLCAyMDEyLTA2LTI0IGF0IDE0OjM0IC0wNzAwLCBBbmRpIEtsZWVuIHdyb3RlOg0KPiBP biBTYXQsIEp1biAyMywgMjAxMiBhdCAwODoyNjowNFBNICswODAwLCBGZW5nZ3VhbmcgV3Ugd3Jv dGU6DQo+ID4gSGksDQo+ID4gDQo+ID4gV2hlbiBkb2luZyBwYXJhbGxlbCBrZXJuZWwgYnVpbGRz IG9uIGFuIE5GU1JPT1Qgc3lzdGVtIHdpdGggMzIgbG9naWNhbA0KPiA+IENQVXMsIEkgc2VlIGhh bGYgQ1BVIHRpbWUgc3BlbnQgaW4ga2VybmVsOg0KPiA+IA0KPiA+IHRvcCAtIDIwOjA4OjI4IHVw ICA1OjAxLCAgOCB1c2VycywgIGxvYWQgYXZlcmFnZTogNjIuOTcsIDU1LjgyLCA1NC41Ng0KPiA+ IFRhc2tzOiA3NTEgdG90YWwsICA4MCBydW5uaW5nLCA2NzEgc2xlZXBpbmcsICAgMCBzdG9wcGVk LCAgIDAgem9tYmllDQo+ID4gQ3B1KHMpOiAgMC4xJXVzLCA1My41JXN5LCA0Ni40JW5pLCAgMC4w JWlkLCAgMC4wJXdhLCAgMC4wJWhpLCAgMC4xJXNpLCAgMC4wJXN0DQo+ID4gTWVtOiAgNjU0MDEx MzJrIHRvdGFsLCAzNjQ1NTYwNGsgdXNlZCwgMjg5NDU1MjhrIGZyZWUsICAgICAgICAwayBidWZm ZXJzDQo+ID4gU3dhcDogICAgICAgIDBrIHRvdGFsLCAgICAgICAgMGsgdXNlZCwgICAgICAgIDBr IGZyZWUsIDMxODU4MjQ0ayBjYWNoZWQNCj4gPiANCj4gPiBUaGUgYXR0YWNoZWQgbG9ja19zdGF0 IHNob3dzIHRoYXQgcnBjYXV0aF9sb29rdXBfY3JlZGNhY2hlKCkgaGFzIHRoZQ0KPiA+IG91dHN0 YW5kaW5nIGNvbnRlbnRpb25zLg0KPiANCj4gSG1tLCB0aGUgY2FjaGUgaXMgYWxyZWFkeSB1c2lu ZyBSQ1UgZm9yIHRoZSByZWFkIGNhc2UsIGp1c3QgdGFrZXMNCj4gYSBsb2NrIGZvciB3cml0aW5n LiBPciB3aGVuIGNoZWNraW5nIGFuIGVudHJ5Lg0KPiANCj4gU28geW91IGVpdGhlciBoYXZlIGEg bG90IG9mIGxvb2t1cHMgb24gdGhlIHNhbWUgZW50cnksIG9yIHlvdSB0aHJhc2gNCj4gdGhlIGNh Y2hlLg0KPiANCj4gV2hlbiB5b3UgbG9vayB1cCB0aGUgbG9vayBhZGRyZXNzLCBpcyBpdCB0aGUg b25lIGluIHRoZSBmaXJzdCBsb29wPw0KPiBJZiB5ZXMgdGhlbiBpdCdzIHRoZSBmaXJzdC4NCj4g DQo+IEknbSBub3QgZnVsbHkgc3VyZSB3aHkgaXQgdGFrZXMgdGhlIGxvY2sgaW4gdGhlIHJlYWQg Y2FzZSBhbnl3YXlzLiBOZWl0aGVyDQo+IHRlc3RfYml0IG5vciBhdG9taWNfaW5jIHVuZGVyIFJD VSBzaG91bGQgbmVlZCBhIGxvY2ssIHNvIEkgc3VwcG9zZQ0KPiBpdCBjYW4gYmUganVzdCByZW1v dmVkLg0KPiANCj4gMzY3ICAgICAgICAgICAgICAgICBzcGluX2xvY2soJmNhY2hlLT5sb2NrKTsN Cj4gMzY4ICAgICAgICAgICAgICAgICBpZiAodGVzdF9iaXQoUlBDQVVUSF9DUkVEX0hBU0hFRCwg JmVudHJ5LT5jcl9mbGFncykgPT0gMCkgew0KPiAzNjkgICAgICAgICAgICAgICAgICAgICAgICAg c3Bpbl91bmxvY2soJmNhY2hlLT5sb2NrKTsNCj4gMzcwICAgICAgICAgICAgICAgICAgICAgICAg IGNvbnRpbnVlOw0KPiAzNzEgICAgICAgICAgICAgICAgIH0NCj4gMzcyICAgICAgICAgICAgICAg ICBjcmVkID0gZ2V0X3JwY2NyZWQoZW50cnkpOw0KPiAzNzMgICAgICAgICAgICAgICAgIHNwaW5f dW5sb2NrKCZjYWNoZS0+bG9jayk7DQoNCkhpIEFuZGksDQoNCkkgYmVsaWV2ZSB0aGF0IHRoZSBz cGlubG9jayBpcyB0aGVyZSBpbiBvcmRlciB0byBlbnN1cmUgdGhhdCB0aGUNCnRlc3RfYml0KCkg YW5kIGdldF9ycGNjcmVkKCkgYXJlIGF0b21pYyB3LnIudC4gZ2FyYmFnZSBjb2xsZWN0aW9uLg0K DQpXaGF0IHdlIGNvdWxkIHBlcmhhcHMgZG8gaXMgY3JlYXRlIGEgbG9jay1sZXNzIGZhc3QgcGF0 aC4gRm9yIGluc3RhbmNlDQp1c2UgYXRvbWljX2luY191bmxlc3NfemVybyB0byBncmFiIGEgcmVm ZXJlbmNlIGluIGNhc2VzIHdoZXJlIHdlIGtub3cNCnRoYXQgdGhlIGdhcmJhZ2UgY29sbGVjdG9y IGNhbid0IGFjdCBvbiB0aGlzIGNyZWQuIFdlJ2Qgc3RpbGwgbmVlZCB0bw0KdGVzdCBmb3IgUlBD QVVUSF9DUkVEX0hBU0hFRCwgYnV0IHNpbmNlIHdlJ2QgYmUgaG9sZGluZyBhIHJlZmVyZW5jZSB3 ZQ0KY291bGQgZG8gdGhhdCBvdXRzaWRlIHRoZSBsb2NrLi4uDQoNCkNoZWVycw0KICBUcm9uZA0K LS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRB cHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0KDQo= ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-06-25 1:21 ` Myklebust, Trond @ 2012-06-25 2:45 ` Andi Kleen 0 siblings, 0 replies; 11+ messages in thread From: Andi Kleen @ 2012-06-25 2:45 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Fengguang Wu, Linux-NFS > I believe that the spinlock is there in order to ensure that the > test_bit() and get_rpccred() are atomic w.r.t. garbage collection. Normally RCU should ensure that already with minor care. Only drop after a quiescent period and when the reference is still 0. And drop always when the refcount goes to 0. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-06-24 21:34 ` Andi Kleen 2012-06-25 1:21 ` Myklebust, Trond @ 2012-06-25 2:42 ` Fengguang Wu 2012-06-27 18:03 ` Andi Kleen 1 sibling, 1 reply; 11+ messages in thread From: Fengguang Wu @ 2012-06-25 2:42 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux-NFS On Sun, Jun 24, 2012 at 02:34:17PM -0700, Andi Kleen wrote: > On Sat, Jun 23, 2012 at 08:26:04PM +0800, Fengguang Wu wrote: > > Hi, > > > > When doing parallel kernel builds on an NFSROOT system with 32 logical > > CPUs, I see half CPU time spent in kernel: > > > > top - 20:08:28 up 5:01, 8 users, load average: 62.97, 55.82, 54.56 > > Tasks: 751 total, 80 running, 671 sleeping, 0 stopped, 0 zombie > > Cpu(s): 0.1%us, 53.5%sy, 46.4%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st > > Mem: 65401132k total, 36455604k used, 28945528k free, 0k buffers > > Swap: 0k total, 0k used, 0k free, 31858244k cached > > > > The attached lock_stat shows that rpcauth_lookup_credcache() has the > > outstanding contentions. > > Hmm, the cache is already using RCU for the read case, just takes > a lock for writing. Or when checking an entry. > > So you either have a lot of lookups on the same entry, or you thrash > the cache. > > When you look up the look address, is it the one in the first loop? > If yes then it's the first. > > I'm not fully sure why it takes the lock in the read case anyways. Neither > test_bit nor atomic_inc under RCU should need a lock, so I suppose > it can be just removed. > > 367 spin_lock(&cache->lock); > 368 if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) { > 369 spin_unlock(&cache->lock); > 370 continue; > 371 } > 372 cred = get_rpccred(entry); > 373 spin_unlock(&cache->lock); > > and get_rpccred is just > > 154 static inline > 155 struct rpc_cred * get_rpccred(struct rpc_cred *cred) > 156 { > 157 atomic_inc(&cred->cr_count); > 158 return cred; > 159 } > > Can you try this patch? Thank you! This patch brings %sys down to 24%, a pretty large improvement! root@snb /home/wfg# vmstat 3 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 78 0 0 21294520 0 8822056 0 0 0 0 46 56 7 3 90 0 71 0 0 21312104 0 8832776 0 0 0 0 32831 14960 76 24 0 0 76 0 0 21203800 0 8844552 0 0 0 0 32892 15100 75 25 0 0 73 0 0 21019004 0 8856028 0 0 0 0 32837 14770 75 25 0 0 72 0 0 21021084 0 8865996 0 0 0 0 32841 14901 76 24 0 0 76 0 0 20907260 0 8876084 0 0 0 0 32678 14643 76 24 0 0 68 0 0 20936304 0 8886732 0 0 0 0 32724 14858 78 22 0 0 72 0 0 20889164 0 8897520 0 0 0 0 32732 14813 76 24 0 0 81 0 0 20742408 0 8907368 0 0 0 0 32745 14669 77 23 0 0 70 0 0 20662060 0 8917840 0 0 0 0 32748 14751 76 24 0 0 74 0 0 21035616 0 8927836 0 0 0 0 32781 14856 77 23 0 0 ^C It completely removes the function from lock_stat. Now the new contented functions are class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- &(&dentry->d_lock)->rlock: 197445146 201224451 0.10 2777.33 461033150.94 1345525784 6033591257 0.08 321.13 1158851748.52 ------------------------- &(&dentry->d_lock)->rlock 111609824 [<ffffffff811609f1>] dput+0x33/0x169 &(&dentry->d_lock)->rlock 18819236 [<ffffffff81156b86>] path_get+0x2c/0x3c &(&dentry->d_lock)->rlock 17621335 [<ffffffff81156c20>] unlazy_walk+0x8a/0x1ac &(&dentry->d_lock)->rlock 34148964 [<ffffffff81160233>] dget_parent+0x4b/0xaf ------------------------- &(&dentry->d_lock)->rlock 14495017 [<ffffffff81156b86>] path_get+0x2c/0x3c &(&dentry->d_lock)->rlock 89313409 [<ffffffff811609f1>] dput+0x33/0x169 &(&dentry->d_lock)->rlock 19487816 [<ffffffff81156c20>] unlazy_walk+0x8a/0x1ac &(&dentry->d_lock)->rlock 59151508 [<ffffffff81160233>] dget_parent+0x4b/0xaf ............................................................................................................................................................................................... &sb->s_type->i_lock_key#1: 24313829 24330907 0.09 185.01 22864745.70 572924538 1422754983 0.08 244.15 852837006.69 ------------------------- &sb->s_type->i_lock_key#1 24263909 [<ffffffff81244ad9>] nfs_do_access+0x2b/0x350 &sb->s_type->i_lock_key#1 1438 [<ffffffff8125069d>] nfs_unblock_sillyrename+0x9b/0xce &sb->s_type->i_lock_key#1 24986 [<ffffffff81247049>] nfs_file_set_open_context+0x3a/0x62 &sb->s_type->i_lock_key#1 18689 [<ffffffff813f8285>] _atomic_dec_and_lock+0x39/0x50 ------------------------- &sb->s_type->i_lock_key#1 24280596 [<ffffffff81244ad9>] nfs_do_access+0x2b/0x350 &sb->s_type->i_lock_key#1 578 [<ffffffff8125069d>] nfs_unblock_sillyrename+0x9b/0xce &sb->s_type->i_lock_key#1 8602 [<ffffffff813f8285>] _atomic_dec_and_lock+0x39/0x50 &sb->s_type->i_lock_key#1 15649 [<ffffffff81247186>] nfs_release+0x47/0x93 Thanks, Fengguang > commit fa1eef2ec22f2fc31e0381b864044fbb753dd572 > Author: Andi Kleen <ak@linux.intel.com> > Date: Sun Jun 24 14:31:06 2012 -0700 > > sunrpc: remove useless spinlocks in credential lookup path > > Fengguang noticed that rpcauth_lookup_credcache has high lock contention > on the nfs server when doing kernel builds on nfsroot. > > There is no reason to take the spinlock in the read loop: the RCU > makes sure the object does not go away, and either test_bit > nor atomic_inc in get_rpccred() needs a lock. > > So just remove the spinlock in the read lookup path. > > Reported-by: Fengguang Wu > Signed-off-by: Andi Kleen <ak@linux.intel.com> > > diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c > index 727e506..13c2b58 100644 > --- a/net/sunrpc/auth.c > +++ b/net/sunrpc/auth.c > @@ -364,13 +364,10 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred, > hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) { > if (!entry->cr_ops->crmatch(acred, entry, flags)) > continue; > - spin_lock(&cache->lock); > if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) { > - spin_unlock(&cache->lock); > continue; > } > cred = get_rpccred(entry); > - spin_unlock(&cache->lock); > break; > } > rcu_read_unlock(); ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-06-25 2:42 ` Fengguang Wu @ 2012-06-27 18:03 ` Andi Kleen 2012-06-27 18:36 ` Myklebust, Trond 2012-07-05 13:11 ` Fengguang Wu 0 siblings, 2 replies; 11+ messages in thread From: Andi Kleen @ 2012-06-27 18:03 UTC (permalink / raw) To: Fengguang Wu; +Cc: Linux-NFS, Trond.Myklebust > > Can you try this patch? > > Thank you! This patch brings %sys down to 24%, a pretty large improvement! I redid the patch now and fixed the race Trond pointed out. Fengguang, can you retest please? Trond, does it look good now? --- From: Andi Kleen <ak@linux.intel.com> Date: Sun, 24 Jun 2012 14:31:06 -0700 Subject: [PATCH] sunrpc: Improve spinlocks in credential lookup path v2 Fengguang noticed that rpcauth_lookup_credcache has high lock contention on the nfs server when doing kernel builds on nfsroot. The only reason the spinlock is needed in the lockup path is to synchronize with the garbage collector. First the object itself is stable because it's RCU'ed. So now we use an atomic_add_return and check for the 0 reference count case. When the count was 0 assume the garbage collector is active on this entry and take the lock and recheck. Otherwise the path is lockless. v2: Add GC race check based on Trond's feedback Cc: Trond.Myklebust@netapp.com Reported-by: Fengguang Wu Signed-off-by: Andi Kleen <ak@linux.intel.com> --- net/sunrpc/auth.c | 19 +++++++++++++++---- 1 files changed, 15 insertions(+), 4 deletions(-) diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index 727e506..645769e 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -364,13 +364,24 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred, hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) { if (!entry->cr_ops->crmatch(acred, entry, flags)) continue; - spin_lock(&cache->lock); if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) { - spin_unlock(&cache->lock); continue; } - cred = get_rpccred(entry); - spin_unlock(&cache->lock); + cred = entry; + if (atomic_add_return(1, &cred->cr_count) == 1) { + /* + * When the count was 0 we could race with the GC. + * Take the lock and recheck. + */ + spin_lock(&cache->lock); + if (!test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags)) { + /* Lost the race. */ + atomic_dec(&cred->cr_count); + spin_unlock(&cache->lock); + continue; + } + spin_unlock(&cache->lock); + } break; } rcu_read_unlock(); -- 1.7.7 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-06-27 18:03 ` Andi Kleen @ 2012-06-27 18:36 ` Myklebust, Trond 2012-07-05 13:11 ` Fengguang Wu 1 sibling, 0 replies; 11+ messages in thread From: Myklebust, Trond @ 2012-06-27 18:36 UTC (permalink / raw) To: Andi Kleen; +Cc: Fengguang Wu, Linux-NFS T24gV2VkLCAyMDEyLTA2LTI3IGF0IDExOjAzIC0wNzAwLCBBbmRpIEtsZWVuIHdyb3RlOg0KPiA+ ID4gQ2FuIHlvdSB0cnkgdGhpcyBwYXRjaD8NCj4gPiANCj4gPiBUaGFuayB5b3UhIFRoaXMgcGF0 Y2ggYnJpbmdzICVzeXMgZG93biB0byAyNCUsIGEgcHJldHR5IGxhcmdlIGltcHJvdmVtZW50IQ0K PiANCj4gSSByZWRpZCB0aGUgcGF0Y2ggbm93IGFuZCBmaXhlZCB0aGUgcmFjZSBUcm9uZCBwb2lu dGVkIG91dC4NCj4gDQo+IEZlbmdndWFuZywgY2FuIHlvdSByZXRlc3QgcGxlYXNlPyBUcm9uZCwg ZG9lcyBpdCBsb29rIGdvb2Qgbm93Pw0KDQpIaSBBbmRpLA0KDQpUaGVyZSBpcyBzdGlsbCBhIHJh Y2UgdGhlcmU6IGlmIHlvdXIgdGhyZWFkIGRvZXMgYXRvbWljX2FkZF9yZXR1cm4oKSA9PQ0KMSwg dGhlbiBhIHNlY29uZCB0aHJlYWQgY2FuIHNjYW4sIGRvIHRoZSBhdG9taWNfYWRkX3JldHVybigp IGFuZCBnZXQgYQ0KdmFsdWUgb2YgMiwgaW4gd2hpY2ggY2FzZSBpdCB3b24ndCBjaGVjayBmb3Ig SEFTSEVELi4uDQoNCkNoZWVycw0KICBUcm9uZA0KDQo+IC0tLQ0KPiANCj4gRnJvbTogQW5kaSBL bGVlbiA8YWtAbGludXguaW50ZWwuY29tPg0KPiBEYXRlOiBTdW4sIDI0IEp1biAyMDEyIDE0OjMx OjA2IC0wNzAwDQo+IFN1YmplY3Q6IFtQQVRDSF0gc3VucnBjOiBJbXByb3ZlIHNwaW5sb2NrcyBp biBjcmVkZW50aWFsIGxvb2t1cCBwYXRoIHYyDQo+IA0KPiBGZW5nZ3Vhbmcgbm90aWNlZCB0aGF0 IHJwY2F1dGhfbG9va3VwX2NyZWRjYWNoZSBoYXMgaGlnaCBsb2NrIGNvbnRlbnRpb24NCj4gb24g dGhlIG5mcyBzZXJ2ZXIgd2hlbiBkb2luZyBrZXJuZWwgYnVpbGRzIG9uIG5mc3Jvb3QuDQo+IA0K PiBUaGUgb25seSByZWFzb24gdGhlIHNwaW5sb2NrIGlzIG5lZWRlZCBpbiB0aGUgbG9ja3VwIHBh dGggaXMgdG8NCj4gc3luY2hyb25pemUgd2l0aCB0aGUgZ2FyYmFnZSBjb2xsZWN0b3IuIEZpcnN0 IHRoZSBvYmplY3QgaXRzZWxmDQo+IGlzIHN0YWJsZSBiZWNhdXNlIGl0J3MgUkNVJ2VkLg0KPiAN Cj4gU28gbm93IHdlIHVzZSBhbiBhdG9taWNfYWRkX3JldHVybiBhbmQgY2hlY2sgZm9yIHRoZSAw IHJlZmVyZW5jZQ0KPiBjb3VudCBjYXNlLiAgV2hlbiB0aGUgY291bnQgd2FzIDAgYXNzdW1lIHRo ZSBnYXJiYWdlIGNvbGxlY3Rvcg0KPiBpcyBhY3RpdmUgb24gdGhpcyBlbnRyeSBhbmQgdGFrZSB0 aGUgbG9jayBhbmQgcmVjaGVjay4NCj4gDQo+IE90aGVyd2lzZSB0aGUgcGF0aCBpcyBsb2NrbGVz cy4NCj4gDQo+IHYyOiBBZGQgR0MgcmFjZSBjaGVjayBiYXNlZCBvbiBUcm9uZCdzIGZlZWRiYWNr DQo+IENjOiBUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbQ0KPiBSZXBvcnRlZC1ieTogRmVuZ2d1 YW5nIFd1DQo+IFNpZ25lZC1vZmYtYnk6IEFuZGkgS2xlZW4gPGFrQGxpbnV4LmludGVsLmNvbT4N Cj4gLS0tDQo+ICBuZXQvc3VucnBjL2F1dGguYyB8ICAgMTkgKysrKysrKysrKysrKysrLS0tLQ0K PiAgMSBmaWxlcyBjaGFuZ2VkLCAxNSBpbnNlcnRpb25zKCspLCA0IGRlbGV0aW9ucygtKQ0KPiAN Cj4gZGlmZiAtLWdpdCBhL25ldC9zdW5ycGMvYXV0aC5jIGIvbmV0L3N1bnJwYy9hdXRoLmMNCj4g aW5kZXggNzI3ZTUwNi4uNjQ1NzY5ZSAxMDA2NDQNCj4gLS0tIGEvbmV0L3N1bnJwYy9hdXRoLmMN Cj4gKysrIGIvbmV0L3N1bnJwYy9hdXRoLmMNCj4gQEAgLTM2NCwxMyArMzY0LDI0IEBAIHJwY2F1 dGhfbG9va3VwX2NyZWRjYWNoZShzdHJ1Y3QgcnBjX2F1dGggKmF1dGgsIHN0cnVjdCBhdXRoX2Ny ZWQgKiBhY3JlZCwNCj4gIAlobGlzdF9mb3JfZWFjaF9lbnRyeV9yY3UoZW50cnksIHBvcywgJmNh Y2hlLT5oYXNodGFibGVbbnJdLCBjcl9oYXNoKSB7DQo+ICAJCWlmICghZW50cnktPmNyX29wcy0+ Y3JtYXRjaChhY3JlZCwgZW50cnksIGZsYWdzKSkNCj4gIAkJCWNvbnRpbnVlOw0KPiAtCQlzcGlu X2xvY2soJmNhY2hlLT5sb2NrKTsNCj4gIAkJaWYgKHRlc3RfYml0KFJQQ0FVVEhfQ1JFRF9IQVNI RUQsICZlbnRyeS0+Y3JfZmxhZ3MpID09IDApIHsNCj4gLQkJCXNwaW5fdW5sb2NrKCZjYWNoZS0+ bG9jayk7DQo+ICAJCQljb250aW51ZTsNCj4gIAkJfQ0KPiAtCQljcmVkID0gZ2V0X3JwY2NyZWQo ZW50cnkpOw0KPiAtCQlzcGluX3VubG9jaygmY2FjaGUtPmxvY2spOw0KPiArCQljcmVkID0gZW50 cnk7DQo+ICsJCWlmIChhdG9taWNfYWRkX3JldHVybigxLCAmY3JlZC0+Y3JfY291bnQpID09IDEp IHsNCj4gKwkJCS8qDQo+ICsJCQkgKiBXaGVuIHRoZSBjb3VudCB3YXMgMCB3ZSBjb3VsZCByYWNl IHdpdGggdGhlIEdDLg0KPiArCQkJICogVGFrZSB0aGUgbG9jayBhbmQgcmVjaGVjay4NCj4gKwkJ CSAqLw0KPiArCQkJc3Bpbl9sb2NrKCZjYWNoZS0+bG9jayk7DQo+ICsJCQlpZiAoIXRlc3RfYml0 KFJQQ0FVVEhfQ1JFRF9IQVNIRUQsICZlbnRyeS0+Y3JfZmxhZ3MpKSB7DQo+ICsJCQkJLyogTG9z dCB0aGUgcmFjZS4gKi8NCj4gKwkJCQlhdG9taWNfZGVjKCZjcmVkLT5jcl9jb3VudCk7DQo+ICsJ CQkJc3Bpbl91bmxvY2soJmNhY2hlLT5sb2NrKTsNCj4gKwkJCQljb250aW51ZTsNCj4gKwkJCX0N Cj4gKwkJCXNwaW5fdW5sb2NrKCZjYWNoZS0+bG9jayk7DQo+ICsJCX0NCj4gIAkJYnJlYWs7DQo+ ICAJfQ0KPiAgCXJjdV9yZWFkX3VubG9jaygpOw0KDQotLSANClRyb25kIE15a2xlYnVzdA0KTGlu dXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5ldEFwcA0KVHJvbmQuTXlrbGVidXN0QG5ldGFw cC5jb20NCnd3dy5uZXRhcHAuY29tDQoNCg== ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-06-27 18:03 ` Andi Kleen 2012-06-27 18:36 ` Myklebust, Trond @ 2012-07-05 13:11 ` Fengguang Wu 2012-07-05 15:05 ` Malahal Naineni 2012-07-05 16:27 ` Andi Kleen 1 sibling, 2 replies; 11+ messages in thread From: Fengguang Wu @ 2012-07-05 13:11 UTC (permalink / raw) To: Andi Kleen; +Cc: Linux-NFS, Trond.Myklebust On Wed, Jun 27, 2012 at 11:03:53AM -0700, Andi Kleen wrote: > > > Can you try this patch? > > > > Thank you! This patch brings %sys down to 24%, a pretty large improvement! > > I redid the patch now and fixed the race Trond pointed out. > > Fengguang, can you retest please? Trond, does it look good now? Andi, this patch performs equally well! And it runs stable for over a week. wfg@snb ~% vmstat 3 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 78 0 0 14084304 0 15489456 0 0 0 0 2 5 43 17 39 0 76 0 0 14077684 0 15501356 0 0 0 0 32553 14948 76 24 0 0 100 0 0 14079660 0 15513388 0 0 0 0 32562 14961 75 25 0 0 120 0 0 12934440 0 15523132 0 0 0 0 32780 15917 74 26 0 0 118 0 0 12845200 0 15531784 0 0 0 0 32495 14932 78 22 0 0 108 0 0 13004212 0 15543092 0 0 0 0 32541 15020 78 22 0 0 120 0 0 13114864 0 15551964 0 0 0 0 32502 14900 78 22 0 0 100 0 0 13305164 0 15562040 0 0 0 0 32497 14980 77 23 0 0 95 0 0 13094524 0 15573836 0 0 0 0 32551 14992 78 22 0 0 Now the most contented locks are class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acqui sitions holdtime-min holdtime-max holdtime-total ------------------------------------------------------------------------------------------------------------------------------------------ ----------------------------------------------------- &(&zone->lru_lock)->rlock: 766782640 767439846 0.09 359.51 1315341516.73 6714323642 1003 4569711 0.10 13565.21 22972522899.23 ------------------------- &(&zone->lru_lock)->rlock 403547412 [<ffffffff8110dcf7>] release_pages+0xd9/0x197 &(&zone->lru_lock)->rlock 363659746 [<ffffffff8110de31>] pagevec_lru_move_fn+0x7c/0xd1 &(&zone->lru_lock)->rlock 232683 [<ffffffff8110d965>] __page_cache_release.part.8+0x47/0xcc &(&zone->lru_lock)->rlock 4 [<ffffffff811126c8>] isolate_lru_page+0x66/0x118 ------------------------- &(&zone->lru_lock)->rlock 402927785 [<ffffffff8110dcf7>] release_pages+0xd9/0x197 &(&zone->lru_lock)->rlock 364420999 [<ffffffff8110de31>] pagevec_lru_move_fn+0x7c/0xd1 &(&zone->lru_lock)->rlock 91058 [<ffffffff8110d965>] __page_cache_release.part.8+0x47/0xcc &(&zone->lru_lock)->rlock 3 [<ffffffff8110e269>] add_page_to_unevictable_list+0x43/0x90 .......................................................................................................................................... ..................................................... cpufreq_driver_lock: 443785151 443792591 0.14 60.87 7083609116.92 511710115 51 3985181 0.11 31.25 303483059.29 ------------------- cpufreq_driver_lock 443792591 [<ffffffff817d5eef>] cpufreq_cpu_get+0x22/0x9f ------------------- cpufreq_driver_lock 443792591 [<ffffffff817d5eef>] cpufreq_cpu_get+0x22/0x9f .......................................................................................................................................... ..................................................... rcu_node_level_1: 333497521 345451048 0.10 38.72 742394589.16 676477679 175 3637221 0.00 24.74 698337725.87 ---------------- rcu_node_level_1 320813377 [<ffffffff810d6bc2>] rcu_report_qs_rdp.isra.26+0x25/0x78 rcu_node_level_1 23968287 [<ffffffff810d6ac1>] force_qs_rnp+0x3b/0x117 rcu_node_level_1 35937 [<ffffffff810d69d8>] rcu_report_qs_rnp+0x1ed/0x29b rcu_node_level_1 633447 [<ffffffff810d6668>] rcu_start_gp+0x158/0x2db ---------------- rcu_node_level_1 278686102 [<ffffffff810d6bc2>] rcu_report_qs_rdp.isra.26+0x25/0x78 rcu_node_level_1 31540615 [<ffffffff810d6ac1>] force_qs_rnp+0x3b/0x117 rcu_node_level_1 634856 [<ffffffff810d69d8>] rcu_report_qs_rnp+0x1ed/0x29b rcu_node_level_1 34589475 [<ffffffff810d6668>] rcu_start_gp+0x158/0x2db Thanks, Fengguang > --- > > From: Andi Kleen <ak@linux.intel.com> > Date: Sun, 24 Jun 2012 14:31:06 -0700 > Subject: [PATCH] sunrpc: Improve spinlocks in credential lookup path v2 > > Fengguang noticed that rpcauth_lookup_credcache has high lock contention > on the nfs server when doing kernel builds on nfsroot. > > The only reason the spinlock is needed in the lockup path is to > synchronize with the garbage collector. First the object itself > is stable because it's RCU'ed. > > So now we use an atomic_add_return and check for the 0 reference > count case. When the count was 0 assume the garbage collector > is active on this entry and take the lock and recheck. > > Otherwise the path is lockless. > > v2: Add GC race check based on Trond's feedback > Cc: Trond.Myklebust@netapp.com > Reported-by: Fengguang Wu > Signed-off-by: Andi Kleen <ak@linux.intel.com> > --- > net/sunrpc/auth.c | 19 +++++++++++++++---- > 1 files changed, 15 insertions(+), 4 deletions(-) > > diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c > index 727e506..645769e 100644 > --- a/net/sunrpc/auth.c > +++ b/net/sunrpc/auth.c > @@ -364,13 +364,24 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred, > hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) { > if (!entry->cr_ops->crmatch(acred, entry, flags)) > continue; > - spin_lock(&cache->lock); > if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) { > - spin_unlock(&cache->lock); > continue; > } > - cred = get_rpccred(entry); > - spin_unlock(&cache->lock); > + cred = entry; > + if (atomic_add_return(1, &cred->cr_count) == 1) { > + /* > + * When the count was 0 we could race with the GC. > + * Take the lock and recheck. > + */ > + spin_lock(&cache->lock); > + if (!test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags)) { > + /* Lost the race. */ > + atomic_dec(&cred->cr_count); > + spin_unlock(&cache->lock); > + continue; > + } > + spin_unlock(&cache->lock); > + } > break; > } > rcu_read_unlock(); > -- > 1.7.7 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-07-05 13:11 ` Fengguang Wu @ 2012-07-05 15:05 ` Malahal Naineni 2012-07-05 16:29 ` Andi Kleen 2012-07-05 16:27 ` Andi Kleen 1 sibling, 1 reply; 11+ messages in thread From: Malahal Naineni @ 2012-07-05 15:05 UTC (permalink / raw) To: Fengguang Wu; +Cc: Andi Kleen, Linux-NFS, Trond.Myklebust Fengguang Wu [fengguang.wu@intel.com] wrote: > > diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c > > index 727e506..645769e 100644 > > --- a/net/sunrpc/auth.c > > +++ b/net/sunrpc/auth.c > > @@ -364,13 +364,24 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred, > > hlist_for_each_entry_rcu(entry, pos, &cache->hashtable[nr], cr_hash) { > > if (!entry->cr_ops->crmatch(acred, entry, flags)) > > continue; > > - spin_lock(&cache->lock); > > if (test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags) == 0) { > > - spin_unlock(&cache->lock); > > continue; > > } > > - cred = get_rpccred(entry); > > - spin_unlock(&cache->lock); > > + cred = entry; I know very little about this code, so take this with grain of salt! The cred symbol is set above but there is no guaranty that you will use it. If you don't win, you should set it to NULL. > > + if (atomic_add_return(1, &cred->cr_count) == 1) { > > + /* > > + * When the count was 0 we could race with the GC. > > + * Take the lock and recheck. > > + */ > > + spin_lock(&cache->lock); > > + if (!test_bit(RPCAUTH_CRED_HASHED, &entry->cr_flags)) { > > + /* Lost the race. */ > > + atomic_dec(&cred->cr_count); > > + spin_unlock(&cache->lock); > > + continue; If you execute the above "continue" statement, you will have stale "cred" where the original code may end up with NULL cred. I don't know if that can happen really though. > > + } > > + spin_unlock(&cache->lock); > > + } > > break; Setting cred to entry just before the above break and using entry in the above code should be fine, I believe. > > } > > rcu_read_unlock(); > > -- > > 1.7.7 > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-07-05 15:05 ` Malahal Naineni @ 2012-07-05 16:29 ` Andi Kleen 0 siblings, 0 replies; 11+ messages in thread From: Andi Kleen @ 2012-07-05 16:29 UTC (permalink / raw) To: Fengguang Wu, Linux-NFS, Trond.Myklebust > I know very little about this code, so take this with grain of salt! > The cred symbol is set above but there is no guaranty that you will use > it. If you don't win, you should set it to NULL. Yes you're probably right. Thanks. -Andi ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: rpcauth_lookup_credcache() lock contentions 2012-07-05 13:11 ` Fengguang Wu 2012-07-05 15:05 ` Malahal Naineni @ 2012-07-05 16:27 ` Andi Kleen 1 sibling, 0 replies; 11+ messages in thread From: Andi Kleen @ 2012-07-05 16:27 UTC (permalink / raw) To: Fengguang Wu; +Cc: Linux-NFS, Trond.Myklebust On Thu, Jul 05, 2012 at 09:11:05PM +0800, Fengguang Wu wrote: > On Wed, Jun 27, 2012 at 11:03:53AM -0700, Andi Kleen wrote: > > > > Can you try this patch? > > > > > > Thank you! This patch brings %sys down to 24%, a pretty large improvement! > > > > I redid the patch now and fixed the race Trond pointed out. > > > > Fengguang, can you retest please? Trond, does it look good now? > > Andi, this patch performs equally well! And it runs stable for over a week. Thanks for testing. But it still has the race Trond pointed out. Probably need to rely on write ordering with the uncached flag. -Andi ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2012-07-05 16:29 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20120623122604.GA10887@localhost>
2012-06-23 15:07 ` rpcauth_lookup_credcache() lock contentions Fengguang Wu
2012-06-24 21:34 ` Andi Kleen
2012-06-25 1:21 ` Myklebust, Trond
2012-06-25 2:45 ` Andi Kleen
2012-06-25 2:42 ` Fengguang Wu
2012-06-27 18:03 ` Andi Kleen
2012-06-27 18:36 ` Myklebust, Trond
2012-07-05 13:11 ` Fengguang Wu
2012-07-05 15:05 ` Malahal Naineni
2012-07-05 16:29 ` Andi Kleen
2012-07-05 16:27 ` Andi Kleen
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.