* Another possible race in put_rpccred / rpcauth_unhash_cred
@ 2009-11-18 0:56 Neil Brown
[not found] ` <19203.18009.945354.446817-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
0 siblings, 1 reply; 3+ messages in thread
From: Neil Brown @ 2009-11-18 0:56 UTC (permalink / raw)
To: Trond Myklebust; +Cc: linux-nfs
Hi again Trond,
I've been looking at a customer report of an oops in
rpcauth_lookup_credcache.
https://bugzilla.novell.com/show_bug.cgi?id=547137
It appears that one of the hash chains in the cache has become
corrupted.
In trying to guess how this might happen, the one possibility I have
found is that it might be possible for hlist_del_rcu to be called on
an rpc_cred that is not hashed. This certainly could cause list
corruption.
put_rpccred checks if the cred is hashed (RPCAUTH_CRED_HASHED) and if
not, it takes a lock and then - if the cred it not uptodate -
calls rpcauth_unhash_cred which will call hlist_del_rcu.
If some other thread unhashed the cred while this thread was waiting
for rpc_credcache_lock, we could get list corruption.
The kernel in question did not have commit 5f707eb429 which claims to
fix a race in much the same place, however I don't think that patch
would fix this problem.
I think the simplest fix would be to use hlist_del_init_rcu in place
of hlist_del_rcu as shown below. That version protects again
deleting the same element multiple times.
Do you agree with this fix?
Thanks,
NeilBrown
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 54a4e04..323eff6 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -118,7 +118,7 @@ static DEFINE_SPINLOCK(rpc_credcache_lock);
static void
rpcauth_unhash_cred_locked(struct rpc_cred *cred)
{
- hlist_del_rcu(&cred->cr_hash);
+ hlist_del_init_rcu(&cred->cr_hash);
smp_mb__before_clear_bit();
clear_bit(RPCAUTH_CRED_HASHED, &cred->cr_flags);
}
^ permalink raw reply related [flat|nested] 3+ messages in thread[parent not found: <19203.18009.945354.446817-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>]
* Re: Another possible race in put_rpccred / rpcauth_unhash_cred [not found] ` <19203.18009.945354.446817-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> @ 2009-11-19 15:33 ` Trond Myklebust 2009-11-20 16:48 ` Trond Myklebust 1 sibling, 0 replies; 3+ messages in thread From: Trond Myklebust @ 2009-11-19 15:33 UTC (permalink / raw) To: Neil Brown; +Cc: linux-nfs On Wed, 2009-11-18 at 11:56 +1100, Neil Brown wrote: > Hi again Trond, > > I've been looking at a customer report of an oops in > rpcauth_lookup_credcache. > https://bugzilla.novell.com/show_bug.cgi?id=547137 > > It appears that one of the hash chains in the cache has become > corrupted. > > In trying to guess how this might happen, the one possibility I have > found is that it might be possible for hlist_del_rcu to be called on > an rpc_cred that is not hashed. This certainly could cause list > corruption. > > put_rpccred checks if the cred is hashed (RPCAUTH_CRED_HASHED) and if > not, it takes a lock and then - if the cred it not uptodate - > calls rpcauth_unhash_cred which will call hlist_del_rcu. > If some other thread unhashed the cred while this thread was waiting > for rpc_credcache_lock, we could get list corruption. > > The kernel in question did not have commit 5f707eb429 which claims to > fix a race in much the same place, however I don't think that patch > would fix this problem. > > I think the simplest fix would be to use hlist_del_init_rcu in place > of hlist_del_rcu as shown below. That version protects again > deleting the same element multiple times. > > Do you agree with this fix? > > Thanks, > NeilBrown > > > > diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c > index 54a4e04..323eff6 100644 > --- a/net/sunrpc/auth.c > +++ b/net/sunrpc/auth.c > @@ -118,7 +118,7 @@ static DEFINE_SPINLOCK(rpc_credcache_lock); > static void > rpcauth_unhash_cred_locked(struct rpc_cred *cred) > { > - hlist_del_rcu(&cred->cr_hash); > + hlist_del_init_rcu(&cred->cr_hash); > smp_mb__before_clear_bit(); > clear_bit(RPCAUTH_CRED_HASHED, &cred->cr_flags); > } Hmm... Why not simply convert that 'clear_bit()' into a 'test_and_clear_bit()' and have it protect the hlist_del_rcu? I can't see anything that would break if we do this. There doesn't seem to be anything that explicitly depends on the ordering of those two operations... Cheers Trond ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Another possible race in put_rpccred / rpcauth_unhash_cred [not found] ` <19203.18009.945354.446817-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org> 2009-11-19 15:33 ` Trond Myklebust @ 2009-11-20 16:48 ` Trond Myklebust 1 sibling, 0 replies; 3+ messages in thread From: Trond Myklebust @ 2009-11-20 16:48 UTC (permalink / raw) To: Neil Brown; +Cc: linux-nfs On Wed, 2009-11-18 at 11:56 +1100, Neil Brown wrote: > Hi again Trond, > > I've been looking at a customer report of an oops in > rpcauth_lookup_credcache. > https://bugzilla.novell.com/show_bug.cgi?id=547137 > > It appears that one of the hash chains in the cache has become > corrupted. How about the following patch? Cheers Trond -------------------------------------------------------------------------- RPC: Fix two potential races in put_rpccred From: Trond Myklebust <Trond.Myklebust@netapp.com> It is possible for rpcauth_destroy_credcache() to cause the rpc credentials to be unhashed while put_rpccred is waiting for the rpc_credcache_lock on another cpu. Should this happen, then we can end up calling hlist_del_rcu(&cred->cr_hash) a second time, thus causing list corruption. Should the credential actually be hashed, it is possible for rpcauth_lookup_credcache to find and reference it before we get round to unhashing it. We should therefore check for whether the credential is referenced after we've unhashed it. See https://bugzilla.novell.com/show_bug.cgi?id=547137 Reported-by: Neil Brown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> --- net/sunrpc/auth.c | 39 +++++++++++++++++++++++---------------- 1 files changed, 23 insertions(+), 16 deletions(-) diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c index 54a4e04..7ee6f7e 100644 --- a/net/sunrpc/auth.c +++ b/net/sunrpc/auth.c @@ -123,16 +123,19 @@ rpcauth_unhash_cred_locked(struct rpc_cred *cred) clear_bit(RPCAUTH_CRED_HASHED, &cred->cr_flags); } -static void +static int rpcauth_unhash_cred(struct rpc_cred *cred) { spinlock_t *cache_lock; + int ret; cache_lock = &cred->cr_auth->au_credcache->lock; spin_lock(cache_lock); - if (atomic_read(&cred->cr_count) == 0) + ret = atomic_read(&cred->cr_count) == 0; + if (ret) rpcauth_unhash_cred_locked(cred); spin_unlock(cache_lock); + return ret; } /* @@ -446,31 +449,35 @@ void put_rpccred(struct rpc_cred *cred) { /* Fast path for unhashed credentials */ - if (test_bit(RPCAUTH_CRED_HASHED, &cred->cr_flags) != 0) - goto need_lock; - - if (!atomic_dec_and_test(&cred->cr_count)) + if (test_bit(RPCAUTH_CRED_HASHED, &cred->cr_flags) == 0) { + if (atomic_dec_and_test(&cred->cr_count)) + cred->cr_ops->crdestroy(cred); return; - goto out_destroy; -need_lock: + } + if (!atomic_dec_and_lock(&cred->cr_count, &rpc_credcache_lock)) return; if (!list_empty(&cred->cr_lru)) { number_cred_unused--; list_del_init(&cred->cr_lru); } - if (test_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags) == 0) - rpcauth_unhash_cred(cred); if (test_bit(RPCAUTH_CRED_HASHED, &cred->cr_flags) != 0) { - cred->cr_expire = jiffies; - list_add_tail(&cred->cr_lru, &cred_unused); - number_cred_unused++; - spin_unlock(&rpc_credcache_lock); - return; + if (test_bit(RPCAUTH_CRED_UPTODATE, &cred->cr_flags) != 0) { + cred->cr_expire = jiffies; + list_add_tail(&cred->cr_lru, &cred_unused); + number_cred_unused++; + goto out_nodestroy; + } + if (!rpcauth_unhash_cred(cred)) { + /* We were hashed and someone looked us up... */ + goto out_nodestroy; + } } spin_unlock(&rpc_credcache_lock); -out_destroy: cred->cr_ops->crdestroy(cred); + return; +out_nodestroy: + spin_unlock(&rpc_credcache_lock); } EXPORT_SYMBOL_GPL(put_rpccred); ^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2009-11-20 16:48 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-18 0:56 Another possible race in put_rpccred / rpcauth_unhash_cred Neil Brown
[not found] ` <19203.18009.945354.446817-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2009-11-19 15:33 ` Trond Myklebust
2009-11-20 16:48 ` Trond Myklebust
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox