From: NeilBrown <neilb@suse.de>
To: bstroesser@ts.fujitsu.com
Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH] sunrpc.ko: RPC cache fix races
Date: Wed, 20 Feb 2013 14:08:48 +1100 [thread overview]
Message-ID: <20130220140848.7d664dfb@notabene.brown> (raw)
In-Reply-To: <61eb00$3f3hds@dgate20u.abg.fsc.net>
[-- Attachment #1: Type: text/plain, Size: 10498 bytes --]
On 19 Feb 2013 18:08:40 +0100 bstroesser@ts.fujitsu.com wrote:
> Second attempt using the correct FROM. Sorry for the noise.
>
>
> Hi,
>
> I found a problem in sunrpc.ko on a SLES11 SP1 (2.6.32.59-0,7.1)
> and fixed it (see first patch ifor 2.6.32.60 below).
> For us the patch works fine (tested on 2.6.32.59-0.7.1).
>
> AFAICS from the code, the problem seems to persist in current
> kernel versions also. Thus, I added the second patch for 3.7.9.
> As the setup to reproduce the problem is quite complex, I couldn't
> test the second patch yet. So consider this one as a RFC.
>
> Best regards,
> Bodo
>
> Please CC me, I'm not on the list.
>
> =========================================
> From: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> Date: Fri, 08 Feb 2013
> Subject: [PATCH] net: sunrpc: fix races in RPC cache
>
> We found the problem and tested the patch on a SLES11 SP1 2.6.32.59-0.7.1
>
> This patch applies to linux-2.6.32.60 (changed monotonic_seconds -->
> get_seconds())
>
> Sporadically NFS3 RPC requests to the nfs server are dropped due to
> cache_check() (net/sunrpc/cache.c) returning -ETIMEDOUT for an entry
> of the "auth.unix.gid" cache.
> In this case, no NFS reply is sent to the client.
>
> The reason for the dropped requests are races in cache_check() when
> cache_make_upcall() returns -EINVAL (because it is called for a cache
> without readers) and cache_check() therefore refreshes the cache entry
> (rv == -EAGAIN).
>
> There are three details that need to be changed:
> 1) cache_revisit_request() must not be called before cache_fresh_locked()
> has updated the cache entry, as cache_revisit_request() wakes up
> threads waiting for the cache entry to be updated.
This certainly seems correct. It is wrong to call cache_revisit_request() so
early.
> The explicit call to cache_revisit_request() is not needed, as
> cache_fresh_unlocked() calls it anyway.
But cache_fresh_unlocked is only called if "rv == -EAGAIN", however we also
need to call it in the case where "age > refresh_age/2" - it must always be
called after clearing CACHE_PENDING.
Also, cache_fresh_unlocked() only calls cache_revisit_request() if
CACHE_PENDING is set, but we have just cleared it! Some definitely something
wrong here.
(Note that I'm looking at the SLES 2.6.32 code at the moment, mainline is a
bit different).
> (But in case of not updating the cache entry, cache_revisit_request()
> must be called. Thus, we use a fall through in the "case").
Hmm... I don't like case fallthroughs unless they have nice big comments:
/* FALLTHROUGH */
or similar. :-)
> 2) CACHE_PENDING must not be cleared before cache_fresh_locked() has
> updated the cache entry, as cache_defer_req() can return without really
> sleeping if it detects CACHE_PENDING unset.
Agreed. So we should leave the clearing of CACHE_PENDING to
cache_fresh_unlocked().
> (In case of not updating the cache entry again we use the fall through)
> 3) Imagine a thread that calls cache_check() and gets rv = -ENOENT from
> cache_is_valid(). Then it sets CACHE_PENDINGs and calls
> cache_make_upcall().
> We assume that meanwhile get_seconds() advances to the next
> sec. and a second thread also calls cache_check(). It gets -EAGAIN from
> cache_is_valid() for the same cache entry. As CACHE_PENDING still is
> set, it calls cache_defer_req() immediately and waits for a wakeup from
> the first thread.
> After cache_make_upcall() returned -EINVAL, the first thread does not
> update the cache entry as it had got rv = -ENOENT, but wakes up the
> second thread by calling cache_revisit_request().
> Thread two wakes up, calls cache_is_valid() and again gets -EAGAIN.
> Thus, the result of the second cache_check() is -ETIMEDOUT and the
> NFS request is dropped.
Yep, that's not so good....
> To solve this, the cache entry now is updated not only if rv == -EAGAIN
> but if rv == -ENOENT also. This is a sufficient workaround, as the
> first thread would have to stay in cache_check() between its call to
> cache_is_valid() and clearing CACHE_PENDING for more than 60 seconds
> to break the workaround.
Still, it isn't nice to just have a work-around. It would be best to have a
fix.
The key problem here is that cache_is_valid() is time-sensitive. This have
been address in mainline - cache_is_valid doesn't depend on the current time
there.
>
> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> ---
>
> --- a/net/sunrpc/cache.c 2012-08-08 21:35:09.000000000 +0200
> +++ b/net/sunrpc/cache.c 2013-02-08 14:29:41.000000000 +0100
> @@ -233,15 +233,14 @@ int cache_check(struct cache_detail *det
> if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
> switch (cache_make_upcall(detail, h)) {
> case -EINVAL:
> - clear_bit(CACHE_PENDING, &h->flags);
> - cache_revisit_request(h);
> - if (rv == -EAGAIN) {
> + if (rv == -EAGAIN || rv == -ENOENT) {
> set_bit(CACHE_NEGATIVE, &h->flags);
> cache_fresh_locked(h, get_seconds()+CACHE_NEW_EXPIRY);
> + clear_bit(CACHE_PENDING, &h->flags);
> cache_fresh_unlocked(h, detail);
> rv = -ENOENT;
> + break;
> }
> - break;
>
> case -EAGAIN:
> clear_bit(CACHE_PENDING, &h->flags);
I agree with some of this....
Maybe:
switch(cache_make_upcall(detail, h)) {
case -EINVAL:
if (rv) {
set_bit(CACHE_NEGATIVE, &h->flags);
cache_fresh_locked(h, get_seconds() + CACHE_NEW_EXPIRY);
rv = -ENOENT;
}
/* FALLTHROUGH */
case -EAGAIN:
cache_fresh_unlocked(h, detail);
}
Though it isn't good that cache_fresh_locked() is being called without
holding a lock! Maybe we should import try_to_negate_entry() from mainline.
> =========================================
> From: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> Date: Fri, 08 Feb 2013
> Subject: [PATCH] net: sunrpc: fix races in RPC cache
>
> This patch applies to SLES 11 SP2 linux-3.0.51-0.7.9 and also to
> vanilla linux-3.7.9
>
> It is untested and is only based on a code review after we
> analyzed the reason for NFS requests being dropped on a
> SLES11 SP1 (linux-2.6.32.59-0.7.1)
>
> Sporadically NFS3 RPC requests to the nfs server are dropped due to
> cache_check() (net/sunrpc/cache.c) returning -ETIMEDOUT for an entry
> of the "auth.unix.gid" cache.
> In this case, no NFS reply is sent to the client.
>
> The reason for the dropped requests are races in cache_check() when
> cache_make_upcall() returns -EINVAL (because it is called for a cache
> without readers) and cache_check() therefore refreshes the cache entry
> (rv == -EAGAIN).
>
> There are two details that need to be changed:
> 1) cache_revisit_request() must not be called before cache_fresh_locked()
> has updated the cache entry, as cache_revisit_request() wakes up
> threads waiting for the cache entry to be updated.
> The explicit call to cache_revisit_request() is not needed, as
> cache_fresh_unlocked() calls it anyway.
> (But in case of not updating the cache entry, cache_revisit_request()
> must be called).
> 2) CACHE_PENDING must not be cleared before cache_fresh_locked() has
> updated the cache entry, as cache_wait_req() called by
> cache_defer_req() can return without really sleeping if it detects
> CACHE_PENDING unset.
>
> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> ---
>
> --- a/net/sunrpc/cache.c 2013-02-08 15:56:07.000000000 +0100
> +++ b/net/sunrpc/cache.c 2013-02-08 16:04:32.000000000 +0100
> @@ -230,11 +230,14 @@ static int try_to_negate_entry(struct ca
> rv = cache_is_valid(detail, h);
> if (rv != -EAGAIN) {
> write_unlock(&detail->hash_lock);
> + clear_bit(CACHE_PENDING, &h->flags);
> + cache_revisit_request(h);
This should just be cache_fresh_unlocked(), as below.
> return rv;
> }
> set_bit(CACHE_NEGATIVE, &h->flags);
> cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
> write_unlock(&detail->hash_lock);
> + clear_bit(CACHE_PENDING, &h->flags);
Clearing this bit is wrong - cache_frsh_unlocked will do that.
> cache_fresh_unlocked(h, detail);
> return -ENOENT;
> }
So maybe:
static int try_to_negate_entry(....)
{
int rv;
write_lock(&detail->hash_lock);
rv = cache_is_valid(detail, h);
if (rv == -EAGAIN) {
set_bit(CACHE_NEGATIVE, &h->flags);
cache_fresh_locked(h, ....);
rv = -ENOENT;
}
write_unlock(&detail->hash_lock);
cache_fresh_unlocked(h, detail);
return rv;
}
????
> @@ -275,8 +278,6 @@ int cache_check(struct cache_detail *det
> if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
> switch (cache_make_upcall(detail, h)) {
> case -EINVAL:
> - clear_bit(CACHE_PENDING, &h->flags);
> - cache_revisit_request(h);
> rv = try_to_negate_entry(detail, h);
> break;
> case -EAGAIN:
Yes, those lines should definitely be removed.
So maybe this against mainline:
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 9afa439..7296644 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -228,12 +228,11 @@ static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h
write_lock(&detail->hash_lock);
rv = cache_is_valid(detail, h);
- if (rv != -EAGAIN) {
- write_unlock(&detail->hash_lock);
- return rv;
+ if (rv == -EAGAIN) {
+ set_bit(CACHE_NEGATIVE, &h->flags);
+ cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
+ rv = -ENOENT;
}
- set_bit(CACHE_NEGATIVE, &h->flags);
- cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
write_unlock(&detail->hash_lock);
cache_fresh_unlocked(h, detail);
return -ENOENT;
@@ -275,13 +274,10 @@ int cache_check(struct cache_detail *detail,
if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
switch (cache_make_upcall(detail, h)) {
case -EINVAL:
- clear_bit(CACHE_PENDING, &h->flags);
- cache_revisit_request(h);
rv = try_to_negate_entry(detail, h);
break;
case -EAGAIN:
- clear_bit(CACHE_PENDING, &h->flags);
- cache_revisit_request(h);
+ cache_fresh_unlocked(h, detail);
break;
}
}
Is that convincing?
Thanks a lot for your very thorough analysis!
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next parent reply other threads:[~2013-02-20 3:09 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <61eb00$3f3hds@dgate20u.abg.fsc.net>
2013-02-20 3:08 ` NeilBrown [this message]
[not found] <d6437a$43j82m@dgate10u.abg.fsc.net>
2013-02-26 2:56 ` [PATCH] sunrpc.ko: RPC cache fix races NeilBrown
2013-02-25 19:55 Bodo Stroesser
[not found] <61eb00$3f9tb7@dgate20u.abg.fsc.net>
2013-02-24 22:52 ` NeilBrown
-- strict thread matches above, loose matches on Subject: below --
2013-02-21 15:44 Bodo Stroesser
[not found] <d6437a$434ic3@dgate10u.abg.fsc.net>
2013-02-20 23:55 ` NeilBrown
2013-02-20 13:57 bstroesser
[not found] <61eb00$3f3hdt@dgate20u.abg.fsc.net>
2013-02-19 21:35 ` J. Bruce Fields
2013-02-19 17:08 bstroesser
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130220140848.7d664dfb@notabene.brown \
--to=neilb@suse.de \
--cc=bfields@fieldses.org \
--cc=bstroesser@ts.fujitsu.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.