From: NeilBrown <neilb@suse.de>
To: bstroesser@ts.fujitsu.com
Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH] sunrpc.ko: RPC cache fix races
Date: Wed, 20 Feb 2013 14:08:48 +1100 [thread overview]
Message-ID: <20130220140848.7d664dfb@notabene.brown> (raw)
In-Reply-To: <61eb00$3f3hds@dgate20u.abg.fsc.net>
[-- Attachment #1: Type: text/plain, Size: 10498 bytes --]
On 19 Feb 2013 18:08:40 +0100 bstroesser@ts.fujitsu.com wrote:
> Second attempt using the correct FROM. Sorry for the noise.
>
>
> Hi,
>
> I found a problem in sunrpc.ko on a SLES11 SP1 (2.6.32.59-0,7.1)
> and fixed it (see first patch ifor 2.6.32.60 below).
> For us the patch works fine (tested on 2.6.32.59-0.7.1).
>
> AFAICS from the code, the problem seems to persist in current
> kernel versions also. Thus, I added the second patch for 3.7.9.
> As the setup to reproduce the problem is quite complex, I couldn't
> test the second patch yet. So consider this one as a RFC.
>
> Best regards,
> Bodo
>
> Please CC me, I'm not on the list.
>
> =========================================
> From: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> Date: Fri, 08 Feb 2013
> Subject: [PATCH] net: sunrpc: fix races in RPC cache
>
> We found the problem and tested the patch on a SLES11 SP1 2.6.32.59-0.7.1
>
> This patch applies to linux-2.6.32.60 (changed monotonic_seconds -->
> get_seconds())
>
> Sporadically NFS3 RPC requests to the nfs server are dropped due to
> cache_check() (net/sunrpc/cache.c) returning -ETIMEDOUT for an entry
> of the "auth.unix.gid" cache.
> In this case, no NFS reply is sent to the client.
>
> The reason for the dropped requests are races in cache_check() when
> cache_make_upcall() returns -EINVAL (because it is called for a cache
> without readers) and cache_check() therefore refreshes the cache entry
> (rv == -EAGAIN).
>
> There are three details that need to be changed:
> 1) cache_revisit_request() must not be called before cache_fresh_locked()
> has updated the cache entry, as cache_revisit_request() wakes up
> threads waiting for the cache entry to be updated.
This certainly seems correct. It is wrong to call cache_revisit_request() so
early.
> The explicit call to cache_revisit_request() is not needed, as
> cache_fresh_unlocked() calls it anyway.
But cache_fresh_unlocked is only called if "rv == -EAGAIN", however we also
need to call it in the case where "age > refresh_age/2" - it must always be
called after clearing CACHE_PENDING.
Also, cache_fresh_unlocked() only calls cache_revisit_request() if
CACHE_PENDING is set, but we have just cleared it! Some definitely something
wrong here.
(Note that I'm looking at the SLES 2.6.32 code at the moment, mainline is a
bit different).
> (But in case of not updating the cache entry, cache_revisit_request()
> must be called. Thus, we use a fall through in the "case").
Hmm... I don't like case fallthroughs unless they have nice big comments:
/* FALLTHROUGH */
or similar. :-)
> 2) CACHE_PENDING must not be cleared before cache_fresh_locked() has
> updated the cache entry, as cache_defer_req() can return without really
> sleeping if it detects CACHE_PENDING unset.
Agreed. So we should leave the clearing of CACHE_PENDING to
cache_fresh_unlocked().
> (In case of not updating the cache entry again we use the fall through)
> 3) Imagine a thread that calls cache_check() and gets rv = -ENOENT from
> cache_is_valid(). Then it sets CACHE_PENDINGs and calls
> cache_make_upcall().
> We assume that meanwhile get_seconds() advances to the next
> sec. and a second thread also calls cache_check(). It gets -EAGAIN from
> cache_is_valid() for the same cache entry. As CACHE_PENDING still is
> set, it calls cache_defer_req() immediately and waits for a wakeup from
> the first thread.
> After cache_make_upcall() returned -EINVAL, the first thread does not
> update the cache entry as it had got rv = -ENOENT, but wakes up the
> second thread by calling cache_revisit_request().
> Thread two wakes up, calls cache_is_valid() and again gets -EAGAIN.
> Thus, the result of the second cache_check() is -ETIMEDOUT and the
> NFS request is dropped.
Yep, that's not so good....
> To solve this, the cache entry now is updated not only if rv == -EAGAIN
> but if rv == -ENOENT also. This is a sufficient workaround, as the
> first thread would have to stay in cache_check() between its call to
> cache_is_valid() and clearing CACHE_PENDING for more than 60 seconds
> to break the workaround.
Still, it isn't nice to just have a work-around. It would be best to have a
fix.
The key problem here is that cache_is_valid() is time-sensitive. This have
been address in mainline - cache_is_valid doesn't depend on the current time
there.
>
> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> ---
>
> --- a/net/sunrpc/cache.c 2012-08-08 21:35:09.000000000 +0200
> +++ b/net/sunrpc/cache.c 2013-02-08 14:29:41.000000000 +0100
> @@ -233,15 +233,14 @@ int cache_check(struct cache_detail *det
> if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
> switch (cache_make_upcall(detail, h)) {
> case -EINVAL:
> - clear_bit(CACHE_PENDING, &h->flags);
> - cache_revisit_request(h);
> - if (rv == -EAGAIN) {
> + if (rv == -EAGAIN || rv == -ENOENT) {
> set_bit(CACHE_NEGATIVE, &h->flags);
> cache_fresh_locked(h, get_seconds()+CACHE_NEW_EXPIRY);
> + clear_bit(CACHE_PENDING, &h->flags);
> cache_fresh_unlocked(h, detail);
> rv = -ENOENT;
> + break;
> }
> - break;
>
> case -EAGAIN:
> clear_bit(CACHE_PENDING, &h->flags);
I agree with some of this....
Maybe:
switch(cache_make_upcall(detail, h)) {
case -EINVAL:
if (rv) {
set_bit(CACHE_NEGATIVE, &h->flags);
cache_fresh_locked(h, get_seconds() + CACHE_NEW_EXPIRY);
rv = -ENOENT;
}
/* FALLTHROUGH */
case -EAGAIN:
cache_fresh_unlocked(h, detail);
}
Though it isn't good that cache_fresh_locked() is being called without
holding a lock! Maybe we should import try_to_negate_entry() from mainline.
> =========================================
> From: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> Date: Fri, 08 Feb 2013
> Subject: [PATCH] net: sunrpc: fix races in RPC cache
>
> This patch applies to SLES 11 SP2 linux-3.0.51-0.7.9 and also to
> vanilla linux-3.7.9
>
> It is untested and is only based on a code review after we
> analyzed the reason for NFS requests being dropped on a
> SLES11 SP1 (linux-2.6.32.59-0.7.1)
>
> Sporadically NFS3 RPC requests to the nfs server are dropped due to
> cache_check() (net/sunrpc/cache.c) returning -ETIMEDOUT for an entry
> of the "auth.unix.gid" cache.
> In this case, no NFS reply is sent to the client.
>
> The reason for the dropped requests are races in cache_check() when
> cache_make_upcall() returns -EINVAL (because it is called for a cache
> without readers) and cache_check() therefore refreshes the cache entry
> (rv == -EAGAIN).
>
> There are two details that need to be changed:
> 1) cache_revisit_request() must not be called before cache_fresh_locked()
> has updated the cache entry, as cache_revisit_request() wakes up
> threads waiting for the cache entry to be updated.
> The explicit call to cache_revisit_request() is not needed, as
> cache_fresh_unlocked() calls it anyway.
> (But in case of not updating the cache entry, cache_revisit_request()
> must be called).
> 2) CACHE_PENDING must not be cleared before cache_fresh_locked() has
> updated the cache entry, as cache_wait_req() called by
> cache_defer_req() can return without really sleeping if it detects
> CACHE_PENDING unset.
>
> Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
> ---
>
> --- a/net/sunrpc/cache.c 2013-02-08 15:56:07.000000000 +0100
> +++ b/net/sunrpc/cache.c 2013-02-08 16:04:32.000000000 +0100
> @@ -230,11 +230,14 @@ static int try_to_negate_entry(struct ca
> rv = cache_is_valid(detail, h);
> if (rv != -EAGAIN) {
> write_unlock(&detail->hash_lock);
> + clear_bit(CACHE_PENDING, &h->flags);
> + cache_revisit_request(h);
This should just be cache_fresh_unlocked(), as below.
> return rv;
> }
> set_bit(CACHE_NEGATIVE, &h->flags);
> cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
> write_unlock(&detail->hash_lock);
> + clear_bit(CACHE_PENDING, &h->flags);
Clearing this bit is wrong - cache_frsh_unlocked will do that.
> cache_fresh_unlocked(h, detail);
> return -ENOENT;
> }
So maybe:
static int try_to_negate_entry(....)
{
int rv;
write_lock(&detail->hash_lock);
rv = cache_is_valid(detail, h);
if (rv == -EAGAIN) {
set_bit(CACHE_NEGATIVE, &h->flags);
cache_fresh_locked(h, ....);
rv = -ENOENT;
}
write_unlock(&detail->hash_lock);
cache_fresh_unlocked(h, detail);
return rv;
}
????
> @@ -275,8 +278,6 @@ int cache_check(struct cache_detail *det
> if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
> switch (cache_make_upcall(detail, h)) {
> case -EINVAL:
> - clear_bit(CACHE_PENDING, &h->flags);
> - cache_revisit_request(h);
> rv = try_to_negate_entry(detail, h);
> break;
> case -EAGAIN:
Yes, those lines should definitely be removed.
So maybe this against mainline:
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 9afa439..7296644 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -228,12 +228,11 @@ static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h
write_lock(&detail->hash_lock);
rv = cache_is_valid(detail, h);
- if (rv != -EAGAIN) {
- write_unlock(&detail->hash_lock);
- return rv;
+ if (rv == -EAGAIN) {
+ set_bit(CACHE_NEGATIVE, &h->flags);
+ cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
+ rv = -ENOENT;
}
- set_bit(CACHE_NEGATIVE, &h->flags);
- cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
write_unlock(&detail->hash_lock);
cache_fresh_unlocked(h, detail);
return -ENOENT;
@@ -275,13 +274,10 @@ int cache_check(struct cache_detail *detail,
if (!test_and_set_bit(CACHE_PENDING, &h->flags)) {
switch (cache_make_upcall(detail, h)) {
case -EINVAL:
- clear_bit(CACHE_PENDING, &h->flags);
- cache_revisit_request(h);
rv = try_to_negate_entry(detail, h);
break;
case -EAGAIN:
- clear_bit(CACHE_PENDING, &h->flags);
- cache_revisit_request(h);
+ cache_fresh_unlocked(h, detail);
break;
}
}
Is that convincing?
Thanks a lot for your very thorough analysis!
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
next parent reply other threads:[~2013-02-20 3:09 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <61eb00$3f3hds@dgate20u.abg.fsc.net>
2013-02-20 3:08 ` NeilBrown [this message]
[not found] <d6437a$43j82m@dgate10u.abg.fsc.net>
2013-02-26 2:56 ` [PATCH] sunrpc.ko: RPC cache fix races NeilBrown
2013-02-25 19:55 Bodo Stroesser
[not found] <61eb00$3f9tb7@dgate20u.abg.fsc.net>
2013-02-24 22:52 ` NeilBrown
-- strict thread matches above, loose matches on Subject: below --
2013-02-21 15:44 Bodo Stroesser
[not found] <d6437a$434ic3@dgate10u.abg.fsc.net>
2013-02-20 23:55 ` NeilBrown
2013-02-20 13:57 bstroesser
[not found] <61eb00$3f3hdt@dgate20u.abg.fsc.net>
2013-02-19 21:35 ` J. Bruce Fields
2013-02-19 17:08 bstroesser
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130220140848.7d664dfb@notabene.brown \
--to=neilb@suse.de \
--cc=bfields@fieldses.org \
--cc=bstroesser@ts.fujitsu.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).