From: NeilBrown <neilb@suse.com>
To: "Wang\,
Alan 1. \(NSB - CN\/Hangzhou\)" <alan.1.wang@nokia-sbell.com>,
"linux-nfs\@vger.kernel.org" <linux-nfs@vger.kernel.org>
To: "J. Bruce Fields" <bfields@fieldses.org>
Subject: Re: A special case in write_flush which cause the umount busy
Date: Wed, 14 Feb 2018 06:57:06 +1100 [thread overview]
Message-ID: <87inb0r7cd.fsf@notabene.neil.brown.name> (raw)
In-Reply-To: <VI1PR07MB342127E2A55088F691693A3BA2F00@VI1PR07MB3421.eurprd07.prod.outlook.com>
[-- Attachment #1: Type: text/plain, Size: 4649 bytes --]
On Sun, Feb 11 2018, Wang, Alan 1. (NSB - CN/Hangzhou) wrote:
> Hi,
>
> I have a test case on mount/umount on a partition from nfs server side. And encounter a problem of umount busy in a low probability.
>
> The Linux version is 3.10.64 with the patch "sunrpc/cache: make cache flushing more reliable".
> https://patchwork.kernel.org/patch/7410021/
>
> After some analysis and test in many times, I find that when it failed to mount, the time "then" and "now" are different, which caused the last_refresh is far beyond the flush_time. So this cache is not expired and won't be clean at once.
> Then the ref in cache_head won't be released, and mntput_no_expire didn't be called to decrease the count. That caused the umount busy.
>
> Below are logs in my test.
>
> kernel: [ 292.767801] write_flush 1480 then = 249, now = 250
> kernel: [ 292.767817] cache_clean 451, cd name nfsd.fh expiry_time 790485253, cd flush_time 249, last_refresh 369, seconds_since_boot 250
> kernel: [ 292.767913] write_flush 1480 then = 249, now = 250
> kernel: [ 292.767928] cache_clean 451, cd name nfsd.export expiry_time 2049, cd flush_time 249, last_refresh 369, seconds_since_boot 250
> kernel: [ 292.773229] do_refcount_check 283 mycount 4
> kernel: [ 292.773245] do_umount 1344 retval -16
>
> I think this happens in such case that the exportfs writes the flush with current time, the time of "then". But when seconds_since_boot being called in function write_flush, the time is on the next second, so the "now" is one second after "then".
> Because "then" is less than "now", the flush_time is set directly to original "then", rather than "cd->flush_time + 1".
>
>
> And I want to change the condition as below. I'm not sure it's OK and has no effects to other part.
I think this change is safe. It is a bit of a hack, but the "flush"
interface was actually very poorly designed and I don't think it *can*
be implemented cleanly.
Maybe we should just ignore the number written in and *always* flush
the cache completely. That is was it always wanted.
See below.
Bruce: do you have any thoughts on this?
Thanks,
NeilBrown
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index aa36dad32db1..43f117d1547e 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -1450,7 +1450,7 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
struct cache_detail *cd)
{
char tbuf[20];
- char *bp, *ep;
+ char *ep;
time_t then, now;
if (*ppos || count > sizeof(tbuf)-1)
@@ -1461,24 +1461,24 @@ static ssize_t write_flush(struct file *file, const char __user *buf,
simple_strtoul(tbuf, &ep, 0);
if (*ep && *ep != '\n')
return -EINVAL;
+ /* Note that while we check that 'buf' holds a valid number,
+ * we always ignore the value and just flush everything.
+ * Making use of the number leads to races.
+ */
- bp = tbuf;
- then = get_expiry(&bp);
now = seconds_since_boot();
- cd->nextcheck = now;
- /* Can only set flush_time to 1 second beyond "now", or
+ /* Always flush everything, so behave like cache_purge()
+ * Can only set flush_time to 1 second beyond "now", or
* possibly 1 second beyond flushtime. This is because
* flush_time never goes backwards so it mustn't get too far
* ahead of time.
*/
- if (then >= now) {
- /* Want to flush everything, so behave like cache_purge() */
- if (cd->flush_time >= now)
- now = cd->flush_time + 1;
- then = now;
- }
- cd->flush_time = then;
+ if (cd->flush_time >= now)
+ now = cd->flush_time + 1;
+
+ cd->flush_time = now;
+ cd->nextcheck = now;
cache_flush();
*ppos += count;
>
> --------------------------------------------------------------------------
> then = get_expiry(&bp);
> now = seconds_since_boot();
> cd->nextcheck = now;
> /* Can only set flush_time to 1 second beyond "now", or
> * possibly 1 second beyond flushtime. This is because
> * flush_time never goes backwards so it mustn't get too far
> * ahead of time.
> */
> if (then != now)
> printk("%s %d then = %d, now = %d\n", __func__, __LINE__, then, now);
> - if (then >= now) {
> + if (then >= now - 1) {
> /* Want to flush everything, so behave like cache_purge() */
> if (cd->flush_time >= now)
> now = cd->flush_time + 1;
> then = now;
> }
>
> cd->flush_time = then;
> --------------------------------------------------------------------------
>
> Best Regards,
> Alan Wang
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2018-02-13 19:57 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-11 6:55 A special case in write_flush which cause the umount busy Wang, Alan 1. (NSB - CN/Hangzhou)
2018-02-13 19:57 ` NeilBrown [this message]
2018-02-13 21:03 ` J. Bruce Fields
2018-02-14 1:15 ` [PATCH] SUNRPC: cache: ignore timestamp written to 'flush' file NeilBrown
2018-02-22 0:46 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87inb0r7cd.fsf@notabene.neil.brown.name \
--to=neilb@suse.com \
--cc=alan.1.wang@nokia-sbell.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).