From: NeilBrown <neilb@suse.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "ZUIDAM, Hans" <Hans.Zuidam@philips.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
"DE WITTE, PETER" <PETER.DE.WITTE@philips.com>
Subject: Re: Linux NFS and cached properties
Date: Tue, 31 Jul 2012 15:08:01 +1000
Message-ID: <20120731150801.0a4b557b@notabene.brown>
In-Reply-To: <20120726223607.GA28982@fieldses.org>
On Thu, 26 Jul 2012 18:36:07 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:
> On Tue, Jul 24, 2012 at 05:28:02PM +0000, ZUIDAM, Hans wrote:
> > Hi Bruce,
> >
> > Thanks for the clarification.
> >
> > (I'm repeating a lot of my original mail because of the Cc: list.)
> >
> > > J. Bruce Fields
> > > I think that's right, though I'm curious how you're managing to hit
> > > that case reliably every time. Or is this an intermittent failure?
> > It's an intermittent failure, but with the procedure shown below it is
> > fairly easy to reproduce. The actual problem we see in our product
> > is because of the way external storage media are handled in user-land.
> >
> > 192.168.1.10# mount -t xfs /dev/sdcr/sda1 /mnt
> > 192.168.1.10# exportfs 192.168.1.11:/mnt
> >
> > 192.168.1.11# mount 192.168.1.10:/mnt /mnt
> > 192.168.1.11# umount /mnt
> >
> > 192.168.1.10# exportfs -u 192.168.1.11:/mnt
> > 192.168.1.10# umount /mnt
> > umount: can't umount /media/recdisk: Device or resource busy
> >
> > What I actually do is the mount/unmount on the client via ssh. That
> > is a good way to trigger the problem.
> >
> > We see that during the un-export the NFS caches are not flushed
> > properly which is why the final unmount fails.
> >
> > In net/sunrpc/cache.c the cache times (last_refresh, expiry_time,
> > flush_time) are measured in seconds. If I understand the code somewhat
> > then during an NFS un-export the flush is done by setting the flush_time to
> > the current time. Then cache_flush() is called. If in that same second
> > last_refresh is set to the current time then the cached item is not
> > flushed. This will subsequently cause un-mount to fail because there
> > is still a reference to the mount point.
> >
> > > J. Bruce Fields
> > > I ran across that recently while reviewing the code to fix a related
> > > problem. I'm not sure what the best fix would be.
> > >
> > > Previously raised here:
> > >
> > > http://marc.info/?l=linux-nfs&m=133514319408283&w=2
> >
> > The description in your mail does indeed look the same as the problem
> > that we see.
> >
> > From reading the code in net/sunrpc/cache.c I get the impression that it is
> > not really possible to reliably flush the caches for an un-exportfs such
> > that after flushing they will not accept entries for the un-exported IP/mount
> > point combination.
>
> Right. So, possible ideas, from that previous message:
>
> - As Neil suggests, modify exportfs to wait a second between
> updating etab and flushing the cache. At that point any
> entries still using the old information are at least a second
> old. That may be adequate for your case, but if someone out
> there is sensitive to the time required to unexport then that
> will annoy them. It also leaves the small possibility of
> races where an in-progress rpc may still be using an export at
> the time you try to flush.
> - Implement some new interface that you can use to flush the
> cache and that doesn't return until in-progress rpc's
> complete. Since it waits for rpc's it's not purely a "cache"
> layer interface any more. So maybe something like
> /proc/fs/nfsd/flush_exports.
> - As a workaround requiring no code changes: unexport, then shut
> down the server entirely and restart it. Clients will see
> that as a reboot recovery event and recover automatically, but
> applications may see delays while that happens. Kind of a big
> hammer, but if unexporting while other exports are in use is
> rare maybe it would be adequate for your case.

That's a shame...

I had originally intended "rpc.nfsd 0" to simply stop all threads and do
nothing else.  Then you would be able to:

   rpc.nfsd 0
   exportfs -f
   umount /mnt
   rpc.nfsd 16

and have a nice, fast, race-free unmount.
But commit e096bbc6488d3e49d476bf986d33752709361277 'fixed' that :-(
I wonder if it can be resurrected ... maybe not worth the effort.
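
For anyone following along, the race that both the restart trick and a new
interface are trying to close is the same-second one Hans describes above.
Below is a minimal userspace model of it in Python. It is not the code in
net/sunrpc/cache.c: Entry, is_expired() and flush() are hypothetical names,
and the strict "flush_time > last_refresh" comparison is an assumption taken
from Hans's description of the behaviour.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    last_refresh: int  # whole seconds, like the cache timestamps
    expiry_time: int   # whole seconds


def is_expired(entry: Entry, flush_time: int, now: int) -> bool:
    # Assumed rule: an entry is flushed only if it was last refreshed
    # strictly before flush_time, so a refresh in the same second survives.
    return entry.expiry_time < now or flush_time > entry.last_refresh


def flush(cache: dict, now: int) -> dict:
    """Set flush_time to 'now' and return the entries that survive."""
    flush_time = now
    return {key: e for key, e in cache.items()
            if not is_expired(e, flush_time, now)}


cache = {
    "refreshed_earlier": Entry(last_refresh=99, expiry_time=2000),
    "refreshed_same_second": Entry(last_refresh=100, expiry_time=2000),
}

# Flush in second 100: the entry refreshed in second 100 is still there.
survivors_now = flush(cache, now=100)

# Flush one second later (the "wait a second" idea): nothing survives.
survivors_later = flush(cache, now=101)
```

In this model the entry left in survivors_now is the one that would still
hold a reference to the mount point and make the final umount fail.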

The idea of a new interface to synchronise with all threads has potential and
doesn't need to be at the nfsd level - it could be in sunrpc.  Maybe it could
be built into the current 'flush' interface.

 1/ iterate through all non-sleeping threads, setting a flag and incrementing
    a counter.
 2/ when a thread completes its current request, if test_and_clear of the flag
    succeeds, it does atomic_dec_and_test on the counter and, if that reaches
    zero, wakes up some wait_queue_head.
 3/ the 'flush'ing thread waits on the wait_queue_head for the counter to be 0.

If you don't hate it I could possibly even provide some code.
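
As a rough illustration of the three steps above, here is a userspace model
in Python rather than kernel code. FlushBarrier and its method names are
hypothetical; a threading.Condition stands in for both the wait_queue_head
and the atomic operations, and request_start()/request_done() are assumed
hooks around each request a server thread handles.

```python
import threading


class FlushBarrier:
    """Userspace model of the three steps; all names are hypothetical."""

    def __init__(self):
        # The Condition stands in for the wait_queue_head and also
        # protects the flag table and the counter.
        self._cond = threading.Condition()
        self._pending = 0   # the counter from step 1
        self._flagged = {}  # busy thread id -> flag from step 1

    def request_start(self, tid):
        # A thread that picks up a request is "non-sleeping" from here on.
        with self._cond:
            self._flagged[tid] = False

    def request_done(self, tid):
        # Step 2: test-and-clear the flag; drop the counter if it was set
        # and wake the waiters when the counter reaches zero.
        with self._cond:
            if self._flagged.pop(tid):
                self._pending -= 1
                if self._pending == 0:
                    self._cond.notify_all()

    def flush_and_wait(self, timeout=None):
        with self._cond:
            # Step 1: flag every busy thread that is not already flagged.
            for tid, flagged in self._flagged.items():
                if not flagged:
                    self._flagged[tid] = True
                    self._pending += 1
            # Step 3: wait for the counter to reach zero.
            # Returns False if the timeout expires first.
            return self._cond.wait_for(lambda: self._pending == 0, timeout)


barrier = FlushBarrier()
barrier.request_start("nfsd-1")
# Finish the in-flight request shortly after the flush starts waiting.
threading.Timer(0.05, barrier.request_done, args=("nfsd-1",)).start()
flushed = barrier.flush_and_wait(timeout=5)
```

Requests that start after step 1 are never flagged, so the flush does not
wait for them; the assumption is that they only ever see the already-flushed
cache.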
NeilBrown