From: NeilBrown <neilb@suse.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: "ZUIDAM, Hans" <Hans.Zuidam@philips.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"DE WITTE, PETER" <PETER.DE.WITTE@philips.com>
Subject: Re: Linux NFS and cached properties
Date: Tue, 31 Jul 2012 15:08:01 +1000
Message-ID: <20120731150801.0a4b557b@notabene.brown>
In-Reply-To: <20120726223607.GA28982@fieldses.org>

On Thu, 26 Jul 2012 18:36:07 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Tue, Jul 24, 2012 at 05:28:02PM +0000, ZUIDAM, Hans wrote:
> > Hi Bruce,
> > 
> > Thanks for the clarification.
> > 
> > (I'm repeating a lot of my original mail because of the Cc: list.)
> > 
> > > J. Bruce Fields
> > > I think that's right, though I'm curious how you're managing to hit
> > > that case reliably every time.  Or is this an intermittent failure?
> > It's an intermittent failure, but with the procedure shown below it is
> > fairly easy to reproduce.    The actual problem we see in our product
> > is because of the way external storage media are handled in user-land.
> > 
> >         192.168.1.10# mount -t xfs /dev/sdcr/sda1 /mnt
> >         192.168.1.10# exportfs 192.168.1.11:/mnt
> > 
> >         192.168.1.11# mount 192.168.1.10:/mnt /mnt
> >         192.168.1.11# umount /mnt
> > 
> >         192.168.1.10# exportfs -u 192.168.1.11:/mnt
> >         192.168.1.10# umount /mnt
> >         umount: can't umount /media/recdisk: Device or resource busy
> > 
> > What I actually do is the mount/unmount on the client via ssh.  That
> > is a good way to trigger the problem.
> > 
> > We see that during the un-export the NFS caches are not flushed
> > properly which is why the final unmount fails.
> > 
> > In net/sunrpc/cache.c the cache times (last_refresh, expiry_time,
> > flush_time) are measured in seconds.  If I understand the code somewhat
> > then during an NFS un-export the flush is done by setting the flush_time to
> > the current time.  Then cache_flush() is called.  If in that same second
> > last_refresh is set to the current time then the cached item is not
> > flushed.  This will subsequently cause un-mount to fail because there
> > is still a reference to the mount point.
> > 
> > > J. Bruce Fields
> > > I ran across that recently while reviewing the code to fix a related
> > > problem.  I'm not sure what the best fix would be.
> > >
> > > Previously raised here:
> > >
> > >       http://marc.info/?l=linux-nfs&m=133514319408283&w=2
> > 
> > The description in your mail does indeed look the same as the problem
> > that we see.
> > 
> > From reading the code in net/sunrpc/cache.c I get the impression that it is
> > not really possible to reliably flush the caches for an un-exportfs such
> > that after flushing they will not accept entries for the un-exported IP/mount
> > point combination.
> 
> Right.  So, possible ideas, from that previous message:
> 
> 	- As Neil suggests, modify exportfs to wait a second between
> 	  updating etab and flushing the cache.  At that point any
> 	  entries still using the old information are at least a second
> 	  old.  That may be adequate for your case, but if someone out
> 	  there is sensitive to the time required to unexport then that
> 	  will annoy them.  It also leaves the small possibility of
> 	  races where an in-progress rpc may still be using an export at
> 	  the time you try to flush.
> 	- Implement some new interface that you can use to flush the
> 	  cache and that doesn't return until in-progress rpc's
> 	  complete.  Since it waits for rpc's it's not purely a "cache"
> 	  layer interface any more.  So maybe something like
> 	  /proc/fs/nfsd/flush_exports.
> 	- As a workaround requiring no code changes: unexport, then shut
> 	  down the server entirely and restart it.  Clients will see
> 	  that as a reboot recovery event and recover automatically, but
> 	  applications may see delays while that happens.  Kind of a big
> 	  hammer, but if unexporting while other exports are in use is
> 	  rare maybe it would be adequate for your case.

That's a shame...
I had originally intended "rpc.nfsd 0" to simply stop all threads and nothing
else.  Then you would be able to:
   rpc.nfsd 0
   exportfs -f
   umount /mnt
   rpc.nfsd 16

and have a nice fast race-free unmount.
But commit e096bbc6488d3e49d476bf986d33752709361277 'fixed' that :-(

I wonder if it can be resurrected ... maybe not worth the effort.
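
For reference, the same-second race Hans describes above can be sketched
in a few lines.  This is only an illustration: the struct and function
names are hypothetical, and it assumes the flush test is a strict
"flush_time > last_refresh" comparison on whole seconds, as his
description implies.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the cache structures. */
struct entry {
	time_t last_refresh;	/* second in which the entry was last updated */
};

struct detail {
	time_t flush_time;	/* second in which the flush was requested */
};

/*
 * Assumed strict comparison: only entries refreshed in an earlier
 * second than the flush are treated as flushed.
 */
static bool entry_is_flushed(const struct detail *d, const struct entry *e)
{
	return d->flush_time > e->last_refresh;
}
```

With flush_time == 100, an entry refreshed at second 99 is flushed but
one refreshed at second 100 survives, which is the leftover reference
that keeps the filesystem busy.  Once flush_time is a second later than
last_refresh the entry no longer survives, which is why waiting a second
before flushing helps.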


The idea of a new interface to synchronise with all threads has potential and
doesn't need to be at the nfsd level - it could be in sunrpc.  Maybe it could
be built into the current 'flush' interface.
1/ iterate through all non-sleeping threads, setting a flag and incrementing a
counter.
2/ when a thread completes its current request, if test_and_clear of the flag
succeeds, it does atomic_dec_and_test on the counter and then wakes up some
wait_queue_head.
3/ the 'flush'ing thread waits on the wait_queue_head for the counter to be 0.
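
To make the three steps concrete, here is a rough userspace sketch using
pthreads.  It is not kernel code: every name in it (flush_sync,
request_begin, request_end, flush_mark, flush_wait, NWORKERS) is
hypothetical, a mutex plus condition variable stands in for the atomics
and the wait_queue_head, and it assumes only one flusher runs at a time.

```c
#include <pthread.h>
#include <stdbool.h>

#define NWORKERS 4	/* hypothetical fixed pool size */

struct flush_sync {
	pthread_mutex_t lock;
	pthread_cond_t idle;		/* stands in for the wait_queue_head */
	bool busy[NWORKERS];		/* worker is inside a request */
	bool flagged[NWORKERS];		/* set by the flusher in step 1 */
	int pending;			/* flagged workers not yet finished */
};

static void flush_sync_init(struct flush_sync *s)
{
	pthread_mutex_init(&s->lock, NULL);
	pthread_cond_init(&s->idle, NULL);
	for (int i = 0; i < NWORKERS; i++)
		s->busy[i] = s->flagged[i] = false;
	s->pending = 0;
}

/* Called by a worker when it starts handling a request. */
static void request_begin(struct flush_sync *s, int id)
{
	pthread_mutex_lock(&s->lock);
	s->busy[id] = true;
	pthread_mutex_unlock(&s->lock);
}

/* Step 2: on completion, test-and-clear the flag and drop the counter. */
static void request_end(struct flush_sync *s, int id)
{
	pthread_mutex_lock(&s->lock);
	s->busy[id] = false;
	if (s->flagged[id]) {
		s->flagged[id] = false;
		if (--s->pending == 0)
			pthread_cond_broadcast(&s->idle);
	}
	pthread_mutex_unlock(&s->lock);
}

/* Step 1: flag every busy worker; returns how many were flagged. */
static int flush_mark(struct flush_sync *s)
{
	int n = 0;

	pthread_mutex_lock(&s->lock);
	for (int i = 0; i < NWORKERS; i++) {
		if (s->busy[i] && !s->flagged[i]) {
			s->flagged[i] = true;
			s->pending++;
			n++;
		}
	}
	pthread_mutex_unlock(&s->lock);
	return n;
}

/* Step 3: sleep until every flagged worker has finished its request. */
static void flush_wait(struct flush_sync *s)
{
	pthread_mutex_lock(&s->lock);
	while (s->pending > 0)
		pthread_cond_wait(&s->idle, &s->lock);
	pthread_mutex_unlock(&s->lock);
}
```

A flusher would call flush_mark() and then flush_wait().  Requests that
start after the mark are not waited for, which should be fine as long as
they can only see the already-updated export information.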

If you don't hate it I could possibly even provide some code.

NeilBrown

