From: NeilBrown <neilb@suse.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>,
	NFS <linux-nfs@vger.kernel.org>
Subject: Re: Live lock in silly-rename.
Date: Thu, 5 Jun 2014 10:26:22 +1000
Message-ID: <20140605102622.1c4cd6f9@notabene.brown>
In-Reply-To: <20140604132739.GK6839@fieldses.org>

On Wed, 4 Jun 2014 09:27:39 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Wed, Jun 04, 2014 at 08:48:02AM -0400, Trond Myklebust wrote:
> > On Wed, Jun 4, 2014 at 3:39 AM, NeilBrown <neilb@suse.de> wrote:
> > > On Sat, 31 May 2014 08:13:58 +1000 NeilBrown <neilb@suse.de> wrote:
> > >
> > >> On Fri, 30 May 2014 17:55:23 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > >> wrote:
> > >>
> > >> > On Fri, May 30, 2014 at 01:44:42PM +1000, NeilBrown wrote:
> > >> > > On Thu, 29 May 2014 20:44:23 -0400 "J. Bruce Fields" <bfields@fieldses.org>
> > >> > > wrote:
> > >> > >
> > >> > > > Yes, it's a known server bug.
> > >> > > >
> > >> > > > As a first attempt I was thinking of just sticking a timestamp in struct
> > >> > > > inode to record the time of the most recent conflicting access and deny
> > >> > > > delegations if the timestamp is too recent, for some definition of too
> > >> > > > recent.
> > >> > > >
> > >> > >
> > >> > > Hmmm... I'll have a look next week and see what I can come up with.
> > >> >
> > >> > Thanks!
> > >> >
> > >> > If we didn't think it was worth another struct inode field, we could
> > >> > probably get away with global state.  Even just refusing to give out any
> > >> > delegations for a few seconds after any delegation break would be enough
> > >> > to fix this bug.
> > >> >
> > >> > Or you could make it a little less harsh with a small hash table: "don't
> > >> > give out a delegation on any inode whose inode number hashes to X for a
> > >> > few seconds."
> > >>
> > >> I was thinking of using a bloom filter - or possibly two.
> > >> - avoid handing out delegations if either bloom filter reports a match
> > >> - when reclaiming a delegation add the inode to the second bloom filter
> > >> - every so-often zero-out the older filter and swap them.
> > >>
> > >> Might be a bit of overkill, but I won't know until I implement it.
> > >>
> > >
> > > Below is my suggestion.  It seems easy enough.  It even works.
> > >
> > > However it does raise an issue with the NFS client.
> > >
> > > NFS performs a silly-rename as an 'asynchronous' operation.  One consequence
> > > of this is that NFS4ERR_DELAY always results in a delay of
> > > NFS4_POLL_RETRY_MAX (15*HZ), whereas sync requests use an exponential scale
> > > from _MIN to _MAX.
> > >
> > > So in my test case there is always a 15-second delay:
> > >   - try to silly-rename
> > >   - get NFS4ERR_DELAY
> > >   - server reclaims delegation
> > >   - 15 seconds pass
> > >   - retry silly-rename - it works.
> > >
> > > I hacked the NFS client to store a timeout in 'struct nfs_renamedata', and
> > > use the same exponential retry pattern and the 15 seconds (obviously)
> > > disappeared.
> > >
> > > Trond: would you accept a patch which did that more generally?  e.g. pass a
> > > timeout pointer to nfs4_async_handle_error() and have the various *_done
> > > functions pass a pointer to a field in their calldata?
> > 
> > It depends. If we're touching nfs4_async_handle_error, then I think we
> > should also convert nfs4_async_handle_error to use the same "struct
> > nfs4_exception" argument that we use for the synchronous case so that
> > we can share a bit more code.
> 
> I wonder why this hasn't been a major complaint before--is there
> something other servers are doing to mitigate the problem, or is
> renaming a delegated file just rarer than I would have expected?

Renaming a file isn't a problem, as that is synchronous and gets the
exponentially increasing sequence of timeouts, which starts small.

It is only the silly-rename which causes a problem as that is async
and so has a fixed large delay.
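
To make the difference concrete, here is a minimal user-space sketch of the
two retry policies.  It is not the kernel code: the names are hypothetical,
HZ is taken as 100 purely for illustration, and the lower bound of HZ/10 is
an assumption (only the 15*HZ upper bound is stated in this thread).

```c
/* Illustrative values in jiffies; HZ = 100 is an assumption. */
#define HZ 100
#define POLL_RETRY_MIN (HZ / 10)   /* assumed lower bound */
#define POLL_RETRY_MAX (15 * HZ)   /* upper bound quoted in this thread */

/*
 * Sync-style policy: return the delay to use for this retry and double
 * the stored timeout for the next one, clamped to [MIN, MAX].
 * The caller keeps *timeout across retries, starting from 0.
 */
static long next_retry_delay(long *timeout)
{
	long delay;

	if (*timeout <= 0)
		*timeout = POLL_RETRY_MIN;
	if (*timeout > POLL_RETRY_MAX)
		*timeout = POLL_RETRY_MAX;
	delay = *timeout;
	*timeout <<= 1;
	return delay;
}

/* Async-style policy as described above: always the maximum delay. */
static long fixed_retry_delay(void)
{
	return POLL_RETRY_MAX;
}
```

With a per-call timeout field, the first retry after NFS4ERR_DELAY would wait
a tenth of a second under these assumed values rather than the full 15
seconds, which is why keeping a timeout in the calldata removes the stall.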

The async operations are:
  close, unlink, rename, callback(?), write, commit, delegreturn,
  unlock, layoutget, layoutreturn, layoutcommit, free_stateid

The async versions of 'unlink' and 'rename' are only used for silly-delete
processing.  'rename' when the last link is dropped, then 'unlink' on last
close.
For the others, being async and possibly having a longer delay does not
look like a problem.

'rename' is a problem because until the rename completes, the file is still
visible in the namespace...

I don't really get why an async rename is used for silly-rename as the
nfs_async_rename() call is followed immediately by
	error = rpc_wait_for_completion_task(task);

so it looks synchronous.  I suspect there is a subtlety....

NeilBrown

