From: "J. Bruce Fields" <bfields@fieldses.org>
To: Greg Banks <gnb-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>
Cc: Neil Brown <neilb@suse.de>, NFS list <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH,RFC] more graceful sunrpc cache updates for HA
Date: Mon, 12 Jan 2009 10:51:46 -0500
Message-ID: <20090112155146.GA24322@fieldses.org>
In-Reply-To: <496B1A7E.80807-cP1dWloDopni96+mSzHFpQC/G2K4zDHf@public.gmane.org>

On Mon, Jan 12, 2009 at 09:25:02PM +1100, Greg Banks wrote:
> The kernel keeps an effectively global generation number (genid). 
> Interfaces are provided to userspace to allow querying the genid, and to
> allow atomically incrementing and querying the genid.  After exportfs
> makes a change to etab it asks the kernel to increment the genid.  When
> mountd wants to know if the etab has changed, it checks the genid and
> re-reads etab if the genid has changed since the last read.  The export
> updates that mountd writes into the kernel are tagged with the genid
> that mountd thinks they belong to, and this is stored in the cache
> entry.  Missing is a hunk to make cache_fresh() compare the genids of
> the entry and the cache_detail and if they differ start an upcall (but
> *not* consider the entry invalid, i.e. behave like the age >
> refresh_age/2 case).

So the result is just to give userspace a way to tell the kernel that it
should start making upcalls without yet dropping the existing cache
entries?
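
For concreteness, a minimal sketch of the check as I read the
description above. This is not the actual sunrpc code: the struct
names, the three-way verdict, and cache_check_sketch() are all
hypothetical stand-ins, and the only point is that a genid mismatch is
treated like the age > refresh_age/2 case (start an upcall, keep using
the entry).

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical, simplified stand-ins for the sunrpc cache structures. */
struct cache_detail_sketch {
	unsigned long genid;	/* current generation, bumped for exportfs */
};

struct cache_entry_sketch {
	unsigned long genid;	/* generation mountd tagged this entry with */
	time_t last_refresh;
	time_t expiry_time;
};

enum cache_verdict {
	CACHE_VALID,		/* use the entry as is */
	CACHE_VALID_REFRESH,	/* use the entry, but start an upcall */
	CACHE_EXPIRED		/* the entry must not be used */
};

static enum cache_verdict
cache_check_sketch(const struct cache_detail_sketch *cd,
		   const struct cache_entry_sketch *e, time_t now)
{
	time_t refresh_age = e->expiry_time - e->last_refresh;
	time_t age = now - e->last_refresh;

	if (now >= e->expiry_time)
		return CACHE_EXPIRED;
	/* Assumed new hunk: a stale genid asks for a refresh only. */
	if (e->genid != cd->genid)
		return CACHE_VALID_REFRESH;
	if (age > refresh_age / 2)
		return CACHE_VALID_REFRESH;
	return CACHE_VALID;
}
```

Note that in this sketch a stale-genid entry keeps being served until
the upcall completes or the entry expires on its own.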

I'd like to guarantee that nfsd behavior reflects the updated exports
by the time exportfs returns.  From your description, it doesn't sound
like you're trying to meet such a guarantee?  Or is there some way for
exportfs to wait till it sees the updates made?

It also might be possible to teach exportfs and/or mountd how to write
the "diff" between the current kernel exports and the new exports into
the export cache.
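
A rough sketch of what computing such a "diff" could look like,
assuming each export can be reduced to a string key and that both lists
are already sorted. The function and type names are made up for
illustration and exist in neither exportfs nor mountd; an entry whose
options changed but whose key did not would need an extra comparison
that is left out here.

```c
#include <stddef.h>
#include <string.h>

enum diff_op { DIFF_REMOVE, DIFF_ADD };

struct diff_item {
	enum diff_op op;
	const char *key;
};

/*
 * Merge-walk two lists sorted by strcmp() and record what has to be
 * removed from or added to the old list to arrive at the new one.
 * "out" must have room for n_old + n_new items.  Returns the number
 * of items written.
 */
static size_t
export_diff_sketch(const char *const *old, size_t n_old,
		   const char *const *new_, size_t n_new,
		   struct diff_item *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < n_old || j < n_new) {
		int c;

		if (i == n_old)
			c = 1;		/* only new entries are left */
		else if (j == n_new)
			c = -1;		/* only old entries are left */
		else
			c = strcmp(old[i], new_[j]);

		if (c == 0) {
			i++;		/* present in both: nothing to write */
			j++;
		} else if (c < 0) {
			out[n].op = DIFF_REMOVE;
			out[n].key = old[i++];
			n++;
		} else {
			out[n].op = DIFF_ADD;
			out[n].key = new_[j++];
			n++;
		}
	}
	return n;
}
```

Only the resulting items would then need to be written into the export
cache, leaving unchanged entries alone.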

> a) allow large NFS calls to be deferred, up to the maximum wsize rather
> than just a page, or
> 
> b) change call deferral to always block the calling thread instead of
> using a deferral record and returning -EAGAIN

Any deferral method sufficient to handle reads and writes already
requires saving a fair amount of state, so I wonder whether avoiding
the extra overhead of keeping another thread around is worth the
trouble.

--b.

> Both approaches have interesting and potentially frightening side
> effects, but could be made to work.  I've discussed option b) with
> Bruce, and I understand the NFSv4.1 guys have their own reasons for
> wanting to do something like that.  Maybe the above will help explain
> why the current call deferral behaviour gives me the irrits :-)
