Re: reboot recovery - Chuck Lever


From: Chuck Lever <chuck.lever@oracle.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: reboot recovery
Date: Tue, 09 Mar 2010 16:07:31 -0500
Message-ID: <4B96B893.4030300@oracle.com>
In-Reply-To: <20100309205349.GD26453@fieldses.org>

On 03/09/2010 03:53 PM, J. Bruce Fields wrote:
> On Tue, Mar 09, 2010 at 12:39:35PM -0500, Chuck Lever wrote:
>> Thanks, this is very clear.
>>
>> On 03/08/2010 08:46 PM, J. Bruce Fields wrote:
>>> The Linux server's reboot recovery code has long-standing architectural
>>> problems, fails to adhere to the specifications in some cases, and does
>>> not yet handle NFSv4.1 reboot recovery.  An overhaul has been a
>>> long-standing todo.
>>>
>>> This is my attempt to state the problem and a rough solution.
>>>
>>> Requirements
>>> ^^^^^^^^^^^^
>>>
>>> Requirements, as compared to current code:
>>>
>>> 	- Correctly implements the algorithm described in section 8.6.3
>>> 	  of RFC 3530, and eliminates known race conditions on recovery.
>>> 	- Does not attempt to manage files and directories directly from
>>> 	  inside the kernel.
>>> 	- Supports RECLAIM_COMPLETE.
>>>
>>> Requirements, in more detail:
>>>
>>> A "server instance" is the lifetime from start to shutdown of a server;
>>> a reboot ends one server instance and starts another.
>>
>> It would be better if you architected this not in terms of a server
>> reboot, but in terms of "service nfs stop" and "service nfs start".
>
> Good point; fixed in my local copy.
>
> (Though that may work only for v4-only servers, since I think v2/v3 may still
> have problems with restarts that don't restart everything (including the
> client).)

Well, eventually I hope to address some of those issues.  But, no use 
tying our NFSv4 stuff to the problems of the v2/v3 implementation.

>>> Draft design
>>> ^^^^^^^^^^^^
>>>
>>> We will modify rpc.statd to manage state in userspace.
>>
>> Please don't.  statd is ancient krufty code that is already barely able
>> to do what it needs to do.
>>
>> statd is single-threaded.  It makes dozens of blocking DNS calls to
>> handle NSM protocol requests.  It makes NLM downcalls on the same thread
>> that handles everything else.  Unless an effort was undertaken to make
>> statd multithreaded, this extra work could cause significant latency for
>> handling upcalls.
>
> Hm, OK.  I guess I don't want to make this project dependent on
> rewriting statd.
>
> So, other possibilities:
> 	- Modify one of the other existing userland daemons.
> 	- Make a separate daemon just for this.
> 	- Ditch the daemon entirely and depend mainly on hotplug-like
> 	  invocations of a userland program that exits after it handles
> 	  a single call.
>
>>> Previous prototype code from CITI will be considered as a starting
>>> point.
>>>
>>> Kernel<->user communication will use four files in the "nfsd"
>>> filesystem.  All of them will use the encoding used for rpc cache
>>> upcalls and downcalls, which consist of whitespace-separated fields
>>> escaped as necessary to allow binary data.
>>
>> In general, we don't want to mix RPC listeners and upcall file
>> descriptors.  mountd has to access the cache file descriptors to satisfy
>> MNT requests, so there is a reason to do it in that case.  Here there is
>> no purpose to mix these two.  It only adds needless implementation
>> complexity and unnecessary security exposures.
>>
>> Yesterday, it was suggested that we split mountd into a piece that
>> handled upcalls and a piece that handled remote MNT requests via RPC.
>> Weren't you the one who argued in favor of getting rid of daemons called
>> "rpc.foo" for NFSv4-only operation? :-)
>
> Yeah.  So I guess a subcase of the second option above would be to name
> the new daemon "nfsd-userland-helper" (or something as generic) and
> eventually make it handle export upcalls too.  I don't know.

I wasn't thinking of a single daemon for this stuff, necessarily, but 
rather a single framework that can be easily fit to whatever task is 
needed.  Just alter a few constants, specify the arguments and their 
types, add boiling water, type 'make' and fluff with fork.

We've already got referral/DNS, idmapper, gss, and mountd upcalls, and 
they all seem to do it differently from each other.

-- 
chuck[dot]lever[at]oracle[dot]com


Thread overview: 10+ messages
2010-03-09  1:46 reboot recovery J. Bruce Fields
2010-03-09 14:46 ` Andy Adamson
2010-03-09 14:53   ` J. Bruce Fields
2010-03-09 14:55     ` William A. (Andy) Adamson
2010-03-09 15:10       ` J. Bruce Fields
2010-03-09 15:17         ` William A. (Andy) Adamson
2010-03-09 16:11           ` J. Bruce Fields
2010-03-09 17:39 ` Chuck Lever
2010-03-09 20:53   ` J. Bruce Fields
2010-03-09 21:07     ` Chuck Lever [this message]
