From: Chuck Lever <chuck.lever@oracle.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: reboot recovery
Date: Tue, 09 Mar 2010 16:07:31 -0500 [thread overview]
Message-ID: <4B96B893.4030300@oracle.com> (raw)
In-Reply-To: <20100309205349.GD26453@fieldses.org>
On 03/09/2010 03:53 PM, J. Bruce Fields wrote:
> On Tue, Mar 09, 2010 at 12:39:35PM -0500, Chuck Lever wrote:
>> Thanks, this is very clear.
>>
>> On 03/08/2010 08:46 PM, J. Bruce Fields wrote:
>>> The Linux server's reboot recovery code has long-standing architectural
>>> problems, fails to adhere to the specifications in some cases, and does
>>> not yet handle NFSv4.1 reboot recovery. An overhaul has been a
>>> long-standing todo.
>>>
>>> This is my attempt to state the problem and a rough solution.
>>>
>>> Requirements
>>> ^^^^^^^^^^^^
>>>
>>> Requirements, as compared to current code:
>>>
>>> - Correctly implements the algorithm described in section 8.6.3
>>> of rfc 3530, and eliminates known race conditions on recovery.
>>> - Does not attempt to manage files and directories directly from
>>> inside the kernel.
>>> - Supports RECLAIM_COMPLETE.
>>>
>>> Requirements, in more detail:
>>>
>>> A "server instance" is the lifetime from start to shutdown of a server;
>>> a reboot ends one server instance and starts another.
>>
>> It would be better if you architected this not in terms of a server
>> reboot, but in terms of "service nfs stop" and "service nfs start".
>
> Good point; fixed in my local copy.
>
> (Though that may work for v4-only servers, since I think v2/v3 may still
> have problems with restarts that don't restart everything (including the
> client).)
Well, eventually I hope to address some of those issues. But, no use
tying our NFSv4 stuff to the problems of the v2/v3 implementation.
>>> Draft design
>>> ^^^^^^^^^^^^
>>>
>>> We will modify rpc.statd to manage state in userspace.
>>
>> Please don't. statd is ancient krufty code that is already barely able
>> to do what it needs to do.
>>
>> statd is single-threaded. It makes dozens of blocking DNS calls to
>> handle NSM protocol requests. It makes NLM downcalls on the same thread
>> that handles everything else. Unless an effort was undertaken to make
>> statd multithreaded, this extra work could cause significant latency for
>> handling upcalls.
>
> Hm, OK. I guess I don't want to make this project dependent on
> rewriting statd.
>
> So, other possibilities:
> - Modify one of the other existing userland daemons.
> - Make a separate daemon just for this.
> - Ditch the daemon entirely and depend mainly on hotplug-like
> invocations of a userland program that exits after it handles
> a single call.
>
>>> Previous prototype code from CITI will be considered as a starting
>>> point.
>>>
>>> Kernel<->user communication will use four files in the "nfsd"
>>> filesystem. All of them will use the encoding used for rpc cache
>>> upcalls and downcalls, which consist of whitespace-separated fields
>>> escaped as necessary to allow binary data.
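For readers unfamiliar with that encoding: fields are space-separated and
newline-terminated, with whitespace and backslash inside a field escaped
so binary data survives. A sketch of an encoder, assuming the three-digit
octal (\ooo) escaping used by the kernel's rpc cache code; the function
name is mine, not the kernel's:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Escape one field of an rpc-cache-style upcall line.  Whitespace,
 * backslash, and NUL bytes are replaced with three-digit octal escapes
 * (\ooo) so that fields can be split on whitespace and still carry
 * arbitrary binary data.  Illustrative only. */
static size_t encode_field(char *out, size_t outlen,
			   const char *in, size_t inlen)
{
	size_t n = 0;

	for (size_t i = 0; i < inlen; i++) {
		unsigned char c = in[i];

		if (c == '\\' || c == '\0' || isspace(c)) {
			if (n + 5 > outlen)
				break;
			n += sprintf(out + n, "\\%03o", c);
		} else {
			if (n + 2 > outlen)
				break;
			out[n++] = c;
		}
	}
	out[n] = '\0';
	return n;
}
```

So a field containing "a b\c" would go over the wire as "a\040b\134c",
and the reader can split the line on unescaped whitespace.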
>>
>> In general, we don't want to mix RPC listeners and upcall file
>> descriptors. mountd has to access the cache file descriptors to satisfy
>> MNT requests, so there is a reason to do it in that case. Here there is
>> no purpose to mix these two. It only adds needless implementation
>> complexity and unnecessary security exposures.
>>
>> Yesterday, it was suggested that we split mountd into a piece that
>> handled upcalls and a piece that handled remote MNT requests via RPC.
>> Weren't you the one who argued in favor of getting rid of daemons called
>> "rpc.foo" for NFSv4-only operation? :-)
>
> Yeah. So I guess a subcase of the second option above would be to name
> the new daemon "nfsd-userland-helper" (or something as generic) and
> eventually make it handle export upcalls too. I don't know.
I wasn't thinking of a single daemon for this stuff, necessarily, but
rather a single framework that can be easily fit to whatever task is
needed. Just alter a few constants, specify the arguments and their
types, add boiling water, type 'make' and fluff with fork.
We've already got referral/DNS, idmapper, gss, and mountd upcalls, and
they all seem to do it differently from each other.
--
chuck[dot]lever[at]oracle[dot]com