From: Mikhail Pershin <Mikhail.Pershin@Sun.COM>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Recovering opens by reconstruction
Date: Fri, 03 Jul 2009 23:02:16 +0400
Message-ID: <op.uwh9t2mnatmt0c@garden>
In-Reply-To: <20090702223944.GR15302@Sun.COM>

On Fri, 03 Jul 2009 02:39:45 +0400, Nicolas Williams  
<Nicolas.Williams@sun.com> wrote:

> We're working on adding replay RPC signatures, so that clients may only
> replay RPCs that have been seen by the server (thus signed).

Could you explain that in more detail? By definition, every replayed RPC has
already been seen by the server, because the client got a reply from it. So
what is the purpose of such signing?

> I've been researching this (and talking to Eric B. and Oleg about this).
> Several possible solutions are evident.  I'll describe the one that
> seems most elegant to me (and, I think, Oleg), namely separate open
> state recovery from transaction recovery.
>
> Server-side high-level description:
>
>  - during recovery the MDS will first process anonymous open by FID RPCs
>    from new clients (these open RPCs will not have transaction IDs
>    assigned to them as they imply no actual transactions)
>
>  - then the MDS will accept replays from all clients, new and old

It is not clear what 'new' and 'old' mean here. If both 'new' and 'old'
clients have requests to replay, then both were active during the previous
server boot, so what is the difference between them?

>
>  - followed by lock recovery as usual
>
> Client-side high-level description:
>
>  - open processing will begin by sending an RPC as usual...,
>  - ... but on commit the md_open_data will be added to a doubly-linked
>    list of opens and the RPC will be removed from the PTLRPC replay
>    queue
>
>  - during recovery the client will begin by traversing the list of
>    md_open_data (open state), reconstruct an anonymous open by FID RPC
>    and send it to the MDS, and after that the client will replay
>    outstanding transactions' RPCs, followed by lock recovery

Hmm, but this is exactly how it works currently: the committed open replays
are sent first, followed by the normal replays. So you propose to separate
them just because they are not 'pure' replays, as you describe below?
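
To make the ordering under discussion concrete, here is a minimal sketch of
the proposed client-side behaviour. It is only a toy model, not Lustre code:
the Client class, its fields, and the tuple layout are all hypothetical names
chosen for illustration. On commit, an open leaves the replay queue and is
remembered as open state; during recovery the client first sends
reconstructed open-by-FID requests, then replays uncommitted RPCs, then
recovers locks.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Client:
    # Hypothetical model: RPCs not yet known to be committed, in transno order.
    replay_queue: deque = field(default_factory=deque)
    # Hypothetical stand-in for the list of md_open_data (committed opens).
    open_list: list = field(default_factory=list)
    locks: list = field(default_factory=list)

    def send_open(self, fid, transno):
        self.replay_queue.append(("open", fid, transno))

    def send_txn(self, name, transno):
        self.replay_queue.append((name, None, transno))

    def on_commit(self, last_committed):
        # Drop committed RPCs from the replay queue; committed opens are
        # kept as open state instead of as replayable RPCs.
        while self.replay_queue and self.replay_queue[0][2] <= last_committed:
            kind, fid, _ = self.replay_queue.popleft()
            if kind == "open":
                self.open_list.append(fid)

    def recovery_stream(self):
        # 1. reconstructed anonymous opens, 2. transaction replays, 3. locks
        out = [("open_by_fid", fid) for fid in self.open_list]
        out += [("replay", kind, transno)
                for kind, _, transno in self.replay_queue]
        out += [("lock_replay", lock) for lock in self.locks]
        return out
```

In this model a committed open never appears as a replay; only its FID
survives, which is the separation of open state from transaction recovery
that the proposal describes.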

>
> Old clients would recover as usual.
>
> Security is provided by the capabilities used in the anonymous open by
> FID RPCs and transport security.
>
> The general principle then would be:
>
>    RPC replaying is to be used only for recovering _transactions that
>    should not be outstanding for very long.
>
> Where "very long" is relative to the replay signature crypto key
> lifecycle, which will be on the order of days.
>
> Since opens are not transactions[*] and can stay "outstanding" forever,
> opens would not be suitable for recovery by replay under that principle.
> Open state is much more similar to DLM locks than transactions.
>
> Open recovery must precede uncommitted transaction recovery so as to
> ensure that open state is re-established before unlinks can be replayed
> that would cause the file to be destroyed.

That requires that the server not start replays from any client until
'open recovery' has finished for all of them. In fact, there is another
solution to the open-unlink problem, which was implemented in 1.8. During
recovery, an unlink replay doesn't delete the file but makes it an orphan,
even if the open count is 0. Orphans are already cleaned up once recovery
is over, so an open replay that arrives after the unlink will find the
orphan and open it.
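
A rough sketch of the orphan idea, again as a toy model rather than the
actual 1.8 code: the Mds class and its methods are hypothetical names. While
recovery is in progress, unlink always turns the file into an orphan; an open
arriving later can still find it; unreferenced orphans are destroyed only
when recovery finishes.

```python
class Mds:
    """Hypothetical model of orphan handling during recovery."""

    def __init__(self, files):
        self.namespace = set(files)
        self.orphans = set()
        self.open_count = {}
        self.recovering = True

    def unlink(self, fid):
        if fid not in self.namespace:
            return
        self.namespace.discard(fid)
        # During recovery keep the object as an orphan even if nobody has
        # it open yet; outside recovery keep it only while it is open.
        if self.recovering or self.open_count.get(fid, 0) > 0:
            self.orphans.add(fid)

    def open(self, fid):
        if fid not in self.namespace and fid not in self.orphans:
            return False  # object is already destroyed
        self.open_count[fid] = self.open_count.get(fid, 0) + 1
        return True

    def finish_recovery(self):
        self.recovering = False
        # Clean up orphans that no client reopened during recovery.
        self.orphans = {f for f in self.orphans
                        if self.open_count.get(f, 0) > 0}
```

With this behaviour the server does not need to wait for all opens before
accepting unlink replays, since the order of the two within recovery no
longer decides whether the file survives.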


-- 
Mikhail Pershin
Staff Engineer
Lustre Group
Sun Microsystems, Inc.
