* [Lustre-devel] MDWBC and how much to trust clients
@ 2008-10-06  2:53 Eric Barton
  2008-10-06  3:19 ` Peter Braam
  2008-10-06 15:55 ` Nikita Danilov
  0 siblings, 2 replies; 6+ messages in thread
From: Eric Barton @ 2008-10-06  2:53 UTC (permalink / raw)
  To: lustre-devel

Nikita,

Do you agree that a buggy or malicious MDWBC could disrupt the
namespace (e.g. links to missing files, orphaned files) if
it splits up operations across multiple MDTs into sub-operations
for the individual targets?  I think it will be an issue for
security if we just trust the MDWBC to do such operations
correctly, and so I'm wondering how we can fix this.  

Using a master MDT to coordinate the operation across itself and
the remaining MDTs seems to be part of the solution, but not all of it.
We have to process batches in bulk to retain a significant
performance advantage, so I wonder if that requires us to trust
that these batches have been created correctly.  

If so, we're stuck with the MDWBC being something we can only
do in a single trust domain - i.e. not across a WAN. That seems
unfortunate since WAN performance should be a major beneficiary
of the MDWBC.  Maybe in this case, we can still send batches over
the WAN, but to a single target which proxies for the remote client
and can be trusted to split multi-target ops over batches correctly.

Thoughts?

    Cheers,
              Eric


* [Lustre-devel] MDWBC and how much to trust clients
  2008-10-06  2:53 [Lustre-devel] MDWBC and how much to trust clients Eric Barton
@ 2008-10-06  3:19 ` Peter Braam
  2008-10-06 15:55 ` Nikita Danilov
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Braam @ 2008-10-06  3:19 UTC (permalink / raw)
  To: lustre-devel

We discussed this in Moscow recently.  It seems possible to avoid much
misbehavior by building relationships that have to be confirmed before a
commit can happen.

For example a directory entry creation must be accompanied by an object
creation or link-count change.

I think it is possible for an MDS or MDS cluster to know in which cases such
relationships need to be present for operations to transition the namespace
to a new state (and clients can indicate which operations are correlated).
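
A minimal sketch of such a relationship check, in Python. The record layout
(dicts with "op", "fid", "child" and "nlink_delta" keys) and the function
name are hypothetical and are not Lustre structures. It only illustrates the
rule that every new directory entry must be matched by an object creation or
a link-count change within the same batch, and vice versa.

```python
from collections import Counter


def unmatched_fids(batch):
    """Return the FIDs whose directory entries and link counts disagree.

    An empty result means every LINK in the batch is accompanied by a
    CREATE or an nlink change for the same FID (hypothetical record format).
    """
    needed = Counter()    # directory entries created per child FID
    provided = Counter()  # link counts accounted for per FID
    for op in batch:
        if op["op"] == "LINK":
            needed[op["child"]] += 1
        elif op["op"] == "CREATE":
            provided[op["fid"]] += 1
        elif op["op"] == "SETATTR":
            provided[op["fid"]] += op.get("nlink_delta", 0)
    fids = set(needed) | set(provided)
    return sorted(f for f in fids if needed[f] != provided[f])


good = [{"op": "CREATE", "fid": "fid1"},
        {"op": "LINK", "parent": "fid2", "name": "foo", "child": "fid1"}]
dangling = [{"op": "LINK", "parent": "fid2", "name": "foo", "child": "fid9"}]
orphan = [{"op": "CREATE", "fid": "fid7"}]
```

Here `unmatched_fids(good)` is empty, while the other two batches report the
FID of the dangling link and of the orphaned object respectively.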

Peter




* [Lustre-devel] MDWBC and how much to trust clients
  2008-10-06  2:53 [Lustre-devel] MDWBC and how much to trust clients Eric Barton
  2008-10-06  3:19 ` Peter Braam
@ 2008-10-06 15:55 ` Nikita Danilov
  2008-10-07  9:13   ` Nikita Danilov
  1 sibling, 1 reply; 6+ messages in thread
From: Nikita Danilov @ 2008-10-06 15:55 UTC (permalink / raw)
  To: lustre-devel

Eric Barton writes:
 > Nikita,

Hello,

 > 
 > Do you agree that a buggy or malicious MDWBC could disrupt the
 > namespace (e.g. links to missing files, orphaned files) if
 > it splits up operations across multiple MDTs into sub-operations
 > for the individual targets?  I think it will be an issue for
 > security if we just trust the MDWBC to do such operations
 > correctly, and so I'm wondering how we can fix this.  

As Peter mentioned, we discussed this topic during the Moscow
meeting. If I am not mistaken, we converged on the idea that before
committing an epoch, every MDT composes some kind of a `summary',
containing enough information for verifying global consistency,
and this summary is passed through every server as a ticket, with every
server `approving' some bits in the summary accumulated so far, and
adding new ones. For example, one server adds 

        (SETATTR: FID: fid1, UPDATE: nlink += 2) 

to the summary, then another server having 

        (LINK: PARENT_FID: fid2, NAME: "foo", CHILD_FID: fid1),

in its local epoch replaces the UPDATE part of the SETATTR record above
with nlink += 1, and yet another server with

        (LINK: PARENT_FID: fid3, NAME: "bar", CHILD_FID: fid1),

can cancel the SETATTR completely. Note that LINK might cancel UNLINK or
RENAME as well as SETATTR. Global consistency is verified when all
summary records are similarly canceled. All this is still very vague to
me:

    - it is not clear how to start summary exchange (round robin
      perhaps, based on an epoch number)?

    - what state should be kept in a summary?

    - is it always possible to prove consistency in one cycle?
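
The cancellation in the example above can be sketched in Python as follows.
This is only an illustration under simplifying assumptions: the summary is
modelled as a hypothetical dict from FID to outstanding nlink delta, only
SETATTR and LINK records are handled, and the tuple layouts are invented for
the sketch. It does not answer the open questions listed above.

```python
def apply_epoch(summary, local_ops):
    """Fold one server's local epoch records into the circulating summary."""
    for op in local_ops:
        if op[0] == "SETATTR":        # ("SETATTR", fid, nlink_delta)
            _, fid, delta = op
        elif op[0] == "LINK":         # ("LINK", parent_fid, name, child_fid)
            _, _, _, fid = op
            delta = -1                # each link accounts for one nlink
        else:
            continue                  # other record types are ignored here
        summary[fid] = summary.get(fid, 0) + delta
        if summary[fid] == 0:
            del summary[fid]          # record fully canceled
    return summary


def epoch_is_consistent(servers):
    """Pass the summary through every server; consistent if nothing is left."""
    summary = {}
    for local_ops in servers:
        apply_epoch(summary, local_ops)
    return not summary


servers = [
    [("SETATTR", "fid1", 2)],
    [("LINK", "fid2", "foo", "fid1")],
    [("LINK", "fid3", "bar", "fid1")],
]
```

With these three servers the summary goes from nlink += 2 to nlink += 1 and
is then canceled, so `epoch_is_consistent(servers)` holds; dropping the last
server leaves an uncanceled record.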


Nikita.


* [Lustre-devel] MDWBC and how much to trust clients
  2008-10-06 15:55 ` Nikita Danilov
@ 2008-10-07  9:13   ` Nikita Danilov
  2008-10-09 14:04     ` Peter Braam
  0 siblings, 1 reply; 6+ messages in thread
From: Nikita Danilov @ 2008-10-07  9:13 UTC (permalink / raw)
  To: lustre-devel

Nikita Danilov writes:
 > Eric Barton writes:
 >  > Nikita,
 > 
 > Hello,

[...]

 > 
 > As Peter mentioned, we discussed this topic during the Moscow
 > meeting. If I am not mistaken, we converged on the idea that before
 > committing an epoch, every MDT composes some kind of a `summary',
 > containing enough information for verifying global consistency,
 > and this summary is passed through every server as a ticket, with every

This can be simplified. Suppose the total amount of `data' describing all
updates within a given epoch is D, and there are N MD servers in a CMD
cluster. Then the total network traffic incurred by this algorithm is

             D   /* updates from client to all servers */ +
             D*N /* cycle summary through all servers */

that is, (N + 1)*D bytes, transferred in 2*N messages. So we won't
increase network traffic by broadcasting _all_ epoch updates to _every_
server (so that each server gets the complete set of all updates within
the epoch). In this latter case, servers can prove that the epoch is
consistent by

    - checking global consistency locally,

    - calculating an MD5 signature of all epoch updates, and

    - exchanging these signatures, to check that the client sent the same
      set of updates to everybody.

This results in

             D*N /* broadcast epoch updates to all servers */ +
             e*N /* exchange signatures */

that is, N*(D + e) bytes for some small e, transferred in 2*N messages.
This is no more than the (N + 1)*D above whenever N*e <= D. Having the
complete set of updates on every server would probably help in other
places too.
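
As a rough illustration of the two estimates and of the signature exchange,
here is a small Python sketch. The function names, the representation of
updates as strings, and the default e = 16 (the size in bytes of a raw MD5
digest) are assumptions made for the example only.

```python
import hashlib


def ring_traffic(D, N):
    """Bytes moved when a summary is cycled through all N servers."""
    return D + D * N


def broadcast_traffic(D, N, e=16):
    """Bytes moved when every server gets all updates plus a signature."""
    return N * (D + e)


def epoch_signature(updates):
    """MD5 over the updates in a canonical (sorted) order."""
    h = hashlib.md5()
    for u in sorted(updates):
        h.update(u.encode("utf-8"))
        h.update(b"\0")               # separator between records
    return h.hexdigest()


def same_updates_everywhere(per_server_updates):
    """True if every server computed the same signature for the epoch."""
    return len({epoch_signature(u) for u in per_server_updates}) == 1


honest = [["link foo fid1", "nlink fid1 +1"],
          ["nlink fid1 +1", "link foo fid1"]]
cheating = [["link foo fid1", "nlink fid1 +1"],
            ["link foo fid1"]]
```

For D = 1000 and N = 4 the two estimates come to 5000 and 4064 bytes. The
signature comparison accepts `honest` (same set, different order) and rejects
`cheating`, where the client sent one server a different set.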



Nikita.


* [Lustre-devel] MDWBC and how much to trust clients
  2008-10-07  9:13   ` Nikita Danilov
@ 2008-10-09 14:04     ` Peter Braam
  2008-10-09 16:13       ` Nikita Danilov
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Braam @ 2008-10-09 14:04 UTC (permalink / raw)
  To: lustre-devel

You'll need to limit this to the requests that have dependencies.  With the
algorithm you describe, every server starts looking at every request; that
probably kills the scaling you want to achieve.

Peter


* [Lustre-devel] MDWBC and how much to trust clients
  2008-10-09 14:04     ` Peter Braam
@ 2008-10-09 16:13       ` Nikita Danilov
  0 siblings, 0 replies; 6+ messages in thread
From: Nikita Danilov @ 2008-10-09 16:13 UTC (permalink / raw)
  To: lustre-devel

Peter Braam writes:
 > You'll need to limit this to the requests that have dependencies.  With the
 > algorithm you describe, every server starts looking at every request; that
 > probably kills the scaling you want to achieve.

I agree that the total amount of data can be reduced significantly, but
won't it sometimes be useful to have the complete epoch state on all
servers? E.g., we can do server->server replay forward instead of a
rollback.

After all, additional requests are only `looked at' rather than actually
processed. Moreover, the global consistency check can be done by a single
server (selected round-robin for each epoch), after which this server
sends an MD5 signature of the total epoch state to the other servers to
verify.
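
A sketch of this single-verifier variant, in Python. Choosing the verifier as
epoch modulo the number of servers is one possible reading of `round-robin',
and the function names, the string representation of updates and the
`is_consistent' callback are hypothetical.

```python
import hashlib


def verifier_for(epoch, n_servers):
    """Pick the checking server round-robin by epoch number (assumption)."""
    return epoch % n_servers


def digest(updates):
    """MD5 of the epoch state; assumes updates contain no newlines."""
    data = "\n".join(sorted(updates)).encode("utf-8")
    return hashlib.md5(data).hexdigest()


def verify_epoch(epoch, per_server_updates, is_consistent):
    """One server runs the full check; the rest only compare signatures."""
    v = verifier_for(epoch, len(per_server_updates))
    if not is_consistent(per_server_updates[v]):
        return False
    sig = digest(per_server_updates[v])   # sent to the other servers
    return all(digest(u) == sig for u in per_server_updates)


state = ["link foo fid1", "nlink fid1 +1"]
always_ok = lambda updates: True          # stand-in for the real check
```

Only the selected server pays for the consistency check; every other server
computes one digest over the updates it received and compares it with the
signature it was sent.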

 > 
 > Peter

Nikita.

