All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Braam <Peter.Braam@Sun.COM>
To: lustre-devel@lists.lustre.org
Subject: [Lustre-devel] Commit on share
Date: Sun, 01 Jun 2008 13:00:52 +0800	[thread overview]
Message-ID: <C4684E04.5626%peter.braam@sun.com> (raw)
In-Reply-To: <op.ubtknjyjatmt0c@goga>




On 5/27/08 11:43 PM, "Mikhail Pershin" <Mikhail.Pershin@Sun.COM> wrote:

> Hello Peter,
> 
> Thanks for review. Alexander is on vacation so I will answer as co-author.
> 
> On Tue, 27 May 2008 14:44:18 +0400, Peter Braam <Peter.Braam@Sun.COM>
> wrote:
> 
>> This HLD is definitely not ready at all.  It is very short, lacks
>> interaction diagrams and the arguments made are not sufficiently
>> detailed.
>> 
>> * the second sentence is not right.  Commit should happen before
>> un-committed data coming from a client is shared with a 2nd client.
> 
> Can you provide any issues with that? Committing data after but not before
> current operation has following benefits:

The document is written without any mention of read only interaction with
the file system.  On top of that the language was insufficiently clear,
meaning that I only understood what you wanted to do several pages further.
That means other people will encounter the same difficulty.


> 1) no need in starting/commiting separate transaction, simple code due to
> that
> 2) less syncs. E.g. if we are committing current operation too then we
> resolve possible share case related to current operation and later commit
> will not needed. Therefore we will have less synced operations.
> E.g. the worst case for COS is:
> client1 operation
> client2 operation -> sync
> client1 op -> sync
> client2 op -> sync
> ...
> COS will do commit for each operation if they are on the same object and
> sync is happened before current operation.
> But including the current operation in commit will reduce number of
> commits in twice:
> client1 operation
> client2 op -> sync (including this op)
> client1 op - no sync because no uncommitted shared data
> client2 op -> sync
> ...
> 

This may not be a worthwhile optimization, although it seems correct.
Please provide detailed use cases where it provides value.

For example with 1000 clients creating each one file in a directory, what is
the quantative benefit?

With one client creating a file and 999 clients doing a getattr, you now
have 999 locks blocking for completion - not convincing.

>> * Is COS dependent on VBR ? no it is not, and can equally apply to normal
>> recovery
> 
> Agree, COS is like just lever to turn sync commit on/off depending on some
> conditions. These conditions maybe quite simple like now - just
> comparision of clients - or maybe more complex and include checking of
> operation types, etc.
> 
> But COS was requested initially as optional feature of VBR, so we didn't
> review COS-only configuration. Without VBR any missed client will invoke
> the eviction of all clients with replays after the gap. Therefore COS will
> not helps until we will change current recovery to don't evict clients if
> COS is enabled. But we should know actually was COS enabled before the
> server failure to be sure that the excluding gap transactions is safe. Do
> we need COS-only use-case actually?

Yes, we need COS with traditional recovery and a precise explanation why COS
with COS adds any value over COS with traditional recovery.  Again, use
numbers and exact facts.


> 
>> * Section 3.2 is wrong: the recovery process will not fail with gaps in
>> the
>> sequence when there is VBR.  It only fails if there are gaps in the
>> versions, and this is rare.
> 
> the 3.2 section is talking only about gap in versions. Maybe it is not
> correct grammatically though.
> "... Highly probably we have non-trivial gaps version in the sequence and
> the recovery process fails"
> Could you mark what is wrong with 3.2? just rewrite the sentence to make
> it more clear about what gaps we mean?

Exact detail, example use cases, no mumbling of complex ideas.

I also want to see precise flow charts of interactions upon reconnection
(this perhaps belongs in the VBR HLD), how does the system transition from
one recovery type to the next.

> 
>> * 3.3 parallel creations in one directory are protected with different,
>> independent lock resources.  Isn?t that sufficient to allow parallel
>> operations with COS?
> 
> it is HEAD feature, but this design is for 1.8 (1.6-based) Lustre with one
> dir lock. If this is not mentioned in HLD then it should be fixed.
> But the issue is not about the lock only.
> The 'simple' COS checks only clients nids to determine the dependency.
> Therefore if two clients are creating objects in the same directory then
> we will have frequent syncs due to COS (operations from different nids)
> although there is no need for sync at all becase the operations are not
> dependent.

If they are not dependent then there should be no commits.  But, you have
not defined dependency in a precise way, so the HLD is hand-waving instead
of designing.

In any case I absolutely don't want the hash.  This has to be done with
commit callbacks unless the reasons not to do so become one order of
magnitude clearer.

> The same will be with parallel locking if we will not check type of
> operation to determine the dependency.
> 
>> * 3.6 provide a detailed explanation please
> "When enabled, COS makes REP-ACK not needed."
> Basically the COS is about two things:
> 1) dependency tracking. This functionality which try to determine is
> current operation depending on some other uncommitted one.

"try to?" - so it sometimes fails?

> It may be  
> simple 

What kind of language is this?

>and check only nids of clients, maybe more sophisticated and
> include type of operation checking or any other additional data.

Without a definition of dependency, you can see why I have completely
rejected the HLD, and I will continue to do so.

> 2) COS itself, the doing sync commit of current operation if there is
> dependency.
> 
> So if we have 1) and 2) we have only the following cases:
> - there is dependency determined and commit is needed to remove it. No ACK
> is needed.
> - there is no dependency and we don't need no ACK and lock nor commit
> because client's replays are not dependent
> Therefore the ACK is not needed in both cases. The COS don't need to wait
> on repack lock, it determine the share case and do commit.

In the HLD state and define in 100% accurate manner why REP ACKS are needed,
and prove that with COS they are not.  This clearly depends on precise
definitions.

> 
> how ACK is related to 'simple' COS (the only client NIDs are matter):
> 1) client1 did operation and lock object until ACK from it will come to
> server
> 2) client2 is waiting for ACK or commit to access the object
> 3) if there was no commit yet, then client2 determine the sharing exists
> and force commit
> 
> The only positive effect of ACK is delay before doing sync, that give us
> the chance to wait for commit without doing force sync. But that can be
> done with timer to get the same results.

No timers - end of discussion.

> 
> In HLD we propose the following:
> 1) client1 got lock, did operation, unlock object after operation is done
> 2) client2 got lock on object and check was there the dependency
> 3) if dependency then force commit (or wait for it as alternative way)
> 4) otherwise update dependency info for next check, unlock object when
> operation is done
> 
> This is generic way and will work with any dependency tracking (on NIDs,
> on types of operations, etc.)

Two clients is not a sufficient argument possibly.


Please put explanations in the HLD and supply a new one.


> 
>> * GC thread is wrong mechanism this is what we have commit callbacks for

No GC - end of discussion.

> 
> Well, with callbacks we have to scan through all hash to find data to
> delete on each callback. As alex said there can be about 10K uncommitted
> transactions in high load easily, therefore using callback may become the
> bottlneck. There was discussion recently in devel@ about that originated
> by zam. Although I agree the HLD should be clear about why we choose that
> way and what is wrong with another.
> 
>> * Why not use the DLM, then we can simply keep the client waiting ? the
>> mechanism already exists for repack; I am not convinced at all by the
>> reasoning that rep-ack is so different ? no real facts are quoted
> 
> Let's estimate how RepACK lock is suitable as dependency tracking
> functionality.

Without better definitions, the arguments below cannot be judged.

> In fact it is more like 'possible dependency prevention'
> mechanism, and block object always because we can't predict the next
> operation, so should keep lock taken for ALL modifying operations. It is
> not 'tracking' but 'prediction' mechanism, it blocks access to the object
> until client will got reply just because the conflicting operation is
> possible but not because it really happen.
> Moreover it conflicts in general with dependency tracking we needed,
> because it will serialize operations even when they may not depend.
> 
> With RepACK lock we are entering in operation AFTER the checks and we
> don't know the result of this check - was there operation from different
> client? are changes committed? Should we do sync or not? RepACK lock
> doesn't answer this question and we can't decide about sync is needed or
> not.
> 
> For example, the client2 will wait for commit or ACK before entering in
> locked area.
> 1) ACK is got but no commit yet. So client2 enter in locked area and now
> should determine was commit done or not. How to do that? This is vital
> because if there was no commit yet then we should do it. We may use
> version of object possible and check it against last_committed, but this
> will work only with VBR.
> So we need extra data per-object like transno.
> 2) Commit was done. We should still do the same as for 1) to be sure about
> was commit done or not because it is not known why lock was unlocked - due
> to ACK or commit.
> 3) But we don't know still is there conflict or not because we should
> check client uuids, but we don't store such info anywhere and waiting on
> lock is not reflected somehow. So we need extra data (or extra information
>  from ldlm?) again to store uuid of client who did latest operation on that
> object.
> 
> The only way how dlm can work without any additional data is to unlock
> only when commit. But in that case we don't need COS at all. Each
> conflicting client will wait on lock until previous changes will be
> committed. But this may lead to huge latency for requests, comparing with
> commit interval and it is not what we need.
> 
>> * It is left completely without explanation how the hash table (which I
>> think we don?t need/want) is used
> 
> hash table store the following data per object:
> struct lu_dep_info {
>          struct ll_fid     di_object;
>          struct obd_uuid   di_client;
>          __u64             di_transno;
> };
> 
> it contains uuid of client and transno of last change from this client.
> The uuid is compared to determine is there is conflict of not, the transno
> shows was that data committed already or not. I described above why it is
> needed. It is 1.6-related issue because we have only inode of object and
> no any extra structure. The HEAD has lu_object enveloping inodes, and hash
> will not needed, the dependency info may be stored per lu_object.
> 
>> 
>> Regards,
>> 
>> Peter
> 
> 

  reply	other threads:[~2008-06-01  5:00 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-05-27 10:44 [Lustre-devel] Commit on share Peter Braam
2008-05-27 15:43 ` Mikhail Pershin
2008-06-01  5:00   ` Peter Braam [this message]
2008-05-29 17:42 ` Mikhail Pershin
2008-05-31  2:45   ` Andreas Dilger
2008-05-31  9:37     ` Alex Zhuravlev
2008-06-01  7:03     ` Mikhail Pershin
2008-06-03 18:41       ` Andreas Dilger
2008-06-03 18:56         ` Alex Zhuravlev
2008-06-01 16:54     ` Alex Zhuravlev
2008-06-02  8:42 ` Alex Zhuravlev
2008-06-03 18:50   ` Andreas Dilger
2008-06-04  1:11     ` Peter Braam
2008-06-04 10:50     ` Nikita Danilov
2008-06-11 14:21 ` Alexander Zarochentsev
2008-06-11 14:35   ` Alex Zhuravlev
2008-06-11 15:26   ` Peter Braam
2008-06-11 16:27     ` Alex Zhuravlev
2008-06-11 16:28       ` Peter Braam
2008-06-11 16:46     ` Alexander Zarochentsev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=C4684E04.5626%peter.braam@sun.com \
    --to=peter.braam@sun.com \
    --cc=lustre-devel@lists.lustre.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.