From mboxrd@z Thu Jan 1 00:00:00 1970
From: Peter Braam
Date: Sun, 01 Jun 2008 13:00:52 +0800
Subject: [Lustre-devel] Commit on share
In-Reply-To:
Message-ID:
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: lustre-devel@lists.lustre.org

On 5/27/08 11:43 PM, "Mikhail Pershin" wrote:

> Hello Peter,
>
> Thanks for review. Alexander is on vacation so I will answer as co-author.
>
> On Tue, 27 May 2008 14:44:18 +0400, Peter Braam wrote:
>
>> This HLD is definitely not ready at all. It is very short, lacks
>> interaction diagrams and the arguments made are not sufficiently
>> detailed.
>>
>> * the second sentence is not right. Commit should happen before
>> un-committed data coming from a client is shared with a 2nd client.
>
> Can you provide any issues with that? Committing data after but not before
> current operation has following benefits:

The document is written without any mention of read-only interaction with
the file system. On top of that the language was insufficiently clear,
meaning that I only understood what you wanted to do several pages further.
That means other people will encounter the same difficulty.

> 1) no need in starting/committing separate transaction, simple code due to
> that
> 2) less syncs. E.g. if we are committing current operation too then we
> resolve possible share case related to current operation and later commit
> will not needed. Therefore we will have less synced operations.
> E.g. the worst case for COS is:
>   client1 operation
>   client2 operation -> sync
>   client1 op -> sync
>   client2 op -> sync
>   ...
> COS will do commit for each operation if they are on the same object and
> sync is happened before current operation.
> But including the current operation in commit will reduce number of
> commits in twice:
>   client1 operation
>   client2 op -> sync (including this op)
>   client1 op - no sync because no uncommitted shared data
>   client2 op -> sync
>   ...
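The two sequences quoted above can be checked with a small counting model.
This is only a sketch, not Lustre code: cos_count_syncs() and enum
cos_policy are hypothetical names, and the model assumes that every
operation touches the same object and that the server remembers only the
client of the last uncommitted operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy names, used for illustration only. */
enum cos_policy {
	COS_SYNC_BEFORE_OP,	/* commit earlier changes, then run the op */
	COS_SYNC_WITH_OP	/* run the op, then commit it with the rest */
};

/*
 * Count forced syncs for a sequence of operations on ONE object.
 * clients[i] is the id of the client that issues operation i.
 */
size_t cos_count_syncs(const int *clients, size_t n, enum cos_policy policy)
{
	size_t syncs = 0;
	int have_uncommitted = 0;	/* is there an uncommitted change? */
	int last_client = -1;		/* client that made that change */
	size_t i;

	assert(clients != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* "Share": uncommitted data from another client is touched. */
		int shared = have_uncommitted && clients[i] != last_client;

		if (shared) {
			syncs++;
			if (policy == COS_SYNC_WITH_OP) {
				/* The current op is committed as well. */
				have_uncommitted = 0;
				continue;
			}
		}
		/* The current op stays uncommitted until the next sync. */
		have_uncommitted = 1;
		last_client = clients[i];
	}
	return syncs;
}
```

For six operations that alternate between two clients, this model counts 5
syncs with COS_SYNC_BEFORE_OP and 3 with COS_SYNC_WITH_OP. It says nothing
about read-only operations or about many clients on one directory.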
>

This may not be a worthwhile optimization, although it seems correct. Please
provide detailed use cases where it provides value. For example, with 1000
clients each creating one file in a directory, what is the quantitative
benefit? With one client creating a file and 999 clients doing a getattr,
you now have 999 locks blocking for completion - not convincing.

>> * Is COS dependent on VBR - no it is not, and can equally apply to normal
>> recovery
>
> Agree, COS is like just lever to turn sync commit on/off depending on some
> conditions. These conditions maybe quite simple like now - just
> comparison of clients - or maybe more complex and include checking of
> operation types, etc.
>
> But COS was requested initially as optional feature of VBR, so we didn't
> review COS-only configuration. Without VBR any missed client will invoke
> the eviction of all clients with replays after the gap. Therefore COS will
> not helps until we will change current recovery to don't evict clients if
> COS is enabled. But we should know actually was COS enabled before the
> server failure to be sure that the excluding gap transactions is safe. Do
> we need COS-only use-case actually?

Yes, we need COS with traditional recovery and a precise explanation why COS
with VBR adds any value over COS with traditional recovery. Again, use
numbers and exact facts.

>
>> * Section 3.2 is wrong: the recovery process will not fail with gaps in
>> the sequence when there is VBR. It only fails if there are gaps in the
>> versions, and this is rare.
>
> the 3.2 section is talking only about gap in versions. Maybe it is not
> correct grammatically though.
> "... Highly probably we have non-trivial gaps version in the sequence and
> the recovery process fails"
> Could you mark what is wrong with 3.2? just rewrite the sentence to make
> it more clear about what gaps we mean?

Exact detail, example use cases, no mumbling of complex ideas.
I also want to see precise flow charts of interactions upon reconnection
(this perhaps belongs in the VBR HLD), and of how the system transitions
from one recovery type to the next.

>
>> * 3.3 parallel creations in one directory are protected with different,
>> independent lock resources. Isn't that sufficient to allow parallel
>> operations with COS?
>
> it is HEAD feature, but this design is for 1.8 (1.6-based) Lustre with one
> dir lock. If this is not mentioned in HLD then it should be fixed.
> But the issue is not about the lock only.
> The 'simple' COS checks only clients nids to determine the dependency.
> Therefore if two clients are creating objects in the same directory then
> we will have frequent syncs due to COS (operations from different nids)
> although there is no need for sync at all because the operations are not
> dependent.

If they are not dependent then there should be no commits. But you have not
defined dependency in a precise way, so the HLD is hand-waving instead of
designing. In any case I absolutely don't want the hash. This has to be done
with commit callbacks unless the reasons not to do so become one order of
magnitude clearer.

> The same will be with parallel locking if we will not check type of
> operation to determine the dependency.
>
>> * 3.6 provide a detailed explanation please
> "When enabled, COS makes REP-ACK not needed."
> Basically the COS is about two things:
> 1) dependency tracking. This functionality which try to determine is
> current operation depending on some other uncommitted one.

"try to?" - so it sometimes fails?

> It may be
> simple

What kind of language is this?

> and check only nids of clients, maybe more sophisticated and
> include type of operation checking or any other additional data.

Without a definition of dependency, you can see why I have completely
rejected the HLD, and I will continue to do so.

> 2) COS itself, the doing sync commit of current operation if there is
> dependency.
>
> So if we have 1) and 2) we have only the following cases:
> - there is dependency determined and commit is needed to remove it. No ACK
> is needed.
> - there is no dependency and we don't need no ACK and lock nor commit
> because client's replays are not dependent
> Therefore the ACK is not needed in both cases. The COS don't need to wait
> on repack lock, it determine the share case and do commit.

In the HLD, state and define in a 100% accurate manner why REP ACKS are
needed, and prove that with COS they are not. This clearly depends on
precise definitions.

>
> how ACK is related to 'simple' COS (the only client NIDs are matter):
> 1) client1 did operation and lock object until ACK from it will come to
> server
> 2) client2 is waiting for ACK or commit to access the object
> 3) if there was no commit yet, then client2 determine the sharing exists
> and force commit
>
> The only positive effect of ACK is delay before doing sync, that give us
> the chance to wait for commit without doing force sync. But that can be
> done with timer to get the same results.

No timers - end of discussion.

>
> In HLD we propose the following:
> 1) client1 got lock, did operation, unlock object after operation is done
> 2) client2 got lock on object and check was there the dependency
> 3) if dependency then force commit (or wait for it as alternative way)
> 4) otherwise update dependency info for next check, unlock object when
> operation is done
>
> This is generic way and will work with any dependency tracking (on NIDs,
> on types of operations, etc.)

Two clients is not a sufficient argument possibly. Please put explanations
in the HLD and supply a new one.

>
>> * GC thread is wrong mechanism - this is what we have commit callbacks for

No GC - end of discussion.

>
> Well, with callbacks we have to scan through all hash to find data to
> delete on each callback.
> As alex said there can be about 10K uncommitted
> transactions in high load easily, therefore using callback may become the
> bottleneck. There was discussion recently in devel@ about that originated
> by zam. Although I agree the HLD should be clear about why we choose that
> way and what is wrong with another.
>
>> * Why not use the DLM, then we can simply keep the client waiting - the
>> mechanism already exists for repack; I am not convinced at all by the
>> reasoning that rep-ack is so different - no real facts are quoted
>
> Let's estimate how RepACK lock is suitable as dependency tracking
> functionality.

Without better definitions, the arguments below cannot be judged.

> In fact it is more like 'possible dependency prevention'
> mechanism, and block object always because we can't predict the next
> operation, so should keep lock taken for ALL modifying operations. It is
> not 'tracking' but 'prediction' mechanism, it blocks access to the object
> until client will got reply just because the conflicting operation is
> possible but not because it really happen.
> Moreover it conflicts in general with dependency tracking we needed,
> because it will serialize operations even when they may not depend.
>
> With RepACK lock we are entering in operation AFTER the checks and we
> don't know the result of this check - was there operation from different
> client? are changes committed? Should we do sync or not? RepACK lock
> doesn't answer this question and we can't decide about sync is needed or
> not.
>
> For example, the client2 will wait for commit or ACK before entering in
> locked area.
> 1) ACK is got but no commit yet. So client2 enter in locked area and now
> should determine was commit done or not. How to do that? This is vital
> because if there was no commit yet then we should do it. We may use
> version of object possibly and check it against last_committed, but this
> will work only with VBR.
> So we need extra data per-object like transno.
> 2) Commit was done. We should still do the same as for 1) to be sure about
> was commit done or not because it is not known why lock was unlocked - due
> to ACK or commit.
> 3) But we don't know still is there conflict or not because we should
> check client uuids, but we don't store such info anywhere and waiting on
> lock is not reflected somehow. So we need extra data (or extra information
> from ldlm?) again to store uuid of client who did latest operation on that
> object.
>
> The only way how dlm can work without any additional data is to unlock
> only when commit. But in that case we don't need COS at all. Each
> conflicting client will wait on lock until previous changes will be
> committed. But this may lead to huge latency for requests, comparing with
> commit interval and it is not what we need.
>
>> * It is left completely without explanation how the hash table (which I
>> think we don't need/want) is used
>
> hash table store the following data per object:
> struct lu_dep_info {
>         struct ll_fid   di_object;
>         struct obd_uuid di_client;
>         __u64           di_transno;
> };
>
> it contains uuid of client and transno of last change from this client.
> The uuid is compared to determine is there is conflict or not, the transno
> shows was that data committed already or not. I described above why it is
> needed. It is 1.6-related issue because we have only inode of object and
> no any extra structure. The HEAD has lu_object enveloping inodes, and hash
> will not needed, the dependency info may be stored per lu_object.
>
>>
>> Regards,
>>
>> Peter
>
>
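The check described for struct lu_dep_info above (compare the client uuid,
then compare the transno against last_committed) might look roughly like
the sketch below. It is an illustration under stated assumptions, not the
HLD's implementation: struct dep_fid, struct dep_uuid and struct dep_info
are simplified stand-ins for the Lustre types, and dep_needs_sync() and
dep_update() are hypothetical helper names. How the record is found (hash
table or per-lu_object storage) and how it is cleaned up is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for ll_fid and obd_uuid; NOT the real definitions. */
struct dep_fid  { uint64_t id; uint32_t generation; };
struct dep_uuid { char uuid[40]; };

/* Mirrors the fields of the quoted lu_dep_info. */
struct dep_info {
	struct dep_fid  di_object;	/* object the record belongs to */
	struct dep_uuid di_client;	/* client that made the last change */
	uint64_t        di_transno;	/* transno of that last change */
};

/*
 * Return 1 when an operation from 'client' touches a change that is still
 * uncommitted and was made by a different client, 0 otherwise.
 */
int dep_needs_sync(const struct dep_info *di, const struct dep_uuid *client,
		   uint64_t last_committed)
{
	assert(di != NULL && client != NULL);

	/* The last change is already on disk: nothing to depend on. */
	if (di->di_transno <= last_committed)
		return 0;

	/* Uncommitted change: a conflict only if the client differs. */
	return strncmp(di->di_client.uuid, client->uuid,
		       sizeof(client->uuid)) != 0;
}

/* Record 'client' and 'transno' as the latest change on the object. */
void dep_update(struct dep_info *di, const struct dep_uuid *client,
		uint64_t transno)
{
	assert(di != NULL && client != NULL);
	di->di_client = *client;
	di->di_transno = transno;
}
```

In this sketch the server would call dep_needs_sync() under the object lock
and, if it returns 1, force (or wait for) a commit; otherwise it would call
dep_update() with the transno of the new operation before unlocking. The
sketch compares clients only, so it does not address the open question of
how dependency is defined.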