* [Lustre-devel] some thoughts on COS
From: Alex Zhuravlev @ 2008-06-05 12:27 UTC
To: lustre-devel
Hi,
I think it would be great if we could use LDLM for COS. at first glance
it looks possible:
1) each server lock is tagged with a unique client id (uuid/export addr/etc)
2) the MDS registers its own blocking AST function
3) locks used for rep-acks aren't released upon ACK, only upon commit
4) whenever LDLM observes a conflict at enqueue time, the MDS's blocking AST
function is called and, depending on whether the conflicting locks were taken
on behalf of the same client or of different clients, the function issues a
sync, causing a commit, after which the old lock is released
but one use case isn't that obvious. it's OK when the first lock L1 was from
client C1 in PW mode and the new lock is also from C1/PW. but then we have the
case of the same client where the locks are PW then PR:
1) we wouldn't want to sync just because a client does mkdir a; touch a
2) thus we have to grant the PR lock (so, first problem: sometimes PW and PR
don't conflict?)
3) if we cancel PW to grant PR, then we'd have to make this PR conflict with
any PR coming from a different client?
4) change PR to PW in order to inherit the state? (the client side doesn't
expect such locks)
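Problem 2 comes from ordinary DLM semantics: PW and PR are incompatible no matter which client holds them. A sketch of the classic DLM mode compatibility matrix, which LDLM's modes (NL, CR, CW, PR, PW, EX) mirror; the enum and function names are illustrative, not LDLM's own identifiers.

```c
#include <assert.h>
#include <stdbool.h>

enum dlm_mode { NL, CR, CW, PR, PW, EX, MODE_COUNT };

/* Classic DLM compatibility matrix: compat[held][requested]. */
static const bool compat[MODE_COUNT][MODE_COUNT] = {
        /*          NL     CR     CW     PR     PW     EX    */
        /* NL */ { true,  true,  true,  true,  true,  true  },
        /* CR */ { true,  true,  true,  true,  true,  false },
        /* CW */ { true,  true,  true,  false, false, false },
        /* PR */ { true,  true,  false, true,  false, false },
        /* PW */ { true,  true,  false, false, false, false },
        /* EX */ { true,  false, false, false, false, false },
};

/* Mode-only check: there is no notion of "same client" here. */
static bool modes_compatible(enum dlm_mode held, enum dlm_mode req)
{
        assert(held < MODE_COUNT && req < MODE_COUNT);
        return compat[held][req];
}
```

So for `mkdir a; touch a` from one client (PW held, then PR requested) the table alone reports a conflict; any client-aware exception would have to live outside the matrix.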
all of this doesn't sound like a good solution, IMHO. at the least it'd require
serious changes in LDLM, while we're talking about 1.6/1.8 ... so we need
another way.
probably we could re-use VBR: each inode change comes with a new persistent
version, and the version is numerically equal to the transno, so by comparing
the inode's version with the last committed transno we can learn whether the
inode is committed?
the next problem is learning the source of a change, i.e. the client. in the
worst case all changes come from different clients, so every change means a
sync. but if we *cache* the source information we can probably avoid the
majority of syncs. IOW, we don't need to track the source all the time; it
should be enough to have this information most of the time. so, storing it in
the in-core inode is probably good enough. this way we don't need to care
about the inode's lifetime.
thanks, Alex
From: Peter Braam @ 2008-06-07 14:00 UTC
To: lustre-devel
Somehow making this work seems very attractive. I really think that
clarifying the definitions further might reveal a solution.
Peter
From: Alexander Zarochentsev @ 2008-06-10 18:50 UTC
To: lustre-devel
hello Alex,
On 5 June 2008 16:27:27 Alex Zhuravlev wrote:
> Hi,
>
> I think it could be great if we can use LDLM for COS. at the very
> first view it looks possible:
> 1) each server lock is tagged with client unique id (uuid/export
> addr/etc)
> 2) mds registers own blocking AST function
> 3) locks to be used rep-ack's aren't released upon ACK, only upon
> commit
> 4) whenever conflict is observed by LDLM at enqueue time,
> MDS's blocking AST function is called and depending on whether
> conflicting locks are taken on behalf same or different clients, the
> function issues sync causing commit and old lock to be released later
There could be a dependency between operations with no lock conflict at
all, simply because the PW lock has already been released while the
changes are not yet committed. Then we get no blocking AST and no commit.
This is why LDLM doesn't seem a good place to do COS: LDLM deals with
locks and their conflicts, but COS deals with dependency info (and its
conflicts?), which has a lifecycle and semantics different from those of
LDLM locks.
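The objection can be illustrated with a toy timeline, assuming the lock is dropped at reply ACK (as today) rather than at commit. The resource model and all function names are hypothetical; the point is that a lock-based check only sees locks that still exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy resource: one PW lock slot plus the last (possibly uncommitted) change. */
struct toy_resource {
        bool     pw_held;       /* is a PW lock currently granted? */
        uint64_t writer;        /* client that made the last change */
        uint64_t last_transno;  /* transno of the last change */
};

/* A client takes PW and modifies the resource in transaction 'transno'. */
static void toy_modify(struct toy_resource *r, uint64_t client, uint64_t transno)
{
        assert(transno > r->last_transno);  /* transnos only grow */
        r->pw_held = true;
        r->writer = client;
        r->last_transno = transno;
}

/* Current behaviour in this toy: the lock goes away once the reply is ACKed. */
static void toy_release_at_ack(struct toy_resource *r)
{
        r->pw_held = false;
}

/* What a lock-based check can see: only locks that are still granted. */
static bool toy_lock_conflict(const struct toy_resource *r, uint64_t client)
{
        return r->pw_held && r->writer != client;
}

/* What COS actually needs: an uncommitted change made by another client. */
static bool toy_cos_dependency(const struct toy_resource *r, uint64_t client,
                               uint64_t last_committed)
{
        return r->last_transno > last_committed && r->writer != client;
}
```

After `toy_release_at_ack()` the lock check reports nothing while the dependency is still there. Holding the lock until commit, as step 3 of the original proposal says, would keep `pw_held` set until `last_committed` reaches the transno and close this gap.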
> but one use case isn't that obvious. it's OK when first lock L1 was
> from client C1 with PW mode and new lock is also from C1/PW. but then
> we have a situation with same client, but locks are PW then PR:
> 1) we wouldn't want to sync just because client does mkdir a; touch a
> 2) thus we have to grant PR lock (so, first problem - sometimes PW
> and PR doesn't conflict?)
> 3) if we cancel PW to grant PR, then we'd have to make this PR
> conflicting with any PR coming from different client?
> 4) changing PR to PW in order to inherit state? (client side doesn't
> expect such locks)
we just don't expect the lock to exist for as long as the changes stay
uncommitted, so LDLM can't catch the dependency between the operations.
> all of this doesn't sound like a good solution, IMHO. at least it'd
> require serious changes in LDLM while we're talking about 1.6/1.8 ...
> so we need another way.
>
> probably we could re-use VBR as each inode change goes with new
> persistent version and version is numerically equal to transno,
> comparing inode's version with last committed transno we can learn
> whether the inode is committed?
>
> next problem is to learn source of change, i.e. client. in the worst
> case all changes are from different clients, thus every change means
> sync. but if we *cache* source information we probably can avoid
> majority of syncs. IOW, we don't need to track source all the time,
> it should be enough if we have this information most of time. so,
> storing it in in-core inode is good enough probably. following this
> way we don't need to care about inode's lifetime.
>
> thanks, Alex
>
> _______________________________________________
> Lustre-devel mailing list
> Lustre-devel at lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-devel
Thanks,
--
Alexander "Zam" Zarochentsev
Staff Engineer
Lustre Group, Sun Microsystems
From: Peter Braam @ 2008-06-10 21:24 UTC
To: lustre-devel
On 6/10/08 12:50 PM, "Alexander Zarochentsev"
<Alexander.Zarochentsev@Sun.COM> wrote:
> hello Alex,
>
> On 5 June 2008 16:27:27 Alex Zhuravlev wrote:
>> Hi,
>>
>> I think it could be great if we can use LDLM for COS. at the very
>> first view it looks possible:
>> 1) each server lock is tagged with client unique id (uuid/export
>> addr/etc)
>> 2) mds registers own blocking AST function
>> 3) locks to be used rep-ack's aren't released upon ACK, only upon
>> commit
>> 4) whenever conflict is observed by LDLM at enqueue time,
>> MDS's blocking AST function is called and depending on whether
>> conflicting locks are taken on behalf same or different clients, the
>> function issues sync causing commit and old lock to be released later
>
> There could be dependency between operations and no lock conflicts at
> all, just because the PW lock is released already but the changes are
> not yet committed. Then we have no blocking AST and no commit.
This makes no sense; the idea is to keep the lock until commit.
Peter
From: Alex Zhuravlev @ 2008-06-30 8:10 UTC
To: lustre-devel
Hi,
all access to an object can be broken into 3 phases:
1) a lock is acquired and used to modify data; no concurrent
access, as the data is inconsistent
2) data is consistent but uncommitted, so the same client can
access the data, others can not
3) all clients can access the data
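The three phases can be written down as a small state machine. This is a sketch only, with hypothetical names, assuming the "owner" is the client that made the change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum obj_phase {
        PHASE_MODIFYING,    /* 1) locked, data inconsistent: no other access */
        PHASE_UNCOMMITTED,  /* 2) consistent but uncommitted: owner only */
        PHASE_COMMITTED,    /* 3) everyone */
};

struct obj_state {
        enum obj_phase phase;
        uint64_t owner;     /* client that performed the modification */
};

static bool may_access(const struct obj_state *s, uint64_t client)
{
        switch (s->phase) {
        case PHASE_MODIFYING:
                return false;
        case PHASE_UNCOMMITTED:
                return client == s->owner;
        case PHASE_COMMITTED:
                return true;
        }
        return false;
}

/* Transitions only move forward: 1 -> 2 -> 3. */
static void advance(struct obj_state *s)
{
        assert(s->phase != PHASE_COMMITTED);
        s->phase = (enum obj_phase)(s->phase + 1);
}
```

Phase 2 is the only one whose answer depends on who is asking, which is what a plain shared/exclusive lock mode cannot express.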
it'd make sense to have the same lock handle for (1) and (2), as it
is stored in the request and later used to release the lock upon commit.
(1) and (3) are clear: they are just lock acquired and lock released.
what if we introduce a new lock state (bit, whatever) that is compatible
with one client (some tag in the lock) and incompatible with others?
in order to keep the same lock handle we convert lock (1) into lock (2).
conversion isn't a new concept, we've done it before.
then, regular create would look like:
1) lockh = enqueue(PW, clientid); // clientid is stored in the lock
2) object creation; directory modification
3) ptlrpc_save_lock(req, lockh)
convert(lockh, PW, OWN)
...
4) commit
lock_decref(lockh, OWN)
also, we'd have to register a blocking AST in the MDS in order to intercept
collisions when one client tries to access data modified by another one.
from that handler we could initiate or schedule a sync commit.
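The create sequence and the commit-time release can be modelled in user space as a toy. This is not Lustre code: `enqueue`, `convert_to_own`, `commit` and `access_by` are hypothetical stand-ins for the steps in the mail (ptlrpc_save_lock and lock_decref are not reproduced), there is a single resource, and the "sync" is a counter bumped when another client collides with an OWN lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LOCKS 16

enum mode { MODE_FREE, MODE_PW, MODE_OWN };

struct toy_lock {
        enum mode mode;
        uint64_t  client;    /* clientid stored in the lock (step 1) */
        uint64_t  transno;   /* transaction the OWN lock waits for */
};

static struct toy_lock locks[MAX_LOCKS];   /* all locks on one object */
static unsigned syncs;                     /* sync commits forced by collisions */

/* Step 1: take a free slot in PW mode, tagged with the client. */
static int enqueue(uint64_t client)
{
        for (int h = 0; h < MAX_LOCKS; h++) {
                if (locks[h].mode == MODE_FREE) {
                        locks[h] = (struct toy_lock){ MODE_PW, client, 0 };
                        return h;
                }
        }
        return -1;
}

/* Step 3: keep the same handle, but convert PW to OWN and remember
 * which transaction has to commit before the lock can go away. */
static void convert_to_own(int h, uint64_t transno)
{
        assert(h >= 0 && h < MAX_LOCKS && locks[h].mode == MODE_PW);
        locks[h].mode = MODE_OWN;
        locks[h].transno = transno;
}

/* Step 4: once everything up to last_committed is on disk, drop OWN locks. */
static void commit(uint64_t last_committed)
{
        for (int h = 0; h < MAX_LOCKS; h++)
                if (locks[h].mode == MODE_OWN && locks[h].transno <= last_committed)
                        locks[h].mode = MODE_FREE;
}

/* Blocking-AST stand-in: another client touches the object while an OWN
 * lock of a different client exists, so force a sync commit. */
static void access_by(uint64_t client, uint64_t newest_transno)
{
        for (int h = 0; h < MAX_LOCKS; h++) {
                if (locks[h].mode == MODE_OWN && locks[h].client != client) {
                        syncs++;
                        commit(newest_transno);
                        return;
                }
        }
}
```

Repeated access by the owning client costs nothing; the first alien access pays for one sync, after which the OWN locks are gone.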
this looks like quite a simple concept, but it's far from optimal: what
if one client does a thousand creations? we'll end up with a thousand OWN
locks, while to prevent alien access we need only a single one.
a couple of ideas can be used here:
1) cache locks on the MDS side as per Nikita's suggestion
2) drop all OWN locks from completion AST
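One possible reading of idea 1: instead of a new OWN lock per operation, the MDS keeps at most one cached OWN entry per (object, client) and only raises the transno it waits for. A hypothetical sketch; the fixed-size table, the integer object id and all names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 64

struct own_entry {
        bool     used;
        uint64_t object;       /* e.g. a FID, flattened to an integer here */
        uint64_t client;
        uint64_t max_transno;  /* newest uncommitted change covered */
};

static struct own_entry cache[CACHE_SLOTS];

/* Record a change: reuse the client's existing OWN entry if there is one. */
static struct own_entry *own_note_change(uint64_t object, uint64_t client,
                                         uint64_t transno)
{
        struct own_entry *free_slot = NULL;

        assert(transno != 0);  /* sketch assumption: 0 is never a real transno */
        for (size_t i = 0; i < CACHE_SLOTS; i++) {
                struct own_entry *e = &cache[i];
                if (e->used && e->object == object && e->client == client) {
                        if (transno > e->max_transno)
                                e->max_transno = transno;
                        return e;
                }
                if (!e->used && free_slot == NULL)
                        free_slot = e;
        }
        if (free_slot != NULL)
                *free_slot = (struct own_entry){ true, object, client, transno };
        return free_slot;      /* NULL: cache full, caller must sync instead */
}

/* Commit callback: drop entries whose newest change is now on disk. */
static size_t own_commit(uint64_t last_committed)
{
        size_t dropped = 0;

        for (size_t i = 0; i < CACHE_SLOTS; i++) {
                if (cache[i].used && cache[i].max_transno <= last_committed) {
                        cache[i].used = false;
                        dropped++;
                }
        }
        return dropped;
}
```

A thousand creations in one directory by one client then cost one entry instead of a thousand locks; the price is that the entry lives until the newest of those changes commits.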
comments, thoughts?
thanks, Alex
From: Nikita Danilov @ 2008-06-30 10:11 UTC
To: lustre-devel
Alex Zhuravlev writes:
> Hi,
>
> all access to an object can be broken into 3 phases:
> 1) lock is acquired and used to modify data, no concurrent
> access as data is inconsistent
> 2) data is consistent, but uncommitted; thus same client can
> access data, others can not
Other clients cannot read uncommitted data, but they can safely
overwrite it. Do we care about such use cases?
> 3) all clients can access data
Nikita.
From: Alex Zhuravlev @ 2008-06-30 11:04 UTC
To: lustre-devel
Nikita Danilov wrote:
> Alex Zhuravlev writes:
> > Hi,
> >
> > all access to an object can be broken into 3 phases:
> > 1) lock is acquired and used to modify data, no concurrent
> > access as data is inconsistent
> > 2) data is consistent, but uncommitted; thus same client can
> > access data, others can not
>
> Other clients cannot read uncommitted data, but they can safely
> overwrite them. Do we care about such use cases?
this is true for data, but most metadata operations read first.
thanks, Alex
From: Peter Braam @ 2008-06-30 14:00 UTC
To: lustre-devel
Very interesting. Could this new lock also be used to protect all data in
the file, meaning only the lock-holding client can modify data (without
involving OST locks)? We have been looking for that too, and it smells
similar.
Peter