* [Drbd-dev] lock for reading device state
From: Cristian Zamfir @ 2006-12-06 17:22 UTC
To: drbd-dev
Hi,
I am using drbd to implement xen block device migration. Right now I am
parsing /proc/drbd to find out if the drives are synchronized and I can
migrate them. Is there a way to obtain a lock while reading and
processing this information and prevent other writes to the primary
device? I need to be able to prevent writes while reading the state of
the device from a script external to drbd.
In case there is no existing solution, can you please give me a few tips
on how to start developing such a locking mechanism?
Thank you very much.
Cristian
* Re: [Drbd-dev] lock for reading device state
From: Lars Ellenberg @ 2006-12-07 13:09 UTC
To: drbd-dev
/ 2006-12-06 17:22:43 +0000
\ Cristian Zamfir:
>
>
> Hi,
>
> I am using drbd to implement xen block device migration. Right now I
> am parsing /proc/drbd to find out if the drives are synchronized and I
> can migrate them.
you talk about drbd state "Connected, Consistent",
or what exactly are you parsing?
> Is there a way to obtain a lock while reading and processing this
> information and prevent other writes to the primary device?
no. why?
> I need to be able to prevent writes while reading the state of
> the device from a script external to drbd.
does not make sense to me yet?
> In case there is no existing solution, can you please give me a few
> tips on how to start developing such a locking mechanism?
please give more details about your assumptions and reasoning,
maybe it's just that you have wrong expectations?
could also be that I'm just mentally blocked right now...
:)
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
* Re: [Drbd-dev] lock for reading device state
From: Cristian Zamfir @ 2006-12-07 13:52 UTC
To: drbd-dev
Lars Ellenberg wrote:
> / 2006-12-06 17:22:43 +0000
> \ Cristian Zamfir:
>>
>> Hi,
>>
>> I am using drbd to implement xen block device migration. Right now I
>> am parsing /proc/drbd to find out if the drives are synchronized and I
>> can migrate them.
>
> you talk about drbd state "Connected, Consistent",
> or what exactly are you parsing?
Yes, indeed, I am parsing these values: "cs:Connected
st:Secondary/Primary ld:Consistent"
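A minimal sketch of this parsing step, in Python, might look like the following. The fields cs, st and ld are the ones quoted above; the leading device-number prefix on the line, the function names and the dictionary layout are assumptions made for illustration, not something taken from the drbd documentation.

```python
import re

# One device line. The cs/st/ld fields are the ones quoted in this thread;
# the leading "<minor>:" prefix is an assumption about the line layout.
STATE_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+st:(?P<local>\w+)/(?P<peer>\w+)\s+ld:(?P<ld>\S+)"
)


def parse_proc_drbd(text):
    """Return {minor: {"cs": ..., "local": ..., "peer": ..., "ld": ...}}."""
    devices = {}
    for line in text.splitlines():
        match = STATE_RE.match(line)
        if match:
            info = match.groupdict()
            devices[int(info.pop("minor"))] = info
    return devices


def read_proc_drbd(path="/proc/drbd"):
    """Read and parse the status file (hypothetical helper)."""
    with open(path) as handle:
        return parse_proc_drbd(handle.read())
```

Lines that do not match the pattern (version banner, counters) are simply skipped, so the parser only reports devices it could fully recognise.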
>
>> Is there a way to obtain a lock while reading and processing this
>> information and prevent other writes to the primary device?
>
> no. why?
>
I wrote a script that parses /proc/drbd on the primary node. While I am
running this script, writes to the primary device are still allowed. If
I find that the ld state is "Consistent" then I will make this node
secondary and the peer will become primary.
The problem is when writes happen while my script is making the peer
node primary.
A race situation would be the following:
At moment X, I read /proc/drbd and see the ld state is consistent.
At moment X+1 a write arrives at /dev/drbd1 and the devices are not
consistent any more. They start syncing but this may last longer, for
instance until moment X+5.
Now, at moment X+2, I wrongly believe that the state is still
consistent and I decide to make the peer node primary, and thus lose the
write at moment X+1.
Are my assumptions correct so far?
I'm thinking that there are two solutions: One would be to prevent any
writes from Xen's domUs by modifying Xen.
The other would be to be able to hold a lock that prevents writes from
reaching /dev/drbdX and release it after the processing within the
script finishes (that is while I switch the peer device from secondary
to primary).
I haven't looked at drbd's source yet (I am using 0.7.22 now) but I am
considering implementing this lock within drbd if there is no other
solution available.
As a future project, I am also interested if there is anyone working on
implementing multiple secondary devices. I am interested in having
multiple replicas of the primary node.
I hope this explains my question better.
Thank you very much for your help.
Cristian
>> I need to be able to prevent writes while reading the state of
>> the device from a script external to drbd.
>
> does not make sense to me yet?
>
>> In case there is no existing solution, can you please give me a few
>> tips on how to start developing such a locking mechanism?
>
> please give more details about your assumptions and reasoning,
> maybe it's just that you have wrong expectations?
>
> could also be that I'm just mentally blocked right now...
>
> :)
>
* Re: [Drbd-dev] lock for reading device state
From: Lars Ellenberg @ 2006-12-07 15:56 UTC
To: drbd-dev
/ 2006-12-07 13:52:15 +0000
\ Cristian Zamfir:
>
>
> Lars Ellenberg wrote:
> >/ 2006-12-06 17:22:43 +0000
> >\ Cristian Zamfir:
> >>
> >>Hi,
> >>
> >>I am using drbd to implement xen block device migration. Right now I
> >>am parsing /proc/drbd to find out if the drives are synchronized and I
> >>can migrate them.
> >you talk about drbd state "Connected, Consistent",
> >or what exactly are you parsing?
>
> Yes, indeed, I am parsing these values: "cs:Connected st:Secondary/Primary ld:Consistent"
>
>
> >>Is there a way to obtain a lock while reading and processing this
> >>information and prevent other writes to the primary device?
> >no. why?
>
> I wrote a script that parses /proc/drbd on the primary node. While I am running this script, writes to the primary
> device are still allowed. If I find that the ld state is "Consistent" then I will make this node secondary and the
> peer will become primary.
> The problem is when writes happen while my script is making the peer node primary.
>
> A race situation would be the following:
> At moment X, I read /proc/drbd and see the ld state is consistent.
> At moment X+1 a write arrives at /dev/drbd1 and the devices are not
> consistent any more. They start syncing but this may last longer, for
> instance until moment X+5.
> Now, at moment X+2, I wrongly believe that the state is still
> consistent and I decide to make the peer node primary, and thus lose
> the write at moment X+1.
>
> Are my assumptions correct so far?
no. you don't "become Inconsistent" because "some write".
"Consistent" in drbd speak is "not Inconsistent".
oh well.
so what is Inconsistent.
drbd starts as being "inconsistent" when the meta data is first
initialized. then you force one side to think it is Consistent,
to be able to make it Primary, and the initial full sync starts.
Once the sync is finished, the sync target becomes Connected Consistent.
If the nodes now disconnect, they still are "Consistent" in the sense of
"whatever data is on that disk, it is transactionally consistent, though
maybe it is not 'clean', i.e. you may have to replay some journal to get
into 'clean' state."
You get into "Inconsistent" only by becoming SyncTarget after
(re)establishing the connection to the Peer and the handshake determines
that your data is different from the Peer's, and the Peer's is "better"
(which typically means "newer").
Because the Resync copies changed blocks linearly over the device,
while new writes get mirrored already, the data on the SyncTarget is
"not Consistent" anymore during sync. Even if we had data journalling
during degraded mode, and would replay that during Sync, the SyncTarget
would stay Consistent but "outdated" until the Resync was completely
done.
> I'm thinking that there are two solutions: One would be to prevent any writes from Xen's domUs by modifying Xen.
> The other would be to be able to hold a lock that prevents writes from reaching /dev/drbdX and release it after the
> processing within the script finishes (that is while I switch the peer device from secondary to primary).
>
> I haven't looked at drbd's source yet (I am using 0.7.22 now) but I am considering implementing this lock within
> drbd if there is no other solution available.
That "lock" does not make sense to me,
and even if you could do it, it won't solve that "race",
it would only move it to some other point in time.
Note that a device in Secondary state denies access.
Also note that you cannot make a device Primary if it sees its Peer as
being Primary (unless you use drbd8, and explicitly allow
"two-primaries").
And a device that knows it is "Inconsistent" cannot be made Primary,
unless it is Connected, in which case it would be SyncTarget and get the
good data from the SyncSource Peer.
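As a rough illustration only, the two promotion rules just stated can be written as a small Python predicate. The function and argument names are hypothetical, and this is a simplification of the rules as worded in this message, not drbd's actual state-checking code.

```python
def may_become_primary(connected, peer_role, ld, allow_two_primaries=False):
    """Return True if promotion is allowed under the two rules above.

    connected:            True if this node currently has a link to its peer
    peer_role:            role of the peer as seen locally, e.g. "Primary"
    ld:                   local disk state, "Consistent" or "Inconsistent"
    allow_two_primaries:  only meaningful with drbd8, per the note above
    """
    # Rule 1: no promotion while the peer is seen as Primary.
    if peer_role == "Primary" and not allow_two_primaries:
        return False
    # Rule 2: an Inconsistent node may only be promoted while connected,
    # because it then acts as SyncTarget and gets good data from the peer.
    if ld == "Inconsistent" and not connected:
        return False
    return True
```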
So what you need to do for xen migration with drbd 0.7 is:
Start the migration, once you think you want to switch over, i.e.
** once you are done writing on nodeA **
** you switch nodeA to Secondary. **
now, both nodes are Secondary, and neither can write.
now you can check whether the target nodeB is still Connected, Consistent.
if so, you make it Primary.
if not, you abort the migration.
"locking" the state of drbd or freezing io while it is Primary on
migration source nodeA won't help you in any way.
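The order of operations described above could be sketched in Python roughly as follows. The three callables are hypothetical hooks supplied by the migration script: how they reach each node (for example by running "drbdadm secondary" on nodeA and "drbdadm primary" on nodeB for the resource in question) is an assumption left to the caller.

```python
def switch_over(demote_source, target_state, promote_target):
    """Hand the device over from nodeA to nodeB in the order described above.

    demote_source():  make nodeA Secondary, once it is done writing
    target_state():   return (cs, ld) as seen on nodeB
    promote_target(): make nodeB Primary
    Returns True if nodeB was promoted, False if the migration must be aborted.
    """
    # Step 1: nodeA becomes Secondary; from here on neither node can write.
    demote_source()
    # Step 2: check nodeB only after the demotion, so that no further write
    # can arrive between the check and the promotion.
    cs, ld = target_state()
    if cs == "Connected" and ld == "Consistent":
        # Step 3: nodeB takes over.
        promote_target()
        return True
    # Otherwise abort; what the caller does next is outside this sketch.
    return False
```

Because the state is read only after nodeA has been demoted, no separate lock is needed: the demotion itself is what stops the writes.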
> As a future project, I am also interested if there is anyone working
> on implementing multiple secondary devices. I am interested in having
> multiple replicas of the primary node.
here at LINBIT we have some very nice concepts about how we'd implement
multiple (> 2) nodes and other nice features. But don't ask about timelines.
--
: Lars Ellenberg Tel +43-1-8178292-55 :
: LINBIT Information Technologies GmbH Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe http://www.linbit.com :
* Re: [Drbd-dev] lock for reading device state
From: Cristian Zamfir @ 2006-12-07 20:25 UTC
To: drbd-dev
Lars Ellenberg wrote:
> / 2006-12-07 13:52:15 +0000
> \ Cristian Zamfir:
>>
>> Lars Ellenberg wrote:
>>> / 2006-12-06 17:22:43 +0000
>>> \ Cristian Zamfir:
>>>> Hi,
>>>>
>>>> I am using drbd to implement xen block device migration. Right now I
>>>> am parsing /proc/drbd to find out if the drives are synchronized and I
>>>> can migrate them.
>>> you talk about drbd state "Connected, Consistent",
>>> or what exactly are you parsing?
>> Yes, indeed, I am parsing these values: "cs:Connected st:Secondary/Primary ld:Consistent"
>>
>>
>>>> Is there a way to obtain a lock while reading and processing this
>>>> information and prevent other writes to the primary device?
>>> no. why?
>> I wrote a script that parses /proc/drbd on the primary node. While I am running this script, writes to the primary
>> device are still allowed. If I find that the ld state is "Consistent" then I will make this node secondary and the
>> peer will become primary.
>> The problem is when writes happen while my script is making the peer node primary.
>>
>> A race situation would be the following:
>> At moment X, I read /proc/drbd and see the ld state is consistent.
>> At moment X+1 a write arrives at /dev/drbd1 and the devices are not
>> consistent any more. They start syncing but this may last longer, for
>> instance until moment X+5.
>> Now, at moment X+2, I wrongly believe that the state is still
>> consistent and I decide to make the peer node primary, and thus lose
>> the write at moment X+1.
>>
>> Are my assumptions correct so far?
>
> no. you don't "become Inconsistent" because "some write".
Thank you very much for your answer. I guess what I assumed incorrectly
was that writes would make the device inconsistent.
>
> "Consistent" in drbd speak is "not Inconsistent".
> oh well.
> so what is Inconsistent.
> drbd starts as being "inconsistent" when the meta data is first
> initialized. then you force one side to think it is Consistent,
> to be able to make it Primary, and the initial full sync starts.
>
> Once the sync is finished, the sync target becomes Connected Consistent.
> If the nodes now disconnect, they still are "Consistent" in the sense of
> "whatever data is on that disk, it is transactionally consistent, though
> maybe it is not 'clean', i.e. you may have to replay some journal to get
> into 'clean' state."
>
> You get into "Inconsistent" only by becoming SyncTarget after
> (re)establishing the connection to the Peer and the handshake determines
> that your data is different from the Peer's, and the Peer's is "better"
> (which typically means "newer").
>
> Because the Resync copies changed blocks linearly over the device,
> while new writes get mirrored already, the data on the SyncTarget is
> "not Consistent" anymore during sync. Even if we had data journalling
> during degraded mode, and would replay that during Sync, the SyncTarget
> would stay Consistent but "outdated" until the Resync was completely
> done.
>
>> I'm thinking that there are two solutions: One would be to prevent any writes from Xen's domUs by modifying Xen.
>> The other would be to be able to hold a lock that prevents writes from reaching /dev/drbdX and release it after the
>> processing within the script finishes (that is while I switch the peer device from secondary to primary).
>>
>> I haven't looked at drbd's source yet (I am using 0.7.22 now) but I am considering implementing this lock within
>> drbd if there is no other solution available.
>
> That "lock" does not make sense to me,
> and even if you could do it, it won't solve that "race",
> it would only move it to some other point in time.
>
> Note that a device in Secondary state denies access.
> Also note that you cannot make a device Primary if it sees its Peer as
> being Primary (unless you use drbd8, and explicitly allow
> "two-primaries").
I assume that using drbd8 would make xen block device migration easier
because both devices are primary. Am I right?
> And a device that knows it is "Inconsistent" cannot be made Primary,
> unless it is Connected, in which case it would be SyncTarget and get the
> good data from the SyncSource Peer.
>
> So what you need to do for xen migration with drbd 0.7 is:
> Start the migration, once you think you want to switch over, i.e.
> ** once you are done writing on nodeA **
> ** you switch nodeA to Secondary. **
> now, both nodes are Secondary, and neither can write.
> now you can check whether the target nodeB is still Connected, Consistent.
> if so, you make it Primary.
> if not, you abort the migration.
This is exactly what my code is doing now. I was worried that writes
would make the drive inconsistent so that is why I needed the lock. Now
it is clear that making the transition from primary to secondary is enough.
>
> "locking" the state of drbd or freezing io while it is Primary on
> migration source nodeA won't help you in any way.
>
>> As a future project, I am also interested if there is anyone working
>> on implementing multiple secondary devices. I am interested in having
>> multiple replicas of the primary node.
>
> here at LINBIT we have some very nice concepts about how we'd implement
> multiple (> 2) nodes and other nice features. But don't ask about timelines.
>
It is great that you are considering this because I will also start
working on something similar in the near future.
Thanks,
Cristian