* [RFC]About ImageIndex
@ 2014-08-12 2:50 Haomai Wang
2014-08-13 2:14 ` Josh Durgin
0 siblings, 1 reply; 6+ messages in thread
From: Haomai Wang @ 2014-08-12 2:50 UTC (permalink / raw)
To: Josh Durgin, Sage Weil; +Cc: ceph-devel@vger.kernel.org
Hi Sage, Josh:
ImageIndex is aimed to hold each object's location info which avoid
extra checking for none-existing object. It's only used when image flags
exists LIBRBD_CREATE_NONSHARED. Otherwise, ImageIndex will become gawp and
has no effect.
Each object has three state:
1. UNKNOWN: default value, it will follow origin path
2. LOCAL: imply this object is local, don't need to lookup parent image
3. PARENT: imply this object is in the parent image, don't need to read
from local image
Note: ImageIndex isn't full sync to real data all the time. Because the
transformation {"unknown" -> "local", "unknown" -> "parent"} are safe.
So We only need to handle with the exception when ImageIndex implies
this object is "parent" but the real data is "local". There exists three
methods to solve it:
1. flush `state_map` every time when "parent" -> "local" happened
2. mark all objects from "parent" state to "unknown“ state when loading
image index(not including snapshot which has frozen index).
Here choose to implement method 2. This method only allow 2 read ops
in one read request at max and without overhead. Usually, librbd will
open image for many days(months) for normal use case such as VM usage.
So the image index will be warmed up and became smart when processing
ops.
Except image state changed problem, another concern is size. Image
index only permit single client write, but resize/flatten/rollback ops
are allowed to happen concurrently. For simply, now these ops don't
change and save states into rados. Resize op will affect "size" and
current write client will be notified.
Below listing object state change scenes:
1. When clone from image, it will mark all objects as "parent"
2. When creating snapshot, image index will be freeze and save it as the
index of the snapshot. All "parent" state objects will be marked as
"unknown" for safe
3. When write(including modified op) a object, it will mark this object
as "local"
4. When reading object, the current image object will be always read in
spite of the state. And the parent image's object will be checked and
trust the state. If exists parent image but read local object
successfully, the local object will be marked "local"
The principle is that only exists one client can operate and save the
ImageIndex changes into rados. Now the challenge is that how to decide
who is the owner of ImageIndex. Like VM usage, qemu will open image to
do IO ops and externally user also can do create snapshot for the
opened image(rbd snap create ...). So it will exists two client modify
the same image, one can be regarded as "owner client" another is
"management client". "management client" is expected to not change the
state of object.
I only come up with a idea that user need to call "set_owner_image"
when client want to become "owner client" and this client can operate
ImageIndex.
PR(https://github.com/ceph/ceph/pull/2212)
I have passed test_librbd and test_librbd_fsx tests. The IO logic
seemed worked as expected.
--
Best Regards,
Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC]About ImageIndex
2014-08-12 2:50 [RFC]About ImageIndex Haomai Wang
@ 2014-08-13 2:14 ` Josh Durgin
2014-08-13 11:32 ` Dhiraj Kamble
2014-08-13 12:37 ` Haomai Wang
0 siblings, 2 replies; 6+ messages in thread
From: Josh Durgin @ 2014-08-13 2:14 UTC (permalink / raw)
To: Haomai Wang, Sage Weil; +Cc: ceph-devel@vger.kernel.org
On 08/11/2014 07:50 PM, Haomai Wang wrote:
> Hi Sage, Josh:
>
> ImageIndex is aimed to hold each object's location info which avoid
> extra checking for none-existing object. It's only used when image flags
> exists LIBRBD_CREATE_NONSHARED. Otherwise, ImageIndex will become gawp and
> has no effect.
I like the general structure of the code, and it's great to hear you've
been testing it with test_librbd_fsx. Some thoughts on the design:
> Each object has three state:
> 1. UNKNOWN: default value, it will follow origin path
> 2. LOCAL: imply this object is local, don't need to lookup parent image
> 3. PARENT: imply this object is in the parent image, don't need to read
> from local image
What about calling these unknown, exists, and nonexistent instead?
This information can be used for non-clone use cases too. Internally
there's another ImageCtx for each parent, which can have its own
index, so I don't think there's a need to keep track of any objects
from different images in one ImageCtx.
> Note: ImageIndex isn't full sync to real data all the time. Because the
> transformation {"unknown" -> "local", "unknown" -> "parent"} are safe.
> So We only need to handle with the exception when ImageIndex implies
> this object is "parent" but the real data is "local". There exists three
> methods to solve it:
> 1. flush `state_map` every time when "parent" -> "local" happened
> 2. mark all objects from "parent" state to "unknown“ state when loading
> image index(not including snapshot which has frozen index).
At the end of the CDS session I was thinking that implementing it
strictly would be better, so that you could always trust the on-disk
copy of the index. In particular we were worried about the complexity of
handling a non-strict index during things like live migration, and
being able to use it to speed up management operations more reliably.
Since creating new objects (or discarding entire existing objects) is
pretty rare, it seems worth the extra I/Os to update the index
synchronously in these cases. To be fully safe I think any write to
a new object or discard of a full object would need to change index
state to unknown, do the write/discard, and finally change the index
state to exists/nonexistent. There might be some ways to optimize that
though.
> Here choose to implement method 2. This method only allow 2 read ops
> in one read request at max and without overhead. Usually, librbd will
> open image for many days(months) for normal use case such as VM usage.
> So the image index will be warmed up and became smart when processing
> ops.
>
> Except image state changed problem, another concern is size. Image
> index only permit single client write, but resize/flatten/rollback ops
> are allowed to happen concurrently. For simply, now these ops don't
> change and save states into rados. Resize op will affect "size" and
> current write client will be notified.
>
> Below listing object state change scenes:
> 1. When clone from image, it will mark all objects as "parent"
We can do this for any new image (mark all objects non-existent
initially), not just clones.
> 2. When creating snapshot, image index will be freeze and save it as the
> index of the snapshot. All "parent" state objects will be marked as
> "unknown" for safe
One implementation detail here - I think it'd be good to store snapshot
indexes in separate objects so that having many snapshots does not cause
the header object to become too large.
To recover from a partial snapshot, where the header was updated but
the index wasn't saved, it would be useful to have a rbd cli command to
repair (or optionally just check) the index for a snapshot.
> 3. When write(including modified op) a object, it will mark this object
> as "local"
> 4. When reading object, the current image object will be always read in
> spite of the state. And the parent image's object will be checked and
> trust the state. If exists parent image but read local object
> successfully, the local object will be marked "local"
>
> The principle is that only exists one client can operate and save the
> ImageIndex changes into rados. Now the challenge is that how to decide
> who is the owner of ImageIndex. Like VM usage, qemu will open image to
> do IO ops and externally user also can do create snapshot for the
> opened image(rbd snap create ...). So it will exists two client modify
> the same image, one can be regarded as "owner client" another is
> "management client". "management client" is expected to not change the
> state of object.
>
> I only come up with a idea that user need to call "set_owner_image"
> when client want to become "owner client" and this client can operate
> ImageIndex.
This is a great idea for all header updates. I've been thinking about
the same thing for rbd mirroring, and Sage started working on some
watch/notify changes that will help make this robust:
http://pad.ceph.com/p/watch-notify
> PR(https://github.com/ceph/ceph/pull/2212)
> I have passed test_librbd and test_librbd_fsx tests. The IO logic
> seemed worked as expected.
I haven't had time to look at the code in a lot of detail yet, but
it'd be nice to use just 2 bits per object on disk. I'm on vacation
the next couple weeks, but Sage may have more feedback.
Josh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: [RFC]About ImageIndex
2014-08-13 2:14 ` Josh Durgin
@ 2014-08-13 11:32 ` Dhiraj Kamble
2014-08-13 12:38 ` Haomai Wang
2014-08-13 12:37 ` Haomai Wang
1 sibling, 1 reply; 6+ messages in thread
From: Dhiraj Kamble @ 2014-08-13 11:32 UTC (permalink / raw)
To: Josh Durgin, Haomai Wang, Sage Weil; +Cc: ceph-devel@vger.kernel.org
Hi Josh, Haomai,
I am currently working on a CLI to detect inconsistencies in rbd and have some functional checks working. We can leverage it to detect inconsistency with ImageIndex/header.
Let me know if I can help you with this.
Regards,
Dhiraj
-----Original Message-----
From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Josh Durgin
Sent: Wednesday, August 13, 2014 7:45 AM
To: Haomai Wang; Sage Weil
Cc: ceph-devel@vger.kernel.org
Subject: Re: [RFC]About ImageIndex
On 08/11/2014 07:50 PM, Haomai Wang wrote:
> Hi Sage, Josh:
>
> ImageIndex is aimed to hold each object's location info which avoid
> extra checking for none-existing object. It's only used when image
> flags exists LIBRBD_CREATE_NONSHARED. Otherwise, ImageIndex will
> become gawp and has no effect.
I like the general structure of the code, and it's great to hear you've been testing it with test_librbd_fsx. Some thoughts on the design:
> Each object has three state:
> 1. UNKNOWN: default value, it will follow origin path 2. LOCAL: imply
> this object is local, don't need to lookup parent image 3. PARENT:
> imply this object is in the parent image, don't need to read from
> local image
What about calling these unknown, exists, and nonexistent instead?
This information can be used for non-clone use cases too. Internally there's another ImageCtx for each parent, which can have its own index, so I don't think there's a need to keep track of any objects from different images in one ImageCtx.
> Note: ImageIndex isn't full sync to real data all the time. Because
> the transformation {"unknown" -> "local", "unknown" -> "parent"} are safe.
> So We only need to handle with the exception when ImageIndex implies
> this object is "parent" but the real data is "local". There exists
> three methods to solve it:
> 1. flush `state_map` every time when "parent" -> "local" happened 2.
> mark all objects from "parent" state to "unknown“ state when loading
> image index(not including snapshot which has frozen index).
At the end of the CDS session I was thinking that implementing it strictly would be better, so that you could always trust the on-disk copy of the index. In particular we were worried about the complexity of handling a non-strict index during things like live migration, and being able to use it to speed up management operations more reliably.
Since creating new objects (or discarding entire existing objects) is pretty rare, it seems worth the extra I/Os to update the index synchronously in these cases. To be fully safe I think any write to a new object or discard of a full object would need to change index state to unknown, do the write/discard, and finally change the index state to exists/nonexistent. There might be some ways to optimize that though.
> Here choose to implement method 2. This method only allow 2 read ops
> in one read request at max and without overhead. Usually, librbd will
> open image for many days(months) for normal use case such as VM usage.
> So the image index will be warmed up and became smart when processing
> ops.
>
> Except image state changed problem, another concern is size. Image
> index only permit single client write, but resize/flatten/rollback ops
> are allowed to happen concurrently. For simply, now these ops don't
> change and save states into rados. Resize op will affect "size" and
> current write client will be notified.
>
> Below listing object state change scenes:
> 1. When clone from image, it will mark all objects as "parent"
We can do this for any new image (mark all objects non-existent initially), not just clones.
> 2. When creating snapshot, image index will be freeze and save it as
> the index of the snapshot. All "parent" state objects will be marked
> as "unknown" for safe
One implementation detail here - I think it'd be good to store snapshot indexes in separate objects so that having many snapshots does not cause the header object to become too large.
To recover from a partial snapshot, where the header was updated but the index wasn't saved, it would be useful to have a rbd cli command to repair (or optionally just check) the index for a snapshot.
> 3. When write(including modified op) a object, it will mark this
> object as "local"
> 4. When reading object, the current image object will be always read
> in spite of the state. And the parent image's object will be checked
> and trust the state. If exists parent image but read local object
> successfully, the local object will be marked "local"
>
> The principle is that only exists one client can operate and save the
> ImageIndex changes into rados. Now the challenge is that how to decide
> who is the owner of ImageIndex. Like VM usage, qemu will open image to
> do IO ops and externally user also can do create snapshot for the
> opened image(rbd snap create ...). So it will exists two client modify
> the same image, one can be regarded as "owner client" another is
> "management client". "management client" is expected to not change the
> state of object.
>
> I only come up with a idea that user need to call "set_owner_image"
> when client want to become "owner client" and this client can operate
> ImageIndex.
This is a great idea for all header updates. I've been thinking about the same thing for rbd mirroring, and Sage started working on some watch/notify changes that will help make this robust:
http://pad.ceph.com/p/watch-notify
> PR(https://github.com/ceph/ceph/pull/2212)
> I have passed test_librbd and test_librbd_fsx tests. The IO logic
> seemed worked as expected.
I haven't had time to look at the code in a lot of detail yet, but it'd be nice to use just 2 bits per object on disk. I'm on vacation the next couple weeks, but Sage may have more feedback.
Josh
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
________________________________
PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC]About ImageIndex
2014-08-13 2:14 ` Josh Durgin
2014-08-13 11:32 ` Dhiraj Kamble
@ 2014-08-13 12:37 ` Haomai Wang
2014-08-13 13:12 ` Sage Weil
1 sibling, 1 reply; 6+ messages in thread
From: Haomai Wang @ 2014-08-13 12:37 UTC (permalink / raw)
To: Josh Durgin; +Cc: Sage Weil, ceph-devel@vger.kernel.org
On Wed, Aug 13, 2014 at 10:14 AM, Josh Durgin <josh.durgin@inktank.com> wrote:
> On 08/11/2014 07:50 PM, Haomai Wang wrote:
>>
>> Hi Sage, Josh:
>>
>> ImageIndex is aimed to hold each object's location info which avoid
>> extra checking for none-existing object. It's only used when image flags
>> exists LIBRBD_CREATE_NONSHARED. Otherwise, ImageIndex will become gawp and
>> has no effect.
>
>
> I like the general structure of the code, and it's great to hear you've
> been testing it with test_librbd_fsx. Some thoughts on the design:
>
>
>> Each object has three state:
>> 1. UNKNOWN: default value, it will follow origin path
>> 2. LOCAL: imply this object is local, don't need to lookup parent image
>> 3. PARENT: imply this object is in the parent image, don't need to read
>> from local image
>
>
> What about calling these unknown, exists, and nonexistent instead?
> This information can be used for non-clone use cases too. Internally
> there's another ImageCtx for each parent, which can have its own
> index, so I don't think there's a need to keep track of any objects
> from different images in one ImageCtx.
The names seemed better. But what's mean of latter sentences?
Now a image's ImageIndex only track own objects.
>
>
>> Note: ImageIndex isn't full sync to real data all the time. Because the
>> transformation {"unknown" -> "local", "unknown" -> "parent"} are safe.
>> So We only need to handle with the exception when ImageIndex implies
>> this object is "parent" but the real data is "local". There exists three
>> methods to solve it:
>> 1. flush `state_map` every time when "parent" -> "local" happened
>> 2. mark all objects from "parent" state to "unknown“ state when loading
>> image index(not including snapshot which has frozen index).
>
>
> At the end of the CDS session I was thinking that implementing it
> strictly would be better, so that you could always trust the on-disk copy of
> the index. In particular we were worried about the complexity of
> handling a non-strict index during things like live migration, and
> being able to use it to speed up management operations more reliably.
>
> Since creating new objects (or discarding entire existing objects) is
> pretty rare, it seems worth the extra I/Os to update the index
> synchronously in these cases. To be fully safe I think any write to
> a new object or discard of a full object would need to change index
> state to unknown, do the write/discard, and finally change the index
> state to exists/nonexistent. There might be some ways to optimize that
> though.
It make sense to me.
>
>
>> Here choose to implement method 2. This method only allow 2 read ops
>> in one read request at max and without overhead. Usually, librbd will
>> open image for many days(months) for normal use case such as VM usage.
>> So the image index will be warmed up and became smart when processing
>> ops.
>>
>> Except image state changed problem, another concern is size. Image
>> index only permit single client write, but resize/flatten/rollback ops
>> are allowed to happen concurrently. For simply, now these ops don't
>> change and save states into rados. Resize op will affect "size" and
>> current write client will be notified.
>>
>> Below listing object state change scenes:
>> 1. When clone from image, it will mark all objects as "parent"
>
>
> We can do this for any new image (mark all objects non-existent
> initially), not just clones.
>
>
>> 2. When creating snapshot, image index will be freeze and save it as the
>> index of the snapshot. All "parent" state objects will be marked as
>> "unknown" for safe
>
>
> One implementation detail here - I think it'd be good to store snapshot
> indexes in separate objects so that having many snapshots does not cause
> the header object to become too large.
Good point!
>
> To recover from a partial snapshot, where the header was updated but
> the index wasn't saved, it would be useful to have a rbd cli command to
> repair (or optionally just check) the index for a snapshot.
Yes, it's a good idea.
>
>
>> 3. When write(including modified op) a object, it will mark this object
>> as "local"
>> 4. When reading object, the current image object will be always read in
>> spite of the state. And the parent image's object will be checked and
>> trust the state. If exists parent image but read local object
>> successfully, the local object will be marked "local"
>>
>> The principle is that only exists one client can operate and save the
>> ImageIndex changes into rados. Now the challenge is that how to decide
>> who is the owner of ImageIndex. Like VM usage, qemu will open image to
>> do IO ops and externally user also can do create snapshot for the
>> opened image(rbd snap create ...). So it will exists two client modify
>> the same image, one can be regarded as "owner client" another is
>> "management client". "management client" is expected to not change the
>> state of object.
>>
>> I only come up with a idea that user need to call "set_owner_image"
>> when client want to become "owner client" and this client can operate
>> ImageIndex.
>
>
> This is a great idea for all header updates. I've been thinking about
> the same thing for rbd mirroring, and Sage started working on some
> watch/notify changes that will help make this robust:
Hmm, I will dive into it.
>
> http://pad.ceph.com/p/watch-notify
>
>
>> PR(https://github.com/ceph/ceph/pull/2212)
>> I have passed test_librbd and test_librbd_fsx tests. The IO logic
>> seemed worked as expected.
>
>
> I haven't had time to look at the code in a lot of detail yet, but
> it'd be nice to use just 2 bits per object on disk. I'm on vacation
> the next couple weeks, but Sage may have more feedback.
>
> Josh
Thanks for your reply, Josh!
--
Best Regards,
Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC]About ImageIndex
2014-08-13 11:32 ` Dhiraj Kamble
@ 2014-08-13 12:38 ` Haomai Wang
0 siblings, 0 replies; 6+ messages in thread
From: Haomai Wang @ 2014-08-13 12:38 UTC (permalink / raw)
To: Dhiraj Kamble; +Cc: Josh Durgin, Sage Weil, ceph-devel@vger.kernel.org
Thanks! When this bp arrived finished, it's a good time to implement
the functional checks.
On Wed, Aug 13, 2014 at 7:32 PM, Dhiraj Kamble
<Dhiraj.Kamble@sandisk.com> wrote:
> Hi Josh, Haomai,
>
> I am currently working on a CLI to detect inconsistencies in rbd and have some functional checks working. We can leverage it to detect inconsistency with ImageIndex/header.
> Let me know if I can help you with this.
>
>
> Regards,
> Dhiraj
>
>
> -----Original Message-----
> From: ceph-devel-owner@vger.kernel.org [mailto:ceph-devel-owner@vger.kernel.org] On Behalf Of Josh Durgin
> Sent: Wednesday, August 13, 2014 7:45 AM
> To: Haomai Wang; Sage Weil
> Cc: ceph-devel@vger.kernel.org
> Subject: Re: [RFC]About ImageIndex
>
> On 08/11/2014 07:50 PM, Haomai Wang wrote:
>> Hi Sage, Josh:
>>
>> ImageIndex is aimed to hold each object's location info which avoid
>> extra checking for none-existing object. It's only used when image
>> flags exists LIBRBD_CREATE_NONSHARED. Otherwise, ImageIndex will
>> become gawp and has no effect.
>
> I like the general structure of the code, and it's great to hear you've been testing it with test_librbd_fsx. Some thoughts on the design:
>
>> Each object has three state:
>> 1. UNKNOWN: default value, it will follow origin path 2. LOCAL: imply
>> this object is local, don't need to lookup parent image 3. PARENT:
>> imply this object is in the parent image, don't need to read from
>> local image
>
> What about calling these unknown, exists, and nonexistent instead?
> This information can be used for non-clone use cases too. Internally there's another ImageCtx for each parent, which can have its own index, so I don't think there's a need to keep track of any objects from different images in one ImageCtx.
>
>> Note: ImageIndex isn't full sync to real data all the time. Because
>> the transformation {"unknown" -> "local", "unknown" -> "parent"} are safe.
>> So We only need to handle with the exception when ImageIndex implies
>> this object is "parent" but the real data is "local". There exists
>> three methods to solve it:
>> 1. flush `state_map` every time when "parent" -> "local" happened 2.
>> mark all objects from "parent" state to "unknown“ state when loading
>> image index(not including snapshot which has frozen index).
>
> At the end of the CDS session I was thinking that implementing it strictly would be better, so that you could always trust the on-disk copy of the index. In particular we were worried about the complexity of handling a non-strict index during things like live migration, and being able to use it to speed up management operations more reliably.
>
> Since creating new objects (or discarding entire existing objects) is pretty rare, it seems worth the extra I/Os to update the index synchronously in these cases. To be fully safe I think any write to a new object or discard of a full object would need to change index state to unknown, do the write/discard, and finally change the index state to exists/nonexistent. There might be some ways to optimize that though.
>
>> Here choose to implement method 2. This method only allow 2 read ops
>> in one read request at max and without overhead. Usually, librbd will
>> open image for many days(months) for normal use case such as VM usage.
>> So the image index will be warmed up and became smart when processing
>> ops.
>>
>> Except image state changed problem, another concern is size. Image
>> index only permit single client write, but resize/flatten/rollback ops
>> are allowed to happen concurrently. For simply, now these ops don't
>> change and save states into rados. Resize op will affect "size" and
>> current write client will be notified.
>>
>> Below listing object state change scenes:
>> 1. When clone from image, it will mark all objects as "parent"
>
> We can do this for any new image (mark all objects non-existent initially), not just clones.
>
>> 2. When creating snapshot, image index will be freeze and save it as
>> the index of the snapshot. All "parent" state objects will be marked
>> as "unknown" for safe
>
> One implementation detail here - I think it'd be good to store snapshot indexes in separate objects so that having many snapshots does not cause the header object to become too large.
>
> To recover from a partial snapshot, where the header was updated but the index wasn't saved, it would be useful to have a rbd cli command to repair (or optionally just check) the index for a snapshot.
>
>> 3. When write(including modified op) a object, it will mark this
>> object as "local"
>> 4. When reading object, the current image object will be always read
>> in spite of the state. And the parent image's object will be checked
>> and trust the state. If exists parent image but read local object
>> successfully, the local object will be marked "local"
>>
>> The principle is that only exists one client can operate and save the
>> ImageIndex changes into rados. Now the challenge is that how to decide
>> who is the owner of ImageIndex. Like VM usage, qemu will open image to
>> do IO ops and externally user also can do create snapshot for the
>> opened image(rbd snap create ...). So it will exists two client modify
>> the same image, one can be regarded as "owner client" another is
>> "management client". "management client" is expected to not change the
>> state of object.
>>
>> I only come up with a idea that user need to call "set_owner_image"
>> when client want to become "owner client" and this client can operate
>> ImageIndex.
>
> This is a great idea for all header updates. I've been thinking about the same thing for rbd mirroring, and Sage started working on some watch/notify changes that will help make this robust:
>
> http://pad.ceph.com/p/watch-notify
>
>> PR(https://github.com/ceph/ceph/pull/2212)
>> I have passed test_librbd and test_librbd_fsx tests. The IO logic
>> seemed worked as expected.
>
> I haven't had time to look at the code in a lot of detail yet, but it'd be nice to use just 2 bits per object on disk. I'm on vacation the next couple weeks, but Sage may have more feedback.
>
> Josh
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> ________________________________
>
> PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies).
>
--
Best Regards,
Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [RFC]About ImageIndex
2014-08-13 12:37 ` Haomai Wang
@ 2014-08-13 13:12 ` Sage Weil
0 siblings, 0 replies; 6+ messages in thread
From: Sage Weil @ 2014-08-13 13:12 UTC (permalink / raw)
To: Haomai Wang; +Cc: Josh Durgin, ceph-devel@vger.kernel.org
On Wed, 13 Aug 2014, Haomai Wang wrote:
> On Wed, Aug 13, 2014 at 10:14 AM, Josh Durgin <josh.durgin@inktank.com> wrote:
> > On 08/11/2014 07:50 PM, Haomai Wang wrote:
> >>
> >> Hi Sage, Josh:
> >>
> >> ImageIndex is aimed to hold each object's location info which avoid
> >> extra checking for none-existing object. It's only used when image flags
> >> exists LIBRBD_CREATE_NONSHARED. Otherwise, ImageIndex will become gawp and
> >> has no effect.
> >
> >
> > I like the general structure of the code, and it's great to hear you've
> > been testing it with test_librbd_fsx. Some thoughts on the design:
> >
> >
> >> Each object has three state:
> >> 1. UNKNOWN: default value, it will follow origin path
> >> 2. LOCAL: imply this object is local, don't need to lookup parent image
> >> 3. PARENT: imply this object is in the parent image, don't need to read
> >> from local image
> >
> >
> > What about calling these unknown, exists, and nonexistent instead?
> > This information can be used for non-clone use cases too. Internally
> > there's another ImageCtx for each parent, which can have its own
> > index, so I don't think there's a need to keep track of any objects
> > from different images in one ImageCtx.
>
> The names seemed better. But what's mean of latter sentences?
> Now a image's ImageIndex only track own objects.
Just that we should stay away from using the term "parent" as a state name
because we should not reflect the parent's allocation state (the parent
ImageCtx does that), only our own. Your definition really means
NONEXISTENT (object doesn't exist), and it useful/meaningful even if
cloning is not used (for example, rbd_remove() or rbd_discard() can skip
this object). Similarly, s/LOCAL/EXISTS/.
Cheers-
sage
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-08-13 13:12 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-12 2:50 [RFC]About ImageIndex Haomai Wang
2014-08-13 2:14 ` Josh Durgin
2014-08-13 11:32 ` Dhiraj Kamble
2014-08-13 12:38 ` Haomai Wang
2014-08-13 12:37 ` Haomai Wang
2014-08-13 13:12 ` Sage Weil
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.