* [PATCH] rbd: handle parent_overlap on writes correctly
@ 2014-06-11 16:40 Ilya Dryomov
2014-06-13 1:26 ` Josh Durgin
2014-06-30 12:14 ` Alex Elder
0 siblings, 2 replies; 4+ messages in thread
From: Ilya Dryomov @ 2014-06-11 16:40 UTC (permalink / raw)
To: ceph-devel
The following check in rbd_img_obj_request_submit()
rbd_dev->parent_overlap <= obj_request->img_offset
allows the fall through to the non-layered write case even if both
parent_overlap and obj_request->img_offset belong to the same RADOS
object. This leads to data corruption, because the area to the left of
parent_overlap ends up unconditionally zero-filled instead of being
populated with parent data. Suppose we want to write 1M to offset 6M
of image bar, which is a clone of foo@snap; object_size is 4M,
parent_overlap is 5M:
rbd_data.<id>.0000000000000001
---------------------|----------------------|------------
| should be copyup'ed | should be zeroed out | write ...
---------------------|----------------------|------------
4M 5M 6M
parent_overlap obj_request->img_offset
4..5M should be copyup'ed from foo, yet it is zero-filled, just like
5..6M is.
Given that the only striping mode kernel client currently supports is
chunking (i.e. stripe_unit == object_size, stripe_count == 1), round
parent_overlap up to the next object boundary for the purposes of the
overlap check.
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
drivers/block/rbd.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 8295b3afa8e0..813e673d49df 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1366,6 +1366,14 @@ static bool obj_request_exists_test(struct rbd_obj_request *obj_request)
return test_bit(OBJ_REQ_EXISTS, &obj_request->flags) != 0;
}
+static bool obj_request_overlaps_parent(struct rbd_obj_request *obj_request)
+{
+ struct rbd_device *rbd_dev = obj_request->img_request->rbd_dev;
+
+ return obj_request->img_offset <
+ round_up(rbd_dev->parent_overlap, rbd_obj_bytes(&rbd_dev->header));
+}
+
static void rbd_obj_request_get(struct rbd_obj_request *obj_request)
{
dout("%s: obj %p (was %d)\n", __func__, obj_request,
@@ -2683,7 +2691,7 @@ static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request)
*/
if (!img_request_write_test(img_request) ||
!img_request_layered_test(img_request) ||
- rbd_dev->parent_overlap <= obj_request->img_offset ||
+ !obj_request_overlaps_parent(obj_request) ||
((known = obj_request_known_test(obj_request)) &&
obj_request_exists_test(obj_request))) {
--
1.7.10.4
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] rbd: handle parent_overlap on writes correctly
2014-06-11 16:40 [PATCH] rbd: handle parent_overlap on writes correctly Ilya Dryomov
@ 2014-06-13 1:26 ` Josh Durgin
2014-06-13 8:06 ` Ilya Dryomov
2014-06-30 12:14 ` Alex Elder
1 sibling, 1 reply; 4+ messages in thread
From: Josh Durgin @ 2014-06-13 1:26 UTC (permalink / raw)
To: Ilya Dryomov, ceph-devel
On 06/11/2014 09:40 AM, Ilya Dryomov wrote:
> The following check in rbd_img_obj_request_submit()
>
> rbd_dev->parent_overlap <= obj_request->img_offset
>
> allows the fall through to the non-layered write case even if both
> parent_overlap and obj_request->img_offset belong to the same RADOS
> object. This leads to data corruption, because the area to the left of
> parent_overlap ends up unconditionally zero-filled instead of being
> populated with parent data. Suppose we want to write 1M to offset 6M
> of image bar, which is a clone of foo@snap; object_size is 4M,
> parent_overlap is 5M:
>
> rbd_data.<id>.0000000000000001
> ---------------------|----------------------|------------
> | should be copyup'ed | should be zeroed out | write ...
> ---------------------|----------------------|------------
> 4M 5M 6M
> parent_overlap obj_request->img_offset
>
> 4..5M should be copyup'ed from foo, yet it is zero-filled, just like
> 5..6M is.
>
> Given that the only striping mode kernel client currently supports is
> chunking (i.e. stripe_unit == object_size, stripe_count == 1), round
> parent_overlap up to the next object boundary for the purposes of the
> overlap check.
>
> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
Good catch! This should be included in any stable kernels 3.10 or later
too.
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
> drivers/block/rbd.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 8295b3afa8e0..813e673d49df 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1366,6 +1366,14 @@ static bool obj_request_exists_test(struct rbd_obj_request *obj_request)
> return test_bit(OBJ_REQ_EXISTS, &obj_request->flags) != 0;
> }
>
> +static bool obj_request_overlaps_parent(struct rbd_obj_request *obj_request)
> +{
> + struct rbd_device *rbd_dev = obj_request->img_request->rbd_dev;
> +
> + return obj_request->img_offset <
> + round_up(rbd_dev->parent_overlap, rbd_obj_bytes(&rbd_dev->header));
> +}
> +
> static void rbd_obj_request_get(struct rbd_obj_request *obj_request)
> {
> dout("%s: obj %p (was %d)\n", __func__, obj_request,
> @@ -2683,7 +2691,7 @@ static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request)
> */
> if (!img_request_write_test(img_request) ||
> !img_request_layered_test(img_request) ||
> - rbd_dev->parent_overlap <= obj_request->img_offset ||
> + !obj_request_overlaps_parent(obj_request) ||
> ((known = obj_request_known_test(obj_request)) &&
> obj_request_exists_test(obj_request))) {
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] rbd: handle parent_overlap on writes correctly
2014-06-13 1:26 ` Josh Durgin
@ 2014-06-13 8:06 ` Ilya Dryomov
0 siblings, 0 replies; 4+ messages in thread
From: Ilya Dryomov @ 2014-06-13 8:06 UTC (permalink / raw)
To: Josh Durgin; +Cc: Ceph Development
On Fri, Jun 13, 2014 at 5:26 AM, Josh Durgin <josh.durgin@inktank.com> wrote:
> On 06/11/2014 09:40 AM, Ilya Dryomov wrote:
>>
>> The following check in rbd_img_obj_request_submit()
>>
>> rbd_dev->parent_overlap <= obj_request->img_offset
>>
>> allows the fall through to the non-layered write case even if both
>> parent_overlap and obj_request->img_offset belong to the same RADOS
>> object. This leads to data corruption, because the area to the left of
>> parent_overlap ends up unconditionally zero-filled instead of being
>> populated with parent data. Suppose we want to write 1M to offset 6M
>> of image bar, which is a clone of foo@snap; object_size is 4M,
>> parent_overlap is 5M:
>>
>> rbd_data.<id>.0000000000000001
>> ---------------------|----------------------|------------
>> | should be copyup'ed | should be zeroed out | write ...
>> ---------------------|----------------------|------------
>> 4M 5M 6M
>> parent_overlap obj_request->img_offset
>>
>> 4..5M should be copyup'ed from foo, yet it is zero-filled, just like
>> 5..6M is.
>>
>> Given that the only striping mode kernel client currently supports is
>> chunking (i.e. stripe_unit == object_size, stripe_count == 1), round
>> parent_overlap up to the next object boundary for the purposes of the
>> overlap check.
>>
>> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
>> ---
>
>
> Good catch! This should be included in any stable kernels 3.10 or later
> too.
>
> Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
This commit in ceph-client already has the 3.10+ stable tag.
Thanks,
Ilya
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] rbd: handle parent_overlap on writes correctly
2014-06-11 16:40 [PATCH] rbd: handle parent_overlap on writes correctly Ilya Dryomov
2014-06-13 1:26 ` Josh Durgin
@ 2014-06-30 12:14 ` Alex Elder
1 sibling, 0 replies; 4+ messages in thread
From: Alex Elder @ 2014-06-30 12:14 UTC (permalink / raw)
To: Ilya Dryomov, ceph-devel
On 06/11/2014 11:40 AM, Ilya Dryomov wrote:
> The following check in rbd_img_obj_request_submit()
>
> rbd_dev->parent_overlap <= obj_request->img_offset
>
> allows the fall through to the non-layered write case even if both
> parent_overlap and obj_request->img_offset belong to the same RADOS
> object. This leads to data corruption, because the area to the left of
> parent_overlap ends up unconditionally zero-filled instead of being
> populated with parent data. Suppose we want to write 1M to offset 6M
> of image bar, which is a clone of foo@snap; object_size is 4M,
> parent_overlap is 5M:
>
> rbd_data.<id>.0000000000000001
> ---------------------|----------------------|------------
> | should be copyup'ed | should be zeroed out | write ...
> ---------------------|----------------------|------------
> 4M 5M 6M
> parent_overlap obj_request->img_offset
>
> 4..5M should be copyup'ed from foo, yet it is zero-filled, just like
> 5..6M is.
>
> Given that the only striping mode kernel client currently supports is
> chunking (i.e. stripe_unit == object_size, stripe_count == 1), round
> parent_overlap up to the next object boundary for the purposes of the
> overlap check.
I know Josh already reviewed this, but I wanted to make sure
I understood it. Now I do.
Previously the first layered write request starting at the
"should be zeroed out" region would be treated as a normal
write. Consequently the child object would get created, but
only the affected portion would get written, meaning the
"should be copyup'ed" region in the child object would be
zero filled. Later reads to the "should be copyup'ed" would
see that child object existed, so the parent data will never
get filled into the "should be copyup'ed" area as it must be.
Rounding up like you did for this test makes sense--it makes
the granularity of this overlap test match the granularity
of the "exists" test.
Reviewed-by: Alex Elder <elder@linaro.org>
>
> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
> drivers/block/rbd.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 8295b3afa8e0..813e673d49df 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1366,6 +1366,14 @@ static bool obj_request_exists_test(struct rbd_obj_request *obj_request)
> return test_bit(OBJ_REQ_EXISTS, &obj_request->flags) != 0;
> }
>
> +static bool obj_request_overlaps_parent(struct rbd_obj_request *obj_request)
> +{
> + struct rbd_device *rbd_dev = obj_request->img_request->rbd_dev;
> +
> + return obj_request->img_offset <
> + round_up(rbd_dev->parent_overlap, rbd_obj_bytes(&rbd_dev->header));
> +}
> +
> static void rbd_obj_request_get(struct rbd_obj_request *obj_request)
> {
> dout("%s: obj %p (was %d)\n", __func__, obj_request,
> @@ -2683,7 +2691,7 @@ static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request)
> */
> if (!img_request_write_test(img_request) ||
> !img_request_layered_test(img_request) ||
> - rbd_dev->parent_overlap <= obj_request->img_offset ||
> + !obj_request_overlaps_parent(obj_request) ||
> ((known = obj_request_known_test(obj_request)) &&
> obj_request_exists_test(obj_request))) {
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-06-30 12:14 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-11 16:40 [PATCH] rbd: handle parent_overlap on writes correctly Ilya Dryomov
2014-06-13 1:26 ` Josh Durgin
2014-06-13 8:06 ` Ilya Dryomov
2014-06-30 12:14 ` Alex Elder
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.