* Re: [PATCH] Rbd: copy-on-read support for kernel rbd client
2015-05-21 3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
@ 2015-05-21 9:07 ` Ilya Dryomov
2015-05-21 9:19 ` [PATCH 1/4] Rbd: add an option for copy-on-read Li Wang
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Ilya Dryomov @ 2015-05-21 9:07 UTC (permalink / raw)
To: Li Wang
Cc: Sage Weil, Alex Elder, Josh Durgin, Ceph Development, Min Chen,
Yunchuan Wen
On Thu, May 21, 2015 at 6:11 AM, Li Wang <liwang@ubuntukylin.com> wrote:
> This is a new feature of rbd layering, when reading an object
> from child, if not exist, the kernel rbd client will not only
> request parent for the object, but also write it to child,
> and the jobs are done in an asynchronous way. Therefore, the
> subsequent accesses on this object will hit child without
> bothering parent. This feature could avoid longer latency incurred
> during accessing parent, especially when child and parent are
> geographically isolated, and it also could potentially avoid
> overloading parent.
>
> The patches:
> https://github.com/ceph/ceph-client/pull/11
>
> Min Chen (4):
> Rbd: add an option for copy-on-read
> Rbd: add a new request: rbd_copyup_request
> Rbd: helper functions to manipulate rbd_copyup_request
> Rbd: implement the copy-on-read logic
>
> drivers/block/rbd.c | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 382 insertions(+), 3 deletions(-)
Kernel client patches should be posted to ceph-devel. While I can
certainly do my review on github, others won't. Please post those
patches as a reply to your cover letter.
Thanks,
Ilya
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH 1/4] Rbd: add an option for copy-on-read
2015-05-21 3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
2015-05-21 9:07 ` Ilya Dryomov
@ 2015-05-21 9:19 ` Li Wang
2015-05-21 9:19 ` [PATCH 2/4] Rbd: add a new request: rbd_copyup_request Li Wang
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Li Wang @ 2015-05-21 9:19 UTC (permalink / raw)
To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang
From: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
drivers/block/rbd.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index ec6c5c6..446204b 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -324,6 +324,7 @@ struct rbd_mapping {
u64 size;
u64 features;
bool read_only;
+ bool copy_on_read; /* rbd map option: copy on read */
};
/*
@@ -733,6 +734,7 @@ enum {
/* string args above */
Opt_read_only,
Opt_read_write,
+ Opt_copy_on_read,
/* Boolean args above */
Opt_last_bool,
};
@@ -744,15 +746,19 @@ static match_table_t rbd_opts_tokens = {
{Opt_read_only, "ro"}, /* Alternate spelling */
{Opt_read_write, "read_write"},
{Opt_read_write, "rw"}, /* Alternate spelling */
+ {Opt_copy_on_read, "copy_on_read"},
+ {Opt_copy_on_read, "cor"}, /* Alternate spelling */
/* Boolean args above */
{-1, NULL}
};
struct rbd_options {
bool read_only;
+ bool copy_on_read;
};
#define RBD_READ_ONLY_DEFAULT false
+#define RBD_COPY_ON_READ_DEFAULT false
static int parse_rbd_opts_token(char *c, void *private)
{
@@ -788,6 +794,9 @@ static int parse_rbd_opts_token(char *c, void *private)
case Opt_read_write:
rbd_opts->read_only = false;
break;
+ case Opt_copy_on_read:
+ rbd_opts->copy_on_read = true;
+ break;
default:
rbd_assert(false);
break;
@@ -1736,6 +1745,13 @@ static void rbd_osd_trivial_callback(struct rbd_obj_request *obj_request)
obj_request_done_set(obj_request);
}
+static inline bool is_copy_on_read(struct rbd_device *rbd_dev)
+{
+ rbd_assert(rbd_dev);
+ rbd_assert(&rbd_dev->mapping);
+ return !rbd_dev->mapping.read_only && rbd_dev->mapping.copy_on_read;
+}
+
static void rbd_osd_read_callback(struct rbd_obj_request *obj_request)
{
struct rbd_img_request *img_request = NULL;
@@ -4933,6 +4949,7 @@ static int rbd_add_parse_args(const char *buf,
goto out_mem;
rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
+ rbd_opts->copy_on_read = RBD_COPY_ON_READ_DEFAULT;
copts = ceph_parse_options(options, mon_addrs,
mon_addrs + mon_addrs_size - 1,
@@ -5385,6 +5402,7 @@ static ssize_t do_rbd_add(struct bus_type *bus,
struct rbd_spec *spec = NULL;
struct rbd_client *rbdc;
bool read_only;
+ bool copy_on_read;
int rc = -ENOMEM;
if (!try_module_get(THIS_MODULE))
@@ -5395,6 +5413,7 @@ static ssize_t do_rbd_add(struct bus_type *bus,
if (rc < 0)
goto err_out_module;
read_only = rbd_opts->read_only;
+ copy_on_read = rbd_opts->copy_on_read;
kfree(rbd_opts);
rbd_opts = NULL; /* done with this */
@@ -5436,7 +5455,9 @@ static ssize_t do_rbd_add(struct bus_type *bus,
if (rbd_dev->spec->snap_id != CEPH_NOSNAP)
read_only = true;
+
rbd_dev->mapping.read_only = read_only;
+ rbd_dev->mapping.copy_on_read = copy_on_read;
rc = rbd_dev_device_setup(rbd_dev);
if (rc) {
--
1.9.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH 2/4] Rbd: add a new request: rbd_copyup_request
2015-05-21 3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
2015-05-21 9:07 ` Ilya Dryomov
2015-05-21 9:19 ` [PATCH 1/4] Rbd: add an option for copy-on-read Li Wang
@ 2015-05-21 9:19 ` Li Wang
2015-05-21 9:19 ` [PATCH 3/4] Rbd: helper functions to manipulate rbd_copyup_request Li Wang
2015-05-21 9:19 ` [PATCH 4/4] Rbd: implement the copy-on-read logic Li Wang
4 siblings, 0 replies; 7+ messages in thread
From: Li Wang @ 2015-05-21 9:19 UTC (permalink / raw)
To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang
From: Min Chen <minchen@ubuntukylin.com>
Rbd_copyup_request is used only when copy_on_read enabled
for RBD child images. It is independent of rbd_obj_request.
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
drivers/block/rbd.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 50 insertions(+)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 446204b..581cb7b 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -207,6 +207,9 @@ typedef void (*rbd_img_callback_t)(struct rbd_img_request *);
struct rbd_obj_request;
typedef void (*rbd_obj_callback_t)(struct rbd_obj_request *);
+struct rbd_copyup_request;
+typedef void (*rbd_copyup_callback_t)(struct rbd_copyup_request *);
+
enum obj_request_type {
OBJ_REQUEST_NODATA, OBJ_REQUEST_BIO, OBJ_REQUEST_PAGES
};
@@ -279,6 +282,27 @@ struct rbd_obj_request {
struct kref kref;
};
+struct rbd_copyup_request {
+ const char * object_name;
+ u64 object_no;
+
+ struct page **copyup_pages;
+ u32 copyup_page_count;
+
+ struct rbd_img_request *img_request;
+ /* links for img_request->copyup_requests list */
+ struct list_head links;
+
+ struct ceph_osd_request *osd_req;
+
+ rbd_copyup_callback_t callback;
+ struct completion completion;
+
+ int result;
+
+ struct kref kref;
+};
+
enum img_req_flags {
IMG_REQ_WRITE, /* I/O direction: read = 0, write = 1 */
IMG_REQ_CHILD, /* initiator: block = 0, child image = 1 */
@@ -310,6 +334,9 @@ struct rbd_img_request {
u32 obj_request_count;
struct list_head obj_requests; /* rbd_obj_request structs */
+ struct list_head copyup_list; /* rbd_copyup_request list */
+ spinlock_t copyup_list_lock; /* protects copyup list */
+
struct kref kref;
};
@@ -320,6 +347,9 @@ struct rbd_img_request {
#define for_each_obj_request_safe(ireq, oreq, n) \
list_for_each_entry_safe_reverse(oreq, n, &(ireq)->obj_requests, links)
+#define for_each_copyup_request(ireq, cpreq) \
+ list_for_each_entry(cpreq, &(ireq)->copyup_list, links)
+
struct rbd_mapping {
u64 size;
u64 features;
@@ -399,6 +429,7 @@ static DEFINE_SPINLOCK(rbd_client_list_lock);
static struct kmem_cache *rbd_img_request_cache;
static struct kmem_cache *rbd_obj_request_cache;
+static struct kmem_cache *rbd_copyup_request_cache;
static struct kmem_cache *rbd_segment_name_cache;
static int rbd_major;
@@ -2178,6 +2209,8 @@ static struct rbd_img_request *rbd_img_request_create(
img_request->result = 0;
img_request->obj_request_count = 0;
INIT_LIST_HEAD(&img_request->obj_requests);
+ INIT_LIST_HEAD(&img_request->copyup_list);
+ spin_lock_init(&img_request->copyup_list_lock);
kref_init(&img_request->kref);
dout("%s: rbd_dev %p %s %llu/%llu -> img %p\n", __func__, rbd_dev,
@@ -5666,6 +5699,14 @@ static int rbd_slab_init(void)
if (!rbd_obj_request_cache)
goto out_err;
+ rbd_assert(!rbd_copyup_request_cache);
+ rbd_copyup_request_cache = kmem_cache_create("rbd_copyup_request",
+ sizeof (struct rbd_copyup_request),
+ __alignof__(struct rbd_copyup_request),
+ 0, NULL);
+ if (!rbd_copyup_request_cache)
+ goto out_err;
+
rbd_assert(!rbd_segment_name_cache);
rbd_segment_name_cache = kmem_cache_create("rbd_segment_name",
CEPH_MAX_OID_NAME_LEN + 1, 1, 0, NULL);
@@ -5677,6 +5718,11 @@ out_err:
rbd_obj_request_cache = NULL;
}
+ if (rbd_copyup_request_cache) {
+ kmem_cache_destroy(rbd_copyup_request_cache);
+ rbd_copyup_request_cache = NULL;
+ }
+
kmem_cache_destroy(rbd_img_request_cache);
rbd_img_request_cache = NULL;
@@ -5693,6 +5739,10 @@ static void rbd_slab_exit(void)
kmem_cache_destroy(rbd_obj_request_cache);
rbd_obj_request_cache = NULL;
+ rbd_assert(rbd_copyup_request_cache);
+ kmem_cache_destroy(rbd_copyup_request_cache);
+ rbd_copyup_request_cache = NULL;
+
rbd_assert(rbd_img_request_cache);
kmem_cache_destroy(rbd_img_request_cache);
rbd_img_request_cache = NULL;
--
1.9.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH 3/4] Rbd: helper functions to manipulate rbd_copyup_request
2015-05-21 3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
` (2 preceding siblings ...)
2015-05-21 9:19 ` [PATCH 2/4] Rbd: add a new request: rbd_copyup_request Li Wang
@ 2015-05-21 9:19 ` Li Wang
2015-05-21 9:19 ` [PATCH 4/4] Rbd: implement the copy-on-read logic Li Wang
4 siblings, 0 replies; 7+ messages in thread
From: Li Wang @ 2015-05-21 9:19 UTC (permalink / raw)
To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang
From: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
drivers/block/rbd.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 128 insertions(+)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 581cb7b..99a3a556 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1202,6 +1202,26 @@ static void rbd_dev_mapping_clear(struct rbd_device *rbd_dev)
rbd_dev->mapping.features = 0;
}
+static inline u64 rbd_object_no(struct rbd_device *rbd_dev, const char *object_name)
+{
+ const char *ptr = NULL;
+ size_t len = 0;
+ u64 offset_width = 10;
+ u64 obj_no = (u64)-1;
+
+ rbd_assert(rbd_dev);
+ rbd_assert(object_name);
+
+ ptr = object_name;
+ len = strlen(object_name);
+ if (rbd_dev->image_format == 2)
+ offset_width = 16;
+ rbd_assert(len >= offset_width);
+ ptr += len - offset_width;
+ obj_no = simple_strtoull(ptr, NULL, 16);
+ return obj_no;
+}
+
static void rbd_segment_name_free(const char *name)
{
/* The explicit cast here is needed to drop the const qualifier */
@@ -1518,6 +1538,8 @@ static void rbd_obj_request_put(struct rbd_obj_request *obj_request)
kref_put(&obj_request->kref, rbd_obj_request_destroy);
}
+static void rbd_copyup_request_destroy(struct kref *kref);
+
static void rbd_img_request_get(struct rbd_img_request *img_request)
{
dout("%s: img %p (was %d)\n", __func__, img_request,
@@ -1575,6 +1597,34 @@ static inline void rbd_img_obj_request_del(struct rbd_img_request *img_request,
rbd_obj_request_put(obj_request);
}
+static inline void rbd_img_copyup_request_add(struct rbd_img_request *img_request,
+ struct rbd_copyup_request *copyup_request)
+{
+ rbd_assert(copyup_request->img_request == NULL)
+ copyup_request->img_request = img_request;
+ /*
+ * For object locality, the new request is more likely to
+ * access the last inserted object, so, just insert copyup_request
+ * after head of list_head(copyup_list)
+ */
+ spin_lock(&img_request->copyup_list_lock);
+ list_add(©up_request->links, &img_request->copyup_list);
+ spin_unlock(&img_request->copyup_list_lock);
+}
+
+static inline void rbd_img_copyup_request_del(struct rbd_img_request *img_request,
+ struct rbd_copyup_request *copyup_request)
+{
+ rbd_assert(copyup_request != NULL);
+ spin_lock(&img_request->copyup_list_lock);
+ list_del(©up_request->links);
+ spin_unlock(&img_request->copyup_list_lock);
+
+ rbd_assert(copyup_request->img_request == img_request);
+ copyup_request->img_request = NULL;
+ copyup_request->callback = NULL;
+}
+
static bool obj_request_type_valid(enum obj_request_type type)
{
switch (type) {
@@ -1726,6 +1776,8 @@ rbd_img_request_op_type(struct rbd_img_request *img_request)
return OBJ_OP_READ;
}
+static void rbd_img_copyup_start(struct rbd_img_request *img_request, const char *object_name);
+
static void
rbd_img_obj_request_read_callback(struct rbd_obj_request *obj_request)
{
@@ -2108,6 +2160,82 @@ static void rbd_obj_request_destroy(struct kref *kref)
kmem_cache_free(rbd_obj_request_cache, obj_request);
}
+static struct rbd_copyup_request *rbd_copyup_request_create(const char *object_name,
+ struct rbd_device *rbd_dev)
+{
+ struct rbd_copyup_request *copyup_request = NULL;
+ size_t size = 0;
+ u64 length = 0;
+ char *name = NULL;
+ struct page **pages = NULL;
+ u32 page_count = 0;
+
+ rbd_assert(rbd_dev);
+ rbd_assert(object_name);
+
+ /* Allocate memory for object_name */
+ size = strlen(object_name) + 1;
+ name = kmalloc(size, GFP_KERNEL);
+ if(!name)
+ goto out_name;
+
+ /* Allocate memory for entire object */
+ length = (u64)1 << rbd_dev->header.obj_order;
+ page_count = (u32)calc_pages_for(0,length);
+ pages = ceph_alloc_page_vector(page_count, GFP_KERNEL);
+ if (IS_ERR(pages))
+ goto out_pages;
+
+ /* Allocate memory for struct rbd_copyup_request */
+ copyup_request = kmem_cache_zalloc(rbd_copyup_request_cache, GFP_KERNEL);
+ if(!copyup_request)
+ goto out_request;
+
+ /* Init all members of struct rbd_copyup_request */
+ copyup_request->object_name = memcpy(name, object_name, size);
+ copyup_request->object_no = rbd_object_no(rbd_dev, object_name);
+ copyup_request->copyup_pages = pages;
+ copyup_request->copyup_page_count = page_count;
+
+ INIT_LIST_HEAD(©up_request->links);
+
+ init_completion(©up_request->completion);
+
+ return copyup_request;
+out_request:
+ if (copyup_request)
+ kmem_cache_free(rbd_copyup_request_cache, copyup_request);
+out_pages:
+ if (pages)
+ ceph_release_page_vector(pages, page_count);
+out_name:
+ if (name)
+ kfree(name);
+ return NULL;
+}
+
+static void rbd_copyup_request_destroy(struct kref *kref)
+{
+ struct rbd_copyup_request *copyup_request;
+ copyup_request = container_of(kref, struct rbd_copyup_request, kref);
+
+ if (copyup_request->osd_req) {
+ rbd_osd_req_destroy(copyup_request->osd_req);
+ copyup_request->osd_req = NULL;
+ }
+
+ if (copyup_request->copyup_pages) {
+ ceph_release_page_vector(copyup_request->copyup_pages, copyup_request->copyup_page_count);
+ copyup_request->copyup_pages = NULL;
+ }
+
+ if (copyup_request->object_name) {
+ kfree(copyup_request->object_name);
+ copyup_request->object_name = NULL;
+ }
+ kmem_cache_free(rbd_copyup_request_cache, copyup_request);
+}
+
/* It's OK to call this for a device with no parent */
static void rbd_spec_put(struct rbd_spec *spec);
--
1.9.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* [PATCH 4/4] Rbd: implement the copy-on-read logic
2015-05-21 3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
` (3 preceding siblings ...)
2015-05-21 9:19 ` [PATCH 3/4] Rbd: helper functions to manipulate rbd_copyup_request Li Wang
@ 2015-05-21 9:19 ` Li Wang
2015-06-24 8:50 ` Ilya Dryomov
4 siblings, 1 reply; 7+ messages in thread
From: Li Wang @ 2015-05-21 9:19 UTC (permalink / raw)
To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang
From: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
drivers/block/rbd.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 183 insertions(+), 3 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 99a3a556..51d8398 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1851,12 +1851,16 @@ static void rbd_osd_read_callback(struct rbd_obj_request *obj_request)
obj_request, img_request, obj_request->result,
obj_request->xferred, obj_request->length);
if (layered && obj_request->result == -ENOENT &&
- obj_request->img_offset < rbd_dev->parent_overlap)
+ obj_request->img_offset < rbd_dev->parent_overlap) {
rbd_img_parent_read(obj_request);
- else if (img_request)
+ rbd_assert(obj_request->img_request);
+ if(is_copy_on_read(obj_request->img_request->rbd_dev))
+ rbd_img_copyup_start(obj_request->img_request, obj_request->object_name);
+ } else if (img_request) {
rbd_img_obj_request_read_callback(obj_request);
- else
+ } else {
obj_request_done_set(obj_request);
+ }
}
static void rbd_osd_write_callback(struct rbd_obj_request *obj_request)
@@ -2915,6 +2919,182 @@ out_err:
return result;
}
+static void rbd_img_copyup_end(struct rbd_copyup_request *copyup_request)
+{
+ struct rbd_img_request *img_request = NULL;
+ rbd_assert(copyup_request);
+ img_request = copyup_request->img_request;
+ rbd_img_copyup_request_del(img_request, copyup_request);
+ rbd_copyup_request_destroy(©up_request->kref);
+ rbd_img_request_put(img_request);
+}
+
+static void rbd_osd_req_copyup_callback(struct ceph_osd_request *osd_req,
+ struct ceph_msg *msg)
+{
+ struct rbd_copyup_request *copyup_request = NULL;
+ rbd_assert(osd_req);
+ copyup_request = osd_req->r_priv;
+ copyup_request->result = osd_req->r_result;
+ if(copyup_request->callback)
+ copyup_request->callback(copyup_request);
+ else
+ complete_all(©up_request->completion);
+}
+
+static void rbd_img_copyup_write_async(struct rbd_copyup_request *copyup_request)
+{
+ struct rbd_img_request *img_request = NULL;
+ struct ceph_snap_context *snapc = NULL;
+ struct ceph_osd_request *osd_req = NULL;
+ struct ceph_osd_client *osdc = NULL;
+ struct rbd_device *rbd_dev = NULL;
+ struct page **pages = NULL;
+ struct timespec mtime = CURRENT_TIME;
+ u32 page_count = 0;
+ u64 object_size = 0;
+ int result = 0;
+
+ /* if copyup_request read from parent failed, just end it */
+ if (copyup_request->result < 0) {
+ rbd_img_copyup_end(copyup_request);
+ goto out;
+ }
+
+ img_request = copyup_request->img_request;
+ rbd_assert(img_request);
+ rbd_dev = img_request->rbd_dev;
+ rbd_assert(rbd_dev);
+ osdc = &rbd_dev->rbd_client->client->osdc;
+ rbd_assert(osdc);
+ snapc = rbd_dev->header.snapc;
+
+ ceph_osdc_put_request(copyup_request->osd_req);
+
+ copyup_request->osd_req = NULL;
+ osd_req = ceph_osdc_alloc_request(osdc, snapc, 1, false, GFP_ATOMIC);
+ if (!osd_req)
+ goto out;
+
+ pages = copyup_request->copyup_pages;
+ page_count = copyup_request->copyup_page_count;
+ object_size = (u64)1 << rbd_dev->header.obj_order;
+
+ /* Initialize copyup op */
+ osd_req_op_cls_init(osd_req, 0, CEPH_OSD_OP_CALL, "rbd", "copyup");
+ osd_req_op_cls_request_data_pages(osd_req, 0, pages, object_size, 0, false, false);
+ osd_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK;
+ osd_req->r_callback = rbd_osd_req_copyup_callback;
+ osd_req->r_priv = copyup_request;
+
+ osd_req->r_base_oloc.pool = ceph_file_layout_pg_pool(rbd_dev->layout);
+ ceph_oid_set_name(&osd_req->r_base_oid, copyup_request->object_name);
+
+ copyup_request->osd_req = osd_req;
+ copyup_request->callback = rbd_img_copyup_end;
+
+ ceph_osdc_build_request(osd_req, 0, snapc, CEPH_NOSNAP, &mtime);
+ result = ceph_osdc_start_request(osdc, osd_req, false);
+ if(!result)
+ goto out;
+
+ ceph_osdc_put_request(osd_req);
+out:
+ return;
+}
+
+static void rbd_img_copyup_start(struct rbd_img_request *img_request,
+ const char *object_name)
+{
+ struct rbd_copyup_request *copyup_request = NULL;
+ struct rbd_device *rbd_dev = NULL;
+ struct ceph_snap_context *snapc = NULL;
+ struct ceph_osd_client *osdc = NULL;
+ struct ceph_osd_request *osd_req = NULL;
+ const char *parent_object_name = NULL;
+
+ int result = 0;
+ u64 object_no = (u64)-1;
+ u64 object_size = 0;
+ u64 snap_id = 0;
+ __u8 obj_order = 0;
+ bool is_read = false;
+
+ rbd_assert(img_request != NULL);
+ rbd_assert(object_name != NULL);
+
+ rbd_dev = img_request->rbd_dev;
+ rbd_assert(rbd_dev != NULL);
+
+ is_read = !img_request_write_test(img_request) &&
+ !img_request_discard_test(img_request);
+
+ object_no = rbd_object_no(rbd_dev, object_name);
+ obj_order = rbd_dev->header.obj_order;
+ object_size = (u64)1 << obj_order;
+
+ spin_lock(&img_request->copyup_list_lock);
+ /* Find if object_no exists in copyup_list */
+ for_each_copyup_request(img_request, copyup_request) {
+ /* Found, just return */
+ if(copyup_request->object_no == object_no) {
+ spin_unlock(&img_request->copyup_list_lock);
+ return;
+ }
+ }
+ spin_unlock(&img_request->copyup_list_lock);
+
+ /* Not found, send new copyup request */
+ copyup_request = NULL;
+ osdc = &rbd_dev->rbd_client->client->osdc;
+ parent_object_name = rbd_segment_name(rbd_dev->parent, object_no << obj_order);
+ if (!parent_object_name)
+ goto out;
+ osd_req = ceph_osdc_alloc_request(osdc, snapc, 1, false, GFP_ATOMIC);
+ if (!osd_req)
+ goto out;
+ copyup_request = rbd_copyup_request_create(object_name, rbd_dev);
+ if (!copyup_request) {
+ ceph_osdc_put_request(osd_req);
+ goto out;
+ }
+
+ /* Init osd_req */
+ osd_req_op_extent_init(osd_req, 0, CEPH_OSD_OP_READ, 0, object_size, 0, 0);
+ osd_req_op_extent_osd_data_pages(osd_req, 0, copyup_request->copyup_pages, object_size,
+ 0, false, false);
+
+ osd_req->r_flags = CEPH_OSD_FLAG_READ;
+ osd_req->r_callback = rbd_osd_req_copyup_callback;
+ osd_req->r_priv = copyup_request;
+
+ osd_req->r_base_oloc.pool = ceph_file_layout_pg_pool(rbd_dev->parent->layout);
+ ceph_oid_set_name(&osd_req->r_base_oid, parent_object_name);
+ rbd_segment_name_free(parent_object_name);
+
+ /* Init copyup request */
+ rbd_assert(copyup_request->osd_req == NULL);
+ copyup_request->osd_req = osd_req;
+ copyup_request->callback = rbd_img_copyup_write_async;
+
+ /* Encode osd_req data */
+ snap_id = img_request ? img_request->snap_id : CEPH_NOSNAP;
+ ceph_osdc_build_request(osd_req, 0, NULL, snap_id, NULL);
+
+ /* Add copyup request to img_request->copyup_list */
+ rbd_img_copyup_request_add(img_request, copyup_request);
+
+ rbd_img_request_get(img_request);
+
+ /* Send osd_req */
+ result = ceph_osdc_start_request(osdc, osd_req, false);
+ if (!result)
+ goto out;
+out:
+ return;
+}
+
+
static void rbd_img_obj_exists_callback(struct rbd_obj_request *obj_request)
{
struct rbd_obj_request *orig_request;
--
1.9.1
^ permalink raw reply related [flat|nested] 7+ messages in thread* Re: [PATCH 4/4] Rbd: implement the copy-on-read logic
2015-05-21 9:19 ` [PATCH 4/4] Rbd: implement the copy-on-read logic Li Wang
@ 2015-06-24 8:50 ` Ilya Dryomov
0 siblings, 0 replies; 7+ messages in thread
From: Ilya Dryomov @ 2015-06-24 8:50 UTC (permalink / raw)
To: Li Wang
Cc: Sage Weil, Alex Elder, Josh Durgin, Ceph Development, Min Chen,
Yunchuan Wen
On Thu, May 21, 2015 at 12:19 PM, Li Wang <liwang@ubuntukylin.com> wrote:
> From: Min Chen <minchen@ubuntukylin.com>
>
> Signed-off-by: Min Chen <minchen@ubuntukylin.com>
> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
> ---
> drivers/block/rbd.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 183 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 99a3a556..51d8398 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1851,12 +1851,16 @@ static void rbd_osd_read_callback(struct rbd_obj_request *obj_request)
> obj_request, img_request, obj_request->result,
> obj_request->xferred, obj_request->length);
> if (layered && obj_request->result == -ENOENT &&
> - obj_request->img_offset < rbd_dev->parent_overlap)
> + obj_request->img_offset < rbd_dev->parent_overlap) {
> rbd_img_parent_read(obj_request);
> - else if (img_request)
> + rbd_assert(obj_request->img_request);
> + if(is_copy_on_read(obj_request->img_request->rbd_dev))
> + rbd_img_copyup_start(obj_request->img_request, obj_request->object_name);
> + } else if (img_request) {
> rbd_img_obj_request_read_callback(obj_request);
> - else
> + } else {
> obj_request_done_set(obj_request);
> + }
> }
>
> static void rbd_osd_write_callback(struct rbd_obj_request *obj_request)
> @@ -2915,6 +2919,182 @@ out_err:
> return result;
> }
>
> +static void rbd_img_copyup_end(struct rbd_copyup_request *copyup_request)
> +{
> + struct rbd_img_request *img_request = NULL;
> + rbd_assert(copyup_request);
> + img_request = copyup_request->img_request;
> + rbd_img_copyup_request_del(img_request, copyup_request);
> + rbd_copyup_request_destroy(©up_request->kref);
> + rbd_img_request_put(img_request);
> +}
> +
> +static void rbd_osd_req_copyup_callback(struct ceph_osd_request *osd_req,
> + struct ceph_msg *msg)
> +{
> + struct rbd_copyup_request *copyup_request = NULL;
> + rbd_assert(osd_req);
> + copyup_request = osd_req->r_priv;
> + copyup_request->result = osd_req->r_result;
> + if(copyup_request->callback)
> + copyup_request->callback(copyup_request);
> + else
> + complete_all(©up_request->completion);
> +}
> +
> +static void rbd_img_copyup_write_async(struct rbd_copyup_request *copyup_request)
> +{
> + struct rbd_img_request *img_request = NULL;
> + struct ceph_snap_context *snapc = NULL;
> + struct ceph_osd_request *osd_req = NULL;
> + struct ceph_osd_client *osdc = NULL;
> + struct rbd_device *rbd_dev = NULL;
> + struct page **pages = NULL;
> + struct timespec mtime = CURRENT_TIME;
> + u32 page_count = 0;
> + u64 object_size = 0;
> + int result = 0;
> +
> + /* if copyup_request read from parent failed, just end it */
> + if (copyup_request->result < 0) {
> + rbd_img_copyup_end(copyup_request);
> + goto out;
> + }
> +
> + img_request = copyup_request->img_request;
> + rbd_assert(img_request);
> + rbd_dev = img_request->rbd_dev;
> + rbd_assert(rbd_dev);
> + osdc = &rbd_dev->rbd_client->client->osdc;
> + rbd_assert(osdc);
> + snapc = rbd_dev->header.snapc;
> +
> + ceph_osdc_put_request(copyup_request->osd_req);
> +
> + copyup_request->osd_req = NULL;
> + osd_req = ceph_osdc_alloc_request(osdc, snapc, 1, false, GFP_ATOMIC);
> + if (!osd_req)
> + goto out;
> +
> + pages = copyup_request->copyup_pages;
> + page_count = copyup_request->copyup_page_count;
> + object_size = (u64)1 << rbd_dev->header.obj_order;
> +
> + /* Initialize copyup op */
> + osd_req_op_cls_init(osd_req, 0, CEPH_OSD_OP_CALL, "rbd", "copyup");
> + osd_req_op_cls_request_data_pages(osd_req, 0, pages, object_size, 0, false, false);
> + osd_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK;
> + osd_req->r_callback = rbd_osd_req_copyup_callback;
> + osd_req->r_priv = copyup_request;
> +
> + osd_req->r_base_oloc.pool = ceph_file_layout_pg_pool(rbd_dev->layout);
> + ceph_oid_set_name(&osd_req->r_base_oid, copyup_request->object_name);
> +
> + copyup_request->osd_req = osd_req;
> + copyup_request->callback = rbd_img_copyup_end;
> +
> + ceph_osdc_build_request(osd_req, 0, snapc, CEPH_NOSNAP, &mtime);
> + result = ceph_osdc_start_request(osdc, osd_req, false);
> + if(!result)
> + goto out;
> +
> + ceph_osdc_put_request(osd_req);
> +out:
> + return;
> +}
> +
> +static void rbd_img_copyup_start(struct rbd_img_request *img_request,
> + const char *object_name)
> +{
> + struct rbd_copyup_request *copyup_request = NULL;
> + struct rbd_device *rbd_dev = NULL;
> + struct ceph_snap_context *snapc = NULL;
> + struct ceph_osd_client *osdc = NULL;
> + struct ceph_osd_request *osd_req = NULL;
> + const char *parent_object_name = NULL;
> +
> + int result = 0;
> + u64 object_no = (u64)-1;
> + u64 object_size = 0;
> + u64 snap_id = 0;
> + __u8 obj_order = 0;
> + bool is_read = false;
> +
> + rbd_assert(img_request != NULL);
> + rbd_assert(object_name != NULL);
> +
> + rbd_dev = img_request->rbd_dev;
> + rbd_assert(rbd_dev != NULL);
> +
> + is_read = !img_request_write_test(img_request) &&
> + !img_request_discard_test(img_request);
> +
> + object_no = rbd_object_no(rbd_dev, object_name);
> + obj_order = rbd_dev->header.obj_order;
> + object_size = (u64)1 << obj_order;
> +
> + spin_lock(&img_request->copyup_list_lock);
> + /* Find if object_no exists in copyup_list */
> + for_each_copyup_request(img_request, copyup_request) {
> + /* Found, just return */
> + if(copyup_request->object_no == object_no) {
> + spin_unlock(&img_request->copyup_list_lock);
> + return;
> + }
> + }
> + spin_unlock(&img_request->copyup_list_lock);
> +
> + /* Not found, send new copyup request */
> + copyup_request = NULL;
> + osdc = &rbd_dev->rbd_client->client->osdc;
> + parent_object_name = rbd_segment_name(rbd_dev->parent, object_no << obj_order);
> + if (!parent_object_name)
> + goto out;
> + osd_req = ceph_osdc_alloc_request(osdc, snapc, 1, false, GFP_ATOMIC);
> + if (!osd_req)
> + goto out;
> + copyup_request = rbd_copyup_request_create(object_name, rbd_dev);
> + if (!copyup_request) {
> + ceph_osdc_put_request(osd_req);
> + goto out;
> + }
> +
> + /* Init osd_req */
> + osd_req_op_extent_init(osd_req, 0, CEPH_OSD_OP_READ, 0, object_size, 0, 0);
> + osd_req_op_extent_osd_data_pages(osd_req, 0, copyup_request->copyup_pages, object_size,
> + 0, false, false);
> +
> + osd_req->r_flags = CEPH_OSD_FLAG_READ;
> + osd_req->r_callback = rbd_osd_req_copyup_callback;
> + osd_req->r_priv = copyup_request;
> +
> + osd_req->r_base_oloc.pool = ceph_file_layout_pg_pool(rbd_dev->parent->layout);
> + ceph_oid_set_name(&osd_req->r_base_oid, parent_object_name);
> + rbd_segment_name_free(parent_object_name);
> +
> + /* Init copyup request */
> + rbd_assert(copyup_request->osd_req == NULL);
> + copyup_request->osd_req = osd_req;
> + copyup_request->callback = rbd_img_copyup_write_async;
> +
> + /* Encode osd_req data */
> + snap_id = img_request ? img_request->snap_id : CEPH_NOSNAP;
> + ceph_osdc_build_request(osd_req, 0, NULL, snap_id, NULL);
> +
> + /* Add copyup request to img_request->copyup_list */
> + rbd_img_copyup_request_add(img_request, copyup_request);
> +
> + rbd_img_request_get(img_request);
> +
> + /* Send osd_req */
> + result = ceph_osdc_start_request(osdc, osd_req, false);
> + if (!result)
> + goto out;
> +out:
> + return;
> +}
> +
> +
> static void rbd_img_obj_exists_callback(struct rbd_obj_request *obj_request)
> {
> struct rbd_obj_request *orig_request;
Hi,
Sorry for late feedback, I thought I had sent this..
I have a number of high-level issues with this. First and foremost,
the lifetime rules are unclear. It looks to me like there is nothing
preventing a bunch of async copyup requests from hanging around after
rbd unmap completes. These copyups bump img_request refcount, which
means that img_requests along with rbd_device, rbd_client and
ceph_client will also hang in there after rbd unmap, waiting for late
replies. This is wrong: if there are no images mapped, there should be
no rbd or libceph state around. I'm pretty sure async copyup requests
don't need an osd_client callback at all - you can just send and
forget.
Second, I think sync (copy-on-write) and async (copy-on-read) copyups
should be coordinated with each other. If there is an async copyup in
progress, a sync copyup can just wait for it to complete.
Related to the above is the copyup request list. I think it should be
a per image rather than a per img_request thing - copyup_list in struct
rbd_img_request doesn't make much sense to me. What exactly is it's
use there?
Fourth, unless I'm missing something, the following
rbd_img_parent_read(obj_request);
rbd_assert(obj_request->img_request);
if(is_copy_on_read(obj_request->img_request->rbd_dev))
rbd_img_copyup_start(...);
in rbd_osd_read_callback() means that async copyup requests will be
issued for objects that don't exist in the parent. I think the correct
way to handle this is to wait for a parent read request to complete and
issue a copyup only if it completes successfully.
Thanks,
Ilya
^ permalink raw reply [flat|nested] 7+ messages in thread