* [PATCH 0/3] patches for rbd
@ 2014-03-12 4:24 Guangliang Zhao
2014-03-12 4:24 ` [PATCH 1/3] rbd: skip the copyup when an entire object writing Guangliang Zhao
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Guangliang Zhao @ 2014-03-12 4:24 UTC (permalink / raw)
To: ceph-devel; +Cc: sage
Hi,
I am sorry that didn't notice Jean-Tiare's mail at all, just sage's
reminds me.
There are 3 patches, the first is optimization for rbd writing, the
remaining two are support for rbd discard. If you have completed the
discard things, just ignore them.
Guangliang Zhao (3):
rbd: skip the copyup when an entire object writing
rbd: extend the operation type
rbd: add discard support for rbd
drivers/block/rbd.c | 190 +++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 154 insertions(+), 36 deletions(-)
--
1.7.9.5
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] rbd: skip the copyup when an entire object writing
2014-03-12 4:24 [PATCH 0/3] patches for rbd Guangliang Zhao
@ 2014-03-12 4:24 ` Guangliang Zhao
2014-03-12 4:39 ` Alex Elder
2014-03-12 4:24 ` [PATCH 2/3] rbd: extend the operation type Guangliang Zhao
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Guangliang Zhao @ 2014-03-12 4:24 UTC (permalink / raw)
To: ceph-devel; +Cc: sage
It need to copyup the parent's content when layered writing,
but an entire object write would overwrite it, so skip it.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
---
drivers/block/rbd.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index b365e0d..965b9b9 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2629,12 +2629,14 @@ static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request)
struct rbd_img_request *img_request;
struct rbd_device *rbd_dev;
bool known;
+ u64 obj_size;
rbd_assert(obj_request_img_data_test(obj_request));
img_request = obj_request->img_request;
rbd_assert(img_request);
rbd_dev = img_request->rbd_dev;
+ obj_size = (u64) 1 << rbd_dev->header.obj_order;
/*
* Only writes to layered images need special handling.
@@ -2644,11 +2646,15 @@ static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request)
* simple object requests. Finally, if the target object is
* known to already exist, its parent data has already been
* copied, so a write to the object can also be handled as a
- * simple object request.
+ * simple object request. Another type: if the obj_request
+ * aligns with the boundary and equals to the size of an object,
+ * it doesn't need copyup, because the obj_request will overwrite
+ * it finally.
*/
if (!img_request_write_test(img_request) ||
!img_request_layered_test(img_request) ||
rbd_dev->parent_overlap <= obj_request->img_offset ||
+ ((!obj_request->offset) && (obj_request->length == obj_size)) ||
((known = obj_request_known_test(obj_request)) &&
obj_request_exists_test(obj_request))) {
--
1.7.9.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] rbd: extend the operation type
2014-03-12 4:24 [PATCH 0/3] patches for rbd Guangliang Zhao
2014-03-12 4:24 ` [PATCH 1/3] rbd: skip the copyup when an entire object writing Guangliang Zhao
@ 2014-03-12 4:24 ` Guangliang Zhao
2014-03-12 5:01 ` Alex Elder
2014-03-12 4:24 ` [PATCH 3/3] rbd: add discard support for rbd Guangliang Zhao
` (2 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Guangliang Zhao @ 2014-03-12 4:24 UTC (permalink / raw)
To: ceph-devel; +Cc: sage
It could only handle the read and write operations now,
extend it for the coming discard support.
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
---
drivers/block/rbd.c | 89 ++++++++++++++++++++++++++++++++++-----------------
1 file changed, 60 insertions(+), 29 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 965b9b9..ca1fd14 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -209,6 +209,20 @@ enum obj_request_type {
OBJ_REQUEST_NODATA, OBJ_REQUEST_BIO, OBJ_REQUEST_PAGES
};
+enum obj_operation_type {
+ OBJ_OPT_WRITE,
+ OBJ_OPT_READ,
+};
+
+/*
+ * Do *not* change order of the elements, it corresponds with
+ * the above enum
+ */
+static const char* obj_opt[] = {
+ "write",
+ "read",
+};
+
enum obj_req_flags {
OBJ_REQ_DONE, /* completion flag: not done = 0, done = 1 */
OBJ_REQ_IMG_DATA, /* object usage: standalone = 0, image = 1 */
@@ -1717,19 +1731,21 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
static struct ceph_osd_request *rbd_osd_req_create(
struct rbd_device *rbd_dev,
- bool write_request,
+ enum obj_operation_type type,
struct rbd_obj_request *obj_request)
{
struct ceph_snap_context *snapc = NULL;
struct ceph_osd_client *osdc;
struct ceph_osd_request *osd_req;
+ bool write_request = (type == OBJ_OPT_WRITE) != 0;
if (obj_request_img_data_test(obj_request)) {
struct rbd_img_request *img_request = obj_request->img_request;
rbd_assert(write_request ==
img_request_write_test(img_request));
- if (write_request)
+
+ if (type == OBJ_OPT_WRITE)
snapc = img_request->snapc;
}
@@ -1740,7 +1756,7 @@ static struct ceph_osd_request *rbd_osd_req_create(
if (!osd_req)
return NULL; /* ENOMEM */
- if (write_request)
+ if (type == OBJ_OPT_WRITE)
osd_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK;
else
osd_req->r_flags = CEPH_OSD_FLAG_READ;
@@ -1947,7 +1963,7 @@ static bool rbd_dev_parent_get(struct rbd_device *rbd_dev)
static struct rbd_img_request *rbd_img_request_create(
struct rbd_device *rbd_dev,
u64 offset, u64 length,
- bool write_request)
+ enum obj_operation_type type)
{
struct rbd_img_request *img_request;
@@ -1955,7 +1971,7 @@ static struct rbd_img_request *rbd_img_request_create(
if (!img_request)
return NULL;
- if (write_request) {
+ if (type == OBJ_OPT_WRITE) {
down_read(&rbd_dev->header_rwsem);
ceph_get_snap_context(rbd_dev->header.snapc);
up_read(&rbd_dev->header_rwsem);
@@ -1966,7 +1982,7 @@ static struct rbd_img_request *rbd_img_request_create(
img_request->offset = offset;
img_request->length = length;
img_request->flags = 0;
- if (write_request) {
+ if (type == OBJ_OPT_WRITE) {
img_request_write_set(img_request);
img_request->snapc = rbd_dev->header.snapc;
} else {
@@ -1983,8 +1999,7 @@ static struct rbd_img_request *rbd_img_request_create(
kref_init(&img_request->kref);
dout("%s: rbd_dev %p %s %llu/%llu -> img %p\n", __func__, rbd_dev,
- write_request ? "write" : "read", offset, length,
- img_request);
+ obj_opt[type], offset, length, img_request);
return img_request;
}
@@ -2024,8 +2039,8 @@ static struct rbd_img_request *rbd_parent_request_create(
rbd_assert(obj_request->img_request);
rbd_dev = obj_request->img_request->rbd_dev;
- parent_request = rbd_img_request_create(rbd_dev->parent,
- img_offset, length, false);
+ parent_request = rbd_img_request_create(rbd_dev->parent, img_offset,
+ length, OBJ_OPT_READ);
if (!parent_request)
return NULL;
@@ -2066,11 +2081,13 @@ static bool rbd_img_obj_end_request(struct rbd_obj_request *obj_request)
result = obj_request->result;
if (result) {
struct rbd_device *rbd_dev = img_request->rbd_dev;
+ enum obj_operation_type type;
+ type = img_request_write_test(img_request) ? OBJ_OPT_WRITE :
+ OBJ_OPT_READ;
rbd_warn(rbd_dev, "%s %llx at %llx (%llx)\n",
- img_request_write_test(img_request) ? "write" : "read",
- obj_request->length, obj_request->img_offset,
- obj_request->offset);
+ obj_opt[type], obj_request->length,
+ obj_request->img_offset, obj_request->offset);
rbd_warn(rbd_dev, " result %d xferred %x\n",
result, xferred);
if (!img_request->result)
@@ -2149,10 +2166,10 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
struct rbd_device *rbd_dev = img_request->rbd_dev;
struct rbd_obj_request *obj_request = NULL;
struct rbd_obj_request *next_obj_request;
- bool write_request = img_request_write_test(img_request);
struct bio *bio_list = NULL;
unsigned int bio_offset = 0;
struct page **pages = NULL;
+ enum obj_operation_type op_type;
u64 img_offset;
u64 resid;
u16 opcode;
@@ -2160,7 +2177,6 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
dout("%s: img %p type %d data_desc %p\n", __func__, img_request,
(int)type, data_desc);
- opcode = write_request ? CEPH_OSD_OP_WRITE : CEPH_OSD_OP_READ;
img_offset = img_request->offset;
resid = img_request->length;
rbd_assert(resid > 0);
@@ -2220,8 +2236,15 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
pages += page_count;
}
- osd_req = rbd_osd_req_create(rbd_dev, write_request,
- obj_request);
+ if (img_request_write_test(img_request)) {
+ op_type = OBJ_OPT_WRITE;
+ opcode = CEPH_OSD_OP_WRITE;
+ } else {
+ op_type = OBJ_OPT_READ;
+ opcode = CEPH_OSD_OP_READ;
+ }
+
+ osd_req = rbd_osd_req_create(rbd_dev, op_type, obj_request);
if (!osd_req)
goto out_partial;
obj_request->osd_req = osd_req;
@@ -2237,7 +2260,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
obj_request->pages, length,
offset & ~PAGE_MASK, false, false);
- if (write_request)
+ if (op_type == OBJ_OPT_WRITE)
rbd_osd_req_format_write(obj_request);
else
rbd_osd_req_format_read(obj_request);
@@ -2604,7 +2627,7 @@ static int rbd_img_obj_exists_submit(struct rbd_obj_request *obj_request)
rbd_assert(obj_request->img_request);
rbd_dev = obj_request->img_request->rbd_dev;
- stat_request->osd_req = rbd_osd_req_create(rbd_dev, false,
+ stat_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_READ,
stat_request);
if (!stat_request->osd_req)
goto out;
@@ -2814,7 +2837,8 @@ static int rbd_obj_notify_ack_sync(struct rbd_device *rbd_dev, u64 notify_id)
return -ENOMEM;
ret = -ENOMEM;
- obj_request->osd_req = rbd_osd_req_create(rbd_dev, false, obj_request);
+ obj_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_READ,
+ obj_request);
if (!obj_request->osd_req)
goto out;
@@ -2877,7 +2901,8 @@ static int __rbd_dev_header_watch_sync(struct rbd_device *rbd_dev, bool start)
if (!obj_request)
goto out_cancel;
- obj_request->osd_req = rbd_osd_req_create(rbd_dev, true, obj_request);
+ obj_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_WRITE,
+ obj_request);
if (!obj_request->osd_req)
goto out_cancel;
@@ -2985,7 +3010,8 @@ static int rbd_obj_method_sync(struct rbd_device *rbd_dev,
obj_request->pages = pages;
obj_request->page_count = page_count;
- obj_request->osd_req = rbd_osd_req_create(rbd_dev, false, obj_request);
+ obj_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_READ,
+ obj_request);
if (!obj_request->osd_req)
goto out;
@@ -3040,7 +3066,7 @@ static void rbd_request_fn(struct request_queue *q)
int result;
while ((rq = blk_fetch_request(q))) {
- bool write_request = rq_data_dir(rq) == WRITE;
+ enum obj_operation_type type;
struct rbd_img_request *img_request;
u64 offset;
u64 length;
@@ -3067,9 +3093,14 @@ static void rbd_request_fn(struct request_queue *q)
spin_unlock_irq(q->queue_lock);
- /* Disallow writes to a read-only device */
+ if (rq->cmd_flags & REQ_WRITE)
+ type = OBJ_OPT_WRITE;
+ else
+ type = OBJ_OPT_READ;
+
+ /* Only allow reads to a read-only device */
- if (write_request) {
+ if (type != OBJ_OPT_READ) {
result = -EROFS;
if (read_only)
goto end_request;
@@ -3106,7 +3137,7 @@ static void rbd_request_fn(struct request_queue *q)
result = -ENOMEM;
img_request = rbd_img_request_create(rbd_dev, offset, length,
- write_request);
+ type);
if (!img_request)
goto end_request;
@@ -3122,8 +3153,7 @@ end_request:
spin_lock_irq(q->queue_lock);
if (result < 0) {
rbd_warn(rbd_dev, "%s %llx at %llx result %d\n",
- write_request ? "write" : "read",
- length, offset, result);
+ obj_opt[type], length, offset, result);
__blk_end_request_all(rq, result);
}
@@ -3218,7 +3248,8 @@ static int rbd_obj_read_sync(struct rbd_device *rbd_dev,
obj_request->pages = pages;
obj_request->page_count = page_count;
- obj_request->osd_req = rbd_osd_req_create(rbd_dev, false, obj_request);
+ obj_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_READ,
+ obj_request);
if (!obj_request->osd_req)
goto out;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] rbd: add discard support for rbd
2014-03-12 4:24 [PATCH 0/3] patches for rbd Guangliang Zhao
2014-03-12 4:24 ` [PATCH 1/3] rbd: skip the copyup when an entire object writing Guangliang Zhao
2014-03-12 4:24 ` [PATCH 2/3] rbd: extend the operation type Guangliang Zhao
@ 2014-03-12 4:24 ` Guangliang Zhao
2014-03-12 5:26 ` Alex Elder
2014-03-12 5:28 ` [PATCH 0/3] patches " Alex Elder
2014-03-12 9:34 ` Jean-Tiare LE BIGOT
4 siblings, 1 reply; 10+ messages in thread
From: Guangliang Zhao @ 2014-03-12 4:24 UTC (permalink / raw)
To: ceph-devel; +Cc: sage
This resolve:
http://tracker.ceph.com/issues/190
Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
---
drivers/block/rbd.c | 105 +++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 93 insertions(+), 12 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index ca1fd14..67ea156 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -212,6 +212,7 @@ enum obj_request_type {
enum obj_operation_type {
OBJ_OPT_WRITE,
OBJ_OPT_READ,
+ OBJ_OPT_DISCARD,
};
/*
@@ -221,6 +222,7 @@ enum obj_operation_type {
static const char* obj_opt[] = {
"write",
"read",
+ "discard",
};
enum obj_req_flags {
@@ -228,6 +230,7 @@ enum obj_req_flags {
OBJ_REQ_IMG_DATA, /* object usage: standalone = 0, image = 1 */
OBJ_REQ_KNOWN, /* EXISTS flag valid: no = 0, yes = 1 */
OBJ_REQ_EXISTS, /* target exists: no = 0, yes = 1 */
+ IMG_REQ_DISCARD, /* discard: normal = 0, discard request = 1 */
};
struct rbd_obj_request {
@@ -1518,6 +1521,21 @@ static bool img_request_write_test(struct rbd_img_request *img_request)
return test_bit(IMG_REQ_WRITE, &img_request->flags) != 0;
}
+/*
+ * Set the discard flag when the img_request is an discard request
+ */
+static void img_request_discard_set(struct rbd_img_request *img_request)
+{
+ set_bit(IMG_REQ_DISCARD, &img_request->flags);
+ smp_mb();
+}
+
+static bool img_request_discard_test(struct rbd_img_request *img_request)
+{
+ smp_mb();
+ return test_bit(IMG_REQ_DISCARD, &img_request->flags) != 0;
+}
+
static void img_request_child_set(struct rbd_img_request *img_request)
{
set_bit(IMG_REQ_CHILD, &img_request->flags);
@@ -1640,6 +1658,18 @@ static void rbd_osd_write_callback(struct rbd_obj_request *obj_request)
obj_request_done_set(obj_request);
}
+static void rbd_osd_discard_callback(struct rbd_obj_request *obj_request)
+{
+ dout("%s: obj %p result %d %llu\n", __func__, obj_request,
+ obj_request->result, obj_request->length);
+ /*
+ * There is no such thing as a successful short write. Set
+ * it to our originally-requested length.
+ */
+ obj_request->xferred = obj_request->length;
+ obj_request_done_set(obj_request);
+}
+
/*
* For a simple stat call there's nothing to do. We'll do more if
* this is part of a write sequence for a layered image.
@@ -1687,6 +1717,11 @@ static void rbd_osd_req_callback(struct ceph_osd_request *osd_req,
case CEPH_OSD_OP_STAT:
rbd_osd_stat_callback(obj_request);
break;
+ case CEPH_OSD_OP_DELETE:
+ case CEPH_OSD_OP_TRUNCATE:
+ case CEPH_OSD_OP_ZERO:
+ rbd_osd_discard_callback(obj_request);
+ break;
case CEPH_OSD_OP_CALL:
case CEPH_OSD_OP_NOTIFY_ACK:
case CEPH_OSD_OP_WATCH:
@@ -1729,6 +1764,20 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
snapc, CEPH_NOSNAP, &mtime);
}
+static void rbd_osd_req_format_discard(struct rbd_obj_request *obj_request)
+{
+ struct rbd_img_request *img_request = obj_request->img_request;
+ struct ceph_osd_request *osd_req = obj_request->osd_req;
+ struct timespec mtime = CURRENT_TIME;
+ u64 snap_id;
+
+ rbd_assert(osd_req != NULL);
+
+ snap_id = img_request ? img_request->snap_id : CEPH_NOSNAP;
+ ceph_osdc_build_request(osd_req, obj_request->offset,
+ NULL, snap_id, &mtime);
+}
+
static struct ceph_osd_request *rbd_osd_req_create(
struct rbd_device *rbd_dev,
enum obj_operation_type type,
@@ -1738,12 +1787,15 @@ static struct ceph_osd_request *rbd_osd_req_create(
struct ceph_osd_client *osdc;
struct ceph_osd_request *osd_req;
bool write_request = (type == OBJ_OPT_WRITE) != 0;
+ bool discard_request = (type == OBJ_OPT_DISCARD) != 0;
if (obj_request_img_data_test(obj_request)) {
struct rbd_img_request *img_request = obj_request->img_request;
rbd_assert(write_request ==
img_request_write_test(img_request));
+ rbd_assert(discard_request ==
+ img_request_discard_test(img_request));
if (type == OBJ_OPT_WRITE)
snapc = img_request->snapc;
@@ -1756,7 +1808,7 @@ static struct ceph_osd_request *rbd_osd_req_create(
if (!osd_req)
return NULL; /* ENOMEM */
- if (type == OBJ_OPT_WRITE)
+ if (type == OBJ_OPT_WRITE || type == OBJ_OPT_DISCARD)
osd_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK;
else
osd_req->r_flags = CEPH_OSD_FLAG_READ;
@@ -1982,7 +2034,10 @@ static struct rbd_img_request *rbd_img_request_create(
img_request->offset = offset;
img_request->length = length;
img_request->flags = 0;
- if (type == OBJ_OPT_WRITE) {
+ if (type == OBJ_OPT_DISCARD) {
+ img_request_discard_set(img_request);
+ img_request->snap_id = rbd_dev->spec->snap_id;
+ } else if (type == OBJ_OPT_WRITE) {
img_request_write_set(img_request);
img_request->snapc = rbd_dev->header.snapc;
} else {
@@ -2083,8 +2138,13 @@ static bool rbd_img_obj_end_request(struct rbd_obj_request *obj_request)
struct rbd_device *rbd_dev = img_request->rbd_dev;
enum obj_operation_type type;
- type = img_request_write_test(img_request) ? OBJ_OPT_WRITE :
- OBJ_OPT_READ;
+ if (img_request_discard_test(img_request))
+ type = OBJ_OPT_DISCARD;
+ else if (img_request_write_test(img_request))
+ type = OBJ_OPT_WRITE;
+ else
+ type = OBJ_OPT_READ;
+
rbd_warn(rbd_dev, "%s %llx at %llx (%llx)\n",
obj_opt[type], obj_request->length,
obj_request->img_offset, obj_request->offset);
@@ -2170,6 +2230,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
unsigned int bio_offset = 0;
struct page **pages = NULL;
enum obj_operation_type op_type;
+ u64 object_size = (u64) 1 << rbd_dev->header.obj_order;
u64 img_offset;
u64 resid;
u16 opcode;
@@ -2185,8 +2246,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
bio_list = data_desc;
rbd_assert(img_offset ==
bio_list->bi_iter.bi_sector << SECTOR_SHIFT);
- } else {
- rbd_assert(type == OBJ_REQUEST_PAGES);
+ } else if (type == OBJ_REQUEST_PAGES) {
pages = data_desc;
}
@@ -2225,7 +2285,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
GFP_ATOMIC);
if (!obj_request->bio_list)
goto out_partial;
- } else {
+ } else if (type == OBJ_REQUEST_PAGES) {
unsigned int page_count;
obj_request->pages = pages;
@@ -2236,7 +2296,15 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
pages += page_count;
}
- if (img_request_write_test(img_request)) {
+ if (img_request_discard_test(img_request)) {
+ op_type = OBJ_OPT_DISCARD;
+ if (!offset && length == object_size)
+ opcode = CEPH_OSD_OP_DELETE;
+ else if (offset + length == object_size)
+ opcode = CEPH_OSD_OP_TRUNCATE;
+ else
+ opcode = CEPH_OSD_OP_ZERO;
+ } else if (img_request_write_test(img_request)) {
op_type = OBJ_OPT_WRITE;
opcode = CEPH_OSD_OP_WRITE;
} else {
@@ -2255,13 +2323,15 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
if (type == OBJ_REQUEST_BIO)
osd_req_op_extent_osd_data_bio(osd_req, 0,
obj_request->bio_list, length);
- else
+ else if (type == OBJ_REQUEST_PAGES)
osd_req_op_extent_osd_data_pages(osd_req, 0,
obj_request->pages, length,
offset & ~PAGE_MASK, false, false);
if (op_type == OBJ_OPT_WRITE)
rbd_osd_req_format_write(obj_request);
+ else if (op_type == OBJ_OPT_DISCARD)
+ rbd_osd_req_format_discard(obj_request);
else
rbd_osd_req_format_read(obj_request);
@@ -3093,7 +3163,9 @@ static void rbd_request_fn(struct request_queue *q)
spin_unlock_irq(q->queue_lock);
- if (rq->cmd_flags & REQ_WRITE)
+ if (rq->cmd_flags & REQ_DISCARD)
+ type = OBJ_OPT_DISCARD;
+ else if (rq->cmd_flags & REQ_WRITE)
type = OBJ_OPT_WRITE;
else
type = OBJ_OPT_READ;
@@ -3143,8 +3215,12 @@ static void rbd_request_fn(struct request_queue *q)
img_request->rq = rq;
- result = rbd_img_request_fill(img_request, OBJ_REQUEST_BIO,
- rq->bio);
+ if (type == OBJ_OPT_DISCARD)
+ result = rbd_img_request_fill(img_request,
+ OBJ_REQUEST_NODATA, NULL);
+ else
+ result = rbd_img_request_fill(img_request,
+ OBJ_REQUEST_BIO, rq->bio);
if (!result)
result = rbd_img_request_submit(img_request);
if (result)
@@ -3452,6 +3528,11 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
blk_queue_io_min(q, segment_size);
blk_queue_io_opt(q, segment_size);
+ /* enable the discard support */
+ queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
+ q->limits.discard_granularity = segment_size;
+ q->limits.discard_alignment = segment_size;
+
blk_queue_merge_bvec(q, rbd_merge_bvec);
disk->queue = q;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] rbd: skip the copyup when an entire object writing
2014-03-12 4:24 ` [PATCH 1/3] rbd: skip the copyup when an entire object writing Guangliang Zhao
@ 2014-03-12 4:39 ` Alex Elder
0 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2014-03-12 4:39 UTC (permalink / raw)
To: Guangliang Zhao, ceph-devel; +Cc: sage
On 03/11/2014 11:24 PM, Guangliang Zhao wrote:
> It need to copyup the parent's content when layered writing,
> but an entire object write would overwrite it, so skip it.
This is a pretty reasonable optimization. Has anyone
found how often this case is hit? Is it a big win?
I think the patch looks good but I have something for
you to consider, below. Either way:
Reviewed-by: Alex Elder <elder@linaro.org>
> Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
> ---
> drivers/block/rbd.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index b365e0d..965b9b9 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -2629,12 +2629,14 @@ static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request)
> struct rbd_img_request *img_request;
> struct rbd_device *rbd_dev;
> bool known;
> + u64 obj_size;
>
> rbd_assert(obj_request_img_data_test(obj_request));
>
> img_request = obj_request->img_request;
> rbd_assert(img_request);
> rbd_dev = img_request->rbd_dev;
> + obj_size = (u64) 1 << rbd_dev->header.obj_order;
>
> /*
> * Only writes to layered images need special handling.
> @@ -2644,11 +2646,15 @@ static int rbd_img_obj_request_submit(struct rbd_obj_request *obj_request)
> * simple object requests. Finally, if the target object is
> * known to already exist, its parent data has already been
> * copied, so a write to the object can also be handled as a
> - * simple object request.
> + * simple object request. Another type: if the obj_request
> + * aligns with the boundary and equals to the size of an object,
> + * it doesn't need copyup, because the obj_request will overwrite
> + * it finally.
> */
This test is awfully big, and it's now even bigger. I think it
would be good to encapsulate it into a helper function. Something
like rbd_obj_request_simple() (or figure out a better name). So
it might then look like:
/*
* If it's a simple image object request, no special
* handling is required.
*/
if (rbd_img_obj_request_simple(obj_request)) {
struct rbd_device *rbd_dev;
struct ceph_osd_client *osdc;
return rbd_obj_request_submit(osdc, obj_request);
}
The called function could probably be made a little easier
to understand by splitting up the conditions and returning
more than once. E.g.:
/* Only layered writes need special handling. */
img_request = obj_request->img_request;
rbd_dev = img_request->rbd_dev;
if (!img_request_write_test(obj_request))
return true;
if (!img_request_layered_test(img_request))
return true;
/*
* Layered writes that start beyond the end of the overlap
* with the parent have no parent data, so they too are
* simple object requests.
*/
if (rbd_dev->parent_overlap <= obj_request->img_offset)
return true;
And so on.
> if (!img_request_write_test(img_request) ||
> !img_request_layered_test(img_request) ||
> rbd_dev->parent_overlap <= obj_request->img_offset ||
> + ((!obj_request->offset) && (obj_request->length == obj_size)) ||
> ((known = obj_request_known_test(obj_request)) &&
> obj_request_exists_test(obj_request))) {
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] rbd: extend the operation type
2014-03-12 4:24 ` [PATCH 2/3] rbd: extend the operation type Guangliang Zhao
@ 2014-03-12 5:01 ` Alex Elder
0 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2014-03-12 5:01 UTC (permalink / raw)
To: Guangliang Zhao, ceph-devel; +Cc: sage
On 03/11/2014 11:24 PM, Guangliang Zhao wrote:
> It could only handle the read and write operations now,
> extend it for the coming discard support.
OK, here too I think this looks good, but please consider
the comments I've provided below.
Reviewed-by: Alex Elder <elder@linaro.org>
> Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
> ---
> drivers/block/rbd.c | 89 ++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 60 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 965b9b9..ca1fd14 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -209,6 +209,20 @@ enum obj_request_type {
> OBJ_REQUEST_NODATA, OBJ_REQUEST_BIO, OBJ_REQUEST_PAGES
> };
>
> +enum obj_operation_type {
> + OBJ_OPT_WRITE,
> + OBJ_OPT_READ,
> +};
> +
> +/*
> + * Do *not* change order of the elements, it corresponds with
> + * the above enum
> + */
I don't like this. Why not do this instead, so you don't have
to worry about the above comment?
static const char *obj_operation_name[] = {
[OBJ_OPT_WRITE] = "write",
[OBJ_OPT_READ] = "read",
};
Beyond that, I'd actually prefer to use a helper function
to return the name, using a switch statement. That way
the default case could return a string that indicated a
bad operation value was provided rather than just being a
null pointer.
> +static const char* obj_opt[] = {
> + "write",
> + "read",
> +};
> +
> enum obj_req_flags {
> OBJ_REQ_DONE, /* completion flag: not done = 0, done = 1 */
> OBJ_REQ_IMG_DATA, /* object usage: standalone = 0, image = 1 */
> @@ -1717,19 +1731,21 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
>
> static struct ceph_osd_request *rbd_osd_req_create(
> struct rbd_device *rbd_dev,
> - bool write_request,
> + enum obj_operation_type type,
> struct rbd_obj_request *obj_request)
> {
> struct ceph_snap_context *snapc = NULL;
> struct ceph_osd_client *osdc;
> struct ceph_osd_request *osd_req;
> + bool write_request = (type == OBJ_OPT_WRITE) != 0;
How about just:
bool write_request = type == OBJ_OPT_WRITE;
(Add parentheses around the condition if you feel you must.)
Furthermore, this write_request value is just used in an
assertion, so...
>
> if (obj_request_img_data_test(obj_request)) {
> struct rbd_img_request *img_request = obj_request->img_request;
The write_request local variable should be assigned only inside
this block (the only place it's used).
> rbd_assert(write_request ==
> img_request_write_test(img_request));
> - if (write_request)
> +
> + if (type == OBJ_OPT_WRITE)
> snapc = img_request->snapc;
And instead, I think I'd rather see:
if (obj_request_img_data_test(obj_request)) {
struct rbd_img_request *img_request = obj_request->img_request;
if (type == OBJ_OPT_WRITE) {
rbd_assert(img_request_write_test(img_request));
snapc = img_request->snapc;
}
}
> }
>
> @@ -1740,7 +1756,7 @@ static struct ceph_osd_request *rbd_osd_req_create(
> if (!osd_req)
> return NULL; /* ENOMEM */
>
> - if (write_request)
> + if (type == OBJ_OPT_WRITE)
> osd_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK;
> else
> osd_req->r_flags = CEPH_OSD_FLAG_READ;
> @@ -1947,7 +1963,7 @@ static bool rbd_dev_parent_get(struct rbd_device *rbd_dev)
> static struct rbd_img_request *rbd_img_request_create(
> struct rbd_device *rbd_dev,
> u64 offset, u64 length,
> - bool write_request)
> + enum obj_operation_type type)
> {
> struct rbd_img_request *img_request;
>
> @@ -1955,7 +1971,7 @@ static struct rbd_img_request *rbd_img_request_create(
> if (!img_request)
> return NULL;
>
> - if (write_request) {
> + if (type == OBJ_OPT_WRITE) {
> down_read(&rbd_dev->header_rwsem);
> ceph_get_snap_context(rbd_dev->header.snapc);
> up_read(&rbd_dev->header_rwsem);
> @@ -1966,7 +1982,7 @@ static struct rbd_img_request *rbd_img_request_create(
> img_request->offset = offset;
> img_request->length = length;
> img_request->flags = 0;
> - if (write_request) {
> + if (type == OBJ_OPT_WRITE) {
> img_request_write_set(img_request);
> img_request->snapc = rbd_dev->header.snapc;
> } else {
> @@ -1983,8 +1999,7 @@ static struct rbd_img_request *rbd_img_request_create(
> kref_init(&img_request->kref);
>
> dout("%s: rbd_dev %p %s %llu/%llu -> img %p\n", __func__, rbd_dev,
> - write_request ? "write" : "read", offset, length,
> - img_request);
> + obj_opt[type], offset, length, img_request);
>
> return img_request;
> }
> @@ -2024,8 +2039,8 @@ static struct rbd_img_request *rbd_parent_request_create(
> rbd_assert(obj_request->img_request);
> rbd_dev = obj_request->img_request->rbd_dev;
>
> - parent_request = rbd_img_request_create(rbd_dev->parent,
> - img_offset, length, false);
> + parent_request = rbd_img_request_create(rbd_dev->parent, img_offset,
> + length, OBJ_OPT_READ);
> if (!parent_request)
> return NULL;
>
> @@ -2066,11 +2081,13 @@ static bool rbd_img_obj_end_request(struct rbd_obj_request *obj_request)
> result = obj_request->result;
> if (result) {
> struct rbd_device *rbd_dev = img_request->rbd_dev;
> + enum obj_operation_type type;
>
> + type = img_request_write_test(img_request) ? OBJ_OPT_WRITE :
> + OBJ_OPT_READ;
> rbd_warn(rbd_dev, "%s %llx at %llx (%llx)\n",
> - img_request_write_test(img_request) ? "write" : "read",
> - obj_request->length, obj_request->img_offset,
> - obj_request->offset);
> + obj_opt[type], obj_request->length,
> + obj_request->img_offset, obj_request->offset);
> rbd_warn(rbd_dev, " result %d xferred %x\n",
> result, xferred);
> if (!img_request->result)
> @@ -2149,10 +2166,10 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> struct rbd_device *rbd_dev = img_request->rbd_dev;
> struct rbd_obj_request *obj_request = NULL;
> struct rbd_obj_request *next_obj_request;
> - bool write_request = img_request_write_test(img_request);
> struct bio *bio_list = NULL;
> unsigned int bio_offset = 0;
> struct page **pages = NULL;
> + enum obj_operation_type op_type;
You used just "type" consistently for variables of this type
above. Pick a name and stick to it. I think just "type" is
fine but it might be too generic, so "op_type" might be a
better choice.
> u64 img_offset;
> u64 resid;
> u16 opcode;
> @@ -2160,7 +2177,6 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> dout("%s: img %p type %d data_desc %p\n", __func__, img_request,
> (int)type, data_desc);
>
> - opcode = write_request ? CEPH_OSD_OP_WRITE : CEPH_OSD_OP_READ;
> img_offset = img_request->offset;
> resid = img_request->length;
> rbd_assert(resid > 0);
> @@ -2220,8 +2236,15 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> pages += page_count;
> }
>
> - osd_req = rbd_osd_req_create(rbd_dev, write_request,
> - obj_request);
> + if (img_request_write_test(img_request)) {
> + op_type = OBJ_OPT_WRITE;
> + opcode = CEPH_OSD_OP_WRITE;
> + } else {
> + op_type = OBJ_OPT_READ;
> + opcode = CEPH_OSD_OP_READ;
> + }
> +
> + osd_req = rbd_osd_req_create(rbd_dev, op_type, obj_request);
> if (!osd_req)
> goto out_partial;
> obj_request->osd_req = osd_req;
> @@ -2237,7 +2260,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> obj_request->pages, length,
> offset & ~PAGE_MASK, false, false);
>
> - if (write_request)
> + if (op_type == OBJ_OPT_WRITE)
> rbd_osd_req_format_write(obj_request);
> else
> rbd_osd_req_format_read(obj_request);
> @@ -2604,7 +2627,7 @@ static int rbd_img_obj_exists_submit(struct rbd_obj_request *obj_request)
>
> rbd_assert(obj_request->img_request);
> rbd_dev = obj_request->img_request->rbd_dev;
> - stat_request->osd_req = rbd_osd_req_create(rbd_dev, false,
> + stat_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_READ,
> stat_request);
> if (!stat_request->osd_req)
> goto out;
> @@ -2814,7 +2837,8 @@ static int rbd_obj_notify_ack_sync(struct rbd_device *rbd_dev, u64 notify_id)
> return -ENOMEM;
>
> ret = -ENOMEM;
> - obj_request->osd_req = rbd_osd_req_create(rbd_dev, false, obj_request);
> + obj_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_READ,
> + obj_request);
> if (!obj_request->osd_req)
> goto out;
>
> @@ -2877,7 +2901,8 @@ static int __rbd_dev_header_watch_sync(struct rbd_device *rbd_dev, bool start)
> if (!obj_request)
> goto out_cancel;
>
> - obj_request->osd_req = rbd_osd_req_create(rbd_dev, true, obj_request);
> + obj_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_WRITE,
> + obj_request);
> if (!obj_request->osd_req)
> goto out_cancel;
>
> @@ -2985,7 +3010,8 @@ static int rbd_obj_method_sync(struct rbd_device *rbd_dev,
> obj_request->pages = pages;
> obj_request->page_count = page_count;
>
> - obj_request->osd_req = rbd_osd_req_create(rbd_dev, false, obj_request);
> + obj_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_READ,
> + obj_request);
> if (!obj_request->osd_req)
> goto out;
>
> @@ -3040,7 +3066,7 @@ static void rbd_request_fn(struct request_queue *q)
> int result;
>
> while ((rq = blk_fetch_request(q))) {
> - bool write_request = rq_data_dir(rq) == WRITE;
> + enum obj_operation_type type;
> struct rbd_img_request *img_request;
> u64 offset;
> u64 length;
> @@ -3067,9 +3093,14 @@ static void rbd_request_fn(struct request_queue *q)
>
> spin_unlock_irq(q->queue_lock);
>
> - /* Disallow writes to a read-only device */
> + if (rq->cmd_flags & REQ_WRITE)
> + type = OBJ_OPT_WRITE;
> + else
> + type = OBJ_OPT_READ;
> +
> + /* Only allow reads to a read-only device */
>
> - if (write_request) {
> + if (type != OBJ_OPT_READ) {
> result = -EROFS;
> if (read_only)
> goto end_request;
> @@ -3106,7 +3137,7 @@ static void rbd_request_fn(struct request_queue *q)
>
> result = -ENOMEM;
> img_request = rbd_img_request_create(rbd_dev, offset, length,
> - write_request);
> + type);
> if (!img_request)
> goto end_request;
>
> @@ -3122,8 +3153,7 @@ end_request:
> spin_lock_irq(q->queue_lock);
> if (result < 0) {
> rbd_warn(rbd_dev, "%s %llx at %llx result %d\n",
> - write_request ? "write" : "read",
> - length, offset, result);
> + obj_opt[type], length, offset, result);
>
> __blk_end_request_all(rq, result);
> }
> @@ -3218,7 +3248,8 @@ static int rbd_obj_read_sync(struct rbd_device *rbd_dev,
> obj_request->pages = pages;
> obj_request->page_count = page_count;
>
> - obj_request->osd_req = rbd_osd_req_create(rbd_dev, false, obj_request);
> + obj_request->osd_req = rbd_osd_req_create(rbd_dev, OBJ_OPT_READ,
> + obj_request);
> if (!obj_request->osd_req)
> goto out;
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 3/3] rbd: add discard support for rbd
2014-03-12 4:24 ` [PATCH 3/3] rbd: add discard support for rbd Guangliang Zhao
@ 2014-03-12 5:26 ` Alex Elder
0 siblings, 0 replies; 10+ messages in thread
From: Alex Elder @ 2014-03-12 5:26 UTC (permalink / raw)
To: Guangliang Zhao, ceph-devel; +Cc: sage
On 03/11/2014 11:24 PM, Guangliang Zhao wrote:
> This resolve:
> http://tracker.ceph.com/issues/190
This should have a bit more explanation here. Perhaps mention
that a new DISCARD image request type flag is defined which
includes no data. A bunch of the changes here are just making
minor extensions to allow for the third operation type (discard).
You could also mention that discard requests on an image will
remove the objects that back the discarded range if they're
completely contained within that range; the object at the end
of the discarded range is truncated or zeroed.
In any case, this looks nice to me. It's another case
where adding the feature seems not that intrusive, and
fits within the existing framework pretty well.
There is a bug below, which you should fix. I have a few
other comments too, for you to consider.
Reviewed-by: Alex Elder <elder@linaro.org>
> Signed-off-by: Guangliang Zhao <lucienchao@gmail.com>
> ---
> drivers/block/rbd.c | 105 +++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 93 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index ca1fd14..67ea156 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -212,6 +212,7 @@ enum obj_request_type {
> enum obj_operation_type {
> OBJ_OPT_WRITE,
> OBJ_OPT_READ,
> + OBJ_OPT_DISCARD,
I really find the "OPT" distracting. It reads as "option"
to me. OBJ_OP_* would be my suggestion.
> };
>
> /*
> @@ -221,6 +222,7 @@ enum obj_operation_type {
> static const char* obj_opt[] = {
> "write",
> "read",
> + "discard",
> };
>
> enum obj_req_flags {
> @@ -228,6 +230,7 @@ enum obj_req_flags {
> OBJ_REQ_IMG_DATA, /* object usage: standalone = 0, image = 1 */
> OBJ_REQ_KNOWN, /* EXISTS flag valid: no = 0, yes = 1 */
> OBJ_REQ_EXISTS, /* target exists: no = 0, yes = 1 */
> + IMG_REQ_DISCARD, /* discard: normal = 0, discard request = 1 */
This is an image request flag so it should be added to the img_req_flags
enum definition, not here. This is a bug.
> };
>
> struct rbd_obj_request {
> @@ -1518,6 +1521,21 @@ static bool img_request_write_test(struct rbd_img_request *img_request)
> return test_bit(IMG_REQ_WRITE, &img_request->flags) != 0;
> }
>
> +/*
> + * Set the discard flag when the img_request is an discard request
> + */
> +static void img_request_discard_set(struct rbd_img_request *img_request)
> +{
> + set_bit(IMG_REQ_DISCARD, &img_request->flags);
> + smp_mb();
> +}
> +
> +static bool img_request_discard_test(struct rbd_img_request *img_request)
> +{
> + smp_mb();
> + return test_bit(IMG_REQ_DISCARD, &img_request->flags) != 0;
> +}
> +
> static void img_request_child_set(struct rbd_img_request *img_request)
> {
> set_bit(IMG_REQ_CHILD, &img_request->flags);
> @@ -1640,6 +1658,18 @@ static void rbd_osd_write_callback(struct rbd_obj_request *obj_request)
> obj_request_done_set(obj_request);
> }
>
> +static void rbd_osd_discard_callback(struct rbd_obj_request *obj_request)
> +{
> + dout("%s: obj %p result %d %llu\n", __func__, obj_request,
> + obj_request->result, obj_request->length);
> + /*
> + * There is no such thing as a successful short write. Set
Maybe update the comment here to say "short *discard*" (not write).
I'm assuming the statement is true, I didn't verify it.
> + * it to our originally-requested length.
> + */
> + obj_request->xferred = obj_request->length;
> + obj_request_done_set(obj_request);
> +}
> +
> /*
> * For a simple stat call there's nothing to do. We'll do more if
> * this is part of a write sequence for a layered image.
> @@ -1687,6 +1717,11 @@ static void rbd_osd_req_callback(struct ceph_osd_request *osd_req,
> case CEPH_OSD_OP_STAT:
> rbd_osd_stat_callback(obj_request);
> break;
> + case CEPH_OSD_OP_DELETE:
> + case CEPH_OSD_OP_TRUNCATE:
> + case CEPH_OSD_OP_ZERO:
> + rbd_osd_discard_callback(obj_request);
> + break;
> case CEPH_OSD_OP_CALL:
> case CEPH_OSD_OP_NOTIFY_ACK:
> case CEPH_OSD_OP_WATCH:
> @@ -1729,6 +1764,20 @@ static void rbd_osd_req_format_write(struct rbd_obj_request *obj_request)
> snapc, CEPH_NOSNAP, &mtime);
> }
>
> +static void rbd_osd_req_format_discard(struct rbd_obj_request *obj_request)
> +{
> + struct rbd_img_request *img_request = obj_request->img_request;
> + struct ceph_osd_request *osd_req = obj_request->osd_req;
> + struct timespec mtime = CURRENT_TIME;
> + u64 snap_id;
> +
> + rbd_assert(osd_req != NULL);
> +
> + snap_id = img_request ? img_request->snap_id : CEPH_NOSNAP;
> + ceph_osdc_build_request(osd_req, obj_request->offset,
> + NULL, snap_id, &mtime);
> +}
> +
> static struct ceph_osd_request *rbd_osd_req_create(
> struct rbd_device *rbd_dev,
> enum obj_operation_type type,
> @@ -1738,12 +1787,15 @@ static struct ceph_osd_request *rbd_osd_req_create(
> struct ceph_osd_client *osdc;
> struct ceph_osd_request *osd_req;
> bool write_request = (type == OBJ_OPT_WRITE) != 0;
> + bool discard_request = (type == OBJ_OPT_DISCARD) != 0;
>
> if (obj_request_img_data_test(obj_request)) {
> struct rbd_img_request *img_request = obj_request->img_request;
>
> rbd_assert(write_request ==
> img_request_write_test(img_request));
> + rbd_assert(discard_request ==
> + img_request_discard_test(img_request));
>
> if (type == OBJ_OPT_WRITE)
> snapc = img_request->snapc;
> @@ -1756,7 +1808,7 @@ static struct ceph_osd_request *rbd_osd_req_create(
> if (!osd_req)
> return NULL; /* ENOMEM */
>
> - if (type == OBJ_OPT_WRITE)
> + if (type == OBJ_OPT_WRITE || type == OBJ_OPT_DISCARD)
> osd_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK;
> else
> osd_req->r_flags = CEPH_OSD_FLAG_READ;
> @@ -1982,7 +2034,10 @@ static struct rbd_img_request *rbd_img_request_create(
> img_request->offset = offset;
> img_request->length = length;
> img_request->flags = 0;
> - if (type == OBJ_OPT_WRITE) {
> + if (type == OBJ_OPT_DISCARD) {
> + img_request_discard_set(img_request);
> + img_request->snap_id = rbd_dev->spec->snap_id;
> + } else if (type == OBJ_OPT_WRITE) {
> img_request_write_set(img_request);
> img_request->snapc = rbd_dev->header.snapc;
> } else {
> @@ -2083,8 +2138,13 @@ static bool rbd_img_obj_end_request(struct rbd_obj_request *obj_request)
> struct rbd_device *rbd_dev = img_request->rbd_dev;
> enum obj_operation_type type;
>
> - type = img_request_write_test(img_request) ? OBJ_OPT_WRITE :
> - OBJ_OPT_READ;
> + if (img_request_discard_test(img_request))
> + type = OBJ_OPT_DISCARD;
> + else if (img_request_write_test(img_request))
> + type = OBJ_OPT_WRITE;
> + else
> + type = OBJ_OPT_READ;
> +
> rbd_warn(rbd_dev, "%s %llx at %llx (%llx)\n",
> obj_opt[type], obj_request->length,
> obj_request->img_offset, obj_request->offset);
> @@ -2170,6 +2230,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> unsigned int bio_offset = 0;
> struct page **pages = NULL;
> enum obj_operation_type op_type;
> + u64 object_size = (u64) 1 << rbd_dev->header.obj_order;
> u64 img_offset;
> u64 resid;
> u16 opcode;
> @@ -2185,8 +2246,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> bio_list = data_desc;
> rbd_assert(img_offset ==
> bio_list->bi_iter.bi_sector << SECTOR_SHIFT);
> - } else {
> - rbd_assert(type == OBJ_REQUEST_PAGES);
> + } else if (type == OBJ_REQUEST_PAGES) {
> pages = data_desc;
> }
>
> @@ -2225,7 +2285,7 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> GFP_ATOMIC);
> if (!obj_request->bio_list)
> goto out_partial;
> - } else {
> + } else if (type == OBJ_REQUEST_PAGES) {
> unsigned int page_count;
>
> obj_request->pages = pages;
> @@ -2236,7 +2296,15 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> pages += page_count;
> }
>
> - if (img_request_write_test(img_request)) {
> + if (img_request_discard_test(img_request)) {
> + op_type = OBJ_OPT_DISCARD;
> + if (!offset && length == object_size)
> + opcode = CEPH_OSD_OP_DELETE;
> + else if (offset + length == object_size)
> + opcode = CEPH_OSD_OP_TRUNCATE;
I wonder if the OSD detects a zero operation that extends to
the end of the object, and if so, it could convert it to a
truncate. This would happen if the last object were not
a "full-sized" image object.
> + else
> + opcode = CEPH_OSD_OP_ZERO;
> + } else if (img_request_write_test(img_request)) {
> op_type = OBJ_OPT_WRITE;
> opcode = CEPH_OSD_OP_WRITE;
> } else {
> @@ -2255,13 +2323,15 @@ static int rbd_img_request_fill(struct rbd_img_request *img_request,
> if (type == OBJ_REQUEST_BIO)
> osd_req_op_extent_osd_data_bio(osd_req, 0,
> obj_request->bio_list, length);
> - else
> + else if (type == OBJ_REQUEST_PAGES)
> osd_req_op_extent_osd_data_pages(osd_req, 0,
> obj_request->pages, length,
> offset & ~PAGE_MASK, false, false);
>
> if (op_type == OBJ_OPT_WRITE)
> rbd_osd_req_format_write(obj_request);
> + else if (op_type == OBJ_OPT_DISCARD)
> + rbd_osd_req_format_discard(obj_request);
> else
> rbd_osd_req_format_read(obj_request);
>
> @@ -3093,7 +3163,9 @@ static void rbd_request_fn(struct request_queue *q)
>
> spin_unlock_irq(q->queue_lock);
>
> - if (rq->cmd_flags & REQ_WRITE)
> + if (rq->cmd_flags & REQ_DISCARD)
> + type = OBJ_OPT_DISCARD;
> + else if (rq->cmd_flags & REQ_WRITE)
> type = OBJ_OPT_WRITE;
> else
> type = OBJ_OPT_READ;
> @@ -3143,8 +3215,12 @@ static void rbd_request_fn(struct request_queue *q)
>
> img_request->rq = rq;
>
> - result = rbd_img_request_fill(img_request, OBJ_REQUEST_BIO,
> - rq->bio);
> + if (type == OBJ_OPT_DISCARD)
> + result = rbd_img_request_fill(img_request,
> + OBJ_REQUEST_NODATA, NULL);
> + else
> + result = rbd_img_request_fill(img_request,
> + OBJ_REQUEST_BIO, rq->bio);
> if (!result)
> result = rbd_img_request_submit(img_request);
> if (result)
> @@ -3452,6 +3528,11 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
> blk_queue_io_min(q, segment_size);
> blk_queue_io_opt(q, segment_size);
>
> + /* enable the discard support */
> + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
> + q->limits.discard_granularity = segment_size;
> + q->limits.discard_alignment = segment_size;
> +
> blk_queue_merge_bvec(q, rbd_merge_bvec);
> disk->queue = q;
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] patches for rbd
2014-03-12 4:24 [PATCH 0/3] patches for rbd Guangliang Zhao
` (2 preceding siblings ...)
2014-03-12 4:24 ` [PATCH 3/3] rbd: add discard support for rbd Guangliang Zhao
@ 2014-03-12 5:28 ` Alex Elder
2014-03-12 8:12 ` Guangliang Zhao
2014-03-12 9:34 ` Jean-Tiare LE BIGOT
4 siblings, 1 reply; 10+ messages in thread
From: Alex Elder @ 2014-03-12 5:28 UTC (permalink / raw)
To: Guangliang Zhao, ceph-devel; +Cc: sage
On 03/11/2014 11:24 PM, Guangliang Zhao wrote:
> Hi,
>
> I am sorry that didn't notice Jean-Tiare's mail at all, just sage's
> reminds me.
>
> There are 3 patches, the first is optimization for rbd writing, the
> remaining two are support for rbd discard. If you have completed the
> discard things, just ignore them.
>
> Guangliang Zhao (3):
> rbd: skip the copyup when an entire object writing
> rbd: extend the operation type
> rbd: add discard support for rbd
>
> drivers/block/rbd.c | 190 +++++++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 154 insertions(+), 36 deletions(-)
>
Oh, one more thing. I reviewed only the code changes presented
here. I did not go back to walk through the discard call chains
to make sure you're doing things correctly. I'm just assuming
that, and based on that, I think your changes look overall correct
to me.
-Alex
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] patches for rbd
2014-03-12 5:28 ` [PATCH 0/3] patches " Alex Elder
@ 2014-03-12 8:12 ` Guangliang Zhao
0 siblings, 0 replies; 10+ messages in thread
From: Guangliang Zhao @ 2014-03-12 8:12 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel, sage
On Wed, Mar 12, 2014 at 12:28:16AM -0500, Alex Elder wrote:
> On 03/11/2014 11:24 PM, Guangliang Zhao wrote:
Hi Alex,
Thanks very much for the reviews, version 2 would coming soon.
> > Hi,
> >
> > I am sorry that didn't notice Jean-Tiare's mail at all, just sage's
> > reminds me.
> >
> > There are 3 patches, the first is optimization for rbd writing, the
> > remaining two are support for rbd discard. If you have completed the
> > discard things, just ignore them.
> >
> > Guangliang Zhao (3):
> > rbd: skip the copyup when an entire object writing
> > rbd: extend the operation type
> > rbd: add discard support for rbd
> >
> > drivers/block/rbd.c | 190 +++++++++++++++++++++++++++++++++++++++++----------
> > 1 file changed, 154 insertions(+), 36 deletions(-)
> >
>
> Oh, one more thing. I reviewed only the code changes presented
> here. I did not go back to walk through the discard call chains
> to make sure you're doing things correctly. I'm just assuming
> that, and based on that, I think your changes look overall correct
> to me.
>
> -Alex
--
Best regards,
Guangliang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] patches for rbd
2014-03-12 4:24 [PATCH 0/3] patches for rbd Guangliang Zhao
` (3 preceding siblings ...)
2014-03-12 5:28 ` [PATCH 0/3] patches " Alex Elder
@ 2014-03-12 9:34 ` Jean-Tiare LE BIGOT
4 siblings, 0 replies; 10+ messages in thread
From: Jean-Tiare LE BIGOT @ 2014-03-12 9:34 UTC (permalink / raw)
To: Guangliang Zhao; +Cc: ceph-devel, sage
Hi,
I must admit I've put it aside for the moment. So, I really don't mind,
on the contrary, if you implement the DISCARD op.
On 03/12/2014 05:24 AM, Guangliang Zhao wrote:
> I am sorry that didn't notice Jean-Tiare's mail at all, just sage's
> reminds me.
--
Jean-Tiare, shared-hosting team
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2014-03-12 13:10 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-03-12 4:24 [PATCH 0/3] patches for rbd Guangliang Zhao
2014-03-12 4:24 ` [PATCH 1/3] rbd: skip the copyup when an entire object writing Guangliang Zhao
2014-03-12 4:39 ` Alex Elder
2014-03-12 4:24 ` [PATCH 2/3] rbd: extend the operation type Guangliang Zhao
2014-03-12 5:01 ` Alex Elder
2014-03-12 4:24 ` [PATCH 3/3] rbd: add discard support for rbd Guangliang Zhao
2014-03-12 5:26 ` Alex Elder
2014-03-12 5:28 ` [PATCH 0/3] patches " Alex Elder
2014-03-12 8:12 ` Guangliang Zhao
2014-03-12 9:34 ` Jean-Tiare LE BIGOT
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.