* OSD Data Read/Write Separation Patches
@ 2013-03-04 18:05 Alex Elder
2013-03-04 18:08 ` [PATCH 0/3] libceph: a few cleanups Alex Elder
` (3 more replies)
0 siblings, 4 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:05 UTC (permalink / raw)
To: ceph-devel
What follows are four series of patches that lead up to making
it clear exactly which data in an osd operation is outgoing
(being sent to an osd) and which is incoming (the result of
a read or object method call operation). They build on each
other, with the results of one making the subsequent ones
possible (or at least easier to understand and verify).
The four series are:
[PATCH 0/3] libceph: a few cleanups
[PATCH 1/3] libceph: use (void *) for untyped data in osd ops
[PATCH 2/3] libceph: kill ceph_msg->pagelist_count
[PATCH 3/3] libceph: rename ceph_calc_object_layout()
These are sort of miscellaneous cleanup, not directly
related but fixed while I was working with nearby code.
[PATCH 0/3] libceph: simplify incoming message allocation
[PATCH 1/3] libceph: drop mutex while allocating a message
[PATCH 2/3] libceph: define mds_alloc_msg() method
[PATCH 3/3] libceph: no need for alignment for mds message
This simplifies and localizes the code that allocates
messages for incoming data.
[PATCH 0/3] ceph: assign message data fields consistently
[PATCH 1/3] ceph: use calc_pages_for() in start_read()
[PATCH 2/3] ceph: simplify ceph_sync_write() page_align calculation
[PATCH 3/3] libceph: don't assign page info in ceph_osdc_new_request()
This allows the assignment of everything about the pages
for holding data in an osd request to happen in consecutive
lines of code, and such that it's done the same way throughout
the ceph code.
[PATCH 0/3] libceph: distinguish osd request read and write data
[PATCH 1/3] libceph: separate osd request data info
[PATCH 2/3] libceph: distinguish page and bio requests
[PATCH 3/3] libceph: separate read and write data
This encapsulates information about osd request data into a
separate data structure, then defines separate instances
of that structure for incoming and outgoing data.
These patches are available in branch "review/wip-rwsep" on
the ceph-client git repository. Included before them are two
others I posted recently:
[PATCH] libceph: fix wrong opcode use in osd_req_encode_op()
[PATCH] libceph: complete lingering requests only once
-Alex
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 0/3] libceph: a few cleanups
2013-03-04 18:05 OSD Data Read/Write Separation Patches Alex Elder
@ 2013-03-04 18:08 ` Alex Elder
2013-03-04 18:09 ` [PATCH 1/3] libceph: use (void *) for untyped data in osd ops Alex Elder
` (3 more replies)
2013-03-04 18:10 ` [PATCH 0/3] libceph: simplify incoming message allocation Alex Elder
` (2 subsequent siblings)
3 siblings, 4 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:08 UTC (permalink / raw)
To: ceph-devel
Three small cleanups in the libceph code. -Alex
[PATCH 1/3] libceph: use (void *) for untyped data in osd ops
[PATCH 2/3] libceph: kill ceph_msg->pagelist_count
[PATCH 3/3] libceph: rename ceph_calc_object_layout()
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] libceph: use (void *) for untyped data in osd ops
2013-03-04 18:08 ` [PATCH 0/3] libceph: a few cleanups Alex Elder
@ 2013-03-04 18:09 ` Alex Elder
2013-03-04 18:09 ` [PATCH 2/3] libceph: kill ceph_msg->pagelist_count Alex Elder
` (2 subsequent siblings)
3 siblings, 0 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:09 UTC (permalink / raw)
To: ceph-devel
Two of the fields defining osd operations are defined using (char *)
while the data they represent are really untyped, not character
strings. Change them to have type (void *).
Signed-off-by: Alex Elder <elder@inktank.com>
---
include/linux/ceph/osd_client.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index a79f833..ec33588 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -184,7 +184,7 @@ struct ceph_osd_req_op {
} extent;
struct {
const char *name;
- const char *val;
+ const void *val;
u32 name_len;
u32 value_len;
__u8 cmp_op; /* CEPH_OSD_CMPXATTR_OP_* */
@@ -193,7 +193,7 @@ struct ceph_osd_req_op {
struct {
const char *class_name;
const char *method_name;
- const char *indata;
+ const void *indata;
u32 indata_len;
__u8 class_len;
__u8 method_len;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 2/3] libceph: kill ceph_msg->pagelist_count
2013-03-04 18:08 ` [PATCH 0/3] libceph: a few cleanups Alex Elder
2013-03-04 18:09 ` [PATCH 1/3] libceph: use (void *) for untyped data in osd ops Alex Elder
@ 2013-03-04 18:09 ` Alex Elder
2013-03-04 18:09 ` [PATCH 3/3] libceph: rename ceph_calc_object_layout() Alex Elder
2013-03-05 1:49 ` [PATCH 0/3] libceph: a few cleanups Josh Durgin
3 siblings, 0 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:09 UTC (permalink / raw)
To: ceph-devel
The pagelist_count field is never actually used, so get rid of it.
Signed-off-by: Alex Elder <elder@inktank.com>
---
fs/ceph/mds_client.c | 1 -
include/linux/ceph/messenger.h | 1 -
net/ceph/messenger.c | 2 --
3 files changed, 4 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 44435c2..76b7344 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -2600,7 +2600,6 @@ static void send_mds_reconnect(struct
ceph_mds_client *mdsc,
}
reply->pagelist = pagelist;
- reply->pagelist_count = calc_pages_for(0, pagelist->length);
if (recon_state.flock)
reply->hdr.version = cpu_to_le16(2);
reply->hdr.data_len = cpu_to_le32(pagelist->length);
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h
index 1b08349..6c11874 100644
--- a/include/linux/ceph/messenger.h
+++ b/include/linux/ceph/messenger.h
@@ -78,7 +78,6 @@ struct ceph_msg {
unsigned page_count; /* size of page array */
unsigned page_alignment; /* io offset in first page */
struct ceph_pagelist *pagelist; /* instead of pages */
- unsigned int pagelist_count; /* number of pages in pagelist */
struct ceph_connection *con;
struct list_head list_head;
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 9d8abb0..0f9933a 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -2718,7 +2718,6 @@ struct ceph_msg *ceph_msg_new(int type, int
front_len, gfp_t flags,
m->page_count = 0;
m->page_alignment = 0;
m->pages = NULL;
- m->pagelist_count = 0;
m->pagelist = NULL;
#ifdef CONFIG_BLOCK
m->bio = NULL;
@@ -2898,7 +2897,6 @@ void ceph_msg_last_put(struct kref *kref)
ceph_pagelist_release(m->pagelist);
kfree(m->pagelist);
m->pagelist = NULL;
- m->pagelist_count = 0;
}
m->trail = NULL;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] libceph: rename ceph_calc_object_layout()
2013-03-04 18:08 ` [PATCH 0/3] libceph: a few cleanups Alex Elder
2013-03-04 18:09 ` [PATCH 1/3] libceph: use (void *) for untyped data in osd ops Alex Elder
2013-03-04 18:09 ` [PATCH 2/3] libceph: kill ceph_msg->pagelist_count Alex Elder
@ 2013-03-04 18:09 ` Alex Elder
2013-03-05 1:49 ` [PATCH 0/3] libceph: a few cleanups Josh Durgin
3 siblings, 0 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:09 UTC (permalink / raw)
To: ceph-devel
The purpose of ceph_calc_object_layout() is to fill in the pool
number and seed for a ceph_pg structure provided, based on a given
osd map and target object id.
Currently that function takes a file layout parameter, but the only
thing used out of that is its pool number.
Change the function so it takes a pool number rather than the full
file layout structure. Only update the ceph_pg if the pool is found
in the osd map. Get rid of few useless lines of code from the
function while there.
Since the function now very clearly just fills in the ceph_pg
structure it's provided, rename it ceph_calc_ceph_pg().
Signed-off-by: Alex Elder <elder@inktank.com>
---
fs/ceph/ioctl.c | 5 +++--
include/linux/ceph/osdmap.h | 6 ++----
net/ceph/osd_client.c | 4 ++--
net/ceph/osdmap.c | 23 +++++++++--------------
4 files changed, 16 insertions(+), 22 deletions(-)
diff --git a/fs/ceph/ioctl.c b/fs/ceph/ioctl.c
index 7d85991..6c74987 100644
--- a/fs/ceph/ioctl.c
+++ b/fs/ceph/ioctl.c
@@ -208,8 +208,9 @@ static long ceph_ioctl_get_dataloc(struct file
*file, void __user *arg)
snprintf(dl.object_name, sizeof(dl.object_name), "%llx.%08llx",
ceph_ino(inode), dl.object_no);
- ceph_calc_object_layout(&pgid, dl.object_name, &ci->i_layout,
- osdc->osdmap);
+
+ ceph_calc_ceph_pg(&pgid, dl.object_name, osdc->osdmap,
+ ceph_file_layout_pg_pool(ci->i_layout));
dl.osd = ceph_calc_pg_primary(osdc->osdmap, pgid);
if (dl.osd >= 0) {
diff --git a/include/linux/ceph/osdmap.h b/include/linux/ceph/osdmap.h
index c819190..167daf6 100644
--- a/include/linux/ceph/osdmap.h
+++ b/include/linux/ceph/osdmap.h
@@ -131,10 +131,8 @@ extern int ceph_calc_file_object_mapping(struct
ceph_file_layout *layout,
u64 *bno, u64 *oxoff, u64 *oxlen);
/* calculate mapping of object to a placement group */
-extern int ceph_calc_object_layout(struct ceph_pg *pg,
- const char *oid,
- struct ceph_file_layout *fl,
- struct ceph_osdmap *osdmap);
+extern int ceph_calc_ceph_pg(struct ceph_pg *pg, const char *oid,
+ struct ceph_osdmap *osdmap, uint64_t pool);
extern int ceph_calc_pg_acting(struct ceph_osdmap *osdmap,
struct ceph_pg pgid,
int *acting);
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index d7ce457..38d09d1 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -948,8 +948,8 @@ static int __map_request(struct ceph_osd_client *osdc,
int err;
dout("map_request %p tid %lld\n", req, req->r_tid);
- err = ceph_calc_object_layout(&pgid, req->r_oid,
- &req->r_file_layout, osdc->osdmap);
+ err = ceph_calc_ceph_pg(&pgid, req->r_oid, osdc->osdmap,
+ ceph_file_layout_pg_pool(req->r_file_layout));
if (err) {
list_move(&req->r_req_lru_item, &osdc->req_notarget);
return err;
diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index 69bc4bf..a47ee06 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -1095,27 +1095,22 @@ EXPORT_SYMBOL(ceph_calc_file_object_mapping);
* calculate an object layout (i.e. pgid) from an oid,
* file_layout, and osdmap
*/
-int ceph_calc_object_layout(struct ceph_pg *pg,
- const char *oid,
- struct ceph_file_layout *fl,
- struct ceph_osdmap *osdmap)
+int ceph_calc_ceph_pg(struct ceph_pg *pg, const char *oid,
+ struct ceph_osdmap *osdmap, uint64_t pool)
{
- unsigned int num, num_mask;
- struct ceph_pg_pool_info *pool;
+ struct ceph_pg_pool_info *pool_info;
BUG_ON(!osdmap);
- pg->pool = le32_to_cpu(fl->fl_pg_pool);
- pool = __lookup_pg_pool(&osdmap->pg_pools, pg->pool);
- if (!pool)
+ pool_info = __lookup_pg_pool(&osdmap->pg_pools, pool);
+ if (!pool_info)
return -EIO;
- pg->seed = ceph_str_hash(pool->object_hash, oid, strlen(oid));
- num = pool->pg_num;
- num_mask = pool->pg_num_mask;
+ pg->pool = pool;
+ pg->seed = ceph_str_hash(pool_info->object_hash, oid, strlen(oid));
- dout("calc_object_layout '%s' pgid %lld.%x\n", oid, pg->pool, pg->seed);
+ dout("%s '%s' pgid %lld.%x\n", __func__, oid, pg->pool, pg->seed);
return 0;
}
-EXPORT_SYMBOL(ceph_calc_object_layout);
+EXPORT_SYMBOL(ceph_calc_ceph_pg);
/*
* Calculate raw osd vector for the given pgid. Return pointer to osd
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 0/3] libceph: simplify incoming message allocation
2013-03-04 18:05 OSD Data Read/Write Separation Patches Alex Elder
2013-03-04 18:08 ` [PATCH 0/3] libceph: a few cleanups Alex Elder
@ 2013-03-04 18:10 ` Alex Elder
2013-03-04 18:12 ` [PATCH 1/3] libceph: drop mutex while allocating a message Alex Elder
` (2 more replies)
2013-03-04 18:14 ` [PATCH 0/3] ceph: assign message data fields consistently Alex Elder
2013-03-04 18:16 ` [PATCH 0/3] libceph: distinguish osd request read and write data Alex Elder
3 siblings, 3 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:10 UTC (permalink / raw)
To: ceph-devel
Simplify the code that allocates a message for incoming data
by having all users of the messenger interface define an
alloc_msg method.
-Alex
[PATCH 1/3] libceph: drop mutex while allocating a message
[PATCH 2/3] libceph: define mds_alloc_msg() method
[PATCH 3/3] libceph: no need for alignment for mds message
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] libceph: drop mutex while allocating a message
2013-03-04 18:10 ` [PATCH 0/3] libceph: simplify incoming message allocation Alex Elder
@ 2013-03-04 18:12 ` Alex Elder
2013-03-04 19:07 ` Gregory Farnum
2013-03-04 18:12 ` [PATCH 2/3] libceph: define mds_alloc_msg() method Alex Elder
2013-03-04 18:12 ` [PATCH 3/3] libceph: no need for alignment for mds message Alex Elder
2 siblings, 1 reply; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:12 UTC (permalink / raw)
To: ceph-devel
In ceph_con_in_msg_alloc(), if no alloc_msg method is defined for a
connection a new message is allocated with ceph_msg_new().
Drop the mutex before making this call, and make sure we're still
connected when we get it back again.
This is preparing for the next patch, which ensures all connections
define an alloc_msg method, and then handles them all the same way.
Signed-off-by: Alex Elder <elder@inktank.com>
---
net/ceph/messenger.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 0f9933a..6ec6051 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -2807,13 +2807,12 @@ static int ceph_con_in_msg_alloc(struct
ceph_connection *con, int *skip)
int type = le16_to_cpu(hdr->type);
int front_len = le32_to_cpu(hdr->front_len);
int middle_len = le32_to_cpu(hdr->middle_len);
+ struct ceph_msg *msg;
int ret = 0;
BUG_ON(con->in_msg != NULL);
if (con->ops->alloc_msg) {
- struct ceph_msg *msg;
-
mutex_unlock(&con->mutex);
msg = con->ops->alloc_msg(con, hdr, skip);
mutex_lock(&con->mutex);
@@ -2838,12 +2837,19 @@ static int ceph_con_in_msg_alloc(struct
ceph_connection *con, int *skip)
}
}
if (!con->in_msg) {
- con->in_msg = ceph_msg_new(type, front_len, GFP_NOFS, false);
- if (!con->in_msg) {
+ mutex_unlock(&con->mutex);
+ msg = ceph_msg_new(type, front_len, GFP_NOFS, false);
+ mutex_lock(&con->mutex);
+ if (!msg) {
pr_err("unable to allocate msg type %d len %d\n",
type, front_len);
return -ENOMEM;
}
+ if (con->state != CON_STATE_OPEN) {
+ ceph_msg_put(msg);
+ return -EAGAIN;
+ }
+ con->in_msg = msg;
con->in_msg->con = con->ops->get(con);
BUG_ON(con->in_msg->con == NULL);
con->in_msg->page_alignment = le16_to_cpu(hdr->data_off);
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 2/3] libceph: define mds_alloc_msg() method
2013-03-04 18:10 ` [PATCH 0/3] libceph: simplify incoming message allocation Alex Elder
2013-03-04 18:12 ` [PATCH 1/3] libceph: drop mutex while allocating a message Alex Elder
@ 2013-03-04 18:12 ` Alex Elder
2013-03-04 19:05 ` Gregory Farnum
2013-03-04 18:12 ` [PATCH 3/3] libceph: no need for alignment for mds message Alex Elder
2 siblings, 1 reply; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:12 UTC (permalink / raw)
To: ceph-devel
The only user of the ceph messenger that doesn't define an alloc_msg
method is the mds client. Define one, such that it works just like
it did before, and simplify ceph_con_in_msg_alloc() by assuming the
alloc_msg method is always present.
This and the next patch resolve:
http://tracker.ceph.com/issues/4322
Signed-off-by: Alex Elder <elder@inktank.com>
---
fs/ceph/mds_client.c | 23 ++++++++++++++++++++
net/ceph/messenger.c | 59
++++++++++++++++----------------------------------
2 files changed, 42 insertions(+), 40 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index 76b7344..a4b986a 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3469,6 +3469,28 @@ static int invalidate_authorizer(struct
ceph_connection *con)
return ceph_monc_validate_auth(&mdsc->fsc->client->monc);
}
+static struct ceph_msg *mds_alloc_msg(struct ceph_connection *con,
+ struct ceph_msg_header *hdr, int *skip)
+{
+ struct ceph_msg *msg;
+ int type = (int) le16_to_cpu(hdr->type);
+ int front_len = (int) le32_to_cpu(hdr->front_len);
+
+ if (con->in_msg)
+ return con->in_msg;
+
+ *skip = 0;
+ msg = ceph_msg_new(type, front_len, GFP_NOFS, false);
+ if (!msg) {
+ pr_err("unable to allocate msg type %d len %d\n",
+ type, front_len);
+ return NULL;
+ }
+ msg->page_alignment = (unsigned int) le16_to_cpu(hdr->data_off);
+
+ return msg;
+}
+
static const struct ceph_connection_operations mds_con_ops = {
.get = con_get,
.put = con_put,
@@ -3477,6 +3499,7 @@ static const struct ceph_connection_operations
mds_con_ops = {
.verify_authorizer_reply = verify_authorizer_reply,
.invalidate_authorizer = invalidate_authorizer,
.peer_reset = peer_reset,
+ .alloc_msg = mds_alloc_msg,
};
/* eof */
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 6ec6051..c7d4278 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -2804,55 +2804,34 @@ static int ceph_alloc_middle(struct
ceph_connection *con, struct ceph_msg *msg)
static int ceph_con_in_msg_alloc(struct ceph_connection *con, int *skip)
{
struct ceph_msg_header *hdr = &con->in_hdr;
- int type = le16_to_cpu(hdr->type);
- int front_len = le32_to_cpu(hdr->front_len);
int middle_len = le32_to_cpu(hdr->middle_len);
struct ceph_msg *msg;
int ret = 0;
BUG_ON(con->in_msg != NULL);
+ BUG_ON(!con->ops->alloc_msg);
- if (con->ops->alloc_msg) {
- mutex_unlock(&con->mutex);
- msg = con->ops->alloc_msg(con, hdr, skip);
- mutex_lock(&con->mutex);
- if (con->state != CON_STATE_OPEN) {
- if (msg)
- ceph_msg_put(msg);
- return -EAGAIN;
- }
- con->in_msg = msg;
- if (con->in_msg) {
- con->in_msg->con = con->ops->get(con);
- BUG_ON(con->in_msg->con == NULL);
- }
- if (*skip) {
- con->in_msg = NULL;
- return 0;
- }
- if (!con->in_msg) {
- con->error_msg =
- "error allocating memory for incoming message";
- return -ENOMEM;
- }
- }
- if (!con->in_msg) {
- mutex_unlock(&con->mutex);
- msg = ceph_msg_new(type, front_len, GFP_NOFS, false);
- mutex_lock(&con->mutex);
- if (!msg) {
- pr_err("unable to allocate msg type %d len %d\n",
- type, front_len);
- return -ENOMEM;
- }
- if (con->state != CON_STATE_OPEN) {
+ mutex_unlock(&con->mutex);
+ msg = con->ops->alloc_msg(con, hdr, skip);
+ mutex_lock(&con->mutex);
+ if (con->state != CON_STATE_OPEN) {
+ if (msg)
ceph_msg_put(msg);
- return -EAGAIN;
- }
- con->in_msg = msg;
+ return -EAGAIN;
+ }
+ con->in_msg = msg;
+ if (con->in_msg) {
con->in_msg->con = con->ops->get(con);
BUG_ON(con->in_msg->con == NULL);
- con->in_msg->page_alignment = le16_to_cpu(hdr->data_off);
+ }
+ if (*skip) {
+ con->in_msg = NULL;
+ return 0;
+ }
+ if (!con->in_msg) {
+ con->error_msg =
+ "error allocating memory for incoming message";
+ return -ENOMEM;
}
memcpy(&con->in_msg->hdr, &con->in_hdr, sizeof(con->in_hdr));
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] libceph: no need for alignment for mds message
2013-03-04 18:10 ` [PATCH 0/3] libceph: simplify incoming message allocation Alex Elder
2013-03-04 18:12 ` [PATCH 1/3] libceph: drop mutex while allocating a message Alex Elder
2013-03-04 18:12 ` [PATCH 2/3] libceph: define mds_alloc_msg() method Alex Elder
@ 2013-03-04 18:12 ` Alex Elder
2013-03-04 19:09 ` Gregory Farnum
2 siblings, 1 reply; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:12 UTC (permalink / raw)
To: ceph-devel
Currently, incoming mds messages never use page data, which means
there is no need to set the page_alignment field in the message.
Signed-off-by: Alex Elder <elder@inktank.com>
---
fs/ceph/mds_client.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
index a4b986a..afb9dc9 100644
--- a/fs/ceph/mds_client.c
+++ b/fs/ceph/mds_client.c
@@ -3486,7 +3486,6 @@ static struct ceph_msg *mds_alloc_msg(struct
ceph_connection *con,
type, front_len);
return NULL;
}
- msg->page_alignment = (unsigned int) le16_to_cpu(hdr->data_off);
return msg;
}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 0/3] ceph: assign message data fields consistently
2013-03-04 18:05 OSD Data Read/Write Separation Patches Alex Elder
2013-03-04 18:08 ` [PATCH 0/3] libceph: a few cleanups Alex Elder
2013-03-04 18:10 ` [PATCH 0/3] libceph: simplify incoming message allocation Alex Elder
@ 2013-03-04 18:14 ` Alex Elder
2013-03-04 18:15 ` [PATCH 1/3] ceph: use calc_pages_for() in start_read() Alex Elder
` (3 more replies)
2013-03-04 18:16 ` [PATCH 0/3] libceph: distinguish osd request read and write data Alex Elder
3 siblings, 4 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:14 UTC (permalink / raw)
To: ceph-devel
Rearrange code so all the fields related to message data get
assigned the same way throughout the ceph code.
-Alex
[PATCH 1/3] ceph: use calc_pages_for() in start_read()
[PATCH 2/3] ceph: simplify ceph_sync_write() page_align calculation
[PATCH 3/3] libceph: don't assign page info in ceph_osdc_new_request()
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] ceph: use calc_pages_for() in start_read()
2013-03-04 18:14 ` [PATCH 0/3] ceph: assign message data fields consistently Alex Elder
@ 2013-03-04 18:15 ` Alex Elder
2013-03-04 18:15 ` [PATCH 2/3] ceph: simplify ceph_sync_write() page_align calculation Alex Elder
` (2 subsequent siblings)
3 siblings, 0 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:15 UTC (permalink / raw)
To: ceph-devel
There's a spot that computes the number of pages to allocate for a
page-aligned length by just shifting it. Use calc_pages_for()
instead, to be consistent with usage everywhere else. The result
is the same.
The reason for this is to make it clearer in an upcoming patch that
this calculation is duplicated.
Signed-off-by: Alex Elder <elder@inktank.com>
---
fs/ceph/addr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index cfef3e0..a284a1f 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -314,7 +314,7 @@ static int start_read(struct inode *inode, struct
list_head *page_list, int max)
return PTR_ERR(req);
/* build page vector */
- nr_pages = len >> PAGE_CACHE_SHIFT;
+ nr_pages = calc_pages_for(0, len);
pages = kmalloc(sizeof(*pages) * nr_pages, GFP_NOFS);
ret = -ENOMEM;
if (!pages)
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 2/3] ceph: simplify ceph_sync_write() page_align calculation
2013-03-04 18:14 ` [PATCH 0/3] ceph: assign message data fields consistently Alex Elder
2013-03-04 18:15 ` [PATCH 1/3] ceph: use calc_pages_for() in start_read() Alex Elder
@ 2013-03-04 18:15 ` Alex Elder
2013-03-04 18:15 ` [PATCH 3/3] libceph: don't assign page info in ceph_osdc_new_request() Alex Elder
2013-03-05 1:55 ` [PATCH 0/3] ceph: assign message data fields consistently Josh Durgin
3 siblings, 0 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:15 UTC (permalink / raw)
To: ceph-devel
(This is being reposted. The first one had a problem because it
erroneously added a similar change elsewhere; that change has been
dropped.)
The next patch in this series points out that the calculation for
the number of pages in an osd request is getting done twice. It
is not obvious, but the result of both calculations is identical.
This patch simplifies one of them--as a separate step--to make
it clear that the transformation in the next patch is valid.
In ceph_sync_write() there is some magic that computes page_align
for an osd request. But a little analysis shows it can be
simplified.
First, we have:
io_align = pos & ~PAGE_MASK;
which is used here:
page_align = (pos - io_align + buf_align) & ~PAGE_MASK;
Note (pos - io_align) simply rounds "pos" down to the nearest multiple
of the page size.
We also have:
buf_align = (unsigned long)data & ~PAGE_MASK;
Adding buf_align to that rounded-down "pos" value will stay within
the same page; the result will just be offset by the page offset for
the "data" pointer. The final mask therefore leaves just the value
of "buf_align".
One more simplification. Note that the result of calc_pages_for()
is invariant of which page the offset starts in--the only thing that
matters is the offset within the starting page. We will have
put the proper page offset to use into "page_align", so just use
that in calculating num_pages.
This resolves:
http://tracker.ceph.com/issues/4166
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
---
fs/ceph/file.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 9c4325e..805678e 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -526,15 +526,10 @@ more:
io_align = pos & ~PAGE_MASK;
buf_align = (unsigned long)data & ~PAGE_MASK;
len = left;
- if (file->f_flags & O_DIRECT) {
- /* write from beginning of first page, regardless of
- io alignment */
- page_align = (pos - io_align + buf_align) & ~PAGE_MASK;
- num_pages = calc_pages_for((unsigned long)data, len);
- } else {
- page_align = pos & ~PAGE_MASK;
- num_pages = calc_pages_for(pos, len);
- }
+
+ /* write from beginning of first page, regardless of io alignment */
+ page_align = file->f_flags & O_DIRECT ? buf_align : io_align;
+ num_pages = calc_pages_for(page_align, len);
req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
ceph_vino(inode), pos, &len,
CEPH_OSD_OP_WRITE, flags,
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] libceph: don't assign page info in ceph_osdc_new_request()
2013-03-04 18:14 ` [PATCH 0/3] ceph: assign message data fields consistently Alex Elder
2013-03-04 18:15 ` [PATCH 1/3] ceph: use calc_pages_for() in start_read() Alex Elder
2013-03-04 18:15 ` [PATCH 2/3] ceph: simplify ceph_sync_write() page_align calculation Alex Elder
@ 2013-03-04 18:15 ` Alex Elder
2013-03-05 1:55 ` [PATCH 0/3] ceph: assign message data fields consistently Josh Durgin
3 siblings, 0 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:15 UTC (permalink / raw)
To: ceph-devel
Currently ceph_osdc_new_request() assigns an osd request's
r_num_pages and r_alignment fields. The only thing it does
after that is call ceph_osdc_build_request(), and that doesn't
need those fields to be assigned.
Move the assignment of those fields out of ceph_osdc_new_request()
and into its caller. As a result, the page_align parameter is no
longer used, so get rid of it.
Note that in ceph_sync_write(), the value for req->r_num_pages had
already been calculated earlier (as num_pages, and fortunately
it was computed the same way). So don't bother recomputing it,
but because it's not needed earlier, move that calculation after the
call to ceph_osdc_new_request(). Hold off making the assignment to
r_alignment, doing it instead r_pages and r_num_pages are
getting set.
Similarly, in start_read(), nr_pages already holds the number of
pages in the array (and is calculated the same way), so there's no
need to recompute it. Move the assignment of the page alignment
down with the others there as well.
This and the next few patches are preparation work for:
http://tracker.ceph.com/issues/4127
Signed-off-by: Alex Elder <elder@inktank.com>
---
fs/ceph/addr.c | 7 +++++--
fs/ceph/file.c | 9 +++++----
include/linux/ceph/osd_client.h | 2 +-
net/ceph/osd_client.c | 19 ++++++++-----------
4 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index a284a1f..e46e944 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -309,7 +309,7 @@ static int start_read(struct inode *inode, struct
list_head *page_list, int max)
CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
NULL, 0,
ci->i_truncate_seq, ci->i_truncate_size,
- NULL, false, 0);
+ NULL, false);
if (IS_ERR(req))
return PTR_ERR(req);
@@ -338,6 +338,7 @@ static int start_read(struct inode *inode, struct
list_head *page_list, int max)
}
req->r_pages = pages;
req->r_num_pages = nr_pages;
+ req->r_page_alignment = 0;
req->r_callback = finish_read;
req->r_inode = inode;
@@ -820,7 +821,7 @@ get_more_pages:
snapc, do_sync,
ci->i_truncate_seq,
ci->i_truncate_size,
- &inode->i_mtime, true, 0);
+ &inode->i_mtime, true);
if (IS_ERR(req)) {
rc = PTR_ERR(req);
@@ -828,6 +829,8 @@ get_more_pages:
break;
}
+ req->r_num_pages = calc_pages_for(0, len);
+ req->r_page_alignment = 0;
max_pages = req->r_num_pages;
alloc_page_vec(fsc, req);
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 805678e..fe7f711 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -527,19 +527,19 @@ more:
buf_align = (unsigned long)data & ~PAGE_MASK;
len = left;
- /* write from beginning of first page, regardless of io alignment */
- page_align = file->f_flags & O_DIRECT ? buf_align : io_align;
- num_pages = calc_pages_for(page_align, len);
req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
ceph_vino(inode), pos, &len,
CEPH_OSD_OP_WRITE, flags,
ci->i_snap_realm->cached_context,
do_sync,
ci->i_truncate_seq, ci->i_truncate_size,
- &mtime, false, page_align);
+ &mtime, false);
if (IS_ERR(req))
return PTR_ERR(req);
+ /* write from beginning of first page, regardless of io alignment */
+ page_align = file->f_flags & O_DIRECT ? buf_align : io_align;
+ num_pages = calc_pages_for(page_align, len);
if (file->f_flags & O_DIRECT) {
pages = ceph_get_direct_page_vector(data, num_pages, false);
if (IS_ERR(pages)) {
@@ -573,6 +573,7 @@ more:
}
req->r_pages = pages;
req->r_num_pages = num_pages;
+ req->r_page_alignment = page_align;
req->r_inode = inode;
ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index ec33588..803a9db 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -247,7 +247,7 @@ extern struct ceph_osd_request
*ceph_osdc_new_request(struct ceph_osd_client *,
int do_sync, u32 truncate_seq,
u64 truncate_size,
struct timespec *mtime,
- bool use_mempool, int page_align);
+ bool use_mempool);
extern void ceph_osdc_set_request_linger(struct ceph_osd_client *osdc,
struct ceph_osd_request *req);
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 38d09d1..de427cc 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -432,8 +432,7 @@ struct ceph_osd_request
*ceph_osdc_new_request(struct ceph_osd_client *osdc,
u32 truncate_seq,
u64 truncate_size,
struct timespec *mtime,
- bool use_mempool,
- int page_align)
+ bool use_mempool)
{
struct ceph_osd_req_op ops[2];
struct ceph_osd_request *req;
@@ -470,11 +469,6 @@ struct ceph_osd_request
*ceph_osdc_new_request(struct ceph_osd_client *osdc,
snprintf(req->r_oid, sizeof(req->r_oid), "%llx.%08llx", vino.ino, bno);
req->r_oid_len = strlen(req->r_oid);
- /* The alignment may differ from the natural (file) alignment */
-
- req->r_num_pages = calc_pages_for(page_align, *plen);
- req->r_page_alignment = page_align;
-
ceph_osdc_build_request(req, off, *plen, num_op, ops,
snapc, vino.snap, mtime);
@@ -1945,12 +1939,14 @@ int ceph_osdc_readpages(struct ceph_osd_client
*osdc,
req = ceph_osdc_new_request(osdc, layout, vino, off, plen,
CEPH_OSD_OP_READ, CEPH_OSD_FLAG_READ,
NULL, 0, truncate_seq, truncate_size, NULL,
- false, page_align);
+ false);
if (IS_ERR(req))
return PTR_ERR(req);
/* it may be a short read due to an object boundary */
req->r_pages = pages;
+ req->r_num_pages = calc_pages_for(page_align, *plen);
+ req->r_page_alignment = page_align;
dout("readpages final extent is %llu~%llu (%d pages align %d)\n",
off, *plen, req->r_num_pages, page_align);
@@ -1986,14 +1982,15 @@ int ceph_osdc_writepages(struct ceph_osd_client
*osdc, struct ceph_vino vino,
CEPH_OSD_FLAG_ONDISK | CEPH_OSD_FLAG_WRITE,
snapc, 0,
truncate_seq, truncate_size, mtime,
- true, page_align);
+ true);
if (IS_ERR(req))
return PTR_ERR(req);
/* it may be a short write due to an object boundary */
req->r_pages = pages;
- dout("writepages %llu~%llu (%d pages)\n", off, len,
- req->r_num_pages);
+ req->r_num_pages = calc_pages_for(page_align, len);
+ req->r_page_alignment = page_align;
+ dout("writepages %llu~%llu (%d pages)\n", off, len, req->r_num_pages);
rc = ceph_osdc_start_request(osdc, req, true);
if (!rc)
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 0/3] libceph: distinguish osd request read and write data
2013-03-04 18:05 OSD Data Read/Write Separation Patches Alex Elder
` (2 preceding siblings ...)
2013-03-04 18:14 ` [PATCH 0/3] ceph: assign message data fields consistently Alex Elder
@ 2013-03-04 18:16 ` Alex Elder
2013-03-04 18:17 ` [PATCH 1/3] libceph: separate osd request data info Alex Elder
` (2 more replies)
3 siblings, 3 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:16 UTC (permalink / raw)
To: ceph-devel
Encapsulate information about osd request data into a
separate data structure, and define separate instances of
that structure for incoming and outgoing data.
-Alex
[PATCH 1/3] libceph: separate osd request data info
[PATCH 2/3] libceph: distinguish page and bio requests
[PATCH 3/3] libceph: separate read and write data
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/3] libceph: separate osd request data info
2013-03-04 18:16 ` [PATCH 0/3] libceph: distinguish osd request read and write data Alex Elder
@ 2013-03-04 18:17 ` Alex Elder
2013-03-05 2:13 ` Josh Durgin
2013-03-04 18:18 ` [PATCH 2/3] libceph: distinguish page and bio requests Alex Elder
2013-03-04 18:18 ` [PATCH 3/3] libceph: separate read and write data Alex Elder
2 siblings, 1 reply; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:17 UTC (permalink / raw)
To: ceph-devel
Pull the fields in an osd request structure that define the data for
the request out into a separate structure.
Signed-off-by: Alex Elder <elder@inktank.com>
---
drivers/block/rbd.c | 8 +++---
fs/ceph/addr.c | 59
+++++++++++++++++++++------------------
fs/ceph/file.c | 8 +++---
include/linux/ceph/osd_client.h | 24 ++++++++++------
net/ceph/osd_client.c | 44 ++++++++++++++---------------
5 files changed, 78 insertions(+), 65 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 6c81a4c..dcf08f2 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1412,12 +1412,12 @@ static struct ceph_osd_request *rbd_osd_req_create(
break; /* Nothing to do */
case OBJ_REQUEST_BIO:
rbd_assert(obj_request->bio_list != NULL);
- osd_req->r_bio = obj_request->bio_list;
+ osd_req->r_data.bio = obj_request->bio_list;
break;
case OBJ_REQUEST_PAGES:
- osd_req->r_pages = obj_request->pages;
- osd_req->r_num_pages = obj_request->page_count;
- osd_req->r_page_alignment = offset & ~PAGE_MASK;
+ osd_req->r_data.pages = obj_request->pages;
+ osd_req->r_data.num_pages = obj_request->page_count;
+ osd_req->r_data.alignment = offset & ~PAGE_MASK;
break;
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index e46e944..1cf7894 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -243,8 +243,9 @@ static void finish_read(struct ceph_osd_request
*req, struct ceph_msg *msg)
dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes);
/* unlock all pages, zeroing any data we didn't read */
- for (i = 0; i < req->r_num_pages; i++, bytes -= PAGE_CACHE_SIZE) {
- struct page *page = req->r_pages[i];
+ BUG_ON(req->r_data.type != CEPH_OSD_DATA_TYPE_PAGES);
+ for (i = 0; i < req->r_data.num_pages; i++, bytes -= PAGE_CACHE_SIZE) {
+ struct page *page = req->r_data.pages[i];
if (bytes < (int)PAGE_CACHE_SIZE) {
/* zero (remainder of) page */
@@ -258,7 +259,7 @@ static void finish_read(struct ceph_osd_request
*req, struct ceph_msg *msg)
unlock_page(page);
page_cache_release(page);
}
- kfree(req->r_pages);
+ kfree(req->r_data.pages);
}
static void ceph_unlock_page_vector(struct page **pages, int num_pages)
@@ -336,9 +337,10 @@ static int start_read(struct inode *inode, struct
list_head *page_list, int max)
}
pages[i] = page;
}
- req->r_pages = pages;
- req->r_num_pages = nr_pages;
- req->r_page_alignment = 0;
+ req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
+ req->r_data.pages = pages;
+ req->r_data.num_pages = nr_pages;
+ req->r_data.alignment = 0;
req->r_callback = finish_read;
req->r_inode = inode;
@@ -374,7 +376,8 @@ static int ceph_readpages(struct file *file, struct
address_space *mapping,
max = (fsc->mount_options->rsize + PAGE_CACHE_SIZE - 1)
>> PAGE_SHIFT;
- dout("readpages %p file %p nr_pages %d max %d\n", inode, file, nr_pages,
+ dout("readpages %p file %p nr_pages %d max %d\n", inode,
+ file, nr_pages,
max);
while (!list_empty(page_list)) {
rc = start_read(inode, page_list, max);
@@ -560,6 +563,7 @@ static void writepages_finish(struct
ceph_osd_request *req,
long writeback_stat;
unsigned issued = ceph_caps_issued(ci);
+ BUG_ON(req->r_data.type != CEPH_OSD_DATA_TYPE_PAGES);
if (rc >= 0) {
/*
* Assume we wrote the pages we originally sent. The
@@ -567,7 +571,7 @@ static void writepages_finish(struct
ceph_osd_request *req,
* raced with a truncation and was adjusted at the osd,
* so don't believe the reply.
*/
- wrote = req->r_num_pages;
+ wrote = req->r_data.num_pages;
} else {
wrote = 0;
mapping_set_error(mapping, rc);
@@ -576,8 +580,8 @@ static void writepages_finish(struct
ceph_osd_request *req,
inode, rc, bytes, wrote);
/* clean all pages */
- for (i = 0; i < req->r_num_pages; i++) {
- page = req->r_pages[i];
+ for (i = 0; i < req->r_data.num_pages; i++) {
+ page = req->r_data.pages[i];
BUG_ON(!page);
WARN_ON(!PageUptodate(page));
@@ -606,31 +610,31 @@ static void writepages_finish(struct
ceph_osd_request *req,
unlock_page(page);
}
dout("%p wrote+cleaned %d pages\n", inode, wrote);
- ceph_put_wrbuffer_cap_refs(ci, req->r_num_pages, snapc);
+ ceph_put_wrbuffer_cap_refs(ci, req->r_data.num_pages, snapc);
- ceph_release_pages(req->r_pages, req->r_num_pages);
- if (req->r_pages_from_pool)
- mempool_free(req->r_pages,
+ ceph_release_pages(req->r_data.pages, req->r_data.num_pages);
+ if (req->r_data.pages_from_pool)
+ mempool_free(req->r_data.pages,
ceph_sb_to_client(inode->i_sb)->wb_pagevec_pool);
else
- kfree(req->r_pages);
+ kfree(req->r_data.pages);
ceph_osdc_put_request(req);
}
/*
* allocate a page vec, either directly, or if necessary, via a the
- * mempool. we avoid the mempool if we can because req->r_num_pages
+ * mempool. we avoid the mempool if we can because req->r_data.num_pages
* may be less than the maximum write size.
*/
static void alloc_page_vec(struct ceph_fs_client *fsc,
struct ceph_osd_request *req)
{
- req->r_pages = kmalloc(sizeof(struct page *) * req->r_num_pages,
+ req->r_data.pages = kmalloc(sizeof(struct page *) * req->r_data.num_pages,
GFP_NOFS);
- if (!req->r_pages) {
- req->r_pages = mempool_alloc(fsc->wb_pagevec_pool, GFP_NOFS);
- req->r_pages_from_pool = 1;
- WARN_ON(!req->r_pages);
+ if (!req->r_data.pages) {
+ req->r_data.pages = mempool_alloc(fsc->wb_pagevec_pool, GFP_NOFS);
+ req->r_data.pages_from_pool = 1;
+ WARN_ON(!req->r_data.pages);
}
}
@@ -829,9 +833,10 @@ get_more_pages:
break;
}
- req->r_num_pages = calc_pages_for(0, len);
- req->r_page_alignment = 0;
- max_pages = req->r_num_pages;
+ req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
+ req->r_data.num_pages = calc_pages_for(0, len);
+ req->r_data.alignment = 0;
+ max_pages = req->r_data.num_pages;
alloc_page_vec(fsc, req);
req->r_callback = writepages_finish;
@@ -853,7 +858,7 @@ get_more_pages:
}
set_page_writeback(page);
- req->r_pages[locked_pages] = page;
+ req->r_data.pages[locked_pages] = page;
locked_pages++;
next = page->index + 1;
}
@@ -883,14 +888,14 @@ get_more_pages:
}
/* submit the write */
- offset = req->r_pages[0]->index << PAGE_CACHE_SHIFT;
+ offset = req->r_data.pages[0]->index << PAGE_CACHE_SHIFT;
len = min((snap_size ? snap_size : i_size_read(inode)) - offset,
(u64)locked_pages << PAGE_CACHE_SHIFT);
dout("writepages got %d pages at %llu~%llu\n",
locked_pages, offset, len);
/* revise final length, page count */
- req->r_num_pages = locked_pages;
+ req->r_data.num_pages = locked_pages;
req->r_request_ops[0].extent.length = cpu_to_le64(len);
req->r_request_ops[0].payload_len = cpu_to_le32(len);
req->r_request->hdr.data_len = cpu_to_le32(len);
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index fe7f711..76739049 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -568,12 +568,12 @@ more:
if ((file->f_flags & O_SYNC) == 0) {
/* get a second commit callback */
req->r_safe_callback = sync_write_commit;
- req->r_own_pages = 1;
+ req->r_data.own_pages = 1;
}
}
- req->r_pages = pages;
- req->r_num_pages = num_pages;
- req->r_page_alignment = page_align;
+ req->r_data.pages = pages;
+ req->r_data.num_pages = num_pages;
+ req->r_data.alignment = page_align;
req->r_inode = inode;
ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index 803a9db..600b827 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -50,6 +50,21 @@ struct ceph_osd {
#define CEPH_OSD_MAX_OP 10
+struct ceph_osd_data {
+ struct {
+ struct {
+ struct page **pages;
+ u32 num_pages;
+ u32 alignment;
+ bool pages_from_pool;
+ bool own_pages;
+ };
+#ifdef CONFIG_BLOCK
+ struct bio *bio;
+#endif /* CONFIG_BLOCK */
+ };
+};
+
/* an in-flight request */
struct ceph_osd_request {
u64 r_tid; /* unique for this client */
@@ -105,15 +120,8 @@ struct ceph_osd_request {
struct ceph_file_layout r_file_layout;
struct ceph_snap_context *r_snapc; /* snap context for writes */
- unsigned r_num_pages; /* size of page array (follows) */
- unsigned r_page_alignment; /* io offset in first page */
- struct page **r_pages; /* pages for data payload */
- int r_pages_from_pool;
- int r_own_pages; /* if true, i own page list */
-#ifdef CONFIG_BLOCK
- struct bio *r_bio; /* instead of pages */
-#endif
+ struct ceph_osd_data r_data;
struct ceph_pagelist r_trail; /* trailing part of the data */
};
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index de427cc..1f8c7a7 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -122,9 +122,9 @@ void ceph_osdc_release_request(struct kref *kref)
}
if (req->r_reply)
ceph_msg_put(req->r_reply);
- if (req->r_own_pages)
- ceph_release_page_vector(req->r_pages,
- req->r_num_pages);
+ if (req->r_data.own_pages)
+ ceph_release_page_vector(req->r_data.pages,
+ req->r_data.num_pages);
ceph_put_snap_context(req->r_snapc);
ceph_pagelist_release(&req->r_trail);
if (req->r_mempool)
@@ -1739,11 +1739,11 @@ int ceph_osdc_start_request(struct
ceph_osd_client *osdc,
{
int rc = 0;
- req->r_request->pages = req->r_pages;
- req->r_request->page_count = req->r_num_pages;
- req->r_request->page_alignment = req->r_page_alignment;
+ req->r_request->pages = req->r_data.pages;
+ req->r_request->page_count = req->r_data.num_pages;
+ req->r_request->page_alignment = req->r_data.alignment;
#ifdef CONFIG_BLOCK
- req->r_request->bio = req->r_bio;
+ req->r_request->bio = req->r_data.bio;
#endif
req->r_request->trail = &req->r_trail;
@@ -1944,12 +1944,12 @@ int ceph_osdc_readpages(struct ceph_osd_client
*osdc,
return PTR_ERR(req);
/* it may be a short read due to an object boundary */
- req->r_pages = pages;
- req->r_num_pages = calc_pages_for(page_align, *plen);
- req->r_page_alignment = page_align;
+ req->r_data.pages = pages;
+ req->r_data.num_pages = calc_pages_for(page_align, *plen);
+ req->r_data.alignment = page_align;
dout("readpages final extent is %llu~%llu (%d pages align %d)\n",
- off, *plen, req->r_num_pages, page_align);
+ off, *plen, req->r_data.num_pages, page_align);
rc = ceph_osdc_start_request(osdc, req, false);
if (!rc)
@@ -1987,10 +1987,10 @@ int ceph_osdc_writepages(struct ceph_osd_client
*osdc, struct ceph_vino vino,
return PTR_ERR(req);
/* it may be a short write due to an object boundary */
- req->r_pages = pages;
- req->r_num_pages = calc_pages_for(page_align, len);
- req->r_page_alignment = page_align;
- dout("writepages %llu~%llu (%d pages)\n", off, len, req->r_num_pages);
+ req->r_data.pages = pages;
+ req->r_data.num_pages = calc_pages_for(page_align, len);
+ req->r_data.alignment = page_align;
+ dout("writepages %llu~%llu (%d pages)\n", off, len,
req->r_data.num_pages);
rc = ceph_osdc_start_request(osdc, req, true);
if (!rc)
@@ -2083,22 +2083,22 @@ static struct ceph_msg *get_reply(struct
ceph_connection *con,
m = ceph_msg_get(req->r_reply);
if (data_len > 0) {
- int want = calc_pages_for(req->r_page_alignment, data_len);
+ int want = calc_pages_for(req->r_data.alignment, data_len);
- if (req->r_pages && unlikely(req->r_num_pages < want)) {
+ if (req->r_data.pages && unlikely(req->r_data.num_pages < want)) {
pr_warning("tid %lld reply has %d bytes %d pages, we"
" had only %d pages ready\n", tid, data_len,
- want, req->r_num_pages);
+ want, req->r_data.num_pages);
*skip = 1;
ceph_msg_put(m);
m = NULL;
goto out;
}
- m->pages = req->r_pages;
- m->page_count = req->r_num_pages;
- m->page_alignment = req->r_page_alignment;
+ m->pages = req->r_data.pages;
+ m->page_count = req->r_data.num_pages;
+ m->page_alignment = req->r_data.alignment;
#ifdef CONFIG_BLOCK
- m->bio = req->r_bio;
+ m->bio = req->r_data.bio;
#endif
}
*skip = 0;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 2/3] libceph: distinguish page and bio requests
2013-03-04 18:16 ` [PATCH 0/3] libceph: distinguish osd request read and write data Alex Elder
2013-03-04 18:17 ` [PATCH 1/3] libceph: separate osd request data info Alex Elder
@ 2013-03-04 18:18 ` Alex Elder
2013-03-05 2:14 ` Josh Durgin
2013-03-04 18:18 ` [PATCH 3/3] libceph: separate read and write data Alex Elder
2 siblings, 1 reply; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:18 UTC (permalink / raw)
To: ceph-devel
An osd request uses either pages or a bio list for its data. Use a
union to record information about the two, and add a data type
tag to select between them.
Signed-off-by: Alex Elder <elder@inktank.com>
---
drivers/block/rbd.c | 4 +++
fs/ceph/file.c | 1 +
include/linux/ceph/osd_client.h | 11 +++++++-
net/ceph/osd_client.c | 56
+++++++++++++++++++++++++--------------
4 files changed, 51 insertions(+), 21 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index dcf08f2..70c3b39 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1412,12 +1412,16 @@ static struct ceph_osd_request *rbd_osd_req_create(
break; /* Nothing to do */
case OBJ_REQUEST_BIO:
rbd_assert(obj_request->bio_list != NULL);
+ osd_req->r_data.type = CEPH_OSD_DATA_TYPE_BIO;
osd_req->r_data.bio = obj_request->bio_list;
break;
case OBJ_REQUEST_PAGES:
+ osd_req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
osd_req->r_data.pages = obj_request->pages;
osd_req->r_data.num_pages = obj_request->page_count;
osd_req->r_data.alignment = offset & ~PAGE_MASK;
+ osd_req->r_data.pages_from_pool = false;
+ osd_req->r_data.own_pages = false;
break;
}
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 76739049..0f7b3f4 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -571,6 +571,7 @@ more:
req->r_data.own_pages = 1;
}
}
+ req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
req->r_data.pages = pages;
req->r_data.num_pages = num_pages;
req->r_data.alignment = page_align;
diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index 600b827..56604b3 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -50,8 +50,17 @@ struct ceph_osd {
#define CEPH_OSD_MAX_OP 10
+enum ceph_osd_data_type {
+ CEPH_OSD_DATA_TYPE_NONE,
+ CEPH_OSD_DATA_TYPE_PAGES,
+#ifdef CONFIG_BLOCK
+ CEPH_OSD_DATA_TYPE_BIO,
+#endif /* CONFIG_BLOCK */
+};
+
struct ceph_osd_data {
- struct {
+ enum ceph_osd_data_type type;
+ union {
struct {
struct page **pages;
u32 num_pages;
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 1f8c7a7..591e1b0 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -122,7 +122,8 @@ void ceph_osdc_release_request(struct kref *kref)
}
if (req->r_reply)
ceph_msg_put(req->r_reply);
- if (req->r_data.own_pages)
+ if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES &&
+ req->r_data.own_pages)
ceph_release_page_vector(req->r_data.pages,
req->r_data.num_pages);
ceph_put_snap_context(req->r_snapc);
@@ -188,6 +189,7 @@ struct ceph_osd_request
*ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
}
req->r_reply = msg;
+ req->r_data.type = CEPH_OSD_DATA_TYPE_NONE;
ceph_pagelist_init(&req->r_trail);
/* create request message; allow space for oid */
@@ -1739,12 +1741,17 @@ int ceph_osdc_start_request(struct
ceph_osd_client *osdc,
{
int rc = 0;
- req->r_request->pages = req->r_data.pages;
- req->r_request->page_count = req->r_data.num_pages;
- req->r_request->page_alignment = req->r_data.alignment;
+ if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES) {
+ req->r_request->pages = req->r_data.pages;
+ req->r_request->page_count = req->r_data.num_pages;
+ req->r_request->page_alignment = req->r_data.alignment;
#ifdef CONFIG_BLOCK
- req->r_request->bio = req->r_data.bio;
+ } else if (req->r_data.type == CEPH_OSD_DATA_TYPE_BIO) {
+ req->r_request->bio = req->r_data.bio;
#endif
+ } else {
+ pr_err("unknown request data type %d\n", req->r_data.type);
+ }
req->r_request->trail = &req->r_trail;
register_request(osdc, req);
@@ -1944,6 +1951,7 @@ int ceph_osdc_readpages(struct ceph_osd_client *osdc,
return PTR_ERR(req);
/* it may be a short read due to an object boundary */
+ req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
req->r_data.pages = pages;
req->r_data.num_pages = calc_pages_for(page_align, *plen);
req->r_data.alignment = page_align;
@@ -1987,6 +1995,7 @@ int ceph_osdc_writepages(struct ceph_osd_client
*osdc, struct ceph_vino vino,
return PTR_ERR(req);
/* it may be a short write due to an object boundary */
+ req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
req->r_data.pages = pages;
req->r_data.num_pages = calc_pages_for(page_align, len);
req->r_data.alignment = page_align;
@@ -2083,23 +2092,30 @@ static struct ceph_msg *get_reply(struct
ceph_connection *con,
m = ceph_msg_get(req->r_reply);
if (data_len > 0) {
- int want = calc_pages_for(req->r_data.alignment, data_len);
-
- if (req->r_data.pages && unlikely(req->r_data.num_pages < want)) {
- pr_warning("tid %lld reply has %d bytes %d pages, we"
- " had only %d pages ready\n", tid, data_len,
- want, req->r_data.num_pages);
- *skip = 1;
- ceph_msg_put(m);
- m = NULL;
- goto out;
- }
- m->pages = req->r_data.pages;
- m->page_count = req->r_data.num_pages;
- m->page_alignment = req->r_data.alignment;
+ if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES) {
+ int want;
+
+ want = calc_pages_for(req->r_data.alignment, data_len);
+ if (req->r_data.pages &&
+ unlikely(req->r_data.num_pages < want)) {
+
+ pr_warning("tid %lld reply has %d bytes %d "
+ "pages, we had only %d pages ready\n",
+ tid, data_len, want,
+ req->r_data.num_pages);
+ *skip = 1;
+ ceph_msg_put(m);
+ m = NULL;
+ goto out;
+ }
+ m->pages = req->r_data.pages;
+ m->page_count = req->r_data.num_pages;
+ m->page_alignment = req->r_data.alignment;
#ifdef CONFIG_BLOCK
- m->bio = req->r_data.bio;
+ } else if (req->r_data.type == CEPH_OSD_DATA_TYPE_BIO) {
+ m->bio = req->r_data.bio;
#endif
+ }
}
*skip = 0;
req->r_con_filling_msg = con->ops->get(con);
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 3/3] libceph: separate read and write data
2013-03-04 18:16 ` [PATCH 0/3] libceph: distinguish osd request read and write data Alex Elder
2013-03-04 18:17 ` [PATCH 1/3] libceph: separate osd request data info Alex Elder
2013-03-04 18:18 ` [PATCH 2/3] libceph: distinguish page and bio requests Alex Elder
@ 2013-03-04 18:18 ` Alex Elder
2013-03-05 2:15 ` Josh Durgin
2 siblings, 1 reply; 29+ messages in thread
From: Alex Elder @ 2013-03-04 18:18 UTC (permalink / raw)
To: ceph-devel
An osd request defines information about where data to be read
should be placed as well as where data to write comes from.
Currently these are represented by common fields.
Keep information about data for writing separate from data to be
read by splitting these into data_in and data_out fields.
This is the key patch in this whole series, in that it actually
identifies which osd requests generate outgoing data and which
generate incoming data. It's less obvious (currently) that an osd
CALL op generates both outgoing and incoming data; that's the focus
of some upcoming work.
This resolves:
http://tracker.ceph.com/issues/4127
Signed-off-by: Alex Elder <elder@inktank.com>
---
drivers/block/rbd.c | 18 +++++----
fs/ceph/addr.c | 67 ++++++++++++++++---------------
fs/ceph/file.c | 10 ++---
include/linux/ceph/osd_client.h | 5 ++-
net/ceph/osd_client.c | 83
++++++++++++++++++++++++---------------
5 files changed, 105 insertions(+), 78 deletions(-)
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 70c3b39..a0a6182 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1385,6 +1385,7 @@ static struct ceph_osd_request *rbd_osd_req_create(
struct ceph_snap_context *snapc = NULL;
struct ceph_osd_client *osdc;
struct ceph_osd_request *osd_req;
+ struct ceph_osd_data *osd_data;
struct timespec now;
struct timespec *mtime;
u64 snap_id = CEPH_NOSNAP;
@@ -1405,6 +1406,7 @@ static struct ceph_osd_request *rbd_osd_req_create(
osd_req = ceph_osdc_alloc_request(osdc, snapc, 1, false, GFP_ATOMIC);
if (!osd_req)
return NULL; /* ENOMEM */
+ osd_data = write_request ? &osd_req->r_data_out : &osd_req->r_data_in;
rbd_assert(obj_request_type_valid(obj_request->type));
switch (obj_request->type) {
@@ -1412,16 +1414,16 @@ static struct ceph_osd_request *rbd_osd_req_create(
break; /* Nothing to do */
case OBJ_REQUEST_BIO:
rbd_assert(obj_request->bio_list != NULL);
- osd_req->r_data.type = CEPH_OSD_DATA_TYPE_BIO;
- osd_req->r_data.bio = obj_request->bio_list;
+ osd_data->type = CEPH_OSD_DATA_TYPE_BIO;
+ osd_data->bio = obj_request->bio_list;
break;
case OBJ_REQUEST_PAGES:
- osd_req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
- osd_req->r_data.pages = obj_request->pages;
- osd_req->r_data.num_pages = obj_request->page_count;
- osd_req->r_data.alignment = offset & ~PAGE_MASK;
- osd_req->r_data.pages_from_pool = false;
- osd_req->r_data.own_pages = false;
+ osd_data->type = CEPH_OSD_DATA_TYPE_PAGES;
+ osd_data->pages = obj_request->pages;
+ osd_data->num_pages = obj_request->page_count;
+ osd_data->alignment = offset & ~PAGE_MASK;
+ osd_data->pages_from_pool = false;
+ osd_data->own_pages = false;
break;
}
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 1cf7894..6b223d5 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -243,9 +243,9 @@ static void finish_read(struct ceph_osd_request
*req, struct ceph_msg *msg)
dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes);
/* unlock all pages, zeroing any data we didn't read */
- BUG_ON(req->r_data.type != CEPH_OSD_DATA_TYPE_PAGES);
- for (i = 0; i < req->r_data.num_pages; i++, bytes -= PAGE_CACHE_SIZE) {
- struct page *page = req->r_data.pages[i];
+ BUG_ON(req->r_data_in.type != CEPH_OSD_DATA_TYPE_PAGES);
+ for (i = 0; i < req->r_data_in.num_pages; i++) {
+ struct page *page = req->r_data_in.pages[i];
if (bytes < (int)PAGE_CACHE_SIZE) {
/* zero (remainder of) page */
@@ -258,8 +258,9 @@ static void finish_read(struct ceph_osd_request
*req, struct ceph_msg *msg)
SetPageUptodate(page);
unlock_page(page);
page_cache_release(page);
+ bytes -= PAGE_CACHE_SIZE;
}
- kfree(req->r_data.pages);
+ kfree(req->r_data_in.pages);
}
static void ceph_unlock_page_vector(struct page **pages, int num_pages)
@@ -337,10 +338,10 @@ static int start_read(struct inode *inode, struct
list_head *page_list, int max)
}
pages[i] = page;
}
- req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
- req->r_data.pages = pages;
- req->r_data.num_pages = nr_pages;
- req->r_data.alignment = 0;
+ req->r_data_in.type = CEPH_OSD_DATA_TYPE_PAGES;
+ req->r_data_in.pages = pages;
+ req->r_data_in.num_pages = nr_pages;
+ req->r_data_in.alignment = 0;
req->r_callback = finish_read;
req->r_inode = inode;
@@ -563,7 +564,7 @@ static void writepages_finish(struct
ceph_osd_request *req,
long writeback_stat;
unsigned issued = ceph_caps_issued(ci);
- BUG_ON(req->r_data.type != CEPH_OSD_DATA_TYPE_PAGES);
+ BUG_ON(req->r_data_out.type != CEPH_OSD_DATA_TYPE_PAGES);
if (rc >= 0) {
/*
* Assume we wrote the pages we originally sent. The
@@ -571,7 +572,7 @@ static void writepages_finish(struct
ceph_osd_request *req,
* raced with a truncation and was adjusted at the osd,
* so don't believe the reply.
*/
- wrote = req->r_data.num_pages;
+ wrote = req->r_data_out.num_pages;
} else {
wrote = 0;
mapping_set_error(mapping, rc);
@@ -580,8 +581,8 @@ static void writepages_finish(struct
ceph_osd_request *req,
inode, rc, bytes, wrote);
/* clean all pages */
- for (i = 0; i < req->r_data.num_pages; i++) {
- page = req->r_data.pages[i];
+ for (i = 0; i < req->r_data_out.num_pages; i++) {
+ page = req->r_data_out.pages[i];
BUG_ON(!page);
WARN_ON(!PageUptodate(page));
@@ -610,31 +611,34 @@ static void writepages_finish(struct
ceph_osd_request *req,
unlock_page(page);
}
dout("%p wrote+cleaned %d pages\n", inode, wrote);
- ceph_put_wrbuffer_cap_refs(ci, req->r_data.num_pages, snapc);
+ ceph_put_wrbuffer_cap_refs(ci, req->r_data_out.num_pages, snapc);
- ceph_release_pages(req->r_data.pages, req->r_data.num_pages);
- if (req->r_data.pages_from_pool)
- mempool_free(req->r_data.pages,
+ ceph_release_pages(req->r_data_out.pages, req->r_data_out.num_pages);
+ if (req->r_data_out.pages_from_pool)
+ mempool_free(req->r_data_out.pages,
ceph_sb_to_client(inode->i_sb)->wb_pagevec_pool);
else
- kfree(req->r_data.pages);
+ kfree(req->r_data_out.pages);
ceph_osdc_put_request(req);
}
/*
* allocate a page vec, either directly, or if necessary, via a the
- * mempool. we avoid the mempool if we can because req->r_data.num_pages
+ * mempool. we avoid the mempool if we can because
req->r_data_out.num_pages
* may be less than the maximum write size.
*/
static void alloc_page_vec(struct ceph_fs_client *fsc,
struct ceph_osd_request *req)
{
- req->r_data.pages = kmalloc(sizeof(struct page *) * req->r_data.num_pages,
- GFP_NOFS);
- if (!req->r_data.pages) {
- req->r_data.pages = mempool_alloc(fsc->wb_pagevec_pool, GFP_NOFS);
- req->r_data.pages_from_pool = 1;
- WARN_ON(!req->r_data.pages);
+ size_t size;
+
+ size = sizeof (struct page *) * req->r_data_out.num_pages;
+ req->r_data_out.pages = kmalloc(size, GFP_NOFS);
+ if (!req->r_data_out.pages) {
+ req->r_data_out.pages = mempool_alloc(fsc->wb_pagevec_pool,
+ GFP_NOFS);
+ req->r_data_out.pages_from_pool = 1;
+ WARN_ON(!req->r_data_out.pages);
}
}
@@ -833,10 +837,11 @@ get_more_pages:
break;
}
- req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
- req->r_data.num_pages = calc_pages_for(0, len);
- req->r_data.alignment = 0;
- max_pages = req->r_data.num_pages;
+ req->r_data_out.type = CEPH_OSD_DATA_TYPE_PAGES;
+ req->r_data_out.num_pages =
+ calc_pages_for(0, len);
+ req->r_data_out.alignment = 0;
+ max_pages = req->r_data_out.num_pages;
alloc_page_vec(fsc, req);
req->r_callback = writepages_finish;
@@ -858,7 +863,7 @@ get_more_pages:
}
set_page_writeback(page);
- req->r_data.pages[locked_pages] = page;
+ req->r_data_out.pages[locked_pages] = page;
locked_pages++;
next = page->index + 1;
}
@@ -888,14 +893,14 @@ get_more_pages:
}
/* submit the write */
- offset = req->r_data.pages[0]->index << PAGE_CACHE_SHIFT;
+ offset = req->r_data_out.pages[0]->index << PAGE_CACHE_SHIFT;
len = min((snap_size ? snap_size : i_size_read(inode)) - offset,
(u64)locked_pages << PAGE_CACHE_SHIFT);
dout("writepages got %d pages at %llu~%llu\n",
locked_pages, offset, len);
/* revise final length, page count */
- req->r_data.num_pages = locked_pages;
+ req->r_data_out.num_pages = locked_pages;
req->r_request_ops[0].extent.length = cpu_to_le64(len);
req->r_request_ops[0].payload_len = cpu_to_le32(len);
req->r_request->hdr.data_len = cpu_to_le32(len);
diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index 0f7b3f4..431cd7b 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -568,13 +568,13 @@ more:
if ((file->f_flags & O_SYNC) == 0) {
/* get a second commit callback */
req->r_safe_callback = sync_write_commit;
- req->r_data.own_pages = 1;
+ req->r_data_out.own_pages = 1;
}
}
- req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
- req->r_data.pages = pages;
- req->r_data.num_pages = num_pages;
- req->r_data.alignment = page_align;
+ req->r_data_out.type = CEPH_OSD_DATA_TYPE_PAGES;
+ req->r_data_out.pages = pages;
+ req->r_data_out.num_pages = num_pages;
+ req->r_data_out.alignment = page_align;
req->r_inode = inode;
ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
diff --git a/include/linux/ceph/osd_client.h
b/include/linux/ceph/osd_client.h
index 56604b3..40e0260 100644
--- a/include/linux/ceph/osd_client.h
+++ b/include/linux/ceph/osd_client.h
@@ -130,8 +130,9 @@ struct ceph_osd_request {
struct ceph_file_layout r_file_layout;
struct ceph_snap_context *r_snapc; /* snap context for writes */
- struct ceph_osd_data r_data;
- struct ceph_pagelist r_trail; /* trailing part of the data */
+ struct ceph_osd_data r_data_in;
+ struct ceph_osd_data r_data_out;
+ struct ceph_pagelist r_trail; /* trailing part of data out */
};
struct ceph_osd_event {
diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 591e1b0..f9cf445 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -122,10 +122,16 @@ void ceph_osdc_release_request(struct kref *kref)
}
if (req->r_reply)
ceph_msg_put(req->r_reply);
- if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES &&
- req->r_data.own_pages)
- ceph_release_page_vector(req->r_data.pages,
- req->r_data.num_pages);
+
+ if (req->r_data_in.type == CEPH_OSD_DATA_TYPE_PAGES &&
+ req->r_data_in.own_pages)
+ ceph_release_page_vector(req->r_data_in.pages,
+ req->r_data_in.num_pages);
+ if (req->r_data_out.type == CEPH_OSD_DATA_TYPE_PAGES &&
+ req->r_data_out.own_pages)
+ ceph_release_page_vector(req->r_data_out.pages,
+ req->r_data_out.num_pages);
+
ceph_put_snap_context(req->r_snapc);
ceph_pagelist_release(&req->r_trail);
if (req->r_mempool)
@@ -189,7 +195,8 @@ struct ceph_osd_request
*ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
}
req->r_reply = msg;
- req->r_data.type = CEPH_OSD_DATA_TYPE_NONE;
+ req->r_data_in.type = CEPH_OSD_DATA_TYPE_NONE;
+ req->r_data_out.type = CEPH_OSD_DATA_TYPE_NONE;
ceph_pagelist_init(&req->r_trail);
/* create request message; allow space for oid */
@@ -1740,17 +1747,21 @@ int ceph_osdc_start_request(struct
ceph_osd_client *osdc,
bool nofail)
{
int rc = 0;
+ struct ceph_osd_data *osd_data;
+
+ /* Set up outgoing data */
- if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES) {
- req->r_request->pages = req->r_data.pages;
- req->r_request->page_count = req->r_data.num_pages;
- req->r_request->page_alignment = req->r_data.alignment;
+ osd_data = &req->r_data_out;
+ if (osd_data->type == CEPH_OSD_DATA_TYPE_PAGES) {
+ req->r_request->pages = osd_data->pages;
+ req->r_request->page_count = osd_data->num_pages;
+ req->r_request->page_alignment = osd_data->alignment;
#ifdef CONFIG_BLOCK
- } else if (req->r_data.type == CEPH_OSD_DATA_TYPE_BIO) {
- req->r_request->bio = req->r_data.bio;
+ } else if (osd_data->type == CEPH_OSD_DATA_TYPE_BIO) {
+ req->r_request->bio = osd_data->bio;
#endif
} else {
- pr_err("unknown request data type %d\n", req->r_data.type);
+ BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_NONE);
}
req->r_request->trail = &req->r_trail;
@@ -1939,6 +1950,7 @@ int ceph_osdc_readpages(struct ceph_osd_client *osdc,
struct page **pages, int num_pages, int page_align)
{
struct ceph_osd_request *req;
+ struct ceph_osd_data *osd_data;
int rc = 0;
dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
@@ -1951,13 +1963,15 @@ int ceph_osdc_readpages(struct ceph_osd_client
*osdc,
return PTR_ERR(req);
/* it may be a short read due to an object boundary */
- req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
- req->r_data.pages = pages;
- req->r_data.num_pages = calc_pages_for(page_align, *plen);
- req->r_data.alignment = page_align;
+
+ osd_data = &req->r_data_in;
+ osd_data->type = CEPH_OSD_DATA_TYPE_PAGES;
+ osd_data->pages = pages;
+ osd_data->num_pages = calc_pages_for(page_align, *plen);
+ osd_data->alignment = page_align;
dout("readpages final extent is %llu~%llu (%d pages align %d)\n",
- off, *plen, req->r_data.num_pages, page_align);
+ off, *plen, osd_data->num_pages, page_align);
rc = ceph_osdc_start_request(osdc, req, false);
if (!rc)
@@ -1981,6 +1995,7 @@ int ceph_osdc_writepages(struct ceph_osd_client
*osdc, struct ceph_vino vino,
struct page **pages, int num_pages)
{
struct ceph_osd_request *req;
+ struct ceph_osd_data *osd_data;
int rc = 0;
int page_align = off & ~PAGE_MASK;
@@ -1995,11 +2010,13 @@ int ceph_osdc_writepages(struct ceph_osd_client
*osdc, struct ceph_vino vino,
return PTR_ERR(req);
/* it may be a short write due to an object boundary */
- req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
- req->r_data.pages = pages;
- req->r_data.num_pages = calc_pages_for(page_align, len);
- req->r_data.alignment = page_align;
- dout("writepages %llu~%llu (%d pages)\n", off, len,
req->r_data.num_pages);
+ osd_data = &req->r_data_out;
+ osd_data->type = CEPH_OSD_DATA_TYPE_PAGES;
+ osd_data->pages = pages;
+ osd_data->num_pages = calc_pages_for(page_align, len);
+ osd_data->alignment = page_align;
+ dout("writepages %llu~%llu (%d pages)\n", off, len,
+ osd_data->num_pages);
rc = ceph_osdc_start_request(osdc, req, true);
if (!rc)
@@ -2092,28 +2109,30 @@ static struct ceph_msg *get_reply(struct
ceph_connection *con,
m = ceph_msg_get(req->r_reply);
if (data_len > 0) {
- if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES) {
+ struct ceph_osd_data *osd_data = &req->r_data_in;
+
+ if (osd_data->type == CEPH_OSD_DATA_TYPE_PAGES) {
int want;
- want = calc_pages_for(req->r_data.alignment, data_len);
- if (req->r_data.pages &&
- unlikely(req->r_data.num_pages < want)) {
+ want = calc_pages_for(osd_data->alignment, data_len);
+ if (osd_data->pages &&
+ unlikely(osd_data->num_pages < want)) {
pr_warning("tid %lld reply has %d bytes %d "
"pages, we had only %d pages ready\n",
tid, data_len, want,
- req->r_data.num_pages);
+ osd_data->num_pages);
*skip = 1;
ceph_msg_put(m);
m = NULL;
goto out;
}
- m->pages = req->r_data.pages;
- m->page_count = req->r_data.num_pages;
- m->page_alignment = req->r_data.alignment;
+ m->pages = osd_data->pages;
+ m->page_count = osd_data->num_pages;
+ m->page_alignment = osd_data->alignment;
#ifdef CONFIG_BLOCK
- } else if (req->r_data.type == CEPH_OSD_DATA_TYPE_BIO) {
- m->bio = req->r_data.bio;
+ } else if (osd_data->type == CEPH_OSD_DATA_TYPE_BIO) {
+ m->bio = osd_data->bio;
#endif
}
}
--
1.7.9.5
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] libceph: define mds_alloc_msg() method
2013-03-04 18:12 ` [PATCH 2/3] libceph: define mds_alloc_msg() method Alex Elder
@ 2013-03-04 19:05 ` Gregory Farnum
2013-03-04 19:37 ` Alex Elder
0 siblings, 1 reply; 29+ messages in thread
From: Gregory Farnum @ 2013-03-04 19:05 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel@vger.kernel.org
This looks like a faithful reshuffling to me.
But...
On Mon, Mar 4, 2013 at 10:12 AM, Alex Elder <elder@inktank.com> wrote:
> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
> index 6ec6051..c7d4278 100644
> --- a/net/ceph/messenger.c
> +++ b/net/ceph/messenger.c
> @@ -2804,55 +2804,34 @@ static int ceph_alloc_middle(struct
> ceph_connection *con, struct ceph_msg *msg)
> static int ceph_con_in_msg_alloc(struct ceph_connection *con, int *skip)
> {
> struct ceph_msg_header *hdr = &con->in_hdr;
> - int type = le16_to_cpu(hdr->type);
> - int front_len = le32_to_cpu(hdr->front_len);
> int middle_len = le32_to_cpu(hdr->middle_len);
> struct ceph_msg *msg;
> int ret = 0;
>
> BUG_ON(con->in_msg != NULL);
> + BUG_ON(!con->ops->alloc_msg);
>
> - if (con->ops->alloc_msg) {
> - mutex_unlock(&con->mutex);
> - msg = con->ops->alloc_msg(con, hdr, skip);
> - mutex_lock(&con->mutex);
> - if (con->state != CON_STATE_OPEN) {
> - if (msg)
> - ceph_msg_put(msg);
> - return -EAGAIN;
> - }
> - con->in_msg = msg;
> - if (con->in_msg) {
> - con->in_msg->con = con->ops->get(con);
> - BUG_ON(con->in_msg->con == NULL);
> - }
> - if (*skip) {
> - con->in_msg = NULL;
> - return 0;
> - }
> - if (!con->in_msg) {
> - con->error_msg =
> - "error allocating memory for incoming message";
> - return -ENOMEM;
> - }
> - }
> - if (!con->in_msg) {
> - mutex_unlock(&con->mutex);
> - msg = ceph_msg_new(type, front_len, GFP_NOFS, false);
> - mutex_lock(&con->mutex);
> - if (!msg) {
> - pr_err("unable to allocate msg type %d len %d\n",
> - type, front_len);
> - return -ENOMEM;
> - }
> - if (con->state != CON_STATE_OPEN) {
> + mutex_unlock(&con->mutex);
> + msg = con->ops->alloc_msg(con, hdr, skip);
> + mutex_lock(&con->mutex);
> + if (con->state != CON_STATE_OPEN) {
> + if (msg)
> ceph_msg_put(msg);
> - return -EAGAIN;
> - }
> - con->in_msg = msg;
> + return -EAGAIN;
> + }
> + con->in_msg = msg;
> + if (con->in_msg) {
> con->in_msg->con = con->ops->get(con);
> BUG_ON(con->in_msg->con == NULL);
> - con->in_msg->page_alignment = le16_to_cpu(hdr->data_off);
> + }
> + if (*skip) {
> + con->in_msg = NULL;
> + return 0;
This looks like a memory leak to me? I'm not familiar with the skip
infrastructure; is there something that cleans up those messages in
the background, or should this actually be a BUG_ON?
-Greg
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] libceph: drop mutex while allocating a message
2013-03-04 18:12 ` [PATCH 1/3] libceph: drop mutex while allocating a message Alex Elder
@ 2013-03-04 19:07 ` Gregory Farnum
2013-03-04 19:25 ` Alex Elder
0 siblings, 1 reply; 29+ messages in thread
From: Gregory Farnum @ 2013-03-04 19:07 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel@vger.kernel.org
On Mon, Mar 4, 2013 at 10:12 AM, Alex Elder <elder@inktank.com> wrote:
> In ceph_con_in_msg_alloc(), if no alloc_msg method is defined for a
> connection a new message is allocated with ceph_msg_new().
>
> Drop the mutex before making this call, and make sure we're still
> connected when we get it back again.
Why do we need to drop the mutex at all? The mds_alloc_msg() that
you're about to define doesn't seem to need it.
-Greg
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] libceph: no need for alignment for mds message
2013-03-04 18:12 ` [PATCH 3/3] libceph: no need for alignment for mds message Alex Elder
@ 2013-03-04 19:09 ` Gregory Farnum
0 siblings, 0 replies; 29+ messages in thread
From: Gregory Farnum @ 2013-03-04 19:09 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel@vger.kernel.org
Looks good.
Reviewed-by: Greg Farnum <greg@inktank.com>
On Mon, Mar 4, 2013 at 10:12 AM, Alex Elder <elder@inktank.com> wrote:
> Currently, incoming mds messages never use page data, which means
> there is no need to set the page_alignment field in the message.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
> fs/ceph/mds_client.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c
> index a4b986a..afb9dc9 100644
> --- a/fs/ceph/mds_client.c
> +++ b/fs/ceph/mds_client.c
> @@ -3486,7 +3486,6 @@ static struct ceph_msg *mds_alloc_msg(struct
> ceph_connection *con,
> type, front_len);
> return NULL;
> }
> - msg->page_alignment = (unsigned int) le16_to_cpu(hdr->data_off);
>
> return msg;
> }
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] libceph: drop mutex while allocating a message
2013-03-04 19:07 ` Gregory Farnum
@ 2013-03-04 19:25 ` Alex Elder
2013-03-04 19:39 ` Gregory Farnum
0 siblings, 1 reply; 29+ messages in thread
From: Alex Elder @ 2013-03-04 19:25 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org
On 03/04/2013 01:07 PM, Gregory Farnum wrote:
> On Mon, Mar 4, 2013 at 10:12 AM, Alex Elder <elder@inktank.com> wrote:
>> In ceph_con_in_msg_alloc(), if no alloc_msg method is defined for a
>> connection a new message is allocated with ceph_msg_new().
>>
>> Drop the mutex before making this call, and make sure we're still
>> connected when we get it back again.
>
> Why do we need to drop the mutex at all? The mds_alloc_msg() that
> you're about to define doesn't seem to need it.
> -Greg
>
My purpose in doing this is to make the third patch in
this series trivial to review. That is, I make the code
here match what will happen when I unify how all alloc_msg
calls get made, as a pre-step, and then it's easier to see
that the later patch is correct.
I agree, it's not needed for that allocator. By the same
token, it almost certainly causes no harm, and the end result
will be consistent handling by all users of this interface.
-Alex
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] libceph: define mds_alloc_msg() method
2013-03-04 19:05 ` Gregory Farnum
@ 2013-03-04 19:37 ` Alex Elder
0 siblings, 0 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 19:37 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org
On 03/04/2013 01:05 PM, Gregory Farnum wrote:
> This looks like a faithful reshuffling to me.
>
> But...
> On Mon, Mar 4, 2013 at 10:12 AM, Alex Elder <elder@inktank.com> wrote:
>> diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
>> index 6ec6051..c7d4278 100644
>> --- a/net/ceph/messenger.c
>> +++ b/net/ceph/messenger.c
>> @@ -2804,55 +2804,34 @@ static int ceph_alloc_middle(struct
>> ceph_connection *con, struct ceph_msg *msg)
>> static int ceph_con_in_msg_alloc(struct ceph_connection *con, int *skip)
>> {
>> struct ceph_msg_header *hdr = &con->in_hdr;
>> - int type = le16_to_cpu(hdr->type);
>> - int front_len = le32_to_cpu(hdr->front_len);
>> int middle_len = le32_to_cpu(hdr->middle_len);
>> struct ceph_msg *msg;
>> int ret = 0;
>>
>> BUG_ON(con->in_msg != NULL);
>> + BUG_ON(!con->ops->alloc_msg);
>>
>> - if (con->ops->alloc_msg) {
>> - mutex_unlock(&con->mutex);
>> - msg = con->ops->alloc_msg(con, hdr, skip);
>> - mutex_lock(&con->mutex);
>> - if (con->state != CON_STATE_OPEN) {
>> - if (msg)
>> - ceph_msg_put(msg);
>> - return -EAGAIN;
>> - }
>> - con->in_msg = msg;
>> - if (con->in_msg) {
>> - con->in_msg->con = con->ops->get(con);
>> - BUG_ON(con->in_msg->con == NULL);
>> - }
>> - if (*skip) {
>> - con->in_msg = NULL;
>> - return 0;
>> - }
>> - if (!con->in_msg) {
>> - con->error_msg =
>> - "error allocating memory for incoming message";
>> - return -ENOMEM;
>> - }
>> - }
>> - if (!con->in_msg) {
>> - mutex_unlock(&con->mutex);
>> - msg = ceph_msg_new(type, front_len, GFP_NOFS, false);
>> - mutex_lock(&con->mutex);
>> - if (!msg) {
>> - pr_err("unable to allocate msg type %d len %d\n",
>> - type, front_len);
>> - return -ENOMEM;
>> - }
>> - if (con->state != CON_STATE_OPEN) {
>> + mutex_unlock(&con->mutex);
>> + msg = con->ops->alloc_msg(con, hdr, skip);
>> + mutex_lock(&con->mutex);
>> + if (con->state != CON_STATE_OPEN) {
>> + if (msg)
>> ceph_msg_put(msg);
>> - return -EAGAIN;
>> - }
>> - con->in_msg = msg;
>> + return -EAGAIN;
>> + }
>> + con->in_msg = msg;
>> + if (con->in_msg) {
>> con->in_msg->con = con->ops->get(con);
>> BUG_ON(con->in_msg->con == NULL);
>> - con->in_msg->page_alignment = le16_to_cpu(hdr->data_off);
>> + }
>> + if (*skip) {
>> + con->in_msg = NULL;
>> + return 0;
>
> This looks like a memory leak to me? I'm not familiar with the skip
> infrastructure; is there something that cleans up those messages in
> the background, or should this actually be a BUG_ON?
It looks that way to me too, but the con->ops->get(con) call before
it makes it a little tricky. I will investigate and get back
to you, and if it is a leak I'll insert a fix for it ahead of this
in the series.
I really appreciate your reviewing these.
-Alex
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] libceph: drop mutex while allocating a message
2013-03-04 19:25 ` Alex Elder
@ 2013-03-04 19:39 ` Gregory Farnum
2013-03-04 19:52 ` Alex Elder
0 siblings, 1 reply; 29+ messages in thread
From: Gregory Farnum @ 2013-03-04 19:39 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel@vger.kernel.org
On Mon, Mar 4, 2013 at 11:25 AM, Alex Elder <elder@inktank.com> wrote:
> On 03/04/2013 01:07 PM, Gregory Farnum wrote:
>> On Mon, Mar 4, 2013 at 10:12 AM, Alex Elder <elder@inktank.com> wrote:
>>> In ceph_con_in_msg_alloc(), if no alloc_msg method is defined for a
>>> connection a new message is allocated with ceph_msg_new().
>>>
>>> Drop the mutex before making this call, and make sure we're still
>>> connected when we get it back again.
>>
>> Why do we need to drop the mutex at all? The mds_alloc_msg() that
>> you're about to define doesn't seem to need it.
>> -Greg
>>
>
> My purpose in doing this is to make the third patch in
> this series trivial to review.
The one that doesn't set the page_alignment for mds messages? *confused*
> That is, I make the code
> here match what will happen when I unify how all alloc_msg
> calls get made, as a pre-step, and then it's easier to see
> that the later patch is correct.
Oh, and we do do the mutex_lock and unlock pair around alloc_msg.
Right, looks good! :)
Reviewed-by: Greg Farnum <greg@inktank.com>
>
> I agree, it's not needed for that allocator. By the same
> token, it almost certainly causes no harm, and the end result
> will be consistent handling by all users of this interface.
>
> -Alex
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] libceph: drop mutex while allocating a message
2013-03-04 19:39 ` Gregory Farnum
@ 2013-03-04 19:52 ` Alex Elder
0 siblings, 0 replies; 29+ messages in thread
From: Alex Elder @ 2013-03-04 19:52 UTC (permalink / raw)
To: Gregory Farnum; +Cc: ceph-devel@vger.kernel.org
On 03/04/2013 01:39 PM, Gregory Farnum wrote:
>> My purpose in doing this is to make the third patch in
>> > this series trivial to review.
> The one that doesn't set the page_alignment for mds messages? *confused*
>
Sorry, second patch in the series. -Alex
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/3] libceph: a few cleanups
2013-03-04 18:08 ` [PATCH 0/3] libceph: a few cleanups Alex Elder
` (2 preceding siblings ...)
2013-03-04 18:09 ` [PATCH 3/3] libceph: rename ceph_calc_object_layout() Alex Elder
@ 2013-03-05 1:49 ` Josh Durgin
3 siblings, 0 replies; 29+ messages in thread
From: Josh Durgin @ 2013-03-05 1:49 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel
On 03/04/2013 10:08 AM, Alex Elder wrote:
> Three small cleanups in the libceph code. -Alex
>
> [PATCH 1/3] libceph: use (void *) for untyped data in osd ops
> [PATCH 2/3] libceph: kill ceph_msg->pagelist_count
> [PATCH 3/3] libceph: rename ceph_calc_object_layout()
These look good.
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/3] ceph: assign message data fields consistently
2013-03-04 18:14 ` [PATCH 0/3] ceph: assign message data fields consistently Alex Elder
` (2 preceding siblings ...)
2013-03-04 18:15 ` [PATCH 3/3] libceph: don't assign page info in ceph_osdc_new_request() Alex Elder
@ 2013-03-05 1:55 ` Josh Durgin
3 siblings, 0 replies; 29+ messages in thread
From: Josh Durgin @ 2013-03-05 1:55 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel
On 03/04/2013 10:14 AM, Alex Elder wrote:
> Rearrange code so all the fields related to message data get
> assigned the same way throughout the ceph code.
>
> -Alex
>
> [PATCH 1/3] ceph: use calc_pages_for() in start_read()
> [PATCH 2/3] ceph: simplify ceph_sync_write() page_align calculation
> [PATCH 3/3] libceph: don't assign page info in ceph_osdc_new_request()
These look good.
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/3] libceph: separate osd request data info
2013-03-04 18:17 ` [PATCH 1/3] libceph: separate osd request data info Alex Elder
@ 2013-03-05 2:13 ` Josh Durgin
0 siblings, 0 replies; 29+ messages in thread
From: Josh Durgin @ 2013-03-05 2:13 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel
This looks good, although the are a few lines that go in the next
commit. With those moved:
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
On 03/04/2013 10:17 AM, Alex Elder wrote:
> Pull the fields in an osd request structure that define the data for
> the request out into a separate structure.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
> drivers/block/rbd.c | 8 +++---
> fs/ceph/addr.c | 59
> +++++++++++++++++++++------------------
> fs/ceph/file.c | 8 +++---
> include/linux/ceph/osd_client.h | 24 ++++++++++------
> net/ceph/osd_client.c | 44 ++++++++++++++---------------
> 5 files changed, 78 insertions(+), 65 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 6c81a4c..dcf08f2 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1412,12 +1412,12 @@ static struct ceph_osd_request *rbd_osd_req_create(
> break; /* Nothing to do */
> case OBJ_REQUEST_BIO:
> rbd_assert(obj_request->bio_list != NULL);
> - osd_req->r_bio = obj_request->bio_list;
> + osd_req->r_data.bio = obj_request->bio_list;
> break;
> case OBJ_REQUEST_PAGES:
> - osd_req->r_pages = obj_request->pages;
> - osd_req->r_num_pages = obj_request->page_count;
> - osd_req->r_page_alignment = offset & ~PAGE_MASK;
> + osd_req->r_data.pages = obj_request->pages;
> + osd_req->r_data.num_pages = obj_request->page_count;
> + osd_req->r_data.alignment = offset & ~PAGE_MASK;
> break;
> }
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index e46e944..1cf7894 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -243,8 +243,9 @@ static void finish_read(struct ceph_osd_request
> *req, struct ceph_msg *msg)
> dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes);
>
> /* unlock all pages, zeroing any data we didn't read */
> - for (i = 0; i < req->r_num_pages; i++, bytes -= PAGE_CACHE_SIZE) {
> - struct page *page = req->r_pages[i];
> + BUG_ON(req->r_data.type != CEPH_OSD_DATA_TYPE_PAGES);
This belongs in the next commit.
> + for (i = 0; i < req->r_data.num_pages; i++, bytes -= PAGE_CACHE_SIZE) {
> + struct page *page = req->r_data.pages[i];
>
> if (bytes < (int)PAGE_CACHE_SIZE) {
> /* zero (remainder of) page */
> @@ -258,7 +259,7 @@ static void finish_read(struct ceph_osd_request
> *req, struct ceph_msg *msg)
> unlock_page(page);
> page_cache_release(page);
> }
> - kfree(req->r_pages);
> + kfree(req->r_data.pages);
> }
>
> static void ceph_unlock_page_vector(struct page **pages, int num_pages)
> @@ -336,9 +337,10 @@ static int start_read(struct inode *inode, struct
> list_head *page_list, int max)
> }
> pages[i] = page;
> }
> - req->r_pages = pages;
> - req->r_num_pages = nr_pages;
> - req->r_page_alignment = 0;
> + req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
This also.
> + req->r_data.pages = pages;
> + req->r_data.num_pages = nr_pages;
> + req->r_data.alignment = 0;
> req->r_callback = finish_read;
> req->r_inode = inode;
>
> @@ -374,7 +376,8 @@ static int ceph_readpages(struct file *file, struct
> address_space *mapping,
> max = (fsc->mount_options->rsize + PAGE_CACHE_SIZE - 1)
> >> PAGE_SHIFT;
>
> - dout("readpages %p file %p nr_pages %d max %d\n", inode, file, nr_pages,
> + dout("readpages %p file %p nr_pages %d max %d\n", inode,
> + file, nr_pages,
> max);
> while (!list_empty(page_list)) {
> rc = start_read(inode, page_list, max);
> @@ -560,6 +563,7 @@ static void writepages_finish(struct
> ceph_osd_request *req,
> long writeback_stat;
> unsigned issued = ceph_caps_issued(ci);
>
> + BUG_ON(req->r_data.type != CEPH_OSD_DATA_TYPE_PAGES);
Also this.
> if (rc >= 0) {
> /*
> * Assume we wrote the pages we originally sent. The
> @@ -567,7 +571,7 @@ static void writepages_finish(struct
> ceph_osd_request *req,
> * raced with a truncation and was adjusted at the osd,
> * so don't believe the reply.
> */
> - wrote = req->r_num_pages;
> + wrote = req->r_data.num_pages;
> } else {
> wrote = 0;
> mapping_set_error(mapping, rc);
> @@ -576,8 +580,8 @@ static void writepages_finish(struct
> ceph_osd_request *req,
> inode, rc, bytes, wrote);
>
> /* clean all pages */
> - for (i = 0; i < req->r_num_pages; i++) {
> - page = req->r_pages[i];
> + for (i = 0; i < req->r_data.num_pages; i++) {
> + page = req->r_data.pages[i];
> BUG_ON(!page);
> WARN_ON(!PageUptodate(page));
>
> @@ -606,31 +610,31 @@ static void writepages_finish(struct
> ceph_osd_request *req,
> unlock_page(page);
> }
> dout("%p wrote+cleaned %d pages\n", inode, wrote);
> - ceph_put_wrbuffer_cap_refs(ci, req->r_num_pages, snapc);
> + ceph_put_wrbuffer_cap_refs(ci, req->r_data.num_pages, snapc);
>
> - ceph_release_pages(req->r_pages, req->r_num_pages);
> - if (req->r_pages_from_pool)
> - mempool_free(req->r_pages,
> + ceph_release_pages(req->r_data.pages, req->r_data.num_pages);
> + if (req->r_data.pages_from_pool)
> + mempool_free(req->r_data.pages,
> ceph_sb_to_client(inode->i_sb)->wb_pagevec_pool);
> else
> - kfree(req->r_pages);
> + kfree(req->r_data.pages);
> ceph_osdc_put_request(req);
> }
>
> /*
> * allocate a page vec, either directly, or if necessary, via a the
> - * mempool. we avoid the mempool if we can because req->r_num_pages
> + * mempool. we avoid the mempool if we can because req->r_data.num_pages
> * may be less than the maximum write size.
> */
> static void alloc_page_vec(struct ceph_fs_client *fsc,
> struct ceph_osd_request *req)
> {
> - req->r_pages = kmalloc(sizeof(struct page *) * req->r_num_pages,
> + req->r_data.pages = kmalloc(sizeof(struct page *) * req->r_data.num_pages,
> GFP_NOFS);
> - if (!req->r_pages) {
> - req->r_pages = mempool_alloc(fsc->wb_pagevec_pool, GFP_NOFS);
> - req->r_pages_from_pool = 1;
> - WARN_ON(!req->r_pages);
> + if (!req->r_data.pages) {
> + req->r_data.pages = mempool_alloc(fsc->wb_pagevec_pool, GFP_NOFS);
> + req->r_data.pages_from_pool = 1;
> + WARN_ON(!req->r_data.pages);
> }
> }
>
> @@ -829,9 +833,10 @@ get_more_pages:
> break;
> }
>
> - req->r_num_pages = calc_pages_for(0, len);
> - req->r_page_alignment = 0;
> - max_pages = req->r_num_pages;
> + req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
And this.
> + req->r_data.num_pages = calc_pages_for(0, len);
> + req->r_data.alignment = 0;
> + max_pages = req->r_data.num_pages;
>
> alloc_page_vec(fsc, req);
> req->r_callback = writepages_finish;
> @@ -853,7 +858,7 @@ get_more_pages:
> }
>
> set_page_writeback(page);
> - req->r_pages[locked_pages] = page;
> + req->r_data.pages[locked_pages] = page;
> locked_pages++;
> next = page->index + 1;
> }
> @@ -883,14 +888,14 @@ get_more_pages:
> }
>
> /* submit the write */
> - offset = req->r_pages[0]->index << PAGE_CACHE_SHIFT;
> + offset = req->r_data.pages[0]->index << PAGE_CACHE_SHIFT;
> len = min((snap_size ? snap_size : i_size_read(inode)) - offset,
> (u64)locked_pages << PAGE_CACHE_SHIFT);
> dout("writepages got %d pages at %llu~%llu\n",
> locked_pages, offset, len);
>
> /* revise final length, page count */
> - req->r_num_pages = locked_pages;
> + req->r_data.num_pages = locked_pages;
> req->r_request_ops[0].extent.length = cpu_to_le64(len);
> req->r_request_ops[0].payload_len = cpu_to_le32(len);
> req->r_request->hdr.data_len = cpu_to_le32(len);
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index fe7f711..76739049 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -568,12 +568,12 @@ more:
> if ((file->f_flags & O_SYNC) == 0) {
> /* get a second commit callback */
> req->r_safe_callback = sync_write_commit;
> - req->r_own_pages = 1;
> + req->r_data.own_pages = 1;
> }
> }
> - req->r_pages = pages;
> - req->r_num_pages = num_pages;
> - req->r_page_alignment = page_align;
> + req->r_data.pages = pages;
> + req->r_data.num_pages = num_pages;
> + req->r_data.alignment = page_align;
> req->r_inode = inode;
>
> ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
> diff --git a/include/linux/ceph/osd_client.h
> b/include/linux/ceph/osd_client.h
> index 803a9db..600b827 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -50,6 +50,21 @@ struct ceph_osd {
>
> #define CEPH_OSD_MAX_OP 10
>
> +struct ceph_osd_data {
> + struct {
> + struct {
> + struct page **pages;
> + u32 num_pages;
> + u32 alignment;
> + bool pages_from_pool;
> + bool own_pages;
> + };
> +#ifdef CONFIG_BLOCK
> + struct bio *bio;
> +#endif /* CONFIG_BLOCK */
> + };
> +};
> +
> /* an in-flight request */
> struct ceph_osd_request {
> u64 r_tid; /* unique for this client */
> @@ -105,15 +120,8 @@ struct ceph_osd_request {
>
> struct ceph_file_layout r_file_layout;
> struct ceph_snap_context *r_snapc; /* snap context for writes */
> - unsigned r_num_pages; /* size of page array (follows) */
> - unsigned r_page_alignment; /* io offset in first page */
> - struct page **r_pages; /* pages for data payload */
> - int r_pages_from_pool;
> - int r_own_pages; /* if true, i own page list */
> -#ifdef CONFIG_BLOCK
> - struct bio *r_bio; /* instead of pages */
> -#endif
>
> + struct ceph_osd_data r_data;
> struct ceph_pagelist r_trail; /* trailing part of the data */
> };
>
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index de427cc..1f8c7a7 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -122,9 +122,9 @@ void ceph_osdc_release_request(struct kref *kref)
> }
> if (req->r_reply)
> ceph_msg_put(req->r_reply);
> - if (req->r_own_pages)
> - ceph_release_page_vector(req->r_pages,
> - req->r_num_pages);
> + if (req->r_data.own_pages)
> + ceph_release_page_vector(req->r_data.pages,
> + req->r_data.num_pages);
> ceph_put_snap_context(req->r_snapc);
> ceph_pagelist_release(&req->r_trail);
> if (req->r_mempool)
> @@ -1739,11 +1739,11 @@ int ceph_osdc_start_request(struct
> ceph_osd_client *osdc,
> {
> int rc = 0;
>
> - req->r_request->pages = req->r_pages;
> - req->r_request->page_count = req->r_num_pages;
> - req->r_request->page_alignment = req->r_page_alignment;
> + req->r_request->pages = req->r_data.pages;
> + req->r_request->page_count = req->r_data.num_pages;
> + req->r_request->page_alignment = req->r_data.alignment;
> #ifdef CONFIG_BLOCK
> - req->r_request->bio = req->r_bio;
> + req->r_request->bio = req->r_data.bio;
> #endif
> req->r_request->trail = &req->r_trail;
>
> @@ -1944,12 +1944,12 @@ int ceph_osdc_readpages(struct ceph_osd_client
> *osdc,
> return PTR_ERR(req);
>
> /* it may be a short read due to an object boundary */
> - req->r_pages = pages;
> - req->r_num_pages = calc_pages_for(page_align, *plen);
> - req->r_page_alignment = page_align;
> + req->r_data.pages = pages;
> + req->r_data.num_pages = calc_pages_for(page_align, *plen);
> + req->r_data.alignment = page_align;
>
> dout("readpages final extent is %llu~%llu (%d pages align %d)\n",
> - off, *plen, req->r_num_pages, page_align);
> + off, *plen, req->r_data.num_pages, page_align);
>
> rc = ceph_osdc_start_request(osdc, req, false);
> if (!rc)
> @@ -1987,10 +1987,10 @@ int ceph_osdc_writepages(struct ceph_osd_client
> *osdc, struct ceph_vino vino,
> return PTR_ERR(req);
>
> /* it may be a short write due to an object boundary */
> - req->r_pages = pages;
> - req->r_num_pages = calc_pages_for(page_align, len);
> - req->r_page_alignment = page_align;
> - dout("writepages %llu~%llu (%d pages)\n", off, len, req->r_num_pages);
> + req->r_data.pages = pages;
> + req->r_data.num_pages = calc_pages_for(page_align, len);
> + req->r_data.alignment = page_align;
> + dout("writepages %llu~%llu (%d pages)\n", off, len,
> req->r_data.num_pages);
>
> rc = ceph_osdc_start_request(osdc, req, true);
> if (!rc)
> @@ -2083,22 +2083,22 @@ static struct ceph_msg *get_reply(struct
> ceph_connection *con,
> m = ceph_msg_get(req->r_reply);
>
> if (data_len > 0) {
> - int want = calc_pages_for(req->r_page_alignment, data_len);
> + int want = calc_pages_for(req->r_data.alignment, data_len);
>
> - if (req->r_pages && unlikely(req->r_num_pages < want)) {
> + if (req->r_data.pages && unlikely(req->r_data.num_pages < want)) {
> pr_warning("tid %lld reply has %d bytes %d pages, we"
> " had only %d pages ready\n", tid, data_len,
> - want, req->r_num_pages);
> + want, req->r_data.num_pages);
> *skip = 1;
> ceph_msg_put(m);
> m = NULL;
> goto out;
> }
> - m->pages = req->r_pages;
> - m->page_count = req->r_num_pages;
> - m->page_alignment = req->r_page_alignment;
> + m->pages = req->r_data.pages;
> + m->page_count = req->r_data.num_pages;
> + m->page_alignment = req->r_data.alignment;
> #ifdef CONFIG_BLOCK
> - m->bio = req->r_bio;
> + m->bio = req->r_data.bio;
> #endif
> }
> *skip = 0;
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/3] libceph: distinguish page and bio requests
2013-03-04 18:18 ` [PATCH 2/3] libceph: distinguish page and bio requests Alex Elder
@ 2013-03-05 2:14 ` Josh Durgin
0 siblings, 0 replies; 29+ messages in thread
From: Josh Durgin @ 2013-03-05 2:14 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
On 03/04/2013 10:18 AM, Alex Elder wrote:
> An osd request uses either pages or a bio list for its data. Use a
> union to record information about the two, and add a data type
> tag to select between them.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
> drivers/block/rbd.c | 4 +++
> fs/ceph/file.c | 1 +
> include/linux/ceph/osd_client.h | 11 +++++++-
> net/ceph/osd_client.c | 56
> +++++++++++++++++++++++++--------------
> 4 files changed, 51 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index dcf08f2..70c3b39 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1412,12 +1412,16 @@ static struct ceph_osd_request *rbd_osd_req_create(
> break; /* Nothing to do */
> case OBJ_REQUEST_BIO:
> rbd_assert(obj_request->bio_list != NULL);
> + osd_req->r_data.type = CEPH_OSD_DATA_TYPE_BIO;
> osd_req->r_data.bio = obj_request->bio_list;
> break;
> case OBJ_REQUEST_PAGES:
> + osd_req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> osd_req->r_data.pages = obj_request->pages;
> osd_req->r_data.num_pages = obj_request->page_count;
> osd_req->r_data.alignment = offset & ~PAGE_MASK;
> + osd_req->r_data.pages_from_pool = false;
> + osd_req->r_data.own_pages = false;
> break;
> }
>
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 76739049..0f7b3f4 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -571,6 +571,7 @@ more:
> req->r_data.own_pages = 1;
> }
> }
> + req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> req->r_data.pages = pages;
> req->r_data.num_pages = num_pages;
> req->r_data.alignment = page_align;
> diff --git a/include/linux/ceph/osd_client.h
> b/include/linux/ceph/osd_client.h
> index 600b827..56604b3 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -50,8 +50,17 @@ struct ceph_osd {
>
> #define CEPH_OSD_MAX_OP 10
>
> +enum ceph_osd_data_type {
> + CEPH_OSD_DATA_TYPE_NONE,
> + CEPH_OSD_DATA_TYPE_PAGES,
> +#ifdef CONFIG_BLOCK
> + CEPH_OSD_DATA_TYPE_BIO,
> +#endif /* CONFIG_BLOCK */
> +};
> +
> struct ceph_osd_data {
> - struct {
> + enum ceph_osd_data_type type;
> + union {
> struct {
> struct page **pages;
> u32 num_pages;
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 1f8c7a7..591e1b0 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -122,7 +122,8 @@ void ceph_osdc_release_request(struct kref *kref)
> }
> if (req->r_reply)
> ceph_msg_put(req->r_reply);
> - if (req->r_data.own_pages)
> + if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES &&
> + req->r_data.own_pages)
> ceph_release_page_vector(req->r_data.pages,
> req->r_data.num_pages);
> ceph_put_snap_context(req->r_snapc);
> @@ -188,6 +189,7 @@ struct ceph_osd_request
> *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
> }
> req->r_reply = msg;
>
> + req->r_data.type = CEPH_OSD_DATA_TYPE_NONE;
> ceph_pagelist_init(&req->r_trail);
>
> /* create request message; allow space for oid */
> @@ -1739,12 +1741,17 @@ int ceph_osdc_start_request(struct
> ceph_osd_client *osdc,
> {
> int rc = 0;
>
> - req->r_request->pages = req->r_data.pages;
> - req->r_request->page_count = req->r_data.num_pages;
> - req->r_request->page_alignment = req->r_data.alignment;
> + if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES) {
> + req->r_request->pages = req->r_data.pages;
> + req->r_request->page_count = req->r_data.num_pages;
> + req->r_request->page_alignment = req->r_data.alignment;
> #ifdef CONFIG_BLOCK
> - req->r_request->bio = req->r_data.bio;
> + } else if (req->r_data.type == CEPH_OSD_DATA_TYPE_BIO) {
> + req->r_request->bio = req->r_data.bio;
> #endif
> + } else {
> + pr_err("unknown request data type %d\n", req->r_data.type);
> + }
> req->r_request->trail = &req->r_trail;
>
> register_request(osdc, req);
> @@ -1944,6 +1951,7 @@ int ceph_osdc_readpages(struct ceph_osd_client *osdc,
> return PTR_ERR(req);
>
> /* it may be a short read due to an object boundary */
> + req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> req->r_data.pages = pages;
> req->r_data.num_pages = calc_pages_for(page_align, *plen);
> req->r_data.alignment = page_align;
> @@ -1987,6 +1995,7 @@ int ceph_osdc_writepages(struct ceph_osd_client
> *osdc, struct ceph_vino vino,
> return PTR_ERR(req);
>
> /* it may be a short write due to an object boundary */
> + req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> req->r_data.pages = pages;
> req->r_data.num_pages = calc_pages_for(page_align, len);
> req->r_data.alignment = page_align;
> @@ -2083,23 +2092,30 @@ static struct ceph_msg *get_reply(struct
> ceph_connection *con,
> m = ceph_msg_get(req->r_reply);
>
> if (data_len > 0) {
> - int want = calc_pages_for(req->r_data.alignment, data_len);
> -
> - if (req->r_data.pages && unlikely(req->r_data.num_pages < want)) {
> - pr_warning("tid %lld reply has %d bytes %d pages, we"
> - " had only %d pages ready\n", tid, data_len,
> - want, req->r_data.num_pages);
> - *skip = 1;
> - ceph_msg_put(m);
> - m = NULL;
> - goto out;
> - }
> - m->pages = req->r_data.pages;
> - m->page_count = req->r_data.num_pages;
> - m->page_alignment = req->r_data.alignment;
> + if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES) {
> + int want;
> +
> + want = calc_pages_for(req->r_data.alignment, data_len);
> + if (req->r_data.pages &&
> + unlikely(req->r_data.num_pages < want)) {
> +
> + pr_warning("tid %lld reply has %d bytes %d "
> + "pages, we had only %d pages ready\n",
> + tid, data_len, want,
> + req->r_data.num_pages);
> + *skip = 1;
> + ceph_msg_put(m);
> + m = NULL;
> + goto out;
> + }
> + m->pages = req->r_data.pages;
> + m->page_count = req->r_data.num_pages;
> + m->page_alignment = req->r_data.alignment;
> #ifdef CONFIG_BLOCK
> - m->bio = req->r_data.bio;
> + } else if (req->r_data.type == CEPH_OSD_DATA_TYPE_BIO) {
> + m->bio = req->r_data.bio;
> #endif
> + }
> }
> *skip = 0;
> req->r_con_filling_msg = con->ops->get(con);
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 3/3] libceph: separate read and write data
2013-03-04 18:18 ` [PATCH 3/3] libceph: separate read and write data Alex Elder
@ 2013-03-05 2:15 ` Josh Durgin
0 siblings, 0 replies; 29+ messages in thread
From: Josh Durgin @ 2013-03-05 2:15 UTC (permalink / raw)
To: Alex Elder; +Cc: ceph-devel
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
On 03/04/2013 10:18 AM, Alex Elder wrote:
> An osd request defines information about where data to be read
> should be placed as well as where data to write comes from.
> Currently these are represented by common fields.
>
> Keep information about data for writing separate from data to be
> read by splitting these into data_in and data_out fields.
>
> This is the key patch in this whole series, in that it actually
> identifies which osd requests generate outgoing data and which
> generate incoming data. It's less obvious (currently) that an osd
> CALL op generates both outgoing and incoming data; that's the focus
> of some upcoming work.
>
> This resolves:
> http://tracker.ceph.com/issues/4127
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
> drivers/block/rbd.c | 18 +++++----
> fs/ceph/addr.c | 67 ++++++++++++++++---------------
> fs/ceph/file.c | 10 ++---
> include/linux/ceph/osd_client.h | 5 ++-
> net/ceph/osd_client.c | 83
> ++++++++++++++++++++++++---------------
> 5 files changed, 105 insertions(+), 78 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 70c3b39..a0a6182 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1385,6 +1385,7 @@ static struct ceph_osd_request *rbd_osd_req_create(
> struct ceph_snap_context *snapc = NULL;
> struct ceph_osd_client *osdc;
> struct ceph_osd_request *osd_req;
> + struct ceph_osd_data *osd_data;
> struct timespec now;
> struct timespec *mtime;
> u64 snap_id = CEPH_NOSNAP;
> @@ -1405,6 +1406,7 @@ static struct ceph_osd_request *rbd_osd_req_create(
> osd_req = ceph_osdc_alloc_request(osdc, snapc, 1, false, GFP_ATOMIC);
> if (!osd_req)
> return NULL; /* ENOMEM */
> + osd_data = write_request ? &osd_req->r_data_out : &osd_req->r_data_in;
>
> rbd_assert(obj_request_type_valid(obj_request->type));
> switch (obj_request->type) {
> @@ -1412,16 +1414,16 @@ static struct ceph_osd_request *rbd_osd_req_create(
> break; /* Nothing to do */
> case OBJ_REQUEST_BIO:
> rbd_assert(obj_request->bio_list != NULL);
> - osd_req->r_data.type = CEPH_OSD_DATA_TYPE_BIO;
> - osd_req->r_data.bio = obj_request->bio_list;
> + osd_data->type = CEPH_OSD_DATA_TYPE_BIO;
> + osd_data->bio = obj_request->bio_list;
> break;
> case OBJ_REQUEST_PAGES:
> - osd_req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> - osd_req->r_data.pages = obj_request->pages;
> - osd_req->r_data.num_pages = obj_request->page_count;
> - osd_req->r_data.alignment = offset & ~PAGE_MASK;
> - osd_req->r_data.pages_from_pool = false;
> - osd_req->r_data.own_pages = false;
> + osd_data->type = CEPH_OSD_DATA_TYPE_PAGES;
> + osd_data->pages = obj_request->pages;
> + osd_data->num_pages = obj_request->page_count;
> + osd_data->alignment = offset & ~PAGE_MASK;
> + osd_data->pages_from_pool = false;
> + osd_data->own_pages = false;
> break;
> }
>
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 1cf7894..6b223d5 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -243,9 +243,9 @@ static void finish_read(struct ceph_osd_request
> *req, struct ceph_msg *msg)
> dout("finish_read %p req %p rc %d bytes %d\n", inode, req, rc, bytes);
>
> /* unlock all pages, zeroing any data we didn't read */
> - BUG_ON(req->r_data.type != CEPH_OSD_DATA_TYPE_PAGES);
> - for (i = 0; i < req->r_data.num_pages; i++, bytes -= PAGE_CACHE_SIZE) {
> - struct page *page = req->r_data.pages[i];
> + BUG_ON(req->r_data_in.type != CEPH_OSD_DATA_TYPE_PAGES);
> + for (i = 0; i < req->r_data_in.num_pages; i++) {
> + struct page *page = req->r_data_in.pages[i];
>
> if (bytes < (int)PAGE_CACHE_SIZE) {
> /* zero (remainder of) page */
> @@ -258,8 +258,9 @@ static void finish_read(struct ceph_osd_request
> *req, struct ceph_msg *msg)
> SetPageUptodate(page);
> unlock_page(page);
> page_cache_release(page);
> + bytes -= PAGE_CACHE_SIZE;
> }
> - kfree(req->r_data.pages);
> + kfree(req->r_data_in.pages);
> }
>
> static void ceph_unlock_page_vector(struct page **pages, int num_pages)
> @@ -337,10 +338,10 @@ static int start_read(struct inode *inode, struct
> list_head *page_list, int max)
> }
> pages[i] = page;
> }
> - req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> - req->r_data.pages = pages;
> - req->r_data.num_pages = nr_pages;
> - req->r_data.alignment = 0;
> + req->r_data_in.type = CEPH_OSD_DATA_TYPE_PAGES;
> + req->r_data_in.pages = pages;
> + req->r_data_in.num_pages = nr_pages;
> + req->r_data_in.alignment = 0;
> req->r_callback = finish_read;
> req->r_inode = inode;
>
> @@ -563,7 +564,7 @@ static void writepages_finish(struct
> ceph_osd_request *req,
> long writeback_stat;
> unsigned issued = ceph_caps_issued(ci);
>
> - BUG_ON(req->r_data.type != CEPH_OSD_DATA_TYPE_PAGES);
> + BUG_ON(req->r_data_out.type != CEPH_OSD_DATA_TYPE_PAGES);
> if (rc >= 0) {
> /*
> * Assume we wrote the pages we originally sent. The
> @@ -571,7 +572,7 @@ static void writepages_finish(struct
> ceph_osd_request *req,
> * raced with a truncation and was adjusted at the osd,
> * so don't believe the reply.
> */
> - wrote = req->r_data.num_pages;
> + wrote = req->r_data_out.num_pages;
> } else {
> wrote = 0;
> mapping_set_error(mapping, rc);
> @@ -580,8 +581,8 @@ static void writepages_finish(struct
> ceph_osd_request *req,
> inode, rc, bytes, wrote);
>
> /* clean all pages */
> - for (i = 0; i < req->r_data.num_pages; i++) {
> - page = req->r_data.pages[i];
> + for (i = 0; i < req->r_data_out.num_pages; i++) {
> + page = req->r_data_out.pages[i];
> BUG_ON(!page);
> WARN_ON(!PageUptodate(page));
>
> @@ -610,31 +611,34 @@ static void writepages_finish(struct
> ceph_osd_request *req,
> unlock_page(page);
> }
> dout("%p wrote+cleaned %d pages\n", inode, wrote);
> - ceph_put_wrbuffer_cap_refs(ci, req->r_data.num_pages, snapc);
> + ceph_put_wrbuffer_cap_refs(ci, req->r_data_out.num_pages, snapc);
>
> - ceph_release_pages(req->r_data.pages, req->r_data.num_pages);
> - if (req->r_data.pages_from_pool)
> - mempool_free(req->r_data.pages,
> + ceph_release_pages(req->r_data_out.pages, req->r_data_out.num_pages);
> + if (req->r_data_out.pages_from_pool)
> + mempool_free(req->r_data_out.pages,
> ceph_sb_to_client(inode->i_sb)->wb_pagevec_pool);
> else
> - kfree(req->r_data.pages);
> + kfree(req->r_data_out.pages);
> ceph_osdc_put_request(req);
> }
>
> /*
> * allocate a page vec, either directly, or if necessary, via a the
> - * mempool. we avoid the mempool if we can because req->r_data.num_pages
> + * mempool. we avoid the mempool if we can because
> req->r_data_out.num_pages
> * may be less than the maximum write size.
> */
> static void alloc_page_vec(struct ceph_fs_client *fsc,
> struct ceph_osd_request *req)
> {
> - req->r_data.pages = kmalloc(sizeof(struct page *) * req->r_data.num_pages,
> - GFP_NOFS);
> - if (!req->r_data.pages) {
> - req->r_data.pages = mempool_alloc(fsc->wb_pagevec_pool, GFP_NOFS);
> - req->r_data.pages_from_pool = 1;
> - WARN_ON(!req->r_data.pages);
> + size_t size;
> +
> + size = sizeof (struct page *) * req->r_data_out.num_pages;
> + req->r_data_out.pages = kmalloc(size, GFP_NOFS);
> + if (!req->r_data_out.pages) {
> + req->r_data_out.pages = mempool_alloc(fsc->wb_pagevec_pool,
> + GFP_NOFS);
> + req->r_data_out.pages_from_pool = 1;
> + WARN_ON(!req->r_data_out.pages);
> }
> }
>
> @@ -833,10 +837,11 @@ get_more_pages:
> break;
> }
>
> - req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> - req->r_data.num_pages = calc_pages_for(0, len);
> - req->r_data.alignment = 0;
> - max_pages = req->r_data.num_pages;
> + req->r_data_out.type = CEPH_OSD_DATA_TYPE_PAGES;
> + req->r_data_out.num_pages =
> + calc_pages_for(0, len);
> + req->r_data_out.alignment = 0;
> + max_pages = req->r_data_out.num_pages;
>
> alloc_page_vec(fsc, req);
> req->r_callback = writepages_finish;
> @@ -858,7 +863,7 @@ get_more_pages:
> }
>
> set_page_writeback(page);
> - req->r_data.pages[locked_pages] = page;
> + req->r_data_out.pages[locked_pages] = page;
> locked_pages++;
> next = page->index + 1;
> }
> @@ -888,14 +893,14 @@ get_more_pages:
> }
>
> /* submit the write */
> - offset = req->r_data.pages[0]->index << PAGE_CACHE_SHIFT;
> + offset = req->r_data_out.pages[0]->index << PAGE_CACHE_SHIFT;
> len = min((snap_size ? snap_size : i_size_read(inode)) - offset,
> (u64)locked_pages << PAGE_CACHE_SHIFT);
> dout("writepages got %d pages at %llu~%llu\n",
> locked_pages, offset, len);
>
> /* revise final length, page count */
> - req->r_data.num_pages = locked_pages;
> + req->r_data_out.num_pages = locked_pages;
> req->r_request_ops[0].extent.length = cpu_to_le64(len);
> req->r_request_ops[0].payload_len = cpu_to_le32(len);
> req->r_request->hdr.data_len = cpu_to_le32(len);
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index 0f7b3f4..431cd7b 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -568,13 +568,13 @@ more:
> if ((file->f_flags & O_SYNC) == 0) {
> /* get a second commit callback */
> req->r_safe_callback = sync_write_commit;
> - req->r_data.own_pages = 1;
> + req->r_data_out.own_pages = 1;
> }
> }
> - req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> - req->r_data.pages = pages;
> - req->r_data.num_pages = num_pages;
> - req->r_data.alignment = page_align;
> + req->r_data_out.type = CEPH_OSD_DATA_TYPE_PAGES;
> + req->r_data_out.pages = pages;
> + req->r_data_out.num_pages = num_pages;
> + req->r_data_out.alignment = page_align;
> req->r_inode = inode;
>
> ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
> diff --git a/include/linux/ceph/osd_client.h
> b/include/linux/ceph/osd_client.h
> index 56604b3..40e0260 100644
> --- a/include/linux/ceph/osd_client.h
> +++ b/include/linux/ceph/osd_client.h
> @@ -130,8 +130,9 @@ struct ceph_osd_request {
> struct ceph_file_layout r_file_layout;
> struct ceph_snap_context *r_snapc; /* snap context for writes */
>
> - struct ceph_osd_data r_data;
> - struct ceph_pagelist r_trail; /* trailing part of the data */
> + struct ceph_osd_data r_data_in;
> + struct ceph_osd_data r_data_out;
> + struct ceph_pagelist r_trail; /* trailing part of data out */
> };
>
> struct ceph_osd_event {
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index 591e1b0..f9cf445 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -122,10 +122,16 @@ void ceph_osdc_release_request(struct kref *kref)
> }
> if (req->r_reply)
> ceph_msg_put(req->r_reply);
> - if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES &&
> - req->r_data.own_pages)
> - ceph_release_page_vector(req->r_data.pages,
> - req->r_data.num_pages);
> +
> + if (req->r_data_in.type == CEPH_OSD_DATA_TYPE_PAGES &&
> + req->r_data_in.own_pages)
> + ceph_release_page_vector(req->r_data_in.pages,
> + req->r_data_in.num_pages);
> + if (req->r_data_out.type == CEPH_OSD_DATA_TYPE_PAGES &&
> + req->r_data_out.own_pages)
> + ceph_release_page_vector(req->r_data_out.pages,
> + req->r_data_out.num_pages);
> +
> ceph_put_snap_context(req->r_snapc);
> ceph_pagelist_release(&req->r_trail);
> if (req->r_mempool)
> @@ -189,7 +195,8 @@ struct ceph_osd_request
> *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
> }
> req->r_reply = msg;
>
> - req->r_data.type = CEPH_OSD_DATA_TYPE_NONE;
> + req->r_data_in.type = CEPH_OSD_DATA_TYPE_NONE;
> + req->r_data_out.type = CEPH_OSD_DATA_TYPE_NONE;
> ceph_pagelist_init(&req->r_trail);
>
> /* create request message; allow space for oid */
> @@ -1740,17 +1747,21 @@ int ceph_osdc_start_request(struct
> ceph_osd_client *osdc,
> bool nofail)
> {
> int rc = 0;
> + struct ceph_osd_data *osd_data;
> +
> + /* Set up outgoing data */
>
> - if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES) {
> - req->r_request->pages = req->r_data.pages;
> - req->r_request->page_count = req->r_data.num_pages;
> - req->r_request->page_alignment = req->r_data.alignment;
> + osd_data = &req->r_data_out;
> + if (osd_data->type == CEPH_OSD_DATA_TYPE_PAGES) {
> + req->r_request->pages = osd_data->pages;
> + req->r_request->page_count = osd_data->num_pages;
> + req->r_request->page_alignment = osd_data->alignment;
> #ifdef CONFIG_BLOCK
> - } else if (req->r_data.type == CEPH_OSD_DATA_TYPE_BIO) {
> - req->r_request->bio = req->r_data.bio;
> + } else if (osd_data->type == CEPH_OSD_DATA_TYPE_BIO) {
> + req->r_request->bio = osd_data->bio;
> #endif
> } else {
> - pr_err("unknown request data type %d\n", req->r_data.type);
> + BUG_ON(osd_data->type != CEPH_OSD_DATA_TYPE_NONE);
> }
> req->r_request->trail = &req->r_trail;
>
> @@ -1939,6 +1950,7 @@ int ceph_osdc_readpages(struct ceph_osd_client *osdc,
> struct page **pages, int num_pages, int page_align)
> {
> struct ceph_osd_request *req;
> + struct ceph_osd_data *osd_data;
> int rc = 0;
>
> dout("readpages on ino %llx.%llx on %llu~%llu\n", vino.ino,
> @@ -1951,13 +1963,15 @@ int ceph_osdc_readpages(struct ceph_osd_client
> *osdc,
> return PTR_ERR(req);
>
> /* it may be a short read due to an object boundary */
> - req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> - req->r_data.pages = pages;
> - req->r_data.num_pages = calc_pages_for(page_align, *plen);
> - req->r_data.alignment = page_align;
> +
> + osd_data = &req->r_data_in;
> + osd_data->type = CEPH_OSD_DATA_TYPE_PAGES;
> + osd_data->pages = pages;
> + osd_data->num_pages = calc_pages_for(page_align, *plen);
> + osd_data->alignment = page_align;
>
> dout("readpages final extent is %llu~%llu (%d pages align %d)\n",
> - off, *plen, req->r_data.num_pages, page_align);
> + off, *plen, osd_data->num_pages, page_align);
>
> rc = ceph_osdc_start_request(osdc, req, false);
> if (!rc)
> @@ -1981,6 +1995,7 @@ int ceph_osdc_writepages(struct ceph_osd_client
> *osdc, struct ceph_vino vino,
> struct page **pages, int num_pages)
> {
> struct ceph_osd_request *req;
> + struct ceph_osd_data *osd_data;
> int rc = 0;
> int page_align = off & ~PAGE_MASK;
>
> @@ -1995,11 +2010,13 @@ int ceph_osdc_writepages(struct ceph_osd_client
> *osdc, struct ceph_vino vino,
> return PTR_ERR(req);
>
> /* it may be a short write due to an object boundary */
> - req->r_data.type = CEPH_OSD_DATA_TYPE_PAGES;
> - req->r_data.pages = pages;
> - req->r_data.num_pages = calc_pages_for(page_align, len);
> - req->r_data.alignment = page_align;
> - dout("writepages %llu~%llu (%d pages)\n", off, len,
> req->r_data.num_pages);
> + osd_data = &req->r_data_out;
> + osd_data->type = CEPH_OSD_DATA_TYPE_PAGES;
> + osd_data->pages = pages;
> + osd_data->num_pages = calc_pages_for(page_align, len);
> + osd_data->alignment = page_align;
> + dout("writepages %llu~%llu (%d pages)\n", off, len,
> + osd_data->num_pages);
>
> rc = ceph_osdc_start_request(osdc, req, true);
> if (!rc)
> @@ -2092,28 +2109,30 @@ static struct ceph_msg *get_reply(struct
> ceph_connection *con,
> m = ceph_msg_get(req->r_reply);
>
> if (data_len > 0) {
> - if (req->r_data.type == CEPH_OSD_DATA_TYPE_PAGES) {
> + struct ceph_osd_data *osd_data = &req->r_data_in;
> +
> + if (osd_data->type == CEPH_OSD_DATA_TYPE_PAGES) {
> int want;
>
> - want = calc_pages_for(req->r_data.alignment, data_len);
> - if (req->r_data.pages &&
> - unlikely(req->r_data.num_pages < want)) {
> + want = calc_pages_for(osd_data->alignment, data_len);
> + if (osd_data->pages &&
> + unlikely(osd_data->num_pages < want)) {
>
> pr_warning("tid %lld reply has %d bytes %d "
> "pages, we had only %d pages ready\n",
> tid, data_len, want,
> - req->r_data.num_pages);
> + osd_data->num_pages);
> *skip = 1;
> ceph_msg_put(m);
> m = NULL;
> goto out;
> }
> - m->pages = req->r_data.pages;
> - m->page_count = req->r_data.num_pages;
> - m->page_alignment = req->r_data.alignment;
> + m->pages = osd_data->pages;
> + m->page_count = osd_data->num_pages;
> + m->page_alignment = osd_data->alignment;
> #ifdef CONFIG_BLOCK
> - } else if (req->r_data.type == CEPH_OSD_DATA_TYPE_BIO) {
> - m->bio = req->r_data.bio;
> + } else if (osd_data->type == CEPH_OSD_DATA_TYPE_BIO) {
> + m->bio = osd_data->bio;
> #endif
> }
> }
>
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2013-03-05 2:15 UTC | newest]
Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-03-04 18:05 OSD Data Read/Write Separation Patches Alex Elder
2013-03-04 18:08 ` [PATCH 0/3] libceph: a few cleanups Alex Elder
2013-03-04 18:09 ` [PATCH 1/3] libceph: use (void *) for untyped data in osd ops Alex Elder
2013-03-04 18:09 ` [PATCH 2/3] libceph: kill ceph_msg->pagelist_count Alex Elder
2013-03-04 18:09 ` [PATCH 3/3] libceph: rename ceph_calc_object_layout() Alex Elder
2013-03-05 1:49 ` [PATCH 0/3] libceph: a few cleanups Josh Durgin
2013-03-04 18:10 ` [PATCH 0/3] libceph: simplify incoming message allocation Alex Elder
2013-03-04 18:12 ` [PATCH 1/3] libceph: drop mutex while allocating a message Alex Elder
2013-03-04 19:07 ` Gregory Farnum
2013-03-04 19:25 ` Alex Elder
2013-03-04 19:39 ` Gregory Farnum
2013-03-04 19:52 ` Alex Elder
2013-03-04 18:12 ` [PATCH 2/3] libceph: define mds_alloc_msg() method Alex Elder
2013-03-04 19:05 ` Gregory Farnum
2013-03-04 19:37 ` Alex Elder
2013-03-04 18:12 ` [PATCH 3/3] libceph: no need for alignment for mds message Alex Elder
2013-03-04 19:09 ` Gregory Farnum
2013-03-04 18:14 ` [PATCH 0/3] ceph: assign message data fields consistently Alex Elder
2013-03-04 18:15 ` [PATCH 1/3] ceph: use calc_pages_for() in start_read() Alex Elder
2013-03-04 18:15 ` [PATCH 2/3] ceph: simplify ceph_sync_write() page_align calculation Alex Elder
2013-03-04 18:15 ` [PATCH 3/3] libceph: don't assign page info in ceph_osdc_new_request() Alex Elder
2013-03-05 1:55 ` [PATCH 0/3] ceph: assign message data fields consistently Josh Durgin
2013-03-04 18:16 ` [PATCH 0/3] libceph: distinguish osd request read and write data Alex Elder
2013-03-04 18:17 ` [PATCH 1/3] libceph: separate osd request data info Alex Elder
2013-03-05 2:13 ` Josh Durgin
2013-03-04 18:18 ` [PATCH 2/3] libceph: distinguish page and bio requests Alex Elder
2013-03-05 2:14 ` Josh Durgin
2013-03-04 18:18 ` [PATCH 3/3] libceph: separate read and write data Alex Elder
2013-03-05 2:15 ` Josh Durgin
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.