From: Liu Yuan <namei.unix@gmail.com>
To: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp>
Cc: kwolf@redhat.com, mitake.hitoshi@lab.ntt.co.jp,
sheepdog@lists.wpkg.org, qemu-devel@nongnu.org,
stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v4] sheepdog: selectable object size support
Date: Tue, 10 Feb 2015 11:18:25 +0800 [thread overview]
Message-ID: <20150210031825.GC3859@ubuntu-trusty> (raw)
In-Reply-To: <20150210031051.GB3859@ubuntu-trusty>
On Tue, Feb 10, 2015 at 11:10:51AM +0800, Liu Yuan wrote:
> On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote:
> > Previously, qemu block driver of sheepdog used hard-coded VDI object size.
> > This patch enables users to handle "block_size_shift" value for
> > calculating VDI object size.
> >
> > When you start qemu, you don't need to specify additional command option.
> >
> > But when you create the VDI which doesn't have default object size
> > with qemu-img command, you specify block_size_shift option.
> >
> > If you want to create a VDI of 8MB(1 << 23) object size,
> > you need to specify following command option.
> >
> > # qemu-img create -o block_size_shift=23 sheepdog:test1 100M
> >
> > In addition, when you don't specify qemu-img command option,
> > a default value of sheepdog cluster is used for creating VDI.
> >
> > # qemu-img create sheepdog:test2 100M
> >
> > Signed-off-by: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp>
> > ---
> > V4:
> > - Limit a read/write buffer size for creating a preallocated VDI.
> > - Replace a parse function for the block_size_shift option.
> > - Fix an error message.
> >
> > V3:
> > - Delete the needless operation of buffer.
> > - Delete the needless operations of request header.
> > for SD_OP_GET_CLUSTER_DEFAULT.
> > - Fix coding style problems.
> >
> > V2:
> > - Fix coding style problem (white space).
> > - Add members, store_policy and block_size_shift to struct SheepdogVdiReq.
> > - Initialize request header to use block_size_shift specified by user.
> > ---
> > block/sheepdog.c | 138 ++++++++++++++++++++++++++++++++++++++-------
> > include/block/block_int.h | 1 +
> > 2 files changed, 119 insertions(+), 20 deletions(-)
> >
> > diff --git a/block/sheepdog.c b/block/sheepdog.c
> > index be3176f..a43b947 100644
> > --- a/block/sheepdog.c
> > +++ b/block/sheepdog.c
> > @@ -37,6 +37,7 @@
> > #define SD_OP_READ_VDIS 0x15
> > #define SD_OP_FLUSH_VDI 0x16
> > #define SD_OP_DEL_VDI 0x17
> > +#define SD_OP_GET_CLUSTER_DEFAULT 0x18
> >
> > #define SD_FLAG_CMD_WRITE 0x01
> > #define SD_FLAG_CMD_COW 0x02
> > @@ -167,7 +168,8 @@ typedef struct SheepdogVdiReq {
> > uint32_t base_vdi_id;
> > uint8_t copies;
> > uint8_t copy_policy;
> > - uint8_t reserved[2];
> > + uint8_t store_policy;
> > + uint8_t block_size_shift;
> > uint32_t snapid;
> > uint32_t type;
> > uint32_t pad[2];
> > @@ -186,6 +188,21 @@ typedef struct SheepdogVdiRsp {
> > uint32_t pad[5];
> > } SheepdogVdiRsp;
> >
> > +typedef struct SheepdogClusterRsp {
> > + uint8_t proto_ver;
> > + uint8_t opcode;
> > + uint16_t flags;
> > + uint32_t epoch;
> > + uint32_t id;
> > + uint32_t data_length;
> > + uint32_t result;
> > + uint8_t nr_copies;
> > + uint8_t copy_policy;
> > + uint8_t block_size_shift;
> > + uint8_t __pad1;
> > + uint32_t __pad2[6];
> > +} SheepdogClusterRsp;
> > +
> > typedef struct SheepdogInode {
> > char name[SD_MAX_VDI_LEN];
> > char tag[SD_MAX_VDI_TAG_LEN];
> > @@ -1544,6 +1561,7 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
> > hdr.vdi_size = s->inode.vdi_size;
> > hdr.copy_policy = s->inode.copy_policy;
> > hdr.copies = s->inode.nr_copies;
> > + hdr.block_size_shift = s->inode.block_size_shift;
> >
> > ret = do_req(fd, s->aio_context, (SheepdogReq *)&hdr, buf, &wlen, &rlen);
> >
> > @@ -1569,9 +1587,12 @@ static int do_sd_create(BDRVSheepdogState *s, uint32_t *vdi_id, int snapshot,
> > static int sd_prealloc(const char *filename, Error **errp)
> > {
> > BlockDriverState *bs = NULL;
> > + BDRVSheepdogState *base = NULL;
> > + unsigned long buf_size;
> > uint32_t idx, max_idx;
> > + uint32_t object_size;
> > int64_t vdi_size;
> > - void *buf = g_malloc0(SD_DATA_OBJ_SIZE);
> > + void *buf = NULL;
> > int ret;
> >
> > ret = bdrv_open(&bs, filename, NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
> > @@ -1585,18 +1606,24 @@ static int sd_prealloc(const char *filename, Error **errp)
> > ret = vdi_size;
> > goto out;
> > }
> > - max_idx = DIV_ROUND_UP(vdi_size, SD_DATA_OBJ_SIZE);
> > +
> > + base = bs->opaque;
> > + object_size = (UINT32_C(1) << base->inode.block_size_shift);
> > + buf_size = MIN(object_size, SD_DATA_OBJ_SIZE);
> > + buf = g_malloc0(buf_size);
> > +
> > + max_idx = DIV_ROUND_UP(vdi_size, buf_size);
> >
> > for (idx = 0; idx < max_idx; idx++) {
> > /*
> > * The created image can be a cloned image, so we need to read
> > * a data from the source image.
> > */
> > - ret = bdrv_pread(bs, idx * SD_DATA_OBJ_SIZE, buf, SD_DATA_OBJ_SIZE);
> > + ret = bdrv_pread(bs, idx * buf_size, buf, buf_size);
> > if (ret < 0) {
> > goto out;
> > }
> > - ret = bdrv_pwrite(bs, idx * SD_DATA_OBJ_SIZE, buf, SD_DATA_OBJ_SIZE);
> > + ret = bdrv_pwrite(bs, idx * buf_size, buf, buf_size);
> > if (ret < 0) {
> > goto out;
> > }
> > @@ -1669,6 +1696,21 @@ static int parse_redundancy(BDRVSheepdogState *s, const char *opt)
> > return 0;
> > }
> >
> > +static int parse_block_size_shift(BDRVSheepdogState *s, QemuOpts *opt)
> > +{
> > + struct SheepdogInode *inode = &s->inode;
> > + inode->block_size_shift =
> > + (uint8_t)qemu_opt_get_number_del(opt, "block_size_shift", 0);
> > + if (inode->block_size_shift == 0) {
> > + /* block_size_shift is set for cluster default value by sheepdog */
> > + return 0;
> > + } else if (inode->block_size_shift < 20 || inode->block_size_shift > 31) {
> > + return -EINVAL;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > static int sd_create(const char *filename, QemuOpts *opts,
> > Error **errp)
> > {
> > @@ -1679,6 +1721,7 @@ static int sd_create(const char *filename, QemuOpts *opts,
> > BDRVSheepdogState *s;
> > char tag[SD_MAX_VDI_TAG_LEN];
> > uint32_t snapid;
> > + uint64_t max_vdi_size;
> > bool prealloc = false;
> >
> > s = g_new0(BDRVSheepdogState, 1);
> > @@ -1718,9 +1761,10 @@ static int sd_create(const char *filename, QemuOpts *opts,
> > }
> > }
> >
> > - if (s->inode.vdi_size > SD_MAX_VDI_SIZE) {
> > - error_setg(errp, "too big image size");
> > - ret = -EINVAL;
> > + ret = parse_block_size_shift(s, opts);
> > + if (ret < 0) {
> > + error_setg(errp, "Invalid block_size_shift:"
> > + " '%s'. please use 20-31", buf);
> > goto out;
> > }
> >
> > @@ -1757,6 +1801,47 @@ static int sd_create(const char *filename, QemuOpts *opts,
> > }
> >
> > s->aio_context = qemu_get_aio_context();
> > +
> > + /* if block_size_shift is not specified, get cluster default value */
> > + if (s->inode.block_size_shift == 0) {
>
> Seems that we don't keep backward compatibility for new QEMU and old sheep that
> doesn't support SD_OP_GET_CLUSTER_DEFAULT.
>
> What will happen if old QEMU with new sheep that support block_size_shift? Most
> distributions will ship the old stable qemu that wouldn't be aware of
> block_size_shift.
>
> I'd suggest we have 0 in block_size_shift represent 4MB to keep backward
> compatiblity.
How about make this additional field as the factor to multiply 4MB? that is,
(x + 1) * 4MB
qemu-img create -o object_size=8MB test 1G
will have x set as 1 and the formula will equal to 8MB.
-o object_size=4MB will set x as 0 and default value for x is 0 too to keep
backward compatiblity.
Thanks
Yuan
next prev parent reply other threads:[~2015-02-10 3:18 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-01-27 8:35 [Qemu-devel] [PATCH v4] sheepdog: selectable object size support Teruaki Ishizaki
2015-02-02 6:52 ` Liu Yuan
2015-02-04 4:54 ` Teruaki Ishizaki
2015-02-06 2:18 ` Liu Yuan
2015-02-06 7:57 ` Teruaki Ishizaki
2015-02-09 3:08 ` Liu Yuan
2015-02-10 3:10 ` Liu Yuan
2015-02-10 3:18 ` Liu Yuan [this message]
2015-02-10 8:22 ` Teruaki Ishizaki
2015-02-10 8:58 ` Liu Yuan
2015-02-10 9:56 ` Teruaki Ishizaki
2015-02-10 10:35 ` Liu Yuan
2015-02-12 6:19 ` [Qemu-devel] [sheepdog] " Hitoshi Mitake
2015-02-12 7:00 ` Liu Yuan
2015-02-12 7:28 ` Hitoshi Mitake
2015-02-12 7:42 ` Liu Yuan
2015-02-12 8:01 ` Teruaki Ishizaki
2015-02-12 8:11 ` Liu Yuan
2015-02-12 8:13 ` Hitoshi Mitake
2015-02-12 8:16 ` Liu Yuan
2015-02-10 11:12 ` [Qemu-devel] " Liu Yuan
2015-02-12 1:51 ` Teruaki Ishizaki
2015-02-12 2:19 ` Liu Yuan
2015-02-12 2:33 ` Teruaki Ishizaki
2015-02-12 2:55 ` Liu Yuan
2015-02-13 1:33 ` Teruaki Ishizaki
2015-02-13 2:01 ` Liu Yuan
2015-02-13 4:28 ` Teruaki Ishizaki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150210031825.GC3859@ubuntu-trusty \
--to=namei.unix@gmail.com \
--cc=ishizaki.teruaki@lab.ntt.co.jp \
--cc=kwolf@redhat.com \
--cc=mitake.hitoshi@lab.ntt.co.jp \
--cc=qemu-devel@nongnu.org \
--cc=sheepdog@lists.wpkg.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).