qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img
@ 2013-04-25 10:37 MORITA Kazutaka
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 1/3] sheepdog: cleanup find_vdi_name MORITA Kazutaka
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: MORITA Kazutaka @ 2013-04-25 10:37 UTC (permalink / raw)
  To: stefanha, kwolf; +Cc: sheepdog, qemu-devel

Currently, we can take sheepdog snapshots of running VMs only from the
qemu monitor.  This series allows taking online snapshots from
qemu-img.

The first two patches prepare for the thrid patch.

MORITA Kazutaka (3):
  sheepdog: cleanup find_vdi_name
  sheepdog: add SD_RES_READONLY result code
  sheepdog: resend write requests when SD_RES_READONLY is received

 block/sheepdog.c |  115 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 107 insertions(+), 8 deletions(-)

-- 
1.7.2.5

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH 1/3] sheepdog: cleanup find_vdi_name
  2013-04-25 10:37 [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img MORITA Kazutaka
@ 2013-04-25 10:37 ` MORITA Kazutaka
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 2/3] sheepdog: add SD_RES_READONLY result code MORITA Kazutaka
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: MORITA Kazutaka @ 2013-04-25 10:37 UTC (permalink / raw)
  To: stefanha, kwolf; +Cc: sheepdog, qemu-devel

This makes 'filename' and 'tag' constant variables, and renames
'for_snapshot' to 'lock' to clear how it works.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
---
 block/sheepdog.c |   15 ++++++++-------
 1 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 9f30a87..4326664 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -941,8 +941,9 @@ static int parse_vdiname(BDRVSheepdogState *s, const char *filename,
     return ret;
 }
 
-static int find_vdi_name(BDRVSheepdogState *s, char *filename, uint32_t snapid,
-                         char *tag, uint32_t *vid, int for_snapshot)
+static int find_vdi_name(BDRVSheepdogState *s, const char *filename,
+                         uint32_t snapid, const char *tag, uint32_t *vid,
+                         bool lock)
 {
     int ret, fd;
     SheepdogVdiReq hdr;
@@ -963,10 +964,10 @@ static int find_vdi_name(BDRVSheepdogState *s, char *filename, uint32_t snapid,
     strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN);
 
     memset(&hdr, 0, sizeof(hdr));
-    if (for_snapshot) {
-        hdr.opcode = SD_OP_GET_VDI_INFO;
-    } else {
+    if (lock) {
         hdr.opcode = SD_OP_LOCK_VDI;
+    } else {
+        hdr.opcode = SD_OP_GET_VDI_INFO;
     }
     wlen = SD_MAX_VDI_LEN + SD_MAX_VDI_TAG_LEN;
     hdr.proto_ver = SD_PROTO_VER;
@@ -1205,7 +1206,7 @@ static int sd_open(BlockDriverState *bs, QDict *options, int flags)
         goto out;
     }
 
-    ret = find_vdi_name(s, vdi, snapid, tag, &vid, 0);
+    ret = find_vdi_name(s, vdi, snapid, tag, &vid, true);
     if (ret) {
         goto out;
     }
@@ -1921,7 +1922,7 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
         pstrcpy(tag, sizeof(tag), s->name);
     }
 
-    ret = find_vdi_name(s, vdi, snapid, tag, &vid, 1);
+    ret = find_vdi_name(s, vdi, snapid, tag, &vid, false);
     if (ret) {
         error_report("Failed to find_vdi_name");
         goto out;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH 2/3] sheepdog: add SD_RES_READONLY result code
  2013-04-25 10:37 [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img MORITA Kazutaka
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 1/3] sheepdog: cleanup find_vdi_name MORITA Kazutaka
@ 2013-04-25 10:37 ` MORITA Kazutaka
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received MORITA Kazutaka
  2013-04-25 13:44 ` [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img Eric Blake
  3 siblings, 0 replies; 10+ messages in thread
From: MORITA Kazutaka @ 2013-04-25 10:37 UTC (permalink / raw)
  To: stefanha, kwolf; +Cc: sheepdog, qemu-devel

Sheepdog returns SD_RES_READONLY when qemu sends write requests to the
snapshot vdi.  This adds the result code and makes sd_strerror() print
its error reason.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
---
 block/sheepdog.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index 4326664..f4e7204 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -68,6 +68,7 @@
 #define SD_RES_WAIT_FOR_JOIN    0x17 /* Waiting for other nodes joining */
 #define SD_RES_JOIN_FAILED   0x18 /* Target node had failed to join sheepdog */
 #define SD_RES_HALT          0x19 /* Sheepdog is stopped serving IO request */
+#define SD_RES_READONLY      0x1A /* Object is read-only */
 
 /*
  * Object ID rules
@@ -349,6 +350,7 @@ static const char * sd_strerror(int err)
         {SD_RES_WAIT_FOR_JOIN, "Sheepdog is waiting for other nodes joining"},
         {SD_RES_JOIN_FAILED, "Target node had failed to join sheepdog"},
         {SD_RES_HALT, "Sheepdog is stopped serving IO request"},
+        {SD_RES_READONLY, "Object is read-only"},
     };
 
     for (i = 0; i < ARRAY_SIZE(errors); ++i) {
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [Qemu-devel] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received
  2013-04-25 10:37 [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img MORITA Kazutaka
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 1/3] sheepdog: cleanup find_vdi_name MORITA Kazutaka
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 2/3] sheepdog: add SD_RES_READONLY result code MORITA Kazutaka
@ 2013-04-25 10:37 ` MORITA Kazutaka
  2013-04-25 13:08   ` [Qemu-devel] [sheepdog] " Liu Yuan
  2013-04-25 13:56   ` [Qemu-devel] " Stefan Hajnoczi
  2013-04-25 13:44 ` [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img Eric Blake
  3 siblings, 2 replies; 10+ messages in thread
From: MORITA Kazutaka @ 2013-04-25 10:37 UTC (permalink / raw)
  To: stefanha, kwolf; +Cc: sheepdog, qemu-devel

When a snapshot is taken from out side of qemu (e.g. qemu-img
snapshot), write requests to the current vdi return SD_RES_READONLY.
In this case, the sheepdog block driver needs to update the current
inode to the latest one and resend the write requests.

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
---
 block/sheepdog.c |   98 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 97 insertions(+), 1 deletions(-)

diff --git a/block/sheepdog.c b/block/sheepdog.c
index f4e7204..3338cd1 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -605,6 +605,7 @@ static int do_req(int sockfd, SheepdogReq *hdr, void *data,
 static int coroutine_fn add_aio_request(BDRVSheepdogState *s, AIOReq *aio_req,
                            struct iovec *iov, int niov, bool create,
                            enum AIOCBState aiocb_type);
+static int resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req);
 
 
 static AIOReq *find_pending_req(BDRVSheepdogState *s, uint64_t oid)
@@ -749,9 +750,19 @@ static void coroutine_fn aio_read_response(void *opaque)
         }
     }
 
-    if (rsp.result != SD_RES_SUCCESS) {
+    switch (rsp.result) {
+    case SD_RES_SUCCESS:
+        break;
+    case SD_RES_READONLY:
+        ret = resend_aioreq(s, aio_req);
+        if (ret == SD_RES_SUCCESS) {
+            goto out;
+        }
+        /* fall through */
+    default:
         acb->ret = -EIO;
         error_report("%s", sd_strerror(rsp.result));
+        break;
     }
 
     free_aio_req(s, aio_req);
@@ -1150,6 +1161,91 @@ static int write_object(int fd, char *buf, uint64_t oid, int copies,
                              create, cache_flags);
 }
 
+/* update inode with the latest state */
+static int coroutine_fn reload_vdi_object(BDRVSheepdogState *s)
+{
+    SheepdogInode *inode;
+    int ret = 0, fd;
+    uint32_t vid;
+
+    inode = (SheepdogInode *)g_malloc(sizeof(s->inode));
+
+    fd = connect_to_sdog(s);
+    if (fd < 0) {
+        ret = -EIO;
+        goto out;
+    }
+
+    ret = find_vdi_name(s, s->name, 0, "", &vid, false);
+    if (ret) {
+        goto out;
+    }
+
+    ret = read_object(fd, (char *)inode, vid_to_vdi_oid(vid),
+                      s->inode.nr_copies, sizeof(*inode), 0, s->cache_flags);
+    if (ret < 0) {
+        goto out;
+    }
+
+    if (inode->vdi_id != s->inode.vdi_id) {
+        memcpy(&s->inode, inode, sizeof(s->inode));
+    }
+
+out:
+    free(inode);
+    closesocket(fd);
+
+    return ret;
+}
+
+static int resend_aioreq(BDRVSheepdogState *s, AIOReq *aio_req)
+{
+    SheepdogAIOCB *acb = aio_req->aiocb;
+    bool create = false;
+    int ret;
+
+    ret = reload_vdi_object(s);
+    if (ret < 0) {
+        return ret;
+    }
+
+    aio_req->oid = vid_to_data_oid(s->inode.vdi_id,
+                                   data_oid_to_idx(aio_req->oid));
+
+    /* check whether this request becomes a CoW one */
+    if (acb->aiocb_type == AIOCB_WRITE_UDATA) {
+        int idx = data_oid_to_idx(aio_req->oid);
+        AIOReq *areq;
+
+        if (s->inode.data_vdi_id[idx] == 0) {
+            create = true;
+            goto out;
+        }
+        if (is_data_obj_writable(&s->inode, idx)) {
+            goto out;
+        }
+
+        /* link to the pendng list if there is another CoW request to
+         * the same object */
+        QLIST_FOREACH(areq, &s->inflight_aio_head, aio_siblings) {
+            if (areq != aio_req && areq->oid == aio_req->oid) {
+                dprintf("simultaneous CoW to %" PRIx64 "\n", aio_req->oid);
+                QLIST_REMOVE(aio_req, aio_siblings);
+                QLIST_INSERT_HEAD(&s->pending_aio_head, aio_req, aio_siblings);
+                return SD_RES_SUCCESS;
+            }
+        }
+
+        aio_req->base_oid = vid_to_data_oid(s->inode.data_vdi_id[idx], idx);
+        aio_req->flags |= SD_FLAG_CMD_COW;
+        create = true;
+    }
+out:
+    return add_aio_request(s, aio_req, acb->qiov->iov, acb->qiov->niov,
+                           create, acb->aiocb_type);
+}
+
+
 /* TODO Convert to fine grained options */
 static QemuOptsList runtime_opts = {
     .name = "sheepdog",
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [sheepdog] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received MORITA Kazutaka
@ 2013-04-25 13:08   ` Liu Yuan
  2013-04-25 16:16     ` MORITA Kazutaka
  2013-04-25 13:56   ` [Qemu-devel] " Stefan Hajnoczi
  1 sibling, 1 reply; 10+ messages in thread
From: Liu Yuan @ 2013-04-25 13:08 UTC (permalink / raw)
  To: MORITA Kazutaka; +Cc: kwolf, sheepdog, qemu-devel, stefanha

On 04/25/2013 06:37 PM, MORITA Kazutaka wrote:
> +/* update inode with the latest state */
> +static int coroutine_fn reload_vdi_object(BDRVSheepdogState *s)

I'd suggest function name as
'reload_inode(BDRVSheepdogState *s, tag, snapid)', then sd_create_branch
and sd_snapshot_goto can make use of this function. With this change, it
would be conflicted to my patch series applied to Stefan's block tree +
Fix loadvm patch. So it would be better you can develop this patch set
against Stefan's block tree + my fix loadvm patch.

Others looks good to me.

Thanks,
Yuan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img
  2013-04-25 10:37 [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img MORITA Kazutaka
                   ` (2 preceding siblings ...)
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received MORITA Kazutaka
@ 2013-04-25 13:44 ` Eric Blake
  2013-04-25 14:01   ` [Qemu-devel] [sheepdog] " Liu Yuan
  2013-04-25 14:06   ` [Qemu-devel] " Stefan Hajnoczi
  3 siblings, 2 replies; 10+ messages in thread
From: Eric Blake @ 2013-04-25 13:44 UTC (permalink / raw)
  To: MORITA Kazutaka; +Cc: kwolf, sheepdog, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 650 bytes --]

On 04/25/2013 04:37 AM, MORITA Kazutaka wrote:
> Currently, we can take sheepdog snapshots of running VMs only from the
> qemu monitor.  This series allows taking online snapshots from
> qemu-img.

Is that even safe?  We specifically recommend that users NOT use
qemu-img on any image that is in-use by a qemu process, and instead only
use qemu monitor commands; since qemu-img may see inconsistent state,
and worse, any writes that qemu-img do may cause qemu to see
inconsistent state and cause corruption to the running guest.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received
  2013-04-25 10:37 ` [Qemu-devel] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received MORITA Kazutaka
  2013-04-25 13:08   ` [Qemu-devel] [sheepdog] " Liu Yuan
@ 2013-04-25 13:56   ` Stefan Hajnoczi
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2013-04-25 13:56 UTC (permalink / raw)
  To: MORITA Kazutaka; +Cc: kwolf, sheepdog, qemu-devel, stefanha

On Thu, Apr 25, 2013 at 07:37:43PM +0900, MORITA Kazutaka wrote:
> +/* update inode with the latest state */
> +static int coroutine_fn reload_vdi_object(BDRVSheepdogState *s)
> +{
> +    SheepdogInode *inode;
> +    int ret = 0, fd;
> +    uint32_t vid;
> +
> +    inode = (SheepdogInode *)g_malloc(sizeof(s->inode));
> +
> +    fd = connect_to_sdog(s);
> +    if (fd < 0) {
> +        ret = -EIO;
> +        goto out;
> +    }
> +
> +    ret = find_vdi_name(s, s->name, 0, "", &vid, false);
> +    if (ret) {
> +        goto out;
> +    }
> +
> +    ret = read_object(fd, (char *)inode, vid_to_vdi_oid(vid),
> +                      s->inode.nr_copies, sizeof(*inode), 0, s->cache_flags);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    if (inode->vdi_id != s->inode.vdi_id) {
> +        memcpy(&s->inode, inode, sizeof(s->inode));
> +    }
> +
> +out:
> +    free(inode);

g_malloc() must be paired with g_free().

> +        /* link to the pendng list if there is another CoW request to

s/pendng/pending/

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [sheepdog] [PATCH 0/3] sheepdog: support online snapshot from qemu-img
  2013-04-25 13:44 ` [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img Eric Blake
@ 2013-04-25 14:01   ` Liu Yuan
  2013-04-25 14:06   ` [Qemu-devel] " Stefan Hajnoczi
  1 sibling, 0 replies; 10+ messages in thread
From: Liu Yuan @ 2013-04-25 14:01 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, stefanha, sheepdog, qemu-devel, MORITA Kazutaka

On 04/25/2013 09:44 PM, Eric Blake wrote:
> Is that even safe?  We specifically recommend that users NOT use
> qemu-img on any image that is in-use by a qemu process, and instead only
> use qemu monitor commands; since qemu-img may see inconsistent state,
> and worse, any writes that qemu-img do may cause qemu to see
> inconsistent state and cause corruption to the running guest.

Then why not disable ( qemu-im snapshoting live VM) this behavior
completely?

Thanks,
Yuan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img
  2013-04-25 13:44 ` [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img Eric Blake
  2013-04-25 14:01   ` [Qemu-devel] [sheepdog] " Liu Yuan
@ 2013-04-25 14:06   ` Stefan Hajnoczi
  1 sibling, 0 replies; 10+ messages in thread
From: Stefan Hajnoczi @ 2013-04-25 14:06 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, sheepdog, qemu-devel, MORITA Kazutaka

On Thu, Apr 25, 2013 at 07:44:29AM -0600, Eric Blake wrote:
> On 04/25/2013 04:37 AM, MORITA Kazutaka wrote:
> > Currently, we can take sheepdog snapshots of running VMs only from the
> > qemu monitor.  This series allows taking online snapshots from
> > qemu-img.
> 
> Is that even safe?  We specifically recommend that users NOT use
> qemu-img on any image that is in-use by a qemu process, and instead only
> use qemu monitor commands; since qemu-img may see inconsistent state,
> and worse, any writes that qemu-img do may cause qemu to see
> inconsistent state and cause corruption to the running guest.

It's safe in the same way that multiple hosts can access the same NFS
export.  All I/O is going through the same NFS server and is therefore
kind of consistent ;).

The problem that qcow2 has is that two programs would overwrite metadata
in the file, possibly stepping on each other's feet and corrupting the
image metadata.

Sheepdog can handle snapshot requests coming from different clients.

Users can still shoot themselves in the foot if they start doing things
like reading or writing image data while the guest is running.

Stefan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [sheepdog] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received
  2013-04-25 13:08   ` [Qemu-devel] [sheepdog] " Liu Yuan
@ 2013-04-25 16:16     ` MORITA Kazutaka
  0 siblings, 0 replies; 10+ messages in thread
From: MORITA Kazutaka @ 2013-04-25 16:16 UTC (permalink / raw)
  To: Liu Yuan; +Cc: kwolf, stefanha, sheepdog, qemu-devel, MORITA Kazutaka

At Thu, 25 Apr 2013 21:08:01 +0800,
Liu Yuan wrote:
> 
> On 04/25/2013 06:37 PM, MORITA Kazutaka wrote:
> > +/* update inode with the latest state */
> > +static int coroutine_fn reload_vdi_object(BDRVSheepdogState *s)
> 
> I'd suggest function name as
> 'reload_inode(BDRVSheepdogState *s, tag, snapid)', then sd_create_branch
> and sd_snapshot_goto can make use of this function. With this change, it
> would be conflicted to my patch series applied to Stefan's block tree +
> Fix loadvm patch. So it would be better you can develop this patch set
> against Stefan's block tree + my fix loadvm patch.

sd_create_branch() doesn't use find_vdi_name() to get vid, so
reload_inode() is not suitable for it.  After all, I used
reload_inode() only for sd_snapshot_goto() and resend_aioreq() in v2,
so our patches didn't conflict.

Thanks,

Kazutaka

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-04-25 16:16 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-25 10:37 [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img MORITA Kazutaka
2013-04-25 10:37 ` [Qemu-devel] [PATCH 1/3] sheepdog: cleanup find_vdi_name MORITA Kazutaka
2013-04-25 10:37 ` [Qemu-devel] [PATCH 2/3] sheepdog: add SD_RES_READONLY result code MORITA Kazutaka
2013-04-25 10:37 ` [Qemu-devel] [PATCH 3/3] sheepdog: resend write requests when SD_RES_READONLY is received MORITA Kazutaka
2013-04-25 13:08   ` [Qemu-devel] [sheepdog] " Liu Yuan
2013-04-25 16:16     ` MORITA Kazutaka
2013-04-25 13:56   ` [Qemu-devel] " Stefan Hajnoczi
2013-04-25 13:44 ` [Qemu-devel] [PATCH 0/3] sheepdog: support online snapshot from qemu-img Eric Blake
2013-04-25 14:01   ` [Qemu-devel] [sheepdog] " Liu Yuan
2013-04-25 14:06   ` [Qemu-devel] " Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).