* [Qemu-devel] [RFC V3 1/9] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 2/9] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
` (10 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/Makefile.objs | 1 +
block/quorum.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
create mode 100644 block/quorum.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..66af6dc 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
block-obj-y += stream.o
+block-obj-y += quorum.o
block-obj-$(CONFIG_WIN32) += raw-win32.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
block-obj-$(CONFIG_LIBISCSI) += iscsi.o
diff --git a/block/quorum.c b/block/quorum.c
new file mode 100644
index 0000000..65a6b55
--- /dev/null
+++ b/block/quorum.c
@@ -0,0 +1,45 @@
+/*
+ * Quorum Block filter
+ *
+ * Copyright (C) 2012 Nodalink, SARL.
+ *
+ * Author:
+ * Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
+ * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "block_int.h"
+
+typedef struct QuorumAIOCB QuorumAIOCB;
+
+typedef struct QuorumSingleAIOCB {
+ BlockDriverAIOCB *aiocb;
+ uint8_t *buf;
+ int ret;
+ QuorumAIOCB *parent;
+} QuorumSingleAIOCB;
+
+struct QuorumAIOCB {
+ BlockDriverAIOCB common;
+ QEMUBH *bh;
+
+ /* Request metadata */
+ int64_t sector_num;
+ int nb_sectors;
+
+ QEMUIOVector *qiov; /* calling readv IOV */
+
+ QuorumSingleAIOCB *aios; /* individual AIOs */
+ QEMUIOVector *qiovs; /* individual IOVs */
+ int count; /* number of completed AIOCB */
+ int success_count; /* number of successfully completed AIOCB */
+ bool *finished; /* completion signal for cancel */
+
+ void (*vote)(QuorumAIOCB *acb);
+ int vote_ret;
+};
--
1.7.9.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [RFC V3 2/9] quorum: Create BDRVQuorumState and BlkDriver and do init.
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 1/9] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 3/9] quorum: Add quorum_open() and quorum_close() Benoît Canet
` (9 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 65a6b55..bab6175 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,13 @@
#include "block_int.h"
+typedef struct {
+ BlockDriverState **bs;
+ int n;
+ int m;
+ char **filenames;
+} BDRVQuorumState;
+
typedef struct QuorumAIOCB QuorumAIOCB;
typedef struct QuorumSingleAIOCB {
@@ -26,6 +33,7 @@ typedef struct QuorumSingleAIOCB {
struct QuorumAIOCB {
BlockDriverAIOCB common;
+ BDRVQuorumState *bqs;
QEMUBH *bh;
/* Request metadata */
@@ -43,3 +51,17 @@ struct QuorumAIOCB {
void (*vote)(QuorumAIOCB *acb);
int vote_ret;
};
+
+static BlockDriver bdrv_quorum = {
+ .format_name = "quorum",
+ .protocol_name = "quorum",
+
+ .instance_size = sizeof(BDRVQuorumState),
+};
+
+static void bdrv_quorum_init(void)
+{
+ bdrv_register(&bdrv_quorum);
+}
+
+block_init(bdrv_quorum_init);
--
1.7.9.5
* [Qemu-devel] [RFC V3 3/9] quorum: Add quorum_open() and quorum_close().
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 1/9] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 2/9] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-14 14:45 ` Eric Blake
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 4/9] quorum: Add quorum_getlength() Benoît Canet
` (8 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 113 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index bab6175..f228428 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -52,11 +52,124 @@ struct QuorumAIOCB {
int vote_ret;
};
+/* Valid quorum filenames look like
+ * quorum:n/m:path/to/image_1, ... ,path/to/image_m
+ */
+static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int escape, i, j, len, ret = 0;
+ char *a, *b, *names;
+
+ /* Parse the quorum: prefix */
+ if (strncmp(filename, "quorum:", strlen("quorum:"))) {
+ return -EINVAL;
+ }
+
+ filename += strlen("quorum:");
+
+ /* Get n */
+ errno = 0;
+ s->n = strtoul(filename, &a, 10);
+ if (*a != '/' || errno) {
+ return -EINVAL;
+ }
+ a += 1;
+
+ /* Get m */
+ errno = 0;
+ s->m = strtoul(a, &b, 10);
+ if (*b != ':' || errno) {
+ return -EINVAL;
+ }
+ b += 1;
+
+ if (s->n < 1 || s->m < 2) {
+ return -EINVAL;
+ }
+
+ if (s->n > s->m) {
+ return -EINVAL;
+ }
+
+ s->bs = g_malloc0(sizeof(BlockDriverState *) * s->m);
+ /* Two allocations for all filenames: simpler to free */
+ s->filenames = g_malloc0(sizeof(char *) * s->m);
+ names = g_strdup(b);
+
+ /* Get the filenames pointers */
+ escape = 0;
+ s->filenames[0] = names;
+ len = strlen(names);
+ for (i = 0, j = 1; i < len && j < s->m; i++) {
+ if (!escape && names[i] == ':') {
+ names[i] = '\0';
+ s->filenames[j] = names + i + 1;
+ j += 1;
+ }
+
+ if (!escape && names[i] == '\\') {
+ escape = 1;
+ } else {
+ escape = 0;
+ }
+ }
+
+ if (j != s->m) {
+ ret = -EINVAL;
+ goto free_exit;
+ }
+
+ /* Open files */
+ for (i = 0; i < s->m; i++) {
+ s->bs[i] = bdrv_new("");
+ ret = bdrv_open(s->bs[i], s->filenames[i], flags, NULL);
+ if (ret < 0) {
+ goto error_exit;
+ }
+ }
+
+ goto exit;
+
+error_exit:
+ for (; i >= 0; i--) {
+ bdrv_delete(s->bs[i]);
+ s->bs[i] = NULL;
+ }
+free_exit:
+ g_free(s->filenames[0]);
+ g_free(s->filenames);
+ s->filenames = NULL;
+ g_free(s->bs);
+exit:
+ return ret;
+}
+
+static void quorum_close(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int i;
+
+ for (i = 0; i < s->m; i++) {
+ /* Ensure writes reach stable storage */
+ bdrv_flush(s->bs[i]);
+ bdrv_delete(s->bs[i]);
+ }
+
+ g_free(s->filenames[0]);
+ g_free(s->filenames);
+ s->filenames = NULL;
+ g_free(s->bs);
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
.instance_size = sizeof(BDRVQuorumState),
+
+ .bdrv_file_open = quorum_open,
+ .bdrv_close = quorum_close,
};
static void bdrv_quorum_init(void)
--
1.7.9.5
* Re: [Qemu-devel] [RFC V3 3/9] quorum: Add quorum_open() and quorum_close().
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 3/9] quorum: Add quorum_open() and quorum_close() Benoît Canet
@ 2012-08-14 14:45 ` Eric Blake
0 siblings, 0 replies; 25+ messages in thread
From: Eric Blake @ 2012-08-14 14:45 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
afaerber, Benoît Canet
On 08/14/2012 08:14 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
Your commit message is sparse. At least document the syntax expected
for opening a quorum file.
> ---
> block/quorum.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 113 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index bab6175..f228428 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -52,11 +52,124 @@ struct QuorumAIOCB {
> int vote_ret;
> };
>
> +/* Valid quorum filenames look like
> + * quorum:n/m:path/to/image_1, ... ,path/to/image_m
such as copying this in the commit message. Also, document your escape
handling; I had to read it from the code, but it looks like you set up
'\' to escape anything, so that '\:' and '\\' are the spellings for a
literal colon or backslash in a file name.
> + */
> +static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
> +{
> + BDRVQuorumState *s = bs->opaque;
> + int escape, i, j, len, ret = 0;
escape only ever holds 0 or 1, so it should be a 'bool' instead.
> + /* Get the filenames pointers */
> + escape = 0;
s/0/false/
> + s->filenames[0] = names;
> + len = strlen(names);
> + for (i = 0, j = 1; i < len && j < s->m; i++) {
> + if (!escape && names[i] == ':') {
> + names[i] = '\0';
> + s->filenames[j] = names + i + 1;
> + j += 1;
Isn't this usually written 'j++'?
> + }
> +
> + if (!escape && names[i] == '\\') {
> + escape = 1;
s/1/true/
> + } else {
> + escape = 0;
s/0/false/
> + }
Or even simplify the 'if' to a one-liner:
escape = !escape && names[i] == '\\';
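Put together, the suggestions above (a 'bool' flag, 'j++', and the one-line escape update) could look like the sketch below. split_names() is a hypothetical standalone helper written only for illustration, not code from the patch; it keeps the patch's behaviour of splitting on an unescaped ':' and leaving the backslashes in the resulting strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: split 'names' in place on
 * unescaped ':' characters and store up to 'max' start pointers in 'out'.
 * Returns the number of names found. A backslash escapes the next
 * character; the backslash itself stays in the string, as in the patch.
 */
static int split_names(char *names, char **out, int max)
{
    bool escape = false;
    size_t i, len = strlen(names);
    int j = 1;

    assert(max >= 1);
    out[0] = names;
    for (i = 0; i < len && j < max; i++) {
        if (!escape && names[i] == ':') {
            names[i] = '\0';
            out[j++] = names + i + 1;
        }
        /* Set only by an unescaped backslash, cleared by anything else */
        escape = !escape && names[i] == '\\';
    }
    return j;
}
```

As in quorum_open(), the caller would compare the return value against m and reject the filename when they differ.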
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* [Qemu-devel] [RFC V3 4/9] quorum: Add quorum_getlength().
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (2 preceding siblings ...)
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 3/9] quorum: Add quorum_open() and quorum_close() Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-14 16:08 ` Eric Blake
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 5/9] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
` (7 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index f228428..a3f16ed 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -162,12 +162,21 @@ static void quorum_close(BlockDriverState *bs)
g_free(s->bs);
}
+static int64_t quorum_getlength(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+
+ return bdrv_getlength(s->bs[0]);
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
.instance_size = sizeof(BDRVQuorumState),
+ .bdrv_getlength = quorum_getlength,
+
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
};
--
1.7.9.5
* Re: [Qemu-devel] [RFC V3 4/9] quorum: Add quorum_getlength().
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 4/9] quorum: Add quorum_getlength() Benoît Canet
@ 2012-08-14 16:08 ` Eric Blake
2012-08-16 13:18 ` Benoît Canet
0 siblings, 1 reply; 25+ messages in thread
From: Eric Blake @ 2012-08-14 16:08 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
afaerber, Benoît Canet
On 08/14/2012 08:14 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/quorum.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index f228428..a3f16ed 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -162,12 +162,21 @@ static void quorum_close(BlockDriverState *bs)
> g_free(s->bs);
> }
>
> +static int64_t quorum_getlength(BlockDriverState *bs)
> +{
> + BDRVQuorumState *s = bs->opaque;
> +
> + return bdrv_getlength(s->bs[0]);
Is this implementation right? Shouldn't this be a quorum decision,
where all s->bs[...] elements have to agree on the same size, or even
where they can differ on size, as long as all files with larger size
have unallocated holes past the size of the smaller member?
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [RFC V3 4/9] quorum: Add quorum_getlength().
2012-08-14 16:08 ` Eric Blake
@ 2012-08-16 13:18 ` Benoît Canet
0 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-16 13:18 UTC (permalink / raw)
To: Eric Blake
Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
anthony, pbonzini, afaerber
On Tuesday 14 Aug 2012 at 10:08:24 (-0600), Eric Blake wrote:
> On 08/14/2012 08:14 AM, Benoît Canet wrote:
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
> > block/quorum.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/block/quorum.c b/block/quorum.c
> > index f228428..a3f16ed 100644
> > --- a/block/quorum.c
> > +++ b/block/quorum.c
> > @@ -162,12 +162,21 @@ static void quorum_close(BlockDriverState *bs)
> > g_free(s->bs);
> > }
> >
> > +static int64_t quorum_getlength(BlockDriverState *bs)
> > +{
> > + BDRVQuorumState *s = bs->opaque;
> > +
> > + return bdrv_getlength(s->bs[0]);
>
> Is this implementation right? Shouldn't this be a quorum decision,
> where all s->bs[...] elements have to agree on the same size, or even
> where they can differ on size, as long as all files with larger size
> have unallocated holes past the size of the smaller member?
You are right.
I have trouble figuring out how it would work with different sizes.
Requiring a quorum decision on the same size seems the best solution.
I will implement it.
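A rough sketch of what such a decision might look like, independent of the block layer: quorum_length() is a hypothetical helper, and the 'lengths' array stands in for the values bdrv_getlength() would return for each of the m children. It is only an illustration under those assumptions, not the code that will land in the next version.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'lengths' holds the value reported by each of the
 * m children (negative values are errors). Return the first length shared
 * by at least n children, or -EIO if no length reaches the threshold.
 */
static int64_t quorum_length(const int64_t *lengths, int m, int n)
{
    int i, j;

    assert(n >= 1 && n <= m);
    for (i = 0; i < m; i++) {
        int votes = 0;

        if (lengths[i] < 0) {
            continue; /* a failed child cannot vote */
        }
        for (j = 0; j < m; j++) {
            if (lengths[j] == lengths[i]) {
                votes++;
            }
        }
        if (votes >= n) {
            return lengths[i];
        }
    }
    return -EIO;
}
```

Note that when n is not a strict majority of m, two different lengths can both reach the threshold; this sketch simply returns the first one it finds.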
Benoît
>
> --
> Eric Blake eblake@redhat.com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
* [Qemu-devel] [RFC V3 5/9] quorum: Add quorum_aio_writev and its dependencies.
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (3 preceding siblings ...)
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 4/9] quorum: Add quorum_getlength() Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 6/9] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
` (6 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index a3f16ed..0a6647f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -169,6 +169,116 @@ static int64_t quorum_getlength(BlockDriverState *bs)
return bdrv_getlength(s->bs[0]);
}
+static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+ QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+ bool finished = false;
+
+ /* Wait for the request to finish */
+ acb->finished = &finished;
+ while (!finished) {
+ qemu_aio_wait();
+ }
+}
+
+static AIOPool quorum_aio_pool = {
+ .aiocb_size = sizeof(QuorumAIOCB),
+ .cancel = quorum_aio_cancel,
+};
+
+static void quorum_aio_bh(void *opaque)
+{
+ QuorumAIOCB *acb = opaque;
+ BDRVQuorumState *s = acb->bqs;
+ int ret;
+
+ ret = s->n <= acb->success_count ? 0 : -EIO;
+
+ qemu_bh_delete(acb->bh);
+ acb->common.cb(acb->common.opaque, ret);
+ if (acb->finished) {
+ *acb->finished = true;
+ }
+ g_free(acb->aios);
+ g_free(acb->qiovs);
+ qemu_aio_release(acb);
+}
+
+static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
+ BlockDriverState *bs,
+ QEMUIOVector *qiov,
+ int64_t sector_num,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ QuorumAIOCB *acb = qemu_aio_get(&quorum_aio_pool, bs, cb, opaque);
+ int i;
+
+ acb->aios = g_new0(QuorumSingleAIOCB, s->m);
+ acb->qiovs = g_new0(QEMUIOVector, s->m);
+
+ acb->bqs = s;
+ acb->qiov = qiov;
+ acb->bh = NULL;
+ acb->count = 0;
+ acb->success_count = 0;
+ acb->sector_num = sector_num;
+ acb->nb_sectors = nb_sectors;
+ acb->vote = NULL;
+ acb->vote_ret = 0;
+
+ for (i = 0; i < s->m; i++) {
+ acb->aios[i].buf = NULL;
+ acb->aios[i].ret = 0;
+ acb->aios[i].parent = acb;
+ }
+
+ return acb;
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+ QuorumSingleAIOCB *sacb = opaque;
+ QuorumAIOCB *acb = sacb->parent;
+ BDRVQuorumState *s = acb->bqs;
+
+ sacb->ret = ret;
+ acb->count++;
+ if (ret == 0) {
+ acb->success_count += 1;
+ }
+ assert(acb->count <= s->m);
+ assert(acb->success_count <= s->m);
+ if (acb->count < s->m) {
+ return;
+ }
+
+ acb->bh = qemu_bh_new(quorum_aio_bh, acb);
+ qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+ cb, opaque);
+ int i;
+
+ for (i = 0; i < s->m; i++) {
+ acb->aios[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+ nb_sectors, &quorum_aio_cb,
+ &acb->aios[i]);
+ }
+
+ return &acb->common;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -179,6 +289,8 @@ static BlockDriver bdrv_quorum = {
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
+
+ .bdrv_aio_writev = quorum_aio_writev,
};
static void bdrv_quorum_init(void)
--
1.7.9.5
* [Qemu-devel] [RFC V3 6/9] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (4 preceding siblings ...)
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 5/9] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 7/9] quorum: Add quorum_co_flush() Benoît Canet
` (5 subsequent siblings)
11 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/blkverify.c | 108 +----------------------------------------------------
cutils.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++
qemu-common.h | 2 +
3 files changed, 107 insertions(+), 106 deletions(-)
diff --git a/block/blkverify.c b/block/blkverify.c
index 9d5f1ec..79d36d5 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -123,110 +123,6 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
return bdrv_getlength(s->test_file);
}
-/**
- * Check that I/O vector contents are identical
- *
- * @a: I/O vector
- * @b: I/O vector
- * @ret: Offset to first mismatching byte or -1 if match
- */
-static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
-{
- int i;
- ssize_t offset = 0;
-
- assert(a->niov == b->niov);
- for (i = 0; i < a->niov; i++) {
- size_t len = 0;
- uint8_t *p = (uint8_t *)a->iov[i].iov_base;
- uint8_t *q = (uint8_t *)b->iov[i].iov_base;
-
- assert(a->iov[i].iov_len == b->iov[i].iov_len);
- while (len < a->iov[i].iov_len && *p++ == *q++) {
- len++;
- }
-
- offset += len;
-
- if (len != a->iov[i].iov_len) {
- return offset;
- }
- }
- return -1;
-}
-
-typedef struct {
- int src_index;
- struct iovec *src_iov;
- void *dest_base;
-} IOVectorSortElem;
-
-static int sortelem_cmp_src_base(const void *a, const void *b)
-{
- const IOVectorSortElem *elem_a = a;
- const IOVectorSortElem *elem_b = b;
-
- /* Don't overflow */
- if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
- return -1;
- } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
- return 1;
- } else {
- return 0;
- }
-}
-
-static int sortelem_cmp_src_index(const void *a, const void *b)
-{
- const IOVectorSortElem *elem_a = a;
- const IOVectorSortElem *elem_b = b;
-
- return elem_a->src_index - elem_b->src_index;
-}
-
-/**
- * Copy contents of I/O vector
- *
- * The relative relationships of overlapping iovecs are preserved. This is
- * necessary to ensure identical semantics in the cloned I/O vector.
- */
-static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
- void *buf)
-{
- IOVectorSortElem sortelems[src->niov];
- void *last_end;
- int i;
-
- /* Sort by source iovecs by base address */
- for (i = 0; i < src->niov; i++) {
- sortelems[i].src_index = i;
- sortelems[i].src_iov = &src->iov[i];
- }
- qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
-
- /* Allocate buffer space taking into account overlapping iovecs */
- last_end = NULL;
- for (i = 0; i < src->niov; i++) {
- struct iovec *cur = sortelems[i].src_iov;
- ptrdiff_t rewind = 0;
-
- /* Detect overlap */
- if (last_end && last_end > cur->iov_base) {
- rewind = last_end - cur->iov_base;
- }
-
- sortelems[i].dest_base = buf - rewind;
- buf += cur->iov_len - MIN(rewind, cur->iov_len);
- last_end = MAX(cur->iov_base + cur->iov_len, last_end);
- }
-
- /* Sort by source iovec index and build destination iovec */
- qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
- for (i = 0; i < src->niov; i++) {
- qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
- }
-}
-
static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
int64_t sector_num, QEMUIOVector *qiov,
int nb_sectors,
@@ -290,7 +186,7 @@ static void blkverify_aio_cb(void *opaque, int ret)
static void blkverify_verify_readv(BlkverifyAIOCB *acb)
{
- ssize_t offset = blkverify_iovec_compare(acb->qiov, &acb->raw_qiov);
+ ssize_t offset = qemu_iovec_compare(acb->qiov, &acb->raw_qiov);
if (offset != -1) {
blkverify_err(acb, "contents mismatch in sector %" PRId64,
acb->sector_num + (int64_t)(offset / BDRV_SECTOR_SIZE));
@@ -308,7 +204,7 @@ static BlockDriverAIOCB *blkverify_aio_readv(BlockDriverState *bs,
acb->verify = blkverify_verify_readv;
acb->buf = qemu_blockalign(bs->file, qiov->size);
qemu_iovec_init(&acb->raw_qiov, acb->qiov->niov);
- blkverify_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
+ qemu_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
bdrv_aio_readv(s->test_file, sector_num, qiov, nb_sectors,
blkverify_aio_cb, acb);
diff --git a/cutils.c b/cutils.c
index ee4614d..dcdd60f 100644
--- a/cutils.c
+++ b/cutils.c
@@ -245,6 +245,109 @@ size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
return iov_memset(qiov->iov, qiov->niov, offset, fillc, bytes);
}
+/**
+ * Check that I/O vector contents are identical
+ *
+ * @a: I/O vector
+ * @b: I/O vector
+ * @ret: Offset to first mismatching byte or -1 if match
+ */
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+{
+ int i;
+ ssize_t offset = 0;
+
+ assert(a->niov == b->niov);
+ for (i = 0; i < a->niov; i++) {
+ size_t len = 0;
+ uint8_t *p = (uint8_t *)a->iov[i].iov_base;
+ uint8_t *q = (uint8_t *)b->iov[i].iov_base;
+
+ assert(a->iov[i].iov_len == b->iov[i].iov_len);
+ while (len < a->iov[i].iov_len && *p++ == *q++) {
+ len++;
+ }
+
+ offset += len;
+
+ if (len != a->iov[i].iov_len) {
+ return offset;
+ }
+ }
+ return -1;
+}
+
+typedef struct {
+ int src_index;
+ struct iovec *src_iov;
+ void *dest_base;
+} IOVectorSortElem;
+
+static int sortelem_cmp_src_base(const void *a, const void *b)
+{
+ const IOVectorSortElem *elem_a = a;
+ const IOVectorSortElem *elem_b = b;
+
+ /* Don't overflow */
+ if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
+ return -1;
+ } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
+ return 1;
+ } else {
+ return 0;
+ }
+}
+
+static int sortelem_cmp_src_index(const void *a, const void *b)
+{
+ const IOVectorSortElem *elem_a = a;
+ const IOVectorSortElem *elem_b = b;
+
+ return elem_a->src_index - elem_b->src_index;
+}
+
+/**
+ * Copy contents of I/O vector
+ *
+ * The relative relationships of overlapping iovecs are preserved. This is
+ * necessary to ensure identical semantics in the cloned I/O vector.
+ */
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf)
+{
+ IOVectorSortElem sortelems[src->niov];
+ void *last_end;
+ int i;
+
+ /* Sort by source iovecs by base address */
+ for (i = 0; i < src->niov; i++) {
+ sortelems[i].src_index = i;
+ sortelems[i].src_iov = &src->iov[i];
+ }
+ qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
+
+ /* Allocate buffer space taking into account overlapping iovecs */
+ last_end = NULL;
+ for (i = 0; i < src->niov; i++) {
+ struct iovec *cur = sortelems[i].src_iov;
+ ptrdiff_t rewind = 0;
+
+ /* Detect overlap */
+ if (last_end && last_end > cur->iov_base) {
+ rewind = last_end - cur->iov_base;
+ }
+
+ sortelems[i].dest_base = buf - rewind;
+ buf += cur->iov_len - MIN(rewind, cur->iov_len);
+ last_end = MAX(cur->iov_base + cur->iov_len, last_end);
+ }
+
+ /* Sort by source iovec index and build destination iovec */
+ qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
+ for (i = 0; i < src->niov; i++) {
+ qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
+ }
+}
+
/*
* Checks if a buffer is all zeroes
*
diff --git a/qemu-common.h b/qemu-common.h
index 095e28d..724d08a 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -371,6 +371,8 @@ size_t qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
const void *buf, size_t bytes);
size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
int fillc, size_t bytes);
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf);
bool buffer_is_zero(const void *buf, size_t len);
--
1.7.9.5
* [Qemu-devel] [RFC V3 7/9] quorum: Add quorum_co_flush().
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (5 preceding siblings ...)
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 6/9] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-14 18:52 ` Blue Swirl
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 8/9] quorum: Add quorum_aio_readv Benoît Canet
` (4 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 0a6647f..86962b4 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -279,6 +279,21 @@ static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
return &acb->common;
}
+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int i, ret = 0;
+
+ for (i = 0; i < s->m; i++) {
+ ret = bdrv_co_flush(s->bs[i]);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ return ret;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -289,6 +304,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
+ .bdrv_co_flush_to_disk = quorum_co_flush,
.bdrv_aio_writev = quorum_aio_writev,
};
--
1.7.9.5
* Re: [Qemu-devel] [RFC V3 7/9] quorum: Add quorum_co_flush().
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 7/9] quorum: Add quorum_co_flush() Benoît Canet
@ 2012-08-14 18:52 ` Blue Swirl
0 siblings, 0 replies; 25+ messages in thread
From: Blue Swirl @ 2012-08-14 18:52 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, stefanha, qemu-devel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
On Tue, Aug 14, 2012 at 2:14 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/quorum.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index 0a6647f..86962b4 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -279,6 +279,21 @@ static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
> return &acb->common;
> }
>
> +static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
> +{
> + BDRVQuorumState *s = bs->opaque;
> + int i, ret = 0;
> +
> + for (i = 0; i < s->m; i++) {
> + ret = bdrv_co_flush(s->bs[i]);
> + if (ret < 0) {
> + return ret;
This stops flushing if any of the replicas fails. Shouldn't we just
ignore the error?
> + }
> + }
> +
> + return ret;
> +}
> +
> static BlockDriver bdrv_quorum = {
> .format_name = "quorum",
> .protocol_name = "quorum",
> @@ -289,6 +304,7 @@ static BlockDriver bdrv_quorum = {
>
> .bdrv_file_open = quorum_open,
> .bdrv_close = quorum_close,
> + .bdrv_co_flush_to_disk = quorum_co_flush,
>
> .bdrv_aio_writev = quorum_aio_writev,
> };
> --
> 1.7.9.5
>
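One possible middle ground between returning on the first failure and ignoring errors entirely is to flush every replica and then apply the same n-of-m threshold that the write path uses. The sketch below assumes that policy; flush_all(), flush_fn and mock_flush() are hypothetical names used only for illustration, with the callback standing in for bdrv_co_flush() on child i.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for bdrv_co_flush() on child number 'child' (hypothetical) */
typedef int (*flush_fn)(int child);

static int flush_calls; /* counts mock invocations, for illustration only */

/* Mock child flush: child 1 always fails with -EIO, the others succeed */
static int mock_flush(int child)
{
    flush_calls++;
    return child == 1 ? -EIO : 0;
}

/*
 * Flush all m children even if some fail. Succeed when at least n of
 * them flushed correctly, otherwise report the first error seen.
 */
static int flush_all(flush_fn flush, int m, int n)
{
    int i, ok = 0, first_err = 0;

    assert(n >= 1 && n <= m);
    for (i = 0; i < m; i++) {
        int ret = flush(i);

        if (ret < 0) {
            if (!first_err) {
                first_err = ret;
            }
        } else {
            ok++;
        }
    }
    return ok >= n ? 0 : first_err;
}
```

With the mock above, a 2/3 quorum still succeeds when one replica fails to flush, while a 3/3 quorum reports -EIO; in both cases all three replicas are flushed.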
* [Qemu-devel] [RFC V3 8/9] quorum: Add quorum_aio_readv.
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (6 preceding siblings ...)
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 7/9] quorum: Add quorum_co_flush() Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-15 10:53 ` Stefan Hajnoczi
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 9/9] quorum: Add quorum mechanism Benoît Canet
` (3 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
diff --git a/block/quorum.c b/block/quorum.c
index 86962b4..8b449fb 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -190,10 +190,16 @@ static void quorum_aio_bh(void *opaque)
{
QuorumAIOCB *acb = opaque;
BDRVQuorumState *s = acb->bqs;
- int ret;
+ int i, ret;
ret = s->n <= acb->success_count ? 0 : -EIO;
+ for (i = 0; i < s->m; i++) {
+ qemu_vfree(acb->aios[i].buf);
+ acb->aios[i].buf = NULL;
+ acb->aios[i].ret = 0;
+ }
+
qemu_bh_delete(acb->bh);
acb->common.cb(acb->common.opaque, ret);
if (acb->finished) {
@@ -258,6 +264,32 @@ static void quorum_aio_cb(void *opaque, int ret)
qemu_bh_schedule(acb->bh);
}
+static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+ nb_sectors, cb, opaque);
+ int i;
+
+ for (i = 0; i < s->m; i++) {
+ acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
+ qemu_iovec_init(&acb->qiovs[i], qiov->niov);
+ qemu_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
+ }
+
+ for (i = 0; i < s->m; i++) {
+ bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+ quorum_aio_cb, &acb->aios[i]);
+ }
+
+ return &acb->common;
+}
+
static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
@@ -306,6 +338,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_close = quorum_close,
.bdrv_co_flush_to_disk = quorum_co_flush,
+ .bdrv_aio_readv = quorum_aio_readv,
.bdrv_aio_writev = quorum_aio_writev,
};
--
1.7.9.5
* Re: [Qemu-devel] [RFC V3 8/9] quorum: Add quorum_aio_readv.
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 8/9] quorum: Add quorum_aio_readv Benoît Canet
@ 2012-08-15 10:53 ` Stefan Hajnoczi
0 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-08-15 10:53 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, Benoît Canet, qemu-devel, blauwirbel, anthony,
pbonzini, eblake, afaerber
On Tue, Aug 14, 2012 at 04:14:10PM +0200, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/quorum.c | 35 ++++++++++++++++++++++++++++++++++-
> 1 file changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index 86962b4..8b449fb 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -190,10 +190,16 @@ static void quorum_aio_bh(void *opaque)
> {
> QuorumAIOCB *acb = opaque;
> BDRVQuorumState *s = acb->bqs;
> - int ret;
> + int i, ret;
>
> ret = s->n <= acb->success_count ? 0 : -EIO;
>
> + for (i = 0; i < s->m; i++) {
> + qemu_vfree(acb->aios[i].buf);
> + acb->aios[i].buf = NULL;
> + acb->aios[i].ret = 0;
> + }
> +
> qemu_bh_delete(acb->bh);
> acb->common.cb(acb->common.opaque, ret);
> if (acb->finished) {
> @@ -258,6 +264,32 @@ static void quorum_aio_cb(void *opaque, int ret)
> qemu_bh_schedule(acb->bh);
> }
>
> +static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
> + int64_t sector_num,
> + QEMUIOVector *qiov,
> + int nb_sectors,
> + BlockDriverCompletionFunc *cb,
> + void *opaque)
> +{
> + BDRVQuorumState *s = bs->opaque;
> + QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
> + nb_sectors, cb, opaque);
> + int i;
> +
> + for (i = 0; i < s->m; i++) {
> + acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
> + qemu_iovec_init(&acb->qiovs[i], qiov->niov);
> + qemu_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
> + }
Need to call qemu_iovec_destroy() to free &acb->qiovs[i] iovecs.
Stefan
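
A minimal sketch of the cleanup requested above, assuming it would sit in the same loop of quorum_aio_bh() that already frees the bounce buffers. The types and helpers below (SketchIOVector, sketch_iovec_init(), sketch_iovec_destroy(), SketchChildRead) are simplified stand-ins invented so the fragment compiles on its own; they are not QEMU's definitions. The only point is that every init is paired with a destroy.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/uio.h>

/* Simplified stand-in for QEMUIOVector (assumption, not QEMU's struct). */
typedef struct {
    struct iovec *iov;
    int niov;
    int nalloc;
    size_t size;
} SketchIOVector;

/* Stand-in for qemu_iovec_init(): allocates the iovec array. */
static void sketch_iovec_init(SketchIOVector *qiov, int alloc_hint)
{
    assert(alloc_hint > 0);
    qiov->iov = calloc((size_t)alloc_hint, sizeof(struct iovec));
    assert(qiov->iov != NULL);
    qiov->niov = 0;
    qiov->nalloc = alloc_hint;
    qiov->size = 0;
}

/* Stand-in for qemu_iovec_destroy(): frees the array, not the data. */
static void sketch_iovec_destroy(SketchIOVector *qiov)
{
    free(qiov->iov);
    qiov->iov = NULL;
    qiov->niov = 0;
    qiov->nalloc = 0;
    qiov->size = 0;
}

/* Per-child read state, reduced to the two members that own memory. */
typedef struct {
    void *buf;
    SketchIOVector qiov;
} SketchChildRead;

/* Allocate what the read path allocates for each of the m children. */
static void sketch_setup(SketchChildRead *reads, int m, size_t len, int niov)
{
    for (int i = 0; i < m; i++) {
        reads[i].buf = malloc(len);
        assert(reads[i].buf != NULL);
        sketch_iovec_init(&reads[i].qiov, niov);
    }
}

/* Completion path: release the bounce buffer and the iovec array.
 * Returns the number of children cleaned up. */
static int sketch_cleanup(SketchChildRead *reads, int m)
{
    int released = 0;

    for (int i = 0; i < m; i++) {
        free(reads[i].buf);
        reads[i].buf = NULL;
        sketch_iovec_destroy(&reads[i].qiov);
        released++;
    }
    return released;
}
```

Without the destroy call, the iovec array allocated for each child would stay allocated after every read, even though the data buffers themselves are freed.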
* [Qemu-devel] [RFC V3 9/9] quorum: Add quorum mechanism.
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (7 preceding siblings ...)
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 8/9] quorum: Add quorum_aio_readv Benoît Canet
@ 2012-08-14 14:14 ` Benoît Canet
2012-08-15 10:51 ` Stefan Hajnoczi
2012-08-14 18:47 ` [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Blue Swirl
` (2 subsequent siblings)
11 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-14 14:14 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 211 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 210 insertions(+), 1 deletion(-)
diff --git a/block/quorum.c b/block/quorum.c
index 8b449fb..24c8298 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -14,6 +14,20 @@
*/
#include "block_int.h"
+#include "zlib.h"
+
+typedef struct QuorumIOVectorItem {
+ int index;
+ QLIST_ENTRY(QuorumIOVectorItem) next;
+} QuorumIOVectorItem;
+
+typedef struct QuorumIOVectorVersion {
+ unsigned long checksum;
+ int index;
+ int vote_count;
+ QLIST_HEAD(, QuorumIOVectorItem) qiov_items;
+ QLIST_ENTRY(QuorumIOVectorVersion) next;
+} QuorumIOVectorVersion;
typedef struct {
BlockDriverState **bs;
@@ -48,6 +62,7 @@ struct QuorumAIOCB {
int success_count; /* number of successfully completed AIOCB */
bool *finished; /* completion signal for cancel */
+ QLIST_HEAD(, QuorumIOVectorVersion) vote_list;
void (*vote)(QuorumAIOCB *acb);
int vote_ret;
};
@@ -201,6 +216,11 @@ static void quorum_aio_bh(void *opaque)
}
qemu_bh_delete(acb->bh);
+
+ if (acb->vote_ret) {
+ ret = acb->vote_ret;
+ }
+
acb->common.cb(acb->common.opaque, ret);
if (acb->finished) {
*acb->finished = true;
@@ -233,6 +253,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
acb->nb_sectors = nb_sectors;
acb->vote = NULL;
acb->vote_ret = 0;
+ QLIST_INIT(&acb->vote_list);
for (i = 0; i < s->m; i++) {
acb->aios[i].buf = NULL;
@@ -260,10 +281,196 @@ static void quorum_aio_cb(void *opaque, int ret)
return;
}
+ /* Do the vote */
+ if (acb->vote) {
+ acb->vote(acb);
+ }
+
acb->bh = qemu_bh_new(quorum_aio_bh, acb);
qemu_bh_schedule(acb->bh);
}
+static void quorum_print_bad(QuorumAIOCB *acb, const char *filename)
+{
+ fprintf(stderr, "quorum: corrected error in quorum file %s: sector_num=%"
+ PRId64 " nb_sectors=%i\n", filename, acb->sector_num,
+ acb->nb_sectors);
+}
+
+static void quorum_print_failure(QuorumAIOCB *acb)
+{
+ fprintf(stderr, "quorum: failure sector_num=%" PRId64 " nb_sectors=%i\n",
+ acb->sector_num, acb->nb_sectors);
+}
+
+static void quorum_print_bad_versions(QuorumAIOCB *acb,
+ unsigned long checksum)
+{
+ QuorumIOVectorVersion *version;
+ QuorumIOVectorItem *item;
+ BDRVQuorumState *s = acb->bqs;
+
+ QLIST_FOREACH(version, &acb->vote_list, next) {
+ if (version->checksum == checksum) {
+ continue;
+ }
+ QLIST_FOREACH(item, &version->qiov_items, next) {
+ quorum_print_bad(acb, s->filenames[item->index]);
+ }
+ }
+}
+
+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+ int i;
+ assert(dest->niov == source->niov);
+ assert(dest->size == source->size);
+ for (i = 0; i < source->niov; i++) {
+ assert(dest->iov[i].iov_len == source->iov[i].iov_len);
+ memcpy(dest->iov[i].iov_base,
+ source->iov[i].iov_base,
+ source->iov[i].iov_len);
+ }
+}
+
+static void quorum_count_iovector_version(QuorumAIOCB *acb,
+ unsigned long checksum,
+ int index)
+{
+ QuorumIOVectorVersion *v = NULL, *version = NULL;
+ QuorumIOVectorItem *item;
+
+ /* look if we have something with this checksum */
+ QLIST_FOREACH(v, &acb->vote_list, next) {
+ if (v->checksum == checksum) {
+ version = v;
+ break;
+ }
+ }
+
+ /* this version is not in the list yet: add it */
+ if (!version) {
+ version = g_new0(QuorumIOVectorVersion, 1);
+ QLIST_INIT(&version->qiov_items);
+ version->checksum = checksum;
+ version->index = index;
+ version->vote_count = 0;
+ QLIST_INSERT_HEAD(&acb->vote_list, version, next);
+ }
+
+ version->vote_count += 1;
+
+ item = g_new0(QuorumIOVectorItem, 1);
+ item->index = index;
+ QLIST_INSERT_HEAD(&version->qiov_items, item, next);
+}
+
+#define QUORUM_FREE_QIOV_ITEMS(qlist) do { \
+ QLIST_FOREACH_SAFE(item, qlist, next, next_item) { \
+ QLIST_REMOVE(item, next); \
+ g_free(item); \
+ } } while (0)
+
+static void quorum_free_vote_list(QuorumAIOCB *acb)
+{
+ QuorumIOVectorVersion *version, *next_version;
+ QuorumIOVectorItem *item, *next_item;
+
+ QLIST_FOREACH_SAFE(version, &acb->vote_list, next, next_version) {
+ QLIST_REMOVE(version, next);
+ QUORUM_FREE_QIOV_ITEMS(&version->qiov_items);
+ g_free(version);
+ }
+}
+
+#undef QUORUM_FREE_QIOV_ITEMS
+
+static unsigned long quorum_compute_checksum(QuorumAIOCB *acb, int i)
+{
+ int j;
+ unsigned long adler = adler32(0L, Z_NULL, 0);
+ QEMUIOVector *qiov = &acb->qiovs[i];
+
+ for (j = 0; j < qiov->niov; j++) {
+ adler = adler32(adler,
+ qiov->iov[j].iov_base,
+ qiov->iov[j].iov_len);
+ }
+
+ return adler;
+}
+
+static void quorum_vote(QuorumAIOCB *acb)
+{
+ bool quorum = true;
+ int i, j;
+ unsigned long checksum = 0;
+ BDRVQuorumState *s = acb->bqs;
+ QuorumIOVectorVersion *candidate, *winner = NULL;
+
+ /* get the index of the first successful read */
+ for (i = 0; i < s->m; i++) {
+ if (!acb->aios[i].ret) {
+ break;
+ }
+ }
+
+ /* compare this read with all other successful reads looking for quorum */
+ for (j = i + 1; j < s->m; j++) {
+ if (acb->aios[j].ret) {
+ continue;
+ }
+ if (qemu_iovec_compare(&acb->qiovs[i],
+ &acb->qiovs[j]) != -1) {
+ quorum = false;
+ break;
+ }
+ }
+
+ /* Every successful read agrees -> Quorum */
+ if (quorum) {
+ quorum_copy_qiov(acb->qiov, &acb->qiovs[i]);
+ return;
+ }
+
+ /* compute checksums for each successful read, also store indexes */
+ for (i = 0; i < s->m; i++) {
+ if (acb->aios[i].ret) {
+ continue;
+ }
+ checksum = quorum_compute_checksum(acb, i);
+ quorum_count_iovector_version(acb, checksum, i);
+ }
+
+ /* vote to select the most represented version */
+ i = 0;
+ QLIST_FOREACH(candidate, &acb->vote_list, next) {
+ if (candidate->vote_count > i) {
+ i = candidate->vote_count;
+ winner = candidate;
+ }
+ }
+
+ /* if the winner count is smaller than the threshold, the read fails */
+ if (winner->vote_count < s->n) {
+ quorum_print_failure(acb);
+ acb->vote_ret = -EIO;
+ goto free_exit;
+ }
+
+ /* we have a winner: copy it */
+ quorum_copy_qiov(acb->qiov, &acb->qiovs[winner->index]);
+
+ /* if some versions are bad print them */
+ if (i < s->m) {
+ quorum_print_bad_versions(acb, winner->checksum);
+ }
+
+free_exit:
+ /* free lists */
+ quorum_free_vote_list(acb);
+}
+
static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
@@ -276,6 +483,8 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
nb_sectors, cb, opaque);
int i;
+ acb->vote = quorum_vote;
+
for (i = 0; i < s->m; i++) {
acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
qemu_iovec_init(&acb->qiovs[i], qiov->niov);
@@ -283,7 +492,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
}
for (i = 0; i < s->m; i++) {
- bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+ bdrv_aio_readv(s->bs[i], sector_num, &acb->qiovs[i], nb_sectors,
quorum_aio_cb, &acb->aios[i]);
}
--
1.7.9.5
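
The voting scheme in the patch above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the patch's code: the patch calls zlib's adler32() and groups results in QLISTs, whereas this sketch reimplements Adler-32 by hand to avoid linking zlib and counts votes with two plain loops. All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define SKETCH_ADLER_MOD 65521u

/* Adler-32 over one chunk, continuing from a previous checksum value.
 * The initial value is 1, which is what adler32(0L, Z_NULL, 0) yields. */
static uint32_t sketch_adler32(uint32_t adler, const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;

    for (size_t i = 0; i < len; i++) {
        a = (a + p[i]) % SKETCH_ADLER_MOD;
        b = (b + a) % SKETCH_ADLER_MOD;
    }
    return (b << 16) | a;
}

/* Checksum of a scattered buffer: feed the segments in order. */
static uint32_t sketch_iov_checksum(const struct iovec *iov, int niov)
{
    uint32_t adler = 1;

    for (int j = 0; j < niov; j++) {
        adler = sketch_adler32(adler, iov[j].iov_base, iov[j].iov_len);
    }
    return adler;
}

/* Pick the most represented version among the successful reads.
 * ret[i] != 0 marks a failed read, as in the patch.
 * Returns the index of one child holding the winning version,
 * or -1 when no version collects at least `threshold` votes. */
static int sketch_vote(const uint32_t *checksums, const int *ret,
                       int m, int threshold)
{
    int best_index = -1;
    int best_votes = 0;

    assert(threshold >= 1);
    for (int i = 0; i < m; i++) {
        int votes = 0;

        if (ret[i]) {
            continue;
        }
        for (int j = 0; j < m; j++) {
            if (!ret[j] && checksums[j] == checksums[i]) {
                votes++;
            }
        }
        if (votes > best_votes) {
            best_votes = votes;
            best_index = i;
        }
    }
    return best_votes >= threshold ? best_index : -1;
}
```

Note that two reads with the same checksum are treated as the same version here; a checksum match is weaker than the byte-for-byte comparison done by qemu_iovec_compare().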
* Re: [Qemu-devel] [RFC V3 9/9] quorum: Add quorum mechanism.
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 9/9] quorum: Add quorum mechanism Benoît Canet
@ 2012-08-15 10:51 ` Stefan Hajnoczi
0 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-08-15 10:51 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, Benoît Canet, qemu-devel, blauwirbel, anthony,
pbonzini, eblake, afaerber
On Tue, Aug 14, 2012 at 04:14:11PM +0200, Benoît Canet wrote:
> +#define QUORUM_FREE_QIOV_ITEMS(qlist) do { \
> + QLIST_FOREACH_SAFE(item, qlist, next, next_item) { \
> + QLIST_REMOVE(item, next); \
> + g_free(item); \
> + } } while (0)
This is only used once, please open code it and don't use a macro.
> +
> +static void quorum_free_vote_list(QuorumAIOCB *acb)
> +{
> + QuorumIOVectorVersion *version, *next_version;
> + QuorumIOVectorItem *item, *next_item;
> +
> + QLIST_FOREACH_SAFE(version, &acb->vote_list, next, next_version) {
> + QLIST_REMOVE(version, next);
> + QUORUM_FREE_QIOV_ITEMS(&version->qiov_items);
> + g_free(version);
> + }
> +}
> +
> +#undef QUORUM_FREE_QIOV_ITEMS
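
A minimal sketch of what open coding the inner loop could look like. It assumes hand-rolled singly linked lists in place of QEMU's QLIST so that it compiles on its own; the SketchItem and SketchVersion types and all sketch_* functions are hypothetical. With QLIST the same shape would be two nested QLIST_FOREACH_SAFE loops.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for QuorumIOVectorItem. */
typedef struct SketchItem {
    int index;
    struct SketchItem *next;
} SketchItem;

/* Stand-in for QuorumIOVectorVersion. */
typedef struct SketchVersion {
    unsigned long checksum;
    SketchItem *items;
    struct SketchVersion *next;
} SketchVersion;

/* Prepend a new version to the vote list. */
static SketchVersion *sketch_add_version(SketchVersion **head,
                                         unsigned long checksum)
{
    SketchVersion *v = calloc(1, sizeof(*v));

    assert(v != NULL);
    v->checksum = checksum;
    v->next = *head;
    *head = v;
    return v;
}

/* Record that child `index` returned this version. */
static void sketch_add_item(SketchVersion *v, int index)
{
    SketchItem *item = calloc(1, sizeof(*item));

    assert(item != NULL);
    item->index = index;
    item->next = v->items;
    v->items = item;
}

/* Free the whole vote list with two plain nested loops, no macro.
 * Returns the number of nodes freed (versions plus items). */
static int sketch_free_vote_list(SketchVersion **head)
{
    int freed = 0;

    while (*head) {
        SketchVersion *v = *head;

        *head = v->next;
        while (v->items) {
            SketchItem *item = v->items;

            v->items = item->next;
            free(item);
            freed++;
        }
        free(v);
        freed++;
    }
    return freed;
}
```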
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (8 preceding siblings ...)
2012-08-14 14:14 ` [Qemu-devel] [RFC V3 9/9] quorum: Add quorum mechanism Benoît Canet
@ 2012-08-14 18:47 ` Blue Swirl
2012-08-15 13:12 ` Stefan Hajnoczi
2012-08-20 10:12 ` Benoît Canet
11 siblings, 0 replies; 25+ messages in thread
From: Blue Swirl @ 2012-08-14 18:47 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, stefanha, qemu-devel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
On Tue, Aug 14, 2012 at 2:14 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> This patchset creates a block driver implementing a quorum using m QEMU disk
> images. Writes are mirrored on the m files.
> For reads, the m files are read at the same time and a vote is held to
> determine whether a qiov version is present n or more times. It then returns
> this majority version to the upper layers.
> When only i < n versions of the data are returned by the lower layer, the
> quorum is broken and the read returns -EIO.
>
> The goal is for this patchset to become a QEMU block filter living just
> above raw-*.c and below qcow2/qed once the required infrastructure is in place.
>
> The main users of this feature will be people using NFS appliances, which can
> be subject to bitflip errors.
>
> This patchset can be used to replace blkverify and the out-of-tree blkmirror.
>
> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
'n' and 'm' (these names are also used in the code) are not
descriptive. How about 'total' (or 'max') and 'quorum'?
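
A minimal sketch of how descriptive names could look when parsing the n/m part of the usage string. It assumes 'quorum' maps to n (the vote threshold) and 'total' to m (the number of images), and that a threshold larger than the number of images should be rejected; neither the SketchQuorumParams type nor sketch_parse_ratio() exists in the series, and the real parser also has to handle the escaped ':' separators.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parameter block using descriptive names. */
typedef struct {
    int quorum;  /* 'n' in the patches: votes needed for a read to succeed */
    int total;   /* 'm' in the patches: number of mirrored images */
} SketchQuorumParams;

/* Parse a string of the form "n/m".
 * Returns 0 on success, -1 on malformed or inconsistent input. */
static int sketch_parse_ratio(const char *s, SketchQuorumParams *out)
{
    int n, m;
    char trailing;

    assert(s != NULL && out != NULL);
    /* Exactly two integers must match; a third conversion means junk. */
    if (sscanf(s, "%d/%d%c", &n, &m, &trailing) != 2) {
        return -1;
    }
    if (n < 1 || m < n) {
        return -1;
    }
    out->quorum = n;
    out->total = m;
    return 0;
}
```

For instance, "2/3" would yield quorum = 2 and total = 3, while "4/3" would be rejected under the assumption above.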
>
> in v2:
>
> eblake: fix typos
> squash two first commits
>
> afärber: Modify the Makefile on first commit
>
> bcanet: move function prototype of quorum.c one patch down
>
> in v3:
>
> Blue Swirl: change char * to uint8_t * in QuorumSingleAIOCB
>
> Eric Blake: Add escaping of the : separator
> Allow to specify the n/m ratio parameters of the Quorum
>
> Stefan Hajnoczi: Squash quorum_close and quorum_open patch to avoid leak
> Add missing bdrv_delete() in quorum_close
> simpler quorum_getlength
> make the quorum_check_ret threshold a user setting (bind it to n)
> move blkverify_iovec_clone() and blkverify_iovec_compare() to cutils.c
> free unconditionally qemu_blockalign() with qemu_vfree()
> turn assignment into assert in quorum_copy_qiov()
>
> Benoît Canet (9):
> quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
> quorum: Create BDRVQuorumState and BlkDriver and do init.
> quorum: Add quorum_open() and quorum_close().
> quorum: Add quorum_getlength().
> quorum: Add quorum_aio_writev and its dependencies.
> blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
> blkverify.
> quorum: Add quorum_co_flush().
> quorum: Add quorum_aio_readv.
> quorum: Add quorum mechanism.
>
> block/Makefile.objs | 1 +
> block/blkverify.c | 108 +---------
> block/quorum.c | 559 +++++++++++++++++++++++++++++++++++++++++++++++++++
> cutils.c | 103 ++++++++++
> qemu-common.h | 2 +
> 5 files changed, 667 insertions(+), 106 deletions(-)
> create mode 100644 block/quorum.c
>
> --
> 1.7.9.5
>
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (9 preceding siblings ...)
2012-08-14 18:47 ` [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Blue Swirl
@ 2012-08-15 13:12 ` Stefan Hajnoczi
2012-08-20 10:12 ` Benoît Canet
11 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-08-15 13:12 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
eblake, afaerber, Benoît Canet
On Tue, Aug 14, 2012 at 3:14 PM, Benoît Canet <benoit.canet@gmail.com> wrote:
> This patchset creates a block driver implementing a quorum using m QEMU disk
> images. Writes are mirrored on the m files.
> For reads, the m files are read at the same time and a vote is held to
> determine whether a qiov version is present n or more times. It then returns
> this majority version to the upper layers.
> When only i < n versions of the data are returned by the lower layer, the
> quorum is broken and the read returns -EIO.
>
> The goal is for this patchset to become a QEMU block filter living just
> above raw-*.c and below qcow2/qed once the required infrastructure is in place.
>
> The main users of this feature will be people using NFS appliances, which can
> be subject to bitflip errors.
>
> This patchset can be used to replace blkverify and the out-of-tree blkmirror.
>
> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
>
> in v2:
>
> eblake: fix typos
> squash two first commits
>
> afärber: Modify the Makefile on first commit
>
> bcanet: move function prototype of quorum.c one patch down
>
> in v3:
>
> Blue Swirl: change char * to uint8_t * in QuorumSingleAIOCB
>
> Eric Blake: Add escaping of the : separator
> Allow to specify the n/m ratio parameters of the Quorum
>
> Stefan Hajnoczi: Squash quorum_close and quorum_open patch to avoid leak
> Add missing bdrv_delete() in quorum_close
> simpler quorum_getlength
> make the quorum_check_ret threshold a user setting (bind it to n)
> move blkverify_iovec_clone() and blkverify_iovec_compare() to cutils.c
> free unconditionally qemu_blockalign() with qemu_vfree()
> turn assignment into assert in quorum_copy_qiov()
>
> Benoît Canet (9):
> quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
> quorum: Create BDRVQuorumState and BlkDriver and do init.
> quorum: Add quorum_open() and quorum_close().
> quorum: Add quorum_getlength().
> quorum: Add quorum_aio_writev and its dependencies.
> blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
> blkverify.
> quorum: Add quorum_co_flush().
> quorum: Add quorum_aio_readv.
> quorum: Add quorum mechanism.
>
> block/Makefile.objs | 1 +
> block/blkverify.c | 108 +---------
> block/quorum.c | 559 +++++++++++++++++++++++++++++++++++++++++++++++++++
> cutils.c | 103 ++++++++++
> qemu-common.h | 2 +
> 5 files changed, 667 insertions(+), 106 deletions(-)
> create mode 100644 block/quorum.c
BTW once this feature is merged we could drop blkverify since quorum
is more generic and provides blkverify behavior in the n=2/m=2
configuration.
Stefan
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-14 14:14 [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency Benoît Canet
` (10 preceding siblings ...)
2012-08-15 13:12 ` Stefan Hajnoczi
@ 2012-08-20 10:12 ` Benoît Canet
2012-08-20 11:23 ` Stefan Hajnoczi
11 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-20 10:12 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, stefanha, qemu-devel, blauwirbel, anthony, pbonzini,
eblake, afaerber
On Tuesday 14 Aug 2012 at 16:14:02 (+0200), Benoît Canet wrote:
> This patchset creates a block driver implementing a quorum using m QEMU disk
> images. Writes are mirrored on the m files.
> For reads, the m files are read at the same time and a vote is held to
> determine whether a qiov version is present n or more times. It then returns
> this majority version to the upper layers.
> When only i < n versions of the data are returned by the lower layer, the
> quorum is broken and the read returns -EIO.
>
> The goal is for this patchset to become a QEMU block filter living just
> above raw-*.c and below qcow2/qed once the required infrastructure is in place.
>
> The main users of this feature will be people using NFS appliances, which can
> be subject to bitflip errors.
>
> This patchset can be used to replace blkverify and the out-of-tree blkmirror.
>
> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
stefanha: what would be needed to get copy-on-read (COR) and image streaming
working with quorum.c?
The same question applies to live migration.
Benoît
>
> in v2:
>
> eblake: fix typos
> squash two first commits
>
> afärber: Modify the Makefile on first commit
>
> bcanet: move function prototype of quorum.c one patch down
>
> in v3:
>
> Blue Swirl: change char * to uint8_t * in QuorumSingleAIOCB
>
> Eric Blake: Add escaping of the : separator
> Allow to specify the n/m ratio parameters of the Quorum
>
> Stefan Hajnoczi: Squash quorum_close and quorum_open patch to avoid leak
> Add missing bdrv_delete() in quorum_close
> simpler quorum_getlength
> make the quorum_check_ret threshold a user setting (bind it to n)
> move blkverify_iovec_clone() and blkverify_iovec_compare() to cutils.c
> free unconditionally qemu_blockalign() with qemu_vfree()
> turn assignment into assert in quorum_copy_qiov()
>
> Benoît Canet (9):
> quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
> quorum: Create BDRVQuorumState and BlkDriver and do init.
> quorum: Add quorum_open() and quorum_close().
> quorum: Add quorum_getlength().
> quorum: Add quorum_aio_writev and its dependencies.
> blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
> blkverify.
> quorum: Add quorum_co_flush().
> quorum: Add quorum_aio_readv.
> quorum: Add quorum mechanism.
>
> block/Makefile.objs | 1 +
> block/blkverify.c | 108 +---------
> block/quorum.c | 559 +++++++++++++++++++++++++++++++++++++++++++++++++++
> cutils.c | 103 ++++++++++
> qemu-common.h | 2 +
> 5 files changed, 667 insertions(+), 106 deletions(-)
> create mode 100644 block/quorum.c
>
> --
> 1.7.9.5
>
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-20 10:12 ` Benoît Canet
@ 2012-08-20 11:23 ` Stefan Hajnoczi
2012-08-20 11:24 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-08-20 11:23 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
anthony, pbonzini, eblake, afaerber
On Mon, Aug 20, 2012 at 11:12 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> On Tuesday 14 Aug 2012 at 16:14:02 (+0200), Benoît Canet wrote:
>> This patchset create a block driver implementing a quorum using m qemu disk
>> images. Writes are mirrored on the m files.
>> For the reading part the m files are read at the same time and a vote is
>> done to determine if a qiov version is present n or more times. It then return
>> this majority version to the upper layers.
>> When i < n versions of the data are returned by the lower layer the
>> quorum is broken and the read return -EIO.
>>
>> The goal of this patchset is to be turned in a QEMU block filter living just
>> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
>>
>> Main use of this feature will be people using NFS appliances which can be
>> subjected to bitflip errors.
>>
>> This patchset can be used to replace blkverify and the out of tree blkmirror.
>>
>> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
>
> stefanha: I am wondering what would be needed to do in order to have COR and streaming working
> with quorum.c ?
.bdrv_is_allocated()/.bdrv_co_is_allocated() needs to be supported by
block/quorum.c. Have you tried it and found a problem?
Stefan
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-20 11:23 ` Stefan Hajnoczi
@ 2012-08-20 11:24 ` Stefan Hajnoczi
2012-08-20 11:42 ` Benoît Canet
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-08-20 11:24 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
anthony, pbonzini, eblake, afaerber
On Mon, Aug 20, 2012 at 12:23 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Aug 20, 2012 at 11:12 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
>> On Tuesday 14 Aug 2012 at 16:14:02 (+0200), Benoît Canet wrote:
>>> This patchset create a block driver implementing a quorum using m qemu disk
>>> images. Writes are mirrored on the m files.
>>> For the reading part the m files are read at the same time and a vote is
>>> done to determine if a qiov version is present n or more times. It then return
>>> this majority version to the upper layers.
>>> When i < n versions of the data are returned by the lower layer the
>>> quorum is broken and the read return -EIO.
>>>
>>> The goal of this patchset is to be turned in a QEMU block filter living just
>>> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
>>>
>>> Main use of this feature will be people using NFS appliances which can be
>>> subjected to bitflip errors.
>>>
>>> This patchset can be used to replace blkverify and the out of tree blkmirror.
>>>
>>> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
>>
>> stefanha: I am wondering what would be needed to do in order to have COR and streaming working
>> with quorum.c ?
>
> .bdrv_is_allocated()/.bdrv_co_is_allocated() needs to be supported by
> block/quorum.c. Have you tried it and found a problem?
Just want to confirm you are thinking about image streaming on top of
quorum? Or are you thinking about streaming underneath quorum?
Stefan
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-20 11:24 ` Stefan Hajnoczi
@ 2012-08-20 11:42 ` Benoît Canet
2012-08-20 12:56 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-20 11:42 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Benoît Canet, kwolf, Benoît Canet, stefanha, qemu-devel,
blauwirbel, anthony, pbonzini, eblake, afaerber
On Monday 20 Aug 2012 at 12:24:33 (+0100), Stefan Hajnoczi wrote:
> On Mon, Aug 20, 2012 at 12:23 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Aug 20, 2012 at 11:12 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> >> On Tuesday 14 Aug 2012 at 16:14:02 (+0200), Benoît Canet wrote:
> >>> This patchset create a block driver implementing a quorum using m qemu disk
> >>> images. Writes are mirrored on the m files.
> >>> For the reading part the m files are read at the same time and a vote is
> >>> done to determine if a qiov version is present n or more times. It then return
> >>> this majority version to the upper layers.
> >>> When i < n versions of the data are returned by the lower layer the
> >>> quorum is broken and the read return -EIO.
> >>>
> >>> The goal of this patchset is to be turned in a QEMU block filter living just
> >>> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
> >>>
> >>> Main use of this feature will be people using NFS appliances which can be
> >>> subjected to bitflip errors.
> >>>
> >>> This patchset can be used to replace blkverify and the out of tree blkmirror.
> >>>
> >>> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
> >>
> >> stefanha: I am wondering what would be needed to do in order to have COR and streaming working
> >> with quorum.c ?
> >
> > .bdrv_is_allocated()/.bdrv_co_is_allocated() needs to be supported by
> > block/quorum.c. Have you tried it and found a problem?
>
> Just want to confirm you are thinking about image streaming on top of
> quorum? Or are you thinking about streaming underneath quorum?
I am thinking about streaming with quorum on top of a bunch of backing files,
i.e. data landing in the topmost backing file, the one living just under quorum.
Benoît
>
> Stefan
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-20 11:42 ` Benoît Canet
@ 2012-08-20 12:56 ` Stefan Hajnoczi
2012-08-20 14:03 ` Benoît Canet
0 siblings, 1 reply; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-08-20 12:56 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
anthony, pbonzini, eblake, afaerber
On Mon, Aug 20, 2012 at 12:42 PM, Benoît Canet <benoit.canet@irqsave.net> wrote:
>
> On Monday 20 Aug 2012 at 12:24:33 (+0100), Stefan Hajnoczi wrote:
>> On Mon, Aug 20, 2012 at 12:23 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> > On Mon, Aug 20, 2012 at 11:12 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
>> >> On Tuesday 14 Aug 2012 at 16:14:02 (+0200), Benoît Canet wrote:
>> >>> This patchset create a block driver implementing a quorum using m qemu disk
>> >>> images. Writes are mirrored on the m files.
>> >>> For the reading part the m files are read at the same time and a vote is
>> >>> done to determine if a qiov version is present n or more times. It then return
>> >>> this majority version to the upper layers.
>> >>> When i < n versions of the data are returned by the lower layer the
>> >>> quorum is broken and the read return -EIO.
>> >>>
>> >>> The goal of this patchset is to be turned in a QEMU block filter living just
>> >>> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
>> >>>
>> >>> Main use of this feature will be people using NFS appliances which can be
>> >>> subjected to bitflip errors.
>> >>>
>> >>> This patchset can be used to replace blkverify and the out of tree blkmirror.
>> >>>
>> >>> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
>> >>
>> >> stefanha: I am wondering what would be needed to do in order to have COR and streaming working
>> >> with quorum.c ?
>> >
>> > .bdrv_is_allocated()/.bdrv_co_is_allocated() needs to be supported by
>> > block/quorum.c. Have you tried it and found a problem?
>>
>> Just want to confirm you are thinking about image streaming on top of
>> quorum? Or are you thinking about streaming underneath quorum?
>
> I am thinking about streaming with quorum on top of a bunch of backing files.
> ie: data landing into the higher level backing file living just under quorum.
If there are backing files then there must be qcow2 or another image
format on top of quorum:
qcow2 ("virtio0")
+------ foo.img (file)
+------ quorum (backing_hd)
        +------- backing_a.img
        +------- backing_b.img
        +------- backing_c.img
.bdrv_is_allocated()/.bdrv_co_is_allocated() must be supported by quorum.
Stefan
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-20 12:56 ` Stefan Hajnoczi
@ 2012-08-20 14:03 ` Benoît Canet
2012-08-20 15:28 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-20 14:03 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Benoît Canet, kwolf, Benoît Canet, stefanha, qemu-devel,
blauwirbel, anthony, pbonzini, eblake, afaerber
On Monday 20 Aug 2012 at 13:56:53 (+0100), Stefan Hajnoczi wrote:
> On Mon, Aug 20, 2012 at 12:42 PM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> >
> > On Monday 20 Aug 2012 at 12:24:33 (+0100), Stefan Hajnoczi wrote:
> >> On Mon, Aug 20, 2012 at 12:23 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >> > On Mon, Aug 20, 2012 at 11:12 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> >> >> On Tuesday 14 Aug 2012 at 16:14:02 (+0200), Benoît Canet wrote:
> >> >>> This patchset create a block driver implementing a quorum using m qemu disk
> >> >>> images. Writes are mirrored on the m files.
> >> >>> For the reading part the m files are read at the same time and a vote is
> >> >>> done to determine if a qiov version is present n or more times. It then return
> >> >>> this majority version to the upper layers.
> >> >>> When i < n versions of the data are returned by the lower layer the
> >> >>> quorum is broken and the read return -EIO.
> >> >>>
> >> >>> The goal of this patchset is to be turned in a QEMU block filter living just
> >> >>> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
> >> >>>
> >> >>> Main use of this feature will be people using NFS appliances which can be
> >> >>> subjected to bitflip errors.
> >> >>>
> >> >>> This patchset can be used to replace blkverify and the out of tree blkmirror.
> >> >>>
> >> >>> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
> >> >>
> >> >> stefanha: I am wondering what would be needed to do in order to have COR and streaming working
> >> >> with quorum.c ?
> >> >
> >> > .bdrv_is_allocated()/.bdrv_co_is_allocated() needs to be supported by
> >> > block/quorum.c. Have you tried it and found a problem?
> >>
> >> Just want to confirm you are thinking about image streaming on top of
> >> quorum? Or are you thinking about streaming underneath quorum?
> >
> > I am thinking about streaming with quorum on top of a bunch of backing files.
> > ie: data landing into the higher level backing file living just under quorum.
>
> If there are backing files then there must be qcow2 or another image
> format on top of quorum:
>
> qcow2 ("virtio0")
> +------ foo.img (file)
> +------ quorum (backing_hd)
> +------- backing_a.img
> +------- backing_b.img
> +------- backing_c.img
>
> .bdrv_is_allocated()/.bdrv_co_is_allocated() must be supported by quorum.
This seems like low-hanging fruit.
Too bad we cannot easily create a new snapshot of a quorum file without full
block filter support in QEMU.
Am I right?
Benoît
>
> Stefan
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [RFC V3 0/9] Quorum disk image corruption resiliency
2012-08-20 14:03 ` Benoît Canet
@ 2012-08-20 15:28 ` Stefan Hajnoczi
0 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-08-20 15:28 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, Benoît Canet, stefanha, qemu-devel, blauwirbel,
anthony, pbonzini, eblake, afaerber
On Mon, Aug 20, 2012 at 3:03 PM, Benoît Canet <benoit.canet@irqsave.net> wrote:
> On Monday 20 Aug 2012 at 13:56:53 (+0100), Stefan Hajnoczi wrote:
>> On Mon, Aug 20, 2012 at 12:42 PM, Benoît Canet <benoit.canet@irqsave.net> wrote:
>> >
>> > On Monday 20 Aug 2012 at 12:24:33 (+0100), Stefan Hajnoczi wrote:
>> >> On Mon, Aug 20, 2012 at 12:23 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> >> > On Mon, Aug 20, 2012 at 11:12 AM, Benoît Canet <benoit.canet@irqsave.net> wrote:
>> >> >> On Tuesday 14 Aug 2012 at 16:14:02 (+0200), Benoît Canet wrote:
>> >> >>> This patchset creates a block driver implementing a quorum using m QEMU disk
>> >> >>> images. Writes are mirrored to the m files.
>> >> >>> For reads, the m files are read at the same time and a vote is held to
>> >> >>> determine whether a qiov version is present n or more times. The driver then
>> >> >>> returns this majority version to the upper layers.
>> >> >>> When i < n versions of the data are returned by the lower layer, the
>> >> >>> quorum is broken and the read returns -EIO.
>> >> >>>
>> >> >>> The goal is for this patchset to be turned into a QEMU block filter living just
>> >> >>> above raw-*.c and below qcow2/qed once the required infrastructure is done.
>> >> >>>
>> >> >>> The main users of this feature will be people using NFS appliances, which can
>> >> >>> be subject to bitflip errors.
>> >> >>>
>> >> >>> This patchset can be used to replace blkverify and the out-of-tree blkmirror.
>> >> >>>
>> >> >>> usage: -drive file=quorum:n/m:image_1.raw:...:image_m.raw,if=virtio,cache=none
>> >> >>
>> >> >> stefanha: I am wondering what would need to be done in order to have COR and
>> >> >> streaming working with quorum.c?
>> >> >
>> >> > .bdrv_is_allocated()/.bdrv_co_is_allocated() needs to be supported by
>> >> > block/quorum.c. Have you tried it and found a problem?
>> >>
>> >> Just want to confirm you are thinking about image streaming on top of
>> >> quorum? Or are you thinking about streaming underneath quorum?
>> >
>> > I am thinking about streaming with quorum on top of a bunch of backing files.
>> > ie: data landing into the higher level backing file living just under quorum.
>>
>> If there are backing files then there must be qcow2 or another image
>> format on top of quorum:
>>
>> qcow2 ("virtio0")
>> +------ foo.img (file)
>> +------ quorum (backing_hd)
>> +------- backing_a.img
>> +------- backing_b.img
>> +------- backing_c.img
>>
>> .bdrv_is_allocated()/.bdrv_co_is_allocated() must be supported by quorum.
>
> This seems like low-hanging fruit.
>
> Too bad we cannot easily create a new snapshot of a quorum file without full
> block filter support in QEMU.
> Am I right?
If the snapshot also needs to use quorum then we definitely need more
logic during the snapshot creation process than exists today.
Stefan
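For reference, the read-side vote described in the quoted cover letter (m
replies, a version must appear n or more times, otherwise -EIO) can be
sketched as below. This is a minimal standalone illustration, not code from
the patch series: quorum_vote() and its parameters are hypothetical, it
compares flat byte buffers rather than qiovs, and it uses a simple pairwise
memcmp() instead of whatever comparison the driver actually performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given m replies of len bytes each, return the
 * index of the first reply whose contents occur at least n times, or
 * -EIO when no version reaches the threshold (the quorum is broken).
 */
static int quorum_vote(const unsigned char *const *bufs, int m, int n,
                       size_t len)
{
    assert(bufs != NULL && m >= 1 && n >= 1 && n <= m);

    for (int i = 0; i < m; i++) {
        int votes = 0;

        /* Count how many replies, including reply i itself, match reply i. */
        for (int j = 0; j < m; j++) {
            if (memcmp(bufs[i], bufs[j], len) == 0) {
                votes++;
            }
        }
        if (votes >= n) {
            return i;
        }
    }

    return -EIO;
}
```

With three replies where two agree, a 2/3 quorum picks one of the agreeing
replies, while a 3/3 quorum fails with -EIO. The pairwise comparison costs
O(m^2) buffer compares, which is acceptable for the small m implied by the
usage line but is an assumption of this sketch, not a property of the
driver.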
^ permalink raw reply [flat|nested] 25+ messages in thread