* [Qemu-devel] [RFC 01/12] qorum: Add GPL v2+ header file.
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 14:04 ` Eric Blake
2012-08-02 16:07 ` Andreas Färber
2012-08-02 10:16 ` [Qemu-devel] [RFC 02/12] qorum: Add QorumSingleAIOCB and QorumAIOCB Benoît Canet
` (13 subsequent siblings)
14 siblings, 2 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
create mode 100644 block/qorum.c
diff --git a/block/qorum.c b/block/qorum.c
new file mode 100644
index 0000000..3341021
--- /dev/null
+++ b/block/qorum.c
@@ -0,0 +1,15 @@
+/*
+ * Qorum Block filter
+ *
+ * Copyright (C) Nodalink, SARL. 2012
+ *
+ * Author:
+ * Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
+ * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
--
1.7.9.5
* Re: [Qemu-devel] [RFC 01/12] qorum: Add GPL v2+ header file.
2012-08-02 10:16 ` [Qemu-devel] [RFC 01/12] qorum: Add GPL v2+ header file Benoît Canet
@ 2012-08-02 14:04 ` Eric Blake
2012-08-02 14:55 ` Benoît Canet
2012-08-02 16:07 ` Andreas Färber
1 sibling, 1 reply; 25+ messages in thread
From: Eric Blake @ 2012-08-02 14:04 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
On 08/02/2012 04:16 AM, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/qorum.c | 15 +++++++++++++++
s/qorum/quorum/ for the file name.
> 1 file changed, 15 insertions(+)
> create mode 100644 block/qorum.c
>
> diff --git a/block/qorum.c b/block/qorum.c
> new file mode 100644
> index 0000000..3341021
> --- /dev/null
> +++ b/block/qorum.c
> @@ -0,0 +1,15 @@
> +/*
> + */
> +
>
What good is a patch that adds a file whose sole contents are a comment?
This should not be an independent patch; please squash it into the
first commit that really adds contents to the file.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [RFC 01/12] qorum: Add GPL v2+ header file.
2012-08-02 14:04 ` Eric Blake
@ 2012-08-02 14:55 ` Benoît Canet
0 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 14:55 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
On Thursday 02 Aug 2012 at 08:04:46 (-0600), Eric Blake wrote:
> On 08/02/2012 04:16 AM, Benoît Canet wrote:
> > Signed-off-by: Benoit Canet <benoit@irqsave.net>
> > ---
> > block/qorum.c | 15 +++++++++++++++
>
> s/qorum/quorum/ for the file name.
ack
>
> > 1 file changed, 15 insertions(+)
> > create mode 100644 block/qorum.c
> >
> > diff --git a/block/qorum.c b/block/qorum.c
> > new file mode 100644
> > index 0000000..3341021
> > --- /dev/null
> > +++ b/block/qorum.c
> > @@ -0,0 +1,15 @@
> > +/*
>
> > + */
> > +
> >
>
> What good is a patch that adds a file whose sole contents are a comment?
> This should not be an independent patch; please squash it into the
> first commit that really adds contents to the file.
ack
>
> --
> Eric Blake eblake@redhat.com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
* Re: [Qemu-devel] [RFC 01/12] qorum: Add GPL v2+ header file.
2012-08-02 10:16 ` [Qemu-devel] [RFC 01/12] qorum: Add GPL v2+ header file Benoît Canet
2012-08-02 14:04 ` Eric Blake
@ 2012-08-02 16:07 ` Andreas Färber
2012-08-02 16:47 ` Benoît Canet
1 sibling, 1 reply; 25+ messages in thread
From: Andreas Färber @ 2012-08-02 16:07 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
On 02.08.2012 at 12:16, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/qorum.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
> create mode 100644 block/qorum.c
Did you forget to add the header file, or did you mean source file? :)
Andreas
--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
* [Qemu-devel] [RFC 02/12] qorum: Add QorumSingleAIOCB and QorumAIOCB.
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 01/12] qorum: Add GPL v2+ header file Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 03/12] qorum: Create BDRVQorumState and BlkDriver and do init Benoît Canet
` (12 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/block/qorum.c b/block/qorum.c
index 3341021..5b4f031 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -13,3 +13,33 @@
* See the COPYING file in the top-level directory.
*/
+#include "block_int.h"
+
+typedef struct QorumAIOCB QorumAIOCB;
+
+typedef struct QorumSingleAIOCB {
+ BlockDriverAIOCB *aiocb;
+ char *buf;
+ int ret;
+ QorumAIOCB *parent;
+} QorumSingleAIOCB;
+
+struct QorumAIOCB {
+ BlockDriverAIOCB common;
+ QEMUBH *bh;
+
+ /* Request metadata */
+ bool is_write;
+ int64_t sector_num;
+ int nb_sectors;
+
+ QEMUIOVector *qiov; /* calling readv IOV */
+
+ QorumSingleAIOCB aios[3]; /* individual AIOs */
+ QEMUIOVector qiovs[3]; /* individual IOVs */
+ int count; /* number of completed AIOCB */
+ bool *finished; /* completion signal for cancel */
+
+ void (*vote)(QorumAIOCB *acb);
+ int vote_ret;
+};
--
1.7.9.5
* [Qemu-devel] [RFC 03/12] qorum: Create BDRVQorumState and BlkDriver and do init.
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 01/12] qorum: Add GPL v2+ header file Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 02/12] qorum: Add QorumSingleAIOCB and QorumAIOCB Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 04/12] qorum: Add qorum_open() Benoît Canet
` (11 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/block/qorum.c b/block/qorum.c
index 5b4f031..ea2a720 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -15,6 +15,10 @@
#include "block_int.h"
+typedef struct {
+ BlockDriverState * bs[3];
+} BDRVQorumState;
+
typedef struct QorumAIOCB QorumAIOCB;
typedef struct QorumSingleAIOCB {
@@ -43,3 +47,17 @@ struct QorumAIOCB {
void (*vote)(QorumAIOCB *acb);
int vote_ret;
};
+
+static BlockDriver bdrv_qorum = {
+ .format_name = "qorum",
+ .protocol_name = "qorum",
+
+ .instance_size = sizeof(BDRVQorumState),
+};
+
+static void bdrv_qorum_init(void)
+{
+ bdrv_register(&bdrv_qorum);
+}
+
+block_init(bdrv_qorum_init);
--
1.7.9.5
* [Qemu-devel] [RFC 04/12] qorum: Add qorum_open().
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (2 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 03/12] qorum: Create BDRVQorumState and BlkDriver and do init Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 05/12] qorum: Add qorum_close() Benoît Canet
` (10 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
diff --git a/block/qorum.c b/block/qorum.c
index ea2a720..bdf1530 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -48,11 +48,73 @@ struct QorumAIOCB {
int vote_ret;
};
+/* Valid qorum filenames look like
+ * qorum:path/to/a_image:path/to/b_image:path/to/c_image
+ */
+static int qorum_open(BlockDriverState *bs, const char *filename, int flags)
+{
+ BDRVQorumState *s = bs->opaque;
+ int ret, i;
+ char *a, *b, *c, *filenames[3];
+
+ /* Parse the qorum: prefix */
+ if (strncmp(filename, "qorum:", strlen("qorum:"))) {
+ return -EINVAL;
+ }
+ a = g_strdup(filename + strlen("qorum:"));
+
+ /* Find separators */
+ b = strchr(a, ':');
+ if (b == NULL) {
+ return -EINVAL;
+ }
+
+ c = strrchr(a, ':');
+ if (c == NULL) {
+ return -EINVAL;
+ }
+
+ /* Check that filename contains two separate ':' */
+ if (b == c) {
+ return -EINVAL;
+ }
+
+ /* Split string */
+ *b = '\0';
+ *c = '\0';
+
+ filenames[0] = a;
+ filenames[1] = b + 1;
+ filenames[2] = c + 1;
+
+ /* Open files */
+ for (i = 0; i <= 2; i++) {
+ s->bs[i] = bdrv_new("");
+ ret = bdrv_open(s->bs[i], filenames[i], flags, NULL);
+ if (ret < 0) {
+ goto error_exit;
+ }
+ }
+
+ goto clean_exit;
+
+error_exit:
+ for (; i >= 0; i--) {
+ bdrv_delete(s->bs[i]);
+ s->bs[i] = NULL;
+ }
+clean_exit:
+ g_free(a);
+ return ret;
+}
+
static BlockDriver bdrv_qorum = {
.format_name = "qorum",
.protocol_name = "qorum",
.instance_size = sizeof(BDRVQorumState),
+
+ .bdrv_file_open = qorum_open,
};
static void bdrv_qorum_init(void)
--
1.7.9.5
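The "qorum:path/to/a_image:path/to/b_image:path/to/c_image" format handled by qorum_open() can be sketched outside QEMU as below. This is a minimal standalone sketch under stated assumptions, not the code from the series: split_quorum_filename() and dup_range() are hypothetical names, plain malloc()/free() stand in for g_strdup()/g_free(), and the first and last ':' are taken as separators, as in the patch. Unlike the patch, it hands back three separate copies and releases everything on each error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define QUORUM_PREFIX "qorum:"   /* prefix as spelled in the patch */

/* Hypothetical helper: copy the bytes in [start, end) into a new string. */
static char *dup_range(const char *start, const char *end)
{
    size_t len = (size_t)(end - start);
    char *s = malloc(len + 1);

    if (s != NULL) {
        memcpy(s, start, len);
        s[len] = '\0';
    }
    return s;
}

/* Hypothetical helper: split "qorum:a:b:c" into three heap-allocated
 * names. Returns 0 on success or -EINVAL/-ENOMEM; on failure out[] is
 * left untouched and nothing is leaked. */
static int split_quorum_filename(const char *filename, char *out[3])
{
    size_t plen = strlen(QUORUM_PREFIX);
    const char *a, *b, *c;
    char *n0, *n1, *n2;

    if (strncmp(filename, QUORUM_PREFIX, plen) != 0) {
        return -EINVAL;              /* missing prefix */
    }
    a = filename + plen;

    b = strchr(a, ':');              /* first separator */
    c = strrchr(a, ':');             /* last separator */
    if (b == NULL || b == c) {
        return -EINVAL;              /* fewer than two separators */
    }

    n0 = dup_range(a, b);
    n1 = dup_range(b + 1, c);
    n2 = dup_range(c + 1, c + 1 + strlen(c + 1));
    if (n0 == NULL || n1 == NULL || n2 == NULL) {
        free(n0);
        free(n1);
        free(n2);
        return -ENOMEM;
    }

    out[0] = n0;
    out[1] = n1;
    out[2] = n2;
    return 0;
}
```

A caller would pass the full filename and free() the three strings once the images are opened. Because the first and last ':' are used, any extra ':' ends up inside the middle name.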
* [Qemu-devel] [RFC 05/12] qorum: Add qorum_close().
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (3 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 04/12] qorum: Add qorum_open() Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 06/12] qorum: Add qorum_getlength() Benoît Canet
` (9 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/block/qorum.c b/block/qorum.c
index bdf1530..006ab8c 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -108,6 +108,17 @@ clean_exit:
return ret;
}
+static void qorum_close(BlockDriverState *bs)
+{
+ BDRVQorumState *s = bs->opaque;
+ int i;
+
+ /* Ensure writes reach stable storage */
+ for (i = 0; i <= 2; i++) {
+ bdrv_flush(s->bs[i]);
+ }
+}
+
static BlockDriver bdrv_qorum = {
.format_name = "qorum",
.protocol_name = "qorum",
@@ -115,6 +126,7 @@ static BlockDriver bdrv_qorum = {
.instance_size = sizeof(BDRVQorumState),
.bdrv_file_open = qorum_open,
+ .bdrv_close = qorum_close,
};
static void bdrv_qorum_init(void)
--
1.7.9.5
* [Qemu-devel] [RFC 06/12] qorum: Add qorum_getlength().
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (4 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 05/12] qorum: Add qorum_close() Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 07/12] qorum: Add qorum_aio_writev and its dependencies Benoît Canet
` (8 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/block/qorum.c b/block/qorum.c
index 006ab8c..37f6514 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -119,12 +119,29 @@ static void qorum_close(BlockDriverState *bs)
}
}
+static int64_t qorum_getlength(BlockDriverState *bs)
+{
+ BDRVQorumState *s = bs->opaque;
+ int i;
+ int64_t ret;
+
+ /* return the length of the first available qorum file */
+ for (i = 0, ret = -ENOMEDIUM;
+ ret == -ENOMEDIUM && i <= 2; i++) {
+ ret = bdrv_getlength(s->bs[i]);
+ }
+
+ return ret;
+}
+
static BlockDriver bdrv_qorum = {
.format_name = "qorum",
.protocol_name = "qorum",
.instance_size = sizeof(BDRVQorumState),
+ .bdrv_getlength = qorum_getlength,
+
.bdrv_file_open = qorum_open,
.bdrv_close = qorum_close,
};
--
1.7.9.5
* [Qemu-devel] [RFC 07/12] qorum: Add qorum_aio_writev and its dependencies.
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (5 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 06/12] qorum: Add qorum_getlength() Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 08/12] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public Benoît Canet
` (7 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)
diff --git a/block/qorum.c b/block/qorum.c
index 37f6514..3dae8e4 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -134,6 +134,116 @@ static int64_t qorum_getlength(BlockDriverState *bs)
return ret;
}
+static void qorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+ QorumAIOCB *acb = container_of(blockacb, QorumAIOCB, common);
+ bool finished = false;
+
+ /* Wait for the request to finish */
+ acb->finished = &finished;
+ while (!finished) {
+ qemu_aio_wait();
+ }
+}
+
+static AIOPool qorum_aio_pool = {
+ .aiocb_size = sizeof(QorumAIOCB),
+ .cancel = qorum_aio_cancel,
+};
+
+static int qorum_check_ret(QorumAIOCB *acb)
+{
+ int i, j;
+
+ for (i = 0, j = 0; i <= 2; i++) {
+ if (acb->aios[i].ret) {
+ j++;
+ }
+ }
+
+ if (j > 1) {
+ return -EIO;
+ }
+
+ return 0;
+}
+
+static void qorum_aio_bh(void *opaque)
+{
+ QorumAIOCB *acb = opaque;
+
+ qemu_bh_delete(acb->bh);
+ acb->common.cb(acb->common.opaque, qorum_check_ret(acb));
+ if (acb->finished) {
+ *acb->finished = true;
+ }
+ qemu_aio_release(acb);
+}
+
+static QorumAIOCB *qorum_aio_get(BlockDriverState *bs,
+ QEMUIOVector *qiov,
+ bool is_write,
+ int64_t sector_num,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ QorumAIOCB *acb = qemu_aio_get(&qorum_aio_pool, bs, cb, opaque);
+ int i;
+
+ acb->qiov = qiov;
+ acb->bh = NULL;
+ acb->count = 0;
+ acb->is_write = is_write;
+ acb->sector_num = sector_num;
+ acb->nb_sectors = nb_sectors;
+ acb->vote = NULL;
+ acb->vote_ret = 0;
+
+ for (i = 0; i <= 2; i++) {
+ acb->aios[i].buf = NULL;
+ acb->aios[i].ret = 0;
+ acb->aios[i].parent = acb;
+ }
+
+ return acb;
+}
+
+static void qorum_aio_cb(void *opaque, int ret)
+{
+ QorumSingleAIOCB *sacb = opaque;
+ QorumAIOCB *acb = sacb->parent;
+
+ sacb->ret = ret;
+ acb->count++;
+ assert(acb->count <= 3);
+ if (acb->count == 3) {
+ acb->bh = qemu_bh_new(qorum_aio_bh, acb);
+ qemu_bh_schedule(acb->bh);
+ }
+}
+
+static BlockDriverAIOCB *qorum_aio_writev(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQorumState *s = bs->opaque;
+ QorumAIOCB *acb = qorum_aio_get(bs, qiov, true, sector_num, nb_sectors,
+ cb, opaque);
+ int i;
+
+ for (i = 0; i <= 2; i++) {
+ acb->aios[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+ nb_sectors, &qorum_aio_cb,
+ &acb->aios[i]);
+ }
+
+ return &acb->common;
+}
+
static BlockDriver bdrv_qorum = {
.format_name = "qorum",
.protocol_name = "qorum",
@@ -144,6 +254,8 @@ static BlockDriver bdrv_qorum = {
.bdrv_file_open = qorum_open,
.bdrv_close = qorum_close,
+
+ .bdrv_aio_writev = qorum_aio_writev,
};
static void bdrv_qorum_init(void)
--
1.7.9.5
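The completion logic in this patch (count the three child completions, then report -EIO when more than one child failed) can be sketched in isolation as below. This is a minimal sketch under stated assumptions: struct quorum_req, quorum_child_done() and quorum_check_ret() are hypothetical stand-ins for QorumAIOCB, qorum_aio_cb() and qorum_check_ret(), and the bottom-half scheduling is left out.

```c
#include <assert.h>
#include <errno.h>

#define QUORUM_CHILDREN 3

/* Hypothetical stand-in for the per-request state kept in QorumAIOCB. */
struct quorum_req {
    int ret[QUORUM_CHILDREN];   /* completion status of each child */
    int count;                  /* number of children completed so far */
};

/* Record one child's completion.
 * Returns 1 once every child has completed, 0 otherwise. */
static int quorum_child_done(struct quorum_req *req, int child, int ret)
{
    assert(child >= 0 && child < QUORUM_CHILDREN);
    req->ret[child] = ret;
    req->count++;
    assert(req->count <= QUORUM_CHILDREN);
    return req->count == QUORUM_CHILDREN;
}

/* Return 0 if at most one child failed, -EIO otherwise. */
static int quorum_check_ret(const struct quorum_req *req)
{
    int i, failures = 0;

    for (i = 0; i < QUORUM_CHILDREN; i++) {
        if (req->ret[i] != 0) {     /* index each child, not only child 0 */
            failures++;
        }
    }
    return failures > 1 ? -EIO : 0;
}
```

With three children, a single failed write is tolerated because the two remaining copies still form a majority.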
* [Qemu-devel] [RFC 08/12] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (6 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 07/12] qorum: Add qorum_aio_writev and its dependencies Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 09/12] qorum: Add qorum_co_flush() Benoît Canet
` (6 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/blkverify.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/block/blkverify.c b/block/blkverify.c
index 9d5f1ec..9e15081 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -11,6 +11,10 @@
#include "qemu_socket.h" /* for EINPROGRESS on Windows */
#include "block_int.h"
+ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
+ void *buf);
+
typedef struct {
BlockDriverState *test_file;
} BDRVBlkverifyState;
@@ -130,7 +134,7 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
* @b: I/O vector
* @ret: Offset to first mismatching byte or -1 if match
*/
-static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
{
int i;
ssize_t offset = 0;
@@ -190,7 +194,7 @@ static int sortelem_cmp_src_index(const void *a, const void *b)
* The relative relationships of overlapping iovecs are preserved. This is
* necessary to ensure identical semantics in the cloned I/O vector.
*/
-static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
+void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
void *buf)
{
IOVectorSortElem sortelems[src->niov];
--
1.7.9.5
* [Qemu-devel] [RFC 09/12] qorum: Add qorum_co_flush().
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (7 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 08/12] blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare() public Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 10/12] qorum: Add qorum_aio_readv Benoît Canet
` (5 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/block/qorum.c b/block/qorum.c
index 3dae8e4..eeffac2 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -15,6 +15,10 @@
#include "block_int.h"
+ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
+ void *buf);
+
typedef struct {
BlockDriverState * bs[3];
} BDRVQorumState;
@@ -244,6 +248,21 @@ static BlockDriverAIOCB *qorum_aio_writev(BlockDriverState *bs,
return &acb->common;
}
+static coroutine_fn int qorum_co_flush(BlockDriverState *bs)
+{
+ BDRVQorumState *s = bs->opaque;
+ int i, ret;
+
+ for (i = 0; i <= 2; i++) {
+ ret = bdrv_co_flush(s->bs[i]);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ return ret;
+}
+
static BlockDriver bdrv_qorum = {
.format_name = "qorum",
.protocol_name = "qorum",
@@ -254,6 +273,7 @@ static BlockDriver bdrv_qorum = {
.bdrv_file_open = qorum_open,
.bdrv_close = qorum_close,
+ .bdrv_co_flush_to_disk = qorum_co_flush,
.bdrv_aio_writev = qorum_aio_writev,
};
--
1.7.9.5
* [Qemu-devel] [RFC 10/12] qorum: Add qorum_aio_readv.
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (8 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 09/12] qorum: Add qorum_co_flush() Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 11/12] qorum: Add qorum mechanism Benoît Canet
` (4 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/block/qorum.c b/block/qorum.c
index eeffac2..772d138 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -175,6 +175,14 @@ static int qorum_check_ret(QorumAIOCB *acb)
static void qorum_aio_bh(void *opaque)
{
QorumAIOCB *acb = opaque;
+ int i;
+
+ for (i = 0; i <= 2; i++) {
+ if (acb->aios[i].buf) {
+ g_free(acb->aios[i].buf);
+ acb->aios[i].buf = NULL;
+ }
+ }
qemu_bh_delete(acb->bh);
acb->common.cb(acb->common.opaque, qorum_check_ret(acb));
@@ -227,6 +235,32 @@ static void qorum_aio_cb(void *opaque, int ret)
}
}
+static BlockDriverAIOCB *qorum_aio_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQorumState *s = bs->opaque;
+ QorumAIOCB *acb = qorum_aio_get(bs, qiov, false, sector_num,
+ nb_sectors, cb, opaque);
+ int i;
+
+ for (i = 0; i <= 2; i++) {
+ acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
+ qemu_iovec_init(&acb->qiovs[i], qiov->niov);
+ blkverify_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
+ }
+
+ for (i = 0; i <= 2; i++) {
+ bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+ qorum_aio_cb, &acb->aios[i]);
+ }
+
+ return &acb->common;
+}
+
static BlockDriverAIOCB *qorum_aio_writev(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
@@ -275,6 +309,7 @@ static BlockDriver bdrv_qorum = {
.bdrv_close = qorum_close,
.bdrv_co_flush_to_disk = qorum_co_flush,
+ .bdrv_aio_readv = qorum_aio_readv,
.bdrv_aio_writev = qorum_aio_writev,
};
--
1.7.9.5
* [Qemu-devel] [RFC 11/12] qorum: Add qorum mechanism.
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (9 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 10/12] qorum: Add qorum_aio_readv Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 10:16 ` [Qemu-devel] [RFC 12/12] qorum: build feature into QEMU Benoît Canet
` (3 subsequent siblings)
14 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/qorum.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 78 insertions(+), 6 deletions(-)
diff --git a/block/qorum.c b/block/qorum.c
index 772d138..1f307b6 100644
--- a/block/qorum.c
+++ b/block/qorum.c
@@ -175,7 +175,7 @@ static int qorum_check_ret(QorumAIOCB *acb)
static void qorum_aio_bh(void *opaque)
{
QorumAIOCB *acb = opaque;
- int i;
+ int i, ret;
for (i = 0; i <= 2; i++) {
if (acb->aios[i].buf) {
@@ -185,7 +185,12 @@ static void qorum_aio_bh(void *opaque)
}
qemu_bh_delete(acb->bh);
- acb->common.cb(acb->common.opaque, qorum_check_ret(acb));
+ if (acb->vote_ret) {
+ ret = acb->vote_ret;
+ } else {
+ ret = qorum_check_ret(acb);
+ }
+ acb->common.cb(acb->common.opaque, ret);
if (acb->finished) {
*acb->finished = true;
}
@@ -229,10 +234,75 @@ static void qorum_aio_cb(void *opaque, int ret)
sacb->ret = ret;
acb->count++;
assert(acb->count <= 3);
- if (acb->count == 3) {
- acb->bh = qemu_bh_new(qorum_aio_bh, acb);
- qemu_bh_schedule(acb->bh);
+ if (acb->count < 3) {
+ return;
}
+
+ /* Do the qorum */
+ if (acb->vote) {
+ acb->vote(acb);
+ }
+
+ acb->bh = qemu_bh_new(qorum_aio_bh, acb);
+ qemu_bh_schedule(acb->bh);
+}
+
+static void qorum_print_bad(QorumAIOCB *acb, const char *filename)
+{
+ fprintf(stderr, "qorum: corrected error in qorum file %s: sector_num=%"
+ PRId64 " nb_sectors=%i\n", filename, acb->sector_num,
+ acb->nb_sectors);
+}
+
+static void qorum_print_failure(QorumAIOCB *acb)
+{
+ fprintf(stderr, "qorum: failure sector_num=%" PRId64 " nb_sectors=%i\n",
+ acb->sector_num, acb->nb_sectors);
+}
+
+static void qorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+ int i;
+ for (i = 0; i < source->niov; i++) {
+ memcpy(dest->iov[i].iov_base,
+ source->iov[i].iov_base,
+ source->iov[i].iov_len);
+ dest->iov[i].iov_len = source->iov[i].iov_len;
+ }
+ dest->niov = source->niov;
+ dest->nalloc = source->nalloc;
+ dest->size = source->size;
+}
+
+static void qorum_vote(QorumAIOCB *acb)
+{
+ ssize_t a_b, b_c, a_c;
+ a_b = blkverify_iovec_compare(&acb->qiovs[0], &acb->qiovs[1]);
+ b_c = blkverify_iovec_compare(&acb->qiovs[1], &acb->qiovs[2]);
+
+ /* Three vector identical -> qorum */
+ if (a_b == b_c && a_b == -1) {
+ qorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
+ return;
+ }
+ if (a_b == -1) {
+ qorum_print_bad(acb, "C");
+ qorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
+ return;
+ }
+ if (b_c == -1) {
+ qorum_print_bad(acb, "A");
+ qorum_copy_qiov(acb->qiov, &acb->qiovs[1]); /*clone b */
+ return;
+ }
+ a_c = blkverify_iovec_compare(&acb->qiovs[0], &acb->qiovs[2]);
+ if (a_c == -1) {
+ qorum_print_bad(acb, "B");
+ qorum_copy_qiov(acb->qiov, &acb->qiovs[0]); /*clone a */
+ return;
+ }
+ qorum_print_failure(acb);
+ acb->vote_ret = -EIO;
}
static BlockDriverAIOCB *qorum_aio_readv(BlockDriverState *bs,
@@ -247,6 +317,8 @@ static BlockDriverAIOCB *qorum_aio_readv(BlockDriverState *bs,
nb_sectors, cb, opaque);
int i;
+ acb->vote = qorum_vote;
+
for (i = 0; i <= 2; i++) {
acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
qemu_iovec_init(&acb->qiovs[i], qiov->niov);
@@ -254,7 +326,7 @@ static BlockDriverAIOCB *qorum_aio_readv(BlockDriverState *bs,
}
for (i = 0; i <= 2; i++) {
- bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+ bdrv_aio_readv(s->bs[i], sector_num, &acb->qiovs[i], nb_sectors,
qorum_aio_cb, &acb->aios[i]);
}
--
1.7.9.5
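The two-out-of-three vote performed by qorum_vote() can be illustrated on flat buffers as below. This is a minimal sketch under stated assumptions: quorum_vote3() is a hypothetical helper, memcmp() on contiguous buffers stands in for blkverify_iovec_compare() on I/O vectors, and the function returns the index of a majority buffer instead of copying it into the caller's qiov.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: vote over three equally sized buffers.
 * Returns the index of a buffer holding the majority content, or -EIO
 * when all three differ. *bad receives the index of the outvoted
 * buffer, or -1 when there is none (all equal, or no majority). */
static int quorum_vote3(const unsigned char *buf[3], size_t len, int *bad)
{
    int a_b = memcmp(buf[0], buf[1], len) == 0;
    int b_c = memcmp(buf[1], buf[2], len) == 0;

    *bad = -1;
    if (a_b && b_c) {
        return 0;               /* all three agree */
    }
    if (a_b) {
        *bad = 2;               /* C is outvoted by A and B */
        return 0;
    }
    if (b_c) {
        *bad = 0;               /* A is outvoted by B and C */
        return 1;
    }
    if (memcmp(buf[0], buf[2], len) == 0) {
        *bad = 1;               /* B is outvoted by A and C */
        return 0;
    }
    return -EIO;                /* three different versions: no quorum */
}
```

The third comparison (A against C) is only needed when the first two both report a mismatch, which mirrors the order of checks in the patch.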
* [Qemu-devel] [RFC 12/12] qorum: build feature into QEMU.
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (10 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 11/12] qorum: Add qorum mechanism Benoît Canet
@ 2012-08-02 10:16 ` Benoît Canet
2012-08-02 16:06 ` Andreas Färber
2012-08-02 13:17 ` [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Eric Blake
` (2 subsequent siblings)
14 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 10:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/Makefile.objs | 1 +
1 file changed, 1 insertion(+)
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..6ff9ba7 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
block-obj-y += stream.o
+block-obj-y += qorum.o
block-obj-$(CONFIG_WIN32) += raw-win32.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
block-obj-$(CONFIG_LIBISCSI) += iscsi.o
--
1.7.9.5
* Re: [Qemu-devel] [RFC 12/12] qorum: build feature into QEMU.
2012-08-02 10:16 ` [Qemu-devel] [RFC 12/12] qorum: build feature into QEMU Benoît Canet
@ 2012-08-02 16:06 ` Andreas Färber
0 siblings, 0 replies; 25+ messages in thread
From: Andreas Färber @ 2012-08-02 16:06 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
On 02.08.2012 at 12:16, Benoît Canet wrote:
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/Makefile.objs | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index b5754d3..6ff9ba7 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> block-obj-y += qed-check.o
> block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
> block-obj-y += stream.o
> +block-obj-y += qorum.o
> block-obj-$(CONFIG_WIN32) += raw-win32.o
> block-obj-$(CONFIG_POSIX) += raw-posix.o
> block-obj-$(CONFIG_LIBISCSI) += iscsi.o
Please do this in the patch adding the file.
Andreas
--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
* Re: [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (11 preceding siblings ...)
2012-08-02 10:16 ` [Qemu-devel] [RFC 12/12] qorum: build feature into QEMU Benoît Canet
@ 2012-08-02 13:17 ` Eric Blake
2012-08-02 13:28 ` Benoît Canet
2012-08-02 18:14 ` Anthony Liguori
2012-08-03 16:14 ` Blue Swirl
14 siblings, 1 reply; 25+ messages in thread
From: Eric Blake @ 2012-08-02 13:17 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
On 08/02/2012 04:16 AM, Benoît Canet wrote:
> This patchset create a block driver implementing a qorum using three qemu disk
s/qorum/quorum/g throughout the series, including subject line
> images. Writes are mirrored on the three files.
> For the reading part the three files are read at the same time and a vote is
> done to determine which is the majoritary qiov version. It then return this
s/majoritary/majority/
> majoritary version to the upper layers.
> When three differents versions of the data are returned by the lower layer the
s/differents/different/
> qorum is broken and the read return -EIO.
>
> The goal of this patchset is to be turned in a QEMU block filter living just
> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
>
> Main use of this feature will be people using NFS appliances which can be
> subjected to bitflip errors.
>
> usage: -drive file=qorum:image1.raw:image2.raw:image3.raw,if=virtio,cache=none
How does this fit with snapshots? Does a snapshot of a quorum require
passing in three filenames, one for each of the three sources?
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency
2012-08-02 13:17 ` [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Eric Blake
@ 2012-08-02 13:28 ` Benoît Canet
0 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 13:28 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
On Thursday 02 Aug 2012 at 07:17:35 (-0600), Eric Blake wrote:
> How does this fit with snapshots? Does a snapshot of a quorum require
> passing in three filenames, one for each of the three sources?
For now quorum lives on top of qcow*/qed, so it doesn't fit well with snapshots:
this is a step before turning it into a block filter.
When the QEMU block filter infrastructure is done (and I am willing to
help make it happen), quorum will live on top of raw files and below qcow*/qed;
this way snapshots should work fine.
Benoît
>
> --
> Eric Blake eblake@redhat.com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (12 preceding siblings ...)
2012-08-02 13:17 ` [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Eric Blake
@ 2012-08-02 18:14 ` Anthony Liguori
2012-08-02 19:22 ` Benoît Canet
2012-08-03 16:14 ` Blue Swirl
14 siblings, 1 reply; 25+ messages in thread
From: Anthony Liguori @ 2012-08-02 18:14 UTC (permalink / raw)
To: Benoît Canet, qemu-devel
Cc: kwolf, pbonzini, stefanha, Benoît Canet
Benoît Canet <benoit.canet@gmail.com> writes:
> This patchset create a block driver implementing a qorum using three qemu disk
> images. Writes are mirrored on the three files.
> For the reading part the three files are read at the same time and a vote is
> done to determine which is the majoritary qiov version. It then return this
> majoritary version to the upper layers.
> When three differents versions of the data are returned by the lower layer the
> qorum is broken and the read return -EIO.
>
> The goal of this patchset is to be turned in a QEMU block filter living just
> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
>
> Main use of this feature will be people using NFS appliances which can be
> subjected to bitflip errors.
I'm not entirely sure I understand the use-case all that well.
Wouldn't the more typical approach be RAID-5 and the use of parity
instead of relying on voting?
Quorum doesn't work well with an odd number of disks whereas RAID-5
does. You also get significantly more usable disk space with RAID-5
than with voting.
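To put numbers on the usable-space comparison, here is a minimal sketch. It assumes equal-sized disks, single-parity RAID-5, and a voting scheme that keeps a full copy of the data on every image; the function names are hypothetical and not part of the series.

```c
#include <assert.h>

/* Usable fraction of raw capacity for RAID-5 over n equal disks:
 * one disk's worth of space is spent on parity, so (n - 1) / n. */
static double raid5_usable_fraction(unsigned n)
{
    assert(n >= 3);            /* RAID-5 needs at least three disks */
    return (double)(n - 1) / n;
}

/* Usable fraction when every block is stored on all n images,
 * as in the voting scheme described in the cover letter: 1 / n. */
static double replicated_usable_fraction(unsigned n)
{
    assert(n >= 1);
    return 1.0 / n;
}
```

With three disks this gives two thirds of the raw capacity for RAID-5 against one third for three full replicas.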
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency
2012-08-02 18:14 ` Anthony Liguori
@ 2012-08-02 19:22 ` Benoît Canet
2012-08-03 9:21 ` Stefan Hajnoczi
0 siblings, 1 reply; 25+ messages in thread
From: Benoît Canet @ 2012-08-02 19:22 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
> I'm not entirely sure I understand the use-case all that well.
>
> Wouldn't the more typical approach be RAID-5 and the use of parity
> instead of relying on voting?
>
> Quorum doesn't work well with an odd number of disks whereas RAID-5
> does. You also get significantly more usable disk space with RAID-5
> then with voting.
>
Hello,
Use case:
A customer using NFS wants to set up redundancy across multiple separate
rooms of the same datacenter.
In this case only the network is common.
Testing proves that synchronisation between high-end storage appliances
fails in this case.
Something else is required.
With RAID-5, a small network glitch between the hypervisor and one
of the filers can bring down a whole md RAID-5 disk.
This involves a rebuild of this disk using heavy parity computation
(imagine the load with many disk images).
Properly done, quorum will correct the error on the fly.
Quorum can correct bitflips induced by the network; RAID-5 cannot
(bad case: an Ethernet cable sitting next to a power cord).
Quorum requires only two reads out of three to reach a majority in the
best case.
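A minimal sketch of the two-out-of-three point, assuming the three reads have already landed in flat buffers of equal length. The name vote3 is hypothetical; the actual series compares qiovs, not flat buffers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Returns the index (0..2) of a buffer that belongs to the majority,
 * or -EIO when all three buffers differ and the quorum is broken. */
static int vote3(const void *a, const void *b, const void *c, size_t len)
{
    assert(a && b && c);

    if (memcmp(a, b, len) == 0) {
        return 0;   /* best case: a and b agree, c is not even looked at */
    }
    if (memcmp(a, c, len) == 0) {
        return 0;   /* b is the odd one out */
    }
    if (memcmp(b, c, len) == 0) {
        return 1;   /* a is the odd one out */
    }
    return -EIO;    /* three different versions */
}
```

When the first two buffers match, the vote is decided after a single comparison, which is the best case mentioned above.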
Some well-known cloud providers already use quorum in their setup.
Regards,
Benoît
> Regards,
>
> Anthony Liguori
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency
2012-08-02 19:22 ` Benoît Canet
@ 2012-08-03 9:21 ` Stefan Hajnoczi
0 siblings, 0 replies; 25+ messages in thread
From: Stefan Hajnoczi @ 2012-08-03 9:21 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, Benoît Canet, stefanha, qemu-devel, Anthony Liguori,
pbonzini
On Thu, Aug 2, 2012 at 8:22 PM, Benoît Canet <benoit.canet@irqsave.net> wrote:
>> I'm not entirely sure I understand the use-case all that well.
>>
>> Wouldn't the more typical approach be RAID-5 and the use of parity
>> instead of relying on voting?
>>
>> Quorum doesn't work well with an odd number of disks whereas RAID-5
>> does. You also get significantly more usable disk space with RAID-5
>> then with voting.
>>
>
> Hello,
>
> Use case:
>
> A customer using NFS want to setup redudancy across multiple separate
> rooms of the same datacenter.
> In this case only the network is common.
>
> Testing prove that synchronisation between high end storage applicances
> fail in this case.
> Something else is required.
>
> With raid5 a small network glitch between the hypervisor and one
> of the filer can bring down a while md raid-5 disk.
> This involve a rebuild of this disk using heavy parity computation.
> (imagine the load with many disk images)
> Properly done qorum will correct the error on the fly.
>
> Quorum can correct bitflips induced by the network raid5 cannot.
> (bad case ethernet cable sitting around power cord)
>
> Quorum require only two read out of three to reach majority in the
> best case.
>
> Some well known cloud provider already use quorum in their setup
There is discussion about adding end-to-end data integrity checks to NFSv4:
http://www.ietf.org/proceedings/83/slides/slides-83-nfsv4-2.pdf
This doesn't seem to exist yet but I wanted to share the slides.
Stefan
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency
2012-08-02 10:16 [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency Benoît Canet
` (13 preceding siblings ...)
2012-08-02 18:14 ` Anthony Liguori
@ 2012-08-03 16:14 ` Blue Swirl
2012-08-03 19:11 ` Benoît Canet
14 siblings, 1 reply; 25+ messages in thread
From: Blue Swirl @ 2012-08-03 16:14 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
On Thu, Aug 2, 2012 at 10:16 AM, Benoît Canet <benoit.canet@gmail.com> wrote:
> This patchset create a block driver implementing a qorum using three qemu disk
> images. Writes are mirrored on the three files.
> For the reading part the three files are read at the same time and a vote is
> done to determine which is the majoritary qiov version. It then return this
> majoritary version to the upper layers.
> When three differents versions of the data are returned by the lower layer the
> qorum is broken and the read return -EIO.
It would be pretty easy to make the number of nodes and quorum
threshold values for both read and write selectable. Then you could
have for example 100 nodes and write quorum at 51 (for example, 49
nodes offline). Obviously writing the same data 100 times sequentially
would not give very high performance but it's a start.
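As a rough sketch of what a selectable read threshold could look like (hypothetical names, flat buffers instead of qiovs, and a NULL entry standing in for an offline node; none of this is in the posted series):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the first replica whose content is shared by at
 * least `threshold` available replicas, or -EIO if no version reaches the
 * threshold. A NULL entry in bufs marks a replica that is unavailable. */
static int vote_n(const void *const *bufs, size_t n, size_t len,
                  size_t threshold)
{
    assert(bufs && n > 0 && threshold > 0 && threshold <= n);

    for (size_t i = 0; i < n; i++) {
        size_t votes = 0;

        if (!bufs[i]) {
            continue;               /* offline replica cannot win */
        }
        for (size_t j = 0; j < n; j++) {
            if (bufs[j] && memcmp(bufs[i], bufs[j], len) == 0) {
                votes++;            /* counts bufs[i] itself as well */
            }
        }
        if (votes >= threshold) {
            return (int)i;
        }
    }
    return -EIO;
}
```

This does a quadratic number of comparisons, which is fine for a handful of replicas but would want hashing for something like 100 nodes. A threshold above n / 2 guarantees at most one winning version; with a lower threshold the first matching version is returned.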
>
> The goal of this patchset is to be turned in a QEMU block filter living just
> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
>
> Main use of this feature will be people using NFS appliances which can be
> subjected to bitflip errors.
I think this would give a pretty easy way to keep distributed replicas in sync.
>
> usage: -drive file=qorum:image1.raw:image2.raw:image3.raw,if=virtio,cache=none
>
> Benoît Canet (12):
> qorum: Add GPL v2+ header file.
> qorum: Add QorumSingleAIOCB and QorumAIOCB.
> qorum: Create BDRVQorumState and BlkDriver and do init.
> qorum: Add qorum_open().
> qorum: Add qorum_close().
> qorum: Add qorum_getlength().
> qorum: Add qorum_aio_writev and its dependencies.
> blkverify: Make blkverify_iovec_clone() and blkverify_iovec_compare()
> public
> qorum: Add qorum_co_flush().
> qorum: Add qorum_aio_readv.
> qorum: Add qorum mechanism.
> qorum: build feature into QEMU.
>
> block/Makefile.objs | 1 +
> block/blkverify.c | 8 +-
> block/qorum.c | 393 +++++++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 400 insertions(+), 2 deletions(-)
> create mode 100644 block/qorum.c
>
> --
> 1.7.9.5
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [RFC 00/12] Qorum disk image corruption resiliency
2012-08-03 16:14 ` Blue Swirl
@ 2012-08-03 19:11 ` Benoît Canet
0 siblings, 0 replies; 25+ messages in thread
From: Benoît Canet @ 2012-08-03 19:11 UTC (permalink / raw)
To: Blue Swirl; +Cc: kwolf, pbonzini, Benoît Canet, qemu-devel, stefanha
Le Friday 03 Aug 2012 à 16:14:51 (+0000), Blue Swirl a écrit :
> On Thu, Aug 2, 2012 at 10:16 AM, Benoît Canet <benoit.canet@gmail.com> wrote:
> > This patchset create a block driver implementing a qorum using three qemu disk
> > images. Writes are mirrored on the three files.
> > For the reading part the three files are read at the same time and a vote is
> > done to determine which is the majoritary qiov version. It then return this
> > majoritary version to the upper layers.
> > When three differents versions of the data are returned by the lower layer the
> > qorum is broken and the read return -EIO.
>
> It would be pretty easy to make the number of nodes and quorum
> threshold values for both read and write selectable. Then you could
> have for example 100 nodes and write quorum at 51 (for example, 49
> nodes offline). Obviously writing the same data 100 times sequentially
> would not give very high performance but it's a start.
For now the number of disks is hardcoded to 3. But most of the code is written
with a variable number of disks in mind: only quorum_open and quorum_vote would need
to be rewritten, along with a few mechanical changes across the code.
^ permalink raw reply [flat|nested] 25+ messages in thread