* [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency
@ 2012-08-20 11:47 Benoît Canet
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 1/9] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
` (8 more replies)
0 siblings, 9 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:47 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
This patchset creates a block driver implementing a quorum over $total QEMU disk
images. Writes are mirrored to all $total files.
For reads, the $total files are read at the same time and a vote is held to
determine whether a qiov version is present $threshold or more times. The driver
then returns this majority version to the upper layers.
When fewer than $threshold versions of the data are returned by the lower layer,
the quorum is broken and the read returns -EIO.
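For illustration only, the read-side rule above can be sketched in plain C. This is not code from the series: quorum_pick() and its parameters are hypothetical names, and it compares whole buffers with memcmp() rather than the qiov comparison and checksum voting the patches use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patchset.
 *
 * bufs[i] holds the data read from replica i and ok[i] is nonzero when that
 * read succeeded. Returns the index of a replica whose content is shared by
 * at least `threshold` successful reads, or -1 when the quorum is broken
 * (the caller would then fail the read with -EIO).
 */
static int quorum_pick(const unsigned char *const *bufs, const int *ok,
                       int total, size_t len, int threshold)
{
    int i, j;

    assert(threshold >= 1 && threshold <= total);

    for (i = 0; i < total; i++) {
        int votes = 0;

        if (!ok[i]) {
            continue;               /* failed reads cannot vote */
        }
        for (j = 0; j < total; j++) {
            if (ok[j] && memcmp(bufs[i], bufs[j], len) == 0) {
                votes++;            /* replica j agrees with replica i */
            }
        }
        if (votes >= threshold) {
            return i;               /* majority version found */
        }
    }
    return -1;                      /* no version reached the threshold */
}
```

With three replicas and threshold 2, a single corrupted replica is outvoted; with threshold 3 the same input breaks the quorum.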
The goal is for this patchset to be turned into a QEMU block filter living just
above raw-*.c and below qcow2/qed once the required infrastructure is in place.
The main users of this feature will be people with NFS appliances, which can be
subject to bitflip errors.
This patchset can be used to replace blkverify and the out-of-tree blkmirror.
usage: -drive file=quorum:threshold/total:image_1.raw:...:image_total.raw,if=virtio,cache=none
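As a rough sketch of how the "threshold/total:" part of the usage string can be validated (the series does this inside quorum_open(); the helper below, quorum_parse_ratio(), is a hypothetical stand-alone version and adds an INT_MAX bound that the patch does not have):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patchset.
 *
 * Parses "threshold/total:" at the start of spec (the "quorum:" prefix is
 * assumed to be stripped already). On success stores both values and returns
 * a pointer to the first image name; returns NULL on malformed input.
 */
static const char *quorum_parse_ratio(const char *spec,
                                      int *threshold, int *total)
{
    char *end;
    unsigned long t, n;

    assert(spec && threshold && total);

    errno = 0;
    t = strtoul(spec, &end, 10);
    if (errno || end == spec || *end != '/') {
        return NULL;
    }
    spec = end + 1;

    errno = 0;
    n = strtoul(spec, &end, 10);
    if (errno || end == spec || *end != ':') {
        return NULL;
    }

    /* Same sanity rules as the patch: 1 <= threshold <= total, total >= 2 */
    if (t < 1 || n < 2 || t > n || n > INT_MAX) {
        return NULL;
    }

    *threshold = (int)t;
    *total = (int)n;
    return end + 1;
}
```

For "2/3:a.raw:b.raw:c.raw" this yields threshold 2, total 3 and a pointer to "a.raw:b.raw:c.raw".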
in v2:
eblake: fix typos
squash two first commits
afärber: Modify the Makefile on first commit
bcanet: move function prototype of quorum.c one patch down
in v3:
Blue Swirl: change char * to uint8_t * in QuorumSingleAIOCB
Eric Blake: Add escaping of the : separator
Allow to specify the n/m ratio parameters of the Quorum
Stefan Hajnoczi: Squash quorum_close and quorum_open patch to avoid leak
Add missing bdrv_delete() in quorum_close
simpler quorum_getlength
make the quorum_check_ret threshold a user setting (bind it to n)
move blkverify_iovec_clone() and blkverify_iovec_compare() to cutils.c
free unconditionally qemu_blockalign() with qemu_vfree()
turn assignment into assert in quorum_copy_qiov()
in v4:
Eric Blake: verbose commit message for "Add quorum_open() and quorum_close()"
use of a bool for the escape variable in the same commit
simplify an if to a one-liner in the same commit
replace "+= 1" by ++ in a number of places
make quorum_getlength return a quorum vote.
Blue Swirl: replace n and m by threshold and total
ignore flush errors in quorum_co_flush
Stefan Hajnoczi: removal of a macro in "Add quorum mechanism"
call qemu_iovec_destroy in the bh
Benoît Canet: Now use QuorumVoteItem and QuorumVoteVersion as names for the
voting structs
refactor and rename function to quorum_count_vote.
Benoît Canet (9):
quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
quorum: Create BDRVQuorumState and BlkDriver and do init.
quorum: Add quorum_open() and quorum_close().
quorum: Add quorum_aio_writev and its dependencies.
blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
blkverify.
quorum: Add quorum_co_flush().
quorum: Add quorum_aio_readv.
quorum: Add quorum mechanism.
quorum: Add quorum_getlength().
block/Makefile.objs | 1 +
block/blkverify.c | 108 +---------
block/quorum.c | 577 +++++++++++++++++++++++++++++++++++++++++++++++++++
cutils.c | 103 +++++++++
qemu-common.h | 2 +
5 files changed, 685 insertions(+), 106 deletions(-)
create mode 100644 block/quorum.c
--
1.7.9.5
* [Qemu-devel] [RFC V4 1/9] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
@ 2012-08-20 11:47 ` Benoît Canet
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 2/9] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
` (7 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:47 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/Makefile.objs | 1 +
block/quorum.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
create mode 100644 block/quorum.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..66af6dc 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
block-obj-y += stream.o
+block-obj-y += quorum.o
block-obj-$(CONFIG_WIN32) += raw-win32.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
block-obj-$(CONFIG_LIBISCSI) += iscsi.o
diff --git a/block/quorum.c b/block/quorum.c
new file mode 100644
index 0000000..65a6b55
--- /dev/null
+++ b/block/quorum.c
@@ -0,0 +1,45 @@
+/*
+ * Quorum Block filter
+ *
+ * Copyright (C) 2012 Nodalink, SARL.
+ *
+ * Author:
+ * Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
+ * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "block_int.h"
+
+typedef struct QuorumAIOCB QuorumAIOCB;
+
+typedef struct QuorumSingleAIOCB {
+ BlockDriverAIOCB *aiocb;
+ uint8_t *buf;
+ int ret;
+ QuorumAIOCB *parent;
+} QuorumSingleAIOCB;
+
+struct QuorumAIOCB {
+ BlockDriverAIOCB common;
+ QEMUBH *bh;
+
+ /* Request metadata */
+ int64_t sector_num;
+ int nb_sectors;
+
+ QEMUIOVector *qiov; /* calling readv IOV */
+
+ QuorumSingleAIOCB *aios; /* individual AIOs */
+ QEMUIOVector *qiovs; /* individual IOVs */
+ int count; /* number of completed AIOCB */
+ int success_count; /* number of successfully completed AIOCB */
+ bool *finished; /* completion signal for cancel */
+
+ void (*vote)(QuorumAIOCB *acb);
+ int vote_ret;
+};
--
1.7.9.5
* [Qemu-devel] [RFC V4 2/9] quorum: Create BDRVQuorumState and BlkDriver and do init.
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 1/9] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
@ 2012-08-20 11:47 ` Benoît Canet
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 3/9] quorum: Add quorum_open() and quorum_close() Benoît Canet
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:47 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 65a6b55..19a9a44 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,13 @@
#include "block_int.h"
+typedef struct {
+ BlockDriverState **bs;
+ int threshold;
+ int total;
+ char **filenames;
+} BDRVQuorumState;
+
typedef struct QuorumAIOCB QuorumAIOCB;
typedef struct QuorumSingleAIOCB {
@@ -26,6 +33,7 @@ typedef struct QuorumSingleAIOCB {
struct QuorumAIOCB {
BlockDriverAIOCB common;
+ BDRVQuorumState *bqs;
QEMUBH *bh;
/* Request metadata */
@@ -43,3 +51,17 @@ struct QuorumAIOCB {
void (*vote)(QuorumAIOCB *acb);
int vote_ret;
};
+
+static BlockDriver bdrv_quorum = {
+ .format_name = "quorum",
+ .protocol_name = "quorum",
+
+ .instance_size = sizeof(BDRVQuorumState),
+};
+
+static void bdrv_quorum_init(void)
+{
+ bdrv_register(&bdrv_quorum);
+}
+
+block_init(bdrv_quorum_init);
--
1.7.9.5
* [Qemu-devel] [RFC V4 3/9] quorum: Add quorum_open() and quorum_close().
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 1/9] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB Benoît Canet
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 2/9] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
@ 2012-08-20 11:47 ` Benoît Canet
2012-08-20 19:47 ` Blue Swirl
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 4/9] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
` (5 subsequent siblings)
8 siblings, 1 reply; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:47 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Valid quorum resources look like:
quorum:threshold/total:path/to/image_1: ... :path/to/image_total
'\' can escape the ':' character.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
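Note for readers: the escaping rule in the commit message can be sketched outside QEMU as follows. quorum_split_names() is a hypothetical name, not a function from this patch; like the loop in quorum_open(), it splits the string in place and leaves the backslash in the resulting name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patchset.
 *
 * Splits names in place on every ':' that is not preceded by an unescaped
 * '\'. Stores up to max field pointers in out and returns how many fields
 * were found. Scanning stops once max fields have been collected.
 */
static int quorum_split_names(char *names, char **out, int max)
{
    bool escape = false;
    int count = 0;
    size_t i;

    assert(names && out && max >= 1);

    out[count++] = names;
    for (i = 0; names[i] != '\0' && count < max; i++) {
        if (!escape && names[i] == ':') {
            names[i] = '\0';                /* terminate previous field */
            out[count++] = names + i + 1;   /* next field starts here */
        }
        /* A backslash escapes the next character unless itself escaped */
        escape = !escape && names[i] == '\\';
    }
    return count;
}
```

A caller would compare the returned count with total and reject the resource when they differ, which is what the j != s->total check in the patch does.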
block/quorum.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 110 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 19a9a44..70e7216 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -52,11 +52,121 @@ struct QuorumAIOCB {
int vote_ret;
};
+/* Valid quorum resources look like
+ * quorum:threshold/total:path/to/image_1: ... :path/to/image_total
+ */
+static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int i, j, len, ret = 0;
+ char *a, *b, *names;
+ bool escape;
+
+ /* Parse the quorum: prefix */
+ if (strncmp(filename, "quorum:", strlen("quorum:"))) {
+ return -EINVAL;
+ }
+
+ filename += strlen("quorum:");
+
+ /* Get threshold */
+ errno = 0;
+ s->threshold = strtoul(filename, &a, 10);
+ if (*a != '/' || errno) {
+ return -EINVAL;
+ }
+ a++;
+
+ /* Get total */
+ errno = 0;
+ s->total = strtoul(a, &b, 10);
+ if (*b != ':' || errno) {
+ return -EINVAL;
+ }
+ b++;
+
+ if (s->threshold < 1 || s->total < 2) {
+ return -EINVAL;
+ }
+
+ if (s->threshold > s->total) {
+ return -EINVAL;
+ }
+
+ s->bs = g_malloc0(sizeof(BlockDriverState *) * s->total);
+ /* Two allocations for all filenames: simpler to free */
+ s->filenames = g_malloc0(sizeof(char *) * s->total);
+ names = g_strdup(b);
+
+ /* Get the filename pointers */
+ escape = false;
+ s->filenames[0] = names;
+ len = strlen(names);
+ for (i = 0, j = 1; i < len && j < s->total; i++) {
+ if (!escape && names[i] == ':') {
+ names[i] = '\0';
+ s->filenames[j] = names + i + 1;
+ j++;
+ }
+
+ escape = !escape && names[i] == '\\';
+ }
+
+ if (j != s->total) {
+ ret = -EINVAL;
+ goto free_exit;
+ }
+
+ /* Open files */
+ for (i = 0; i < s->total; i++) {
+ s->bs[i] = bdrv_new("");
+ ret = bdrv_open(s->bs[i], s->filenames[i], flags, NULL);
+ if (ret < 0) {
+ goto error_exit;
+ }
+ }
+
+ goto exit;
+
+error_exit:
+ for (; i >= 0; i--) {
+ bdrv_delete(s->bs[i]);
+ s->bs[i] = NULL;
+ }
+free_exit:
+ g_free(s->filenames[0]);
+ g_free(s->filenames);
+ s->filenames = NULL;
+ g_free(s->bs);
+exit:
+ return ret;
+}
+
+static void quorum_close(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int i;
+
+ for (i = 0; i < s->total; i++) {
+ /* Ensure writes reach stable storage */
+ bdrv_flush(s->bs[i]);
+ bdrv_delete(s->bs[i]);
+ }
+
+ g_free(s->filenames[0]);
+ g_free(s->filenames);
+ s->filenames = NULL;
+ g_free(s->bs);
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
.instance_size = sizeof(BDRVQuorumState),
+
+ .bdrv_file_open = quorum_open,
+ .bdrv_close = quorum_close,
};
static void bdrv_quorum_init(void)
--
1.7.9.5
* [Qemu-devel] [RFC V4 4/9] quorum: Add quorum_aio_writev and its dependencies.
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
` (2 preceding siblings ...)
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 3/9] quorum: Add quorum_open() and quorum_close() Benoît Canet
@ 2012-08-20 11:47 ` Benoît Canet
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 5/9] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:47 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 112 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 70e7216..5c87cec 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -159,6 +159,116 @@ static void quorum_close(BlockDriverState *bs)
g_free(s->bs);
}
+static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+ QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+ bool finished = false;
+
+ /* Wait for the request to finish */
+ acb->finished = &finished;
+ while (!finished) {
+ qemu_aio_wait();
+ }
+}
+
+static AIOPool quorum_aio_pool = {
+ .aiocb_size = sizeof(QuorumAIOCB),
+ .cancel = quorum_aio_cancel,
+};
+
+static void quorum_aio_bh(void *opaque)
+{
+ QuorumAIOCB *acb = opaque;
+ BDRVQuorumState *s = acb->bqs;
+ int ret;
+
+ ret = s->threshold <= acb->success_count ? 0 : -EIO;
+
+ qemu_bh_delete(acb->bh);
+ acb->common.cb(acb->common.opaque, ret);
+ if (acb->finished) {
+ *acb->finished = true;
+ }
+ g_free(acb->aios);
+ g_free(acb->qiovs);
+ qemu_aio_release(acb);
+}
+
+static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
+ BlockDriverState *bs,
+ QEMUIOVector *qiov,
+ int64_t sector_num,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ QuorumAIOCB *acb = qemu_aio_get(&quorum_aio_pool, bs, cb, opaque);
+ int i;
+
+ acb->aios = g_new0(QuorumSingleAIOCB, s->total);
+ acb->qiovs = g_new0(QEMUIOVector, s->total);
+
+ acb->bqs = s;
+ acb->qiov = qiov;
+ acb->bh = NULL;
+ acb->count = 0;
+ acb->success_count = 0;
+ acb->sector_num = sector_num;
+ acb->nb_sectors = nb_sectors;
+ acb->vote = NULL;
+ acb->vote_ret = 0;
+
+ for (i = 0; i < s->total; i++) {
+ acb->aios[i].buf = NULL;
+ acb->aios[i].ret = 0;
+ acb->aios[i].parent = acb;
+ }
+
+ return acb;
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+ QuorumSingleAIOCB *sacb = opaque;
+ QuorumAIOCB *acb = sacb->parent;
+ BDRVQuorumState *s = acb->bqs;
+
+ sacb->ret = ret;
+ acb->count++;
+ if (ret == 0) {
+ acb->success_count++;
+ }
+ assert(acb->count <= s->total);
+ assert(acb->success_count <= s->total);
+ if (acb->count < s->total) {
+ return;
+ }
+
+ acb->bh = qemu_bh_new(quorum_aio_bh, acb);
+ qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+ cb, opaque);
+ int i;
+
+ for (i = 0; i < s->total; i++) {
+ acb->aios[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+ nb_sectors, &quorum_aio_cb,
+ &acb->aios[i]);
+ }
+
+ return &acb->common;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -167,6 +277,8 @@ static BlockDriver bdrv_quorum = {
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
+
+ .bdrv_aio_writev = quorum_aio_writev,
};
static void bdrv_quorum_init(void)
--
1.7.9.5
* [Qemu-devel] [RFC V4 5/9] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
` (3 preceding siblings ...)
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 4/9] quorum: Add quorum_aio_writev and its dependencies Benoît Canet
@ 2012-08-20 11:47 ` Benoît Canet
2012-08-20 11:48 ` [Qemu-devel] [RFC V4 6/9] quorum: Add quorum_co_flush() Benoît Canet
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:47 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/blkverify.c | 108 +----------------------------------------------------
cutils.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++
qemu-common.h | 2 +
3 files changed, 107 insertions(+), 106 deletions(-)
diff --git a/block/blkverify.c b/block/blkverify.c
index 9d5f1ec..79d36d5 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -123,110 +123,6 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
return bdrv_getlength(s->test_file);
}
-/**
- * Check that I/O vector contents are identical
- *
- * @a: I/O vector
- * @b: I/O vector
- * @ret: Offset to first mismatching byte or -1 if match
- */
-static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
-{
- int i;
- ssize_t offset = 0;
-
- assert(a->niov == b->niov);
- for (i = 0; i < a->niov; i++) {
- size_t len = 0;
- uint8_t *p = (uint8_t *)a->iov[i].iov_base;
- uint8_t *q = (uint8_t *)b->iov[i].iov_base;
-
- assert(a->iov[i].iov_len == b->iov[i].iov_len);
- while (len < a->iov[i].iov_len && *p++ == *q++) {
- len++;
- }
-
- offset += len;
-
- if (len != a->iov[i].iov_len) {
- return offset;
- }
- }
- return -1;
-}
-
-typedef struct {
- int src_index;
- struct iovec *src_iov;
- void *dest_base;
-} IOVectorSortElem;
-
-static int sortelem_cmp_src_base(const void *a, const void *b)
-{
- const IOVectorSortElem *elem_a = a;
- const IOVectorSortElem *elem_b = b;
-
- /* Don't overflow */
- if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
- return -1;
- } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
- return 1;
- } else {
- return 0;
- }
-}
-
-static int sortelem_cmp_src_index(const void *a, const void *b)
-{
- const IOVectorSortElem *elem_a = a;
- const IOVectorSortElem *elem_b = b;
-
- return elem_a->src_index - elem_b->src_index;
-}
-
-/**
- * Copy contents of I/O vector
- *
- * The relative relationships of overlapping iovecs are preserved. This is
- * necessary to ensure identical semantics in the cloned I/O vector.
- */
-static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
- void *buf)
-{
- IOVectorSortElem sortelems[src->niov];
- void *last_end;
- int i;
-
- /* Sort by source iovecs by base address */
- for (i = 0; i < src->niov; i++) {
- sortelems[i].src_index = i;
- sortelems[i].src_iov = &src->iov[i];
- }
- qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
-
- /* Allocate buffer space taking into account overlapping iovecs */
- last_end = NULL;
- for (i = 0; i < src->niov; i++) {
- struct iovec *cur = sortelems[i].src_iov;
- ptrdiff_t rewind = 0;
-
- /* Detect overlap */
- if (last_end && last_end > cur->iov_base) {
- rewind = last_end - cur->iov_base;
- }
-
- sortelems[i].dest_base = buf - rewind;
- buf += cur->iov_len - MIN(rewind, cur->iov_len);
- last_end = MAX(cur->iov_base + cur->iov_len, last_end);
- }
-
- /* Sort by source iovec index and build destination iovec */
- qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
- for (i = 0; i < src->niov; i++) {
- qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
- }
-}
-
static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
int64_t sector_num, QEMUIOVector *qiov,
int nb_sectors,
@@ -290,7 +186,7 @@ static void blkverify_aio_cb(void *opaque, int ret)
static void blkverify_verify_readv(BlkverifyAIOCB *acb)
{
- ssize_t offset = blkverify_iovec_compare(acb->qiov, &acb->raw_qiov);
+ ssize_t offset = qemu_iovec_compare(acb->qiov, &acb->raw_qiov);
if (offset != -1) {
blkverify_err(acb, "contents mismatch in sector %" PRId64,
acb->sector_num + (int64_t)(offset / BDRV_SECTOR_SIZE));
@@ -308,7 +204,7 @@ static BlockDriverAIOCB *blkverify_aio_readv(BlockDriverState *bs,
acb->verify = blkverify_verify_readv;
acb->buf = qemu_blockalign(bs->file, qiov->size);
qemu_iovec_init(&acb->raw_qiov, acb->qiov->niov);
- blkverify_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
+ qemu_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
bdrv_aio_readv(s->test_file, sector_num, qiov, nb_sectors,
blkverify_aio_cb, acb);
diff --git a/cutils.c b/cutils.c
index ee4614d..dcdd60f 100644
--- a/cutils.c
+++ b/cutils.c
@@ -245,6 +245,109 @@ size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
return iov_memset(qiov->iov, qiov->niov, offset, fillc, bytes);
}
+/**
+ * Check that I/O vector contents are identical
+ *
+ * @a: I/O vector
+ * @b: I/O vector
+ * @ret: Offset to first mismatching byte or -1 if match
+ */
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+{
+ int i;
+ ssize_t offset = 0;
+
+ assert(a->niov == b->niov);
+ for (i = 0; i < a->niov; i++) {
+ size_t len = 0;
+ uint8_t *p = (uint8_t *)a->iov[i].iov_base;
+ uint8_t *q = (uint8_t *)b->iov[i].iov_base;
+
+ assert(a->iov[i].iov_len == b->iov[i].iov_len);
+ while (len < a->iov[i].iov_len && *p++ == *q++) {
+ len++;
+ }
+
+ offset += len;
+
+ if (len != a->iov[i].iov_len) {
+ return offset;
+ }
+ }
+ return -1;
+}
+
+typedef struct {
+ int src_index;
+ struct iovec *src_iov;
+ void *dest_base;
+} IOVectorSortElem;
+
+static int sortelem_cmp_src_base(const void *a, const void *b)
+{
+ const IOVectorSortElem *elem_a = a;
+ const IOVectorSortElem *elem_b = b;
+
+ /* Don't overflow */
+ if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
+ return -1;
+ } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
+ return 1;
+ } else {
+ return 0;
+ }
+}
+
+static int sortelem_cmp_src_index(const void *a, const void *b)
+{
+ const IOVectorSortElem *elem_a = a;
+ const IOVectorSortElem *elem_b = b;
+
+ return elem_a->src_index - elem_b->src_index;
+}
+
+/**
+ * Copy contents of I/O vector
+ *
+ * The relative relationships of overlapping iovecs are preserved. This is
+ * necessary to ensure identical semantics in the cloned I/O vector.
+ */
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf)
+{
+ IOVectorSortElem sortelems[src->niov];
+ void *last_end;
+ int i;
+
+ /* Sort by source iovecs by base address */
+ for (i = 0; i < src->niov; i++) {
+ sortelems[i].src_index = i;
+ sortelems[i].src_iov = &src->iov[i];
+ }
+ qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
+
+ /* Allocate buffer space taking into account overlapping iovecs */
+ last_end = NULL;
+ for (i = 0; i < src->niov; i++) {
+ struct iovec *cur = sortelems[i].src_iov;
+ ptrdiff_t rewind = 0;
+
+ /* Detect overlap */
+ if (last_end && last_end > cur->iov_base) {
+ rewind = last_end - cur->iov_base;
+ }
+
+ sortelems[i].dest_base = buf - rewind;
+ buf += cur->iov_len - MIN(rewind, cur->iov_len);
+ last_end = MAX(cur->iov_base + cur->iov_len, last_end);
+ }
+
+ /* Sort by source iovec index and build destination iovec */
+ qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
+ for (i = 0; i < src->niov; i++) {
+ qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
+ }
+}
+
/*
* Checks if a buffer is all zeroes
*
diff --git a/qemu-common.h b/qemu-common.h
index 095e28d..724d08a 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -371,6 +371,8 @@ size_t qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
const void *buf, size_t bytes);
size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
int fillc, size_t bytes);
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf);
bool buffer_is_zero(const void *buf, size_t len);
--
1.7.9.5
* [Qemu-devel] [RFC V4 6/9] quorum: Add quorum_co_flush().
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
` (4 preceding siblings ...)
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 5/9] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify Benoît Canet
@ 2012-08-20 11:48 ` Benoît Canet
2012-08-20 11:48 ` [Qemu-devel] [RFC V4 7/9] quorum: Add quorum_aio_readv Benoît Canet
` (2 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:48 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 5c87cec..092d1ea 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -269,6 +269,18 @@ static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
return &acb->common;
}
+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int i;
+
+ for (i = 0; i < s->total; i++) {
+ bdrv_co_flush(s->bs[i]);
+ }
+
+ return 0;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -277,6 +289,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
+ .bdrv_co_flush_to_disk = quorum_co_flush,
.bdrv_aio_writev = quorum_aio_writev,
};
--
1.7.9.5
* [Qemu-devel] [RFC V4 7/9] quorum: Add quorum_aio_readv.
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
` (5 preceding siblings ...)
2012-08-20 11:48 ` [Qemu-devel] [RFC V4 6/9] quorum: Add quorum_co_flush() Benoît Canet
@ 2012-08-20 11:48 ` Benoît Canet
2012-08-20 11:48 ` [Qemu-devel] [RFC V4 8/9] quorum: Add quorum mechanism Benoît Canet
2012-08-20 11:48 ` [Qemu-devel] [RFC V4 9/9] quorum: Add quorum_getlength() Benoît Canet
8 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:48 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/block/quorum.c b/block/quorum.c
index 092d1ea..95f4668 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -180,15 +180,24 @@ static void quorum_aio_bh(void *opaque)
{
QuorumAIOCB *acb = opaque;
BDRVQuorumState *s = acb->bqs;
- int ret;
+ int i, ret;
ret = s->threshold <= acb->success_count ? 0 : -EIO;
+ for (i = 0; i < s->total; i++) {
+ qemu_vfree(acb->aios[i].buf);
+ acb->aios[i].buf = NULL;
+ acb->aios[i].ret = 0;
+ }
+
qemu_bh_delete(acb->bh);
acb->common.cb(acb->common.opaque, ret);
if (acb->finished) {
*acb->finished = true;
}
+ for (i = 0; i < s->total; i++) {
+ qemu_iovec_destroy(&acb->qiovs[i]);
+ }
g_free(acb->aios);
g_free(acb->qiovs);
qemu_aio_release(acb);
@@ -248,6 +257,32 @@ static void quorum_aio_cb(void *opaque, int ret)
qemu_bh_schedule(acb->bh);
}
+static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+ nb_sectors, cb, opaque);
+ int i;
+
+ for (i = 0; i < s->total; i++) {
+ acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
+ qemu_iovec_init(&acb->qiovs[i], qiov->niov);
+ qemu_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
+ }
+
+ for (i = 0; i < s->total; i++) {
+ bdrv_aio_readv(s->bs[i], sector_num, &acb->qiovs[i], nb_sectors,
+ quorum_aio_cb, &acb->aios[i]);
+ }
+
+ return &acb->common;
+}
+
static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
@@ -291,6 +326,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_close = quorum_close,
.bdrv_co_flush_to_disk = quorum_co_flush,
+ .bdrv_aio_readv = quorum_aio_readv,
.bdrv_aio_writev = quorum_aio_writev,
};
--
1.7.9.5
* [Qemu-devel] [RFC V4 8/9] quorum: Add quorum mechanism.
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
` (6 preceding siblings ...)
2012-08-20 11:48 ` [Qemu-devel] [RFC V4 7/9] quorum: Add quorum_aio_readv Benoît Canet
@ 2012-08-20 11:48 ` Benoît Canet
2012-08-20 11:48 ` [Qemu-devel] [RFC V4 9/9] quorum: Add quorum_getlength() Benoît Canet
8 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:48 UTC
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 211 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 210 insertions(+), 1 deletion(-)
diff --git a/block/quorum.c b/block/quorum.c
index 95f4668..a909ce2 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -14,6 +14,20 @@
*/
#include "block_int.h"
+#include "zlib.h"
+
+typedef struct QuorumVoteItem {
+ int index;
+ QLIST_ENTRY(QuorumVoteItem) next;
+} QuorumVoteItem;
+
+typedef struct QuorumVoteVersion {
+ unsigned long value;
+ int index;
+ int vote_count;
+ QLIST_HEAD(, QuorumVoteItem) items;
+ QLIST_ENTRY(QuorumVoteVersion) next;
+} QuorumVoteVersion;
typedef struct {
BlockDriverState **bs;
@@ -31,6 +45,10 @@ typedef struct QuorumSingleAIOCB {
QuorumAIOCB *parent;
} QuorumSingleAIOCB;
+typedef struct QuorumVotes {
+ QLIST_HEAD(, QuorumVoteVersion) vote_list;
+} QuorumVotes;
+
struct QuorumAIOCB {
BlockDriverAIOCB common;
BDRVQuorumState *bqs;
@@ -48,6 +66,8 @@ struct QuorumAIOCB {
int success_count; /* number of successfully completed AIOCB */
bool *finished; /* completion signal for cancel */
+ QuorumVotes votes;
+
void (*vote)(QuorumAIOCB *acb);
int vote_ret;
};
@@ -191,6 +211,11 @@ static void quorum_aio_bh(void *opaque)
}
qemu_bh_delete(acb->bh);
+
+ if (acb->vote_ret) {
+ ret = acb->vote_ret;
+ }
+
acb->common.cb(acb->common.opaque, ret);
if (acb->finished) {
*acb->finished = true;
@@ -226,6 +251,7 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
acb->nb_sectors = nb_sectors;
acb->vote = NULL;
acb->vote_ret = 0;
+ QLIST_INIT(&acb->votes.vote_list);
for (i = 0; i < s->total; i++) {
acb->aios[i].buf = NULL;
@@ -253,10 +279,191 @@ static void quorum_aio_cb(void *opaque, int ret)
return;
}
+ /* Do the vote */
+ if (acb->vote) {
+ acb->vote(acb);
+ }
+
acb->bh = qemu_bh_new(quorum_aio_bh, acb);
qemu_bh_schedule(acb->bh);
}
+static void quorum_print_bad(QuorumAIOCB *acb, const char *filename)
+{
+ fprintf(stderr, "quorum: corrected error in quorum file %s: sector_num=%"
+ PRId64 " nb_sectors=%i\n", filename, acb->sector_num,
+ acb->nb_sectors);
+}
+
+static void quorum_print_failure(QuorumAIOCB *acb)
+{
+ fprintf(stderr, "quorum: failure sector_num=%" PRId64 " nb_sectors=%i\n",
+ acb->sector_num, acb->nb_sectors);
+}
+
+static void quorum_print_bad_versions(QuorumAIOCB *acb,
+ unsigned long checksum)
+{
+ QuorumVoteVersion *version;
+ QuorumVoteItem *item;
+ BDRVQuorumState *s = acb->bqs;
+
+ QLIST_FOREACH(version, &acb->votes.vote_list, next) {
+ if (version->value == checksum) {
+ continue;
+ }
+ QLIST_FOREACH(item, &version->items, next) {
+ quorum_print_bad(acb, s->filenames[item->index]);
+ }
+ }
+}
+
+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+ int i;
+ assert(dest->niov == source->niov);
+ assert(dest->size == source->size);
+ for (i = 0; i < source->niov; i++) {
+ assert(dest->iov[i].iov_len == source->iov[i].iov_len);
+ memcpy(dest->iov[i].iov_base,
+ source->iov[i].iov_base,
+ source->iov[i].iov_len);
+ }
+}
+
+static void quorum_count_vote(QuorumVotes *votes,
+ unsigned long checksum,
+ int index)
+{
+ QuorumVoteVersion *v = NULL, *version = NULL;
+ QuorumVoteItem *item;
+
+ /* look if we have something with this checksum */
+ QLIST_FOREACH(v, &votes->vote_list, next) {
+ if (v->value == checksum) {
+ version = v;
+ break;
+ }
+ }
+
+ /* It's a version not yet in the list: add it */
+ if (!version) {
+ version = g_new0(QuorumVoteVersion, 1);
+ QLIST_INIT(&version->items);
+ version->value = checksum;
+ version->index = index;
+ version->vote_count = 0;
+ QLIST_INSERT_HEAD(&votes->vote_list, version, next);
+ }
+
+ version->vote_count++;
+
+ item = g_new0(QuorumVoteItem, 1);
+ item->index = index;
+ QLIST_INSERT_HEAD(&version->items, item, next);
+}
+
+static void quorum_free_vote_list(QuorumVotes *votes)
+{
+ QuorumVoteVersion *version, *next_version;
+ QuorumVoteItem *item, *next_item;
+
+ QLIST_FOREACH_SAFE(version, &votes->vote_list, next, next_version) {
+ QLIST_REMOVE(version, next);
+ QLIST_FOREACH_SAFE(item, &version->items, next, next_item) {
+ QLIST_REMOVE(item, next);
+ g_free(item);
+ }
+ g_free(version);
+ }
+}
+
+static unsigned long quorum_compute_checksum(QuorumAIOCB *acb, int i)
+{
+ int j;
+ unsigned long adler = adler32(0L, Z_NULL, 0);
+ QEMUIOVector *qiov = &acb->qiovs[i];
+
+ for (j = 0; j < qiov->niov; j++) {
+ adler = adler32(adler,
+ qiov->iov[j].iov_base,
+ qiov->iov[j].iov_len);
+ }
+
+ return adler;
+}
+
+static void quorum_vote(QuorumAIOCB *acb)
+{
+ bool quorum = true;
+ int i, j;
+ unsigned long checksum = 0;
+ BDRVQuorumState *s = acb->bqs;
+ QuorumVoteVersion *candidate, *winner = NULL;
+
+ /* get the index of the first successful read */
+ for (i = 0; i < s->total; i++) {
+ if (!acb->aios[i].ret) {
+ break;
+ }
+ }
+
+ /* compare this read with all other successful reads, looking for quorum */
+ for (j = i + 1; j < s->total; j++) {
+ if (acb->aios[j].ret) {
+ continue;
+ }
+ if (qemu_iovec_compare(&acb->qiovs[i],
+ &acb->qiovs[j]) != -1) {
+ quorum = false;
+ break;
+ }
+ }
+
+ /* Every successful read agrees -> Quorum */
+ if (quorum) {
+ quorum_copy_qiov(acb->qiov, &acb->qiovs[i]);
+ return;
+ }
+
+ /* compute checksums for each successful read, also store indexes */
+ for (i = 0; i < s->total; i++) {
+ if (acb->aios[i].ret) {
+ continue;
+ }
+ checksum = quorum_compute_checksum(acb, i);
+ quorum_count_vote(&acb->votes, checksum, i);
+ }
+
+ /* vote to select the most represented version */
+ i = 0;
+ QLIST_FOREACH(candidate, &acb->votes.vote_list, next) {
+ if (candidate->vote_count > i) {
+ i = candidate->vote_count;
+ winner = candidate;
+ }
+ }
+
+ /* if the winner's vote count is below the threshold, the read fails */
+ if (winner->vote_count < s->threshold) {
+ quorum_print_failure(acb);
+ acb->vote_ret = -EIO;
+ goto free_exit;
+ }
+
+ /* we have a winner: copy it */
+ quorum_copy_qiov(acb->qiov, &acb->qiovs[winner->index]);
+
+ /* if some versions are bad print them */
+ if (i < s->total) {
+ quorum_print_bad_versions(acb, winner->value);
+ }
+
+free_exit:
+ /* free lists */
+ quorum_free_vote_list(&acb->votes);
+}
+
static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
@@ -269,6 +476,8 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
nb_sectors, cb, opaque);
int i;
+ acb->vote = quorum_vote;
+
for (i = 0; i < s->total; i++) {
acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
qemu_iovec_init(&acb->qiovs[i], qiov->niov);
@@ -276,7 +485,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
}
for (i = 0; i < s->total; i++) {
- bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+ bdrv_aio_readv(s->bs[i], sector_num, &acb->qiovs[i], nb_sectors,
quorum_aio_cb, &acb->aios[i]);
}
--
1.7.9.5
* [Qemu-devel] [RFC V4 9/9] quorum: Add quorum_getlength().
2012-08-20 11:47 [Qemu-devel] [RFC V4 0/9] Quorum disk image corruption resiliency Benoît Canet
` (7 preceding siblings ...)
2012-08-20 11:48 ` [Qemu-devel] [RFC V4 8/9] quorum: Add quorum mechanism Benoît Canet
@ 2012-08-20 11:48 ` Benoît Canet
8 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-20 11:48 UTC (permalink / raw)
To: qemu-devel
Cc: kwolf, stefanha, blauwirbel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index a909ce2..7bd6f0e 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -525,12 +525,42 @@ static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
return 0;
}
+static int64_t quorum_getlength(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumVoteVersion *candidate, *winner = NULL;
+ QuorumVotes votes;
+ int64_t value;
+ int i;
+
+ QLIST_INIT(&votes.vote_list);
+ for (i = 0; i < s->total; i++) {
+ quorum_count_vote(&votes, (unsigned long) bdrv_getlength(s->bs[i]), i);
+ }
+
+ /* vote to select the most represented version */
+ i = 0;
+ QLIST_FOREACH(candidate, &votes.vote_list, next) {
+ if (candidate->vote_count > i) {
+ i = candidate->vote_count;
+ winner = candidate;
+ }
+ }
+
+ value = (int64_t) winner->value;
+ quorum_free_vote_list(&votes);
+
+ return value;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
.instance_size = sizeof(BDRVQuorumState),
+ .bdrv_getlength = quorum_getlength,
+
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
.bdrv_co_flush_to_disk = quorum_co_flush,
--
1.7.9.5
* Re: [Qemu-devel] [RFC V4 3/9] quorum: Add quorum_open() and quorum_close().
2012-08-20 11:47 ` [Qemu-devel] [RFC V4 3/9] quorum: Add quorum_open() and quorum_close() Benoît Canet
@ 2012-08-20 19:47 ` Blue Swirl
2012-08-20 20:37 ` Eric Blake
0 siblings, 1 reply; 13+ messages in thread
From: Blue Swirl @ 2012-08-20 19:47 UTC (permalink / raw)
To: Benoît Canet
Cc: kwolf, stefanha, qemu-devel, anthony, pbonzini, eblake, afaerber,
Benoît Canet
On Mon, Aug 20, 2012 at 11:47 AM, Benoît Canet <benoit.canet@gmail.com> wrote:
> Valid quorum resources look like:
> quorum:threshold/total:path/to/image_1: ... :path/to/image_total
>
> '\' can escape the ':' character.
>
> Signed-off-by: Benoit Canet <benoit@irqsave.net>
> ---
> block/quorum.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 110 insertions(+)
>
> diff --git a/block/quorum.c b/block/quorum.c
> index 19a9a44..70e7216 100644
> --- a/block/quorum.c
> +++ b/block/quorum.c
> @@ -52,11 +52,121 @@ struct QuorumAIOCB {
> int vote_ret;
> };
>
> +/* Valid quorum resources look like
> + * quorum:threshold/total:path/to/image_1: ... :path/to/image_total
> + */
> +static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
> +{
> + BDRVQuorumState *s = bs->opaque;
> + int i, j, len, ret = 0;
> + char *a, *b, *names;
> + bool escape;
> +
> + /* Parse the quorum: prefix */
> + if (strncmp(filename, "quorum:", strlen("quorum:"))) {
> + return -EINVAL;
> + }
> +
> + filename += strlen("quorum:");
> +
> + /* Get n */
This comment still says 'n' instead of 'threshold'.
> + errno = 0;
> + s->threshold = strtoul(filename, &a, 10);
> + if (*a != '/' || errno) {
> + return -EINVAL;
> + }
> + a++;
> +
> + /* Get m */
Likewise, 'm' here should be 'total'.
> + errno = 0;
> + s->total = strtoul(a, &b, 10);
> + if (*b != ':' || errno) {
> + return -EINVAL;
> + }
> + b++;
> +
> + if (s->threshold < 1 || s->total < 2) {
> + return -EINVAL;
> + }
> +
> + if (s->threshold > s->total) {
> + return -EINVAL;
> + }
> +
> + s->bs = g_malloc0(sizeof(BlockDriverState *) * s->total);
> + /* Two allocations for all filenames: simpler to free */
> + s->filenames = g_malloc0(sizeof(char *) * s->total);
> + names = g_strdup(b);
> +
> + /* Get the filenames pointers */
> + escape = false;
> + s->filenames[0] = names;
> + len = strlen(names);
> + for (i = 0, j = 1; i < len && j < s->total; i++) {
> + if (!escape && names[i] == ':') {
> + names[i] = '\0';
> + s->filenames[j] = names + i + 1;
> + j++;
> + }
> +
> + escape = !escape && names[i] == '\\';
> + }
> +
> + if (j != s->total) {
> + ret = -EINVAL;
> + goto free_exit;
> + }
> +
> + /* Open files */
> + for (i = 0; i < s->total; i++) {
> + s->bs[i] = bdrv_new("");
> + ret = bdrv_open(s->bs[i], s->filenames[i], flags, NULL);
> + if (ret < 0) {
> + goto error_exit;
> + }
> + }
> +
> + goto exit;
> +
> +error_exit:
> + for (; i >= 0; i--) {
> + bdrv_delete(s->bs[i]);
> + s->bs[i] = NULL;
> + }
> +free_exit:
> + g_free(s->filenames[0]);
> + g_free(s->filenames);
> + s->filenames = NULL;
> + g_free(s->bs);
> +exit:
> + return ret;
> +}
> +
> +static void quorum_close(BlockDriverState *bs)
> +{
> + BDRVQuorumState *s = bs->opaque;
> + int i;
> +
> + for (i = 0; i < s->total; i++) {
> + /* Ensure writes reach stable storage */
> + bdrv_flush(s->bs[i]);
> + bdrv_delete(s->bs[i]);
> + }
> +
> + g_free(s->filenames[0]);
> + g_free(s->filenames);
> + s->filenames = NULL;
> + g_free(s->bs);
> +}
> +
> static BlockDriver bdrv_quorum = {
> .format_name = "quorum",
> .protocol_name = "quorum",
>
> .instance_size = sizeof(BDRVQuorumState),
> +
> + .bdrv_file_open = quorum_open,
> + .bdrv_close = quorum_close,
> };
>
> static void bdrv_quorum_init(void)
> --
> 1.7.9.5
>
* Re: [Qemu-devel] [RFC V4 3/9] quorum: Add quorum_open() and quorum_close().
2012-08-20 19:47 ` Blue Swirl
@ 2012-08-20 20:37 ` Eric Blake
2012-08-21 10:13 ` Benoît Canet
0 siblings, 1 reply; 13+ messages in thread
From: Eric Blake @ 2012-08-20 20:37 UTC (permalink / raw)
To: Blue Swirl
Cc: kwolf, Benoît Canet, stefanha, qemu-devel, anthony, pbonzini,
afaerber, Benoît Canet
On 08/20/2012 01:47 PM, Blue Swirl wrote:
> On Mon, Aug 20, 2012 at 11:47 AM, Benoît Canet <benoit.canet@gmail.com> wrote:
>> Valid quorum resources look like:
>> quorum:threshold/total:path/to/image_1: ... :path/to/image_total
>>
>> '\' can escape the ':' character.
Also, \ escapes itself (although admittedly : is more likely than \ in a
file name needing escaping).
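
A minimal sketch of a splitter that honors both escapes, offered only as an
illustration and not as the patch's code. The helper name split_escaped() and
its return convention are hypothetical. It assumes that a backslash makes the
following character literal (so "\:" yields ':' and "\\" yields '\'), that the
backslashes themselves are dropped from the resulting names, and that a
trailing lone backslash is an error:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split `names` in place on unescaped ':' characters.
 * A backslash makes the next character literal and is itself removed.
 * Pointers to the resulting NUL-terminated fields are stored in `fields`.
 * Returns the number of fields found, or -1 if there are more than
 * `max_fields` fields or the string ends with a dangling backslash.
 */
static int split_escaped(char *names, char **fields, int max_fields)
{
    char *src = names;   /* read cursor */
    char *dst = names;   /* write cursor, never ahead of src */
    int count = 0;

    if (max_fields < 1) {
        return -1;
    }
    fields[count++] = dst;

    while (*src) {
        if (*src == '\\') {
            /* escape: skip the backslash, copy the next char literally */
            src++;
            if (*src == '\0') {
                return -1;
            }
            *dst++ = *src++;
        } else if (*src == ':') {
            /* unescaped separator: terminate this field, start the next */
            if (count == max_fields) {
                return -1;
            }
            *dst++ = '\0';
            src++;
            fields[count++] = dst;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';

    return count;
}
```

Under these assumptions, a caller such as quorum_open() could pass the total
as max_fields and reject the resource unless the return value equals that
total, which would also catch a resource string naming too many images.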
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [RFC V4 3/9] quorum: Add quorum_open() and quorum_close().
2012-08-20 20:37 ` Eric Blake
@ 2012-08-21 10:13 ` Benoît Canet
0 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2012-08-21 10:13 UTC (permalink / raw)
To: Eric Blake
Cc: kwolf, Benoît Canet, stefanha, qemu-devel, Blue Swirl,
anthony, pbonzini, afaerber
On Monday 20 Aug 2012 at 14:37:08 (-0600), Eric Blake wrote:
> On 08/20/2012 01:47 PM, Blue Swirl wrote:
> > On Mon, Aug 20, 2012 at 11:47 AM, Benoît Canet <benoit.canet@gmail.com> wrote:
> >> Valid quorum resources look like:
> >> quorum:threshold/total:path/to/image_1: ... :path/to/image_total
> >>
> >> '\' can escape the ':' character.
>
> Also, \ escapes itself (although admittedly : is more likely than \ in a
> file name needing escaping).
After a few tests this escaping code seems broken.
Will fix it.
Benoît
>
> --
> Eric Blake eblake@redhat.com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>