* [Qemu-devel] [RFC V7 01/11] quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 02/11] quorum: Create BDRVQuorumState and BlkDriver and do init Benoît Canet
` (10 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/Makefile.objs | 1 +
block/quorum.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
create mode 100644 block/quorum.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index c067f38..4143e34 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
+block-obj-y += quorum.o
block-obj-y += parallels.o blkdebug.o blkverify.o
block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
block-obj-$(CONFIG_POSIX) += raw-posix.o
diff --git a/block/quorum.c b/block/quorum.c
new file mode 100644
index 0000000..ce094a1
--- /dev/null
+++ b/block/quorum.c
@@ -0,0 +1,45 @@
+/*
+ * Quorum Block filter
+ *
+ * Copyright (C) 2012-2013 Nodalink, SARL.
+ *
+ * Author:
+ * Benoît Canet <benoit.canet@irqsave.net>
+ *
+ * Based on the design and code of blkverify.c (Copyright (C) 2010 IBM, Corp)
+ * and blkmirror.c (Copyright (C) 2011 Red Hat, Inc).
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "block/block_int.h"
+
+typedef struct QuorumAIOCB QuorumAIOCB;
+
+typedef struct QuorumSingleAIOCB {
+ BlockDriverAIOCB *aiocb;
+ uint8_t *buf;
+ int ret;
+ QuorumAIOCB *parent;
+} QuorumSingleAIOCB;
+
+struct QuorumAIOCB {
+ BlockDriverAIOCB common;
+ QEMUBH *bh;
+
+ /* Request metadata */
+ uint64_t sector_num;
+ int nb_sectors;
+
+ QEMUIOVector *qiov; /* calling readv IOV */
+
+ QuorumSingleAIOCB *aios; /* individual AIOs */
+ QEMUIOVector *qiovs; /* individual IOVs */
+ int count; /* number of completed AIOCB */
+ int success_count; /* number of successfully completed AIOCB */
+ bool *finished; /* completion signal for cancel */
+
+ void (*vote)(QuorumAIOCB *acb);
+ int vote_ret;
+};
--
1.7.10.4
* [Qemu-devel] [RFC V7 02/11] quorum: Create BDRVQuorumState and BlkDriver and do init.
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index ce094a1..0524b63 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -15,6 +15,13 @@
#include "block/block_int.h"
+typedef struct {
+ BlockDriverState **bs;
+ unsigned long long threshold;
+ unsigned long long total;
+ char **filenames;
+} BDRVQuorumState;
+
typedef struct QuorumAIOCB QuorumAIOCB;
typedef struct QuorumSingleAIOCB {
@@ -26,6 +33,7 @@ typedef struct QuorumSingleAIOCB {
struct QuorumAIOCB {
BlockDriverAIOCB common;
+ BDRVQuorumState *bqs;
QEMUBH *bh;
/* Request metadata */
@@ -43,3 +51,17 @@ struct QuorumAIOCB {
void (*vote)(QuorumAIOCB *acb);
int vote_ret;
};
+
+static BlockDriver bdrv_quorum = {
+ .format_name = "quorum",
+ .protocol_name = "quorum",
+
+ .instance_size = sizeof(BDRVQuorumState),
+};
+
+static void bdrv_quorum_init(void)
+{
+ bdrv_register(&bdrv_quorum);
+}
+
+block_init(bdrv_quorum_init);
--
1.7.10.4
* [Qemu-devel] [RFC V7 03/11] quorum: Add quorum_open() and quorum_close().
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Valid quorum resources look like:
quorum:threshold/total:path/to/image_1: ... :path/to/image_total
':' is used as a separator.
'\' is the escape character for filenames containing ':'.
'\' escapes itself.
',' must be escaped with ','.
On the command line, for the quorum files "img:test.raw", "img2,raw"
and "img3.raw", the invocation looks like:
-drive file=quorum:2/3:img\\:test.raw:img2,,raw:img3.raw
(note the double \\ and the double ,,)
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
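Note: the escaping rules described above can be illustrated with a small
standalone sketch. This is not the code from this patch: split_escaped() is a
hypothetical helper that splits a mutable copy of the part following
"threshold/total:" on unescaped ':' and drops the escaping '\'. Unlike
quorum_open(), it keeps a trailing lone '\' literally and refuses to store more
than max fields. The ',,' escaping is assumed to be undone by the -drive option
parser before the string reaches the driver, so it is not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split `names` in place on unescaped ':' characters.
 * '\' escapes the next character, so "\:" yields ':' and "\\" yields '\'.
 * Stores at most `max` pointers into `out` and returns the number of
 * fields found, or -1 if there are more than `max` fields.
 * Hypothetical helper for illustration only.
 */
static int split_escaped(char *names, char **out, int max)
{
    char *rd = names;   /* read cursor */
    char *wr = names;   /* write cursor, never ahead of rd */
    int count = 0;

    assert(names != NULL && out != NULL);
    if (max < 1) {
        return -1;
    }
    out[count++] = wr;

    while (*rd != '\0') {
        if (*rd == '\\' && rd[1] != '\0') {
            /* escaped character: drop the backslash, keep what follows */
            *wr++ = rd[1];
            rd += 2;
        } else if (*rd == ':') {
            /* unescaped separator: check bounds, then end the field */
            if (count == max) {
                return -1;
            }
            *wr++ = '\0';
            rd++;
            out[count++] = wr;
        } else {
            *wr++ = *rd++;
        }
    }
    *wr = '\0';
    return count;
}
```

Called on the string img\:test.raw:img2,raw:img3.raw with max = 3, this should
return 3 and yield the three file names from the example above.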
block/quorum.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 155 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 0524b63..e157eb1 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -52,11 +52,166 @@ struct QuorumAIOCB {
int vote_ret;
};
+static int quorum_parse_uint_step_next(const char *start,
+ const char *name,
+ const char separator,
+ unsigned long long *value,
+ char **next)
+{
+ int ret;
+ if (start[0] == '\0') {
+ qerror_report(QERR_MISSING_PARAMETER, name);
+ return -EINVAL;
+ }
+ ret = parse_uint(start, value, next, 10);
+ if (ret < 0) {
+ qerror_report(QERR_INVALID_PARAMETER_TYPE, name, "int");
+ return ret;
+ }
+ if (**next != separator) {
+ qerror_report(ERROR_CLASS_GENERIC_ERROR,
+ "%c separator required after %s",
+ separator, name);
+ return -EINVAL;
+ }
+ *next += 1;
+ return 0;
+}
+
+/* Valid quorum resources look like
+ * quorum:threshold/total:path/to/image_1: ... :path/to/image_total
+ *
+ * ':' is used as a separator
+ * '\' is the escape character for filenames containing ':'
+ */
+static int quorum_open(BlockDriverState *bs, const char *filename, int flags)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int i, j, k, len, ret = 0;
+ char *a, *b, *names;
+ const char *start;
+ bool escape;
+
+ /* Parse the quorum: prefix */
+ if (!strstart(filename, "quorum:", &start)) {
+ return -EINVAL;
+ }
+
+ /* Get threshold */
+ ret = quorum_parse_uint_step_next(start, "threshold", '/',
+ &s->threshold, &a);
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* Get total */
+ ret = quorum_parse_uint_step_next(a, "total", ':', &s->total, &b);
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (s->threshold < 1) {
+ qerror_report(QERR_INVALID_PARAMETER_VALUE, "threshold", "value >= 1");
+ return -ERANGE;
+ }
+
+ if (s->total < 2) {
+ qerror_report(QERR_INVALID_PARAMETER_VALUE, "total", "value >= 2");
+ return -ERANGE;
+ }
+
+ if (s->threshold > s->total) {
+ qerror_report(ERROR_CLASS_GENERIC_ERROR,
+ "threshold <= total must be true");
+ return -ERANGE;
+ }
+
+ s->bs = g_malloc0(sizeof(BlockDriverState *) * s->total);
+ /* Two allocations for all filenames: simpler to free */
+ s->filenames = g_malloc0(sizeof(char *) * s->total);
+ names = g_strdup(b);
+
+ /* Get the filenames pointers */
+ escape = false;
+ s->filenames[0] = names;
+ len = strlen(names);
+ for (i = j = k = 0; i < len && j < s->total; i++) {
+ /* separation between two files */
+ if (!escape && names[i] == ':') {
+ char *prev = s->filenames[j];
+ prev[k] = '\0';
+ s->filenames[++j] = prev + k + 1;
+ k = 0;
+ continue;
+ }
+
+ escape = !escape && names[i] == '\\';
+
+ /* if we are not escaping copy */
+ if (!escape) {
+ s->filenames[j][k++] = names[i];
+ }
+ }
+ /* terminate last string */
+ s->filenames[j][k] = '\0';
+
+ if ((j + 1) != s->total) {
+ qerror_report(ERROR_CLASS_GENERIC_ERROR,
+ "Number of provided files must be equal to total");
+ ret = -EINVAL;
+ goto free_exit;
+ }
+
+ /* Open files */
+ for (i = 0; i < s->total; i++) {
+ s->bs[i] = bdrv_new("");
+ ret = bdrv_open(s->bs[i], s->filenames[i], flags, NULL);
+ if (ret < 0) {
+ goto error_exit;
+ }
+ }
+
+ goto exit;
+
+error_exit:
+ for (; i >= 0; i--) {
+ bdrv_delete(s->bs[i]);
+ s->bs[i] = NULL;
+ }
+free_exit:
+ g_free(s->filenames[0]);
+ g_free(s->filenames);
+ s->filenames = NULL;
+ g_free(s->bs);
+exit:
+ return ret;
+}
+
+static void quorum_close(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int i;
+
+ for (i = 0; i < s->total; i++) {
+ /* Ensure writes reach stable storage */
+ bdrv_flush(s->bs[i]);
+ bdrv_delete(s->bs[i]);
+ }
+
+ g_free(s->filenames[0]);
+ g_free(s->filenames);
+ s->filenames = NULL;
+ g_free(s->bs);
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
.instance_size = sizeof(BDRVQuorumState),
+
+ .bdrv_file_open = quorum_open,
+ .bdrv_close = quorum_close,
};
static void bdrv_quorum_init(void)
--
1.7.10.4
* [Qemu-devel] [RFC V7 04/11] quorum: Add quorum_aio_writev and its dependencies.
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
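Note: a minimal sketch of the completion accounting that quorum_aio_cb() and
quorum_aio_bh() perform in the diff below. Each child write reports its result
once; the parent request completes only after all children have reported, and
succeeds if at least threshold of them returned 0. ReqState and child_done()
are hypothetical names used for illustration; the sketch uses no QEMU types and
leaves out the bottom half and the cancel path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-request state. */
typedef struct {
    int total;          /* number of child requests issued */
    int threshold;      /* successes required for overall success */
    int count;          /* children completed so far */
    int success_count;  /* children completed with ret == 0 */
} ReqState;

/*
 * Record one child completion. Returns true once every child has
 * reported, at which point *result holds 0 or -EIO.
 */
static bool child_done(ReqState *r, int ret, int *result)
{
    r->count++;
    if (ret == 0) {
        r->success_count++;
    }
    assert(r->count <= r->total);
    if (r->count < r->total) {
        return false;   /* still waiting for other children */
    }
    *result = r->threshold <= r->success_count ? 0 : -EIO;
    return true;
}
```

With total = 3 and threshold = 2, two successful children and one failed child
give an overall result of 0; a single successful child gives -EIO.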
block/quorum.c | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 113 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index e157eb1..71ae9ce 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -204,6 +204,117 @@ static void quorum_close(BlockDriverState *bs)
g_free(s->bs);
}
+static void quorum_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+ QuorumAIOCB *acb = container_of(blockacb, QuorumAIOCB, common);
+ bool finished = false;
+
+ /* Wait for the request to finish */
+ acb->finished = &finished;
+ while (!finished) {
+ qemu_aio_wait();
+ }
+}
+
+static AIOCBInfo quorum_aiocb_info = {
+ .aiocb_size = sizeof(QuorumAIOCB),
+ .cancel = quorum_aio_cancel,
+};
+
+static void quorum_aio_bh(void *opaque)
+{
+ QuorumAIOCB *acb = opaque;
+ BDRVQuorumState *s = acb->bqs;
+ int ret;
+
+ ret = s->threshold <= acb->success_count ? 0 : -EIO;
+
+ qemu_bh_delete(acb->bh);
+ acb->common.cb(acb->common.opaque, ret);
+ if (acb->finished) {
+ *acb->finished = true;
+ }
+ g_free(acb->aios);
+ g_free(acb->qiovs);
+ qemu_aio_release(acb);
+}
+
+static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
+ BlockDriverState *bs,
+ QEMUIOVector *qiov,
+ uint64_t sector_num,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ QuorumAIOCB *acb = qemu_aio_get(&quorum_aiocb_info, bs, cb, opaque);
+ int i;
+
+ acb->aios = g_new0(QuorumSingleAIOCB, s->total);
+ acb->qiovs = g_new0(QEMUIOVector, s->total);
+
+ acb->bqs = s;
+ acb->qiov = qiov;
+ acb->bh = NULL;
+ acb->count = 0;
+ acb->success_count = 0;
+ acb->sector_num = sector_num;
+ acb->nb_sectors = nb_sectors;
+ acb->vote = NULL;
+ acb->vote_ret = 0;
+ acb->finished = NULL;
+
+ for (i = 0; i < s->total; i++) {
+ acb->aios[i].buf = NULL;
+ acb->aios[i].ret = 0;
+ acb->aios[i].parent = acb;
+ }
+
+ return acb;
+}
+
+static void quorum_aio_cb(void *opaque, int ret)
+{
+ QuorumSingleAIOCB *sacb = opaque;
+ QuorumAIOCB *acb = sacb->parent;
+ BDRVQuorumState *s = acb->bqs;
+
+ sacb->ret = ret;
+ acb->count++;
+ if (ret == 0) {
+ acb->success_count++;
+ }
+ assert(acb->count <= s->total);
+ assert(acb->success_count <= s->total);
+ if (acb->count < s->total) {
+ return;
+ }
+
+ acb->bh = qemu_bh_new(quorum_aio_bh, acb);
+ qemu_bh_schedule(acb->bh);
+}
+
+static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num, nb_sectors,
+ cb, opaque);
+ int i;
+
+ for (i = 0; i < s->total; i++) {
+ acb->aios[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
+ nb_sectors, &quorum_aio_cb,
+ &acb->aios[i]);
+ }
+
+ return &acb->common;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -212,6 +323,8 @@ static BlockDriver bdrv_quorum = {
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
+
+ .bdrv_aio_writev = quorum_aio_writev,
};
static void bdrv_quorum_init(void)
--
1.7.10.4
* [Qemu-devel] [RFC V7 05/11] blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from blkverify.
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
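Note: the contract of the comparison helper moved by this patch (offset of the
first mismatching byte, or -1 if the vectors match) can be tried outside QEMU
with plain POSIX struct iovec arrays. The sketch below is an illustration under
that assumption; iov_first_mismatch() is a hypothetical name, and it takes raw
iovec arrays rather than QEMUIOVector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Return the byte offset of the first mismatch between two iovec arrays
 * of identical shape, or -1 if all bytes match.
 */
static ssize_t iov_first_mismatch(const struct iovec *a,
                                  const struct iovec *b, int niov)
{
    ssize_t offset = 0;
    int i;

    for (i = 0; i < niov; i++) {
        const uint8_t *p = a[i].iov_base;
        const uint8_t *q = b[i].iov_base;
        size_t len = 0;

        /* both vectors must be cut into segments of the same size */
        assert(a[i].iov_len == b[i].iov_len);
        while (len < a[i].iov_len && p[len] == q[len]) {
            len++;
        }
        offset += len;
        if (len != a[i].iov_len) {
            return offset;
        }
    }
    return -1;
}
```

The returned offset counts bytes across all segments, which is why the caller
in blkverify divides it by BDRV_SECTOR_SIZE to report the mismatching sector.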
block/blkverify.c | 108 +------------------------------------------------
include/qemu-common.h | 2 +
util/iov.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 107 insertions(+), 106 deletions(-)
diff --git a/block/blkverify.c b/block/blkverify.c
index a7dd459..8c65425 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -123,110 +123,6 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
return bdrv_getlength(s->test_file);
}
-/**
- * Check that I/O vector contents are identical
- *
- * @a: I/O vector
- * @b: I/O vector
- * @ret: Offset to first mismatching byte or -1 if match
- */
-static ssize_t blkverify_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
-{
- int i;
- ssize_t offset = 0;
-
- assert(a->niov == b->niov);
- for (i = 0; i < a->niov; i++) {
- size_t len = 0;
- uint8_t *p = (uint8_t *)a->iov[i].iov_base;
- uint8_t *q = (uint8_t *)b->iov[i].iov_base;
-
- assert(a->iov[i].iov_len == b->iov[i].iov_len);
- while (len < a->iov[i].iov_len && *p++ == *q++) {
- len++;
- }
-
- offset += len;
-
- if (len != a->iov[i].iov_len) {
- return offset;
- }
- }
- return -1;
-}
-
-typedef struct {
- int src_index;
- struct iovec *src_iov;
- void *dest_base;
-} IOVectorSortElem;
-
-static int sortelem_cmp_src_base(const void *a, const void *b)
-{
- const IOVectorSortElem *elem_a = a;
- const IOVectorSortElem *elem_b = b;
-
- /* Don't overflow */
- if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
- return -1;
- } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
- return 1;
- } else {
- return 0;
- }
-}
-
-static int sortelem_cmp_src_index(const void *a, const void *b)
-{
- const IOVectorSortElem *elem_a = a;
- const IOVectorSortElem *elem_b = b;
-
- return elem_a->src_index - elem_b->src_index;
-}
-
-/**
- * Copy contents of I/O vector
- *
- * The relative relationships of overlapping iovecs are preserved. This is
- * necessary to ensure identical semantics in the cloned I/O vector.
- */
-static void blkverify_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src,
- void *buf)
-{
- IOVectorSortElem sortelems[src->niov];
- void *last_end;
- int i;
-
- /* Sort by source iovecs by base address */
- for (i = 0; i < src->niov; i++) {
- sortelems[i].src_index = i;
- sortelems[i].src_iov = &src->iov[i];
- }
- qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
-
- /* Allocate buffer space taking into account overlapping iovecs */
- last_end = NULL;
- for (i = 0; i < src->niov; i++) {
- struct iovec *cur = sortelems[i].src_iov;
- ptrdiff_t rewind = 0;
-
- /* Detect overlap */
- if (last_end && last_end > cur->iov_base) {
- rewind = last_end - cur->iov_base;
- }
-
- sortelems[i].dest_base = buf - rewind;
- buf += cur->iov_len - MIN(rewind, cur->iov_len);
- last_end = MAX(cur->iov_base + cur->iov_len, last_end);
- }
-
- /* Sort by source iovec index and build destination iovec */
- qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
- for (i = 0; i < src->niov; i++) {
- qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
- }
-}
-
static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
int64_t sector_num, QEMUIOVector *qiov,
int nb_sectors,
@@ -290,7 +186,7 @@ static void blkverify_aio_cb(void *opaque, int ret)
static void blkverify_verify_readv(BlkverifyAIOCB *acb)
{
- ssize_t offset = blkverify_iovec_compare(acb->qiov, &acb->raw_qiov);
+ ssize_t offset = qemu_iovec_compare(acb->qiov, &acb->raw_qiov);
if (offset != -1) {
blkverify_err(acb, "contents mismatch in sector %" PRId64,
acb->sector_num + (int64_t)(offset / BDRV_SECTOR_SIZE));
@@ -308,7 +204,7 @@ static BlockDriverAIOCB *blkverify_aio_readv(BlockDriverState *bs,
acb->verify = blkverify_verify_readv;
acb->buf = qemu_blockalign(bs->file, qiov->size);
qemu_iovec_init(&acb->raw_qiov, acb->qiov->niov);
- blkverify_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
+ qemu_iovec_clone(&acb->raw_qiov, qiov, acb->buf);
bdrv_aio_readv(s->test_file, sector_num, qiov, nb_sectors,
blkverify_aio_cb, acb);
diff --git a/include/qemu-common.h b/include/qemu-common.h
index ca464bb..13ce13a 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -342,6 +342,8 @@ size_t qemu_iovec_from_buf(QEMUIOVector *qiov, size_t offset,
const void *buf, size_t bytes);
size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
int fillc, size_t bytes);
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b);
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf);
bool buffer_is_zero(const void *buf, size_t len);
diff --git a/util/iov.c b/util/iov.c
index c0f5c56..bed2c22 100644
--- a/util/iov.c
+++ b/util/iov.c
@@ -370,6 +370,109 @@ size_t qemu_iovec_memset(QEMUIOVector *qiov, size_t offset,
return iov_memset(qiov->iov, qiov->niov, offset, fillc, bytes);
}
+/**
+ * Check that I/O vector contents are identical
+ *
+ * @a: I/O vector
+ * @b: I/O vector
+ * @ret: Offset to first mismatching byte or -1 if match
+ */
+ssize_t qemu_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+{
+ int i;
+ ssize_t offset = 0;
+
+ assert(a->niov == b->niov);
+ for (i = 0; i < a->niov; i++) {
+ size_t len = 0;
+ uint8_t *p = (uint8_t *)a->iov[i].iov_base;
+ uint8_t *q = (uint8_t *)b->iov[i].iov_base;
+
+ assert(a->iov[i].iov_len == b->iov[i].iov_len);
+ while (len < a->iov[i].iov_len && *p++ == *q++) {
+ len++;
+ }
+
+ offset += len;
+
+ if (len != a->iov[i].iov_len) {
+ return offset;
+ }
+ }
+ return -1;
+}
+
+typedef struct {
+ int src_index;
+ struct iovec *src_iov;
+ void *dest_base;
+} IOVectorSortElem;
+
+static int sortelem_cmp_src_base(const void *a, const void *b)
+{
+ const IOVectorSortElem *elem_a = a;
+ const IOVectorSortElem *elem_b = b;
+
+ /* Don't overflow */
+ if (elem_a->src_iov->iov_base < elem_b->src_iov->iov_base) {
+ return -1;
+ } else if (elem_a->src_iov->iov_base > elem_b->src_iov->iov_base) {
+ return 1;
+ } else {
+ return 0;
+ }
+}
+
+static int sortelem_cmp_src_index(const void *a, const void *b)
+{
+ const IOVectorSortElem *elem_a = a;
+ const IOVectorSortElem *elem_b = b;
+
+ return elem_a->src_index - elem_b->src_index;
+}
+
+/**
+ * Copy contents of I/O vector
+ *
+ * The relative relationships of overlapping iovecs are preserved. This is
+ * necessary to ensure identical semantics in the cloned I/O vector.
+ */
+void qemu_iovec_clone(QEMUIOVector *dest, const QEMUIOVector *src, void *buf)
+{
+ IOVectorSortElem sortelems[src->niov];
+ void *last_end;
+ int i;
+
+ /* Sort by source iovecs by base address */
+ for (i = 0; i < src->niov; i++) {
+ sortelems[i].src_index = i;
+ sortelems[i].src_iov = &src->iov[i];
+ }
+ qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_base);
+
+ /* Allocate buffer space taking into account overlapping iovecs */
+ last_end = NULL;
+ for (i = 0; i < src->niov; i++) {
+ struct iovec *cur = sortelems[i].src_iov;
+ ptrdiff_t rewind = 0;
+
+ /* Detect overlap */
+ if (last_end && last_end > cur->iov_base) {
+ rewind = last_end - cur->iov_base;
+ }
+
+ sortelems[i].dest_base = buf - rewind;
+ buf += cur->iov_len - MIN(rewind, cur->iov_len);
+ last_end = MAX(cur->iov_base + cur->iov_len, last_end);
+ }
+
+ /* Sort by source iovec index and build destination iovec */
+ qsort(sortelems, src->niov, sizeof(sortelems[0]), sortelem_cmp_src_index);
+ for (i = 0; i < src->niov; i++) {
+ qemu_iovec_add(dest, sortelems[i].dest_base, src->iov[i].iov_len);
+ }
+}
+
size_t iov_discard_front(struct iovec **iov, unsigned int *iov_cnt,
size_t bytes)
{
--
1.7.10.4
* [Qemu-devel] [RFC V7 06/11] quorum: Add quorum_aio_readv.
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 38 +++++++++++++++++++++++++++++++++++++-
1 file changed, 37 insertions(+), 1 deletion(-)
diff --git a/block/quorum.c b/block/quorum.c
index 71ae9ce..7194809 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -225,15 +225,24 @@ static void quorum_aio_bh(void *opaque)
{
QuorumAIOCB *acb = opaque;
BDRVQuorumState *s = acb->bqs;
- int ret;
+ int i, ret;
ret = s->threshold <= acb->success_count ? 0 : -EIO;
+ for (i = 0; i < s->total; i++) {
+ qemu_vfree(acb->aios[i].buf);
+ acb->aios[i].buf = NULL;
+ acb->aios[i].ret = 0;
+ }
+
qemu_bh_delete(acb->bh);
acb->common.cb(acb->common.opaque, ret);
if (acb->finished) {
*acb->finished = true;
}
+ for (i = 0; i < s->total; i++) {
+ qemu_iovec_destroy(&acb->qiovs[i]);
+ }
g_free(acb->aios);
g_free(acb->qiovs);
qemu_aio_release(acb);
@@ -294,6 +303,32 @@ static void quorum_aio_cb(void *opaque, int ret)
qemu_bh_schedule(acb->bh);
}
+static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ QEMUIOVector *qiov,
+ int nb_sectors,
+ BlockDriverCompletionFunc *cb,
+ void *opaque)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumAIOCB *acb = quorum_aio_get(s, bs, qiov, sector_num,
+ nb_sectors, cb, opaque);
+ int i;
+
+ for (i = 0; i < s->total; i++) {
+ acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
+ qemu_iovec_init(&acb->qiovs[i], qiov->niov);
+ qemu_iovec_clone(&acb->qiovs[i], qiov, acb->aios[i].buf);
+ }
+
+ for (i = 0; i < s->total; i++) {
+ bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+ quorum_aio_cb, &acb->aios[i]);
+ }
+
+ return &acb->common;
+}
+
static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
@@ -324,6 +359,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
+ .bdrv_aio_readv = quorum_aio_readv,
.bdrv_aio_writev = quorum_aio_writev,
};
--
1.7.10.4
* [Qemu-devel] [RFC V7 07/11] quorum: Add quorum mechanism.
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Use gnutls's SHA-256 to compare versions.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
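Note: a simplified sketch of the vote itself, for readers who want the idea
without the list handling. It assumes each successful read has already been
reduced to a comparable value; plain unsigned long values stand in for the
SHA-256 digests, and a quadratic scan replaces the QLIST of versions used in
the diff below. vote_winner() is a hypothetical name, not a function from this
series.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the most frequent value among the successful reads.
 * vals[i] stands in for the hash of child i's data and ok[i] is nonzero
 * if child i's read succeeded. Returns the index of a child holding the
 * winning value, or -1 if no value reaches `threshold` votes.
 */
static int vote_winner(const unsigned long *vals, const int *ok,
                       int total, int threshold)
{
    int best = -1, best_votes = 0;
    int i, j;

    assert(total > 0);
    for (i = 0; i < total; i++) {
        int votes = 0;

        if (!ok[i]) {
            continue;   /* failed reads do not vote */
        }
        for (j = 0; j < total; j++) {
            if (ok[j] && vals[j] == vals[i]) {
                votes++;
            }
        }
        if (votes > best_votes) {
            best_votes = votes;
            best = i;
        }
    }
    return best_votes >= threshold ? best : -1;
}
```

With three children holding the values 7, 9 and 7 and a threshold of 2, child 0
wins; the child holding 9 is the one that would be reported as bad. If the
first read fails, no value reaches two votes and the request fails.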
block/quorum.c | 303 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
configure | 22 ++++
2 files changed, 324 insertions(+), 1 deletion(-)
diff --git a/block/quorum.c b/block/quorum.c
index 7194809..e2b5208 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -13,8 +13,30 @@
* See the COPYING file in the top-level directory.
*/
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
#include "block/block_int.h"
+#define HASH_LENGTH 32
+
+typedef union QuorumVoteValue {
+ char h[HASH_LENGTH]; /* SHA-256 hash */
+ unsigned long l; /* simpler hash */
+} QuorumVoteValue;
+
+typedef struct QuorumVoteItem {
+ int index;
+ QLIST_ENTRY(QuorumVoteItem) next;
+} QuorumVoteItem;
+
+typedef struct QuorumVoteVersion {
+ QuorumVoteValue value;
+ int index;
+ int vote_count;
+ QLIST_HEAD(, QuorumVoteItem) items;
+ QLIST_ENTRY(QuorumVoteVersion) next;
+} QuorumVoteVersion;
+
typedef struct {
BlockDriverState **bs;
unsigned long long threshold;
@@ -31,6 +53,11 @@ typedef struct QuorumSingleAIOCB {
QuorumAIOCB *parent;
} QuorumSingleAIOCB;
+typedef struct QuorumVotes {
+ QLIST_HEAD(, QuorumVoteVersion) vote_list;
+ int (*compare)(QuorumVoteValue *a, QuorumVoteValue *b);
+} QuorumVotes;
+
struct QuorumAIOCB {
BlockDriverAIOCB common;
BDRVQuorumState *bqs;
@@ -48,6 +75,8 @@ struct QuorumAIOCB {
int success_count; /* number of successfully completed AIOCB */
bool *finished; /* completion signal for cancel */
+ QuorumVotes votes;
+
void (*vote)(QuorumAIOCB *acb);
int vote_ret;
};
@@ -236,6 +265,11 @@ static void quorum_aio_bh(void *opaque)
}
qemu_bh_delete(acb->bh);
+
+ if (acb->vote_ret) {
+ ret = acb->vote_ret;
+ }
+
acb->common.cb(acb->common.opaque, ret);
if (acb->finished) {
*acb->finished = true;
@@ -248,6 +282,11 @@ static void quorum_aio_bh(void *opaque)
qemu_aio_release(acb);
}
+static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
+{
+ return memcmp(a, b, HASH_LENGTH);
+}
+
static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
BlockDriverState *bs,
QEMUIOVector *qiov,
@@ -272,6 +311,8 @@ static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
acb->vote = NULL;
acb->vote_ret = 0;
acb->finished = NULL;
+ acb->votes.compare = quorum_sha256_compare;
+ QLIST_INIT(&acb->votes.vote_list);
for (i = 0; i < s->total; i++) {
acb->aios[i].buf = NULL;
@@ -299,10 +340,268 @@ static void quorum_aio_cb(void *opaque, int ret)
return;
}
+ /* Do the vote */
+ if (acb->vote) {
+ acb->vote(acb);
+ }
+
acb->bh = qemu_bh_new(quorum_aio_bh, acb);
qemu_bh_schedule(acb->bh);
}
+static void quorum_print_bad(QuorumAIOCB *acb, const char *filename)
+{
+ fprintf(stderr, "quorum: corrected error in quorum file %s: sector_num=%"
+ PRId64 " nb_sectors=%i\n", filename, acb->sector_num,
+ acb->nb_sectors);
+}
+
+static void quorum_print_failure(QuorumAIOCB *acb)
+{
+ fprintf(stderr, "quorum: failure sector_num=%" PRId64 " nb_sectors=%i\n",
+ acb->sector_num, acb->nb_sectors);
+}
+
+static void quorum_print_bad_versions(QuorumAIOCB *acb,
+ QuorumVoteValue *value)
+{
+ QuorumVoteVersion *version;
+ QuorumVoteItem *item;
+ BDRVQuorumState *s = acb->bqs;
+
+ QLIST_FOREACH(version, &acb->votes.vote_list, next) {
+ if (!acb->votes.compare(&version->value, value)) {
+ continue;
+ }
+ QLIST_FOREACH(item, &version->items, next) {
+ quorum_print_bad(acb, s->filenames[item->index]);
+ }
+ }
+}
+
+static void quorum_copy_qiov(QEMUIOVector *dest, QEMUIOVector *source)
+{
+ int i;
+ assert(dest->niov == source->niov);
+ assert(dest->size == source->size);
+ for (i = 0; i < source->niov; i++) {
+ assert(dest->iov[i].iov_len == source->iov[i].iov_len);
+ memcpy(dest->iov[i].iov_base,
+ source->iov[i].iov_base,
+ source->iov[i].iov_len);
+ }
+}
+
+static void quorum_count_vote(QuorumVotes *votes,
+ QuorumVoteValue *value,
+ int index)
+{
+ QuorumVoteVersion *v = NULL, *version = NULL;
+ QuorumVoteItem *item;
+
+ /* look if we have something with this hash */
+ QLIST_FOREACH(v, &votes->vote_list, next) {
+ if (!votes->compare(&v->value, value)) {
+ version = v;
+ break;
+ }
+ }
+
+ /* It's a version not yet in the list: add it */
+ if (!version) {
+ version = g_new0(QuorumVoteVersion, 1);
+ QLIST_INIT(&version->items);
+ memcpy(&version->value, value, sizeof(version->value));
+ version->index = index;
+ version->vote_count = 0;
+ QLIST_INSERT_HEAD(&votes->vote_list, version, next);
+ }
+
+ version->vote_count++;
+
+ item = g_new0(QuorumVoteItem, 1);
+ item->index = index;
+ QLIST_INSERT_HEAD(&version->items, item, next);
+}
+
+static void quorum_free_vote_list(QuorumVotes *votes)
+{
+ QuorumVoteVersion *version, *next_version;
+ QuorumVoteItem *item, *next_item;
+
+ QLIST_FOREACH_SAFE(version, &votes->vote_list, next, next_version) {
+ QLIST_REMOVE(version, next);
+ QLIST_FOREACH_SAFE(item, &version->items, next, next_item) {
+ QLIST_REMOVE(item, next);
+ g_free(item);
+ }
+ g_free(version);
+ }
+}
+
+static int quorum_compute_hash(QuorumAIOCB *acb, int i, QuorumVoteValue *hash)
+{
+ int j, ret;
+ gnutls_hash_hd_t dig;
+ QEMUIOVector *qiov = &acb->qiovs[i];
+
+ ret = gnutls_hash_init(&dig, GNUTLS_DIG_SHA256);
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ for (j = 0; j < qiov->niov; j++) {
+ ret = gnutls_hash(dig, qiov->iov[j].iov_base, qiov->iov[j].iov_len);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ gnutls_hash_deinit(dig, (void *) hash);
+
+ return 0;
+}
+
+static QuorumVoteVersion *quorum_get_vote_winner(QuorumVotes *votes)
+{
+ int i = 0;
+ QuorumVoteVersion *candidate, *winner = NULL;
+
+ QLIST_FOREACH(candidate, &votes->vote_list, next) {
+ if (candidate->vote_count > i) {
+ i = candidate->vote_count;
+ winner = candidate;
+ }
+ }
+
+ return winner;
+}
+
+static bool quorum_iovec_compare(QEMUIOVector *a, QEMUIOVector *b)
+{
+ int i;
+ int result;
+
+ assert(a->niov == b->niov);
+ for (i = 0; i < a->niov; i++) {
+ assert(a->iov[i].iov_len == b->iov[i].iov_len);
+ result = memcmp(a->iov[i].iov_base,
+ b->iov[i].iov_base,
+ a->iov[i].iov_len);
+ if (result) {
+ return false;
+ }
+ }
+
+ return true;
+}
+
+static void GCC_FMT_ATTR(2, 3) quorum_err(QuorumAIOCB *acb,
+ const char *fmt, ...)
+{
+ va_list ap;
+
+ va_start(ap, fmt);
+ fprintf(stderr, "quorum: sector_num=%" PRId64 " nb_sectors=%d ",
+ acb->sector_num, acb->nb_sectors);
+ vfprintf(stderr, fmt, ap);
+ fprintf(stderr, "\n");
+ va_end(ap);
+ exit(1);
+}
+
+static bool quorum_compare(QuorumAIOCB *acb,
+ QEMUIOVector *a,
+ QEMUIOVector *b)
+{
+ BDRVQuorumState *s = acb->bqs;
+ bool blkverify = false;
+ ssize_t offset;
+
+ if (s->total == 2 && s->threshold == 2) {
+ blkverify = true;
+ }
+
+ if (blkverify) {
+ offset = qemu_iovec_compare(a, b);
+ if (offset != -1) {
+ quorum_err(acb, "contents mismatch in sector %" PRId64,
+ acb->sector_num +
+ (uint64_t)(offset / BDRV_SECTOR_SIZE));
+ }
+ return true;
+ }
+
+ return quorum_iovec_compare(a, b);
+}
+
+
+static void quorum_vote(QuorumAIOCB *acb)
+{
+ bool quorum = true;
+ int i, j, ret;
+ QuorumVoteValue hash;
+ BDRVQuorumState *s = acb->bqs;
+ QuorumVoteVersion *winner;
+
+ /* get the index of the first successful read */
+ for (i = 0; i < s->total; i++) {
+ if (!acb->aios[i].ret) {
+ break;
+ }
+ }
+
+ /* compare this read with all other successful reads, looking for quorum */
+ for (j = i + 1; j < s->total; j++) {
+ if (acb->aios[j].ret) {
+ continue;
+ }
+ quorum = quorum_compare(acb, &acb->qiovs[i], &acb->qiovs[j]);
+ if (!quorum) {
+ break;
+ }
+ }
+
+ /* Every successful read agrees -> Quorum */
+ if (quorum) {
+ quorum_copy_qiov(acb->qiov, &acb->qiovs[i]);
+ return;
+ }
+
+ /* compute hashes for each successful read, also store indexes */
+ for (i = 0; i < s->total; i++) {
+ if (acb->aios[i].ret) {
+ continue;
+ }
+ ret = quorum_compute_hash(acb, i, &hash);
+ assert(ret == 0);
+ quorum_count_vote(&acb->votes, &hash, i);
+ }
+
+ /* vote to select the most represented version */
+ winner = quorum_get_vote_winner(&acb->votes);
+ assert(winner != NULL);
+
+ /* if the winner count is smaller than threshold, the read fails */
+ if (winner->vote_count < s->threshold) {
+ quorum_print_failure(acb);
+ acb->vote_ret = -EIO;
+ fprintf(stderr, "quorum: vote result inferior to threshold\n");
+ goto free_exit;
+ }
+
+ /* we have a winner: copy it */
+ quorum_copy_qiov(acb->qiov, &acb->qiovs[winner->index]);
+
+ /* some versions are bad: print them */
+ quorum_print_bad_versions(acb, &winner->value);
+
+free_exit:
+ /* free lists */
+ quorum_free_vote_list(&acb->votes);
+}
+
static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
int64_t sector_num,
QEMUIOVector *qiov,
@@ -315,6 +614,8 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
nb_sectors, cb, opaque);
int i;
+ acb->vote = quorum_vote;
+
for (i = 0; i < s->total; i++) {
acb->aios[i].buf = qemu_blockalign(bs->file, qiov->size);
qemu_iovec_init(&acb->qiovs[i], qiov->niov);
@@ -322,7 +623,7 @@ static BlockDriverAIOCB *quorum_aio_readv(BlockDriverState *bs,
}
for (i = 0; i < s->total; i++) {
- bdrv_aio_readv(s->bs[i], sector_num, qiov, nb_sectors,
+ bdrv_aio_readv(s->bs[i], sector_num, &acb->qiovs[i], nb_sectors,
quorum_aio_cb, &acb->aios[i]);
}
diff --git a/configure b/configure
index 4ebb60d..0832d26 100755
--- a/configure
+++ b/configure
@@ -1733,6 +1733,28 @@ EOF
fi
##########################################
+# Quorum gnutls detection
+cat > $TMPC <<EOF
+#include <gnutls/gnutls.h>
+#include <gnutls/crypto.h>
+int main(void) {char data[4096], digest[32];
+gnutls_hash_fast(GNUTLS_DIG_SHA256, data, 4096, digest);
+return 0;
+}
+EOF
+qcow_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
+qcow_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
+if compile_prog "$qcow_tls_cflags" "$qcow_tls_libs" ; then
+ qcow_tls=yes
+ libs_softmmu="$qcow_tls_libs $libs_softmmu"
+ libs_tools="$qcow_tls_libs $libs_tools"
+ QEMU_CFLAGS="$QEMU_CFLAGS $qcow_tls_cflags"
+else
+ echo "gnutls > 2.10.0 required to compile QEMU"
+ exit 1
+fi
+
+##########################################
# VNC SASL detection
if test "$vnc" = "yes" -a "$vnc_sasl" != "no" ; then
cat > $TMPC <<EOF
--
1.7.10.4
* [Qemu-devel] [RFC V7 08/11] quorum: Add quorum_getlength().
2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
` (6 preceding siblings ...)
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 07/11] quorum: Add quorum mechanism Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 09/11] quorum: Add quorum_invalidate_cache() Benoît Canet
` (3 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Check that every bs file returns the same length.
If not, return -EIO to disable the quorum and
avoid a length discrepancy.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index e2b5208..a63a84f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -651,12 +651,32 @@ static BlockDriverAIOCB *quorum_aio_writev(BlockDriverState *bs,
return &acb->common;
}
+static int64_t quorum_getlength(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int64_t result;
+ int i;
+
+ /* check that every file has the same length */
+ result = bdrv_getlength(s->bs[0]);
+ for (i = 1; i < s->total; i++) {
+ int64_t value = bdrv_getlength(s->bs[i]);
+ if (value != result) {
+ return -EIO;
+ }
+ }
+
+ return result;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
.instance_size = sizeof(BDRVQuorumState),
+ .bdrv_getlength = quorum_getlength,
+
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
--
1.7.10.4
* [Qemu-devel] [RFC V7 09/11] quorum: Add quorum_invalidate_cache().
2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
` (7 preceding siblings ...)
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 08/11] quorum: Add quorum_getlength() Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 10/11] quorum: Add quorum_co_is_allocated Benoît Canet
` (2 subsequent siblings)
11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index a63a84f..5cafb40 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -669,6 +669,16 @@ static int64_t quorum_getlength(BlockDriverState *bs)
return result;
}
+static void quorum_invalidate_cache(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ int i;
+
+ for (i = 0; i < s->total; i++) {
+ bdrv_invalidate_cache(s->bs[i]);
+ }
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -682,6 +692,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_aio_readv = quorum_aio_readv,
.bdrv_aio_writev = quorum_aio_writev,
+ .bdrv_invalidate_cache = quorum_invalidate_cache,
};
static void bdrv_quorum_init(void)
--
1.7.10.4
* [Qemu-devel] [RFC V7 10/11] quorum: Add quorum_co_is_allocated.
2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
` (8 preceding siblings ...)
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 09/11] quorum: Add quorum_invalidate_cache() Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 11/11] quorum: Add quorum_co_flush() Benoît Canet
2013-01-21 13:02 ` [Qemu-devel] [RFC V7 00/11] Quorum block filter Zhi Yong Wu
11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 5cafb40..8cbf66f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -287,6 +287,22 @@ static int quorum_sha256_compare(QuorumVoteValue *a, QuorumVoteValue *b)
return memcmp(a, b, HASH_LENGTH);
}
+static int quorum_long_compare(QuorumVoteValue *a, QuorumVoteValue *b)
+{
+ unsigned long i = a->l;
+ unsigned long j = b->l;
+
+ if (i < j) {
+ return -1;
+ }
+
+ if (i > j) {
+ return 1;
+ }
+
+ return 0;
+}
+
static QuorumAIOCB *quorum_aio_get(BDRVQuorumState *s,
BlockDriverState *bs,
QEMUIOVector *qiov,
@@ -679,6 +695,42 @@ static void quorum_invalidate_cache(BlockDriverState *bs)
}
}
+static int coroutine_fn quorum_co_is_allocated(BlockDriverState *bs,
+ int64_t sector_num,
+ int nb_sectors,
+ int *pnum)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumVoteVersion *winner = NULL;
+ QuorumVotes result_votes, num_votes;
+ QuorumVoteValue result_value, num_value;
+ int i, result = 0, num;
+
+ QLIST_INIT(&result_votes.vote_list);
+ QLIST_INIT(&num_votes.vote_list);
+ result_votes.compare = quorum_long_compare;
+ num_votes.compare = quorum_long_compare;
+
+ for (i = 0; i < s->total; i++) {
+ result = bdrv_co_is_allocated(s->bs[i], sector_num, nb_sectors, &num);
+ result_value.l = result;
+ num_value.l = num;
+ quorum_count_vote(&result_votes, &result_value, i);
+ quorum_count_vote(&num_votes, &num_value, i);
+ }
+
+ winner = quorum_get_vote_winner(&result_votes);
+ result = winner->value.l;
+
+ winner = quorum_get_vote_winner(&num_votes);
+ *pnum = winner->value.l;
+
+ quorum_free_vote_list(&result_votes);
+ quorum_free_vote_list(&num_votes);
+
+ return result;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -693,6 +745,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_aio_readv = quorum_aio_readv,
.bdrv_aio_writev = quorum_aio_writev,
.bdrv_invalidate_cache = quorum_invalidate_cache,
+ .bdrv_co_is_allocated = quorum_co_is_allocated,
};
static void bdrv_quorum_init(void)
--
1.7.10.4
* [Qemu-devel] [RFC V7 11/11] quorum: Add quorum_co_flush().
2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
` (9 preceding siblings ...)
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 10/11] quorum: Add quorum_co_is_allocated Benoît Canet
@ 2013-01-18 17:30 ` Benoît Canet
2013-01-21 13:02 ` [Qemu-devel] [RFC V7 00/11] Quorum block filter Zhi Yong Wu
11 siblings, 0 replies; 13+ messages in thread
From: Benoît Canet @ 2013-01-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, Benoît Canet, stefanha
Flush every child and, if any flush fails, hold a vote to select the error
code to return.
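
As a rough illustration only (toy_flush_vote() is a hypothetical helper, not
code from this patch), the error selection can be sketched on a plain array of
return codes: 0 means every flush succeeded, otherwise the most frequent
non-zero code is returned. Keeping the first code seen on a tie is an
assumption of this sketch; in the patch the tie order depends on the vote
list.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper, not part of the patch: return 0 when every flush
 * succeeded, otherwise the most frequent non-zero return code.
 * On a tie the first code seen is kept (assumption of this sketch). */
static int toy_flush_vote(const int *rets, int total)
{
    int winner = 0;
    int winner_count = 0;

    assert(total > 0);
    for (int i = 0; i < total; i++) {
        int count = 0;

        if (rets[i] == 0) {
            continue;               /* successful flushes do not vote */
        }
        for (int j = 0; j < total; j++) {
            if (rets[j] == rets[i]) {
                count++;            /* count identical error codes */
            }
        }
        if (count > winner_count) {
            winner_count = count;
            winner = rets[i];
        }
    }

    return winner;
}
```

Note that a single failing child is enough to report an error; the threshold
is not consulted on this path.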
Signed-off-by: Benoit Canet <benoit@irqsave.net>
---
block/quorum.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/block/quorum.c b/block/quorum.c
index 8cbf66f..0f4f634 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -731,6 +731,38 @@ static int coroutine_fn quorum_co_is_allocated(BlockDriverState *bs,
return result;
}
+static coroutine_fn int quorum_co_flush(BlockDriverState *bs)
+{
+ BDRVQuorumState *s = bs->opaque;
+ QuorumVoteVersion *winner = NULL;
+ QuorumVotes error_votes;
+ QuorumVoteValue result_value;
+ int i;
+ int result = 0;
+ bool error = false;
+
+ QLIST_INIT(&error_votes.vote_list);
+ error_votes.compare = quorum_long_compare;
+
+ for (i = 0; i < s->total; i++) {
+ result = bdrv_co_flush(s->bs[i]);
+ if (result) {
+ error = true;
+ result_value.l = result;
+ quorum_count_vote(&error_votes, &result_value, i);
+ }
+ }
+
+ if (error) {
+ winner = quorum_get_vote_winner(&error_votes);
+ result = winner->value.l;
+ }
+
+ quorum_free_vote_list(&error_votes);
+
+ return result;
+}
+
static BlockDriver bdrv_quorum = {
.format_name = "quorum",
.protocol_name = "quorum",
@@ -741,6 +773,7 @@ static BlockDriver bdrv_quorum = {
.bdrv_file_open = quorum_open,
.bdrv_close = quorum_close,
+ .bdrv_co_flush_to_disk = quorum_co_flush,
.bdrv_aio_readv = quorum_aio_readv,
.bdrv_aio_writev = quorum_aio_writev,
--
1.7.10.4
* Re: [Qemu-devel] [RFC V7 00/11] Quorum block filter
2013-01-18 17:30 [Qemu-devel] [RFC V7 00/11] Quorum block filter Benoît Canet
` (10 preceding siblings ...)
2013-01-18 17:30 ` [Qemu-devel] [RFC V7 11/11] quorum: Add quorum_co_flush() Benoît Canet
@ 2013-01-21 13:02 ` Zhi Yong Wu
11 siblings, 0 replies; 13+ messages in thread
From: Zhi Yong Wu @ 2013-01-21 13:02 UTC (permalink / raw)
To: Benoît Canet; +Cc: qemu-devel
On Sat, Jan 19, 2013 at 1:30 AM, Benoît Canet <benoit@irqsave.net> wrote:
> This patchset is rebased on top of "cutils: unsigned int parsing functions"
> by "Eduardo Habkost".
>
> This patchset create a block driver implementing a quorum using total qemu disk
> images. Writes are mirrored on the $total files.
> For the reading part the $total files are read at the same time and a vote is
> done to determine if a qiov version is present $threshold or more times. It then
> return this majority version to the upper layers.
> When i < $threshold versions of the data are returned by the lower layer the
> quorum is broken and the read return -EIO.
>
> The goal of this patchset is to be turned in a QEMU block filter living just
> above raw-*.c and below qcow2/qed when the required infrastructure will be done.
>
> Main use of this feature will be people using NFS appliances which can be
> subjected to bitflip errors.
>
> This patchset can be used to replace blkverify and the out of tree blkmirror.
>
> usage: -drive
> file=quorum:threshold/total:image_1.raw:...:image_total.raw,if=virtio,cache=none
I don't know if the following case can be handled correctly.
For example, quorum:2/3:image1.raw:image2.raw:image3.raw
Let us assume that some data in image2.raw and image3.raw gets
corrupted in the same way, so that the two images are still completely
identical, while image1.raw is not corrupted. In this case, how will your
vote method know which image is corrupted and which is not?
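
To make the scenario concrete, here is a toy model (toy_quorum_vote() is a
hypothetical helper, not code from the series). It only counts identical
copies, which is what a content-based vote does, so the two identical replicas
reach the threshold no matter which content is the original one:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the series: return the index of a replica
 * whose content is shared by at least `threshold` replicas, or -1 when no
 * version reaches the threshold. */
static int toy_quorum_vote(const unsigned char *const *replicas, int total,
                           size_t len, int threshold)
{
    int best = -1;
    int best_count = 0;

    assert(total > 0 && threshold > 0);
    for (int i = 0; i < total; i++) {
        int count = 0;

        for (int j = 0; j < total; j++) {
            if (memcmp(replicas[i], replicas[j], len) == 0) {
                count++;            /* replica j agrees with replica i */
            }
        }
        if (count > best_count) {
            best_count = count;
            best = i;
        }
    }

    return best_count >= threshold ? best : -1;
}
```

With replicas {good, bad, bad} and threshold 2, this returns index 1, that is,
the corrupted content wins the vote.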
>
> in this version:
> parse total and threshold with parse_uint [Eric]
> return proper qerrors in quorum_open [Eric]
> Use sha256 for comparing blocks [Eric]
> Update the rest of the voting function to the new way of doing [Benoît]
>
> V6:
> fix commit message of "quorum: Add quorum_open() and quorum_close()." [Eric]
> return error after a vote in quorum_co_flush [Eric]
> Fix bitrot caused by headers and structures renaming [Benoît]
> initialize finished to NULL to prevent crash [Benoît]
> convert internal quorum code to uint64_t instead of int64_t [Benoît]
>
> V5:
>
> Eric Blake: revert back separator to ":"
> rewrite quorum_getlength
>
> Benoît Canet: use memcmp to compare iovec excepted for the blkverify case
> use strstart to parse argument in open
>
>
> Benoît Canet (11):
> quorum: Create quorum.c, add QuorumSingleAIOCB and QuorumAIOCB.
> quorum: Create BDRVQuorumState and BlkDriver and do init.
> quorum: Add quorum_open() and quorum_close().
> quorum: Add quorum_aio_writev and its dependencies.
> blkverify: Extract qemu_iovec_clone() and qemu_iovec_compare() from
> blkverify.
> quorum: Add quorum_aio_readv.
> quorum: Add quorum mechanism.
> quorum: Add quorum_getlength().
> quorum: Add quorum_invalidate_cache().
> quorum: Add quorum_co_is_allocated.
> quorum: Add quorum_co_flush().
>
> block/Makefile.objs | 1 +
> block/blkverify.c | 108 +------
> block/quorum.c | 789 +++++++++++++++++++++++++++++++++++++++++++++++++
> configure | 22 ++
> include/qemu-common.h | 2 +
> util/iov.c | 103 +++++++
> 6 files changed, 919 insertions(+), 106 deletions(-)
> create mode 100644 block/quorum.c
>
> --
> 1.7.10.4
>
>
--
Regards,
Zhi Yong Wu