* [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
@ 2013-09-02 7:25 Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 1/8] qcow2: Add corrupt bit Max Reitz
` (9 more replies)
0 siblings, 10 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
If a qcow2 image file becomes corrupted, any write may inadvertently
overwrite important metadata structures such as the L1 table. This
series adds functionality for detecting, preventing and (to some extent)
repairing such collisions.
v5:
- fixed patch 6 (forgot to update the event_names array for the new
event BLKDBG_REFTABLE_UPDATE); no other changes
v4:
- fixed handling of preallocated zero clusters in patch 4
- moved OFLAG_COPIED checks into a separate function (this affects
patches 4 and 5); functionality remains unchanged
- patches 1, 2, 3, 6, 7 and 8 remain unmodified (except for line
numbers in block/qcow2-refcount.c)
v3:
- split PATCH 4/5 into four distinct patches (4/8, 5/8, 6/8 and 7/8,
respectively)
- directly generate a JSON message when marking the image corrupt
- other (minor) fixes according to Kevin's comments
v2:
- Generally implemented Kevin's comments, especially:
- new QMP event QEVENT_BLOCK_IMAGE_CORRUPTED
- removed BDRV_O_REPAIR in favor of BDRV_O_CHECK | BDRV_O_RDWR
- always check full clusters for overlaps
- removed qcow2_check_allocations in favor of some
qcow2_check_refcounts extensions that will hopefully include all
that functionality
Max Reitz (8):
qcow2: Add corrupt bit
qcow2: Metadata overlap checks
qcow2: Employ metadata overlap checks
qcow2-refcount: Move OFLAG_COPIED checks
qcow2-refcount: Repair OFLAG_COPIED errors
qcow2-refcount: Repair shared refcount blocks
qcow2_check: Mark image consistent
qemu-iotests: Overlapping cluster allocations
block/blkdebug.c | 1 +
block/qcow2-cache.c | 17 ++
block/qcow2-cluster.c | 25 ++-
block/qcow2-refcount.c | 481 +++++++++++++++++++++++++++++++++++++++++----
block/qcow2-snapshot.c | 22 +++
block/qcow2.c | 79 +++++++-
block/qcow2.h | 47 ++++-
docs/specs/qcow2.txt | 7 +-
include/block/block.h | 1 +
include/monitor/monitor.h | 1 +
monitor.c | 1 +
tests/qemu-iotests/031.out | 12 +-
tests/qemu-iotests/036.out | 2 +-
tests/qemu-iotests/060 | 111 +++++++++++
tests/qemu-iotests/060.out | 44 +++++
tests/qemu-iotests/group | 1 +
16 files changed, 805 insertions(+), 47 deletions(-)
create mode 100755 tests/qemu-iotests/060
create mode 100644 tests/qemu-iotests/060.out
--
1.8.3.1
^ permalink raw reply [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v5 1/8] qcow2: Add corrupt bit
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
@ 2013-09-02 7:25 ` Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 2/8] qcow2: Metadata overlap checks Max Reitz
` (8 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
This adds an incompatible bit indicating corruption to qcow2. Any image
with this bit set may not be written to unless for repairing (and
subsequently clearing the bit if the repair has been successful).
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++
block/qcow2.h | 7 ++++++-
docs/specs/qcow2.txt | 7 ++++++-
tests/qemu-iotests/031.out | 12 ++++++------
tests/qemu-iotests/036.out | 2 +-
5 files changed, 66 insertions(+), 9 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 78097e5..16c9898 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -272,6 +272,37 @@ static int qcow2_mark_clean(BlockDriverState *bs)
return 0;
}
+/*
+ * Marks the image as corrupt.
+ */
+int qcow2_mark_corrupt(BlockDriverState *bs)
+{
+ BDRVQcowState *s = bs->opaque;
+
+ s->incompatible_features |= QCOW2_INCOMPAT_CORRUPT;
+ return qcow2_update_header(bs);
+}
+
+/*
+ * Marks the image as consistent, i.e., unsets the corrupt bit, and flushes
+ * before if necessary.
+ */
+int qcow2_mark_consistent(BlockDriverState *bs)
+{
+ BDRVQcowState *s = bs->opaque;
+
+ if (s->incompatible_features & QCOW2_INCOMPAT_CORRUPT) {
+ int ret = bdrv_flush(bs);
+ if (ret < 0) {
+ return ret;
+ }
+
+ s->incompatible_features &= ~QCOW2_INCOMPAT_CORRUPT;
+ return qcow2_update_header(bs);
+ }
+ return 0;
+}
+
static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
BdrvCheckMode fix)
{
@@ -402,6 +433,17 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags)
goto fail;
}
+ if (s->incompatible_features & QCOW2_INCOMPAT_CORRUPT) {
+ /* Corrupt images may not be written to unless they are being repaired
+ */
+ if ((flags & BDRV_O_RDWR) && !(flags & BDRV_O_CHECK)) {
+ error_report("qcow2: Image is corrupt; cannot be opened "
+ "read/write.");
+ ret = -EACCES;
+ goto fail;
+ }
+ }
+
/* Check support for various header values */
if (header.refcount_order != 4) {
report_unsupported(bs, "%d bit reference counts",
@@ -1130,6 +1172,11 @@ int qcow2_update_header(BlockDriverState *bs)
.name = "dirty bit",
},
{
+ .type = QCOW2_FEAT_TYPE_INCOMPATIBLE,
+ .bit = QCOW2_INCOMPAT_CORRUPT_BITNR,
+ .name = "corrupt bit",
+ },
+ {
.type = QCOW2_FEAT_TYPE_COMPATIBLE,
.bit = QCOW2_COMPAT_LAZY_REFCOUNTS_BITNR,
.name = "lazy refcounts",
diff --git a/block/qcow2.h b/block/qcow2.h
index dba9771..4297487 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -119,9 +119,12 @@ enum {
/* Incompatible feature bits */
enum {
QCOW2_INCOMPAT_DIRTY_BITNR = 0,
+ QCOW2_INCOMPAT_CORRUPT_BITNR = 1,
QCOW2_INCOMPAT_DIRTY = 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
+ QCOW2_INCOMPAT_CORRUPT = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
- QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY,
+ QCOW2_INCOMPAT_MASK = QCOW2_INCOMPAT_DIRTY
+ | QCOW2_INCOMPAT_CORRUPT,
};
/* Compatible feature bits */
@@ -361,6 +364,8 @@ int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
int64_t sector_num, int nb_sectors);
int qcow2_mark_dirty(BlockDriverState *bs);
+int qcow2_mark_corrupt(BlockDriverState *bs);
+int qcow2_mark_consistent(BlockDriverState *bs);
int qcow2_update_header(BlockDriverState *bs);
/* qcow2-refcount.c functions */
diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
index 36a559d..33eca36 100644
--- a/docs/specs/qcow2.txt
+++ b/docs/specs/qcow2.txt
@@ -80,7 +80,12 @@ in the description of a field.
tables to repair refcounts before accessing the
image.
- Bits 1-63: Reserved (set to 0)
+ Bit 1: Corrupt bit. If this bit is set then any data
+ structure may be corrupt and the image must not
+ be written to (unless for regaining
+ consistency).
+
+ Bits 2-63: Reserved (set to 0)
80 - 87: compatible_features
Bitmask of compatible features. An implementation can
diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index 796c993..a943344 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -54,7 +54,7 @@ header_length 72
Header extension:
magic 0x6803f857
-length 96
+length 144
data <binary>
Header extension:
@@ -68,7 +68,7 @@ No errors were found on the image.
magic 0x514649fb
version 2
-backing_file_offset 0xf8
+backing_file_offset 0x128
backing_file_size 0x17
cluster_bits 16
size 67108864
@@ -92,7 +92,7 @@ data 'host_device'
Header extension:
magic 0x6803f857
-length 96
+length 144
data <binary>
Header extension:
@@ -155,7 +155,7 @@ header_length 104
Header extension:
magic 0x6803f857
-length 96
+length 144
data <binary>
Header extension:
@@ -169,7 +169,7 @@ No errors were found on the image.
magic 0x514649fb
version 3
-backing_file_offset 0x118
+backing_file_offset 0x148
backing_file_size 0x17
cluster_bits 16
size 67108864
@@ -193,7 +193,7 @@ data 'host_device'
Header extension:
magic 0x6803f857
-length 96
+length 144
data <binary>
Header extension:
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index 063ca22..55a3e6e 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -46,7 +46,7 @@ header_length 104
Header extension:
magic 0x6803f857
-length 96
+length 144
data <binary>
*** done
--
1.8.3.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v5 2/8] qcow2: Metadata overlap checks
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 1/8] qcow2: Add corrupt bit Max Reitz
@ 2013-09-02 7:25 ` Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 3/8] qcow2: Employ metadata " Max Reitz
` (7 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
Two new functions are added; the first one checks a given range in the
image file for overlaps with metadata (main header, L1 tables, L2
tables, refcount table and blocks).
The second one should be used immediately before writing to the image
file as it calls the first function and, upon collision, marks the
image as corrupt and makes the BDS unusable, thereby preventing
further access.
Both functions take a bitmask argument specifying the structures which
should be checked for overlaps, making it possible to also check
metadata writes against colliding with other structures.
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-refcount.c | 172 ++++++++++++++++++++++++++++++++++++++++++++++
block/qcow2.h | 39 +++++++++++
include/monitor/monitor.h | 1 +
monitor.c | 1 +
4 files changed, 213 insertions(+)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 1244693..310efcc 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -25,6 +25,8 @@
#include "qemu-common.h"
#include "block/block_int.h"
#include "block/qcow2.h"
+#include "qemu/range.h"
+#include "qapi/qmp/types.h"
static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -1372,3 +1374,173 @@ fail:
return ret;
}
+#define overlaps_with(ofs, sz) \
+ ranges_overlap(offset, size, ofs, sz)
+
+/*
+ * Checks if the given offset into the image file is actually free to use by
+ * looking for overlaps with important metadata sections (L1/L2 tables etc.),
+ * i.e. a sanity check without relying on the refcount tables.
+ *
+ * The chk parameter specifies exactly what checks to perform (being a bitmask
+ * of QCow2MetadataOverlap values).
+ *
+ * Returns:
+ * - 0 if writing to this offset will not affect the mentioned metadata
+ * - a positive QCow2MetadataOverlap value indicating one overlapping section
+ * - a negative value (-errno) indicating an error while performing a check,
+ * e.g. when bdrv_read failed on QCOW2_OL_INACTIVE_L2
+ */
+int qcow2_check_metadata_overlap(BlockDriverState *bs, int chk, int64_t offset,
+ int64_t size)
+{
+ BDRVQcowState *s = bs->opaque;
+ int i, j;
+
+ if (!size) {
+ return 0;
+ }
+
+ if (chk & QCOW2_OL_MAIN_HEADER) {
+ if (offset < s->cluster_size) {
+ return QCOW2_OL_MAIN_HEADER;
+ }
+ }
+
+ /* align range to test to cluster boundaries */
+ size = align_offset(offset_into_cluster(s, offset) + size, s->cluster_size);
+ offset = start_of_cluster(s, offset);
+
+ if ((chk & QCOW2_OL_ACTIVE_L1) && s->l1_size) {
+ if (overlaps_with(s->l1_table_offset, s->l1_size * sizeof(uint64_t))) {
+ return QCOW2_OL_ACTIVE_L1;
+ }
+ }
+
+ if ((chk & QCOW2_OL_REFCOUNT_TABLE) && s->refcount_table_size) {
+ if (overlaps_with(s->refcount_table_offset,
+ s->refcount_table_size * sizeof(uint64_t))) {
+ return QCOW2_OL_REFCOUNT_TABLE;
+ }
+ }
+
+ if ((chk & QCOW2_OL_SNAPSHOT_TABLE) && s->snapshots_size) {
+ if (overlaps_with(s->snapshots_offset, s->snapshots_size)) {
+ return QCOW2_OL_SNAPSHOT_TABLE;
+ }
+ }
+
+ if ((chk & QCOW2_OL_INACTIVE_L1) && s->snapshots) {
+ for (i = 0; i < s->nb_snapshots; i++) {
+ if (s->snapshots[i].l1_size &&
+ overlaps_with(s->snapshots[i].l1_table_offset,
+ s->snapshots[i].l1_size * sizeof(uint64_t))) {
+ return QCOW2_OL_INACTIVE_L1;
+ }
+ }
+ }
+
+ if ((chk & QCOW2_OL_ACTIVE_L2) && s->l1_table) {
+ for (i = 0; i < s->l1_size; i++) {
+ if ((s->l1_table[i] & L1E_OFFSET_MASK) &&
+ overlaps_with(s->l1_table[i] & L1E_OFFSET_MASK,
+ s->cluster_size)) {
+ return QCOW2_OL_ACTIVE_L2;
+ }
+ }
+ }
+
+ if ((chk & QCOW2_OL_REFCOUNT_BLOCK) && s->refcount_table) {
+ for (i = 0; i < s->refcount_table_size; i++) {
+ if ((s->refcount_table[i] & REFT_OFFSET_MASK) &&
+ overlaps_with(s->refcount_table[i] & REFT_OFFSET_MASK,
+ s->cluster_size)) {
+ return QCOW2_OL_REFCOUNT_BLOCK;
+ }
+ }
+ }
+
+ if ((chk & QCOW2_OL_INACTIVE_L2) && s->snapshots) {
+ for (i = 0; i < s->nb_snapshots; i++) {
+ uint64_t l1_ofs = s->snapshots[i].l1_table_offset;
+ uint32_t l1_sz = s->snapshots[i].l1_size;
+ uint64_t *l1 = g_malloc(l1_sz * sizeof(uint64_t));
+ int ret;
+
+ ret = bdrv_read(bs->file, l1_ofs / BDRV_SECTOR_SIZE, (uint8_t *)l1,
+ l1_sz * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
+
+ if (ret < 0) {
+ g_free(l1);
+ return ret;
+ }
+
+ for (j = 0; j < l1_sz; j++) {
+ if ((l1[j] & L1E_OFFSET_MASK) &&
+ overlaps_with(l1[j] & L1E_OFFSET_MASK, s->cluster_size)) {
+ g_free(l1);
+ return QCOW2_OL_INACTIVE_L2;
+ }
+ }
+
+ g_free(l1);
+ }
+ }
+
+ return 0;
+}
+
+static const char *metadata_ol_names[] = {
+ [QCOW2_OL_MAIN_HEADER_BITNR] = "qcow2_header",
+ [QCOW2_OL_ACTIVE_L1_BITNR] = "active L1 table",
+ [QCOW2_OL_ACTIVE_L2_BITNR] = "active L2 table",
+ [QCOW2_OL_REFCOUNT_TABLE_BITNR] = "refcount table",
+ [QCOW2_OL_REFCOUNT_BLOCK_BITNR] = "refcount block",
+ [QCOW2_OL_SNAPSHOT_TABLE_BITNR] = "snapshot table",
+ [QCOW2_OL_INACTIVE_L1_BITNR] = "inactive L1 table",
+ [QCOW2_OL_INACTIVE_L2_BITNR] = "inactive L2 table",
+};
+
+/*
+ * First performs a check for metadata overlaps (through
+ * qcow2_check_metadata_overlap); if that fails with a negative value (error
+ * while performing a check), that value is returned. If an impending overlap
+ * is detected, the BDS will be made unusable, the qcow2 file marked corrupt
+ * and -EIO returned.
+ *
+ * Returns 0 if there were neither overlaps nor errors while checking for
+ * overlaps; or a negative value (-errno) on error.
+ */
+int qcow2_pre_write_overlap_check(BlockDriverState *bs, int chk, int64_t offset,
+ int64_t size)
+{
+ int ret = qcow2_check_metadata_overlap(bs, chk, offset, size);
+
+ if (ret < 0) {
+ return ret;
+ } else if (ret > 0) {
+ int metadata_ol_bitnr = ffs(ret) - 1;
+ char *message;
+ QObject *data;
+
+ assert(metadata_ol_bitnr < QCOW2_OL_MAX_BITNR);
+
+ fprintf(stderr, "qcow2: Preventing invalid write on metadata (overlaps "
+ "with %s); image marked as corrupt.\n",
+ metadata_ol_names[metadata_ol_bitnr]);
+ message = g_strdup_printf("Prevented %s overwrite",
+ metadata_ol_names[metadata_ol_bitnr]);
+ data = qobject_from_jsonf("{ 'device': %s, 'msg': %s, 'offset': %"
+ PRId64 ", 'size': %" PRId64 " }", bs->device_name, message,
+ offset, size);
+ monitor_protocol_event(QEVENT_BLOCK_IMAGE_CORRUPTED, data);
+ g_free(message);
+ qobject_decref(data);
+
+ qcow2_mark_corrupt(bs);
+ bs->drv = NULL; /* make BDS unusable */
+ return -EIO;
+ }
+
+ return 0;
+}
diff --git a/block/qcow2.h b/block/qcow2.h
index 4297487..86ddb30 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -289,6 +289,40 @@ enum {
QCOW2_CLUSTER_ZERO
};
+typedef enum QCow2MetadataOverlap {
+ QCOW2_OL_MAIN_HEADER_BITNR = 0,
+ QCOW2_OL_ACTIVE_L1_BITNR = 1,
+ QCOW2_OL_ACTIVE_L2_BITNR = 2,
+ QCOW2_OL_REFCOUNT_TABLE_BITNR = 3,
+ QCOW2_OL_REFCOUNT_BLOCK_BITNR = 4,
+ QCOW2_OL_SNAPSHOT_TABLE_BITNR = 5,
+ QCOW2_OL_INACTIVE_L1_BITNR = 6,
+ QCOW2_OL_INACTIVE_L2_BITNR = 7,
+
+ QCOW2_OL_MAX_BITNR = 8,
+
+ QCOW2_OL_NONE = 0,
+ QCOW2_OL_MAIN_HEADER = (1 << QCOW2_OL_MAIN_HEADER_BITNR),
+ QCOW2_OL_ACTIVE_L1 = (1 << QCOW2_OL_ACTIVE_L1_BITNR),
+ QCOW2_OL_ACTIVE_L2 = (1 << QCOW2_OL_ACTIVE_L2_BITNR),
+ QCOW2_OL_REFCOUNT_TABLE = (1 << QCOW2_OL_REFCOUNT_TABLE_BITNR),
+ QCOW2_OL_REFCOUNT_BLOCK = (1 << QCOW2_OL_REFCOUNT_BLOCK_BITNR),
+ QCOW2_OL_SNAPSHOT_TABLE = (1 << QCOW2_OL_SNAPSHOT_TABLE_BITNR),
+ QCOW2_OL_INACTIVE_L1 = (1 << QCOW2_OL_INACTIVE_L1_BITNR),
+ /* NOTE: Checking overlaps with inactive L2 tables will result in bdrv
+ * reads. */
+ QCOW2_OL_INACTIVE_L2 = (1 << QCOW2_OL_INACTIVE_L2_BITNR),
+} QCow2MetadataOverlap;
+
+/* Perform all overlap checks which don't require disk access */
+#define QCOW2_OL_CACHED \
+ (QCOW2_OL_MAIN_HEADER | QCOW2_OL_ACTIVE_L1 | QCOW2_OL_ACTIVE_L2 | \
+ QCOW2_OL_REFCOUNT_TABLE | QCOW2_OL_REFCOUNT_BLOCK | \
+ QCOW2_OL_SNAPSHOT_TABLE | QCOW2_OL_INACTIVE_L1)
+
+/* The default checks to perform */
+#define QCOW2_OL_DEFAULT QCOW2_OL_CACHED
+
#define L1E_OFFSET_MASK 0x00ffffffffffff00ULL
#define L2E_OFFSET_MASK 0x00ffffffffffff00ULL
#define L2E_COMPRESSED_OFFSET_SIZE_MASK 0x3fffffffffffffffULL
@@ -390,6 +424,11 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
void qcow2_process_discards(BlockDriverState *bs, int ret);
+int qcow2_check_metadata_overlap(BlockDriverState *bs, int chk, int64_t offset,
+ int64_t size);
+int qcow2_pre_write_overlap_check(BlockDriverState *bs, int chk, int64_t offset,
+ int64_t size);
+
/* qcow2-cluster.c functions */
int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
bool exact_size);
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 1942cc4..10fa0e3 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -48,6 +48,7 @@ typedef enum MonitorEvent {
QEVENT_BALLOON_CHANGE,
QEVENT_SPICE_MIGRATE_COMPLETED,
QEVENT_GUEST_PANICKED,
+ QEVENT_BLOCK_IMAGE_CORRUPTED,
/* Add to 'monitor_event_names' array in monitor.c when
* defining new events here */
diff --git a/monitor.c b/monitor.c
index ee9744c..2c542e1 100644
--- a/monitor.c
+++ b/monitor.c
@@ -504,6 +504,7 @@ static const char *monitor_event_names[] = {
[QEVENT_BALLOON_CHANGE] = "BALLOON_CHANGE",
[QEVENT_SPICE_MIGRATE_COMPLETED] = "SPICE_MIGRATE_COMPLETED",
[QEVENT_GUEST_PANICKED] = "GUEST_PANICKED",
+ [QEVENT_BLOCK_IMAGE_CORRUPTED] = "BLOCK_IMAGE_CORRUPTED",
};
QEMU_BUILD_BUG_ON(ARRAY_SIZE(monitor_event_names) != QEVENT_MAX)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v5 3/8] qcow2: Employ metadata overlap checks
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 1/8] qcow2: Add corrupt bit Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 2/8] qcow2: Metadata overlap checks Max Reitz
@ 2013-09-02 7:25 ` Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 4/8] qcow2-refcount: Move OFLAG_COPIED checks Max Reitz
` (6 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
The pre-write overlap check function is now called before most of the
qcow2 writes (aborting it on collision or other error).
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-cache.c | 17 +++++++++++++++++
block/qcow2-cluster.c | 21 +++++++++++++++++++++
block/qcow2-snapshot.c | 22 ++++++++++++++++++++++
block/qcow2.c | 26 ++++++++++++++++++++++++++
4 files changed, 86 insertions(+)
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
index 2f3114e..7bcae09 100644
--- a/block/qcow2-cache.c
+++ b/block/qcow2-cache.c
@@ -115,6 +115,23 @@ static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
}
if (c == s->refcount_block_cache) {
+ ret = qcow2_pre_write_overlap_check(bs,
+ QCOW2_OL_DEFAULT & ~QCOW2_OL_REFCOUNT_BLOCK,
+ c->entries[i].offset, s->cluster_size);
+ } else if (c == s->l2_table_cache) {
+ ret = qcow2_pre_write_overlap_check(bs,
+ QCOW2_OL_DEFAULT & ~QCOW2_OL_ACTIVE_L2,
+ c->entries[i].offset, s->cluster_size);
+ } else {
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
+ c->entries[i].offset, s->cluster_size);
+ }
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (c == s->refcount_block_cache) {
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
} else if (c == s->l2_table_cache) {
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index cca76d4..7c248aa 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -80,6 +80,14 @@ int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
goto fail;
}
+ /* the L1 position has not yet been updated, so these clusters must
+ * indeed be completely free */
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
+ new_l1_table_offset, new_l1_size2);
+ if (ret < 0) {
+ goto fail;
+ }
+
BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
for(i = 0; i < s->l1_size; i++)
new_l1_table[i] = cpu_to_be64(new_l1_table[i]);
@@ -149,6 +157,13 @@ static int write_l1_entry(BlockDriverState *bs, int l1_index)
buf[i] = cpu_to_be64(s->l1_table[l1_start_index + i]);
}
+ ret = qcow2_pre_write_overlap_check(bs,
+ QCOW2_OL_DEFAULT & ~QCOW2_OL_ACTIVE_L1,
+ s->l1_table_offset + 8 * l1_start_index, sizeof(buf));
+ if (ret < 0) {
+ return ret;
+ }
+
BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset + 8 * l1_start_index,
buf, sizeof(buf));
@@ -368,6 +383,12 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
&s->aes_encrypt_key);
}
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
+ cluster_offset + n_start * BDRV_SECTOR_SIZE, n * BDRV_SECTOR_SIZE);
+ if (ret < 0) {
+ goto out;
+ }
+
BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
ret = bdrv_co_writev(bs->file, (cluster_offset >> 9) + n_start, n, &qiov);
if (ret < 0) {
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0caac90..e7e6013 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -189,6 +189,15 @@ static int qcow2_write_snapshots(BlockDriverState *bs)
return ret;
}
+ /* The snapshot list position has not yet been updated, so these clusters
+ * must indeed be completely free */
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT, offset,
+ s->snapshots_size);
+ if (ret < 0) {
+ return ret;
+ }
+
+
/* Write all snapshots to the new list */
for(i = 0; i < s->nb_snapshots; i++) {
sn = s->snapshots + i;
@@ -363,6 +372,12 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
l1_table[i] = cpu_to_be64(s->l1_table[i]);
}
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
+ sn->l1_table_offset, s->l1_size * sizeof(uint64_t));
+ if (ret < 0) {
+ goto fail;
+ }
+
ret = bdrv_pwrite(bs->file, sn->l1_table_offset, l1_table,
s->l1_size * sizeof(uint64_t));
if (ret < 0) {
@@ -475,6 +490,13 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
goto fail;
}
+ ret = qcow2_pre_write_overlap_check(bs,
+ QCOW2_OL_DEFAULT & ~QCOW2_OL_ACTIVE_L1,
+ s->l1_table_offset, cur_l1_bytes);
+ if (ret < 0) {
+ goto fail;
+ }
+
ret = bdrv_pwrite_sync(bs->file, s->l1_table_offset, sn_l1_table,
cur_l1_bytes);
if (ret < 0) {
diff --git a/block/qcow2.c b/block/qcow2.c
index 16c9898..3a95ff1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -624,6 +624,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags)
qcow2_free_snapshots(bs);
qcow2_refcount_close(bs);
g_free(s->l1_table);
+ /* else pre-write overlap checks in cache_destroy may crash */
+ s->l1_table = NULL;
if (s->l2_table_cache) {
qcow2_cache_destroy(bs, s->l2_table_cache);
}
@@ -923,6 +925,13 @@ static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
cur_nr_sectors * 512);
}
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
+ cluster_offset + index_in_cluster * BDRV_SECTOR_SIZE,
+ cur_nr_sectors * BDRV_SECTOR_SIZE);
+ if (ret < 0) {
+ goto fail;
+ }
+
qemu_co_mutex_unlock(&s->lock);
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
trace_qcow2_writev_data(qemu_coroutine_self(),
@@ -989,6 +998,8 @@ static void qcow2_close(BlockDriverState *bs)
{
BDRVQcowState *s = bs->opaque;
g_free(s->l1_table);
+ /* else pre-write overlap checks in cache_destroy may crash */
+ s->l1_table = NULL;
qcow2_cache_flush(bs, s->l2_table_cache);
qcow2_cache_flush(bs, s->refcount_block_cache);
@@ -1666,6 +1677,14 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
if (ret != Z_STREAM_END || out_len >= s->cluster_size) {
/* could not compress: write normal cluster */
+
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
+ sector_num * BDRV_SECTOR_SIZE,
+ s->cluster_sectors * BDRV_SECTOR_SIZE);
+ if (ret < 0) {
+ goto fail;
+ }
+
ret = bdrv_write(bs, sector_num, buf, s->cluster_sectors);
if (ret < 0) {
goto fail;
@@ -1678,6 +1697,13 @@ static int qcow2_write_compressed(BlockDriverState *bs, int64_t sector_num,
goto fail;
}
cluster_offset &= s->cluster_offset_mask;
+
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
+ cluster_offset, out_len);
+ if (ret < 0) {
+ goto fail;
+ }
+
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED);
ret = bdrv_pwrite(bs->file, cluster_offset, out_buf, out_len);
if (ret < 0) {
--
1.8.3.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v5 4/8] qcow2-refcount: Move OFLAG_COPIED checks
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
` (2 preceding siblings ...)
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 3/8] qcow2: Employ metadata " Max Reitz
@ 2013-09-02 7:25 ` Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 5/8] qcow2-refcount: Repair OFLAG_COPIED errors Max Reitz
` (5 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
Move the OFLAG_COPIED checks out of check_refcounts_l1 and
check_refcounts_l2 and after the actual refcount checks/fixes (since the
refcounts might actually change there).
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-refcount.c | 115 +++++++++++++++++++++++++++++++++++--------------
1 file changed, 82 insertions(+), 33 deletions(-)
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 310efcc..ddc3029 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1035,7 +1035,7 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
BDRVQcowState *s = bs->opaque;
uint64_t *l2_table, l2_entry;
uint64_t next_contiguous_offset = 0;
- int i, l2_size, nb_csectors, refcount;
+ int i, l2_size, nb_csectors;
/* Read L2 table from disk */
l2_size = s->l2_size * sizeof(uint64_t);
@@ -1087,23 +1087,8 @@ static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
case QCOW2_CLUSTER_NORMAL:
{
- /* QCOW_OFLAG_COPIED must be set iff refcount == 1 */
uint64_t offset = l2_entry & L2E_OFFSET_MASK;
- if (flags & CHECK_OFLAG_COPIED) {
- refcount = get_refcount(bs, offset >> s->cluster_bits);
- if (refcount < 0) {
- fprintf(stderr, "Can't get refcount for offset %"
- PRIx64 ": %s\n", l2_entry, strerror(-refcount));
- goto fail;
- }
- if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
- fprintf(stderr, "ERROR OFLAG_COPIED: offset=%"
- PRIx64 " refcount=%d\n", l2_entry, refcount);
- res->corruptions++;
- }
- }
-
if (flags & CHECK_FRAG_INFO) {
res->bfi.allocated_clusters++;
if (next_contiguous_offset &&
@@ -1160,7 +1145,7 @@ static int check_refcounts_l1(BlockDriverState *bs,
{
BDRVQcowState *s = bs->opaque;
uint64_t *l1_table, l2_offset, l1_size2;
- int i, refcount, ret;
+ int i, ret;
l1_size2 = l1_size * sizeof(uint64_t);
@@ -1184,22 +1169,6 @@ static int check_refcounts_l1(BlockDriverState *bs,
for(i = 0; i < l1_size; i++) {
l2_offset = l1_table[i];
if (l2_offset) {
- /* QCOW_OFLAG_COPIED must be set iff refcount == 1 */
- if (flags & CHECK_OFLAG_COPIED) {
- refcount = get_refcount(bs, (l2_offset & ~QCOW_OFLAG_COPIED)
- >> s->cluster_bits);
- if (refcount < 0) {
- fprintf(stderr, "Can't get refcount for l2_offset %"
- PRIx64 ": %s\n", l2_offset, strerror(-refcount));
- goto fail;
- }
- if ((refcount == 1) != ((l2_offset & QCOW_OFLAG_COPIED) != 0)) {
- fprintf(stderr, "ERROR OFLAG_COPIED: l2_offset=%" PRIx64
- " refcount=%d\n", l2_offset, refcount);
- res->corruptions++;
- }
- }
-
/* Mark L2 table as used */
l2_offset &= L1E_OFFSET_MASK;
inc_refcounts(bs, res, refcount_table, refcount_table_size,
@@ -1231,6 +1200,80 @@ fail:
}
/*
+ * Checks the OFLAG_COPIED flag for all L1 and L2 entries.
+ *
+ * This function does not print an error message nor does it increment
+ * check_errors if get_refcount fails (this is because such an error will have
+ * been already detected and sufficiently signaled by the calling function
+ * (qcow2_check_refcounts) by the time this function is called).
+ */
+static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res)
+{
+ BDRVQcowState *s = bs->opaque;
+ uint64_t *l2_table = qemu_blockalign(bs, s->cluster_size);
+ int ret;
+ int refcount;
+ int i, j;
+
+ for (i = 0; i < s->l1_size; i++) {
+ uint64_t l1_entry = s->l1_table[i];
+ uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;
+
+ if (!l2_offset) {
+ continue;
+ }
+
+ refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
+ if (refcount < 0) {
+ /* don't print message nor increment check_errors */
+ continue;
+ }
+ if ((refcount == 1) != ((l1_entry & QCOW_OFLAG_COPIED) != 0)) {
+ fprintf(stderr, "ERROR OFLAG_COPIED L2 cluster: l1_index=%d "
+ "l1_entry=%" PRIx64 " refcount=%d\n",
+ i, l1_entry, refcount);
+ res->corruptions++;
+ }
+
+ ret = bdrv_pread(bs->file, l2_offset, l2_table,
+ s->l2_size * sizeof(uint64_t));
+ if (ret < 0) {
+ fprintf(stderr, "ERROR: Could not read L2 table: %s\n",
+ strerror(-ret));
+ res->check_errors++;
+ goto fail;
+ }
+
+ for (j = 0; j < s->l2_size; j++) {
+ uint64_t l2_entry = be64_to_cpu(l2_table[j]);
+ uint64_t data_offset = l2_entry & L2E_OFFSET_MASK;
+ int cluster_type = qcow2_get_cluster_type(l2_entry);
+
+ if ((cluster_type == QCOW2_CLUSTER_NORMAL) ||
+ ((cluster_type == QCOW2_CLUSTER_ZERO) && (data_offset != 0))) {
+ refcount = get_refcount(bs, data_offset >> s->cluster_bits);
+ if (refcount < 0) {
+ /* don't print message nor increment check_errors */
+ continue;
+ }
+ if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
+ fprintf(stderr, "ERROR OFLAG_COPIED data cluster: "
+ "l2_entry=%" PRIx64 " refcount=%d\n",
+ l2_entry, refcount);
+ res->corruptions++;
+ }
+ }
+ }
+ }
+
+ ret = 0;
+
+fail:
+ qemu_vfree(l2_table);
+ return ret;
+}
+
+/*
* Checks an image for refcount consistency.
*
* Returns 0 if no errors are found, the number of errors in case the image is
@@ -1365,6 +1408,12 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
}
}
+ /* check OFLAG_COPIED */
+ ret = check_oflag_copied(bs, res);
+ if (ret < 0) {
+ goto fail;
+ }
+
res->image_end_offset = (highest_cluster + 1) * s->cluster_size;
ret = 0;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v5 5/8] qcow2-refcount: Repair OFLAG_COPIED errors
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
` (3 preceding siblings ...)
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 4/8] qcow2-refcount: Move OFLAG_COPIED checks Max Reitz
@ 2013-09-02 7:25 ` Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 6/8] qcow2-refcount: Repair shared refcount blocks Max Reitz
` (4 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
Since the OFLAG_COPIED checks are now executed after the refcounts have
been repaired (if repairing), it is safe to assume that they are correct
but the OFLAG_COPIED flag may be not. Therefore, if its value differs
from what it should be (considering the according refcount), that
discrepancy can be repaired by correctly setting (or clearing that flag.
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2-cluster.c | 4 ++--
block/qcow2-refcount.c | 58 ++++++++++++++++++++++++++++++++++++++++++++------
block/qcow2.h | 1 +
3 files changed, 55 insertions(+), 8 deletions(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 7c248aa..2d5aa92 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -145,7 +145,7 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
* and we really don't want bdrv_pread to perform a read-modify-write)
*/
#define L1_ENTRIES_PER_SECTOR (512 / 8)
-static int write_l1_entry(BlockDriverState *bs, int l1_index)
+int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index)
{
BDRVQcowState *s = bs->opaque;
uint64_t buf[L1_ENTRIES_PER_SECTOR];
@@ -254,7 +254,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
/* update the L1 entry */
trace_qcow2_l2_allocate_write_l1(bs, l1_index);
s->l1_table[l1_index] = l2_offset | QCOW_OFLAG_COPIED;
- ret = write_l1_entry(bs, l1_index);
+ ret = qcow2_write_l1_entry(bs, l1_index);
if (ret < 0) {
goto fail;
}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index ddc3029..92ecc64 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1207,7 +1207,8 @@ fail:
* been already detected and sufficiently signaled by the calling function
* (qcow2_check_refcounts) by the time this function is called).
*/
-static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res)
+static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res,
+ BdrvCheckMode fix)
{
BDRVQcowState *s = bs->opaque;
uint64_t *l2_table = qemu_blockalign(bs, s->cluster_size);
@@ -1218,6 +1219,7 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res)
for (i = 0; i < s->l1_size; i++) {
uint64_t l1_entry = s->l1_table[i];
uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;
+ bool l2_dirty = false;
if (!l2_offset) {
continue;
@@ -1229,10 +1231,24 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res)
continue;
}
if ((refcount == 1) != ((l1_entry & QCOW_OFLAG_COPIED) != 0)) {
- fprintf(stderr, "ERROR OFLAG_COPIED L2 cluster: l1_index=%d "
+ fprintf(stderr, "%s OFLAG_COPIED L2 cluster: l1_index=%d "
"l1_entry=%" PRIx64 " refcount=%d\n",
+ fix & BDRV_FIX_ERRORS ? "Repairing" :
+ "ERROR",
i, l1_entry, refcount);
- res->corruptions++;
+ if (fix & BDRV_FIX_ERRORS) {
+ s->l1_table[i] = refcount == 1
+ ? l1_entry | QCOW_OFLAG_COPIED
+ : l1_entry & ~QCOW_OFLAG_COPIED;
+ ret = qcow2_write_l1_entry(bs, i);
+ if (ret < 0) {
+ res->check_errors++;
+ goto fail;
+ }
+ res->corruptions_fixed++;
+ } else {
+ res->corruptions++;
+ }
}
ret = bdrv_pread(bs->file, l2_offset, l2_table,
@@ -1257,13 +1273,43 @@ static int check_oflag_copied(BlockDriverState *bs, BdrvCheckResult *res)
continue;
}
if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
- fprintf(stderr, "ERROR OFLAG_COPIED data cluster: "
+ fprintf(stderr, "%s OFLAG_COPIED data cluster: "
"l2_entry=%" PRIx64 " refcount=%d\n",
+ fix & BDRV_FIX_ERRORS ? "Repairing" :
+ "ERROR",
l2_entry, refcount);
- res->corruptions++;
+ if (fix & BDRV_FIX_ERRORS) {
+ l2_table[j] = cpu_to_be64(refcount == 1
+ ? l2_entry | QCOW_OFLAG_COPIED
+ : l2_entry & ~QCOW_OFLAG_COPIED);
+ l2_dirty = true;
+ res->corruptions_fixed++;
+ } else {
+ res->corruptions++;
+ }
}
}
}
+
+ if (l2_dirty) {
+ ret = qcow2_pre_write_overlap_check(bs,
+ QCOW2_OL_DEFAULT & ~QCOW2_OL_ACTIVE_L2, l2_offset,
+ s->cluster_size);
+ if (ret < 0) {
+ fprintf(stderr, "ERROR: Could not write L2 table; metadata "
+ "overlap check failed: %s\n", strerror(-ret));
+ res->check_errors++;
+ goto fail;
+ }
+
+ ret = bdrv_pwrite(bs->file, l2_offset, l2_table, s->cluster_size);
+ if (ret < 0) {
+ fprintf(stderr, "ERROR: Could not write L2 table: %s\n",
+ strerror(-ret));
+ res->check_errors++;
+ goto fail;
+ }
+ }
}
ret = 0;
@@ -1409,7 +1455,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
}
/* check OFLAG_COPIED */
- ret = check_oflag_copied(bs, res);
+ ret = check_oflag_copied(bs, res, fix);
if (ret < 0) {
goto fail;
}
diff --git a/block/qcow2.h b/block/qcow2.h
index 86ddb30..10b7bf4 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -432,6 +432,7 @@ int qcow2_pre_write_overlap_check(BlockDriverState *bs, int chk, int64_t offset,
/* qcow2-cluster.c functions */
int qcow2_grow_l1_table(BlockDriverState *bs, uint64_t min_size,
bool exact_size);
+int qcow2_write_l1_entry(BlockDriverState *bs, int l1_index);
void qcow2_l2_cache_reset(BlockDriverState *bs);
int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset);
void qcow2_encrypt_sectors(BDRVQcowState *s, int64_t sector_num,
--
1.8.3.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v5 6/8] qcow2-refcount: Repair shared refcount blocks
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
` (4 preceding siblings ...)
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 5/8] qcow2-refcount: Repair OFLAG_COPIED errors Max Reitz
@ 2013-09-02 7:25 ` Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 7/8] qcow2_check: Mark image consistent Max Reitz
` (3 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
If the refcount of a refcount block is greater than one, we can at least
try to repair that problem by duplicating the affected block.
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/blkdebug.c | 1 +
block/qcow2-refcount.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++++-
include/block/block.h | 1 +
3 files changed, 148 insertions(+), 2 deletions(-)
diff --git a/block/blkdebug.c b/block/blkdebug.c
index ccb627a..5d33e03 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -168,6 +168,7 @@ static const char *event_names[BLKDBG_EVENT_MAX] = {
[BLKDBG_REFTABLE_LOAD] = "reftable_load",
[BLKDBG_REFTABLE_GROW] = "reftable_grow",
+ [BLKDBG_REFTABLE_UPDATE] = "reftable_update",
[BLKDBG_REFBLOCK_LOAD] = "refblock_load",
[BLKDBG_REFBLOCK_UPDATE] = "refblock_update",
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 92ecc64..927bdeb 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1320,6 +1320,121 @@ fail:
}
/*
+ * Writes one sector of the refcount table to the disk
+ */
+#define RT_ENTRIES_PER_SECTOR (512 / sizeof(uint64_t))
+static int write_reftable_entry(BlockDriverState *bs, int rt_index)
+{
+ BDRVQcowState *s = bs->opaque;
+ uint64_t buf[RT_ENTRIES_PER_SECTOR];
+ int rt_start_index;
+ int i, ret;
+
+ rt_start_index = rt_index & ~(RT_ENTRIES_PER_SECTOR - 1);
+ for (i = 0; i < RT_ENTRIES_PER_SECTOR; i++) {
+ buf[i] = cpu_to_be64(s->refcount_table[rt_start_index + i]);
+ }
+
+ ret = qcow2_pre_write_overlap_check(bs,
+ QCOW2_OL_DEFAULT & ~QCOW2_OL_REFCOUNT_TABLE,
+ s->refcount_table_offset + rt_start_index * sizeof(uint64_t),
+ sizeof(buf));
+ if (ret < 0) {
+ return ret;
+ }
+
+ BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_UPDATE);
+ ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset +
+ rt_start_index * sizeof(uint64_t), buf, sizeof(buf));
+ if (ret < 0) {
+ return ret;
+ }
+
+ return 0;
+}
+
+/*
+ * Allocates a new cluster for the given refcount block (represented by its
+ * offset in the image file) and copies the current content there. This function
+ * does _not_ decrement the reference count for the currently occupied cluster.
+ *
+ * This function prints an informative message to stderr on error (and returns
+ * -errno); on success, 0 is returned.
+ */
+static int64_t realloc_refcount_block(BlockDriverState *bs, int reftable_index,
+ uint64_t offset)
+{
+ BDRVQcowState *s = bs->opaque;
+ int64_t new_offset = 0;
+ void *refcount_block = NULL;
+ int ret;
+
+ /* allocate new refcount block */
+ new_offset = qcow2_alloc_clusters(bs, s->cluster_size);
+ if (new_offset < 0) {
+ fprintf(stderr, "Could not allocate new cluster: %s\n",
+ strerror(-new_offset));
+ ret = new_offset;
+ goto fail;
+ }
+
+ /* fetch current refcount block content */
+ ret = qcow2_cache_get(bs, s->refcount_block_cache, offset, &refcount_block);
+ if (ret < 0) {
+ fprintf(stderr, "Could not fetch refcount block: %s\n", strerror(-ret));
+ goto fail;
+ }
+
+ /* new block has not yet been entered into refcount table, therefore it is
+ * no refcount block yet (regarding this check) */
+ ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT, new_offset,
+ s->cluster_size);
+ if (ret < 0) {
+ fprintf(stderr, "Could not write refcount block; metadata overlap "
+ "check failed: %s\n", strerror(-ret));
+ /* the image will be marked corrupt, so don't even attempt on freeing
+ * the cluster */
+ new_offset = 0;
+ goto fail;
+ }
+
+ /* write to new block */
+ ret = bdrv_write(bs->file, new_offset / BDRV_SECTOR_SIZE, refcount_block,
+ s->cluster_sectors);
+ if (ret < 0) {
+ fprintf(stderr, "Could not write refcount block: %s\n", strerror(-ret));
+ goto fail;
+ }
+
+ /* update refcount table */
+ assert(!(new_offset & (s->cluster_size - 1)));
+ s->refcount_table[reftable_index] = new_offset;
+ ret = write_reftable_entry(bs, reftable_index);
+ if (ret < 0) {
+ fprintf(stderr, "Could not update refcount table: %s\n",
+ strerror(-ret));
+ goto fail;
+ }
+
+fail:
+ if (new_offset && (ret < 0)) {
+ qcow2_free_clusters(bs, new_offset, s->cluster_size,
+ QCOW2_DISCARD_ALWAYS);
+ }
+ if (refcount_block) {
+ if (ret < 0) {
+ qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+ } else {
+ ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
+ }
+ }
+ if (ret < 0) {
+ return ret;
+ }
+ return new_offset;
+}
+
+/*
* Checks an image for refcount consistency.
*
* Returns 0 if no errors are found, the number of errors in case the image is
@@ -1395,10 +1510,39 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
inc_refcounts(bs, res, refcount_table, nb_clusters,
offset, s->cluster_size);
if (refcount_table[cluster] != 1) {
- fprintf(stderr, "ERROR refcount block %" PRId64
+ fprintf(stderr, "%s refcount block %" PRId64
" refcount=%d\n",
+ fix & BDRV_FIX_ERRORS ? "Repairing" :
+ "ERROR",
i, refcount_table[cluster]);
- res->corruptions++;
+
+ if (fix & BDRV_FIX_ERRORS) {
+ int64_t new_offset;
+
+ new_offset = realloc_refcount_block(bs, i, offset);
+ if (new_offset < 0) {
+ res->corruptions++;
+ continue;
+ }
+
+ /* update refcounts */
+ if ((new_offset >> s->cluster_bits) >= nb_clusters) {
+ /* increase refcount_table size if necessary */
+ int old_nb_clusters = nb_clusters;
+ nb_clusters = (new_offset >> s->cluster_bits) + 1;
+ refcount_table = g_realloc(refcount_table,
+ nb_clusters * sizeof(uint16_t));
+ memset(&refcount_table[old_nb_clusters], 0, (nb_clusters
+ - old_nb_clusters) * sizeof(uint16_t));
+ }
+ refcount_table[cluster]--;
+ inc_refcounts(bs, res, refcount_table, nb_clusters,
+ new_offset, s->cluster_size);
+
+ res->corruptions_fixed++;
+ } else {
+ res->corruptions++;
+ }
}
}
}
diff --git a/include/block/block.h b/include/block/block.h
index 742fce5..e6b391c 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -413,6 +413,7 @@ typedef enum {
BLKDBG_REFTABLE_LOAD,
BLKDBG_REFTABLE_GROW,
+ BLKDBG_REFTABLE_UPDATE,
BLKDBG_REFBLOCK_LOAD,
BLKDBG_REFBLOCK_UPDATE,
--
1.8.3.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v5 7/8] qcow2_check: Mark image consistent
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
` (5 preceding siblings ...)
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 6/8] qcow2-refcount: Repair shared refcount blocks Max Reitz
@ 2013-09-02 7:25 ` Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 8/8] qemu-iotests: Overlapping cluster allocations Max Reitz
` (2 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
If no corruptions remain after an image repair (and no errors have been
encountered), clear the corrupt flag in qcow2_check.
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
block/qcow2.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 3a95ff1..aeb2ebb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -312,7 +312,11 @@ static int qcow2_check(BlockDriverState *bs, BdrvCheckResult *result,
}
if (fix && result->check_errors == 0 && result->corruptions == 0) {
- return qcow2_mark_clean(bs);
+ ret = qcow2_mark_clean(bs);
+ if (ret < 0) {
+ return ret;
+ }
+ return qcow2_mark_consistent(bs);
}
return ret;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* [Qemu-devel] [PATCH v5 8/8] qemu-iotests: Overlapping cluster allocations
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
` (6 preceding siblings ...)
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 7/8] qcow2_check: Mark image consistent Max Reitz
@ 2013-09-02 7:25 ` Max Reitz
2013-09-12 14:57 ` [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Eric Blake
2013-09-19 15:07 ` Max Reitz
9 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-02 7:25 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz
A new test on corrupted images with overlapping cluster allocations.
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
tests/qemu-iotests/060 | 111 +++++++++++++++++++++++++++++++++++++++++++++
tests/qemu-iotests/060.out | 44 ++++++++++++++++++
tests/qemu-iotests/group | 1 +
3 files changed, 156 insertions(+)
create mode 100755 tests/qemu-iotests/060
create mode 100644 tests/qemu-iotests/060.out
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
new file mode 100755
index 0000000..65bb09f
--- /dev/null
+++ b/tests/qemu-iotests/060
@@ -0,0 +1,111 @@
+#!/bin/bash
+#
+# Test case for image corruption (overlapping data structures) in qcow2
+#
+# Copyright (C) 2013 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mreitz@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1 # failure is the default!
+
+_cleanup()
+{
+ _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto generic
+_supported_os Linux
+
+rt_offset=65536 # 0x10000 (XXX: just an assumption)
+rb_offset=131072 # 0x20000 (XXX: just an assumption)
+l1_offset=196608 # 0x30000 (XXX: just an assumption)
+l2_offset=262144 # 0x40000 (XXX: just an assumption)
+
+IMGOPTS="compat=1.1"
+
+echo
+echo "=== Testing L2 reference into L1 ==="
+echo
+_make_test_img 64M
+# Link first L1 entry (first L2 table) onto itself
+# (Note the MSb in the L1 entry is set, ensuring the refcount is one - else any
+# later write will result in a COW operation, effectively ruining this attempt
+# on image corruption)
+poke_file "$TEST_IMG" "$l1_offset" "\x80\x00\x00\x00\x00\x03\x00\x00"
+_check_test_img
+
+# The corrupt bit should not be set anyway
+./qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# Try to write something, thereby forcing the corrupt bit to be set
+$QEMU_IO -c "write -P 0x2a 0 512" "$TEST_IMG" | _filter_qemu_io
+
+# The corrupt bit must now be set
+./qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# Try to open the image R/W (which should fail)
+$QEMU_IO -c "read 0 512" "$TEST_IMG" 2>&1 | _filter_qemu_io | sed -e "s/can't open device .*$/can't open device/"
+
+# Try to open it RO (which should succeed)
+$QEMU_IO -c "read 0 512" -r "$TEST_IMG" | _filter_qemu_io
+
+# We could now try to fix the image, but this would probably fail (how should an
+# L2 table linked onto the L1 table be fixed?)
+
+echo
+echo "=== Testing cluster data reference into refcount block ==="
+echo
+_make_test_img 64M
+# Allocate L2 table
+truncate -s "$(($l2_offset+65536))" "$TEST_IMG"
+poke_file "$TEST_IMG" "$l1_offset" "\x80\x00\x00\x00\x00\x04\x00\x00"
+# Mark cluster as used
+poke_file "$TEST_IMG" "$(($rb_offset+8))" "\x00\x01"
+# Redirect new data cluster onto refcount block
+poke_file "$TEST_IMG" "$l2_offset" "\x80\x00\x00\x00\x00\x02\x00\x00"
+_check_test_img
+./qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+$QEMU_IO -c "write -P 0x2a 0 512" "$TEST_IMG" | _filter_qemu_io
+./qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# Try to fix it
+_check_test_img -r all
+
+# The corrupt bit should be cleared
+./qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# Look if it's really really fixed
+$QEMU_IO -c "write -P 0x2a 0 512" "$TEST_IMG" | _filter_qemu_io
+./qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
new file mode 100644
index 0000000..ca4583a
--- /dev/null
+++ b/tests/qemu-iotests/060.out
@@ -0,0 +1,44 @@
+QA output created by 060
+
+=== Testing L2 reference into L1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+ERROR cluster 3 refcount=1 reference=3
+
+1 errors were found on the image.
+Data may be corrupted, or further writes to the image may corrupt it.
+incompatible_features 0x0
+qcow2: Preventing invalid write on metadata (overlaps with active L1 table); image marked as corrupt.
+write failed: Input/output error
+incompatible_features 0x2
+qcow2: Image is corrupt; cannot be opened read/write.
+qemu-io: can't open device
+no file open, try 'help open'
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Testing cluster data reference into refcount block ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+ERROR refcount block 0 refcount=2
+ERROR cluster 2 refcount=1 reference=2
+
+2 errors were found on the image.
+Data may be corrupted, or further writes to the image may corrupt it.
+incompatible_features 0x0
+qcow2: Preventing invalid write on metadata (overlaps with refcount block); image marked as corrupt.
+write failed: Input/output error
+incompatible_features 0x2
+Repairing refcount block 0 refcount=2
+The following inconsistencies were found and repaired:
+
+ 0 leaked clusters
+ 1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+incompatible_features 0x0
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+incompatible_features 0x0
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 43c05d6..0845eb5 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -64,3 +64,4 @@
055 rw auto
056 rw auto backing
059 rw auto
+060 rw auto
--
1.8.3.1
^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
` (7 preceding siblings ...)
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 8/8] qemu-iotests: Overlapping cluster allocations Max Reitz
@ 2013-09-12 14:57 ` Eric Blake
2013-09-12 15:24 ` Max Reitz
2013-09-19 15:07 ` Max Reitz
9 siblings, 1 reply; 26+ messages in thread
From: Eric Blake @ 2013-09-12 14:57 UTC (permalink / raw)
To: Max Reitz; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
[-- Attachment #1: Type: text/plain, Size: 1594 bytes --]
On 09/02/2013 01:25 AM, Max Reitz wrote:
> If a qcow2 image file becomes corrupted, any write may inadvertently
> overwrite important metadata structures such as the L1 table. This
> series adds functionality for detecting, preventing and (to some extent)
> repairing such collisions.
>
> v5:
> - fixed patch 6 (forgot to update the event_names array for the new
> event BLKDBG_REFTABLE_UPDATE); no other changes
>
> v4:
> - fixed handling of preallocated zero clusters in patch 4
> - moved OFLAG_COPIED checks into a separate function (this affects
> patches 4 and 5); functionality remains unchanged
> - patches 1, 2, 3, 6, 7 and 8 remain unmodified (except for line
> numbers in block/qcow2-refcount.c)
Just now looking at this series, and I have several questions.
It looks like Kevin applied v4 rather than v5; have we fixed that up?
Next, what sort of overhead do these new checks add to the write case?
Is it something that would be a noticeable slowdown? I'd love to see
some benchmark numbers (hopefully, the default set of checks are in the
noise compared to the overhead of actual I/O).
Also, is there a way to tune the set of checks used at runtime, or are
we stuck with the compiled-in default? That is, can a user opt in to
more expensive tests for robustness, or opt out of default tests for
speed, via a runtime command, or is it something where they have to
recompile to choose a different QCOW2_OL_DEFAULT value?
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-12 14:57 ` [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Eric Blake
@ 2013-09-12 15:24 ` Max Reitz
2013-09-13 9:57 ` Kevin Wolf
0 siblings, 1 reply; 26+ messages in thread
From: Max Reitz @ 2013-09-12 15:24 UTC (permalink / raw)
To: Eric Blake; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On 2013-09-12 16:57, Eric Blake wrote:
> On 09/02/2013 01:25 AM, Max Reitz wrote:
>> If a qcow2 image file becomes corrupted, any write may inadvertently
>> overwrite important metadata structures such as the L1 table. This
>> series adds functionality for detecting, preventing and (to some extent)
>> repairing such collisions.
>>
>> v5:
>> - fixed patch 6 (forgot to update the event_names array for the new
>> event BLKDBG_REFTABLE_UPDATE); no other changes
>>
>> v4:
>> - fixed handling of preallocated zero clusters in patch 4
>> - moved OFLAG_COPIED checks into a separate function (this affects
>> patches 4 and 5); functionality remains unchanged
>> - patches 1, 2, 3, 6, 7 and 8 remain unmodified (except for line
>> numbers in block/qcow2-refcount.c)
> Just now looking at this series, and I have several questions.
>
> It looks like Kevin applied v4 rather than v5; have we fixed that up?
Seems like it – I have the change (from v4 to v5) in master (see
block/blkdebug.c:171).
> Next, what sort of overhead do these new checks add to the write case?
> Is it something that would be a noticeable slowdown? I'd love to see
> some benchmark numbers (hopefully, the default set of checks are in the
> noise compared to the overhead of actual I/O).
I'll do my best to get some benchmarking done.
> Also, is there a way to tune the set of checks used at runtime, or are
> we stuck with the compiled-in default? That is, can a user opt in to
> more expensive tests for robustness, or opt out of default tests for
> speed, via a runtime command, or is it something where they have to
> recompile to choose a different QCOW2_OL_DEFAULT value?
>
Right now, it's obviously a compile-time constant. However, I think we
could also define QCOW2_OL_DEFAULT to be a runtime variable, however,
that variable would most probably be stored in the BDRVQcowState
structure. We could take advantage of that structure's instance being
named "s" almost universally, therefore we could define QCOW2_OL_DEFAULT
to be s->default_overlap_check or something like that. It would
obviously be kind of ugly, but it should work pretty well.
On the other hand, we could replace QCOW2_OL_DEFAULT manually everywhere
by s->default_overlap_check; or, we change the macro to take s as a
parameter.
Either way, I see three answers to your question:
First, right now, we're basically stuck with a compile time constant.
Second, if a user really wants to, he could always define the macro to
represent some variable. This should work pretty well already, although
this variable has to be defined first, of course.
Third, it wouldn't be too much of a problem to write a follow-up patch
manually replacing every QCOW2_OL_DEFAULT occurrence by a true variable
(such as default_overlap_check from the current BDRVQcowState
structure). I'd just undefine the macro and work down every compiler
error. ;-)
On the other hand, now, that I think about it, we could also invert the
current program logic: qcow2_check_metadata_overlap would then no longer
receive a bitmask of structures to check for overlaps, but rather a
bitmask of structures to ignore, since it can figure out
s->default_overlap_check by itself.
Max
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-12 15:24 ` Max Reitz
@ 2013-09-13 9:57 ` Kevin Wolf
2013-09-13 10:23 ` Max Reitz
0 siblings, 1 reply; 26+ messages in thread
From: Kevin Wolf @ 2013-09-13 9:57 UTC (permalink / raw)
To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi
Am 12.09.2013 um 17:24 hat Max Reitz geschrieben:
> On 2013-09-12 16:57, Eric Blake wrote:
> >Also, is there a way to tune the set of checks used at runtime, or are
> >we stuck with the compiled-in default? That is, can a user opt in to
> >more expensive tests for robustness, or opt out of default tests for
> >speed, via a runtime command, or is it something where they have to
> >recompile to choose a different QCOW2_OL_DEFAULT value?
> >
> Right now, it's obviously a compile-time constant. However, I think
> we could also define QCOW2_OL_DEFAULT to be a runtime variable,
> however, that variable would most probably be stored in the
> BDRVQcowState structure. We could take advantage of that structure's
> instance being named "s" almost universally, therefore we could
> define QCOW2_OL_DEFAULT to be s->default_overlap_check or something
> like that. It would obviously be kind of ugly, but it should work
> pretty well.
Not really acceptable, and not necessary either.
You could change all qcow2_pre_write_overlap_check() caller to not pass
QCOW2_OL_DEFAULT & foo, but pass either just foo or QCOW2_OL_ALL & ~foo,
and apply a s->overlap_checks mask inside qcow2_pre_write_overlap_check().
> On the other hand, we could replace QCOW2_OL_DEFAULT manually
> everywhere by s->default_overlap_check; or, we change the macro to
> take s as a parameter.
>
> Either way, I see three answers to your question:
>
> First, right now, we're basically stuck with a compile time constant.
>
> Second, if a user really wants to, he could always define the macro
> to represent some variable. This should work pretty well already,
> although this variable has to be defined first, of course.
>
> Third, it wouldn't be too much of a problem to write a follow-up
> patch manually replacing every QCOW2_OL_DEFAULT occurrence by a true
> variable (such as default_overlap_check from the current
> BDRVQcowState structure). I'd just undefine the macro and work down
> every compiler error. ;-)
>
> On the other hand, now, that I think about it, we could also invert
> the current program logic: qcow2_check_metadata_overlap would then
> no longer receive a bitmask of structures to check for overlaps, but
> rather a bitmask of structures to ignore, since it can figure out
> s->default_overlap_check by itself.
Yup, see above. It's not necessary to invert to logic, though it's an
option and perhaps the nicer one. The crucial part is moving the
evalution of s->overlap_check into qcow2_check_metadata_overlap. (Note
how I dropped the 'default' from s->default_overlap_check, it's not a
default any more, but the actual user choice.)
The more interesting part is that adding an option always needs thought
because once it is exposed, it's an API that is set in stone. And I'm
also not sure what the best command line and QMP representations of a
bitmask like this are.
Kevin
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-13 9:57 ` Kevin Wolf
@ 2013-09-13 10:23 ` Max Reitz
2013-09-13 12:29 ` Eric Blake
0 siblings, 1 reply; 26+ messages in thread
From: Max Reitz @ 2013-09-13 10:23 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi
On 2013-09-13 11:57, Kevin Wolf wrote:
>
> […]
> Am 12.09.2013 um 17:24 hat Max Reitz geschrieben:
>> On the other hand, we could replace QCOW2_OL_DEFAULT manually
>> everywhere by s->default_overlap_check; or, we change the macro to
>> take s as a parameter.
>>
>> Either way, I see three answers to your question:
>>
>> First, right now, we're basically stuck with a compile time constant.
>>
>> Second, if a user really wants to, he could always define the macro
>> to represent some variable. This should work pretty well already,
>> although this variable has to be defined first, of course.
>>
>> Third, it wouldn't be too much of a problem to write a follow-up
>> patch manually replacing every QCOW2_OL_DEFAULT occurrence by a true
>> variable (such as default_overlap_check from the current
>> BDRVQcowState structure). I'd just undefine the macro and work down
>> every compiler error. ;-)
>>
>> On the other hand, now, that I think about it, we could also invert
>> the current program logic: qcow2_check_metadata_overlap would then
>> no longer receive a bitmask of structures to check for overlaps, but
>> rather a bitmask of structures to ignore, since it can figure out
>> s->default_overlap_check by itself.
> Yup, see above. It's not necessary to invert to logic, though it's an
> option and perhaps the nicer one. The crucial part is moving the
> evalution of s->overlap_check into qcow2_check_metadata_overlap. (Note
> how I dropped the 'default' from s->default_overlap_check, it's not a
> default any more, but the actual user choice.)
Well, I didn't mean the “default” to express that it's QEMU's default,
but rather that it's the default check to be performed by
qcow2_check_metadata_overlap if the caller doesn't have any
“restrictions” (such as which structures should be ignored). But anyway,
it makes sense to drop the prefix in this case as well, because that
value will then be used inside qcow2_check_metadata_overlap alone.
Furthermore, I'm always in favor of shorter variable names. ;-)
> The more interesting part is that adding an option always needs thought
> because once it is exposed, it's an API that is set in stone. And I'm
> also not sure what the best command line and QMP representations of a
> bitmask like this are.
I'd personally add it to the runtime options of qcow2. In addition, I
propose we add a mechanism to generally amend runtime options at runtime
through QMP (if there isn't one already which I then am unaware of). I
don't see why we should just allow the kind of overlap checks performed
to be changed at runtime, but not, for instance, whether lazy refcounts
should be used (except the latter would be a bit harder to implement, I
guess).
About the representation: The discard behavior is basically a bitfield
already and gives us therefore one possible representation (which is,
just using a single boolean per structure, named something like
"overlap-check.active-l1" etc.). In QMP we could probably also use a
dict, but then again, this is a decision to be made when generally
allowing modification of the qcow2 runtime options through QMP (in my
opinion). And finally, we could obviously just use an integer to
represent the mask.
I think, we should first take care of the command line interface and
about QMP later (that is, if you agree on generally allowing
modification of the qcow2 runtime options through QMP). There, we could
offer both one boolean per mask element and an integer option, probably
the boolean flags taking precedence.
The flags are nice for users who want an "easily" comprehensible
interface, the masked integer is better for those who prefer a short
representation.
Max
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-13 10:23 ` Max Reitz
@ 2013-09-13 12:29 ` Eric Blake
2013-09-13 12:37 ` Max Reitz
2013-09-13 14:29 ` Max Reitz
0 siblings, 2 replies; 26+ messages in thread
From: Eric Blake @ 2013-09-13 12:29 UTC (permalink / raw)
To: Max Reitz; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
[-- Attachment #1: Type: text/plain, Size: 2766 bytes --]
On 09/13/2013 04:23 AM, Max Reitz wrote:
>
>> The more interesting part is that adding an option always needs thought
>> because once it is exposed, it's an API that is set in stone. And I'm
>> also not sure what the best command line and QMP representations of a
>> bitmask like this are.
> I'd personally add it to the runtime options of qcow2. In addition, I
> propose we add a mechanism to generally amend runtime options at runtime
> through QMP (if there isn't one already which I then am unaware of). I
> don't see why we should just allow the kind of overlap checks performed
> to be changed at runtime, but not, for instance, whether lazy refcounts
> should be used (except the latter would be a bit harder to implement, I
> guess).
Indeed - I was asking more to spark conversation, and not to necessarily
state that we need it (again, without benchmark numbers, it's hard to
state whether there's enough timing difference for it to even matter
that someone would WANT a runtime tuning). IF we implement runtime
tuning, we're stuck supporting it.
>
> About the representation: The discard behavior is basically a bitfield
> already and gives us therefore one possible representation (which is,
> just using a single boolean per structure, named something like
> "overlap-check.active-l1" etc.). In QMP we could probably also use a
> dict, but then again, this is a decision to be made when generally
> allowing modification of the qcow2 runtime options through QMP (in my
> opinion). And finally, we could obviously just use an integer to
> represent the mask.
Implement it only as a struct of bools. A raw 'int' requires the caller
to have too much internal knowledge of which bools map to which bit
positions, and furthermore prevents you from ever changing bit positions.
>
> I think, we should first take care of the command line interface and
> about QMP later (that is, if you agree on generally allowing
> modification of the qcow2 runtime options through QMP). There, we could
> offer both one boolean per mask element and an integer option, probably
> the boolean flags taking precedence.
I'm fine if it is JUST a command-line parameter (all-or-nothing, turned
on when you boot qemu, and not something we can be changing on the fly).
But if we ever do want live changing via QMP, do NOT expose it as a raw
int, but only as named bools.
>
> The flags are nice for users who want an "easily" comprehensible
> interface, the masked integer is better for those who prefer a short
> representation.
Short representations that lock us into a particular implementation are bad.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-13 12:29 ` Eric Blake
@ 2013-09-13 12:37 ` Max Reitz
2013-09-13 14:29 ` Max Reitz
1 sibling, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-13 12:37 UTC (permalink / raw)
To: Eric Blake; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On 2013-09-13 14:29, Eric Blake wrote:
> On 09/13/2013 04:23 AM, Max Reitz wrote:
>>> The more interesting part is that adding an option always needs thought
>>> because once it is exposed, it's an API that is set in stone. And I'm
>>> also not sure what the best command line and QMP representations of a
>>> bitmask like this are.
>> I'd personally add it to the runtime options of qcow2. In addition, I
>> propose we add a mechanism to generally amend runtime options at runtime
>> through QMP (if there isn't one already which I then am unaware of). I
>> don't see why we should just allow the kind of overlap checks performed
>> to be changed at runtime, but not, for instance, whether lazy refcounts
>> should be used (except the latter would be a bit harder to implement, I
>> guess).
> Indeed - I was asking more to spark conversation, and not to necessarily
> state that we need it (again, without benchmark numbers, it's hard to
> state whether there's enough timing difference for it to even matter
> that someone would WANT a runtime tuning). IF we implement runtime
> tuning, we're stuck supporting it.
I have done some short tests here (I think just writing some GB of data
to a qcow2 image on a ramdisk should suffice; maybe taking a couple of
snapshots to increase the number of inactive L1 tables (although the
active L2 tables should affect the results most anyway)) and I don't see
a difference so far. I will continue those tests at home where I have
more RAM.
>> About the representation: The discard behavior is basically a bitfield
>> already and gives us therefore one possible representation (which is,
>> just using a single boolean per structure, named something like
>> "overlap-check.active-l1" etc.). In QMP we could probably also use a
>> dict, but then again, this is a decision to be made when generally
>> allowing modification of the qcow2 runtime options through QMP (in my
>> opinion). And finally, we could obviously just use an integer to
>> represent the mask.
> Implement it only as a struct of bools. A raw 'int' requires the caller
> to have too much internal knowledge of which bools map to which bit
> positions, and furthermore prevents you from ever changing bit positions.
>
>> I think, we should first take care of the command line interface and
>> about QMP later (that is, if you agree on generally allowing
>> modification of the qcow2 runtime options through QMP). There, we could
>> offer both one boolean per mask element and an integer option, probably
>> the boolean flags taking precedence.
> I'm fine if it is JUST a command-line parameter (all-or-nothing, turned
> on when you boot qemu, and not something we can be changing on the fly).
> But if we ever do want live changing via QMP, do NOT expose it as a raw
> int, but only as named bools.
>
>> The flags are nice for users who want an "easily" comprehensible
>> interface, the masked integer is better for those who prefer a short
>> representation.
> Short representations that lock us into a particular implementation are bad.
>
Yes, that's true.
Max
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-13 12:29 ` Eric Blake
2013-09-13 12:37 ` Max Reitz
@ 2013-09-13 14:29 ` Max Reitz
2013-09-13 14:34 ` Eric Blake
1 sibling, 1 reply; 26+ messages in thread
From: Max Reitz @ 2013-09-13 14:29 UTC (permalink / raw)
To: Eric Blake; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On 2013-09-13 14:29, Eric Blake wrote:
> On 09/13/2013 04:23 AM, Max Reitz wrote:
>> […]
>> I think, we should first take care of the command line interface and
>> about QMP later (that is, if you agree on generally allowing
>> modification of the qcow2 runtime options through QMP). There, we could
>> offer both one boolean per mask element and an integer option, probably
>> the boolean flags taking precedence.
> I'm fine if it is JUST a command-line parameter (all-or-nothing, turned
> on when you boot qemu, and not something we can be changing on the fly).
> But if we ever do want live changing via QMP, do NOT expose it as a raw
> int, but only as named bools.
Another idea: Instead of providing an integer for "shorthand"
manipulations, what do you think of a string parameter (such as -o
cache=foo right now, although I do know -o cache isn't even document
anymore and provided only for compatibility reasons, it seems to me)
which will automatically be translated to the right settings? I'm
thinking of:
- overlap-check=none (no checks at all)
- overlap-check=constant (only checks who can be performed in constant
time, i.e., main header, active L1, refcount table and snapshot table)
- overlap-check=cached (only checks which don't require disk access,
i.e. the current (and as I'd propose, future) default)
- overlap-check=all (all checks, including those requiring disk access
(i.e., overlaps on inactive L2 tables))
These would then provide templates which can be further refined through
the booleans (as is the case with -o cache right now).
Max
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-13 14:29 ` Max Reitz
@ 2013-09-13 14:34 ` Eric Blake
0 siblings, 0 replies; 26+ messages in thread
From: Eric Blake @ 2013-09-13 14:34 UTC (permalink / raw)
To: Max Reitz; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
[-- Attachment #1: Type: text/plain, Size: 1467 bytes --]
On 09/13/2013 08:29 AM, Max Reitz wrote:
> Another idea: Instead of providing an integer for "shorthand"
> manipulations, what do you think of a string parameter (such as -o
> cache=foo right now, although I do know -o cache isn't even document
> anymore and provided only for compatibility reasons, it seems to me)
> which will automatically be translated to the right settings? I'm
> thinking of:
> - overlap-check=none (no checks at all)
> - overlap-check=constant (only checks who can be performed in constant
> time, i.e., main header, active L1, refcount table and snapshot table)
> - overlap-check=cached (only checks which don't require disk access,
> i.e. the current (and as I'd propose, future) default)
> - overlap-check=all (all checks, including those requiring disk access
> (i.e., overlaps on inactive L2 tables))
Definitely a nice idea - saves the user from having to figure out which
checks have which impact, by categorizing the types of checks and only
exposing the categories.
>
> These would then provide templates which can be further refined through
> the booleans (as is the case with -o cache right now).
Probably don't need that much fine-tuning; your categories look pretty
nice (although we may think of more categories if we do add additional
checks with different levels of execution time).
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
` (8 preceding siblings ...)
2013-09-12 14:57 ` [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Eric Blake
@ 2013-09-19 15:07 ` Max Reitz
2013-09-19 17:26 ` Eric Blake
2013-09-20 10:32 ` Stefan Hajnoczi
9 siblings, 2 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-19 15:07 UTC (permalink / raw)
To: Max Reitz; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
[-- Attachment #1: Type: text/plain, Size: 4051 bytes --]
Hi,
I've done some benchmarks regarding this series now. In particular, I've
created a 7G image, installed Arch Linux to a partition in the first 2G
and created an empty ext4 partition for benchmarking in the remaining 5G.
My first test consisted of running bonnie++ ("bonnie++ -d [scratch
partition] -s 4g -n 0 -x 16 -Z /dev/urandom" (4G files for I/O
performance tests, no file creation tests, repeat 16 times)) using
different metadata overlap checks (none, constant (all tests can be
performed in constant time) and cached (current default)). The reason I
didn't test for "all" (perform all overlap checks) is that this will
only make a difference (when compared to "cached") if there are
snapshots (right now, at least). I put the underlying image file to /tmp
(tmpfs) for minimal true I/O latency (to maximize the check overhead).
The second test was basically the same, except I've taken 100 (internal)
snapshots before and used 2G files instead of 4G. In this case, I also
tested the "all" scenario.
I performed the third test on a HDD instead of tmpfs to maximize the
overhead of non-cached overlap checks (that is, checking inactive L2
table overlaps right now) which require disk I/O. I used -drive
cache=none for this test (in contrast to the others which ran on a tmpfs
anyway). Also, I've used 256M files since 2G just took too much time. :)
As far as I understand, the I/O speed (the duration of an I/O operation)
should be pretty much the same for all scenarios, however, the latency
is the value in question (since the overlap checks should affect the
latency only).
Basically, I didn't get any results which indicate a performance hit.
The raw HDD test data sometimes resulted in a standard deviation greater
than the average itself (!), thus I've removed some outliers there. The
averages rarely exceed each other's standard deviation and if they do,
often there is no trend at all. The only time there is a real trend
exceeding the standard deviation is for block writes in my first test –
however, the trend is negative, indicating overlap checks actually sped
things up (which is obviously contraintuitive). The difference, however,
is below 1 % anyway.
The only major differences visible (exceeding the combined standard
deviation of the two values in question) occured during the HDD test:
The duration of putc, block writes and rewrites for HDDs was much
greater (about 10 to 20 %, however, bear in mind the standard deviation
is in that magnitude as well) for "constant" and "cached" than for
"none" and "all". On the other hand, the putc and rewrite latency was
much better for "constant" and "cached" then for "none" and "all". The
durations differing so greatly is a sign to me that the data from this
test is not really usable (since I think it should be the same for all
scenarios). If we're to ignore that and the fact that there was indeed a
higher latency in "none" than in "all" for both latencies affected, we
could conclude that "all" is really much slower than "constant" or
"cached". But then again, the block write latency was even smaller for
"all" than for "cached" and "constant", so I'd just ignore these
benchmarks (for the HDD).
All in all, I don't see any significant performance difference when
benchmarking on a tmpfs (which should maximize the overhead of
"constant" and "cached") and the data from my HDD benchmarks is probably
stastically unusable. The only comparison which they would have been
useful for are the comparison of "all" to "cached", but since "all" will
not be the default (and anyone explicitly using this option is in fact
responsible for slow I/O himself)) they aren't actually that important
anyway.
I've attached a CSV file containing the edited results, that is, the
averages and standard deviations for the tests performed by bonnie++,
excluding some outliers from the HDD benchmark; I think the values are
given in microseconds.
Max
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: res.csv --]
[-- Type: text/csv; name="res.csv", Size: 998 bytes --]
,putc,put_block,rewrite,putc_latency,put_block_latency,rewrite_latency
tmpfs (4G),,,,,,
None,1376±12,1112658±4230,463507±37777,9934±1021,6740±144,33143±7043
Constant,1338±8,1107024±2347,476831±35860,10014±1090,6846±254,34115±5867
Cached,1366±9,1104980±3903,463079±39164,10413±1158,6794±181,33820±6338
tmpfs+snapshots (2G),,,,,,
None,1392±8,1127802±6131,459712±24960,9614±1132,6523±120,27823±6142
Constant,1396±11,1126206±4224,453260±25135,9178±1249,6579±126,23221±7521
Cached,1394±7,1119699±6276,467627±27784,9470±951,6616±137,27254±6785
All,1398±12,1123245±3805,472850±29836,9500±823,6617±169,28555±6206
HDD+snapshots (256M),,,,,,
None,850±21,72862±9765,32392±3900,15954±1942,126226±28733,148929±37483
Constant,986±17,84240±9009,38247±1117,10843±1214,101496±29997,106627±19098
Cached,988±19,84111±8460,37691±1379,10310±761,101249±42366,103950±14453
All,873±13,75392±5506,34556±1273,12690±563,95506±4917,134220±24256
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-19 15:07 ` Max Reitz
@ 2013-09-19 17:26 ` Eric Blake
2013-09-20 8:23 ` Max Reitz
2013-09-20 10:32 ` Stefan Hajnoczi
1 sibling, 1 reply; 26+ messages in thread
From: Eric Blake @ 2013-09-19 17:26 UTC (permalink / raw)
To: Max Reitz; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
[-- Attachment #1: Type: text/plain, Size: 1704 bytes --]
On 09/19/2013 09:07 AM, Max Reitz wrote:
> Hi,
>
> I've done some benchmarks regarding this series now. In particular, I've
> created a 7G image, installed Arch Linux to a partition in the first 2G
> and created an empty ext4 partition for benchmarking in the remaining 5G.
Thank you for doing this!
>
> Basically, I didn't get any results which indicate a performance hit.
>
> All in all, I don't see any significant performance difference when
> benchmarking on a tmpfs (which should maximize the overhead of
> "constant" and "cached") and the data from my HDD benchmarks is probably
> stastically unusable. The only comparison which they would have been
> useful for are the comparison of "all" to "cached", but since "all" will
> not be the default (and anyone explicitly using this option is in fact
> responsible for slow I/O himself)) they aren't actually that important
> anyway.
That's good news. Remember, my question about whether we needed the
ability for command-line tuning was conditional on whether the tuning
would make a difference - but if you aren't seeing a difference with
your benchmark, then we might as well unconditionally enable the
checking, and not worry about the complexity of exposing additional
tuning. Simpler is better if it makes no difference in end.
[Of course, if you WANT to write the patches for making it configurable,
I won't stop you; but now that you answered my question about
performance, and the answer is the desirable "no measurable impact", I
don't know that I would ever plan on using such tuning]
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-19 17:26 ` Eric Blake
@ 2013-09-20 8:23 ` Max Reitz
0 siblings, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-09-20 8:23 UTC (permalink / raw)
To: Eric Blake; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On 2013-09-19 19:26, Eric Blake wrote:
> […]
> That's good news. Remember, my question about whether we needed the
> ability for command-line tuning was conditional on whether the tuning
> would make a difference - but if you aren't seeing a difference with
> your benchmark, then we might as well unconditionally enable the
> checking, and not worry about the complexity of exposing additional
> tuning. Simpler is better if it makes no difference in end.
>
> [Of course, if you WANT to write the patches for making it configurable,
> I won't stop you; but now that you answered my question about
> performance, and the answer is the desirable "no measurable impact", I
> don't know that I would ever plan on using such tuning]
In fact, I have written those patches already, so I'll just send them. ;)
Max
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-19 15:07 ` Max Reitz
2013-09-19 17:26 ` Eric Blake
@ 2013-09-20 10:32 ` Stefan Hajnoczi
2013-10-26 13:03 ` Max Reitz
1 sibling, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2013-09-20 10:32 UTC (permalink / raw)
To: Max Reitz; +Cc: Kevin Wolf, qemu-devel
On Thu, Sep 19, 2013 at 05:07:56PM +0200, Max Reitz wrote:
> As far as I understand, the I/O speed (the duration of an I/O
> operation) should be pretty much the same for all scenarios,
> however, the latency is the value in question (since the overlap
> checks should affect the latency only).
The other value to look at is the host CPU consumption per I/O. In
other words, the CPU overhead added by performing the extra checks:
efficiency = avg throughput / avg cpu utilization
Once CPU consumption reaches 100% the workload is CPU-bound and we have
a bottleneck.
Hopefully the efficiency doesn't change noticably either, then we know
there is no big impact from the extra checks.
Stefan
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-09-20 10:32 ` Stefan Hajnoczi
@ 2013-10-26 13:03 ` Max Reitz
2013-10-26 13:05 ` Max Reitz
2013-11-05 8:51 ` Stefan Hajnoczi
0 siblings, 2 replies; 26+ messages in thread
From: Max Reitz @ 2013-10-26 13:03 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel
Am 20.09.2013 12:32, schrieb Stefan Hajnoczi:
> On Thu, Sep 19, 2013 at 05:07:56PM +0200, Max Reitz wrote:
>> As far as I understand, the I/O speed (the duration of an I/O
>> operation) should be pretty much the same for all scenarios,
>> however, the latency is the value in question (since the overlap
>> checks should affect the latency only).
> The other value to look at is the host CPU consumption per I/O. In
> other words, the CPU overhead added by performing the extra checks:
>
> efficiency = avg throughput / avg cpu utilization
>
> Once CPU consumption reaches 100% the workload is CPU-bound and we have
> a bottleneck.
>
> Hopefully the efficiency doesn't change noticably either, then we know
> there is no big impact from the extra checks.
>
> Stefan
Okay, after fixing the VM state in qcow2, I was now finally able to
actually perform the CPU benchmark. On second thought, it wasn't really
neccessary, since I performed most of the tests in RAM anyway, so the
CPU was already the bottleneck for these tests.
I ran bonnie++ (bonnie++ -s 4g -n 0 -x 16) from an arch live CD ISO on a
5 GB qcow2 image formatted as ext4, both residing in /tmp; I prepared
the VM state to the point where I just had to press Enter to perform the
test and shut down the VM. I then performed a snapshot and used this
image as the basis for two tests, one with no overlap checks enabled and
one with all of them enabled.
The time output for both qemu instances was respectively:
echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
-cdrom arch.iso -drive file=base.qcow2,overlap-check=none -enable-kvm
-vga std -m 512 -loadvm 0 -monitor stdio
d 294.42s user 117.72s system 98% cpu 6:58.00 total
echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
-cdrom arch.iso -drive file=base.qcow2,overlap-check=all -enable-kvm
-vga std -m 512 -loadvm 0 -monitor stdio
d 298.87s user 119.55s system 100% cpu 6:56.37 total
So, as you can see, the CPU time differs only marginally (using all
overlap checks instead of none took 1.52 % more CPU time).
Max
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-10-26 13:03 ` Max Reitz
@ 2013-10-26 13:05 ` Max Reitz
2013-11-05 8:51 ` Stefan Hajnoczi
1 sibling, 0 replies; 26+ messages in thread
From: Max Reitz @ 2013-10-26 13:05 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel
Am 26.10.2013 15:03, schrieb Max Reitz:
> Am 20.09.2013 12:32, schrieb Stefan Hajnoczi:
>> On Thu, Sep 19, 2013 at 05:07:56PM +0200, Max Reitz wrote:
>>> As far as I understand, the I/O speed (the duration of an I/O
>>> operation) should be pretty much the same for all scenarios,
>>> however, the latency is the value in question (since the overlap
>>> checks should affect the latency only).
>> The other value to look at is the host CPU consumption per I/O. In
>> other words, the CPU overhead added by performing the extra checks:
>>
>> efficiency = avg throughput / avg cpu utilization
>>
>> Once CPU consumption reaches 100% the workload is CPU-bound and we have
>> a bottleneck.
>>
>> Hopefully the efficiency doesn't change noticably either, then we know
>> there is no big impact from the extra checks.
>>
>> Stefan
> Okay, after fixing the VM state in qcow2, I was now finally able to
> actually perform the CPU benchmark. On second thought, it wasn't really
> neccessary, since I performed most of the tests in RAM anyway, so the
> CPU was already the bottleneck for these tests.
I forgot to mention: I used the qemu from Kevin's current block branch
for these tests.
Max
> I ran bonnie++ (bonnie++ -s 4g -n 0 -x 16) from an arch live CD ISO on a
> 5 GB qcow2 image formatted as ext4, both residing in /tmp; I prepared
> the VM state to the point where I just had to press Enter to perform the
> test and shut down the VM. I then performed a snapshot and used this
> image as the basis for two tests, one with no overlap checks enabled and
> one with all of them enabled.
>
> The time output for both qemu instances was respectively:
>
> echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
> -cdrom arch.iso -drive file=base.qcow2,overlap-check=none -enable-kvm
> -vga std -m 512 -loadvm 0 -monitor stdio
> d 294.42s user 117.72s system 98% cpu 6:58.00 total
>
> echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
> -cdrom arch.iso -drive file=base.qcow2,overlap-check=all -enable-kvm
> -vga std -m 512 -loadvm 0 -monitor stdio
> d 298.87s user 119.55s system 100% cpu 6:56.37 total
>
> So, as you can see, the CPU time differs only marginally (using all
> overlap checks instead of none took 1.52 % more CPU time).
>
> Max
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-10-26 13:03 ` Max Reitz
2013-10-26 13:05 ` Max Reitz
@ 2013-11-05 8:51 ` Stefan Hajnoczi
2013-11-14 18:37 ` Max Reitz
1 sibling, 1 reply; 26+ messages in thread
From: Stefan Hajnoczi @ 2013-11-05 8:51 UTC (permalink / raw)
To: Max Reitz; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On Sat, Oct 26, 2013 at 03:03:09PM +0200, Max Reitz wrote:
> Am 20.09.2013 12:32, schrieb Stefan Hajnoczi:
> > On Thu, Sep 19, 2013 at 05:07:56PM +0200, Max Reitz wrote:
> >> As far as I understand, the I/O speed (the duration of an I/O
> >> operation) should be pretty much the same for all scenarios,
> >> however, the latency is the value in question (since the overlap
> >> checks should affect the latency only).
> > The other value to look at is the host CPU consumption per I/O. In
> > other words, the CPU overhead added by performing the extra checks:
> >
> > efficiency = avg throughput / avg cpu utilization
> >
> > Once CPU consumption reaches 100% the workload is CPU-bound and we have
> > a bottleneck.
> >
> > Hopefully the efficiency doesn't change noticably either, then we know
> > there is no big impact from the extra checks.
> >
> > Stefan
>
> Okay, after fixing the VM state in qcow2, I was now finally able to
> actually perform the CPU benchmark. On second thought, it wasn't really
> neccessary, since I performed most of the tests in RAM anyway, so the
> CPU was already the bottleneck for these tests.
>
> I ran bonnie++ (bonnie++ -s 4g -n 0 -x 16) from an arch live CD ISO on a
> 5 GB qcow2 image formatted as ext4, both residing in /tmp; I prepared
> the VM state to the point where I just had to press Enter to perform the
> test and shut down the VM. I then performed a snapshot and used this
> image as the basis for two tests, one with no overlap checks enabled and
> one with all of them enabled.
>
> The time output for both qemu instances was respectively:
>
> echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
> -cdrom arch.iso -drive file=base.qcow2,overlap-check=none -enable-kvm
> -vga std -m 512 -loadvm 0 -monitor stdio
> d 294.42s user 117.72s system 98% cpu 6:58.00 total
>
> echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
> -cdrom arch.iso -drive file=base.qcow2,overlap-check=all -enable-kvm
> -vga std -m 512 -loadvm 0 -monitor stdio
> d 298.87s user 119.55s system 100% cpu 6:56.37 total
>
> So, as you can see, the CPU time differs only marginally (using all
> overlap checks instead of none took 1.52 % more CPU time).
Good, looks like the impact isn't very noticable.
I wonder if that 1.52% is reproducible or just noise, did you run the
benchmark multiple times?
Stefan
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-11-05 8:51 ` Stefan Hajnoczi
@ 2013-11-14 18:37 ` Max Reitz
2013-11-15 8:13 ` Stefan Hajnoczi
0 siblings, 1 reply; 26+ messages in thread
From: Max Reitz @ 2013-11-14 18:37 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On 05.11.2013 09:51, Stefan Hajnoczi wrote:
> On Sat, Oct 26, 2013 at 03:03:09PM +0200, Max Reitz wrote:
>> Am 20.09.2013 12:32, schrieb Stefan Hajnoczi:
>>> On Thu, Sep 19, 2013 at 05:07:56PM +0200, Max Reitz wrote:
>>>> As far as I understand, the I/O speed (the duration of an I/O
>>>> operation) should be pretty much the same for all scenarios,
>>>> however, the latency is the value in question (since the overlap
>>>> checks should affect the latency only).
>>> The other value to look at is the host CPU consumption per I/O. In
>>> other words, the CPU overhead added by performing the extra checks:
>>>
>>> efficiency = avg throughput / avg cpu utilization
>>>
>>> Once CPU consumption reaches 100% the workload is CPU-bound and we have
>>> a bottleneck.
>>>
>>> Hopefully the efficiency doesn't change noticably either, then we know
>>> there is no big impact from the extra checks.
>>>
>>> Stefan
>> Okay, after fixing the VM state in qcow2, I was now finally able to
>> actually perform the CPU benchmark. On second thought, it wasn't really
>> neccessary, since I performed most of the tests in RAM anyway, so the
>> CPU was already the bottleneck for these tests.
>>
>> I ran bonnie++ (bonnie++ -s 4g -n 0 -x 16) from an arch live CD ISO on a
>> 5 GB qcow2 image formatted as ext4, both residing in /tmp; I prepared
>> the VM state to the point where I just had to press Enter to perform the
>> test and shut down the VM. I then performed a snapshot and used this
>> image as the basis for two tests, one with no overlap checks enabled and
>> one with all of them enabled.
>>
>> The time output for both qemu instances was respectively:
>>
>> echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
>> -cdrom arch.iso -drive file=base.qcow2,overlap-check=none -enable-kvm
>> -vga std -m 512 -loadvm 0 -monitor stdio
>> d 294.42s user 117.72s system 98% cpu 6:58.00 total
>>
>> echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
>> -cdrom arch.iso -drive file=base.qcow2,overlap-check=all -enable-kvm
>> -vga std -m 512 -loadvm 0 -monitor stdio
>> d 298.87s user 119.55s system 100% cpu 6:56.37 total
>>
>> So, as you can see, the CPU time differs only marginally (using all
>> overlap checks instead of none took 1.52 % more CPU time).
> Good, looks like the impact isn't very noticable.
>
> I wonder if that 1.52% is reproducible or just noise, did you run the
> benchmark multiple times?
>
> Stefan
I just ran three tests for each (alternating between the modes), the
results are as following (comparing the total time):
overlap-check=none: 421.29 s, 412.37 s, 414.60 s
Average: 416.09 s
Standard deviation: 3.79 s
overlap-check=all: 420.02 s, 415.11 s, 423.37 s
Average: 419.50 s
Standard deviation: 3.39 s
So, using all overlap checks is nearly consistently slower – however,
the difference is exactly within the single standard deviation. There is
a difference, but first of all, it is pretty much unremarkable, and
second, remember all tests are run in tmpfs. This is the absolute
maximum slowdown we'll ever experience.
Max
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks
2013-11-14 18:37 ` Max Reitz
@ 2013-11-15 8:13 ` Stefan Hajnoczi
0 siblings, 0 replies; 26+ messages in thread
From: Stefan Hajnoczi @ 2013-11-15 8:13 UTC (permalink / raw)
To: Max Reitz; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel
On Thu, Nov 14, 2013 at 07:37:54PM +0100, Max Reitz wrote:
> On 05.11.2013 09:51, Stefan Hajnoczi wrote:
> > On Sat, Oct 26, 2013 at 03:03:09PM +0200, Max Reitz wrote:
> >> Am 20.09.2013 12:32, schrieb Stefan Hajnoczi:
> >>> On Thu, Sep 19, 2013 at 05:07:56PM +0200, Max Reitz wrote:
> >>>> As far as I understand, the I/O speed (the duration of an I/O
> >>>> operation) should be pretty much the same for all scenarios,
> >>>> however, the latency is the value in question (since the overlap
> >>>> checks should affect the latency only).
> >>> The other value to look at is the host CPU consumption per I/O. In
> >>> other words, the CPU overhead added by performing the extra checks:
> >>>
> >>> efficiency = avg throughput / avg cpu utilization
> >>>
> >>> Once CPU consumption reaches 100% the workload is CPU-bound and we have
> >>> a bottleneck.
> >>>
> >>> Hopefully the efficiency doesn't change noticably either, then we know
> >>> there is no big impact from the extra checks.
> >>>
> >>> Stefan
> >> Okay, after fixing the VM state in qcow2, I was now finally able to
> >> actually perform the CPU benchmark. On second thought, it wasn't really
> >> neccessary, since I performed most of the tests in RAM anyway, so the
> >> CPU was already the bottleneck for these tests.
> >>
> >> I ran bonnie++ (bonnie++ -s 4g -n 0 -x 16) from an arch live CD ISO on a
> >> 5 GB qcow2 image formatted as ext4, both residing in /tmp; I prepared
> >> the VM state to the point where I just had to press Enter to perform the
> >> test and shut down the VM. I then performed a snapshot and used this
> >> image as the basis for two tests, one with no overlap checks enabled and
> >> one with all of them enabled.
> >>
> >> The time output for both qemu instances was respectively:
> >>
> >> echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
> >> -cdrom arch.iso -drive file=base.qcow2,overlap-check=none -enable-kvm
> >> -vga std -m 512 -loadvm 0 -monitor stdio
> >> d 294.42s user 117.72s system 98% cpu 6:58.00 total
> >>
> >> echo 'sendkey ret' | time $QEMU_DIR/x86_64-softmmu/qemu-system-x86_64
> >> -cdrom arch.iso -drive file=base.qcow2,overlap-check=all -enable-kvm
> >> -vga std -m 512 -loadvm 0 -monitor stdio
> >> d 298.87s user 119.55s system 100% cpu 6:56.37 total
> >>
> >> So, as you can see, the CPU time differs only marginally (using all
> >> overlap checks instead of none took 1.52 % more CPU time).
> > Good, looks like the impact isn't very noticable.
> >
> > I wonder if that 1.52% is reproducible or just noise, did you run the
> > benchmark multiple times?
> >
> > Stefan
>
> I just ran three tests for each (alternating between the modes), the
> results are as following (comparing the total time):
>
> overlap-check=none: 421.29 s, 412.37 s, 414.60 s
> Average: 416.09 s
> Standard deviation: 3.79 s
>
> overlap-check=all: 420.02 s, 415.11 s, 423.37 s
> Average: 419.50 s
> Standard deviation: 3.39 s
>
> So, using all overlap checks is nearly consistently slower – however,
> the difference is exactly within the single standard deviation. There is
> a difference, but first of all, it is pretty much unremarkable, and
> second, remember all tests are run in tmpfs. This is the absolute
> maximum slowdown we'll ever experience.
Thanks!
Stefan
^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2013-11-15 8:13 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-02 7:25 [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 1/8] qcow2: Add corrupt bit Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 2/8] qcow2: Metadata overlap checks Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 3/8] qcow2: Employ metadata " Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 4/8] qcow2-refcount: Move OFLAG_COPIED checks Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 5/8] qcow2-refcount: Repair OFLAG_COPIED errors Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 6/8] qcow2-refcount: Repair shared refcount blocks Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 7/8] qcow2_check: Mark image consistent Max Reitz
2013-09-02 7:25 ` [Qemu-devel] [PATCH v5 8/8] qemu-iotests: Overlapping cluster allocations Max Reitz
2013-09-12 14:57 ` [Qemu-devel] [PATCH v5 0/8] Add metadata overlap checks Eric Blake
2013-09-12 15:24 ` Max Reitz
2013-09-13 9:57 ` Kevin Wolf
2013-09-13 10:23 ` Max Reitz
2013-09-13 12:29 ` Eric Blake
2013-09-13 12:37 ` Max Reitz
2013-09-13 14:29 ` Max Reitz
2013-09-13 14:34 ` Eric Blake
2013-09-19 15:07 ` Max Reitz
2013-09-19 17:26 ` Eric Blake
2013-09-20 8:23 ` Max Reitz
2013-09-20 10:32 ` Stefan Hajnoczi
2013-10-26 13:03 ` Max Reitz
2013-10-26 13:05 ` Max Reitz
2013-11-05 8:51 ` Stefan Hajnoczi
2013-11-14 18:37 ` Max Reitz
2013-11-15 8:13 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).