* [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation
@ 2018-01-16 13:04 Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 1/9] mirror: inherit supported write/zero flags Anton Nefedov
` (8 more replies)
0 siblings, 9 replies; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
(used to be 'qcow2: preallocation and COW improvements')
v6: the series is split; now includes only ALLOCATE flag introduction
and improvement of qcow2 COW with efficient write-zeroes.
File space preallocation beyond EOF will be a separate series.
Rebased; remarks to patch 8 resolved.
v5: http://lists.nongnu.org/archive/html/qemu-devel/2017-11/msg00059.html
----
This pull request is to start to improve a few performance problems of
qcow2 format:
1. non cluster-aligned write requests (to unallocated clusters) explicitly
pad data with zeroes if there is no backing data.
Resulting increase in ops number and potential cluster fragmentation
(on the host file) is already solved by:
ee22a9d qcow2: Merge the writing of the COW regions with the guest data
However, in case of zero COW regions, that can be avoided at all
but the whole clusters are preallocated and zeroed in a single
efficient write_zeroes() operation
2. moreover, efficient write_zeroes() operation can be used to preallocate
space megabytes (*configurable number) ahead which gives noticeable
improvement on some storage types (e.g. distributed storage)
where the space allocation operation might be expensive)
(Not included in this patchset since v6).
3. this will also allow to enable simultaneous writes to the same unallocated
cluster after the space has been allocated & zeroed but before
the first data is written and the cluster is linked to L2.
(Not included in this patchset).
Efficient write_zeroes usually implies that the blocks are not actually
written to but only reserved and marked as zeroed by the storage.
In this patchset, file-posix driver is marked as supporting this operation
if it supports (/configured to support) fallocate() operation.
Existing bdrv_write_zeroes() falls back to writing zero buffers if
write_zeroes is not supported by the driver.
A new flag (BDRV_REQ_ALLOCATE) is introduced to avoid that but return ENOTSUP.
Such allocate requests are also implemented to possibly overlap with the
other requests. No wait is performed but an error returned in such case as well.
So the operation should be considered advisory and a fallback scenario still
handled by the caller (in this case, qcow2 driver).
Anton Nefedov (9):
mirror: inherit supported write/zero flags
blkverify: set supported write/zero flags
block: introduce BDRV_REQ_ALLOCATE flag
block: treat BDRV_REQ_ALLOCATE as serialising
file-posix: support BDRV_REQ_ALLOCATE
block: support BDRV_REQ_ALLOCATE in passthrough drivers
qcow2: move is_zero() up
qcow2: skip writing zero buffers to empty COW areas
iotest 134: test cluster-misaligned encrypted write
block/qcow2.h | 6 +++
include/block/block.h | 6 ++-
include/block/block_int.h | 2 +-
block/blkdebug.c | 3 +-
block/blkverify.c | 9 +++++
block/file-posix.c | 8 ++++
block/io.c | 47 ++++++++++++++++------
block/mirror.c | 5 +++
block/qcow2-cluster.c | 2 +-
block/qcow2.c | 98 ++++++++++++++++++++++++++++++++++++----------
block/raw-format.c | 3 +-
block/trace-events | 1 +
tests/qemu-iotests/060 | 2 +-
tests/qemu-iotests/060.out | 3 +-
tests/qemu-iotests/066 | 2 +-
tests/qemu-iotests/066.out | 4 +-
tests/qemu-iotests/134 | 9 +++++
tests/qemu-iotests/134.out | 10 +++++
18 files changed, 179 insertions(+), 41 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 1/9] mirror: inherit supported write/zero flags
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 2/9] blkverify: set " Anton Nefedov
` (7 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
block/mirror.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/block/mirror.c b/block/mirror.c
index c9badc1..d18ec65 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1064,6 +1064,11 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
bdrv_refresh_filename(bs->backing->bs);
pstrcpy(bs->exact_filename, sizeof(bs->exact_filename),
bs->backing->bs->filename);
+ bs->supported_write_flags = BDRV_REQ_FUA &
+ bs->backing->bs->supported_write_flags;
+ bs->supported_zero_flags =
+ (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+ bs->backing->bs->supported_zero_flags;
}
static void bdrv_mirror_top_close(BlockDriverState *bs)
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 2/9] blkverify: set supported write/zero flags
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 1/9] mirror: inherit supported write/zero flags Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 3/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
` (6 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
block/blkverify.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/block/blkverify.c b/block/blkverify.c
index 06369f9..9ba65d0 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -140,6 +140,15 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
+ bs->supported_write_flags = BDRV_REQ_FUA &
+ bs->file->bs->supported_write_flags &
+ s->test_file->bs->supported_write_flags;
+
+ bs->supported_zero_flags =
+ (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+ bs->file->bs->supported_zero_flags &
+ s->test_file->bs->supported_zero_flags;
+
ret = 0;
fail:
qemu_opts_del(opts);
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 3/9] block: introduce BDRV_REQ_ALLOCATE flag
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 1/9] mirror: inherit supported write/zero flags Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 2/9] blkverify: set " Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
` (5 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
The flag is supposed to indicate that the region of the disk image has
to be sufficiently allocated so it reads as zeroes.
The call with the flag set must return -ENOTSUP if allocation cannot
be done efficiently.
This has to be made sure of by both
- the drivers that support the flag
- and the common block layer (so it will not fall back to any slowpath
(like writing zero buffers) in case the driver does not support
the flag).
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
include/block/block.h | 6 +++++-
include/block/block_int.h | 2 +-
block/io.c | 20 +++++++++++++++++---
3 files changed, 23 insertions(+), 5 deletions(-)
diff --git a/include/block/block.h b/include/block/block.h
index 9b12774..3e31b89 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -65,9 +65,13 @@ typedef enum {
BDRV_REQ_NO_SERIALISING = 0x8,
BDRV_REQ_FUA = 0x10,
BDRV_REQ_WRITE_COMPRESSED = 0x20,
+ /* The BDRV_REQ_ALLOCATE flag is used to indicate that the driver has to
+ * efficiently allocate the space so it reads as zeroes, or return an error.
+ */
+ BDRV_REQ_ALLOCATE = 0x40,
/* Mask of valid flags */
- BDRV_REQ_MASK = 0x3f,
+ BDRV_REQ_MASK = 0x7f,
} BdrvRequestFlags;
typedef struct BlockSizes {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 29cafa4..b141710 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -632,7 +632,7 @@ struct BlockDriverState {
/* Flags honored during pwrite (so far: BDRV_REQ_FUA) */
unsigned int supported_write_flags;
/* Flags honored during pwrite_zeroes (so far: BDRV_REQ_FUA,
- * BDRV_REQ_MAY_UNMAP) */
+ * BDRV_REQ_MAY_UNMAP, BDRV_REQ_ALLOCATE) */
unsigned int supported_zero_flags;
/* the following member gives a name to every node on the bs graph. */
diff --git a/block/io.c b/block/io.c
index 7ea4023..cf2f84c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1424,7 +1424,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
assert(!bs->supported_zero_flags);
}
- if (ret == -ENOTSUP) {
+ if (ret == -ENOTSUP && !(flags & BDRV_REQ_ALLOCATE)) {
/* Fall back to bounce buffer if write zeroes is unsupported */
BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
@@ -1514,8 +1514,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF &&
- !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes &&
- qemu_iovec_is_zero(qiov)) {
+ !(flags & BDRV_REQ_ZERO_WRITE) && !(flags & BDRV_REQ_ALLOCATE) &&
+ drv->bdrv_co_pwrite_zeroes && qemu_iovec_is_zero(qiov)) {
flags |= BDRV_REQ_ZERO_WRITE;
if (bs->detect_zeroes == BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP) {
flags |= BDRV_REQ_MAY_UNMAP;
@@ -1593,6 +1593,9 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
assert(flags & BDRV_REQ_ZERO_WRITE);
if (head_padding_bytes || tail_padding_bytes) {
+ if (flags & BDRV_REQ_ALLOCATE) {
+ return -ENOTSUP;
+ }
buf = qemu_blockalign(bs, align);
iov = (struct iovec) {
.iov_base = buf,
@@ -1693,6 +1696,9 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
return ret;
}
+ /* allocation request with qiov provided doesn't make much sense */
+ assert(!(qiov && (flags & BDRV_REQ_ALLOCATE)));
+
bdrv_inc_in_flight(bs);
/*
* Align write if necessary by performing a read-modify-write cycle.
@@ -1822,6 +1828,14 @@ int coroutine_fn bdrv_co_pwrite_zeroes(BdrvChild *child, int64_t offset,
{
trace_bdrv_co_pwrite_zeroes(child->bs, offset, bytes, flags);
+ assert(!((flags & BDRV_REQ_MAY_UNMAP) && (flags & BDRV_REQ_ALLOCATE)));
+
+ if ((flags & BDRV_REQ_ALLOCATE) &&
+ !(child->bs->supported_zero_flags & BDRV_REQ_ALLOCATE))
+ {
+ return -ENOTSUP;
+ }
+
if (!(child->bs->open_flags & BDRV_O_UNMAP)) {
flags &= ~BDRV_REQ_MAY_UNMAP;
}
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
` (2 preceding siblings ...)
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 3/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 20:43 ` Eric Blake
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 5/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
` (4 subsequent siblings)
8 siblings, 1 reply; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
The idea is that ALLOCATE requests may overlap with other requests.
Reuse the existing block layer infrastructure for serialising requests.
Use the following approach:
- mark ALLOCATE serialising, so subsequent requests to the area wait
- ALLOCATE request itself must never wait if another request is in flight
already. Return EAGAIN, let the caller reconsider.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
block/io.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/block/io.c b/block/io.c
index cf2f84c..4b0d34f 100644
--- a/block/io.c
+++ b/block/io.c
@@ -605,7 +605,8 @@ void bdrv_dec_in_flight(BlockDriverState *bs)
bdrv_wakeup(bs);
}
-static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
+static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self,
+ bool nowait)
{
BlockDriverState *bs = self->bs;
BdrvTrackedRequest *req;
@@ -636,11 +637,14 @@ static bool coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
* will wait for us as soon as it wakes up, then just go on
* (instead of producing a deadlock in the former case). */
if (!req->waiting_for) {
+ waited = true;
+ if (nowait) {
+ break;
+ }
self->waiting_for = req;
qemu_co_queue_wait(&req->wait_queue, &bs->reqs_lock);
self->waiting_for = NULL;
retry = true;
- waited = true;
break;
}
}
@@ -1206,7 +1210,7 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child,
}
if (!(flags & BDRV_REQ_NO_SERIALISING)) {
- wait_serialising_requests(req);
+ wait_serialising_requests(req, false);
}
if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -1504,7 +1508,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child,
max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
align);
- waited = wait_serialising_requests(req);
+ waited = wait_serialising_requests(req, flags & BDRV_REQ_ALLOCATE);
+ if (waited && flags & BDRV_REQ_ALLOCATE) {
+ return -EAGAIN;
+ }
assert(!waited || !req->serialising);
assert(req->overlap_offset <= offset);
assert(offset + bytes <= req->overlap_offset + req->overlap_bytes);
@@ -1608,7 +1615,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
/* RMW the unaligned part before head. */
mark_request_serialising(req, align);
- wait_serialising_requests(req);
+ wait_serialising_requests(req, false);
bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_HEAD);
ret = bdrv_aligned_preadv(child, req, offset & ~(align - 1), align,
align, &local_qiov, 0);
@@ -1628,6 +1635,10 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
bytes -= zero_bytes;
}
+ if (flags & BDRV_REQ_ALLOCATE) {
+ mark_request_serialising(req, align);
+ }
+
assert(!bytes || (offset & (align - 1)) == 0);
if (bytes >= align) {
/* Write the aligned part in the middle. */
@@ -1646,7 +1657,7 @@ static int coroutine_fn bdrv_co_do_zero_pwritev(BdrvChild *child,
assert(align == tail_padding_bytes + bytes);
/* RMW the unaligned part after tail. */
mark_request_serialising(req, align);
- wait_serialising_requests(req);
+ wait_serialising_requests(req, false);
bdrv_debug_event(bs, BLKDBG_PWRITEV_RMW_TAIL);
ret = bdrv_aligned_preadv(child, req, offset, align,
align, &local_qiov, 0);
@@ -1717,7 +1728,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
struct iovec head_iov;
mark_request_serialising(&req, align);
- wait_serialising_requests(&req);
+ wait_serialising_requests(&req, false);
head_buf = qemu_blockalign(bs, align);
head_iov = (struct iovec) {
@@ -1758,7 +1769,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
bool waited;
mark_request_serialising(&req, align);
- waited = wait_serialising_requests(&req);
+ waited = wait_serialising_requests(&req, false);
assert(!waited || !use_local_qiov);
tail_buf = qemu_blockalign(bs, align);
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 5/9] file-posix: support BDRV_REQ_ALLOCATE
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
` (3 preceding siblings ...)
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
` (3 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
Current write_zeroes implementation is good enough to satisfy this flag too
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
block/file-posix.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/block/file-posix.c b/block/file-posix.c
index 36ee89e..c36e156 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -558,7 +558,10 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
}
if (S_ISREG(st.st_mode)) {
s->discard_zeroes = true;
+#ifdef CONFIG_FALLOCATE
s->has_fallocate = true;
+ bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
+#endif
}
if (S_ISBLK(st.st_mode)) {
#ifdef BLKDISCARDZEROES
@@ -593,6 +596,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
#ifdef CONFIG_XFS
if (platform_test_xfs_fd(s->fd)) {
s->is_xfs = true;
+ bs->supported_zero_flags |= BDRV_REQ_ALLOCATE;
}
#endif
@@ -1413,6 +1417,10 @@ static ssize_t handle_aiocb_write_zeroes(RawPosixAIOData *aiocb)
}
s->has_fallocate = false;
}
+
+ if (!s->has_fallocate) {
+ aiocb->bs->supported_zero_flags &= ~BDRV_REQ_ALLOCATE;
+ }
#endif
return -ENOTSUP;
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
` (4 preceding siblings ...)
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 5/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 7/9] qcow2: move is_zero() up Anton Nefedov
` (2 subsequent siblings)
8 siblings, 0 replies; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
Support the flag if the underlying BDS supports it
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
block/blkdebug.c | 3 ++-
block/blkverify.c | 2 +-
block/mirror.c | 2 +-
block/raw-format.c | 3 ++-
4 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/block/blkdebug.c b/block/blkdebug.c
index e216699..7d5773d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -400,7 +400,8 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
bs->supported_write_flags = BDRV_REQ_FUA &
bs->file->bs->supported_write_flags;
- bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+ bs->supported_zero_flags =
+ (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
bs->file->bs->supported_zero_flags;
ret = -EINVAL;
diff --git a/block/blkverify.c b/block/blkverify.c
index 9ba65d0..b249636 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -145,7 +145,7 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
s->test_file->bs->supported_write_flags;
bs->supported_zero_flags =
- (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+ (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
bs->file->bs->supported_zero_flags &
s->test_file->bs->supported_zero_flags;
diff --git a/block/mirror.c b/block/mirror.c
index d18ec65..eb41deb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1067,7 +1067,7 @@ static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs, QDict *opts)
bs->supported_write_flags = BDRV_REQ_FUA &
bs->backing->bs->supported_write_flags;
bs->supported_zero_flags =
- (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+ (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
bs->backing->bs->supported_zero_flags;
}
diff --git a/block/raw-format.c b/block/raw-format.c
index ab552c0..b1deb93 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -416,7 +416,8 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
bs->sg = bs->file->bs->sg;
bs->supported_write_flags = BDRV_REQ_FUA &
bs->file->bs->supported_write_flags;
- bs->supported_zero_flags = (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP) &
+ bs->supported_zero_flags =
+ (BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_ALLOCATE) &
bs->file->bs->supported_zero_flags;
if (bs->probed && !bdrv_is_read_only(bs)) {
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 7/9] qcow2: move is_zero() up
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
` (5 preceding siblings ...)
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
8 siblings, 0 replies; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
To be used in the following commit without a forward declaration.
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
block/qcow2.c | 35 +++++++++++++++++------------------
1 file changed, 17 insertions(+), 18 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 4348b2c..2ed21ff 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1858,6 +1858,23 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
return false;
}
+static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+ int64_t nr;
+ int res;
+
+ /* Clamp to image length, before checking status of underlying sectors */
+ if (offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
+ bytes = bs->total_sectors * BDRV_SECTOR_SIZE - offset;
+ }
+
+ if (!bytes) {
+ return true;
+ }
+ res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
+ return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
+}
+
static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov,
int flags)
@@ -2975,24 +2992,6 @@ finish:
return ret;
}
-
-static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
-{
- int64_t nr;
- int res;
-
- /* Clamp to image length, before checking status of underlying sectors */
- if (offset + bytes > bs->total_sectors * BDRV_SECTOR_SIZE) {
- bytes = bs->total_sectors * BDRV_SECTOR_SIZE - offset;
- }
-
- if (!bytes) {
- return true;
- }
- res = bdrv_block_status_above(bs, NULL, offset, bytes, &nr, NULL, NULL);
- return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
-}
-
static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
int64_t offset, int bytes, BdrvRequestFlags flags)
{
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 8/9] qcow2: skip writing zero buffers to empty COW areas
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
` (6 preceding siblings ...)
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 7/9] qcow2: move is_zero() up Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 16:11 ` Alberto Garcia
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
8 siblings, 1 reply; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
If COW areas of the newly allocated clusters are zeroes on the backing image,
efficient bdrv_write_zeroes(flags=BDRV_REQ_ALLOCATE) can be used on the whole
cluster instead of writing explicit zero buffers later in perform_cow().
iotest 060:
write to the discarded cluster does not trigger COW anymore.
so, break on write_aio event instead, will work for the test
(but write won't fail anymore, so update reference output)
iotest 066:
cluster-alignment areas that were not really COWed are now detected
as zeroes, hence the initial write has to be exactly the same size for
the maps to match
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
block/qcow2.h | 6 +++++
block/qcow2-cluster.c | 2 +-
block/qcow2.c | 63 ++++++++++++++++++++++++++++++++++++++++++++--
block/trace-events | 1 +
tests/qemu-iotests/060 | 2 +-
tests/qemu-iotests/060.out | 3 ++-
tests/qemu-iotests/066 | 2 +-
tests/qemu-iotests/066.out | 4 +--
8 files changed, 75 insertions(+), 8 deletions(-)
diff --git a/block/qcow2.h b/block/qcow2.h
index 46c8cf4..e6e3a22 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -377,6 +377,12 @@ typedef struct QCowL2Meta
Qcow2COWRegion cow_end;
/**
+ * Indicates that COW regions are already handled and do not require
+ * any more processing.
+ */
+ bool skip_cow;
+
+ /**
* The I/O vector with the data from the actual guest write request.
* If non-NULL, this is meant to be merged together with the data
* from @cow_start and @cow_end into one single write operation.
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index a3fec27..511ceb8 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -791,7 +791,7 @@ static int perform_cow(BlockDriverState *bs, QCowL2Meta *m)
assert(start->offset + start->nb_bytes <= end->offset);
assert(!m->data_qiov || m->data_qiov->size == data_bytes);
- if (start->nb_bytes == 0 && end->nb_bytes == 0) {
+ if ((start->nb_bytes == 0 && end->nb_bytes == 0) || m->skip_cow) {
return 0;
}
diff --git a/block/qcow2.c b/block/qcow2.c
index 2ed21ff..811adeb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1833,6 +1833,11 @@ static bool merge_cow(uint64_t offset, unsigned bytes,
continue;
}
+ /* If COW regions are handled already, skip this too */
+ if (m->skip_cow) {
+ continue;
+ }
+
/* The data (middle) region must be immediately after the
* start region */
if (l2meta_cow_start(m) + m->cow_start.nb_bytes != offset) {
@@ -1875,6 +1880,53 @@ static bool is_zero(BlockDriverState *bs, int64_t offset, int64_t bytes)
return res >= 0 && (res & BDRV_BLOCK_ZERO) && nr == bytes;
}
+static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
+{
+ if (bs->encrypted) {
+ return false;
+ }
+
+ if (!is_zero(bs, m->offset + m->cow_start.offset, m->cow_start.nb_bytes)) {
+ return false;
+ }
+
+ if (!is_zero(bs, m->offset + m->cow_end.offset, m->cow_end.nb_bytes)) {
+ return false;
+ }
+
+ return true;
+}
+
+static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
+{
+ BDRVQcow2State *s = bs->opaque;
+ QCowL2Meta *m;
+
+ for (m = l2meta; m != NULL; m = m->next) {
+ int ret;
+
+ if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
+ continue;
+ }
+
+ if (!is_zero_cow(bs, m)) {
+ continue;
+ }
+
+ /* instead of writing zero COW buffers,
+ efficiently zero out the whole clusters */
+ ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
+ m->nb_clusters * s->cluster_size,
+ BDRV_REQ_ALLOCATE);
+ if (ret < 0) {
+ continue;
+ }
+
+ trace_qcow2_skip_cow(qemu_coroutine_self(), m->offset, m->nb_clusters);
+ m->skip_cow = true;
+ }
+}
+
static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
uint64_t bytes, QEMUIOVector *qiov,
int flags)
@@ -1957,24 +2009,31 @@ static coroutine_fn int qcow2_co_pwritev(BlockDriverState *bs, uint64_t offset,
goto fail;
}
+ qemu_co_mutex_unlock(&s->lock);
+
+ if (bs->file->bs->supported_zero_flags & BDRV_REQ_ALLOCATE) {
+ handle_alloc_space(bs, l2meta);
+ }
+
/* If we need to do COW, check if it's possible to merge the
* writing of the guest data together with that of the COW regions.
* If it's not possible (or not necessary) then write the
* guest data now. */
if (!merge_cow(offset, cur_bytes, &hd_qiov, l2meta)) {
- qemu_co_mutex_unlock(&s->lock);
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
trace_qcow2_writev_data(qemu_coroutine_self(),
cluster_offset + offset_in_cluster);
ret = bdrv_co_pwritev(bs->file,
cluster_offset + offset_in_cluster,
cur_bytes, &hd_qiov, 0);
- qemu_co_mutex_lock(&s->lock);
if (ret < 0) {
+ qemu_co_mutex_lock(&s->lock);
goto fail;
}
}
+ qemu_co_mutex_lock(&s->lock);
+
while (l2meta != NULL) {
QCowL2Meta *next;
diff --git a/block/trace-events b/block/trace-events
index 11c8d5f..c9fa596 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -61,6 +61,7 @@ qcow2_writev_done_part(void *co, int cur_bytes) "co %p cur_bytes %d"
qcow2_writev_data(void *co, uint64_t offset) "co %p offset 0x%" PRIx64
qcow2_pwrite_zeroes_start_req(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
qcow2_pwrite_zeroes(void *co, int64_t offset, int count) "co %p offset 0x%" PRIx64 " count %d"
+qcow2_skip_cow(void* co, uint64_t offset, int nb_clusters) "co %p offset 0x%" PRIx64 " nb_clusters %d"
# block/qcow2-cluster.c
qcow2_alloc_clusters_offset(void *co, uint64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d"
diff --git a/tests/qemu-iotests/060 b/tests/qemu-iotests/060
index 14797dd..1d09bc0 100755
--- a/tests/qemu-iotests/060
+++ b/tests/qemu-iotests/060
@@ -160,7 +160,7 @@ poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
# any unallocated cluster, leading to an attempt to overwrite the second L2
# table. Finally, resume the COW write and see it fail (but not crash).
echo "open -o file.driver=blkdebug $TEST_IMG
-break cow_read 0
+break write_aio 0
aio_write 0k 1k
wait_break 0
write 64k 64k
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index c4cb7c6..7a0fbb8 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -108,7 +108,8 @@ qcow2: Marking image as corrupt: Preventing invalid write on metadata (overlaps
blkdebug: Suspended request '0'
write failed: Input/output error
blkdebug: Resuming request '0'
-aio_write failed: No medium found
+wrote 1024/1024 bytes at offset 0
+1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
=== Testing unallocated image header ===
diff --git a/tests/qemu-iotests/066 b/tests/qemu-iotests/066
index 8638217..3c216a1 100755
--- a/tests/qemu-iotests/066
+++ b/tests/qemu-iotests/066
@@ -71,7 +71,7 @@ echo
_make_test_img $IMG_SIZE
# Create data clusters (not aligned to an L2 table)
-$QEMU_IO -c 'write -P 42 1M 256k' "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "write -P 42 $(((1024 + 32) * 1024)) 192k" "$TEST_IMG" | _filter_qemu_io
orig_map=$($QEMU_IMG map --output=json "$TEST_IMG")
# Convert the data clusters to preallocated zero clusters
diff --git a/tests/qemu-iotests/066.out b/tests/qemu-iotests/066.out
index 3d9da9b..093431e 100644
--- a/tests/qemu-iotests/066.out
+++ b/tests/qemu-iotests/066.out
@@ -19,8 +19,8 @@ Offset Length Mapped to File
=== Writing to preallocated zero clusters ===
Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67109376
-wrote 262144/262144 bytes at offset 1048576
-256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 196608/196608 bytes at offset 1081344
+192 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
wrote 262144/262144 bytes at offset 1048576
256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
wrote 196608/196608 bytes at offset 1081344
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [Qemu-devel] [PATCH v6 9/9] iotest 134: test cluster-misaligned encrypted write
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
` (7 preceding siblings ...)
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2018-01-16 13:04 ` Anton Nefedov
2018-01-16 20:45 ` Eric Blake
8 siblings, 1 reply; 14+ messages in thread
From: Anton Nefedov @ 2018-01-16 13:04 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den, berto, Anton Nefedov
COW (even empty/zero) areas require encryption too
Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
---
tests/qemu-iotests/134 | 9 +++++++++
tests/qemu-iotests/134.out | 10 ++++++++++
2 files changed, 19 insertions(+)
diff --git a/tests/qemu-iotests/134 b/tests/qemu-iotests/134
index 9914415..6083ae4 100755
--- a/tests/qemu-iotests/134
+++ b/tests/qemu-iotests/134
@@ -59,6 +59,15 @@ echo "== reading whole image =="
$QEMU_IO --object $SECRET -c "read 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
echo
+echo "== rewriting cluster part =="
+$QEMU_IO --object $SECRET -c "write -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
+echo "== verify pattern =="
+$QEMU_IO --object $SECRET -c "read -P 0 0 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+$QEMU_IO --object $SECRET -c "read -P 0xb 512 512" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
+
+echo
echo "== rewriting whole image =="
$QEMU_IO --object $SECRET -c "write -P 0xa 0 $size" --image-opts $IMGSPEC | _filter_qemu_io | _filter_testdir
diff --git a/tests/qemu-iotests/134.out b/tests/qemu-iotests/134.out
index 972be49..09d46f6 100644
--- a/tests/qemu-iotests/134.out
+++ b/tests/qemu-iotests/134.out
@@ -5,6 +5,16 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134217728 encryption=on encrypt.
read 134217728/134217728 bytes at offset 0
128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+== rewriting cluster part ==
+wrote 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+== verify pattern ==
+read 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 512
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
== rewriting whole image ==
wrote 134217728/134217728 bytes at offset 0
128 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
--
2.7.4
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] [PATCH v6 8/9] qcow2: skip writing zero buffers to empty COW areas
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
@ 2018-01-16 16:11 ` Alberto Garcia
2018-01-17 14:12 ` Anton Nefedov
0 siblings, 1 reply; 14+ messages in thread
From: Alberto Garcia @ 2018-01-16 16:11 UTC (permalink / raw)
To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den
On Tue 16 Jan 2018 02:04:29 PM CET, Anton Nefedov wrote:
> iotest 060:
> write to the discarded cluster does not trigger COW anymore.
> so, break on write_aio event instead, will work for the test
> (but write won't fail anymore, so update reference output)
I'm wondering about this. The reason why the write doesn't fail anymore
is because after this patch we're breaking in write_aio as you say:
BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
trace_qcow2_writev_data(qemu_coroutine_self(),
cluster_offset + offset_in_cluster);
ret = bdrv_co_pwritev(bs->file,
cluster_offset + offset_in_cluster,
cur_bytes, &hd_qiov, 0);
When the image is marked as corrupted then bs->drv is set to NULL, but
bs->file->drv is still valid. So QEMU goes forward and writes into the
image.
Should we check bs->drv after BLKDBG_EVENT() or perhaps set
bs->file->bs->drv = NULL when an image is corrupted?
> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
> +{
> + if (bs->encrypted) {
> + return false;
> + }
I found this a bit confusing because is_zero_cow() can be interpreted as
"the region we're going to copy only contains zeroes" or "we're only
going to write zeroes".
In the first case the bs->encrypted test does not belong there, because
that region may perfectly well contain only zeroes and bs->encrypted
tells us nothing about it.
In the second case the test is fine because bs->encrypted means that
we're definitely going to write something other than zeroes.
I think it's worth adding a comment clarifying this in order to avoid
confusion, or perhaps renaming the function to make it more explicit
(cow_writes_as_zeroes() or something like that).
> +static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
> +{
> + BDRVQcow2State *s = bs->opaque;
> + QCowL2Meta *m;
> +
> + for (m = l2meta; m != NULL; m = m->next) {
> + int ret;
> +
> + if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
> + continue;
> + }
> +
> + if (!is_zero_cow(bs, m)) {
> + continue;
> + }
> +
> + /* instead of writing zero COW buffers,
> + efficiently zero out the whole clusters */
> + ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
> + m->nb_clusters * s->cluster_size,
> + BDRV_REQ_ALLOCATE);
> + if (ret < 0) {
> + continue;
> + }
Is it always fine to simply ignore the error and go on?
> --- a/tests/qemu-iotests/060
> +++ b/tests/qemu-iotests/060
> @@ -160,7 +160,7 @@ poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
> # any unallocated cluster, leading to an attempt to overwrite the second L2
> # table. Finally, resume the COW write and see it fail (but not crash).
> echo "open -o file.driver=blkdebug $TEST_IMG
> -break cow_read 0
> +break write_aio 0
> aio_write 0k 1k
> wait_break 0
> write 64k 64k
Apart from what I wrote in the beginning of the e-mail, if you're
changing the semantics of this test you should also update the
comment. With your patch the COW no longer stops before doing the read,
and after being resumed it no longer crashes.
Berto
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] [PATCH v6 4/9] block: treat BDRV_REQ_ALLOCATE as serialising
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
@ 2018-01-16 20:43 ` Eric Blake
0 siblings, 0 replies; 14+ messages in thread
From: Eric Blake @ 2018-01-16 20:43 UTC (permalink / raw)
To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, den, berto
[-- Attachment #1: Type: text/plain, Size: 791 bytes --]
On 01/16/2018 07:04 AM, Anton Nefedov wrote:
> The idea is that ALLOCATE requests may overlap with other requests.
> Reuse the existing block layer infrastructure for serialising requests.
> Use the following approach:
> - mark ALLOCATE serialising, so subsequent requests to the area wait
> - ALLOCATE request itself must never wait if another request is in flight
> already. Return EAGAIN, let the caller reconsider.
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
> block/io.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] [PATCH v6 9/9] iotest 134: test cluster-misaligned encrypted write
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
@ 2018-01-16 20:45 ` Eric Blake
0 siblings, 0 replies; 14+ messages in thread
From: Eric Blake @ 2018-01-16 20:45 UTC (permalink / raw)
To: Anton Nefedov, qemu-devel; +Cc: qemu-block, kwolf, mreitz, den, berto
[-- Attachment #1: Type: text/plain, Size: 483 bytes --]
On 01/16/2018 07:04 AM, Anton Nefedov wrote:
> COW (even empty/zero) areas require encryption too
>
> Signed-off-by: Anton Nefedov <anton.nefedov@virtuozzo.com>
> ---
> tests/qemu-iotests/134 | 9 +++++++++
> tests/qemu-iotests/134.out | 10 ++++++++++
> 2 files changed, 19 insertions(+)
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 619 bytes --]
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Qemu-devel] [PATCH v6 8/9] qcow2: skip writing zero buffers to empty COW areas
2018-01-16 16:11 ` Alberto Garcia
@ 2018-01-17 14:12 ` Anton Nefedov
0 siblings, 0 replies; 14+ messages in thread
From: Anton Nefedov @ 2018-01-17 14:12 UTC (permalink / raw)
To: Alberto Garcia, qemu-devel; +Cc: qemu-block, kwolf, mreitz, eblake, den
On 16/1/2018 7:11 PM, Alberto Garcia wrote:
> On Tue 16 Jan 2018 02:04:29 PM CET, Anton Nefedov wrote:
>
>> iotest 060:
>> write to the discarded cluster does not trigger COW anymore.
>> so, break on write_aio event instead, will work for the test
>> (but write won't fail anymore, so update reference output)
>
> I'm wondering about this. The reason why the write doesn't fail anymore
> is because after this patch we're breaking in write_aio as you say:
>
> BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
> trace_qcow2_writev_data(qemu_coroutine_self(),
> cluster_offset + offset_in_cluster);
> ret = bdrv_co_pwritev(bs->file,
> cluster_offset + offset_in_cluster,
> cur_bytes, &hd_qiov, 0);
>
> When the image is marked as corrupted then bs->drv is set to NULL, but
> bs->file->drv is still valid. So QEMU goes forward and writes into the
> image.
>
> Should we check bs->drv after BLKDBG_EVENT() or perhaps set
> bs->file->bs->drv = NULL when an image is corrupted?
>
I don't know. On one hand we'll catch and cancel some of in-flight
requests which is rather good.
It feels though like the drv check that the test uses to get error on is
mostly because the driver function is used directly.
>> +static bool is_zero_cow(BlockDriverState *bs, QCowL2Meta *m)
>> +{
>> + if (bs->encrypted) {
>> + return false;
>> + }
>
> I found this a bit confusing because is_zero_cow() can be interpreted as
> "the region we're going to copy only contains zeroes" or "we're only
> going to write zeroes".
>
> In the first case the bs->encrypted test does not belong there, because
> that region may perfectly well contain only zeroes and bs->encrypted
> tells us nothing about it.
>
> In the second case the test is fine because bs->encrypted means that
> we're definitely going to write something other than zeroes.
>
> I think it's worth adding a comment clarifying this in order to avoid
> confusion, or perhaps renaming the function to make it more explicit
> (cow_writes_as_zeroes() or something like that).
>
Agree. I'd rather take bs->encrypted check out.
>> +static void handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
>> +{
>> + BDRVQcow2State *s = bs->opaque;
>> + QCowL2Meta *m;
>> +
>> + for (m = l2meta; m != NULL; m = m->next) {
>> + int ret;
>> +
>> + if (!m->cow_start.nb_bytes && !m->cow_end.nb_bytes) {
>> + continue;
>> + }
>> +
>> + if (!is_zero_cow(bs, m)) {
>> + continue;
>> + }
>> +
>> + /* instead of writing zero COW buffers,
>> + efficiently zero out the whole clusters */
>> + ret = bdrv_co_pwrite_zeroes(bs->file, m->alloc_offset,
>> + m->nb_clusters * s->cluster_size,
>> + BDRV_REQ_ALLOCATE);
>> + if (ret < 0) {
>> + continue;
>> + }
>
> Is it always fine to simply ignore the error and go on?
>
Good point, probably error codes other than ENOTSUP and EAGAIN should be
propagated.
>> --- a/tests/qemu-iotests/060
>> +++ b/tests/qemu-iotests/060
>> @@ -160,7 +160,7 @@ poke_file "$TEST_IMG" '131084' "\x00\x00" # 0x2000c
>> # any unallocated cluster, leading to an attempt to overwrite the second L2
>> # table. Finally, resume the COW write and see it fail (but not crash).
>> echo "open -o file.driver=blkdebug $TEST_IMG
>> -break cow_read 0
>> +break write_aio 0
>> aio_write 0k 1k
>> wait_break 0
>> write 64k 64k
>
> Apart from what I wrote in the beginning of the e-mail, if you're
> changing the semantics of this test you should also update the
> comment. With your patch the COW no longer stops before doing the read,
> and after being resumed it no longer crashes.
>
In fact, the change makes the test quite useless.
I will fix COW instead (i.e. use a real backing file).
Also I think I missed to create a new blkdbg event, it looks those are
generally put before bdrv_x(bs->file) calls.
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2018-01-17 14:13 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-16 13:04 [Qemu-devel] [PATCH v6 0/9] qcow2: cluster space preallocation Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 1/9] mirror: inherit supported write/zero flags Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 2/9] blkverify: set " Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 3/9] block: introduce BDRV_REQ_ALLOCATE flag Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 4/9] block: treat BDRV_REQ_ALLOCATE as serialising Anton Nefedov
2018-01-16 20:43 ` Eric Blake
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 5/9] file-posix: support BDRV_REQ_ALLOCATE Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 6/9] block: support BDRV_REQ_ALLOCATE in passthrough drivers Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 7/9] qcow2: move is_zero() up Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 8/9] qcow2: skip writing zero buffers to empty COW areas Anton Nefedov
2018-01-16 16:11 ` Alberto Garcia
2018-01-17 14:12 ` Anton Nefedov
2018-01-16 13:04 ` [Qemu-devel] [PATCH v6 9/9] iotest 134: test cluster-misaligned encrypted write Anton Nefedov
2018-01-16 20:45 ` Eric Blake
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).