From: Eric Blake <eblake@redhat.com>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, qemu-block@nongnu.org, mreitz@redhat.com
Subject: [Qemu-devel] [PATCH v6 04/10] qcow2: Discard/zero clusters by byte count
Date: Tue, 7 Mar 2017 20:54:22 -0600 [thread overview]
Message-ID: <20170308025428.1037-5-eblake@redhat.com> (raw)
In-Reply-To: <20170308025428.1037-1-eblake@redhat.com>
Passing a byte offset, but sector count, when we ultimately
want to operate on cluster granularity, is madness. Clean up
the external interfaces to take both offset and count as bytes,
while still keeping the assertion added previously that the
caller must align the values to a cluster. Then rename things
to make sure backports don't get confused by changed units:
instead of qcow2_discard_clusters() and qcow2_zero_clusters(),
we now have qcow2_cluster_discard() and qcow2_cluster_zeroize().
The internal functions still operate on clusters at a time, and
return an int for number of cleared clusters; but on an image
with 2M clusters, a single L2 table holds 256k entries that each
represent a 2M cluster, totalling well over INT_MAX bytes if we
ever had a request for that many bytes at once. All our callers
currently limit themselves to 32-bit bytes (and therefore fewer
clusters), but by making this function 64-bit clean, we have one
less place to clean up if we later improve the block layer to
support 64-bit bytes through all operations (with the block layer
auto-fragmenting on behalf of more-limited drivers), rather than
the current state where some interfaces are artificially limited
to INT_MAX at a time.
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
---
v6: rebase due to context
v5: s/count/byte/ to make the units obvious, and rework the math
to ensure no 32-bit integer overflow on large clusters
v4: improve function names, split assertion additions into earlier patch
[no v3 or v2]
v1: https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00339.html
---
block/qcow2.h | 9 +++++----
block/qcow2-cluster.c | 38 +++++++++++++++++++++-----------------
block/qcow2-snapshot.c | 7 +++----
block/qcow2.c | 21 +++++++++------------
4 files changed, 38 insertions(+), 37 deletions(-)
diff --git a/block/qcow2.h b/block/qcow2.h
index f8aeb08..808104c 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -544,10 +544,11 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
int compressed_size);
int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m);
-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
- int nb_sectors, enum qcow2_discard_type type, bool full_discard);
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
- int flags);
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, enum qcow2_discard_type type,
+ bool full_discard);
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags);
int qcow2_expand_zero_clusters(BlockDriverState *bs,
BlockDriverAmendStatusCB *status_cb,
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 046fbd8..ca9e7e6 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1509,16 +1509,16 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
return nb_clusters;
}
-int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
- int nb_sectors, enum qcow2_discard_type type, bool full_discard)
+int qcow2_cluster_discard(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, enum qcow2_discard_type type,
+ bool full_discard)
{
BDRVQcow2State *s = bs->opaque;
- uint64_t end_offset;
+ uint64_t end_offset = offset + bytes;
uint64_t nb_clusters;
+ int64_t cleared;
int ret;
- end_offset = offset + (nb_sectors << BDRV_SECTOR_BITS);
-
/* Caller must pass aligned values, except at image end */
assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
assert(QEMU_IS_ALIGNED(end_offset, s->cluster_size) ||
@@ -1530,13 +1530,15 @@ int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
/* Each L2 table is handled by its own loop iteration */
while (nb_clusters > 0) {
- ret = discard_single_l2(bs, offset, nb_clusters, type, full_discard);
- if (ret < 0) {
+ cleared = discard_single_l2(bs, offset, nb_clusters, type,
+ full_discard);
+ if (cleared < 0) {
+ ret = cleared;
goto fail;
}
- nb_clusters -= ret;
- offset += (ret * s->cluster_size);
+ nb_clusters -= cleared;
+ offset += (cleared * s->cluster_size);
}
ret = 0;
@@ -1590,16 +1592,17 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
return nb_clusters;
}
-int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
- int flags)
+int qcow2_cluster_zeroize(BlockDriverState *bs, uint64_t offset,
+ uint64_t bytes, int flags)
{
BDRVQcow2State *s = bs->opaque;
uint64_t nb_clusters;
+ int64_t cleared;
int ret;
/* Caller must pass aligned values */
assert(QEMU_IS_ALIGNED(offset, s->cluster_size));
- assert(QEMU_IS_ALIGNED(nb_sectors, s->cluster_sectors));
+ assert(QEMU_IS_ALIGNED(bytes, s->cluster_size));
/* The zero flag is only supported by version 3 and newer */
if (s->qcow_version < 3) {
@@ -1607,18 +1610,19 @@ int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors,
}
/* Each L2 table is handled by its own loop iteration */
- nb_clusters = size_to_clusters(s, nb_sectors << BDRV_SECTOR_BITS);
+ nb_clusters = size_to_clusters(s, bytes);
s->cache_discards = true;
while (nb_clusters > 0) {
- ret = zero_single_l2(bs, offset, nb_clusters, flags);
- if (ret < 0) {
+ cleared = zero_single_l2(bs, offset, nb_clusters, flags);
+ if (cleared < 0) {
+ ret = cleared;
goto fail;
}
- nb_clusters -= ret;
- offset += (ret * s->cluster_size);
+ nb_clusters -= cleared;
+ offset += (cleared * s->cluster_size);
}
ret = 0;
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 0324243..44243e0 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -440,10 +440,9 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
/* The VM state isn't needed any more in the active L1 table; in fact, it
* hurts by causing expensive COW for the next snapshot. */
- qcow2_discard_clusters(bs, qcow2_vm_state_offset(s),
- align_offset(sn->vm_state_size, s->cluster_size)
- >> BDRV_SECTOR_BITS,
- QCOW2_DISCARD_NEVER, false);
+ qcow2_cluster_discard(bs, qcow2_vm_state_offset(s),
+ align_offset(sn->vm_state_size, s->cluster_size),
+ QCOW2_DISCARD_NEVER, false);
#ifdef DEBUG_ALLOC
{
diff --git a/block/qcow2.c b/block/qcow2.c
index 6a92d2e..03215e2 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2501,7 +2501,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
trace_qcow2_pwrite_zeroes(qemu_coroutine_self(), offset, count);
/* Whatever is left can use real zero clusters */
- ret = qcow2_zero_clusters(bs, offset, count >> BDRV_SECTOR_BITS, flags);
+ ret = qcow2_cluster_zeroize(bs, offset, count, flags);
qemu_co_mutex_unlock(&s->lock);
return ret;
@@ -2519,8 +2519,8 @@ static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
}
qemu_co_mutex_lock(&s->lock);
- ret = qcow2_discard_clusters(bs, offset, count >> BDRV_SECTOR_BITS,
- QCOW2_DISCARD_REQUEST, false);
+ ret = qcow2_cluster_discard(bs, offset, count, QCOW2_DISCARD_REQUEST,
+ false);
qemu_co_mutex_unlock(&s->lock);
return ret;
}
@@ -2822,9 +2822,8 @@ fail:
static int qcow2_make_empty(BlockDriverState *bs)
{
BDRVQcow2State *s = bs->opaque;
- uint64_t start_sector;
- int sector_step = (QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size) /
- BDRV_SECTOR_SIZE);
+ uint64_t offset, end_offset;
+ int step = QEMU_ALIGN_DOWN(INT_MAX, s->cluster_size);
int l1_clusters, ret = 0;
l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
@@ -2841,18 +2840,16 @@ static int qcow2_make_empty(BlockDriverState *bs)
/* This fallback code simply discards every active cluster; this is slow,
* but works in all cases */
- for (start_sector = 0; start_sector < bs->total_sectors;
- start_sector += sector_step)
+ end_offset = bs->total_sectors * BDRV_SECTOR_SIZE;
+ for (offset = 0; offset < end_offset; offset += step)
{
/* As this function is generally used after committing an external
* snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
* default action for this kind of discard is to pass the discard,
* which will ideally result in an actually smaller image file, as
* is probably desired. */
- ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
- MIN(sector_step,
- bs->total_sectors - start_sector),
- QCOW2_DISCARD_SNAPSHOT, true);
+ ret = qcow2_cluster_discard(bs, offset, MIN(step, end_offset - offset),
+ QCOW2_DISCARD_SNAPSHOT, true);
if (ret < 0) {
break;
}
--
2.9.3
next prev parent reply other threads:[~2017-03-08 2:54 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-08 2:54 [Qemu-devel] [PATCH v6 00/10] add blkdebug tests Eric Blake
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 01/10] iotests: fix 097 when run with qcow Eric Blake
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 02/10] iotests: Improve image-clear tests on non-aligned image Eric Blake
2017-03-13 22:00 ` Max Reitz
2017-03-30 2:13 ` Eric Blake
2017-03-30 18:39 ` Eric Blake
2017-03-30 21:25 ` Max Reitz
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 03/10] qcow2: Assert that cluster operations are aligned Eric Blake
2017-03-13 22:07 ` Max Reitz
2017-03-08 2:54 ` Eric Blake [this message]
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 05/10] blkdebug: Sanity check block layer guarantees Eric Blake
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 06/10] blkdebug: Refactor error injection Eric Blake
2017-03-13 22:16 ` Max Reitz
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 07/10] blkdebug: Add pass-through write_zero and discard support Eric Blake
2017-03-13 22:20 ` Max Reitz
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 08/10] blkdebug: Simplify override logic Eric Blake
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 09/10] blkdebug: Add ability to override unmap geometries Eric Blake
2017-03-13 22:22 ` Max Reitz
2017-03-08 2:54 ` [Qemu-devel] [PATCH v6 10/10] tests: Add coverage for recent block geometry fixes Eric Blake
2017-03-08 8:40 ` [Qemu-devel] [PATCH v6 00/10] add blkdebug tests Kevin Wolf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170308025428.1037-5-eblake@redhat.com \
--to=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).