From: Max Reitz <mreitz@redhat.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Max Reitz <mreitz@redhat.com>
Subject: [Qemu-devel] [PATCH v12 03/14] qcow2: Optimize bdrv_make_empty()
Date: Tue, 26 Aug 2014 23:36:16 +0200
Message-ID: <1409088987-17207-4-git-send-email-mreitz@redhat.com>
In-Reply-To: <1409088987-17207-1-git-send-email-mreitz@redhat.com>
bdrv_make_empty() is currently only called if the current image
represents an external snapshot that has been committed to its base
image; it is therefore unlikely to have internal snapshots. In this
case, bdrv_make_empty() can be greatly sped up by emptying the L1 and
refcount tables (while having the dirty flag set) and creating a trivial
refcount structure.
If there are snapshots, fall back to the simple implementation (discard
all clusters).
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
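Note for reviewers (not part of the commit message): the layout that the
fast path leaves behind can be sketched as a small standalone C snippet.
This is only an illustration under stated assumptions, not code from the
patch: EmptyLayout, empty_layout(), put_be64(), put_be32() and
encode_header_fields() are hypothetical names made up for this note. The
arithmetic mirrors what qcow2_make_empty() does below: reftable in
cluster 1, first refblock in cluster 2, L1 table starting at cluster 3,
and the 20 bytes written at offsetof(QCowHeader, l1_table_offset).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type (not part of QEMU): where each metadata
 * structure ends up after the fast path of qcow2_make_empty(). */
typedef struct EmptyLayout {
    uint64_t reftable_offset;  /* cluster 1: refcount table (one cluster) */
    uint64_t refblock_offset;  /* cluster 2: first refcount block */
    uint64_t l1_offset;        /* cluster 3: start of the L1 table */
    uint64_t l1_clusters;      /* clusters needed to hold the L1 table */
    uint64_t file_length;      /* length the image file is truncated to */
} EmptyLayout;

/* Same arithmetic as the patch; l1_size counts 8-byte L1 entries. */
static EmptyLayout empty_layout(uint64_t cluster_size, uint64_t l1_size)
{
    uint64_t entries_per_cluster = cluster_size / sizeof(uint64_t);
    EmptyLayout l;

    /* Assumption of this sketch: cluster size is a power of two >= 512 */
    assert(cluster_size >= 512 && (cluster_size & (cluster_size - 1)) == 0);

    l.reftable_offset = 1 * cluster_size;
    l.refblock_offset = 2 * cluster_size;
    l.l1_offset       = 3 * cluster_size;
    /* DIV_ROUND_UP(l1_size, entries_per_cluster) */
    l.l1_clusters     = (l1_size + entries_per_cluster - 1) /
                        entries_per_cluster;
    l.file_length     = (3 + l.l1_clusters) * cluster_size;
    return l;
}

/* Store a 64-bit value in big-endian byte order */
static void put_be64(uint8_t *p, uint64_t v)
{
    int i;
    for (i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Store a 32-bit value in big-endian byte order */
static void put_be32(uint8_t *p, uint32_t v)
{
    int i;
    for (i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (24 - 8 * i));
    }
}

/* Build the 20 header bytes: L1 offset, reftable offset, and the
 * number of reftable clusters (always 1 on the fast path). */
static void encode_header_fields(uint8_t buf[20], const EmptyLayout *l)
{
    put_be64(buf + 0, l->l1_offset);
    put_be64(buf + 8, l->reftable_offset);
    put_be32(buf + 16, 1);
}
```

For example, with 64 kB clusters and an L1 table of 16 entries, the L1
table fits into one cluster, so the emptied image would be truncated to
four clusters (header, reftable, refblock, L1 table).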
block/blkdebug.c | 2 +
block/qcow2.c | 137 ++++++++++++++++++++++++++++++++++++++++++++------
include/block/block.h | 2 +
3 files changed, 126 insertions(+), 15 deletions(-)
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 69b330e..a21678d 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -198,6 +198,8 @@ static const char *event_names[BLKDBG_EVENT_MAX] = {
[BLKDBG_PWRITEV] = "pwritev",
[BLKDBG_PWRITEV_ZERO] = "pwritev_zero",
[BLKDBG_PWRITEV_DONE] = "pwritev_done",
+
+ [BLKDBG_EMPTY_IMAGE_PREPARE] = "empty_image_prepare",
};
static int get_event_by_name(const char *name, BlkDebugEvent *event)
diff --git a/block/qcow2.c b/block/qcow2.c
index 2efd9b5..e475151 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2155,24 +2155,131 @@ fail:
static int qcow2_make_empty(BlockDriverState *bs)
{
+ BDRVQcowState *s = bs->opaque;
int ret = 0;
- uint64_t start_sector;
- int sector_step = INT_MAX / BDRV_SECTOR_SIZE;
- for (start_sector = 0; start_sector < bs->total_sectors;
- start_sector += sector_step)
- {
- /* As this function is generally used after committing an external
- * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
- * default action for this kind of discard is to pass the discard,
- * which will ideally result in an actually smaller image file, as
- * is probably desired. */
- ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
- MIN(sector_step,
- bs->total_sectors - start_sector),
- QCOW2_DISCARD_SNAPSHOT, true);
+ if (s->snapshots) {
+ uint64_t start_sector;
+ int sector_step = INT_MAX / BDRV_SECTOR_SIZE;
+
+ /* If there are snapshots, every active cluster has to be discarded */
+
+ for (start_sector = 0; start_sector < bs->total_sectors;
+ start_sector += sector_step)
+ {
+ /* As this function is generally used after committing an external
+ * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the
+ * default action for this kind of discard is to pass the discard,
+ * which will ideally result in an actually smaller image file, as
+ * is probably desired. */
+ ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE,
+ MIN(sector_step,
+ bs->total_sectors - start_sector),
+ QCOW2_DISCARD_SNAPSHOT, true);
+ if (ret < 0) {
+ break;
+ }
+ }
+ } else {
+ int l1_clusters;
+ int64_t offset;
+ uint64_t *new_reftable;
+ uint8_t l1_ofs_rt_ofs_cls[20]; /* L1 offset; RT offset and clusters */
+ uint64_t rt_entry;
+
+ ret = qcow2_cache_empty(bs, s->l2_table_cache);
if (ret < 0) {
- break;
+ return ret;
+ }
+
+ ret = qcow2_cache_empty(bs, s->refcount_block_cache);
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* Refcounts will be broken utterly */
+ ret = qcow2_mark_dirty(bs);
+ if (ret < 0) {
+ return ret;
+ }
+
+ l1_clusters = DIV_ROUND_UP(s->l1_size,
+ s->cluster_size / sizeof(uint64_t));
+ new_reftable = g_try_new0(uint64_t, s->cluster_size / sizeof(uint64_t));
+ if (!new_reftable) {
+ return -ENOMEM;
+ }
+
+ BLKDBG_EVENT(bs->file, BLKDBG_EMPTY_IMAGE_PREPARE);
+
+ /* Overwrite enough clusters at the beginning of the image file to place
+ * the refcount table, a refcount block and the L1 table in; this may
+ * overwrite parts of the existing refcount and L1 table, which is not
+ * an issue because the dirty flag is set, complete data loss is in fact
+ * desired and partial data loss is consequently fine as well */
+ ret = bdrv_write_zeroes(bs->file, s->cluster_size / BDRV_SECTOR_SIZE,
+ (2 + l1_clusters) * s->cluster_size /
+ BDRV_SECTOR_SIZE, 0);
+ if (ret < 0) {
+ g_free(new_reftable);
+ return ret;
+ }
+
+ BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
+ BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_UPDATE);
+
+ /* "Create" an empty reftable (one cluster) directly after the image
+ * header and an empty L1 table three clusters after the image header;
+ * the cluster between those two will be used as the first refblock */
+ cpu_to_be64w((uint64_t *)&l1_ofs_rt_ofs_cls[ 0], 3 * s->cluster_size);
+ cpu_to_be64w((uint64_t *)&l1_ofs_rt_ofs_cls[ 8], s->cluster_size);
+ cpu_to_be32w((uint32_t *)&l1_ofs_rt_ofs_cls[16], 1);
+ ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_table_offset),
+ l1_ofs_rt_ofs_cls, sizeof(l1_ofs_rt_ofs_cls));
+ if (ret < 0) {
+ g_free(new_reftable);
+ return ret;
+ }
+
+ s->l1_table_offset = 3 * s->cluster_size;
+ memset(s->l1_table, 0, s->l1_size * sizeof(uint64_t));
+
+ s->refcount_table_offset = s->cluster_size;
+ s->refcount_table_size = s->cluster_size / sizeof(uint64_t);
+
+ g_free(s->refcount_table);
+ s->refcount_table = new_reftable;
+
+ BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC);
+
+ /* Enter the first refblock into the reftable */
+ rt_entry = cpu_to_be64(2 * s->cluster_size);
+ ret = bdrv_pwrite_sync(bs->file, s->cluster_size,
+ &rt_entry, sizeof(rt_entry));
+ if (ret < 0) {
+ return ret;
+ }
+
+ s->refcount_table[0] = 2 * s->cluster_size;
+
+ ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size);
+ if (ret < 0) {
+ return ret;
+ }
+
+ s->free_cluster_index = 0;
+ offset = qcow2_alloc_clusters(bs, 3 * s->cluster_size +
+ s->l1_size * sizeof(uint64_t));
+ if (offset < 0) {
+ return offset;
+ } else if (offset > 0) {
+ error_report("First cluster in emptied image is in use");
+ abort();
+ }
+
+ ret = qcow2_mark_clean(bs);
+ if (ret < 0) {
+ return ret;
}
}
diff --git a/include/block/block.h b/include/block/block.h
index 8f4ad16..7ac4caf 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -557,6 +557,8 @@ typedef enum {
BLKDBG_PWRITEV_ZERO,
BLKDBG_PWRITEV_DONE,
+ BLKDBG_EMPTY_IMAGE_PREPARE,
+
BLKDBG_EVENT_MAX,
} BlkDebugEvent;
--
2.1.0