From: Stefan Hajnoczi <stefanha@redhat.com>
To: qemu-devel@nongnu.org
Cc: Peter Maydell <peter.maydell@linaro.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Max Reitz <mreitz@redhat.com>
Subject: [Qemu-devel] [PULL 23/53] qcow2: Optimize bdrv_make_empty()
Date: Mon,  3 Nov 2014 11:50:26 +0000
Message-ID: <1415015456-25086-24-git-send-email-stefanha@redhat.com>
In-Reply-To: <1415015456-25086-1-git-send-email-stefanha@redhat.com>

From: Max Reitz <mreitz@redhat.com>

bdrv_make_empty() is currently only called if the current image
represents an external snapshot that has been committed to its base
image; it is therefore unlikely to have internal snapshots. In this
case, bdrv_make_empty() can be greatly sped up by emptying the L1 and
refcount tables (while the dirty flag is set, which only works for
compat=1.1) and creating a trivial refcount structure.

If there are snapshots or for compat=0.10, fall back to the simple
implementation (discard all clusters).
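
The layout the fast path produces, and the condition for taking it, can
be sketched as follows. This is only an illustration of the arithmetic
in the patch, not QEMU code: EmptyLayout, empty_layout() and
fast_path_ok() are hypothetical names, and the refcount block size (in
entries) is passed in as a plain parameter instead of being read from
the driver state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type (not part of QEMU): where the metadata of an
 * emptied image ends up. Cluster 0 holds the image header. */
typedef struct EmptyLayout {
    uint64_t reftable_offset;   /* cluster 1: refcount table */
    uint64_t refblock_offset;   /* cluster 2: first refcount block */
    uint64_t l1_table_offset;   /* cluster 3: start of the L1 table */
    uint64_t l1_clusters;       /* clusters occupied by the L1 table */
    uint64_t file_size;         /* (3 + l1_clusters) clusters */
} EmptyLayout;

static uint64_t div_round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d;
}

/* l1_size is the number of 8-byte L1 entries, as in the patch. */
static EmptyLayout empty_layout(uint64_t cluster_size, uint64_t l1_size)
{
    EmptyLayout lo;

    assert(cluster_size >= sizeof(uint64_t));

    lo.l1_clusters = div_round_up(l1_size, cluster_size / sizeof(uint64_t));
    lo.reftable_offset = 1 * cluster_size;
    lo.refblock_offset = 2 * cluster_size;
    lo.l1_table_offset = 3 * cluster_size;
    lo.file_size = (3 + lo.l1_clusters) * cluster_size;
    return lo;
}

/* Mirrors the check in qcow2_make_empty(): v3 image, no snapshots, and
 * header + reftable + refblock + L1 table covered by one refblock. */
static bool fast_path_ok(int qcow_version, bool has_snapshots,
                         uint64_t l1_clusters, uint64_t refcount_block_size)
{
    return qcow_version >= 3 && !has_snapshots &&
           3 + l1_clusters <= refcount_block_size;
}
```

With 64 KiB clusters and an L1 table that fits in one cluster, this
gives a four-cluster file: header, reftable, refblock, L1 table.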

[Applied s/clusters/cluster/ typo fix suggested by Eric Blake
--Stefan]

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1414159063-25977-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/blkdebug.c      |   2 +
 block/qcow2.c         | 165 +++++++++++++++++++++++++++++++++++++++++++++++++-
 include/block/block.h |   2 +
 3 files changed, 168 insertions(+), 1 deletion(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index e046b92..862d93b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -195,6 +195,8 @@ static const char *event_names[BLKDBG_EVENT_MAX] = {
     [BLKDBG_PWRITEV]                        = "pwritev",
     [BLKDBG_PWRITEV_ZERO]                   = "pwritev_zero",
     [BLKDBG_PWRITEV_DONE]                   = "pwritev_done",
+
+    [BLKDBG_EMPTY_IMAGE_PREPARE]            = "empty_image_prepare",
 };
 
 static int get_event_by_name(const char *name, BlkDebugEvent *event)
diff --git a/block/qcow2.c b/block/qcow2.c
index bf871d5..7ec7830 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2230,12 +2230,175 @@ fail:
     return ret;
 }
 
+static int make_completely_empty(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    int ret, l1_clusters;
+    int64_t offset;
+    uint64_t *new_reftable = NULL;
+    uint64_t rt_entry, l1_size2;
+    struct {
+        uint64_t l1_offset;
+        uint64_t reftable_offset;
+        uint32_t reftable_clusters;
+    } QEMU_PACKED l1_ofs_rt_ofs_cls;
+
+    ret = qcow2_cache_empty(bs, s->l2_table_cache);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = qcow2_cache_empty(bs, s->refcount_block_cache);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* Refcounts will be broken utterly */
+    ret = qcow2_mark_dirty(bs);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
+
+    l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
+    l1_size2 = (uint64_t)s->l1_size * sizeof(uint64_t);
+
+    /* After this call, neither the in-memory nor the on-disk refcount
+     * information accurately describe the actual references */
+
+    ret = bdrv_write_zeroes(bs->file, s->l1_table_offset / BDRV_SECTOR_SIZE,
+                            l1_clusters * s->cluster_sectors, 0);
+    if (ret < 0) {
+        goto fail_broken_refcounts;
+    }
+    memset(s->l1_table, 0, l1_size2);
+
+    BLKDBG_EVENT(bs->file, BLKDBG_EMPTY_IMAGE_PREPARE);
+
+    /* Overwrite enough clusters at the beginning of the sectors to place
+     * the refcount table, a refcount block and the L1 table in; this may
+     * overwrite parts of the existing refcount and L1 table, which is not
+     * an issue because the dirty flag is set, complete data loss is in fact
+     * desired and partial data loss is consequently fine as well */
+    ret = bdrv_write_zeroes(bs->file, s->cluster_size / BDRV_SECTOR_SIZE,
+                            (2 + l1_clusters) * s->cluster_size /
+                            BDRV_SECTOR_SIZE, 0);
+    /* This call (even if it failed overall) may have overwritten on-disk
+     * refcount structures; in that case, the in-memory refcount information
+     * will probably differ from the on-disk information which makes the BDS
+     * unusable */
+    if (ret < 0) {
+        goto fail_broken_refcounts;
+    }
+
+    BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE);
+    BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_UPDATE);
+
+    /* "Create" an empty reftable (one cluster) directly after the image
+     * header and an empty L1 table three clusters after the image header;
+     * the cluster between those two will be used as the first refblock */
+    cpu_to_be64w(&l1_ofs_rt_ofs_cls.l1_offset, 3 * s->cluster_size);
+    cpu_to_be64w(&l1_ofs_rt_ofs_cls.reftable_offset, s->cluster_size);
+    cpu_to_be32w(&l1_ofs_rt_ofs_cls.reftable_clusters, 1);
+    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_table_offset),
+                           &l1_ofs_rt_ofs_cls, sizeof(l1_ofs_rt_ofs_cls));
+    if (ret < 0) {
+        goto fail_broken_refcounts;
+    }
+
+    s->l1_table_offset = 3 * s->cluster_size;
+
+    new_reftable = g_try_new0(uint64_t, s->cluster_size / sizeof(uint64_t));
+    if (!new_reftable) {
+        ret = -ENOMEM;
+        goto fail_broken_refcounts;
+    }
+
+    s->refcount_table_offset = s->cluster_size;
+    s->refcount_table_size   = s->cluster_size / sizeof(uint64_t);
+
+    g_free(s->refcount_table);
+    s->refcount_table = new_reftable;
+    new_reftable = NULL;
+
+    /* Now the in-memory refcount information again corresponds to the on-disk
+     * information (reftable is empty and no refblocks (the refblock cache is
+     * empty)); however, this means some clusters (e.g. the image header) are
+     * referenced, but not refcounted, but the normal qcow2 code assumes that
+     * the in-memory information is always correct */
+
+    BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC);
+
+    /* Enter the first refblock into the reftable */
+    rt_entry = cpu_to_be64(2 * s->cluster_size);
+    ret = bdrv_pwrite_sync(bs->file, s->cluster_size,
+                           &rt_entry, sizeof(rt_entry));
+    if (ret < 0) {
+        goto fail_broken_refcounts;
+    }
+    s->refcount_table[0] = 2 * s->cluster_size;
+
+    s->free_cluster_index = 0;
+    assert(3 + l1_clusters <= s->refcount_block_size);
+    offset = qcow2_alloc_clusters(bs, 3 * s->cluster_size + l1_size2);
+    if (offset < 0) {
+        ret = offset;
+        goto fail_broken_refcounts;
+    } else if (offset > 0) {
+        error_report("First cluster in emptied image is in use");
+        abort();
+    }
+
+    /* Now finally the in-memory information corresponds to the on-disk
+     * structures and is correct */
+    ret = qcow2_mark_clean(bs);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    return 0;
+
+fail_broken_refcounts:
+    /* The BDS is unusable at this point. If we wanted to make it usable, we
+     * would have to call qcow2_refcount_close(), qcow2_refcount_init(),
+     * qcow2_check_refcounts(), qcow2_refcount_close() and qcow2_refcount_init()
+     * again. However, because the functions which could have caused this error
+     * path to be taken are used by those functions as well, it's very likely
+     * that that sequence will fail as well. Therefore, just eject the BDS. */
+    bs->drv = NULL;
+
+fail:
+    g_free(new_reftable);
+    return ret;
+}
+
 static int qcow2_make_empty(BlockDriverState *bs)
 {
-    int ret = 0;
+    BDRVQcowState *s = bs->opaque;
     uint64_t start_sector;
     int sector_step = INT_MAX / BDRV_SECTOR_SIZE;
+    int l1_clusters, ret = 0;
+
+    l1_clusters = DIV_ROUND_UP(s->l1_size, s->cluster_size / sizeof(uint64_t));
+
+    if (s->qcow_version >= 3 && !s->snapshots &&
+        3 + l1_clusters <= s->refcount_block_size) {
+        /* The following function only works for qcow2 v3 images (it requires
+         * the dirty flag) and only as long as there are no snapshots (because
+         * it completely empties the image). Furthermore, the L1 table and three
+         * additional clusters (image header, refcount table, one refcount
+         * block) have to fit inside one refcount block. */
+        return make_completely_empty(bs);
+    }
 
+    /* This fallback code simply discards every active cluster; this is slow,
+     * but works in all cases */
     for (start_sector = 0; start_sector < bs->total_sectors;
          start_sector += sector_step)
     {
diff --git a/include/block/block.h b/include/block/block.h
index 341054d..b1f4385 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -498,6 +498,8 @@ typedef enum {
     BLKDBG_PWRITEV_ZERO,
     BLKDBG_PWRITEV_DONE,
 
+    BLKDBG_EMPTY_IMAGE_PREPARE,
+
     BLKDBG_EVENT_MAX,
 } BlkDebugEvent;
 
-- 
1.9.3
