[Qemu-devel] [PULL 20/37] qcow2: Give the refcount cache the minimum possible size by default


From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 20/37] qcow2: Give the refcount cache the minimum possible size by default
Date: Tue, 15 May 2018 17:40:16 +0200
Message-ID: <20180515154033.19899-21-kwolf@redhat.com>
In-Reply-To: <20180515154033.19899-1-kwolf@redhat.com>

From: Alberto Garcia <berto@igalia.com>

The L2 and refcount caches have default sizes that can be overridden
using the l2-cache-size and refcount-cache-size parameters (an
additional parameter named cache-size sets the combined size of both
caches).

Unless forced by one of the aforementioned parameters, QEMU will set
the unspecified sizes so that the L2 cache is 4 times larger than the
refcount cache.
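
As an illustration of the behaviour being replaced, the 4:1 rule can be
sketched as a standalone program fragment. This is a simplified
reading of the lines removed by the diff, not the QEMU code itself: the
function names and the struct are hypothetical, and the constants
mirror DEFAULT_L2_REFCOUNT_SIZE_RATIO, DEFAULT_L2_CACHE_BYTE_SIZE and
DEFAULT_L2_CACHE_CLUSTERS from block/qcow2.h.

```c
#include <assert.h>
#include <stdint.h>

#define OLD_L2_REFCOUNT_RATIO  4        /* DEFAULT_L2_REFCOUNT_SIZE_RATIO */
#define L2_CACHE_BYTE_SIZE     1048576  /* DEFAULT_L2_CACHE_BYTE_SIZE */
#define L2_CACHE_CLUSTERS      8        /* DEFAULT_L2_CACHE_CLUSTERS */

/* Hypothetical container for the two resulting sizes, in bytes. */
struct old_cache_sizes {
    uint64_t l2;
    uint64_t refcount;
};

/* Old split when only cache-size (the combined size) is given:
 * the refcount cache gets 1/5 and the L2 cache the remaining 4/5. */
static struct old_cache_sizes old_split_combined(uint64_t combined)
{
    struct old_cache_sizes out;

    out.refcount = combined / (OLD_L2_REFCOUNT_RATIO + 1);
    out.l2 = combined - out.refcount;
    return out;
}

/* Old defaults when no cache option is given at all. */
static struct old_cache_sizes old_defaults(uint64_t cluster_size)
{
    struct old_cache_sizes out;
    uint64_t by_clusters = (uint64_t)L2_CACHE_CLUSTERS * cluster_size;

    assert(cluster_size > 0);
    out.l2 = by_clusters > L2_CACHE_BYTE_SIZE ? by_clusters
                                              : L2_CACHE_BYTE_SIZE;
    out.refcount = out.l2 / OLD_L2_REFCOUNT_RATIO;
    return out;
}
```

With 64 KiB clusters, old_defaults() yields a 1 MiB L2 cache and a
256 KiB refcount cache.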

This is based on the premise that the refcount metadata needs to be
only a fourth of the L2 metadata to cover the same amount of disk
space. This is incorrect for two reasons:

 a) The amount of disk covered by an L2 table depends solely on the
    cluster size, but in the case of a refcount block it depends on
    the cluster size *and* the width of each refcount entry.
    The 4/1 ratio is only valid with 16-bit entries (the default).

 b) When we talk about disk space and L2 tables we are talking about
    guest space (L2 tables map guest clusters to host clusters),
    whereas refcount blocks are used for host clusters (including
    L1/L2 tables and the refcount blocks themselves). On a fully
    populated (and uncompressed) qcow2 file, image size > virtual size
    so there are more refcount entries than L2 entries.
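
Point (a) can be checked with a little arithmetic. The sketch below
assumes one-cluster tables, 8-byte L2 entries (the same assumption the
patch makes when it divides by cluster_size / 8) and a refcount width
given in bits; the function names are hypothetical and only serve this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Guest bytes mapped by one L2 table that occupies a single cluster.
 * Each L2 entry is 8 bytes wide and maps one guest cluster. */
static uint64_t l2_table_coverage(uint64_t cluster_size)
{
    return (cluster_size / 8) * cluster_size;
}

/* Host bytes tracked by one refcount block that occupies a single
 * cluster. refcount_bits is the width of each entry (16 by default). */
static uint64_t refcount_block_coverage(uint64_t cluster_size,
                                        unsigned refcount_bits)
{
    assert(refcount_bits >= 1 && refcount_bits <= 64);
    return (cluster_size * 8 / refcount_bits) * cluster_size;
}
```

With 64 KiB clusters an L2 table covers 512 MiB. A refcount block
covers 2 GiB with 16-bit entries (the 4/1 ratio), but only 512 MiB
with 64-bit entries and 32 GiB with 1-bit entries, so no fixed ratio
fits every image.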

Problem (a) could be fixed by adjusting the algorithm to take into
account the refcount entry width. Problem (b) could be fixed by
slightly increasing the refcount cache size to account for the
clusters used for qcow2 metadata.

However, this patch takes a completely different approach: instead
of keeping a ratio between both cache sizes, it assigns as much as
possible to the L2 cache and the remainder to the refcount cache.

The reason is that L2 tables are used for every single I/O request
from the guest, so the effect of increasing the cache is significant
and clearly measurable. Refcount blocks, however, are only used for
cluster allocation and internal snapshots, and in practice are
accessed sequentially in most cases, so the effect of increasing the
cache is negligible (even when doing random writes from the guest).

So, make the refcount cache as small as possible unless the user
explicitly asks for a larger one.
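
For the case where only cache-size is given, the new policy can be
sketched outside of QEMU as follows. This is a minimal standalone
rendering of the logic added to read_cache_sizes() in the diff below,
not a drop-in replacement: the function name and the struct are
hypothetical, and the minimum of 4 clusters is an assumption standing
in for MIN_REFCOUNT_CACHE_SIZE, whose value is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for MIN_REFCOUNT_CACHE_SIZE (in clusters). */
#define MIN_REFCOUNT_CACHE_CLUSTERS 4

/* Hypothetical container for the two resulting sizes, in bytes. */
struct cache_sizes {
    uint64_t l2;
    uint64_t refcount;
};

/* Split a combined cache size: give the L2 cache as much as it can
 * use, and hand whatever is left to the refcount cache. */
static struct cache_sizes split_combined_cache(uint64_t combined,
                                               uint64_t virtual_disk_size,
                                               uint64_t cluster_size)
{
    struct cache_sizes out;
    uint64_t max_l2, min_refcount;

    assert(cluster_size >= 8);

    /* L2 cache needed to map the whole virtual disk: one 8-byte
     * entry per guest cluster. */
    max_l2 = virtual_disk_size / (cluster_size / 8);
    min_refcount = (uint64_t)MIN_REFCOUNT_CACHE_CLUSTERS * cluster_size;

    if (combined >= max_l2 + min_refcount) {
        /* Enough for full L2 coverage; the rest goes to refcounts. */
        out.l2 = max_l2;
        out.refcount = combined - out.l2;
    } else {
        /* Keep the refcount cache at its minimum (or at the whole
         * budget if even that is too small). */
        out.refcount = combined < min_refcount ? combined : min_refcount;
        out.l2 = combined - out.refcount;
    }
    return out;
}
```

Under these assumptions, a 10 GiB image with 64 KiB clusters and a
1 MiB combined budget gets a 768 KiB L2 cache and a 256 KiB refcount
cache; with a 2 MiB budget the L2 cache stops at the 1.25 MiB needed to
map the whole disk and the refcount cache receives the remaining
768 KiB.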

Signed-off-by: Alberto Garcia <berto@igalia.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 9695182c2eb11b77cb319689a1ebaa4e7c9d6591.1523968389.git.berto@igalia.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.h              |  4 ----
 block/qcow2.c              | 31 +++++++++++++++++++------------
 tests/qemu-iotests/137.out |  2 +-
 3 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index adf5c3950f..01b5250415 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -77,10 +77,6 @@
 #define DEFAULT_L2_CACHE_CLUSTERS 8 /* clusters */
 #define DEFAULT_L2_CACHE_BYTE_SIZE 1048576 /* bytes */
 
-/* The refblock cache needs only a fourth of the L2 cache size to cover as many
- * clusters */
-#define DEFAULT_L2_REFCOUNT_SIZE_RATIO 4
-
 #define DEFAULT_CLUSTER_SIZE 65536
 
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 2f36e632f9..6d532470a8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -802,23 +802,30 @@ static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
         } else if (refcount_cache_size_set) {
             *l2_cache_size = combined_cache_size - *refcount_cache_size;
         } else {
-            *refcount_cache_size = combined_cache_size
-                                 / (DEFAULT_L2_REFCOUNT_SIZE_RATIO + 1);
-            *l2_cache_size = combined_cache_size - *refcount_cache_size;
+            uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
+            uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
+            uint64_t min_refcount_cache =
+                (uint64_t) MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
+
+            /* Assign as much memory as possible to the L2 cache, and
+             * use the remainder for the refcount cache */
+            if (combined_cache_size >= max_l2_cache + min_refcount_cache) {
+                *l2_cache_size = max_l2_cache;
+                *refcount_cache_size = combined_cache_size - *l2_cache_size;
+            } else {
+                *refcount_cache_size =
+                    MIN(combined_cache_size, min_refcount_cache);
+                *l2_cache_size = combined_cache_size - *refcount_cache_size;
+            }
         }
     } else {
-        if (!l2_cache_size_set && !refcount_cache_size_set) {
+        if (!l2_cache_size_set) {
             *l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
                                  (uint64_t)DEFAULT_L2_CACHE_CLUSTERS
                                  * s->cluster_size);
-            *refcount_cache_size = *l2_cache_size
-                                 / DEFAULT_L2_REFCOUNT_SIZE_RATIO;
-        } else if (!l2_cache_size_set) {
-            *l2_cache_size = *refcount_cache_size
-                           * DEFAULT_L2_REFCOUNT_SIZE_RATIO;
-        } else if (!refcount_cache_size_set) {
-            *refcount_cache_size = *l2_cache_size
-                                 / DEFAULT_L2_REFCOUNT_SIZE_RATIO;
+        }
+        if (!refcount_cache_size_set) {
+            *refcount_cache_size = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
         }
     }
 
diff --git a/tests/qemu-iotests/137.out b/tests/qemu-iotests/137.out
index e28e1eadba..96724a6c33 100644
--- a/tests/qemu-iotests/137.out
+++ b/tests/qemu-iotests/137.out
@@ -22,7 +22,7 @@ refcount-cache-size may not exceed cache-size
 L2 cache size too big
 L2 cache entry size must be a power of two between 512 and the cluster size (65536)
 L2 cache entry size must be a power of two between 512 and the cluster size (65536)
-L2 cache size too big
+Refcount cache size too big
 Conflicting values for qcow2 options 'overlap-check' ('constant') and 'overlap-check.template' ('all')
 Unsupported value 'blubb' for qcow2 option 'overlap-check'. Allowed are any of the following: none, constant, cached, all
 Unsupported value 'blubb' for qcow2 option 'overlap-check'. Allowed are any of the following: none, constant, cached, all
-- 
2.13.6

