From: Peter Lieven <pl@kamp.de>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com, famz@redhat.com, stefanha@redhat.com,
	Peter Lieven <pl@kamp.de>,
	ronniesahlberg@gmail.com, pbonzini@redhat.com
Subject: [Qemu-devel] [RFC PATCH] qcow2: add a readahead cache for qcow2_decompress_cluster
Date: Thu, 26 Dec 2013 17:19:52 +0100	[thread overview]
Message-ID: <1388074792-29946-1-git-send-email-pl@kamp.de> (raw)

While evaluating compressed qcow2 images as a basis for
virtual machine templates, I found that there are a lot
of partly redundant reads (compressed clusters share common
physical sectors) and relatively short reads.

This doesn't hurt if the image resides on a local
filesystem where we can benefit from the local page cache,
but it adds a lot of penalty when accessing remote images
on NFS or similar exports.

This patch effectively implements a readahead of 2 * cluster_size,
which is 2 * 64 kB by default, resulting in a 128 kB readahead.
This matches the common default readahead setting on Linux.

For example, this leads to the following times when converting
a compressed qcow2 image from an NFS export to a local tmpfs partition:

Old:
time ./qemu-img convert nfs://10.0.0.1/export/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 /tmp/test.raw
real	0m24.681s
user	0m8.597s
sys	0m4.084s

New:
time ./qemu-img convert nfs://10.0.0.1/export/VC-Ubuntu-LTS-12.04.2-64bit.qcow2 /tmp/test.raw
real	0m16.121s
user	0m7.932s
sys	0m2.244s

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/qcow2-cluster.c |   27 +++++++++++++++++++++++++--
 block/qcow2.h         |    1 +
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 11f9c50..367f089 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1321,7 +1321,7 @@ static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
 int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
 {
     BDRVQcowState *s = bs->opaque;
-    int ret, csize, nb_csectors, sector_offset;
+    int ret, csize, nb_csectors, sector_offset, max_read;
     uint64_t coffset;
 
     coffset = cluster_offset & s->cluster_offset_mask;
@@ -1329,9 +1329,32 @@ int qcow2_decompress_cluster(BlockDriverState *bs, uint64_t cluster_offset)
         nb_csectors = ((cluster_offset >> s->csize_shift) & s->csize_mask) + 1;
         sector_offset = coffset & 511;
         csize = nb_csectors * 512 - sector_offset;
+        max_read = MIN((bs->file->total_sectors - (coffset >> 9)), 2 * s->cluster_sectors);
         BLKDBG_EVENT(bs->file, BLKDBG_READ_COMPRESSED);
-        ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data, nb_csectors);
+        if (s->cluster_cache_offset != -1 && coffset > s->cluster_cache_offset &&
+           (coffset >> 9) < (s->cluster_cache_offset >> 9) + s->cluster_data_sectors) {
+            int cached_sectors = s->cluster_data_sectors - ((coffset >> 9) -
+                                 (s->cluster_cache_offset >> 9));
+            memmove(s->cluster_data,
+                    s->cluster_data + (s->cluster_data_sectors - cached_sectors) * 512,
+                    cached_sectors * 512);
+            s->cluster_data_sectors = cached_sectors;
+            if (nb_csectors > cached_sectors) {
+                /* some sectors are missing; read them and fill up to max_read sectors */
+                ret = bdrv_read(bs->file, (coffset >> 9) + cached_sectors,
+                                s->cluster_data + cached_sectors * 512,
+                                max_read - cached_sectors);
+                s->cluster_data_sectors = max_read;
+            } else {
+                /* all relevant sectors are in the cache */
+                ret = 0;
+            }
+        } else {
+            ret = bdrv_read(bs->file, coffset >> 9, s->cluster_data, max_read);
+            s->cluster_data_sectors = max_read;
+        }
         if (ret < 0) {
+            s->cluster_data_sectors = 0;
             return ret;
         }
         if (decompress_buffer(s->cluster_cache, s->cluster_size,
diff --git a/block/qcow2.h b/block/qcow2.h
index 922e190..5edad26 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -185,6 +185,7 @@ typedef struct BDRVQcowState {
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
+    int cluster_data_sectors;
     uint64_t cluster_cache_offset;
     QLIST_HEAD(QCowClusterAlloc, QCowL2Meta) cluster_allocs;
 
-- 
1.7.9.5
