From: Max Reitz <mreitz@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>, Zhang Haoyu <zhanghy@sangfor.com>
Cc: kvm <kvm@vger.kernel.org>, qemu-devel <qemu-devel@nongnu.org>,
Zhang Haoyu <ahzhanghaoyu@gmail.com>,
Christian Borntraeger <borntraeger@de.ibm.com>,
Amit Shah <amit.shah@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [question] virtio-blk performance degradation happened with virito-serial
Date: Sat, 13 Sep 2014 19:22:59 +0200
Message-ID: <54147D73.1080609@redhat.com>
In-Reply-To: <20140912123857.GA6207@stefanha-thinkpad.redhat.com>
On 12.09.2014 14:38, Stefan Hajnoczi wrote:
> Max: Unrelated to this performance issue but I notice that the qcow2
> metadata overlap check is high in the host CPU profile. Have you had
> any thoughts about optimizing the check?
>
> Stefan
In fact, I have given it some thought (albeit only briefly). Instead of
gathering all the information in the overlap check function itself, we could
either keep a generic list of typed ranges (e.g. "cluster 0: header",
"clusters 1 to 5: L1 table", etc.) or a not-really-bitmap with 4 bits per
entry specifying the cluster type (header, L1 table, free or data cluster,
etc.).
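To make the two representations concrete, here is a minimal sketch in C (not
actual QEMU code; all names are made up for illustration):

#include <stdint.h>

/* Variant 1: a list of typed ranges, e.g. "clusters 1 to 5: L1 table".
 * The list is kept sorted by first_cluster. */
typedef enum {
    MD_NONE = 0,        /* free or data cluster */
    MD_HEADER,
    MD_L1_TABLE,
    MD_L2_TABLE,
    MD_REFCOUNT_TABLE,
    MD_REFCOUNT_BLOCK,
    /* ... */
} MetadataType;

typedef struct {
    uint64_t first_cluster;   /* index of the first cluster in the range */
    uint32_t nb_clusters;     /* length of the range in clusters */
    MetadataType type;        /* kind of metadata occupying the range */
} MetadataRange;

/* Variant 2: the "not-really-bitmap", 4 bits per cluster, indexed by
 * cluster number; the 4-bit values are MetadataType codes. */
typedef struct {
    uint8_t *nibbles;         /* two clusters per byte */
    uint64_t nb_clusters;
} MetadataNibbleMap;

static inline unsigned nibble_map_get(const MetadataNibbleMap *m,
                                      uint64_t cluster)
{
    uint8_t b = m->nibbles[cluster / 2];
    return (cluster & 1) ? (b >> 4) : (b & 0x0f);
}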
The disadvantage of the former would be that in its simplest form we'd
have to run through the whole list to find out whether a cluster is
already reserved for metadata or not. We could easily optimize this by
keeping the list in order and then performing a binary search.
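As a sketch of that binary search (again illustrative only, building on the
MetadataRange list above, assumed sorted by first_cluster and
non-overlapping):

#include <stddef.h>

/* Returns the metadata type covering @cluster, or MD_NONE if the cluster
 * is not part of any recorded metadata range. */
static MetadataType range_list_lookup(const MetadataRange *ranges,
                                      size_t nb_ranges, uint64_t cluster)
{
    size_t lo = 0, hi = nb_ranges;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (cluster < ranges[mid].first_cluster) {
            hi = mid;
        } else if (cluster >= ranges[mid].first_cluster +
                              ranges[mid].nb_clusters) {
            lo = mid + 1;
        } else {
            return ranges[mid].type;
        }
    }
    return MD_NONE;
}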
The disadvantage of the latter would obviously be its memory footprint: for
a 1 TB image with 64 kB clusters, it would be 8 MB in size. That might be
considered acceptable, but I deem it too large. The advantage would be
constant access time, of course.
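(The arithmetic behind that figure: 1 TB / 64 kB = 2^24, i.e. about 16.8
million clusters; at 4 bits per cluster that is 2^23 bytes = 8 MB.)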
We could combine both approaches, that is, use the bitmap as a cache:
whenever a cluster is overlap-checked, the corresponding bitmap range (or
"bitmap window") is requested; if it is not available, it is generated from
the range list and then put into the cache.
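A sketch of that cache (illustrative only; WINDOW_CLUSTERS and the
direct-mapped cache policy are arbitrary choices here, and
range_list_lookup() is the binary search sketched above):

#include <stdbool.h>

#define WINDOW_CLUSTERS 4096                /* clusters per bitmap window */

typedef struct {
    uint64_t window_index;                  /* which window this covers */
    uint8_t nibbles[WINDOW_CLUSTERS / 2];   /* 4 bits per cluster */
    bool valid;
} BitmapWindow;

/* Rebuild one window from the sorted range list. */
static void window_generate(BitmapWindow *slot, uint64_t win,
                            const MetadataRange *ranges, size_t nb_ranges)
{
    uint64_t start = win * WINDOW_CLUSTERS;

    for (uint64_t off = 0; off < WINDOW_CLUSTERS; off += 2) {
        uint8_t lo = range_list_lookup(ranges, nb_ranges, start + off) & 0x0f;
        uint8_t hi = range_list_lookup(ranges, nb_ranges, start + off + 1) & 0x0f;
        slot->nibbles[off / 2] = (uint8_t)(hi << 4) | lo;
    }
    slot->window_index = win;
    slot->valid = true;
}

/* Overlap check for a single cluster, going through the window cache. */
static MetadataType overlap_check_cluster(BitmapWindow *cache, size_t cache_size,
                                          const MetadataRange *ranges,
                                          size_t nb_ranges, uint64_t cluster)
{
    uint64_t win = cluster / WINDOW_CLUSTERS;
    BitmapWindow *slot = &cache[win % cache_size];   /* direct-mapped */

    if (!slot->valid || slot->window_index != win) {
        window_generate(slot, win, ranges, nb_ranges);
    }

    uint64_t off = cluster % WINDOW_CLUSTERS;
    uint8_t b = slot->nibbles[off / 2];
    return (off & 1) ? (b >> 4) : (b & 0x0f);
}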
The remaining question is how large the range list would be in memory.
Basically, its size would be comparable to that of an RLE version of the
bitmap. In contrast to a raw RLE version, however, we'd have to add the
start cluster to each entry in order to be able to perform a binary search,
and we'd omit free and/or data clusters. So we'd have 4 bits for the cluster
type, let's say 12 bits for the cluster count, and of course 64 bits for the
first cluster index. Or, for maximum efficiency, we'd have 64 - 9 - 1 = 54
bits for the cluster index, 4 bits for the type and 6 bits for the cluster
count. The first variant gives us 10 bytes per metadata range, the second 8.

Considering that one refcount block covers cluster_size / 2 entries and one
L2 table covers cluster_size / 8 entries, we have (for images with a cluster
size of 64 kB) a ratio of about 1/32768 refcount blocks per cluster and
1/8192 L2 tables per cluster. I'd therefore guess a metadata ratio of about
1/6000. In the worst case, each metadata cluster requires its own range list
entry, which at 10 bytes per entry means less than 30 kB for the list of a
1 TB image with 64 kB clusters. I think that's acceptable.
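For what it's worth, the denser 8-byte encoding described above could look
roughly like this (again only a sketch; the field layout is one possible
choice):

/* 54 bits first cluster index | 4 bits type | 6 bits cluster count,
 * packed into a single uint64_t (the 10-byte variant is analogous, just
 * with a 12-bit count in a separate field). */
typedef uint64_t PackedRange;

static inline PackedRange packed_range(uint64_t first_cluster,
                                       unsigned type, unsigned count)
{
    return (first_cluster << 10) | ((uint64_t)(type & 0xf) << 6) |
           (count & 0x3f);
}

static inline uint64_t packed_first_cluster(PackedRange r) { return r >> 10; }
static inline unsigned packed_type(PackedRange r) { return (r >> 6) & 0xf; }
static inline unsigned packed_count(PackedRange r) { return r & 0x3f; }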
We could compress that list even more by making it a real RLE version of
the bitmap, removing the cluster index from each entry; remember that
for this mixed range list/bitmap approach we no longer need to be able
to perform exact binary search but only need to be able to quickly seek
to the beginning of a bitmap window. This can be achieved by forcing
breaks in the range list at every window border and keeping track of
those offsets along with the corresponding bitmap window index. When we
want to generate a bitmap window, we look up the start offset in the
range list (constant time), then generate the window (linear in the window
size) and can then perform constant-time lookups for each overlap check in that
window.
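In code, that compressed variant might look roughly like this (illustrative
only; it assumes runs are forced to break at window borders so that each
window's runs exactly tile it, and it reuses the BitmapWindow type and
WINDOW_CLUSTERS from the sketch above):

typedef struct {
    uint8_t  type;            /* 4 bits would suffice in a packed layout */
    uint16_t run_length;      /* clusters in this run, <= WINDOW_CLUSTERS */
} RleEntry;

typedef struct {
    RleEntry *entries;        /* RLE stream of the whole bitmap */
    size_t   *window_start;   /* window_start[w]: first entry of window w */
    uint64_t  nb_windows;
} RleMetadataMap;

/* Rebuild one bitmap window: constant-time seek via window_start[],
 * then a pass that is linear in the window size. */
static void rle_window_generate(const RleMetadataMap *map, uint64_t win,
                                BitmapWindow *slot)
{
    size_t e = map->window_start[win];
    uint64_t off = 0;

    while (off < WINDOW_CLUSTERS) {
        const RleEntry *r = &map->entries[e++];

        for (uint16_t i = 0; i < r->run_length; i++, off++) {
            uint8_t *b = &slot->nibbles[off / 2];
            if (off & 1) {
                *b = (uint8_t)((*b & 0x0f) | (r->type << 4));
            } else {
                *b = (uint8_t)((*b & 0xf0) | (r->type & 0x0f));
            }
        }
    }
    slot->window_index = win;
    slot->valid = true;
}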
I think that could greatly speed things up and also allow us to always
perform range checks on data structures not kept in memory (inactive L1
and L2 tables). The only question remaining for me is whether that caching
is actually feasible, or whether a binary search in the range list (which
would then have to include the cluster index for each entry) would be faster
than generating bitmap windows, which might suffer from ping-pong effects.
Max