From: Kevin Wolf <kwolf@redhat.com>
To: Alberto Garcia <berto@igalia.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org,
Max Reitz <mreitz@redhat.com>, "Denis V. Lunev" <den@openvz.org>
Subject: Re: [Qemu-devel] RFC: Reducing the size of entries in the qcow2 L2 cache
Date: Wed, 20 Sep 2017 09:06:20 +0200
Message-ID: <20170920070620.GB4730@localhost.localdomain>
In-Reply-To: <87tvzyu3p4.fsf@igalia.com>
On 19.09.2017 17:07, Alberto Garcia wrote:
> Hi everyone,
>
> over the past few weeks I have been testing the effects of reducing
> the size of the entries in the qcow2 L2 cache. This was briefly
> mentioned by Denis in the same thread where we discussed subcluster
> allocation back in April, but I'll describe here the problem and the
> proposal in detail.
> [...]
Thanks for working on this, Berto! I think this is essential for large
cluster sizes and have been meaning to make a change like this for a
long time, but I never found the time for it.
> Some results from my tests (using an SSD drive and random 4K reads):
>
> |-----------+--------------+-------------------+---------------+--------------|
> | Disk size | Cluster size | L2 cache [covers] | Standard QEMU | Patched QEMU |
> |-----------+--------------+-------------------+---------------+--------------|
> | 16 GB     | 64 KB        | 1 MB [8 GB]       | 5000 IOPS     | 12700 IOPS   |
> | 2 TB      | 2 MB         | 4 MB [1 TB]       | 576 IOPS      | 11000 IOPS   |
> |-----------+--------------+-------------------+---------------+--------------|
>
> The improvements are clearly visible, but it's important to point out
> a couple of things:
>
> - L2 cache size is always < total L2 metadata on disk (otherwise
> this wouldn't make sense). Increasing the L2 cache size improves
> performance a lot (and makes the effect of these patches
> disappear), but it requires more RAM.
Do you have the numbers for the two cases above if the L2 cache
covered the whole image?
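For reference, the bracketed numbers in your table follow directly from
the qcow2 layout: each L2 entry is 8 bytes and maps one cluster, so a
cache of a given size covers cache_size / 8 * cluster_size bytes of
virtual disk. A minimal standalone sketch (not QEMU code):

    #include <inttypes.h>
    #include <stdio.h>

    /* Virtual disk area covered by an L2 cache: each 8-byte L2 entry
     * maps one cluster. */
    static uint64_t l2_cache_coverage(uint64_t cache_size,
                                      uint64_t cluster_size)
    {
        return cache_size / 8 * cluster_size;
    }

    int main(void)
    {
        /* The two configurations from the table above */
        printf("%" PRIu64 " GB\n",
               l2_cache_coverage(1 << 20, 64 << 10) >> 30); /* 8 */
        printf("%" PRIu64 " TB\n",
               l2_cache_coverage(4 << 20, 2 << 20) >> 40);  /* 1 */
        return 0;
    }

If I'm not miscalculating, covering the whole image would therefore
take 2 MB of cache for the 16 GB case and 8 MB for the 2 TB case.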
> - Doing random reads over the whole disk is probably not a very
> realistic scenario. During normal usage only certain areas of the
> disk need to be accessed, so performance should be much better
> with the same amount of cache.
> - I wrote a best-case scenario test (several I/O jobs each accessing
> a part of the disk that requires loading its own L2 table) and my
> patched version is 20x faster even with 64KB clusters.
I suppose you chose the scenario so that the number of jobs is larger
than the number of cached L2 tables without the patch, but smaller
than the number of cache entries with the patch?
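Back-of-the-envelope, assuming the 1 MB cache from your first row:
today that is 1 MB / 64 KB = 16 cached L2 tables, whereas with 4 KB
entries it would be 1 MB / 4 KB = 256 cache entries, so anything
between 17 and 256 parallel jobs should show the difference most
clearly.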
We will probably need to do some more benchmarking to find a good
default size for the cached chunks. 4 KB is nice and small, so we can
cover many parallel jobs without using too much memory. But with a
single sequential job, we may end up doing the metadata updates in
small 4 KB chunks instead of a single larger write.
Of course, if this ever becomes a problem (which may be unlikely), we
can always change the cache code to gather any adjacent dirty chunks
when writing something out. The same goes for readahead, if we can
find a sensible policy for when to evict entries that were read ahead.
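Just to sketch what I mean by gathering adjacent chunks (names and
structures invented here, not the actual qcow2 cache code):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define CHUNK_SIZE 4096

    /* Hypothetical cache entry; the real cache keeps more state. */
    typedef struct CacheChunk {
        uint64_t offset;    /* offset of this chunk in the image file */
        bool dirty;
    } CacheChunk;

    /*
     * Length of the run of dirty, file-contiguous chunks starting at
     * index i, assuming 'chunks' is sorted by offset.
     */
    static size_t dirty_run_length(const CacheChunk *chunks, size_t n,
                                   size_t i)
    {
        size_t len = 1;
        while (i + len < n && chunks[i + len].dirty &&
               chunks[i + len].offset ==
                   chunks[i].offset + len * CHUNK_SIZE) {
            len++;
        }
        return len;
    }

The writeback path could then issue a single write of len * CHUNK_SIZE
bytes instead of len separate 4 KB requests.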
> - We need a proper name for these sub-tables that we are loading
> now. I'm actually still struggling with this :-) I can't think of
> any name that is clear enough and not too cumbersome to use (L2
> subtables? => Confusing. L3 tables? => they're not really that).
L2 table chunk? Or just L2 cache entry?
> I think I haven't forgotten anything. As I said I have a working
> prototype of this and if you like the idea I'd like to publish it
> soon. Any questions or comments will be appreciated.
Please do post it!
Kevin