From: Max Reitz
Date: Tue, 25 Nov 2014 11:55:56 +0100
Subject: Re: [Qemu-devel] [PATCH v2 00/12] qcow2: Add new overlap check functions
Message-ID: <5474603C.2060606@redhat.com>
In-Reply-To: <1416844620-17717-1-git-send-email-mreitz@redhat.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-24 at 16:56, Max Reitz wrote:
> RAM usage
> =========
>
> So I have looked at my 2 GB image above, and the list uses 40 kB, which
> may or may not be too much (sounds completely fine to me for an image
> with 512 byte clusters); but it is at least a number I can use for
> testing the following theoretical inspection:

(I'm in the process of some kind of self-review right now.)

Wrong, it's 371 kB after patch 1 is fixed.

[snip]

> Let's test that for the above image, which has a disk size of 266 MB:

Except the disk size doesn't matter; the image was created with
preallocation=metadata, therefore all the metadata for a 2 GB virtual
image is there. Let's check the file length: 2190559232 bytes, about two
percent above 2 GB. Sounds reasonable.

For that file length, we actually have:

40 * 2190559232 / (512 * 512) = 326 kB

Hm, okay, so it doesn't work so well. The good news is: I know why.

In the calculation given here, I omitted the size of
Qcow2MetadataWindow; for every WINDOW_SIZE (= WS) clusters, there is
one such object. Let's include it in the calculation
(sizeof(Qcow2MetadataWindow) is 40 on my x64 system):

40 * IS / (CS * CS) + 40 * IS / (CS * WS)
  = 40 * IS / CS * (1 / CS + 1 / WS)

Okay, anything else I forgot? There is the Qcow2MetadataList object
itself, but we have it only once, so let's omit that. Then there is an
integer array with one entry per cache entry, and the cache itself;
qcow2_create_empty_metadata_list() limits the cache size so that the
integer array and the cached bitmaps will not exceed the given byte
size (currently 64 kB), so I'll just omit it as well (it's constant and
can easily be adjusted).

So, with the above term, we have:

40 * 2190559232 / 512 * (1 / 512 + 1 / 4096) = 367 kB

Much better.
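(For reference, here is that term as a tiny standalone C program, so the
number is easy to re-check. This is just a throwaway sketch for this
mail: estimate_bytes() is a made-up helper, and the two 40s are the byte
counts quoted above, not values read out of the QEMU source.)

#include <stdio.h>

/* 40 * IS / CS * (1 / CS + 1 / WS), split into its two terms: the
 * per-cluster bitmap memory from the original estimate, plus one
 * 40-byte Qcow2MetadataWindow object per WINDOW_SIZE clusters. */
static double estimate_bytes(double is, double cs, double ws)
{
    double bitmaps = 40.0 * is / (cs * cs);
    double windows = 40.0 * is / (cs * ws);
    return bitmaps + windows;
}

int main(void)
{
    /* File length of the 2 GB test image, 512 B clusters, WS = 4096 */
    printf("%.0f kB\n",
           estimate_bytes(2190559232.0, 512.0, 4096.0) / 1024.0);
    return 0;   /* prints 367 kB */
}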
> 40 * 266M / (512 * 512) = 42 kB
>
> Great! It works.
>
> So, now let's set CS to 64 kB, because that is basically the only
> cluster size we really care about. For a 1 TB image, we need 10 kB for
> the list. Sounds great to me. For a 1 PB image, we will need 10 MB.
> Fair enough. (Note that you don't need 10 MB of RAM to create a 1 PB
> image; you only need that once the disk size of the image has reached
> 1 PB.)
>
> And 1 TB with 512 byte clusters? 160 MB. Urgh, that is a lot. But then
> again, you can switch off the overlap check with overlap-check=off; and
> trying to actually use a 1 TB image with 512 byte clusters is crazy in
> itself (have you tried just creating one without preallocation? It
> takes forever). So I can live with that.

And with the fixed term:

1 TB / 64 kB: 170 kB
1 PB / 64 kB: 170 MB
1 TB / 512 B: 180 MB

The "problematic" 512 B cluster case actually doesn't get that much
worse (because 1 / 4096 < 1 / 512, whereas 1 / 4096 > 1 / 65536, which
is why fixing the term has a much bigger impact on larger cluster
sizes). But for the default of 64 kB, the size basically explodes.

We can now either choose to ignore that (17x is a lot, but needing more
than 1 MB only from about 6 TB upwards still sounds fine to me) or
increase WINDOW_SIZE (to a maximum of 65536, which would reduce the RAM
usage to 20 kB for a 1 TB image and 20 MB for a 1 PB image). The latter
would probably somewhat limit performance in the conversion case, but
since I haven't seen any issues for WINDOW_SIZE = 4096, I don't think
it should make a huge difference. As a side effect, we would then want
to increase the cache size, because with the current default of 64 kB
we would have only one cached bitmap; we probably want at least two,
maybe four if possible. But 256 kB does not sound too bad either.

> tl;dr
> =====
>
> * CPU usage at runtime decreased by 150 to 275 percent on
>   overlap-check-heavy tasks
> * No additional performance problems at loading time (in theory has the
>   same runtime complexity as a single overlap check right now; in
>   practice I could not find any problems)
> * Decent RAM usage (40 kB for a 1 TB image with 64 kB clusters; 40 MB
>   for a 1 PB image etc. pp.)

I'm not sure why I wrote 40 kB and 40 MB here; it was 10 kB and 10 MB.
Anyway, now it's 170 kB for a 1 TB image and 170 MB for a 1 PB image.

Max
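P.S.: The same throwaway sketch as above, extended to the configurations
discussed here, in case anyone wants to re-run the numbers
(estimate_bytes() is still made up for this mail, not actual QEMU code):

#include <stdio.h>

/* 40 * IS / CS * (1 / CS + 1 / WS) -- the fixed estimate from above */
static double estimate_bytes(double is, double cs, double ws)
{
    return 40.0 * is / cs * (1.0 / cs + 1.0 / ws);
}

int main(void)
{
    const double KB = 1024.0, MB = 1024.0 * KB, TB = MB * MB,
                 PB = 1024.0 * TB;

    printf("1 TB / 64 kB, WS 4096:  %.0f kB\n",
           estimate_bytes(TB, 64.0 * KB, 4096.0) / KB);  /* 170 kB */
    printf("1 PB / 64 kB, WS 4096:  %.0f MB\n",
           estimate_bytes(PB, 64.0 * KB, 4096.0) / MB);  /* 170 MB */
    printf("1 TB / 512 B, WS 4096:  %.0f MB\n",
           estimate_bytes(TB, 512.0, 4096.0) / MB);      /* 180 MB */
    printf("1 TB / 64 kB, WS 65536: %.0f kB\n",
           estimate_bytes(TB, 64.0 * KB, 65536.0) / KB); /* 20 kB */
    return 0;
}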