From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <546236CD.30301@redhat.com>
Date: Tue, 11 Nov 2014 17:18:21 +0100
From: Max Reitz <mreitz@redhat.com>
MIME-Version: 1.0
References: <1415627159-15941-1-git-send-email-mreitz@redhat.com>
	<1415627159-15941-6-git-send-email-mreitz@redhat.com>
	<54612A27.7000801@redhat.com> <5461C751.3080607@redhat.com>
	<5462359D.4040503@redhat.com>
In-Reply-To: <5462359D.4040503@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and
	qcow2_alloc_bytes()
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Peter Lieven <pl@kamp.de>, Stefan Hajnoczi <stefanha@redhat.com>

On 2014-11-11 at 17:13, Eric Blake wrote:
> On 11/11/2014 01:22 AM, Max Reitz wrote:
>> On 2014-11-10 at 22:12, Eric Blake wrote:
>>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>>> qcow2_alloc_bytes() may reuse a cluster multiple times, in which case
>>>> the refcount is increased accordingly. However, if this would lead to an
>>>> overflow the function should instead just not reuse this cluster and
>>>> allocate a new one.
>>> So if refcount_order is 1 (2 bits per refcount, max refcount of 4
>> *max refcount of 3 (0b11)
> Oh right, because 0 is special.  Although I think I figured that out...
>
>>> ), and
>>> we encounter the same cluster 6 times (say by 5 back-to-back internal
>>> snapshots), does this code optimize to only 2 clusters (both with
>>> refcount 3) or does it result in each of the last 3 clusters spilling to
> ...when talking about 3 shares of a cluster.
>
>>> its own 1-ref cluster for a total of 4 clusters?  Short of Benoit's work
>>> on deduplication, is there even a way to avoid inefficient use of
>>> spilled clusters?
>> I'm not sure what you're referring to; maybe I should add that
>> qcow2_alloc_bytes() is used for allocating compressed clusters (which
>> ideally don't take up a full host cluster), so "reuse" in this context
>> just means that several compressed clusters share one host cluster.
> No, I was thinking about internal snapshots rather than compressed
> clusters (although there's probably some overlap on what happens).
>
>> Maybe you're referring to the following situation: We have the default
>> cluster size of 64k. Now we're trying to allocate 16k for each of the
>> compressed clusters A, B, C and D. D won't fit into that cluster because
>> the maximum refcount is three, so it will be put into a newly allocated
>> host cluster. Finally, we're trying to allocate 32k for a compressed
>> cluster E, which will then be put into the same cluster as D. We
>> therefore have the following allocation (each sub-box representing 16k):
>>
>> +---+---+---+---+   +---+---+---+---+
>> | A | B | C |   |   | D |   E   |   |
>> +---+---+---+---+   +---+---+---+---+
>>
>> whereas the ideal allocation would be:
>>
>> +---+---+---+---+   +---+---+---+---+
>> | A | B |   E   |   | C | D |   |   |
>> +---+---+---+---+   +---+---+---+---+
>>
>> This is a problem, but I think first it's a minor one (just use a
>> sufficiently large refcount width if you're going to use compressed
>> clusters) and second it's about compressed clusters, whose performance I
>> could hardly care less about, frankly.
> No, I was envisioning that we have a brand new image with one cluster
> allocated (cluster 1 has refcount 1), then 5 times in a row we do
> 'savevm' to take an internal snapshot.  If I understand your code
> correctly, the first two snapshots increase the refcount, so cluster 1
> has a refcount of 3. Then the next snapshot can't increase the refcount,
> so it instead copies the contents to cluster 2.

No, it just errors out.

qcow2_alloc_bytes() is only used for allocating space for a compressed 
cluster. When taking a snapshot, update_refcount() will be called to 
increase the clusters' refcounts, and that function will simply return an 
error instead of letting the refcount overflow.

Max

> The fourth and fifth
> snapshots also see that cluster 1 is full, and allocate clusters 3 and 4;
> whereas a more efficient usage would increase the refcount of cluster 2
> instead of allocating.
>