From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43324)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XrSKx-0000bN-FG
	for qemu-devel@nongnu.org; Thu, 20 Nov 2014 09:04:01 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XrSKr-0004W9-4E
	for qemu-devel@nongnu.org; Thu, 20 Nov 2014 09:03:55 -0500
Received: from mx1.redhat.com ([209.132.183.28]:54376)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XrSKq-0004Vr-Sy
	for qemu-devel@nongnu.org; Thu, 20 Nov 2014 09:03:49 -0500
Message-ID: <546DF4BB.9040302@redhat.com>
Date: Thu, 20 Nov 2014 15:03:39 +0100
From: Max Reitz <mreitz@redhat.com>
MIME-Version: 1.0
References: <1415970374-16811-1-git-send-email-mreitz@redhat.com>	<1415970374-16811-22-git-send-email-mreitz@redhat.com>	<5467682E.1030604@redhat.com>
	<5469E4C0.8040509@redhat.com> <546BAB8E.3090200@redhat.com>
	<546C3022.9070701@redhat.com>
In-Reply-To: <546C3022.9070701@redhat.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different
 refcount widths
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Peter Lieven <pl@kamp.de>, Stefan Hajnoczi <stefanha@redhat.com>

On 2014-11-19 at 06:52, Eric Blake wrote:
> On 11/18/2014 01:26 PM, Eric Blake wrote:
>
>> Now, in response to your question about some other 3-pass inducing
>> pattern, let's think back to v1, where you questioned what would happen
>> if a hole in the reftable gets turned into data due to a later
>> allocation.  Let's see if I can come up with a scenario for that...
>>
>> Let's stick with a cluster size of 512, and use 32-bit and 64-bit widths
>> as our two sizes.  If we downsize from 64 to 32 bits, then every two
>> refblock clusters in the old table results in one call to operation()
>> for the new table; conversely, if we upsize, then every refblock cluster
>> in the old table gives two calls to operation() in the new table.  The
>> trick at hand is to come up with some image where we punch a hole so
>> that on the first pass, we call operation() with refblock_empty true for
>> one iteration (necessarily a call later than the first, since the image
>> header guarantees the first refblock is not empty), but where we have
>> data after the hole, where it is the later data that triggers the
>> allocation that will finally start to fill the hole.
>>
>> How about starting with an image that occupies between 1.5 and 2
>> refblocks worth of 32-width clusters (so an image anywhere between 193
>> and 256 clusters, or between 98816 and 131072 bytes).  You should be
>> able to figure out how many clusters this consumes for L1, L2, plus 1
>> for header, reftable, and 2 for refblocks, in order to figure out how
>> many remaining clusters are dedicated to data; ideally, the data
>> clusters are contiguous, and occupy a swath that covers at least
>> clusters 126 through 192.  Widening to 64-bit width will require 4
>> refblocks instead of 2, if all refblocks are needed.  But the whole idea
>> of punching a hole is that we don't need a refblock if it will be
>> all-zero entries.  So take this original image, and discard the data
>> clusters from physical index 126 through 192, (this is NOT the data
>> visible at guest offset 31744, but whatever actual offset of guest data
>> that maps to physical offset 31744).  The old reftable now looks like {
>> refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191
>> empty, 192-whatever occupied, tail empty] }.  With no allocations
>> required, this would in turn would map to the following new refblocks: {
>> refblock_n1 [0-64 occupied], refblock_n2 [65-125 occupied, 126-127
>> empty], NULL, refblock_n4 [192-whatever occupied] }.  Note that we do
>> not need to allocate refblock_n3 because of the hole in the old
>> refblock; we DO end up allocating three refblocks, but in the sequence
>> of things, refblock_n1 and refblock_n2 are allocated while we are
>> visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is
>> not allocated until after we have already passed over the first half of
>> refblock_o2.
>>
>> Thus, the second walk over the image will see that we need to allocate
>> refblock_n3 because it now contains entries (in particular, the entry
>> for refblock_n4, but also the 1-cluster entry for the proposed reftable
>> that is allocated between the walks).  So, while your test used the
>> allocation of the reftable as the spillover point, my scenario here uses
>> the allocation of later refblocks as the spillover point that got missed
>> during the first iteration.
>>
> Oops,...
>
>> which means the reftable now looks like { refblock1, NULL, refblock3,
>> NULL... }; and where refblock1 now has at least two free entries
>> (possibly three, if the just-freed refblock2 happened to live before
>> cluster 62).  is we can also free refblock2
>>
> ...forgot to delete these random thoughts that I typed up but no longer
> needed after reworking the above text.
>
> At any rate, I'm not certain we can come up with a four-pass scenario;
> if it is even possible, it would be quite complex.

[snip] (But rest assured, I read it all ;-))

> At this point, I've spent far too long writing this email.  I haven't
> completely ruled out the possibility of a corner case needing four
> passes through the do loop, but the image sizes required to get there
> are starting to be quite large compared to your simpler test of needing
> three passes through the do loop.

Right, see test 026. Without an SSD, it takes more than ten minutes, not 
least because it tests resizing the reftable which means writing a lot 
of data to an image with 512 byte clusters.

> I won't be bothered if we call it
> good, and quit trying to come up with any other "interesting" allocation
> sequencing.

The problem is, in my opinion, that we won't gain a whole lot from 
proving that there are cases where you need a fourth pass and test these 
cases. Fundamentally, they are not different from cases with three 
passes (technically, not even different from two pass cases). You scan 
through the refcounts, you detect that you need refblocks which you have 
not yet allocated, you allocate them, then you respin until all 
allocations are done. The only problem would be whether it'd be possible 
to run into an infinite loop: Can allocating new refblocks lead to a 
case where we have to allocate even more refblocks? Well, just judging 
from how complicated it is to even find a case where the number of new 
allocations hasn't gone down to zero in the fourth pass, we can safely 
rule that out.

Some people may ask why the walks are performed in a loop without a 
fixed limit (because they can't find cases where allocations haven't 
settled at the third pass). But I doubt that'll be a serious problem. 
It's much easier to have such a basically unlimited loop with the 
reasoning "We don't know exactly how many loops it'll take, but it will 
definitely settle at some point in time" than limiting the loop and then 
having to explain why we know exactly that it won't take more than X 
passes. The only problem with not limiting is that we need one walk to 
verify that all allocations have settled. But we need that for the 
common case (two passes) anyway, so that's not an issue.

The code from this version does not care whether it takes one, two, 
three, four or 42 passes. It's all the same. It will never take one and 
it will probably never take 42 passes; but if it does, well, it will 
work. Therefore, I think testing one non-standard number of passes 
(three) is enough. I'd like to test more, but the effort's just not 
worth it. I think.

Max