From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39074) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XrS5t-0004Xz-83 for qemu-devel@nongnu.org; Thu, 20 Nov 2014 08:48:27 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XrS5n-0006K8-0i for qemu-devel@nongnu.org; Thu, 20 Nov 2014 08:48:21 -0500 Received: from mx1.redhat.com ([209.132.183.28]:35602) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XrS5m-0006K0-Oh for qemu-devel@nongnu.org; Thu, 20 Nov 2014 08:48:14 -0500 Message-ID: <546DF113.70403@redhat.com> Date: Thu, 20 Nov 2014 14:48:03 +0100 From: Max Reitz MIME-Version: 1.0 References: <1415970374-16811-1-git-send-email-mreitz@redhat.com> <1415970374-16811-22-git-send-email-mreitz@redhat.com> <5467682E.1030604@redhat.com> <5469E4C0.8040509@redhat.com> <546BAB8E.3090200@redhat.com> In-Reply-To: <546BAB8E.3090200@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake , qemu-devel@nongnu.org Cc: Kevin Wolf , Peter Lieven , Stefan Hajnoczi On 2014-11-18 at 21:26, Eric Blake wrote: > On 11/17/2014 05:06 AM, Max Reitz wrote: > >>> Umm, that sounds backwards from what you document. It's a good test = of >>> the _new_ reftable needing a second round of allocations. So keep it >>> with corrected comments. But I think you _intended_ to write a test >>> that starts with a refcount_width=3D64 image and resize to a >>> refcount_width=3D1, where the _old_ reftable then suffers a reallocat= ion >>> as part of allocating refblocks for the new table. It may even help = if >>> you add a tracepoint for every iteration through the walk function >>> callback, to prove we are indeed executing it 3 times instead of the >>> usual 2, for these test cases. >> I'm currently thinking about a way to test the old reftable reallocati= on >> issue, and I can't find any. So, for the old reftable to require a >> reallocation it must grow. For it to grow we need some allocation beyo= nd >> what it can currently represent. For this to happen during the refbloc= k >> allocation walk, this allocation must be the allocation of a new refbl= ock. >> >> If the refblock is allocated beyond the current reftable's limit, this >> means that either all clusters between free_cluster_index and that poi= nt >> are already taken. If the reftable is then reallocated, it will >> therefore *always* be allocated behind that refblock, which is beyond >> its old limit. Therefore, that walk through the old reftable will neve= r >> miss that new allocation. >> >> So the issue can only occur if the old reftable is resized after the >> walk through it, that is, when allocating the new reftable. That is >> indeed an issue but I think it manifests itself basically like the iss= ue >> I'm testing here: There is now an area in the old refcount structures >> which was free before but has is used now, and the allocation causing >> that was the allocation of the new reftable. The only difference is >> whether the it's the old or the new reftable that resides in the >> previously free area. Thus, I think I'll leave it at this test =E2=80=93= but if >> you can describe to me how to create an image for a different "rewalk" >> path, I'm all ears. > =3D=3D=3D=3D=3D > The test you wrote does: > > original image, pre-walk: > reftable is one cluster; with one refblock and 63 zero entries > that refblock holds 4096 width-1 refcounts; of those, the first 63 ar= e > non-zero, the remaining are zero. Image is 32256 bytes long > > During the first walk, we call operation() 64 times - the first time > with refblock_empty false, the remaining 63 times with refblock_empty t= rue. > > after first walk but before reftable allocation, we have allocated one > refblock that holds 64 width-64 refcounts (all zero, because we don't > populate them until the final walk); and the old table now has 64 > refcounts populated. Image is 32768 bytes long. > > Then we allocating a new reftable; so far, we only created one refblock > for it to hold, so one cluster is sufficient. The allocation causes the > old table to now have 65 refcounts populated. Image is now 33280 bytes = long. > > On the second pass, we call operation() 64 times; now the first two > walks have refblock_empty as false, which means we allocate a new > refblock. This allocation causes the old table to now have 66 refcount= s > populated. Image is now 33792 bytes long. > > So we free our first attempt at a new reftable, and allocate another (a > single cluster is still sufficient to hold two refblocks); I'm not sure > whether this free/realloc will reuse cluster 65 or if it will pick up > cluster 67 and leave a hole in 65. [I guess it depends on whether > cluster allocation is done by first-fit analysis or whether it blindly > favors allocating at the end of the image]. There is a free_cluster_index to speed up finding the first fit. It's=20 reset when freeing clusters before that index, therefore cluster 65=20 should be reused. > Either way, we have to do a > third iteration, because the second iteration allocated a refblock and > "reallocated" a reftable. > > On the third pass, operation() is still called 64 times, but because th= e > only two calls with refblock_empty as false already have an allocated > refblock, no further allocations are needed, and we are done with the d= o > loop; the fourth walk can set refcounts. > > =3D=3D=3D=3D=3D > The test I thought you were writing would start > > original image, pre-walk: > reftable is one cluster; with one refblock and 63 zero entries > that refblock holds 64 width-64 refcounts; of those, the first 63 are > non-zero, the remaining are zero. Image is 32256 bytes long > > During the first walk, we call operation() 1 time, with refblock_empty > false. > > after first walk but before reftable allocation, we have allocated one > refblock that holds 4096 width-1 refcounts (all zero, because we don't > populate them until the final walk); and the old table now has 64 > refcounts populated. Image is 32768 bytes long. > > Then we allocating a new reftable; so far, we only created one refblock > for it to hold, so one cluster is sufficient. The allocation causes the > old table to now have 66 refcounts populated (one for the new refblock, > but also one for an additional refblock in the old table because the > first refblock was full). Image is now 33792 bytes long. > > On the second pass, we call operation() 1 time with refblock_empty as > false, so we don't need any allocation. > > Which means the test you wrote is correct, while my idea does NOT > trigger the third walk, at least not for the initial file size of 32256. > You've been vindicated, you did it correctly :) > > > =3D=3D=3D=3D=3D > Now, in response to your question about some other 3-pass inducing > pattern, let's think back to v1, where you questioned what would happen > if a hole in the reftable gets turned into data due to a later > allocation. Let's see if I can come up with a scenario for that... > > Let's stick with a cluster size of 512, and use 32-bit and 64-bit width= s > as our two sizes. If we downsize from 64 to 32 bits, then every two > refblock clusters in the old table results in one call to operation() > for the new table; conversely, if we upsize, then every refblock cluste= r > in the old table gives two calls to operation() in the new table. The > trick at hand is to come up with some image where we punch a hole so > that on the first pass, we call operation() with refblock_empty true fo= r > one iteration (necessarily a call later than the first, since the image > header guarantees the first refblock is not empty), but where we have > data after the hole, where it is the later data that triggers the > allocation that will finally start to fill the hole. > > How about starting with an image that occupies between 1.5 and 2 > refblocks worth of 32-width clusters (so an image anywhere between 193 > and 256 clusters, or between 98816 and 131072 bytes). You should be > able to figure out how many clusters this consumes for L1, L2, plus 1 > for header, reftable, and 2 for refblocks, in order to figure out how > many remaining clusters are dedicated to data; ideally, the data > clusters are contiguous, and occupy a swath that covers at least > clusters 126 through 192. Widening to 64-bit width will require 4 > refblocks instead of 2, if all refblocks are needed. But the whole ide= a > of punching a hole is that we don't need a refblock if it will be > all-zero entries. So take this original image, and discard the data > clusters from physical index 126 through 192, (this is NOT the data > visible at guest offset 31744, but whatever actual offset of guest data > that maps to physical offset 31744). The old reftable now looks like { > refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191 > empty, 192-whatever occupied, tail empty] }. With no allocations > required, this would in turn would map to the following new refblocks: = { > refblock_n1 [0-64 occupied], refblock_n2 [65-125 occupied, 126-127 > empty], NULL, refblock_n4 [192-whatever occupied] }. Note that we do > not need to allocate refblock_n3 because of the hole in the old > refblock; we DO end up allocating three refblocks, but in the sequence > of things, refblock_n1 and refblock_n2 are allocated while we are > visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is > not allocated until after we have already passed over the first half of > refblock_o2. > > Thus, the second walk over the image will see that we need to allocate > refblock_n3 because it now contains entries (in particular, the entry > for refblock_n4, but also the 1-cluster entry for the proposed reftable > that is allocated between the walks). So, while your test used the > allocation of the reftable as the spillover point, my scenario here use= s > the allocation of later refblocks as the spillover point that got misse= d > during the first iteration. Sounds good, the only problem is that I'd have to hand-craft the image=20 myself, because qemu generally uses self-references for refblocks (when=20 allocating new refblocks, they will contain their own refcount). I think this already would be too much effort (I'll reply to your second=20 email right away ;-)). There is no fundamental difference in how new=20 allocations for the new reftable and for the new refblocks are treated:=20 If there's a new allocation, respin. If that works for the new reftable,=20 that's enough to convince me it will work for new refblocks as well. Max