From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51265) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XqyCB-0005JI-7r for qemu-devel@nongnu.org; Wed, 19 Nov 2014 00:52:56 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XqyC6-0000wc-3i for qemu-devel@nongnu.org; Wed, 19 Nov 2014 00:52:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:51216) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XqyC5-0000wY-Pi for qemu-devel@nongnu.org; Wed, 19 Nov 2014 00:52:46 -0500 Message-ID: <546C3022.9070701@redhat.com> Date: Tue, 18 Nov 2014 22:52:34 -0700 From: Eric Blake MIME-Version: 1.0 References: <1415970374-16811-1-git-send-email-mreitz@redhat.com> <1415970374-16811-22-git-send-email-mreitz@redhat.com> <5467682E.1030604@redhat.com> <5469E4C0.8040509@redhat.com> <546BAB8E.3090200@redhat.com> In-Reply-To: <546BAB8E.3090200@redhat.com> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="ivE7BT3ckQkLNe03DSOnHPNScuhvDXAvS" Subject: Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz , qemu-devel@nongnu.org Cc: Kevin Wolf , Peter Lieven , Stefan Hajnoczi This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --ivE7BT3ckQkLNe03DSOnHPNScuhvDXAvS Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 11/18/2014 01:26 PM, Eric Blake wrote: > Now, in response to your question about some other 3-pass inducing > pattern, let's think back to v1, where you questioned what would happen= > if a hole in the reftable gets turned into data due to a later > allocation. Let's see if I can come up with a scenario for that... >=20 > Let's stick with a cluster size of 512, and use 32-bit and 64-bit width= s > as our two sizes. If we downsize from 64 to 32 bits, then every two > refblock clusters in the old table results in one call to operation() > for the new table; conversely, if we upsize, then every refblock cluste= r > in the old table gives two calls to operation() in the new table. The > trick at hand is to come up with some image where we punch a hole so > that on the first pass, we call operation() with refblock_empty true fo= r > one iteration (necessarily a call later than the first, since the image= > header guarantees the first refblock is not empty), but where we have > data after the hole, where it is the later data that triggers the > allocation that will finally start to fill the hole. >=20 > How about starting with an image that occupies between 1.5 and 2 > refblocks worth of 32-width clusters (so an image anywhere between 193 > and 256 clusters, or between 98816 and 131072 bytes). You should be > able to figure out how many clusters this consumes for L1, L2, plus 1 > for header, reftable, and 2 for refblocks, in order to figure out how > many remaining clusters are dedicated to data; ideally, the data > clusters are contiguous, and occupy a swath that covers at least > clusters 126 through 192. Widening to 64-bit width will require 4 > refblocks instead of 2, if all refblocks are needed. But the whole ide= a > of punching a hole is that we don't need a refblock if it will be > all-zero entries. So take this original image, and discard the data > clusters from physical index 126 through 192, (this is NOT the data > visible at guest offset 31744, but whatever actual offset of guest data= > that maps to physical offset 31744). The old reftable now looks like {= > refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191 > empty, 192-whatever occupied, tail empty] }. With no allocations > required, this would in turn would map to the following new refblocks: = { > refblock_n1 [0-64 occupied], refblock_n2 [65-125 occupied, 126-127 > empty], NULL, refblock_n4 [192-whatever occupied] }. Note that we do > not need to allocate refblock_n3 because of the hole in the old > refblock; we DO end up allocating three refblocks, but in the sequence > of things, refblock_n1 and refblock_n2 are allocated while we are > visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is= > not allocated until after we have already passed over the first half of= > refblock_o2. >=20 > Thus, the second walk over the image will see that we need to allocate > refblock_n3 because it now contains entries (in particular, the entry > for refblock_n4, but also the 1-cluster entry for the proposed reftable= > that is allocated between the walks). So, while your test used the > allocation of the reftable as the spillover point, my scenario here use= s > the allocation of later refblocks as the spillover point that got misse= d > during the first iteration. >=20 Oops,... >=20 > which means the reftable now looks like { refblock1, NULL, refblock3, > NULL... }; and where refblock1 now has at least two free entries > (possibly three, if the just-freed refblock2 happened to live before > cluster 62). is we can also free refblock2 >=20 =2E..forgot to delete these random thoughts that I typed up but no longer= needed after reworking the above text. At any rate, I'm not certain we can come up with a four-pass scenario; if it is even possible, it would be quite complex. Back in v1, I questioned what would happen with a completely full reftable, where the mere allocation of the new reftable causes a spillover not only in the size of the new reftable, but also to the old. Let's try to find a scenario where the reftable does NOT spill on the first pass, but DOES spill on the second. A 32-bit width image spills from 1 to 2 clusters for the reftable at the boundary of 64->65 refblocks (8192->8193 clusters); at that same cluster boundary, the reftable for a 64-bit width table spills from 2 to 3 clusters. Furthermore, a 64-bit width refblock misses an allocation on the first pass if there is a hole of 64 aligned clusters. First try: We want an image that has roughly 128 free clusters on the first pass, including a hole of at least 64 aligned clusters, to where a second pass is needed to cover the refblock that was treated as a hole on the first pass. Furthermore, we want the second pass to completely fill the image, so that the reftable allocation of 2 clusters after the walk is the trigger of the spill, then the third pass will allocate another refblock because of the spill, and a fourth pass is required to ensure no allocations before the final pass of assigning refcounts. Start with a 32-bit width image with clusters 0-125 allocated, 126-191 empty, then 192-8127 allocated (then 8128-8191 are unallocated to pad out to cluster boundary) - describing a file of size 4161536. This image has 130 clusters spare before it spills, with a hole of 66 clusters near the front, and a tail of 64 clusters; since there is no block of 128 aligned unallocated clusters, all 64 refblocks are in use. Now widen this image to 64-bit width refcounts. On the first walk, we allocate two refblocks for clusters 0-127 (using free slots 126 and 127), then leave an unallocated refblock for clusters 128-191, then allocate 124 more refblocks for clusters 192-8127 (using free slots 128-191, as well as 8128-8187); this walk also picks up a refblock for clusters 8128-8191 (using free slot 8188) because that area of the file is allocated before the walk reaches that point, even if it was a large enough hole to not need a refblock at the beginning of the walk. So we end the walk with 3 free clusters, and proceed to allocate a 2-cluster reftable for the 127 refblocks that we created (using free slots 8189-8190). On the second walk, we allocate a refblock for clusters 128-191 (slot 8191). At the end of the walk, we free the 2-cluster reftable, then reallocate; but as we still only need 2 clusters, we reuse slots 8189-8190. Bummer - a third pass doesn't need allocation. Second try: Tweak the input by one cluster: 0-125 allocated, 126-191 empty, 192-8128 allocated (8129-8191 unallocated); file size 4162048. 129 spare clusters, with hole of 66 and tail of 63. On the first walk, we allocate 127 refblocks (ending with the use of free slots 8129-8189), then allocate a 2-cluster reftable (free slots 8190-8191). On the second walk, we allocate a refblock for clusters 128-191 (slot 8192 - which triggers a spill of the old reftable), and a refblock for clusters 8192-8255 (since we already spilled). After the walk, we now allocate a 3-cluster reftable, but it fits nicely within 8192-8255. Bummer - a third pass still doesn't need an allocation. Third try: Tweak the input by an entire reftable cluster. Let's start with a 32-bit image with a 2-cluster reftable, and try for the spill of a 64-bit image from 3->4 clusters (happens at the boundary from cluster 12287->12288). We want around 192 free clusters, plus the space for the reftable, and still want a hole of at least 64 aligned clusters to trigger the second pass as the one that fills to the spilling point. So the image has clusters 0-125 allocated, 126-191 empty, then 192-12157 allocated (then 12158-16383 are unallocated to pad out to cluster boundary) - describing a file of size 6224896. This image has 196 clusters spare before it spills the 64-bit reftable, with a hole of 66 clusters near the front, and a tail of 130 clusters. On the first walk, we allocate two refblocks for clusters 0-127 (using free slots 126 and 127), then leave an unallocated refblock for clusters 128-191, then allocate 187 more refblocks for clusters 192-12157 (using free slots 128-191, as well as 12158-12280); this walk also triggers a refblock allocation for the 32-bit table (slot 12281 for clusters 12160-12287) and 2 refblocks for the 64-bit table (slots 12282-12283 for clusters 12158-12287) because that area of the file is allocated before the walk reaches that point. So we end the walk with 4 free clusters, and proceed to allocate a 3-cluster reftable for the 191 refblocks that we created (using free slots 12284-12286). On the second walk, we allocate a refblock for clusters 128-191 (slot 12287). At the end of the walk, we free the 3-cluster reftable, then reallocate; but as we still only need 3 clusters, we reuse slots 12284-12286. Bummer - a third pass doesn't need allocation. I'm sensing a pattern here - trying to go for reftable spills isn't going to help - by the second pass, we already have a reftable reservation (which we free and then reallocate), and the only time we'd need a larger reftable after the second walk is if we encountered extra refblocks during the walk, but the only way to get extra refblocks during the walk is to spill the old table during the second walk, but in that case the second walk picks up the spill. Fourth try: Even trying to play with larger files with larger holes won't easily help. Suppose I have a file that has 0-62 allocated, then 63-4159 unallocated, followed by 262144 clusters allocated (an image over 128M in size). On the first walk, we allocate a refblock (slot 63) for clusters 0-63, then pass over 64 refblocks that don't need allocation, then allocate 4096 refblocks (slots 64-4159). Not even considering the slots that are added in the old table, this means the second pass will allocate an additional 64 refblocks. But this allocation will NOT spill the size of the reftable, because the reftable is already sized large enough to cover the refblocks for the 128M tail of the file. What if we have a larger hole, and also tweak the input file so that there are no missing refblocks of the old table (the holes are only possible in the new table). That is, start with 0-62 allocated, 63-127 free, then a repeating pattern of 64 allocated clusters and 64 free, over the course of 130 repetitions. Follow that with 262144 allocated clusters. This means we have 65 + 130*64 =3D=3D 8385 free clusters, to dedicate solely to the new reftable; while the image occupies (131*128)+262144 =3D=3D 278912 clusters (requires 4358 refblock entries, which in turn requires 69 contiguous clusters for the reftable). On the first walk, we alternate between allocating a refblock and leaving a hole for a repetition of 131 times, then allocate 4096 refblocks. Thus, we have consumed 4227 allocations out of the 8385 free clusters; where the old table is now using the space of 67 of the allocation holes. Furthermore, the reftable allocation is too large to fit in any of the holes, so it gets appended at the end of the image (the image is now 278981 clusters, requiring 4360 refblock entries, but still only 69 clusters for the reftable). On the second walk, we allocate 67 more refblocks for the refcounts in the old table that we first treated as holes, plus 2 more refblocks for the tail of the file holding the new reftable; these allocations still fit in the holes, and are sufficient to fill up another hole. However, the question remains - does the hole in the old table get filled before or after the new table has already passed by the hole? If it is after, a third iteration would allocate yet another refblock or two; but if it is before, then the 2nd walk will already allocate refblocks when it encounters that spot in the old reftable. Remember, on the first iteration, we allocated a refblock for 0-63 (slot 63), for 128-191 (slot 64), and so on - the reason the second pass allocates a refblock for 64-127 is because the first pass didn't populate that area of the file until it was already beyond cluster 127. But on the second pass, the first refblock we allocate for clusters 64-127 will live somewhere around slot 8576, so it will be seen during the second walk. Even the refblock for clusters 8512-8575 will end up being allocated somewhere around slot 8640, which still gets visited during the second walk. You may still be able to tweak things to the point that we trigger an allocating third walk, but I'm not readily seeing what that tweak would be. At this point, I've spent far too long writing this email. I haven't completely ruled out the possibility of a corner case needing four passes through the do loop, but the image sizes required to get there are starting to be quite large compared to your simpler test of needing three passes through the do loop. I won't be bothered if we call it good, and quit trying to come up with any other "interesting" allocation sequencing. --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --ivE7BT3ckQkLNe03DSOnHPNScuhvDXAvS Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg iQEcBAEBCAAGBQJUbDAiAAoJEKeha0olJ0Nql98IAIGvRXHBuI8dSQDOxBZC0r56 aCQVji7utGcvpKJECmhGWN2UnA7UfAmyt7jZzAbiqRnNErIUJd9mfgs9Gl5MeQMI u4pUWu6/8Zg5ZNe2qZKrdS7+mOVNhm/Df2cyByuJHBzEBrFIGoyNHR5ojkbRqIBJ SWGNQBK/XPBoH9je4ap+jCeVg/Llw6xrUQN8gVPiyhzsQRXgVvruYYv+4OzAK5Fu 3G1CwKhHsWFYwU5tJ1cK+nA3I2KJ1oaqamWmtnShCqkQbiTW2NNIsckcG8qckMnf /wbF5gycyYauxMgtNxvbJ0n54ecFld/zJZLJolxzaziFQ9E0e7IGOLdnvKw1kYk= =kFZv -----END PGP SIGNATURE----- --ivE7BT3ckQkLNe03DSOnHPNScuhvDXAvS--