Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Peter Lieven <pl@kamp.de>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
Date: Thu, 20 Nov 2014 14:48:03 +0100	[thread overview]
Message-ID: <546DF113.70403@redhat.com> (raw)
In-Reply-To: <546BAB8E.3090200@redhat.com>

On 2014-11-18 at 21:26, Eric Blake wrote:
> On 11/17/2014 05:06 AM, Max Reitz wrote:
>
>>> Umm, that sounds backwards from what you document.  It's a good test of
>>> the _new_ reftable needing a second round of allocations.  So keep it
>>> with corrected comments.  But I think you _intended_ to write a test
>>> that starts with a refcount_width=64 image and resize to a
>>> refcount_width=1, where the _old_ reftable then suffers a reallocation
>>> as part of allocating refblocks for the new table.  It may even help if
>>> you add a tracepoint for every iteration through the walk function
>>> callback, to prove we are indeed executing it 3 times instead of the
>>> usual 2, for these test cases.
>> I'm currently thinking about a way to test the old reftable reallocation
>> issue, and I can't find any. So, for the old reftable to require a
>> reallocation it must grow. For it to grow we need some allocation beyond
>> what it can currently represent. For this to happen during the refblock
>> allocation walk, this allocation must be the allocation of a new refblock.
>>
>> If the refblock is allocated beyond the current reftable's limit, this
>> means that either all clusters between free_cluster_index and that point
>> are already taken. If the reftable is then reallocated, it will
>> therefore *always* be allocated behind that refblock, which is beyond
>> its old limit. Therefore, that walk through the old reftable will never
>> miss that new allocation.
>>
>> So the issue can only occur if the old reftable is resized after the
>> walk through it, that is, when allocating the new reftable. That is
>> indeed an issue but I think it manifests itself basically like the issue
>> I'm testing here: There is now an area in the old refcount structures
>> which was free before but has is used now, and the allocation causing
>> that was the allocation of the new reftable. The only difference is
>> whether the it's the old or the new reftable that resides in the
>> previously free area. Thus, I think I'll leave it at this test – but if
>> you can describe to me how to create an image for a different "rewalk"
>> path, I'm all ears.
> =====
> The test you wrote does:
>
> original image, pre-walk:
> reftable is one cluster; with one refblock and 63 zero entries
>   that refblock holds 4096 width-1 refcounts; of those, the first 63 are
> non-zero, the remaining are zero. Image is 32256 bytes long
>
> During the first walk, we call operation() 64 times - the first time
> with refblock_empty false, the remaining 63 times with refblock_empty true.
>
> after first walk but before reftable allocation, we have allocated one
> refblock that holds 64 width-64 refcounts (all zero, because we don't
> populate them until the final walk); and the old table now has 64
> refcounts populated. Image is 32768 bytes long.
>
> Then we allocating a new reftable; so far, we only created one refblock
> for it to hold, so one cluster is sufficient. The allocation causes the
> old table to now have 65 refcounts populated. Image is now 33280 bytes long.
>
> On the second pass, we call operation() 64 times; now the first two
> walks have refblock_empty as false, which means we allocate a new
> refblock.  This allocation causes the old table to now have 66 refcounts
> populated. Image is now 33792 bytes long.
>
> So we free our first attempt at a new reftable, and allocate another (a
> single cluster is still sufficient to hold two refblocks); I'm not sure
> whether this free/realloc will reuse cluster 65 or if it will pick up
> cluster 67 and leave a hole in 65.  [I guess it depends on whether
> cluster allocation is done by first-fit analysis or whether it blindly
> favors allocating at the end of the image].

There is a free_cluster_index to speed up finding the first fit. It's 
reset when freeing clusters before that index, therefore cluster 65 
should be reused.

> Either way, we have to do a
> third iteration, because the second iteration allocated a refblock and
> "reallocated" a reftable.
>
> On the third pass, operation() is still called 64 times, but because the
> only two calls with refblock_empty as false already have an allocated
> refblock, no further allocations are needed, and we are done with the do
> loop; the fourth walk can set refcounts.
>
> =====
> The test I thought you were writing would start
>
> original image, pre-walk:
> reftable is one cluster; with one refblock and 63 zero entries
>   that refblock holds 64 width-64 refcounts; of those, the first 63 are
> non-zero, the remaining are zero. Image is 32256 bytes long
>
> During the first walk, we call operation() 1 time, with refblock_empty
> false.
>
> after first walk but before reftable allocation, we have allocated one
> refblock that holds 4096 width-1 refcounts (all zero, because we don't
> populate them until the final walk); and the old table now has 64
> refcounts populated. Image is 32768 bytes long.
>
> Then we allocating a new reftable; so far, we only created one refblock
> for it to hold, so one cluster is sufficient. The allocation causes the
> old table to now have 66 refcounts populated (one for the new refblock,
> but also one for an additional refblock in the old table because the
> first refblock was full). Image is now 33792 bytes long.
>
> On the second pass, we call operation() 1 time with refblock_empty as
> false, so we don't need any allocation.
>
> Which means the test you wrote is correct, while my idea does NOT
> trigger the third walk, at least not for the initial file size of 32256.
>   You've been vindicated, you did it correctly :)
>
>
> =====
> Now, in response to your question about some other 3-pass inducing
> pattern, let's think back to v1, where you questioned what would happen
> if a hole in the reftable gets turned into data due to a later
> allocation.  Let's see if I can come up with a scenario for that...
>
> Let's stick with a cluster size of 512, and use 32-bit and 64-bit widths
> as our two sizes.  If we downsize from 64 to 32 bits, then every two
> refblock clusters in the old table results in one call to operation()
> for the new table; conversely, if we upsize, then every refblock cluster
> in the old table gives two calls to operation() in the new table.  The
> trick at hand is to come up with some image where we punch a hole so
> that on the first pass, we call operation() with refblock_empty true for
> one iteration (necessarily a call later than the first, since the image
> header guarantees the first refblock is not empty), but where we have
> data after the hole, where it is the later data that triggers the
> allocation that will finally start to fill the hole.
>
> How about starting with an image that occupies between 1.5 and 2
> refblocks worth of 32-width clusters (so an image anywhere between 193
> and 256 clusters, or between 98816 and 131072 bytes).  You should be
> able to figure out how many clusters this consumes for L1, L2, plus 1
> for header, reftable, and 2 for refblocks, in order to figure out how
> many remaining clusters are dedicated to data; ideally, the data
> clusters are contiguous, and occupy a swath that covers at least
> clusters 126 through 192.  Widening to 64-bit width will require 4
> refblocks instead of 2, if all refblocks are needed.  But the whole idea
> of punching a hole is that we don't need a refblock if it will be
> all-zero entries.  So take this original image, and discard the data
> clusters from physical index 126 through 192, (this is NOT the data
> visible at guest offset 31744, but whatever actual offset of guest data
> that maps to physical offset 31744).  The old reftable now looks like {
> refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191
> empty, 192-whatever occupied, tail empty] }.  With no allocations
> required, this would in turn would map to the following new refblocks: {
> refblock_n1 [0-64 occupied], refblock_n2 [65-125 occupied, 126-127
> empty], NULL, refblock_n4 [192-whatever occupied] }.  Note that we do
> not need to allocate refblock_n3 because of the hole in the old
> refblock; we DO end up allocating three refblocks, but in the sequence
> of things, refblock_n1 and refblock_n2 are allocated while we are
> visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is
> not allocated until after we have already passed over the first half of
> refblock_o2.
>
> Thus, the second walk over the image will see that we need to allocate
> refblock_n3 because it now contains entries (in particular, the entry
> for refblock_n4, but also the 1-cluster entry for the proposed reftable
> that is allocated between the walks).  So, while your test used the
> allocation of the reftable as the spillover point, my scenario here uses
> the allocation of later refblocks as the spillover point that got missed
> during the first iteration.

Sounds good, the only problem is that I'd have to hand-craft the image 
myself, because qemu generally uses self-references for refblocks (when 
allocating new refblocks, they will contain their own refcount).

I think this already would be too much effort (I'll reply to your second 
email right away ;-)). There is no fundamental difference in how new 
allocations for the new reftable and for the new refblocks are treated: 
If there's a new allocation, respin. If that works for the new reftable, 
that's enough to convince me it will work for new refblocks as well.

Max

next prev parent reply	other threads:[~2014-11-20 13:48 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
2014-11-15 16:00   ` Eric Blake
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 03/21] qcow2: Use 64 bits for refcount values Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation Max Reitz
2014-11-15 16:50   ` Eric Blake
2014-11-17  8:37     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification Max Reitz
2014-11-15 17:02   ` Eric Blake
2014-11-17  8:42     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers " Max Reitz
2014-11-15 17:08   ` Eric Blake
2014-11-17  8:44     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4 Max Reitz
2014-11-15 17:09   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
2014-11-15 17:13   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 11/21] iotests: Prepare for refcount_width option Max Reitz
2014-11-15 17:17   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 13/21] block: Add opaque value to the amend CB Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 17/21] qcow2: Use intermediate helper CB " Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment Max Reitz
2014-11-18 17:55   ` Eric Blake
2014-11-18 18:58     ` Max Reitz
2014-11-18 19:56       ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 19/21] qcow2: Invoke refcount order amendment function Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 20/21] qcow2: Point to amend function in check Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths Max Reitz
2014-11-15 14:50   ` Eric Blake
2014-11-17  8:34     ` Max Reitz
2014-11-17 10:38       ` Max Reitz
2014-11-17 11:02         ` Max Reitz
2014-11-17 12:06     ` Max Reitz
2014-11-18 20:26       ` Eric Blake
2014-11-19  5:52         ` Eric Blake
2014-11-20 14:03           ` Max Reitz
2014-11-20 21:21             ` Eric Blake
2014-11-20 13:48         ` Max Reitz [this message]
2014-11-20 21:27           ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=546DF113.70403@redhat.com \
    --to=mreitz@redhat.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).