All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Peter Lieven <pl@kamp.de>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths
Date: Thu, 20 Nov 2014 14:48:03 +0100	[thread overview]
Message-ID: <546DF113.70403@redhat.com> (raw)
In-Reply-To: <546BAB8E.3090200@redhat.com>

On 2014-11-18 at 21:26, Eric Blake wrote:
> On 11/17/2014 05:06 AM, Max Reitz wrote:
>
>>> Umm, that sounds backwards from what you document.  It's a good test of
>>> the _new_ reftable needing a second round of allocations.  So keep it
>>> with corrected comments.  But I think you _intended_ to write a test
>>> that starts with a refcount_width=64 image and resize to a
>>> refcount_width=1, where the _old_ reftable then suffers a reallocation
>>> as part of allocating refblocks for the new table.  It may even help if
>>> you add a tracepoint for every iteration through the walk function
>>> callback, to prove we are indeed executing it 3 times instead of the
>>> usual 2, for these test cases.
>> I'm currently thinking about a way to test the old reftable reallocation
>> issue, and I can't find any. So, for the old reftable to require a
>> reallocation it must grow. For it to grow we need some allocation beyond
>> what it can currently represent. For this to happen during the refblock
>> allocation walk, this allocation must be the allocation of a new refblock.
>>
>> If the refblock is allocated beyond the current reftable's limit, this
>> means that either all clusters between free_cluster_index and that point
>> are already taken. If the reftable is then reallocated, it will
>> therefore *always* be allocated behind that refblock, which is beyond
>> its old limit. Therefore, that walk through the old reftable will never
>> miss that new allocation.
>>
>> So the issue can only occur if the old reftable is resized after the
>> walk through it, that is, when allocating the new reftable. That is
>> indeed an issue but I think it manifests itself basically like the issue
>> I'm testing here: There is now an area in the old refcount structures
>> which was free before but has is used now, and the allocation causing
>> that was the allocation of the new reftable. The only difference is
>> whether the it's the old or the new reftable that resides in the
>> previously free area. Thus, I think I'll leave it at this test – but if
>> you can describe to me how to create an image for a different "rewalk"
>> path, I'm all ears.
> =====
> The test you wrote does:
>
> original image, pre-walk:
> reftable is one cluster; with one refblock and 63 zero entries
>   that refblock holds 4096 width-1 refcounts; of those, the first 63 are
> non-zero, the remaining are zero. Image is 32256 bytes long
>
> During the first walk, we call operation() 64 times - the first time
> with refblock_empty false, the remaining 63 times with refblock_empty true.
>
> after first walk but before reftable allocation, we have allocated one
> refblock that holds 64 width-64 refcounts (all zero, because we don't
> populate them until the final walk); and the old table now has 64
> refcounts populated. Image is 32768 bytes long.
>
> Then we allocating a new reftable; so far, we only created one refblock
> for it to hold, so one cluster is sufficient. The allocation causes the
> old table to now have 65 refcounts populated. Image is now 33280 bytes long.
>
> On the second pass, we call operation() 64 times; now the first two
> walks have refblock_empty as false, which means we allocate a new
> refblock.  This allocation causes the old table to now have 66 refcounts
> populated. Image is now 33792 bytes long.
>
> So we free our first attempt at a new reftable, and allocate another (a
> single cluster is still sufficient to hold two refblocks); I'm not sure
> whether this free/realloc will reuse cluster 65 or if it will pick up
> cluster 67 and leave a hole in 65.  [I guess it depends on whether
> cluster allocation is done by first-fit analysis or whether it blindly
> favors allocating at the end of the image].

There is a free_cluster_index to speed up finding the first fit. It's 
reset when freeing clusters before that index, therefore cluster 65 
should be reused.

> Either way, we have to do a
> third iteration, because the second iteration allocated a refblock and
> "reallocated" a reftable.
>
> On the third pass, operation() is still called 64 times, but because the
> only two calls with refblock_empty as false already have an allocated
> refblock, no further allocations are needed, and we are done with the do
> loop; the fourth walk can set refcounts.
>
> =====
> The test I thought you were writing would start
>
> original image, pre-walk:
> reftable is one cluster; with one refblock and 63 zero entries
>   that refblock holds 64 width-64 refcounts; of those, the first 63 are
> non-zero, the remaining are zero. Image is 32256 bytes long
>
> During the first walk, we call operation() 1 time, with refblock_empty
> false.
>
> after first walk but before reftable allocation, we have allocated one
> refblock that holds 4096 width-1 refcounts (all zero, because we don't
> populate them until the final walk); and the old table now has 64
> refcounts populated. Image is 32768 bytes long.
>
> Then we allocating a new reftable; so far, we only created one refblock
> for it to hold, so one cluster is sufficient. The allocation causes the
> old table to now have 66 refcounts populated (one for the new refblock,
> but also one for an additional refblock in the old table because the
> first refblock was full). Image is now 33792 bytes long.
>
> On the second pass, we call operation() 1 time with refblock_empty as
> false, so we don't need any allocation.
>
> Which means the test you wrote is correct, while my idea does NOT
> trigger the third walk, at least not for the initial file size of 32256.
>   You've been vindicated, you did it correctly :)
>
>
> =====
> Now, in response to your question about some other 3-pass inducing
> pattern, let's think back to v1, where you questioned what would happen
> if a hole in the reftable gets turned into data due to a later
> allocation.  Let's see if I can come up with a scenario for that...
>
> Let's stick with a cluster size of 512, and use 32-bit and 64-bit widths
> as our two sizes.  If we downsize from 64 to 32 bits, then every two
> refblock clusters in the old table results in one call to operation()
> for the new table; conversely, if we upsize, then every refblock cluster
> in the old table gives two calls to operation() in the new table.  The
> trick at hand is to come up with some image where we punch a hole so
> that on the first pass, we call operation() with refblock_empty true for
> one iteration (necessarily a call later than the first, since the image
> header guarantees the first refblock is not empty), but where we have
> data after the hole, where it is the later data that triggers the
> allocation that will finally start to fill the hole.
>
> How about starting with an image that occupies between 1.5 and 2
> refblocks worth of 32-width clusters (so an image anywhere between 193
> and 256 clusters, or between 98816 and 131072 bytes).  You should be
> able to figure out how many clusters this consumes for L1, L2, plus 1
> for header, reftable, and 2 for refblocks, in order to figure out how
> many remaining clusters are dedicated to data; ideally, the data
> clusters are contiguous, and occupy a swath that covers at least
> clusters 126 through 192.  Widening to 64-bit width will require 4
> refblocks instead of 2, if all refblocks are needed.  But the whole idea
> of punching a hole is that we don't need a refblock if it will be
> all-zero entries.  So take this original image, and discard the data
> clusters from physical index 126 through 192, (this is NOT the data
> visible at guest offset 31744, but whatever actual offset of guest data
> that maps to physical offset 31744).  The old reftable now looks like {
> refblock_o1 [0-125 occupied, 126 and 127 empty], refblock_o2 [128-191
> empty, 192-whatever occupied, tail empty] }.  With no allocations
> required, this would in turn would map to the following new refblocks: {
> refblock_n1 [0-64 occupied], refblock_n2 [65-125 occupied, 126-127
> empty], NULL, refblock_n4 [192-whatever occupied] }.  Note that we do
> not need to allocate refblock_n3 because of the hole in the old
> refblock; we DO end up allocating three refblocks, but in the sequence
> of things, refblock_n1 and refblock_n2 are allocated while we are
> visiting refblock_o1 and still fit in refblock_o1, while refblock_n4 is
> not allocated until after we have already passed over the first half of
> refblock_o2.
>
> Thus, the second walk over the image will see that we need to allocate
> refblock_n3 because it now contains entries (in particular, the entry
> for refblock_n4, but also the 1-cluster entry for the proposed reftable
> that is allocated between the walks).  So, while your test used the
> allocation of the reftable as the spillover point, my scenario here uses
> the allocation of later refblocks as the spillover point that got missed
> during the first iteration.

Sounds good, the only problem is that I'd have to hand-craft the image 
myself, because qemu generally uses self-references for refblocks (when 
allocating new refblocks, they will contain their own refcount).

I think this already would be too much effort (I'll reply to your second 
email right away ;-)). There is no fundamental difference in how new 
allocations for the new reftable and for the new refblocks are treated: 
If there's a new allocation, respin. If that works for the new reftable, 
that's enough to convince me it will work for new refblocks as well.

Max

  parent reply	other threads:[~2014-11-20 13:48 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-14 13:05 [Qemu-devel] [PATCH v2 00/21] qcow2: Support refcount orders != 4 Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
2014-11-15 16:00   ` Eric Blake
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 03/21] qcow2: Use 64 bits for refcount values Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
2014-11-14 13:05 ` [Qemu-devel] [PATCH v2 06/21] qcow2: Helper for refcount array reallocation Max Reitz
2014-11-15 16:50   ` Eric Blake
2014-11-17  8:37     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 07/21] qcow2: Helper function for refcount modification Max Reitz
2014-11-15 17:02   ` Eric Blake
2014-11-17  8:42     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 08/21] qcow2: More helpers " Max Reitz
2014-11-15 17:08   ` Eric Blake
2014-11-17  8:44     ` Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 09/21] qcow2: Open images with refcount order != 4 Max Reitz
2014-11-15 17:09   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
2014-11-15 17:13   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 11/21] iotests: Prepare for refcount_width option Max Reitz
2014-11-15 17:17   ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 13/21] block: Add opaque value to the amend CB Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 17/21] qcow2: Use intermediate helper CB " Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 18/21] qcow2: Add function for refcount order amendment Max Reitz
2014-11-18 17:55   ` Eric Blake
2014-11-18 18:58     ` Max Reitz
2014-11-18 19:56       ` Eric Blake
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 19/21] qcow2: Invoke refcount order amendment function Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 20/21] qcow2: Point to amend function in check Max Reitz
2014-11-14 13:06 ` [Qemu-devel] [PATCH v2 21/21] iotests: Add test for different refcount widths Max Reitz
2014-11-15 14:50   ` Eric Blake
2014-11-17  8:34     ` Max Reitz
2014-11-17 10:38       ` Max Reitz
2014-11-17 11:02         ` Max Reitz
2014-11-17 12:06     ` Max Reitz
2014-11-18 20:26       ` Eric Blake
2014-11-19  5:52         ` Eric Blake
2014-11-20 14:03           ` Max Reitz
2014-11-20 21:21             ` Eric Blake
2014-11-20 13:48         ` Max Reitz [this message]
2014-11-20 21:27           ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=546DF113.70403@redhat.com \
    --to=mreitz@redhat.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.