From: Max Reitz <mreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org
Subject: Re: [PATCH 1/4] qcow2: Improve refcount structure rebuilding
Date: Fri, 26 Mar 2021 14:47:14 +0100 [thread overview]
Message-ID: <3eb1748e-012d-9c74-d9dc-c1a6f2644ba6@redhat.com> (raw)
In-Reply-To: <969cd321-0cc7-ddc3-8a0d-75819be3a6bf@virtuozzo.com>
On 26.03.21 12:48, Vladimir Sementsov-Ogievskiy wrote:
> 10.03.2021 18:59, Max Reitz wrote:
>> When rebuilding the refcount structures (when qemu-img check -r found
>> errors with refcount = 0, but reference count > 0), the new refcount
>> table defaults to being put at the image file end[1]. There is no good
>> reason for that except that it means we will not have to rewrite any
>> refblocks we already wrote to disk.
>>
>> Changing the code to rewrite those refblocks is not too difficult,
>> though, so let us do that. That is beneficial for images on block
>> devices, where we cannot really write beyond the end of the image file.
>>
>> [1] Unless there is something allocated in the area pointed to by the
>> last refblock, so we have to write that refblock. In that case, we
>> try to put the reftable in there.
>>
>> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1519071
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> block/qcow2-refcount.c | 126 ++++++++++++++++++++++-------------------
>> 1 file changed, 67 insertions(+), 59 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 8e649b008e..162caeeb8e 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -2352,8 +2352,9 @@ static int
>> rebuild_refcount_structure(BlockDriverState *bs,
>> int64_t *nb_clusters)
>> {
>> BDRVQcow2State *s = bs->opaque;
>> - int64_t first_free_cluster = 0, reftable_offset = -1, cluster = 0;
>> + int64_t first_free_cluster = 0, reftable_offset = -1, cluster;
>> int64_t refblock_offset, refblock_start, refblock_index;
>> + int64_t first_cluster, end_cluster;
>> uint32_t reftable_size = 0;
>> uint64_t *on_disk_reftable = NULL;
>> void *on_disk_refblock;
>> @@ -2365,8 +2366,11 @@ static int
>> rebuild_refcount_structure(BlockDriverState *bs,
>> qcow2_cache_empty(bs, s->refcount_block_cache);
>> + first_cluster = 0;
>> + end_cluster = *nb_clusters;
>> +
>> write_refblocks:
>> - for (; cluster < *nb_clusters; cluster++) {
>> + for (cluster = first_cluster; cluster < end_cluster; cluster++) {
>> if (!s->get_refcount(*refcount_table, cluster)) {
>> continue;
>> }
>> @@ -2374,65 +2378,68 @@ write_refblocks:
>> refblock_index = cluster >> s->refcount_block_bits;
>> refblock_start = refblock_index << s->refcount_block_bits;
>> - /* Don't allocate a cluster in a refblock already written to
>> disk */
>> - if (first_free_cluster < refblock_start) {
>> - first_free_cluster = refblock_start;
>> - }
>> - refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
>> - nb_clusters,
>> &first_free_cluster);
>> - if (refblock_offset < 0) {
>> - fprintf(stderr, "ERROR allocating refblock: %s\n",
>> - strerror(-refblock_offset));
>> - res->check_errors++;
>> - ret = refblock_offset;
>> - goto fail;
>> - }
>> -
>> - if (reftable_size <= refblock_index) {
>> - uint32_t old_reftable_size = reftable_size;
>> - uint64_t *new_on_disk_reftable;
>> + if (reftable_size > refblock_index &&
>> + on_disk_reftable[refblock_index])
>> + {
>> + refblock_offset = on_disk_reftable[refblock_index];
>
> In this branch, we assign it to ..
>
>> + } else {
>> + int64_t refblock_cluster_index;
>> - reftable_size = ROUND_UP((refblock_index + 1) *
>> REFTABLE_ENTRY_SIZE,
>> - s->cluster_size) /
>> REFTABLE_ENTRY_SIZE;
>> - new_on_disk_reftable = g_try_realloc(on_disk_reftable,
>> - reftable_size *
>> - REFTABLE_ENTRY_SIZE);
>> - if (!new_on_disk_reftable) {
>> + /* Don't allocate a cluster in a refblock already written
>> to disk */
>> + if (first_free_cluster < refblock_start) {
>> + first_free_cluster = refblock_start;
>> + }
>> + refblock_offset = alloc_clusters_imrt(bs, 1, refcount_table,
>> + nb_clusters,
>> + &first_free_cluster);
>> + if (refblock_offset < 0) {
>> + fprintf(stderr, "ERROR allocating refblock: %s\n",
>> + strerror(-refblock_offset));
>> res->check_errors++;
>> - ret = -ENOMEM;
>> + ret = refblock_offset;
>> goto fail;
>> }
>> - on_disk_reftable = new_on_disk_reftable;
>> - memset(on_disk_reftable + old_reftable_size, 0,
>> - (reftable_size - old_reftable_size) *
>> REFTABLE_ENTRY_SIZE);
>> + refblock_cluster_index = refblock_offset / s->cluster_size;
>> + if (refblock_cluster_index >= end_cluster) {
>> + /*
>> + * We must write the refblock that holds this refblock's
>> + * refcount
>> + */
>> + end_cluster = refblock_cluster_index + 1;
>> + }
>> - /* The offset we have for the reftable is now no longer
>> valid;
>> - * this will leak that range, but we can easily fix that
>> by running
>> - * a leak-fixing check after this rebuild operation */
>> - reftable_offset = -1;
>> - } else {
>> - assert(on_disk_reftable);
>> - }
>> - on_disk_reftable[refblock_index] = refblock_offset;
>> + if (reftable_size <= refblock_index) {
>> + uint32_t old_reftable_size = reftable_size;
>> + uint64_t *new_on_disk_reftable;
>> +
>> + reftable_size =
>> + ROUND_UP((refblock_index + 1) * REFTABLE_ENTRY_SIZE,
>> + s->cluster_size) / REFTABLE_ENTRY_SIZE;
>> + new_on_disk_reftable =
>> + g_try_realloc(on_disk_reftable,
>> + reftable_size * REFTABLE_ENTRY_SIZE);
>> + if (!new_on_disk_reftable) {
>> + res->check_errors++;
>> + ret = -ENOMEM;
>> + goto fail;
>> + }
>> + on_disk_reftable = new_on_disk_reftable;
>> - /* If this is apparently the last refblock (for now), try to
>> squeeze the
>> - * reftable in */
>> - if (refblock_index == (*nb_clusters - 1) >>
>> s->refcount_block_bits &&
>> - reftable_offset < 0)
>> - {
>> - uint64_t reftable_clusters = size_to_clusters(s,
>> reftable_size *
>> -
>> REFTABLE_ENTRY_SIZE);
>> - reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>> - refcount_table,
>> nb_clusters,
>> - &first_free_cluster);
>> - if (reftable_offset < 0) {
>> - fprintf(stderr, "ERROR allocating reftable: %s\n",
>> - strerror(-reftable_offset));
>> - res->check_errors++;
>> - ret = reftable_offset;
>> - goto fail;
>> + memset(on_disk_reftable + old_reftable_size, 0,
>> + (reftable_size - old_reftable_size) *
>> + REFTABLE_ENTRY_SIZE);
>> +
>> + /*
>> + * The offset we have for the reftable is now no
>> longer valid;
>> + * this will leak that range, but we can easily fix
>> that by
>> + * running a leak-fixing check after this rebuild
>> operation
>> + */
>> + reftable_offset = -1;
>> + } else {
>> + assert(on_disk_reftable);
>> }
>> + on_disk_reftable[refblock_index] = refblock_offset;
>
> only to write back again ?
This assignment is on a deeper level, though, isn it? I.e. it’s inside
the else branch from above.
>> }
>> ret = qcow2_pre_write_overlap_check(bs, 0, refblock_offset,
>> @@ -2459,15 +2466,12 @@ write_refblocks:
>> }
>> if (reftable_offset < 0) {
>
> at this point reftable_offset is always -1 now..
>
> Ah not. and now I am a bit close to understanding all of this logic.
> this thing with "goto write_refblocks" is not obvious
>
>> - uint64_t post_refblock_start, reftable_clusters;
>> + uint64_t reftable_clusters;
>> - post_refblock_start = ROUND_UP(*nb_clusters,
>> s->refcount_block_size);
>> reftable_clusters =
>> size_to_clusters(s, reftable_size * REFTABLE_ENTRY_SIZE);
>> - /* Not pretty but simple */
>> - if (first_free_cluster < post_refblock_start) {
>> - first_free_cluster = post_refblock_start;
>> - }
>> +
>> + first_free_cluster = 0;
>> reftable_offset = alloc_clusters_imrt(bs, reftable_clusters,
>> refcount_table,
>> nb_clusters,
>> &first_free_cluster);
>> @@ -2479,6 +2483,10 @@ write_refblocks:
>> goto fail;
>> }
>> + assert(offset_into_cluster(s, reftable_offset) == 0);
>> + first_cluster = reftable_offset / s->cluster_size;
>> + end_cluster = first_cluster + reftable_clusters;
>> +
>> goto write_refblocks;
>
> these three lines now looks like a function call in assembler :)
>
>> }
>>
>
> You didn't ping the series (more than two week old) so, I'm not sure
> that you are not preparing v2 now.. But I kept it "marked unred" all
> this time, and several times tried to look at it, and postponed, because
> I don't familiar with this place of qcow2 driver. And the function looks
> too difficult. Now finally I think I understand most of the logic that
> you change. Honestly, I think a bit of refactoring the
> rebuild_refcount_structure() prior to logic change would make it a lot
> easier to review..
Hm, yes, putting the for () loop into its own function would probably be
a good starting put. I’ll look into it.
Max
next prev parent reply other threads:[~2021-03-26 13:49 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-10 15:59 [PATCH 0/4] qcow2: Improve refcount structure rebuilding Max Reitz
2021-03-10 15:59 ` [PATCH 1/4] " Max Reitz
2021-03-26 11:48 ` Vladimir Sementsov-Ogievskiy
2021-03-26 13:47 ` Max Reitz [this message]
2021-03-26 14:38 ` Vladimir Sementsov-Ogievskiy
2021-03-10 15:59 ` [PATCH 2/4] iotests/common.qemu: Add _cleanup_single_qemu Max Reitz
2021-03-10 15:59 ` [PATCH 3/4] iotests/common.qemu: Allow using the QSD Max Reitz
2021-03-10 15:59 ` [PATCH 4/4] iotests/108: Test new refcount rebuild algorithm Max Reitz
2021-03-10 16:07 ` Eric Blake
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=3eb1748e-012d-9c74-d9dc-c1a6f2644ba6@redhat.com \
--to=mreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).