From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Peter Lieven <pl@kamp.de>,
Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
Date: Tue, 11 Nov 2014 10:43:06 +0100 [thread overview]
Message-ID: <5461DA2A.6030303@redhat.com> (raw)
In-Reply-To: <5461CA51.2080403@redhat.com>
On 2014-11-11 at 09:35, Max Reitz wrote:
> On 2014-11-10 at 23:30, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Since refcounts do not always have to be a uint16_t, all refcount
>>> blocks
>>> and arrays in memory should not have a specific type (thus they become
>>> pointers to void) and for accessing them, two helper functions are used
>>> (a getter and a setter). Those functions are called indirectly through
>>> function pointers in the BDRVQcowState so they may later be exchanged
>>> for different refcount orders.
>>>
>>> At the same time, replace all sizeof(**refcount_table) etc. in the
>>> qcow2
>>> check code by s->refcount_bits / 8. Note that this might lead to wrong
>>> values due to truncating division, but currently s->refcount_bits is
>>> always 16, and before the upcoming patch which removes this limitation
>>> another patch will make the division round up correctly.
>> Thanks for pointing out that this transition is still in progress, and
>> needs more patches. I agree that for this patch, the division is safe.
>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>> block/qcow2-refcount.c | 152
>>> +++++++++++++++++++++++++++++--------------------
>>> block/qcow2.h | 8 +++
>>> 2 files changed, 98 insertions(+), 62 deletions(-)
>>>
>>> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState
>>> *bs, int64_t cluster_index)
>>> }
>>> ret = qcow2_cache_get(bs, s->refcount_block_cache,
>>> refcount_block_offset,
>>> - (void**) &refcount_block);
>>> + &refcount_block);
>>> if (ret < 0) {
>>> return ret;
>>> }
>>> block_index = cluster_index & (s->refcount_block_size - 1);
>>> - refcount = be16_to_cpu(refcount_block[block_index]);
>>> + refcount = s->get_refcount(refcount_block, block_index);
>>> - ret = qcow2_cache_put(bs, s->refcount_block_cache,
>>> - (void**) &refcount_block);
>>> + ret = qcow2_cache_put(bs, s->refcount_block_cache,
>>> &refcount_block);
>>> if (ret < 0) {
>>> return ret;
>>> }
>>> + if (refcount < 0) {
>>> + /* overflow */
>>> + return -ERANGE;
>>> + }
>> Should you be checking for overflow prior to calling qcow2_cache_put?
>
> I don't think so; we want to free the cache entry reference even if
> the refblock seems unusable.
>
>>> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState
>>> *bs,
>>> s->cluster_size;
>>> uint64_t table_offset = meta_offset + blocks_clusters *
>>> s->cluster_size;
>>> uint64_t *new_table = g_try_new0(uint64_t, table_size);
>>> - uint16_t *new_blocks = g_try_malloc0(blocks_clusters *
>>> s->cluster_size);
>>> + void *new_blocks = g_try_malloc0(blocks_clusters *
>>> s->cluster_size);
>> Can this multiplication ever overflow? Would we be better off with a
>> g_try_new0 approach?
>
> Yes, you're right. Patch 7 introduces a helper function which will
> allow checking such overflows in a central place, but I haven't done
> so in this version.
>
>>> @@ -1118,12 +1142,13 @@ fail:
>>> */
>>> static int inc_refcounts(BlockDriverState *bs,
>>> BdrvCheckResult *res,
>>> - uint16_t **refcount_table,
>>> + void **refcount_table,
>>> int64_t *refcount_table_size,
>>> int64_t offset, int64_t size)
>>> {
>>> BDRVQcowState *s = bs->opaque;
>>> - uint64_t start, last, cluster_offset, k;
>>> + uint64_t start, last, cluster_offset, k, refcount;
>> Why uint64_t, when you limit to int64_t in other patches?
>
> Because we don't need to check for errors here. But I guess we should
> really use int64_t everywhere to be consistent (so that if something
> breaks because a cluster has more than INT64_MAX references, it breaks
> everywhere).
>
>>> + int64_t i;
>>> if (size <= 0) {
>>> return 0;
>>> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
>>> k = cluster_offset >> s->cluster_bits;
>>> if (k >= *refcount_table_size) {
>>> int64_t old_refcount_table_size = *refcount_table_size;
>>> - uint16_t *new_refcount_table;
>>> + void *new_refcount_table;
>>> *refcount_table_size = k + 1;
>>> new_refcount_table = g_try_realloc(*refcount_table,
>>> *refcount_table_size *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> This multiplies before dividing. Can it ever overflow, where writing
>> *refcount_table_size * (s->refcount_bits / 8) would be safer? Also, is
>> it better to use a malloc variant that checks for overflow (I think it
>> is g_try_renew?) instead of open-coding the multiply?
>>
>>> if (!new_refcount_table) {
>>> *refcount_table_size = old_refcount_table_size;
>>> res->check_errors++;
>>> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
>>> }
>>> *refcount_table = new_refcount_table;
>>> - memset(*refcount_table + old_refcount_table_size, 0,
>>> - (*refcount_table_size - old_refcount_table_size) *
>>> - sizeof(**refcount_table));
>>> + for (i = old_refcount_table_size; i <
>>> *refcount_table_size; i++) {
>>> + s->set_refcount(*refcount_table, i, 0);
>>> + }
>> This feels slower than memset.
>
> It is, yes. But this is the check function, I don't think performance
> is all that important here (especially not operations in RAM).
>
>> Any chance we can add an optimization
>> that brings back the speed of memset (may require an additional callback
>> in addition to the getter and setter)?
>
> For sub-byte refcount widths, we would have to manually set all
> non-byte aligned entries to 0 and then use memset() on the rest. Not
> impossible, but I think too complicated for a place where performance
> is not critical.
>
>>> @@ -1178,7 +1206,7 @@ enum {
>>> * error occurred.
>>> */
>>> static int check_refcounts_l2(BlockDriverState *bs,
>>> BdrvCheckResult *res,
>>> - uint16_t **refcount_table, int64_t *refcount_table_size,
>>> int64_t l2_offset,
>>> + void **refcount_table, int64_t *refcount_table_size, int64_t
>>> l2_offset,
>>> int flags)
>> I noticed you cleaned up indentation in a lot of the patch, but not
>> here. Any reason?
>
> Maybe I remembered touching that header once already and not fixing up
> the indentation. Will do in v2.
>
>>> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState
>>> *bs, BdrvCheckResult *res,
>>> new_refcount_table = g_try_realloc(*refcount_table,
>>> *nb_clusters *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> Another possible overflow or g_try_renew site?
>>
>>> if (!new_refcount_table) {
>>> *nb_clusters = old_nb_clusters;
>>> res->check_errors++;
>>> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState
>>> *bs, BdrvCheckResult *res,
>>> }
>>> *refcount_table = new_refcount_table;
>>> - memset(*refcount_table + old_nb_clusters, 0,
>>> - (*nb_clusters - old_nb_clusters) *
>>> - sizeof(**refcount_table));
>>> + for (j = old_nb_clusters; j < *nb_clusters; j++) {
>>> + s->set_refcount(*refcount_table, j, 0);
>>> + }
>> Another memset pessimation. Maybe even having a callback to expand the
>> table, and factor out more of the common code of reallocating the table
>> and clearing all new entries.
>
> That sounds useful, yes. I'll look into it, but I'll probably still go
> without memset().
>
>>> @@ -1611,7 +1640,7 @@ static int
>>> calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>> int ret;
>>> if (!*refcount_table) {
>>> - *refcount_table = g_try_new0(uint16_t, *nb_clusters);
>>> + *refcount_table = g_try_malloc0(*nb_clusters *
>>> s->refcount_bits / 8);
>> Feels like a step backwards in overflow detection?
>>
>>> @@ -1787,22 +1816,22 @@ static int64_t
>>> alloc_clusters_imrt(BlockDriverState *bs,
>>> *imrt_nb_clusters = cluster + cluster_count -
>>> contiguous_free_clusters;
>>> new_refcount_table = g_try_realloc(*refcount_table,
>>> *imrt_nb_clusters *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> Another possible overflow
>>
>>> if (!new_refcount_table) {
>>> *imrt_nb_clusters = old_imrt_nb_clusters;
>>> return -ENOMEM;
>>> }
>>> *refcount_table = new_refcount_table;
>>> - memset(*refcount_table + old_imrt_nb_clusters, 0,
>>> - (*imrt_nb_clusters - old_imrt_nb_clusters) *
>>> - sizeof(**refcount_table));
>>> + for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
>>> + s->set_refcount(*refcount_table, i, 0);
>>> + }
>>> }
>> and another resize where we pessimize memset
>>
>>
>>> @@ -1911,12 +1940,11 @@ write_refblocks:
>>> }
>>> on_disk_refblock = qemu_blockalign0(bs->file,
>>> s->cluster_size);
>>> - for (i = 0; i < s->refcount_block_size &&
>>> - refblock_start + i < *nb_clusters; i++)
>>> - {
>>> - on_disk_refblock[i] =
>>> - cpu_to_be16((*refcount_table)[refblock_start + i]);
>>> - }
>>> +
>>> + memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
>>> + (refblock_index <<
>>> s->refcount_block_bits)),
>>> + MIN(s->refcount_block_size, *nb_clusters -
>>> refblock_start)
>>> + * s->refcount_bits / 8);
>> This one's different in that you move TO a memcpy instead of open-coded
>> loop.
>
> Yes, because the set_refcount() and get_refcount() helpers store the
> refcounts already in big endian, so now we can directly use the data
> without conversion.
>
>> But I still worry if multiply before /8 could be a problem.
>>
>>> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState
>>> *bs, BdrvCheckResult *res,
>>> /* Because the old reftable has been exchanged for a new
>>> one the
>>> * references have to be recalculated */
>>> rebuild = false;
>>> - memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>>> + memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
>> Another /8 possible overflow.
>>
>>> ret = calculate_refcounts(bs, res, 0, &rebuild,
>>> &refcount_table,
>>> &nb_clusters);
>>> if (ret < 0) {
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index 0f8eb15..1c63221 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
>>> QTAILQ_ENTRY(Qcow2DiscardRegion) next;
>>> } Qcow2DiscardRegion;
>>> +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>>> + uint64_t index);
>>> +typedef void Qcow2SetRefcountFunc(void *refcount_array,
>>> + uint64_t index, uint64_t value);
>> Do you want int64_t for any of the types here, to make it obvious that
>> you can't exceed 2^63?
>
> Yes, I do.
To be honest, I just read your reply and not the original code (which
I'm doing now while working on v2). I think I don't want int64_t after
all. I generally do want to use int64_t for all refcount values, but in
this case, as these are just helpers for directly accessing refcount
arrays (which I think should not fail, see my reply for patch 8), I'd
rather keep them using uint64_t.
Max
>> Looks like you are on track to a sane conversion, but I'm worried enough
>> about the math that it probably needs a respin (either comments stating
>> why we know we don't overflow, or else safer constructs).
>
> I'll add a comment in the commit message why overflows do not really
> get more probable than they were before, but the real overflow
> prevention will happen in patch 7.
>
> I think I'll factor out the refcount array resize which automatically
> sets the newly allocated entries to zero. Now that I think about it...
> I can actually get away without sub-byte operations. Since .*realloc()
> will only allocate full bytes anyway, I only need to set that newly
> allocated area to zero (with memset()). So it'll be back to memset().
> (I'm still wondering why there's no g_try_realloc0() or something;
> probably because the glib does not require the libc heap manager to
> know the exact size of each heap object which would be necessary for
> that to work)
>
> Max
next prev parent reply other threads:[~2014-11-11 9:43 UTC|newest]
Thread overview: 75+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
2014-11-10 19:00 ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
2014-11-10 19:06 ` Eric Blake
2014-11-11 8:11 ` Max Reitz
2014-11-11 15:49 ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values Max Reitz
2014-11-10 20:59 ` Eric Blake
2014-11-11 8:12 ` Max Reitz
2014-11-11 9:22 ` Kevin Wolf
2014-11-11 9:25 ` Max Reitz
2014-11-11 9:26 ` Max Reitz
2014-11-11 9:36 ` Kevin Wolf
2014-11-10 13:45 ` [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
2014-11-10 21:05 ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
2014-11-10 21:12 ` Eric Blake
2014-11-11 8:22 ` Max Reitz
2014-11-11 16:13 ` Eric Blake
2014-11-11 16:18 ` Max Reitz
2014-11-11 19:49 ` Eric Blake
2014-11-12 8:52 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification Max Reitz
2014-11-10 22:30 ` Eric Blake
2014-11-11 8:35 ` Max Reitz
2014-11-11 9:43 ` Max Reitz [this message]
2014-11-11 10:56 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation Max Reitz
2014-11-10 22:49 ` Eric Blake
2014-11-11 8:37 ` Max Reitz
2014-11-11 10:08 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification Max Reitz
2014-11-11 0:29 ` Eric Blake
2014-11-11 8:42 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4 Max Reitz
2014-11-10 17:03 ` Eric Blake
2014-11-10 17:06 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
2014-11-11 5:40 ` Eric Blake
2014-11-11 8:48 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option Max Reitz
2014-11-11 17:57 ` Eric Blake
2014-11-12 8:41 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
2014-11-11 18:05 ` Eric Blake
2014-11-12 8:47 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB Max Reitz
2014-11-11 18:08 ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
2014-11-11 18:11 ` Eric Blake
2014-11-12 8:47 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
2014-11-11 18:12 ` Eric Blake
2014-11-12 8:48 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
2014-11-11 18:14 ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB " Max Reitz
2014-11-11 21:05 ` Eric Blake
2014-11-12 9:10 ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment Max Reitz
2014-11-12 4:15 ` Eric Blake
2014-11-12 9:55 ` Max Reitz
2014-11-12 13:50 ` Eric Blake
2014-11-12 14:19 ` Eric Blake
2014-11-12 14:21 ` Max Reitz
2014-11-12 17:45 ` Max Reitz
2014-11-12 20:21 ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function Max Reitz
2014-11-12 4:36 ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check Max Reitz
2014-11-12 4:38 ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths Max Reitz
2014-11-11 19:53 ` Eric Blake
2014-11-12 8:58 ` Max Reitz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5461DA2A.6030303@redhat.com \
--to=mreitz@redhat.com \
--cc=eblake@redhat.com \
--cc=kwolf@redhat.com \
--cc=pl@kamp.de \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).