From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:40905) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xo6vS-0006ux-F4 for qemu-devel@nongnu.org; Tue, 11 Nov 2014 03:35:52 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Xo6vM-0001bE-9E for qemu-devel@nongnu.org; Tue, 11 Nov 2014 03:35:46 -0500 Received: from mx1.redhat.com ([209.132.183.28]:56125) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Xo6vL-0001b6-W6 for qemu-devel@nongnu.org; Tue, 11 Nov 2014 03:35:40 -0500 Message-ID: <5461CA51.2080403@redhat.com> Date: Tue, 11 Nov 2014 09:35:29 +0100 From: Max Reitz MIME-Version: 1.0 References: <1415627159-15941-1-git-send-email-mreitz@redhat.com> <1415627159-15941-7-git-send-email-mreitz@redhat.com> <54613C7E.4020707@redhat.com> In-Reply-To: <54613C7E.4020707@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake , qemu-devel@nongnu.org Cc: Kevin Wolf , Peter Lieven , Stefan Hajnoczi On 2014-11-10 at 23:30, Eric Blake wrote: > On 11/10/2014 06:45 AM, Max Reitz wrote: >> Since refcounts do not always have to be a uint16_t, all refcount blocks >> and arrays in memory should not have a specific type (thus they become >> pointers to void) and for accessing them, two helper functions are used >> (a getter and a setter). Those functions are called indirectly through >> function pointers in the BDRVQcowState so they may later be exchanged >> for different refcount orders. >> >> At the same time, replace all sizeof(**refcount_table) etc. in the qcow2 >> check code by s->refcount_bits / 8. Note that this might lead to wrong >> values due to truncating division, but currently s->refcount_bits is >> always 16, and before the upcoming patch which removes this limitation >> another patch will make the division round up correctly. > Thanks for pointing out that this transition is still in progress, and > needs more patches. I agree that for this patch, the division is safe. > >> Signed-off-by: Max Reitz >> --- >> block/qcow2-refcount.c | 152 +++++++++++++++++++++++++++++-------------------- >> block/qcow2.h | 8 +++ >> 2 files changed, 98 insertions(+), 62 deletions(-) >> >> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index) >> } >> >> ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset, >> - (void**) &refcount_block); >> + &refcount_block); >> if (ret < 0) { >> return ret; >> } >> >> block_index = cluster_index & (s->refcount_block_size - 1); >> - refcount = be16_to_cpu(refcount_block[block_index]); >> + refcount = s->get_refcount(refcount_block, block_index); >> >> - ret = qcow2_cache_put(bs, s->refcount_block_cache, >> - (void**) &refcount_block); >> + ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block); >> if (ret < 0) { >> return ret; >> } >> >> + if (refcount < 0) { >> + /* overflow */ >> + return -ERANGE; >> + } > Should you be checking for overflow prior to calling qcow2_cache_put? I don't think so; we want to free the cache entry reference even if the refblock seems unusable. >> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState *bs, >> s->cluster_size; >> uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size; >> uint64_t *new_table = g_try_new0(uint64_t, table_size); >> - uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size); >> + void *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size); > Can this multiplication ever overflow? Would we be better off with a > g_try_new0 approach? Yes, you're right. Patch 7 introduces a helper function which will allow checking such overflows in a central place, but I haven't done so in this version. >> @@ -1118,12 +1142,13 @@ fail: >> */ >> static int inc_refcounts(BlockDriverState *bs, >> BdrvCheckResult *res, >> - uint16_t **refcount_table, >> + void **refcount_table, >> int64_t *refcount_table_size, >> int64_t offset, int64_t size) >> { >> BDRVQcowState *s = bs->opaque; >> - uint64_t start, last, cluster_offset, k; >> + uint64_t start, last, cluster_offset, k, refcount; > Why uint64_t, when you limit to int64_t in other patches? Because we don't need to check for errors here. But I guess we should really use int64_t everywhere to be consistent (so that if something breaks because a cluster has more than INT64_MAX references, it breaks everywhere). >> + int64_t i; >> >> if (size <= 0) { >> return 0; >> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs, >> k = cluster_offset >> s->cluster_bits; >> if (k >= *refcount_table_size) { >> int64_t old_refcount_table_size = *refcount_table_size; >> - uint16_t *new_refcount_table; >> + void *new_refcount_table; >> >> *refcount_table_size = k + 1; >> new_refcount_table = g_try_realloc(*refcount_table, >> *refcount_table_size * >> - sizeof(**refcount_table)); >> + s->refcount_bits / 8); > This multiplies before dividing. Can it ever overflow, where writing > *refcount_table_size * (s->refcount_bits / 8) would be safer? Also, is > it better to use a malloc variant that checks for overflow (I think it > is g_try_renew?) instead of open-coding the multiply? > >> if (!new_refcount_table) { >> *refcount_table_size = old_refcount_table_size; >> res->check_errors++; >> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs, >> } >> *refcount_table = new_refcount_table; >> >> - memset(*refcount_table + old_refcount_table_size, 0, >> - (*refcount_table_size - old_refcount_table_size) * >> - sizeof(**refcount_table)); >> + for (i = old_refcount_table_size; i < *refcount_table_size; i++) { >> + s->set_refcount(*refcount_table, i, 0); >> + } > This feels slower than memset. It is, yes. But this is the check function, I don't think performance is all that important here (especially not operations in RAM). > Any chance we can add an optimization > that brings back the speed of memset (may require an additional callback > in addition to the getter and setter)? For sub-byte refcount widths, we would have to manually set all non-byte aligned entries to 0 and then use memset() on the rest. Not impossible, but I think too complicated for a place where performance is not critical. >> @@ -1178,7 +1206,7 @@ enum { >> * error occurred. >> */ >> static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res, >> - uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset, >> + void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset, >> int flags) > I noticed you cleaned up indentation in a lot of the patch, but not > here. Any reason? Maybe I remembered touching that header once already and not fixing up the indentation. Will do in v2. >> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res, >> >> new_refcount_table = g_try_realloc(*refcount_table, >> *nb_clusters * >> - sizeof(**refcount_table)); >> + s->refcount_bits / 8); > Another possible overflow or g_try_renew site? > >> if (!new_refcount_table) { >> *nb_clusters = old_nb_clusters; >> res->check_errors++; >> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res, >> } >> *refcount_table = new_refcount_table; >> >> - memset(*refcount_table + old_nb_clusters, 0, >> - (*nb_clusters - old_nb_clusters) * >> - sizeof(**refcount_table)); >> + for (j = old_nb_clusters; j < *nb_clusters; j++) { >> + s->set_refcount(*refcount_table, j, 0); >> + } > Another memset pessimation. Maybe even having a callback to expand the > table, and factor out more of the common code of reallocating the table > and clearing all new entries. That sounds useful, yes. I'll look into it, but I'll probably still go without memset(). >> @@ -1611,7 +1640,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res, >> int ret; >> >> if (!*refcount_table) { >> - *refcount_table = g_try_new0(uint16_t, *nb_clusters); >> + *refcount_table = g_try_malloc0(*nb_clusters * s->refcount_bits / 8); > Feels like a step backwards in overflow detection? > >> @@ -1787,22 +1816,22 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs, >> *imrt_nb_clusters = cluster + cluster_count - contiguous_free_clusters; >> new_refcount_table = g_try_realloc(*refcount_table, >> *imrt_nb_clusters * >> - sizeof(**refcount_table)); >> + s->refcount_bits / 8); > Another possible overflow > >> if (!new_refcount_table) { >> *imrt_nb_clusters = old_imrt_nb_clusters; >> return -ENOMEM; >> } >> *refcount_table = new_refcount_table; >> >> - memset(*refcount_table + old_imrt_nb_clusters, 0, >> - (*imrt_nb_clusters - old_imrt_nb_clusters) * >> - sizeof(**refcount_table)); >> + for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) { >> + s->set_refcount(*refcount_table, i, 0); >> + } >> } > and another resize where we pessimize memset > > >> @@ -1911,12 +1940,11 @@ write_refblocks: >> } >> >> on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size); >> - for (i = 0; i < s->refcount_block_size && >> - refblock_start + i < *nb_clusters; i++) >> - { >> - on_disk_refblock[i] = >> - cpu_to_be16((*refcount_table)[refblock_start + i]); >> - } >> + >> + memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table + >> + (refblock_index << s->refcount_block_bits)), >> + MIN(s->refcount_block_size, *nb_clusters - refblock_start) >> + * s->refcount_bits / 8); >> > This one's different in that you move TO a memcpy instead of open-coded > loop. Yes, because the set_refcount() and get_refcount() helpers store the refcounts already in big endian, so now we can directly use the data without conversion. > But I still worry if multiply before /8 could be a problem. > >> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res, >> /* Because the old reftable has been exchanged for a new one the >> * references have to be recalculated */ >> rebuild = false; >> - memset(refcount_table, 0, nb_clusters * sizeof(uint16_t)); >> + memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8); > Another /8 possible overflow. > >> ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table, >> &nb_clusters); >> if (ret < 0) { >> diff --git a/block/qcow2.h b/block/qcow2.h >> index 0f8eb15..1c63221 100644 >> --- a/block/qcow2.h >> +++ b/block/qcow2.h >> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion { >> QTAILQ_ENTRY(Qcow2DiscardRegion) next; >> } Qcow2DiscardRegion; >> >> +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array, >> + uint64_t index); >> +typedef void Qcow2SetRefcountFunc(void *refcount_array, >> + uint64_t index, uint64_t value); > Do you want int64_t for any of the types here, to make it obvious that > you can't exceed 2^63? Yes, I do. > Looks like you are on track to a sane conversion, but I'm worried enough > about the math that it probably needs a respin (either comments stating > why we know we don't overflow, or else safer constructs). I'll add a comment in the commit message why overflows do not really get more probable than they were before, but the real overflow prevention will happen in patch 7. I think I'll factor out the refcount array resize which automatically sets the newly allocated entries to zero. Now that I think about it... I can actually get away without sub-byte operations. Since .*realloc() will only allocate full bytes anyway, I only need to set that newly allocated area to zero (with memset()). So it'll be back to memset(). (I'm still wondering why there's no g_try_realloc0() or something; probably because the glib does not require the libc heap manager to know the exact size of each heap object which would be necessary for that to work) Max