From: Max Reitz
Date: Tue, 11 Nov 2014 10:43:06 +0100
Subject: Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
To: Eric Blake, qemu-devel@nongnu.org
Cc: Kevin Wolf, Peter Lieven, Stefan Hajnoczi

On 2014-11-11 at 09:35, Max Reitz wrote:
> On 2014-11-10 at 23:30, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Since refcounts do not always have to be a uint16_t, all refcount
>>> blocks and arrays in memory should not have a specific type (thus
>>> they become pointers to void), and for accessing them, two helper
>>> functions are used (a getter and a setter). Those functions are
>>> called indirectly through function pointers in the BDRVQcowState so
>>> they may later be exchanged for different refcount orders.
>>>
>>> At the same time, replace all sizeof(**refcount_table) etc. in the
>>> qcow2 check code by s->refcount_bits / 8. Note that this might lead
>>> to wrong values due to truncating division, but currently
>>> s->refcount_bits is always 16, and before the upcoming patch which
>>> removes this limitation another patch will make the division round
>>> up correctly.
>> Thanks for pointing out that this transition is still in progress, and
>> needs more patches. I agree that for this patch, the division is safe.
>>
>>> Signed-off-by: Max Reitz
>>> ---
>>>  block/qcow2-refcount.c | 152 +++++++++++++++++++++++++++++--------------------
>>>  block/qcow2.h          |   8 +++
>>>  2 files changed, 98 insertions(+), 62 deletions(-)
>>>
>>> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
>>>      }
>>>      ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>>> -                          (void**) &refcount_block);
>>> +                          &refcount_block);
>>>      if (ret < 0) {
>>>          return ret;
>>>      }
>>>      block_index = cluster_index & (s->refcount_block_size - 1);
>>> -    refcount = be16_to_cpu(refcount_block[block_index]);
>>> +    refcount = s->get_refcount(refcount_block, block_index);
>>> -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
>>> -                          (void**) &refcount_block);
>>> +    ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
>>>      if (ret < 0) {
>>>          return ret;
>>>      }
>>> +    if (refcount < 0) {
>>> +        /* overflow */
>>> +        return -ERANGE;
>>> +    }
>> Should you be checking for overflow prior to calling qcow2_cache_put?
>
> I don't think so; we want to free the cache entry reference even if
> the refblock seems unusable.
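For illustration, with refcount_order fixed at 4 (16-bit refcounts),
the pair of accessors behind those function pointers boils down to
something like the following sketch. The names here are made up, not
the patch's literal code; the entries stay big-endian in memory,
matching the on-disk refblock layout:

#include <stdint.h>

static uint64_t get_refcount_16(const void *refcount_array, uint64_t index)
{
    const uint8_t *a = refcount_array;

    /* assemble the big-endian 16-bit entry */
    return ((uint64_t)a[2 * index] << 8) | a[2 * index + 1];
}

static void set_refcount_16(void *refcount_array, uint64_t index,
                            uint64_t value)
{
    uint8_t *a = refcount_array;

    /* store the entry big-endian */
    a[2 * index] = value >> 8;
    a[2 * index + 1] = value;
}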
>
>>> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>>                                            s->cluster_size;
>>>      uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
>>>      uint64_t *new_table = g_try_new0(uint64_t, table_size);
>>> -    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
>>> +    void *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
>> Can this multiplication ever overflow? Would we be better off with a
>> g_try_new0 approach?
>
> Yes, you're right. Patch 7 introduces a helper function which will
> allow checking such overflows in a central place, but I haven't done
> so in this version.
>
>>> @@ -1118,12 +1142,13 @@ fail:
>>>   */
>>>  static int inc_refcounts(BlockDriverState *bs,
>>>                           BdrvCheckResult *res,
>>> -                         uint16_t **refcount_table,
>>> +                         void **refcount_table,
>>>                           int64_t *refcount_table_size,
>>>                           int64_t offset, int64_t size)
>>>  {
>>>      BDRVQcowState *s = bs->opaque;
>>> -    uint64_t start, last, cluster_offset, k;
>>> +    uint64_t start, last, cluster_offset, k, refcount;
>> Why uint64_t, when you limit to int64_t in other patches?
>
> Because we don't need to check for errors here. But I guess we should
> really use int64_t everywhere to be consistent (so that if something
> breaks because a cluster has more than INT64_MAX references, it breaks
> everywhere).
>
>>> +    int64_t i;
>>>      if (size <= 0) {
>>>          return 0;
>>> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
>>>          k = cluster_offset >> s->cluster_bits;
>>>          if (k >= *refcount_table_size) {
>>>              int64_t old_refcount_table_size = *refcount_table_size;
>>> -            uint16_t *new_refcount_table;
>>> +            void *new_refcount_table;
>>>              *refcount_table_size = k + 1;
>>>              new_refcount_table = g_try_realloc(*refcount_table,
>>>                                                 *refcount_table_size *
>>> -                                               sizeof(**refcount_table));
>>> +                                               s->refcount_bits / 8);
>> This multiplies before dividing. Can it ever overflow, where writing
>> *refcount_table_size * (s->refcount_bits / 8) would be safer? Also, is
>> it better to use a malloc variant that checks for overflow (I think it
>> is g_try_renew?) instead of open-coding the multiply?
>>
>>>              if (!new_refcount_table) {
>>>                  *refcount_table_size = old_refcount_table_size;
>>>                  res->check_errors++;
>>> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
>>>              }
>>>              *refcount_table = new_refcount_table;
>>> -            memset(*refcount_table + old_refcount_table_size, 0,
>>> -                   (*refcount_table_size - old_refcount_table_size) *
>>> -                   sizeof(**refcount_table));
>>> +            for (i = old_refcount_table_size; i < *refcount_table_size; i++) {
>>> +                s->set_refcount(*refcount_table, i, 0);
>>> +            }
>> This feels slower than memset.
>
> It is, yes. But this is the check function; I don't think performance
> is all that important here (especially not for operations in RAM).
>
>> Any chance we can add an optimization that brings back the speed of
>> memset (may require an additional callback in addition to the getter
>> and setter)?
>
> For sub-byte refcount widths, we would have to manually set all
> non-byte-aligned entries to 0 and then use memset() on the rest. Not
> impossible, but I think too complicated for a place where performance
> is not critical.
>
>>> @@ -1178,7 +1206,7 @@ enum {
>>>   * error occurred.
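For what it's worth, the overflow worry can also be handled without a
typed allocator. A guard like the following sketch (a hypothetical
helper, not part of this series; patch 7's central helper is the real
plan) rejects entry counts whose byte size would wrap before calling
g_try_realloc():

#include <glib.h>
#include <stdint.h>

/* Hypothetical: grow an array of 'entries' refcount entries of
 * 'refcount_bits' bits each (whole-byte widths assumed), refusing
 * sizes whose byte count would overflow gsize. */
static void *try_grow_refcount_array(void *array, uint64_t entries,
                                     unsigned refcount_bits)
{
    unsigned entry_size = refcount_bits / 8;

    if (entry_size == 0 || entries > G_MAXSIZE / entry_size) {
        return NULL; /* byte count would not fit in gsize */
    }
    return g_try_realloc(array, (gsize)entries * entry_size);
}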
>>>   */
>>>  static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>>> -    uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>>> +    void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>>>      int flags)
>> I noticed you cleaned up indentation in a lot of the patch, but not
>> here. Any reason?
>
> Maybe I remembered touching that header once already and not fixing up
> the indentation. Will do in v2.
>
>>> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>>              new_refcount_table = g_try_realloc(*refcount_table,
>>>                                                 *nb_clusters *
>>> -                                               sizeof(**refcount_table));
>>> +                                               s->refcount_bits / 8);
>> Another possible overflow or g_try_renew site?
>>
>>>              if (!new_refcount_table) {
>>>                  *nb_clusters = old_nb_clusters;
>>>                  res->check_errors++;
>>> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>>              }
>>>              *refcount_table = new_refcount_table;
>>> -            memset(*refcount_table + old_nb_clusters, 0,
>>> -                   (*nb_clusters - old_nb_clusters) *
>>> -                   sizeof(**refcount_table));
>>> +            for (j = old_nb_clusters; j < *nb_clusters; j++) {
>>> +                s->set_refcount(*refcount_table, j, 0);
>>> +            }
>> Another memset pessimization. Maybe even have a callback to expand the
>> table, and factor out more of the common code of reallocating the table
>> and clearing all new entries.
>
> That sounds useful, yes. I'll look into it, but I'll probably still go
> without memset().
>
>>> @@ -1611,7 +1640,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>>      int ret;
>>>      if (!*refcount_table) {
>>> -        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
>>> +        *refcount_table = g_try_malloc0(*nb_clusters * s->refcount_bits / 8);
>> Feels like a step backwards in overflow detection?
>>
>>> @@ -1787,22 +1816,22 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>>>          *imrt_nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>>>          new_refcount_table = g_try_realloc(*refcount_table,
>>>                                             *imrt_nb_clusters *
>>> -                                           sizeof(**refcount_table));
>>> +                                           s->refcount_bits / 8);
>> Another possible overflow.
>>
>>>          if (!new_refcount_table) {
>>>              *imrt_nb_clusters = old_imrt_nb_clusters;
>>>              return -ENOMEM;
>>>          }
>>>          *refcount_table = new_refcount_table;
>>> -        memset(*refcount_table + old_imrt_nb_clusters, 0,
>>> -               (*imrt_nb_clusters - old_imrt_nb_clusters) *
>>> -               sizeof(**refcount_table));
>>> +        for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
>>> +            s->set_refcount(*refcount_table, i, 0);
>>> +        }
>>>      }
>> ...and another resize where we pessimize memset.
>>
>>> @@ -1911,12 +1940,11 @@ write_refblocks:
>>>          }
>>>          on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size);
>>> -        for (i = 0; i < s->refcount_block_size &&
>>> -             refblock_start + i < *nb_clusters; i++)
>>> -        {
>>> -            on_disk_refblock[i] =
>>> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
>>> -        }
>>> +
>>> +        memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
>>> +                                          (refblock_index << s->refcount_block_bits)),
>>> +               MIN(s->refcount_block_size, *nb_clusters - refblock_start)
>>> +               * s->refcount_bits / 8);
>> This one's different in that you move TO a memcpy instead of an
>> open-coded loop.
>
> Yes, because the set_refcount() and get_refcount() helpers store the
> refcounts already in big endian, so now we can directly use the data
> without conversion.
>
>> But I still worry if multiply before /8 could be a problem.
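To spell out why that memcpy is valid, here is a sketch under the
16-bit assumption, with explicit byte arithmetic instead of the
uintptr_t cast: the in-memory check array now uses the same big-endian
representation as an on-disk refcount block, so one refblock is simply
a byte slice of the array. (The function and parameter names are made
up for illustration.)

#include <stdint.h>
#include <string.h>

static void copy_refblock(void *on_disk_refblock,
                          const void *refcount_array,
                          uint64_t refblock_index,
                          uint64_t entries_to_copy,
                          uint64_t entries_per_block,
                          unsigned refcount_bits)
{
    /* byte offset of this refblock's first entry in the big array */
    const uint8_t *src = (const uint8_t *)refcount_array +
        refblock_index * entries_per_block * (refcount_bits / 8);

    memcpy(on_disk_refblock, src, entries_to_copy * (refcount_bits / 8));
}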
>>
>>> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>>          /* Because the old reftable has been exchanged for a new one the
>>>           * references have to be recalculated */
>>>          rebuild = false;
>>> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>>> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
>> Another /8 possible overflow.
>>
>>>          ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
>>>                                    &nb_clusters);
>>>          if (ret < 0) {
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index 0f8eb15..1c63221 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
>>>      QTAILQ_ENTRY(Qcow2DiscardRegion) next;
>>>  } Qcow2DiscardRegion;
>>> +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>>> +                                      uint64_t index);
>>> +typedef void Qcow2SetRefcountFunc(void *refcount_array,
>>> +                                  uint64_t index, uint64_t value);
>> Do you want int64_t for any of the types here, to make it obvious that
>> you can't exceed 2^63?
>
> Yes, I do.

To be honest, I just read your reply and not the original code (which
I'm doing now while working on v2). I think I don't want int64_t after
all. I generally do want to use int64_t for all refcount values, but in
this case, as these are just helpers for directly accessing refcount
arrays (which I think should not fail, see my reply for patch 8), I'd
rather keep them using uint64_t.

Max

>> Looks like you are on track to a sane conversion, but I'm worried enough
>> about the math that it probably needs a respin (either comments stating
>> why we know we don't overflow, or else safer constructs).
>
> I'll add a comment in the commit message why overflows do not really
> get more probable than they were before, but the real overflow
> prevention will happen in patch 7.
>
> I think I'll factor out the refcount array resize which automatically
> sets the newly allocated entries to zero. Now that I think about it...
> I can actually get away without sub-byte operations. Since .*realloc()
> will only allocate full bytes anyway, I only need to set that newly
> allocated area to zero (with memset()). So it'll be back to memset().
> (I'm still wondering why there's no g_try_realloc0() or something;
> probably because glib does not require the libc heap manager to know
> the exact size of each heap object, which would be necessary for that
> to work.)
>
> Max
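The factored-out resize described there could end up looking roughly
like this sketch (assumed name and signature, not the actual v2 code).
Since realloc() hands out whole bytes, zeroing the newly allocated tail
with memset() works even for sub-byte refcount entries; this is
effectively the missing g_try_realloc0():

#include <glib.h>
#include <string.h>

static void *realloc_refcount_array(void *array, size_t old_bytes,
                                    size_t new_bytes)
{
    void *new_array = g_try_realloc(array, new_bytes);

    if (!new_array && new_bytes > 0) {
        return NULL; /* allocation failed; caller keeps the old array */
    }
    if (new_bytes > old_bytes) {
        /* zero only the bytes that did not exist before */
        memset((char *)new_array + old_bytes, 0, new_bytes - old_bytes);
    }
    return new_array;
}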