qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Peter Lieven <pl@kamp.de>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
Date: Tue, 11 Nov 2014 10:43:06 +0100	[thread overview]
Message-ID: <5461DA2A.6030303@redhat.com> (raw)
In-Reply-To: <5461CA51.2080403@redhat.com>

On 2014-11-11 at 09:35, Max Reitz wrote:
> On 2014-11-10 at 23:30, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Since refcounts do not always have to be a uint16_t, all refcount 
>>> blocks
>>> and arrays in memory should not have a specific type (thus they become
>>> pointers to void) and for accessing them, two helper functions are used
>>> (a getter and a setter). Those functions are called indirectly through
>>> function pointers in the BDRVQcowState so they may later be exchanged
>>> for different refcount orders.
>>>
>>> At the same time, replace all sizeof(**refcount_table) etc. in the 
>>> qcow2
>>> check code by s->refcount_bits / 8. Note that this might lead to wrong
>>> values due to truncating division, but currently s->refcount_bits is
>>> always 16, and before the upcoming patch which removes this limitation
>>> another patch will make the division round up correctly.
>> Thanks for pointing out that this transition is still in progress, and
>> needs more patches.  I agree that for this patch, the division is safe.
>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> ---
>>>   block/qcow2-refcount.c | 152 
>>> +++++++++++++++++++++++++++++--------------------
>>>   block/qcow2.h          |   8 +++
>>>   2 files changed, 98 insertions(+), 62 deletions(-)
>>>
>>> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState 
>>> *bs, int64_t cluster_index)
>>>       }
>>>         ret = qcow2_cache_get(bs, s->refcount_block_cache, 
>>> refcount_block_offset,
>>> -        (void**) &refcount_block);
>>> +                          &refcount_block);
>>>       if (ret < 0) {
>>>           return ret;
>>>       }
>>>         block_index = cluster_index & (s->refcount_block_size - 1);
>>> -    refcount = be16_to_cpu(refcount_block[block_index]);
>>> +    refcount = s->get_refcount(refcount_block, block_index);
>>>   -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
>>> -        (void**) &refcount_block);
>>> +    ret = qcow2_cache_put(bs, s->refcount_block_cache, 
>>> &refcount_block);
>>>       if (ret < 0) {
>>>           return ret;
>>>       }
>>>   +    if (refcount < 0) {
>>> +        /* overflow */
>>> +        return -ERANGE;
>>> +    }
>> Should you be checking for overflow prior to calling qcow2_cache_put?
>
> I don't think so; we want to free the cache entry reference even if 
> the refblock seems unusable.
>
>>> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState 
>>> *bs,
>>>           s->cluster_size;
>>>       uint64_t table_offset = meta_offset + blocks_clusters * 
>>> s->cluster_size;
>>>       uint64_t *new_table = g_try_new0(uint64_t, table_size);
>>> -    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * 
>>> s->cluster_size);
>>> +    void *new_blocks = g_try_malloc0(blocks_clusters * 
>>> s->cluster_size);
>> Can this multiplication ever overflow?  Would we be better off with a
>> g_try_new0 approach?
>
> Yes, you're right. Patch 7 introduces a helper function which will 
> allow checking such overflows in a central place, but I haven't done 
> so in this version.
>
>>> @@ -1118,12 +1142,13 @@ fail:
>>>    */
>>>   static int inc_refcounts(BlockDriverState *bs,
>>>                            BdrvCheckResult *res,
>>> -                         uint16_t **refcount_table,
>>> +                         void **refcount_table,
>>>                            int64_t *refcount_table_size,
>>>                            int64_t offset, int64_t size)
>>>   {
>>>       BDRVQcowState *s = bs->opaque;
>>> -    uint64_t start, last, cluster_offset, k;
>>> +    uint64_t start, last, cluster_offset, k, refcount;
>> Why uint64_t, when you limit to int64_t in other patches?
>
> Because we don't need to check for errors here. But I guess we should 
> really use int64_t everywhere to be consistent (so that if something 
> breaks because a cluster has more than INT64_MAX references, it breaks 
> everywhere).
>
>>> +    int64_t i;
>>>         if (size <= 0) {
>>>           return 0;
>>> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
>>>           k = cluster_offset >> s->cluster_bits;
>>>           if (k >= *refcount_table_size) {
>>>               int64_t old_refcount_table_size = *refcount_table_size;
>>> -            uint16_t *new_refcount_table;
>>> +            void *new_refcount_table;
>>>                 *refcount_table_size = k + 1;
>>>               new_refcount_table = g_try_realloc(*refcount_table,
>>> *refcount_table_size *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> This multiplies before dividing.  Can it ever overflow, where writing
>> *refcount_table_size * (s->refcount_bits / 8) would be safer?  Also, is
>> it better to use a malloc variant that checks for overflow (I think it
>> is g_try_renew?) instead of open-coding the multiply?
>>
>>>               if (!new_refcount_table) {
>>>                   *refcount_table_size = old_refcount_table_size;
>>>                   res->check_errors++;
>>> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
>>>               }
>>>               *refcount_table = new_refcount_table;
>>>   -            memset(*refcount_table + old_refcount_table_size, 0,
>>> -                   (*refcount_table_size - old_refcount_table_size) *
>>> -                   sizeof(**refcount_table));
>>> +            for (i = old_refcount_table_size; i < 
>>> *refcount_table_size; i++) {
>>> +                s->set_refcount(*refcount_table, i, 0);
>>> +            }
>> This feels slower than memset.
>
> It is, yes. But this is the check function, I don't think performance 
> is all that important here (especially not operations in RAM).
>
>> Any chance we can add an optimization
>> that brings back the speed of memset (may require an additional callback
>> in addition to the getter and setter)?
>
> For sub-byte refcount widths, we would have to manually set all 
> non-byte aligned entries to 0 and then use memset() on the rest. Not 
> impossible, but I think too complicated for a place where performance 
> is not critical.
>
>>> @@ -1178,7 +1206,7 @@ enum {
>>>    * error occurred.
>>>    */
>>>   static int check_refcounts_l2(BlockDriverState *bs, 
>>> BdrvCheckResult *res,
>>> -    uint16_t **refcount_table, int64_t *refcount_table_size, 
>>> int64_t l2_offset,
>>> +    void **refcount_table, int64_t *refcount_table_size, int64_t 
>>> l2_offset,
>>>       int flags)
>> I noticed you cleaned up indentation in a lot of the patch, but not
>> here.  Any reason?
>
> Maybe I remembered touching that header once already and not fixing up 
> the indentation. Will do in v2.
>
>>> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>                     new_refcount_table = g_try_realloc(*refcount_table,
>>> *nb_clusters *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> Another possible overflow or g_try_renew site?
>>
>>>                   if (!new_refcount_table) {
>>>                       *nb_clusters = old_nb_clusters;
>>>                       res->check_errors++;
>>> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>                   }
>>>                   *refcount_table = new_refcount_table;
>>>   -                memset(*refcount_table + old_nb_clusters, 0,
>>> -                       (*nb_clusters - old_nb_clusters) *
>>> -                       sizeof(**refcount_table));
>>> +                for (j = old_nb_clusters; j < *nb_clusters; j++) {
>>> +                    s->set_refcount(*refcount_table, j, 0);
>>> +                }
>> Another memset pessimation.  Maybe even having a callback to expand the
>> table, and factor out more of the common code of reallocating the table
>> and clearing all new entries.
>
> That sounds useful, yes. I'll look into it, but I'll probably still go 
> without memset().
>
>>> @@ -1611,7 +1640,7 @@ static int 
>>> calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>>       int ret;
>>>         if (!*refcount_table) {
>>> -        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
>>> +        *refcount_table = g_try_malloc0(*nb_clusters * 
>>> s->refcount_bits / 8);
>> Feels like a step backwards in overflow detection?
>>
>>> @@ -1787,22 +1816,22 @@ static int64_t 
>>> alloc_clusters_imrt(BlockDriverState *bs,
>>>           *imrt_nb_clusters = cluster + cluster_count - 
>>> contiguous_free_clusters;
>>>           new_refcount_table = g_try_realloc(*refcount_table,
>>>                                              *imrt_nb_clusters *
>>> - sizeof(**refcount_table));
>>> + s->refcount_bits / 8);
>> Another possible overflow
>>
>>>           if (!new_refcount_table) {
>>>               *imrt_nb_clusters = old_imrt_nb_clusters;
>>>               return -ENOMEM;
>>>           }
>>>           *refcount_table = new_refcount_table;
>>>   -        memset(*refcount_table + old_imrt_nb_clusters, 0,
>>> -               (*imrt_nb_clusters - old_imrt_nb_clusters) *
>>> -               sizeof(**refcount_table));
>>> +        for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
>>> +            s->set_refcount(*refcount_table, i, 0);
>>> +        }
>>>       }
>> and another resize where we pessimize memset
>>
>>
>>> @@ -1911,12 +1940,11 @@ write_refblocks:
>>>           }
>>>             on_disk_refblock = qemu_blockalign0(bs->file, 
>>> s->cluster_size);
>>> -        for (i = 0; i < s->refcount_block_size &&
>>> -                    refblock_start + i < *nb_clusters; i++)
>>> -        {
>>> -            on_disk_refblock[i] =
>>> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
>>> -        }
>>> +
>>> +        memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
>>> +                                 (refblock_index << 
>>> s->refcount_block_bits)),
>>> +               MIN(s->refcount_block_size, *nb_clusters - 
>>> refblock_start)
>>> +               * s->refcount_bits / 8);
>> This one's different in that you move TO a memcpy instead of open-coded
>> loop.
>
> Yes, because the set_refcount() and get_refcount() helpers store the 
> refcounts already in big endian, so now we can directly use the data 
> without conversion.
>
>> But I still worry if multiply before /8 could be a problem.
>>
>>> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState 
>>> *bs, BdrvCheckResult *res,
>>>           /* Because the old reftable has been exchanged for a new 
>>> one the
>>>            * references have to be recalculated */
>>>           rebuild = false;
>>> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>>> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
>> Another /8 possible overflow.
>>
>>>           ret = calculate_refcounts(bs, res, 0, &rebuild, 
>>> &refcount_table,
>>>                                     &nb_clusters);
>>>           if (ret < 0) {
>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>> index 0f8eb15..1c63221 100644
>>> --- a/block/qcow2.h
>>> +++ b/block/qcow2.h
>>> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
>>>       QTAILQ_ENTRY(Qcow2DiscardRegion) next;
>>>   } Qcow2DiscardRegion;
>>>   +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>>> +                                      uint64_t index);
>>> +typedef void Qcow2SetRefcountFunc(void *refcount_array,
>>> +                                  uint64_t index, uint64_t value);
>> Do you want int64_t for any of the types here, to make it obvious that
>> you can't exceed 2^63?
>
> Yes, I do.

To be honest, I just read your reply and not the original code (which 
I'm doing now while working on v2). I think I don't want int64_t after 
all. I generally do want to use int64_t for all refcount values, but in 
this case, as these are just helpers for directly accessing refcount 
arrays (which I think should not fail, see my reply for patch 8), I'd 
rather keep them using uint64_t.

Max

>> Looks like you are on track to a sane conversion, but I'm worried enough
>> about the math that it probably needs a respin (either comments stating
>> why we know we don't overflow, or else safer constructs).
>
> I'll add a comment in the commit message why overflows do not really 
> get more probable than they were before, but the real overflow 
> prevention will happen in patch 7.
>
> I think I'll factor out the refcount array resize which automatically 
> sets the newly allocated entries to zero. Now that I think about it... 
> I can actually get away without sub-byte operations. Since .*realloc() 
> will only allocate full bytes anyway, I only need to set that newly 
> allocated area to zero (with memset()). So it'll be back to memset(). 
> (I'm still wondering why there's no g_try_realloc0() or something; 
> probably because the glib does not require the libc heap manager to 
> know the exact size of each heap object which would be necessary for 
> that to work)
>
> Max

  reply	other threads:[~2014-11-11  9:43 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
2014-11-10 19:00   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
2014-11-10 19:06   ` Eric Blake
2014-11-11  8:11     ` Max Reitz
2014-11-11 15:49       ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values Max Reitz
2014-11-10 20:59   ` Eric Blake
2014-11-11  8:12     ` Max Reitz
2014-11-11  9:22     ` Kevin Wolf
2014-11-11  9:25       ` Max Reitz
2014-11-11  9:26         ` Max Reitz
2014-11-11  9:36         ` Kevin Wolf
2014-11-10 13:45 ` [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
2014-11-10 21:05   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
2014-11-10 21:12   ` Eric Blake
2014-11-11  8:22     ` Max Reitz
2014-11-11 16:13       ` Eric Blake
2014-11-11 16:18         ` Max Reitz
2014-11-11 19:49           ` Eric Blake
2014-11-12  8:52             ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification Max Reitz
2014-11-10 22:30   ` Eric Blake
2014-11-11  8:35     ` Max Reitz
2014-11-11  9:43       ` Max Reitz [this message]
2014-11-11 10:56       ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation Max Reitz
2014-11-10 22:49   ` Eric Blake
2014-11-11  8:37     ` Max Reitz
2014-11-11 10:08       ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification Max Reitz
2014-11-11  0:29   ` Eric Blake
2014-11-11  8:42     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4 Max Reitz
2014-11-10 17:03   ` Eric Blake
2014-11-10 17:06     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
2014-11-11  5:40   ` Eric Blake
2014-11-11  8:48     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option Max Reitz
2014-11-11 17:57   ` Eric Blake
2014-11-12  8:41     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
2014-11-11 18:05   ` Eric Blake
2014-11-12  8:47     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB Max Reitz
2014-11-11 18:08   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
2014-11-11 18:11   ` Eric Blake
2014-11-12  8:47     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
2014-11-11 18:12   ` Eric Blake
2014-11-12  8:48     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
2014-11-11 18:14   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB " Max Reitz
2014-11-11 21:05   ` Eric Blake
2014-11-12  9:10     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment Max Reitz
2014-11-12  4:15   ` Eric Blake
2014-11-12  9:55     ` Max Reitz
2014-11-12 13:50       ` Eric Blake
2014-11-12 14:19   ` Eric Blake
2014-11-12 14:21     ` Max Reitz
2014-11-12 17:45       ` Max Reitz
2014-11-12 20:21         ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function Max Reitz
2014-11-12  4:36   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check Max Reitz
2014-11-12  4:38   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths Max Reitz
2014-11-11 19:53   ` Eric Blake
2014-11-12  8:58     ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5461DA2A.6030303@redhat.com \
    --to=mreitz@redhat.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).