qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>, Peter Lieven <pl@kamp.de>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification
Date: Tue, 11 Nov 2014 09:35:29 +0100	[thread overview]
Message-ID: <5461CA51.2080403@redhat.com> (raw)
In-Reply-To: <54613C7E.4020707@redhat.com>

On 2014-11-10 at 23:30, Eric Blake wrote:
> On 11/10/2014 06:45 AM, Max Reitz wrote:
>> Since refcounts do not always have to be a uint16_t, all refcount blocks
>> and arrays in memory should not have a specific type (thus they become
>> pointers to void) and for accessing them, two helper functions are used
>> (a getter and a setter). Those functions are called indirectly through
>> function pointers in the BDRVQcowState so they may later be exchanged
>> for different refcount orders.
>>
>> At the same time, replace all sizeof(**refcount_table) etc. in the qcow2
>> check code by s->refcount_bits / 8. Note that this might lead to wrong
>> values due to truncating division, but currently s->refcount_bits is
>> always 16, and before the upcoming patch which removes this limitation
>> another patch will make the division round up correctly.
> Thanks for pointing out that this transition is still in progress, and
> needs more patches.  I agree that for this patch, the division is safe.
>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 152 +++++++++++++++++++++++++++++--------------------
>>   block/qcow2.h          |   8 +++
>>   2 files changed, 98 insertions(+), 62 deletions(-)
>>
>> @@ -116,20 +137,24 @@ int64_t qcow2_get_refcount(BlockDriverState *bs, int64_t cluster_index)
>>       }
>>   
>>       ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> -        (void**) &refcount_block);
>> +                          &refcount_block);
>>       if (ret < 0) {
>>           return ret;
>>       }
>>   
>>       block_index = cluster_index & (s->refcount_block_size - 1);
>> -    refcount = be16_to_cpu(refcount_block[block_index]);
>> +    refcount = s->get_refcount(refcount_block, block_index);
>>   
>> -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
>> -        (void**) &refcount_block);
>> +    ret = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
>>       if (ret < 0) {
>>           return ret;
>>       }
>>   
>> +    if (refcount < 0) {
>> +        /* overflow */
>> +        return -ERANGE;
>> +    }
> Should you be checking for overflow prior to calling qcow2_cache_put?

I don't think so; we want to free the cache entry reference even if the 
refblock seems unusable.

>> @@ -362,7 +387,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>           s->cluster_size;
>>       uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
>>       uint64_t *new_table = g_try_new0(uint64_t, table_size);
>> -    uint16_t *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
>> +    void *new_blocks = g_try_malloc0(blocks_clusters * s->cluster_size);
> Can this multiplication ever overflow?  Would we be better off with a
> g_try_new0 approach?

Yes, you're right. Patch 7 introduces a helper function which will allow 
checking such overflows in a central place, but I haven't done so in 
this version.

>> @@ -1118,12 +1142,13 @@ fail:
>>    */
>>   static int inc_refcounts(BlockDriverState *bs,
>>                            BdrvCheckResult *res,
>> -                         uint16_t **refcount_table,
>> +                         void **refcount_table,
>>                            int64_t *refcount_table_size,
>>                            int64_t offset, int64_t size)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> -    uint64_t start, last, cluster_offset, k;
>> +    uint64_t start, last, cluster_offset, k, refcount;
> Why uint64_t, when you limit to int64_t in other patches?

Because we don't need to check for errors here. But I guess we should 
really use int64_t everywhere to be consistent (so that if something 
breaks because a cluster has more than INT64_MAX references, it breaks 
everywhere).

>> +    int64_t i;
>>   
>>       if (size <= 0) {
>>           return 0;
>> @@ -1136,12 +1161,12 @@ static int inc_refcounts(BlockDriverState *bs,
>>           k = cluster_offset >> s->cluster_bits;
>>           if (k >= *refcount_table_size) {
>>               int64_t old_refcount_table_size = *refcount_table_size;
>> -            uint16_t *new_refcount_table;
>> +            void *new_refcount_table;
>>   
>>               *refcount_table_size = k + 1;
>>               new_refcount_table = g_try_realloc(*refcount_table,
>>                                                  *refcount_table_size *
>> -                                               sizeof(**refcount_table));
>> +                                               s->refcount_bits / 8);
> This multiplies before dividing.  Can it ever overflow, where writing
> *refcount_table_size * (s->refcount_bits / 8) would be safer?  Also, is
> it better to use a malloc variant that checks for overflow (I think it
> is g_try_renew?) instead of open-coding the multiply?
>
>>               if (!new_refcount_table) {
>>                   *refcount_table_size = old_refcount_table_size;
>>                   res->check_errors++;
>> @@ -1149,16 +1174,19 @@ static int inc_refcounts(BlockDriverState *bs,
>>               }
>>               *refcount_table = new_refcount_table;
>>   
>> -            memset(*refcount_table + old_refcount_table_size, 0,
>> -                   (*refcount_table_size - old_refcount_table_size) *
>> -                   sizeof(**refcount_table));
>> +            for (i = old_refcount_table_size; i < *refcount_table_size; i++) {
>> +                s->set_refcount(*refcount_table, i, 0);
>> +            }
> This feels slower than memset.

It is, yes. But this is the check function, I don't think performance is 
all that important here (especially not operations in RAM).

> Any chance we can add an optimization
> that brings back the speed of memset (may require an additional callback
> in addition to the getter and setter)?

For sub-byte refcount widths, we would have to manually set all non-byte 
aligned entries to 0 and then use memset() on the rest. Not impossible, 
but I think too complicated for a place where performance is not critical.

>> @@ -1178,7 +1206,7 @@ enum {
>>    * error occurred.
>>    */
>>   static int check_refcounts_l2(BlockDriverState *bs, BdrvCheckResult *res,
>> -    uint16_t **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>> +    void **refcount_table, int64_t *refcount_table_size, int64_t l2_offset,
>>       int flags)
> I noticed you cleaned up indentation in a lot of the patch, but not
> here.  Any reason?

Maybe I remembered touching that header once already and not fixing up 
the indentation. Will do in v2.

>> @@ -1541,7 +1569,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>   
>>                   new_refcount_table = g_try_realloc(*refcount_table,
>>                                                      *nb_clusters *
>> -                                                   sizeof(**refcount_table));
>> +                                                   s->refcount_bits / 8);
> Another possible overflow or g_try_renew site?
>
>>                   if (!new_refcount_table) {
>>                       *nb_clusters = old_nb_clusters;
>>                       res->check_errors++;
>> @@ -1549,9 +1577,9 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>                   }
>>                   *refcount_table = new_refcount_table;
>>   
>> -                memset(*refcount_table + old_nb_clusters, 0,
>> -                       (*nb_clusters - old_nb_clusters) *
>> -                       sizeof(**refcount_table));
>> +                for (j = old_nb_clusters; j < *nb_clusters; j++) {
>> +                    s->set_refcount(*refcount_table, j, 0);
>> +                }
> Another memset pessimation.  Maybe even having a callback to expand the
> table, and factor out more of the common code of reallocating the table
> and clearing all new entries.

That sounds useful, yes. I'll look into it, but I'll probably still go 
without memset().

>> @@ -1611,7 +1640,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>       int ret;
>>   
>>       if (!*refcount_table) {
>> -        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
>> +        *refcount_table = g_try_malloc0(*nb_clusters * s->refcount_bits / 8);
> Feels like a step backwards in overflow detection?
>
>> @@ -1787,22 +1816,22 @@ static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>>           *imrt_nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>>           new_refcount_table = g_try_realloc(*refcount_table,
>>                                              *imrt_nb_clusters *
>> -                                           sizeof(**refcount_table));
>> +                                           s->refcount_bits / 8);
> Another possible overflow
>
>>           if (!new_refcount_table) {
>>               *imrt_nb_clusters = old_imrt_nb_clusters;
>>               return -ENOMEM;
>>           }
>>           *refcount_table = new_refcount_table;
>>   
>> -        memset(*refcount_table + old_imrt_nb_clusters, 0,
>> -               (*imrt_nb_clusters - old_imrt_nb_clusters) *
>> -               sizeof(**refcount_table));
>> +        for (i = old_imrt_nb_clusters; i < *imrt_nb_clusters; i++) {
>> +            s->set_refcount(*refcount_table, i, 0);
>> +        }
>>       }
> and another resize where we pessimize memset
>
>
>> @@ -1911,12 +1940,11 @@ write_refblocks:
>>           }
>>   
>>           on_disk_refblock = qemu_blockalign0(bs->file, s->cluster_size);
>> -        for (i = 0; i < s->refcount_block_size &&
>> -                    refblock_start + i < *nb_clusters; i++)
>> -        {
>> -            on_disk_refblock[i] =
>> -                cpu_to_be16((*refcount_table)[refblock_start + i]);
>> -        }
>> +
>> +        memcpy(on_disk_refblock, (void *)((uintptr_t)*refcount_table +
>> +                                 (refblock_index << s->refcount_block_bits)),
>> +               MIN(s->refcount_block_size, *nb_clusters - refblock_start)
>> +               * s->refcount_bits / 8);
>>   
> This one's different in that you move TO a memcpy instead of open-coded
> loop.

Yes, because the set_refcount() and get_refcount() helpers store the 
refcounts already in big endian, so now we can directly use the data 
without conversion.

> But I still worry if multiply before /8 could be a problem.
>
>> @@ -2064,7 +2092,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           /* Because the old reftable has been exchanged for a new one the
>>            * references have to be recalculated */
>>           rebuild = false;
>> -        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
>> +        memset(refcount_table, 0, nb_clusters * s->refcount_bits / 8);
> Another /8 possible overflow.
>
>>           ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
>>                                     &nb_clusters);
>>           if (ret < 0) {
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 0f8eb15..1c63221 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -213,6 +213,11 @@ typedef struct Qcow2DiscardRegion {
>>       QTAILQ_ENTRY(Qcow2DiscardRegion) next;
>>   } Qcow2DiscardRegion;
>>   
>> +typedef uint64_t Qcow2GetRefcountFunc(const void *refcount_array,
>> +                                      uint64_t index);
>> +typedef void Qcow2SetRefcountFunc(void *refcount_array,
>> +                                  uint64_t index, uint64_t value);
> Do you want int64_t for any of the types here, to make it obvious that
> you can't exceed 2^63?

Yes, I do.

> Looks like you are on track to a sane conversion, but I'm worried enough
> about the math that it probably needs a respin (either comments stating
> why we know we don't overflow, or else safer constructs).

I'll add a comment in the commit message why overflows do not really get 
more probable than they were before, but the real overflow prevention 
will happen in patch 7.

I think I'll factor out the refcount array resize which automatically 
sets the newly allocated entries to zero. Now that I think about it... I 
can actually get away without sub-byte operations. Since .*realloc() 
will only allocate full bytes anyway, I only need to set that newly 
allocated area to zero (with memset()). So it'll be back to memset(). 
(I'm still wondering why there's no g_try_realloc0() or something; 
probably because the glib does not require the libc heap manager to know 
the exact size of each heap object which would be necessary for that to 
work)

Max

  reply	other threads:[~2014-11-11  8:35 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-11-10 13:45 [Qemu-devel] [PATCH 00/21] qcow2: Support refcount orders != 4 Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 01/21] qcow2: Add two new fields to BDRVQcowState Max Reitz
2014-11-10 19:00   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 02/21] qcow2: Add refcount_width to format-specific info Max Reitz
2014-11-10 19:06   ` Eric Blake
2014-11-11  8:11     ` Max Reitz
2014-11-11 15:49       ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 03/21] qcow2: Use 64 bits for refcount values Max Reitz
2014-11-10 20:59   ` Eric Blake
2014-11-11  8:12     ` Max Reitz
2014-11-11  9:22     ` Kevin Wolf
2014-11-11  9:25       ` Max Reitz
2014-11-11  9:26         ` Max Reitz
2014-11-11  9:36         ` Kevin Wolf
2014-11-10 13:45 ` [Qemu-devel] [PATCH 04/21] qcow2: Respect error in qcow2_alloc_bytes() Max Reitz
2014-11-10 21:05   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes() Max Reitz
2014-11-10 21:12   ` Eric Blake
2014-11-11  8:22     ` Max Reitz
2014-11-11 16:13       ` Eric Blake
2014-11-11 16:18         ` Max Reitz
2014-11-11 19:49           ` Eric Blake
2014-11-12  8:52             ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 06/21] qcow2: Helper function for refcount modification Max Reitz
2014-11-10 22:30   ` Eric Blake
2014-11-11  8:35     ` Max Reitz [this message]
2014-11-11  9:43       ` Max Reitz
2014-11-11 10:56       ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 07/21] qcow2: Helper for refcount array size calculation Max Reitz
2014-11-10 22:49   ` Eric Blake
2014-11-11  8:37     ` Max Reitz
2014-11-11 10:08       ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 08/21] qcow2: More helpers for refcount modification Max Reitz
2014-11-11  0:29   ` Eric Blake
2014-11-11  8:42     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 09/21] qcow2: Open images with refcount order != 4 Max Reitz
2014-11-10 17:03   ` Eric Blake
2014-11-10 17:06     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 10/21] qcow2: refcount_order parameter for qcow2_create2 Max Reitz
2014-11-11  5:40   ` Eric Blake
2014-11-11  8:48     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option Max Reitz
2014-11-11 17:57   ` Eric Blake
2014-11-12  8:41     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4 Max Reitz
2014-11-11 18:05   ` Eric Blake
2014-11-12  8:47     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 13/21] block: Add opaque value to the amend CB Max Reitz
2014-11-11 18:08   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options() Max Reitz
2014-11-11 18:11   ` Eric Blake
2014-11-12  8:47     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false) Max Reitz
2014-11-11 18:12   ` Eric Blake
2014-11-12  8:48     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 16/21] qcow2: Split upgrade/downgrade paths for amend Max Reitz
2014-11-11 18:14   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB " Max Reitz
2014-11-11 21:05   ` Eric Blake
2014-11-12  9:10     ` Max Reitz
2014-11-10 13:45 ` [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment Max Reitz
2014-11-12  4:15   ` Eric Blake
2014-11-12  9:55     ` Max Reitz
2014-11-12 13:50       ` Eric Blake
2014-11-12 14:19   ` Eric Blake
2014-11-12 14:21     ` Max Reitz
2014-11-12 17:45       ` Max Reitz
2014-11-12 20:21         ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 19/21] qcow2: Invoke refcount order amendment function Max Reitz
2014-11-12  4:36   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 20/21] qcow2: Point to amend function in check Max Reitz
2014-11-12  4:38   ` Eric Blake
2014-11-10 13:45 ` [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths Max Reitz
2014-11-11 19:53   ` Eric Blake
2014-11-12  8:58     ` Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5461CA51.2080403@redhat.com \
    --to=mreitz@redhat.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).