[Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing
@ 2014-08-29 21:40 Max Reitz
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count Max Reitz
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

As can be seen in the final patch of this series, there are certain
cases where the current repair implementation of qcow2 actually damages
the image further because it allocates new clusters for the refcount
structure which overlap with existing but according to the on-disk
refcounts (which are assumed to be wrong to begin with) unallocated
clusters.

This series fixes this by completely recreating the refcount structure
based on the in-memory information calculated during the check operation
if the possibility of damaging the image while repairing the refcount
structures in-place exists.


v5:
- Added patch 1 which adds two helper variables to BDRVQcowState
  reflecting the number of entries per refcount block;
  in contrast to v4, we don't need to clamp the refcount order against
  sub-byte widths, because sub-byte widths are actually correct (that
  means, I dropped the MAX() around refcount_order - 3)
- Patch 8 (prev. 7):
  - Use these new variables [Benoît]
  - Use a struct for rt_offset_and_clusters [Benoît]


git-backport-diff against v4:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/11:[down] 'qcow2: Calculate refcount block entry count'
002/11:[----] [--] 'qcow2: Fix leaks in dirty images'
003/11:[----] [--] 'qcow2: Split qcow2_check_refcounts()'
004/11:[----] [--] 'qcow2: Pull check_refblocks() up'
005/11:[----] [--] 'qcow2: Reuse refcount table in calculate_refcounts()'
006/11:[----] [--] 'qcow2: Fix refcount blocks beyond image end'
007/11:[----] [--] 'qcow2: Do not perform potentially damaging repairs'
008/11:[0025] [FC] 'qcow2: Rebuild refcount structure during check'
009/11:[----] [--] 'qcow2: Clean up after refcount rebuild'
010/11:[----] [--] 'iotests: Fix test outputs'
011/11:[----] [-C] 'iotests: Add test for potentially damaging repairs'


Max Reitz (11):
  qcow2: Calculate refcount block entry count
  qcow2: Fix leaks in dirty images
  qcow2: Split qcow2_check_refcounts()
  qcow2: Pull check_refblocks() up
  qcow2: Reuse refcount table in calculate_refcounts()
  qcow2: Fix refcount blocks beyond image end
  qcow2: Do not perform potentially damaging repairs
  qcow2: Rebuild refcount structure during check
  qcow2: Clean up after refcount rebuild
  iotests: Fix test outputs
  iotests: Add test for potentially damaging repairs

 block/qcow2-refcount.c     | 677 ++++++++++++++++++++++++++++++++-------------
 block/qcow2.c              |   4 +-
 block/qcow2.h              |   2 +
 tests/qemu-iotests/039.out |  10 +-
 tests/qemu-iotests/060.out |  10 +-
 tests/qemu-iotests/061.out |  18 +-
 tests/qemu-iotests/104     | 141 ++++++++++
 tests/qemu-iotests/104.out | 110 ++++++++
 tests/qemu-iotests/group   |   1 +
 9 files changed, 767 insertions(+), 206 deletions(-)
 create mode 100755 tests/qemu-iotests/104
 create mode 100644 tests/qemu-iotests/104.out

-- 
2.1.0

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
@ 2014-08-29 21:40 ` Max Reitz
  2014-08-29 23:03   ` Eric Blake
                     ` (2 more replies)
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 02/11] qcow2: Fix leaks in dirty images Max Reitz
                   ` (10 subsequent siblings)
  11 siblings, 3 replies; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

The size of a refblock entry is (in theory) variable; calculate
therefore the number of entries per refblock and the according bit shift
(1 << x == entry count) when opening an image.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 2 ++
 block/qcow2.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index f9e045f..172ad00 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
 
     s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
     s->l2_size = 1 << s->l2_bits;
+    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
+    s->refcount_block_size = 1 << s->refcount_block_bits;
     bs->total_sectors = header.size / 512;
     s->csize_shift = (62 - (s->cluster_bits - 8));
     s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
diff --git a/block/qcow2.h b/block/qcow2.h
index 6aeb7ea..7c01fb7 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -222,6 +222,8 @@ typedef struct BDRVQcowState {
     int l2_size;
     int l1_size;
     int l1_vm_state_index;
+    int refcount_block_bits;
+    int refcount_block_size;
     int csize_shift;
     int csize_mask;
     uint64_t cluster_offset_mask;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count Max Reitz
@ 2014-08-29 23:03   ` Eric Blake
  2014-09-02 18:56     ` Max Reitz
  2014-10-10 12:29   ` Benoît Canet
  2014-10-20 14:25   ` Kevin Wolf
  2 siblings, 1 reply; 34+ messages in thread
From: Eric Blake @ 2014-08-29 23:03 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet

[-- Attachment #1: Type: text/plain, Size: 2614 bytes --]

On 08/29/2014 03:40 PM, Max Reitz wrote:
> The size of a refblock entry is (in theory) variable; calculate
> therefore the number of entries per refblock and the according bit shift
> (1 << x == entry count) when opening an image.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 2 ++
>  block/qcow2.h | 2 ++
>  2 files changed, 4 insertions(+)

What is the maximum refcount_order?  The specs don't mention; the file
format is wide open to overflows.  Even something as benign-sounding as
refcount_order=6 (64 bits) means that each cluster can be referenced
2**64 times, which is far longer than our lifetimes to build it up that
high incrementally, and represents far greater than the amount of
storage in existence being deduplicated!  Shockingly easy to start
getting into undefined territory, so maybe we ought to explicitly cap
refcount_order to 6.

> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f9e045f..172ad00 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>  
>      s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
>      s->l2_size = 1 << s->l2_bits;
> +    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);

Hmm; we document that qemu requires cluster_bits to be between 9 and 21
inclusive.  So, if cluster_bits is 9 (512-byte clusters), and
refcount_order is 6, then we can pack in 9 - (6 - 3) or 2**6 (that is,
64) refcounts per cluster.

On the other extreme, the minimum refcount_order of 0 (each cluster
occupies refcount bits, and so is either allocated or not, but no
sharing), starts having the math looks ugly, because you are mixing:

(int) = (uint32_t) - ( (uint32_t) - (int) )

so at one point, you are doing s->cluster_bits - (4294967293U), but that
wraps around (thankfully, wraparound is well-defined on unsigned types)
for a net answer of cluster_bits + 3.  But in the worst case, that means
an image with 2M clusters will be packing 21 - (0 - 3) or 2**24 (that
is, 16M) refcounts in one cluster.  Still fits in an int, so it looks
like you are safe...

> +    s->refcount_block_size = 1 << s->refcount_block_bits;

...that this particular shift will not cause undefined behavior, for
reasonable refcount_order in the range [0,6].

Reviewed-by: Eric Blake <eblake@redhat.com>

[We really ought to tighten the qcow2 spec - but that's a separate patch]

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-08-29 23:03   ` Eric Blake
@ 2014-09-02 18:56     ` Max Reitz
  0 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-09-02 18:56 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet

On 30.08.2014 01:03, Eric Blake wrote:
> On 08/29/2014 03:40 PM, Max Reitz wrote:
>> The size of a refblock entry is (in theory) variable; calculate
>> therefore the number of entries per refblock and the according bit shift
>> (1 << x == entry count) when opening an image.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 2 ++
>>   block/qcow2.h | 2 ++
>>   2 files changed, 4 insertions(+)
> What is the maximum refcount_order?  The specs don't mention; the file
> format is wide open to overflows.  Even something as benign-sounding as
> refcount_order=6 (64 bits) means that each cluster can be referenced
> 2**64 times, which is far longer than our lifetimes to build it up that
> high incrementally, and represents far greater than the amount of
> storage in existence being deduplicated!  Shockingly easy to start
> getting into undefined territory, so maybe we ought to explicitly cap
> refcount_order to 6.

Well, the most obvious issue to me is that qcow2 only supports 64 bit 
offsets and sizes etc., so it shouldn't have refcounts wider than 64 bits.

On the other hand, it is probably possible to generate a valid image 
with a cluster having a refcount which exceeds 2^{64} - 1: Set the 
virtual size to (2^{64} - cluster_size), use a single data cluster for 
all virtual clusters which makes its refcount (2^{64} - cluster_size) / 
cluster_size (which would be 2^{55} - 1 in the most extreme case). Then 
you create cluster_size snapshots and the refcount of that cluster is 
now at (2^{55} - 1) * (1 + 512) = 2^{64} + 2^{55} - 512 - 1 >= 2^{64}.

But that would be crazy, so I think it very reasonable to forbid 
refcount_order > 6, too.

>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index f9e045f..172ad00 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>   
>>       s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
>>       s->l2_size = 1 << s->l2_bits;
>> +    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
> Hmm; we document that qemu requires cluster_bits to be between 9 and 21
> inclusive.  So, if cluster_bits is 9 (512-byte clusters), and
> refcount_order is 6, then we can pack in 9 - (6 - 3) or 2**6 (that is,
> 64) refcounts per cluster.
>
> On the other extreme, the minimum refcount_order of 0 (each cluster
> occupies refcount bits, and so is either allocated or not, but no
> sharing), starts having the math looks ugly, because you are mixing:
>
> (int) = (uint32_t) - ( (uint32_t) - (int) )
>
> so at one point, you are doing s->cluster_bits - (4294967293U), but that
> wraps around (thankfully, wraparound is well-defined on unsigned types)
> for a net answer of cluster_bits + 3.  But in the worst case, that means
> an image with 2M clusters will be packing 21 - (0 - 3) or 2**24 (that
> is, 16M) refcounts in one cluster.  Still fits in an int, so it looks
> like you are safe...
>
>> +    s->refcount_block_size = 1 << s->refcount_block_bits;
> ...that this particular shift will not cause undefined behavior, for
> reasonable refcount_order in the range [0,6].

Well, for now refcount_order is asserted to be 4, anyway. ;-)

> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> [We really ought to tighten the qcow2 spec - but that's a separate patch]

Yep, I'll include it in v2 of the follow-up series.

Max

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count Max Reitz
  2014-08-29 23:03   ` Eric Blake
@ 2014-10-10 12:29   ` Benoît Canet
  2014-10-11 10:27     ` Max Reitz
  2014-10-20 14:25   ` Kevin Wolf
  2 siblings, 1 reply; 34+ messages in thread
From: Benoît Canet @ 2014-10-10 12:29 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Benoît Canet

On Fri, Aug 29, 2014 at 11:40:53PM +0200, Max Reitz wrote:
> The size of a refblock entry is (in theory) variable; calculate
> therefore the number of entries per refblock and the according bit shift
> (1 << x == entry count) when opening an image.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 2 ++
>  block/qcow2.h | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f9e045f..172ad00 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>  
>      s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
>      s->l2_size = 1 << s->l2_bits;
> +    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);

After carefull examination (s->refcount_order - 3) == REFCOUNT_SHIFT.
Why not also creating s->refcount_shift and make use of it in order to avoid
torturing the mind of the next reader.

Or simply recall in a comment that there is 2^3 bits in a byte.

Best regards

Benoît

> +    s->refcount_block_size = 1 << s->refcount_block_bits;
>      bs->total_sectors = header.size / 512;
>      s->csize_shift = (62 - (s->cluster_bits - 8));
>      s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 6aeb7ea..7c01fb7 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -222,6 +222,8 @@ typedef struct BDRVQcowState {
>      int l2_size;
>      int l1_size;
>      int l1_vm_state_index;
> +    int refcount_block_bits;
> +    int refcount_block_size;
>      int csize_shift;
>      int csize_mask;
>      uint64_t cluster_offset_mask;
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-10-10 12:29   ` Benoît Canet
@ 2014-10-11 10:27     ` Max Reitz
  0 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-10-11 10:27 UTC (permalink / raw)
  To: Benoît Canet; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

Am 10.10.2014 um 14:29 schrieb Benoît Canet:
> On Fri, Aug 29, 2014 at 11:40:53PM +0200, Max Reitz wrote:
>> The size of a refblock entry is (in theory) variable; calculate
>> therefore the number of entries per refblock and the according bit shift
>> (1 << x == entry count) when opening an image.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 2 ++
>>   block/qcow2.h | 2 ++
>>   2 files changed, 4 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index f9e045f..172ad00 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>   
>>       s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
>>       s->l2_size = 1 << s->l2_bits;
>> +    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
> After carefull examination (s->refcount_order - 3) == REFCOUNT_SHIFT.
> Why not also creating s->refcount_shift and make use of it in order to avoid
> torturing the mind of the next reader.

I'm sorry. *g*

Well, I'm against creating s->refcount_shift, because that name implies 
you could use it for shifts. But you shouldn't do that, because it may 
be both negative and positive.

> Or simply recall in a comment that there is 2^3 bits in a byte.

I like this more. :-)

Max

> Best regards
>
> Benoît
>
>> +    s->refcount_block_size = 1 << s->refcount_block_bits;
>>       bs->total_sectors = header.size / 512;
>>       s->csize_shift = (62 - (s->cluster_bits - 8));
>>       s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 6aeb7ea..7c01fb7 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -222,6 +222,8 @@ typedef struct BDRVQcowState {
>>       int l2_size;
>>       int l1_size;
>>       int l1_vm_state_index;
>> +    int refcount_block_bits;
>> +    int refcount_block_size;
>>       int csize_shift;
>>       int csize_mask;
>>       uint64_t cluster_offset_mask;
>> -- 
>> 2.1.0
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count Max Reitz
  2014-08-29 23:03   ` Eric Blake
  2014-10-10 12:29   ` Benoît Canet
@ 2014-10-20 14:25   ` Kevin Wolf
  2014-10-20 14:39     ` Max Reitz
  2 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2014-10-20 14:25 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, Benoît Canet

Am 29.08.2014 um 23:40 hat Max Reitz geschrieben:
> The size of a refblock entry is (in theory) variable; calculate
> therefore the number of entries per refblock and the according bit shift
> (1 << x == entry count) when opening an image.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 2 ++
>  block/qcow2.h | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f9e045f..172ad00 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>  
>      s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
>      s->l2_size = 1 << s->l2_bits;
> +    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
> +    s->refcount_block_size = 1 << s->refcount_block_bits;
>      bs->total_sectors = header.size / 512;
>      s->csize_shift = (62 - (s->cluster_bits - 8));
>      s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 6aeb7ea..7c01fb7 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -222,6 +222,8 @@ typedef struct BDRVQcowState {
>      int l2_size;
>      int l1_size;
>      int l1_vm_state_index;
> +    int refcount_block_bits;
> +    int refcount_block_size;

Might just be me, but size sounds to me as if the unit were bytes. Would
you mind renaming this as refcount_block_entries or refblock_entries?

Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-10-20 14:25   ` Kevin Wolf
@ 2014-10-20 14:39     ` Max Reitz
  2014-10-20 14:48       ` Kevin Wolf
  0 siblings, 1 reply; 34+ messages in thread
From: Max Reitz @ 2014-10-20 14:39 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, Benoît Canet

On 20.10.2014 at 16:25, Kevin Wolf wrote:
> Am 29.08.2014 um 23:40 hat Max Reitz geschrieben:
>> The size of a refblock entry is (in theory) variable; calculate
>> therefore the number of entries per refblock and the according bit shift
>> (1 << x == entry count) when opening an image.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 2 ++
>>   block/qcow2.h | 2 ++
>>   2 files changed, 4 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index f9e045f..172ad00 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>   
>>       s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
>>       s->l2_size = 1 << s->l2_bits;
>> +    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
>> +    s->refcount_block_size = 1 << s->refcount_block_bits;
>>       bs->total_sectors = header.size / 512;
>>       s->csize_shift = (62 - (s->cluster_bits - 8));
>>       s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index 6aeb7ea..7c01fb7 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -222,6 +222,8 @@ typedef struct BDRVQcowState {
>>       int l2_size;
>>       int l1_size;
>>       int l1_vm_state_index;
>> +    int refcount_block_bits;
>> +    int refcount_block_size;
> Might just be me, but size sounds to me as if the unit were bytes. Would
> you mind renaming this as refcount_block_entries or refblock_entries?

If I'm doing a v7, no. Otherwise, well, see l1_size and l2_size. ;-)

Max

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-10-20 14:39     ` Max Reitz
@ 2014-10-20 14:48       ` Kevin Wolf
  2014-10-21 16:26         ` Max Reitz
  0 siblings, 1 reply; 34+ messages in thread
From: Kevin Wolf @ 2014-10-20 14:48 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, Benoît Canet

Am 20.10.2014 um 16:39 hat Max Reitz geschrieben:
> On 20.10.2014 at 16:25, Kevin Wolf wrote:
> >Am 29.08.2014 um 23:40 hat Max Reitz geschrieben:
> >>The size of a refblock entry is (in theory) variable; calculate
> >>therefore the number of entries per refblock and the according bit shift
> >>(1 << x == entry count) when opening an image.
> >>
> >>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>---
> >>  block/qcow2.c | 2 ++
> >>  block/qcow2.h | 2 ++
> >>  2 files changed, 4 insertions(+)
> >>
> >>diff --git a/block/qcow2.c b/block/qcow2.c
> >>index f9e045f..172ad00 100644
> >>--- a/block/qcow2.c
> >>+++ b/block/qcow2.c
> >>@@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
> >>      s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
> >>      s->l2_size = 1 << s->l2_bits;
> >>+    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
> >>+    s->refcount_block_size = 1 << s->refcount_block_bits;
> >>      bs->total_sectors = header.size / 512;
> >>      s->csize_shift = (62 - (s->cluster_bits - 8));
> >>      s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
> >>diff --git a/block/qcow2.h b/block/qcow2.h
> >>index 6aeb7ea..7c01fb7 100644
> >>--- a/block/qcow2.h
> >>+++ b/block/qcow2.h
> >>@@ -222,6 +222,8 @@ typedef struct BDRVQcowState {
> >>      int l2_size;
> >>      int l1_size;
> >>      int l1_vm_state_index;
> >>+    int refcount_block_bits;
> >>+    int refcount_block_size;
> >Might just be me, but size sounds to me as if the unit were bytes. Would
> >you mind renaming this as refcount_block_entries or refblock_entries?
> 
> If I'm doing a v7, no. Otherwise, well, see l1_size and l2_size. ;-)

Good point. This has confused people more than once, so it's probably
not just me.

Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-10-20 14:48       ` Kevin Wolf
@ 2014-10-21 16:26         ` Max Reitz
  2014-10-22  8:27           ` Kevin Wolf
  0 siblings, 1 reply; 34+ messages in thread
From: Max Reitz @ 2014-10-21 16:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel, Benoît Canet

On 2014-10-20 at 16:48, Kevin Wolf wrote:
> Am 20.10.2014 um 16:39 hat Max Reitz geschrieben:
>> On 20.10.2014 at 16:25, Kevin Wolf wrote:
>>> Am 29.08.2014 um 23:40 hat Max Reitz geschrieben:
>>>> The size of a refblock entry is (in theory) variable; calculate
>>>> therefore the number of entries per refblock and the according bit shift
>>>> (1 << x == entry count) when opening an image.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>>   block/qcow2.c | 2 ++
>>>>   block/qcow2.h | 2 ++
>>>>   2 files changed, 4 insertions(+)
>>>>
>>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>>> index f9e045f..172ad00 100644
>>>> --- a/block/qcow2.c
>>>> +++ b/block/qcow2.c
>>>> @@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>>>>       s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
>>>>       s->l2_size = 1 << s->l2_bits;
>>>> +    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
>>>> +    s->refcount_block_size = 1 << s->refcount_block_bits;
>>>>       bs->total_sectors = header.size / 512;
>>>>       s->csize_shift = (62 - (s->cluster_bits - 8));
>>>>       s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
>>>> diff --git a/block/qcow2.h b/block/qcow2.h
>>>> index 6aeb7ea..7c01fb7 100644
>>>> --- a/block/qcow2.h
>>>> +++ b/block/qcow2.h
>>>> @@ -222,6 +222,8 @@ typedef struct BDRVQcowState {
>>>>       int l2_size;
>>>>       int l1_size;
>>>>       int l1_vm_state_index;
>>>> +    int refcount_block_bits;
>>>> +    int refcount_block_size;
>>> Might just be me, but size sounds to me as if the unit were bytes. Would
>>> you mind renaming this as refcount_block_entries or refblock_entries?
>> If I'm doing a v7, no. Otherwise, well, see l1_size and l2_size. ;-)
> Good point. This has confused people more than once, so it's probably
> not just me.

Okay, now that I've done it and was about to send the series and just 
wanted to convert a local variable named refcount_table_size to 
refcount_table_entries, I decided not do do this. I'll call it 
refcount_block_size in v7 as well.

The reason for this is that I started looking for "_size" in 
block/qcow2-refcount.c. "_entries" is never used, the number of entries 
per L1, L2 or refcount table is always foo_size. In BDRVQcowState, 
there's not only l1_size and l2_size, but refcount_table_size as well. 
Calling it refcount_block_entries without renaming those would be 
extremely weird, and renaming those does not seem like a viable option 
to me. Furthermore, I'd find it extremely confusing to have "_entries" 
in some places and "_size" in others when there's no difference between 
the two. Currently, people ask "Why is this foo_size an entry count? ... 
Well, okay, that seems to be just the way it is." With foo_entries, 
it'll be "Why is this foo_size an entry count when bar_entries exists, 
so shouldn't it be foo_entries if they want it to be an entry count?"

I'll keep it as refcount_block_size, although I'm afraid reverting all 
these changes will be as hard as having made them in the first place... 
Oh, well, here goes.

Max

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count
  2014-10-21 16:26         ` Max Reitz
@ 2014-10-22  8:27           ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2014-10-22  8:27 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, Benoît Canet

Am 21.10.2014 um 18:26 hat Max Reitz geschrieben:
> On 2014-10-20 at 16:48, Kevin Wolf wrote:
> >Am 20.10.2014 um 16:39 hat Max Reitz geschrieben:
> >>On 20.10.2014 at 16:25, Kevin Wolf wrote:
> >>>Am 29.08.2014 um 23:40 hat Max Reitz geschrieben:
> >>>>The size of a refblock entry is (in theory) variable; calculate
> >>>>therefore the number of entries per refblock and the according bit shift
> >>>>(1 << x == entry count) when opening an image.
> >>>>
> >>>>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>>>---
> >>>>  block/qcow2.c | 2 ++
> >>>>  block/qcow2.h | 2 ++
> >>>>  2 files changed, 4 insertions(+)
> >>>>
> >>>>diff --git a/block/qcow2.c b/block/qcow2.c
> >>>>index f9e045f..172ad00 100644
> >>>>--- a/block/qcow2.c
> >>>>+++ b/block/qcow2.c
> >>>>@@ -689,6 +689,8 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
> >>>>      s->l2_bits = s->cluster_bits - 3; /* L2 is always one cluster */
> >>>>      s->l2_size = 1 << s->l2_bits;
> >>>>+    s->refcount_block_bits = s->cluster_bits - (s->refcount_order - 3);
> >>>>+    s->refcount_block_size = 1 << s->refcount_block_bits;
> >>>>      bs->total_sectors = header.size / 512;
> >>>>      s->csize_shift = (62 - (s->cluster_bits - 8));
> >>>>      s->csize_mask = (1 << (s->cluster_bits - 8)) - 1;
> >>>>diff --git a/block/qcow2.h b/block/qcow2.h
> >>>>index 6aeb7ea..7c01fb7 100644
> >>>>--- a/block/qcow2.h
> >>>>+++ b/block/qcow2.h
> >>>>@@ -222,6 +222,8 @@ typedef struct BDRVQcowState {
> >>>>      int l2_size;
> >>>>      int l1_size;
> >>>>      int l1_vm_state_index;
> >>>>+    int refcount_block_bits;
> >>>>+    int refcount_block_size;
> >>>Might just be me, but size sounds to me as if the unit were bytes. Would
> >>>you mind renaming this as refcount_block_entries or refblock_entries?
> >>If I'm doing a v7, no. Otherwise, well, see l1_size and l2_size. ;-)
> >Good point. This has confused people more than once, so it's probably
> >not just me.
> 
> Okay, now that I've done it and was about to send the series and
> just wanted to convert a local variable named refcount_table_size to
> refcount_table_entries, I decided not do do this. I'll call it
> refcount_block_size in v7 as well.
> 
> The reason for this is that I started looking for "_size" in
> block/qcow2-refcount.c. "_entries" is never used, the number of
> entries per L1, L2 or refcount table is always foo_size. In
> BDRVQcowState, there's not only l1_size and l2_size, but
> refcount_table_size as well. Calling it refcount_block_entries
> without renaming those would be extremely weird, and renaming those
> does not seem like a viable option to me. Furthermore, I'd find it
> extremely confusing to have "_entries" in some places and "_size" in
> others when there's no difference between the two. Currently, people
> ask "Why is this foo_size an entry count? ... Well, okay, that seems
> to be just the way it is." With foo_entries, it'll be "Why is this
> foo_size an entry count when bar_entries exists, so shouldn't it be
> foo_entries if they want it to be an entry count?"
> 
> I'll keep it as refcount_block_size, although I'm afraid reverting
> all these changes will be as hard as having made them in the first
> place... Oh, well, here goes.

Fair enough. Perhaps I should prepare a series renaming some things in
qcow2 once your current series are in (most importantly BDRVQcowState,
which also exists in qcow1 with the same name and makes debugging with
gdb harder than necessary...)

Kevin

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 02/11] qcow2: Fix leaks in dirty images
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count Max Reitz
@ 2014-08-29 21:40 ` Max Reitz
  2014-10-20 14:28   ` Kevin Wolf
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

When opening dirty images, qcow2's repair function should not only
repair errors but leaks as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 172ad00..82bca88 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -900,7 +900,7 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
         (s->incompatible_features & QCOW2_INCOMPAT_DIRTY)) {
         BdrvCheckResult result = {0};
 
-        ret = qcow2_check(bs, &result, BDRV_FIX_ERRORS);
+        ret = qcow2_check(bs, &result, BDRV_FIX_ERRORS | BDRV_FIX_LEAKS);
         if (ret < 0) {
             error_setg_errno(errp, -ret, "Could not repair dirty image");
             goto fail;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 02/11] qcow2: Fix leaks in dirty images
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 02/11] qcow2: Fix leaks in dirty images Max Reitz
@ 2014-10-20 14:28   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2014-10-20 14:28 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, Benoît Canet

Am 29.08.2014 um 23:40 hat Max Reitz geschrieben:
> When opening dirty images, qcow2's repair function should not only
> repair errors but leaks as well.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 03/11] qcow2: Split qcow2_check_refcounts()
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count Max Reitz
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 02/11] qcow2: Fix leaks in dirty images Max Reitz
@ 2014-08-29 21:40 ` Max Reitz
  2014-10-20 14:45   ` Kevin Wolf
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 04/11] qcow2: Pull check_refblocks() up Max Reitz
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

Put the code for calculating the reference counts and comparing them
during qemu-img check into own functions.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 153 ++++++++++++++++++++++++++++++++-----------------
 1 file changed, 102 insertions(+), 51 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 43665b8..5f0920b 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1496,71 +1496,70 @@ done:
     return new_offset;
 }
 
+static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
+                           BdrvCheckMode fix, uint16_t **refcount_table,
+                           int64_t *nb_clusters);
+
 /*
- * Checks an image for refcount consistency.
- *
- * Returns 0 if no errors are found, the number of errors in case the image is
- * detected as corrupted, and -errno when an internal error occurred.
+ * Calculates an in-memory refcount table.
  */
-int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                          BdrvCheckMode fix)
+static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                               BdrvCheckMode fix, uint16_t **refcount_table,
+                               int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t size, i, highest_cluster, nb_clusters;
-    int refcount1, refcount2;
+    int64_t i;
     QCowSnapshot *sn;
-    uint16_t *refcount_table;
     int ret;
 
-    size = bdrv_getlength(bs->file);
-    if (size < 0) {
-        res->check_errors++;
-        return size;
-    }
-
-    nb_clusters = size_to_clusters(s, size);
-    if (nb_clusters > INT_MAX) {
-        res->check_errors++;
-        return -EFBIG;
-    }
-
-    refcount_table = g_try_new0(uint16_t, nb_clusters);
-    if (nb_clusters && refcount_table == NULL) {
+    *refcount_table = g_try_new0(uint16_t, *nb_clusters);
+    if (*nb_clusters && *refcount_table == NULL) {
         res->check_errors++;
         return -ENOMEM;
     }
 
-    res->bfi.total_clusters =
-        size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
-
     /* header */
-    inc_refcounts(bs, res, refcount_table, nb_clusters,
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
         0, s->cluster_size);
 
     /* current L1 table */
-    ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
+    ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
                              s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO);
     if (ret < 0) {
-        goto fail;
+        return ret;
     }
 
     /* snapshots */
-    for(i = 0; i < s->nb_snapshots; i++) {
+    for (i = 0; i < s->nb_snapshots; i++) {
         sn = s->snapshots + i;
-        ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
+        ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
             sn->l1_table_offset, sn->l1_size, 0);
         if (ret < 0) {
-            goto fail;
+            return ret;
         }
     }
-    inc_refcounts(bs, res, refcount_table, nb_clusters,
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
         s->snapshots_offset, s->snapshots_size);
 
     /* refcount data */
-    inc_refcounts(bs, res, refcount_table, nb_clusters,
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
         s->refcount_table_offset,
         s->refcount_table_size * sizeof(uint64_t));
 
+    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
+}
+
+/*
+ * Checks consistency of refblocks and accounts for each refblock in
+ * *refcount_table.
+ */
+static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
+                           BdrvCheckMode fix, uint16_t **refcount_table,
+                           int64_t *nb_clusters)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t i;
+
     for(i = 0; i < s->refcount_table_size; i++) {
         uint64_t offset, cluster;
         offset = s->refcount_table[i];
@@ -1574,7 +1573,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             continue;
         }
 
-        if (cluster >= nb_clusters) {
+        if (cluster >= *nb_clusters) {
             fprintf(stderr, "ERROR refcount block %" PRId64
                     " is outside image\n", i);
             res->corruptions++;
@@ -1582,14 +1581,14 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         if (offset != 0) {
-            inc_refcounts(bs, res, refcount_table, nb_clusters,
+            inc_refcounts(bs, res, *refcount_table, *nb_clusters,
                 offset, s->cluster_size);
-            if (refcount_table[cluster] != 1) {
+            if ((*refcount_table)[cluster] != 1) {
                 fprintf(stderr, "%s refcount block %" PRId64
                     " refcount=%d\n",
                     fix & BDRV_FIX_ERRORS ? "Repairing" :
                                             "ERROR",
-                    i, refcount_table[cluster]);
+                    i, (*refcount_table)[cluster]);
 
                 if (fix & BDRV_FIX_ERRORS) {
                     int64_t new_offset;
@@ -1601,17 +1600,18 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                     }
 
                     /* update refcounts */
-                    if ((new_offset >> s->cluster_bits) >= nb_clusters) {
+                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
                         /* increase refcount_table size if necessary */
-                        int old_nb_clusters = nb_clusters;
-                        nb_clusters = (new_offset >> s->cluster_bits) + 1;
-                        refcount_table = g_renew(uint16_t, refcount_table,
-                                                 nb_clusters);
-                        memset(&refcount_table[old_nb_clusters], 0, (nb_clusters
-                                - old_nb_clusters) * sizeof(uint16_t));
+                        int old_nb_clusters = *nb_clusters;
+                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
+                        *refcount_table = g_renew(uint16_t, *refcount_table,
+                                                  *nb_clusters);
+                        memset(&(*refcount_table)[old_nb_clusters], 0,
+                               (*nb_clusters - old_nb_clusters) *
+                               sizeof(uint16_t));
                     }
-                    refcount_table[cluster]--;
-                    inc_refcounts(bs, res, refcount_table, nb_clusters,
+                    (*refcount_table)[cluster]--;
+                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
                             new_offset, s->cluster_size);
 
                     res->corruptions_fixed++;
@@ -1622,8 +1622,22 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         }
     }
 
-    /* compare ref counts */
-    for (i = 0, highest_cluster = 0; i < nb_clusters; i++) {
+    return 0;
+}
+
+/*
+ * Compares the actual reference count for each cluster in the image against the
+ * refcount as reported by the refcount structures on-disk.
+ */
+static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                              BdrvCheckMode fix, int64_t *highest_cluster,
+                              uint16_t *refcount_table, int64_t nb_clusters)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t i;
+    int refcount1, refcount2, ret;
+
+    for (i = 0, *highest_cluster = 0; i < nb_clusters; i++) {
         refcount1 = get_refcount(bs, i);
         if (refcount1 < 0) {
             fprintf(stderr, "Can't get refcount for cluster %" PRId64 ": %s\n",
@@ -1635,11 +1649,10 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         refcount2 = refcount_table[i];
 
         if (refcount1 > 0 || refcount2 > 0) {
-            highest_cluster = i;
+            *highest_cluster = i;
         }
 
         if (refcount1 != refcount2) {
-
             /* Check if we're allowed to fix the mismatch */
             int *num_fixed = NULL;
             if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
@@ -1672,6 +1685,44 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             }
         }
     }
+}
+
+/*
+ * Checks an image for refcount consistency.
+ *
+ * Returns 0 if no errors are found, the number of errors in case the image is
+ * detected as corrupted, and -errno when an internal error occurred.
+ */
+int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                          BdrvCheckMode fix)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t size, highest_cluster, nb_clusters;
+    uint16_t *refcount_table;
+    int ret;
+
+    size = bdrv_getlength(bs->file);
+    if (size < 0) {
+        res->check_errors++;
+        return size;
+    }
+
+    nb_clusters = size_to_clusters(s, size);
+    if (nb_clusters > INT_MAX) {
+        res->check_errors++;
+        return -EFBIG;
+    }
+
+    res->bfi.total_clusters =
+        size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
+
+    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
+                      nb_clusters);
 
     /* check OFLAG_COPIED */
     ret = check_oflag_copied(bs, res, fix);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 03/11] qcow2: Split qcow2_check_refcounts()
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
@ 2014-10-20 14:45   ` Kevin Wolf
  0 siblings, 0 replies; 34+ messages in thread
From: Kevin Wolf @ 2014-10-20 14:45 UTC (permalink / raw)
  To: Max Reitz; +Cc: Stefan Hajnoczi, qemu-devel, Benoît Canet

Am 29.08.2014 um 23:40 hat Max Reitz geschrieben:
> Put the code for calculating the reference counts and comparing them
> during qemu-img check into own functions.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>

> +static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> +                               BdrvCheckMode fix, uint16_t **refcount_table,
> +                               int64_t *nb_clusters)
[..]
> +    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
> +}

At first I wanted to comment that it would make more sense to call
check_refblocks() from the main function qcow2_check_refcounts(). Now I
see that you call it here because it also increases the refcounts for
the refblocks.

The only thing we could consider here is to split the function into a
counting part that is called in calculate_refcounts() and a checking
part that is called in qcow2_check_refcounts(). I'll leave that to you,
though.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 04/11] qcow2: Pull check_refblocks() up
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (2 preceding siblings ...)
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
@ 2014-08-29 21:40 ` Max Reitz
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

Pull check_refblocks() before calculate_refcounts() so we can drop its
static declaration.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 102 ++++++++++++++++++++++++-------------------------
 1 file changed, 49 insertions(+), 53 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 5f0920b..2b728ef 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1496,59 +1496,6 @@ done:
     return new_offset;
 }
 
-static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
-                           BdrvCheckMode fix, uint16_t **refcount_table,
-                           int64_t *nb_clusters);
-
-/*
- * Calculates an in-memory refcount table.
- */
-static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                               BdrvCheckMode fix, uint16_t **refcount_table,
-                               int64_t *nb_clusters)
-{
-    BDRVQcowState *s = bs->opaque;
-    int64_t i;
-    QCowSnapshot *sn;
-    int ret;
-
-    *refcount_table = g_try_new0(uint16_t, *nb_clusters);
-    if (*nb_clusters && *refcount_table == NULL) {
-        res->check_errors++;
-        return -ENOMEM;
-    }
-
-    /* header */
-    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-        0, s->cluster_size);
-
-    /* current L1 table */
-    ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
-                             s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO);
-    if (ret < 0) {
-        return ret;
-    }
-
-    /* snapshots */
-    for (i = 0; i < s->nb_snapshots; i++) {
-        sn = s->snapshots + i;
-        ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
-            sn->l1_table_offset, sn->l1_size, 0);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-        s->snapshots_offset, s->snapshots_size);
-
-    /* refcount data */
-    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-        s->refcount_table_offset,
-        s->refcount_table_size * sizeof(uint64_t));
-
-    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
-}
-
 /*
  * Checks consistency of refblocks and accounts for each refblock in
  * *refcount_table.
@@ -1626,6 +1573,55 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
 }
 
 /*
+ * Calculates an in-memory refcount table.
+ */
+static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
+                               BdrvCheckMode fix, uint16_t **refcount_table,
+                               int64_t *nb_clusters)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t i;
+    QCowSnapshot *sn;
+    int ret;
+
+    *refcount_table = g_try_new0(uint16_t, *nb_clusters);
+    if (*nb_clusters && *refcount_table == NULL) {
+        res->check_errors++;
+        return -ENOMEM;
+    }
+
+    /* header */
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
+        0, s->cluster_size);
+
+    /* current L1 table */
+    ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
+                             s->l1_table_offset, s->l1_size, CHECK_FRAG_INFO);
+    if (ret < 0) {
+        return ret;
+    }
+
+    /* snapshots */
+    for (i = 0; i < s->nb_snapshots; i++) {
+        sn = s->snapshots + i;
+        ret = check_refcounts_l1(bs, res, *refcount_table, *nb_clusters,
+            sn->l1_table_offset, sn->l1_size, 0);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
+        s->snapshots_offset, s->snapshots_size);
+
+    /* refcount data */
+    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
+        s->refcount_table_offset,
+        s->refcount_table_size * sizeof(uint64_t));
+
+    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
+}
+
+/*
  * Compares the actual reference count for each cluster in the image against the
  * refcount as reported by the refcount structures on-disk.
  */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 05/11] qcow2: Reuse refcount table in calculate_refcounts()
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (3 preceding siblings ...)
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 04/11] qcow2: Pull check_refblocks() up Max Reitz
@ 2014-08-29 21:40 ` Max Reitz
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

We will later call calculate_refcounts multiple times, so reuse the
refcount table if possible.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 2b728ef..babe6cb 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1584,10 +1584,12 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     QCowSnapshot *sn;
     int ret;
 
-    *refcount_table = g_try_new0(uint16_t, *nb_clusters);
-    if (*nb_clusters && *refcount_table == NULL) {
-        res->check_errors++;
-        return -ENOMEM;
+    if (!*refcount_table) {
+        *refcount_table = g_try_new0(uint16_t, *nb_clusters);
+        if (*nb_clusters && *refcount_table == NULL) {
+            res->check_errors++;
+            return -ENOMEM;
+        }
     }
 
     /* header */
@@ -1694,7 +1696,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 {
     BDRVQcowState *s = bs->opaque;
     int64_t size, highest_cluster, nb_clusters;
-    uint16_t *refcount_table;
+    uint16_t *refcount_table = NULL;
     int ret;
 
     size = bdrv_getlength(bs->file);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 06/11] qcow2: Fix refcount blocks beyond image end
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (4 preceding siblings ...)
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
@ 2014-08-29 21:40 ` Max Reitz
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

If the qcow2 check function detects a refcount block located beyond the
image end, grow the image appropriately. This cannot break anything and
is the logical fix for such a case.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/qcow2-refcount.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index babe6cb..394a402 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1505,7 +1505,8 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
                            int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
-    int64_t i;
+    int64_t i, size;
+    int ret;
 
     for(i = 0; i < s->refcount_table_size; i++) {
         uint64_t offset, cluster;
@@ -1521,9 +1522,62 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
         }
 
         if (cluster >= *nb_clusters) {
-            fprintf(stderr, "ERROR refcount block %" PRId64
-                    " is outside image\n", i);
-            res->corruptions++;
+            fprintf(stderr, "%s refcount block %" PRId64 " is outside image\n",
+                    fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
+
+            if (fix & BDRV_FIX_ERRORS) {
+                int64_t old_nb_clusters = *nb_clusters;
+
+                if (offset + s->cluster_size < offset ||
+                    offset + s->cluster_size > INT64_MAX)
+                {
+                    ret = -EINVAL;
+                    goto resize_fail;
+                }
+
+                ret = bdrv_truncate(bs->file, offset + s->cluster_size);
+                if (ret < 0) {
+                    goto resize_fail;
+                }
+                size = bdrv_getlength(bs->file);
+                if (size < 0) {
+                    ret = size;
+                    goto resize_fail;
+                }
+
+                *nb_clusters = size_to_clusters(s, size);
+                assert(*nb_clusters >= old_nb_clusters);
+
+                *refcount_table = g_try_realloc(*refcount_table,
+                        *nb_clusters * sizeof(uint16_t));
+                if (!*refcount_table) {
+                    res->check_errors++;
+                    return -ENOMEM;
+                }
+
+                memset(*refcount_table + old_nb_clusters, 0,
+                       (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
+
+                if (cluster >= *nb_clusters) {
+                    ret = -EINVAL;
+                    goto resize_fail;
+                }
+
+                res->corruptions_fixed++;
+                inc_refcounts(bs, res, *refcount_table, *nb_clusters,
+                              offset, s->cluster_size);
+                /* No need to check whether the refcount is now greater than 1:
+                 * This area was just allocated and zeroed, so it can only be
+                 * exactly 1 after inc_refcounts() */
+                continue;
+
+resize_fail:
+                res->corruptions++;
+                fprintf(stderr, "ERROR could not resize image: %s\n",
+                        strerror(-ret));
+            } else {
+                res->corruptions++;
+            }
             continue;
         }
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 07/11] qcow2: Do not perform potentially damaging repairs
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (5 preceding siblings ...)
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
@ 2014-08-29 21:40 ` Max Reitz
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check Max Reitz
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:40 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

If a referenced cluster has a refcount of 0, increasing its refcount may
result in clusters being allocated for the refcount structures. This may
overwrite the referenced cluster, therefore we cannot simply increase
the refcount then.

In such cases, we can either try to replicate all the refcount
operations solely for the check operation, basing the allocations on the
in-memory refcount table; or we can simply rebuild the whole refcount
structure based on the in-memory refcount table. Since the latter will
be much easier, do that.

To prepare for this, introduce a "rebuild" boolean which should be set
to true whenever a fix is rather dangerous or too complicated using the
current refcount structures. Another example for this is refcount blocks
being referenced more than once.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 185 ++++++++-----------------------------------------
 1 file changed, 27 insertions(+), 158 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 394a402..6300cec 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1382,127 +1382,12 @@ fail:
 }
 
 /*
- * Writes one sector of the refcount table to the disk
- */
-#define RT_ENTRIES_PER_SECTOR (512 / sizeof(uint64_t))
-static int write_reftable_entry(BlockDriverState *bs, int rt_index)
-{
-    BDRVQcowState *s = bs->opaque;
-    uint64_t buf[RT_ENTRIES_PER_SECTOR];
-    int rt_start_index;
-    int i, ret;
-
-    rt_start_index = rt_index & ~(RT_ENTRIES_PER_SECTOR - 1);
-    for (i = 0; i < RT_ENTRIES_PER_SECTOR; i++) {
-        buf[i] = cpu_to_be64(s->refcount_table[rt_start_index + i]);
-    }
-
-    ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_REFCOUNT_TABLE,
-            s->refcount_table_offset + rt_start_index * sizeof(uint64_t),
-            sizeof(buf));
-    if (ret < 0) {
-        return ret;
-    }
-
-    BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_UPDATE);
-    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset +
-            rt_start_index * sizeof(uint64_t), buf, sizeof(buf));
-    if (ret < 0) {
-        return ret;
-    }
-
-    return 0;
-}
-
-/*
- * Allocates a new cluster for the given refcount block (represented by its
- * offset in the image file) and copies the current content there. This function
- * does _not_ decrement the reference count for the currently occupied cluster.
- *
- * This function prints an informative message to stderr on error (and returns
- * -errno); on success, the offset of the newly allocated cluster is returned.
- */
-static int64_t realloc_refcount_block(BlockDriverState *bs, int reftable_index,
-                                      uint64_t offset)
-{
-    BDRVQcowState *s = bs->opaque;
-    int64_t new_offset = 0;
-    void *refcount_block = NULL;
-    int ret;
-
-    /* allocate new refcount block */
-    new_offset = qcow2_alloc_clusters(bs, s->cluster_size);
-    if (new_offset < 0) {
-        fprintf(stderr, "Could not allocate new cluster: %s\n",
-                strerror(-new_offset));
-        ret = new_offset;
-        goto done;
-    }
-
-    /* fetch current refcount block content */
-    ret = qcow2_cache_get(bs, s->refcount_block_cache, offset, &refcount_block);
-    if (ret < 0) {
-        fprintf(stderr, "Could not fetch refcount block: %s\n", strerror(-ret));
-        goto fail_free_cluster;
-    }
-
-    /* new block has not yet been entered into refcount table, therefore it is
-     * no refcount block yet (regarding this check) */
-    ret = qcow2_pre_write_overlap_check(bs, 0, new_offset, s->cluster_size);
-    if (ret < 0) {
-        fprintf(stderr, "Could not write refcount block; metadata overlap "
-                "check failed: %s\n", strerror(-ret));
-        /* the image will be marked corrupt, so don't even attempt on freeing
-         * the cluster */
-        goto done;
-    }
-
-    /* write to new block */
-    ret = bdrv_write(bs->file, new_offset / BDRV_SECTOR_SIZE, refcount_block,
-            s->cluster_sectors);
-    if (ret < 0) {
-        fprintf(stderr, "Could not write refcount block: %s\n", strerror(-ret));
-        goto fail_free_cluster;
-    }
-
-    /* update refcount table */
-    assert(!offset_into_cluster(s, new_offset));
-    s->refcount_table[reftable_index] = new_offset;
-    ret = write_reftable_entry(bs, reftable_index);
-    if (ret < 0) {
-        fprintf(stderr, "Could not update refcount table: %s\n",
-                strerror(-ret));
-        goto fail_free_cluster;
-    }
-
-    goto done;
-
-fail_free_cluster:
-    qcow2_free_clusters(bs, new_offset, s->cluster_size, QCOW2_DISCARD_OTHER);
-
-done:
-    if (refcount_block) {
-        /* This should never fail, as it would only do so if the given refcount
-         * block cannot be found in the cache. As this is impossible as long as
-         * there are no bugs, assert the success. */
-        int tmp = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
-        assert(tmp == 0);
-    }
-
-    if (ret < 0) {
-        return ret;
-    }
-
-    return new_offset;
-}
-
-/*
  * Checks consistency of refblocks and accounts for each refblock in
  * *refcount_table.
  */
 static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
-                           BdrvCheckMode fix, uint16_t **refcount_table,
-                           int64_t *nb_clusters)
+                           BdrvCheckMode fix, bool *rebuild,
+                           uint16_t **refcount_table, int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i, size;
@@ -1518,6 +1403,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
             fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
                 "cluster aligned; refcount table entry corrupted\n", i);
             res->corruptions++;
+            *rebuild = true;
             continue;
         }
 
@@ -1573,6 +1459,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
 
 resize_fail:
                 res->corruptions++;
+                *rebuild = true;
                 fprintf(stderr, "ERROR could not resize image: %s\n",
                         strerror(-ret));
             } else {
@@ -1585,40 +1472,10 @@ resize_fail:
             inc_refcounts(bs, res, *refcount_table, *nb_clusters,
                 offset, s->cluster_size);
             if ((*refcount_table)[cluster] != 1) {
-                fprintf(stderr, "%s refcount block %" PRId64
-                    " refcount=%d\n",
-                    fix & BDRV_FIX_ERRORS ? "Repairing" :
-                                            "ERROR",
-                    i, (*refcount_table)[cluster]);
-
-                if (fix & BDRV_FIX_ERRORS) {
-                    int64_t new_offset;
-
-                    new_offset = realloc_refcount_block(bs, i, offset);
-                    if (new_offset < 0) {
-                        res->corruptions++;
-                        continue;
-                    }
-
-                    /* update refcounts */
-                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
-                        /* increase refcount_table size if necessary */
-                        int old_nb_clusters = *nb_clusters;
-                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
-                        *refcount_table = g_renew(uint16_t, *refcount_table,
-                                                  *nb_clusters);
-                        memset(&(*refcount_table)[old_nb_clusters], 0,
-                               (*nb_clusters - old_nb_clusters) *
-                               sizeof(uint16_t));
-                    }
-                    (*refcount_table)[cluster]--;
-                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
-                            new_offset, s->cluster_size);
-
-                    res->corruptions_fixed++;
-                } else {
-                    res->corruptions++;
-                }
+                fprintf(stderr, "ERROR refcount block %" PRId64
+                        " refcount=%d\n", i, (*refcount_table)[cluster]);
+                res->corruptions++;
+                *rebuild = true;
             }
         }
     }
@@ -1630,8 +1487,8 @@ resize_fail:
  * Calculates an in-memory refcount table.
  */
 static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                               BdrvCheckMode fix, uint16_t **refcount_table,
-                               int64_t *nb_clusters)
+                               BdrvCheckMode fix, bool *rebuild,
+                               uint16_t **refcount_table, int64_t *nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
     int64_t i;
@@ -1674,7 +1531,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         s->refcount_table_offset,
         s->refcount_table_size * sizeof(uint64_t));
 
-    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
+    return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters);
 }
 
 /*
@@ -1682,7 +1539,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
  * refcount as reported by the refcount structures on-disk.
  */
 static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
-                              BdrvCheckMode fix, int64_t *highest_cluster,
+                              BdrvCheckMode fix, bool *rebuild,
+                              int64_t *highest_cluster,
                               uint16_t *refcount_table, int64_t nb_clusters)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1709,10 +1567,15 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
             int *num_fixed = NULL;
             if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
                 num_fixed = &res->leaks_fixed;
-            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
+            } else if (refcount1 > 0 && refcount1 < refcount2 &&
+                       (fix & BDRV_FIX_ERRORS)) {
                 num_fixed = &res->corruptions_fixed;
             }
 
+            if (refcount1 == 0) {
+                *rebuild = true;
+            }
+
             fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
                    num_fixed != NULL     ? "Repairing" :
                    refcount1 < refcount2 ? "ERROR" :
@@ -1751,6 +1614,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     BDRVQcowState *s = bs->opaque;
     int64_t size, highest_cluster, nb_clusters;
     uint16_t *refcount_table = NULL;
+    bool rebuild = false;
     int ret;
 
     size = bdrv_getlength(bs->file);
@@ -1768,14 +1632,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
     res->bfi.total_clusters =
         size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
 
-    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
+    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
+                              &nb_clusters);
     if (ret < 0) {
         goto fail;
     }
 
-    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
+    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
                       nb_clusters);
 
+    if (rebuild) {
+        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
+    }
+
     /* check OFLAG_COPIED */
     ret = check_oflag_copied(bs, res, fix);
     if (ret < 0) {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (6 preceding siblings ...)
  2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
@ 2014-08-29 21:41 ` Max Reitz
  2014-10-08 23:09   ` Eric Blake
                     ` (2 more replies)
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 09/11] qcow2: Clean up after refcount rebuild Max Reitz
                   ` (3 subsequent siblings)
  11 siblings, 3 replies; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

The previous commit introduced the "rebuild" variable to qcow2's
implementation of the image consistency check. Now make use of this by
adding a function which creates a completely new refcount structure
based solely on the in-memory information gathered before.

The old refcount structure will be leaked, however.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-refcount.c | 286 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 283 insertions(+), 3 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 6300cec..318c152 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
 }
 
 /*
+ * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
+ * the on-disk refcount structures.
+ *
+ * *first_free_cluster does not necessarily point to the first free cluster, but
+ * may point to one cluster as close as possible before it. The offset returned
+ * will never be before that cluster.
+ *
+ * Note that *first_free_cluster is a cluster index whereas the return value is
+ * an offset.
+ */
+static int64_t alloc_clusters_imrt(BlockDriverState *bs,
+                                   int cluster_count,
+                                   uint16_t **refcount_table,
+                                   int64_t *nb_clusters,
+                                   int64_t *first_free_cluster)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t cluster = *first_free_cluster, i;
+    bool first_gap = true;
+    int contiguous_free_clusters;
+
+    /* Starting at *first_free_cluster, find a range of at least cluster_count
+     * continuously free clusters */
+    for (contiguous_free_clusters = 0;
+         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
+         cluster++)
+    {
+        if (!(*refcount_table)[cluster]) {
+            contiguous_free_clusters++;
+            if (first_gap) {
+                /* If this is the first free cluster found, update
+                 * *first_free_cluster accordingly */
+                *first_free_cluster = cluster;
+                first_gap = false;
+            }
+        } else if (contiguous_free_clusters) {
+            contiguous_free_clusters = 0;
+        }
+    }
+
+    /* If contiguous_free_clusters is greater than zero, it contains the number
+     * of continuously free clusters until the current cluster; the first free
+     * cluster in the current "gap" is therefore
+     * cluster - contiguous_free_clusters */
+
+    /* If no such range could be found, grow the in-memory refcount table
+     * accordingly to append free clusters at the end of the image */
+    if (contiguous_free_clusters < cluster_count) {
+        int64_t old_nb_clusters = *nb_clusters;
+
+        /* There already is a gap of contiguous_free_clusters; we need
+         * cluster_count clusters; therefore, we have to allocate
+         * cluster_count - contiguous_free_clusters new clusters at the end of
+         * the image (which is the current value of cluster; note that cluster
+         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
+         * image end) */
+        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
+        *refcount_table = g_try_realloc(*refcount_table,
+                                        *nb_clusters * sizeof(uint16_t));
+        if (!*refcount_table) {
+            return -ENOMEM;
+        }
+
+        memset(*refcount_table + old_nb_clusters, 0,
+               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
+    }
+
+    /* Go back to the first free cluster */
+    cluster -= contiguous_free_clusters;
+    for (i = 0; i < cluster_count; i++) {
+        (*refcount_table)[cluster + i] = 1;
+    }
+
+    return cluster << s->cluster_bits;
+}
+
+/*
+ * Creates a new refcount structure based solely on the in-memory information
+ * given through *refcount_table. All necessary allocations will be reflected
+ * in that array.
+ *
+ * On success, the old refcount structure is leaked (it will be covered by the
+ * new refcount structure).
+ */
+static int rebuild_refcount_structure(BlockDriverState *bs,
+                                      BdrvCheckResult *res,
+                                      uint16_t **refcount_table,
+                                      int64_t *nb_clusters)
+{
+    BDRVQcowState *s = bs->opaque;
+    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
+    int64_t rb_ofs, rb_start, rb_index;
+    uint32_t reftable_size = 0;
+    uint64_t *reftable = NULL;
+    uint16_t *on_disk_rb;
+    int i, ret = 0;
+    struct {
+        uint64_t rt_offset;
+        uint32_t rt_clusters;
+    } QEMU_PACKED rt_offset_and_clusters;
+
+    qcow2_cache_empty(bs, s->refcount_block_cache);
+
+write_refblocks:
+    for (; cluster < *nb_clusters; cluster++) {
+        if (!(*refcount_table)[cluster]) {
+            continue;
+        }
+
+        rb_index = cluster >> s->refcount_block_bits;
+        rb_start = rb_index << s->refcount_block_bits;
+
+        /* Don't allocate a cluster in a refblock already written to disk */
+        if (first_free_cluster < rb_start) {
+            first_free_cluster = rb_start;
+        }
+        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
+                                     &first_free_cluster);
+        if (rb_ofs < 0) {
+            fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));
+            res->check_errors++;
+            ret = rb_ofs;
+            goto fail;
+        }
+
+        if (reftable_size <= rb_index) {
+            uint32_t old_rt_size = reftable_size;
+            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
+                                     s->cluster_size) / sizeof(uint64_t);
+            reftable = g_try_realloc(reftable,
+                                     reftable_size * sizeof(uint64_t));
+            if (!reftable) {
+                res->check_errors++;
+                ret = -ENOMEM;
+                goto fail;
+            }
+
+            memset(reftable + old_rt_size, 0,
+                   (reftable_size - old_rt_size) * sizeof(uint64_t));
+
+            /* The offset we have for the reftable is now no longer valid;
+             * this will leak that range, but we can easily fix that by running
+             * a leak-fixing check after this rebuild operation */
+            rt_ofs = -1;
+        }
+        reftable[rb_index] = rb_ofs;
+
+        /* If this is apparently the last refblock (for now), try to squeeze the
+         * reftable in */
+        if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
+            rt_ofs < 0)
+        {
+            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
+                                                              sizeof(uint64_t)),
+                                         refcount_table, nb_clusters,
+                                         &first_free_cluster);
+            if (rt_ofs < 0) {
+                fprintf(stderr, "ERROR allocating reftable: %s\n",
+                        strerror(-ret));
+                res->check_errors++;
+                ret = rt_ofs;
+                goto fail;
+            }
+        }
+
+        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
+        if (ret < 0) {
+            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+            goto fail;
+        }
+
+        on_disk_rb = g_malloc0(s->cluster_size);
+        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
+                    rb_start + i < *nb_clusters; i++)
+        {
+            on_disk_rb[i] = cpu_to_be16((*refcount_table)[rb_start + i]);
+        }
+
+        ret = bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE,
+                         (void *)on_disk_rb, s->cluster_sectors);
+        g_free(on_disk_rb);
+        if (ret < 0) {
+            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
+            goto fail;
+        }
+
+        /* Go to the end of this refblock */
+        cluster = rb_start + s->cluster_size / sizeof(uint16_t) - 1;
+    }
+
+    if (rt_ofs < 0) {
+        int64_t post_rb_start = ROUND_UP(*nb_clusters,
+                                         s->cluster_size / sizeof(uint16_t));
+
+        /* Not pretty but simple */
+        if (first_free_cluster < post_rb_start) {
+            first_free_cluster = post_rb_start;
+        }
+        rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
+                                                          sizeof(uint64_t)),
+                                     refcount_table, nb_clusters,
+                                     &first_free_cluster);
+        if (rt_ofs < 0) {
+            fprintf(stderr, "ERROR allocating reftable: %s\n", strerror(-ret));
+            res->check_errors++;
+            ret = rt_ofs;
+            goto fail;
+        }
+
+        goto write_refblocks;
+    }
+
+    assert(reftable);
+
+    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
+        cpu_to_be64s(&reftable[rb_index]);
+    }
+
+    ret = qcow2_pre_write_overlap_check(bs, 0, rt_ofs,
+                                        reftable_size * sizeof(uint64_t));
+    if (ret < 0) {
+        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
+        goto fail;
+    }
+
+    ret = bdrv_write(bs->file, rt_ofs / BDRV_SECTOR_SIZE, (void *)reftable,
+                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
+    if (ret < 0) {
+        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
+        goto fail;
+    }
+
+    /* Enter new reftable into the image header */
+    cpu_to_be64w(&rt_offset_and_clusters.rt_offset, rt_ofs);
+    cpu_to_be32w(&rt_offset_and_clusters.rt_clusters,
+                 size_to_clusters(s, reftable_size * sizeof(uint64_t)));
+    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader,
+                                              refcount_table_offset),
+                           &rt_offset_and_clusters,
+                           sizeof(rt_offset_and_clusters));
+    if (ret < 0) {
+        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
+        goto fail;
+    }
+
+    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
+        be64_to_cpus(&reftable[rb_index]);
+    }
+    s->refcount_table = reftable;
+    s->refcount_table_offset = rt_ofs;
+    s->refcount_table_size = reftable_size;
+
+    return 0;
+
+fail:
+    g_free(reftable);
+    return ret;
+}
+
+/*
  * Checks an image for refcount consistency.
  *
  * Returns 0 if no errors are found, the number of errors in case the image is
@@ -1612,6 +1872,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                           BdrvCheckMode fix)
 {
     BDRVQcowState *s = bs->opaque;
+    BdrvCheckResult pre_compare_res;
     int64_t size, highest_cluster, nb_clusters;
     uint16_t *refcount_table = NULL;
     bool rebuild = false;
@@ -1638,11 +1899,30 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
         goto fail;
     }
 
-    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
+    /* In case we don't need to rebuild the refcount structure (but want to fix
+     * something), this function is immediately called again, in which case the
+     * result should be ignored */
+    pre_compare_res = *res;
+    compare_refcounts(bs, res, 0, &rebuild, &highest_cluster, refcount_table,
                       nb_clusters);
 
-    if (rebuild) {
-        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
+    if (rebuild && (fix & BDRV_FIX_ERRORS)) {
+        fprintf(stderr, "Rebuilding refcount structure\n");
+        ret = rebuild_refcount_structure(bs, res, &refcount_table,
+                                         &nb_clusters);
+        if (ret < 0) {
+            goto fail;
+        }
+    } else if (fix) {
+        if (rebuild) {
+            fprintf(stderr, "ERROR need to rebuild refcount structures\n");
+        }
+
+        if (res->leaks || res->corruptions) {
+            *res = pre_compare_res;
+            compare_refcounts(bs, res, fix, &rebuild, &highest_cluster,
+                              refcount_table, nb_clusters);
+        }
     }
 
     /* check OFLAG_COPIED */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check Max Reitz
@ 2014-10-08 23:09   ` Eric Blake
  2014-10-11 10:17     ` Max Reitz
  2014-10-16 15:27     ` Max Reitz
  2014-10-10 12:44   ` Benoît Canet
  2014-10-11 18:56   ` Benoît Canet
  2 siblings, 2 replies; 34+ messages in thread
From: Eric Blake @ 2014-10-08 23:09 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet

[-- Attachment #1: Type: text/plain, Size: 11963 bytes --]

On 08/29/2014 03:41 PM, Max Reitz wrote:
> The previous commit introduced the "rebuild" variable to qcow2's
> implementation of the image consistency check. Now make use of this by
> adding a function which creates a completely new refcount structure
> based solely on the in-memory information gathered before.
> 
> The old refcount structure will be leaked, however.

Might be worth mentioning in the commit message that a later commit will
deal with the leak.

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 286 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 283 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 6300cec..318c152 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>  }
>  
>  /*
> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
> + * the on-disk refcount structures.
> + *
> + * *first_free_cluster does not necessarily point to the first free cluster, but
> + * may point to one cluster as close as possible before it. The offset returned
> + * will never be before that cluster.

Took me a couple reads of the comment and code to understand that.  If
I'm correct, this alternative wording may be better:

On input, *first_free_cluster tells where to start looking, and need not
actually be a free cluster; the returned offset will not be before that
cluster.  On output, *first_free_cluster points to the actual first free
cluster found.

Or, depending on the semantics you intended [1]:

On input, *first_free_cluster tells where to start looking, and need not
actually be a free cluster; the returned offset will not be before that
cluster.  On output, *first_free_cluster points to the first gap found,
even if that gap was too small to be used as the returned offset.

> + *
> + * Note that *first_free_cluster is a cluster index whereas the return value is
> + * an offset.
> + */
> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
> +                                   int cluster_count,
> +                                   uint16_t **refcount_table,
> +                                   int64_t *nb_clusters,
> +                                   int64_t *first_free_cluster)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t cluster = *first_free_cluster, i;
> +    bool first_gap = true;
> +    int contiguous_free_clusters;
> +
> +    /* Starting at *first_free_cluster, find a range of at least cluster_count
> +     * continuously free clusters */
> +    for (contiguous_free_clusters = 0;
> +         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
> +         cluster++)
> +    {
> +        if (!(*refcount_table)[cluster]) {
> +            contiguous_free_clusters++;
> +            if (first_gap) {
> +                /* If this is the first free cluster found, update
> +                 * *first_free_cluster accordingly */
> +                *first_free_cluster = cluster;
> +                first_gap = false;
> +            }
> +        } else if (contiguous_free_clusters) {
> +            contiguous_free_clusters = 0;
> +        }

[1] Should you be resetting first_gap in the 'else'?  If you don't, then
*first_free_cluster is NOT the start of the cluster just allocated, but
the first free cluster encountered on the way to the eventual
allocation.  I guess it depends on how the callers are using the
information; since the function is static, I guess I'll find out later
in my review.

> +    }
> +
> +    /* If contiguous_free_clusters is greater than zero, it contains the number
> +     * of continuously free clusters until the current cluster; the first free
> +     * cluster in the current "gap" is therefore
> +     * cluster - contiguous_free_clusters */
> +
> +    /* If no such range could be found, grow the in-memory refcount table
> +     * accordingly to append free clusters at the end of the image */
> +    if (contiguous_free_clusters < cluster_count) {
> +        int64_t old_nb_clusters = *nb_clusters;
> +
> +        /* There already is a gap of contiguous_free_clusters; we need

s/gap/tail/, since we are at the end of the table?

> +         * cluster_count clusters; therefore, we have to allocate
> +         * cluster_count - contiguous_free_clusters new clusters at the end of
> +         * the image (which is the current value of cluster; note that cluster
> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
> +         * image end) */
> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
> +        *refcount_table = g_try_realloc(*refcount_table,
> +                                        *nb_clusters * sizeof(uint16_t));
> +        if (!*refcount_table) {
> +            return -ENOMEM;
> +        }
> +
> +        memset(*refcount_table + old_nb_clusters, 0,
> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));

Is this calculation unnecessarily hard-coded to refcount_order==4?

> +    }
> +
> +    /* Go back to the first free cluster */
> +    cluster -= contiguous_free_clusters;
> +    for (i = 0; i < cluster_count; i++) {
> +        (*refcount_table)[cluster + i] = 1;
> +    }
> +
> +    return cluster << s->cluster_bits;
> +}
> +
> +/*
> + * Creates a new refcount structure based solely on the in-memory information
> + * given through *refcount_table. All necessary allocations will be reflected
> + * in that array.
> + *
> + * On success, the old refcount structure is leaked (it will be covered by the
> + * new refcount structure).
> + */
> +static int rebuild_refcount_structure(BlockDriverState *bs,
> +                                      BdrvCheckResult *res,
> +                                      uint16_t **refcount_table,
> +                                      int64_t *nb_clusters)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
> +    int64_t rb_ofs, rb_start, rb_index;
> +    uint32_t reftable_size = 0;
> +    uint64_t *reftable = NULL;
> +    uint16_t *on_disk_rb;
> +    int i, ret = 0;

ret is 0...

> +    struct {
> +        uint64_t rt_offset;
> +        uint32_t rt_clusters;
> +    } QEMU_PACKED rt_offset_and_clusters;
> +
> +    qcow2_cache_empty(bs, s->refcount_block_cache);
> +
> +write_refblocks:
> +    for (; cluster < *nb_clusters; cluster++) {
> +        if (!(*refcount_table)[cluster]) {
> +            continue;
> +        }
> +
> +        rb_index = cluster >> s->refcount_block_bits;
> +        rb_start = rb_index << s->refcount_block_bits;
> +
> +        /* Don't allocate a cluster in a refblock already written to disk */
> +        if (first_free_cluster < rb_start) {
> +            first_free_cluster = rb_start;
> +        }
> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
> +                                     &first_free_cluster);

[1] looking back at my earlier question, you are starting each iteration
no earlier than the current rb_start.  But if you end up jumping back to
write_refblocks, are you guaranteed that rb_start is safely far enough
into the file, even if first_free_cluster is pointing to a gap that was
too small for an allocation?

> +        if (rb_ofs < 0) {
> +            fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));

...but if we hit this error on the first time through the for loop,
strerror(0) is NOT what you meant to print.  Did you mean
strerror(-rb_ofs) here?

> +            res->check_errors++;
> +            ret = rb_ofs;

Narrowing from int64_t to int; but I guess we know that if rb_ofs < 0,
it is only -1, and not something weird like -0x100000000.  Is the goal
that ret is -1/0, or are you trying to encode negative errno values in
the return?

> +            goto fail;
> +        }
> +
> +        if (reftable_size <= rb_index) {
> +            uint32_t old_rt_size = reftable_size;
> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
> +                                     s->cluster_size) / sizeof(uint64_t);
> +            reftable = g_try_realloc(reftable,
> +                                     reftable_size * sizeof(uint64_t));
> +            if (!reftable) {
> +                res->check_errors++;
> +                ret = -ENOMEM;
> +                goto fail;
> +            }
> +
> +            memset(reftable + old_rt_size, 0,
> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
> +
> +            /* The offset we have for the reftable is now no longer valid;
> +             * this will leak that range, but we can easily fix that by running
> +             * a leak-fixing check after this rebuild operation */
> +            rt_ofs = -1;
> +        }
> +        reftable[rb_index] = rb_ofs;
> +
> +        /* If this is apparently the last refblock (for now), try to squeeze the
> +         * reftable in */
> +        if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
> +            rt_ofs < 0)
> +        {
> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
> +                                                              sizeof(uint64_t)),
> +                                         refcount_table, nb_clusters,
> +                                         &first_free_cluster);
> +            if (rt_ofs < 0) {
> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
> +                        strerror(-ret));

Again, -ret looks wrong here.

> +                res->check_errors++;
> +                ret = rt_ofs;
> +                goto fail;
> +            }
> +        }
> +
> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        on_disk_rb = g_malloc0(s->cluster_size);

Why g_try_malloc earlier, but abort()ing g_malloc0 here?

> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
> +                    rb_start + i < *nb_clusters; i++)
> +        {
> +            on_disk_rb[i] = cpu_to_be16((*refcount_table)[rb_start + i]);
> +        }
> +
> +        ret = bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE,
> +                         (void *)on_disk_rb, s->cluster_sectors);
> +        g_free(on_disk_rb);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        /* Go to the end of this refblock */
> +        cluster = rb_start + s->cluster_size / sizeof(uint16_t) - 1;
> +    }
> +
> +    if (rt_ofs < 0) {
> +        int64_t post_rb_start = ROUND_UP(*nb_clusters,
> +                                         s->cluster_size / sizeof(uint16_t));
> +
> +        /* Not pretty but simple */
> +        if (first_free_cluster < post_rb_start) {
> +            first_free_cluster = post_rb_start;
> +        }
> +        rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
> +                                                          sizeof(uint64_t)),
> +                                     refcount_table, nb_clusters,
> +                                     &first_free_cluster);
> +        if (rt_ofs < 0) {
> +            fprintf(stderr, "ERROR allocating reftable: %s\n", strerror(-ret));

Another wrong -ret?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-10-08 23:09   ` Eric Blake
@ 2014-10-11 10:17     ` Max Reitz
  2014-10-16 15:27     ` Max Reitz
  1 sibling, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-10-11 10:17 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet

Am 09.10.2014 um 01:09 schrieb Eric Blake:
> On 08/29/2014 03:41 PM, Max Reitz wrote:
>> The previous commit introduced the "rebuild" variable to qcow2's
>> implementation of the image consistency check. Now make use of this by
>> adding a function which creates a completely new refcount structure
>> based solely on the in-memory information gathered before.
>>
>> The old refcount structure will be leaked, however.
> Might be worth mentioning in the commit message that a later commit will
> deal with the leak.
>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 286 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 283 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 6300cec..318c152 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>   }
>>   
>>   /*
>> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
>> + * the on-disk refcount structures.
>> + *
>> + * *first_free_cluster does not necessarily point to the first free cluster, but
>> + * may point to one cluster as close as possible before it. The offset returned
>> + * will never be before that cluster.
> Took me a couple reads of the comment and code to understand that.  If
> I'm correct, this alternative wording may be better:
>
> On input, *first_free_cluster tells where to start looking, and need not
> actually be a free cluster; the returned offset will not be before that
> cluster.  On output, *first_free_cluster points to the actual first free
> cluster found.
>
> Or, depending on the semantics you intended [1]:
>
> On input, *first_free_cluster tells where to start looking, and need not
> actually be a free cluster; the returned offset will not be before that
> cluster.  On output, *first_free_cluster points to the first gap found,
> even if that gap was too small to be used as the returned offset.

Yes, *first_free_cluster has nothing to do with the allocated cluster 
range. The offset of that allocated range will be returned by the 
function. *first_free_cluster should merely always point somewhere 
before or at the first gap (of any size) just so alloc_clusters_imrt() 
does not have to start searching at the beginning of the IMRT next time 
it is called. However, this also makes it useful to limit the search of 
the cluster range to the end of the IMRT (or even after the end) by the 
caller setting it to some arbitrary value, so it has a dual-use.

>> + *
>> + * Note that *first_free_cluster is a cluster index whereas the return value is
>> + * an offset.
>> + */
>> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>> +                                   int cluster_count,
>> +                                   uint16_t **refcount_table,
>> +                                   int64_t *nb_clusters,
>> +                                   int64_t *first_free_cluster)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t cluster = *first_free_cluster, i;
>> +    bool first_gap = true;
>> +    int contiguous_free_clusters;
>> +
>> +    /* Starting at *first_free_cluster, find a range of at least cluster_count
>> +     * continuously free clusters */
>> +    for (contiguous_free_clusters = 0;
>> +         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
>> +         cluster++)
>> +    {
>> +        if (!(*refcount_table)[cluster]) {
>> +            contiguous_free_clusters++;
>> +            if (first_gap) {
>> +                /* If this is the first free cluster found, update
>> +                 * *first_free_cluster accordingly */
>> +                *first_free_cluster = cluster;
>> +                first_gap = false;
>> +            }
>> +        } else if (contiguous_free_clusters) {
>> +            contiguous_free_clusters = 0;
>> +        }
> [1] Should you be resetting first_gap in the 'else'?  If you don't, then
> *first_free_cluster is NOT the start of the cluster just allocated, but
> the first free cluster encountered on the way to the eventual
> allocation.  I guess it depends on how the callers are using the
> information; since the function is static, I guess I'll find out later
> in my review.

Yes, *first_free_cluster is not that allocated cluster. I don't keep the 
offset in a dedicated variable, because it'll always be cluster - 
contiguous_free_clusters (see the next comment).

>> +    }
>> +
>> +    /* If contiguous_free_clusters is greater than zero, it contains the number
>> +     * of continuously free clusters until the current cluster; the first free
>> +     * cluster in the current "gap" is therefore
>> +     * cluster - contiguous_free_clusters */
>> +
>> +    /* If no such range could be found, grow the in-memory refcount table
>> +     * accordingly to append free clusters at the end of the image */
>> +    if (contiguous_free_clusters < cluster_count) {
>> +        int64_t old_nb_clusters = *nb_clusters;
>> +
>> +        /* There already is a gap of contiguous_free_clusters; we need
> s/gap/tail/, since we are at the end of the table?

Well, tail doesn't imply that it's empty. I could change it to 
"contiguous_free_clusters clusters are already empty at the image end".

>> +         * cluster_count clusters; therefore, we have to allocate
>> +         * cluster_count - contiguous_free_clusters new clusters at the end of
>> +         * the image (which is the current value of cluster; note that cluster
>> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
>> +         * image end) */
>> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>> +        *refcount_table = g_try_realloc(*refcount_table,
>> +                                        *nb_clusters * sizeof(uint16_t));
>> +        if (!*refcount_table) {
>> +            return -ENOMEM;
>> +        }
>> +
>> +        memset(*refcount_table + old_nb_clusters, 0,
>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
> Is this calculation unnecessarily hard-coded to refcount_order==4?

Seems like it. Shame on me. ;-)

>> +    }
>> +
>> +    /* Go back to the first free cluster */
>> +    cluster -= contiguous_free_clusters;
>> +    for (i = 0; i < cluster_count; i++) {
>> +        (*refcount_table)[cluster + i] = 1;
>> +    }
>> +
>> +    return cluster << s->cluster_bits;
>> +}
>> +
>> +/*
>> + * Creates a new refcount structure based solely on the in-memory information
>> + * given through *refcount_table. All necessary allocations will be reflected
>> + * in that array.
>> + *
>> + * On success, the old refcount structure is leaked (it will be covered by the
>> + * new refcount structure).
>> + */
>> +static int rebuild_refcount_structure(BlockDriverState *bs,
>> +                                      BdrvCheckResult *res,
>> +                                      uint16_t **refcount_table,
>> +                                      int64_t *nb_clusters)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
>> +    int64_t rb_ofs, rb_start, rb_index;
>> +    uint32_t reftable_size = 0;
>> +    uint64_t *reftable = NULL;
>> +    uint16_t *on_disk_rb;
>> +    int i, ret = 0;
> ret is 0...
>
>> +    struct {
>> +        uint64_t rt_offset;
>> +        uint32_t rt_clusters;
>> +    } QEMU_PACKED rt_offset_and_clusters;
>> +
>> +    qcow2_cache_empty(bs, s->refcount_block_cache);
>> +
>> +write_refblocks:
>> +    for (; cluster < *nb_clusters; cluster++) {
>> +        if (!(*refcount_table)[cluster]) {
>> +            continue;
>> +        }
>> +
>> +        rb_index = cluster >> s->refcount_block_bits;
>> +        rb_start = rb_index << s->refcount_block_bits;
>> +
>> +        /* Don't allocate a cluster in a refblock already written to disk */
>> +        if (first_free_cluster < rb_start) {
>> +            first_free_cluster = rb_start;
>> +        }
>> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
> [1] looking back at my earlier question, you are starting each iteration
> no earlier than the current rb_start.  But if you end up jumping back to
> write_refblocks, are you guaranteed that rb_start is safely far enough
> into the file, even if first_free_cluster is pointing to a gap that was
> too small for an allocation?

"cluster" is never decreased, it only grows. Therefore, rb_start will 
also grow or at least stay the same. The if condition before the 
alloc_clusters_imrt() call should keep first_free_clusters far enough 
into the file, shouldn't it? Other than that, what do you mean by 
"safely"? All that's important is that we don't allocate a cluster 
before the current refblock ("Don't allocate a cluster in a refblock 
already written to disk"). Apart from that, all allocated offsets are 
fine; and alloc_clusters_imrt() will always return a correctly allocated 
range, at or after first_free_clusters, so there shouldn't be any problem.

>> +        if (rb_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));
> ...but if we hit this error on the first time through the for loop,
> strerror(0) is NOT what you meant to print.  Did you mean
> strerror(-rb_ofs) here?

Yes, it should be -rb_ofs.

>> +            res->check_errors++;
>> +            ret = rb_ofs;
> Narrowing from int64_t to int; but I guess we know that if rb_ofs < 0,
> it is only -1, and not something weird like -0x100000000.  Is the goal
> that ret is -1/0, or are you trying to encode negative errno values in
> the return?

alloc_clusters_imrt() returns -errno (0/-ENOMEM only, actually), so this 
is -errno as well.

>> +            goto fail;
>> +        }
>> +
>> +        if (reftable_size <= rb_index) {
>> +            uint32_t old_rt_size = reftable_size;
>> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
>> +                                     s->cluster_size) / sizeof(uint64_t);
>> +            reftable = g_try_realloc(reftable,
>> +                                     reftable_size * sizeof(uint64_t));
>> +            if (!reftable) {
>> +                res->check_errors++;
>> +                ret = -ENOMEM;
>> +                goto fail;
>> +            }
>> +
>> +            memset(reftable + old_rt_size, 0,
>> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
>> +
>> +            /* The offset we have for the reftable is now no longer valid;
>> +             * this will leak that range, but we can easily fix that by running
>> +             * a leak-fixing check after this rebuild operation */
>> +            rt_ofs = -1;
>> +        }
>> +        reftable[rb_index] = rb_ofs;
>> +
>> +        /* If this is apparently the last refblock (for now), try to squeeze the
>> +         * reftable in */
>> +        if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
>> +            rt_ofs < 0)
>> +        {
>> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                              sizeof(uint64_t)),
>> +                                         refcount_table, nb_clusters,
>> +                                         &first_free_cluster);
>> +            if (rt_ofs < 0) {
>> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
>> +                        strerror(-ret));
> Again, -ret looks wrong here.

Yes, should be -rt_ofs.

>> +                res->check_errors++;
>> +                ret = rt_ofs;
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        on_disk_rb = g_malloc0(s->cluster_size);
> Why g_try_malloc earlier, but abort()ing g_malloc0 here?

I use g_try_realloc/g_try_malloc for the reftable and g_malloc for the 
refblocks. The reftable can be arbitrarily large; a refblock is pretty 
limited in size (it's a cluster). If g_malloc fails because there's no 
room for a single cluster anymore, two things will happen: (1) The whole 
qcow2 driver will explode (because it still uses g_malloc for most if 
not all cluster allocations, afaik); (2) Linux will kill qemu because of 
OOM, we won't even be able to catch that by using g_try_malloc. As far 
as I've seen, using the try variants is only really useful if you may 
have some absolutely absurd size value.

>> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
>> +                    rb_start + i < *nb_clusters; i++)
>> +        {
>> +            on_disk_rb[i] = cpu_to_be16((*refcount_table)[rb_start + i]);
>> +        }
>> +
>> +        ret = bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE,
>> +                         (void *)on_disk_rb, s->cluster_sectors);
>> +        g_free(on_disk_rb);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        /* Go to the end of this refblock */
>> +        cluster = rb_start + s->cluster_size / sizeof(uint16_t) - 1;
>> +    }
>> +
>> +    if (rt_ofs < 0) {
>> +        int64_t post_rb_start = ROUND_UP(*nb_clusters,
>> +                                         s->cluster_size / sizeof(uint16_t));
>> +
>> +        /* Not pretty but simple */
>> +        if (first_free_cluster < post_rb_start) {
>> +            first_free_cluster = post_rb_start;
>> +        }
>> +        rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                          sizeof(uint64_t)),
>> +                                     refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
>> +        if (rt_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating reftable: %s\n", strerror(-ret));
> Another wrong -ret?

I guess it's just a habit to type strerror(-ret)...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-10-08 23:09   ` Eric Blake
  2014-10-11 10:17     ` Max Reitz
@ 2014-10-16 15:27     ` Max Reitz
  2014-10-16 15:33       ` Eric Blake
  1 sibling, 1 reply; 34+ messages in thread
From: Max Reitz @ 2014-10-16 15:27 UTC (permalink / raw)
  To: Eric Blake, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet

Am 09.10.2014 um 01:09 schrieb Eric Blake:
> On 08/29/2014 03:41 PM, Max Reitz wrote:
>> +         * cluster_count clusters; therefore, we have to allocate
>> +         * cluster_count - contiguous_free_clusters new clusters at the end of
>> +         * the image (which is the current value of cluster; note that cluster
>> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
>> +         * image end) */
>> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>> +        *refcount_table = g_try_realloc(*refcount_table,
>> +                                        *nb_clusters * sizeof(uint16_t));
>> +        if (!*refcount_table) {
>> +            return -ENOMEM;
>> +        }
>> +
>> +        memset(*refcount_table + old_nb_clusters, 0,
>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
> Is this calculation unnecessarily hard-coded to refcount_order==4?

While now in the process of editing this patch, no, this is not 
hard-coded to that refcount_order. It's hard-coded to *refcount_table 
being uint16_t *. I think this fine. Making the in-memory refcount table 
support variable refcount orders would be pretty hard and in fact we do 
not need it before we actually support other refcount_orders. When doing 
that, I (or anyone else) will probably add some function read_refcount() 
which reads a refcount from a refcount block or a concatenation of 
refblocks (such as this in-memory refcount table) while taking into 
account a variable refcount_order. When that is done, we can rework this 
code.

But for now it's fine to make the in-memory refcount table entries 
uint16_t, which is the reason for all the sizeof(uint16_t).

Max

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-10-16 15:27     ` Max Reitz
@ 2014-10-16 15:33       ` Eric Blake
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2014-10-16 15:33 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet

[-- Attachment #1: Type: text/plain, Size: 1522 bytes --]

On 10/16/2014 09:27 AM, Max Reitz wrote:

>>> +        memset(*refcount_table + old_nb_clusters, 0,
>>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> Is this calculation unnecessarily hard-coded to refcount_order==4?
> 
> While now in the process of editing this patch, no, this is not
> hard-coded to that refcount_order. It's hard-coded to *refcount_table
> being uint16_t *. I think this fine.

Correct - our choice of uint16_t* is what hard-codes our current
dependence on refcount_order==4, and we assert that is the case elsewhere.

> Making the in-memory refcount table
> support variable refcount orders would be pretty hard and in fact we do
> not need it before we actually support other refcount_orders.

Agreed. Particularly for refcount orders < 3, where you pack more than
one refcount in a single byte.

> When doing
> that, I (or anyone else) will probably add some function read_refcount()
> which reads a refcount from a refcount block or a concatenation of
> refblocks (such as this in-memory refcount table) while taking into
> account a variable refcount_order. When that is done, we can rework this
> code.

Fair enough. Don't hold up this series for that future improvement.

> 
> But for now it's fine to make the in-memory refcount table entries
> uint16_t, which is the reason for all the sizeof(uint16_t).
> 
> Max
> 
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check Max Reitz
  2014-10-08 23:09   ` Eric Blake
@ 2014-10-10 12:44   ` Benoît Canet
  2014-10-11 10:27     ` Max Reitz
  2014-10-11 18:56   ` Benoît Canet
  2 siblings, 1 reply; 34+ messages in thread
From: Benoît Canet @ 2014-10-10 12:44 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Benoît Canet

> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
> +        *refcount_table = g_try_realloc(*refcount_table,
> +                                        *nb_clusters * sizeof(uint16_t));

Something tells me that these sizeof(uint16_t) are connected to s->refcount_order
and indirectely to REFCOUNT_SHIFT and that this code could benefit from this
relationship thus probably saving you work in the future.


> +        if (!*refcount_table) {
> +            return -ENOMEM;
> +        }
> +
> +        memset(*refcount_table + old_nb_clusters, 0,
> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
> +    }
> +
> +    /* Go back to the first free cluster */
> +    cluster -= contiguous_free_clusters;
> +    for (i = 0; i < cluster_count; i++) {
> +        (*refcount_table)[cluster + i] = 1;
> +    }
> +
> +    return cluster << s->cluster_bits;
> +}
> +
> +/*
> + * Creates a new refcount structure based solely on the in-memory information
> + * given through *refcount_table. All necessary allocations will be reflected
> + * in that array.
> + *
> + * On success, the old refcount structure is leaked (it will be covered by the
> + * new refcount structure).
> + */
> +static int rebuild_refcount_structure(BlockDriverState *bs,
> +                                      BdrvCheckResult *res,
> +                                      uint16_t **refcount_table,
> +                                      int64_t *nb_clusters)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
> +    int64_t rb_ofs, rb_start, rb_index;
> +    uint32_t reftable_size = 0;
> +    uint64_t *reftable = NULL;
> +    uint16_t *on_disk_rb;
> +    int i, ret = 0;
> +    struct {
> +        uint64_t rt_offset;
> +        uint32_t rt_clusters;
> +    } QEMU_PACKED rt_offset_and_clusters;
> +
> +    qcow2_cache_empty(bs, s->refcount_block_cache);
> +
> +write_refblocks:
> +    for (; cluster < *nb_clusters; cluster++) {
> +        if (!(*refcount_table)[cluster]) {
> +            continue;
> +        }
> +
> +        rb_index = cluster >> s->refcount_block_bits;
> +        rb_start = rb_index << s->refcount_block_bits;
> +
> +        /* Don't allocate a cluster in a refblock already written to disk */
> +        if (first_free_cluster < rb_start) {
> +            first_free_cluster = rb_start;
> +        }
> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
> +                                     &first_free_cluster);
> +        if (rb_ofs < 0) {
> +            fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));
> +            res->check_errors++;
> +            ret = rb_ofs;
> +            goto fail;
> +        }
> +
> +        if (reftable_size <= rb_index) {
> +            uint32_t old_rt_size = reftable_size;
> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
> +                                     s->cluster_size) / sizeof(uint64_t);
> +            reftable = g_try_realloc(reftable,
> +                                     reftable_size * sizeof(uint64_t));
> +            if (!reftable) {
> +                res->check_errors++;
> +                ret = -ENOMEM;
> +                goto fail;
> +            }
> +
> +            memset(reftable + old_rt_size, 0,
> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
> +
> +            /* The offset we have for the reftable is now no longer valid;
> +             * this will leak that range, but we can easily fix that by running
> +             * a leak-fixing check after this rebuild operation */
> +            rt_ofs = -1;
> +        }
> +        reftable[rb_index] = rb_ofs;
> +
> +        /* If this is apparently the last refblock (for now), try to squeeze the
> +         * reftable in */
> +        if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
> +            rt_ofs < 0)
> +        {
> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
> +                                                              sizeof(uint64_t)),
> +                                         refcount_table, nb_clusters,
> +                                         &first_free_cluster);
> +            if (rt_ofs < 0) {
> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
> +                        strerror(-ret));
> +                res->check_errors++;
> +                ret = rt_ofs;
> +                goto fail;
> +            }
> +        }
> +
> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        on_disk_rb = g_malloc0(s->cluster_size);
> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
> +                    rb_start + i < *nb_clusters; i++)
> +        {
> +            on_disk_rb[i] = cpu_to_be16((*refcount_table)[rb_start + i]);
> +        }
> +
> +        ret = bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE,
> +                         (void *)on_disk_rb, s->cluster_sectors);
> +        g_free(on_disk_rb);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        /* Go to the end of this refblock */
> +        cluster = rb_start + s->cluster_size / sizeof(uint16_t) - 1;
> +    }
> +
> +    if (rt_ofs < 0) {
> +        int64_t post_rb_start = ROUND_UP(*nb_clusters,
> +                                         s->cluster_size / sizeof(uint16_t));
> +
> +        /* Not pretty but simple */
> +        if (first_free_cluster < post_rb_start) {
> +            first_free_cluster = post_rb_start;
> +        }
> +        rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
> +                                                          sizeof(uint64_t)),
> +                                     refcount_table, nb_clusters,
> +                                     &first_free_cluster);
> +        if (rt_ofs < 0) {
> +            fprintf(stderr, "ERROR allocating reftable: %s\n", strerror(-ret));
> +            res->check_errors++;
> +            ret = rt_ofs;
> +            goto fail;
> +        }
> +
> +        goto write_refblocks;
> +    }
> +
> +    assert(reftable);
> +
> +    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
> +        cpu_to_be64s(&reftable[rb_index]);
> +    }
> +
> +    ret = qcow2_pre_write_overlap_check(bs, 0, rt_ofs,
> +                                        reftable_size * sizeof(uint64_t));
> +    if (ret < 0) {
> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
> +        goto fail;
> +    }
> +
> +    ret = bdrv_write(bs->file, rt_ofs / BDRV_SECTOR_SIZE, (void *)reftable,
> +                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
> +    if (ret < 0) {
> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
> +        goto fail;
> +    }
> +
> +    /* Enter new reftable into the image header */
> +    cpu_to_be64w(&rt_offset_and_clusters.rt_offset, rt_ofs);
> +    cpu_to_be32w(&rt_offset_and_clusters.rt_clusters,
> +                 size_to_clusters(s, reftable_size * sizeof(uint64_t)));
> +    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader,
> +                                              refcount_table_offset),
> +                           &rt_offset_and_clusters,
> +                           sizeof(rt_offset_and_clusters));
> +    if (ret < 0) {
> +        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
> +        goto fail;
> +    }
> +
> +    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
> +        be64_to_cpus(&reftable[rb_index]);
> +    }
> +    s->refcount_table = reftable;
> +    s->refcount_table_offset = rt_ofs;
> +    s->refcount_table_size = reftable_size;
> +
> +    return 0;
> +
> +fail:
> +    g_free(reftable);
> +    return ret;
> +}
> +
> +/*
>   * Checks an image for refcount consistency.
>   *
>   * Returns 0 if no errors are found, the number of errors in case the image is
> @@ -1612,6 +1872,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>                            BdrvCheckMode fix)
>  {
>      BDRVQcowState *s = bs->opaque;
> +    BdrvCheckResult pre_compare_res;
>      int64_t size, highest_cluster, nb_clusters;
>      uint16_t *refcount_table = NULL;
>      bool rebuild = false;
> @@ -1638,11 +1899,30 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>          goto fail;
>      }
>  
> -    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
> +    /* In case we don't need to rebuild the refcount structure (but want to fix
> +     * something), this function is immediately called again, in which case the
> +     * result should be ignored */
> +    pre_compare_res = *res;
> +    compare_refcounts(bs, res, 0, &rebuild, &highest_cluster, refcount_table,
>                        nb_clusters);
>  
> -    if (rebuild) {
> -        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> +    if (rebuild && (fix & BDRV_FIX_ERRORS)) {
> +        fprintf(stderr, "Rebuilding refcount structure\n");
> +        ret = rebuild_refcount_structure(bs, res, &refcount_table,
> +                                         &nb_clusters);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    } else if (fix) {
> +        if (rebuild) {
> +            fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> +        }
> +
> +        if (res->leaks || res->corruptions) {
> +            *res = pre_compare_res;
> +            compare_refcounts(bs, res, fix, &rebuild, &highest_cluster,
> +                              refcount_table, nb_clusters);
> +        }
>      }
>  
>      /* check OFLAG_COPIED */
> -- 
> 2.1.0
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-10-10 12:44   ` Benoît Canet
@ 2014-10-11 10:27     ` Max Reitz
  0 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-10-11 10:27 UTC (permalink / raw)
  To: Benoît Canet; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

Am 10.10.2014 um 14:44 schrieb Benoît Canet:
>> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>> +        *refcount_table = g_try_realloc(*refcount_table,
>> +                                        *nb_clusters * sizeof(uint16_t));
> Something tells me that these sizeof(uint16_t) are connected to s->refcount_order
> and indirectely to REFCOUNT_SHIFT and that this code could benefit from this
> relationship thus probably saving you work in the future.

Yes, see the answer to Eric. I'm totally in favor of variable refcounts, 
see my "qcow2: Drop REFCOUNT_SHIFT series". ;-)

Once again, thank you for your reviews!

Max

>> +        if (!*refcount_table) {
>> +            return -ENOMEM;
>> +        }
>> +
>> +        memset(*refcount_table + old_nb_clusters, 0,
>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> +    }
>> +
>> +    /* Go back to the first free cluster */
>> +    cluster -= contiguous_free_clusters;
>> +    for (i = 0; i < cluster_count; i++) {
>> +        (*refcount_table)[cluster + i] = 1;
>> +    }
>> +
>> +    return cluster << s->cluster_bits;
>> +}
>> +
>> +/*
>> + * Creates a new refcount structure based solely on the in-memory information
>> + * given through *refcount_table. All necessary allocations will be reflected
>> + * in that array.
>> + *
>> + * On success, the old refcount structure is leaked (it will be covered by the
>> + * new refcount structure).
>> + */
>> +static int rebuild_refcount_structure(BlockDriverState *bs,
>> +                                      BdrvCheckResult *res,
>> +                                      uint16_t **refcount_table,
>> +                                      int64_t *nb_clusters)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
>> +    int64_t rb_ofs, rb_start, rb_index;
>> +    uint32_t reftable_size = 0;
>> +    uint64_t *reftable = NULL;
>> +    uint16_t *on_disk_rb;
>> +    int i, ret = 0;
>> +    struct {
>> +        uint64_t rt_offset;
>> +        uint32_t rt_clusters;
>> +    } QEMU_PACKED rt_offset_and_clusters;
>> +
>> +    qcow2_cache_empty(bs, s->refcount_block_cache);
>> +
>> +write_refblocks:
>> +    for (; cluster < *nb_clusters; cluster++) {
>> +        if (!(*refcount_table)[cluster]) {
>> +            continue;
>> +        }
>> +
>> +        rb_index = cluster >> s->refcount_block_bits;
>> +        rb_start = rb_index << s->refcount_block_bits;
>> +
>> +        /* Don't allocate a cluster in a refblock already written to disk */
>> +        if (first_free_cluster < rb_start) {
>> +            first_free_cluster = rb_start;
>> +        }
>> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
>> +        if (rb_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));
>> +            res->check_errors++;
>> +            ret = rb_ofs;
>> +            goto fail;
>> +        }
>> +
>> +        if (reftable_size <= rb_index) {
>> +            uint32_t old_rt_size = reftable_size;
>> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
>> +                                     s->cluster_size) / sizeof(uint64_t);
>> +            reftable = g_try_realloc(reftable,
>> +                                     reftable_size * sizeof(uint64_t));
>> +            if (!reftable) {
>> +                res->check_errors++;
>> +                ret = -ENOMEM;
>> +                goto fail;
>> +            }
>> +
>> +            memset(reftable + old_rt_size, 0,
>> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
>> +
>> +            /* The offset we have for the reftable is now no longer valid;
>> +             * this will leak that range, but we can easily fix that by running
>> +             * a leak-fixing check after this rebuild operation */
>> +            rt_ofs = -1;
>> +        }
>> +        reftable[rb_index] = rb_ofs;
>> +
>> +        /* If this is apparently the last refblock (for now), try to squeeze the
>> +         * reftable in */
>> +        if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
>> +            rt_ofs < 0)
>> +        {
>> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                              sizeof(uint64_t)),
>> +                                         refcount_table, nb_clusters,
>> +                                         &first_free_cluster);
>> +            if (rt_ofs < 0) {
>> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
>> +                        strerror(-ret));
>> +                res->check_errors++;
>> +                ret = rt_ofs;
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        on_disk_rb = g_malloc0(s->cluster_size);
>> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
>> +                    rb_start + i < *nb_clusters; i++)
>> +        {
>> +            on_disk_rb[i] = cpu_to_be16((*refcount_table)[rb_start + i]);
>> +        }
>> +
>> +        ret = bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE,
>> +                         (void *)on_disk_rb, s->cluster_sectors);
>> +        g_free(on_disk_rb);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        /* Go to the end of this refblock */
>> +        cluster = rb_start + s->cluster_size / sizeof(uint16_t) - 1;
>> +    }
>> +
>> +    if (rt_ofs < 0) {
>> +        int64_t post_rb_start = ROUND_UP(*nb_clusters,
>> +                                         s->cluster_size / sizeof(uint16_t));
>> +
>> +        /* Not pretty but simple */
>> +        if (first_free_cluster < post_rb_start) {
>> +            first_free_cluster = post_rb_start;
>> +        }
>> +        rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                          sizeof(uint64_t)),
>> +                                     refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
>> +        if (rt_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating reftable: %s\n", strerror(-ret));
>> +            res->check_errors++;
>> +            ret = rt_ofs;
>> +            goto fail;
>> +        }
>> +
>> +        goto write_refblocks;
>> +    }
>> +
>> +    assert(reftable);
>> +
>> +    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
>> +        cpu_to_be64s(&reftable[rb_index]);
>> +    }
>> +
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, rt_ofs,
>> +                                        reftable_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    ret = bdrv_write(bs->file, rt_ofs / BDRV_SECTOR_SIZE, (void *)reftable,
>> +                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    /* Enter new reftable into the image header */
>> +    cpu_to_be64w(&rt_offset_and_clusters.rt_offset, rt_ofs);
>> +    cpu_to_be32w(&rt_offset_and_clusters.rt_clusters,
>> +                 size_to_clusters(s, reftable_size * sizeof(uint64_t)));
>> +    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader,
>> +                                              refcount_table_offset),
>> +                           &rt_offset_and_clusters,
>> +                           sizeof(rt_offset_and_clusters));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
>> +        be64_to_cpus(&reftable[rb_index]);
>> +    }
>> +    s->refcount_table = reftable;
>> +    s->refcount_table_offset = rt_ofs;
>> +    s->refcount_table_size = reftable_size;
>> +
>> +    return 0;
>> +
>> +fail:
>> +    g_free(reftable);
>> +    return ret;
>> +}
>> +
>> +/*
>>    * Checks an image for refcount consistency.
>>    *
>>    * Returns 0 if no errors are found, the number of errors in case the image is
>> @@ -1612,6 +1872,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>                             BdrvCheckMode fix)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> +    BdrvCheckResult pre_compare_res;
>>       int64_t size, highest_cluster, nb_clusters;
>>       uint16_t *refcount_table = NULL;
>>       bool rebuild = false;
>> @@ -1638,11 +1899,30 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           goto fail;
>>       }
>>   
>> -    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>> +    /* In case we don't need to rebuild the refcount structure (but want to fix
>> +     * something), this function is immediately called again, in which case the
>> +     * result should be ignored */
>> +    pre_compare_res = *res;
>> +    compare_refcounts(bs, res, 0, &rebuild, &highest_cluster, refcount_table,
>>                         nb_clusters);
>>   
>> -    if (rebuild) {
>> -        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +    if (rebuild && (fix & BDRV_FIX_ERRORS)) {
>> +        fprintf(stderr, "Rebuilding refcount structure\n");
>> +        ret = rebuild_refcount_structure(bs, res, &refcount_table,
>> +                                         &nb_clusters);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    } else if (fix) {
>> +        if (rebuild) {
>> +            fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +        }
>> +
>> +        if (res->leaks || res->corruptions) {
>> +            *res = pre_compare_res;
>> +            compare_refcounts(bs, res, fix, &rebuild, &highest_cluster,
>> +                              refcount_table, nb_clusters);
>> +        }
>>       }
>>   
>>       /* check OFLAG_COPIED */
>> -- 
>> 2.1.0
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check Max Reitz
  2014-10-08 23:09   ` Eric Blake
  2014-10-10 12:44   ` Benoît Canet
@ 2014-10-11 18:56   ` Benoît Canet
  2014-10-12  7:32     ` Max Reitz
  2 siblings, 1 reply; 34+ messages in thread
From: Benoît Canet @ 2014-10-11 18:56 UTC (permalink / raw)
  To: Max Reitz; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Benoît Canet


> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
> +    int64_t rb_ofs, rb_start, rb_index;

Everytime a few day pass and I read rb_ofs and rt_ofs again I found these names
obfuscated. I know Linus says that C is a spartan language but these look odd.
Maybe refcount_block_offset, refcount_block_* and refcount_table_offset would be
better.

I'll try to read this patch for real before you repost it.

Best regards

Benoît

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
  2014-10-11 18:56   ` Benoît Canet
@ 2014-10-12  7:32     ` Max Reitz
  0 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-10-12  7:32 UTC (permalink / raw)
  To: Benoît Canet; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

Am 11.10.2014 um 20:56 schrieb Benoît Canet:
>> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
>> +    int64_t rb_ofs, rb_start, rb_index;
> Everytime a few day pass and I read rb_ofs and rt_ofs again I found these names
> obfuscated. I know Linus says that C is a spartan language but these look odd.
> Maybe refcount_block_offset, refcount_block_* and refcount_table_offset would be
> better.

I would use longer names if there was no line length limit. ;-)

I'll try and see how it looks.

Max

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 09/11] qcow2: Clean up after refcount rebuild
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (7 preceding siblings ...)
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check Max Reitz
@ 2014-08-29 21:41 ` Max Reitz
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 10/11] iotests: Fix test outputs Max Reitz
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

Because the old refcount structure will be leaked after having rebuilt
it, we need to recalculate the refcounts and run a leak-fixing operation
afterwards (if leaks should be fixed at all).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
---
 block/qcow2-refcount.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 318c152..29136ee 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1907,12 +1907,47 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                       nb_clusters);
 
     if (rebuild && (fix & BDRV_FIX_ERRORS)) {
+        BdrvCheckResult old_res = *res;
+
         fprintf(stderr, "Rebuilding refcount structure\n");
         ret = rebuild_refcount_structure(bs, res, &refcount_table,
                                          &nb_clusters);
         if (ret < 0) {
             goto fail;
         }
+
+        res->corruptions = 0;
+        res->leaks = 0;
+
+        /* Because the old reftable has been exchanged for a new one the
+         * references have to be recalculated */
+        rebuild = false;
+        memset(refcount_table, 0, nb_clusters * sizeof(uint16_t));
+        ret = calculate_refcounts(bs, res, 0, &rebuild, &refcount_table,
+                                  &nb_clusters);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        if (fix & BDRV_FIX_LEAKS) {
+            /* The old refcount structures are now leaked, fix it; the result
+             * can be ignored */
+            pre_compare_res = *res;
+            compare_refcounts(bs, res, BDRV_FIX_LEAKS, &rebuild,
+                              &highest_cluster, refcount_table, nb_clusters);
+            if (rebuild) {
+                fprintf(stderr, "ERROR rebuilt refcount structure is still "
+                        "broken\n");
+            }
+            *res = pre_compare_res;
+        }
+
+        if (res->corruptions < old_res.corruptions) {
+            res->corruptions_fixed += old_res.corruptions - res->corruptions;
+        }
+        if (res->leaks < old_res.leaks) {
+            res->leaks_fixed += old_res.leaks - res->leaks;
+        }
     } else if (fix) {
         if (rebuild) {
             fprintf(stderr, "ERROR need to rebuild refcount structures\n");
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 10/11] iotests: Fix test outputs
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (8 preceding siblings ...)
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 09/11] qcow2: Clean up after refcount rebuild Max Reitz
@ 2014-08-29 21:41 ` Max Reitz
  2014-09-02 19:49   ` Eric Blake
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 11/11] iotests: Add test for potentially damaging repairs Max Reitz
  2014-10-08 19:25 ` [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
  11 siblings, 1 reply; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

039, 060 and 061 all create images with referenced clusters having a
refcount of 0. Because previous commits changed handling of such errors,
these tests now have a different output. Fix it.

Furthermore, 060 created a refblock with a refcount greater than one
which now results in having to rebuild the refcount structure as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/039.out | 10 ++++++++--
 tests/qemu-iotests/060.out | 10 ++++++++--
 tests/qemu-iotests/061.out | 18 ++++++++++++------
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/039.out b/tests/qemu-iotests/039.out
index 67e7744..0adf153 100644
--- a/tests/qemu-iotests/039.out
+++ b/tests/qemu-iotests/039.out
@@ -25,7 +25,10 @@ read 512/512 bytes at offset 0
 incompatible_features     0x1
 
 == Repairing the image file must succeed ==
-Repairing cluster 5 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
 The following inconsistencies were found and repaired:
 
     0 leaked clusters
@@ -45,7 +48,10 @@ wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 ./039: Aborted                 ( ulimit -c 0; exec "$@" )
 incompatible_features     0x1
-Repairing cluster 5 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
 wrote 512/512 bytes at offset 0
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 incompatible_features     0x0
diff --git a/tests/qemu-iotests/060.out b/tests/qemu-iotests/060.out
index c27c952..41ecbc7 100644
--- a/tests/qemu-iotests/060.out
+++ b/tests/qemu-iotests/060.out
@@ -27,11 +27,15 @@ incompatible_features     0x0
 qcow2: Preventing invalid write on metadata (overlaps with refcount block); image marked as corrupt.
 write failed: Input/output error
 incompatible_features     0x2
-Repairing refcount block 0 refcount=2
+ERROR refcount block 0 refcount=2
+ERROR cluster 2 refcount=1 reference=2
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=2 reference=1
 The following inconsistencies were found and repaired:
 
     0 leaked clusters
-    1 corruptions
+    2 corruptions
 
 Double checking the fixed image now...
 No errors were found on the image.
@@ -59,6 +63,8 @@ incompatible_features     0x0
 qcow2: Preventing invalid write on metadata (overlaps with inactive L2 table); image marked as corrupt.
 write failed: Input/output error
 incompatible_features     0x2
+ERROR cluster 4 refcount=1 reference=2
+Leaked cluster 9 refcount=1 reference=0
 Repairing cluster 4 refcount=1 reference=2
 Repairing cluster 9 refcount=1 reference=0
 Repairing OFLAG_COPIED data cluster: l2_entry=8000000000040000 refcount=2
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index e372470..4ae6472 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -76,8 +76,11 @@ autoclear_features        0x0
 refcount_order            4
 header_length             104
 
-Repairing cluster 5 refcount=0 reference=1
-Repairing cluster 6 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+ERROR cluster 6 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
 magic                     0x514649fb
 version                   2
 backing_file_offset       0x0
@@ -87,7 +90,7 @@ size                      67108864
 crypt_method              0
 l1_size                   1
 l1_table_offset           0x30000
-refcount_table_offset     0x10000
+refcount_table_offset     0x80000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
@@ -230,8 +233,11 @@ autoclear_features        0x0
 refcount_order            4
 header_length             104
 
-Repairing cluster 5 refcount=0 reference=1
-Repairing cluster 6 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+ERROR cluster 6 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
 magic                     0x514649fb
 version                   3
 backing_file_offset       0x0
@@ -241,7 +247,7 @@ size                      67108864
 crypt_method              0
 l1_size                   1
 l1_table_offset           0x30000
-refcount_table_offset     0x10000
+refcount_table_offset     0x80000
 refcount_table_clusters   1
 nb_snapshots              0
 snapshot_offset           0x0
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 10/11] iotests: Fix test outputs
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 10/11] iotests: Fix test outputs Max Reitz
@ 2014-09-02 19:49   ` Eric Blake
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2014-09-02 19:49 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet

[-- Attachment #1: Type: text/plain, Size: 813 bytes --]

On 08/29/2014 03:41 PM, Max Reitz wrote:
> 039, 060 and 061 all create images with referenced clusters having a
> refcount of 0. Because previous commits changed handling of such errors,
> these tests now have a different output. Fix it.
> 
> Furthermore, 060 created a refblock with a refcount greater than one
> which now results in having to rebuild the refcount structure as well.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/039.out | 10 ++++++++--
>  tests/qemu-iotests/060.out | 10 ++++++++--
>  tests/qemu-iotests/061.out | 18 ++++++++++++------
>  3 files changed, 28 insertions(+), 10 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [Qemu-devel] [PATCH v5 11/11] iotests: Add test for potentially damaging repairs
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (9 preceding siblings ...)
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 10/11] iotests: Fix test outputs Max Reitz
@ 2014-08-29 21:41 ` Max Reitz
  2014-09-02 19:52   ` Eric Blake
  2014-10-08 19:25 ` [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
  11 siblings, 1 reply; 34+ messages in thread
From: Max Reitz @ 2014-08-29 21:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet, Max Reitz

There are certain cases where repairing a qcow2 image might actually
damage it further (or rather, where repairing it has in fact damaged it
further with the old qcow2 check implementation). This should not
happen, so add a test for these cases.

Furthermore, the repair function now repairs refblocks beyond the image
end by resizing the image accordingly. Add several tests for this as
well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/104     | 141 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/104.out | 110 +++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 252 insertions(+)
 create mode 100755 tests/qemu-iotests/104
 create mode 100644 tests/qemu-iotests/104.out

diff --git a/tests/qemu-iotests/104 b/tests/qemu-iotests/104
new file mode 100755
index 0000000..a2458be
--- /dev/null
+++ b/tests/qemu-iotests/104
@@ -0,0 +1,141 @@
+#!/bin/bash
+#
+# Test case for repairing qcow2 images which cannot be repaired using
+# the on-disk refcount structures
+#
+# Copyright (C) 2014 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=mreitz@redhat.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+here="$PWD"
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+_cleanup()
+{
+	_cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+echo
+echo '=== Repairing an image without any refcount table ==='
+echo
+
+_make_test_img 64M
+# just write some data
+$QEMU_IO -c 'write -P 42 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+# refcount_table_offset
+poke_file "$TEST_IMG" 48 "\x00\x00\x00\x00\x00\x00\x00\x00"
+# refcount_table_clusters
+poke_file "$TEST_IMG" 56 "\x00\x00\x00\x00"
+
+_check_test_img -r all
+
+$QEMU_IO -c 'read -P 42 0 64k' "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo '=== Repairing unreferenced data cluster in new refblock area ==='
+echo
+
+IMGOPTS='cluster_size=512' _make_test_img 64M
+# Allocate the first 128 kB in the image (first refblock)
+$QEMU_IO -c 'write 0 111104' "$TEST_IMG" | _filter_qemu_io
+# should be 131072
+stat -c '%s' "$TEST_IMG"
+
+# Enter a cluster at 128 kB (0x20000)
+# XXX: This (0x1ccc8) should be the first free entry in the last L2 table, but
+# we cannot be sure
+poke_file "$TEST_IMG" 117960 "\x80\x00\x00\x00\x00\x02\x00\x00"
+
+# Fill the cluster
+truncate -s 131584 "$TEST_IMG"
+$QEMU_IO -c "open -o driver=raw $TEST_IMG" -c 'write -P 42 128k 512' \
+    | _filter_qemu_io
+
+# The data should now appear at this guest offset
+$QEMU_IO -c 'read -P 42 111104 512' "$TEST_IMG" | _filter_qemu_io
+
+# This cluster is unallocated; fix it
+_check_test_img -r all
+
+# This repair operation must have allocated a new refblock; and that refblock
+# should not overlap with the unallocated data cluster. If it does, the data
+# will be damaged, so check it.
+$QEMU_IO -c 'read -P 42 111104 512' "$TEST_IMG" | _filter_qemu_io
+
+echo
+echo '=== Repairing refblock beyond the image end ==='
+echo
+
+echo
+echo '--- Otherwise clean ---'
+echo
+
+_make_test_img 64M
+# Normally, qemu doesn't create empty refblocks, so we just have to do it by
+# hand
+# XXX: This (0x10008) should be the entry for the second refblock
+poke_file "$TEST_IMG" 65544 "\x00\x00\x00\x00\x00\x10\x00\x00"
+# Mark that refblock as used
+# XXX: This (0x20020) should be the 17th entry (cluster 16) of the first
+# refblock
+poke_file "$TEST_IMG" 131104 "\x00\x01"
+_check_test_img -r all
+
+echo
+echo '--- Refblock is unallocated ---'
+echo
+
+_make_test_img 64M
+poke_file "$TEST_IMG" 65544 "\x00\x00\x00\x00\x00\x10\x00\x00"
+_check_test_img -r all
+
+echo
+echo '--- Signed overflow after the refblock ---'
+echo
+
+_make_test_img 64M
+poke_file "$TEST_IMG" 65544 "\x7f\xff\xff\xff\xff\xff\x00\x00"
+_check_test_img -r all
+
+echo
+echo '--- Unsigned overflow after the refblock ---'
+echo
+
+_make_test_img 64M
+poke_file "$TEST_IMG" 65544 "\xff\xff\xff\xff\xff\xff\x00\x00"
+_check_test_img -r all
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/104.out b/tests/qemu-iotests/104.out
new file mode 100644
index 0000000..80f559d
--- /dev/null
+++ b/tests/qemu-iotests/104.out
@@ -0,0 +1,110 @@
+QA output created by 104
+
+=== Repairing an image without any refcount table ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+wrote 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR cluster 0 refcount=0 reference=1
+ERROR cluster 3 refcount=0 reference=1
+ERROR cluster 4 refcount=0 reference=1
+ERROR cluster 5 refcount=0 reference=1
+Rebuilding refcount structure
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    4 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Repairing unreferenced data cluster in new refblock area ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+wrote 111104/111104 bytes at offset 0
+108.500 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+131072
+wrote 512/512 bytes at offset 131072
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 512/512 bytes at offset 111104
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+ERROR cluster 256 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+read 512/512 bytes at offset 111104
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Repairing refblock beyond the image end ===
+
+
+--- Otherwise clean ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+Repairing refcount block 1 is outside image
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+--- Refblock is unallocated ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+Repairing refcount block 1 is outside image
+ERROR cluster 16 refcount=0 reference=1
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+Repairing cluster 16 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    2 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+--- Signed overflow after the refblock ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+Repairing refcount block 1 is outside image
+ERROR could not resize image: Invalid argument
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+
+--- Unsigned overflow after the refblock ---
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864 
+Repairing refcount block 1 is outside image
+ERROR could not resize image: Invalid argument
+Rebuilding refcount structure
+Repairing cluster 1 refcount=1 reference=0
+Repairing cluster 2 refcount=1 reference=0
+The following inconsistencies were found and repaired:
+
+    0 leaked clusters
+    1 corruptions
+
+Double checking the fixed image now...
+No errors were found on the image.
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 0920b28..377687e 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -104,3 +104,4 @@
 100 rw auto quick
 101 rw auto quick
 103 rw auto quick
+104 rw auto quick
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/11] iotests: Add test for potentially damaging repairs
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 11/11] iotests: Add test for potentially damaging repairs Max Reitz
@ 2014-09-02 19:52   ` Eric Blake
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Blake @ 2014-09-02 19:52 UTC (permalink / raw)
  To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Benoît Canet

[-- Attachment #1: Type: text/plain, Size: 1011 bytes --]

On 08/29/2014 03:41 PM, Max Reitz wrote:
> There are certain cases where repairing a qcow2 image might actually
> damage it further (or rather, where repairing it has in fact damaged it
> further with the old qcow2 check implementation). This should not
> happen, so add a test for these cases.
> 
> Furthermore, the repair function now repairs refblocks beyond the image
> end by resizing the image accordingly. Add several tests for this as
> well.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  tests/qemu-iotests/104     | 141 +++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/104.out | 110 +++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 252 insertions(+)
>  create mode 100755 tests/qemu-iotests/104
>  create mode 100644 tests/qemu-iotests/104.out
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 539 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing
  2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
                   ` (10 preceding siblings ...)
  2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 11/11] iotests: Add test for potentially damaging repairs Max Reitz
@ 2014-10-08 19:25 ` Max Reitz
  11 siblings, 0 replies; 34+ messages in thread
From: Max Reitz @ 2014-10-08 19:25 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi, Benoît Canet

On 29.08.2014 23:40, Max Reitz wrote:
> As can be seen in the final patch of this series, there are certain
> cases where the current repair implementation of qcow2 actually damages
> the image further because it allocates new clusters for the refcount
> structure which overlap with existing but according to the on-disk
> refcounts (which are assumed to be wrong to begin with) unallocated
> clusters.
>
> This series fixes this by completely recreating the refcount structure
> based on the in-memory information calculated during the check operation
> if the possibility of damaging the image while repairing the refcount
> structures in-place exists.
>
>
> v5:
> - Added patch 1 which adds two helper variables to BDRVQcowState
>    reflecting the number of entries per refcount block;
>    in contrast to v4, we don't need to clamp the refcount order against
>    sub-byte widths, because sub-byte widths are actually correct (that
>    means, I dropped the MAX() around refcount_order - 3)
> - Patch 8 (prev. 7):
>    - Use these new variables [Benoît]
>    - Use a struct for rt_offset_and_clusters [Benoît]
>
>
> git-backport-diff against v4:
>
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
>
> 001/11:[down] 'qcow2: Calculate refcount block entry count'
> 002/11:[----] [--] 'qcow2: Fix leaks in dirty images'
> 003/11:[----] [--] 'qcow2: Split qcow2_check_refcounts()'
> 004/11:[----] [--] 'qcow2: Pull check_refblocks() up'
> 005/11:[----] [--] 'qcow2: Reuse refcount table in calculate_refcounts()'
> 006/11:[----] [--] 'qcow2: Fix refcount blocks beyond image end'
> 007/11:[----] [--] 'qcow2: Do not perform potentially damaging repairs'
> 008/11:[0025] [FC] 'qcow2: Rebuild refcount structure during check'
> 009/11:[----] [--] 'qcow2: Clean up after refcount rebuild'
> 010/11:[----] [--] 'iotests: Fix test outputs'
> 011/11:[----] [-C] 'iotests: Add test for potentially damaging repairs'
>
>
> Max Reitz (11):
>    qcow2: Calculate refcount block entry count
>    qcow2: Fix leaks in dirty images
>    qcow2: Split qcow2_check_refcounts()
>    qcow2: Pull check_refblocks() up
>    qcow2: Reuse refcount table in calculate_refcounts()
>    qcow2: Fix refcount blocks beyond image end
>    qcow2: Do not perform potentially damaging repairs
>    qcow2: Rebuild refcount structure during check
>    qcow2: Clean up after refcount rebuild
>    iotests: Fix test outputs
>    iotests: Add test for potentially damaging repairs
>
>   block/qcow2-refcount.c     | 677 ++++++++++++++++++++++++++++++++-------------
>   block/qcow2.c              |   4 +-
>   block/qcow2.h              |   2 +
>   tests/qemu-iotests/039.out |  10 +-
>   tests/qemu-iotests/060.out |  10 +-
>   tests/qemu-iotests/061.out |  18 +-
>   tests/qemu-iotests/104     | 141 ++++++++++
>   tests/qemu-iotests/104.out | 110 ++++++++
>   tests/qemu-iotests/group   |   1 +
>   9 files changed, 767 insertions(+), 206 deletions(-)
>   create mode 100755 tests/qemu-iotests/104
>   create mode 100644 tests/qemu-iotests/104.out
>

Ping. Patch 8 still requires a review. I know it's probably the hardest 
patch of the series, but well...

Max

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2014-10-22  8:27 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-08-29 21:40 [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz
2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 01/11] qcow2: Calculate refcount block entry count Max Reitz
2014-08-29 23:03   ` Eric Blake
2014-09-02 18:56     ` Max Reitz
2014-10-10 12:29   ` Benoît Canet
2014-10-11 10:27     ` Max Reitz
2014-10-20 14:25   ` Kevin Wolf
2014-10-20 14:39     ` Max Reitz
2014-10-20 14:48       ` Kevin Wolf
2014-10-21 16:26         ` Max Reitz
2014-10-22  8:27           ` Kevin Wolf
2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 02/11] qcow2: Fix leaks in dirty images Max Reitz
2014-10-20 14:28   ` Kevin Wolf
2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
2014-10-20 14:45   ` Kevin Wolf
2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 04/11] qcow2: Pull check_refblocks() up Max Reitz
2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
2014-08-29 21:40 ` [Qemu-devel] [PATCH v5 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check Max Reitz
2014-10-08 23:09   ` Eric Blake
2014-10-11 10:17     ` Max Reitz
2014-10-16 15:27     ` Max Reitz
2014-10-16 15:33       ` Eric Blake
2014-10-10 12:44   ` Benoît Canet
2014-10-11 10:27     ` Max Reitz
2014-10-11 18:56   ` Benoît Canet
2014-10-12  7:32     ` Max Reitz
2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 09/11] qcow2: Clean up after refcount rebuild Max Reitz
2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 10/11] iotests: Fix test outputs Max Reitz
2014-09-02 19:49   ` Eric Blake
2014-08-29 21:41 ` [Qemu-devel] [PATCH v5 11/11] iotests: Add test for potentially damaging repairs Max Reitz
2014-09-02 19:52   ` Eric Blake
2014-10-08 19:25 ` [Qemu-devel] [PATCH v5 00/11] qcow2: Fix image repairing Max Reitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).