[Qemu-devel] [PATCH 0/2] qcow2: Undo leaked allocations in co

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH 0/2] qcow2: Undo leaked allocations in co_writev
@ 2013-10-10  8:52 Max Reitz
  2013-10-10  8:52 ` [Qemu-devel] [PATCH 1/2] " Max Reitz
  2013-10-10  8:52 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: Extend test 026 Max Reitz
  0 siblings, 2 replies; 8+ messages in thread
From: Max Reitz @ 2013-10-10  8:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

If a write request on a qcow2 image spans more than one L2 table,
qcow2_alloc_cluster_offset cannot allocate the required clusters in a
single operation. This results in leaks, if a subsequent (atomic)
allocation in that function fails, because qcow2_co_writev does not undo
unused cluster allocations.

This series implements that deallocation and provides a test for it.

Max Reitz (2):
  qcow2: Undo leaked allocations in co_writev
  qemu-iotests: Extend test 026

 block/qcow2.c                      |  7 +++++++
 tests/qemu-iotests/026             | 31 +++++++++++++++++++++++++++++++
 tests/qemu-iotests/026.out         |  8 ++++++++
 tests/qemu-iotests/026.out.nocache |  8 ++++++++
 4 files changed, 54 insertions(+)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 1/2] qcow2: Undo leaked allocations in co_writev
  2013-10-10  8:52 [Qemu-devel] [PATCH 0/2] qcow2: Undo leaked allocations in co_writev Max Reitz
@ 2013-10-10  8:52 ` Max Reitz
  2013-10-10 12:26   ` Kevin Wolf
  2013-10-10  8:52 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: Extend test 026 Max Reitz
  1 sibling, 1 reply; 8+ messages in thread
From: Max Reitz @ 2013-10-10  8:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

If the write request spans more than one L2 table,
qcow2_alloc_cluster_offset cannot handle the required allocations
atomically. This results in leaks if it allocated new clusters in any
but the last L2 table touched and an error occurs in qcow2_co_writev
before having established the L2 link. These non-atomic allocations
were, however, indeed successful and are therefore given to the caller
in the L2Meta list.

If an error occurs in qcow2_co_writev and the L2Meta list is unwound,
all its remaining entries are clusters whose L2 links were not yet
established. Thus, all allocations in that list should be undone.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/block/qcow2.c b/block/qcow2.c
index b2489fb..6bedd5d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1017,6 +1017,13 @@ fail:
     while (l2meta != NULL) {
         QCowL2Meta *next;

+        /* Undo all leaked allocations */
+        if (l2meta->nb_clusters != 0) {
+            qcow2_free_clusters(bs, l2meta->alloc_offset,
+                                l2meta->nb_clusters << s->cluster_bits,
+                                QCOW2_DISCARD_ALWAYS);
+        }
+
         if (l2meta->nb_clusters != 0) {
             QLIST_REMOVE(l2meta, next_in_flight);
         }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qcow2: Undo leaked allocations in co_writev
  2013-10-10  8:52 ` [Qemu-devel] [PATCH 1/2] " Max Reitz
@ 2013-10-10 12:26   ` Kevin Wolf
  2013-10-10 12:54     ` Max Reitz
  2013-10-11  9:15     ` Stefan Hajnoczi
  0 siblings, 2 replies; 8+ messages in thread
From: Kevin Wolf @ 2013-10-10 12:26 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi

Am 10.10.2013 um 10:52 hat Max Reitz geschrieben:
> If the write request spans more than one L2 table,
> qcow2_alloc_cluster_offset cannot handle the required allocations
> atomically. This results in leaks if it allocated new clusters in any
> but the last L2 table touched and an error occurs in qcow2_co_writev
> before having established the L2 link. These non-atomic allocations
> were, however, indeed successful and are therefore given to the caller
> in the L2Meta list.
> 
> If an error occurs in qcow2_co_writev and the L2Meta list is unwound,
> all its remaining entries are clusters whose L2 links were not yet
> established. Thus, all allocations in that list should be undone.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index b2489fb..6bedd5d 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1017,6 +1017,13 @@ fail:
>      while (l2meta != NULL) {
>          QCowL2Meta *next;
>  
> +        /* Undo all leaked allocations */
> +        if (l2meta->nb_clusters != 0) {
> +            qcow2_free_clusters(bs, l2meta->alloc_offset,
> +                                l2meta->nb_clusters << s->cluster_bits,
> +                                QCOW2_DISCARD_ALWAYS);
> +        }
> +
>          if (l2meta->nb_clusters != 0) {
>              QLIST_REMOVE(l2meta, next_in_flight);
>          }

This feels a bit risky.

I think currently it does work, because qcow2_alloc_cluster_link_l2()
can only return an error when it didn't update the L2 entry in the cache
yet, but adding any error condition between that point and the L2Meta
unwinding would result in corruption. I'm unsure, but perhaps a cluster
leak is the lesser evil. Did you consider this? Do other people have an
opinion on it?

Also, shouldn't it be QCOW2_DISCARD_OTHER?

Kevin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qcow2: Undo leaked allocations in co_writev
  2013-10-10 12:26   ` Kevin Wolf
@ 2013-10-10 12:54     ` Max Reitz
  2013-10-10 13:57       ` Kevin Wolf
  2013-10-11  9:15     ` Stefan Hajnoczi
  1 sibling, 1 reply; 8+ messages in thread
From: Max Reitz @ 2013-10-10 12:54 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi

On 2013-10-10 14:26, Kevin Wolf wrote:
> Am 10.10.2013 um 10:52 hat Max Reitz geschrieben:
>> If the write request spans more than one L2 table,
>> qcow2_alloc_cluster_offset cannot handle the required allocations
>> atomically. This results in leaks if it allocated new clusters in any
>> but the last L2 table touched and an error occurs in qcow2_co_writev
>> before having established the L2 link. These non-atomic allocations
>> were, however, indeed successful and are therefore given to the caller
>> in the L2Meta list.
>>
>> If an error occurs in qcow2_co_writev and the L2Meta list is unwound,
>> all its remaining entries are clusters whose L2 links were not yet
>> established. Thus, all allocations in that list should be undone.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index b2489fb..6bedd5d 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1017,6 +1017,13 @@ fail:
>>       while (l2meta != NULL) {
>>           QCowL2Meta *next;
>>   
>> +        /* Undo all leaked allocations */
>> +        if (l2meta->nb_clusters != 0) {
>> +            qcow2_free_clusters(bs, l2meta->alloc_offset,
>> +                                l2meta->nb_clusters << s->cluster_bits,
>> +                                QCOW2_DISCARD_ALWAYS);
>> +        }
>> +
>>           if (l2meta->nb_clusters != 0) {
>>               QLIST_REMOVE(l2meta, next_in_flight);
>>           }
> This feels a bit risky.
>
> I think currently it does work, because qcow2_alloc_cluster_link_l2()
> can only return an error when it didn't update the L2 entry in the cache
> yet, but adding any error condition between that point and the L2Meta
> unwinding would result in corruption. I'm unsure, but perhaps a cluster
> leak is the lesser evil. Did you consider this? Do other people have an
> opinion on it?

What error conditions are there which can occur between 
qcow2_alloc_cluster_link_l2 and the L2Meta unwinding? If all 
qcow2_alloc_cluster_link_l2 calls are successful, the list is empty and 
the while loop either goes into another iteration or the function 
returns successfully (without any further need to unwind the list). If 
some call fails, all previous (successful) calls have already been 
removed from the list, therefore the unwinding only affects L2Meta 
request with failed calls to qcow2_alloc_cluster_link_l2 (or ones where 
that function wasn't called at all).

If the "currently" implied that this will turn out bad if there is a new 
error condition between a successful call to qcow2_alloc_cluster_link_l2 
and the removal of the L2Meta request from the list: Yes, that's true, 
of course. However, as you've said, currently, there is no such 
condition; and I don't see why it should be introduced. The sole purpose 
of the list seems to be (to me) to execute qcow2_alloc_cluster_link_l2 
on every of its elements. Thus, as soon as qcow2_alloc_cluster_link_l2 
is successful, the corresponding request should be removed from the list.

So, in case you do agree that it currently works fine, I would not 
consider it risky; if this patch is applied and some time in the future 
anything introduces a "goto fail" between qcow2_alloc_cluster_link_l2 
and l2_meta = next, this patch would simply have to make sure that 
qcow2_free_clusters isn't called in this case. In the probably very 
unlikely case all my previous assumptions and conclusions were true, I'd 
just add a comment in the qcow2_alloc_cluster_link_l2 loop informing 
about this case (“If you add a goto fail here, make sure to pay 
attention” or something along these lines).

> Also, shouldn't it be QCOW2_DISCARD_OTHER?

I'm always unsure about the discard flags. ;-)

I try to follow the rule of “use the specific type (or ‘other’) for 
freeing ‘out of the blue’, but use ‘always’ if it's just a very recent 
allocation that is being undone again”. I'd gladly accept better 
recommendations. ;-)

Max

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qcow2: Undo leaked allocations in co_writev
  2013-10-10 12:54     ` Max Reitz
@ 2013-10-10 13:57       ` Kevin Wolf
  2013-10-10 14:01         ` Max Reitz
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Wolf @ 2013-10-10 13:57 UTC (permalink / raw)
  To: Max Reitz; +Cc: qemu-devel, Stefan Hajnoczi

Am 10.10.2013 um 14:54 hat Max Reitz geschrieben:
> If the "currently" implied that this will turn out bad if there is a
> new error condition between a successful call to
> qcow2_alloc_cluster_link_l2 and the removal of the L2Meta request
> from the list: Yes, that's true, of course.

Yes, that's the scenario. It seems easy to miss the error handling path
when reviewing a change to this code.

> However, as you've said,
> currently, there is no such condition; and I don't see why it should
> be introduced. The sole purpose of the list seems to be (to me) to
> execute qcow2_alloc_cluster_link_l2 on every of its elements. Thus,
> as soon as qcow2_alloc_cluster_link_l2 is successful, the
> corresponding request should be removed from the list.

If anything isn't complex enough in qcow2, think about how things will
turn out with Delayed COW and chances are that it does become complex.

For example, you can then have other requests running in parallel which
use the newly allocated cluster. You may decrease the refcount only
after the last of them has completed. This is just the first case that
comes to mind, I'm sure there's more fun to be had.

> So, in case you do agree that it currently works fine, I would not
> consider it risky; if this patch is applied and some time in the
> future anything introduces a "goto fail" between
> qcow2_alloc_cluster_link_l2 and l2_meta = next, this patch would
> simply have to make sure that qcow2_free_clusters isn't called in
> this case. In the probably very unlikely case all my previous
> assumptions and conclusions were true, I'd just add a comment in the
> qcow2_alloc_cluster_link_l2 loop informing about this case (“If you
> add a goto fail here, make sure to pay attention” or something along
> these lines).

Adding a comment there sounds like a fair compromise.

> >Also, shouldn't it be QCOW2_DISCARD_OTHER?
> 
> I'm always unsure about the discard flags. ;-)
> 
> I try to follow the rule of “use the specific type (or ‘other’) for
> freeing ‘out of the blue’, but use ‘always’ if it's just a very
> recent allocation that is being undone again”. I'd gladly accept
> better recommendations. ;-)

To be honest, I'm not sure if there are any legitimate use cases for
'always'... Discard is a slow operation, so unless there's a specific
reason anyway, I'd default to 'other' (or a specific type, of course).

Kevin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qcow2: Undo leaked allocations in co_writev
  2013-10-10 13:57       ` Kevin Wolf
@ 2013-10-10 14:01         ` Max Reitz
  0 siblings, 0 replies; 8+ messages in thread
From: Max Reitz @ 2013-10-10 14:01 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi

On 2013-10-10 15:57, Kevin Wolf wrote:
> Am 10.10.2013 um 14:54 hat Max Reitz geschrieben:
>> If the "currently" implied that this will turn out bad if there is a
>> new error condition between a successful call to
>> qcow2_alloc_cluster_link_l2 and the removal of the L2Meta request
>> from the list: Yes, that's true, of course.
> Yes, that's the scenario. It seems easy to miss the error handling path
> when reviewing a change to this code.
>
>> However, as you've said,
>> currently, there is no such condition; and I don't see why it should
>> be introduced. The sole purpose of the list seems to be (to me) to
>> execute qcow2_alloc_cluster_link_l2 on every of its elements. Thus,
>> as soon as qcow2_alloc_cluster_link_l2 is successful, the
>> corresponding request should be removed from the list.
> If anything isn't complex enough in qcow2, think about how things will
> turn out with Delayed COW and chances are that it does become complex.
>
> For example, you can then have other requests running in parallel which
> use the newly allocated cluster. You may decrease the refcount only
> after the last of them has completed. This is just the first case that
> comes to mind, I'm sure there's more fun to be had.

Well, yes, I somehow thought of something like this; but I thought 
that'd be the problem of qcow2_alloc_cluster_link_l2 and as soon as that 
function has been called, though maybe we cannot remove it from the 
requests in flight (global for the BDS), but we still should remove it 
from the local l2meta list.

>> So, in case you do agree that it currently works fine, I would not
>> consider it risky; if this patch is applied and some time in the
>> future anything introduces a "goto fail" between
>> qcow2_alloc_cluster_link_l2 and l2_meta = next, this patch would
>> simply have to make sure that qcow2_free_clusters isn't called in
>> this case. In the probably very unlikely case all my previous
>> assumptions and conclusions were true, I'd just add a comment in the
>> qcow2_alloc_cluster_link_l2 loop informing about this case (“If you
>> add a goto fail here, make sure to pay attention” or something along
>> these lines).
> Adding a comment there sounds like a fair compromise.

Okay, I'll add it.

>>> Also, shouldn't it be QCOW2_DISCARD_OTHER?
>> I'm always unsure about the discard flags. ;-)
>>
>> I try to follow the rule of “use the specific type (or ‘other’) for
>> freeing ‘out of the blue’, but use ‘always’ if it's just a very
>> recent allocation that is being undone again”. I'd gladly accept
>> better recommendations. ;-)
> To be honest, I'm not sure if there are any legitimate use cases for
> 'always'... Discard is a slow operation, so unless there's a specific
> reason anyway, I'd default to 'other' (or a specific type, of course).

Seems easy enough to remember, thanks. ;-)

Max

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] qcow2: Undo leaked allocations in co_writev
  2013-10-10 12:26   ` Kevin Wolf
  2013-10-10 12:54     ` Max Reitz
@ 2013-10-11  9:15     ` Stefan Hajnoczi
  1 sibling, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2013-10-11  9:15 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi, Max Reitz

On Thu, Oct 10, 2013 at 02:26:25PM +0200, Kevin Wolf wrote:
> Am 10.10.2013 um 10:52 hat Max Reitz geschrieben:
> > If the write request spans more than one L2 table,
> > qcow2_alloc_cluster_offset cannot handle the required allocations
> > atomically. This results in leaks if it allocated new clusters in any
> > but the last L2 table touched and an error occurs in qcow2_co_writev
> > before having established the L2 link. These non-atomic allocations
> > were, however, indeed successful and are therefore given to the caller
> > in the L2Meta list.
> > 
> > If an error occurs in qcow2_co_writev and the L2Meta list is unwound,
> > all its remaining entries are clusters whose L2 links were not yet
> > established. Thus, all allocations in that list should be undone.
> > 
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > ---
> >  block/qcow2.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index b2489fb..6bedd5d 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -1017,6 +1017,13 @@ fail:
> >      while (l2meta != NULL) {
> >          QCowL2Meta *next;
> >  
> > +        /* Undo all leaked allocations */
> > +        if (l2meta->nb_clusters != 0) {
> > +            qcow2_free_clusters(bs, l2meta->alloc_offset,
> > +                                l2meta->nb_clusters << s->cluster_bits,
> > +                                QCOW2_DISCARD_ALWAYS);
> > +        }
> > +
> >          if (l2meta->nb_clusters != 0) {
> >              QLIST_REMOVE(l2meta, next_in_flight);
> >          }
> 
> This feels a bit risky.
> 
> I think currently it does work, because qcow2_alloc_cluster_link_l2()
> can only return an error when it didn't update the L2 entry in the cache
> yet, but adding any error condition between that point and the L2Meta
> unwinding would result in corruption. I'm unsure, but perhaps a cluster
> leak is the lesser evil. Did you consider this? Do other people have an
> opinion on it?

I agree it's easy to make things worse by trying to free clusters in an
error path.  I find these types of changes a lot of effort (to write and
review properly) for relatively little gain.

Stefan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 2/2] qemu-iotests: Extend test 026
  2013-10-10  8:52 [Qemu-devel] [PATCH 0/2] qcow2: Undo leaked allocations in co_writev Max Reitz
  2013-10-10  8:52 ` [Qemu-devel] [PATCH 1/2] " Max Reitz
@ 2013-10-10  8:52 ` Max Reitz
  1 sibling, 0 replies; 8+ messages in thread
From: Max Reitz @ 2013-10-10  8:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi, Max Reitz

Extend test case 026 by an aio_write fail test, which should not result
in any leaked clusters.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 tests/qemu-iotests/026             | 31 +++++++++++++++++++++++++++++++
 tests/qemu-iotests/026.out         |  8 ++++++++
 tests/qemu-iotests/026.out.nocache |  8 ++++++++
 3 files changed, 47 insertions(+)

diff --git a/tests/qemu-iotests/026 b/tests/qemu-iotests/026
index ebe29d0..a9dfe36 100755
--- a/tests/qemu-iotests/026
+++ b/tests/qemu-iotests/026
@@ -193,6 +193,37 @@ done
 done
 done
 
+echo
+echo === Write leak test ===
+echo
+CLUSTER_SIZE=512
+
+for event in write_aio; do
+for errno in 28; do
+for imm in off; do
+for once in on; do
+
+cat > "$TEST_DIR/blkdebug.conf" <<EOF
+[inject-error]
+event = "$event"
+errno = "$errno"
+immediately = "$imm"
+once = "$once"
+EOF
+
+_make_test_img 1G
+
+echo
+echo "Event: $event; errno: $errno; imm: $imm; once: $once"
+$QEMU_IO -c "write 0 128k" "$BLKDBG_TEST_IMG" | _filter_qemu_io
+
+_check_test_img 2>&1 | grep -v "refcount=1 reference=0"
+
+done
+done
+done
+done
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/026.out b/tests/qemu-iotests/026.out
index 1504579..c94daca 100644
--- a/tests/qemu-iotests/026.out
+++ b/tests/qemu-iotests/026.out
@@ -599,4 +599,12 @@ write failed: No space left on device
 
 96 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
+
+=== Write leak test ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
+
+Event: write_aio; errno: 28; imm: off; once: on
+write failed: No space left on device
+No errors were found on the image.
 *** done
diff --git a/tests/qemu-iotests/026.out.nocache b/tests/qemu-iotests/026.out.nocache
index c9d242e..962bb71 100644
--- a/tests/qemu-iotests/026.out.nocache
+++ b/tests/qemu-iotests/026.out.nocache
@@ -607,4 +607,12 @@ write failed: No space left on device
 
 96 leaked clusters were found on the image.
 This means waste of disk space, but no harm to data.
+
+=== Write leak test ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1073741824 
+
+Event: write_aio; errno: 28; imm: off; once: on
+write failed: No space left on device
+No errors were found on the image.
 *** done
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2013-10-11  9:16 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-10  8:52 [Qemu-devel] [PATCH 0/2] qcow2: Undo leaked allocations in co_writev Max Reitz
2013-10-10  8:52 ` [Qemu-devel] [PATCH 1/2] " Max Reitz
2013-10-10 12:26   ` Kevin Wolf
2013-10-10 12:54     ` Max Reitz
2013-10-10 13:57       ` Kevin Wolf
2013-10-10 14:01         ` Max Reitz
2013-10-11  9:15     ` Stefan Hajnoczi
2013-10-10  8:52 ` [Qemu-devel] [PATCH 2/2] qemu-iotests: Extend test 026 Max Reitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).