qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status
@ 2024-01-16 15:48 Fiona Ebner
  2024-01-16 22:06 ` Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Fiona Ebner @ 2024-01-16 15:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha

Using fleecing backup like in [0] on a qcow2 image (with metadata
preallocation) can lead to the following assertion failure:

> bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.

In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
will be set by the qcow2 driver, so the caller will recursively check
the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
chain, in bdrv_co_do_block_status() for the snapshot-access driver,
the assertion failure will happen, because both flags are set.

To fix it, clear the recurse flag after the recursive check was done.

In detail:

> #0  qcow2_co_block_status

Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID.

> #1  bdrv_co_do_block_status

Because of the data flag, bdrv_co_do_block_status() will now also set
BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
bdrv_co_do_block_status() for the bdrv_file child will be called,
which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.

Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.

> #2  bdrv_co_common_block_status_above
> #3  bdrv_co_block_status_above
> #4  bdrv_co_block_status
> #5  cbw_co_snapshot_block_status
> #6  bdrv_co_snapshot_block_status
> #7  snapshot_access_co_block_status
> #8  bdrv_co_do_block_status

Return value is propagated all the way up to here, where the assertion
failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
both set.

> #9  bdrv_co_common_block_status_above
> #10 bdrv_co_block_status_above
> #11 block_copy_block_status
> #12 block_copy_dirty_clusters
> #13 block_copy_common
> #14 block_copy_async_co_entry
> #15 coroutine_trampoline

[0]:

> #!/bin/bash
> rm /tmp/disk.qcow2
> ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> ./qemu-system-x86_64 --qmp stdio \
> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> <<EOF
> {"execute": "qmp_capabilities"}
> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node1", "sync": "full", "job-id": "backup0" } }
> EOF

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

I'm new to this part of the code, so I'm not sure if it is actually
safe to clear the flag? Intuitively, I'd expect it to be only relevant
until it was acted upon, but no clue.

 block/io.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/block/io.c b/block/io.c
index 8fa7670571..33150c0359 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2584,6 +2584,16 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
                 ret |= (ret2 & BDRV_BLOCK_ZERO);
             }
         }
+
+        /*
+         * Now that the recursive search was done, clear the flag. Otherwise,
+         * with more complicated block graphs like snapshot-access ->
+         * copy-before-write -> qcow2, where the return value will be propagated
+         * further up to a parent bdrv_co_do_block_status() call, both the
+         * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which is
+         * not allowed.
+         */
+        ret &= ~BDRV_BLOCK_RECURSE;
     }
 
 out:
-- 
2.39.2




^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status
  2024-01-16 15:48 [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status Fiona Ebner
@ 2024-01-16 22:06 ` Stefan Hajnoczi
  2024-01-17 12:42 ` Vladimir Sementsov-Ogievskiy
  2024-01-17 14:32 ` Stefan Hajnoczi
  2 siblings, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2024-01-16 22:06 UTC (permalink / raw)
  To: Fiona Ebner
  Cc: qemu-devel, qemu-block, hreitz, kwolf, fam,
	Vladimir Sementsov-Ogievskiy

[-- Attachment #1: Type: text/plain, Size: 4341 bytes --]

On Tue, Jan 16, 2024 at 04:48:39PM +0100, Fiona Ebner wrote:
> Using fleecing backup like in [0] on a qcow2 image (with metadata
> preallocation) can lead to the following assertion failure:
> 
> > bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.
> 
> In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
> will be set by the qcow2 driver, so the caller will recursively check
> the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
> chain, in bdrv_co_do_block_status() for the snapshot-access driver,
> the assertion failure will happen, because both flags are set.
> 
> To fix it, clear the recurse flag after the recursive check was done.

CCing Vladimir, who introduced the BDRV_BLOCK_RECURSE flag in commit
69f47505ee66 ("block: avoid recursive block_status call if possible").

> 
> In detail:
> 
> > #0  qcow2_co_block_status
> 
> Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
> BDRV_BLOCK_OFFSET_VALID.
> 
> > #1  bdrv_co_do_block_status
> 
> Because of the data flag, bdrv_co_do_block_status() will now also set
> BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
> bdrv_co_do_block_status() for the bdrv_file child will be called,
> which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
> BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.
> 
> Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
> BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
> 
> > #2  bdrv_co_common_block_status_above
> > #3  bdrv_co_block_status_above
> > #4  bdrv_co_block_status
> > #5  cbw_co_snapshot_block_status
> > #6  bdrv_co_snapshot_block_status
> > #7  snapshot_access_co_block_status
> > #8  bdrv_co_do_block_status
> 
> Return value is propagated all the way up to here, where the assertion
> failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
> both set.
> 
> > #9  bdrv_co_common_block_status_above
> > #10 bdrv_co_block_status_above
> > #11 block_copy_block_status
> > #12 block_copy_dirty_clusters
> > #13 block_copy_common
> > #14 block_copy_async_co_entry
> > #15 coroutine_trampoline
> 
> [0]:
> 
> > #!/bin/bash
> > rm /tmp/disk.qcow2
> > ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> > ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> > ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> > ./qemu-system-x86_64 --qmp stdio \
> > --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> > --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> > --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> > <<EOF
> > {"execute": "qmp_capabilities"}
> > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
> > {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node1", "sync": "full", "job-id": "backup0" } }
> > EOF
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> I'm new to this part of the code, so I'm not sure if it is actually
> safe to clear the flag? Intuitively, I'd expect it to be only relevant
> until it was acted upon, but no clue.
> 
>  block/io.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 8fa7670571..33150c0359 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2584,6 +2584,16 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
>                  ret |= (ret2 & BDRV_BLOCK_ZERO);
>              }
>          }
> +
> +        /*
> +         * Now that the recursive search was done, clear the flag. Otherwise,
> +         * with more complicated block graphs like snapshot-access ->
> +         * copy-before-write -> qcow2, where the return value will be propagated
> +         * further up to a parent bdrv_co_do_block_status() call, both the
> +         * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which is
> +         * not allowed.
> +         */
> +        ret &= ~BDRV_BLOCK_RECURSE;
>      }
>  
>  out:
> -- 
> 2.39.2
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status
  2024-01-16 15:48 [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status Fiona Ebner
  2024-01-16 22:06 ` Stefan Hajnoczi
@ 2024-01-17 12:42 ` Vladimir Sementsov-Ogievskiy
  2024-01-17 14:32 ` Stefan Hajnoczi
  2 siblings, 0 replies; 4+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2024-01-17 12:42 UTC (permalink / raw)
  To: Fiona Ebner, qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha

On 16.01.24 18:48, Fiona Ebner wrote:
> Using fleecing backup like in [0] on a qcow2 image (with metadata
> preallocation) can lead to the following assertion failure:
> 
>> bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.
> 
> In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
> will be set by the qcow2 driver, so the caller will recursively check
> the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
> chain, in bdrv_co_do_block_status() for the snapshot-access driver,
> the assertion failure will happen, because both flags are set.
> 
> To fix it, clear the recurse flag after the recursive check was done.
> 
> In detail:
> 
>> #0  qcow2_co_block_status
> 
> Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
> BDRV_BLOCK_OFFSET_VALID.
> 
>> #1  bdrv_co_do_block_status
> 
> Because of the data flag, bdrv_co_do_block_status() will now also set
> BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
> bdrv_co_do_block_status() for the bdrv_file child will be called,
> which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
> BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.
> 
> Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
> BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
> 
>> #2  bdrv_co_common_block_status_above
>> #3  bdrv_co_block_status_above
>> #4  bdrv_co_block_status
>> #5  cbw_co_snapshot_block_status
>> #6  bdrv_co_snapshot_block_status
>> #7  snapshot_access_co_block_status
>> #8  bdrv_co_do_block_status
> 
> Return value is propagated all the way up to here, where the assertion
> failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
> both set.
> 
>> #9  bdrv_co_common_block_status_above
>> #10 bdrv_co_block_status_above
>> #11 block_copy_block_status
>> #12 block_copy_dirty_clusters
>> #13 block_copy_common
>> #14 block_copy_async_co_entry
>> #15 coroutine_trampoline
> 
> [0]:
> 
>> #!/bin/bash
>> rm /tmp/disk.qcow2
>> ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
>> ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
>> ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
>> ./qemu-system-x86_64 --qmp stdio \
>> --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
>> --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
>> --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
>> <<EOF
>> {"execute": "qmp_capabilities"}
>> {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
>> {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
>> {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node1", "sync": "full", "job-id": "backup0" } }
>> EOF
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> I'm new to this part of the code, so I'm not sure if it is actually
> safe to clear the flag? Intuitively, I'd expect it to be only relevant
> until it was acted upon, but no clue.
> 
>   block/io.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
> 
> diff --git a/block/io.c b/block/io.c
> index 8fa7670571..33150c0359 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2584,6 +2584,16 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
>                   ret |= (ret2 & BDRV_BLOCK_ZERO);
>               }
>           }
> +
> +        /*
> +         * Now that the recursive search was done, clear the flag. Otherwise,
> +         * with more complicated block graphs like snapshot-access ->
> +         * copy-before-write -> qcow2, where the return value will be propagated
> +         * further up to a parent bdrv_co_do_block_status() call, both the
> +         * BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO flags would be set, which is
> +         * not allowed.
> +         */
> +        ret &= ~BDRV_BLOCK_RECURSE;
>       }
>   
>   out:

I agree. I think that's a correct fix:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status
  2024-01-16 15:48 [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status Fiona Ebner
  2024-01-16 22:06 ` Stefan Hajnoczi
  2024-01-17 12:42 ` Vladimir Sementsov-Ogievskiy
@ 2024-01-17 14:32 ` Stefan Hajnoczi
  2 siblings, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2024-01-17 14:32 UTC (permalink / raw)
  To: Fiona Ebner; +Cc: qemu-devel, qemu-block, hreitz, kwolf, fam

[-- Attachment #1: Type: text/plain, Size: 3438 bytes --]

On Tue, Jan 16, 2024 at 04:48:39PM +0100, Fiona Ebner wrote:
> Using fleecing backup like in [0] on a qcow2 image (with metadata
> preallocation) can lead to the following assertion failure:
> 
> > bdrv_co_do_block_status: Assertion `!(ret & BDRV_BLOCK_ZERO)' failed.
> 
> In the reproducer [0], it happens because the BDRV_BLOCK_RECURSE flag
> will be set by the qcow2 driver, so the caller will recursively check
> the file child. Then the BDRV_BLOCK_ZERO set too. Later up the call
> chain, in bdrv_co_do_block_status() for the snapshot-access driver,
> the assertion failure will happen, because both flags are set.
> 
> To fix it, clear the recurse flag after the recursive check was done.
> 
> In detail:
> 
> > #0  qcow2_co_block_status
> 
> Returns 0x45 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
> BDRV_BLOCK_OFFSET_VALID.
> 
> > #1  bdrv_co_do_block_status
> 
> Because of the data flag, bdrv_co_do_block_status() will now also set
> BDRV_BLOCK_ALLOCATED. Because of the recurse flag,
> bdrv_co_do_block_status() for the bdrv_file child will be called,
> which returns 0x16 = BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
> BDRV_BLOCK_ZERO. Now the return value inherits the zero flag.
> 
> Returns 0x57 = BDRV_BLOCK_RECURSE | BDRV_BLOCK_DATA |
> BDRV_BLOCK_OFFSET_VALID | BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
> 
> > #2  bdrv_co_common_block_status_above
> > #3  bdrv_co_block_status_above
> > #4  bdrv_co_block_status
> > #5  cbw_co_snapshot_block_status
> > #6  bdrv_co_snapshot_block_status
> > #7  snapshot_access_co_block_status
> > #8  bdrv_co_do_block_status
> 
> Return value is propagated all the way up to here, where the assertion
> failure happens, because BDRV_BLOCK_RECURSE and BDRV_BLOCK_ZERO are
> both set.
> 
> > #9  bdrv_co_common_block_status_above
> > #10 bdrv_co_block_status_above
> > #11 block_copy_block_status
> > #12 block_copy_dirty_clusters
> > #13 block_copy_common
> > #14 block_copy_async_co_entry
> > #15 coroutine_trampoline
> 
> [0]:
> 
> > #!/bin/bash
> > rm /tmp/disk.qcow2
> > ./qemu-img create /tmp/disk.qcow2 -o preallocation=metadata -f qcow2 1G
> > ./qemu-img create /tmp/fleecing.qcow2 -f qcow2 1G
> > ./qemu-img create /tmp/backup.qcow2 -f qcow2 1G
> > ./qemu-system-x86_64 --qmp stdio \
> > --blockdev qcow2,node-name=node0,file.driver=file,file.filename=/tmp/disk.qcow2 \
> > --blockdev qcow2,node-name=node1,file.driver=file,file.filename=/tmp/fleecing.qcow2 \
> > --blockdev qcow2,node-name=node2,file.driver=file,file.filename=/tmp/backup.qcow2 \
> > <<EOF
> > {"execute": "qmp_capabilities"}
> > {"execute": "blockdev-add", "arguments": { "driver": "copy-before-write", "file": "node0", "target": "node1", "node-name": "node3" } }
> > {"execute": "blockdev-add", "arguments": { "driver": "snapshot-access", "file": "node3", "node-name": "snap0" } }
> > {"execute": "blockdev-backup", "arguments": { "device": "snap0", "target": "node1", "sync": "full", "job-id": "backup0" } }
> > EOF
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> I'm new to this part of the code, so I'm not sure if it is actually
> safe to clear the flag? Intuitively, I'd expect it to be only relevant
> until it was acted upon, but no clue.
> 
>  block/io.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)

Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2024-01-17 14:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-16 15:48 [PATCH] block/io: clear BDRV_BLOCK_RECURSE flag after recursing in bdrv_co_block_status Fiona Ebner
2024-01-16 22:06 ` Stefan Hajnoczi
2024-01-17 12:42 ` Vladimir Sementsov-Ogievskiy
2024-01-17 14:32 ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).