* [PATCH v2 1/2] qcow2: handle discard-no-unref in measure
@ 2024-06-05 13:25 Jean-Louis Dupond
2024-06-05 13:25 ` [PATCH v2 2/2] qcow2: don't allow discard-no-unref when discard is not enabled Jean-Louis Dupond
2024-07-10 12:58 ` [PATCH v2 1/2] qcow2: handle discard-no-unref in measure Hanna Czenczek
0 siblings, 2 replies; 7+ messages in thread
From: Jean-Louis Dupond @ 2024-06-05 13:25 UTC (permalink / raw)
To: qemu-devel, kwolf, hreitz, andrey.drobyshev; +Cc: Jean-Louis Dupond
When doing a measure on an image with a backing file and
discard-no-unref is enabled, the code should take this into account.
If for example you have a snapshot image with a base, and you do a
discard within the snapshot, it will be ZERO and ALLOCATED, but without
host offset.
Now if we commit this snapshot, and the clusters in the base image have
a host offset, the clusters will only be set to ZERO, but the host offset
will not be cleared.
Therefor non-data clusters in the top image need to check the
base to see if space will be freed or not, to have a correct measure
output.
Bug-Url: https://gitlab.com/qemu-project/qemu/-/issues/2369
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
---
block/qcow2.c | 32 +++++++++++++++++++++++++++++---
1 file changed, 29 insertions(+), 3 deletions(-)
diff --git a/block/qcow2.c b/block/qcow2.c
index 956128b409..50354e5b98 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5163,9 +5163,16 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
} else {
int64_t offset;
int64_t pnum = 0;
+ BlockDriverState *parent = bdrv_filter_or_cow_bs(in_bs);
+ BDRVQcow2State *s = NULL;
+
+ if (parent) {
+ s = parent->opaque;
+ }
for (offset = 0; offset < ssize; offset += pnum) {
int ret;
+ int retp = 0;
ret = bdrv_block_status_above(in_bs, NULL, offset,
ssize - offset, &pnum, NULL,
@@ -5176,10 +5183,29 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
goto err;
}
- if (ret & BDRV_BLOCK_ZERO) {
+ /* If we have a parent in the chain and the current block is not data,
+ * then we want to check the allocation state of the parent block.
+ * If it has a valid offset, then we want to include it into
+ * the calculation, cause blocks with an offset will not be freed when
+ * committing the top into base with discard-no-unref enabled.
+ */
+ if (parent && s->discard_no_unref && !(ret & BDRV_BLOCK_DATA)) {
+ int64_t pnum_parent = 0;
+ retp = bdrv_block_status_above(parent, NULL, offset,
+ ssize - offset, &pnum_parent, NULL,
+ NULL);
+ /* If the parent continuous block is smaller, use that pnum,
+ * so the next iteration starts with the smallest offset.
+ */
+ if (pnum_parent < pnum) {
+ pnum = pnum_parent;
+ }
+ }
+ if (ret & BDRV_BLOCK_ZERO && !parent && !(parent && s->discard_no_unref)) {
/* Skip zero regions (safe with no backing file) */
- } else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
- (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
+ } else if (((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
+ (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ||
+ (retp & BDRV_BLOCK_OFFSET_VALID)) {
/* Extend pnum to end of cluster for next iteration */
pnum = ROUND_UP(offset + pnum, cluster_size) - offset;
--
2.45.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH v2 2/2] qcow2: don't allow discard-no-unref when discard is not enabled
2024-06-05 13:25 [PATCH v2 1/2] qcow2: handle discard-no-unref in measure Jean-Louis Dupond
@ 2024-06-05 13:25 ` Jean-Louis Dupond
2024-07-10 13:00 ` Hanna Czenczek
2024-07-10 12:58 ` [PATCH v2 1/2] qcow2: handle discard-no-unref in measure Hanna Czenczek
1 sibling, 1 reply; 7+ messages in thread
From: Jean-Louis Dupond @ 2024-06-05 13:25 UTC (permalink / raw)
To: qemu-devel, kwolf, hreitz, andrey.drobyshev; +Cc: Jean-Louis Dupond
When discard is not set to unmap/on, we should not allow setting
discard-no-unref.
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
---
block/qcow2.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/block/qcow2.c b/block/qcow2.c
index 50354e5b98..cead5479e4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1156,6 +1156,12 @@ qcow2_update_options_prepare(BlockDriverState *bs, Qcow2ReopenState *r,
ret = -EINVAL;
goto fail;
}
+ if (r->discard_no_unref && !(flags & BDRV_O_UNMAP)) {
+ error_setg(errp,
+ "discard-no-unref is only valid with discard=unmap/on");
+ ret = -EINVAL;
+ goto fail;
+ }
switch (s->crypt_method_header) {
case QCOW_CRYPT_NONE:
--
2.45.2
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] qcow2: handle discard-no-unref in measure
2024-06-05 13:25 [PATCH v2 1/2] qcow2: handle discard-no-unref in measure Jean-Louis Dupond
2024-06-05 13:25 ` [PATCH v2 2/2] qcow2: don't allow discard-no-unref when discard is not enabled Jean-Louis Dupond
@ 2024-07-10 12:58 ` Hanna Czenczek
2025-02-17 15:34 ` Jean-Louis Dupond
1 sibling, 1 reply; 7+ messages in thread
From: Hanna Czenczek @ 2024-07-10 12:58 UTC (permalink / raw)
To: Jean-Louis Dupond, qemu-devel, kwolf, andrey.drobyshev
On 05.06.24 15:25, Jean-Louis Dupond wrote:
> When doing a measure on an image with a backing file and
> discard-no-unref is enabled, the code should take this into account.
That doesn’t make sense to me. As far as I understand, 'measure' is
supposed to report how much space you need for a given image, i.e. if
you were to convert it to a new image. discard-no-unref doesn’t factor
into that, because for a 'convert' target (a new image), nothing can be
discarded.
Reading the issue, I understand that oVirt uses measure to determine the
size of the target of a 'commit' operation. Seems a bit like abuse to
me, precisely because of the issue you’re facing. More specifically, a
'commit' operation is a complex thing with a lot of variables, so the
outcome depends on a lot.
For example, this patch just checks the discard-no-unref setting on the
top image. But AFAIU it doesn’t matter what the setting on the top
image is, it matters what the setting on the commit target is. 'measure'
can’t know this because it doesn’t know what the commit target is. As
far as I can see, this patch actually assumes the commit target is the
first backing image (it specifically checks in the image whether a block
is allocated) – why?
So to me that means if 'measure' is supposed to give reliable data on
the commit case, it needs to be extended. Best thing I can come up with
off the top of my head would be to add an option e.g.
'commit=<target-node-name>', so we (A) that we’re looking at a commit
and not a convert, and (B) we know what data will be collapsed into
which image and where we need to check for discard-no-unref.
Hanna
> If for example you have a snapshot image with a base, and you do a
> discard within the snapshot, it will be ZERO and ALLOCATED, but without
> host offset.
> Now if we commit this snapshot, and the clusters in the base image have
> a host offset, the clusters will only be set to ZERO, but the host offset
> will not be cleared.
> Therefor non-data clusters in the top image need to check the
> base to see if space will be freed or not, to have a correct measure
> output.
>
> Bug-Url: https://gitlab.com/qemu-project/qemu/-/issues/2369
> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
> ---
> block/qcow2.c | 32 +++++++++++++++++++++++++++++---
> 1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 956128b409..50354e5b98 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -5163,9 +5163,16 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
> } else {
> int64_t offset;
> int64_t pnum = 0;
> + BlockDriverState *parent = bdrv_filter_or_cow_bs(in_bs);
> + BDRVQcow2State *s = NULL;
> +
> + if (parent) {
> + s = parent->opaque;
> + }
>
> for (offset = 0; offset < ssize; offset += pnum) {
> int ret;
> + int retp = 0;
>
> ret = bdrv_block_status_above(in_bs, NULL, offset,
> ssize - offset, &pnum, NULL,
> @@ -5176,10 +5183,29 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
> goto err;
> }
>
> - if (ret & BDRV_BLOCK_ZERO) {
> + /* If we have a parent in the chain and the current block is not data,
> + * then we want to check the allocation state of the parent block.
> + * If it has a valid offset, then we want to include it into
> + * the calculation, cause blocks with an offset will not be freed when
> + * committing the top into base with discard-no-unref enabled.
> + */
> + if (parent && s->discard_no_unref && !(ret & BDRV_BLOCK_DATA)) {
> + int64_t pnum_parent = 0;
> + retp = bdrv_block_status_above(parent, NULL, offset,
> + ssize - offset, &pnum_parent, NULL,
> + NULL);
> + /* If the parent continuous block is smaller, use that pnum,
> + * so the next iteration starts with the smallest offset.
> + */
> + if (pnum_parent < pnum) {
> + pnum = pnum_parent;
> + }
> + }
> + if (ret & BDRV_BLOCK_ZERO && !parent && !(parent && s->discard_no_unref)) {
> /* Skip zero regions (safe with no backing file) */
> - } else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
> - (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
> + } else if (((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
> + (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ||
> + (retp & BDRV_BLOCK_OFFSET_VALID)) {
> /* Extend pnum to end of cluster for next iteration */
> pnum = ROUND_UP(offset + pnum, cluster_size) - offset;
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 2/2] qcow2: don't allow discard-no-unref when discard is not enabled
2024-06-05 13:25 ` [PATCH v2 2/2] qcow2: don't allow discard-no-unref when discard is not enabled Jean-Louis Dupond
@ 2024-07-10 13:00 ` Hanna Czenczek
2025-02-17 15:35 ` Jean-Louis Dupond
0 siblings, 1 reply; 7+ messages in thread
From: Hanna Czenczek @ 2024-07-10 13:00 UTC (permalink / raw)
To: Jean-Louis Dupond, qemu-devel, kwolf, andrey.drobyshev
On 05.06.24 15:25, Jean-Louis Dupond wrote:
> When discard is not set to unmap/on, we should not allow setting
> discard-no-unref.
Is this important? Technically, it’s an incompatible change, and would
require a deprecation warning first.
(I can imagine people setting this option indiscriminately on all image,
whether discard actually plays a role or not. It may make sense for them.)
Hanna
> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
> ---
> block/qcow2.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 50354e5b98..cead5479e4 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -1156,6 +1156,12 @@ qcow2_update_options_prepare(BlockDriverState *bs, Qcow2ReopenState *r,
> ret = -EINVAL;
> goto fail;
> }
> + if (r->discard_no_unref && !(flags & BDRV_O_UNMAP)) {
> + error_setg(errp,
> + "discard-no-unref is only valid with discard=unmap/on");
> + ret = -EINVAL;
> + goto fail;
> + }
>
> switch (s->crypt_method_header) {
> case QCOW_CRYPT_NONE:
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] qcow2: handle discard-no-unref in measure
2024-07-10 12:58 ` [PATCH v2 1/2] qcow2: handle discard-no-unref in measure Hanna Czenczek
@ 2025-02-17 15:34 ` Jean-Louis Dupond
2025-03-27 9:16 ` Jean-Louis Dupond
0 siblings, 1 reply; 7+ messages in thread
From: Jean-Louis Dupond @ 2025-02-17 15:34 UTC (permalink / raw)
To: Hanna Czenczek, qemu-devel, kwolf, andrey.drobyshev
Hi,
First of all sorry for the huge delay, but didn't had time to follow-up
on this lately.
And it got some lower priority, as we don't hit it often and have a
fairly easy workaround (fill the empty blocks again in the snapshot by
writing data to the disk).
On 7/10/24 14:58, Hanna Czenczek wrote:
> On 05.06.24 15:25, Jean-Louis Dupond wrote:
>> When doing a measure on an image with a backing file and
>> discard-no-unref is enabled, the code should take this into account.
>
> That doesn’t make sense to me. As far as I understand, 'measure' is
> supposed to report how much space you need for a given image, i.e. if
> you were to convert it to a new image. discard-no-unref doesn’t
> factor into that, because for a 'convert' target (a new image),
> nothing can be discarded.
>
> Reading the issue, I understand that oVirt uses measure to determine
> the size of the target of a 'commit' operation. Seems a bit like
> abuse to me, precisely because of the issue you’re facing. More
> specifically, a 'commit' operation is a complex thing with a lot of
> variables, so the outcome depends on a lot.
Correct. oVirt uses the measure command to find out how big the
destination volume needs to be when running a commit/merge of 2 disks.
This way it can resize the container (Logical Volume here) to the
correct size in order to succeed the commit.
>
> For example, this patch just checks the discard-no-unref setting on
> the top image. But AFAIU it doesn’t matter what the setting on the
> top image is, it matters what the setting on the commit target is.
> 'measure' can’t know this because it doesn’t know what the commit
> target is. As far as I can see, this patch actually assumes the
> commit target is the first backing image (it specifically checks in
> the image whether a block is allocated) – why?
By default it would check the top image indeed, but not when using the
complex json parameters to qemu-img measure.
For example:
./build/qemu-img create -f qcow2 /tmp/test.qcow2 128M
./build/qemu-io -c 'open /tmp/test.qcow2' -c 'write 0 8M' -c 'write 56M
20M' -c 'write 10M 8M' -c 'write 24M 32M'
./build/qemu-img create -f qcow2 -b /tmp/test.qcow2 -F qcow2
/tmp/test_snap.qcow2
./build/qemu-io -c 'open -o discard=unmap,discard-no-unref=on
/tmp/test_snap.qcow2' -c 'write 16M 8M' -c 'discard 60M 20M' -c 'write
84M 10M'
The following commands will give the current output:
[jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json
-O qcow2 'json:{"file": {"driver": "file", "filename":
"/tmp/test_snap.qcow2"}, "driver": "qcow2", "discard":"unmap",
"discard-no-unref":true, "backing": {"driver": "qcow2",
"discard-no-unref":false, "file": {"driver": "file", "filename":
"/tmp/test.qcow2"}, "backing": null}}'
{
"bitmaps": 0,
"required": 71630848,
"fully-allocated": 134545408
}
[jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json
-O qcow2 /tmp/test_snap.qcow2
{
"bitmaps": 0,
"required": 71630848,
"fully-allocated": 134545408
}
[jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json
-O qcow2 'json:{"file": {"driver": "file", "filename":
"/tmp/test_snap.qcow2"}, "driver": "qcow2", "backing": {"driver":
"qcow2", "file": {"driver": "file", "filename": "/tmp/test.qcow2"},
"backing": null}}'
{
"bitmaps": 0,
"required": 71630848,
"fully-allocated": 134545408
}
Cause it will not take into account the discard-no-unref flag. And will
give the output like you have in the current version.
But when running measure with the following options:
./build/qemu-img measure --output json -O qcow2 'json:{"file":
{"driver": "file", "filename": "/tmp/test_snap.qcow2"}, "driver":
"qcow2", "discard":"unmap", "discard-no-unref":true, "backing":
{"driver": "qcow2", "discard-no-unref":true, "file": {"driver": "file",
"filename": "/tmp/test.qcow2"}, "backing": null}}'
It will give a bigger required size:
{
"bitmaps": 0,
"required": 88408064,
"fully-allocated": 134545408
}
Why? if a block has already been allocated (either with data or contains
an allocated ZERO block), we want to include its size in the calculation.
Because with discard-no-unref, an allocated block will not be reused for
some other cluster, so it's not available for data in the snapshot layer.
So if the cluster was not yet allocated in the destination image, a new
cluster will need to be allocated to fit the new data from the snapshot
layer.
>
> So to me that means if 'measure' is supposed to give reliable data on
> the commit case, it needs to be extended. Best thing I can come up
> with off the top of my head would be to add an option e.g.
> 'commit=<target-node-name>', so we (A) that we’re looking at a commit
> and not a convert, and (B) we know what data will be collapsed into
> which image and where we need to check for discard-no-unref.
I think that is what can be achieved by using the json argument. Cause
there we can specify the target with its flags.
And it's then the responsibility of oVirt (or whatever other tool), to
pass the correct flags.
>
> Hanna
Thanks for the review
Jean-Louis
>
>> If for example you have a snapshot image with a base, and you do a
>> discard within the snapshot, it will be ZERO and ALLOCATED, but without
>> host offset.
>> Now if we commit this snapshot, and the clusters in the base image have
>> a host offset, the clusters will only be set to ZERO, but the host
>> offset
>> will not be cleared.
>> Therefor non-data clusters in the top image need to check the
>> base to see if space will be freed or not, to have a correct measure
>> output.
>>
>> Bug-Url: https://gitlab.com/qemu-project/qemu/-/issues/2369
>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>> ---
>> block/qcow2.c | 32 +++++++++++++++++++++++++++++---
>> 1 file changed, 29 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 956128b409..50354e5b98 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -5163,9 +5163,16 @@ static BlockMeasureInfo
>> *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>> } else {
>> int64_t offset;
>> int64_t pnum = 0;
>> + BlockDriverState *parent = bdrv_filter_or_cow_bs(in_bs);
>> + BDRVQcow2State *s = NULL;
>> +
>> + if (parent) {
>> + s = parent->opaque;
>> + }
>> for (offset = 0; offset < ssize; offset += pnum) {
>> int ret;
>> + int retp = 0;
>> ret = bdrv_block_status_above(in_bs, NULL, offset,
>> ssize - offset,
>> &pnum, NULL,
>> @@ -5176,10 +5183,29 @@ static BlockMeasureInfo
>> *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>> goto err;
>> }
>> - if (ret & BDRV_BLOCK_ZERO) {
>> + /* If we have a parent in the chain and the current
>> block is not data,
>> + * then we want to check the allocation state of the
>> parent block.
>> + * If it has a valid offset, then we want to include
>> it into
>> + * the calculation, cause blocks with an offset will
>> not be freed when
>> + * committing the top into base with
>> discard-no-unref enabled.
>> + */
>> + if (parent && s->discard_no_unref && !(ret &
>> BDRV_BLOCK_DATA)) {
>> + int64_t pnum_parent = 0;
>> + retp = bdrv_block_status_above(parent, NULL,
>> offset,
>> + ssize - offset,
>> &pnum_parent, NULL,
>> + NULL);
>> + /* If the parent continuous block is
>> smaller, use that pnum,
>> + * so the next iteration starts with the
>> smallest offset.
>> + */
>> + if (pnum_parent < pnum) {
>> + pnum = pnum_parent;
>> + }
>> + }
>> + if (ret & BDRV_BLOCK_ZERO && !parent && !(parent &&
>> s->discard_no_unref)) {
>> /* Skip zero regions (safe with no backing
>> file) */
>> - } else if ((ret & (BDRV_BLOCK_DATA |
>> BDRV_BLOCK_ALLOCATED)) ==
>> - (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
>> + } else if (((ret & (BDRV_BLOCK_DATA |
>> BDRV_BLOCK_ALLOCATED)) ==
>> + (BDRV_BLOCK_DATA |
>> BDRV_BLOCK_ALLOCATED)) ||
>> + (retp & BDRV_BLOCK_OFFSET_VALID)) {
>> /* Extend pnum to end of cluster for next
>> iteration */
>> pnum = ROUND_UP(offset + pnum, cluster_size) -
>> offset;
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 2/2] qcow2: don't allow discard-no-unref when discard is not enabled
2024-07-10 13:00 ` Hanna Czenczek
@ 2025-02-17 15:35 ` Jean-Louis Dupond
0 siblings, 0 replies; 7+ messages in thread
From: Jean-Louis Dupond @ 2025-02-17 15:35 UTC (permalink / raw)
To: Hanna Czenczek, qemu-devel, kwolf, andrey.drobyshev
On 7/10/24 15:00, Hanna Czenczek wrote:
> On 05.06.24 15:25, Jean-Louis Dupond wrote:
>> When discard is not set to unmap/on, we should not allow setting
>> discard-no-unref.
>
> Is this important? Technically, it’s an incompatible change, and
> would require a deprecation warning first.
No it doesn't hurt, but it's 'cleaner'.
But indeed, if its a hassle to get that resolved, we can leave it like
this.
>
> (I can imagine people setting this option indiscriminately on all
> image, whether discard actually plays a role or not. It may make
> sense for them.)
I don't think a lot of people use the discard-no-unref option (yet).
>
> Hanna
>
>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>> ---
>> block/qcow2.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 50354e5b98..cead5479e4 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1156,6 +1156,12 @@ qcow2_update_options_prepare(BlockDriverState
>> *bs, Qcow2ReopenState *r,
>> ret = -EINVAL;
>> goto fail;
>> }
>> + if (r->discard_no_unref && !(flags & BDRV_O_UNMAP)) {
>> + error_setg(errp,
>> + "discard-no-unref is only valid with
>> discard=unmap/on");
>> + ret = -EINVAL;
>> + goto fail;
>> + }
>> switch (s->crypt_method_header) {
>> case QCOW_CRYPT_NONE:
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v2 1/2] qcow2: handle discard-no-unref in measure
2025-02-17 15:34 ` Jean-Louis Dupond
@ 2025-03-27 9:16 ` Jean-Louis Dupond
0 siblings, 0 replies; 7+ messages in thread
From: Jean-Louis Dupond @ 2025-03-27 9:16 UTC (permalink / raw)
To: Hanna Czenczek, qemu-devel, kwolf, andrey.drobyshev
Any chance this can get reviewed and perhaps merged?
We would like to enable discard-no-unref by default on oVirt, as this
makes qcow2 inside LVM LV's way more reliable (because we can calculate
the size).
But we are still missing this measure patch to be able to properly
calculate the destination size of the LV on a snapshot merge.
Thanks
Jean-Louis
On 2/17/25 16:34, Jean-Louis Dupond wrote:
> Hi,
>
> First of all sorry for the huge delay, but didn't had time to
> follow-up on this lately.
> And it got some lower priority, as we don't hit it often and have a
> fairly easy workaround (fill the empty blocks again in the snapshot by
> writing data to the disk).
>
> On 7/10/24 14:58, Hanna Czenczek wrote:
>> On 05.06.24 15:25, Jean-Louis Dupond wrote:
>>> When doing a measure on an image with a backing file and
>>> discard-no-unref is enabled, the code should take this into account.
>>
>> That doesn’t make sense to me. As far as I understand, 'measure' is
>> supposed to report how much space you need for a given image, i.e. if
>> you were to convert it to a new image. discard-no-unref doesn’t
>> factor into that, because for a 'convert' target (a new image),
>> nothing can be discarded.
>>
>> Reading the issue, I understand that oVirt uses measure to determine
>> the size of the target of a 'commit' operation. Seems a bit like
>> abuse to me, precisely because of the issue you’re facing. More
>> specifically, a 'commit' operation is a complex thing with a lot of
>> variables, so the outcome depends on a lot.
> Correct. oVirt uses the measure command to find out how big the
> destination volume needs to be when running a commit/merge of 2 disks.
> This way it can resize the container (Logical Volume here) to the
> correct size in order to succeed the commit.
>>
>> For example, this patch just checks the discard-no-unref setting on
>> the top image. But AFAIU it doesn’t matter what the setting on the
>> top image is, it matters what the setting on the commit target is.
>> 'measure' can’t know this because it doesn’t know what the commit
>> target is. As far as I can see, this patch actually assumes the
>> commit target is the first backing image (it specifically checks in
>> the image whether a block is allocated) – why?
> By default it would check the top image indeed, but not when using the
> complex json parameters to qemu-img measure.
> For example:
> ./build/qemu-img create -f qcow2 /tmp/test.qcow2 128M
> ./build/qemu-io -c 'open /tmp/test.qcow2' -c 'write 0 8M' -c 'write
> 56M 20M' -c 'write 10M 8M' -c 'write 24M 32M'
> ./build/qemu-img create -f qcow2 -b /tmp/test.qcow2 -F qcow2
> /tmp/test_snap.qcow2
> ./build/qemu-io -c 'open -o discard=unmap,discard-no-unref=on
> /tmp/test_snap.qcow2' -c 'write 16M 8M' -c 'discard 60M 20M' -c 'write
> 84M 10M'
>
>
> The following commands will give the current output:
> [jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json
> -O qcow2 'json:{"file": {"driver": "file", "filename":
> "/tmp/test_snap.qcow2"}, "driver": "qcow2", "discard":"unmap",
> "discard-no-unref":true, "backing": {"driver": "qcow2",
> "discard-no-unref":false, "file": {"driver": "file", "filename":
> "/tmp/test.qcow2"}, "backing": null}}'
> {
> "bitmaps": 0,
> "required": 71630848,
> "fully-allocated": 134545408
> }
> [jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json
> -O qcow2 /tmp/test_snap.qcow2
> {
> "bitmaps": 0,
> "required": 71630848,
> "fully-allocated": 134545408
> }
> [jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json
> -O qcow2 'json:{"file": {"driver": "file", "filename":
> "/tmp/test_snap.qcow2"}, "driver": "qcow2", "backing": {"driver":
> "qcow2", "file": {"driver": "file", "filename": "/tmp/test.qcow2"},
> "backing": null}}'
> {
> "bitmaps": 0,
> "required": 71630848,
> "fully-allocated": 134545408
> }
>
> Cause it will not take into account the discard-no-unref flag. And
> will give the output like you have in the current version.
>
>
> But when running measure with the following options:
> ./build/qemu-img measure --output json -O qcow2 'json:{"file":
> {"driver": "file", "filename": "/tmp/test_snap.qcow2"}, "driver":
> "qcow2", "discard":"unmap", "discard-no-unref":true, "backing":
> {"driver": "qcow2", "discard-no-unref":true, "file": {"driver":
> "file", "filename": "/tmp/test.qcow2"}, "backing": null}}'
>
> It will give a bigger required size:
> {
> "bitmaps": 0,
> "required": 88408064,
> "fully-allocated": 134545408
> }
>
>
> Why? if a block has already been allocated (either with data or
> contains an allocated ZERO block), we want to include its size in the
> calculation.
> Because with discard-no-unref, an allocated block will not be reused
> for some other cluster, so it's not available for data in the snapshot
> layer.
> So if the cluster was not yet allocated in the destination image, a
> new cluster will need to be allocated to fit the new data from the
> snapshot layer.
>
>>
>> So to me that means if 'measure' is supposed to give reliable data on
>> the commit case, it needs to be extended. Best thing I can come up
>> with off the top of my head would be to add an option e.g.
>> 'commit=<target-node-name>', so we (A) that we’re looking at a commit
>> and not a convert, and (B) we know what data will be collapsed into
>> which image and where we need to check for discard-no-unref.
> I think that is what can be achieved by using the json argument. Cause
> there we can specify the target with its flags.
> And it's then the responsibility of oVirt (or whatever other tool), to
> pass the correct flags.
>>
>> Hanna
> Thanks for the review
>
> Jean-Louis
>>
>>> If for example you have a snapshot image with a base, and you do a
>>> discard within the snapshot, it will be ZERO and ALLOCATED, but without
>>> host offset.
>>> Now if we commit this snapshot, and the clusters in the base image have
>>> a host offset, the clusters will only be set to ZERO, but the host
>>> offset
>>> will not be cleared.
>>> Therefor non-data clusters in the top image need to check the
>>> base to see if space will be freed or not, to have a correct measure
>>> output.
>>>
>>> Bug-Url: https://gitlab.com/qemu-project/qemu/-/issues/2369
>>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>>> ---
>>> block/qcow2.c | 32 +++++++++++++++++++++++++++++---
>>> 1 file changed, 29 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>> index 956128b409..50354e5b98 100644
>>> --- a/block/qcow2.c
>>> +++ b/block/qcow2.c
>>> @@ -5163,9 +5163,16 @@ static BlockMeasureInfo
>>> *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>>> } else {
>>> int64_t offset;
>>> int64_t pnum = 0;
>>> + BlockDriverState *parent = bdrv_filter_or_cow_bs(in_bs);
>>> + BDRVQcow2State *s = NULL;
>>> +
>>> + if (parent) {
>>> + s = parent->opaque;
>>> + }
>>> for (offset = 0; offset < ssize; offset += pnum) {
>>> int ret;
>>> + int retp = 0;
>>> ret = bdrv_block_status_above(in_bs, NULL, offset,
>>> ssize - offset,
>>> &pnum, NULL,
>>> @@ -5176,10 +5183,29 @@ static BlockMeasureInfo
>>> *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
>>> goto err;
>>> }
>>> - if (ret & BDRV_BLOCK_ZERO) {
>>> + /* If we have a parent in the chain and the current
>>> block is not data,
>>> + * then we want to check the allocation state of
>>> the parent block.
>>> + * If it has a valid offset, then we want to
>>> include it into
>>> + * the calculation, cause blocks with an offset
>>> will not be freed when
>>> + * committing the top into base with
>>> discard-no-unref enabled.
>>> + */
>>> + if (parent && s->discard_no_unref && !(ret &
>>> BDRV_BLOCK_DATA)) {
>>> + int64_t pnum_parent = 0;
>>> + retp = bdrv_block_status_above(parent,
>>> NULL, offset,
>>> + ssize - offset,
>>> &pnum_parent, NULL,
>>> + NULL);
>>> + /* If the parent continuous block is
>>> smaller, use that pnum,
>>> + * so the next iteration starts with the
>>> smallest offset.
>>> + */
>>> + if (pnum_parent < pnum) {
>>> + pnum = pnum_parent;
>>> + }
>>> + }
>>> + if (ret & BDRV_BLOCK_ZERO && !parent && !(parent &&
>>> s->discard_no_unref)) {
>>> /* Skip zero regions (safe with no backing
>>> file) */
>>> - } else if ((ret & (BDRV_BLOCK_DATA |
>>> BDRV_BLOCK_ALLOCATED)) ==
>>> - (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
>>> + } else if (((ret & (BDRV_BLOCK_DATA |
>>> BDRV_BLOCK_ALLOCATED)) ==
>>> + (BDRV_BLOCK_DATA |
>>> BDRV_BLOCK_ALLOCATED)) ||
>>> + (retp & BDRV_BLOCK_OFFSET_VALID)) {
>>> /* Extend pnum to end of cluster for next
>>> iteration */
>>> pnum = ROUND_UP(offset + pnum, cluster_size) -
>>> offset;
>>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-03-27 9:16 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-05 13:25 [PATCH v2 1/2] qcow2: handle discard-no-unref in measure Jean-Louis Dupond
2024-06-05 13:25 ` [PATCH v2 2/2] qcow2: don't allow discard-no-unref when discard is not enabled Jean-Louis Dupond
2024-07-10 13:00 ` Hanna Czenczek
2025-02-17 15:35 ` Jean-Louis Dupond
2024-07-10 12:58 ` [PATCH v2 1/2] qcow2: handle discard-no-unref in measure Hanna Czenczek
2025-02-17 15:34 ` Jean-Louis Dupond
2025-03-27 9:16 ` Jean-Louis Dupond
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).