From: Jean-Louis Dupond <jean-louis@dupond.be>
To: Hanna Czenczek <hreitz@redhat.com>,
qemu-devel@nongnu.org, kwolf@redhat.com
Subject: Re: [PATCH v2] qcow2: add discard-no-unref option
Date: Mon, 5 Jun 2023 10:47:43 +0200 [thread overview]
Message-ID: <d012533d-0fd5-6ab7-16e7-4f3dc0872ab5@dupond.be> (raw)
In-Reply-To: <11d5e450-f24f-a52c-e143-9889ccf724ef@redhat.com>
On 2/06/2023 17:28, Hanna Czenczek wrote:
> On 02.06.23 14:47, Jean-Louis Dupond wrote:
>> When we for example have a sparse qcow2 image and discard: unmap is
>> enabled,
>> there can be a lot of fragmentation in the image after some time.
>> Especially on VM's
>> that do a lot of writes/deletes.
>> This causes the qcow2 image to grow even over 110% of its virtual size,
>> because the free gaps in the image get too small to allocate new
>> continuous clusters. So it allocates new space at the end of the image.
>>
>> Disabling discard is not an option, as discard is needed to keep the
>> incremental backup size as low as possible. Without discard, the
>> incremental backups would become large, as qemu thinks it's just dirty
>> blocks but it doesn't know the blocks are unneeded.
>> So we need to avoid fragmentation but also 'empty' the unneeded
>> blocks in
>> the image to have a small incremental backup.
>>
>> In addition, we also want to send the discards further down the
>> stack, so
>> the underlying blocks are still discarded.
>>
>> Therefor we introduce a new qcow2 option "discard-no-unref".
>> When setting this option to true, discards will no longer have the qcow2
>> driver relinquish cluster allocations. Other than that, the request is
>> handled as normal: All clusters in range are marked as zero, and, if
>> pass-discard-request is true, it is passed further down the stack.
>> The only difference is that the now-zero clusters are preallocated
>> instead of being unallocated.
>> This will avoid fragmentation on the qcow2 image.
>>
>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>> ---
>> block/qcow2-cluster.c | 32 ++++++++++++++++++++++++++++----
>> block/qcow2.c | 18 ++++++++++++++++++
>> block/qcow2.h | 3 +++
>> qapi/block-core.json | 12 ++++++++++++
>> qemu-options.hx | 12 ++++++++++++
>> 5 files changed, 73 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 39cda7f907..1f130c6ab9 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1894,6 +1894,7 @@ again:
>> return 0;
>> }
>> +
>> /*
>> * This discards as many clusters of nb_clusters as possible at
>> once (i.e.
>> * all clusters in the same L2 slice) and returns the number of
>> discarded
>
> Was adding this empty line intentional? (If not, I’d drop it.)
Dropped
>
>> @@ -1925,6 +1926,9 @@ static int discard_in_l2_slice(BlockDriverState
>> *bs, uint64_t offset,
>> uint64_t new_l2_bitmap = old_l2_bitmap;
>> QCow2ClusterType cluster_type =
>> qcow2_get_cluster_type(bs, old_l2_entry);
>> + bool keep_reference = (cluster_type !=
>> QCOW2_CLUSTER_COMPRESSED) &&
>> + (s->discard_no_unref &&
>> + type == QCOW2_DISCARD_REQUEST);
>
> (Sorry I didn’t notice before :/) I think there’s a condition missing
> here, namely `full_discard` (i.e. `&& !full_discard`). We must set
> `keep_reference` only if we will actually keep the reference, which
> won’t happen when `full_discard` is set. (Same could be said for
> s->qcow_version < 3, but in that case, `s->discard_no_unref` can’t be
> true.)
>
> (Not a problem in practice because `type == QCOW2_DISCARD_REQUEST`
> never happens together with `full_discard`, but better be safe than
> sorry.)
Fixed!
>
> Alternatively...
>
>> /*
>> * If full_discard is true, the cluster should not read
>> back as zeroes,
>
> [...]
>
>> @@ -1960,8 +1976,16 @@ static int
>> discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>> if (has_subclusters(s)) {
>> set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
>> }
>> - /* Then decrease the refcount */
>> - qcow2_free_any_cluster(bs, old_l2_entry, type);
>> + if (!keep_reference) {
>
> ...we could explicitly check here whether the new L2 entry is still
> allocated or not, like
>
> ```
> QCow2ClusterType new_cluster_type =
> qcow2_get_cluster_type(bs, new_l2_entry);
>
> if (!qcow2_cluster_is_allocated(new_cluster_type)) {
> /* Decrease the refcount if the cluster has been deallocated */
> qcow2_free_any_cluster(...);
> } else if (s->discard_passthrough[type] &&
> qcow2_cluster_is_allocated(cluster_type)) {
> /* If we keep the reference, pass on the discard still */
>
> /* Discard must always produce zero-reading clusters */
> assert(new_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
> /* Compressed clusters will never remain allocated */
> assert(cluster_type != QCOW2_CLUSTER_COMPRESSED);
>
> bdrv_pdiscard(...);
> }
> ```
>
> Just an idea, though, I understand if you’d rather not modify the
> patch further.
>
>> + /* Then decrease the refcount */
>> + qcow2_free_any_cluster(bs, old_l2_entry, type);
>> + } else if (s->discard_passthrough[type] &&
>> + (cluster_type == QCOW2_CLUSTER_NORMAL ||
>> + cluster_type == QCOW2_CLUSTER_ZERO_ALLOC)) {
>> + /* If we keep the reference, pass on the discard still */
>> + bdrv_pdiscard(s->data_file, new_l2_entry & L2E_OFFSET_MASK,
>> + s->cluster_size);
>
> I mentioned this briefly on IRC, might have gone under the radar; I
> think using `old_l2_entry` is better than `new_l2_entry`. In
> practice, there shouldn’t be a difference, but I think it’s slightly
> cleaner to free based on the old entry than have this be based on the
> new one.
Was caused by undoing to much :) Fixed.
>
> (Also, in case we did mess up, like in the hypothetical case above
> where `keep_reference` is true while `full_discard` is also true,
> using `old_l2_entry` means we’ll just accidentally discard the old
> cluster (the accident is merely to discard the cluster instead of
> decrementing its refcount), instead of discarding a completely wrong
> cluster (the image header, with `new_l2_entry = 0`).)
>
> Rest looks good to me!
Posted v3 patch to the ML
>
> Hanna
>
Thanks
Jean-Louis
prev parent reply other threads:[~2023-06-05 8:48 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-02 12:47 [PATCH v2] qcow2: add discard-no-unref option Jean-Louis Dupond
2023-06-02 15:28 ` Hanna Czenczek
2023-06-05 8:47 ` Jean-Louis Dupond [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d012533d-0fd5-6ab7-16e7-4f3dc0872ab5@dupond.be \
--to=jean-louis@dupond.be \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).