qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jean-Louis Dupond <jean-louis@dupond.be>
To: Hanna Czenczek <hreitz@redhat.com>,
	qemu-devel@nongnu.org, kwolf@redhat.com
Subject: Re: [PATCH v2] qcow2: add discard-no-unref option
Date: Mon, 5 Jun 2023 10:47:43 +0200	[thread overview]
Message-ID: <d012533d-0fd5-6ab7-16e7-4f3dc0872ab5@dupond.be> (raw)
In-Reply-To: <11d5e450-f24f-a52c-e143-9889ccf724ef@redhat.com>


On 2/06/2023 17:28, Hanna Czenczek wrote:
> On 02.06.23 14:47, Jean-Louis Dupond wrote:
>> When we for example have a sparse qcow2 image and discard: unmap is 
>> enabled,
>> there can be a lot of fragmentation in the image after some time. 
>> Especially on VM's
>> that do a lot of writes/deletes.
>> This causes the qcow2 image to grow even over 110% of its virtual size,
>> because the free gaps in the image get too small to allocate new
>> continuous clusters. So it allocates new space at the end of the image.
>>
>> Disabling discard is not an option, as discard is needed to keep the
>> incremental backup size as low as possible. Without discard, the
>> incremental backups would become large, as qemu thinks it's just dirty
>> blocks but it doesn't know the blocks are unneeded.
>> So we need to avoid fragmentation but also 'empty' the unneeded 
>> blocks in
>> the image to have a small incremental backup.
>>
>> In addition, we also want to send the discards further down the 
>> stack, so
>> the underlying blocks are still discarded.
>>
>> Therefor we introduce a new qcow2 option "discard-no-unref".
>> When setting this option to true, discards will no longer have the qcow2
>> driver relinquish cluster allocations. Other than that, the request is
>> handled as normal: All clusters in range are marked as zero, and, if
>> pass-discard-request is true, it is passed further down the stack.
>> The only difference is that the now-zero clusters are preallocated
>> instead of being unallocated.
>> This will avoid fragmentation on the qcow2 image.
>>
>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>> ---
>>   block/qcow2-cluster.c | 32 ++++++++++++++++++++++++++++----
>>   block/qcow2.c         | 18 ++++++++++++++++++
>>   block/qcow2.h         |  3 +++
>>   qapi/block-core.json  | 12 ++++++++++++
>>   qemu-options.hx       | 12 ++++++++++++
>>   5 files changed, 73 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 39cda7f907..1f130c6ab9 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1894,6 +1894,7 @@ again:
>>       return 0;
>>   }
>>   +
>>   /*
>>    * This discards as many clusters of nb_clusters as possible at 
>> once (i.e.
>>    * all clusters in the same L2 slice) and returns the number of 
>> discarded
>
> Was adding this empty line intentional?  (If not, I’d drop it.)
Dropped
>
>> @@ -1925,6 +1926,9 @@ static int discard_in_l2_slice(BlockDriverState 
>> *bs, uint64_t offset,
>>           uint64_t new_l2_bitmap = old_l2_bitmap;
>>           QCow2ClusterType cluster_type =
>>               qcow2_get_cluster_type(bs, old_l2_entry);
>> +        bool keep_reference = (cluster_type != 
>> QCOW2_CLUSTER_COMPRESSED) &&
>> +                              (s->discard_no_unref &&
>> +                               type == QCOW2_DISCARD_REQUEST);
>
> (Sorry I didn’t notice before :/) I think there’s a condition missing 
> here, namely `full_discard` (i.e. `&& !full_discard`).  We must set 
> `keep_reference` only if we will actually keep the reference, which 
> won’t happen when `full_discard` is set.  (Same could be said for 
> s->qcow_version < 3, but in that case, `s->discard_no_unref` can’t be 
> true.)
>
> (Not a problem in practice because `type == QCOW2_DISCARD_REQUEST` 
> never happens together with `full_discard`, but better be safe than 
> sorry.)
Fixed!
>
> Alternatively...
>
>>           /*
>>            * If full_discard is true, the cluster should not read 
>> back as zeroes,
>
> [...]
>
>> @@ -1960,8 +1976,16 @@ static int 
>> discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>           if (has_subclusters(s)) {
>>               set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
>>           }
>> -        /* Then decrease the refcount */
>> -        qcow2_free_any_cluster(bs, old_l2_entry, type);
>> +        if (!keep_reference) {
>
> ...we could explicitly check here whether the new L2 entry is still 
> allocated or not, like
>
> ```
> QCow2ClusterType new_cluster_type =
>     qcow2_get_cluster_type(bs, new_l2_entry);
>
> if (!qcow2_cluster_is_allocated(new_cluster_type)) {
>     /* Decrease the refcount if the cluster has been deallocated */
>     qcow2_free_any_cluster(...);
> } else if (s->discard_passthrough[type] &&
>            qcow2_cluster_is_allocated(cluster_type)) {
>     /* If we keep the reference, pass on the discard still */
>
>     /* Discard must always produce zero-reading clusters */
>     assert(new_cluster_type == QCOW2_CLUSTER_ZERO_ALLOC);
>     /* Compressed clusters will never remain allocated */
>     assert(cluster_type != QCOW2_CLUSTER_COMPRESSED);
>
>     bdrv_pdiscard(...);
> }
> ```
>
> Just an idea, though, I understand if you’d rather not modify the 
> patch further.
>
>> +            /* Then decrease the refcount */
>> +            qcow2_free_any_cluster(bs, old_l2_entry, type);
>> +        } else if (s->discard_passthrough[type] &&
>> +                   (cluster_type == QCOW2_CLUSTER_NORMAL ||
>> +                    cluster_type == QCOW2_CLUSTER_ZERO_ALLOC)) {
>> +            /* If we keep the reference, pass on the discard still */
>> +            bdrv_pdiscard(s->data_file, new_l2_entry & L2E_OFFSET_MASK,
>> +                          s->cluster_size);
>
> I mentioned this briefly on IRC, might have gone under the radar; I 
> think using `old_l2_entry` is better than `new_l2_entry`.  In 
> practice, there shouldn’t be a difference, but I think it’s slightly 
> cleaner to free based on the old entry than have this be based on the 
> new one.
Was caused by undoing to much :) Fixed.
>
> (Also, in case we did mess up, like in the hypothetical case above 
> where `keep_reference` is true while `full_discard` is also true, 
> using `old_l2_entry` means we’ll just accidentally discard the old 
> cluster (the accident is merely to discard the cluster instead of 
> decrementing its refcount), instead of discarding a completely wrong 
> cluster (the image header, with `new_l2_entry = 0`).)
>
> Rest looks good to me!
Posted v3 patch to the ML
>
> Hanna
>

Thanks
Jean-Louis



      reply	other threads:[~2023-06-05  8:48 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-02 12:47 [PATCH v2] qcow2: add discard-no-unref option Jean-Louis Dupond
2023-06-02 15:28 ` Hanna Czenczek
2023-06-05  8:47   ` Jean-Louis Dupond [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d012533d-0fd5-6ab7-16e7-4f3dc0872ab5@dupond.be \
    --to=jean-louis@dupond.be \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).