qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jean-Louis Dupond <jean-louis@dupond.be>
To: Hanna Czenczek <hreitz@redhat.com>,
	qemu-devel@nongnu.org, kwolf@redhat.com
Subject: Re: [PATCH] qcow2: add discard-no-unref option
Date: Thu, 1 Jun 2023 14:56:53 +0200	[thread overview]
Message-ID: <08a88b12-ec0a-110a-ad64-2116712065e8@dupond.be> (raw)
In-Reply-To: <5cc0ec56-8a13-c651-0b4e-da644c9f6900@redhat.com>

On 31/05/2023 17:05, Hanna Czenczek wrote:
> On 15.05.23 09:36, Jean-Louis Dupond wrote:
>> When we for example have a sparse qcow2 image and discard: unmap is 
>> enabled,
>> there can be a lot of fragmentation in the image after some time. 
>> Surely on VM's
>
> s/. Surely/, especially/
>
>> that do a lot of writes/deletes.
>> This causes the qcow2 image to grow even over 110% of its virtual size,
>> because the free gaps in the image get to small to allocate new
>
> s/to small/too small/
>
>> continuous clusters. So it allocates new space as the end of the image.
>
> s/as/at/
>
>> Disabling discard is not an option, as discard is needed to keep the
>> incremental backup size as low as possible. Without discard, the
>> incremental backups would become large, as qemu thinks it's just dirty
>> blocks but it doesn't know the blocks are empty/useless.
>> So we need to avoid fragmentation but also 'empty' the useless blocks in
>
> s/useless/unneeded/ in both lines?
>
>> the image to have a small incremental backup.
>>
>> Next to that we also want to send the discards futher down the stack, so
>
> s/Next to that/In addition/, s/futher/further/
>
>> the underlying blocks are still discarded.
>>
>> Therefor we introduce a new qcow2 option "discard-no-unref". When
>> setting this option to true (defaults to false), the discard requests
>> will still be executed, but it will keep the offset of the cluster. And
>> it will also pass the discard request further down the stack (if
>> discard:unmap is enabled).
>
> I think this could be more explicit, e.g. “When setting this option to 
> true, discards will no longer have the qcow2 driver relinquish cluster 
> allocations. Other than that, the request is handled as normal: All 
> clusters in range are marked as zero, and, if pass-discard-request is 
> true, it is passed further down the stack. The only difference is that 
> the now-zero clusters are preallocated instead of being unallocated.”
>
>> This will avoid fragmentation and for example on a fully preallocated
>> qcow2 image, this will make sure the image is perfectly continuous.
>
> Well, on the qcow2 layer, yes.
All above -> Fixed :)
>
>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>> ---
>>   block/qcow2-cluster.c  |  16 ++++-
>>   block/qcow2-refcount.c | 136 ++++++++++++++++++++++++-----------------
>>   block/qcow2.c          |  12 ++++
>>   block/qcow2.h          |   3 +
>>   qapi/block-core.json   |   4 ++
>>   qemu-options.hx        |   6 ++
>>   6 files changed, 120 insertions(+), 57 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 39cda7f907..88da70db5e 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1943,10 +1943,22 @@ static int 
>> discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>               new_l2_entry = new_l2_bitmap = 0;
>>           } else if (bs->backing || 
>> qcow2_cluster_is_allocated(cluster_type)) {
>>               if (has_subclusters(s)) {
>> -                new_l2_entry = 0;
>> +                if (s->discard_no_unref && (type & 
>> QCOW2_DISCARD_REQUEST)) {
>
> As far as I understand the discard type is just a plain enum, not a 
> bit field.  So I think this should be `type == QCOW2_DISCARD_REQUEST`, 
> not an `&`.  (Same below.)
>
Ack
>> +                    new_l2_entry = old_l2_entry;
>> +                } else {
>> +                    new_l2_entry = 0;
>> +                }
>>                   new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
>>               } else {
>> -                new_l2_entry = s->qcow_version >= 3 ? 
>> QCOW_OFLAG_ZERO : 0;
>> +                if (s->qcow_version >= 3) {
>> +                    if (s->discard_no_unref && (type & 
>> QCOW2_DISCARD_REQUEST)) {
>> +                        new_l2_entry |= QCOW_OFLAG_ZERO;
>> +                    } else {
>> +                        new_l2_entry = QCOW_OFLAG_ZERO;
>> +                    }
>> +                } else {
>> +                    new_l2_entry = 0;
>> +                }
>>               }
>>           }
>
> Context below:
>
>         if (old_l2_entry == new_l2_entry && old_l2_bitmap == 
> new_l2_bitmap) {
>             continue;
>         }
>
>         /* First remove L2 entries */
>         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>         set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
>         if (has_subclusters(s)) {
>             set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
>         }
>         /* Then decrease the refcount */
>         qcow2_free_any_cluster(bs, old_l2_entry, type);
>
> If we keep the allocation, I don’t see why we would call 
> qcow2_free_any_cluster().  If we simply skip the call (if 
> `qcow2_is_allocated(qcow2_get_cluster_type(bs, new_l2_entry))`), I 
> think you could drop the modification to update_refcount().
>
If we don't call qcow2_free_any_cluster, the discard will not get passed 
to the lower layer.
We also call it in zero_in_l2_slice for example to discard lower layer.
> [...]
>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 5bde3b8401..9dde2ac1a5 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>
> [...]
>
>> @@ -725,6 +726,11 @@ static QemuOptsList qcow2_runtime_opts = {
>>               .type = QEMU_OPT_BOOL,
>>               .help = "Generate discard requests when other clusters 
>> are freed",
>>           },
>> +        {
>> +            .name = QCOW2_OPT_DISCARD_NO_UNREF,
>> +            .type = QEMU_OPT_BOOL,
>> +            .help = "Do not dereference discarded clusters",
>
> I wouldn’t call it “dereference” because of the overloaded meaning in 
> C, but “unreference” instead.
>
ack
>> +        },
>>           {
>>               .name = QCOW2_OPT_OVERLAP,
>>               .type = QEMU_OPT_STRING,
>
> [...]
>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 187e35d473..63aa792e9c 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3432,6 +3432,9 @@
>>   # @pass-discard-other: whether discard requests for the data source
>>   #     should be issued on other occasions where a cluster gets freed
>>   #
>> +# @discard-no-unref: don't dereference the cluster when we do a discard
>> +#     this to avoid fragmentation of the qcow2 image (since 8.1)
>
> Because this comment is used to build different documentation than 
> qemu-options.hx is, I would duplicate the full comment you put into 
> qemu-options.hx here (to provide the best documentation possible).
>
ack
>> +#
>>   # @overlap-check: which overlap checks to perform for writes to the
>>   #     image, defaults to 'cached' (since 2.2)
>>   #
>> @@ -3470,6 +3473,7 @@
>>               '*pass-discard-request': 'bool',
>>               '*pass-discard-snapshot': 'bool',
>>               '*pass-discard-other': 'bool',
>> +            '*discard-no-unref': 'bool',
>>               '*overlap-check': 'Qcow2OverlapChecks',
>>               '*cache-size': 'int',
>>               '*l2-cache-size': 'int',
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 42b9094c10..17ac701d0d 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -1431,6 +1431,12 @@ SRST
>>               issued on other occasions where a cluster gets freed
>>               (on/off; default: off)
>>   +        ``discard-no-unref``
>> +            When enabled, a discard in the guest does not cause the
>> +            cluster inside the qcow2 image to be dereferenced. This
>
> Like above, I’d prefer “unreferenced”, or “the cluster’s allocation 
> […] to be relinquished”.
>
ack
>> +            was added to avoid qcow2 fragmentation whithin the image.
>> +            (on/off; default: off)
>
> I wouldn’t describe history here, but instead what this is for. E.g.: 
> “When enabled, discards from the guest will not cause cluster 
> allocations to be relinquished. This prevents qcow2 fragmentation that 
> would be caused by such discards. Besides potential performance 
> degradation, such fragmentation can lead to increased allocation of 
> clusters past the end of the image file, resulting in image files 
> whose file length can grow much larger than their guest disk size 
> would suggest. If image file length is of concern (e.g. when storing 
> qcow2 images directly on block devices), you should consider enabling 
> this option.”
ack
>
> What do you think?

Let me know your opinion on the qcow2_free_any_cluster call, and then 
I'll post a new patch version.

>
> Hanna
>

Jean-Louis



  reply	other threads:[~2023-06-01 12:57 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-05-15  7:36 [PATCH] qcow2: add discard-no-unref option Jean-Louis Dupond
2023-05-26 13:31 ` Hanna Czenczek
2023-05-26 14:30   ` Jean-Louis Dupond
2023-05-31 12:48     ` Hanna Czenczek
2023-05-31 15:05 ` Hanna Czenczek
2023-06-01 12:56   ` Jean-Louis Dupond [this message]
2023-06-02  8:01     ` Hanna Czenczek
2023-05-31 15:16 ` Hanna Czenczek
2023-06-01 13:13   ` Jean-Louis Dupond

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=08a88b12-ec0a-110a-ad64-2116712065e8@dupond.be \
    --to=jean-louis@dupond.be \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).