From: Jean-Louis Dupond <jean-louis@dupond.be>
To: Hanna Czenczek <hreitz@redhat.com>,
qemu-devel@nongnu.org, kwolf@redhat.com
Subject: Re: [PATCH] qcow2: add discard-no-unref option
Date: Thu, 1 Jun 2023 14:56:53 +0200
Message-ID: <08a88b12-ec0a-110a-ad64-2116712065e8@dupond.be>
In-Reply-To: <5cc0ec56-8a13-c651-0b4e-da644c9f6900@redhat.com>

On 31/05/2023 17:05, Hanna Czenczek wrote:
> On 15.05.23 09:36, Jean-Louis Dupond wrote:
>> When we for example have a sparse qcow2 image and discard: unmap is
>> enabled,
>> there can be a lot of fragmentation in the image after some time.
>> Surely on VM's
>
> s/. Surely/, especially/
>
>> that do a lot of writes/deletes.
>> This causes the qcow2 image to grow even over 110% of its virtual size,
>> because the free gaps in the image get to small to allocate new
>
> s/to small/too small/
>
>> continuous clusters. So it allocates new space as the end of the image.
>
> s/as/at/
>
>> Disabling discard is not an option, as discard is needed to keep the
>> incremental backup size as low as possible. Without discard, the
>> incremental backups would become large, as qemu thinks it's just dirty
>> blocks but it doesn't know the blocks are empty/useless.
>> So we need to avoid fragmentation but also 'empty' the useless blocks in
>
> s/useless/unneeded/ in both lines?
>
>> the image to have a small incremental backup.
>>
>> Next to that we also want to send the discards futher down the stack, so
>
> s/Next to that/In addition/, s/futher/further/
>
>> the underlying blocks are still discarded.
>>
>> Therefor we introduce a new qcow2 option "discard-no-unref". When
>> setting this option to true (defaults to false), the discard requests
>> will still be executed, but it will keep the offset of the cluster. And
>> it will also pass the discard request further down the stack (if
>> discard:unmap is enabled).
>
> I think this could be more explicit, e.g. “When setting this option to
> true, discards will no longer have the qcow2 driver relinquish cluster
> allocations. Other than that, the request is handled as normal: All
> clusters in range are marked as zero, and, if pass-discard-request is
> true, it is passed further down the stack. The only difference is that
> the now-zero clusters are preallocated instead of being unallocated.”
>
>> This will avoid fragmentation and for example on a fully preallocated
>> qcow2 image, this will make sure the image is perfectly continuous.
>
> Well, on the qcow2 layer, yes.
All above -> Fixed :)
>
>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>> ---
>> block/qcow2-cluster.c | 16 ++++-
>> block/qcow2-refcount.c | 136 ++++++++++++++++++++++++-----------------
>> block/qcow2.c | 12 ++++
>> block/qcow2.h | 3 +
>> qapi/block-core.json | 4 ++
>> qemu-options.hx | 6 ++
>> 6 files changed, 120 insertions(+), 57 deletions(-)
>>
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 39cda7f907..88da70db5e 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1943,10 +1943,22 @@ static int
>> discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>> new_l2_entry = new_l2_bitmap = 0;
>> } else if (bs->backing ||
>> qcow2_cluster_is_allocated(cluster_type)) {
>> if (has_subclusters(s)) {
>> - new_l2_entry = 0;
>> + if (s->discard_no_unref && (type &
>> QCOW2_DISCARD_REQUEST)) {
>
> As far as I understand the discard type is just a plain enum, not a
> bit field. So I think this should be `type == QCOW2_DISCARD_REQUEST`,
> not an `&`. (Same below.)
>
Ack
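
A minimal sketch of why the `&` test can misfire on a plain enum. The
enum below is a hypothetical stand-in (the `toy_` names are not QEMU
identifiers); its ordering is assumed to mirror enum qcow2_discard_type,
so the numeric values are an assumption as well.

```c
#include <stdbool.h>

/* Hypothetical stand-in; the ordering (and thus the values) is assumed. */
enum toy_discard_type {
    TOY_DISCARD_NEVER = 0,
    TOY_DISCARD_ALWAYS,     /* 1 */
    TOY_DISCARD_REQUEST,    /* 2 */
    TOY_DISCARD_SNAPSHOT,   /* 3 */
    TOY_DISCARD_OTHER       /* 4 */
};

/* Bitwise test, as in the first patch version. */
static inline bool is_request_bitwise(enum toy_discard_type type)
{
    return (type & TOY_DISCARD_REQUEST) != 0;
}

/* Equality test, as suggested in the review. */
static inline bool is_request_equal(enum toy_discard_type type)
{
    return type == TOY_DISCARD_REQUEST;
}
```

Under these assumed values, the bitwise variant also matches
TOY_DISCARD_SNAPSHOT (3 shares bit 1 with 2), while the equality variant
matches only TOY_DISCARD_REQUEST.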
>> + new_l2_entry = old_l2_entry;
>> + } else {
>> + new_l2_entry = 0;
>> + }
>> new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
>> } else {
>> - new_l2_entry = s->qcow_version >= 3 ?
>> QCOW_OFLAG_ZERO : 0;
>> + if (s->qcow_version >= 3) {
>> + if (s->discard_no_unref && (type &
>> QCOW2_DISCARD_REQUEST)) {
>> + new_l2_entry |= QCOW_OFLAG_ZERO;
>> + } else {
>> + new_l2_entry = QCOW_OFLAG_ZERO;
>> + }
>> + } else {
>> + new_l2_entry = 0;
>> + }
>> }
>> }
>
> Context below:
>
> if (old_l2_entry == new_l2_entry && old_l2_bitmap ==
> new_l2_bitmap) {
> continue;
> }
>
> /* First remove L2 entries */
> qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
> if (has_subclusters(s)) {
> set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
> }
> /* Then decrease the refcount */
> qcow2_free_any_cluster(bs, old_l2_entry, type);
>
> If we keep the allocation, I don’t see why we would call
> qcow2_free_any_cluster(). If we simply skip the call (if
> `qcow2_is_allocated(qcow2_get_cluster_type(bs, new_l2_entry))`), I
> think you could drop the modification to update_refcount().
>
If we don't call qcow2_free_any_cluster(), the discard will not get passed
to the lower layer.
We also call it in zero_in_l2_slice(), for example, to discard on the lower layer.
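
To illustrate the concern, here is a toy model, not QEMU code: every
name in it is hypothetical, and it assumes a version-3 image without
subclusters. It only shows that if the freeing path is the place where
the discard gets forwarded, skipping the call keeps the allocation but
also loses the pass-through.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for QCOW_OFLAG_ZERO; the bit position is an assumption. */
#define TOY_OFLAG_ZERO (UINT64_C(1) << 0)

/* Hypothetical state for a single cluster. */
struct toy_image {
    bool discard_no_unref;      /* the proposed option */
    bool pass_discard_request;  /* stands in for pass-discard-request */
    uint64_t l2_entry;          /* host offset plus flag bits */
    unsigned refcount;          /* refcount of the cluster */
    unsigned lower_discards;    /* discards forwarded to the lower layer */
};

/* Models the freeing path: drops the reference (unless the option is
 * set) and forwards the discard when pass-through is enabled. */
static inline void toy_free_cluster(struct toy_image *img)
{
    if (!img->discard_no_unref && img->refcount > 0) {
        img->refcount--;
    }
    if (img->pass_discard_request) {
        img->lower_discards++;
    }
}

/* Models a guest discard; call_free selects whether the freeing path
 * is still invoked when the allocation is kept. */
static inline void toy_discard(struct toy_image *img, bool call_free)
{
    if (img->discard_no_unref) {
        img->l2_entry |= TOY_OFLAG_ZERO;  /* keep offset, mark as zero */
    } else {
        img->l2_entry = TOY_OFLAG_ZERO;   /* drop offset, mark as zero */
    }
    if (call_free) {
        toy_free_cluster(img);
    }
}
```

In this model, a discard with the option set and call_free=true keeps
both the offset and the refcount and still counts one forwarded discard;
with call_free=false nothing reaches the lower layer.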
> [...]
>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index 5bde3b8401..9dde2ac1a5 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>
> [...]
>
>> @@ -725,6 +726,11 @@ static QemuOptsList qcow2_runtime_opts = {
>> .type = QEMU_OPT_BOOL,
>> .help = "Generate discard requests when other clusters
>> are freed",
>> },
>> + {
>> + .name = QCOW2_OPT_DISCARD_NO_UNREF,
>> + .type = QEMU_OPT_BOOL,
>> + .help = "Do not dereference discarded clusters",
>
> I wouldn’t call it “dereference” because of the overloaded meaning in
> C, but “unreference” instead.
>
ack
>> + },
>> {
>> .name = QCOW2_OPT_OVERLAP,
>> .type = QEMU_OPT_STRING,
>
> [...]
>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index 187e35d473..63aa792e9c 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3432,6 +3432,9 @@
>> # @pass-discard-other: whether discard requests for the data source
>> # should be issued on other occasions where a cluster gets freed
>> #
>> +# @discard-no-unref: don't dereference the cluster when we do a discard
>> +# this to avoid fragmentation of the qcow2 image (since 8.1)
>
> Because this comment is used to build different documentation than
> qemu-options.hx is, I would duplicate the full comment you put into
> qemu-options.hx here (to provide the best documentation possible).
>
ack
>> +#
>> # @overlap-check: which overlap checks to perform for writes to the
>> # image, defaults to 'cached' (since 2.2)
>> #
>> @@ -3470,6 +3473,7 @@
>> '*pass-discard-request': 'bool',
>> '*pass-discard-snapshot': 'bool',
>> '*pass-discard-other': 'bool',
>> + '*discard-no-unref': 'bool',
>> '*overlap-check': 'Qcow2OverlapChecks',
>> '*cache-size': 'int',
>> '*l2-cache-size': 'int',
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 42b9094c10..17ac701d0d 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -1431,6 +1431,12 @@ SRST
>> issued on other occasions where a cluster gets freed
>> (on/off; default: off)
>> + ``discard-no-unref``
>> + When enabled, a discard in the guest does not cause the
>> + cluster inside the qcow2 image to be dereferenced. This
>
> Like above, I’d prefer “unreferenced”, or “the cluster’s allocation
> […] to be relinquished”.
>
ack
>> + was added to avoid qcow2 fragmentation whithin the image.
>> + (on/off; default: off)
>
> I wouldn’t describe history here, but instead what this is for. E.g.:
> “When enabled, discards from the guest will not cause cluster
> allocations to be relinquished. This prevents qcow2 fragmentation that
> would be caused by such discards. Besides potential performance
> degradation, such fragmentation can lead to increased allocation of
> clusters past the end of the image file, resulting in image files
> whose file length can grow much larger than their guest disk size
> would suggest. If image file length is of concern (e.g. when storing
> qcow2 images directly on block devices), you should consider enabling
> this option.”
ack
>
> What do you think?
Let me know your opinion on the qcow2_free_any_cluster call, and then
I'll post a new patch version.
>
> Hanna
>
Jean-Louis
Thread overview: 9+ messages
2023-05-15 7:36 [PATCH] qcow2: add discard-no-unref option Jean-Louis Dupond
2023-05-26 13:31 ` Hanna Czenczek
2023-05-26 14:30 ` Jean-Louis Dupond
2023-05-31 12:48 ` Hanna Czenczek
2023-05-31 15:05 ` Hanna Czenczek
2023-06-01 12:56 ` Jean-Louis Dupond [this message]
2023-06-02 8:01 ` Hanna Czenczek
2023-05-31 15:16 ` Hanna Czenczek
2023-06-01 13:13 ` Jean-Louis Dupond