From: Hanna Czenczek <hreitz@redhat.com>
To: Jean-Louis Dupond <jean-louis@dupond.be>,
qemu-devel@nongnu.org, kwolf@redhat.com
Subject: Re: [PATCH] qcow2: add discard-no-unref option
Date: Fri, 2 Jun 2023 10:01:00 +0200
Message-ID: <624b6d8c-492c-02cc-7a13-f86af572dcf5@redhat.com>
In-Reply-To: <08a88b12-ec0a-110a-ad64-2116712065e8@dupond.be>

On 01.06.23 14:56, Jean-Louis Dupond wrote:
> On 31/05/2023 17:05, Hanna Czenczek wrote:
>> On 15.05.23 09:36, Jean-Louis Dupond wrote:
>>> When we for example have a sparse qcow2 image and discard: unmap is
>>> enabled,
>>> there can be a lot of fragmentation in the image after some time.
>>> Surely on VM's
>>
>> s/. Surely/, especially/
>>
>>> that do a lot of writes/deletes.
>>> This causes the qcow2 image to grow even over 110% of its virtual size,
>>> because the free gaps in the image get to small to allocate new
>>
>> s/to small/too small/
>>
>>> continuous clusters. So it allocates new space as the end of the image.
>>
>> s/as/at/
>>
>>> Disabling discard is not an option, as discard is needed to keep the
>>> incremental backup size as low as possible. Without discard, the
>>> incremental backups would become large, as qemu thinks it's just dirty
>>> blocks but it doesn't know the blocks are empty/useless.
>>> So we need to avoid fragmentation but also 'empty' the useless
>>> blocks in
>>
>> s/useless/unneeded/ in both lines?
>>
>>> the image to have a small incremental backup.
>>>
>>> Next to that we also want to send the discards futher down the
>>> stack, so
>>
>> s/Next to that/In addition/, s/futher/further/
>>
>>> the underlying blocks are still discarded.
>>>
>>> Therefor we introduce a new qcow2 option "discard-no-unref". When
>>> setting this option to true (defaults to false), the discard requests
>>> will still be executed, but it will keep the offset of the cluster. And
>>> it will also pass the discard request further down the stack (if
>>> discard:unmap is enabled).
>>
>> I think this could be more explicit, e.g. “When setting this option
>> to true, discards will no longer have the qcow2 driver relinquish
>> cluster allocations. Other than that, the request is handled as
>> normal: All clusters in range are marked as zero, and, if
>> pass-discard-request is true, it is passed further down the stack.
>> The only difference is that the now-zero clusters are preallocated
>> instead of being unallocated.”
>>
>>> This will avoid fragmentation and for example on a fully preallocated
>>> qcow2 image, this will make sure the image is perfectly continuous.
>>
>> Well, on the qcow2 layer, yes.
> All above -> Fixed :)
>>
>>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
>>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>>> ---
>>> block/qcow2-cluster.c | 16 ++++-
>>> block/qcow2-refcount.c | 136
>>> ++++++++++++++++++++++++-----------------
>>> block/qcow2.c | 12 ++++
>>> block/qcow2.h | 3 +
>>> qapi/block-core.json | 4 ++
>>> qemu-options.hx | 6 ++
>>> 6 files changed, 120 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 39cda7f907..88da70db5e 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -1943,10 +1943,22 @@ static int
>>> discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>> new_l2_entry = new_l2_bitmap = 0;
>>> } else if (bs->backing ||
>>> qcow2_cluster_is_allocated(cluster_type)) {
>>> if (has_subclusters(s)) {
>>> - new_l2_entry = 0;
>>> + if (s->discard_no_unref && (type &
>>> QCOW2_DISCARD_REQUEST)) {
>>
>> As far as I understand the discard type is just a plain enum, not a
>> bit field. So I think this should be `type ==
>> QCOW2_DISCARD_REQUEST`, not an `&`. (Same below.)
>>
> Ack
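
For illustration, here is a minimal sketch of why the `&` can misfire, assuming the enumerators are plain sequential values. The `toy_` names and the exact ordering are made up for this example and are not copied from block/qcow2.h:

```c
#include <stdbool.h>

/*
 * Toy stand-in for the qcow2 discard type.  The enumerator order is an
 * assumption made for illustration only.
 */
enum toy_discard_type {
    TOY_DISCARD_NEVER = 0,
    TOY_DISCARD_ALWAYS,     /* 1 */
    TOY_DISCARD_REQUEST,    /* 2 */
    TOY_DISCARD_SNAPSHOT,   /* 3 */
    TOY_DISCARD_OTHER,      /* 4 */
};

/* Bitwise test as in the patch: true for any value sharing a set bit. */
static bool is_request_bitwise(enum toy_discard_type type)
{
    return (type & TOY_DISCARD_REQUEST) != 0;
}

/* Equality test: true only for the one enumerator we mean. */
static bool is_request_equal(enum toy_discard_type type)
{
    return type == TOY_DISCARD_REQUEST;
}
```

With that ordering, the bitwise variant would also accept the snapshot type (3 & 2 is non-zero), while the equality variant accepts only the request type.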
>>> + new_l2_entry = old_l2_entry;
>>> + } else {
>>> + new_l2_entry = 0;
>>> + }
>>> new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
>>> } else {
>>> - new_l2_entry = s->qcow_version >= 3 ?
>>> QCOW_OFLAG_ZERO : 0;
>>> + if (s->qcow_version >= 3) {
>>> + if (s->discard_no_unref && (type &
>>> QCOW2_DISCARD_REQUEST)) {
>>> + new_l2_entry |= QCOW_OFLAG_ZERO;
>>> + } else {
>>> + new_l2_entry = QCOW_OFLAG_ZERO;
>>> + }
>>> + } else {
>>> + new_l2_entry = 0;
>>> + }
>>> }
>>> }
>>
>> Context below:
>>
>>     if (old_l2_entry == new_l2_entry &&
>>         old_l2_bitmap == new_l2_bitmap) {
>>         continue;
>>     }
>>
>>     /* First remove L2 entries */
>>     qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>     set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
>>     if (has_subclusters(s)) {
>>         set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
>>     }
>>     /* Then decrease the refcount */
>>     qcow2_free_any_cluster(bs, old_l2_entry, type);
>>
>> If we keep the allocation, I don’t see why we would call
>> qcow2_free_any_cluster(). If we simply skip the call (if
>> `qcow2_is_allocated(qcow2_get_cluster_type(bs, new_l2_entry))`), I
>> think you could drop the modification to update_refcount().
>>
> If we don't call qcow2_free_any_cluster, the discard will not get
> passed to the lower layer.

That’s a pickle.

> We also call it in zero_in_l2_slice for example to discard lower layer.

We only call it there if the allocation is dropped.
(`new_l2_entry = unmap ? 0 : old_l2_entry`)

I’d either lift the discard to discard_in_l2_slice() (if dropping the
reference, call qcow2_free_any_cluster(); otherwise, if the old cluster
was a normal or zero allocated cluster, discard it); or add a bool
parameter to `qcow2_free_any_cluster()` that tells it to only discard,
not free, the cluster, which makes it take the existing
`if (has_data_file(bs))` path there.

The latter is simpler, but I still find it problematic to call
qcow2_free_any_cluster() when there’s no intention of actually freeing a
cluster (i.e. releasing the reference to it).
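
To make the two options a bit more concrete, here is a rough sketch on a toy model. None of these names exist in QEMU: `struct toy_image` and all `toy_` functions are hypothetical, and the model assumes a single cluster that is discarded in the lower layer as soon as its refcount drops to zero.

```c
#include <stdbool.h>

/* Toy model of one cluster; not a QEMU data structure. */
struct toy_image {
    int refcount;       /* references held on the cluster */
    int discards_sent;  /* discards passed to the lower layer */
};

/* Pass a discard down without touching the refcount. */
static void toy_discard_only(struct toy_image *img)
{
    img->discards_sent++;
}

/* Release the reference; the toy model discards once it is unused. */
static void toy_free_cluster(struct toy_image *img)
{
    img->refcount--;
    if (img->refcount == 0) {
        img->discards_sent++;
    }
}

/*
 * Option 1: the caller decides.  Free only when the reference is
 * dropped; otherwise just discard the still-allocated cluster.
 */
static void toy_discard_option1(struct toy_image *img, bool keep_allocation)
{
    if (keep_allocation) {
        toy_discard_only(img);
    } else {
        toy_free_cluster(img);
    }
}

/*
 * Option 2: the free function grows a bool parameter and is called even
 * when nothing is freed, which is the part I find awkward.
 */
static void toy_free_any_cluster(struct toy_image *img, bool discard_only)
{
    if (discard_only) {
        toy_discard_only(img);
        return;
    }
    toy_free_cluster(img);
}
```

Both variants end up with the same refcount and the same number of discards; the difference is only whether a function named "free" is called on a path that keeps the allocation.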
Hanna