qemu-devel.nongnu.org archive mirror
From: Hanna Czenczek <hreitz@redhat.com>
To: Jean-Louis Dupond <jean-louis@dupond.be>,
	qemu-devel@nongnu.org, kwolf@redhat.com
Subject: Re: [PATCH] qcow2: add discard-no-unref option
Date: Fri, 2 Jun 2023 10:01:00 +0200
Message-ID: <624b6d8c-492c-02cc-7a13-f86af572dcf5@redhat.com>
In-Reply-To: <08a88b12-ec0a-110a-ad64-2116712065e8@dupond.be>

On 01.06.23 14:56, Jean-Louis Dupond wrote:
> On 31/05/2023 17:05, Hanna Czenczek wrote:
>> On 15.05.23 09:36, Jean-Louis Dupond wrote:
>>> When we for example have a sparse qcow2 image and discard: unmap is enabled,
>>> there can be a lot of fragmentation in the image after some time. Surely on VM's
>>
>> s/. Surely/, especially/
>>
>>> that do a lot of writes/deletes.
>>> This causes the qcow2 image to grow even over 110% of its virtual size,
>>> because the free gaps in the image get to small to allocate new
>>
>> s/to small/too small/
>>
>>> continuous clusters. So it allocates new space as the end of the image.
>>
>> s/as/at/
>>
>>> Disabling discard is not an option, as discard is needed to keep the
>>> incremental backup size as low as possible. Without discard, the
>>> incremental backups would become large, as qemu thinks it's just dirty
>>> blocks but it doesn't know the blocks are empty/useless.
>>> So we need to avoid fragmentation but also 'empty' the useless blocks in
>>
>> s/useless/unneeded/ in both lines?
>>
>>> the image to have a small incremental backup.
>>>
>>> Next to that we also want to send the discards futher down the stack, so
>>
>> s/Next to that/In addition/, s/futher/further/
>>
>>> the underlying blocks are still discarded.
>>>
>>> Therefor we introduce a new qcow2 option "discard-no-unref". When
>>> setting this option to true (defaults to false), the discard requests
>>> will still be executed, but it will keep the offset of the cluster. And
>>> it will also pass the discard request further down the stack (if
>>> discard:unmap is enabled).
>>
>> I think this could be more explicit, e.g. “When setting this option 
>> to true, discards will no longer have the qcow2 driver relinquish 
>> cluster allocations. Other than that, the request is handled as 
>> normal: All clusters in range are marked as zero, and, if 
>> pass-discard-request is true, it is passed further down the stack. 
>> The only difference is that the now-zero clusters are preallocated 
>> instead of being unallocated.”
>>
>>> This will avoid fragmentation and for example on a fully preallocated
>>> qcow2 image, this will make sure the image is perfectly continuous.
>>
>> Well, on the qcow2 layer, yes.
> All above -> Fixed :)
>>
>>> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
>>> Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
>>> ---
>>>   block/qcow2-cluster.c  |  16 ++++-
>>>   block/qcow2-refcount.c | 136 ++++++++++++++++++++++++-----------------
>>>   block/qcow2.c          |  12 ++++
>>>   block/qcow2.h          |   3 +
>>>   qapi/block-core.json   |   4 ++
>>>   qemu-options.hx        |   6 ++
>>>   6 files changed, 120 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 39cda7f907..88da70db5e 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -1943,10 +1943,22 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>>               new_l2_entry = new_l2_bitmap = 0;
>>>           } else if (bs->backing || qcow2_cluster_is_allocated(cluster_type)) {
>>>               if (has_subclusters(s)) {
>>> -                new_l2_entry = 0;
>>> +                if (s->discard_no_unref && (type & QCOW2_DISCARD_REQUEST)) {
>>
>> As far as I understand the discard type is just a plain enum, not a 
>> bit field.  So I think this should be `type == 
>> QCOW2_DISCARD_REQUEST`, not an `&`.  (Same below.)
>>
> Ack
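
To illustrate the point about `&` versus `==`, here is a minimal sketch. The enum below is assumed to mirror the declaration in block/qcow2.h (sequential values, not powers of two). The two helper functions are hypothetical and exist only for this illustration. Under that assumption, the bit test also matches a discard type other than QCOW2_DISCARD_REQUEST, while the equality test does not.

```c
#include <stdbool.h>

/*
 * Assumption: this mirrors the enum in block/qcow2.h. The values are
 * sequential (0, 1, 2, 3, ...), so they are not usable as bit flags.
 */
enum qcow2_discard_type {
    QCOW2_DISCARD_NEVER = 0,
    QCOW2_DISCARD_ALWAYS,
    QCOW2_DISCARD_REQUEST,
    QCOW2_DISCARD_SNAPSHOT,
    QCOW2_DISCARD_OTHER,
    QCOW2_DISCARD_MAX
};

/*
 * Hypothetical helper: the bit test used in the patch. It is true for
 * every value that shares a set bit with QCOW2_DISCARD_REQUEST.
 */
static bool is_request_bitwise(enum qcow2_discard_type type)
{
    return (type & QCOW2_DISCARD_REQUEST) != 0;
}

/* Hypothetical helper: the equality test suggested in the review. */
static bool is_request_equal(enum qcow2_discard_type type)
{
    return type == QCOW2_DISCARD_REQUEST;
}
```

With these values, is_request_bitwise(QCOW2_DISCARD_SNAPSHOT) is true because the two enumerators share a bit, whereas is_request_equal(QCOW2_DISCARD_SNAPSHOT) is false.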
>>> +                    new_l2_entry = old_l2_entry;
>>> +                } else {
>>> +                    new_l2_entry = 0;
>>> +                }
>>>                   new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
>>>               } else {
>>> -                new_l2_entry = s->qcow_version >= 3 ? QCOW_OFLAG_ZERO : 0;
>>> +                if (s->qcow_version >= 3) {
>>> +                    if (s->discard_no_unref && (type & QCOW2_DISCARD_REQUEST)) {
>>> +                        new_l2_entry |= QCOW_OFLAG_ZERO;
>>> +                    } else {
>>> +                        new_l2_entry = QCOW_OFLAG_ZERO;
>>> +                    }
>>> +                } else {
>>> +                    new_l2_entry = 0;
>>> +                }
>>>               }
>>>           }
>>
>> Context below:
>>
>>         if (old_l2_entry == new_l2_entry && old_l2_bitmap == new_l2_bitmap) {
>>             continue;
>>         }
>>
>>         /* First remove L2 entries */
>>         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
>>         set_l2_entry(s, l2_slice, l2_index + i, new_l2_entry);
>>         if (has_subclusters(s)) {
>>             set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
>>         }
>>         /* Then decrease the refcount */
>>         qcow2_free_any_cluster(bs, old_l2_entry, type);
>>
>> If we keep the allocation, I don’t see why we would call
>> qcow2_free_any_cluster().  If we simply skip the call (if
>> `qcow2_cluster_is_allocated(qcow2_get_cluster_type(bs, new_l2_entry))`),
>> I think you could drop the modification to update_refcount().
>>
> If we don't call qcow2_free_any_cluster, the discard will not get 
> passed to the lower layer.

That’s a pickle.

> We also call it in zero_in_l2_slice for example to discard lower layer.

We only call it there if the allocation is dropped.  (`new_l2_entry = 
unmap ? 0 : old_l2_entry`)

I’d either lift the discard to discard_in_l2_slice() (if dropping the 
reference, call qcow2_free_any_cluster(); otherwise, if the old cluster 
was a normal or zero allocated cluster, discard it); or add a bool 
parameter to `qcow2_free_any_cluster()` that tells it to only discard, 
not free, the cluster, which makes it take the existing `if 
(has_data_file(bs))` path there.

The latter is simpler, but I still find it problematic to call
qcow2_free_any_cluster() when there’s no intention of actually freeing a
cluster (i.e. releasing the reference to it).
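
As a rough illustration of the first option (deciding in the caller whether to free or only discard), here is a toy model. None of it is QEMU code: struct toy_cluster and the toy_* functions are hypothetical stand-ins that only track a refcount and the number of discards passed down, under the assumption that freeing a cluster also discards its data.

```c
#include <stdbool.h>

/* Toy stand-in, not a QEMU type: per-cluster bookkeeping only. */
struct toy_cluster {
    unsigned refcount;      /* references held by L2 entries */
    unsigned discards_sent; /* discards passed to the lower layer */
};

/* Hypothetical: release the reference; the data is discarded as well. */
static void toy_free_cluster(struct toy_cluster *c)
{
    if (c->refcount > 0) {
        c->refcount--;
    }
    c->discards_sent++;
}

/* Hypothetical: discard the data but keep the reference. */
static void toy_discard_only(struct toy_cluster *c)
{
    c->discards_sent++;
}

/*
 * The caller decides: free when the reference is dropped; otherwise
 * discard only if the old cluster was an allocated one.
 */
static void toy_discard_in_l2(struct toy_cluster *c, bool drop_reference,
                              bool old_was_allocated)
{
    if (drop_reference) {
        toy_free_cluster(c);
    } else if (old_was_allocated) {
        toy_discard_only(c);
    }
}
```

In this model, a call with drop_reference == false leaves the refcount untouched while still counting one discard, which is the behaviour discard-no-unref is after; the free path is only taken when the reference really goes away.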

Hanna



