qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Jean-Louis Dupond <jean-louis@dupond.be>
To: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>, qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
	eblake@redhat.com, berto@igalia.com, den@virtuozzo.com
Subject: Re: [PATCH 4/7] qcow2: make subclusters discardable
Date: Fri, 19 Apr 2024 11:06:29 +0200	[thread overview]
Message-ID: <0ba81018-8512-40b8-917e-1ef1cf95ca72@dupond.be> (raw)
In-Reply-To: <9fcebaf5-bdac-435f-8582-0dab75ff7afc@virtuozzo.com>

On 16/04/2024 21:56, Andrey Drobyshev wrote:
> On 10/27/23 14:10, Jean-Louis Dupond wrote:
>> [...]
>>
>> I've checked all the code paths, and as far as I see it nowhere breaks
>> the discard_no_unref option.
>> It's important that we don't introduce new code paths that can make
>> holes in the qcow2 image when this option is enabled :)
>>
>> If you can confirm my conclusion, that would be great.
>>
>>
>> Thanks
>> Jean-Louis
>>
> Hi Jean-Louis,
>
> I've finally got to working on v2 for this series.  However I'm failing
> to get a grasp on what this option is supposed to be doing and what are
> we trying to avoid here.
The discard-no-unref option causes qemu to only zero the blocks/clusters 
that get discarded, but does NOT remove the reference of the cluster.
So the cluster stays allocated/referenced, but is just marked zero.

There are multiple scenario's where you would need this.
First of all when you have a pre-allocated image, you most likely 
created it because you don't want fragmentation.
But if you don't have discard-no-unref enabled, you will end up with a 
fragmented image anyway, because discard will create holes in your 
image, and will be randomly allocated. Ending up with a fragmented image.

Another scenario (and why we implemented it), is that with a sparse 
image, you allocate new blocks at the end of the 'allocation pointer' 
(which points to the first available blocks in your image).
But if you do discards, afaik the pointer is not moved to the freed 
cluster, but still allocates at the end until you reopen the image.
And even then, take you created a hole of 5 free clusters, and you need 
to allocate 4 new clusters, it will use those 5 and leave 1 empty cluster.
But the next allocation needs 2 clusters, it will jump to the next free 
space with at least 2 clusters. Leaving that 1 cluster unallocated.
And this caused us to have 'sparse' images of 110GB for 100GB images for 
example. Just because the qcow2 images was full of small empty clusters 
completely fragmented.
>
> Consider this simple example:
>
> # cd build
> # ./qemu-img create -f qcow2   unref.qcow2 192K
> # ./qemu-img create -f qcow2 nounref.qcow2 192K
> # ./qemu-io -c "write 0 192K"   unref.qcow2
> # ./qemu-io -c "write 0 192K" nounref.qcow2
> #
> # strace -fv -e fallocate ./qemu-io -c "discard 64K 64K" unref.qcow2
> [pid 887710] fallocate(9, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE,
> 393216, 65536) = 0
> discard 65536/65536 bytes at offset 65536
> 64 KiB, 1 ops; 00.00 sec (252.123 MiB/sec and 4033.9660 ops/sec)
> #
> # strace -fv -e fallocate ./qemu-io -c "reopen -o discard-no-unref=on"
> -c "discard 64K 64K" nounref.qcow2
> # [pid 887789] fallocate(9, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE,
> 393216, 65536) = 0
> discard 65536/65536 bytes at offset 65536
> 64 KiB, 1 ops; 00.00 sec (345.457 MiB/sec and 5527.3049 ops/sec)
> #
> # ./qemu-img check unref.qcow2
>
> No errors were found on the image.
> 2/3 = 66.67% allocated, 50.00% fragmented, 0.00% compressed clusters
> Image end offset: 524288
> # ./qemu-img check nounref.qcow2
> No errors were found on the image.
> 3/3 = 100.00% allocated, 0.00% fragmented, 0.00% compressed clusters
> Image end offset: 524288
> #
> # ls -la *.qcow2
>
> -rw-r--r-- 1 root root 524288 Apr 16 22:42 nounref.qcow2
> -rw-r--r-- 1 root root 524288 Apr 16 22:41 unref.qcow2
> # du --block-size=1 *.qcow2
> 397312  nounref.qcow2
> 397312  unref.qcow2
>
> I understand that by keeping the L2 entry we achieve that cluster
> remains formally allocated, but no matter whether "discard-no-unref"
> option is enabled fallocate(FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE) is
> being called leaving a hole in the file (e.g. file becomes sparse).
> However you say in the comment above that we can't allow making new
> holes in the file when this option is enabled.  How does that correlate
> and what do we achieve?  And which logic do you think we need to follow
> when discarding separate subclusters?
>
> Andrey


  reply	other threads:[~2024-04-19  9:07 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-20 21:56 [PATCH 0/7] qcow2: make subclusters discardable Andrey Drobyshev
2023-10-20 21:56 ` [PATCH 1/7] qcow2: make function update_refcount_discard() global Andrey Drobyshev
2023-10-31 15:27   ` Hanna Czenczek
2023-10-20 21:56 ` [PATCH 2/7] qcow2: add get_sc_range_info() helper for working with subcluster ranges Andrey Drobyshev
2023-10-31 15:53   ` Hanna Czenczek
2023-11-09 12:32     ` Andrey Drobyshev
2023-10-20 21:56 ` [PATCH 3/7] qcow2: zeroize the entire cluster when there're no non-zero subclusters Andrey Drobyshev
2023-10-31 16:06   ` Hanna Czenczek
2023-10-20 21:56 ` [PATCH 4/7] qcow2: make subclusters discardable Andrey Drobyshev
2023-10-27 11:10   ` Jean-Louis Dupond
2024-04-16 19:56     ` Andrey Drobyshev
2024-04-19  9:06       ` Jean-Louis Dupond [this message]
2023-10-31 16:32   ` Hanna Czenczek
2023-11-09 15:05     ` Andrey Drobyshev
2023-10-31 16:33   ` Hanna Czenczek
2023-11-10 13:26     ` Andrey Drobyshev
2023-11-03 15:53   ` Hanna Czenczek
2023-10-20 21:56 ` [PATCH 5/7] qcow2: zero_l2_subclusters: fall through to discard operation when requested Andrey Drobyshev
2023-11-03 15:19   ` Hanna Czenczek
2023-11-10 13:17     ` Andrey Drobyshev
2023-10-20 21:56 ` [PATCH 6/7] iotests/common.rc: add disk_usage function Andrey Drobyshev
2023-11-03 15:20   ` Hanna Czenczek
2023-11-09 12:35     ` Andrey Drobyshev
2023-10-20 21:56 ` [PATCH 7/7] iotests/271: check disk usage on subcluster-based discard/unmap Andrey Drobyshev
2023-11-03 15:51   ` Hanna Czenczek
2023-11-03 15:59     ` Hanna Czenczek
2023-11-09 14:05       ` Andrey Drobyshev
2023-11-09 13:55     ` Andrey Drobyshev

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0ba81018-8512-40b8-917e-1ef1cf95ca72@dupond.be \
    --to=jean-louis@dupond.be \
    --cc=andrey.drobyshev@virtuozzo.com \
    --cc=berto@igalia.com \
    --cc=den@virtuozzo.com \
    --cc=eblake@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).