From: Hanna Czenczek <hreitz@redhat.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, Hanna Czenczek <hreitz@redhat.com>
Subject: [PULL 1/3] qcow2: keep reference on zeroize with discard-no-unref enabled
Date: Mon, 6 Nov 2023 18:10:29 +0100
Message-ID: <20231106171031.1084277-2-hreitz@redhat.com>
In-Reply-To: <20231106171031.1084277-1-hreitz@redhat.com>
From: Jean-Louis Dupond <jean-louis@dupond.be>
When the discard-no-unref flag is enabled, we keep the reference for
normal discard requests.
But when a discard is executed on a snapshot/qcow2 image with backing,
the discards are saved as zero clusters in the snapshot image.
When committing the snapshot to the backing file, zero_in_l2_slice is
called instead of discard_in_l2_slice, and it did not have any logic to
keep the reference when discard-no-unref is enabled.
Therefore we add logic to zero_in_l2_slice to keep the reference
on commit.
Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1621
Signed-off-by: Jean-Louis Dupond <jean-louis@dupond.be>
Message-Id: <20231003125236.216473-2-jean-louis@dupond.be>
[hreitz: Made the documentation change more verbose, as discussed
on-list]
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
qapi/block-core.json | 24 ++++++++++++++----------
block/qcow2-cluster.c | 22 ++++++++++++++++++----
qemu-options.hx | 10 +++++++---
3 files changed, 39 insertions(+), 17 deletions(-)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 99961256f2..ca390c5700 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3528,16 +3528,20 @@
# @pass-discard-other: whether discard requests for the data source
# should be issued on other occasions where a cluster gets freed
#
-# @discard-no-unref: when enabled, discards from the guest will not
-# cause cluster allocations to be relinquished. This prevents
-# qcow2 fragmentation that would be caused by such discards.
-# Besides potential performance degradation, such fragmentation
-# can lead to increased allocation of clusters past the end of the
-# image file, resulting in image files whose file length can grow
-# much larger than their guest disk size would suggest. If image
-# file length is of concern (e.g. when storing qcow2 images
-# directly on block devices), you should consider enabling this
-# option. (since 8.1)
+# @discard-no-unref: when enabled, data clusters will remain
+# preallocated when they are no longer used, e.g. because they are
+# discarded or converted to zero clusters. As usual, whether the
+# old data is discarded or kept on the protocol level (i.e. in the
+# image file) depends on the setting of the pass-discard-request
+# option. Keeping the clusters preallocated prevents qcow2
+# fragmentation that would otherwise be caused by freeing and
+# re-allocating them later. Besides potential performance
+# degradation, such fragmentation can lead to increased allocation
+# of clusters past the end of the image file, resulting in image
+# files whose file length can grow much larger than their guest disk
+# size would suggest. If image file length is of concern (e.g. when
+# storing qcow2 images directly on block devices), you should
+# consider enabling this option. (since 8.1)
#
# @overlap-check: which overlap checks to perform for writes to the
# image, defaults to 'cached' (since 2.2)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 904f00d1b3..5af439bd11 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1983,7 +1983,7 @@ discard_in_l2_slice(BlockDriverState *bs, uint64_t offset, uint64_t nb_clusters,
/* If we keep the reference, pass on the discard still */
bdrv_pdiscard(s->data_file, old_l2_entry & L2E_OFFSET_MASK,
s->cluster_size);
- }
+ }
}
qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
@@ -2061,9 +2061,15 @@ zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
QCow2ClusterType type = qcow2_get_cluster_type(bs, old_l2_entry);
bool unmap = (type == QCOW2_CLUSTER_COMPRESSED) ||
((flags & BDRV_REQ_MAY_UNMAP) && qcow2_cluster_is_allocated(type));
- uint64_t new_l2_entry = unmap ? 0 : old_l2_entry;
+ bool keep_reference =
+ (s->discard_no_unref && type != QCOW2_CLUSTER_COMPRESSED);
+ uint64_t new_l2_entry = old_l2_entry;
uint64_t new_l2_bitmap = old_l2_bitmap;
+ if (unmap && !keep_reference) {
+ new_l2_entry = 0;
+ }
+
if (has_subclusters(s)) {
new_l2_bitmap = QCOW_L2_BITMAP_ALL_ZEROES;
} else {
@@ -2081,9 +2087,17 @@ zero_in_l2_slice(BlockDriverState *bs, uint64_t offset,
set_l2_bitmap(s, l2_slice, l2_index + i, new_l2_bitmap);
}
- /* Then decrease the refcount */
if (unmap) {
- qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
+ if (!keep_reference) {
+ /* Then decrease the refcount */
+ qcow2_free_any_cluster(bs, old_l2_entry, QCOW2_DISCARD_REQUEST);
+ } else if (s->discard_passthrough[QCOW2_DISCARD_REQUEST] &&
+ (type == QCOW2_CLUSTER_NORMAL ||
+ type == QCOW2_CLUSTER_ZERO_ALLOC)) {
+ /* If we keep the reference, pass on the discard still */
+ bdrv_pdiscard(s->data_file, old_l2_entry & L2E_OFFSET_MASK,
+ s->cluster_size);
+ }
}
}
diff --git a/qemu-options.hx b/qemu-options.hx
index e26230bac5..7809036d8c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1457,9 +1457,13 @@ SRST
(on/off; default: off)
``discard-no-unref``
- When enabled, discards from the guest will not cause cluster
- allocations to be relinquished. This prevents qcow2 fragmentation
- that would be caused by such discards. Besides potential
+ When enabled, data clusters will remain preallocated when they are
+ no longer used, e.g. because they are discarded or converted to
+ zero clusters. As usual, whether the old data is discarded or kept
+ on the protocol level (i.e. in the image file) depends on the
+ setting of the pass-discard-request option. Keeping the clusters
+ preallocated prevents qcow2 fragmentation that would otherwise be
+ caused by freeing and re-allocating them later. Besides potential
performance degradation, such fragmentation can lead to increased
allocation of clusters past the end of the image file,
resulting in image files whose file length can grow much larger
--
2.41.0
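
For readers who want to check the new branch structure in
zero_in_l2_slice without building QEMU, the decision it makes per
cluster can be modelled in isolation. The following is only a sketch
under assumptions: ClusterType, ZeroAction, cluster_is_allocated() and
zero_cluster_action() are hypothetical names that do not exist in QEMU,
and the three bool parameters stand in for (flags & BDRV_REQ_MAY_UNMAP),
s->discard_no_unref and s->discard_passthrough[QCOW2_DISCARD_REQUEST].
In every case the cluster is still marked as reading zeroes; the sketch
covers only what happens to the host cluster.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for QEMU's QCow2ClusterType values. */
typedef enum {
    CLUSTER_UNALLOCATED,
    CLUSTER_ZERO_PLAIN,
    CLUSTER_ZERO_ALLOC,
    CLUSTER_NORMAL,
    CLUSTER_COMPRESSED,
} ClusterType;

/* What happens to the host cluster behind the L2 entry (hypothetical). */
typedef enum {
    ACTION_NONE,             /* no unmap: offset and refcount untouched */
    ACTION_FREE_CLUSTER,     /* clear the offset, decrease the refcount */
    ACTION_KEEP_ONLY,        /* keep offset and refcount, no discard */
    ACTION_KEEP_AND_DISCARD, /* keep offset and refcount, discard data */
} ZeroAction;

/* Models qcow2_cluster_is_allocated(): the cluster has a host offset. */
static bool cluster_is_allocated(ClusterType type)
{
    return type == CLUSTER_COMPRESSED ||
           type == CLUSTER_NORMAL ||
           type == CLUSTER_ZERO_ALLOC;
}

/* Mirrors the unmap / keep_reference branches added by the patch. */
static ZeroAction zero_cluster_action(ClusterType type, bool may_unmap,
                                      bool discard_no_unref,
                                      bool pass_discard_request)
{
    bool unmap = (type == CLUSTER_COMPRESSED) ||
                 (may_unmap && cluster_is_allocated(type));
    /* Compressed clusters are always freed, even with discard-no-unref. */
    bool keep_reference = discard_no_unref && type != CLUSTER_COMPRESSED;

    if (!unmap) {
        return ACTION_NONE;
    }
    if (!keep_reference) {
        return ACTION_FREE_CLUSTER;
    }
    if (pass_discard_request &&
        (type == CLUSTER_NORMAL || type == CLUSTER_ZERO_ALLOC)) {
        return ACTION_KEEP_AND_DISCARD;
    }
    return ACTION_KEEP_ONLY;
}
```

With discard_no_unref set, a normal cluster zeroed with may_unmap set
yields ACTION_KEEP_AND_DISCARD or ACTION_KEEP_ONLY depending on
pass_discard_request, whereas before the patch the equivalent path
always freed the cluster.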