From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, mreitz@redhat.com, kwolf@redhat.com,
	vsementsov@virtuozzo.com, den@openvz.org
Subject: [PATCH v6 11/12] qcow2: protect data writing by host range reference
Date: Thu, 22 Apr 2021 19:30:45 +0300	[thread overview]
Message-ID: <20210422163046.442932-12-vsementsov@virtuozzo.com> (raw)
In-Reply-To: <20210422163046.442932-1-vsementsov@virtuozzo.com>
We have the following bug:
1. Start write to qcow2. Assume guest cluster G and corresponding host
   cluster is H.
2. The write requests come to the point of data writing to .file. The
   write to .file is started and qcow2 mutex is unlocked.
3. At this time refcount of H becomes 0. For example, it may be due to
   discard operation on qcow2 node, or rewriting compressed data by
   normal write, or some operation with snapshots..
4. Next, some operations occurs and leads to allocation of H for some
   other needs: it may be another write-to-qcow2-node operation, or
   allocation of L2 table or some other data or metadata cluster
   allocation.
5. So, at this point H is used for something other. Assume, L2 table is
   written into H.
6. And now, our write from [2] finishes. And pollutes L2 table in H.
   That's a bug.
To fix the bug we now have host-range-refs, which work in a
way that cluster is not "free" (and therefore will not be reused
and we don't fall into use-after-free described above) until both
refcount and host-range-ref are zero for this cluster.
Let's call qcow2_host_range_ref() in cluster allocation functions:
qcow2_alloc_host_offset() and qcow2_alloc_compressed_cluster_offset()
used on when writing host clusters. So that now these functions returns
"referenced" range, which caller should finally unref.
Iotest qcow2-discard-during-rewrite is enabled, as it works now.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/qcow2-cluster.c                            | 13 +++++++++++++
 block/qcow2.c                                    | 16 ++++++++++++++++
 .../tests/qcow2-discard-during-rewrite           |  2 +-
 3 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 6105d4e7e0..999a739024 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -809,6 +809,10 @@ static int get_cluster_table(BlockDriverState *bs, uint64_t offset,
  * already allocated at the offset, return an error.
  *
  * Return 0 on success and -errno in error cases
+ *
+ * On success the host range [*host_offset, *host_offset + compressed_size) is
+ * referenced. Caller is responsible to unref it by qcow2_host_range_unref()
+ * after finishing IO operation with this range.
  */
 int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
                                           uint64_t offset,
@@ -866,6 +870,9 @@ int qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     qcow2_cache_put(s->l2_table_cache, (void **) &l2_slice);
 
     *host_offset = cluster_offset & s->cluster_offset_mask;
+
+    qcow2_host_range_ref(bs, *host_offset, compressed_size);
+
     return 0;
 }
 
@@ -1738,6 +1745,10 @@ out:
  * is queued and will be reentered when the dependency has completed.
  *
  * Return 0 on success and -errno in error cases
+ *
+ * On success the host range [*host_offset, *host_offset + *bytes) is
+ * referenced. Caller is responsible to unref it by qcow2_host_range_unref()
+ * after finishing IO operation with this range.
  */
 int qcow2_alloc_host_offset(BlockDriverState *bs, uint64_t offset,
                             unsigned int *bytes, uint64_t *host_offset,
@@ -1848,6 +1859,8 @@ again:
     assert(offset_into_cluster(s, *host_offset) ==
            offset_into_cluster(s, offset));
 
+    qcow2_host_range_ref(bs, *host_offset, *bytes);
+
     return 0;
 }
 
diff --git a/block/qcow2.c b/block/qcow2.c
index aa298c9e42..d0d2eaa914 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2489,6 +2489,8 @@ static int handle_alloc_space(BlockDriverState *bs, QCowL2Meta *l2meta)
  * Called with s->lock unlocked
  * l2meta  - if not NULL, qcow2_co_pwritev_task() will consume it. Caller must
  *           not use it somehow after qcow2_co_pwritev_task() call
+ *
+ * Function consumes range reference both on success and failure.
  */
 static coroutine_fn int qcow2_co_pwritev_task(BlockDriverState *bs,
                                               uint64_t host_offset,
@@ -2554,6 +2556,9 @@ out_unlocked:
 
 out_locked:
     qcow2_handle_l2meta(bs, &l2meta, false);
+
+    qcow2_host_range_unref(bs, host_offset, bytes);
+
     qemu_co_mutex_unlock(&s->lock);
 
     qemu_vfree(crypt_buf);
@@ -2610,6 +2615,7 @@ static coroutine_fn int qcow2_co_pwritev_part(
         ret = qcow2_pre_write_overlap_check(bs, 0, host_offset,
                                             cur_bytes, true);
         if (ret < 0) {
+            qcow2_host_range_unref(bs, host_offset, cur_bytes);
             goto out_locked;
         }
 
@@ -3151,6 +3157,9 @@ static int coroutine_fn preallocate_co(BlockDriverState *bs, uint64_t offset,
             goto out;
         }
 
+        /* We do truncate under mutex, don't bother with host range refs */
+        qcow2_host_range_unref(bs, host_offset, cur_bytes);
+
         for (m = meta; m != NULL; m = m->next) {
             m->prealloc = true;
         }
@@ -4122,12 +4131,14 @@ qcow2_co_copy_range_to(BlockDriverState *bs,
         ret = qcow2_pre_write_overlap_check(bs, 0, host_offset, cur_bytes,
                                             true);
         if (ret < 0) {
+            qcow2_host_range_unref(bs, host_offset, cur_bytes);
             goto fail;
         }
 
         qemu_co_mutex_unlock(&s->lock);
         ret = bdrv_co_copy_range_to(src, src_offset, s->data_file, host_offset,
                                     cur_bytes, read_flags, write_flags);
+        qcow2_host_range_unref(bs, host_offset, cur_bytes);
         qemu_co_mutex_lock(&s->lock);
         if (ret < 0) {
             goto fail;
@@ -4540,6 +4551,7 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
     ssize_t out_len;
     uint8_t *buf, *out_buf;
     uint64_t cluster_offset;
+    bool unref_range = false;
 
     assert(bytes == s->cluster_size || (bytes < s->cluster_size &&
            (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS)));
@@ -4574,6 +4586,7 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
         qemu_co_mutex_unlock(&s->lock);
         goto fail;
     }
+    unref_range = true;
 
     ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, out_len, true);
     qemu_co_mutex_unlock(&s->lock);
@@ -4589,6 +4602,9 @@ qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
 success:
     ret = 0;
 fail:
+    if (unref_range) {
+        qcow2_host_range_unref(bs, cluster_offset, out_len);
+    }
     qemu_vfree(buf);
     g_free(out_buf);
     return ret;
diff --git a/tests/qemu-iotests/tests/qcow2-discard-during-rewrite b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
index 7f0d8a107a..2e2e0d2cb0 100755
--- a/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
+++ b/tests/qemu-iotests/tests/qcow2-discard-during-rewrite
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# group: quick disabled
+# group: quick
 #
 # Test discarding (and reusing) host cluster during writing data to it.
 #
-- 
2.29.2
next prev parent reply	other threads:[~2021-04-22 16:46 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-22 16:30 [PATCH v6 00/12] qcow2: fix parallel rewrite and discard (lockless) Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 01/12] iotests: add qcow2-discard-during-rewrite Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 02/12] qcow2: fix cache discarding in update_refcount() Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 03/12] block/qcow2-cluster: assert no data_file on compressed write path Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 04/12] block/qcow2-refcount: rename and publish update_refcount_discard() Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 05/12] block/qcow2: introduce qcow2_parse_compressed_cluster_descriptor() Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 06/12] block/qcow2: refactor qcow2_co_preadv_task() to have one return Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 07/12] block/qcow2: qcow2_co_pwrite_zeroes: use QEMU_LOCK_GUARD Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 08/12] qcow2: introduce is_cluster_free() helper Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 09/12] qcow2: introduce host-range-refs Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` [PATCH v6 10/12] qcow2: introduce qcow2_host_cluster_postponed_discard() Vladimir Sementsov-Ogievskiy
2021-04-22 16:30 ` Vladimir Sementsov-Ogievskiy [this message]
2021-04-22 16:30 ` [PATCH v6 12/12] qcow2: protect data reading by host range reference Vladimir Sementsov-Ogievskiy
2021-04-26 12:15 ` [PATCH v6 00/12] qcow2: fix parallel rewrite and discard (lockless) Vladimir Sementsov-Ogievskiy
2021-05-10 13:07 ` Vladimir Sementsov-Ogievskiy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox
  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):
  git send-email \
    --in-reply-to=20210422163046.442932-12-vsementsov@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=den@openvz.org \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY
  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
  Be sure your reply has a Subject: header at the top and a blank line
  before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).