qemu-devel.nongnu.org archive mirror
From: Hanna Czenczek <hreitz@redhat.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, "Hanna Czenczek" <hreitz@redhat.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard W . M . Jones" <rjones@redhat.com>,
	"Ilya Dryomov" <idryomov@gmail.com>,
	"Peter Lieven" <pl@dlhnet.de>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Fam Zheng" <fam@euphon.net>,
	"Ronnie Sahlberg" <ronniesahlberg@gmail.com>
Subject: [PATCH 10/16] qcow2: Fix cache_clean_timer
Date: Tue, 28 Oct 2025 17:33:36 +0100	[thread overview]
Message-ID: <20251028163343.116249-11-hreitz@redhat.com> (raw)
In-Reply-To: <20251028163343.116249-1-hreitz@redhat.com>

The cache-cleaner runs as a timer CB in the BDS AioContext.  With
multiqueue, it can run concurrently with I/O requests, and because it
does not take any lock, this can break concurrent cache accesses,
corrupting the image.  While the chances of this happening are low, it
can be reproduced e.g. by modifying the code to schedule the timer CB
every 5 ms (instead of at most once per second) and modifying the last
(inner) while loop of qcow2_cache_clean_unused() like so:

    while (i < c->size && can_clean_entry(c, i)) {
        for (int j = 0; j < 1000 && can_clean_entry(c, i); j++) {
            usleep(100);
        }
        c->entries[i].offset = 0;
        c->entries[i].lru_counter = 0;
        i++;
        to_clean++;
    }

i.e. making it wait on purpose for the point in time where the cache is
in use by something else.

The solution chosen in this patch is, I hope, not the best one possible,
but I admittedly can’t come up with anything strictly better.

We can protect against concurrent cache accesses either by taking the
existing s->lock or by introducing a new (non-coroutine) mutex
specifically for cache accesses.  I would prefer to avoid the latter so
as not to introduce additional (very slight) overhead.

Using s->lock, which is a coroutine mutex, means, however, that we need
to take it in a coroutine, so the timer CB must enter such a coroutine.
As a result, descheduling the timer is no longer a guarantee that the
cache-cleaner will not run, because it may now be yielding in
qemu_co_mutex_lock().

(Note that even before this patch, this was only guaranteed for
cache_clean_timer_del() callers that run in the BDS (the timer’s)
AioContext.  For callers running in the main context, the problem may
have already existed, though maybe the BQL prevents timers from running
in other contexts; I’m not sure.)

Polling until the timer callback has actually settled seems very
complicated for what is rather a minor problem, but I can’t come up with
any better solution that doesn’t just overlook potential problems again.

(One cleaner idea may be to have a generic way to have timers run
coroutines, and to await those when descheduling the timer.  But while
cleaner, it would also be more complicated, and I don’t think worth it
at this point.)

(Not Cc-ing qemu-stable, as the issue is quite unlikely to be hit, and
I’m not too fond of this solution.)

Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/qcow2.h |  1 +
 block/qcow2.c | 90 ++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 79 insertions(+), 12 deletions(-)

diff --git a/block/qcow2.h b/block/qcow2.h
index a9e3481c6e..272d34def1 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -347,6 +347,7 @@ typedef struct BDRVQcow2State {
     Qcow2Cache *refcount_block_cache;
     QEMUTimer *cache_clean_timer;
     unsigned cache_clean_interval;
+    bool cache_clean_running, cache_clean_polling;
 
     QLIST_HEAD(, QCowL2Meta) cluster_allocs;
 
diff --git a/block/qcow2.c b/block/qcow2.c
index 4aa9f9e068..ef851794c3 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -835,12 +835,38 @@ static const char *overlap_bool_option_names[QCOW2_OL_MAX_BITNR] = {
     [QCOW2_OL_BITMAP_DIRECTORY_BITNR] = QCOW2_OPT_OVERLAP_BITMAP_DIRECTORY,
 };
 
-static void cache_clean_timer_cb(void *opaque)
+static void coroutine_fn cache_clean_timer_co(void *opaque)
 {
     BlockDriverState *bs = opaque;
     BDRVQcow2State *s = bs->opaque;
+
+    qemu_co_mutex_lock(&s->lock);
+    if (!s->cache_clean_timer) {
+        /* cache_clean_timer_del() has been called, skip doing anything */
+        goto out;
+    }
     qcow2_cache_clean_unused(s->l2_table_cache);
     qcow2_cache_clean_unused(s->refcount_block_cache);
+
+out:
+    qemu_co_mutex_unlock(&s->lock);
+    qatomic_set(&s->cache_clean_running, false);
+    if (qatomic_xchg(&s->cache_clean_polling, false)) {
+        aio_wait_kick();
+    }
+}
+
+static void cache_clean_timer_cb(void *opaque)
+{
+    BlockDriverState *bs = opaque;
+    BDRVQcow2State *s = bs->opaque;
+    Coroutine *co;
+
+    co = qemu_coroutine_create(cache_clean_timer_co, bs);
+    /* cleared in cache_clean_timer_co() */
+    qatomic_set(&s->cache_clean_running, true);
+    qemu_coroutine_enter(co);
+
     timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
               (int64_t) s->cache_clean_interval * 1000);
 }
@@ -867,6 +893,39 @@ static void cache_clean_timer_del(BlockDriverState *bs)
     }
 }
 
+/*
+ * Delete the cache clean timer and await any yet running instance.
+ * Must be called from the main or BDS AioContext, holding s->lock.
+ */
+static void coroutine_fn
+cache_clean_timer_locked_co_del_and_wait(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    IO_OR_GS_CODE();
+    cache_clean_timer_del(bs);
+    if (qatomic_read(&s->cache_clean_running)) {
+        qemu_co_mutex_unlock(&s->lock);
+        qatomic_set(&s->cache_clean_polling, true);
+        BDRV_POLL_WHILE(bs, qatomic_read(&s->cache_clean_running));
+        qemu_co_mutex_lock(&s->lock);
+    }
+}
+
+/*
+ * Delete the cache clean timer and await any yet running instance.
+ * Must be called from the main or BDS AioContext without s->lock held.
+ */
+static void cache_clean_timer_del_and_wait(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    IO_OR_GS_CODE();
+    cache_clean_timer_del(bs);
+    if (qatomic_read(&s->cache_clean_running)) {
+        qatomic_set(&s->cache_clean_polling, true);
+        BDRV_POLL_WHILE(bs, qatomic_read(&s->cache_clean_running));
+    }
+}
+
 static void qcow2_detach_aio_context(BlockDriverState *bs)
 {
     cache_clean_timer_del(bs);
@@ -1214,12 +1273,20 @@ fail:
     return ret;
 }
 
+/* s_locked specifies whether s->lock is held or not */
 static void qcow2_update_options_commit(BlockDriverState *bs,
-                                        Qcow2ReopenState *r)
+                                        Qcow2ReopenState *r,
+                                        bool s_locked)
 {
     BDRVQcow2State *s = bs->opaque;
     int i;
 
+    if (s_locked) {
+        cache_clean_timer_locked_co_del_and_wait(bs);
+    } else {
+        cache_clean_timer_del_and_wait(bs);
+    }
+
     if (s->l2_table_cache) {
         qcow2_cache_destroy(s->l2_table_cache);
     }
@@ -1228,6 +1295,10 @@ static void qcow2_update_options_commit(BlockDriverState *bs,
     }
     s->l2_table_cache = r->l2_table_cache;
     s->refcount_block_cache = r->refcount_block_cache;
+
+    s->cache_clean_interval = r->cache_clean_interval;
+    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
+
     s->l2_slice_size = r->l2_slice_size;
 
     s->overlap_check = r->overlap_check;
@@ -1239,12 +1310,6 @@ static void qcow2_update_options_commit(BlockDriverState *bs,
 
     s->discard_no_unref = r->discard_no_unref;
 
-    if (s->cache_clean_interval != r->cache_clean_interval) {
-        cache_clean_timer_del(bs);
-        s->cache_clean_interval = r->cache_clean_interval;
-        cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
-    }
-
     qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
     s->crypto_opts = r->crypto_opts;
 }
@@ -1261,6 +1326,7 @@ static void qcow2_update_options_abort(BlockDriverState *bs,
     qapi_free_QCryptoBlockOpenOptions(r->crypto_opts);
 }
 
+/* Called with s->lock held */
 static int coroutine_fn GRAPH_RDLOCK
 qcow2_update_options(BlockDriverState *bs, QDict *options, int flags,
                      Error **errp)
@@ -1270,7 +1336,7 @@ qcow2_update_options(BlockDriverState *bs, QDict *options, int flags,
 
     ret = qcow2_update_options_prepare(bs, &r, options, flags, errp);
     if (ret >= 0) {
-        qcow2_update_options_commit(bs, &r);
+        qcow2_update_options_commit(bs, &r, true);
     } else {
         qcow2_update_options_abort(bs, &r);
     }
@@ -1908,7 +1974,7 @@ qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
     qemu_vfree(s->l1_table);
     /* else pre-write overlap checks in cache_destroy may crash */
     s->l1_table = NULL;
-    cache_clean_timer_del(bs);
+    cache_clean_timer_locked_co_del_and_wait(bs);
     if (s->l2_table_cache) {
         qcow2_cache_destroy(s->l2_table_cache);
     }
@@ -2048,7 +2114,7 @@ static void qcow2_reopen_commit(BDRVReopenState *state)
 
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
-    qcow2_update_options_commit(state->bs, state->opaque);
+    qcow2_update_options_commit(state->bs, state->opaque, false);
     if (!s->data_file) {
         /*
          * If we don't have an external data file, s->data_file was cleared by
@@ -2805,7 +2871,7 @@ qcow2_do_close(BlockDriverState *bs, bool close_data_file)
         qcow2_inactivate(bs);
     }
 
-    cache_clean_timer_del(bs);
+    cache_clean_timer_del_and_wait(bs);
     qcow2_cache_destroy(s->l2_table_cache);
     qcow2_cache_destroy(s->refcount_block_cache);
 
-- 
2.51.0



Thread overview: 36+ messages
2025-10-28 16:33 [PATCH 00/16] block: Some multi-threading fixes Hanna Czenczek
2025-10-28 16:33 ` [PATCH 01/16] block: Note on aio_co_wake use if not yet yielding Hanna Czenczek
2025-10-28 16:33 ` [PATCH 02/16] rbd: Run co BH CB in the coroutine’s AioContext Hanna Czenczek
2025-10-28 16:33 ` [PATCH 03/16] iscsi: " Hanna Czenczek
2025-10-29 14:27   ` Kevin Wolf
2025-10-31  9:07     ` Hanna Czenczek
2025-10-31 13:27       ` Kevin Wolf
2025-10-28 16:33 ` [PATCH 04/16] nfs: " Hanna Czenczek
2025-10-28 16:33 ` [PATCH 05/16] curl: Fix coroutine waking Hanna Czenczek
2025-10-29 16:57   ` Kevin Wolf
2025-10-31  9:15     ` Hanna Czenczek
2025-10-31 13:17       ` Kevin Wolf
2025-10-28 16:33 ` [PATCH 06/16] gluster: Do not move coroutine into BDS context Hanna Czenczek
2025-10-29 17:10   ` Kevin Wolf
2025-10-31  9:16     ` Hanna Czenczek
2025-10-28 16:33 ` [PATCH 07/16] nvme: Kick and check completions in " Hanna Czenczek
2025-10-29 17:23   ` Kevin Wolf
2025-10-29 17:39     ` Kevin Wolf
2025-10-31  9:18       ` Hanna Czenczek
2025-10-31  9:19     ` Hanna Czenczek
2025-10-28 16:33 ` [PATCH 08/16] nvme: Fix coroutine waking Hanna Czenczek
2025-10-29 17:43   ` Kevin Wolf
2025-10-28 16:33 ` [PATCH 09/16] block/io: Take reqs_lock for tracked_requests Hanna Czenczek
2025-10-28 16:33 ` Hanna Czenczek [this message]
2025-10-29 20:23   ` [PATCH 10/16] qcow2: Fix cache_clean_timer Kevin Wolf
2025-10-31  9:29     ` Hanna Czenczek
2025-10-31 13:03       ` Kevin Wolf
2025-11-06 16:08         ` Hanna Czenczek
2025-10-28 16:33 ` [PATCH 11/16] ssh: Run restart_coroutine in current AioContext Hanna Czenczek
2025-10-28 16:33 ` [PATCH 12/16] blkreplay: Run BH in coroutine’s AioContext Hanna Czenczek
2025-10-28 16:33 ` [PATCH 13/16] block: Note in which AioContext AIO CBs are called Hanna Czenczek
2025-10-28 16:33 ` [PATCH 14/16] iscsi: Create AIO BH in original AioContext Hanna Czenczek
2025-10-28 16:33 ` [PATCH 15/16] null-aio: Run CB " Hanna Czenczek
2025-10-28 16:33 ` [PATCH 16/16] win32-aio: Run CB in original context Hanna Czenczek
2025-10-30 14:12   ` Kevin Wolf
2025-10-31  9:31     ` Hanna Czenczek
