From: Kevin Wolf <kwolf@redhat.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Richard W . M . Jones" <rjones@redhat.com>,
"Ilya Dryomov" <idryomov@gmail.com>,
"Peter Lieven" <pl@dlhnet.de>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Fam Zheng" <fam@euphon.net>,
"Ronnie Sahlberg" <ronniesahlberg@gmail.com>
Subject: Re: [PATCH v2 12/19] qcow2: Fix cache_clean_timer
Date: Mon, 17 Nov 2025 15:50:33 +0100
Message-ID: <aRs2OSuFy0HyW4EU@redhat.com>
In-Reply-To: <20251110154854.151484-13-hreitz@redhat.com>
On 10.11.2025 at 16:48, Hanna Czenczek wrote:
> The cache-cleaner runs as a timer CB in the BDS AioContext. With
> multiqueue, it can run concurrently with I/O requests, and because it
> does not take any lock, this can break concurrent cache accesses,
> corrupting the image. While the chances of this happening are low, it
> can be reproduced e.g. by modifying the code to schedule the timer CB
> every 5 ms (instead of at most once per second) and modifying the last
> (inner) while loop of qcow2_cache_clean_unused() like so:
>
>     while (i < c->size && can_clean_entry(c, i)) {
>         for (int j = 0; j < 1000 && can_clean_entry(c, i); j++) {
>             usleep(100);
>         }
>         c->entries[i].offset = 0;
>         c->entries[i].lru_counter = 0;
>         i++;
>         to_clean++;
>     }
>
> i.e. deliberately making it wait for the point in time when the cache
> is in use by something else.
>
> The solution chosen in this patch is not the best one, I hope, but I
> admittedly can’t come up with anything strictly better.
>
> We can protect against concurrent cache accesses either by taking the
> existing s->lock or by introducing a new (non-coroutine) mutex
> specifically for cache accesses. I would prefer to avoid the latter so
> as not to introduce additional (very slight) overhead.
>
> Using s->lock, which is a coroutine mutex, however, means that we need
> to take it in a coroutine, so the timer must run in a coroutine. We can
> transform it from the current timer-CB style into a coroutine that
> sleeps for the set interval. As a result, though, we can no longer just
> deschedule the timer to instantly guarantee that it won’t run anymore,
> but have to await the coroutine’s exit.
>
> (Note that even before this patch, this was not always guaranteed:
> anything calling cache_clean_timer_del() from the QEMU main AioContext
> could have been running concurrently with an existing timer CB
> invocation.)
>
> Polling to wait for the timer to actually settle seems very
> complicated for what is rather a minor problem, but I can’t come up
> with any better solution that doesn’t again just overlook potential
> problems.
>
> (Not Cc-ing qemu-stable, as the issue is quite unlikely to be hit, and
> I’m not too fond of this solution.)
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> -static void cache_clean_timer_cb(void *opaque)
> +static void coroutine_fn cache_clean_timer(void *opaque)
>  {
> -    BlockDriverState *bs = opaque;
> -    BDRVQcow2State *s = bs->opaque;
> -    qcow2_cache_clean_unused(s->l2_table_cache);
> -    qcow2_cache_clean_unused(s->refcount_block_cache);
> -    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> -              (int64_t) s->cache_clean_interval * 1000);
> +    BDRVQcow2State *s = opaque;
> +    uint64_t wait_ns;
> +
> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> +        wait_ns = s->cache_clean_interval * NANOSECONDS_PER_SECOND;
> +    }
> +
> +    while (wait_ns > 0) {
> +        qemu_co_sleep_ns_wakeable(&s->cache_clean_timer_wake,
> +                                  QEMU_CLOCK_VIRTUAL, wait_ns);
> +
> +        WITH_QEMU_LOCK_GUARD(&s->lock) {
> +            if (s->cache_clean_interval > 0) {
> +                qcow2_cache_clean_unused(s->l2_table_cache);
> +                qcow2_cache_clean_unused(s->refcount_block_cache);
> +            }
> +
> +            wait_ns = s->cache_clean_interval * NANOSECONDS_PER_SECOND;
> +        }
> +    }
> +
> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
> +        s->cache_clean_timer_co = NULL;
> +        qemu_co_queue_restart_all(&s->cache_clean_timer_exit);
> +    }
>  }
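(For context, since cache_clean_timer_init() isn't quoted here: I assume
it now essentially just spawns this coroutine in the BDS AioContext,
something like the following sketch. Everything beyond the names visible
in the diff is my guess:

    static void cache_clean_timer_init(BlockDriverState *bs,
                                       AioContext *context)
    {
        BDRVQcow2State *s = bs->opaque;

        if (s->cache_clean_interval > 0) {
            /* Run the cleaner loop as a coroutine in the BDS context */
            s->cache_clean_timer_co =
                qemu_coroutine_create(cache_clean_timer, s);
            aio_co_enter(context, s->cache_clean_timer_co);
        }
    }
)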
> +/**
> + * Delete the cache clean timer and await any yet running instance.
> + * Called holding s->lock.
> + */
> +static void coroutine_fn
> +cache_clean_timer_co_locked_del_and_wait(BlockDriverState *bs)
> +{
> +    BDRVQcow2State *s = bs->opaque;
> +
> +    if (s->cache_clean_timer_co) {
> +        s->cache_clean_interval = 0;
> +        qemu_co_sleep_wake(&s->cache_clean_timer_wake);
> +        qemu_co_queue_wait(&s->cache_clean_timer_exit, &s->lock);
> +    }
> +}
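(Similarly, the non-coroutine cache_clean_timer_del_and_wait() used
below isn't quoted; I assume it performs the same wake-and-wait, just
polling from outside coroutine context. One plausible shape, not
necessarily what the patch actually does:

    /* Called without holding s->lock, outside coroutine context */
    static void cache_clean_timer_del_and_wait(BlockDriverState *bs)
    {
        BDRVQcow2State *s = bs->opaque;

        s->cache_clean_interval = 0;
        qemu_co_sleep_wake(&s->cache_clean_timer_wake);
        /* Poll until the cleaner coroutine has exited */
        BDRV_POLL_WHILE(bs, s->cache_clean_timer_co != NULL);
    }
)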
I don't want to count the number of context switches this dance between
cache_clean_timer_co_locked_del_and_wait() and cache_clean_timer()
takes! Good that it's not a hot path. :-)
> +/* s_locked specifies whether s->lock is held or not */
>  static void qcow2_update_options_commit(BlockDriverState *bs,
> -                                        Qcow2ReopenState *r)
> +                                        Qcow2ReopenState *r,
> +                                        bool s_locked)
>  {
>      BDRVQcow2State *s = bs->opaque;
>      int i;
>
> +    /*
> +     * We need to stop the cache-clean-timer before destroying the metadata
> +     * table caches
> +     */
> +    if (s_locked) {
> +        cache_clean_timer_co_locked_del_and_wait(bs);
> +    } else {
> +        cache_clean_timer_del_and_wait(bs);
> +    }
> +
>      if (s->l2_table_cache) {
>          qcow2_cache_destroy(s->l2_table_cache);
>      }
> @@ -1228,6 +1312,10 @@ static void qcow2_update_options_commit(BlockDriverState *bs,
>      }
>      s->l2_table_cache = r->l2_table_cache;
>      s->refcount_block_cache = r->refcount_block_cache;
> +
> +    s->cache_clean_interval = r->cache_clean_interval;
> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
> +
>      s->l2_slice_size = r->l2_slice_size;
>
>      s->overlap_check = r->overlap_check;
> @@ -1239,12 +1327,6 @@ static void qcow2_update_options_commit(BlockDriverState *bs,
>
>      s->discard_no_unref = r->discard_no_unref;
>
> -    if (s->cache_clean_interval != r->cache_clean_interval) {
> -        cache_clean_timer_del(bs);
> -        s->cache_clean_interval = r->cache_clean_interval;
> -        cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
> -    }
> -
>      qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
>      s->crypto_opts = r->crypto_opts;
>  }
Is there any specific reason why you move cache_clean_timer_init()
earlier? I don't see an actual problem with the code as it is after this
change, but s->l2_slice_size is related to the cache in a way, so it
would feel safer if the cache cleaner were only started once all the
settings are updated and not only those that it actually happens to
access at the moment.
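That is, something like this ordering (just a sketch) would feel more
robust to me:

    s->l2_slice_size = r->l2_slice_size;
    ...
    s->discard_no_unref = r->discard_no_unref;

    /* Start the cleaner only once all cache-related settings are set */
    s->cache_clean_interval = r->cache_clean_interval;
    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));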
Kevin