From: Hanna Czenczek <hreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard W . M . Jones" <rjones@redhat.com>,
	"Ilya Dryomov" <idryomov@gmail.com>,
	"Peter Lieven" <pl@dlhnet.de>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Fam Zheng" <fam@euphon.net>,
	"Ronnie Sahlberg" <ronniesahlberg@gmail.com>
Subject: Re: [PATCH v2 12/19] qcow2: Fix cache_clean_timer
Date: Tue, 18 Nov 2025 12:01:31 +0100
Message-ID: <7a3dc3f2-cded-4e2f-a914-938acd52576f@redhat.com>
In-Reply-To: <aRs2OSuFy0HyW4EU@redhat.com>

On 17.11.25 15:50, Kevin Wolf wrote:
> Am 10.11.2025 um 16:48 hat Hanna Czenczek geschrieben:
>> The cache-cleaner runs as a timer CB in the BDS AioContext.  With
>> multiqueue, it can run concurrently with I/O requests, and because it does
>> not take any lock, this can break concurrent cache accesses, corrupting
>> the image.  While the chances of this happening are low, it can be
>> reproduced e.g. by modifying the code to schedule the timer CB every
>> 5 ms (instead of at most once per second) and modifying the last (inner)
>> while loop of qcow2_cache_clean_unused() like so:
>>
>>      while (i < c->size && can_clean_entry(c, i)) {
>>          for (int j = 0; j < 1000 && can_clean_entry(c, i); j++) {
>>              usleep(100);
>>          }
>>          c->entries[i].offset = 0;
>>          c->entries[i].lru_counter = 0;
>>          i++;
>>          to_clean++;
>>      }
>>
>> i.e. deliberately making it wait for the point in time when the cache
>> is in use by something else.
>>
>> The solution chosen in this patch is, I hope, not the best possible
>> one, but I admittedly can’t come up with anything strictly better.
>>
>> We can protect against concurrent cache accesses either by taking the
>> existing s->lock, or by introducing a new (non-coroutine) mutex
>> specifically for cache accesses.  I would prefer to avoid the latter so
>> as not to introduce additional (very slight) overhead.
>>
>> Using s->lock, which is a coroutine mutex, however, means that we need to
>> take it in a coroutine, so the timer must run in a coroutine.  We can
>> transform it from the current timer CB style into a coroutine that
>> sleeps for the set interval.  As a result, however, we can no longer
>> just deschedule the timer to instantly guarantee it won’t run anymore,
>> but have to await the coroutine’s exit.
>>
>> (Note that even before this patch, this was not actually guaranteed
>> everywhere: anything calling cache_clean_timer_del() from the QEMU main
>> AioContext could have been running concurrently with an existing timer
>> CB invocation.)
>>
>> Polling until the timer has actually settled seems very complicated for
>> something that is rather a minor problem, but I can’t come up with any
>> better solution that doesn’t again just overlook potential problems.
>>
>> (Not Cc-ing qemu-stable, as the issue is quite unlikely to be hit, and
>> I’m not too fond of this solution.)
>>
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> -static void cache_clean_timer_cb(void *opaque)
>> +static void coroutine_fn cache_clean_timer(void *opaque)
>>   {
>> -    BlockDriverState *bs = opaque;
>> -    BDRVQcow2State *s = bs->opaque;
>> -    qcow2_cache_clean_unused(s->l2_table_cache);
>> -    qcow2_cache_clean_unused(s->refcount_block_cache);
>> -    timer_mod(s->cache_clean_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
>> -              (int64_t) s->cache_clean_interval * 1000);
>> +    BDRVQcow2State *s = opaque;
>> +    uint64_t wait_ns;
>> +
>> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
>> +        wait_ns = s->cache_clean_interval * NANOSECONDS_PER_SECOND;
>> +    }
>> +
>> +    while (wait_ns > 0) {
>> +        qemu_co_sleep_ns_wakeable(&s->cache_clean_timer_wake,
>> +                                  QEMU_CLOCK_VIRTUAL, wait_ns);
>> +
>> +        WITH_QEMU_LOCK_GUARD(&s->lock) {
>> +            if (s->cache_clean_interval > 0) {
>> +                qcow2_cache_clean_unused(s->l2_table_cache);
>> +                qcow2_cache_clean_unused(s->refcount_block_cache);
>> +            }
>> +
>> +            wait_ns = s->cache_clean_interval * NANOSECONDS_PER_SECOND;
>> +        }
>> +    }
>> +
>> +    WITH_QEMU_LOCK_GUARD(&s->lock) {
>> +        s->cache_clean_timer_co = NULL;
>> +        qemu_co_queue_restart_all(&s->cache_clean_timer_exit);
>> +    }
>>   }
>> +/**
>> + * Delete the cache clean timer and await any yet running instance.
>> + * Called holding s->lock.
>> + */
>> +static void coroutine_fn
>> +cache_clean_timer_co_locked_del_and_wait(BlockDriverState *bs)
>> +{
>> +    BDRVQcow2State *s = bs->opaque;
>> +
>> +    if (s->cache_clean_timer_co) {
>> +        s->cache_clean_interval = 0;
>> +        qemu_co_sleep_wake(&s->cache_clean_timer_wake);
>> +        qemu_co_queue_wait(&s->cache_clean_timer_exit, &s->lock);
>>       }
>>   }
> I don't want to count the number of context switches this dance between
> cache_clean_timer_co_locked_del_and_wait() and cache_clean_timer()
> takes! Good that it's not a hot path. :-)
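
(For reference, since the corresponding hunk is not quoted here: the
init side spawns the coroutine in the BDS AioContext.  A minimal sketch
of what that presumably looks like, with the actual patch being the
authority on the details:

    static void cache_clean_timer_init(BlockDriverState *bs,
                                       AioContext *ctx)
    {
        BDRVQcow2State *s = bs->opaque;

        if (s->cache_clean_interval > 0) {
            assert(!s->cache_clean_timer_co);
            /* cache_clean_timer() takes BDRVQcow2State * as opaque */
            s->cache_clean_timer_co =
                qemu_coroutine_create(cache_clean_timer, s);
            aio_co_enter(ctx, s->cache_clean_timer_co);
        }
    }

Setting s->cache_clean_interval to 0 and waking the sleep then makes
the coroutine leave its loop, clear s->cache_clean_timer_co, and
restart everything queued on s->cache_clean_timer_exit, which is
exactly what the del-and-wait side blocks on.)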
>
>> +/* s_locked specifies whether s->lock is held or not */
>>   static void qcow2_update_options_commit(BlockDriverState *bs,
>> -                                        Qcow2ReopenState *r)
>> +                                        Qcow2ReopenState *r,
>> +                                        bool s_locked)
>>   {
>>       BDRVQcow2State *s = bs->opaque;
>>       int i;
>>   
>> +    /*
>> +     * We need to stop the cache-clean-timer before destroying the metadata
>> +     * table caches
>> +     */
>> +    if (s_locked) {
>> +        cache_clean_timer_co_locked_del_and_wait(bs);
>> +    } else {
>> +        cache_clean_timer_del_and_wait(bs);
>> +    }
>> +
>>       if (s->l2_table_cache) {
>>           qcow2_cache_destroy(s->l2_table_cache);
>>       }
>> @@ -1228,6 +1312,10 @@ static void qcow2_update_options_commit(BlockDriverState *bs,
>>       }
>>       s->l2_table_cache = r->l2_table_cache;
>>       s->refcount_block_cache = r->refcount_block_cache;
>> +
>> +    s->cache_clean_interval = r->cache_clean_interval;
>> +    cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
>> +
>>       s->l2_slice_size = r->l2_slice_size;
>>   
>>       s->overlap_check = r->overlap_check;
>> @@ -1239,12 +1327,6 @@ static void qcow2_update_options_commit(BlockDriverState *bs,
>>   
>>       s->discard_no_unref = r->discard_no_unref;
>>   
>> -    if (s->cache_clean_interval != r->cache_clean_interval) {
>> -        cache_clean_timer_del(bs);
>> -        s->cache_clean_interval = r->cache_clean_interval;
>> -        cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
>> -    }
>> -
>>       qapi_free_QCryptoBlockOpenOptions(s->crypto_opts);
>>       s->crypto_opts = r->crypto_opts;
>>   }
> Is there any specific reason why you move cache_clean_timer_init()
> earlier? I don't see an actual problem with the code as it is after this
> change, but s->l2_slice_size is related to the cache in a way, so it
> would feel safer if the cache cleaner were only started once all the
> settings are updated and not only those that it actually happens to
> access at the moment.

Oh.  I don’t think there’s a good reason.  I think it makes sense to 
keep the set-up in the old place.  Can you do that in your tree?

(I think it’s just because when I split it to stop the timer before 
deleting the caches, I decided it would conversely make sense to start 
it after they have been set up.)
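
Concretely, moving it back would give the commit path this shape (just
a sketch of the ordering, with the cache and option updates elided;
names as in the quoted hunks):

    static void qcow2_update_options_commit(BlockDriverState *bs,
                                            Qcow2ReopenState *r,
                                            bool s_locked)
    {
        BDRVQcow2State *s = bs->opaque;

        /* Stop the cleaner before destroying the metadata table caches */
        if (s_locked) {
            cache_clean_timer_co_locked_del_and_wait(bs);
        } else {
            cache_clean_timer_del_and_wait(bs);
        }

        /* ... destroy old caches, install the ones from r, update
         * all other options ... */

        /* Restart the cleaner only once all settings have been updated */
        s->cache_clean_interval = r->cache_clean_interval;
        cache_clean_timer_init(bs, bdrv_get_aio_context(bs));
    }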

Hanna



Thread overview: 26+ messages
2025-11-10 15:48 [PATCH v2 00/19] block: Some multi-threading fixes Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 01/19] block: Note on aio_co_wake use if not yet yielding Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 02/19] rbd: Run co BH CB in the coroutine’s AioContext Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 03/19] iscsi: " Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 04/19] nfs: " Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 05/19] curl: Fix coroutine waking Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 06/19] gluster: Do not move coroutine into BDS context Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 07/19] nvme: Kick and check completions in " Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 08/19] nvme: Fix coroutine waking Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 09/19] nvme: Note in which AioContext some functions run Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 10/19] block/io: Take reqs_lock for tracked_requests Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 11/19] qcow2: Re-initialize lock in invalidate_cache Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 12/19] qcow2: Fix cache_clean_timer Hanna Czenczek
2025-11-17 14:50   ` Kevin Wolf
2025-11-18 11:01     ` Hanna Czenczek [this message]
2025-11-18 17:06       ` Kevin Wolf
2025-11-18 17:19         ` Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 13/19] qcow2: Schedule cache-clean-timer in realtime Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 14/19] ssh: Run restart_coroutine in current AioContext Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 15/19] blkreplay: Run BH in coroutine’s AioContext Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 16/19] block: Note in which AioContext AIO CBs are called Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 17/19] iscsi: Create AIO BH in original AioContext Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 18/19] null-aio: Run CB " Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 19/19] win32-aio: Run CB in original context Hanna Czenczek
2025-11-17 15:11 ` [PATCH v2 00/19] block: Some multi-threading fixes Kevin Wolf
2025-11-17 19:11 ` Stefan Hajnoczi
