All of lore.kernel.org
 help / color / mirror / Atom feed
From: Fabiano Rosas <farosas@suse.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, berrange@redhat.com,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Hanna Reitz <hreitz@redhat.com>
Subject: Re: [PATCH v1] qed: Don't try to drain during INMIGRATE
Date: Thu, 28 May 2026 18:01:57 -0300	[thread overview]
Message-ID: <87y0h3tfi2.fsf@suse.de> (raw)
In-Reply-To: <ahhzCjSY7zrbXFme@redhat.com>

Kevin Wolf <kwolf@redhat.com> writes:

> Am 28.05.2026 um 15:55 hat Fabiano Rosas geschrieben:
>> It's not possible to access the image file while there is an incoming
>> migration in progress, the QEMU process doesn't hold any locks to the
>> storage at this point so nodes are inactive. Attempting to drain leads
>> to an assert at bdrv_co_write_req_prepare():
>> 
>>    assert(!(bs->open_flags & BDRV_O_INACTIVE))
>> 
>> The issue is reproducible by running iotest 181 on a host under cpu
>> load. The migration must coincide with the header already containing
>> the QED_F_NEED_CHECK flag.
>> 
>> The sequence of events is as follows, with the respective call stacks
>> referenced below:
>> 
>> During block device init, bdrv_qed_attach_aio_context() starts the
>> 'need_check' timer. The timer will not fire during incoming migration
>> as it uses QEMU_CLOCK_VIRTUAL (to avoid this very issue, as the code
>> comment indicates).                                                   (0)
>> 
>> However, there's still bdrv_qed_drain_begin() which uses the fact that
>> the timer is live to decide whether to start the
>> qed_need_check_timer_entry() directly.                                (1)
>> 
>> The qed_need_check_timer_entry() eventually calls into
>> qed_write_header() -> bdrv_co_pwrite() leading to the assert.         (2)
>> 
>> Since we don't have an API for checking if a timer is enabled, an
>> alternative is to skip this logic whenever the runstate is
>> INMIGRATE. This actually matches the setting of the BDRV_O_INACTIVE
>> flag at blockdev.c.
>
> I'm not sure what you mean by "checking if a timer is enabled", because
> bdrv_qed_drain_begin() has a timer_pending() check that sounds exactly
> what you say doesn't exist.
>

Sorry, I meant to say checking if a *clock* is enabled. 

The code comment at qed_start_need_check_timer() says:

    /* Use QEMU_CLOCK_VIRTUAL so we don't alter the image file while suspended for
     * migration.
     */

It seems to assume that since QEMU_CLOCK_VIRTUAL is disabled when vcpus
are paused then the timer function will never execute while the image is
inactive. If bdrv_qed_drain_begin() could check clock->enabled, then it
would be under the same assumption.

> But it's also not clear to me why s->need_check_timer is true on an
> inactive image. I don't think this is supposed to be the case?

bdrv_qed_do_open() reads the header during QEMU startup. Whatever is set
in the header at that moment will be picked up, including
QED_F_NEED_CHECK, which is what the code uses to determine the timer
should be started.

> It looks like qed is missing a .bdrv_inactivate implementation that
> cancels the timer after flushing everything (including the header) to
> disk.
>

Hmm, but bdrv_inactivate happens on the migration source. The
destination machine could still start at any moment while the image is
dirty and see the QED_F_NEED_CHECK flag. Even doing something on the
destination machine wouldn't help I think given how early the crash
happens.

>> 
>> The stacks:
>> 
>> (0) == issues timer_mod ==
>>  #6  in qed_start_need_check_timer       at ../block/qed.c:340
>>  #7  in bdrv_qed_attach_aio_context      at ../block/qed.c:373
>>  #8  in bdrv_qed_do_open                 at ../block/qed.c:556
>>  #9  in bdrv_qed_open_entry              at ../block/qed.c:582
>>  #10 in coroutine_trampoline             at ../util/coroutine-ucontext.c:175
>>  #0  in qemu_coroutine_switch<+120>      at ../util/coroutine-ucontext.c:321
>>  #1  in qemu_aio_coroutine_enter<+356>   at ../util/qemu-coroutine.c:293
>>  #2  in aio_co_enter<+179>               at ../util/async.c:710
>>  #3  in aio_co_wake<+53>                 at ../util/async.c:695
>>  #4  in thread_pool_co_cb<+47>           at ../util/thread-pool.c:283
>>  #5  in thread_pool_completion_bh<+241>  at ../util/thread-pool.c:202
>>  #6  in aio_bh_call<+109>                at ../util/async.c:173
>>  #7  in aio_bh_poll<+299>                at ../util/async.c:220
>>  #8  in aio_poll<+690>                   at ../util/aio-posix.c:745
>>  #9  in bdrv_qed_open<+392>              at ../block/qed.c:607
>>  #10 in bdrv_open_driver<+327>           at ../block.c:1678
>>  #11 in bdrv_open_common<+1619>          at ../block.c:2008
>>  #12 in bdrv_open_inherit<+2556>         at ../block.c:4191
>>  #13 in bdrv_open<+118>                  at ../block.c:4286
>>  #14 in blk_new_open<+199>               at ../block/block-backend.c:458
>>  #15 in blockdev_init<+2011>             at ../blockdev.c:612
>>  #16 in drive_new<+3008>                 at ../blockdev.c:1008
>>  #17 in drive_init_func<+51>             at ../system/vl.c:662
>>  #18 in qemu_opts_foreach<+227>          at ../util/qemu-option.c:1148
>>  #19 in configure_blockdev<+350>         at ../system/vl.c:721
>>  #20 in qemu_create_early_backends<+343> at ../system/vl.c:2076
>>  #21 in qemu_init<+12483>                at ../system/vl.c:3778
>>  #22 in main<+46>                        at ../system/main.c:71
>> 
>> (1) == sees timer_pending ==
>>  #6  in bdrv_qed_drain_begin             at ../block/qed.c:391
>>  #7  in bdrv_do_drained_begin            at ../block/io.c:366
>>  #8  in bdrv_do_drained_begin_quiesce    at ../block/io.c:386
>>  #9  in bdrv_child_cb_drained_begin      at ../block.c:1207
>>  #10 in bdrv_parent_drained_begin_single at ../block/io.c:133
>>  #11 in bdrv_parent_drained_begin        at ../block/io.c:64
>>  #12 in bdrv_do_drained_begin            at ../block/io.c:364
>>  #13 in bdrv_drained_begin               at ../block/io.c:393
>>  #14 in blk_drain                        at ../block/block-backend.c:2101
>>  #15 in blk_unref                        at ../block/block-backend.c:544
>>  #16 in bdrv_open_inherit                at ../block.c:4197
>>  #17 in bdrv_open                        at ../block.c:4286
>>  #18 in blk_new_open                     at ../block/block-backend.c:458
>>  #19 in blockdev_init                    at ../blockdev.c:612
>>  #20 in drive_new                        at ../blockdev.c:1008
>>  #21 in drive_init_func                  at ../system/vl.c:662
>>  #22 in qemu_opts_foreach                at ../util/qemu-option.c:1148
>>  #23 in configure_blockdev               at ../system/vl.c:721
>>  #24 in qemu_create_early_backends       at ../system/vl.c:2076
>>  #25 in qemu_init                        at ../system/vl.c:3778
>>  #26 in main                             at ../system/main.c:71
>> 
>> (2) == crashes ==
>>  #5  in __assert_fail (assertion="!(bs->open_flags & BDRV_O_INACTIVE)", file="../block/io.c", line=1977
>>  #6  in bdrv_co_write_req_prepare  at ../block/io.c:1977
>>  #7  in bdrv_aligned_pwritev       at ../block/io.c:2099
>>  #8  in bdrv_co_pwritev_part       at ../block/io.c:2316
>>  #9  in bdrv_co_pwritev            at ../block/io.c:2233
>>  #10 in bdrv_co_pwrite             at ../include/block/block_int-io.h:77
>>  #11 in qed_write_header           at ../block/qed.c:128
>>  #12 in qed_need_check_timer       at ../block/qed.c:305
>>  #13 in qed_need_check_timer_entry at ../block/qed.c:319
>> 
>> Note that this issue is not exactly the same as what's been reported
>> in Gitlab, but given how easily this reproduces, I imagine it has to
>> be happening in that setup as well.
>> 
>> Link: https://gitlab.com/qemu-project/qemu/-/work_items/3515
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>> CI run: https://gitlab.com/farosas/qemu/-/pipelines/2557314306
>> ---
>>  block/qed.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/block/qed.c b/block/qed.c
>> index da23a83d62..ccf32bb4ae 100644
>> --- a/block/qed.c
>> +++ b/block/qed.c
>> @@ -21,6 +21,7 @@
>>  #include "qemu/module.h"
>>  #include "qemu/option.h"
>>  #include "qemu/memalign.h"
>> +#include "system/runstate.h"
>>  #include "trace.h"
>>  #include "qed.h"
>>  #include "system/block-backend.h"
>> @@ -373,6 +374,11 @@ static void bdrv_qed_drain_begin(BlockDriverState *bs)
>>  {
>>      BDRVQEDState *s = bs->opaque;
>>  
>> +    /* Nodes are inactive while waiting for an incoming migration. */
>> +    if (runstate_check(RUN_STATE_INMIGRATE)) {
>
> If you're interested in whether the node is inactive, this is what you
> should check. RUN_STATE_INMIGRATE is only one of the conditions in which
> it might (but doesn't have to) be true. There is bdrv_is_inactive() to
> check this.
>
> Of course, if we're fixing the root cause (that QED_F_NEED_CHECK and
> s->need_check_timer are set on an inactive image), this hunk might not
> be needed at all.
>
>> +        return;
>> +    }
>> +
>>      /* Fire the timer immediately in order to start doing I/O as soon as the
>>       * header is flushed.
>>       */
>
> Kevin


  reply	other threads:[~2026-05-28 21:03 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-28 13:55 [PATCH v1] qed: Don't try to drain during INMIGRATE Fabiano Rosas
2026-05-28 16:53 ` Kevin Wolf
2026-05-28 21:01   ` Fabiano Rosas [this message]
2026-06-01 19:08     ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y0h3tfi2.fsf@suse.de \
    --to=farosas@suse.de \
    --cc=berrange@redhat.com \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.