public inbox for qemu-devel@nongnu.org
From: Fabiano Rosas <farosas@suse.de>
To: Thomas Huth <thuth@redhat.com>, qemu-devel@nongnu.org
Cc: Peter Xu <peterx@redhat.com>, Prasad Pandit <pjp@fedoraproject.org>
Subject: Re: [PULL 05/10] tests/qtest/migration: Force exit-on-error=false
Date: Thu, 26 Mar 2026 10:28:02 -0300
Message-ID: <87cy0qk9el.fsf@suse.de>
In-Reply-To: <7002cee0-9287-4ff1-9580-eff97aa02566@redhat.com>

Thomas Huth <thuth@redhat.com> writes:

> On 17/03/2026 19.23, Fabiano Rosas wrote:
>> Some tests can cause QEMU to exit(1) too early while the incoming
>> coroutine has not yielded for a first time yet. This trips ASAN
>> because resources related to dispatching the incoming process will
>> still be allocated in the io/channel.c layer without a
>> straightforward way for the migration code to clean them up.
>> 
>> As an example of one such issue, the UUID validation happens early
>> enough that the temporary socket from qio_net_listener_channel_func()
>> still has an elevated refcount. If it fails, the listener dispatch
>> code never gets to free the resource:
>> 
>> Direct leak of 400 byte(s) in 1 object(s) allocated from:
>>      #0 0x55e668890a07 in malloc asan_malloc_linux.cpp:68:3
>>      #1 0x7f3c7e2b6648 in g_malloc ../glib/gmem.c:130
>>      #2 0x55e66a8ef05f in object_new_with_type ../qom/object.c:767:15
>>      #3 0x55e66a8ef178 in object_new ../qom/object.c:789:12
>>      #4 0x55e66a93bcc6 in qio_channel_socket_new ../io/channel-socket.c:70:31
>>      #5 0x55e66a93f34f in qio_channel_socket_accept ../io/channel-socket.c:401:12
>>      #6 0x55e66a96752a in qio_net_listener_channel_func ../io/net-listener.c:64:12
>>      #7 0x55e66a94bdac in qio_channel_fd_source_dispatch ../io/channel-watch.c:84:12
>>      #8 0x7f3c7e2adf4b in g_main_dispatch ../glib/gmain.c:3476
>>      #9 0x7f3c7e2adf4b in g_main_context_dispatch_unlocked ../glib/gmain.c:4284
>>      #10 0x7f3c7e2b00c8 in g_main_context_dispatch ../glib/gmain.c:4272
>> 
>> The exit(1) also requires some tests to set up qtest to expect a return
>> code of 1 from the QEMU process. Although we can check migration
>> status changes to be fairly certain where the failure happened, there
>> is always the possibility of QEMU exiting for another reason and the
>> test passing. This happens frequently with sanitizers enabled, but
>> also risks masking issues in the regular build.
>> 
>> Stop allowing the incoming migration to exit and instead require the
>> tests to wait for the FAILED state and end QEMU gracefully with
>> qtest_quit.
>> 
>> In practice this means setting exit-on-error=false for every incoming
>> migration, changing MIG_TEST_FAIL_DEST_QUIT_ERR to MIG_TEST_FAIL and
>> waiting for a change of state where necessary.
>> 
>> With this, the MIG_TEST_FAIL_DEST_QUIT_ERR error result is now unused,
>> so remove it.
>> 
>> The affected tests are:
>> validate_uuid_error
>> multifd_tcp_cancel
>> dirty_limit
>> precopy_unix_tls_x509_default_host
>> precopy_tcp_tls_no_hostname
>> tcp_tls_x509_mismatch_host
>> dbus_vmstate_missing_src
>> dbus_vmstate_missing_dst
>> 
>> Also add a comment to QEMU source explaining that the incoming
>> coroutine might block for a while until it yields as this is the
>> actual root cause of the issue.
>> 
>> Reviewed-by: Peter Xu <peterx@redhat.com>
>> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
>> Link: https://lore.kernel.org/qemu-devel/20260311213418.16951-6-farosas@suse.de
>> [assert that key doesn't already exist]
>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> ---
>>   migration/migration.c                 |  5 +++++
>>   tests/qtest/dbus-vmstate-test.c       |  5 +++--
>>   tests/qtest/migration/framework.c     |  5 +----
>>   tests/qtest/migration/framework.h     |  2 --
>>   tests/qtest/migration/migration-qmp.c |  7 +++++++
>>   tests/qtest/migration/misc-tests.c    |  4 ++--
>>   tests/qtest/migration/precopy-tests.c | 12 +++++-------
>>   tests/qtest/migration/tls-tests.c     | 14 ++++++++------
>>   8 files changed, 31 insertions(+), 23 deletions(-)
>
>   Hi Fabiano,
>
> this patch now triggers a failure in the qtests when I'm running these in 
> "SPEED=thorough" mode:
>
> MESON_TEST_ITERATION=1 MALLOC_PERTURB_=120 \
> ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 G_TEST_SLOW=1 \
> PYTHON=/home/thuth/tmp/qemu-build/pyvenv/bin/python3 RUST_BACKTRACE=1 \
> QTEST_QEMU_IMG=./qemu-img \
> G_TEST_DBUS_DAEMON=/home/thuth/devel/qemu/tests/dbus-vmstate-daemon.sh \
> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon \
> UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 \
> MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 \
> QTEST_QEMU_BINARY=./qemu-system-x86_64 \
> /home/thuth/tmp/qemu-build/tests/qtest/migration-test --tap -k --full
>
> TAP version 14
> # random seed: R02Sb882c8142734dce2265e65214fd2b060
> # starting QEMU: exec ./qemu-system-x86_64 -qtest 
> unix:/tmp/qtest-106610.sock -qtest-log /dev/null -chardev 
> socket,path=/tmp/qtest-106610.qmp,id=char0 -mon chardev=char0,mode=control 
> -display none -audio none -run-with exit-with-parent=on -machine none -accel 
> qtest
> # Skipping test: userfaultfd not available
> 1..80
> # Start of x86_64 tests
> # Running /x86_64/dirty_limit
> # Using machine type: pc-q35-11.0
> # starting QEMU: exec ./qemu-system-x86_64 -qtest 
> unix:/tmp/qtest-106610.sock -qtest-log /dev/null -chardev 
> socket,path=/tmp/qtest-106610.qmp,id=char0 -mon chardev=char0,mode=control 
> -display none -audio none -run-with exit-with-parent=on -accel 
> kvm,dirty-ring-size=4096 -accel tcg -machine pc-q35-11.0, -name 
> source,debug-threads=on -machine memory-backend=mig.mem -object 
> memory-backend-ram,id=mig.mem,size=150M,share=off -serial 
> file:/tmp/migration-test-8B95M3/src_serial -drive 
> if=none,id=d0,file=/tmp/migration-test-8B95M3/bootsect,format=raw -device 
> ide-hd,drive=d0,secs=1,cyls=1,heads=1  2>/dev/null -accel qtest
> # starting QEMU: exec ./qemu-system-x86_64 -qtest 
> unix:/tmp/qtest-106610.sock -qtest-log /dev/null -chardev 
> socket,path=/tmp/qtest-106610.qmp,id=char0 -mon chardev=char0,mode=control 
> -display none -audio none -run-with exit-with-parent=on -accel 
> kvm,dirty-ring-size=4096 -accel tcg -machine pc-q35-11.0, -name 
> target,debug-threads=on -machine memory-backend=mig.mem -object 
> memory-backend-ram,id=mig.mem,size=150M,share=off -serial 
> file:/tmp/migration-test-8B95M3/dest_serial -incoming 
> unix:/tmp/migration-test-8B95M3/migsocket  -drive 
> if=none,id=d0,file=/tmp/migration-test-8B95M3/bootsect,format=raw -device 
> ide-hd,drive=d0,secs=1,cyls=1,heads=1  2>/dev/null -accel qtest
> ../../devel/qemu/tests/qtest/libqtest.c:201: kill_qemu() tried to terminate 
> QEMU process but encountered exit status 1 (expected 0)
> Aborted (core dumped)
>
> Could you please check whether you can reproduce that crash?
>
>   Thomas

Argh, too many dirty this, dirty that. I'll send a patch. Thanks!



Thread overview: 16+ messages
2026-03-17 18:23 [PULL 00/10] Migration/Qtest patches for 2026-03-17 Fabiano Rosas
2026-03-17 18:23 ` [PULL 01/10] tests/qtest/migration: Fix leak of migration tests data Fabiano Rosas
2026-03-17 18:23 ` [PULL 02/10] io: Fix TLS bye task leak Fabiano Rosas
2026-03-18 20:36   ` Michael Tokarev
2026-03-19  8:57     ` Daniel P. Berrangé
2026-03-17 18:23 ` [PULL 03/10] tests/qtest/migration: Fix leak in CPR exec test Fabiano Rosas
2026-03-17 18:23 ` [PULL 04/10] migration/multifd: Fix leaks of TLS error objects Fabiano Rosas
2026-03-17 18:23 ` [PULL 05/10] tests/qtest/migration: Force exit-on-error=false Fabiano Rosas
2026-03-26  9:02   ` Thomas Huth
2026-03-26 13:28     ` Fabiano Rosas [this message]
2026-03-17 18:23 ` [PULL 06/10] migration: assert that the same migration handler is not being added twice Fabiano Rosas
2026-03-17 18:23 ` [PULL 07/10] migration/options: Fix leaks in StrOrNull qdev accessors Fabiano Rosas
2026-03-17 18:23 ` [PULL 08/10] migration: fix implicit integer division in migration_update_counters Fabiano Rosas
2026-03-17 18:23 ` [PULL 09/10] tests/qtest: Don't dup machine name in qtest_cb_for_every_machine callbacks Fabiano Rosas
2026-03-17 18:23 ` [PULL 10/10] tests/qtest/test-hmp: Free machine options Fabiano Rosas
2026-03-18 13:26 ` [PULL 00/10] Migration/Qtest patches for 2026-03-17 Peter Maydell
