From: "Alex Bennée" <alex.bennee@linaro.org>
To: Kevin Wolf <kwolf@redhat.com>
Cc: "Markus Armbruster" <armbru@redhat.com>,
qemu-devel@nongnu.org, "Philippe Mathieu-Daudé" <f4bug@amsat.org>,
"Pavel Dovgalyuk" <dovgaluk@ispras.ru>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Cleber Rosa" <crosa@redhat.com>
Subject: Re: [PULL 09/14] qmp: Move dispatcher to a coroutine
Date: Mon, 12 Oct 2020 16:02:34 +0100 [thread overview]
Message-ID: <87r1q3zdv9.fsf@linaro.org> (raw)
In-Reply-To: <20201012124743.GD7777@merkur.fritz.box>
Kevin Wolf <kwolf@redhat.com> writes:
> Am 12.10.2020 um 13:53 hat Philippe Mathieu-Daudé geschrieben:
>> On 10/12/20 1:25 PM, Kevin Wolf wrote:
>> > Am 12.10.2020 um 12:47 hat Alex Bennée geschrieben:
>> > >
>> > > Markus Armbruster <armbru@redhat.com> writes:
>> > >
>> > > > From: Kevin Wolf <kwolf@redhat.com>
>> > > >
>> > > > This moves the QMP dispatcher to a coroutine and runs all QMP command
>> > > > handlers that declare 'coroutine': true in coroutine context so they
>> > > > can avoid blocking the main loop while doing I/O or waiting for other
>> > > > events.
>> > >
>> > > This subtly changes the replay behaviour leading to a hang in:
>> > >
>> > > 10:55:18 [alex.bennee@hackbox2:~/l/q/b/bisect] (625581c2…)|✚1(+1/-1) + ./tests/venv/bin/avocado run tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
>> > > Fetching asset from tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt
>> > > JOB ID : ec11fd2544f06e6c0d421f16afa757b49f7ed734
>> > > JOB LOG : /home/alex.bennee/avocado/job-results/job-2020-10-12T11.40-ec11fd2/job.log
>> > > (1/1) tests/acceptance/replay_kernel.py:ReplayKernel.test_arm_virt: ERROR: Could not perform graceful shutdown (26.27 s)
>> > > RESULTS : PASS 0 | ERROR 1 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>> > > JOB TIME : 27.77 s
>> > >
>> > > Looking at the log:
>> > >
>> > > 2020-10-12 11:40:31,426 __init__ L0085 DEBUG| [ 3.887411] rtc-pl031 9010000.pl031: setting system clock to 2020-10-12 10:40:31 UTC (1602499231)
>> > > 2020-10-12 11:40:31,428 __init__ L0085 DEBUG| [ 3.887431] sr_init: No PMIC hook to init smartreflex
>> > > 2020-10-12 11:40:31,447 __init__ L0085 DEBUG| [ 3.897193] uart-pl011 9000000.pl011: no DMA platform data
>> > > 2020-10-12 11:40:31,460 __init__ L0085 DEBUG| [ 3.897242] md: Waiting for all devices to be available before autodetect
>> > > 2020-10-12 11:40:31,462 __init__ L0085 DEBUG| [ 3.897259] md: If you don't use raid, use raid=noautodetect
>> > > 2020-10-12 11:40:31,475 __init__ L0085 DEBUG| [ 3.897819] md: Autodetecting RAID arrays.
>> > > 2020-10-12 11:40:31,476 __init__ L0085 DEBUG| [ 3.897832] md: autorun ...
>> > > 2020-10-12 11:40:31,477 __init__ L0085 DEBUG| [ 3.897842] md: ... autorun DONE.
>> > > 2020-10-12 11:40:31,483 __init__ L0085 DEBUG| [ 3.897962] VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
>> > > 2020-10-12 11:40:31,483 qmp L0245 DEBUG| >>> {'execute': 'quit'}
>> > > 2020-10-12 11:40:31,495 qmp L0145 DEBUG| <<< {'timestamp': {'seconds': 1602499231, 'microseconds': 493379}, 'event': 'SHUTDOWN', 'data': {'guest': True, 'reason': 'guest-reset'}}
>> > > 2020-10-12 11:40:31,733 machine L0325 WARNI| qemu received signal 6; command: "./qemu-system-arm -display none -vga none -chardev socket,id=mon,path=/var/tmp/tmpzls53khe/qemu-8487-monitor.sock -mon chardev=mon,mode=control -machine virt -chardev socket,id=console,path=/var/tmp/tmpzls53khe/qemu-8487-console.sock,server,nowait -serial chardev:console -icount shift=1,rr=record,rrfile=/var/tmp/avocado_n00stdrf/avocado_job_aw60qdul/1-tests_acceptance_replay_kernel.py_ReplayKernel.test_arm_virt/replay.bin -kernel /home/alex.bennee/avocado/data/cache/by_location/62750ce9e069e69e6a7ff04ff54c382ee660b92a/vmlinuz -append printk.time=1 panic=-1 console=ttyAMA0 -net none -no-reboot"
>> >
>> > This looks like a crash (SIGABRT) rather than a hang. Do you have a
>> > stack trace for the crashed process?
>>
>> No crash, exit(0):
>
> Why does the log say "qemu received signal 6" then?
>
>> VFS: Cannot open root device "(null)" or unknown-block(0,0): error -6
>
> Alex has this error in the logs before this commit, so I assume this is
> expected. All of the following is then probably expected, too, because
> it follows directly from this error:
>
>> Please append a correct "root=" boot option; here are the available
>> partitions:
>> Kernel panic - not syncing: VFS: Unable to mount root fs on
>> unknown-block(0,0)
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.16-300.fc29.armv7hl #1
>> Hardware name: Generic DT based system
>> [<c0313f7c>] (unwind_backtrace) from [<c030dc64>] (show_stack+0x20/0x24)
>> [<c030dc64>] (show_stack) from [<c0b50ec4>] (dump_stack+0x88/0xa8)
>> [<c0b50ec4>] (dump_stack) from [<c03592f8>] (panic+0xd4/0x26c)
>> [<c03592f8>] (panic) from [<c110183c>] (mount_block_root+0x250/0x2ec)
>> [<c110183c>] (mount_block_root) from [<c1101950>] (mount_root+0x78/0x90)
>> [<c1101950>] (mount_root) from [<c1101ac4>] (prepare_namespace+0x15c/0x19c)
>> [<c1101ac4>] (prepare_namespace) from [<c11012e8>]
>> (kernel_init_freeable+0x2c0/0x370)
>> [<c11012e8>] (kernel_init_freeable) from [<c0b63914>]
>> (kernel_init+0x18/0x128)
>> [<c0b63914>] (kernel_init) from [<c03010e8>] (ret_from_fork+0x14/0x2c)
>> Exception stack(0xc790bfb0 to 0xc790bff8)
>> bfa0: 00000000 00000000 00000000
>> 00000000
>> bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> 00000000
>> bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
>>
>> -> PSCI call
>> -> QEMU_PSCI_0_2_FN_SYSTEM_RESET
>> -> SHUTDOWN_CAUSE_GUEST_RESET
>> -> exit(0)
Yes - the test is recording the kernel up until the point it gives up.
> Hm... So you're saying that the test sends a 'quit' QMP command, but
> before it could be processed, the guest causes QEMU to exit, so the test
> will never receive a reply to its request?
>
> If commit 9ce44e2ce2 changes anything about this, it seems to me that it
> would be that more QMP commands are processed during monitor_cleanup()
> because it doesn't just delete the dispatcher BH, but waits until it's
> not busy any more.
>
> Looking at this code again, however, the order in monitor_cleanup() is
> probably wrong. We should first shut down the dispatcher (which may
> still be using Monitor objects) and then destroy the monitors. This
> could possibly explain a crash, but probably not wrong results with a
> successful shutdown.
I suspect this is a race between QEMU shutting down because the guest
shut it down and the acceptance test trying to shut things down via QMP.
I think the proper fix is either to:
- s/panic=-1/panic=0/ in the command line (and probably drop --no-reboot)
which would let the acceptance test cleanly shutdown via QMP. Or:
- modify the test to declare that qemu will shutdown itself and
therefor no "quit" needs to be sent
WDYT?
--
Alex Bennée
next prev parent reply other threads:[~2020-10-12 15:03 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-09 6:34 [PULL 00/14] Monitor patches for 2020-10-09 Markus Armbruster
2020-10-09 6:34 ` [PULL 01/14] monitor: Add Monitor parameter to monitor_set_cpu() Markus Armbruster
2020-10-09 6:34 ` [PULL 02/14] monitor: Add Monitor parameter to monitor_get_cpu_index() Markus Armbruster
2020-10-09 6:34 ` [PULL 03/14] monitor: Use getter/setter functions for cur_mon Markus Armbruster
2020-10-09 6:34 ` [PULL 04/14] hmp: Update current monitor only in handle_hmp_command() Markus Armbruster
2020-10-09 6:34 ` [PULL 05/14] qmp: Assert that no other monitor is active Markus Armbruster
2020-10-09 6:34 ` [PULL 06/14] qmp: Call monitor_set_cur() only in qmp_dispatch() Markus Armbruster
2020-10-09 6:34 ` [PULL 07/14] monitor: Make current monitor a per-coroutine property Markus Armbruster
2020-10-09 6:34 ` [PULL 08/14] qapi: Add a 'coroutine' flag for commands Markus Armbruster
2020-10-09 6:34 ` [PULL 09/14] qmp: Move dispatcher to a coroutine Markus Armbruster
2020-10-12 10:47 ` Alex Bennée
2020-10-12 11:25 ` Kevin Wolf
2020-10-12 11:53 ` Philippe Mathieu-Daudé
2020-10-12 12:47 ` Kevin Wolf
2020-10-12 15:02 ` Alex Bennée [this message]
2020-10-12 18:49 ` Ben Widawsky
2020-10-13 7:56 ` Philippe Mathieu-Daudé
2020-10-13 11:29 ` Kevin Wolf
2020-10-17 8:15 ` Volker Rümelin
2020-10-19 8:57 ` Kevin Wolf
2020-10-19 18:24 ` Volker Rümelin
2020-10-09 6:34 ` [PULL 10/14] hmp: Add support for coroutine command handlers Markus Armbruster
2020-10-09 6:34 ` [PULL 11/14] util/async: Add aio_co_reschedule_self() Markus Armbruster
2020-10-09 6:34 ` [PULL 12/14] block: Add bdrv_co_enter()/leave() Markus Armbruster
2020-10-09 6:34 ` [PULL 13/14] block: Add bdrv_lock()/unlock() Markus Armbruster
2020-10-09 6:34 ` [PULL 14/14] block: Convert 'block_resize' to coroutine Markus Armbruster
2020-10-09 13:47 ` [PULL 00/14] Monitor patches for 2020-10-09 Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r1q3zdv9.fsf@linaro.org \
--to=alex.bennee@linaro.org \
--cc=armbru@redhat.com \
--cc=crosa@redhat.com \
--cc=dovgaluk@ispras.ru \
--cc=f4bug@amsat.org \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.