From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Stefan Berger <stefanb@linux.ibm.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
"QEMU Developers" <qemu-devel@nongnu.org>,
"Eric Auger" <eric.auger@redhat.com>,
"Alex Bennée" <alex.bennee@linaro.org>,
"Thomas Huth" <thuth@redhat.com>,
"Laurent Vivier" <lvivier@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: intermittent hang, s390x host, bios-tables-test test, TPM
Date: Tue, 10 Jan 2023 19:44:11 +0000
Message-ID: <Y73AC98T0VLDcnj9@redhat.com>
In-Reply-To: <32c53c77-5827-7839-94a1-73003bc3f8af@linux.ibm.com>
On Fri, Jan 06, 2023 at 10:16:36AM -0500, Stefan Berger wrote:
>
>
> On 1/6/23 07:10, Peter Maydell wrote:
> > I'm seeing an intermittent hang on the s390 CI runner in the
> > bios-tables-test test. It looks like we've deadlocked because:
> >
> > * the TPM device is waiting for data on its socket that never arrives,
> > and it's holding the iothread lock
> > * QEMU is therefore not making forward progress;
> > in particular it is unable to handle qtest queries/responses
> > * the test binary thread 1 is waiting to get a response to its
> > qtest command, which is not going to arrive
> > * test binary thread 3 (tpm_emu_ctrl_thread) has hit an
> > assertion and is trying to kill QEMU via qtest_kill_qemu()
> > * qtest_kill_qemu() is only a "SIGTERM and wait", so will wait
> > forever, because QEMU won't respond to the SIGTERM while it's
> > blocked waiting for the TPM device to release the iothread lock
> > * because the ctrl-thread is waiting for QEMU to exit, it's never
> > going to send the data that would unblock the TPM device emulation
> >
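The "SIGTERM and wait" problem described in the last two points can be illustrated with a bounded wait that escalates to SIGKILL. This is only a minimal sketch under stated assumptions: demo_kill_with_timeout() is a hypothetical helper, not a libqtest function, and the 10 ms polling step is an arbitrary choice.

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/*
 * Hypothetical helper (not part of libqtest): ask a child to exit with
 * SIGTERM, poll for up to timeout_ms, then fall back to SIGKILL.
 * Returns true if SIGKILL had to be sent, false otherwise.
 * *status receives the waitpid() status when the child was reaped.
 */
static bool demo_kill_with_timeout(pid_t pid, int timeout_ms, int *status)
{
    const int step_ms = 10;
    const struct timespec step = { .tv_sec = 0, .tv_nsec = step_ms * 1000000L };
    int waited_ms = 0;

    kill(pid, SIGTERM);

    while (waited_ms < timeout_ms) {
        pid_t r = waitpid(pid, status, WNOHANG);
        if (r == pid) {
            return false;           /* child exited after SIGTERM */
        }
        if (r < 0 && errno != EINTR) {
            return false;           /* nothing to wait for (e.g. ECHILD) */
        }
        nanosleep(&step, NULL);     /* child still running, wait a bit */
        waited_ms += step_ms;
    }

    /* The child did not react to SIGTERM in time: force termination. */
    kill(pid, SIGKILL);
    while (waitpid(pid, status, 0) < 0 && errno == EINTR) {
        /* retry if interrupted by a signal */
    }
    return true;
}
```

With something along these lines, a child that is blocked and never processes SIGTERM would still be reaped after the timeout, so the caller could not wait forever.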
> [...]
>
> >
> > Thread 3 (Thread 0x3ff8dafe900 (LWP 2661316)):
> > #0 0x000003ff8e9c6002 in __GI___wait4 (pid=<optimized out>,
> > stat_loc=stat_loc@entry=0x2aa0b42c9bc, options=<optimized out>,
> > usage=usage@entry=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:27
> > #1 0x000003ff8e9c5f72 in __GI___waitpid (pid=<optimized out>,
> > stat_loc=stat_loc@entry=0x2aa0b42c9bc, options=options@entry=0) at
> > waitpid.c:38
> > #2 0x000002aa0952a516 in qtest_wait_qemu (s=0x2aa0b42c9b0) at
> > ../tests/qtest/libqtest.c:206
> > #3 0x000002aa0952a58a in qtest_kill_qemu (s=0x2aa0b42c9b0) at
> > ../tests/qtest/libqtest.c:229
> > #4 0x000003ff8f0c288e in g_hook_list_invoke () from
> > /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #5 <signal handler called>
> > #6 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> > #7 0x000003ff8e9240a2 in __GI_abort () at abort.c:79
> > #8 0x000003ff8f0feda8 in g_assertion_message () from
> > /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #9 0x000003ff8f0fedfe in g_assertion_message_expr () from
> > /lib/s390x-linux-gnu/libglib-2.0.so.0
> > #10 0x000002aa09522904 in tpm_emu_ctrl_thread (data=0x3fff5ffa160) at
> > ../tests/qtest/tpm-emu.c:189
>
> This here seems to be the root cause. An unknown control channel command was received from the TPM emulator backend by the control channel thread and we end up in g_assert_not_reached().
>
> https://github.com/qemu/qemu/blob/master/tests/qtest/tpm-emu.c#L189
>
>
>
>         ret = qio_channel_read(ioc, (char *)&cmd, sizeof(cmd), NULL);
>         if (ret <= 0) {
>             break;
>         }
>
>         cmd = be32_to_cpu(cmd);
>         switch (cmd) {
>         [...]
>         default:
>             g_debug("unimplemented %u", cmd);
>             g_assert_not_reached();    <------------------
>         }
>
> I will run this test case in an endless loop on an x86_64 host and see what we get there ...

The QEMU stack trace shows:

  #7 0x000002aa1224a2ca in tpm_emulator_cancel_cmd (tb=<optimized out>)
     at ../backends/tpm/tpm_emulator.c:500
  #8 0x000002aa121e68c4 in tpm_tis_mmio_write (opaque=0x2aa1529ec20,
     addr=24, val=64, size=<optimized out>)
     at ../hw/tpm/tpm_tis_common.c:663

In other words, we're getting CMD_CANCEL_TPM_CMD, which is indeed not
handled by any 'case:' in the switch in qtest/tpm-emu.c

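To illustrate the point about the missing 'case:', here is a rough, self-contained sketch of a command dispatcher that recognises a cancel command and reports unknown commands to the caller instead of asserting. It is not the code from tests/qtest/tpm-emu.c and not a proposed patch: the DEMO_* names and their numeric values are made up for illustration, and the real command codes are defined in QEMU's TPM headers.

```c
#include <stdint.h>

/*
 * Hypothetical command codes, for illustration only. They are NOT the
 * values used by the real TPM emulator control channel.
 */
enum demo_cmd {
    DEMO_CMD_GET_CAPABILITY = 1,
    DEMO_CMD_CANCEL_TPM_CMD = 2,
};

/* What the control thread should do in response to a command. */
enum demo_action {
    DEMO_REPLY_CAPS,    /* send back a capability word */
    DEMO_REPLY_OK,      /* send back a plain success result */
    DEMO_UNHANDLED,     /* unknown command: let the caller decide */
};

static enum demo_action demo_dispatch(uint32_t cmd)
{
    switch (cmd) {
    case DEMO_CMD_GET_CAPABILITY:
        return DEMO_REPLY_CAPS;
    case DEMO_CMD_CANCEL_TPM_CMD:
        /* Acknowledge the cancel so the peer's blocking read can return. */
        return DEMO_REPLY_OK;
    default:
        /* Report the unknown command rather than aborting the thread. */
        return DEMO_UNHANDLED;
    }
}
```

A caller would map DEMO_REPLY_OK to writing a success result back on the control channel, and could log and close the connection on DEMO_UNHANDLED rather than hitting an assertion while the peer is still blocked on a read.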
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|