qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Stefan Berger <stefanb@linux.ibm.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Eric Auger" <eric.auger@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Thomas Huth" <thuth@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: intermittent hang, s390x host, bios-tables-test test, TPM
Date: Wed, 11 Jan 2023 09:05:44 +0000	[thread overview]
Message-ID: <Y7576DXofEJC24q6@redhat.com> (raw)
In-Reply-To: <6a0957eb-6108-82ac-675a-fc20de94a2e0@linux.ibm.com>

On Tue, Jan 10, 2023 at 05:02:58PM -0500, Stefan Berger wrote:
> 
> 
> On 1/10/23 14:47, Stefan Berger wrote:
> > 
> > 
> > On 1/10/23 14:27, Daniel P. Berrangé wrote:
> > > On Tue, Jan 10, 2023 at 01:50:26PM -0500, Stefan Berger wrote:
> > > > 
> > > > 
> > > > On 1/6/23 10:16, Stefan Berger wrote:
> > > > > This here seems to be the root cause. An unknown control channel
> > > > > command was received from the TPM emulator backend by the control channel thread and we end up in g_assert_not_reached().
> > > > > 
> > > > > https://github.com/qemu/qemu/blob/master/tests/qtest/tpm-emu.c#L189
> > > > > 
> > > > > 
> > > > > 
> > > > >           ret = qio_channel_read(ioc, (char *)&cmd, sizeof(cmd), NULL);
> > > > >           if (ret <= 0) {
> > > > >               break;
> > > > >           }
> > > > > 
> > > > >           cmd = be32_to_cpu(cmd);
> > > > >           switch (cmd) {
> > > > >    [...]
> > > > >           default:
> > > > >               g_debug("unimplemented %u", cmd);
> > > > >               g_assert_not_reached();                <------------------
> > > > >           }
> > > > > 
> > > > > I will run this test case in an endless loop on an x86_64 host and see what we get there ...
> > > > 
> > > > I could not recreate the issue running the  test on a ppc64 and x86_64
> > > > host. There we like >100k test runs on ppc64 and >40k on x86_64. Also
> > > > simulating the reception of an unsupported command did not lead to a
> > > > hang like shown here.
> > > 
> > > Assuming your ppc64 host is running an little endian OS, and
> > > we're only seeing the test failure on s390x, then it points towards
> > > the problem being an endianness issue in the TPM code. Something
> > > missing a byteswap somewhere along the way ?
> > 
> > Yes, my ppc64 machine is also little endian. If the issue  was not an intermittent but a permanent
> > failure I would look for something like that. I would think it's more some sort of initialization
> > issue, like a value on the stack that occasionally set to an undesirable value -- maybe even in a
> > dependency.
> 
> I found I still had access to an s390x machine. ~2700 loops on this test case
> so far but nothing... it would be good to be able to recreate the issue and
> apply the fix but we'll have to do it without testing then I guess.
> 
> Does this look about right? From my tests with injecting an error it at least
> seems to do what it is intended to do.
> 
> diff --git a/tests/qtest/tpm-emu.c b/tests/qtest/tpm-emu.c
> index 2994d1cf42..dbc308a572 100644
> --- a/tests/qtest/tpm-emu.c
> +++ b/tests/qtest/tpm-emu.c
> @@ -36,11 +36,19 @@ void tpm_emu_test_wait_cond(TPMTestState *s)
>      g_mutex_unlock(&s->data_mutex);
>  }
> 
> +static void tpm_emu_close_data_ioc(void *ioc)
> +{
> +    g_debug("CLOSE DATA IOC");
> +    qio_channel_close(ioc, NULL);
> +}
> +
>  static void *tpm_emu_tpm_thread(void *data)
>  {
>      TPMTestState *s = data;
>      QIOChannel *ioc = s->tpm_ioc;
> 
> +    qtest_add_abrt_handler(tpm_emu_close_data_ioc, ioc);
> +
>      s->tpm_msg = g_new(struct tpm_hdr, 1);
>      while (true) {
>          int minhlen = sizeof(s->tpm_msg->tag) + sizeof(s->tpm_msg->len);
> @@ -77,12 +85,19 @@ static void *tpm_emu_tpm_thread(void *data)
>                            &error_abort);
>      }
> 
> +    qtest_remove_abrt_handler(ioc);
>      g_free(s->tpm_msg);
>      s->tpm_msg = NULL;
>      object_unref(OBJECT(s->tpm_ioc));
>      return NULL;
>  }
> 
> +static void tpm_emu_close_ctrl_ioc(void *ioc)
> +{
> +    g_debug("CLOSE CTRL IOC");
> +    qio_channel_close(ioc, NULL);
> +}
> +
>  void *tpm_emu_ctrl_thread(void *data)
>  {
>      TPMTestState *s = data;
> @@ -119,6 +134,8 @@ void *tpm_emu_ctrl_thread(void *data)
>          s->emu_tpm_thread = g_thread_new(NULL, tpm_emu_tpm_thread, s);
>      }
> 
> +    qtest_add_abrt_handler(tpm_emu_close_ctrl_ioc, ioc);

I'd suggest this be done before starting tpm_emu_tpm_thread,
immediately after the "ioc" is created. 

> +
>      while (true) {
>          uint32_t cmd;
>          ssize_t ret;
> @@ -129,6 +146,9 @@ void *tpm_emu_ctrl_thread(void *data)
>          }
> 
>          cmd = be32_to_cpu(cmd);
> +        //g_debug("cmd=%u", cmd);
> +        //if (cmd == 14)
> +        //    cmd = 100;
>          switch (cmd) {
>          case CMD_GET_CAPABILITY: {
>              ptm_cap cap = cpu_to_be64(0x3fff);
> @@ -190,6 +210,8 @@ void *tpm_emu_ctrl_thread(void *data)
>          }
>      }
> 
> +    qtest_remove_abrt_handler(ioc);
> +
>      object_unref(OBJECT(ioc));
>      object_unref(OBJECT(lioc));
>      return NULL;
> 
> > 
> >     Stefan
> > 
> > > 
> > > 
> > > With regards,
> > > Daniel
> > 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



  reply	other threads:[~2023-01-11  9:06 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06 12:10 intermittent hang, s390x host, bios-tables-test test, TPM Peter Maydell
2023-01-06 13:53 ` Stefan Berger
2023-01-06 14:04   ` Peter Maydell
2023-01-06 15:16 ` Stefan Berger
2023-01-06 15:39   ` Peter Maydell
2023-01-06 15:58     ` Stefan Berger
2023-01-10 19:25     ` Daniel P. Berrangé
2023-01-10 22:10       ` Peter Maydell
2023-01-10 18:50   ` Stefan Berger
2023-01-10 19:27     ` Daniel P. Berrangé
2023-01-10 19:47       ` Stefan Berger
2023-01-10 22:02         ` Stefan Berger
2023-01-11  9:05           ` Daniel P. Berrangé [this message]
2023-01-11 13:00             ` Stefan Berger
2023-01-10 19:44   ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7576DXofEJC24q6@redhat.com \
    --to=berrange@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=eric.auger@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanb@linux.ibm.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).