All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Stefan Berger <stefanb@linux.ibm.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"QEMU Developers" <qemu-devel@nongnu.org>,
	"Eric Auger" <eric.auger@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Thomas Huth" <thuth@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: intermittent hang, s390x host, bios-tables-test test, TPM
Date: Wed, 11 Jan 2023 09:05:44 +0000	[thread overview]
Message-ID: <Y7576DXofEJC24q6@redhat.com> (raw)
In-Reply-To: <6a0957eb-6108-82ac-675a-fc20de94a2e0@linux.ibm.com>

On Tue, Jan 10, 2023 at 05:02:58PM -0500, Stefan Berger wrote:
> 
> 
> On 1/10/23 14:47, Stefan Berger wrote:
> > 
> > 
> > On 1/10/23 14:27, Daniel P. Berrangé wrote:
> > > On Tue, Jan 10, 2023 at 01:50:26PM -0500, Stefan Berger wrote:
> > > > 
> > > > 
> > > > On 1/6/23 10:16, Stefan Berger wrote:
> > > > > This here seems to be the root cause. An unknown control channel
> > > > > command was received from the TPM emulator backend by the control channel thread and we end up in g_assert_not_reached().
> > > > > 
> > > > > https://github.com/qemu/qemu/blob/master/tests/qtest/tpm-emu.c#L189
> > > > > 
> > > > > 
> > > > > 
> > > > >           ret = qio_channel_read(ioc, (char *)&cmd, sizeof(cmd), NULL);
> > > > >           if (ret <= 0) {
> > > > >               break;
> > > > >           }
> > > > > 
> > > > >           cmd = be32_to_cpu(cmd);
> > > > >           switch (cmd) {
> > > > >    [...]
> > > > >           default:
> > > > >               g_debug("unimplemented %u", cmd);
> > > > >               g_assert_not_reached();                <------------------
> > > > >           }
> > > > > 
> > > > > I will run this test case in an endless loop on an x86_64 host and see what we get there ...
> > > > 
> > > > I could not recreate the issue running the  test on a ppc64 and x86_64
> > > > host. There we like >100k test runs on ppc64 and >40k on x86_64. Also
> > > > simulating the reception of an unsupported command did not lead to a
> > > > hang like shown here.
> > > 
> > > Assuming your ppc64 host is running an little endian OS, and
> > > we're only seeing the test failure on s390x, then it points towards
> > > the problem being an endianness issue in the TPM code. Something
> > > missing a byteswap somewhere along the way ?
> > 
> > Yes, my ppc64 machine is also little endian. If the issue  was not an intermittent but a permanent
> > failure I would look for something like that. I would think it's more some sort of initialization
> > issue, like a value on the stack that occasionally set to an undesirable value -- maybe even in a
> > dependency.
> 
> I found I still had access to an s390x machine. ~2700 loops on this test case
> so far but nothing... it would be good to be able to recreate the issue and
> apply the fix but we'll have to do it without testing then I guess.
> 
> Does this look about right? From my tests with injecting an error it at least
> seems to do what it is intended to do.
> 
> diff --git a/tests/qtest/tpm-emu.c b/tests/qtest/tpm-emu.c
> index 2994d1cf42..dbc308a572 100644
> --- a/tests/qtest/tpm-emu.c
> +++ b/tests/qtest/tpm-emu.c
> @@ -36,11 +36,19 @@ void tpm_emu_test_wait_cond(TPMTestState *s)
>      g_mutex_unlock(&s->data_mutex);
>  }
> 
> +static void tpm_emu_close_data_ioc(void *ioc)
> +{
> +    g_debug("CLOSE DATA IOC");
> +    qio_channel_close(ioc, NULL);
> +}
> +
>  static void *tpm_emu_tpm_thread(void *data)
>  {
>      TPMTestState *s = data;
>      QIOChannel *ioc = s->tpm_ioc;
> 
> +    qtest_add_abrt_handler(tpm_emu_close_data_ioc, ioc);
> +
>      s->tpm_msg = g_new(struct tpm_hdr, 1);
>      while (true) {
>          int minhlen = sizeof(s->tpm_msg->tag) + sizeof(s->tpm_msg->len);
> @@ -77,12 +85,19 @@ static void *tpm_emu_tpm_thread(void *data)
>                            &error_abort);
>      }
> 
> +    qtest_remove_abrt_handler(ioc);
>      g_free(s->tpm_msg);
>      s->tpm_msg = NULL;
>      object_unref(OBJECT(s->tpm_ioc));
>      return NULL;
>  }
> 
> +static void tpm_emu_close_ctrl_ioc(void *ioc)
> +{
> +    g_debug("CLOSE CTRL IOC");
> +    qio_channel_close(ioc, NULL);
> +}
> +
>  void *tpm_emu_ctrl_thread(void *data)
>  {
>      TPMTestState *s = data;
> @@ -119,6 +134,8 @@ void *tpm_emu_ctrl_thread(void *data)
>          s->emu_tpm_thread = g_thread_new(NULL, tpm_emu_tpm_thread, s);
>      }
> 
> +    qtest_add_abrt_handler(tpm_emu_close_ctrl_ioc, ioc);

I'd suggest this be done before starting tpm_emu_tpm_thread,
immediately after the "ioc" is created. 

> +
>      while (true) {
>          uint32_t cmd;
>          ssize_t ret;
> @@ -129,6 +146,9 @@ void *tpm_emu_ctrl_thread(void *data)
>          }
> 
>          cmd = be32_to_cpu(cmd);
> +        //g_debug("cmd=%u", cmd);
> +        //if (cmd == 14)
> +        //    cmd = 100;
>          switch (cmd) {
>          case CMD_GET_CAPABILITY: {
>              ptm_cap cap = cpu_to_be64(0x3fff);
> @@ -190,6 +210,8 @@ void *tpm_emu_ctrl_thread(void *data)
>          }
>      }
> 
> +    qtest_remove_abrt_handler(ioc);
> +
>      object_unref(OBJECT(ioc));
>      object_unref(OBJECT(lioc));
>      return NULL;
> 
> > 
> >     Stefan
> > 
> > > 
> > > 
> > > With regards,
> > > Daniel
> > 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



  reply	other threads:[~2023-01-11  9:06 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-06 12:10 intermittent hang, s390x host, bios-tables-test test, TPM Peter Maydell
2023-01-06 13:53 ` Stefan Berger
2023-01-06 14:04   ` Peter Maydell
2023-01-06 15:16 ` Stefan Berger
2023-01-06 15:39   ` Peter Maydell
2023-01-06 15:58     ` Stefan Berger
2023-01-10 19:25     ` Daniel P. Berrangé
2023-01-10 22:10       ` Peter Maydell
2023-01-10 18:50   ` Stefan Berger
2023-01-10 19:27     ` Daniel P. Berrangé
2023-01-10 19:47       ` Stefan Berger
2023-01-10 22:02         ` Stefan Berger
2023-01-11  9:05           ` Daniel P. Berrangé [this message]
2023-01-11 13:00             ` Stefan Berger
2023-01-10 19:44   ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y7576DXofEJC24q6@redhat.com \
    --to=berrange@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=eric.auger@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanb@linux.ibm.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.