qemu-devel.nongnu.org archive mirror
From: David Gibson <david@gibson.dropbear.id.au>
To: Nathan Chancellor <natechancellor@gmail.com>
Cc: qemu-devel@nongnu.org, "Nicholas Piggin" <npiggin@gmail.com>,
	clang-built-linux@googlegroups.com,
	"Cédric Le Goater" <clg@fr.ibm.com>,
	qemu-ppc@nongnu.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: Boot flakiness with QEMU 3.1.0 and Clang built kernels
Date: Tue, 14 Apr 2020 14:40:10 +1000
Message-ID: <20200414044010.GK48061@umbus.fritz.box>
In-Reply-To: <20200414040515.GA22855@ubuntu-s3-xlarge-x86>


On Mon, Apr 13, 2020 at 09:05:15PM -0700, Nathan Chancellor wrote:
> On Tue, Apr 14, 2020 at 12:05:53PM +1000, David Gibson wrote:
> > On Sat, Apr 11, 2020 at 11:57:23PM +1000, Nicholas Piggin wrote:
> > > Nicholas Piggin's on April 11, 2020 7:32 pm:
> > > > Nathan Chancellor's on April 11, 2020 10:53 am:
> > > >> The tt.config values are needed to reproduce but I did not verify that
> > > >> ONLY tt.config was needed. Other than that, no, we are just building
> > > >> either pseries_defconfig or powernv_defconfig with those configs and
> > > >> letting it boot up with a simple initramfs, which prints the version
> > > >> string then shuts the machine down.
> > > >> 
> > > >> Let me know if you need any more information, cheers!
> > > > 
> > > > Okay I can reproduce it. Sometimes it eventually recovers after a long
> > > > pause, and some keyboard input often helps it along. So that seems like 
> > > > it might be a lost interrupt.
> > > > 
> > > > POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
> > > > sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
> > > > needed your other config with various other debug options.
> > > > 
> > > > Thanks for the very good report. I'll let you know what I find.
> > > 
> > > It looks like a qemu bug. Booting with '-d int' shows the decrementer 
> > > simply stops firing at the point of the hang, even though MSR[EE]=1 and 
> > > the DEC register is wrapping. Linux appears to be doing the right thing 
> > > as far as I can tell (not losing interrupts).
> > > 
> > > This qemu patch fixes the boot hang for me. I don't know that qemu 
> > > really has the right idea of "context synchronizing" as defined in the
> > > powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> > > does not mean it can avoid looking at exceptions until the next such
> > > event. It looks like the decrementer exception goes high but the
> > > execution of mtmsrd L=1 is ignoring it.
> > > 
> > > Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> > > code would return with an 'rfi' instruction as part of interrupt return,
> > > which probably helped to get things moving along a bit. However it would
> > > not be foolproof, and Cedric did say he encountered some mysterious
> > > lockups under load with qemu powernv before that patch was merged, so
> > > maybe it's the same issue?
> > > 
> > > Thanks,
> > > Nick
> > > 
> > > The patch is a bit of a hack, but if you can run it and verify that it
> > > fixes your boot hang, that would be good.
> > 
> > So a bug in this handling wouldn't surprise me at all.  However a
> > report against QEMU 3.1 isn't particularly useful.
> > 
> >  * Does the problem occur with current upstream master qemu?
> 
> Yes, I can reproduce the hang on 5.0.0-rc2.

Ok.

Nick, can you polish up your fix and submit it upstream in the usual
fashion soon?

> >  * Does the problem occur with qemu-2.12 (a pretty widely deployed
> >    "stable" qemu, e.g. in RHEL)?
> 
> No idea but I would assume so. I might have time later this week to test
> but I assume it is kind of irrelevant if it is reproducible at ToT.
> 
> > > ---
> > > 
> > > diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> > > index b207fb5386..1d997f5c32 100644
> > > --- a/target/ppc/translate.c
> > > +++ b/target/ppc/translate.c
> > > @@ -4364,12 +4364,21 @@ static void gen_mtmsrd(DisasContext *ctx)
> > >      if (ctx->opcode & 0x00010000) {
> > >          /* Special form that does not need any synchronisation */
> > >          TCGv t0 = tcg_temp_new();
> > > +        TCGv t1 = tcg_temp_new();
> > >          tcg_gen_andi_tl(t0, cpu_gpr[rS(ctx->opcode)],
> > >                          (1 << MSR_RI) | (1 << MSR_EE));
> > > -        tcg_gen_andi_tl(cpu_msr, cpu_msr,
> > > +        tcg_gen_andi_tl(t1, cpu_msr,
> > >                          ~(target_ulong)((1 << MSR_RI) | (1 << MSR_EE)));
> > > -        tcg_gen_or_tl(cpu_msr, cpu_msr, t0);
> > > +        tcg_gen_or_tl(t1, t1, t0);
> > > +
> > > +        gen_update_nip(ctx, ctx->base.pc_next);
> > > +        gen_helper_store_msr(cpu_env, t1);
> > >          tcg_temp_free(t0);
> > > +        tcg_temp_free(t1);
> > > +        /* Must stop the translation as machine state (may have) changed */
> > > +        /* Note that mtmsr is not always defined as context-synchronizing */
> > > +        gen_stop_exception(ctx);
> > > +
> > >      } else {
> > >          /*
> > >           * XXX: we need to update nip before the store if we enter
> > > 
> > 
> 
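
For anyone reading the hunk above without the QEMU tree to hand, here is
a minimal sketch of what the L=1 path computes and why the ordering
matters. It is a toy model, not QEMU code: struct toy_cpu,
merge_msr_l1(), check_pending() and the two mtmsrd_l1_*() functions are
hypothetical names made up for illustration. Only the bit positions
(MSR_EE = 15, MSR_RI = 1, counted from the least significant bit) and
the mask expression are taken from the diff and QEMU's definitions. The
assumption being modelled is the one described in the thread: the old
path writes MSR in place and keeps executing, while the patched path
stores the same value and then gives pending exceptions a chance to be
noticed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions counted from the least significant bit, as in the diff. */
#define MSR_EE 15 /* external interrupt enable */
#define MSR_RI 1  /* recoverable interrupt */
#define L1_MASK ((UINT64_C(1) << MSR_RI) | (UINT64_C(1) << MSR_EE))

/* Hypothetical toy CPU state; this is not QEMU's CPUPPCState. */
struct toy_cpu {
    uint64_t msr;
    bool dec_pending;   /* decrementer exception line is high */
    unsigned delivered; /* number of interrupts actually taken */
};

/* mtmsrd L=1: take only EE and RI from rs, preserve every other MSR bit. */
static uint64_t merge_msr_l1(uint64_t msr, uint64_t rs)
{
    return (msr & ~L1_MASK) | (rs & L1_MASK);
}

/* Stand-in for the point where pending exceptions are re-examined. */
static void check_pending(struct toy_cpu *cpu)
{
    if (cpu->dec_pending && (cpu->msr & (UINT64_C(1) << MSR_EE))) {
        cpu->dec_pending = false;
        cpu->delivered++;
    }
}

/* Model of the old path: update MSR in place and carry on. */
static void mtmsrd_l1_old(struct toy_cpu *cpu, uint64_t rs)
{
    cpu->msr = merge_msr_l1(cpu->msr, rs);
}

/* Model of the patched path: same MSR value, followed by a recheck. */
static void mtmsrd_l1_patched(struct toy_cpu *cpu, uint64_t rs)
{
    cpu->msr = merge_msr_l1(cpu->msr, rs);
    check_pending(cpu);
}
```

With dec_pending set and EE clear, calling mtmsrd_l1_old() with EE set
in rs leaves the exception pending and delivered at zero, which is the
shape of the hang reported here. mtmsrd_l1_patched() with the same
inputs delivers it once. How real QEMU gets from the MSR store to
actually taking the interrupt is more involved than check_pending()
suggests.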

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


