From: "Alexander A. Filippov" <a.filippov@yadro.com>
To: Joel Stanley <joel@jms.id.au>
Cc: "Alexander A. Filippov" <a.filippov@yadro.com>,
	Eddie James <eajames@linux.ibm.com>,
	Alexander Amelkin <a.amelkin@yadro.com>,
	Artem Senichev <artemsen@gmail.com>,
	openbmc <openbmc@lists.ozlabs.org>
Subject: Re: The Power9 host booting problem with OpenBMC kernel 5.7.x
Date: Tue, 11 Aug 2020 21:33:14 +0300
Message-ID: <20200811183314.GA26661@bbwork.lan>
In-Reply-To: <CACPK8XdFNpsyzgY8n_3VTxS-Z88bT1pBEXPO+w=dWE6G1fj3jA@mail.gmail.com>

On Tue, Aug 11, 2020 at 06:12:30AM +0000, Joel Stanley wrote:
> On Mon, 10 Aug 2020 at 18:48, Alexander A. Filippov
> <a.filippov@yadro.com> wrote:
> >
> > Since the kernel in OpenBMC was updated to 5.7.x we have a problem with the P9
> > hosts booting.
> > On the host with one Power9 CPU the failure happens while Petitboot is trying
> > to initialize the network, and it leads to host restarts.
> > On the host with two Power9 CPUs the same failure happens during OS boot. It
> > increases boot time, but in the end the host OS starts completely.
> 
> Oh no. I have spent some time testing the 5.7 tree primarily on
> Tacoma, our ast2600/p9 platform. We saw some strange systemd failures,
> where services such as udevd and journald would be killed by systemd's
> watchdog functionality. I did some preliminary debugging but didn't
> find a root cause.
> 
> I have since published a 5.8 based tree that does not suffer from this
> issue. Could you give that a spin on your hardware and see if it
> recreates your issue?
> 
>  https://gerrit.openbmc-project.xyz/c/openbmc/meta-aspeed/+/35315
> 

With kernel 5.8 the host still does not boot.
I've checked on both machines and they give very different results:
 - On the machine with two CPUs the issue still reproduces.
   I see no difference in either the behavior or the logs.
 - On the machine with one CPU the failure is due to the PNOR flash.
   It looks like this:

[ 16:23:27 ] --== Welcome to Hostboot hostboot-9865ef9/hbicore.bin ==--
[ 16:23:27 ] 
[ 16:23:27 ]   5.31049|secure|SecureROM valid - enabling functionality
[ 16:23:30 ]   8.00820|Booting from SBE side 0 on master proc=00050000
[ 16:23:30 ]   8.04587|ISTEP  6. 5 - host_init_fsi
[ 16:23:30 ]   8.21815|ISTEP  6. 6 - host_set_ipl_parms
[ 16:23:30 ]   8.40171|ISTEP  6. 7 - host_discover_targets
[ 16:23:32 ]   9.55142|HWAS|PRESENT> DIMM[03]=A0A0000000000000
[ 16:23:32 ]   9.55144|HWAS|PRESENT> Proc[05]=8000000000000000
[ 16:23:32 ]   9.55145|HWAS|PRESENT> Core[07]=33FFC30000000000
[ 16:23:33 ]  10.38865|ISTEP  6. 8 - host_update_master_tpm
[ 16:23:33 ]  10.41071|SECURE|Security Access Bit> 0x0000000000000000
[ 16:23:33 ]  10.41072|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
[ 16:23:33 ]  10.41089|ISTEP  6. 9 - host_gard
[ 16:23:33 ]  10.68154|HWAS|FUNCTIONAL> DIMM[03]=A0A0000000000000
[ 16:23:33 ]  10.68156|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
[ 16:23:33 ]  10.68157|HWAS|FUNCTIONAL> Core[07]=33FFC30000000000
[ 16:23:33 ]  10.68776|ISTEP  6.11 - host_start_occ_xstop_handler
[ 16:23:34 ]  11.10376|ECC error in PNOR flash in section offset 0x030DF600
[ 16:23:34 ] 
[ 16:23:34 ]  11.10387|System shutting down with error status 0x60F
[ 16:24:52 ] 
[ 16:24:52 ] 
[ 16:24:52 ] --== Welcome to SBE - CommitId[0xc58e8fd0] ==--


   After that the PNOR flash is corrupted, and every subsequent boot attempt
   stops at the 'SBE starting hostboot' stage.
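
For anyone scanning console output like the above, the failing line can be
pulled out mechanically. The following is only a minimal Python sketch: the
function name find_pnor_ecc_errors and the regular expressions are
illustrative assumptions based solely on the console format shown in this
log, not part of Hostboot or OpenBMC. It reports the wall-clock time, the
last ISTEP seen, and the flash offset for each ECC error line.

```python
import re

# Assumed line shape, taken from the console excerpt above:
# "[ 16:23:34 ]  11.10376|ECC error in PNOR flash in section offset 0x030DF600"
LINE_RE = re.compile(r"^\[\s*(\d{2}:\d{2}:\d{2})\s*\]\s+(\d+\.\d+)\|(.*)$")
ECC_RE = re.compile(r"ECC error in PNOR flash in section offset (0x[0-9A-Fa-f]+)")
ISTEP_RE = re.compile(r"^ISTEP\s+(\d+)\.\s*(\d+)\s+-\s+(\S+)")


def find_pnor_ecc_errors(log_text):
    """Return a list of (wall_clock, last_istep, offset) per ECC error line."""
    last_istep = None
    errors = []
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue  # banners and empty timestamp-only lines
        wall_clock, _uptime, message = m.groups()
        istep = ISTEP_RE.match(message)
        if istep:
            # Remember the most recent ISTEP so the error can be placed in the IPL flow.
            last_istep = "{}.{} {}".format(
                int(istep.group(1)), int(istep.group(2)), istep.group(3)
            )
            continue
        ecc = ECC_RE.search(message)
        if ecc:
            errors.append((wall_clock, last_istep, int(ecc.group(1), 16)))
    return errors
```

Feeding it the excerpt above should yield a single entry pointing at the
step after which the ECC error was printed, with the offset as an integer
so it can be compared against a PNOR partition layout.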

I've noticed that kernel 5.8 detects the flash chip incorrectly:
as mx25l51245g instead of mx66l51235f.
This happens on both machines, and I don't understand why it causes problems
on only one of them.

After restoring the previous firmware and power cycling, both machines work fine.

> > So, I have two questions:
> > - Could you please, check if Romulus is also affected by this issue?
> > - Do you have any idea what is going wrong?
> 
> I'll fire up a romulus and see if it reproduces.
> 
> My guess is it's something to do with the timekeeping, irq or rcu
> code. All areas of complexity!
> 
> Thanks for the report.
> 
> Cheers,
> 
> Joel
> 
> > I've attached the tarball with full logs.
> > - poopsy is a system with two Power9 CPU
> > - whoopsy is a system with one Power9 CPU
> >
> > --
> > Regards,
> > Alexander

--
Regards,
Alexander
