From: Mark Salter <msalter@redhat.com>
To: James Morse <james.morse@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
Geoff Levand <geoff@infradead.org>,
Riku Voipio <riku.voipio@linaro.org>,
linux-acpi@vger.kernel.org, Hanjun Guo <hanjun.guo@linaro.org>,
Sudeep Holla <sudeep.holla@arm.com>,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/acpi: Add fixup for HPE m400 quirks
Date: Tue, 26 Jun 2018 16:20:26 -0400
Message-ID: <257bbf8d90669921cede5b2e7555b9523311b795.camel@redhat.com>
In-Reply-To: <950b3034-08a8-38b9-b8f9-514d3e2519fa@arm.com>
On Tue, 2018-06-26 at 15:51 +0100, James Morse wrote:
> Hi Mark,
>
> Thanks for shedding some light on what is going on here!
>
> On 25/06/18 16:34, Mark Salter wrote:
> > On Fri, 2018-06-22 at 11:19 -0400, Mark Salter wrote:
> > > I'm going to hack something to get to the ghes info earlier in boot and
> > > check the things you mention above wrt Error Status Block and GHES.0.
> >
> > So I had to end up instrumenting the EFI stub to see where the error came
> > from. At the start of the stub, there is no GHES.2 error. The error first
> > shows up after the stub's call to ExitBootServices returns.
>
> What's the notification type of GHES.2? I'm guessing POLLed or some kind of IRQ.
SCI
Here's the HEST entry:
[028h 0040   2]            Subtable Type : 0009 [Generic Hardware Error Source]
[02Ah 0042   2]                Source Id : 0002
[02Ch 0044   2]        Related Source Id : FFFF
[02Eh 0046   1]                 Reserved : 00
[02Fh 0047   1]                  Enabled : 01
[030h 0048   4]   Records To Preallocate : 00000001
[034h 0052   4]  Max Sections Per Record : 00000001
[038h 0056   4]      Max Raw Data Length : 00000AEC
[03Ch 0060  12]     Error Status Address : [Generic Address Structure]
[03Ch 0060   1]                 Space ID : 00 [SystemMemory]
[03Dh 0061   1]                Bit Width : 40
[03Eh 0062   1]               Bit Offset : 00
[03Fh 0063   1]     Encoded Access Width : 04 [QWord Access:64]
[040h 0064   8]                  Address : 0000004FF7E9F0E0
There are 9 others all identical except for Source ID and address.
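
For anyone reading the dump by hand: the Error Status Address is a 12-byte ACPI Generic Address Structure, and the values are hex (Bit Width 40 is 64 bits). Below is a minimal userspace sketch of decoding those 12 bytes. The names struct gas, gas_decode() and gas_access_bits() are made up for illustration and are not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the 12-byte Generic Address Structure. */
struct gas {
	uint8_t space_id;      /* 0 = SystemMemory */
	uint8_t bit_width;     /* register width in bits */
	uint8_t bit_offset;    /* bit offset within the register */
	uint8_t access_width;  /* encoded: 1=byte, 2=word, 3=dword, 4=qword */
	uint64_t address;      /* address of the register */
};

/* Decode the structure from raw little-endian table bytes. */
static struct gas gas_decode(const uint8_t raw[12])
{
	struct gas g;
	int i;

	assert(raw != NULL);
	g.space_id = raw[0];
	g.bit_width = raw[1];
	g.bit_offset = raw[2];
	g.access_width = raw[3];
	g.address = 0;
	/* Bytes 4..11 hold the address, least significant byte first. */
	for (i = 7; i >= 0; i--)
		g.address = (g.address << 8) | raw[4 + i];
	return g;
}

/* Access size in bits; returns 0 for the "undefined" or unknown encodings. */
static unsigned int gas_access_bits(const struct gas *g)
{
	if (g->access_width < 1 || g->access_width > 4)
		return 0;
	return 8u << (g->access_width - 1);
}
```

Feeding it the bytes from the entry above (00 40 00 04, then the address E0 F0 E9 F7 4F 00 00 00) gives a 64-bit SystemMemory register at 0x4FF7E9F0E0 with qword access.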
> These systems don't have EL3, so the CPU must continue running while something
> external generates the CPER records. The records being visible is the last point
> the faulty-access could have been made, with the window of time depending on how
> fast this external-thing receives and processes the error.
There's a System Control Processor (SLIMpro) on the SoC which can interact with
the CPU in various ways and which has access to memory and other hardware.
>
>
> > So it looks
> > like the firmware itself is causing the error. There's still a chance that
> > the stub is doing something wrong with the memory map passed to the
> > firmware, so I'll try to eliminate that as well.
>
> adding delay loops will help prove the EFIStub is innocent.
Didn't change anything.
>
> Are there any optional drivers being loaded by UEFI? (can you remove any USB
> mass storage drives for instance).
The only storage is PCI-based. There is a USB port, but it doesn't look like
anything is attached to it. I don't have physical access to the machine. It is
one of many Moonshot cartridges in a chassis several hundred miles away.
>
> Are redhat able to rebuild UEFI on these systems? (Can it be fixed?)
No.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1285107 is about the m400
> description of the GIC, comments 15 and 16 show a UEFI patch to something other
> than the upstream platforms tree[0], and new firmware being tested.
> (although this may be wishful thinking)
HPE responded to bug reports until the m400 reached EOL. They have been pretty
clear that there will be no more firmware updates.
>
> It looks like quirking this based on the DMI platform name and UEFI version will
> be what we need. We could discard anything in the error status block areas at
> ghes_probe() time based on this quirk, but we may have missed other problems
> during boot, giving a false sense of security.
>
>
> Thanks,
>
> James
>
>
> [0] Might be wrong, but this is where I look:
> https://github.com/tianocore/edk2-platforms.git
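
To make the quirk idea quoted above concrete, here is a rough userspace sketch of matching on platform name plus firmware version and, only on a match, discarding a stale error status word. Everything in it is an assumption for illustration: the product and firmware strings are placeholders rather than the real m400 DMI values, and fw_quirk_match() and discard_stale_status() are hypothetical helpers. In the kernel this would more likely be a struct dmi_system_id table checked with dmi_check_system().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical quirk entry: both strings must match exactly. */
struct fw_quirk {
	const char *product;     /* DMI product name */
	const char *fw_version;  /* firmware (UEFI/BIOS) version string */
};

/* Placeholder values only; the real DMI strings are not given in this thread. */
static const struct fw_quirk quirks[] = {
	{ "EXAMPLE-PRODUCT", "EXAMPLE-FW-1.0" },
};

/* Return true if this platform/firmware pair is on the quirk list. */
static bool fw_quirk_match(const char *product, const char *fw_version)
{
	size_t i;

	if (!product || !fw_version)
		return false;
	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		if (!strcmp(product, quirks[i].product) &&
		    !strcmp(fw_version, quirks[i].fw_version))
			return true;
	}
	return false;
}

/*
 * Clear a non-zero status word at probe time, but only on quirked
 * platforms. Returns true if something was discarded, so the caller
 * can log it instead of dropping the error silently.
 */
static bool discard_stale_status(uint32_t *block_status, bool quirked)
{
	if (!quirked || *block_status == 0)
		return false;
	*block_status = 0;
	return true;
}
```

Returning whether anything was discarded leaves room for a boot-time message, which goes some way toward the concern about a false sense of security.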