From: Geoff Levand <geoff@infradead.org>
To: James Morse <james.morse@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>,
Riku Voipio <riku.voipio@linaro.org>,
Sudeep Holla <sudeep.holla@arm.com>,
linux-acpi@vger.kernel.org, Hanjun Guo <hanjun.guo@linaro.org>,
Mark Salter <msalter@redhat.com>,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64/acpi: Add fixup for HPE m400 quirks
Date: Fri, 15 Jun 2018 10:17:18 -0700 [thread overview]
Message-ID: <8a3034b9-6cf3-5182-717f-dd1dc8a087aa@infradead.org> (raw)
In-Reply-To: <098e6d53-8dc7-439f-7165-adbe0e7c4941@arm.com>
Hi James,
Just for background, this is a well known bug in the m400's AEPI/HEST
firmware. There are a number of fixes out there the different distros
have. I just put together this patch to unify things and have a
common 'upstream' fix.
On 06/15/2018 04:14 AM, James Morse wrote:
> On 13/06/18 19:22, Geoff Levand wrote:
>> Adds a new ACPI init routine acpi_fixup_m400_quirks that adds
>> a work-around for HPE ProLiant m400 APEI firmware problems.
>>
>> The work-around disables APEI when CONFIG_ACPI_APEI is set and
>> m400 firmware is detected. Without this fixup m400 systems
>> experience errors like these on startup:
>>
>> [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
>> [Hardware Error]: event severity: fatal
>> [Hardware Error]: Error 0, type: fatal
>> [Hardware Error]: section_type: memory error
>> [Hardware Error]: error_status: 0x0000000000001300
>
> "Access to a memory address which is not mapped to any component"
>
>
>> [Hardware Error]: error_type: 10, invalid address
>> Kernel panic - not syncing: Fatal hardware error!
>
> Why is this a problem?
>
> Surely this is a valid description of an error.
The firmware bug causes this failure, not bad hardware.
> (okay its not particularly useful without the physical address, but the address
> is optional in that structure)
>
> When does this happen during boot? This looks like a driver mapping some
> non-existent physical address space to see if its device is present...
> unsurprisingly this doesn't go well.
> (might also be a typo in the DSDT)
>
> Can't we pin down the driver that does this and fix it. Its either wrong for
> everyone, or still broken after you disable APEI.
>
>
>> It seems unlikely there will be any m400 firmware updates to fix
>> this problem.
>
> What is the problem? This patch looks like it shoots the messenger for bringing
> bad news.
The news is incorrect, so this patch disables the source (APEI code).
>> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
>> index 7b09487ff8fb..3c315c2c7476 100644
>> --- a/arch/arm64/kernel/acpi.c
>> +++ b/arch/arm64/kernel/acpi.c
>> @@ -31,6 +31,8 @@
>> #include <asm/cpu_ops.h>
>> #include <asm/smp_plat.h>
>>
>> +#include <acpi/apei.h>
>> +
>> #ifdef CONFIG_ACPI_APEI
>> # include <linux/efi.h>
>> # include <asm/pgtable.h>
>> @@ -177,6 +179,33 @@ static int __init acpi_fadt_sanity_check(void)
>> return ret;
>> }
>>
>> +/*
>> + * acpi_fixup_m400_quirks - Work-around for HPE ProLiant m400 APEI firmware
>> + * problems.
>> + */
>> +static void __init acpi_fixup_m400_quirks(void)
>> +{
>> + acpi_status status;
>> + struct acpi_table_header *header;
>> +#if !defined(CONFIG_ACPI_APEI)
>> + int hest_disable = HEST_DISABLED;
>> +#endif
>
> Yuck.
Yes, unfortunately, the hest code conditionally defines hest_disable.
>> +
>> + if (!IS_ENABLED(CONFIG_ACPI_APEI) || hest_disable != HEST_ENABLED)
>> + return;
>> +
>> + status = acpi_get_table(ACPI_SIG_HEST, 0, &header);
>> +
>> + if (ACPI_SUCCESS(status) && !strncmp(header->oem_id, "HPE ", 6) &&
>> + !strncmp(header->oem_table_id, "ProLiant", 8) &&
>
> You should match the affected range of OEM table revisions too, that way a
> firmware upgrade should start working, instead of being permanently disabled
> because we think its unlikely.
The m400 has reached end of life. No one really expects to see any firmware
update. I don't know what the effected OEM table revisions are, and I don't
think there is an active platform maintainer who could give that info either.
If someone can provide the info. I'll update the fix.
>> + MIDR_IMPLEMENTOR(read_cpuid_id()) == ARM_CPU_IMP_APM) {
>
> How is the CPU implementer relevant?
That was just a copy of what other fixes had. Should I remove it?
> You suggest a firmware-update would make this issue go away...
>
>
>> + hest_disable = HEST_DISABLED;
>> + pr_info("Disabled APEI for m400.\n");
>> + }
>> +
>> + acpi_put_table(header);
>> +}
>> +
>> /*
>> * acpi_boot_table_init() called from setup_arch(), always.
>> * 1. find RSDP and get its address, and then find XSDT
>
> Nothing arch-specific here. You're adding this to arch/arm64 because
> drivers/acpi/apei doesn't have an existing quirks table?
There was a fix submitted that had it in drivers/acpi/scan.c, but the
ACPI maintainer said he didn't want the fix in the main ACPI code.
See:
https://lkml.org/lkml/2018/4/19/1020 (ACPI / scan: Fix regression related to X-Gene UARTs)
The m400 is an arm64 platform, so it seems most appropriate to
have it in arch/arm64/kernel/acpi.c. I followed what was done
for x86 quirks (into arch/x86/kernel/acpi/boot.c), and what was
suggested here:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900581 (linux: Enable Buster kernel features for newer ARM64 servers)
Thanks for the review.
-Geoff
next prev parent reply other threads:[~2018-06-15 17:17 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-13 18:22 [PATCH] arm64/acpi: Add fixup for HPE m400 quirks Geoff Levand
2018-06-15 8:47 ` Riku Voipio
2018-06-15 9:51 ` Graeme Gregory
2018-06-15 11:14 ` James Morse
2018-06-15 17:17 ` Geoff Levand [this message]
2018-06-15 17:33 ` Mark Salter
2018-06-15 18:15 ` Geoff Levand
2018-06-15 19:14 ` Mark Salter
2018-06-18 16:18 ` James Morse
2018-06-18 18:04 ` Geoff Levand
2018-06-18 22:18 ` Mark Salter
2018-06-19 10:21 ` James Morse
2018-06-22 15:19 ` Mark Salter
2018-06-25 15:34 ` Mark Salter
2018-06-26 14:51 ` James Morse
2018-06-26 20:20 ` Mark Salter
2018-06-27 8:48 ` Ard Biesheuvel
2018-06-27 12:25 ` Mark Salter
2018-07-03 9:30 ` Ian Campbell
2018-07-03 15:20 ` Mark Salter
2018-06-28 10:06 ` James Morse
2018-06-29 13:05 ` Mark Salter
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8a3034b9-6cf3-5182-717f-dd1dc8a087aa@infradead.org \
--to=geoff@infradead.org \
--cc=hanjun.guo@linaro.org \
--cc=james.morse@arm.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=lorenzo.pieralisi@arm.com \
--cc=msalter@redhat.com \
--cc=riku.voipio@linaro.org \
--cc=sudeep.holla@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).