From: "Baicar, Tyler" <tbaicar@codeaurora.org>
To: Suzuki K Poulose <Suzuki.Poulose@arm.com>,
christoffer.dall@linaro.org, marc.zyngier@arm.com,
pbonzini@redhat.com, rkrcmar@redhat.com, linux@armlinux.org.uk,
catalin.marinas@arm.com, will.deacon@arm.com, rjw@rjwysocki.net,
lenb@kernel.org, matt@codeblueprint.co.uk,
robert.moore@intel.com, lv.zheng@intel.com, mark.rutland@arm.com,
james.morse@arm.com, akpm@linux-foundation.org,
sandeepa.s.prabhu@gmail.com, shijie.huang@arm.com,
paul.gortmaker@windriver.com, tomasz.nowicki@linaro.org,
fu.wei@linaro.org, rostedt@goodmis.org, bristot@redhat.com,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, Dkvm@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
linux-efi@vger.kernel.org, devel@acpica.org
Cc: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
Subject: Re: [PATCH V3 06/10] acpi: apei: panic OS with fatal error status block
Date: Thu, 13 Oct 2016 17:34:08 -0600
Message-ID: <18205aac-02ae-bd45-2d2d-aa01cf845ae7@codeaurora.org>
In-Reply-To: <bf911628-71be-0dca-f1c7-c12e681bd37f@arm.com>
Hello Suzuki,
On 10/13/2016 7:00 AM, Suzuki K Poulose wrote:
> On 07/10/16 22:31, Tyler Baicar wrote:
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@codeaurora.org>
>>
>> Even if an error status block's severity is fatal, the kernel does not
>> honor the severity level and panic.
>>
>> With the firmware first model, the platform could inform the OS about a
>> fatal hardware error through the non-NMI GHES notification type. The OS
>> should panic when a hardware error record is received with this
>> severity.
>>
>> Call panic() after CPER data in error status block is printed if
>> severity is fatal, before each error section is handled.
>>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
>> ---
>> drivers/acpi/apei/ghes.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 28d5a09..36894c8 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -141,6 +141,8 @@ static unsigned long ghes_estatus_pool_size_request;
>> static struct ghes_estatus_cache
>> *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>> static atomic_t ghes_estatus_cache_alloced;
>>
>> +static int ghes_panic_timeout __read_mostly = 30;
>> +
>> static int ghes_ioremap_init(void)
>> {
>> ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
>> @@ -715,6 +717,12 @@ static int ghes_proc(struct ghes *ghes)
>>          if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
>>              ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>>      }
>> +    if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
>> +        if (panic_timeout == 0)
>> +            panic_timeout = ghes_panic_timeout;
>> +        panic("Fatal hardware error!");
>
> I think there is a chance that we might miss the o/p of
> ghes_print_estatus() as we use no pfx, and it could default to the
> normal loglevel and would never get printed if panic() is encountered
> before it. On the other hand, there is already a __ghes_panic() which
> does similar stuff. Is there a way we could reuse (may be even parts
> of) it ? Or at least use KERN_EMERG for the ghes_print_estatus(), if
> the severity could result in panic() ?

__ghes_panic() does additional handling which we do not want to do here.
I could make the following a helper function so it is not duplicated, though:

    if (panic_timeout == 0)
        panic_timeout = ghes_panic_timeout;
    panic("Fatal hardware error!");
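
To make the helper idea concrete, here is a minimal userspace sketch of what such a function could look like. The name ghes_fatal_panic() is hypothetical and does not come from the patch; panic_timeout, ghes_panic_timeout and the panic routine are stand-ins so the snippet compiles outside the kernel. In particular, mock_panic() returns, which the real panic() never does.

```c
/* Userspace stand-ins for kernel symbols, so the sketch compiles alone. */
static int panic_timeout;            /* models the kernel's global panic_timeout */
static int ghes_panic_timeout = 30;  /* default taken from the quoted patch */
static const char *last_panic_msg;   /* records what the mock panic was given */

/* Mock only: the real panic() does not return. */
static void mock_panic(const char *msg)
{
	last_panic_msg = msg;
}

/* Hypothetical helper name; the patch itself open-codes these lines. */
static void ghes_fatal_panic(void)
{
	/*
	 * A panic_timeout of 0 means "do not reboot automatically", so
	 * fall back to the GHES default only in that case and leave any
	 * explicitly configured value alone.
	 */
	if (panic_timeout == 0)
		panic_timeout = ghes_panic_timeout;
	mock_panic("Fatal hardware error!");
}
```

Both the new check in ghes_proc() and the existing __ghes_panic() path could then call the one helper, which would keep the timeout policy in a single place.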

The pfx is actually being calculated already in __ghes_print_estatus():

    if (pfx == NULL) {
        if (ghes_severity(estatus->error_severity) <=
            GHES_SEV_CORRECTED)
            pfx = KERN_WARNING;
        else
            pfx = KERN_ERR;
    }

From ghes.h:

    enum {
        GHES_SEV_NO = 0x0,
        GHES_SEV_CORRECTED = 0x1,
        GHES_SEV_RECOVERABLE = 0x2,
        GHES_SEV_PANIC = 0x3,
    };

Since GHES_SEV_PANIC (0x3) is greater than GHES_SEV_CORRECTED (0x1), the
else branch is taken and the pfx will be KERN_ERR in the panic case.
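
As a quick check of that mapping, the sketch below models the pfx == NULL branch in userspace. The helper name ghes_default_pfx() is hypothetical, and the KERN_* strings are illustrative stand-ins rather than the kernel's real definitions. It takes the already-mapped GHES severity, i.e. the value ghes_severity() would return.

```c
/* Stand-ins for the kernel's log-level prefixes (illustrative values). */
#define KERN_ERR     "<3>"
#define KERN_WARNING "<4>"

/* Severity levels as quoted from ghes.h above. */
enum {
	GHES_SEV_NO = 0x0,
	GHES_SEV_CORRECTED = 0x1,
	GHES_SEV_RECOVERABLE = 0x2,
	GHES_SEV_PANIC = 0x3,
};

/* Hypothetical helper mirroring the pfx == NULL branch quoted above. */
static const char *ghes_default_pfx(int sev)
{
	/* No error and corrected errors are logged as warnings. */
	if (sev <= GHES_SEV_CORRECTED)
		return KERN_WARNING;
	/* Recoverable and panic severities are both logged as errors. */
	return KERN_ERR;
}
```

With this model, GHES_SEV_NO and GHES_SEV_CORRECTED map to the warning prefix, while GHES_SEV_RECOVERABLE and GHES_SEV_PANIC both map to the error prefix. Whether KERN_ERR is enough to reach the console before panic() still depends on the configured console loglevel, which is the concern raised about KERN_EMERG.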
Thanks,
Tyler
>
> Cheers
> Suzuki
>
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.