From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751959AbdBOMNW (ORCPT ); Wed, 15 Feb 2017 07:13:22 -0500
Received: from foss.arm.com ([217.140.101.70]:54902 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751359AbdBOMNS (ORCPT ); Wed, 15 Feb 2017 07:13:18 -0500
Message-ID: <58A445D5.7030501@arm.com>
Date: Wed, 15 Feb 2017 12:13:09 +0000
From: James Morse
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.6.0
MIME-Version: 1.0
To: "Baicar, Tyler" , zjzhang@codeaurora.org
CC: christoffer.dall@linaro.org, marc.zyngier@arm.com, pbonzini@redhat.com, rkrcmar@redhat.com, linux@armlinux.org.uk, catalin.marinas@arm.com, will.deacon@arm.com, rjw@rjwysocki.net, lenb@kernel.org, matt@codeblueprint.co.uk, robert.moore@intel.com, lv.zheng@intel.com, nkaje@codeaurora.org, mark.rutland@arm.com, akpm@linux-foundation.org, eun.taik.lee@samsung.com, sandeepa.s.prabhu@gmail.com, labbott@redhat.com, shijie.huang@arm.com, rruigrok@codeaurora.org, paul.gortmaker@windriver.com, tn@semihalf.com, fu.wei@linaro.org, rostedt@goodmis.org, bristot@redhat.com, linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, linux-efi@vger.kernel.org, devel@acpica.org, Suzuki.Poulose@arm.com, punit.agrawal@arm.com, astone@redhat.com, harba@codeaurora.org, hanjun.guo@linaro.org, john.garry@huawei.com, shiju.jose@huawei.com
Subject: Re: [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block
References: <1485969413-23577-1-git-send-email-tbaicar@codeaurora.org> <1485969413-23577-7-git-send-email-tbaicar@codeaurora.org> <589C490A.9080109@arm.com> <5b06372d-e389-5157-ccb4-a7b023990d4d@codeaurora.org>
In-Reply-To: <5b06372d-e389-5157-ccb4-a7b023990d4d@codeaurora.org>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender:
linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Tyler,

On 13/02/17 22:45, Baicar, Tyler wrote:
> On 2/9/2017 3:48 AM, James Morse wrote:
>> On 01/02/17 17:16, Tyler Baicar wrote:
>>> From: "Jonathan (Zhixiong) Zhang"
>>>
>>> Even if an error status block's severity is fatal, the kernel does not
>>> honor the severity level and panic.
>>>
>>> With the firmware first model, the platform could inform the OS about a
>>> fatal hardware error through the non-NMI GHES notification type. The OS
>>> should panic when a hardware error record is received with this
>>> severity.
>>>
>>> Call panic() after CPER data in error status block is printed if
>>> severity is fatal, before each error section is handled.
>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>> index 8756172..86c1f15 100644
>>> --- a/drivers/acpi/apei/ghes.c
>>> +++ b/drivers/acpi/apei/ghes.c
>>> @@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2
>>> *generic_v2)
>>>  	return rc;
>>>  }
>>> +static void __ghes_call_panic(void)
>>> +{
>>> +	if (panic_timeout == 0)
>>> +		panic_timeout = ghes_panic_timeout;
>>> +	panic("Fatal hardware error!");
>>> +}
>>> +
>> __ghes_panic() also has:
>>> __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
>> Which prints this estatus regardless of rate limiting and cache-ing.
[...]
>>>  			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>>>  	}
>>> +	if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
>>> +		__ghes_call_panic();
>>> +	}
>> I think this ghes_severity() then panic() should go above the:
>>> 	if (!ghes_estatus_cached(ghes->estatus)) {
>> and we should call __ghes_print_estatus() here too, to make sure the message
>> definitely got out!
> Okay, that makes sense. If we move this up, is there a problem with calling
> __ghes_panic() instead of making the __ghes_print_estatus() and
> __ghes_call_panic() calls here? It looks like that will just add a call to
> oops_begin() and ghes_print_queued_estatus() as well, but this is what
> ghes_notify_nmi() does if the severity is panic.

I don't think the queued stuff is relevant; isn't that just for x86-NMI messages
that it doesn't print out directly?

A quick grep shows arm64 doesn't have oops_begin(), so you may have to add some
equivalent mechanism. Let's try and avoid that rabbit hole!

Given __ghes_panic() calls __ghes_print_estatus() too, you could try moving that
into your new __ghes_call_panic().... or whatever results in the least lines
changed!

Thanks,

James
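
For reference, a rough userspace sketch of the ordering suggested above: check the severity first, print the record unconditionally from the panic helper, and only then fall through to the cache check for non-fatal records. This is not the kernel code. Every name here (sketch_estatus, sketch_print_estatus(), sketch_call_panic(), sketch_proc() and the counters) is a hypothetical stand-in, and the real panic() is replaced by a flag so the flow can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the GHES severity levels. */
enum sketch_sev { SKETCH_SEV_NO, SKETCH_SEV_CORRECTED,
		  SKETCH_SEV_RECOVERABLE, SKETCH_SEV_PANIC };

/* Hypothetical, heavily reduced error status block. */
struct sketch_estatus {
	enum sketch_sev severity;
	bool cached;		/* pretend result of the cache lookup */
};

static int sketch_printed;	 /* how many records were printed */
static int sketch_panicked;	 /* set instead of really panicking */
static int sketch_cache_lookups; /* how often the cache was consulted */

/* Stand-in for the unconditional print (no rate limit, no cache). */
static void sketch_print_estatus(const struct sketch_estatus *e)
{
	sketch_printed++;
	printf("estatus: severity %d\n", (int)e->severity);
}

/* Print first so the message gets out, then "panic". */
static void sketch_call_panic(const struct sketch_estatus *e)
{
	sketch_print_estatus(e);
	sketch_panicked = 1;
	puts("panic: Fatal hardware error!");
}

/* Severity check sits above the cache check, as suggested. */
static void sketch_proc(const struct sketch_estatus *e)
{
	assert(e != NULL);

	if (e->severity >= SKETCH_SEV_PANIC) {
		sketch_call_panic(e);
		return;		/* the real panic() would not return */
	}

	sketch_cache_lookups++;
	if (!e->cached)
		sketch_print_estatus(e);
}
```

Feeding sketch_proc() a record with SKETCH_SEV_PANIC and cached set to true still prints it once and never reaches the cache lookup, which is the point of moving the check up. A cached corrected error is looked up and stays silent.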