From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexandru Gagniuc
Subject: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.
Date: Mon, 16 Apr 2018 16:59:02 -0500
Message-ID: <20180416215903.7318-4-mr.nuke.me@gmail.com>
References: <20180416215903.7318-1-mr.nuke.me@gmail.com>
Return-path:
In-Reply-To: <20180416215903.7318-1-mr.nuke.me@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
To: linux-acpi@vger.kernel.org, linux-edac@vger.kernel.org
Cc: rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com, bp@alien8.de,
	tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com,
	shiju.jose@huawei.com, zjzhang@codeaurora.org, gengdongjiu@huawei.com,
	linux-kernel@vger.kernel.org, alex_gagniuc@dellteam.com,
	austin_bolen@dell.com, shyam_iyer@dell.com, devel@acpica.org,
	mchehab@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com,
	Alexandru Gagniuc
List-Id: linux-acpi@vger.kernel.org

Firmware is evil:
 - ACPI was created to "try and make the 'ACPI' extensions somehow
 Windows specific" in order to "work well with NT and not the others
 even if they are open"
 - EFI was created to hide "secret" registers from the OS.
 - UEFI was created to allow compromising an otherwise secure OS.

Never has firmware been created to solve a problem or simplify an
otherwise cumbersome process. It is of no surprise then, that
firmware nowadays intentionally crashes an OS.

One simple way to do that is to mark GHES errors as fatal. Firmware
knows and even expects that an OS will crash in this case. And most
OSes do.

PCIe errors are notorious for having different definitions of "fatal".
In ACPI, and other firmware standards, 'fatal' means the machine is
about to explode and needs to be reset. In PCIe, on the other hand,
fatal means that the link to a device has died. In the hotplug world
of PCIe, this is akin to a USB disconnect. From that view, the "fatal"
loss of a link is a normal event.
To allow a machine to crash in this
case is downright idiotic.

To solve this, implement an IRQ safe handler for AER. This makes sure
we have enough information to invoke the full AER handler later down
the road, and tells ghes_notify_nmi that "It's all cool".
ghes_notify_nmi() then gets calmed down a little, and doesn't panic().

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/acpi/apei/ghes.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 2119c51b4a9e..e0528da4e8f8 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -481,12 +481,26 @@ static int ghes_handle_aer(struct acpi_hest_generic_data *gdata, int sev)
 	return ghes_severity(gdata->error_severity);
 }
 
+static int ghes_handle_aer_irqsafe(struct acpi_hest_generic_data *gdata,
+				   int sev)
+{
+	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+	/* The system can always recover from AER errors. */
+	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+		pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO)
+		return CPER_SEV_RECOVERABLE;
+
+	return ghes_severity(gdata->error_severity);
+}
+
 /**
  * ghes_handler - handler for ACPI APEI errors
  * @error_uuid: UUID describing the error entry (See ACPI/EFI CPER for details)
  * @handle: Handler for the GHES entry of type 'error_uuid'. The handler
  *	returns the severity of the error after handling. A handler is allowed
  *	to demote errors to correctable or corrected, as appropriate.
+ * @handle_irqsafe: (optional) Non-blocking handler for GHES entry.
  */
 static const struct ghes_handler {
 	const guid_t *error_uuid;
@@ -498,6 +512,7 @@ static const struct ghes_handler {
 		.handle = ghes_handle_mem,
 	}, {
 		.error_uuid = &CPER_SEC_PCIE,
+		.handle_irqsafe = ghes_handle_aer_irqsafe,
 		.handle = ghes_handle_aer,
 	}, {
 		.error_uuid = &CPER_SEC_PROC_ARM,
@@ -551,6 +566,30 @@ static void ghes_do_proc(struct ghes *ghes,
 	}
 }
 
+/* How severe is the error if handling is deferred outside IRQ/NMI context? */
+static int ghes_deferrable_severity(struct ghes *ghes)
+{
+	int deferrable_sev, sev, sec_sev;
+	struct acpi_hest_generic_data *gdata;
+	const struct ghes_handler *handler;
+	const guid_t *section_type;
+	const struct acpi_hest_generic_status *estatus = ghes->estatus;
+
+	deferrable_sev = GHES_SEV_NO;
+	sev = ghes_severity(estatus->error_severity);
+	apei_estatus_for_each_section(estatus, gdata) {
+		section_type = (guid_t *)gdata->section_type;
+		handler = get_handler(section_type);
+		if (handler && handler->handle_irqsafe)
+			sec_sev = handler->handle_irqsafe(gdata, sev);
+		else
+			sec_sev = ghes_severity(gdata->error_severity);
+		deferrable_sev = max(deferrable_sev, sec_sev);
+	}
+
+	return deferrable_sev;
+}
+
 static void __ghes_print_estatus(const char *pfx,
 				 const struct acpi_hest_generic *generic,
 				 const struct acpi_hest_generic_status *estatus)
@@ -980,7 +1019,7 @@ static void __process_error(struct ghes *ghes)
 static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 {
 	struct ghes *ghes;
-	int sev, ret = NMI_DONE;
+	int sev, dsev, ret = NMI_DONE;
 
 	if (!atomic_add_unless(&ghes_in_nmi, 1, 1))
 		return ret;
@@ -993,8 +1032,9 @@ static int ghes_notify_nmi(unsigned int cmd, struct pt_regs *regs)
 			ret = NMI_HANDLED;
 		}
 
+		dsev = ghes_deferrable_severity(ghes);
 		sev = ghes_severity(ghes->estatus->error_severity);
-		if (sev >= GHES_SEV_PANIC) {
+		if ((sev >= GHES_SEV_PANIC) && (dsev >= GHES_SEV_PANIC)) {
 			oops_begin();
 			ghes_print_queued_estatus();
 			__ghes_panic(ghes);
-- 
2.14.3