From mboxrd@z Thu Jan 1 00:00:00 1970
From: Borislav Petkov
Subject: Re: [RFC PATCH v4 3/3] acpi: apei: Do not panic() on PCIe errors reported through GHES
Date: Fri, 11 May 2018 17:40:39 +0200
Message-ID: <20180511154039.GD12705@pd.tnic>
References: <20180430212836.7807-1-mr.nuke.me@gmail.com> <20180430213358.8319-1-mr.nuke.me@gmail.com> <20180430213358.8319-3-mr.nuke.me@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Return-path:
Content-Disposition: inline
In-Reply-To: <20180430213358.8319-3-mr.nuke.me@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
To: Alexandru Gagniuc
Cc: alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com, "Rafael J. Wysocki" , Len Brown , Tony Luck , Mauro Carvalho Chehab , Robert Moore , Erik Schmauss , Tyler Baicar , Will Deacon , James Morse , Shiju Jose , "Jonathan (Zhixiong) Zhang" , Dongjiu Geng , linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org, linux-edac@vger.kernel.org, devel@acpica.org
List-Id: linux-acpi@vger.kernel.org

On Mon, Apr 30, 2018 at 04:33:52PM -0500, Alexandru Gagniuc wrote:
> The policy was to panic() when GHES said that an error is "Fatal".
> This logic is wrong for several reasons, as it doesn't take into
> account what caused the error.
>
> PCIe fatal errors indicate that the link to a device is either
> unstable or unusable. They don't indicate that the machine is on fire,
> and they are not severe enough that we need to panic(). Instead of
> relying on crackmonkey firmware, evaluate the error severity based on
             ^^^^^^^^^^^^

Please keep the smartass formulations for the ML only and do not let
them leak into commit messages.

> Signed-off-by: Alexandru Gagniuc
> ---
>  drivers/acpi/apei/ghes.c | 45 ++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index c9f1971333c1..49318fba409c 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -425,8 +425,7 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
>   * GHES_SEV_RECOVERABLE -> AER_NONFATAL
>   * GHES_SEV_RECOVERABLE && CPER_SEC_RESET -> AER_FATAL
>   *     These both need to be reported and recovered from by the AER driver.
> - * GHES_SEV_PANIC does not make it to this handling since the kernel must
> - *     panic.
> + * GHES_SEV_PANIC -> AER_FATAL
>   */
>  static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
>  {
> @@ -459,6 +458,46 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
>  #endif
>  }
>
> +/* PCIe errors should not cause a panic. */
> +static int ghes_sec_pcie_severity(struct acpi_hest_generic_data *gdata)
> +{
> +	struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
> +
> +	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> +	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO &&
> +	    IS_ENABLED(CONFIG_ACPI_APEI_PCIEAER))

How is PCIe error severity dependent on whether the AER error reporting
driver is enabled (and possibly not even loaded) on the system?

> +		return CPER_SEV_RECOVERABLE;
> +
> +	return ghes_cper_severity(gdata->error_severity);
> +}
> +/*

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.