From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Alex G."
Subject: Re: [RFC PATCH v2 4/4] acpi: apei: Warn when GHES marks correctable errors as "fatal"
Date: Thu, 19 Apr 2018 10:11:03 -0500
Message-ID: <807002b1-ccb9-22c8-6563-ade7e44912ff@gmail.com>
References: <20180416215903.7318-1-mr.nuke.me@gmail.com> <20180416215903.7318-5-mr.nuke.me@gmail.com> <20180418175452.GK4795@pd.tnic>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20180418175452.GK4795@pd.tnic>
Content-Language: en-US
Sender: linux-kernel-owner@vger.kernel.org
To: Borislav Petkov
Cc: linux-acpi@vger.kernel.org, linux-edac@vger.kernel.org, rjw@rjwysocki.net, lenb@kernel.org, tony.luck@intel.com, tbaicar@codeaurora.org, will.deacon@arm.com, james.morse@arm.com, shiju.jose@huawei.com, zjzhang@codeaurora.org, gengdongjiu@huawei.com, linux-kernel@vger.kernel.org, alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com, devel@acpica.org, mchehab@kernel.org, robert.moore@intel.com, erik.schmauss@intel.com
List-Id: linux-acpi@vger.kernel.org

On 04/18/2018 12:54 PM, Borislav Petkov wrote:
> On Mon, Apr 16, 2018 at 04:59:03PM -0500, Alexandru Gagniuc wrote:

(snip)
>> +
>> +		corrected_sev = max(corrected_sev, sec_sev);
>> +	}
>> +
>> +	if ((sev >= GHES_SEV_PANIC) && (corrected_sev < sev)) {
>> +		pr_warn("FIRMWARE BUG: Firmware sent fatal error that we were able to correct");
>> +		pr_warn("BROKEN FIRMWARE: Complain to your hardware vendor");
>
> No, I don't want any of that crap issuing stuff in dmesg and then people
> opening bugs and running around and trying to replace hardware.
>
> We either can handle the error and log a normal record somewhere or we
> cannot and explode.

There is value in this. From my observations, fw claims it will do
everything through FFS, yet fails to fully handle the situation. It's
rooted in FW's assumptions about OS behavior.
Because the (old) versions
of Windows, ESXi, and RHEL used during development crash, fw assumes
that _all_ OSes crash. The result in a surprising majority of cases is
that FFS doesn't properly handle recurring errors, and fw is, in fact,
broken.

> The complaining about the FW doesn't bring shit.

You are correct. It doesn't bring defecation. It brings a red flag that
helps people get closer to the root cause of problems.

That being said, I can just drop this patch.

Alex
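
For readers following the quoted hunk without the snipped context, the check
under discussion can be sketched in standalone C roughly as below. This is an
illustration only, not the patch itself: the helper name
fw_overstated_severity() and the sec_sevs array are hypothetical, the enum
values are assumed for the example, and sec_sevs is assumed to hold the
per-section severities that the snipped part of the patch computes.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Severity levels ordered from least to most severe. The names mirror the
 * GHES_SEV_* constants in the quoted patch; the numeric values are assumed
 * here for illustration.
 */
enum ghes_sev {
	GHES_SEV_NO = 0,
	GHES_SEV_CORRECTED = 1,
	GHES_SEV_RECOVERABLE = 2,
	GHES_SEV_PANIC = 3,
};

/*
 * Hypothetical helper (not a kernel function).
 *
 * sev:      severity the firmware reported for the whole record
 * sec_sevs: assumed per-section severities after the OS looked at them
 * n:        number of sections
 *
 * Returns true when firmware called the record fatal although the worst
 * section severity is lower, which is the case the patch warns about.
 */
bool fw_overstated_severity(int sev, const int *sec_sevs, size_t n)
{
	int corrected_sev = GHES_SEV_NO;
	size_t i;

	/* Equivalent of corrected_sev = max(corrected_sev, sec_sev). */
	for (i = 0; i < n; i++) {
		if (sec_sevs[i] > corrected_sev)
			corrected_sev = sec_sevs[i];
	}

	return sev >= GHES_SEV_PANIC && corrected_sev < sev;
}
```

With a record marked GHES_SEV_PANIC whose sections top out at
GHES_SEV_RECOVERABLE, the helper returns true; if any section is itself
GHES_SEV_PANIC, it returns false and no warning would be printed.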