From mboxrd@z Thu Jan 1 00:00:00 1970
From: Borislav Petkov
Subject: Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable errors are marked as fatal.
Date: Wed, 25 Apr 2018 16:01:08 +0200
Message-ID: <20180425140108.GA2597@pd.tnic>
References: <20180418175415.GJ4795@pd.tnic> <20180419154006.GE3600@pd.tnic>
 <977608e6-9f5d-c523-a78a-993ac5bfd55f@gmail.com> <20180419164528.GD5635@pd.tnic>
 <20180419190323.GF5635@pd.tnic> <20180422104849.GA32754@pd.tnic>
 <70c43399-e8e5-5061-b5a5-451deb5f02fa@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Return-path:
Content-Disposition: inline
In-Reply-To: <70c43399-e8e5-5061-b5a5-451deb5f02fa@gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
To: "Alex G."
Cc: linux-acpi@vger.kernel.org, linux-edac@vger.kernel.org, rjw@rjwysocki.net,
 lenb@kernel.org, tony.luck@intel.com, tbaicar@codeaurora.org,
 will.deacon@arm.com, james.morse@arm.com, shiju.jose@huawei.com,
 zjzhang@codeaurora.org, gengdongjiu@huawei.com, linux-kernel@vger.kernel.org,
 alex_gagniuc@dellteam.com, austin_bolen@dell.com, shyam_iyer@dell.com,
 devel@acpica.org, mchehab@kernel.org, robert.moore@intel.com,
 erik.schmauss@intel.com, Yazen Ghannam , Ard Biesheuvel
List-Id: linux-acpi@vger.kernel.org

On Mon, Apr 23, 2018 at 11:19:25PM -0500, Alex G. wrote:
> That tells you what FFS said about the error.

I betcha those status and command values have human-readable counterparts.

Btw, what do you abbreviate with "FFS"?

> It's immediately obvious if there's a glaring FFS bug and if we get bogus
> data. If you distrust firmware as much as I do, then you will find great
> value in having such info in the logs. It's probably not too useful to a
> casual user, but then neither is a majority of the system log.

No no, you're missing the point - I *want* all data in the error log
which helps debug a hardware issue. I just want it human-readable so
that I don't have to jot down the values and go scour the manuals to map
what it actually means.

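To illustrate what "human-readable" could look like for a logged register value, here is a minimal user-space sketch. It assumes the "command" value under discussion is the 16-bit PCI Command register, which the thread does not state explicitly. The helper decode_pci_command() and the short bit names are made up for this example and are not an existing kernel interface; only a subset of the register's bits is listed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A subset of PCI Command register bits (config space offset 0x04). */
static const struct {
    uint16_t mask;
    const char *name;   /* short labels chosen for this sketch */
} pci_cmd_bits[] = {
    { 0x0001, "IOSpace" },
    { 0x0002, "MemSpace" },
    { 0x0004, "BusMaster" },
    { 0x0040, "ParityErrResp" },
    { 0x0100, "SERREnable" },
    { 0x0400, "INTxDisable" },
};

/*
 * Hypothetical helper: write the names of all set bits into buf,
 * separated by spaces, and return buf. Names that do not fit are omitted.
 */
static char *decode_pci_command(uint16_t cmd, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(pci_cmd_bits) / sizeof(pci_cmd_bits[0]); i++) {
        const char *name = pci_cmd_bits[i].name;
        size_t need = strlen(name) + (used ? 1 : 0);

        if (!(cmd & pci_cmd_bits[i].mask))
            continue;
        if (need >= len - used)
            break;      /* out of room: stop before writing a partial name */
        used += (size_t)snprintf(buf + used, len - used, "%s%s",
                                 used ? " " : "", name);
    }
    return buf;
}
```

A log line would then print both the raw value and the decoded string, so nothing is lost for whoever wants the hex while everybody else can skip the trip to the manuals.
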
> You're missing the timing and assuming you will get the hotplug interrupt.
> In this example, you have 22ms between the link down and presence detect
> state change. This is a fairly fast removal.
>
> Hotplug dependencies aside (you can have the kernel run without PCIe hotplug
> support), I don't think you want to just linger in NMI for dozens of
> milliseconds waiting for presence detect confirmation.

No, I don't mean that. I mean something like deferred processing: you
get an error, you notice it is a device which supports physical removal
so you exit the NMI handler and process the error in normal, process
context which allows you to query the device and say, "Hey device, are
you still there?"

If it is not, you drop all the hw I/O errors reported for it.

Hmmm?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
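
The deferred-processing idea described in the reply can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not kernel code and not the approach any patch in this series takes: the queue, the slot_occupied[] presence table and all function names are hypothetical, and a real implementation would use the kernel's own NMI-safe deferral mechanisms and a real device query instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PENDING 16
#define MAX_DEVS    4

struct hw_error {
    int  dev_id;     /* hypothetical device identifier */
    bool removable;  /* device supports physical removal */
};

/* Errors queued by the NMI-side handler for later processing. */
static struct hw_error pending[MAX_PENDING];
static size_t n_pending;

/* Stand-in for "Hey device, are you still there?" */
static bool slot_occupied[MAX_DEVS] = { true, true, true, true };

static bool device_present(int dev_id)
{
    return dev_id >= 0 && dev_id < MAX_DEVS && slot_occupied[dev_id];
}

/*
 * NMI-side handler (model): never waits and never touches the device.
 * Returns true if the error was queued for deferred processing, false
 * if the caller has to deal with it right away (device not removable,
 * or queue full).
 */
static bool nmi_handle_error(struct hw_error err)
{
    if (!err.removable || n_pending == MAX_PENDING)
        return false;
    pending[n_pending++] = err;
    return true;
}

/*
 * Process-context worker (model): may query the device. Errors from
 * devices that are gone are dropped. Returns the number of errors that
 * still need to be reported, and empties the queue.
 */
static size_t process_deferred(void)
{
    size_t reported = 0;

    assert(n_pending <= MAX_PENDING);
    for (size_t i = 0; i < n_pending; i++) {
        if (!device_present(pending[i].dev_id))
            continue;   /* device was removed: drop its errors */
        reported++;     /* a real handler would log or act here */
    }
    n_pending = 0;
    return reported;
}
```

The point of the split is that the presence check happens only in the worker, so the 22ms between link down and presence detect change is spent outside NMI context.
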