From: Sinan Kaya <okaya@codeaurora.org>
To: Borislav Petkov <bp@suse.de>
Cc: "Baicar, Tyler" <tbaicar@codeaurora.org>,
Tony Luck <tony.luck@intel.com>,
rjw@rjwysocki.net, lenb@kernel.org, will.deacon@arm.com,
james.morse@arm.com, prarit@redhat.com, punit.agrawal@arm.com,
shiju.jose@huawei.com, andriy.shevchenko@linux.intel.com,
linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
Linux PCI <linux-pci@vger.kernel.org>,
Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH] acpi: apei: call into AER handling regardless of severity
Date: Wed, 30 Aug 2017 11:31:06 -0400
Message-ID: <af5dc902-faca-a7a1-6781-95ff82d5d8fd@codeaurora.org>
In-Reply-To: <20170830151601.ro5qt5272e2msevp@pd.tnic>
On 8/30/2017 11:16 AM, Borislav Petkov wrote:
> On Wed, Aug 30, 2017 at 10:05:44AM -0400, Sinan Kaya wrote:
>> Link reset is not the only recovery mechanism. In the case of nonfatal
>> errors, it is assumed that the endpoint CSR is still reachable.
>> The error is propagated to the PCIe endpoint driver. The endpoint driver does a
>> re-initialization, and we are back in business.
>
> I'm assuming that's broadcast_error_message()'s job.
>
That's right. Each driver provides an err_handler hook, and the broadcast
function calls these hooks.

    static struct pci_driver e1000_driver = {
        ...
        .err_handler = &e1000_err_handler
    };

    struct pci_error_handlers {
        ...
        pci_ers_result_t (*error_detected)(struct pci_dev *dev,
                                           enum pci_channel_state error);
    };

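To make the hook mechanism concrete, here is a minimal userspace sketch of a driver that registers an err_handler and a broadcast loop that calls it. The type names mirror the kernel's, but every definition below is a simplified stand-in, and demo_error_detected(), demo_driver and demo_broadcast() are hypothetical names used only for illustration. The real broadcast_error_message() walks the PCI hierarchy and merges results differently.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption: not the real
 * definitions; the enum order is chosen so "larger" means "more severe"). */
typedef enum {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
} pci_ers_result_t;

enum pci_channel_state {
	pci_channel_io_normal,
	pci_channel_io_frozen,
	pci_channel_io_perm_failure,
};

struct pci_dev;

struct pci_error_handlers {
	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
					   enum pci_channel_state error);
};

struct pci_driver {
	const char *name;
	const struct pci_error_handlers *err_handler;
};

struct pci_dev {
	const struct pci_driver *driver;
};

/* Hypothetical endpoint driver callback: votes on how to recover. */
static pci_ers_result_t demo_error_detected(struct pci_dev *dev,
					    enum pci_channel_state error)
{
	(void)dev;
	if (error == pci_channel_io_perm_failure)
		return PCI_ERS_RESULT_DISCONNECT;
	if (error == pci_channel_io_frozen)
		return PCI_ERS_RESULT_NEED_RESET;
	return PCI_ERS_RESULT_CAN_RECOVER;
}

static const struct pci_error_handlers demo_err_handler = {
	.error_detected = demo_error_detected,
};

static const struct pci_driver demo_driver = {
	.name = "demo",
	.err_handler = &demo_err_handler,
};

/* Simplified broadcast: call every device's hook, keep the most severe vote.
 * Devices without a driver or without a hook are skipped. */
static pci_ers_result_t demo_broadcast(struct pci_dev *devs, size_t n,
				       enum pci_channel_state state)
{
	pci_ers_result_t result = PCI_ERS_RESULT_NONE;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct pci_driver *drv = devs[i].driver;
		pci_ers_result_t vote;

		if (!drv || !drv->err_handler || !drv->err_handler->error_detected)
			continue;
		vote = drv->err_handler->error_detected(&devs[i], state);
		if (vote > result)
			result = vote;
	}
	return result;
}
```

Calling demo_broadcast() on an array where one device is bound to demo_driver and another has no driver returns the bound driver's vote. An array containing only unbound devices returns PCI_ERS_RESULT_NONE.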
>> That's not true. The GHES code is changing the severity here before posting
>> to the AER driver in ghes_do_proc().
>>
>> if (gdata->flags & CPER_SEC_RESET)
>> aer_severity = AER_FATAL;
>
> You're missing the point that we would walk into that if branch *only* for
>
> if (sev == GHES_SEV_RECOVERABLE &&
> sec_sev == GHES_SEV_RECOVERABLE
>
> severities. So if you have an AER_FATAL error but ghes severities are
> not GHES_SEV_RECOVERABLE, nothing happens.
I see. We should probably act only when the GHES severity is
GHES_SEV_CORRECTED or GHES_SEV_RECOVERABLE.
If somebody wants to crash the system with GHES_SEV_PANIC, there is no point
in doing additional work.
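The gate suggested here could look roughly like the sketch below. It is a userspace illustration, not kernel code: the enum is a local stand-in for the kernel's GHES_SEV_* constants, and ghes_sev_wants_aer() is a hypothetical helper name.

```c
#include <stdbool.h>

/* Local stand-in for the kernel's GHES_SEV_* constants (assumption). */
enum ghes_sev {
	GHES_SEV_NO,
	GHES_SEV_CORRECTED,
	GHES_SEV_RECOVERABLE,
	GHES_SEV_PANIC,
};

/* Hypothetical helper: should a PCIe section be handed to AER handling? */
static bool ghes_sev_wants_aer(enum ghes_sev sev)
{
	switch (sev) {
	case GHES_SEV_CORRECTED:	/* worth logging */
	case GHES_SEV_RECOVERABLE:	/* worth attempting recovery */
		return true;
	case GHES_SEV_PANIC:		/* system goes down anyway */
	case GHES_SEV_NO:		/* not listed above, so skipped here */
	default:
		return false;
	}
}
```

Whether GHES_SEV_NO should really be skipped is a policy choice. The sketch simply follows the "only CORRECTED or RECOVERABLE" wording.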
>
>> No, AER ISR is not set up if firmware first is enabled.
>
> So then this is a major suckage. We do AER recovery on FF systems only
> for GHES_SEV_RECOVERABLE severity.
>
>> The behavior should match non firmware-first case ideally.
>>
>> 1. Print all correctable errors.
>> 2. Go to do_recovery for all uncorrectable errors including fatal and
>> non-fatal.
>>
>> This is also what AER driver does in the absence of firmware first via
>> handle_error_source().
>
> Yes, that makes sense.
>
> Which would mean that we'd call aer_recover_queue() regardless of GHES
> severity but we'd do recovery only if GHES_SEV_RECOVERABLE is set
> or CPER_SEC_RESET. I.e., we can communicate all that by setting the
> correct AER severity before calling aer_recover_queue(). And then call
> do_recovery() based on AER severity.
>
> Hmmm?
>
Sounds good. Do you still want to do PCIe recovery in the case of
GHES_SEV_PANIC or if some FW returns GHES_SEV_NO?
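One way to read the proposal quoted above is sketched below: derive the AER severity from the GHES severity and the section flags before queueing, then decide on recovery from the AER severity alone. This is a userspace illustration under stated assumptions. The constant values are local stand-ins, demo_aer_severity() and demo_needs_recovery() are hypothetical names, and mapping GHES_SEV_PANIC and GHES_SEV_NO to "report only" is just one possible answer to the question asked here, not a settled decision.

```c
#include <stdbool.h>

/* Local stand-ins; the names follow the thread, the values are assumptions. */
enum { GHES_SEV_NO, GHES_SEV_CORRECTED, GHES_SEV_RECOVERABLE, GHES_SEV_PANIC };
enum { AER_NONFATAL, AER_FATAL, AER_CORRECTABLE };
#define CPER_SEC_RESET 0x0004u

/* Hypothetical mapping done before the event is queued for AER handling. */
static int demo_aer_severity(int ghes_sev, unsigned int sec_flags)
{
	if (sec_flags & CPER_SEC_RESET)
		return AER_FATAL;		/* firmware asks for a reset */
	if (ghes_sev == GHES_SEV_RECOVERABLE)
		return AER_NONFATAL;		/* driver-level recovery */
	/* CORRECTED, and in this sketch also PANIC and NO: report only. */
	return AER_CORRECTABLE;
}

/* Recovery decision based on AER severity alone: every uncorrectable
 * error, fatal or non-fatal, goes through recovery. */
static bool demo_needs_recovery(int aer_sev)
{
	return aer_sev == AER_FATAL || aer_sev == AER_NONFATAL;
}
```

With this split, the GHES side only translates severities, and the AER side stays the single place that decides between printing and recovering.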
--
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.