linux-pci.vger.kernel.org archive mirror
From: "Baicar, Tyler" <tbaicar@codeaurora.org>
To: Sinan Kaya <okaya@codeaurora.org>, Borislav Petkov <bp@suse.de>
Cc: Tony Luck <tony.luck@intel.com>,
	rjw@rjwysocki.net, lenb@kernel.org, will.deacon@arm.com,
	james.morse@arm.com, prarit@redhat.com, punit.agrawal@arm.com,
	shiju.jose@huawei.com, andriy.shevchenko@linux.intel.com,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Linux PCI <linux-pci@vger.kernel.org>,
	Huang Ying <ying.huang@intel.com>
Subject: Re: [PATCH] acpi: apei: call into AER handling regardless of severity
Date: Wed, 30 Aug 2017 09:42:08 -0600
Message-ID: <a074792e-ad43-d4f5-6f99-8fcf8349c7ad@codeaurora.org>
In-Reply-To: <af5dc902-faca-a7a1-6781-95ff82d5d8fd@codeaurora.org>

On 8/30/2017 9:31 AM, Sinan Kaya wrote:
> On 8/30/2017 11:16 AM, Borislav Petkov wrote:
>> On Wed, Aug 30, 2017 at 10:05:44AM -0400, Sinan Kaya wrote:
>>> Link reset is not the only recovery mechanism. In the case of nonfatal
>>> errors, it is assumed that the endpoint CSR is still reachable.
>>> The error is propagated to the PCIe endpoint driver. The endpoint driver
>>> re-initializes the device, and we are back in business.
>> I'm assuming that's broadcast_error_message()'s job.
>>
> That's right. Each driver provides an err_handler hook, and the broadcast
> function calls these.
>
> static struct pci_driver e1000_driver = {
> 	..
> 	.err_handler = &e1000_err_handler
> };
>
> struct pci_error_handlers {
> 	...
> 	pci_ers_result_t (*error_detected)(struct pci_dev *dev,
> 					   enum pci_channel_state error);
> };
>
>
>>> That's not true. The GHES code is changing the severity here before posting
>>> to the AER driver in ghes_do_proc().
>>>
>>> 	if (gdata->flags & CPER_SEC_RESET)
>>> 		aer_severity = AER_FATAL;
>> You're missing the point that we would walk into that if branch *only* for
>>
>>                          if (sev == GHES_SEV_RECOVERABLE &&
>>                              sec_sev == GHES_SEV_RECOVERABLE
>>
>> severities. So if you have an AER_FATAL error but ghes severities are
>> not GHES_SEV_RECOVERABLE, nothing happens.
> I see. We should probably try to do something only if the severity is
> GHES_SEV_CORRECTED or GHES_SEV_RECOVERABLE.
>
> If somebody wants to crash the system with GHES_SEV_PANIC, there is no point
> in doing additional work.
See below.
>>> No, AER ISR is not set up if firmware first is enabled.
>> So then this is a major suckage. We do AER recovery on FF systems only
>> for GHES_SEV_RECOVERABLE severity.
>>
>>> Ideally, the behavior should match the non-firmware-first case.
>>>
>>> 1. Print all correctable errors.
>>> 2. Go to do_recovery for all uncorrectable errors including fatal and
>>> non-fatal.
>>>
>>> This is also what AER driver does in the absence of firmware first via
>>> handle_error_source().
>> Yes, that makes sense.
>>
>> Which would mean that we'd call aer_recover_queue() regardless of GHES
>> severity but we'd do recovery only if GHES_SEV_RECOVERABLE is set
>> or CPER_SEC_RESET. I.e., we can communicate all that by setting the
>> correct AER severity before calling aer_recover_queue(). And then call
>> do_recovery() based on AER severity.
>>
>> Hmmm?
>>
> Sounds good. Do you still want to do PCIe recovery in the case of
> GHES_SEV_PANIC or if some FW returns GHES_SEV_NO?
>
We do not need to worry about the GHES_SEV_PANIC case. Those get sent to 
__ghes_panic() in ghes_proc() without even making it to ghes_do_proc(). 
Those errors are just printed and then the kernel panics.

I think with my two patches we will have the desired functionality:

GHES_SEV_CORRECTED -> AER_CORRECTABLE -> Print AER info, but do not
call do_recovery

GHES_SEV_RECOVERABLE -> AER_NONFATAL -> Print AER info and call do_recovery

GHES_SEV_RECOVERABLE and CPER_SEC_RESET -> AER_FATAL -> Print AER info and
call do_recovery

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


