From: Gavin Shan <shangw@linux.vnet.ibm.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org, Gavin Shan <shangw@linux.vnet.ibm.com>
Subject: Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
Date: Sun, 16 Jun 2013 15:27:45 +0800
Message-ID: <20130616072744.GA3845@shangw.(null)>
In-Reply-To: <1371359531.21896.128.camel@pasglop>
On Sun, Jun 16, 2013 at 03:12:11PM +1000, Benjamin Herrenschmidt wrote:
>On Sat, 2013-06-15 at 17:03 +0800, Gavin Shan wrote:
>> On the PowerNV platform, an EEH event is produced either by detection
>> on accessing config or I/O registers, or by interrupts dedicated
>> to EEH reporting. The patch adds support for processing the interrupts
>> dedicated to EEH reporting.
>>
>> Firstly, the kernel thread will be woken up to process the incoming
>> interrupt. The PHBs will be scanned one by one to process all
>> existing EEH errors. Besides, there are multiple EEH errors that can
>> be reported from interrupts and we have differentiated actions
>> against them:
>>
>> - If the IOC is dead, all PCI buses under all PHBs will be removed
>> from the system.
>> - If the PHB is dead, all PCI buses under the PHB will be removed
>> from the system.
>> - If the PHB is fenced, an EEH event will be sent to the EEH core and
>> the fenced PHB is expected to be reset completely.
>> - If a specific PE has been put into frozen state, an EEH event will
>> be sent to the EEH core so that the PE will be reset.
>> - If the error is an informational one, we just output the related
>> registers for debugging purposes and no more action will be
>> taken.
>
Thanks for the review, Ben.
>Getting better.... but:
>
> - I still don't like having a kthread for that. Why not use schedule_work() ?
>
Ok. I'll switch to schedule_work() in the next revision :-)
> - We already have an EEH thread, why not just use it ? IE send it a special
>type of message that makes it query the backend for error info instead ?
>
Ok. I'll try to do as you suggested in the next revision. Something like:

 - Interrupt comes in.
 - The OPAL notifier callback is invoked.
 - Mark all PHBs and their subordinate PEs as "isolated", since we don't
   know which PHB/PE has problems (note: we still need eeh_serialize_lock()).
 - Create an EEH event that isn't bound to any PE and send it to the EEH core.
 - The EEH core starts a new kthread, calls the next_error() backend, and
   handles the EEH errors accordingly:
   * Informational errors: clear the PHB "isolated" state and output diag-data
     in the backend (in eeh-ioda.c, as you suggested).
   * Fenced PHB: the EEH core does a complete PHB reset, and the "isolated"
     state is cleared automatically during the reset.
   * Dead PHB: remove the PHB and its subordinate PCI buses/devices from
     the system.
   * Dead IOC: remove the PCI domain from the system.

The problem with the scheme is that a PHB's recorded state can no longer reflect
its real state. For example, PHB#0 has been fenced, but PHB#1 is in normal state.
We have to mark all PHBs as "isolated" (fenced) since, in the OPAL notifier
callback, we don't know which PHB is encountering problems.
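
To make the flow above a bit more concrete, here is a rough userspace model in C.
It is only a sketch, not kernel code and not what the patch does. All names in it
(phb_model, notifier_mark_all(), handle_phb(), the ERR_*/ACT_* values) are made up
for illustration, and clearing "isolated" on a PHB that reports no error is an
assumption about how the blanket marking would be undone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severities, loosely following the cases listed above. */
enum err_severity { ERR_NONE, ERR_INFO, ERR_PHB_FENCED, ERR_PHB_DEAD, ERR_IOC_DEAD };

/* Hypothetical actions the event handler would take for each severity. */
enum err_action { ACT_NONE, ACT_LOG_DIAG, ACT_RESET_PHB, ACT_REMOVE_PHB, ACT_REMOVE_DOMAIN };

struct phb_model {
	int id;
	int isolated;              /* software view of the PHB state */
	enum err_severity pending; /* what the backend would report  */
};

/* Notifier step: we don't know which PHB is in trouble, so mark them all. */
static void notifier_mark_all(struct phb_model *phbs, size_t n)
{
	assert(phbs != NULL);
	for (size_t i = 0; i < n; i++)
		phbs[i].isolated = 1;
}

/* Event handler step: query one PHB and pick the action for its error. */
static enum err_action handle_phb(struct phb_model *phb)
{
	enum err_severity sev;

	assert(phb != NULL);
	sev = phb->pending;
	phb->pending = ERR_NONE;

	switch (sev) {
	case ERR_NONE:          /* assumption: undo the blanket marking */
		phb->isolated = 0;
		return ACT_NONE;
	case ERR_INFO:          /* clear the state, dump diag-data */
		phb->isolated = 0;
		return ACT_LOG_DIAG;
	case ERR_PHB_FENCED:    /* the complete reset clears the state */
		phb->isolated = 0;
		return ACT_RESET_PHB;
	case ERR_PHB_DEAD:      /* the PHB goes away, state stays isolated */
		return ACT_REMOVE_PHB;
	case ERR_IOC_DEAD:
		return ACT_REMOVE_DOMAIN;
	}
	return ACT_NONE;
}
```

With PHB#0 fenced and PHB#1 normal, both are marked isolated by the notifier step,
and PHB#1 only gets its real state back once the handler has looked at it.
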
I think it would work well. Let me try changing the code to make it work. The
side effect would be introducing more logic into the EEH core, which is shared
by multiple platforms (powernv, pseries, and powerkvm guest in the future). So
my initial thought was to make opal_pci_next_error() invisible to the EEH core
and keep the EEH core totally event-driven :-)

> - I'm not fan of exposing that EEH private lock. I don't entirely understand
>why you need to do that either.
>
It's used to get a consistent PE isolated state, which is protected by the lock.
Since we're going to change the PE's state in platform code (pci-err.c), we need
the lock to protect that state. Without it, we could hit the following case,
where both CPUs send an EEH event for the same PE:

	CPU#0                           CPU#1
	PCI-CFG read returns 0xFF's     PCI-CFG read returns 0xFF's
	PE not fenced                   PE not fenced
	PE marked as fenced             PE marked as fenced
	EEH event to EEH core           EEH event to EEH core

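The interleaving above can be replayed deterministically in a small userspace C
model. This is a sketch only: pe_model and the helper names are hypothetical, no
real lock is taken, and the "serialized" case just runs check and mark back to
back to stand in for holding the lock across both steps.

```c
#include <assert.h>
#include <stddef.h>

struct pe_model {
	int isolated;    /* PE state that the lock is meant to protect */
	int events_sent; /* EEH events handed to the EEH core          */
};

/* Step 1 on each CPU: sample the PE state. */
static int pe_is_isolated(const struct pe_model *pe)
{
	assert(pe != NULL);
	return pe->isolated;
}

/* Step 2 on each CPU: act on the value sampled earlier. */
static void pe_mark_and_report(struct pe_model *pe, int seen_isolated)
{
	assert(pe != NULL);
	if (seen_isolated)
		return;          /* someone else already reported it */
	pe->isolated = 1;
	pe->events_sent++;
}

/* No lock: both CPUs sample before either one marks the PE. */
static int racy_events(void)
{
	struct pe_model pe = { 0, 0 };
	int cpu0_seen = pe_is_isolated(&pe);
	int cpu1_seen = pe_is_isolated(&pe);

	pe_mark_and_report(&pe, cpu0_seen);
	pe_mark_and_report(&pe, cpu1_seen);
	return pe.events_sent;
}

/* With the lock: check and mark form one critical section per CPU. */
static int serialized_events(void)
{
	struct pe_model pe = { 0, 0 };

	pe_mark_and_report(&pe, pe_is_isolated(&pe)); /* CPU#0 holds the lock */
	pe_mark_and_report(&pe, pe_is_isolated(&pe)); /* CPU#1 holds the lock */
	return pe.events_sent;
}
```

In this model the unlocked ordering produces two events for one PE, while the
serialized ordering produces a single event.
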
>Generally speaking, I'm thinking this file should contain less stuff, most of
>it should move into the ioda backend, the interrupt just turning into some
>request down to the existing EEH thread.
>
Yeah, I'll move most of the stuff into eeh-ioda.c with the above scheme applied :-)

Thanks,
Gavin