linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH
Date: Sun, 16 Jun 2013 18:37:06 +1000	[thread overview]
Message-ID: <1371371826.21896.140.camel@pasglop> (raw)
In-Reply-To: <20130616072744.GA3845@shangw.(null)>

On Sun, 2013-06-16 at 15:27 +0800, Gavin Shan wrote:

> Thanks for the review, Ben.
> 
> >Getting better.... but:
> >
> > - I still don't like having a kthread for that. Why not use schedule_work() ?
> >
> 
> Ok. Will update it with schedule_work() in next revision :-)
> 
> > - We already have an EEH thread, why not just use it ? IE send it a special
> >type of message that makes it query the backend for error info instead ?
> >
> 
> Ok. I'll try to do as you suggested in next revision. Something like:
> 
> 	- Interrupt comes in
> 	- OPAL notifier callback
> 	- Mark all PHB and its subordinate PEs "isolated" since we don't know
> 	  which PHB/PE has problems (Note: we still need eeh_serialize_lock())

No, don't mark anything. It wouldn't be good to start marking "isolated"
things that aren't. It doesn't matter if we don't "know" they are
isolated just yet. Just "poke" the EEH thread with a different type of
message from the current one.

> 	- Create an EEH event without binding PE to EEH core.
> 	- EEH core starts new kthread and calls to next_error() backend
> 	  and handle the EEH errors accordingly.

No need for a new kthread, EEH core already runs in one, doesn't it ? or
am I missing something here ? And yes, if EEH core doesn't get a PE in
the message, then it can call a backend function along the lines of
"next_error()" which ... returns a PE, and does whatever additional
processing we want to do for PHBs.

That can also return "nothing to do" (INF).
	  
> 	  * Informational errors: clear PHB "isolated" state and output diag-data
> 	    in backend (in eeh-ioda.c as you suggested).

Don't mark isolated in the first place.

> 	  * Fenced PHB: PHB complete reset by EEH core and "isolated" state will
> 	    be cleared during the reset automatically.
> 	  * Dead PHB: Remove the PHB and its subordinate PCI buses/devices from
> 		      the system.
> 	  * Dead IOC: Remove PCI domain from the system.

Might want to have return codes from next_error for doing that from the
core, up to you, not a big deal.

> The problem with the scheme is that the PHB's state can't reflect the real state
> any more. For example, PHB#0 has been fenced, but PHB#1 is normal state. We have
> to mark all PHBs as "isolated" (fenced) since we don't know which PHB is encountering
> problems in the OPAL notifier callback.

No, we don't have to mark anything. The interrupt is an asynchronous
thing anyway, so the effect in practice is that we'll react to it a
little bit later, but it's already coming an undefined amount of time
after the error anyway so we may as well deal with it, and that helps
with the synchronization between detection via the interrupt vs.
detection (of the same error) via the MMIO reads.

> I think it would work well. Let me have a try to change the code and make it
> workable. The side-effect would be introducing more logic to EEH core and it's
> shared by multiple platforms (powernv, pseries, powerkvm guest in future). So
> my initial though is making opal_pci_next_error() invisible from EEH core and
> make the EEH core totally event-driven :-)
> 
> > - I'm not fan of exposing that EEH private lock. I don't entirely understand
> >why you need to do that either.
> >
> 
> It's used to get consistent PE isolated state, which is protected by the lock.
> Without it, we would have following case. Since we're going to change the
> PE's state in platform code (pci-err.c), we need the lock to protect the PE's
> state.

I'd rather you expose get/set_state functions and keep the lock local if
that makes sense but first look at what I've proposed above. It might be
that you are right and the lock must be exposed.

> 	
> 		    CPU#0				CPU#1
> 	PCI-CFG read returns 0xFF's		PCI-CFG read returns 0xFF's
> 	PE not fenced				PE not fenced
> 	PE marked as fenced			PE marked as fenced
> 	EEH event to EEH core			EEH event to EEH core
> 
> >Generally speaking, I'm thinking this file should contain less stuff, most of
> >it should move into the ioda backend, the interrupt just turning into some
> >request down to the existing EEH thread.
> >
> 
> Yeah, I'll move most of the stuff into eeh-ioda.c with above scheme applied :-)

Cheers,
Ben.

  reply	other threads:[~2013-06-16  8:37 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-15  9:02 [PATCH v4 00/27] EEH Support for PowerNV platform Gavin Shan
2013-06-15  9:02 ` [PATCH 01/27] powerpc/eeh: Move common part to kernel directory Gavin Shan
2013-06-17  3:03   ` Mike Qiu
2013-06-18  0:55     ` Gavin Shan
2013-06-15  9:02 ` [PATCH 02/27] powerpc/eeh: Cleanup for EEH core Gavin Shan
2013-06-15  9:02 ` [PATCH 03/27] powerpc/eeh: Make eeh_phb_pe_get() public Gavin Shan
2013-06-15  9:02 ` [PATCH 04/27] powerpc/eeh: Make eeh_pe_get() public Gavin Shan
2013-06-15  9:02 ` [PATCH 05/27] powerpc/eeh: Trace PCI bus from PE Gavin Shan
2013-06-15  9:02 ` [PATCH 06/27] powerpc/eeh: Make eeh_init() public Gavin Shan
2013-06-15  9:02 ` [PATCH 07/27] powerpc/eeh: EEH post initialization operation Gavin Shan
2013-06-15  9:02 ` [PATCH 08/27] powerpc/eeh: Refactor eeh_reset_pe_once() Gavin Shan
2013-06-15  9:03 ` [PATCH 09/27] powerpc/eeh: Delay EEH probe during hotplug Gavin Shan
2013-06-15  9:03 ` [PATCH 10/27] powerpc/eeh: Export confirm_error_lock Gavin Shan
2013-06-15  9:03 ` [PATCH 11/27] powerpc/eeh: Sync OPAL API with firmware Gavin Shan
2013-06-15  9:03 ` [PATCH 12/27] powerpc/eeh: EEH backend for P7IOC Gavin Shan
2013-06-15  9:03 ` [PATCH 13/27] powerpc/eeh: I/O chip post initialization Gavin Shan
2013-06-15  9:03 ` [PATCH 14/27] powerpc/eeh: I/O chip EEH enable option Gavin Shan
2013-06-15  9:03 ` [PATCH 15/27] powerpc/eeh: I/O chip EEH state retrieval Gavin Shan
2013-06-15  9:03 ` [PATCH 16/27] powerpc/eeh: I/O chip PE reset Gavin Shan
2013-06-15  9:03 ` [PATCH 17/27] powerpc/eeh: I/O chip PE log and bridge setup Gavin Shan
2013-06-15  9:03 ` [PATCH 18/27] powerpc/eeh: PowerNV EEH backends Gavin Shan
2013-06-15  9:03 ` [PATCH 19/27] powerpc/eeh: Initialization for PowerNV Gavin Shan
2013-06-15  9:03 ` [PATCH 20/27] powerpc/eeh: Enable EEH check for config access Gavin Shan
2013-06-15  9:03 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
2013-06-16  5:12   ` Benjamin Herrenschmidt
2013-06-16  7:27     ` Gavin Shan
2013-06-16  8:37       ` Benjamin Herrenschmidt [this message]
2013-06-15  9:03 ` [PATCH 22/27] powerpc/eeh: Allow to check fenced PHB proactively Gavin Shan
2013-06-15  9:03 ` [PATCH 23/27] powernv/opal: Notifier for OPAL events Gavin Shan
2013-06-15  9:03 ` [PATCH 24/27] powernv/opal: Disable OPAL notifier upon poweroff Gavin Shan
2013-06-15  9:03 ` [PATCH 25/27] powerpc/eeh: Register OPAL notifier for PCI error Gavin Shan
2013-06-15  9:03 ` [PATCH 26/27] powerpc/powernv: Debugfs directory for PHB Gavin Shan
2013-06-15  9:03 ` [PATCH 27/27] powerpc/eeh: Debugfs for error injection Gavin Shan
  -- strict thread matches above, loose matches on Subject: below --
2013-06-05  7:34 [PATCH v3 00/27] EEH Support for PowerNV platform Gavin Shan
2013-06-05  7:34 ` [PATCH 21/27] powerpc/eeh: Process interrupts caused by EEH Gavin Shan
2013-06-11  8:13   ` Benjamin Herrenschmidt
2013-06-13  4:14     ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1371371826.21896.140.camel@pasglop \
    --to=benh@kernel.crashing.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=shangw@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).