linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Sam Bobroff <sbobroff@linux.ibm.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 2/8] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
Date: Mon, 8 Apr 2019 16:50:36 +1000	[thread overview]
Message-ID: <20190408065035.GA21472@tungsten.ozlabs.ibm.com> (raw)
In-Reply-To: <fd1acf72-aefe-233f-cd01-e89846551e08@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 3918 bytes --]

On Wed, Mar 20, 2019 at 05:02:57PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 20/03/2019 13:58, Sam Bobroff wrote:
> > The EEH_DEV_NO_HANDLER flag is used by the EEH system to prevent the
> > use of driver callbacks in drivers that have been bound part way
> > through the recovery process. This is necessary to prevent later stage
> > handlers from being called when the earlier stage handlers haven't,
> > which can be confusing for drivers.
> 
> The flag is used from eeh_pe_report()->eeh_pe_report_edev which is
> called many times from eeh_handle_normal_event() (and you clear the flag
> here unconditionally) and once from eeh_handle_special_event() - so this
> is actually the only case now when the flag matters. Is my understanding
> correct? Also is not clearing the flag correct in that case? I do not
> quite understand eeh_handle_normal_event vs. eeh_handle_special_event
> business though.

I'm not sure I fully understand your question, but here's the situation:

* EEH is detected on a PCI device that has no driver bound but there is
  a driver that COULD bind.
* eeh_handle_normal_event() follows the "EEH: Reset with hotplug
  activity" path because the device doesn't (currently) have a driver
  that supports EEH.
* eeh_reset_device() removes the device (pci_hp_remove_devices()).
* eeh_reset_device() re-discovers the device with pci_hp_add_devices().
* As part of re-discovery the PCI subsystem will bind the available driver.
* eeh_handle_normal_event() calls eeh_report_resume() (via eeh_pe_report()).

If the (newly bound) driver has a resume() handler, then
eeh_report_resume() will call it and AFAIK this will cause a problem for
some drivers because their error_detected() handler wasn't called first.

The fix for this is to have eeh_add_device_late() set EEH_DEV_NO_HANDLER
so that we can detect that the device has been added DURING recovery,
and avoid calling it's handlers later.

I see what you mean about the eeh_handle_special_event() case, where
EEH_DEV_NO_HANDLER isn't cleared before calling eeh_pe_report(), and I
think it's a bug! I'll fix it in the next version.

(Cleaning up that flag is on my list. I don't think it's a very good
solution.)

> > 
> > However, the flag is set for all devices that are added after boot
> > time and only cleared at the end of the EEH recovery process. This
> > results in hot plugged devices erroneously having the flag set during
> > the first recovery after they are added (causing their driver's
> > handlers to be incorrectly ignored).
> > 
> > To remedy this, clear the flag at the beginning of recovery
> > processing. The flag is still cleared at the end of recovery
> > processing, although it is no longer really necessary.
> 
> Then may be remove that redundant clearing?

I don't really mind either way; clearing it when we are finished with
recovery seems "cleaner" to me but it doesn't have any function. (In
any case, I think I will eventually want to remove it.)

> > 
> > Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> > ---
> >  arch/powerpc/kernel/eeh_driver.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index 6f3ee30565dd..4c34b9901f15 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -819,6 +819,10 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
> >  		result = PCI_ERS_RESULT_DISCONNECT;
> >  	}
> >  
> > +	eeh_for_each_pe(pe, tmp_pe)
> > +		eeh_pe_for_each_dev(tmp_pe, edev, tmp)
> > +			edev->mode &= ~EEH_DEV_NO_HANDLER;
> > +
> >  	/* Walk the various device drivers attached to this slot through
> >  	 * a reset sequence, giving each an opportunity to do what it needs
> >  	 * to accomplish the reset.  Each child gets a report of the
> > 
> 
> -- 
> Alexey
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2019-04-08  6:52 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-20  2:58 [PATCH 0/8] Sam Bobroff
2019-03-20  2:58 ` [PATCH 1/8] powerpc/64: Adjust order in pcibios_init() Sam Bobroff
2019-03-20  6:03   ` Alexey Kardashevskiy
2019-03-20  2:58 ` [PATCH 2/8] powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag Sam Bobroff
2019-03-20  6:02   ` Alexey Kardashevskiy
2019-04-08  6:50     ` Sam Bobroff [this message]
2019-03-20  2:58 ` [PATCH 3/8] powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag Sam Bobroff
2019-03-20  6:02   ` Alexey Kardashevskiy
2019-04-09  1:41     ` Sam Bobroff
2019-04-18  9:51   ` Oliver O'Halloran
2019-04-30  5:30     ` Sam Bobroff
2019-03-20  2:58 ` [PATCH 4/8] powerpc/eeh: Improve debug messages around device addition Sam Bobroff
2019-03-20  6:02   ` Alexey Kardashevskiy
2019-03-20  2:58 ` [PATCH 5/8] powerpc/eeh: Add eeh_show_enabled() Sam Bobroff
2019-03-20  6:02   ` Alexey Kardashevskiy
2019-03-20  6:24     ` Oliver
2019-04-09  3:30     ` Sam Bobroff
2019-04-18 10:01   ` Oliver O'Halloran
2019-04-30  5:44     ` Sam Bobroff
2019-03-20  2:58 ` [PATCH 6/8] powerpc/eeh: Initialize EEH address cache earlier Sam Bobroff
2019-03-20  6:02   ` Alexey Kardashevskiy
2019-04-18 10:13   ` Oliver O'Halloran
2019-04-30  5:54     ` Sam Bobroff
2019-03-20  2:58 ` [PATCH 7/8] powerpc/eeh: EEH for pSeries hot plug Sam Bobroff
2019-03-20  6:02   ` Alexey Kardashevskiy
2019-03-20  2:58 ` [PATCH 8/8] powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build() Sam Bobroff
2019-03-20  6:05   ` Alexey Kardashevskiy
2019-04-09  3:31     ` Sam Bobroff
2019-04-12  0:55 ` [PATCH 0/8] Tyrel Datwyler
2019-04-15  3:41   ` Sam Bobroff
2019-04-19 22:36     ` Tyrel Datwyler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190408065035.GA21472@tungsten.ozlabs.ibm.com \
    --to=sbobroff@linux.ibm.com \
    --cc=aik@ozlabs.ru \
    --cc=linuxppc-dev@lists.ozlabs.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).