linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Laight <David.Laight@ACULAB.COM>
To: 'Oliver O'Halloran' <oohall@gmail.com>, Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	Keith Busch <kbusch@kernel.org>,
	"Paul Mackerras" <paulus@samba.org>,
	sparclinux <sparclinux@vger.kernel.org>,
	"Toan Le" <toan@os.amperecomputing.com>,
	Greg Ungerer <gerg@linux-m68k.org>,
	"Marek Vasut" <marek.vasut+renesas@gmail.com>,
	Rob Herring <robh@kernel.org>,
	"Lorenzo Pieralisi" <lorenzo.pieralisi@arm.com>,
	Sagi Grimberg <sagi@grimberg.me>,
	Russell King <linux@armlinux.org.uk>,
	Ley Foon Tan <ley.foon.tan@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Kevin Hilman <khilman@baylibre.com>,
	linux-pci <linux-pci@vger.kernel.org>,
	Jakub Kicinski <kuba@kernel.org>,
	Matt Turner <mattst88@gmail.com>,
	"linux-kernel-mentees@lists.linuxfoundation.org" 
	<linux-kernel-mentees@lists.linuxfoundation.org>,
	Guenter Roeck <linux@roeck-us.net>, Ray Jui <rjui@broadcom.com>,
	Jens Axboe <axboe@fb.com>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	Shuah Khan <skhan@linuxfoundation.org>,
	"bjorn@helgaas.com" <bjorn@helgaas.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	Richard Henderson <rth@twiddle.net>,
	Juergen Gross <jgross@suse.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Thomas Bogendoerfer" <tsbogend@alpha.franken.de>,
	Scott Branden <sbranden@broadcom.com>,
	Jingoo Han <jingoohan1@gmail.com>,
	"Saheed O. Bolarinwa" <refactormyself@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Philipp Zabel <p.zabel@pengutronix.de>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	Gustavo Pimentel <gustavo.pimentel@synopsys.com>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	"David S. Miller" <davem@davemloft.net>,
	Heiner Kallweit <hkallweit1@gmail.com>
Subject: RE: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86
Date: Wed, 15 Jul 2020 14:38:29 +0000	[thread overview]
Message-ID: <1e2ae69a55f542faa18988a49e9b9491@AcuMS.aculab.com> (raw)
In-Reply-To: <CAOSf1CEviMYySQhyQGks8hHjST-85wGpxEBasuxwSX_homBJ2A@mail.gmail.com>

From: Oliver O'Halloran
> Sent: 15 July 2020 05:19
> 
> On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann <arnd@arndb.de> wrote:
...
> > - config space accesses are very rare compared to memory
> >   space access and on the hardware side the error handling
> >   would be similar, but readl/writel don't return errors, they just
> >   access wrong registers or return 0xffffffff.
> >   arch/powerpc/kernel/eeh.c has a ton extra code written to
> >   deal with it, but no other architectures do.
> 
> TBH the EEH MMIO hooks were probably a mistake to begin with. Errors
> detected via MMIO are almost always asynchronous to the error itself
> so you usually just wind up with a misleading stack trace rather than
> any kind of useful synchronous error reporting. It seems like most
> drivers don't bother checking for 0xFFs either and rely on the
> asynchronous reporting via .error_detected() instead, so I have to
> wonder what the point is. I've been thinking of removing the MMIO
> hooks and using a background poller to check for errors on each PHB
> periodically (assuming we don't have an EEH interrupt) instead. That
> would remove the requirement for eeh_dev_check_failure() to be
> interrupt safe too, so it might even let us fix all the godawful races
> in EEH.

I've 'played' with PCIe error handling - without much success.
What might be useful is for a driver that has just read ~0u to
be able to ask 'has there been an error signalled for this device?'.

I got an error generated by doing an MMIO access that was inside
the address range forwarded to the slave, but outside any of its BARs.
(Two BARs of different sizes leaves a nice gap.)
This got reported up to the bridge nearest the slave (which supported
error handling), but not to the root bridge (which I don't think does).
ISTR a message about EEH being handled by the hardware (the machine
is up but dmesg is full of messages from a bouncing USB mouse).

With such partial error reporting useful info can still be extracted.

Of course, what actually happens on a PCIe error is that the signal
gets routed to some 'board support logic' and then passed back into
the kernel as an NMI - which then crashes the kernel!
This even happens when the PCIe link goes down after we've done a
soft-remove of the device itself!
Rather makes updating the board's FPGA without a reboot tricky.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

  reply	other threads:[~2020-07-15 14:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAK8P3a3NWSZw6678k1O2eJ6-c5GuW7484PRvEzU9MEPPrCD-yw@mail.gmail.com>
2020-07-14 18:45 ` [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86 Bjorn Helgaas
2020-07-14 21:02   ` Kjetil Oftedal
2020-07-15  2:14     ` Benjamin Herrenschmidt
2020-07-14 22:01   ` Arnd Bergmann
2020-07-14 23:46     ` Bjorn Helgaas
2020-07-15  2:19       ` Benjamin Herrenschmidt
2020-07-15  6:47       ` Arnd Bergmann
2020-07-15 22:26         ` Benjamin Herrenschmidt
2020-07-15  4:18     ` Oliver O'Halloran
2020-07-15 14:38       ` David Laight [this message]
2020-07-15 22:12         ` Bjorn Helgaas
2020-07-15 22:49           ` Benjamin Herrenschmidt
2020-07-16  8:07             ` David Laight
2020-07-14 23:14   ` Rob Herring
2020-07-15  2:12   ` Benjamin Herrenschmidt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1e2ae69a55f542faa18988a49e9b9491@AcuMS.aculab.com \
    --to=david.laight@aculab.com \
    --cc=arnd@arndb.de \
    --cc=axboe@fb.com \
    --cc=bhelgaas@google.com \
    --cc=bjorn@helgaas.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=davem@davemloft.net \
    --cc=geert@linux-m68k.org \
    --cc=gerg@linux-m68k.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=gustavo.pimentel@synopsys.com \
    --cc=hch@lst.de \
    --cc=helgaas@kernel.org \
    --cc=hkallweit1@gmail.com \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jgross@suse.com \
    --cc=jingoohan1@gmail.com \
    --cc=kbusch@kernel.org \
    --cc=khilman@baylibre.com \
    --cc=kuba@kernel.org \
    --cc=ley.foon.tan@intel.com \
    --cc=linux-kernel-mentees@lists.linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linux@roeck-us.net \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=marek.vasut+renesas@gmail.com \
    --cc=mattst88@gmail.com \
    --cc=oohall@gmail.com \
    --cc=p.zabel@pengutronix.de \
    --cc=paulus@samba.org \
    --cc=refactormyself@gmail.com \
    --cc=rjui@broadcom.com \
    --cc=robh@kernel.org \
    --cc=rth@twiddle.net \
    --cc=sagi@grimberg.me \
    --cc=sbranden@broadcom.com \
    --cc=skhan@linuxfoundation.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=toan@os.amperecomputing.com \
    --cc=tsbogend@alpha.franken.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).