linuxppc-dev.lists.ozlabs.org archive mirror
From: David Laight <David.Laight@ACULAB.COM>
To: 'Bjorn Helgaas' <helgaas@kernel.org>, Ira Weiny <ira.weiny@intel.com>
Cc: Alison Schofield <alison.schofield@intel.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Mahesh J Salgaonkar <mahesh@linux.ibm.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Ben Widawsky <bwidawsk@kernel.org>,
	Oliver O'Halloran <oohall@gmail.com>,
	Jonathan Cameron <Jonathan.Cameron@Huawei.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Stefan Roese <sr@denx.de>,
	"linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>
Subject: RE: [PATCH RFC] PCI/AER: Enable internal AER errors by default
Date: Mon, 13 Feb 2023 22:44:16 +0000
Message-ID: <f5d13d0d67a34567a586a1171ba44b52@AcuMS.aculab.com>
In-Reply-To: <20230213213820.GA2935044@bhelgaas>

From: Bjorn Helgaas
> Sent: 13 February 2023 21:38
> 
> On Fri, Feb 10, 2023 at 02:33:23PM -0800, Ira Weiny wrote:
> > The CXL driver expects internal error reporting to be enabled via
> > pci_enable_pcie_error_reporting().  It is likely other drivers expect the same.
> > Dave submitted a patch to enable the CXL side[1] but the PCI AER registers
> > still mask errors.
> >
> > PCIe v6.0 Uncorrectable Mask Register (7.8.4.3) and Correctable Mask
> > Register (7.8.4.6) default to masking internal errors.  The
> > Uncorrectable Error Severity Register (7.8.4.4) defaults internal errors
> > as fatal.
> >
> > Enable internal errors to be reported via the standard
> > pci_enable_pcie_error_reporting() call.  Ensure uncorrectable errors are set
> > non-fatal to limit any impact to other drivers.
> 
> Do you have any background on why the spec makes these errors masked
> by default?  I'm sympathetic to wanting to learn about all the errors
> we can, but I'm a little wary if the spec authors thought it was
> important to mask these by default.
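
For reference, the register changes described in the quoted patch text
can be sketched as plain bit manipulation. This is only an illustration,
not the patch itself: the two bit values are copied from what Linux
defines in include/uapi/linux/pci_regs.h, while struct aer_regs and
aer_unmask_internal() are hypothetical names invented for this sketch,
and no config space is actually read or written.

```c
#include <stdint.h>

/* Bit values as defined in Linux include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNC_INTN     0x00400000u /* Uncorrectable Internal Error */
#define PCI_ERR_COR_INTERNAL 0x00004000u /* Corrected Internal Error */

/* Hypothetical snapshot of the three AER registers discussed above. */
struct aer_regs {
	uint32_t uncor_mask;  /* Uncorrectable Error Mask (7.8.4.3) */
	uint32_t uncor_sever; /* Uncorrectable Error Severity (7.8.4.4) */
	uint32_t cor_mask;    /* Correctable Error Mask (7.8.4.6) */
};

/*
 * Return the register values with internal errors unmasked and the
 * uncorrectable internal error downgraded to non-fatal. All other
 * bits are left exactly as they were.
 */
static struct aer_regs aer_unmask_internal(struct aer_regs r)
{
	r.uncor_sever &= ~PCI_ERR_UNC_INTN;   /* severity 0 = non-fatal */
	r.uncor_mask  &= ~PCI_ERR_UNC_INTN;   /* mask 0 = reported */
	r.cor_mask    &= ~PCI_ERR_COR_INTERNAL;
	return r;
}
```

The severity bit is cleared before the mask bit so that, in a real
read-modify-write sequence, an error arriving in between would not be
reported as fatal.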

I'd guess that it is for backwards compatibility with older hardware
and/or software that didn't support error notifications.

Then there are the x86 systems that manage to take the AER
error into some 'board management hardware' which finally
interrupts the kernel with an NMI - and the obvious consequence.
These systems are NEBS? 'qualified' for telecoms use, but take
out a PCIe link and the system crashes.

It is pretty easy to generate a PCIe error.
Any endpoint with two (or more) different sized BARs leaves
a big chunk of PCIe address space that is forwarded by the upstream
bridge but is not responded to.
The requirement to put the MSI-X area in its own BAR pretty much
ensures that such addresses exist.
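
A rough sketch of the arithmetic, assuming each BAR is naturally aligned
to its own size and that the bridge memory window is sized in 1 MiB
units. The function name, the largest-first packing order and the
example sizes are illustrative assumptions only, not how any particular
firmware or kernel allocator lays things out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* PCI-to-PCI bridge memory windows have 1 MiB granularity. */
#define BRIDGE_WINDOW_GRANULE (1ull << 20)

/* Round v up to a multiple of a (a must be a power of two). */
static uint64_t align_up(uint64_t v, uint64_t a)
{
	return (v + a - 1) & ~(a - 1);
}

/* qsort comparator: largest BAR first. */
static int cmp_desc(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x < y) - (x > y);
}

/*
 * Pack power-of-two sized BARs into a window starting at window_base
 * (assumed 1 MiB aligned) and return how many bytes of the resulting
 * window are forwarded by the bridge but claimed by no BAR.
 * Note: sorts bar_sizes in place.
 */
static uint64_t unclaimed_bytes(uint64_t *bar_sizes, size_t n,
				uint64_t window_base)
{
	uint64_t cursor = window_base, claimed = 0;
	uint64_t window_end;
	size_t i;

	qsort(bar_sizes, n, sizeof(*bar_sizes), cmp_desc);
	for (i = 0; i < n; i++) {
		cursor = align_up(cursor, bar_sizes[i]); /* natural alignment */
		cursor += bar_sizes[i];
		claimed += bar_sizes[i];
	}
	window_end = align_up(cursor, BRIDGE_WINDOW_GRANULE);
	return (window_end - window_base) - claimed;
}
```

With a 1 MiB BAR plus a 4 KiB MSI-X BAR the window has to be 2 MiB, so
unclaimed_bytes() returns 1044480 (1 MiB less 4 KiB): addresses that the
bridge forwards and nothing answers.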

(Never mind reprogramming the fpga that is terminating the link.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



Thread overview: 4+ messages
2023-02-10 22:33 [PATCH RFC] PCI/AER: Enable internal AER errors by default Ira Weiny
2023-02-13 21:38 ` Bjorn Helgaas
2023-02-13 22:44   ` David Laight [this message]
2023-02-15  0:08   ` Ira Weiny
