linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>,
	 Jim Quinlan <james.quinlan@broadcom.com>
Cc: linux-pci@vger.kernel.org,
	"Nicolas Saenz Julienne" <nsaenz@kernel.org>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Lorenzo Pieralisi" <lorenzo.pieralisi@arm.com>,
	"Cyril Brulebois" <kibi@debian.org>,
	bcm-kernel-feedback-list@broadcom.com, jim2101024@gmail.com,
	"Florian Fainelli" <florian.fainelli@broadcom.com>,
	"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
	"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
	"Manivannan Sadhasivam" <mani@kernel.org>,
	"Rob Herring" <robh@kernel.org>,
	"moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE"
	<linux-rpi-kernel@lists.infradead.org>,
	"moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE"
	<linux-arm-kernel@lists.infradead.org>,
	"open list" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 2/2] PCI: brcmstb: Add panic/die handler to driver
Date: Tue, 21 Oct 2025 14:02:10 +0300 (EEST)	[thread overview]
Message-ID: <2b0f9620-a105-6e49-f9cb-4bac14e14ce2@linux.intel.com> (raw)
In-Reply-To: <20251020184832.GA1144646@bhelgaas>

On Mon, 20 Oct 2025, Bjorn Helgaas wrote:

> On Fri, Oct 03, 2025 at 03:56:07PM -0400, Jim Quinlan wrote:
> > Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> > by default Broadcom's STB PCIe controller effects an abort.  Some SoCs --
> > 7216 and its descendants -- have new HW that identifies error details.
> > 
> > This simple handler determines if the PCIe controller was the cause of the
> > abort and if so, prints out diagnostic info.  Unfortunately, an abort still
> > occurs.
> > 
> > Care is taken to read the error registers only when the PCIe bridge is
> > active and the PCIe registers are acceptable.  Otherwise, a "die" event
> > caused by something other than the PCIe could cause an abort if the PCIe
> > "die" handler tried to access registers when the bridge is off.
> > 
> > Example error output:
> >   brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> >   brcm-pcie 8b20000.pcie:  Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0
> 
> > +/* Error report registers */
> > +#define PCIE_OUTB_ERR_TREAT				0x6000
> > +#define  PCIE_OUTB_ERR_TREAT_CONFIG_MASK		0x1
> > +#define  PCIE_OUTB_ERR_TREAT_MEM_MASK			0x2
> > +#define PCIE_OUTB_ERR_VALID				0x6004
> > +#define PCIE_OUTB_ERR_CLEAR				0x6008
> > +#define PCIE_OUTB_ERR_ACC_INFO				0x600c
> > +#define  PCIE_OUTB_ERR_ACC_INFO_CFG_ERR_MASK		0x01
> > +#define  PCIE_OUTB_ERR_ACC_INFO_MEM_ERR_MASK		0x02
> > +#define  PCIE_OUTB_ERR_ACC_INFO_TYPE_64_MASK		0x04
> > +#define  PCIE_OUTB_ERR_ACC_INFO_DIR_WRITE_MASK		0x10
> > +#define  PCIE_OUTB_ERR_ACC_INFO_BYTE_LANES_MASK		0xff00
> > +#define PCIE_OUTB_ERR_ACC_ADDR				0x6010
> > +#define PCIE_OUTB_ERR_ACC_ADDR_BUS_MASK			0xff00000
> > +#define PCIE_OUTB_ERR_ACC_ADDR_DEV_MASK			0xf8000
> > +#define PCIE_OUTB_ERR_ACC_ADDR_FUNC_MASK		0x7000
> > +#define PCIE_OUTB_ERR_ACC_ADDR_REG_MASK			0xfff
> > +#define PCIE_OUTB_ERR_CFG_CAUSE				0x6014
> > +#define  PCIE_OUTB_ERR_CFG_CAUSE_TIMEOUT_MASK		0x40
> > +#define  PCIE_OUTB_ERR_CFG_CAUSE_ABORT_MASK		0x20
> > +#define  PCIE_OUTB_ERR_CFG_CAUSE_UNSUPP_REQ_MASK	0x10
> > +#define  PCIE_OUTB_ERR_CFG_CAUSE_ACC_TIMEOUT_MASK	0x4
> > +#define  PCIE_OUTB_ERR_CFG_CAUSE_ACC_DISABLED_MASK	0x2
> > +#define  PCIE_OUTB_ERR_CFG_CAUSE_ACC_64BIT__MASK	0x1

Double __

> > +#define PCIE_OUTB_ERR_MEM_ADDR_LO			0x6018
> > +#define PCIE_OUTB_ERR_MEM_ADDR_HI			0x601c
> > +#define PCIE_OUTB_ERR_MEM_CAUSE				0x6020
> > +#define  PCIE_OUTB_ERR_MEM_CAUSE_TIMEOUT_MASK		0x40
> > +#define  PCIE_OUTB_ERR_MEM_CAUSE_ABORT_MASK		0x20
> > +#define  PCIE_OUTB_ERR_MEM_CAUSE_UNSUPP_REQ_MASK	0x10
> > +#define  PCIE_OUTB_ERR_MEM_CAUSE_ACC_DISABLED_MASK	0x2
> > +#define  PCIE_OUTB_ERR_MEM_CAUSE_BAD_ADDR_MASK		0x1

Maybe use BIT() instead for single bits?

> IMO "_MASK" is not adding anything useful to these names.  But I see
> there's a lot of precedent in this driver.
>
> >  #define  PCIE_RGR1_SW_INIT_1_PERST_MASK			0x1
> >  #define  PCIE_RGR1_SW_INIT_1_PERST_SHIFT		0x0

Please don't add unnecessary _SHIFT defines as FIELD_GET/PREP() for the 
field define should have most cases covered that require shifting.

This define is also entirely unused in this patch.

> > @@ -306,6 +342,8 @@ struct brcm_pcie {
> >  	bool			ep_wakeup_capable;
> >  	const struct pcie_cfg_data	*cfg;
> >  	bool			bridge_in_reset;
> > +	struct notifier_block	die_notifier;
> > +	struct notifier_block	panic_notifier;
> >  	spinlock_t		bridge_lock;
> >  };
> >  
> > @@ -1731,6 +1769,115 @@ static int brcm_pcie_resume_noirq(struct device *dev)
> >  	return ret;
> >  }
> >  
> > +/* Dump out PCIe errors on die or panic */
> > +static int _brcm_pcie_dump_err(struct brcm_pcie *pcie,
> 
> What is the leading underscore telling me?  There's no
> brcm_pcie_dump_err() that we need to distinguish from.
> 
> > +			       const char *type)
> > +{
> > +	void __iomem *base = pcie->base;
> > +	int i, is_cfg_err, is_mem_err, lanes;
> > +	char *width_str, *direction_str, lanes_str[9];
> > +	u32 info, cfg_addr, cfg_cause, mem_cause, lo, hi;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&pcie->bridge_lock, flags);
> > +	/* Don't access registers when the bridge is off */
> > +	if (pcie->bridge_in_reset || readl(base + PCIE_OUTB_ERR_VALID) == 0) {
> > +		spin_unlock_irqrestore(&pcie->bridge_lock, flags);
> > +		return NOTIFY_DONE;
> > +	}
> > +
> > +	/* Read all necessary registers so we can release the spinlock ASAP */
> > +	info = readl(base + PCIE_OUTB_ERR_ACC_INFO);
> > +	is_cfg_err = !!(info & PCIE_OUTB_ERR_ACC_INFO_CFG_ERR_MASK);
> > +	is_mem_err = !!(info & PCIE_OUTB_ERR_ACC_INFO_MEM_ERR_MASK);
> > +	if (is_cfg_err) {
> > +		cfg_addr = readl(base + PCIE_OUTB_ERR_ACC_ADDR);
> > +		cfg_cause = readl(base + PCIE_OUTB_ERR_CFG_CAUSE);
> > +	}
> > +	if (is_mem_err) {
> > +		mem_cause = readl(base + PCIE_OUTB_ERR_MEM_CAUSE);
> > +		lo = readl(base + PCIE_OUTB_ERR_MEM_ADDR_LO);
> > +		hi = readl(base + PCIE_OUTB_ERR_MEM_ADDR_HI);
> > +	}
> > +	/* We've got all of the info, clear the error */
> > +	writel(1, base + PCIE_OUTB_ERR_CLEAR);
> > +	spin_unlock_irqrestore(&pcie->bridge_lock, flags);
> > +
> > +	dev_err(pcie->dev, "reporting data on PCIe %s error\n", type);
> 
> Looks like this isn't included in the example error output.  Not a big
> deal in itself, but logging this:
> 
>   brcm-pcie 8b20000.pcie: reporting data on PCIe Panic error
> 
> suggests that we know this panic was directly *caused* by PCIe, and
> I'm not sure the fact that somebody called panic() and
> PCIE_OUTB_ERR_VALID was non-zero is convincing evidence of that.
> 
> I think this relies on the assumptions that (a) the controller
> triggers an abort and (b) the abort handler calls panic().  So I think
> this logs useful information that *might* be related to the panic.
> 
> I'd rather phrase this with a little less certainty, to convey the
> idea that "here's some PCIe error information that might be related to
> the panic/die".
> 
> > +	width_str = (info & PCIE_OUTB_ERR_ACC_INFO_TYPE_64_MASK) ? "64bit" : "32bit";
> > +	direction_str = (info & PCIE_OUTB_ERR_ACC_INFO_DIR_WRITE_MASK) ? "Write" : "Read";

Please use str_read_write() + don't forget it's include.

It might be also worth to add str_64bit_32bit() in the form with the
dash ("64-bit") as there a couple of other drivers print the same choice.


> > +	lanes = FIELD_GET(PCIE_OUTB_ERR_ACC_INFO_BYTE_LANES_MASK, info);
> > +	for (i = 0, lanes_str[8] = 0; i < 8; i++)
> > +		lanes_str[i] = (lanes & (1 << i)) ? '1' : '0';
> > +
> > +	if (is_cfg_err) {
> > +		int bus = FIELD_GET(PCIE_OUTB_ERR_ACC_ADDR_BUS_MASK, cfg_addr);
> > +		int dev = FIELD_GET(PCIE_OUTB_ERR_ACC_ADDR_DEV_MASK, cfg_addr);
> > +		int func = FIELD_GET(PCIE_OUTB_ERR_ACC_ADDR_FUNC_MASK, cfg_addr);
> > +		int reg = FIELD_GET(PCIE_OUTB_ERR_ACC_ADDR_REG_MASK, cfg_addr);
> > +
> > +		dev_err(pcie->dev, "Error: CFG Acc, %s, %s, Bus=%d, Dev=%d, Fun=%d, Reg=0x%x, lanes=%s\n",
> 
> Why are we printing bus and dev with %d?  Can we use the usual format
> ("%04x:%02x:%02x.%d") so it matches other logging?
> 
> > +			width_str, direction_str, bus, dev, func, reg, lanes_str);
> > +		dev_err(pcie->dev, " Type: TO=%d Abt=%d UnsupReq=%d AccTO=%d AccDsbld=%d Acc64bit=%d\n",
> > +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_TIMEOUT_MASK),
> > +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_ABORT_MASK),
> > +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_UNSUPP_REQ_MASK),
> > +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_ACC_TIMEOUT_MASK),
> > +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_ACC_DISABLED_MASK),
> > +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_ACC_64BIT__MASK));
> > +	}
> > +
> > +	if (is_mem_err) {
> > +		u64 addr = ((u64)hi << 32) | (u64)lo;
> > +
> > +		dev_err(pcie->dev, "Error: Mem Acc, %s, %s, @0x%llx, lanes=%s\n",
> > +			width_str, direction_str, addr, lanes_str);
> > +		dev_err(pcie->dev, " Type: TO=%d Abt=%d UnsupReq=%d AccDsble=%d BadAddr=%d\n",
> > +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_TIMEOUT_MASK),
> > +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_ABORT_MASK),
> > +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_UNSUPP_REQ_MASK),
> > +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_ACC_DISABLED_MASK),
> > +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_BAD_ADDR_MASK));
> > +	}
> > +
> > +	return NOTIFY_OK;
> 
> What is the difference between NOTIFY_DONE and NOTIFY_OK?  Can the
> caller do anything useful based on the difference?
> 
> This seems like opportunistic error information that isn't definitely
> definitely connected to anything, so I'm not sure returning different
> values is really reliable.
> 

  reply	other threads:[~2025-10-21 11:02 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-03 19:56 [PATCH v3 0/2] PCI: brcmstb: Add panic/die handler to driver Jim Quinlan
2025-10-03 19:56 ` [PATCH v3 1/2] PCI: brcmstb: Add a way to indicate if PCIe bridge is active Jim Quinlan
2025-10-03 19:56 ` [PATCH v3 2/2] PCI: brcmstb: Add panic/die handler to driver Jim Quinlan
2025-10-04  5:06   ` [External] : " ALOK TIWARI
2025-10-28 20:34     ` James Quinlan
2025-10-20 18:48   ` Bjorn Helgaas
2025-10-21 11:02     ` Ilpo Järvinen [this message]
2025-10-28 22:37       ` James Quinlan
2025-10-28 21:17     ` James Quinlan
2025-10-20  6:48 ` [PATCH v3 0/2] " Manivannan Sadhasivam
2025-10-28 18:07   ` Bjorn Helgaas
2025-10-29 21:28     ` James Quinlan
2025-10-29 23:04       ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2b0f9620-a105-6e49-f9cb-4bac14e14ce2@linux.intel.com \
    --to=ilpo.jarvinen@linux.intel.com \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=bhelgaas@google.com \
    --cc=florian.fainelli@broadcom.com \
    --cc=helgaas@kernel.org \
    --cc=james.quinlan@broadcom.com \
    --cc=jim2101024@gmail.com \
    --cc=kibi@debian.org \
    --cc=kwilczynski@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rpi-kernel@lists.infradead.org \
    --cc=lorenzo.pieralisi@arm.com \
    --cc=lpieralisi@kernel.org \
    --cc=mani@kernel.org \
    --cc=nsaenz@kernel.org \
    --cc=robh@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).