Date: Wed, 6 Aug 2025 14:14:52 -0500
From: Bjorn Helgaas
To: Jim Quinlan
Cc: linux-pci@vger.kernel.org, Nicolas Saenz Julienne, Bjorn Helgaas,
	Lorenzo Pieralisi, Cyril Brulebois,
	bcm-kernel-feedback-list@broadcom.com, jim2101024@gmail.com,
	Florian Fainelli, Lorenzo Pieralisi, Krzysztof Wilczyński,
	Manivannan Sadhasivam, Rob Herring,
	"moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE",
	"moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE",
	open list
Subject: Re: [PATCH 2/2] PCI: brcmstb: Add panic/die handler to driver
Message-ID: <20250806191452.GA8313@bhelgaas>
In-Reply-To: <20250613220843.698227-3-james.quinlan@broadcom.com>

On Fri, Jun 13, 2025 at 06:08:43PM -0400, Jim Quinlan wrote:
> Whereas most PCIe HW returns 0xffffffff on illegal accesses and the like,
> by default Broadcom's STB PCIe controller effects an abort. Some SoCs --
> 7216 and its descendants -- have new HW that identifies error details.
>
> This simple handler determines if the PCIe controller was the cause of the
> abort and if so, prints out diagnostic info. Unfortunately, an abort still
> occurs.
>
> Care is taken to read the error registers only when the PCIe bridge is
> active and the PCIe registers are acceptable. Otherwise, a "die" event
> caused by something other than the PCIe could cause an abort if the PCIe
> "die" handler tried to access registers when the bridge is off.

s/acceptable/accessible/ ?

> Example error output:
> brcm-pcie 8b20000.pcie: Error: Mem Acc: 32bit, Read, @0x38000000
> brcm-pcie 8b20000.pcie: Type: TO=0 Abt=0 UnspReq=1 AccDsble=0 BadAddr=0

Ugly that we have to do this at all, but since I guess it's the best
we can do, looks ok to me.

> Signed-off-by: Jim Quinlan
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 155 +++++++++++++++++++++++++-
>  1 file changed, 154 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index 400854c893d8..abc56acad1fe 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -13,15 +13,18 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -151,6 +154,39 @@
>  #define MSI_INT_MASK_SET 0x10
>  #define MSI_INT_MASK_CLR 0x14
>
> +/* Error report registers */
> +#define PCIE_OUTB_ERR_TREAT 0x6000
> +#define PCIE_OUTB_ERR_TREAT_CONFIG_MASK 0x1
> +#define PCIE_OUTB_ERR_TREAT_MEM_MASK 0x2
> +#define PCIE_OUTB_ERR_VALID 0x6004
> +#define PCIE_OUTB_ERR_CLEAR 0x6008
> +#define PCIE_OUTB_ERR_ACC_INFO 0x600c
> +#define PCIE_OUTB_ERR_ACC_INFO_CFG_ERR_MASK 0x01
> +#define PCIE_OUTB_ERR_ACC_INFO_MEM_ERR_MASK 0x02
> +#define PCIE_OUTB_ERR_ACC_INFO_TYPE_64_MASK 0x04
> +#define PCIE_OUTB_ERR_ACC_INFO_DIR_WRITE_MASK 0x10

Including "MASK" in these names seems kind of pointless since they're
all single bits. Some drivers don't bother with "MASK" even for the
multi-bit fields, since uses read pretty naturally without it. But I
suppose this is following the existing brcmstb style.

> +#define PCIE_OUTB_ERR_ACC_INFO_BYTE_LANES_MASK 0xff00
> +#define PCIE_OUTB_ERR_ACC_ADDR 0x6010
> +#define PCIE_OUTB_ERR_ACC_ADDR_BUS_MASK 0xff00000
> +#define PCIE_OUTB_ERR_ACC_ADDR_DEV_MASK 0xf8000
> +#define PCIE_OUTB_ERR_ACC_ADDR_FUNC_MASK 0x7000
> +#define PCIE_OUTB_ERR_ACC_ADDR_REG_MASK 0xfff
> +#define PCIE_OUTB_ERR_CFG_CAUSE 0x6014
> +#define PCIE_OUTB_ERR_CFG_CAUSE_TIMEOUT_MASK 0x40
> +#define PCIE_OUTB_ERR_CFG_CAUSE_ABORT_MASK 0x20
> +#define PCIE_OUTB_ERR_CFG_CAUSE_UNSUPP_REQ_MASK 0x10
> +#define PCIE_OUTB_ERR_CFG_CAUSE_ACC_TIMEOUT_MASK 0x4
> +#define PCIE_OUTB_ERR_CFG_CAUSE_ACC_DISABLED_MASK 0x2
> +#define PCIE_OUTB_ERR_CFG_CAUSE_ACC_64BIT__MASK 0x1
> +#define PCIE_OUTB_ERR_MEM_ADDR_LO 0x6018
> +#define PCIE_OUTB_ERR_MEM_ADDR_HI 0x601c
> +#define PCIE_OUTB_ERR_MEM_CAUSE 0x6020
> +#define PCIE_OUTB_ERR_MEM_CAUSE_TIMEOUT_MASK 0x40
> +#define PCIE_OUTB_ERR_MEM_CAUSE_ABORT_MASK 0x20
> +#define PCIE_OUTB_ERR_MEM_CAUSE_UNSUPP_REQ_MASK 0x10
> +#define PCIE_OUTB_ERR_MEM_CAUSE_ACC_DISABLED_MASK 0x2
> +#define PCIE_OUTB_ERR_MEM_CAUSE_BAD_ADDR_MASK 0x1
> +
>  #define PCIE_RGR1_SW_INIT_1_PERST_MASK 0x1
>  #define PCIE_RGR1_SW_INIT_1_PERST_SHIFT 0x0
>
> @@ -301,6 +337,8 @@ struct brcm_pcie {
>  	struct subdev_regulators *sr;
>  	bool ep_wakeup_capable;
>  	const struct pcie_cfg_data *cfg;
> +	struct notifier_block die_notifier;
> +	struct notifier_block panic_notifier;
>  	bool bridge_on;
>  	spinlock_t bridge_lock;
>  };
> @@ -1711,6 +1749,115 @@ static int brcm_pcie_resume_noirq(struct device *dev)
>  	return ret;
>  }
>
> +/* Dump out PCIe errors on die or panic */
> +static int _brcm_pcie_dump_err(struct brcm_pcie *pcie,
> +			       const char *type)

Fits on one line.

> +{
> +	void __iomem *base = pcie->base;
> +	int i, is_cfg_err, is_mem_err, lanes;
> +	char *width_str, *direction_str, lanes_str[9];
> +	u32 info, cfg_addr, cfg_cause, mem_cause, lo, hi;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pcie->bridge_lock, flags);
> +	/* Don't access registers when the bridge is off */
> +	if (!pcie->bridge_on || readl(base + PCIE_OUTB_ERR_VALID) == 0) {
> +		spin_unlock_irqrestore(&pcie->bridge_lock, flags);
> +		return NOTIFY_DONE;
> +	}
> +
> +	/* Read all necessary registers so we can release the spinlock ASAP */
> +	info = readl(base + PCIE_OUTB_ERR_ACC_INFO);
> +	is_cfg_err = !!(info & PCIE_OUTB_ERR_ACC_INFO_CFG_ERR_MASK);
> +	is_mem_err = !!(info & PCIE_OUTB_ERR_ACC_INFO_MEM_ERR_MASK);
> +	if (is_cfg_err) {
> +		cfg_addr = readl(base + PCIE_OUTB_ERR_ACC_ADDR);
> +		cfg_cause = readl(base + PCIE_OUTB_ERR_CFG_CAUSE);
> +	}
> +	if (is_mem_err) {
> +		mem_cause = readl(base + PCIE_OUTB_ERR_MEM_CAUSE);
> +		lo = readl(base + PCIE_OUTB_ERR_MEM_ADDR_LO);
> +		hi = readl(base + PCIE_OUTB_ERR_MEM_ADDR_HI);
> +	}
> +	/* We've got all of the info, clear the error */
> +	writel(1, base + PCIE_OUTB_ERR_CLEAR);
> +	spin_unlock_irqrestore(&pcie->bridge_lock, flags);
> +
> +	dev_err(pcie->dev, "handling %s error notification\n", type);
> +	width_str = (info & PCIE_OUTB_ERR_ACC_INFO_TYPE_64_MASK) ? "64bit" : "32bit";
> +	direction_str = (info & PCIE_OUTB_ERR_ACC_INFO_DIR_WRITE_MASK) ? "Write" : "Read";
> +	lanes = FIELD_GET(PCIE_OUTB_ERR_ACC_INFO_BYTE_LANES_MASK, info);
> +	for (i = 0, lanes_str[8] = 0; i < 8; i++)
> +		lanes_str[i] = (lanes & (1 << i)) ? '1' : '0';
> +
> +	if (is_cfg_err) {
> +		int bus = FIELD_GET(PCIE_OUTB_ERR_ACC_ADDR_BUS_MASK, cfg_addr);
> +		int dev = FIELD_GET(PCIE_OUTB_ERR_ACC_ADDR_DEV_MASK, cfg_addr);
> +		int func = FIELD_GET(PCIE_OUTB_ERR_ACC_ADDR_FUNC_MASK, cfg_addr);
> +		int reg = FIELD_GET(PCIE_OUTB_ERR_ACC_ADDR_REG_MASK, cfg_addr);
> +
> +		dev_err(pcie->dev, "Error: CFG Acc, %s, %s, Bus=%d, Dev=%d, Fun=%d, Reg=0x%x, lanes=%s\n",
> +			width_str, direction_str, bus, dev, func, reg, lanes_str);
> +		dev_err(pcie->dev, " Type: TO=%d Abt=%d UnsupReq=%d AccTO=%d AccDsbld=%d Acc64bit=%d\n",
> +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_TIMEOUT_MASK),
> +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_ABORT_MASK),
> +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_UNSUPP_REQ_MASK),
> +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_ACC_TIMEOUT_MASK),
> +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_ACC_DISABLED_MASK),
> +			!!(cfg_cause & PCIE_OUTB_ERR_CFG_CAUSE_ACC_64BIT__MASK));
> +	}
> +
> +	if (is_mem_err) {
> +		u64 addr = ((u64)hi << 32) | (u64)lo;
> +
> +		dev_err(pcie->dev, "Error: Mem Acc, %s, %s, @0x%llx, lanes=%s\n",
> +			width_str, direction_str, addr, lanes_str);
> +		dev_err(pcie->dev, " Type: TO=%d Abt=%d UnsupReq=%d AccDsble=%d BadAddr=%d\n",
> +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_TIMEOUT_MASK),
> +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_ABORT_MASK),
> +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_UNSUPP_REQ_MASK),
> +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_ACC_DISABLED_MASK),
> +			!!(mem_cause & PCIE_OUTB_ERR_MEM_CAUSE_BAD_ADDR_MASK));
> +	}
> +
> +	return NOTIFY_OK;
> +}
> +
> +static int brcm_pcie_die_notify_cb(struct notifier_block *self,
> +				   unsigned long v, void *p)
> +{
> +	struct brcm_pcie *pcie =
> +		container_of(self, struct brcm_pcie, die_notifier);
> +
> +	return _brcm_pcie_dump_err(pcie, "Die");
> +}
> +
> +static int brcm_pcie_panic_notify_cb(struct notifier_block *self,
> +				     unsigned long v, void *p)
> +{
> +	struct brcm_pcie *pcie =
> +		container_of(self, struct brcm_pcie, panic_notifier);
> +
> +	return _brcm_pcie_dump_err(pcie, "Panic");
> +}
> +
> +static void brcm_register_die_notifiers(struct brcm_pcie *pcie)
> +{
> +	pcie->panic_notifier.notifier_call = brcm_pcie_panic_notify_cb;
> +	atomic_notifier_chain_register(&panic_notifier_list,
> +				       &pcie->panic_notifier);
> +
> +	pcie->die_notifier.notifier_call = brcm_pcie_die_notify_cb;
> +	register_die_notifier(&pcie->die_notifier);
> +}
> +
> +static void brcm_unregister_die_notifiers(struct brcm_pcie *pcie)
> +{
> +	unregister_die_notifier(&pcie->die_notifier);
> +	atomic_notifier_chain_unregister(&panic_notifier_list,
> +					 &pcie->panic_notifier);
> +}
> +
>  static void __brcm_pcie_remove(struct brcm_pcie *pcie)
>  {
>  	brcm_msi_remove(pcie);
> @@ -1729,6 +1876,9 @@ static void brcm_pcie_remove(struct platform_device *pdev)
>
>  	pci_stop_root_bus(bridge->bus);
>  	pci_remove_root_bus(bridge->bus);
> +	if (pcie->cfg->has_err_report)
> +		brcm_unregister_die_notifiers(pcie);
> +
>  	__brcm_pcie_remove(pcie);
>  }
>
> @@ -1829,6 +1979,7 @@ static const struct pcie_cfg_data bcm7216_cfg = {
>  	.bridge_sw_init_set = brcm_pcie_bridge_sw_init_set_7278,
>  	.has_phy = true,
>  	.num_inbound_wins = 3,
> +	.has_err_report = true,
>  };
>
>  static const struct pcie_cfg_data bcm7712_cfg = {
> @@ -2003,8 +2154,10 @@ static int brcm_pcie_probe(struct platform_device *pdev)
>  		return ret;
>  	}
>
> -	if (pcie->cfg->has_err_report)
> +	if (pcie->cfg->has_err_report) {
>  		spin_lock_init(&pcie->bridge_lock);
> +		brcm_register_die_notifiers(pcie);
> +	}
>
>  	return 0;
>
> --
> 2.34.1
>
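
For anyone checking the PCIE_OUTB_ERR_ACC_ADDR field layout against the masks in the patch, here is a rough userspace sketch of the bus/dev/func/reg decode. It is not the driver code. field_get() is a hypothetical stand-in for the kernel's FIELD_GET(), and struct cfg_addr and decode_cfg_addr() are names made up for illustration; only the four mask values are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values copied from the quoted patch. */
#define PCIE_OUTB_ERR_ACC_ADDR_BUS_MASK		0xff00000u
#define PCIE_OUTB_ERR_ACC_ADDR_DEV_MASK		0xf8000u
#define PCIE_OUTB_ERR_ACC_ADDR_FUNC_MASK	0x7000u
#define PCIE_OUTB_ERR_ACC_ADDR_REG_MASK		0xfffu

/* Hypothetical container for the decoded fields (not in the patch). */
struct cfg_addr {
	unsigned int bus;
	unsigned int dev;
	unsigned int func;
	unsigned int reg;
};

/*
 * Hypothetical userspace stand-in for the kernel's FIELD_GET().
 * (mask & (~mask + 1)) isolates the lowest set bit of the mask, so
 * dividing by it shifts the masked value down to bit 0.
 */
static uint32_t field_get(uint32_t mask, uint32_t val)
{
	assert(mask != 0);
	return (val & mask) / (mask & (~mask + 1u));
}

/* Split a raw PCIE_OUTB_ERR_ACC_ADDR value into its four fields. */
static struct cfg_addr decode_cfg_addr(uint32_t raw)
{
	struct cfg_addr a;

	a.bus = field_get(PCIE_OUTB_ERR_ACC_ADDR_BUS_MASK, raw);
	a.dev = field_get(PCIE_OUTB_ERR_ACC_ADDR_DEV_MASK, raw);
	a.func = field_get(PCIE_OUTB_ERR_ACC_ADDR_FUNC_MASK, raw);
	a.reg = field_get(PCIE_OUTB_ERR_ACC_ADDR_REG_MASK, raw);
	return a;
}
```

With these masks, decode_cfg_addr(0x113010) gives bus 1, dev 2, func 3, reg 0x10, i.e. bus in bits 27:20, device in 19:15, function in 14:12 and register offset in 11:0.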