From: Bjorn Helgaas <helgaas@kernel.org>
To: Akshay Jindal <akshayaj.lkd@gmail.com>
Cc: bhelgaas@google.com, mani@kernel.org,
manivannan.sadhasivam@linaro.org, kwilczynski@kernel.org,
mahesh@linux.ibm.com, oohall@gmail.com,
ilpo.jarvinen@linux.intel.com, Jonathan.Cameron@huawei.com,
sathyanarayanan.kuppuswamy@linux.intel.com, lukas@wunner.de,
shuah@kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] PCI/AER: Add error message when AER_MAX_MULTI_ERR_DEVICES limit is hit during AER handling
Date: Thu, 26 Jun 2025 15:35:55 -0500
Message-ID: <20250626203555.GA1637877@bhelgaas>
In-Reply-To: <20250619185041.73240-1-akshayaj.lkd@gmail.com>
On Fri, Jun 20, 2025 at 12:20:30AM +0530, Akshay Jindal wrote:
> When a PCIe error is detected, the root port receives the error message
> and the threaded IRQ handler, aer_isr, traverses the hierarchy downward
> from the root port. It populates the e_info->dev[] array with the PCIe
> devices that have recorded error status, so that appropriate error
> handling and recovery can be performed.
>
> The e_info->dev[] array is limited in size by AER_MAX_MULTI_ERR_DEVICES,
> which is currently defined as 5. If more than five devices report errors
> in the same event, the array silently truncates the list, and those
> extra devices are not included in the recovery flow.
>
> Emit an error message when this limit is reached, fulfilling a TODO
> comment in drivers/pci/pcie/aer.c.
> /* TODO: Should print error message here? */
>
> Signed-off-by: Akshay Jindal <akshayaj.lkd@gmail.com>
Applied to pci/aer for v6.17, thanks!
> ---
>
> Changes since v1:
> - Reworded commit message in imperative mood (per Shuah’s feedback)
> - Mentioned and quoted related TODO in the message
> - Updated recipient list
>
> Testing:
> ========
> Verified log in dmesg on QEMU.
>
> 1. The following command created the required environment: a
> pcie-root-port and a virtio-net-pci device on a Q35 machine model.
> ./qemu-system-x86_64 \
>     -M q35,accel=kvm \
>     -m 2G -cpu host -nographic \
>     -serial mon:stdio \
>     -kernel /home/akshayaj/pci/arch/x86/boot/bzImage \
>     -initrd /home/akshayaj/Embedded_System_Using_QEMU/rootfs/rootfs.cpio.gz \
>     -append "console=ttyS0 root=/ pci=pcie_scan_all" \
>     -device pcie-root-port,id=rp0,chassis=1,slot=1 \
>     -device virtio-net-pci,bus=rp0
>
> ~ # mylspci -t
> -[0000:00]-+-00.0
>            +-01.0
>            +-02.0
>            +-03.0-[01]----00.0
>            +-1f.0
>            +-1f.2
>            \-1f.3
> 00:03.0 --> pcie-root-port
>
> 2. Kernel bzImage compiled with the following changes:
>    2.1 CONFIG_PCIEAER=y in config
>    2.2 AER_MAX_MULTI_ERR_DEVICES set to 0
>        Since there is no pcie-testdev in QEMU, it is impossible to create
>        a 5-level hierarchy of PCIe devices in QEMU. So we simulate the
>        error scenario by changing the limit to 0.
>    2.3 Log added at the required place in aer.c.
>
> 3. Both correctable and uncorrectable errors were injected on the
>    pcie-root-port via the HMP command pcie_aer_inject_error in QEMU.
>    The HMP commands used are as follows:
>    3.1 pcie_aer_inject_error -c rp0 0x1
>    3.2 pcie_aer_inject_error -c rp0 0x40
>    3.3 pcie_aer_inject_error rp0 0x10
>
> Resulting dmesg:
> ================
> [ 0.380534] pcieport 0000:00:03.0: AER: enabled with IRQ 24
> [ 55.729530] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [ 225.484456] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
> [ 356.976253] pcieport 0000:00:03.0: AER: Exceeded max allowed (0) addition of PCIe devices for AER handling
>
> drivers/pci/pcie/aer.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 70ac66188367..3995a1db5699 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -1039,7 +1039,8 @@ static int find_device_iter(struct pci_dev *dev, void *data)
>  		/* List this device */
>  		if (add_error_device(e_info, dev)) {
>  			/* We cannot handle more... Stop iteration */
> -			/* TODO: Should print error message here? */
> +			pci_err(dev, "Exceeded max allowed (%d) addition of PCIe "
> +				"devices for AER handling\n", AER_MAX_MULTI_ERR_DEVICES);
>  			return 1;
>  		}
>
> --
> 2.43.0
>
Thread overview: 3+ messages
2025-06-19 18:50 [PATCH v2] PCI/AER: Add error message when AER_MAX_MULTI_ERR_DEVICES limit is hit during AER handling Akshay Jindal
2025-06-25 10:29 ` Akshay Jindal
2025-06-26 20:35 ` Bjorn Helgaas [this message]