linuxppc-dev.lists.ozlabs.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: Hans Zhang <18255117159@163.com>
Cc: bhelgaas@google.com, tglx@linutronix.de, kw@linux.com,
	manivannan.sadhasivam@linaro.org, mahesh@linux.ibm.com,
	oohall@gmail.com, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 0/4] pci: implement "pci=aer_panic"
Date: Mon, 19 May 2025 17:03:10 -0500
Message-ID: <20250519220310.GA1258923@bhelgaas>
In-Reply-To: <20250516165518.125495-1-18255117159@163.com>

On Sat, May 17, 2025 at 12:55:14AM +0800, Hans Zhang wrote:
> The following series introduces a new kernel command-line option aer_panic
> to enhance error handling for PCIe Advanced Error Reporting (AER) in
> mission-critical environments. This feature ensures deterministic recovery
> from fatal PCIe errors by triggering a controlled kernel panic when device
> recovery fails, avoiding indefinite system hangs.

We try very hard not to add new kernel parameters.

It sounds like part of the problem is the use of SPI interrupts rather
than the PCIe-architected INTx/MSI/MSI-X.  I'm not sure this warrants
generic upstream code changes.  This might be something you need to
maintain out-of-tree.

> Problem Statement
> In systems where unresolved PCIe errors (e.g., bus hangs) occur,
> traditional error recovery mechanisms may leave the system unresponsive
> indefinitely. This is unacceptable for high-availability environments
> requiring prompt recovery via reboot.
> 
> Solution
> The aer_panic option forces a kernel panic on unrecoverable AER errors.
> This bypasses prolonged recovery attempts and ensures immediate reboot.
> 
> Patch Summary:
> Documentation Update: Adds aer_panic to kernel-parameters.txt, explaining
> its purpose and usage.
> 
> Command-Line Handling: Implements pci=aer_panic parsing and state
> management in PCI core.
> 
> State Exposure: Introduces pci_aer_panic_enabled() to check if the panic
> mode is active.
> 
> Panic Trigger: Modifies recovery logic to panic the system when recovery
> fails and aer_panic is enabled.
> 
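
The three code steps in the summary above (option parsing, state
exposure, panic decision) can be illustrated with a minimal userspace
sketch. This is not the code from the series: only the names aer_panic
and pci_aer_panic_enabled() come from the cover letter, while
parse_pci_options(), after_recovery(), and both enums are hypothetical
stand-ins, and the kernel's real parsing and recovery paths differ.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the flag the series keeps in PCI core. */
static bool aer_panic_enabled;

/*
 * Scan a comma-separated "pci=" value such as "noaer,aer_panic" and
 * set the flag when an option matches "aer_panic" exactly.
 */
void parse_pci_options(const char *value)
{
	static const char key[] = "aer_panic";

	while (*value) {
		size_t len = strcspn(value, ",");

		if (len == sizeof(key) - 1 && strncmp(value, key, len) == 0)
			aer_panic_enabled = true;

		value += len;
		if (*value == ',')
			value++;
	}
}

/* Name taken from patch 3/4; this body is only an assumption. */
bool pci_aer_panic_enabled(void)
{
	return aer_panic_enabled;
}

/* Hypothetical outcome and action types for the sketch. */
enum recovery_result { RECOVERED, RECOVERY_FAILED };
enum action { ACTION_CONTINUE, ACTION_PANIC };

/* Panic only when recovery failed and the option was given. */
enum action after_recovery(enum recovery_result res)
{
	if (res == RECOVERY_FAILED && pci_aer_panic_enabled())
		return ACTION_PANIC;
	return ACTION_CONTINUE;
}
```

In this sketch the default stays unchanged: without aer_panic on the
command line, a failed recovery still returns ACTION_CONTINUE.
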
> Impact
> Controlled Recovery: Reduces downtime by replacing hangs with immediate
> reboots.
> 
> Optional: Enabled via pci=aer_panic; no default behavior change.
> 
> Dependency: Requires CONFIG_PCIEAER.
> 
> For example, in mobile phones and tablets, when the PCIe link fails and
> cannot be restored, the system should be able to panic and restart
> instead of waiting until the battery is completely drained.
> 
> ---
> For example, the qcom sm8250 and sm8350 kernels panic and restart the
> system when the PCIe link goes down.
> 
> https://github.com/DOITfit/xiaomi_kernel_sm8250/blob/d42aa408e8cef14f4ec006554fac67ef80b86d0d/drivers/pci/controller/pci-msm.c#L5440
> 
> https://github.com/OnePlusOSS/android_kernel_oneplus_sm8350/blob/13ca08fdf0979fdd61d5e8991661874bb2d19150/drivers/net/wireless/cnss2/pci.c#L950
> 
> 
> Because each SoC manufacturer's design is different, the AXI and other
> buses connected to PCIe have no mechanism to prevent hangs. Once a FATAL
> error occurs on the PCIe link and cannot be recovered, the system needs
> to be restarted.
> 
> 
> Dear Mani,
> 
> I wonder if you know how other qcom SoCs handle FATAL errors that occur
> on the PCIe link.
> ---
> 
> Hans Zhang (4):
>   pci: implement "pci=aer_panic"
>   PCI/AER: Introduce aer_panic kernel command-line option
>   PCI/AER: Expose AER panic state via pci_aer_panic_enabled()
>   PCI/AER: Trigger kernel panic on recovery failure if aer_panic is set
> 
>  .../admin-guide/kernel-parameters.txt          |  7 +++++++
>  drivers/pci/pci.c                              |  2 ++
>  drivers/pci/pci.h                              |  4 ++++
>  drivers/pci/pcie/aer.c                         | 18 ++++++++++++++++++
>  drivers/pci/pcie/err.c                         |  8 ++++++--
>  5 files changed, 37 insertions(+), 2 deletions(-)
> 
> 
> base-commit: fee3e843b309444f48157e2188efa6818bae85cf
> prerequisite-patch-id: 299f33d3618e246cd7c04de10e591ace2d0116e6
> prerequisite-patch-id: 482ad0609459a7654a4100cdc9f9aa4b671be50b
> -- 
> 2.25.1
> 

