The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH] pci_crash: capture PCI config space at panic time
@ 2026-05-12 16:24 hangej
  2026-05-13 21:30 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: hangej @ 2026-05-12 16:24 UTC (permalink / raw)
  To: bhelgaas; +Cc: linux-pci, linux-kernel, dwmw2, kexec, hangej

From: Johannes Hange <hangej@amazon.com>

When a system crashes and kexec boots into the crash kernel, PCI
device state is lost.  AER registers are volatile and cleared by
device reset, making post-mortem root cause analysis difficult.

Add CONFIG_PCI_CRASH to capture PCI configuration space (including
AER extended capability registers) during panic, before kexec.  The
data is exported via VMCOREINFO so crash analysis tools can extract
it from the vmcore.

Hook point: crash_save_vmcoreinfo() inside __crash_kexec(), which
runs before machine_kexec() -- the only reliable capture point since
panic notifiers run after kexec by default.

Key design choices:
- No config reads at init or hotplug; buffer pre-allocated, filled
  only at crash time for fresh register state.
- capture=aer (default): quick-scan root port ROOT_STATUS, bail if
  no uncorrectable errors.  capture=always: every panic.
- devices= filter applied at rebuild time (zero crash-path cost).
- 200ms debounced workqueue rebuild for VF enumeration storms.
- kvmalloc buffer + kmalloc pagemap for physical page directory
  (vmalloc addresses need explicit page-to-phys translation).
- dcache flush (ARM64/x86) ensures data survives kexec.

Build-tested on v7.1-rc2 (x86_64).  Runtime-tested on 6.6 with
forced panics: ~2ms for 31 PCIe devices (4096B config each),
sub-microsecond AER bail on non-PCI panics.

See Documentation/PCI/pci-crash-capture.rst for full details.

Signed-off-by: Johannes Hange <hangej@amazon.com>
---
 Documentation/PCI/index.rst             |   1 +
 Documentation/PCI/pci-crash-capture.rst | 134 +++++
 MAINTAINERS                             |   8 +
 include/linux/pci_crash.h               | 110 ++++
 kernel/Kconfig.kexec                    |  17 +
 kernel/Makefile                         |   1 +
 kernel/pci_crash.c                      | 755 ++++++++++++++++++++++++
 kernel/vmcore_info.c                    |  13 +
 8 files changed, 1039 insertions(+)
 create mode 100644 Documentation/PCI/pci-crash-capture.rst
 create mode 100644 include/linux/pci_crash.h
 create mode 100644 kernel/pci_crash.c

diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 5d720d2a415e..7f499a43ddb4 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -19,4 +19,5 @@ PCI Bus Subsystem
    endpoint/index
    controller/index
    boot-interrupts
+   pci-crash-capture
    tph
diff --git a/Documentation/PCI/pci-crash-capture.rst b/Documentation/PCI/pci-crash-capture.rst
new file mode 100644
index 000000000000..35f676c2724b
--- /dev/null
+++ b/Documentation/PCI/pci-crash-capture.rst
@@ -0,0 +1,134 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+PCI Crash Capture Buffer
+========================
+
+Overview
+========
+
+The PCI crash capture module (``CONFIG_PCI_CRASH``) saves PCI configuration
+space for all (or selected) devices at panic time.  The data is written into
+a pre-allocated buffer whose physical pages are exported via VMCOREINFO,
+allowing crash analysis tools to extract device state from the vmcore.
+
+This is useful because AER (Advanced Error Reporting) registers are volatile
+and cleared by device reset during kexec into the crash kernel.  Capturing
+them before kexec preserves the error state that caused or contributed to the
+crash.
+
+Boot Parameters
+===============
+
+``pci_crash.capture=`` (default: ``aer``)
+  When to capture PCI config space.  Comma-separated tokens:
+
+  ``aer``
+    Capture only if a root port reports uncorrectable errors in its
+    AER ROOT_STATUS register.  Non-PCI panics skip capture entirely
+    (a handful of MMIO reads to root ports, sub-microsecond).
+
+  ``always``
+    Capture on every panic regardless of AER state.  Useful for
+    cascading failures where a PCI link-down causes an MCE or NMI
+    watchdog timeout before DPC/AER fires.
+
+``pci_crash.devices=`` (default: ``all``)
+  Which devices to include in the capture buffer.  Comma-separated tokens:
+
+  ``all``
+    Every PCI device in the system.
+
+  ``bridges``
+    PCI-to-PCI bridges (class 0604) and CardBus bridges (class 0607).
+
+  ``root_ports``
+    PCIe root ports only.
+
+  ``XXYY``
+    Hex PCI class code (class byte XX, subclass byte YY).
+    Up to 8 class codes may be specified.
+
+  Bridges are always implicitly included regardless of the filter value
+  because they hold AER registers needed for root cause analysis.  The
+  filter is applied at device enumeration and hotplug rebuild time, not at
+  crash time (zero overhead on the panic path).
+
+Both parameters are writable at runtime via sysfs
+(``/sys/module/pci_crash/parameters/``).
+
+Architecture
+============
+
+::
+
+  late_initcall
+      │
+      ├── enumerate PCI devices (filtered by devices= param)
+      ├── allocate buffer via kvmalloc (may be vmalloc for >4 MiB)
+      ├── build pagemap: kmalloc'd array of per-page physical addresses
+      └── register bus notifier for hotplug
+
+  hotplug (BUS_NOTIFY_ADD_DEVICE / BUS_NOTIFY_DEL_DEVICE)
+      │
+      └── schedule delayed rebuild (200 ms debounce)
+              └── re-enumerate, re-allocate buffer + pagemap
+
+  panic (__crash_kexec → crash_save_vmcoreinfo → pci_crash_save)
+      │
+      ├── quick-scan root port AER ROOT_STATUS (capture=aer)
+      │     └── bail if no uncorrectable errors
+      ├── read config space for each device (MMIO, no locks)
+      ├── flush dcache (buffer + pagemap) to RAM
+      └── VMCOREINFO exports: PCI_CRASH_PAGEMAP, PCI_CRASH_BUF_SZ
+
+Buffer Format
+=============
+
+The buffer consists of a 32-byte header followed by variable-length
+device records:
+
+.. code-block:: c
+
+    struct pci_crash_buffer_header {   /* 32 bytes */
+        __le32 magic;           /* 0x50434943 "PCIC" */
+        __le32 version;         /* 1 */
+        __le32 device_count;
+        __le32 config_size;     /* 0 = variable-length records */
+        __le64 timestamp;       /* ktime_get_real_fast_ns() */
+        __le32 flags;           /* reserved */
+        __le32 reserved;
+    };
+
+    struct pci_crash_device_record {   /* 8 + cfg_size bytes */
+        __le16 domain;
+        __u8   bus;
+        __u8   devfn;
+        __le32 config_size;     /* 256 or 4096 */
+        __u8   config_data[];
+    };
+
+The pagemap (exported via ``PCI_CRASH_PAGEMAP``) allows the parser to
+locate buffer pages without walking page tables:
+
+.. code-block:: c
+
+    struct pci_crash_pagemap {
+        __le32 magic;           /* 0x5043504d "PCPM" */
+        __le32 num_pages;
+        __le64 buf_size;
+        __le64 addrs[];         /* physical address per page */
+    };
+
+Safety Considerations
+=====================
+
+- ``pci_read_config_dword()`` is direct ECAM MMIO at crash time (no locks).
+- ``ktime_get_real_fast_ns()`` is NMI-safe (lockless timekeeper snapshot).
+- ``WRITE_ONCE``/``READ_ONCE`` + memory barriers between rebuild (process
+  context) and ``pci_crash_save()`` (crash context, single CPU, interrupts
+  disabled).
+- Buffer capped at 24 MiB to prevent excessive allocation on systems with
+  thousands of VFs.
+- ``slab_is_available()`` guard in param setters prevents use-before-init
+  when parameters are set via kernel command line.
diff --git a/MAINTAINERS b/MAINTAINERS
index b2040011a386..3e4fbb3011ce 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20622,6 +20622,14 @@ T:	git git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git
 F:	drivers/pci/pwrctrl/*
 F:	include/linux/pci-pwrctrl.h
 
+
+PCI CRASH BUFFER
+M:	Johannes Hange <hangej@amazon.com>
+L:	linux-pci@vger.kernel.org
+S:	Maintained
+F:	include/linux/pci_crash.h
+F:	kernel/pci_crash.c
+
 PCI SUBSYSTEM
 M:	Bjorn Helgaas <bhelgaas@google.com>
 L:	linux-pci@vger.kernel.org
diff --git a/include/linux/pci_crash.h b/include/linux/pci_crash.h
new file mode 100644
index 000000000000..59a46ee26183
--- /dev/null
+++ b/include/linux/pci_crash.h
@@ -0,0 +1,110 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * PCI Crash Buffer - Capture PCI config space at panic time
+ *
+ * This module captures PCI configuration space data (including AER
+ * extended capability registers) for all PCI devices at panic time.
+ * The data is stored in a buffer whose pages are captured in the
+ * vmcore for off-site analysis.
+ *
+ * Copyright (c) 2026 Amazon.com, Inc. or its affiliates.
+ */
+#ifndef _LINUX_PCI_CRASH_H
+#define _LINUX_PCI_CRASH_H
+
+#include <linux/types.h>
+
+#define PCI_CRASH_MAGIC         0x50434943  /* "PCIC" in ASCII */
+#define PCI_CRASH_VERSION       1
+
+/**
+ * struct pci_crash_buffer_header - Header for PCI crash buffer
+ * @magic:        Magic number (PCI_CRASH_MAGIC)
+ * @version:      Format version (PCI_CRASH_VERSION)
+ * @device_count: Number of device records following this header
+ * @config_size:  0 -- indicates variable-length records. Each device
+ *                record stores its own config_size (pdev->cfg_size:
+ *                256 for legacy PCI, 4096 for PCIe). Parsers walk
+ *                records sequentially using per-record config_size.
+ * @timestamp:    Capture timestamp from ktime_get_real_fast_ns()
+ * @flags:        Reserved for future use (0 for now)
+ * @reserved:     Padding to align to 32 bytes
+ *
+ * Total size: 32 bytes
+ */
+struct pci_crash_buffer_header {
+	__le32 magic;
+	__le32 version;
+	__le32 device_count;
+	__le32 config_size;
+	__le64 timestamp;
+	__le32 flags;
+	__le32 reserved;
+} __packed;
+
+/**
+ * struct pci_crash_device_record - Per-device record in crash buffer
+ * @domain:      PCI domain number
+ * @bus:         PCI bus number
+ * @devfn:       Device and function number (PCI_DEVFN format)
+ * @config_size: Config space size for this device (pdev->cfg_size:
+ *               256 for legacy PCI, 4096 for PCIe)
+ * @config_data: Raw PCI config space (config_size bytes)
+ *
+ * Records are variable-length: total size per record is
+ * PCI_CRASH_RECORD_META + config_size bytes.
+ */
+struct pci_crash_device_record {
+	__le16 domain;
+	__u8   bus;
+	__u8   devfn;
+	__le32 config_size;
+	__u8   config_data[];
+} __packed;
+
+#define PCI_CRASH_HEADER_SIZE  sizeof(struct pci_crash_buffer_header)
+#define PCI_CRASH_RECORD_META  sizeof(struct pci_crash_device_record)
+
+/**
+ * struct pci_crash_pagemap - Physical page directory for crash buffer
+ *
+ * The PCI crash buffer may be allocated via vmalloc (for buffers
+ * exceeding ~4 MB where the buddy allocator cannot provide contiguous
+ * pages).  virt_to_phys() returns garbage for vmalloc addresses, so
+ * we maintain this small kmalloc'd directory that maps the buffer's
+ * virtual pages to their actual physical addresses.
+ *
+ * At panic time, crash_core.c exports the pagemap's physical address
+ * via VMCOREINFO.  The parser reads the pagemap, then reads each
+ * physical page from the vmcore to reconstruct the full buffer.
+ *
+ * The pagemap itself is always kmalloc'd (direct-mapped), so
+ * virt_to_phys() works correctly on it.
+ *
+ * @magic:     0x5043504d ("PCPM") -- validates this is a pagemap
+ * @num_pages: Number of entries in the addrs[] array
+ * @buf_size:  Exact buffer size in bytes (last page may be partial)
+ * @addrs:     Physical address of each PAGE_SIZE page backing the buffer
+ */
+struct pci_crash_pagemap {
+	__le32 magic;
+	__le32 num_pages;
+	__le64 buf_size;
+	__le64 addrs[];
+} __packed;
+
+#define PCI_CRASH_PAGEMAP_MAGIC  0x5043504d  /* "PCPM" in ASCII */
+
+#ifdef CONFIG_PCI_CRASH
+void pci_crash_save(void);
+extern void *pci_crash_buffer;
+extern size_t pci_crash_buffer_size;
+extern phys_addr_t pci_crash_pagemap_phys;
+#else
+static inline void pci_crash_save(void) {}
+#define pci_crash_buffer        ((void *)NULL)
+#define pci_crash_buffer_size   ((size_t)0)
+#define pci_crash_pagemap_phys  ((phys_addr_t)0)
+#endif
+
+#endif /* _LINUX_PCI_CRASH_H */
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 15632358bcf7..6eb49a934e50 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -179,4 +179,21 @@ config CRASH_MAX_MEMORY_RANGES
 	  the computation behind the value provided through the
 	  /sys/kernel/crash_elfcorehdr_size attribute.
 
+
+config PCI_CRASH
+	bool "Capture PCI config space at panic time"
+	depends on VMCORE_INFO && PCI && (ARM64 || X86)
+	default y
+	help
+	  Capture PCI configuration space (including AER extended capability
+	  registers) for all PCI devices at panic time.  The data is stored
+	  in a buffer whose pages are recorded in VMCOREINFO for off-site
+	  crash analysis.
+
+	  This is useful for diagnosing PCI errors that caused or contributed
+	  to the crash, especially when AER registers are volatile and cleared
+	  by device reset during kexec.
+
+	  If unsure, say Y.
+
 endmenu
diff --git a/kernel/Makefile b/kernel/Makefile
index 6785982013dc..6629755bc5a3 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -82,6 +82,7 @@ obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
 obj-$(CONFIG_CRASH_DUMP) += crash_core.o
 obj-$(CONFIG_CRASH_DM_CRYPT) += crash_dump_dm_crypt.o
 obj-$(CONFIG_CRASH_DUMP_KUNIT_TEST) += crash_core_test.o
+obj-$(CONFIG_PCI_CRASH) += pci_crash.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
diff --git a/kernel/pci_crash.c b/kernel/pci_crash.c
new file mode 100644
index 000000000000..8bbc9008fdc8
--- /dev/null
+++ b/kernel/pci_crash.c
@@ -0,0 +1,755 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * PCI Crash Buffer - Capture PCI config space for crash analysis
+ *
+ * Copyright (c) 2026 Amazon.com, Inc. or its affiliates.
+ *
+ * Captures PCI configuration space at crash time so AER error
+ * registers reflect the crash-time state for off-site analysis.
+ *
+ * Design:
+ * - Init (late_initcall): enumerate devices, allocate buffer.
+ * - Hotplug: bus notifier queues deferred rebuild of device list
+ *   and buffer via workqueue -- no PCI reads.
+ * - Crash: crash_save_vmcoreinfo() calls pci_crash_save() which
+ *   reads config space into buffer, flushes dcache to RAM so
+ *   data survives kexec into crash kernel.
+ *
+ * Records are variable-length: each device's record is exactly
+ * 8 + pdev->cfg_size bytes (264 for legacy PCI, 4104 for PCIe).
+ * The parser walks records sequentially using per-record config_size.
+ *
+ * Buffer pages may be physically scattered (kvmalloc falls back to
+ * vmalloc for buffers exceeding ~4 MB).  A small kmalloc'd pagemap
+ * records each page's physical address so the crash parser can
+ * reconstruct the buffer without page-table walking.
+ *
+ * pci_read_config_dword() is direct MMIO (no locks) -- safe in crash.
+ * for_each_pci_dev() needs pci_bus_sem -- only used at init/hotplug.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/crash_dump.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/moduleparam.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pci_crash.h>
+#include <linux/slab.h>
+#include <linux/timekeeping.h>
+#include <linux/workqueue.h>
+
+#include <linux/cacheflush.h>
+#include <linux/unaligned.h>
+
+/**
+ * pci_crash_flush_dcache() - Flush a memory region from CPU cache to RAM
+ * @addr: virtual address of region to flush
+ * @size: size in bytes
+ *
+ * Used at crash time to ensure the crash kernel sees our buffer/pagemap
+ * writes after kexec.
+ */
+static inline void pci_crash_flush_dcache(void *addr, size_t size)
+{
+#ifdef CONFIG_ARM64
+	unsigned long start = (unsigned long)addr;
+	unsigned long end = start + size;
+
+	dcache_clean_inval_poc(start, end);
+#elif defined(CONFIG_X86)
+	clflush_cache_range(addr, size);
+#endif
+}
+
+/**
+ * pci_crash_buffer - Virtual address of PCI crash capture buffer
+ *
+ * Contains PCI configuration space data captured at panic time.
+ * Written by pci_crash_save(), read by crash kernel via VMCOREINFO.
+ * May be vmalloc'd -- use pci_crash_pagemap for physical addresses.
+ */
+void *pci_crash_buffer;
+EXPORT_SYMBOL_GPL(pci_crash_buffer);
+
+/**
+ * pci_crash_buffer_size - Size of pci_crash_buffer in bytes
+ */
+size_t pci_crash_buffer_size;
+EXPORT_SYMBOL_GPL(pci_crash_buffer_size);
+
+/**
+ * pci_crash_pagemap_phys - Physical address of page directory
+ *
+ * Points to a struct pci_crash_pagemap (always kmalloc'd, direct-mapped).
+ * Exported via VMCOREINFO so the crash kernel can locate buffer pages
+ * without walking page tables.
+ */
+phys_addr_t pci_crash_pagemap_phys;
+EXPORT_SYMBOL_GPL(pci_crash_pagemap_phys);
+
+static struct pci_crash_pagemap *pci_crash_pagemap;
+static size_t pci_crash_pagemap_size;
+static struct pci_dev **pci_crash_devs;
+static unsigned int pci_crash_num_devs;
+static DEFINE_MUTEX(pci_crash_lock);
+
+/*
+ * Set in pci_crash_init() after delayed_work, PCI bus and notifier are
+ * ready.  Guards parse + rebuild in param setters: at boot (level -1)
+ * the setter just stores the string; pci_crash_init() parses and does
+ * the initial rebuild once PCI is up.
+ */
+static bool pci_crash_ready;
+
+/*
+ * capture -- when to capture PCI config space.
+ * Comma-separated tokens:
+ *   aer    -- root port ROOT_STATUS has uncorrectable errors (default)
+ *   always -- every panic regardless of PCI error state
+ *
+ * Writable at runtime (0644) so operators and tests can toggle without
+ * reboot.  Writes re-parse capture_flags immediately.
+ */
+static char capture[32] = "aer";
+
+#define PCI_CRASH_CAPTURE_AER		BIT(0)
+#define PCI_CRASH_CAPTURE_ALWAYS	BIT(1)
+static unsigned long capture_flags = PCI_CRASH_CAPTURE_AER;
+
+static void pci_crash_parse_capture(void);
+
+static int capture_param_set(const char *val, const struct kernel_param *kp)
+{
+	if (strlen(val) >= sizeof(capture))
+		return -EINVAL;
+	strscpy(capture, val, sizeof(capture));
+	strim(capture);
+	if (READ_ONCE(pci_crash_ready))
+		pci_crash_parse_capture();
+	return 0;
+}
+
+static int capture_param_get(char *buf, const struct kernel_param *kp)
+{
+	return scnprintf(buf, PAGE_SIZE, "%s\n", capture);
+}
+
+static const struct kernel_param_ops capture_param_ops = {
+	.set = capture_param_set,
+	.get = capture_param_get,
+};
+module_param_cb(capture, &capture_param_ops, NULL, 0644);
+MODULE_PARM_DESC(capture, "When to capture: aer, always (default: aer)");
+
+/*
+ * devices -- which devices to capture.
+ * Comma-separated tokens:
+ *   all        -- every PCI device (default)
+ *   bridges    -- PCI bridges (class 0604, 0607)
+ *   root_ports -- PCIe root ports only
+ *   XXYY       -- hex PCI class code (class + subclass)
+ *
+ * Bridges are always implicitly included regardless of filter value
+ * because they hold the AER registers needed for root cause analysis.
+ * Applies at rebuild time only -- zero cost at crash time.  Writable
+ * at runtime (0644); writes re-parse and trigger async rebuild.
+ */
+static char devices[256] = "all";
+
+static void pci_crash_parse_devices(void);
+static struct delayed_work pci_crash_rebuild_dwork;
+
+/* Debounce period for bus notifications (ms).
+ * TRN2 liveupdate enumerates ~3000 VFs in ~1.5s -- this coalesces
+ * the storm into a single rebuild after the last event.
+ */
+#define PCI_CRASH_REBUILD_DELAY_MS	200
+
+static int devices_param_set(const char *val, const struct kernel_param *kp)
+{
+	if (strlen(val) >= sizeof(devices))
+		return -EINVAL;
+
+	mutex_lock(&pci_crash_lock);
+	strscpy(devices, val, sizeof(devices));
+	strim(devices);
+	if (READ_ONCE(pci_crash_ready)) {
+		pci_crash_parse_devices();
+		mod_delayed_work(system_wq, &pci_crash_rebuild_dwork,
+				 msecs_to_jiffies(PCI_CRASH_REBUILD_DELAY_MS));
+	}
+	mutex_unlock(&pci_crash_lock);
+	return 0;
+}
+
+static int devices_param_get(char *buf, const struct kernel_param *kp)
+{
+	return scnprintf(buf, PAGE_SIZE, "%s\n", devices);
+}
+
+static const struct kernel_param_ops devices_param_ops = {
+	.set = devices_param_set,
+	.get = devices_param_get,
+};
+module_param_cb(devices, &devices_param_ops, NULL, 0644);
+MODULE_PARM_DESC(devices,
+	"Which devices: all, bridges, root_ports, XXYY hex class (default: all)");
+
+#define PCI_CRASH_DEVICES_ALL		BIT(0)
+#define PCI_CRASH_DEVICES_BRIDGES	BIT(1)
+#define PCI_CRASH_DEVICES_ROOT_PORTS	BIT(2)
+#define PCI_CRASH_MAX_DEVICE_CLASSES	8
+static unsigned long devices_flags = PCI_CRASH_DEVICES_ALL;
+static u16 device_classes[PCI_CRASH_MAX_DEVICE_CLASSES];
+static unsigned int device_class_count;
+
+static void pci_crash_parse_capture(void)
+{
+	char *buf, *token, *rest;
+	unsigned long flags = 0;
+
+	if (!*capture) {
+		WRITE_ONCE(capture_flags, PCI_CRASH_CAPTURE_AER);
+		return;
+	}
+
+	buf = kstrdup(capture, GFP_KERNEL);
+	if (!buf) {
+		WRITE_ONCE(capture_flags, PCI_CRASH_CAPTURE_AER);
+		return;
+	}
+
+	rest = buf;
+	while ((token = strsep(&rest, ",")) != NULL) {
+		if (strcmp(token, "aer") == 0)
+			flags |= PCI_CRASH_CAPTURE_AER;
+		else if (strcmp(token, "always") == 0)
+			flags |= PCI_CRASH_CAPTURE_ALWAYS;
+		else
+			pr_warn("unknown capture token: %s\n",
+				token);
+	}
+	kfree(buf);
+
+	if (!flags) {
+		pr_warn("no valid capture tokens, defaulting to aer\n");
+		flags = PCI_CRASH_CAPTURE_AER;
+	}
+	WRITE_ONCE(capture_flags, flags);
+}
+
+static void pci_crash_parse_devices(void)
+{
+	char *buf, *token, *rest;
+	unsigned long val;
+
+	devices_flags = 0;
+	device_class_count = 0;
+
+	if (!*devices) {
+		devices_flags = PCI_CRASH_DEVICES_ALL;
+		return;
+	}
+
+	buf = kstrdup(devices, GFP_KERNEL);
+	if (!buf) {
+		devices_flags = PCI_CRASH_DEVICES_ALL;
+		return;
+	}
+
+	rest = buf;
+	while ((token = strsep(&rest, ",")) != NULL) {
+		if (strcmp(token, "all") == 0) {
+			devices_flags |= PCI_CRASH_DEVICES_ALL;
+		} else if (strcmp(token, "bridges") == 0) {
+			devices_flags |= PCI_CRASH_DEVICES_BRIDGES;
+		} else if (strcmp(token, "root_ports") == 0) {
+			devices_flags |= PCI_CRASH_DEVICES_ROOT_PORTS;
+		} else if (kstrtoul(token, 16, &val) == 0 && val <= 0xFFFF) {
+			if (device_class_count < PCI_CRASH_MAX_DEVICE_CLASSES)
+				device_classes[device_class_count++] = (u16)val;
+			else
+				pr_warn("too many device classes (max %d)\n",
+					PCI_CRASH_MAX_DEVICE_CLASSES);
+		} else {
+			pr_warn("unknown devices token: %s\n",
+				token);
+		}
+	}
+	kfree(buf);
+
+	if (!devices_flags && device_class_count == 0) {
+		pr_warn("no valid devices tokens, defaulting to all\n");
+		devices_flags = PCI_CRASH_DEVICES_ALL;
+	}
+}
+
+static bool pci_crash_device_matches(struct pci_dev *pdev)
+{
+	unsigned int i;
+	u16 dev_class = pdev->class >> 8;
+
+	if (devices_flags & PCI_CRASH_DEVICES_ALL)
+		return true;
+
+	/* Bridges always included -- they hold AER registers */
+	if (dev_class == PCI_CLASS_BRIDGE_PCI ||
+	    dev_class == PCI_CLASS_BRIDGE_CARDBUS)
+		return true;
+
+	if ((devices_flags & PCI_CRASH_DEVICES_ROOT_PORTS) &&
+	    pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
+		return true;
+
+	for (i = 0; i < device_class_count; i++) {
+		if (dev_class == device_classes[i])
+			return true;
+	}
+
+	return false;
+}
+
+/* Sanity limit -- prevents multi-GB allocations on systems with many VFs */
+#define PCI_CRASH_MAX_BUFFER_SIZE	(24 * 1024 * 1024)
+
+/**
+ * pci_crash_build_pagemap() - Build physical page directory for buffer
+ * @buf: buffer allocated via kvmalloc (may be vmalloc'd)
+ * @buf_size: buffer size in bytes
+ *
+ * Allocates a kmalloc'd directory containing the physical address of
+ * each page backing @buf.  The pagemap is always direct-mapped, so
+ * virt_to_phys() works on it at crash time.
+ *
+ * Returns the new pagemap, or NULL on allocation failure.
+ */
+static struct pci_crash_pagemap *pci_crash_build_pagemap(void *buf,
+							 size_t buf_size)
+{
+	unsigned int num_pages = DIV_ROUND_UP(buf_size, PAGE_SIZE);
+	struct pci_crash_pagemap *pm;
+	unsigned int i;
+
+	pm = kmalloc(struct_size(pm, addrs, num_pages), GFP_KERNEL);
+	if (!pm)
+		return NULL;
+
+	pm->magic = cpu_to_le32(PCI_CRASH_PAGEMAP_MAGIC);
+	pm->num_pages = cpu_to_le32(num_pages);
+	pm->buf_size = cpu_to_le64(buf_size);
+
+	for (i = 0; i < num_pages; i++) {
+		struct page *page;
+		phys_addr_t pa;
+
+		if (is_vmalloc_addr(buf + i * PAGE_SIZE))
+			page = vmalloc_to_page(buf + i * PAGE_SIZE);
+		else
+			page = virt_to_page(buf + i * PAGE_SIZE);
+
+		if (!page) {
+			kfree(pm);
+			return NULL;
+		}
+		pa = page_to_phys(page);
+		pm->addrs[i] = cpu_to_le64(pa);
+	}
+
+	return pm;
+}
+
+/**
+ * pci_crash_read_config_space() - Read config space for one device
+ * @pdev: PCI device to read
+ * @ptr: destination pointer within the crash buffer
+ *
+ * Reads pdev->cfg_size bytes of config space (256 for legacy PCI,
+ * 4096 for PCIe). Each 4-byte dword is read individually via MMIO.
+ * Failed reads write 0xFFFFFFFF -- standard PCI convention for
+ * absent/unreachable devices.
+ */
+static void pci_crash_read_config_space(struct pci_dev *pdev, u8 *ptr)
+{
+	struct pci_crash_device_record *record =
+		(struct pci_crash_device_record *)ptr;
+	u8 *cfg_data = ptr + PCI_CRASH_RECORD_META;
+	int offset;
+	u32 val;
+
+	record->domain = cpu_to_le16(pci_domain_nr(pdev->bus));
+	record->bus = pdev->bus->number;
+	record->devfn = pdev->devfn;
+	record->config_size = cpu_to_le32(pdev->cfg_size);
+
+	for (offset = 0; offset < pdev->cfg_size; offset += 4) {
+		if (pci_read_config_dword(pdev, offset, &val)) {
+			put_unaligned_le32(0xFFFFFFFF, &cfg_data[offset]);
+			continue;
+		}
+		put_unaligned_le32(val, &cfg_data[offset]);
+	}
+}
+
+/**
+ * pci_crash_fill_buffer() - Populate buffer with config space
+ * @buffer: destination buffer (header + variable-length records)
+ * @num_devs: number of devices to read
+ *
+ * Records are variable-length: each is PCI_CRASH_RECORD_META +
+ * pdev->cfg_size bytes. The header's config_size is 0 to indicate
+ * variable-length; parsers walk records using per-record config_size.
+ *
+ * Uses ktime_get_real_fast_ns() for the timestamp -- safe in NMI/panic
+ * context (lockless, reads the NMI-safe timekeeper snapshot).
+ */
+static void pci_crash_fill_buffer(void *buffer, unsigned int num_devs)
+{
+	struct pci_crash_buffer_header *header = buffer;
+	struct pci_dev **devs = READ_ONCE(pci_crash_devs);
+	u8 *ptr;
+	unsigned int i;
+
+	header->magic = cpu_to_le32(PCI_CRASH_MAGIC);
+	header->version = cpu_to_le32(PCI_CRASH_VERSION);
+	header->device_count = cpu_to_le32(num_devs);
+	header->config_size = 0;
+	header->timestamp = cpu_to_le64(ktime_get_real_fast_ns());
+	header->flags = 0;
+	header->reserved = 0;
+
+	ptr = (u8 *)buffer + PCI_CRASH_HEADER_SIZE;
+	for (i = 0; i < num_devs; i++) {
+		struct pci_dev *pdev = devs[i];
+
+		if (unlikely(!pdev))
+			break;
+		pci_crash_read_config_space(pdev, ptr);
+		ptr += PCI_CRASH_RECORD_META + pdev->cfg_size;
+	}
+
+	header->device_count = cpu_to_le32(i);
+}
+
+/**
+ * pci_crash_rebuild_snapshot() - Rebuild device list and allocate buffer
+ *
+ * Two-pass approach:
+ *   Pass 1: count PCI devices
+ *   Pass 2: populate device array (filtered) and compute exact buffer
+ *            size from actual pdev->cfg_size per device (no padding)
+ *
+ * The devices param controls which devices are included.  Bridges are
+ * always included regardless of devices setting (they hold AER registers).
+ * devices=all (default) includes everything.
+ *
+ * Does NOT read PCI config space -- reads happen only at crash time.
+ * This keeps rebuild fast during VF enumeration storms (~6000 ADD
+ * events on TRN2 during liveupdate).
+ *
+ * After allocation, builds the pagemap so the crash parser can
+ * locate the buffer's physical pages in the vmcore.
+ *
+ * Caller must hold pci_crash_lock.
+ */
+static void pci_crash_rebuild_snapshot(void)
+{
+	struct pci_dev *pdev = NULL;
+	unsigned int count = 0, i;
+	void *old_buf;
+	unsigned int old_num_devs = pci_crash_num_devs;
+	struct pci_dev **old_devs = pci_crash_devs;
+	struct pci_crash_pagemap *old_pm = pci_crash_pagemap;
+	struct pci_crash_pagemap *new_pm;
+	struct pci_dev **new_devs;
+	void *new_buf;
+	size_t total_size;
+
+	/* Pass 1: count devices */
+	for_each_pci_dev(pdev)
+		count++;
+
+	if (count == 0) {
+		pr_info("no PCI devices found\n");
+		WRITE_ONCE(pci_crash_num_devs, 0);
+		if (old_devs) {
+			for (i = 0; i < old_num_devs; i++)
+				pci_dev_put(old_devs[i]);
+			kvfree(old_devs);
+			WRITE_ONCE(pci_crash_devs, NULL);
+		}
+		kfree(old_pm);
+		WRITE_ONCE(pci_crash_pagemap, NULL);
+		WRITE_ONCE(pci_crash_pagemap_phys, 0);
+		WRITE_ONCE(pci_crash_pagemap_size, 0);
+		old_buf = pci_crash_buffer;
+		WRITE_ONCE(pci_crash_buffer, NULL);
+		WRITE_ONCE(pci_crash_buffer_size, 0);
+		kvfree(old_buf);
+		return;
+	}
+
+	/*
+	 * Disable capture during rebuild to prevent pci_crash_save()
+	 * from accessing stale device references or partially populated
+	 * arrays.
+	 */
+	WRITE_ONCE(pci_crash_num_devs, 0);
+
+	new_devs = kvmalloc_array(count, sizeof(struct pci_dev *),
+				  GFP_KERNEL | __GFP_ZERO);
+	if (!new_devs) {
+		pr_err("dev array alloc failed (%u)\n", count);
+		goto err_restore;
+	}
+
+	/*
+	 * Pass 2: populate device array (filtered) and compute exact
+	 * buffer size from actual pdev->cfg_size per device.
+	 * count from pass 1 is an upper bound; actual may be smaller.
+	 */
+	total_size = PCI_CRASH_HEADER_SIZE;
+	pdev = NULL;
+	i = 0;
+	for_each_pci_dev(pdev) {
+		if (i >= count) {
+			pci_dev_put(pdev);
+			break;
+		}
+		if (!pci_crash_device_matches(pdev))
+			continue;
+		new_devs[i] = pci_dev_get(pdev);
+		total_size += PCI_CRASH_RECORD_META + pdev->cfg_size;
+		i++;
+	}
+	count = i;
+
+	if (count == 0) {
+		kvfree(new_devs);
+		pr_info("no devices match devices=%s\n", devices);
+		goto err_restore;
+	}
+
+	if (total_size > PCI_CRASH_MAX_BUFFER_SIZE) {
+		for (i = 0; i < count; i++)
+			pci_dev_put(new_devs[i]);
+		kvfree(new_devs);
+		pr_warn("buffer too large (%zu > %d bytes)\n",
+			total_size, PCI_CRASH_MAX_BUFFER_SIZE);
+		goto err_restore;
+	}
+
+	new_buf = kvmalloc(total_size, GFP_KERNEL | __GFP_ZERO);
+	if (!new_buf) {
+		for (i = 0; i < count; i++)
+			pci_dev_put(new_devs[i]);
+		kvfree(new_devs);
+		pr_err("buffer alloc failed (%zu bytes)\n",
+		       total_size);
+		goto err_restore;
+	}
+
+	new_pm = pci_crash_build_pagemap(new_buf, total_size);
+	if (!new_pm) {
+		kvfree(new_buf);
+		for (i = 0; i < count; i++)
+			pci_dev_put(new_devs[i]);
+		kvfree(new_devs);
+		pr_err("pagemap alloc failed\n");
+		goto err_restore;
+	}
+
+	/* Release old device references */
+	for (i = 0; i < old_num_devs; i++)
+		if (old_devs && old_devs[i])
+			pci_dev_put(old_devs[i]);
+
+	WRITE_ONCE(pci_crash_devs, new_devs);
+	kvfree(old_devs);
+
+	/*
+	 * Swap pagemap first (with pre-computed phys addr), then buffer,
+	 * then size and count.  If crash fires mid-swap, num_devs is
+	 * still 0 from above, so pci_crash_save() bails out safely.
+	 */
+	WRITE_ONCE(pci_crash_pagemap, new_pm);
+	WRITE_ONCE(pci_crash_pagemap_phys, virt_to_phys(new_pm));
+	WRITE_ONCE(pci_crash_pagemap_size, struct_size(new_pm, addrs,
+			le32_to_cpu(new_pm->num_pages)));
+	kfree(old_pm);
+
+	old_buf = pci_crash_buffer;
+	WRITE_ONCE(pci_crash_buffer, new_buf);
+	WRITE_ONCE(pci_crash_buffer_size, total_size);
+	/* Ensure buffer/pagemap/size are visible before num_devs enables capture */
+	smp_wmb();
+	WRITE_ONCE(pci_crash_num_devs, count);
+	kvfree(old_buf);
+
+	pr_info("rebuild: %u devices (%zu bytes, %u pages)\n",
+		count, total_size, le32_to_cpu(new_pm->num_pages));
+	return;
+
+err_restore:
+	WRITE_ONCE(pci_crash_num_devs, old_num_devs);
+}
+
+/**
+ * pci_crash_save() - Capture PCI config space at crash time
+ *
+ * Called from crash_save_vmcoreinfo() inside __crash_kexec(), which
+ * runs before machine_kexec() boots the crash kernel.  This is the
+ * only reliable capture point -- panic notifiers run AFTER kexec by
+ * default (crash_kexec_post_notifiers=0).
+ *
+ * Capture check (capture param):
+ *   always  -- capture unconditionally
+ *   aer     -- quick-scan root port AER ROOT_STATUS for uncorrectable
+ *             errors; skip if none found
+ *
+ * When capture=always, captures on every panic.
+ * This is useful for cascading failures: a PCI link-down can cause
+ * an MCE or NMI watchdog timeout before DPC/AER fires, so the crash
+ * reason is UNKNOWN but AER registers may still hold error state.
+ *
+ * Reads config space fresh -- successful reads get current register
+ * state, failed reads (offline devices) write 0xFFFFFFFF.
+ *
+ * Flushes both buffer and pagemap from CPU cache to RAM so data
+ * survives kexec into crash kernel.
+ */
+void pci_crash_save(void)
+{
+	struct pci_crash_pagemap *pm;
+	unsigned long cflags;
+	unsigned int num_devs;
+	size_t pm_size;
+	size_t buf_size;
+	void *buffer;
+
+	num_devs = READ_ONCE(pci_crash_num_devs);
+	if (num_devs == 0)
+		return;
+
+	/* Pairs with smp_wmb() in rebuild -- ensures buffer/pagemap visible */
+	smp_rmb();
+
+	buffer = READ_ONCE(pci_crash_buffer);
+	buf_size = READ_ONCE(pci_crash_buffer_size);
+	pm = READ_ONCE(pci_crash_pagemap);
+	pm_size = READ_ONCE(pci_crash_pagemap_size);
+	if (!buffer || buf_size == 0)
+		return;
+
+	cflags = READ_ONCE(capture_flags);
+	if (!(cflags & PCI_CRASH_CAPTURE_ALWAYS)) {
+		if (cflags & PCI_CRASH_CAPTURE_AER) {
+			struct pci_dev **devs = READ_ONCE(pci_crash_devs);
+			unsigned int i;
+			bool pci_error_found = false;
+
+			if (!devs)
+				return;
+
+			for (i = 0; i < num_devs; i++) {
+				struct pci_dev *pdev = devs[i];
+				u32 status;
+
+				if (!pdev || !pdev->aer_cap)
+					continue;
+
+				if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) {
+					pci_read_config_dword(pdev,
+							     pdev->aer_cap + PCI_ERR_ROOT_STATUS,
+							     &status);
+					if (status & PCI_ERR_ROOT_UNCOR_RCV) {
+						pci_error_found = true;
+						break;
+					}
+				}
+			}
+
+			if (!pci_error_found) {
+				pr_emerg("no PCI errors detected, skipping capture\n");
+				return;
+			}
+		} else {
+			return;
+		}
+	}
+
+	pci_crash_fill_buffer(buffer, num_devs);
+
+	/*
+	 * Flush buffer and pagemap from CPU cache to RAM so the
+	 * crash kernel sees our writes after kexec.
+	 */
+	pci_crash_flush_dcache(buffer, buf_size);
+
+	if (pm && pm_size > 0)
+		pci_crash_flush_dcache(pm, pm_size);
+
+	pr_emerg("CAPTURE: %u devices, %zu bytes\n",
+		 num_devs, buf_size);
+}
+EXPORT_SYMBOL_GPL(pci_crash_save);
+
+static void pci_crash_rebuild_worker(struct work_struct *work)
+{
+	mutex_lock(&pci_crash_lock);
+	pci_crash_rebuild_snapshot();
+	mutex_unlock(&pci_crash_lock);
+}
+
+static int pci_crash_bus_notifier(struct notifier_block *nb,
+				  unsigned long action, void *data)
+{
+	if (action == BUS_NOTIFY_ADD_DEVICE ||
+	    action == BUS_NOTIFY_DEL_DEVICE)
+		mod_delayed_work(system_wq, &pci_crash_rebuild_dwork,
+				 msecs_to_jiffies(PCI_CRASH_REBUILD_DELAY_MS));
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block pci_crash_bus_nb = {
+	.notifier_call = pci_crash_bus_notifier,
+};
+
+static int __init pci_crash_init(void)
+{
+	/* Nothing to do in crash kernel -- the buffer from the first kernel
+	 * is already in RAM (flushed before kexec) and the parser finds it
+	 * via the pagemap in VMCOREINFO.
+	 */
+	if (is_kdump_kernel())
+		return 0;
+
+	INIT_DELAYED_WORK(&pci_crash_rebuild_dwork, pci_crash_rebuild_worker);
+
+	pci_crash_parse_capture();
+	pci_crash_parse_devices();
+
+	mutex_lock(&pci_crash_lock);
+	pci_crash_rebuild_snapshot();
+	mutex_unlock(&pci_crash_lock);
+
+	bus_register_notifier(&pci_bus_type, &pci_crash_bus_nb);
+
+	WRITE_ONCE(pci_crash_ready, true);
+
+	pr_info("ready: %u devices (%zu bytes), capture=%s devices=%s\n",
+		pci_crash_num_devs, pci_crash_buffer_size,
+		capture, devices);
+
+	return 0;
+}
+late_initcall(pci_crash_init);
+
+/* Built-in only: crash infrastructure must outlive all drivers. */
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Capture PCI config space at panic time for crash analysis");
+MODULE_AUTHOR("Amazon.com, Inc.");
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c
index 8614430ca212..8813f7e2e516 100644
--- a/kernel/vmcore_info.c
+++ b/kernel/vmcore_info.c
@@ -14,6 +14,7 @@
 #include <linux/cpuhotplug.h>
 #include <linux/memblock.h>
 #include <linux/kmemleak.h>
+#include <linux/pci_crash.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -91,6 +92,18 @@ void crash_save_vmcoreinfo(void)
 		vmcoreinfo_data = vmcoreinfo_data_safecopy;
 
 	vmcoreinfo_append_str("CRASHTIME=%lld\n", ktime_get_real_seconds());
+
+	/* Capture PCI config space before kexec into crash kernel */
+	pci_crash_save();
+	if (pci_crash_pagemap_phys && pci_crash_buffer_size > 0) {
+		vmcoreinfo_append_str("PCI_CRASH_PAGEMAP=0x%llx\n",
+				      (unsigned long long)pci_crash_pagemap_phys);
+		vmcoreinfo_append_str("PCI_CRASH_VERSION=%d\n",
+				      PCI_CRASH_VERSION);
+		vmcoreinfo_append_str("PCI_CRASH_BUF_SZ=%zu\n",
+				      pci_crash_buffer_size);
+	}
+
 	update_vmcoreinfo_note();
 }
 
-- 
2.47.3




Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christof Hellmis, Andreas Stieger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] pci_crash: capture PCI config space at panic time
  2026-05-12 16:24 [PATCH] pci_crash: capture PCI config space at panic time hangej
@ 2026-05-13 21:30 ` kernel test robot
  2026-05-13 22:23 ` kernel test robot
  2026-05-13 23:07 ` kernel test robot
  2 siblings, 0 replies; 4+ messages in thread
From: kernel test robot @ 2026-05-13 21:30 UTC (permalink / raw)
  To: hangej, bhelgaas
  Cc: oe-kbuild-all, linux-pci, linux-kernel, dwmw2, kexec, hangej

Hi hangej,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v7.1-rc3 next-20260508]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/hangej/pci_crash-capture-PCI-config-space-at-panic-time/20260514-000653
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/20260512162402.1929687-1-hangej%40amazon.com
patch subject: [PATCH] pci_crash: capture PCI config space at panic time
config: i386-randconfig-r071-20260514 (https://download.01.org/0day-ci/archive/20260514/202605140501.1tSG7lxR-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
smatch: v0.5.0-9185-gbcc58b9c
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260514/202605140501.1tSG7lxR-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605140501.1tSG7lxR-lkp@intel.com/

All errors (new ones prefixed by >>):

   kernel/pci_crash.c: In function 'pci_crash_save':
>> kernel/pci_crash.c:661:53: error: 'struct pci_dev' has no member named 'aer_cap'; did you mean 'acs_cap'?
     661 |                                 if (!pdev || !pdev->aer_cap)
         |                                                     ^~~~~~~
         |                                                     acs_cap
   kernel/pci_crash.c:666:68: error: 'struct pci_dev' has no member named 'aer_cap'; did you mean 'acs_cap'?
     666 |                                                              pdev->aer_cap + PCI_ERR_ROOT_STATUS,
         |                                                                    ^~~~~~~
         |                                                                    acs_cap


vim +661 kernel/pci_crash.c

   599	
   600	/**
   601	 * pci_crash_save() - Capture PCI config space at crash time
   602	 *
   603	 * Called from crash_save_vmcoreinfo() inside __crash_kexec(), which
   604	 * runs before machine_kexec() boots the crash kernel.  This is the
   605	 * only reliable capture point -- panic notifiers run AFTER kexec by
   606	 * default (crash_kexec_post_notifiers=0).
   607	 *
   608	 * Capture check (capture param):
   609	 *   always  -- capture unconditionally
   610	 *   aer     -- quick-scan root port AER ROOT_STATUS for uncorrectable
   611	 *             errors; skip if none found
   612	 *
   613	 * When capture=always, captures on every panic.
   614	 * This is useful for cascading failures: a PCI link-down can cause
   615	 * an MCE or NMI watchdog timeout before DPC/AER fires, so the crash
   616	 * reason is UNKNOWN but AER registers may still hold error state.
   617	 *
   618	 * Reads config space fresh -- successful reads get current register
   619	 * state, failed reads (offline devices) write 0xFFFFFFFF.
   620	 *
   621	 * Flushes both buffer and pagemap from CPU cache to RAM so data
   622	 * survives kexec into crash kernel.
   623	 */
   624	void pci_crash_save(void)
   625	{
   626		struct pci_crash_pagemap *pm;
   627		unsigned long cflags;
   628		unsigned int num_devs;
   629		size_t pm_size;
   630		size_t buf_size;
   631		void *buffer;
   632	
   633		num_devs = READ_ONCE(pci_crash_num_devs);
   634		if (num_devs == 0)
   635			return;
   636	
   637		/* Pairs with smp_wmb() in rebuild -- ensures buffer/pagemap visible */
   638		smp_rmb();
   639	
   640		buffer = READ_ONCE(pci_crash_buffer);
   641		buf_size = READ_ONCE(pci_crash_buffer_size);
   642		pm = READ_ONCE(pci_crash_pagemap);
   643		pm_size = READ_ONCE(pci_crash_pagemap_size);
   644		if (!buffer || buf_size == 0)
   645			return;
   646	
   647		cflags = READ_ONCE(capture_flags);
   648		if (!(cflags & PCI_CRASH_CAPTURE_ALWAYS)) {
   649			if (cflags & PCI_CRASH_CAPTURE_AER) {
   650				struct pci_dev **devs = READ_ONCE(pci_crash_devs);
   651				unsigned int i;
   652				bool pci_error_found = false;
   653	
   654				if (!devs)
   655					return;
   656	
   657				for (i = 0; i < num_devs; i++) {
   658					struct pci_dev *pdev = devs[i];
   659					u32 status;
   660	
 > 661					if (!pdev || !pdev->aer_cap)
   662						continue;
   663	
   664					if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) {
   665						pci_read_config_dword(pdev,
   666								     pdev->aer_cap + PCI_ERR_ROOT_STATUS,
   667								     &status);
   668						if (status & PCI_ERR_ROOT_UNCOR_RCV) {
   669							pci_error_found = true;
   670							break;
   671						}
   672					}
   673				}
   674	
   675				if (!pci_error_found) {
   676					pr_emerg("no PCI errors detected, skipping capture\n");
   677					return;
   678				}
   679			} else {
   680				return;
   681			}
   682		}
   683	
   684		pci_crash_fill_buffer(buffer, num_devs);
   685	
   686		/*
   687		 * Flush buffer and pagemap from CPU cache to RAM so the
   688		 * crash kernel sees our writes after kexec.
   689		 */
   690		pci_crash_flush_dcache(buffer, buf_size);
   691	
   692		if (pm && pm_size > 0)
   693			pci_crash_flush_dcache(pm, pm_size);
   694	
   695		pr_emerg("CAPTURE: %u devices, %zu bytes\n",
   696			 num_devs, buf_size);
   697	}
   698	EXPORT_SYMBOL_GPL(pci_crash_save);
   699	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] pci_crash: capture PCI config space at panic time
  2026-05-12 16:24 [PATCH] pci_crash: capture PCI config space at panic time hangej
  2026-05-13 21:30 ` kernel test robot
@ 2026-05-13 22:23 ` kernel test robot
  2026-05-13 23:07 ` kernel test robot
  2 siblings, 0 replies; 4+ messages in thread
From: kernel test robot @ 2026-05-13 22:23 UTC (permalink / raw)
  To: hangej, bhelgaas
  Cc: oe-kbuild-all, linux-pci, linux-kernel, dwmw2, kexec, hangej

Hi hangej,

kernel test robot noticed the following build errors:

[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v7.1-rc3 next-20260508]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/hangej/pci_crash-capture-PCI-config-space-at-panic-time/20260514-000653
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/20260512162402.1929687-1-hangej%40amazon.com
patch subject: [PATCH] pci_crash: capture PCI config space at panic time
config: x86_64-randconfig-161-20260514 (https://download.01.org/0day-ci/archive/20260514/202605140624.Tw71sECt-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
smatch: v0.5.0-9185-gbcc58b9c
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260514/202605140624.Tw71sECt-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605140624.Tw71sECt-lkp@intel.com/

All errors (new ones prefixed by >>):

>> kernel/pci_crash.c:661:25: error: no member named 'aer_cap' in 'struct pci_dev'
     661 |                                 if (!pdev || !pdev->aer_cap)
         |                                               ~~~~  ^
   kernel/pci_crash.c:666:19: error: no member named 'aer_cap' in 'struct pci_dev'
     666 |                                                              pdev->aer_cap + PCI_ERR_ROOT_STATUS,
         |                                                              ~~~~  ^
   2 errors generated.


vim +661 kernel/pci_crash.c

   599	
   600	/**
   601	 * pci_crash_save() - Capture PCI config space at crash time
   602	 *
   603	 * Called from crash_save_vmcoreinfo() inside __crash_kexec(), which
   604	 * runs before machine_kexec() boots the crash kernel.  This is the
   605	 * only reliable capture point -- panic notifiers run AFTER kexec by
   606	 * default (crash_kexec_post_notifiers=0).
   607	 *
   608	 * Capture check (capture param):
   609	 *   always  -- capture unconditionally
   610	 *   aer     -- quick-scan root port AER ROOT_STATUS for uncorrectable
   611	 *             errors; skip if none found
   612	 *
   613	 * When capture=always, captures on every panic.
   614	 * This is useful for cascading failures: a PCI link-down can cause
   615	 * an MCE or NMI watchdog timeout before DPC/AER fires, so the crash
   616	 * reason is UNKNOWN but AER registers may still hold error state.
   617	 *
   618	 * Reads config space fresh -- successful reads get current register
   619	 * state, failed reads (offline devices) write 0xFFFFFFFF.
   620	 *
   621	 * Flushes both buffer and pagemap from CPU cache to RAM so data
   622	 * survives kexec into crash kernel.
   623	 */
   624	void pci_crash_save(void)
   625	{
   626		struct pci_crash_pagemap *pm;
   627		unsigned long cflags;
   628		unsigned int num_devs;
   629		size_t pm_size;
   630		size_t buf_size;
   631		void *buffer;
   632	
   633		num_devs = READ_ONCE(pci_crash_num_devs);
   634		if (num_devs == 0)
   635			return;
   636	
   637		/* Pairs with smp_wmb() in rebuild -- ensures buffer/pagemap visible */
   638		smp_rmb();
   639	
   640		buffer = READ_ONCE(pci_crash_buffer);
   641		buf_size = READ_ONCE(pci_crash_buffer_size);
   642		pm = READ_ONCE(pci_crash_pagemap);
   643		pm_size = READ_ONCE(pci_crash_pagemap_size);
   644		if (!buffer || buf_size == 0)
   645			return;
   646	
   647		cflags = READ_ONCE(capture_flags);
   648		if (!(cflags & PCI_CRASH_CAPTURE_ALWAYS)) {
   649			if (cflags & PCI_CRASH_CAPTURE_AER) {
   650				struct pci_dev **devs = READ_ONCE(pci_crash_devs);
   651				unsigned int i;
   652				bool pci_error_found = false;
   653	
   654				if (!devs)
   655					return;
   656	
   657				for (i = 0; i < num_devs; i++) {
   658					struct pci_dev *pdev = devs[i];
   659					u32 status;
   660	
 > 661					if (!pdev || !pdev->aer_cap)
   662						continue;
   663	
   664					if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) {
   665						pci_read_config_dword(pdev,
   666								     pdev->aer_cap + PCI_ERR_ROOT_STATUS,
   667								     &status);
   668						if (status & PCI_ERR_ROOT_UNCOR_RCV) {
   669							pci_error_found = true;
   670							break;
   671						}
   672					}
   673				}
   674	
   675				if (!pci_error_found) {
   676					pr_emerg("no PCI errors detected, skipping capture\n");
   677					return;
   678				}
   679			} else {
   680				return;
   681			}
   682		}
   683	
   684		pci_crash_fill_buffer(buffer, num_devs);
   685	
   686		/*
   687		 * Flush buffer and pagemap from CPU cache to RAM so the
   688		 * crash kernel sees our writes after kexec.
   689		 */
   690		pci_crash_flush_dcache(buffer, buf_size);
   691	
   692		if (pm && pm_size > 0)
   693			pci_crash_flush_dcache(pm, pm_size);
   694	
   695		pr_emerg("CAPTURE: %u devices, %zu bytes\n",
   696			 num_devs, buf_size);
   697	}
   698	EXPORT_SYMBOL_GPL(pci_crash_save);
   699	

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] pci_crash: capture PCI config space at panic time
  2026-05-12 16:24 [PATCH] pci_crash: capture PCI config space at panic time hangej
  2026-05-13 21:30 ` kernel test robot
  2026-05-13 22:23 ` kernel test robot
@ 2026-05-13 23:07 ` kernel test robot
  2 siblings, 0 replies; 4+ messages in thread
From: kernel test robot @ 2026-05-13 23:07 UTC (permalink / raw)
  To: hangej, bhelgaas
  Cc: llvm, oe-kbuild-all, linux-pci, linux-kernel, dwmw2, kexec,
	hangej

Hi hangej,

kernel test robot noticed the following build warnings:

[auto build test WARNING on pci/next]
[also build test WARNING on pci/for-linus linus/master v7.1-rc3 next-20260508]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/hangej/pci_crash-capture-PCI-config-space-at-panic-time/20260514-000653
base:   https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link:    https://lore.kernel.org/r/20260512162402.1929687-1-hangej%40amazon.com
patch subject: [PATCH] pci_crash: capture PCI config space at panic time
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20260514/202605140714.yFl4KATZ-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20260514/202605140714.yFl4KATZ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202605140714.yFl4KATZ-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> Warning: kernel/pci_crash.c:74 cannot understand function prototype: 'void *pci_crash_buffer;'
>> Warning: kernel/pci_crash.c:80 cannot understand function prototype: 'size_t pci_crash_buffer_size;'
>> Warning: kernel/pci_crash.c:90 cannot understand function prototype: 'phys_addr_t pci_crash_pagemap_phys;'

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-05-13 23:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-12 16:24 [PATCH] pci_crash: capture PCI config space at panic time hangej
2026-05-13 21:30 ` kernel test robot
2026-05-13 22:23 ` kernel test robot
2026-05-13 23:07 ` kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox