* [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes
@ 2025-10-25 11:41 Shuai Xue
2025-10-25 11:41 ` [PATCH v13 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Shuai Xue @ 2025-10-25 11:41 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
changes since v12:
- add Reviewed-by tag for PATCH 1 from Steve
- add Reviewed-by tag for PATCH 1-3 from Ilpo
- add a comment explaining why the tracepoint takes strings, per Steve
- minor doc improvements from Ilpo
- remove the use of pci_speed_string() to fix a PCI dependency that caused a build error on sparc64
changes since v11:
- rebase to Linux 6.18-rc1 (no functional changes)
changes since v10:
- explicitly include header file per Ilpo
- add comma on any non-terminator entry per Ilpo
- compile trace.o under CONFIG_TRACING per Ilpo
changes since v9:
- add documentation about PCI tracepoints per Bjorn
- create a dedicated drivers/pci/trace.c that always defines the PCI tracepoints per Steve
- move the tracepoint call site into __pcie_update_link_speed() per Lukas and Bjorn
changes since v8:
- rewrite commit log from Bjorn
- move pci_hp_event to a common place (include/trace/events/pci.h) per Ilpo
- rename hotplug event strings per Bjorn and Lukas
- add PCIe link tracepoint per Bjorn, Lukas, and Ilpo
changes since v7:
- replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
- pick up Reviewed-by from Lukas Wunner
Hotplug events are critical indicators for analyzing hardware health, and
surprise link downs can significantly impact system performance and reliability.
In addition, PCIe link speed degradation directly impacts system performance and
often indicates hardware issues such as faulty devices, physical layer problems,
or configuration errors.
This patch set adds PCI hotplug and PCIe link tracepoints to help analyze PCI
hotplug events and PCIe link speed degradation.
Shuai Xue (3):
PCI: trace: Add a generic RAS tracepoint for hotplug event
PCI: trace: Add a RAS tracepoint to monitor link speed changes
Documentation: tracing: Add documentation about PCI tracepoints
Documentation/trace/events-pci.rst | 74 +++++++++++++++++
drivers/pci/Makefile | 3 +
drivers/pci/hotplug/pciehp_ctrl.c | 31 +++++--
drivers/pci/hotplug/pciehp_hpc.c | 3 +-
drivers/pci/pci.c | 2 +-
drivers/pci/pci.h | 21 ++++-
drivers/pci/pcie/bwctrl.c | 4 +-
drivers/pci/probe.c | 9 +-
drivers/pci/trace.c | 11 +++
include/trace/events/pci.h | 129 +++++++++++++++++++++++++++++
include/uapi/linux/pci.h | 7 ++
11 files changed, 279 insertions(+), 15 deletions(-)
create mode 100644 Documentation/trace/events-pci.rst
create mode 100644 drivers/pci/trace.c
create mode 100644 include/trace/events/pci.h
--
2.39.3
* [PATCH v13 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-25 11:41 [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
@ 2025-10-25 11:41 ` Shuai Xue
2025-10-25 11:41 ` [PATCH v13 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Shuai Xue @ 2025-10-25 11:41 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
Hotplug events are critical indicators for analyzing hardware health,
and surprise link downs can significantly impact system performance and
reliability.
Define a new TRACE_SYSTEM named "pci" and add a generic RAS tracepoint
for hotplug events to aid health checks. Add enum pci_hotplug_event to
include/uapi/linux/pci.h so that applications such as rasdaemon can
register tracepoint event handlers for it.
The following output is generated when a device is hotplugged:
$ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
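For readers who want to consume this output from user space, here is a minimal sketch of parsing one pci_hp_event text line. It is not part of this patch: parse_hp_line(), hp_event_from_name(), struct hp_record and enum hp_event are hypothetical names, the enum is only a local mirror of the enum pci_hotplug_event added by this patch, and real tools such as rasdaemon may well read the binary trace buffer instead of scraping trace_pipe text.

```c
#include <stdio.h>
#include <string.h>

/* Local mirror of enum pci_hotplug_event from this patch (illustrative). */
enum hp_event {
	HP_UNKNOWN = -1,
	HP_LINK_UP,
	HP_LINK_DOWN,
	HP_CARD_PRESENT,
	HP_CARD_NOT_PRESENT,
};

/* Hypothetical container for one parsed record. */
struct hp_record {
	char port[32];		/* e.g. "0000:00:02.0" */
	char slot[32];		/* e.g. "10" */
	enum hp_event event;
};

/* Map the printed event string back to the enum value. */
enum hp_event hp_event_from_name(const char *name)
{
	static const char *const names[] = {
		"LINK_UP", "LINK_DOWN", "CARD_PRESENT", "CARD_NOT_PRESENT",
	};
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strcmp(name, names[i]) == 0)
			return (enum hp_event)i;
	return HP_UNKNOWN;
}

/* Returns 0 on success, -1 if the line is not a pci_hp_event record. */
int parse_hp_line(const char *line, struct hp_record *rec)
{
	static const char tag[] = "pci_hp_event: ";
	char event[32];
	const char *p = strstr(line, tag);

	if (!p)
		return -1;
	p += sizeof(tag) - 1;

	/* Matches the TP_printk() format "%s slot:%s, event:%s". */
	if (sscanf(p, "%31s slot:%31[^,], event:%31s",
		   rec->port, rec->slot, event) != 3)
		return -1;

	rec->event = hp_event_from_name(event);
	return 0;
}
```

A caller would feed each line read from trace_pipe to parse_hp_line() and act on rec.event. The field widths are arbitrary choices for the sketch.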
Suggested-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: "Steven Rostedt (Google)" <rostedt@goodmis.org> # for trace event
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
drivers/pci/Makefile | 3 ++
drivers/pci/hotplug/pciehp_ctrl.c | 31 ++++++++++---
drivers/pci/trace.c | 11 +++++
include/trace/events/pci.h | 72 +++++++++++++++++++++++++++++++
include/uapi/linux/pci.h | 7 +++
5 files changed, 118 insertions(+), 6 deletions(-)
create mode 100644 drivers/pci/trace.c
create mode 100644 include/trace/events/pci.h
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 67647f1880fb..58a4e4ea76b0 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -45,3 +45,6 @@ obj-y += controller/
obj-y += switch/
subdir-ccflags-$(CONFIG_PCI_DEBUG) := -DDEBUG
+
+CFLAGS_trace.o := -I$(src)
+obj-$(CONFIG_TRACING) += trace.o
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index bcc938d4420f..7805f697a02c 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -19,6 +19,7 @@
#include <linux/types.h>
#include <linux/pm_runtime.h>
#include <linux/pci.h>
+#include <trace/events/pci.h>
#include "../pci.h"
#include "pciehp.h"
@@ -244,12 +245,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
case ON_STATE:
ctrl->state = POWEROFF_STATE;
mutex_unlock(&ctrl->state_lock);
- if (events & PCI_EXP_SLTSTA_DLLSC)
+ if (events & PCI_EXP_SLTSTA_DLLSC) {
ctrl_info(ctrl, "Slot(%s): Link Down\n",
slot_name(ctrl));
- if (events & PCI_EXP_SLTSTA_PDC)
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_LINK_DOWN);
+ }
+ if (events & PCI_EXP_SLTSTA_PDC) {
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_NOT_PRESENT);
+ }
pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
break;
default:
@@ -269,6 +278,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
INDICATOR_NOOP);
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_NOT_PRESENT);
}
mutex_unlock(&ctrl->state_lock);
return;
@@ -281,12 +293,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
case OFF_STATE:
ctrl->state = POWERON_STATE;
mutex_unlock(&ctrl->state_lock);
- if (present)
+ if (present) {
ctrl_info(ctrl, "Slot(%s): Card present\n",
slot_name(ctrl));
- if (link_active)
- ctrl_info(ctrl, "Slot(%s): Link Up\n",
- slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_PRESENT);
+ }
+ if (link_active) {
+ ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_LINK_UP);
+ }
ctrl->request_result = pciehp_enable_slot(ctrl);
break;
default:
diff --git a/drivers/pci/trace.c b/drivers/pci/trace.c
new file mode 100644
index 000000000000..cf11abca8602
--- /dev/null
+++ b/drivers/pci/trace.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Tracepoints for PCI system
+ *
+ * Copyright (C) 2025 Alibaba Corporation
+ */
+
+#include <linux/pci.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/pci.h>
diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
new file mode 100644
index 000000000000..39e512a167ee
--- /dev/null
+++ b/include/trace/events/pci.h
@@ -0,0 +1,72 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pci
+
+#if !defined(_TRACE_HW_EVENT_PCI_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_PCI_H
+
+#include <linux/tracepoint.h>
+
+#define PCI_HOTPLUG_EVENT \
+ EM(PCI_HOTPLUG_LINK_UP, "LINK_UP") \
+ EM(PCI_HOTPLUG_LINK_DOWN, "LINK_DOWN") \
+ EM(PCI_HOTPLUG_CARD_PRESENT, "CARD_PRESENT") \
+ EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "CARD_NOT_PRESENT")
+
+/* Enums require being exported to userspace, for user tool parsing */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b) TRACE_DEFINE_ENUM(a);
+
+PCI_HOTPLUG_EVENT
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) {a, b},
+#define EMe(a, b) {a, b}
+
+/*
+ * Note: For generic PCI hotplug events, we pass already-resolved strings
+ * (port_name, slot) instead of driver-specific structures like 'struct
+ * controller'. This is because different PCI hotplug drivers (pciehp, cpqphp,
+ * ibmphp, shpchp) define their own versions of 'struct controller' with
+ * different fields and helper functions. Using driver-specific structures would
+ * make the tracepoint interface non-generic and cause compatibility issues
+ * across different drivers.
+ */
+TRACE_EVENT(pci_hp_event,
+
+ TP_PROTO(const char *port_name,
+ const char *slot,
+ const int event),
+
+ TP_ARGS(port_name, slot, event),
+
+ TP_STRUCT__entry(
+ __string( port_name, port_name )
+ __string( slot, slot )
+ __field( int, event )
+ ),
+
+ TP_fast_assign(
+ __assign_str(port_name);
+ __assign_str(slot);
+ __entry->event = event;
+ ),
+
+ TP_printk("%s slot:%s, event:%s\n",
+ __get_str(port_name),
+ __get_str(slot),
+ __print_symbolic(__entry->event, PCI_HOTPLUG_EVENT)
+ )
+);
+
+#endif /* _TRACE_HW_EVENT_PCI_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
index a769eefc5139..4f150028965d 100644
--- a/include/uapi/linux/pci.h
+++ b/include/uapi/linux/pci.h
@@ -39,4 +39,11 @@
#define PCIIOC_MMAP_IS_MEM (PCIIOC_BASE | 0x02) /* Set mmap state to MEM space. */
#define PCIIOC_WRITE_COMBINE (PCIIOC_BASE | 0x03) /* Enable/disable write-combining. */
+enum pci_hotplug_event {
+ PCI_HOTPLUG_LINK_UP,
+ PCI_HOTPLUG_LINK_DOWN,
+ PCI_HOTPLUG_CARD_PRESENT,
+ PCI_HOTPLUG_CARD_NOT_PRESENT,
+};
+
#endif /* _UAPILINUX_PCI_H */
--
2.39.3
* [PATCH v13 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-10-25 11:41 [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-10-25 11:41 ` [PATCH v13 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
@ 2025-10-25 11:41 ` Shuai Xue
2025-10-25 11:41 ` [PATCH v13 3/3] Documentation: tracing: Add documentation about PCI tracepoints Shuai Xue
2025-11-04 9:34 ` [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
3 siblings, 0 replies; 8+ messages in thread
From: Shuai Xue @ 2025-10-25 11:41 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
PCIe link speed degradation directly impacts system performance and
often indicates hardware issues such as faulty devices, physical layer
problems, or configuration errors.
To this end, add a RAS tracepoint to monitor link speed changes,
enabling proactive health checks and diagnostic analysis.
The following output is generated when a device is hotplugged:
$ echo 1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
irq/51-pciehp-88 [001] ..... 381.545386: pcie_link_event: 0000:00:02.0 type:4, reason:4, cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, status:DLLLA
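To show how the numeric fields above can be interpreted, here is a hedged user-space sketch that parses one pcie_link_event line and flags a link running below its maximum speed. It is not part of this patch: parse_link_line(), pcie_speed_name(), link_is_degraded() and struct link_record are hypothetical names. The speed table assumes the raw values follow enum pci_bus_speed in include/linux/pci.h (0x14 for 2.5 GT/s through 0x19 for 64 GT/s); please check that against the kernel in use.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed pcie_link_event record. */
struct link_record {
	char port[32];
	unsigned int type, reason, cur_speed, max_speed, width, flit_mode;
	char status[32];	/* e.g. "DLLLA"; empty if no flag was printed */
};

/* Returns 0 on success, -1 if the line is not a pcie_link_event record. */
int parse_link_line(const char *line, struct link_record *rec)
{
	static const char tag[] = "pcie_link_event: ";
	const char *p = strstr(line, tag);
	int n;

	if (!p)
		return -1;
	p += sizeof(tag) - 1;
	rec->status[0] = '\0';

	/* Follows the field order of the TP_printk() format in this patch. */
	n = sscanf(p, "%31s type:%u, reason:%u, cur_bus_speed:%u, "
		   "max_bus_speed:%u, width:%u, flit_mode:%u, status:%31s",
		   rec->port, &rec->type, &rec->reason, &rec->cur_speed,
		   &rec->max_speed, &rec->width, &rec->flit_mode, rec->status);

	/* The status string may be absent when no status bit is set. */
	return n >= 7 ? 0 : -1;
}

/* Assumption: PCIe speeds occupy 0x14..0x19 in enum pci_bus_speed. */
const char *pcie_speed_name(unsigned int speed)
{
	static const char *const names[] = {
		"2.5 GT/s", "5.0 GT/s", "8.0 GT/s",
		"16.0 GT/s", "32.0 GT/s", "64.0 GT/s",
	};

	if (speed < 0x14 || speed > 0x19)
		return "unknown";
	return names[speed - 0x14];
}

/* Nonzero when both speeds are known and the current one is lower. */
int link_is_degraded(const struct link_record *rec)
{
	int cur_ok = rec->cur_speed >= 0x14 && rec->cur_speed <= 0x19;
	int max_ok = rec->max_speed >= 0x14 && rec->max_speed <= 0x19;

	return cur_ok && max_ok && rec->cur_speed < rec->max_speed;
}
```

Under that assumption, the sample line above (cur_bus_speed:20, max_bus_speed:23) reads as a Root Port whose link came up at 2.5 GT/s although the port supports 16 GT/s, with reason 4 (PCIE_HOTPLUG).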
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Suggested-by: Matthew W Carlis <mattc@purestorage.com>
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
drivers/pci/hotplug/pciehp_hpc.c | 3 +-
drivers/pci/pci.c | 2 +-
drivers/pci/pci.h | 21 ++++++++++--
drivers/pci/pcie/bwctrl.c | 4 +--
drivers/pci/probe.c | 9 +++--
include/trace/events/pci.h | 57 ++++++++++++++++++++++++++++++++
6 files changed, 87 insertions(+), 9 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bcc51b26d03d..ad5f28f6a8b1 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -320,7 +320,8 @@ int pciehp_check_link_status(struct controller *ctrl)
}
pcie_capability_read_word(pdev, PCI_EXP_LNKSTA2, &linksta2);
- __pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status, linksta2);
+ __pcie_update_link_speed(ctrl->pcie->port->subordinate, PCIE_HOTPLUG,
+ lnk_status, linksta2);
if (!found) {
ctrl_info(ctrl, "Slot(%s): No device found\n",
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..6a979a234fe6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4698,7 +4698,7 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
* Link Speed.
*/
if (pdev->subordinate)
- pcie_update_link_speed(pdev->subordinate);
+ pcie_update_link_speed(pdev->subordinate, PCIE_LINK_RETRAIN);
return rc;
}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4492b809094b..bc4de8fb4cba 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -5,6 +5,7 @@
#include <linux/align.h>
#include <linux/bitfield.h>
#include <linux/pci.h>
+#include <trace/events/pci.h>
struct pcie_tlp_log;
@@ -553,12 +554,28 @@ const char *pci_speed_string(enum pci_bus_speed speed);
void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
void pcie_report_downtraining(struct pci_dev *dev);
-static inline void __pcie_update_link_speed(struct pci_bus *bus, u16 linksta, u16 linksta2)
+enum pcie_link_change_reason {
+ PCIE_LINK_RETRAIN,
+ PCIE_ADD_BUS,
+ PCIE_BWCTRL_ENABLE,
+ PCIE_BWCTRL_IRQ,
+ PCIE_HOTPLUG,
+};
+
+static inline void __pcie_update_link_speed(struct pci_bus *bus,
+ enum pcie_link_change_reason reason,
+ u16 linksta, u16 linksta2)
{
bus->cur_bus_speed = pcie_link_speed[linksta & PCI_EXP_LNKSTA_CLS];
bus->flit_mode = (linksta2 & PCI_EXP_LNKSTA2_FLIT) ? 1 : 0;
+
+ trace_pcie_link_event(bus,
+ reason,
+ FIELD_GET(PCI_EXP_LNKSTA_NLW, linksta),
+ linksta & PCI_EXP_LNKSTA_LINK_STATUS_MASK);
}
-void pcie_update_link_speed(struct pci_bus *bus);
+
+void pcie_update_link_speed(struct pci_bus *bus, enum pcie_link_change_reason reason);
/* Single Root I/O Virtualization */
struct pci_sriov {
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index 36f939f23d34..32f1b30ecb84 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -199,7 +199,7 @@ static void pcie_bwnotif_enable(struct pcie_device *srv)
* Update after enabling notifications & clearing status bits ensures
* link speed is up to date.
*/
- pcie_update_link_speed(port->subordinate);
+ pcie_update_link_speed(port->subordinate, PCIE_BWCTRL_ENABLE);
}
static void pcie_bwnotif_disable(struct pci_dev *port)
@@ -234,7 +234,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
* speed (inside pcie_update_link_speed()) after LBMS has been
* cleared to avoid missing link speed changes.
*/
- pcie_update_link_speed(port->subordinate);
+ pcie_update_link_speed(port->subordinate, PCIE_BWCTRL_IRQ);
return IRQ_HANDLED;
}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c83e75a0ec12..d52f997ea476 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -22,6 +22,7 @@
#include <linux/irqdomain.h>
#include <linux/pm_runtime.h>
#include <linux/bitfield.h>
+#include <trace/events/pci.h>
#include "pci.h"
#define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */
@@ -813,14 +814,16 @@ const char *pci_speed_string(enum pci_bus_speed speed)
}
EXPORT_SYMBOL_GPL(pci_speed_string);
-void pcie_update_link_speed(struct pci_bus *bus)
+void pcie_update_link_speed(struct pci_bus *bus,
+ enum pcie_link_change_reason reason)
{
struct pci_dev *bridge = bus->self;
u16 linksta, linksta2;
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA2, &linksta2);
- __pcie_update_link_speed(bus, linksta, linksta2);
+
+ __pcie_update_link_speed(bus, reason, linksta, linksta2);
}
EXPORT_SYMBOL_GPL(pcie_update_link_speed);
@@ -907,7 +910,7 @@ static void pci_set_bus_speed(struct pci_bus *bus)
pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
- pcie_update_link_speed(bus);
+ pcie_update_link_speed(bus, PCIE_ADD_BUS);
}
}
diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
index 39e512a167ee..9a9122f62fd3 100644
--- a/include/trace/events/pci.h
+++ b/include/trace/events/pci.h
@@ -5,6 +5,7 @@
#if !defined(_TRACE_HW_EVENT_PCI_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_HW_EVENT_PCI_H
+#include <uapi/linux/pci_regs.h>
#include <linux/tracepoint.h>
#define PCI_HOTPLUG_EVENT \
@@ -66,6 +67,62 @@ TRACE_EVENT(pci_hp_event,
)
);
+#define PCI_EXP_LNKSTA_LINK_STATUS_MASK (PCI_EXP_LNKSTA_LBMS | \
+ PCI_EXP_LNKSTA_LABS | \
+ PCI_EXP_LNKSTA_LT | \
+ PCI_EXP_LNKSTA_DLLLA)
+
+#define LNKSTA_FLAGS \
+ { PCI_EXP_LNKSTA_LT, "LT"}, \
+ { PCI_EXP_LNKSTA_DLLLA, "DLLLA"}, \
+ { PCI_EXP_LNKSTA_LBMS, "LBMS"}, \
+ { PCI_EXP_LNKSTA_LABS, "LABS"}
+
+TRACE_EVENT(pcie_link_event,
+
+ TP_PROTO(struct pci_bus *bus,
+ unsigned int reason,
+ unsigned int width,
+ unsigned int status
+ ),
+
+ TP_ARGS(bus, reason, width, status),
+
+ TP_STRUCT__entry(
+ __string( port_name, pci_name(bus->self))
+ __field( unsigned int, type )
+ __field( unsigned int, reason )
+ __field( unsigned int, cur_bus_speed )
+ __field( unsigned int, max_bus_speed )
+ __field( unsigned int, width )
+ __field( unsigned int, flit_mode )
+ __field( unsigned int, link_status )
+ ),
+
+ TP_fast_assign(
+ __assign_str(port_name);
+ __entry->type = pci_pcie_type(bus->self);
+ __entry->reason = reason;
+ __entry->cur_bus_speed = bus->cur_bus_speed;
+ __entry->max_bus_speed = bus->max_bus_speed;
+ __entry->width = width;
+ __entry->flit_mode = bus->flit_mode;
+ __entry->link_status = status;
+ ),
+
+ TP_printk("%s type:%d, reason:%d, cur_bus_speed:%d, max_bus_speed:%d, width:%u, flit_mode:%u, status:%s\n",
+ __get_str(port_name),
+ __entry->type,
+ __entry->reason,
+ __entry->cur_bus_speed,
+ __entry->max_bus_speed,
+ __entry->width,
+ __entry->flit_mode,
+ __print_flags((unsigned long)__entry->link_status, "|",
+ LNKSTA_FLAGS)
+ )
+);
+
#endif /* _TRACE_HW_EVENT_PCI_H */
/* This part must be outside protection */
--
2.39.3
* [PATCH v13 3/3] Documentation: tracing: Add documentation about PCI tracepoints
2025-10-25 11:41 [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-10-25 11:41 ` [PATCH v13 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
2025-10-25 11:41 ` [PATCH v13 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
@ 2025-10-25 11:41 ` Shuai Xue
2025-12-09 13:59 ` [External] : " ALOK TIWARI
2025-11-04 9:34 ` [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
3 siblings, 1 reply; 8+ messages in thread
From: Shuai Xue @ 2025-10-25 11:41 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
The PCI tracing system provides tracepoints to monitor critical hardware
events that can impact system performance and reliability. Add
documentation about it.
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
Documentation/trace/events-pci.rst | 74 ++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)
create mode 100644 Documentation/trace/events-pci.rst
diff --git a/Documentation/trace/events-pci.rst b/Documentation/trace/events-pci.rst
new file mode 100644
index 000000000000..88bd38fcc184
--- /dev/null
+++ b/Documentation/trace/events-pci.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+Subsystem Trace Points: PCI
+===========================
+
+Overview
+========
+The PCI tracing system provides tracepoints to monitor critical hardware events
+that can impact system performance and reliability. These events normally show
+up here:
+
+ /sys/kernel/tracing/events/pci
+
+Cf. include/trace/events/pci.h for the events definitions.
+
+Available Tracepoints
+=====================
+
+pci_hp_event
+------------
+
+Monitors PCI hotplug events including card insertion/removal and link
+state changes.
+::
+
+ pci_hp_event "%s slot:%s, event:%s\n"
+
+**Event Types**:
+
+* ``LINK_UP`` - PCIe link established
+* ``LINK_DOWN`` - PCIe link lost
+* ``CARD_PRESENT`` - Card detected in slot
+* ``CARD_NOT_PRESENT`` - Card removed from slot
+
+**Example Usage**:
+
+ # Enable the tracepoint
+ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
+
+ # Monitor events (the following output is generated when a device is hotplugged)
+ cat /sys/kernel/debug/tracing/trace_pipe
+ irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
+
+ irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
+
+pcie_link_event
+---------------
+
+Monitors PCIe link speed changes and provides detailed link status information.
+::
+
+ pcie_link_event "%s type:%d, reason:%d, cur_bus_speed:%d, max_bus_speed:%d, width:%u, flit_mode:%u, status:%s\n"
+
+**Parameters**:
+
+* ``type`` - PCIe device type (4=Root Port, etc.)
+* ``reason`` - Reason for link change:
+
+ - ``0`` - Link retrain
+ - ``1`` - Bus enumeration
+ - ``2`` - Bandwidth notification enable
+ - ``3`` - Bandwidth notification IRQ
+ - ``4`` - Hotplug event
+
+
+**Example Usage**:
+
+ # Enable the tracepoint
+ echo 1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
+
+ # Monitor events (the following output is generated when a device is hotplugged)
+ cat /sys/kernel/debug/tracing/trace_pipe
+ irq/51-pciehp-88 [001] ..... 381.545386: pcie_link_event: 0000:00:02.0 type:4, reason:4, cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, status:DLLLA
--
2.39.3
* Re: [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-10-25 11:41 [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
` (2 preceding siblings ...)
2025-10-25 11:41 ` [PATCH v13 3/3] Documentation: tracing: Add documentation about PCI tracepoints Shuai Xue
@ 2025-11-04 9:34 ` Shuai Xue
2025-12-09 13:19 ` Shuai Xue
3 siblings, 1 reply; 8+ messages in thread
From: Shuai Xue @ 2025-11-04 9:34 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong, rostedt,
lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron
Hi, Bjorn,
Gentle ping.
Do you have any further concerns about this patch set?
Shuai
* Re: [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-11-04 9:34 ` [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
@ 2025-12-09 13:19 ` Shuai Xue
0 siblings, 0 replies; 8+ messages in thread
From: Shuai Xue @ 2025-12-09 13:19 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong, rostedt,
lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron
On 2025/11/4 17:34, Shuai Xue wrote:
> Hi, Bjorn,
>
> Gentle ping.
>
> Do you have any further concerns about this patch set?
>
> Shuai
Hi, all,
Gentle ping.
Thanks.
Shuai
* Re: [External] : [PATCH v13 3/3] Documentation: tracing: Add documentation about PCI tracepoints
2025-10-25 11:41 ` [PATCH v13 3/3] Documentation: tracing: Add documentation about PCI tracepoints Shuai Xue
@ 2025-12-09 13:59 ` ALOK TIWARI
2025-12-10 1:56 ` Shuai Xue
0 siblings, 1 reply; 8+ messages in thread
From: ALOK TIWARI @ 2025-12-09 13:59 UTC (permalink / raw)
To: Shuai Xue, rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
On 10/25/2025 5:11 PM, Shuai Xue wrote:
> The PCI tracing system provides tracepoints to monitor critical hardware
> events that can impact system performance and reliability. Add
> documentation about it.
>
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
> Documentation/trace/events-pci.rst | 74 ++++++++++++++++++++++++++++++
> 1 file changed, 74 insertions(+)
> create mode 100644 Documentation/trace/events-pci.rst
>
> diff --git a/Documentation/trace/events-pci.rst b/Documentation/trace/events-pci.rst
> new file mode 100644
> index 000000000000..88bd38fcc184
> --- /dev/null
> +++ b/Documentation/trace/events-pci.rst
> @@ -0,0 +1,74 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================
> +Subsystem Trace Points: PCI
> +===========================
> +
> +Overview
> +========
> +The PCI tracing system provides tracepoints to monitor critical hardware events
> +that can impact system performance and reliability. These events normally show
> +up here:
> +
> + /sys/kernel/tracing/events/pci
> +
> +Cf. include/trace/events/pci.h for the events definitions.
> +
> +Available Tracepoints
> +=====================
> +
> +pci_hp_event
> +------------
> +
> +Monitors PCI hotplug events including card insertion/removal and link
> +state changes.
> +::
> +
> + pci_hp_event "%s slot:%s, event:%s\n"
> +
> +**Event Types**:
> +
> +* ``LINK_UP`` - PCIe link established
> +* ``LINK_DOWN`` - PCIe link lost
> +* ``CARD_PRESENT`` - Card detected in slot
> +* ``CARD_NOT_PRESENT`` - Card removed from slot
> +
> +**Example Usage**:
> +
> + # Enable the tracepoint
> + echo 1> /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
Missing space: this should be "echo 1 >".
> +
> + # Monitor events (the following output is generated when a device is hotplugged)
> + cat /sys/kernel/debug/tracing/trace_pipe
> + irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
> +
> + irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
> +
> +pcie_link_event
> +---------------
> +
> +Monitors PCIe link speed changes and provides detailed link status information.
> +::
> +
> + pcie_link_event "%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n"
%s -> %d mismatch for cur_bus_speed and max_bus_speed; the tracepoint prints both with %d:
TP_printk("%s type:%d, reason:%d, cur_bus_speed:%d, max_bus_speed:%d,
width:%u, flit_mode:%u, status:%s\n",
> +
> +**Parameters**:
> +
> +* ``type`` - PCIe device type (4=Root Port, etc.)
> +* ``reason`` - Reason for link change:
> +
> + - ``0`` - Link retrain
> + - ``1`` - Bus enumeration
> + - ``2`` - Bandwidth notification enable
> + - ``3`` - Bandwidth notification IRQ
> + - ``4`` - Hotplug event
> +
> +
> +**Example Usage**:
> +
> + # Enable the tracepoint
> + echo 1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
> +
> + # Monitor events (the following output is generated when a device is hotplugged)
> + cat /sys/kernel/debug/tracing/trace_pipe
> + irq/51-pciehp-88 [001] ..... 381.545386: pcie_link_event: 0000:00:02.0 type:4, reason:4, cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, status:DLLLA
Thanks,
Alok
* Re: [External] : [PATCH v13 3/3] Documentation: tracing: Add documentation about PCI tracepoints
2025-12-09 13:59 ` [External] : " ALOK TIWARI
@ 2025-12-10 1:56 ` Shuai Xue
0 siblings, 0 replies; 8+ messages in thread
From: Shuai Xue @ 2025-12-10 1:56 UTC (permalink / raw)
To: ALOK TIWARI, rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
Hi, Alok,
On 2025/12/9 21:59, ALOK TIWARI wrote:
>
>
> On 10/25/2025 5:11 PM, Shuai Xue wrote:
>> The PCI tracing system provides tracepoints to monitor critical hardware
>> events that can impact system performance and reliability. Add
>> documentation about it.
>>
>> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>> Documentation/trace/events-pci.rst | 74 ++++++++++++++++++++++++++++++
>> 1 file changed, 74 insertions(+)
>> create mode 100644 Documentation/trace/events-pci.rst
>>
>> diff --git a/Documentation/trace/events-pci.rst b/Documentation/trace/events-pci.rst
>> new file mode 100644
>> index 000000000000..88bd38fcc184
>> --- /dev/null
>> +++ b/Documentation/trace/events-pci.rst
>> @@ -0,0 +1,74 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +===========================
>> +Subsystem Trace Points: PCI
>> +===========================
>> +
>> +Overview
>> +========
>> +The PCI tracing system provides tracepoints to monitor critical hardware events
>> +that can impact system performance and reliability. These events normally show
>> +up here:
>> +
>> + /sys/kernel/tracing/events/pci
>> +
>> +Cf. include/trace/events/pci.h for the events definitions.
>> +
>> +Available Tracepoints
>> +=====================
>> +
>> +pci_hp_event
>> +------------
>> +
>> +Monitors PCI hotplug events including card insertion/removal and link
>> +state changes.
>> +::
>> +
>> + pci_hp_event "%s slot:%s, event:%s\n"
>> +
>> +**Event Types**:
>> +
>> +* ``LINK_UP`` - PCIe link established
>> +* ``LINK_DOWN`` - PCIe link lost
>> +* ``CARD_PRESENT`` - Card detected in slot
>> +* ``CARD_NOT_PRESENT`` - Card removed from slot
>> +
>> +**Example Usage**:
>> +
>> + # Enable the tracepoint
>> + echo 1> /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>
> Missing space: this should be "echo 1 >".
Good catch.
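
For context on why the space matters: in a POSIX shell, a digit written directly before `>` is read as a file descriptor number, so `echo 1> FILE` runs `echo` with no arguments and redirects stdout, writing only a newline. The sketch below demonstrates this against a scratch file rather than the real tracefs `enable` file; the helper name `shell_write` is hypothetical and not part of the patch.

```python
import os
import shlex
import subprocess
import tempfile


def shell_write(command_template):
    """Run a shell command that redirects into a scratch file; return what was written."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # {path} in the template is replaced by the quoted scratch file name.
        command = command_template.format(path=shlex.quote(path))
        subprocess.run(["sh", "-c", command], check=True)
        with open(path) as f:
            return f.read()
    finally:
        os.remove(path)


# "1>" is parsed as "redirect fd 1", so echo receives no argument.
without_space = shell_write("echo 1> {path}")
# With the space, "1" is an argument to echo and ">" redirects stdout.
with_space = shell_write("echo 1 > {path}")

print(repr(without_space), repr(with_space))
```

Only the second form actually writes the character `1`, which is what the documented command intends.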
>
>> +
>> + # Monitor events (the following output is generated when a device is hotplugged)
>> + cat /sys/kernel/debug/tracing/trace_pipe
>> + irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
>> +
>> + irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
>> +
>> +pcie_link_event
>> +---------------
>> +
>> +Monitors PCIe link speed changes and provides detailed link status information.
>> +::
>> +
>> + pcie_link_event "%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n"
>
> The format specifiers for cur_bus_speed and max_bus_speed should be %d, not %s, to match the tracepoint definition:
>
> TP_printk("%s type:%d, reason:%d, cur_bus_speed:%d, max_bus_speed:%d, width:%u, flit_mode:%u, status:%s\n",
Sorry, I missed the format string.
>
>> +
>> +**Parameters**:
>> +
>> +* ``type`` - PCIe device type (4=Root Port, etc.)
>> +* ``reason`` - Reason for link change:
>> +
>> + - ``0`` - Link retrain
>> + - ``1`` - Bus enumeration
>> + - ``2`` - Bandwidth notification enable
>> + - ``3`` - Bandwidth notification IRQ
>> + - ``4`` - Hotplug event
>> +
>> +
>> +**Example Usage**:
>> +
>> + # Enable the tracepoint
>> + echo 1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
>> +
>> + # Monitor events (the following output is generated when a device is hotplugged)
>> + cat /sys/kernel/debug/tracing/trace_pipe
>> + irq/51-pciehp-88 [001] ..... 381.545386: pcie_link_event: 0000:00:02.0 type:4, reason:4, cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, status:DLLLA
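
To illustrate the corrected format, with cur_bus_speed and max_bus_speed printed as integers, a userspace consumer could parse a trace line roughly as follows. This is only a sketch and not part of the patch: the regular expression, the function name `parse_link_event` and the `REASONS` table are assumptions derived from the format string and the reason list quoted above.

```python
import re

# Reason codes as listed in the quoted documentation.
REASONS = {
    0: "Link retrain",
    1: "Bus enumeration",
    2: "Bandwidth notification enable",
    3: "Bandwidth notification IRQ",
    4: "Hotplug event",
}

# Hypothetical pattern mirroring the TP_printk() format with %d speeds.
LINK_EVENT_RE = re.compile(
    r"pcie_link_event: (?P<dev>\S+) type:(?P<type>-?\d+), "
    r"reason:(?P<reason>-?\d+), "
    r"cur_bus_speed:(?P<cur_bus_speed>-?\d+), "
    r"max_bus_speed:(?P<max_bus_speed>-?\d+), "
    r"width:(?P<width>\d+), flit_mode:(?P<flit_mode>\d+), "
    r"status:(?P<status>\S*)"
)

INT_FIELDS = ("type", "reason", "cur_bus_speed", "max_bus_speed",
              "width", "flit_mode")


def parse_link_event(line):
    """Return a dict of fields for a pcie_link_event line, or None if no match."""
    match = LINK_EVENT_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    for key in INT_FIELDS:
        event[key] = int(event[key])
    event["reason_name"] = REASONS.get(event["reason"], "unknown")
    return event


# Sample line taken from the example output quoted above.
sample = ("irq/51-pciehp-88 [001] ..... 381.545386: pcie_link_event: "
          "0000:00:02.0 type:4, reason:4, cur_bus_speed:20, "
          "max_bus_speed:23, width:1, flit_mode:0, status:DLLLA")

event = parse_link_event(sample)
print(event["dev"], event["reason_name"], event["cur_bus_speed"])
# prints: 0000:00:02.0 Hotplug event 20
```

The speed fields are kept as raw integers here because the thread does not spell out how they map to link rates.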
>
>
> Thanks,
> Alok
I will send a new version to fix the above issues.
Thanks.
Shuai
Thread overview: 8+ messages
2025-10-25 11:41 [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-10-25 11:41 ` [PATCH v13 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
2025-10-25 11:41 ` [PATCH v13 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-10-25 11:41 ` [PATCH v13 3/3] Documentation: tracing: Add documentation about PCI tracepoints Shuai Xue
2025-12-09 13:59 ` [External] : " ALOK TIWARI
2025-12-10 1:56 ` Shuai Xue
2025-11-04 9:34 ` [PATCH v13 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-12-09 13:19 ` Shuai Xue