* [PATCH v9 0/2] add PCI hotplug and PCIe link tracepoint
@ 2025-07-23 3:31 Shuai Xue
2025-07-23 3:31 ` [PATCH v9 1/2] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
2025-07-23 3:31 ` [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
0 siblings, 2 replies; 20+ messages in thread
From: Shuai Xue @ 2025-07-23 3:31 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
changes since v8:
- rewrite commit log from Bjorn
- move pci_hp_event to a common place (include/trace/events/pci.h) per Ilpo
- rename hotplug event strings per Bjorn and Lukas
- add PCIe link tracepoint per Bjorn, Lukas, and Ilpo
Hotplug events are critical indicators for analyzing hardware health, and
surprise link downs can significantly impact system performance and reliability.
In addition, PCIe link speed degradation directly impacts system performance and
often indicates hardware issues such as faulty devices, physical layer problems,
or configuration errors.
This patch set add PCI hotplug and PCIe link tracepoint to help analyze PCI
hotplug events and PCIe link speed degradation.
Shuai Xue (2):
PCI: trace: Add a generic RAS tracepoint for hotplug event
PCI: trace: Add a RAS tracepoint to monitor link speed changes
drivers/pci/hotplug/pciehp_ctrl.c | 33 +++++++--
drivers/pci/hotplug/pciehp_hpc.c | 5 +-
drivers/pci/pci.c | 2 +-
drivers/pci/pci.h | 12 ++-
drivers/pci/pcie/bwctrl.c | 4 +-
drivers/pci/probe.c | 10 ++-
include/linux/pci.h | 1 +
include/trace/events/pci.h | 119 ++++++++++++++++++++++++++++++
include/uapi/linux/pci.h | 7 ++
9 files changed, 177 insertions(+), 16 deletions(-)
create mode 100644 include/trace/events/pci.h
--
2.39.3
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v9 1/2] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-07-23 3:31 [PATCH v9 0/2] add PCI hotplug and PCIe link tracepoint Shuai Xue
@ 2025-07-23 3:31 ` Shuai Xue
2025-07-25 21:09 ` Bjorn Helgaas
2025-07-23 3:31 ` [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
1 sibling, 1 reply; 20+ messages in thread
From: Shuai Xue @ 2025-07-23 3:31 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
Hotplug events are critical indicators for analyzing hardware health,
and surprise link downs can significantly impact system performance and
reliability.
Define a new TRACING_SYSTEM named "pci", add a generic RAS tracepoint
for hotplug event to help health checks. Add enum pci_hotplug_event in
include/uapi/linux/pci.h so applications like rasdaemon can register
tracepoint event handlers for it.
The output is like below:
$ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
<...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_DOWN
<...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_NOT_PRESENT
Suggested-by: Lukas Wunner <lukas@wunner.de>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/pci/hotplug/pciehp_ctrl.c | 33 +++++++++++++---
include/trace/events/pci.h | 63 +++++++++++++++++++++++++++++++
include/uapi/linux/pci.h | 7 ++++
3 files changed, 97 insertions(+), 6 deletions(-)
create mode 100644 include/trace/events/pci.h
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index bcc938d4420f..d241f270c8de 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -20,6 +20,9 @@
#include <linux/pm_runtime.h>
#include <linux/pci.h>
+#define CREATE_TRACE_POINTS
+#include <trace/events/pci.h>
+
#include "../pci.h"
#include "pciehp.h"
@@ -244,12 +247,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
case ON_STATE:
ctrl->state = POWEROFF_STATE;
mutex_unlock(&ctrl->state_lock);
- if (events & PCI_EXP_SLTSTA_DLLSC)
+ if (events & PCI_EXP_SLTSTA_DLLSC) {
ctrl_info(ctrl, "Slot(%s): Link Down\n",
slot_name(ctrl));
- if (events & PCI_EXP_SLTSTA_PDC)
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_LINK_DOWN);
+ }
+ if (events & PCI_EXP_SLTSTA_PDC) {
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_NOT_PRESENT);
+ }
pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
break;
default:
@@ -269,6 +280,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
INDICATOR_NOOP);
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_NOT_PRESENT);
}
mutex_unlock(&ctrl->state_lock);
return;
@@ -281,12 +295,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
case OFF_STATE:
ctrl->state = POWERON_STATE;
mutex_unlock(&ctrl->state_lock);
- if (present)
+ if (present) {
ctrl_info(ctrl, "Slot(%s): Card present\n",
slot_name(ctrl));
- if (link_active)
- ctrl_info(ctrl, "Slot(%s): Link Up\n",
- slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_PRESENT);
+ }
+ if (link_active) {
+ ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_LINK_UP);
+ }
ctrl->request_result = pciehp_enable_slot(ctrl);
break;
default:
diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
new file mode 100644
index 000000000000..208609492c06
--- /dev/null
+++ b/include/trace/events/pci.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pci
+
+#if !defined(_TRACE_HW_EVENT_PCI_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_PCI_H
+
+#include <linux/tracepoint.h>
+
+#define PCI_HOTPLUG_EVENT \
+ EM(PCI_HOTPLUG_LINK_UP, "LINK_UP") \
+ EM(PCI_HOTPLUG_LINK_DOWN, "LINK_DOWN") \
+ EM(PCI_HOTPLUG_CARD_PRESENT, "CARD_PRESENT") \
+ EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "CARD_NOT_PRESENT")
+
+/* Enums require being exported to userspace, for user tool parsing */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b) TRACE_DEFINE_ENUM(a);
+
+PCI_HOTPLUG_EVENT
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) {a, b},
+#define EMe(a, b) {a, b}
+
+TRACE_EVENT(pci_hp_event,
+
+ TP_PROTO(const char *port_name,
+ const char *slot,
+ const int event),
+
+ TP_ARGS(port_name, slot, event),
+
+ TP_STRUCT__entry(
+ __string( port_name, port_name )
+ __string( slot, slot )
+ __field( int, event )
+ ),
+
+ TP_fast_assign(
+ __assign_str(port_name);
+ __assign_str(slot);
+ __entry->event = event;
+ ),
+
+ TP_printk("%s slot:%s, event:%s\n",
+ __get_str(port_name),
+ __get_str(slot),
+ __print_symbolic(__entry->event, PCI_HOTPLUG_EVENT)
+ )
+);
+
+#endif /* _TRACE_HW_EVENT_PCI_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
index a769eefc5139..4f150028965d 100644
--- a/include/uapi/linux/pci.h
+++ b/include/uapi/linux/pci.h
@@ -39,4 +39,11 @@
#define PCIIOC_MMAP_IS_MEM (PCIIOC_BASE | 0x02) /* Set mmap state to MEM space. */
#define PCIIOC_WRITE_COMBINE (PCIIOC_BASE | 0x03) /* Enable/disable write-combining. */
+enum pci_hotplug_event {
+ PCI_HOTPLUG_LINK_UP,
+ PCI_HOTPLUG_LINK_DOWN,
+ PCI_HOTPLUG_CARD_PRESENT,
+ PCI_HOTPLUG_CARD_NOT_PRESENT,
+};
+
#endif /* _UAPILINUX_PCI_H */
--
2.39.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-23 3:31 [PATCH v9 0/2] add PCI hotplug and PCIe link tracepoint Shuai Xue
2025-07-23 3:31 ` [PATCH v9 1/2] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
@ 2025-07-23 3:31 ` Shuai Xue
2025-07-23 14:05 ` Steven Rostedt
` (2 more replies)
1 sibling, 3 replies; 20+ messages in thread
From: Shuai Xue @ 2025-07-23 3:31 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
PCIe link speed degradation directly impacts system performance and
often indicates hardware issues such as faulty devices, physical layer
problems, or configuration errors.
To this end, add a RAS tracepoint to monitor link speed changes,
enabling proactive health checks and diagnostic analysis.
The output is like below:
$ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_link_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
cat /sys/kernel/debug/tracing/trace_pipe
<...>-119 [002] ..... 125.776171: pci_hp_event: 0000:00:03.0 slot:30, event:CARD_PRESENT
<...>-119 [002] ..... 125.776197: pci_hp_event: 0000:00:03.0 slot:30, event:LINK_UP
irq/57-pciehp-119 [002] ..... 125.904335: pcie_link_event: 0000:00:03.0 type:4, reason:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
irq/57-pciehp-119 [002] ..... 125.907051: pcie_link_event: 0000:00:03.0 type:4, reason:0, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Suggested-by: Matthew W Carlis <mattc@purestorage.com>
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
drivers/pci/hotplug/pciehp_hpc.c | 5 ++-
drivers/pci/pci.c | 2 +-
drivers/pci/pci.h | 12 +++++--
drivers/pci/pcie/bwctrl.c | 4 +--
drivers/pci/probe.c | 10 ++++--
include/linux/pci.h | 1 +
include/trace/events/pci.h | 56 ++++++++++++++++++++++++++++++++
7 files changed, 80 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 91d2d92717d9..232d6704cb81 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -292,7 +292,7 @@ int pciehp_check_link_status(struct controller *ctrl)
{
struct pci_dev *pdev = ctrl_dev(ctrl);
bool found;
- u16 lnk_status, linksta2;
+ u16 lnk_status;
if (!pcie_wait_for_link(pdev, true)) {
ctrl_info(ctrl, "Slot(%s): No link\n", slot_name(ctrl));
@@ -319,8 +319,7 @@ int pciehp_check_link_status(struct controller *ctrl)
return -1;
}
- pcie_capability_read_word(pdev, PCI_EXP_LNKSTA2, &linksta2);
- __pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status, linksta2);
+ pcie_update_link_speed(ctrl->pcie->port->subordinate, PCIE_HOTPLUG);
if (!found) {
ctrl_info(ctrl, "Slot(%s): No device found\n",
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9e42090fb108..6c13af287d63 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4739,7 +4739,7 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
* Link Speed.
*/
if (pdev->subordinate)
- pcie_update_link_speed(pdev->subordinate);
+ pcie_update_link_speed(pdev->subordinate, PCIE_LINK_RETRAIN);
return rc;
}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 12215ee72afb..438a6d29d84e 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -452,7 +452,6 @@ static inline int pcie_dev_speed_mbps(enum pci_bus_speed speed)
}
u8 pcie_get_supported_speeds(struct pci_dev *dev);
-const char *pci_speed_string(enum pci_bus_speed speed);
void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
void pcie_report_downtraining(struct pci_dev *dev);
@@ -461,7 +460,16 @@ static inline void __pcie_update_link_speed(struct pci_bus *bus, u16 linksta, u1
bus->cur_bus_speed = pcie_link_speed[linksta & PCI_EXP_LNKSTA_CLS];
bus->flit_mode = (linksta2 & PCI_EXP_LNKSTA2_FLIT) ? 1 : 0;
}
-void pcie_update_link_speed(struct pci_bus *bus);
+
+enum pcie_link_change_reason {
+ PCIE_LINK_RETRAIN,
+ PCIE_ADD_BUS,
+ PCIE_BWCTRL_ENABLE,
+ PCIE_BWCTRL_IRQ,
+ PCIE_HOTPLUG
+};
+
+void pcie_update_link_speed(struct pci_bus *bus, enum pcie_link_change_reason reason);
/* Single Root I/O Virtualization */
struct pci_sriov {
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index 36f939f23d34..32f1b30ecb84 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -199,7 +199,7 @@ static void pcie_bwnotif_enable(struct pcie_device *srv)
* Update after enabling notifications & clearing status bits ensures
* link speed is up to date.
*/
- pcie_update_link_speed(port->subordinate);
+ pcie_update_link_speed(port->subordinate, PCIE_BWCTRL_ENABLE);
}
static void pcie_bwnotif_disable(struct pci_dev *port)
@@ -234,7 +234,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
* speed (inside pcie_update_link_speed()) after LBMS has been
* cleared to avoid missing link speed changes.
*/
- pcie_update_link_speed(port->subordinate);
+ pcie_update_link_speed(port->subordinate, PCIE_BWCTRL_IRQ);
return IRQ_HANDLED;
}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 4b8693ec9e4c..1a42a156f501 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -21,6 +21,7 @@
#include <linux/irqdomain.h>
#include <linux/pm_runtime.h>
#include <linux/bitfield.h>
+#include <trace/events/pci.h>
#include "pci.h"
#define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */
@@ -788,14 +789,19 @@ const char *pci_speed_string(enum pci_bus_speed speed)
}
EXPORT_SYMBOL_GPL(pci_speed_string);
-void pcie_update_link_speed(struct pci_bus *bus)
+void pcie_update_link_speed(struct pci_bus *bus, enum pcie_link_change_reason reason)
{
struct pci_dev *bridge = bus->self;
u16 linksta, linksta2;
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA2, &linksta2);
+
__pcie_update_link_speed(bus, linksta, linksta2);
+ trace_pcie_link_event(bus,
+ reason,
+ FIELD_GET(PCI_EXP_LNKSTA_NLW, linksta),
+ linksta & PCI_EXP_LNKSTA_LINK_STATUS_MASK);
}
EXPORT_SYMBOL_GPL(pcie_update_link_speed);
@@ -882,7 +888,7 @@ static void pci_set_bus_speed(struct pci_bus *bus)
pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
- pcie_update_link_speed(bus);
+ pcie_update_link_speed(bus, PCIE_ADD_BUS);
}
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 05e68f35f392..8346121c035d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -305,6 +305,7 @@ enum pci_bus_speed {
PCI_SPEED_UNKNOWN = 0xff,
};
+const char *pci_speed_string(enum pci_bus_speed speed);
enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
index 208609492c06..78e651b95cb3 100644
--- a/include/trace/events/pci.h
+++ b/include/trace/events/pci.h
@@ -57,6 +57,62 @@ TRACE_EVENT(pci_hp_event,
)
);
+#define PCI_EXP_LNKSTA_LINK_STATUS_MASK (PCI_EXP_LNKSTA_LBMS | \
+ PCI_EXP_LNKSTA_LABS | \
+ PCI_EXP_LNKSTA_LT | \
+ PCI_EXP_LNKSTA_DLLLA)
+
+#define LNKSTA_FLAGS \
+ { PCI_EXP_LNKSTA_LT, "LT"}, \
+ { PCI_EXP_LNKSTA_DLLLA, "DLLLA"}, \
+ { PCI_EXP_LNKSTA_LBMS, "LBMS"}, \
+ { PCI_EXP_LNKSTA_LABS, "LABS"}
+
+TRACE_EVENT(pcie_link_event,
+
+ TP_PROTO(struct pci_bus *bus,
+ unsigned int reason,
+ unsigned int width,
+ unsigned int status
+ ),
+
+ TP_ARGS(bus, reason, width, status),
+
+ TP_STRUCT__entry(
+ __string( port_name, pci_name(bus->self))
+ __field( unsigned int, type )
+ __field( unsigned int, reason )
+ __field( unsigned int, cur_bus_speed )
+ __field( unsigned int, max_bus_speed )
+ __field( unsigned int, width )
+ __field( unsigned int, flit_mode )
+ __field( unsigned int, link_status )
+ ),
+
+ TP_fast_assign(
+ __assign_str(port_name);
+ __entry->type = pci_pcie_type(bus->self);
+ __entry->reason = reason;
+ __entry->cur_bus_speed = bus->cur_bus_speed;
+ __entry->max_bus_speed = bus->max_bus_speed;
+ __entry->width = width;
+ __entry->flit_mode = bus->flit_mode;
+ __entry->link_status = status;
+ ),
+
+ TP_printk("%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n",
+ __get_str(port_name),
+ __entry->type,
+ __entry->reason,
+ pci_speed_string(__entry->cur_bus_speed),
+ pci_speed_string(__entry->max_bus_speed),
+ __entry->width,
+ __entry->flit_mode,
+ __print_flags((unsigned long)__entry->link_status, "|",
+ LNKSTA_FLAGS)
+ )
+);
+
#endif /* _TRACE_HW_EVENT_PCI_H */
/* This part must be outside protection */
--
2.39.3
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-23 3:31 ` [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
@ 2025-07-23 14:05 ` Steven Rostedt
2025-07-25 2:11 ` Shuai Xue
2025-07-23 19:30 ` kernel test robot
2025-07-25 21:09 ` Bjorn Helgaas
2 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2025-07-23 14:05 UTC (permalink / raw)
To: Shuai Xue
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On Wed, 23 Jul 2025 11:31:08 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> + TP_printk("%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n",
> + __get_str(port_name),
> + __entry->type,
> + __entry->reason,
> + pci_speed_string(__entry->cur_bus_speed),
> + pci_speed_string(__entry->max_bus_speed),
Hmm, I guess pci_speed_string() should be added to libtraceveent so
that perf and trace-cmd parses it correctly. I guess rasdaemon would
want that too (which also uses libtraceevent).
-- Steve
> + __entry->width,
> + __entry->flit_mode,
> + __print_flags((unsigned long)__entry->link_status, "|",
> + LNKSTA_FLAGS)
> + )
> +);
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-23 3:31 ` [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-07-23 14:05 ` Steven Rostedt
@ 2025-07-23 19:30 ` kernel test robot
2025-07-24 14:06 ` Steven Rostedt
2025-07-25 21:09 ` Bjorn Helgaas
2 siblings, 1 reply; 20+ messages in thread
From: kernel test robot @ 2025-07-23 19:30 UTC (permalink / raw)
To: Shuai Xue, rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: oe-kbuild-all, bhelgaas, tony.luck, bp, xueshuai, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
Hi Shuai,
kernel test robot noticed the following build errors:
[auto build test ERROR on pci/for-linus]
[also build test ERROR on trace/for-next linus/master v6.16-rc7 next-20250723]
[cannot apply to pci/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Shuai-Xue/PCI-trace-Add-a-generic-RAS-tracepoint-for-hotplug-event/20250723-113454
base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git for-linus
patch link: https://lore.kernel.org/r/20250723033108.61587-3-xueshuai%40linux.alibaba.com
patch subject: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
config: sparc-sparc64_defconfig (https://download.01.org/0day-ci/archive/20250724/202507240322.nJGdyXsy-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 15.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250724/202507240322.nJGdyXsy-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507240322.nJGdyXsy-lkp@intel.com/
All errors (new ones prefixed by >>):
sparc64-linux-ld: drivers/pci/probe.o: in function `pcie_update_link_speed':
>> probe.c:(.text+0x370): undefined reference to `__tracepoint_pcie_link_event'
>> sparc64-linux-ld: probe.c:(.text+0x37c): undefined reference to `__tracepoint_pcie_link_event'
>> sparc64-linux-ld: probe.c:(.text+0x3dc): undefined reference to `__traceiter_pcie_link_event'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-23 19:30 ` kernel test robot
@ 2025-07-24 14:06 ` Steven Rostedt
2025-07-25 2:31 ` Shuai Xue
0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2025-07-24 14:06 UTC (permalink / raw)
To: kernel test robot
Cc: Shuai Xue, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron, oe-kbuild-all, bhelgaas, tony.luck, bp,
mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On Thu, 24 Jul 2025 03:30:17 +0800
kernel test robot <lkp@intel.com> wrote:
> url: https://github.com/intel-lab-lkp/linux/commits/Shuai-Xue/PCI-trace-Add-a-generic-RAS-tracepoint-for-hotplug-event/20250723-113454
> base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git for-linus
> patch link: https://lore.kernel.org/r/20250723033108.61587-3-xueshuai%40linux.alibaba.com
> patch subject: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
> config: sparc-sparc64_defconfig (https://download.01.org/0day-ci/archive/20250724/202507240322.nJGdyXsy-lkp@intel.com/config)
> compiler: sparc64-linux-gcc (GCC) 15.1.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250724/202507240322.nJGdyXsy-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202507240322.nJGdyXsy-lkp@intel.com/
>
> All errors (new ones prefixed by >>):
>
> sparc64-linux-ld: drivers/pci/probe.o: in function `pcie_update_link_speed':
> >> probe.c:(.text+0x370): undefined reference to `__tracepoint_pcie_link_event'
> >> sparc64-linux-ld: probe.c:(.text+0x37c): undefined reference to `__tracepoint_pcie_link_event'
> >> sparc64-linux-ld: probe.c:(.text+0x3dc): undefined reference to `__traceiter_pcie_link_event'
The config has:
# CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
Which looks to me from the first patch, would enable the code that
defines the trace events via the:
+#define CREATE_TRACE_POINTS
+#include <trace/events/pci.h>
Thus, without compiling that file, the tracepoints would not be created
and you would get the above errors.
-- Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-23 14:05 ` Steven Rostedt
@ 2025-07-25 2:11 ` Shuai Xue
2025-07-25 2:25 ` Steven Rostedt
0 siblings, 1 reply; 20+ messages in thread
From: Shuai Xue @ 2025-07-25 2:11 UTC (permalink / raw)
To: Steven Rostedt
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
Hi, Steve
在 2025/7/23 22:05, Steven Rostedt 写道:
> On Wed, 23 Jul 2025 11:31:08 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>> + TP_printk("%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n",
>> + __get_str(port_name),
>> + __entry->type,
>> + __entry->reason,
>> + pci_speed_string(__entry->cur_bus_speed),
>> + pci_speed_string(__entry->max_bus_speed),
>
> Hmm, I guess pci_speed_string() should be added to libtraceveent so
> that perf and trace-cmd parses it correctly. I guess rasdaemon would
> want that too (which also uses libtraceevent).
Thank you for pointing this out. You're absolutely right that
pci_speed_string() should be properly handled in libtraceevent for
better userspace tool support.
$ cat /sys/kernel/debug/tracing/trace_pipe
irq/57-pciehp-119 [002] ..... 125.904335: pcie_link_event: 0000:00:03.0 type:4, reason:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
irq/57-pciehp-119 [002] ..... 125.907051: pcie_link_event: 0000:00:03.0 type:4, reason:0, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
Compared with debug/tracing, perf trace used the raw event field, and
the speed is not handman readable.
$ perf trace -e pci:pcie_link_event
0.000 irq/57-pciehp/121 pci:pcie_link_event(port_name: "0000:00:03.0", type: 4, reason: 4, cur_bus_speed: 20, max_bus_speed: 23, width: 1, link_status: 8192)
4.058 irq/57-pciehp/121 pci:pcie_link_event(port_name: "0000:00:03.0", type: 4, cur_bus_speed: 20, max_bus_speed: 23, width: 1, link_status: 8192)
I see a couple of options here:
1. Keep the current approach and add libtraceevent support as follow-up
work. The tracepoint would still be functional, but userspace tools
would show raw speed values instead of formatted strings until
libtraceevent is updated.
2. Use raw values in the tracepoint for now (e.g., store speed as
integer) and let userspace tools handle the formatting. This would avoid
the immediate dependency on libtraceevent updates.
3. Address both kernel and userspace in coordinated patch set.
Which approach would you prefer? If you think option1 is acceptable, I'm
happy to work on the libtraceevent changes as a follow-up.
Alternatively, if you'd prefer option 2, I can modify the tracepoint to
use raw values. And if you perfer opiton 3, I will also include a new
patch 3 to add a plugin helper for libtraceevent.
For the libtraceevent implementation, I believe we'd
need to:
- Add the PCI speed mapping table to libtraceevent
- Create a print function similar to other existing parsers
- Ensure perf, trace-cmd, and rasdaemon can all benefit from it
Would you like me to investigate the libtraceevent changes, or do you
have other suggestions for the approach?
Thanks again for the feedback.
Best regards,
Shuai
>
> -- Steve
>
>
>> + __entry->width,
>> + __entry->flit_mode,
>> + __print_flags((unsigned long)__entry->link_status, "|",
>> + LNKSTA_FLAGS)
>> + )
>> +);
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-25 2:11 ` Shuai Xue
@ 2025-07-25 2:25 ` Steven Rostedt
2025-07-25 2:59 ` Shuai Xue
0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2025-07-25 2:25 UTC (permalink / raw)
To: Shuai Xue
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On Fri, 25 Jul 2025 10:11:10 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> For the libtraceevent implementation, I believe we'd
> need to:
>
> - Add the PCI speed mapping table to libtraceevent
> - Create a print function similar to other existing parsers
> - Ensure perf, trace-cmd, and rasdaemon can all benefit from it
>
> Would you like me to investigate the libtraceevent changes, or do you
Yeah, just update libtraceevent. In fact, libtraceevent has plugins for
things like this.
You can use this as an example:
https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/tree/plugins/plugin_jbd2.c
That adds two functions that are used in print fmt strings. Here's one:
static unsigned long long
process_jbd2_dev_to_name(struct trace_seq *s, unsigned long long *args)
{
unsigned int dev = args[0];
trace_seq_printf(s, "%d:%d", MAJOR(dev), MINOR(dev));
return 0;
}
int TEP_PLUGIN_LOADER(struct tep_handle *tep)
{
tep_register_print_function(tep,
process_jbd2_dev_to_name,
TEP_FUNC_ARG_STRING,
"jbd2_dev_to_name",
TEP_FUNC_ARG_INT,
TEP_FUNC_ARG_VOID);
[..]
The above defines:
char *jbd2_dev_to_name(int arg0);
And when this is found in the parsing, it calls process_jbd2_dev_to_name()
passing it the arguments that was found in the trace.
You would have something like:
tep_register_print_function(tep,
process_pci_speed_string,
TEP_FUNC_ARG_STRING,
"pci_speed_string",
TEP_FUNC_ARG_INT,
TEP_FUNC_ARG_VOID);
Which will return a string and take an integer as an argument. Then you
would just implement the process_pci_speed_string() function to do the same
thing as the pci_speed_string() does in the kernel.
Oh, and here's the man page for you on tep_register_print_function()
https://trace-cmd.org/Documentation/libtraceevent/libtraceevent-reg_print_func.html
-- Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-24 14:06 ` Steven Rostedt
@ 2025-07-25 2:31 ` Shuai Xue
2025-07-25 2:57 ` Steven Rostedt
0 siblings, 1 reply; 20+ messages in thread
From: Shuai Xue @ 2025-07-25 2:31 UTC (permalink / raw)
To: Steven Rostedt, kernel test robot, Bjorn Helgaas
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, oe-kbuild-all,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
Hi, Steven,
在 2025/7/24 22:06, Steven Rostedt 写道:
> On Thu, 24 Jul 2025 03:30:17 +0800
> kernel test robot <lkp@intel.com> wrote:
>
>> url: https://github.com/intel-lab-lkp/linux/commits/Shuai-Xue/PCI-trace-Add-a-generic-RAS-tracepoint-for-hotplug-event/20250723-113454
>> base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git for-linus
>> patch link: https://lore.kernel.org/r/20250723033108.61587-3-xueshuai%40linux.alibaba.com
>> patch subject: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
>> config: sparc-sparc64_defconfig (https://download.01.org/0day-ci/archive/20250724/202507240322.nJGdyXsy-lkp@intel.com/config)
>> compiler: sparc64-linux-gcc (GCC) 15.1.0
>> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250724/202507240322.nJGdyXsy-lkp@intel.com/reproduce)
>>
>> If you fix the issue in a separate patch/commit (i.e. not just a new version of
>> the same patch/commit), kindly add following tags
>> | Reported-by: kernel test robot <lkp@intel.com>
>> | Closes: https://lore.kernel.org/oe-kbuild-all/202507240322.nJGdyXsy-lkp@intel.com/
>>
>> All errors (new ones prefixed by >>):
>>
>> sparc64-linux-ld: drivers/pci/probe.o: in function `pcie_update_link_speed':
>>>> probe.c:(.text+0x370): undefined reference to `__tracepoint_pcie_link_event'
>>>> sparc64-linux-ld: probe.c:(.text+0x37c): undefined reference to `__tracepoint_pcie_link_event'
>>>> sparc64-linux-ld: probe.c:(.text+0x3dc): undefined reference to `__traceiter_pcie_link_event'
>
> The config has:
>
> # CONFIG_CPU_HOTPLUG_STATE_CONTROL is not set
>
> Which looks to me from the first patch, would enable the code that
> defines the trace events via the:
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/pci.h>
>
> Thus, without compiling that file, the tracepoints would not be created
> and you would get the above errors.
Thanks for the detailed analysis! You're absolutely right about the root
cause.
The issue is that I have the tracepoint usage in drivers/pci/probe.c
(which is always compiled), but the tracepoint definition with:
#define CREATE_TRACE_POINTS
#include<trace/events/pci.h>
is in drivers/pci/hotplug/pciehp_ctrl.c that's conditionally compiled
based on CONFIG_HOTPLUG_PCI_PCIE=y, creating the undefined reference
when that config is not set.
I can see a few ways to fix this:
- Move tracepoint definition to always-compiled code - Create a dedicated drivers/pci/trace.c that always defines the PCI tracepoints
- Make the tracepoint usage conditional - Wrap the trace calls with the same config dependency
- Create a PCI-specific trace config - Add a CONFIG_PCI_TRACE option
I'm leaning toward option 1 (dedicated trace.c file) as it's the
cleanest approach and follows the pattern used in other subsystems. Does
that sound reasonable to you?
>
> -- Steve
Thanks for valuable feedback.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-25 2:31 ` Shuai Xue
@ 2025-07-25 2:57 ` Steven Rostedt
2025-07-25 3:18 ` Shuai Xue
0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2025-07-25 2:57 UTC (permalink / raw)
To: Shuai Xue
Cc: kernel test robot, Bjorn Helgaas, lukas, linux-pci, linux-kernel,
linux-edac, linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron, oe-kbuild-all, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
On Fri, 25 Jul 2025 10:31:45 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> I'm leaning toward option 1 (dedicated trace.c file) as it's the
> cleanest approach and follows the pattern used in other subsystems. Does
> that sound reasonable to you?
That's fine, but make sure you have #ifdef CONFIG_FOO around tracepoints
that are only used when those configs are enabled, otherwise you will get
warnings.
Well, if they are exported, then the warnings are suppressed.
https://lore.kernel.org/all/20250725025149.726267838@kernel.org/
-- Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-25 2:25 ` Steven Rostedt
@ 2025-07-25 2:59 ` Shuai Xue
2025-07-25 3:06 ` Steven Rostedt
0 siblings, 1 reply; 20+ messages in thread
From: Shuai Xue @ 2025-07-25 2:59 UTC (permalink / raw)
To: Steven Rostedt
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
在 2025/7/25 10:25, Steven Rostedt 写道:
> On Fri, 25 Jul 2025 10:11:10 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>> For the libtraceevent implementation, I believe we'd
>> need to:
>>
>> - Add the PCI speed mapping table to libtraceevent
>> - Create a print function similar to other existing parsers
>> - Ensure perf, trace-cmd, and rasdaemon can all benefit from it
>>
>> Would you like me to investigate the libtraceevent changes, or do you
>
> Yeah, just update libtraceevent. In fact, libtraceevent has plugins for
> things like this.
>
> You can use this as an example:
>
> https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/tree/plugins/plugin_jbd2.c
>
> That adds two functions that are used in print fmt strings. Here's one:
>
> static unsigned long long
> process_jbd2_dev_to_name(struct trace_seq *s, unsigned long long *args)
> {
> unsigned int dev = args[0];
>
> trace_seq_printf(s, "%d:%d", MAJOR(dev), MINOR(dev));
> return 0;
> }
>
>
> int TEP_PLUGIN_LOADER(struct tep_handle *tep)
> {
> tep_register_print_function(tep,
> process_jbd2_dev_to_name,
> TEP_FUNC_ARG_STRING,
> "jbd2_dev_to_name",
> TEP_FUNC_ARG_INT,
> TEP_FUNC_ARG_VOID);
> [..]
>
> The above defines:
>
> char *jbd2_dev_to_name(int arg0);
>
> And when this is found in the parsing, it calls process_jbd2_dev_to_name()
> passing it the arguments that was found in the trace.
>
> You would have something like:
>
> tep_register_print_function(tep,
> process_pci_speed_string,
> TEP_FUNC_ARG_STRING,
> "pci_speed_string",
> TEP_FUNC_ARG_INT,
> TEP_FUNC_ARG_VOID);
>
> Which will return a string and take an integer as an argument. Then you
> would just implement the process_pci_speed_string() function to do the same
> thing as the pci_speed_string() does in the kernel.
>
> Oh, and here's the man page for you on tep_register_print_function()
>
> https://trace-cmd.org/Documentation/libtraceevent/libtraceevent-reg_print_func.html
Hi Steve,
Thank you so much for the detailed guidance and the excellent example!
This makes it much clearer how to implement the libtraceevent support.
Should I include the libtraceevent plugin patch in the same kernel patch
series, or submit it separately? I'm not sure about the best practice
here.
>
> -- Steve
I'll work on the libtraceevent patch and submit it according to your
guidance. Thanks again for the clear direction and the documentation
link!
Thanks.
Shuai
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-25 2:59 ` Shuai Xue
@ 2025-07-25 3:06 ` Steven Rostedt
2025-07-25 3:15 ` Shuai Xue
0 siblings, 1 reply; 20+ messages in thread
From: Steven Rostedt @ 2025-07-25 3:06 UTC (permalink / raw)
To: Shuai Xue
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On Fri, 25 Jul 2025 10:59:16 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> Thank you so much for the detailed guidance and the excellent example!
> This makes it much clearer how to implement the libtraceevent support.
>
> Should I include the libtraceevent plugin patch in the same kernel patch
> series, or submit it separately? I'm not sure about the best practice
> here.
No, libtraceevent lives outside the kernel tree.
>
> >
> > -- Steve
>
> I'll work on the libtraceevent patch and submit it according to your
> guidance. Thanks again for the clear direction and the documentation
> link!
Make a patch against: git://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git
Follow the same procedure as you would for submitting to the linux kernel,
but instead of sending it to LKML, send it to: linux-trace-devel@vger.kernel.org
You don't even need to Cc me. I'll get it from that mailing list.
Patchwork is here: https://patchwork.kernel.org/project/linux-trace-devel/list/
-- Steve
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-25 3:06 ` Steven Rostedt
@ 2025-07-25 3:15 ` Shuai Xue
0 siblings, 0 replies; 20+ messages in thread
From: Shuai Xue @ 2025-07-25 3:15 UTC (permalink / raw)
To: Steven Rostedt
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
在 2025/7/25 11:06, Steven Rostedt 写道:
> On Fri, 25 Jul 2025 10:59:16 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>> Thank you so much for the detailed guidance and the excellent example!
>> This makes it much clearer how to implement the libtraceevent support.
>>
>> Should I include the libtraceevent plugin patch in the same kernel patch
>> series, or submit it separately? I'm not sure about the best practice
>> here.
>
> No, libtraceevent lives outside the kernel tree.
>
>>
>>>
>>> -- Steve
>>
>> I'll work on the libtraceevent patch and submit it according to your
>> guidance. Thanks again for the clear direction and the documentation
>> link!
>
> Make a patch against: git://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git
>
> Follow the same procedure as you would for submitting to the linux kernel,
> but instead of sending it to LKML, send it to: linux-trace-devel@vger.kernel.org
>
> You don't even need to Cc me. I'll get it from that mailing list.
>
> Patchwork is here: https://patchwork.kernel.org/project/linux-trace-devel/list/
>
> -- Steve
Hi Steve,
Perfect, thank you for the clear instructions!
I'll work on the plugin and submit it to the trace-devel list once this
patch set is merged in upstream.
Thanks again for all the guidance!
Best regards,
Shuai
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-25 2:57 ` Steven Rostedt
@ 2025-07-25 3:18 ` Shuai Xue
0 siblings, 0 replies; 20+ messages in thread
From: Shuai Xue @ 2025-07-25 3:18 UTC (permalink / raw)
To: Steven Rostedt
Cc: kernel test robot, Bjorn Helgaas, lukas, linux-pci, linux-kernel,
linux-edac, linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron, oe-kbuild-all, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
在 2025/7/25 10:57, Steven Rostedt 写道:
> On Fri, 25 Jul 2025 10:31:45 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>> I'm leaning toward option 1 (dedicated trace.c file) as it's the
>> cleanest approach and follows the pattern used in other subsystems. Does
>> that sound reasonable to you?
>
> That's fine, but make sure you have #ifdef CONFIG_FOO around tracepoints
> that are only used when those configs are enabled, otherwise you will get
> warnings.
>
> Well, if they are exported, then the warnings are suppressed.
>
> https://lore.kernel.org/all/20250725025149.726267838@kernel.org/
>
> -- Steve
Hi Steve,
Got it. Thanks for the important reminder about the CONFIG guards!
Best regards,
Shuai
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 1/2] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-07-23 3:31 ` [PATCH v9 1/2] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
@ 2025-07-25 21:09 ` Bjorn Helgaas
2025-07-28 9:05 ` Shuai Xue
0 siblings, 1 reply; 20+ messages in thread
From: Bjorn Helgaas @ 2025-07-25 21:09 UTC (permalink / raw)
To: Shuai Xue
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, ilpo.jarvinen, mattc, Jonathan.Cameron,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
On Wed, Jul 23, 2025 at 11:31:07AM +0800, Shuai Xue wrote:
> Hotplug events are critical indicators for analyzing hardware health,
> and surprise link downs can significantly impact system performance and
> reliability.
>
> Define a new TRACING_SYSTEM named "pci", add a generic RAS tracepoint
> for hotplug event to help health checks. Add enum pci_hotplug_event in
> include/uapi/linux/pci.h so applications like rasdaemon can register
> tracepoint event handlers for it.
>
> The output is like below:
>
> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> $ cat /sys/kernel/debug/tracing/trace_pipe
> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_DOWN
>
> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_NOT_PRESENT
I asked about documentation earlier [1], but didn't see any response.
I think these tracepoints are important and will be widely used, so it
seems like some kind of user guide would be helpful.
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Nit: I assume this came from the patch I had applied to pci/trace, but
you shouldn't include any sign-offs from people to whom you send
patches [2].
Bjorn
[1] https://lore.kernel.org/all/20250717192950.GA2594528@bhelgaas/#t
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v6.13#n449
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-23 3:31 ` [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-07-23 14:05 ` Steven Rostedt
2025-07-23 19:30 ` kernel test robot
@ 2025-07-25 21:09 ` Bjorn Helgaas
2025-07-26 7:51 ` Lukas Wunner
2025-07-28 9:17 ` Shuai Xue
2 siblings, 2 replies; 20+ messages in thread
From: Bjorn Helgaas @ 2025-07-25 21:09 UTC (permalink / raw)
To: Shuai Xue
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, ilpo.jarvinen, mattc, Jonathan.Cameron,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
On Wed, Jul 23, 2025 at 11:31:08AM +0800, Shuai Xue wrote:
> PCIe link speed degradation directly impacts system performance and
> often indicates hardware issues such as faulty devices, physical layer
> problems, or configuration errors.
>
> To this end, add a RAS tracepoint to monitor link speed changes,
> enabling proactive health checks and diagnostic analysis.
>
> The output is like below:
>
> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_link_event/enable
> $ cat /sys/kernel/debug/tracing/trace_pipe
> cat /sys/kernel/debug/tracing/trace_pipe
> <...>-119 [002] ..... 125.776171: pci_hp_event: 0000:00:03.0 slot:30, event:CARD_PRESENT
>
> <...>-119 [002] ..... 125.776197: pci_hp_event: 0000:00:03.0 slot:30, event:LINK_UP
>
> irq/57-pciehp-119 [002] ..... 125.904335: pcie_link_event: 0000:00:03.0 type:4, reason:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
>
> irq/57-pciehp-119 [002] ..... 125.907051: pcie_link_event: 0000:00:03.0 type:4, reason:0, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
I guess this example would actually require both of these enables, right?
echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
echo 1 > /sys/kernel/debug/tracing/events/pci/pci_link_event/enable
> Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
I don't think I've suggested anything that really warrants this ;)
> ...
> @@ -292,7 +292,7 @@ int pciehp_check_link_status(struct controller *ctrl)
> {
> struct pci_dev *pdev = ctrl_dev(ctrl);
> bool found;
> - u16 lnk_status, linksta2;
> + u16 lnk_status;
>
> if (!pcie_wait_for_link(pdev, true)) {
> ctrl_info(ctrl, "Slot(%s): No link\n", slot_name(ctrl));
> @@ -319,8 +319,7 @@ int pciehp_check_link_status(struct controller *ctrl)
> return -1;
> }
>
> - pcie_capability_read_word(pdev, PCI_EXP_LNKSTA2, &linksta2);
> - __pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status, linksta2);
> + pcie_update_link_speed(ctrl->pcie->port->subordinate, PCIE_HOTPLUG);
It kind of bugs me that the hot-add flow reads LNKSTA three times and
generates both pci_hp_event LINK_UP and link_event tracepoints:
pciehp_handle_presence_or_link_change
link_active = pciehp_check_link_active()
pcie_capability_read_word(PCI_EXP_LNKSTA)
if (link_active)
ctrl_info(ctrl, "Slot(%s): Link Up\n")
trace_pci_hp_event(PCI_HOTPLUG_LINK_UP)
pciehp_enable_slot
__pciehp_enable_slot
board_added
pciehp_check_link_status
pcie_capability_read_word(PCI_EXP_LNKSTA)
pcie_update_link_speed
pcie_capability_read_word(PCI_EXP_LNKSTA)
pcie_capability_read_word(PCI_EXP_LNKSTA2)
trace_pcie_link_event(<REASON>)
Maybe there are good reasons for reading LNKSTA three times, but it
does make me raise my eyebrows. Not that this is a performance path,
but it just offends my sense of propriety.
And maybe we need both a bare LINK_UP event and a link_event with all
the details, but again it seems a little weird to me that there are
two tracepoints when there's really only one event and we know all the
link_event information from the very first LNKSTA read.
Bjorn
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-25 21:09 ` Bjorn Helgaas
@ 2025-07-26 7:51 ` Lukas Wunner
2025-07-28 9:28 ` Shuai Xue
2025-07-28 9:17 ` Shuai Xue
1 sibling, 1 reply; 20+ messages in thread
From: Lukas Wunner @ 2025-07-26 7:51 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Shuai Xue, rostedt, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, ilpo.jarvinen, mattc, Jonathan.Cameron,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
On Fri, Jul 25, 2025 at 04:09:21PM -0500, Bjorn Helgaas wrote:
> > @@ -319,8 +319,7 @@ int pciehp_check_link_status(struct controller *ctrl)
> > return -1;
> > }
> >
> > - pcie_capability_read_word(pdev, PCI_EXP_LNKSTA2, &linksta2);
> > - __pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status, linksta2);
> > + pcie_update_link_speed(ctrl->pcie->port->subordinate, PCIE_HOTPLUG);
>
> It kind of bugs me that the hot-add flow reads LNKSTA three times and
> generates both pci_hp_event LINK_UP and link_event tracepoints:
>
> pciehp_handle_presence_or_link_change
> link_active = pciehp_check_link_active()
> pcie_capability_read_word(PCI_EXP_LNKSTA)
> if (link_active)
> ctrl_info(ctrl, "Slot(%s): Link Up\n")
This LNKSTA read decides whether to bring up the slot.
It can't be eliminated.
> trace_pci_hp_event(PCI_HOTPLUG_LINK_UP)
> pciehp_enable_slot
> __pciehp_enable_slot
> board_added
> pciehp_check_link_status
> pcie_capability_read_word(PCI_EXP_LNKSTA)
This is sort of a final check whether the link is (still) active
before commencing device enumeration. Doesn't look like it can
safely be eliminated either.
> pcie_update_link_speed
> pcie_capability_read_word(PCI_EXP_LNKSTA)
> pcie_capability_read_word(PCI_EXP_LNKSTA2)
> trace_pcie_link_event(<REASON>)
This third register read is introduced by the present patch and is
indeed somewhat a step back, given that pciehp_check_link_status()
currently deliberately calls __pcie_update_link_speed() to pass
the already read LNKSTA value.
I'm wondering if the tracepoint can be moved down to
__pcie_update_link_speed()?
> And maybe we need both a bare LINK_UP event and a link_event with all
> the details, but again it seems a little weird to me that there are
> two tracepoints when there's really only one event and we know all the
> link_event information from the very first LNKSTA read.
One of the reasons is that a "Link Down" event would have to
contain dummy values for link speed etc, so it seemed cleaner
to separate the hotplug event from the link speed event.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 1/2] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-07-25 21:09 ` Bjorn Helgaas
@ 2025-07-28 9:05 ` Shuai Xue
0 siblings, 0 replies; 20+ messages in thread
From: Shuai Xue @ 2025-07-28 9:05 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, ilpo.jarvinen, mattc, Jonathan.Cameron,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
Hi, Bjorn,
在 2025/7/26 05:09, Bjorn Helgaas 写道:
> On Wed, Jul 23, 2025 at 11:31:07AM +0800, Shuai Xue wrote:
>> Hotplug events are critical indicators for analyzing hardware health,
>> and surprise link downs can significantly impact system performance and
>> reliability.
>>
>> Define a new TRACING_SYSTEM named "pci", add a generic RAS tracepoint
>> for hotplug event to help health checks. Add enum pci_hotplug_event in
>> include/uapi/linux/pci.h so applications like rasdaemon can register
>> tracepoint event handlers for it.
>>
>> The output is like below:
>>
>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_DOWN
>>
>> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_NOT_PRESENT
>
> I asked about documentation earlier [1], but didn't see any response.
> I think these tracepoints are important and will be widely used, so it
> seems like some kind of user guide would be helpful.
Sorry for missing your earlier email about documentation.
> Is there any convention for documenting tracepoints somewhere? It
> looks like there's some doc in Documentation/trace/? Should we be
> adding something there?
Regarding tracepoint documentation conventions, you raise a good point.
Looking at Documentation/trace/, most of the existing documentation
focuses on the tracing infrastructure itself rather than individual
tracepoint events.
For tracepoint events like the ones I'm familiar with (aer_event,
memory_failure_event, mce_event, mc_event), the typical approach has
been:
- Self-documenting through code - The TRACE_EVENT() definitions in
include/trace/events/ serve as the primary specification
- UAPI headers - Enums and structures in include/uapi/ provide the
interface definitions
- Commit messages - Detailed explanations of when/why events are
generated
However, there are some exceptions where specific events do have
dedicated documentation:
- Documentation/trace/events-power.rst - Power management events
- Documentation/trace/events-kmem.rst - Kernel memory allocation events
- Documentation/trace/events-nmi.rst - NMI events
- Documentation/trace/events-msr.rst - MSR (Model Specific Register) events
Given your point about these PCI tracepoints potentially being widely
used, I think adding documentation would be valuable. Bellow is the RFC
doc, are you happy with this?
diff --git a/Documentation/trace/events-pci.rst b/Documentation/trace/events-pci.rst
new file mode 100644
index 000000000000..f2f7cacba862
--- /dev/null
+++ b/Documentation/trace/events-pci.rst
@@ -0,0 +1,72 @@
+===========================
+Subsystem Trace Points: PCI
+===========================
+
+Overview
+========
+The PCI tracing system provides tracepoints to monitor critical hardware events
+that can impact system performance and reliability. These events normally show
+up here:
+
+ /sys/kernel/tracing/events/pci
+
+Cf. include/trace/events/pci.h for the events definitions.
+
+Available Tracepoints
+=====================
+
+pci_hp_event
+------------
+
+Monitors PCI hotplug events including card insertion/removal and link
+state changes.
+::
+
+ pci_hp_event "%s slot:%s, event:%s\n"
+
+**Event Types**:
+
+* ``LINK_UP`` - PCIe link established
+* ``LINK_DOWN`` - PCIe link lost
+* ``CARD_PRESENT`` - Card detected in slot
+* ``CARD_NOT_PRESENT`` - Card removed from slot
+
+**Example Usage**:
+
+ # Enable the tracepoint
+ echo 1> /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
+
+ # Monitor events
+ cat /sys/kernel/debug/tracing/trace_pipe
+ <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_DOWN
+
+ <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_NOT_PRESENT
+
+
+pcie_link_event
+---------------
+
+Monitors PCIe link speed changes and provides detailed link status information.
+::
+
+ pcie_link_event "%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n"
+
+**Parameters**:
+
+* ``type`` - PCIe device type (4=Root Port, etc.)
+* ``reason`` - Reason for link change:
+
+ - ``0`` - Link retrain
+ - ``1`` - Bus enumeration
+ - ``2`` - Bandwidth controller enable
+ - ``3`` - Bandwidth controller IRQ
+ - ``4`` - Hotplug event
+
+
+**Example Usage**::
+
+ # Enable the tracepoint
+ echo1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
+
+ # Monitor link events
+ cat /sys/kernel/debug/tracing/trace_pipe
>
>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
>
> Nit: I assume this came from the patch I had applied to pci/trace, but
> you shouldn't include any sign-offs from people to whom you send
> patches [2].
Yep, I copied the commit log from your applied patch in pci/trace. I
will drop your sign-offs.
>
> Bjorn
>
> [1] https://lore.kernel.org/all/20250717192950.GA2594528@bhelgaas/#t
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?id=v6.13#n449
Thanks for the guidance!
Best regards,
Shuai
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-25 21:09 ` Bjorn Helgaas
2025-07-26 7:51 ` Lukas Wunner
@ 2025-07-28 9:17 ` Shuai Xue
1 sibling, 0 replies; 20+ messages in thread
From: Shuai Xue @ 2025-07-28 9:17 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, ilpo.jarvinen, mattc, Jonathan.Cameron,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
在 2025/7/26 05:09, Bjorn Helgaas 写道:
> On Wed, Jul 23, 2025 at 11:31:08AM +0800, Shuai Xue wrote:
>> PCIe link speed degradation directly impacts system performance and
>> often indicates hardware issues such as faulty devices, physical layer
>> problems, or configuration errors.
>>
>> To this end, add a RAS tracepoint to monitor link speed changes,
>> enabling proactive health checks and diagnostic analysis.
>>
>> The output is like below:
>>
>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_link_event/enable
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> cat /sys/kernel/debug/tracing/trace_pipe
>> <...>-119 [002] ..... 125.776171: pci_hp_event: 0000:00:03.0 slot:30, event:CARD_PRESENT
>>
>> <...>-119 [002] ..... 125.776197: pci_hp_event: 0000:00:03.0 slot:30, event:LINK_UP
>>
>> irq/57-pciehp-119 [002] ..... 125.904335: pcie_link_event: 0000:00:03.0 type:4, reason:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
>>
>> irq/57-pciehp-119 [002] ..... 125.907051: pcie_link_event: 0000:00:03.0 type:4, reason:0, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
>
> I guess this example would actually require both of these enables, right?
>
> echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> echo 1 > /sys/kernel/debug/tracing/events/pci/pci_link_event/enable
Yes, you're absolutely right. I'll correct the commit log to show both
commands.
(As a side note, echo 1 > /sys/kernel/debug/tracing/events/pci/enable
would enable both events at once.)
(echo 1 > /sys/kernel/debug/tracing/events/pci/enable will enable both
of these event.)
>
>> Suggested-by: Bjorn Helgaas <helgaas@kernel.org>
>
> I don't think I've suggested anything that really warrants this ;)
>
Fair enough! I'll drop the Suggested-by tag.
>> ...
>> @@ -292,7 +292,7 @@ int pciehp_check_link_status(struct controller *ctrl)
>> {
>> struct pci_dev *pdev = ctrl_dev(ctrl);
>> bool found;
>> - u16 lnk_status, linksta2;
>> + u16 lnk_status;
>>
>> if (!pcie_wait_for_link(pdev, true)) {
>> ctrl_info(ctrl, "Slot(%s): No link\n", slot_name(ctrl));
>> @@ -319,8 +319,7 @@ int pciehp_check_link_status(struct controller *ctrl)
>> return -1;
>> }
>>
>> - pcie_capability_read_word(pdev, PCI_EXP_LNKSTA2, &linksta2);
>> - __pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status, linksta2);
>> + pcie_update_link_speed(ctrl->pcie->port->subordinate, PCIE_HOTPLUG);
>
> It kind of bugs me that the hot-add flow reads LNKSTA three times and
> generates both pci_hp_event LINK_UP and link_event tracepoints:
>
> pciehp_handle_presence_or_link_change
> link_active = pciehp_check_link_active()
> pcie_capability_read_word(PCI_EXP_LNKSTA)
> if (link_active)
> ctrl_info(ctrl, "Slot(%s): Link Up\n")
> trace_pci_hp_event(PCI_HOTPLUG_LINK_UP)
> pciehp_enable_slot
> __pciehp_enable_slot
> board_added
> pciehp_check_link_status
> pcie_capability_read_word(PCI_EXP_LNKSTA)
> pcie_update_link_speed
> pcie_capability_read_word(PCI_EXP_LNKSTA)
> pcie_capability_read_word(PCI_EXP_LNKSTA2)
> trace_pcie_link_event(<REASON>)
>
> Maybe there are good reasons for reading LNKSTA three times, but it
> does make me raise my eyebrows. Not that this is a performance path,
> but it just offends my sense of propriety.
>
> And maybe we need both a bare LINK_UP event and a link_event with all
> the details, but again it seems a little weird to me that there are
> two tracepoints when there's really only one event and we know all the
> link_event information from the very first LNKSTA read.
>
> Bjorn
I understand your concern about the multiple LNKSTA reads and the
apparent duplication. Please see comments from Lukas.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-07-26 7:51 ` Lukas Wunner
@ 2025-07-28 9:28 ` Shuai Xue
0 siblings, 0 replies; 20+ messages in thread
From: Shuai Xue @ 2025-07-28 9:28 UTC (permalink / raw)
To: Lukas Wunner, Bjorn Helgaas
Cc: rostedt, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas, tony.luck, bp,
mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
在 2025/7/26 15:51, Lukas Wunner 写道:
> On Fri, Jul 25, 2025 at 04:09:21PM -0500, Bjorn Helgaas wrote:
>>> @@ -319,8 +319,7 @@ int pciehp_check_link_status(struct controller *ctrl)
>>> return -1;
>>> }
>>>
>>> - pcie_capability_read_word(pdev, PCI_EXP_LNKSTA2, &linksta2);
>>> - __pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status, linksta2);
>>> + pcie_update_link_speed(ctrl->pcie->port->subordinate, PCIE_HOTPLUG);
>>
>> It kind of bugs me that the hot-add flow reads LNKSTA three times and
>> generates both pci_hp_event LINK_UP and link_event tracepoints:
>>
>> pciehp_handle_presence_or_link_change
>> link_active = pciehp_check_link_active()
>> pcie_capability_read_word(PCI_EXP_LNKSTA)
>> if (link_active)
>> ctrl_info(ctrl, "Slot(%s): Link Up\n")
>
> This LNKSTA read decides whether to bring up the slot.
> It can't be eliminated.
>
>> trace_pci_hp_event(PCI_HOTPLUG_LINK_UP)
>> pciehp_enable_slot
>> __pciehp_enable_slot
>> board_added
>> pciehp_check_link_status
>> pcie_capability_read_word(PCI_EXP_LNKSTA)
>
> This is sort of a final check whether the link is (still) active
> before commencing device enumeration. Doesn't look like it can
> safely be eliminated either.
>
>> pcie_update_link_speed
>> pcie_capability_read_word(PCI_EXP_LNKSTA)
>> pcie_capability_read_word(PCI_EXP_LNKSTA2)
>> trace_pcie_link_event(<REASON>)
>
> This third register read is introduced by the present patch and is
> indeed somewhat a step back, given that pciehp_check_link_status()
> currently deliberately calls __pcie_update_link_speed() to pass
> the already read LNKSTA value.
Hi Lukas, and Bjorn:
Thanks for the excellent technical analysis!
You're absolutely right. I introduced an unnecessary regression by
adding a third LNKSTA read when pciehp_check_link_status() already has
the LNKSTA value and was specifically designed to pass it to
__pcie_update_link_speed().
>
> I'm wondering if the tracepoint can be moved down to
> __pcie_update_link_speed()?
Yes, that's a much better approach. Will fix it in next version.
>
>> And maybe we need both a bare LINK_UP event and a link_event with all
>> the details, but again it seems a little weird to me that there are
>> two tracepoints when there's really only one event and we know all the
>> link_event information from the very first LNKSTA read.
>
> One of the reasons is that a "Link Down" event would have to
> contain dummy values for link speed etc, so it seemed cleaner
> to separate the hotplug event from the link speed event.
>
> Thanks,
>
> Lukas
I agree with Lukas and I completely agree with this separation. The two
tracepoints serve different purposes:
- pci_hp_event: Pure hotplug state changes (LINK_UP/LINK_DOWN,
CARD_PRESENT/CARD_NOT_PRESENT)
- pcie_link_event: Actual link parameter information when meaningful
values exist
For LINK_DOWN events, we don't have meaningful speed/width values, so
forcing them into a single tracepoint would indeed require dummy/invalid
values, making the interface confusing.
Thanks for the clear technical guidance and for catching my regression!
Best regards,
Shuai
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-07-28 9:28 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-23 3:31 [PATCH v9 0/2] add PCI hotplug and PCIe link tracepoint Shuai Xue
2025-07-23 3:31 ` [PATCH v9 1/2] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
2025-07-25 21:09 ` Bjorn Helgaas
2025-07-28 9:05 ` Shuai Xue
2025-07-23 3:31 ` [PATCH v9 2/2] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-07-23 14:05 ` Steven Rostedt
2025-07-25 2:11 ` Shuai Xue
2025-07-25 2:25 ` Steven Rostedt
2025-07-25 2:59 ` Shuai Xue
2025-07-25 3:06 ` Steven Rostedt
2025-07-25 3:15 ` Shuai Xue
2025-07-23 19:30 ` kernel test robot
2025-07-24 14:06 ` Steven Rostedt
2025-07-25 2:31 ` Shuai Xue
2025-07-25 2:57 ` Steven Rostedt
2025-07-25 3:18 ` Shuai Xue
2025-07-25 21:09 ` Bjorn Helgaas
2025-07-26 7:51 ` Lukas Wunner
2025-07-28 9:28 ` Shuai Xue
2025-07-28 9:17 ` Shuai Xue
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).