* [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-14 12:31 [PATCH v12 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
@ 2025-10-14 12:31 ` Shuai Xue
2025-10-14 15:40 ` Steven Rostedt
2025-10-14 12:31 ` [PATCH v12 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-10-14 12:31 ` [PATCH v12 3/3] Documentation: tracing: Add documentation about PCI tracepoints Shuai Xue
2 siblings, 1 reply; 14+ messages in thread
From: Shuai Xue @ 2025-10-14 12:31 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
Hotplug events are critical indicators for analyzing hardware health,
and surprise link downs can significantly impact system performance and
reliability.
Define a new TRACE_SYSTEM named "pci" and add a generic RAS tracepoint
for hotplug events to help health checks. Add enum pci_hotplug_event in
include/uapi/linux/pci.h so applications like rasdaemon can register
tracepoint event handlers for it.
The following output is generated when a device is hotplugged:
$ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
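A minimal userspace sketch of how a monitoring tool might pick the fields out of one such text line. This is only an illustration: `struct pci_hp_record` and `parse_pci_hp_event()` are hypothetical names, and it assumes the `port slot:NAME, event:NAME` layout shown in the output above. A tool such as rasdaemon would more likely consume the binary event format than parse text.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for this sketch; not part of the patch. */
struct pci_hp_record {
	char port[32];
	char slot[32];
	char event[32];
};

/*
 * Parse one trace_pipe line. Returns 0 on success, or -1 if the line
 * does not contain a complete pci_hp_event record.
 */
static int parse_pci_hp_event(const char *line, struct pci_hp_record *rec)
{
	static const char tag[] = "pci_hp_event: ";
	const char *p = strstr(line, tag);

	if (!p)
		return -1;
	p += strlen(tag);

	/* "%31[^,]" reads the slot name up to the comma that follows it. */
	if (sscanf(p, "%31s slot:%31[^,], event:%31s",
		   rec->port, rec->slot, rec->event) != 3)
		return -1;
	return 0;
}
```

Feeding each line read from trace_pipe to `parse_pci_hp_event()` would yield the port name, slot name and event string as separate fields.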
Suggested-by: Lukas Wunner <lukas@wunner.de>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
drivers/pci/Makefile | 3 ++
drivers/pci/hotplug/pciehp_ctrl.c | 31 ++++++++++++---
drivers/pci/trace.c | 11 ++++++
include/trace/events/pci.h | 63 +++++++++++++++++++++++++++++++
include/uapi/linux/pci.h | 7 ++++
5 files changed, 109 insertions(+), 6 deletions(-)
create mode 100644 drivers/pci/trace.c
create mode 100644 include/trace/events/pci.h
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 67647f1880fb..58a4e4ea76b0 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -45,3 +45,6 @@ obj-y += controller/
obj-y += switch/
subdir-ccflags-$(CONFIG_PCI_DEBUG) := -DDEBUG
+
+CFLAGS_trace.o := -I$(src)
+obj-$(CONFIG_TRACING) += trace.o
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index bcc938d4420f..7805f697a02c 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -19,6 +19,7 @@
#include <linux/types.h>
#include <linux/pm_runtime.h>
#include <linux/pci.h>
+#include <trace/events/pci.h>
#include "../pci.h"
#include "pciehp.h"
@@ -244,12 +245,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
case ON_STATE:
ctrl->state = POWEROFF_STATE;
mutex_unlock(&ctrl->state_lock);
- if (events & PCI_EXP_SLTSTA_DLLSC)
+ if (events & PCI_EXP_SLTSTA_DLLSC) {
ctrl_info(ctrl, "Slot(%s): Link Down\n",
slot_name(ctrl));
- if (events & PCI_EXP_SLTSTA_PDC)
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_LINK_DOWN);
+ }
+ if (events & PCI_EXP_SLTSTA_PDC) {
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_NOT_PRESENT);
+ }
pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
break;
default:
@@ -269,6 +278,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
INDICATOR_NOOP);
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_NOT_PRESENT);
}
mutex_unlock(&ctrl->state_lock);
return;
@@ -281,12 +293,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
case OFF_STATE:
ctrl->state = POWERON_STATE;
mutex_unlock(&ctrl->state_lock);
- if (present)
+ if (present) {
ctrl_info(ctrl, "Slot(%s): Card present\n",
slot_name(ctrl));
- if (link_active)
- ctrl_info(ctrl, "Slot(%s): Link Up\n",
- slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_PRESENT);
+ }
+ if (link_active) {
+ ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_LINK_UP);
+ }
ctrl->request_result = pciehp_enable_slot(ctrl);
break;
default:
diff --git a/drivers/pci/trace.c b/drivers/pci/trace.c
new file mode 100644
index 000000000000..cf11abca8602
--- /dev/null
+++ b/drivers/pci/trace.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Tracepoints for PCI system
+ *
+ * Copyright (C) 2025 Alibaba Corporation
+ */
+
+#include <linux/pci.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/pci.h>
diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
new file mode 100644
index 000000000000..208609492c06
--- /dev/null
+++ b/include/trace/events/pci.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pci
+
+#if !defined(_TRACE_HW_EVENT_PCI_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_PCI_H
+
+#include <linux/tracepoint.h>
+
+#define PCI_HOTPLUG_EVENT \
+ EM(PCI_HOTPLUG_LINK_UP, "LINK_UP") \
+ EM(PCI_HOTPLUG_LINK_DOWN, "LINK_DOWN") \
+ EM(PCI_HOTPLUG_CARD_PRESENT, "CARD_PRESENT") \
+ EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "CARD_NOT_PRESENT")
+
+/* Enums require being exported to userspace, for user tool parsing */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b) TRACE_DEFINE_ENUM(a);
+
+PCI_HOTPLUG_EVENT
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) {a, b},
+#define EMe(a, b) {a, b}
+
+TRACE_EVENT(pci_hp_event,
+
+ TP_PROTO(const char *port_name,
+ const char *slot,
+ const int event),
+
+ TP_ARGS(port_name, slot, event),
+
+ TP_STRUCT__entry(
+ __string( port_name, port_name )
+ __string( slot, slot )
+ __field( int, event )
+ ),
+
+ TP_fast_assign(
+ __assign_str(port_name);
+ __assign_str(slot);
+ __entry->event = event;
+ ),
+
+ TP_printk("%s slot:%s, event:%s\n",
+ __get_str(port_name),
+ __get_str(slot),
+ __print_symbolic(__entry->event, PCI_HOTPLUG_EVENT)
+ )
+);
+
+#endif /* _TRACE_HW_EVENT_PCI_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
index a769eefc5139..4f150028965d 100644
--- a/include/uapi/linux/pci.h
+++ b/include/uapi/linux/pci.h
@@ -39,4 +39,11 @@
#define PCIIOC_MMAP_IS_MEM (PCIIOC_BASE | 0x02) /* Set mmap state to MEM space. */
#define PCIIOC_WRITE_COMBINE (PCIIOC_BASE | 0x03) /* Enable/disable write-combining. */
+enum pci_hotplug_event {
+ PCI_HOTPLUG_LINK_UP,
+ PCI_HOTPLUG_LINK_DOWN,
+ PCI_HOTPLUG_CARD_PRESENT,
+ PCI_HOTPLUG_CARD_NOT_PRESENT,
+};
+
#endif /* _UAPILINUX_PCI_H */
--
2.39.3
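The EM()/EMe() list in include/trace/events/pci.h is an X-macro: the same list is expanded more than once with different macro definitions. The userspace sketch below shows only the second expansion, into a value-to-string table similar in spirit to what __print_symbolic() consumes. `struct demo_symbol`, `demo_hp_symbols` and `demo_hp_event_name()` are hypothetical names for illustration, and the "UNKNOWN" fallback is this sketch's own choice, not the kernel's behaviour.

```c
#include <stddef.h>

/* Mirrors enum pci_hotplug_event as added to include/uapi/linux/pci.h. */
enum pci_hotplug_event {
	PCI_HOTPLUG_LINK_UP,
	PCI_HOTPLUG_LINK_DOWN,
	PCI_HOTPLUG_CARD_PRESENT,
	PCI_HOTPLUG_CARD_NOT_PRESENT,
};

/* Same list shape as the patch: EM() for each entry, EMe() for the last. */
#define PCI_HOTPLUG_EVENT						\
	EM(PCI_HOTPLUG_LINK_UP,		"LINK_UP")			\
	EM(PCI_HOTPLUG_LINK_DOWN,	"LINK_DOWN")			\
	EM(PCI_HOTPLUG_CARD_PRESENT,	"CARD_PRESENT")			\
	EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "CARD_NOT_PRESENT")

/* Hypothetical table entry type, used only in this sketch. */
struct demo_symbol {
	int value;
	const char *name;
};

/* Expand the list into { value, "string" } initializers. */
#define EM(a, b)	{ a, b },
#define EMe(a, b)	{ a, b }
static const struct demo_symbol demo_hp_symbols[] = { PCI_HOTPLUG_EVENT };
#undef EM
#undef EMe

/* Hypothetical lookup helper; "UNKNOWN" is this sketch's fallback. */
static const char *demo_hp_event_name(int event)
{
	size_t i;

	for (i = 0; i < sizeof(demo_hp_symbols) / sizeof(demo_hp_symbols[0]); i++)
		if (demo_hp_symbols[i].value == event)
			return demo_hp_symbols[i].name;
	return "UNKNOWN";
}
```

Keeping the enum names and their strings in one list means a new event only has to be added in one place for both expansions to pick it up.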
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-14 12:31 ` [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
@ 2025-10-14 15:40 ` Steven Rostedt
2025-10-15 2:35 ` Shuai Xue
2025-10-15 6:29 ` Shuai Xue
0 siblings, 2 replies; 14+ messages in thread
From: Steven Rostedt @ 2025-10-14 15:40 UTC (permalink / raw)
To: Shuai Xue
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On Tue, 14 Oct 2025 20:31:57 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> Hotplug events are critical indicators for analyzing hardware health,
> and surprise link downs can significantly impact system performance and
> reliability.
>
> Define a new TRACING_SYSTEM named "pci", add a generic RAS tracepoint
> for hotplug event to help health checks. Add enum pci_hotplug_event in
> include/uapi/linux/pci.h so applications like rasdaemon can register
> tracepoint event handlers for it.
>
> The following output is generated when a device is hotplugged:
>
> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> $ cat /sys/kernel/debug/tracing/trace_pipe
> irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
>
> irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
>
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/pci/Makefile | 3 ++
> drivers/pci/hotplug/pciehp_ctrl.c | 31 ++++++++++++---
> drivers/pci/trace.c | 11 ++++++
> include/trace/events/pci.h | 63 +++++++++++++++++++++++++++++++
> include/uapi/linux/pci.h | 7 ++++
> 5 files changed, 109 insertions(+), 6 deletions(-)
> create mode 100644 drivers/pci/trace.c
> create mode 100644 include/trace/events/pci.h
>
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 67647f1880fb..58a4e4ea76b0 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -45,3 +45,6 @@ obj-y += controller/
> obj-y += switch/
>
> subdir-ccflags-$(CONFIG_PCI_DEBUG) := -DDEBUG
> +
> +CFLAGS_trace.o := -I$(src)
> +obj-$(CONFIG_TRACING) += trace.o
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index bcc938d4420f..7805f697a02c 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -19,6 +19,7 @@
> #include <linux/types.h>
> #include <linux/pm_runtime.h>
> #include <linux/pci.h>
> +#include <trace/events/pci.h>
>
> #include "../pci.h"
> #include "pciehp.h"
> @@ -244,12 +245,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> case ON_STATE:
> ctrl->state = POWEROFF_STATE;
> mutex_unlock(&ctrl->state_lock);
> - if (events & PCI_EXP_SLTSTA_DLLSC)
> + if (events & PCI_EXP_SLTSTA_DLLSC) {
> ctrl_info(ctrl, "Slot(%s): Link Down\n",
> slot_name(ctrl));
> - if (events & PCI_EXP_SLTSTA_PDC)
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_LINK_DOWN);
I know this is v12 and I don't remember if I suggested this before and you
gave me a reason already, but why not simply pass in "ctrl" and have the
TRACE_EVENT() denote the names?
> + }
> + if (events & PCI_EXP_SLTSTA_PDC) {
> ctrl_info(ctrl, "Slot(%s): Card not present\n",
> slot_name(ctrl));
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_CARD_NOT_PRESENT);
> + }
> pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
> break;
> default:
> @@ -269,6 +278,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> INDICATOR_NOOP);
> ctrl_info(ctrl, "Slot(%s): Card not present\n",
> slot_name(ctrl));
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_CARD_NOT_PRESENT);
> }
> mutex_unlock(&ctrl->state_lock);
> return;
> @@ -281,12 +293,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> case OFF_STATE:
> ctrl->state = POWERON_STATE;
> mutex_unlock(&ctrl->state_lock);
> - if (present)
> + if (present) {
> ctrl_info(ctrl, "Slot(%s): Card present\n",
> slot_name(ctrl));
> - if (link_active)
> - ctrl_info(ctrl, "Slot(%s): Link Up\n",
> - slot_name(ctrl));
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_CARD_PRESENT);
> + }
> + if (link_active) {
> + ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_LINK_UP);
> + }
> ctrl->request_result = pciehp_enable_slot(ctrl);
> break;
> default:
> diff --git a/drivers/pci/trace.c b/drivers/pci/trace.c
> new file mode 100644
> index 000000000000..cf11abca8602
> --- /dev/null
> +++ b/drivers/pci/trace.c
> @@ -0,0 +1,11 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Tracepoints for PCI system
> + *
> + * Copyright (C) 2025 Alibaba Corporation
> + */
> +
> +#include <linux/pci.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/pci.h>
> diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
> new file mode 100644
> index 000000000000..208609492c06
> --- /dev/null
> +++ b/include/trace/events/pci.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM pci
> +
> +#if !defined(_TRACE_HW_EVENT_PCI_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HW_EVENT_PCI_H
> +
> +#include <linux/tracepoint.h>
> +
> +#define PCI_HOTPLUG_EVENT \
> + EM(PCI_HOTPLUG_LINK_UP, "LINK_UP") \
> + EM(PCI_HOTPLUG_LINK_DOWN, "LINK_DOWN") \
> + EM(PCI_HOTPLUG_CARD_PRESENT, "CARD_PRESENT") \
> + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "CARD_NOT_PRESENT")
> +
> +/* Enums require being exported to userspace, for user tool parsing */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
> +
> +PCI_HOTPLUG_EVENT
> +
> +/*
> + * Now redefine the EM() and EMe() macros to map the enums to the strings
> + * that will be printed in the output.
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) {a, b},
> +#define EMe(a, b) {a, b}
> +
> +TRACE_EVENT(pci_hp_event,
> +
> + TP_PROTO(const char *port_name,
> + const char *slot,
> + const int event),
> +
> + TP_ARGS(port_name, slot, event),
> +
> + TP_STRUCT__entry(
> + __string( port_name, port_name )
> + __string( slot, slot )
> + __field( int, event )
> + ),
	TP_PROTO(struct controller *ctrl, int event),

	TP_ARGS(ctrl, event),

	TP_STRUCT__entry(
		__string(	port_name,	pci_name(ctrl->pcie->port)	)
		__string(	slot,		slot_name(ctrl)			)
		__field(	int,		event				)
	),

It would move the work out of the calling path.
-- Steve
> +
> + TP_fast_assign(
> + __assign_str(port_name);
> + __assign_str(slot);
> + __entry->event = event;
> + ),
> +
> + TP_printk("%s slot:%s, event:%s\n",
> + __get_str(port_name),
> + __get_str(slot),
> + __print_symbolic(__entry->event, PCI_HOTPLUG_EVENT)
> + )
> +);
> +
> +#endif /* _TRACE_HW_EVENT_PCI_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
> index a769eefc5139..4f150028965d 100644
> --- a/include/uapi/linux/pci.h
> +++ b/include/uapi/linux/pci.h
> @@ -39,4 +39,11 @@
> #define PCIIOC_MMAP_IS_MEM (PCIIOC_BASE | 0x02) /* Set mmap state to MEM space. */
> #define PCIIOC_WRITE_COMBINE (PCIIOC_BASE | 0x03) /* Enable/disable write-combining. */
>
> +enum pci_hotplug_event {
> + PCI_HOTPLUG_LINK_UP,
> + PCI_HOTPLUG_LINK_DOWN,
> + PCI_HOTPLUG_CARD_PRESENT,
> + PCI_HOTPLUG_CARD_NOT_PRESENT,
> +};
> +
> #endif /* _UAPILINUX_PCI_H */
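The point about moving work out of the calling path can be modelled in userspace as follows. This is a rough sketch under stated assumptions, not the kernel's tracepoint machinery: `struct demo_ctrl`, `trace_enabled`, `name_lookups` and the `demo_*`/`trace_by_*` helpers are all hypothetical, and whether the real compiler already defers the inline name lookups behind the tracepoint's enabled check is not something this model can show.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a driver controller; not the kernel struct. */
struct demo_ctrl {
	const char *port;
	const char *slot;
};

static bool trace_enabled;	/* models the tracepoint being on or off */
static int name_lookups;	/* counts how often a name was resolved */
static char last_record[96];	/* last record that was "traced" */

static const char *demo_port_name(const struct demo_ctrl *c)
{
	name_lookups++;
	return c->port;
}

static const char *demo_slot_name(const struct demo_ctrl *c)
{
	name_lookups++;
	return c->slot;
}

/* v12 style: the caller resolves both names before the enabled check. */
static void trace_by_strings(const char *port, const char *slot,
			     const char *event)
{
	if (!trace_enabled)
		return;
	snprintf(last_record, sizeof(last_record), "%s slot:%s, event:%s",
		 port, slot, event);
}

/* Suggested style: names are resolved only when a record is written. */
static void trace_by_ctrl(const struct demo_ctrl *c, const char *event)
{
	if (!trace_enabled)
		return;
	snprintf(last_record, sizeof(last_record), "%s slot:%s, event:%s",
		 demo_port_name(c), demo_slot_name(c), event);
}
```

In this model, calling `trace_by_strings(demo_port_name(&c), demo_slot_name(&c), ...)` with tracing disabled still performs two lookups, while `trace_by_ctrl(&c, ...)` performs none until tracing is enabled.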
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-14 15:40 ` Steven Rostedt
@ 2025-10-15 2:35 ` Shuai Xue
2025-10-15 6:29 ` Shuai Xue
1 sibling, 0 replies; 14+ messages in thread
From: Shuai Xue @ 2025-10-15 2:35 UTC (permalink / raw)
To: Steven Rostedt
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On 2025/10/14 23:40, Steven Rostedt wrote:
> On Tue, 14 Oct 2025 20:31:57 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>> Hotplug events are critical indicators for analyzing hardware health,
>> and surprise link downs can significantly impact system performance and
>> reliability.
>>
>> Define a new TRACING_SYSTEM named "pci", add a generic RAS tracepoint
>> for hotplug event to help health checks. Add enum pci_hotplug_event in
>> include/uapi/linux/pci.h so applications like rasdaemon can register
>> tracepoint event handlers for it.
>>
>> The following output is generated when a device is hotplugged:
>>
>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
>>
>> irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
>>
>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> Reviewed-by: Lukas Wunner <lukas@wunner.de>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/pci/Makefile | 3 ++
>> drivers/pci/hotplug/pciehp_ctrl.c | 31 ++++++++++++---
>> drivers/pci/trace.c | 11 ++++++
>> include/trace/events/pci.h | 63 +++++++++++++++++++++++++++++++
>> include/uapi/linux/pci.h | 7 ++++
>> 5 files changed, 109 insertions(+), 6 deletions(-)
>> create mode 100644 drivers/pci/trace.c
>> create mode 100644 include/trace/events/pci.h
>>
>> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
>> index 67647f1880fb..58a4e4ea76b0 100644
>> --- a/drivers/pci/Makefile
>> +++ b/drivers/pci/Makefile
>> @@ -45,3 +45,6 @@ obj-y += controller/
>> obj-y += switch/
>>
>> subdir-ccflags-$(CONFIG_PCI_DEBUG) := -DDEBUG
>> +
>> +CFLAGS_trace.o := -I$(src)
>> +obj-$(CONFIG_TRACING) += trace.o
>> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
>> index bcc938d4420f..7805f697a02c 100644
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -19,6 +19,7 @@
>> #include <linux/types.h>
>> #include <linux/pm_runtime.h>
>> #include <linux/pci.h>
>> +#include <trace/events/pci.h>
>>
>> #include "../pci.h"
>> #include "pciehp.h"
>> @@ -244,12 +245,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> case ON_STATE:
>> ctrl->state = POWEROFF_STATE;
>> mutex_unlock(&ctrl->state_lock);
>> - if (events & PCI_EXP_SLTSTA_DLLSC)
>> + if (events & PCI_EXP_SLTSTA_DLLSC) {
>> ctrl_info(ctrl, "Slot(%s): Link Down\n",
>> slot_name(ctrl));
>> - if (events & PCI_EXP_SLTSTA_PDC)
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_LINK_DOWN);
>
> I know this is v12 and I don't remember if I suggested this before and you
> gave me a reason already,
Aha, in our previous email discussions about the tracepoint
implementation you provided many constructive suggestions, so I added
the Suggested-by tag. Perhaps I misunderstood the meaning of
Suggested-by; I will drop it.
> but why not simply pass in "ctrl" and have the
> TRACE_EVENT() denote the names?
Sure, I will send a new version.
>> + }
>> + if (events & PCI_EXP_SLTSTA_PDC) {
>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>> slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_NOT_PRESENT);
>> + }
>> pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>> break;
>> default:
>> @@ -269,6 +278,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> INDICATOR_NOOP);
>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>> slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_NOT_PRESENT);
>> }
>> mutex_unlock(&ctrl->state_lock);
>> return;
>> @@ -281,12 +293,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> case OFF_STATE:
>> ctrl->state = POWERON_STATE;
>> mutex_unlock(&ctrl->state_lock);
>> - if (present)
>> + if (present) {
>> ctrl_info(ctrl, "Slot(%s): Card present\n",
>> slot_name(ctrl));
>> - if (link_active)
>> - ctrl_info(ctrl, "Slot(%s): Link Up\n",
>> - slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_PRESENT);
>> + }
>> + if (link_active) {
>> + ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_LINK_UP);
>> + }
>> ctrl->request_result = pciehp_enable_slot(ctrl);
>> break;
>> default:
>> diff --git a/drivers/pci/trace.c b/drivers/pci/trace.c
>> new file mode 100644
>> index 000000000000..cf11abca8602
>> --- /dev/null
>> +++ b/drivers/pci/trace.c
>> @@ -0,0 +1,11 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Tracepoints for PCI system
>> + *
>> + * Copyright (C) 2025 Alibaba Corporation
>> + */
>> +
>> +#include <linux/pci.h>
>> +
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/pci.h>
>> diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
>> new file mode 100644
>> index 000000000000..208609492c06
>> --- /dev/null
>> +++ b/include/trace/events/pci.h
>> @@ -0,0 +1,63 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM pci
>> +
>> +#if !defined(_TRACE_HW_EVENT_PCI_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_HW_EVENT_PCI_H
>> +
>> +#include <linux/tracepoint.h>
>> +
>> +#define PCI_HOTPLUG_EVENT \
>> + EM(PCI_HOTPLUG_LINK_UP, "LINK_UP") \
>> + EM(PCI_HOTPLUG_LINK_DOWN, "LINK_DOWN") \
>> + EM(PCI_HOTPLUG_CARD_PRESENT, "CARD_PRESENT") \
>> + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "CARD_NOT_PRESENT")
>> +
>> +/* Enums require being exported to userspace, for user tool parsing */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>> +
>> +PCI_HOTPLUG_EVENT
>> +
>> +/*
>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>> + * that will be printed in the output.
>> + */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) {a, b},
>> +#define EMe(a, b) {a, b}
>> +
>> +TRACE_EVENT(pci_hp_event,
>> +
>> + TP_PROTO(const char *port_name,
>> + const char *slot,
>> + const int event),
>> +
>> + TP_ARGS(port_name, slot, event),
>> +
>> + TP_STRUCT__entry(
>> + __string( port_name, port_name )
>> + __string( slot, slot )
>> + __field( int, event )
>> + ),
>
> TP_PROTO(struct controller *ctrl, int event),
>
> TP_ARGS(ctrl, event),
>
> TP_STRUCT__entry(
> __string( port_name, pci_name(ctrl->pcie->port) )
> __string( slot, slot_name(ctrl) )
> __field( int, event )
> ),
>
> It would move the work out of the calling path.
>
I see.
> -- Steve
Thanks.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-14 15:40 ` Steven Rostedt
2025-10-15 2:35 ` Shuai Xue
@ 2025-10-15 6:29 ` Shuai Xue
2025-10-15 14:37 ` Steven Rostedt
1 sibling, 1 reply; 14+ messages in thread
From: Shuai Xue @ 2025-10-15 6:29 UTC (permalink / raw)
To: Steven Rostedt
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On 2025/10/14 23:40, Steven Rostedt wrote:
> On Tue, 14 Oct 2025 20:31:57 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>> Hotplug events are critical indicators for analyzing hardware health,
>> and surprise link downs can significantly impact system performance and
>> reliability.
>>
>> Define a new TRACING_SYSTEM named "pci", add a generic RAS tracepoint
>> for hotplug event to help health checks. Add enum pci_hotplug_event in
>> include/uapi/linux/pci.h so applications like rasdaemon can register
>> tracepoint event handlers for it.
>>
>> The following output is generated when a device is hotplugged:
>>
>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
>>
>> irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
>>
>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> Reviewed-by: Lukas Wunner <lukas@wunner.de>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> drivers/pci/Makefile | 3 ++
>> drivers/pci/hotplug/pciehp_ctrl.c | 31 ++++++++++++---
>> drivers/pci/trace.c | 11 ++++++
>> include/trace/events/pci.h | 63 +++++++++++++++++++++++++++++++
>> include/uapi/linux/pci.h | 7 ++++
>> 5 files changed, 109 insertions(+), 6 deletions(-)
>> create mode 100644 drivers/pci/trace.c
>> create mode 100644 include/trace/events/pci.h
>>
>> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
>> index 67647f1880fb..58a4e4ea76b0 100644
>> --- a/drivers/pci/Makefile
>> +++ b/drivers/pci/Makefile
>> @@ -45,3 +45,6 @@ obj-y += controller/
>> obj-y += switch/
>>
>> subdir-ccflags-$(CONFIG_PCI_DEBUG) := -DDEBUG
>> +
>> +CFLAGS_trace.o := -I$(src)
>> +obj-$(CONFIG_TRACING) += trace.o
>> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
>> index bcc938d4420f..7805f697a02c 100644
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -19,6 +19,7 @@
>> #include <linux/types.h>
>> #include <linux/pm_runtime.h>
>> #include <linux/pci.h>
>> +#include <trace/events/pci.h>
>>
>> #include "../pci.h"
>> #include "pciehp.h"
>> @@ -244,12 +245,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> case ON_STATE:
>> ctrl->state = POWEROFF_STATE;
>> mutex_unlock(&ctrl->state_lock);
>> - if (events & PCI_EXP_SLTSTA_DLLSC)
>> + if (events & PCI_EXP_SLTSTA_DLLSC) {
>> ctrl_info(ctrl, "Slot(%s): Link Down\n",
>> slot_name(ctrl));
>> - if (events & PCI_EXP_SLTSTA_PDC)
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_LINK_DOWN);
>
> I know this is v12 and I don't remember if I suggested this before and you
> gave me a reason already, but why not simply pass in "ctrl" and have the
> TRACE_EVENT() denote the names?
>
>> + }
>> + if (events & PCI_EXP_SLTSTA_PDC) {
>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>> slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_NOT_PRESENT);
>> + }
>> pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>> break;
>> default:
>> @@ -269,6 +278,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> INDICATOR_NOOP);
>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>> slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_NOT_PRESENT);
>> }
>> mutex_unlock(&ctrl->state_lock);
>> return;
>> @@ -281,12 +293,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> case OFF_STATE:
>> ctrl->state = POWERON_STATE;
>> mutex_unlock(&ctrl->state_lock);
>> - if (present)
>> + if (present) {
>> ctrl_info(ctrl, "Slot(%s): Card present\n",
>> slot_name(ctrl));
>> - if (link_active)
>> - ctrl_info(ctrl, "Slot(%s): Link Up\n",
>> - slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_PRESENT);
>> + }
>> + if (link_active) {
>> + ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_LINK_UP);
>> + }
>> ctrl->request_result = pciehp_enable_slot(ctrl);
>> break;
>> default:
>> diff --git a/drivers/pci/trace.c b/drivers/pci/trace.c
>> new file mode 100644
>> index 000000000000..cf11abca8602
>> --- /dev/null
>> +++ b/drivers/pci/trace.c
>> @@ -0,0 +1,11 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Tracepoints for PCI system
>> + *
>> + * Copyright (C) 2025 Alibaba Corporation
>> + */
>> +
>> +#include <linux/pci.h>
>> +
>> +#define CREATE_TRACE_POINTS
>> +#include <trace/events/pci.h>
>> diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
>> new file mode 100644
>> index 000000000000..208609492c06
>> --- /dev/null
>> +++ b/include/trace/events/pci.h
>> @@ -0,0 +1,63 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM pci
>> +
>> +#if !defined(_TRACE_HW_EVENT_PCI_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_HW_EVENT_PCI_H
>> +
>> +#include <linux/tracepoint.h>
>> +
>> +#define PCI_HOTPLUG_EVENT \
>> + EM(PCI_HOTPLUG_LINK_UP, "LINK_UP") \
>> + EM(PCI_HOTPLUG_LINK_DOWN, "LINK_DOWN") \
>> + EM(PCI_HOTPLUG_CARD_PRESENT, "CARD_PRESENT") \
>> + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "CARD_NOT_PRESENT")
>> +
>> +/* Enums require being exported to userspace, for user tool parsing */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
>> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
>> +
>> +PCI_HOTPLUG_EVENT
>> +
>> +/*
>> + * Now redefine the EM() and EMe() macros to map the enums to the strings
>> + * that will be printed in the output.
>> + */
>> +#undef EM
>> +#undef EMe
>> +#define EM(a, b) {a, b},
>> +#define EMe(a, b) {a, b}
>> +
>> +TRACE_EVENT(pci_hp_event,
>> +
>> + TP_PROTO(const char *port_name,
>> + const char *slot,
>> + const int event),
>> +
>> + TP_ARGS(port_name, slot, event),
>> +
>> + TP_STRUCT__entry(
>> + __string( port_name, port_name )
>> + __string( slot, slot )
>> + __field( int, event )
>> + ),
>
> TP_PROTO(struct controller *ctrl, int event),
>
> TP_ARGS(ctrl, event),
>
> TP_STRUCT__entry(
> __string( port_name, pci_name(ctrl->pcie->port) )
> __string( slot, slot_name(ctrl) )
> __field( int, event )
> ),
>
> It would move the work out of the calling path.
>
> -- Steve
>
Hi, Steve,
Thank you for your suggestion about passing the controller directly to
the trace event. I investigated this approach, but unfortunately we
cannot implement it due to structural limitations in the PCI hotplug
subsystem.
The issue is that `struct controller` is not standardized across
different PCI hotplug drivers. Each driver defines its own version:
- pciehp has its own struct controller
- cpqphp has a different struct controller
- ibmphp and shpchp also have their own variants
This leads to naming conflicts. For example, both pciehp and cpqphp
define a slot_name() function, but with different signatures:

    // In pciehp:
    static inline const char *slot_name(struct controller *ctrl)
    {
            return hotplug_slot_name(&ctrl->hotplug_slot);
    }

    // In cpqphp:
    static inline const char *slot_name(struct slot *slot)

Additionally, `struct hotplug_slot` is not a common field across all
controller variants, making it impossible to have a unified way to
extract the slot name from a generic controller pointer in the trace
event.
Since we want these trace events to be generic and usable across all PCI
hotplug drivers (not just pciehp), we need to pass the already-resolved
strings rather than driver-specific structures. This ensures
compatibility and avoids the complexity of handling multiple controller
types within the trace infrastructure.
I understand this means doing the name resolution in the calling path,
but it's necessary to maintain a generic interface that works across all
PCI hotplug implementations.
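
The design choice above can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct pciehp_like_ctrl`, `struct cpqphp_like_slot`, their `*_slot_name()` helpers and `format_hp_event()` are hypothetical stand-ins, not the real driver structures or the tracepoint itself. The point is that the shared function only ever sees strings, so each driver can resolve names in its own way.

```c
#include <stdio.h>

/* Hypothetical stand-ins: each driver keeps its slot name differently. */
struct pciehp_like_ctrl {
	const char *hotplug_slot_name;
};

struct cpqphp_like_slot {
	struct {
		const char *name;
	} hp;
};

/* Driver-specific helpers with different argument types. */
static const char *pciehp_like_slot_name(const struct pciehp_like_ctrl *ctrl)
{
	return ctrl->hotplug_slot_name;
}

static const char *cpqphp_like_slot_name(const struct cpqphp_like_slot *slot)
{
	return slot->hp.name;
}

/*
 * Generic sink, modelled on the tracepoint prototype in the patch: it
 * takes already-resolved strings and knows nothing about driver types.
 */
static int format_hp_event(char *buf, size_t len, const char *port_name,
			   const char *slot, const char *event)
{
	return snprintf(buf, len, "%s slot:%s, event:%s",
			port_name, slot, event);
}
```

Both hypothetical drivers can call `format_hp_event()` after resolving the slot name with their own helper, which is the property the string-based TP_PROTO() is meant to preserve.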
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-15 6:29 ` Shuai Xue
@ 2025-10-15 14:37 ` Steven Rostedt
2025-10-20 1:32 ` Shuai Xue
0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2025-10-15 14:37 UTC (permalink / raw)
To: Shuai Xue
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On Wed, 15 Oct 2025 14:29:07 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> Hi, Steve,
>
> Thank you for your suggestion about passing the controller directly to
> the trace event. I investigated this approach, but unfortunately we
> cannot implement it due to structural limitations in the PCI hotplug
> subsystem.
Ah, that makes sense. Perhaps add a comment about this by the TRACE_EVENT()
so that I don't recommend this again ;-)
-- Steve
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-15 14:37 ` Steven Rostedt
@ 2025-10-20 1:32 ` Shuai Xue
2025-10-20 15:30 ` Steven Rostedt
0 siblings, 1 reply; 14+ messages in thread
From: Shuai Xue @ 2025-10-20 1:32 UTC (permalink / raw)
To: Steven Rostedt
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On 2025/10/15 22:37, Steven Rostedt wrote:
> On Wed, 15 Oct 2025 14:29:07 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>> Hi, Steve,
>>
>> Thank you for your suggestion about passing the controller directly to
>> the trace event. I investigated this approach, but unfortunately we
>> cannot implement it due to structural limitations in the PCI hotplug
>> subsystem.
>
> Ah, that makes sense. Perhaps add a comment about this by the TRACE_EVENT()
> so that I don't recommend this again ;-)
>
Hi Steve,
Got it, will add a comment.
If you don't have any other concerns with this patch, would you mind
adding your Reviewed-by tag?
> -- Steve
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-20 1:32 ` Shuai Xue
@ 2025-10-20 15:30 ` Steven Rostedt
2025-10-21 1:50 ` Shuai Xue
0 siblings, 1 reply; 14+ messages in thread
From: Steven Rostedt @ 2025-10-20 15:30 UTC (permalink / raw)
To: Shuai Xue
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On Mon, 20 Oct 2025 09:32:59 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> Got it, will add a comment.
>
> If you don't have any other concerns with this patch, would you mind
> adding your Reviewed-by tag?
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for trace event
-- Steve
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event
2025-10-20 15:30 ` Steven Rostedt
@ 2025-10-21 1:50 ` Shuai Xue
0 siblings, 0 replies; 14+ messages in thread
From: Shuai Xue @ 2025-10-21 1:50 UTC (permalink / raw)
To: Steven Rostedt
Cc: lukas, linux-pci, linux-kernel, linux-edac, linux-trace-kernel,
helgaas, ilpo.jarvinen, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On 2025/10/20 23:30, Steven Rostedt wrote:
> On Mon, 20 Oct 2025 09:32:59 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>
>> Got it, will add a comment.
>>
>> If you don't have any other concerns with this patch, would you mind
>> adding your Reviewed-by tag?
>
> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> # for trace event
>
> -- Steve
Thanks.
Shuai
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v12 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-10-14 12:31 [PATCH v12 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-10-14 12:31 ` [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
@ 2025-10-14 12:31 ` Shuai Xue
2025-10-15 4:49 ` kernel test robot
2025-10-14 12:31 ` [PATCH v12 3/3] Documentation: tracing: Add documentation about PCI tracepoints Shuai Xue
2 siblings, 1 reply; 14+ messages in thread
From: Shuai Xue @ 2025-10-14 12:31 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
PCIe link speed degradation directly impacts system performance and
often indicates hardware issues such as faulty devices, physical layer
problems, or configuration errors.
To this end, add a RAS tracepoint to monitor link speed changes,
enabling proactive health checks and diagnostic analysis.
The following output is generated when a device is hotplugged:
$ echo 1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
irq/51-pciehp-88 [001] ..... 381.545386: pcie_link_event: 0000:00:02.0 type:4, reason:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
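For readers who want to interpret the reason, width and status fields
above, the following user-space sketch may help. It is an illustration
under stated assumptions, not part of the patch: the helper names
reason_name, lnksta_speed_code, lnksta_width and lnksta_flags are
hypothetical, the register bit values are copied from
include/uapi/linux/pci_regs.h, and the flag formatting only roughly
approximates what __print_flags() produces for the masked status bits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Link Status register bits, as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKSTA_CLS	0x000f	/* Current Link Speed */
#define PCI_EXP_LNKSTA_NLW	0x03f0	/* Negotiated Link Width */
#define PCI_EXP_LNKSTA_LT	0x0800	/* Link Training */
#define PCI_EXP_LNKSTA_DLLLA	0x2000	/* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS	0x4000	/* Link Bandwidth Management Status */
#define PCI_EXP_LNKSTA_LABS	0x8000	/* Link Autonomous Bandwidth Status */

/* Mirrors the order of enum pcie_link_change_reason in this patch. */
static const char *reason_name(unsigned int reason)
{
	static const char * const names[] = {
		"PCIE_LINK_RETRAIN", "PCIE_ADD_BUS", "PCIE_BWCTRL_ENABLE",
		"PCIE_BWCTRL_IRQ", "PCIE_HOTPLUG",
	};

	return reason < sizeof(names) / sizeof(names[0]) ?
		names[reason] : "UNKNOWN";
}

/* Raw Current Link Speed code; 1 corresponds to 2.5 GT/s. */
static unsigned int lnksta_speed_code(unsigned int linksta)
{
	return linksta & PCI_EXP_LNKSTA_CLS;
}

/* Negotiated width, i.e. what FIELD_GET(PCI_EXP_LNKSTA_NLW, ...) yields. */
static unsigned int lnksta_width(unsigned int linksta)
{
	return (linksta & PCI_EXP_LNKSTA_NLW) >> 4;
}

/* Builds a "|"-separated flag string in LNKSTA_FLAGS table order. */
static void lnksta_flags(unsigned int linksta, char *buf, size_t len)
{
	static const struct { unsigned int bit; const char *name; } flags[] = {
		{ PCI_EXP_LNKSTA_LT, "LT" },
		{ PCI_EXP_LNKSTA_DLLLA, "DLLLA" },
		{ PCI_EXP_LNKSTA_LBMS, "LBMS" },
		{ PCI_EXP_LNKSTA_LABS, "LABS" },
	};
	size_t i, used = 0;

	assert(buf && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		if (!(linksta & flags[i].bit))
			continue;
		used += snprintf(buf + used, len - used, "%s%s",
				 used ? "|" : "", flags[i].name);
		if (used >= len)
			break;	/* output truncated, stop appending */
	}
}
```

As a constructed example, a Link Status value of 0x2011 (chosen to be
consistent with the sample line, not read from real hardware) decodes to
speed code 1, width 1 and the single flag DLLLA, and reason 4 maps to
PCIE_HOTPLUG.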
Suggested-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Suggested-by: Matthew W Carlis <mattc@purestorage.com>
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
drivers/pci/hotplug/pciehp_hpc.c | 3 +-
drivers/pci/pci.c | 2 +-
drivers/pci/pci.h | 22 ++++++++++--
drivers/pci/pcie/bwctrl.c | 4 +--
drivers/pci/probe.c | 9 +++--
include/linux/pci.h | 1 +
include/trace/events/pci.h | 57 ++++++++++++++++++++++++++++++++
7 files changed, 88 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index bcc51b26d03d..ad5f28f6a8b1 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -320,7 +320,8 @@ int pciehp_check_link_status(struct controller *ctrl)
}
pcie_capability_read_word(pdev, PCI_EXP_LNKSTA2, &linksta2);
- __pcie_update_link_speed(ctrl->pcie->port->subordinate, lnk_status, linksta2);
+ __pcie_update_link_speed(ctrl->pcie->port->subordinate, PCIE_HOTPLUG,
+ lnk_status, linksta2);
if (!found) {
ctrl_info(ctrl, "Slot(%s): No device found\n",
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..6a979a234fe6 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4698,7 +4698,7 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
* Link Speed.
*/
if (pdev->subordinate)
- pcie_update_link_speed(pdev->subordinate);
+ pcie_update_link_speed(pdev->subordinate, PCIE_LINK_RETRAIN);
return rc;
}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4492b809094b..fff30521ed83 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -5,6 +5,7 @@
#include <linux/align.h>
#include <linux/bitfield.h>
#include <linux/pci.h>
+#include <trace/events/pci.h>
struct pcie_tlp_log;
@@ -549,16 +550,31 @@ static inline int pcie_dev_speed_mbps(enum pci_bus_speed speed)
}
u8 pcie_get_supported_speeds(struct pci_dev *dev);
-const char *pci_speed_string(enum pci_bus_speed speed);
void __pcie_print_link_status(struct pci_dev *dev, bool verbose);
void pcie_report_downtraining(struct pci_dev *dev);
-static inline void __pcie_update_link_speed(struct pci_bus *bus, u16 linksta, u16 linksta2)
+enum pcie_link_change_reason {
+ PCIE_LINK_RETRAIN,
+ PCIE_ADD_BUS,
+ PCIE_BWCTRL_ENABLE,
+ PCIE_BWCTRL_IRQ,
+ PCIE_HOTPLUG,
+};
+
+static inline void __pcie_update_link_speed(struct pci_bus *bus,
+ enum pcie_link_change_reason reason,
+ u16 linksta, u16 linksta2)
{
bus->cur_bus_speed = pcie_link_speed[linksta & PCI_EXP_LNKSTA_CLS];
bus->flit_mode = (linksta2 & PCI_EXP_LNKSTA2_FLIT) ? 1 : 0;
+
+ trace_pcie_link_event(bus,
+ reason,
+ FIELD_GET(PCI_EXP_LNKSTA_NLW, linksta),
+ linksta & PCI_EXP_LNKSTA_LINK_STATUS_MASK);
}
-void pcie_update_link_speed(struct pci_bus *bus);
+
+void pcie_update_link_speed(struct pci_bus *bus, enum pcie_link_change_reason reason);
/* Single Root I/O Virtualization */
struct pci_sriov {
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index 36f939f23d34..32f1b30ecb84 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -199,7 +199,7 @@ static void pcie_bwnotif_enable(struct pcie_device *srv)
* Update after enabling notifications & clearing status bits ensures
* link speed is up to date.
*/
- pcie_update_link_speed(port->subordinate);
+ pcie_update_link_speed(port->subordinate, PCIE_BWCTRL_ENABLE);
}
static void pcie_bwnotif_disable(struct pci_dev *port)
@@ -234,7 +234,7 @@ static irqreturn_t pcie_bwnotif_irq(int irq, void *context)
* speed (inside pcie_update_link_speed()) after LBMS has been
* cleared to avoid missing link speed changes.
*/
- pcie_update_link_speed(port->subordinate);
+ pcie_update_link_speed(port->subordinate, PCIE_BWCTRL_IRQ);
return IRQ_HANDLED;
}
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c83e75a0ec12..d52f997ea476 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -22,6 +22,7 @@
#include <linux/irqdomain.h>
#include <linux/pm_runtime.h>
#include <linux/bitfield.h>
+#include <trace/events/pci.h>
#include "pci.h"
#define CARDBUS_LATENCY_TIMER 176 /* secondary latency timer */
@@ -813,14 +814,16 @@ const char *pci_speed_string(enum pci_bus_speed speed)
}
EXPORT_SYMBOL_GPL(pci_speed_string);
-void pcie_update_link_speed(struct pci_bus *bus)
+void pcie_update_link_speed(struct pci_bus *bus,
+ enum pcie_link_change_reason reason)
{
struct pci_dev *bridge = bus->self;
u16 linksta, linksta2;
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA2, &linksta2);
- __pcie_update_link_speed(bus, linksta, linksta2);
+
+ __pcie_update_link_speed(bus, reason, linksta, linksta2);
}
EXPORT_SYMBOL_GPL(pcie_update_link_speed);
@@ -907,7 +910,7 @@ static void pci_set_bus_speed(struct pci_bus *bus)
pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
- pcie_update_link_speed(bus);
+ pcie_update_link_speed(bus, PCIE_ADD_BUS);
}
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d1fdf81fbe1e..f35a5b8522af 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -306,6 +306,7 @@ enum pci_bus_speed {
PCI_SPEED_UNKNOWN = 0xff,
};
+const char *pci_speed_string(enum pci_bus_speed speed);
enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
diff --git a/include/trace/events/pci.h b/include/trace/events/pci.h
index 208609492c06..4df4daaa2d27 100644
--- a/include/trace/events/pci.h
+++ b/include/trace/events/pci.h
@@ -5,6 +5,7 @@
#if !defined(_TRACE_HW_EVENT_PCI_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_HW_EVENT_PCI_H
+#include <uapi/linux/pci_regs.h>
#include <linux/tracepoint.h>
#define PCI_HOTPLUG_EVENT \
@@ -57,6 +58,62 @@ TRACE_EVENT(pci_hp_event,
)
);
+#define PCI_EXP_LNKSTA_LINK_STATUS_MASK (PCI_EXP_LNKSTA_LBMS | \
+ PCI_EXP_LNKSTA_LABS | \
+ PCI_EXP_LNKSTA_LT | \
+ PCI_EXP_LNKSTA_DLLLA)
+
+#define LNKSTA_FLAGS \
+ { PCI_EXP_LNKSTA_LT, "LT"}, \
+ { PCI_EXP_LNKSTA_DLLLA, "DLLLA"}, \
+ { PCI_EXP_LNKSTA_LBMS, "LBMS"}, \
+ { PCI_EXP_LNKSTA_LABS, "LABS"}
+
+TRACE_EVENT(pcie_link_event,
+
+ TP_PROTO(struct pci_bus *bus,
+ unsigned int reason,
+ unsigned int width,
+ unsigned int status
+ ),
+
+ TP_ARGS(bus, reason, width, status),
+
+ TP_STRUCT__entry(
+ __string( port_name, pci_name(bus->self))
+ __field( unsigned int, type )
+ __field( unsigned int, reason )
+ __field( unsigned int, cur_bus_speed )
+ __field( unsigned int, max_bus_speed )
+ __field( unsigned int, width )
+ __field( unsigned int, flit_mode )
+ __field( unsigned int, link_status )
+ ),
+
+ TP_fast_assign(
+ __assign_str(port_name);
+ __entry->type = pci_pcie_type(bus->self);
+ __entry->reason = reason;
+ __entry->cur_bus_speed = bus->cur_bus_speed;
+ __entry->max_bus_speed = bus->max_bus_speed;
+ __entry->width = width;
+ __entry->flit_mode = bus->flit_mode;
+ __entry->link_status = status;
+ ),
+
+ TP_printk("%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n",
+ __get_str(port_name),
+ __entry->type,
+ __entry->reason,
+ pci_speed_string(__entry->cur_bus_speed),
+ pci_speed_string(__entry->max_bus_speed),
+ __entry->width,
+ __entry->flit_mode,
+ __print_flags((unsigned long)__entry->link_status, "|",
+ LNKSTA_FLAGS)
+ )
+);
+
#endif /* _TRACE_HW_EVENT_PCI_H */
/* This part must be outside protection */
--
2.39.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v12 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes
2025-10-14 12:31 ` [PATCH v12 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
@ 2025-10-15 4:49 ` kernel test robot
0 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2025-10-15 4:49 UTC (permalink / raw)
To: Shuai Xue, rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: oe-kbuild-all, bhelgaas, tony.luck, bp, xueshuai, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, mark.rutland, peterz,
tianruidong
Hi Shuai,
kernel test robot noticed the following build errors:
[auto build test ERROR on pci/next]
[also build test ERROR on pci/for-linus linus/master v6.18-rc1 next-20251014]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Shuai-Xue/PCI-trace-Add-a-generic-RAS-tracepoint-for-hotplug-event/20251014-203432
base: https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git next
patch link: https://lore.kernel.org/r/20251014123159.57764-3-xueshuai%40linux.alibaba.com
patch subject: [PATCH v12 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes
config: sparc-randconfig-001-20251015 (https://download.01.org/0day-ci/archive/20251015/202510151212.ifOhr1Ak-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251015/202510151212.ifOhr1Ak-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510151212.ifOhr1Ak-lkp@intel.com/
All errors (new ones prefixed by >>):
sparc64-linux-ld: drivers/pci/trace.o: in function `trace_raw_output_pcie_link_event':
trace.c:(.text+0x1dc): undefined reference to `pci_speed_string'
>> sparc64-linux-ld: trace.c:(.text+0x1ec): undefined reference to `pci_speed_string'
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v12 3/3] Documentation: tracing: Add documentation about PCI tracepoints
2025-10-14 12:31 [PATCH v12 0/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
2025-10-14 12:31 ` [PATCH v12 1/3] PCI: trace: Add a generic RAS tracepoint for hotplug event Shuai Xue
2025-10-14 12:31 ` [PATCH v12 2/3] PCI: trace: Add a RAS tracepoint to monitor link speed changes Shuai Xue
@ 2025-10-14 12:31 ` Shuai Xue
2025-10-14 14:40 ` Ilpo Järvinen
2 siblings, 1 reply; 14+ messages in thread
From: Shuai Xue @ 2025-10-14 12:31 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas, ilpo.jarvinen, mattc,
Jonathan.Cameron
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
The PCI tracing system provides tracepoints to monitor critical hardware
events that can impact system performance and reliability. Add
documentation about it.
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
Documentation/trace/events-pci.rst | 74 ++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)
create mode 100644 Documentation/trace/events-pci.rst
diff --git a/Documentation/trace/events-pci.rst b/Documentation/trace/events-pci.rst
new file mode 100644
index 000000000000..500b27713224
--- /dev/null
+++ b/Documentation/trace/events-pci.rst
@@ -0,0 +1,74 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================
+Subsystem Trace Points: PCI
+===========================
+
+Overview
+========
+The PCI tracing system provides tracepoints to monitor critical hardware events
+that can impact system performance and reliability. These events normally show
+up here:
+
+ /sys/kernel/tracing/events/pci
+
+Cf. include/trace/events/pci.h for the events definitions.
+
+Available Tracepoints
+=====================
+
+pci_hp_event
+------------
+
+Monitors PCI hotplug events including card insertion/removal and link
+state changes.
+::
+
+ pci_hp_event "%s slot:%s, event:%s\n"
+
+**Event Types**:
+
+* ``LINK_UP`` - PCIe link established
+* ``LINK_DOWN`` - PCIe link lost
+* ``CARD_PRESENT`` - Card detected in slot
+* ``CARD_NOT_PRESENT`` - Card removed from slot
+
+**Example Usage**:
+
+ # Enable the tracepoint
+ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
+
+ # Monitor events (the following output is generated when a device is hotplugged)
+ cat /sys/kernel/debug/tracing/trace_pipe
+ irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
+
+ irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
+
+pcie_link_event
+---------------
+
+Monitors PCIe link speed changes and provides detailed link status information.
+::
+
+ pcie_link_event "%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n"
+
+**Parameters**:
+
+* ``type`` - PCIe device type (4=Root Port, etc.)
+* ``reason`` - Reason for link change:
+
+ - ``0`` - Link retrain
+ - ``1`` - Bus enumeration
+ - ``2`` - Bandwidth controller enable
+ - ``3`` - Bandwidth controller IRQ
+ - ``4`` - Hotplug event
+
+
+**Example Usage**:
+
+ # Enable the tracepoint
+ echo 1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
+
+ # Monitor events (the following output is generated when a device is hotplugged)
+ cat /sys/kernel/debug/tracing/trace_pipe
+ irq/51-pciehp-88 [001] ..... 381.545386: pcie_link_event: 0000:00:02.0 type:4, reason:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
--
2.39.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v12 3/3] Documentation: tracing: Add documentation about PCI tracepoints
2025-10-14 12:31 ` [PATCH v12 3/3] Documentation: tracing: Add documentation about PCI tracepoints Shuai Xue
@ 2025-10-14 14:40 ` Ilpo Järvinen
2025-10-15 2:36 ` Shuai Xue
0 siblings, 1 reply; 14+ messages in thread
From: Ilpo Järvinen @ 2025-10-14 14:40 UTC (permalink / raw)
To: Shuai Xue
Cc: rostedt, Lukas Wunner, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On Tue, 14 Oct 2025, Shuai Xue wrote:
> The PCI tracing system provides tracepoints to monitor critical hardware
> events that can impact system performance and reliability. Add
> documentation about it.
>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
> Documentation/trace/events-pci.rst | 74 ++++++++++++++++++++++++++++++
> 1 file changed, 74 insertions(+)
> create mode 100644 Documentation/trace/events-pci.rst
>
> diff --git a/Documentation/trace/events-pci.rst b/Documentation/trace/events-pci.rst
> new file mode 100644
> index 000000000000..500b27713224
> --- /dev/null
> +++ b/Documentation/trace/events-pci.rst
> @@ -0,0 +1,74 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================
> +Subsystem Trace Points: PCI
> +===========================
> +
> +Overview
> +========
> +The PCI tracing system provides tracepoints to monitor critical hardware events
> +that can impact system performance and reliability. These events normally show
> +up here:
> +
> + /sys/kernel/tracing/events/pci
> +
> +Cf. include/trace/events/pci.h for the events definitions.
> +
> +Available Tracepoints
> +=====================
> +
> +pci_hp_event
> +------------
> +
> +Monitors PCI hotplug events including card insertion/removal and link
> +state changes.
> +::
> +
> + pci_hp_event "%s slot:%s, event:%s\n"
> +
> +**Event Types**:
> +
> +* ``LINK_UP`` - PCIe link established
> +* ``LINK_DOWN`` - PCIe link lost
> +* ``CARD_PRESENT`` - Card detected in slot
> +* ``CARD_NOT_PRESENT`` - Card removed from slot
> +
> +**Example Usage**:
> +
> + # Enable the tracepoint
> + echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> +
> + # Monitor events (the following output is generated when a device is hotplugged)
> + cat /sys/kernel/debug/tracing/trace_pipe
> + irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
> +
> + irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
> +
> +pcie_link_event
> +---------------
> +
> +Monitors PCIe link speed changes and provides detailed link status information.
> +::
> +
> + pcie_link_event "%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n"
> +
> +**Parameters**:
> +
> +* ``type`` - PCIe device type (4=Root Port, etc.)
> +* ``reason`` - Reason for link change:
> +
> + - ``0`` - Link retrain
> + - ``1`` - Bus enumeration
> + - ``2`` - Bandwidth controller enable
> + - ``3`` - Bandwidth controller IRQ
Maybe these two should be called "Bandwidth notification" as that's the
name of the underlying mechanism.
For the entire series,
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> + - ``4`` - Hotplug event
> +
> +
> +**Example Usage**:
> +
> + # Enable the tracepoint
> + echo 1 > /sys/kernel/debug/tracing/events/pci/pcie_link_event/enable
> +
> + # Monitor events (the following output is generated when a device is hotplugged)
> + cat /sys/kernel/debug/tracing/trace_pipe
> + irq/51-pciehp-88 [001] ..... 381.545386: pcie_link_event: 0000:00:02.0 type:4, reason:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
>
--
i.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v12 3/3] Documentation: tracing: Add documentation about PCI tracepoints
2025-10-14 14:40 ` Ilpo Järvinen
@ 2025-10-15 2:36 ` Shuai Xue
0 siblings, 0 replies; 14+ messages in thread
From: Shuai Xue @ 2025-10-15 2:36 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: rostedt, Lukas Wunner, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, mattc, Jonathan.Cameron, bhelgaas,
tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
On 2025/10/14 22:40, Ilpo Järvinen wrote:
> On Tue, 14 Oct 2025, Shuai Xue wrote:
>
>> The PCI tracing system provides tracepoints to monitor critical hardware
>> events that can impact system performance and reliability. Add
>> documentation about it.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> ---
>> Documentation/trace/events-pci.rst | 74 ++++++++++++++++++++++++++++++
>> 1 file changed, 74 insertions(+)
>> create mode 100644 Documentation/trace/events-pci.rst
>>
>> diff --git a/Documentation/trace/events-pci.rst b/Documentation/trace/events-pci.rst
>> new file mode 100644
>> index 000000000000..500b27713224
>> --- /dev/null
>> +++ b/Documentation/trace/events-pci.rst
>> @@ -0,0 +1,74 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +===========================
>> +Subsystem Trace Points: PCI
>> +===========================
>> +
>> +Overview
>> +========
>> +The PCI tracing system provides tracepoints to monitor critical hardware events
>> +that can impact system performance and reliability. These events normally show
>> +up here:
>> +
>> + /sys/kernel/tracing/events/pci
>> +
>> +Cf. include/trace/events/pci.h for the events definitions.
>> +
>> +Available Tracepoints
>> +=====================
>> +
>> +pci_hp_event
>> +------------
>> +
>> +Monitors PCI hotplug events including card insertion/removal and link
>> +state changes.
>> +::
>> +
>> + pci_hp_event "%s slot:%s, event:%s\n"
>> +
>> +**Event Types**:
>> +
>> +* ``LINK_UP`` - PCIe link established
>> +* ``LINK_DOWN`` - PCIe link lost
>> +* ``CARD_PRESENT`` - Card detected in slot
>> +* ``CARD_NOT_PRESENT`` - Card removed from slot
>> +
>> +**Example Usage**:
>> +
>> + # Enable the tracepoint
>> + echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>> +
>> + # Monitor events (the following output is generated when a device is hotplugged)
>> + cat /sys/kernel/debug/tracing/trace_pipe
>> + irq/51-pciehp-88 [001] ..... 1311.177459: pci_hp_event: 0000:00:02.0 slot:10, event:CARD_PRESENT
>> +
>> + irq/51-pciehp-88 [001] ..... 1311.177566: pci_hp_event: 0000:00:02.0 slot:10, event:LINK_UP
>> +
>> +pcie_link_event
>> +---------------
>> +
>> +Monitors PCIe link speed changes and provides detailed link status information.
>> +::
>> +
>> + pcie_link_event "%s type:%d, reason:%d, cur_bus_speed:%s, max_bus_speed:%s, width:%u, flit_mode:%u, status:%s\n"
>> +
>> +**Parameters**:
>> +
>> +* ``type`` - PCIe device type (4=Root Port, etc.)
>> +* ``reason`` - Reason for link change:
>> +
>> + - ``0`` - Link retrain
>> + - ``1`` - Bus enumeration
>> + - ``2`` - Bandwidth controller enable
>> + - ``3`` - Bandwidth controller IRQ
>
> Maybe these two should be called "Bandwidth notification" as that's the
> name of the underlying mechanism.
Sure, I will rename it.
>
> For the entire series,
>
> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Thanks.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 14+ messages in thread