* [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
@ 2025-05-12 1:38 Shuai Xue
2025-05-19 17:10 ` Ilpo Järvinen
` (3 more replies)
0 siblings, 4 replies; 37+ messages in thread
From: Shuai Xue @ 2025-05-12 1:38 UTC (permalink / raw)
To: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, helgaas
Cc: bhelgaas, tony.luck, bp, xueshuai, mhiramat, mathieu.desnoyers,
oleg, naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
Hotplug events are critical indicators for analyzing hardware health,
particularly in AI supercomputers where surprise link downs can
significantly impact system performance and reliability.
To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
tracepoint for hotplug event to help healthy check, and generate
tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
include/uapi/linux/pci.h so applications like rasdaemon can register
tracepoint event handlers for it.
The output like below:
$ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
<...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down
<...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present
Suggested-by: Lukas Wunner <lukas@wunner.de>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
---
changes since v7:
- replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
- pick up Reviewed-by from Lukas Wunner
---
drivers/pci/hotplug/Makefile | 3 ++
drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++
include/uapi/linux/pci.h | 7 ++++
4 files changed, 105 insertions(+), 6 deletions(-)
create mode 100644 drivers/pci/hotplug/trace.h
diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
index 40aaf31fe338..a1a9d1e98962 100644
--- a/drivers/pci/hotplug/Makefile
+++ b/drivers/pci/hotplug/Makefile
@@ -3,6 +3,9 @@
# Makefile for the Linux kernel pci hotplug controller drivers.
#
+# define_trace.h needs to know how to find our header
+CFLAGS_pciehp_ctrl.o := -I$(src)
+
obj-$(CONFIG_HOTPLUG_PCI) += pci_hotplug.o
obj-$(CONFIG_HOTPLUG_PCI_COMPAQ) += cpqphp.o
obj-$(CONFIG_HOTPLUG_PCI_IBM) += ibmphp.o
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index d603a7aa7483..f9beb4d3a9b8 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -23,6 +23,9 @@
#include "../pci.h"
#include "pciehp.h"
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
/* The following routines constitute the bulk of the
hotplug controller logic
*/
@@ -244,12 +247,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
case ON_STATE:
ctrl->state = POWEROFF_STATE;
mutex_unlock(&ctrl->state_lock);
- if (events & PCI_EXP_SLTSTA_DLLSC)
+ if (events & PCI_EXP_SLTSTA_DLLSC) {
ctrl_info(ctrl, "Slot(%s): Link Down\n",
slot_name(ctrl));
- if (events & PCI_EXP_SLTSTA_PDC)
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_LINK_DOWN);
+ }
+ if (events & PCI_EXP_SLTSTA_PDC) {
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_NOT_PRESENT);
+ }
pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
break;
default:
@@ -269,6 +280,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
INDICATOR_NOOP);
ctrl_info(ctrl, "Slot(%s): Card not present\n",
slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_NOT_PRESENT);
}
mutex_unlock(&ctrl->state_lock);
return;
@@ -281,12 +295,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
case OFF_STATE:
ctrl->state = POWERON_STATE;
mutex_unlock(&ctrl->state_lock);
- if (present)
+ if (present) {
ctrl_info(ctrl, "Slot(%s): Card present\n",
slot_name(ctrl));
- if (link_active)
- ctrl_info(ctrl, "Slot(%s): Link Up\n",
- slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_CARD_PRESENT);
+ }
+ if (link_active) {
+ ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
+ trace_pci_hp_event(pci_name(ctrl->pcie->port),
+ slot_name(ctrl),
+ PCI_HOTPLUG_LINK_UP);
+ }
ctrl->request_result = pciehp_enable_slot(ctrl);
break;
default:
diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
new file mode 100644
index 000000000000..21329c198019
--- /dev/null
+++ b/drivers/pci/hotplug/trace.h
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#if !defined(_TRACE_HW_EVENT_PCI_HP_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HW_EVENT_PCI_HP_H
+
+#include <linux/tracepoint.h>
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM pci
+
+#define PCI_HOTPLUG_EVENT \
+ EM(PCI_HOTPLUG_LINK_UP, "Link Up") \
+ EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \
+ EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \
+ EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "Card not present")
+
+/* Enums require being exported to userspace, for user tool parsing */
+#undef EM
+#undef EMe
+#define EM(a, b) TRACE_DEFINE_ENUM(a);
+#define EMe(a, b) TRACE_DEFINE_ENUM(a);
+
+PCI_HOTPLUG_EVENT
+
+/*
+ * Now redefine the EM() and EMe() macros to map the enums to the strings
+ * that will be printed in the output.
+ */
+#undef EM
+#undef EMe
+#define EM(a, b) {a, b},
+#define EMe(a, b) {a, b}
+
+TRACE_EVENT(pci_hp_event,
+
+ TP_PROTO(const char *port_name,
+ const char *slot,
+ const int event),
+
+ TP_ARGS(port_name, slot, event),
+
+ TP_STRUCT__entry(
+ __string( port_name, port_name )
+ __string( slot, slot )
+ __field( int, event )
+ ),
+
+ TP_fast_assign(
+ __assign_str(port_name);
+ __assign_str(slot);
+ __entry->event = event;
+ ),
+
+ TP_printk("%s slot:%s, event:%s\n",
+ __get_str(port_name),
+ __get_str(slot),
+ __print_symbolic(__entry->event, PCI_HOTPLUG_EVENT)
+ )
+);
+
+#endif /* _TRACE_HW_EVENT_PCI_HP_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
index a769eefc5139..4f150028965d 100644
--- a/include/uapi/linux/pci.h
+++ b/include/uapi/linux/pci.h
@@ -39,4 +39,11 @@
#define PCIIOC_MMAP_IS_MEM (PCIIOC_BASE | 0x02) /* Set mmap state to MEM space. */
#define PCIIOC_WRITE_COMBINE (PCIIOC_BASE | 0x03) /* Enable/disable write-combining. */
+enum pci_hotplug_event {
+ PCI_HOTPLUG_LINK_UP,
+ PCI_HOTPLUG_LINK_DOWN,
+ PCI_HOTPLUG_CARD_PRESENT,
+ PCI_HOTPLUG_CARD_NOT_PRESENT,
+};
+
#endif /* _UAPILINUX_PCI_H */
--
2.39.3
^ permalink raw reply related [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-12 1:38 [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event Shuai Xue
@ 2025-05-19 17:10 ` Ilpo Järvinen
2025-05-20 2:36 ` Shuai Xue
2025-06-02 6:30 ` Ilpo Järvinen
` (2 subsequent siblings)
3 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2025-05-19 17:10 UTC (permalink / raw)
To: Shuai Xue
Cc: rostedt, Lukas Wunner, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
On Mon, 12 May 2025, Shuai Xue wrote:
> Hotplug events are critical indicators for analyzing hardware health,
> particularly in AI supercomputers where surprise link downs can
> significantly impact system performance and reliability.
>
> To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
> tracepoint for hotplug event to help healthy check, and generate
> tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
> include/uapi/linux/pci.h so applications like rasdaemon can register
> tracepoint event handlers for it.
>
> The output like below:
>
> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> $ cat /sys/kernel/debug/tracing/trace_pipe
> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down
>
> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present
>
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> changes since v7:
> - replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
> - pick up Reviewed-by from Lukas Wunner
> ---
> drivers/pci/hotplug/Makefile | 3 ++
> drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
> drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++
> include/uapi/linux/pci.h | 7 ++++
> 4 files changed, 105 insertions(+), 6 deletions(-)
> create mode 100644 drivers/pci/hotplug/trace.h
>
> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> index 40aaf31fe338..a1a9d1e98962 100644
> --- a/drivers/pci/hotplug/Makefile
> +++ b/drivers/pci/hotplug/Makefile
> @@ -3,6 +3,9 @@
> # Makefile for the Linux kernel pci hotplug controller drivers.
> #
>
> +# define_trace.h needs to know how to find our header
> +CFLAGS_pciehp_ctrl.o := -I$(src)
> +
> obj-$(CONFIG_HOTPLUG_PCI) += pci_hotplug.o
> obj-$(CONFIG_HOTPLUG_PCI_COMPAQ) += cpqphp.o
> obj-$(CONFIG_HOTPLUG_PCI_IBM) += ibmphp.o
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index d603a7aa7483..f9beb4d3a9b8 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -23,6 +23,9 @@
> #include "../pci.h"
> #include "pciehp.h"
>
> +#define CREATE_TRACE_POINTS
> +#include "trace.h"
> +
> /* The following routines constitute the bulk of the
> hotplug controller logic
> */
> @@ -244,12 +247,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> case ON_STATE:
> ctrl->state = POWEROFF_STATE;
> mutex_unlock(&ctrl->state_lock);
> - if (events & PCI_EXP_SLTSTA_DLLSC)
> + if (events & PCI_EXP_SLTSTA_DLLSC) {
> ctrl_info(ctrl, "Slot(%s): Link Down\n",
> slot_name(ctrl));
> - if (events & PCI_EXP_SLTSTA_PDC)
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_LINK_DOWN);
> + }
> + if (events & PCI_EXP_SLTSTA_PDC) {
> ctrl_info(ctrl, "Slot(%s): Card not present\n",
> slot_name(ctrl));
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_CARD_NOT_PRESENT);
> + }
> pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
> break;
> default:
> @@ -269,6 +280,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> INDICATOR_NOOP);
> ctrl_info(ctrl, "Slot(%s): Card not present\n",
> slot_name(ctrl));
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_CARD_NOT_PRESENT);
> }
> mutex_unlock(&ctrl->state_lock);
> return;
> @@ -281,12 +295,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
> case OFF_STATE:
> ctrl->state = POWERON_STATE;
> mutex_unlock(&ctrl->state_lock);
> - if (present)
> + if (present) {
> ctrl_info(ctrl, "Slot(%s): Card present\n",
> slot_name(ctrl));
> - if (link_active)
> - ctrl_info(ctrl, "Slot(%s): Link Up\n",
> - slot_name(ctrl));
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_CARD_PRESENT);
> + }
> + if (link_active) {
> + ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> + slot_name(ctrl),
> + PCI_HOTPLUG_LINK_UP);
> + }
> ctrl->request_result = pciehp_enable_slot(ctrl);
> break;
> default:
> diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
> new file mode 100644
> index 000000000000..21329c198019
> --- /dev/null
> +++ b/drivers/pci/hotplug/trace.h
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#if !defined(_TRACE_HW_EVENT_PCI_HP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HW_EVENT_PCI_HP_H
> +
> +#include <linux/tracepoint.h>
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM pci
> +
> +#define PCI_HOTPLUG_EVENT \
> + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \
> + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \
> + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \
> + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "Card not present")
Hi,
While I was thinking of adding tracing into PCIe BW controller (bwctrl),
I ended up thinking that perhaps it would make more sense to have PCIe
Link related tracepoints which would cover both hotplug and bwctrl so that
also Link Speed changes would be reported through the same trace event.
Downgraded speed may indicate there's something wrong with the card and
the Link Speed does have performance impact too for those who are pushing
BW boundaries such as AI systems.
So my suggestion is:
- Add "Link Speed changed" to the event types.
- Add Link Speed and Width into the event format (and probably also Flit
mode as PCIe gen6 is coming).
> +/* Enums require being exported to userspace, for user tool parsing */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) TRACE_DEFINE_ENUM(a);
> +#define EMe(a, b) TRACE_DEFINE_ENUM(a);
> +
> +PCI_HOTPLUG_EVENT
> +
> +/*
> + * Now redefine the EM() and EMe() macros to map the enums to the strings
> + * that will be printed in the output.
> + */
> +#undef EM
> +#undef EMe
> +#define EM(a, b) {a, b},
> +#define EMe(a, b) {a, b}
> +
> +TRACE_EVENT(pci_hp_event,
> +
> + TP_PROTO(const char *port_name,
> + const char *slot,
> + const int event),
> +
> + TP_ARGS(port_name, slot, event),
> +
> + TP_STRUCT__entry(
> + __string( port_name, port_name )
> + __string( slot, slot )
> + __field( int, event )
> + ),
> +
> + TP_fast_assign(
> + __assign_str(port_name);
> + __assign_str(slot);
> + __entry->event = event;
> + ),
> +
> + TP_printk("%s slot:%s, event:%s\n",
> + __get_str(port_name),
> + __get_str(slot),
> + __print_symbolic(__entry->event, PCI_HOTPLUG_EVENT)
> + )
> +);
> +
> +#endif /* _TRACE_HW_EVENT_PCI_HP_H */
> +
> +#undef TRACE_INCLUDE_PATH
> +#define TRACE_INCLUDE_PATH .
> +#undef TRACE_INCLUDE_FILE
> +#define TRACE_INCLUDE_FILE trace
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>
> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h
> index a769eefc5139..4f150028965d 100644
> --- a/include/uapi/linux/pci.h
> +++ b/include/uapi/linux/pci.h
> @@ -39,4 +39,11 @@
> #define PCIIOC_MMAP_IS_MEM (PCIIOC_BASE | 0x02) /* Set mmap state to MEM space. */
> #define PCIIOC_WRITE_COMBINE (PCIIOC_BASE | 0x03) /* Enable/disable write-combining. */
>
> +enum pci_hotplug_event {
> + PCI_HOTPLUG_LINK_UP,
> + PCI_HOTPLUG_LINK_DOWN,
> + PCI_HOTPLUG_CARD_PRESENT,
> + PCI_HOTPLUG_CARD_NOT_PRESENT,
> +};
> +
> #endif /* _UAPILINUX_PCI_H */
>
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-19 17:10 ` Ilpo Järvinen
@ 2025-05-20 2:36 ` Shuai Xue
2025-05-20 10:07 ` Ilpo Järvinen
0 siblings, 1 reply; 37+ messages in thread
From: Shuai Xue @ 2025-05-20 2:36 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: rostedt, Lukas Wunner, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
Hi, Ilpo,
在 2025/5/20 01:10, Ilpo Järvinen 写道:
> On Mon, 12 May 2025, Shuai Xue wrote:
>
>> Hotplug events are critical indicators for analyzing hardware health,
>> particularly in AI supercomputers where surprise link downs can
>> significantly impact system performance and reliability.
>>
>> To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
>> tracepoint for hotplug event to help healthy check, and generate
>> tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
>> include/uapi/linux/pci.h so applications like rasdaemon can register
>> tracepoint event handlers for it.
>>
>> The output like below:
>>
>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down
>>
>> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present
>>
>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> Reviewed-by: Lukas Wunner <lukas@wunner.de>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> changes since v7:
>> - replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
>> - pick up Reviewed-by from Lukas Wunner
>> ---
>> drivers/pci/hotplug/Makefile | 3 ++
>> drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
>> drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++
>> include/uapi/linux/pci.h | 7 ++++
>> 4 files changed, 105 insertions(+), 6 deletions(-)
>> create mode 100644 drivers/pci/hotplug/trace.h
>>
>> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>> index 40aaf31fe338..a1a9d1e98962 100644
>> --- a/drivers/pci/hotplug/Makefile
>> +++ b/drivers/pci/hotplug/Makefile
>> @@ -3,6 +3,9 @@
>> # Makefile for the Linux kernel pci hotplug controller drivers.
>> #
>>
>> +# define_trace.h needs to know how to find our header
>> +CFLAGS_pciehp_ctrl.o := -I$(src)
>> +
>> obj-$(CONFIG_HOTPLUG_PCI) += pci_hotplug.o
>> obj-$(CONFIG_HOTPLUG_PCI_COMPAQ) += cpqphp.o
>> obj-$(CONFIG_HOTPLUG_PCI_IBM) += ibmphp.o
>> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
>> index d603a7aa7483..f9beb4d3a9b8 100644
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -23,6 +23,9 @@
>> #include "../pci.h"
>> #include "pciehp.h"
>>
>> +#define CREATE_TRACE_POINTS
>> +#include "trace.h"
>> +
>> /* The following routines constitute the bulk of the
>> hotplug controller logic
>> */
>> @@ -244,12 +247,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> case ON_STATE:
>> ctrl->state = POWEROFF_STATE;
>> mutex_unlock(&ctrl->state_lock);
>> - if (events & PCI_EXP_SLTSTA_DLLSC)
>> + if (events & PCI_EXP_SLTSTA_DLLSC) {
>> ctrl_info(ctrl, "Slot(%s): Link Down\n",
>> slot_name(ctrl));
>> - if (events & PCI_EXP_SLTSTA_PDC)
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_LINK_DOWN);
>> + }
>> + if (events & PCI_EXP_SLTSTA_PDC) {
>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>> slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_NOT_PRESENT);
>> + }
>> pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>> break;
>> default:
>> @@ -269,6 +280,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> INDICATOR_NOOP);
>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>> slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_NOT_PRESENT);
>> }
>> mutex_unlock(&ctrl->state_lock);
>> return;
>> @@ -281,12 +295,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
>> case OFF_STATE:
>> ctrl->state = POWERON_STATE;
>> mutex_unlock(&ctrl->state_lock);
>> - if (present)
>> + if (present) {
>> ctrl_info(ctrl, "Slot(%s): Card present\n",
>> slot_name(ctrl));
>> - if (link_active)
>> - ctrl_info(ctrl, "Slot(%s): Link Up\n",
>> - slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_CARD_PRESENT);
>> + }
>> + if (link_active) {
>> + ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl));
>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>> + slot_name(ctrl),
>> + PCI_HOTPLUG_LINK_UP);
>> + }
>> ctrl->request_result = pciehp_enable_slot(ctrl);
>> break;
>> default:
>> diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
>> new file mode 100644
>> index 000000000000..21329c198019
>> --- /dev/null
>> +++ b/drivers/pci/hotplug/trace.h
>> @@ -0,0 +1,68 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +#if !defined(_TRACE_HW_EVENT_PCI_HP_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_HW_EVENT_PCI_HP_H
>> +
>> +#include <linux/tracepoint.h>
>> +
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM pci
>> +
>> +#define PCI_HOTPLUG_EVENT \
>> + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \
>> + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \
>> + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \
>> + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "Card not present")
>
> Hi,
>
> While I was thinking of adding tracing into PCIe BW controller (bwctrl),
> I ended up thinking that perhaps it would make more sense to have PCIe
> Link related tracepoints which would cover both hotplug and bwctrl so that
> also Link Speed changes would be reported through the same trace event.
>
> Downgraded speed may indicate there's something wrong with the card and
> the Link Speed does have performance impact too for those who are pushing
> BW boundaries such as AI systems.
Agreed!
>
> So my suggestion is:
>
> - Add "Link Speed changed" to the event types.
> - Add Link Speed and Width into the event format (and probably also Flit
> mode as PCIe gen6 is coming).
How about bellow event format:
+ TP_STRUCT__entry(
+ __string( port_name, port_name )
+ __field( unsigned char, cur_bus_speed )
+ __field( unsigned char, max_bus_speed )
+ __field( unsigned char, flit_mode )
+ ),
And add the event to pcie_update_link_speed():
@@ -796,6 +799,10 @@ void pcie_update_link_speed(struct pci_bus *bus)
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
pcie_capability_read_word(bridge, PCI_EXP_LNKSTA2, &linksta2);
__pcie_update_link_speed(bus, linksta, linksta2);
+
+ trace_pci_link_event(pci_name(bridge), bus->cur_bus_speed,
+ bus->max_bus_speed,
+ PCI_HOTPLUG_LINK_SPEED_CHANGED);
But I don't find link speed changed in hotplug driver, and the
format of "Link Speed changed" is a bit different from "pci_hp_event".
Do we really need a PCI_HOTPLUG_EVENT? May PCI_LINK_EVENT is more
appropriate?
Thanks.
Best Regards.
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-20 2:36 ` Shuai Xue
@ 2025-05-20 10:07 ` Ilpo Järvinen
2025-05-20 10:44 ` Lukas Wunner
2025-05-22 9:41 ` Shuai Xue
0 siblings, 2 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2025-05-20 10:07 UTC (permalink / raw)
To: Shuai Xue
Cc: rostedt, Lukas Wunner, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
[-- Attachment #1: Type: text/plain, Size: 9535 bytes --]
On Tue, 20 May 2025, Shuai Xue wrote:
> Hi, Ilpo,
>
> 在 2025/5/20 01:10, Ilpo Järvinen 写道:
> > On Mon, 12 May 2025, Shuai Xue wrote:
> >
> > > Hotplug events are critical indicators for analyzing hardware health,
> > > particularly in AI supercomputers where surprise link downs can
> > > significantly impact system performance and reliability.
> > >
> > > To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
> > > tracepoint for hotplug event to help healthy check, and generate
> > > tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
> > > include/uapi/linux/pci.h so applications like rasdaemon can register
> > > tracepoint event handlers for it.
> > >
> > > The output like below:
> > >
> > > $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> > > $ cat /sys/kernel/debug/tracing/trace_pipe
> > > <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0
> > > slot:10, event:Link Down
> > >
> > > <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0
> > > slot:10, event:Card not present
> > >
> > > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > > Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> > > Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> > > Reviewed-by: Lukas Wunner <lukas@wunner.de>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > ---
> > > changes since v7:
> > > - replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
> > > - pick up Reviewed-by from Lukas Wunner
> > > ---
> > > drivers/pci/hotplug/Makefile | 3 ++
> > > drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
> > > drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++
> > > include/uapi/linux/pci.h | 7 ++++
> > > 4 files changed, 105 insertions(+), 6 deletions(-)
> > > create mode 100644 drivers/pci/hotplug/trace.h
> > >
> > > diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> > > index 40aaf31fe338..a1a9d1e98962 100644
> > > --- a/drivers/pci/hotplug/Makefile
> > > +++ b/drivers/pci/hotplug/Makefile
> > > @@ -3,6 +3,9 @@
> > > # Makefile for the Linux kernel pci hotplug controller drivers.
> > > #
> > > +# define_trace.h needs to know how to find our header
> > > +CFLAGS_pciehp_ctrl.o := -I$(src)
> > > +
> > > obj-$(CONFIG_HOTPLUG_PCI) += pci_hotplug.o
> > > obj-$(CONFIG_HOTPLUG_PCI_COMPAQ) += cpqphp.o
> > > obj-$(CONFIG_HOTPLUG_PCI_IBM) += ibmphp.o
> > > diff --git a/drivers/pci/hotplug/pciehp_ctrl.c
> > > b/drivers/pci/hotplug/pciehp_ctrl.c
> > > index d603a7aa7483..f9beb4d3a9b8 100644
> > > --- a/drivers/pci/hotplug/pciehp_ctrl.c
> > > +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> > > @@ -23,6 +23,9 @@
> > > #include "../pci.h"
> > > #include "pciehp.h"
> > > +#define CREATE_TRACE_POINTS
> > > +#include "trace.h"
> > > +
> > > /* The following routines constitute the bulk of the
> > > hotplug controller logic
> > > */
> > > @@ -244,12 +247,20 @@ void pciehp_handle_presence_or_link_change(struct
> > > controller *ctrl, u32 events)
> > > case ON_STATE:
> > > ctrl->state = POWEROFF_STATE;
> > > mutex_unlock(&ctrl->state_lock);
> > > - if (events & PCI_EXP_SLTSTA_DLLSC)
> > > + if (events & PCI_EXP_SLTSTA_DLLSC) {
> > > ctrl_info(ctrl, "Slot(%s): Link Down\n",
> > > slot_name(ctrl));
> > > - if (events & PCI_EXP_SLTSTA_PDC)
> > > + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> > > + slot_name(ctrl),
> > > + PCI_HOTPLUG_LINK_DOWN);
> > > + }
> > > + if (events & PCI_EXP_SLTSTA_PDC) {
> > > ctrl_info(ctrl, "Slot(%s): Card not present\n",
> > > slot_name(ctrl));
> > > + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> > > + slot_name(ctrl),
> > > + PCI_HOTPLUG_CARD_NOT_PRESENT);
> > > + }
> > > pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
> > > break;
> > > default:
> > > @@ -269,6 +280,9 @@ void pciehp_handle_presence_or_link_change(struct
> > > controller *ctrl, u32 events)
> > > INDICATOR_NOOP);
> > > ctrl_info(ctrl, "Slot(%s): Card not present\n",
> > > slot_name(ctrl));
> > > + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> > > + slot_name(ctrl),
> > > + PCI_HOTPLUG_CARD_NOT_PRESENT);
> > > }
> > > mutex_unlock(&ctrl->state_lock);
> > > return;
> > > @@ -281,12 +295,19 @@ void pciehp_handle_presence_or_link_change(struct
> > > controller *ctrl, u32 events)
> > > case OFF_STATE:
> > > ctrl->state = POWERON_STATE;
> > > mutex_unlock(&ctrl->state_lock);
> > > - if (present)
> > > + if (present) {
> > > ctrl_info(ctrl, "Slot(%s): Card present\n",
> > > slot_name(ctrl));
> > > - if (link_active)
> > > - ctrl_info(ctrl, "Slot(%s): Link Up\n",
> > > - slot_name(ctrl));
> > > + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> > > + slot_name(ctrl),
> > > + PCI_HOTPLUG_CARD_PRESENT);
> > > + }
> > > + if (link_active) {
> > > + ctrl_info(ctrl, "Slot(%s): Link Up\n",
> > > slot_name(ctrl));
> > > + trace_pci_hp_event(pci_name(ctrl->pcie->port),
> > > + slot_name(ctrl),
> > > + PCI_HOTPLUG_LINK_UP);
> > > + }
> > > ctrl->request_result = pciehp_enable_slot(ctrl);
> > > break;
> > > default:
> > > diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
> > > new file mode 100644
> > > index 000000000000..21329c198019
> > > --- /dev/null
> > > +++ b/drivers/pci/hotplug/trace.h
> > > @@ -0,0 +1,68 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#if !defined(_TRACE_HW_EVENT_PCI_HP_H) ||
> > > defined(TRACE_HEADER_MULTI_READ)
> > > +#define _TRACE_HW_EVENT_PCI_HP_H
> > > +
> > > +#include <linux/tracepoint.h>
> > > +
> > > +#undef TRACE_SYSTEM
> > > +#define TRACE_SYSTEM pci
> > > +
> > > +#define PCI_HOTPLUG_EVENT \
> > > + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \
> > > + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \
> > > + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \
> > > + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "Card not present")
> >
> > Hi,
> >
> > While I was thinking of adding tracing into PCIe BW controller (bwctrl),
> > I ended up thinking that perhaps it would make more sense to have PCIe
> > Link related tracepoints which would cover both hotplug and bwctrl so that
> > also Link Speed changes would be reported through the same trace event.
> >
> > Downgraded speed may indicate there's something wrong with the card and
> > the Link Speed does have performance impact too for those who are pushing
> > BW boundaries such as AI systems.
>
> Agreed!
>
> >
> > So my suggestion is:
> >
> > - Add "Link Speed changed" to the event types.
> > - Add Link Speed and Width into the event format (and probably also Flit
> > mode as PCIe gen6 is coming).
>
>
> How about bellow event format:
>
> + TP_STRUCT__entry(
> + __string( port_name, port_name )
> + __field( unsigned char, cur_bus_speed )
> + __field( unsigned char, max_bus_speed )
Add also the Link Width.
> + __field( unsigned char, flit_mode )
> + ),
>
> And add the event to pcie_update_link_speed():
>
> @@ -796,6 +799,10 @@ void pcie_update_link_speed(struct pci_bus *bus)
> pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
> pcie_capability_read_word(bridge, PCI_EXP_LNKSTA2, &linksta2);
> __pcie_update_link_speed(bus, linksta, linksta2);
> +
> + trace_pci_link_event(pci_name(bridge), bus->cur_bus_speed,
> + bus->max_bus_speed,
I don't think outputting the internal values of enum pci_bus_speed is a
good idea. Maybe these could be printed as a string (with
pci_speed_string()) or encoded with trace interface specific values.
Perhaps it would make sense to check if the speed really changed before
sending that event, but there are good sides in both approaches as I
know some platforms assert LBMS more than once during a single Link Speed
change.
> + PCI_HOTPLUG_LINK_SPEED_CHANGED);
>
> But I don't find link speed changed in hotplug driver
pciehp_check_link_status() calls __pcie_update_link_speed().
> , and the format of "Link Speed changed" is a bit different from
> "pci_hp_event".
The difference is only because when the Link is down, there's no Link
Speed (obviously). Whenever a new device is hotplugged and it comes up,
there's also Link Speed for it which can be included into the trace event.
I think the trace event should have some special value for the fields that
are N/A due to Link being off. While it would be possible to create
separate events for speed changes and hotplug, I don't see any pros in
that approach over just having the N/A fields marked as such when the Link
is Down.
Perhaps it would even make sense to add PCIE_SPEED_LINK_DOWN into
bus->cur_bus_speed when hotplug finds the card is gone (I'm not entirely
sure how bwctrl or pcie_cooling driver would cope with that though, they
might need minor tweaking to support it, and there are a few other drivers
that use that field).
> Do we really need a PCI_HOTPLUG_EVENT? May PCI_LINK_EVENT is more
> appropriate?
Ah, right, I forgot to mention it would make sense to rename it to
PCI_LINK_EVENT.
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-20 10:07 ` Ilpo Järvinen
@ 2025-05-20 10:44 ` Lukas Wunner
2025-05-20 10:59 ` Ilpo Järvinen
2025-05-20 12:09 ` Lukas Wunner
2025-05-22 9:41 ` Shuai Xue
1 sibling, 2 replies; 37+ messages in thread
From: Lukas Wunner @ 2025-05-20 10:44 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Shuai Xue, rostedt, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
On Tue, May 20, 2025 at 01:07:28PM +0300, Ilpo Järvinen wrote:
> On Tue, 20 May 2025, Shuai Xue wrote:
> > , and the format of "Link Speed changed" is a bit different from
> > "pci_hp_event".
>
> The difference is only because when the Link is down, there's no Link
> Speed (obviously). Whenever a new device is hotplugged and it comes up,
> there's also Link Speed for it which can be included into the trace event.
>
> I think the trace event should have some special value for the fields that
> are N/A due to Link being off. While it would be possible to create
> separate events for speed changes and hotplug, I don't see any pros in
> that approach over just having the N/A fields marked as such when the Link
> is Down.
Link speed changes and device plug/unplug events are orthogonal,
I don't think they should be mixed together in the same event.
A link speed event can be signaled simultaneously to a plug event
and then user space can decide in which type of event it's
interested in.
That also avoids the awkwardness of having N/A values for the
link speed on unplug.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-20 10:44 ` Lukas Wunner
@ 2025-05-20 10:59 ` Ilpo Järvinen
2025-05-20 12:09 ` Lukas Wunner
1 sibling, 0 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2025-05-20 10:59 UTC (permalink / raw)
To: Lukas Wunner, Shuai Xue
Cc: rostedt, linux-pci, LKML, linux-edac, linux-trace-kernel, helgaas,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
[-- Attachment #1: Type: text/plain, Size: 1235 bytes --]
On Tue, 20 May 2025, Lukas Wunner wrote:
> On Tue, May 20, 2025 at 01:07:28PM +0300, Ilpo Järvinen wrote:
> > On Tue, 20 May 2025, Shuai Xue wrote:
> > > , and the format of "Link Speed changed" is a bit different from
> > > "pci_hp_event".
> >
> > The difference is only because when the Link is down, there's no Link
> > Speed (obviously). Whenever a new device is hotplugged and it comes up,
> > there's also Link Speed for it which can be included into the trace event.
> >
> > I think the trace event should have some special value for the fields that
> > are N/A due to Link being off. While it would be possible to create
> > separate events for speed changes and hotplug, I don't see any pros in
> > that approach over just having the N/A fields marked as such when the Link
> > is Down.
>
> Link speed changes and device plug/unplug events are orthogonal,
> I don't think they should be mixed together in the same event.
>
> A link speed event can be signaled simultaneously to a plug event
> and then user space can decide in which type of event it's
> interested in.
>
> That also avoids the awkwardness of having N/A values for the
> link speed on unplug.
Fair enough.
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-20 10:44 ` Lukas Wunner
2025-05-20 10:59 ` Ilpo Järvinen
@ 2025-05-20 12:09 ` Lukas Wunner
2025-05-20 12:52 ` Ilpo Järvinen
1 sibling, 1 reply; 37+ messages in thread
From: Lukas Wunner @ 2025-05-20 12:09 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Shuai Xue, rostedt, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
On Tue, May 20, 2025 at 12:44:38PM +0200, Lukas Wunner wrote:
> Link speed changes and device plug/unplug events are orthogonal,
> I don't think they should be mixed together in the same event.
>
> A link speed event can be signaled simultaneously to a plug event
> and then user space can decide in which type of event it's
> interested in.
>
> That also avoids the awkwardness of having N/A values for the
> link speed on unplug.
After thinking about this some more:
A link speed event could contain a "reason" field
which indicates why the link speed changed,
e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
In other words, instead of mixing the infomation for hotplug
and link speed events together in one event, a separate link
speed event could point to hotplug as one possible reason for
the new speed.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-20 12:09 ` Lukas Wunner
@ 2025-05-20 12:52 ` Ilpo Järvinen
2025-05-20 13:11 ` Lukas Wunner
0 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2025-05-20 12:52 UTC (permalink / raw)
To: Lukas Wunner
Cc: Shuai Xue, rostedt, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
On Tue, 20 May 2025, Lukas Wunner wrote:
> On Tue, May 20, 2025 at 12:44:38PM +0200, Lukas Wunner wrote:
> > Link speed changes and device plug/unplug events are orthogonal,
> > I don't think they should be mixed together in the same event.
> >
> > A link speed event can be signaled simultaneously to a plug event
> > and then user space can decide in which type of event it's
> > interested in.
> >
> > That also avoids the awkwardness of having N/A values for the
> > link speed on unplug.
>
> After thinking about this some more:
>
> A link speed event could contain a "reason" field
> which indicates why the link speed changed,
> e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>
> In other words, instead of mixing the infomation for hotplug
> and link speed events together in one event, a separate link
> speed event could point to hotplug as one possible reason for
> the new speed.
It will be somewhat challenging to link LBMS into what caused it,
especially in cases where there is more than one LBMS following a single
Link Retraining.
Do you have opinion on should the event be only recorded from LBMS/LABS
if the speed changed from the previous value? The speed should probably
also be reported also for the first time (initial enumeration, hotplugging
a new board).
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-20 12:52 ` Ilpo Järvinen
@ 2025-05-20 13:11 ` Lukas Wunner
2025-05-22 9:50 ` Shuai Xue
0 siblings, 1 reply; 37+ messages in thread
From: Lukas Wunner @ 2025-05-20 13:11 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Shuai Xue, rostedt, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
On Tue, May 20, 2025 at 03:52:56PM +0300, Ilpo Järvinen wrote:
> On Tue, 20 May 2025, Lukas Wunner wrote:
> > A link speed event could contain a "reason" field
> > which indicates why the link speed changed,
> > e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
> >
> > In other words, instead of mixing the infomation for hotplug
> > and link speed events together in one event, a separate link
> > speed event could point to hotplug as one possible reason for
> > the new speed.
>
> It will be somewhat challenging to link LBMS into what caused it,
> especially in cases where there is more than one LBMS following a single
> Link Retraining.
>
> Do you have opinion on should the event be only recorded from LBMS/LABS
> if the speed changed from the previous value? The speed should probably
> also be reported also for the first time (initial enumeration, hotplugging
> a new board).
One idea would be to amend struct pcie_bwctrl_data with an
enum member describing the reason.
pcie_bwnotif_irq() uses that reason when reporting the speed change
in a trace event.
After an Endpoint has been removed, the Downstream Port or Root Port
above resets the reason to "hotplug", so that the next link event
is assigned that reason.
Similarly pcie_set_target_speed() could be amended with an enum argument
for the reason and it would set that in struct pcie_bwctrl_data before
calling pcie_bwctrl_change_speed().
Thanks,
Lukas
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-20 10:07 ` Ilpo Järvinen
2025-05-20 10:44 ` Lukas Wunner
@ 2025-05-22 9:41 ` Shuai Xue
1 sibling, 0 replies; 37+ messages in thread
From: Shuai Xue @ 2025-05-22 9:41 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: rostedt, Lukas Wunner, linux-pci, LKML, linux-edac,
linux-trace-kernel, helgaas, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
在 2025/5/20 18:07, Ilpo Järvinen 写道:
> On Tue, 20 May 2025, Shuai Xue wrote:
>
>> Hi, Ilpo,
>>
>> 在 2025/5/20 01:10, Ilpo Järvinen 写道:
>>> On Mon, 12 May 2025, Shuai Xue wrote:
>>>
>>>> Hotplug events are critical indicators for analyzing hardware health,
>>>> particularly in AI supercomputers where surprise link downs can
>>>> significantly impact system performance and reliability.
>>>>
>>>> To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
>>>> tracepoint for hotplug event to help healthy check, and generate
>>>> tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
>>>> include/uapi/linux/pci.h so applications like rasdaemon can register
>>>> tracepoint event handlers for it.
>>>>
>>>> The output like below:
>>>>
>>>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>>>> $ cat /sys/kernel/debug/tracing/trace_pipe
>>>> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0
>>>> slot:10, event:Link Down
>>>>
>>>> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0
>>>> slot:10, event:Card not present
>>>>
>>>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>>>> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>>> Reviewed-by: Lukas Wunner <lukas@wunner.de>
>>>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>>>> ---
>>>> changes since v7:
>>>> - replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
>>>> - pick up Reviewed-by from Lukas Wunner
>>>> ---
>>>> drivers/pci/hotplug/Makefile | 3 ++
>>>> drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
>>>> drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++
>>>> include/uapi/linux/pci.h | 7 ++++
>>>> 4 files changed, 105 insertions(+), 6 deletions(-)
>>>> create mode 100644 drivers/pci/hotplug/trace.h
>>>>
>>>> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>>>> index 40aaf31fe338..a1a9d1e98962 100644
>>>> --- a/drivers/pci/hotplug/Makefile
>>>> +++ b/drivers/pci/hotplug/Makefile
>>>> @@ -3,6 +3,9 @@
>>>> # Makefile for the Linux kernel pci hotplug controller drivers.
>>>> #
>>>> +# define_trace.h needs to know how to find our header
>>>> +CFLAGS_pciehp_ctrl.o := -I$(src)
>>>> +
>>>> obj-$(CONFIG_HOTPLUG_PCI) += pci_hotplug.o
>>>> obj-$(CONFIG_HOTPLUG_PCI_COMPAQ) += cpqphp.o
>>>> obj-$(CONFIG_HOTPLUG_PCI_IBM) += ibmphp.o
>>>> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c
>>>> b/drivers/pci/hotplug/pciehp_ctrl.c
>>>> index d603a7aa7483..f9beb4d3a9b8 100644
>>>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>>>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>>>> @@ -23,6 +23,9 @@
>>>> #include "../pci.h"
>>>> #include "pciehp.h"
>>>> +#define CREATE_TRACE_POINTS
>>>> +#include "trace.h"
>>>> +
>>>> /* The following routines constitute the bulk of the
>>>> hotplug controller logic
>>>> */
>>>> @@ -244,12 +247,20 @@ void pciehp_handle_presence_or_link_change(struct
>>>> controller *ctrl, u32 events)
>>>> case ON_STATE:
>>>> ctrl->state = POWEROFF_STATE;
>>>> mutex_unlock(&ctrl->state_lock);
>>>> - if (events & PCI_EXP_SLTSTA_DLLSC)
>>>> + if (events & PCI_EXP_SLTSTA_DLLSC) {
>>>> ctrl_info(ctrl, "Slot(%s): Link Down\n",
>>>> slot_name(ctrl));
>>>> - if (events & PCI_EXP_SLTSTA_PDC)
>>>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> + slot_name(ctrl),
>>>> + PCI_HOTPLUG_LINK_DOWN);
>>>> + }
>>>> + if (events & PCI_EXP_SLTSTA_PDC) {
>>>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>>>> slot_name(ctrl));
>>>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> + slot_name(ctrl),
>>>> + PCI_HOTPLUG_CARD_NOT_PRESENT);
>>>> + }
>>>> pciehp_disable_slot(ctrl, SURPRISE_REMOVAL);
>>>> break;
>>>> default:
>>>> @@ -269,6 +280,9 @@ void pciehp_handle_presence_or_link_change(struct
>>>> controller *ctrl, u32 events)
>>>> INDICATOR_NOOP);
>>>> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>>>> slot_name(ctrl));
>>>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> + slot_name(ctrl),
>>>> + PCI_HOTPLUG_CARD_NOT_PRESENT);
>>>> }
>>>> mutex_unlock(&ctrl->state_lock);
>>>> return;
>>>> @@ -281,12 +295,19 @@ void pciehp_handle_presence_or_link_change(struct
>>>> controller *ctrl, u32 events)
>>>> case OFF_STATE:
>>>> ctrl->state = POWERON_STATE;
>>>> mutex_unlock(&ctrl->state_lock);
>>>> - if (present)
>>>> + if (present) {
>>>> ctrl_info(ctrl, "Slot(%s): Card present\n",
>>>> slot_name(ctrl));
>>>> - if (link_active)
>>>> - ctrl_info(ctrl, "Slot(%s): Link Up\n",
>>>> - slot_name(ctrl));
>>>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> + slot_name(ctrl),
>>>> + PCI_HOTPLUG_CARD_PRESENT);
>>>> + }
>>>> + if (link_active) {
>>>> + ctrl_info(ctrl, "Slot(%s): Link Up\n",
>>>> slot_name(ctrl));
>>>> + trace_pci_hp_event(pci_name(ctrl->pcie->port),
>>>> + slot_name(ctrl),
>>>> + PCI_HOTPLUG_LINK_UP);
>>>> + }
>>>> ctrl->request_result = pciehp_enable_slot(ctrl);
>>>> break;
>>>> default:
>>>> diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
>>>> new file mode 100644
>>>> index 000000000000..21329c198019
>>>> --- /dev/null
>>>> +++ b/drivers/pci/hotplug/trace.h
>>>> @@ -0,0 +1,68 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0 */
>>>> +#if !defined(_TRACE_HW_EVENT_PCI_HP_H) ||
>>>> defined(TRACE_HEADER_MULTI_READ)
>>>> +#define _TRACE_HW_EVENT_PCI_HP_H
>>>> +
>>>> +#include <linux/tracepoint.h>
>>>> +
>>>> +#undef TRACE_SYSTEM
>>>> +#define TRACE_SYSTEM pci
>>>> +
>>>> +#define PCI_HOTPLUG_EVENT \
>>>> + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \
>>>> + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \
>>>> + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \
>>>> + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "Card not present")
>>>
>>> Hi,
>>>
>>> While I was thinking of adding tracing into PCIe BW controller (bwctrl),
>>> I ended up thinking that perhaps it would make more sense to have PCIe
>>> Link related tracepoints which would cover both hotplug and bwctrl so that
>>> also Link Speed changes would be reported through the same trace event.
>>>
>>> Downgraded speed may indicate there's something wrong with the card and
>>> the Link Speed does have performance impact too for those who are pushing
>>> BW boundaries such as AI systems.
>>
>> Agreed!
>>
>>>
>>> So my suggestion is:
>>>
>>> - Add "Link Speed changed" to the event types.
>>> - Add Link Speed and Width into the event format (and probably also Flit
>>> mode as PCIe gen6 is coming).
>>
>>
>> How about bellow event format:
>>
>> + TP_STRUCT__entry(
>> + __string( port_name, port_name )
>> + __field( unsigned char, cur_bus_speed )
>> + __field( unsigned char, max_bus_speed )
>
> Add also the Link Width.
Got it.
>
>> + __field( unsigned char, flit_mode )
>> + ),
>>
>> And add the event to pcie_update_link_speed():
>>
>> @@ -796,6 +799,10 @@ void pcie_update_link_speed(struct pci_bus *bus)
>> pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
>> pcie_capability_read_word(bridge, PCI_EXP_LNKSTA2, &linksta2);
>> __pcie_update_link_speed(bus, linksta, linksta2);
>> +
>> + trace_pci_link_event(pci_name(bridge), bus->cur_bus_speed,
>> + bus->max_bus_speed,
>
> I don't think outputting the internal values of enum pci_bus_speed is a
> good idea. Maybe these could be printed as a string (with
> pci_speed_string()) or encoded with trace interface specific values.
I see, a human readable string is better.
>
> Perhaps it would make sense to check if the speed really changed before
> sending that event, but there are good sides in both approaches as I
> know some platforms assert LBMS more than once during a single Link Speed
> change.
>
>> + PCI_HOTPLUG_LINK_SPEED_CHANGED);
>>
>> But I don't find link speed changed in hotplug driver
>
> pciehp_check_link_status() calls __pcie_update_link_speed().
Thanks.
>
>> , and the format of "Link Speed changed" is a bit different from
>> "pci_hp_event".
>
> The difference is only because when the Link is down, there's no Link
> Speed (obviously). Whenever a new device is hotplugged and it comes up,
> there's also Link Speed for it which can be included into the trace event.
>
> I think the trace event should have some special value for the fields that
> are N/A due to Link being off. While it would be possible to create
> separate events for speed changes and hotplug, I don't see any pros in
> that approach over just having the N/A fields marked as such when the Link
> is Down.
>
> Perhaps it would even make sense to add PCIE_SPEED_LINK_DOWN into
> bus->cur_bus_speed when hotplug finds the card is gone (I'm not entirely
> sure how bwctrl or pcie_cooling driver would cope with that though, they
> might need minor tweaking to support it, and there are a few other drivers
> that use that field).
>
>> Do we really need a PCI_HOTPLUG_EVENT? May PCI_LINK_EVENT is more
>> appropriate?
>
> Ah, right, I forgot to mention it would make sense to rename it to
> PCI_LINK_EVENT.
>
Got it.
Thanks.
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-20 13:11 ` Lukas Wunner
@ 2025-05-22 9:50 ` Shuai Xue
2025-05-31 14:15 ` Lukas Wunner
0 siblings, 1 reply; 37+ messages in thread
From: Shuai Xue @ 2025-05-22 9:50 UTC (permalink / raw)
To: Lukas Wunner, Ilpo Järvinen, Bjorn Helgaas
Cc: rostedt, linux-pci, LKML, linux-edac, linux-trace-kernel, helgaas,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
在 2025/5/20 21:11, Lukas Wunner 写道:
> On Tue, May 20, 2025 at 03:52:56PM +0300, Ilpo Järvinen wrote:
>> On Tue, 20 May 2025, Lukas Wunner wrote:
>>> A link speed event could contain a "reason" field
>>> which indicates why the link speed changed,
>>> e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>>>
>>> In other words, instead of mixing the infomation for hotplug
>>> and link speed events together in one event, a separate link
>>> speed event could point to hotplug as one possible reason for
>>> the new speed.
>>
>> It will be somewhat challenging to link LBMS into what caused it,
>> especially in cases where there is more than one LBMS following a single
>> Link Retraining.
>>
>> Do you have opinion on should the event be only recorded from LBMS/LABS
>> if the speed changed from the previous value? The speed should probably
>> also be reported also for the first time (initial enumeration, hotplugging
>> a new board).
>
> One idea would be to amend struct pcie_bwctrl_data with an
> enum member describing the reason.
>
> pcie_bwnotif_irq() uses that reason when reporting the speed change
> in a trace event.
>
> After an Endpoint has been removed, the Downstream Port or Root Port
> above resets the reason to "hotplug", so that the next link event
> is assigned that reason.
>
> Similarly pcie_set_target_speed() could be amended with an enum argument
> for the reason and it would set that in struct pcie_bwctrl_data before
> calling pcie_bwctrl_change_speed().
>
> Thanks,
>
> Lukas
Hi Lukas and Ilpo,
Thank you for the discussion.
As @Lukas points out, link speed changes and device plug/unplug events are
orthogonal issues.
Based on this thread discussion, I believe we need additional tweaking to
introduce a new tracepoint (perhaps named PCI_LINK_EVENT) to handle
link speed changes separately.
Regarding our next steps, would it be acceptable to merge the
PCI_HOTPLUG_EVENT to mainline first, and then work on implementing
the new link event tracepoint afterward?
Best regards,
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-22 9:50 ` Shuai Xue
@ 2025-05-31 14:15 ` Lukas Wunner
2025-07-16 6:52 ` Shuai Xue
0 siblings, 1 reply; 37+ messages in thread
From: Lukas Wunner @ 2025-05-31 14:15 UTC (permalink / raw)
To: Shuai Xue
Cc: Ilpo Järvinen, Bjorn Helgaas, rostedt, linux-pci, LKML,
linux-edac, linux-trace-kernel, helgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong
On Thu, May 22, 2025 at 05:50:05PM +0800, Shuai Xue wrote:
> As @Lukas points out, link speed changes and device plug/unplug events are
> orthogonal issues.
>
> Based on this thread discussion, I believe we need additional tweaking to
> introduce a new tracepoint (perhaps named PCI_LINK_EVENT) to handle
> link speed changes separately.
>
> Regarding our next steps, would it be acceptable to merge the
> PCI_HOTPLUG_EVENT to mainline first, and then work on implementing
> the new link event tracepoint afterward?
Yes I think so, I think this patch is ready to go in.
However I'm not part of the PCI maintainer team,
it would have to be applied by them (barring any objections).
We're now in the merge window and it may be too late to squeeze it
into the v6.16 pull request, but maybe it can be applied after
the merge window has closed (in 1 week).
Thanks,
Lukas
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-12 1:38 [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event Shuai Xue
2025-05-19 17:10 ` Ilpo Järvinen
@ 2025-06-02 6:30 ` Ilpo Järvinen
2025-06-23 3:04 ` Shuai Xue
2025-07-16 22:25 ` Bjorn Helgaas
2025-07-17 17:28 ` Matthew W Carlis
3 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2025-06-02 6:30 UTC (permalink / raw)
To: Shuai Xue, Lukas Wunner
Cc: rostedt, linux-pci, LKML, linux-edac, linux-trace-kernel, helgaas,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
On Mon, 12 May 2025, Shuai Xue wrote:
> Hotplug events are critical indicators for analyzing hardware health,
> particularly in AI supercomputers where surprise link downs can
> significantly impact system performance and reliability.
>
> To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
> tracepoint for hotplug event to help healthy check, and generate
> tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
> include/uapi/linux/pci.h so applications like rasdaemon can register
> tracepoint event handlers for it.
>
> The output like below:
>
> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> $ cat /sys/kernel/debug/tracing/trace_pipe
> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down
>
> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present
>
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Reviewed-by: Lukas Wunner <lukas@wunner.de>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> changes since v7:
> - replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
> - pick up Reviewed-by from Lukas Wunner
> ---
> drivers/pci/hotplug/Makefile | 3 ++
> drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
> drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++
> include/uapi/linux/pci.h | 7 ++++
> 4 files changed, 105 insertions(+), 6 deletions(-)
> create mode 100644 drivers/pci/hotplug/trace.h
>
> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
> index 40aaf31fe338..a1a9d1e98962 100644
> --- a/drivers/pci/hotplug/Makefile
> +++ b/drivers/pci/hotplug/Makefile
> @@ -3,6 +3,9 @@
> # Makefile for the Linux kernel pci hotplug controller drivers.
> #
>
> +# define_trace.h needs to know how to find our header
> +CFLAGS_pciehp_ctrl.o := -I$(src)
> +
> obj-$(CONFIG_HOTPLUG_PCI) += pci_hotplug.o
> obj-$(CONFIG_HOTPLUG_PCI_COMPAQ) += cpqphp.o
> obj-$(CONFIG_HOTPLUG_PCI_IBM) += ibmphp.o
> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
> index d603a7aa7483..f9beb4d3a9b8 100644
> --- a/drivers/pci/hotplug/pciehp_ctrl.c
> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
> @@ -23,6 +23,9 @@
> #include "../pci.h"
> #include "pciehp.h"
>
> +#define CREATE_TRACE_POINTS
> +#include "trace.h"
Hi,
Instead of spreading tracepoint creating into subdriver code like this,
should we place it into one place, e.g., drivers/pci/pci-trace.c (which is
what I seem to have used in my yet to be submitted patch that adds
tracepoints into bwctrl link speed events)?
> diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
> new file mode 100644
> index 000000000000..21329c198019
> --- /dev/null
> +++ b/drivers/pci/hotplug/trace.h
Perhaps in include/trace/events/pci.h (or
include/trace/events/pci-hotplug.h)?
I don't know what is the general rule having them inside drivers/ vs
include/trace/events, Documentation/trace/tracepoints.rst only mentions
the latter, but there seems to be plenty under drivers/ too.
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#if !defined(_TRACE_HW_EVENT_PCI_HP_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_HW_EVENT_PCI_HP_H
> +
> +#include <linux/tracepoint.h>
> +
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM pci
> +
> +#define PCI_HOTPLUG_EVENT \
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-06-02 6:30 ` Ilpo Järvinen
@ 2025-06-23 3:04 ` Shuai Xue
0 siblings, 0 replies; 37+ messages in thread
From: Shuai Xue @ 2025-06-23 3:04 UTC (permalink / raw)
To: Ilpo Järvinen, Lukas Wunner, rostedt
Cc: rostedt, linux-pci, LKML, linux-edac, linux-trace-kernel, helgaas,
bhelgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg,
naveen, davem, anil.s.keshavamurthy, mark.rutland, peterz,
tianruidong
在 2025/6/2 14:30, Ilpo Järvinen 写道:
> On Mon, 12 May 2025, Shuai Xue wrote:
>
>> Hotplug events are critical indicators for analyzing hardware health,
>> particularly in AI supercomputers where surprise link downs can
>> significantly impact system performance and reliability.
>>
>> To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
>> tracepoint for hotplug event to help healthy check, and generate
>> tracepoints for pcie hotplug event. Add enum pci_hotplug_event in
>> include/uapi/linux/pci.h so applications like rasdaemon can register
>> tracepoint event handlers for it.
>>
>> The output like below:
>>
>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down
>>
>> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present
>>
>> Suggested-by: Lukas Wunner <lukas@wunner.de>
>> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> Reviewed-by: Lukas Wunner <lukas@wunner.de>
>> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
>> ---
>> changes since v7:
>> - replace the TRACE_INCLUDE_PATH to avoid macro conflict per Steven
>> - pick up Reviewed-by from Lukas Wunner
>> ---
>> drivers/pci/hotplug/Makefile | 3 ++
>> drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
>> drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++
>> include/uapi/linux/pci.h | 7 ++++
>> 4 files changed, 105 insertions(+), 6 deletions(-)
>> create mode 100644 drivers/pci/hotplug/trace.h
>>
>> diff --git a/drivers/pci/hotplug/Makefile b/drivers/pci/hotplug/Makefile
>> index 40aaf31fe338..a1a9d1e98962 100644
>> --- a/drivers/pci/hotplug/Makefile
>> +++ b/drivers/pci/hotplug/Makefile
>> @@ -3,6 +3,9 @@
>> # Makefile for the Linux kernel pci hotplug controller drivers.
>> #
>>
>> +# define_trace.h needs to know how to find our header
>> +CFLAGS_pciehp_ctrl.o := -I$(src)
>> +
>> obj-$(CONFIG_HOTPLUG_PCI) += pci_hotplug.o
>> obj-$(CONFIG_HOTPLUG_PCI_COMPAQ) += cpqphp.o
>> obj-$(CONFIG_HOTPLUG_PCI_IBM) += ibmphp.o
>> diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
>> index d603a7aa7483..f9beb4d3a9b8 100644
>> --- a/drivers/pci/hotplug/pciehp_ctrl.c
>> +++ b/drivers/pci/hotplug/pciehp_ctrl.c
>> @@ -23,6 +23,9 @@
>> #include "../pci.h"
>> #include "pciehp.h"
>>
>> +#define CREATE_TRACE_POINTS
>> +#include "trace.h"
>
> Hi,
>
> Instead of spreading tracepoint creating into subdriver code like this,
> should we place it into one place, e.g., drivers/pci/pci-trace.c (which is
> what I seem to have used in my yet to be submitted patch that adds
> tracepoints into bwctrl link speed events)?
>
>> diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h
>> new file mode 100644
>> index 000000000000..21329c198019
>> --- /dev/null
>> +++ b/drivers/pci/hotplug/trace.h
>
> Perhaps in include/trace/events/pci.h (or
> include/trace/events/pci-hotplug.h)?
>
> I don't know what is the general rule having them inside drivers/ vs
> include/trace/events, Documentation/trace/tracepoints.rst only mentions
> the latter, but there seems to be plenty under drivers/ too.
>
Hi, Ilpo,
I don't know either.
I think it is tracepoint stuff and it is up to @Steve.
@Steve, which way do you prefer?
Thanks for help.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-31 14:15 ` Lukas Wunner
@ 2025-07-16 6:52 ` Shuai Xue
0 siblings, 0 replies; 37+ messages in thread
From: Shuai Xue @ 2025-07-16 6:52 UTC (permalink / raw)
To: Lukas Wunner, Bjorn Helgaas, linux-pci
Cc: Ilpo Järvinen, rostedt, LKML, linux-edac, linux-trace-kernel,
helgaas, tony.luck, bp, mhiramat, mathieu.desnoyers, oleg, naveen,
davem, anil.s.keshavamurthy, mark.rutland, peterz, tianruidong
在 2025/5/31 22:15, Lukas Wunner 写道:
> On Thu, May 22, 2025 at 05:50:05PM +0800, Shuai Xue wrote:
>> As @Lukas points out, link speed changes and device plug/unplug events are
>> orthogonal issues.
>>
>> Based on this thread discussion, I believe we need additional tweaking to
>> introduce a new tracepoint (perhaps named PCI_LINK_EVENT) to handle
>> link speed changes separately.
>>
>> Regarding our next steps, would it be acceptable to merge the
>> PCI_HOTPLUG_EVENT to mainline first, and then work on implementing
>> the new link event tracepoint afterward?
>
> Yes I think so, I think this patch is ready to go in.
>
> However I'm not part of the PCI maintainer team,
> it would have to be applied by them (barring any objections).
>
> We're now in the merge window and it may be too late to squeeze it
> into the v6.16 pull request, but maybe it can be applied after
> the merge window has closed (in 1 week).
>
> Thanks,
>
> Lukas
Hi, Bjorn
Gentle ping.
Are you happy with this patch?
Thanks
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-12 1:38 [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event Shuai Xue
2025-05-19 17:10 ` Ilpo Järvinen
2025-06-02 6:30 ` Ilpo Järvinen
@ 2025-07-16 22:25 ` Bjorn Helgaas
2025-07-17 6:00 ` Shuai Xue
2025-07-24 22:27 ` Bjorn Helgaas
2025-07-17 17:28 ` Matthew W Carlis
3 siblings, 2 replies; 37+ messages in thread
From: Bjorn Helgaas @ 2025-07-16 22:25 UTC (permalink / raw)
To: Shuai Xue
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong, Ilpo Järvinen,
Jonathan Cameron
[+cc Ilpo, Jonathan (should have been included since the patch has his
Reviewed-by)]
Thanks for the ping; I noticed quite a bit of discussion but didn't
follow it myself, so didn't know it was basically all resolved.
On Mon, May 12, 2025 at 09:38:39AM +0800, Shuai Xue wrote:
> Hotplug events are critical indicators for analyzing hardware health,
> particularly in AI supercomputers where surprise link downs can
> significantly impact system performance and reliability.
I dropped the "particularly in AI supercomputers" part because I think
this is relevant in general.
> To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
> tracepoint for hotplug event to help healthy check, and generate
> tracepoints for pcie hotplug event.
I'm not quite clear on the difference between "add generic RAS
tracepoint for hotplug event" and "generate tracepoints for pcie
hotplug event." Are these two different things?
I see the new TRACE_EVENT(pci_hp_event, ...) definition. Is that what
you mean by the "generic RAS tracepoint"?
And the five new trace_pci_hp_event() calls that use the TRACE_EVENT
are the "tracepoints for PCIe hotplug event"?
> Add enum pci_hotplug_event in
> include/uapi/linux/pci.h so applications like rasdaemon can register
> tracepoint event handlers for it.
>
> The output like below:
>
> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> $ cat /sys/kernel/debug/tracing/trace_pipe
> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down
>
> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present
> +#define PCI_HOTPLUG_EVENT \
> + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \
> + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \
> + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \
> + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "Card not present")
Running this:
$ git grep -E "\<(EM|EMe)\("
I notice that these new events don't look like the others, which
mostly look like "word" or "event-type" or "VERB object".
I'm OK with this, but just giving you a chance to consider what will
be the least surprise to users and easiest for grep and shell
scripting.
I also noticed capitalization of "Up" and "Down", but not "present"
and "not present".
"Card" is only used occasionally and informally in the PCIe spec, and
not at all in the context of hotplug of Slot Status (Presence Detect
State refers to "adapter in the slot"), but it does match the pciehp
dmesg text, so it probably makes sense to use that.
Anyway, I applied this on pci/trace for v6.17. If there's anything
you want to tweak in the commit log or event text, we can still do
that.
https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=trace
Bjorn
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-16 22:25 ` Bjorn Helgaas
@ 2025-07-17 6:00 ` Shuai Xue
2025-07-17 19:29 ` Bjorn Helgaas
2025-07-21 8:55 ` Ilpo Järvinen
2025-07-24 22:27 ` Bjorn Helgaas
1 sibling, 2 replies; 37+ messages in thread
From: Shuai Xue @ 2025-07-17 6:00 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong, Ilpo Järvinen,
Jonathan Cameron
在 2025/7/17 06:25, Bjorn Helgaas 写道:
> [+cc Ilpo, Jonathan (should have been included since the patch has his
> Reviewed-by)]
>
Thanks.
> Thanks for the ping; I noticed quite a bit of discussion but didn't
> follow it myself, so didn't know it was basically all resolved.
>
> On Mon, May 12, 2025 at 09:38:39AM +0800, Shuai Xue wrote:
>> Hotplug events are critical indicators for analyzing hardware health,
>> particularly in AI supercomputers where surprise link downs can
>> significantly impact system performance and reliability.
>
> I dropped the "particularly in AI supercomputers" part because I think
> this is relevant in general.
>
>> To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
>> tracepoint for hotplug event to help healthy check, and generate
>> tracepoints for pcie hotplug event.
>
> I'm not quite clear on the difference between "add generic RAS
> tracepoint for hotplug event" and "generate tracepoints for pcie
> hotplug event." Are these two different things?
The purpose of this patch is to address the lack of tracepoints for PCIe
hotplug events in our production environment. In the initial RFC
version, I defined tracepoints such as "Link Up" and "Link Down"
specifically for PCIe hotplug. Later, Lukas suggested that these
tracepoints could be made more generic so that other PCI hotplug drivers
could also use them.
That’s why, when defining the event, I used a "generic" pci_hotplug_event
instead of a pcie_hotplug_event. If you're interested in more details
about this discussion, please refer to this link[1].
[1]https://erol.kernel.org/linux-pci/git/0/commit/?id=0ffd56f572f25bcd6c2265a1863848a18dce0e29
However, currently only PCIe hotplug is using these tracepoints, which
is why the CREATE_TRACE_POINTS macro is placed in
drivers/pci/hotplug/pciehp_ctrl.c.
>
> I see the new TRACE_EVENT(pci_hp_event, ...) definition. Is that what
> you mean by the "generic RAS tracepoint"?
Yes.
>
> And the five new trace_pci_hp_event() calls that use the TRACE_EVENT
> are the "tracepoints for PCIe hotplug event"?
Actually, the tracepoints are generic, although right now they are only
used for PCIe hotplug.
>
>> Add enum pci_hotplug_event in
>> include/uapi/linux/pci.h so applications like rasdaemon can register
>> tracepoint event handlers for it.
>>
>> The output like below:
>>
>> $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, event:Link Down
>>
>> <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, event:Card not present
>
>> +#define PCI_HOTPLUG_EVENT \
>> + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \
>> + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \
>> + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \
>> + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "Card not present")
>
> Running this:
>
> $ git grep -E "\<(EM|EMe)\("
>
> I notice that these new events don't look like the others, which
> mostly look like "word" or "event-type" or "VERB object".
>
> I'm OK with this, but just giving you a chance to consider what will
> be the least surprise to users and easiest for grep and shell
> scripting.
I think this is also common. For example, MF_PAGE_TYPE for
memory_failure_event uses a similar format:
#define MF_PAGE_TYPE \
EM ( MF_MSG_KERNEL, "reserved kernel page" ) \
EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )
and aer_uncorrectable_errors for aer_event:
#define aer_uncorrectable_errors \
{PCI_ERR_UNC_UND, "Undefined"}, \
{PCI_ERR_UNC_DLP, "Data Link Protocol Error"}, \
{PCI_ERR_UNC_SURPDN, "Surprise Down Error"}, \
{PCI_ERR_UNC_POISON_TLP,"Poisoned TLP"},
>
> I also noticed capitalization of "Up" and "Down", but not "present"
> and "not present".
Aha, this is a bit tricky:)
The original kernel log messages are not consistent either:
ctrl_info(ctrl, "Slot(%s): Link Down\n",
ctrl_info(ctrl, "Slot(%s): Card not present\n",
I tried to keep the output as close as possible to the existing log
messages. If you prefer a more consistent capitalization style, I can
send another patch to fix that.
>
> "Card" is only used occasionally and informally in the PCIe spec, and
> not at all in the context of hotplug of Slot Status (Presence Detect
> State refers to "adapter in the slot"), but it does match the pciehp
> dmesg text, so it probably makes sense to use that.
>
> Anyway, I applied this on pci/trace for v6.17. If there's anything
> you want to tweak in the commit log or event text, we can still do
> that.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=trace
>
> Bjorn
Thank you again for applying this to pci/trace for v6.17. If there’s
anything more to tweak in the commit log or event text, please let me
know.
Best regards,
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-05-12 1:38 [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event Shuai Xue
` (2 preceding siblings ...)
2025-07-16 22:25 ` Bjorn Helgaas
@ 2025-07-17 17:28 ` Matthew W Carlis
2025-07-17 19:07 ` Bjorn Helgaas
3 siblings, 1 reply; 37+ messages in thread
From: Matthew W Carlis @ 2025-07-17 17:28 UTC (permalink / raw)
To: xueshuai
Cc: anil.s.keshavamurthy, bhelgaas, bp, davem, helgaas, linux-edac,
linux-kernel, linux-pci, linux-trace-kernel, lukas, mark.rutland,
mathieu.desnoyers, mhiramat, naveen, oleg, peterz, rostedt,
tianruidong, tony.luck
A bit late to the discussion here.. Looks like "too late" in fact, but I
wanted to just make some comments.
On Tue, 12 May 2025, Shuai Xue wrote:
> Hotplug events are critical indicators for analyzing hardware health,
In terms of a "hot plug" event I'm not actually sure what that means. I
mean to say that the spec has some room for different implementations.
I think sometimes that means a presence detect state change event, but a
system is not required to implement a presence pin (at least not for the
Slot Status presence). Some vendors support an "inband" presence which
is when the LTSSM essentially asserts presence if the link is active
and deasserts it when the link is down.
Appendix I in the newer PCIe specs say to use data link layer state change
event if presence is not implemented. It looks like this tracepoint would still
work, but its just something to keep in mind. At the risk of including too
much information I could see it also being useful to put the device/vendor IDs
of the DSP and the EP into the trace event for link up. Perhaps even the link
speed/width cap for DSP/EP. The real challenge with tracking a fleet is getting
all the things you care about into one place.
On Tue, 20 May 2025, Lukas Wunner wrote:
> Link speed changes and device plug/unplug events are orthogonal
I guess what I wanted to get at here were some of the discussion from Lukas &
Ilpo. I think it makes sense to separate presence events from link events, but
I think it would make sense to have a "link tracepoint" which reports previous
and new speed. One of those speeds being DOWN/DISABLED etc. Width could be in
there as well. I have seen many times now an engineer become confused about
checking speed because "Current Link Speed" & "Negotiated Link Width" are
"undefined" when "Data Link Layer Active" bit is unset. Ideally a solution
here would be immediately clear to the user.
When it comes to tracking things across a "fleet" having the slot number of
the device is extremely useful. We have an internal specification for our
slot number assignments that allows us to track meaning across different
generations of hardware or different architectures. The BDF is often changing
between generations, but the meaning of the slot is not.
Cheers!
- Matt
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-17 17:28 ` Matthew W Carlis
@ 2025-07-17 19:07 ` Bjorn Helgaas
2025-07-17 20:23 ` Lukas Wunner
0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2025-07-17 19:07 UTC (permalink / raw)
To: Matthew W Carlis
Cc: xueshuai, anil.s.keshavamurthy, bhelgaas, bp, davem, linux-edac,
linux-kernel, linux-pci, linux-trace-kernel, lukas, mark.rutland,
mathieu.desnoyers, mhiramat, naveen, oleg, peterz, rostedt,
tianruidong, tony.luck, Ilpo Järvinen, Jonathan Cameron
[+cc Ilpo, Jonathan]
On Thu, Jul 17, 2025 at 11:28:26AM -0600, Matthew W Carlis wrote:
> A bit late to the discussion here.. Looks like "too late" in fact,
> but I wanted to just make some comments.
Not too late, thanks for your thoughts! When I apply things, I
consider them a draft with intention to go upstream, but not
immutable. If it makes sense to revise or postpone, we can still do
that.
> On Tue, 12 May 2025, Shuai Xue wrote:
> > Hotplug events are critical indicators for analyzing hardware
> > health,
>
> In terms of a "hot plug" event I'm not actually sure what that
> means. I mean to say that the spec has some room for different
> implementations. I think sometimes that means a presence detect
> state change event, but a system is not required to implement a
> presence pin (at least not for the Slot Status presence). Some
> vendors support an "inband" presence which is when the LTSSM
> essentially asserts presence if the link is active and deasserts it
> when the link is down.
>
> Appendix I in the newer PCIe specs say to use data link layer state
> change event if presence is not implemented. It looks like this
> tracepoint would still work, but its just something to keep in mind.
> At the risk of including too much information I could see it also
> being useful to put the device/vendor IDs of the DSP and the EP into
> the trace event for link up. Perhaps even the link speed/width cap
> for DSP/EP. The real challenge with tracking a fleet is getting all
> the things you care about into one place.
>
> On Tue, 20 May 2025, Lukas Wunner wrote:
> > Link speed changes and device plug/unplug events are orthogonal
>
> I guess what I wanted to get at here were some of the discussion
> from Lukas & Ilpo. I think it makes sense to separate presence
> events from link events, but I think it would make sense to have a
> "link tracepoint" which reports previous and new speed. One of those
> speeds being DOWN/DISABLED etc. Width could be in there as well. I
> have seen many times now an engineer become confused about checking
> speed because "Current Link Speed" & "Negotiated Link Width" are
> "undefined" when "Data Link Layer Active" bit is unset. Ideally a
> solution here would be immediately clear to the user.
>
> When it comes to tracking things across a "fleet" having the slot
> number of the device is extremely useful. We have an internal
> specification for our slot number assignments that allows us to
> track meaning across different generations of hardware or different
> architectures. The BDF is often changing between generations, but
> the meaning of the slot is not.
All the tracepoints here already include:
- pci_name() (the bus/device/function)
- slot_name() (which I think comes from make_slot_name(); would you
want something else?)
and IIUC, it would be helpful for you to add:
- DSP Vendor/Device ID (the Root Port or Switch Downstream Port,
which is relatively static, so seems less useful to me than the
USP/EP would be)
- USP/EP Vendor/Device ID
And you would consider adding a new format for "Link Up" that would
include the above plus current link speed/width? I expect we will
likely see new tracepoints similar to "Link Up" for link speed/width
changes done by bwctrl, and this would definitely make sense for
those.
As a consumer of tracepoints, do you have an opinion on the event
string? I wonder if spaces in the strings complicate searching and
scripting? I don't think tracepoints necessarily need to match text
in dmesg exactly because I suspect they're mostly processed
mechanically. But I'm not a tracepoint user myself (yet), and about
20% of existing tracepoints already include spaces, so maybe it's not
a concern.
Bjorn
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-17 6:00 ` Shuai Xue
@ 2025-07-17 19:29 ` Bjorn Helgaas
2025-07-21 8:55 ` Ilpo Järvinen
1 sibling, 0 replies; 37+ messages in thread
From: Bjorn Helgaas @ 2025-07-17 19:29 UTC (permalink / raw)
To: Shuai Xue
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong, Ilpo Järvinen,
Jonathan Cameron
On Thu, Jul 17, 2025 at 02:00:14PM +0800, Shuai Xue wrote:
> 在 2025/7/17 06:25, Bjorn Helgaas 写道:
> ...
> Thank you again for applying this to pci/trace for v6.17. If there’s
> anything more to tweak in the commit log or event text, please let me
> know.
Is there any convention for documenting tracepoints somewhere? It
looks like there's some doc in Documentation/trace/? Should we be
adding something there?
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-17 19:07 ` Bjorn Helgaas
@ 2025-07-17 20:23 ` Lukas Wunner
2025-07-17 23:27 ` Matthew W Carlis
0 siblings, 1 reply; 37+ messages in thread
From: Lukas Wunner @ 2025-07-17 20:23 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Matthew W Carlis, xueshuai, anil.s.keshavamurthy, bhelgaas, bp,
davem, linux-edac, linux-kernel, linux-pci, linux-trace-kernel,
mark.rutland, mathieu.desnoyers, mhiramat, naveen, oleg, peterz,
rostedt, tianruidong, tony.luck, Ilpo Järvinen,
Jonathan Cameron
On Thu, Jul 17, 2025 at 02:07:50PM -0500, Bjorn Helgaas wrote:
> and IIUC, it would be helpful for you to add:
>
> - DSP Vendor/Device ID (the Root Port or Switch Downstream Port,
> which is relatively static, so seems less useful to me than the
> USP/EP would be)
Right, this is already logged in dmesg upon enumeration of the hotplug port,
as well as available via lspci.
> - USP/EP Vendor/Device ID
There's no 1:1 relation between link or presence events on the one hand,
and enumeration of hotplugged components on the other hand: The link
may go up but the kernel may fail to enumerate the component, e.g. because
it was yanked before it could be enumerated, or because the kernel has run
out of MMIO space or bus numbers.
Hence this would have to be logged through a separate tracepoint in
pciehp_configure_device(), not by changing the tracepoints added here.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-17 20:23 ` Lukas Wunner
@ 2025-07-17 23:27 ` Matthew W Carlis
2025-07-17 23:50 ` Bjorn Helgaas
0 siblings, 1 reply; 37+ messages in thread
From: Matthew W Carlis @ 2025-07-17 23:27 UTC (permalink / raw)
To: lukas
Cc: anil.s.keshavamurthy, bhelgaas, bp, davem, helgaas, ilpo.jarvinen,
linux-edac, linux-kernel, linux-pci, linux-trace-kernel,
mark.rutland, mathieu.desnoyers, mattc, mhiramat, naveen, oleg,
peterz, rostedt, tianruidong, tony.luck, xueshuai
On Thu, 17 Jul 2025, Bjorn Helgaas wrote:
> - slot_name() (which I think comes from make_slot_name(); would you
> want something else?)
afaik it ends up coming from the Slot Cap Register "Physical Slot Number" bits.
I brought up the slot to just say that I was happy to see it & that it is useful
for our purposes & why.
On Thu, 17 Jul 2025, Lukas Wunner wrote:
>> and IIUC, it would be helpful for you to add:
>>
>> - DSP Vendor/Device ID (the Root Port or Switch Downstream Port,
>> which is relatively static, so seems less useful to me than the
>> USP/EP would be)
>
> Right, this is already logged in dmesg upon enumeration of the hotplug port,
> as well as available via lspci.
I also agree that the DSP Vendor/Device ID is less useful.
>> - USP/EP Vendor/Device ID
>
> There's no 1:1 relation between link or presence events on the one hand,
> and enumeration of hotplugged components on the other hand: The link
> may go up but the kernel may fail to enumerate the component, e.g. because
> it was yanked before it could be enumerated, or because the kernel has run
> out of MMIO space or bus numbers.
>
> Hence this would have to be logged through a separate tracepoint in
> pciehp_configure_device(), not by changing the tracepoints added here.
Ok I think its reasonable to use a separate tracepoint that would have more
information about the EP.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-17 23:27 ` Matthew W Carlis
@ 2025-07-17 23:50 ` Bjorn Helgaas
2025-07-18 3:46 ` Matthew W Carlis
0 siblings, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2025-07-17 23:50 UTC (permalink / raw)
To: Matthew W Carlis
Cc: lukas, anil.s.keshavamurthy, bhelgaas, bp, davem, ilpo.jarvinen,
linux-edac, linux-kernel, linux-pci, linux-trace-kernel,
mark.rutland, mathieu.desnoyers, mhiramat, naveen, oleg, peterz,
rostedt, tianruidong, tony.luck, xueshuai
On Thu, Jul 17, 2025 at 05:27:58PM -0600, Matthew W Carlis wrote:
> On Thu, 17 Jul 2025, Bjorn Helgaas wrote:
> > - slot_name() (which I think comes from make_slot_name(); would you
> > want something else?)
>
> afaik it ends up coming from the Slot Cap Register "Physical Slot
> Number" bits. I brought up the slot to just say that I was happy to
> see it & that it is useful for our purposes & why.
I don't think slot_name() is directly from Physical Slot Number
because there's nothing to guarantee that is unique. I think
make_slot_name() starts with PSN but does add things to make sure the
name unique.
> On Thu, 17 Jul 2025, Lukas Wunner wrote:
> >> and IIUC, it would be helpful for you to add:
> ...
> >> - USP/EP Vendor/Device ID
> >
> > There's no 1:1 relation between link or presence events on the one
> > hand, and enumeration of hotplugged components on the other hand:
> > The link may go up but the kernel may fail to enumerate the
> > component, e.g. because it was yanked before it could be
> > enumerated, or because the kernel has run out of MMIO space or bus
> > numbers.
> >
> > Hence this would have to be logged through a separate tracepoint
> > in pciehp_configure_device(), not by changing the tracepoints
> > added here.
> Ok I think its reasonable to use a separate tracepoint that would have more
> information about the EP.
So I think your idea of adding current link speed/width to the "Link
Up" event is still on the table, and that does sound useful to me.
In the future we might add another tracepoint when we enumerate the
device and know the Vendor/Device ID.
Bjorn
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-17 23:50 ` Bjorn Helgaas
@ 2025-07-18 3:46 ` Matthew W Carlis
2025-07-18 5:29 ` Shuai Xue
0 siblings, 1 reply; 37+ messages in thread
From: Matthew W Carlis @ 2025-07-18 3:46 UTC (permalink / raw)
To: helgaas
Cc: anil.s.keshavamurthy, bhelgaas, bp, davem, ilpo.jarvinen,
linux-edac, linux-kernel, linux-pci, linux-trace-kernel, lukas,
mark.rutland, mathieu.desnoyers, mattc, mhiramat, naveen, oleg,
peterz, rostedt, tianruidong, tony.luck, xueshuai
On Thu, Jul 17, 2025 Bjorn Helgaas wrote
> So I think your idea of adding current link speed/width to the "Link
> Up" event is still on the table, and that does sound useful to me.
We're already reading the link status register here to check DLLA so
it would be nice. I guess if everything is healthy we're probably already
at the maximum speed by this point.
> In the future we might add another tracepoint when we enumerate the
> device and know the Vendor/Device ID.
I think we might have someone who would be interested in doing it.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-18 3:46 ` Matthew W Carlis
@ 2025-07-18 5:29 ` Shuai Xue
2025-07-18 16:35 ` Bjorn Helgaas
2025-07-21 10:18 ` Ilpo Järvinen
0 siblings, 2 replies; 37+ messages in thread
From: Shuai Xue @ 2025-07-18 5:29 UTC (permalink / raw)
To: Matthew W Carlis, helgaas, lukas, Ilpo Järvinen
Cc: anil.s.keshavamurthy, bhelgaas, bp, davem, ilpo.jarvinen,
linux-edac, linux-kernel, linux-pci, linux-trace-kernel, lukas,
mark.rutland, mathieu.desnoyers, mhiramat, naveen, oleg, peterz,
rostedt, tianruidong, tony.luck
在 2025/7/18 11:46, Matthew W Carlis 写道:
> On Thu, Jul 17, 2025 Bjorn Helgaas wrote
>> So I think your idea of adding current link speed/width to the "Link
>> Up" event is still on the table, and that does sound useful to me.
>
> We're already reading the link status register here to check DLLA so
> it would be nice. I guess if everything is healthy we're probably already
> at the maximum speed by this point.
>
>> In the future we might add another tracepoint when we enumerate the
>> device and know the Vendor/Device ID.
>
> I think we might have someone who would be interested in doing it.
Hi, all,
IIUC, the current hotplug event (or presence event) is enough for Matthew.
and we would like a new tracepoing for link speed change which reports
speeds.
For hotplug event, I plan to send a new version to
1. address Bjorn' concerns about event strings by removing its spaces.
#define PCI_HOTPLUG_EVENT \
EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP") \
EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN") \
EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT") \
EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "PCI_HOTPLUG_CARD_NOT_PRESENT")
2. address Ilpo comments by moving pci_hp_event to a common place
(include/trace/events/pci.h) so that the new comming can also use it.
For link speed change event (perhaps named as pci_link_event),
I plan to send a seperate patch, which provides:
TP_STRUCT__entry(
__string( port_name, port_name )
__field( unsigned char, cur_bus_speed )
__field( unsigned char, max_bus_speed )
__field( unsigned char, width )
__field( unsigned int, flit_mode )
__field( unsigned char, reason )
),
The reason field is from Lukas ideas which indicates why the link speed
changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
Are you happy with above changes?
Thanks.
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-18 5:29 ` Shuai Xue
@ 2025-07-18 16:35 ` Bjorn Helgaas
2025-07-19 5:23 ` Shuai Xue
2025-07-21 10:18 ` Ilpo Järvinen
1 sibling, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2025-07-18 16:35 UTC (permalink / raw)
To: Shuai Xue
Cc: Matthew W Carlis, lukas, Ilpo Järvinen, anil.s.keshavamurthy,
bhelgaas, bp, davem, linux-edac, linux-kernel, linux-pci,
linux-trace-kernel, mark.rutland, mathieu.desnoyers, mhiramat,
naveen, oleg, peterz, rostedt, tianruidong, tony.luck
On Fri, Jul 18, 2025 at 01:29:18PM +0800, Shuai Xue wrote:
> 在 2025/7/18 11:46, Matthew W Carlis 写道:
> > On Thu, Jul 17, 2025 Bjorn Helgaas wrote
> > > So I think your idea of adding current link speed/width to the "Link
> > > Up" event is still on the table, and that does sound useful to me.
> >
> > We're already reading the link status register here to check DLLA so
> > it would be nice. I guess if everything is healthy we're probably already
> > at the maximum speed by this point.
> >
> > > In the future we might add another tracepoint when we enumerate the
> > > device and know the Vendor/Device ID.
> >
> > I think we might have someone who would be interested in doing it.
>
> IIUC, the current hotplug event (or presence event) is enough for Matthew.
> and we would like a new tracepoing for link speed change which reports
> speeds.
>
> For hotplug event, I plan to send a new version to
>
> 1. address Bjorn' concerns about event strings by removing its spaces.
>
> #define PCI_HOTPLUG_EVENT \
> EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP") \
> EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN") \
> EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT") \
> EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "PCI_HOTPLUG_CARD_NOT_PRESENT")
>
> 2. address Ilpo comments by moving pci_hp_event to a common place
> (include/trace/events/pci.h) so that the new comming can also use it.
>
> For link speed change event (perhaps named as pci_link_event),
> I plan to send a seperate patch, which provides:
>
> TP_STRUCT__entry(
> __string( port_name, port_name )
> __field( unsigned char, cur_bus_speed )
> __field( unsigned char, max_bus_speed )
> __field( unsigned char, width )
> __field( unsigned int, flit_mode )
> __field( unsigned char, reason )
> ),
>
> The reason field is from Lukas ideas which indicates why the link speed
> changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>
> Are you happy with above changes?
Seems good to me.
What do you plan for PCI_HOTPLUG_LINK_UP? It would be nice to have
the link info there since that's sort of a link speed change itself.
Bjorn
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-18 16:35 ` Bjorn Helgaas
@ 2025-07-19 5:23 ` Shuai Xue
2025-07-19 7:11 ` Lukas Wunner
0 siblings, 1 reply; 37+ messages in thread
From: Shuai Xue @ 2025-07-19 5:23 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Matthew W Carlis, lukas, Ilpo Järvinen, anil.s.keshavamurthy,
bhelgaas, bp, davem, linux-edac, linux-kernel, linux-pci,
linux-trace-kernel, mark.rutland, mathieu.desnoyers, mhiramat,
naveen, oleg, peterz, rostedt, tianruidong, tony.luck
在 2025/7/19 00:35, Bjorn Helgaas 写道:
> On Fri, Jul 18, 2025 at 01:29:18PM +0800, Shuai Xue wrote:
>> 在 2025/7/18 11:46, Matthew W Carlis 写道:
>>> On Thu, Jul 17, 2025 Bjorn Helgaas wrote
>>>> So I think your idea of adding current link speed/width to the "Link
>>>> Up" event is still on the table, and that does sound useful to me.
>>>
>>> We're already reading the link status register here to check DLLA so
>>> it would be nice. I guess if everything is healthy we're probably already
>>> at the maximum speed by this point.
>>>
>>>> In the future we might add another tracepoint when we enumerate the
>>>> device and know the Vendor/Device ID.
>>>
>>> I think we might have someone who would be interested in doing it.
>>
>> IIUC, the current hotplug event (or presence event) is enough for Matthew.
>> and we would like a new tracepoing for link speed change which reports
>> speeds.
>>
>> For hotplug event, I plan to send a new version to
>>
>> 1. address Bjorn' concerns about event strings by removing its spaces.
>>
>> #define PCI_HOTPLUG_EVENT \
>> EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP") \
>> EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN") \
>> EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT") \
>> EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "PCI_HOTPLUG_CARD_NOT_PRESENT")
>>
>> 2. address Ilpo comments by moving pci_hp_event to a common place
>> (include/trace/events/pci.h) so that the new comming can also use it.
>>
>> For link speed change event (perhaps named as pci_link_event),
>> I plan to send a seperate patch, which provides:
>>
>> TP_STRUCT__entry(
>> __string( port_name, port_name )
>> __field( unsigned char, cur_bus_speed )
>> __field( unsigned char, max_bus_speed )
>> __field( unsigned char, width )
>> __field( unsigned int, flit_mode )
>> __field( unsigned char, reason )
>> ),
>>
>> The reason field is from Lukas ideas which indicates why the link speed
>> changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>>
>> Are you happy with above changes?
>
> Seems good to me.
>
> What do you plan for PCI_HOTPLUG_LINK_UP? It would be nice to have
> the link info there since that's sort of a link speed change itself.
>
> Bjorn
Hi, Bjorn,
IMMO, link information is best included in the pci_link_event tracepoint,
rather than in the hotplug event itself.
For example, with the a device hotpluged, the trace output looks like this:
$ cat /sys/kernel/debug/tracing/trace_pipe
<...>-120 [002] ..... 104.864051: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_CARD_PRESENT
<...>-120 [002] ..... 104.864081: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_LINK_UP
irq/57-pciehp-120 [002] ..... 104.990434: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:5
irq/57-pciehp-120 [002] ..... 104.992377: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:0
Here, reason 5 corresponds to PCIe hotplug, and reason 0 indicates a PCIe link retrain.
This approach keeps the separation of concerns clear: hotplug events are
tracked by pci_hp_event, while all link-related information, including
speed changes and the reason for the change, is handled by
pci_link_event.
Thanks.
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-19 5:23 ` Shuai Xue
@ 2025-07-19 7:11 ` Lukas Wunner
2025-07-21 13:17 ` Shuai Xue
0 siblings, 1 reply; 37+ messages in thread
From: Lukas Wunner @ 2025-07-19 7:11 UTC (permalink / raw)
To: Shuai Xue
Cc: Bjorn Helgaas, Matthew W Carlis, Ilpo Järvinen,
anil.s.keshavamurthy, bhelgaas, bp, davem, linux-edac,
linux-kernel, linux-pci, linux-trace-kernel, mark.rutland,
mathieu.desnoyers, mhiramat, naveen, oleg, peterz, rostedt,
tianruidong, tony.luck
On Sat, Jul 19, 2025 at 01:23:28PM +0800, Shuai Xue wrote:
> <...>-120 [002] ..... 104.864051: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_CARD_PRESENT
> <...>-120 [002] ..... 104.864081: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_LINK_UP
Somehow I liked the simple "Link Up" and "Card present" strings more
than this. :)
The PCI_HOTPLUG substring repeats what pci_hp_event already betrays,
that this is a hotplug event.
> irq/57-pciehp-120 [002] ..... 104.990434: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:5
> irq/57-pciehp-120 [002] ..... 104.992377: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:0
This contains a lot of terminology specific to PCI *Express*
(versus Conventional PCI or PCI-X). Either it needs to be
"pcie_link_event" or we need to come up with a structure that
works for non-PCIe as well.
PCI links can be tunneled over Thunderbolt, in this case the
link speed is fixed to 2.5 GT/s (USB4 v1.0 sec 11.2.1), but
in reality is governed by the speed of the Thunderbolt fabric
(which can even be asymmetric). Do we want to report the
virtual 2.5 GT/s in this case or the actual Thunderbolt speed?
Or do we want a separate trace event for Thunderbolt?
For Root and Downstream Ports, the physical "port" points "downstream",
whereas for Upstream Ports and Endpoints, the physical "port" points
"upstream". Software interpreting the trace event may want to know
the direction (or whatever one wants to call it) because it cannot
tell from the address 0000:00:03.0 what the PCIe type is. Having to
look this up in lspci seems cumbersome. So it may be worthwhile to
include either the port's direction or the device's PCIe type in the
trace event.
Of course, hotplug only exists at Root or Downstream Ports, so any
trace event generated from the PCIe hotplug driver will pertain to
a downstream-facing port. But the bandwidth controller also binds
to Upstream Ports and its trace events may thus pertain to link speed
changes at an upstream-facing port.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-17 6:00 ` Shuai Xue
2025-07-17 19:29 ` Bjorn Helgaas
@ 2025-07-21 8:55 ` Ilpo Järvinen
1 sibling, 0 replies; 37+ messages in thread
From: Ilpo Järvinen @ 2025-07-21 8:55 UTC (permalink / raw)
To: Shuai Xue, Bjorn Helgaas, rostedt, mhiramat
Cc: Lukas Wunner, linux-pci, LKML, linux-edac, linux-trace-kernel,
bhelgaas, tony.luck, bp, mathieu.desnoyers, oleg, naveen, davem,
anil.s.keshavamurthy, mark.rutland, peterz, tianruidong,
Jonathan Cameron
[-- Attachment #1: Type: text/plain, Size: 6437 bytes --]
Steven or other tracing people, can you check below the
discussion/question on preferred placement of the tracing
related code.
On Thu, 17 Jul 2025, Shuai Xue wrote:
> 在 2025/7/17 06:25, Bjorn Helgaas 写道:
> > [+cc Ilpo, Jonathan (should have been included since the patch has his
> > Reviewed-by)]
> >
>
> Thanks.
>
> > Thanks for the ping; I noticed quite a bit of discussion but didn't
> > follow it myself, so didn't know it was basically all resolved.
The biggest remaining uncertainty related to the placement of some of the
tracepoint code (tracepoint creation and the trace header). None of us
knew the answer so we tried to ask a more informed opinion from others but
that never resulted us getting any answer.
I've a patch which adds tracing to bwctrl [1] (I'll send it officially
once the placement discussion concludes), and in it, I ended up adding
drivers/pci/pci-trace.c as the place where to define the tracepoints and
include/trace/events/pci.h as the trace header placement, whereas this
patch added them both into the hp driver code.
So if you (Bjorn) are fine placing them into each individual driver as
done in this patch and tracing folks don't provide us guidance, we can
go with that approach.
The placement discussion is in this subleaf of the thread:
https://lore.kernel.org/linux-pci/c0cbdd85-9702-40ab-bc97-b51b02b9fc89@linux.alibaba.com/
[1] https://lore.kernel.org/linux-pci/7c289bba-3133-0989-6333-41fc41fe3504@linux.intel.com/1-a.txt
--
i.
> > On Mon, May 12, 2025 at 09:38:39AM +0800, Shuai Xue wrote:
> > > Hotplug events are critical indicators for analyzing hardware health,
> > > particularly in AI supercomputers where surprise link downs can
> > > significantly impact system performance and reliability.
> >
> > I dropped the "particularly in AI supercomputers" part because I think
> > this is relevant in general.
> >
> > > To this end, define a new TRACING_SYSTEM named pci, add a generic RAS
> > > tracepoint for hotplug event to help healthy check, and generate
> > > tracepoints for pcie hotplug event.
> >
> > I'm not quite clear on the difference between "add generic RAS
> > tracepoint for hotplug event" and "generate tracepoints for pcie
> > hotplug event." Are these two different things?
>
> The purpose of this patch is to address the lack of tracepoints for PCIe
> hotplug events in our production environment. In the initial RFC
> version, I defined tracepoints such as "Link Up" and "Link Down"
> specifically for PCIe hotplug. Later, Lukas suggested that these
> tracepoints could be made more generic so that other PCI hotplug drivers
> could also use them.
>
> That’s why, when defining the event, I used a "generic" pci_hotplug_event
> instead of a pcie_hotplug_event. If you're interested in more details
> about this discussion, please refer to this link[1].
>
> [1]https://erol.kernel.org/linux-pci/git/0/commit/?id=0ffd56f572f25bcd6c2265a1863848a18dce0e29
>
> However, currently only PCIe hotplug is using these tracepoints, which
> is why the CREATE_TRACE_POINTS macro is placed in
> drivers/pci/hotplug/pciehp_ctrl.c.
>
> >
> > I see the new TRACE_EVENT(pci_hp_event, ...) definition. Is that what
> > you mean by the "generic RAS tracepoint"?
>
> Yes.
>
>
> >
> > And the five new trace_pci_hp_event() calls that use the TRACE_EVENT
> > are the "tracepoints for PCIe hotplug event"?
>
> Actually, the tracepoints are generic, although right now they are only
> used for PCIe hotplug.
>
> >
> > > Add enum pci_hotplug_event in
> > > include/uapi/linux/pci.h so applications like rasdaemon can register
> > > tracepoint event handlers for it.
> > >
> > > The output like below:
> > >
> > > $ echo 1 > /sys/kernel/debug/tracing/events/pci/pci_hp_event/enable
> > > $ cat /sys/kernel/debug/tracing/trace_pipe
> > > <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0
> > > slot:10, event:Link Down
> > >
> > > <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0
> > > slot:10, event:Card not present
> >
> > > +#define PCI_HOTPLUG_EVENT \
> > > + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \
> > > + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \
> > > + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \
> > > + EMe(PCI_HOTPLUG_CARD_NOT_PRESENT, "Card not present")
> >
> > Running this:
> >
> > $ git grep -E "\<(EM|EMe)\("
> >
> > I notice that these new events don't look like the others, which
> > mostly look like "word" or "event-type" or "VERB object".
> >
> > I'm OK with this, but just giving you a chance to consider what will
> > be the least surprise to users and easiest for grep and shell
> > scripting.
>
> I think this is also common. For example, MF_PAGE_TYPE for
> memory_failure_event uses a similar format:
>
> #define MF_PAGE_TYPE \
> EM ( MF_MSG_KERNEL, "reserved kernel page" ) \
> EM ( MF_MSG_KERNEL_HIGH_ORDER, "high-order kernel page" )
>
>
> and aer_uncorrectable_errors for aer_event:
>
> #define aer_uncorrectable_errors \
> {PCI_ERR_UNC_UND, "Undefined"}, \
> {PCI_ERR_UNC_DLP, "Data Link Protocol Error"}, \
> {PCI_ERR_UNC_SURPDN, "Surprise Down Error"}, \
> {PCI_ERR_UNC_POISON_TLP,"Poisoned TLP"},
>
> >
> > I also noticed capitalization of "Up" and "Down", but not "present"
> > and "not present".
>
> Aha, this is a bit tricky:)
>
> The original kernel log messages are not consistent either:
>
> ctrl_info(ctrl, "Slot(%s): Link Down\n",
> ctrl_info(ctrl, "Slot(%s): Card not present\n",
>
> I tried to keep the output as close as possible to the existing log
> messages. If you prefer a more consistent capitalization style, I can
> send another patch to fix that.
>
>
> >
> > "Card" is only used occasionally and informally in the PCIe spec, and
> > not at all in the context of hotplug of Slot Status (Presence Detect
> > State refers to "adapter in the slot"), but it does match the pciehp
> > dmesg text, so it probably makes sense to use that.
> >
> > Anyway, I applied this on pci/trace for v6.17. If there's anything
> > you want to tweak in the commit log or event text, we can still do
> > that.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=trace
> >
> > Bjorn
>
> Thank you again for applying this to pci/trace for v6.17. If there’s
> anything more to tweak in the commit log or event text, please let me
> know.
>
> Best regards,
>
> Shuai
>
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-18 5:29 ` Shuai Xue
2025-07-18 16:35 ` Bjorn Helgaas
@ 2025-07-21 10:18 ` Ilpo Järvinen
2025-07-22 2:43 ` [PATCH v8] PCI: hotplug: Add a generic RAS tracepoinggt " Shuai Xue
1 sibling, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2025-07-21 10:18 UTC (permalink / raw)
To: Shuai Xue
Cc: Matthew W Carlis, helgaas, Lukas Wunner, anil.s.keshavamurthy,
bhelgaas, bp, davem, linux-edac, LKML, linux-pci,
linux-trace-kernel, mark.rutland, mathieu.desnoyers, mhiramat,
naveen, oleg, peterz, rostedt, tianruidong, tony.luck
[-- Attachment #1: Type: text/plain, Size: 2667 bytes --]
On Fri, 18 Jul 2025, Shuai Xue wrote:
> 在 2025/7/18 11:46, Matthew W Carlis 写道:
> > On Thu, Jul 17, 2025 Bjorn Helgaas wrote
> > > So I think your idea of adding current link speed/width to the "Link
> > > Up" event is still on the table, and that does sound useful to me.
> >
> > We're already reading the link status register here to check DLLA so
> > it would be nice. I guess if everything is healthy we're probably already
> > at the maximum speed by this point.
> >
> > > In the future we might add another tracepoint when we enumerate the
> > > device and know the Vendor/Device ID.
> >
> > I think we might have someone who would be interested in doing it.
>
>
> Hi, all,
>
> IIUC, the current hotplug event (or presence event) is enough for Matthew.
> and we would like a new tracepoing for link speed change which reports
> speeds.
>
> For hotplug event, I plan to send a new version to
>
> 1. address Bjorn' concerns about event strings by removing its spaces.
>
> #define PCI_HOTPLUG_EVENT
> \
> EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP")
> \
> EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN")
> \
> EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT")
> \
> EMe(PCI_HOTPLUG_CARD_NOT_PRESENT,
> "PCI_HOTPLUG_CARD_NOT_PRESENT")
>
> 2. address Ilpo comments by moving pci_hp_event to a common place
> (include/trace/events/pci.h) so that the new comming can also use it.
Ah, I only now noticed you've decided to re-place them. Please disregard
my other comment about this being still open/undecided item.
> For link speed change event (perhaps named as pci_link_event),
> I plan to send a seperate patch, which provides:
>
> TP_STRUCT__entry(
> __string( port_name, port_name )
> __field( unsigned char, cur_bus_speed )
> __field( unsigned char, max_bus_speed )
> __field( unsigned char, width )
> __field( unsigned int, flit_mode )
> __field( unsigned char, reason )
> ),
>
> The reason field is from Lukas ideas which indicates why the link speed
> changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>
> Are you happy with above changes?
Since you're probably quite far with the pcie link event patch too given
above, could you take a look at the LNKSTA flags representation in my
patch and incorporate those as well as there seems to always lot of
uncertainty about those flags when investigating the LBMS/bwctrl related
issues so it seems prudent to explicitly include them into the traceevent
output:
https://lore.kernel.org/linux-pci/7c289bba-3133-0989-6333-41fc41fe3504@linux.intel.com/
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-19 7:11 ` Lukas Wunner
@ 2025-07-21 13:17 ` Shuai Xue
2025-07-26 7:55 ` Lukas Wunner
0 siblings, 1 reply; 37+ messages in thread
From: Shuai Xue @ 2025-07-21 13:17 UTC (permalink / raw)
To: Lukas Wunner, Bjorn Helgaas, Matthew W Carlis, Ilpo Järvinen
Cc: anil.s.keshavamurthy, bhelgaas, bp, davem, linux-edac,
linux-kernel, linux-pci, linux-trace-kernel, mark.rutland,
mathieu.desnoyers, mhiramat, naveen, oleg, peterz, rostedt,
tianruidong, tony.luck
在 2025/7/19 15:11, Lukas Wunner 写道:
> On Sat, Jul 19, 2025 at 01:23:28PM +0800, Shuai Xue wrote:
>> <...>-120 [002] ..... 104.864051: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_CARD_PRESENT
>> <...>-120 [002] ..... 104.864081: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_LINK_UP
>
> Somehow I liked the simple "Link Up" and "Card present" strings more
> than this. :)
>
> The PCI_HOTPLUG substring repeats what pci_hp_event already betrays,
> that this is a hotplug event.
I think Bjorn's concern is mainly about parsing issues when strings
contain spaces.
So, how about using "Link_UP" and "Card_Present" instead?
>
>> irq/57-pciehp-120 [002] ..... 104.990434: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:5
>> irq/57-pciehp-120 [002] ..... 104.992377: pci_link_event: 0000:00:03.0 cur_bus_speed:20, max_bus_speed:23, width:1, flit_mode:0, reason:0
>
> This contains a lot of terminology specific to PCI *Express*
> (versus Conventional PCI or PCI-X). Either it needs to be
> "pcie_link_event" or we need to come up with a structure that
> works for non-PCIe as well.
>
I see, I will rename it to pcie_link_event.
> PCI links can be tunneled over Thunderbolt, in this case the
> link speed is fixed to 2.5 GT/s (USB4 v1.0 sec 11.2.1), but
> in reality is governed by the speed of the Thunderbolt fabric
> (which can even be asymmetric). Do we want to report the
> virtual 2.5 GT/s in this case or the actual Thunderbolt speed?
> Or do we want a separate trace event for Thunderbolt?
I'm not a user of Thunderbolt, which way do you prefer?
>
> For Root and Downstream Ports, the physical "port" points "downstream",
> whereas for Upstream Ports and Endpoints, the physical "port" points
> "upstream". Software interpreting the trace event may want to know
> the direction (or whatever one wants to call it) because it cannot
> tell from the address 0000:00:03.0 what the PCIe type is. Having to
> look this up in lspci seems cumbersome. So it may be worthwhile to
> include either the port's direction or the device's PCIe type in the
> trace event.
>
> Of course, hotplug only exists at Root or Downstream Ports, so any
> trace event generated from the PCIe hotplug driver will pertain to
> a downstream-facing port. But the bandwidth controller also binds
> to Upstream Ports and its trace events may thus pertain to link speed
> changes at an upstream-facing port.
Got it, i will device's PCIe type,
>
> Thanks,
>
> Lukas
Thanks for valuable comments.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoinggt for hotplug event
2025-07-21 10:18 ` Ilpo Järvinen
@ 2025-07-22 2:43 ` Shuai Xue
2025-07-22 12:29 ` Ilpo Järvinen
0 siblings, 1 reply; 37+ messages in thread
From: Shuai Xue @ 2025-07-22 2:43 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Matthew W Carlis, helgaas, Lukas Wunner, anil.s.keshavamurthy,
bhelgaas, bp, davem, linux-edac, LKML, linux-pci,
linux-trace-kernel, mark.rutland, mathieu.desnoyers, mhiramat,
naveen, oleg, peterz, rostedt, tianruidong, tony.luck
在 2025/7/21 18:18, Ilpo Järvinen 写道:
> On Fri, 18 Jul 2025, Shuai Xue wrote:
>> 在 2025/7/18 11:46, Matthew W Carlis 写道:
>>> On Thu, Jul 17, 2025 Bjorn Helgaas wrote
>>>> So I think your idea of adding current link speed/width to the "Link
>>>> Up" event is still on the table, and that does sound useful to me.
>>>
>>> We're already reading the link status register here to check DLLA so
>>> it would be nice. I guess if everything is healthy we're probably already
>>> at the maximum speed by this point.
>>>
>>>> In the future we might add another tracepoint when we enumerate the
>>>> device and know the Vendor/Device ID.
>>>
>>> I think we might have someone who would be interested in doing it.
>>
>>
>> Hi, all,
>>
>> IIUC, the current hotplug event (or presence event) is enough for Matthew.
>> and we would like a new tracepoing for link speed change which reports
>> speeds.
>>
>> For hotplug event, I plan to send a new version to
>>
>> 1. address Bjorn' concerns about event strings by removing its spaces.
>>
>> #define PCI_HOTPLUG_EVENT
>> \
>> EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP")
>> \
>> EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN")
>> \
>> EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT")
>> \
>> EMe(PCI_HOTPLUG_CARD_NOT_PRESENT,
>> "PCI_HOTPLUG_CARD_NOT_PRESENT")
>>
>> 2. address Ilpo comments by moving pci_hp_event to a common place
>> (include/trace/events/pci.h) so that the new comming can also use it.
>
> Ah, I only now noticed you've decided to re-place them. Please disregard
> my other comment about this being still open/undecided item.
>
>> For link speed change event (perhaps named as pci_link_event),
>> I plan to send a seperate patch, which provides:
>>
>> TP_STRUCT__entry(
>> __string( port_name, port_name )
>> __field( unsigned char, cur_bus_speed )
>> __field( unsigned char, max_bus_speed )
>> __field( unsigned char, width )
>> __field( unsigned int, flit_mode )
>> __field( unsigned char, reason )
>> ),
>>
>> The reason field is from Lukas ideas which indicates why the link speed
>> changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>>
>> Are you happy with above changes?
>
> Since you're probably quite far with the pcie link event patch too given
> above, could you take a look at the LNKSTA flags representation in my
> patch and incorporate those as well as there seems to always lot of
> uncertainty about those flags when investigating the LBMS/bwctrl related
> issues so it seems prudent to explicitly include them into the traceevent
> output:
>
> https://lore.kernel.org/linux-pci/7c289bba-3133-0989-6333-41fc41fe3504@linux.intel.com/
>
>
Sure, Thank you for the feedback.
I like the LNKSTA flags, LNKSTA flags provides better genericity
compared to the custom reason field I initially proposed. But it may
cause confusion when used in pcie_retrain_link(). However, I've
identified a potential issue when this approach is applied in
pcie_retrain_link() scenarios.
Consider the following trace output when a device hotpluged:
$ cat /sys/kernel/debug/tracing/trace_pipe
$ cat /sys/kernel/debug/tracing/trace_pipe
<...>-118 [002] ..... 28.414220: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_CARD_PRESENT
<...>-118 [002] ..... 28.414273: pci_hp_event: 0000:00:03.0 slot:30, event:PCI_HOTPLUG_LINK_UP
irq/57-pciehp-118 [002] ..... 28.540189: pcie_link_event: 0000:00:03.0 type:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
irq/57-pciehp-118 [002] ..... 28.544999: pcie_link_event: 0000:00:03.0 type:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s PCIe, width:1, flit_mode:0, status:DLLLA
The problem is that both trace events show status:DLLLA (Data Link Layer
Link Active), which is the direct reading from PCI_EXP_LNKSTA. However,
this doesn't accurately reflect the underlying context:
- First DLLLA: Triggered by board_added() - link establishment after
card insertion
- Second DLLLA: Triggered by pcie_retrain_link() - link retraining
completion
( I trace the events in pcie_update_link_speed() )
In the second case, the more relevant status would be PCI_EXP_LNKSTA_LT
(Link Training) to indicate that link retraining was performed, even
though the final register state shows DLLLA.
Question: Should we explicitly report the contextual status (e.g.,
PCI_EXP_LNKSTA_LT for retraining scenarios) rather than always reading
the current register field? This would provide more meaningful trace
information for debugging link state transitions.
Additionally, I'd appreciate your thoughts on the overall tracepoint
format shown above. Does this structure provide sufficient information
for hotplug and link analysis while maintaining readability?
Thanks.
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoinggt for hotplug event
2025-07-22 2:43 ` [PATCH v8] PCI: hotplug: Add a generic RAS tracepoinggt " Shuai Xue
@ 2025-07-22 12:29 ` Ilpo Järvinen
2025-07-23 1:29 ` Shuai Xue
0 siblings, 1 reply; 37+ messages in thread
From: Ilpo Järvinen @ 2025-07-22 12:29 UTC (permalink / raw)
To: Shuai Xue
Cc: Matthew W Carlis, helgaas, Lukas Wunner, anil.s.keshavamurthy,
bhelgaas, bp, davem, linux-edac, LKML, linux-pci,
linux-trace-kernel, mark.rutland, mathieu.desnoyers, mhiramat,
naveen, oleg, peterz, rostedt, tianruidong, tony.luck
[-- Attachment #1: Type: text/plain, Size: 6066 bytes --]
On Tue, 22 Jul 2025, Shuai Xue wrote:
> 在 2025/7/21 18:18, Ilpo Järvinen 写道:
> > On Fri, 18 Jul 2025, Shuai Xue wrote:
> > > 在 2025/7/18 11:46, Matthew W Carlis 写道:
> > > > On Thu, Jul 17, 2025 Bjorn Helgaas wrote
> > > > > So I think your idea of adding current link speed/width to the "Link
> > > > > Up" event is still on the table, and that does sound useful to me.
> > > >
> > > > We're already reading the link status register here to check DLLA so
> > > > it would be nice. I guess if everything is healthy we're probably
> > > > already
> > > > at the maximum speed by this point.
> > > >
> > > > > In the future we might add another tracepoint when we enumerate the
> > > > > device and know the Vendor/Device ID.
> > > >
> > > > I think we might have someone who would be interested in doing it.
> > >
> > >
> > > Hi, all,
> > >
> > > IIUC, the current hotplug event (or presence event) is enough for Matthew.
> > > and we would like a new tracepoing for link speed change which reports
> > > speeds.
> > >
> > > For hotplug event, I plan to send a new version to
> > >
> > > 1. address Bjorn' concerns about event strings by removing its spaces.
> > >
> > > #define PCI_HOTPLUG_EVENT
> > > \
> > > EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP")
> > > \
> > > EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN")
> > > \
> > > EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT")
> > > \
> > > EMe(PCI_HOTPLUG_CARD_NOT_PRESENT,
> > > "PCI_HOTPLUG_CARD_NOT_PRESENT")
> > >
> > > 2. address Ilpo comments by moving pci_hp_event to a common place
> > > (include/trace/events/pci.h) so that the new comming can also use it.
> >
> > Ah, I only now noticed you've decided to re-place them. Please disregard
> > my other comment about this being still open/undecided item.
> >
> > > For link speed change event (perhaps named as pci_link_event),
> > > I plan to send a seperate patch, which provides:
> > >
> > > TP_STRUCT__entry(
> > > __string( port_name, port_name )
> > > __field( unsigned char, cur_bus_speed )
> > > __field( unsigned char, max_bus_speed )
> > > __field( unsigned char, width )
> > > __field( unsigned int, flit_mode )
> > > __field( unsigned char, reason )
> > > ),
> > >
> > > The reason field is from Lukas ideas which indicates why the link speed
> > > changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
> > >
> > > Are you happy with above changes?
> >
> > Since you're probably quite far with the pcie link event patch too given
> > above, could you take a look at the LNKSTA flags representation in my
> > patch and incorporate those as well as there seems to always lot of
> > uncertainty about those flags when investigating the LBMS/bwctrl related
> > issues so it seems prudent to explicitly include them into the traceevent
> > output:
> >
> > https://lore.kernel.org/linux-pci/7c289bba-3133-0989-6333-41fc41fe3504@linux.intel.com/
> >
> >
>
> Sure, Thank you for the feedback.
>
> I like the LNKSTA flags, LNKSTA flags provides better genericity
> compared to the custom reason field I initially proposed. But it may
> cause confusion when used in pcie_retrain_link(). However, I've
> identified a potential issue when this approach is applied in
> pcie_retrain_link() scenarios.
I was trying to say the flags should be in addition to the other
information, not replace reason.
> Consider the following trace output when a device hotpluged:
>
> $ cat /sys/kernel/debug/tracing/trace_pipe
> $ cat /sys/kernel/debug/tracing/trace_pipe
> <...>-118 [002] ..... 28.414220: pci_hp_event: 0000:00:03.0
> slot:30, event:PCI_HOTPLUG_CARD_PRESENT
>
> <...>-118 [002] ..... 28.414273: pci_hp_event: 0000:00:03.0
> slot:30, event:PCI_HOTPLUG_LINK_UP
>
> irq/57-pciehp-118 [002] ..... 28.540189: pcie_link_event:
> 0000:00:03.0 type:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s
> PCIe, width:1, flit_mode:0, status:DLLLA
>
> irq/57-pciehp-118 [002] ..... 28.544999: pcie_link_event:
> 0000:00:03.0 type:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s
> PCIe, width:1, flit_mode:0, status:DLLLA
>
> The problem is that both trace events show status:DLLLA (Data Link Layer
> Link Active), which is the direct reading from PCI_EXP_LNKSTA. However,
> this doesn't accurately reflect the underlying context:
>
> - First DLLLA: Triggered by board_added() - link establishment after
> card insertion
> - Second DLLLA: Triggered by pcie_retrain_link() - link retraining
> completion
>
> ( I trace the events in pcie_update_link_speed() )
>
> In the second case, the more relevant status would be PCI_EXP_LNKSTA_LT
> (Link Training) to indicate that link retraining was performed, even
> though the final register state shows DLLLA.
>
> Question: Should we explicitly report the contextual status (e.g.,
> PCI_EXP_LNKSTA_LT for retraining scenarios) rather than always reading
> the current register field? This would provide more meaningful trace
> information for debugging link state transitions.
I'd prefer it coming from the LNKSTA register (TBH, I don't see much value
in synthetizing it at all). If we start to synthetize them, it will
potentially hide hw issues. I see on some platforms two LBMS assertions
per bwctrl speed change (which is done by retraining the link), one with
LT=1 and the second with LT=0.
...But I never meant to replace "reason" with "flags".
> Additionally, I'd appreciate your thoughts on the overall tracepoint
> format shown above. Does this structure provide sufficient information
> for hotplug and link analysis while maintaining readability?
I don't have ideas how it could be improved beyond having those 4 flags
available. I suspect noone does as we've not had ability to collect this
information before when investigating issues so we're yet to understand
all its potential.
--
i.
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoinggt for hotplug event
2025-07-22 12:29 ` Ilpo Järvinen
@ 2025-07-23 1:29 ` Shuai Xue
0 siblings, 0 replies; 37+ messages in thread
From: Shuai Xue @ 2025-07-23 1:29 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: Matthew W Carlis, helgaas, Lukas Wunner, anil.s.keshavamurthy,
bhelgaas, bp, davem, linux-edac, LKML, linux-pci,
linux-trace-kernel, mark.rutland, mathieu.desnoyers, mhiramat,
naveen, oleg, peterz, rostedt, tianruidong, tony.luck
在 2025/7/22 20:29, Ilpo Järvinen 写道:
> On Tue, 22 Jul 2025, Shuai Xue wrote:
>> 在 2025/7/21 18:18, Ilpo Järvinen 写道:
>>> On Fri, 18 Jul 2025, Shuai Xue wrote:
>>>> 在 2025/7/18 11:46, Matthew W Carlis 写道:
>>>>> On Thu, Jul 17, 2025 Bjorn Helgaas wrote
>>>>>> So I think your idea of adding current link speed/width to the "Link
>>>>>> Up" event is still on the table, and that does sound useful to me.
>>>>>
>>>>> We're already reading the link status register here to check DLLA so
>>>>> it would be nice. I guess if everything is healthy we're probably
>>>>> already
>>>>> at the maximum speed by this point.
>>>>>
>>>>>> In the future we might add another tracepoint when we enumerate the
>>>>>> device and know the Vendor/Device ID.
>>>>>
>>>>> I think we might have someone who would be interested in doing it.
>>>>
>>>>
>>>> Hi, all,
>>>>
>>>> IIUC, the current hotplug event (or presence event) is enough for Matthew.
>>>> and we would like a new tracepoing for link speed change which reports
>>>> speeds.
>>>>
>>>> For hotplug event, I plan to send a new version to
>>>>
>>>> 1. address Bjorn' concerns about event strings by removing its spaces.
>>>>
>>>> #define PCI_HOTPLUG_EVENT
>>>> \
>>>> EM(PCI_HOTPLUG_LINK_UP, "PCI_HOTPLUG_LINK_UP")
>>>> \
>>>> EM(PCI_HOTPLUG_LINK_DOWN, "PCI_HOTPLUG_LINK_DOWN")
>>>> \
>>>> EM(PCI_HOTPLUG_CARD_PRESENT, "PCI_HOTPLUG_CARD_PRESENT")
>>>> \
>>>> EMe(PCI_HOTPLUG_CARD_NOT_PRESENT,
>>>> "PCI_HOTPLUG_CARD_NOT_PRESENT")
>>>>
>>>> 2. address Ilpo comments by moving pci_hp_event to a common place
>>>> (include/trace/events/pci.h) so that the new comming can also use it.
>>>
>>> Ah, I only now noticed you've decided to re-place them. Please disregard
>>> my other comment about this being still open/undecided item.
>>>
>>>> For link speed change event (perhaps named as pci_link_event),
>>>> I plan to send a seperate patch, which provides:
>>>>
>>>> TP_STRUCT__entry(
>>>> __string( port_name, port_name )
>>>> __field( unsigned char, cur_bus_speed )
>>>> __field( unsigned char, max_bus_speed )
>>>> __field( unsigned char, width )
>>>> __field( unsigned int, flit_mode )
>>>> __field( unsigned char, reason )
>>>> ),
>>>>
>>>> The reason field is from Lukas ideas which indicates why the link speed
>>>> changed, e.g. "hotplug", "autonomous", "thermal", "retrain", etc.
>>>>
>>>> Are you happy with above changes?
>>>
>>> Since you're probably quite far with the pcie link event patch too given
>>> above, could you take a look at the LNKSTA flags representation in my
>>> patch and incorporate those as well as there seems to always lot of
>>> uncertainty about those flags when investigating the LBMS/bwctrl related
>>> issues so it seems prudent to explicitly include them into the traceevent
>>> output:
>>>
>>> https://lore.kernel.org/linux-pci/7c289bba-3133-0989-6333-41fc41fe3504@linux.intel.com/
>>>
>>>
>>
>> Sure, Thank you for the feedback.
>>
>> I like the LNKSTA flags, LNKSTA flags provides better genericity
>> compared to the custom reason field I initially proposed. But it may
>> cause confusion when used in pcie_retrain_link(). However, I've
>> identified a potential issue when this approach is applied in
>> pcie_retrain_link() scenarios.
>
> I was trying to say the flags should be in addition to the other
> information, not replace reason.
>
>> Consider the following trace output when a device hotpluged:
>>
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> $ cat /sys/kernel/debug/tracing/trace_pipe
>> <...>-118 [002] ..... 28.414220: pci_hp_event: 0000:00:03.0
>> slot:30, event:PCI_HOTPLUG_CARD_PRESENT
>>
>> <...>-118 [002] ..... 28.414273: pci_hp_event: 0000:00:03.0
>> slot:30, event:PCI_HOTPLUG_LINK_UP
>>
>> irq/57-pciehp-118 [002] ..... 28.540189: pcie_link_event:
>> 0000:00:03.0 type:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s
>> PCIe, width:1, flit_mode:0, status:DLLLA
>>
>> irq/57-pciehp-118 [002] ..... 28.544999: pcie_link_event:
>> 0000:00:03.0 type:4, cur_bus_speed:2.5 GT/s PCIe, max_bus_speed:16.0 GT/s
>> PCIe, width:1, flit_mode:0, status:DLLLA
>>
>> The problem is that both trace events show status:DLLLA (Data Link Layer
>> Link Active), which is the direct reading from PCI_EXP_LNKSTA. However,
>> this doesn't accurately reflect the underlying context:
>>
>> - First DLLLA: Triggered by board_added() - link establishment after
>> card insertion
>> - Second DLLLA: Triggered by pcie_retrain_link() - link retraining
>> completion
>>
>> ( I trace the events in pcie_update_link_speed() )
>>
>> In the second case, the more relevant status would be PCI_EXP_LNKSTA_LT
>> (Link Training) to indicate that link retraining was performed, even
>> though the final register state shows DLLLA.
>>
>> Question: Should we explicitly report the contextual status (e.g.,
>> PCI_EXP_LNKSTA_LT for retraining scenarios) rather than always reading
>> the current register field? This would provide more meaningful trace
>> information for debugging link state transitions.
>
> I'd prefer it coming from the LNKSTA register (TBH, I don't see much value
> in synthetizing it at all). If we start to synthetize them, it will
> potentially hide hw issues. I see on some platforms two LBMS assertions
> per bwctrl speed change (which is done by retraining the link), one with
> LT=1 and the second with LT=0.
>
> ...But I never meant to replace "reason" with "flags".
I see, I will keep both reason and flags.
>
>> Additionally, I'd appreciate your thoughts on the overall tracepoint
>> format shown above. Does this structure provide sufficient information
>> for hotplug and link analysis while maintaining readability?
>
> I don't have ideas how it could be improved beyond having those 4 flags
> available. I suspect noone does as we've not had ability to collect this
> information before when investigating issues so we're yet to understand
> all its potential.
>
Aha, agree.
Thanks for valuable coments.
Best Regards,
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-16 22:25 ` Bjorn Helgaas
2025-07-17 6:00 ` Shuai Xue
@ 2025-07-24 22:27 ` Bjorn Helgaas
2025-07-25 4:33 ` Shuai Xue
1 sibling, 1 reply; 37+ messages in thread
From: Bjorn Helgaas @ 2025-07-24 22:27 UTC (permalink / raw)
To: Shuai Xue
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong, Ilpo Järvinen,
Jonathan Cameron
On Wed, Jul 16, 2025 at 05:25:33PM -0500, Bjorn Helgaas wrote:
> ...
> Anyway, I applied this on pci/trace for v6.17. If there's anything
> you want to tweak in the commit log or event text, we can still do
> that.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=trace
Given the ongoing discussion, I deferred this series so we don't end
up adding trace events and immediately changing them next cycle.
Bjorn
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-24 22:27 ` Bjorn Helgaas
@ 2025-07-25 4:33 ` Shuai Xue
0 siblings, 0 replies; 37+ messages in thread
From: Shuai Xue @ 2025-07-25 4:33 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: rostedt, lukas, linux-pci, linux-kernel, linux-edac,
linux-trace-kernel, bhelgaas, tony.luck, bp, mhiramat,
mathieu.desnoyers, oleg, naveen, davem, anil.s.keshavamurthy,
mark.rutland, peterz, tianruidong, Ilpo Järvinen,
Jonathan Cameron
在 2025/7/25 06:27, Bjorn Helgaas 写道:
> On Wed, Jul 16, 2025 at 05:25:33PM -0500, Bjorn Helgaas wrote:
>> ...
>
>> Anyway, I applied this on pci/trace for v6.17. If there's anything
>> you want to tweak in the commit log or event text, we can still do
>> that.
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/pci/pci.git/commit/?h=trace
>
> Given the ongoing discussion, I deferred this series so we don't end
> up adding trace events and immediately changing them next cycle.
>
> Bjorn
Hi, Bjorn,
Thank you for the thoughtful decision to defer the series.
Please review v9 when you have a chance:
https://lore.kernel.org/all/20250723033108.61587-1-xueshuai@linux.alibaba.com/
I believe it addresses the concerns raised in the discussions.
Thanks.
Shuai
^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
2025-07-21 13:17 ` Shuai Xue
@ 2025-07-26 7:55 ` Lukas Wunner
0 siblings, 0 replies; 37+ messages in thread
From: Lukas Wunner @ 2025-07-26 7:55 UTC (permalink / raw)
To: Shuai Xue
Cc: Bjorn Helgaas, Matthew W Carlis, Ilpo Järvinen,
anil.s.keshavamurthy, bhelgaas, bp, davem, linux-edac,
linux-kernel, linux-pci, linux-trace-kernel, mark.rutland,
mathieu.desnoyers, mhiramat, naveen, oleg, peterz, rostedt,
tianruidong, tony.luck
On Mon, Jul 21, 2025 at 09:17:55PM +0800, Shuai Xue wrote:
> 2025/7/19 15:11, Lukas Wunner :
> > PCI links can be tunneled over Thunderbolt, in this case the
> > link speed is fixed to 2.5 GT/s (USB4 v1.0 sec 11.2.1), but
> > in reality is governed by the speed of the Thunderbolt fabric
> > (which can even be asymmetric). Do we want to report the
> > virtual 2.5 GT/s in this case or the actual Thunderbolt speed?
> > Or do we want a separate trace event for Thunderbolt?
>
> I'm not a user of Thunderbolt, which way do you prefer?
Keep reporting the virtual 2.5 GT/s in the PCI tracepoint and
maybe add a separate tracepoint later in the thunderbolt driver
to report the Thunderbolt speed.
Thanks,
Lukas
^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2025-07-26 7:55 UTC | newest]
Thread overview: 37+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-12 1:38 [PATCH v8] PCI: hotplug: Add a generic RAS tracepoint for hotplug event Shuai Xue
2025-05-19 17:10 ` Ilpo Järvinen
2025-05-20 2:36 ` Shuai Xue
2025-05-20 10:07 ` Ilpo Järvinen
2025-05-20 10:44 ` Lukas Wunner
2025-05-20 10:59 ` Ilpo Järvinen
2025-05-20 12:09 ` Lukas Wunner
2025-05-20 12:52 ` Ilpo Järvinen
2025-05-20 13:11 ` Lukas Wunner
2025-05-22 9:50 ` Shuai Xue
2025-05-31 14:15 ` Lukas Wunner
2025-07-16 6:52 ` Shuai Xue
2025-05-22 9:41 ` Shuai Xue
2025-06-02 6:30 ` Ilpo Järvinen
2025-06-23 3:04 ` Shuai Xue
2025-07-16 22:25 ` Bjorn Helgaas
2025-07-17 6:00 ` Shuai Xue
2025-07-17 19:29 ` Bjorn Helgaas
2025-07-21 8:55 ` Ilpo Järvinen
2025-07-24 22:27 ` Bjorn Helgaas
2025-07-25 4:33 ` Shuai Xue
2025-07-17 17:28 ` Matthew W Carlis
2025-07-17 19:07 ` Bjorn Helgaas
2025-07-17 20:23 ` Lukas Wunner
2025-07-17 23:27 ` Matthew W Carlis
2025-07-17 23:50 ` Bjorn Helgaas
2025-07-18 3:46 ` Matthew W Carlis
2025-07-18 5:29 ` Shuai Xue
2025-07-18 16:35 ` Bjorn Helgaas
2025-07-19 5:23 ` Shuai Xue
2025-07-19 7:11 ` Lukas Wunner
2025-07-21 13:17 ` Shuai Xue
2025-07-26 7:55 ` Lukas Wunner
2025-07-21 10:18 ` Ilpo Järvinen
2025-07-22 2:43 ` [PATCH v8] PCI: hotplug: Add a generic RAS tracepoinggt " Shuai Xue
2025-07-22 12:29 ` Ilpo Järvinen
2025-07-23 1:29 ` Shuai Xue
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).