* [PATCH v2 0/2] PCI: hotplug: Add a generic RAS tracepoint for hotplug event
@ 2024-11-12 11:58 Shuai Xue
2024-11-12 11:58 ` [PATCH v2 1/2] " Shuai Xue
2024-11-12 11:58 ` [PATCH v2 2/2] pci: pciehp: Generate tracepoints " Shuai Xue
0 siblings, 2 replies; 5+ messages in thread
From: Shuai Xue @ 2024-11-12 11:58 UTC (permalink / raw)
To: lukas, linux-pci, linux-kernel, linux-edac
Cc: bhelgaas, tony.luck, bp, xueshuai
Hotplug events are critical indicators for analyzing hardware health,
particularly in AI supercomputers where surprise link downs can
significantly impact system performance and reliability. The failure
characterization analysis illustrates the significance of failures
caused by the Infiniband link errors. Meta observes that 2% in a machine
learning cluster and 6% in a vision application cluster of Infiniband
failures co-occur with GPU failures, such as falling off the bus, which
may indicate a correlation with PCIe.[1]
PATCH 1/2: Add a generic RAS tracepoint for hotplug event to help healthy check.
PATCH 2/2: Generate tracepoints for PCIe hotplug event
The output like below:
$ echo 1 > /sys/kernel/debug/tracing/events/hotplug/pci_hp_event/enable
$ cat /sys/kernel/debug/tracing/trace_pipe
<...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, trans_state:Link Down
<...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, trans_state:Card not present
[1]https://arxiv.org/abs/2410.21680
Shuai Xue (2):
PCI: hotplug: Add a generic RAS tracepoint for hotplug event
pci: pciehp: Generate tracepoints for hotplug event
drivers/pci/hotplug/pciehp_ctrl.c | 33 ++++++++++++---
drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++
include/uapi/linux/pci.h | 7 ++++
3 files changed, 102 insertions(+), 6 deletions(-)
create mode 100644 drivers/pci/hotplug/trace.h
--
2.39.3
^ permalink raw reply [flat|nested] 5+ messages in thread* [PATCH v2 1/2] PCI: hotplug: Add a generic RAS tracepoint for hotplug event 2024-11-12 11:58 [PATCH v2 0/2] PCI: hotplug: Add a generic RAS tracepoint for hotplug event Shuai Xue @ 2024-11-12 11:58 ` Shuai Xue 2024-11-13 16:27 ` Lukas Wunner 2024-11-12 11:58 ` [PATCH v2 2/2] pci: pciehp: Generate tracepoints " Shuai Xue 1 sibling, 1 reply; 5+ messages in thread From: Shuai Xue @ 2024-11-12 11:58 UTC (permalink / raw) To: lukas, linux-pci, linux-kernel, linux-edac Cc: bhelgaas, tony.luck, bp, xueshuai Hotplug events are critical indicators for analyzing hardware health, particularly in AI supercomputers where surprise link downs can significantly impact system performance and reliability. The failure characterization analysis illustrates the significance of failures caused by the Infiniband link errors. Meta observes that 2% in a machine learning cluster and 6% in a vision application cluster of Infiniband failures co-occur with GPU failures, such as falling off the bus, which may indicate a correlation with PCIe.[1] Add a generic RAS tracepoint for hotplug event to help healthy check. [1]https://arxiv.org/abs/2410.21680 Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> --- drivers/pci/hotplug/trace.h | 68 +++++++++++++++++++++++++++++++++++++ include/uapi/linux/pci.h | 7 ++++ 2 files changed, 75 insertions(+) create mode 100644 drivers/pci/hotplug/trace.h diff --git a/drivers/pci/hotplug/trace.h b/drivers/pci/hotplug/trace.h new file mode 100644 index 000000000000..a56e22534bc0 --- /dev/null +++ b/drivers/pci/hotplug/trace.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#if !defined(_TRACE_HW_EVENT_PCI_HP_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_HW_EVENT_PCI_HP_H + +#include <linux/tracepoint.h> + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM hotplug + +#define PCI_HP_TRANS_STATE \ + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \ + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \ + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \ + EMe(PCI_HOTPLUG_CARD_NO_PRESENT, "Card not present") + +/* Enums require being exported to userspace, for user tool parsing */ +#undef EM +#undef EMe +#define EM(a, b) TRACE_DEFINE_ENUM(a); +#define EMe(a, b) TRACE_DEFINE_ENUM(a); + +PCI_HP_TRANS_STATE + +/* + * Now redefine the EM() and EMe() macros to map the enums to the strings + * that will be printed in the output. + */ +#undef EM +#undef EMe +#define EM(a, b) {a, b}, +#define EMe(a, b) {a, b} + +TRACE_EVENT(pci_hp_event, + + TP_PROTO(const char *port_name, + const char *slot, + const int trans_state), + + TP_ARGS(port_name, slot, trans_state), + + TP_STRUCT__entry( + __string( port_name, port_name ) + __string( slot, slot ) + __field( int, trans_state ) + ), + + TP_fast_assign( + __assign_str(port_name); + __assign_str(slot); + __entry->trans_state = trans_state; + ), + + TP_printk("%s slot:%s, trans_state:%s\n", + __get_str(port_name), + __get_str(slot), + __print_symbolic(__entry->trans_state, PCI_HP_TRANS_STATE) + ) +); + +#endif /* _TRACE_HW_EVENT_PCI_HP_H */ + +#undef TRACE_INCLUDE_PATH +#define TRACE_INCLUDE_PATH ../../drivers/pci/hotplug +#undef TRACE_INCLUDE_FILE +#define TRACE_INCLUDE_FILE trace + +/* This part must be outside protection */ +#include <trace/define_trace.h> diff --git a/include/uapi/linux/pci.h b/include/uapi/linux/pci.h index a769eefc5139..1f751785a43d 100644 --- a/include/uapi/linux/pci.h +++ b/include/uapi/linux/pci.h @@ -39,4 +39,11 @@ #define PCIIOC_MMAP_IS_MEM (PCIIOC_BASE | 0x02) /* Set mmap state to MEM space. */ #define PCIIOC_WRITE_COMBINE (PCIIOC_BASE | 0x03) /* Enable/disable write-combining. */ +enum pci_hotplug_trans_type { + PCI_HOTPLUG_LINK_UP, + PCI_HOTPLUG_LINK_DOWN, + PCI_HOTPLUG_CARD_PRESENT, + PCI_HOTPLUG_CARD_NO_PRESENT, +}; + #endif /* _UAPILINUX_PCI_H */ -- 2.39.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 1/2] PCI: hotplug: Add a generic RAS tracepoint for hotplug event 2024-11-12 11:58 ` [PATCH v2 1/2] " Shuai Xue @ 2024-11-13 16:27 ` Lukas Wunner 2024-11-14 1:39 ` Shuai Xue 0 siblings, 1 reply; 5+ messages in thread From: Lukas Wunner @ 2024-11-13 16:27 UTC (permalink / raw) To: Shuai Xue; +Cc: linux-pci, linux-kernel, linux-edac, bhelgaas, tony.luck, bp On Tue, Nov 12, 2024 at 07:58:51PM +0800, Shuai Xue wrote: > Hotplug events are critical indicators for analyzing hardware health, > particularly in AI supercomputers where surprise link downs can > significantly impact system performance and reliability. The failure > characterization analysis illustrates the significance of failures > caused by the Infiniband link errors. Meta observes that 2% in a machine > learning cluster and 6% in a vision application cluster of Infiniband > failures co-occur with GPU failures, such as falling off the bus, which > may indicate a correlation with PCIe.[1] > > Add a generic RAS tracepoint for hotplug event to help healthy check. It would be good if you could mention in the commit message that you intend to use rasdaemon for monitoring these tracepoints and that you're hence adding the enum to a uapi header. So that if someone wonders later on why you chose a uapi header, they will find breadcrumbs in the commit message. I think both patches can be squashed into one. I'm not an expert on tracepoints, so when respinning, perhaps you could cc linux-trace-kernel@vger.kernel.org and maybe also tracing maintainers so that subject matter experts can look the patch over and ack it. > +#undef TRACE_SYSTEM > +#define TRACE_SYSTEM hotplug Maybe "pci_hotplug" to differentiate this from, say, cpu hotplug? > +#define PCI_HP_TRANS_STATE \ > + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \ > + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \ > + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \ > + EMe(PCI_HOTPLUG_CARD_NO_PRESENT, "Card not present") PCI_HOTPLUG_CARD_NOT_PRESENT would be neater I think. ^ > +PCI_HP_TRANS_STATE Not sure what "trans state" stands for, maybe "state transition"? What if we add tracepoints going forward which aren't for state transitions but other types of events, such as "Power Failure"? Perhaps something more generic such as "PCI_HOTPLUG_EVENT" would be more apt? > +enum pci_hotplug_trans_type { > + PCI_HOTPLUG_LINK_UP, > + PCI_HOTPLUG_LINK_DOWN, > + PCI_HOTPLUG_CARD_PRESENT, > + PCI_HOTPLUG_CARD_NO_PRESENT, > +}; I note that this is called "pci_hotplug_trans_type", perhaps for consistency use "pci_hotplug_trans_state" as everywhere else (or whatever you choose to substitute the name for, see above). Other than these cosmetic things, the patches LGTM. Again, my knowledge on tracepoints is superficial, but broadly it looks reasonable. Thanks, Lukas ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 1/2] PCI: hotplug: Add a generic RAS tracepoint for hotplug event 2024-11-13 16:27 ` Lukas Wunner @ 2024-11-14 1:39 ` Shuai Xue 0 siblings, 0 replies; 5+ messages in thread From: Shuai Xue @ 2024-11-14 1:39 UTC (permalink / raw) To: Lukas Wunner; +Cc: linux-pci, linux-kernel, linux-edac, bhelgaas, tony.luck, bp 在 2024/11/14 00:27, Lukas Wunner 写道: > On Tue, Nov 12, 2024 at 07:58:51PM +0800, Shuai Xue wrote: >> Hotplug events are critical indicators for analyzing hardware health, >> particularly in AI supercomputers where surprise link downs can >> significantly impact system performance and reliability. The failure >> characterization analysis illustrates the significance of failures >> caused by the Infiniband link errors. Meta observes that 2% in a machine >> learning cluster and 6% in a vision application cluster of Infiniband >> failures co-occur with GPU failures, such as falling off the bus, which >> may indicate a correlation with PCIe.[1] >> >> Add a generic RAS tracepoint for hotplug event to help healthy check. > > It would be good if you could mention in the commit message that > you intend to use rasdaemon for monitoring these tracepoints > and that you're hence adding the enum to a uapi header. > So that if someone wonders later on why you chose a uapi header, > they will find breadcrumbs in the commit message. Got it, will add it. > > I think both patches can be squashed into one. Will do it. > > I'm not an expert on tracepoints, so when respinning, perhaps you > could cc linux-trace-kernel@vger.kernel.org and maybe also tracing > maintainers so that subject matter experts can look the patch over > and ack it. Sorry, I forget it. Will add the maillist. > > >> +#undef TRACE_SYSTEM >> +#define TRACE_SYSTEM hotplug > > Maybe "pci_hotplug" to differentiate this from, say, cpu hotplug? Yes, will fix it. > > >> +#define PCI_HP_TRANS_STATE \ >> + EM(PCI_HOTPLUG_LINK_UP, "Link Up") \ >> + EM(PCI_HOTPLUG_LINK_DOWN, "Link Down") \ >> + EM(PCI_HOTPLUG_CARD_PRESENT, "Card present") \ >> + EMe(PCI_HOTPLUG_CARD_NO_PRESENT, "Card not present") > > PCI_HOTPLUG_CARD_NOT_PRESENT would be neater I think. Ok, will rename it. > ^ > >> +PCI_HP_TRANS_STATE > > Not sure what "trans state" stands for, maybe "state transition"? > What if we add tracepoints going forward which aren't for state > transitions but other types of events, such as "Power Failure"? > Perhaps something more generic such as "PCI_HOTPLUG_EVENT" would > be more apt? > I see, will rename as PCI_HOTPLUG_EVENT. > >> +enum pci_hotplug_trans_type { >> + PCI_HOTPLUG_LINK_UP, >> + PCI_HOTPLUG_LINK_DOWN, >> + PCI_HOTPLUG_CARD_PRESENT, >> + PCI_HOTPLUG_CARD_NO_PRESENT, >> +}; > > I note that this is called "pci_hotplug_trans_type", perhaps for > consistency use "pci_hotplug_trans_state" as everywhere else > (or whatever you choose to substitute the name for, see above). > > Other than these cosmetic things, the patches LGTM. > Again, my knowledge on tracepoints is superficial, but broadly > it looks reasonable. > > Thanks, > > Lukas Thank you for valuable comments. Best Regards, Shuai ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 2/2] pci: pciehp: Generate tracepoints for hotplug event 2024-11-12 11:58 [PATCH v2 0/2] PCI: hotplug: Add a generic RAS tracepoint for hotplug event Shuai Xue 2024-11-12 11:58 ` [PATCH v2 1/2] " Shuai Xue @ 2024-11-12 11:58 ` Shuai Xue 1 sibling, 0 replies; 5+ messages in thread From: Shuai Xue @ 2024-11-12 11:58 UTC (permalink / raw) To: lukas, linux-pci, linux-kernel, linux-edac Cc: bhelgaas, tony.luck, bp, xueshuai Generate tracepoints for hotplug event. The output like below: $ echo 1 > /sys/kernel/debug/tracing/events/hotplug/pci_hp_event/enable $ cat /sys/kernel/debug/tracing/trace_pipe <...>-206 [001] ..... 40.373870: pci_hp_event: 0000:00:02.0 slot:10, trans_state:Link Down <...>-206 [001] ..... 40.374871: pci_hp_event: 0000:00:02.0 slot:10, trans_state:Card not present Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com> --- drivers/pci/hotplug/pciehp_ctrl.c | 33 +++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-) diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c index dcdbfcf404dd..ba099cb19f5e 100644 --- a/drivers/pci/hotplug/pciehp_ctrl.c +++ b/drivers/pci/hotplug/pciehp_ctrl.c @@ -21,6 +21,9 @@ #include <linux/pci.h> #include "pciehp.h" +#define CREATE_TRACE_POINTS +#include "trace.h" + /* The following routines constitute the bulk of the hotplug controller logic */ @@ -239,12 +242,20 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events) case ON_STATE: ctrl->state = POWEROFF_STATE; mutex_unlock(&ctrl->state_lock); - if (events & PCI_EXP_SLTSTA_DLLSC) + if (events & PCI_EXP_SLTSTA_DLLSC) { ctrl_info(ctrl, "Slot(%s): Link Down\n", slot_name(ctrl)); - if (events & PCI_EXP_SLTSTA_PDC) + trace_pci_hp_event(pci_name(ctrl->pcie->port), + slot_name(ctrl), + PCI_HOTPLUG_LINK_DOWN); + } + if (events & PCI_EXP_SLTSTA_PDC) { ctrl_info(ctrl, "Slot(%s): Card not present\n", slot_name(ctrl)); + trace_pci_hp_event(pci_name(ctrl->pcie->port), + slot_name(ctrl), + PCI_HOTPLUG_CARD_NO_PRESENT); + } pciehp_disable_slot(ctrl, SURPRISE_REMOVAL); break; default: @@ -264,6 +275,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events) INDICATOR_NOOP); ctrl_info(ctrl, "Slot(%s): Card not present\n", slot_name(ctrl)); + trace_pci_hp_event(pci_name(ctrl->pcie->port), + slot_name(ctrl), + PCI_HOTPLUG_CARD_NO_PRESENT); } mutex_unlock(&ctrl->state_lock); return; @@ -276,12 +290,19 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events) case OFF_STATE: ctrl->state = POWERON_STATE; mutex_unlock(&ctrl->state_lock); - if (present) + if (present) { ctrl_info(ctrl, "Slot(%s): Card present\n", slot_name(ctrl)); - if (link_active) - ctrl_info(ctrl, "Slot(%s): Link Up\n", - slot_name(ctrl)); + trace_pci_hp_event(pci_name(ctrl->pcie->port), + slot_name(ctrl), + PCI_HOTPLUG_CARD_PRESENT); + } + if (link_active) { + ctrl_info(ctrl, "Slot(%s): Link Up\n", slot_name(ctrl)); + trace_pci_hp_event(pci_name(ctrl->pcie->port), + slot_name(ctrl), + PCI_HOTPLUG_LINK_UP); + } ctrl->request_result = pciehp_enable_slot(ctrl); break; default: -- 2.39.3 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-11-14 1:39 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-11-12 11:58 [PATCH v2 0/2] PCI: hotplug: Add a generic RAS tracepoint for hotplug event Shuai Xue 2024-11-12 11:58 ` [PATCH v2 1/2] " Shuai Xue 2024-11-13 16:27 ` Lukas Wunner 2024-11-14 1:39 ` Shuai Xue 2024-11-12 11:58 ` [PATCH v2 2/2] pci: pciehp: Generate tracepoints " Shuai Xue
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox