* [PATCH v5 1/3] aerdrv: Trace Event for AER
@ 2012-12-03 21:20 Lance Ortiz
From: Lance Ortiz @ 2012-12-03 21:20 UTC
To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, mchehab,
linux-acpi, linux-pci, linux-kernel

This header file defines a new trace event that is triggered when
an AER event occurs. The following data is provided to the trace
event:

char * dev_name - The name of the slot where the device resides
([domain:]bus:device.function).

u32 status - Either the correctable or uncorrectable register
indicating what error or errors have been seen.

u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED

The trace event will also provide a trace string that may look like:

"0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
TLP"
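
The "Poisoned TLP" part of that string comes from __print_flags() in TP_printk(), which turns the status bits into names. The user-space sketch below models that decoding so the bit-to-text mapping can be tried outside the kernel. It copies the aer_uncorrectable_errors table from the header and assumes __print_flags() behaves as I understand it (every fully matching entry is printed in table order, joined by the delimiter, and leftover bits are appended in hex). The names flag_name, append() and decode_flags() are hypothetical; the correctable table would be handled the same way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BIT(n) (1u << (n))
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* One { mask, name } pair, like the entries of aer_uncorrectable_errors. */
struct flag_name {
	uint32_t mask;
	const char *name;
};

/* Copied from the aer_uncorrectable_errors table in ras.h. */
static const struct flag_name aer_uncor[] = {
	{BIT(4),  "Data Link Protocol"},
	{BIT(12), "Poisoned TLP"},
	{BIT(13), "Flow Control Protocol"},
	{BIT(14), "Completion Timeout"},
	{BIT(15), "Completer Abort"},
	{BIT(16), "Unexpected Completion"},
	{BIT(17), "Receiver Overflow"},
	{BIT(18), "Malformed TLP"},
	{BIT(19), "ECRC"},
	{BIT(20), "Unsupported Request"},
};

/* Append s to the NUL-terminated string in buf, truncating if needed. */
static void append(char *buf, size_t len, const char *s)
{
	size_t used = strlen(buf);

	if (used + 1 < len)
		snprintf(buf + used, len - used, "%s", s);
}

/*
 * Model of __print_flags(status, delim, tbl): print the name of every
 * entry whose mask bits are all set, clear those bits, then append any
 * bits left over as a hex number.
 */
static void decode_flags(char *buf, size_t len, uint32_t status,
			 const char *delim,
			 const struct flag_name *tbl, size_t n)
{
	char hex[16];
	int first = 1;

	assert(len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if ((status & tbl[i].mask) != tbl[i].mask)
			continue;
		if (!first)
			append(buf, len, delim);
		append(buf, len, tbl[i].name);
		status &= ~tbl[i].mask;
		first = 0;
	}
	if (status) {
		snprintf(hex, sizeof(hex), "0x%x", (unsigned int)status);
		if (!first)
			append(buf, len, delim);
		append(buf, len, hex);
	}
}
```

For example, decoding BIT(12) | BIT(14) with the "|" delimiter used in the header gives "Poisoned TLP|Completion Timeout", while BIT(12) | BIT(1) gives "Poisoned TLP|0x2" because bit 1 has no table entry.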
v1-v2 Move header from include/ras/aer_event.h to
include/trace/events/ras.h
v3-v4 Cleaned up comments and commit header
v4-v5 More cleanup remove () from if statement in print.
Renamed string define to be more specific.
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
---
include/trace/events/ras.h | 78 ++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 78 insertions(+), 0 deletions(-)
create mode 100644 include/trace/events/ras.h
diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
new file mode 100644
index 0000000..e6f123e
--- /dev/null
+++ b/include/trace/events/ras.h
@@ -0,0 +1,78 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM aer_event
+#define TRACE_INCLUDE_FILE ras
+
+#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_AER_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+
+
+/*
+ * PCIe AER Trace event
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event on a PCIe device. The event report has
+ * the following structure:
+ *
+ * char * dev_name - The name of the slot where the device resides
+ * ([domain:]bus:device.function).
+ * u32 status - Either the correctable or uncorrectable register
+ * indicating what error or errors have been seen
+ * u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED
+ */
+
+#define aer_correctable_errors \
+ {BIT(0), "Receiver Error"}, \
+ {BIT(6), "Bad TLP"}, \
+ {BIT(7), "Bad DLLP"}, \
+ {BIT(8), "RELAY_NUM Rollover"}, \
+ {BIT(12), "Replay Timer Timeout"}, \
+ {BIT(13), "Advisory Non-Fatal"}
+
+#define aer_uncorrectable_errors \
+ {BIT(4), "Data Link Protocol"}, \
+ {BIT(12), "Poisoned TLP"}, \
+ {BIT(13), "Flow Control Protocol"}, \
+ {BIT(14), "Completion Timeout"}, \
+ {BIT(15), "Completer Abort"}, \
+ {BIT(16), "Unexpected Completion"}, \
+ {BIT(17), "Receiver Overflow"}, \
+ {BIT(18), "Malformed TLP"}, \
+ {BIT(19), "ECRC"}, \
+ {BIT(20), "Unsupported Request"}
+
+TRACE_EVENT(aer_event,
+ TP_PROTO(const char *dev_name,
+ const u32 status,
+ const u8 severity),
+
+ TP_ARGS(dev_name, status, severity),
+
+ TP_STRUCT__entry(
+ __string( dev_name, dev_name )
+ __field( u32, status )
+ __field( u8, severity )
+ ),
+
+ TP_fast_assign(
+ __assign_str(dev_name, dev_name);
+ __entry->status = status;
+ __entry->severity = severity;
+ ),
+
+ TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
+ __get_str(dev_name),
+ __entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
+ __entry->severity == HW_EVENT_ERR_FATAL ?
+ "Fatal" : "Uncorrected",
+ __entry->severity == HW_EVENT_ERR_CORRECTED ?
+ __print_flags(__entry->status, "|", aer_correctable_errors) :
+ __print_flags(__entry->status, "|", aer_uncorrectable_errors))
+);
+
+#endif /* _TRACE_AER_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
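
TP_printk() picks the severity label with a nested ternary on __entry->severity. The user-space sketch below copies that selection so it can be checked against the values actually traced. As far as the patches in this series show, both callers pass AER codes (info->severity in aer_print_error() and the result of cper_severity_to_aer() in cper_print_aer()), matching the 0:NONFATAL 1:FATAL 2:CORRECTED numbering in the header comment, whereas the ternary compares against HW_EVENT_ERR_* from <linux/edac.h>, whose first three members are CORRECTED=0, UNCORRECTED=1, FATAL=2. The constants are redefined locally for illustration and both helper names are hypothetical; this is a sketch of the comparison, not a patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* AER severity codes, as defined in include/linux/aer.h. */
#define AER_NONFATAL    0
#define AER_FATAL       1
#define AER_CORRECTABLE 2

/* First three members of enum hw_event_mc_err_type in <linux/edac.h>. */
enum hw_event_mc_err_type {
	HW_EVENT_ERR_CORRECTED,		/* 0 */
	HW_EVENT_ERR_UNCORRECTED,	/* 1 */
	HW_EVENT_ERR_FATAL,		/* 2 */
};

/* Label selection exactly as written in the posted TP_printk(). */
static const char *severity_label_as_posted(uint8_t severity)
{
	return severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
	       severity == HW_EVENT_ERR_FATAL ? "Fatal" : "Uncorrected";
}

/* The same selection keyed on the AER codes the callers pass in. */
static const char *severity_label_aer(uint8_t severity)
{
	return severity == AER_CORRECTABLE ? "Corrected" :
	       severity == AER_FATAL ? "Fatal" : "Uncorrected";
}
```

Fed the AER codes, severity_label_as_posted() labels a correctable error (2) as "Fatal" and a non-fatal one (0) as "Corrected"; the second ternary, which chooses between the correctable and uncorrectable flag tables, compares against the same HW_EVENT_ERR_CORRECTED value.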

* [PATCH v5 2/3] aerdrv: Enhanced AER logging
From: Lance Ortiz @ 2012-12-03 21:20 UTC
To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, mchehab,
linux-acpi, linux-pci, linux-kernel

This patch will provide a more reliable and easy way for user-space
applications to have access to AER logs rather than reading them from the
message buffer. It also provides a way to notify user-space when an AER
event occurs.

The aer driver is updated to generate a trace event of function 'aer_event'
when a PCIe error is reported over the AER interface. The trace event was
added to both the interrupt based aer path and the firmware first path.

v1-v2 fix compile errors in ifdefs.
v2-v3 Update to new location of trace header. Update print to remove
warning.
v3-v4 Reworked logic when getting ready to call cper_print_aer
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
---

drivers/acpi/apei/cper.c | 19 ++++++++++++++++---
drivers/pci/pcie/aer/aerdrv_errprint.c | 10 +++++++++-
include/linux/aer.h | 2 +-
3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
index e6defd8..4a3e945 100644
--- a/drivers/acpi/apei/cper.c
+++ b/drivers/acpi/apei/cper.c
@@ -29,6 +29,7 @@
 #include <linux/time.h>
 #include <linux/cper.h>
 #include <linux/acpi.h>
+#include <linux/pci.h>
 #include <linux/aer.h>
 
 /*
@@ -249,6 +250,10 @@ static const char *cper_pcie_port_type_strs[] = {
 static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
  const struct acpi_hest_generic_data *gdata)
 {
+#ifdef CONFIG_ACPI_APEI_PCIEAER
+ struct pci_dev *dev;
+#endif
+
  if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE)
  printk("%s""port_type: %d, %s\n", pfx, pcie->port_type,
  pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ?
@@ -281,10 +286,18 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
  "%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n",
  pfx, pcie->bridge.secondary_status, pcie->bridge.control);
 #ifdef CONFIG_ACPI_APEI_PCIEAER
- if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) {
- struct aer_capability_regs *aer_regs = (void *)pcie->aer_info;
- cper_print_aer(pfx, gdata->error_severity, aer_regs);
+ dev = pci_get_domain_bus_and_slot(pcie->device_id.segment,
+ pcie->device_id.bus, pcie->device_id.function);
+ if (!dev) {
+ pr_info("PCI AER Cannot get PCI device %04x:%02x:%02x.%d\n",
+ pcie->device_id.segment, pcie->device_id.bus,
+ pcie->device_id.slot, pcie->device_id.function);
+ return;
  }
+ if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO)
+ cper_print_aer(dev, gdata->error_severity,
+ (struct aer_capability_regs *) pcie->aer_info);
+ pci_dev_put(dev);
 #endif
 }
 
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 3ea5173..34d96e4 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -23,6 +23,9 @@
 
 #include "aerdrv.h"
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/ras.h>
+
 #define AER_AGENT_RECEIVER 0
 #define AER_AGENT_REQUESTER 1
 #define AER_AGENT_COMPLETER 2
@@ -194,6 +197,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
  if (info->id && info->error_dev_num > 1 && info->id == id)
  printk("%s"" Error of this Agent(%04x) is reported first\n",
  prefix, id);
+ trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask),
+ info->severity);
 }
 
 void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info)
@@ -217,12 +222,13 @@ int cper_severity_to_aer(int cper_severity)
 }
 EXPORT_SYMBOL_GPL(cper_severity_to_aer);
 
-void cper_print_aer(const char *prefix, int cper_severity,
+void cper_print_aer(struct pci_dev *dev, int cper_severity,
  struct aer_capability_regs *aer)
 {
  int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0;
  u32 status, mask;
  const char **status_strs;
+ char *prefix = NULL;
 
  aer_severity = cper_severity_to_aer(cper_severity);
  if (aer_severity == AER_CORRECTABLE) {
@@ -259,5 +265,7 @@ void cper_print_aer(const char *prefix, int cper_severity,
  *(tlp + 8), *(tlp + 15), *(tlp + 14),
  *(tlp + 13), *(tlp + 12));
  }
+ trace_aer_event(dev_name(&dev->dev), (status & ~mask),
+ aer_severity);
 }
 #endif
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 544abdb..7b86dc6 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -49,7 +49,7 @@ static inline int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 }
 #endif
 
-extern void cper_print_aer(const char *prefix, int cper_severity,
+extern void cper_print_aer(struct pci_dev *dev, int cper_severity,
  struct aer_capability_regs *aer);
 extern int cper_severity_to_aer(int cper_severity);
 extern void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
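
pci_get_domain_bus_and_slot() takes a domain, a bus number and a devfn, where devfn packs the device (slot) number into bits 7:3 and the function number into bits 2:0. The user-space sketch below reproduces that packing with local copies of the PCI_DEVFN(), PCI_SLOT() and PCI_FUNC() macros and formats the result the way PCI device names are built. It illustrates a point worth checking in the cper.c hunk above: passing pcie->device_id.function on its own as the devfn selects device 0 on that bus, so on this reading a lookup for device 3, function 1 would need the CPER record's device and function numbers combined through PCI_DEVFN(). format_pci_name() is a hypothetical helper, and the assumption that the CPER device_id carries a separate device number should be confirmed against include/linux/cper.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* devfn packing, matching the kernel's PCI_DEVFN/PCI_SLOT/PCI_FUNC macros. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_SLOT(devfn)		(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)		((devfn) & 0x07)

/*
 * Format domain/bus/devfn as "dddd:bb:ss.f", the layout used for PCI
 * device names (the string dev_name() returns for a pci_dev).
 */
static void format_pci_name(char *buf, size_t len, unsigned int domain,
			    unsigned int bus, unsigned int devfn)
{
	assert(devfn <= 0xff);	/* devfn is an 8-bit value */
	snprintf(buf, len, "%04x:%02x:%02x.%u", domain, bus,
		 PCI_SLOT(devfn), PCI_FUNC(devfn));
}
```

For example, PCI_DEVFN(3, 1) is 0x19 and formats as "0000:05:03.1" on bus 5, while a bare devfn of 1 formats as "0000:05:00.1", i.e. function 1 of device 0.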

* Re: [PATCH v5 2/3] aerdrv: Enhanced AER logging
From: Borislav Petkov @ 2012-12-04 9:30 UTC
To: Steven Rostedt
Cc: Lance Ortiz, bhelgaas, lance_ortiz, jiang.liu, tony.luck, mchehab,
linux-acpi, linux-pci, linux-kernel

On Mon, Dec 03, 2012 at 02:20:54PM -0700, Lance Ortiz wrote:
> This patch will provide a more reliable and easy way for user-space
> applications to have access to AER logs rather than reading them from the
> message buffer. It also provides a way to notify user-space when an AER
> event occurs.
>
> The aer driver is updated to generate a trace event of function 'aer_event'
> when a PCIe error is reported over the AER interface. The trace event was
> added to both the interrupt based aer path and the firmware first path.
>
> v1-v2 fix compile errors in ifdefs.
> v2-v3 Update to new location of trace header. Update print to remove
> warning.
> v3-v4 Reworked logic when getting ready to call cper_print_aer
> Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
> ---
>
> drivers/acpi/apei/cper.c | 19 ++++++++++++++++---
> drivers/pci/pcie/aer/aerdrv_errprint.c | 10 +++++++++-
> include/linux/aer.h | 2 +-
> 3 files changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c
> index e6defd8..4a3e945 100644
> --- a/drivers/acpi/apei/cper.c
> +++ b/drivers/acpi/apei/cper.c
> @@ -29,6 +29,7 @@
>  #include <linux/time.h>
>  #include <linux/cper.h>
>  #include <linux/acpi.h>
> +#include <linux/pci.h>
>  #include <linux/aer.h>
>
>  /*
> @@ -249,6 +250,10 @@ static const char *cper_pcie_port_type_strs[] = {
>  static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
>   const struct acpi_hest_generic_data *gdata)
>  {
> +#ifdef CONFIG_ACPI_APEI_PCIEAER
> + struct pci_dev *dev;
> +#endif
> +
>   if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE)
>   printk("%s""port_type: %d, %s\n", pfx, pcie->port_type,
>   pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ?
> @@ -281,10 +286,18 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie,
>   "%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n",
>   pfx, pcie->bridge.secondary_status, pcie->bridge.control);
>  #ifdef CONFIG_ACPI_APEI_PCIEAER
> - if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> - struct aer_capability_regs *aer_regs = (void *)pcie->aer_info;
> - cper_print_aer(pfx, gdata->error_severity, aer_regs);
> + dev = pci_get_domain_bus_and_slot(pcie->device_id.segment,
> + pcie->device_id.bus, pcie->device_id.function);
> + if (!dev) {
> + pr_info("PCI AER Cannot get PCI device %04x:%02x:%02x.%d\n",
> + pcie->device_id.segment, pcie->device_id.bus,
> + pcie->device_id.slot, pcie->device_id.function);
> + return;
>   }
> + if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO)
> + cper_print_aer(dev, gdata->error_severity,
> + (struct aer_capability_regs *) pcie->aer_info);
> + pci_dev_put(dev);
>  #endif
>  }
>
> diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
> index 3ea5173..34d96e4 100644
> --- a/drivers/pci/pcie/aer/aerdrv_errprint.c
> +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
> @@ -23,6 +23,9 @@
>
>  #include "aerdrv.h"
>
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/ras.h>

Steve,

AFAIU, this will create all tracepoint code from the ras.h header
in this compilation unit, i.e. aerdrv_errprint.c. It has only one
tracepoint now but with time, as more RAS TPs are being added, it would
make sense to have that CREATE_TRACE_POINTS code at a more central place
in the kernel, right?

And, on configs with PCIEAER disabled, we won't have the TPs available
so the CREATE_TRACE_POINTS thing should be in a compilation unit which
gets included unconditionally, correct?

Thanks.

--
Regards/Gruss,
Boris.

* Re: [PATCH v5 2/3] aerdrv: Enhanced AER logging
From: Steven Rostedt @ 2012-12-06 1:40 UTC
To: Borislav Petkov
Cc: Lance Ortiz, bhelgaas, lance_ortiz, jiang.liu, tony.luck, mchehab,
linux-acpi, linux-pci, linux-kernel

On Tue, 2012-12-04 at 10:30 +0100, Borislav Petkov wrote:
> Steve,
>
> AFAIU, this will create all tracepoint code from the ras.h header
> in this compilation unit, i.e. aerdrv_errprint.c. It has only one
> tracepoint now but with time, as more RAS TPs are being added, it would
> make sense to have that CREATE_TRACE_POINTS code at a more central place
> in the kernel, right?

Yes.

>
> And, on configs with PCIEAER disabled, we won't have the TPs available
> so the CREATE_TRACE_POINTS thing should be in a compilation unit which
> gets included unconditionally, correct?

You can have a config that enables these trace points, and when you
enable one of the systems that uses them, have that config select the
config that enables tracepoints. Have that config compile the file for
tracepoints. For example, in a Makefile:

obj-$(CONFIG_RAS_TRACE_POINTS) += ras-trace.o


Use #ifdef CONFIG_FOO_BAR around tracepoints that are required for
specific systems if the tracepoints in the header file are for different
subsystems that can be enabled or disabled separately by configs.

--
Steve

* Re: [PATCH v5 2/3] aerdrv: Enhanced AER logging
From: Borislav Petkov @ 2012-12-06 10:24 UTC
To: Steven Rostedt
Cc: Lance Ortiz, bhelgaas, lance_ortiz, jiang.liu, tony.luck, mchehab,
linux-acpi, linux-pci, linux-kernel

On Wed, Dec 05, 2012 at 08:40:31PM -0500, Steven Rostedt wrote:
> You can have a config that enables these trace points, and when you
> enable one of the systems that uses them, have that config select the
> config that enables tracepoints. Have that config compile the file for
> tracepoints. For example, in a Makefile:
>
> obj-$(CONFIG_RAS_TRACE_POINTS) += ras-trace.o
>
>
> Use #ifdef CONFIG_FOO_BAR around tracepoints that are required for
> specific systems if the tracepoints in the header file are for different
> subsystems that can be enabled or disabled separately by configs.

Cool, this sounds exactly like what we should do, I'll hack it up soon.

Thanks a lot.

--
Regards/Gruss,
Boris.

* [PATCH v5 3/3] aerdrv: Cleanup log output for CPER based AER
From: Lance Ortiz @ 2012-12-03 21:21 UTC
To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, mchehab,
linux-acpi, linux-pci, linux-kernel

These changes make cper_print_aer more consistent with aer_print_error,
which is called in the AER interrupt case.

The string in the variable 'prefix' is printed at the beginning of each
print statement in cper_print_aer(). The prefix is a string containing
the driver name and the device's slot location. The value of prefix is
never assigned and remains NULL, so when cper_print_aer prints data the
initial string does not get printed. This string is important because
it identifies the device that the error is on.

v1-v2 fix some compile errors within the #ifdef
v3-v4 remove agent id stuff and kept print the same to avoid
compatibility issues
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
---
drivers/pci/pcie/aer/aerdrv_errprint.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c
index 34d96e4..58ff4c0 100644
--- a/drivers/pci/pcie/aer/aerdrv_errprint.c
+++ b/drivers/pci/pcie/aer/aerdrv_errprint.c
@@ -228,9 +228,14 @@ void cper_print_aer(struct pci_dev *dev, int cper_severity,
  int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0;
  u32 status, mask;
  const char **status_strs;
- char *prefix = NULL;
+ char prefix[44];
 
  aer_severity = cper_severity_to_aer(cper_severity);
+ snprintf(prefix, sizeof(prefix), "%s%s %s: ",
+ (aer_severity == AER_CORRECTABLE) ?
+ KERN_WARNING : KERN_ERR,
+ dev_driver_string(&dev->dev), dev_name(&dev->dev));
+
  if (aer_severity == AER_CORRECTABLE) {
  status = aer->cor_status;
  mask = aer->cor_mask;
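
The patch sizes the prefix buffer at 44 bytes and relies on snprintf() to truncate rather than overflow. The user-space sketch below builds the same "<level><driver> <device>: " string so that budget can be checked. The KERN_SOH-based level strings mirror the two-byte log-level markers printk has used since Linux 3.6 and are redefined locally as an assumption; build_prefix() and the driver names in the checks are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for the kernel's two-byte log-level prefixes. */
#define KERN_SOH     "\001"
#define KERN_WARNING KERN_SOH "4"
#define KERN_ERR     KERN_SOH "3"

#define PREFIX_LEN 44	/* same size as char prefix[44] in the patch */

/*
 * Build "<level><driver> <device>: " like the patch's snprintf() call:
 * warning level for correctable errors, error level otherwise.
 * Returns the length the untruncated prefix would need, as snprintf() does.
 */
static int build_prefix(char *buf, size_t len, int correctable,
			const char *driver, const char *device)
{
	assert(len > 0);
	return snprintf(buf, len, "%s%s %s: ",
			correctable ? KERN_WARNING : KERN_ERR,
			driver, device);
}
```

With a 12-character PCI device name such as "0000:05:00.0", the 2-byte level, the separating space and the trailing ": " leave room for a driver name of up to 26 characters (2 + 26 + 1 + 12 + 2 = 43, plus the terminating NUL); a longer name is cut off instead of overrunning prefix.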

* Re: [PATCH v5 1/3] aerdrv: Trace Event for AER
From: Borislav Petkov @ 2012-12-04 9:22 UTC
To: Lance Ortiz
Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, rostedt, mchehab,
linux-acpi, linux-pci, linux-kernel

On Mon, Dec 03, 2012 at 02:20:48PM -0700, Lance Ortiz wrote:
> This header file will define a new trace event that will be triggered when
> a AER event occurs. The following data will be provided to the trace
> event.
>
> char * dev_name - The name of the slot where the device resides
> ([domain:]bus:device.function).
>
> u32 status - Either the correctable or uncorrectable register
> indicating what error or errors have been see.
>
> u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED
>
> The trace event will also provide a trace string that may look like:
>
> "0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
> TLP"
>
> v1-v2 Move header from include/ras/aer_event.h to
> include/trace/events/ras.h
> v3-v4 Cleaned up comments and commit header
> v4-v5 More cleanup remove () from if statement in print.
> Renamed string define to be more specific.
> Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
> ---
>
> include/trace/events/ras.h | 78 ++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 78 insertions(+), 0 deletions(-)
> create mode 100644 include/trace/events/ras.h
>
> diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h
> new file mode 100644
> index 0000000..e6f123e
> --- /dev/null
> +++ b/include/trace/events/ras.h
> @@ -0,0 +1,78 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM aer_event
> +#define TRACE_INCLUDE_FILE ras

Whoops, I somehow missed that the last time: TRACE_SYSTEM should be
"ras" and then you don't need to define TRACE_INCLUDE_FILE at all.

Thanks.

--
Regards/Gruss,
Boris.
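
Boris's suggestion works because <trace/define_trace.h> falls back to TRACE_SYSTEM when TRACE_INCLUDE_FILE is not defined, and because TRACE_SYSTEM also names the event's directory in the tracing events hierarchy. The user-space sketch below models that fallback with local two-level stringification macros; STRINGIFY(), trace_header() and event_dir() are hypothetical names, and the returned strings are illustrative relative paths, not anything read from a running system.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification so the argument is macro-expanded first. */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

/* Per the review: name the trace system after the header file... */
#define TRACE_SYSTEM ras

/* ...and leave TRACE_INCLUDE_FILE undefined so this fallback applies. */
#ifndef TRACE_INCLUDE_FILE
#define TRACE_INCLUDE_FILE TRACE_SYSTEM
#endif

/* Header that define_trace.h would re-include: trace/events/<file>.h */
static const char *trace_header(void)
{
	return "trace/events/" STRINGIFY(TRACE_INCLUDE_FILE) ".h";
}

/* Directory holding the event's controls: events/<system>/<event> */
static const char *event_dir(void)
{
	return "events/" STRINGIFY(TRACE_SYSTEM) "/aer_event";
}
```

With the posted TRACE_SYSTEM aer_event, the same fallback would look for trace/events/aer_event.h, which is why the patch had to set TRACE_INCLUDE_FILE to ras explicitly, and the event would appear under events/aer_event/aer_event.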