* [PATCH v6 1/3] aerdrv: Trace Event for AER @ 2012-12-04 17:04 Lance Ortiz 2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: Lance Ortiz @ 2012-12-04 17:04 UTC (permalink / raw) To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, mchehab, linux-acpi, linux-pci, linux-kernel This header file will define a new trace event that will be triggered when a AER event occurs. The following data will be provided to the trace event. char * dev_name - The name of the slot where the device resides ([domain:]bus:device.function). u32 status - Either the correctable or uncorrectable register indicating what error or errors have been see. u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED The trace event will also provide a trace string that may look like: "0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned TLP" v1-v2 Move header from include/ras/aer_event.h to include/trace/events/ras.h v3-v4 Cleaned up comments and commit header v4-v5 More cleanup remove () from if statement in print. Renamed string define to be more specific. v5-v6 change TRACE_SYSTEM define to be ras and not aer. Signed-off-by: Lance Ortiz <lance.ortiz@hp.com> --- 0 files changed, 0 insertions(+), 0 deletions(-) diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h new file mode 100644 index 0000000..88b8783 --- /dev/null +++ b/include/trace/events/ras.h @@ -0,0 +1,77 @@ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM ras + +#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_AER_H + +#include <linux/tracepoint.h> +#include <linux/edac.h> + + +/* + * PCIe AER Trace event + * + * These events are generated when hardware detects a corrected or + * uncorrected event on a PCIe device. The event report has + * the following structure: + * + * char * dev_name - The name of the slot where the device resides + * ([domain:]bus:device.function). + * u32 status - Either the correctable or uncorrectable register + * indicating what error or errors have been seen + * u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED + */ + +#define aer_correctable_errors \ + {BIT(0), "Receiver Error"}, \ + {BIT(6), "Bad TLP"}, \ + {BIT(7), "Bad DLLP"}, \ + {BIT(8), "RELAY_NUM Rollover"}, \ + {BIT(12), "Replay Timer Timeout"}, \ + {BIT(13), "Advisory Non-Fatal"} + +#define aer_uncorrectable_errors \ + {BIT(4), "Data Link Protocol"}, \ + {BIT(12), "Poisoned TLP"}, \ + {BIT(13), "Flow Control Protocol"}, \ + {BIT(14), "Completion Timeout"}, \ + {BIT(15), "Completer Abort"}, \ + {BIT(16), "Unexpected Completion"}, \ + {BIT(17), "Receiver Overflow"}, \ + {BIT(18), "Malformed TLP"}, \ + {BIT(19), "ECRC"}, \ + {BIT(20), "Unsupported Request"} + +TRACE_EVENT(aer_event, + TP_PROTO(const char *dev_name, + const u32 status, + const u8 severity), + + TP_ARGS(dev_name, status, severity), + + TP_STRUCT__entry( + __string( dev_name, dev_name ) + __field( u32, status ) + __field( u8, severity ) + ), + + TP_fast_assign( + __assign_str(dev_name, dev_name); + __entry->status = status; + __entry->severity = severity; + ), + + TP_printk("%s PCIe Bus Error: severity=%s, %s\n", + __get_str(dev_name), + __entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" : + __entry->severity == HW_EVENT_ERR_FATAL ? + "Fatal" : "Uncorrected", + __entry->severity == HW_EVENT_ERR_CORRECTED ? + __print_flags(__entry->status, "|", aer_correctable_errors) : + __print_flags(__entry->status, "|", aer_uncorrectable_errors)) +); + +#endif /* _TRACE_AER_H */ + +/* This part must be outside protection */ +#include <trace/define_trace.h> ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v6 2/3] aerdrv: Enhanced AER logging 2012-12-04 17:04 [PATCH v6 1/3] aerdrv: Trace Event for AER Lance Ortiz @ 2012-12-04 17:04 ` Lance Ortiz 2012-12-04 18:41 ` Mauro Carvalho Chehab 2012-12-04 17:04 ` [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER Lance Ortiz 2012-12-04 18:36 ` [PATCH v6 1/3] aerdrv: Trace Event for AER Mauro Carvalho Chehab 2 siblings, 1 reply; 10+ messages in thread From: Lance Ortiz @ 2012-12-04 17:04 UTC (permalink / raw) To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, mchehab, linux-acpi, linux-pci, linux-kernel This patch will provide a more reliable and easy way for user-space applications to have access to AER logs rather than reading them from the message buffer. It also provides a way to notify user-space when an AER event occurs. The aer driver is updated to generate a trace event of function 'aer_event' when a PCIe error is reported over the AER interface. The trace event was added to both the interrupt based aer path and the firmware first path. v1-v2 fix compile errors in ifdefs. v2-v3 Update to new location of trace header. Update print to remove warning. v3-v4 Reworked logic when getting ready to call cper_print_aer Signed-off-by: Lance Ortiz <lance.ortiz@hp.com> --- 0 files changed, 0 insertions(+), 0 deletions(-) diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c index e6defd8..4a3e945 100644 --- a/drivers/acpi/apei/cper.c +++ b/drivers/acpi/apei/cper.c @@ -29,6 +29,7 @@ #include <linux/time.h> #include <linux/cper.h> #include <linux/acpi.h> +#include <linux/pci.h> #include <linux/aer.h> /* @@ -249,6 +250,10 @@ static const char *cper_pcie_port_type_strs[] = { static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie, const struct acpi_hest_generic_data *gdata) { +#ifdef CONFIG_ACPI_APEI_PCIEAER + struct pci_dev *dev; +#endif + if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE) printk("%s""port_type: %d, %s\n", pfx, pcie->port_type, pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ? @@ -281,10 +286,18 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie, "%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n", pfx, pcie->bridge.secondary_status, pcie->bridge.control); #ifdef CONFIG_ACPI_APEI_PCIEAER - if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) { - struct aer_capability_regs *aer_regs = (void *)pcie->aer_info; - cper_print_aer(pfx, gdata->error_severity, aer_regs); + dev = pci_get_domain_bus_and_slot(pcie->device_id.segment, + pcie->device_id.bus, pcie->device_id.function); + if (!dev) { + pr_info("PCI AER Cannot get PCI device %04x:%02x:%02x.%d\n", + pcie->device_id.segment, pcie->device_id.bus, + pcie->device_id.slot, pcie->device_id.function); + return; } + if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) + cper_print_aer(dev, gdata->error_severity, + (struct aer_capability_regs *) pcie->aer_info); + pci_dev_put(dev); #endif } diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c index 3ea5173..34d96e4 100644 --- a/drivers/pci/pcie/aer/aerdrv_errprint.c +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c @@ -23,6 +23,9 @@ #include "aerdrv.h" +#define CREATE_TRACE_POINTS +#include <trace/events/ras.h> + #define AER_AGENT_RECEIVER 0 #define AER_AGENT_REQUESTER 1 #define AER_AGENT_COMPLETER 2 @@ -194,6 +197,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) if (info->id && info->error_dev_num > 1 && info->id == id) printk("%s"" Error of this Agent(%04x) is reported first\n", prefix, id); + trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask), + info->severity); } void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) @@ -217,12 +222,13 @@ int cper_severity_to_aer(int cper_severity) } EXPORT_SYMBOL_GPL(cper_severity_to_aer); -void cper_print_aer(const char *prefix, int cper_severity, +void cper_print_aer(struct pci_dev *dev, int cper_severity, struct aer_capability_regs *aer) { int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0; u32 status, mask; const char **status_strs; + char *prefix = NULL; aer_severity = cper_severity_to_aer(cper_severity); if (aer_severity == AER_CORRECTABLE) { @@ -259,5 +265,7 @@ void cper_print_aer(const char *prefix, int cper_severity, *(tlp + 8), *(tlp + 15), *(tlp + 14), *(tlp + 13), *(tlp + 12)); } + trace_aer_event(dev_name(&dev->dev), (status & ~mask), + aer_severity); } #endif diff --git a/include/linux/aer.h b/include/linux/aer.h index 544abdb..7b86dc6 100644 --- a/include/linux/aer.h +++ b/include/linux/aer.h @@ -49,7 +49,7 @@ static inline int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev) } #endif -extern void cper_print_aer(const char *prefix, int cper_severity, +extern void cper_print_aer(struct pci_dev *dev, int cper_severity, struct aer_capability_regs *aer); extern int cper_severity_to_aer(int cper_severity); extern void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn, ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/3] aerdrv: Enhanced AER logging 2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz @ 2012-12-04 18:41 ` Mauro Carvalho Chehab 2012-12-04 20:14 ` Ortiz, Lance E 0 siblings, 1 reply; 10+ messages in thread From: Mauro Carvalho Chehab @ 2012-12-04 18:41 UTC (permalink / raw) To: Lance Ortiz Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, linux-acpi, linux-pci, linux-kernel Em Tue, 04 Dec 2012 10:04:36 -0700 Lance Ortiz <lance.ortiz@hp.com> escreveu: > This patch will provide a more reliable and easy way for user-space > applications to have access to AER logs rather than reading them from the > message buffer. It also provides a way to notify user-space when an AER > event occurs. > > The aer driver is updated to generate a trace event of function 'aer_event' > when a PCIe error is reported over the AER interface. The trace event was > added to both the interrupt based aer path and the firmware first path. > > v1-v2 fix compile errors in ifdefs. > v2-v3 Update to new location of trace header. Update print to remove > warning. > v3-v4 Reworked logic when getting ready to call cper_print_aer > Signed-off-by: Lance Ortiz <lance.ortiz@hp.com> > --- > > 0 files changed, 0 insertions(+), 0 deletions(-) > > diff --git a/drivers/acpi/apei/cper.c b/drivers/acpi/apei/cper.c > index e6defd8..4a3e945 100644 > --- a/drivers/acpi/apei/cper.c > +++ b/drivers/acpi/apei/cper.c > @@ -29,6 +29,7 @@ > #include <linux/time.h> > #include <linux/cper.h> > #include <linux/acpi.h> > +#include <linux/pci.h> > #include <linux/aer.h> > > /* > @@ -249,6 +250,10 @@ static const char *cper_pcie_port_type_strs[] = { > static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie, > const struct acpi_hest_generic_data *gdata) > { > +#ifdef CONFIG_ACPI_APEI_PCIEAER > + struct pci_dev *dev; > +#endif > + > if (pcie->validation_bits & CPER_PCIE_VALID_PORT_TYPE) > printk("%s""port_type: %d, %s\n", pfx, pcie->port_type, > pcie->port_type < ARRAY_SIZE(cper_pcie_port_type_strs) ? > @@ -281,10 +286,18 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie, > "%s""bridge: secondary_status: 0x%04x, control: 0x%04x\n", > pfx, pcie->bridge.secondary_status, pcie->bridge.control); > #ifdef CONFIG_ACPI_APEI_PCIEAER > - if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) { > - struct aer_capability_regs *aer_regs = (void *)pcie->aer_info; > - cper_print_aer(pfx, gdata->error_severity, aer_regs); > + dev = pci_get_domain_bus_and_slot(pcie->device_id.segment, > + pcie->device_id.bus, pcie->device_id.function); > + if (!dev) { > + pr_info("PCI AER Cannot get PCI device %04x:%02x:%02x.%d\n", > + pcie->device_id.segment, pcie->device_id.bus, > + pcie->device_id.slot, pcie->device_id.function); Hmm... please correct if I'm wrong, but an error happened at PCI, and also a kernel bug that prevented it to get the proper PCI device... IMHO, the message here should be stronger, and likely printed via pr_err(). > + return; > } > + if (pcie->validation_bits & CPER_PCIE_VALID_AER_INFO) > + cper_print_aer(dev, gdata->error_severity, > + (struct aer_capability_regs *) pcie->aer_info); > + pci_dev_put(dev); > #endif > } > > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c > index 3ea5173..34d96e4 100644 > --- a/drivers/pci/pcie/aer/aerdrv_errprint.c > +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c > @@ -23,6 +23,9 @@ > > #include "aerdrv.h" > > +#define CREATE_TRACE_POINTS > +#include <trace/events/ras.h> > + > #define AER_AGENT_RECEIVER 0 > #define AER_AGENT_REQUESTER 1 > #define AER_AGENT_COMPLETER 2 > @@ -194,6 +197,8 @@ void aer_print_error(struct pci_dev *dev, struct aer_err_info *info) > if (info->id && info->error_dev_num > 1 && info->id == id) > printk("%s"" Error of this Agent(%04x) is reported first\n", > prefix, id); > + trace_aer_event(dev_name(&dev->dev), (info->status & ~info->mask), > + info->severity); > } > > void aer_print_port_info(struct pci_dev *dev, struct aer_err_info *info) > @@ -217,12 +222,13 @@ int cper_severity_to_aer(int cper_severity) > } > EXPORT_SYMBOL_GPL(cper_severity_to_aer); > > -void cper_print_aer(const char *prefix, int cper_severity, > +void cper_print_aer(struct pci_dev *dev, int cper_severity, > struct aer_capability_regs *aer) > { > int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0; > u32 status, mask; > const char **status_strs; > + char *prefix = NULL; > > aer_severity = cper_severity_to_aer(cper_severity); > if (aer_severity == AER_CORRECTABLE) { > @@ -259,5 +265,7 @@ void cper_print_aer(const char *prefix, int cper_severity, > *(tlp + 8), *(tlp + 15), *(tlp + 14), > *(tlp + 13), *(tlp + 12)); > } > + trace_aer_event(dev_name(&dev->dev), (status & ~mask), > + aer_severity); > } > #endif > diff --git a/include/linux/aer.h b/include/linux/aer.h > index 544abdb..7b86dc6 100644 > --- a/include/linux/aer.h > +++ b/include/linux/aer.h > @@ -49,7 +49,7 @@ static inline int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev) > } > #endif > > -extern void cper_print_aer(const char *prefix, int cper_severity, > +extern void cper_print_aer(struct pci_dev *dev, int cper_severity, > struct aer_capability_regs *aer); > extern int cper_severity_to_aer(int cper_severity); > extern void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn, > After addressing the above: Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> -- Cheers, Mauro ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH v6 2/3] aerdrv: Enhanced AER logging 2012-12-04 18:41 ` Mauro Carvalho Chehab @ 2012-12-04 20:14 ` Ortiz, Lance E 2012-12-04 20:31 ` Mauro Carvalho Chehab 0 siblings, 1 reply; 10+ messages in thread From: Ortiz, Lance E @ 2012-12-04 20:14 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: bhelgaas@google.com, lance_ortiz@hotmail.com, jiang.liu@huawei.com, tony.luck@intel.com, bp@alien8.de, rostedt@goodmis.org, linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org > > + if (!dev) { > > + pr_info("PCI AER Cannot get PCI device > %04x:%02x:%02x.%d\n", > > + pcie->device_id.segment, pcie->device_id.bus, > > + pcie->device_id.slot, pcie->device_id.function); > > Hmm... please correct if I'm wrong, but an error happened at PCI, and > also a > kernel bug that prevented it to get the proper PCI device... > > IMHO, the message here should be stronger, and likely printed via > pr_err(). > Mauro, I modeled this message after other places in the kernel where this function failed. So I figured it would be safe to be consistent there. I agree though that it should be pr_err(). I can make that change. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/3] aerdrv: Enhanced AER logging 2012-12-04 20:14 ` Ortiz, Lance E @ 2012-12-04 20:31 ` Mauro Carvalho Chehab 2012-12-04 20:36 ` Borislav Petkov 0 siblings, 1 reply; 10+ messages in thread From: Mauro Carvalho Chehab @ 2012-12-04 20:31 UTC (permalink / raw) To: Ortiz, Lance E Cc: bhelgaas@google.com, lance_ortiz@hotmail.com, jiang.liu@huawei.com, tony.luck@intel.com, bp@alien8.de, rostedt@goodmis.org, linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org Em Tue, 4 Dec 2012 20:14:10 +0000 "Ortiz, Lance E" <lance.oritz@hp.com> escreveu: > > > + if (!dev) { > > > + pr_info("PCI AER Cannot get PCI device > > %04x:%02x:%02x.%d\n", > > > + pcie->device_id.segment, pcie->device_id.bus, > > > + pcie->device_id.slot, pcie->device_id.function); > > > > Hmm... please correct if I'm wrong, but an error happened at PCI, and > > also a > > kernel bug that prevented it to get the proper PCI device... > > > > IMHO, the message here should be stronger, and likely printed via > > pr_err(). > > > > Mauro, > > I modeled this message after other places in the kernel where this function failed. So I figured it would be safe to be consistent there. I agree though that it should be pr_err(). I can make that change. I understand. On most cases, this may not be a critical issue. However, in this particular case, if PCI AER got an error, but the device is not found when trying to handle it, it can be an indication that the PCI device has a more serious issue. So, I'm in favor of changing it, and likely be more verbose at the error message, saying that the device was not found while trying to report an error condition that happened there. It could make sense to even send a trace for the daemon to be aware of the error, on some pci device that vanished likely due to the error. Regards, Mauro ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/3] aerdrv: Enhanced AER logging 2012-12-04 20:31 ` Mauro Carvalho Chehab @ 2012-12-04 20:36 ` Borislav Petkov 0 siblings, 0 replies; 10+ messages in thread From: Borislav Petkov @ 2012-12-04 20:36 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: Ortiz, Lance E, bhelgaas@google.com, lance_ortiz@hotmail.com, jiang.liu@huawei.com, tony.luck@intel.com, rostedt@goodmis.org, linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org On Tue, Dec 04, 2012 at 06:31:10PM -0200, Mauro Carvalho Chehab wrote: > However, in this particular case, if PCI AER got an error, but the > device is not found when trying to handle it, it can be an indication > that the PCI device has a more serious issue. So, I'm in favor > of changing it, and likely be more verbose at the error message, > saying that the device was not found while trying to report an error > condition that happened there. It could make sense to even send a > trace for the daemon to be aware of the error, on some pci device that > vanished likely due to the error. Let's leave it at pr_err now and if we actually start seeing errors like that and decide that they need more verbose reporting, then to change it to whatever's best. IOW, do the empirical approach. Thanks. -- Regards/Gruss, Boris. ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER 2012-12-04 17:04 [PATCH v6 1/3] aerdrv: Trace Event for AER Lance Ortiz 2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz @ 2012-12-04 17:04 ` Lance Ortiz 2012-12-04 17:39 ` Joe Perches 2012-12-04 18:36 ` [PATCH v6 1/3] aerdrv: Trace Event for AER Mauro Carvalho Chehab 2 siblings, 1 reply; 10+ messages in thread From: Lance Ortiz @ 2012-12-04 17:04 UTC (permalink / raw) To: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, mchehab, linux-acpi, linux-pci, linux-kernel These changes make cper_print_aer more consistent with aer_print_error which is called in the AER interrupt case. The string in the variable 'prefix' is printed at the beginning of each print statement in cper_print_aer(). The prefix is a string containing the driver name and the device's slot location. From the callers the value of prefix is never assigned and is NULL, so when cper_print_aer prints data the initial string does not get printed. This string is important because it identifies the device that the error is on. v1-v2 fix some compile errors withinn the #ifdef v3-v4 remove agent id stuff and kept print the same to avoid compatibility issues Signed-off-by: Lance Ortiz <lance.ortiz@hp.com> --- 0 files changed, 0 insertions(+), 0 deletions(-) diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c index 34d96e4..58ff4c0 100644 --- a/drivers/pci/pcie/aer/aerdrv_errprint.c +++ b/drivers/pci/pcie/aer/aerdrv_errprint.c @@ -228,9 +228,14 @@ void cper_print_aer(struct pci_dev *dev, int cper_severity, int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0; u32 status, mask; const char **status_strs; - char *prefix = NULL; + char prefix[44]; aer_severity = cper_severity_to_aer(cper_severity); + snprintf(prefix, sizeof(prefix), "%s%s %s: ", + (aer_severity == AER_CORRECTABLE) ? + KERN_WARNING : KERN_ERR, + dev_driver_string(&dev->dev), dev_name(&dev->dev)); + if (aer_severity == AER_CORRECTABLE) { status = aer->cor_status; mask = aer->cor_mask; ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER 2012-12-04 17:04 ` [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER Lance Ortiz @ 2012-12-04 17:39 ` Joe Perches 2012-12-04 20:06 ` Ortiz, Lance E 0 siblings, 1 reply; 10+ messages in thread From: Joe Perches @ 2012-12-04 17:39 UTC (permalink / raw) To: Lance Ortiz Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, mchehab, linux-acpi, linux-pci, linux-kernel On Tue, 2012-12-04 at 10:04 -0700, Lance Ortiz wrote: > These changes make cper_print_aer more consistent with aer_print_error > which is called in the AER interrupt case. The string in the variable > 'prefix' is printed at the beginning of each print statement in > cper_print_aer(). The prefix is a string containing the driver name > and the device's slot location. From the callers the value of prefix > is never assigned and is NULL, so when cper_print_aer prints data the > initial string does not get printed. This string is important because > it identifies the device that the error is on. Hi again Lance. > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c b/drivers/pci/pcie/aer/aerdrv_errprint.c [] > @@ -228,9 +228,14 @@ void cper_print_aer(struct pci_dev *dev, int cper_severity, > int aer_severity, layer, agent, status_strs_size, tlp_header_valid = 0; > u32 status, mask; > const char **status_strs; > - char *prefix = NULL; > + char prefix[44]; > > aer_severity = cper_severity_to_aer(cper_severity); > + snprintf(prefix, sizeof(prefix), "%s%s %s: ", > + (aer_severity == AER_CORRECTABLE) ? > + KERN_WARNING : KERN_ERR, > + dev_driver_string(&dev->dev), dev_name(&dev->dev)); > + > if (aer_severity == AER_CORRECTABLE) { > status = aer->cor_status; > mask = aer->cor_mask; Perhaps this would be better just using dev_printk instead of snprintf into a prefix with printk to emulate dev_printk. Also, perhaps KERN_NOTICE is preferable to KERN_WARNING in the CORRECTABLE case. Maybe something like: const char *level = KERN_ERR; if (aer_severity == AER_CORRECTABLE) level = KERN_NOTICE; ... dev_printk(level, &dev->dev, etc...); Maybe do this after this series of patches is accepted. Enough with the revisions for awhile... ^ permalink raw reply [flat|nested] 10+ messages in thread
* RE: [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER 2012-12-04 17:39 ` Joe Perches @ 2012-12-04 20:06 ` Ortiz, Lance E 0 siblings, 0 replies; 10+ messages in thread From: Ortiz, Lance E @ 2012-12-04 20:06 UTC (permalink / raw) To: Joe Perches Cc: bhelgaas@google.com, lance_ortiz@hotmail.com, jiang.liu@huawei.com, tony.luck@intel.com, bp@alien8.de, rostedt@goodmis.org, mchehab@redhat.com, linux-acpi@vger.kernel.org, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org > Hi again Lance. > > > diff --git a/drivers/pci/pcie/aer/aerdrv_errprint.c > b/drivers/pci/pcie/aer/aerdrv_errprint.c > [] > > @@ -228,9 +228,14 @@ void cper_print_aer(struct pci_dev *dev, int > cper_severity, > > int aer_severity, layer, agent, status_strs_size, > tlp_header_valid = 0; > > u32 status, mask; > > const char **status_strs; > > - char *prefix = NULL; > > + char prefix[44]; > > > > aer_severity = cper_severity_to_aer(cper_severity); > > + snprintf(prefix, sizeof(prefix), "%s%s %s: ", > > + (aer_severity == AER_CORRECTABLE) ? > > + KERN_WARNING : KERN_ERR, > > + dev_driver_string(&dev->dev), dev_name(&dev->dev)); > > + > > if (aer_severity == AER_CORRECTABLE) { > > status = aer->cor_status; > > mask = aer->cor_mask; > > Perhaps this would be better just using dev_printk > instead of snprintf into a prefix with printk to > emulate dev_printk. > > Also, perhaps KERN_NOTICE is preferable to KERN_WARNING > in the CORRECTABLE case. > > Maybe something like: > > const char *level = KERN_ERR; > if (aer_severity == AER_CORRECTABLE) > level = KERN_NOTICE; > > ... > > dev_printk(level, &dev->dev, etc...); > > Maybe do this after this series of patches is accepted. > Enough with the revisions for awhile... Hi Joe, The reason I did it this way was I was trying to be consistent with what was done for the AER interrupt case. The function aer_print_error()in the same file uses sprintf this way. I'm not sure why they chose that instead of dev_printk. I think if we did make the change we should do it in both places to keep them the same. Lance ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 1/3] aerdrv: Trace Event for AER 2012-12-04 17:04 [PATCH v6 1/3] aerdrv: Trace Event for AER Lance Ortiz 2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz 2012-12-04 17:04 ` [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER Lance Ortiz @ 2012-12-04 18:36 ` Mauro Carvalho Chehab 2 siblings, 0 replies; 10+ messages in thread From: Mauro Carvalho Chehab @ 2012-12-04 18:36 UTC (permalink / raw) To: Lance Ortiz Cc: bhelgaas, lance_ortiz, jiang.liu, tony.luck, bp, rostedt, linux-acpi, linux-pci, linux-kernel Em Tue, 04 Dec 2012 10:04:30 -0700 Lance Ortiz <lance.ortiz@hp.com> escreveu: > This header file will define a new trace event that will be triggered when > a AER event occurs. The following data will be provided to the trace > event. > > char * dev_name - The name of the slot where the device resides > ([domain:]bus:device.function). > > u32 status - Either the correctable or uncorrectable register > indicating what error or errors have been see. > > u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED > > The trace event will also provide a trace string that may look like: > > "0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned > TLP" > > v1-v2 Move header from include/ras/aer_event.h to > include/trace/events/ras.h > v3-v4 Cleaned up comments and commit header > v4-v5 More cleanup remove () from if statement in print. > Renamed string define to be more specific. > v5-v6 change TRACE_SYSTEM define to be ras and not aer. > Signed-off-by: Lance Ortiz <lance.ortiz@hp.com> > --- > > 0 files changed, 0 insertions(+), 0 deletions(-) > > diff --git a/include/trace/events/ras.h b/include/trace/events/ras.h > new file mode 100644 > index 0000000..88b8783 > --- /dev/null > +++ b/include/trace/events/ras.h > @@ -0,0 +1,77 @@ > +#undef TRACE_SYSTEM > +#define TRACE_SYSTEM ras > + > +#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ) > +#define _TRACE_AER_H > + > +#include <linux/tracepoint.h> > +#include <linux/edac.h> > + > + > +/* > + * PCIe AER Trace event > + * > + * These events are generated when hardware detects a corrected or > + * uncorrected event on a PCIe device. The event report has > + * the following structure: > + * > + * char * dev_name - The name of the slot where the device resides > + * ([domain:]bus:device.function). > + * u32 status - Either the correctable or uncorrectable register > + * indicating what error or errors have been seen > + * u8 severity - error severity 0:NONFATAL 1:FATAL 2:CORRECTED > + */ > + > +#define aer_correctable_errors \ > + {BIT(0), "Receiver Error"}, \ > + {BIT(6), "Bad TLP"}, \ > + {BIT(7), "Bad DLLP"}, \ > + {BIT(8), "RELAY_NUM Rollover"}, \ > + {BIT(12), "Replay Timer Timeout"}, \ > + {BIT(13), "Advisory Non-Fatal"} > + > +#define aer_uncorrectable_errors \ > + {BIT(4), "Data Link Protocol"}, \ > + {BIT(12), "Poisoned TLP"}, \ > + {BIT(13), "Flow Control Protocol"}, \ > + {BIT(14), "Completion Timeout"}, \ > + {BIT(15), "Completer Abort"}, \ > + {BIT(16), "Unexpected Completion"}, \ > + {BIT(17), "Receiver Overflow"}, \ > + {BIT(18), "Malformed TLP"}, \ > + {BIT(19), "ECRC"}, \ > + {BIT(20), "Unsupported Request"} > + > +TRACE_EVENT(aer_event, > + TP_PROTO(const char *dev_name, > + const u32 status, > + const u8 severity), > + > + TP_ARGS(dev_name, status, severity), > + > + TP_STRUCT__entry( > + __string( dev_name, dev_name ) > + __field( u32, status ) > + __field( u8, severity ) > + ), > + > + TP_fast_assign( > + __assign_str(dev_name, dev_name); > + __entry->status = status; > + __entry->severity = severity; > + ), > + > + TP_printk("%s PCIe Bus Error: severity=%s, %s\n", > + __get_str(dev_name), > + __entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" : > + __entry->severity == HW_EVENT_ERR_FATAL ? > + "Fatal" : "Uncorrected", > + __entry->severity == HW_EVENT_ERR_CORRECTED ? > + __print_flags(__entry->status, "|", aer_correctable_errors) : > + __print_flags(__entry->status, "|", aer_uncorrectable_errors)) > +); > + > +#endif /* _TRACE_AER_H */ > + > +/* This part must be outside protection */ > +#include <trace/define_trace.h> > Provided that a latter patch adds the missing bits and joins the strings used both here and at aer, as proposed by Boris: Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> -- Cheers, Mauro ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-12-04 20:36 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-12-04 17:04 [PATCH v6 1/3] aerdrv: Trace Event for AER Lance Ortiz 2012-12-04 17:04 ` [PATCH v6 2/3] aerdrv: Enhanced AER logging Lance Ortiz 2012-12-04 18:41 ` Mauro Carvalho Chehab 2012-12-04 20:14 ` Ortiz, Lance E 2012-12-04 20:31 ` Mauro Carvalho Chehab 2012-12-04 20:36 ` Borislav Petkov 2012-12-04 17:04 ` [PATCH v6 3/3] aerdrv: Cleanup log output for CPER based AER Lance Ortiz 2012-12-04 17:39 ` Joe Perches 2012-12-04 20:06 ` Ortiz, Lance E 2012-12-04 18:36 ` [PATCH v6 1/3] aerdrv: Trace Event for AER Mauro Carvalho Chehab
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).