* [PATCH 0/2] Make ELOG log and trace consistently with GHES
@ 2024-05-27 14:43 Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Fabio M. De Francesco @ 2024-05-27 14:43 UTC (permalink / raw)
To: Rafael J. Wysocki, Len Brown, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Dan Williams, Fabio M . De Francesco
When Firmware First is enabled, BIOS handles errors first and then it
makes them available to the kernel via the Common Platform Error Record
(CPER) sections (UEFI 2.10 Appendix N). Linux parses the CPER sections
via one of two similar paths, either ELOG or GHES.
Currently, ELOG and GHES show some inconsistencies in how they print to
the kernel log as well as in how they report to userspace via trace
events.
Make the two mentioned paths act similarly for what relates to logging
and tracing.
--- Changes for v1 ---
- Drop the RFC prefix and restart from PATCH v1
- Drop patch 3/3 because a discussion on it has not yet been
settled
- Drop namespacing in export of pci_print_aer() (Dan)
- Don't use '#ifdef' in *.c files (Dan)
- Drop a reference on pdev after operation is complete (Dan)
- Don't log an error message if pdev is NULL (Dan)
--- Changes for RFC v2 ---
- 0/3: rework the subject line and the letter.
- 1/3: no changes.
- 2/3: trace CPER PCIe Section only if CONFIG_ACPI_APEI_PCIEAER
is defined; the kernel test robot reported the use of two
undefined symbols because the test for the config option was
missing; rewrite the subject line and part of commit message.
- 3/3: no changes.
Fabio M. De Francesco (2):
ACPI: extlog: Trace CPER Non-standard Section Body
ACPI: extlog: Trace CPER PCI Express Error Section
drivers/acpi/acpi_extlog.c | 35 +++++++++++++++++++++++++++++++++++
drivers/pci/pcie/aer.c | 2 +-
include/linux/aer.h | 9 ++++++---
3 files changed, 42 insertions(+), 4 deletions(-)
--
2.45.1
* [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body
2024-05-27 14:43 [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
@ 2024-05-27 14:43 ` Fabio M. De Francesco
2024-08-06 19:21 ` Dan Williams
2024-08-06 21:07 ` Bjorn Helgaas
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2024-07-23 13:43 ` [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2 siblings, 2 replies; 12+ messages in thread
From: Fabio M. De Francesco @ 2024-05-27 14:43 UTC (permalink / raw)
To: Rafael J. Wysocki, Len Brown, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Dan Williams, Fabio M . De Francesco
In extlog_print(), trace "Non-standard Section Body" reported by firmware
to the OS via Common Platform Error Record (CPER) (UEFI v2.10 Appendix N
2.3) to add further debug information and so to make ELOG log
consistently with ghes_do_proc() (GHES).
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
drivers/acpi/acpi_extlog.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index f055609d4b64..e025ae390737 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -179,6 +179,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
if (gdata->error_data_length >= sizeof(*mem))
trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
(u8)gdata->error_severity);
+ } else {
+ void *err = acpi_hest_get_payload(gdata);
+
+ trace_non_standard_event(sec_type, fru_id, fru_text,
+ gdata->error_severity, err,
+ gdata->error_data_length);
}
}
--
2.45.1
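For readers following the thread, the catch-all added by this patch can be pictured with a small userspace model. This is only a sketch under stated assumptions: struct sec_guid, the SEC_* values, and dispatch_section() are hypothetical names invented for illustration, and the GUID bytes are not the real CPER section GUIDs. It shows a dispatcher that falls through to a "non-standard" handler instead of silently dropping unknown section types.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a 16-byte CPER section type GUID. */
struct sec_guid {
	unsigned char b[16];
};

/* Illustrative values only; these are NOT the real CPER GUIDs. */
static const struct sec_guid SEC_MEM = { { 0x01 } };
static const struct sec_guid SEC_VENDOR = { { 0xaa, 0xbb } };

enum handler { H_MEM, H_NON_STANDARD };

static int guid_eq(const struct sec_guid *a, const struct sec_guid *b)
{
	return memcmp(a->b, b->b, sizeof(a->b)) == 0;
}

/* Pick a handler for a section; unknown types get the catch-all. */
static enum handler dispatch_section(const struct sec_guid *type)
{
	assert(type != NULL);

	if (guid_eq(type, &SEC_MEM))
		return H_MEM;

	/* Catch-all: an unrecognized section is still reported, not dropped. */
	return H_NON_STANDARD;
}
```

In this model, a section whose type matches none of the known GUIDs still reaches a reporting path, which is the parity with ghes_do_proc() that the patch aims for.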
* [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
2024-05-27 14:43 [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
@ 2024-05-27 14:43 ` Fabio M. De Francesco
2024-08-06 19:56 ` Dan Williams
2024-08-06 21:31 ` Bjorn Helgaas
2024-07-23 13:43 ` [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2 siblings, 2 replies; 12+ messages in thread
From: Fabio M. De Francesco @ 2024-05-27 14:43 UTC (permalink / raw)
To: Rafael J. Wysocki, Len Brown, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Dan Williams, Fabio M . De Francesco
Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd(). Instead,
the similar ghes_do_proc() (GHES) prints to kernel log and calls
pci_print_aer() to report via the ftrace infrastructure.
Add support to report the CPER PCIe Error section also via the ftrace
infrastructure by calling pci_print_aer() to make ELOG act consistently
with GHES.
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
drivers/pci/pcie/aer.c | 2 +-
include/linux/aer.h | 13 +++++++++++--
3 files changed, 42 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index e025ae390737..007ce96f8672 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
return 1;
}
+static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
+ int severity)
+{
+ struct aer_capability_regs *aer;
+ struct pci_dev *pdev;
+ unsigned int devfn;
+ unsigned int bus;
+ int aer_severity;
+ int domain;
+
+ if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
+ pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
+ aer_severity = cper_severity_to_aer(severity);
+ aer = (struct aer_capability_regs *)pcie_err->aer_info;
+ domain = pcie_err->device_id.segment;
+ bus = pcie_err->device_id.bus;
+ devfn = PCI_DEVFN(pcie_err->device_id.device,
+ pcie_err->device_id.function);
+ pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
+ if (!pdev)
+ return;
+ pci_print_aer(pdev, aer_severity, aer);
+ pci_dev_put(pdev);
+ }
+}
+
static int extlog_print(struct notifier_block *nb, unsigned long val,
void *data)
{
@@ -179,6 +205,10 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
if (gdata->error_data_length >= sizeof(*mem))
trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
(u8)gdata->error_severity);
+ } else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
+ struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
+
+ extlog_print_pcie(pcie_err, gdata->error_severity);
} else {
void *err = acpi_hest_get_payload(gdata);
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index ac6293c24976..794aa15527ba 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -801,7 +801,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
trace_aer_event(dev_name(&dev->dev), (status & ~mask),
aer_severity, tlp_header_valid, &aer->header_log);
}
-EXPORT_SYMBOL_NS_GPL(pci_print_aer, CXL);
+EXPORT_SYMBOL_GPL(pci_print_aer);
/**
* add_error_device - list device to be handled
diff --git a/include/linux/aer.h b/include/linux/aer.h
index 4b97f38f3fcf..fbc82206045c 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -42,17 +42,26 @@ int pcie_read_tlp_log(struct pci_dev *dev, int where, struct pcie_tlp_log *log);
#if defined(CONFIG_PCIEAER)
int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
int pcie_aer_is_native(struct pci_dev *dev);
+void pci_print_aer(struct pci_dev *dev, int aer_severity,
+ struct aer_capability_regs *aer);
#else
static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
{
return -EINVAL;
}
+
static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
+static inline void pci_print_aer(struct pci_dev *dev, int aer_severity,
+ struct aer_capability_regs *aer)
+{ }
#endif
-void pci_print_aer(struct pci_dev *dev, int aer_severity,
- struct aer_capability_regs *aer);
+#if defined(CONFIG_ACPI_APEI_PCIEAER)
int cper_severity_to_aer(int cper_severity);
+#else
+static inline int cper_severity_to_aer(int cper_severity) { return 0; }
+#endif
+
void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
int severity, struct aer_capability_regs *aer_regs);
#endif //_AER_H_
--
2.45.1
* Re: [PATCH 0/2] Make ELOG log and trace consistently with GHES
2024-05-27 14:43 [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
@ 2024-07-23 13:43 ` Fabio M. De Francesco
2 siblings, 0 replies; 12+ messages in thread
From: Fabio M. De Francesco @ 2024-07-23 13:43 UTC (permalink / raw)
To: Rafael J. Wysocki, Len Brown, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-kernel, linux-acpi,
linuxppc-dev, linux-pci, Dan Williams
On Monday, May 27, 2024 4:43:39 PM GMT+2 Fabio M. De Francesco wrote:
> When Firmware First is enabled, BIOS handles errors first and then it
> makes them available to the kernel via the Common Platform Error Record
> (CPER) sections (UEFI 2.10 Appendix N). Linux parses the CPER sections
> via one of two similar paths, either ELOG or GHES.
>
> Currently, ELOG and GHES show some inconsistencies in how they print to
> the kernel log as well as in how they report to userspace via trace
> events.
>
> Make the two mentioned paths act similarly for what relates to logging
> and tracing.
Gentle ping.
Thanks,
Fabio
> --- Changes for v1 ---
>
> - Drop the RFC prefix and restart from PATCH v1
> - Drop patch 3/3 because a discussion on it has not yet been
> settled
> - Drop namespacing in export of pci_print_aer() (Dan)
> - Don't use '#ifdef' in *.c files (Dan)
> - Drop a reference on pdev after operation is complete (Dan)
> - Don't log an error message if pdev is NULL (Dan)
>
> --- Changes for RFC v2 ---
>
> - 0/3: rework the subject line and the letter.
> - 1/3: no changes.
> - 2/3: trace CPER PCIe Section only if CONFIG_ACPI_APEI_PCIEAER
> is defined; the kernel test robot reported the use of two
> undefined symbols because the test for the config option was
> missing; rewrite the subject line and part of commit message.
> - 3/3: no changes.
>
> Fabio M. De Francesco (2):
> ACPI: extlog: Trace CPER Non-standard Section Body
> ACPI: extlog: Trace CPER PCI Express Error Section
>
> drivers/acpi/acpi_extlog.c | 35 +++++++++++++++++++++++++++++++++++
> drivers/pci/pcie/aer.c | 2 +-
> include/linux/aer.h | 9 ++++++---
> 3 files changed, 42 insertions(+), 4 deletions(-)
>
>
* Re: [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
@ 2024-08-06 19:21 ` Dan Williams
2024-08-06 21:07 ` Bjorn Helgaas
1 sibling, 0 replies; 12+ messages in thread
From: Dan Williams @ 2024-08-06 19:21 UTC (permalink / raw)
To: Fabio M. De Francesco, Rafael J. Wysocki, Len Brown,
Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
linux-kernel, linux-acpi, linuxppc-dev, linux-pci, Dan Williams
Fabio M. De Francesco wrote:
> In extlog_print(), trace "Non-standard Section Body" reported by firmware
> to the OS via Common Platform Error Record (CPER) (UEFI v2.10 Appendix N
> 2.3) to add further debug information and so to make ELOG log
> consistently with ghes_do_proc() (GHES).
I think this description could be clearer, how about:
---
ghes_do_proc() has a catch-all for unknown or unhandled CPER formats
(UEFI v2.10 Appendix N 2.3), extlog_print() does not. This gap was
noticed by a RAS test that injected CXL protocol errors which were
notified to extlog_print() via the IOMCA (I/O Machine Check
Architecture) mechanism. Bring parity to the extlog_print() path by
including a similar trace_non_standard_event().
---
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---
> drivers/acpi/acpi_extlog.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index f055609d4b64..e025ae390737 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -179,6 +179,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
> if (gdata->error_data_length >= sizeof(*mem))
> trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
> (u8)gdata->error_severity);
> + } else {
> + void *err = acpi_hest_get_payload(gdata);
> +
> + trace_non_standard_event(sec_type, fru_id, fru_text,
> + gdata->error_severity, err,
> + gdata->error_data_length);
> }
...with the above changelog update the code change looks good to me, you
can add:
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
* Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
@ 2024-08-06 19:56 ` Dan Williams
2024-10-23 13:35 ` Fabio M. De Francesco
2024-08-06 21:31 ` Bjorn Helgaas
1 sibling, 1 reply; 12+ messages in thread
From: Dan Williams @ 2024-08-06 19:56 UTC (permalink / raw)
To: Fabio M. De Francesco, Rafael J. Wysocki, Len Brown,
Mahesh J Salgaonkar, Oliver O'Halloran, Bjorn Helgaas,
linux-kernel, linux-acpi, linuxppc-dev, linux-pci, Dan Williams
Fabio M. De Francesco wrote:
> Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
I think the critical detail is that print_extlog_rcd() is only
triggered when ras_userspace_consumers() returns true. The observation
is that ras_userspace_consumers() hides information from the trace path
when the intended purpose of it was to hide duplicate emissions to the
kernel log when userspace is watching the tracepoints.
Setting aside whether ras_userspace_consumers() is still a good idea or
not, it is obvious that this patch as is may surprise environments that
start seeing kernel error logs where the kernel was silent before.
I think the path of least surprise would be to make sure that
pci_print_aer() optionally skips emitting to the kernel log when not
wanted.
So perhaps first do a lead-in patch to optionally quiet the print
messages in pci_print_aer() and then pass in KERN_DEBUG from the
extlog_print() path. Then we can decide later what to do about
ras_userspace_consumers().
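The suggestion above can be pictured with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: model_print_aer(), struct model_sink, and the enum values are hypothetical, and the model only captures console visibility (a message is shown when its level is numerically lower than the console loglevel, as with printk). The point it illustrates is that the caller chooses the print level while the trace event is emitted either way.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical levels; a lower number is more severe, as with printk. */
enum model_level { MODEL_ERR = 3, MODEL_DEBUG = 7 };

/* Hypothetical sink that counts what was emitted where. */
struct model_sink {
	int console_loglevel;	/* levels below this reach the console */
	int console_lines;
	int trace_events;
};

static void model_print_aer(struct model_sink *sink, enum model_level level,
			    const char *dev_name)
{
	assert(sink != NULL && dev_name != NULL);

	/* Print only if the caller's level passes the console filter. */
	if ((int)level < sink->console_loglevel) {
		printf("<%d>%s: AER error\n", (int)level, dev_name);
		sink->console_lines++;
	}

	/* The trace event does not depend on the print level. */
	sink->trace_events++;
}
```

With a console loglevel of 7, a caller passing MODEL_DEBUG produces a trace event but no console line, which is the "no new kernel log noise" behaviour the suggestion is after.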
> the similar ghes_do_proc() (GHES) prints to kernel log and calls
> pci_print_aer() to report via the ftrace infrastructure.
>
> Add support to report the CPER PCIe Error section also via the ftrace
> infrastructure by calling pci_print_aer() to make ELOG act consistently
> with GHES.
You might also want to explain a bit about the motivation for this which
is that I/O Machine Check Architecture events may signal failing PCIe
components or links. The AER event contains details on what was
happening on the wire when the error was signaled.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---
> drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> drivers/pci/pcie/aer.c | 2 +-
> include/linux/aer.h | 13 +++++++++++--
> 3 files changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index e025ae390737..007ce96f8672 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
> return 1;
> }
>
> +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> + int severity)
> +{
> + struct aer_capability_regs *aer;
> + struct pci_dev *pdev;
> + unsigned int devfn;
> + unsigned int bus;
> + int aer_severity;
> + int domain;
> +
> + if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> + pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> + aer_severity = cper_severity_to_aer(severity);
> + aer = (struct aer_capability_regs *)pcie_err->aer_info;
> + domain = pcie_err->device_id.segment;
> + bus = pcie_err->device_id.bus;
> + devfn = PCI_DEVFN(pcie_err->device_id.device,
> + pcie_err->device_id.function);
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> + if (!pdev)
> + return;
> + pci_print_aer(pdev, aer_severity, aer);
...per above this would become:
pci_print_aer(KERN_DEBUG, pdev, aer_severity, aer);
[..]
Rest of the changes look good to me.
* Re: [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2024-08-06 19:21 ` Dan Williams
@ 2024-08-06 21:07 ` Bjorn Helgaas
1 sibling, 0 replies; 12+ messages in thread
From: Bjorn Helgaas @ 2024-08-06 21:07 UTC (permalink / raw)
To: Fabio M. De Francesco
Cc: Rafael J. Wysocki, linux-pci, linux-kernel, Mahesh J Salgaonkar,
linux-acpi, Oliver O'Halloran, Bjorn Helgaas, Dan Williams,
linuxppc-dev, Len Brown
On Mon, May 27, 2024 at 04:43:40PM +0200, Fabio M. De Francesco wrote:
> In extlog_print(), trace "Non-standard Section Body" reported by firmware
> to the OS via Common Platform Error Record (CPER) (UEFI v2.10 Appendix N
> 2.3) to add further debug information and so to make ELOG log
> consistently with ghes_do_proc() (GHES).
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---
> drivers/acpi/acpi_extlog.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index f055609d4b64..e025ae390737 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -179,6 +179,12 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
> if (gdata->error_data_length >= sizeof(*mem))
> trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
> (u8)gdata->error_severity);
> + } else {
> + void *err = acpi_hest_get_payload(gdata);
> +
> + trace_non_standard_event(sec_type, fru_id, fru_text,
> + gdata->error_severity, err,
> + gdata->error_data_length);
Kudos for making these two paths more similar.
Not specific to *this* patch, but it's annoying to try to find
tracepoint implementations. I guess it's
TRACE_EVENT(non_standard_event, ...) in include/ras/ras_event.h.
This has the same prototype as log_non_standard_event(), so
could extlog_print() be made a little bit more like ghes_do_proc() by
using log_non_standard_event() instead of trace_non_standard_event()
directly?
Bjorn
* Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2024-08-06 19:56 ` Dan Williams
@ 2024-08-06 21:31 ` Bjorn Helgaas
2024-08-07 20:28 ` Dan Williams
1 sibling, 1 reply; 12+ messages in thread
From: Bjorn Helgaas @ 2024-08-06 21:31 UTC (permalink / raw)
To: Fabio M. De Francesco
Cc: Rafael J. Wysocki, linux-pci, linux-kernel, Mahesh J Salgaonkar,
linux-acpi, Oliver O'Halloran, Bjorn Helgaas, Dan Williams,
linuxppc-dev, Len Brown
On Mon, May 27, 2024 at 04:43:41PM +0200, Fabio M. De Francesco wrote:
> Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd(). Instead,
> the similar ghes_do_proc() (GHES) prints to kernel log and calls
> pci_print_aer() to report via the ftrace infrastructure.
>
> Add support to report the CPER PCIe Error section also via the ftrace
> infrastructure by calling pci_print_aer() to make ELOG act consistently
> with GHES.
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> ---
> drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> drivers/pci/pcie/aer.c | 2 +-
> include/linux/aer.h | 13 +++++++++++--
> 3 files changed, 42 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> index e025ae390737..007ce96f8672 100644
> --- a/drivers/acpi/acpi_extlog.c
> +++ b/drivers/acpi/acpi_extlog.c
> @@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
> return 1;
> }
>
> +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> + int severity)
> +{
> + struct aer_capability_regs *aer;
> + struct pci_dev *pdev;
> + unsigned int devfn;
> + unsigned int bus;
> + int aer_severity;
> + int domain;
> +
> + if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> + pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> + aer_severity = cper_severity_to_aer(severity);
> + aer = (struct aer_capability_regs *)pcie_err->aer_info;
> + domain = pcie_err->device_id.segment;
> + bus = pcie_err->device_id.bus;
> + devfn = PCI_DEVFN(pcie_err->device_id.device,
> + pcie_err->device_id.function);
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> + if (!pdev)
> + return;
> + pci_print_aer(pdev, aer_severity, aer);
> + pci_dev_put(pdev);
> + }
I'm 100% in favor of making error reporting work and look the same
across GHES and ELOG. But I do have to gripe a bit...
It's already unfortunate that GHES and the native AER handling are
separate paths that lead to the same place (__aer_print_error()).
I'm sorry that we need to add a third path that again does
fundamentally the same thing. The fact that they're separate means
all the design, reviewing, testing, and maintenance effort is diluted,
and error handling always gets too little love in the first place.
I think this is a recipe for confusion.
ghes_do_proc # GHES
apei_estatus_for_each_section
...
if (guid_equal(sec_type, &CPER_SEC_PCIE))
ghes_handle_aer
cper_severity_to_aer
aer_recover_queue
kfifo_in_spinlocked(&aer_recover_ring) # add to queue
aer_recover_work_func # another thread
kfifo_get(&aer_recover_ring) # remove from queue
pci_print_aer
__aer_print_error <---
aer_irq # native AER
kfifo_put(&aer_fifo) # add to queue
aer_isr # another thread
kfifo_get(&aer_fifo) # remove from queue
...
aer_isr_one_error
aer_process_err_devices
aer_print_error
__aer_print_error <---
extlog_print # extlog (x86 only)
apei_estatus_for_each_section
...
if (guid_equal(sec_type, &CPER_SEC_PCIE))
extlog_print_pcie
cper_severity_to_aer
pci_get_domain_bus_and_slot
pci_print_aer
__aer_print_error <---
And we also have CXL paths that lead to __aer_print_error(), although
it seems like they at least start in the native AER (and maybe GHES?)
path and branch out somewhere. My head is spinning.
Do I *object* to this patch? No, not really; it's a trivial change in
drivers/pci/, and Rafael can add my
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
as needed. But I am afraid we're making ourselves a maintenance
headache.
> +}
> +
> static int extlog_print(struct notifier_block *nb, unsigned long val,
> void *data)
> {
> @@ -179,6 +205,10 @@ static int extlog_print(struct notifier_block *nb, unsigned long val,
> if (gdata->error_data_length >= sizeof(*mem))
> trace_extlog_mem_event(mem, err_seq, fru_id, fru_text,
> (u8)gdata->error_severity);
> + } else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
> + struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata);
> +
> + extlog_print_pcie(pcie_err, gdata->error_severity);
> } else {
> void *err = acpi_hest_get_payload(gdata);
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index ac6293c24976..794aa15527ba 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -801,7 +801,7 @@ void pci_print_aer(struct pci_dev *dev, int aer_severity,
> trace_aer_event(dev_name(&dev->dev), (status & ~mask),
> aer_severity, tlp_header_valid, &aer->header_log);
> }
> -EXPORT_SYMBOL_NS_GPL(pci_print_aer, CXL);
> +EXPORT_SYMBOL_GPL(pci_print_aer);
>
> /**
> * add_error_device - list device to be handled
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 4b97f38f3fcf..fbc82206045c 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -42,17 +42,26 @@ int pcie_read_tlp_log(struct pci_dev *dev, int where, struct pcie_tlp_log *log);
> #if defined(CONFIG_PCIEAER)
> int pci_aer_clear_nonfatal_status(struct pci_dev *dev);
> int pcie_aer_is_native(struct pci_dev *dev);
> +void pci_print_aer(struct pci_dev *dev, int aer_severity,
> + struct aer_capability_regs *aer);
> #else
> static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
> {
> return -EINVAL;
> }
> +
> static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
> +static inline void pci_print_aer(struct pci_dev *dev, int aer_severity,
> + struct aer_capability_regs *aer)
> +{ }
> #endif
>
> -void pci_print_aer(struct pci_dev *dev, int aer_severity,
> - struct aer_capability_regs *aer);
> +#if defined(CONFIG_ACPI_APEI_PCIEAER)
> int cper_severity_to_aer(int cper_severity);
> +#else
> +static inline int cper_severity_to_aer(int cper_severity) { return 0; }
> +#endif
> +
> void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
> int severity, struct aer_capability_regs *aer_regs);
> #endif //_AER_H_
> --
> 2.45.1
>
* Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
2024-08-06 21:31 ` Bjorn Helgaas
@ 2024-08-07 20:28 ` Dan Williams
2024-08-07 20:31 ` Dan Williams
0 siblings, 1 reply; 12+ messages in thread
From: Dan Williams @ 2024-08-07 20:28 UTC (permalink / raw)
To: Bjorn Helgaas, Fabio M. De Francesco
Cc: Rafael J. Wysocki, linux-pci, linux-kernel, Mahesh J Salgaonkar,
linux-acpi, Oliver O'Halloran, Bjorn Helgaas, Dan Williams,
linuxppc-dev, Len Brown
[ add Boris ]
Bjorn Helgaas wrote:
> On Mon, May 27, 2024 at 04:43:41PM +0200, Fabio M. De Francesco wrote:
> > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd(). Instead,
> > the similar ghes_do_proc() (GHES) prints to kernel log and calls
> > pci_print_aer() to report via the ftrace infrastructure.
> >
> > Add support to report the CPER PCIe Error section also via the ftrace
> > infrastructure by calling pci_print_aer() to make ELOG act consistently
> > with GHES.
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> > ---
> > drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> > drivers/pci/pcie/aer.c | 2 +-
> > include/linux/aer.h | 13 +++++++++++--
> > 3 files changed, 42 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> > index e025ae390737..007ce96f8672 100644
> > --- a/drivers/acpi/acpi_extlog.c
> > +++ b/drivers/acpi/acpi_extlog.c
> > @@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
> > return 1;
> > }
> >
> > +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> > + int severity)
> > +{
> > + struct aer_capability_regs *aer;
> > + struct pci_dev *pdev;
> > + unsigned int devfn;
> > + unsigned int bus;
> > + int aer_severity;
> > + int domain;
> > +
> > + if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> > + pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> > + aer_severity = cper_severity_to_aer(severity);
> > + aer = (struct aer_capability_regs *)pcie_err->aer_info;
> > + domain = pcie_err->device_id.segment;
> > + bus = pcie_err->device_id.bus;
> > + devfn = PCI_DEVFN(pcie_err->device_id.device,
> > + pcie_err->device_id.function);
> > + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> > + if (!pdev)
> > + return;
> > + pci_print_aer(pdev, aer_severity, aer);
> > + pci_dev_put(pdev);
> > + }
>
> I'm 100% in favor of making error reporting work and look the same
> across GHES and ELOG. But I do have to gripe a bit...
>
> It's already unfortunate that GHES and the native AER handling are
> separate paths that lead to the same place (__aer_print_error()).
>
> I'm sorry that we need to add a third path that again does
> fundamentally the same thing. The fact that they're separate means
> all the design, reviewing, testing, and maintenance effort is diluted,
> and error handling always gets too little love in the first place.
> I think this is a recipe for confusion.
>
> ghes_do_proc # GHES
> apei_estatus_for_each_section
> ...
> if (guid_equal(sec_type, &CPER_SEC_PCIE))
> ghes_handle_aer
> cper_severity_to_aer
> aer_recover_queue
> kfifo_in_spinlocked(&aer_recover_ring) # add to queue
> aer_recover_work_func # another thread
> kfifo_get(&aer_recover_ring) # remove from queue
> pci_print_aer
> __aer_print_error <---
>
> aer_irq # native AER
> kfifo_put(&aer_fifo) # add to queue
> aer_isr # another thread
> kfifo_get(&aer_fifo) # remove from queue
> ...
> aer_isr_one_error
> aer_process_err_devices
> aer_print_error
> __aer_print_error <---
>
> extlog_print # extlog (x86 only)
> apei_estatus_for_each_section
> ...
> if (guid_equal(sec_type, &CPER_SEC_PCIE))
> extlog_print_pcie
> cper_severity_to_aer
> pci_get_domain_bus_and_slot
> pci_print_aer
> __aer_print_error <---
>
> And we also have CXL paths that lead to __aer_print_error(), although
> it seems like they at least start in the native AER (and maybe GHES?)
> path and branch out somewhere. My head is spinning.
>
> Do I *object* to this patch? No, not really; it's a trivial change in
> drivers/pci/, and Rafael can add my
>
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
> as needed. But I am afraid we're making ourselves a maintenance
> headache.
To be honest, I am too. Upon discovering that extlog_print() behaves
differently than ghes_do_proc(), I had the snarky thought "great, can we
now just go ahead and deprecate the extlog path, it's just a source of
maintenance pain.".
So *if* we keep acpi_extlog then I definitely think it should be
consistent with other CPER handlers (needs this patch). But, I am also
open to entertaining "deprecate it".
To me, the fact that ras_userspace_consumers() is only honored for
acpi_extlog is clear evidence that the kernel has already painted itself
into a confusing user ABI corner and maybe the proper path forward at
this point is to cut acpi_extlog loose.
* Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
2024-08-07 20:28 ` Dan Williams
@ 2024-08-07 20:31 ` Dan Williams
0 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2024-08-07 20:31 UTC (permalink / raw)
To: Dan Williams, Bjorn Helgaas, Fabio M. De Francesco
Cc: Rafael J. Wysocki, linux-pci, linux-kernel, Mahesh J Salgaonkar,
linux-acpi, Oliver O'Halloran, Bjorn Helgaas, bp,
Dan Williams, linuxppc-dev, Len Brown
Dan Williams wrote:
> [ add Boris ]
[ actually add Boris ]
Boris, see below, thoughts on deprecating acpi_extlog...
> Bjorn Helgaas wrote:
> > On Mon, May 27, 2024 at 04:43:41PM +0200, Fabio M. De Francesco wrote:
> > > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd(). Instead,
> > > the similar ghes_do_proc() (GHES) prints to kernel log and calls
> > > pci_print_aer() to report via the ftrace infrastructure.
> > >
> > > Add support to report the CPER PCIe Error section also via the ftrace
> > > infrastructure by calling pci_print_aer() to make ELOG act consistently
> > > with GHES.
> > >
> > > Cc: Dan Williams <dan.j.williams@intel.com>
> > > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> > > ---
> > > drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> > > drivers/pci/pcie/aer.c | 2 +-
> > > include/linux/aer.h | 13 +++++++++++--
> > > 3 files changed, 42 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
> > > index e025ae390737..007ce96f8672 100644
> > > --- a/drivers/acpi/acpi_extlog.c
> > > +++ b/drivers/acpi/acpi_extlog.c
> > > @@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx,
> > >  	return 1;
> > >  }
> > >
> > > +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err,
> > > +			      int severity)
> > > +{
> > > +	struct aer_capability_regs *aer;
> > > +	struct pci_dev *pdev;
> > > +	unsigned int devfn;
> > > +	unsigned int bus;
> > > +	int aer_severity;
> > > +	int domain;
> > > +
> > > +	if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID &&
> > > +	    pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) {
> > > +		aer_severity = cper_severity_to_aer(severity);
> > > +		aer = (struct aer_capability_regs *)pcie_err->aer_info;
> > > +		domain = pcie_err->device_id.segment;
> > > +		bus = pcie_err->device_id.bus;
> > > +		devfn = PCI_DEVFN(pcie_err->device_id.device,
> > > +				  pcie_err->device_id.function);
> > > +		pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> > > +		if (!pdev)
> > > +			return;
> > > +		pci_print_aer(pdev, aer_severity, aer);
> > > +		pci_dev_put(pdev);
> > > +	}
> >
> > I'm 100% in favor of making error reporting work and look the same
> > across GHES and ELOG. But I do have to gripe a bit...
> >
> > It's already unfortunate that GHES and the native AER handling are
> > separate paths that lead to the same place (__aer_print_error()).
> >
> > I'm sorry that we need to add a third path that again does
> > fundamentally the same thing. The fact that they're separate means
> > all the design, reviewing, testing, and maintenance effort is diluted,
> > and error handling always gets too little love in the first place.
> > I think this is a recipe for confusion.
> >
> >   ghes_do_proc                            # GHES
> >     apei_estatus_for_each_section
> >       ...
> >       if (guid_equal(sec_type, &CPER_SEC_PCIE))
> >         ghes_handle_aer
> >           cper_severity_to_aer
> >           aer_recover_queue
> >             kfifo_in_spinlocked(&aer_recover_ring)  # add to queue
> >   aer_recover_work_func                   # another thread
> >     kfifo_get(&aer_recover_ring)          # remove from queue
> >     pci_print_aer
> >       __aer_print_error                   <---
> >
> >   aer_irq                                 # native AER
> >     kfifo_put(&aer_fifo)                  # add to queue
> >   aer_isr                                 # another thread
> >     kfifo_get(&aer_fifo)                  # remove from queue
> >     ...
> >     aer_isr_one_error
> >       aer_process_err_devices
> >         aer_print_error
> >           __aer_print_error               <---
> >
> >   extlog_print                            # extlog (x86 only)
> >     apei_estatus_for_each_section
> >       ...
> >       if (guid_equal(sec_type, &CPER_SEC_PCIE))
> >         extlog_print_pcie
> >           cper_severity_to_aer
> >           pci_get_domain_bus_and_slot
> >           pci_print_aer
> >             __aer_print_error             <---
> >
> > And we also have CXL paths that lead to __aer_print_error(), although
> > it seems like they at least start in the native AER (and maybe GHES?)
> > path and branch out somewhere. My head is spinning.
> >
> > Do I *object* to this patch? No, not really; it's a trivial change in
> > drivers/pci/, and Rafael can add my
> >
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> >
> > as needed. But I am afraid we're making ourselves a maintenance
> > headache.
>
> To be honest, I am too. Upon discovering that extlog_print() behaves
> differently than ghes_do_proc(), I had the snarky thought "great, can we
> now just go ahead and deprecate the extlog path, it's just a source of
> maintenance pain.".
>
> So *if* we keep acpi_extlog, then I definitely think it should be
> consistent with other CPER handlers (needs this patch). But, I am also
> open to entertaining "deprecate it".
>
> To me, the fact that ras_userspace_consumers() is only honored for
> acpi_extlog is clear evidence that the kernel has already painted itself
> into a confusing user ABI corner and maybe the proper path forward at
> this point is to cut acpi_extlog loose.
* Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
2024-08-06 19:56 ` Dan Williams
@ 2024-10-23 13:35 ` Fabio M. De Francesco
2024-12-11 1:51 ` Dan Williams
0 siblings, 1 reply; 12+ messages in thread
From: Fabio M. De Francesco @ 2024-10-23 13:35 UTC (permalink / raw)
To: linux-kernel, Dan Williams
Cc: Rafael J. Wysocki, Len Brown, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-acpi, linuxppc-dev,
linux-pci, Dan Williams
On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
> Fabio M. De Francesco wrote:
> > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
>
> I think the critical detail is that print_extlog_rcd() is only
> triggered when ras_userspace_consumers() returns true. The observation
> is that ras_userspace_consumers() hides information from the trace path
> when the intended purpose of it was to hide duplicate emissions to the
> kernel log when userspace is watching the tracepoints.
>
> Setting aside whether ras_userspace_consumers() is still a good idea or
> not, it is obvious that this patch as is may surprise environments that
> start seeing kernel error logs where the kernel was silent before.
>
> I think the path of least surprise would be to make sure that
> pci_print_aer() optionally skips emitting to the kernel log when not
> wanted.
Sorry for replying so late...
I'm not entirely sure that users would not prefer to be surprised by
_finally_ seeing kernel error logs for failing PCIe components. I suspect
that users might have been confused by not seeing any output.
> So perhaps first do a lead-in patch to optionally quiet the print
> messages in pci_print_aer() and then pass in KERN_DEBUG from the
> extlog_print() path. Then we can decide later what to do about
> ras_userspace_consumers().
Anyway, I'll do it.
> > the similar ghes_do_proc() (GHES) prints to kernel log and calls
> > pci_print_aer() to report via the ftrace infrastructure.
> >
> > Add support to report the CPER PCIe Error section also via the ftrace
> > infrastructure by calling pci_print_aer() to make ELOG act consistently
> > with GHES.
>
> You might also want to explain a bit about the motivation for this which
> is that I/O Machine Check Architecture events may signal failing PCIe
> components or links. The AER event contains details on what was
> happening on the wire when the error was signaled.
Yes, I agree.
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
> > ---
> > drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++
> > drivers/pci/pcie/aer.c | 2 +-
> > include/linux/aer.h | 13 +++++++++++--
> > 3 files changed, 42 insertions(+), 3 deletions(-)
> >
> > [...]
> >
> > + pci_print_aer(pdev, aer_severity, aer);
>
> ...per above this would become:
>
> pci_print_aer(KERN_DEBUG, pdev, aer_severity, aer);
>
> [..]
>
> Rest of the changes look good to me.
I need to be sure that I understood...
void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
		   struct aer_capability_regs *aer)
{
	[...]
	if (printk_get_level(level) <= console_loglevel) {
		pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
			status, mask);
		__aer_print_error(dev, &info);
		pci_err(dev, "aer_layer=%s, aer_agent=%s\n",
			aer_error_layer[layer], aer_agent_string[agent]);
		if (aer_severity != AER_CORRECTABLE)
			pci_err(dev, "aer_uncor_severity: 0x%08x\n",
				aer->uncor_severity);
		if (tlp_header_valid)
			__print_tlp_header(dev, &aer->header_log);
	}
	[...]
}
It would require changing a couple of call sites, like in
aer_recover_work_func():
pci_print_aer(KERN_ERR, pdev, entry.severity, entry.regs);
Would you please confirm that the code shown above is what
you asked for?
Thanks,
Fabio
* Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section
2024-10-23 13:35 ` Fabio M. De Francesco
@ 2024-12-11 1:51 ` Dan Williams
0 siblings, 0 replies; 12+ messages in thread
From: Dan Williams @ 2024-12-11 1:51 UTC (permalink / raw)
To: Fabio M. De Francesco, linux-kernel, Dan Williams
Cc: Rafael J. Wysocki, Len Brown, Mahesh J Salgaonkar,
Oliver O'Halloran, Bjorn Helgaas, linux-acpi, linuxppc-dev,
linux-pci, Dan Williams
Fabio M. De Francesco wrote:
> On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
> > Fabio M. De Francesco wrote:
> > > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
> > > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
> >
> > I think the critical detail is that print_extlog_rcd() is only
> > triggered when ras_userspace_consumers() returns true. The observation
> > is that ras_userspace_consumers() hides information from the trace path
> > when the intended purpose of it was to hide duplicate emissions to the
> > kernel log when userspace is watching the tracepoints.
> >
> > Setting aside whether ras_userspace_consumers() is still a good idea or
> > not, it is obvious that this patch as is may surprise environments that
> > start seeing kernel error logs where the kernel was silent before.
> >
> > I think the path of least surprise would be to make sure that
> > pci_print_aer() optionally skips emitting to the kernel log when not
> > wanted.
>
> Sorry for replying so late...
>
> I'm not entirely sure that users would not prefer to be surprised by
> _finally_ seeing kernel error logs for failing PCIe components. I suspect
> that users might have been confused by not seeing any output.
2 notes:
* New KERN_ERR prints are often found to be unwelcome. When the kernel starts
printing new error messages it causes sysadmins to scramble.
* The future of RAS is trace-events. Any new RAS messages to the kernel
log need to ask the question, "is userspace better served by
registering for a RAS trace event, rather than parsing kernel log
messages".
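To illustrate the second note, here is a minimal userspace sketch of
"registering for a RAS trace event" by writing to the event's enable file
in tracefs. It is not part of the patch. The helper names
ras_event_enable_path() and ras_event_enable() are hypothetical, and the
mount point /sys/kernel/tracing is an assumption about the system; only
the ras/aer_event event name comes from the kernel's RAS trace events.

```c
#include <stdio.h>

/* Assumption: tracefs is mounted at its usual location. */
#define TRACEFS "/sys/kernel/tracing"

/*
 * Hypothetical helper: build the path of a RAS event's "enable" file,
 * e.g. event = "aer_event". Returns the snprintf() result.
 */
int ras_event_enable_path(char *buf, size_t len, const char *event)
{
	return snprintf(buf, len, TRACEFS "/events/ras/%s/enable", event);
}

/*
 * Hypothetical helper: enable one RAS trace event by writing "1" to its
 * enable file. Returns 0 on success, -1 if the file cannot be written
 * (no tracefs, no such event, or insufficient privileges).
 */
int ras_event_enable(const char *event)
{
	char path[256];
	FILE *f;

	ras_event_enable_path(path, sizeof(path), event);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fputs("1", f);
	return fclose(f) == 0 ? 0 : -1;
}
```

A consumer would typically call ras_event_enable("aer_event") and then
read records from the trace_pipe file under the same mount point, rather
than scraping the kernel log.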
[..]
> I need to be sure that I understood...
>
> void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
> 		   struct aer_capability_regs *aer)
> {
> 	[...]
>
> 	if (printk_get_level(level) <= console_loglevel) {
> 		pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
> 			status, mask);
No, the code would be:
pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
...i.e. just pass @level rather than open code "if
(printk_get_level(level) <= console_loglevel)".
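As a rough userspace analogy of that pattern (a sketch only, not kernel
code): the caller hands the level to the print helper, which prepends it
to the message, and any filtering is left to the logging core.
mock_pci_printk() is a hypothetical stand-in for pci_printk(), and the
"<3>"/"<7>" strings are simplified stand-ins for KERN_ERR/KERN_DEBUG.

```c
#include <stdio.h>

/* Simplified stand-ins for the kernel's level prefixes (assumption). */
#define MOCK_KERN_ERR   "<3>"
#define MOCK_KERN_DEBUG "<7>"

/*
 * Hypothetical analogue of pci_printk(level, dev, fmt, ...): the level is
 * just part of the emitted record. There is no open-coded comparison
 * against a console log level here; that decision belongs to whatever
 * consumes the record.
 */
int mock_pci_printk(char *buf, size_t len, const char *level,
		    const char *dev, unsigned int status, unsigned int mask)
{
	return snprintf(buf, len,
			"%s%s: aer_status: 0x%08x, aer_mask: 0x%08x\n",
			level, dev, status, mask);
}
```

With this shape, the extlog path would pass the debug level and the
aer_recover_work_func() path would pass the error level, and both share
the same print body.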
Thread overview: 12+ messages
2024-05-27 14:43 [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco
2024-05-27 14:43 ` [PATCH 1/2] ACPI: extlog: Trace CPER Non-standard Section Body Fabio M. De Francesco
2024-08-06 19:21 ` Dan Williams
2024-08-06 21:07 ` Bjorn Helgaas
2024-05-27 14:43 ` [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section Fabio M. De Francesco
2024-08-06 19:56 ` Dan Williams
2024-10-23 13:35 ` Fabio M. De Francesco
2024-12-11 1:51 ` Dan Williams
2024-08-06 21:31 ` Bjorn Helgaas
2024-08-07 20:28 ` Dan Williams
2024-08-07 20:31 ` Dan Williams
2024-07-23 13:43 ` [PATCH 0/2] Make ELOG log and trace consistently with GHES Fabio M. De Francesco