From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Igor Mammedov <imammedo@redhat.com>,
Shiju Jose <shiju.jose@huawei.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
Ani Sinha <anisinha@redhat.com>,
Dongjiu Geng <gengdongjiu1@gmail.com>,
<linux-kernel@vger.kernel.org>, <qemu-arm@nongnu.org>,
<qemu-devel@nongnu.org>
Subject: Re: [PATCH v3 08/15] acpi/ghes: make the GHES record generation more generic
Date: Fri, 22 Nov 2024 09:24:55 +0100
Message-ID: <20241122092455.5e2584a4@foz.lan>
In-Reply-To: <20241120141838.000070c1@huawei.com>
On Wed, 20 Nov 2024 14:18:38 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> On Tue, 12 Nov 2024 11:14:52 +0100
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
>
> > Split the code into separate functions to allow using the
> > common CPER filling code by different error sources.
> >
> > The generic code was moved to ghes_record_cper_errors(),
> > and ghes_gen_err_data_uncorrectable_recoverable() now contains
> > only a logic to fill GEGB part of the record.
> GESB?
This name came from this part of the code at hw/acpi/ghes.c:
/*
* Total size for Generic Error Status Block except Generic Error Data Entries
* ACPI 6.2: 18.3.2.7.1 Generic Error Data,
* Table 18-380 Generic Error Status Block
*/
#define ACPI_GHES_GESB_SIZE 20
I replaced it with:
The generic code was moved to ghes_record_cper_errors(),
and ghes_gen_err_data_uncorrectable_recoverable() now contains
only the logic to fill the Generic Error Data part of the record,
as described at:
ACPI 6.2: 18.3.2.7.1 Generic Error Data
to make it clearer.
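For reference, the 20 bytes behind ACPI_GHES_GESB_SIZE can be sketched as the five 32-bit header fields of the Generic Error Status Block. This is only an illustration: struct gesb_header, its field names and gesb_total_size() are hypothetical and do not exist in QEMU, which appends the fields one by one instead of using a struct. The field list follows my reading of the ACPI table quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from hw/acpi/ghes.c, as quoted above */
#define ACPI_GHES_GESB_SIZE 20

/*
 * Hypothetical layout, for illustration only: the Generic Error Status
 * Block header, without the Generic Error Data Entries that follow it.
 */
struct gesb_header {
    uint32_t block_status;
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;      /* bytes of Generic Error Data after the header */
    uint32_t error_severity;
};

/* Five 32-bit fields add up to the 20 bytes of ACPI_GHES_GESB_SIZE */
_Static_assert(sizeof(struct gesb_header) == ACPI_GHES_GESB_SIZE,
               "GESB header must be 20 bytes");

/* Size of the whole block: header plus the data entries it describes */
static inline size_t gesb_total_size(size_t data_length)
{
    return ACPI_GHES_GESB_SIZE + data_length;
}
```

With this layout, a block carrying 152 bytes of Generic Error Data occupies gesb_total_size(152) bytes in total, while its data_length field still holds 152.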
> >
> > The remaining code to generate a memory error now belongs to
> > acpi_ghes_record_errors() function.
> >
> > A further patch will give it a better name.
> >
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> Hi Mauro,
>
> I've kind of forgotten how all this works. Anyhow, looking afresh
> I think there is a functional change in here which I wasn't expecting
> to see.
>
> > ---
> > hw/acpi/ghes.c | 122 +++++++++++++++++++++++------------------
> > include/hw/acpi/ghes.h | 3 +
> > 2 files changed, 73 insertions(+), 52 deletions(-)
> >
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index edc74c38bf8a..0eb874a11ff7 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -181,51 +181,30 @@ static void acpi_ghes_build_append_mem_cper(GArray *table,
> > build_append_int_noprefix(table, 0, 7);
> > }
> >
> > -static int acpi_ghes_record_mem_error(uint64_t error_block_address,
> > - uint64_t error_physical_addr)
> > +static void
> > +ghes_gen_err_data_uncorrectable_recoverable(GArray *block,
> > + const uint8_t *section_type,
> > + int data_length)
> > {
> > - GArray *block;
> > -
> > - /* Memory Error Section Type */
> > - const uint8_t uefi_cper_mem_sec[] =
> > - UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > - 0xED, 0x7C, 0x83, 0xB1);
> > -
> > /* invalid fru id: ACPI 4.0: 17.3.2.6.1 Generic Error Data,
> > * Table 17-13 Generic Error Data Entry
> > */
> > QemuUUID fru_id = {};
> > - uint32_t data_length;
> >
> > - block = g_array_new(false, true /* clear */, 1);
> > -
> > - /* This is the length if adding a new generic error data entry*/
> > - data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> > /*
> > - * It should not run out of the preallocated memory if adding a new generic
> > - * error data entry
> > + * Calculate the size with this block. No need to check for
> > + * too big CPER, as CPER size is checked at ghes_record_cper_errors()
> > */
> > - assert((data_length + ACPI_GHES_GESB_SIZE) <=
> > - ACPI_GHES_MAX_RAW_DATA_LENGTH);
> > + data_length += ACPI_GHES_GESB_SIZE;
>
> After this change the data length passed to acpi_ghes_generic_error_status is
> ACPI_GHES_MAX_RAW_DATA_LENGTH + ACPI_GHES_GESB_SIZE;
> I can't see why that would be the same as previous value of
> ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH
>
> So is this a functional change? If so please call it out in the patch
> description with some info on why it doesn't matter.
Good point.
My understanding of the spec is that the length recorded for the generic
error data is meant to cover the entire space, and not only the payload
part.
This change made more sense in earlier versions of the series, where
ghes_gen_err_data_uncorrectable_recoverable() was called by the error
injection code. Now, the only place where it is used is inside the code
that reports QEMU memory errors via SEA.
So, I removed the functional change from the series I'm currently
submitting, as it doesn't belong here. It would only become relevant if
we decide to implement other internal errors in QEMU that reuse the
ghes_gen_err_data_uncorrectable_recoverable() function.
I'll keep it at the end of my pile of QEMU patches, to be submitted
if/when we need it.
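To make the difference concrete, here is a small sketch of the two values discussed above. ACPI_GHES_GESB_SIZE is quoted earlier in this mail; the other three constants are assumed values as I recall them from hw/acpi/ghes.c and should be checked against the tree. The two helper functions are hypothetical and exist only for this comparison.

```c
#include <assert.h>
#include <stdint.h>

/* Quoted in this thread from hw/acpi/ghes.c */
#define ACPI_GHES_GESB_SIZE            20
/* Assumed values, recalled from hw/acpi/ghes.c: verify before relying on them */
#define ACPI_GHES_DATA_LENGTH          72
#define ACPI_GHES_MEM_CPER_LENGTH      80
#define ACPI_GHES_MAX_RAW_DATA_LENGTH  1024

/* Before the patch: one data entry header plus the memory CPER payload */
static inline uint32_t data_length_before(void)
{
    return ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
}

/*
 * As posted in v3: the caller passes a length and the helper adds the
 * GESB size before handing it to acpi_ghes_generic_error_status().
 */
static inline uint32_t data_length_v3(uint32_t data_length)
{
    data_length += ACPI_GHES_GESB_SIZE;
    return data_length;
}
```

Under these assumptions, the old code passed 152, while the v3 patch, called with ACPI_GHES_MAX_RAW_DATA_LENGTH, would pass 1044. The two values differ, which is the functional change pointed out in the review.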
> >
> > /* Build the new generic error status block header */
> > acpi_ghes_generic_error_status(block, ACPI_GEBS_UNCORRECTABLE,
> > 0, 0, data_length, ACPI_CPER_SEV_RECOVERABLE);
> >
> > /* Build this new generic error data entry header */
> > - acpi_ghes_generic_error_data(block, uefi_cper_mem_sec,
> > + acpi_ghes_generic_error_data(block, section_type,
> > ACPI_CPER_SEV_RECOVERABLE, 0, 0,
> > ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
> > -
> > - /* Build the memory section CPER for above new generic error data entry */
> > - acpi_ghes_build_append_mem_cper(block, error_physical_addr);
> > -
> > - /* Write the generic error data entry into guest memory */
> > - cpu_physical_memory_write(error_block_address, block->data, block->len);
> > -
> > - g_array_free(block, true);
> > -
> > - return 0;
> > }
> >
> > /*
> > @@ -383,15 +362,18 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> > ags->present = true;
> > }
> >
> > -int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> > +void ghes_record_cper_errors(const void *cper, size_t len,
> > + uint16_t source_id, Error **errp)
> > {
> > uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > uint64_t start_addr;
> > - bool ret = -1;
> > AcpiGedState *acpi_ged_state;
> > AcpiGhesState *ags;
> >
> > - assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
> > + if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> > + error_setg(errp, "GHES CPER record is too big: %ld", len);
> > + return;
> > + }
> >
> > acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
> > NULL));
> > @@ -400,16 +382,16 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> >
> > start_addr = le64_to_cpu(ags->ghes_addr_le);
> >
> > - if (!physical_address) {
> > - return -1;
> > - }
> > -
> > start_addr += source_id * sizeof(uint64_t);
> >
> > cpu_physical_memory_read(start_addr, &error_block_addr,
> > sizeof(error_block_addr));
> >
> > error_block_addr = le64_to_cpu(error_block_addr);
> > + if (!error_block_addr) {
> > + error_setg(errp, "can not find Generic Error Status Block");
> > + return;
> > + }
> >
> > read_ack_register_addr = start_addr +
> > ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > @@ -419,24 +401,60 @@ int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> >
> > /* zero means OSPM does not acknowledge the error */
> > if (!read_ack_register) {
> > - error_report("OSPM does not acknowledge previous error,"
> > - " so can not record CPER for current error anymore");
> > - } else if (error_block_addr) {
> > - read_ack_register = cpu_to_le64(0);
> > - /*
> > - * Clear the Read Ack Register, OSPM will write it to 1 when
> > - * it acknowledges this error.
> > - */
> > - cpu_physical_memory_write(read_ack_register_addr,
> > - &read_ack_register, sizeof(uint64_t));
> > -
> > - ret = acpi_ghes_record_mem_error(error_block_addr,
> > - physical_address);
> > - } else {
> > - error_report("can not find Generic Error Status Block");
> > + error_setg(errp,
> > + "OSPM does not acknowledge previous error,"
> > + " so can not record CPER for current error anymore");
> > + return;
> > }
> >
> > - return ret;
> > + read_ack_register = cpu_to_le64(0);
> > + /*
> > + * Clear the Read Ack Register, OSPM will write it to 1 when
> > + * it acknowledges this error.
> > + */
> > + cpu_physical_memory_write(read_ack_register_addr,
> > + &read_ack_register, sizeof(uint64_t));
> Alignment of this could be more consistent with rest of the code around it.
> So perhaps align after (
>
> > +
> > + /* Write the generic error data entry into guest memory */
> > + cpu_physical_memory_write(error_block_addr, cper, len);
> > +
> > + return;
> > +}
> > +
> > +int acpi_ghes_record_errors(uint16_t source_id, uint64_t physical_address)
> > +{
> > + /* Memory Error Section Type */
> > + const uint8_t guid[] =
> > + UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> > + 0xED, 0x7C, 0x83, 0xB1);
> > + Error *errp = NULL;
> > + GArray *block;
> > +
> > + if (!physical_address) {
> > + error_report("can not find Generic Error Status Block for source id %d",
> > + source_id);
> > + return -1;
> > + }
> > +
> > + block = g_array_new(false, true /* clear */, 1);
> > +
> > + ghes_gen_err_data_uncorrectable_recoverable(block, guid,
> > + ACPI_GHES_MAX_RAW_DATA_LENGTH);
> > +
> > + /* Build the memory section CPER for above new generic error data entry */
> > + acpi_ghes_build_append_mem_cper(block, physical_address);
> > +
> > + /* Report the error */
> > + ghes_record_cper_errors(block->data, block->len, source_id, &errp);
> > +
> > + g_array_free(block, true);
> > +
> > + if (errp) {
> > + error_report_err(errp);
> > + return -1;
> > + }
> > +
> > + return 0;
> > }
> >
> > bool acpi_ghes_present(void)
> > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> > index 9295e46be25e..8859346af51a 100644
> > --- a/include/hw/acpi/ghes.h
> > +++ b/include/hw/acpi/ghes.h
> > @@ -23,6 +23,7 @@
> > #define ACPI_GHES_H
> >
> > #include "hw/acpi/bios-linker-loader.h"
> > +#include "qapi/error.h"
> >
> > /*
> > * Values for Hardware Error Notification Type field
> > @@ -73,6 +74,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> > const char *oem_id, const char *oem_table_id);
> > void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> > GArray *hardware_errors);
> > +void ghes_record_cper_errors(const void *cper, size_t len,
> > + uint16_t source_id, Error **errp);
> > int acpi_ghes_record_errors(uint16_t source_id, uint64_t error_physical_addr);
> >
> > /**
>
Thanks,
Mauro