qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Shiju Jose <shiju.jose@huawei.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Ani Sinha <anisinha@redhat.com>,
	Dongjiu Geng <gengdongjiu1@gmail.com>,
	linux-kernel@vger.kernel.org, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
Subject: Re: [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID
Date: Tue, 1 Oct 2024 13:57:59 +0200	[thread overview]
Message-ID: <20241001135759.474452d8@foz.lan> (raw)
In-Reply-To: <20240917135934.38579213@imammedo.users.ipa.redhat.com>

Em Tue, 17 Sep 2024 13:59:34 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:

> On Sat, 14 Sep 2024 08:13:28 +0200
> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:
> 
> > The current logic is based on a lot of duct tape, with
> > offsets calculated based on one define with the number of
> > source IDs and an enum.
> > 
> > Rewrite the logic in a way that it would be more resilient
> > of code changes, by moving the source ID count to an enum
> > and make the offset calculus more explicit.
> > 
> > Such change was inspired on a patch from Jonathan Cameron
> > splitting the logic to get the CPER address on a separate
> > function, as this will be needed to support generic error
> > injection.  
> 
> so this patch switches to using HEST to lookup error status block
> by source id, though nothing in commit message mentions that.
> Perhaps it's time to rewrite commit message to be more
> specific/clear on what it's doing.  
> 
> now, I'd split this on several patches that should also take care of
> wiring needed to preserve old lookup to keep migration with 9.1 machines
> working:
> 
>  1. extract error status block read_ack addresses calculation/lookup
>     into a separate function (just current code, without HEST lookup).

Done. This is at the cleanup series.

>  2. since old lookup code handles only 1 source and won't ever see multiple
>     error sources, simplify that as much as possible to keep old way simple
>     (and mention that in commit message).
>     it basically should take 1st error status block/read ack addresses from
>     hardware_errors

Done there too.

> 
>  3. add to AcpiGedState a callback to with signature of #1 lookup func.
>     (which will be used to switch between old/new code), by default set to
>     old lookup func
> 
>  4. add to GED a bool property x-has-hardware_errors_addr = 1
>     and use it in acpi_ged_realize() to set callback
> 
>           if x-has-hardware_errors_addr == 1
>                callback = old_lookup

Instead of a callback, I opted to define a boolean:

	DEFINE_PROP_BOOL("x-has-hest-addr", AcpiGedState, ghes_state.hest_lookup, true),

That avoided the need of exporting the math logic and deal with
callbacks. It also avoided the need of touching acpi_ged_realize().

> 
>  5. add new HEST lookup func (not used yet with mentioning that follow up patch
>     will use it)
> 
>  6. cleanup fwcfg based on x-has-hardware_errors_addr,
>        i.e. for 'true':
>           ask for write pointer to hardware_errors like it's done in current code
>           and don't register hest_addr write pointer
>        while for 'false'
>           do opposite of above.

This doesn't work. without the fw_cfg logic for both, QEMU/BIOS won't boot 
and/or the hardware_errors won't work, causing ghes to do nothing.

>  7. flip x-has-hardware_errors_addr default to 0 and add to hw/core/machine.c
>       hw_compat_9_1[] = {
>           {ged type,  'x-has-hardware_errors_addr', 'true'},
>       }
>     this will make 9.1 and older machine types to use old lookup callback
>     while newer machines will use default new lookup

I opted to add the changes on machine.c at the same patch.

See the RFC series I posted.

> that should take care of switching between new and old ABI,
> i.e. which code qemu and guest will use/see.

With the new series, both virt-9.1 and virt-9.2 are doing the
right thing using either the legacy or the new math. I double
checked with a small debug patch, and changing the virt-9.1
code to use GPIO, as it is a way easier to inject errors than
via SEA (SEA requires the host OS to generate a bus error on
a hw address used by the VM).

> 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > 
> > ---
> > 
> > Changes from v9:
> > - patch split on multiple patches to avoid multiple changes
> >   at the same patch.
> > 
> > Changes from v8:
> > - Non-rename/cleanup changes merged altogether;
> > - source ID is now more generic, defined per guest target.
> >   That should make easier to add support for 86.
> > 
> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> > ---
> >  hw/acpi/ghes.c         | 95 ++++++++++++++++++++++++++++++------------
> >  include/hw/acpi/ghes.h |  6 ++-
> >  2 files changed, 73 insertions(+), 28 deletions(-)
> > 
> > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> > index 36fe5f68782f..6e5f0e8cf0c9 100644
> > --- a/hw/acpi/ghes.c
> > +++ b/hw/acpi/ghes.c
> > @@ -61,6 +61,23 @@
> >   */
> >  #define ACPI_GHES_GESB_SIZE                 20
> >  
> > +/*
> > + * Offsets with regards to the start of the HEST table stored at
> > + * ags->hest_addr_le, according with the memory layout map at
> > + * docs/specs/acpi_hest_ghes.rst.
> > + */
> > +
> > +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> > + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> > + */
> > +#define HEST_GHES_V2_TABLE_SIZE  92
> > +#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)
> > +
> > +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
> > + * Table 18-380: 'Error Status Address' field
> > + */
> > +#define GHES_ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)
> > +
> >  /*
> >   * Values for error_severity field
> >   */
> > @@ -401,11 +418,13 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
> >  
> >  int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> >  {
> > -    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> > -    uint64_t start_addr;
> > -    bool ret = -1;
> > +    uint64_t hest_read_ack_start_addr, read_ack_start_addr;
> > +    uint64_t hest_addr, cper_addr, err_source_struct;
> > +    uint64_t hest_err_block_addr, error_block_addr;
> > +    uint32_t i, ret;
> >      AcpiGedState *acpi_ged_state;
> >      AcpiGhesState *ags;
> > +    uint64_t read_ack;
> >  
> >      assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
> >  
> > @@ -414,44 +433,66 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
> >      g_assert(acpi_ged_state);
> >      ags = &acpi_ged_state->ghes_state;
> >  
> > -    start_addr = le64_to_cpu(ags->ghes_addr_le);
> > +    hest_addr = le64_to_cpu(ags->hest_addr_le);
> >  
> >      if (!physical_address) {
> >          return -1;
> >      }
> >  
> > -    start_addr += source_id * sizeof(uint64_t);
> > +    err_source_struct = hest_addr + ACPI_GHES_ERROR_SOURCE_COUNT;
> >  
> > -    cpu_physical_memory_read(start_addr, &error_block_addr,
> > -                                sizeof(error_block_addr));
> > +    /*
> > +     * Currently, HEST Error source navigates only for GHESv2 tables
> > +     */
> > +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> > +        uint64_t addr = err_source_struct;
> > +        uint16_t type, src_id;
> >  
> > -    error_block_addr = le64_to_cpu(error_block_addr);
> > +        cpu_physical_memory_read(addr, &type, sizeof(type));
> >  
> > -    read_ack_register_addr = start_addr +
> > -        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> > +        /* For now, we only know the size of GHESv2 table */
> > +        assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
> >  
> > -    cpu_physical_memory_read(read_ack_register_addr,
> > -                                &read_ack_register, sizeof(read_ack_register));
> > +        /* It is GHES. Compare CPER source address */
> > +        addr += sizeof(type);
> > +        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
> >  
> > -    /* zero means OSPM does not acknowledge the error */
> > -    if (!read_ack_register) {
> > -        error_report("OSPM does not acknowledge previous error,"
> > -            " so can not record CPER for current error anymore");
> > -    } else if (error_block_addr) {
> > -        read_ack_register = cpu_to_le64(0);
> > -        /*
> > -         * Clear the Read Ack Register, OSPM will write it to 1 when
> > -         * it acknowledges this error.
> > -         */
> > -        cpu_physical_memory_write(read_ack_register_addr,
> > -            &read_ack_register, sizeof(uint64_t));
> > +        if (src_id == source_id) {
> > +            break;
> > +        }
> >  
> > -        ret = acpi_ghes_record_mem_error(error_block_addr,
> > -                                            physical_address);
> > -    } else {
> > +        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> > +    }
> > +    if (i == ACPI_GHES_ERROR_SOURCE_COUNT) {
> >          error_report("can not find Generic Error Status Block");
> > +
> > +        return -1;
> >      }
> >  
> > +    /* Navigate though table address pointers */
> > +
> > +    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> > +    hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET;
> > +
> > +    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> > +                             sizeof(error_block_addr));
> > +
> > +    cpu_physical_memory_read(error_block_addr, &cper_addr,
> > +                             sizeof(error_block_addr));
> > +
> > +    cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr,
> > +                             sizeof(read_ack_start_addr));
> > +
> > +    /* Update ACK offset to notify about a new error */
> > +
> > +    cpu_physical_memory_read(read_ack_start_addr,
> > +                             &read_ack, sizeof(read_ack));
> > +
> > +    read_ack = cpu_to_le64(0);
> > +    cpu_physical_memory_write(read_ack_start_addr,
> > +                              &read_ack, sizeof(read_ack));
> > +
> > +    ret = acpi_ghes_record_mem_error(error_block_addr, physical_address);
> >      return ret;
> >  }
> >  
> > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> > index c9bbda4740e2..7485f54c28f2 100644
> > --- a/include/hw/acpi/ghes.h
> > +++ b/include/hw/acpi/ghes.h
> > @@ -23,6 +23,7 @@
> >  #define ACPI_GHES_H
> >  
> >  #include "hw/acpi/bios-linker-loader.h"
> > +#include "qapi/error.h"
> >  
> >  /*
> >   * Values for Hardware Error Notification Type field
> > @@ -74,7 +75,10 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
> >                       const char *oem_id, const char *oem_table_id);
> >  void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> >                            GArray *hardware_errors);
> > -int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
> > +int acpi_ghes_record_errors(uint8_t source_id,
> > +                            uint64_t error_physical_addr);
> > +void ghes_record_cper_errors(const void *cper, size_t len,
> > +                             uint16_t source_id, Error **errp);
> >  
> >  /**
> >   * acpi_ghes_present: Report whether ACPI GHES table is present  
> 



Thanks,
Mauro


  reply	other threads:[~2024-10-01 14:57 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-14  6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2024-09-17  9:34   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
2024-09-17  9:19   ` Igor Mammedov
2024-09-17 15:22     ` Peter Xu
2024-09-25  8:04       ` Mauro Carvalho Chehab
2024-09-17 12:01   ` Igor Mammedov
2024-09-25  7:32     ` Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
2024-09-17  9:22   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 04/21] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
2024-09-17  9:28   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 05/21] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
2024-09-17  9:01   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 06/21] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
2024-09-17 10:39   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
2024-09-17 11:59   ` Igor Mammedov
2024-10-01 11:57     ` Mauro Carvalho Chehab [this message]
2024-10-03 14:51       ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 08/21] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
2024-09-17 12:04   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 09/21] acpi/ghes: Don't hardcode the number of sources on ghes Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 10/21] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 11/21] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 12/21] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 13/21] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 14/21] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 15/21] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 16/21] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 17/21] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 18/21] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 19/21] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 20/21] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 21/21] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
2024-09-24 13:00   ` Mauro Carvalho Chehab
2024-09-24 13:14     ` Igor Mammedov
2024-09-25  4:26       ` Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241001135759.474452d8@foz.lan \
    --to=mchehab+huawei@kernel.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=anisinha@redhat.com \
    --cc=gengdongjiu1@gmail.com \
    --cc=imammedo@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=shiju.jose@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).