qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Igor Mammedov <imammedo@redhat.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Shiju Jose <shiju.jose@huawei.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Ani Sinha <anisinha@redhat.com>,
	Dongjiu Geng <gengdongjiu1@gmail.com>,
	linux-kernel@vger.kernel.org, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
Subject: Re: [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID
Date: Tue, 17 Sep 2024 13:59:34 +0200	[thread overview]
Message-ID: <20240917135934.38579213@imammedo.users.ipa.redhat.com> (raw)
In-Reply-To: <0dd7081717b23b4c1536bc86abfa926388cf2139.1726293808.git.mchehab+huawei@kernel.org>

On Sat, 14 Sep 2024 08:13:28 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The current logic is based on a lot of duct tape, with
> offsets calculated based on one define with the number of
> source IDs and an enum.
> 
> Rewrite the logic in a way that it would be more resilient
> of code changes, by moving the source ID count to an enum
> and make the offset calculus more explicit.
> 
> Such change was inspired on a patch from Jonathan Cameron
> splitting the logic to get the CPER address on a separate
> function, as this will be needed to support generic error
> injection.

so this patch switches to using HEST to lookup error status block
by source id, though nothing in commit message mentions that.
Perhaps it's time to rewrite commit message to be more
specific/clear on what it's doing.  

now, I'd split this on several patches that should also take care of
wiring needed to preserve old lookup to keep migration with 9.1 machines
working:

 1. extract error status block read_ack addresses calculation/lookup
    into a separate function (just current code, without HEST lookup).
 2. since old lookup code handles only 1 source and won't ever see multiple
    error sources, simplify that as much as possible to keep old way simple
    (and mention that in commit message).
    it basically should take 1st error status block/read ack addresses from
    hardware_errors

 3. add to AcpiGedState a callback to with signature of #1 lookup func.
    (which will be used to switch between old/new code), by default set to
    old lookup func

 4. add to GED a bool property x-has-hardware_errors_addr = 1
    and use it in acpi_ged_realize() to set callback

          if x-has-hardware_errors_addr == 1
               callback = old_lookup

 5. add new HEST lookup func (not used yet with mentioning that follow up patch
    will use it)

 6. cleanup fwcfg based on x-has-hardware_errors_addr,
       i.e. for 'true':
          ask for write pointer to hardware_errors like it's done in current code
          and don't register hest_addr write pointer
       while for 'false'
          do opposite of above.


 7. flip x-has-hardware_errors_addr default to 0 and add to hw/core/machine.c
      hw_compat_9_1[] = {
          {ged type,  'x-has-hardware_errors_addr', 'true'},
      }
    this will make 9.1 and older machine types to use old lookup callback
    while newer machines will use default new lookup

that should take care of switching between new and old ABI,
i.e. which code qemu and guest will use/see.

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> 
> ---
> 
> Changes from v9:
> - patch split on multiple patches to avoid multiple changes
>   at the same patch.
> 
> Changes from v8:
> - Non-rename/cleanup changes merged altogether;
> - source ID is now more generic, defined per guest target.
>   That should make easier to add support for 86.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  hw/acpi/ghes.c         | 95 ++++++++++++++++++++++++++++++------------
>  include/hw/acpi/ghes.h |  6 ++-
>  2 files changed, 73 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 36fe5f68782f..6e5f0e8cf0c9 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -61,6 +61,23 @@
>   */
>  #define ACPI_GHES_GESB_SIZE                 20
>  
> +/*
> + * Offsets with regards to the start of the HEST table stored at
> + * ags->hest_addr_le, according with the memory layout map at
> + * docs/specs/acpi_hest_ghes.rst.
> + */
> +
> +/* ACPI 6.2: 18.3.2.8 Generic Hardware Error Source version 2
> + * Table 18-382 Generic Hardware Error Source version 2 (GHESv2) Structure
> + */
> +#define HEST_GHES_V2_TABLE_SIZE  92
> +#define GHES_ACK_OFFSET          (64 + GAS_ADDR_OFFSET)
> +
> +/* ACPI 6.2: 18.3.2.7: Generic Hardware Error Source
> + * Table 18-380: 'Error Status Address' field
> + */
> +#define GHES_ERR_ST_ADDR_OFFSET  (20 + GAS_ADDR_OFFSET)
> +
>  /*
>   * Values for error_severity field
>   */
> @@ -401,11 +418,13 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, FWCfgState *s,
>  
>  int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
>  {
> -    uint64_t error_block_addr, read_ack_register_addr, read_ack_register = 0;
> -    uint64_t start_addr;
> -    bool ret = -1;
> +    uint64_t hest_read_ack_start_addr, read_ack_start_addr;
> +    uint64_t hest_addr, cper_addr, err_source_struct;
> +    uint64_t hest_err_block_addr, error_block_addr;
> +    uint32_t i, ret;
>      AcpiGedState *acpi_ged_state;
>      AcpiGhesState *ags;
> +    uint64_t read_ack;
>  
>      assert(source_id < ACPI_GHES_ERROR_SOURCE_COUNT);
>  
> @@ -414,44 +433,66 @@ int acpi_ghes_record_errors(uint8_t source_id, uint64_t physical_address)
>      g_assert(acpi_ged_state);
>      ags = &acpi_ged_state->ghes_state;
>  
> -    start_addr = le64_to_cpu(ags->ghes_addr_le);
> +    hest_addr = le64_to_cpu(ags->hest_addr_le);
>  
>      if (!physical_address) {
>          return -1;
>      }
>  
> -    start_addr += source_id * sizeof(uint64_t);
> +    err_source_struct = hest_addr + ACPI_GHES_ERROR_SOURCE_COUNT;
>  
> -    cpu_physical_memory_read(start_addr, &error_block_addr,
> -                                sizeof(error_block_addr));
> +    /*
> +     * Currently, HEST Error source navigates only for GHESv2 tables
> +     */
> +    for (i = 0; i < ACPI_GHES_ERROR_SOURCE_COUNT; i++) {
> +        uint64_t addr = err_source_struct;
> +        uint16_t type, src_id;
>  
> -    error_block_addr = le64_to_cpu(error_block_addr);
> +        cpu_physical_memory_read(addr, &type, sizeof(type));
>  
> -    read_ack_register_addr = start_addr +
> -        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t);
> +        /* For now, we only know the size of GHESv2 table */
> +        assert(type == ACPI_GHES_SOURCE_GENERIC_ERROR_V2);
>  
> -    cpu_physical_memory_read(read_ack_register_addr,
> -                                &read_ack_register, sizeof(read_ack_register));
> +        /* It is GHES. Compare CPER source address */
> +        addr += sizeof(type);
> +        cpu_physical_memory_read(addr, &src_id, sizeof(src_id));
>  
> -    /* zero means OSPM does not acknowledge the error */
> -    if (!read_ack_register) {
> -        error_report("OSPM does not acknowledge previous error,"
> -            " so can not record CPER for current error anymore");
> -    } else if (error_block_addr) {
> -        read_ack_register = cpu_to_le64(0);
> -        /*
> -         * Clear the Read Ack Register, OSPM will write it to 1 when
> -         * it acknowledges this error.
> -         */
> -        cpu_physical_memory_write(read_ack_register_addr,
> -            &read_ack_register, sizeof(uint64_t));
> +        if (src_id == source_id) {
> +            break;
> +        }
>  
> -        ret = acpi_ghes_record_mem_error(error_block_addr,
> -                                            physical_address);
> -    } else {
> +        err_source_struct += HEST_GHES_V2_TABLE_SIZE;
> +    }
> +    if (i == ACPI_GHES_ERROR_SOURCE_COUNT) {
>          error_report("can not find Generic Error Status Block");
> +
> +        return -1;
>      }
>  
> +    /* Navigate though table address pointers */
> +
> +    hest_err_block_addr = err_source_struct + GHES_ERR_ST_ADDR_OFFSET;
> +    hest_read_ack_start_addr = err_source_struct + GHES_ACK_OFFSET;
> +
> +    cpu_physical_memory_read(hest_err_block_addr, &error_block_addr,
> +                             sizeof(error_block_addr));
> +
> +    cpu_physical_memory_read(error_block_addr, &cper_addr,
> +                             sizeof(error_block_addr));
> +
> +    cpu_physical_memory_read(hest_read_ack_start_addr, &read_ack_start_addr,
> +                             sizeof(read_ack_start_addr));
> +
> +    /* Update ACK offset to notify about a new error */
> +
> +    cpu_physical_memory_read(read_ack_start_addr,
> +                             &read_ack, sizeof(read_ack));
> +
> +    read_ack = cpu_to_le64(0);
> +    cpu_physical_memory_write(read_ack_start_addr,
> +                              &read_ack, sizeof(read_ack));
> +
> +    ret = acpi_ghes_record_mem_error(error_block_addr, physical_address);
>      return ret;
>  }
>  
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index c9bbda4740e2..7485f54c28f2 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -23,6 +23,7 @@
>  #define ACPI_GHES_H
>  
>  #include "hw/acpi/bios-linker-loader.h"
> +#include "qapi/error.h"
>  
>  /*
>   * Values for Hardware Error Notification Type field
> @@ -74,7 +75,10 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
>                       const char *oem_id, const char *oem_table_id);
>  void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>                            GArray *hardware_errors);
> -int acpi_ghes_record_errors(uint8_t source_id, uint64_t error_physical_addr);
> +int acpi_ghes_record_errors(uint8_t source_id,
> +                            uint64_t error_physical_addr);
> +void ghes_record_cper_errors(const void *cper, size_t len,
> +                             uint16_t source_id, Error **errp);
>  
>  /**
>   * acpi_ghes_present: Report whether ACPI GHES table is present



  reply	other threads:[~2024-09-17 12:00 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-14  6:13 [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 01/21] acpi/ghes: add a firmware file with HEST address Mauro Carvalho Chehab
2024-09-17  9:34   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 02/21] acpi/generic_event_device: Update GHES migration to cover hest addr Mauro Carvalho Chehab
2024-09-17  9:19   ` Igor Mammedov
2024-09-17 15:22     ` Peter Xu
2024-09-25  8:04       ` Mauro Carvalho Chehab
2024-09-17 12:01   ` Igor Mammedov
2024-09-25  7:32     ` Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 03/21] acpi/ghes: get rid of ACPI_HEST_SRC_ID_RESERVED Mauro Carvalho Chehab
2024-09-17  9:22   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 04/21] acpi/ghes: simplify acpi_ghes_record_errors() code Mauro Carvalho Chehab
2024-09-17  9:28   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 05/21] acpi/ghes: better handle source_id and notification Mauro Carvalho Chehab
2024-09-17  9:01   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 06/21] acpi/ghes: Remove a duplicated out of bounds check Mauro Carvalho Chehab
2024-09-17 10:39   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 07/21] acpi/ghes: rework the logic to handle HEST source ID Mauro Carvalho Chehab
2024-09-17 11:59   ` Igor Mammedov [this message]
2024-10-01 11:57     ` Mauro Carvalho Chehab
2024-10-03 14:51       ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 08/21] acpi/ghes: Change the type for source_id Mauro Carvalho Chehab
2024-09-17 12:04   ` Igor Mammedov
2024-09-14  6:13 ` [PATCH v10 09/21] acpi/ghes: Don't hardcode the number of sources on ghes Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 10/21] acpi/ghes: make the GHES record generation more generic Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 11/21] acpi/ghes: don't crash QEMU if ghes GED is not found Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 12/21] acpi/ghes: rename etc/hardware_error file macros Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 13/21] acpi/ghes: better name GHES memory error function Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 14/21] acpi/ghes: add a notifier to notify when error data is ready Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 15/21] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 16/21] arm/virt: Wire up a GED error device for ACPI / GHES Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 17/21] qapi/acpi-hest: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 18/21] docs: acpi_hest_ghes: fix documentation for CPER size Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 19/21] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 20/21] target/arm: add an experimental mpidr arm cpu property object Mauro Carvalho Chehab
2024-09-14  6:13 ` [PATCH v10 21/21] scripts/arm_processor_error.py: retrieve mpidr if not filled Mauro Carvalho Chehab
2024-09-17 12:15 ` [PATCH v10 00/21] Add ACPI CPER firmware first error injection on ARM emulation Igor Mammedov
2024-09-24 13:00   ` Mauro Carvalho Chehab
2024-09-24 13:14     ` Igor Mammedov
2024-09-25  4:26       ` Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240917135934.38579213@imammedo.users.ipa.redhat.com \
    --to=imammedo@redhat.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=anisinha@redhat.com \
    --cc=gengdongjiu1@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mst@redhat.com \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=shiju.jose@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).