All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Shiju Jose <shiju.jose@huawei.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Ani Sinha <anisinha@redhat.com>,
	Dongjiu Geng <gengdongjiu1@gmail.com>,
	linux-kernel@vger.kernel.org, qemu-arm@nongnu.org,
	qemu-devel@nongnu.org
Subject: Re: [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI
Date: Thu, 8 Aug 2024 14:11:14 +0200	[thread overview]
Message-ID: <20240808141114.3b021f80@foz.lan> (raw)
In-Reply-To: <20240806163113.3bdc260a@imammedo.users.ipa.redhat.com>

Em Tue, 6 Aug 2024 16:31:13 +0200
Igor Mammedov <imammedo@redhat.com> escreveu:

> > +    /* Could also be read back from the error_block_address register */
> > +    *error_block_addr = base +
> > +        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > +        ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) +
> > +        error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH;
> > +
> > +    return true;
> > +}  
> 
> I don't like all this pointer math, which is basically a reverse engineered
> QEMU actions on startup + guest provided etc/hardware_errors address.
> 
> For once, it assumes error_source_to_index[] matches order in which HEST
> error sources were described, which is fragile.
> 
> 2nd: migration-wive it's disaster, since old/new HEST/hardware_errors tables
> in RAM migrated from older version might not match above assumptions
> of target QEMU. 
> 
> I see 2 ways to rectify it:
>   1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST table
>        in guest RAM, like we do with etc/hardware_errors, see
>             build_ghes_error_table()
>                ...
>                tell firmware to write hardware_errors GPA into
>        and then fetch from HEST table in RAM, the guest patched error/ack addresses
>        for given source_id
> 
>        code-wise: relatively simple once one wraps their own head over
>                  how this whole APEI thing works in QEMU
>                  workflow  is described in docs/specs/acpi_hest_ghes.rst
>                  look to me as sufficient to grasp it.
>                  (but my view is very biased given my prior knowledge,
>                   aka: docs/comments/examples wrt acpi patching are good enough)
>                  (if it's not clear how to do it, ask me for pointers)

That sounds a better approach, however...

>   2nd:  sort of hack based on build_ghes_v2() Error Status Address/Read Ack Register
>         patching instructions
>                bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,                
>                    address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t),                      
>                    ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * sizeof(uint64_t));
>                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^
>         during build_ghes_v2() also store on a side mapping
>              source_id -> error address offset : read ack address
> 
>         so when you are injecting error, you'd at least use offsets
>         used at start time, to get rid of risk where injection code
>         diverge from HEST:etc/hardware_errors layout at start time.
> 
>         However to make migration safe, one would need to add a fat
>         comment not to change order ghest error sources in HEST _and_
>         a dedicated unit test to make sure we catch it when that happens.
>         bios_tables_test should be able to catch the change, but it won't
>         say what's wrong, hence a test case that explicitly checks order
>         and loudly & clear complains when we will break order assumptions.
> 
>         downside:
>            * we are are limiting ways HEST could be composed/reshuffled in future
>            * consumption of extra CI resources
>            * and well, it relies on above duct tape holding all pieces together

I ended opting to do approach (2) on this changeset, as the current code
is already using bios_linker_loader_add_pointer() for ghes, being deeply 
relying on the block address/ack and cper calculus.

To avoid troubles on this duct tape, I opted to move all offset math
to a single function at ghes.c:

	/*
	 * ID numbers used to fill HEST source ID field
	 */
	enum AcpiHestSourceId {
	    ACPI_HEST_SRC_ID_SEA,
	    ACPI_HEST_SRC_ID_GED,
	
	    /* Shall be the last one */
	    ACPI_HEST_SRC_ID_COUNT
	} AcpiHestSourceId;

	...

	static bool acpi_hest_address_offset(enum AcpiGhesNotifyType notify,
        	                             uint64_t *error_block_offset,
                	                     uint64_t *ack_offset,
                        	             uint64_t *cper_offset,
                                	     enum AcpiHestSourceId *source_id)
	{
	    enum AcpiHestSourceId source;
	    uint64_t offset;

	    switch (notify) {
	    case ACPI_GHES_NOTIFY_SEA:      /* Only on ARMv8 */
	        source = ACPI_HEST_SRC_ID_SEA;
	        break;
	    case ACPI_GHES_NOTIFY_GPIO:
	        source = ACPI_HEST_SRC_ID_GED;
	        break;
	    default:
	        return true;
	    }

	    if (source_id) {
	        *source_id = source;
	    }

	    /*
	     * Please see docs/specs/acpi_hest_ghes.rst for the memory layout.
	     * In summary, memory starts with error addresses, then acks and
	     * finally CPER blocks.
	     */

	    offset = source * sizeof(uint64_t);

	    if (error_block_offset) {
	        *error_block_offset = offset;
	    }
	    if (ack_offset) {
	        *ack_offset = offset + ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t);
	    }
	    if (cper_offset) {
	        *cper_offset = 2 * ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t) +
	                       source * ACPI_GHES_MAX_RAW_DATA_LENGTH;
	    }

	    return false;
	}

I also removed the anonymous enum with SEA/GPIO source IDs, using
only the ACPI notify type as arguments at the function calls.

As there's now a single point where the offsets from
docs/specs/acpi_hest_ghes.rst are enforced, this should be error
prone.

The code could later be changed to use approach (2), on a separate
cleanup.

Thanks,
Mauro

  parent reply	other threads:[~2024-08-08 12:11 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-02 21:43 [PATCH v5 0/7] Add ACPI CPER firmware first error injection on ARM emulation Mauro Carvalho Chehab
2024-08-02 21:43 ` [PATCH v5 1/7] arm/virt: place power button pin number on a define Mauro Carvalho Chehab
2024-08-06  8:57   ` Igor Mammedov
2024-08-02 21:43 ` [PATCH v5 2/7] acpi/generic_event_device: add an APEI error device Mauro Carvalho Chehab
2024-08-05 16:39   ` Jonathan Cameron via
2024-08-05 16:39     ` Jonathan Cameron
2024-08-06  5:50     ` Mauro Carvalho Chehab
2024-08-06  8:54   ` Igor Mammedov
2024-08-02 21:43 ` [PATCH v5 3/7] arm/virt: Wire up GPIO error source for ACPI / GHES Mauro Carvalho Chehab
2024-08-05 16:54   ` Jonathan Cameron
2024-08-05 16:54     ` Jonathan Cameron via
2024-08-06  5:56     ` Mauro Carvalho Chehab
2024-08-06  9:15   ` Igor Mammedov
2024-08-02 21:43 ` [PATCH v5 4/7] acpi/ghes: Support GPIO error source Mauro Carvalho Chehab
2024-08-05 16:56   ` Jonathan Cameron via
2024-08-05 16:56     ` Jonathan Cameron
2024-08-05 16:56     ` Jonathan Cameron via
2024-08-06  6:09     ` Mauro Carvalho Chehab
2024-08-06  9:18       ` Igor Mammedov
2024-08-06  9:32   ` Igor Mammedov
2024-08-07  7:15     ` Mauro Carvalho Chehab
2024-08-02 21:44 ` [PATCH v5 5/7] qapi/ghes-cper: add an interface to do generic CPER error injection Mauro Carvalho Chehab
2024-08-05 17:00   ` Jonathan Cameron
2024-08-05 17:00     ` Jonathan Cameron via
2024-08-06  9:15   ` Shiju Jose
2024-08-06  9:15     ` Shiju Jose via
2024-08-06 12:51   ` Igor Mammedov
2024-08-06 12:58     ` Mauro Carvalho Chehab
2024-08-08  8:50   ` Markus Armbruster
2024-08-08 14:11     ` Mauro Carvalho Chehab
2024-08-08 14:22       ` Igor Mammedov
2024-08-08 14:45         ` Markus Armbruster
2024-08-09  8:42           ` Mauro Carvalho Chehab
2024-08-02 21:44 ` [PATCH v5 6/7] acpi/ghes: add support for generic error injection via QAPI Mauro Carvalho Chehab
2024-08-05 17:03   ` Jonathan Cameron
2024-08-05 17:03     ` Jonathan Cameron via
2024-08-06 11:13   ` Shiju Jose
2024-08-06 11:13     ` Shiju Jose via
2024-08-06 14:31   ` Igor Mammedov
2024-08-07  7:47     ` Mauro Carvalho Chehab
2024-08-07  9:34       ` Jonathan Cameron
2024-08-07  9:34         ` Jonathan Cameron via
2024-08-07 13:23         ` Mauro Carvalho Chehab
2024-08-07 13:43           ` Igor Mammedov
2024-08-07 13:28         ` Igor Mammedov
2024-08-07 14:25     ` Jonathan Cameron
2024-08-07 14:25       ` Jonathan Cameron via
2024-08-08  8:11       ` Igor Mammedov
2024-08-08 18:19         ` Mauro Carvalho Chehab
2024-08-12  9:39           ` Igor Mammedov
2024-08-13 18:59             ` Mauro Carvalho Chehab
2024-08-08 12:11     ` Mauro Carvalho Chehab [this message]
2024-08-08 12:45       ` Igor Mammedov
2024-08-02 21:44 ` [PATCH v5 7/7] scripts/ghes_inject: add a script to generate GHES error inject Mauro Carvalho Chehab
2024-08-06 14:56   ` Igor Mammedov
2024-08-08 20:58   ` John Snow
2024-08-08 21:51     ` Mauro Carvalho Chehab
2024-08-08 21:21   ` John Snow
2024-08-08 22:41     ` Mauro Carvalho Chehab
2024-08-08 23:33       ` John Snow
2024-08-09  8:24         ` Mauro Carvalho Chehab
2024-08-09 19:26           ` John Snow
2024-08-09  6:26       ` Mauro Carvalho Chehab
2024-08-09  7:37         ` Mauro Carvalho Chehab

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240808141114.3b021f80@foz.lan \
    --to=mchehab+huawei@kernel.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=anisinha@redhat.com \
    --cc=gengdongjiu1@gmail.com \
    --cc=imammedo@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=shiju.jose@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.