From: Markus Armbruster <armbru@redhat.com>
To: Gavin Shan <gshan@redhat.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
jonathan.cameron@huawei.com, mchehab+huawei@kernel.org,
gengdongjiu1@gmail.com, mst@redhat.com, imammedo@redhat.com,
anisinha@redhat.com, peter.maydell@linaro.org,
pbonzini@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors()
Date: Tue, 11 Nov 2025 08:31:49 +0100 [thread overview]
Message-ID: <87jyzxf2je.fsf@pond.sub.org> (raw)
In-Reply-To: <b673bf36-cf1b-4103-bce8-0465a1385403@redhat.com> (Gavin Shan's message of "Tue, 11 Nov 2025 16:02:24 +1000")
Gavin Shan <gshan@redhat.com> writes:
> Hi Markus,
>
> On 11/11/25 3:25 PM, Markus Armbruster wrote:
>> Gavin Shan <gshan@redhat.com> writes:
>>
>>> Use error_abort in acpi_ghes_memory_errors() so that the caller needn't
>>> explicitly call abort() on errors. With this change, its return value
>>> isn't needed any more.
>>>
>>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>> ---
>>> hw/acpi/ghes-stub.c | 6 +++---
>>> hw/acpi/ghes.c | 15 ++++-----------
>>> include/hw/acpi/ghes.h | 5 +++--
>>> target/arm/kvm.c | 10 +++-------
>>> 4 files changed, 13 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
>>> index 4faf573aeb..4ef914ffc5 100644
>>> --- a/hw/acpi/ghes-stub.c
>>> +++ b/hw/acpi/ghes-stub.c
>>> @@ -11,10 +11,10 @@
>>> #include "qemu/osdep.h"
>>> #include "hw/acpi/ghes.h"
>>>
>>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> - uint64_t *addresses, uint32_t num_of_addresses)
>>> +void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> + uint64_t *addresses, uint32_t num_of_addresses,
>>> + Error **errp)
>>> {
>>> - return -1;
>>> }
>>
>> Before the patch, this function always fails: it returns -1.
>>
>> Afterwards, it always succeeds: it doesn't set @errp.
>>
>> Which one is correct?
>>
>
> Both are correct. This variant is only used on !CONFIG_ACPI_APEI. In this case,
> there is no chance to call this variant in the only caller kvm_arch_on_sigbus_vcpu().
> acpi_ghes_get_state() returns NULL on !CONFIG_ACPI_APEI there.
If this is not supposed to be called, have it abort() to make it
obvious.
>>>
>>> AcpiGhesState *acpi_ghes_get_state(void)
>>> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
>>> index 055e5d719a..aa469c03f2 100644
>>> --- a/hw/acpi/ghes.c
>>> +++ b/hw/acpi/ghes.c
>>> @@ -543,8 +543,9 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>> notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
>>> }
>>>
>>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> - uint64_t *addresses, uint32_t num_of_addresses)
>>> +void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> + uint64_t *addresses, uint32_t num_of_addresses,
>>> + Error **errp)
>>
>> qapi/error.h:
>>
>> * - Whenever practical, also return a value that indicates success /
>> * failure. This can make the error checking more concise, and can
>> * avoid useless error object creation and destruction. Note that
>> * we still have many functions returning void. We recommend
>> * • bool-valued functions return true on success / false on failure,
>> * • pointer-valued functions return non-null / null pointer, and
>> * • integer-valued functions return non-negative / negative.
>>
>
> Question: If it's deterministic that caller passes @error_abort or @error_fatal
> to acpi_ghes_memory_errors(), QEMU crashes with a core dump or exit before its
> caller to check the return value. In this case, it's still preferred for
> acpi_ghes_memory_errors() returns a value to indicate the success or failure?
Yes, to separate concerns: acpi_ghes_memory_errors() remains oblivious
of how it is called.
>>> {
>>> /* Memory Error Section Type */
>>> const uint8_t guid[] =
>>> @@ -555,7 +556,6 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> * Table 17-13 Generic Error Data Entry
>>> */
>>> QemuUUID fru_id = {};
>>> - Error *errp = NULL;
>>> int data_length;
>>> GArray *block;
>>> uint32_t block_status, i;
>>> @@ -592,16 +592,9 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> }
>>>
>>> /* Report the error */
>>> - ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
>>> + ghes_record_cper_errors(ags, block->data, block->len, source_id, errp);
>>>
>>> g_array_free(block, true);
>>> -
>>> - if (errp) {
>>> - error_report_err(errp);
>>> - return -1;
>>> - }
>>> -
>>> - return 0;
>>> }
>>
>> The error reporting moves into the caller.
>>
>
> Similar question as above. If it's deterministic for the caller passes @error_abort
> or @error_fatal to acpi_ghes_memory_errors() and then to ghes_record_cper_errors(),
> QEMU crashes with a core dump or exit before error_report_err(errp) can be executed.
> In this case, it's still preferred to have error_report_err(&error_abort) or
> error_report_err(&error_fatal) in its caller?
Yes.
>>>
>>> AcpiGhesState *acpi_ghes_get_state(void)
>>> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
>>> index f73908985d..35c7bbbb01 100644
>>> --- a/include/hw/acpi/ghes.h
>>> +++ b/include/hw/acpi/ghes.h
>>> @@ -98,8 +98,9 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
>>> const char *oem_id, const char *oem_table_id);
>>> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>>> GArray *hardware_errors);
>>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> - uint64_t *addresses, uint32_t num_of_addresses);
>>> +void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> + uint64_t *addresses, uint32_t num_of_addresses,
>>> + Error **errp);
>>> void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>> uint16_t source_id, Error **errp);
>>>
>>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>>> index 459ca4a9b0..a889315606 100644
>>> --- a/target/arm/kvm.c
>>> +++ b/target/arm/kvm.c
>>> @@ -2458,13 +2458,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>> addresses[0] = paddr;
>>> if (code == BUS_MCEERR_AR) {
>>> kvm_cpu_synchronize_state(c);
>>> - if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>>> - addresses, 1)) {
>>> - kvm_inject_arm_sea(c);
>>> - } else {
>>> - error_report("failed to record the error");
>>> - abort();
>>> - }
>>> + acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>>> + addresses, 1, &error_abort);
>>> + kvm_inject_arm_sea(c);
>>
>> Before the patch, we get two error reports, like this:
>>
>> qemu-system-FOO: OSPM does not acknowledge previous error, so can not record CPER for current error anymore
>> qemu-system-FOO: failed to record the error
>> Aborted (core dumped)
>>
>> Such error cascades should be avoided.
>>
>> Afterwards, we get one:
>>
>> Unexpected error at SOURCE-FILE:LINE-NUMBER:
>> qemu-system-FOO: OSPM does not acknowledge previous error, so can not record CPER for current error anymore
>> Aborted (core dumped)
>>
>> Are all errors reported by acpi_ghes_memory_errors() programming errors,
>> i.e. when an error is reported, there's a bug for us to fix?
>>
>> If not, abort() is wrong before the patch, and &error_abort is wrong
>> afterwards.
>>
>> You can use &error_fatal for fatal errors that are not programming
>> errors.
>
> No, there is no programming errors and the core dump is actually no sense.
Confirms my suspicion :)
> It makes more sense for the caller to pass @error_fatal down to acpi_ghes_memory_errors().
Makes sense.
>>> }
>>> return;
>>> }
>>
>
> Thanks,
> Gavin
next prev parent reply other threads:[~2025-11-11 7:32 UTC|newest]
Thread overview: 67+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
2025-11-05 11:44 ` [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
2025-11-05 14:16 ` Jonathan Cameron via
2025-11-05 14:16 ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
2025-11-05 14:16 ` Jonathan Cameron via
2025-11-05 14:16 ` Jonathan Cameron via
2025-11-10 14:11 ` Igor Mammedov
2025-11-11 4:05 ` Gavin Shan
2025-11-12 12:32 ` Igor Mammedov
2025-11-12 17:41 ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
2025-11-05 14:17 ` Jonathan Cameron via
2025-11-05 14:17 ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Gavin Shan
2025-11-05 14:14 ` Jonathan Cameron via
2025-11-05 14:14 ` Jonathan Cameron via
2025-11-06 3:15 ` Gavin Shan
2025-11-10 14:49 ` Igor Mammedov
2025-11-11 4:08 ` Gavin Shan
2025-11-11 10:07 ` Jonathan Cameron via
2025-11-11 10:07 ` Jonathan Cameron via
2025-11-11 10:55 ` Gavin Shan
2025-11-11 11:55 ` Jonathan Cameron via
2025-11-11 11:55 ` Jonathan Cameron via
2025-11-11 12:19 ` Gavin Shan
2025-11-11 13:12 ` Jonathan Cameron via
2025-11-11 13:12 ` Jonathan Cameron via
2025-11-10 14:38 ` Igor Mammedov
2025-11-11 4:40 ` Gavin Shan
2025-11-12 13:12 ` Igor Mammedov
2025-11-12 17:36 ` Gavin Shan
2025-11-10 14:43 ` Philippe Mathieu-Daudé
2025-11-10 23:38 ` Gavin Shan
2025-11-11 3:40 ` Gavin Shan
2025-11-10 14:48 ` Philippe Mathieu-Daudé
2025-11-11 3:44 ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-05 14:17 ` Jonathan Cameron via
2025-11-05 14:17 ` Jonathan Cameron via
2025-11-10 14:50 ` Philippe Mathieu-Daudé
2025-11-11 3:48 ` Gavin Shan
2025-11-10 14:51 ` Igor Mammedov
2025-11-05 11:44 ` [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors() Gavin Shan
2025-11-05 14:18 ` Jonathan Cameron via
2025-11-05 14:18 ` Jonathan Cameron via
2025-11-10 14:53 ` Igor Mammedov
2025-11-10 14:54 ` Philippe Mathieu-Daudé
2025-11-11 3:58 ` Gavin Shan
2025-11-12 12:49 ` Igor Mammedov
2025-11-12 17:38 ` Gavin Shan
2025-11-11 5:08 ` Markus Armbruster
2025-11-11 5:25 ` Markus Armbruster
2025-11-11 6:02 ` Gavin Shan
2025-11-11 7:31 ` Markus Armbruster [this message]
2025-11-05 11:44 ` [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors() Gavin Shan
2025-11-05 14:19 ` Jonathan Cameron via
2025-11-05 14:19 ` Jonathan Cameron via
2025-11-10 14:56 ` Igor Mammedov
2025-11-11 4:09 ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
2025-11-05 14:37 ` Jonathan Cameron via
2025-11-05 14:37 ` Jonathan Cameron via
2025-11-06 3:26 ` Gavin Shan
2025-11-11 10:12 ` Jonathan Cameron
2025-11-11 10:12 ` Jonathan Cameron via
2025-11-11 10:12 ` Jonathan Cameron via
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87jyzxf2je.fsf@pond.sub.org \
--to=armbru@redhat.com \
--cc=anisinha@redhat.com \
--cc=gengdongjiu1@gmail.com \
--cc=gshan@redhat.com \
--cc=imammedo@redhat.com \
--cc=jonathan.cameron@huawei.com \
--cc=mchehab+huawei@kernel.org \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=shan.gavin@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.