From: Gavin Shan <gshan@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
jonathan.cameron@huawei.com, mchehab+huawei@kernel.org,
gengdongjiu1@gmail.com, mst@redhat.com, anisinha@redhat.com,
peter.maydell@linaro.org, pbonzini@redhat.com,
shan.gavin@gmail.com
Subject: Re: [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs
Date: Thu, 13 Nov 2025 03:36:43 +1000
Message-ID: <5eb0786e-39b6-4d7d-856e-e7e77cf033d4@redhat.com>
In-Reply-To: <20251112141203.7d663088@fedora>
Hi Igor,
On 11/12/25 11:12 PM, Igor Mammedov wrote:
> On Tue, 11 Nov 2025 14:40:42 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>> On 11/11/25 12:38 AM, Igor Mammedov wrote:
>>> On Wed, 5 Nov 2025 21:44:49 +1000
>>> Gavin Shan <gshan@redhat.com> wrote:
>>>
>>>> When the host and guest have 64KiB and 4KiB page sizes respectively,
>>>> one problematic host page affects 16 guest pages. We need to send 16
>>>> consecutive errors in this specific case.
>>>
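For illustration, here is a minimal sketch of expanding one poisoned 64KiB host page into the 16 guest-page addresses it covers. split_host_page() is a hypothetical helper, not QEMU API. It assumes the host page maps a contiguous guest-physical range aligned to the host page size, and it reports guest-page-aligned addresses rather than the exact faulting address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not QEMU API. Page sizes must be powers of two. */
static uint32_t split_host_page(uint64_t paddr, uint64_t host_page_size,
                                uint64_t guest_page_size,
                                uint64_t *addresses, uint32_t max)
{
    uint64_t base, count;
    uint32_t i;

    assert(host_page_size && !(host_page_size & (host_page_size - 1)));
    assert(guest_page_size && !(guest_page_size & (guest_page_size - 1)));

    if (guest_page_size >= host_page_size) {
        /* One guest page already covers the whole poisoned host page. */
        if (max < 1) {
            return 0;
        }
        addresses[0] = paddr & ~(guest_page_size - 1);
        return 1;
    }

    /* Start of the host page, then one entry per guest page inside it. */
    base = paddr & ~(host_page_size - 1);
    count = host_page_size / guest_page_size;
    if (count > max) {
        return 0;               /* caller's array is too small */
    }
    for (i = 0; i < count; i++) {
        addresses[i] = base + (uint64_t)i * guest_page_size;
    }
    return (uint32_t)count;
}
```

For example, a fault at 0x40012345 with 64KiB host pages and 4KiB guest pages yields 0x40010000, 0x40011000, ..., 0x4001f000. That array and its count could then be passed as the addresses/num_of_addresses pair of the extended acpi_ghes_memory_errors().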
>>> I still don't like it, since it won't fix anything in case of more than
>>> one broken host page (in the v2 discussion, things quickly went down the
>>> hugepages route and the futility of recovering from them).
>>>
>>> If having a per-vCPU source is not desirable,
>>> can we stall all other vCPUs that touch poisoned pages until
>>> the error is acked by the guest, and then let another vCPU queue its own error?
>>>
>>
>> We're trying to prevent the guest from suddenly disappearing due to a QEMU
>> crash, rather than to recover from the memory errors. Keeping the guest
>> accessible still gives system administrators a chance to collect important
>> information from it.
>>
>> The idea of stalling the vCPU that accesses poisoned pages and retrying
>> delivery of the error was proposed in v1, but was rejected.
>>
>> https://lists.nongnu.org/archive/html/qemu-arm/2025-02/msg01071.html
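A rough sketch of the stall-and-retry idea, assuming a hypothetical single-slot error source (ErrorSlot and its helpers are made up for illustration and are not QEMU code). A faulting vCPU thread blocks on a condition variable until the guest has acknowledged the outstanding error. The sketch does not address the deadlock concern: a stalled vCPU only makes progress once the guest acknowledges the previous error.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-slot error source: one unacked error at a time. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t acked;
    bool pending;          /* an error is queued and not yet acked */
    uint64_t paddr;        /* address of the outstanding error */
} ErrorSlot;

static void error_slot_init(ErrorSlot *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->acked, NULL);
    s->pending = false;
    s->paddr = 0;
}

/* Called from a faulting vCPU thread: stall until the slot is free. */
static void error_slot_queue(ErrorSlot *s, uint64_t paddr)
{
    pthread_mutex_lock(&s->lock);
    while (s->pending) {
        pthread_cond_wait(&s->acked, &s->lock);
    }
    s->pending = true;
    s->paddr = paddr;
    pthread_mutex_unlock(&s->lock);
}

/* Called when the guest acks the error: wake all stalled vCPUs. */
static void error_slot_ack(ErrorSlot *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->pending);    /* acking with nothing outstanding is a bug */
    s->pending = false;
    pthread_cond_broadcast(&s->acked);
    pthread_mutex_unlock(&s->lock);
}
```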
>
> That depends on what outcome we wish for.
> The described deadlock might even be preferable to a QEMU abort(), as it
> lets the guest admin collect a VM crash dump.
>
> But honestly, I'd go with the per-vCPU approach if it's possible,
> as that still gives the guest side a chance to recover.
>
Yes, a per-vCPU error source is a nice idea, as we agreed :-)
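To make the per-vCPU idea concrete, here is a minimal sketch assuming one error source per vCPU, each with its own "acknowledged" state. PerVcpuSources and MAX_VCPUS are hypothetical names. An error on vCPU 1 no longer collides with an unacknowledged error on vCPU 0; only a second error on the same vCPU has to wait.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_VCPUS 8   /* hypothetical limit, just for the sketch */

/* Hypothetical per-vCPU error sources: one outstanding error each. */
typedef struct {
    bool pending[MAX_VCPUS];  /* queued on this vCPU's source, not yet acked */
} PerVcpuSources;

/* Queue an error on the faulting vCPU's own source; false means "retry later". */
static bool per_vcpu_queue(PerVcpuSources *s, int cpu_index)
{
    assert(cpu_index >= 0 && cpu_index < MAX_VCPUS);
    if (s->pending[cpu_index]) {
        return false;   /* this vCPU's previous error is still unacknowledged */
    }
    s->pending[cpu_index] = true;
    return true;
}

/* The guest acknowledged the error reported through this vCPU's source. */
static void per_vcpu_ack(PerVcpuSources *s, int cpu_index)
{
    assert(cpu_index >= 0 && cpu_index < MAX_VCPUS);
    s->pending[cpu_index] = false;
}
```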
>> As the intention of this series is just to improve the memory error
>> reporting, to avoid QEMU crash if possible, it sounds reasonable to send
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Regarding that:
> this series doesn't achieve it, as QEMU would still crash if another
> vCPU faults on a different faulty host page (i.e. not the one we've generated CPERs for).
>
Correct. QEMU can't survive a memory error storm, even with this series
applied.
> You also mentioned in a previous review that, with the per-vCPU error source
> variant, QEMU would abort elsewhere (is it fixable?).
>
I need to take a closer look. If the abort happens because two consecutive
errors are duplicates, i.e. the problematic physical addresses fall into the
same guest page, the QEMU crash can be avoided by adding more checks.
Otherwise, I need to figure out a fix.
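The extra check could look roughly like the following sketch, assuming errors are compared at guest-page granularity. same_guest_page(), LastError and is_duplicate_error() are hypothetical names, not existing QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: do two error addresses hit the same guest page? */
static bool same_guest_page(uint64_t a, uint64_t b, uint64_t guest_page_size)
{
    uint64_t mask;

    assert(guest_page_size && !(guest_page_size & (guest_page_size - 1)));
    mask = ~(guest_page_size - 1);
    return (a & mask) == (b & mask);
}

/* Hypothetical tracker of the last reported error address. */
typedef struct {
    bool valid;
    uint64_t last_paddr;
} LastError;

/* Returns true if 'paddr' duplicates the previous error and can be skipped. */
static bool is_duplicate_error(LastError *le, uint64_t paddr,
                               uint64_t guest_page_size)
{
    if (le->valid && same_guest_page(le->last_paddr, paddr, guest_page_size)) {
        return true;
    }
    le->valid = true;
    le->last_paddr = paddr;
    return false;
}
```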
>> 16x consecutive CPERs in one shot for this specific case (4KB guest on
>> 64KB host).
>
> I don't object to generating 16x CPERs per fault, as that obviously
> should reduce the number of guest exits.
>
Thanks, Igor. I just posted v4; please take your time to review it.
>
> Given it's rather late in the release cycle,
> we can probably handle the single-page case first, as in this series,
> with a follow-up series to switch to the per-vCPU variant once the next merge
> window opens (assuming I can coax a promise from you to follow up on that).
>
Yes, that's actually the plan I had: improve this series so that it can
be merged soon, and then continue with the per-vCPU error source. The
ultimate goal is to combine the 16x consecutive errors with the per-vCPU
error source.
>> As for the hugetlb cases, it's a different story. If the hugetlb
>> folio (page) size is small enough (like 64KB), we can leverage the current
>> design to send consecutive CPERs. I don't think there is much we
>> can do if the hugetlb folio size is large (from 2MB to 16GB).
>>
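A back-of-the-envelope check of the hugetlb limit within a single status block. It assumes 20 bytes for the generic error status block header, 72 bytes per generic error data entry, 80 bytes per memory error section, and the 4KiB per-block limit proposed in patch 2/8 of this series. The macro names are local to the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GESB_HEADER_SIZE     20u    /* Generic Error Status Block header */
#define GED_ENTRY_SIZE       72u    /* Generic Error Data Entry (rev 0x300) */
#define MEM_CPER_SIZE        80u    /* UEFI Memory Error Section */
#define MAX_RAW_DATA_LENGTH  4096u  /* assumed per-block budget (4KiB) */

/* Bytes needed to report 'n' memory errors in one status block. */
static uint64_t block_bytes(uint64_t n)
{
    return GESB_HEADER_SIZE + n * (GED_ENTRY_SIZE + MEM_CPER_SIZE);
}

/* Can a whole folio be reported as per-guest-page CPERs in one block? */
static bool folio_fits(uint64_t folio_size, uint64_t guest_page_size)
{
    assert(guest_page_size > 0 && folio_size >= guest_page_size);
    return block_bytes(folio_size / guest_page_size) <= MAX_RAW_DATA_LENGTH;
}

/* Largest number of memory CPERs one block can hold. */
static uint64_t max_entries(void)
{
    return (MAX_RAW_DATA_LENGTH - GESB_HEADER_SIZE) /
           (GED_ENTRY_SIZE + MEM_CPER_SIZE);
}
```

With these numbers, 16 entries (a 64KB folio of 4KB guest pages) need 2452 bytes. One block holds at most 26 entries, far fewer than the 512 a 2MB folio would need.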
>>>
>>>> Extend acpi_ghes_memory_errors() to support multiple CPERs after the
>>>> hunk of code that generates the GHES error status is pulled out of
>>>> ghes_gen_err_data_uncorrectable_recoverable(). The status field of the
>>>> generic error status block is also updated accordingly if multiple
>>>> error data entries are contained in the generic error status block.
>>>
>>> I don't much mind translating a 64K page error into several 4K CPER
>>> records, so this part is fine. But it's hardly a solution to the generic
>>> problem.
>>>
>>
>> Note that I don't expect a memory error storm at the hardware level.
>> If one does happen, it's a good sign that the memory DIMM is totally
>> broken and needs a replacement :-)
>>
>>>>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>> ---
>>>> hw/acpi/ghes-stub.c | 2 +-
>>>> hw/acpi/ghes.c | 60 +++++++++++++++++++++++-------------------
>>>> include/hw/acpi/ghes.h | 2 +-
>>>> target/arm/kvm.c | 4 ++-
>>>> 4 files changed, 38 insertions(+), 30 deletions(-)
>>>>
>>> ...
>>>> @@ -577,10 +568,25 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>>> assert((data_length + ACPI_GHES_GESB_SIZE) <=
>>>> ACPI_GHES_MAX_RAW_DATA_LENGTH);
>>>>
>>>> - ghes_gen_err_data_uncorrectable_recoverable(block, guid, data_length);
>>>> + /* Build the new generic error status block header */
>>>> + block_status = (1 << ACPI_GEBS_UNCORRECTABLE) |
>>>> + (num_of_addresses << ACPI_GEBS_ERROR_DATA_ENTRIES);
>>> ^^^^^^^^^^^^^^
>>> Maybe assert, in case it won't fit into the bit field.
>>>
>>
>> Yep, the same thing was suggested by Philippe.
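Something along these lines, as a sketch. It builds the Block Status word following the ACPI Generic Error Status Block layout (bit 0 uncorrectable valid, bit 2 multiple uncorrectable, bits 13:4 error data entry count) and asserts that the count fits the 10-bit field. The macro names are local to this sketch rather than QEMU's, and every flag is defined as a mask so that bit positions and masks are not mixed.

```c
#include <assert.h>
#include <stdint.h>

/* Block Status layout of the ACPI Generic Error Status Block (sketch names). */
#define GEBS_UNCORRECTABLE_VALID     (1u << 0)
#define GEBS_MULTIPLE_UNCORRECTABLE  (1u << 2)
#define GEBS_ENTRY_COUNT_SHIFT       4
#define GEBS_ENTRY_COUNT_BITS        10          /* bits 13:4 */
#define GEBS_ENTRY_COUNT_MAX         ((1u << GEBS_ENTRY_COUNT_BITS) - 1)

static uint32_t gebs_block_status(uint32_t num_entries)
{
    uint32_t status;

    assert(num_entries >= 1);
    /* The entry count must fit into the 10-bit Error Data Entry Count field. */
    assert(num_entries <= GEBS_ENTRY_COUNT_MAX);

    status = GEBS_UNCORRECTABLE_VALID |
             (num_entries << GEBS_ENTRY_COUNT_SHIFT);
    if (num_entries > 1) {
        status |= GEBS_MULTIPLE_UNCORRECTABLE;
    }
    return status;
}
```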
>>
>>>> + if (num_of_addresses > 1) {
>>>> + block_status |= ACPI_GEBS_MULTIPLE_UNCORRECTABLE;
>>>> + }
>>>> +
>>>> + acpi_ghes_generic_error_status(block, block_status, 0, 0,
>>>> + data_length, ACPI_CPER_SEV_RECOVERABLE);
>>>>
>>>> - /* Build the memory section CPER for above new generic error data entry */
>>>> - acpi_ghes_build_append_mem_cper(block, physical_address);
>>>> + for (i = 0; i < num_of_addresses; i++) {
>>>> + /* Build generic error data entries */
>>>> + acpi_ghes_generic_error_data(block, guid,
>>>> + ACPI_CPER_SEV_RECOVERABLE, 0, 0,
>>>> + ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
>>>> +
>>>> + /* Memory section CPER on top of the generic error data entry */
>>>> + acpi_ghes_build_append_mem_cper(block, addresses[i]);
>>>> + }
>>>>
>>>> /* Report the error */
>>>> ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
>>>> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
>>>> index df2ecbf6e4..f73908985d 100644
>>>> --- a/include/hw/acpi/ghes.h
>>>> +++ b/include/hw/acpi/ghes.h
>>>> @@ -99,7 +99,7 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
>>>> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>>>> GArray *hardware_errors);
>>>> int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>>> - uint64_t error_physical_addr);
>>>> + uint64_t *addresses, uint32_t num_of_addresses);
>>>> void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>>> uint16_t source_id, Error **errp);
>>>>
>>>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>>>> index 0d57081e69..459ca4a9b0 100644
>>>> --- a/target/arm/kvm.c
>>>> +++ b/target/arm/kvm.c
>>>> @@ -2434,6 +2434,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>>> ram_addr_t ram_addr;
>>>> hwaddr paddr;
>>>> AcpiGhesState *ags;
>>>> + uint64_t addresses[16];
>>>>
>>>> assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>>>>
>>>> @@ -2454,10 +2455,11 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>>> * later from the main thread, so doing the injection of
>>>> * the error would be more complicated.
>>>> */
>>>> + addresses[0] = paddr;
>>>> if (code == BUS_MCEERR_AR) {
>>>> kvm_cpu_synchronize_state(c);
>>>> if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>>>> - paddr)) {
>>>> + addresses, 1)) {
>>>> kvm_inject_arm_sea(c);
>>>> } else {
>>>> error_report("failed to record the error");
>>>
Thanks,
Gavin