qemu-devel.nongnu.org archive mirror
From: Gavin Shan <gshan@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	jonathan.cameron@huawei.com, mchehab+huawei@kernel.org,
	gengdongjiu1@gmail.com, mst@redhat.com, anisinha@redhat.com,
	peter.maydell@linaro.org, pbonzini@redhat.com,
	shan.gavin@gmail.com
Subject: Re: [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs
Date: Thu, 13 Nov 2025 03:36:43 +1000
Message-ID: <5eb0786e-39b6-4d7d-856e-e7e77cf033d4@redhat.com>
In-Reply-To: <20251112141203.7d663088@fedora>

Hi Igor,

On 11/12/25 11:12 PM, Igor Mammedov wrote:
> On Tue, 11 Nov 2025 14:40:42 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>> On 11/11/25 12:38 AM, Igor Mammedov wrote:
>>> On Wed,  5 Nov 2025 21:44:49 +1000
>>> Gavin Shan <gshan@redhat.com> wrote:
>>>    
>>>> In the situation where the host and guest have 64KiB and 4KiB page sizes
>>>> respectively, one problematic host page affects 16 guest pages. We need
>>>> to send 16 consecutive errors in this specific case.
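For the 4KiB-guest-on-64KiB-host case, the guest pages to report can be derived with plain mask arithmetic. The sketch below uses a hypothetical helper, `expand_host_page()`, which is not a QEMU function. It assumes both page sizes are powers of two. It also assumes that the poisoned host page maps to a guest physical range aligned to the host page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the guest physical address of a poisoned
 * byte, fill @addrs with the base address of every guest page backed by
 * the same host page. Returns the number of addresses written.
 * Assumes power-of-two page sizes and host-page-aligned GPA mapping.
 */
static size_t expand_host_page(uint64_t paddr, uint64_t host_page_size,
                               uint64_t guest_page_size,
                               uint64_t *addrs, size_t max_addrs)
{
    assert(host_page_size && !(host_page_size & (host_page_size - 1)));
    assert(guest_page_size && host_page_size % guest_page_size == 0);

    /* Align down to the start of the poisoned host page. */
    uint64_t base = paddr & ~(host_page_size - 1);
    /* e.g. 64KiB / 4KiB = 16 guest pages per host page */
    size_t n = host_page_size / guest_page_size;

    if (n > max_addrs) {
        n = max_addrs;   /* never overrun the caller's array */
    }
    for (size_t i = 0; i < n; i++) {
        addrs[i] = base + i * guest_page_size;
    }
    return n;
}
```

In the quoted `kvm_arch_on_sigbus_vcpu()` hunk only `addresses[0]` is filled. A helper along these lines is one possible way to populate all 16 slots of `addresses[16]` before calling `acpi_ghes_memory_errors()`.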
>>>
>>> I still don't like it, since it won't fix anything in case of more than
>>> 1 broken host page. (In the v2 discussion, it quickly went down the
>>> hugepages route and the futility of recovering from them.)
>>>
>>> If having a per-vCPU source is not desirable,
>>> can we stall all other vCPUs that touch poisoned pages until the
>>> error is acked by the guest, and then let another vCPU queue its own error?
>>>    
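The stall-and-retry idea amounts to a single-slot handshake. A vCPU may queue a CPER only after the guest has acknowledged the previous one; otherwise it stalls and retries. The following is a minimal single-threaded model of that handshake. The names `ErrorSource`, `try_queue_error()` and `guest_ack()` are hypothetical, and the ack step is only loosely modelled on the GHESv2 read-ack register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one shared error source (not QEMU code). */
typedef struct {
    bool pending;     /* a CPER is queued and not yet acked by the guest */
    uint64_t paddr;   /* address carried by the pending CPER */
} ErrorSource;

/* Returns true if queued; false means the vCPU should stall and retry. */
static bool try_queue_error(ErrorSource *src, uint64_t paddr)
{
    if (src->pending) {
        return false;
    }
    src->pending = true;
    src->paddr = paddr;
    return true;
}

/* The guest acknowledged the pending error; the slot becomes free. */
static void guest_ack(ErrorSource *src)
{
    assert(src->pending);
    src->pending = false;
}
```

A real implementation would also need locking between vCPU threads and a way to wake stalled vCPUs once the ack arrives. The model leaves both out.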
>>
>> We're trying to prevent the guest from suddenly disappearing due to a QEMU
>> crash, rather than to recover from the memory errors. If the guest stays
>> accessible, system administrators still get a chance to collect important
>> information from it.
>>
>> The idea of stalling the vCPU that is accessing any poisoned page and
>> retrying the error delivery was proposed in v1, but was rejected.
>>
>> https://lists.nongnu.org/archive/html/qemu-arm/2025-02/msg01071.html
> 
> That depends on what outcome we wish for.
> The described deadlock might even be preferable to a QEMU abort(), as it lets
> the guest admin collect a VM crash dump.
> 
> But honestly, I'd go with the per-vCPU approach if it's possible,
> as that still gives the guest side a chance to recover.
> 

Yes, a per-vCPU error source is a nice idea, as we agreed :-)

>> As the intention of this series is just to improve the memory error
>> reporting, to avoid a QEMU crash if possible, it sounds reasonable to send
>             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This series doesn't achieve that, as it would still crash QEMU if another
> vCPU faults on a different faulty host page (i.e. not the one we've generated CPERs for)
> 

Correct. QEMU can't cope with a memory error storm, even with this series
applied.

> You also mentioned in the previous review that, with the per-vCPU error
> source variant, QEMU would abort elsewhere (is it fixable?).
> 

I need to take a closer look. If two consecutive errors are duplicates,
i.e. the problematic physical addresses fall into the same guest page,
the QEMU crash can be avoided by adding more checks. Otherwise, I need
to figure out a fix.
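The duplicate check described above reduces to comparing page-aligned addresses. Here is a sketch, assuming power-of-two guest page sizes. `same_guest_page()` is a hypothetical name, and where such a check would sit in the SIGBUS path is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: do two error addresses fall into the same guest
 * page? If so, the second error duplicates the first and could be
 * skipped instead of being queued on an error source that is still busy.
 */
static bool same_guest_page(uint64_t a, uint64_t b, uint64_t guest_page_size)
{
    /* Page sizes are assumed to be powers of two. */
    assert(guest_page_size && !(guest_page_size & (guest_page_size - 1)));

    uint64_t mask = ~(guest_page_size - 1);
    return (a & mask) == (b & mask);
}
```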

>> 16x consecutive CPERs in one shot for this specific case (4KB guest on
>> 64KB host).
> 
> I don't object to generating 16x CPERs per fault as that obviously
> should reduce # of guest exits.
> 

Thanks, Igor. I've just posted v4; please take your time to review it.
  
> 
> Given it's rather late in the release cycle,
> we can probably handle the 1-page case first, as in this series,
> with a follow-up series to switch to the per-vCPU variant once the new merge
> window opens (assuming I can coax a promise from you to follow up on that).
> 

Yes, that's actually the plan I had: improve this series so that it can
be merged soon, then continue the work with a per-vCPU error source.
The ultimate goal is to combine 16x consecutive errors with per-vCPU
error sources.

>> As to the hugetlb cases, it's a different story. If the hugetlb
>> folio (page) size is small enough (like 64KB), we can leverage the current
>> design to send consecutive CPERs. I don't think there is much we
>> can do if the hugetlb folio size is large (from 2MB to 16GB).
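The small-versus-large folio distinction can be made concrete by counting how many memory CPERs fit into one error status block. The sizes below are assumptions based on the ACPI and UEFI record layouts as I understand them: a 20-byte status block header, a 72-byte generic error data entry (revision 0x300) and an 80-byte memory error section. They should be checked against the definitions in `hw/acpi/ghes.c`. The 4KiB limit comes from patch 2/8 of this series. All macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed record sizes; verify against hw/acpi/ghes.c. */
#define GESB_HDR_SIZE    20u    /* generic error status block header */
#define DATA_ENTRY_SIZE  72u    /* generic error data entry, rev 0x300 */
#define MEM_CPER_SIZE    80u    /* memory error section */
#define MAX_RAW_DATA     4096u  /* 4KiB raw data limit (patch 2/8) */

/* How many memory CPERs fit into one error status block. */
static uint32_t max_mem_cpers(void)
{
    return (MAX_RAW_DATA - GESB_HDR_SIZE) / (DATA_ENTRY_SIZE + MEM_CPER_SIZE);
}

/* Number of guest pages (hence CPERs) covered by one poisoned folio. */
static uint64_t cpers_needed(uint64_t folio_size, uint64_t guest_page_size)
{
    assert(guest_page_size && folio_size % guest_page_size == 0);
    return folio_size / guest_page_size;
}
```

Under these assumptions, 26 records fit. A 64KB folio on a 4KB guest needs 16, so it fits, while a 2MB folio would need 512, far beyond a single block.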
>>
>>>    
>>>> Extend acpi_ghes_memory_errors() to support multiple CPERs after the
>>>> hunk of code that generates the GHES error status is pulled out of
>>>> ghes_gen_err_data_uncorrectable_recoverable(). The status field of the
>>>> generic error status block is also updated accordingly when multiple
>>>> error data entries are contained in the block.
>>>
>>> I don't much mind translating a 64K page error into several 4K CPER
>>> records, so this part is fine. But it's hardly a solution to the general
>>> problem.
>>>    
>>
>> Note that I don't expect a memory error storm at the hardware level.
>> If one happens, it's a good sign that the memory DIMM is totally
>> broken and needs a replacement :-)
>>
>>>>
>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>> ---
>>>>    hw/acpi/ghes-stub.c    |  2 +-
>>>>    hw/acpi/ghes.c         | 60 +++++++++++++++++++++++-------------------
>>>>    include/hw/acpi/ghes.h |  2 +-
>>>>    target/arm/kvm.c       |  4 ++-
>>>>    4 files changed, 38 insertions(+), 30 deletions(-)
>>>>   
>>> ...
>>>> @@ -577,10 +568,25 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>>>        assert((data_length + ACPI_GHES_GESB_SIZE) <=
>>>>                ACPI_GHES_MAX_RAW_DATA_LENGTH);
>>>>    
>>>> -    ghes_gen_err_data_uncorrectable_recoverable(block, guid, data_length);
>>>> +    /* Build the new generic error status block header */
>>>> +    block_status = (1 << ACPI_GEBS_UNCORRECTABLE) |
>>>> +                   (num_of_addresses << ACPI_GEBS_ERROR_DATA_ENTRIES);
>>>                          ^^^^^^^^^^^^^^
>>> maybe assert, in case it won't fit into the bit field
>>>    
>>
>> Yep, the same thing was suggested by Philippe.
>>
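Igor's assert suggestion relates to the layout of the Block Status field in the ACPI Generic Error Status Block. Bit 0 is "uncorrectable error valid", bit 2 is "multiple uncorrectable errors", and bits [13:4] hold the error data entry count, so at most 1023 entries can be encoded. The quoted hunk uses `ACPI_GEBS_UNCORRECTABLE` as a shift but ORs in `ACPI_GEBS_MULTIPLE_UNCORRECTABLE` directly. Whether the latter is defined as a mask or as a bit index is not visible here. The sketch below sidesteps that question by using explicit masks throughout. Its macro and function names are hypothetical, not the QEMU enum.

```c
#include <assert.h>
#include <stdint.h>

/* Block Status layout per the ACPI spec; names are hypothetical. */
#define GEBS_UNCORRECTABLE_VALID     (1u << 0)
#define GEBS_MULTIPLE_UNCORRECTABLE  (1u << 2)
#define GEBS_ENTRY_COUNT_SHIFT       4
#define GEBS_ENTRY_COUNT_BITS        10     /* bits [13:4] */

static uint32_t gebs_block_status(uint32_t num_entries)
{
    /* Make sure the entry count fits into its 10-bit field. */
    assert(num_entries > 0 &&
           num_entries < (1u << GEBS_ENTRY_COUNT_BITS));

    uint32_t status = GEBS_UNCORRECTABLE_VALID |
                      (num_entries << GEBS_ENTRY_COUNT_SHIFT);
    if (num_entries > 1) {
        status |= GEBS_MULTIPLE_UNCORRECTABLE;
    }
    return status;
}
```

For 16 entries this yields 0x105: the valid bit, the "multiple" bit and a count of 16 in bits [13:4].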
>>>> +    if (num_of_addresses > 1) {
>>>> +        block_status |= ACPI_GEBS_MULTIPLE_UNCORRECTABLE;
>>>> +    }
>>>> +
>>>> +    acpi_ghes_generic_error_status(block, block_status, 0, 0,
>>>> +                                   data_length, ACPI_CPER_SEV_RECOVERABLE);
>>>>    
>>>> -    /* Build the memory section CPER for above new generic error data entry */
>>>> -    acpi_ghes_build_append_mem_cper(block, physical_address);
>>>> +    for (i = 0; i < num_of_addresses; i++) {
>>>> +        /* Build generic error data entries */
>>>> +        acpi_ghes_generic_error_data(block, guid,
>>>> +                                     ACPI_CPER_SEV_RECOVERABLE, 0, 0,
>>>> +                                     ACPI_GHES_MEM_CPER_LENGTH, fru_id, 0);
>>>> +
>>>> +        /* Memory section CPER on top of the generic error data entry */
>>>> +        acpi_ghes_build_append_mem_cper(block, addresses[i]);
>>>> +    }
>>>>    
>>>>        /* Report the error */
>>>>        ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
>>>> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
>>>> index df2ecbf6e4..f73908985d 100644
>>>> --- a/include/hw/acpi/ghes.h
>>>> +++ b/include/hw/acpi/ghes.h
>>>> @@ -99,7 +99,7 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
>>>>    void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>>>>                              GArray *hardware_errors);
>>>>    int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>>> -                            uint64_t error_physical_addr);
>>>> +                            uint64_t *addresses, uint32_t num_of_addresses);
>>>>    void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>>>>                                 uint16_t source_id, Error **errp);
>>>>    
>>>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>>>> index 0d57081e69..459ca4a9b0 100644
>>>> --- a/target/arm/kvm.c
>>>> +++ b/target/arm/kvm.c
>>>> @@ -2434,6 +2434,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>>>        ram_addr_t ram_addr;
>>>>        hwaddr paddr;
>>>>        AcpiGhesState *ags;
>>>> +    uint64_t addresses[16];
>>>>    
>>>>        assert(code == BUS_MCEERR_AR || code == BUS_MCEERR_AO);
>>>>    
>>>> @@ -2454,10 +2455,11 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>>>                 * later from the main thread, so doing the injection of
>>>>                 * the error would be more complicated.
>>>>                 */
>>>> +            addresses[0] = paddr;
>>>>                if (code == BUS_MCEERR_AR) {
>>>>                    kvm_cpu_synchronize_state(c);
>>>>                    if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>>>> -                                             paddr)) {
>>>> +                                             addresses, 1)) {
>>>>                        kvm_inject_arm_sea(c);
>>>>                    } else {
>>>>                        error_report("failed to record the error");
>>>    

Thanks,
Gavin




Thread overview: 54+ messages
2025-11-05 11:44 [PATCH v3 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
2025-11-05 11:44 ` [PATCH v3 1/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
2025-11-05 14:16   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 2/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
2025-11-05 14:16   ` Jonathan Cameron via
2025-11-10 14:11   ` Igor Mammedov
2025-11-11  4:05     ` Gavin Shan
2025-11-12 12:32       ` Igor Mammedov
2025-11-12 17:41         ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 3/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
2025-11-05 14:17   ` Jonathan Cameron via
2025-11-05 11:44 ` [PATCH v3 4/8] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Gavin Shan
2025-11-05 14:14   ` Jonathan Cameron via
2025-11-06  3:15     ` Gavin Shan
2025-11-10 14:49       ` Igor Mammedov
2025-11-11  4:08         ` Gavin Shan
2025-11-11 10:07           ` Jonathan Cameron via
2025-11-11 10:55             ` Gavin Shan
2025-11-11 11:55               ` Jonathan Cameron via
2025-11-11 12:19                 ` Gavin Shan
2025-11-11 13:12                   ` Jonathan Cameron via
2025-11-10 14:38   ` Igor Mammedov
2025-11-11  4:40     ` Gavin Shan
2025-11-12 13:12       ` Igor Mammedov
2025-11-12 17:36         ` Gavin Shan [this message]
2025-11-10 14:43   ` Philippe Mathieu-Daudé
2025-11-10 23:38     ` Gavin Shan
2025-11-11  3:40       ` Gavin Shan
2025-11-10 14:48   ` Philippe Mathieu-Daudé
2025-11-11  3:44     ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 5/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-05 14:17   ` Jonathan Cameron via
2025-11-10 14:50   ` Philippe Mathieu-Daudé
2025-11-11  3:48     ` Gavin Shan
2025-11-10 14:51   ` Igor Mammedov
2025-11-05 11:44 ` [PATCH v3 6/8] acpi/ghes: Use error_abort in acpi_ghes_memory_errors() Gavin Shan
2025-11-05 14:18   ` Jonathan Cameron via
2025-11-10 14:53   ` Igor Mammedov
2025-11-10 14:54   ` Philippe Mathieu-Daudé
2025-11-11  3:58     ` Gavin Shan
2025-11-12 12:49       ` Igor Mammedov
2025-11-12 17:38         ` Gavin Shan
2025-11-11  5:08     ` Markus Armbruster
2025-11-11  5:25   ` Markus Armbruster
2025-11-11  6:02     ` Gavin Shan
2025-11-11  7:31       ` Markus Armbruster
2025-11-05 11:44 ` [PATCH v3 7/8] kvm/arm/kvm: Introduce helper push_ghes_memory_errors() Gavin Shan
2025-11-05 14:19   ` Jonathan Cameron via
2025-11-10 14:56   ` Igor Mammedov
2025-11-11  4:09     ` Gavin Shan
2025-11-05 11:44 ` [PATCH v3 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
2025-11-05 14:37   ` Jonathan Cameron via
2025-11-06  3:26     ` Gavin Shan
2025-11-11 10:12       ` Jonathan Cameron via
