From: Gavin Shan <gshan@redhat.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>,
Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
gengdongjiu1@gmail.com, mst@redhat.com, imammedo@redhat.com,
armbru@redhat.com, anisinha@redhat.com, eduardo@habkost.net,
marcel.apfelbaum@gmail.com, philmd@linaro.org,
wangyanan55@huawei.com, zhao1.liu@intel.com,
peter.maydell@linaro.org, pbonzini@redhat.com,
shan.gavin@gmail.com, zhangliang5@huawei.com
Subject: Re: [PATCH v4 0/8] target/arm/kvm: Improve memory error handling
Date: Fri, 21 Nov 2025 16:54:39 +1000
Message-ID: <a43d044e-51e3-4000-b5cc-3bcc14317d20@redhat.com>
In-Reply-To: <lghhh6xohwekbst2bbuqksiono5dgtrkyjxoypb4ahij2t2qgs@7dmgytmbiehd>

Hi Mauro,

On 11/18/25 8:54 PM, Mauro Carvalho Chehab wrote:
> On Tue, Nov 18, 2025 at 10:47:55AM +0000, Jonathan Cameron wrote:
>> On Thu, 13 Nov 2025 03:25:27 +1000
>> Gavin Shan <gshan@redhat.com> wrote:
>>
>>> In the combination of 64KiB host and 4KiB guest, a problematic host
>>> page affects 16x guest pages. Those 16x guest pages are most likely
>>> owned by separate threads and accessed by the threads in parallel.
>>> It means 16x memory errors can be raised at once. However, we're
>>> unable to handle this situation because the only error source has
>>> one read acknowledgement register in the current design. QEMU has to
>>> crash in the following path because the previously delivered error
>>> hasn't been acknowledged by the guest when it attempts to deliver another.
>>>
>>> kvm_vcpu_thread_fn
>>> kvm_cpu_exec
>>> kvm_arch_on_sigbus_vcpu
>>> kvm_cpu_synchronize_state
>>> acpi_ghes_memory_errors
>>> abort
>>>
>>> This series fixes the issue by sending 16x consecutive CPER errors
>>> which are contained in a single GHES error block.
>>>
>>> PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
>>> PATCH[5] Supports multiple error records in a single error block
>>> PATCH[6-7] Improves the error handling in the error delivery path
>>> PATCH[8] Sends 16x consecutive CPERs in a single block if needed
>>>
>>
>> Hi Gavin,
>>
>> Just a quick heads-up to say we've had some internal discussions around the
>> kernel handling of broader address masks in CPER and think it is probably
>> broken. Rectifying that may at least simplify what is needed on the QEMU side
>> of things and maybe even handle much larger blocks (2M and larger).
>
> Btw, I just added logic to rasdaemon to catch SIGBUS errors:
> https://github.com/mchehab/rasdaemon/pull/199
>
> But so far, I didn't find a proper way to check such code.
>
> Jonathan/Gavin,
>
> Do you know a good way for us to check how the mm SEA notification
> is handled with QEMU?
>
Sorry, I'm not familiar with rasdaemon. Could you please provide more
context for your question?

Thanks,
Gavin

Thread overview: 15+ messages
2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
2025-11-12 17:25 ` [PATCH v4 1/8] acpi/ghes: Make GHES max raw data length dynamic Gavin Shan
2025-11-12 17:25 ` [PATCH v4 2/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
2025-11-12 17:25 ` [PATCH v4 3/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
2025-11-12 17:25 ` [PATCH v4 4/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
2025-11-12 17:25 ` [PATCH v4 5/8] acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs Gavin Shan
2025-11-12 17:25 ` [PATCH v4 6/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-12 17:25 ` [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
2025-11-13 7:41 ` Markus Armbruster
2025-11-14 9:46 ` Gavin Shan
2025-11-12 17:25 ` [PATCH v4 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
2025-11-18 10:47 ` [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Jonathan Cameron via
2025-11-18 10:54 ` Mauro Carvalho Chehab
2025-11-21 6:54 ` Gavin Shan [this message]
2025-11-21 6:51 ` Gavin Shan