qemu-devel.nongnu.org archive mirror
From: Gavin Shan <gshan@redhat.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
	mchehab+huawei@kernel.org, gengdongjiu1@gmail.com,
	mst@redhat.com, imammedo@redhat.com, armbru@redhat.com,
	anisinha@redhat.com, eduardo@habkost.net,
	marcel.apfelbaum@gmail.com, philmd@linaro.org,
	wangyanan55@huawei.com, zhao1.liu@intel.com,
	peter.maydell@linaro.org, pbonzini@redhat.com,
	shan.gavin@gmail.com, zhangliang5@huawei.com
Subject: Re: [PATCH v4 0/8] target/arm/kvm: Improve memory error handling
Date: Fri, 21 Nov 2025 16:51:58 +1000
Message-ID: <3be51d9f-366a-47a3-a048-623608aa0475@redhat.com>
In-Reply-To: <20251118104755.000024c8@huawei.com>

Hi Jonathan,

On 11/18/25 8:47 PM, Jonathan Cameron wrote:
> On Thu, 13 Nov 2025 03:25:27 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> In the combination of 64KiB host and 4KiB guest, a problematic host
>> page affects 16x guest pages. Those 16x guest pages are most likely
>> owned by separate threads and accessed by the threads in parallel.
>> It means 16x memory errors can be raised at once. However, we're
>> unable to handle this situation because the only error source has
>> one read acknowledgement register in the current design. QEMU has to
>> crash in the following path because the previously delivered error
>> hasn't been acknowledged by the guest when another error is delivered.
>>
>>    kvm_vcpu_thread_fn
>>      kvm_cpu_exec
>>        kvm_arch_on_sigbus_vcpu
>>          kvm_cpu_synchronize_state
>>          acpi_ghes_memory_errors
>>          abort
>>
>> This series fixes the issue by sending 16x consecutive CPER errors
>> which are contained in a single GHES error block.
>>
>> PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
>> PATCH[5]   Supports multiple error records in a single error block
>> PATCH[6-7] Improves the error handling in the error delivery path
>> PATCH[8]   Sends 16x consecutive CPERs in a single block if needed
>>
> 
> Hi Gavin,
> 
> Just a quick heads-up to say we've had some internal discussions around the
> kernel handling of broader address masks in CPER and think it is probably
> broken. Rectifying that may at least simplify what is needed on the QEMU side
> of things and maybe even handle much larger blocks (2M and larger).
> 
> Will keep everyone informed of how we get on with resolving that.
> 

Thanks, Jonathan. If the broader address mask in CPER can be used to isolate
the specified memory range instead of a single page, QEMU won't need the
improvement done in this series. If possible, please copy me when the Linux
patches are sent for review, and I will try to review them.

I will pull out the patches that improve error handling in the delivery path
and post them separately so that they can be merged. Those patches aren't
really specific to the memory error handling addressed by this series.

Thanks,
Gavin
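
[Editor's note] The 16x figure in the quoted cover letter follows from the
page-size ratio: one 64KiB host page spans 64KiB / 4KiB = 16 guest pages.
The sketch below only illustrates that arithmetic. The helper name, the
macros and the sample address are hypothetical and are not taken from the
series; the code in PATCH[8] may be structured quite differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page sizes for the 64KiB-host / 4KiB-guest combination. */
#define HOST_PAGE_SIZE  (64 * 1024ULL)
#define GUEST_PAGE_SIZE (4 * 1024ULL)

/*
 * Hypothetical helper (not from the series): given a faulting address,
 * fill 'out' with the base address of every guest page that lies inside
 * the surrounding host page. Returns the number of entries written,
 * capped at 'max'.
 */
static size_t guest_pages_in_host_page(uint64_t fault_addr,
                                       uint64_t *out, size_t max)
{
    /* Round the faulting address down to the host page boundary. */
    uint64_t base = fault_addr & ~(HOST_PAGE_SIZE - 1);
    size_t n = (size_t)(HOST_PAGE_SIZE / GUEST_PAGE_SIZE);
    size_t i;

    if (n > max) {
        n = max;
    }
    /* One entry per guest-sized page, in ascending address order. */
    for (i = 0; i < n; i++) {
        out[i] = base + i * GUEST_PAGE_SIZE;
    }
    return n;
}
```

With a 16-entry array and a sample fault address of 0x40012345, the helper
returns 16 addresses running from 0x40010000 to 0x4001f000 in 4KiB steps,
i.e. one candidate CPER per affected guest page.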




Thread overview: 15+ messages
2025-11-12 17:25 [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Gavin Shan
2025-11-12 17:25 ` [PATCH v4 1/8] acpi/ghes: Make GHES max raw data length dynamic Gavin Shan
2025-11-12 17:25 ` [PATCH v4 2/8] tests/qtest/bios-tables-test: Prepare for changes in the HEST table Gavin Shan
2025-11-12 17:25 ` [PATCH v4 3/8] acpi/ghes: Increase GHES raw data maximal length to 4KiB Gavin Shan
2025-11-12 17:25 ` [PATCH v4 4/8] tests/qtest/bios-tables-test: Update HEST table Gavin Shan
2025-11-12 17:25 ` [PATCH v4 5/8] acpi/ghes: Extend acpi_ghes_memory_errors() for multiple CPERs Gavin Shan
2025-11-12 17:25 ` [PATCH v4 6/8] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-12 17:25 ` [PATCH v4 7/8] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
2025-11-13  7:41   ` Markus Armbruster
2025-11-14  9:46     ` Gavin Shan
2025-11-12 17:25 ` [PATCH v4 8/8] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
2025-11-18 10:47 ` [PATCH v4 0/8] target/arm/kvm: Improve memory error handling Jonathan Cameron via
2025-11-18 10:54   ` Mauro Carvalho Chehab
2025-11-21  6:54     ` Gavin Shan
2025-11-21  6:51   ` Gavin Shan [this message]
