From: Igor Mammedov <imammedo@redhat.com>
To: Gavin Shan <gshan@redhat.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, mst@redhat.com,
anisinha@redhat.com, gengdongjiu1@gmail.com,
peter.maydell@linaro.org, pbonzini@redhat.com,
mchehab+huawei@kernel.org, Jonathan.Cameron@huawei.com,
shan.gavin@gmail.com
Subject: Re: [PATCH RESEND v2 3/3] target/arm/kvm: Support multiple memory CPERs injection
Date: Fri, 17 Oct 2025 16:27:46 +0200 [thread overview]
Message-ID: <20251017162746.2a99015b@fedora> (raw)
In-Reply-To: <20251007060810.258536-4-gshan@redhat.com>
On Tue, 7 Oct 2025 16:08:10 +1000
Gavin Shan <gshan@redhat.com> wrote:
> In the combination of 64KB host and 4KB guest, a problematic host page
> affects 16x guest pages. In this specific case, it's reasonable to
> push 16 consecutive memory CPERs. Otherwise, QEMU can run into core
> dump due to the current error can't be delivered as the previous error
> isn't acknoledges. It's caused by the nature the host page can be
> accessed in parallel due to the mismatched host and guest page sizes.
can you explain a bit more what goes wrong?
I'm especially interested in parallel access you've mentioned
and why batch adding error records is needed
as opposed to adding records every time invalid access happens?
PS:
Assume I don't remember details on how HEST works,
Answering it in this format also should improve commit message
making it more digestible for uninitiated.
> Imporve push_ghes_memory_errors() to push 16x consecutive memory CPERs
> for this specific case. The maximal error block size is bumped to 4KB,
> providing enough storage space for those 16x memory CPERs.
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
> hw/acpi/ghes.c | 2 +-
> target/arm/kvm.c | 46 +++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 46 insertions(+), 2 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 045b77715f..5c87b3a027 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -33,7 +33,7 @@
> #define ACPI_HEST_ADDR_FW_CFG_FILE "etc/acpi_table_hest_addr"
>
> /* The max size in bytes for one error block */
> -#define ACPI_GHES_MAX_RAW_DATA_LENGTH (1 * KiB)
> +#define ACPI_GHES_MAX_RAW_DATA_LENGTH (4 * KiB)
>
> /* Generic Hardware Error Source version 2 */
> #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2 10
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index c5d5b3b16e..3ecb85e4b7 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -11,6 +11,7 @@
> */
>
> #include "qemu/osdep.h"
> +#include "qemu/units.h"
> #include <sys/ioctl.h>
>
> #include <linux/kvm.h>
> @@ -2433,10 +2434,53 @@ static void push_ghes_memory_errors(CPUState *c, AcpiGhesState *ags,
> uint64_t paddr)
> {
> GArray *addresses = g_array_new(false, false, sizeof(paddr));
> + uint64_t val, start, end, guest_pgsz, host_pgsz;
> int ret;
>
> kvm_cpu_synchronize_state(c);
> - g_array_append_vals(addresses, &paddr, 1);
> +
> + /*
> + * Sort out the guest page size from TCR_EL1, which can be modified
> + * by the guest from time to time. So we have to sort it out dynamically.
> + */
> + ret = read_sys_reg64(c->kvm_fd, &val, ARM64_SYS_REG(3, 0, 2, 0, 2));
> + if (ret) {
> + goto error;
> + }
> +
> + switch (extract64(val, 14, 2)) {
> + case 0:
> + guest_pgsz = 4 * KiB;
> + break;
> + case 1:
> + guest_pgsz = 64 * KiB;
> + break;
> + case 2:
> + guest_pgsz = 16 * KiB;
> + break;
> + default:
> + error_report("unknown page size from TCR_EL1 (0x%" PRIx64 ")", val);
> + goto error;
> + }
> +
> + host_pgsz = qemu_real_host_page_size();
> + start = paddr & ~(host_pgsz - 1);
> + end = start + host_pgsz;
> + while (start < end) {
> + /*
> + * The precise physical address is provided for the affected
> + * guest page that contains @paddr. Otherwise, the starting
> + * address of the guest page is provided.
> + */
> + if (paddr >= start && paddr < (start + guest_pgsz)) {
> + g_array_append_vals(addresses, &paddr, 1);
> + } else {
> + g_array_append_vals(addresses, &start, 1);
> + }
> +
> + start += guest_pgsz;
> + }
> +
> ret = acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, addresses);
> if (ret) {
> goto error;
next prev parent reply other threads:[~2025-10-17 14:28 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-07 6:08 [PATCH RESEND v2 0/3] target/arm/kvm: Improve memory error handling Gavin Shan
2025-10-07 6:08 ` [PATCH RESEND v2 1/3] acpi/ghes: Extend acpi_ghes_memory_errors() to support multiple CPERs Gavin Shan
2025-10-07 6:08 ` [PATCH RESEND v2 2/3] kvm/arm/kvm: Introduce helper push_ghes_memory_errors() Gavin Shan
2025-10-07 6:08 ` [PATCH RESEND v2 3/3] target/arm/kvm: Support multiple memory CPERs injection Gavin Shan
2025-10-07 10:57 ` Mauro Carvalho Chehab
2025-10-08 3:57 ` Gavin Shan
2025-10-17 14:27 ` Igor Mammedov [this message]
2025-10-19 0:36 ` Gavin Shan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251017162746.2a99015b@fedora \
--to=imammedo@redhat.com \
--cc=Jonathan.Cameron@huawei.com \
--cc=anisinha@redhat.com \
--cc=gengdongjiu1@gmail.com \
--cc=gshan@redhat.com \
--cc=mchehab+huawei@kernel.org \
--cc=mst@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=shan.gavin@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).