From: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
To: Gavin Shan <gshan@redhat.com>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org, mst@redhat.com,
    imammedo@redhat.com, anisinha@redhat.com, gengdongjiu1@gmail.com,
    peter.maydell@linaro.org, pbonzini@redhat.com,
    shan.gavin@gmail.com
Subject: Re: [PATCH 0/4] target/arm: Improvement on memory error handling
Date: Fri, 14 Feb 2025 13:59:52 +0100
Message-ID: <20250214135945.322319cc@foz.lan>
In-Reply-To: <20250214041635.608012-1-gshan@redhat.com>
On Fri, 14 Feb 2025 14:16:31 +1000
Gavin Shan <gshan@redhat.com> wrote:
> Currently, there is only one CPER buffer (entry), meaning only one
> memory error can be reported. In an extreme case, multiple memory errors
> can be raised on different vCPUs. For example, a single memory error
> on a 64KB page of the host can result in 16 memory errors on 4KB
> pages of the guest.
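To make the 64KB/4KB arithmetic in the quoted example concrete, here is a minimal sketch. The helper name guest_pages_in_host_page() is hypothetical and is not a QEMU function. It also assumes the poisoned host page maps to one aligned, contiguous range of guest addresses, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: list the guest page addresses
 * covered by the host page that contains fault_addr.
 * Assumption: both page sizes are powers of two and the host page maps
 * to one aligned, contiguous guest range.
 * Returns the number of guest pages affected; at most out_len addresses
 * are written to out[].
 */
static size_t guest_pages_in_host_page(uint64_t fault_addr,
                                       uint64_t host_page_size,
                                       uint64_t guest_page_size,
                                       uint64_t *out, size_t out_len)
{
    assert(guest_page_size > 0 && host_page_size >= guest_page_size);

    /* Round the faulting address down to the start of the host page. */
    uint64_t base = fault_addr & ~(host_page_size - 1);
    size_t count = (size_t)(host_page_size / guest_page_size);

    for (size_t i = 0; i < count && i < out_len; i++) {
        out[i] = base + i * guest_page_size;
    }
    return count;
}
```

With a 64KB host page and 4KB guest pages the helper returns 16, so one host-side error can turn into up to 16 guest-side reports competing for a single CPER entry.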
A patchset that allows multiple CPER entries has been floating around
since last year:
https://lore.kernel.org/qemu-devel/cover.1738345063.git.mchehab+huawei@kernel.org/
I guess it is almost ready to be merged, needing just a few nitpick
changes to satisfy the ACPI maintainers. That changeset already adds a
second CPER entry for GED, and makes it easy to add more as needed.
> Unfortunately, the virtual machine is simply aborted
> by multiple concurrent memory errors, as the following call trace shows.
> A SEA exception is injected into the guest so that the CPER buffer can
> be claimed if the error is successfully pushed by acpi_ghes_memory_errors().
> Otherwise, abort() is triggered to crash the virtual machine.
>
>   kvm_vcpu_thread_fn
>     kvm_cpu_exec
>       kvm_arch_on_sigbus_vcpu
>         kvm_cpu_synchronize_state
>         acpi_ghes_memory_errors      (a)
>         kvm_inject_arm_sea | abort
>
> It's arguable whether the virtual machine should be crashed in this
> case. The better behaviour would be to retry pushing the memory errors,
> to keep the virtual machine alive so that the administrator has a chance
> to step in, for example to dump the important data, with luck. This
> series adds one more parameter to acpi_ghes_memory_errors() so that it
> keeps trying to push the memory error until it succeeds.
Having a retry buffer might be interesting for some types of errors,
like error-injected and corrected errors. Yet, it doesn't sound right
to buffer uncorrected errors that would affect the virtual machine.
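A minimal sketch of the distinction drawn above, retrying only for corrected errors and giving uncorrected errors a single attempt. Every name here (cper_slot, slot_push(), slot_ack(), report_error(), the severity enum) is hypothetical. None of this is the QEMU GHES API or the code from either patch series, and what to do when reporting fails is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical severity levels; these are not the ACPI/CPER constants. */
enum err_severity { ERR_CORRECTED, ERR_UNCORRECTED };

/*
 * Hypothetical single-entry error buffer: 'busy' stays set until the
 * guest acknowledges the record.
 */
struct cper_slot {
    bool busy;
    uint64_t paddr;
};

/* Try once to store an error record; fails while the slot is busy. */
static bool slot_push(struct cper_slot *s, uint64_t paddr)
{
    if (s->busy) {
        return false;
    }
    s->busy = true;
    s->paddr = paddr;
    return true;
}

/* Stand-in for the guest acknowledging the record and freeing the slot. */
static void slot_ack(struct cper_slot *s)
{
    s->busy = false;
}

/*
 * Policy sketch: a corrected error may be attempted up to max_tries
 * times, an uncorrected error gets exactly one attempt.
 * wait_for_guest is a stand-in for whatever lets the guest make
 * progress between attempts; it may be NULL.
 * Returns true if the record was stored.
 */
static bool report_error(struct cper_slot *s, uint64_t paddr,
                         enum err_severity sev, int max_tries,
                         void (*wait_for_guest)(struct cper_slot *))
{
    assert(max_tries >= 1);
    int tries = (sev == ERR_CORRECTED) ? max_tries : 1;

    for (int i = 0; i < tries; i++) {
        if (slot_push(s, paddr)) {
            return true;
        }
        if (wait_for_guest) {
            wait_for_guest(s);
        }
    }
    return false;
}
```

Passing slot_ack as wait_for_guest simulates a guest that consumes the pending record between attempts: a corrected error then succeeds on its second try, while an uncorrected error against a busy slot fails immediately.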
>
> Gavin Shan (4):
>   acpi/ghes: Make ghes_record_cper_errors() static
>   acpi/ghes: Use error_report() in ghes_record_cper_errors()
>   acpi/ghes: Allow retry to write CPER errors
>   target/arm: Retry pushing CPER error if necessary
>
>  hw/acpi/ghes-stub.c    |  3 ++-
>  hw/acpi/ghes.c         | 45 +++++++++++++++++++++---------------------
>  include/hw/acpi/ghes.h |  5 ++---
>  target/arm/kvm.c       | 31 +++++++++++++++++++++++------
>  4 files changed, 51 insertions(+), 33 deletions(-)
>
Thanks,
Mauro