[PATCH 0/4] target/arm: Improvement on memory error handling

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/4] target/arm: Improvement on memory error handling
@ 2025-02-14  4:16 Gavin Shan
  2025-02-14  4:16 ` [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static Gavin Shan
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Gavin Shan @ 2025-02-14  4:16 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, mst, imammedo, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin

Currently, there is only one CPER buffer (entry), meaning only one
memory error can be reported. In extreme case, multiple memory errors
can be raised on different vCPUs. For example, a singile memory error
on a 64KB page of the host can results in 16 memory errors to 4KB
pages of the guest. Unfortunately, the virtual machine is simply aborted
by multiple concurrent memory errors, as the following call trace shows.
A SEA exception is injected to the guest so that the CPER buffer can
be claimed if the error is successfully pushed by acpi_ghes_memory_errors(),
Otherwise, abort() is triggered to crash the virtual machine.

  kvm_vcpu_thread_fn
    kvm_cpu_exec
      kvm_arch_on_sigbus_vcpu
        kvm_cpu_synchronize_state
        acpi_ghes_memory_errors         (a)
        kvm_inject_arm_sea | abort

It's arguably to crash the virtual machine in this case. The better
behaviour would be to retry on pushing the memory errors, to keep the
virtual machine alive so that the administrator has chance to chime
in, for example to dump the important data with luck. This series
adds one more parameter to acpi_ghes_memory_errors() so that it will
be tried to push the memory error until it succeeds.

Gavin Shan (4):
  acpi/ghes: Make ghes_record_cper_errors() static
  acpi/ghes: Use error_report() in ghes_record_cper_errors()
  acpi/ghes: Allow retry to write CPER errors
  target/arm: Retry pushing CPER error if necessary

 hw/acpi/ghes-stub.c    |  3 ++-
 hw/acpi/ghes.c         | 45 +++++++++++++++++++++---------------------
 include/hw/acpi/ghes.h |  5 ++---
 target/arm/kvm.c       | 31 +++++++++++++++++++++++------
 4 files changed, 51 insertions(+), 33 deletions(-)

-- 
2.48.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static
  2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
@ 2025-02-14  4:16 ` Gavin Shan
  2025-02-21 10:44   ` Philippe Mathieu-Daudé
  2025-02-14  4:16 ` [PATCH 2/4] acpi/ghes: Use error_report() in ghes_record_cper_errors() Gavin Shan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-02-14  4:16 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, mst, imammedo, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin

acpi_ghes_memory_errors() is the only caller, no need to expose
the function. Besides, the last 'return' in this function isn't
necessary and remove it.

No functional changes intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/acpi/ghes.c         | 6 ++----
 include/hw/acpi/ghes.h | 2 --
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index b709c177cd..b85bb48195 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -390,8 +390,8 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
     *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
 }
 
-void ghes_record_cper_errors(const void *cper, size_t len,
-                             uint16_t source_id, Error **errp)
+static void ghes_record_cper_errors(const void *cper, size_t len,
+                                    uint16_t source_id, Error **errp)
 {
     uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
     AcpiGedState *acpi_ged_state;
@@ -440,8 +440,6 @@ void ghes_record_cper_errors(const void *cper, size_t len,
 
     /* Write the generic error data entry into guest memory */
     cpu_physical_memory_write(cper_addr, cper, len);
-
-    return;
 }
 
 int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 39619a2457..578a582203 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -75,8 +75,6 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
 int acpi_ghes_memory_errors(uint16_t source_id, uint64_t error_physical_addr);
-void ghes_record_cper_errors(const void *cper, size_t len,
-                             uint16_t source_id, Error **errp);
 
 /**
  * acpi_ghes_present: Report whether ACPI GHES table is present
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/4] acpi/ghes: Use error_report() in ghes_record_cper_errors()
  2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
  2025-02-14  4:16 ` [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static Gavin Shan
@ 2025-02-14  4:16 ` Gavin Shan
  2025-02-14  4:16 ` [PATCH 3/4] acpi/ghes: Allow retry to write CPER errors Gavin Shan
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Gavin Shan @ 2025-02-14  4:16 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, mst, imammedo, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin

An error object is created by ghes_record_cper_errors() to collect
the error messages in its failing path. The caller prints the error
messages and determine its return value base on the error object.

It's unnecessary to use an error object if the error number is
returned by ghes_record_cper_errors() and the error messages are
printed with error_report() in the function.

It's the preparatory work to add parameter for ghes_record_cper_errors()
to indicate if the request can be retried by its callers. No functional
changes intended.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/acpi/ghes.c | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index b85bb48195..a67326fd50 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -390,23 +390,23 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
     *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
 }
 
-static void ghes_record_cper_errors(const void *cper, size_t len,
-                                    uint16_t source_id, Error **errp)
+static int ghes_record_cper_errors(const void *cper, size_t len,
+                                   uint16_t source_id)
 {
     uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
     AcpiGedState *acpi_ged_state;
     AcpiGhesState *ags;
 
     if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
-        error_setg(errp, "GHES CPER record is too big: %zd", len);
-        return;
+        error_report("GHES CPER record is too big: %zd", len);
+        return -1;
     }
 
     acpi_ged_state = ACPI_GED(object_resolve_path_type("", TYPE_ACPI_GED,
                                                        NULL));
     if (!acpi_ged_state) {
-        error_setg(errp, "Can't find ACPI_GED object");
-        return;
+        error_report("Can't find ACPI_GED object");
+        return -1;
     }
     ags = &acpi_ged_state->ghes_state;
 
@@ -415,8 +415,8 @@ static void ghes_record_cper_errors(const void *cper, size_t len,
                          &cper_addr, &read_ack_register_addr);
 
     if (!cper_addr) {
-        error_setg(errp, "can not find Generic Error Status Block");
-        return;
+        error_report("can not find Generic Error Status Block");
+        return -1;
     }
 
     cpu_physical_memory_read(read_ack_register_addr,
@@ -424,10 +424,9 @@ static void ghes_record_cper_errors(const void *cper, size_t len,
 
     /* zero means OSPM does not acknowledge the error */
     if (!read_ack_register) {
-        error_setg(errp,
-                   "OSPM does not acknowledge previous error,"
-                   " so can not record CPER for current error anymore");
-        return;
+        error_report("OSPM does not acknowledge previous error,"
+                     " so can not record CPER for current error anymore");
+        return -1;
     }
 
     read_ack_register = cpu_to_le64(0);
@@ -440,6 +439,8 @@ static void ghes_record_cper_errors(const void *cper, size_t len,
 
     /* Write the generic error data entry into guest memory */
     cpu_physical_memory_write(cper_addr, cper, len);
+
+    return 0;
 }
 
 int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
@@ -448,9 +449,8 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
     const uint8_t guid[] =
           UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
                   0xED, 0x7C, 0x83, 0xB1);
-    Error *errp = NULL;
-    int data_length;
     GArray *block;
+    int data_length, ret;
 
     block = g_array_new(false, true /* clear */, 1);
 
@@ -468,16 +468,11 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
     acpi_ghes_build_append_mem_cper(block, physical_address);
 
     /* Report the error */
-    ghes_record_cper_errors(block->data, block->len, source_id, &errp);
+    ret = ghes_record_cper_errors(block->data, block->len, source_id);
 
     g_array_free(block, true);
 
-    if (errp) {
-        error_report_err(errp);
-        return -1;
-    }
-
-    return 0;
+    return ret;
 }
 
 bool acpi_ghes_present(void)
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/4] acpi/ghes: Allow retry to write CPER errors
  2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
  2025-02-14  4:16 ` [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static Gavin Shan
  2025-02-14  4:16 ` [PATCH 2/4] acpi/ghes: Use error_report() in ghes_record_cper_errors() Gavin Shan
@ 2025-02-14  4:16 ` Gavin Shan
  2025-02-14  4:16 ` [PATCH 4/4] target/arm: Retry pushing CPER error if necessary Gavin Shan
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Gavin Shan @ 2025-02-14  4:16 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, mst, imammedo, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin

Multiple CPER errors can be raised on multiple vCPUs at the same
time. The error -1 is returned from ghes_record_cper_errors() and
QEMU is terminated due to abort() in kvm_arch_on_sigbus_vcpu().
it isn't correct and expected behaviour since the affected vCPU
can't proceed with execution. It's reasonable to retry if the
reported error is transient, for example the previously reported
CPER error isn't claimed by the guest.

Add one more parameter (@retry_allowed) to acpi_ghes_memory_errors(),
passed down to ghes_record_cper_errors(). The differentiated error
number (1 or -1) is returned if the the previously reported CPER
error hasn't been claimed by the guest. The caller will retry the
request if the returned error number is 1.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 hw/acpi/ghes-stub.c    |  3 ++-
 hw/acpi/ghes.c         | 12 +++++++++---
 include/hw/acpi/ghes.h |  3 ++-
 target/arm/kvm.c       |  2 +-
 4 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 7cec1812da..5c807afe21 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,7 +11,8 @@
 #include "qemu/osdep.h"
 #include "hw/acpi/ghes.h"
 
-int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address,
+                            bool retry_allowed)
 {
     return -1;
 }
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index a67326fd50..60587f3c1b 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -391,7 +391,7 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
 }
 
 static int ghes_record_cper_errors(const void *cper, size_t len,
-                                   uint16_t source_id)
+                                   uint16_t source_id, bool retry_allowed)
 {
     uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
     AcpiGedState *acpi_ged_state;
@@ -424,6 +424,10 @@ static int ghes_record_cper_errors(const void *cper, size_t len,
 
     /* zero means OSPM does not acknowledge the error */
     if (!read_ack_register) {
+        if (retry_allowed) {
+            return 1;
+        }
+
         error_report("OSPM does not acknowledge previous error,"
                      " so can not record CPER for current error anymore");
         return -1;
@@ -443,7 +447,8 @@ static int ghes_record_cper_errors(const void *cper, size_t len,
     return 0;
 }
 
-int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
+int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address,
+                            bool retry_allowed)
 {
     /* Memory Error Section Type */
     const uint8_t guid[] =
@@ -468,7 +473,8 @@ int acpi_ghes_memory_errors(uint16_t source_id, uint64_t physical_address)
     acpi_ghes_build_append_mem_cper(block, physical_address);
 
     /* Report the error */
-    ret = ghes_record_cper_errors(block->data, block->len, source_id);
+    ret = ghes_record_cper_errors(block->data, block->len,
+                                  source_id, retry_allowed);
 
     g_array_free(block, true);
 
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index 578a582203..1dad62100a 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -74,7 +74,8 @@ void acpi_build_hest(GArray *table_data, GArray *hardware_errors,
                      const char *oem_id, const char *oem_table_id);
 void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
                           GArray *hardware_errors);
-int acpi_ghes_memory_errors(uint16_t source_id, uint64_t error_physical_addr);
+int acpi_ghes_memory_errors(uint16_t source_id, uint64_t error_physical_addr,
+                            bool retry_allowed);
 
 /**
  * acpi_ghes_present: Report whether ACPI GHES table is present
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb23..5c0bf99aec 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2387,7 +2387,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
              */
             if (code == BUS_MCEERR_AR) {
                 kvm_cpu_synchronize_state(c);
-                if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
+                if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, false)) {
                     kvm_inject_arm_sea(c);
                 } else {
                     error_report("failed to record the error");
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
  2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
                   ` (2 preceding siblings ...)
  2025-02-14  4:16 ` [PATCH 3/4] acpi/ghes: Allow retry to write CPER errors Gavin Shan
@ 2025-02-14  4:16 ` Gavin Shan
  2025-02-19 17:55   ` Igor Mammedov
  2025-02-14  9:53 ` [PATCH 0/4] target/arm: Improvement on memory error handling Jonathan Cameron via
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-02-14  4:16 UTC (permalink / raw)
  To: qemu-arm
  Cc: qemu-devel, mst, imammedo, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin

The error -1 is returned if the previously reported CPER error
hasn't been claimed. The virtual machine is terminated due to
abort(). It's conflicting to the ideal behaviour that the affected
vCPU retries pushing the CPER error in this case since the vCPU
can't proceed its execution.

Move the chunk of code to push CPER error to a separate helper
report_memory_errors() and retry the request when the return
value from acpi_ghes_memory_errors() is greater than zero.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 target/arm/kvm.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 5c0bf99aec..9f063f6053 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2362,6 +2362,30 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
     return ret;
 }
 
+static void report_memory_error(CPUState *c, hwaddr paddr)
+{
+    int ret;
+
+    while (true) {
+        /* Retry if the previously report error hasn't been claimed */
+        ret = acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, true);
+        if (ret <= 0) {
+            break;
+        }
+
+        bql_unlock();
+        g_usleep(1000);
+        bql_lock();
+    }
+
+    if (ret == 0) {
+        kvm_inject_arm_sea(c);
+    } else {
+        error_report("Error %d to report memory error", ret);
+        abort();
+    }
+}
+
 void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
 {
     ram_addr_t ram_addr;
@@ -2387,12 +2411,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
              */
             if (code == BUS_MCEERR_AR) {
                 kvm_cpu_synchronize_state(c);
-                if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, false)) {
-                    kvm_inject_arm_sea(c);
-                } else {
-                    error_report("failed to record the error");
-                    abort();
-                }
+                report_memory_error(c, paddr);
             }
             return;
         }
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] target/arm: Improvement on memory error handling
  2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
                   ` (3 preceding siblings ...)
  2025-02-14  4:16 ` [PATCH 4/4] target/arm: Retry pushing CPER error if necessary Gavin Shan
@ 2025-02-14  9:53 ` Jonathan Cameron via
  2025-02-17  0:29   ` Gavin Shan
  2025-02-14 10:12 ` Jonathan Cameron via
  2025-02-14 12:59 ` Mauro Carvalho Chehab
  6 siblings, 1 reply; 19+ messages in thread
From: Jonathan Cameron via @ 2025-02-14  9:53 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mst, imammedo, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin, Mauro Carvalho Chehab

On Fri, 14 Feb 2025 14:16:31 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Currently, there is only one CPER buffer (entry), meaning only one
> memory error can be reported. In extreme case, multiple memory errors
> can be raised on different vCPUs. For example, a singile memory error
> on a 64KB page of the host can results in 16 memory errors to 4KB
> pages of the guest. Unfortunately, the virtual machine is simply aborted
> by multiple concurrent memory errors, as the following call trace shows.
> A SEA exception is injected to the guest so that the CPER buffer can
> be claimed if the error is successfully pushed by acpi_ghes_memory_errors(),
> Otherwise, abort() is triggered to crash the virtual machine.
> 
>   kvm_vcpu_thread_fn
>     kvm_cpu_exec
>       kvm_arch_on_sigbus_vcpu
>         kvm_cpu_synchronize_state
>         acpi_ghes_memory_errors         (a)
>         kvm_inject_arm_sea | abort
> 
> It's arguably to crash the virtual machine in this case. The better
> behaviour would be to retry on pushing the memory errors, to keep the
> virtual machine alive so that the administrator has chance to chime
> in, for example to dump the important data with luck. This series
> adds one more parameter to acpi_ghes_memory_errors() so that it will
> be tried to push the memory error until it succeeds.

Hi Gavin,

+CC Mauro given:
https://lore.kernel.org/all/cover.1738345063.git.mchehab+huawei@kernel.org/

is more or less reviewed subject to some requested patch reordering and
whilst I haven't checked, seems unlikely that there won't be a
clash with this series (might just be some fuzz)

Jonathan



> 
> Gavin Shan (4):
>   acpi/ghes: Make ghes_record_cper_errors() static
>   acpi/ghes: Use error_report() in ghes_record_cper_errors()
>   acpi/ghes: Allow retry to write CPER errors
>   target/arm: Retry pushing CPER error if necessary
> 
>  hw/acpi/ghes-stub.c    |  3 ++-
>  hw/acpi/ghes.c         | 45 +++++++++++++++++++++---------------------
>  include/hw/acpi/ghes.h |  5 ++---
>  target/arm/kvm.c       | 31 +++++++++++++++++++++++------
>  4 files changed, 51 insertions(+), 33 deletions(-)
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] target/arm: Improvement on memory error handling
  2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
                   ` (4 preceding siblings ...)
  2025-02-14  9:53 ` [PATCH 0/4] target/arm: Improvement on memory error handling Jonathan Cameron via
@ 2025-02-14 10:12 ` Jonathan Cameron via
  2025-02-17  3:49   ` Gavin Shan
  2025-02-14 12:59 ` Mauro Carvalho Chehab
  6 siblings, 1 reply; 19+ messages in thread
From: Jonathan Cameron via @ 2025-02-14 10:12 UTC (permalink / raw)
  To: Gavin Shan, Mauro Carvalho Chehab
  Cc: qemu-arm, qemu-devel, mst, imammedo, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin

On Fri, 14 Feb 2025 14:16:31 +1000
Gavin Shan <gshan@redhat.com> wrote:

> Currently, there is only one CPER buffer (entry), meaning only one
> memory error can be reported. In extreme case, multiple memory errors
> can be raised on different vCPUs. For example, a singile memory error
> on a 64KB page of the host can results in 16 memory errors to 4KB
> pages of the guest. Unfortunately, the virtual machine is simply aborted
> by multiple concurrent memory errors, as the following call trace shows.
> A SEA exception is injected to the guest so that the CPER buffer can
> be claimed if the error is successfully pushed by acpi_ghes_memory_errors(),
> Otherwise, abort() is triggered to crash the virtual machine.
> 
>   kvm_vcpu_thread_fn
>     kvm_cpu_exec
>       kvm_arch_on_sigbus_vcpu
>         kvm_cpu_synchronize_state
>         acpi_ghes_memory_errors         (a)
>         kvm_inject_arm_sea | abort
> 
> It's arguably to crash the virtual machine in this case. The better
> behaviour would be to retry on pushing the memory errors, to keep the
> virtual machine alive so that the administrator has chance to chime
> in, for example to dump the important data with luck. This series
> adds one more parameter to acpi_ghes_memory_errors() so that it will
> be tried to push the memory error until it succeeds.
Hi Gavin,

If the ultimate aim is to support multiple memory errors why not
just do that?  Been a while since I look at how that works, but 
the spec definitely allows it.  I think by just queuing up the errors
and updating the Error Status Address as each one is handled.
I think that's what GHESv2 ack is all about as it prevents the
RAS firmware updating the error record until it is acknowledged
at which point the RAS firmware can report the next one.

Or... Given the usecase above of a 64KiB host page and 4KiB guest
can we inject a single error record with multiple CPER entries and
just handle it all in one go?

Set the Error record header -> section count to 16 and provide
16 Memory Error Sections or equivalent.

Doesn't help with multiple errors in unrelated memory addresses but
maybe removes one problem case.

I've not checked all the information makes it to the right places
however or that we don't end up with a deadlock when multiple vCPU
involved.

If doing the more significant surgery this would involve, I'd
love to see Mauro's series land first as it cleans up a lot of
how HEST is handled etc.

Jonathan

> 
> Gavin Shan (4):
>   acpi/ghes: Make ghes_record_cper_errors() static
>   acpi/ghes: Use error_report() in ghes_record_cper_errors()
>   acpi/ghes: Allow retry to write CPER errors
>   target/arm: Retry pushing CPER error if necessary
> 
>  hw/acpi/ghes-stub.c    |  3 ++-
>  hw/acpi/ghes.c         | 45 +++++++++++++++++++++---------------------
>  include/hw/acpi/ghes.h |  5 ++---
>  target/arm/kvm.c       | 31 +++++++++++++++++++++++------
>  4 files changed, 51 insertions(+), 33 deletions(-)
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] target/arm: Improvement on memory error handling
  2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
                   ` (5 preceding siblings ...)
  2025-02-14 10:12 ` Jonathan Cameron via
@ 2025-02-14 12:59 ` Mauro Carvalho Chehab
  2025-02-17  3:58   ` Gavin Shan
  6 siblings, 1 reply; 19+ messages in thread
From: Mauro Carvalho Chehab @ 2025-02-14 12:59 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mst, imammedo, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin

Em Fri, 14 Feb 2025 14:16:31 +1000
Gavin Shan <gshan@redhat.com> escreveu:

> Currently, there is only one CPER buffer (entry), meaning only one
> memory error can be reported. In extreme case, multiple memory errors
> can be raised on different vCPUs. For example, a singile memory error
> on a 64KB page of the host can results in 16 memory errors to 4KB
> pages of the guest. 

There is already a patchset allowing to have multiple CPER entries
floating around since last year:

	https://lore.kernel.org/qemu-devel/cover.1738345063.git.mchehab+huawei@kernel.org/

I guess it is almost ready for being merged, needing just some
nitpick changes to satisfy ACPI maintainers. Such changeset already
adds a second CPER entry for GED, and allows to easily add more as
needed. 

> In extreme case, multiple memory errors
> can be raised on different vCPUs. For example, a singile memory error
> on a 64KB page of the host can results in 16 memory errors to 4KB
> pages of the guest. 

> Unfortunately, the virtual machine is simply aborted
> by multiple concurrent memory errors, as the following call trace shows.
> A SEA exception is injected to the guest so that the CPER buffer can
> be claimed if the error is successfully pushed by acpi_ghes_memory_errors(),
> Otherwise, abort() is triggered to crash the virtual machine.
> 
>   kvm_vcpu_thread_fn
>     kvm_cpu_exec
>       kvm_arch_on_sigbus_vcpu
>         kvm_cpu_synchronize_state
>         acpi_ghes_memory_errors         (a)
>         kvm_inject_arm_sea | abort
> 
> It's arguably to crash the virtual machine in this case. The better
> behaviour would be to retry on pushing the memory errors, to keep the
> virtual machine alive so that the administrator has chance to chime
> in, for example to dump the important data with luck. This series
> adds one more parameter to acpi_ghes_memory_errors() so that it will
> be tried to push the memory error until it succeeds.

Having a retry buffer might be interesting for some types of errors,
like error-injected and corrected errors. Yet, it doesn't sound right 
to buffer uncorrected errors that would affect the virtual machine.

> 
> Gavin Shan (4):
>   acpi/ghes: Make ghes_record_cper_errors() static
>   acpi/ghes: Use error_report() in ghes_record_cper_errors()
>   acpi/ghes: Allow retry to write CPER errors
>   target/arm: Retry pushing CPER error if necessary
> 
>  hw/acpi/ghes-stub.c    |  3 ++-
>  hw/acpi/ghes.c         | 45 +++++++++++++++++++++---------------------
>  include/hw/acpi/ghes.h |  5 ++---
>  target/arm/kvm.c       | 31 +++++++++++++++++++++++------
>  4 files changed, 51 insertions(+), 33 deletions(-)
> 



Thanks,
Mauro


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] target/arm: Improvement on memory error handling
  2025-02-14  9:53 ` [PATCH 0/4] target/arm: Improvement on memory error handling Jonathan Cameron via
@ 2025-02-17  0:29   ` Gavin Shan
  0 siblings, 0 replies; 19+ messages in thread
From: Gavin Shan @ 2025-02-17  0:29 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: qemu-arm, qemu-devel, mst, imammedo, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin, Mauro Carvalho Chehab

On 2/14/25 7:53 PM, Jonathan Cameron wrote:
> On Fri, 14 Feb 2025 14:16:31 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> Currently, there is only one CPER buffer (entry), meaning only one
>> memory error can be reported. In extreme case, multiple memory errors
>> can be raised on different vCPUs. For example, a singile memory error
>> on a 64KB page of the host can results in 16 memory errors to 4KB
>> pages of the guest. Unfortunately, the virtual machine is simply aborted
>> by multiple concurrent memory errors, as the following call trace shows.
>> A SEA exception is injected to the guest so that the CPER buffer can
>> be claimed if the error is successfully pushed by acpi_ghes_memory_errors(),
>> Otherwise, abort() is triggered to crash the virtual machine.
>>
>>    kvm_vcpu_thread_fn
>>      kvm_cpu_exec
>>        kvm_arch_on_sigbus_vcpu
>>          kvm_cpu_synchronize_state
>>          acpi_ghes_memory_errors         (a)
>>          kvm_inject_arm_sea | abort
>>
>> It's arguably to crash the virtual machine in this case. The better
>> behaviour would be to retry on pushing the memory errors, to keep the
>> virtual machine alive so that the administrator has chance to chime
>> in, for example to dump the important data with luck. This series
>> adds one more parameter to acpi_ghes_memory_errors() so that it will
>> be tried to push the memory error until it succeeds.
> 
> Hi Gavin,
> 
> +CC Mauro given:
> https://lore.kernel.org/all/cover.1738345063.git.mchehab+huawei@kernel.org/
> 
> is more or less reviewed subject to some requested patch reordering and
> whilst I haven't checked, seems unlikely that there won't be a
> clash with this series (might just be some fuzz)
> 

Jonathan, thanks for the pointer. I didn't notice there are pending acpi/hest
changes. The changes clash with those included in this series, I will take a
close look.

Thanks,
Gavin

> Jonathan
> 
> 
> 
>>
>> Gavin Shan (4):
>>    acpi/ghes: Make ghes_record_cper_errors() static
>>    acpi/ghes: Use error_report() in ghes_record_cper_errors()
>>    acpi/ghes: Allow retry to write CPER errors
>>    target/arm: Retry pushing CPER error if necessary
>>
>>   hw/acpi/ghes-stub.c    |  3 ++-
>>   hw/acpi/ghes.c         | 45 +++++++++++++++++++++---------------------
>>   include/hw/acpi/ghes.h |  5 ++---
>>   target/arm/kvm.c       | 31 +++++++++++++++++++++++------
>>   4 files changed, 51 insertions(+), 33 deletions(-)
>>
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] target/arm: Improvement on memory error handling
  2025-02-14 10:12 ` Jonathan Cameron via
@ 2025-02-17  3:49   ` Gavin Shan
  0 siblings, 0 replies; 19+ messages in thread
From: Gavin Shan @ 2025-02-17  3:49 UTC (permalink / raw)
  To: Jonathan Cameron, Mauro Carvalho Chehab
  Cc: qemu-arm, qemu-devel, mst, imammedo, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin

On 2/14/25 8:12 PM, Jonathan Cameron wrote:
> On Fri, 14 Feb 2025 14:16:31 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> Currently, there is only one CPER buffer (entry), meaning only one
>> memory error can be reported. In extreme case, multiple memory errors
>> can be raised on different vCPUs. For example, a singile memory error
>> on a 64KB page of the host can results in 16 memory errors to 4KB
>> pages of the guest. Unfortunately, the virtual machine is simply aborted
>> by multiple concurrent memory errors, as the following call trace shows.
>> A SEA exception is injected to the guest so that the CPER buffer can
>> be claimed if the error is successfully pushed by acpi_ghes_memory_errors(),
>> Otherwise, abort() is triggered to crash the virtual machine.
>>
>>    kvm_vcpu_thread_fn
>>      kvm_cpu_exec
>>        kvm_arch_on_sigbus_vcpu
>>          kvm_cpu_synchronize_state
>>          acpi_ghes_memory_errors         (a)
>>          kvm_inject_arm_sea | abort
>>
>> It's arguably to crash the virtual machine in this case. The better
>> behaviour would be to retry on pushing the memory errors, to keep the
>> virtual machine alive so that the administrator has chance to chime
>> in, for example to dump the important data with luck. This series
>> adds one more parameter to acpi_ghes_memory_errors() so that it will
>> be tried to push the memory error until it succeeds.
> Hi Gavin,
> 
> If the ultimate aim is to support multiple memory errors why not
> just do that?  Been a while since I look at how that works, but
> the spec definitely allows it.  I think by just queuing up the errors
> and updating the Error Status Address as each one is handled.
> I think that's what GHESv2 ack is all about as it prevents the
> RAS firmware updating the error record until it is acknowledged
> at which point the RAS firmware can report the next one.
> 
> Or... Given the usecase above of a 64KiB host page and 4KiB guest
> can we inject a single error record with multiple CPER entries and
> just handle it all in one go?
> 
> Set the Error record header -> section count to 16 and provide
> 16 Memory Error Sections or equivalent.
> 
> Doesn't help with multiple errors in unrelated memory addresses but
> maybe removes one problem case.
> 
> I've not checked all the information makes it to the right places
> however or that we don't end up with a deadlock when multiple vCPU
> involved.
> 
> If doing the more significant surgery this would involve, I'd
> love to see Mauro's series land first as it cleans up a lot of
> how HEST is handled etc.
> 

Jonathan, thanks for review and comments. It's just an example that a problematic
64k host page can affect 16 4k guest pages. The errors aren't raised at the same
time because the SIGBUS signal is received by QEMU when the corresponding 4k guest
page is accessed. If all those errors are queued up and delivered at once, the
problem is when all those queued errors are delivered?

Besides, the problematic 64k host page affecting 16 4k guest page is an example.
when host/guest has same page size (e.g. 4KB), it's possible that two problematic
pages are detected by SIGBUS signals. It's also possible that one CPER error
is being delivered, but not acknowledged. A followup CPER error is raised to be
delivered. In this case, abort() is triggered either. So the problem isn't specific
64k host page size + 4k guest page size.

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/4] target/arm: Improvement on memory error handling
  2025-02-14 12:59 ` Mauro Carvalho Chehab
@ 2025-02-17  3:58   ` Gavin Shan
  0 siblings, 0 replies; 19+ messages in thread
From: Gavin Shan @ 2025-02-17  3:58 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: qemu-arm, qemu-devel, mst, imammedo, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin

On 2/14/25 10:59 PM, Mauro Carvalho Chehab wrote:
> Em Fri, 14 Feb 2025 14:16:31 +1000
> Gavin Shan <gshan@redhat.com> escreveu:
> 
>> Currently, there is only one CPER buffer (entry), meaning only one
>> memory error can be reported. In extreme case, multiple memory errors
>> can be raised on different vCPUs. For example, a singile memory error
>> on a 64KB page of the host can results in 16 memory errors to 4KB
>> pages of the guest.
> 
> There is already a patchset allowing to have multiple CPER entries
> floating around since last year:
> 
> 	https://lore.kernel.org/qemu-devel/cover.1738345063.git.mchehab+huawei@kernel.org/
> 
> I guess it is almost ready for being merged, needing just some
> nitpick changes to satisfy ACPI maintainers. Such changeset already
> adds a second CPER entry for GED, and allows to easily add more as
> needed.
> 

Thanks for the linker, Mauro. As I explained to Jonathan, the bottleneck
isn't the number of CPER entries (single or multiple). The bottleneck
is actually the acknowledgment mechanism. With the mechanism, a single
CPER buffer, which could contain multiple entries, can be delivered
and acknowledged at once. I don't see your series changes anything in
this regard if I don't miss anything.

>> In extreme case, multiple memory errors
>> can be raised on different vCPUs. For example, a singile memory error
>> on a 64KB page of the host can results in 16 memory errors to 4KB
>> pages of the guest.
> 
>> Unfortunately, the virtual machine is simply aborted
>> by multiple concurrent memory errors, as the following call trace shows.
>> A SEA exception is injected to the guest so that the CPER buffer can
>> be claimed if the error is successfully pushed by acpi_ghes_memory_errors(),
>> Otherwise, abort() is triggered to crash the virtual machine.
>>
>>    kvm_vcpu_thread_fn
>>      kvm_cpu_exec
>>        kvm_arch_on_sigbus_vcpu
>>          kvm_cpu_synchronize_state
>>          acpi_ghes_memory_errors         (a)
>>          kvm_inject_arm_sea | abort
>>
>> It's arguably to crash the virtual machine in this case. The better
>> behaviour would be to retry on pushing the memory errors, to keep the
>> virtual machine alive so that the administrator has chance to chime
>> in, for example to dump the important data with luck. This series
>> adds one more parameter to acpi_ghes_memory_errors() so that it will
>> be tried to push the memory error until it succeeds.
> 
> Having a retry buffer might be interesting for some types of errors,
> like error-injected and corrected errors. Yet, it doesn't sound right
> to buffer uncorrected errors that would affect the virtual machine.
> 

The question is how the uncorrected error can be delivered if the previous
corrected error is being delivered and not acknowledged yet? With the
acknowledgement mechanism, all errors are equal in priority when they're
delivered, correct?

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
  2025-02-14  4:16 ` [PATCH 4/4] target/arm: Retry pushing CPER error if necessary Gavin Shan
@ 2025-02-19 17:55   ` Igor Mammedov
  2025-02-21  5:27     ` Gavin Shan
  0 siblings, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2025-02-19 17:55 UTC (permalink / raw)
  To: Gavin Shan
  Cc: qemu-arm, qemu-devel, mst, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin, Mauro Carvalho Chehab

On Fri, 14 Feb 2025 14:16:35 +1000
Gavin Shan <gshan@redhat.com> wrote:

> The error -1 is returned if the previously reported CPER error
> hasn't been claimed. The virtual machine is terminated due to
> abort(). It's conflicting to the ideal behaviour that the affected
> vCPU retries pushing the CPER error in this case since the vCPU
> can't proceed its execution.
> 
> Move the chunk of code to push CPER error to a separate helper
> report_memory_errors() and retry the request when the return
> value from acpi_ghes_memory_errors() is greater than zero.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  target/arm/kvm.c | 31 +++++++++++++++++++++++++------
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 5c0bf99aec..9f063f6053 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2362,6 +2362,30 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
>      return ret;
>  }
>  
> +static void report_memory_error(CPUState *c, hwaddr paddr)
> +{
> +    int ret;
> +
> +    while (true) {
> +        /* Retry if the previously report error hasn't been claimed */
> +        ret = acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, true);
> +        if (ret <= 0) {
> +            break;
> +        }
> +
> +        bql_unlock();
> +        g_usleep(1000);
even with bql released it's not safe to loop in here.
consider,
  a guest with 2 vcpus
    * vcpu 1 gets SIGBUS due to error
    * vcpu 2 trips over the same error and gets into this loop
    * on guest side vcpu 1 continues to run to handle SEA but
      might need to acquire a lock that vcpu 2 holds

GHESv2 error source we support, can report several errors,
currently QEMU supports only 1 'error status block' which
can hold several error records (CPER) (though storage size is limited)

1:
We can potentially add support for more GHESv2 error sources
with their own Read ACK registers (let's say =max_cpus)
(that is under assumption that no other error will be
triggered while guest VCPUs handle their own SEA (upto clearing Read ACK))

2:
Another way could be for QEMU to allocate more error status _blocks_
for the only one error source it has now and try to find
empty status block to inject new error(s).
 * it can be saturated with high rate of errors (so what do we do in case it happens?)
 * subject to race between clearing/setting Read ACK
    (maybe it can dealt with that on side by keeping internal read_ack counter)

3:
And alternatively, queue incoming errors until read ack is cleared
and then inject pending errors in one go.
(problem with that is that at the moment QEMU doesn't monitor
read ack register memory so it won't notice guest clearing that)


Given spec has provision for multiple error status blocks/error data entries
it seems that #2 is an expected way to deal with the problem.

PS:
I'd prefer Mauro's series being merged 1st (once it's resplit),
for it refactors a bunch of original code and hopefully makes
code easier to follow/extend.

> +        bql_lock();
> +    }
> +
> +    if (ret == 0) {
> +        kvm_inject_arm_sea(c);
> +    } else {
> +        error_report("Error %d to report memory error", ret);
> +        abort();
> +    }
> +}
> +
>  void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>  {
>      ram_addr_t ram_addr;
> @@ -2387,12 +2411,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>               */
>              if (code == BUS_MCEERR_AR) {
>                  kvm_cpu_synchronize_state(c);
> -                if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, false)) {
> -                    kvm_inject_arm_sea(c);
> -                } else {
> -                    error_report("failed to record the error");
> -                    abort();
> -                }
> +                report_memory_error(c, paddr);
>              }
>              return;
>          }



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
  2025-02-19 17:55   ` Igor Mammedov
@ 2025-02-21  5:27     ` Gavin Shan
  2025-02-21 11:04       ` Jonathan Cameron via
  0 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-02-21  5:27 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: qemu-arm, qemu-devel, mst, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin, Mauro Carvalho Chehab

On 2/20/25 3:55 AM, Igor Mammedov wrote:
> On Fri, 14 Feb 2025 14:16:35 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
>> The error -1 is returned if the previously reported CPER error
>> hasn't been claimed. The virtual machine is terminated due to
>> abort(). It's conflicting to the ideal behaviour that the affected
>> vCPU retries pushing the CPER error in this case since the vCPU
>> can't proceed its execution.
>>
>> Move the chunk of code to push CPER error to a separate helper
>> report_memory_errors() and retry the request when the return
>> value from acpi_ghes_memory_errors() is greater than zero.
>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   target/arm/kvm.c | 31 +++++++++++++++++++++++++------
>>   1 file changed, 25 insertions(+), 6 deletions(-)
>>
>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>> index 5c0bf99aec..9f063f6053 100644
>> --- a/target/arm/kvm.c
>> +++ b/target/arm/kvm.c
>> @@ -2362,6 +2362,30 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
>>       return ret;
>>   }
>>   
>> +static void report_memory_error(CPUState *c, hwaddr paddr)
>> +{
>> +    int ret;
>> +
>> +    while (true) {
>> +        /* Retry if the previously report error hasn't been claimed */
>> +        ret = acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, true);
>> +        if (ret <= 0) {
>> +            break;
>> +        }
>> +
>> +        bql_unlock();
>> +        g_usleep(1000);

Igor, thanks for the detailed comments. Sorry for a bit delay of the reply, I
was checking the code to understand it better :)

> even with bql released it's not safe to loop in here.
> consider,
>    a guest with 2 vcpus
>      * vcpu 1 gets SIGBUS due to error
>      * vcpu 2 trips over the same error and gets into this loop
>      * on guest side vcpu 1 continues to run to handle SEA but
>        might need to acquire a lock that vcpu 2 holds
> 

Agreed.

> GHESv2 error source we support, can report several errors,
> currently QEMU supports only 1 'error status block' which
> can hold several error records (CPER) (though storage size is limited)
> 
> 1:
> We can potentially add support for more GHESv2 error sources
> with their own Read ACK registers (let's say =max_cpus)
> (that is under assumption that no other error will be
> triggered while guest VCPUs handle their own SEA (upto clearing Read ACK))
> 
> 2:
> Another way could be for QEMU to allocate more error status _blocks_
> for the only one error source it has now and try to find
> empty status block to inject new error(s).
>   * it can be saturated with high rate of errors (so what do we do in case it happens?)
>   * subject to race between clearing/setting Read ACK
>      (maybe it can dealt with that on side by keeping internal read_ack counter)
> 
> 3:
> And alternatively, queue incoming errors until read ack is cleared
> and then inject pending errors in one go.
> (problem with that is that at the moment QEMU doesn't monitor
> read ack register memory so it won't notice guest clearing that)
> 
> 
> Given spec has provision for multiple error status blocks/error data entries
> it seems that #2 is an expected way to deal with the problem.
> 

I would say #1 is the ideal model because the read_ack_register is the bottleneck
and it should be scaled up to max_cpus. In that way, the bottleneck can be avoided
from the bottom. Another benefit with #1 is the error can be delivered immediately
to the vCPU where the error was raised. This matches with the syntax of SEA to me.

#2 still has the risk to saturate the multiple error status blocks if there are
high rate of errors as you said. Besides, the vCPU where read_ack_register is acknoledged
can be different from the vCPU where the error is raised, violating the syntax of
SEA.

#3's drawback is to violate the syntax of SEA, similar to #2.

However, #2/#3 wouldn't be that complicated to #1. I didn't expect big surgery to
GHES module, but it seems there isn't perfect solution without a big surgery.
I would vote for #1 to resolve the issue from the ground. What do you think, Igor?
I'm also hoping Jonathan and Mauro can provide their preference.

> PS:
> I'd prefer Mauro's series being merged 1st (once it's resplit),
> for it refactors a bunch of original code and hopefully makes
> code easier to follow/extend.
> 

Sure. I won't start the coding until the solution is confirmed. All the followup
work will base on Mauro's series.

>> +        bql_lock();
>> +    }
>> +
>> +    if (ret == 0) {
>> +        kvm_inject_arm_sea(c);
>> +    } else {
>> +        error_report("Error %d to report memory error", ret);
>> +        abort();
>> +    }
>> +}
>> +
>>   void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>   {
>>       ram_addr_t ram_addr;
>> @@ -2387,12 +2411,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>>                */
>>               if (code == BUS_MCEERR_AR) {
>>                   kvm_cpu_synchronize_state(c);
>> -                if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, false)) {
>> -                    kvm_inject_arm_sea(c);
>> -                } else {
>> -                    error_report("failed to record the error");
>> -                    abort();
>> -                }
>> +                report_memory_error(c, paddr);
>>               }
>>               return;
>>           }
> 

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static
  2025-02-14  4:16 ` [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static Gavin Shan
@ 2025-02-21 10:44   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 19+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-02-21 10:44 UTC (permalink / raw)
  To: Gavin Shan, qemu-arm
  Cc: qemu-devel, mst, imammedo, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin

On 14/2/25 05:16, Gavin Shan wrote:
> acpi_ghes_memory_errors() is the only caller, no need to expose
> the function. Besides, the last 'return' in this function isn't
> necessary and remove it.
> 
> No functional changes intended.
> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>   hw/acpi/ghes.c         | 6 ++----
>   include/hw/acpi/ghes.h | 2 --
>   2 files changed, 2 insertions(+), 6 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
  2025-02-21  5:27     ` Gavin Shan
@ 2025-02-21 11:04       ` Jonathan Cameron via
  2025-02-25 11:19         ` Igor Mammedov
  2025-02-26  6:56         ` Gavin Shan
  0 siblings, 2 replies; 19+ messages in thread
From: Jonathan Cameron via @ 2025-02-21 11:04 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Igor Mammedov, qemu-arm, qemu-devel, mst, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin, Mauro Carvalho Chehab

On Fri, 21 Feb 2025 15:27:36 +1000
Gavin Shan <gshan@redhat.com> wrote:

> On 2/20/25 3:55 AM, Igor Mammedov wrote:
> > On Fri, 14 Feb 2025 14:16:35 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >   
> >> The error -1 is returned if the previously reported CPER error
> >> hasn't been claimed. The virtual machine is terminated due to
> >> abort(). It's conflicting to the ideal behaviour that the affected
> >> vCPU retries pushing the CPER error in this case since the vCPU
> >> can't proceed its execution.
> >>
> >> Move the chunk of code to push CPER error to a separate helper
> >> report_memory_errors() and retry the request when the return
> >> value from acpi_ghes_memory_errors() is greater than zero.
> >>
> >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> >> ---
> >>   target/arm/kvm.c | 31 +++++++++++++++++++++++++------
> >>   1 file changed, 25 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> >> index 5c0bf99aec..9f063f6053 100644
> >> --- a/target/arm/kvm.c
> >> +++ b/target/arm/kvm.c
> >> @@ -2362,6 +2362,30 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
> >>       return ret;
> >>   }
> >>   
> >> +static void report_memory_error(CPUState *c, hwaddr paddr)
> >> +{
> >> +    int ret;
> >> +
> >> +    while (true) {
> >> +        /* Retry if the previously report error hasn't been claimed */
> >> +        ret = acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, true);
> >> +        if (ret <= 0) {
> >> +            break;
> >> +        }
> >> +
> >> +        bql_unlock();
> >> +        g_usleep(1000);  
> 
> Igor, thanks for the detailed comments. Sorry for a bit delay of the reply, I
> was checking the code to understand it better :)

This is moderately tricky stuff so I'm not 100% sure of some of the things
I said below, but will be traveling for next few weeks so want to get some
comments out before that!

> 
> > even with bql released it's not safe to loop in here.
> > consider,
> >    a guest with 2 vcpus
> >      * vcpu 1 gets SIGBUS due to error
> >      * vcpu 2 trips over the same error and gets into this loop
> >      * on guest side vcpu 1 continues to run to handle SEA but
> >        might need to acquire a lock that vcpu 2 holds
> >   
> 
> Agreed.
> 
> > GHESv2 error source we support, can report several errors,
> > currently QEMU supports only 1 'error status block' which
> > can hold several error records (CPER) (though storage size is limited)
> > 
> > 1:
> > We can potentially add support for more GHESv2 error sources
> > with their own Read ACK registers (let's say =max_cpus)
> > (that is under assumption that no other error will be
> > triggered while guest VCPUs handle their own SEA (upto clearing Read ACK))

This one seems straight forward but I'd kind of like to know if real systems
do this (I'll try and find out about ours).  I don't think there is
any association available between a cpu and and SEA source, so linux at
least will just go looking for any that are active on each SEA.
Locking looks fine but it won't help with performance

> > 
> > 2:
> > Another way could be for QEMU to allocate more error status _blocks_
> > for the only one error source it has now and try to find
> > empty status block to inject new error(s).

Let me try to get my head around this one...

Each GHESv2 entry points, indirectly, to a single error status block at a time
(only one address to read that from)  Curious quirk is the length for that
error status block is fixed as that's just a value in GHESv2 not an indirection
via a register - however I think you can just make it 'big'.
So what I think you are proposing here is that on read_ack write (which we would
need to monitor for, the value of the error status address register is updated
to point to next one of a queue of error blocks.

That can work.  I'm not sure it actually gets us anything over just queuing in
qemu and writing the same error status block.  Those status blocks can contain
multiple Generic Error Data entries, but unless we have a load of them gathered
up at time of first notifying the guest, I'm not sure that helps us.

One thing that I'm nervous about is that I can't actually find spec language
that says that the OS 'must' reread the error status address register on
each event. That isn't mentioned in the GHESv2 flow description which just says:
"
These are the steps the OS must take once detecting an error from a particular GHESv2 error source:
• OSPM detects error (via interrupt/exception or polling the block status)
• OSPM copies the error status block
• OSPM clears the block status field of the error status block
• OSPM acknowledges the error via Read Ack register. For example:
– OSPM reads the Read Ack register –> X
– OSPM writes –> (( X & ReadAckPreserve) | ReadAckWrite)
"

The linux code is confusing me, but I think it wonderfully changes
the fixmap on every access as it needs to do an ioremap type operation
in NMI conditions.

> >   * it can be saturated with high rate of errors (so what do we do in case it happens?)

Make it big :)  But sure big just makes the condition unlikely rather than solving it.

> >   * subject to race between clearing/setting Read ACK
> >      (maybe it can dealt with that on side by keeping internal read_ack counter)

I don't think there are any races as long as we update the register only on clear which
should I think happen before the next SEA can happen? My understanding, which may
be wrong, is the OS must just take a copy of the error status block and set the
read_ack all in the exception handler.

> > 
> > 3:
> > And alternatively, queue incoming errors until read ack is cleared
> > and then inject pending errors in one go.
> > (problem with that is that at the moment QEMU doesn't monitor
> > read ack register memory so it won't notice guest clearing that)

We'd need to monitor it definitely.  Injecting all we have queued up
in one go here seems like a reasonable optimization over doing them
one at a time.

> > 
> > 
> > Given spec has provision for multiple error status blocks/error data entries
> > it seems that #2 is an expected way to deal with the problem.
> >   
> 
> I would say #1 is the ideal model because the read_ack_register is the bottleneck
> and it should be scaled up to max_cpus. In that way, the bottleneck can be avoided
> from the bottom. Another benefit with #1 is the error can be delivered immediately
> to the vCPU where the error was raised. This matches with the syntax of SEA to me.

I don't think it helps for the bottleneck in linux at least.  A whole bunch of locks
are taken on each SEA because of the novel use of the fixmap.  There is only one
VA ever used to access the error status blocks we just change what PA it points to
under a spin lock. Maybe that can be improved on if we can persuade people that error
handling performance is a thing to care about!

> 
> #2 still has the risk to saturate the multiple error status blocks if there are
> high rate of errors as you said. Besides, the vCPU where read_ack_register is acknoledged
> can be different from the vCPU where the error is raised, violating the syntax of
> SEA.
> 
> #3's drawback is to violate the syntax of SEA, similar to #2.
> 
> However, #2/#3 wouldn't be that complicated to #1. I didn't expect big surgery to
> GHES module, but it seems there isn't perfect solution without a big surgery.
> I would vote for #1 to resolve the issue from the ground. What do you think, Igor?
> I'm also hoping Jonathan and Mauro can provide their preference.

Ideally I'd like whatever we choose to look like what a bare metal machine
does - mostly because we are less likely to hit untested OS paths.

> 
> > PS:
> > I'd prefer Mauro's series being merged 1st (once it's resplit),
> > for it refactors a bunch of original code and hopefully makes
> > code easier to follow/extend.
> >   
> 
> Sure. I won't start the coding until the solution is confirmed. All the followup
> work will base on Mauro's series.
> 
> >> +        bql_lock();
> >> +    }
> >> +
> >> +    if (ret == 0) {
> >> +        kvm_inject_arm_sea(c);
> >> +    } else {
> >> +        error_report("Error %d to report memory error", ret);
> >> +        abort();
> >> +    }
> >> +}
> >> +
> >>   void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >>   {
> >>       ram_addr_t ram_addr;
> >> @@ -2387,12 +2411,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> >>                */
> >>               if (code == BUS_MCEERR_AR) {
> >>                   kvm_cpu_synchronize_state(c);
> >> -                if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, false)) {
> >> -                    kvm_inject_arm_sea(c);
> >> -                } else {
> >> -                    error_report("failed to record the error");
> >> -                    abort();
> >> -                }
> >> +                report_memory_error(c, paddr);
> >>               }
> >>               return;
> >>           }  
> >   
> 
> Thanks,
> Gavin
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
  2025-02-21 11:04       ` Jonathan Cameron via
@ 2025-02-25 11:19         ` Igor Mammedov
  2025-02-26  4:58           ` Gavin Shan
  2025-02-26  6:56         ` Gavin Shan
  1 sibling, 1 reply; 19+ messages in thread
From: Igor Mammedov @ 2025-02-25 11:19 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Gavin Shan, qemu-arm, qemu-devel, mst, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin, Mauro Carvalho Chehab

On Fri, 21 Feb 2025 11:04:35 +0000
Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:

> On Fri, 21 Feb 2025 15:27:36 +1000
> Gavin Shan <gshan@redhat.com> wrote:
> 
> > On 2/20/25 3:55 AM, Igor Mammedov wrote:  
> > > On Fri, 14 Feb 2025 14:16:35 +1000
> > > Gavin Shan <gshan@redhat.com> wrote:
> > >     
> > >> The error -1 is returned if the previously reported CPER error
> > >> hasn't been claimed. The virtual machine is terminated due to
> > >> abort(). It's conflicting to the ideal behaviour that the affected
> > >> vCPU retries pushing the CPER error in this case since the vCPU
> > >> can't proceed its execution.
> > >>
> > >> Move the chunk of code to push CPER error to a separate helper
> > >> report_memory_errors() and retry the request when the return
> > >> value from acpi_ghes_memory_errors() is greater than zero.
> > >>
> > >> Signed-off-by: Gavin Shan <gshan@redhat.com>
> > >> ---
> > >>   target/arm/kvm.c | 31 +++++++++++++++++++++++++------
> > >>   1 file changed, 25 insertions(+), 6 deletions(-)
> > >>
> > >> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > >> index 5c0bf99aec..9f063f6053 100644
> > >> --- a/target/arm/kvm.c
> > >> +++ b/target/arm/kvm.c
> > >> @@ -2362,6 +2362,30 @@ int kvm_arch_get_registers(CPUState *cs, Error **errp)
> > >>       return ret;
> > >>   }
> > >>   
> > >> +static void report_memory_error(CPUState *c, hwaddr paddr)
> > >> +{
> > >> +    int ret;
> > >> +
> > >> +    while (true) {
> > >> +        /* Retry if the previously report error hasn't been claimed */
> > >> +        ret = acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, true);
> > >> +        if (ret <= 0) {
> > >> +            break;
> > >> +        }
> > >> +
> > >> +        bql_unlock();
> > >> +        g_usleep(1000);    
> > 
> > Igor, thanks for the detailed comments. Sorry for a bit delay of the reply, I
> > was checking the code to understand it better :)  
> 
> This is moderately tricky stuff so I'm not 100% sure of some of the things
> I said below, but will be traveling for next few weeks so want to get some
> comments out before that!
> 
> >   
> > > even with bql released it's not safe to loop in here.
> > > consider,
> > >    a guest with 2 vcpus
> > >      * vcpu 1 gets SIGBUS due to error
> > >      * vcpu 2 trips over the same error and gets into this loop
> > >      * on guest side vcpu 1 continues to run to handle SEA but
> > >        might need to acquire a lock that vcpu 2 holds
> > >     
> > 
> > Agreed.
> >   
> > > GHESv2 error source we support, can report several errors,
> > > currently QEMU supports only 1 'error status block' which
> > > can hold several error records (CPER) (though storage size is limited)
> > > 
> > > 1:
> > > We can potentially add support for more GHESv2 error sources
> > > with their own Read ACK registers (let's say =max_cpus)
> > > (that is under assumption that no other error will be
> > > triggered while guest VCPUs handle their own SEA (upto clearing Read ACK))  
> 
> This one seems straight forward but I'd kind of like to know if real systems
> do this (I'll try and find out about ours).  I don't think there is
> any association available between a cpu and and SEA source, so linux at
> least will just go looking for any that are active on each SEA.
> Locking looks fine but it won't help with performance
> 
> > > 
> > > 2:
> > > Another way could be for QEMU to allocate more error status _blocks_
> > > for the only one error source it has now and try to find
> > > empty status block to inject new error(s).  
> 
> Let me try to get my head around this one...
> 
> Each GHESv2 entry points, indirectly, to a single error status block at a time
> (only one address to read that from)  Curious quirk is the length for that
> error status block is fixed as that's just a value in GHESv2 not an indirection
> via a register - however I think you can just make it 'big'.
> So what I think you are proposing here is that on read_ack write (which we would
> need to monitor for, the value of the error status address register is updated
> to point to next one of a queue of error blocks.

All my suggestions here aren't recommendation, but rather thinking out loud
to see if we can find a crorrect way to deal with the issue.

The part I don't fancy doing on QEMU part is read_ack write intercept.

> 
> That can work.  I'm not sure it actually gets us anything over just queuing in
> qemu and writing the same error status block.  Those status blocks can contain

benefit would be that we won't have to maintain state on QEMU side
and then migrate it.

> multiple Generic Error Data entries, but unless we have a load of them gathered
> up at time of first notifying the guest, I'm not sure that helps us.
> 
> One thing that I'm nervous about is that I can't actually find spec language
> that says that the OS 'must' reread the error status address register on
> each event. That isn't mentioned in the GHESv2 flow description which just says:

that bothers me as well.

> "
> These are the steps the OS must take once detecting an error from a particular GHESv2 error source:
> • OSPM detects error (via interrupt/exception or polling the block status)
> • OSPM copies the error status block
> • OSPM clears the block status field of the error status block
> • OSPM acknowledges the error via Read Ack register. For example:
> – OSPM reads the Read Ack register –> X
> – OSPM writes –> (( X & ReadAckPreserve) | ReadAckWrite)
> "
> 
> The linux code is confusing me, but I think it wonderfully changes
> the fixmap on every access as it needs to do an ioremap type operation
> in NMI conditions.
> 
> > >   * it can be saturated with high rate of errors (so what do we do in case it happens?)  
> 
> Make it big :)  But sure big just makes the condition unlikely rather than solving it.
> 
> > >   * subject to race between clearing/setting Read ACK
> > >      (maybe it can dealt with that on side by keeping internal read_ack counter)  
> 
> I don't think there are any races as long as we update the register only on clear which
> should I think happen before the next SEA can happen? My understanding, which may
> be wrong, is the OS must just take a copy of the error status block and set the
> read_ack all in the exception handler.
> 
> > > 
> > > 3:
> > > And alternatively, queue incoming errors until read ack is cleared
> > > and then inject pending errors in one go.
> > > (problem with that is that at the moment QEMU doesn't monitor
> > > read ack register memory so it won't notice guest clearing that)  
> 
> We'd need to monitor it definitely.  Injecting all we have queued up
> in one go here seems like a reasonable optimization over doing them
> one at a time.
> 
> > > 
> > > 
> > > Given spec has provision for multiple error status blocks/error data entries
> > > it seems that #2 is an expected way to deal with the problem.
> > >     
> > 
> > I would say #1 is the ideal model because the read_ack_register is the bottleneck
> > and it should be scaled up to max_cpus. In that way, the bottleneck can be avoided
> > from the bottom. Another benefit with #1 is the error can be delivered immediately
> > to the vCPU where the error was raised. This matches with the syntax of SEA to me.  
> 
> I don't think it helps for the bottleneck in linux at least.  A whole bunch of locks
> are taken on each SEA because of the novel use of the fixmap.  There is only one
> VA ever used to access the error status blocks we just change what PA it points to
> under a spin lock. Maybe that can be improved on if we can persuade people that error
> handling performance is a thing to care about!
> 
> > 
> > #2 still has the risk to saturate the multiple error status blocks if there are
> > high rate of errors as you said. Besides, the vCPU where read_ack_register is acknoledged
> > can be different from the vCPU where the error is raised, violating the syntax of
> > SEA.
> > 
> > #3's drawback is to violate the syntax of SEA, similar to #2.
> > 
> > However, #2/#3 wouldn't be that complicated to #1. I didn't expect big surgery to
> > GHES module, but it seems there isn't perfect solution without a big surgery.
> > I would vote for #1 to resolve the issue from the ground. What do you think, Igor?
> > I'm also hoping Jonathan and Mauro can provide their preference.  
> 
> Ideally I'd like whatever we choose to look like what a bare metal machine
> does - mostly because we are less likely to hit untested OS paths.

Ack for that but,
that would need someone from hw/firmware side since error status block
handling is done by firmware. 

right now we are just making things up based on spec interpretation.

> > > PS:
> > > I'd prefer Mauro's series being merged 1st (once it's resplit),
> > > for it refactors a bunch of original code and hopefully makes
> > > code easier to follow/extend.
> > >     
> > 
> > Sure. I won't start the coding until the solution is confirmed. All the followup
> > work will base on Mauro's series.
> >   
> > >> +        bql_lock();
> > >> +    }
> > >> +
> > >> +    if (ret == 0) {
> > >> +        kvm_inject_arm_sea(c);
> > >> +    } else {
> > >> +        error_report("Error %d to report memory error", ret);
> > >> +        abort();
> > >> +    }
> > >> +}
> > >> +
> > >>   void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> > >>   {
> > >>       ram_addr_t ram_addr;
> > >> @@ -2387,12 +2411,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> > >>                */
> > >>               if (code == BUS_MCEERR_AR) {
> > >>                   kvm_cpu_synchronize_state(c);
> > >> -                if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr, false)) {
> > >> -                    kvm_inject_arm_sea(c);
> > >> -                } else {
> > >> -                    error_report("failed to record the error");
> > >> -                    abort();
> > >> -                }
> > >> +                report_memory_error(c, paddr);
> > >>               }
> > >>               return;
> > >>           }    
> > >     
> > 
> > Thanks,
> > Gavin
> > 
> >   
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
  2025-02-25 11:19         ` Igor Mammedov
@ 2025-02-26  4:58           ` Gavin Shan
  2025-02-28  1:55             ` Jonathan Cameron via
  0 siblings, 1 reply; 19+ messages in thread
From: Gavin Shan @ 2025-02-26  4:58 UTC (permalink / raw)
  To: Igor Mammedov, Jonathan Cameron
  Cc: qemu-arm, qemu-devel, mst, anisinha, gengdongjiu1, peter.maydell,
	pbonzini, shan.gavin, Mauro Carvalho Chehab

On 2/25/25 9:19 PM, Igor Mammedov wrote:
> On Fri, 21 Feb 2025 11:04:35 +0000
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>>
>> Ideally I'd like whatever we choose to look like what a bare metal machine
>> does - mostly because we are less likely to hit untested OS paths.
> 
> Ack for that but,
> that would need someone from hw/firmware side since error status block
> handling is done by firmware.
> 
> right now we are just making things up based on spec interpretation.
> 

It's a good point. I think it's worthwhile to understand how the RAS event
is processed and turned to CPER by firmware.

I didn't figure out how CPER is generated by edk2 after looking into tf-a (trust
firmware ARM) and edk2 for a while. I will consult to EDK2 developers to seek
their helps. However, there is a note in tf-a that briefly explaining how RAS
event is handled.

   From tf-a/plat/arm/board/fvp/aarch64/fvp_lsp_ras_sp.c:
   (git@github.com:ARM-software/arm-trusted-firmware.git)

   /*
    * Note: Typical RAS error handling flow with Firmware First Handling
    *
    * Step 1: Exception resulting from a RAS error in the normal world is routed to
    *         EL3.
    * Step 2: This exception is typically signaled as either a synchronous external
    *         abort or SError or interrupt. TF-A (EL3 firmware) delegates the
    *         control to platform specific handler built on top of the RAS helper
    *         utilities.
    * Step 3: With the help of a Logical Secure Partition, TF-A sends a direct
    *         message to dedicated S-EL0 (or S-EL1) RAS Partition managed by SPMC.
    *         TF-A also populates a shared buffer with a data structure containing
    *         enough information (such as system registers) to identify and triage
    *         the RAS error.
    * Step 4: RAS SP generates the Common Platform Error Record (CPER) and shares
    *         it with normal world firmware and/or OS kernel through a reserved
    *         buffer memory.
    * Step 5: RAS SP responds to the direct message with information necessary for
    *         TF-A to notify the OS kernel.
    * Step 6: Consequently, TF-A dispatches an SDEI event to notify the OS kernel
    *         about the CPER records for further logging.
    */

According to the note, RAS SP (Secure Partition) is the black box where the RAS
event raised by tf-a is turned to CPER. Unfortunately, I didn't find the source
code to understand the details yet.

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
  2025-02-21 11:04       ` Jonathan Cameron via
  2025-02-25 11:19         ` Igor Mammedov
@ 2025-02-26  6:56         ` Gavin Shan
  1 sibling, 0 replies; 19+ messages in thread
From: Gavin Shan @ 2025-02-26  6:56 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Igor Mammedov, qemu-arm, qemu-devel, mst, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin, Mauro Carvalho Chehab

On 2/21/25 9:04 PM, Jonathan Cameron wrote:
> On Fri, 21 Feb 2025 15:27:36 +1000
> Gavin Shan <gshan@redhat.com> wrote:

[...]
   
>>
>> I would say #1 is the ideal model because the read_ack_register is the bottleneck
>> and it should be scaled up to max_cpus. In that way, the bottleneck can be avoided
>> from the bottom. Another benefit with #1 is the error can be delivered immediately
>> to the vCPU where the error was raised. This matches with the syntax of SEA to me.
> 
> I don't think it helps for the bottleneck in linux at least.  A whole bunch of locks
> are taken on each SEA because of the novel use of the fixmap.  There is only one
> VA ever used to access the error status blocks we just change what PA it points to
> under a spin lock. Maybe that can be improved on if we can persuade people that error
> handling performance is a thing to care about!
> 

Right, it doesn't helps for the bottleneck in guest kernel due to @ghes_notify_lock_sea.
With the lock, all existing GHES devices and error statuses are serialized for access. I
was actually talking about the benefit to avoid the bottleneck regarding the read_ack_regsiter,
which is the synchronization mechanism between guest kernel and QEMU. For example, an error
has been raised on vCPU-0, but not acknowledged at (A). Another error raised on vCPU-1
can't be delivered because we have only one GHES device and error status block, which
has been reserved for the error raised on vCPU-0.  With solution #1, the bottleneck can
be avoided with multiple GHES devices and error status blocks.

   vCPU-0                                           vCPU-1
   ======                                           ======
   kvm_cpu_exec                                     kvm_cpu_exec
     kvm_vcpu_ioctl(RUN)                              kvm_vcpu_ioctl(RUN)
     kvm_arch_on_sigbus_vcpu                          kvm_arch_on_sigbus_vcpu
       acpi_ghes_memory_errors                          acpi_ghes_memory_errors   (B)
       kvm_inject_arm_sea
     kvm_vcpu_ioctl(RUN)
       :
     do_mem_abort
       do_sea
         apei_claim_sea
           ghes_notify_sea
             raw_spin_lock(&ghes_notify_lock_sea)
             ghes_in_nmi_spool_from_list
               ghes_in_nmi_queue_one_entry
                 ghes_clear_estatus                 (A)
             raw_spin_unlock(&ghes_notify_lock_sea)

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/4] target/arm: Retry pushing CPER error if necessary
  2025-02-26  4:58           ` Gavin Shan
@ 2025-02-28  1:55             ` Jonathan Cameron via
  0 siblings, 0 replies; 19+ messages in thread
From: Jonathan Cameron via @ 2025-02-28  1:55 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Igor Mammedov, qemu-arm, qemu-devel, mst, anisinha, gengdongjiu1,
	peter.maydell, pbonzini, shan.gavin, Mauro Carvalho Chehab

On Wed, 26 Feb 2025 14:58:46 +1000
Gavin Shan <gshan@redhat.com> wrote:

> On 2/25/25 9:19 PM, Igor Mammedov wrote:
> > On Fri, 21 Feb 2025 11:04:35 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:  
> >>
> >> Ideally I'd like whatever we choose to look like what a bare metal machine
> >> does - mostly because we are less likely to hit untested OS paths.  
> > 
> > Ack for that but,
> > that would need someone from hw/firmware side since error status block
> > handling is done by firmware.
> > 
> > right now we are just making things up based on spec interpretation.
> >   
> 
> It's a good point. I think it's worthwhile to understand how the RAS event
> is processed and turned to CPER by firmware.
> 
> I didn't figure out how CPER is generated by edk2 after looking into tf-a (trust
> firmware ARM) and edk2 for a while. I will consult to EDK2 developers to seek
> their helps. However, there is a note in tf-a that briefly explaining how RAS
> event is handled.
> 
>    From tf-a/plat/arm/board/fvp/aarch64/fvp_lsp_ras_sp.c:
>    (git@github.com:ARM-software/arm-trusted-firmware.git)
> 
>    /*
>     * Note: Typical RAS error handling flow with Firmware First Handling
>     *
>     * Step 1: Exception resulting from a RAS error in the normal world is routed to
>     *         EL3.
>     * Step 2: This exception is typically signaled as either a synchronous external
>     *         abort or SError or interrupt. TF-A (EL3 firmware) delegates the
>     *         control to platform specific handler built on top of the RAS helper
>     *         utilities.
>     * Step 3: With the help of a Logical Secure Partition, TF-A sends a direct
>     *         message to dedicated S-EL0 (or S-EL1) RAS Partition managed by SPMC.
>     *         TF-A also populates a shared buffer with a data structure containing
>     *         enough information (such as system registers) to identify and triage
>     *         the RAS error.
>     * Step 4: RAS SP generates the Common Platform Error Record (CPER) and shares
>     *         it with normal world firmware and/or OS kernel through a reserved
>     *         buffer memory.
>     * Step 5: RAS SP responds to the direct message with information necessary for
>     *         TF-A to notify the OS kernel.
>     * Step 6: Consequently, TF-A dispatches an SDEI event to notify the OS kernel
>     *         about the CPER records for further logging.
>     */
> 
> According to the note, RAS SP (Secure Partition) is the black box where the RAS
> event raised by tf-a is turned to CPER. Unfortunately, I didn't find the source
> code to understand the details yet.

This is very much 'a flow' rather than 'the flow'.  TFA may not even be
involved in many systems, nor SDEI, nor EDK2 beyond passing through some
config.   One option, as I understand it, is to offload the firmware handing
and building of the record to a management processor and stick to SEA
for the signalling.

I'd be rather surprised if you can find anything beyond binary blobs
for those firmware (if that!).  Maybe all we can get from publicish sources
is what the HEST tables look like.  I've asked our firmware folk if they
can share more on how we do it but might take a while.

I have confirmed we only have one GHESv2 SEA entry on at least the one random
board I looked at (and various interrupt ones).  That board may not be
representative but seems pushing everything through one structure is an option.

Jonathan

> 
> Thanks,
> Gavin
> 
> 



^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2025-02-28  1:57 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-14  4:16 [PATCH 0/4] target/arm: Improvement on memory error handling Gavin Shan
2025-02-14  4:16 ` [PATCH 1/4] acpi/ghes: Make ghes_record_cper_errors() static Gavin Shan
2025-02-21 10:44   ` Philippe Mathieu-Daudé
2025-02-14  4:16 ` [PATCH 2/4] acpi/ghes: Use error_report() in ghes_record_cper_errors() Gavin Shan
2025-02-14  4:16 ` [PATCH 3/4] acpi/ghes: Allow retry to write CPER errors Gavin Shan
2025-02-14  4:16 ` [PATCH 4/4] target/arm: Retry pushing CPER error if necessary Gavin Shan
2025-02-19 17:55   ` Igor Mammedov
2025-02-21  5:27     ` Gavin Shan
2025-02-21 11:04       ` Jonathan Cameron via
2025-02-25 11:19         ` Igor Mammedov
2025-02-26  4:58           ` Gavin Shan
2025-02-28  1:55             ` Jonathan Cameron via
2025-02-26  6:56         ` Gavin Shan
2025-02-14  9:53 ` [PATCH 0/4] target/arm: Improvement on memory error handling Jonathan Cameron via
2025-02-17  0:29   ` Gavin Shan
2025-02-14 10:12 ` Jonathan Cameron via
2025-02-17  3:49   ` Gavin Shan
2025-02-14 12:59 ` Mauro Carvalho Chehab
2025-02-17  3:58   ` Gavin Shan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).