* [PATCH 0/5] acpi/ghes: Error object handling improvement
@ 2025-11-27 0:44 Gavin Shan
2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan
` (6 more replies)
0 siblings, 7 replies; 27+ messages in thread
From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
This series is curved from that for memory error handling improvement
[1] based on the received comments, to improve the error object handling
in various aspects.
[1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
Gavin Shan (5):
acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
acpi/ghes: Bail early on error from get_ghes_source_offsets()
acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
hw/acpi/ghes-stub.c | 6 +++---
hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------
include/hw/acpi/ghes.h | 6 +++---
target/arm/kvm.c | 10 +++-------
4 files changed, 28 insertions(+), 39 deletions(-)
--
2.51.1
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan
@ 2025-11-27 0:44 ` Gavin Shan
2025-11-27 8:06 ` Markus Armbruster
2025-12-01 9:32 ` Igor Mammedov
2025-11-27 0:44 ` [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary Gavin Shan
` (5 subsequent siblings)
6 siblings, 2 replies; 27+ messages in thread
From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
Use g_auto_ptr() to automate data block cleanup in the function so
that it won't be a burden to us.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
hw/acpi/ghes.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 06555905ce..6366c74248 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -565,9 +565,7 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
0xED, 0x7C, 0x83, 0xB1);
Error *errp = NULL;
int data_length;
- GArray *block;
-
- block = g_array_new(false, true /* clear */, 1);
+ g_autoptr(GArray) block = g_array_new(false, true /* clear */, 1);
data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
/*
@@ -585,8 +583,6 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
/* Report the error */
ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
- g_array_free(block, true);
-
if (errp) {
error_report_err(errp);
return -1;
--
2.51.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan
2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-27 0:44 ` Gavin Shan
2025-12-01 9:37 ` Igor Mammedov
2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan
` (4 subsequent siblings)
6 siblings, 1 reply; 27+ messages in thread
From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
The function hw/acpi/ghes-stub.c::acpi_ghes_memory_errors() shouldn't
be called by any one. Take g_assert_not_reached() as a clearer indication.
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
hw/acpi/ghes-stub.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index 40f660c246..b54f1b093c 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -14,7 +14,7 @@
int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
uint64_t physical_address)
{
- return -1;
+ g_assert_not_reached();
}
AcpiGhesState *acpi_ghes_get_state(void)
--
2.51.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan
2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan
2025-11-27 0:44 ` [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary Gavin Shan
@ 2025-11-27 0:44 ` Gavin Shan
2025-11-28 14:07 ` Igor Mammedov
2025-12-01 10:06 ` Igor Mammedov
2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
` (3 subsequent siblings)
6 siblings, 2 replies; 27+ messages in thread
From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
A core dump is no sense as there isn't programming bugs related to
errors from acpi_ghes_memory_errors().
Exit instead of abort when the function returns errors, and the
excessive error message is also dropped.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
---
target/arm/kvm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 0d57081e69..acda0b3fb4 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2460,8 +2460,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
paddr)) {
kvm_inject_arm_sea(c);
} else {
- error_report("failed to record the error");
- abort();
+ exit(1);
}
}
return;
--
2.51.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets()
2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan
` (2 preceding siblings ...)
2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-27 0:44 ` Gavin Shan
2025-11-27 8:10 ` Markus Armbruster
2025-12-01 10:10 ` Igor Mammedov
2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
` (2 subsequent siblings)
6 siblings, 2 replies; 27+ messages in thread
From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
For one particular error (Error), we can't call error_setg() for twice.
Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in
error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets()
can return a error initialized by error_setg(). Without bailing on
this error, it can call into the second error_setg() due to the
unexpected value from the read acknowledgement register.
Bail early in ghes_record_cper_errors() when error is received from
get_ghes_source_offsets() to avoid the exception.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
hw/acpi/ghes.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index 6366c74248..c35883dfa9 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -443,7 +443,7 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
*read_ack_register_addr = ghes_addr + sizeof(uint64_t);
}
-static void get_ghes_source_offsets(uint16_t source_id,
+static bool get_ghes_source_offsets(uint16_t source_id,
uint64_t hest_addr,
uint64_t *cper_addr,
uint64_t *read_ack_start_addr,
@@ -474,7 +474,7 @@ static void get_ghes_source_offsets(uint16_t source_id,
/* For now, we only know the size of GHESv2 table */
if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
error_setg(errp, "HEST: type %d not supported.", type);
- return;
+ return false;
}
/* Compare CPER source ID at the GHESv2 structure */
@@ -488,7 +488,7 @@ static void get_ghes_source_offsets(uint16_t source_id,
}
if (i == num_sources) {
error_setg(errp, "HEST: Source %d not found.", source_id);
- return;
+ return false;
}
/* Navigate through table address pointers */
@@ -508,6 +508,8 @@ static void get_ghes_source_offsets(uint16_t source_id,
cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
sizeof(*read_ack_start_addr));
*read_ack_start_addr = le64_to_cpu(*read_ack_start_addr);
+
+ return true;
}
NotifierList acpi_generic_error_notifiers =
@@ -526,9 +528,10 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
if (!ags->use_hest_addr) {
get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
&cper_addr, &read_ack_register_addr);
- } else {
- get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
- &cper_addr, &read_ack_register_addr, errp);
+ } else if (!get_ghes_source_offsets(source_id,
+ le64_to_cpu(ags->hest_addr_le),
+ &cper_addr, &read_ack_register_addr, errp)) {
+ return;
}
cpu_physical_memory_read(read_ack_register_addr,
--
2.51.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan
` (3 preceding siblings ...)
2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
@ 2025-11-27 0:44 ` Gavin Shan
2025-11-27 8:14 ` Markus Armbruster
2025-12-01 10:12 ` Igor Mammedov
2025-11-28 14:09 ` [PATCH 0/5] acpi/ghes: Error object handling improvement Igor Mammedov
2025-12-01 12:17 ` Mauro Carvalho Chehab
6 siblings, 2 replies; 27+ messages in thread
From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
Use error_fatal in acpi_ghes_memory_errors() so that the caller needn't
explicitly call exit(). The return value of acpi_ghes_memory_errors()
and ghes_record_cper_errors() is changed to 'bool' indicating an error
has been raised, to be compatible with what's documented in error.h.
Suggested-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
---
hw/acpi/ghes-stub.c | 4 ++--
hw/acpi/ghes.c | 26 ++++++++++----------------
include/hw/acpi/ghes.h | 6 +++---
target/arm/kvm.c | 9 +++------
4 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
index b54f1b093c..5f9313cce9 100644
--- a/hw/acpi/ghes-stub.c
+++ b/hw/acpi/ghes-stub.c
@@ -11,8 +11,8 @@
#include "qemu/osdep.h"
#include "hw/acpi/ghes.h"
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
- uint64_t physical_address)
+bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+ uint64_t physical_address, Error **errp)
{
g_assert_not_reached();
}
diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
index c35883dfa9..3033e93d65 100644
--- a/hw/acpi/ghes.c
+++ b/hw/acpi/ghes.c
@@ -515,14 +515,14 @@ static bool get_ghes_source_offsets(uint16_t source_id,
NotifierList acpi_generic_error_notifiers =
NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers);
-void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
+bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
uint16_t source_id, Error **errp)
{
uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
error_setg(errp, "GHES CPER record is too big: %zd", len);
- return;
+ return false;
}
if (!ags->use_hest_addr) {
@@ -531,7 +531,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
} else if (!get_ghes_source_offsets(source_id,
le64_to_cpu(ags->hest_addr_le),
&cper_addr, &read_ack_register_addr, errp)) {
- return;
+ return false;
}
cpu_physical_memory_read(read_ack_register_addr,
@@ -542,7 +542,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
error_setg(errp,
"OSPM does not acknowledge previous error,"
" so can not record CPER for current error anymore");
- return;
+ return false;
}
read_ack_register = cpu_to_le64(0);
@@ -557,16 +557,17 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
cpu_physical_memory_write(cper_addr, cper, len);
notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
+
+ return true;
}
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
- uint64_t physical_address)
+bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+ uint64_t physical_address, Error **errp)
{
/* Memory Error Section Type */
const uint8_t guid[] =
UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
0xED, 0x7C, 0x83, 0xB1);
- Error *errp = NULL;
int data_length;
g_autoptr(GArray) block = g_array_new(false, true /* clear */, 1);
@@ -583,15 +584,8 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
/* Build the memory section CPER for above new generic error data entry */
acpi_ghes_build_append_mem_cper(block, physical_address);
- /* Report the error */
- ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
-
- if (errp) {
- error_report_err(errp);
- return -1;
- }
-
- return 0;
+ return ghes_record_cper_errors(ags, block->data, block->len,
+ source_id, errp);
}
AcpiGhesState *acpi_ghes_get_state(void)
diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
index df2ecbf6e4..5b29aae4dd 100644
--- a/include/hw/acpi/ghes.h
+++ b/include/hw/acpi/ghes.h
@@ -98,9 +98,9 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
const char *oem_id, const char *oem_table_id);
void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
GArray *hardware_errors);
-int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
- uint64_t error_physical_addr);
-void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
+bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
+ uint64_t error_physical_addr, Error **errp);
+bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
uint16_t source_id, Error **errp);
/**
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index acda0b3fb4..76aa09810f 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2456,12 +2456,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
*/
if (code == BUS_MCEERR_AR) {
kvm_cpu_synchronize_state(c);
- if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
- paddr)) {
- kvm_inject_arm_sea(c);
- } else {
- exit(1);
- }
+ acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
+ paddr, &error_fatal);
+ kvm_inject_arm_sea(c);
}
return;
}
--
2.51.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-27 8:06 ` Markus Armbruster
2025-12-01 9:32 ` Igor Mammedov
1 sibling, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2025-11-27 8:06 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
Gavin Shan <gshan@redhat.com> writes:
> Use g_auto_ptr() to automate data block cleanup in the function so
> that it won't be a burden to us.
>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets()
2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
@ 2025-11-27 8:10 ` Markus Armbruster
2025-12-01 10:10 ` Igor Mammedov
1 sibling, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2025-11-27 8:10 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
Gavin Shan <gshan@redhat.com> writes:
> For one particular error (Error), we can't call error_setg() for twice.
> Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in
> error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets()
> can return a error initialized by error_setg(). Without bailing on
> this error, it can call into the second error_setg() due to the
> unexpected value from the read acknowledgement register.
>
> Bail early in ghes_record_cper_errors() when error is received from
> get_ghes_source_offsets() to avoid the exception.
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-27 8:14 ` Markus Armbruster
2025-11-29 1:23 ` Gavin Shan
2025-12-01 10:12 ` Igor Mammedov
1 sibling, 1 reply; 27+ messages in thread
From: Markus Armbruster @ 2025-11-27 8:14 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
Gavin Shan <gshan@redhat.com> writes:
> Use error_fatal in acpi_ghes_memory_errors() so that the caller needn't
> explicitly call exit(). The return value of acpi_ghes_memory_errors()
> and ghes_record_cper_errors() is changed to 'bool' indicating an error
> has been raised, to be compatible with what's documented in error.h.
>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
I figure I would've split this for easier review, say 1. return bool,
2. convert acpi_ghes_memory_errors to Error. Since you already got
review, it's probably not worthwhile now. Next time :)
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-28 14:07 ` Igor Mammedov
2025-11-28 14:54 ` Markus Armbruster
2025-12-01 10:06 ` Igor Mammedov
1 sibling, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2025-11-28 14:07 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Thu, 27 Nov 2025 10:44:33 +1000
Gavin Shan <gshan@redhat.com> wrote:
> A core dump is no sense as there isn't programming bugs related to
> errors from acpi_ghes_memory_errors().
>
> Exit instead of abort when the function returns errors, and the
> excessive error message is also dropped.
>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
> target/arm/kvm.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 0d57081e69..acda0b3fb4 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2460,8 +2460,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> paddr)) {
> kvm_inject_arm_sea(c);
> } else {
> - error_report("failed to record the error");
so with message gone, user will just see qemu silently exit?
> - abort();
> + exit(1);
> }
> }
> return;
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan
` (4 preceding siblings ...)
2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
@ 2025-11-28 14:09 ` Igor Mammedov
2025-11-29 1:21 ` Gavin Shan
2025-12-01 12:17 ` Mauro Carvalho Chehab
6 siblings, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2025-11-28 14:09 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Thu, 27 Nov 2025 10:44:30 +1000
Gavin Shan <gshan@redhat.com> wrote:
> This series is curved from that for memory error handling improvement
^^^ confusing
based on above I'm not sure if it depends on [1] and shoul be applied on top
or it can be merged on its own
> [1] based on the received comments, to improve the error object handling
> in various aspects.
>
> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>
> Gavin Shan (5):
> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
> acpi/ghes: Bail early on error from get_ghes_source_offsets()
> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
>
> hw/acpi/ghes-stub.c | 6 +++---
> hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------
> include/hw/acpi/ghes.h | 6 +++---
> target/arm/kvm.c | 10 +++-------
> 4 files changed, 28 insertions(+), 39 deletions(-)
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
2025-11-28 14:07 ` Igor Mammedov
@ 2025-11-28 14:54 ` Markus Armbruster
0 siblings, 0 replies; 27+ messages in thread
From: Markus Armbruster @ 2025-11-28 14:54 UTC (permalink / raw)
To: Igor Mammedov
Cc: Gavin Shan, qemu-arm, qemu-devel, mchehab+huawei,
jonathan.cameron, mst, anisinha, gengdongjiu1, peter.maydell,
pbonzini, shan.gavin
Igor Mammedov <imammedo@redhat.com> writes:
> On Thu, 27 Nov 2025 10:44:33 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>
>> A core dump is no sense as there isn't programming bugs related to
>> errors from acpi_ghes_memory_errors().
>>
>> Exit instead of abort when the function returns errors, and the
>> excessive error message is also dropped.
>>
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Suggested-by: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>> target/arm/kvm.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>> index 0d57081e69..acda0b3fb4 100644
>> --- a/target/arm/kvm.c
>> +++ b/target/arm/kvm.c
>> @@ -2460,8 +2460,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
>> paddr)) {
>> kvm_inject_arm_sea(c);
>> } else {
>> - error_report("failed to record the error");
>
> so with message gone, user will just see qemu silently exit?
No: acpi_ghes_memory_errors() reports with error_report().
But the stub doesn't. If it can be called, it needs a suitable
error_report(). It it can't be called, it should abort().
>> - abort();
>> + exit(1);
>> }
>> }
>> return;
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-11-28 14:09 ` [PATCH 0/5] acpi/ghes: Error object handling improvement Igor Mammedov
@ 2025-11-29 1:21 ` Gavin Shan
2025-12-01 9:31 ` Igor Mammedov
0 siblings, 1 reply; 27+ messages in thread
From: Gavin Shan @ 2025-11-29 1:21 UTC (permalink / raw)
To: Igor Mammedov
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
Hi Igor,
On 11/29/25 12:09 AM, Igor Mammedov wrote:
> On Thu, 27 Nov 2025 10:44:30 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>
>> This series is curved from that for memory error handling improvement
> ^^^ confusing
> based on above I'm not sure if it depends on [1] and shoul be applied on top
> or it can be merged on its own
>
The current series is a standalone series and expected to be merged by its own.
For (v4) series of memory error improvement [1], Jonathan wants to extend
the handlers in the guest kernel so that the granularity in CPER record
will be used to isolate the corresponding memory address range. With this,
the patches in the (v4) series to send 16x continuous errors become useless.
However, those patches in (v4) series to improve the Error (object) hanlding
are still useful. So I pulled those patches for the Error (object) hanlding
improvement from (v4) series to form this series.
>> [1] based on the received comments, to improve the error object handling
>> in various aspects.
>>
>> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>>
Thanks,
Gavin
>> Gavin Shan (5):
>> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
>> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
>> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
>> acpi/ghes: Bail early on error from get_ghes_source_offsets()
>> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
>>
>> hw/acpi/ghes-stub.c | 6 +++---
>> hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------
>> include/hw/acpi/ghes.h | 6 +++---
>> target/arm/kvm.c | 10 +++-------
>> 4 files changed, 28 insertions(+), 39 deletions(-)
>>
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
2025-11-27 8:14 ` Markus Armbruster
@ 2025-11-29 1:23 ` Gavin Shan
0 siblings, 0 replies; 27+ messages in thread
From: Gavin Shan @ 2025-11-29 1:23 UTC (permalink / raw)
To: Markus Armbruster
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
Hi Markus,
On 11/27/25 6:14 PM, Markus Armbruster wrote:
> Gavin Shan <gshan@redhat.com> writes:
>
>> Use error_fatal in acpi_ghes_memory_errors() so that the caller needn't
>> explicitly call exit(). The return value of acpi_ghes_memory_errors()
>> and ghes_record_cper_errors() is changed to 'bool' indicating an error
>> has been raised, to be compatible with what's documented in error.h.
>>
>> Suggested-by: Igor Mammedov <imammedo@redhat.com>
>> Suggested-by: Markus Armbruster <armbru@redhat.com>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>
> Reviewed-by: Markus Armbruster <armbru@redhat.com>
>
> I figure I would've split this for easier review, say 1. return bool,
> 2. convert acpi_ghes_memory_errors to Error. Since you already got
> review, it's probably not worthwhile now. Next time :)
>
Nice point, thanks for your review. Lets try to do that next time :)
Thanks,
Gavin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-11-29 1:21 ` Gavin Shan
@ 2025-12-01 9:31 ` Igor Mammedov
0 siblings, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2025-12-01 9:31 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Sat, 29 Nov 2025 11:21:55 +1000
Gavin Shan <gshan@redhat.com> wrote:
> Hi Igor,
>
> On 11/29/25 12:09 AM, Igor Mammedov wrote:
> > On Thu, 27 Nov 2025 10:44:30 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >
> >> This series is curved from that for memory error handling improvement
> > ^^^ confusing
> > based on above I'm not sure if it depends on [1] and shoul be applied on top
> > or it can be merged on its own
> >
>
> The current series is a standalone series and expected to be merged by its own.
>
> For (v4) series of memory error improvement [1], Jonathan wants to extend
> the handlers in the guest kernel so that the granularity in CPER record
> will be used to isolate the corresponding memory address range. With this,
> the patches in the (v4) series to send 16x continuous errors become useless.
> However, those patches in (v4) series to improve the Error (object) hanlding
> are still useful. So I pulled those patches for the Error (object) hanlding
> improvement from (v4) series to form this series.
ok, then I'll review this series and skip v4 for now
>
> >> [1] based on the received comments, to improve the error object handling
> >> in various aspects.
> >>
> >> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
> >>
>
> Thanks,
> Gavin
>
> >> Gavin Shan (5):
> >> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
> >> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
> >> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
> >> acpi/ghes: Bail early on error from get_ghes_source_offsets()
> >> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
> >>
> >> hw/acpi/ghes-stub.c | 6 +++---
> >> hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------
> >> include/hw/acpi/ghes.h | 6 +++---
> >> target/arm/kvm.c | 10 +++-------
> >> 4 files changed, 28 insertions(+), 39 deletions(-)
> >>
> >
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan
2025-11-27 8:06 ` Markus Armbruster
@ 2025-12-01 9:32 ` Igor Mammedov
1 sibling, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2025-12-01 9:32 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Thu, 27 Nov 2025 10:44:31 +1000
Gavin Shan <gshan@redhat.com> wrote:
> Use g_auto_ptr() to automate data block cleanup in the function so
> that it won't be a burden to us.
>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes.c | 6 +-----
> 1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 06555905ce..6366c74248 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -565,9 +565,7 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> 0xED, 0x7C, 0x83, 0xB1);
> Error *errp = NULL;
> int data_length;
> - GArray *block;
> -
> - block = g_array_new(false, true /* clear */, 1);
> + g_autoptr(GArray) block = g_array_new(false, true /* clear */, 1);
>
> data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH;
> /*
> @@ -585,8 +583,6 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> /* Report the error */
> ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
>
> - g_array_free(block, true);
> -
> if (errp) {
> error_report_err(errp);
> return -1;
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
2025-11-27 0:44 ` [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary Gavin Shan
@ 2025-12-01 9:37 ` Igor Mammedov
0 siblings, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2025-12-01 9:37 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Thu, 27 Nov 2025 10:44:32 +1000
Gavin Shan <gshan@redhat.com> wrote:
> The function hw/acpi/ghes-stub.c::acpi_ghes_memory_errors() shouldn't
> be called by any one. Take g_assert_not_reached() as a clearer indication.
>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes-stub.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
> index 40f660c246..b54f1b093c 100644
> --- a/hw/acpi/ghes-stub.c
> +++ b/hw/acpi/ghes-stub.c
> @@ -14,7 +14,7 @@
> int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> uint64_t physical_address)
> {
> - return -1;
> + g_assert_not_reached();
> }
>
> AcpiGhesState *acpi_ghes_get_state(void)
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan
2025-11-28 14:07 ` Igor Mammedov
@ 2025-12-01 10:06 ` Igor Mammedov
1 sibling, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2025-12-01 10:06 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Thu, 27 Nov 2025 10:44:33 +1000
Gavin Shan <gshan@redhat.com> wrote:
> A core dump is no sense as there isn't programming bugs related to
> errors from acpi_ghes_memory_errors().
>
> Exit instead of abort when the function returns errors, and the
> excessive error message is also dropped.
>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> target/arm/kvm.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index 0d57081e69..acda0b3fb4 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2460,8 +2460,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> paddr)) {
> kvm_inject_arm_sea(c);
> } else {
> - error_report("failed to record the error");
> - abort();
> + exit(1);
> }
> }
> return;
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets()
2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-27 8:10 ` Markus Armbruster
@ 2025-12-01 10:10 ` Igor Mammedov
2025-12-01 14:15 ` Gavin Shan
1 sibling, 1 reply; 27+ messages in thread
From: Igor Mammedov @ 2025-12-01 10:10 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Thu, 27 Nov 2025 10:44:34 +1000
Gavin Shan <gshan@redhat.com> wrote:
> For one particular error (Error), we can't call error_setg() for twice.
^^^^^^^^^^^^^^^^^^^^^^^^^^^
I can't really parse that
maybe rephrase it to make some sense?
> Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in
> error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets()
> can return a error initialized by error_setg(). Without bailing on
> this error, it can call into the second error_setg() due to the
> unexpected value from the read acknowledgement register.
>
> Bail early in ghes_record_cper_errors() when error is received from
> get_ghes_source_offsets() to avoid the exception.
>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
patch itself LGTM
and with commit message fixed
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 6366c74248..c35883dfa9 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -443,7 +443,7 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
> *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
> }
>
> -static void get_ghes_source_offsets(uint16_t source_id,
> +static bool get_ghes_source_offsets(uint16_t source_id,
> uint64_t hest_addr,
> uint64_t *cper_addr,
> uint64_t *read_ack_start_addr,
> @@ -474,7 +474,7 @@ static void get_ghes_source_offsets(uint16_t source_id,
> /* For now, we only know the size of GHESv2 table */
> if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
> error_setg(errp, "HEST: type %d not supported.", type);
> - return;
> + return false;
> }
>
> /* Compare CPER source ID at the GHESv2 structure */
> @@ -488,7 +488,7 @@ static void get_ghes_source_offsets(uint16_t source_id,
> }
> if (i == num_sources) {
> error_setg(errp, "HEST: Source %d not found.", source_id);
> - return;
> + return false;
> }
>
> /* Navigate through table address pointers */
> @@ -508,6 +508,8 @@ static void get_ghes_source_offsets(uint16_t source_id,
> cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
> sizeof(*read_ack_start_addr));
> *read_ack_start_addr = le64_to_cpu(*read_ack_start_addr);
> +
> + return true;
> }
>
> NotifierList acpi_generic_error_notifiers =
> @@ -526,9 +528,10 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> if (!ags->use_hest_addr) {
> get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
> &cper_addr, &read_ack_register_addr);
> - } else {
> - get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
> - &cper_addr, &read_ack_register_addr, errp);
> + } else if (!get_ghes_source_offsets(source_id,
> + le64_to_cpu(ags->hest_addr_le),
> + &cper_addr, &read_ack_register_addr, errp)) {
> + return;
> }
>
> cpu_physical_memory_read(read_ack_register_addr,
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
2025-11-27 8:14 ` Markus Armbruster
@ 2025-12-01 10:12 ` Igor Mammedov
1 sibling, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2025-12-01 10:12 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Thu, 27 Nov 2025 10:44:35 +1000
Gavin Shan <gshan@redhat.com> wrote:
> Use error_fatal in acpi_ghes_memory_errors() so that the caller needn't
> explicitly call exit(). The return value of acpi_ghes_memory_errors()
> and ghes_record_cper_errors() is changed to 'bool' indicating an error
> has been raised, to be compatible with what's documented in error.h.
>
> Suggested-by: Igor Mammedov <imammedo@redhat.com>
> Suggested-by: Markus Armbruster <armbru@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
> ---
> hw/acpi/ghes-stub.c | 4 ++--
> hw/acpi/ghes.c | 26 ++++++++++----------------
> include/hw/acpi/ghes.h | 6 +++---
> target/arm/kvm.c | 9 +++------
> 4 files changed, 18 insertions(+), 27 deletions(-)
>
> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
> index b54f1b093c..5f9313cce9 100644
> --- a/hw/acpi/ghes-stub.c
> +++ b/hw/acpi/ghes-stub.c
> @@ -11,8 +11,8 @@
> #include "qemu/osdep.h"
> #include "hw/acpi/ghes.h"
>
> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> - uint64_t physical_address)
> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> + uint64_t physical_address, Error **errp)
> {
> g_assert_not_reached();
> }
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index c35883dfa9..3033e93d65 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -515,14 +515,14 @@ static bool get_ghes_source_offsets(uint16_t source_id,
> NotifierList acpi_generic_error_notifiers =
> NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers);
>
> -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> uint16_t source_id, Error **errp)
> {
> uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register;
>
> if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) {
> error_setg(errp, "GHES CPER record is too big: %zd", len);
> - return;
> + return false;
> }
>
> if (!ags->use_hest_addr) {
> @@ -531,7 +531,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> } else if (!get_ghes_source_offsets(source_id,
> le64_to_cpu(ags->hest_addr_le),
> &cper_addr, &read_ack_register_addr, errp)) {
> - return;
> + return false;
> }
>
> cpu_physical_memory_read(read_ack_register_addr,
> @@ -542,7 +542,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> error_setg(errp,
> "OSPM does not acknowledge previous error,"
> " so can not record CPER for current error anymore");
> - return;
> + return false;
> }
>
> read_ack_register = cpu_to_le64(0);
> @@ -557,16 +557,17 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> cpu_physical_memory_write(cper_addr, cper, len);
>
> notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
> +
> + return true;
> }
>
> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> - uint64_t physical_address)
> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> + uint64_t physical_address, Error **errp)
> {
> /* Memory Error Section Type */
> const uint8_t guid[] =
> UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> 0xED, 0x7C, 0x83, 0xB1);
> - Error *errp = NULL;
> int data_length;
> g_autoptr(GArray) block = g_array_new(false, true /* clear */, 1);
>
> @@ -583,15 +584,8 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> /* Build the memory section CPER for above new generic error data entry */
> acpi_ghes_build_append_mem_cper(block, physical_address);
>
> - /* Report the error */
> - ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp);
> -
> - if (errp) {
> - error_report_err(errp);
> - return -1;
> - }
> -
> - return 0;
> + return ghes_record_cper_errors(ags, block->data, block->len,
> + source_id, errp);
> }
>
> AcpiGhesState *acpi_ghes_get_state(void)
> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
> index df2ecbf6e4..5b29aae4dd 100644
> --- a/include/hw/acpi/ghes.h
> +++ b/include/hw/acpi/ghes.h
> @@ -98,9 +98,9 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data,
> const char *oem_id, const char *oem_table_id);
> void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
> GArray *hardware_errors);
> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> - uint64_t error_physical_addr);
> -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
> + uint64_t error_physical_addr, Error **errp);
> +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
> uint16_t source_id, Error **errp);
>
> /**
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index acda0b3fb4..76aa09810f 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -2456,12 +2456,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
> */
> if (code == BUS_MCEERR_AR) {
> kvm_cpu_synchronize_state(c);
> - if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
> - paddr)) {
> - kvm_inject_arm_sea(c);
> - } else {
> - exit(1);
> - }
> + acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
> + paddr, &error_fatal);
> + kvm_inject_arm_sea(c);
> }
> return;
> }
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan
` (5 preceding siblings ...)
2025-11-28 14:09 ` [PATCH 0/5] acpi/ghes: Error object handling improvement Igor Mammedov
@ 2025-12-01 12:17 ` Mauro Carvalho Chehab
2025-12-01 14:13 ` Gavin Shan
6 siblings, 1 reply; 27+ messages in thread
From: Mauro Carvalho Chehab @ 2025-12-01 12:17 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo,
anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Thu, 27 Nov 2025 10:44:30 +1000
Gavin Shan <gshan@redhat.com> wrote:
> This series is curved from that for memory error handling improvement
> [1] based on the received comments, to improve the error object handling
> in various aspects.
>
> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>
> Gavin Shan (5):
> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
> acpi/ghes: Bail early on error from get_ghes_source_offsets()
> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
Patch series look ok on my eyes.
Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-
Btw, what setup are you using to test memory errors? It would be
nice to have it documented somewhere, maybe at
docs/specs/acpi_hest_ghes.rst.
Thanks,
Mauro
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-12-01 12:17 ` Mauro Carvalho Chehab
@ 2025-12-01 14:13 ` Gavin Shan
2025-12-01 14:31 ` Mauro Carvalho Chehab
0 siblings, 1 reply; 27+ messages in thread
From: Gavin Shan @ 2025-12-01 14:13 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo,
anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
[-- Attachment #1: Type: text/plain, Size: 3852 bytes --]
Hi Mauro,
On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> On Thu, 27 Nov 2025 10:44:30 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>
>> This series is curved from that for memory error handling improvement
>> [1] based on the received comments, to improve the error object handling
>> in various aspects.
>>
>> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>>
>> Gavin Shan (5):
>> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
>> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
>> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
>> acpi/ghes: Bail early on error from get_ghes_source_offsets()
>> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
>
> Patch series look ok on my eyes.
>
> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
Thanks.
> -
>
> Btw, what setup are you using to test memory errors? It would be
> nice to have it documented somewhere, maybe at
> docs/specs/acpi_hest_ghes.rst.
>
I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
as it's for specifications. I'm sharing how this is tested here to make
the thread complete.
- Both host and guest has 4KB page size
- Start the guest by the following command lines
/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
-cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
-m 4096M,slots=16,maxmem=128G \
-object memory-backend-ram,id=mem0,size=4096M \
-numa node,nodeid=0,cpus=0-7,memdev=mem0 \
-L /home/gavin/sandbox/qemu.main/build/pc-bios \
-monitor none -serial mon:stdio -nographic \
-gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
-bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
-boot c \
-device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
-device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
-device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
: \
-device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
-drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
-device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
-netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
- Trigger 'victim -d' in the guest
guest$ ./victim -d
physical address of (0xffff8d9b7000) = 0x1251d6000
Hit any key to trigger error:
- Inject error to the GPA. "test.c" is attached
host$ ./test 0x1251d6000
- Press enter on the guest so that 'victim' continues its execution
[ 435.467481] EDAC MC0: 1 UE unknown on unknown memory ( page:0x1251d6 offset:0x0 grain:1 - APEI location: )
[ 435.467542] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 435.467543] {1}[Hardware Error]: event severity: recoverable
[ 435.467544] {1}[Hardware Error]: Error 0, type: recoverable
[ 435.467545] {1}[Hardware Error]: section_type: memory error
[ 435.467546] {1}[Hardware Error]: physical_address: 0x00000001251d6000
[ 435.467547] {1}[Hardware Error]: error_type: 0, unknown
[ 435.468380] Memory failure: 0x1251d6: recovery action for dirty LRU page: Recovered
Bus error (core dumped)
Thanks,
Gavin
> Thanks,
> Mauro
>
[-- Attachment #2: test.c --]
[-- Type: text/x-csrc, Size: 5821 bytes --]
// SPDX-License-Identifier: GPL-2.0+
/*
* This test program runs on the host, to receive GPA outputed by 'victimd'
* from the guest. The GPA is translated to HPA, and recoverable error
* is inject to HPA automatically.
*
* NOTE: We have the assumption that the guest has only one NUMA node and
* the memory capacity is 4GB. The test program won't work if the assumption
* is broken.
*
* Author: Gavin Shan <gshan@redhat.com>
*/
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <fcntl.h>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/wait.h>
#define TEST_GUEST_MEM_SIZE 0x100000000 /* 4GB */
#define TEST_GUEST_MEM_START 0x040000000 /* 1GB */
#define TEST_INJECT_ERROR_TYPE 0x10
struct test_struct {
int pid;
unsigned long guest_mem_size;
unsigned long gpa;
unsigned long hva;
unsigned long hpa;
};
static void usage(void)
{
fprintf(stdout, "\n");
fprintf(stdout, "./test <gpa>\n");
fprintf(stdout, "gpa The GPA (Guest Physical Address) where the error is injected\n");
fprintf(stdout, "\n");
}
static void init_test_struct(struct test_struct *test)
{
test->pid = -1;
test->guest_mem_size = TEST_GUEST_MEM_SIZE;
test->gpa = -1UL;
test->hpa = -1UL;
}
static int fetch_gpa(struct test_struct *test, int argc, char **argv)
{
if (argc != 2) {
usage();
return -EINVAL;
}
test->gpa = strtoul(argv[1], NULL, 16);
if (test->gpa < TEST_GUEST_MEM_START ||
test->gpa > (TEST_GUEST_MEM_START + test->guest_mem_size)) {
fprintf(stderr, "%s: GPA 0x%lx out of range [1GB, 1GB+0x%lx]\n",
__func__, test->gpa, test->guest_mem_size);
return -EINVAL;
}
return 0;
}
static int find_qemu_pid(struct test_struct *test)
{
DIR *dir;
FILE *fp;
struct dirent *entry;
char path[256], data[256];
size_t sz;
int ret = -ENODEV;
dir = opendir("/proc");
if (!dir) {
fprintf(stderr, "%s: unable to open </proc>\n", __func__);
return -EIO;
}
while ((entry = readdir(dir)) != NULL) {
if (entry->d_type != DT_DIR || entry->d_name[0] == '.')
continue;
memset(path, 0, sizeof(path));
snprintf(path, sizeof(path), "/proc/%s/comm", entry->d_name);
fp = fopen(path, "r");
if (!fp)
continue;
memset(data, 0, sizeof(data));
sz = fread(data, 1, sizeof(data), fp);
fclose(fp);
if (sz <= 0)
continue;
if (strstr(data, "qemu")) {
ret = 0;
test->pid = atoi(entry->d_name);
break;
}
}
if (ret != 0)
fprintf(stderr, "%s: Unable to find QEMU PID\n", __func__);
closedir(dir);
return ret;
}
static int fetch_hva(struct test_struct *test)
{
FILE *fp;
char filename[64], *data = NULL, *next, *next1;
unsigned long start, end;
size_t sz, len;
int ret = -EIO;
memset(filename, 0, sizeof(filename));
snprintf(filename, sizeof(filename), "/proc/%d/smaps", test->pid);
fp = fopen(filename, "r");
if (!fp) {
fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename);
return ret;
}
while ((sz = getline(&data, &len, fp)) != -1) {
if (!strstr(data, "rw-p"))
continue;
next = strchr(data, '-');
if (!next)
continue;
*next++ = '\0';
next1 = strchr(next, ' ');
if (!next1)
continue;
*next1 = '\0';
start = strtoul(data, NULL, 16);
end = strtoul(next, NULL, 16);
if (end - start == test->guest_mem_size) {
ret = 0;
test->hva = start + (test->gpa - TEST_GUEST_MEM_START);
break;
}
}
if (data)
free(data);
fclose(fp);
return ret;
}
static int fetch_hpa(struct test_struct *test)
{
int fd;
unsigned long pinfo, pgsize = getpagesize();
off_t offset = (test->hva / pgsize) * sizeof(pinfo);
char filename[128];
ssize_t sz;
memset(filename, 0, sizeof(filename));
snprintf(filename, sizeof(filename), "/proc/%d/pagemap", test->pid);
fd = open(filename, O_RDONLY);
if (fd < 0) {
fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename);
return -EIO;
}
sz = pread(fd, &pinfo, sizeof(pinfo), offset);
close(fd);
if (sz != sizeof(pinfo)) {
fprintf(stderr, "%s: Unable to read from <%s>\n", __func__, filename);
return -EIO;
}
if (!(pinfo & (1UL << 63))) {
fprintf(stderr, "%s: Page not present\n", __func__);
return -EINVAL;
}
test->hpa = ((pinfo & 0x007fffffffffffffUL) * pgsize) + (test->hva & (pgsize - 1));
return 0;
}
static int write_file(const char *filename, unsigned long val)
{
int fd;
char data[128];
size_t sz;
int ret = 0;
memset(data, 0, sizeof(data));
sz = snprintf(data, sizeof(data), "0x%lx", val);
fd = open(filename, O_WRONLY);
if (fd < 0) {
fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename);
return -EIO;
}
if (write(fd, data, sz) != sz) {
ret = -EIO;
fprintf(stderr, "%s: Unable to write <%s>\n", __func__, filename);
}
close(fd);
return ret;
}
static int inject_error(struct test_struct *test)
{
fprintf(stdout, "pid: %d\n", test->pid);
fprintf(stdout, "gpa: 0x%lx\n", test->gpa);
fprintf(stdout, "hva: 0x%lx\n", test->hva);
fprintf(stdout, "hpa: 0x%lx\n", test->hpa);
system("modprobe einj > /dev/null");
if (write_file("/sys/kernel/debug/apei/einj/param1", test->hpa) ||
write_file("/sys/kernel/debug/apei/einj/param2", 0xfffffffffffff000) ||
write_file("/sys/kernel/debug/apei/einj/flags", 0x0) ||
write_file("/sys/kernel/debug/apei/einj/error_type", TEST_INJECT_ERROR_TYPE) ||
write_file("/sys/kernel/debug/apei/einj/notrigger", 1) ||
write_file("/sys/kernel/debug/apei/einj/error_inject", 1))
return -EIO;
return 0;
}
int main(int argc, char **argv)
{
struct test_struct test;
int ret;
init_test_struct(&test);
if (fetch_gpa(&test, argc, argv) ||
find_qemu_pid(&test) ||
fetch_hva(&test) ||
fetch_hpa(&test) ||
inject_error(&test))
return -EIO;
return 0;
}
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets()
2025-12-01 10:10 ` Igor Mammedov
@ 2025-12-01 14:15 ` Gavin Shan
0 siblings, 0 replies; 27+ messages in thread
From: Gavin Shan @ 2025-12-01 14:15 UTC (permalink / raw)
To: Igor Mammedov
Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru,
mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
Hi Igor,
On 12/1/25 8:10 PM, Igor Mammedov wrote:
> On Thu, 27 Nov 2025 10:44:34 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>
>> For one particular error (Error), we can't call error_setg() for twice.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> I can't really parse that
> maybe rephrase it to make some sense?
>
I will drop this sentence in v2.
>> Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in
>> error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets()
>> can return a error initialized by error_setg(). Without bailing on
>> this error, it can call into the second error_setg() due to the
>> unexpected value from the read acknowledgement register.
>>
>> Bail early in ghes_record_cper_errors() when error is received from
>> get_ghes_source_offsets() to avoid the exception.
>>
With above sentence dropped, the commit log improved to something like
below in v2.
In ghes_record_cper_errors(), get_ghes_source_offsets() can return
a error initialized by error_setg(). Without bailing on this error,
it can call into the second error_setg() due to the unexpected value
returned from the read acknowledgement register. The second error_setg()
can trigger assert(*errp == NULL) in its callee error_setv(), which
isn't expected.
Bail early in ghes_record_cper_errors() when error is received from
get_ghes_source_offsets() to avoid the unexpected behavior.
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
>
> patch itself LGTM
> and with commit message fixed
> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
>
>> ---
>> hw/acpi/ghes.c | 15 +++++++++------
>> 1 file changed, 9 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
>> index 6366c74248..c35883dfa9 100644
>> --- a/hw/acpi/ghes.c
>> +++ b/hw/acpi/ghes.c
>> @@ -443,7 +443,7 @@ static void get_hw_error_offsets(uint64_t ghes_addr,
>> *read_ack_register_addr = ghes_addr + sizeof(uint64_t);
>> }
>>
>> -static void get_ghes_source_offsets(uint16_t source_id,
>> +static bool get_ghes_source_offsets(uint16_t source_id,
>> uint64_t hest_addr,
>> uint64_t *cper_addr,
>> uint64_t *read_ack_start_addr,
>> @@ -474,7 +474,7 @@ static void get_ghes_source_offsets(uint16_t source_id,
>> /* For now, we only know the size of GHESv2 table */
>> if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) {
>> error_setg(errp, "HEST: type %d not supported.", type);
>> - return;
>> + return false;
>> }
>>
>> /* Compare CPER source ID at the GHESv2 structure */
>> @@ -488,7 +488,7 @@ static void get_ghes_source_offsets(uint16_t source_id,
>> }
>> if (i == num_sources) {
>> error_setg(errp, "HEST: Source %d not found.", source_id);
>> - return;
>> + return false;
>> }
>>
>> /* Navigate through table address pointers */
>> @@ -508,6 +508,8 @@ static void get_ghes_source_offsets(uint16_t source_id,
>> cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr,
>> sizeof(*read_ack_start_addr));
>> *read_ack_start_addr = le64_to_cpu(*read_ack_start_addr);
>> +
>> + return true;
>> }
>>
>> NotifierList acpi_generic_error_notifiers =
>> @@ -526,9 +528,10 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len,
>> if (!ags->use_hest_addr) {
>> get_hw_error_offsets(le64_to_cpu(ags->hw_error_le),
>> &cper_addr, &read_ack_register_addr);
>> - } else {
>> - get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le),
>> - &cper_addr, &read_ack_register_addr, errp);
>> + } else if (!get_ghes_source_offsets(source_id,
>> + le64_to_cpu(ags->hest_addr_le),
>> + &cper_addr, &read_ack_register_addr, errp)) {
>> + return;
>> }
>>
>> cpu_physical_memory_read(read_ack_register_addr,
>
Thanks,
Gavin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-12-01 14:13 ` Gavin Shan
@ 2025-12-01 14:31 ` Mauro Carvalho Chehab
2025-12-01 14:37 ` Gavin Shan
0 siblings, 1 reply; 27+ messages in thread
From: Mauro Carvalho Chehab @ 2025-12-01 14:31 UTC (permalink / raw)
To: Gavin Shan
Cc: qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo,
anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
On Tue, 2 Dec 2025 00:13:06 +1000
Gavin Shan <gshan@redhat.com> wrote:
> Hi Mauro,
>
> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> > On Thu, 27 Nov 2025 10:44:30 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >
> >> This series is curved from that for memory error handling improvement
> >> [1] based on the received comments, to improve the error object handling
> >> in various aspects.
> >>
> >> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
> >>
> >> Gavin Shan (5):
> >> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
> >> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
> >> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
> >> acpi/ghes: Bail early on error from get_ghes_source_offsets()
> >> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
> >
> > Patch series look ok on my eyes.
> >
> > Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> >
>
> Thanks.
>
> > -
> >
> > Btw, what setup are you using to test memory errors? It would be
> > nice to have it documented somewhere, maybe at
> > docs/specs/acpi_hest_ghes.rst.
> >
>
> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
> as it's for specifications.
Perhaps not, but it would be nice to have it documented somewhere,
either there or at QEMU wiki.
> I'm sharing how this is tested here to make the thread complete.
Thanks!
>
> - Both host and guest has 4KB page size
>
> - Start the guest by the following command lines
>
> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
> -m 4096M,slots=16,maxmem=128G \
> -object memory-backend-ram,id=mem0,size=4096M \
> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
> -L /home/gavin/sandbox/qemu.main/build/pc-bios \
> -monitor none -serial mon:stdio -nographic \
> -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
> -boot c \
> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
> -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
> -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
> : \
> -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
> -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
> -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
> -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
>
> - Trigger 'victim -d' in the guest
Hmm... from where I can get victim?
Regards,
Mauro
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-12-01 14:31 ` Mauro Carvalho Chehab
@ 2025-12-01 14:37 ` Gavin Shan
2025-12-02 12:10 ` Igor Mammedov
2025-12-02 13:20 ` Peter Maydell
0 siblings, 2 replies; 27+ messages in thread
From: Gavin Shan @ 2025-12-01 14:37 UTC (permalink / raw)
To: Mauro Carvalho Chehab
Cc: qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo,
anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin
Hi Mauro,
On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote:
> On Tue, 2 Dec 2025 00:13:06 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
>>> On Thu, 27 Nov 2025 10:44:30 +1000
>>> Gavin Shan <gshan@redhat.com> wrote:
[...]
>>>
>>> Btw, what setup are you using to test memory errors? It would be
>>> nice to have it documented somewhere, maybe at
>>> docs/specs/acpi_hest_ghes.rst.
>>>
>>
>> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
>> as it's for specifications.
>
> Perhaps not, but it would be nice to have it documented somewhere,
> either there or at QEMU wiki.
>
QEMU wiki may be the best place for it. I never updated to QEMU wiki and
any guiding steps on how to do that?
>> I'm sharing how this is tested here to make the thread complete.
>
> Thanks!
>
>>
>> - Both host and guest has 4KB page size
>>
>> - Start the guest by the following command lines
>>
>> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
>> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
>> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
>> -m 4096M,slots=16,maxmem=128G \
>> -object memory-backend-ram,id=mem0,size=4096M \
>> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
>> -L /home/gavin/sandbox/qemu.main/build/pc-bios \
>> -monitor none -serial mon:stdio -nographic \
>> -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
>> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
>> -boot c \
>> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
>> -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
>> -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
>> : \
>> -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
>> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
>> -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
>> -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
>> -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
>>
>> - Trigger 'victim -d' in the guest
>
> Hmm... from where I can get victim?
>
https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
> Regards,
> Mauro
>
Thanks,
Gavin
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-12-01 14:37 ` Gavin Shan
@ 2025-12-02 12:10 ` Igor Mammedov
2025-12-02 13:20 ` Peter Maydell
1 sibling, 0 replies; 27+ messages in thread
From: Igor Mammedov @ 2025-12-02 12:10 UTC (permalink / raw)
To: Gavin Shan
Cc: Mauro Carvalho Chehab, qemu-arm, qemu-devel, jonathan.cameron,
armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
On Tue, 2 Dec 2025 00:37:53 +1000
Gavin Shan <gshan@redhat.com> wrote:
> Hi Mauro,
>
> On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote:
> > On Tue, 2 Dec 2025 00:13:06 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> >>> On Thu, 27 Nov 2025 10:44:30 +1000
> >>> Gavin Shan <gshan@redhat.com> wrote:
>
> [...]
>
> >>>
> >>> Btw, what setup are you using to test memory errors? It would be
> >>> nice to have it documented somewhere, maybe at
> >>> docs/specs/acpi_hest_ghes.rst.
> >>>
> >>
> >> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
> >> as it's for specifications.
> >
> > Perhaps not, but it would be nice to have it documented somewhere,
> > either there or at QEMU wiki.
> >
>
> QEMU wiki may be the best place for it. I never updated to QEMU wiki and
> any guiding steps on how to do that?
do you have an account already?
>
> >> I'm sharing how this is tested here to make the thread complete.
> >
> > Thanks!
> >
> >>
> >> - Both host and guest has 4KB page size
> >>
> >> - Start the guest by the following command lines
> >>
> >> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> >> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
> >> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
> >> -m 4096M,slots=16,maxmem=128G \
> >> -object memory-backend-ram,id=mem0,size=4096M \
> >> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \
> >> -L /home/gavin/sandbox/qemu.main/build/pc-bios \
> >> -monitor none -serial mon:stdio -nographic \
> >> -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
> >> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
> >> -boot c \
> >> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
> >> -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
> >> -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
> >> : \
> >> -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
> >> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
> >> -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
> >> -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
> >> -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
> >>
> >> - Trigger 'victim -d' in the guest
> >
> > Hmm... from where I can get victim?
> >
>
> https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git
>
> > Regards,
> > Mauro
> >
>
> Thanks,
> Gavin
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
2025-12-01 14:37 ` Gavin Shan
2025-12-02 12:10 ` Igor Mammedov
@ 2025-12-02 13:20 ` Peter Maydell
1 sibling, 0 replies; 27+ messages in thread
From: Peter Maydell @ 2025-12-02 13:20 UTC (permalink / raw)
To: Gavin Shan
Cc: Mauro Carvalho Chehab, qemu-arm, qemu-devel, jonathan.cameron,
armbru, mst, imammedo, anisinha, gengdongjiu1, pbonzini,
shan.gavin
On Mon, 1 Dec 2025 at 14:38, Gavin Shan <gshan@redhat.com> wrote:
>
> Hi Mauro,
>
> On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote:
> > On Tue, 2 Dec 2025 00:13:06 +1000
> > Gavin Shan <gshan@redhat.com> wrote:
> >> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> >>> On Thu, 27 Nov 2025 10:44:30 +1000
> >>> Gavin Shan <gshan@redhat.com> wrote:
>
> [...]
>
> >>>
> >>> Btw, what setup are you using to test memory errors? It would be
> >>> nice to have it documented somewhere, maybe at
> >>> docs/specs/acpi_hest_ghes.rst.
> >>>
> >>
> >> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
> >> as it's for specifications.
> >
> > Perhaps not, but it would be nice to have it documented somewhere,
> > either there or at QEMU wiki.
> >
>
> QEMU wiki may be the best place for it. I never updated to QEMU wiki and
> any guiding steps on how to do that?
I think in general we should prefer to document things in docs/
if we think users would want to know them. If it's just a
test setup then perhaps docs/devel, or if feasible actually
make it a test in tests/. The wiki is largely unused except
for the changelog and planning docs.
(In an ideal world we'd check for parts of the wiki that still
have useful-to-users up to date information, and fold them into
our manuals.)
thanks
-- PMM
^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2025-12-02 13:21 UTC | newest]
Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan
2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan
2025-11-27 8:06 ` Markus Armbruster
2025-12-01 9:32 ` Igor Mammedov
2025-11-27 0:44 ` [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary Gavin Shan
2025-12-01 9:37 ` Igor Mammedov
2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan
2025-11-28 14:07 ` Igor Mammedov
2025-11-28 14:54 ` Markus Armbruster
2025-12-01 10:06 ` Igor Mammedov
2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan
2025-11-27 8:10 ` Markus Armbruster
2025-12-01 10:10 ` Igor Mammedov
2025-12-01 14:15 ` Gavin Shan
2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan
2025-11-27 8:14 ` Markus Armbruster
2025-11-29 1:23 ` Gavin Shan
2025-12-01 10:12 ` Igor Mammedov
2025-11-28 14:09 ` [PATCH 0/5] acpi/ghes: Error object handling improvement Igor Mammedov
2025-11-29 1:21 ` Gavin Shan
2025-12-01 9:31 ` Igor Mammedov
2025-12-01 12:17 ` Mauro Carvalho Chehab
2025-12-01 14:13 ` Gavin Shan
2025-12-01 14:31 ` Mauro Carvalho Chehab
2025-12-01 14:37 ` Gavin Shan
2025-12-02 12:10 ` Igor Mammedov
2025-12-02 13:20 ` Peter Maydell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).