* [PATCH 0/5] acpi/ghes: Error object handling improvement
@ 2025-11-27 0:44 Gavin Shan
2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan
` (6 more replies)
0 siblings, 7 replies; 27+ messages in thread
From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw)
To: qemu-arm
Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst,
imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini,
shan.gavin
This series is curved from that for memory error handling improvement
[1] based on the received comments, to improve the error object handling
in various aspects.
[1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
Gavin Shan (5):
acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
acpi/ghes: Bail early on error from get_ghes_source_offsets()
acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
hw/acpi/ghes-stub.c | 6 +++---
hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------
include/hw/acpi/ghes.h | 6 +++---
target/arm/kvm.c | 10 +++-------
4 files changed, 28 insertions(+), 39 deletions(-)
--
2.51.1
^ permalink raw reply [flat|nested] 27+ messages in thread* [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() 2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan @ 2025-11-27 0:44 ` Gavin Shan 2025-11-27 8:06 ` Markus Armbruster 2025-12-01 9:32 ` Igor Mammedov 2025-11-27 0:44 ` [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary Gavin Shan ` (5 subsequent siblings) 6 siblings, 2 replies; 27+ messages in thread From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw) To: qemu-arm Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Use g_auto_ptr() to automate data block cleanup in the function so that it won't be a burden to us. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> --- hw/acpi/ghes.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 06555905ce..6366c74248 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -565,9 +565,7 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, 0xED, 0x7C, 0x83, 0xB1); Error *errp = NULL; int data_length; - GArray *block; - - block = g_array_new(false, true /* clear */, 1); + g_autoptr(GArray) block = g_array_new(false, true /* clear */, 1); data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH; /* @@ -585,8 +583,6 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, /* Report the error */ ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp); - g_array_free(block, true); - if (errp) { error_report_err(errp); return -1; -- 2.51.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() 2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan @ 2025-11-27 8:06 ` Markus Armbruster 2025-12-01 9:32 ` Igor Mammedov 1 sibling, 0 replies; 27+ messages in thread From: Markus Armbruster @ 2025-11-27 8:06 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Gavin Shan <gshan@redhat.com> writes: > Use g_auto_ptr() to automate data block cleanup in the function so > that it won't be a burden to us. > > Suggested-by: Markus Armbruster <armbru@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() 2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan 2025-11-27 8:06 ` Markus Armbruster @ 2025-12-01 9:32 ` Igor Mammedov 1 sibling, 0 replies; 27+ messages in thread From: Igor Mammedov @ 2025-12-01 9:32 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Thu, 27 Nov 2025 10:44:31 +1000 Gavin Shan <gshan@redhat.com> wrote: > Use g_auto_ptr() to automate data block cleanup in the function so > that it won't be a burden to us. > > Suggested-by: Markus Armbruster <armbru@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> > --- > hw/acpi/ghes.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > index 06555905ce..6366c74248 100644 > --- a/hw/acpi/ghes.c > +++ b/hw/acpi/ghes.c > @@ -565,9 +565,7 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > 0xED, 0x7C, 0x83, 0xB1); > Error *errp = NULL; > int data_length; > - GArray *block; > - > - block = g_array_new(false, true /* clear */, 1); > + g_autoptr(GArray) block = g_array_new(false, true /* clear */, 1); > > data_length = ACPI_GHES_DATA_LENGTH + ACPI_GHES_MEM_CPER_LENGTH; > /* > @@ -585,8 +583,6 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > /* Report the error */ > ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp); > > - g_array_free(block, true); > - > if (errp) { > error_report_err(errp); > return -1; ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary 2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan 2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan @ 2025-11-27 0:44 ` Gavin Shan 2025-12-01 9:37 ` Igor Mammedov 2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan ` (4 subsequent siblings) 6 siblings, 1 reply; 27+ messages in thread From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw) To: qemu-arm Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin The function hw/acpi/ghes-stub.c::acpi_ghes_memory_errors() shouldn't be called by any one. Take g_assert_not_reached() as a clearer indication. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> --- hw/acpi/ghes-stub.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c index 40f660c246..b54f1b093c 100644 --- a/hw/acpi/ghes-stub.c +++ b/hw/acpi/ghes-stub.c @@ -14,7 +14,7 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, uint64_t physical_address) { - return -1; + g_assert_not_reached(); } AcpiGhesState *acpi_ghes_get_state(void) -- 2.51.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary 2025-11-27 0:44 ` [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary Gavin Shan @ 2025-12-01 9:37 ` Igor Mammedov 0 siblings, 0 replies; 27+ messages in thread From: Igor Mammedov @ 2025-12-01 9:37 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Thu, 27 Nov 2025 10:44:32 +1000 Gavin Shan <gshan@redhat.com> wrote: > The function hw/acpi/ghes-stub.c::acpi_ghes_memory_errors() shouldn't > be called by any one. Take g_assert_not_reached() as a clearer indication. > > Suggested-by: Markus Armbruster <armbru@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> > --- > hw/acpi/ghes-stub.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c > index 40f660c246..b54f1b093c 100644 > --- a/hw/acpi/ghes-stub.c > +++ b/hw/acpi/ghes-stub.c > @@ -14,7 +14,7 @@ > int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > uint64_t physical_address) > { > - return -1; > + g_assert_not_reached(); > } > > AcpiGhesState *acpi_ghes_get_state(void) ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() 2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan 2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan 2025-11-27 0:44 ` [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary Gavin Shan @ 2025-11-27 0:44 ` Gavin Shan 2025-11-28 14:07 ` Igor Mammedov 2025-12-01 10:06 ` Igor Mammedov 2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan ` (3 subsequent siblings) 6 siblings, 2 replies; 27+ messages in thread From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw) To: qemu-arm Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin A core dump is no sense as there isn't programming bugs related to errors from acpi_ghes_memory_errors(). Exit instead of abort when the function returns errors, and the excessive error message is also dropped. Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> --- target/arm/kvm.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/target/arm/kvm.c b/target/arm/kvm.c index 0d57081e69..acda0b3fb4 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -2460,8 +2460,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) paddr)) { kvm_inject_arm_sea(c); } else { - error_report("failed to record the error"); - abort(); + exit(1); } } return; -- 2.51.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() 2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan @ 2025-11-28 14:07 ` Igor Mammedov 2025-11-28 14:54 ` Markus Armbruster 2025-12-01 10:06 ` Igor Mammedov 1 sibling, 1 reply; 27+ messages in thread From: Igor Mammedov @ 2025-11-28 14:07 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Thu, 27 Nov 2025 10:44:33 +1000 Gavin Shan <gshan@redhat.com> wrote: > A core dump is no sense as there isn't programming bugs related to > errors from acpi_ghes_memory_errors(). > > Exit instead of abort when the function returns errors, and the > excessive error message is also dropped. > > Suggested-by: Igor Mammedov <imammedo@redhat.com> > Suggested-by: Markus Armbruster <armbru@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> > --- > target/arm/kvm.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/target/arm/kvm.c b/target/arm/kvm.c > index 0d57081e69..acda0b3fb4 100644 > --- a/target/arm/kvm.c > +++ b/target/arm/kvm.c > @@ -2460,8 +2460,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) > paddr)) { > kvm_inject_arm_sea(c); > } else { > - error_report("failed to record the error"); so with message gone, user will just see qemu silently exit? > - abort(); > + exit(1); > } > } > return; ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() 2025-11-28 14:07 ` Igor Mammedov @ 2025-11-28 14:54 ` Markus Armbruster 0 siblings, 0 replies; 27+ messages in thread From: Markus Armbruster @ 2025-11-28 14:54 UTC (permalink / raw) To: Igor Mammedov Cc: Gavin Shan, qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Igor Mammedov <imammedo@redhat.com> writes: > On Thu, 27 Nov 2025 10:44:33 +1000 > Gavin Shan <gshan@redhat.com> wrote: > >> A core dump is no sense as there isn't programming bugs related to >> errors from acpi_ghes_memory_errors(). >> >> Exit instead of abort when the function returns errors, and the >> excessive error message is also dropped. >> >> Suggested-by: Igor Mammedov <imammedo@redhat.com> >> Suggested-by: Markus Armbruster <armbru@redhat.com> >> Signed-off-by: Gavin Shan <gshan@redhat.com> >> --- >> target/arm/kvm.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/target/arm/kvm.c b/target/arm/kvm.c >> index 0d57081e69..acda0b3fb4 100644 >> --- a/target/arm/kvm.c >> +++ b/target/arm/kvm.c >> @@ -2460,8 +2460,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) >> paddr)) { >> kvm_inject_arm_sea(c); >> } else { >> - error_report("failed to record the error"); > > so with message gone, user will just see qemu silently exit? No: acpi_ghes_memory_errors() reports with error_report(). But the stub doesn't. If it can be called, it needs a suitable error_report(). It it can't be called, it should abort(). >> - abort(); >> + exit(1); >> } >> } >> return; ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() 2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan 2025-11-28 14:07 ` Igor Mammedov @ 2025-12-01 10:06 ` Igor Mammedov 1 sibling, 0 replies; 27+ messages in thread From: Igor Mammedov @ 2025-12-01 10:06 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Thu, 27 Nov 2025 10:44:33 +1000 Gavin Shan <gshan@redhat.com> wrote: > A core dump is no sense as there isn't programming bugs related to > errors from acpi_ghes_memory_errors(). > > Exit instead of abort when the function returns errors, and the > excessive error message is also dropped. > > Suggested-by: Igor Mammedov <imammedo@redhat.com> > Suggested-by: Markus Armbruster <armbru@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> > --- > target/arm/kvm.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/target/arm/kvm.c b/target/arm/kvm.c > index 0d57081e69..acda0b3fb4 100644 > --- a/target/arm/kvm.c > +++ b/target/arm/kvm.c > @@ -2460,8 +2460,7 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) > paddr)) { > kvm_inject_arm_sea(c); > } else { > - error_report("failed to record the error"); > - abort(); > + exit(1); > } > } > return; ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() 2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan ` (2 preceding siblings ...) 2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan @ 2025-11-27 0:44 ` Gavin Shan 2025-11-27 8:10 ` Markus Armbruster 2025-12-01 10:10 ` Igor Mammedov 2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan ` (2 subsequent siblings) 6 siblings, 2 replies; 27+ messages in thread From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw) To: qemu-arm Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin For one particular error (Error), we can't call error_setg() for twice. Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets() can return a error initialized by error_setg(). Without bailing on this error, it can call into the second error_setg() due to the unexpected value from the read acknowledgement register. Bail early in ghes_record_cper_errors() when error is received from get_ghes_source_offsets() to avoid the exception. Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> --- hw/acpi/ghes.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index 6366c74248..c35883dfa9 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -443,7 +443,7 @@ static void get_hw_error_offsets(uint64_t ghes_addr, *read_ack_register_addr = ghes_addr + sizeof(uint64_t); } -static void get_ghes_source_offsets(uint16_t source_id, +static bool get_ghes_source_offsets(uint16_t source_id, uint64_t hest_addr, uint64_t *cper_addr, uint64_t *read_ack_start_addr, @@ -474,7 +474,7 @@ static void get_ghes_source_offsets(uint16_t source_id, /* For now, we only know the size of GHESv2 table */ if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) { error_setg(errp, "HEST: type %d not supported.", type); - return; + return false; } /* Compare CPER source ID at the GHESv2 structure */ @@ -488,7 +488,7 @@ static void get_ghes_source_offsets(uint16_t source_id, } if (i == num_sources) { error_setg(errp, "HEST: Source %d not found.", source_id); - return; + return false; } /* Navigate through table address pointers */ @@ -508,6 +508,8 @@ static void get_ghes_source_offsets(uint16_t source_id, cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr, sizeof(*read_ack_start_addr)); *read_ack_start_addr = le64_to_cpu(*read_ack_start_addr); + + return true; } NotifierList acpi_generic_error_notifiers = @@ -526,9 +528,10 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, if (!ags->use_hest_addr) { get_hw_error_offsets(le64_to_cpu(ags->hw_error_le), &cper_addr, &read_ack_register_addr); - } else { - get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le), - &cper_addr, &read_ack_register_addr, errp); + } else if (!get_ghes_source_offsets(source_id, + le64_to_cpu(ags->hest_addr_le), + &cper_addr, &read_ack_register_addr, errp)) { + return; } cpu_physical_memory_read(read_ack_register_addr, -- 2.51.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() 2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan @ 2025-11-27 8:10 ` Markus Armbruster 2025-12-01 10:10 ` Igor Mammedov 1 sibling, 0 replies; 27+ messages in thread From: Markus Armbruster @ 2025-11-27 8:10 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Gavin Shan <gshan@redhat.com> writes: > For one particular error (Error), we can't call error_setg() for twice. > Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in > error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets() > can return a error initialized by error_setg(). Without bailing on > this error, it can call into the second error_setg() due to the > unexpected value from the read acknowledgement register. > > Bail early in ghes_record_cper_errors() when error is received from > get_ghes_source_offsets() to avoid the exception. > > Signed-off-by: Gavin Shan <gshan@redhat.com> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() 2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan 2025-11-27 8:10 ` Markus Armbruster @ 2025-12-01 10:10 ` Igor Mammedov 2025-12-01 14:15 ` Gavin Shan 1 sibling, 1 reply; 27+ messages in thread From: Igor Mammedov @ 2025-12-01 10:10 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Thu, 27 Nov 2025 10:44:34 +1000 Gavin Shan <gshan@redhat.com> wrote: > For one particular error (Error), we can't call error_setg() for twice. ^^^^^^^^^^^^^^^^^^^^^^^^^^^ I can't really parse that maybe rephrase it to make some sense? > Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in > error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets() > can return a error initialized by error_setg(). Without bailing on > this error, it can call into the second error_setg() due to the > unexpected value from the read acknowledgement register. > > Bail early in ghes_record_cper_errors() when error is received from > get_ghes_source_offsets() to avoid the exception. > > Signed-off-by: Gavin Shan <gshan@redhat.com> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> patch itself LGTM and with commit message fixed Reviewed-by: Igor Mammedov <imammedo@redhat.com> > --- > hw/acpi/ghes.c | 15 +++++++++------ > 1 file changed, 9 insertions(+), 6 deletions(-) > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > index 6366c74248..c35883dfa9 100644 > --- a/hw/acpi/ghes.c > +++ b/hw/acpi/ghes.c > @@ -443,7 +443,7 @@ static void get_hw_error_offsets(uint64_t ghes_addr, > *read_ack_register_addr = ghes_addr + sizeof(uint64_t); > } > > -static void get_ghes_source_offsets(uint16_t source_id, > +static bool get_ghes_source_offsets(uint16_t source_id, > uint64_t hest_addr, > uint64_t *cper_addr, > uint64_t *read_ack_start_addr, > @@ -474,7 +474,7 @@ static void get_ghes_source_offsets(uint16_t source_id, > /* For now, we only know the size of GHESv2 table */ > if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) { > error_setg(errp, "HEST: type %d not supported.", type); > - return; > + return false; > } > > /* Compare CPER source ID at the GHESv2 structure */ > @@ -488,7 +488,7 @@ static void get_ghes_source_offsets(uint16_t source_id, > } > if (i == num_sources) { > error_setg(errp, "HEST: Source %d not found.", source_id); > - return; > + return false; > } > > /* Navigate through table address pointers */ > @@ -508,6 +508,8 @@ static void get_ghes_source_offsets(uint16_t source_id, > cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr, > sizeof(*read_ack_start_addr)); > *read_ack_start_addr = le64_to_cpu(*read_ack_start_addr); > + > + return true; > } > > NotifierList acpi_generic_error_notifiers = > @@ -526,9 +528,10 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, > if (!ags->use_hest_addr) { > get_hw_error_offsets(le64_to_cpu(ags->hw_error_le), > &cper_addr, &read_ack_register_addr); > - } else { > - get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le), > - &cper_addr, &read_ack_register_addr, errp); > + } else if (!get_ghes_source_offsets(source_id, > + le64_to_cpu(ags->hest_addr_le), > + &cper_addr, &read_ack_register_addr, errp)) { > + return; > } > > cpu_physical_memory_read(read_ack_register_addr, ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() 2025-12-01 10:10 ` Igor Mammedov @ 2025-12-01 14:15 ` Gavin Shan 0 siblings, 0 replies; 27+ messages in thread From: Gavin Shan @ 2025-12-01 14:15 UTC (permalink / raw) To: Igor Mammedov Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Hi Igor, On 12/1/25 8:10 PM, Igor Mammedov wrote: > On Thu, 27 Nov 2025 10:44:34 +1000 > Gavin Shan <gshan@redhat.com> wrote: > >> For one particular error (Error), we can't call error_setg() for twice. > ^^^^^^^^^^^^^^^^^^^^^^^^^^^ > I can't really parse that > maybe rephrase it to make some sense? > I will drop this sentence in v2. >> Otherwise, the assert(*errp == NULL) will be triggered unexpectedly in >> error_setv(). In ghes_record_cper_errors(), get_ghes_source_offsets() >> can return a error initialized by error_setg(). Without bailing on >> this error, it can call into the second error_setg() due to the >> unexpected value from the read acknowledgement register. >> >> Bail early in ghes_record_cper_errors() when error is received from >> get_ghes_source_offsets() to avoid the exception. >> With above sentence dropped, the commit log improved to something like below in v2. In ghes_record_cper_errors(), get_ghes_source_offsets() can return a error initialized by error_setg(). Without bailing on this error, it can call into the second error_setg() due to the unexpected value returned from the read acknowledgement register. The second error_setg() can trigger assert(*errp == NULL) in its callee error_setv(), which isn't expected. Bail early in ghes_record_cper_errors() when error is received from get_ghes_source_offsets() to avoid the unexpected behavior. >> Signed-off-by: Gavin Shan <gshan@redhat.com> >> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> > > patch itself LGTM > and with commit message fixed > Reviewed-by: Igor Mammedov <imammedo@redhat.com> > >> --- >> hw/acpi/ghes.c | 15 +++++++++------ >> 1 file changed, 9 insertions(+), 6 deletions(-) >> >> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c >> index 6366c74248..c35883dfa9 100644 >> --- a/hw/acpi/ghes.c >> +++ b/hw/acpi/ghes.c >> @@ -443,7 +443,7 @@ static void get_hw_error_offsets(uint64_t ghes_addr, >> *read_ack_register_addr = ghes_addr + sizeof(uint64_t); >> } >> >> -static void get_ghes_source_offsets(uint16_t source_id, >> +static bool get_ghes_source_offsets(uint16_t source_id, >> uint64_t hest_addr, >> uint64_t *cper_addr, >> uint64_t *read_ack_start_addr, >> @@ -474,7 +474,7 @@ static void get_ghes_source_offsets(uint16_t source_id, >> /* For now, we only know the size of GHESv2 table */ >> if (type != ACPI_GHES_SOURCE_GENERIC_ERROR_V2) { >> error_setg(errp, "HEST: type %d not supported.", type); >> - return; >> + return false; >> } >> >> /* Compare CPER source ID at the GHESv2 structure */ >> @@ -488,7 +488,7 @@ static void get_ghes_source_offsets(uint16_t source_id, >> } >> if (i == num_sources) { >> error_setg(errp, "HEST: Source %d not found.", source_id); >> - return; >> + return false; >> } >> >> /* Navigate through table address pointers */ >> @@ -508,6 +508,8 @@ static void get_ghes_source_offsets(uint16_t source_id, >> cpu_physical_memory_read(hest_read_ack_addr, read_ack_start_addr, >> sizeof(*read_ack_start_addr)); >> *read_ack_start_addr = le64_to_cpu(*read_ack_start_addr); >> + >> + return true; >> } >> >> NotifierList acpi_generic_error_notifiers = >> @@ -526,9 +528,10 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, >> if (!ags->use_hest_addr) { >> get_hw_error_offsets(le64_to_cpu(ags->hw_error_le), >> &cper_addr, &read_ack_register_addr); >> - } else { >> - get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le), >> - &cper_addr, &read_ack_register_addr, errp); >> + } else if (!get_ghes_source_offsets(source_id, >> + le64_to_cpu(ags->hest_addr_le), >> + &cper_addr, &read_ack_register_addr, errp)) { >> + return; >> } >> >> cpu_physical_memory_read(read_ack_register_addr, > Thanks, Gavin ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() 2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan ` (3 preceding siblings ...) 2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan @ 2025-11-27 0:44 ` Gavin Shan 2025-11-27 8:14 ` Markus Armbruster 2025-12-01 10:12 ` Igor Mammedov 2025-11-28 14:09 ` [PATCH 0/5] acpi/ghes: Error object handling improvement Igor Mammedov 2025-12-01 12:17 ` Mauro Carvalho Chehab 6 siblings, 2 replies; 27+ messages in thread From: Gavin Shan @ 2025-11-27 0:44 UTC (permalink / raw) To: qemu-arm Cc: qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Use error_fatal in acpi_ghes_memory_errors() so that the caller needn't explicitly call exit(). The return value of acpi_ghes_memory_errors() and ghes_record_cper_errors() is changed to 'bool' indicating an error has been raised, to be compatible with what's documented in error.h. Suggested-by: Igor Mammedov <imammedo@redhat.com> Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> --- hw/acpi/ghes-stub.c | 4 ++-- hw/acpi/ghes.c | 26 ++++++++++---------------- include/hw/acpi/ghes.h | 6 +++--- target/arm/kvm.c | 9 +++------ 4 files changed, 18 insertions(+), 27 deletions(-) diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c index b54f1b093c..5f9313cce9 100644 --- a/hw/acpi/ghes-stub.c +++ b/hw/acpi/ghes-stub.c @@ -11,8 +11,8 @@ #include "qemu/osdep.h" #include "hw/acpi/ghes.h" -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, - uint64_t physical_address) +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, + uint64_t physical_address, Error **errp) { g_assert_not_reached(); } diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c index c35883dfa9..3033e93d65 100644 --- a/hw/acpi/ghes.c +++ b/hw/acpi/ghes.c @@ -515,14 +515,14 @@ static bool get_ghes_source_offsets(uint16_t source_id, NotifierList acpi_generic_error_notifiers = NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers); -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, uint16_t source_id, Error **errp) { uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register; if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) { error_setg(errp, "GHES CPER record is too big: %zd", len); - return; + return false; } if (!ags->use_hest_addr) { @@ -531,7 +531,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, } else if (!get_ghes_source_offsets(source_id, le64_to_cpu(ags->hest_addr_le), &cper_addr, &read_ack_register_addr, errp)) { - return; + return false; } cpu_physical_memory_read(read_ack_register_addr, @@ -542,7 +542,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, error_setg(errp, "OSPM does not acknowledge previous error," " so can not record CPER for current error anymore"); - return; + return false; } read_ack_register = cpu_to_le64(0); @@ -557,16 +557,17 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, cpu_physical_memory_write(cper_addr, cper, len); notifier_list_notify(&acpi_generic_error_notifiers, &source_id); + + return true; } -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, - uint64_t physical_address) +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, + uint64_t physical_address, Error **errp) { /* Memory Error Section Type */ const uint8_t guid[] = UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \ 0xED, 0x7C, 0x83, 0xB1); - Error *errp = NULL; int data_length; g_autoptr(GArray) block = g_array_new(false, true /* clear */, 1); @@ -583,15 +584,8 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, /* Build the memory section CPER for above new generic error data entry */ acpi_ghes_build_append_mem_cper(block, physical_address); - /* Report the error */ - ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp); - - if (errp) { - error_report_err(errp); - return -1; - } - - return 0; + return ghes_record_cper_errors(ags, block->data, block->len, + source_id, errp); } AcpiGhesState *acpi_ghes_get_state(void) diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h index df2ecbf6e4..5b29aae4dd 100644 --- a/include/hw/acpi/ghes.h +++ b/include/hw/acpi/ghes.h @@ -98,9 +98,9 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data, const char *oem_id, const char *oem_table_id); void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s, GArray *hardware_errors); -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, - uint64_t error_physical_addr); -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, + uint64_t error_physical_addr, Error **errp); +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, uint16_t source_id, Error **errp); /** diff --git a/target/arm/kvm.c b/target/arm/kvm.c index acda0b3fb4..76aa09810f 100644 --- a/target/arm/kvm.c +++ b/target/arm/kvm.c @@ -2456,12 +2456,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) */ if (code == BUS_MCEERR_AR) { kvm_cpu_synchronize_state(c); - if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, - paddr)) { - kvm_inject_arm_sea(c); - } else { - exit(1); - } + acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, + paddr, &error_fatal); + kvm_inject_arm_sea(c); } return; } -- 2.51.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() 2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan @ 2025-11-27 8:14 ` Markus Armbruster 2025-11-29 1:23 ` Gavin Shan 2025-12-01 10:12 ` Igor Mammedov 1 sibling, 1 reply; 27+ messages in thread From: Markus Armbruster @ 2025-11-27 8:14 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Gavin Shan <gshan@redhat.com> writes: > Use error_fatal in acpi_ghes_memory_errors() so that the caller needn't > explicitly call exit(). The return value of acpi_ghes_memory_errors() > and ghes_record_cper_errors() is changed to 'bool' indicating an error > has been raised, to be compatible with what's documented in error.h. > > Suggested-by: Igor Mammedov <imammedo@redhat.com> > Suggested-by: Markus Armbruster <armbru@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> I figure I would've split this for easier review, say 1. return bool, 2. convert acpi_ghes_memory_errors to Error. Since you already got review, it's probably not worthwhile now. Next time :) ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() 2025-11-27 8:14 ` Markus Armbruster @ 2025-11-29 1:23 ` Gavin Shan 0 siblings, 0 replies; 27+ messages in thread From: Gavin Shan @ 2025-11-29 1:23 UTC (permalink / raw) To: Markus Armbruster Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Hi Markus, On 11/27/25 6:14 PM, Markus Armbruster wrote: > Gavin Shan <gshan@redhat.com> writes: > >> Use error_fatal in acpi_ghes_memory_errors() so that the caller needn't >> explicitly call exit(). The return value of acpi_ghes_memory_errors() >> and ghes_record_cper_errors() is changed to 'bool' indicating an error >> has been raised, to be compatible with what's documented in error.h. >> >> Suggested-by: Igor Mammedov <imammedo@redhat.com> >> Suggested-by: Markus Armbruster <armbru@redhat.com> >> Signed-off-by: Gavin Shan <gshan@redhat.com> >> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> > > Reviewed-by: Markus Armbruster <armbru@redhat.com> > > I figure I would've split this for easier review, say 1. return bool, > 2. convert acpi_ghes_memory_errors to Error. Since you already got > review, it's probably not worthwhile now. Next time :) > Nice point, thanks for your review. Lets try to do that next time :) Thanks, Gavin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() 2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan 2025-11-27 8:14 ` Markus Armbruster @ 2025-12-01 10:12 ` Igor Mammedov 1 sibling, 0 replies; 27+ messages in thread From: Igor Mammedov @ 2025-12-01 10:12 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Thu, 27 Nov 2025 10:44:35 +1000 Gavin Shan <gshan@redhat.com> wrote: > Use error_fatal in acpi_ghes_memory_errors() so that the caller needn't > explicitly call exit(). The return value of acpi_ghes_memory_errors() > and ghes_record_cper_errors() is changed to 'bool' indicating an error > has been raised, to be compatible with what's documented in error.h. > > Suggested-by: Igor Mammedov <imammedo@redhat.com> > Suggested-by: Markus Armbruster <armbru@redhat.com> > Signed-off-by: Gavin Shan <gshan@redhat.com> > Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> > --- > hw/acpi/ghes-stub.c | 4 ++-- > hw/acpi/ghes.c | 26 ++++++++++---------------- > include/hw/acpi/ghes.h | 6 +++--- > target/arm/kvm.c | 9 +++------ > 4 files changed, 18 insertions(+), 27 deletions(-) > > diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c > index b54f1b093c..5f9313cce9 100644 > --- a/hw/acpi/ghes-stub.c > +++ b/hw/acpi/ghes-stub.c > @@ -11,8 +11,8 @@ > #include "qemu/osdep.h" > #include "hw/acpi/ghes.h" > > -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > - uint64_t physical_address) > +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > + uint64_t physical_address, Error **errp) > { > g_assert_not_reached(); > } > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > index c35883dfa9..3033e93d65 100644 > --- a/hw/acpi/ghes.c > +++ b/hw/acpi/ghes.c > @@ -515,14 +515,14 @@ static bool get_ghes_source_offsets(uint16_t source_id, > NotifierList acpi_generic_error_notifiers = > NOTIFIER_LIST_INITIALIZER(acpi_generic_error_notifiers); > > -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, > +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, > uint16_t source_id, Error **errp) > { > uint64_t cper_addr = 0, read_ack_register_addr = 0, read_ack_register; > > if (len > ACPI_GHES_MAX_RAW_DATA_LENGTH) { > error_setg(errp, "GHES CPER record is too big: %zd", len); > - return; > + return false; > } > > if (!ags->use_hest_addr) { > @@ -531,7 +531,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, > } else if (!get_ghes_source_offsets(source_id, > le64_to_cpu(ags->hest_addr_le), > &cper_addr, &read_ack_register_addr, errp)) { > - return; > + return false; > } > > cpu_physical_memory_read(read_ack_register_addr, > @@ -542,7 +542,7 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, > error_setg(errp, > "OSPM does not acknowledge previous error," > " so can not record CPER for current error anymore"); > - return; > + return false; > } > > read_ack_register = cpu_to_le64(0); > @@ -557,16 +557,17 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, > cpu_physical_memory_write(cper_addr, cper, len); > > notifier_list_notify(&acpi_generic_error_notifiers, &source_id); > + > + return true; > } > > -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > - uint64_t physical_address) > +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > + uint64_t physical_address, Error **errp) > { > /* Memory Error Section Type */ > const uint8_t guid[] = > UUID_LE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \ > 0xED, 0x7C, 0x83, 0xB1); > - Error *errp = NULL; > int data_length; > g_autoptr(GArray) block = g_array_new(false, true /* clear */, 1); > > @@ -583,15 +584,8 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > /* Build the memory section CPER for above new generic error data entry */ > acpi_ghes_build_append_mem_cper(block, physical_address); > > - /* Report the error */ > - ghes_record_cper_errors(ags, block->data, block->len, source_id, &errp); > - > - if (errp) { > - error_report_err(errp); > - return -1; > - } > - > - return 0; > + return ghes_record_cper_errors(ags, block->data, block->len, > + source_id, errp); > } > > AcpiGhesState *acpi_ghes_get_state(void) > diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h > index df2ecbf6e4..5b29aae4dd 100644 > --- a/include/hw/acpi/ghes.h > +++ b/include/hw/acpi/ghes.h > @@ -98,9 +98,9 @@ void acpi_build_hest(AcpiGhesState *ags, GArray *table_data, > const char *oem_id, const char *oem_table_id); > void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s, > GArray *hardware_errors); > -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > - uint64_t error_physical_addr); > -void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, > +bool acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id, > + uint64_t error_physical_addr, Error **errp); > +bool ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t len, > uint16_t source_id, Error **errp); > > /** > diff --git a/target/arm/kvm.c b/target/arm/kvm.c > index acda0b3fb4..76aa09810f 100644 > --- a/target/arm/kvm.c > +++ b/target/arm/kvm.c > @@ -2456,12 +2456,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr) > */ > if (code == BUS_MCEERR_AR) { > kvm_cpu_synchronize_state(c); > - if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, > - paddr)) { > - kvm_inject_arm_sea(c); > - } else { > - exit(1); > - } > + acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC, > + paddr, &error_fatal); > + kvm_inject_arm_sea(c); > } > return; > } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan ` (4 preceding siblings ...) 2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan @ 2025-11-28 14:09 ` Igor Mammedov 2025-11-29 1:21 ` Gavin Shan 2025-12-01 12:17 ` Mauro Carvalho Chehab 6 siblings, 1 reply; 27+ messages in thread From: Igor Mammedov @ 2025-11-28 14:09 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Thu, 27 Nov 2025 10:44:30 +1000 Gavin Shan <gshan@redhat.com> wrote: > This series is curved from that for memory error handling improvement ^^^ confusing based on above I'm not sure if it depends on [1] and shoul be applied on top or it can be merged on its own > [1] based on the received comments, to improve the error object handling > in various aspects. > > [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html > > Gavin Shan (5): > acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() > acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary > target/arm/kvm: Exit on error from acpi_ghes_memory_errors() > acpi/ghes: Bail early on error from get_ghes_source_offsets() > acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() > > hw/acpi/ghes-stub.c | 6 +++--- > hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------ > include/hw/acpi/ghes.h | 6 +++--- > target/arm/kvm.c | 10 +++------- > 4 files changed, 28 insertions(+), 39 deletions(-) > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-11-28 14:09 ` [PATCH 0/5] acpi/ghes: Error object handling improvement Igor Mammedov @ 2025-11-29 1:21 ` Gavin Shan 2025-12-01 9:31 ` Igor Mammedov 0 siblings, 1 reply; 27+ messages in thread From: Gavin Shan @ 2025-11-29 1:21 UTC (permalink / raw) To: Igor Mammedov Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Hi Igor, On 11/29/25 12:09 AM, Igor Mammedov wrote: > On Thu, 27 Nov 2025 10:44:30 +1000 > Gavin Shan <gshan@redhat.com> wrote: > >> This series is curved from that for memory error handling improvement > ^^^ confusing > based on above I'm not sure if it depends on [1] and shoul be applied on top > or it can be merged on its own > The current series is a standalone series and expected to be merged by its own. For (v4) series of memory error improvement [1], Jonathan wants to extend the handlers in the guest kernel so that the granularity in CPER record will be used to isolate the corresponding memory address range. With this, the patches in the (v4) series to send 16x continuous errors become useless. However, those patches in (v4) series to improve the Error (object) hanlding are still useful. So I pulled those patches for the Error (object) hanlding improvement from (v4) series to form this series. >> [1] based on the received comments, to improve the error object handling >> in various aspects. >> >> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html >> Thanks, Gavin >> Gavin Shan (5): >> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() >> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary >> target/arm/kvm: Exit on error from acpi_ghes_memory_errors() >> acpi/ghes: Bail early on error from get_ghes_source_offsets() >> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() >> >> hw/acpi/ghes-stub.c | 6 +++--- >> hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------ >> include/hw/acpi/ghes.h | 6 +++--- >> target/arm/kvm.c | 10 +++------- >> 4 files changed, 28 insertions(+), 39 deletions(-) >> > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-11-29 1:21 ` Gavin Shan @ 2025-12-01 9:31 ` Igor Mammedov 0 siblings, 0 replies; 27+ messages in thread From: Igor Mammedov @ 2025-12-01 9:31 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, mchehab+huawei, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Sat, 29 Nov 2025 11:21:55 +1000 Gavin Shan <gshan@redhat.com> wrote: > Hi Igor, > > On 11/29/25 12:09 AM, Igor Mammedov wrote: > > On Thu, 27 Nov 2025 10:44:30 +1000 > > Gavin Shan <gshan@redhat.com> wrote: > > > >> This series is curved from that for memory error handling improvement > > ^^^ confusing > > based on above I'm not sure if it depends on [1] and shoul be applied on top > > or it can be merged on its own > > > > The current series is a standalone series and expected to be merged by its own. > > For (v4) series of memory error improvement [1], Jonathan wants to extend > the handlers in the guest kernel so that the granularity in CPER record > will be used to isolate the corresponding memory address range. With this, > the patches in the (v4) series to send 16x continuous errors become useless. > However, those patches in (v4) series to improve the Error (object) hanlding > are still useful. So I pulled those patches for the Error (object) hanlding > improvement from (v4) series to form this series. ok, then I'll review this series and skip v4 for now > > >> [1] based on the received comments, to improve the error object handling > >> in various aspects. > >> > >> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html > >> > > Thanks, > Gavin > > >> Gavin Shan (5): > >> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() > >> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary > >> target/arm/kvm: Exit on error from acpi_ghes_memory_errors() > >> acpi/ghes: Bail early on error from get_ghes_source_offsets() > >> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() > >> > >> hw/acpi/ghes-stub.c | 6 +++--- > >> hw/acpi/ghes.c | 45 ++++++++++++++++++------------------------ > >> include/hw/acpi/ghes.h | 6 +++--- > >> target/arm/kvm.c | 10 +++------- > >> 4 files changed, 28 insertions(+), 39 deletions(-) > >> > > > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan ` (5 preceding siblings ...) 2025-11-28 14:09 ` [PATCH 0/5] acpi/ghes: Error object handling improvement Igor Mammedov @ 2025-12-01 12:17 ` Mauro Carvalho Chehab 2025-12-01 14:13 ` Gavin Shan 6 siblings, 1 reply; 27+ messages in thread From: Mauro Carvalho Chehab @ 2025-12-01 12:17 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Thu, 27 Nov 2025 10:44:30 +1000 Gavin Shan <gshan@redhat.com> wrote: > This series is curved from that for memory error handling improvement > [1] based on the received comments, to improve the error object handling > in various aspects. > > [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html > > Gavin Shan (5): > acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() > acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary > target/arm/kvm: Exit on error from acpi_ghes_memory_errors() > acpi/ghes: Bail early on error from get_ghes_source_offsets() > acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Patch series look ok on my eyes. Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> - Btw, what setup are you using to test memory errors? It would be nice to have it documented somewhere, maybe at docs/specs/acpi_hest_ghes.rst. Thanks, Mauro ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-12-01 12:17 ` Mauro Carvalho Chehab @ 2025-12-01 14:13 ` Gavin Shan 2025-12-01 14:31 ` Mauro Carvalho Chehab 0 siblings, 1 reply; 27+ messages in thread From: Gavin Shan @ 2025-12-01 14:13 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin [-- Attachment #1: Type: text/plain, Size: 3852 bytes --] Hi Mauro, On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote: > On Thu, 27 Nov 2025 10:44:30 +1000 > Gavin Shan <gshan@redhat.com> wrote: > >> This series is curved from that for memory error handling improvement >> [1] based on the received comments, to improve the error object handling >> in various aspects. >> >> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html >> >> Gavin Shan (5): >> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() >> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary >> target/arm/kvm: Exit on error from acpi_ghes_memory_errors() >> acpi/ghes: Bail early on error from get_ghes_source_offsets() >> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() > > Patch series look ok on my eyes. > > Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> > Thanks. > - > > Btw, what setup are you using to test memory errors? It would be > nice to have it documented somewhere, maybe at > docs/specs/acpi_hest_ghes.rst. > I don't think docs/specs/acpi_hest_ghes.rst is the right place for that as it's for specifications. I'm sharing how this is tested here to make the thread complete. - Both host and guest has 4KB page size - Start the guest by the following command lines /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \ -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \ -m 4096M,slots=16,maxmem=128G \ -object memory-backend-ram,id=mem0,size=4096M \ -numa node,nodeid=0,cpus=0-7,memdev=mem0 \ -L /home/gavin/sandbox/qemu.main/build/pc-bios \ -monitor none -serial mon:stdio -nographic \ -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \ -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \ -boot c \ -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \ -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \ -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \ : \ -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \ -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \ -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \ -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0 - Trigger 'victim -d' in the guest guest$ ./victim -d physical address of (0xffff8d9b7000) = 0x1251d6000 Hit any key to trigger error: - Inject error to the GPA. "test.c" is attached host$ ./test 0x1251d6000 - Press enter on the guest so that 'victim' continues its execution [ 435.467481] EDAC MC0: 1 UE unknown on unknown memory ( page:0x1251d6 offset:0x0 grain:1 - APEI location: ) [ 435.467542] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0 [ 435.467543] {1}[Hardware Error]: event severity: recoverable [ 435.467544] {1}[Hardware Error]: Error 0, type: recoverable [ 435.467545] {1}[Hardware Error]: section_type: memory error [ 435.467546] {1}[Hardware Error]: physical_address: 0x00000001251d6000 [ 435.467547] {1}[Hardware Error]: error_type: 0, unknown [ 435.468380] Memory failure: 0x1251d6: recovery action for dirty LRU page: Recovered Bus error (core dumped) Thanks, Gavin > Thanks, > Mauro > [-- Attachment #2: test.c --] [-- Type: text/x-csrc, Size: 5821 bytes --] // SPDX-License-Identifier: GPL-2.0+ /* * This test program runs on the host, to receive GPA outputed by 'victimd' * from the guest. The GPA is translated to HPA, and recoverable error * is inject to HPA automatically. * * NOTE: We have the assumption that the guest has only one NUMA node and * the memory capacity is 4GB. The test program won't work if the assumption * is broken. * * Author: Gavin Shan <gshan@redhat.com> */ #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <assert.h> #include <errno.h> #include <time.h> #include <fcntl.h> #include <dirent.h> #include <sys/stat.h> #include <sys/mman.h> #include <sys/wait.h> #define TEST_GUEST_MEM_SIZE 0x100000000 /* 4GB */ #define TEST_GUEST_MEM_START 0x040000000 /* 1GB */ #define TEST_INJECT_ERROR_TYPE 0x10 struct test_struct { int pid; unsigned long guest_mem_size; unsigned long gpa; unsigned long hva; unsigned long hpa; }; static void usage(void) { fprintf(stdout, "\n"); fprintf(stdout, "./test <gpa>\n"); fprintf(stdout, "gpa The GPA (Guest Physical Address) where the error is injected\n"); fprintf(stdout, "\n"); } static void init_test_struct(struct test_struct *test) { test->pid = -1; test->guest_mem_size = TEST_GUEST_MEM_SIZE; test->gpa = -1UL; test->hpa = -1UL; } static int fetch_gpa(struct test_struct *test, int argc, char **argv) { if (argc != 2) { usage(); return -EINVAL; } test->gpa = strtoul(argv[1], NULL, 16); if (test->gpa < TEST_GUEST_MEM_START || test->gpa > (TEST_GUEST_MEM_START + test->guest_mem_size)) { fprintf(stderr, "%s: GPA 0x%lx out of range [1GB, 1GB+0x%lx]\n", __func__, test->gpa, test->guest_mem_size); return -EINVAL; } return 0; } static int find_qemu_pid(struct test_struct *test) { DIR *dir; FILE *fp; struct dirent *entry; char path[256], data[256]; size_t sz; int ret = -ENODEV; dir = opendir("/proc"); if (!dir) { fprintf(stderr, "%s: unable to open </proc>\n", __func__); return -EIO; } while ((entry = readdir(dir)) != NULL) { if (entry->d_type != DT_DIR || entry->d_name[0] == '.') continue; memset(path, 0, sizeof(path)); snprintf(path, sizeof(path), "/proc/%s/comm", entry->d_name); fp = fopen(path, "r"); if (!fp) continue; memset(data, 0, sizeof(data)); sz = fread(data, 1, sizeof(data), fp); fclose(fp); if (sz <= 0) continue; if (strstr(data, "qemu")) { ret = 0; test->pid = atoi(entry->d_name); break; } } if (ret != 0) fprintf(stderr, "%s: Unable to find QEMU PID\n", __func__); closedir(dir); return ret; } static int fetch_hva(struct test_struct *test) { FILE *fp; char filename[64], *data = NULL, *next, *next1; unsigned long start, end; size_t sz, len; int ret = -EIO; memset(filename, 0, sizeof(filename)); snprintf(filename, sizeof(filename), "/proc/%d/smaps", test->pid); fp = fopen(filename, "r"); if (!fp) { fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename); return ret; } while ((sz = getline(&data, &len, fp)) != -1) { if (!strstr(data, "rw-p")) continue; next = strchr(data, '-'); if (!next) continue; *next++ = '\0'; next1 = strchr(next, ' '); if (!next1) continue; *next1 = '\0'; start = strtoul(data, NULL, 16); end = strtoul(next, NULL, 16); if (end - start == test->guest_mem_size) { ret = 0; test->hva = start + (test->gpa - TEST_GUEST_MEM_START); break; } } if (data) free(data); fclose(fp); return ret; } static int fetch_hpa(struct test_struct *test) { int fd; unsigned long pinfo, pgsize = getpagesize(); off_t offset = (test->hva / pgsize) * sizeof(pinfo); char filename[128]; ssize_t sz; memset(filename, 0, sizeof(filename)); snprintf(filename, sizeof(filename), "/proc/%d/pagemap", test->pid); fd = open(filename, O_RDONLY); if (fd < 0) { fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename); return -EIO; } sz = pread(fd, &pinfo, sizeof(pinfo), offset); close(fd); if (sz != sizeof(pinfo)) { fprintf(stderr, "%s: Unable to read from <%s>\n", __func__, filename); return -EIO; } if (!(pinfo & (1UL << 63))) { fprintf(stderr, "%s: Page not present\n", __func__); return -EINVAL; } test->hpa = ((pinfo & 0x007fffffffffffffUL) * pgsize) + (test->hva & (pgsize - 1)); return 0; } static int write_file(const char *filename, unsigned long val) { int fd; char data[128]; size_t sz; int ret = 0; memset(data, 0, sizeof(data)); sz = snprintf(data, sizeof(data), "0x%lx", val); fd = open(filename, O_WRONLY); if (fd < 0) { fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename); return -EIO; } if (write(fd, data, sz) != sz) { ret = -EIO; fprintf(stderr, "%s: Unable to write <%s>\n", __func__, filename); } close(fd); return ret; } static int inject_error(struct test_struct *test) { fprintf(stdout, "pid: %d\n", test->pid); fprintf(stdout, "gpa: 0x%lx\n", test->gpa); fprintf(stdout, "hva: 0x%lx\n", test->hva); fprintf(stdout, "hpa: 0x%lx\n", test->hpa); system("modprobe einj > /dev/null"); if (write_file("/sys/kernel/debug/apei/einj/param1", test->hpa) || write_file("/sys/kernel/debug/apei/einj/param2", 0xfffffffffffff000) || write_file("/sys/kernel/debug/apei/einj/flags", 0x0) || write_file("/sys/kernel/debug/apei/einj/error_type", TEST_INJECT_ERROR_TYPE) || write_file("/sys/kernel/debug/apei/einj/notrigger", 1) || write_file("/sys/kernel/debug/apei/einj/error_inject", 1)) return -EIO; return 0; } int main(int argc, char **argv) { struct test_struct test; int ret; init_test_struct(&test); if (fetch_gpa(&test, argc, argv) || find_qemu_pid(&test) || fetch_hva(&test) || fetch_hpa(&test) || inject_error(&test)) return -EIO; return 0; } ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-12-01 14:13 ` Gavin Shan @ 2025-12-01 14:31 ` Mauro Carvalho Chehab 2025-12-01 14:37 ` Gavin Shan 0 siblings, 1 reply; 27+ messages in thread From: Mauro Carvalho Chehab @ 2025-12-01 14:31 UTC (permalink / raw) To: Gavin Shan Cc: qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Tue, 2 Dec 2025 00:13:06 +1000 Gavin Shan <gshan@redhat.com> wrote: > Hi Mauro, > > On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote: > > On Thu, 27 Nov 2025 10:44:30 +1000 > > Gavin Shan <gshan@redhat.com> wrote: > > > >> This series is curved from that for memory error handling improvement > >> [1] based on the received comments, to improve the error object handling > >> in various aspects. > >> > >> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html > >> > >> Gavin Shan (5): > >> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() > >> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary > >> target/arm/kvm: Exit on error from acpi_ghes_memory_errors() > >> acpi/ghes: Bail early on error from get_ghes_source_offsets() > >> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() > > > > Patch series look ok on my eyes. > > > > Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> > > > > Thanks. > > > - > > > > Btw, what setup are you using to test memory errors? It would be > > nice to have it documented somewhere, maybe at > > docs/specs/acpi_hest_ghes.rst. > > > > I don't think docs/specs/acpi_hest_ghes.rst is the right place for that > as it's for specifications. Perhaps not, but it would be nice to have it documented somewhere, either there or at QEMU wiki. > I'm sharing how this is tested here to make the thread complete. Thanks! > > - Both host and guest has 4KB page size > > - Start the guest by the following command lines > > /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ > -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \ > -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \ > -m 4096M,slots=16,maxmem=128G \ > -object memory-backend-ram,id=mem0,size=4096M \ > -numa node,nodeid=0,cpus=0-7,memdev=mem0 \ > -L /home/gavin/sandbox/qemu.main/build/pc-bios \ > -monitor none -serial mon:stdio -nographic \ > -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \ > -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \ > -boot c \ > -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \ > -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \ > -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \ > : \ > -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \ > -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \ > -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \ > -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ > -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0 > > - Trigger 'victim -d' in the guest Hmm... from where I can get victim? Regards, Mauro ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-12-01 14:31 ` Mauro Carvalho Chehab @ 2025-12-01 14:37 ` Gavin Shan 2025-12-02 12:10 ` Igor Mammedov 2025-12-02 13:20 ` Peter Maydell 0 siblings, 2 replies; 27+ messages in thread From: Gavin Shan @ 2025-12-01 14:37 UTC (permalink / raw) To: Mauro Carvalho Chehab Cc: qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin Hi Mauro, On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote: > On Tue, 2 Dec 2025 00:13:06 +1000 > Gavin Shan <gshan@redhat.com> wrote: >> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote: >>> On Thu, 27 Nov 2025 10:44:30 +1000 >>> Gavin Shan <gshan@redhat.com> wrote: [...] >>> >>> Btw, what setup are you using to test memory errors? It would be >>> nice to have it documented somewhere, maybe at >>> docs/specs/acpi_hest_ghes.rst. >>> >> >> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that >> as it's for specifications. > > Perhaps not, but it would be nice to have it documented somewhere, > either there or at QEMU wiki. > QEMU wiki may be the best place for it. I never updated to QEMU wiki and any guiding steps on how to do that? >> I'm sharing how this is tested here to make the thread complete. > > Thanks! > >> >> - Both host and guest has 4KB page size >> >> - Start the guest by the following command lines >> >> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ >> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \ >> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \ >> -m 4096M,slots=16,maxmem=128G \ >> -object memory-backend-ram,id=mem0,size=4096M \ >> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \ >> -L /home/gavin/sandbox/qemu.main/build/pc-bios \ >> -monitor none -serial mon:stdio -nographic \ >> -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \ >> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \ >> -boot c \ >> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \ >> -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \ >> -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \ >> : \ >> -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \ >> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \ >> -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \ >> -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ >> -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0 >> >> - Trigger 'victim -d' in the guest > > Hmm... from where I can get victim? > https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git > Regards, > Mauro > Thanks, Gavin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-12-01 14:37 ` Gavin Shan @ 2025-12-02 12:10 ` Igor Mammedov 2025-12-02 13:20 ` Peter Maydell 1 sibling, 0 replies; 27+ messages in thread From: Igor Mammedov @ 2025-12-02 12:10 UTC (permalink / raw) To: Gavin Shan Cc: Mauro Carvalho Chehab, qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, anisinha, gengdongjiu1, peter.maydell, pbonzini, shan.gavin On Tue, 2 Dec 2025 00:37:53 +1000 Gavin Shan <gshan@redhat.com> wrote: > Hi Mauro, > > On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote: > > On Tue, 2 Dec 2025 00:13:06 +1000 > > Gavin Shan <gshan@redhat.com> wrote: > >> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote: > >>> On Thu, 27 Nov 2025 10:44:30 +1000 > >>> Gavin Shan <gshan@redhat.com> wrote: > > [...] > > >>> > >>> Btw, what setup are you using to test memory errors? It would be > >>> nice to have it documented somewhere, maybe at > >>> docs/specs/acpi_hest_ghes.rst. > >>> > >> > >> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that > >> as it's for specifications. > > > > Perhaps not, but it would be nice to have it documented somewhere, > > either there or at QEMU wiki. > > > > QEMU wiki may be the best place for it. I never updated to QEMU wiki and > any guiding steps on how to do that? do you have an account already? > > >> I'm sharing how this is tested here to make the thread complete. > > > > Thanks! > > > >> > >> - Both host and guest has 4KB page size > >> > >> - Start the guest by the following command lines > >> > >> /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \ > >> -accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \ > >> -cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \ > >> -m 4096M,slots=16,maxmem=128G \ > >> -object memory-backend-ram,id=mem0,size=4096M \ > >> -numa node,nodeid=0,cpus=0-7,memdev=mem0 \ > >> -L /home/gavin/sandbox/qemu.main/build/pc-bios \ > >> -monitor none -serial mon:stdio -nographic \ > >> -gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \ > >> -bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \ > >> -boot c \ > >> -device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \ > >> -device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \ > >> -device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \ > >> : \ > >> -device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \ > >> -drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \ > >> -device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \ > >> -netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ > >> -device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0 > >> > >> - Trigger 'victim -d' in the guest > > > > Hmm... from where I can get victim? > > > > https://git.kernel.org/pub/scm/utils/cpu/mce/mce-test.git > > > Regards, > > Mauro > > > > Thanks, > Gavin > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 0/5] acpi/ghes: Error object handling improvement 2025-12-01 14:37 ` Gavin Shan 2025-12-02 12:10 ` Igor Mammedov @ 2025-12-02 13:20 ` Peter Maydell 1 sibling, 0 replies; 27+ messages in thread From: Peter Maydell @ 2025-12-02 13:20 UTC (permalink / raw) To: Gavin Shan Cc: Mauro Carvalho Chehab, qemu-arm, qemu-devel, jonathan.cameron, armbru, mst, imammedo, anisinha, gengdongjiu1, pbonzini, shan.gavin On Mon, 1 Dec 2025 at 14:38, Gavin Shan <gshan@redhat.com> wrote: > > Hi Mauro, > > On 12/2/25 12:31 AM, Mauro Carvalho Chehab wrote: > > On Tue, 2 Dec 2025 00:13:06 +1000 > > Gavin Shan <gshan@redhat.com> wrote: > >> On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote: > >>> On Thu, 27 Nov 2025 10:44:30 +1000 > >>> Gavin Shan <gshan@redhat.com> wrote: > > [...] > > >>> > >>> Btw, what setup are you using to test memory errors? It would be > >>> nice to have it documented somewhere, maybe at > >>> docs/specs/acpi_hest_ghes.rst. > >>> > >> > >> I don't think docs/specs/acpi_hest_ghes.rst is the right place for that > >> as it's for specifications. > > > > Perhaps not, but it would be nice to have it documented somewhere, > > either there or at QEMU wiki. > > > > QEMU wiki may be the best place for it. I never updated to QEMU wiki and > any guiding steps on how to do that? I think in general we should prefer to document things in docs/ if we think users would want to know them. If it's just a test setup then perhaps docs/devel, or if feasible actually make it a test in tests/. The wiki is largely unused except for the changelog and planning docs. (In an ideal world we'd check for parts of the wiki that still have useful-to-users up to date information, and fold them into our manuals.) thanks -- PMM ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2025-12-02 13:21 UTC | newest] Thread overview: 27+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-27 0:44 [PATCH 0/5] acpi/ghes: Error object handling improvement Gavin Shan 2025-11-27 0:44 ` [PATCH 1/5] acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors() Gavin Shan 2025-11-27 8:06 ` Markus Armbruster 2025-12-01 9:32 ` Igor Mammedov 2025-11-27 0:44 ` [PATCH 2/5] acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary Gavin Shan 2025-12-01 9:37 ` Igor Mammedov 2025-11-27 0:44 ` [PATCH 3/5] target/arm/kvm: Exit on error from acpi_ghes_memory_errors() Gavin Shan 2025-11-28 14:07 ` Igor Mammedov 2025-11-28 14:54 ` Markus Armbruster 2025-12-01 10:06 ` Igor Mammedov 2025-11-27 0:44 ` [PATCH 4/5] acpi/ghes: Bail early on error from get_ghes_source_offsets() Gavin Shan 2025-11-27 8:10 ` Markus Armbruster 2025-12-01 10:10 ` Igor Mammedov 2025-12-01 14:15 ` Gavin Shan 2025-11-27 0:44 ` [PATCH 5/5] acpi/ghes: Use error_fatal in acpi_ghes_memory_errors() Gavin Shan 2025-11-27 8:14 ` Markus Armbruster 2025-11-29 1:23 ` Gavin Shan 2025-12-01 10:12 ` Igor Mammedov 2025-11-28 14:09 ` [PATCH 0/5] acpi/ghes: Error object handling improvement Igor Mammedov 2025-11-29 1:21 ` Gavin Shan 2025-12-01 9:31 ` Igor Mammedov 2025-12-01 12:17 ` Mauro Carvalho Chehab 2025-12-01 14:13 ` Gavin Shan 2025-12-01 14:31 ` Mauro Carvalho Chehab 2025-12-01 14:37 ` Gavin Shan 2025-12-02 12:10 ` Igor Mammedov 2025-12-02 13:20 ` Peter Maydell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).