From: Gavin Shan <gshan@redhat.com>
To: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Cc: qemu-arm@nongnu.org, qemu-devel@nongnu.org,
jonathan.cameron@huawei.com, armbru@redhat.com, mst@redhat.com,
imammedo@redhat.com, anisinha@redhat.com, gengdongjiu1@gmail.com,
peter.maydell@linaro.org, pbonzini@redhat.com,
shan.gavin@gmail.com
Subject: Re: [PATCH 0/5] acpi/ghes: Error object handling improvement
Date: Tue, 2 Dec 2025 00:13:06 +1000
Message-ID: <12b7baee-1d6d-440a-a119-971b47d7f3ad@redhat.com>
In-Reply-To: <20251201131729.615abe68@foz.lan>
[-- Attachment #1: Type: text/plain, Size: 3852 bytes --]
Hi Mauro,
On 12/1/25 10:17 PM, Mauro Carvalho Chehab wrote:
> On Thu, 27 Nov 2025 10:44:30 +1000
> Gavin Shan <gshan@redhat.com> wrote:
>
>> This series is curved from that for memory error handling improvement
>> [1] based on the received comments, to improve the error object handling
>> in various aspects.
>>
>> [1] https://lists.nongnu.org/archive/html/qemu-arm/2025-11/msg00534.html
>>
>> Gavin Shan (5):
>> acpi/ghes: Automate data block cleanup in acpi_ghes_memory_errors()
>> acpi/ghes: Abort in acpi_ghes_memory_errors() if necessary
>> target/arm/kvm: Exit on error from acpi_ghes_memory_errors()
>> acpi/ghes: Bail early on error from get_ghes_source_offsets()
>> acpi/ghes: Use error_fatal in acpi_ghes_memory_errors()
>
> Patch series look ok on my eyes.
>
> Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
>
Thanks.
> -
>
> Btw, what setup are you using to test memory errors? It would be
> nice to have it documented somewhere, maybe at
> docs/specs/acpi_hest_ghes.rst.
>
I don't think docs/specs/acpi_hest_ghes.rst is the right place for that
as it's for specifications. I'm sharing how this is tested here to make
the thread complete.
- Both host and guest have a 4KB page size
- Start the guest with the following command line
/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host,nvdimm=on,ras=on \
-cpu host -smp maxcpus=8,cpus=8,sockets=2,clusters=2,cores=2,threads=1 \
-m 4096M,slots=16,maxmem=128G \
-object memory-backend-ram,id=mem0,size=4096M \
-numa node,nodeid=0,cpus=0-7,memdev=mem0 \
-L /home/gavin/sandbox/qemu.main/build/pc-bios \
-monitor none -serial mon:stdio -nographic \
-gdb tcp::6666 -qmp tcp:localhost:5555,server,wait=off \
-bios /home/gavin/sandbox/qemu.main/build/pc-bios/edk2-aarch64-code.fd \
-boot c \
-device pcie-root-port,bus=pcie.0,chassis=1,id=pcie.1 \
-device pcie-root-port,bus=pcie.0,chassis=2,id=pcie.2 \
-device pcie-root-port,bus=pcie.0,chassis=3,id=pcie.3 \
: \
-device pcie-root-port,bus=pcie.0,chassis=16,id=pcie.16 \
-drive file=/home/gavin/sandbox/images/disk.qcow2,if=none,id=drive0 \
-device virtio-blk-pci,id=virtblk0,bus=pcie.1,drive=drive0,num-queues=4 \
-netdev tap,id=tap1,vhost=true,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \
-device virtio-net-pci,bus=pcie.8,netdev=tap1,mac=52:54:00:f1:26:b0
- Trigger 'victim -d' in the guest
guest$ ./victim -d
physical address of (0xffff8d9b7000) = 0x1251d6000
Hit any key to trigger error:
- Inject the error at that GPA from the host. "test.c" is attached
host$ ./test 0x1251d6000
- Press enter on the guest so that 'victim' continues its execution
[ 435.467481] EDAC MC0: 1 UE unknown on unknown memory ( page:0x1251d6 offset:0x0 grain:1 - APEI location: )
[ 435.467542] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 435.467543] {1}[Hardware Error]: event severity: recoverable
[ 435.467544] {1}[Hardware Error]: Error 0, type: recoverable
[ 435.467545] {1}[Hardware Error]: section_type: memory error
[ 435.467546] {1}[Hardware Error]: physical_address: 0x00000001251d6000
[ 435.467547] {1}[Hardware Error]: error_type: 0, unknown
[ 435.468380] Memory failure: 0x1251d6: recovery action for dirty LRU page: Recovered
Bus error (core dumped)
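For reference, the page number in the EDAC and "Memory failure" lines above (0x1251d6) is the GPA (0x1251d6000) shifted right by the page shift, which is 12 for the 4KB pages used in this setup. A minimal sketch of that arithmetic follows. The helper and macro names are made up for illustration and are not part of test.c or 'victim'.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4KB pages on both host and guest, as in the setup above. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/* Page frame number of the page containing addr. */
static uint64_t addr_to_pfn(uint64_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* Byte offset of addr within its page. */
static uint64_t addr_to_page_offset(uint64_t addr)
{
    return addr & (SKETCH_PAGE_SIZE - 1);
}

/* Rebuild an address from a page frame number and an in-page offset. */
static uint64_t pfn_to_addr(uint64_t pfn, uint64_t offset)
{
    assert(offset < SKETCH_PAGE_SIZE);
    return (pfn << SKETCH_PAGE_SHIFT) | offset;
}
```

With the GPA reported by 'victim', addr_to_pfn(0x1251d6000) gives 0x1251d6 and addr_to_page_offset(0x1251d6000) gives 0x0, matching "page:0x1251d6 offset:0x0" in the log.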
Thanks,
Gavin
> Thanks,
> Mauro
>
[-- Attachment #2: test.c --]
[-- Type: text/x-csrc, Size: 5821 bytes --]
// SPDX-License-Identifier: GPL-2.0+
/*
 * This test program runs on the host and takes the GPA printed by
 * 'victim -d' in the guest. The GPA is translated to an HPA, and a
 * recoverable error is injected at that HPA automatically.
 *
 * NOTE: We assume that the guest has only one NUMA node and that its
 * memory capacity is 4GB. The test program won't work if either
 * assumption is broken.
 *
 * Author: Gavin Shan <gshan@redhat.com>
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <fcntl.h>
#include <dirent.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <sys/wait.h>
#define TEST_GUEST_MEM_SIZE     0x100000000 /* 4GB */
#define TEST_GUEST_MEM_START    0x040000000 /* 1GB, base of guest RAM */
#define TEST_INJECT_ERROR_TYPE  0x10        /* EINJ: memory uncorrectable non-fatal */
struct test_struct {
    int pid;
    unsigned long guest_mem_size;
    unsigned long gpa;
    unsigned long hva;
    unsigned long hpa;
};
static void usage(void)
{
    fprintf(stdout, "\n");
    fprintf(stdout, "./test <gpa>\n");
    fprintf(stdout, "gpa The GPA (Guest Physical Address) where the error is injected\n");
    fprintf(stdout, "\n");
}
static void init_test_struct(struct test_struct *test)
{
    test->pid = -1;
    test->guest_mem_size = TEST_GUEST_MEM_SIZE;
    test->gpa = -1UL;
    test->hva = -1UL;
    test->hpa = -1UL;
}
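static int fetch_gpa(struct test_struct *test, int argc, char **argv)
{
    char *end;

    if (argc != 2) {
        usage();
        return -EINVAL;
    }

    /* Reject empty input, trailing garbage and overflow */
    errno = 0;
    test->gpa = strtoul(argv[1], &end, 16);
    if (errno || end == argv[1] || *end != '\0') {
        fprintf(stderr, "%s: Invalid GPA <%s>\n", __func__, argv[1]);
        return -EINVAL;
    }

    /* Guest RAM covers [start, start + size), so the upper bound is exclusive */
    if (test->gpa < TEST_GUEST_MEM_START ||
        test->gpa >= (TEST_GUEST_MEM_START + test->guest_mem_size)) {
        fprintf(stderr, "%s: GPA 0x%lx out of range [1GB, 1GB+0x%lx)\n",
                __func__, test->gpa, test->guest_mem_size);
        return -EINVAL;
    }

    return 0;
}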
static int find_qemu_pid(struct test_struct *test)
{
    DIR *dir;
    FILE *fp;
    struct dirent *entry;
    char path[256], data[256];
    size_t sz;
    int ret = -ENODEV;

    dir = opendir("/proc");
    if (!dir) {
        fprintf(stderr, "%s: unable to open </proc>\n", __func__);
        return -EIO;
    }

    /* Use the first process whose command name contains "qemu" */
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_type != DT_DIR || entry->d_name[0] == '.')
            continue;

        memset(path, 0, sizeof(path));
        snprintf(path, sizeof(path), "/proc/%s/comm", entry->d_name);
        fp = fopen(path, "r");
        if (!fp)
            continue;

        /* Leave room for the terminating NUL so strstr() is safe */
        memset(data, 0, sizeof(data));
        sz = fread(data, 1, sizeof(data) - 1, fp);
        fclose(fp);
        if (sz == 0)
            continue;

        if (strstr(data, "qemu")) {
            ret = 0;
            test->pid = atoi(entry->d_name);
            break;
        }
    }

    if (ret != 0)
        fprintf(stderr, "%s: Unable to find QEMU PID\n", __func__);

    closedir(dir);
    return ret;
}
static int fetch_hva(struct test_struct *test)
{
    FILE *fp;
    char filename[64], *data = NULL, *next, *next1;
    unsigned long start, end;
    size_t len = 0;
    int ret = -EIO;

    memset(filename, 0, sizeof(filename));
    snprintf(filename, sizeof(filename), "/proc/%d/smaps", test->pid);
    fp = fopen(filename, "r");
    if (!fp) {
        fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename);
        return ret;
    }

    /* Look for the private, writable mapping whose size matches guest RAM */
    while (getline(&data, &len, fp) != -1) {
        if (!strstr(data, "rw-p"))
            continue;

        /* The line starts with "<start>-<end> <perms> ..." */
        next = strchr(data, '-');
        if (!next)
            continue;
        *next++ = '\0';

        next1 = strchr(next, ' ');
        if (!next1)
            continue;
        *next1 = '\0';

        start = strtoul(data, NULL, 16);
        end = strtoul(next, NULL, 16);
        if (end - start == test->guest_mem_size) {
            ret = 0;
            test->hva = start + (test->gpa - TEST_GUEST_MEM_START);
            break;
        }
    }

    if (ret != 0)
        fprintf(stderr, "%s: Unable to find the guest memory mapping\n", __func__);

    free(data);
    fclose(fp);
    return ret;
}
static int fetch_hpa(struct test_struct *test)
{
    int fd;
    unsigned long pinfo, pfn, pgsize = getpagesize();
    /* pagemap holds one 64-bit entry per virtual page */
    off_t offset = (test->hva / pgsize) * sizeof(pinfo);
    char filename[128];
    ssize_t sz;

    memset(filename, 0, sizeof(filename));
    snprintf(filename, sizeof(filename), "/proc/%d/pagemap", test->pid);
    fd = open(filename, O_RDONLY);
    if (fd < 0) {
        fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename);
        return -EIO;
    }

    sz = pread(fd, &pinfo, sizeof(pinfo), offset);
    close(fd);
    if (sz != sizeof(pinfo)) {
        fprintf(stderr, "%s: Unable to read from <%s>\n", __func__, filename);
        return -EIO;
    }

    /* Bit 63: page present in RAM */
    if (!(pinfo & (1UL << 63))) {
        fprintf(stderr, "%s: Page not present\n", __func__);
        return -EINVAL;
    }

    /* Bits 0-54: PFN, reported as zero without CAP_SYS_ADMIN */
    pfn = pinfo & 0x007fffffffffffffUL;
    if (!pfn) {
        fprintf(stderr, "%s: PFN hidden, run as root\n", __func__);
        return -EPERM;
    }

    test->hpa = (pfn * pgsize) + (test->hva & (pgsize - 1));
    return 0;
}
static int write_file(const char *filename, unsigned long val)
{
    int fd;
    char data[128];
    int len;
    int ret = 0;

    memset(data, 0, sizeof(data));
    len = snprintf(data, sizeof(data), "0x%lx", val);

    fd = open(filename, O_WRONLY);
    if (fd < 0) {
        fprintf(stderr, "%s: Unable to open <%s>\n", __func__, filename);
        return -EIO;
    }

    /* A short write counts as a failure */
    if (write(fd, data, len) != (ssize_t)len) {
        ret = -EIO;
        fprintf(stderr, "%s: Unable to write <%s>\n", __func__, filename);
    }

    close(fd);
    return ret;
}
static int inject_error(struct test_struct *test)
{
    fprintf(stdout, "pid: %d\n", test->pid);
    fprintf(stdout, "gpa: 0x%lx\n", test->gpa);
    fprintf(stdout, "hva: 0x%lx\n", test->hva);
    fprintf(stdout, "hpa: 0x%lx\n", test->hpa);

    system("modprobe einj > /dev/null");

    /*
     * param1 is the target physical address and param2 the address mask
     * (page granularity). With notrigger set, the error is injected but
     * not triggered here; it is consumed when the guest touches the page.
     */
    if (write_file("/sys/kernel/debug/apei/einj/param1", test->hpa) ||
        write_file("/sys/kernel/debug/apei/einj/param2", 0xfffffffffffff000) ||
        write_file("/sys/kernel/debug/apei/einj/flags", 0x0) ||
        write_file("/sys/kernel/debug/apei/einj/error_type", TEST_INJECT_ERROR_TYPE) ||
        write_file("/sys/kernel/debug/apei/einj/notrigger", 1) ||
        write_file("/sys/kernel/debug/apei/einj/error_inject", 1))
        return -EIO;

    return 0;
}
int main(int argc, char **argv)
{
    struct test_struct test;

    init_test_struct(&test);

    /* Each step depends on the previous one; stop at the first failure */
    if (fetch_gpa(&test, argc, argv) ||
        find_qemu_pid(&test) ||
        fetch_hva(&test) ||
        fetch_hpa(&test) ||
        inject_error(&test))
        return EXIT_FAILURE;

    return EXIT_SUCCESS;
}
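The two address translations in test.c (GPA to HVA through the QEMU memory mapping, then HVA to HPA through /proc/<pid>/pagemap) come down to a few lines of arithmetic. The following is a minimal sketch of just that arithmetic, separated from the file I/O so it can be checked in isolation. The function and macro names are hypothetical and not part of the attachment, and the RAM base of 0x40000000 is assumed to match TEST_GUEST_MEM_START.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: guest RAM starts at 1GB, same value as TEST_GUEST_MEM_START. */
#define SKETCH_GUEST_RAM_BASE UINT64_C(0x40000000)
/* pagemap entry layout: bit 63 = page present, bits 0-54 = PFN. */
#define SKETCH_PM_PRESENT     (UINT64_C(1) << 63)
#define SKETCH_PM_PFN_MASK    ((UINT64_C(1) << 55) - 1)

/* GPA -> HVA, given the HVA at which QEMU mapped the start of guest RAM. */
static uint64_t gpa_to_hva(uint64_t ram_hva_base, uint64_t gpa)
{
    assert(gpa >= SKETCH_GUEST_RAM_BASE);
    return ram_hva_base + (gpa - SKETCH_GUEST_RAM_BASE);
}

/* File offset of the 64-bit pagemap entry for the page holding hva. */
static uint64_t pagemap_offset(uint64_t hva, uint64_t pgsize)
{
    return (hva / pgsize) * sizeof(uint64_t);
}

/*
 * Decode one pagemap entry. Returns 0 and stores the HPA in *hpa, or -1
 * if the page is not present or the PFN is hidden (reported as zero).
 */
static int pagemap_to_hpa(uint64_t entry, uint64_t hva, uint64_t pgsize,
                          uint64_t *hpa)
{
    uint64_t pfn = entry & SKETCH_PM_PFN_MASK;

    if (!(entry & SKETCH_PM_PRESENT) || pfn == 0)
        return -1;

    *hpa = pfn * pgsize + (hva & (pgsize - 1));
    return 0;
}
```

The zero-PFN check matters in practice: on recent kernels, pagemap reports the PFN as zero to callers without CAP_SYS_ADMIN, so running the tool as a regular user would otherwise produce a bogus HPA.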