* [PATCH v8 1/3] system/physmem: handle hugetlb correctly in qemu_ram_remap()
2025-02-11 21:27 [PATCH v8 0/3] Poisoned memory recovery on reboot William Roche
@ 2025-02-11 21:27 ` William Roche
2025-02-11 21:27 ` [PATCH v8 2/3] system/physmem: poisoned memory discard on reboot William Roche
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: William Roche @ 2025-02-11 21:27 UTC (permalink / raw)
To: peterx, david, kvm, qemu-devel, qemu-arm
Cc: william.roche, pbonzini, richard.henderson, philmd, peter.maydell,
joao.m.martins
From: William Roche <william.roche@oracle.com>
The list of hwpoison pages used to remap the memory on reset
is based on the backend's real page size.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page, as hugetlb pages cannot be partially mapped.
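For illustration, the idea boils down to something like the hypothetical
helper below (simplified flag handling; the real logic, including the
RAM_SHARED/RAM_READONLY flags, lives in qemu_ram_remap() in the diff):

#include <stdint.h>
#include <sys/mman.h>

/* Remap one poisoned location, always covering a full backend page. */
static void *remap_full_page(void *block_host, int fd, uint64_t fd_offset,
                             uint64_t offset, size_t page_size)
{
    offset -= offset % page_size;              /* align to the huge page */
    void *vaddr = (char *)block_host + offset; /* host address to replace */

    if (fd >= 0) {
        /* file-backed (e.g. hugetlbfs): map the whole page back from the file */
        return mmap(vaddr, page_size, PROT_READ | PROT_WRITE,
                    MAP_FIXED | MAP_SHARED, fd, fd_offset + offset);
    }
    /* anonymous memory: replace the whole page with fresh zeroed memory */
    return mmap(vaddr, page_size, PROT_READ | PROT_WRITE,
                MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

A caller would compare the return value against vaddr and abort on a
mismatch, as the patch does.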
Signed-off-by: William Roche <william.roche@oracle.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
---
accel/kvm/kvm-all.c | 2 +-
include/exec/cpu-common.h | 2 +-
system/physmem.c | 38 +++++++++++++++++++++++++++++---------
3 files changed, 31 insertions(+), 11 deletions(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index c65b790433..f89568bfa3 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1288,7 +1288,7 @@ static void kvm_unpoison_all(void *param)
QLIST_FOREACH_SAFE(page, &hwpoison_page_list, list, next_page) {
QLIST_REMOVE(page, list);
- qemu_ram_remap(page->ram_addr, TARGET_PAGE_SIZE);
+ qemu_ram_remap(page->ram_addr);
g_free(page);
}
}
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index b1d76d6985..3771b2130c 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -67,7 +67,7 @@ typedef uintptr_t ram_addr_t;
/* memory API */
-void qemu_ram_remap(ram_addr_t addr, ram_addr_t length);
+void qemu_ram_remap(ram_addr_t addr);
/* This should not be used by devices. */
ram_addr_t qemu_ram_addr_from_host(void *ptr);
ram_addr_t qemu_ram_addr_from_host_nofail(void *ptr);
diff --git a/system/physmem.c b/system/physmem.c
index 67c9db9daa..a5d848b350 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2275,17 +2275,35 @@ void qemu_ram_free(RAMBlock *block)
}
#ifndef _WIN32
-void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
+/*
+ * qemu_ram_remap - remap a single RAM page
+ *
+ * @addr: address in ram_addr_t address space.
+ *
+ * This function will try remapping a single page of guest RAM identified by
+ * @addr, essentially discarding memory to recover from previously poisoned
+ * memory (MCE). The page size depends on the RAMBlock (i.e., hugetlb). @addr
+ * does not have to point at the start of the page.
+ *
+ * This function is only to be used during system resets; it will kill the
+ * VM if remapping failed.
+ */
+void qemu_ram_remap(ram_addr_t addr)
{
RAMBlock *block;
- ram_addr_t offset;
+ uint64_t offset;
int flags;
void *area, *vaddr;
int prot;
+ size_t page_size;
RAMBLOCK_FOREACH(block) {
offset = addr - block->offset;
if (offset < block->max_length) {
+ /* Respect the pagesize of our RAMBlock */
+ page_size = qemu_ram_pagesize(block);
+ offset = QEMU_ALIGN_DOWN(offset, page_size);
+
vaddr = ramblock_ptr(block, offset);
if (block->flags & RAM_PREALLOC) {
;
@@ -2299,21 +2317,23 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
prot = PROT_READ;
prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
if (block->fd >= 0) {
- area = mmap(vaddr, length, prot, flags, block->fd,
+ area = mmap(vaddr, page_size, prot, flags, block->fd,
offset + block->fd_offset);
} else {
flags |= MAP_ANONYMOUS;
- area = mmap(vaddr, length, prot, flags, -1, 0);
+ area = mmap(vaddr, page_size, prot, flags, -1, 0);
}
if (area != vaddr) {
- error_report("Could not remap addr: "
- RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
- length, addr);
+ error_report("Could not remap RAM %s:%" PRIx64 "+%" PRIx64
+ " +%zx", block->idstr, offset,
+ block->fd_offset, page_size);
exit(1);
}
- memory_try_enable_merging(vaddr, length);
- qemu_ram_setup_dump(vaddr, length);
+ memory_try_enable_merging(vaddr, page_size);
+ qemu_ram_setup_dump(vaddr, page_size);
}
+
+ break;
}
}
}
--
2.43.5
* [PATCH v8 2/3] system/physmem: poisoned memory discard on reboot
2025-02-11 21:27 [PATCH v8 0/3] Poisoned memory recovery on reboot William Roche
2025-02-11 21:27 ` [PATCH v8 1/3] system/physmem: handle hugetlb correctly in qemu_ram_remap() William Roche
@ 2025-02-11 21:27 ` William Roche
2025-02-11 21:27 ` [PATCH v8 3/3] target/arm/kvm: Report memory errors injection William Roche
2025-02-11 22:35 ` [PATCH v8 0/3] Poisoned memory recovery on reboot Peter Xu
3 siblings, 0 replies; 7+ messages in thread
From: William Roche @ 2025-02-11 21:27 UTC (permalink / raw)
To: peterx, david, kvm, qemu-devel, qemu-arm
Cc: william.roche, pbonzini, richard.henderson, philmd, peter.maydell,
joao.m.martins
From: William Roche <william.roche@oracle.com>
Repair poisoned memory location(s) by calling ram_block_discard_range():
punching a hole in the backend file when necessary and regenerating
usable memory.
If the kernel doesn't support the madvise calls used by this function
and we are dealing with anonymous memory, fall back to remapping the
location(s).
Signed-off-by: William Roche <william.roche@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
system/physmem.c | 57 ++++++++++++++++++++++++++++++------------------
1 file changed, 36 insertions(+), 21 deletions(-)
diff --git a/system/physmem.c b/system/physmem.c
index a5d848b350..5d97a5fe11 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -2275,6 +2275,23 @@ void qemu_ram_free(RAMBlock *block)
}
#ifndef _WIN32
+/* Simply remap the given VM memory location from start to start+length */
+static int qemu_ram_remap_mmap(RAMBlock *block, uint64_t start, size_t length)
+{
+ int flags, prot;
+ void *area;
+ void *host_startaddr = block->host + start;
+
+ assert(block->fd < 0);
+ flags = MAP_FIXED | MAP_ANONYMOUS;
+ flags |= block->flags & RAM_SHARED ? MAP_SHARED : MAP_PRIVATE;
+ flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
+ prot = PROT_READ;
+ prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
+ area = mmap(host_startaddr, length, prot, flags, -1, 0);
+ return area != host_startaddr ? -errno : 0;
+}
+
/*
* qemu_ram_remap - remap a single RAM page
*
@@ -2292,9 +2309,7 @@ void qemu_ram_remap(ram_addr_t addr)
{
RAMBlock *block;
uint64_t offset;
- int flags;
- void *area, *vaddr;
- int prot;
+ void *vaddr;
size_t page_size;
RAMBLOCK_FOREACH(block) {
@@ -2310,24 +2325,24 @@ void qemu_ram_remap(ram_addr_t addr)
} else if (xen_enabled()) {
abort();
} else {
- flags = MAP_FIXED;
- flags |= block->flags & RAM_SHARED ?
- MAP_SHARED : MAP_PRIVATE;
- flags |= block->flags & RAM_NORESERVE ? MAP_NORESERVE : 0;
- prot = PROT_READ;
- prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
- if (block->fd >= 0) {
- area = mmap(vaddr, page_size, prot, flags, block->fd,
- offset + block->fd_offset);
- } else {
- flags |= MAP_ANONYMOUS;
- area = mmap(vaddr, page_size, prot, flags, -1, 0);
- }
- if (area != vaddr) {
- error_report("Could not remap RAM %s:%" PRIx64 "+%" PRIx64
- " +%zx", block->idstr, offset,
- block->fd_offset, page_size);
- exit(1);
+ if (ram_block_discard_range(block, offset, page_size) != 0) {
+ /*
+ * Fall back to using mmap() only for anonymous mapping,
+ * as if a backing file is associated we may not be able
+ * to recover the memory in all cases.
+ * So don't take the risk of using only mmap and fail now.
+ */
+ if (block->fd >= 0) {
+ error_report("Could not remap RAM %s:%" PRIx64 "+%"
+ PRIx64 " +%zx", block->idstr, offset,
+ block->fd_offset, page_size);
+ exit(1);
+ }
+ if (qemu_ram_remap_mmap(block, offset, page_size) != 0) {
+ error_report("Could not remap RAM %s:%" PRIx64 " +%zx",
+ block->idstr, offset, page_size);
+ exit(1);
+ }
}
memory_try_enable_merging(vaddr, page_size);
qemu_ram_setup_dump(vaddr, page_size);
--
2.43.5
* [PATCH v8 3/3] target/arm/kvm: Report memory errors injection
2025-02-11 21:27 [PATCH v8 0/3] Poisoned memory recovery on reboot William Roche
2025-02-11 21:27 ` [PATCH v8 1/3] system/physmem: handle hugetlb correctly in qemu_ram_remap() William Roche
2025-02-11 21:27 ` [PATCH v8 2/3] system/physmem: poisoned memory discard on reboot William Roche
@ 2025-02-11 21:27 ` William Roche
2025-02-11 22:35 ` [PATCH v8 0/3] Poisoned memory recovery on reboot Peter Xu
3 siblings, 0 replies; 7+ messages in thread
From: William Roche @ 2025-02-11 21:27 UTC (permalink / raw)
To: peterx, david, kvm, qemu-devel, qemu-arm
Cc: william.roche, pbonzini, richard.henderson, philmd, peter.maydell,
joao.m.martins
From: William Roche <william.roche@oracle.com>
Generate an error injection message, similar to the x86 one, on
RAS-enabled ARM platforms.
QEMU on ARM only deals with action-required memory errors signaled with
SIGBUS/BUS_MCEERR_AR, and will report a message for every memory error
relayed to the VM. A message like:
Guest Memory Error at QEMU addr 0xXXX and GUEST addr 0xYYY of type BUS_MCEERR_AR injected
Signed-off-by: William Roche <william.roche@oracle.com>
---
target/arm/kvm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index da30bdbb23..d9dedc6d74 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -2389,6 +2389,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, void *addr)
kvm_cpu_synchronize_state(c);
if (!acpi_ghes_memory_errors(ACPI_HEST_SRC_ID_SEA, paddr)) {
kvm_inject_arm_sea(c);
+ error_report("Guest Memory Error at QEMU addr %p and "
+ "GUEST addr 0x%" HWADDR_PRIx " of type %s injected",
+ addr, paddr, "BUS_MCEERR_AR");
} else {
error_report("failed to record the error");
abort();
--
2.43.5
* Re: [PATCH v8 0/3] Poisoned memory recovery on reboot
2025-02-11 21:27 [PATCH v8 0/3] Poisoned memory recovery on reboot William Roche
` (2 preceding siblings ...)
2025-02-11 21:27 ` [PATCH v8 3/3] target/arm/kvm: Report memory errors injection William Roche
@ 2025-02-11 22:35 ` Peter Xu
2025-02-13 19:35 ` William Roche
3 siblings, 1 reply; 7+ messages in thread
From: Peter Xu @ 2025-02-11 22:35 UTC (permalink / raw)
To: William Roche
Cc: david, kvm, qemu-devel, qemu-arm, pbonzini, richard.henderson,
philmd, peter.maydell, joao.m.martins
On Tue, Feb 11, 2025 at 09:27:04PM +0000, William Roche wrote:
> From: William Roche <william.roche@oracle.com>
>
> Here is a very simplified version of my fix only dealing with the
> recovery of huge pages on VM reset.
> ---
> This set of patches fixes an existing bug with hardware memory errors
> impacting hugetlbfs memory backed VMs and its recovery on VM reset.
> When using hugetlbfs large pages, any large page location being impacted
> by an HW memory error results in poisoning the entire page, suddenly
> making a large chunk of the VM memory unusable.
>
> The main problem that currently exists in Qemu is the lack of backend
> file repair before resetting the VM memory, resulting in the impacted
> memory being silently unusable even after a VM reboot.
>
> In order to fix this issue, we take into account the page size of the
> impacted memory block when dealing with the associated poisoned page
> location.
>
> Using the page size information we also try to regenerate the memory
> calling ram_block_discard_range() on VM reset when running
> qemu_ram_remap(), so that poisoned memory backed by a hugetlbfs
> file is regenerated with a hole punched in this file. A new page is
> loaded when the location is first touched. In case of a discard
> failure we fall back to remapping the memory location.
>
> But we currently don't reset the memory settings and the 'prealloc'
> attribute is ignored after the remap from the file backend.
queued patch 1-2, thanks.
--
Peter Xu
* Re: [PATCH v8 0/3] Poisoned memory recovery on reboot
2025-02-11 22:35 ` [PATCH v8 0/3] Poisoned memory recovery on reboot Peter Xu
@ 2025-02-13 19:35 ` William Roche
2025-02-13 20:58 ` Peter Xu
0 siblings, 1 reply; 7+ messages in thread
From: William Roche @ 2025-02-13 19:35 UTC (permalink / raw)
To: Peter Xu, david
Cc: kvm, qemu-devel, qemu-arm, pbonzini, richard.henderson, philmd,
peter.maydell, joao.m.martins
On 2/11/25 23:35, Peter Xu wrote:
> On Tue, Feb 11, 2025 at 09:27:04PM +0000, William Roche wrote:
>> From: William Roche <william.roche@oracle.com>
>>
>> Here is a very simplified version of my fix only dealing with the
>> recovery of huge pages on VM reset.
>> ---
>> This set of patches fixes an existing bug with hardware memory errors
>> impacting hugetlbfs memory backed VMs and its recovery on VM reset.
>> When using hugetlbfs large pages, any large page location being impacted
>> by an HW memory error results in poisoning the entire page, suddenly
>> making a large chunk of the VM memory unusable.
>>
>> The main problem that currently exists in Qemu is the lack of backend
>> file repair before resetting the VM memory, resulting in the impacted
>> memory being silently unusable even after a VM reboot.
>>
>> In order to fix this issue, we take into account the page size of the
>> impacted memory block when dealing with the associated poisoned page
>> location.
>>
>> Using the page size information we also try to regenerate the memory
>> calling ram_block_discard_range() on VM reset when running
>> qemu_ram_remap(), so that poisoned memory backed by a hugetlbfs
>> file is regenerated with a hole punched in this file. A new page is
>> loaded when the location is first touched. In case of a discard
>> failure we fall back to remapping the memory location.
>>
>> But we currently don't reset the memory settings and the 'prealloc'
>> attribute is ignored after the remap from the file backend.
>
> queued patch 1-2, thanks.
>
Thank you very much Peter, and thanks to David too!
In my opinion, ARM needs more than just error injection messages.
For example, the loop of errors that can appear during kdump when
dealing with large pages is a real problem, hanging a VM.
There is also the remap notification (to better deal with the 'prealloc'
attribute, for example) that needs to be implemented now.
And finally, there is the kernel KVM enhancement needed on x86 to return
a more accurate SIGBUS siginfo.si_addr_lsb value on large-page memory
errors. QEMU could then take this information into account to provide
more useful feedback about the 'failed' memory size.
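As a side note, userspace can already derive the affected size from that
field today; a rough, hypothetical sketch (handler registered with
sigaction() and SA_SIGINFO; fprintf is used for illustration only, it is
not async-signal-safe):

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>

static void sigbus_handler(int sig, siginfo_t *si, void *ctx)
{
    if (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO) {
        /* si_addr_lsb is log2 of the range the kernel reports as failed */
        size_t failed_size = (size_t)1 << si->si_addr_lsb;
        fprintf(stderr, "memory error at %p, 0x%zx bytes reported failed\n",
                si->si_addr, failed_size);
    }
}

The enhancement mentioned above would make that value reflect the actual
failed (huge) page size.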
I don't know yet when I'll have the possibility to come back to these
problems, but at least we have the recovery of large pages mostly fixed
with the 2 patches queued.
Thanks again,
William.
* Re: [PATCH v8 0/3] Poisoned memory recovery on reboot
2025-02-13 19:35 ` William Roche
@ 2025-02-13 20:58 ` Peter Xu
0 siblings, 0 replies; 7+ messages in thread
From: Peter Xu @ 2025-02-13 20:58 UTC (permalink / raw)
To: William Roche
Cc: david, kvm, qemu-devel, qemu-arm, pbonzini, richard.henderson,
philmd, peter.maydell, joao.m.martins, Eric Auger
On Thu, Feb 13, 2025 at 08:35:09PM +0100, William Roche wrote:
> On 2/11/25 23:35, Peter Xu wrote:
> > On Tue, Feb 11, 2025 at 09:27:04PM +0000, William Roche wrote:
> > > From: William Roche <william.roche@oracle.com>
> > >
> > > Here is a very simplified version of my fix only dealing with the
> > > recovery of huge pages on VM reset.
> > > ---
> > > This set of patches fixes an existing bug with hardware memory errors
> > > impacting hugetlbfs memory backed VMs and its recovery on VM reset.
> > > When using hugetlbfs large pages, any large page location being impacted
> > > by an HW memory error results in poisoning the entire page, suddenly
> > > making a large chunk of the VM memory unusable.
> > >
> > > The main problem that currently exists in Qemu is the lack of backend
> > > file repair before resetting the VM memory, resulting in the impacted
> > > memory being silently unusable even after a VM reboot.
> > >
> > > In order to fix this issue, we take into account the page size of the
> > > impacted memory block when dealing with the associated poisoned page
> > > location.
> > >
> > > Using the page size information we also try to regenerate the memory
> > > calling ram_block_discard_range() on VM reset when running
> > > qemu_ram_remap(), so that poisoned memory backed by a hugetlbfs
> > > file is regenerated with a hole punched in this file. A new page is
> > > loaded when the location is first touched. In case of a discard
> > > failure we fall back to remapping the memory location.
> > >
> > > But we currently don't reset the memory settings and the 'prealloc'
> > > attribute is ignored after the remap from the file backend.
> >
> > queued patch 1-2, thanks.
> >
>
> Thank you very much Peter, and thanks to David too !
>
> According to me, ARM needs more than only error injection messages.
> For example, the loop of errors that can appear during kdump when dealing
> with large pages is a real problem, hanging a VM.
Maybe it'll be better to post the ARM patches separately so that they can
attract more attention from ARM developers specifically.
I see that PeterM is in the loop; besides, you could also try to copy Eric
Auger <eric.auger@redhat.com>. I have Eric copied here. I'm not sure whether
Eric or the team would be interested in this too at some point.
>
> There is also the remap notification (to better deal with 'prealloc'
> attribute for example) that needs to be implemented now.
Right, this one should be easy, IMHO. And if prealloc is the only concern,
I hope the impact is low; I mean, prefaulting hugetlb pages is less
important compared to small pages - faulting in one 1G page which used to
be poisoned may not even be detectable from the guest as long as there is
still a free 1G page available.
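(As a rough, hypothetical illustration of what prefaulting a remapped
range means; newer Linux kernels also offer madvise(MADV_POPULATE_WRITE)
for the same purpose:)

#include <stddef.h>

/* Touch every page of the range up front so no fault happens at runtime. */
static void prefault_range(char *start, size_t length, size_t page_size)
{
    for (size_t off = 0; off < length; off += page_size) {
        volatile char *p = (volatile char *)start + off;
        *p = *p;    /* read then write back one byte to fault the page in */
    }
}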
>
> And finally the kernel KVM enhancement needed on x86 to return a more
> accurate SIGBUS siginfo.si_addr_lsb value on large pages memory errors.
> Qemu could than take this information into account to provide more useful
> feedback about the 'failed' memory size.
I confess this isn't high priority for now. Until hugetlb pages can be
split, this doesn't provide much help indeed, as QEMU knows whatever the
kernel knows.
It will become more important once someone allows hugetlb pages to be
split, so that poison may only affect 4k out of 1G. In that case the
report must point at the 4k part of the 1G page. We'll see how that works
out in the end; we may need some new mm interface to tell userspace that
"when I say 4k is poisoned, it's really only that 4k...", or something
like that.
>
> I don't know yet when I'll have the possibility to come back to these
> problems, but at least we have the recovery of large pages mostly fixed with
> the 2 patches queued.
Yes, thanks for fixing it.
--
Peter Xu