From: William Roche <william.roche@oracle.com>
To: Peter Xu <peterx@redhat.com>, david@redhat.com
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-arm@nongnu.org,
	pbonzini@redhat.com, richard.henderson@linaro.org,
	philmd@linaro.org, peter.maydell@linaro.org,
	joao.m.martins@oracle.com
Subject: Re: [PATCH v8 0/3] Poisoned memory recovery on reboot
Date: Thu, 13 Feb 2025 20:35:09 +0100
Message-ID: <6e8aedfc-f270-4fa8-a1d3-df0389e505cb@oracle.com>
In-Reply-To: <Z6vQvr4dCCsBR2sX@x1.local>

On 2/11/25 23:35, Peter Xu wrote:
> On Tue, Feb 11, 2025 at 09:27:04PM +0000, William Roche wrote:
>> From: William Roche <william.roche@oracle.com>
>>
>> Here is a very simplified version of my fix only dealing with the
>> recovery of huge pages on VM reset.
>>   ---
>> This set of patches fixes an existing bug with hardware memory errors
>> impacting hugetlbfs memory backed VMs and its recovery on VM reset.
>> When using hugetlbfs large pages, any large page location being impacted
>> by an HW memory error results in poisoning the entire page, suddenly
>> making a large chunk of the VM memory unusable.
>>
>> The main problem that currently exists in Qemu is the lack of backend
>> file repair before resetting the VM memory, resulting in the impacted
>> memory to be silently unusable even after a VM reboot.
>>
>> In order to fix this issue, we take into account the page size of the
>> impacted memory block when dealing with the associated poisoned page
>> location.
>>
>> Using the page size information we also try to regenerate the memory
>> calling ram_block_discard_range() on VM reset when running
>> qemu_ram_remap(). So that a poisoned memory backed by a hugetlbfs
>> file is regenerated with a hole punched in this file. A new page is
>> loaded when the location is first touched.  In case of a discard
>> failure we fall back to remapping the memory location.
>>
>> But we currently don't reset the memory settings and the 'prealloc'
>> attribute is ignored after the remap from the file backend.
> 
> queued patch 1-2, thanks.
> 

Thank you very much Peter, and thanks to David too!

In my view, ARM needs more than just the error injection messages.
For example, the loop of errors that can appear during kdump when
dealing with large pages is a real problem, as it hangs the VM.

There is also the remap notification (to better deal with the 'prealloc'
attribute, for example) that needs to be implemented now.

And finally, there is the kernel KVM enhancement needed on x86 to return
a more accurate SIGBUS siginfo.si_addr_lsb value for memory errors on
large pages. QEMU could then take this information into account to
provide more useful feedback about the 'failed' memory size.
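
As a rough illustration of that last point (a sketch only, not code from
these patches): si_addr_lsb gives the least significant valid bit of the
reported address, so the size of the failed range is 1 << si_addr_lsb.
The helper names below are hypothetical and are not existing QEMU
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: size in bytes of the memory range reported as
 * failed, derived from the si_addr_lsb field of a SIGBUS siginfo.
 * An lsb of 12 means a 4 KiB range, an lsb of 21 a 2 MiB range.
 */
static size_t failed_size_from_lsb(unsigned lsb)
{
    /* Guard against shifting past the width of size_t. */
    assert(lsb < sizeof(size_t) * 8);
    return (size_t)1 << lsb;
}

/*
 * Hypothetical helper: start of the failed range, obtained by aligning
 * the reported address (si_addr) down to the failed size.
 */
static uintptr_t failed_range_start(uintptr_t addr, unsigned lsb)
{
    return addr & ~((uintptr_t)failed_size_from_lsb(lsb) - 1);
}
```

With an accurate si_addr_lsb on a 2M hugetlbfs page, such a computation
would report the whole 2 MiB as failed instead of a single 4 KiB page.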

I don't know yet when I'll be able to come back to these
problems, but at least we have the recovery of large pages mostly fixed
with the 2 patches queued.

Thanks again,
William.

