From: William Roche <william.roche@oracle.com>
To: David Hildenbrand <david@redhat.com>,
kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-arm@nongnu.org
Cc: peterx@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, philmd@linaro.org,
peter.maydell@linaro.org, mtosatti@redhat.com,
imammedo@redhat.com, eduardo@habkost.net,
marcel.apfelbaum@gmail.com, wangyanan55@huawei.com,
zhao1.liu@intel.com, joao.m.martins@oracle.com
Subject: Re: [PATCH v2 2/7] system/physmem: poisoned memory discard on reboot
Date: Tue, 12 Nov 2024 19:17:31 +0100 [thread overview]
Message-ID: <a79a2639-a6f1-4ca1-9b12-d4e125d894d4@oracle.com> (raw)
In-Reply-To: <b0e80857-b9cb-4e93-81bd-93e8dc4b1d51@redhat.com>
On 11/12/24 12:07, David Hildenbrand wrote:
> On 07.11.24 11:21, “William Roche wrote:
>> From: William Roche <william.roche@oracle.com>
>>
>> We take into account the recorded page sizes to repair the
>> memory locations, calling ram_block_discard_range() to punch a hole
>> in the backend file when necessary and regenerate a usable memory.
>> Fall back to unmap/remap the memory location(s) if the kernel doesn't
>> support the madvise calls used by ram_block_discard_range().
>>
>> Hugetlbfs poison case is also taken into account as a hole punch
>> with fallocate will reload a new page when first touched.
>>
>> Signed-off-by: William Roche <william.roche@oracle.com>
>> ---
>> system/physmem.c | 50 +++++++++++++++++++++++++++++-------------------
>> 1 file changed, 30 insertions(+), 20 deletions(-)
>>
>> diff --git a/system/physmem.c b/system/physmem.c
>> index 750604d47d..dfea120cc5 100644
>> --- a/system/physmem.c
>> +++ b/system/physmem.c
>> @@ -2197,27 +2197,37 @@ void qemu_ram_remap(ram_addr_t addr,
>> ram_addr_t length)
>> } else if (xen_enabled()) {
>> abort();
>> } else {
>> - flags = MAP_FIXED;
>> - flags |= block->flags & RAM_SHARED ?
>> - MAP_SHARED : MAP_PRIVATE;
>> - flags |= block->flags & RAM_NORESERVE ?
>> MAP_NORESERVE : 0;
>> - prot = PROT_READ;
>> - prot |= block->flags & RAM_READONLY ? 0 : PROT_WRITE;
>> - if (block->fd >= 0) {
>> - area = mmap(vaddr, length, prot, flags, block->fd,
>> - offset + block->fd_offset);
>> - } else {
>> - flags |= MAP_ANONYMOUS;
>> - area = mmap(vaddr, length, prot, flags, -1, 0);
>> - }
>> - if (area != vaddr) {
>> - error_report("Could not remap addr: "
>> - RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
>> - length, addr);
>> - exit(1);
>> + if (ram_block_discard_range(block, offset + block-
>> >fd_offset,
>> + length) != 0) {
>> + if (length > TARGET_PAGE_SIZE) {
>> + /* punch hole is mandatory on hugetlbfs */
>> + error_report("large page recovery failure
>> addr: "
>> + RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
>> + length, addr);
>> + exit(1);
>> + }
>
> For shared memory we really need it.
>
> Private file-backed is weird ... because we don't know if the shared or
> the private page is problematic ... :(
I agree with you, and we have to decide when should we bail out if
ram_block_discard_range() doesn't work.
According to me, if discard doesn't work and we are dealing with
file-backed largepages (shared or not) we have to exit, because the
fallocate is mandatory. It is the case with hugetlbfs.
In the non-file-backed case, or the file-backed non-largepage private
case, according to me we can trust the mmap() method to put everything
back in place for the VM reset to work as expected.
Are there aspects I don't see, and for which mmap + the remap handler is
not sufficient and we should also bail out here ?
>
> Maybe we should just do:
>
> if (block->fd >= 0) {
> /* mmap(MAP_FIXED) cannot reliably zap our problematic page. */
> error_report(...);
> exit(-1);
> }
>
> Or alternatively
>
> if (block->fd >= 0 && qemu_ram_is_shared(block)) {
> /* mmap() cannot possibly zap our problematic page. */
> error_report(...);
> exit(-1);
> } else if (block->fd >= 0) {
> /*
> * MAP_PRIVATE file-backed ... mmap() can only zap the private
> * page, not the shared one ... we don't know which one is
> * problematic.
> */
> warn_report(...);
> }
I also agree that any file-backed/shared case should bail out if discard
(fallocate) fails, no mater large or standard pages are used.
In the case of file-backed private standard pages, I think that a poison
on the private page can be fixed with a new mmap.
According to me, there are 2 cases to consider: at the moment the poison
is seen, the page was dirty (so it means that it was a pure private
page), or the page was not dirty, and in this case the poison could
replace this non-dirty page with a new copy of the file content.
In both cases, I'd say that the remap should clean up the poison.
So the conditions when discard fails, could be something like:
if (block->fd >= 0 && (qemu_ram_is_shared(block) ||
(length > TARGET_PAGE_SIZE))) {
/* punch hole is mandatory, mmap() cannot possibly zap our page*/
error_report("%spage recovery failure addr: "
RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
(length > TARGET_PAGE_SIZE) ? "large " : "",
length, addr);
exit(1);
}
>> + flags = MAP_FIXED;
>> + flags |= block->flags & RAM_SHARED ?
>> + MAP_SHARED : MAP_PRIVATE;
>> + flags |= block->flags & RAM_NORESERVE ?
>> MAP_NORESERVE : 0;
>> + prot = PROT_READ;
>> + prot |= block->flags & RAM_READONLY ? 0 :
>> PROT_WRITE;
>> + if (block->fd >= 0) {
>> + area = mmap(vaddr, length, prot, flags,
>> block->fd,
>> + offset + block->fd_offset);
>> + } else {
>> + flags |= MAP_ANONYMOUS;
>> + area = mmap(vaddr, length, prot, flags, -1, 0);
>> + }
>> + if (area != vaddr) {
>> + error_report("Could not remap addr: "
>> + RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
>> + length, addr);
>> + exit(1);
>> + }
>> + memory_try_enable_merging(vaddr, length);
>> + qemu_ram_setup_dump(vaddr, length);
>
> Can we factor the mmap hack out into a separate helper function to clean
> this up a bit?
Sure, I'll do that.
next prev parent reply other threads:[~2024-11-12 18:18 UTC|newest]
Thread overview: 119+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-09-10 9:07 [RFC 0/6] hugetlbfs largepage RAS project “William Roche
2024-09-10 9:07 ` [RFC 1/6] accel/kvm: SIGBUS handler should also deal with si_addr_lsb “William Roche
2024-09-10 9:07 ` [RFC 2/6] accel/kvm: Keep track of the HWPoisonPage sizes “William Roche
2024-09-10 9:07 ` [RFC 3/6] system/physmem: Remap memory pages on reset based on the page size “William Roche
2024-09-10 9:07 ` [RFC 4/6] system: Introducing hugetlbfs largepage RAS feature “William Roche
2024-09-10 9:07 ` [RFC 5/6] system/hugetlb_ras: Handle madvise SIGBUS signal on listener “William Roche
2024-09-10 9:07 ` [RFC 6/6] system/hugetlb_ras: Replay lost BUS_MCEERR_AO signals on VM resume “William Roche
2024-09-10 10:02 ` [RFC RESEND 0/6] hugetlbfs largepage RAS project “William Roche
2024-09-10 10:02 ` [RFC RESEND 1/6] accel/kvm: SIGBUS handler should also deal with si_addr_lsb “William Roche
2024-09-10 10:02 ` [RFC RESEND 2/6] accel/kvm: Keep track of the HWPoisonPage sizes “William Roche
2024-09-10 10:02 ` [RFC RESEND 3/6] system/physmem: Remap memory pages on reset based on the page size “William Roche
2024-09-10 10:02 ` [RFC RESEND 4/6] system: Introducing hugetlbfs largepage RAS feature “William Roche
2024-09-10 10:02 ` [RFC RESEND 5/6] system/hugetlb_ras: Handle madvise SIGBUS signal on listener “William Roche
2024-09-10 10:02 ` [RFC RESEND 6/6] system/hugetlb_ras: Replay lost BUS_MCEERR_AO signals on VM resume “William Roche
2024-09-10 11:36 ` [RFC RESEND 0/6] hugetlbfs largepage RAS project David Hildenbrand
2024-09-10 16:24 ` William Roche
2024-09-11 22:07 ` David Hildenbrand
2024-09-12 17:07 ` William Roche
2024-09-19 16:52 ` William Roche
2024-10-09 15:45 ` Peter Xu
2024-10-10 20:35 ` William Roche
2024-10-22 21:34 ` [PATCH v1 0/4] hugetlbfs memory HW error fixes “William Roche
2024-10-22 21:35 ` [PATCH v1 1/4] accel/kvm: SIGBUS handler should also deal with si_addr_lsb “William Roche
2024-10-22 21:35 ` [PATCH v1 2/4] accel/kvm: Keep track of the HWPoisonPage page_size “William Roche
2024-10-23 7:28 ` David Hildenbrand
2024-10-25 23:27 ` William Roche
2024-10-28 16:42 ` David Hildenbrand
2024-10-30 1:56 ` William Roche
2024-11-04 14:10 ` David Hildenbrand
2024-10-25 23:30 ` William Roche
2024-10-22 21:35 ` [PATCH v1 3/4] system/physmem: Largepage punch hole before reset of memory pages “William Roche
2024-10-23 7:30 ` David Hildenbrand
2024-10-25 23:27 ` William Roche
2024-10-28 17:01 ` David Hildenbrand
2024-10-30 1:56 ` William Roche
2024-11-04 13:30 ` David Hildenbrand
2024-11-07 10:21 ` [PATCH v2 0/7] hugetlbfs memory HW error fixes “William Roche
2024-11-07 10:21 ` [PATCH v2 1/7] accel/kvm: Keep track of the HWPoisonPage page_size “William Roche
2024-11-12 10:30 ` David Hildenbrand
2024-11-12 18:17 ` William Roche
2024-11-12 21:35 ` David Hildenbrand
2024-11-07 10:21 ` [PATCH v2 2/7] system/physmem: poisoned memory discard on reboot “William Roche
2024-11-12 11:07 ` David Hildenbrand
2024-11-12 18:17 ` William Roche [this message]
2024-11-12 22:06 ` David Hildenbrand
2024-11-07 10:21 ` [PATCH v2 3/7] accel/kvm: Report the loss of a large memory page “William Roche
2024-11-12 11:13 ` David Hildenbrand
2024-11-12 18:17 ` William Roche
2024-11-12 22:22 ` David Hildenbrand
2024-11-15 21:03 ` William Roche
2024-11-18 9:45 ` David Hildenbrand
2024-11-07 10:21 ` [PATCH v2 4/7] numa: Introduce and use ram_block_notify_remap() “William Roche
2024-11-07 10:21 ` [PATCH v2 5/7] hostmem: Factor out applying settings “William Roche
2024-11-07 10:21 ` [PATCH v2 6/7] hostmem: Handle remapping of RAM “William Roche
2024-11-12 13:45 ` David Hildenbrand
2024-11-12 18:17 ` William Roche
2024-11-12 22:24 ` David Hildenbrand
2024-11-07 10:21 ` [PATCH v2 7/7] system/physmem: Memory settings applied on remap notification “William Roche
2024-10-22 21:35 ` [PATCH v1 4/4] accel/kvm: Report the loss of a large memory page “William Roche
2024-10-28 16:32 ` [RFC RESEND 0/6] hugetlbfs largepage RAS project David Hildenbrand
2024-11-25 14:27 ` [PATCH v3 0/7] hugetlbfs memory HW error fixes “William Roche
2024-11-25 14:27 ` [PATCH v3 1/7] hwpoison_page_list and qemu_ram_remap are based of pages “William Roche
2024-11-25 14:27 ` [PATCH v3 2/7] system/physmem: poisoned memory discard on reboot “William Roche
2024-11-25 14:27 ` [PATCH v3 3/7] accel/kvm: Report the loss of a large memory page “William Roche
2024-11-25 14:27 ` [PATCH v3 4/7] numa: Introduce and use ram_block_notify_remap() “William Roche
2024-11-25 14:27 ` [PATCH v3 5/7] hostmem: Factor out applying settings “William Roche
2024-11-25 14:27 ` [PATCH v3 6/7] hostmem: Handle remapping of RAM “William Roche
2024-11-25 14:27 ` [PATCH v3 7/7] system/physmem: Memory settings applied on remap notification “William Roche
2024-12-02 15:41 ` [PATCH v3 0/7] hugetlbfs memory HW error fixes William Roche
2024-12-02 16:00 ` David Hildenbrand
2024-12-03 0:15 ` William Roche
2024-12-03 14:08 ` David Hildenbrand
2024-12-03 14:39 ` William Roche
2024-12-03 15:00 ` David Hildenbrand
2024-12-06 18:26 ` William Roche
2024-12-09 21:25 ` David Hildenbrand
2024-12-14 13:45 ` [PATCH v4 0/7] Poisoned memory recovery on reboot “William Roche
2024-12-14 13:45 ` [PATCH v4 1/7] hwpoison_page_list and qemu_ram_remap are based on pages “William Roche
2025-01-08 21:34 ` David Hildenbrand
2025-01-10 20:56 ` William Roche
2025-01-14 13:56 ` David Hildenbrand
2024-12-14 13:45 ` [PATCH v4 2/7] system/physmem: poisoned memory discard on reboot “William Roche
2025-01-08 21:44 ` David Hildenbrand
2025-01-10 20:56 ` William Roche
2025-01-14 14:00 ` David Hildenbrand
2025-01-27 21:15 ` William Roche
2024-12-14 13:45 ` [PATCH v4 3/7] accel/kvm: Report the loss of a large memory page “William Roche
2024-12-14 13:45 ` [PATCH v4 4/7] numa: Introduce and use ram_block_notify_remap() “William Roche
2024-12-14 13:45 ` [PATCH v4 5/7] hostmem: Factor out applying settings “William Roche
2025-01-08 21:58 ` David Hildenbrand
2025-01-10 20:56 ` William Roche
2024-12-14 13:45 ` [PATCH v4 6/7] hostmem: Handle remapping of RAM “William Roche
2025-01-08 21:51 ` [PATCH v4 6/7] c David Hildenbrand
2025-01-10 20:57 ` [PATCH v4 6/7] hostmem: Handle remapping of RAM William Roche
2024-12-14 13:45 ` [PATCH v4 7/7] system/physmem: Memory settings applied on remap notification “William Roche
2025-01-08 21:53 ` David Hildenbrand
2025-01-10 20:57 ` William Roche
2025-01-14 14:01 ` David Hildenbrand
2025-01-08 21:22 ` [PATCH v4 0/7] Poisoned memory recovery on reboot David Hildenbrand
2025-01-10 20:55 ` William Roche
2025-01-10 21:13 ` [PATCH v5 0/6] " “William Roche
2025-01-10 21:14 ` [PATCH v5 1/6] system/physmem: handle hugetlb correctly in qemu_ram_remap() “William Roche
2025-01-14 14:02 ` David Hildenbrand
2025-01-27 21:16 ` William Roche
2025-01-28 18:41 ` David Hildenbrand
2025-01-10 21:14 ` [PATCH v5 2/6] system/physmem: poisoned memory discard on reboot “William Roche
2025-01-14 14:07 ` David Hildenbrand
2025-01-27 21:16 ` William Roche
2025-01-10 21:14 ` [PATCH v5 3/6] accel/kvm: Report the loss of a large memory page “William Roche
2025-01-14 14:09 ` David Hildenbrand
2025-01-27 21:16 ` William Roche
2025-01-28 18:45 ` David Hildenbrand
2025-01-10 21:14 ` [PATCH v5 4/6] numa: Introduce and use ram_block_notify_remap() “William Roche
2025-01-10 21:14 ` [PATCH v5 5/6] hostmem: Factor out applying settings “William Roche
2025-01-10 21:14 ` [PATCH v5 6/6] hostmem: Handle remapping of RAM “William Roche
2025-01-14 14:11 ` David Hildenbrand
2025-01-27 21:16 ` William Roche
2025-01-14 14:12 ` [PATCH v5 0/6] Poisoned memory recovery on reboot David Hildenbrand
2025-01-27 21:16 ` William Roche
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a79a2639-a6f1-4ca1-9b12-d4e125d894d4@oracle.com \
--to=william.roche@oracle.com \
--cc=david@redhat.com \
--cc=eduardo@habkost.net \
--cc=imammedo@redhat.com \
--cc=joao.m.martins@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=marcel.apfelbaum@gmail.com \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=wangyanan55@huawei.com \
--cc=zhao1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).