From: David Hildenbrand <david@redhat.com>
To: William Roche <william.roche@oracle.com>,
kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-arm@nongnu.org
Cc: peterx@redhat.com, pbonzini@redhat.com,
richard.henderson@linaro.org, philmd@linaro.org,
peter.maydell@linaro.org, mtosatti@redhat.com,
imammedo@redhat.com, eduardo@habkost.net,
marcel.apfelbaum@gmail.com, wangyanan55@huawei.com,
zhao1.liu@intel.com, joao.m.martins@oracle.com
Subject: Re: [PATCH v2 3/7] accel/kvm: Report the loss of a large memory page
Date: Mon, 18 Nov 2024 10:45:25 +0100
Message-ID: <439e280e-bc82-4a79-b325-d18fcf65feec@redhat.com>
In-Reply-To: <386af93d-5a61-4a90-9af0-1f33fa04b0bd@oracle.com>
>> Hm, I think we should definitely be including the size in the existing
>> one. That code was written without huge pages in mind.
>
> Yes, we can do that, and get the page size at this level to pass as a
> 'page_size' argument to kvm_hwpoison_page_add().
>
> It would make the message longer, as the extra large-page information
> would appear on every message when an error impacts a large page.
> We could change the message only when we are dealing with a large page,
> so that the standard (4k) case isn't modified.
Right. And likely we should call it "huge page" instead, which is the
Linux term for anything larger than a single page.
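Just to make sure we mean the same thing, here is a rough sketch of what I
have in mind. The page_size parameter, the new HWPoisonPage::page_size field
and the message wording are only illustrative, and I'm assuming the size is
derived from the SIGBUS si_addr_lsb field (i.e. 1UL << si_addr_lsb):

/*
 * Sketch only: the page_size argument, the page_size field and the message
 * wording are assumptions for illustration, not an actual patch.
 */
static void kvm_hwpoison_page_add(ram_addr_t ram_addr, size_t page_size)
{
    HWPoisonPage *page = g_new(HWPoisonPage, 1);

    page->ram_addr = ram_addr;
    page->page_size = page_size;            /* assumed new field */
    QLIST_INSERT_HEAD(&hwpoison_page_list, page, list);

    if (page_size > TARGET_PAGE_SIZE) {
        /* Only the huge page case gets the extra hint. */
        error_report("QEMU got a memory error on a huge page of size 0x%zx"
                     " containing addr 0x" RAM_ADDR_FMT, page_size, ram_addr);
    }
}

That keeps the standard 4k messages untouched and only adds the hint where
it actually matters.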
[...]
>>
>> With the "large page" hint you can highlight that this is special.
>
> Right, we can do it that way. It also gives the impression that we
> somehow inject errors on a large range of memory, which is not the
> case. I'll send a proposal with a different formulation, so that you can
> choose.
>
Makes sense.
>
>
>> On a related note ...I think we have a problem. Assume we got a SIGBUS
>> on a huge page (e.g., somewhere in a 1 GiB page).
>>
>> We will call kvm_mce_inject(cpu, paddr, code) /
>> acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)
>>
>> But where is the size information? :/ Won't the VM simply assume that
>> there was an MCE on a single 4k page starting at paddr?
>
> This is absolutely right!
> It's exactly what happens: the VM kernel receives the information and
> considers that only the impacted page has to be poisoned.
>
> That's also the reason why QEMU repeats the error injection every time
> the poisoned large page is accessed (for all the other 4k pages touched
> in this "memory hole").
:/
So we always get the full 1 GiB range from Linux, but essentially always
report only the first 4k page on any such access, right?
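(For reference, that range information is what Linux encodes in the SIGBUS
siginfo: for BUS_MCEERR_AR/AO, si_addr_lsb is the least significant bit of
the reported address, see sigaction(2). A minimal sketch of deriving the
size on our side; sigbus_poison_size() is a made-up helper name:)

/* Sketch: derive the blast size from the SIGBUS siginfo.  For a poisoned
 * 1 GiB hugetlb page the kernel sets si_addr_lsb to 30, so we do learn
 * the full range, not just the faulting 4k page. */
static size_t sigbus_poison_size(const siginfo_t *si)
{
    if (si->si_code == BUS_MCEERR_AR || si->si_code == BUS_MCEERR_AO) {
        return (size_t)1 << si->si_addr_lsb;
    }
    return qemu_real_host_page_size();      /* fallback assumption */
}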
BTW, should we handle duplicates in our poison list?
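If we record each touched 4k page as a separate entry, a range-aware check
along these lines could avoid that. Sketch only: kvm_hwpoison_page_known()
is a made-up name, and it assumes the page_size field from the sketch above:

/* Sketch: treat an address as already known if it falls inside a
 * previously recorded poisoned (huge) page. */
static bool kvm_hwpoison_page_known(ram_addr_t ram_addr)
{
    HWPoisonPage *page;

    QLIST_FOREACH(page, &hwpoison_page_list, list) {
        if (ram_addr >= page->ram_addr &&
            ram_addr < page->ram_addr + page->page_size) {
            return true;
        }
    }
    return false;
}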
>
>>
>> I'm not sure if we can inject ranges, or if we would have to issue one
>> MCE per page ... hm, what's your take on this?
>
> I don't know of any size information about a memory error reported by
> the hardware, and the kernel doesn't seem to expect any such information.
> This explains why there is no impact/blast size information provided when
> an error is relayed to the VM.
>
> We could take the "memory hole" size into account in QEMU, but repeating
> error injections is not going to help a lot either: we'd need to give
> the VM some time to deal with an error injection before producing a new
> error for the next page, etc.
I had the same thoughts.
> In the case (x86 only) where an asynchronous error is relayed with
> BUS_MCEERR_AO, we would also have to repeat the error for all the 4k
> pages located in the lost large page too.
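Just to spell out what "one MCE per 4k page" would look like on x86, here is
a hypothetical sketch (kvm_mce_inject_range() is made up; kvm_mce_inject()
is the existing static helper in target/i386/kvm/kvm.c), and I agree it does
not look attractive:

/*
 * Hypothetical only: replay the error for every 4k page of the lost huge
 * page.  The guest would have to digest each event, and BUS_MCEERR_AO
 * events would need replaying as well.
 */
static void kvm_mce_inject_range(X86CPU *cpu, hwaddr paddr,
                                 size_t hugepage_size, int code)
{
    hwaddr start = QEMU_ALIGN_DOWN(paddr, hugepage_size);
    hwaddr addr;

    for (addr = start; addr < start + hugepage_size;
         addr += TARGET_PAGE_SIZE) {
        kvm_mce_inject(cpu, addr, code);    /* one event per 4k page */
    }
}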
>
> We can see that the Linux kernel has some mechanisms to deal with the
> occasional 4k page loss, but a larger blast is very likely to crash the
> VM (which is fine).
Right, and that will inevitably happen when we get an MCE on a 1 GiB
hugetlb page, correct? The whole thing will be inaccessible.
> And as a significant part of the memory is no longer
> accessible, dealing with the error itself can be impaired and we
> increase the risk of losing data, even though most of the memory of the
> large page could still be used.
>
> Now if we can recover the still-valid memory of the impacted large
> page, we can significantly reduce this blast and give the VM a much
> better chance to survive the incident, or to crash more gracefully.
Right. That cannot be sorted out in user space alone, unfortunately.
>
> I've looked at the project you pointed me to, which is not ready to be
> adopted:
> https://lore.kernel.org/linux-mm/20240924043924.3562257-2-jiaqiyan@google.com/T/
>
Yes, that goes in a better direction, though.
> But we can see that this large page enhancement is needed, sometimes just
> to give the VM a chance to survive a little longer before being
> terminated or moved.
> Injecting multiple MCEs or ACPI error records doesn't help, in my opinion.
I suspect that in most cases, when we get an MCE on a 1 GiB page in the
hypervisor, our running Linux guest will soon crash, because it really
lost 1 GiB of contiguous memory. :(
--
Cheers,
David / dhildenb