qemu-devel.nongnu.org archive mirror
From: William Roche <william.roche@oracle.com>
To: David Hildenbrand <david@redhat.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-arm@nongnu.org
Cc: peterx@redhat.com, pbonzini@redhat.com,
	richard.henderson@linaro.org, philmd@linaro.org,
	peter.maydell@linaro.org, mtosatti@redhat.com,
	imammedo@redhat.com, eduardo@habkost.net,
	marcel.apfelbaum@gmail.com, wangyanan55@huawei.com,
	zhao1.liu@intel.com, joao.m.martins@oracle.com
Subject: Re: [PATCH v2 3/7] accel/kvm: Report the loss of a large memory page
Date: Fri, 15 Nov 2024 22:03:34 +0100
Message-ID: <386af93d-5a61-4a90-9af0-1f33fa04b0bd@oracle.com>
In-Reply-To: <e5d6bae8-a3bd-4225-b38f-65de6b1a2b54@redhat.com>

Thanks for the feedback on the patches. I'll send a new version in the
coming week.

But I wanted to answer right away the questions you asked on this
specific patch, as they relate to the importance of fixing the handling
of large page failures.

On 11/12/24 23:22, David Hildenbrand wrote:
> On 12.11.24 19:17, William Roche wrote:
>> On 11/12/24 12:13, David Hildenbrand wrote:
>>> On 07.11.24 11:21, William Roche wrote:
>>>> From: William Roche <william.roche@oracle.com>
>>>>
>>>> When an entire large page is impacted by an error (hugetlbfs case),
>>>> report better the size and location of this large memory hole, so
>>>> give a warning message when this page is first hit:
>>>> Memory error: Loosing a large page (size: X) at QEMU addr Y and GUEST
>>>> addr Z
>>>>
>>>
>>> Hm, I wonder if we really want to special-case hugetlb here.
>>>
>>> Why not make the warning independent of the underlying page size?
>>
>> We already have a warning provided by Qemu (in 
>> kvm_arch_on_sigbus_vcpu()):
>>
>> Guest MCE Memory Error at QEMU addr Y and GUEST addr Z of type
>> BUS_MCEERR_AR/_AO injected
>>
>> The one I suggest is an additional message provided before the above
>> message.
>>
>> Here is an example:
>> qemu-system-x86_64: warning: Memory error: Loosing a large page (size:
>> 2097152) at QEMU addr 0x7fdd7d400000 and GUEST addr 0x11600000
>> qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr
>> 0x7fdd7d400000 and GUEST addr 0x11600000 of type BUS_MCEERR_AO injected
>>
> 
> Hm, I think we should definitely be including the size in the existing 
> one. That code was written without huge pages in mind.

Yes, we can do that: get the page size at this level and pass it as a
'page_size' argument to kvm_hwpoison_page_add().

It would make the message longer, as every message about an error that
impacts a large page would carry the extra large page information.
We could change the message only when we are dealing with a large page,
so that the standard (4k) case isn't modified.
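
As a minimal sketch of that last option (not QEMU code): the helper name
format_mce_warning() and the BASE_PAGE_SIZE constant are hypothetical,
and the large page wording is simply the one suggested further down in
this thread, which may still change.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: base page size of the standard (4k) case. */
#define BASE_PAGE_SIZE ((size_t)4096)

/*
 * Hypothetical helper: write the warning text into buf.
 * The size hint is added only when page_size is larger than a base page,
 * so the standard (4k) message keeps its current wording.
 * Returns the snprintf() result.
 */
static int format_mce_warning(char *buf, size_t len, uint64_t qemu_addr,
                              uint64_t guest_addr, const char *type,
                              size_t page_size)
{
    if (page_size > BASE_PAGE_SIZE) {
        return snprintf(buf, len,
                        "Guest MCE Memory Error at QEMU addr 0x%" PRIx64
                        " and GUEST addr 0x%" PRIx64
                        " of type %s and size %zu (large page) injected",
                        qemu_addr, guest_addr, type, page_size);
    }
    return snprintf(buf, len,
                    "Guest MCE Memory Error at QEMU addr 0x%" PRIx64
                    " and GUEST addr 0x%" PRIx64 " of type %s injected",
                    qemu_addr, guest_addr, type);
}
```

Called with a page_size of 4096, the helper produces the existing
message unchanged; only larger sizes get the extra hint.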


> 
> We should similarly warn in the arm implementation (where I don't see a 
> similar message yet).

Ok, I'll also add a message for the ARM platform.

>>
>> According to me, this large page case additional message will help to
>> better understand the probable sudden proliferation of memory errors
>> that can be reported by Qemu on the impacted range.
>> Not only will the machine administrator identify better that a single
>> memory error had this large impact, it can also help us to better
>> measure the impact of fixing the large page memory error support in the
>> field (in the future).
> 
> What about extending the existing one to something like
> 
> warning: Guest MCE Memory Error at QEMU addr $ADDR and GUEST $PADDR of 
> type BUS_MCEERR_AO and size $SIZE (large page) injected
> 
> 
> With the "large page" hint you can highlight that this is special.

Right, we can do it that way. However, it also gives the impression that
we somehow inject errors on a large range of memory, which is not the
case. I'll send a proposal with a different formulation, so that you can
choose.



> On a related note ...I think we have a problem. Assume we got a SIGBUS 
> on a huge page (e.g., somewhere in a 1 GiB page).
> 
> We will call kvm_mce_inject(cpu, paddr, code) / 
> acpi_ghes_record_errors(ACPI_HEST_SRC_ID_SEA, paddr)
> 
> But where is the size information? :// Won't the VM simply assume that 
> there was a MCE on a single 4k page starting at paddr?

This is absolutely right!
It's exactly what happens: the VM kernel receives the information and
considers that only the single impacted page has to be poisoned.

That's also the reason why Qemu repeats the error injections every time 
the poisoned large page is accessed (for all other touched 4k pages 
located on this "memory hole").
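
To put numbers on this, here is a minimal sketch of the arithmetic
involved. The helper names and the BASE_PAGE_SIZE constant are
hypothetical (they are not QEMU functions), and page_size is assumed to
be a power of two.

```c
#include <stdint.h>

/* Assumption: the guest handles the error at 4k granularity. */
#define BASE_PAGE_SIZE 4096ULL

/* Start address of the large page containing addr. */
static uint64_t large_page_start(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Number of 4k pages that become inaccessible with one large page. */
static uint64_t base_pages_per_large_page(uint64_t page_size)
{
    return page_size / BASE_PAGE_SIZE;
}

/*
 * Non-zero when two addresses sit on the same large page, i.e. when an
 * access to the second one hits the same "memory hole" as the first.
 */
static int same_large_page(uint64_t a, uint64_t b, uint64_t page_size)
{
    return large_page_start(a, page_size) == large_page_start(b, page_size);
}
```

For the 2097152-byte page of the example above, one lost large page
covers 512 4k pages, while the VM poisons only one of them per
injection; for a 1 GiB page the count is 262144.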

> 
> I'm not sure if we can inject ranges, or if we would have to issue one 
> MCE per page ... hm, what's your take on this?

I don't know of any size information about a memory error reported by
the hardware. The kernel doesn't seem to expect any such information.
That explains why no impact/blast size information is provided when
an error is relayed to the VM.

We could take the "memory hole" size into account in Qemu, but repeating
error injections is not going to help much either: we'd need to give
the VM some time to deal with one error injection before producing a new
error for the next page, and so on. In the case (x86 only) where an
asynchronous error is relayed with BUS_MCEERR_AO, we would also have to
repeat the error for all the 4k pages located on the lost large page.

We can see that the Linux kernel has some mechanisms to deal with an
occasional 4k page loss, but a larger blast is very likely to crash the
VM (which is fine). And as a significant part of the memory is no longer
accessible, dealing with the error itself can be impaired and we
increase the risk of losing data, even though most of the memory on the
large page could still be used.

Now if we can recover the 'still valid' memory of the impacted large
page, we can significantly reduce this blast and give the VM a much
better chance to survive the incident or crash more gracefully.

I've looked at the project you pointed me to, which is not ready to be
adopted:
https://lore.kernel.org/linux-mm/20240924043924.3562257-2-jiaqiyan@google.com/T/

But we see that this large page enhancement is needed, sometimes just
to give the VM a chance to survive a little longer before being
terminated or moved.
In my opinion, injecting multiple MCEs or ACPI error records doesn't help.

William.



