From: William Roche <william.roche@oracle.com>
To: Peter Xu <peterx@redhat.com>, david@redhat.com
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-arm@nongnu.org,
pbonzini@redhat.com, richard.henderson@linaro.org,
philmd@linaro.org, peter.maydell@linaro.org, mtosatti@redhat.com,
imammedo@redhat.com, eduardo@habkost.net,
marcel.apfelbaum@gmail.com, wangyanan55@huawei.com,
zhao1.liu@intel.com, joao.m.martins@oracle.com
Subject: Re: [PATCH v7 3/6] accel/kvm: Report the loss of a large memory page
Date: Fri, 7 Feb 2025 19:02:22 +0100 [thread overview]
Message-ID: <2ad49f5d-f2c1-4ba2-9b6b-77ba96c83bab@oracle.com> (raw)
In-Reply-To: <Z6Oaukumli1eIEDB@x1.local>
On 2/5/25 18:07, Peter Xu wrote:
> On Wed, Feb 05, 2025 at 05:27:13PM +0100, William Roche wrote:
>> [...]
>> The HMP command "info ramblock" is implemented with the ram_block_format()
>> function which returns a message buffer built with a string for each
>> ramblock (protected by the RCU_READ_LOCK_GUARD). Our new function copies a
>> struct with the necessary information.
>>
>> Relaying on the buffer format to retrieve the information doesn't seem
>> reasonable, and more importantly, this buffer doesn't provide all the needed
>> data, like fd and fd_offset.
>>
>> I would say that ram_block_format() and qemu_ram_block_info_from_addr()
>> serve 2 different goals.
>>
>> (a reimplementation of ram_block_format() with an adapted version of
>> qemu_ram_block_info_from_addr() taking the extra information needed could be
>> doable for example, but may not be worth doing for now)
>
> IIUC admin should be aware of fd_offset because the admin should be fully
> aware of the start offset of FDs to specify in qemu cmdlines, or in
> Libvirt. But yes, we can always add fd_offset into ram_block_format() if
> it's helpful.
>
> Besides, the existing issues on this patch:
>
> - From outcome of this patch, it introduces one ramblock API (which is ok
> to me, so far), to do some error_report()s. It looks pretty much for
> debugging rather than something serious (e.g. report via QMP queries,
> QMP events etc.). From debug POV, I still don't see why this is
> needed.. per discussed above.
The reason why I want to inform the user of a large memory failure more
specifically than a standard sized page loss is because of the
significant behavior difference: Our current implementation can
transparently handle many situations without necessarily leading the VM
to a crash. But when it comes to large pages, there is no mechanism to
inform the VM of a large memory loss, and usually this situation leads
the VM to crash, and can also generate some weird situations like qemu
itself crashing or a loop of errors, for example.
So having a message informing of such a memory loss can help to
understand a more radical VM or qemu behavior -- it increases the
diagnosability of our code.
To verify that a SIGBUS appeared because of a large page loss, we
currently need to verify the targeted memory block backend page_size.
We should usually get this information from the SIGBUS siginfo data
(with a si_addr_lsb field giving an indication of the page size) but a
KVM weakness with a hardcoded si_addr_lsb=PAGE_SHIFT value in the SIGBUS
siginfo returned from the kernel prevents that: See
kvm_send_hwpoison_signal() function.
So I first wrote a small API addition called
qemu_ram_pagesize_from_addr() to retrieve only this page_size value from
the impacted address; and later on, this function turned into the richer
qemu_ram_block_info_from_addr() function to have the generated messages
match the existing memory messages as rightly requested by David.
So the main reason is a KVM "weakness" with kvm_send_hwpoison_signal(),
and the second reason is to have richer error messages.
> - From merge POV, this patch isn't a pure memory change, so I'll need to
> get ack from other maintainers, at least that should be how it works..
I agree :)
>
> I feel like when hwpoison becomes a serious topic, we need some more
> serious reporting facility than error reports. So that we could have this
> as separate topic to be revisited. It might speed up your prior patches
> from not being blocked on this.
I explained why I think that error messages are important, but I don't
want to get blocked on fixing the hugepage memory recovery because of that.
If you think that not displaying a specific message for large page loss
can help to get the recovery fixed, than I can change my proposal to do so.
Early next week, I'll send a simplified version of my first 3 patches
without this specific messages and without the preallocation handling in
all remap cases, so you can evaluate this possibility.
Thank again for your feedback
William.
next prev parent reply other threads:[~2025-02-07 18:03 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-01 9:57 [PATCH v7 0/6] Poisoned memory recovery on reboot “William Roche
2025-02-01 9:57 ` [PATCH v7 1/6] system/physmem: handle hugetlb correctly in qemu_ram_remap() “William Roche
2025-02-04 17:09 ` Peter Xu
2025-02-01 9:57 ` [PATCH v7 2/6] system/physmem: poisoned memory discard on reboot “William Roche
2025-02-04 17:09 ` Peter Xu
2025-02-05 16:27 ` William Roche
2025-02-01 9:57 ` [PATCH v7 3/6] accel/kvm: Report the loss of a large memory page “William Roche
2025-02-04 17:01 ` Peter Xu
2025-02-05 16:27 ` William Roche
2025-02-05 17:07 ` Peter Xu
2025-02-07 18:02 ` William Roche [this message]
2025-02-10 16:48 ` Peter Xu
2025-02-11 21:22 ` William Roche
2025-02-11 21:45 ` Peter Xu
2025-02-01 9:57 ` [PATCH v7 4/6] numa: Introduce and use ram_block_notify_remap() “William Roche
2025-02-04 17:17 ` Peter Xu
2025-02-04 17:42 ` David Hildenbrand
2025-02-01 9:57 ` [PATCH v7 5/6] hostmem: Factor out applying settings “William Roche
2025-02-01 9:57 ` [PATCH v7 6/6] hostmem: Handle remapping of RAM “William Roche
2025-02-04 17:50 ` David Hildenbrand
2025-02-04 17:58 ` Peter Xu
2025-02-04 18:55 ` David Hildenbrand
2025-02-04 20:16 ` Peter Xu
2025-02-05 16:27 ` William Roche
2025-02-05 17:58 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=2ad49f5d-f2c1-4ba2-9b6b-77ba96c83bab@oracle.com \
--to=william.roche@oracle.com \
--cc=david@redhat.com \
--cc=eduardo@habkost.net \
--cc=imammedo@redhat.com \
--cc=joao.m.martins@oracle.com \
--cc=kvm@vger.kernel.org \
--cc=marcel.apfelbaum@gmail.com \
--cc=mtosatti@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=peterx@redhat.com \
--cc=philmd@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=wangyanan55@huawei.com \
--cc=zhao1.liu@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).