qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: William Roche <william.roche@oracle.com>
To: Peter Xu <peterx@redhat.com>, david@redhat.com
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-arm@nongnu.org,
	pbonzini@redhat.com, richard.henderson@linaro.org,
	philmd@linaro.org, peter.maydell@linaro.org, mtosatti@redhat.com,
	imammedo@redhat.com, eduardo@habkost.net,
	marcel.apfelbaum@gmail.com, wangyanan55@huawei.com,
	zhao1.liu@intel.com, joao.m.martins@oracle.com
Subject: Re: [PATCH v7 3/6] accel/kvm: Report the loss of a large memory page
Date: Fri, 7 Feb 2025 19:02:22 +0100	[thread overview]
Message-ID: <2ad49f5d-f2c1-4ba2-9b6b-77ba96c83bab@oracle.com> (raw)
In-Reply-To: <Z6Oaukumli1eIEDB@x1.local>

On 2/5/25 18:07, Peter Xu wrote:
> On Wed, Feb 05, 2025 at 05:27:13PM +0100, William Roche wrote:
>> [...]
>> The HMP command "info ramblock" is implemented with the ram_block_format()
>> function which returns a message buffer built with a string for each
>> ramblock (protected by the RCU_READ_LOCK_GUARD). Our new function copies a
>> struct with the necessary information.
>>
>> Relaying on the buffer format to retrieve the information doesn't seem
>> reasonable, and more importantly, this buffer doesn't provide all the needed
>> data, like fd and fd_offset.
>>
>> I would say that ram_block_format() and qemu_ram_block_info_from_addr()
>> serve 2 different goals.
>>
>> (a reimplementation of ram_block_format() with an adapted version of
>> qemu_ram_block_info_from_addr() taking the extra information needed could be
>> doable for example, but may not be worth doing for now)
> 
> IIUC admin should be aware of fd_offset because the admin should be fully
> aware of the start offset of FDs to specify in qemu cmdlines, or in
> Libvirt. But yes, we can always add fd_offset into ram_block_format() if
> it's helpful.
> 
> Besides, the existing issues on this patch:
> 
>    - From outcome of this patch, it introduces one ramblock API (which is ok
>      to me, so far), to do some error_report()s.  It looks pretty much for
>      debugging rather than something serious (e.g. report via QMP queries,
>      QMP events etc.).  From debug POV, I still don't see why this is
>      needed.. per discussed above.

The reason why I want to inform the user of a large memory failure more 
specifically than a standard sized page loss is because of the 
significant behavior difference: Our current implementation can 
transparently handle many situations without necessarily leading the VM 
to a crash. But when it comes to large pages, there is no mechanism to 
inform the VM of a large memory loss, and usually this situation leads 
the VM to crash, and can also generate some weird situations like qemu 
itself crashing or a loop of errors, for example.

So having a message informing of such a memory loss can help to 
understand a more radical VM or qemu behavior -- it increases the 
diagnosability of our code.

To verify that a SIGBUS appeared because of a large page loss, we 
currently need to verify the targeted memory block backend page_size.
We should usually get this information from the SIGBUS siginfo data 
(with a si_addr_lsb field giving an indication of the page size) but a 
KVM weakness with a hardcoded si_addr_lsb=PAGE_SHIFT value in the SIGBUS 
siginfo returned from the kernel prevents that: See 
kvm_send_hwpoison_signal() function.

So I first wrote a small API addition called 
qemu_ram_pagesize_from_addr() to retrieve only this page_size value from 
the impacted address; and later on, this function turned into the richer 
qemu_ram_block_info_from_addr() function to have the generated messages 
match the existing memory messages as rightly requested by David.

So the main reason is a KVM "weakness" with kvm_send_hwpoison_signal(), 
and the second reason is to have richer error messages.



>    - From merge POV, this patch isn't a pure memory change, so I'll need to
>      get ack from other maintainers, at least that should be how it works..

I agree :)

> 
> I feel like when hwpoison becomes a serious topic, we need some more
> serious reporting facility than error reports.  So that we could have this
> as separate topic to be revisited.  It might speed up your prior patches
> from not being blocked on this.

I explained why I think that error messages are important, but I don't 
want to get blocked on fixing the hugepage memory recovery because of that.

If you think that not displaying a specific message for large page loss 
can help to get the recovery fixed, than I can change my proposal to do so.

Early next week, I'll send a simplified version of my first 3 patches 
without this specific messages and without the preallocation handling in 
all remap cases, so you can evaluate this possibility.

Thank again for your feedback
William.



  reply	other threads:[~2025-02-07 18:03 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-01  9:57 [PATCH v7 0/6] Poisoned memory recovery on reboot “William Roche
2025-02-01  9:57 ` [PATCH v7 1/6] system/physmem: handle hugetlb correctly in qemu_ram_remap() “William Roche
2025-02-04 17:09   ` Peter Xu
2025-02-01  9:57 ` [PATCH v7 2/6] system/physmem: poisoned memory discard on reboot “William Roche
2025-02-04 17:09   ` Peter Xu
2025-02-05 16:27     ` William Roche
2025-02-01  9:57 ` [PATCH v7 3/6] accel/kvm: Report the loss of a large memory page “William Roche
2025-02-04 17:01   ` Peter Xu
2025-02-05 16:27     ` William Roche
2025-02-05 17:07       ` Peter Xu
2025-02-07 18:02         ` William Roche [this message]
2025-02-10 16:48           ` Peter Xu
2025-02-11 21:22             ` William Roche
2025-02-11 21:45               ` Peter Xu
2025-02-01  9:57 ` [PATCH v7 4/6] numa: Introduce and use ram_block_notify_remap() “William Roche
2025-02-04 17:17   ` Peter Xu
2025-02-04 17:42     ` David Hildenbrand
2025-02-01  9:57 ` [PATCH v7 5/6] hostmem: Factor out applying settings “William Roche
2025-02-01  9:57 ` [PATCH v7 6/6] hostmem: Handle remapping of RAM “William Roche
2025-02-04 17:50   ` David Hildenbrand
2025-02-04 17:58     ` Peter Xu
2025-02-04 18:55       ` David Hildenbrand
2025-02-04 20:16         ` Peter Xu
2025-02-05 16:27           ` William Roche
2025-02-05 17:58             ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2ad49f5d-f2c1-4ba2-9b6b-77ba96c83bab@oracle.com \
    --to=william.roche@oracle.com \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=imammedo@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=wangyanan55@huawei.com \
    --cc=zhao1.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).