From: Joao Martins <joao.m.martins@oracle.com>
To: William Roche <william.roche@oracle.com>, Peter Xu <peterx@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Leonardo Bras <leobras@redhat.com>,
qemu-devel@nongnu.org
Subject: Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Sat, 9 Sep 2023 15:57:44 +0100 [thread overview]
Message-ID: <40a7a752-7bdd-968d-ec5a-2a58f72a6d32@oracle.com> (raw)
In-Reply-To: <3968a8e5-4010-0c97-7e1b-0dcba64ade01@oracle.com>
On 06/09/2023 22:29, William Roche wrote:
> On 9/6/23 17:16, Peter Xu wrote:
>>
>> Just a note..
>>
>> Probably fine for now to reuse block page size, but IIUC the right thing to
>> do is to fetch it from the signal info (in QEMU's sigbus_handler()) of
>> kernel_siginfo.si_addr_lsb.
>>
>> At least for x86 I think that stores the "shift" of covered poisoned page
>> (one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for a
>> huge page, though.. not aware of any man page for that). It'll then work
>> naturally when Linux huge pages will start to support sub-huge-page-size
>> poisoning someday. We can definitely leave that for later.
>>
>
> I totally agree with that !
>
Provided this bug affects all qemu versions thus far, perhaps should be a follow
up series, to make the changer easier to bring into stable tree.
>
>>>> --- a/migration/ram.c
>>>> +++ b/migration/ram.c
>>>> @@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus
>>>> *pss, QEMUFile *file,
>>>> uint8_t *p = block->host + offset;
>>>> int len = 0;
>>>> - if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>>> + if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||
>>
>> Can we move this out of zero page handling? Zero detection is not
>> guaranteed to always be the 1st thing to do when processing a guest page.
>> Currently it'll already skip either rdma or when compression enabled, so
>> it'll keep crashing there.
>>
>> Perhaps at the entry of ram_save_target_page_legacy()?
>
> Right, as expected, using migration compression with poisoned pages crashes even
> with this fix...
>
> The difficulty I see to place the poisoned page verification on the
> entry of ram_save_target_page_legacy() is what to do to skip the found poison
> page(s) if any ?
>
> Should I continue to treat them as zero pages written with
> save_zero_page_to_file ?
MCE had already been forward to the guest, so guest is supposed to not be using
the page (nor rely on its contents). Hence destination ought to just see a zero
page. So what you said seems like the best course of action.
> Or should I consider the case of an ongoing compression
> use and create a new code compressing an empty page with save_compress_page() ?
>
The compress code looks to be a tentative compression (not guaranteed IIUC), so
I am not sure it needs any more logic that just adding at the top of
ram_save_target_page_legacy() as Peter suggested?
> And what about an RDMA memory region impacted by a memory error ?
> This is an important aspect.
> Does anyone know how this situation is dealt with ? And how it should be handled
> in Qemu ?
>
If you refer to guest RDMA MRs that is just guest RAM, not sure we are even
aware of those from qemu. But if you refer to the RDMA transport that sits below
the Qemu file (or rather acts as an implementation of QemuFile), so handling in
ram_save_target_page_legacy() already seems to cover it.
> --
> Thanks,
> William.
next prev parent reply other threads:[~2023-09-09 14:59 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-06 13:59 [PATCH 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-06 13:59 ` [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-09-06 14:19 ` Joao Martins
2023-09-06 15:16 ` Peter Xu
2023-09-06 21:29 ` William Roche
2023-09-09 14:57 ` Joao Martins [this message]
2023-09-11 19:48 ` Peter Xu
2023-09-12 18:44 ` Peter Xu
2023-09-14 20:20 ` [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-14 20:20 ` [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-09-15 3:13 ` Zhijian Li (Fujitsu)
2023-09-15 11:31 ` William Roche
2023-09-18 3:47 ` Zhijian Li (Fujitsu)
2023-09-20 10:04 ` Zhijian Li (Fujitsu)
2023-09-20 12:11 ` William Roche
2023-09-20 23:53 ` [PATCH v3 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-20 23:53 ` [PATCH v3 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-10-13 15:08 ` [PATCH v4 0/2] Qemu crashes on VM migration after an handled memory error “William Roche
2023-10-13 15:08 ` [PATCH v4 1/2] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-10-13 15:08 ` [PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM “William Roche
2023-10-16 16:48 ` Peter Xu
2023-10-17 0:38 ` William Roche
2023-10-17 15:13 ` Peter Xu
2023-11-06 21:38 ` William Roche
2023-11-08 21:45 ` Peter Xu
2023-11-10 19:22 ` William Roche
2023-11-06 22:03 ` [PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error “William Roche
2023-11-06 22:03 ` [PATCH v5 1/2] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-11-06 22:03 ` [PATCH v5 2/2] migration: prevent migration when a poisoned page is unknown from the VM “William Roche
2023-11-08 21:49 ` [PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error Peter Xu
2024-01-30 19:06 ` [PATCH v1 0/1] " “William Roche
2024-01-30 19:06 ` [PATCH v1 1/1] migration: prevent migration when VM has poisoned memory “William Roche
2024-01-31 1:48 ` Peter Xu
2023-09-14 21:50 ` [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=40a7a752-7bdd-968d-ec5a-2a58f72a6d32@oracle.com \
--to=joao.m.martins@oracle.com \
--cc=leobras@redhat.com \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=william.roche@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).