qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: William Roche <william.roche@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	 Leonardo Bras <leobras@redhat.com>,
	qemu-devel@nongnu.org, Joao Martins <joao.m.martins@oracle.com>
Subject: Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Wed, 6 Sep 2023 23:29:29 +0200	[thread overview]
Message-ID: <3968a8e5-4010-0c97-7e1b-0dcba64ade01@oracle.com> (raw)
In-Reply-To: <ZPiX4YLAT5HjxUAJ@x1n>

On 9/6/23 17:16, Peter Xu wrote:
> 
> Just a note..
> 
> Probably fine for now to reuse block page size, but IIUC the right thing to
> do is to fetch it from the signal info (in QEMU's sigbus_handler()) of
> kernel_siginfo.si_addr_lsb.
> 
> At least for x86 I think that stores the "shift" of covered poisoned page
> (one needs to track the Linux handling of VM_FAULT_HWPOISON_LARGE for a
> huge page, though.. not aware of any man page for that).  It'll then work
> naturally when Linux huge pages will start to support sub-huge-page-size
> poisoning someday.  We can definitely leave that for later.
> 

I totally agree with that !


>>> --- a/migration/ram.c
>>> +++ b/migration/ram.c
>>> @@ -1145,7 +1145,8 @@ static int save_zero_page_to_file(PageSearchStatus *pss, QEMUFile *file,
>>>       uint8_t *p = block->host + offset;
>>>       int len = 0;
>>>   
>>> -    if (buffer_is_zero(p, TARGET_PAGE_SIZE)) {
>>> +    if ((kvm_enabled() && kvm_hwpoisoned_page(block, (void *)offset)) ||
> 
> Can we move this out of zero page handling?  Zero detection is not
> guaranteed to always be the 1st thing to do when processing a guest page.
> Currently it'll already skip either rdma or when compression enabled, so
> it'll keep crashing there.
> 
> Perhaps at the entry of ram_save_target_page_legacy()?

Right, as expected, using migration compression with poisoned pages 
crashes even with this fix...

The difficulty I see to place the poisoned page verification on the
entry of ram_save_target_page_legacy() is what to do to skip the found 
poison page(s) if any ?

Should I continue to treat them as zero pages written with 
save_zero_page_to_file ? Or should I consider the case of an ongoing 
compression use and create a new code compressing an empty page with 
save_compress_page() ?

And what about an RDMA memory region impacted by a memory error ?
This is an important aspect.
Does anyone know how this situation is dealt with ? And how it should be 
handled in Qemu ?

--
Thanks,
William.


  reply	other threads:[~2023-09-06 21:30 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-06 13:59 [PATCH 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-06 13:59 ` [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-09-06 14:19   ` Joao Martins
2023-09-06 15:16     ` Peter Xu
2023-09-06 21:29       ` William Roche [this message]
2023-09-09 14:57         ` Joao Martins
2023-09-11 19:48           ` Peter Xu
2023-09-12 18:44             ` Peter Xu
2023-09-14 20:20               ` [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-14 20:20                 ` [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-09-15  3:13                   ` Zhijian Li (Fujitsu)
2023-09-15 11:31                     ` William Roche
2023-09-18  3:47                       ` Zhijian Li (Fujitsu)
2023-09-20 10:04                       ` Zhijian Li (Fujitsu)
2023-09-20 12:11                         ` William Roche
2023-09-20 23:53                         ` [PATCH v3 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-20 23:53                           ` [PATCH v3 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-10-13 15:08                           ` [PATCH v4 0/2] Qemu crashes on VM migration after an handled memory error “William Roche
2023-10-13 15:08                             ` [PATCH v4 1/2] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-10-13 15:08                             ` [PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM “William Roche
2023-10-16 16:48                               ` Peter Xu
2023-10-17  0:38                                 ` William Roche
2023-10-17 15:13                                   ` Peter Xu
2023-11-06 21:38                                     ` William Roche
2023-11-08 21:45                                       ` Peter Xu
2023-11-10 19:22                                         ` William Roche
2023-11-06 22:03                                     ` [PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error “William Roche
2023-11-06 22:03                                       ` [PATCH v5 1/2] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-11-06 22:03                                       ` [PATCH v5 2/2] migration: prevent migration when a poisoned page is unknown from the VM “William Roche
2023-11-08 21:49                                       ` [PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error Peter Xu
2024-01-30 19:06                                         ` [PATCH v1 0/1] " “William Roche
2024-01-30 19:06                                           ` [PATCH v1 1/1] migration: prevent migration when VM has poisoned memory “William Roche
2024-01-31  1:48                                             ` Peter Xu
2023-09-14 21:50                 ` [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3968a8e5-4010-0c97-7e1b-0dcba64ade01@oracle.com \
    --to=william.roche@oracle.com \
    --cc=joao.m.martins@oracle.com \
    --cc=leobras@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).