All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: William Roche <william.roche@oracle.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	Leonardo Bras <leobras@redhat.com>,
	qemu-devel@nongnu.org, lizhijian@fujitsu.com,
	lidongchen@tencent.com
Subject: Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Tue, 12 Sep 2023 14:44:19 -0400	[thread overview]
Message-ID: <ZQCxg+M2IpecRT8w@x1n> (raw)
In-Reply-To: <ZP9vFuY0r29RSU1g@x1n>

On Mon, Sep 11, 2023 at 03:48:38PM -0400, Peter Xu wrote:
> On Sat, Sep 09, 2023 at 03:57:44PM +0100, Joao Martins wrote:
> > > Should I continue to treat them as zero pages written with
> > > save_zero_page_to_file ? 
> > 
> > MCE had already been forward to the guest, so guest is supposed to not be using
> > the page (nor rely on its contents). Hence destination ought to just see a zero
> > page. So what you said seems like the best course of action.
> > 
> > > Or should I consider the case of an ongoing compression
> > > use and create a new code compressing an empty page with save_compress_page() ?
> > > 
> > The compress code looks to be a tentative compression (not guaranteed IIUC), so
> > I am not sure it needs any more logic that just adding at the top of
> > ram_save_target_page_legacy() as Peter suggested?
> > 
> > > And what about an RDMA memory region impacted by a memory error ?
> > > This is an important aspect.
> > > Does anyone know how this situation is dealt with ? And how it should be handled
> > > in Qemu ?
> > > 
> > 
> > If you refer to guest RDMA MRs that is just guest RAM, not sure we are even
> > aware of those from qemu. But if you refer to the RDMA transport that sits below
> > the Qemu file (or rather acts as an implementation of QemuFile), so handling in
> > ram_save_target_page_legacy() already seems to cover it.
> 
> I'm also not familiar enough with RDMA, but it looks tricky indeed. AFAIU
> it's leveraging RDMA_CONTROL_COMPRESS for zero pages for now (with
> RDMACompress.value==0), so it doesn't seem to be using generic migration
> protocols.
> 
> If we want to fix all places well, one way to consider is to introduce
> migration_buffer_is_zero(), which can be a wrapper for buffer_is_zero() by
> default, but also returns true for poisoned pages before reading the
> buffer.  Then we use it in all three places:
> 
>   - For compression, in do_compress_ram_page()
>   - For RDMA, in qemu_rdma_write_one()

Ah, this may not be enough.. sorry.

It seems this is only one path that RDMA will use to save a target page,
for (!rdma->pin_all || !block->is_ram_block) && !block->remote_keys[chunk].

RDMA seems to also possible to merge buffers if virtually continuous
(qemu_rdma_buffer_mergable()), so IIUC it may not trigger an immediate
access to the guest page until later if it finds continuous pages and skip
even more logic.  I suspect that's also problematic for poisoned pages so
we should not allow any merged buffer to contain a poisoned page.

Not sure how complicated will it be to fix rdma specifically, copy again
two rdma developers.  One option is we state the issue in rdma and fix
non-rdma first.  Looks like rdma needs its own fix anyway.

>   - For generic migration, in save_zero_page_to_file() (your current patch)
> 
> I suppose then all cases will be fixed.  We need to make sure we'll always
> use migration_buffer_is_zero() as the 1st thing to call when QEMU wants to
> migrate a target page.  Maybe it'll worth a comment above that function.
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu



  reply	other threads:[~2023-09-12 18:44 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-06 13:59 [PATCH 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-06 13:59 ` [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-09-06 14:19   ` Joao Martins
2023-09-06 15:16     ` Peter Xu
2023-09-06 21:29       ` William Roche
2023-09-09 14:57         ` Joao Martins
2023-09-11 19:48           ` Peter Xu
2023-09-12 18:44             ` Peter Xu [this message]
2023-09-14 20:20               ` [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-14 20:20                 ` [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-09-15  3:13                   ` Zhijian Li (Fujitsu)
2023-09-15 11:31                     ` William Roche
2023-09-18  3:47                       ` Zhijian Li (Fujitsu)
2023-09-20 10:04                       ` Zhijian Li (Fujitsu)
2023-09-20 12:11                         ` William Roche
2023-09-20 23:53                         ` [PATCH v3 0/1] Qemu crashes on VM migration after an handled memory error “William Roche
2023-09-20 23:53                           ` [PATCH v3 1/1] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-10-13 15:08                           ` [PATCH v4 0/2] Qemu crashes on VM migration after an handled memory error “William Roche
2023-10-13 15:08                             ` [PATCH v4 1/2] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-10-13 15:08                             ` [PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM “William Roche
2023-10-16 16:48                               ` Peter Xu
2023-10-17  0:38                                 ` William Roche
2023-10-17 15:13                                   ` Peter Xu
2023-11-06 21:38                                     ` William Roche
2023-11-08 21:45                                       ` Peter Xu
2023-11-10 19:22                                         ` William Roche
2023-11-06 22:03                                     ` [PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error “William Roche
2023-11-06 22:03                                       ` [PATCH v5 1/2] migration: skip poisoned memory pages on "ram saving" phase “William Roche
2023-11-06 22:03                                       ` [PATCH v5 2/2] migration: prevent migration when a poisoned page is unknown from the VM “William Roche
2023-11-08 21:49                                       ` [PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error Peter Xu
2024-01-30 19:06                                         ` [PATCH v1 0/1] " “William Roche
2024-01-30 19:06                                           ` [PATCH v1 1/1] migration: prevent migration when VM has poisoned memory “William Roche
2024-01-31  1:48                                             ` Peter Xu
2023-09-14 21:50                 ` [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZQCxg+M2IpecRT8w@x1n \
    --to=peterx@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=leobras@redhat.com \
    --cc=lidongchen@tencent.com \
    --cc=lizhijian@fujitsu.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=william.roche@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.