From: Peter Xu <peterx@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: William Roche <william.roche@oracle.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Juan Quintela <quintela@redhat.com>,
Leonardo Bras <leobras@redhat.com>,
qemu-devel@nongnu.org, lizhijian@fujitsu.com,
lidongchen@tencent.com
Subject: Re: [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Tue, 12 Sep 2023 14:44:19 -0400 [thread overview]
Message-ID: <ZQCxg+M2IpecRT8w@x1n> (raw)
In-Reply-To: <ZP9vFuY0r29RSU1g@x1n>
On Mon, Sep 11, 2023 at 03:48:38PM -0400, Peter Xu wrote:
> On Sat, Sep 09, 2023 at 03:57:44PM +0100, Joao Martins wrote:
> > > Should I continue to treat them as zero pages written with
> > > save_zero_page_to_file ?
> >
> > The MCE had already been forwarded to the guest, so the guest is supposed not to
> > be using the page (nor to rely on its contents). Hence the destination ought to
> > just see a zero page, so what you said seems like the best course of action.
> >
> > > Or should I consider the case of ongoing compression use and create new code
> > > to compress an empty page with save_compress_page()?
> > >
> > The compress code looks to be a tentative compression (not guaranteed, IIUC), so
> > I am not sure it needs any more logic than just adding the check at the top of
> > ram_save_target_page_legacy() as Peter suggested?
> >
> > > And what about an RDMA memory region impacted by a memory error ?
> > > This is an important aspect.
> > > Does anyone know how this situation is dealt with ? And how it should be handled
> > > in Qemu ?
> > >
> >
> > If you refer to guest RDMA MRs, that is just guest RAM; I am not sure we are
> > even aware of those from QEMU. But if you refer to the RDMA transport that sits
> > below the QEMU file (or rather acts as an implementation of QEMUFile), then the
> > handling in ram_save_target_page_legacy() already seems to cover it.
>
> I'm also not familiar enough with RDMA, but it looks tricky indeed. AFAIU
> it's leveraging RDMA_CONTROL_COMPRESS for zero pages for now (with
> RDMACompress.value==0), so it doesn't seem to be using generic migration
> protocols.
>
> If we want to fix all places well, one way to consider is to introduce
> migration_buffer_is_zero(), which can be a wrapper for buffer_is_zero() by
> default, but also returns true for poisoned pages before reading the
> buffer. Then we use it in all three places:
>
> - For compression, in do_compress_ram_page()
> - For RDMA, in qemu_rdma_write_one()
Ah, this may not be enough... sorry.

It seems this is only one path that RDMA uses to save a target page, for
(!rdma->pin_all || !block->is_ram_block) && !block->remote_keys[chunk].

RDMA also seems able to merge buffers when they are virtually contiguous
(qemu_rdma_buffer_mergable()), so IIUC it may not trigger an immediate
access to the guest page until later, if it finds contiguous pages and skips
even more logic. I suspect that is also problematic for poisoned pages, so
we should not allow any merged buffer to contain a poisoned page.

Not sure how complicated it will be to fix RDMA specifically; copying the
two RDMA developers again. One option is to state the issue for RDMA and fix
the non-RDMA paths first. It looks like RDMA needs its own fix anyway.
> - For generic migration, in save_zero_page_to_file() (your current patch)
>
> I suppose then all cases will be fixed. We need to make sure we always use
> migration_buffer_is_zero() as the first thing to call when QEMU wants to
> migrate a target page. Maybe it is worth a comment above that function.
>
> Thanks,
>
> --
> Peter Xu
--
Peter Xu
Thread overview: 34+ messages
2023-09-06 13:59 [PATCH 0/1] Qemu crashes on VM migration after an handled memory error William Roche
2023-09-06 13:59 ` [PATCH 1/1] migration: skip poisoned memory pages on "ram saving" phase William Roche
2023-09-06 14:19 ` Joao Martins
2023-09-06 15:16 ` Peter Xu
2023-09-06 21:29 ` William Roche
2023-09-09 14:57 ` Joao Martins
2023-09-11 19:48 ` Peter Xu
2023-09-12 18:44 ` Peter Xu [this message]
2023-09-14 20:20 ` [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error William Roche
2023-09-14 20:20 ` [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase William Roche
2023-09-15 3:13 ` Zhijian Li (Fujitsu)
2023-09-15 11:31 ` William Roche
2023-09-18 3:47 ` Zhijian Li (Fujitsu)
2023-09-20 10:04 ` Zhijian Li (Fujitsu)
2023-09-20 12:11 ` William Roche
2023-09-20 23:53 ` [PATCH v3 0/1] Qemu crashes on VM migration after an handled memory error William Roche
2023-09-20 23:53 ` [PATCH v3 1/1] migration: skip poisoned memory pages on "ram saving" phase William Roche
2023-10-13 15:08 ` [PATCH v4 0/2] Qemu crashes on VM migration after an handled memory error William Roche
2023-10-13 15:08 ` [PATCH v4 1/2] migration: skip poisoned memory pages on "ram saving" phase William Roche
2023-10-13 15:08 ` [PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM William Roche
2023-10-16 16:48 ` Peter Xu
2023-10-17 0:38 ` William Roche
2023-10-17 15:13 ` Peter Xu
2023-11-06 21:38 ` William Roche
2023-11-08 21:45 ` Peter Xu
2023-11-10 19:22 ` William Roche
2023-11-06 22:03 ` [PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error William Roche
2023-11-06 22:03 ` [PATCH v5 1/2] migration: skip poisoned memory pages on "ram saving" phase William Roche
2023-11-06 22:03 ` [PATCH v5 2/2] migration: prevent migration when a poisoned page is unknown from the VM William Roche
2023-11-08 21:49 ` [PATCH v5 0/2] Qemu crashes on VM migration after an handled memory error Peter Xu
2024-01-30 19:06 ` [PATCH v1 0/1] " William Roche
2024-01-30 19:06 ` [PATCH v1 1/1] migration: prevent migration when VM has poisoned memory William Roche
2024-01-31 1:48 ` Peter Xu
2023-09-14 21:50 ` [PATCH v2 0/1] Qemu crashes on VM migration after an handled memory error Peter Xu