From: William Roche <william.roche@oracle.com>
To: qemu-devel@nongnu.org, peterx@redhat.com
Cc: qemu-arm@nongnu.org, lizhijian@fujitsu.com, pbonzini@redhat.com,
	quintela@redhat.com, leobras@redhat.com,
	joao.m.martins@oracle.com, lidongchen@tencent.com,
	william.roche@oracle.com
Subject: [PATCH v1 0/1] Qemu crashes on VM migration after an handled memory error
Date: Tue, 30 Jan 2024 19:06:39 +0000
Message-ID: <20240130190640.139364-1-william.roche@oracle.com>
In-Reply-To: <ZUwCZdZj-vZD1NJC@x1n>

From: William Roche <william.roche@oracle.com>

Problem:
--------
A QEMU VM can survive a memory error: QEMU relays the error to the
VM kernel, which can deal with it by poisoning/off-lining the impacted
page. This situation creates a hole in the VM memory address space (an
unreadable page or set of pages).

A migration request for this VM (live migration through the network or
pseudo-migration with the creation of a state file) will crash QEMU when
it sequentially reads the memory address space and stumbles on the
existing hole.

New fix proposal:
-----------------
Let's prevent the migration when we know that there is a poisoned page in
the VM address space.


History:
--------
My first fix proposal for this crash condition (latest version:
https://lore.kernel.org/all/20231106220319.456765-1-william.roche@oracle.com/ )
relied on a well-behaved kernel to guarantee that a known poisoned page is
not accessed. It introduced ARM-specific handling to avoid possible
memory corruption after a migration that would turn a poisoned page
into an all-zero page. I haven't received any feedback about this
ARM-specific part.

I also accept that when a memory error leads to memory poisoning, this
platform functionality has to be honored as long as a physical platform
would provide it.

Peter asked for a complete correction of this problem (transferring
the memory hole information with the migration and recreating these
holes on the destination platform).

In the meantime, this is a very small fix to avoid the current crash
when reading the poisoned memory pages. I'm simply preventing
the migration when we know that it would crash, that is, when there is a
poisoned page in the VM address space.

This is generic protection code. It avoids the crash condition and
reports the following error message:
"Error: Can't migrate this vm with hardware poisoned memory, please reboot the vm and try again"
instead of crashing the VM.
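
To illustrate the idea, here is a minimal standalone sketch of such a
guard. It is not the code from the patch: record_poisoned_page(),
hwpoisoned_mem() and migration_allowed() are hypothetical names, and a
plain byte counter stands in for the real bookkeeping of poisoned pages.
Only the error string is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the list of hardware-poisoned pages:
 * total number of bytes currently known to be poisoned. */
static size_t poisoned_bytes;

/* Hypothetical helper: called when a memory error poisons a page. */
void record_poisoned_page(size_t page_size)
{
    poisoned_bytes += page_size;
}

/* Hypothetical helper: how much poisoned memory the VM currently has. */
size_t hwpoisoned_mem(void)
{
    return poisoned_bytes;
}

/*
 * Hypothetical pre-migration check. Returns true when the migration may
 * start. Otherwise writes the error message into errbuf and returns
 * false, so the caller can report it instead of reading a poisoned page.
 */
bool migration_allowed(char *errbuf, size_t errlen)
{
    assert(errbuf != NULL && errlen > 0);

    if (hwpoisoned_mem() > 0) {
        snprintf(errbuf, errlen,
                 "Can't migrate this vm with hardware poisoned memory, "
                 "please reboot the vm and try again");
        return false;
    }
    return true;
}
```

The check runs once, before any guest memory is read, so the refusal is
cheap and does not depend on which page is poisoned. The message asks
for a reboot of the VM; the sketch itself never resets the counter.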

This fix is scripts/checkpatch.pl clean.
Unit tested on ARM and x86.


William Roche (1):
  migration: prevent migration when VM has poisoned memory

 accel/kvm/kvm-all.c    | 10 ++++++++++
 accel/stubs/kvm-stub.c |  5 +++++
 include/sysemu/kvm.h   |  6 ++++++
 migration/migration.c  |  7 +++++++
 4 files changed, 28 insertions(+)

-- 
2.39.3