From: zhenwei pi <pizhenwei@bytedance.com>
To: akpm@linux-foundation.org, naoya.horiguchi@nec.com,
mst@redhat.com, david@redhat.com
Cc: qemu-devel@nongnu.org, linux-kernel@vger.kernel.org,
virtualization@lists.linux-foundation.org, linux-mm@kvack.org,
zhenwei pi <pizhenwei@bytedance.com>,
pbonzini@redhat.com
Subject: [PATCH 0/3] recover hardware corrupted page by virtio balloon
Date: Fri, 20 May 2022 15:06:45 +0800
Message-ID: <20220520070648.1794132-1-pizhenwei@bytedance.com>
Hi,
I'm trying to recover hardware corrupted pages via virtio balloon. The
workflow of this feature looks like this:
 Guest               5.MF -> 6.RVQ FE    10.Unpoison page
                    /           \            /
-------------------+-------------+----------+-----------
                   |             |          |
                4.MCE        7.RVQ BE   9.RVQ Event
 QEMU             /               \       /
             3.SIGBUS              8.Remap
                /
----------------+------------------------------------
                |
            +--2.MF
 Host       /
       1.HW error
1, A hardware page error occurs at random.
2, The host handles the corrupted page via the Memory Failure mechanism
   and sends SIGBUS to the user process if early-kill is enabled.
3, QEMU handles the SIGBUS. If the address belongs to guest RAM, then:
4, QEMU tries to inject an MCE into the guest.
5, The guest handles the memory failure again.

Steps 1-5 have been supported for a long time. The next steps are added
by this patch set (along with the related driver patch):
6, The guest balloon driver is notified of the corrupted PFN and sends a
   request to the host side through the Recover VQ FrontEnd.
7, QEMU handles the request in the Recover VQ BackEnd, then:
8, QEMU remaps the corrupted HVA to fix the memory failure, then:
9, QEMU acks the result to the guest side through the Recover VQ.
10, The guest unpoisons the page if the corrupted page was recovered
    successfully.
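
To make steps 6-10 concrete, here is a minimal user-space sketch of the
request/ack handshake. It is only an illustration: every name in it
(recover_msg, guest_page, the status codes, the three helpers) is
hypothetical and is not the layout of 'struct virtio_balloon_recover' or
any code from the series. The remap_ok flag simply stands in for whether
the host-side remap in step 8 succeeded.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status codes; the real protocol is still undefined. */
enum recover_status { RECOVER_OK = 0, RECOVER_FAILED = 1 };

/* Hypothetical message carried on the recover VQ (steps 6 and 9). */
struct recover_msg {
    uint64_t pfn;    /* guest PFN reported as corrupted */
    uint32_t status; /* filled in by the host before it acks */
};

/* Hypothetical guest-side record of one poisoned page. */
struct guest_page {
    uint64_t pfn;
    bool poisoned;
};

/* Step 6: the guest builds a request for the corrupted PFN. */
struct recover_msg guest_build_request(const struct guest_page *page)
{
    struct recover_msg msg = { .pfn = page->pfn, .status = RECOVER_FAILED };
    return msg;
}

/* Steps 7-9: the host handles the request and records the remap result. */
void host_handle_request(struct recover_msg *msg, bool remap_ok)
{
    msg->status = remap_ok ? RECOVER_OK : RECOVER_FAILED;
}

/* Step 10: unpoison only when the ack reports success for the same PFN. */
bool guest_handle_ack(struct guest_page *page, const struct recover_msg *msg)
{
    if (msg->pfn != page->pfn || msg->status != RECOVER_OK)
        return false;
    page->poisoned = false;
    return true;
}
```

The point of the sketch is that the page stays poisoned unless the host
explicitly reports success for that PFN, so a failed remap leaves the
guest in the same state as after step 5.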

Test:
This patch set can be tested with QEMU (also under development):
  https://github.com/pizhenwei/qemu/tree/balloon-recover

Emulate an MCE from QEMU (guest RAM normal pages only, hugepages are not
supported):
  virsh qemu-monitor-command vm --hmp mce 0 9 0xbd000000000000c0 0xd 0x61646678 0x8c

The guest works fine (on Intel Platinum 8260):
  mce: [Hardware Error]: Machine check events logged
  Memory failure: 0x61646: recovery action for dirty LRU page: Recovered
  virtio_balloon virtio5: recovered pfn 0x61646
  Unpoison: Unpoisoned page 0x61646 by virtio-balloon
  MCE: Killing stress:24502 due to hardware memory corruption fault at 7f5be2e5a010

And 'HardwareCorrupted' in /proc/meminfo also shows 0 kB.
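
For anyone repeating the test, the two checks above can be scripted. The
sketch below assumes 4 KiB base pages (so the injected address 0x61646678
maps to PFN 0x61646, matching the log) and parses the 'HardwareCorrupted'
line from meminfo-formatted text. The helper names are made up for this
example and are not part of the series.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 4 KiB base pages, as on x86-64. */
#define EXAMPLE_PAGE_SHIFT 12

/* Convert a guest physical address to its page frame number. */
uint64_t addr_to_pfn(uint64_t addr)
{
    return addr >> EXAMPLE_PAGE_SHIFT;
}

/*
 * Return the HardwareCorrupted value in kB from meminfo-formatted text,
 * or -1 if the field is missing or malformed.
 */
long parse_hardware_corrupted_kb(const char *meminfo)
{
    const char *key = "HardwareCorrupted:";
    const char *line = meminfo;

    while (line && *line) {
        if (strncmp(line, key, strlen(key)) == 0) {
            long kb;

            if (sscanf(line + strlen(key), "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Read /proc/meminfo and return HardwareCorrupted in kB, or -1 on error. */
long read_hardware_corrupted_kb(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_hardware_corrupted_kb(buf);
}
```

Calling read_hardware_corrupted_kb() in the guest before and after the
injection gives a quick pass/fail signal without reading dmesg.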

The protocol of the virtio balloon recover VQ is not defined yet and is
still under development:
- 'struct virtio_balloon_recover' defines the structure used to exchange
  messages between guest and host.
- '__le32 corrupted_pages' in struct virtio_balloon_config is intended
  for the next step:
  1, A VM uses 2M huge pages for its RAM. Once an MCE occurs, the whole
     2M becomes inaccessible. By reporting 512 * 4K 'corrupted_pages' to
     the guest, the guest has a chance to isolate the 512 pages ahead of
     time.
  2, After migrating to another host, the corrupted pages are actually
     recovered. Once the guest reads 'corrupted_pages' as 0, it can
     unpoison all the poisoned pages recorded in the balloon driver.
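
A rough sketch of that next step, under stated assumptions: the 512
figure is simply 2M / 4K, and the guest keeps a small log of poisoned
PFNs that it clears when 'corrupted_pages' reads back as 0. The
poison_log type, its fixed capacity and the helper names are invented
for illustration. How the driver would really track pages is not
specified here.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_BASE_PAGE_SIZE (4UL * 1024)        /* 4K base page */
#define EXAMPLE_HUGE_PAGE_SIZE (2UL * 1024 * 1024) /* 2M huge page */
#define EXAMPLE_MAX_TRACKED    1024                /* arbitrary capacity */

/* Number of base pages lost when one 2M huge page is corrupted. */
uint32_t corrupted_pages_per_hugepage(void)
{
    return (uint32_t)(EXAMPLE_HUGE_PAGE_SIZE / EXAMPLE_BASE_PAGE_SIZE);
}

/* Hypothetical guest-side record of poisoned PFNs. */
struct poison_log {
    uint64_t pfns[EXAMPLE_MAX_TRACKED];
    size_t count;
};

/* Remember a poisoned PFN; returns 0 on success, -1 if the log is full. */
int poison_log_add(struct poison_log *log, uint64_t pfn)
{
    if (log->count >= EXAMPLE_MAX_TRACKED)
        return -1;
    log->pfns[log->count++] = pfn;
    return 0;
}

/*
 * React to a config update. When the host reports zero corrupted pages,
 * every logged PFN can be unpoisoned; return how many were released.
 */
size_t poison_log_on_config(struct poison_log *log, uint32_t corrupted_pages)
{
    size_t released;

    if (corrupted_pages != 0)
        return 0;
    released = log->count;
    log->count = 0; /* a real driver would unpoison each PFN here */
    return released;
}
```

A nonzero 'corrupted_pages' leaves the log untouched, which matches the
idea that the guest only unpoisons after the host says everything has
been recovered.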

zhenwei pi (3):
  memory-failure: Introduce memory failure notifier
  mm/memory-failure.c: support reset PTE during unpoison
  virtio_balloon: Introduce memory recover

 drivers/virtio/virtio_balloon.c     | 243 ++++++++++++++++++++++++++++
 include/linux/mm.h                  |   4 +-
 include/uapi/linux/virtio_balloon.h |  16 ++
 mm/hwpoison-inject.c                |   2 +-
 mm/memory-failure.c                 |  59 ++++++-
 5 files changed, 315 insertions(+), 9 deletions(-)

--
2.20.1