qemu-devel.nongnu.org archive mirror
From: William Roche <william.roche@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, lizhijian@fujitsu.com,
	pbonzini@redhat.com, quintela@redhat.com, leobras@redhat.com,
	joao.m.martins@oracle.com, lidongchen@tencent.com
Subject: Re: [PATCH v4 2/2] migration: prevent migration when a poisoned page is unknown from the VM
Date: Fri, 10 Nov 2023 20:22:45 +0100
Message-ID: <43bea81c-cb7d-44e9-a9b9-3f059faf472e@oracle.com>
In-Reply-To: <ZUwBgzr1GcSIy0sJ@x1n>


On 11/8/23 22:45, Peter Xu wrote:
> On Mon, Nov 06, 2023 at 10:38:14PM +0100, William Roche wrote:
>> But it implies a lot of other changes:
>>      - The source has to flag the error pages to indicate a poison
>>        (new flag in the exchange protocol)
>>      - The destination has to be able to deal with the new protocol
> IIUC these two can be simply implemented by migrating hwpoison_page_list
> over to dest.  You need to have a compat bit for doing this, ignoring the
> list on old machine types, because old QEMUs will not recognize this vmsd.
>
> QEMU should even support migrating a list object in a VMSD; feel free to
> have a look at VMSTATE_QLIST_V().

This is another area that I'll need to learn about.

>>      - The destination has to be able to mark the pages as poisoned
>>        (authorized to use userfaultfd)
> Note: userfaultfd is actually available without any privilege if you only
> need UFFDIO_POISON, as long as you open the uffd (either via the syscall or
> /dev/userfaultfd) with UFFD_USER_MODE_ONLY.
>
> A trick is to register in WP mode (UFFDIO_REGISTER_MODE_WP) rather than
> MISSING, because with USER_MODE_ONLY a kernel access to a missing page
> would cause SIGBUS, and then inject whatever POISON we want.  As long as
> UFFDIO_WRITEPROTECT is not invoked, WP mode does nothing (unlike MISSING).
>
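For illustration only, the trick described above could be sketched roughly
as follows. This is a minimal sketch, not QEMU code: the helper name
uffd_poison_sketch() is hypothetical, and it assumes Linux UAPI headers and
a kernel recent enough to provide UFFD_USER_MODE_ONLY and UFFDIO_POISON
(to my knowledge 5.11 and 6.6 respectively). When the build headers lack
them, the helper simply reports -ENOSYS.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper (an assumption for illustration, not QEMU code):
 * mark the page-aligned range [addr, addr + len) as poisoned through an
 * unprivileged, user-mode-only userfaultfd registered in WP mode.
 *
 * Returns 0 and stores the still-open uffd in *uffd_out on success, or a
 * negative errno value on failure (-ENOSYS if the build headers do not
 * provide UFFDIO_POISON). The uffd is handed back to the caller rather
 * than closed, so the caller decides how long the registration lives.
 */
int uffd_poison_sketch(void *addr, size_t len, int *uffd_out)
{
    assert(uffd_out != NULL);
    *uffd_out = -1;
#if defined(UFFDIO_POISON) && defined(UFFD_FEATURE_POISON) && \
    defined(UFFD_USER_MODE_ONLY)
    int err;
    int uffd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | UFFD_USER_MODE_ONLY);
    if (uffd < 0) {
        return -errno;
    }

    /* Handshake; fails with EINVAL if the kernel lacks the poison feature. */
    struct uffdio_api api = { .api = UFFD_API, .features = UFFD_FEATURE_POISON };
    if (ioctl(uffd, UFFDIO_API, &api) < 0) {
        goto fail;
    }

    /* WP mode only: no MISSING faults are ever routed to this uffd. */
    struct uffdio_register reg = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = UFFDIO_REGISTER_MODE_WP,
    };
    if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0) {
        goto fail;
    }

    /* Install the poison markers; later accesses to the range get SIGBUS. */
    struct uffdio_poison poison = {
        .range = { .start = (uintptr_t)addr, .len = len },
        .mode = 0,
    };
    if (ioctl(uffd, UFFDIO_POISON, &poison) < 0) {
        goto fail;
    }

    *uffd_out = uffd;
    return 0;

fail:
    err = errno;
    close(uffd);
    return -err;
#else
    (void)addr;
    (void)len;
    return -ENOSYS;
#endif
}
```

A caller would pass a page-aligned mapping (for example one anonymous page
from mmap()) and must not touch the range afterwards unless it is prepared
to handle SIGBUS. UFFDIO_WRITEPROTECT is never called here, which is what
keeps the WP registration inert.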
>>      - So both source and destination have to be upgraded (of course
>>        qemu but also an appropriate kernel version providing
>>        UFFDIO_POISON on the destination)
> True.  Unfortunately this is not avoidable.
>
>>      - we may need to be able to negotiate a fall back solution
>>      - an indication of the method to use could belong to the
>>        migration capabilities and parameters
> For the above two points: this is a common issue with migration
> compatibility.  As long as you provide the above VMSD to migrate
> hwpoison_page_list and mark all old QEMU machine types to skip it, it
> should just work.
>
> You can have a closer look at anything in hw_compat_* as an example.

Yes, I'll do that.
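To make the idea concrete, here is a simplified, standalone model of
"migrate the poison list, but skip it on old machine types". It is only a
sketch under stated assumptions: it does not use QEMU's VMState macros, the
wire layout (a big-endian 64-bit count followed by big-endian 64-bit
addresses) is made up for the example, and the names poison_list_save(),
poison_list_load() and send_poison_list are hypothetical. In QEMU itself
the equivalent would presumably be a VMSD with a compat property, as
suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value in big-endian byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Read a 64-bit big-endian value. */
static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/*
 * Source side (hypothetical): serialize n poisoned page addresses.
 * send_poison_list models the compat bit: when false (old machine type),
 * nothing is emitted, so an old destination never sees the section.
 * Returns the number of bytes written, 0 if skipped, -1 if buf is too small.
 */
long poison_list_save(const uint64_t *addrs, size_t n, bool send_poison_list,
                      uint8_t *buf, size_t cap)
{
    if (!send_poison_list) {
        return 0;
    }
    size_t need = 8 + n * 8;
    if (cap < need) {
        return -1;
    }
    put_be64(buf, (uint64_t)n);
    for (size_t i = 0; i < n; i++) {
        put_be64(buf + 8 + i * 8, addrs[i]);
    }
    return (long)need;
}

/*
 * Destination side (hypothetical): parse the section into out[0..max).
 * An empty section (len == 0) means the source skipped it.
 * Returns the number of addresses loaded, or -1 on malformed input.
 */
long poison_list_load(const uint8_t *buf, size_t len, uint64_t *out, size_t max)
{
    if (len == 0) {
        return 0;
    }
    if (len < 8) {
        return -1;
    }
    uint64_t n = get_be64(buf);
    if (n > max || len != 8 + (size_t)n * 8) {
        return -1;
    }
    for (size_t i = 0; i < (size_t)n; i++) {
        out[i] = get_be64(buf + 8 + i * 8);
    }
    return (long)n;
}
```

The point of the sketch is the gating: with the flag off, the stream is
byte-for-byte what an old QEMU expects, and the destination treats an
absent section as "no poison information".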

>>      - etc...
> I think you summarized almost all the points I can think of; is there
> really anything more? :)

Probably some work to select the poison migration method, for example:
whether or not to allow a migration that transforms poison into zeros as
a fallback when the poison migration itself (with UFFDIO_POISON) can't be
used.
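As a rough illustration of what such a selection could look like, here is
a small sketch. Everything in it is an assumption made for the example:
the enum, the function name select_poison_method() and its three inputs
are hypothetical, and none of this exists in QEMU today.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for a VM that has poisoned pages at migration time. */
enum poison_method {
    POISON_METHOD_UFFDIO_POISON, /* re-create the poison on the destination */
    POISON_METHOD_ZERO_FILL,     /* fallback: poisoned pages become zeros   */
    POISON_METHOD_BLOCK          /* refuse to migrate                       */
};

/*
 * Pick a method from what both sides support and what the user allows.
 *  - src_sends_poison:      the source can flag poisoned pages in the stream
 *  - dst_has_uffdio_poison: the destination QEMU and kernel can inject poison
 *  - allow_zero_fallback:   the user accepts turning poison into zeros
 */
enum poison_method select_poison_method(bool src_sends_poison,
                                        bool dst_has_uffdio_poison,
                                        bool allow_zero_fallback)
{
    if (src_sends_poison && dst_has_uffdio_poison) {
        return POISON_METHOD_UFFDIO_POISON;
    }
    if (allow_zero_fallback) {
        return POISON_METHOD_ZERO_FILL;
    }
    return POISON_METHOD_BLOCK;
}
```

The three inputs would most likely map onto migration capabilities and
parameters, which is the negotiation mentioned in the list above.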

> It'll be great if you can, or plan to, fix that for good.

Thanks for the offer ;)
I'd really like to implement that, but I currently have another pressing
issue to work on. I should be back on this topic within a few months.

I'm now waiting for some feedback from the ARM architecture reviewer(s).

Thanks a lot for all your suggestions.
