qemu-devel.nongnu.org archive mirror
From: William Roche <william.roche@oracle.com>
To: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"peterx@redhat.com" <peterx@redhat.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
	"quintela@redhat.com" <quintela@redhat.com>,
	"leobras@redhat.com" <leobras@redhat.com>,
	"joao.m.martins@oracle.com" <joao.m.martins@oracle.com>,
	"lidongchen@tencent.com" <lidongchen@tencent.com>
Subject: Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase
Date: Wed, 20 Sep 2023 14:11:35 +0200
Message-ID: <2f088b44-801d-37a3-0edc-0286ac58d0be@oracle.com>
In-Reply-To: <128792ce-e3aa-a357-5e96-a4d8211193d6@fujitsu.com>

Thank you Zhijian for your feedback.

So I'll try to push this change today.

Cheers,
William.


On 9/20/23 12:04, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 15/09/2023 19:31, William Roche wrote:
>> On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote:
>>>
>>>
>>> I'm okay with "RDMA isn't touched".
>>> BTW, could you share your reproducer program/hack for poisoning the page, so that
>>> I can take a look at the RDMA part later when I'm free?
>>>
>>> Not sure it's appropriate to ack a part that isn't touched. Anyway:
>>> Acked-by: Li Zhijian <lizhijian@fujitsu.com> # RDMA
>>>
>>
>> Thanks.
>> As you asked for a procedure to inject memory errors into a running VM,
>> I've attached to this email the source code (mce_process_react.c) of a
>> program that will help to target the error injection in the VM.
> 
> 
> I just tried your hwpoison program and did an RDMA migration. The migration failed, but fortunately
> the source side is still alive :).
> 
> (qemu) Failed to register chunk!: Bad address
> Chunk details: block: 0 chunk index 671 start 139955096518656 end 139955097567232 host 139955096518656 local 139954392924160 registrations: 636
> qemu-system-x86_64: cannot get lkey
> qemu-system-x86_64: rdma migration: write error! -22
> qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
> qemu-system-x86_64: failed to save SaveStateEntry with id(name): 2(ram): -22
> qemu-system-x86_64: Early error. Sending error.
> 
> 
> Since RDMA migration currently transfers guest memory in 1 MiB chunks by default, we may need to
> do one of the following:
> 
> option 1: reduce the chunk size to 1 page
> option 2: handle the chunk containing the hwpoisoned page specially
> 
> However, since another migration protocol can be used instead, it's also acceptable to leave this issue unfixed for now.
> 
> Tested-by: Li Zhijian <lizhijian@fujitsu.com>
> 
> Thanks
> Zhijian
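
As a side note on the log quoted above: the numbers in the "Chunk details" line
can be checked with a little arithmetic. The sketch below is only an
illustration, not code from QEMU or from the attached program. The helper names
are hypothetical, the 4 KiB page size is an assumption (the usual x86_64
default), and reading the `local` field as the base host address of the RAM
block is also an assumption, although it is consistent with the reported chunk
index.

```python
# Values copied verbatim from the "Chunk details" line of the failure log.
CHUNK_START = 139955096518656
CHUNK_END = 139955097567232
LOCAL_BASE = 139954392924160   # assumption: base host address of the RAM block

PAGE_SIZE = 4096               # assumption: 4 KiB host pages


def chunk_size(start: int, end: int) -> int:
    """Size in bytes of the chunk that failed to register."""
    return end - start


def pages_per_chunk(size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of host pages covered by one registration of `size` bytes."""
    return size // page_size


def chunk_index(addr: int, base: int, size: int) -> int:
    """Index of the chunk containing `addr`, counted from `base`."""
    return (addr - base) // size


if __name__ == "__main__":
    size = chunk_size(CHUNK_START, CHUNK_END)
    print("chunk size (bytes):", size)
    print("pages per chunk:", pages_per_chunk(size))
    print("chunk index:", chunk_index(CHUNK_START, LOCAL_BASE, size))
```

Under these assumptions the chunk is exactly 1 MiB and spans 256 pages, so a
single poisoned page is enough to make the registration of the whole chunk fail
with "Bad address". That is why option 1 (one page per chunk) would confine the
failure to the poisoned page itself, at the cost of many more registrations.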

