From: Steven Sistare <steven.sistare@oracle.com>
To: Michael Roth <michael.roth@amd.com>, Fabiano Rosas <farosas@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>,
qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
David Hildenbrand <david@redhat.com>
Subject: Re: [PULL 2/8] migration: ram block cpr blockers
Date: Thu, 27 Mar 2025 08:27:51 -0400 [thread overview]
Message-ID: <307f12e8-7de0-42b2-97ab-997c2dc3f39c@oracle.com> (raw)
In-Reply-To: <20250326213443.jl3r77hqh6gy2h4w@amd.com>
On 3/26/2025 5:34 PM, Michael Roth wrote:
> On Wed, Mar 26, 2025 at 05:13:50PM -0300, Fabiano Rosas wrote:
>> Michael Roth <michael.roth@amd.com> writes:
>>
>>> Quoting Tom Lendacky (2025-03-26 14:21:31)
>>>> On 3/26/25 13:46, Tom Lendacky wrote:
>>>>> On 3/7/25 12:15, Fabiano Rosas wrote:
>>>>>> From: Steve Sistare <steven.sistare@oracle.com>
>>>>>>
>>>>>> Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile ram blocks
>>>>>> in the migration stream file and recreate them later, because the physical
>>>>>> memory for the blocks is pinned and registered for vfio. Add a blocker
>>>>>> for volatile ram blocks.
>>>>>>
>>>>>> Also add a blocker for RAM_GUEST_MEMFD. Preserving guest_memfd may be
>>>>>> sufficient for CPR, but it has not been tested yet.
>>>>>>
>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>>>>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>>>> Message-ID: <1740667681-257312-1-git-send-email-steven.sistare@oracle.com>
>>>>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>>>>> ---
>>>>>> include/exec/memory.h | 3 ++
>>>>>> include/exec/ramblock.h | 1 +
>>>>>> migration/savevm.c | 2 ++
>>>>>> system/physmem.c | 66 +++++++++++++++++++++++++++++++++++++++++
>>>>>> 4 files changed, 72 insertions(+)
>>>>>
>>>>> This patch breaks booting an SNP guest as it triggers the following
>>>>> assert:
>>>>>
>>>>> qemu-system-x86_64: ../util/error.c:68: error_setv: Assertion `*errp == NULL' failed.
>>>>>
>>
>> Usually this means the error has already been set previously, which is
>> not allowed.
>>
>>>>> I tracked it to the err_setg() call in ram_block_add_cpr_blocker().
>>>>> It looks like the error message is unable to be printed because
>>>>> rb->cpr_blocker is NULL.
>>>>>
>>>>> Adding aux-ram-share=on to the -machine object gets me past the error and
>>>>> therefore the assertion, but isn't that an incompatible change to how an
>>>>> SNP guest has to be started?
>>>>
>>>> If I update the err_setg() call to use the errp parameter that is passed
>>>> into ram_block_add_cpr_blocker(), I get the following message and then
>>>> the guest launch terminates:
>>>>
>>
>> The usage at ram_block_add_cpr_blocker() is correct, the cpr_blocker
>> gets initialized and registered as a migration blocker. The errp only
>> becomes relevant later when migration starts and the error condition is
>> met.
>>
>>>> qemu-system-x86_64: Memory region pc.bios is not compatible with CPR.
>>>> share=on is required for memory-backend objects, and aux-ram-share=on is
>>>> required.
>>
>> Since errp is an &error_fatal, it causes QEMU to exit, so this^ error
>> message is bogus.
>>
>>>>
>>>> The qemu parameters I used prior to this patch that allowed an SNP guest
>>>> to launch were:
>>>>
>>>> -machine type=q35,confidential-guest-support=sev0,memory-backend=ram1
>>>> -object memory-backend-memfd,id=ram1,size=16G,share=true,prealloc=false
>>>>
>>>> With these parameters after this patch, the launch fails.
>>>
>>> I think it might be failing because the caller of
>>> ram_block_add_cpr_blocker() is passing in &error_abort, but if the
>>> error_setg() is call on a properly initialized cpr_blocker value then
>>> SNP is still able to boot for me.
>>> I'm not sure where the best spot is
>>> to initialize cpr_blocker, it probably needs to be done before either
>>> ram_block_add_cpr_blocker() or ram_block_del_cpr_blocker() are callable,
>>> but the following avoids the reported crash at least:
>>>
>>> diff --git a/system/physmem.c b/system/physmem.c
>>> index 44dd129662..bff0fdcaac 100644
>>> --- a/system/physmem.c
>>> +++ b/system/physmem.c
>>> @@ -4176,6 +4176,7 @@ void ram_block_add_cpr_blocker(RAMBlock *rb, Error **errp)
>>> return;
>>> }
>>>
>>> + rb->cpr_blocker = NULL;
>>
>> Could it be the cpr_blocker already got set at ram_block_add() in the
>> RAM_GUEST_MEMFD path?
>
> That seems to be the case: in some cases ram_block_add() sets cpr_blocker
> when (new_block->flags & RAM_GUEST_MEMFD) is true, and then soon after
> when ram_block_add_cpr_blocker() is called on the same RAMBlock:
>
> 2025-03-26T21:08:15.092427Z qemu-system-x86_64: warning: ram_block_add: new_block 0x55c247e4c880 new_block->cpr_blocker (nil) name ram1
>
> 2025-03-26T21:08:15.124710Z qemu-system-x86_64: warning: ram_block_add: new_block 0x55c2480fde00 new_block->cpr_blocker (nil) name pc.bios
> 2025-03-26T21:08:15.126190Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c2480fde00 rb->cpr_blocker 0x55c2480fe050 name pc.bios
> 2025-03-26T21:08:15.138582Z qemu-system-x86_64: warning: ram_block_add: new_block 0x55c247e3c1e0 new_block->cpr_blocker (nil) name pc.rom
> 2025-03-26T21:08:15.138938Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c247e3c1e0 rb->cpr_blocker 0x55c247e3c890 name pc.rom
>
> 2025-03-26T21:08:16.185577Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c248db9200 rb->cpr_blocker (nil) name /rom@etc/acpi/tables
> 2025-03-26T21:08:16.187140Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c248085620 rb->cpr_blocker (nil) name /rom@etc/table-loader
> 2025-03-26T21:08:16.188029Z qemu-system-x86_64: warning: ram_block_add_cpr_blocker: rb 0x55c2480ce220 rb->cpr_blocker (nil) name /rom@etc/acpi/rsd
Thanks everyone for debugging this. To summarize, ram_block_add_cpr_blocker already blocks
guest_memfd, because rb->fd < 0. The fix is to delete this redundant code in ram_block_add:
error_setg(&new_block->cpr_blocker,
"Memory region %s uses guest_memfd, "
"which is not supported with CPR.",
memory_region_name(new_block->mr));
migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
MIG_MODE_CPR_TRANSFER, MIG_MODE_CPR_EXEC,
-1);
I will submit a fix (unless Tom or Michael would prefer to author it).
- Steve
next prev parent reply other threads:[~2025-03-27 12:29 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-07 18:15 [PULL 0/8] Migration patches for 2025-03-07 Fabiano Rosas
2025-03-07 18:15 ` [PULL 1/8] migration: Fix UAF for incoming migration on MigrationState Fabiano Rosas
2025-03-07 18:15 ` [PULL 2/8] migration: ram block cpr blockers Fabiano Rosas
2025-03-26 18:46 ` Tom Lendacky
2025-03-26 19:21 ` Tom Lendacky
2025-03-26 19:50 ` Michael Roth
2025-03-26 20:13 ` Fabiano Rosas
2025-03-26 21:34 ` Michael Roth
2025-03-27 12:27 ` Steven Sistare [this message]
2025-03-27 13:42 ` Tom Lendacky
2025-03-28 7:05 ` Xiaoyao Li
2025-03-07 18:15 ` [PULL 3/8] migration: Prioritize RDMA in ram_save_target_page() Fabiano Rosas
2025-03-07 18:15 ` [PULL 4/8] migration: check RDMA and capabilities are compatible on both sides Fabiano Rosas
2025-03-07 18:15 ` [PULL 5/8] migration: disable RDMA + postcopy-ram Fabiano Rosas
2025-03-07 18:15 ` [PULL 6/8] migration/rdma: Remove redundant migration_in_postcopy checks Fabiano Rosas
2025-03-07 18:15 ` [PULL 7/8] migration: Unfold control_save_page() Fabiano Rosas
2025-03-07 18:15 ` [PULL 8/8] migration: Add qtest for migration over RDMA Fabiano Rosas
2025-03-08 6:00 ` Philippe Mathieu-Daudé
2025-03-08 8:42 ` Stefan Hajnoczi
2025-03-10 8:33 ` Zhijian Li (Fujitsu) via
2025-03-10 14:36 ` Peter Xu
2025-03-11 2:06 ` Zhijian Li (Fujitsu) via
2025-03-10 8:01 ` Zhijian Li (Fujitsu) via
2025-03-10 15:00 ` Fabiano Rosas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=307f12e8-7de0-42b2-97ab-997c2dc3f39c@oracle.com \
--to=steven.sistare@oracle.com \
--cc=david@redhat.com \
--cc=farosas@suse.de \
--cc=michael.roth@amd.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=thomas.lendacky@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).