qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: Steven Sistare <steven.sistare@oracle.com>,
	Michael Roth <michael.roth@amd.com>,
	Fabiano Rosas <farosas@suse.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>,
	qemu-devel@nongnu.org, Peter Xu <peterx@redhat.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PULL 2/8] migration: ram block cpr blockers
Date: Fri, 28 Mar 2025 15:05:00 +0800	[thread overview]
Message-ID: <7b4d0c5f-eb9e-4fc1-bc85-c08b1e674586@intel.com> (raw)
In-Reply-To: <307f12e8-7de0-42b2-97ab-997c2dc3f39c@oracle.com>

On 3/27/2025 8:27 PM, Steven Sistare wrote:
> On 3/26/2025 5:34 PM, Michael Roth wrote:
>> On Wed, Mar 26, 2025 at 05:13:50PM -0300, Fabiano Rosas wrote:
>>> Michael Roth <michael.roth@amd.com> writes:
>>>
>>>> Quoting Tom Lendacky (2025-03-26 14:21:31)
>>>>> On 3/26/25 13:46, Tom Lendacky wrote:
>>>>>> On 3/7/25 12:15, Fabiano Rosas wrote:
>>>>>>> From: Steve Sistare <steven.sistare@oracle.com>
>>>>>>>
>>>>>>> Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile 
>>>>>>> ram blocks
>>>>>>> in the migration stream file and recreate them later, because the 
>>>>>>> physical
>>>>>>> memory for the blocks is pinned and registered for vfio.  Add a 
>>>>>>> blocker
>>>>>>> for volatile ram blocks.
>>>>>>>
>>>>>>> Also add a blocker for RAM_GUEST_MEMFD.  Preserving guest_memfd 
>>>>>>> may be
>>>>>>> sufficient for CPR, but it has not been tested yet.
>>>>>>>
>>>>>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>>>>>> Reviewed-by: Fabiano Rosas <farosas@suse.de>
>>>>>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>>>>> Message-ID: <1740667681-257312-1-git-send-email- 
>>>>>>> steven.sistare@oracle.com>
>>>>>>> Signed-off-by: Fabiano Rosas <farosas@suse.de>
>>>>>>> ---
>>>>>>>   include/exec/memory.h   |  3 ++
>>>>>>>   include/exec/ramblock.h |  1 +
>>>>>>>   migration/savevm.c      |  2 ++
>>>>>>>   system/physmem.c        | 66 ++++++++++++++++++++++++++++++++++ 
>>>>>>> +++++++
>>>>>>>   4 files changed, 72 insertions(+)
>>>>>>
>>>>>> This patch breaks booting an SNP guest as it triggers the following
>>>>>> assert:
>>>>>>
>>>>>> qemu-system-x86_64: ../util/error.c:68: error_setv: Assertion 
>>>>>> `*errp == NULL' failed.
>>>>>>
>>>
>>> Usually this means the error has already been set previously, which is
>>> not allowed.
>>>
>>>>>> I tracked it to the err_setg() call in ram_block_add_cpr_blocker().
>>>>>> It looks like the error message is unable to be printed because
>>>>>> rb->cpr_blocker is NULL.
>>>>>>
>>>>>> Adding aux-ram-share=on to the -machine object gets me past the 
>>>>>> error and
>>>>>> therefore the assertion, but isn't that an incompatible change to 
>>>>>> how an
>>>>>> SNP guest has to be started?
>>>>>
>>>>> If I update the err_setg() call to use the errp parameter that is 
>>>>> passed
>>>>> into ram_block_add_cpr_blocker(), I get the following message and then
>>>>> the guest launch terminates:
>>>>>
>>>
>>> The usage at ram_block_add_cpr_blocker() is correct, the cpr_blocker
>>> gets initialized and registered as a migration blocker. The errp only
>>> becomes relevant later when migration starts and the error condition is
>>> met.
>>>
>>>>> qemu-system-x86_64: Memory region pc.bios is not compatible with CPR.
>>>>> share=on is required for memory-backend objects, and aux-ram- 
>>>>> share=on is
>>>>> required.
>>>
>>> Since errp is an &error_fatal, it causes QEMU to exit, so this^ error
>>> message is bogus.
>>>
>>>>>
>>>>> The qemu parameters I used prior to this patch that allowed an SNP 
>>>>> guest
>>>>> to launch were:
>>>>>
>>>>> -machine type=q35,confidential-guest-support=sev0,memory-backend=ram1
>>>>> -object memory-backend- 
>>>>> memfd,id=ram1,size=16G,share=true,prealloc=false
>>>>>
>>>>> With these parameters after this patch, the launch fails.
>>>>
>>>> I think it might be failing because the caller of
>>>> ram_block_add_cpr_blocker() is passing in &error_abort, but if the
>>>> error_setg() is call on a properly initialized cpr_blocker value then
>>>> SNP is still able to boot for me.
>>>> I'm not sure where the best spot is
>>>> to initialize cpr_blocker, it probably needs to be done before either
>>>> ram_block_add_cpr_blocker() or ram_block_del_cpr_blocker() are 
>>>> callable,
>>>> but the following avoids the reported crash at least:
>>>>
>>>> diff --git a/system/physmem.c b/system/physmem.c
>>>> index 44dd129662..bff0fdcaac 100644
>>>> --- a/system/physmem.c
>>>> +++ b/system/physmem.c
>>>> @@ -4176,6 +4176,7 @@ void ram_block_add_cpr_blocker(RAMBlock *rb, 
>>>> Error **errp)
>>>>           return;
>>>>       }
>>>>
>>>> +    rb->cpr_blocker = NULL;
>>>
>>> Could it be the cpr_blocker already got set at ram_block_add() in the
>>> RAM_GUEST_MEMFD path?
>>
>> That seems to be the case: in some cases ram_block_add() sets cpr_blocker
>> when (new_block->flags & RAM_GUEST_MEMFD) is true, and then soon after
>> when ram_block_add_cpr_blocker() is called on the same RAMBlock:
>>
>>    2025-03-26T21:08:15.092427Z qemu-system-x86_64: warning: 
>> ram_block_add: new_block 0x55c247e4c880 new_block->cpr_blocker (nil) 
>> name ram1
>>    2025-03-26T21:08:15.124710Z qemu-system-x86_64: warning: 
>> ram_block_add: new_block 0x55c2480fde00 new_block->cpr_blocker (nil) 
>> name pc.bios
>>    2025-03-26T21:08:15.126190Z qemu-system-x86_64: warning: 
>> ram_block_add_cpr_blocker: rb 0x55c2480fde00 rb->cpr_blocker 
>> 0x55c2480fe050 name pc.bios
>>    2025-03-26T21:08:15.138582Z qemu-system-x86_64: warning: 
>> ram_block_add: new_block 0x55c247e3c1e0 new_block->cpr_blocker (nil) 
>> name pc.rom
>>    2025-03-26T21:08:15.138938Z qemu-system-x86_64: warning: 
>> ram_block_add_cpr_blocker: rb 0x55c247e3c1e0 rb->cpr_blocker 
>> 0x55c247e3c890 name pc.rom
>>    2025-03-26T21:08:16.185577Z qemu-system-x86_64: warning: 
>> ram_block_add_cpr_blocker: rb 0x55c248db9200 rb->cpr_blocker (nil) 
>> name /rom@etc/acpi/tables
>>    2025-03-26T21:08:16.187140Z qemu-system-x86_64: warning: 
>> ram_block_add_cpr_blocker: rb 0x55c248085620 rb->cpr_blocker (nil) 
>> name /rom@etc/table-loader
>>    2025-03-26T21:08:16.188029Z qemu-system-x86_64: warning: 
>> ram_block_add_cpr_blocker: rb 0x55c2480ce220 rb->cpr_blocker (nil) 
>> name /rom@etc/acpi/rsd
> 
> Thanks everyone for debugging this.  To summarize, 
> ram_block_add_cpr_blocker already blocks
> guest_memfd, because rb->fd < 0.  The fix is to delete this redundant 
> code in ram_block_add:
> 
>          error_setg(&new_block->cpr_blocker,
>                     "Memory region %s uses guest_memfd, "
>                     "which is not supported with CPR.",
>                     memory_region_name(new_block->mr));
>          migrate_add_blocker_modes(&new_block->cpr_blocker, errp,
>                                    MIG_MODE_CPR_TRANSFER, 
> MIG_MODE_CPR_EXEC,
>                                    -1);

I just encountered the same issue with TDX guest, after rebasing TDX 
code to 10.0.0-rc0.

thank you all for the reporting and quick solution for it.

> I will submit a fix (unless Tom or Michael would prefer to author it).
> 
> - Steve
> 



  parent reply	other threads:[~2025-03-28  7:05 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-07 18:15 [PULL 0/8] Migration patches for 2025-03-07 Fabiano Rosas
2025-03-07 18:15 ` [PULL 1/8] migration: Fix UAF for incoming migration on MigrationState Fabiano Rosas
2025-03-07 18:15 ` [PULL 2/8] migration: ram block cpr blockers Fabiano Rosas
2025-03-26 18:46   ` Tom Lendacky
2025-03-26 19:21     ` Tom Lendacky
2025-03-26 19:50       ` Michael Roth
2025-03-26 20:13         ` Fabiano Rosas
2025-03-26 21:34           ` Michael Roth
2025-03-27 12:27             ` Steven Sistare
2025-03-27 13:42               ` Tom Lendacky
2025-03-28  7:05               ` Xiaoyao Li [this message]
2025-03-07 18:15 ` [PULL 3/8] migration: Prioritize RDMA in ram_save_target_page() Fabiano Rosas
2025-03-07 18:15 ` [PULL 4/8] migration: check RDMA and capabilities are compatible on both sides Fabiano Rosas
2025-03-07 18:15 ` [PULL 5/8] migration: disable RDMA + postcopy-ram Fabiano Rosas
2025-03-07 18:15 ` [PULL 6/8] migration/rdma: Remove redundant migration_in_postcopy checks Fabiano Rosas
2025-03-07 18:15 ` [PULL 7/8] migration: Unfold control_save_page() Fabiano Rosas
2025-03-07 18:15 ` [PULL 8/8] migration: Add qtest for migration over RDMA Fabiano Rosas
2025-03-08  6:00   ` Philippe Mathieu-Daudé
2025-03-08  8:42     ` Stefan Hajnoczi
2025-03-10  8:33       ` Zhijian Li (Fujitsu) via
2025-03-10 14:36         ` Peter Xu
2025-03-11  2:06           ` Zhijian Li (Fujitsu) via
2025-03-10  8:01     ` Zhijian Li (Fujitsu) via
2025-03-10 15:00       ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b4d0c5f-eb9e-4fc1-bc85-c08b1e674586@intel.com \
    --to=xiaoyao.li@intel.com \
    --cc=david@redhat.com \
    --cc=farosas@suse.de \
    --cc=michael.roth@amd.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=steven.sistare@oracle.com \
    --cc=thomas.lendacky@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).