* Steps towards live migration
From: Jakub Růžička @ 2025-06-27 13:45 UTC
To: coconut-svsm; +Cc: pankaj.gupta, thomas.lendacky, sgarzare
Hi,
as mentioned during the last SVSM development talk, I am currently writing my
thesis on live migration of confidential guests. With this email I would like to
start a discussion so that the same problem is not addressed multiple times
independently. I also welcome any feedback that would make the code useful for
COCONUT-SVSM (not only for the thesis). Development is done on a machine with
SEV-SNP support. I opened a pull request with the SVSM patches[1] to ease the
discussion.
Here is a summary of progress to date:
All validated guest pages can be transferred from the source SVSM to the target
SVSM. Currently, no packaging (no confidentiality) is performed, but a hash of
all transferred/received blocks is computed that can be compared to verify that
the channel is working correctly. Also, the main migration function of the
source and destination SVSM is busy waiting for a signal from QEMU to start
outbound or incoming migration.
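To illustrate the integrity check described above, here is a minimal sketch of a running digest that both ends could update once per block and compare at the end. It is not taken from the patches in [1]: the `TransferDigest` type is hypothetical, and FNV-1a is used only as a dependency-free stand-in for whatever hash the SVSM code actually computes (FNV-1a is not cryptographic).

```rust
/// Running digest over all transferred blocks (hypothetical type, not from [1]).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TransferDigest {
    state: u64,
    blocks: u64,
}

impl TransferDigest {
    // Standard 64-bit FNV-1a parameters; a stand-in for the real hash.
    const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

    pub fn new() -> Self {
        Self { state: Self::FNV_OFFSET, blocks: 0 }
    }

    /// Fold one transferred (or received) block into the digest.
    pub fn update(&mut self, block: &[u8]) {
        for &byte in block {
            self.state ^= u64::from(byte);
            self.state = self.state.wrapping_mul(Self::FNV_PRIME);
        }
        // Counting blocks also catches a dropped or duplicated block of zeros.
        self.blocks += 1;
    }
}
```

The source and destination would each keep one `TransferDigest`, call `update` for every block, and compare the two values once the transfer is complete.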
A single shared page called MigrationPage is used for data transfer and
communication with QEMU. The migration page contains two registers and a buffer:
a status register, a data register, and a data buffer. The status register is
used to signal a change in status (e.g., migration starts, migration is
complete). The data register is used to signal that a new page has been prepared
in the data buffer by the provider or processed by the consumer. The roles of
provider and consumer are switched between SVSM and QEMU on the source and
destination machines.
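A minimal sketch of the layout and handshake described above, under stated assumptions: the field names, the register values, and the choice of sizing the buffer so that everything fits in one 4 KiB page are all hypothetical and not taken from the patches in [1] or [2]. The sketch is single-owner (`&mut self`) for brevity; a page that is really shared with the host would need volatile or otherwise carefully ordered accesses.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Hypothetical status register values.
pub const STATUS_IDLE: u32 = 0;
pub const STATUS_MIGRATION_STARTED: u32 = 1;
pub const STATUS_MIGRATION_COMPLETE: u32 = 2;

// Hypothetical data register values.
pub const DATA_EMPTY: u32 = 0;
pub const DATA_READY: u32 = 1;

/// Assumption: two 4-byte registers plus the buffer fill exactly one 4 KiB page.
pub const BUF_SIZE: usize = 4096 - 8;

#[repr(C)]
pub struct MigrationPage {
    pub status: AtomicU32,
    pub data: AtomicU32,
    pub buffer: [u8; BUF_SIZE],
}

impl MigrationPage {
    pub fn new() -> Self {
        Self {
            status: AtomicU32::new(STATUS_IDLE),
            data: AtomicU32::new(DATA_EMPTY),
            buffer: [0; BUF_SIZE],
        }
    }

    /// Provider side: place a block in the buffer and flag it as ready.
    /// Returns false if the previous block has not been consumed yet.
    pub fn publish(&mut self, block: &[u8; BUF_SIZE]) -> bool {
        if self.data.load(Ordering::Acquire) != DATA_EMPTY {
            return false;
        }
        self.buffer.copy_from_slice(block);
        self.data.store(DATA_READY, Ordering::Release);
        true
    }

    /// Consumer side: take the block if one is ready and flag the buffer as free.
    pub fn consume(&mut self) -> Option<[u8; BUF_SIZE]> {
        if self.data.load(Ordering::Acquire) != DATA_READY {
            return None;
        }
        let block = self.buffer;
        self.data.store(DATA_EMPTY, Ordering::Release);
        Some(block)
    }
}
```

On the source, the SVSM would call `publish` and QEMU `consume`; on the destination the roles are swapped, which matches the provider/consumer switch mentioned above.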
The QEMU patch[2] implements communication with the SVSM migration handler and
block transfer from source to destination. The idea is that creating the
communication channel is all the hypervisor should do; the rest should be done
in the SVSM.
Current plan for the future in order of realisation:
(1) A function that puts all hosted vCPUs (except the migration handler) into a
spinning state. The hypervisor is not trusted, so the SVSM must be able to
ensure that it is not running any vCPUs. For this task, I consider two-phase
checkpointing[3].
(2) Migrate the machine from the source to the destination with all vCPUs
stopped by the function from (1).
(3) Secret key establishment - I thought about using [5].
(4) Package the pages - authenticated encryption using [4].
(5) Dirty page tracking.
(6) Move the SVSM migration handler to an extra vCPU.
(7) Start the migration handler on a signal instead of in a busy-waiting loop.
Best regards,
Jakub
[1] https://github.com/coconut-svsm/svsm/pull/745
[2] https://github.com/coconut-svsm/qemu/pull/23
[3] https://ipads.se.sjtu.edu.cn/_media/publications/sgxmigration-dsn17.pdf
[4] https://github.com/RustCrypto/AEADs/blob/master/aes-gcm/src/lib.rs
[5] https://github.com/nihalpasham/static-dh-ecdh
* Re: Steps towards live migration
From: Gupta, Pankaj @ 2025-06-30 12:09 UTC
To: Jakub Růžička, coconut-svsm
Cc: thomas.lendacky, sgarzare, John Allen, Joerg Roedel
+CC [John & Joerg]
Hi Jakub,
I briefly looked at your code. I have some questions below to better understand
your design. Do you have a comprehensive design document somewhere?
> Hi,
>
> as mentioned during the last SVSM development talk, I am currently writing my
> thesis on live migration of confidential guests. With this email I would like to
> start a discussion so that the same problem is not addressed multiple times
Of course. That's the reason we shared our design in the KVM Forum talk
and asked for collaboration on common components (open, vendor-agnostic
problems):
https://kvm-forum.qemu.org/2024/SNP_Live_Migration_KVM_forum_2024_svDwxa3.pdf
> independently. I also welcome any feedback to make the code written useful for
> SVSM Coconut (not only for the thesis). Development is done on machine with
> SEV-SNP support. I made a pull request with the SVSM patches[1] to ease the
> discussion.
>
> Here is a summary of progress to date:
>
> All validated guest pages can be transferred from the source SVSM to the target
> SVSM. Currently, no packaging (no confidentiality) is performed, but a hash of
> all transferred/received blocks is computed that can be compared to verify that
> the channel is working correctly. Also, the main migration function of the
Is there any analysis of the algorithm you plan to use for this? IIUC you are
currently sharing the hash from source to destination? Wouldn't
'aes-gcm' in [4] do that in addition to encryption?
> source and destination SVSM is busy waiting for a signal from QEMU to start
> outbound or incoming migration.
> A single shared page called MigrationPage is used for data transfer and
> communication with QEMU. The migration page contains two registers and a buffer:
> a status register, a data register, and a data buffer. The status register is
> used to signal a change in status (e.g., migration starts, migration is
> complete). The data register is used to signal that a new page has been prepared
> in the data buffer by the provider or processed by the consumer. The roles of
> provider and consumer are switched between SVSM and QEMU on the source and
> destination machines.
This looks like a communication channel between QEMU and SVSM. We are
reusing a few bits in the per-CPU GHCB page for SVSM <-> host command
communication.
I understand that you are not using additional vCPUs. All of this can be too
much work for the guest's general-purpose vCPUs.
>
> The QEMU patch[2] implements communication with the SVSM migration handler and
> block transfer from source to destination. The idea is that creating the
> communication channel is all the hypervisor should do, the rest should be done
> in SVSM.
>
> Current plan for the future in order of realisation:
>
> (1) A function that puts all hosted vCPUs (except the migration handler) into a
[...]
> spinning state. The hypervisor is not trusted, so the SVSM must be able to
> ensure that it is not running any vCPUs. For this task, I consider two-phase
> checkpointing[3].
Can you please elaborate on this?
> (2) Migrate the machine from the source to the destination with all vCPUs
> stopped by the function from (1).
You mean the black-out phase?
> (3) Secret key establishment - though about using [5].
We need to tackle this problem. This is a likely collaboration point, possibly
coupled with attestation and migration key sharing.
> (4) Package the pages - authenticated encryption using [4].
> (5) Dirty page tracking.
> (6) Move the SVSM migration handler on an extra vCPU.
> (7) Start migration handler on signal instead of busy-waiting loop.
There are still other open questions at a more granular level. We can discuss
those as well, once I have gone through your complete design.
But at a higher level, I would like to reuse more of the existing
functionality in QEMU for live migration and use the SVSM for memory
packaging, setting guest memory permissions at VMPL0, and live migration
sanity-related tasks.
Best regards,
Pankaj
>
> Best regards,
> Jakub
>
> [1] https://github.com/coconut-svsm/svsm/pull/745
> [2] https://github.com/coconut-svsm/qemu/pull/23
> [3] https://ipads.se.sjtu.edu.cn/_media/publications/sgxmigration-dsn17.pdf
> [4] https://github.com/RustCrypto/AEADs/blob/master/aes-gcm/src/lib.rs
> [5] https://github.com/nihalpasham/static-dh-ecdh
* Re: Steps towards live migration
From: Daniel P. Berrangé @ 2025-07-07 11:24 UTC
To: Jakub Růžička
Cc: coconut-svsm, pankaj.gupta, thomas.lendacky, sgarzare
On Fri, Jun 27, 2025 at 03:45:56PM +0200, Jakub Růžička wrote:
> The QEMU patch[2] implements communication with the SVSM migration handler and
> block transfer from source to destination. The idea is that creating the
> communication channel is all the hypervisor should do, the rest should be done
> in SVSM.
QEMU normally has device state that needs to be preserved across a live
migration. If this isn't done then the target VM devices will all be in
their initial state from after a machine board reset, which won't match
the state the guest OS believes the devices are currently in.
I very much doubt QEMU will want to support a new SNP migration protocol
and monitor commands, as opposed to integrating SNP migration into their
existing protocol & monitor commands.
Also this migration memory transfer code in QEMU is completely single
threaded over a single channel, while QEMU is moving towards expecting
multiple threads and multiple TCP channels as its baseline. Achieving live
migration convergence with only a single thread & TCP channel is often
not practical for heavily loaded VMs. So ideally any comms protocol
for QEMU<->SVSM would be designed to enable QEMU to transfer guest
pages to/from SVSM in parallel across many threads.
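A minimal sketch of the idea of transferring guest pages in parallel, under stated assumptions: worker threads stand in for independent migration streams, an in-process `mpsc` channel stands in for the network, and `TaggedPage` and `transfer_parallel` are hypothetical names. It is not QEMU's migration code and not the protocol from [2]. The point it shows is that each page carries its own index, so the receiver can place pages correctly regardless of arrival order.

```rust
use std::sync::mpsc;
use std::thread;

/// A page plus the index it belongs to (hypothetical wire unit).
pub struct TaggedPage {
    pub index: usize,
    pub data: Vec<u8>,
}

/// Send `pages` through `streams` worker threads and rebuild them on the
/// receiving side. Arrival order is not deterministic; placement is by index.
pub fn transfer_parallel(pages: Vec<Vec<u8>>, streams: usize) -> Vec<Vec<u8>> {
    assert!(streams > 0, "at least one stream is required");
    let total = pages.len();

    // Shard pages round-robin so every stream gets a similar share.
    let mut shards: Vec<Vec<TaggedPage>> = (0..streams).map(|_| Vec::new()).collect();
    for (index, data) in pages.into_iter().enumerate() {
        shards[index % streams].push(TaggedPage { index, data });
    }

    let (tx, rx) = mpsc::channel::<TaggedPage>();
    let workers: Vec<_> = shards
        .into_iter()
        .map(|shard| {
            let tx = tx.clone();
            thread::spawn(move || {
                for page in shard {
                    tx.send(page).expect("receiver dropped");
                }
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends when all workers finish.
    drop(tx);

    let mut received: Vec<Vec<u8>> = vec![Vec::new(); total];
    for page in rx {
        received[page.index] = page.data;
    }
    for worker in workers {
        worker.join().expect("worker panicked");
    }
    received
}
```

A QEMU<->SVSM interface designed along these lines would need one transfer slot per stream rather than a single shared page, so that threads do not serialize on one buffer.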
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: Steps towards live migration
From: Jakub Růžička @ 2025-07-08 13:49 UTC
To: Gupta, Pankaj, coconut-svsm
Cc: thomas.lendacky, sgarzare, John Allen, Joerg Roedel
On Mon Jun 30, 2025 at 2:09 PM CEST, Pankaj Gupta wrote:
> +CC [John & Joerg]
>
> Hi Jacob,
>
> I briefly looked at your code. I have some questions below to understand
> your design more. Do you have a comprehensive design document somewhere?
>
There is no comprehensive design document at the moment.
>> Hi,
>>
>> as mentioned during the last SVSM development talk, I am currently writing my
>> thesis on live migration of confidential guests. With this email I would like to
>> start a discussion so that the same problem is not addressed multiple times
>
> Ofcourse. That's the reason we shared our design in the kvm forum talk
> and asked for collaboration on common components (open vendor agnostic
> problems):
>
> https://kvm-forum.qemu.org/2024/SNP_Live_Migration_KVM_forum_2024_svDwxa3.pdf
>
>
>> independently. I also welcome any feedback to make the code written useful for
>> SVSM Coconut (not only for the thesis). Development is done on machine with
>> SEV-SNP support. I made a pull request with the SVSM patches[1] to ease the
>> discussion.
>>
>> Here is a summary of progress to date:
>>
>> All validated guest pages can be transferred from the source SVSM to the target
>> SVSM. Currently, no packaging (no confidentiality) is performed, but a hash of
>> all transferred/received blocks is computed that can be compared to verify that
>> the channel is working correctly. Also, the main migration function of the
>
> Any analysis on algorithm you plan to use for this? IIUC you are
> currently sharing the hash from source to destination? Wouldn't the
> 'aes-gcm' in [4] will do that in addition to encryption?
Yes, it would, but it was not used when I sent the email.
>> source and destination SVSM is busy waiting for a signal from QEMU to start
>> outbound or incoming migration.
>
>> A single shared page called MigrationPage is used for data transfer and
>> communication with QEMU. The migration page contains two registers and a buffer:
>> a status register, a data register, and a data buffer. The status register is
>> used to signal a change in status (e.g., migration starts, migration is
>> complete). The data register is used to signal that a new page has been prepared
>> in the data buffer by the provider or processed by the consumer. The roles of
>> provider and consumer are switched between SVSM and QEMU on the source and
>> destination machines.
>
> This looks like a communication channel between Qemu and SVSM. We are
> re-using few bits in per CPU ghcb page for SVSM <-> host commands
> communication.
I take it the description of the communication protocol is not yet available to read?
>
> I can understand you are not using additional vCPUs. All this can be too
> much work for guest general purpose vCPUs.
>
>>
>> The QEMU patch[2] implements communication with the SVSM migration handler and
>> block transfer from source to destination. The idea is that creating the
>> communication channel is all the hypervisor should do, the rest should be done
>> in SVSM.
>>
>> Current plan for the future in order of realisation:
>>
>> (1) A function that puts all hosted vCPUs (except the migration handler) into a
>
> [...]
>
>> spinning state. The hypervisor is not trusted, so the SVSM must be able to
>> ensure that it is not running any vCPUs. For this task, I consider two-phase
>> checkpointing[3].
>
> Can you please elaborate more this.
At the beginning of the black-out phase, all guest vCPUs except the migration
handler should be stopped. To ensure that the hypervisor cannot resume any of
the vCPUs, I want all of them to enter a wait loop so that even if the
hypervisor scheduled a vCPU, no guest code would run.
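A minimal sketch of such a wait loop, under stated assumptions: `ParkGate` and its methods are hypothetical names, and this is a plain stop-request/acknowledge gate built on atomics, not the two-phase checkpointing scheme from [3] and not code from [1]. Each vCPU is assumed to call `checkpoint` in the SVSM before it would return to guest code.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Gate that keeps vCPUs spinning inside the SVSM while a stop is requested.
pub struct ParkGate {
    stop_requested: AtomicBool,
    parked: AtomicUsize,
}

impl ParkGate {
    pub fn new() -> Self {
        Self {
            stop_requested: AtomicBool::new(false),
            parked: AtomicUsize::new(0),
        }
    }

    /// Called by every non-migration vCPU before it would resume guest code.
    /// If a stop is requested, the vCPU acknowledges and spins until released,
    /// so scheduling it on the host side does not run any guest code.
    pub fn checkpoint(&self) {
        if !self.stop_requested.load(Ordering::Acquire) {
            return;
        }
        self.parked.fetch_add(1, Ordering::AcqRel);
        while self.stop_requested.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        self.parked.fetch_sub(1, Ordering::AcqRel);
    }

    /// Called by the migration handler: request the stop, then wait until
    /// `vcpus` vCPUs have acknowledged that they are spinning.
    pub fn stop_all(&self, vcpus: usize) {
        self.stop_requested.store(true, Ordering::Release);
        while self.parked.load(Ordering::Acquire) < vcpus {
            std::hint::spin_loop();
        }
    }

    /// Release all parked vCPUs (e.g. if the migration is aborted).
    pub fn resume_all(&self) {
        self.stop_requested.store(false, Ordering::Release);
    }

    /// Number of vCPUs currently spinning in `checkpoint`.
    pub fn parked(&self) -> usize {
        self.parked.load(Ordering::Acquire)
    }
}
```

The migration handler would only start the black-out transfer after `stop_all` returns, because at that point every other vCPU has confirmed from inside the SVSM that it is spinning.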
>> (2) Migrate the machine from the source to the destination with all vCPUs
>> stopped by the function from (1).
>
> You mean black-out phase?
Yes.
>> (3) Secret key establishment - though about using [5].
>
> We need to tackle this problem. A probable collaboration point, maybe
> coupled with attestation and migration key sharing.
>
>> (4) Package the pages - authenticated encryption using [4].
>> (5) Dirty page tracking.
>> (6) Move the SVSM migration handler on an extra vCPU.
>> (7) Start migration handler on signal instead of busy-waiting loop.
>
> There still are other open questions at more granular level. Can discuss
> those as well, once I go through your complete design.
>
> But at higher level, I would like to reuse more of the existing
> functionality in Qemu for live migration and use SVSM for memory
> packaging, guests memory permission setting at VMPL0, and live migration
> sanity related tasks.
By reusing more of the existing functionality, do you mean using the current
migration in QEMU and registering different SaveVMHandlers?
What's the plan for tracking dirty pages? I understood from the slides of the
last KVM Forum that this task should be done in the SVSM. Have you already
implemented this? If so, are you using dirty page tracking in KVM or is it
implemented in the SVSM?
Regards,
Jakub
> Best regards,
> Pankaj
>
>>
>> Best regards,
>> Jakub
>>
>> [1] https://github.com/coconut-svsm/svsm/pull/745
>> [2] https://github.com/coconut-svsm/qemu/pull/23
>> [3] https://ipads.se.sjtu.edu.cn/_media/publications/sgxmigration-dsn17.pdf
>> [4] https://github.com/RustCrypto/AEADs/blob/master/aes-gcm/src/lib.rs
>> [5] https://github.com/nihalpasham/static-dh-ecdh
* Re: Steps towards live migration
From: Joerg Roedel @ 2025-07-09 12:39 UTC
To: Jakub Růžička
Cc: Gupta, Pankaj, coconut-svsm, thomas.lendacky, sgarzare,
John Allen
Hi Jakub,
Thanks a lot for working on this and bringing your results to the COCONUT-SVSM
community. Live migration for confidential VMs is a much wanted feature and any
helping hand to make it happen is appreciated!
On Tue, Jul 08, 2025 at 03:49:48PM +0200, Jakub Růžička wrote:
> There is no comprehensive design document at the moment.
Before any code can be merged, a couple of fundamental questions need to be
answered, e.g.:
* What are the security guarantees for SVSM-based live migration?
* How is continuity in the attestation chain achieved?
* Which platforms are in scope for live migration support?
* Is SVSM live migration code generic or hypervisor-specific?
I think that the discussion should focus on these questions for now, because
they are fundamental and heavily influence what the code and the VMM/SVSM
interface will look like.
Having a design document that tries to answer the above questions would be a
great start for the discussion.
Kind regards,
Joerg