* Trying cpr
@ 2025-04-21 13:38 Dr. David Alan Gilbert
2025-04-21 15:22 ` Dongli Zhang
0 siblings, 1 reply; 3+ messages in thread
From: Dr. David Alan Gilbert @ 2025-04-21 13:38 UTC
To: steven.sistare; +Cc: qemu-devel
Hi Steve,
I've just had a go with cpr-transfer, it's quite interesting.
I was just trying it on my (AMD) desktop.
* I was running with qemu displaying graphics, and after migration
the source display got updated every time I moved my mouse into the
source window; the VM was still stopped, but I guess that means
the source GUI is still parsing the guest VRAM and displaying it.
I'm not sure if there are any other interactions - e.g. is there any
situation where the source GUI will try and write into the shared
guest ram?
* Given that you pass fds over the CPR socket, had you considered
passing the main migration fds over it as well? That way you'd
only need one -incoming.
* The guest noticed the time skew:
timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
'kvm-clock' wd_nsec: 556248511 wd_now: 4a93129e69 wd_last: 4a71eaf0aa mask: (all f's)
'tsc' cs_nsec: 514023131 cs_now: 1047f1d8489 cs_last: 10414538c1 mask: (all f's)
Clocksource 'tsc' skewed -42225380 ns (-42 ms) over watchdog 'kvm-clock' interval of 556248511 ns (556 ms)
'kvm-clock' (not 'tsc') is current clocksource
(That was hand copied, probably with some typos - who knew the
GUI doesn't let you copy/paste from serial0...)
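As a sanity check on the hand-copied log above, here is a minimal Python sketch that recomputes the reported skew. It assumes the skew is simply the tsc interval minus the kvm-clock interval, which is consistent with the numbers shown; the variable names just mirror the log fields.

```python
# Values hand-copied from the guest's watchdog message above.
wd_nsec = 556_248_511  # interval as measured by the watchdog, kvm-clock
cs_nsec = 514_023_131  # the same interval as measured by tsc

# Assumption: skew = clocksource interval - watchdog interval.
skew_ns = cs_nsec - wd_nsec

# Truncate toward zero to get the whole-millisecond figures.
skew_ms = int(skew_ns / 1_000_000)
interval_ms = wd_nsec // 1_000_000

# Relative error of tsc against kvm-clock over this interval.
relative = skew_ns / wd_nsec

print(f"skew: {skew_ns} ns ({skew_ms} ms) over {wd_nsec} ns ({interval_ms} ms)")
print(f"relative skew: {relative:.2%}")
```

The recomputed values match the -42225380 ns (-42 ms) and 556 ms figures in the message, so at least those two decimal fields were probably copied correctly.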
The source commandline was:
./try/qemu-system-x86_64 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/qemuram0,share=on -m 4G -machine memory-backend=ram0,aux-ram-share=on -cpu host --enable-kvm -smp 16 -drive if=virtio,file=/discs/more/images/debian-13-nocloud-amd64-daily.qcow2 -qmp stdio
The dest commandline was:
./try/qemu-system-x86_64 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/qemuram0,share=on -m 4G -machine memory-backend=ram0,aux-ram-share=on -cpu host --enable-kvm -smp 16 -drive if=virtio,file=/discs/more/images/debian-13-nocloud-amd64-daily.qcow2 -incoming tcp:0:44444 -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}'
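The message gives both command lines but not the QMP commands entered on the source's stdio monitor. The Python sketch below prints what they might look like, assuming the "mode" migration parameter and the "channels" argument of "migrate" as described in QEMU's CPR documentation. The addresses simply mirror the two -incoming options above; this is not necessarily what was actually typed.

```python
import json

# Assumed to mirror "-incoming tcp:0:44444" on the destination.
main_channel = {
    "channel-type": "main",
    "addr": {"transport": "socket", "type": "inet",
             "host": "0", "port": "44444"},
}

# Copied from the JSON -incoming option on the destination.
cpr_channel = {
    "channel-type": "cpr",
    "addr": {"transport": "socket", "type": "unix", "path": "cpr.sock"},
}

commands = [
    # QMP requires capability negotiation before any other command.
    {"execute": "qmp_capabilities"},
    {"execute": "migrate-set-parameters",
     "arguments": {"mode": "cpr-transfer"}},
    {"execute": "migrate",
     "arguments": {"channels": [main_channel, cpr_channel]}},
]

# One JSON object per line, ready to paste into a "-qmp stdio" session.
for cmd in commands:
    print(json.dumps(cmd))
```

With cpr-transfer the destination has to be started before the migrate command is issued, since it reads its initial state from the CPR socket.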
Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
* Re: Trying cpr
From: Dongli Zhang @ 2025-04-21 15:22 UTC
To: Dr. David Alan Gilbert, steven.sistare; +Cc: qemu-devel
On 4/21/25 6:38 AM, Dr. David Alan Gilbert wrote:
> Hi Steve,
> I've just had a go with cpr-transfer, it's quite interesting.
> I was just trying it on my (AMD) desktop.
>
> * I was running with qemu displaying graphics, and after migration
> the source display got updated every time I moved my mouse into the
> source window; the VM was still stopped, but I guess that means
> the source GUI is still parsing the guest VRAM and displaying it.
> I'm not sure if there are any other interactions - e.g. is there any
> situation where the source GUI will try and write into the shared
> guest ram?
>
> * Given that you pass fds over the CPR socket, had you considered
> passing the main migration fds over it as well? That way you'd
> only need one -incoming.
>
> * The guest noticed the time skew:
> timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
> 'kvm-clock' wd_nsec: 556248511 wd_now: 4a93129e69 wd_last: 4a71eaf0aa mask: (all f's)
> 'tsc' cs_nsec: 514023131 cs_now: 1047f1d8489 cs_last: 10414538c1 mask: (all f's)
> Clocksource 'tsc' skewed -42225380 ns (-42 ms) over watchdog 'kvm-clock' interval of 556248511 ns (556 ms)
> 'kvm-clock' (not 'tsc') is current clocksource
Here the guest kernel uses kvm-clock to measure the accuracy of tsc.
While there is a chance that the accuracy of tsc is broken, it is more likely
the kvm-clock's accuracy is broken.
That is, even if the TSC is still good enough, it gets marked unstable because the
kernel is using an inaccurate kvm-clock to measure it.
What is the guest kernel version? Does it have the patch below? Or is this an
AMD server (by default X86_FEATURE_CONSTANT_TSC isn't set)?
x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b50db7095fe002fa3e16605546cba66bf1b68a3e
In addition, I assume the cpr-transfer doesn't re-create a new KVM instance (fd).
I used to encounter a similar issue during vCPU hotplug.
KVM: x86: Don't unnecessarily force masterclock update on vCPU hotplug
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c52ffadc65e28ab461fd055e9991e8d8106a0056
David Woodhouse has a patchset related to kvmclock and live migration.
[RFC PATCH v3 00/21] Cleaning up the KVM clock mess
https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org/
Maciej also fixed a similar clock unstable issue.
target/i386: Reset TSCs of parked vCPUs too on VM reset
https://gitlab.com/qemu-project/qemu/-/commit/3f2a05b31ee9ce2ddb6c75a9bc3f5e7f7af9a76f
Dongli Zhang
>
> (That was hand copied, probably with some typos - who knew the
> GUI doesn't let you copy/paste from serial0...)
>
>
> The source commandline was:
> ./try/qemu-system-x86_64 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/qemuram0,share=on -m 4G -machine memory-backend=ram0,aux-ram-share=on -cpu host --enable-kvm -smp 16 -drive if=virtio,file=/discs/more/images/debian-13-nocloud-amd64-daily.qcow2 -qmp stdio
>
> The dest commandline was:
> ./try/qemu-system-x86_64 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/qemuram0,share=on -m 4G -machine memory-backend=ram0,aux-ram-share=on -cpu host --enable-kvm -smp 16 -drive if=virtio,file=/discs/more/images/debian-13-nocloud-amd64-daily.qcow2 -incoming tcp:0:44444 -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}'
>
> Dave
* Re: Trying cpr
From: Dr. David Alan Gilbert @ 2025-04-21 17:07 UTC
To: Dongli Zhang; +Cc: steven.sistare, qemu-devel
* Dongli Zhang (dongli.zhang@oracle.com) wrote:
>
>
> On 4/21/25 6:38 AM, Dr. David Alan Gilbert wrote:
> > Hi Steve,
> > I've just had a go with cpr-transfer, it's quite interesting.
> > I was just trying it on my (AMD) desktop.
> >
> > * I was running with qemu displaying graphics, and after migration
> > the source display got updated every time I moved my mouse into the
> > source window; the VM was still stopped, but I guess that means
> > the source GUI is still parsing the guest VRAM and displaying it.
> > I'm not sure if there are any other interactions - e.g. is there any
> > situation where the source GUI will try and write into the shared
> > guest ram?
> >
> > * Given that you pass fds over the CPR socket, had you considered
> > passing the main migration fds over it as well? That way you'd
> > only need one -incoming.
> >
> > * The guest noticed the time skew:
> > timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
> > 'kvm-clock' wd_nsec: 556248511 wd_now: 4a93129e69 wd_last: 4a71eaf0aa mask: (all f's)
> > 'tsc' cs_nsec: 514023131 cs_now: 1047f1d8489 cs_last: 10414538c1 mask: (all f's)
> > Clocksource 'tsc' skewed -42225380 ns (-42 ms) over watchdog 'kvm-clock' interval of 556248511 ns (556 ms)
> > 'kvm-clock' (not 'tsc') is current clocksource
>
> Here the guest kernel uses kvm-clock to measure the accuracy of tsc.
>
> While there is a chance that the accuracy of tsc is broken, it is more likely
> the kvm-clock's accuracy is broken.
Well, remember that with CPR there's a live migration going on and a brief
pause, so it's always a fun question what happens with clocks; however,
with CPR I guess there's a better chance of it being solvable than with general
live migration.
> That is, even if the TSC is still good enough, it gets marked unstable because the
> kernel is using an inaccurate kvm-clock to measure it.
>
> What is the guest kernel version? Does it have the patch below?
The guest is Debian's 6.12.12, and 'git describe b50db7095' gives
v5.16-rc2-5-gb50db7095fe0
so I'd say yes, it should have that patch.
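The reasoning behind that answer: git describe names a commit relative to the nearest tag reachable from it, so the string says the commit sits 5 commits on top of v5.16-rc2, and a 6.12 guest kernel is far newer. A small Python sketch of the comparison follows. The helper name parse_describe is hypothetical, and it assumes a commit based on a release candidate landed in that release or later, which the describe string alone does not prove.

```python
import re

def parse_describe(text):
    """Split 'vX.Y[-rcN]-<ahead>-g<sha>' into its parts (hypothetical helper)."""
    m = re.fullmatch(r"v(\d+)\.(\d+)(?:-rc(\d+))?-(\d+)-g([0-9a-f]+)", text)
    if m is None:
        raise ValueError(f"unrecognised git describe output: {text!r}")
    major, minor, rc, ahead, sha = m.groups()
    base = (int(major), int(minor))
    return base, (int(rc) if rc else None), int(ahead), sha

base, rc, ahead, sha = parse_describe("v5.16-rc2-5-gb50db7095fe0")

guest = (6, 12)  # Debian's 6.12.12 guest kernel, major.minor only

# Assumption: a commit on top of vX.Y-rcN is included from vX.Y onwards.
likely_included = guest >= base

print(f"base tag v{base[0]}.{base[1]}-rc{rc}, {ahead} commits ahead, sha {sha}")
print("guest likely has the patch:", likely_included)
```

Running 'git tag --contains b50db7095' in a kernel tree would answer the same question directly, without relying on that assumption.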
> Or is this an
> AMD server (by default X86_FEATURE_CONSTANT_TSC isn't set)?
It's an AMD single socket desktop (3950X) [with 6.14.2 kernel]
>
> x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b50db7095fe002fa3e16605546cba66bf1b68a3e
>
> In addition, I assume the cpr-transfer doesn't re-create a new KVM instance (fd).
>
> I used to encounter a similar issue during vCPU hotplug.
>
> KVM: x86: Don't unnecessarily force masterclock update on vCPU hotplug
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c52ffadc65e28ab461fd055e9991e8d8106a0056
>
> David Woodhouse has a patchset related to kvmclock and live migration.
>
> [RFC PATCH v3 00/21] Cleaning up the KVM clock mess
> https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org/
>
> Maciej also fixed a similar clock unstable issue.
>
> target/i386: Reset TSCs of parked vCPUs too on VM reset
> https://gitlab.com/qemu-project/qemu/-/commit/3f2a05b31ee9ce2ddb6c75a9bc3f5e7f7af9a76f
Thanks.
Dave
> Dongli Zhang
>
> >
> > (That was hand copied, probably with some typos - who knew the
> > GUI doesn't let you copy/paste from serial0...)
> >
> >
> > The source commandline was:
> > ./try/qemu-system-x86_64 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/qemuram0,share=on -m 4G -machine memory-backend=ram0,aux-ram-share=on -cpu host --enable-kvm -smp 16 -drive if=virtio,file=/discs/more/images/debian-13-nocloud-amd64-daily.qcow2 -qmp stdio
> >
> > The dest commandline was:
> > ./try/qemu-system-x86_64 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/qemuram0,share=on -m 4G -machine memory-backend=ram0,aux-ram-share=on -cpu host --enable-kvm -smp 16 -drive if=virtio,file=/discs/more/images/debian-13-nocloud-amd64-daily.qcow2 -incoming tcp:0:44444 -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}'
> >
> > Dave
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/