From: "Dr. David Alan Gilbert" <dave@treblig.org>
To: Dongli Zhang <dongli.zhang@oracle.com>
Cc: steven.sistare@oracle.com, qemu-devel@nongnu.org
Subject: Re: Trying cpr
Date: Mon, 21 Apr 2025 17:07:00 +0000
Message-ID: <aAZ7NNxTAn8u8egY@gallifrey>
In-Reply-To: <df432912-de0c-4a77-8008-0c07b23f42f0@oracle.com>
* Dongli Zhang (dongli.zhang@oracle.com) wrote:
>
>
> On 4/21/25 6:38 AM, Dr. David Alan Gilbert wrote:
> > Hi Steve,
> > I've just had a go with cpr-transfer, it's quite interesting.
> > I was just trying it on my (AMD) desktop.
> >
> > * I was running with qemu displaying graphics, and after migration
> > the source display got updated every time I moved my mouse into the
> > source window; the VM was still stopped, but I guess that means
> > the source GUI is still parsing the guest VRAM and displaying it.
> > I'm not sure if there are any other interactions - e.g. is there any
> > situation where the source GUI will try and write into the shared
> > guest ram?
> >
> > * Given that you pass fd's over the CPR socket, had you considered
> > passing the main migration fd's over it as well? That way you'd
> > only need one incoming.
> >
> > * The guest noticed the time skew:
> > timekeeping watchdog on CPU1: Marking clocksource 'tsc' as unstable because the skew is too large:
> > 'kvm-clock' wd_nsec: 556248511 wd_now: 4a93129e69 wd_last: 4a71eaf0aa mask: (all f's)
> > 'tsc' cs_nsec: 514023131 cs_now: 1047f1d8489 cs_last: 10414538c1 mask: (all f's)
> > Clocksource 'tsc' skewed -42225380 ns (-42 ms) over watchdog 'kvm-clock' interval of 556248511 ns (556 ms)
> > 'kvm-clock' (not 'tsc') is current clocksource
>
> Here the guest kernel uses kvm-clock to measure the accuracy of tsc.
>
> While there is a chance that the accuracy of tsc is broken, it is more likely
> the kvm-clock's accuracy is broken.
Well, remember that with CPR there's a live migration going on and a brief
pause, so what happens to the clocks is always a fun question; however,
with CPR I guess there's a better chance of it being solvable than with
general live migration.
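As a rough sanity check of the numbers in the watchdog message quoted above, here is a minimal Python sketch. The variable names are mine, and it assumes kvm-clock counts in nanoseconds, so that the difference between its two hex readings can be compared directly with wd_nsec:

```python
# Values hand-copied from the guest's watchdog message quoted above.
wd_now = 0x4a93129e69    # kvm-clock reading at this watchdog check
wd_last = 0x4a71eaf0aa   # kvm-clock reading at the previous check
wd_nsec = 556248511      # interval as measured by the watchdog (kvm-clock)
cs_nsec = 514023131      # the same interval as measured by the tsc

# Assumption: kvm-clock ticks in nanoseconds, so the raw delta is the interval.
wd_delta = wd_now - wd_last

# The reported skew is the tsc interval minus the watchdog interval.
skew_ns = cs_nsec - wd_nsec
skew_ms = int(skew_ns / 1_000_000)  # truncate toward zero, like the log line

print(wd_delta == wd_nsec)  # True
print(skew_ns, skew_ms)     # -42225380 -42
```

The numbers only say that the two clocks disagree by about 42 ms over that 556 ms window; they don't say which of the two is wrong.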
> That is, even if the TSC is still good enough, it gets marked unstable because the
> kernel uses an inaccurate kvm-clock to measure it.
>
> How about the guest kernel version? Does it have the below patch?
The guest is Debian's 6.12.12 kernel, and 'git describe b50db7095' gives
v5.16-rc2-5-gb50db7095fe0
so I'd say yes, it should have that patch.
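For what it's worth, that reasoning can be spelled out as a small Python sketch. It assumes the usual 'git describe' output shape of <tag>-<commits since tag>-g<abbreviated hash>. The helper names are hypothetical, and comparing version numbers is only a heuristic, not a substitute for checking the guest kernel's actual tree:

```python
import re

def parse_describe(text):
    """Split 'git describe' output into (tag, commits_since_tag, short_hash)."""
    m = re.fullmatch(r"(.+)-(\d+)-g([0-9a-f]+)", text)
    if m is None:
        raise ValueError(f"unexpected describe output: {text!r}")
    return m.group(1), int(m.group(2)), m.group(3)

def major_minor(version):
    """Extract (major, minor) from strings like 'v5.16-rc2' or '6.12.12'."""
    m = re.match(r"v?(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unexpected version string: {version!r}")
    return int(m.group(1)), int(m.group(2))

tag, commits_since, short_hash = parse_describe("v5.16-rc2-5-gb50db7095fe0")

# Heuristic only: a guest kernel at or past the tag's major.minor series
# is assumed to carry a commit that landed shortly after that tag.
likely_has_patch = major_minor("6.12.12") >= major_minor(tag)

print(tag, commits_since, short_hash, likely_has_patch)
# v5.16-rc2 5 b50db7095fe0 True
```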
> Or is this an
> AMD server (by default X86_FEATURE_CONSTANT_TSC isn't set)?
It's an AMD single socket desktop (3950X) [with 6.14.2 kernel]
>
> x86/tsc: Disable clocksource watchdog for TSC on qualified platorms
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b50db7095fe002fa3e16605546cba66bf1b68a3e
>
> In addition, I assume the cpr-transfer doesn't re-create a new KVM instance (fd).
>
> I used to encounter a similar issue during vCPU hotplug.
>
> KVM: x86: Don't unnecessarily force masterclock update on vCPU hotplug
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c52ffadc65e28ab461fd055e9991e8d8106a0056
>
> David Woodhouse has a patchset related to kvmclock and live migration.
>
> [RFC PATCH v3 00/21] Cleaning up the KVM clock mess
> https://lore.kernel.org/all/20240522001817.619072-1-dwmw2@infradead.org/
>
> Maciej also fixed a similar unstable-clock issue.
>
> target/i386: Reset TSCs of parked vCPUs too on VM reset
> https://gitlab.com/qemu-project/qemu/-/commit/3f2a05b31ee9ce2ddb6c75a9bc3f5e7f7af9a76f
Thanks.
Dave
> Dongli Zhang
>
> >
> > (That was hand copied, probably with some typos - who knew the
> > GUI doesn't let you copy/paste from serial0...)
> >
> >
> > The source commandline was:
> > ./try/qemu-system-x86_64 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/qemuram0,share=on -m 4G -machine memory-backend=ram0,aux-ram-share=on -cpu host --enable-kvm -smp 16 -drive if=virtio,file=/discs/more/images/debian-13-nocloud-amd64-daily.qcow2 -qmp stdio
> >
> > The dest commandline was:
> > ./try/qemu-system-x86_64 -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/qemuram0,share=on -m 4G -machine memory-backend=ram0,aux-ram-share=on -cpu host --enable-kvm -smp 16 -drive if=virtio,file=/discs/more/images/debian-13-nocloud-amd64-daily.qcow2 -incoming tcp:0:44444 -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", "path": "cpr.sock"}}'
> >
> > Dave
>
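The JSON form of the second -incoming argument on the dest commandline above is easy to mistype by hand. Here is a minimal Python sketch that builds it with json.dumps and shell-quotes the result. The field values are copied from that commandline, the variable names are mine, and nothing here is a QEMU API:

```python
import json
import shlex

# CPR channel description, copied from the dest commandline above.
cpr_channel = {
    "channel-type": "cpr",
    "addr": {"transport": "socket", "type": "unix", "path": "cpr.sock"},
}

# The two -incoming options: the main migration channel and the CPR channel.
incoming_args = [
    "-incoming", "tcp:0:44444",
    "-incoming", json.dumps(cpr_channel),
]

# shlex.join quotes the JSON so it can be pasted into a shell safely.
print(shlex.join(incoming_args))
```

Generating the argument from a dict keeps the quoting consistent if the socket path changes.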
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/