From: bugzilla-daemon@kernel.org
To: kvm@vger.kernel.org
Subject: [Bug 217423] TSC synchronization issue in VM restore
Date: Fri, 23 Jun 2023 14:41:02 +0000
Message-ID: <bug-217423-28872-SViA10PlQq@https.bugzilla.kernel.org/>
In-Reply-To: <bug-217423-28872@https.bugzilla.kernel.org/>
https://bugzilla.kernel.org/show_bug.cgi?id=217423
--- Comment #4 from robert.hoo.linux@gmail.com ---
On 5/18/2023 8:48 PM, bugzilla-daemon@kernel.org wrote:
[...]
(Sorry for the late response)
>> Can elaborate more on this hrtimer issue/code path?
>
> Below are the steps in detail, I traced them via bpftrace, to simplify the
> analysis, the preemption timer on host is disabled, guest is running with
> TSC timer deadline mode.
>
> TSC changes before save VM:
> 1 create VM/VCPU, guest TSC start from 0 (VCPU initial value)
>       host_tsc0 = 0 + offset0
> 2 pause VM after guest start finished (about 200ms)
>       host_tsc1 = guest_tsc1 + offset0
>       guest_tsc1_deadline = guest_tsc1 + expire1
> 3 save VM state
>       save guest_tsc1 by reading MSR_IA32_TSC
>       save guest_tsc1_deadline by reading MSR_IA32_TSC_DEADLINE
>
> TSC changes in restore VM (to simplify the analysis, step 4
> and step 5 ignore the host TSC changes in restore process):
> 4 create VM/VCPU, guest TSC start from 0 (VCPU initial value)
>       host_tsc3 = 0 + offset1
> 5 restore VM state
>       set MSR_IA32_TSC by guest_tsc1
>       set MSR_IA32_TSC_DEADLINE by guest_tsc1_deadline
> 6 start VM
>       VCPU_RUN
>
> In step 5 setting MSR_IA32_TSC, because the guest_tsc1 is within 1 second,
> KVM will take this update as TSC synchronize, then skip update offset1.
> This means the guest TSC is still at 0 (initialize value).
IIUC, the offset is always updated here regardless of whether synchronizing is
true or false, i.e. __kvm_synchronize_tsc() is always called. Only the offset
value differs.
I guess your environment has a stable TSC, so offset = kvm->arch.cur_tsc_offset,
which is 0. That is to say, the heuristic in the current code does not account
for the elapsed time; that's the culprit.
static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
{
	...
	offset = kvm_compute_l1_tsc_offset(vcpu, data);
	...
	/*
	 * For a reliable TSC, we can match TSC offsets, and for an unstable
	 * TSC, we add elapsed time in this computation.  We could let the
	 * compensation code attempt to catch up if we fall behind, but
	 * it's better to try to match offsets from the beginning.
	 */
	if (synchronizing &&
	    vcpu->arch.virtual_tsc_khz == kvm->arch.last_tsc_khz) {
		if (!kvm_check_tsc_unstable()) {
			offset = kvm->arch.cur_tsc_offset;
		} else {
			u64 delta = nsec_to_cycles(vcpu, elapsed);
			data += delta;
			offset = kvm_compute_l1_tsc_offset(vcpu, data);
		}
		matched = true;
	}
	__kvm_synchronize_tsc(vcpu, offset, data, ns, matched);
	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
}
>> An alternative, I think, is to bypass this directly write IA32_MSR_TSC way
>> to set/sync TSC offsets, but follow new approach introduced in your VMM by
>>
>> commit 828ca89628bfcb1b8f27535025f69dd00eb55207
>> Author: Oliver Upton <oliver.upton@linux.dev>
>> Date: Thu Sep 16 18:15:38 2021 +0000
>>
>> KVM: x86: Expose TSC offset controls to userspace
>>
>> ...
>>
>> Documentation/virt/kvm/devices/vcpu.rst:
>>
>> 4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
>>
>> :Parameters: 64-bit unsigned TSC offset
>>
>> ...
>>
>> Specifies the guest's TSC offset relative to the host's TSC. The guest's
>> TSC is then derived by the following equation:
>>
>> guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
>>
>> The following describes a possible algorithm to use for this purpose
>> ...
>
> "TSC counts the time during which the VM was paused.", This new feature works
> for live migration. But if we save/restore VM with snapshot, the TSC should be
> paused either?
>
I'm not sure how the host's TSC behaves when the host is, say, suspended or
hibernated. VM save/restore could follow that behavior.
But the key point of this new approach is to use an OFFSET rather than a direct
TSC value. This is similar to why x86 TSC_ADJUST was introduced, and it is the
preferred way.
With this new interface,
"... Ensure that the KVM_CLOCK_REALTIME flag is set in the provided structure.
KVM will advance the VM's kvmclock to account for elapsed time since recording
the clock values."
Therefore I think it can solve your problem, rather than modifying the old
heuristic code, which is high risk.
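
As a rough illustration of the offset-based approach for the snapshot case,
the sketch below only does the arithmetic implied by
guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET. The helper names are hypothetical,
it assumes the VMM wants the guest to resume exactly at the saved guest_tsc1
(i.e. TSC "paused" while the snapshot was stored), and it ignores the host TSC
cycles that pass between reading the host TSC and applying the offset. Writing
the result through the KVM_VCPU_TSC_OFFSET attribute is left out.

```c
#include <stdint.h>

/* guest_tsc = host_tsc + offset (mod 2^64), per the quoted documentation. */
static uint64_t guest_tsc_from_offset(uint64_t host_tsc, uint64_t offset)
{
	return host_tsc + offset;
}

/*
 * Hypothetical helper: offset that makes the guest TSC read
 * 'saved_guest_tsc' at the moment the host TSC reads 'host_tsc_now'.
 * Unsigned wrap-around is intentional, since the host TSC is usually
 * far ahead of a freshly restored guest TSC.
 */
static uint64_t offset_to_resume_at(uint64_t saved_guest_tsc,
				    uint64_t host_tsc_now)
{
	return saved_guest_tsc - host_tsc_now;
}
```

Because the offset is explicit, no "is this write a synchronization attempt"
guess is involved: the guest TSC after restore is whatever the VMM computed,
and it advances together with the host TSC from that point on.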
--
You may reply to this email to add a comment.
You are receiving this mail because:
You are watching the assignee of the bug.