* [PATCH] KVM, CPU hotplug: Avoid wraparound in pvclock_get_nsec_offset
From: Vasilis Liaskovitis @ 2011-12-12 13:37 UTC
To: kvm; +Cc: jan.kiszka, glommer, kraxel, Vasilis Liaskovitis
Hotplugging a vCPU with kvmclock enabled can cause a guest stall/hang. When
the stall happens, pvclock_clocksource_read() is called for the new vCPU and
pvclock_get_nsec_offset() calculates native_read_tsc() - shadow->tsc_timestamp.
shadow->tsc_timestamp contains a value larger than native_read_tsc(), so the
unsigned subtraction wraps around to a very large 64-bit value. The global
last_value variable gets updated with this, causing system stall/freeze:
"rcu_sched_state detected stalls on CPUs/tasks ..."
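The wraparound and its effect on a monotonic clock can be reproduced in
userspace. The following is only a minimal sketch, not the kernel code:
tsc_delta(), monotonic_filter() and wraparound_example() are hypothetical
names, the filter is a single-threaded simplification, and the TSC values
are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same unsigned subtraction as the unpatched pvclock_get_nsec_offset(). */
static uint64_t tsc_delta(uint64_t tsc_now, uint64_t tsc_timestamp)
{
	/* Wraps modulo 2^64 when tsc_now is smaller than tsc_timestamp. */
	return tsc_now - tsc_timestamp;
}

/*
 * Hypothetical, single-threaded stand-in for a monotonicity filter:
 * never hand out a value below the largest one returned so far.
 */
static uint64_t monotonic_filter(uint64_t *last_value, uint64_t now)
{
	if (now < *last_value)
		return *last_value;
	*last_value = now;
	return now;
}

/* Example: one wrapped reading pins every later reading. */
static void wraparound_example(void)
{
	uint64_t last_value = 0;
	uint64_t bogus = tsc_delta(1000, 5000000);

	/* 1000 - 5000000 wraps to 2^64 - 4999000. */
	assert(bogus == UINT64_MAX - 4998999ULL);
	assert(monotonic_filter(&last_value, bogus) == bogus);
	/* A sane later reading is now ignored in favour of the bogus one. */
	assert(monotonic_filter(&last_value, 2000) == bogus);
}
```

Under these assumptions, a single wrapped delta is enough to make every
subsequent read return the same huge value, which is consistent with the
stall described above.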
The large shadow->tsc_timestamp value observed in the hung cases is the tsc
written into the "boot clock" on VM startup.
Is the "boot clock" persistent in the guest? Can it get accessed by a vCPU
other than vCPU 0, if its own hv_clock struct has not yet been registered
or if the host has not yet updated the new hv_clock with a valid tsc_timestamp
in kvm_guest_time_update() ?
Fix temporarily by returning a zero offset if the delta in
pvclock_get_nsec_offset() is negative.
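As a rough illustration of the workaround, here is a standalone userspace
sketch of a clamped offset calculation. It is an assumption-laden
approximation, not the patch itself: scale_delta() and
nsec_offset_clamped() are hypothetical names, the scaling treats mul_frac
as a 32.32 fixed-point factor, and shift is assumed to stay well below 64
in magnitude.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a cycle count to nanoseconds: apply the binary shift, then
 * multiply by a 32.32 fixed-point fraction. The 64x32 multiply is split
 * into two halves so that no 128-bit type is needed.
 */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	uint64_t hi, lo;

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	hi = delta >> 32;
	lo = delta & 0xffffffffULL;
	return hi * mul_frac + ((lo * mul_frac) >> 32);
}

/* Return 0 instead of a wrapped value when the TSC is behind the timestamp. */
static uint64_t nsec_offset_clamped(uint64_t tsc_now, uint64_t tsc_timestamp,
				    uint32_t mul_frac, int shift)
{
	if (tsc_now <= tsc_timestamp)
		return 0;

	return scale_delta(tsc_now - tsc_timestamp, mul_frac, shift);
}

/* Example with mul_frac = 0x80000000, i.e. a factor of 0.5. */
static void clamp_example(void)
{
	/* TSC behind the timestamp: clamped to zero. */
	assert(nsec_offset_clamped(1000, 5000000, 0x80000000u, 0) == 0);
	/* Normal case: 4999000 cycles * 0.5 = 2499500 ns. */
	assert(nsec_offset_clamped(5000000, 1000, 0x80000000u, 0) == 2499500);
}
```

Clamping only hides the symptom: the reading is still based on a
timestamp that does not belong to the vCPU doing the read.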
Tested on 3.0.6 guest kernel. Testing this patch requires qemu-kvm from:
git://git.kiszka.org/qemu-kvm.git queues/cpu-hotplug
---
arch/x86/kernel/pvclock.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 42eb330..9d31144 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -43,9 +43,14 @@ void pvclock_set_flags(u8 flags)
static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
{
- u64 delta = native_read_tsc() - shadow->tsc_timestamp;
- return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
- shadow->tsc_shift);
+ u64 current_read_tsc = native_read_tsc();
+ if (current_read_tsc > shadow->tsc_timestamp) {
+ u64 delta = current_read_tsc - shadow->tsc_timestamp;
+ return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
+ shadow->tsc_shift);
+ }
+ /* tsc value can be smaller than tsc_timestamp on a vCPU hotplug */
+ else return 0;
}
/*
--
1.7.7.3
* Re: [PATCH] KVM, CPU hotplug: Avoid wraparound in pvclock_get_nsec_offset
From: Jan Kiszka @ 2011-12-12 13:53 UTC
To: Vasilis Liaskovitis
Cc: kvm@vger.kernel.org, Glauber Costa, kraxel@redhat.com,
Zachary Amsden
On 2011-12-12 14:37, Vasilis Liaskovitis wrote:
> Hotplugging a vCPU with kvmclock enabled can cause a guest stall/hang. When
> the stall happens, pvclock_clocksource_read() is called for the new vCPU and
> pvclock_get_nsec_offset calculates native_read_tsc() - shadow->tsc_timestamp.
> shadow->tsc_timestamp contains a value larger than native_read_tsc(), so the
> result is a very large 64-bit unsigned value. The global tsc variable
> last_value gets updated with this, causing system stall/freeze:
> "rcu_sched_state detected stalls on CPUs/tasks ..."
>
> The large shadow->tsc_timestamp value observed in the hung cases is the tsc
> written into the "boot clock" on VM startup.
> Is the "boot clock" persistent in the guest? Can it get accessed by a vCPU
> other than vCPU 0, if its own hv_clock struct has not yet been registered
> or if the host has not yet updated the new hv_clock with a valid tsc_timestamp
> in kvm_guest_time_update() ?
>
> Fix temporarily by returning a zero offset if the delta in
> pvclock_get_nsec_offset() is negative.
>
> Tested on 3.0.6 guest kernel. Testing this patch requires qemu-kvm from:
> git://git.kiszka.org/qemu-kvm.git queues/cpu-hotplug
>
Fixing up Glommer's address (in case he has time) and adding Zach to CC.
> ---
> arch/x86/kernel/pvclock.c | 11 ++++++++---
> 1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..9d31144 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -43,9 +43,14 @@ void pvclock_set_flags(u8 flags)
>
> static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
> {
> - u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> - return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> - shadow->tsc_shift);
> + u64 current_read_tsc = native_read_tsc();
> + if (current_read_tsc > shadow->tsc_timestamp) {
> + u64 delta = current_read_tsc - shadow->tsc_timestamp;
> + return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> + shadow->tsc_shift);
> + }
> + /* tsc value can be smaller than tsc_timestamp on a vCPU hotplug */
> + else return 0;
> }
>
> /*
Can't comment on the semantics, but your patch is whitespace damaged and
doesn't follow kernel coding style. But I assume it's not for
application yet, right?
It would be great if we found a fix for the kvmclock hotplug issue. There are
some good patches on the way to finally make this a proper upstream feature.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [PATCH] KVM, CPU hotplug: Avoid wraparound in pvclock_get_nsec_offset
From: Vasilis Liaskovitis @ 2011-12-12 14:59 UTC
To: Jan Kiszka
Cc: kvm@vger.kernel.org, Glauber Costa, kraxel@redhat.com,
Zachary Amsden
On Mon, Dec 12, 2011 at 02:53:29PM +0100, Jan Kiszka wrote:
>
> Can't comment on the semantics, but your patch is whitespace damaged and
> doesn't follow kernel coding style. But I assume it's not for
> application yet, right?
Right. It fixes the hang for me, but I am not sure it's the best solution. If
it is, I'll resend it properly.
thanks,
- Vasilis
* Re: [PATCH] KVM, CPU hotplug: Avoid wraparound in pvclock_get_nsec_offset
From: Marcelo Tosatti @ 2011-12-16 15:27 UTC
To: Vasilis Liaskovitis; +Cc: kvm, jan.kiszka, glommer, kraxel
On Mon, Dec 12, 2011 at 02:37:15PM +0100, Vasilis Liaskovitis wrote:
> Hotplugging a vCPU with kvmclock enabled can cause a guest stall/hang. When
> the stall happens, pvclock_clocksource_read() is called for the new vCPU and
> pvclock_get_nsec_offset calculates native_read_tsc() - shadow->tsc_timestamp.
> shadow->tsc_timestamp contains a value larger than native_read_tsc(), so the
> result is a very large 64-bit unsigned value. The global tsc variable
> last_value gets updated with this, causing system stall/freeze:
> "rcu_sched_state detected stalls on CPUs/tasks ..."
>
> The large shadow->tsc_timestamp value observed in the hung cases is the tsc
> written into the "boot clock" on VM startup.
> Is the "boot clock" persistent in the guest? Can it get accessed by a vCPU
> other than vCPU 0, if its own hv_clock struct has not yet been registered
> or if the host has not yet updated the new hv_clock with a valid tsc_timestamp
> in kvm_guest_time_update() ?
When a CPU is hotplugged it'll have its TSC start counting at 0.
We should cope with that fact and fix this bug in the boot clock handling.
From the guest's perspective, shadow->tsc_timestamp should be updated to
reflect the current vcpu (which is not the case when it's reading the
value from the boot clock).
That said, I am not sure what the best path to fix this is, but the
workaround below is ugly.
>
> Fix temporarily by returning a zero offset if the delta in
> pvclock_get_nsec_offset() is negative.
>
> Tested on 3.0.6 guest kernel. Testing this patch requires qemu-kvm from:
> git://git.kiszka.org/qemu-kvm.git queues/cpu-hotplug
>
> ---
> arch/x86/kernel/pvclock.c | 11 ++++++++---
> 1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..9d31144 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -43,9 +43,14 @@ void pvclock_set_flags(u8 flags)
>
> static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
> {
> - u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> - return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> - shadow->tsc_shift);
> + u64 current_read_tsc = native_read_tsc();
> + if (current_read_tsc > shadow->tsc_timestamp) {
> + u64 delta = current_read_tsc - shadow->tsc_timestamp;
> + return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> + shadow->tsc_shift);
> + }
> + /* tsc value can be smaller than tsc_timestamp on a vCPU hotplug */
> + else return 0;
> }
>
> /*
> --
> 1.7.7.3