* paravirt clock stil causing hangs in kvm-65 @ 2008-04-07 11:53 Nikola Ciprich 2008-04-07 21:34 ` Marcelo Tosatti 0 siblings, 1 reply; 15+ messages in thread From: Nikola Ciprich @ 2008-04-07 11:53 UTC (permalink / raw) To: kvm-devel Hi, I also tried paravirt clock again in latest git with kvm-65 patch applied, and problem with cpu-lockups persists: [10813.654806] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0] [10813.655789] CPU 0: [10813.656624] Modules linked in: virtio_pci virtio_ring virtio_blk virtio piix dm_snapshot dm_zero dm_mirror dm_mod ide_disk ide_core sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd [10813.658805] Pid: 0, comm: swapper Not tainted 2.6.25-rc7 #5 [10813.658805] RIP: 0010:[<ffffffff80222ab2>] [<ffffffff80222ab2>] native_safe_halt+0x2/0x10 [10813.658805] RSP: 0018:ffffffff805adf50 EFLAGS: 00000296 [10813.658805] RAX: 000000019b08eeb0 RBX: ffffffff805f5000 RCX: 000000019b08eeb0 [10813.658805] RDX: 0000000000000006 RSI: 00000000356832b0 RDI: ffffffff805adf38 [10813.658805] RBP: 0000000000000da8 R08: 0000000000000000 R09: 0000000000000000 [10813.658805] R10: 0000000000000001 R11: 0000000000000002 R12: ffffffff802228ed [10813.658805] R13: 00000000000132a0 R14: ffffffff80200bba R15: ffff81000100a280 [10813.658805] FS: 0000000000000000(0000) GS:ffffffff80576000(0000) knlGS:0000000000000000 [10813.658805] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b [10813.658805] CR2: 00007fac0f852000 CR3: 0000000000201000 CR4: 00000000000006e0 [10813.658805] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [10813.658805] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [10813.658805] [10813.658805] Call Trace: [10813.658805] [<ffffffff8020a55b>] ? default_idle+0x3b/0x70 [10813.658805] [<ffffffff8020a520>] ? default_idle+0x0/0x70 [10813.658805] [<ffffffff8020a60e>] ? cpu_idle+0x7e/0xe0 [10813.658805] [<ffffffff80211630>] ? pda_init+0x30/0xb0 Can I somehow help to track this one down?? Additionaly, I'd like to ask two questions: 1) why is kvm-65 patch changing release to -rc7? maybe this is unintended?? 2) I'm getting lot's of following messages on host, what do they mean? [16836.605669] Ignoring de-assert INIT to vcpu SOMENUMBER [16836.605687] SIPI to vcpu SOMENUMBER vector 0xSOMENUMBER BR nik ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Register now and save $200. Hurry, offer ends at 11:59 p.m., Monday, April 7! Use priority code J8TLD2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-07 11:53 paravirt clock stil causing hangs in kvm-65 Nikola Ciprich @ 2008-04-07 21:34 ` Marcelo Tosatti 2008-04-08 1:18 ` Marcelo Tosatti 2008-04-19 15:29 ` Marcelo Tosatti 0 siblings, 2 replies; 15+ messages in thread From: Marcelo Tosatti @ 2008-04-07 21:34 UTC (permalink / raw) To: Nikola Ciprich; +Cc: kvm-devel On Mon, Apr 07, 2008 at 01:53:36PM +0200, Nikola Ciprich wrote: > Hi, > > I also tried paravirt clock again in latest git with kvm-65 patch > applied, and problem with cpu-lockups persists: > > [10813.654806] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0] > [10813.655789] CPU 0: > [10813.656624] Modules linked in: virtio_pci virtio_ring virtio_blk virtio > piix dm_snapshot dm_zero dm_mirror dm_mod ide_disk > ide_core sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd > [10813.658805] Pid: 0, comm: swapper Not tainted 2.6.25-rc7 #5 > [10813.658805] RIP: 0010:[<ffffffff80222ab2>] [<ffffffff80222ab2>] > native_safe_halt+0x2/0x10 > [10813.658805] RSP: 0018:ffffffff805adf50 EFLAGS: 00000296 > [10813.658805] RAX: 000000019b08eeb0 RBX: ffffffff805f5000 RCX: > 000000019b08eeb0 > [10813.658805] RDX: 0000000000000006 RSI: 00000000356832b0 RDI: > ffffffff805adf38 > [10813.658805] RBP: 0000000000000da8 R08: 0000000000000000 R09: > 0000000000000000 > [10813.658805] R10: 0000000000000001 R11: 0000000000000002 R12: > ffffffff802228ed > [10813.658805] R13: 00000000000132a0 R14: ffffffff80200bba R15: > ffff81000100a280 > [10813.658805] FS: 0000000000000000(0000) GS:ffffffff80576000(0000) > knlGS:0000000000000000 > [10813.658805] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > [10813.658805] CR2: 00007fac0f852000 CR3: 0000000000201000 CR4: > 00000000000006e0 > [10813.658805] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > 0000000000000000 > [10813.658805] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > 0000000000000400 > [10813.658805] > [10813.658805] Call Trace: > [10813.658805] [<ffffffff8020a55b>] ? default_idle+0x3b/0x70 > [10813.658805] [<ffffffff8020a520>] ? default_idle+0x0/0x70 > [10813.658805] [<ffffffff8020a60e>] ? cpu_idle+0x7e/0xe0 > [10813.658805] [<ffffffff80211630>] ? pda_init+0x30/0xb0 > > Can I somehow help to track this one down?? Hi Nikola, I just reproduced this on a UP guest. Were you seeing the exact same stack trace in the guest with kvm-64 ? > > Additionaly, I'd like to ask two questions: > 1) why is kvm-65 patch changing release to -rc7? maybe this is > unintended?? > > 2) I'm getting lot's of following messages on host, what do they mean? > [16836.605669] Ignoring de-assert INIT to vcpu SOMENUMBER > [16836.605687] SIPI to vcpu SOMENUMBER vector 0xSOMENUMBER > > BR > nik > > > ------------------------------------------------------------------------- > This SF.net email is sponsored by the 2008 JavaOne(SM) Conference > Register now and save $200. Hurry, offer ends at 11:59 p.m., > Monday, April 7! Use priority code J8TLD2. > http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone > _______________________________________________ > kvm-devel mailing list > kvm-devel@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/kvm-devel ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Register now and save $200. Hurry, offer ends at 11:59 p.m., Monday, April 7! Use priority code J8TLD2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-07 21:34 ` Marcelo Tosatti @ 2008-04-08 1:18 ` Marcelo Tosatti 2008-04-08 5:59 ` Nikola Ciprich 2008-04-19 15:29 ` Marcelo Tosatti 1 sibling, 1 reply; 15+ messages in thread From: Marcelo Tosatti @ 2008-04-08 1:18 UTC (permalink / raw) To: Nikola Ciprich, Avi Kivity; +Cc: kvm-devel On Mon, Apr 07, 2008 at 06:34:57PM -0300, Marcelo Tosatti wrote: > On Mon, Apr 07, 2008 at 01:53:36PM +0200, Nikola Ciprich wrote: > > Hi, > > > > I also tried paravirt clock again in latest git with kvm-65 patch > > applied, and problem with cpu-lockups persists: > > > > [10813.654806] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0] > > [10813.655789] CPU 0: > > [10813.656624] Modules linked in: virtio_pci virtio_ring virtio_blk virtio > > piix dm_snapshot dm_zero dm_mirror dm_mod ide_disk > > ide_core sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd > > [10813.658805] Pid: 0, comm: swapper Not tainted 2.6.25-rc7 #5 > > [10813.658805] RIP: 0010:[<ffffffff80222ab2>] [<ffffffff80222ab2>] > > native_safe_halt+0x2/0x10 > > [10813.658805] RSP: 0018:ffffffff805adf50 EFLAGS: 00000296 > > [10813.658805] RAX: 000000019b08eeb0 RBX: ffffffff805f5000 RCX: > > 000000019b08eeb0 > > [10813.658805] RDX: 0000000000000006 RSI: 00000000356832b0 RDI: > > ffffffff805adf38 > > [10813.658805] RBP: 0000000000000da8 R08: 0000000000000000 R09: > > 0000000000000000 > > [10813.658805] R10: 0000000000000001 R11: 0000000000000002 R12: > > ffffffff802228ed > > [10813.658805] R13: 00000000000132a0 R14: ffffffff80200bba R15: > > ffff81000100a280 > > [10813.658805] FS: 0000000000000000(0000) GS:ffffffff80576000(0000) > > knlGS:0000000000000000 > > [10813.658805] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > > [10813.658805] CR2: 00007fac0f852000 CR3: 0000000000201000 CR4: > > 00000000000006e0 > > [10813.658805] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > > 0000000000000000 > > [10813.658805] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > > 0000000000000400 > > [10813.658805] > > [10813.658805] Call Trace: > > [10813.658805] [<ffffffff8020a55b>] ? default_idle+0x3b/0x70 > > [10813.658805] [<ffffffff8020a520>] ? default_idle+0x0/0x70 > > [10813.658805] [<ffffffff8020a60e>] ? cpu_idle+0x7e/0xe0 > > [10813.658805] [<ffffffff80211630>] ? pda_init+0x30/0xb0 > > > > Can I somehow help to track this one down?? > > Hi Nikola, > > I just reproduced this on a UP guest. Were you seeing the exact same > stack trace in the guest with kvm-64 ? I think the logic to wakeup tasks in HLT is racy. Nothing prevents a timer event from being lost if it fires in between guest exit and vcpu_block(). Please try the patch below. > > 2) I'm getting lot's of following messages on host, what do they mean? > > [16836.605669] Ignoring de-assert INIT to vcpu SOMENUMBER > > [16836.605687] SIPI to vcpu SOMENUMBER vector 0xSOMENUMBER This is the APIC SMP initialization protocol. diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index 06a241a..fdd8342 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -199,10 +199,8 @@ int __pit_timer_fn(struct kvm_kpit_state *ps) struct kvm_kpit_timer *pt = &ps->pit_timer; atomic_inc(&pt->pending); - if (vcpu0 && waitqueue_active(&vcpu0->wq)) { - vcpu0->arch.mp_state = VCPU_MP_STATE_RUNNABLE; - wake_up_interruptible(&vcpu0->wq); - } + if (vcpu0) + kvm_wakeup_vcpu(vcpu0, VCPU_MP_STATE_RUNNABLE); pt->timer.expires = ktime_add_ns(pt->timer.expires, pt->period); pt->scheduled = ktime_to_ns(pt->timer.expires); @@ -210,6 +208,16 @@ int __pit_timer_fn(struct kvm_kpit_state *ps) return (pt->period == 0 ? 0 : 1); } +int pit_has_pending_event(struct kvm_vcpu *vcpu) +{ + struct kvm_pit *pit = vcpu->kvm->arch.vpit; + + if (pit && vcpu->vcpu_id == 0) + return atomic_read(&pit->pit_state.pit_timer.pending); + + return 0; +} + static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) { struct kvm_kpit_state *ps; diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c index dbfe21c..18b8a0e 100644 --- a/arch/x86/kvm/irq.c +++ b/arch/x86/kvm/irq.c @@ -26,6 +26,21 @@ #include "i8254.h" /* + * check if there are pending timer events + * to be processed. + */ +int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) +{ + int ret; + + ret = pit_has_pending_event(vcpu); + ret |= apic_has_pending_event(vcpu); + + return ret; +} +EXPORT_SYMBOL(kvm_cpu_has_pending_timer); + +/* * check if there is pending interrupt without * intack. */ diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h index fa5ed5d..ff0ddb5 100644 --- a/arch/x86/kvm/irq.h +++ b/arch/x86/kvm/irq.h @@ -85,4 +85,7 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu); void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu); void __kvm_migrate_apic_timer(struct kvm_vcpu *vcpu); +int pit_has_pending_event(struct kvm_vcpu *vcpu); +int apic_has_pending_event(struct kvm_vcpu *vcpu); + #endif diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 31280df..9973d8e 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -936,13 +936,10 @@ EXPORT_SYMBOL_GPL(kvm_lapic_enabled); static int __apic_timer_fn(struct kvm_lapic *apic) { int result = 0; - wait_queue_head_t *q = &apic->vcpu->wq; atomic_inc(&apic->timer.pending); - if (waitqueue_active(q)) { - apic->vcpu->arch.mp_state = VCPU_MP_STATE_RUNNABLE; - wake_up_interruptible(q); - } + kvm_wakeup_vcpu(apic->vcpu, VCPU_MP_STATE_RUNNABLE); + if (apic_lvtt_period(apic)) { result = 1; apic->timer.dev.expires = ktime_add_ns( @@ -952,6 +949,16 @@ static int __apic_timer_fn(struct kvm_lapic *apic) return result; } +int apic_has_pending_event(struct kvm_vcpu *vcpu) +{ + struct kvm_lapic *lapic = vcpu->arch.apic; + + if (lapic) + return atomic_read(&lapic->timer.pending); + + return 0; +} + static int __inject_apic_timer_irq(struct kvm_lapic *apic) { int vector; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cb57b6a..b11ea81 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3925,10 +3944,7 @@ void kvm_vcpu_kick(struct kvm_vcpu *vcpu) { int ipi_pcpu = vcpu->cpu; - if (waitqueue_active(&vcpu->wq)) { - wake_up_interruptible(&vcpu->wq); - ++vcpu->stat.halt_wakeup; - } + kvm_wakeup_vcpu(vcpu, vcpu->arch.mp_state); if (vcpu->guest_mode) smp_call_function_single(ipi_pcpu, vcpu_kick_intr, vcpu, 0, 0); } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a2ceb51..a6cb75e 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -69,6 +69,7 @@ struct kvm_vcpu { int fpu_active; int guest_fpu_loaded; wait_queue_head_t wq; + spinlock_t wq_lock; int sigset_active; sigset_t sigset; struct kvm_vcpu_stat stat; @@ -184,6 +185,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn); void mark_page_dirty(struct kvm *kvm, gfn_t gfn); void kvm_vcpu_block(struct kvm_vcpu *vcpu); +void kvm_wakeup_vcpu(struct kvm_vcpu *vcpu, int mpstate); void kvm_resched(struct kvm_vcpu *vcpu); void kvm_load_guest_fpu(struct kvm_vcpu *vcpu); void kvm_put_guest_fpu(struct kvm_vcpu *vcpu); @@ -256,6 +262,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm); int kvm_cpu_get_interrupt(struct kvm_vcpu *v); int kvm_cpu_has_interrupt(struct kvm_vcpu *v); +int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); void kvm_vcpu_kick(struct kvm_vcpu *vcpu); static inline void kvm_guest_enter(void) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 3396a5f..00d1a6c 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -152,6 +152,7 @@ int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id) vcpu->kvm = kvm; vcpu->vcpu_id = id; init_waitqueue_head(&vcpu->wq); + spin_lock_init(&vcpu->wq_lock); page = alloc_page(GFP_KERNEL | __GFP_ZERO); if (!page) { @@ -698,6 +699,23 @@ void mark_page_dirty(struct kvm *kvm, gfn_t gfn) } } +void kvm_wakeup_vcpu(struct kvm_vcpu *vcpu, int mpstate) +{ + wait_queue_head_t *q = &vcpu->wq; + unsigned long flags; + + spin_lock_irqsave(&vcpu->wq_lock, flags); + if (waitqueue_active(q)) { +#ifdef CONFIG_X86 + vcpu->arch.mp_state = mpstate; +#endif + wake_up_interruptible(q); + ++vcpu->stat.halt_wakeup; + } + spin_unlock_irqrestore(&vcpu->wq_lock, flags); + +} + /* * The vCPU has executed a HLT instruction with in-kernel mode enabled. */ @@ -705,19 +723,24 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) { DECLARE_WAITQUEUE(wait, current); + spin_lock_irq(&vcpu->wq_lock); add_wait_queue(&vcpu->wq, &wait); /* * We will block until either an interrupt or a signal wakes us up */ while (!kvm_cpu_has_interrupt(vcpu) + && !kvm_cpu_has_pending_timer(vcpu) && !signal_pending(current) && !kvm_arch_vcpu_runnable(vcpu)) { set_current_state(TASK_INTERRUPTIBLE); + spin_unlock_irq(&vcpu->wq_lock); vcpu_put(vcpu); schedule(); vcpu_load(vcpu); + spin_lock_irq(&vcpu->wq_lock); } + spin_unlock_irq(&vcpu->wq_lock); __set_current_state(TASK_RUNNING); remove_wait_queue(&vcpu->wq, &wait); ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Register now and save $200. Hurry, offer ends at 11:59 p.m., Monday, April 7! Use priority code J8TLD2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-08 1:18 ` Marcelo Tosatti @ 2008-04-08 5:59 ` Nikola Ciprich 0 siblings, 0 replies; 15+ messages in thread From: Nikola Ciprich @ 2008-04-08 5:59 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel, Avi Kivity On Mon, 7 Apr 2008, Marcelo Tosatti wrote: > I think the logic to wakeup tasks in HLT is racy. Nothing prevents > a timer event from being lost if it fires in between guest exit and > vcpu_block(). > > Please try the patch below. > Hi Marcelo, the problem persists even with patch. [ 1905.899112] [<ffffffff8020a55b>] ? default_idle+0x3b/0x70 [ 1905.899112] [<ffffffff8020a520>] ? default_idle+0x0/0x70 [ 1905.899112] [<ffffffff8020a60e>] ? cpu_idle+0x7e/0xe0 [ 1905.899112] [<ffffffff80211630>] ? pda_init+0x30/0xb0 As I already said, if I enable ACPI, system does not boot at all (even with init=/bin/bash) and ends with mentioned trace immediately. If I disable ACPI, it hangs on "Starting udev", with init=/bin/bash it at least boots and I can use shell. At least for some time. Funny thing is, that If I then use poweroff, the system shuts down, ends with "System halted." message, but everything still seems to be working, shell, etc... ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Register now and save $200. Hurry, offer ends at 11:59 p.m., Monday, April 7! Use priority code J8TLD2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-07 21:34 ` Marcelo Tosatti 2008-04-08 1:18 ` Marcelo Tosatti @ 2008-04-19 15:29 ` Marcelo Tosatti 2008-04-19 16:22 ` Glauber Costa 2008-04-19 16:29 ` Marcelo Tosatti 1 sibling, 2 replies; 15+ messages in thread From: Marcelo Tosatti @ 2008-04-19 15:29 UTC (permalink / raw) To: Nikola Ciprich, Glauber de Oliveira Costa; +Cc: kvm-devel On Mon, Apr 07, 2008 at 06:34:57PM -0300, Marcelo Tosatti wrote: > On Mon, Apr 07, 2008 at 01:53:36PM +0200, Nikola Ciprich wrote: > > Hi, > > > > I also tried paravirt clock again in latest git with kvm-65 patch > > applied, and problem with cpu-lockups persists: > > > > [10813.654806] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0] > > [10813.655789] CPU 0: > > [10813.656624] Modules linked in: virtio_pci virtio_ring virtio_blk virtio > > piix dm_snapshot dm_zero dm_mirror dm_mod ide_disk > > ide_core sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd > > [10813.658805] Pid: 0, comm: swapper Not tainted 2.6.25-rc7 #5 > > [10813.658805] RIP: 0010:[<ffffffff80222ab2>] [<ffffffff80222ab2>] > > native_safe_halt+0x2/0x10 > > [10813.658805] RSP: 0018:ffffffff805adf50 EFLAGS: 00000296 > > [10813.658805] RAX: 000000019b08eeb0 RBX: ffffffff805f5000 RCX: > > 000000019b08eeb0 > > [10813.658805] RDX: 0000000000000006 RSI: 00000000356832b0 RDI: > > ffffffff805adf38 > > [10813.658805] RBP: 0000000000000da8 R08: 0000000000000000 R09: > > 0000000000000000 > > [10813.658805] R10: 0000000000000001 R11: 0000000000000002 R12: > > ffffffff802228ed > > [10813.658805] R13: 00000000000132a0 R14: ffffffff80200bba R15: > > ffff81000100a280 > > [10813.658805] FS: 0000000000000000(0000) GS:ffffffff80576000(0000) > > knlGS:0000000000000000 > > [10813.658805] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > > [10813.658805] CR2: 00007fac0f852000 CR3: 0000000000201000 CR4: > > 00000000000006e0 > > [10813.658805] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > > 0000000000000000 > > [10813.658805] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > > 0000000000000400 > > [10813.658805] > > [10813.658805] Call Trace: > > [10813.658805] [<ffffffff8020a55b>] ? default_idle+0x3b/0x70 > > [10813.658805] [<ffffffff8020a520>] ? default_idle+0x0/0x70 > > [10813.658805] [<ffffffff8020a60e>] ? cpu_idle+0x7e/0xe0 > > [10813.658805] [<ffffffff80211630>] ? pda_init+0x30/0xb0 > > > > Can I somehow help to track this one down?? > > Hi Nikola, > > I just reproduced this on a UP guest. Were you seeing the exact same > stack trace in the guest with kvm-64 ? I've been able to reproduce the problem. Symptoms are that when using NOHZ vcpu0 LAPIC timer is ticking far less than the others (apparently vcpu0 is the only one ticking "correctly"): nohz=on with kvmclock [root@localhost ~]# cat /proc/timer_stats | grep apic 13214, 8590 qemu-system-x86 apic_mmio_write (apic_timer_fn) 13214, 8589 qemu-system-x86 apic_mmio_write (apic_timer_fn) 13211, 8588 qemu-system-x86 apic_mmio_write (apic_timer_fn) 389, 8587 qemu-system-x86 apic_mmio_write (apic_timer_fn) nohz=off 3253, 8672 qemu-system-x86 apic_mmio_write (apic_timer_fn) 2876, 8673 qemu-system-x86 apic_mmio_write (apic_timer_fn) 2543, 8674 qemu-system-x86 apic_mmio_write (apic_timer_fn) 2179, 8675 qemu-system-x86 apic_mmio_write (apic_timer_fn) no-kvmclock 1017, 8808 qemu-system-x86 apic_mmio_write (apic_timer_fn) 1577, 8809 qemu-system-x86 apic_mmio_write (apic_timer_fn) 1708, 8807 qemu-system-x86 apic_mmio_write (apic_timer_fn) 1812, 8806 qemu-system-x86 apic_mmio_write (apic_timer_fn) Glauber will start looking at this next week. ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-19 15:29 ` Marcelo Tosatti @ 2008-04-19 16:22 ` Glauber Costa 2008-04-19 16:49 ` Marcelo Tosatti 2008-04-19 16:29 ` Marcelo Tosatti 1 sibling, 1 reply; 15+ messages in thread From: Glauber Costa @ 2008-04-19 16:22 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel, Gerd Hoffmann, Glauber de Oliveira Costa On Sat, Apr 19, 2008 at 12:29 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote: > On Mon, Apr 07, 2008 at 06:34:57PM -0300, Marcelo Tosatti wrote: > > > > On Mon, Apr 07, 2008 at 01:53:36PM +0200, Nikola Ciprich wrote: > > > Hi, > > > > > > I also tried paravirt clock again in latest git with kvm-65 patch > > > applied, and problem with cpu-lockups persists: > > > > > > [10813.654806] BUG: soft lockup - CPU#0 stuck for 61s! [swapper:0] > > > [10813.655789] CPU 0: > > > [10813.656624] Modules linked in: virtio_pci virtio_ring virtio_blk virtio > > > piix dm_snapshot dm_zero dm_mirror dm_mod ide_disk > > > ide_core sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd > > > [10813.658805] Pid: 0, comm: swapper Not tainted 2.6.25-rc7 #5 > > > [10813.658805] RIP: 0010:[<ffffffff80222ab2>] [<ffffffff80222ab2>] > > > native_safe_halt+0x2/0x10 > > > [10813.658805] RSP: 0018:ffffffff805adf50 EFLAGS: 00000296 > > > [10813.658805] RAX: 000000019b08eeb0 RBX: ffffffff805f5000 RCX: > > > 000000019b08eeb0 > > > [10813.658805] RDX: 0000000000000006 RSI: 00000000356832b0 RDI: > > > ffffffff805adf38 > > > [10813.658805] RBP: 0000000000000da8 R08: 0000000000000000 R09: > > > 0000000000000000 > > > [10813.658805] R10: 0000000000000001 R11: 0000000000000002 R12: > > > ffffffff802228ed > > > [10813.658805] R13: 00000000000132a0 R14: ffffffff80200bba R15: > > > ffff81000100a280 > > > [10813.658805] FS: 0000000000000000(0000) GS:ffffffff80576000(0000) > > > knlGS:0000000000000000 > > > [10813.658805] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > > > [10813.658805] CR2: 00007fac0f852000 CR3: 0000000000201000 CR4: > > > 00000000000006e0 > > > [10813.658805] DR0: 0000000000000000 DR1: 0000000000000000 DR2: > > > 0000000000000000 > > > [10813.658805] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: > > > 0000000000000400 > > > [10813.658805] > > > [10813.658805] Call Trace: > > > [10813.658805] [<ffffffff8020a55b>] ? default_idle+0x3b/0x70 > > > [10813.658805] [<ffffffff8020a520>] ? default_idle+0x0/0x70 > > > [10813.658805] [<ffffffff8020a60e>] ? cpu_idle+0x7e/0xe0 > > > [10813.658805] [<ffffffff80211630>] ? pda_init+0x30/0xb0 > > > > > > Can I somehow help to track this one down?? > > > > Hi Nikola, > > > > I just reproduced this on a UP guest. Were you seeing the exact same > > stack trace in the guest with kvm-64 ? > > I've been able to reproduce the problem. Symptoms are that when using > NOHZ vcpu0 LAPIC timer is ticking far less than the others (apparently vcpu0 > is the only one ticking "correctly"): > > > nohz=on with kvmclock > [root@localhost ~]# cat /proc/timer_stats | grep apic > 13214, 8590 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 13214, 8589 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 13211, 8588 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 389, 8587 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > nohz=off > 3253, 8672 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 2876, 8673 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 2543, 8674 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 2179, 8675 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > no-kvmclock > 1017, 8808 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 1577, 8809 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 1708, 8807 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 1812, 8806 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > > Glauber will start looking at this next week. > > > >From what me and marcelo discussed, I think there's a possibility that it has marginally something to do with precision of clock calculation. Gerd's patches address that issues. Can somebody test this with those patches (both guest and host), while I'm off ? -- Glauber Costa. "Free as in Freedom" http://glommer.net "The less confident you are, the more serious you have to act." ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-19 16:22 ` Glauber Costa @ 2008-04-19 16:49 ` Marcelo Tosatti 2008-04-21 7:15 ` Gerd Hoffmann 0 siblings, 1 reply; 15+ messages in thread From: Marcelo Tosatti @ 2008-04-19 16:49 UTC (permalink / raw) To: Glauber Costa, Gerd Hoffmann; +Cc: kvm-devel On Sat, Apr 19, 2008 at 01:22:28PM -0300, Glauber Costa wrote: > > I've been able to reproduce the problem. Symptoms are that when using > > NOHZ vcpu0 LAPIC timer is ticking far less than the others (apparently vcpu0 > > is the only one ticking "correctly"): > > > > > > nohz=on with kvmclock > > [root@localhost ~]# cat /proc/timer_stats | grep apic > > 13214, 8590 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 13214, 8589 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 13211, 8588 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 389, 8587 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > > > nohz=off > > 3253, 8672 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 2876, 8673 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 2543, 8674 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 2179, 8675 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > > > no-kvmclock > > 1017, 8808 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 1577, 8809 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 1708, 8807 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > 1812, 8806 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > > > > > Glauber will start looking at this next week. > > > > > > > > >From what me and marcelo discussed, I think there's a possibility that > it has marginally something to do with precision of clock calculation. > Gerd's patches address that issues. Can somebody test this with those > patches (both guest and host), while I'm off ? Haven't seen Gerd's guest patches ? ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-19 16:49 ` Marcelo Tosatti @ 2008-04-21 7:15 ` Gerd Hoffmann 2008-04-21 8:33 ` [ RfC / patch ] kvmclock fixes Gerd Hoffmann 2008-04-24 13:20 ` paravirt clock stil causing hangs in kvm-65 Glauber Costa 0 siblings, 2 replies; 15+ messages in thread From: Gerd Hoffmann @ 2008-04-21 7:15 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel Marcelo Tosatti wrote: >> >From what me and marcelo discussed, I think there's a possibility that >> it has marginally something to do with precision of clock calculation. >> Gerd's patches address that issues. Can somebody test this with those >> patches (both guest and host), while I'm off ? > > Haven't seen Gerd's guest patches ? I'm still busy cooking them up. I've mentioned them in a mail, but they didn't ran over the list (yet). Stay tuned ;) cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* [ RfC / patch ] kvmclock fixes 2008-04-21 7:15 ` Gerd Hoffmann @ 2008-04-21 8:33 ` Gerd Hoffmann 2008-04-21 9:57 ` Jeremy Fitzhardinge 2008-04-24 13:20 ` paravirt clock stil causing hangs in kvm-65 Glauber Costa 1 sibling, 1 reply; 15+ messages in thread From: Gerd Hoffmann @ 2008-04-21 8:33 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: kvm-devel, Jeremy Fitzhardinge [-- Attachment #1: Type: text/plain, Size: 755 bytes --] Gerd Hoffmann wrote: > Marcelo Tosatti wrote: >> Haven't seen Gerd's guest patches ? > > I'm still busy cooking them up. I've mentioned them in a mail, but they > didn't ran over the list (yet). Stay tuned ;) It compiles, ship it! This time as all-in one patch (both guest and host side). Almost untested and not (yet) splitted into pieces. Changes: * Host: make kvm pv clock really compatible with xen pv clock. * Guest/xen: factor out some xen clock code into a separate source file (pvclock.[ch]), so kvm can reuse it. * Guest/kvm: make kvm clock compatible with xen clock by using the common code bits. Tests, reviews and comments are welcome. cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ [-- Attachment #2: kvmclock-4.diff --] [-- Type: text/x-patch, Size: 15294 bytes --] diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 1cc9d42..688df87 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -79,7 +79,7 @@ obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o obj-$(CONFIG_KVM_GUEST) += kvm.o obj-$(CONFIG_KVM_CLOCK) += kvmclock.o -obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o +obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o pvclock.o ifdef CONFIG_INPUT_PCSPKR obj-y += pcspeaker.o diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index ddee040..476b7c7 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -18,6 +18,7 @@ #include <linux/clocksource.h> #include <linux/kvm_para.h> +#include <asm/pvclock.h> #include <asm/arch_hooks.h> #include <asm/msr.h> #include <asm/apic.h> @@ -37,17 +38,9 @@ early_param("no-kvmclock", parse_no_kvmclock); /* The hypervisor will put information about time periodically here */ static DEFINE_PER_CPU_SHARED_ALIGNED(struct kvm_vcpu_time_info, hv_clock); -#define get_clock(cpu, field) per_cpu(hv_clock, cpu).field - -static inline u64 kvm_get_delta(u64 last_tsc) -{ - int cpu = smp_processor_id(); - u64 delta = native_read_tsc() - last_tsc; - return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE; -} static struct kvm_wall_clock wall_clock; -static cycle_t kvm_clock_read(void); + /* * The wallclock is the time of day when we booted. Since then, some time may * have elapsed since the hypervisor wrote the data. So we try to account for @@ -55,35 +48,19 @@ static cycle_t kvm_clock_read(void); */ unsigned long kvm_get_wallclock(void) { - u32 wc_sec, wc_nsec; - u64 delta; + struct kvm_vcpu_time_info *vcpu_time; struct timespec ts; - int version, nsec; int low, high; low = (int)__pa(&wall_clock); high = ((u64)__pa(&wall_clock) >> 32); - - delta = kvm_clock_read(); - native_write_msr(MSR_KVM_WALL_CLOCK, low, high); - do { - version = wall_clock.wc_version; - rmb(); - wc_sec = wall_clock.wc_sec; - wc_nsec = wall_clock.wc_nsec; - rmb(); - } while ((wall_clock.wc_version != version) || (version & 1)); - - delta = kvm_clock_read() - delta; - delta += wc_nsec; - nsec = do_div(delta, NSEC_PER_SEC); - set_normalized_timespec(&ts, wc_sec + delta, nsec); - /* - * Of all mechanisms of time adjustment I've tested, this one - * was the champion! - */ - return ts.tv_sec + 1; + + vcpu_time = &get_cpu_var(hv_clock); + pvclock_read_wallclock(&wall_clock, vcpu_time, &ts); + put_cpu_var(hv_clock); + + return ts.tv_sec; } int kvm_set_wallclock(unsigned long now) @@ -91,28 +68,17 @@ int kvm_set_wallclock(unsigned long now) return 0; } -/* - * This is our read_clock function. The host puts an tsc timestamp each time - * it updates a new time. Without the tsc adjustment, we can have a situation - * in which a vcpu starts to run earlier (smaller system_time), but probes - * time later (compared to another vcpu), leading to backwards time - */ static cycle_t kvm_clock_read(void) { - u64 last_tsc, now; - int cpu; + struct kvm_vcpu_time_info *src; + cycle_t ret; - preempt_disable(); - cpu = smp_processor_id(); - - last_tsc = get_clock(cpu, tsc_timestamp); - now = get_clock(cpu, system_time); - - now += kvm_get_delta(last_tsc); - preempt_enable(); - - return now; + src = &get_cpu_var(hv_clock); + ret = pvclock_clocksource_read(src); + put_cpu_var(hv_clock); + return ret; } + static struct clocksource kvm_clock = { .name = "kvm-clock", .read = kvm_clock_read, diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c new file mode 100644 index 0000000..2da148d --- /dev/null +++ b/arch/x86/kernel/pvclock.c @@ -0,0 +1,144 @@ +/* paravirtual clock -- common code used by kvm/xen + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +*/ + +#include <linux/kernel.h> +#include <linux/percpu.h> +#include <asm/pvclock.h> + +/* + * These are perodically updated + * xen: magic shared_info page + * kvm: gpa registered via msr + * and then copied here. + */ +struct pvclock_shadow_time { + u64 tsc_timestamp; /* TSC at last update of time vals. */ + u64 system_timestamp; /* Time, in nanosecs, since boot. */ + u32 tsc_to_nsec_mul; + int tsc_shift; + u32 version; +}; + +static DEFINE_PER_CPU(struct pvclock_shadow_time, shadow_time); + +/* + * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction, + * yielding a 64-bit result. + */ +static inline u64 scale_delta(u64 delta, u32 mul_frac, int shift) +{ + u64 product; +#ifdef __i386__ + u32 tmp1, tmp2; +#endif + + if (shift < 0) + delta >>= -shift; + else + delta <<= shift; + +#ifdef __i386__ + __asm__ ( + "mul %5 ; " + "mov %4,%%eax ; " + "mov %%edx,%4 ; " + "mul %5 ; " + "xor %5,%5 ; " + "add %4,%%eax ; " + "adc %5,%%edx ; " + : "=A" (product), "=r" (tmp1), "=r" (tmp2) + : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) ); +#elif __x86_64__ + __asm__ ( + "mul %%rdx ; shrd $32,%%rdx,%%rax" + : "=a" (product) : "0" (delta), "d" ((u64)mul_frac) ); +#else +#error implement me! +#endif + + return product; +} + +static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow) +{ + u64 delta = native_read_tsc() - shadow->tsc_timestamp; + return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift); +} + +/* + * Reads a consistent set of time-base values from hypervisor, + * into a shadow data area. + */ +static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst, + struct kvm_vcpu_time_info *src) +{ + do { + dst->version = src->version; + rmb(); /* fetch version before data */ + dst->tsc_timestamp = src->tsc_timestamp; + dst->system_timestamp = src->system_time; + dst->tsc_to_nsec_mul = src->tsc_to_system_mul; + dst->tsc_shift = src->tsc_shift; + rmb(); /* test version after fetching data */ + } while ((src->version & 1) | (dst->version ^ src->version)); + + return dst->version; +} + +/* + * This is our read_clock function. The host puts an tsc timestamp each time + * it updates a new time. Without the tsc adjustment, we can have a situation + * in which a vcpu starts to run earlier (smaller system_time), but probes + * time later (compared to another vcpu), leading to backwards time + */ +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src) +{ + struct pvclock_shadow_time *shadow = &get_cpu_var(shadow_time); + cycle_t ret; + + pvclock_get_time_values(shadow, src); + ret = shadow->system_timestamp + pvclock_get_nsec_offset(shadow); + + put_cpu_var(shadow_time); + return ret; +} + +void pvclock_read_wallclock(struct kvm_wall_clock *wall_clock, + struct kvm_vcpu_time_info *vcpu_time, + struct timespec *ts) +{ + u32 version; + u64 delta; + struct timespec now; + + /* get wallclock at system boot */ + do { + version = wall_clock->wc_version; + rmb(); /* fetch version before time */ + now.tv_sec = wall_clock->wc_sec; + now.tv_nsec = wall_clock->wc_nsec; + rmb(); /* fetch time before checking version */ + } while ((wall_clock->wc_version & 1) || (version != wall_clock->wc_version)); + + delta = pvclock_clocksource_read(vcpu_time); /* time since system boot */ + delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec; + + now.tv_nsec = do_div(delta, NSEC_PER_SEC); + now.tv_sec = delta; + + set_normalized_timespec(ts, now.tv_sec, now.tv_nsec); +} diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0ce5563..bdd3128 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -493,7 +493,7 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock) { static int version; struct kvm_wall_clock wc; - struct timespec wc_ts; + struct timespec now,sys,boot; if (!wall_clock) return; @@ -502,9 +502,16 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock) kvm_write_guest(kvm, wall_clock, &version, sizeof(version)); - wc_ts = current_kernel_time(); - wc.wc_sec = wc_ts.tv_sec; - wc.wc_nsec = wc_ts.tv_nsec; +#if 0 + /* Hmm, getboottime() isn't exported to modules ... */ + getboottime(&boot); +#else + now = current_kernel_time(); + ktime_get_ts(&sys); + boot = ns_to_timespec(timespec_to_ns(&now) - timespec_to_ns(&sys)); +#endif + wc.wc_sec = boot.tv_sec; + wc.wc_nsec = boot.tv_nsec; wc.wc_version = version; kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc)); @@ -544,13 +551,51 @@ static void kvm_write_guest_time(struct kvm_vcpu *v) shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, - sizeof(vcpu->hv_clock)); + sizeof(vcpu->hv_clock)); kunmap_atomic(shared_kaddr, KM_USER0); mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); } +static uint32_t div_frac(uint32_t dividend, uint32_t divisor) +{ + uint32_t quotient, remainder; + + __asm__ ( "divl %4" + : "=a" (quotient), "=d" (remainder) + : "0" (0), "1" (dividend), "r" (divisor) ); + return quotient; +} + +static void kvm_set_time_scale(uint32_t tsc_khz, struct kvm_vcpu_time_info *hv_clock) +{ + uint64_t nsecs = 1000000000LL; + int32_t shift = 0; + uint64_t tps64; + uint32_t tps32; + + tps64 = tsc_khz * 1000LL; + while (tps64 > nsecs*2) { + tps64 >>= 1; + shift--; + } + + tps32 = (uint32_t)tps64; + while (tps32 <= (uint32_t)nsecs) { + tps32 <<= 1; + shift++; + } + + hv_clock->tsc_shift = shift; + hv_clock->tsc_to_system_mul = div_frac(nsecs, tps32); + +#if 0 + printk(KERN_DEBUG "%s: tsc_khz %u, tsc_shift %d, tsc_mul %u\n", + __FUNCTION__, tsc_khz, hv_clock->tsc_shift, + hv_clock->tsc_to_system_mul); +#endif +} int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) { @@ -599,9 +644,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) /* ...but clean it before doing the actual write */ vcpu->arch.time_offset = data & ~(PAGE_MASK | 1); - vcpu->arch.hv_clock.tsc_to_system_mul = - clocksource_khz2mult(tsc_khz, 22); - vcpu->arch.hv_clock.tsc_shift = 22; + kvm_set_time_scale(tsc_khz, &vcpu->arch.hv_clock); down_read(¤t->mm->mmap_sem); vcpu->arch.time_page = diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index c39e1a5..3d5f945 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -13,6 +13,7 @@ #include <linux/clockchips.h> #include <linux/kernel_stat.h> +#include <asm/pvclock.h> #include <asm/xen/hypervisor.h> #include <asm/xen/hypercall.h> @@ -30,17 +31,6 @@ static cycle_t xen_clocksource_read(void); -/* These are perodically updated in shared_info, and then copied here. */ -struct shadow_time_info { - u64 tsc_timestamp; /* TSC at last update of time vals. */ - u64 system_timestamp; /* Time, in nanosecs, since boot. */ - u32 tsc_to_nsec_mul; - int tsc_shift; - u32 version; -}; - -static DEFINE_PER_CPU(struct shadow_time_info, shadow_time); - /* runstate info updated by Xen */ static DEFINE_PER_CPU(struct vcpu_runstate_info, runstate); @@ -230,95 +220,14 @@ unsigned long xen_cpu_khz(void) return xen_khz; } -/* - * Reads a consistent set of time-base values from Xen, into a shadow data - * area. - */ -static unsigned get_time_values_from_xen(void) -{ - struct vcpu_time_info *src; - struct shadow_time_info *dst; - - /* src is shared memory with the hypervisor, so we need to - make sure we get a consistent snapshot, even in the face of - being preempted. */ - src = &__get_cpu_var(xen_vcpu)->time; - dst = &__get_cpu_var(shadow_time); - - do { - dst->version = src->version; - rmb(); /* fetch version before data */ - dst->tsc_timestamp = src->tsc_timestamp; - dst->system_timestamp = src->system_time; - dst->tsc_to_nsec_mul = src->tsc_to_system_mul; - dst->tsc_shift = src->tsc_shift; - rmb(); /* test version after fetching data */ - } while ((src->version & 1) | (dst->version ^ src->version)); - - return dst->version; -} - -/* - * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction, - * yielding a 64-bit result. - */ -static inline u64 scale_delta(u64 delta, u32 mul_frac, int shift) -{ - u64 product; -#ifdef __i386__ - u32 tmp1, tmp2; -#endif - - if (shift < 0) - delta >>= -shift; - else - delta <<= shift; - -#ifdef __i386__ - __asm__ ( - "mul %5 ; " - "mov %4,%%eax ; " - "mov %%edx,%4 ; " - "mul %5 ; " - "xor %5,%5 ; " - "add %4,%%eax ; " - "adc %5,%%edx ; " - : "=A" (product), "=r" (tmp1), "=r" (tmp2) - : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) ); -#elif __x86_64__ - __asm__ ( - "mul %%rdx ; shrd $32,%%rdx,%%rax" - : "=a" (product) : "0" (delta), "d" ((u64)mul_frac) ); -#else -#error implement me! -#endif - - return product; -} - -static u64 get_nsec_offset(struct shadow_time_info *shadow) -{ - u64 now, delta; - now = native_read_tsc(); - delta = now - shadow->tsc_timestamp; - return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift); -} - static cycle_t xen_clocksource_read(void) { - struct shadow_time_info *shadow = &get_cpu_var(shadow_time); + struct vcpu_time_info *src; cycle_t ret; - unsigned version; - - do { - version = get_time_values_from_xen(); - barrier(); - ret = shadow->system_timestamp + get_nsec_offset(shadow); - barrier(); - } while (version != __get_cpu_var(xen_vcpu)->time.version); - - put_cpu_var(shadow_time); + src = &get_cpu_var(xen_vcpu)->time; + ret = pvclock_clocksource_read((void*)src); + put_cpu_var(xen_vcpu); return ret; } @@ -349,9 +258,14 @@ static void xen_read_wallclock(struct timespec *ts) unsigned long xen_get_wallclock(void) { + const struct shared_info *s = HYPERVISOR_shared_info; + struct kvm_wall_clock *wall_clock = (void*)&(s->wc_version); + struct vcpu_time_info *vcpu_time; struct timespec ts; - xen_read_wallclock(&ts); + vcpu_time = &get_cpu_var(xen_vcpu)->time; + pvclock_read_wallclock(wall_clock, (void*)vcpu_time, &ts); + put_cpu_var(xen_vcpu); return ts.tv_sec; } @@ -576,8 +490,6 @@ __init void xen_time_init(void) { int cpu = smp_processor_id(); - get_time_values_from_xen(); - clocksource_register(&xen_clocksource); if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, cpu, NULL) == 0) { diff --git a/include/asm-x86/pvclock.h b/include/asm-x86/pvclock.h new file mode 100644 index 0000000..2b9812f --- /dev/null +++ b/include/asm-x86/pvclock.h @@ -0,0 +1,6 @@ +#include <linux/clocksource.h> +#include <asm/kvm_para.h> +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src); +void pvclock_read_wallclock(struct kvm_wall_clock *wall, + struct kvm_vcpu_time_info *vcpu, + struct timespec *ts); [-- Attachment #3: Type: text/plain, Size: 320 bytes --] ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone [-- Attachment #4: Type: text/plain, Size: 158 bytes --] _______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [ RfC / patch ] kvmclock fixes 2008-04-21 8:33 ` [ RfC / patch ] kvmclock fixes Gerd Hoffmann @ 2008-04-21 9:57 ` Jeremy Fitzhardinge 2008-04-21 13:01 ` Gerd Hoffmann 0 siblings, 1 reply; 15+ messages in thread From: Jeremy Fitzhardinge @ 2008-04-21 9:57 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: kvm-devel, Marcelo Tosatti Gerd Hoffmann wrote: > * Host: make kvm pv clock really compatible with xen pv clock. > * Guest/xen: factor out some xen clock code into a separate > source file (pvclock.[ch]), so kvm can reuse it. > * Guest/kvm: make kvm clock compatible with xen clock by using > the common code bits. > I guess saving on code duplication is good... > +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src) > +{ > + struct pvclock_shadow_time *shadow = &get_cpu_var(shadow_time); > + cycle_t ret; > + > + pvclock_get_time_values(shadow, src); > + ret = shadow->system_timestamp + pvclock_get_nsec_offset(shadow); > You need to put this in a loop in case the system clock parameters change between the pvclock_get_time_values() and pvclock_get_nsec_offset(). How does kvm deal with suspend/resume with respect to time? Is the "system" timestamp guaranteed to remain monotonic? For Xen, I think we'll need to maintain an offset between the initial system timestamp and whatever it is after resuming. J ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [ RfC / patch ] kvmclock fixes 2008-04-21 9:57 ` Jeremy Fitzhardinge @ 2008-04-21 13:01 ` Gerd Hoffmann 2008-04-21 13:38 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 15+ messages in thread From: Gerd Hoffmann @ 2008-04-21 13:01 UTC (permalink / raw) To: Jeremy Fitzhardinge; +Cc: kvm-devel, Marcelo Tosatti [-- Attachment #1: Type: text/plain, Size: 856 bytes --] Jeremy Fitzhardinge wrote: > Gerd Hoffmann wrote: >> +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src) >> +{ >> + struct pvclock_shadow_time *shadow = &get_cpu_var(shadow_time); >> + cycle_t ret; >> + >> + pvclock_get_time_values(shadow, src); >> + ret = shadow->system_timestamp + pvclock_get_nsec_offset(shadow); >> > > You need to put this in a loop in case the system clock parameters > change between the pvclock_get_time_values() and pvclock_get_nsec_offset(). Fixed, new patch attached. > How does kvm deal with suspend/resume with respect to time? Is the > "system" timestamp guaranteed to remain monotonic? For Xen, I think > we'll need to maintain an offset between the initial system timestamp > and whatever it is after resuming. Havn't looked at it yet. cheers, Gerd -- http://kraxel.fedorapeople.org/xenner/ [-- Attachment #2: kvmclock-5.diff --] [-- Type: text/x-patch, Size: 15408 bytes --] diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 1cc9d42..688df87 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -79,7 +79,7 @@ obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o obj-$(CONFIG_KVM_GUEST) += kvm.o obj-$(CONFIG_KVM_CLOCK) += kvmclock.o -obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o +obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o pvclock.o ifdef CONFIG_INPUT_PCSPKR obj-y += pcspeaker.o diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index ddee040..476b7c7 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -18,6 +18,7 @@ #include <linux/clocksource.h> #include <linux/kvm_para.h> +#include <asm/pvclock.h> #include <asm/arch_hooks.h> #include <asm/msr.h> #include <asm/apic.h> @@ -37,17 +38,9 @@ early_param("no-kvmclock", parse_no_kvmclock); /* The hypervisor will put information about time periodically here */ static DEFINE_PER_CPU_SHARED_ALIGNED(struct kvm_vcpu_time_info, hv_clock); -#define get_clock(cpu, field) per_cpu(hv_clock, cpu).field - -static inline u64 kvm_get_delta(u64 last_tsc) -{ - int cpu = smp_processor_id(); - u64 delta = native_read_tsc() - last_tsc; - return (delta * get_clock(cpu, tsc_to_system_mul)) >> KVM_SCALE; -} static struct kvm_wall_clock wall_clock; -static cycle_t kvm_clock_read(void); + /* * The wallclock is the time of day when we booted. Since then, some time may * have elapsed since the hypervisor wrote the data. So we try to account for @@ -55,35 +48,19 @@ static cycle_t kvm_clock_read(void); */ unsigned long kvm_get_wallclock(void) { - u32 wc_sec, wc_nsec; - u64 delta; + struct kvm_vcpu_time_info *vcpu_time; struct timespec ts; - int version, nsec; int low, high; low = (int)__pa(&wall_clock); high = ((u64)__pa(&wall_clock) >> 32); - - delta = kvm_clock_read(); - native_write_msr(MSR_KVM_WALL_CLOCK, low, high); - do { - version = wall_clock.wc_version; - rmb(); - wc_sec = wall_clock.wc_sec; - wc_nsec = wall_clock.wc_nsec; - rmb(); - } while ((wall_clock.wc_version != version) || (version & 1)); - - delta = kvm_clock_read() - delta; - delta += wc_nsec; - nsec = do_div(delta, NSEC_PER_SEC); - set_normalized_timespec(&ts, wc_sec + delta, nsec); - /* - * Of all mechanisms of time adjustment I've tested, this one - * was the champion! - */ - return ts.tv_sec + 1; + + vcpu_time = &get_cpu_var(hv_clock); + pvclock_read_wallclock(&wall_clock, vcpu_time, &ts); + put_cpu_var(hv_clock); + + return ts.tv_sec; } int kvm_set_wallclock(unsigned long now) @@ -91,28 +68,17 @@ int kvm_set_wallclock(unsigned long now) return 0; } -/* - * This is our read_clock function. The host puts an tsc timestamp each time - * it updates a new time. Without the tsc adjustment, we can have a situation - * in which a vcpu starts to run earlier (smaller system_time), but probes - * time later (compared to another vcpu), leading to backwards time - */ static cycle_t kvm_clock_read(void) { - u64 last_tsc, now; - int cpu; + struct kvm_vcpu_time_info *src; + cycle_t ret; - preempt_disable(); - cpu = smp_processor_id(); - - last_tsc = get_clock(cpu, tsc_timestamp); - now = get_clock(cpu, system_time); - - now += kvm_get_delta(last_tsc); - preempt_enable(); - - return now; + src = &get_cpu_var(hv_clock); + ret = pvclock_clocksource_read(src); + put_cpu_var(hv_clock); + return ret; } + static struct clocksource kvm_clock = { .name = "kvm-clock", .read = kvm_clock_read, diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c new file mode 100644 index 0000000..6e7dae0 --- /dev/null +++ b/arch/x86/kernel/pvclock.c @@ -0,0 +1,150 @@ +/* paravirtual clock -- common code used by kvm/xen + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +*/ + +#include <linux/kernel.h> +#include <linux/percpu.h> +#include <asm/pvclock.h> + +/* + * These are perodically updated + * xen: magic shared_info page + * kvm: gpa registered via msr + * and then copied here. + */ +struct pvclock_shadow_time { + u64 tsc_timestamp; /* TSC at last update of time vals. */ + u64 system_timestamp; /* Time, in nanosecs, since boot. */ + u32 tsc_to_nsec_mul; + int tsc_shift; + u32 version; +}; + +static DEFINE_PER_CPU(struct pvclock_shadow_time, shadow_time); + +/* + * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction, + * yielding a 64-bit result. + */ +static inline u64 scale_delta(u64 delta, u32 mul_frac, int shift) +{ + u64 product; +#ifdef __i386__ + u32 tmp1, tmp2; +#endif + + if (shift < 0) + delta >>= -shift; + else + delta <<= shift; + +#ifdef __i386__ + __asm__ ( + "mul %5 ; " + "mov %4,%%eax ; " + "mov %%edx,%4 ; " + "mul %5 ; " + "xor %5,%5 ; " + "add %4,%%eax ; " + "adc %5,%%edx ; " + : "=A" (product), "=r" (tmp1), "=r" (tmp2) + : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) ); +#elif __x86_64__ + __asm__ ( + "mul %%rdx ; shrd $32,%%rdx,%%rax" + : "=a" (product) : "0" (delta), "d" ((u64)mul_frac) ); +#else +#error implement me! +#endif + + return product; +} + +static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow) +{ + u64 delta = native_read_tsc() - shadow->tsc_timestamp; + return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift); +} + +/* + * Reads a consistent set of time-base values from hypervisor, + * into a shadow data area. + */ +static unsigned pvclock_get_time_values(struct pvclock_shadow_time *dst, + struct kvm_vcpu_time_info *src) +{ + do { + dst->version = src->version; + rmb(); /* fetch version before data */ + dst->tsc_timestamp = src->tsc_timestamp; + dst->system_timestamp = src->system_time; + dst->tsc_to_nsec_mul = src->tsc_to_system_mul; + dst->tsc_shift = src->tsc_shift; + rmb(); /* test version after fetching data */ + } while ((src->version & 1) | (dst->version ^ src->version)); + + return dst->version; +} + +/* + * This is our read_clock function. The host puts an tsc timestamp each time + * it updates a new time. Without the tsc adjustment, we can have a situation + * in which a vcpu starts to run earlier (smaller system_time), but probes + * time later (compared to another vcpu), leading to backwards time + */ +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src) +{ + struct pvclock_shadow_time *shadow; + cycle_t ret; + unsigned version; + + shadow = &get_cpu_var(shadow_time); + do { + version = pvclock_get_time_values(shadow, src); + barrier(); + ret = shadow->system_timestamp + pvclock_get_nsec_offset(shadow); + barrier(); + } while (version != src->version); + + put_cpu_var(shadow_time); + return ret; +} + +void pvclock_read_wallclock(struct kvm_wall_clock *wall_clock, + struct kvm_vcpu_time_info *vcpu_time, + struct timespec *ts) +{ + u32 version; + u64 delta; + struct timespec now; + + /* get wallclock at system boot */ + do { + version = wall_clock->wc_version; + rmb(); /* fetch version before time */ + now.tv_sec = wall_clock->wc_sec; + now.tv_nsec = wall_clock->wc_nsec; + rmb(); /* fetch time before checking version */ + } while ((wall_clock->wc_version & 1) || (version != wall_clock->wc_version)); + + delta = pvclock_clocksource_read(vcpu_time); /* time since system boot */ + delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec; + + now.tv_nsec = do_div(delta, NSEC_PER_SEC); + now.tv_sec = delta; + + set_normalized_timespec(ts, now.tv_sec, now.tv_nsec); +} diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0ce5563..bdd3128 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -493,7 +493,7 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock) { static int version; struct kvm_wall_clock wc; - struct timespec wc_ts; + struct timespec now,sys,boot; if (!wall_clock) return; @@ -502,9 +502,16 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock) kvm_write_guest(kvm, wall_clock, &version, sizeof(version)); - wc_ts = current_kernel_time(); - wc.wc_sec = wc_ts.tv_sec; - wc.wc_nsec = wc_ts.tv_nsec; +#if 0 + /* Hmm, getboottime() isn't exported to modules ... */ + getboottime(&boot); +#else + now = current_kernel_time(); + ktime_get_ts(&sys); + boot = ns_to_timespec(timespec_to_ns(&now) - timespec_to_ns(&sys)); +#endif + wc.wc_sec = boot.tv_sec; + wc.wc_nsec = boot.tv_nsec; wc.wc_version = version; kvm_write_guest(kvm, wall_clock, &wc, sizeof(wc)); @@ -544,13 +551,51 @@ static void kvm_write_guest_time(struct kvm_vcpu *v) shared_kaddr = kmap_atomic(vcpu->time_page, KM_USER0); memcpy(shared_kaddr + vcpu->time_offset, &vcpu->hv_clock, - sizeof(vcpu->hv_clock)); + sizeof(vcpu->hv_clock)); kunmap_atomic(shared_kaddr, KM_USER0); mark_page_dirty(v->kvm, vcpu->time >> PAGE_SHIFT); } +static uint32_t div_frac(uint32_t dividend, uint32_t divisor) +{ + uint32_t quotient, remainder; + + __asm__ ( "divl %4" + : "=a" (quotient), "=d" (remainder) + : "0" (0), "1" (dividend), "r" (divisor) ); + return quotient; +} + +static void kvm_set_time_scale(uint32_t tsc_khz, struct kvm_vcpu_time_info *hv_clock) +{ + uint64_t nsecs = 1000000000LL; + int32_t shift = 0; + uint64_t tps64; + uint32_t tps32; + + tps64 = tsc_khz * 1000LL; + while (tps64 > nsecs*2) { + tps64 >>= 1; + shift--; + } + + tps32 = (uint32_t)tps64; + while (tps32 <= (uint32_t)nsecs) { + tps32 <<= 1; + shift++; + } + + hv_clock->tsc_shift = shift; + hv_clock->tsc_to_system_mul = div_frac(nsecs, tps32); + +#if 0 + printk(KERN_DEBUG "%s: tsc_khz %u, tsc_shift %d, tsc_mul %u\n", + __FUNCTION__, tsc_khz, hv_clock->tsc_shift, + hv_clock->tsc_to_system_mul); +#endif +} int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) { @@ -599,9 +644,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data) /* ...but clean it before doing the actual write */ vcpu->arch.time_offset = data & ~(PAGE_MASK | 1); - vcpu->arch.hv_clock.tsc_to_system_mul = - clocksource_khz2mult(tsc_khz, 22); - vcpu->arch.hv_clock.tsc_shift = 22; + kvm_set_time_scale(tsc_khz, &vcpu->arch.hv_clock); down_read(¤t->mm->mmap_sem); vcpu->arch.time_page = diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index c39e1a5..3d5f945 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -13,6 +13,7 @@ #include <linux/clockchips.h> #include <linux/kernel_stat.h> +#include <asm/pvclock.h> #include <asm/xen/hypervisor.h> #include <asm/xen/hypercall.h> @@ -30,17 +31,6 @@ static cycle_t xen_clocksource_read(void); -/* These are perodically updated in shared_info, and then copied here. */ -struct shadow_time_info { - u64 tsc_timestamp; /* TSC at last update of time vals. */ - u64 system_timestamp; /* Time, in nanosecs, since boot. */ - u32 tsc_to_nsec_mul; - int tsc_shift; - u32 version; -}; - -static DEFINE_PER_CPU(struct shadow_time_info, shadow_time); - /* runstate info updated by Xen */ static DEFINE_PER_CPU(struct vcpu_runstate_info, runstate); @@ -230,95 +220,14 @@ unsigned long xen_cpu_khz(void) return xen_khz; } -/* - * Reads a consistent set of time-base values from Xen, into a shadow data - * area. - */ -static unsigned get_time_values_from_xen(void) -{ - struct vcpu_time_info *src; - struct shadow_time_info *dst; - - /* src is shared memory with the hypervisor, so we need to - make sure we get a consistent snapshot, even in the face of - being preempted. */ - src = &__get_cpu_var(xen_vcpu)->time; - dst = &__get_cpu_var(shadow_time); - - do { - dst->version = src->version; - rmb(); /* fetch version before data */ - dst->tsc_timestamp = src->tsc_timestamp; - dst->system_timestamp = src->system_time; - dst->tsc_to_nsec_mul = src->tsc_to_system_mul; - dst->tsc_shift = src->tsc_shift; - rmb(); /* test version after fetching data */ - } while ((src->version & 1) | (dst->version ^ src->version)); - - return dst->version; -} - -/* - * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction, - * yielding a 64-bit result. - */ -static inline u64 scale_delta(u64 delta, u32 mul_frac, int shift) -{ - u64 product; -#ifdef __i386__ - u32 tmp1, tmp2; -#endif - - if (shift < 0) - delta >>= -shift; - else - delta <<= shift; - -#ifdef __i386__ - __asm__ ( - "mul %5 ; " - "mov %4,%%eax ; " - "mov %%edx,%4 ; " - "mul %5 ; " - "xor %5,%5 ; " - "add %4,%%eax ; " - "adc %5,%%edx ; " - : "=A" (product), "=r" (tmp1), "=r" (tmp2) - : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) ); -#elif __x86_64__ - __asm__ ( - "mul %%rdx ; shrd $32,%%rdx,%%rax" - : "=a" (product) : "0" (delta), "d" ((u64)mul_frac) ); -#else -#error implement me! -#endif - - return product; -} - -static u64 get_nsec_offset(struct shadow_time_info *shadow) -{ - u64 now, delta; - now = native_read_tsc(); - delta = now - shadow->tsc_timestamp; - return scale_delta(delta, shadow->tsc_to_nsec_mul, shadow->tsc_shift); -} - static cycle_t xen_clocksource_read(void) { - struct shadow_time_info *shadow = &get_cpu_var(shadow_time); + struct vcpu_time_info *src; cycle_t ret; - unsigned version; - - do { - version = get_time_values_from_xen(); - barrier(); - ret = shadow->system_timestamp + get_nsec_offset(shadow); - barrier(); - } while (version != __get_cpu_var(xen_vcpu)->time.version); - - put_cpu_var(shadow_time); + src = &get_cpu_var(xen_vcpu)->time; + ret = pvclock_clocksource_read((void*)src); + put_cpu_var(xen_vcpu); return ret; } @@ -349,9 +258,14 @@ static void xen_read_wallclock(struct timespec *ts) unsigned long xen_get_wallclock(void) { + const struct shared_info *s = HYPERVISOR_shared_info; + struct kvm_wall_clock *wall_clock = (void*)&(s->wc_version); + struct vcpu_time_info *vcpu_time; struct timespec ts; - xen_read_wallclock(&ts); + vcpu_time = &get_cpu_var(xen_vcpu)->time; + pvclock_read_wallclock(wall_clock, (void*)vcpu_time, &ts); + put_cpu_var(xen_vcpu); return ts.tv_sec; } @@ -576,8 +490,6 @@ __init void xen_time_init(void) { int cpu = smp_processor_id(); - get_time_values_from_xen(); - clocksource_register(&xen_clocksource); if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, cpu, NULL) == 0) { diff --git a/include/asm-x86/pvclock.h b/include/asm-x86/pvclock.h new file mode 100644 index 0000000..2b9812f --- /dev/null +++ b/include/asm-x86/pvclock.h @@ -0,0 +1,6 @@ +#include <linux/clocksource.h> +#include <asm/kvm_para.h> +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src); +void pvclock_read_wallclock(struct kvm_wall_clock *wall, + struct kvm_vcpu_time_info *vcpu, + struct timespec *ts); [-- Attachment #3: Type: text/plain, Size: 320 bytes --] ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone [-- Attachment #4: Type: text/plain, Size: 158 bytes --] _______________________________________________ kvm-devel mailing list kvm-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kvm-devel ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [ RfC / patch ] kvmclock fixes 2008-04-21 13:01 ` Gerd Hoffmann @ 2008-04-21 13:38 ` Jeremy Fitzhardinge 0 siblings, 0 replies; 15+ messages in thread From: Jeremy Fitzhardinge @ 2008-04-21 13:38 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: kvm-devel, Marcelo Tosatti Gerd Hoffmann wrote: > +cycle_t pvclock_clocksource_read(struct kvm_vcpu_time_info *src) > +{ > + struct pvclock_shadow_time *shadow; > + cycle_t ret; > + unsigned version; > + > + shadow = &get_cpu_var(shadow_time); > + do { > + version = pvclock_get_time_values(shadow, src); > + barrier(); > + ret = shadow->system_timestamp + pvclock_get_nsec_offset(shadow); > + barrier(); > Is barrier() strong enough? Does kvm guarantee that the per-cpu time parameters are only ever updated by that cpu? I'm pretty sure Xen does, so that's OK. J ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-21 7:15 ` Gerd Hoffmann 2008-04-21 8:33 ` [ RfC / patch ] kvmclock fixes Gerd Hoffmann @ 2008-04-24 13:20 ` Glauber Costa 2008-04-26 14:24 ` extmaillist 1 sibling, 1 reply; 15+ messages in thread From: Glauber Costa @ 2008-04-24 13:20 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: kvm-devel, Marcelo Tosatti On Mon, Apr 21, 2008 at 4:15 AM, Gerd Hoffmann <kraxel@redhat.com> wrote: > Marcelo Tosatti wrote: > >> >From what me and marcelo discussed, I think there's a possibility that > >> it has marginally something to do with precision of clock calculation. > >> Gerd's patches address that issues. Can somebody test this with those > >> patches (both guest and host), while I'm off ? > > > > Haven't seen Gerd's guest patches ? > > I'm still busy cooking them up. I've mentioned them in a mail, but they > didn't ran over the list (yet). Stay tuned ;) > > cheers, > Gerd > Just saw Gerd's patches flying around. Can anyone that is able to reproduce this problem (a subgroup of human beings that does not include me) test it with them applied? If it still fails, please let me know asap -- Glauber Costa. "Free as in Freedom" http://glommer.net "The less confident you are, the more serious you have to act." ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-24 13:20 ` paravirt clock stil causing hangs in kvm-65 Glauber Costa @ 2008-04-26 14:24 ` extmaillist 0 siblings, 0 replies; 15+ messages in thread From: extmaillist @ 2008-04-26 14:24 UTC (permalink / raw) To: Glauber Costa; +Cc: kvm-devel, nikola.ciprich, Marcelo Tosatti, Gerd Hoffmann Hi Glauber, sorry for late reply. well, I'm no longer able to reproduce the problem in the same way (with backtraces etc) as before, but anyways enabling paravirt_clock with or without Your patches on SMP guest still causes troubles: on my phenom machine, the kernel hangs after printing "PCI-GART - No AMD northbridge found." on intel machine normal system boot seems to be terribly slow taking tens of seconds between steps and later hangs, using init=/bin/sh seems to work though. I'm wondering how can I get the backtraces I was getting before, I have soft CPU lockup debugging enabled, what else could help? cheers n. On Thu, 24 Apr 2008, Glauber Costa wrote: > Just saw Gerd's patches flying around. Can anyone that is able to > reproduce this problem (a subgroup of human beings that does not > include me) test it with them applied? > > If it still fails, please let me know asap > > -- > Glauber Costa. > "Free as in Freedom" > http://glommer.net > > "The less confident you are, the more serious you have to act." > > ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: paravirt clock stil causing hangs in kvm-65 2008-04-19 15:29 ` Marcelo Tosatti 2008-04-19 16:22 ` Glauber Costa @ 2008-04-19 16:29 ` Marcelo Tosatti 1 sibling, 0 replies; 15+ messages in thread From: Marcelo Tosatti @ 2008-04-19 16:29 UTC (permalink / raw) To: Nikola Ciprich, Glauber de Oliveira Costa; +Cc: kvm-devel On Sat, Apr 19, 2008 at 12:29:47PM -0300, Marcelo Tosatti wrote: > > I just reproduced this on a UP guest. Were you seeing the exact same > > stack trace in the guest with kvm-64 ? > > I've been able to reproduce the problem. Symptoms are that when using > NOHZ vcpu0 LAPIC timer is ticking far less than the others (apparently vcpu0 > is the only one ticking "correctly"): > > > nohz=on with kvmclock > [root@localhost ~]# cat /proc/timer_stats | grep apic > 13214, 8590 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 13214, 8589 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 13211, 8588 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 389, 8587 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > nohz=off > 3253, 8672 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 2876, 8673 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 2543, 8674 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 2179, 8675 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > no-kvmclock > 1017, 8808 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 1577, 8809 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 1708, 8807 qemu-system-x86 apic_mmio_write (apic_timer_fn) > 1812, 8806 qemu-system-x86 apic_mmio_write (apic_timer_fn) > > > Glauber will start looking at this next week. Glauber, that printk you suggested has just triggered, but in a different condition. Guest was working fine (SMP 2-way), then suddenly: [root@localhost bonnie++-1.03c]# ./bonnie++ You must use the "-u" switch when running as root. usage: bonnie++ [-d scratch-dir] [-s size(Mb)[:chunk-size(b)]] [-n number-to-stat[:max-size[:min-size][:num-directories]]] [-m machine-name] [-r ram-size-iserial8250: too much work for irq4 n-Mb] [-x number-of-tests] [-u uid-to-use:gid-to-use] [-g gid-to-use] [-q] [-f] [-b] [-p processes | -y] Version: 1.03c [root@localhost bonnie++-1.03c]# ./bonnie++ dirty_portuguese_word: -361322 And there it hanged. diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index a3fa587..7785fcc 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -453,6 +453,8 @@ void update_wall_time(void) #else offset = clock->cycle_interval; #endif + if ((s64) offset < 0) + printk("...! %lld\n", (s64)offset); clock->xtime_nsec += (s64)xtime.tv_nsec << clock->shift; /* normally this loop will run just once, however in the ------------------------------------------------------------------------- This SF.net email is sponsored by the 2008 JavaOne(SM) Conference Don't miss this year's exciting event. There's still time to save $100. Use priority code J8TL2D2. http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone ^ permalink raw reply related [flat|nested] 15+ messages in thread
end of thread, other threads:[~2008-04-26 14:24 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-04-07 11:53 paravirt clock stil causing hangs in kvm-65 Nikola Ciprich 2008-04-07 21:34 ` Marcelo Tosatti 2008-04-08 1:18 ` Marcelo Tosatti 2008-04-08 5:59 ` Nikola Ciprich 2008-04-19 15:29 ` Marcelo Tosatti 2008-04-19 16:22 ` Glauber Costa 2008-04-19 16:49 ` Marcelo Tosatti 2008-04-21 7:15 ` Gerd Hoffmann 2008-04-21 8:33 ` [ RfC / patch ] kvmclock fixes Gerd Hoffmann 2008-04-21 9:57 ` Jeremy Fitzhardinge 2008-04-21 13:01 ` Gerd Hoffmann 2008-04-21 13:38 ` Jeremy Fitzhardinge 2008-04-24 13:20 ` paravirt clock stil causing hangs in kvm-65 Glauber Costa 2008-04-26 14:24 ` extmaillist 2008-04-19 16:29 ` Marcelo Tosatti
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox