kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch 0/3] kvmclock: allow stable sched clock
@ 2015-05-28  1:46 Marcelo Tosatti
  2015-05-28  1:46 ` [patch 1/3] x86: kvmclock: add flag to indicate pvclock counts from zero Marcelo Tosatti
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Marcelo Tosatti @ 2015-05-28  1:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Luiz Capitulino, kvm

kvmclock provides the behaviour sched_clock users expect.
Mark it as stable allowing nohz_full in guests.
See individual patches for more details.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [patch 1/3] x86: kvmclock: add flag to indicate pvclock counts from zero
  2015-05-28  1:46 [patch 0/3] kvmclock: allow stable sched clock Marcelo Tosatti
@ 2015-05-28  1:46 ` Marcelo Tosatti
  2015-05-28  1:46 ` [patch 2/3] x86: kvmclock: set scheduler clock stable Marcelo Tosatti
  2015-05-28  1:47 ` [patch 3/3] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR Marcelo Tosatti
  2 siblings, 0 replies; 6+ messages in thread
From: Marcelo Tosatti @ 2015-05-28  1:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Luiz Capitulino, kvm, Marcelo Tosatti

[-- Attachment #1: 01-pvclock-add-flag --]
[-- Type: text/plain, Size: 880 bytes --]

Setting sched clock stable for kvmclock causes the printk timestamps
to not start from zero, which is different from baremetal and 
can possibly break userspace. Add a flag to indicate that 
hypervisor sets clock base at zero when kvmclock is initialized.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/include/asm/pvclock-abi.h |    1 +
 1 file changed, 1 insertion(+)

Index: kvm/arch/x86/include/asm/pvclock-abi.h
===================================================================
--- kvm.orig/arch/x86/include/asm/pvclock-abi.h	2014-11-06 23:59:14.615913334 -0200
+++ kvm/arch/x86/include/asm/pvclock-abi.h	2015-05-27 17:40:53.435192771 -0300
@@ -41,5 +41,6 @@
 
 #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
 #define PVCLOCK_GUEST_STOPPED	(1 << 1)
+#define PVCLOCK_COUNTS_FROM_ZERO (1 << 2)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [patch 2/3] x86: kvmclock: set scheduler clock stable
  2015-05-28  1:46 [patch 0/3] kvmclock: allow stable sched clock Marcelo Tosatti
  2015-05-28  1:46 ` [patch 1/3] x86: kvmclock: add flag to indicate pvclock counts from zero Marcelo Tosatti
@ 2015-05-28  1:46 ` Marcelo Tosatti
  2015-05-28  8:39   ` Paolo Bonzini
  2015-05-28  1:47 ` [patch 3/3] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR Marcelo Tosatti
  2 siblings, 1 reply; 6+ messages in thread
From: Marcelo Tosatti @ 2015-05-28  1:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Luiz Capitulino, kvm, Marcelo Tosatti

[-- Attachment #1: 02-guest-patch --]
[-- Type: text/plain, Size: 3489 bytes --]

From: Luiz Capitulino <lcapitulino@redhat.com>

If you try to enable NOHZ_FULL on a guest today, you'll get
the following error when the guest tries to deactivate the
scheduler tick:

 WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
 NO_HZ FULL will not work with unstable sched clock
 CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: events flush_to_ldisc
  ffffffff8162a0c7 ffff88011f583e88 ffffffff814e6ba0 0000000000000002
  ffff88011f583ed8 ffff88011f583ec8 ffffffff8104d095 ffff88011f583eb8
  0000000000000000 0000000000000003 0000000000000001 0000000000000001
 Call Trace:
  <IRQ>  [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
  [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
  [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
  [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
  [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
  [<ffffffff810511c5>] irq_exit+0xc5/0x130
  [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
  [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
  <EOI>  [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
  [<ffffffff8108bbc8>] __wake_up+0x48/0x60
  [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
  [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
  [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
  [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
  [<ffffffff81064d05>] process_one_work+0x1d5/0x540
  [<ffffffff81064c81>] ? process_one_work+0x151/0x540
  [<ffffffff81065191>] worker_thread+0x121/0x470
  [<ffffffff81065070>] ? process_one_work+0x540/0x540
  [<ffffffff8106b4df>] kthread+0xef/0x110
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
  [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
 ---[ end trace 06e3507544a38866 ]---

However, it turns out that kvmclock does provide a stable
sched_clock callback. So, let the scheduler know this which
in turn makes NOHZ_FULL work in the guest.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

---
 arch/x86/kernel/kvmclock.c |   17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

Index: kvm/arch/x86/kernel/kvmclock.c
===================================================================
--- kvm.orig/arch/x86/kernel/kvmclock.c	2015-05-27 18:00:53.616391551 -0300
+++ kvm/arch/x86/kernel/kvmclock.c	2015-05-27 22:43:14.474432962 -0300
@@ -24,6 +24,7 @@
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
 #include <linux/memblock.h>
+#include <linux/sched.h>
 
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
@@ -217,8 +218,10 @@
 
 void __init kvmclock_init(void)
 {
+	struct pvclock_vcpu_time_info *vcpu_time;
 	unsigned long mem;
-	int size;
+	int size, cpu;
+	u8 flags;
 
 	size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
 
@@ -263,8 +266,18 @@
 	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
 	pv_info.name = "KVM";
 
+	flags = PVCLOCK_COUNTS_FROM_ZERO;
 	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
-		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+		flags |= PVCLOCK_TSC_STABLE_BIT;
+
+	pvclock_set_flags(flags);
+
+	cpu = get_cpu();
+	vcpu_time = &hv_clock[cpu].pvti;
+	flags = pvclock_read_flags(vcpu_time);
+	if (flags & PVCLOCK_COUNTS_FROM_ZERO)
+		set_sched_clock_stable();
+	put_cpu();
 }
 
 int __init kvm_setup_vsyscall_timeinfo(void)



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [patch 3/3] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR
  2015-05-28  1:46 [patch 0/3] kvmclock: allow stable sched clock Marcelo Tosatti
  2015-05-28  1:46 ` [patch 1/3] x86: kvmclock: add flag to indicate pvclock counts from zero Marcelo Tosatti
  2015-05-28  1:46 ` [patch 2/3] x86: kvmclock: set scheduler clock stable Marcelo Tosatti
@ 2015-05-28  1:47 ` Marcelo Tosatti
  2015-05-28  8:52   ` Paolo Bonzini
  2 siblings, 1 reply; 6+ messages in thread
From: Marcelo Tosatti @ 2015-05-28  1:47 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Luiz Capitulino, kvm, Marcelo Tosatti

[-- Attachment #1: 03-host-patch --]
[-- Type: text/plain, Size: 1038 bytes --]

Initialize kvmclock base, on kvmclock system MSR write time,
so that the guest sees kvmclock counting from zero.

This matches baremetal behaviour when kvmclock in guest
sets sched clock stable.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 arch/x86/kvm/x86.c |    5 +++++
 1 file changed, 5 insertions(+)

Index: kvm/arch/x86/kvm/x86.c
===================================================================
--- kvm.orig/arch/x86/kvm/x86.c	2015-05-27 17:40:46.948189811 -0300
+++ kvm/arch/x86/kvm/x86.c	2015-05-27 22:43:47.340413347 -0300
@@ -1703,6 +1703,8 @@
 	/* If the host uses TSC clocksource, then it is stable */
 	if (use_master_clock)
 		pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
+	if (ka->kvmclk_counts_from_zero)
+		pvclock_flags |= PVCLOCK_COUNTS_FROM_ZERO;
 
 	vcpu->hv_clock.flags = pvclock_flags;
 
@@ -2282,6 +2284,9 @@
 					&vcpu->requests);
 
 			ka->boot_vcpu_runs_old_kvmclock = tmp;
+
+			ka->kvmclock_offset = -get_kernel_ns();
+			ka->kvmclk_counts_from_zero = true;
 		}
 
 		vcpu->arch.time = data;



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 2/3] x86: kvmclock: set scheduler clock stable
  2015-05-28  1:46 ` [patch 2/3] x86: kvmclock: set scheduler clock stable Marcelo Tosatti
@ 2015-05-28  8:39   ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2015-05-28  8:39 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Luiz Capitulino, kvm



On 28/05/2015 03:46, Marcelo Tosatti wrote:
> +	flags = PVCLOCK_COUNTS_FROM_ZERO;

If the KVM_FEATURE_CLOCKSOURCE_STABLE_BIT bit is not set, we cannot
trust flags at all.  So let's just do...

>  	if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
> -		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
> +		flags |= PVCLOCK_TSC_STABLE_BIT;

-		pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+		pvclock_set_flags(~0);

Otherwise looks good.  Shall I do the above change and apply?

Paolo

> +	pvclock_set_flags(flags);
> +
> +	cpu = get_cpu();
> +	vcpu_time = &hv_clock[cpu].pvti;
> +	flags = pvclock_read_flags(vcpu_time);
> +	if (flags & PVCLOCK_COUNTS_FROM_ZERO)
> +		set_sched_clock_stable();
> +	put_cpu();
>  }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch 3/3] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR
  2015-05-28  1:47 ` [patch 3/3] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR Marcelo Tosatti
@ 2015-05-28  8:52   ` Paolo Bonzini
  0 siblings, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2015-05-28  8:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Luiz Capitulino, kvm



On 28/05/2015 03:47, Marcelo Tosatti wrote:
> Initialize kvmclock base, on kvmclock system MSR write time,
> so that the guest sees kvmclock counting from zero.
> 
> This matches baremetal behaviour when kvmclock in guest
> sets sched clock stable.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> ---
>  arch/x86/kvm/x86.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> Index: kvm/arch/x86/kvm/x86.c
> ===================================================================
> --- kvm.orig/arch/x86/kvm/x86.c	2015-05-27 17:40:46.948189811 -0300
> +++ kvm/arch/x86/kvm/x86.c	2015-05-27 22:43:47.340413347 -0300
> @@ -1703,6 +1703,8 @@
>  	/* If the host uses TSC clocksource, then it is stable */
>  	if (use_master_clock)
>  		pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
> +	if (ka->kvmclk_counts_from_zero)

Not defined anywhere.

> +		pvclock_flags |= PVCLOCK_COUNTS_FROM_ZERO;

Why can't this be unconditional?

Paolo

>  	vcpu->hv_clock.flags = pvclock_flags;
>  
> @@ -2282,6 +2284,9 @@
>  					&vcpu->requests);
>  
>  			ka->boot_vcpu_runs_old_kvmclock = tmp;
> +
> +			ka->kvmclock_offset = -get_kernel_ns();
> +			ka->kvmclk_counts_from_zero = true;
>  		}
>  
>  		vcpu->arch.time = data;
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-05-28  8:52 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-05-28  1:46 [patch 0/3] kvmclock: allow stable sched clock Marcelo Tosatti
2015-05-28  1:46 ` [patch 1/3] x86: kvmclock: add flag to indicate pvclock counts from zero Marcelo Tosatti
2015-05-28  1:46 ` [patch 2/3] x86: kvmclock: set scheduler clock stable Marcelo Tosatti
2015-05-28  8:39   ` Paolo Bonzini
2015-05-28  1:47 ` [patch 3/3] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR Marcelo Tosatti
2015-05-28  8:52   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).