From: Marcelo Tosatti <mtosatti@redhat.com>
To: Alexander Graf <agraf@suse.de>, Greg KH <gregkh@suse.de>
Cc: Avi Kivity <avi@redhat.com>, KVM list <kvm@vger.kernel.org>
Subject: Re: [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset
Date: Mon, 1 Feb 2010 16:54:05 -0200
Message-ID: <20100201185405.GB5381@amt.cnet>
In-Reply-To: <5E4FA032-114A-4A69-B1D6-48AEF3158E1D@suse.de>

On Fri, Jan 29, 2010 at 02:32:43PM +0100, Alexander Graf wrote:
> 
> On 19.11.2009, at 14:34, Avi Kivity wrote:
> 
> > From: Glauber Costa <glommer@redhat.com>
> > 
> > When we migrate a kvm guest that uses pvclock between two hosts, we may
> > suffer a large skew. This is because there can be significant differences
> > between the monotonic clock of the hosts involved. When a new host with
> > a much larger monotonic time starts running the guest, the view of time
> > will be significantly impacted.
> > 
> > Situation is much worse when we do the opposite, and migrate to a host with
> > a smaller monotonic clock.
> > 
> > This proposed ioctl will allow userspace to inform us what is the monotonic
> > clock value in the source host, so we can keep the time skew short, and
> > more importantly, never goes backwards. Userspace may also need to trigger
> > the current data, since from the first migration onwards, it won't be
> > reflected by a simple call to clock_gettime() anymore.
> 
> So I assume without this feature there's no way to have a reliable kvmclock inside the guest? Isn't it stable material then?

It's unreliable only with migration. Yes, it is stable material.

Here is a backport for 2.6.32. Greg, can you please include it?

Thanks

------------------

From: Glauber Costa <glommer@redhat.com>

KVM: allow userspace to adjust kvmclock offset

When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

The situation is much worse when we do the opposite, and migrate to a host
with a smaller monotonic clock.

The proposed ioctl allows userspace to tell us the monotonic clock value of
the source host, so that we can keep the time skew small and, more
importantly, ensure the clock never goes backwards. Userspace may also need
to read the current value, since from the first migration onwards it is no
longer reflected by a simple call to clock_gettime().

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
(cherry picked from afbcf7ab8d1bc8c2d04792f6d9e786e0adeb328d)

diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 5a4bc8c..db3a706 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -593,6 +593,42 @@ struct kvm_irqchip {
 	} chip;
 };
 
+4.27 KVM_GET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (out)
+Returns: 0 on success, -1 on error
+
+Gets the current timestamp of kvmclock as seen by the current guest. In
+conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
+such as migration.
+
+struct kvm_clock_data {
+	__u64 clock;  /* kvmclock current value */
+	__u32 flags;
+	__u32 pad[9];
+};
+
+4.28 KVM_SET_CLOCK
+
+Capability: KVM_CAP_ADJUST_CLOCK
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_clock_data (in)
+Returns: 0 on success, -1 on error
+
+Sets the current timestamp of kvmclock to the value specified in its parameter.
+In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios
+such as migration.
+
+struct kvm_clock_data {
+	__u64 clock;  /* kvmclock current value */
+	__u32 flags;
+	__u32 pad[9];
+};
+
 5. The kvm_run structure
 
 Application code obtains a pointer to the kvm_run structure by
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d838922..d759a1f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -412,6 +412,7 @@ struct kvm_arch{
 	unsigned long irq_sources_bitmap;
 	unsigned long irq_states[KVM_IOAPIC_NUM_PINS];
 	u64 vm_init_tsc;
+	s64 kvmclock_offset;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ae07d26..adb7912 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -677,7 +677,8 @@ static void kvm_write_guest_time(struct kvm_vcpu *v)
 	/* With all the info we got, fill in the values */
 
 	vcpu->hv_clock.system_time = ts.tv_nsec +
-				     (NSEC_PER_SEC * (u64)ts.tv_sec);
+				     (NSEC_PER_SEC * (u64)ts.tv_sec) + v->kvm->arch.kvmclock_offset;
+
 	/*
 	 * The interface expects us to write an even number signaling that the
 	 * update is finished. Since the guest won't see the intermediate
@@ -1224,6 +1225,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_PIT2:
 	case KVM_CAP_PIT_STATE2:
 	case KVM_CAP_SET_IDENTITY_MAP_ADDR:
+	case KVM_CAP_ADJUST_CLOCK:
 		r = 1;
 		break;
 	case KVM_CAP_COALESCED_MMIO:
@@ -2421,6 +2423,44 @@ long kvm_arch_vm_ioctl(struct file *filp,
 		r = 0;
 		break;
 	}
+	case KVM_SET_CLOCK: {
+		struct timespec now;
+		struct kvm_clock_data user_ns;
+		u64 now_ns;
+		s64 delta;
+
+		r = -EFAULT;
+		if (copy_from_user(&user_ns, argp, sizeof(user_ns)))
+			goto out;
+
+		r = -EINVAL;
+		if (user_ns.flags)
+			goto out;
+
+		r = 0;
+		ktime_get_ts(&now);
+		now_ns = timespec_to_ns(&now);
+		delta = user_ns.clock - now_ns;
+		kvm->arch.kvmclock_offset = delta;
+		break;
+	}
+	case KVM_GET_CLOCK: {
+		struct timespec now;
+		struct kvm_clock_data user_ns;
+		u64 now_ns;
+
+		ktime_get_ts(&now);
+		now_ns = timespec_to_ns(&now);
+		user_ns.clock = kvm->arch.kvmclock_offset + now_ns;
+		user_ns.flags = 0;
+
+		r = -EFAULT;
+		if (copy_to_user(argp, &user_ns, sizeof(user_ns)))
+			goto out;
+		r = 0;
+		break;
+	}
+
 	default:
 		;
 	}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index f8f8900..b80fec1 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -436,6 +436,7 @@ struct kvm_ioeventfd {
 #endif
 #define KVM_CAP_IOEVENTFD 36
 #define KVM_CAP_SET_IDENTITY_MAP_ADDR 37
+#define KVM_CAP_ADJUST_CLOCK 39
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -497,6 +498,12 @@ struct kvm_irqfd {
 	__u8  pad[20];
 };
 
+struct kvm_clock_data {
+	__u64 clock;
+	__u32 flags;
+	__u32 pad[9];
+};
+
 /*
  * ioctls for VM fds
  */
@@ -546,6 +553,8 @@ struct kvm_irqfd {
 #define KVM_CREATE_PIT2		   _IOW(KVMIO, 0x77, struct kvm_pit_config)
 #define KVM_SET_BOOT_CPU_ID        _IO(KVMIO, 0x78)
 #define KVM_IOEVENTFD             _IOW(KVMIO, 0x79, struct kvm_ioeventfd)
+#define KVM_SET_CLOCK             _IOW(KVMIO, 0x7b, struct kvm_clock_data)
+#define KVM_GET_CLOCK             _IOR(KVMIO, 0x7c, struct kvm_clock_data)
 
 /*
  * ioctls for vcpu fds
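
For readers following the arithmetic, here is a minimal userspace sketch of
what the two ioctls compute. It is not the kernel code: struct sim_vm,
sim_set_clock() and sim_get_clock() are hypothetical names made up for
illustration, and the host monotonic time is passed in as a plain argument
instead of being read with ktime_get_ts(). It only mirrors the offset logic
visible in the x86.c hunk above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-VM state; the single field mirrors
 * kvm->arch.kvmclock_offset from the patch. */
struct sim_vm {
	int64_t kvmclock_offset;
};

/* Mirrors KVM_SET_CLOCK: store the difference between the clock value
 * userspace asks for and the host's current monotonic time. */
void sim_set_clock(struct sim_vm *vm, uint64_t guest_ns, uint64_t host_now_ns)
{
	vm->kvmclock_offset = (int64_t)(guest_ns - host_now_ns);
}

/* Mirrors KVM_GET_CLOCK: host monotonic time plus the stored offset. */
uint64_t sim_get_clock(const struct sim_vm *vm, uint64_t host_now_ns)
{
	return host_now_ns + (uint64_t)vm->kvmclock_offset;
}
```

Under these assumptions, a migration would read the clock on the source with
sim_get_clock(), carry the value across, and hand it to sim_set_clock() on the
destination. The offset ends up positive when the destination host has the
smaller monotonic clock and negative when it has the larger one; in both cases
the value the guest sees continues from where the source left off.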

Thread overview:
2009-11-19 13:34 [PATCH 00/35] KVM updates for the 2.6.33 merge window (batch 2/2) Avi Kivity
2009-11-19 13:34 ` [PATCH 01/35] KVM: SVM: Add tracepoint for #vmexit because intr pending Avi Kivity
2009-11-19 13:34 ` [PATCH 02/35] KVM: SVM: Add tracepoint for invlpga instruction Avi Kivity
2009-11-19 13:34 ` [PATCH 03/35] KVM: SVM: Add tracepoint for skinit instruction Avi Kivity
2009-11-19 13:34 ` [PATCH 04/35] KVM: SVM: Remove nsvm_printk debugging code Avi Kivity
2009-11-19 13:34 ` [PATCH 05/35] KVM: introduce kvm_vcpu_on_spin Avi Kivity
2009-11-19 13:34 ` [PATCH 06/35] KVM: VMX: Add support for Pause-Loop Exiting Avi Kivity
2009-11-19 13:34 ` [PATCH 07/35] KVM: SVM: Support Pause Filter in AMD processors Avi Kivity
2009-11-19 13:34 ` [PATCH 08/35] KVM: x86: Harden against cpufreq Avi Kivity
2009-11-19 13:34 ` [PATCH 09/35] KVM: VMX: fix handle_pause declaration Avi Kivity
2009-11-19 13:34 ` [PATCH 10/35] KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check Avi Kivity
2009-11-19 13:34 ` [PATCH 11/35] KVM: Xen PV-on-HVM guest support Avi Kivity
2009-11-19 13:34 ` [PATCH 12/35] KVM: x86: Fix guest single-stepping while interruptible Avi Kivity
2009-11-19 13:34 ` [PATCH 13/35] KVM: SVM: Cleanup NMI singlestep Avi Kivity
2009-11-19 13:34 ` [PATCH 14/35] KVM: fix irq_source_id size verification Avi Kivity
2009-11-19 13:34 ` [PATCH 15/35] KVM: allow userspace to adjust kvmclock offset Avi Kivity
2010-01-29 13:32   ` Alexander Graf
2010-02-01 18:54     ` Marcelo Tosatti [this message]
2010-02-01 21:42       ` patch kvm-allow-userspace-to-adjust-kvmclock-offset.patch added to 2.6.32-stable tree gregkh
2009-11-19 13:34 ` [PATCH 16/35] KVM: Enable 32bit dirty log pointers on 64bit host Avi Kivity
2009-11-19 13:34 ` [PATCH 17/35] KVM: VMX: Use macros instead of hex value on cr0 initialization Avi Kivity
2009-11-19 13:34 ` [PATCH 18/35] KVM: SVM: Reset cr0 properly on vcpu reset Avi Kivity
2009-11-19 13:34 ` [PATCH 19/35] KVM: SVM: init_vmcb(): remove redundant save->cr0 initialization Avi Kivity
2009-11-19 13:34 ` [PATCH 20/35] KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr area Avi Kivity
2009-11-19 13:34 ` [PATCH 21/35] KVM: x86 shared msr infrastructure Avi Kivity
2009-11-19 13:34 ` [PATCH 22/35] KVM: VMX: Use " Avi Kivity
2009-11-19 13:34 ` [PATCH 23/35] KVM: powerpc: Fix BUILD_BUG_ON condition Avi Kivity
2009-11-19 13:35 ` [PATCH 24/35] KVM: remove duplicated task_switch check Avi Kivity
2009-11-19 13:35 ` [PATCH 25/35] KVM: VMX: move CR3/PDPTR update to vmx_set_cr3 Avi Kivity
2009-11-19 13:35 ` [PATCH 26/35] KVM: MMU: update invlpg handler comment Avi Kivity
2009-11-19 13:35 ` [PATCH 27/35] KVM: VMX: Remove vmx->msr_offset_efer Avi Kivity
2009-11-19 13:35 ` [PATCH 28/35] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP Avi Kivity
2009-11-19 13:35 ` [PATCH 29/35] KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic Avi Kivity
2009-11-19 13:35 ` [PATCH 30/35] KVM: only clear irq_source_id if irqchip is present Avi Kivity
2009-11-19 13:35 ` [PATCH 31/35] KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG Avi Kivity
2009-11-19 13:35 ` [PATCH 32/35] KVM: Reorder IOCTLs in main kvm.h Avi Kivity
2009-11-19 13:35 ` [PATCH 33/35] KVM: Allow internal errors reported to userspace to carry extra data Avi Kivity
2009-11-19 13:35 ` [PATCH 34/35] KVM: VMX: Report unexpected simultaneous exceptions as internal errors Avi Kivity
2009-11-19 13:35 ` [PATCH 35/35] KVM: x86: Add KVM_GET/SET_VCPU_EVENTS Avi Kivity
