public inbox for kvm@vger.kernel.org
* [PATCH v2 00/11] make L2's kvm-clock stable, get rid of pvclock_gtod
@ 2017-07-21 15:45 Denis Plotnikov
  2017-07-21 15:45 ` [PATCH v2 01/11] timekeeper: change interface of clocksource reading functions Denis Plotnikov
                   ` (10 more replies)
  0 siblings, 11 replies; 16+ messages in thread
From: Denis Plotnikov @ 2017-07-21 15:45 UTC
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

The main goal is to make L2's kvm-clock stable when it runs over an L1
with a stable kvm-clock.

The patch series is for the x86 architecture only. If the series is
approved, I'll make the changes for the other architectures, but I am not
able to compile and test every single one of them (help needed).

The patch series does the following:

	* change the timekeeper interface so that it can return the cycles
	  stamp value along with the time
	* get rid of the pvclock copy in KVM by using the changed timekeeper
	  interface: get time and cycles right from the timekeeper
	* make KVM recognize a stable kvm-clock as a stable clocksource
	  and use the KVM masterclock in this case, which makes L2's
	  kvm-clock stable when running over an L1 with a stable kvm-clock
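The following is a minimal sketch of the decision in the last item. It assumes a simplified host-side view. The KVM master clock is usable when the host clocksource is the TSC. It is also usable when the host clocksource is a kvm-clock whose pvclock flags have the TSC-stable bit set. The enum and function names are hypothetical and are not part of this series. Only the flag value mirrors PVCLOCK_TSC_STABLE_BIT from the pvclock ABI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as PVCLOCK_TSC_STABLE_BIT in the pvclock ABI header. */
#define SKETCH_PVCLOCK_TSC_STABLE_BIT (1u << 0)

/* Hypothetical identifiers for the clocksource the host kernel runs on. */
enum host_clock_kind {
	HOST_CLOCK_TSC,
	HOST_CLOCK_KVM_CLOCK,	/* the "host" is itself an L1 guest */
	HOST_CLOCK_HPET,
};

/*
 * Sketch only: may KVM use its master clock on top of this host clock?
 * TSC is always accepted; kvm-clock only while its pvclock flags
 * report a stable TSC; anything else is rejected.
 */
static bool host_clock_allows_masterclock(enum host_clock_kind kind,
					  uint8_t pvclock_flags)
{
	switch (kind) {
	case HOST_CLOCK_TSC:
		return true;
	case HOST_CLOCK_KVM_CLOCK:
		return (pvclock_flags & SKETCH_PVCLOCK_TSC_STABLE_BIT) != 0;
	default:
		return false;
	}
}
```

Because the answer for kvm-clock depends on a flag that can change at run time, KVM also has to learn when that bit flips. Patch 10 in the list below addresses this with a clocksource change notification.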

Denis Plotnikov (11):
  timekeeper: change interface of clocksource reading functions
  pvclock: write cycle stamp value if a pointer given
  kvmclock: pass cycles pointer to the changed pvclock interface
  TSC: write cycles stamp value to input pointer
  timekeeping: change ktime_get_with_offset interface to accept cycles
    pointer
  timekeeping: add functions returning cycle stamp counter along with
    time
  timekeeper: add clocksource change notifier
  timekeeper: add a couple of the core timekeeper reading helpers
  KVM: get rid of pv_clock_gtod
  pvclock: add clocksource change notification on changing of tsc stable
    bit
  KVM: add pvclock to a list of stable clocks.

 arch/x86/hyperv/hv_init.c       |   4 +-
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/include/asm/pvclock.h  |   3 +-
 arch/x86/kernel/hpet.c          |   4 +-
 arch/x86/kernel/kvmclock.c      |  19 +--
 arch/x86/kernel/pvclock.c       |  32 ++++-
 arch/x86/kernel/tsc.c           |   8 +-
 arch/x86/kvm/trace.h            |  27 ++--
 arch/x86/kvm/x86.c              | 267 ++++++++++++----------------------------
 arch/x86/lguest/boot.c          |   2 +-
 arch/x86/platform/uv/uv_time.c  |  10 +-
 arch/x86/xen/time.c             |   4 +-
 drivers/char/hpet.c             |   2 +-
 drivers/clocksource/acpi_pm.c   |  13 +-
 drivers/hv/hv_util.c            |   6 +-
 include/linux/clocksource.h     |   7 +-
 include/linux/cs_notifier.h     |  17 +++
 include/linux/timekeeping.h     |  35 +++++-
 kernel/time/clocksource.c       |   4 +-
 kernel/time/jiffies.c           |   2 +-
 kernel/time/timekeeping.c       | 119 ++++++++++++++----
 21 files changed, 316 insertions(+), 271 deletions(-)
 create mode 100644 include/linux/cs_notifier.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v2 01/11] timekeeper: change interface of clocksource reading functions
From: Denis Plotnikov @ 2017-07-21 15:45 UTC
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

When using the timekeeping API, it is sometimes useful to return, along
with the calculated time, the cycles stamp value that was used to
calculate it, so that the same stamp can be used for other purposes
(e.g. in the KVM master clock).
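The following is a userspace sketch of the new read callback contract. It assumes a simplified struct clocksource that has only the read member; the real structure has many more fields. A counter-based source reports the reading it used through cycles_stamp. A source with no meaningful stamp, such as jiffies, leaves *cycles_stamp untouched. fake_counter, counter_read() and tick_read() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;

/* Simplified model of the changed callback, not the real struct clocksource. */
struct clocksource {
	u64 (*read)(struct clocksource *cs, u64 *cycles_stamp);
};

/* Hypothetical counter standing in for a hardware register such as the TSC. */
static u64 fake_counter = 1000;

/* Counter-based source: hands back the raw reading as the stamp if asked. */
static u64 counter_read(struct clocksource *cs, u64 *cycles_stamp)
{
	u64 now = fake_counter++;

	(void)cs;
	if (cycles_stamp)
		*cycles_stamp = now;
	return now;
}

/* Jiffies-like source: no underlying stamp, so *cycles_stamp is not touched. */
static u64 tick_read(struct clocksource *cs, u64 *cycles_stamp)
{
	(void)cs;
	(void)cycles_stamp;
	return 42;
}
```

Callers that don't need the stamp pass NULL, as every caller converted in this patch does.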

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/hyperv/hv_init.c      |  4 ++--
 arch/x86/include/asm/pvclock.h |  3 ++-
 arch/x86/kernel/hpet.c         |  4 ++--
 arch/x86/kernel/kvmclock.c     |  4 ++--
 arch/x86/kernel/pvclock.c      |  6 ++++--
 arch/x86/kernel/tsc.c          |  2 +-
 arch/x86/lguest/boot.c         |  2 +-
 arch/x86/platform/uv/uv_time.c | 10 +++++-----
 arch/x86/xen/time.c            |  4 ++--
 drivers/char/hpet.c            |  2 +-
 drivers/clocksource/acpi_pm.c  | 13 +++++++------
 drivers/hv/hv_util.c           |  6 +++---
 include/linux/clocksource.h    |  7 +++++--
 kernel/time/clocksource.c      |  4 ++--
 kernel/time/jiffies.c          |  2 +-
 kernel/time/timekeeping.c      | 26 +++++++++++++-------------
 16 files changed, 53 insertions(+), 46 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 5b882cc..43ed8c2 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -36,7 +36,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_page(void)
 	return tsc_pg;
 }
 
-static u64 read_hv_clock_tsc(struct clocksource *arg)
+static u64 read_hv_clock_tsc(struct clocksource *arg, u64 *cycles_stamp)
 {
 	u64 current_tick = hv_read_tsc_page(tsc_pg);
 
@@ -55,7 +55,7 @@ static struct clocksource hyperv_cs_tsc = {
 };
 #endif
 
-static u64 read_hv_clock_msr(struct clocksource *arg)
+static u64 read_hv_clock_msr(struct clocksource *arg, u64 *cycles_stamp)
 {
 	u64 current_tick;
 	/*
diff --git a/arch/x86/include/asm/pvclock.h b/arch/x86/include/asm/pvclock.h
index 448cfe1..1095ad6 100644
--- a/arch/x86/include/asm/pvclock.h
+++ b/arch/x86/include/asm/pvclock.h
@@ -14,7 +14,8 @@ static inline struct pvclock_vsyscall_time_info *pvclock_pvti_cpu0_va(void)
 #endif
 
 /* some helper functions for xen and kvm pv clock sources */
-u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src);
+u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src,
+				u64 *cycles_stamp);
 u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src);
 void pvclock_set_flags(u8 flags);
 unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src);
diff --git a/arch/x86/kernel/hpet.c b/arch/x86/kernel/hpet.c
index 89ff7af..091ef2f 100644
--- a/arch/x86/kernel/hpet.c
+++ b/arch/x86/kernel/hpet.c
@@ -792,7 +792,7 @@ static union hpet_lock hpet __cacheline_aligned = {
 	{ .lock = __ARCH_SPIN_LOCK_UNLOCKED, },
 };
 
-static u64 read_hpet(struct clocksource *cs)
+static u64 read_hpet(struct clocksource *cs, u64 *cycles_stamp)
 {
 	unsigned long flags;
 	union hpet_lock old, new;
@@ -850,7 +850,7 @@ static u64 read_hpet(struct clocksource *cs)
 /*
  * For UP or 32-bit.
  */
-static u64 read_hpet(struct clocksource *cs)
+static u64 read_hpet(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return (u64)hpet_readl(HPET_COUNTER);
 }
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index d889676..177f2f4 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -91,12 +91,12 @@ static u64 kvm_clock_read(void)
 	preempt_disable_notrace();
 	cpu = smp_processor_id();
 	src = &hv_clock[cpu].pvti;
-	ret = pvclock_clocksource_read(src);
+	ret = pvclock_clocksource_read(src, NULL);
 	preempt_enable_notrace();
 	return ret;
 }
 
-static u64 kvm_clock_get_cycles(struct clocksource *cs)
+static u64 kvm_clock_get_cycles(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return kvm_clock_read();
 }
diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 5c3f6d6..1a0d86a 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -73,7 +73,8 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
 	return flags & valid_flags;
 }
 
-u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src)
+u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src,
+				u64 *cycles_stamp)
 {
 	unsigned version;
 	u64 ret;
@@ -136,7 +137,8 @@ void pvclock_read_wallclock(struct pvclock_wall_clock *wall_clock,
 		rmb();		/* fetch time before checking version */
 	} while ((wall_clock->version & 1) || (version != wall_clock->version));
 
-	delta = pvclock_clocksource_read(vcpu_time);	/* time since system boot */
+	/* time since system boot */
+	delta = pvclock_clocksource_read(vcpu_time, NULL);
 	delta += now.tv_sec * (u64)NSEC_PER_SEC + now.tv_nsec;
 
 	now.tv_nsec = do_div(delta, NSEC_PER_SEC);
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 714dfba..b475f6c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1110,7 +1110,7 @@ static void tsc_resume(struct clocksource *cs)
  * checking the result of read_tsc() - cycle_last for being negative.
  * That works because CLOCKSOURCE_MASK(64) does not mask out any bit.
  */
-static u64 read_tsc(struct clocksource *cs)
+static u64 read_tsc(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return (u64)rdtsc_ordered();
 }
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 9947269..9109cdc 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -916,7 +916,7 @@ static unsigned long lguest_tsc_khz(void)
  * If we can't use the TSC, the kernel falls back to our lower-priority
  * "lguest_clock", where we read the time value given to us by the Host.
  */
-static u64 lguest_clock_read(struct clocksource *cs)
+static u64 lguest_clock_read(struct clocksource *cs, u64 *cycles_stamp)
 {
 	unsigned long sec, nsec;
 
diff --git a/arch/x86/platform/uv/uv_time.c b/arch/x86/platform/uv/uv_time.c
index b082d71..4dddc4c 100644
--- a/arch/x86/platform/uv/uv_time.c
+++ b/arch/x86/platform/uv/uv_time.c
@@ -30,7 +30,7 @@
 
 #define RTC_NAME		"sgi_rtc"
 
-static u64 uv_read_rtc(struct clocksource *cs);
+static u64 uv_read_rtc(struct clocksource *cs, u64 *cycles_stamp);
 static int uv_rtc_next_event(unsigned long, struct clock_event_device *);
 static int uv_rtc_shutdown(struct clock_event_device *evt);
 
@@ -133,7 +133,7 @@ static int uv_setup_intr(int cpu, u64 expires)
 	/* Initialize comparator value */
 	uv_write_global_mmr64(pnode, UVH_INT_CMPB, expires);
 
-	if (uv_read_rtc(NULL) <= expires)
+	if (uv_read_rtc(NULL, NULL) <= expires)
 		return 0;
 
 	return !uv_intr_pending(pnode);
@@ -269,7 +269,7 @@ static int uv_rtc_unset_timer(int cpu, int force)
 
 	spin_lock_irqsave(&head->lock, flags);
 
-	if ((head->next_cpu == bcpu && uv_read_rtc(NULL) >= *t) || force)
+	if ((head->next_cpu == bcpu && uv_read_rtc(NULL, NULL) >= *t) || force)
 		rc = 1;
 
 	if (rc) {
@@ -296,7 +296,7 @@ static int uv_rtc_unset_timer(int cpu, int force)
  * cachelines of it's own page.  This allows faster simultaneous reads
  * from a given socket.
  */
-static u64 uv_read_rtc(struct clocksource *cs)
+static u64 uv_read_rtc(struct clocksource *cs, u64 *cycles_stamp)
 {
 	unsigned long offset;
 
@@ -316,7 +316,7 @@ static int uv_rtc_next_event(unsigned long delta,
 {
 	int ced_cpu = cpumask_first(ced->cpumask);
 
-	return uv_rtc_set_timer(ced_cpu, delta + uv_read_rtc(NULL));
+	return uv_rtc_set_timer(ced_cpu, delta + uv_read_rtc(NULL, NULL));
 }
 
 /*
diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index a1895a8..aafbe6d 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -46,12 +46,12 @@ u64 xen_clocksource_read(void)
 
 	preempt_disable_notrace();
 	src = &__this_cpu_read(xen_vcpu)->time;
-	ret = pvclock_clocksource_read(src);
+	ret = pvclock_clocksource_read(src, NULL);
 	preempt_enable_notrace();
 	return ret;
 }
 
-static u64 xen_clocksource_get_cycles(struct clocksource *cs)
+static u64 xen_clocksource_get_cycles(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return xen_clocksource_read();
 }
diff --git a/drivers/char/hpet.c b/drivers/char/hpet.c
index b941e6d..4702207 100644
--- a/drivers/char/hpet.c
+++ b/drivers/char/hpet.c
@@ -70,7 +70,7 @@ static u32 hpet_nhpet, hpet_max_freq = HPET_USER_FREQ;
 #ifdef CONFIG_IA64
 static void __iomem *hpet_mctr;
 
-static u64 read_hpet(struct clocksource *cs)
+static u64 read_hpet(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return (u64)read_counter((void __iomem *)hpet_mctr);
 }
diff --git a/drivers/clocksource/acpi_pm.c b/drivers/clocksource/acpi_pm.c
index 1961e35..c7420b2 100644
--- a/drivers/clocksource/acpi_pm.c
+++ b/drivers/clocksource/acpi_pm.c
@@ -58,7 +58,7 @@ u32 acpi_pm_read_verified(void)
 	return v2;
 }
 
-static u64 acpi_pm_read(struct clocksource *cs)
+static u64 acpi_pm_read(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return (u64)read_pmtmr();
 }
@@ -81,7 +81,7 @@ static int __init acpi_pm_good_setup(char *__str)
 }
 __setup("acpi_pm_good", acpi_pm_good_setup);
 
-static u64 acpi_pm_read_slow(struct clocksource *cs)
+static u64 acpi_pm_read_slow(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return (u64)acpi_pm_read_verified();
 }
@@ -149,9 +149,9 @@ static int verify_pmtmr_rate(void)
 	unsigned long count, delta;
 
 	mach_prepare_counter();
-	value1 = clocksource_acpi_pm.read(&clocksource_acpi_pm);
+	value1 = clocksource_acpi_pm.read(&clocksource_acpi_pm, NULL);
 	mach_countup(&count);
-	value2 = clocksource_acpi_pm.read(&clocksource_acpi_pm);
+	value2 = clocksource_acpi_pm.read(&clocksource_acpi_pm, NULL);
 	delta = (value2 - value1) & ACPI_PM_MASK;
 
 	/* Check that the PMTMR delta is within 5% of what we expect */
@@ -184,9 +184,10 @@ static int __init init_acpi_pm_clocksource(void)
 	/* "verify" this timing source: */
 	for (j = 0; j < ACPI_PM_MONOTONICITY_CHECKS; j++) {
 		udelay(100 * j);
-		value1 = clocksource_acpi_pm.read(&clocksource_acpi_pm);
+		value1 = clocksource_acpi_pm.read(&clocksource_acpi_pm, NULL);
 		for (i = 0; i < ACPI_PM_READ_CHECKS; i++) {
-			value2 = clocksource_acpi_pm.read(&clocksource_acpi_pm);
+			value2 = clocksource_acpi_pm.read(
+					&clocksource_acpi_pm, NULL);
 			if (value2 == value1)
 				continue;
 			if (value2 > value1)
diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 186b100..74def09 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -218,7 +218,7 @@ static void hv_set_host_time(struct work_struct *work)
 
 	wrk = container_of(work, struct adj_time_work, work);
 
-	reftime = hyperv_cs->read(hyperv_cs);
+	reftime = hyperv_cs->read(hyperv_cs, NULL);
 	newtime = wrk->host_time + (reftime - wrk->ref_time);
 	host_ts = ns_to_timespec64((newtime - WLTIMEDELTA) * 100);
 
@@ -278,7 +278,7 @@ static inline void adj_guesttime(u64 hosttime, u64 reftime, u8 adj_flags)
 		 */
 		spin_lock_irqsave(&host_ts.lock, flags);
 
-		cur_reftime = hyperv_cs->read(hyperv_cs);
+		cur_reftime = hyperv_cs->read(hyperv_cs, NULL);
 		host_ts.host_time = hosttime;
 		host_ts.ref_time = cur_reftime;
 		ktime_get_snapshot(&host_ts.snap);
@@ -530,7 +530,7 @@ static int hv_ptp_gettime(struct ptp_clock_info *info, struct timespec64 *ts)
 	u64 newtime, reftime;
 
 	spin_lock_irqsave(&host_ts.lock, flags);
-	reftime = hyperv_cs->read(hyperv_cs);
+	reftime = hyperv_cs->read(hyperv_cs, NULL);
 	newtime = host_ts.host_time + (reftime - host_ts.ref_time);
 	*ts = ns_to_timespec64((newtime - WLTIMEDELTA) * 100);
 	spin_unlock_irqrestore(&host_ts.lock, flags);
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index f2b10d9..b6f00a4 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -48,7 +48,10 @@ struct module;
  *			400-499: Perfect
  *				The ideal clocksource. A must-use where
  *				available.
- * @read:		returns a cycle value, passes clocksource as argument
+ * @read:		returns a cycle value; takes the clocksource and a
+ *			pointer in which to store the cycles "stamp" used to
+ *			calculate the returned value, if there is one;
+ *			otherwise the pointed-to value is left untouched.
  * @enable:		optional function to enable the clocksource
  * @disable:		optional function to disable the clocksource
  * @mask:		bitmask for two's complement
@@ -77,7 +80,7 @@ struct module;
  * structure.
  */
 struct clocksource {
-	u64 (*read)(struct clocksource *cs);
+	u64 (*read)(struct clocksource *cs, u64 *cycles_stamp);
 	u64 mask;
 	u32 mult;
 	u32 shift;
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 93621ae..e48a6eb 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -194,8 +194,8 @@ static void clocksource_watchdog(unsigned long data)
 		}
 
 		local_irq_disable();
-		csnow = cs->read(cs);
-		wdnow = watchdog->read(watchdog);
+		csnow = cs->read(cs, NULL);
+		wdnow = watchdog->read(watchdog, NULL);
 		local_irq_enable();
 
 		/* Clocksource initialized ? */
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 4977191..b235dce 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -48,7 +48,7 @@
 #define JIFFIES_SHIFT	8
 #endif
 
-static u64 jiffies_read(struct clocksource *cs)
+static u64 jiffies_read(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return (u64) jiffies;
 }
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index b602c48..5d0c4d0 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -131,11 +131,11 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
  * a read of the fast-timekeeper tkrs (which is protected by its own locking
  * and update logic).
  */
-static inline u64 tk_clock_read(struct tk_read_base *tkr)
+static inline u64 tk_clock_read(struct tk_read_base *tkr, u64 *cycles_stamp)
 {
 	struct clocksource *clock = READ_ONCE(tkr->clock);
 
-	return clock->read(clock);
+	return clock->read(clock, cycles_stamp);
 }
 
 #ifdef CONFIG_DEBUG_TIMEKEEPING
@@ -195,7 +195,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	 */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tk_clock_read(tkr);
+		now = tk_clock_read(tkr, NULL);
 		last = tkr->cycle_last;
 		mask = tkr->mask;
 		max = tkr->clock->max_cycles;
@@ -229,7 +229,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	u64 cycle_now, delta;
 
 	/* read clocksource */
-	cycle_now = tk_clock_read(tkr);
+	cycle_now = tk_clock_read(tkr, NULL);
 
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
@@ -259,7 +259,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	old_clock = tk->tkr_mono.clock;
 	tk->tkr_mono.clock = clock;
 	tk->tkr_mono.mask = clock->mask;
-	tk->tkr_mono.cycle_last = tk_clock_read(&tk->tkr_mono);
+	tk->tkr_mono.cycle_last = tk_clock_read(&tk->tkr_mono, NULL);
 
 	tk->tkr_raw.clock = clock;
 	tk->tkr_raw.mask = clock->mask;
@@ -422,7 +422,7 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 
 		now += timekeeping_delta_to_ns(tkr,
 				clocksource_delta(
-					tk_clock_read(tkr),
+					tk_clock_read(tkr, NULL),
 					tkr->cycle_last,
 					tkr->mask));
 	} while (read_seqcount_retry(&tkf->seq, seq));
@@ -474,7 +474,7 @@ EXPORT_SYMBOL_GPL(ktime_get_boot_fast_ns);
 /* Suspend-time cycles value for halted fast timekeeper. */
 static u64 cycles_at_suspend;
 
-static u64 dummy_clock_read(struct clocksource *cs)
+static u64 dummy_clock_read(struct clocksource *cs, u64 *cycles_stamp)
 {
 	return cycles_at_suspend;
 }
@@ -499,7 +499,7 @@ static void halt_fast_timekeeper(struct timekeeper *tk)
 	struct tk_read_base *tkr = &tk->tkr_mono;
 
 	memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
-	cycles_at_suspend = tk_clock_read(tkr);
+	cycles_at_suspend = tk_clock_read(tkr, NULL);
 	tkr_dummy.clock = &dummy_clock;
 	update_fast_timekeeper(&tkr_dummy, &tk_fast_mono);
 
@@ -674,7 +674,7 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 	u64 cycle_now, delta;
 	u64 nsec;
 
-	cycle_now = tk_clock_read(&tk->tkr_mono);
+	cycle_now = tk_clock_read(&tk->tkr_mono, NULL);
 	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 	tk->tkr_mono.cycle_last = cycle_now;
 	tk->tkr_raw.cycle_last  = cycle_now;
@@ -950,7 +950,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tk_clock_read(&tk->tkr_mono);
+		now = tk_clock_read(&tk->tkr_mono, NULL);
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
@@ -1128,7 +1128,7 @@ int get_device_system_crosststamp(int (*get_time_fn)
 		 * Check whether the system counter value provided by the
 		 * device driver is on the current timekeeping interval.
 		 */
-		now = tk_clock_read(&tk->tkr_mono);
+		now = tk_clock_read(&tk->tkr_mono, NULL);
 		interval_start = tk->tkr_mono.cycle_last;
 		if (!cycle_between(interval_start, cycles, now)) {
 			clock_was_set_seq = tk->clock_was_set_seq;
@@ -1649,7 +1649,7 @@ void timekeeping_resume(void)
 	 * The less preferred source will only be tried if there is no better
 	 * usable source. The rtc part is handled separately in rtc core code.
 	 */
-	cycle_now = tk_clock_read(&tk->tkr_mono);
+	cycle_now = tk_clock_read(&tk->tkr_mono, NULL);
 	if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
 		cycle_now > tk->tkr_mono.cycle_last) {
 		u64 nsec, cyc_delta;
@@ -2051,7 +2051,7 @@ void update_wall_time(void)
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
 	offset = real_tk->cycle_interval;
 #else
-	offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
+	offset = clocksource_delta(tk_clock_read(&tk->tkr_mono, NULL),
 				   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 #endif
 
-- 
2.7.4


* [PATCH v2 02/11] pvclock: write cycle stamp value if a pointer given
From: Denis Plotnikov @ 2017-07-21 15:45 UTC
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/pvclock.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index 1a0d86a..ab54c92 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -82,8 +82,12 @@ u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src,
 	u8 flags;
 
 	do {
+		u64 tsc;
 		version = pvclock_read_begin(src);
-		ret = __pvclock_read_cycles(src, rdtsc_ordered());
+		tsc = rdtsc_ordered();
+		ret = __pvclock_read_cycles(src, tsc);
+		if (cycles_stamp)
+			*cycles_stamp = tsc;
 		flags = src->flags;
 	} while (pvclock_read_retry(src, version));
 
-- 
2.7.4


* [PATCH v2 03/11] kvmclock: pass cycles pointer to the changed pvclock interface
From: Denis Plotnikov @ 2017-07-21 15:45 UTC
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

This makes it possible to get the cycles stamp used for the time
calculation when the clocksource is kvm-clock.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/kvmclock.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
index 177f2f4..79dd035 100644
--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -82,7 +82,7 @@ static int kvm_set_wallclock(const struct timespec *now)
 	return -1;
 }
 
-static u64 kvm_clock_read(void)
+static u64 kvm_clock_read(u64 *cycles_stamp)
 {
 	struct pvclock_vcpu_time_info *src;
 	u64 ret;
@@ -91,30 +91,35 @@ static u64 kvm_clock_read(void)
 	preempt_disable_notrace();
 	cpu = smp_processor_id();
 	src = &hv_clock[cpu].pvti;
-	ret = pvclock_clocksource_read(src, NULL);
+	ret = pvclock_clocksource_read(src, cycles_stamp);
 	preempt_enable_notrace();
 	return ret;
 }
 
 static u64 kvm_clock_get_cycles(struct clocksource *cs, u64 *cycles_stamp)
 {
-	return kvm_clock_read();
+	return kvm_clock_read(cycles_stamp);
+}
+
+static u64 kvm_sched_clock_read_no_offset(void)
+{
+	return kvm_clock_read(NULL);
 }
 
 static u64 kvm_sched_clock_read(void)
 {
-	return kvm_clock_read() - kvm_sched_clock_offset;
+	return kvm_clock_read(NULL) - kvm_sched_clock_offset;
 }
 
 static inline void kvm_sched_clock_init(bool stable)
 {
 	if (!stable) {
-		pv_time_ops.sched_clock = kvm_clock_read;
+		pv_time_ops.sched_clock = kvm_sched_clock_read_no_offset;
 		clear_sched_clock_stable();
 		return;
 	}
 
-	kvm_sched_clock_offset = kvm_clock_read();
+	kvm_sched_clock_offset = kvm_clock_read(NULL);
 	pv_time_ops.sched_clock = kvm_sched_clock_read;
 
 	printk(KERN_INFO "kvm-clock: using sched offset of %llu cycles\n",
-- 
2.7.4


* [PATCH v2 04/11] TSC: write cycles stamp value to input pointer
From: Denis Plotnikov @ 2017-07-21 15:45 UTC
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

This makes it possible to get the cycles stamp used for the time
calculation when the clocksource is the TSC.
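For the TSC clocksource, the returned value and the stamp are the same reading. The following sketch shows why the pair is useful. It assumes the standard clocksource conversion ns = (cycles * mult) >> shift, the same form as clocksource_cyc2ns(). Given a time value and the TSC stamp it was computed from, a later TSC reading can be converted to time without a second, separately sampled TSC read. struct time_stamp_pair and ns_at_tsc() are hypothetical.

```c
#include <stdint.h>

/* Same form as the kernel's clocksource_cyc2ns(). */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Hypothetical snapshot: a time value and the exact TSC stamp behind it. */
struct time_stamp_pair {
	uint64_t ns;
	uint64_t tsc;
};

/* Extrapolate the time at a later TSC reading from a consistent snapshot. */
static uint64_t ns_at_tsc(const struct time_stamp_pair *snap, uint64_t tsc,
			  uint32_t mult, uint32_t shift)
{
	return snap->ns + cyc2ns(tsc - snap->tsc, mult, shift);
}
```

With a 2 GHz TSC (mult = 1 << 23, shift = 24, i.e. 0.5 ns per cycle), 2000 cycles after the snapshot map to 1000 ns later. The multiplication can overflow for large deltas. The kernel bounds the delta it converts (see max_cycles in struct clocksource), and a real user of the pair would need a similar bound.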

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/tsc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index b475f6c..5411b18 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1112,7 +1112,11 @@ static void tsc_resume(struct clocksource *cs)
  */
 static u64 read_tsc(struct clocksource *cs, u64 *cycles_stamp)
 {
-	return (u64)rdtsc_ordered();
+	u64 tsc = rdtsc_ordered();
+
+	if (cycles_stamp)
+		*cycles_stamp = tsc;
+	return tsc;
 }
 
 static void tsc_cs_mark_unstable(struct clocksource *cs)
-- 
2.7.4


* [PATCH v2 05/11] timekeeping: change ktime_get_with_offset interface to accept cycles pointer
From: Denis Plotnikov @ 2017-07-21 15:45 UTC
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

This is part of the preparation for using cycle stamp values when
calculating the kernel time.
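Passing cycles_stamp down into the seqcount-protected loop means the stamp may be written several times. Only the value from the final, successful pass is left behind, so it matches the returned time. The following single-threaded sketch shows that property. It assumes a simplified timekeeper model in which a concurrent update is simulated inside the clock read. struct tk_model and its functions are hypothetical. The real code uses read_seqcount_begin() and read_seqcount_retry(), with the proper barriers.

```c
#include <stdint.h>

/* Hypothetical single-threaded model of the tk_core seqcount-protected read. */
struct tk_model {
	unsigned int seq;	/* bumped by every (simulated) update */
	uint64_t base_ns;
	uint64_t cycle_last;
	uint64_t counter;	/* stands in for the clocksource counter */
	int pending_updates;	/* reads that will race with an update */
};

static uint64_t model_clock_read(struct tk_model *tk, uint64_t *cycles_stamp)
{
	uint64_t now = tk->counter++;

	if (cycles_stamp)
		*cycles_stamp = now;
	/* Simulate a timekeeper update landing in the middle of this read. */
	if (tk->pending_updates > 0) {
		tk->pending_updates--;
		tk->seq += 2;
		tk->base_ns += 100;
		tk->cycle_last = now;
	}
	return now;
}

static uint64_t model_get_ns(struct tk_model *tk, uint64_t *cycles_stamp)
{
	unsigned int seq;
	uint64_t now, ns;

	do {
		seq = tk->seq;
		/* The stamp is rewritten on every pass, including retried ones. */
		now = model_clock_read(tk, cycles_stamp);
		ns = tk->base_ns + (now - tk->cycle_last);
	} while (seq != tk->seq);
	return ns;
}
```

In the racing case, the first pass is discarded and the second pass overwrites the stamp. The caller therefore sees a time and a stamp that satisfy ns - base_ns == stamp - cycle_last.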

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/linux/timekeeping.h |  8 ++++----
 kernel/time/timekeeping.c   | 31 +++++++++++++++++--------------
 2 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index ddc229f..fc6683b 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -171,7 +171,7 @@ enum tk_offsets {
 };
 
 extern ktime_t ktime_get(void);
-extern ktime_t ktime_get_with_offset(enum tk_offsets offs);
+extern ktime_t ktime_get_with_offset(enum tk_offsets offs, u64 *cycles_stamp);
 extern ktime_t ktime_mono_to_any(ktime_t tmono, enum tk_offsets offs);
 extern ktime_t ktime_get_raw(void);
 extern u32 ktime_get_resolution_ns(void);
@@ -181,7 +181,7 @@ extern u32 ktime_get_resolution_ns(void);
  */
 static inline ktime_t ktime_get_real(void)
 {
-	return ktime_get_with_offset(TK_OFFS_REAL);
+	return ktime_get_with_offset(TK_OFFS_REAL, NULL);
 }
 
 /**
@@ -192,7 +192,7 @@ static inline ktime_t ktime_get_real(void)
  */
 static inline ktime_t ktime_get_boottime(void)
 {
-	return ktime_get_with_offset(TK_OFFS_BOOT);
+	return ktime_get_with_offset(TK_OFFS_BOOT, NULL);
 }
 
 /**
@@ -200,7 +200,7 @@ static inline ktime_t ktime_get_boottime(void)
  */
 static inline ktime_t ktime_get_clocktai(void)
 {
-	return ktime_get_with_offset(TK_OFFS_TAI);
+	return ktime_get_with_offset(TK_OFFS_TAI, NULL);
 }
 
 /**
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 5d0c4d0..b6d7882 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -180,7 +180,8 @@ static void timekeeping_check_update(struct timekeeper *tk, u64 offset)
 	}
 }
 
-static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_delta(struct tk_read_base *tkr,
+					u64 *cycles_stamp)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	u64 now, last, mask, max, delta;
@@ -195,7 +196,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	 */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tk_clock_read(tkr, NULL);
+		now = tk_clock_read(tkr, cycles_stamp);
 		last = tkr->cycle_last;
 		mask = tkr->mask;
 		max = tkr->clock->max_cycles;
@@ -224,12 +225,13 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 static inline void timekeeping_check_update(struct timekeeper *tk, u64 offset)
 {
 }
-static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_delta(struct tk_read_base *tkr,
+					u64 *cycles_stamp)
 {
 	u64 cycle_now, delta;
 
 	/* read clocksource */
-	cycle_now = tk_clock_read(tkr, NULL);
+	cycle_now = tk_clock_read(tkr, cycles_stamp);
 
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
@@ -329,11 +331,12 @@ static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr, u64 delta)
 	return nsec + arch_gettimeoffset();
 }
 
-static inline u64 timekeeping_get_ns(struct tk_read_base *tkr)
+static inline u64 timekeeping_get_ns(struct tk_read_base *tkr,
+					u64 *cycles_stamp)
 {
 	u64 delta;
 
-	delta = timekeeping_get_delta(tkr);
+	delta = timekeeping_get_delta(tkr, cycles_stamp);
 	return timekeeping_delta_to_ns(tkr, delta);
 }
 
@@ -707,7 +710,7 @@ int __getnstimeofday64(struct timespec64 *ts)
 		seq = read_seqcount_begin(&tk_core.seq);
 
 		ts->tv_sec = tk->xtime_sec;
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		nsecs = timekeeping_get_ns(&tk->tkr_mono, NULL);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
@@ -748,7 +751,7 @@ ktime_t ktime_get(void)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		base = tk->tkr_mono.base;
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		nsecs = timekeeping_get_ns(&tk->tkr_mono, NULL);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
@@ -779,7 +782,7 @@ static ktime_t *offsets[TK_OFFS_MAX] = {
 	[TK_OFFS_TAI]	= &tk_core.timekeeper.offs_tai,
 };
 
-ktime_t ktime_get_with_offset(enum tk_offsets offs)
+ktime_t ktime_get_with_offset(enum tk_offsets offs, u64 *cycles_stamp)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
 	unsigned int seq;
@@ -791,7 +794,7 @@ ktime_t ktime_get_with_offset(enum tk_offsets offs)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		base = ktime_add(tk->tkr_mono.base, *offset);
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		nsecs = timekeeping_get_ns(&tk->tkr_mono, cycles_stamp);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
@@ -833,7 +836,7 @@ ktime_t ktime_get_raw(void)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		base = tk->tkr_raw.base;
-		nsecs = timekeeping_get_ns(&tk->tkr_raw);
+		nsecs = timekeeping_get_ns(&tk->tkr_raw, NULL);
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
@@ -861,7 +864,7 @@ void ktime_get_ts64(struct timespec64 *ts)
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
 		ts->tv_sec = tk->xtime_sec;
-		nsec = timekeeping_get_ns(&tk->tkr_mono);
+		nsec = timekeeping_get_ns(&tk->tkr_mono, NULL);
 		tomono = tk->wall_to_monotonic;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
@@ -1379,7 +1382,7 @@ void getrawmonotonic64(struct timespec64 *ts)
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		nsecs = timekeeping_get_ns(&tk->tkr_raw);
+		nsecs = timekeeping_get_ns(&tk->tkr_raw, NULL);
 		ts64 = tk->raw_time;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
@@ -2224,7 +2227,7 @@ ktime_t ktime_get_update_offsets_now(unsigned int *cwsseq, ktime_t *offs_real,
 		seq = read_seqcount_begin(&tk_core.seq);
 
 		base = tk->tkr_mono.base;
-		nsecs = timekeeping_get_ns(&tk->tkr_mono);
+		nsecs = timekeeping_get_ns(&tk->tkr_mono, NULL);
 		base = ktime_add_ns(base, nsecs);
 
 		if (*cwsseq != tk->clock_was_set_seq) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread
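The main change in the hunks above is that an optional `u64 *cycles_stamp` is passed from timekeeping_get_ns() and timekeeping_get_delta() down to the clocksource read, and NULL means the caller does not want the stamp. The userspace sketch below models that pattern under stated assumptions. `struct read_base`, `fake_read()` and the other names are hypothetical stand-ins, not kernel types. The ns conversion is simplified, because the kernel also adds xtime_nsec before shifting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of struct tk_read_base (not the kernel type). */
struct read_base {
	/* Counter read hook. Like the series' clocksource read functions,
	 * it may store a cycle stamp through the pointer when one is given. */
	uint64_t (*read)(uint64_t *cycles_stamp);
	uint64_t cycle_last;	/* counter value at the last update */
	uint64_t mask;		/* counter width mask */
	uint32_t mult;		/* cycles-to-shifted-ns multiplier */
	uint32_t shift;
};

/* Masked difference, so a counter that wraps still yields the right delta. */
static uint64_t cs_delta(uint64_t now, uint64_t last, uint64_t mask)
{
	return (now - last) & mask;
}

static uint64_t get_delta(const struct read_base *rb, uint64_t *cycles_stamp)
{
	/* The optional pointer is passed straight down to the read hook. */
	uint64_t now = rb->read(cycles_stamp);

	return cs_delta(now, rb->cycle_last, rb->mask);
}

/* Simplified: the kernel also adds the accumulated xtime_nsec before shifting. */
static uint64_t get_ns(const struct read_base *rb, uint64_t *cycles_stamp)
{
	return (get_delta(rb, cycles_stamp) * rb->mult) >> rb->shift;
}

/* Fake counter source used by the usage example. */
static uint64_t fake_counter;

static uint64_t fake_read(uint64_t *cycles_stamp)
{
	if (cycles_stamp)	/* NULL: the caller does not want the stamp */
		*cycles_stamp = fake_counter;
	return fake_counter;
}
```

Callers that do not care, such as ktime_get() in the hunks above, pass NULL, so only the new `_with_cycles` paths pay for the extra store.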

* [PATCH v2 06/11] timekeeping: add functions returning cycle stamp counter along with time
  2017-07-21 15:45 [PATCH v2 00/11] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (4 preceding siblings ...)
  2017-07-21 15:45 ` [PATCH v2 05/11] timekeeping: change ktime_get_with_offset interface to accept cycles pointer Denis Plotnikov
@ 2017-07-21 15:45 ` Denis Plotnikov
  2017-07-21 15:45 ` [PATCH v2 07/11] timekeeper: add clocksource change notifier Denis Plotnikov
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Denis Plotnikov @ 2017-07-21 15:45 UTC (permalink / raw)
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

Add interface functions that also return, through a pointer, the cycle
stamp value used for the time calculation.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/linux/timekeeping.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index fc6683b..edffe82 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -178,23 +178,37 @@ extern u32 ktime_get_resolution_ns(void);
 
 /**
  * ktime_get_real - get the real (wall-) time in ktime_t format
+ * ktime_get_real_with_cycles - does the same and stores the cycles value
+ * (if any) used for the calculation in the given pointer
  */
 static inline ktime_t ktime_get_real(void)
 {
 	return ktime_get_with_offset(TK_OFFS_REAL, NULL);
 }
 
+static inline ktime_t ktime_get_real_with_cycles(u64 *cycles_stamp)
+{
+	return ktime_get_with_offset(TK_OFFS_REAL, cycles_stamp);
+}
+
 /**
  * ktime_get_boottime - Returns monotonic time since boot in ktime_t format
  *
  * This is similar to CLOCK_MONTONIC/ktime_get, but also includes the
  * time spent in suspend.
+ *
+ * ktime_get_boottime_with_cycles - does the same and stores the cycles
+ *	value (if any) used for the calculation in the given pointer
  */
 static inline ktime_t ktime_get_boottime(void)
 {
 	return ktime_get_with_offset(TK_OFFS_BOOT, NULL);
 }
 
+static inline ktime_t ktime_get_boottime_with_cycles(u64 *cycles_stamp)
+{
+	return ktime_get_with_offset(TK_OFFS_BOOT, cycles_stamp);
+}
 /**
  * ktime_get_clocktai - Returns the TAI time of day in ktime_t format
  */
@@ -221,11 +235,21 @@ static inline u64 ktime_get_real_ns(void)
 	return ktime_to_ns(ktime_get_real());
 }
 
+static inline u64 ktime_get_real_ns_with_cycles(u64 *cycles_stamp)
+{
+	return ktime_to_ns(ktime_get_real_with_cycles(cycles_stamp));
+}
+
 static inline u64 ktime_get_boot_ns(void)
 {
 	return ktime_to_ns(ktime_get_boottime());
 }
 
+static inline u64 ktime_get_boot_ns_with_cycles(u64 *cycles_stamp)
+{
+	return ktime_to_ns(ktime_get_boottime_with_cycles(cycles_stamp));
+}
+
 static inline u64 ktime_get_tai_ns(void)
 {
 	return ktime_to_ns(ktime_get_clocktai());
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread
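Patch 06 only adds thin wrappers. Each `_with_cycles` variant forwards the caller's pointer to ktime_get_with_offset(), and the existing functions keep passing NULL. The following minimal userspace sketch shows that layering. It assumes a fixed fake monotonic time, counter value and offset table, and all of its names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum offs { OFFS_REAL, OFFS_BOOT, OFFS_MAX };

/* Hypothetical stand-ins for the timekeeper state. */
static int64_t mono_ns = 5000;		/* monotonic time "now" */
static uint64_t raw_cycles = 123456;	/* counter value behind mono_ns */
static int64_t offsets[OFFS_MAX] = {
	[OFFS_REAL] = 1000000,
	[OFFS_BOOT] = 250,
};

/* Model of ktime_get_with_offset(offs, cycles_stamp). */
static int64_t get_with_offset(enum offs o, uint64_t *cycles_stamp)
{
	if (cycles_stamp)
		*cycles_stamp = raw_cycles;
	return mono_ns + offsets[o];
}

/* Existing-style wrapper: never asks for the stamp. */
static int64_t get_real(void)
{
	return get_with_offset(OFFS_REAL, NULL);
}

/* New-style wrappers: forward the caller's pointer unchanged. */
static int64_t get_real_with_cycles(uint64_t *cycles_stamp)
{
	return get_with_offset(OFFS_REAL, cycles_stamp);
}

static int64_t get_boot_with_cycles(uint64_t *cycles_stamp)
{
	return get_with_offset(OFFS_BOOT, cycles_stamp);
}
```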

* [PATCH v2 07/11] timekeeper: add clocksource change notifier
  2017-07-21 15:45 [PATCH v2 00/11] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (5 preceding siblings ...)
  2017-07-21 15:45 ` [PATCH v2 06/11] timekeeping: add functions returning cycle stamp counter along with time Denis Plotnikov
@ 2017-07-21 15:45 ` Denis Plotnikov
  2017-07-21 15:45 ` [PATCH v2 08/11] timekeeper: add a couple of the core timekeeper reading helpers Denis Plotnikov
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Denis Plotnikov @ 2017-07-21 15:45 UTC (permalink / raw)
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

This notifier fires when the clocksource is changed, or when a
property of the current clocksource that alters its nature, for
example its stability, is changed.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/linux/cs_notifier.h | 17 +++++++++++++++
 kernel/time/timekeeping.c   | 51 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)
 create mode 100644 include/linux/cs_notifier.h

diff --git a/include/linux/cs_notifier.h b/include/linux/cs_notifier.h
new file mode 100644
index 0000000..2b1b4e6
--- /dev/null
+++ b/include/linux/cs_notifier.h
@@ -0,0 +1,17 @@
+#ifndef _CS_CHANGES_H
+#define _CS_CHANGES_H
+
+#include <linux/notifier.h>
+
+/*
+ * The clocksource changes notifier is called when the system
+ * clocksource is changed, or when some properties of the current
+ * system clocksource change in a way that can affect other parts of
+ * the system, for example KVM guests
+ */
+
+extern void clocksource_changes_notify(void);
+extern int clocksource_changes_register_notifier(struct notifier_block *nb);
+extern int clocksource_changes_unregister_notifier(struct notifier_block *nb);
+
+#endif /* _CS_CHANGES_H */
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index b6d7882..1cef214 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -592,6 +592,55 @@ int pvclock_gtod_unregister_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier);
 
+/* Notification chain fired when the clocksource changes */
+static RAW_NOTIFIER_HEAD(clocksource_changes_chain);
+
+/**
+ * clocksource_changes_notify - notify all listeners about clocksource
+ * changes: a switch to another clocksource, or a change in a sensitive
+ * parameter of the current one, e.g. the stability flag of kvmclock
+ */
+void clocksource_changes_notify(void)
+{
+	raw_notifier_call_chain(&clocksource_changes_chain, 0L, NULL);
+}
+EXPORT_SYMBOL_GPL(clocksource_changes_notify);
+
+/**
+ * clocksource_changes_register_notifier - register
+ * a clocksource changes listener
+ */
+int clocksource_changes_register_notifier(struct notifier_block *nb)
+{
+	unsigned long flags;
+	int ret;
+
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	ret = raw_notifier_chain_register(&clocksource_changes_chain, nb);
+	clocksource_changes_notify();
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(clocksource_changes_register_notifier);
+
+/**
+ * clocksource_changes_unregister_notifier - unregister
+ * a clocksource changes listener
+ */
+int clocksource_changes_unregister_notifier(struct notifier_block *nb)
+{
+	unsigned long flags;
+	int ret;
+
+	raw_spin_lock_irqsave(&timekeeper_lock, flags);
+	ret = raw_notifier_chain_unregister(&clocksource_changes_chain, nb);
+	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(clocksource_changes_unregister_notifier);
+
 /*
  * tk_update_leap_state - helper to update the next_leap_ktime
  */
@@ -1342,6 +1391,7 @@ static int change_clocksource(void *data)
 		}
 	}
 	timekeeping_update(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);
+	clocksource_changes_notify();
 
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
@@ -1521,6 +1571,7 @@ void __init timekeeping_init(void)
 	tk_set_wall_to_mono(tk, tmp);
 
 	timekeeping_update(tk, TK_MIRROR | TK_CLOCK_WAS_SET);
+	clocksource_changes_notify();
 
 	write_seqcount_end(&tk_core.seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread
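The notifier added in patch 07 is a raw notifier chain. Callers do their own locking (here, timekeeper_lock), and registering a listener immediately fires the whole chain once, so a new listener learns about the current clocksource right away. The sketch models only the list handling and the notify-on-register behaviour in userspace. `struct nb` and the `chain_*` functions are hypothetical. Unlike the kernel's priority-ordered notifier_chain_register(), this version ignores priorities and simply prepends.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct notifier_block (hypothetical, not the kernel type). */
struct nb {
	int (*call)(struct nb *self, unsigned long action, void *data);
	struct nb *next;
};

static struct nb *chain_head;

/* Model of raw_notifier_call_chain(): invoke every listener in order. */
static void chain_notify(void)
{
	for (struct nb *n = chain_head; n; n = n->next)
		n->call(n, 0, NULL);
}

/* Mirrors the patch: registering also fires the chain once. */
static int chain_register(struct nb *n)
{
	n->next = chain_head;
	chain_head = n;
	chain_notify();
	return 0;
}

static int chain_unregister(struct nb *n)
{
	for (struct nb **pp = &chain_head; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			return 0;
		}
	}
	return -1;	/* not found; the kernel returns -ENOENT here */
}

/* Example listener that just counts how often it was called. */
static int hits;

static int count_hit(struct nb *self, unsigned long action, void *data)
{
	(void)self;
	(void)action;
	(void)data;
	hits++;
	return 0;
}
```

Because the chain is fired from register as well as from change_clocksource() and timekeeping_init(), every existing listener is called again whenever a new one registers. In the patch, these calls are made with timekeeper_lock held.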

* [PATCH v2 08/11] timekeeper: add a couple of the core timekeeper reading helpers
  2017-07-21 15:45 [PATCH v2 00/11] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (6 preceding siblings ...)
  2017-07-21 15:45 ` [PATCH v2 07/11] timekeeper: add clocksource change notifier Denis Plotnikov
@ 2017-07-21 15:45 ` Denis Plotnikov
  2017-07-23  4:02   ` kbuild test robot
  2017-07-23  4:02   ` kbuild test robot
  2017-07-21 15:45 ` [PATCH v2 09/11] KVM: get rid of pv_clock_gtod Denis Plotnikov
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 16+ messages in thread
From: Denis Plotnikov @ 2017-07-21 15:45 UTC (permalink / raw)
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 include/linux/timekeeping.h |  3 +++
 kernel/time/timekeeping.c   | 15 +++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
index edffe82..092bf5f 100644
--- a/include/linux/timekeeping.h
+++ b/include/linux/timekeeping.h
@@ -372,3 +372,6 @@ extern int update_persistent_clock64(struct timespec64 now);
 
 
 #endif
+
+extern const seqcount_t *get_tk_seq(void);
+extern int get_tk_mono_clock_mode(void);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1cef214..3f35e52 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -2381,3 +2381,18 @@ void xtime_update(unsigned long ticks)
 	write_sequnlock(&jiffies_lock);
 	update_wall_time();
 }
+
+/*
+ * Helpers for reading the core timekeeper
+ */
+const seqcount_t *get_tk_seq(void)
+{
+	return &tk_core.seq;
+}
+EXPORT_SYMBOL(get_tk_seq);
+
+int get_tk_mono_clock_mode(void)
+{
+	return tk_core.timekeeper.tkr_mono.clock->archdata.vclock_mode;
+}
+EXPORT_SYMBOL(get_tk_mono_clock_mode);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread
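get_tk_seq() exports the timekeeper's seqcount so that KVM can read get_tk_mono_clock_mode() consistently with the usual reader loop: sample the sequence, read the data, and retry if a writer ran in between. The sketch is a simplified userspace seqcount built on C11 atomics. It assumes a single protected field and an illustrative mode value of 1 for TSC. It shows the reader/writer protocol only and is not the kernel's seqcount_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified userspace seqcount (hypothetical, not the kernel's seqcount_t). */
struct seqcount {
	atomic_uint seq;
};

static unsigned seq_read_begin(struct seqcount *s)
{
	unsigned v;

	/* An odd value means a writer is mid-update, so wait for it. */
	while ((v = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1)
		;
	return v;
}

static bool seq_read_retry(struct seqcount *s, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

static void seq_write_begin(struct seqcount *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void seq_write_end(struct seqcount *s)
{
	atomic_fetch_add_explicit(&s->seq, 1, memory_order_release);
}

#define MODE_TSC 1	/* illustrative value only */

static struct seqcount tk_seq;
static atomic_int clock_mode;	/* protected data, accessed atomically */

static bool mode_is_tsc(void)
{
	unsigned start;
	int mode;

	/* The "do" matters: without it the braces are a plain block and the
	 * trailing while (...) is a separate empty loop that spins forever
	 * once a writer has intervened. */
	do {
		start = seq_read_begin(&tk_seq);
		mode = atomic_load_explicit(&clock_mode, memory_order_relaxed);
	} while (seq_read_retry(&tk_seq, start));

	return mode == MODE_TSC;
}

static void set_mode(int mode)
{
	seq_write_begin(&tk_seq);
	atomic_store_explicit(&clock_mode, mode, memory_order_relaxed);
	seq_write_end(&tk_seq);
}
```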

* [PATCH v2 09/11] KVM: get rid of pv_clock_gtod
  2017-07-21 15:45 [PATCH v2 00/11] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (7 preceding siblings ...)
  2017-07-21 15:45 ` [PATCH v2 08/11] timekeeper: add a couple of the core timekeeper reading helpers Denis Plotnikov
@ 2017-07-21 15:45 ` Denis Plotnikov
  2017-07-25 10:37   ` Paolo Bonzini
  2017-07-21 15:45 ` [PATCH v2 10/11] pvclock: add clocksource change notification on changing of tsc stable bit Denis Plotnikov
  2017-07-21 15:45 ` [PATCH v2 11/11] KVM: add pvclock to a list of stable clocks Denis Plotnikov
  10 siblings, 1 reply; 16+ messages in thread
From: Denis Plotnikov @ 2017-07-21 15:45 UTC (permalink / raw)
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

The recently added timekeeper functions return the cycle stamp along
with the kernel time, so KVM can now get time values directly from the
kernel instead of maintaining a shadow copy of the timekeeper data
structures.

This reduces the overhead and complexity of the KVM code and makes the
time operations clearer.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/trace.h            |  27 +++--
 arch/x86/kvm/x86.c              | 259 ++++++++++++----------------------------
 3 files changed, 89 insertions(+), 199 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 695605e..27a2df9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -781,7 +781,7 @@ struct kvm_arch {
 	u64 cur_tsc_generation;
 	int nr_vcpus_matched_tsc;
 
-	spinlock_t pvclock_gtod_sync_lock;
+	spinlock_t masterclock_lock;
 	bool use_master_clock;
 	u64 master_kernel_ns;
 	u64 master_cycle_now;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 0a6cc67..5ed12fe 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -812,40 +812,41 @@ TRACE_EVENT(kvm_write_tsc_offset,
 	{VCLOCK_TSC,  "tsc"}				\
 
 TRACE_EVENT(kvm_update_master_clock,
-	TP_PROTO(bool use_master_clock, unsigned int host_clock, bool offset_matched),
-	TP_ARGS(use_master_clock, host_clock, offset_matched),
+	TP_PROTO(bool use_master_clock, bool host_clock_stable,
+		bool offset_matched),
+	TP_ARGS(use_master_clock, host_clock_stable, offset_matched),
 
 	TP_STRUCT__entry(
 		__field(		bool,	use_master_clock	)
-		__field(	unsigned int,	host_clock		)
+		__field(		bool,	host_clock_stable	)
 		__field(		bool,	offset_matched		)
 	),
 
 	TP_fast_assign(
 		__entry->use_master_clock	= use_master_clock;
-		__entry->host_clock		= host_clock;
+		__entry->host_clock_stable	= host_clock_stable;
 		__entry->offset_matched		= offset_matched;
 	),
 
-	TP_printk("masterclock %d hostclock %s offsetmatched %u",
+	TP_printk("masterclock %d hostclock stable %u offsetmatched %u",
 		  __entry->use_master_clock,
-		  __print_symbolic(__entry->host_clock, host_clocks),
+		  __entry->host_clock_stable,
 		  __entry->offset_matched)
 );
 
 TRACE_EVENT(kvm_track_tsc,
 	TP_PROTO(unsigned int vcpu_id, unsigned int nr_matched,
 		 unsigned int online_vcpus, bool use_master_clock,
-		 unsigned int host_clock),
+		 bool host_clock_stable),
 	TP_ARGS(vcpu_id, nr_matched, online_vcpus, use_master_clock,
-		host_clock),
+		host_clock_stable),
 
 	TP_STRUCT__entry(
 		__field(	unsigned int,	vcpu_id			)
 		__field(	unsigned int,	nr_vcpus_matched_tsc	)
 		__field(	unsigned int,	online_vcpus		)
 		__field(	bool,		use_master_clock	)
-		__field(	unsigned int,	host_clock		)
+		__field(	bool,		host_clock_stable	)
 	),
 
 	TP_fast_assign(
@@ -853,14 +854,14 @@ TRACE_EVENT(kvm_track_tsc,
 		__entry->nr_vcpus_matched_tsc	= nr_matched;
 		__entry->online_vcpus		= online_vcpus;
 		__entry->use_master_clock	= use_master_clock;
-		__entry->host_clock		= host_clock;
+		__entry->host_clock_stable	= host_clock_stable;
 	),
 
-	TP_printk("vcpu_id %u masterclock %u offsetmatched %u nr_online %u"
-		  " hostclock %s",
+	TP_printk("vcpu_id %u masterclock %u offsetmatched %u nr_online %u "
+		  "hostclock stable %u",
 		  __entry->vcpu_id, __entry->use_master_clock,
 		  __entry->nr_vcpus_matched_tsc, __entry->online_vcpus,
-		  __print_symbolic(__entry->host_clock, host_clocks))
+		  __entry->host_clock_stable)
 );
 
 #endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0e846f0..ce491bb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -50,7 +50,7 @@
 #include <linux/hash.h>
 #include <linux/pci.h>
 #include <linux/timekeeper_internal.h>
-#include <linux/pvclock_gtod.h>
+#include <linux/cs_notifier.h>
 #include <linux/kvm_irqfd.h>
 #include <linux/irqbypass.h>
 #include <linux/sched/stat.h>
@@ -1131,50 +1131,6 @@ static int do_set_msr(struct kvm_vcpu *vcpu, unsigned index, u64 *data)
 	return kvm_set_msr(vcpu, &msr);
 }
 
-#ifdef CONFIG_X86_64
-struct pvclock_gtod_data {
-	seqcount_t	seq;
-
-	struct { /* extract of a clocksource struct */
-		int vclock_mode;
-		u64	cycle_last;
-		u64	mask;
-		u32	mult;
-		u32	shift;
-	} clock;
-
-	u64		boot_ns;
-	u64		nsec_base;
-	u64		wall_time_sec;
-};
-
-static struct pvclock_gtod_data pvclock_gtod_data;
-
-static void update_pvclock_gtod(struct timekeeper *tk)
-{
-	struct pvclock_gtod_data *vdata = &pvclock_gtod_data;
-	u64 boot_ns;
-
-	boot_ns = ktime_to_ns(ktime_add(tk->tkr_mono.base, tk->offs_boot));
-
-	write_seqcount_begin(&vdata->seq);
-
-	/* copy pvclock gtod data */
-	vdata->clock.vclock_mode	= tk->tkr_mono.clock->archdata.vclock_mode;
-	vdata->clock.cycle_last		= tk->tkr_mono.cycle_last;
-	vdata->clock.mask		= tk->tkr_mono.mask;
-	vdata->clock.mult		= tk->tkr_mono.mult;
-	vdata->clock.shift		= tk->tkr_mono.shift;
-
-	vdata->boot_ns			= boot_ns;
-	vdata->nsec_base		= tk->tkr_mono.xtime_nsec;
-
-	vdata->wall_time_sec            = tk->xtime_sec;
-
-	write_seqcount_end(&vdata->seq);
-}
-#endif
-
 void kvm_set_pending_timer(struct kvm_vcpu *vcpu)
 {
 	/*
@@ -1266,10 +1222,6 @@ static void kvm_get_time_scale(uint64_t scaled_hz, uint64_t base_hz,
 		 __func__, base_hz, scaled_hz, shift, *pmultiplier);
 }
 
-#ifdef CONFIG_X86_64
-static atomic_t kvm_guest_has_master_clock = ATOMIC_INIT(0);
-#endif
-
 static DEFINE_PER_CPU(unsigned long, cpu_tsc_khz);
 static unsigned long max_tsc_khz;
 
@@ -1358,12 +1310,32 @@ static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
 	return tsc;
 }
 
+#ifdef CONFIG_X86_64
+static bool clocksource_stable(void)
+{
+	return get_tk_mono_clock_mode() == VCLOCK_TSC;
+}
+
+static bool clocksource_stability_check(void)
+{
+	unsigned int seq;
+	const seqcount_t *s = get_tk_seq();
+	bool stable;
+
+	do {
+		seq = read_seqcount_begin(s);
+		stable = clocksource_stable();
+	} while (unlikely(read_seqcount_retry(s, seq)));
+
+	return stable;
+}
+#endif
+
 static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 {
 #ifdef CONFIG_X86_64
-	bool vcpus_matched;
+	bool vcpus_matched, clocksource_stable;
 	struct kvm_arch *ka = &vcpu->kvm->arch;
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
 
 	vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
 			 atomic_read(&vcpu->kvm->online_vcpus));
@@ -1376,13 +1348,14 @@ static void kvm_track_tsc_matching(struct kvm_vcpu *vcpu)
 	 * and the vcpus need to have matched TSCs.  When that happens,
 	 * perform request to enable masterclock.
 	 */
+	clocksource_stable = clocksource_stability_check();
 	if (ka->use_master_clock ||
-	    (gtod->clock.vclock_mode == VCLOCK_TSC && vcpus_matched))
+		(clocksource_stable && vcpus_matched))
 		kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
 
 	trace_kvm_track_tsc(vcpu->vcpu_id, ka->nr_vcpus_matched_tsc,
-			    atomic_read(&vcpu->kvm->online_vcpus),
-		            ka->use_master_clock, gtod->clock.vclock_mode);
+				atomic_read(&vcpu->kvm->online_vcpus),
+				ka->use_master_clock, clocksource_stable);
 #endif
 }
 
@@ -1535,7 +1508,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	kvm_vcpu_write_tsc_offset(vcpu, offset);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
 
-	spin_lock(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_lock(&kvm->arch.masterclock_lock);
 	if (!matched) {
 		kvm->arch.nr_vcpus_matched_tsc = 0;
 	} else if (!already_matched) {
@@ -1543,7 +1516,7 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	}
 
 	kvm_track_tsc_matching(vcpu);
-	spin_unlock(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_unlock(&kvm->arch.masterclock_lock);
 }
 
 EXPORT_SYMBOL_GPL(kvm_write_tsc);
@@ -1563,99 +1536,41 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 }
 
 #ifdef CONFIG_X86_64
-
-static u64 read_tsc(void)
+static bool kvm_get_host_time_and_cycles(s64 *kernel_ns, u64 *cycle_now,
+						u64 (*get_time)(u64 *cycle_now))
 {
-	u64 ret = (u64)rdtsc_ordered();
-	u64 last = pvclock_gtod_data.clock.cycle_last;
+	unsigned int seq;
+	const seqcount_t *s = get_tk_seq();
+	bool stable;
 
-	if (likely(ret >= last))
-		return ret;
+	do {
+		seq = read_seqcount_begin(s);
+		stable = clocksource_stable();
+		if (stable)
+			*kernel_ns = get_time(cycle_now);
+	} while (unlikely(read_seqcount_retry(s, seq)));
 
-	/*
-	 * GCC likes to generate cmov here, but this branch is extremely
-	 * predictable (it's just a function of time and the likely is
-	 * very likely) and there's a data dependence, so force GCC
-	 * to generate a branch instead.  I don't barrier() because
-	 * we don't actually need a barrier, and if this function
-	 * ever gets inlined it will generate worse code.
-	 */
-	asm volatile ("");
-	return last;
-}
-
-static inline u64 vgettsc(u64 *cycle_now)
-{
-	long v;
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-
-	*cycle_now = read_tsc();
-
-	v = (*cycle_now - gtod->clock.cycle_last) & gtod->clock.mask;
-	return v * gtod->clock.mult;
-}
-
-static int do_monotonic_boot(s64 *t, u64 *cycle_now)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
-		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
-		ns >>= gtod->clock.shift;
-		ns += gtod->boot_ns;
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-	*t = ns;
-
-	return mode;
-}
-
-static int do_realtime(struct timespec *ts, u64 *cycle_now)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	unsigned long seq;
-	int mode;
-	u64 ns;
-
-	do {
-		seq = read_seqcount_begin(&gtod->seq);
-		mode = gtod->clock.vclock_mode;
-		ts->tv_sec = gtod->wall_time_sec;
-		ns = gtod->nsec_base;
-		ns += vgettsc(cycle_now);
-		ns >>= gtod->clock.shift;
-	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));
-
-	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
-	ts->tv_nsec = ns;
-
-	return mode;
+	return stable;
 }
 
 /* returns true if host is using tsc clocksource */
 static bool kvm_get_time_and_clockread(s64 *kernel_ns, u64 *cycle_now)
 {
-	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
-		return false;
-
-	return do_monotonic_boot(kernel_ns, cycle_now) == VCLOCK_TSC;
+	return kvm_get_host_time_and_cycles(
+		kernel_ns, cycle_now, ktime_get_boot_ns_with_cycles);
 }
 
 /* returns true if host is using tsc clocksource */
-static bool kvm_get_walltime_and_clockread(struct timespec *ts,
-					   u64 *cycle_now)
+static bool kvm_get_walltime_and_clockread(struct timespec *ts, u64 *cycle_now)
 {
-	/* checked again under seqlock below */
-	if (pvclock_gtod_data.clock.vclock_mode != VCLOCK_TSC)
-		return false;
+	bool res;
+	s64 kernel_ns = 0;
 
-	return do_realtime(ts, cycle_now) == VCLOCK_TSC;
+	res = kvm_get_host_time_and_cycles(
+		&kernel_ns, cycle_now, ktime_get_real_ns_with_cycles);
+	*ts = ktime_to_timespec(kernel_ns);
+
+	return res;
 }
 #endif
 
@@ -1700,12 +1615,11 @@ static bool kvm_get_walltime_and_clockread(struct timespec *ts,
  *
  */
 
-static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
+static void update_masterclock(struct kvm *kvm)
 {
 #ifdef CONFIG_X86_64
 	struct kvm_arch *ka = &kvm->arch;
-	int vclock_mode;
-	bool host_tsc_clocksource, vcpus_matched;
+	bool host_clocksource_stable, vcpus_matched;
 
 	vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 ==
 			atomic_read(&kvm->online_vcpus));
@@ -1714,20 +1628,16 @@ static void pvclock_update_vm_gtod_copy(struct kvm *kvm)
 	 * If the host uses TSC clock, then passthrough TSC as stable
 	 * to the guest.
 	 */
-	host_tsc_clocksource = kvm_get_time_and_clockread(
+	host_clocksource_stable = kvm_get_time_and_clockread(
 					&ka->master_kernel_ns,
 					&ka->master_cycle_now);
 
-	ka->use_master_clock = host_tsc_clocksource && vcpus_matched
+	ka->use_master_clock = host_clocksource_stable && vcpus_matched
 				&& !backwards_tsc_observed
 				&& !ka->boot_vcpu_runs_old_kvmclock;
 
-	if (ka->use_master_clock)
-		atomic_set(&kvm_guest_has_master_clock, 1);
-
-	vclock_mode = pvclock_gtod_data.clock.vclock_mode;
-	trace_kvm_update_master_clock(ka->use_master_clock, vclock_mode,
-					vcpus_matched);
+	trace_kvm_update_master_clock(ka->use_master_clock,
+					host_clocksource_stable, vcpus_matched);
 #endif
 }
 
@@ -1743,10 +1653,10 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 	struct kvm_vcpu *vcpu;
 	struct kvm_arch *ka = &kvm->arch;
 
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	kvm_make_mclock_inprogress_request(kvm);
 	/* no guest entries from this point */
-	pvclock_update_vm_gtod_copy(kvm);
+	update_masterclock(kvm);
 
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
@@ -1755,7 +1665,7 @@ static void kvm_gen_update_masterclock(struct kvm *kvm)
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu);
 
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 #endif
 }
 
@@ -1765,15 +1675,15 @@ u64 get_kvmclock_ns(struct kvm *kvm)
 	struct pvclock_vcpu_time_info hv_clock;
 	u64 ret;
 
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	if (!ka->use_master_clock) {
-		spin_unlock(&ka->pvclock_gtod_sync_lock);
+		spin_unlock(&ka->masterclock_lock);
 		return ktime_get_boot_ns() + ka->kvmclock_offset;
 	}
 
 	hv_clock.tsc_timestamp = ka->master_cycle_now;
 	hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset;
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 
 	/* both __this_cpu_read() and rdtsc() should be on the same cpu */
 	get_cpu();
@@ -1859,13 +1769,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v)
 	 * If the host uses TSC clock, then passthrough TSC as stable
 	 * to the guest.
 	 */
-	spin_lock(&ka->pvclock_gtod_sync_lock);
+	spin_lock(&ka->masterclock_lock);
 	use_master_clock = ka->use_master_clock;
 	if (use_master_clock) {
 		host_tsc = ka->master_cycle_now;
 		kernel_ns = ka->master_kernel_ns;
 	}
-	spin_unlock(&ka->pvclock_gtod_sync_lock);
+	spin_unlock(&ka->masterclock_lock);
 
 	/* Keep irq disabled to prevent changes to the clock */
 	local_irq_save(flags);
@@ -6015,7 +5925,8 @@ static void kvm_set_mmio_spte_mask(void)
 }
 
 #ifdef CONFIG_X86_64
-static void pvclock_gtod_update_fn(struct work_struct *work)
+static int process_clocksource_change(struct notifier_block *nb,
+					unsigned long unused0, void *unused1)
 {
 	struct kvm *kvm;
 
@@ -6026,35 +5937,13 @@ static void pvclock_gtod_update_fn(struct work_struct *work)
 	list_for_each_entry(kvm, &vm_list, vm_list)
 		kvm_for_each_vcpu(i, vcpu, kvm)
 			kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
-	atomic_set(&kvm_guest_has_master_clock, 0);
 	spin_unlock(&kvm_lock);
-}
-
-static DECLARE_WORK(pvclock_gtod_work, pvclock_gtod_update_fn);
-
-/*
- * Notification about pvclock gtod data update.
- */
-static int pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused,
-			       void *priv)
-{
-	struct pvclock_gtod_data *gtod = &pvclock_gtod_data;
-	struct timekeeper *tk = priv;
-
-	update_pvclock_gtod(tk);
-
-	/* disable master clock if host does not trust, or does not
-	 * use, TSC clocksource
-	 */
-	if (gtod->clock.vclock_mode != VCLOCK_TSC &&
-	    atomic_read(&kvm_guest_has_master_clock) != 0)
-		queue_work(system_long_wq, &pvclock_gtod_work);
-
 	return 0;
 }
 
-static struct notifier_block pvclock_gtod_notifier = {
-	.notifier_call = pvclock_gtod_notify,
+
+static struct notifier_block clocksource_changes_notifier = {
+	.notifier_call = process_clocksource_change,
 };
 #endif
 
@@ -6107,7 +5996,7 @@ int kvm_arch_init(void *opaque)
 
 	kvm_lapic_init();
 #ifdef CONFIG_X86_64
-	pvclock_gtod_register_notifier(&pvclock_gtod_notifier);
+	clocksource_changes_register_notifier(&clocksource_changes_notifier);
 #endif
 
 	return 0;
@@ -6128,7 +6017,7 @@ void kvm_arch_exit(void)
 					    CPUFREQ_TRANSITION_NOTIFIER);
 	cpuhp_remove_state_nocalls(CPUHP_AP_X86_KVM_CLK_ONLINE);
 #ifdef CONFIG_X86_64
-	pvclock_gtod_unregister_notifier(&pvclock_gtod_notifier);
+	clocksource_changes_unregister_notifier(&clocksource_changes_notifier);
 #endif
 	kvm_x86_ops = NULL;
 	kvm_mmu_module_exit();
@@ -8031,10 +7920,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
 	mutex_init(&kvm->arch.hyperv.hv_lock);
-	spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock);
+	spin_lock_init(&kvm->arch.masterclock_lock);
 
 	kvm->arch.kvmclock_offset = -ktime_get_boot_ns();
-	pvclock_update_vm_gtod_copy(kvm);
+	update_masterclock(kvm);
 
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
 	INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread
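After patch 09, KVM reads host time through one generic helper. The helper takes the timekeeper getter as a function pointer, samples it only while the host clocksource is considered stable, and reports that stability back to the caller. The userspace sketch below has the same shape. It assumes hypothetical globals for the host state in place of the timekeeper, and it omits the seqcount retry loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical host state; in the patch this comes from the timekeeper. */
static bool host_stable = true;
static int64_t host_real_ns = 3 * NSEC_PER_SEC + 250;
static uint64_t host_cycles = 777;

/* Stand-in for ktime_get_real_ns_with_cycles(). */
static uint64_t real_ns_with_cycles(uint64_t *cycles)
{
	*cycles = host_cycles;
	return (uint64_t)host_real_ns;
}

/* Generic reader: samples time only when the host clock is stable and
 * returns the stability so the caller can decide on the masterclock. */
static bool get_host_time_and_cycles(int64_t *ns, uint64_t *cycles,
				     uint64_t (*get_time)(uint64_t *cycles))
{
	bool stable = host_stable;

	if (stable)
		*ns = (int64_t)get_time(cycles);
	return stable;
}

static bool get_walltime_and_cycles(struct timespec *ts, uint64_t *cycles)
{
	int64_t ns = 0;
	bool ok = get_host_time_and_cycles(&ns, cycles, real_ns_with_cycles);

	/* Convert only a sample that was actually taken. */
	if (ok) {
		ts->tv_sec = (time_t)(ns / NSEC_PER_SEC);
		ts->tv_nsec = (long)(ns % NSEC_PER_SEC);
	}
	return ok;
}
```

The patch calls ktime_to_timespec() on kernel_ns even when no sample was taken. This sketch converts only a value that was actually read.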

* [PATCH v2 10/11] pvclock: add clocksource change notification on changing of tsc stable bit
  2017-07-21 15:45 [PATCH v2 00/11] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (8 preceding siblings ...)
  2017-07-21 15:45 ` [PATCH v2 09/11] KVM: get rid of pv_clock_gtod Denis Plotnikov
@ 2017-07-21 15:45 ` Denis Plotnikov
  2017-07-21 15:45 ` [PATCH v2 11/11] KVM: add pvclock to a list of stable clocks Denis Plotnikov
  10 siblings, 0 replies; 16+ messages in thread
From: Denis Plotnikov @ 2017-07-21 15:45 UTC (permalink / raw)
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

We are going to allow an L2 guest to use L1's kvm-clock clocksource,
which is a pvclock clocksource, as a stable one if it is stable in L1.

Therefore, we need to know when L1's kvm-clock becomes stable or unstable,
so that it can be made stable or unstable in L2 as well.

Do this by tracking the stability flag in pvclock and sending a
clocksource change notification when it flips.

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/pvclock.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index ab54c92..c73e5a5 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -22,6 +22,7 @@
 #include <linux/gfp.h>
 #include <linux/bootmem.h>
 #include <linux/nmi.h>
+#include <linux/cs_notifier.h>
 
 #include <asm/fixmap.h>
 #include <asm/pvclock.h>
@@ -73,6 +74,8 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
 	return flags & valid_flags;
 }
 
+static atomic64_t clocksource_stable = ATOMIC64_INIT(0);
+
 u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src,
 				u64 *cycles_stamp)
 {
@@ -96,10 +99,20 @@ u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src,
 		pvclock_touch_watchdogs();
 	}
 
-	if ((valid_flags & PVCLOCK_TSC_STABLE_BIT) &&
-		(flags & PVCLOCK_TSC_STABLE_BIT))
-		return ret;
+	if (likely(valid_flags & PVCLOCK_TSC_STABLE_BIT)) {
+		bool stable_now = !!(flags & PVCLOCK_TSC_STABLE_BIT);
+		bool stable_last = (bool) atomic64_read(&clocksource_stable);
+
+		if (unlikely(stable_now != stable_last)) {
+			/* send notification once */
+			if (stable_last == atomic64_cmpxchg(
+				&clocksource_stable, stable_last, stable_now))
+				clocksource_changes_notify();
+		}
 
+		if (stable_now)
+			return ret;
+	}
 	/*
 	 * Assumption here is that last_value, a global accumulator, always goes
 	 * forward. If we are less than that, we should not be much smaller.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 16+ messages in thread
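The stability tracking in patch 10 uses a compare-and-exchange so that only one CPU sends the notification when several CPUs observe the flag flip at the same time. The userspace sketch below shows the idea with C11 atomics. It assumes a stable-bit value of `1 << 0` and uses a counter in place of clocksource_changes_notify().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TSC_STABLE_BIT (1u << 0)	/* assumed to match PVCLOCK_TSC_STABLE_BIT */

static atomic_bool last_stable;	/* last observed stability, starts false */
static int notifications;	/* stand-in for clocksource_changes_notify() */

static void notify(void)
{
	notifications++;
}

/* Called on every clock read with the flags from the shared time page. */
static bool track_stability(uint8_t flags)
{
	bool now = flags & TSC_STABLE_BIT;
	bool last = atomic_load(&last_stable);

	if (now != last) {
		/* Only the reader whose CAS succeeds notifies, so concurrent
		 * readers seeing the same flip do not notify twice. */
		if (atomic_compare_exchange_strong(&last_stable, &last, now))
			notify();
	}
	return now;
}
```

Like the patch, which initializes its state to 0, the tracked state starts as "unstable", so the first stable read triggers one notification.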

* [PATCH v2 11/11] KVM: add pvclock to a list of stable clocks.
  2017-07-21 15:45 [PATCH v2 00/11] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
                   ` (9 preceding siblings ...)
  2017-07-21 15:45 ` [PATCH v2 10/11] pvclock: add clocksource change notification on changing of tsc stable bit Denis Plotnikov
@ 2017-07-21 15:45 ` Denis Plotnikov
  10 siblings, 0 replies; 16+ messages in thread
From: Denis Plotnikov @ 2017-07-21 15:45 UTC (permalink / raw)
  To: kvm, rkrcmar; +Cc: pbonzini, den, rkagan

This allows L2 guests to use the masterclock when
L1 has a stable pvclock (currently, kvm-clock).

Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 arch/x86/kernel/pvclock.c |  1 +
 arch/x86/kvm/x86.c        | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
index c73e5a5..d6bef69 100644
--- a/arch/x86/kernel/pvclock.c
+++ b/arch/x86/kernel/pvclock.c
@@ -73,6 +73,7 @@ u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src)
 
 	return flags & valid_flags;
 }
+EXPORT_SYMBOL(pvclock_read_flags);
 
 static atomic64_t clocksource_stable = ATOMIC64_INIT(0);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ce491bb..9b0f306 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1311,9 +1311,17 @@ static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns)
 }
 
 #ifdef CONFIG_X86_64
+static bool pvclock_stable(void)
+{
+	return !!(pvclock_read_flags(&pvclock_pvti_cpu0_va()->pvti)
+		& PVCLOCK_TSC_STABLE_BIT);
+}
+
 static bool clocksource_stable(void)
 {
-	return get_tk_mono_clock_mode() == VCLOCK_TSC;
+	return get_tk_mono_clock_mode() == VCLOCK_TSC ||
+		(get_tk_mono_clock_mode() == VCLOCK_PVCLOCK
+		&& pvclock_stable());
 }
 
 static bool clocksource_stability_check(void)
-- 
2.7.4

* Re: [PATCH v2 08/11] timekeeper: add a couple of the core timekeeper reading helpers
  2017-07-21 15:45 ` [PATCH v2 08/11] timekeeper: add a couple of the core timekeeper reading helpers Denis Plotnikov
@ 2017-07-23  4:02   ` kbuild test robot
  2017-07-23  4:02   ` kbuild test robot
  1 sibling, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2017-07-23  4:02 UTC
  To: Denis Plotnikov; +Cc: kbuild-all, kvm, rkrcmar, pbonzini, den, rkagan

[-- Attachment #1: Type: text/plain, Size: 1417 bytes --]

Hi Denis,

[auto build test ERROR on tip/x86/core]
[cannot apply to v4.13-rc1 next-20170721]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Denis-Plotnikov/make-L2-s-kvm-clock-stable-get-rid-of-pvclock_gtod/20170723-113103
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All errors (new ones prefixed by >>):

   kernel/time/timekeeping.c: In function 'get_tk_mono_clock_mode':
>> kernel/time/timekeeping.c:2396:42: error: 'struct clocksource' has no member named 'archdata'
     return tk_core.timekeeper.tkr_mono.clock->archdata.vclock_mode;
                                             ^
   kernel/time/timekeeping.c:2397:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +2396 kernel/time/timekeeping.c

  2393	
  2394	int get_tk_mono_clock_mode(void)
  2395	{
> 2396		return tk_core.timekeeper.tkr_mono.clock->archdata.vclock_mode;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 50282 bytes --]

* Re: [PATCH v2 08/11] timekeeper: add a couple of the core timekeeper reading helpers
  2017-07-21 15:45 ` [PATCH v2 08/11] timekeeper: add a couple of the core timekeeper reading helpers Denis Plotnikov
  2017-07-23  4:02   ` kbuild test robot
@ 2017-07-23  4:02   ` kbuild test robot
  1 sibling, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2017-07-23  4:02 UTC
  To: Denis Plotnikov; +Cc: kbuild-all, kvm, rkrcmar, pbonzini, den, rkagan

[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]

Hi Denis,

[auto build test ERROR on tip/x86/core]
[cannot apply to v4.13-rc1 next-20170721]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Denis-Plotnikov/make-L2-s-kvm-clock-stable-get-rid-of-pvclock_gtod/20170723-113103
config: ia64-allyesconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   kernel/time/timekeeping.c: In function 'get_tk_mono_clock_mode':
>> kernel/time/timekeeping.c:2396:52: error: 'struct arch_clocksource_data' has no member named 'vclock_mode'
     return tk_core.timekeeper.tkr_mono.clock->archdata.vclock_mode;
                                                       ^
   kernel/time/timekeeping.c:2397:1: warning: control reaches end of non-void function [-Wreturn-type]
    }
    ^

vim +2396 kernel/time/timekeeping.c

  2393	
  2394	int get_tk_mono_clock_mode(void)
  2395	{
> 2396		return tk_core.timekeeper.tkr_mono.clock->archdata.vclock_mode;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 50649 bytes --]

* Re: [PATCH v2 01/11] timekeeper: change interface of clocksource reding functions
  2017-07-21 15:45 ` [PATCH v2 01/11] timekeeper: change interface of clocksource reding functions Denis Plotnikov
@ 2017-07-23  4:24   ` kbuild test robot
  0 siblings, 0 replies; 16+ messages in thread
From: kbuild test robot @ 2017-07-23  4:24 UTC
  To: Denis Plotnikov; +Cc: kbuild-all, kvm, rkrcmar, pbonzini, den, rkagan

[-- Attachment #1: Type: text/plain, Size: 5185 bytes --]

Hi Denis,

[auto build test ERROR on tip/x86/core]
[cannot apply to v4.13-rc1 next-20170721]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Denis-Plotnikov/make-L2-s-kvm-clock-stable-get-rid-of-pvclock_gtod/20170723-113103
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=sh 

All errors (new ones prefixed by >>):

   drivers/clocksource/timer-sun5i.c: In function 'sun5i_setup_clocksource':
>> drivers/clocksource/timer-sun5i.c:223:18: error: assignment from incompatible pointer type [-Werror=incompatible-pointer-types]
     cs->clksrc.read = sun5i_clksrc_read;
                     ^
   cc1: some warnings being treated as errors

vim +223 drivers/clocksource/timer-sun5i.c

3071efa4 Maxime Ripard 2015-03-31  185  
4a59058f Maxime Ripard 2015-03-31  186  static int __init sun5i_setup_clocksource(struct device_node *node,
4a59058f Maxime Ripard 2015-03-31  187  					  void __iomem *base,
4a59058f Maxime Ripard 2015-03-31  188  					  struct clk *clk, int irq)
4a59058f Maxime Ripard 2015-03-31  189  {
4a59058f Maxime Ripard 2015-03-31  190  	struct sun5i_timer_clksrc *cs;
4a59058f Maxime Ripard 2015-03-31  191  	unsigned long rate;
4a59058f Maxime Ripard 2015-03-31  192  	int ret;
4a59058f Maxime Ripard 2015-03-31  193  
4a59058f Maxime Ripard 2015-03-31  194  	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
4a59058f Maxime Ripard 2015-03-31  195  	if (!cs)
4a59058f Maxime Ripard 2015-03-31  196  		return -ENOMEM;
4a59058f Maxime Ripard 2015-03-31  197  
4a59058f Maxime Ripard 2015-03-31  198  	ret = clk_prepare_enable(clk);
4a59058f Maxime Ripard 2015-03-31  199  	if (ret) {
4a59058f Maxime Ripard 2015-03-31  200  		pr_err("Couldn't enable parent clock\n");
4a59058f Maxime Ripard 2015-03-31  201  		goto err_free;
4a59058f Maxime Ripard 2015-03-31  202  	}
4a59058f Maxime Ripard 2015-03-31  203  
4a59058f Maxime Ripard 2015-03-31  204  	rate = clk_get_rate(clk);
4a59058f Maxime Ripard 2015-03-31  205  
4a59058f Maxime Ripard 2015-03-31  206  	cs->timer.base = base;
4a59058f Maxime Ripard 2015-03-31  207  	cs->timer.clk = clk;
3071efa4 Maxime Ripard 2015-03-31  208  	cs->timer.clk_rate_cb.notifier_call = sun5i_rate_cb_clksrc;
3071efa4 Maxime Ripard 2015-03-31  209  	cs->timer.clk_rate_cb.next = NULL;
3071efa4 Maxime Ripard 2015-03-31  210  
3071efa4 Maxime Ripard 2015-03-31  211  	ret = clk_notifier_register(clk, &cs->timer.clk_rate_cb);
3071efa4 Maxime Ripard 2015-03-31  212  	if (ret) {
3071efa4 Maxime Ripard 2015-03-31  213  		pr_err("Unable to register clock notifier.\n");
3071efa4 Maxime Ripard 2015-03-31  214  		goto err_disable_clk;
3071efa4 Maxime Ripard 2015-03-31  215  	}
4a59058f Maxime Ripard 2015-03-31  216  
4a59058f Maxime Ripard 2015-03-31  217  	writel(~0, base + TIMER_INTVAL_LO_REG(1));
4a59058f Maxime Ripard 2015-03-31  218  	writel(TIMER_CTL_ENABLE | TIMER_CTL_RELOAD,
4a59058f Maxime Ripard 2015-03-31  219  	       base + TIMER_CTL_REG(1));
4a59058f Maxime Ripard 2015-03-31  220  
59387683 Chen-Yu Tsai  2016-10-18  221  	cs->clksrc.name = node->name;
59387683 Chen-Yu Tsai  2016-10-18  222  	cs->clksrc.rating = 340;
59387683 Chen-Yu Tsai  2016-10-18 @223  	cs->clksrc.read = sun5i_clksrc_read;
59387683 Chen-Yu Tsai  2016-10-18  224  	cs->clksrc.mask = CLOCKSOURCE_MASK(32);
59387683 Chen-Yu Tsai  2016-10-18  225  	cs->clksrc.flags = CLOCK_SOURCE_IS_CONTINUOUS;
59387683 Chen-Yu Tsai  2016-10-18  226  
59387683 Chen-Yu Tsai  2016-10-18  227  	ret = clocksource_register_hz(&cs->clksrc, rate);
4a59058f Maxime Ripard 2015-03-31  228  	if (ret) {
4a59058f Maxime Ripard 2015-03-31  229  		pr_err("Couldn't register clock source.\n");
3071efa4 Maxime Ripard 2015-03-31  230  		goto err_remove_notifier;
4a59058f Maxime Ripard 2015-03-31  231  	}
4a59058f Maxime Ripard 2015-03-31  232  
4a59058f Maxime Ripard 2015-03-31  233  	return 0;
4a59058f Maxime Ripard 2015-03-31  234  
3071efa4 Maxime Ripard 2015-03-31  235  err_remove_notifier:
3071efa4 Maxime Ripard 2015-03-31  236  	clk_notifier_unregister(clk, &cs->timer.clk_rate_cb);
4a59058f Maxime Ripard 2015-03-31  237  err_disable_clk:
4a59058f Maxime Ripard 2015-03-31  238  	clk_disable_unprepare(clk);
4a59058f Maxime Ripard 2015-03-31  239  err_free:
4a59058f Maxime Ripard 2015-03-31  240  	kfree(cs);
4a59058f Maxime Ripard 2015-03-31  241  	return ret;
4a59058f Maxime Ripard 2015-03-31  242  }
4a59058f Maxime Ripard 2015-03-31  243  

:::::: The code at line 223 was first introduced by commit
:::::: 593876838826914a7e4e05fbbcb728be6fbc4d89 Revert "clocksource/drivers/timer_sun5i: Replace code by clocksource_mmio_init"

:::::: TO: Chen-Yu Tsai <wens@csie.org>
:::::: CC: Thomas Gleixner <tglx@linutronix.de>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 45304 bytes --]

* Re: [PATCH v2 09/11] KVM: get rid of pv_clock_gtod
  2017-07-21 15:45 ` [PATCH v2 09/11] KVM: get rid of pv_clock_gtod Denis Plotnikov
@ 2017-07-25 10:37   ` Paolo Bonzini
  0 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2017-07-25 10:37 UTC
  To: Denis Plotnikov, kvm, rkrcmar; +Cc: den, rkagan

On 21/07/2017 17:45, Denis Plotnikov wrote:
> +static bool kvm_get_host_time_and_cycles(s64 *kernel_ns, u64 *cycle_now,
> +						u64 (*get_time)(u64 *cycle_now))
>  {
> -	u64 ret = (u64)rdtsc_ordered();
> -	u64 last = pvclock_gtod_data.clock.cycle_last;
> +	unsigned int seq;
> +	const seqcount_t *s = get_tk_seq();
> +	bool stable;
>  
> -	if (likely(ret >= last))
> -		return ret;
> +	do {
> +		seq = read_seqcount_begin(s);
> +		stable = clocksource_stable();
> +		if (stable)
> +			*kernel_ns = get_time(cycle_now);
> +	} while (unlikely(read_seqcount_retry(s, seq)));
>  

The clocksource change notifier seems fine to me, but this
is a layering violation.  All this needs to happen in
kernel/time/timekeeping.c.

Here is a possible plan:

1) define an alternative reading function such as

int (*read_with_system_time)(ktime_t *device_time,
			     u64 *tstamp)

which calls into a new timekeeper function pointer.
ktime_get_with_system_counter can return -EOPNOTSUPP if the
clocksource does not provide the new function pointer, and the
function pointer itself can also return -EOPNOTSUPP if the clock
is not stable.

2) use this function from ktime_get_snapshot.  This way that
function returns an actual TSC timestamp in systime_snapshot->cycles
instead of the raw value of the clocksource(*).

3) if needed, add a 'boot' field to struct system_time_snapshot.

4) use ktime_get_snapshot from KVM.  Admittedly there is some
handwaving here.


(*) There is effectively no user of systime_snapshot->cycles: the
    only one, in get_device_system_crosststamp, is dead code because
    get_device_system_crosststamp is always called with
    history_begin == NULL.  However, that one user seems to assume
    that systime_snapshot->cycles has the same unit of measure
    as system_counterval.cycles, so this even counts as a bugfix.

Paolo

end of thread

Thread overview: 16+ messages
2017-07-21 15:45 [PATCH v2 00/11] make L2's kvm-clock stable, get rid of pvclock_gtod Denis Plotnikov
2017-07-21 15:45 ` [PATCH v2 01/11] timekeeper: change interface of clocksource reding functions Denis Plotnikov
2017-07-23  4:24   ` kbuild test robot
2017-07-21 15:45 ` [PATCH v2 02/11] pvclock: write cycle stamp value if a pointer given Denis Plotnikov
2017-07-21 15:45 ` [PATCH v2 03/11] kvmclock: pass cycles pointer to the changed pvclock interface Denis Plotnikov
2017-07-21 15:45 ` [PATCH v2 04/11] TSC: write cycles stamp value to input pointer Denis Plotnikov
2017-07-21 15:45 ` [PATCH v2 05/11] timekeeping: change ktime_get_with_offset interface to accept cycles pointer Denis Plotnikov
2017-07-21 15:45 ` [PATCH v2 06/11] timekeeping: add functions returning cycle stamp counter along with time Denis Plotnikov
2017-07-21 15:45 ` [PATCH v2 07/11] timekeeper: add clocksource change notifier Denis Plotnikov
2017-07-21 15:45 ` [PATCH v2 08/11] timekeeper: add a couple of the core timekeeper reading helpers Denis Plotnikov
2017-07-23  4:02   ` kbuild test robot
2017-07-23  4:02   ` kbuild test robot
2017-07-21 15:45 ` [PATCH v2 09/11] KVM: get rid of pv_clock_gtod Denis Plotnikov
2017-07-25 10:37   ` Paolo Bonzini
2017-07-21 15:45 ` [PATCH v2 10/11] pvclock: add clocksource change notification on changing of tsc stable bit Denis Plotnikov
2017-07-21 15:45 ` [PATCH v2 11/11] KVM: add pvclock to a list of stable clocks Denis Plotnikov
