linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases
@ 2013-06-19 15:25 David Vrabel
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

Xen guests use the Xen wallclock as their persistent clock.  This is a
software only clock in the hypervisor that is used by guests instead
of a real hardware RTC.

The kernel has limited support for updating the persistent clock or
RTC when NTP is synced.  This has the following limitations:

* The persistent clock is not updated on step changes.  This leaves a
  window where it will be incorrect (while NTP resyncs).

* Xen guests use the Xen wallclock as their persistent clock.  dom0
  maintains this clock so it is persistent for domUs but not dom0
  itself.

These limitations mean that guests started before NTP is synchronized
will start with an incorrect wallclock time and the hardware RTC will
not be updated (as on bare metal).

These series fixes the above limitations and depends on "x86: increase
precision of x86_platform.get/set_wallclock()" which was previously
posted.

Patch 1 is a prerequisite.  It removes a call to clock_was_set() from
the Xen suspend/resume code.

Changes since v3:

Add a new clock_was_set notifier chain. Use this instead of direct
calls to clock_was_set() from the timekeeping code.  Use this notifier
and a new timer to synchronize the Xen wallclock.

Changes since v2:

Don't peek at the timekeeper internals (use __current_kernel_time()
instead).  Use the native set_wallclock hook in dom0.

Changes since v1:

Reworked to use the pvclock_gtod notifier to sync the wallclock (this
looked similar to what a KVM host does).  update_persistent_clock()
will now only update the CMOS RTC.

David


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
@ 2013-06-19 15:25 ` David Vrabel
  2013-06-19 17:11   ` Konrad Rzeszutek Wilk
  2013-06-20 10:38   ` [Xen-devel] " Jan Beulich
  2013-06-19 15:25 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

syscore_suspend() and syscore_resume() expect there to be only one
online CPU.  e.g., hrtimers_resume() only triggers events for the
current CPU.  Xen's suspend path was leaving all VCPUs online and then
attempting to fixup problems afterwards (e.g., with an explicit call
to clock_was_set() to trigger pending high resolution timers).

Instead, disable non-boot CPUs before calling stop_machine() and
reenable them afterwards.

This is then similar to what the kexec code does before and after a
kexec jump (see kernel_kexec() in kernel/kexec.c).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/xen/manage.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index 412b96c..596e55a 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -148,8 +148,19 @@ static void do_suspend(void)
 		si.post = &xen_post_suspend;
 	}
 
+	/*
+	 * syscore_suspend() and syscore_resume() called in
+	 * xen_suspend() above, assume that only the boot CPU is
+	 * online.
+	 */
+	err = disable_nonboot_cpus();
+	if (err)
+		goto out_resume;
+
 	err = stop_machine(xen_suspend, &si, cpumask_of(0));
 
+	enable_nonboot_cpus();
+
 	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
 
 	if (err) {
@@ -166,9 +177,6 @@ out_resume:
 
 	dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
 
-	/* Make sure timer events get retriggered on all CPUs */
-	clock_was_set();
-
 out_thaw:
 #ifdef CONFIG_PREEMPT
 	thaw_processes();
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
@ 2013-06-19 15:25 ` David Vrabel
  2013-06-19 16:52   ` John Stultz
  2013-06-19 15:25 ` [PATCH 3/4] x86/xen: sync the wallclock " David Vrabel
  2013-06-19 15:25 ` [PATCH 4/4] x86/xen: sync the CMOS RTC as well as the Xen wallclock David Vrabel
  3 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

The high resolution timer code gets notified of step changes to the
system time with clock_was_set() or clock_was_set_delayed() calls.  If
other parts of the kernel require similar notification there is no
clear place to hook into.

Add a clock_was_set atomic notifier chain
(clock_was_set_notifier_list) and call this in place of
clock_was_set().  If the timekeeping locks are held, the calls are
deferred to a new tasklet.

The hrtimer code adds a notifier block to this chain and uses it to
call (the now internal) clock_was_set().  Since the timekeeping code
does not call the chain from the timer irq clock_was_set_delayed() and
associated code can be removed.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 include/linux/hrtimer.h   |    7 -------
 include/linux/time.h      |    5 +++++
 kernel/hrtimer.c          |   33 ++++++++++++++-------------------
 kernel/time/timekeeping.c |   34 +++++++++++++++++++++++++---------
 4 files changed, 44 insertions(+), 35 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index d19a5c2..6da7439 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -166,7 +166,6 @@ enum  hrtimer_base_type {
  * @lock:		lock protecting the base and associated clock bases
  *			and timers
  * @active_bases:	Bitfield to mark bases with active timers
- * @clock_was_set:	Indicates that clock was set from irq context.
  * @expires_next:	absolute time of the next event which was scheduled
  *			via clock_set_next_event()
  * @hres_active:	State of high resolution mode
@@ -180,7 +179,6 @@ enum  hrtimer_base_type {
 struct hrtimer_cpu_base {
 	raw_spinlock_t			lock;
 	unsigned int			active_bases;
-	unsigned int			clock_was_set;
 #ifdef CONFIG_HIGH_RES_TIMERS
 	ktime_t				expires_next;
 	int				hres_active;
@@ -289,8 +287,6 @@ extern void hrtimer_peek_ahead_timers(void);
 # define MONOTONIC_RES_NSEC	HIGH_RES_NSEC
 # define KTIME_MONOTONIC_RES	KTIME_HIGH_RES
 
-extern void clock_was_set_delayed(void);
-
 #else
 
 # define MONOTONIC_RES_NSEC	LOW_RES_NSEC
@@ -312,11 +308,8 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer)
 	return 0;
 }
 
-static inline void clock_was_set_delayed(void) { }
-
 #endif
 
-extern void clock_was_set(void);
 #ifdef CONFIG_TIMERFD
 extern void timerfd_clock_was_set(void);
 #else
diff --git a/include/linux/time.h b/include/linux/time.h
index d5d229b..75bca39 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -185,6 +185,11 @@ struct tms;
 extern void do_sys_times(struct tms *);
 
 /*
+ * Notifier chain called when system time is stepped.
+ */
+extern struct atomic_notifier_head clock_was_set_notifier_list;
+
+/*
  * Similar to the struct tm in userspace <time.h>, but it needs to be here so
  * that the kernel source is self contained.
  */
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index fd4b13b..6e475d5 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -721,19 +721,6 @@ static int hrtimer_switch_to_hres(void)
 	return 1;
 }
 
-/*
- * Called from timekeeping code to reprogramm the hrtimer interrupt
- * device. If called from the timer interrupt context we defer it to
- * softirq context.
- */
-void clock_was_set_delayed(void)
-{
-	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
-
-	cpu_base->clock_was_set = 1;
-	__raise_softirq_irqoff(HRTIMER_SOFTIRQ);
-}
-
 #else
 
 static inline int hrtimer_hres_active(void) { return 0; }
@@ -762,7 +749,7 @@ static inline void retrigger_next_event(void *arg) { }
  * resolution timer interrupts. On UP we just disable interrupts and
  * call the high resolution interrupt code.
  */
-void clock_was_set(void)
+static void clock_was_set(void)
 {
 #ifdef CONFIG_HIGH_RES_TIMERS
 	/* Retrigger the CPU local events everywhere */
@@ -1434,11 +1421,6 @@ static void run_hrtimer_softirq(struct softirq_action *h)
 {
 	struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
 
-	if (cpu_base->clock_was_set) {
-		cpu_base->clock_was_set = 0;
-		clock_was_set();
-	}
-
 	hrtimer_peek_ahead_timers();
 }
 
@@ -1776,11 +1758,24 @@ static struct notifier_block __cpuinitdata hrtimers_nb = {
 	.notifier_call = hrtimer_cpu_notify,
 };
 
+static int hrtimer_clock_was_set_notify(struct notifier_block *self,
+					unsigned long action, void *data)
+{
+	clock_was_set();
+	return NOTIFY_OK;
+}
+
+static struct notifier_block hrtimers_clock_was_set_nb = {
+	.notifier_call = hrtimer_clock_was_set_notify,
+};
+
 void __init hrtimers_init(void)
 {
 	hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE,
 			  (void *)(long)smp_processor_id());
 	register_cpu_notifier(&hrtimers_nb);
+	atomic_notifier_chain_register(&clock_was_set_notifier_list,
+				       &hrtimers_clock_was_set_nb);
 #ifdef CONFIG_HIGH_RES_TIMERS
 	open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq);
 #endif
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index baeeb5c..852b880 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -198,6 +198,25 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 	return nsec + get_arch_timeoffset();
 }
 
+ATOMIC_NOTIFIER_HEAD(clock_was_set_notifier_list);
+EXPORT_SYMBOL_GPL(clock_was_set_notifier_list);
+
+static void timekeeping_clock_was_set(void)
+{
+	atomic_notifier_call_chain(&clock_was_set_notifier_list, 0, NULL);
+}
+
+static void timekeeping_clock_was_set_task(unsigned long d)
+{
+	timekeeping_clock_was_set();
+}
+DECLARE_TASKLET(clock_was_set_tasklet, timekeeping_clock_was_set_task, 0);
+
+static void timekeeping_clock_was_set_delayed(void)
+{
+	tasklet_schedule(&clock_was_set_tasklet);
+}
+
 static RAW_NOTIFIER_HEAD(pvclock_gtod_chain);
 
 static void update_pvclock_gtod(struct timekeeper *tk)
@@ -513,8 +532,7 @@ int do_settimeofday(const struct timespec *tv)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 
 	return 0;
 }
@@ -557,8 +575,7 @@ error: /* even if we error out, we forwarded the time, so call update */
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 
 	return ret;
 }
@@ -607,7 +624,7 @@ void timekeeping_set_tai_offset(s32 tai_offset)
 	__timekeeping_set_tai_offset(tk, tai_offset);
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-	clock_was_set();
+	timekeeping_clock_was_set();
 }
 
 /**
@@ -877,8 +894,7 @@ void timekeeping_inject_sleeptime(struct timespec *delta)
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
-	/* signal hrtimers about time change */
-	clock_was_set();
+	timekeeping_clock_was_set();
 }
 
 /**
@@ -1260,7 +1276,7 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk)
 
 			__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
 
-			clock_was_set_delayed();
+			timekeeping_clock_was_set_delayed();
 		}
 	}
 }
@@ -1677,7 +1693,7 @@ int do_adjtimex(struct timex *txc)
 
 	if (tai != orig_tai) {
 		__timekeeping_set_tai_offset(tk, tai);
-		clock_was_set_delayed();
+		timekeeping_clock_was_set_delayed();
 	}
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/4] x86/xen: sync the wallclock when the system time is stepped
  2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
  2013-06-19 15:25 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
@ 2013-06-19 15:25 ` David Vrabel
  2013-06-20 10:43   ` [Xen-devel] " Jan Beulich
  2013-06-19 15:25 ` [PATCH 4/4] x86/xen: sync the CMOS RTC as well as the Xen wallclock David Vrabel
  3 siblings, 1 reply; 14+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

The Xen wallclock is a software only clock within the Xen hypervisor
that is used by: a) PV guests as the equivalent of a hardware RTC; and
b) the hypervisor as the clock source for the emulated RTC provided to
HVM guests.

Currently the Xen wallclock is only updated every 11 minutes if NTP is
synchronized to its clock source.  If a guest is started before NTP is
synchronized it may see an incorrect wallclock time.

Use the clock_was_set notifier chain to receive a notification when
the system time is stepped and update the wallclock to match the
current system time.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/time.c |   24 ++++++++++++++++++++++++
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index a1947ac..c4bb255 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -211,6 +211,26 @@ static int xen_set_wallclock(const struct timespec *now)
 
 	return HYPERVISOR_dom0_op(&op);
 }
+ 
+static int xen_clock_was_set_notify(struct notifier_block *nb, unsigned long unused,
+				    void *priv)
+{
+       struct timespec now;
+       int ret;
+
+       /*
+        * Set the Xen wallclock from Linux system time.
+        */
+       now = current_kernel_time();
+       xen_set_wallclock(&now);
+
+       return NOTIFY_OK;
+}
+
+static struct notifier_block xen_clock_was_set_notifier = {
+	.notifier_call = xen_clock_was_set_notify,
+};
+
 
 static struct clocksource xen_clocksource __read_mostly = {
 	.name = "xen",
@@ -473,6 +493,10 @@ static void __init xen_time_init(void)
 	xen_setup_runstate_info(cpu);
 	xen_setup_timer(cpu);
 	xen_setup_cpu_clockevents();
+
+	if (xen_initial_domain())
+		atomic_notifier_chain_register(&clock_was_set_notifier_list,
+					       &xen_clock_was_set_notifier);
 }
 
 void __init xen_init_time_ops(void)
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/4] x86/xen: sync the CMOS RTC as well as the Xen wallclock
  2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
                   ` (2 preceding siblings ...)
  2013-06-19 15:25 ` [PATCH 3/4] x86/xen: sync the wallclock " David Vrabel
@ 2013-06-19 15:25 ` David Vrabel
  3 siblings, 0 replies; 14+ messages in thread
From: David Vrabel @ 2013-06-19 15:25 UTC (permalink / raw)
  To: xen-devel
  Cc: David Vrabel, Konrad Rzeszutek Wilk, linux-kernel, John Stultz,
	Thomas Gleixner

From: David Vrabel <david.vrabel@citrix.com>

Adjustments to Xen's persistent clock via update_persistent_clock()
don't actually persist, as the Xen wallclock is a software only clock
and modifications to it do not modify the underlying CMOS RTC.

The x86_platform.set_wallclock hook can be used to keep the hardware
RTC synchronized (as on bare metal).  Because the Xen wallclock is now
kept synchronized by the clock_was_set notifier and a new timer,
xen_set_wallclock() need not do this and dom0 can simply use the
native implementation.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/xen/time.c |   49 +++++++++++++++++++++++++++++++------------------
 1 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index c4bb255..18a3db6 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -198,39 +198,50 @@ static void xen_get_wallclock(struct timespec *now)
 
 static int xen_set_wallclock(const struct timespec *now)
 {
+	return -1;
+}
+
+static void xen_wallclock_timer_func(unsigned long d);
+static DEFINE_TIMER(xen_wallclock_timer, xen_wallclock_timer_func, 0, 0);
+
+static void xen_sync_wallclock(void)
+{
+	struct timespec now;
 	struct xen_platform_op op;
 
-	/* do nothing for domU */
-	if (!xen_initial_domain())
-		return -1;
+	now = current_kernel_time();
 
 	op.cmd = XENPF_settime;
-	op.u.settime.secs = now->tv_sec;
-	op.u.settime.nsecs = now->tv_nsec;
+	op.u.settime.secs = now.tv_sec;
+	op.u.settime.nsecs = now.tv_nsec;
 	op.u.settime.system_time = xen_clocksource_read();
 
-	return HYPERVISOR_dom0_op(&op);
+	(void)HYPERVISOR_dom0_op(&op);
+
+	/*
+	 * Use a timer to correct for any drift in the Xen
+	 * wallclock.
+	 *
+	 * 11 minutes is the same period as sync_cmos_clock().
+	 */
+	mod_timer(&xen_wallclock_timer, round_jiffies(jiffies + 11*60*HZ));
 }
- 
+
 static int xen_clock_was_set_notify(struct notifier_block *nb, unsigned long unused,
 				    void *priv)
 {
-       struct timespec now;
-       int ret;
-
-       /*
-        * Set the Xen wallclock from Linux system time.
-        */
-       now = current_kernel_time();
-       xen_set_wallclock(&now);
-
-       return NOTIFY_OK;
+	xen_sync_wallclock();
+	return NOTIFY_OK;
 }
 
 static struct notifier_block xen_clock_was_set_notifier = {
 	.notifier_call = xen_clock_was_set_notify,
 };
 
+static void xen_wallclock_timer_func(unsigned long d)
+{
+	xen_sync_wallclock();
+}
 
 static struct clocksource xen_clocksource __read_mostly = {
 	.name = "xen",
@@ -509,7 +520,9 @@ void __init xen_init_time_ops(void)
 
 	x86_platform.calibrate_tsc = xen_tsc_khz;
 	x86_platform.get_wallclock = xen_get_wallclock;
-	x86_platform.set_wallclock = xen_set_wallclock;
+	/* Dom0 uses the native method to set the hardware RTC. */
+	if (!xen_initial_domain())
+		x86_platform.set_wallclock = xen_set_wallclock;
 }
 
 #ifdef CONFIG_XEN_PVHVM
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 15:25 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
@ 2013-06-19 16:52   ` John Stultz
  2013-06-19 17:13     ` Konrad Rzeszutek Wilk
  2013-06-20 10:50     ` David Vrabel
  0 siblings, 2 replies; 14+ messages in thread
From: John Stultz @ 2013-06-19 16:52 UTC (permalink / raw)
  To: David Vrabel
  Cc: xen-devel, Konrad Rzeszutek Wilk, linux-kernel, Thomas Gleixner

On 06/19/2013 08:25 AM, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> The high resolution timer code gets notified of step changes to the
> system time with clock_was_set() or clock_was_set_delayed() calls.  If
> other parts of the kernel require similar notification there is no
> clear place to hook into.
>
> Add a clock_was_set atomic notifier chain
> (clock_was_set_notifier_list) and call this in place of
> clock_was_set().  If the timekeeping locks are held, the calls are
> deferred to a new tasklet.
>
> The hrtimer code adds a notifier block to this chain and uses it to
> call (the now internal) clock_was_set().  Since the timekeeping code
> does not call the chain from the timer irq clock_was_set_delayed() and
> associated code can be removed.

So on my initial quick review, this *looks* pretty reasonable. I get a 
little worried about interface abuse (ie: random drivers trying to hook 
into clock_was_set_notifier_list), but we can move that into 
timekeeper_internal.h or something similar to limit that.

The other issue here is we've been burned pretty badly in the past with 
changes to clock_was_set(), as its key to keeping timers in line with 
timekeeping.  So this will need a fair amount of testing and run time 
before this gets merged, so 3.12 is what we'd be targeting at the 
earliest (its getting a bit late for taking changes for 3.11 anyway).

If you want to try to push patch 1/4 in for 3.11 via the Xen tree, I'll 
see about queuing the other three for hopefully 3.12.

thanks
-john


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
@ 2013-06-19 17:11   ` Konrad Rzeszutek Wilk
  2013-06-20 10:01     ` David Vrabel
  2013-06-20 10:38   ` [Xen-devel] " Jan Beulich
  1 sibling, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-19 17:11 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, linux-kernel, John Stultz, Thomas Gleixner

On Wed, Jun 19, 2013 at 04:25:20PM +0100, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
> 
> syscore_suspend() and syscore_resume() expect there to be only one
> online CPU.  e.g., hrtimers_resume() only triggers events for the
> current CPU.  Xen's suspend path was leaving all VCPUs online and then
> attempting to fixup problems afterwards (e.g., with an explicit call
> to clock_was_set() to trigger pending high resolution timers).
> 
> Instead, disable non-boot CPUs before calling stop_machine() and
> reenable them afterwards.
> 
> This is then similar to what the kexec code does before and after a
> kexec jump (see kernel_kexec() in kernel/kexec.c).
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Looks like a bug-fix. But considering that the VCPU hotplug code
in PVHVM had bugs in until v3.10 it probably shouldn't even hit
stable tree.

Looks ok to me so will stick it on the 3.11 train if nobody screams.

> ---
>  drivers/xen/manage.c |   14 +++++++++++---
>  1 files changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
> index 412b96c..596e55a 100644
> --- a/drivers/xen/manage.c
> +++ b/drivers/xen/manage.c
> @@ -148,8 +148,19 @@ static void do_suspend(void)
>  		si.post = &xen_post_suspend;
>  	}
>  
> +	/*
> +	 * syscore_suspend() and syscore_resume() called in
> +	 * xen_suspend() above, assume that only the boot CPU is
> +	 * online.
> +	 */
> +	err = disable_nonboot_cpus();
> +	if (err)
> +		goto out_resume;
> +
>  	err = stop_machine(xen_suspend, &si, cpumask_of(0));
>  
> +	enable_nonboot_cpus();
> +
>  	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
>  
>  	if (err) {
> @@ -166,9 +177,6 @@ out_resume:
>  
>  	dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
>  
> -	/* Make sure timer events get retriggered on all CPUs */
> -	clock_was_set();
> -
>  out_thaw:
>  #ifdef CONFIG_PREEMPT
>  	thaw_processes();
> -- 
> 1.7.2.5
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 16:52   ` John Stultz
@ 2013-06-19 17:13     ` Konrad Rzeszutek Wilk
  2013-06-19 17:38       ` John Stultz
  2013-06-20 10:50     ` David Vrabel
  1 sibling, 1 reply; 14+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-06-19 17:13 UTC (permalink / raw)
  To: John Stultz; +Cc: David Vrabel, xen-devel, linux-kernel, Thomas Gleixner

On Wed, Jun 19, 2013 at 09:52:06AM -0700, John Stultz wrote:
> On 06/19/2013 08:25 AM, David Vrabel wrote:
> >From: David Vrabel <david.vrabel@citrix.com>
> >
> >The high resolution timer code gets notified of step changes to the
> >system time with clock_was_set() or clock_was_set_delayed() calls.  If
> >other parts of the kernel require similar notification there is no
> >clear place to hook into.
> >
> >Add a clock_was_set atomic notifier chain
> >(clock_was_set_notifier_list) and call this in place of
> >clock_was_set().  If the timekeeping locks are held, the calls are
> >deferred to a new tasklet.
> >
> >The hrtimer code adds a notifier block to this chain and uses it to
> >call (the now internal) clock_was_set().  Since the timekeeping code
> >does not call the chain from the timer irq clock_was_set_delayed() and
> >associated code can be removed.
> 
> So on my initial quick review, this *looks* pretty reasonable. I get
> a little worried about interface abuse (ie: random drivers trying to
> hook into clock_was_set_notifier_list), but we can move that into
> timekeeper_internal.h or something similar to limit that.
> 
> The other issue here is we've been burned pretty badly in the past
> with changes to clock_was_set(), as its key to keeping timers in
> line with timekeeping.  So this will need a fair amount of testing
> and run time before this gets merged, so 3.12 is what we'd be
> targeting at the earliest (its getting a bit late for taking changes
> for 3.11 anyway).

I think we have four weeks left? Or you think Linus is going to release
at the end of the month?
> 
> If you want to try to push patch 1/4 in for 3.11 via the Xen tree,

Done.
> I'll see about queuing the other three for hopefully 3.12.

OK, let me run them through the testing gauntleet to have the
warm fuzzy feeling.
> 
> thanks
> -john
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 17:13     ` Konrad Rzeszutek Wilk
@ 2013-06-19 17:38       ` John Stultz
  0 siblings, 0 replies; 14+ messages in thread
From: John Stultz @ 2013-06-19 17:38 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: David Vrabel, xen-devel, linux-kernel, Thomas Gleixner

On 06/19/2013 10:13 AM, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 19, 2013 at 09:52:06AM -0700, John Stultz wrote:
>> On 06/19/2013 08:25 AM, David Vrabel wrote:
>>> From: David Vrabel <david.vrabel@citrix.com>
>>>
>>> The high resolution timer code gets notified of step changes to the
>>> system time with clock_was_set() or clock_was_set_delayed() calls.  If
>>> other parts of the kernel require similar notification there is no
>>> clear place to hook into.
>>>
>>> Add a clock_was_set atomic notifier chain
>>> (clock_was_set_notifier_list) and call this in place of
>>> clock_was_set().  If the timekeeping locks are held, the calls are
>>> deferred to a new tasklet.
>>>
>>> The hrtimer code adds a notifier block to this chain and uses it to
>>> call (the now internal) clock_was_set().  Since the timekeeping code
>>> does not call the chain from the timer irq clock_was_set_delayed() and
>>> associated code can be removed.
>> So on my initial quick review, this *looks* pretty reasonable. I get
>> a little worried about interface abuse (ie: random drivers trying to
>> hook into clock_was_set_notifier_list), but we can move that into
>> timekeeper_internal.h or something similar to limit that.
>>
>> The other issue here is we've been burned pretty badly in the past
>> with changes to clock_was_set(), as its key to keeping timers in
>> line with timekeeping.  So this will need a fair amount of testing
>> and run time before this gets merged, so 3.12 is what we'd be
>> targeting at the earliest (its getting a bit late for taking changes
>> for 3.11 anyway).
> I think we have four weeks left? Or you think Linus is going to release
> at the end of the month?

Well, I have to queue it, and then get Thomas to pull it from me, and 
that can take some time. And after -rc6 there's just not a lot of time 
for things to get testing in -next before being pushed to Linus. Its 
just a bit rushed for me.


>> If you want to try to push patch 1/4 in for 3.11 via the Xen tree,
> Done.
>> I'll see about queuing the other three for hopefully 3.12.
> OK, let me run them through the testing gauntleet to have the
> warm fuzzy feeling.

Thanks.  I'll be running it through my test cases as well.

-john

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-19 17:11   ` Konrad Rzeszutek Wilk
@ 2013-06-20 10:01     ` David Vrabel
  0 siblings, 0 replies; 14+ messages in thread
From: David Vrabel @ 2013-06-20 10:01 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: xen-devel, linux-kernel, John Stultz, Thomas Gleixner

On 19/06/13 18:11, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 19, 2013 at 04:25:20PM +0100, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> syscore_suspend() and syscore_resume() expect there to be only one
>> online CPU.  e.g., hrtimers_resume() only triggers events for the
>> current CPU.  Xen's suspend path was leaving all VCPUs online and then
>> attempting to fixup problems afterwards (e.g., with an explicit call
>> to clock_was_set() to trigger pending high resolution timers).
>>
>> Instead, disable non-boot CPUs before calling stop_machine() and
>> reenable them afterwards.
>>
>> This is then similar to what the kexec code does before and after a
>> kexec jump (see kernel_kexec() in kernel/kexec.c).
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> Looks like a bug-fix. But considering that the VCPU hotplug code
> in PVHVM had bugs in until v3.10 it probably shouldn't even hit
> stable tree.

I don't think actually fixes any bugs so it doesn't need to go to stable.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
  2013-06-19 17:11   ` Konrad Rzeszutek Wilk
@ 2013-06-20 10:38   ` Jan Beulich
  2013-06-20 11:46     ` David Vrabel
  1 sibling, 1 reply; 14+ messages in thread
From: Jan Beulich @ 2013-06-20 10:38 UTC (permalink / raw)
  To: David Vrabel
  Cc: John Stultz, Thomas Gleixner, xen-devel, Konrad Rzeszutek Wilk,
	linux-kernel

>>> On 19.06.13 at 17:25, David Vrabel <david.vrabel@citrix.com> wrote:
> syscore_suspend() and syscore_resume() expect there to be only one
> online CPU.  e.g., hrtimers_resume() only triggers events for the
> current CPU.  Xen's suspend path was leaving all VCPUs online and then
> attempting to fixup problems afterwards (e.g., with an explicit call
> to clock_was_set() to trigger pending high resolution timers).
> 
> Instead, disable non-boot CPUs before calling stop_machine() and
> reenable them afterwards.

In XenoLinux the so called "fast suspend" mode was specifically
added for performance reasons, and it looks like to date pv-ops
only ever supported that mode. So one question is whether
there's going to be any bad performance effect from this.

The other question is that about retaining the use of
stop_machine() then - it seems pretty pointless if you already
bring down all other CPUs.

Jan


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Xen-devel] [PATCH 3/4] x86/xen: sync the wallclock when the system time is stepped
  2013-06-19 15:25 ` [PATCH 3/4] x86/xen: sync the wallclock " David Vrabel
@ 2013-06-20 10:43   ` Jan Beulich
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Beulich @ 2013-06-20 10:43 UTC (permalink / raw)
  To: David Vrabel
  Cc: John Stultz, Thomas Gleixner, xen-devel, Konrad Rzeszutek Wilk,
	linux-kernel

>>> On 19.06.13 at 17:25, David Vrabel <david.vrabel@citrix.com> wrote:
> --- a/arch/x86/xen/time.c
> +++ b/arch/x86/xen/time.c
> @@ -211,6 +211,26 @@ static int xen_set_wallclock(const struct timespec *now)
>  
>  	return HYPERVISOR_dom0_op(&op);
>  }
> + 
> +static int xen_clock_was_set_notify(struct notifier_block *nb, unsigned long unused,
> +				    void *priv)
> +{
> +       struct timespec now;
> +       int ret;

Unused variable.

Jan

> +
> +       /*
> +        * Set the Xen wallclock from Linux system time.
> +        */
> +       now = current_kernel_time();
> +       xen_set_wallclock(&now);
> +
> +       return NOTIFY_OK;
> +}
> +
> +static struct notifier_block xen_clock_was_set_notifier = {
> +	.notifier_call = xen_clock_was_set_notify,
> +};
> +
>  
>  static struct clocksource xen_clocksource __read_mostly = {
>  	.name = "xen",



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
  2013-06-19 16:52   ` John Stultz
  2013-06-19 17:13     ` Konrad Rzeszutek Wilk
@ 2013-06-20 10:50     ` David Vrabel
  1 sibling, 0 replies; 14+ messages in thread
From: David Vrabel @ 2013-06-20 10:50 UTC (permalink / raw)
  To: John Stultz
  Cc: xen-devel, Konrad Rzeszutek Wilk, linux-kernel, Thomas Gleixner

On 19/06/13 17:52, John Stultz wrote:
> On 06/19/2013 08:25 AM, David Vrabel wrote:
>> From: David Vrabel <david.vrabel@citrix.com>
>>
>> The high resolution timer code gets notified of step changes to the
>> system time with clock_was_set() or clock_was_set_delayed() calls.  If
>> other parts of the kernel require similar notification there is no
>> clear place to hook into.
>>
>> Add a clock_was_set atomic notifier chain
>> (clock_was_set_notifier_list) and call this in place of
>> clock_was_set().  If the timekeeping locks are held, the calls are
>> deferred to a new tasklet.
>>
>> The hrtimer code adds a notifier block to this chain and uses it to
>> call (the now internal) clock_was_set().  Since the timekeeping code
>> does not call the chain from the timer irq clock_was_set_delayed() and
>> associated code can be removed.
> 
> So on my initial quick review, this *looks* pretty reasonable. I get a
> little worried about interface abuse (ie: random drivers trying to hook
> into clock_was_set_notifier_list), but we can move that into
> timekeeper_internal.h or something similar to limit that.

I can move the actual list to time.c but we still need to provide a
register_clock_was_set_notifier() call in linux/time.h as the two
current users (Xen and hrtimers) don't include and probably should not
include timekeeper_internal.h.

> The other issue here is we've been burned pretty badly in the past with
> changes to clock_was_set(), as its key to keeping timers in line with
> timekeeping.  So this will need a fair amount of testing and run time
> before this gets merged, so 3.12 is what we'd be targeting at the
> earliest (its getting a bit late for taking changes for 3.11 anyway).

hrtimer's clock_was_set() is called at the same point in the non-delayed
case.

For the delayed case, it used to be called at the beginning of the
hrtimer softirq and now it is called from a tasklet, but if I understand
this correctly, the tasklet softirq will be called before the hrtimer
one so this should give the same behaviour.

> If you want to try to push patch 1/4 in for 3.11 via the Xen tree, I'll
> see about queuing the other three for hopefully 3.12.

3.12 is fine.  I'd prefer to have a correct and well-tested fix merged.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] xen: disable non-boot VCPUs during suspend
  2013-06-20 10:38   ` [Xen-devel] " Jan Beulich
@ 2013-06-20 11:46     ` David Vrabel
  0 siblings, 0 replies; 14+ messages in thread
From: David Vrabel @ 2013-06-20 11:46 UTC (permalink / raw)
  To: Jan Beulich
  Cc: John Stultz, Thomas Gleixner, xen-devel, Konrad Rzeszutek Wilk,
	linux-kernel

On 20/06/13 11:38, Jan Beulich wrote:
>>>> On 19.06.13 at 17:25, David Vrabel <david.vrabel@citrix.com> wrote:
>> syscore_suspend() and syscore_resume() expect there to be only one
>> online CPU.  e.g., hrtimers_resume() only triggers events for the
>> current CPU.  Xen's suspend path was leaving all VCPUs online and then
>> attempting to fixup problems afterwards (e.g., with an explicit call
>> to clock_was_set() to trigger pending high resolution timers).
>>
>> Instead, disable non-boot CPUs before calling stop_machine() and
>> reenable them afterwards.
> 
> In XenoLinux the so called "fast suspend" mode was specifically
> added for performance reasons, and it looks like to date pv-ops
> only ever supported that mode. So one question is whether
> there's going to be any bad performance effect from this.

Yes :(

On a VM with 4 VCPUs, disable_boot_cpus() took > 200 ms.

I'll have to rethink this.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-06-20 11:46 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-19 15:25 [PATCHv4 0/4] xen: maintain an accurate persistent clock in more cases David Vrabel
2013-06-19 15:25 ` [PATCH 1/4] xen: disable non-boot VCPUs during suspend David Vrabel
2013-06-19 17:11   ` Konrad Rzeszutek Wilk
2013-06-20 10:01     ` David Vrabel
2013-06-20 10:38   ` [Xen-devel] " Jan Beulich
2013-06-20 11:46     ` David Vrabel
2013-06-19 15:25 ` [PATCH 2/4] time: add a notifier chain for when the system time is stepped David Vrabel
2013-06-19 16:52   ` John Stultz
2013-06-19 17:13     ` Konrad Rzeszutek Wilk
2013-06-19 17:38       ` John Stultz
2013-06-20 10:50     ` David Vrabel
2013-06-19 15:25 ` [PATCH 3/4] x86/xen: sync the wallclock " David Vrabel
2013-06-20 10:43   ` [Xen-devel] " Jan Beulich
2013-06-19 15:25 ` [PATCH 4/4] x86/xen: sync the CMOS RTC as well as the Xen wallclock David Vrabel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).