linux-pm.vger.kernel.org archive mirror
* [PATCH v2 00/13] local_clock() vs noinstr
@ 2023-05-19 10:20 Peter Zijlstra
  2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
                   ` (13 more replies)
  0 siblings, 14 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:20 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

Hi All,

This is the latest version of the local_clock_noinstr() patches.

Most of the changes are in Hyper-V and x86/vdso/gettimeofday; Michael has been
very helpful navigating the Hyper-V spec and fixing their sched_clock
implementation.

---
 arch/arm64/include/asm/arch_timer.h      |  8 +----
 arch/arm64/include/asm/io.h              | 12 +++----
 arch/loongarch/include/asm/loongarch.h   |  2 +-
 arch/loongarch/kernel/time.c             |  6 ++--
 arch/s390/include/asm/timex.h            | 13 ++++---
 arch/s390/kernel/time.c                  |  5 +++
 arch/x86/include/asm/mshyperv.h          |  5 +++
 arch/x86/include/asm/vdso/gettimeofday.h | 41 ++++++++++++++++------
 arch/x86/kernel/kvmclock.c               |  4 +--
 arch/x86/kernel/tsc.c                    | 38 +++++++++++++++-----
 arch/x86/kvm/x86.c                       |  7 ++--
 arch/x86/xen/time.c                      |  3 +-
 drivers/clocksource/arm_arch_timer.c     | 60 ++++++++++++++++++++++++--------
 drivers/clocksource/hyperv_timer.c       | 44 ++++++++++++++---------
 drivers/cpuidle/cpuidle.c                |  8 ++---
 drivers/cpuidle/poll_state.c             |  4 +--
 include/clocksource/hyperv_timer.h       | 24 +++++--------
 include/linux/math64.h                   |  2 +-
 include/linux/rbtree_latch.h             |  2 +-
 include/linux/sched/clock.h              | 17 ++++++++-
 include/linux/seqlock.h                  | 15 ++++----
 kernel/printk/printk.c                   |  2 +-
 kernel/sched/clock.c                     | 19 ++++++----
 kernel/time/sched_clock.c                | 24 +++++++++----
 kernel/time/timekeeping.c                |  4 +--
 25 files changed, 242 insertions(+), 127 deletions(-)



* [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
@ 2023-05-19 10:20 ` Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr() Peter Zijlstra
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:20 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

The read side of seqcount_latch consists of:

  do {
    seq = raw_read_seqcount_latch(&latch->seq);
    ...
  } while (read_seqcount_latch_retry(&latch->seq, seq));

which is asymmetric in the raw_ department, and sure enough,
read_seqcount_latch_retry() includes (explicit) instrumentation where
raw_read_seqcount_latch() does not.

This inconsistency becomes a problem when trying to use the latch from
noinstr code. As such, fix it by renaming read_seqcount_latch_retry()
to raw_read_seqcount_latch_retry() and re-implementing it without the
instrumentation.

Specifically, the instrumentation in question is kcsan_atomic_next(0)
in do___read_seqcount_retry(). Losing this annotation is not a
problem because raw_read_seqcount_latch() does not pass through
kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX).
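
After the rename, the read side above simply becomes (same pattern, new
name):

  do {
    seq = raw_read_seqcount_latch(&latch->seq);
    ...
  } while (raw_read_seqcount_latch_retry(&latch->seq, seq));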

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
---
 include/linux/rbtree_latch.h |    2 +-
 include/linux/seqlock.h      |   15 ++++++++-------
 kernel/printk/printk.c       |    2 +-
 kernel/time/sched_clock.c    |    2 +-
 kernel/time/timekeeping.c    |    4 ++--
 5 files changed, 13 insertions(+), 12 deletions(-)

--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -206,7 +206,7 @@ latch_tree_find(void *key, struct latch_
 	do {
 		seq = raw_read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (read_seqcount_latch_retry(&root->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&root->seq, seq));
 
 	return node;
 }
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -671,9 +671,9 @@ typedef struct {
  *
  * Return: sequence counter raw value. Use the lowest bit as an index for
  * picking which data copy to read. The full counter must then be checked
- * with read_seqcount_latch_retry().
+ * with raw_read_seqcount_latch_retry().
  */
-static inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
+static __always_inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *s)
 {
 	/*
 	 * Pairs with the first smp_wmb() in raw_write_seqcount_latch().
@@ -683,16 +683,17 @@ static inline unsigned raw_read_seqcount
 }
 
 /**
- * read_seqcount_latch_retry() - end a seqcount_latch_t read section
+ * raw_read_seqcount_latch_retry() - end a seqcount_latch_t read section
  * @s:		Pointer to seqcount_latch_t
  * @start:	count, from raw_read_seqcount_latch()
  *
  * Return: true if a read section retry is required, else false
  */
-static inline int
-read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
+static __always_inline int
+raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 {
-	return read_seqcount_retry(&s->seqcount, start);
+	smp_rmb();
+	return unlikely(READ_ONCE(s->seqcount.sequence) != start);
 }
 
 /**
@@ -752,7 +753,7 @@ read_seqcount_latch_retry(const seqcount
  *			entry = data_query(latch->data[idx], ...);
  *
  *		// This includes needed smp_rmb()
- *		} while (read_seqcount_latch_retry(&latch->seq, seq));
+ *		} while (raw_read_seqcount_latch_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -597,7 +597,7 @@ static u64 latched_seq_read_nolock(struc
 		seq = raw_read_seqcount_latch(&ls->latch);
 		idx = seq & 0x1;
 		val = ls->val[idx];
-	} while (read_seqcount_latch_retry(&ls->latch, seq));
+	} while (raw_read_seqcount_latch_retry(&ls->latch, seq));
 
 	return val;
 }
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -77,7 +77,7 @@ notrace struct clock_read_data *sched_cl
 
 notrace int sched_clock_read_retry(unsigned int seq)
 {
-	return read_seqcount_latch_retry(&cd.seq, seq);
+	return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
 unsigned long long notrace sched_clock(void)
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -450,7 +450,7 @@ static __always_inline u64 __ktime_get_f
 		tkr = tkf->base + (seq & 0x01);
 		now = ktime_to_ns(tkr->base);
 		now += fast_tk_get_delta_ns(tkr);
-	} while (read_seqcount_latch_retry(&tkf->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }
@@ -566,7 +566,7 @@ static __always_inline u64 __ktime_get_r
 		basem = ktime_to_ns(tkr->base);
 		baser = ktime_to_ns(tkr->base_real);
 		delta = fast_tk_get_delta_ns(tkr);
-	} while (read_seqcount_latch_retry(&tkf->seq, seq));
+	} while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
 
 	if (mono)
 		*mono = basem + delta;




* [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
  2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

As such, pull out the preempt_{dis,en}able_notrace() requirements from
the sched_clock_read() implementations by explicitly providing it in
the sched_clock() function.

This further requires said sched_clock_read() functions to be noinstr
themselves, for ARCH_WANTS_NO_INSTR users. See the next few patches.
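
The preempt_{dis,en}able_notrace() pair then lives solely in the regular
sched_clock() wrapper; distilled from the diff below:

  unsigned long long notrace sched_clock(void)
  {
  	unsigned long long ns;

  	preempt_disable_notrace();
  	ns = sched_clock_noinstr();
  	preempt_enable_notrace();
  	return ns;
  }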

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/time/sched_clock.c |   22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -64,7 +64,7 @@ static struct clock_data cd ____cachelin
 	.actual_read_sched_clock = jiffy_sched_clock_read,
 };
 
-static inline u64 notrace cyc_to_ns(u64 cyc, u32 mult, u32 shift)
+static __always_inline u64 cyc_to_ns(u64 cyc, u32 mult, u32 shift)
 {
 	return (cyc * mult) >> shift;
 }
@@ -80,23 +80,33 @@ notrace int sched_clock_read_retry(unsig
 	return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
-unsigned long long notrace sched_clock(void)
+unsigned long long noinstr sched_clock_noinstr(void)
 {
-	u64 cyc, res;
-	unsigned int seq;
 	struct clock_read_data *rd;
+	unsigned int seq;
+	u64 cyc, res;
 
 	do {
-		rd = sched_clock_read_begin(&seq);
+		seq = raw_read_seqcount_latch(&cd.seq);
+		rd = cd.read_data + (seq & 1);
 
 		cyc = (rd->read_sched_clock() - rd->epoch_cyc) &
 		      rd->sched_clock_mask;
 		res = rd->epoch_ns + cyc_to_ns(cyc, rd->mult, rd->shift);
-	} while (sched_clock_read_retry(seq));
+	} while (raw_read_seqcount_latch_retry(&cd.seq, seq));
 
 	return res;
 }
 
+unsigned long long notrace sched_clock(void)
+{
+	unsigned long long ns;
+	preempt_disable_notrace();
+	ns = sched_clock_noinstr();
+	preempt_enable_notrace();
+	return ns;
+}
+
 /*
  * Updating the data required to read the clock.
  *




* [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
  2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-24 16:40   ` Valentin Schneider
  2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

The next patch will want to use __raw_readl() from a noinstr section
and as such that needs to be marked __always_inline to avoid the
compiler being a silly bugger.

Turns out it already is, but its siblings are not. Finish the work
started in commit e43f1331e2ef913b ("arm64: Ask the compiler to
__always_inline functions used by KVM at HYP") for consistency's sake.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/io.h |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -22,13 +22,13 @@
  * Generic IO read/write.  These perform native-endian accesses.
  */
 #define __raw_writeb __raw_writeb
-static inline void __raw_writeb(u8 val, volatile void __iomem *addr)
+static __always_inline void __raw_writeb(u8 val, volatile void __iomem *addr)
 {
 	asm volatile("strb %w0, [%1]" : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_writew __raw_writew
-static inline void __raw_writew(u16 val, volatile void __iomem *addr)
+static __always_inline void __raw_writew(u16 val, volatile void __iomem *addr)
 {
 	asm volatile("strh %w0, [%1]" : : "rZ" (val), "r" (addr));
 }
@@ -40,13 +40,13 @@ static __always_inline void __raw_writel
 }
 
 #define __raw_writeq __raw_writeq
-static inline void __raw_writeq(u64 val, volatile void __iomem *addr)
+static __always_inline void __raw_writeq(u64 val, volatile void __iomem *addr)
 {
 	asm volatile("str %x0, [%1]" : : "rZ" (val), "r" (addr));
 }
 
 #define __raw_readb __raw_readb
-static inline u8 __raw_readb(const volatile void __iomem *addr)
+static __always_inline u8 __raw_readb(const volatile void __iomem *addr)
 {
 	u8 val;
 	asm volatile(ALTERNATIVE("ldrb %w0, [%1]",
@@ -57,7 +57,7 @@ static inline u8 __raw_readb(const volat
 }
 
 #define __raw_readw __raw_readw
-static inline u16 __raw_readw(const volatile void __iomem *addr)
+static __always_inline u16 __raw_readw(const volatile void __iomem *addr)
 {
 	u16 val;
 
@@ -80,7 +80,7 @@ static __always_inline u32 __raw_readl(c
 }
 
 #define __raw_readq __raw_readq
-static inline u64 __raw_readq(const volatile void __iomem *addr)
+static __always_inline u64 __raw_readq(const volatile void __iomem *addr)
 {
 	u64 val;
 	asm volatile(ALTERNATIVE("ldr %0, [%1]",




* [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (2 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-24 16:40   ` Valentin Schneider
  2023-05-19 10:21 ` [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read() Peter Zijlstra
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_read() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/arch_timer.h  |    8 ----
 drivers/clocksource/arm_arch_timer.c |   60 ++++++++++++++++++++++++++---------
 2 files changed, 47 insertions(+), 21 deletions(-)

--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_rea
 
 #define arch_timer_reg_read_stable(reg)					\
 	({								\
-		u64 _val;						\
-									\
-		preempt_disable_notrace();				\
-		_val = erratum_handler(read_ ## reg)();			\
-		preempt_enable_notrace();				\
-									\
-		_val;							\
+		erratum_handler(read_ ## reg)();			\
 	})
 
 /*
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum
 	return val;
 }
 
-static notrace u64 arch_counter_get_cntpct_stable(void)
+static noinstr u64 _arch_counter_get_cntpct_stable(void)
 {
 	return __arch_counter_get_cntpct_stable();
 }
 
-static notrace u64 arch_counter_get_cntpct(void)
+static notrace u64 arch_counter_get_cntpct_stable(void)
+{
+	u64 val;
+	preempt_disable_notrace();
+	val = __arch_counter_get_cntpct_stable();
+	preempt_enable_notrace();
+	return val;
+}
+
+static noinstr u64 arch_counter_get_cntpct(void)
 {
 	return __arch_counter_get_cntpct();
 }
 
-static notrace u64 arch_counter_get_cntvct_stable(void)
+static noinstr u64 _arch_counter_get_cntvct_stable(void)
 {
 	return __arch_counter_get_cntvct_stable();
 }
 
-static notrace u64 arch_counter_get_cntvct(void)
+static notrace u64 arch_counter_get_cntvct_stable(void)
+{
+	u64 val;
+	preempt_disable_notrace();
+	val = __arch_counter_get_cntvct_stable();
+	preempt_enable_notrace();
+	return val;
+}
+
+static noinstr u64 arch_counter_get_cntvct(void)
 {
 	return __arch_counter_get_cntvct();
 }
@@ -753,14 +771,14 @@ static int arch_timer_set_next_event_phy
 	return 0;
 }
 
-static u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
+static noinstr u64 arch_counter_get_cnt_mem(struct arch_timer *t, int offset_lo)
 {
 	u32 cnt_lo, cnt_hi, tmp_hi;
 
 	do {
-		cnt_hi = readl_relaxed(t->base + offset_lo + 4);
-		cnt_lo = readl_relaxed(t->base + offset_lo);
-		tmp_hi = readl_relaxed(t->base + offset_lo + 4);
+		cnt_hi = __raw_readl(t->base + offset_lo + 4);
+		cnt_lo = __raw_readl(t->base + offset_lo);
+		tmp_hi = __raw_readl(t->base + offset_lo + 4);
 	} while (cnt_hi != tmp_hi);
 
 	return ((u64) cnt_hi << 32) | cnt_lo;
@@ -1060,7 +1078,7 @@ bool arch_timer_evtstrm_available(void)
 	return cpumask_test_cpu(raw_smp_processor_id(), &evtstrm_available);
 }
 
-static u64 arch_counter_get_cntvct_mem(void)
+static noinstr u64 arch_counter_get_cntvct_mem(void)
 {
 	return arch_counter_get_cnt_mem(arch_timer_mem, CNTVCT_LO);
 }
@@ -1074,6 +1092,13 @@ struct arch_timer_kvm_info *arch_timer_g
 
 static void __init arch_counter_register(unsigned type)
 {
+	/*
+	 * Default to cp15 based access because arm64 uses this function for
+	 * sched_clock() before DT is probed and the cp15 method is guaranteed
+	 * to exist on arm64. arm doesn't use this before DT is probed so even
+	 * if we don't have the cp15 accessors we won't have a problem.
+	 */
+	u64 (*scr)(void) = arch_counter_get_cntvct;
 	u64 start_count;
 	int width;
 
@@ -1083,21 +1108,28 @@ static void __init arch_counter_register
 
 		if ((IS_ENABLED(CONFIG_ARM64) && !is_hyp_mode_available()) ||
 		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
-			if (arch_timer_counter_has_wa())
+			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntvct_stable;
-			else
+				scr = _arch_counter_get_cntvct_stable;
+			} else {
 				rd = arch_counter_get_cntvct;
+				scr = arch_counter_get_cntvct;
+			}
 		} else {
-			if (arch_timer_counter_has_wa())
+			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntpct_stable;
-			else
+				scr = _arch_counter_get_cntpct_stable;
+			} else {
 				rd = arch_counter_get_cntpct;
+				scr = arch_counter_get_cntpct;
+			}
 		}
 
 		arch_timer_read_counter = rd;
 		clocksource_counter.vdso_clock_mode = vdso_default;
 	} else {
 		arch_timer_read_counter = arch_counter_get_cntvct_mem;
+		scr = arch_counter_get_cntvct_mem;
 	}
 
 	width = arch_counter_get_width();
@@ -1113,7 +1145,7 @@ static void __init arch_counter_register
 	timecounter_init(&arch_timer_kvm_info.timecounter,
 			 &cyclecounter, start_count);
 
-	sched_clock_register(arch_timer_read_counter, width, arch_timer_rate);
+	sched_clock_register(scr, width, arch_timer_rate);
 }
 
 static void arch_timer_stop(struct clock_event_device *clk)




* [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (3 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_read() function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/loongarch/include/asm/loongarch.h |    2 +-
 arch/loongarch/kernel/time.c           |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

--- a/arch/loongarch/include/asm/loongarch.h
+++ b/arch/loongarch/include/asm/loongarch.h
@@ -1167,7 +1167,7 @@ static __always_inline void iocsr_write6
 
 #ifndef __ASSEMBLY__
 
-static inline u64 drdtime(void)
+static __always_inline u64 drdtime(void)
 {
 	int rID = 0;
 	u64 val = 0;
--- a/arch/loongarch/kernel/time.c
+++ b/arch/loongarch/kernel/time.c
@@ -190,9 +190,9 @@ static u64 read_const_counter(struct clo
 	return drdtime();
 }
 
-static u64 native_sched_clock(void)
+static noinstr u64 sched_clock_read(void)
 {
-	return read_const_counter(NULL);
+	return drdtime();
 }
 
 static struct clocksource clocksource_const = {
@@ -211,7 +211,7 @@ int __init constant_clocksource_init(voi
 
 	res = clocksource_register_hz(&clocksource_const, freq);
 
-	sched_clock_register(native_sched_clock, 64, freq);
+	sched_clock_register(sched_clock_read, 64, freq);
 
 	pr_info("Constant clock source device register\n");
 




* [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (4 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-22 14:18   ` Heiko Carstens
  2023-05-19 10:21 ` [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr() Peter Zijlstra
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr()
function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/s390/include/asm/timex.h |   13 +++++++++----
 arch/s390/kernel/time.c       |   11 ++++++++++-
 2 files changed, 19 insertions(+), 5 deletions(-)

--- a/arch/s390/include/asm/timex.h
+++ b/arch/s390/include/asm/timex.h
@@ -63,7 +63,7 @@ static inline int store_tod_clock_ext_cc
 	return cc;
 }
 
-static inline void store_tod_clock_ext(union tod_clock *tod)
+static __always_inline void store_tod_clock_ext(union tod_clock *tod)
 {
 	asm volatile("stcke %0" : "=Q" (*tod) : : "cc");
 }
@@ -177,7 +177,7 @@ static inline void local_tick_enable(uns
 
 typedef unsigned long cycles_t;
 
-static inline unsigned long get_tod_clock(void)
+static __always_inline unsigned long get_tod_clock(void)
 {
 	union tod_clock clk;
 
@@ -204,6 +204,11 @@ void init_cpu_timer(void);
 
 extern union tod_clock tod_clock_base;
 
+static __always_inline unsigned long __get_tod_clock_monotonic(void)
+{
+	return get_tod_clock() - tod_clock_base.tod;
+}
+
 /**
  * get_clock_monotonic - returns current time in clock rate units
  *
@@ -216,7 +221,7 @@ static inline unsigned long get_tod_cloc
 	unsigned long tod;
 
 	preempt_disable_notrace();
-	tod = get_tod_clock() - tod_clock_base.tod;
+	tod = __get_tod_clock_monotonic();
 	preempt_enable_notrace();
 	return tod;
 }
@@ -240,7 +245,7 @@ static inline unsigned long get_tod_cloc
  * -> ns = (th * 125) + ((tl * 125) >> 9);
  *
  */
-static inline unsigned long tod_to_ns(unsigned long todval)
+static __always_inline unsigned long tod_to_ns(unsigned long todval)
 {
 	return ((todval >> 9) * 125) + (((todval & 0x1ff) * 125) >> 9);
 }
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -102,6 +102,11 @@ void __init time_early_init(void)
 			((long) qui.old_leap * 4096000000L);
 }
 
+unsigned long long noinstr sched_clock_noinstr(void)
+{
+	return tod_to_ns(__get_tod_clock_monotonic());
+}
+
 /*
  * Scheduler clock - returns current time in nanosec units.
  */




* [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (5 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

In order to prevent the following complaint from happening, always
inline the u128 variant of mul_u64_u64_shr() -- which is what x86_64
will use.

  vmlinux.o: warning: objtool: read_hv_sched_clock_tsc+0x5a: call to mul_u64_u64_shr.constprop.0() leaves .noinstr.text section

It should compile into something like:

  asm("mul	%[mul];"
      "shrd	%rdx, %rax, %cl"
      : "+&a" (a)
      : "c" shift, [mul] "r" (mul)
      : "d");

Which is silly not to inline, but it happens.
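
For context, this series uses the u128 variant as a 64.64 fixed-point
scale; e.g. the Hyper-V TSC page conversion boils down to (sketch only,
the helper name is made up):

  static __always_inline u64 hv_scale_tsc(u64 tsc, u64 scale, u64 offset)
  {
  	/* time = ((tsc * scale) >> 64) + offset, full 128-bit intermediate */
  	return mul_u64_u64_shr(tsc, scale, 64) + offset;
  }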

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/math64.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/linux/math64.h
+++ b/include/linux/math64.h
@@ -168,7 +168,7 @@ static __always_inline u64 mul_u64_u32_s
 #endif /* mul_u64_u32_shr */
 
 #ifndef mul_u64_u64_shr
-static inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift)
+static __always_inline u64 mul_u64_u64_shr(u64 a, u64 mul, unsigned int shift)
 {
 	return (u64)(((unsigned __int128)a * mul) >> shift);
 }




* [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (6 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-31 15:27   ` Thomas Gleixner
  2023-05-31 22:46   ` Thomas Gleixner
  2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
                   ` (5 subsequent siblings)
  13 siblings, 2 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

Because of how the virtual clocks use U64_MAX as an exception value
instead of a valid time, the clocks can no longer be assumed to wrap
cleanly. This is then compounded by arch_vdso_cycles_ok() rejecting
everything with the MSB/Sign-bit set.

Therefore, the effective mask becomes S64_MAX, and the comment next to
vdso_calc_delta() stating that the mask is U64_MAX and isn't optimized
out by the compiler is just plain silly.

Now, the code has a negative filter -- to deal with TSC wobbles:

	if (cycles > last)

which is just plain wrong, because it should've been written as:

	if ((s64)(cycles - last) > 0)

to take wrapping into account, but per all the above, we don't
actually wrap on u64 anymore.
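
A hand-worked example of the resulting filter (not part of the patch):
with the effective S64_MAX mask, a small backwards wobble of
cycles = last - 2 gives

  delta = (cycles - last) & S64_MAX = 0x7ffffffffffffffe

which has bit 62 set and is clamped to 0, while a normal forward step of
cycles = last + 2 gives delta = 2, bit 62 clear, and the delta is scaled
by mult as usual.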

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/vdso/gettimeofday.h |   39 ++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -231,14 +231,17 @@ static u64 vread_pvclock(void)
 		ret = __pvclock_read_cycles(pvti, rdtsc_ordered());
 	} while (pvclock_read_retry(pvti, version));
 
-	return ret;
+	return ret & S64_MAX;
 }
 #endif
 
 #ifdef CONFIG_HYPERV_TIMER
 static u64 vread_hvclock(void)
 {
-	return hv_read_tsc_page(&hvclock_page);
+	u64 ret = hv_read_tsc_page(&hvclock_page);
+	if (likely(ret != U64_MAX))
+		ret &= S64_MAX;
+	return ret;
 }
 #endif
 
@@ -246,7 +249,7 @@ static inline u64 __arch_get_hw_counter(
 					const struct vdso_data *vd)
 {
 	if (likely(clock_mode == VDSO_CLOCKMODE_TSC))
-		return (u64)rdtsc_ordered();
+		return (u64)rdtsc_ordered() & S64_MAX;
 	/*
 	 * For any memory-mapped vclock type, we need to make sure that gcc
 	 * doesn't cleverly hoist a load before the mode check.  Otherwise we
@@ -284,6 +287,9 @@ static inline bool arch_vdso_clocksource
  * which can be invalidated asynchronously and indicate invalidation by
  * returning U64_MAX, which can be effectively tested by checking for a
  * negative value after casting it to s64.
+ *
+ * This effectively forces a S64_MAX mask on the calculations, unlike the
+ * U64_MAX mask normally used by x86 clocksources.
  */
 static inline bool arch_vdso_cycles_ok(u64 cycles)
 {
@@ -303,18 +309,29 @@ static inline bool arch_vdso_cycles_ok(u
  * @last. If not then use @last, which is the base time of the current
  * conversion period.
  *
- * This variant also removes the masking of the subtraction because the
- * clocksource mask of all VDSO capable clocksources on x86 is U64_MAX
- * which would result in a pointless operation. The compiler cannot
- * optimize it away as the mask comes from the vdso data and is not compile
- * time constant.
+ * This variant also uses a custom mask because while the clocksource mask of
+ * all the VDSO capable clocksources on x86 is U64_MAX, the above code uses
+ * U64_MAX as an exception value; additionally, arch_vdso_cycles_ok() above
+ * declares everything with the MSB/Sign-bit set as invalid. Therefore the
+ * effective mask is S64_MAX.
  */
 static __always_inline
 u64 vdso_calc_delta(u64 cycles, u64 last, u64 mask, u32 mult)
 {
-	if (cycles > last)
-		return (cycles - last) * mult;
-	return 0;
+	/*
+	 * Due to the MSB/Sign-bit being used as invalid marker (see
+	 * arch_vdso_cycles_ok() above), the effective mask is S64_MAX.
+	 */
+	u64 delta = (cycles - last) & S64_MAX;
+
+	/*
+	 * Due to the above mentioned TSC wobbles, filter out negative motion.
+	 * Per the above masking, the effective sign bit is now bit 62.
+	 */
+	if (unlikely(delta & (1ULL << 62)))
+		return 0;
+
+	return delta * mult;
 }
 #define vdso_calc_delta vdso_calc_delta
 




* [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (7 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 18:38   ` Michael Kelley (LINUX)
  2023-05-19 10:21 ` [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock() Peter Zijlstra
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

Currently hv_read_tsc_page_tsc() (ab)uses the (valid) time value of
U64_MAX as an error return. This breaks the clean wrap-around of the
clock.

Modify the function signature to return a boolean state and provide
another u64 pointer to store the actual time on success. This obviates
the need to steal one time value and restores the full counter width.
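
Callers then follow the new convention (sketch; the hyperv_timer.c hunk
below is the real thing):

  u64 tsc, time;

  if (hv_read_tsc_page_tsc(tsc_page, &tsc, &time))
  	return time;	/* TSC page valid */

  /* otherwise fall back, e.g. to the reference counter MSR */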

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/vdso/gettimeofday.h |   10 ++++++----
 arch/x86/kvm/x86.c                       |    7 +++----
 drivers/clocksource/hyperv_timer.c       |   16 +++++++++++-----
 include/clocksource/hyperv_timer.h       |   24 +++++++++---------------
 4 files changed, 29 insertions(+), 28 deletions(-)

--- a/arch/x86/include/asm/vdso/gettimeofday.h
+++ b/arch/x86/include/asm/vdso/gettimeofday.h
@@ -238,10 +238,12 @@ static u64 vread_pvclock(void)
 #ifdef CONFIG_HYPERV_TIMER
 static u64 vread_hvclock(void)
 {
-	u64 ret = hv_read_tsc_page(&hvclock_page);
-	if (likely(ret != U64_MAX))
-		ret &= S64_MAX;
-	return ret;
+	u64 tsc, time;
+
+	if (hv_read_tsc_page_tsc(&hvclock_page, &tsc, &time))
+		return time & S64_MAX;
+
+	return U64_MAX;
 }
 #endif
 
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2799,14 +2799,13 @@ static u64 read_tsc(void)
 static inline u64 vgettsc(struct pvclock_clock *clock, u64 *tsc_timestamp,
 			  int *mode)
 {
-	long v;
 	u64 tsc_pg_val;
+	long v;
 
 	switch (clock->vclock_mode) {
 	case VDSO_CLOCKMODE_HVCLOCK:
-		tsc_pg_val = hv_read_tsc_page_tsc(hv_get_tsc_page(),
-						  tsc_timestamp);
-		if (tsc_pg_val != U64_MAX) {
+		if (hv_read_tsc_page_tsc(hv_get_tsc_page(),
+					 tsc_timestamp, &tsc_pg_val)) {
 			/* TSC page valid */
 			*mode = VDSO_CLOCKMODE_HVCLOCK;
 			v = (tsc_pg_val - clock->cycle_last) &
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -393,14 +393,20 @@ struct ms_hyperv_tsc_page *hv_get_tsc_pa
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static u64 notrace read_hv_clock_tsc(void)
+static notrace u64 read_hv_clock_tsc(void)
 {
-	u64 current_tick = hv_read_tsc_page(hv_get_tsc_page());
+	u64 cur_tsc, time;
 
-	if (current_tick == U64_MAX)
-		current_tick = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
+	/*
+	 * The Hyper-V Top-Level Functional Spec (TLFS), section Timers,
+	 * subsection Reference Counter, guarantees that the TSC and MSR
+	 * times are in sync and monotonic. Therefore we can fall back
+	 * to the MSR in case the TSC page indicates unavailability.
+	 */
+	if (!hv_read_tsc_page_tsc(tsc_page, &cur_tsc, &time))
+		time = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
 
-	return current_tick;
+	return time;
 }
 
 static u64 notrace read_hv_clock_tsc_cs(struct clocksource *arg)
--- a/include/clocksource/hyperv_timer.h
+++ b/include/clocksource/hyperv_timer.h
@@ -38,8 +38,9 @@ extern void hv_remap_tsc_clocksource(voi
 extern unsigned long hv_get_tsc_pfn(void);
 extern struct ms_hyperv_tsc_page *hv_get_tsc_page(void);
 
-static inline notrace u64
-hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc)
+static __always_inline bool
+hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
+		     u64 *cur_tsc, u64 *time)
 {
 	u64 scale, offset;
 	u32 sequence;
@@ -63,7 +64,7 @@ hv_read_tsc_page_tsc(const struct ms_hyp
 	do {
 		sequence = READ_ONCE(tsc_pg->tsc_sequence);
 		if (!sequence)
-			return U64_MAX;
+			return false;
 		/*
 		 * Make sure we read sequence before we read other values from
 		 * TSC page.
@@ -82,15 +83,8 @@ hv_read_tsc_page_tsc(const struct ms_hyp
 
 	} while (READ_ONCE(tsc_pg->tsc_sequence) != sequence);
 
-	return mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
-}
-
-static inline notrace u64
-hv_read_tsc_page(const struct ms_hyperv_tsc_page *tsc_pg)
-{
-	u64 cur_tsc;
-
-	return hv_read_tsc_page_tsc(tsc_pg, &cur_tsc);
+	*time = mul_u64_u64_shr(*cur_tsc, scale, 64) + offset;
+	return true;
 }
 
 #else /* CONFIG_HYPERV_TIMER */
@@ -104,10 +98,10 @@ static inline struct ms_hyperv_tsc_page
 	return NULL;
 }
 
-static inline u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
-				       u64 *cur_tsc)
+static __always_inline bool
+hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg, u64 *cur_tsc, u64 *time)
 {
-	return U64_MAX;
+	return false;
 }
 
 static inline int hv_stimer_cleanup(unsigned int cpu) { return 0; }




* [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (8 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr() Peter Zijlstra
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm, Michael Kelley

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by making the Hyper-V TSC and MSR sched_clock
implementations noinstr.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/mshyperv.h    |    5 +++++
 drivers/clocksource/hyperv_timer.c |   32 ++++++++++++++++++--------------
 2 files changed, 23 insertions(+), 14 deletions(-)

--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -257,6 +257,11 @@ void hv_set_register(unsigned int reg, u
 u64 hv_get_non_nested_register(unsigned int reg);
 void hv_set_non_nested_register(unsigned int reg, u64 value);
 
+static __always_inline u64 hv_raw_get_register(unsigned int reg)
+{
+	return __rdmsr(reg);
+}
+
 #else /* CONFIG_HYPERV */
 static inline void hyperv_init(void) {}
 static inline void hyperv_setup_mmu_ops(void) {}
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -365,6 +365,20 @@ void hv_stimer_global_cleanup(void)
 }
 EXPORT_SYMBOL_GPL(hv_stimer_global_cleanup);
 
+static __always_inline u64 read_hv_clock_msr(void)
+{
+	/*
+	 * Read the partition counter to get the current tick count. This count
+	 * is set to 0 when the partition is created and is incremented in 100
+	 * nanosecond units.
+	 *
+	 * Use hv_raw_get_register() because this function is used from
+	 * noinstr. Notably, while HV_REGISTER_TIME_REF_COUNT is a synthetic
+	 * register, it doesn't need the GHCB path.
+	 */
+	return hv_raw_get_register(HV_REGISTER_TIME_REF_COUNT);
+}
+
 /*
  * Code and definitions for the Hyper-V clocksources.  Two
  * clocksources are defined: one that reads the Hyper-V defined MSR, and
@@ -393,7 +407,7 @@ struct ms_hyperv_tsc_page *hv_get_tsc_pa
 }
 EXPORT_SYMBOL_GPL(hv_get_tsc_page);
 
-static notrace u64 read_hv_clock_tsc(void)
+static __always_inline u64 read_hv_clock_tsc(void)
 {
 	u64 cur_tsc, time;
 
@@ -404,7 +418,7 @@ static notrace u64 read_hv_clock_tsc(voi
 	 * to the MSR in case the TSC page indicates unavailability.
 	 */
 	if (!hv_read_tsc_page_tsc(tsc_page, &cur_tsc, &time))
-		time = hv_get_register(HV_REGISTER_TIME_REF_COUNT);
+		time = read_hv_clock_msr();
 
 	return time;
 }
@@ -414,7 +428,7 @@ static u64 notrace read_hv_clock_tsc_cs(
 	return read_hv_clock_tsc();
 }
 
-static u64 notrace read_hv_sched_clock_tsc(void)
+static u64 noinstr read_hv_sched_clock_tsc(void)
 {
 	return (read_hv_clock_tsc() - hv_sched_clock_offset) *
 		(NSEC_PER_SEC / HV_CLOCK_HZ);
@@ -466,22 +480,12 @@ static struct clocksource hyperv_cs_tsc
 #endif
 };
 
-static u64 notrace read_hv_clock_msr(void)
-{
-	/*
-	 * Read the partition counter to get the current tick count. This count
-	 * is set to 0 when the partition is created and is incremented in
-	 * 100 nanosecond units.
-	 */
-	return hv_get_register(HV_REGISTER_TIME_REF_COUNT);
-}
-
 static u64 notrace read_hv_clock_msr_cs(struct clocksource *arg)
 {
 	return read_hv_clock_msr();
 }
 
-static u64 notrace read_hv_sched_clock_msr(void)
+static u64 noinstr read_hv_sched_clock_msr(void)
 {
 	return (read_hv_clock_msr() - hv_sched_clock_offset) *
 		(NSEC_PER_SEC / HV_CLOCK_HZ);




* [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (9 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr() Peter Zijlstra
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

With the intent to provide local_clock_noinstr(), a variant of
local_clock() that's safe to be called from noinstr code (with the
assumption that any such code will already be non-preemptible),
prepare for things by providing a noinstr sched_clock_noinstr()
function.

Specifically, preempt_enable_*() calls out to schedule(), which upsets
noinstr validation efforts.

  vmlinux.o: warning: objtool: native_sched_clock+0x96: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
  vmlinux.o: warning: objtool: kvm_clock_read+0x22: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/hyperv_timer.h |    5 ++++
 arch/x86/kernel/kvmclock.c          |    4 +--
 arch/x86/kernel/tsc.c               |   38 +++++++++++++++++++++++--------
 arch/x86/kvm/x86.c                  |    7 ++---
 arch/x86/xen/time.c                 |    3 --
 drivers/clocksource/hyperv_timer.c  |   44 ++++++++++++++++++++++--------------
 include/clocksource/hyperv_timer.h  |   24 +++++++------------
 7 files changed, 76 insertions(+), 49 deletions(-)

--- a/arch/x86/kernel/kvmclock.c
+++ b/arch/x86/kernel/kvmclock.c
@@ -71,7 +71,7 @@ static int kvm_set_wallclock(const struc
 	return -ENODEV;
 }
 
-static noinstr u64 kvm_clock_read(void)
+static u64 kvm_clock_read(void)
 {
 	u64 ret;
 
@@ -88,7 +88,7 @@ static u64 kvm_clock_get_cycles(struct c
 
 static noinstr u64 kvm_sched_clock_read(void)
 {
-	return kvm_clock_read() - kvm_sched_clock_offset;
+	return pvclock_clocksource_read_nowd(this_cpu_pvti()) - kvm_sched_clock_offset;
 }
 
 static inline void kvm_sched_clock_init(bool stable)
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -69,12 +69,10 @@ static int __init tsc_early_khz_setup(ch
 }
 early_param("tsc_early_khz", tsc_early_khz_setup);
 
-__always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
+__always_inline void __cyc2ns_read(struct cyc2ns_data *data)
 {
 	int seq, idx;
 
-	preempt_disable_notrace();
-
 	do {
 		seq = this_cpu_read(cyc2ns.seq.seqcount.sequence);
 		idx = seq & 1;
@@ -86,6 +84,12 @@ __always_inline void cyc2ns_read_begin(s
 	} while (unlikely(seq != this_cpu_read(cyc2ns.seq.seqcount.sequence)));
 }
 
+__always_inline void cyc2ns_read_begin(struct cyc2ns_data *data)
+{
+	preempt_disable_notrace();
+	__cyc2ns_read(data);
+}
+
 __always_inline void cyc2ns_read_end(void)
 {
 	preempt_enable_notrace();
@@ -115,18 +119,25 @@ __always_inline void cyc2ns_read_end(voi
  *                      -johnstul@us.ibm.com "math is hard, lets go shopping!"
  */
 
-static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
+static __always_inline unsigned long long __cycles_2_ns(unsigned long long cyc)
 {
 	struct cyc2ns_data data;
 	unsigned long long ns;
 
-	cyc2ns_read_begin(&data);
+	__cyc2ns_read(&data);
 
 	ns = data.cyc2ns_offset;
 	ns += mul_u64_u32_shr(cyc, data.cyc2ns_mul, data.cyc2ns_shift);
 
-	cyc2ns_read_end();
+	return ns;
+}
 
+static __always_inline unsigned long long cycles_2_ns(unsigned long long cyc)
+{
+	unsigned long long ns;
+	preempt_disable_notrace();
+	ns = __cycles_2_ns(cyc);
+	preempt_enable_notrace();
 	return ns;
 }
 
@@ -223,7 +234,7 @@ noinstr u64 native_sched_clock(void)
 		u64 tsc_now = rdtsc();
 
 		/* return the value in ns */
-		return cycles_2_ns(tsc_now);
+		return __cycles_2_ns(tsc_now);
 	}
 
 	/*
@@ -250,7 +261,7 @@ u64 native_sched_clock_from_tsc(u64 tsc)
 /* We need to define a real function for sched_clock, to override the
    weak default version */
 #ifdef CONFIG_PARAVIRT
-noinstr u64 sched_clock(void)
+noinstr u64 sched_clock_noinstr(void)
 {
 	return paravirt_sched_clock();
 }
@@ -260,11 +271,20 @@ bool using_native_sched_clock(void)
 	return static_call_query(pv_sched_clock) == native_sched_clock;
 }
 #else
-u64 sched_clock(void) __attribute__((alias("native_sched_clock")));
+u64 sched_clock_noinstr(void) __attribute__((alias("native_sched_clock")));
 
 bool using_native_sched_clock(void) { return true; }
 #endif
 
+notrace u64 sched_clock(void)
+{
+	u64 now;
+	preempt_disable_notrace();
+	now = sched_clock_noinstr();
+	preempt_enable_notrace();
+	return now;
+}
+
 int check_tsc_unstable(void)
 {
 	return tsc_unstable;
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -66,11 +66,10 @@ static noinstr u64 xen_sched_clock(void)
         struct pvclock_vcpu_time_info *src;
 	u64 ret;
 
-	preempt_disable_notrace();
 	src = &__this_cpu_read(xen_vcpu)->time;
 	ret = pvclock_clocksource_read_nowd(src);
 	ret -= xen_sched_clock_offset;
-	preempt_enable_notrace();
+
 	return ret;
 }
 




* [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (10 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
  2023-05-19 18:48 ` [PATCH v2 00/13] local_clock() vs noinstr Michael Kelley (LINUX)
  13 siblings, 0 replies; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

Now that all ARCH_WANTS_NO_INSTR architectures (arm64, loongarch,
s390, x86) provide sched_clock_noinstr(), use this to provide
local_clock_noinstr().

This local_clock_noinstr() will be safe to use from noinstr code with
the assumption that any such noinstr code is non-preemptible (it had
better be, entry code will have IRQs disabled while __cpuidle must
have preemption disabled).

Specifically, preempt_enable_notrace(), a common part of many a
sched_clock() implementation, calls out to schedule() -- even though,
per the above, it will never trigger -- which frustrates noinstr
validation.

  vmlinux.o: warning: objtool: local_clock+0xb5: call to preempt_schedule_notrace_thunk() leaves .noinstr.text section
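
Usage sketch (function name made up): assuming the caller is noinstr and,
per the above, already non-preemptible, this is the pattern the next patch
applies to cpuidle:

  static noinstr u64 my_noinstr_path(void)
  {
  	u64 t0 = local_clock_noinstr();	/* no preempt_disable_notrace() needed */

  	/* ... non-preemptible noinstr work ... */

  	return local_clock_noinstr() - t0;
  }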

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/sched/clock.h |   17 ++++++++++++++++-
 kernel/sched/clock.c        |   19 +++++++++++++------
 2 files changed, 29 insertions(+), 7 deletions(-)

--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -12,7 +12,16 @@
  *
  * Please use one of the three interfaces below.
  */
-extern unsigned long long notrace sched_clock(void);
+extern u64 sched_clock(void);
+
+#if defined(CONFIG_ARCH_WANTS_NO_INSTR) || defined(CONFIG_GENERIC_SCHED_CLOCK)
+extern u64 sched_clock_noinstr(void);
+#else
+static __always_inline u64 sched_clock_noinstr(void)
+{
+	return sched_clock();
+}
+#endif
 
 /*
  * See the comment in kernel/sched/clock.c
@@ -45,6 +54,11 @@ static inline u64 cpu_clock(int cpu)
 	return sched_clock();
 }
 
+static __always_inline u64 local_clock_noinstr(void)
+{
+	return sched_clock_noinstr();
+}
+
 static __always_inline u64 local_clock(void)
 {
 	return sched_clock();
@@ -79,6 +93,7 @@ static inline u64 cpu_clock(int cpu)
 	return sched_clock_cpu(cpu);
 }
 
+extern u64 local_clock_noinstr(void);
 extern u64 local_clock(void);
 
 #endif
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -266,7 +266,7 @@ static __always_inline u64 sched_clock_l
 	s64 delta;
 
 again:
-	now = sched_clock();
+	now = sched_clock_noinstr();
 	delta = now - scd->tick_raw;
 	if (unlikely(delta < 0))
 		delta = 0;
@@ -293,22 +293,29 @@ static __always_inline u64 sched_clock_l
 	return clock;
 }
 
-noinstr u64 local_clock(void)
+noinstr u64 local_clock_noinstr(void)
 {
 	u64 clock;
 
 	if (static_branch_likely(&__sched_clock_stable))
-		return sched_clock() + __sched_clock_offset;
+		return sched_clock_noinstr() + __sched_clock_offset;
 
 	if (!static_branch_likely(&sched_clock_running))
-		return sched_clock();
+		return sched_clock_noinstr();
 
-	preempt_disable_notrace();
 	clock = sched_clock_local(this_scd());
-	preempt_enable_notrace();
 
 	return clock;
 }
+
+u64 local_clock(void)
+{
+	u64 now;
+	preempt_disable_notrace();
+	now = local_clock_noinstr();
+	preempt_enable_notrace();
+	return now;
+}
 EXPORT_SYMBOL_GPL(local_clock);
 
 static notrace u64 sched_clock_remote(struct sched_clock_data *scd)




* [PATCH v2 13/13] cpuidle: Use local_clock_noinstr()
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (11 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr() Peter Zijlstra
@ 2023-05-19 10:21 ` Peter Zijlstra
  2023-05-19 14:13   ` Rafael J. Wysocki
  2023-05-19 18:48 ` [PATCH v2 00/13] local_clock() vs noinstr Michael Kelley (LINUX)
  13 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2023-05-19 10:21 UTC (permalink / raw)
  To: bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

With the introduction of local_clock_noinstr(), local_clock() itself
is no longer marked noinstr; use the correct function.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 drivers/cpuidle/cpuidle.c    |    8 ++++----
 drivers/cpuidle/poll_state.c |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -145,7 +145,7 @@ static noinstr void enter_s2idle_proper(
 
 	instrumentation_begin();
 
-	time_start = ns_to_ktime(local_clock());
+	time_start = ns_to_ktime(local_clock_noinstr());
 
 	tick_freeze();
 	/*
@@ -169,7 +169,7 @@ static noinstr void enter_s2idle_proper(
 	tick_unfreeze();
 	start_critical_timings();
 
-	time_end = ns_to_ktime(local_clock());
+	time_end = ns_to_ktime(local_clock_noinstr());
 
 	dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
 	dev->states_usage[index].s2idle_usage++;
@@ -243,7 +243,7 @@ noinstr int cpuidle_enter_state(struct c
 	sched_idle_set_state(target_state);
 
 	trace_cpu_idle(index, dev->cpu);
-	time_start = ns_to_ktime(local_clock());
+	time_start = ns_to_ktime(local_clock_noinstr());
 
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
@@ -276,7 +276,7 @@ noinstr int cpuidle_enter_state(struct c
 	start_critical_timings();
 
 	sched_clock_idle_wakeup_event();
-	time_end = ns_to_ktime(local_clock());
+	time_end = ns_to_ktime(local_clock_noinstr());
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
 
 	/* The cpu is no longer idle or about to enter idle. */
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -15,7 +15,7 @@ static int __cpuidle poll_idle(struct cp
 {
 	u64 time_start;
 
-	time_start = local_clock();
+	time_start = local_clock_noinstr();
 
 	dev->poll_time_limit = false;
 
@@ -32,7 +32,7 @@ static int __cpuidle poll_idle(struct cp
 				continue;
 
 			loop_count = 0;
-			if (local_clock() - time_start > limit) {
+			if (local_clock_noinstr() - time_start > limit) {
 				dev->poll_time_limit = true;
 				break;
 			}



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 13/13] cpuidle: Use local_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
@ 2023-05-19 14:13   ` Rafael J. Wysocki
  0 siblings, 0 replies; 25+ messages in thread
From: Rafael J. Wysocki @ 2023-05-19 14:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, hca, gor, agordeev, borntraeger, svens, pbonzini,
	wanpengli, vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa,
	jgross, boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu,
	decui, rafael, longman, boqun.feng, pmladek, senozhatsky, rostedt,
	john.ogness, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, bristot, vschneid, jstultz, sboyd, linux-kernel,
	loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On Fri, May 19, 2023 at 12:33 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> With the introduction of local_clock_noinstr(), local_clock() itself
> is no longer marked noinstr, use the correct function.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

> ---
>  drivers/cpuidle/cpuidle.c    |    8 ++++----
>  drivers/cpuidle/poll_state.c |    4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> --- a/drivers/cpuidle/cpuidle.c
> +++ b/drivers/cpuidle/cpuidle.c
> @@ -145,7 +145,7 @@ static noinstr void enter_s2idle_proper(
>
>         instrumentation_begin();
>
> -       time_start = ns_to_ktime(local_clock());
> +       time_start = ns_to_ktime(local_clock_noinstr());
>
>         tick_freeze();
>         /*
> @@ -169,7 +169,7 @@ static noinstr void enter_s2idle_proper(
>         tick_unfreeze();
>         start_critical_timings();
>
> -       time_end = ns_to_ktime(local_clock());
> +       time_end = ns_to_ktime(local_clock_noinstr());
>
>         dev->states_usage[index].s2idle_time += ktime_us_delta(time_end, time_start);
>         dev->states_usage[index].s2idle_usage++;
> @@ -243,7 +243,7 @@ noinstr int cpuidle_enter_state(struct c
>         sched_idle_set_state(target_state);
>
>         trace_cpu_idle(index, dev->cpu);
> -       time_start = ns_to_ktime(local_clock());
> +       time_start = ns_to_ktime(local_clock_noinstr());
>
>         stop_critical_timings();
>         if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE)) {
> @@ -276,7 +276,7 @@ noinstr int cpuidle_enter_state(struct c
>         start_critical_timings();
>
>         sched_clock_idle_wakeup_event();
> -       time_end = ns_to_ktime(local_clock());
> +       time_end = ns_to_ktime(local_clock_noinstr());
>         trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
>
>         /* The cpu is no longer idle or about to enter idle. */
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -15,7 +15,7 @@ static int __cpuidle poll_idle(struct cp
>  {
>         u64 time_start;
>
> -       time_start = local_clock();
> +       time_start = local_clock_noinstr();
>
>         dev->poll_time_limit = false;
>
> @@ -32,7 +32,7 @@ static int __cpuidle poll_idle(struct cp
>                                 continue;
>
>                         loop_count = 0;
> -                       if (local_clock() - time_start > limit) {
> +                       if (local_clock_noinstr() - time_start > limit) {
>                                 dev->poll_time_limit = true;
>                                 break;
>                         }
>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX
  2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
@ 2023-05-19 18:38   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Kelley (LINUX) @ 2023-05-19 18:38 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy@linutronix.de
  Cc: Mark Rutland, maz@kernel.org, catalin.marinas@arm.com,
	will@kernel.org, chenhuacai@kernel.org, kernel@xen0n.name,
	hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com,
	borntraeger@linux.ibm.com, svens@linux.ibm.com,
	pbonzini@redhat.com, wanpengli@tencent.com, vkuznets@redhat.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	jgross@suse.com, boris.ostrovsky@oracle.com,
	daniel.lezcano@linaro.org, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Dexuan Cui, rafael@kernel.org,
	longman@redhat.com, boqun.feng@gmail.com, pmladek@suse.com,
	senozhatsky@chromium.org, rostedt@goodmis.org,
	john.ogness@linutronix.de, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, jstultz@google.com, sboyd@kernel.org,
	linux-kernel@vger.kernel.org, loongarch@lists.linux.dev,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-pm@vger.kernel.org

From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, May 19, 2023 3:21 AM
> 
> Currently hv_read_tsc_page_tsc() (ab)uses the (valid) time value of
> U64_MAX as an error return. This breaks the clean wrap-around of the
> clock.
> 
> Modify the function signature to return a boolean state and provide
> another u64 pointer to store the actual time on success. This obviates
> the need to steal one time value and restores the full counter width.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/include/asm/vdso/gettimeofday.h |   10 ++++++----
>  arch/x86/kvm/x86.c                       |    7 +++----
>  drivers/clocksource/hyperv_timer.c       |   16 +++++++++++-----
>  include/clocksource/hyperv_timer.h       |   24 +++++++++---------------
>  4 files changed, 29 insertions(+), 28 deletions(-)
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
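
The interface change described in the quoted changelog has roughly this
shape (an illustrative reconstruction from the description above, not the
exact hunks of the patch):

	/* before (illustrative): the return value doubled as both a time and
	 * an error, with the otherwise valid value U64_MAX stolen as the error */
	u64 hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
				 u64 *cur_tsc);

	/* after (illustrative): success/failure is an explicit boolean and the
	 * time comes back through a pointer, so the full u64 range stays valid */
	bool hv_read_tsc_page_tsc(const struct ms_hyperv_tsc_page *tsc_pg,
				  u64 *cur_tsc, u64 *time);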


^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH v2 00/13] local_clock() vs noinstr
  2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
                   ` (12 preceding siblings ...)
  2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
@ 2023-05-19 18:48 ` Michael Kelley (LINUX)
  13 siblings, 0 replies; 25+ messages in thread
From: Michael Kelley (LINUX) @ 2023-05-19 18:48 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy@linutronix.de
  Cc: Mark Rutland, maz@kernel.org, catalin.marinas@arm.com,
	will@kernel.org, chenhuacai@kernel.org, kernel@xen0n.name,
	hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com,
	borntraeger@linux.ibm.com, svens@linux.ibm.com,
	pbonzini@redhat.com, wanpengli@tencent.com, vkuznets@redhat.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	jgross@suse.com, boris.ostrovsky@oracle.com,
	daniel.lezcano@linaro.org, KY Srinivasan, Haiyang Zhang,
	wei.liu@kernel.org, Dexuan Cui, rafael@kernel.org,
	longman@redhat.com, boqun.feng@gmail.com, pmladek@suse.com,
	senozhatsky@chromium.org, rostedt@goodmis.org,
	john.ogness@linutronix.de, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	vschneid@redhat.com, jstultz@google.com, sboyd@kernel.org,
	linux-kernel@vger.kernel.org, loongarch@lists.linux.dev,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-pm@vger.kernel.org

From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, May 19, 2023 3:21 AM
> 
> Hi All,
> 
> latest version of the local_clock_noinstr() patches.
> 
> Most of the changes are in Hyper-V and x86/vdso/gettimeofday; Michael has been
> very helpful navigating the Hyper-V spec and fixing their sched_clock
> implementation.
> 
> ---
>  arch/arm64/include/asm/arch_timer.h      |  8 +----
>  arch/arm64/include/asm/io.h              | 12 +++----
>  arch/loongarch/include/asm/loongarch.h   |  2 +-
>  arch/loongarch/kernel/time.c             |  6 ++--
>  arch/s390/include/asm/timex.h            | 13 ++++---
>  arch/s390/kernel/time.c                  |  5 +++
>  arch/x86/include/asm/mshyperv.h          |  5 +++
>  arch/x86/include/asm/vdso/gettimeofday.h | 41 ++++++++++++++++------
>  arch/x86/kernel/kvmclock.c               |  4 +--
>  arch/x86/kernel/tsc.c                    | 38 +++++++++++++++-----
>  arch/x86/kvm/x86.c                       |  7 ++--
>  arch/x86/xen/time.c                      |  3 +-
>  drivers/clocksource/arm_arch_timer.c     | 60 ++++++++++++++++++++++++--------
>  drivers/clocksource/hyperv_timer.c       | 44 ++++++++++++++---------
>  drivers/cpuidle/cpuidle.c                |  8 ++---
>  drivers/cpuidle/poll_state.c             |  4 +--
>  include/clocksource/hyperv_timer.h       | 24 +++++--------
>  include/linux/math64.h                   |  2 +-
>  include/linux/rbtree_latch.h             |  2 +-
>  include/linux/sched/clock.h              | 17 ++++++++-
>  include/linux/seqlock.h                  | 15 ++++----
>  kernel/printk/printk.c                   |  2 +-
>  kernel/sched/clock.c                     | 19 ++++++----
>  kernel/time/sched_clock.c                | 24 +++++++++----
>  kernel/time/timekeeping.c                |  4 +--
>  25 files changed, 242 insertions(+), 127 deletions(-)

Based on linux-next20230519, tested the full series in a Hyper-V
VM on x86/x64.   Did mainly a basic smoke test.  Sched clock
appears to work correctly.  Verified correct operation of the TSC page
clocksource and the MSR-based clocksource.  Verified that vDSO
gettimeofday() works with the Hyper-V TSC page clocksource and
is correctly disabled with the Hyper-V MSR-based clocksource.
Tested in a normal VM and in an "SEV-SNP with vTOM" VM.

Tested-by: Michael Kelley <mikelley@microsoft.com>  # Hyper-V

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr()
  2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
@ 2023-05-22 14:18   ` Heiko Carstens
  0 siblings, 0 replies; 25+ messages in thread
From: Heiko Carstens @ 2023-05-22 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, gor, agordeev, borntraeger, svens, pbonzini, wanpengli,
	vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa, jgross,
	boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu, decui,
	rafael, longman, boqun.feng, pmladek, senozhatsky, rostedt,
	john.ogness, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, bristot, vschneid, jstultz, sboyd, linux-kernel,
	loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On Fri, May 19, 2023 at 12:21:04PM +0200, Peter Zijlstra wrote:
> With the intent to provide local_clock_noinstr(), a variant of
> local_clock() that's safe to be called from noinstr code (with the
> assumption that any such code will already be non-preemptible),
> prepare for things by providing a noinstr sched_clock_noinstr()
> function.
> 
> Specifically, preempt_enable_*() calls out to schedule(), which upsets
> noinstr validation efforts.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/s390/include/asm/timex.h |   13 +++++++++----
>  arch/s390/kernel/time.c       |   11 ++++++++++-
>  2 files changed, 19 insertions(+), 5 deletions(-)

Acked-by: Heiko Carstens <hca@linux.ibm.com>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]()
  2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
@ 2023-05-24 16:40   ` Valentin Schneider
  0 siblings, 0 replies; 25+ messages in thread
From: Valentin Schneider @ 2023-05-24 16:40 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, jstultz, sboyd, linux-kernel, loongarch, linux-s390, kvm,
	linux-hyperv, linux-pm

On 19/05/23 12:21, Peter Zijlstra wrote:
> The next patch will want to use __raw_readl() from a noinstr section
> and as such that needs to be marked __always_inline to avoid the
> compiler being a silly bugger.
>
> Turns out it already is, but its siblings are not. Finish the work
> started in commit e43f1331e2ef913b ("arm64: Ask the compiler to
> __always_inline functions used by KVM at HYP") for consistency's sake.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Reviewed-by: Valentin Schneider <vschneid@redhat.com>
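
As an aside, the failure mode the quoted changelog guards against looks
roughly like this (a sketch, helper names made up):

	#include <linux/io.h>

	/* plain 'inline' is only a hint; if the compiler emits this helper
	 * out of line, the call below becomes a call into instrumentable text */
	static inline u32 mmio_read32(const volatile void __iomem *addr)
	{
		return __raw_readl(addr);
	}

	static noinstr u64 read_counter_noinstr(void __iomem *base)
	{
		/* objtool's noinstr validation flags calls that leave
		 * .noinstr.text; __always_inline on the helper avoids that */
		return mmio_read32(base);
	}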


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
@ 2023-05-24 16:40   ` Valentin Schneider
  2023-06-02 11:54     ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: Valentin Schneider @ 2023-05-24 16:40 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	tglx, mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, jstultz, sboyd, linux-kernel, loongarch, linux-s390, kvm,
	linux-hyperv, linux-pm

On 19/05/23 12:21, Peter Zijlstra wrote:
> With the intent to provide local_clock_noinstr(), a variant of
> local_clock() that's safe to be called from noinstr code (with the
> assumption that any such code will already be non-preemptible),
> prepare for things by providing a noinstr sched_clock_read() function.
>
> Specifically, preempt_enable_*() calls out to schedule(), which upsets
> noinstr validation efforts.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm64/include/asm/arch_timer.h  |    8 ----
>  drivers/clocksource/arm_arch_timer.c |   60 ++++++++++++++++++++++++++---------
>  2 files changed, 47 insertions(+), 21 deletions(-)
>
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_rea
>
>  #define arch_timer_reg_read_stable(reg)					\
>       ({								\
> -		u64 _val;						\
> -									\
> -		preempt_disable_notrace();				\
> -		_val = erratum_handler(read_ ## reg)();			\
> -		preempt_enable_notrace();				\
> -									\
> -		_val;							\
> +		erratum_handler(read_ ## reg)();			\
>       })
>
>  /*
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum
>       return val;
>  }
>
> -static notrace u64 arch_counter_get_cntpct_stable(void)
> +static noinstr u64 _arch_counter_get_cntpct_stable(void)
>  {
>       return __arch_counter_get_cntpct_stable();
>  }
>

I tripped over the underscores when reviewing this :( This must be
getting close to the symbol length limit, but could we give this a helpful
prefix or suffix? raw_* or *_noinstr?

> @@ -1074,6 +1092,13 @@ struct arch_timer_kvm_info *arch_timer_g
>
>  static void __init arch_counter_register(unsigned type)
>  {
> +	/*
> +	 * Default to cp15 based access because arm64 uses this function for
> +	 * sched_clock() before DT is probed and the cp15 method is guaranteed
> +	 * to exist on arm64. arm doesn't use this before DT is probed so even
> +	 * if we don't have the cp15 accessors we won't have a problem.
> +	 */
> +	u64 (*scr)(void) = arch_counter_get_cntvct;

So this bit sent me on a little spelunking session :-)

From a control flow perspective the initialization isn't required, but then
I looked into the comment and found it comes from the
arch_timer_read_counter() definition... Which itself doesn't get used by
sched_clock() until the sched_clock_register() below!

So AFAICT that comment was true as of

  220069945b29 ("clocksource: arch_timer: Add support for memory mapped timers")

but not after a commit that came 2 months later:

  65cd4f6c99c1 ("arch_timer: Move to generic sched_clock framework")

which IIUC made arm/arm64 follow the default approach of using the
jiffy-based sched_clock() before probing DT/ACPI and registering a "proper"
sched_clock.

All of that to say: the comment about arch_timer_read_counter() vs early
sched_clock() doesn't apply anymore, but I think we need to keep its
initialization around for stuff like get_cycles(). This initialization here
should be OK to put to the bin, though.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
@ 2023-05-31 15:27   ` Thomas Gleixner
  2023-05-31 22:46     ` Thomas Gleixner
  2023-05-31 22:46   ` Thomas Gleixner
  1 sibling, 1 reply; 25+ messages in thread
From: Thomas Gleixner @ 2023-05-31 15:27 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
> Because of how the virtual clocks use U64_MAX as an exception value
> instead of a valid time, the clocks can no longer be assumed to wrap
> cleanly. This is then compounded by arch_vdso_cycles_ok() rejecting
> everything with the MSB/Sign-bit set.
>
> Therefore, the effective mask becomes S64_MAX, and the comment with
> vdso_calc_delta() that states the mask is U64_MAX and isn't optimized
> out is just plain silly.
>
> Now, the code has a negative filter -- to deal with TSC wobbles:
>
> 	if (cycles > last)
>
> which is just plain wrong, because it should've been written as:
>
> 	if ((s64)(cycles - last) > 0)
>
> to take wrapping into account, but per all the above, we don't
> actually wrap on u64 anymore.

Indeed. The rationale was that you need ~146 years uptime with a 4GHz
TSC or ~584 years with 1GHz to actually reach the wrap around point.
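
(Back-of-the-envelope check of those figures, assuming the full 64-bit
counter range:

	2^64 cycles / 4.0 GHz ~= 4.6e9 s  ~= 146 years
	2^64 cycles / 1.0 GHz ~= 1.8e10 s ~= 584 years

With the sign bit rejected, i.e. an effective S64_MAX horizon, both
figures roughly halve, to ~73 and ~292 years.)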

Though I can see your point to make sure that silly BIOSes or VMMs
cannot cause havoc by accident or malice.

Did anyone ever validate that wrap around on TSC including TSC deadline
timer works correctly?

I have faint memories of TSC_ADJUST, which I prefer not to bring back to
main memory :)

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
  2023-05-31 15:27   ` Thomas Gleixner
@ 2023-05-31 22:46     ` Thomas Gleixner
  0 siblings, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2023-05-31 22:46 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

On Wed, May 31 2023 at 17:27, Thomas Gleixner wrote:
> On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
>> to take wrapping into account, but per all the above, we don't
>> actually wrap on u64 anymore.
>
> Indeed. The rationale was that you need ~146 years uptime with a 4GHz
> TSC or ~584 years with 1GHz to actually reach the wrap around point.
>
> Though I can see your point to make sure that silly BIOSes or VMMs
> cannot cause havoc by accident or malice.
>
> Did anyone ever validate that wrap around on TSC including TSC deadline
> timer works correctly?
>
> I have faint memories of TSC_ADJUST, which I prefer not to bring back to
> main memory :)

It seems my fears have been unjustified.

At least in a quick test which sets the TSC to ~ -8min @ 2.1GHz, the
machine seems to survive without the colourful explosions I expected due
to my early exposure to TSC_ADJUST and TSC_DEADLINE_TIMER :)
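
(Rough numbers for that start value: 8 min * 60 s * 2.1e9 cycles/s is
about 1.0e12 cycles, so the TSC starts around 2^64 - 1.0e12 and wraps
through zero after roughly eight minutes of uptime.)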

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking
  2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
  2023-05-31 15:27   ` Thomas Gleixner
@ 2023-05-31 22:46   ` Thomas Gleixner
  1 sibling, 0 replies; 25+ messages in thread
From: Thomas Gleixner @ 2023-05-31 22:46 UTC (permalink / raw)
  To: Peter Zijlstra, bigeasy
  Cc: mark.rutland, maz, catalin.marinas, will, chenhuacai, kernel, hca,
	gor, agordeev, borntraeger, svens, pbonzini, wanpengli, vkuznets,
	mingo, bp, dave.hansen, x86, hpa, jgross, boris.ostrovsky,
	daniel.lezcano, kys, haiyangz, wei.liu, decui, rafael, peterz,
	longman, boqun.feng, pmladek, senozhatsky, rostedt, john.ogness,
	juri.lelli, vincent.guittot, dietmar.eggemann, bsegall, mgorman,
	bristot, vschneid, jstultz, sboyd, linux-kernel, loongarch,
	linux-s390, kvm, linux-hyperv, linux-pm

On Fri, May 19 2023 at 12:21, Peter Zijlstra wrote:
> to take wrapping into account, but per all the above, we don't
> actually wrap on u64 anymore.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-05-24 16:40   ` Valentin Schneider
@ 2023-06-02 11:54     ` Peter Zijlstra
  2023-06-07  8:58       ` Valentin Schneider
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2023-06-02 11:54 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, hca, gor, agordeev, borntraeger, svens, pbonzini,
	wanpengli, vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa,
	jgross, boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu,
	decui, rafael, longman, boqun.feng, pmladek, senozhatsky, rostedt,
	john.ogness, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, bristot, jstultz, sboyd, linux-kernel,
	loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On Wed, May 24, 2023 at 05:40:47PM +0100, Valentin Schneider wrote:
> On 19/05/23 12:21, Peter Zijlstra wrote:
> > With the intent to provide local_clock_noinstr(), a variant of
> > local_clock() that's safe to be called from noinstr code (with the
> > assumption that any such code will already be non-preemptible),
> > prepare for things by providing a noinstr sched_clock_read() function.
> >
> > Specifically, preempt_enable_*() calls out to schedule(), which upsets
> > noinstr validation efforts.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  arch/arm64/include/asm/arch_timer.h  |    8 ----
> >  drivers/clocksource/arm_arch_timer.c |   60 ++++++++++++++++++++++++++---------
> >  2 files changed, 47 insertions(+), 21 deletions(-)
> >
> > --- a/arch/arm64/include/asm/arch_timer.h
> > +++ b/arch/arm64/include/asm/arch_timer.h
> > @@ -88,13 +88,7 @@ static inline notrace u64 arch_timer_rea
> >
> >  #define arch_timer_reg_read_stable(reg)					\
> >       ({								\
> > -		u64 _val;						\
> > -									\
> > -		preempt_disable_notrace();				\
> > -		_val = erratum_handler(read_ ## reg)();			\
> > -		preempt_enable_notrace();				\
> > -									\
> > -		_val;							\
> > +		erratum_handler(read_ ## reg)();			\
> >       })
> >
> >  /*
> > --- a/drivers/clocksource/arm_arch_timer.c
> > +++ b/drivers/clocksource/arm_arch_timer.c
> > @@ -191,22 +191,40 @@ u32 arch_timer_reg_read(int access, enum
> >       return val;
> >  }
> >
> > -static notrace u64 arch_counter_get_cntpct_stable(void)
> > +static noinstr u64 _arch_counter_get_cntpct_stable(void)
> >  {
> >       return __arch_counter_get_cntpct_stable();
> >  }
> >
> 
> I tripped over the underscores when reviewing this :( This must be
> getting close to the symbol length limit, but could we give this a helpful
> prefix or suffix? raw_* or *_noinstr?
> 
> > @@ -1074,6 +1092,13 @@ struct arch_timer_kvm_info *arch_timer_g
> >
> >  static void __init arch_counter_register(unsigned type)
> >  {
> > +	/*
> > +	 * Default to cp15 based access because arm64 uses this function for
> > +	 * sched_clock() before DT is probed and the cp15 method is guaranteed
> > +	 * to exist on arm64. arm doesn't use this before DT is probed so even
> > +	 * if we don't have the cp15 accessors we won't have a problem.
> > +	 */
> > +	u64 (*scr)(void) = arch_counter_get_cntvct;
> 
> So this bit sent me on a little spelunking session :-)
> 
> From a control flow perspective the initialization isn't required, but then
> I looked into the comment and found it comes from the
> arch_timer_read_counter() definition... Which itself doesn't get used by
> sched_clock() until the sched_clock_register() below!
> 
> So AFAICT that comment was true as of
> 
>   220069945b29 ("clocksource: arch_timer: Add support for memory mapped timers")
> 
> but not after a commit that came 2 months later:
> 
>   65cd4f6c99c1 ("arch_timer: Move to generic sched_clock framework")
> 
> which IIUC made arm/arm64 follow the default approach of using the
> jiffy-based sched_clock() before probing DT/ACPI and registering a "proper"
> sched_clock.
> 
> All of that to say: the comment about arch_timer_read_counter() vs early
> sched_clock() doesn't apply anymore, but I think we need to keep its
> initialization around for stuff like get_cycles(). This initialization here
> should be OK to put to the bin, though.

Something like the below folded in then?

---
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -191,7 +191,7 @@ u32 arch_timer_reg_read(int access, enum
 	return val;
 }
 
-static noinstr u64 _arch_counter_get_cntpct_stable(void)
+static noinstr u64 raw_counter_get_cntpct_stable(void)
 {
 	return __arch_counter_get_cntpct_stable();
 }
@@ -210,7 +210,7 @@ static noinstr u64 arch_counter_get_cntp
 	return __arch_counter_get_cntpct();
 }
 
-static noinstr u64 _arch_counter_get_cntvct_stable(void)
+static noinstr u64 raw_counter_get_cntvct_stable(void)
 {
 	return __arch_counter_get_cntvct_stable();
 }
@@ -1092,13 +1092,7 @@ struct arch_timer_kvm_info *arch_timer_g
 
 static void __init arch_counter_register(unsigned type)
 {
-	/*
-	 * Default to cp15 based access because arm64 uses this function for
-	 * sched_clock() before DT is probed and the cp15 method is guaranteed
-	 * to exist on arm64. arm doesn't use this before DT is probed so even
-	 * if we don't have the cp15 accessors we won't have a problem.
-	 */
-	u64 (*scr)(void) = arch_counter_get_cntvct;
+	u64 (*scr)(void);
 	u64 start_count;
 	int width;
 
@@ -1110,7 +1104,7 @@ static void __init arch_counter_register
 		    arch_timer_uses_ppi == ARCH_TIMER_VIRT_PPI) {
 			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntvct_stable;
-				scr = _arch_counter_get_cntvct_stable;
+				scr = raw_counter_get_cntvct_stable;
 			} else {
 				rd = arch_counter_get_cntvct;
 				scr = arch_counter_get_cntvct;
@@ -1118,7 +1112,7 @@ static void __init arch_counter_register
 		} else {
 			if (arch_timer_counter_has_wa()) {
 				rd = arch_counter_get_cntpct_stable;
-				scr = _arch_counter_get_cntpct_stable;
+				scr = raw_counter_get_cntpct_stable;
 			} else {
 				rd = arch_counter_get_cntpct;
 				scr = arch_counter_get_cntpct;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions
  2023-06-02 11:54     ` Peter Zijlstra
@ 2023-06-07  8:58       ` Valentin Schneider
  0 siblings, 0 replies; 25+ messages in thread
From: Valentin Schneider @ 2023-06-07  8:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: bigeasy, mark.rutland, maz, catalin.marinas, will, chenhuacai,
	kernel, hca, gor, agordeev, borntraeger, svens, pbonzini,
	wanpengli, vkuznets, tglx, mingo, bp, dave.hansen, x86, hpa,
	jgross, boris.ostrovsky, daniel.lezcano, kys, haiyangz, wei.liu,
	decui, rafael, longman, boqun.feng, pmladek, senozhatsky, rostedt,
	john.ogness, juri.lelli, vincent.guittot, dietmar.eggemann,
	bsegall, mgorman, bristot, jstultz, sboyd, linux-kernel,
	loongarch, linux-s390, kvm, linux-hyperv, linux-pm

On 02/06/23 13:54, Peter Zijlstra wrote:
> On Wed, May 24, 2023 at 05:40:47PM +0100, Valentin Schneider wrote:
>>
>> So this bit sent me on a little spelunking session :-)
>>
>> From a control flow perspective the initialization isn't required, but then
>> I looked into the comment and found it comes from the
>> arch_timer_read_counter() definition... Which itself doesn't get used by
>> sched_clock() until the sched_clock_register() below!
>>
>> So AFAICT that comment was true as of
>>
>>   220069945b29 ("clocksource: arch_timer: Add support for memory mapped timers")
>>
>> but not after a commit that came 2 months later:
>>
>>   65cd4f6c99c1 ("arch_timer: Move to generic sched_clock framework")
>>
>> which IIUC made arm/arm64 follow the default approach of using the
>> jiffy-based sched_clock() before probing DT/ACPI and registering a "proper"
>> sched_clock.
>>
>> All of that to say: the comment about arch_timer_read_counter() vs early
>> sched_clock() doesn't apply anymore, but I think we need to keep its
>> initialization around for stuff like get_cycles(). This initialization here
>> should be OK to put to the bin, though.
>
> Something like the below folded in then?
>

Much better, thank you!


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread

Thread overview: 25+ messages
2023-05-19 10:20 [PATCH v2 00/13] local_clock() vs noinstr Peter Zijlstra
2023-05-19 10:20 ` [PATCH v2 01/13] seqlock/latch: Provide raw_read_seqcount_latch_retry() Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 02/13] time/sched_clock: Provide sched_clock_noinstr() Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 03/13] arm64/io: Always inline all of __raw_{read,write}[bwlq]() Peter Zijlstra
2023-05-24 16:40   ` Valentin Schneider
2023-05-19 10:21 ` [PATCH v2 04/13] arm64/arch_timer: Provide noinstr sched_clock_read() functions Peter Zijlstra
2023-05-24 16:40   ` Valentin Schneider
2023-06-02 11:54     ` Peter Zijlstra
2023-06-07  8:58       ` Valentin Schneider
2023-05-19 10:21 ` [PATCH v2 05/13] loongarch: Provide noinstr sched_clock_read() Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 06/13] s390/time: Provide sched_clock_noinstr() Peter Zijlstra
2023-05-22 14:18   ` Heiko Carstens
2023-05-19 10:21 ` [PATCH v2 07/13] math64: Always inline u128 version of mul_u64_u64_shr() Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 08/13] x86/vdso: Fix gettimeofday masking Peter Zijlstra
2023-05-31 15:27   ` Thomas Gleixner
2023-05-31 22:46     ` Thomas Gleixner
2023-05-31 22:46   ` Thomas Gleixner
2023-05-19 10:21 ` [PATCH v2 09/13] clocksource: hyper-v: Adjust hv_read_tsc_page_tsc() to avoid special casing U64_MAX Peter Zijlstra
2023-05-19 18:38   ` Michael Kelley (LINUX)
2023-05-19 10:21 ` [PATCH v2 10/13] clocksource: hyper-v: Provide noinstr sched_clock() Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 11/13] x86/tsc: Provide sched_clock_noinstr() Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 12/13] sched/clock: Provide local_clock_noinstr() Peter Zijlstra
2023-05-19 10:21 ` [PATCH v2 13/13] cpuidle: Use local_clock_noinstr() Peter Zijlstra
2023-05-19 14:13   ` Rafael J. Wysocki
2023-05-19 18:48 ` [PATCH v2 00/13] local_clock() vs noinstr Michael Kelley (LINUX)
