public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/5] kcsan, seqlock: Support seqcount_latch_t
@ 2024-11-04 15:43 Marco Elver
  2024-11-04 15:43 ` [PATCH v2 1/5] time/sched_clock: Swap update_clock_read_data() latch writes Marco Elver
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Marco Elver @ 2024-11-04 15:43 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng
  Cc: Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

While fuzzing an arm64 kernel, Alexander Potapenko reported:

| BUG: KCSAN: data-race in ktime_get_mono_fast_ns / timekeeping_update
|
| write to 0xffffffc082e74248 of 56 bytes by interrupt on cpu 0:
|  update_fast_timekeeper kernel/time/timekeeping.c:430 [inline]
|  timekeeping_update+0x1d8/0x2d8 kernel/time/timekeeping.c:768
|  timekeeping_advance+0x9e8/0xb78 kernel/time/timekeeping.c:2344
|  update_wall_time+0x18/0x38 kernel/time/timekeeping.c:2360
|  [...]
|
| read to 0xffffffc082e74258 of 8 bytes by task 5260 on cpu 1:
|  __ktime_get_fast_ns kernel/time/timekeeping.c:372 [inline]
|  ktime_get_mono_fast_ns+0x88/0x174 kernel/time/timekeeping.c:489
|  init_srcu_struct_fields+0x40c/0x530 kernel/rcu/srcutree.c:263
|  init_srcu_struct+0x14/0x20 kernel/rcu/srcutree.c:311
|  [...]
|
| value changed: 0x000002f875d33266 -> 0x000002f877416866
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 1 UID: 0 PID: 5260 Comm: syz.2.7483 Not tainted 6.12.0-rc3-dirty #78

This is a false positive data race between a seqcount latch writer and a reader
accessing stale data. Since its introduction, KCSAN has never understood the
seqcount_latch interface (due to being unannotated).

Unlike the regular seqlock interface, the seqcount_latch interface for latch
writers never has had a well-defined critical section, making it difficult to
teach tooling where the critical section starts and ends.

This series introduces an instrumentable (non-raw) seqcount_latch interface,
with which we can clearly denote writer critical sections. This both helps
readability and tooling like KCSAN to understand when the writer is done
updating all latch copies.

Changelog
=========

v2:
* New interface, courtesy of Peter Zijlstra. This simplifies things and we
  avoid instrumenting the raw interface which is now reserved for noinstr
  functions.
* Fix for read_seqbegin/retry() found during testing of new changes.

v1: https://lkml.kernel.org/r/20241029083658.1096492-1-elver@google.com

Marco Elver (5):
  time/sched_clock: Swap update_clock_read_data() latch writes
  time/sched_clock: Broaden sched_clock()'s instrumentation coverage
  kcsan, seqlock: Support seqcount_latch_t
  seqlock, treewide: Switch to non-raw seqcount_latch interface
  kcsan, seqlock: Fix incorrect assumption in read_seqbegin()

 Documentation/locking/seqlock.rst |  2 +-
 arch/x86/kernel/tsc.c             |  5 +-
 include/linux/rbtree_latch.h      | 20 ++++---
 include/linux/seqlock.h           | 98 +++++++++++++++++++++++--------
 kernel/printk/printk.c            |  9 +--
 kernel/time/sched_clock.c         | 34 +++++++----
 kernel/time/timekeeping.c         | 12 ++--
 7 files changed, 123 insertions(+), 57 deletions(-)

-- 
2.47.0.163.g1226f6d8fa-goog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v2 1/5] time/sched_clock: Swap update_clock_read_data() latch writes
  2024-11-04 15:43 [PATCH v2 0/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
@ 2024-11-04 15:43 ` Marco Elver
  2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
  2024-11-04 15:43 ` [PATCH v2 2/5] time/sched_clock: Broaden sched_clock()'s instrumentation coverage Marco Elver
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Marco Elver @ 2024-11-04 15:43 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng
  Cc: Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

Swap the writes to the odd and even copies to make the writer critical
section look like all other seqcount_latch writers.

Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* New patch.
---
 kernel/time/sched_clock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 68d6c1190ac7..85595fcf6aa2 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -119,9 +119,6 @@ unsigned long long notrace sched_clock(void)
  */
 static void update_clock_read_data(struct clock_read_data *rd)
 {
-	/* update the backup (odd) copy with the new data */
-	cd.read_data[1] = *rd;
-
 	/* steer readers towards the odd copy */
 	raw_write_seqcount_latch(&cd.seq);
 
@@ -130,6 +127,9 @@ static void update_clock_read_data(struct clock_read_data *rd)
 
 	/* switch readers back to the even copy */
 	raw_write_seqcount_latch(&cd.seq);
+
+	/* update the backup (odd) copy with the new data */
+	cd.read_data[1] = *rd;
 }
 
 /*
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 2/5] time/sched_clock: Broaden sched_clock()'s instrumentation coverage
  2024-11-04 15:43 [PATCH v2 0/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
  2024-11-04 15:43 ` [PATCH v2 1/5] time/sched_clock: Swap update_clock_read_data() latch writes Marco Elver
@ 2024-11-04 15:43 ` Marco Elver
  2024-11-05  9:22   ` Marco Elver
  2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
  2024-11-04 15:43 ` [PATCH v2 3/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 18+ messages in thread
From: Marco Elver @ 2024-11-04 15:43 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng
  Cc: Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

Most of sched_clock()'s implementation is ineligible for instrumentation
due to relying on sched_clock_noinstr().

Split the implementation off into an __always_inline function
__sched_clock(), which is then used by the noinstr and instrumentable
version, to allow more of sched_clock() to be covered by various
instrumentation.

This will allow instrumentation with the various sanitizers (KASAN,
KCSAN, KMSAN, UBSAN). For KCSAN, we know that raw seqcount_latch usage
without annotations will result in false positive reports: tell it that
all of __sched_clock() is "atomic" for the latch writer; later changes
in this series will take care of the readers.

Link: https://lore.kernel.org/all/20241030204815.GQ14555@noisy.programming.kicks-ass.net/
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* New patch.
---
 kernel/time/sched_clock.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 85595fcf6aa2..29bdf309dae8 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -80,7 +80,7 @@ notrace int sched_clock_read_retry(unsigned int seq)
 	return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
-unsigned long long noinstr sched_clock_noinstr(void)
+static __always_inline unsigned long long __sched_clock(void)
 {
 	struct clock_read_data *rd;
 	unsigned int seq;
@@ -98,11 +98,23 @@ unsigned long long noinstr sched_clock_noinstr(void)
 	return res;
 }
 
+unsigned long long noinstr sched_clock_noinstr(void)
+{
+	return __sched_clock();
+}
+
 unsigned long long notrace sched_clock(void)
 {
 	unsigned long long ns;
 	preempt_disable_notrace();
-	ns = sched_clock_noinstr();
+	/*
+	 * All of __sched_clock() is a seqcount_latch reader critical section,
+	 * but relies on the raw helpers which are uninstrumented. For KCSAN,
+	 * mark all accesses in __sched_clock() as atomic.
+	 */
+	kcsan_nestable_atomic_begin();
+	ns = __sched_clock();
+	kcsan_nestable_atomic_end();
 	preempt_enable_notrace();
 	return ns;
 }
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 3/5] kcsan, seqlock: Support seqcount_latch_t
  2024-11-04 15:43 [PATCH v2 0/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
  2024-11-04 15:43 ` [PATCH v2 1/5] time/sched_clock: Swap update_clock_read_data() latch writes Marco Elver
  2024-11-04 15:43 ` [PATCH v2 2/5] time/sched_clock: Broaden sched_clock()'s instrumentation coverage Marco Elver
@ 2024-11-04 15:43 ` Marco Elver
  2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
  2024-11-04 15:43 ` [PATCH v2 4/5] seqlock, treewide: Switch to non-raw seqcount_latch interface Marco Elver
  2024-11-04 15:43 ` [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin() Marco Elver
  4 siblings, 1 reply; 18+ messages in thread
From: Marco Elver @ 2024-11-04 15:43 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng
  Cc: Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel, Alexander Potapenko

While fuzzing an arm64 kernel, Alexander Potapenko reported:

| BUG: KCSAN: data-race in ktime_get_mono_fast_ns / timekeeping_update
|
| write to 0xffffffc082e74248 of 56 bytes by interrupt on cpu 0:
|  update_fast_timekeeper kernel/time/timekeeping.c:430 [inline]
|  timekeeping_update+0x1d8/0x2d8 kernel/time/timekeeping.c:768
|  timekeeping_advance+0x9e8/0xb78 kernel/time/timekeeping.c:2344
|  update_wall_time+0x18/0x38 kernel/time/timekeeping.c:2360
|  [...]
|
| read to 0xffffffc082e74258 of 8 bytes by task 5260 on cpu 1:
|  __ktime_get_fast_ns kernel/time/timekeeping.c:372 [inline]
|  ktime_get_mono_fast_ns+0x88/0x174 kernel/time/timekeeping.c:489
|  init_srcu_struct_fields+0x40c/0x530 kernel/rcu/srcutree.c:263
|  init_srcu_struct+0x14/0x20 kernel/rcu/srcutree.c:311
|  [...]
|
| value changed: 0x000002f875d33266 -> 0x000002f877416866
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 1 UID: 0 PID: 5260 Comm: syz.2.7483 Not tainted 6.12.0-rc3-dirty #78

This is a false positive data race between a seqcount latch writer and a reader
accessing stale data. Since its introduction, KCSAN has never understood the
seqcount_latch interface (due to being unannotated).

Unlike the regular seqlock interface, the seqcount_latch interface for latch
writers never has had a well-defined critical section, making it difficult to
teach tooling where the critical section starts and ends.

Introduce an instrumentable (non-raw) seqcount_latch interface, with
which we can clearly denote writer critical sections. This both helps
readability and tooling like KCSAN to understand when the writer is done
updating all latch copies.

Link: https://lore.kernel.org/all/20241030204815.GQ14555@noisy.programming.kicks-ass.net/
Reported-by: Alexander Potapenko <glider@google.com>
Fixes: 88ecd153be95 ("seqlock, kcsan: Add annotations for KCSAN")
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* Introduce new interface, courtesy of Peter Zijlstra. Adjust
  documentation along with its introduction.
---
 Documentation/locking/seqlock.rst |  2 +-
 include/linux/seqlock.h           | 86 +++++++++++++++++++++++++------
 2 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index bfda1a5fecad..ec6411d02ac8 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -153,7 +153,7 @@ Use seqcount_latch_t when the write side sections cannot be protected
 from interruption by readers. This is typically the case when the read
 side can be invoked from NMI handlers.
 
-Check `raw_write_seqcount_latch()` for more information.
+Check `write_seqcount_latch()` for more information.
 
 
 .. _seqlock_t:
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index fffeb754880f..45eee0e5dca0 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -621,6 +621,23 @@ static __always_inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *
 	return READ_ONCE(s->seqcount.sequence);
 }
 
+/**
+ * read_seqcount_latch() - pick even/odd latch data copy
+ * @s: Pointer to seqcount_latch_t
+ *
+ * See write_seqcount_latch() for details and a full reader/writer usage
+ * example.
+ *
+ * Return: sequence counter raw value. Use the lowest bit as an index for
+ * picking which data copy to read. The full counter must then be checked
+ * with read_seqcount_latch_retry().
+ */
+static __always_inline unsigned read_seqcount_latch(const seqcount_latch_t *s)
+{
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
+	return raw_read_seqcount_latch(s);
+}
+
 /**
  * raw_read_seqcount_latch_retry() - end a seqcount_latch_t read section
  * @s:		Pointer to seqcount_latch_t
@@ -635,9 +652,34 @@ raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 	return unlikely(READ_ONCE(s->seqcount.sequence) != start);
 }
 
+/**
+ * read_seqcount_latch_retry() - end a seqcount_latch_t read section
+ * @s:		Pointer to seqcount_latch_t
+ * @start:	count, from read_seqcount_latch()
+ *
+ * Return: true if a read section retry is required, else false
+ */
+static __always_inline int
+read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
+{
+	kcsan_atomic_next(0);
+	return raw_read_seqcount_latch_retry(s, start);
+}
+
 /**
  * raw_write_seqcount_latch() - redirect latch readers to even/odd copy
  * @s: Pointer to seqcount_latch_t
+ */
+static __always_inline void raw_write_seqcount_latch(seqcount_latch_t *s)
+{
+	smp_wmb();	/* prior stores before incrementing "sequence" */
+	s->seqcount.sequence++;
+	smp_wmb();      /* increment "sequence" before following stores */
+}
+
+/**
+ * write_seqcount_latch_begin() - redirect latch readers to odd copy
+ * @s: Pointer to seqcount_latch_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -665,17 +707,11 @@ raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *
  *	void latch_modify(struct latch_struct *latch, ...)
  *	{
- *		smp_wmb();	// Ensure that the last data[1] update is visible
- *		latch->seq.sequence++;
- *		smp_wmb();	// Ensure that the seqcount update is visible
- *
+ *		write_seqcount_latch_begin(&latch->seq);
  *		modify(latch->data[0], ...);
- *
- *		smp_wmb();	// Ensure that the data[0] update is visible
- *		latch->seq.sequence++;
- *		smp_wmb();	// Ensure that the seqcount update is visible
- *
+ *		write_seqcount_latch(&latch->seq);
  *		modify(latch->data[1], ...);
+ *		write_seqcount_latch_end(&latch->seq);
  *	}
  *
  * The query will have a form like::
@@ -686,13 +722,13 @@ raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *		unsigned seq, idx;
  *
  *		do {
- *			seq = raw_read_seqcount_latch(&latch->seq);
+ *			seq = read_seqcount_latch(&latch->seq);
  *
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
  *		// This includes needed smp_rmb()
- *		} while (raw_read_seqcount_latch_retry(&latch->seq, seq));
+ *		} while (read_seqcount_latch_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
@@ -716,11 +752,31 @@ raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_latch_t *s)
+static __always_inline void write_seqcount_latch_begin(seqcount_latch_t *s)
 {
-	smp_wmb();	/* prior stores before incrementing "sequence" */
-	s->seqcount.sequence++;
-	smp_wmb();      /* increment "sequence" before following stores */
+	kcsan_nestable_atomic_begin();
+	raw_write_seqcount_latch(s);
+}
+
+/**
+ * write_seqcount_latch() - redirect latch readers to even copy
+ * @s: Pointer to seqcount_latch_t
+ */
+static __always_inline void write_seqcount_latch(seqcount_latch_t *s)
+{
+	raw_write_seqcount_latch(s);
+}
+
+/**
+ * write_seqcount_latch_end() - end a seqcount_latch_t write section
+ * @s:		Pointer to seqcount_latch_t
+ *
+ * Marks the end of a seqcount_latch_t writer section, after all copies of the
+ * latch-protected data have been updated.
+ */
+static __always_inline void write_seqcount_latch_end(seqcount_latch_t *s)
+{
+	kcsan_nestable_atomic_end();
 }
 
 #define __SEQLOCK_UNLOCKED(lockname)					\
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/5] seqlock, treewide: Switch to non-raw seqcount_latch interface
  2024-11-04 15:43 [PATCH v2 0/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
                   ` (2 preceding siblings ...)
  2024-11-04 15:43 ` [PATCH v2 3/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
@ 2024-11-04 15:43 ` Marco Elver
  2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
  2024-11-04 15:43 ` [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin() Marco Elver
  4 siblings, 1 reply; 18+ messages in thread
From: Marco Elver @ 2024-11-04 15:43 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng
  Cc: Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

Switch all instrumentable users of the seqcount_latch interface over to
the non-raw interface.

Link: https://lore.kernel.org/all/20241030204815.GQ14555@noisy.programming.kicks-ass.net/
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* New patch.
---
 arch/x86/kernel/tsc.c        |  5 +++--
 include/linux/rbtree_latch.h | 20 +++++++++++---------
 kernel/printk/printk.c       |  9 +++++----
 kernel/time/sched_clock.c    | 12 +++++++-----
 kernel/time/timekeeping.c    | 12 +++++++-----
 5 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index dfe6847fd99e..67aeaba4ba9c 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -174,10 +174,11 @@ static void __set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long ts
 
 	c2n = per_cpu_ptr(&cyc2ns, cpu);
 
-	raw_write_seqcount_latch(&c2n->seq);
+	write_seqcount_latch_begin(&c2n->seq);
 	c2n->data[0] = data;
-	raw_write_seqcount_latch(&c2n->seq);
+	write_seqcount_latch(&c2n->seq);
 	c2n->data[1] = data;
+	write_seqcount_latch_end(&c2n->seq);
 }
 
 static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)
diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index 6a0999c26c7c..2f630eb8307e 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -14,7 +14,7 @@
  *
  * If we need to allow unconditional lookups (say as required for NMI context
  * usage) we need a more complex setup; this data structure provides this by
- * employing the latch technique -- see @raw_write_seqcount_latch -- to
+ * employing the latch technique -- see @write_seqcount_latch_begin -- to
  * implement a latched RB-tree which does allow for unconditional lookups by
  * virtue of always having (at least) one stable copy of the tree.
  *
@@ -132,7 +132,7 @@ __lt_find(void *key, struct latch_tree_root *ltr, int idx,
  * @ops: operators defining the node order
  *
  * It inserts @node into @root in an ordered fashion such that we can always
- * observe one complete tree. See the comment for raw_write_seqcount_latch().
+ * observe one complete tree. See the comment for write_seqcount_latch_begin().
  *
  * The inserts use rcu_assign_pointer() to publish the element such that the
  * tree structure is stored before we can observe the new @node.
@@ -145,10 +145,11 @@ latch_tree_insert(struct latch_tree_node *node,
 		  struct latch_tree_root *root,
 		  const struct latch_tree_ops *ops)
 {
-	raw_write_seqcount_latch(&root->seq);
+	write_seqcount_latch_begin(&root->seq);
 	__lt_insert(node, root, 0, ops->less);
-	raw_write_seqcount_latch(&root->seq);
+	write_seqcount_latch(&root->seq);
 	__lt_insert(node, root, 1, ops->less);
+	write_seqcount_latch_end(&root->seq);
 }
 
 /**
@@ -159,7 +160,7 @@ latch_tree_insert(struct latch_tree_node *node,
  *
  * Removes @node from the trees @root in an ordered fashion such that we can
  * always observe one complete tree. See the comment for
- * raw_write_seqcount_latch().
+ * write_seqcount_latch_begin().
  *
  * It is assumed that @node will observe one RCU quiescent state before being
  * reused of freed.
@@ -172,10 +173,11 @@ latch_tree_erase(struct latch_tree_node *node,
 		 struct latch_tree_root *root,
 		 const struct latch_tree_ops *ops)
 {
-	raw_write_seqcount_latch(&root->seq);
+	write_seqcount_latch_begin(&root->seq);
 	__lt_erase(node, root, 0);
-	raw_write_seqcount_latch(&root->seq);
+	write_seqcount_latch(&root->seq);
 	__lt_erase(node, root, 1);
+	write_seqcount_latch_end(&root->seq);
 }
 
 /**
@@ -204,9 +206,9 @@ latch_tree_find(void *key, struct latch_tree_root *root,
 	unsigned int seq;
 
 	do {
-		seq = raw_read_seqcount_latch(&root->seq);
+		seq = read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (raw_read_seqcount_latch_retry(&root->seq, seq));
+	} while (read_seqcount_latch_retry(&root->seq, seq));
 
 	return node;
 }
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index beb808f4c367..19911c8fa7b6 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -560,10 +560,11 @@ bool printk_percpu_data_ready(void)
 /* Must be called under syslog_lock. */
 static void latched_seq_write(struct latched_seq *ls, u64 val)
 {
-	raw_write_seqcount_latch(&ls->latch);
+	write_seqcount_latch_begin(&ls->latch);
 	ls->val[0] = val;
-	raw_write_seqcount_latch(&ls->latch);
+	write_seqcount_latch(&ls->latch);
 	ls->val[1] = val;
+	write_seqcount_latch_end(&ls->latch);
 }
 
 /* Can be called from any context. */
@@ -574,10 +575,10 @@ static u64 latched_seq_read_nolock(struct latched_seq *ls)
 	u64 val;
 
 	do {
-		seq = raw_read_seqcount_latch(&ls->latch);
+		seq = read_seqcount_latch(&ls->latch);
 		idx = seq & 0x1;
 		val = ls->val[idx];
-	} while (raw_read_seqcount_latch_retry(&ls->latch, seq));
+	} while (read_seqcount_latch_retry(&ls->latch, seq));
 
 	return val;
 }
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 29bdf309dae8..fcca4e72f1ef 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -71,13 +71,13 @@ static __always_inline u64 cyc_to_ns(u64 cyc, u32 mult, u32 shift)
 
 notrace struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
 {
-	*seq = raw_read_seqcount_latch(&cd.seq);
+	*seq = read_seqcount_latch(&cd.seq);
 	return cd.read_data + (*seq & 1);
 }
 
 notrace int sched_clock_read_retry(unsigned int seq)
 {
-	return raw_read_seqcount_latch_retry(&cd.seq, seq);
+	return read_seqcount_latch_retry(&cd.seq, seq);
 }
 
 static __always_inline unsigned long long __sched_clock(void)
@@ -132,16 +132,18 @@ unsigned long long notrace sched_clock(void)
 static void update_clock_read_data(struct clock_read_data *rd)
 {
 	/* steer readers towards the odd copy */
-	raw_write_seqcount_latch(&cd.seq);
+	write_seqcount_latch_begin(&cd.seq);
 
 	/* now its safe for us to update the normal (even) copy */
 	cd.read_data[0] = *rd;
 
 	/* switch readers back to the even copy */
-	raw_write_seqcount_latch(&cd.seq);
+	write_seqcount_latch(&cd.seq);
 
 	/* update the backup (odd) copy with the new data */
 	cd.read_data[1] = *rd;
+
+	write_seqcount_latch_end(&cd.seq);
 }
 
 /*
@@ -279,7 +281,7 @@ void __init generic_sched_clock_init(void)
  */
 static u64 notrace suspended_sched_clock_read(void)
 {
-	unsigned int seq = raw_read_seqcount_latch(&cd.seq);
+	unsigned int seq = read_seqcount_latch(&cd.seq);
 
 	return cd.read_data[seq & 1].epoch_cyc;
 }
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7e6f409bf311..18752983e834 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -411,7 +411,7 @@ static inline u64 timekeeping_get_ns(const struct tk_read_base *tkr)
  * We want to use this from any context including NMI and tracing /
  * instrumenting the timekeeping code itself.
  *
- * Employ the latch technique; see @raw_write_seqcount_latch.
+ * Employ the latch technique; see @write_seqcount_latch.
  *
  * So if a NMI hits the update of base[0] then it will use base[1]
  * which is still consistent. In the worst case this can result is a
@@ -424,16 +424,18 @@ static void update_fast_timekeeper(const struct tk_read_base *tkr,
 	struct tk_read_base *base = tkf->base;
 
 	/* Force readers off to base[1] */
-	raw_write_seqcount_latch(&tkf->seq);
+	write_seqcount_latch_begin(&tkf->seq);
 
 	/* Update base[0] */
 	memcpy(base, tkr, sizeof(*base));
 
 	/* Force readers back to base[0] */
-	raw_write_seqcount_latch(&tkf->seq);
+	write_seqcount_latch(&tkf->seq);
 
 	/* Update base[1] */
 	memcpy(base + 1, base, sizeof(*base));
+
+	write_seqcount_latch_end(&tkf->seq);
 }
 
 static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
@@ -443,11 +445,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 	u64 now;
 
 	do {
-		seq = raw_read_seqcount_latch(&tkf->seq);
+		seq = read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
 		now = ktime_to_ns(tkr->base);
 		now += __timekeeping_get_ns(tkr);
-	} while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
+	} while (read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
  2024-11-04 15:43 [PATCH v2 0/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
                   ` (3 preceding siblings ...)
  2024-11-04 15:43 ` [PATCH v2 4/5] seqlock, treewide: Switch to non-raw seqcount_latch interface Marco Elver
@ 2024-11-04 15:43 ` Marco Elver
  2024-11-05  9:13   ` Peter Zijlstra
                     ` (2 more replies)
  4 siblings, 3 replies; 18+ messages in thread
From: Marco Elver @ 2024-11-04 15:43 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng
  Cc: Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

During testing of the preceding changes, I noticed that in some cases,
current->kcsan_ctx.in_flat_atomic remained true until task exit. This is
obviously wrong, because _all_ accesses for the given task will be
treated as atomic, resulting in false negatives i.e. missed data races.

Debugging led to fs/dcache.c, where we can see this usage of seqlock:

	struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name)
	{
		struct dentry *dentry;
		unsigned seq;

		do {
			seq = read_seqbegin(&rename_lock);
			dentry = __d_lookup(parent, name);
			if (dentry)
				break;
		} while (read_seqretry(&rename_lock, seq));
	[...]

As can be seen, read_seqretry() is never called if dentry != NULL;
consequently, current->kcsan_ctx.in_flat_atomic will never be reset to
false by read_seqretry().

Give up on the wrong assumption of "assume closing read_seqretry()", and
rely on the already-present annotations in read_seqcount_begin/retry().

Fixes: 88ecd153be95 ("seqlock, kcsan: Add annotations for KCSAN")
Signed-off-by: Marco Elver <elver@google.com>
---
v2:
* New patch.
---
 include/linux/seqlock.h | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 45eee0e5dca0..5298765d6ca4 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -810,11 +810,7 @@ static __always_inline void write_seqcount_latch_end(seqcount_latch_t *s)
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
-	unsigned ret = read_seqcount_begin(&sl->seqcount);
-
-	kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry() */
-	kcsan_flat_atomic_begin();
-	return ret;
+	return read_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -830,12 +826,6 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
  */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
-	/*
-	 * Assume not nested: read_seqretry() may be called multiple times when
-	 * completing read critical section.
-	 */
-	kcsan_flat_atomic_end();
-
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
  2024-11-04 15:43 ` [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin() Marco Elver
@ 2024-11-05  9:13   ` Peter Zijlstra
  2024-11-05  9:28     ` Marco Elver
  2024-11-05  9:34   ` Peter Zijlstra
  2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
  2 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2024-11-05  9:13 UTC (permalink / raw)
  To: Marco Elver
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

On Mon, Nov 04, 2024 at 04:43:09PM +0100, Marco Elver wrote:
> During testing of the preceding changes, I noticed that in some cases,
> current->kcsan_ctx.in_flat_atomic remained true until task exit. This is
> obviously wrong, because _all_ accesses for the given task will be
> treated as atomic, resulting in false negatives i.e. missed data races.
> 
> Debugging led to fs/dcache.c, where we can see this usage of seqlock:
> 
> 	struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name)
> 	{
> 		struct dentry *dentry;
> 		unsigned seq;
> 
> 		do {
> 			seq = read_seqbegin(&rename_lock);
> 			dentry = __d_lookup(parent, name);
> 			if (dentry)
> 				break;
> 		} while (read_seqretry(&rename_lock, seq));
> 	[...]
> 
> As can be seen, read_seqretry() is never called if dentry != NULL;
> consequently, current->kcsan_ctx.in_flat_atomic will never be reset to
> false by read_seqretry().
> 
> Give up on the wrong assumption of "assume closing read_seqretry()", and
> rely on the already-present annotations in read_seqcount_begin/retry().
> 
> Fixes: 88ecd153be95 ("seqlock, kcsan: Add annotations for KCSAN")
> Signed-off-by: Marco Elver <elver@google.com>
> ---
> v2:
> * New patch.
> ---
>  include/linux/seqlock.h | 12 +-----------
>  1 file changed, 1 insertion(+), 11 deletions(-)
> 
> diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> index 45eee0e5dca0..5298765d6ca4 100644
> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -810,11 +810,7 @@ static __always_inline void write_seqcount_latch_end(seqcount_latch_t *s)
>   */
>  static inline unsigned read_seqbegin(const seqlock_t *sl)
>  {
> -	unsigned ret = read_seqcount_begin(&sl->seqcount);
> -
> -	kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry() */
> -	kcsan_flat_atomic_begin();
> -	return ret;
> +	return read_seqcount_begin(&sl->seqcount);
>  }
>  
>  /**
> @@ -830,12 +826,6 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
>   */
>  static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
>  {
> -	/*
> -	 * Assume not nested: read_seqretry() may be called multiple times when
> -	 * completing read critical section.
> -	 */
> -	kcsan_flat_atomic_end();
> -
>  	return read_seqcount_retry(&sl->seqcount, start);
>  }

OK, so this takes us back to kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX)
and kcsan_atomic_next(0).

Which I suppose is safe, except it doesn't nest properly.

Anyway, these all look really nice, let me go queue them up.

Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/5] time/sched_clock: Broaden sched_clock()'s instrumentation coverage
  2024-11-04 15:43 ` [PATCH v2 2/5] time/sched_clock: Broaden sched_clock()'s instrumentation coverage Marco Elver
@ 2024-11-05  9:22   ` Marco Elver
  2024-11-05  9:35     ` Peter Zijlstra
  2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
  1 sibling, 1 reply; 18+ messages in thread
From: Marco Elver @ 2024-11-05  9:22 UTC (permalink / raw)
  To: elver, Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng
  Cc: Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

Oops, typo'd the commit message:

On Mon, 4 Nov 2024 at 17:19, Marco Elver <elver@google.com> wrote:
>
> Most of sched_clock()'s implementation is ineligible for instrumentation
> due to relying on sched_clock_noinstr().
>
> Split the implementation off into an __always_inline function
> __sched_clock(), which is then used by the noinstr and instrumentable
> version, to allow more of sched_clock() to be covered by various
> instrumentation.
>
> This will allow instrumentation with the various sanitizers (KASAN,
> KCSAN, KMSAN, UBSAN). For KCSAN, we know that raw seqcount_latch usage
> without annotations will result in false positive reports: tell it that
> all of __sched_clock() is "atomic" for the latch writer; later changes

s/writer/reader/

> in this series will take care of the readers.

s/readers/writers/

... might be less confusing. If you apply, kindly fix up the commit
message, so that future people will be less confused. The code comment
is correct.

Thanks,
-- Marco

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
  2024-11-05  9:13   ` Peter Zijlstra
@ 2024-11-05  9:28     ` Marco Elver
  0 siblings, 0 replies; 18+ messages in thread
From: Marco Elver @ 2024-11-05  9:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

On Tue, 5 Nov 2024 at 10:13, Peter Zijlstra <peterz@infradead.org> wrote:

> >  static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
> >  {
> > -     /*
> > -      * Assume not nested: read_seqretry() may be called multiple times when
> > -      * completing read critical section.
> > -      */
> > -     kcsan_flat_atomic_end();
> > -
> >       return read_seqcount_retry(&sl->seqcount, start);
> >  }
>
> OK, so this takes us back to kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX)
> and kcsan_atomic_next(0).
>
> Which I suppose is safe, except it doesn't nest properly.

Yes correct - we just give up on trying to be special here. It would
be nice to also have readers have a clear critical section, but that
seems a lot harder to enforce with a bunch of them having rather
convoluted control flow. :-/

> Anyway, these all look really nice, let me go queue them up.

Many thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
  2024-11-04 15:43 ` [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin() Marco Elver
  2024-11-05  9:13   ` Peter Zijlstra
@ 2024-11-05  9:34   ` Peter Zijlstra
  2024-11-05  9:41     ` Peter Zijlstra
  2024-11-05  9:50     ` Marco Elver
  2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
  2 siblings, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2024-11-05  9:34 UTC (permalink / raw)
  To: Marco Elver
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel, Linus Torvalds

On Mon, Nov 04, 2024 at 04:43:09PM +0100, Marco Elver wrote:
> During testing of the preceding changes, I noticed that in some cases,
> current->kcsan_ctx.in_flat_atomic remained true until task exit. This is
> obviously wrong, because _all_ accesses for the given task will be
> treated as atomic, resulting in false negatives i.e. missed data races.
> 
> Debugging led to fs/dcache.c, where we can see this usage of seqlock:
> 
> 	struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name)
> 	{
> 		struct dentry *dentry;
> 		unsigned seq;
> 
> 		do {
> 			seq = read_seqbegin(&rename_lock);
> 			dentry = __d_lookup(parent, name);
> 			if (dentry)
> 				break;
> 		} while (read_seqretry(&rename_lock, seq));
> 	[...]


How's something like this completely untested hack?


	struct dentry *dentry;

	read_seqcount_scope (&rename_lock) {
		dentry = __d_lookup(parent, name);
		if (dentry)
			break;
	}


But perhaps naming isn't right, s/_scope/_loop/ ?


--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -829,6 +829,33 @@ static inline unsigned read_seqretry(con
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 
+
+static inline unsigned read_seq_scope_begin(const struct seqlock_t *sl)
+{
+	unsigned ret = read_seqcount_begin(&sl->seqcount);
+	kcsan_atomic_next(0);
+	kcsan_flat_atomic_begin();
+	return ret;
+}
+
+static inline void read_seq_scope_end(unsigned *seq)
+{
+	kcsan_flat_atomic_end();
+}
+
+static inline bool read_seq_scope_retry(const struct seqlock_t *sl, unsigned *seq)
+{
+	bool done = !read_seqcount_retry(&sl->seqcount, *seq);
+	if (!done)
+		*seq = read_seqcount_begin(&sl->seqcount);
+	return done;
+}
+
+#define read_seqcount_scope(sl) \
+	for (unsigned seq __cleanup(read_seq_scope_end) =		\
+			read_seq_scope_begin(sl), done = 0;		\
+	     !done; done = read_seq_scope_retry(sl, &seq))
+
 /*
  * For all seqlock_t write side functions, use the internal
  * do_write_seqcount_begin() instead of generic write_seqcount_begin().

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 2/5] time/sched_clock: Broaden sched_clock()'s instrumentation coverage
  2024-11-05  9:22   ` Marco Elver
@ 2024-11-05  9:35     ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2024-11-05  9:35 UTC (permalink / raw)
  To: Marco Elver
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel

On Tue, Nov 05, 2024 at 10:22:51AM +0100, Marco Elver wrote:
> Oops, typo'd the commit message:
> 
> On Mon, 4 Nov 2024 at 17:19, Marco Elver <elver@google.com> wrote:
> >
> > Most of sched_clock()'s implementation is ineligible for instrumentation
> > due to relying on sched_clock_noinstr().
> >
> > Split the implementation off into an __always_inline function
> > __sched_clock(), which is then used by the noinstr and instrumentable
> > version, to allow more of sched_clock() to be covered by various
> > instrumentation.
> >
> > This will allow instrumentation with the various sanitizers (KASAN,
> > KCSAN, KMSAN, UBSAN). For KCSAN, we know that raw seqcount_latch usage
> > without annotations will result in false positive reports: tell it that
> > all of __sched_clock() is "atomic" for the latch writer; later changes
> 
> s/writer/reader/
> 
> > in this series will take care of the readers.
> 
> s/readers/writers/
> 
> ... might be less confusing. If you apply, kindly fix up the commit
> message, so that future people will be less confused. The code comment
> is correct.

So done. Thanks!

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
  2024-11-05  9:34   ` Peter Zijlstra
@ 2024-11-05  9:41     ` Peter Zijlstra
  2024-11-05  9:50     ` Marco Elver
  1 sibling, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2024-11-05  9:41 UTC (permalink / raw)
  To: Marco Elver
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel, Linus Torvalds

On Tue, Nov 05, 2024 at 10:34:00AM +0100, Peter Zijlstra wrote:
> On Mon, Nov 04, 2024 at 04:43:09PM +0100, Marco Elver wrote:
> > During testing of the preceding changes, I noticed that in some cases,
> > current->kcsan_ctx.in_flat_atomic remained true until task exit. This is
> > obviously wrong, because _all_ accesses for the given task will be
> > treated as atomic, resulting in false negatives i.e. missed data races.
> > 
> > Debugging led to fs/dcache.c, where we can see this usage of seqlock:
> > 
> > 	struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name)
> > 	{
> > 		struct dentry *dentry;
> > 		unsigned seq;
> > 
> > 		do {
> > 			seq = read_seqbegin(&rename_lock);
> > 			dentry = __d_lookup(parent, name);
> > 			if (dentry)
> > 				break;
> > 		} while (read_seqretry(&rename_lock, seq));
> > 	[...]
> 
> 
> How's something like this completely untested hack?
> 
> 
> 	struct dentry *dentry;
> 
> 	read_seqcount_scope (&rename_lock) {
> 		dentry = __d_lookup(parent, name);
> 		if (dentry)
> 			break;
> 	}
> 
> 
> But perhaps naming isn't right, s/_scope/_loop/ ?

It is also confused between seqcount and seqlock. So perhaps it should
read:

	read_seqcount_loop (&rename_lock.seqcount) {
	   ...
	}

instead.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
  2024-11-05  9:34   ` Peter Zijlstra
  2024-11-05  9:41     ` Peter Zijlstra
@ 2024-11-05  9:50     ` Marco Elver
  1 sibling, 0 replies; 18+ messages in thread
From: Marco Elver @ 2024-11-05  9:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
	Paul E. McKenney, Thomas Gleixner, Mark Rutland, Dmitry Vyukov,
	kasan-dev, linux-kernel, Linus Torvalds

On Tue, 5 Nov 2024 at 10:34, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Nov 04, 2024 at 04:43:09PM +0100, Marco Elver wrote:
> > During testing of the preceding changes, I noticed that in some cases,
> > current->kcsan_ctx.in_flat_atomic remained true until task exit. This is
> > obviously wrong, because _all_ accesses for the given task will be
> > treated as atomic, resulting in false negatives i.e. missed data races.
> >
> > Debugging led to fs/dcache.c, where we can see this usage of seqlock:
> >
> >       struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name)
> >       {
> >               struct dentry *dentry;
> >               unsigned seq;
> >
> >               do {
> >                       seq = read_seqbegin(&rename_lock);
> >                       dentry = __d_lookup(parent, name);
> >                       if (dentry)
> >                               break;
> >               } while (read_seqretry(&rename_lock, seq));
> >       [...]
>
>
> How's something like this completely untested hack?
>
>
>         struct dentry *dentry;
>
>         read_seqcount_scope (&rename_lock) {
>                 dentry = __d_lookup(parent, name);
>                 if (dentry)
>                         break;
>         }
>
>
> But perhaps naming isn't right, s/_scope/_loop/ ?

_loop seems straightforward.

> --- a/include/linux/seqlock.h
> +++ b/include/linux/seqlock.h
> @@ -829,6 +829,33 @@ static inline unsigned read_seqretry(con
>         return read_seqcount_retry(&sl->seqcount, start);
>  }
>
> +
> +static inline unsigned read_seq_scope_begin(const struct seqlock_t *sl)
> +{
> +       unsigned ret = read_seqcount_begin(&sl->seqcount);
> +       kcsan_atomic_next(0);
> +       kcsan_flat_atomic_begin();
> +       return ret;
> +}
> +
> +static inline void read_seq_scope_end(unsigned *seq)
> +{
> +       kcsan_flat_atomic_end();

If we are guaranteed to always have one _begin paired by a matching
_end, we can s/kcsan_flat_atomic/kcsan_nestable_atomic/ for these.

> +}
> +
> +static inline bool read_seq_scope_retry(const struct seqlock_t *sl, unsigned *seq)
> +{
> +       bool done = !read_seqcount_retry(&sl->seqcount, *seq);
> +       if (!done)
> +               *seq = read_seqcount_begin(&sl->seqcount);
> +       return done;
> +}
> +
> +#define read_seqcount_scope(sl) \
> +       for (unsigned seq __cleanup(read_seq_scope_end) =               \
> +                       read_seq_scope_begin(sl), done = 0;             \
> +            !done; done = read_seq_scope_retry(sl, &seq))
> +

That's nice! I think before we fully moved over to C11, I recall Mark
and I discussed something like that (on IRC?) but gave up until C11
landed and then we forgot. ;-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip: locking/core] kcsan, seqlock: Fix incorrect assumption in read_seqbegin()
  2024-11-04 15:43 ` [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin() Marco Elver
  2024-11-05  9:13   ` Peter Zijlstra
  2024-11-05  9:34   ` Peter Zijlstra
@ 2024-11-06 10:47   ` tip-bot2 for Marco Elver
  2 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Marco Elver @ 2024-11-06 10:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Marco Elver, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     183ec5f26b2fc97a4a9871865bfe9b33c41fddb2
Gitweb:        https://git.kernel.org/tip/183ec5f26b2fc97a4a9871865bfe9b33c41fddb2
Author:        Marco Elver <elver@google.com>
AuthorDate:    Mon, 04 Nov 2024 16:43:09 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Nov 2024 12:55:35 +01:00

kcsan, seqlock: Fix incorrect assumption in read_seqbegin()

During testing of the preceding changes, I noticed that in some cases,
current->kcsan_ctx.in_flat_atomic remained true until task exit. This is
obviously wrong, because _all_ accesses for the given task will be
treated as atomic, resulting in false negatives i.e. missed data races.

Debugging led to fs/dcache.c, where we can see this usage of seqlock:

	struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name)
	{
		struct dentry *dentry;
		unsigned seq;

		do {
			seq = read_seqbegin(&rename_lock);
			dentry = __d_lookup(parent, name);
			if (dentry)
				break;
		} while (read_seqretry(&rename_lock, seq));
	[...]

As can be seen, read_seqretry() is never called if dentry != NULL;
consequently, current->kcsan_ctx.in_flat_atomic will never be reset to
false by read_seqretry().

Give up on the wrong assumption of "assume closing read_seqretry()", and
rely on the already-present annotations in read_seqcount_begin/retry().

Fixes: 88ecd153be95 ("seqlock, kcsan: Add annotations for KCSAN")
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241104161910.780003-6-elver@google.com
---
 include/linux/seqlock.h | 12 +-----------
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 45eee0e..5298765 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -810,11 +810,7 @@ static __always_inline void write_seqcount_latch_end(seqcount_latch_t *s)
  */
 static inline unsigned read_seqbegin(const seqlock_t *sl)
 {
-	unsigned ret = read_seqcount_begin(&sl->seqcount);
-
-	kcsan_atomic_next(0);  /* non-raw usage, assume closing read_seqretry() */
-	kcsan_flat_atomic_begin();
-	return ret;
+	return read_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -830,12 +826,6 @@ static inline unsigned read_seqbegin(const seqlock_t *sl)
  */
 static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 {
-	/*
-	 * Assume not nested: read_seqretry() may be called multiple times when
-	 * completing read critical section.
-	 */
-	kcsan_flat_atomic_end();
-
 	return read_seqcount_retry(&sl->seqcount, start);
 }
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip: locking/core] seqlock, treewide: Switch to non-raw seqcount_latch interface
  2024-11-04 15:43 ` [PATCH v2 4/5] seqlock, treewide: Switch to non-raw seqcount_latch interface Marco Elver
@ 2024-11-06 10:47   ` tip-bot2 for Marco Elver
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Marco Elver @ 2024-11-06 10:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Marco Elver, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     93190bc35d6d4364a4d8c38ac8961dabecbff4ed
Gitweb:        https://git.kernel.org/tip/93190bc35d6d4364a4d8c38ac8961dabecbff4ed
Author:        Marco Elver <elver@google.com>
AuthorDate:    Mon, 04 Nov 2024 16:43:08 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Nov 2024 12:55:35 +01:00

seqlock, treewide: Switch to non-raw seqcount_latch interface

Switch all instrumentable users of the seqcount_latch interface over to
the non-raw interface.

Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241104161910.780003-5-elver@google.com
---
 arch/x86/kernel/tsc.c        |  5 +++--
 include/linux/rbtree_latch.h | 20 +++++++++++---------
 kernel/printk/printk.c       |  9 +++++----
 kernel/time/sched_clock.c    | 12 +++++++-----
 kernel/time/timekeeping.c    | 12 +++++++-----
 5 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index dfe6847..67aeaba 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -174,10 +174,11 @@ static void __set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long ts
 
 	c2n = per_cpu_ptr(&cyc2ns, cpu);
 
-	raw_write_seqcount_latch(&c2n->seq);
+	write_seqcount_latch_begin(&c2n->seq);
 	c2n->data[0] = data;
-	raw_write_seqcount_latch(&c2n->seq);
+	write_seqcount_latch(&c2n->seq);
 	c2n->data[1] = data;
+	write_seqcount_latch_end(&c2n->seq);
 }
 
 static void set_cyc2ns_scale(unsigned long khz, int cpu, unsigned long long tsc_now)
diff --git a/include/linux/rbtree_latch.h b/include/linux/rbtree_latch.h
index 6a0999c..2f630eb 100644
--- a/include/linux/rbtree_latch.h
+++ b/include/linux/rbtree_latch.h
@@ -14,7 +14,7 @@
  *
  * If we need to allow unconditional lookups (say as required for NMI context
  * usage) we need a more complex setup; this data structure provides this by
- * employing the latch technique -- see @raw_write_seqcount_latch -- to
+ * employing the latch technique -- see @write_seqcount_latch_begin -- to
  * implement a latched RB-tree which does allow for unconditional lookups by
  * virtue of always having (at least) one stable copy of the tree.
  *
@@ -132,7 +132,7 @@ __lt_find(void *key, struct latch_tree_root *ltr, int idx,
  * @ops: operators defining the node order
  *
  * It inserts @node into @root in an ordered fashion such that we can always
- * observe one complete tree. See the comment for raw_write_seqcount_latch().
+ * observe one complete tree. See the comment for write_seqcount_latch_begin().
  *
  * The inserts use rcu_assign_pointer() to publish the element such that the
  * tree structure is stored before we can observe the new @node.
@@ -145,10 +145,11 @@ latch_tree_insert(struct latch_tree_node *node,
 		  struct latch_tree_root *root,
 		  const struct latch_tree_ops *ops)
 {
-	raw_write_seqcount_latch(&root->seq);
+	write_seqcount_latch_begin(&root->seq);
 	__lt_insert(node, root, 0, ops->less);
-	raw_write_seqcount_latch(&root->seq);
+	write_seqcount_latch(&root->seq);
 	__lt_insert(node, root, 1, ops->less);
+	write_seqcount_latch_end(&root->seq);
 }
 
 /**
@@ -159,7 +160,7 @@ latch_tree_insert(struct latch_tree_node *node,
  *
  * Removes @node from the trees @root in an ordered fashion such that we can
  * always observe one complete tree. See the comment for
- * raw_write_seqcount_latch().
+ * write_seqcount_latch_begin().
  *
  * It is assumed that @node will observe one RCU quiescent state before being
  * reused of freed.
@@ -172,10 +173,11 @@ latch_tree_erase(struct latch_tree_node *node,
 		 struct latch_tree_root *root,
 		 const struct latch_tree_ops *ops)
 {
-	raw_write_seqcount_latch(&root->seq);
+	write_seqcount_latch_begin(&root->seq);
 	__lt_erase(node, root, 0);
-	raw_write_seqcount_latch(&root->seq);
+	write_seqcount_latch(&root->seq);
 	__lt_erase(node, root, 1);
+	write_seqcount_latch_end(&root->seq);
 }
 
 /**
@@ -204,9 +206,9 @@ latch_tree_find(void *key, struct latch_tree_root *root,
 	unsigned int seq;
 
 	do {
-		seq = raw_read_seqcount_latch(&root->seq);
+		seq = read_seqcount_latch(&root->seq);
 		node = __lt_find(key, root, seq & 1, ops->comp);
-	} while (raw_read_seqcount_latch_retry(&root->seq, seq));
+	} while (read_seqcount_latch_retry(&root->seq, seq));
 
 	return node;
 }
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index beb808f..19911c8 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -560,10 +560,11 @@ bool printk_percpu_data_ready(void)
 /* Must be called under syslog_lock. */
 static void latched_seq_write(struct latched_seq *ls, u64 val)
 {
-	raw_write_seqcount_latch(&ls->latch);
+	write_seqcount_latch_begin(&ls->latch);
 	ls->val[0] = val;
-	raw_write_seqcount_latch(&ls->latch);
+	write_seqcount_latch(&ls->latch);
 	ls->val[1] = val;
+	write_seqcount_latch_end(&ls->latch);
 }
 
 /* Can be called from any context. */
@@ -574,10 +575,10 @@ static u64 latched_seq_read_nolock(struct latched_seq *ls)
 	u64 val;
 
 	do {
-		seq = raw_read_seqcount_latch(&ls->latch);
+		seq = read_seqcount_latch(&ls->latch);
 		idx = seq & 0x1;
 		val = ls->val[idx];
-	} while (raw_read_seqcount_latch_retry(&ls->latch, seq));
+	} while (read_seqcount_latch_retry(&ls->latch, seq));
 
 	return val;
 }
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 29bdf30..fcca4e7 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -71,13 +71,13 @@ static __always_inline u64 cyc_to_ns(u64 cyc, u32 mult, u32 shift)
 
 notrace struct clock_read_data *sched_clock_read_begin(unsigned int *seq)
 {
-	*seq = raw_read_seqcount_latch(&cd.seq);
+	*seq = read_seqcount_latch(&cd.seq);
 	return cd.read_data + (*seq & 1);
 }
 
 notrace int sched_clock_read_retry(unsigned int seq)
 {
-	return raw_read_seqcount_latch_retry(&cd.seq, seq);
+	return read_seqcount_latch_retry(&cd.seq, seq);
 }
 
 static __always_inline unsigned long long __sched_clock(void)
@@ -132,16 +132,18 @@ unsigned long long notrace sched_clock(void)
 static void update_clock_read_data(struct clock_read_data *rd)
 {
 	/* steer readers towards the odd copy */
-	raw_write_seqcount_latch(&cd.seq);
+	write_seqcount_latch_begin(&cd.seq);
 
 	/* now its safe for us to update the normal (even) copy */
 	cd.read_data[0] = *rd;
 
 	/* switch readers back to the even copy */
-	raw_write_seqcount_latch(&cd.seq);
+	write_seqcount_latch(&cd.seq);
 
 	/* update the backup (odd) copy with the new data */
 	cd.read_data[1] = *rd;
+
+	write_seqcount_latch_end(&cd.seq);
 }
 
 /*
@@ -279,7 +281,7 @@ void __init generic_sched_clock_init(void)
  */
 static u64 notrace suspended_sched_clock_read(void)
 {
-	unsigned int seq = raw_read_seqcount_latch(&cd.seq);
+	unsigned int seq = read_seqcount_latch(&cd.seq);
 
 	return cd.read_data[seq & 1].epoch_cyc;
 }
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7e6f409..1875298 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -411,7 +411,7 @@ static inline u64 timekeeping_get_ns(const struct tk_read_base *tkr)
  * We want to use this from any context including NMI and tracing /
  * instrumenting the timekeeping code itself.
  *
- * Employ the latch technique; see @raw_write_seqcount_latch.
+ * Employ the latch technique; see @write_seqcount_latch.
  *
  * So if a NMI hits the update of base[0] then it will use base[1]
  * which is still consistent. In the worst case this can result is a
@@ -424,16 +424,18 @@ static void update_fast_timekeeper(const struct tk_read_base *tkr,
 	struct tk_read_base *base = tkf->base;
 
 	/* Force readers off to base[1] */
-	raw_write_seqcount_latch(&tkf->seq);
+	write_seqcount_latch_begin(&tkf->seq);
 
 	/* Update base[0] */
 	memcpy(base, tkr, sizeof(*base));
 
 	/* Force readers back to base[0] */
-	raw_write_seqcount_latch(&tkf->seq);
+	write_seqcount_latch(&tkf->seq);
 
 	/* Update base[1] */
 	memcpy(base + 1, base, sizeof(*base));
+
+	write_seqcount_latch_end(&tkf->seq);
 }
 
 static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
@@ -443,11 +445,11 @@ static __always_inline u64 __ktime_get_fast_ns(struct tk_fast *tkf)
 	u64 now;
 
 	do {
-		seq = raw_read_seqcount_latch(&tkf->seq);
+		seq = read_seqcount_latch(&tkf->seq);
 		tkr = tkf->base + (seq & 0x01);
 		now = ktime_to_ns(tkr->base);
 		now += __timekeeping_get_ns(tkr);
-	} while (raw_read_seqcount_latch_retry(&tkf->seq, seq));
+	} while (read_seqcount_latch_retry(&tkf->seq, seq));
 
 	return now;
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip: locking/core] kcsan, seqlock: Support seqcount_latch_t
  2024-11-04 15:43 ` [PATCH v2 3/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
@ 2024-11-06 10:47   ` tip-bot2 for Marco Elver
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Marco Elver @ 2024-11-06 10:47 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Alexander Potapenko, Peter Zijlstra (Intel), Marco Elver, x86,
	linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     5c1806c41ce0a0110db5dd4c483cf2dc28b3ddf0
Gitweb:        https://git.kernel.org/tip/5c1806c41ce0a0110db5dd4c483cf2dc28b3ddf0
Author:        Marco Elver <elver@google.com>
AuthorDate:    Mon, 04 Nov 2024 16:43:07 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Nov 2024 12:55:35 +01:00

kcsan, seqlock: Support seqcount_latch_t

While fuzzing an arm64 kernel, Alexander Potapenko reported:

| BUG: KCSAN: data-race in ktime_get_mono_fast_ns / timekeeping_update
|
| write to 0xffffffc082e74248 of 56 bytes by interrupt on cpu 0:
|  update_fast_timekeeper kernel/time/timekeeping.c:430 [inline]
|  timekeeping_update+0x1d8/0x2d8 kernel/time/timekeeping.c:768
|  timekeeping_advance+0x9e8/0xb78 kernel/time/timekeeping.c:2344
|  update_wall_time+0x18/0x38 kernel/time/timekeeping.c:2360
|  [...]
|
| read to 0xffffffc082e74258 of 8 bytes by task 5260 on cpu 1:
|  __ktime_get_fast_ns kernel/time/timekeeping.c:372 [inline]
|  ktime_get_mono_fast_ns+0x88/0x174 kernel/time/timekeeping.c:489
|  init_srcu_struct_fields+0x40c/0x530 kernel/rcu/srcutree.c:263
|  init_srcu_struct+0x14/0x20 kernel/rcu/srcutree.c:311
|  [...]
|
| value changed: 0x000002f875d33266 -> 0x000002f877416866
|
| Reported by Kernel Concurrency Sanitizer on:
| CPU: 1 UID: 0 PID: 5260 Comm: syz.2.7483 Not tainted 6.12.0-rc3-dirty #78

This is a false positive data race between a seqcount latch writer and a reader
accessing stale data. Since its introduction, KCSAN has never understood the
seqcount_latch interface (due to being unannotated).

Unlike the regular seqlock interface, the seqcount_latch interface for latch
writers never has had a well-defined critical section, making it difficult to
teach tooling where the critical section starts and ends.

Introduce an instrumentable (non-raw) seqcount_latch interface, with
which we can clearly denote writer critical sections. This both helps
readability and tooling like KCSAN to understand when the writer is done
updating all latch copies.

Fixes: 88ecd153be95 ("seqlock, kcsan: Add annotations for KCSAN")
Reported-by: Alexander Potapenko <glider@google.com>
Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241104161910.780003-4-elver@google.com
---
 Documentation/locking/seqlock.rst |  2 +-
 include/linux/seqlock.h           | 86 ++++++++++++++++++++++++------
 2 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index bfda1a5..ec6411d 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -153,7 +153,7 @@ Use seqcount_latch_t when the write side sections cannot be protected
 from interruption by readers. This is typically the case when the read
 side can be invoked from NMI handlers.
 
-Check `raw_write_seqcount_latch()` for more information.
+Check `write_seqcount_latch()` for more information.
 
 
 .. _seqlock_t:
diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index fffeb75..45eee0e 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -622,6 +622,23 @@ static __always_inline unsigned raw_read_seqcount_latch(const seqcount_latch_t *
 }
 
 /**
+ * read_seqcount_latch() - pick even/odd latch data copy
+ * @s: Pointer to seqcount_latch_t
+ *
+ * See write_seqcount_latch() for details and a full reader/writer usage
+ * example.
+ *
+ * Return: sequence counter raw value. Use the lowest bit as an index for
+ * picking which data copy to read. The full counter must then be checked
+ * with read_seqcount_latch_retry().
+ */
+static __always_inline unsigned read_seqcount_latch(const seqcount_latch_t *s)
+{
+	kcsan_atomic_next(KCSAN_SEQLOCK_REGION_MAX);
+	return raw_read_seqcount_latch(s);
+}
+
+/**
  * raw_read_seqcount_latch_retry() - end a seqcount_latch_t read section
  * @s:		Pointer to seqcount_latch_t
  * @start:	count, from raw_read_seqcount_latch()
@@ -636,8 +653,33 @@ raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
 }
 
 /**
+ * read_seqcount_latch_retry() - end a seqcount_latch_t read section
+ * @s:		Pointer to seqcount_latch_t
+ * @start:	count, from read_seqcount_latch()
+ *
+ * Return: true if a read section retry is required, else false
+ */
+static __always_inline int
+read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
+{
+	kcsan_atomic_next(0);
+	return raw_read_seqcount_latch_retry(s, start);
+}
+
+/**
  * raw_write_seqcount_latch() - redirect latch readers to even/odd copy
  * @s: Pointer to seqcount_latch_t
+ */
+static __always_inline void raw_write_seqcount_latch(seqcount_latch_t *s)
+{
+	smp_wmb();	/* prior stores before incrementing "sequence" */
+	s->seqcount.sequence++;
+	smp_wmb();      /* increment "sequence" before following stores */
+}
+
+/**
+ * write_seqcount_latch_begin() - redirect latch readers to odd copy
+ * @s: Pointer to seqcount_latch_t
  *
  * The latch technique is a multiversion concurrency control method that allows
  * queries during non-atomic modifications. If you can guarantee queries never
@@ -665,17 +707,11 @@ raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *
  *	void latch_modify(struct latch_struct *latch, ...)
  *	{
- *		smp_wmb();	// Ensure that the last data[1] update is visible
- *		latch->seq.sequence++;
- *		smp_wmb();	// Ensure that the seqcount update is visible
- *
+ *		write_seqcount_latch_begin(&latch->seq);
  *		modify(latch->data[0], ...);
- *
- *		smp_wmb();	// Ensure that the data[0] update is visible
- *		latch->seq.sequence++;
- *		smp_wmb();	// Ensure that the seqcount update is visible
- *
+ *		write_seqcount_latch(&latch->seq);
  *		modify(latch->data[1], ...);
+ *		write_seqcount_latch_end(&latch->seq);
  *	}
  *
  * The query will have a form like::
@@ -686,13 +722,13 @@ raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *		unsigned seq, idx;
  *
  *		do {
- *			seq = raw_read_seqcount_latch(&latch->seq);
+ *			seq = read_seqcount_latch(&latch->seq);
  *
  *			idx = seq & 0x01;
  *			entry = data_query(latch->data[idx], ...);
  *
  *		// This includes needed smp_rmb()
- *		} while (raw_read_seqcount_latch_retry(&latch->seq, seq));
+ *		} while (read_seqcount_latch_retry(&latch->seq, seq));
  *
  *		return entry;
  *	}
@@ -716,11 +752,31 @@ raw_read_seqcount_latch_retry(const seqcount_latch_t *s, unsigned start)
  *	When data is a dynamic data structure; one should use regular RCU
  *	patterns to manage the lifetimes of the objects within.
  */
-static inline void raw_write_seqcount_latch(seqcount_latch_t *s)
+static __always_inline void write_seqcount_latch_begin(seqcount_latch_t *s)
 {
-	smp_wmb();	/* prior stores before incrementing "sequence" */
-	s->seqcount.sequence++;
-	smp_wmb();      /* increment "sequence" before following stores */
+	kcsan_nestable_atomic_begin();
+	raw_write_seqcount_latch(s);
+}
+
+/**
+ * write_seqcount_latch() - redirect latch readers to even copy
+ * @s: Pointer to seqcount_latch_t
+ */
+static __always_inline void write_seqcount_latch(seqcount_latch_t *s)
+{
+	raw_write_seqcount_latch(s);
+}
+
+/**
+ * write_seqcount_latch_end() - end a seqcount_latch_t write section
+ * @s:		Pointer to seqcount_latch_t
+ *
+ * Marks the end of a seqcount_latch_t writer section, after all copies of the
+ * latch-protected data have been updated.
+ */
+static __always_inline void write_seqcount_latch_end(seqcount_latch_t *s)
+{
+	kcsan_nestable_atomic_end();
 }
 
 #define __SEQLOCK_UNLOCKED(lockname)					\

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip: locking/core] time/sched_clock: Broaden sched_clock()'s instrumentation coverage
  2024-11-04 15:43 ` [PATCH v2 2/5] time/sched_clock: Broaden sched_clock()'s instrumentation coverage Marco Elver
  2024-11-05  9:22   ` Marco Elver
@ 2024-11-06 10:47   ` tip-bot2 for Marco Elver
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot2 for Marco Elver @ 2024-11-06 10:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), Marco Elver, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     8ab40fc2b9086b915e46890bb9252dc7692f1da0
Gitweb:        https://git.kernel.org/tip/8ab40fc2b9086b915e46890bb9252dc7692f1da0
Author:        Marco Elver <elver@google.com>
AuthorDate:    Mon, 04 Nov 2024 16:43:06 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Nov 2024 12:55:35 +01:00

time/sched_clock: Broaden sched_clock()'s instrumentation coverage

Most of sched_clock()'s implementation is ineligible for instrumentation
due to relying on sched_clock_noinstr().

Split the implementation off into an __always_inline function
__sched_clock(), which is then used by the noinstr and instrumentable
version, to allow more of sched_clock() to be covered by various
instrumentation.

This will allow instrumentation with the various sanitizers (KASAN,
KCSAN, KMSAN, UBSAN). For KCSAN, we know that raw seqcount_latch usage
without annotations will result in false positive reports: tell it that
all of __sched_clock() is "atomic" for the latch reader; later changes
in this series will take care of the writers.

Co-developed-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241104161910.780003-3-elver@google.com
---
 kernel/time/sched_clock.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 85595fc..29bdf30 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -80,7 +80,7 @@ notrace int sched_clock_read_retry(unsigned int seq)
 	return raw_read_seqcount_latch_retry(&cd.seq, seq);
 }
 
-unsigned long long noinstr sched_clock_noinstr(void)
+static __always_inline unsigned long long __sched_clock(void)
 {
 	struct clock_read_data *rd;
 	unsigned int seq;
@@ -98,11 +98,23 @@ unsigned long long noinstr sched_clock_noinstr(void)
 	return res;
 }
 
+unsigned long long noinstr sched_clock_noinstr(void)
+{
+	return __sched_clock();
+}
+
 unsigned long long notrace sched_clock(void)
 {
 	unsigned long long ns;
 	preempt_disable_notrace();
-	ns = sched_clock_noinstr();
+	/*
+	 * All of __sched_clock() is a seqcount_latch reader critical section,
+	 * but relies on the raw helpers which are uninstrumented. For KCSAN,
+	 * mark all accesses in __sched_clock() as atomic.
+	 */
+	kcsan_nestable_atomic_begin();
+	ns = __sched_clock();
+	kcsan_nestable_atomic_end();
 	preempt_enable_notrace();
 	return ns;
 }

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip: locking/core] time/sched_clock: Swap update_clock_read_data() latch writes
  2024-11-04 15:43 ` [PATCH v2 1/5] time/sched_clock: Swap update_clock_read_data() latch writes Marco Elver
@ 2024-11-06 10:47   ` tip-bot2 for Marco Elver
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot2 for Marco Elver @ 2024-11-06 10:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Marco Elver, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     1139c71df5ca29a36f08e3a08c7cee160db21ec1
Gitweb:        https://git.kernel.org/tip/1139c71df5ca29a36f08e3a08c7cee160db21ec1
Author:        Marco Elver <elver@google.com>
AuthorDate:    Mon, 04 Nov 2024 16:43:05 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 05 Nov 2024 12:55:34 +01:00

time/sched_clock: Swap update_clock_read_data() latch writes

Swap the writes to the odd and even copies to make the writer critical
section look like all other seqcount_latch writers.

Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20241104161910.780003-2-elver@google.com
---
 kernel/time/sched_clock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 68d6c11..85595fc 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -119,9 +119,6 @@ unsigned long long notrace sched_clock(void)
  */
 static void update_clock_read_data(struct clock_read_data *rd)
 {
-	/* update the backup (odd) copy with the new data */
-	cd.read_data[1] = *rd;
-
 	/* steer readers towards the odd copy */
 	raw_write_seqcount_latch(&cd.seq);
 
@@ -130,6 +127,9 @@ static void update_clock_read_data(struct clock_read_data *rd)
 
 	/* switch readers back to the even copy */
 	raw_write_seqcount_latch(&cd.seq);
+
+	/* update the backup (odd) copy with the new data */
+	cd.read_data[1] = *rd;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2024-11-06 10:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-04 15:43 [PATCH v2 0/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
2024-11-04 15:43 ` [PATCH v2 1/5] time/sched_clock: Swap update_clock_read_data() latch writes Marco Elver
2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
2024-11-04 15:43 ` [PATCH v2 2/5] time/sched_clock: Broaden sched_clock()'s instrumentation coverage Marco Elver
2024-11-05  9:22   ` Marco Elver
2024-11-05  9:35     ` Peter Zijlstra
2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
2024-11-04 15:43 ` [PATCH v2 3/5] kcsan, seqlock: Support seqcount_latch_t Marco Elver
2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
2024-11-04 15:43 ` [PATCH v2 4/5] seqlock, treewide: Switch to non-raw seqcount_latch interface Marco Elver
2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver
2024-11-04 15:43 ` [PATCH v2 5/5] kcsan, seqlock: Fix incorrect assumption in read_seqbegin() Marco Elver
2024-11-05  9:13   ` Peter Zijlstra
2024-11-05  9:28     ` Marco Elver
2024-11-05  9:34   ` Peter Zijlstra
2024-11-05  9:41     ` Peter Zijlstra
2024-11-05  9:50     ` Marco Elver
2024-11-06 10:47   ` [tip: locking/core] " tip-bot2 for Marco Elver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox