* [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses
@ 2023-05-08 2:01 Rohan McLure
2023-05-08 2:01 ` [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation Rohan McLure
` (11 more replies)
0 siblings, 12 replies; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
The KCSAN sanitiser notifies programmers of instances where unmarked
accesses to shared state have led to a data race, or where the compiler
has the liberty to reorder an unmarked access and so generate a data race.
This patch series deals with benign data races, which nonetheless need
annotation in order to ensure the correctness of the emitted code.
In keeping with the principles given in
tools/memory-model/Documentation/access-marking.txt, racing reads of
shared state for purely diagnostic/debug purposes are annotated with
data_race, while reads/writes that are instances of intentional polling of
shared variables are performed with READ_ONCE and WRITE_ONCE.
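As a rough illustration of the distinction (a generic sketch with made-up
variable names, not code taken from these patches):

	/* Diagnostic-only read of a racy flag, e.g. feeding a debug printk. */
	if (data_race(warned))
		return;

	/* Intentional polling of a flag that another CPU will write. */
	while (!READ_ONCE(flag))
		cpu_relax();

	/* The paired store on the writer side. */
	WRITE_ONCE(flag, 1);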
These changes remove the majority of warnings observable on pseries and
powernv. For development, I was able to narrow the reports down to only
powerpc-relevant bugs by temporarily disabling sanitisation for all other files.
Future patch series will deal with the subtler bugs which persist under
this configuration.
KCSAN races addressed:
- qspinlock: assignment of qnode->locked and polling
- check_return_regs_valid [h]srr_valid
- arch_cpu_idle idle callback
- powernv idle_state paca entry (polling the bit-lock is viewed by
KCSAN as asynchronous access to the fields it protects)
- Asynchronous access to irq_data->hwirq
- Opal asynchronous event handling
- IPIs
Miscellaneous other changes:
- Annotate the asm-generic/mmiowb code, which riscv and powerpc each
consume
- Update usages of qnode->locked in powerpc's qspinlock implementation
to reflect the comment beside this field
Rohan McLure (12):
powerpc: qspinlock: Fix qnode->locked value interpretation
powerpc: qspinlock: Mark accesses to qnode lock checks
powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
asm-generic/mmiowb: Mark accesses to fix KCSAN warnings
powerpc: Mark [h]srr_valid accesses in check_return_regs_valid
powerpc: Mark accesses to power_save callback in arch_cpu_idle
powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
powerpc: Annotate accesses to ipi message flags
powerpc: Mark writes registering ipi to host cpu through kvm
powerpc: powernv: Annotate data races in opal events
powerpc: powernv: Annotate asynchronous access to opal tokens
powerpc: Mark asynchronous accesses to irq_data
arch/powerpc/include/asm/kvm_ppc.h | 4 ++--
arch/powerpc/include/asm/paca.h | 1 +
arch/powerpc/include/asm/ptrace.h | 4 ++--
arch/powerpc/kernel/idle.c | 6 ++++--
arch/powerpc/kernel/interrupt.c | 14 ++++++-------
arch/powerpc/kernel/irq.c | 2 +-
arch/powerpc/kernel/smp.c | 2 +-
arch/powerpc/lib/qspinlock.c | 14 ++++++++-----
arch/powerpc/platforms/powernv/idle.c | 20 ++++++++++---------
arch/powerpc/platforms/powernv/opal-async.c | 6 +++---
arch/powerpc/platforms/powernv/opal-irqchip.c | 6 +++---
arch/powerpc/platforms/powernv/pci-ioda.c | 12 +++++------
include/asm-generic/mmiowb.h | 17 ++++++++++------
include/linux/irq.h | 2 +-
kernel/irq/irqdomain.c | 4 ++--
15 files changed, 63 insertions(+), 51 deletions(-)
--
2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:01 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 02/12] powerpc: qspinlock: Mark accesses to qnode lock checks Rohan McLure
` (10 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
A comment accompanying the locked attribute of a qnode assigns a value
of 1 to mean that the lock has been acquired. The usages of this
variable however assume opposite semantics. Update usages so that the
assertions of this comment are reflected in this file.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/lib/qspinlock.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index e4bd145255d0..9cf93963772b 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
smp_rmb(); /* See __yield_to_locked_owner comment */
- if (!node->locked) {
+ if (node->locked) {
yield_to_preempted(prev_cpu, yield_count);
spin_begin();
return preempted;
@@ -566,7 +566,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
node->lock = lock;
node->cpu = smp_processor_id();
node->yield_cpu = -1;
- node->locked = 0;
+ node->locked = 1;
tail = encode_tail_cpu(node->cpu);
@@ -584,7 +584,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
/* Wait for mcs node lock to be released */
spin_begin();
- while (!node->locked) {
+ while (node->locked) {
spec_barrier();
if (yield_to_prev(lock, node, old, paravirt))
@@ -693,13 +693,13 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
*/
if (paravirt && pv_prod_head) {
int next_cpu = next->cpu;
- WRITE_ONCE(next->locked, 1);
+ WRITE_ONCE(next->locked, 0);
if (_Q_SPIN_MISO)
asm volatile("miso" ::: "memory");
if (vcpu_is_preempted(next_cpu))
prod_cpu(next_cpu);
} else {
- WRITE_ONCE(next->locked, 1);
+ WRITE_ONCE(next->locked, 0);
if (_Q_SPIN_MISO)
asm volatile("miso" ::: "memory");
}
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 02/12] powerpc: qspinlock: Mark accesses to qnode lock checks
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
2023-05-08 2:01 ` [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:02 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue Rohan McLure
` (9 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
The powerpc implementation of qspinlocks will both poll and spin on the
bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey
to KCSAN that polling is intentional here.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/lib/qspinlock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 9cf93963772b..579290d55abf 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
smp_rmb(); /* See __yield_to_locked_owner comment */
- if (node->locked) {
+ if (READ_ONCE(node->locked)) {
yield_to_preempted(prev_cpu, yield_count);
spin_begin();
return preempted;
@@ -584,7 +584,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
/* Wait for mcs node lock to be released */
spin_begin();
- while (node->locked) {
+ while (READ_ONCE(node->locked)) {
spec_barrier();
if (yield_to_prev(lock, node, old, paravirt))
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
2023-05-08 2:01 ` [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation Rohan McLure
2023-05-08 2:01 ` [PATCH 02/12] powerpc: qspinlock: Mark accesses to qnode lock checks Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:04 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings Rohan McLure
` (8 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
Use a compiler barrier to enforce that all fields of a new struct qnode
be written to (especially the lock value) before publishing the qnode to
the waitqueue.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/lib/qspinlock.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index 579290d55abf..d548001a86be 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -567,6 +567,10 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
node->cpu = smp_processor_id();
node->yield_cpu = -1;
node->locked = 1;
+ /*
+ * Assign all attributes of a node before it can be published.
+ */
+ barrier();
tail = encode_tail_cpu(node->cpu);
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (2 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-08 6:30 ` Arnd Bergmann
` (2 more replies)
2023-05-08 2:01 ` [PATCH 05/12] powerpc: Mark [h]srr_valid accesses in check_return_regs_valid Rohan McLure
` (7 subsequent siblings)
11 siblings, 3 replies; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: arnd, npiggin, Rohan McLure, Gautam Menghani
Prior to this patch, KCSAN detects data races of the following
forms:
[1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
or otherwise outside of a critical section
[2] Interrupted critical sections, where the interrupt will itself
acquire a lock
In case [1], calling context does not need an mmiowb() call to be
issued, otherwise it would do so itself. Such calls to
mmiowb_set_pending() are either idempotent or no-ops.
In case [2], irrespective of when the interrupt occurs, the interrupt
will acquire and release its locks prior to its return, so nesting_count
will remain balanced. In the worst case, a critical section in the middle
of an mmiowb_spin_unlock() call observes an mmiowb to be pending and is
then interrupted, leading to an extraneous call to mmiowb(). This data
race is clearly innocuous.
Mark all potentially asynchronous memory accesses with READ_ONCE or
WRITE_ONCE, including increments and decrements to nesting_count. This
has the effect of removing KCSAN warnings at consumers' call sites.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Gautam Menghani <gautammenghani201@gmail.com>
---
include/asm-generic/mmiowb.h | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
index 5698fca3bf56..0b8b794150db 100644
--- a/include/asm-generic/mmiowb.h
+++ b/include/asm-generic/mmiowb.h
@@ -35,27 +35,32 @@ DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
static inline void mmiowb_set_pending(void)
{
struct mmiowb_state *ms = __mmiowb_state();
+ u16 nesting_count = READ_ONCE(ms->nesting_count);
- if (likely(ms->nesting_count))
- ms->mmiowb_pending = ms->nesting_count;
+ if (likely(nesting_count))
+ WRITE_ONCE(ms->mmiowb_pending, nesting_count);
}
static inline void mmiowb_spin_lock(void)
{
struct mmiowb_state *ms = __mmiowb_state();
- ms->nesting_count++;
+
+ /* Increment need not be atomic. Nestedness is balanced over interrupts. */
+ WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) + 1);
}
static inline void mmiowb_spin_unlock(void)
{
struct mmiowb_state *ms = __mmiowb_state();
+ u16 pending = READ_ONCE(ms->mmiowb_pending);
- if (unlikely(ms->mmiowb_pending)) {
- ms->mmiowb_pending = 0;
+ WRITE_ONCE(ms->mmiowb_pending, 0);
+ if (unlikely(pending)) {
mmiowb();
}
- ms->nesting_count--;
+ /* Decrement need not be atomic. Nestedness is balanced over interrupts. */
+ WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) - 1);
}
#else
#define mmiowb_set_pending() do { } while (0)
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 05/12] powerpc: Mark [h]srr_valid accesses in check_return_regs_valid
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (3 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:17 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 06/12] powerpc: Mark accesses to power_save callback in arch_cpu_idle Rohan McLure
` (6 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
Checks to see if the [H]SRR registers have been clobbered by (soft)
NMI interrupts imply the possibility of a data race on the
[h]srr_valid entries in the PACA. Annotate accesses to these fields with
READ_ONCE, removing the need for the barrier.
The diagnostic can use plain-access reads and writes, but annotate with
data_race.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
---
arch/powerpc/include/asm/ptrace.h | 4 ++--
arch/powerpc/kernel/interrupt.c | 14 ++++++--------
2 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index 0eb90a013346..9db8b16567e2 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -180,8 +180,8 @@ void do_syscall_trace_leave(struct pt_regs *regs);
static inline void set_return_regs_changed(void)
{
#ifdef CONFIG_PPC_BOOK3S_64
- local_paca->hsrr_valid = 0;
- local_paca->srr_valid = 0;
+ WRITE_ONCE(local_paca->hsrr_valid, 0);
+ WRITE_ONCE(local_paca->srr_valid, 0);
#endif
}
diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
index e34c72285b4e..1f033f11b871 100644
--- a/arch/powerpc/kernel/interrupt.c
+++ b/arch/powerpc/kernel/interrupt.c
@@ -125,7 +125,7 @@ static notrace void check_return_regs_valid(struct pt_regs *regs)
case 0x1600:
case 0x1800:
validp = &local_paca->hsrr_valid;
- if (!*validp)
+ if (!READ_ONCE(*validp))
return;
srr0 = mfspr(SPRN_HSRR0);
@@ -135,7 +135,7 @@ static notrace void check_return_regs_valid(struct pt_regs *regs)
break;
default:
validp = &local_paca->srr_valid;
- if (!*validp)
+ if (!READ_ONCE(*validp))
return;
srr0 = mfspr(SPRN_SRR0);
@@ -161,19 +161,17 @@ static notrace void check_return_regs_valid(struct pt_regs *regs)
* such things will get caught most of the time, statistically
* enough to be able to get a warning out.
*/
- barrier();
-
- if (!*validp)
+ if (!READ_ONCE(*validp))
return;
- if (!warned) {
- warned = true;
+ if (!data_race(warned)) {
+ data_race(warned = true);
printk("%sSRR0 was: %lx should be: %lx\n", h, srr0, regs->nip);
printk("%sSRR1 was: %lx should be: %lx\n", h, srr1, regs->msr);
show_regs(regs);
}
- *validp = 0; /* fixup */
+ WRITE_ONCE(*validp, 0); /* fixup */
#endif
}
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 06/12] powerpc: Mark accesses to power_save callback in arch_cpu_idle
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (4 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 05/12] powerpc: Mark [h]srr_valid accesses in check_return_regs_valid Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:21 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention Rohan McLure
` (5 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
The power_save callback can be overwritten by another core at boot time.
Specifically, null values will be replaced exactly once with the callback
suitable for the particular platform (PowerNV / pseries lpars). Mark
reads of this variable with READ_ONCE to signal to KCSAN that this race
is acceptable, as well as to rule out the possibility of compiler reorderings
leading to calling a null pointer.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/kernel/idle.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index b1c0418b25c8..a1589bb97c98 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -43,10 +43,12 @@ __setup("powersave=off", powersave_off);
void arch_cpu_idle(void)
{
+ void (*power_save)(void) = READ_ONCE(ppc_md.power_save);
+
ppc64_runlatch_off();
- if (ppc_md.power_save) {
- ppc_md.power_save();
+ if (power_save) {
+ power_save();
/*
* Some power_save functions return with
* interrupts enabled, some don't.
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (5 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 06/12] powerpc: Mark accesses to power_save callback in arch_cpu_idle Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:26 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 08/12] powerpc: Annotate accesses to ipi message flags Rohan McLure
` (4 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, arnd, npiggin, Rohan McLure
The idle_state entry in the PACA on PowerNV features a bit which is
atomically tested and set through ldarx/stdcx. to be used as a spinlock.
This lock then guards access to other bit fields of idle_state. KCSAN
cannot differentiate between any of these bitfield accesses as they all
are implemented by 8-byte store/load instructions, thus cores contending
on the bit-lock appear to data race with modifications to idle_state.
Separate the bit-lock entry from the data guarded by the lock to avoid
the possibility of data races being detected by KCSAN.
Suggested-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Rohan McLure <rmclure@ibm.com>
---
arch/powerpc/include/asm/paca.h | 1 +
arch/powerpc/platforms/powernv/idle.c | 20 +++++++++++---------
2 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index da0377f46597..cb325938766a 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -191,6 +191,7 @@ struct paca_struct {
#ifdef CONFIG_PPC_POWERNV
/* PowerNV idle fields */
/* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
+ unsigned long idle_lock; /* A value of 1 means acquired */
unsigned long idle_state;
union {
/* P7/P8 specific fields */
diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
index 841cb7f31f4f..97dbb7bc2b00 100644
--- a/arch/powerpc/platforms/powernv/idle.c
+++ b/arch/powerpc/platforms/powernv/idle.c
@@ -246,9 +246,9 @@ static inline void atomic_lock_thread_idle(void)
{
int cpu = raw_smp_processor_id();
int first = cpu_first_thread_sibling(cpu);
- unsigned long *state = &paca_ptrs[first]->idle_state;
+ unsigned long *lock = &paca_ptrs[first]->idle_lock;
- while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, state)))
+ while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, lock)))
barrier();
}
@@ -258,29 +258,31 @@ static inline void atomic_unlock_and_stop_thread_idle(void)
int first = cpu_first_thread_sibling(cpu);
unsigned long thread = 1UL << cpu_thread_in_core(cpu);
unsigned long *state = &paca_ptrs[first]->idle_state;
+ unsigned long *lock = &paca_ptrs[first]->idle_lock;
u64 s = READ_ONCE(*state);
u64 new, tmp;
- BUG_ON(!(s & PNV_CORE_IDLE_LOCK_BIT));
+ BUG_ON(!(READ_ONCE(*lock) & PNV_CORE_IDLE_LOCK_BIT));
BUG_ON(s & thread);
again:
- new = (s | thread) & ~PNV_CORE_IDLE_LOCK_BIT;
+ new = s | thread;
tmp = cmpxchg(state, s, new);
if (unlikely(tmp != s)) {
s = tmp;
goto again;
}
+ clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
}
static inline void atomic_unlock_thread_idle(void)
{
int cpu = raw_smp_processor_id();
int first = cpu_first_thread_sibling(cpu);
- unsigned long *state = &paca_ptrs[first]->idle_state;
+ unsigned long *lock = &paca_ptrs[first]->idle_lock;
- BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, state));
- clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, state);
+ BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, lock));
+ clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
}
/* P7 and P8 */
@@ -380,9 +382,9 @@ static unsigned long power7_idle_insn(unsigned long type)
sprs.uamor = mfspr(SPRN_UAMOR);
}
- local_paca->thread_idle_state = type;
+ WRITE_ONCE(local_paca->thread_idle_state, type);
srr1 = isa206_idle_insn_mayloss(type); /* go idle */
- local_paca->thread_idle_state = PNV_THREAD_RUNNING;
+ WRITE_ONCE(local_paca->thread_idle_state, PNV_THREAD_RUNNING);
WARN_ON_ONCE(!srr1);
WARN_ON_ONCE(mfmsr() & (MSR_IR|MSR_DR));
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 08/12] powerpc: Annotate accesses to ipi message flags
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (6 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:28 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 09/12] powerpc: Mark writes registering ipi to host cpu through kvm Rohan McLure
` (3 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
IPI message flags are observed and consequently consumed in the
smp_ipi_demux_relaxed function, which handles these message sources
until it observes no more arriving. Mark the checked loop guard with
READ_ONCE, to signal to KCSAN that the read is known to be volatile, and
that non-determinism is expected.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/kernel/smp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6b90f10a6c81..00b74d66b771 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -348,7 +348,7 @@ irqreturn_t smp_ipi_demux_relaxed(void)
if (all & IPI_MESSAGE(PPC_MSG_NMI_IPI))
nmi_ipi_action(0, NULL);
#endif
- } while (info->messages);
+ } while (READ_ONCE(info->messages));
return IRQ_HANDLED;
}
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 09/12] powerpc: Mark writes registering ipi to host cpu through kvm
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (7 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 08/12] powerpc: Annotate accesses to ipi message flags Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:30 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 10/12] powerpc: powernv: Annotate data races in opal events Rohan McLure
` (2 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
Mark writes to the hypervisor IPI state with WRITE_ONCE so that KCSAN
recognises that the asynchronous issue of kvmppc_{set,clear}_host_ipi is
intended.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/include/asm/kvm_ppc.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index bc57d058ad5b..d701df006c08 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -548,12 +548,12 @@ static inline void kvmppc_set_host_ipi(int cpu)
* pairs with the barrier in kvmppc_clear_host_ipi()
*/
smp_mb();
- paca_ptrs[cpu]->kvm_hstate.host_ipi = 1;
+ WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 1);
}
static inline void kvmppc_clear_host_ipi(int cpu)
{
- paca_ptrs[cpu]->kvm_hstate.host_ipi = 0;
+ WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 0);
/*
* order clearing of host_ipi flag vs. processing of IPI messages
*
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 10/12] powerpc: powernv: Annotate data races in opal events
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (8 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 09/12] powerpc: Mark writes registering ipi to host cpu through kvm Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-09 2:31 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 11/12] powerpc: powernv: Annotate asynchronous access to opal tokens Rohan McLure
2023-05-08 2:01 ` [PATCH 12/12] powerpc: Mark asynchronous accesses to irq_data Rohan McLure
11 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
The kopald thread handles opal events as they appear, by polling a
static bit-vector in last_outstanding_events. Annotate these data races
accordingly. We are not at risk of missing events, but the use of READ_ONCE
and WRITE_ONCE will assist readers in seeing that kopald only consumes the
events it is aware of when it is scheduled. This also removes extraneous
KCSAN warnings.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/platforms/powernv/opal-irqchip.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c b/arch/powerpc/platforms/powernv/opal-irqchip.c
index d55652b5f6fa..f9a7001dacb7 100644
--- a/arch/powerpc/platforms/powernv/opal-irqchip.c
+++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
@@ -59,7 +59,7 @@ void opal_handle_events(void)
cond_resched();
}
- last_outstanding_events = 0;
+ WRITE_ONCE(last_outstanding_events, 0);
if (opal_poll_events(&events) != OPAL_SUCCESS)
return;
e = be64_to_cpu(events) & opal_event_irqchip.mask;
@@ -69,7 +69,7 @@ void opal_handle_events(void)
bool opal_have_pending_events(void)
{
- if (last_outstanding_events & opal_event_irqchip.mask)
+ if (READ_ONCE(last_outstanding_events) & opal_event_irqchip.mask)
return true;
return false;
}
@@ -124,7 +124,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
__be64 events;
opal_handle_interrupt(virq_to_hw(irq), &events);
- last_outstanding_events = be64_to_cpu(events);
+ WRITE_ONCE(last_outstanding_events, be64_to_cpu(events));
if (opal_have_pending_events())
opal_wake_poller();
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 11/12] powerpc: powernv: Annotate asynchronous access to opal tokens
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (9 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 10/12] powerpc: powernv: Annotate data races in opal events Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
2023-05-08 2:01 ` [PATCH 12/12] powerpc: Mark asynchronous accesses to irq_data Rohan McLure
11 siblings, 0 replies; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
The opal-async.c unit contains code for polling event sources, which
implies intentional data races. Ensure that the compiler will atomically
access such variables by means of {READ,WRITE}_ONCE calls, which in turn
inform KCSAN that polling behaviour is intended.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/platforms/powernv/opal-async.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c
index c094fdf5825c..282d2ac6fbb0 100644
--- a/arch/powerpc/platforms/powernv/opal-async.c
+++ b/arch/powerpc/platforms/powernv/opal-async.c
@@ -146,7 +146,7 @@ int opal_async_wait_response(uint64_t token, struct opal_msg *msg)
* functional.
*/
opal_wake_poller();
- wait_event(opal_async_wait, opal_async_tokens[token].state
+ wait_event(opal_async_wait, READ_ONCE(opal_async_tokens[token].state)
== ASYNC_TOKEN_COMPLETED);
memcpy(msg, &opal_async_tokens[token].response, sizeof(*msg));
@@ -185,7 +185,7 @@ int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg *msg)
* interruptible version before doing anything else with the
* token.
*/
- if (opal_async_tokens[token].state == ASYNC_TOKEN_ALLOCATED) {
+ if (READ_ONCE(opal_async_tokens[token].state) == ASYNC_TOKEN_ALLOCATED) {
spin_lock_irqsave(&opal_async_comp_lock, flags);
if (opal_async_tokens[token].state == ASYNC_TOKEN_ALLOCATED)
opal_async_tokens[token].state = ASYNC_TOKEN_DISPATCHED;
@@ -199,7 +199,7 @@ int opal_async_wait_response_interruptible(uint64_t token, struct opal_msg *msg)
*/
opal_wake_poller();
ret = wait_event_interruptible(opal_async_wait,
- opal_async_tokens[token].state ==
+ READ_ONCE(opal_async_tokens[token].state) ==
ASYNC_TOKEN_COMPLETED);
if (!ret)
memcpy(msg, &opal_async_tokens[token].response, sizeof(*msg));
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 12/12] powerpc: Mark asynchronous accesses to irq_data
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
` (10 preceding siblings ...)
2023-05-08 2:01 ` [PATCH 11/12] powerpc: powernv: Annotate asynchronous access to opal tokens Rohan McLure
@ 2023-05-08 2:01 ` Rohan McLure
11 siblings, 0 replies; 29+ messages in thread
From: Rohan McLure @ 2023-05-08 2:01 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Rohan McLure, npiggin, arnd
KCSAN revealed that while irq_data entries are written to either from
behind a mutex, or otherwise atomically, accesses to irq_data->hwirq can
occur asynchronously, without volatile annotation. Mark these accesses
with READ_ONCE to avoid unfortunate compiler reorderings and remove
KCSAN warnings.
Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
---
arch/powerpc/kernel/irq.c | 2 +-
arch/powerpc/platforms/powernv/pci-ioda.c | 12 ++++++------
include/linux/irq.h | 2 +-
kernel/irq/irqdomain.c | 4 ++--
4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 6f7d4edaa0bc..4ac192755510 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -353,7 +353,7 @@ void do_softirq_own_stack(void)
irq_hw_number_t virq_to_hw(unsigned int virq)
{
struct irq_data *irq_data = irq_get_irq_data(virq);
- return WARN_ON(!irq_data) ? 0 : irq_data->hwirq;
+ return WARN_ON(!irq_data) ? 0 : READ_ONCE(irq_data->hwirq);
}
EXPORT_SYMBOL_GPL(virq_to_hw);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index f851f4983423..141491e86bba 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -1986,7 +1986,7 @@ int64_t pnv_opal_pci_msi_eoi(struct irq_data *d)
struct pci_controller *hose = irq_data_get_irq_chip_data(d->parent_data);
struct pnv_phb *phb = hose->private_data;
- return opal_pci_msi_eoi(phb->opal_id, d->parent_data->hwirq);
+ return opal_pci_msi_eoi(phb->opal_id, READ_ONCE(d->parent_data->hwirq));
}
/*
@@ -2162,11 +2162,11 @@ static void pnv_msi_compose_msg(struct irq_data *d, struct msi_msg *msg)
struct pnv_phb *phb = hose->private_data;
int rc;
- rc = __pnv_pci_ioda_msi_setup(phb, pdev, d->hwirq,
+ rc = __pnv_pci_ioda_msi_setup(phb, pdev, READ_ONCE(d->hwirq),
entry->pci.msi_attrib.is_64, msg);
if (rc)
dev_err(&pdev->dev, "Failed to setup %s-bit MSI #%ld : %d\n",
- entry->pci.msi_attrib.is_64 ? "64" : "32", d->hwirq, rc);
+ entry->pci.msi_attrib.is_64 ? "64" : "32", data_race(d->hwirq), rc);
}
/*
@@ -2184,7 +2184,7 @@ static void pnv_msi_eoi(struct irq_data *d)
* since it is translated into a vector number in
* OPAL, use that directly.
*/
- WARN_ON_ONCE(opal_pci_msi_eoi(phb->opal_id, d->hwirq));
+ WARN_ON_ONCE(opal_pci_msi_eoi(phb->opal_id, READ_ONCE(d->hwirq)));
}
irq_chip_eoi_parent(d);
@@ -2263,9 +2263,9 @@ static void pnv_irq_domain_free(struct irq_domain *domain, unsigned int virq,
struct pnv_phb *phb = hose->private_data;
pr_debug("%s bridge %pOF %d/%lx #%d\n", __func__, hose->dn,
- virq, d->hwirq, nr_irqs);
+ virq, data_race(d->hwirq), nr_irqs);
- msi_bitmap_free_hwirqs(&phb->msi_bmp, d->hwirq, nr_irqs);
+ msi_bitmap_free_hwirqs(&phb->msi_bmp, READ_ONCE(d->hwirq), nr_irqs);
/* XIVE domain is cleared through ->msi_free() */
}
diff --git a/include/linux/irq.h b/include/linux/irq.h
index b1b28affb32a..a6888bcb3c5b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -452,7 +452,7 @@ static inline bool irqd_affinity_on_activate(struct irq_data *d)
static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
{
- return d->hwirq;
+ return READ_ONCE(d->hwirq);
}
/**
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index f34760a1e222..dd9054494f84 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -549,7 +549,7 @@ static void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq)
"virq%i doesn't exist; cannot disassociate\n", irq))
return;
- hwirq = irq_data->hwirq;
+ hwirq = READ_ONCE(irq_data->hwirq);
mutex_lock(&domain->root->mutex);
@@ -948,7 +948,7 @@ struct irq_desc *__irq_resolve_mapping(struct irq_domain *domain,
if (irq_domain_is_nomap(domain)) {
if (hwirq < domain->hwirq_max) {
data = irq_domain_get_irq_data(domain, hwirq);
- if (data && data->hwirq == hwirq)
+ if (data && READ_ONCE(data->hwirq) == hwirq)
desc = irq_data_to_desc(data);
if (irq && desc)
*irq = hwirq;
--
2.37.2
^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings
2023-05-08 2:01 ` [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings Rohan McLure
@ 2023-05-08 6:30 ` Arnd Bergmann
2023-05-08 15:44 ` [PATCH 4/12] " Gautam Menghani
2023-05-09 2:16 ` [PATCH 04/12] " Nicholas Piggin
2 siblings, 0 replies; 29+ messages in thread
From: Arnd Bergmann @ 2023-05-08 6:30 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: Nicholas Piggin, Gautam Menghani
On Mon, May 8, 2023, at 04:01, Rohan McLure wrote:
> Prior to this patch, data races are detectable by KCSAN of the following
> forms:
>
> [1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
> or otherwise outside of a critical section
> [2] Interrupted critical sections, where the interrupt will itself
> acquire a lock
>
> In case [1], calling context does not need an mmiowb() call to be
> issued, otherwise it would do so itself. Such calls to
> mmiowb_set_pending() are either idempotent or no-ops.
>
> In case [2], irrespective of when the interrupt occurs, the interrupt
> will acquire and release its locks prior to its return, nesting_count
> will continue balanced. In the worst case, the interrupted critical
> section during a mmiowb_spin_unlock() call observes an mmiowb to be
> pending and afterward is interrupted, leading to an extraneous call to
> mmiowb(). This data race is clearly innocuous.
>
> Mark all potentially asynchronous memory accesses with READ_ONCE or
> WRITE_ONCE, including increments and decrements to nesting_count. This
> has the effect of removing KCSAN warnings at consumer's callsites.
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Gautam Menghani <gautammenghani201@gmail.com>
> ---
> include/asm-generic/mmiowb.h | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
For asm-generic:
Acked-by: Arnd Bergmann <arnd@arndb.de>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 4/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings
2023-05-08 2:01 ` [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings Rohan McLure
2023-05-08 6:30 ` Arnd Bergmann
@ 2023-05-08 15:44 ` Gautam Menghani
2023-05-09 2:16 ` [PATCH 04/12] " Nicholas Piggin
2 siblings, 0 replies; 29+ messages in thread
From: Gautam Menghani @ 2023-05-08 15:44 UTC (permalink / raw)
To: Rohan McLure; +Cc: linuxppc-dev, npiggin, arnd, Gautam Menghani
On Mon, May 08, 2023 at 12:01:12PM +1000, Rohan McLure wrote:
> Prior to this patch, data races are detectable by KCSAN of the following
> forms:
>
> [1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
> or otherwise outside of a critical section
> [2] Interrupted critical sections, where the interrupt will itself
> acquire a lock
>
> In case [1], calling context does not need an mmiowb() call to be
> issued, otherwise it would do so itself. Such calls to
> mmiowb_set_pending() are either idempotent or no-ops.
>
> In case [2], irrespective of when the interrupt occurs, the interrupt
> will acquire and release its locks prior to its return, nesting_count
> will continue balanced. In the worst case, the interrupted critical
> section during a mmiowb_spin_unlock() call observes an mmiowb to be
> pending and afterward is interrupted, leading to an extraneous call to
> mmiowb(). This data race is clearly innocuous.
>
> Mark all potentially asynchronous memory accesses with READ_ONCE or
> WRITE_ONCE, including increments and decrements to nesting_count. This
> has the effect of removing KCSAN warnings at consumer's callsites.
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Gautam Menghani <gautammenghani201@gmail.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> ---
> include/asm-generic/mmiowb.h | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
> index 5698fca3bf56..0b8b794150db 100644
> --- a/include/asm-generic/mmiowb.h
> +++ b/include/asm-generic/mmiowb.h
> @@ -35,27 +35,32 @@ DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
> static inline void mmiowb_set_pending(void)
> {
> struct mmiowb_state *ms = __mmiowb_state();
> + u16 nesting_count = READ_ONCE(ms->nesting_count);
>
> - if (likely(ms->nesting_count))
> - ms->mmiowb_pending = ms->nesting_count;
> + if (likely(nesting_count))
> + WRITE_ONCE(ms->mmiowb_pending, nesting_count);
> }
>
> static inline void mmiowb_spin_lock(void)
> {
> struct mmiowb_state *ms = __mmiowb_state();
> - ms->nesting_count++;
> +
> + /* Increment need not be atomic. Nestedness is balanced over interrupts. */
> + WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) + 1);
> }
>
> static inline void mmiowb_spin_unlock(void)
> {
> struct mmiowb_state *ms = __mmiowb_state();
> + u16 pending = READ_ONCE(ms->mmiowb_pending);
>
> - if (unlikely(ms->mmiowb_pending)) {
> - ms->mmiowb_pending = 0;
> + WRITE_ONCE(ms->mmiowb_pending, 0);
> + if (unlikely(pending)) {
> mmiowb();
> }
>
> - ms->nesting_count--;
> + /* Decrement need not be atomic. Nestedness is balanced over interrupts. */
> + WRITE_ONCE(ms->nesting_count, READ_ONCE(ms->nesting_count) - 1);
> }
> #else
> #define mmiowb_set_pending() do { } while (0)
>
> --
> 2.37.2
>
Successfully tested for changes in include/asm-generic/mmiowb.h
Tested-by: Gautam Menghani <gautam@linux.ibm.com>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation
2023-05-08 2:01 ` [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation Rohan McLure
@ 2023-05-09 2:01 ` Nicholas Piggin
2023-05-09 4:26 ` Rohan McLure
0 siblings, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:01 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> A comment accompanying the locked attribute of a qnode assigns a value
> of 1 to mean that the lock has been acquired. The usages of this
> variable however assume opposite semantics. Update usages so that the
> assertions of this comment are reflected in this file.
1 actually means the lock has been acquired for this waiter. The
previous owner sets it to 1, which means we now have the lock.
It's slightly confusing but that's how generic qspinlock calls
it too.
It actually doesn't even really mean we have acquired the lock
though, it means we got through the MCS queue. "Waiting" or
"released" or something like that might be a better name.
Could change the name or comment to make that a bit clearer, but
while it's the same as kernel/locking/qspinlock.c it's better to
keep the polarity the same.
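As a rough sketch of the generic MCS hand-off being described (hypothetical
helper name, not the actual kernel/locking/qspinlock.c code):

	/* Waiter: join the queue with locked = 0, then wait for the hand-off. */
	node->locked = 0;
	enqueue_tail(lock, node);	/* hypothetical publish helper */
	while (!READ_ONCE(node->locked))
		cpu_relax();

	/* Previous owner, passing the MCS "lock" on to the next waiter: */
	WRITE_ONCE(next->locked, 1);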
Thanks,
Nick
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> ---
> arch/powerpc/lib/qspinlock.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> index e4bd145255d0..9cf93963772b 100644
> --- a/arch/powerpc/lib/qspinlock.c
> +++ b/arch/powerpc/lib/qspinlock.c
> @@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
>
> smp_rmb(); /* See __yield_to_locked_owner comment */
>
> - if (!node->locked) {
> + if (node->locked) {
> yield_to_preempted(prev_cpu, yield_count);
> spin_begin();
> return preempted;
> @@ -566,7 +566,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
> node->lock = lock;
> node->cpu = smp_processor_id();
> node->yield_cpu = -1;
> - node->locked = 0;
> + node->locked = 1;
>
> tail = encode_tail_cpu(node->cpu);
>
> @@ -584,7 +584,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>
> /* Wait for mcs node lock to be released */
> spin_begin();
> - while (!node->locked) {
> + while (node->locked) {
> spec_barrier();
>
> if (yield_to_prev(lock, node, old, paravirt))
> @@ -693,13 +693,13 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
> */
> if (paravirt && pv_prod_head) {
> int next_cpu = next->cpu;
> - WRITE_ONCE(next->locked, 1);
> + WRITE_ONCE(next->locked, 0);
> if (_Q_SPIN_MISO)
> asm volatile("miso" ::: "memory");
> if (vcpu_is_preempted(next_cpu))
> prod_cpu(next_cpu);
> } else {
> - WRITE_ONCE(next->locked, 1);
> + WRITE_ONCE(next->locked, 0);
> if (_Q_SPIN_MISO)
> asm volatile("miso" ::: "memory");
> }
> --
> 2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 02/12] powerpc: qspinlock: Mark accesses to qnode lock checks
2023-05-08 2:01 ` [PATCH 02/12] powerpc: qspinlock: Mark accesses to qnode lock checks Rohan McLure
@ 2023-05-09 2:02 ` Nicholas Piggin
0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:02 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> The powerpc implementation of qspinlocks will both poll and spin on the
> bitlock guarding a qnode. Mark these accesses with READ_ONCE to convey
> to KCSAN that polling is intentional here.
Yeah, and obviously pairs with the WRITE_ONCE so comment isn't really
necessary.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> ---
> arch/powerpc/lib/qspinlock.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> index 9cf93963772b..579290d55abf 100644
> --- a/arch/powerpc/lib/qspinlock.c
> +++ b/arch/powerpc/lib/qspinlock.c
> @@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
>
> smp_rmb(); /* See __yield_to_locked_owner comment */
>
> - if (node->locked) {
> + if (READ_ONCE(node->locked)) {
> yield_to_preempted(prev_cpu, yield_count);
> spin_begin();
> return preempted;
> @@ -584,7 +584,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>
> /* Wait for mcs node lock to be released */
> spin_begin();
> - while (node->locked) {
> + while (READ_ONCE(node->locked)) {
> spec_barrier();
>
> if (yield_to_prev(lock, node, old, paravirt))
> --
> 2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
2023-05-08 2:01 ` [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue Rohan McLure
@ 2023-05-09 2:04 ` Nicholas Piggin
2023-05-09 5:26 ` Rohan McLure
0 siblings, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:04 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> Use a compiler barrier to enforce that all fields of a new struct qnode
> be written to (especially the lock value) before publishing the qnode to
> the waitqueue.
publish_tail_cpu is the release barrier for this and includes the memory
clobber there. Can we annotate that instead?
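Roughly the style of annotation being suggested, with hypothetical names
rather than the real publish_tail_cpu() internals:

	/* Hypothetical publish helper: the release ordering lives here. */
	static inline void publish_node(struct qnode **tail, struct qnode *node)
	{
		/* Orders all earlier stores to *node before it becomes visible. */
		smp_store_release(tail, node);
	}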
Thanks,
Nick
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> ---
> arch/powerpc/lib/qspinlock.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
> index 579290d55abf..d548001a86be 100644
> --- a/arch/powerpc/lib/qspinlock.c
> +++ b/arch/powerpc/lib/qspinlock.c
> @@ -567,6 +567,10 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
> node->cpu = smp_processor_id();
> node->yield_cpu = -1;
> node->locked = 1;
> + /*
> + * Assign all attributes of a node before it can be published.
> + */
> + barrier();
>
> tail = encode_tail_cpu(node->cpu);
>
> --
> 2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings
2023-05-08 2:01 ` [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings Rohan McLure
2023-05-08 6:30 ` Arnd Bergmann
2023-05-08 15:44 ` [PATCH 4/12] " Gautam Menghani
@ 2023-05-09 2:16 ` Nicholas Piggin
2 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:16 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd, Gautam Menghani
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> Prior to this patch, data races are detectable by KCSAN of the following
> forms:
>
> [1] Asynchronous calls to mmiowb_set_pending() from an interrupt context
> or otherwise outside of a critical section
> [2] Interrupted critical sections, where the interrupt will itself
> acquire a lock
>
> In case [1], calling context does not need an mmiowb() call to be
> issued, otherwise it would do so itself. Such calls to
> mmiowb_set_pending() are either idempotent or no-ops.
>
> In case [2], irrespective of when the interrupt occurs, the interrupt
> will acquire and release its locks prior to its return, nesting_count
> will continue balanced. In the worst case, the interrupted critical
> section during a mmiowb_spin_unlock() call observes an mmiowb to be
> pending and afterward is interrupted, leading to an extraneous call to
> mmiowb(). This data race is clearly innocuous.
>
> Mark all potentially asynchronous memory accesses with READ_ONCE or
> WRITE_ONCE, including increments and decrements to nesting_count. This
> has the effect of removing KCSAN warnings at consumer's callsites.
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> Reported-by: Gautam Menghani <gautammenghani201@gmail.com>
> ---
> include/asm-generic/mmiowb.h | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
> index 5698fca3bf56..0b8b794150db 100644
> --- a/include/asm-generic/mmiowb.h
> +++ b/include/asm-generic/mmiowb.h
> @@ -35,27 +35,32 @@ DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
> static inline void mmiowb_set_pending(void)
> {
> struct mmiowb_state *ms = __mmiowb_state();
> + u16 nesting_count = READ_ONCE(ms->nesting_count);
The nesting_count is invariant from the point of view of this context,
so READ_ONCE shouldn't be required AFAIKS? It's sort of not even a
data race.
mmiowb_pending is a data race. I think we could get away without using
READ/WRITE_ONCE, but maybe a bit subtle to bother doing that and
explaining why it's okay.
Thanks,
Nick
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 05/12] powerpc: Mark [h]srr_valid accesses in check_return_regs_valid
2023-05-08 2:01 ` [PATCH 05/12] powerpc: Mark [h]srr_valid accesses in check_return_regs_valid Rohan McLure
@ 2023-05-09 2:17 ` Nicholas Piggin
0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:17 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> Checks to see if the [H]SRR registers have been clobbered by (soft)
> NMI interrupts imply the possibility for a data race on the
> [h]srr_valid entries in the PACA. Annotate accesses to these fields with
> READ_ONCE, removing the need for the barrier.
>
> The diagnostic can use plain-access reads and writes, but annotate with
> data_race.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> Reported-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
> arch/powerpc/include/asm/ptrace.h | 4 ++--
> arch/powerpc/kernel/interrupt.c | 14 ++++++--------
> 2 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index 0eb90a013346..9db8b16567e2 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -180,8 +180,8 @@ void do_syscall_trace_leave(struct pt_regs *regs);
> static inline void set_return_regs_changed(void)
> {
> #ifdef CONFIG_PPC_BOOK3S_64
> - local_paca->hsrr_valid = 0;
> - local_paca->srr_valid = 0;
> + WRITE_ONCE(local_paca->hsrr_valid, 0);
> + WRITE_ONCE(local_paca->srr_valid, 0);
> #endif
> }
>
> diff --git a/arch/powerpc/kernel/interrupt.c b/arch/powerpc/kernel/interrupt.c
> index e34c72285b4e..1f033f11b871 100644
> --- a/arch/powerpc/kernel/interrupt.c
> +++ b/arch/powerpc/kernel/interrupt.c
> @@ -125,7 +125,7 @@ static notrace void check_return_regs_valid(struct pt_regs *regs)
> case 0x1600:
> case 0x1800:
> validp = &local_paca->hsrr_valid;
> - if (!*validp)
> + if (!READ_ONCE(*validp))
> return;
>
> srr0 = mfspr(SPRN_HSRR0);
> @@ -135,7 +135,7 @@ static notrace void check_return_regs_valid(struct pt_regs *regs)
> break;
> default:
> validp = &local_paca->srr_valid;
> - if (!*validp)
> + if (!READ_ONCE(*validp))
> return;
>
> srr0 = mfspr(SPRN_SRR0);
> @@ -161,19 +161,17 @@ static notrace void check_return_regs_valid(struct pt_regs *regs)
> * such things will get caught most of the time, statistically
> * enough to be able to get a warning out.
> */
> - barrier();
> -
> - if (!*validp)
> + if (!READ_ONCE(*validp))
> return;
>
> - if (!warned) {
> - warned = true;
> + if (!data_race(warned)) {
> + data_race(warned = true);
> printk("%sSRR0 was: %lx should be: %lx\n", h, srr0, regs->nip);
> printk("%sSRR1 was: %lx should be: %lx\n", h, srr1, regs->msr);
> show_regs(regs);
> }
>
> - *validp = 0; /* fixup */
> + WRITE_ONCE(*validp, 0); /* fixup */
> #endif
> }
>
> --
> 2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 06/12] powerpc: Mark accesses to power_save callback in arch_cpu_idle
2023-05-08 2:01 ` [PATCH 06/12] powerpc: Mark accesses to power_save callback in arch_cpu_idle Rohan McLure
@ 2023-05-09 2:21 ` Nicholas Piggin
0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:21 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> The power_save callback can be overwritten by another core at boot time.
> Specifically, null values will be replaced exactly once with the callback
> suitable for the particular platform (PowerNV / pseries lpars). Mark
> reads to this variable with READ_ONCE to signal to KCSAN that this race
> is acceptable, as well as to rule-out the possibility for compiler reorderings
> leading to calling a null pointer.
Is ppc_md readonly after init? Might be a good candidate if it is...
Maybe KCSAN doesn't recognise that though.
Unless the places that assign ppc_md.power_save need to be converted
to use WRITE_ONCE, you could just annotate this with data_race and
comment it's not really a race because it won't be called before the
structure is set up.
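For example, the data_race() alternative might look like this (sketch only):

	/*
	 * Not a real race: ppc_md.power_save is assigned at most once
	 * during boot, before this CPU can enter the idle loop.
	 */
	void (*power_save)(void) = data_race(ppc_md.power_save);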
Thanks,
Nick
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> ---
> arch/powerpc/kernel/idle.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
> index b1c0418b25c8..a1589bb97c98 100644
> --- a/arch/powerpc/kernel/idle.c
> +++ b/arch/powerpc/kernel/idle.c
> @@ -43,10 +43,12 @@ __setup("powersave=off", powersave_off);
>
> void arch_cpu_idle(void)
> {
> + void (*power_save)(void) = READ_ONCE(ppc_md.power_save);
> +
> ppc64_runlatch_off();
>
> - if (ppc_md.power_save) {
> - ppc_md.power_save();
> + if (power_save) {
> + power_save();
> /*
> * Some power_save functions return with
> * interrupts enabled, some don't.
> --
> 2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
2023-05-08 2:01 ` [PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention Rohan McLure
@ 2023-05-09 2:26 ` Nicholas Piggin
2023-05-10 2:00 ` Rohan McLure
0 siblings, 1 reply; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:26 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: Rohan McLure, arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> The idle_state entry in the PACA on PowerNV features a bit which is
> atomically tested and set through ldarx/stdcx. to be used as a spinlock.
> This lock then guards access to other bit fields of idle_state. KCSAN
> cannot differentiate between any of these bitfield accesses as they all
> are implemented by 8-byte store/load instructions, thus cores contending
> on the bit-lock appear to data race with modifications to idle_state.
>
> Separate the bit-lock entry from the data guarded by the lock to avoid
> the possibility of data races being detected by KCSAN.
>
> Suggested-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Rohan McLure <rmclure@ibm.com>
> ---
> arch/powerpc/include/asm/paca.h | 1 +
> arch/powerpc/platforms/powernv/idle.c | 20 +++++++++++---------
> 2 files changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index da0377f46597..cb325938766a 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -191,6 +191,7 @@ struct paca_struct {
> #ifdef CONFIG_PPC_POWERNV
> /* PowerNV idle fields */
> /* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
> + unsigned long idle_lock; /* A value of 1 means acquired */
> unsigned long idle_state;
> union {
> /* P7/P8 specific fields */
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 841cb7f31f4f..97dbb7bc2b00 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -246,9 +246,9 @@ static inline void atomic_lock_thread_idle(void)
> {
> int cpu = raw_smp_processor_id();
> int first = cpu_first_thread_sibling(cpu);
> - unsigned long *state = &paca_ptrs[first]->idle_state;
> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
>
> - while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, state)))
> + while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, lock)))
> barrier();
> }
>
> @@ -258,29 +258,31 @@ static inline void atomic_unlock_and_stop_thread_idle(void)
> int first = cpu_first_thread_sibling(cpu);
> unsigned long thread = 1UL << cpu_thread_in_core(cpu);
> unsigned long *state = &paca_ptrs[first]->idle_state;
> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
> u64 s = READ_ONCE(*state);
> u64 new, tmp;
>
> - BUG_ON(!(s & PNV_CORE_IDLE_LOCK_BIT));
> + BUG_ON(!(READ_ONCE(*lock) & PNV_CORE_IDLE_LOCK_BIT));
> BUG_ON(s & thread);
>
> again:
> - new = (s | thread) & ~PNV_CORE_IDLE_LOCK_BIT;
> + new = s | thread;
> tmp = cmpxchg(state, s, new);
> if (unlikely(tmp != s)) {
> s = tmp;
> goto again;
> }
> + clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
Sigh, another atomic. It's in a slow path though so I won't get too
upset. Would be nice to add a comment here and revert it when KCSAN
can be taught about this pattern though, so we don't lose it.
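Something like this is what I have in mind (the comment wording is only
a sketch):

	/*
	 * The lock bit lives in its own word now, so it can no longer be
	 * dropped by the cmpxchg on *state above; release it separately.
	 * Fold this back into the cmpxchg if KCSAN ever learns about the
	 * combined bit-lock pattern.
	 */
	clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);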
> }
>
> static inline void atomic_unlock_thread_idle(void)
> {
> int cpu = raw_smp_processor_id();
> int first = cpu_first_thread_sibling(cpu);
> - unsigned long *state = &paca_ptrs[first]->idle_state;
> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
>
> - BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, state));
> - clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, state);
> + BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, lock));
> + clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
> }
>
> /* P7 and P8 */
> @@ -380,9 +382,9 @@ static unsigned long power7_idle_insn(unsigned long type)
> sprs.uamor = mfspr(SPRN_UAMOR);
> }
>
> - local_paca->thread_idle_state = type;
> + WRITE_ONCE(local_paca->thread_idle_state, type);
> srr1 = isa206_idle_insn_mayloss(type); /* go idle */
> - local_paca->thread_idle_state = PNV_THREAD_RUNNING;
> + WRITE_ONCE(local_paca->thread_idle_state, PNV_THREAD_RUNNING);
Where is the thread_idle_state concurrency coming from?
Thanks,
Nick
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 08/12] powerpc: Annotate accesses to ipi message flags
2023-05-08 2:01 ` [PATCH 08/12] powerpc: Annotate accesses to ipi message flags Rohan McLure
@ 2023-05-09 2:28 ` Nicholas Piggin
0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:28 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> IPI message flags are observed and consequently consumed in the
> smp_ipi_demux_relaxed function, which handles these message sources
> until it observes no more arriving. Mark the checked loop guard with
> READ_ONCE, to signal to KCSAN that the read is known to be volatile, and
> that non-determinism is expected.
smp_muxed_ipi_set_message() doesn't need a WRITE_ONCE()?
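If so, something along these lines, perhaps (a sketch only, going from
memory of the write side in smp.c, details may differ):

	void smp_muxed_ipi_set_message(int cpu, int msg)
	{
		struct cpu_messages *info = &per_cpu(ipi_message, cpu);
		char *message = (char *)&info->messages;

		/*
		 * Order previous accesses before accesses in the IPI handler.
		 */
		smp_mb();
		/* pairs with the READ_ONCE in smp_ipi_demux_relaxed */
		WRITE_ONCE(message[msg], 1);
	}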
Thanks,
Nick
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> ---
> arch/powerpc/kernel/smp.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 6b90f10a6c81..00b74d66b771 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -348,7 +348,7 @@ irqreturn_t smp_ipi_demux_relaxed(void)
> if (all & IPI_MESSAGE(PPC_MSG_NMI_IPI))
> nmi_ipi_action(0, NULL);
> #endif
> - } while (info->messages);
> + } while (READ_ONCE(info->messages));
>
> return IRQ_HANDLED;
> }
> --
> 2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 09/12] powerpc: Mark writes registering ipi to host cpu through kvm
2023-05-08 2:01 ` [PATCH 09/12] powerpc: Mark writes registering ipi to host cpu through kvm Rohan McLure
@ 2023-05-09 2:30 ` Nicholas Piggin
0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:30 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> Mark writes to hypervisor ipi state so that KCSAN recognises the
> asynchronous issue of kvmppc_{set,clear}_host_ipi as intended, using
> atomic writes.
How about READ_ONCE for the read side of host_ipi?
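i.e. roughly this on the reader (a sketch, assuming the accessor is
still a plain load today):

	static inline bool kvmppc_get_host_ipi(int cpu)
	{
		return READ_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi);
	}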
Thanks,
Nick
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> ---
> arch/powerpc/include/asm/kvm_ppc.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
> index bc57d058ad5b..d701df006c08 100644
> --- a/arch/powerpc/include/asm/kvm_ppc.h
> +++ b/arch/powerpc/include/asm/kvm_ppc.h
> @@ -548,12 +548,12 @@ static inline void kvmppc_set_host_ipi(int cpu)
> * pairs with the barrier in kvmppc_clear_host_ipi()
> */
> smp_mb();
> - paca_ptrs[cpu]->kvm_hstate.host_ipi = 1;
> + WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 1);
> }
>
> static inline void kvmppc_clear_host_ipi(int cpu)
> {
> - paca_ptrs[cpu]->kvm_hstate.host_ipi = 0;
> + WRITE_ONCE(paca_ptrs[cpu]->kvm_hstate.host_ipi, 0);
> /*
> * order clearing of host_ipi flag vs. processing of IPI messages
> *
> --
> 2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 10/12] powerpc: powernv: Annotate data races in opal events
2023-05-08 2:01 ` [PATCH 10/12] powerpc: powernv: Annotate data races in opal events Rohan McLure
@ 2023-05-09 2:31 ` Nicholas Piggin
0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 2:31 UTC (permalink / raw)
To: Rohan McLure, linuxppc-dev; +Cc: arnd
On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> The kopald thread handles opal events as they appear, by polling a
> static bit-vector in last_outstanding_events. Annotate these data races
> accordingly. We are not at risk of missing events, but use of READ_ONCE,
> WRITE_ONCE will assist readers in seeing that kopald only consumes the
> events it is aware of when it is scheduled. This also removes extraneous
> KCSAN warnings.
This code is fairly crap, which I can say because I wrote it :(
But this at least is an improvement. Thanks.
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
>
> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
> ---
> arch/powerpc/platforms/powernv/opal-irqchip.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/platforms/powernv/opal-irqchip.c b/arch/powerpc/platforms/powernv/opal-irqchip.c
> index d55652b5f6fa..f9a7001dacb7 100644
> --- a/arch/powerpc/platforms/powernv/opal-irqchip.c
> +++ b/arch/powerpc/platforms/powernv/opal-irqchip.c
> @@ -59,7 +59,7 @@ void opal_handle_events(void)
>
> cond_resched();
> }
> - last_outstanding_events = 0;
> + WRITE_ONCE(last_outstanding_events, 0);
> if (opal_poll_events(&events) != OPAL_SUCCESS)
> return;
> e = be64_to_cpu(events) & opal_event_irqchip.mask;
> @@ -69,7 +69,7 @@ void opal_handle_events(void)
>
> bool opal_have_pending_events(void)
> {
> - if (last_outstanding_events & opal_event_irqchip.mask)
> + if (READ_ONCE(last_outstanding_events) & opal_event_irqchip.mask)
> return true;
> return false;
> }
> @@ -124,7 +124,7 @@ static irqreturn_t opal_interrupt(int irq, void *data)
> __be64 events;
>
> opal_handle_interrupt(virq_to_hw(irq), &events);
> - last_outstanding_events = be64_to_cpu(events);
> + WRITE_ONCE(last_outstanding_events, be64_to_cpu(events));
> if (opal_have_pending_events())
> opal_wake_poller();
>
> --
> 2.37.2
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation
2023-05-09 2:01 ` Nicholas Piggin
@ 2023-05-09 4:26 ` Rohan McLure
0 siblings, 0 replies; 29+ messages in thread
From: Rohan McLure @ 2023-05-09 4:26 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: linuxppc-dev, Arnd Bergmann
> On 9 May 2023, at 12:01 pm, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
>> A comment accompanying the locked attribute of a qnode assigns a value
>> of 1 to mean that the lock has been acquired. The usages of this
>> variable however assume opposite semantics. Update usages so that the
>> assertions of this comment are reflected in this file.
>
> 1 actually means the lock has been acquired for this waiter. The
> previous owner sets it to 1, which means we now have the lock.
> It's slightly confusing but that's how generic qspinlock calls
> it too.
>
> It actually doesn't even really mean we have acquired the lock
> though, it means we got through the MCS queue. "Waiting" or
> "released" or something like that might be a better name.
This makes more sense. It seemed pretty unlikely to me that swapped
polarity would have gone unnoticed, so glad to have that cleared up.
>
> Could change the name or comment to make that a bit clearer, but
> while it's the same as kernel/locking/qspinlock.c it's better to
> keep the polarity the same.
Yeah since ‘locked’ is an mcs intrinsic I think I’d rather keep
the name from kernel/locking/qspinlock.c.
>
> Thanks,
> Nick
>
>>
>> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
>> ---
>> arch/powerpc/lib/qspinlock.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
>> index e4bd145255d0..9cf93963772b 100644
>> --- a/arch/powerpc/lib/qspinlock.c
>> +++ b/arch/powerpc/lib/qspinlock.c
>> @@ -435,7 +435,7 @@ static __always_inline bool yield_to_prev(struct qspinlock *lock, struct qnode *
>>
>> smp_rmb(); /* See __yield_to_locked_owner comment */
>>
>> - if (!node->locked) {
>> + if (node->locked) {
>> yield_to_preempted(prev_cpu, yield_count);
>> spin_begin();
>> return preempted;
>> @@ -566,7 +566,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>> node->lock = lock;
>> node->cpu = smp_processor_id();
>> node->yield_cpu = -1;
>> - node->locked = 0;
>> + node->locked = 1;
>>
>> tail = encode_tail_cpu(node->cpu);
>>
>> @@ -584,7 +584,7 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>>
>> /* Wait for mcs node lock to be released */
>> spin_begin();
>> - while (!node->locked) {
>> + while (node->locked) {
>> spec_barrier();
>>
>> if (yield_to_prev(lock, node, old, paravirt))
>> @@ -693,13 +693,13 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>> */
>> if (paravirt && pv_prod_head) {
>> int next_cpu = next->cpu;
>> - WRITE_ONCE(next->locked, 1);
>> + WRITE_ONCE(next->locked, 0);
>> if (_Q_SPIN_MISO)
>> asm volatile("miso" ::: "memory");
>> if (vcpu_is_preempted(next_cpu))
>> prod_cpu(next_cpu);
>> } else {
>> - WRITE_ONCE(next->locked, 1);
>> + WRITE_ONCE(next->locked, 0);
>> if (_Q_SPIN_MISO)
>> asm volatile("miso" ::: "memory");
>> }
>> --
>> 2.37.2
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
2023-05-09 2:04 ` Nicholas Piggin
@ 2023-05-09 5:26 ` Rohan McLure
2023-05-09 6:45 ` Nicholas Piggin
0 siblings, 1 reply; 29+ messages in thread
From: Rohan McLure @ 2023-05-09 5:26 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: linuxppc-dev, arnd
> On 9 May 2023, at 12:04 pm, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
>> Use a compiler barrier to enforce that all fields of a new struct qnode
>> be written to (especially the lock value) before publishing the qnode to
>> the waitqueue.
>
> publish_tail_cpu is the release barrier for this and includes the memory
> clobber there. Can we annotate that instead?
Got it, I see that one now.
On another note though, it looks like the memory clobber doesn’t serve
to squash KCSAN warnings here.
==================================================================
BUG: KCSAN: data-race in queued_spin_lock_slowpath / queued_spin_lock_slowpath
write (marked) to 0xc000000ff3790b20 of 1 bytes by task 1045 on cpu 64:
write (reordered) to 0xc000000ff3790b20 of 1 bytes by task 1063 on cpu 31:
|
+-> reordered to: queued_spin_lock_slowpath+0xcec/0x1b70
Reported by Kernel Concurrency Sanitizer on:
==================================================================
The one-byte memory access here has to be ‘locked’. KCSAN takes issue
with the assignment to locked being unmarked, even though, during the
window in which this assignment can occur, the node is not yet reachable
by other CPUs. Looks like I’ll have to issue a data_race, even though
there is no possibility of a real data race.
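That is, something like (sketch):

	/* node is not yet published, so this cannot really race */
	data_race(node->locked = 1);

which reads oddly when no concurrent access is even possible at this
point.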
>
> Thanks,
> Nick
>
>>
>> Signed-off-by: Rohan McLure <rmclure@linux.ibm.com>
>> ---
>> arch/powerpc/lib/qspinlock.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
>> index 579290d55abf..d548001a86be 100644
>> --- a/arch/powerpc/lib/qspinlock.c
>> +++ b/arch/powerpc/lib/qspinlock.c
>> @@ -567,6 +567,10 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
>> node->cpu = smp_processor_id();
>> node->yield_cpu = -1;
>> node->locked = 1;
>> + /*
>> + * Assign all attributes of a node before it can be published.
>> + */
>> + barrier();
>>
>> tail = encode_tail_cpu(node->cpu);
>>
>> --
>> 2.37.2
>
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue
2023-05-09 5:26 ` Rohan McLure
@ 2023-05-09 6:45 ` Nicholas Piggin
0 siblings, 0 replies; 29+ messages in thread
From: Nicholas Piggin @ 2023-05-09 6:45 UTC (permalink / raw)
To: Rohan McLure; +Cc: linuxppc-dev, arnd
On Tue May 9, 2023 at 3:26 PM AEST, Rohan McLure wrote:
> > On 9 May 2023, at 12:04 pm, Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
> >> Use a compiler barrier to enforce that all fields of a new struct qnode
> >> be written to (especially the lock value) before publishing the qnode to
> >> the waitqueue.
> >
> > publish_tail_cpu is the release barrier for this and includes the memory
> > clobber there. Can we annotate that instead?
>
> Got it, I see that one now.
>
> On another note though, it looks like the memory clobber doesn’t serve
> to squash KCSAN warnings here.
Aha, publish_tail_cpu() needs a kcsan_release().
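Something like this at the top of publish_tail_cpu(), before the
release barrier in the inline asm (sketch only; the asm body itself
stays untouched):

	/*
	 * The release barrier in the asm below is what orders the plain
	 * qnode stores; kcsan_release() just tells KCSAN that this is
	 * the publication point for the node.
	 */
	kcsan_release();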
Thanks,
Nick
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention
2023-05-09 2:26 ` Nicholas Piggin
@ 2023-05-10 2:00 ` Rohan McLure
0 siblings, 0 replies; 29+ messages in thread
From: Rohan McLure @ 2023-05-10 2:00 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: Rohan McLure, linuxppc-dev, arnd
> On 9 May 2023, at 12:26 pm, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> On Mon May 8, 2023 at 12:01 PM AEST, Rohan McLure wrote:
>> The idle_state entry in the PACA on PowerNV features a bit which is
>> atomically tested and set through ldarx/stdcx. to be used as a spinlock.
>> This lock then guards access to other bit fields of idle_state. KCSAN
>> cannot differentiate between any of these bitfield accesses as they all
>> are implemented by 8-byte store/load instructions, thus cores contending
>> on the bit-lock appear to data race with modifications to idle_state.
>>
>> Separate the bit-lock entry from the data guarded by the lock to avoid
>> the possibility of data races being detected by KCSAN.
>>
>> Suggested-by: Nicholas Piggin <npiggin@gmail.com>
>> Signed-off-by: Rohan McLure <rmclure@ibm.com>
>> ---
>> arch/powerpc/include/asm/paca.h | 1 +
>> arch/powerpc/platforms/powernv/idle.c | 20 +++++++++++---------
>> 2 files changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>> index da0377f46597..cb325938766a 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -191,6 +191,7 @@ struct paca_struct {
>> #ifdef CONFIG_PPC_POWERNV
>> /* PowerNV idle fields */
>> /* PNV_CORE_IDLE_* bits, all siblings work on thread 0 paca */
>> + unsigned long idle_lock; /* A value of 1 means acquired */
>> unsigned long idle_state;
>> union {
>> /* P7/P8 specific fields */
>> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
>> index 841cb7f31f4f..97dbb7bc2b00 100644
>> --- a/arch/powerpc/platforms/powernv/idle.c
>> +++ b/arch/powerpc/platforms/powernv/idle.c
>> @@ -246,9 +246,9 @@ static inline void atomic_lock_thread_idle(void)
>> {
>> int cpu = raw_smp_processor_id();
>> int first = cpu_first_thread_sibling(cpu);
>> - unsigned long *state = &paca_ptrs[first]->idle_state;
>> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
>>
>> - while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, state)))
>> + while (unlikely(test_and_set_bit_lock(NR_PNV_CORE_IDLE_LOCK_BIT, lock)))
>> barrier();
>> }
>>
>> @@ -258,29 +258,31 @@ static inline void atomic_unlock_and_stop_thread_idle(void)
>> int first = cpu_first_thread_sibling(cpu);
>> unsigned long thread = 1UL << cpu_thread_in_core(cpu);
>> unsigned long *state = &paca_ptrs[first]->idle_state;
>> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
>> u64 s = READ_ONCE(*state);
>> u64 new, tmp;
>>
>> - BUG_ON(!(s & PNV_CORE_IDLE_LOCK_BIT));
>> + BUG_ON(!(READ_ONCE(*lock) & PNV_CORE_IDLE_LOCK_BIT));
>> BUG_ON(s & thread);
>>
>> again:
>> - new = (s | thread) & ~PNV_CORE_IDLE_LOCK_BIT;
>> + new = s | thread;
>> tmp = cmpxchg(state, s, new);
>> if (unlikely(tmp != s)) {
>> s = tmp;
>> goto again;
>> }
>> + clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
>
> Sigh, another atomic. It's in a slow path though so I won't get too
> upset. Would be nice to add a comment here and revert it when KCSCAN
> can be taught about this pattern though, so we don't lose it.
>
>> }
>>
>> static inline void atomic_unlock_thread_idle(void)
>> {
>> int cpu = raw_smp_processor_id();
>> int first = cpu_first_thread_sibling(cpu);
>> - unsigned long *state = &paca_ptrs[first]->idle_state;
>> + unsigned long *lock = &paca_ptrs[first]->idle_lock;
>>
>> - BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, state));
>> - clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, state);
>> + BUG_ON(!test_bit(NR_PNV_CORE_IDLE_LOCK_BIT, lock));
>> + clear_bit_unlock(NR_PNV_CORE_IDLE_LOCK_BIT, lock);
>> }
>>
>> /* P7 and P8 */
>> @@ -380,9 +382,9 @@ static unsigned long power7_idle_insn(unsigned long type)
>> sprs.uamor = mfspr(SPRN_UAMOR);
>> }
>>
>> - local_paca->thread_idle_state = type;
>> + WRITE_ONCE(local_paca->thread_idle_state, type);
>> srr1 = isa206_idle_insn_mayloss(type); /* go idle */
>> - local_paca->thread_idle_state = PNV_THREAD_RUNNING;
>> + WRITE_ONCE(local_paca->thread_idle_state, PNV_THREAD_RUNNING);
>
> Where is the thread_idle_state concurrency coming from?
Yeah, I agree, WRITE_ONCE isn’t necessary here: all reads of this variable
by xmon are purely diagnostic (data races permitted), and the
isa206_idle_insn_mayloss() call is a compiler barrier, so write instructions
will still be emitted on each side of the call.
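(If the xmon read itself ever trips KCSAN, the diagnostic-read pattern
from access-marking.txt would apply; very roughly, and hypothetical in
the details:

	/* diagnostic only, racing reads are acceptable here */
	printf(" thread_idle_state = 0x%x\n",
	       data_race(paca->thread_idle_state));

but that hasn't shown up in the warnings so far.)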
>
> Thanks,
> Nick
^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2023-05-10 2:01 UTC | newest]
Thread overview: 29+ messages
2023-05-08 2:01 [PATCH 00/12] powerpc: KCSAN fix warnings and mark accesses Rohan McLure
2023-05-08 2:01 ` [PATCH 01/12] powerpc: qspinlock: Fix qnode->locked value interpretation Rohan McLure
2023-05-09 2:01 ` Nicholas Piggin
2023-05-09 4:26 ` Rohan McLure
2023-05-08 2:01 ` [PATCH 02/12] powerpc: qspinlock: Mark accesses to qnode lock checks Rohan McLure
2023-05-09 2:02 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 03/12] powerpc: qspinlock: Enforce qnode writes prior to publishing to queue Rohan McLure
2023-05-09 2:04 ` Nicholas Piggin
2023-05-09 5:26 ` Rohan McLure
2023-05-09 6:45 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 04/12] asm-generic/mmiowb: Mark accesses to fix KCSAN warnings Rohan McLure
2023-05-08 6:30 ` Arnd Bergmann
2023-05-08 15:44 ` [PATCH 4/12] " Gautam Menghani
2023-05-09 2:16 ` [PATCH 04/12] " Nicholas Piggin
2023-05-08 2:01 ` [PATCH 05/12] powerpc: Mark [h]ssr_valid accesses in check_return_regs_valid Rohan McLure
2023-05-09 2:17 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 06/12] powerpc: Mark accesses to power_save callback in arch_cpu_idle Rohan McLure
2023-05-09 2:21 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 07/12] powerpc: powernv: Fix KCSAN datarace warnings on idle_state contention Rohan McLure
2023-05-09 2:26 ` Nicholas Piggin
2023-05-10 2:00 ` Rohan McLure
2023-05-08 2:01 ` [PATCH 08/12] powerpc: Annotate accesses to ipi message flags Rohan McLure
2023-05-09 2:28 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 09/12] powerpc: Mark writes registering ipi to host cpu through kvm Rohan McLure
2023-05-09 2:30 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 10/12] powerpc: powernv: Annotate data races in opal events Rohan McLure
2023-05-09 2:31 ` Nicholas Piggin
2023-05-08 2:01 ` [PATCH 11/12] powerpc: powernv: Annotate asynchronous access to opal tokens Rohan McLure
2023-05-08 2:01 ` [PATCH 12/12] powerpc: Mark asynchronous accesses to irq_data Rohan McLure