public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT
@ 2013-09-26 17:24 Chris Metcalf
  2013-09-26 17:57 ` Will Deacon
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Metcalf @ 2013-09-26 17:24 UTC (permalink / raw)
  To: linux-kernel, Tejun Heo, Christoph Lameter,
	Benjamin Herrenschmidt
  Cc: Rob Herring, Nicolas Pitre, Will Deacon, Linus Torvalds

It turns out the kernel relies on barrier() to force a reload of the
percpu offset value.  Since we can't easily modify the definition of
barrier() to include "tp" as an output register, we instead provide a
definition of __my_cpu_offset as extended assembly that includes a fake
stack read to hazard against barrier(), forcing gcc to know that it
must reread "tp" and recompute anything based on "tp" after a barrier.

This fixes observed hangs in the slub allocator when we are looping
on a percpu cmpxchg_double.

A similar fix for ARMv7 was made in June in change 509eb76ebf97.

Cc: stable@vger.kernel.org
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
---
Ben Herrenschmidt offered to look into extending barrier() in some
per-architecture way, but for now this patch is the minimal thing to
fix the bug in 3.12 and earlier, so I'm planning to push it as-is.

 arch/tile/include/asm/percpu.h | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/arch/tile/include/asm/percpu.h b/arch/tile/include/asm/percpu.h
index 63294f5..de121d3 100644
--- a/arch/tile/include/asm/percpu.h
+++ b/arch/tile/include/asm/percpu.h
@@ -15,9 +15,36 @@
 #ifndef _ASM_TILE_PERCPU_H
 #define _ASM_TILE_PERCPU_H
 
-register unsigned long __my_cpu_offset __asm__("tp");
-#define __my_cpu_offset __my_cpu_offset
-#define set_my_cpu_offset(tp) (__my_cpu_offset = (tp))
+register unsigned long my_cpu_offset_reg asm("tp");
+
+#ifdef CONFIG_PREEMPT
+/*
+ * For full preemption, we can't just use the register variable
+ * directly, since we need barrier() to hazard against it, causing the
+ * compiler to reload anything computed from a previous "tp" value.
+ * But we also don't want to use volatile asm, since we'd like the
+ * compiler to be able to cache the value across multiple percpu reads.
+ * So we use a fake stack read as a hazard against barrier().
+ */
+static inline unsigned long __my_cpu_offset(void)
+{
+	unsigned long tp;
+	register unsigned long *sp asm("sp");
+	asm("move %0, tp" : "=r" (tp) : "m" (*sp));
+	return tp;
+}
+#define __my_cpu_offset __my_cpu_offset()
+#else
+/*
+ * We don't need to hazard against barrier() since "tp" doesn't ever
+ * change with PREEMPT_NONE, and with PREEMPT_VOLUNTARY it only
+ * changes at function call points, at which we are already re-reading
+ * the value of "tp" due to "my_cpu_offset_reg" being a global variable.
+ */
+#define __my_cpu_offset my_cpu_offset_reg
+#endif
+
+#define set_my_cpu_offset(tp) (my_cpu_offset_reg = (tp))
 
 #include <asm-generic/percpu.h>
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-09-30 18:01 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-26 17:24 [PATCH] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT Chris Metcalf
2013-09-26 17:57 ` Will Deacon
2013-09-27 20:13   ` Chris Metcalf
2013-09-30  9:45     ` Will Deacon
2013-09-30 18:01       ` Chris Metcalf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox