From: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
To: benh@kernel.crashing.org, mpe@ellerman.id.au
Cc: anton@samba.org, paulus@samba.org, npiggin@gmail.com,
	linuxppc-dev@lists.ozlabs.org,
	Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Subject: [PATCH v10 14/14] powerpc: rewrite local_t using soft_irq
Date: Sun, 20 Aug 2017 23:28:31 +0530	[thread overview]
Message-ID: <1503251911-30637-15-git-send-email-maddy@linux.vnet.ibm.com> (raw)
In-Reply-To: <1503251911-30637-1-git-send-email-maddy@linux.vnet.ibm.com>

Local atomic operations are fast and highly reentrant per-CPU counters,
used for per-CPU variable updates. Local atomic operations only guarantee
variable modification atomicity with respect to the CPU which owns the
data, and they need to be executed in a preemption-safe way.
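
For context, a typical local_t usage pattern for a per-CPU counter looks
roughly like this (based on Documentation/local_ops.txt; the "counters"
variable name is only illustrative):

static DEFINE_PER_CPU(local_t, counters) = LOCAL_INIT(0);

	/* process context: pin to the local CPU while updating */
	local_inc(&get_cpu_var(counters));
	put_cpu_var(counters);

	/* interrupt context: already non-preemptible on this CPU */
	local_inc(this_cpu_ptr(&counters));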

Here is the design of this patch. Since local_* operations only need to
be atomic with respect to interrupts (IIUC), we have two options: either
replay the "op" if interrupted, or replay the interrupt after the "op".
The initial patchset posted was based on implementing the local_* operations
using CR5, which replays the "op". That patchset had issues when rewinding
the address pointer from an array, which made the slow path really slow.
Since the CR5 based implementation proposed using __ex_table to find the
rewind address, it also raised concerns about the size of __ex_table and
vmlinux:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-December/123115.html

This patch instead uses powerpc_local_irq_pmu_save() to soft-disable
interrupts (including PMIs). After the "op" finishes,
powerpc_local_irq_pmu_restore() is called and any interrupts that occurred
in between are replayed.

The patch rewrites the current local_* functions around
powerpc_local_irq_pmu_save()/restore(). The base flow for each function is:

{
	powerpc_local_irq_pmu_save(flags)
	load
	..
	store
	powerpc_local_irq_pmu_restore(flags)
}
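
For example, with the LOCAL_OP() macro added by this patch, local_add()
effectively expands to the following (a sketch of the generated code):

static __inline__ void local_add(long i, local_t *l)
{
	unsigned long flags;

	powerpc_local_irq_pmu_save(flags);
	l->v += i;
	powerpc_local_irq_pmu_restore(flags);
}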

The reason for this approach is that local_* operations currently use the
l[w/d]arx/st[w/d]cx. instruction pair, which is heavy on cycle count, and
these instructions do not have a local variant. To check whether the new
implementation helps, a modified version of Rusty's local_t benchmark code
was used:

https://lkml.org/lkml/2008/12/16/450

Modifications to Rusty's benchmark code:
- Executed only local_t test
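
The measurement is essentially a plain timing loop; a minimal sketch of the
idea (not the actual benchmark module, and the names below are made up for
illustration) looks like:

#include <linux/module.h>
#include <linux/ktime.h>
#include <asm/local.h>

static local_t bench_counter;

static int __init local_bench_init(void)
{
	const long iters = 10000000;
	long i;
	ktime_t start, end;
	s64 ns;

	start = ktime_get();
	for (i = 0; i < iters; i++)
		local_inc(&bench_counter);
	end = ktime_get();

	ns = ktime_to_ns(ktime_sub(end, start));
	pr_info("local_inc: %lld ns/iteration\n", ns / iters);

	/* fail the load on purpose so the module does not stay around */
	return -EAGAIN;
}
module_init(local_bench_init);
MODULE_LICENSE("GPL");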

Here are the numbers with and without the patch.

Time in ns per iteration

Local_t             Without Patch   With Patch

_inc                38              10
_add                38              10
_read               4               4
_add_return         38              10

Currently only asm/local.h has been rewritten, and the change has been
tested only on PPC64 (pseries guest) and a PPC64 LE host.

The patch converts the inline asm implementation of local_t to C. It also
removes local_dec_if_positive(), since it has no users.

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/local.h | 200 +++++++++++++++++----------------------
 1 file changed, 86 insertions(+), 114 deletions(-)

diff --git a/arch/powerpc/include/asm/local.h b/arch/powerpc/include/asm/local.h
index b8da91363864..7795359b3663 100644
--- a/arch/powerpc/include/asm/local.h
+++ b/arch/powerpc/include/asm/local.h
@@ -3,74 +3,62 @@
 
 #include <linux/percpu.h>
 #include <linux/atomic.h>
+#include <linux/irqflags.h>
+
+#include <asm/hw_irq.h>
+
+#ifdef CONFIG_PPC64
 
 typedef struct
 {
-	atomic_long_t a;
+	long v;
 } local_t;
 
-#define LOCAL_INIT(i)	{ ATOMIC_LONG_INIT(i) }
-
-#define local_read(l)	atomic_long_read(&(l)->a)
-#define local_set(l,i)	atomic_long_set(&(l)->a, (i))
+#define LOCAL_INIT(i)	{ (i) }
 
-#define local_add(i,l)	atomic_long_add((i),(&(l)->a))
-#define local_sub(i,l)	atomic_long_sub((i),(&(l)->a))
-#define local_inc(l)	atomic_long_inc(&(l)->a)
-#define local_dec(l)	atomic_long_dec(&(l)->a)
-
-static __inline__ long local_add_return(long a, local_t *l)
+static __inline__ long local_read(local_t *l)
 {
-	long t;
-
-	__asm__ __volatile__(
-"1:"	PPC_LLARX(%0,0,%2,0) "			# local_add_return\n\
-	add	%0,%1,%0\n"
-	PPC405_ERR77(0,%2)
-	PPC_STLCX	"%0,0,%2 \n\
-	bne-	1b"
-	: "=&r" (t)
-	: "r" (a), "r" (&(l->a.counter))
-	: "cc", "memory");
-
-	return t;
+	return READ_ONCE(l->v);
 }
 
-#define local_add_negative(a, l)	(local_add_return((a), (l)) < 0)
-
-static __inline__ long local_sub_return(long a, local_t *l)
+static __inline__ void local_set(local_t *l, long i)
 {
-	long t;
+	WRITE_ONCE(l->v, i);
+}
 
-	__asm__ __volatile__(
-"1:"	PPC_LLARX(%0,0,%2,0) "			# local_sub_return\n\
-	subf	%0,%1,%0\n"
-	PPC405_ERR77(0,%2)
-	PPC_STLCX	"%0,0,%2 \n\
-	bne-	1b"
-	: "=&r" (t)
-	: "r" (a), "r" (&(l->a.counter))
-	: "cc", "memory");
+#define LOCAL_OP(op, c_op)						\
+static __inline__ void local_##op(long i, local_t *l)			\
+{									\
+	unsigned long flags;						\
+									\
+	powerpc_local_irq_pmu_save(flags);				\
+	l->v c_op i;						\
+	powerpc_local_irq_pmu_restore(flags);				\
+}
 
-	return t;
+#define LOCAL_OP_RETURN(op, c_op)					\
+static __inline__ long local_##op##_return(long a, local_t *l)		\
+{									\
+	long t;								\
+	unsigned long flags;						\
+									\
+	powerpc_local_irq_pmu_save(flags);				\
+	t = (l->v c_op a);						\
+	powerpc_local_irq_pmu_restore(flags);				\
+									\
+	return t;							\
 }
 
-static __inline__ long local_inc_return(local_t *l)
-{
-	long t;
+#define LOCAL_OPS(op, c_op)		\
+	LOCAL_OP(op, c_op)		\
+	LOCAL_OP_RETURN(op, c_op)
 
-	__asm__ __volatile__(
-"1:"	PPC_LLARX(%0,0,%1,0) "			# local_inc_return\n\
-	addic	%0,%0,1\n"
-	PPC405_ERR77(0,%1)
-	PPC_STLCX	"%0,0,%1 \n\
-	bne-	1b"
-	: "=&r" (t)
-	: "r" (&(l->a.counter))
-	: "cc", "xer", "memory");
+LOCAL_OPS(add, +=)
+LOCAL_OPS(sub, -=)
 
-	return t;
-}
+#define local_add_negative(a, l)	(local_add_return((a), (l)) < 0)
+#define local_inc_return(l)		local_add_return(1LL, l)
+#define local_inc(l)			local_inc_return(l)
 
 /*
  * local_inc_and_test - increment and test
@@ -80,28 +68,39 @@ static __inline__ long local_inc_return(local_t *l)
  * and returns true if the result is zero, or false for all
  * other cases.
  */
-#define local_inc_and_test(l) (local_inc_return(l) == 0)
+#define local_inc_and_test(l)		(local_inc_return(l) == 0)
 
-static __inline__ long local_dec_return(local_t *l)
+#define local_dec_return(l)		local_sub_return(1LL, l)
+#define local_dec(l)			local_dec_return(l)
+#define local_sub_and_test(a, l)	(local_sub_return((a), (l)) == 0)
+#define local_dec_and_test(l)		(local_dec_return((l)) == 0)
+
+static __inline__ long local_cmpxchg(local_t *l, long o, long n)
 {
 	long t;
+	unsigned long flags;
 
-	__asm__ __volatile__(
-"1:"	PPC_LLARX(%0,0,%1,0) "			# local_dec_return\n\
-	addic	%0,%0,-1\n"
-	PPC405_ERR77(0,%1)
-	PPC_STLCX	"%0,0,%1\n\
-	bne-	1b"
-	: "=&r" (t)
-	: "r" (&(l->a.counter))
-	: "cc", "xer", "memory");
+	powerpc_local_irq_pmu_save(flags);
+	t = l->v;
+	if (t == o)
+		l->v = n;
+	powerpc_local_irq_pmu_restore(flags);
 
 	return t;
 }
 
-#define local_cmpxchg(l, o, n) \
-	(cmpxchg_local(&((l)->a.counter), (o), (n)))
-#define local_xchg(l, n) (xchg_local(&((l)->a.counter), (n)))
+static __inline__ long local_xchg(local_t *l, long n)
+{
+	long t;
+	unsigned long flags;
+
+	powerpc_local_irq_pmu_save(flags);
+	t = l->v;
+	l->v = n;
+	powerpc_local_irq_pmu_restore(flags);
+
+	return t;
+}
 
 /**
  * local_add_unless - add unless the number is a given value
@@ -114,62 +113,35 @@ static __inline__ long local_dec_return(local_t *l)
  */
 static __inline__ int local_add_unless(local_t *l, long a, long u)
 {
-	long t;
-
-	__asm__ __volatile__ (
-"1:"	PPC_LLARX(%0,0,%1,0) "			# local_add_unless\n\
-	cmpw	0,%0,%3 \n\
-	beq-	2f \n\
-	add	%0,%2,%0 \n"
-	PPC405_ERR77(0,%2)
-	PPC_STLCX	"%0,0,%1 \n\
-	bne-	1b \n"
-"	subf	%0,%2,%0 \n\
-2:"
-	: "=&r" (t)
-	: "r" (&(l->a.counter)), "r" (a), "r" (u)
-	: "cc", "memory");
-
-	return t != u;
-}
-
-#define local_inc_not_zero(l) local_add_unless((l), 1, 0)
-
-#define local_sub_and_test(a, l)	(local_sub_return((a), (l)) == 0)
-#define local_dec_and_test(l)		(local_dec_return((l)) == 0)
-
-/*
- * Atomically test *l and decrement if it is greater than 0.
- * The function returns the old value of *l minus 1.
- */
-static __inline__ long local_dec_if_positive(local_t *l)
-{
-	long t;
+	unsigned long flags;
+	int ret = 0;
 
-	__asm__ __volatile__(
-"1:"	PPC_LLARX(%0,0,%1,0) "			# local_dec_if_positive\n\
-	cmpwi	%0,1\n\
-	addi	%0,%0,-1\n\
-	blt-	2f\n"
-	PPC405_ERR77(0,%1)
-	PPC_STLCX	"%0,0,%1\n\
-	bne-	1b"
-	"\n\
-2:"	: "=&b" (t)
-	: "r" (&(l->a.counter))
-	: "cc", "memory");
+	powerpc_local_irq_pmu_save(flags);
+	if (l->v != u) {
+		l->v += a;
+		ret = 1;
+	}
+	powerpc_local_irq_pmu_restore(flags);
 
-	return t;
+	return ret;
 }
 
+#define local_inc_not_zero(l)		local_add_unless((l), 1, 0)
+
 /* Use these for per-cpu local_t variables: on some archs they are
  * much more efficient than these naive implementations.  Note they take
  * a variable, not an address.
  */
 
-#define __local_inc(l)		((l)->a.counter++)
-#define __local_dec(l)		((l)->a.counter++)
-#define __local_add(i,l)	((l)->a.counter+=(i))
-#define __local_sub(i,l)	((l)->a.counter-=(i))
+#define __local_inc(l)		((l)->v++)
+#define __local_dec(l)		((l)->v++)
+#define __local_add(i,l)	((l)->v+=(i))
+#define __local_sub(i,l)	((l)->v-=(i))
+
+#else /* CONFIG_PPC64 */
+
+#include <asm-generic/local.h>
+
+#endif /* CONFIG_PPC64 */
 
 #endif /* _ARCH_POWERPC_LOCAL_H */
-- 
2.7.4

Thread overview: 15+ messages
2017-08-20 17:58 [PATCH v10 00/14] powerpc: "paca->soft_enabled" based local atomic operation implementation Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 01/14] powerpc: Add #defs for paca->soft_enabled flags Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 02/14] powerpc: move set_soft_enabled() and rename Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 03/14] powerpc: Use soft_enabled_set api to update paca->soft_enabled Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 04/14] powerpc: implement arch_local_* using the soft_enabled_* wrappers Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 05/14] powerpc/irq: Cleanup hard_irq_disable() macro Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 06/14] powerpc/irq: Fix arch_local_irq_disable() in book3s Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 07/14] powerpc: Change from enable flag to disable bitmask Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 08/14] powerpc: Rename soft_enabled to soft_disable_mask Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 09/14] powerpc: Avoid using EXCEPTION_PROLOG_1 macro in MASKABLE_* Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 10/14] powerpc: Add support to take additional parameter in MASKABLE_* macro Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 11/14] powerpc: Add support to mask perf interrupts and replay them Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 12/14] powerpc:Add new kconfig IRQ_DEBUG_SUPPORT Madhavan Srinivasan
2017-08-20 17:58 ` [PATCH v10 13/14] powerpc: Add new set of soft_disable_mask_ functions Madhavan Srinivasan
2017-08-20 17:58 ` Madhavan Srinivasan [this message]
