From: Davidlohr Bueso <dave@stgolabs.net>
To: npiggin@gmail.com
Cc: Davidlohr Bueso <dave@stgolabs.net>,
will@kernel.org, dbueso@suse.de, peterz@infradead.org,
linux-kernel@vger.kernel.org, mingo@redhat.com, paulus@samba.org,
longman@redhat.com, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v2] powerpc/qspinlock: Use generic smp_cond_load_relaxed
Date: Thu, 18 Mar 2021 13:47:02 -0700 [thread overview]
Message-ID: <20210318204702.71417-1-dave@stgolabs.net> (raw)
In-Reply-To: <1615870473.h7h4jetmjb.astroid@bobo.none>
49a7d46a06c3 (powerpc: Implement smp_cond_load_relaxed()) added
busy-waiting pausing with a preferred SMT priority pattern, lowering
the priority (reducing decode cycles) during the whole loop slowpath.
However, data shows that while this pattern works well with simple
spinlocks, queued spinlocks benefit more being kept in medium priority,
with a cpu_relax() instead, being a low+medium combo on powerpc.
Data is from three benchmarks on a Power9: 9008-22L 64 CPUs with
2 sockets and 8 threads per core.
1. locktorture.
This is data for the lowest and most artificial/pathological level,
with increasing thread counts pounding on the lock. Metrics are total
ops/minute. Despite some small hits in the 4-8 range, scenarios are
either neutral or favorable to this patch.
+=========+==========+==========+=======+
| # tasks | vanilla | dirty | %diff |
+=========+==========+==========+=======+
| 2 | 46718565 | 48751350 | 4.35 |
+---------+----------+----------+-------+
| 4 | 51740198 | 50369082 | -2.65 |
+---------+----------+----------+-------+
| 8 | 63756510 | 62568821 | -1.86 |
+---------+----------+----------+-------+
| 16 | 67824531 | 70966546 | 4.63 |
+---------+----------+----------+-------+
| 32 | 53843519 | 61155508 | 13.58 |
+---------+----------+----------+-------+
| 64 | 53005778 | 53104412 | 0.18 |
+---------+----------+----------+-------+
| 128 | 53331980 | 54606910 | 2.39 |
+=========+==========+==========+=======+
2. sockperf (tcp throughput)
Here a client will do one-way throughput tests to a localhost server, with
increasing message sizes, dealing with the sk_lock. This patch shows to put
the performance of the qspinlock back to par with that of the simple lock:
simple-spinlock vanilla dirty
Hmean 14 73.50 ( 0.00%) 54.44 * -25.93%* 73.45 * -0.07%*
Hmean 100 654.47 ( 0.00%) 385.61 * -41.08%* 771.43 * 17.87%*
Hmean 300 2719.39 ( 0.00%) 2181.67 * -19.77%* 2666.50 * -1.94%*
Hmean 500 4400.59 ( 0.00%) 3390.77 * -22.95%* 4322.14 * -1.78%*
Hmean 850 6726.21 ( 0.00%) 5264.03 * -21.74%* 6863.12 * 2.04%*
3. dbench (tmpfs)
Configured to run with up to ncpusx8 clients, it shows both latency and
throughput metrics. For the latency, with the exception of the 64 case,
there is really nothing to go by:
vanilla dirty
Amean latency-1 1.67 ( 0.00%) 1.67 * 0.09%*
Amean latency-2 2.15 ( 0.00%) 2.08 * 3.36%*
Amean latency-4 2.50 ( 0.00%) 2.56 * -2.27%*
Amean latency-8 2.49 ( 0.00%) 2.48 * 0.31%*
Amean latency-16 2.69 ( 0.00%) 2.72 * -1.37%*
Amean latency-32 2.96 ( 0.00%) 3.04 * -2.60%*
Amean latency-64 7.78 ( 0.00%) 8.17 * -5.07%*
Amean latency-512 186.91 ( 0.00%) 186.41 * 0.27%*
For the dbench4 Throughput (misleading but traditional) there's a small
but rather constant improvement:
vanilla dirty
Hmean 1 849.13 ( 0.00%) 851.51 * 0.28%*
Hmean 2 1664.03 ( 0.00%) 1663.94 * -0.01%*
Hmean 4 3073.70 ( 0.00%) 3104.29 * 1.00%*
Hmean 8 5624.02 ( 0.00%) 5694.16 * 1.25%*
Hmean 16 9169.49 ( 0.00%) 9324.43 * 1.69%*
Hmean 32 11969.37 ( 0.00%) 12127.09 * 1.32%*
Hmean 64 15021.12 ( 0.00%) 15243.14 * 1.48%*
Hmean 512 14891.27 ( 0.00%) 15162.11 * 1.82%*
Measuring the dbench4 Per-VFS Operation latency, shows some very minor
differences within the noise level, around the 0-1% ranges.
Fixes: 49a7d46a06c3 (powerpc: Implement smp_cond_load_relaxed())
Acked-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
Changes from v1:
Added small description and labeling smp_cond_load_relaxed requested by Nick.
Added Nick's ack.
arch/powerpc/include/asm/barrier.h | 16 ----------------
arch/powerpc/include/asm/qspinlock.h | 7 +++++++
2 files changed, 7 insertions(+), 16 deletions(-)
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index aecfde829d5d..7ae29cfb06c0 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -80,22 +80,6 @@ do { \
___p1; \
})
-#ifdef CONFIG_PPC64
-#define smp_cond_load_relaxed(ptr, cond_expr) ({ \
- typeof(ptr) __PTR = (ptr); \
- __unqual_scalar_typeof(*ptr) VAL; \
- VAL = READ_ONCE(*__PTR); \
- if (unlikely(!(cond_expr))) { \
- spin_begin(); \
- do { \
- VAL = READ_ONCE(*__PTR); \
- } while (!(cond_expr)); \
- spin_end(); \
- } \
- (typeof(*ptr))VAL; \
-})
-#endif
-
#ifdef CONFIG_PPC_BOOK3S_64
#define NOSPEC_BARRIER_SLOT nop
#elif defined(CONFIG_PPC_FSL_BOOK3E)
diff --git a/arch/powerpc/include/asm/qspinlock.h b/arch/powerpc/include/asm/qspinlock.h
index b052b0624816..9da649e1a488 100644
--- a/arch/powerpc/include/asm/qspinlock.h
+++ b/arch/powerpc/include/asm/qspinlock.h
@@ -72,6 +72,13 @@ static inline void pv_spinlocks_init(void)
#endif
+/*
+ * Queued spinlocks rely heavily on smp_cond_load_relaxed to busy-wait,
+ * which was found that have performance problems if implemented with
+ * the preferred spin_begin()/spin_end() SMT priority pattern. Use the
+ * generic version instead.
+ */
+
#include <asm-generic/qspinlock.h>
#endif /* _ASM_POWERPC_QSPINLOCK_H */
--
2.26.2
next prev parent reply other threads:[~2021-03-18 20:47 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-09 1:59 [PATCH 0/3] powerpc/qspinlock: Some tuning updates Davidlohr Bueso
2021-03-09 1:59 ` [PATCH 1/3] powerpc/spinlock: Define smp_mb__after_spinlock only once Davidlohr Bueso
2021-03-09 1:59 ` [PATCH 2/3] powerpc/spinlock: Unserialize spin_is_locked Davidlohr Bueso
2021-03-09 1:59 ` [PATCH 3/3] powerpc/qspinlock: Use generic smp_cond_load_relaxed Davidlohr Bueso
2021-03-09 9:39 ` Michal Suchánek
2021-03-09 15:46 ` Davidlohr Bueso
2021-03-09 17:30 ` Michal Suchánek
2021-03-16 4:59 ` Nicholas Piggin
2021-03-18 20:02 ` Davidlohr Bueso
2021-03-18 20:47 ` Davidlohr Bueso [this message]
2021-03-31 1:09 ` [PATCH 0/3] powerpc/qspinlock: Some tuning updates Michael Ellerman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210318204702.71417-1-dave@stgolabs.net \
--to=dave@stgolabs.net \
--cc=dbueso@suse.de \
--cc=linux-kernel@vger.kernel.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=longman@redhat.com \
--cc=mingo@redhat.com \
--cc=npiggin@gmail.com \
--cc=paulus@samba.org \
--cc=peterz@infradead.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).