[Qemu-devel] [PULL 25/35] target-i386: remove helper_lock()

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Richard Henderson <rth@twiddle.net>
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, "Emilio G. Cota" <cota@braap.org>
Subject: [Qemu-devel] [PULL 25/35] target-i386: remove helper_lock()
Date: Sat, 22 Oct 2016 14:05:14 -0700	[thread overview]
Message-ID: <1477170324-17783-26-git-send-email-rth@twiddle.net> (raw)
In-Reply-To: <1477170324-17783-1-git-send-email-rth@twiddle.net>

From: "Emilio G. Cota" <cota@braap.org>

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below is the result of running the atomic_add-test microbenchmark with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
, where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86' ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implement x86' ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset

Results sorted in ascending range, i.e. descending degree of contention.
Y axis is Throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

For master I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/helper.h     |  2 --
 target-i386/mem_helper.c | 33 ---------------------------------
 target-i386/translate.c  | 15 ---------------
 3 files changed, 50 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 729d4b6..4e859eb 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -1,8 +1,6 @@
 DEF_HELPER_FLAGS_4(cc_compute_all, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 DEF_HELPER_FLAGS_4(cc_compute_c, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 
-DEF_HELPER_0(lock, void)
-DEF_HELPER_0(unlock, void)
 DEF_HELPER_3(write_eflags, void, env, tl, i32)
 DEF_HELPER_1(read_eflags, tl, env)
 DEF_HELPER_2(divb_AL, void, env, tl)
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index c4b5c5b..70f6766 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -25,39 +25,6 @@
 #include "qemu/int128.h"
 #include "tcg.h"
 
-/* broken thread support */
-
-#if defined(CONFIG_USER_ONLY)
-QemuMutex global_cpu_lock;
-
-void helper_lock(void)
-{
-    qemu_mutex_lock(&global_cpu_lock);
-}
-
-void helper_unlock(void)
-{
-    qemu_mutex_unlock(&global_cpu_lock);
-}
-
-void helper_lock_init(void)
-{
-    qemu_mutex_init(&global_cpu_lock);
-}
-#else
-void helper_lock(void)
-{
-}
-
-void helper_unlock(void)
-{
-}
-
-void helper_lock_init(void)
-{
-}
-#endif
-
 void helper_cmpxchg8b_unlocked(CPUX86State *env, target_ulong a0)
 {
     uintptr_t ra = GETPC();
diff --git a/target-i386/translate.c b/target-i386/translate.c
index c8827f3..ac3f6f4 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4537,10 +4537,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     s->aflag = aflag;
     s->dflag = dflag;
 
-    /* lock generation */
-    if (prefixes & PREFIX_LOCK)
-        gen_helper_lock();
-
     /* now check op code */
  reswitch:
     switch(b) {
@@ -8211,20 +8207,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     default:
         goto unknown_op;
     }
-    /* lock generation */
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
     return s->pc;
  illegal_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_illegal_opcode(s);
     return s->pc;
  unknown_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_unknown_opcode(env, s);
     return s->pc;
 }
@@ -8316,8 +8303,6 @@ void tcg_x86_init(void)
                                      offsetof(CPUX86State, bnd_regs[i].ub),
                                      bnd_regu_names[i]);
     }
-
-    helper_lock_init();
 }
 
 /* generate intermediate code for basic block 'tb'.  */
-- 
2.7.4

next prev parent reply	other threads:[~2016-10-22 21:05 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-10-22 21:04 [Qemu-devel] [PULL 00/35] cmpxchg atomic operations Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 01/35] atomics: add atomic_xor Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 02/35] atomics: add atomic_op_fetch variants Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 03/35] exec: Avoid direct references to Int128 parts Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 04/35] int128: Use __int128 if available Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 05/35] int128: Add int128_make128 Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 07/35] linux-user: enable parallel code generation on clone Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 08/35] cputlb: Replace SHIFT with DATA_SIZE Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 09/35] cputlb: Move probe_write out of softmmu_template.h Richard Henderson
2016-10-22 21:04 ` [Qemu-devel] [PULL 10/35] cputlb: Remove includes from softmmu_template.h Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 11/35] cputlb: Move most of iotlb code out of line Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 12/35] cputlb: Tidy some macros Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 13/35] tcg: Add atomic helpers Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 14/35] tcg: Add atomic128 helpers Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 15/35] tcg: Add CONFIG_ATOMIC64 Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 16/35] tcg: Emit barriers with parallel_cpus Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 17/35] target-i386: emulate LOCK'ed cmpxchg using cmpxchg helpers Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 18/35] target-i386: emulate LOCK'ed OP instructions using atomic helpers Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 19/35] target-i386: emulate LOCK'ed INC using atomic helper Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 20/35] target-i386: emulate LOCK'ed NOT " Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 21/35] target-i386: emulate LOCK'ed NEG using cmpxchg helper Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 22/35] target-i386: emulate LOCK'ed XADD using atomic helper Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 23/35] target-i386: emulate LOCK'ed BTX ops using atomic helpers Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 24/35] target-i386: emulate XCHG using atomic helper Richard Henderson
2016-10-22 21:05 ` Richard Henderson [this message]
2016-10-22 21:05 ` [Qemu-devel] [PULL 26/35] tests: add atomic_add-bench Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 27/35] target-arm: Rearrange aa32 load and store functions Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 28/35] target-arm: emulate LL/SC using cmpxchg helpers Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 29/35] target-arm: emulate SWP with atomic_xchg helper Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 30/35] target-arm: emulate aarch64's LL/SC using cmpxchg helpers Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 31/35] linux-user: remove handling of ARM's EXCP_STREX Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 32/35] linux-user: remove handling of aarch64's EXCP_STREX Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 33/35] target-arm: remove EXCP_STREX + cpu_exclusive_{test, info} Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 34/35] target-alpha: Introduce MMU_PHYS_IDX Richard Henderson
2016-10-22 21:05 ` [Qemu-devel] [PULL 35/35] target-alpha: Emulate LL/SC using cmpxchg helpers Richard Henderson
2016-10-24 10:51 ` [Qemu-devel] [PULL 00/35] cmpxchg atomic operations Peter Maydell
2016-10-24 12:37   ` Paolo Bonzini
2016-10-24 14:02     ` Peter Maydell
2016-10-24 17:27   ` Richard Henderson
2016-10-24 18:02     ` Peter Maydell

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:729d4b6 dfblob:4e859eb dfblob:c4b5c5b dfblob:70f6766
dfblob:c8827f3 dfblob:ac3f6f4 )
 OR (
bs:"[Qemu-devel] [PULL 25/35] target-i386: remove helper_lock()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1477170324-17783-26-git-send-email-rth@twiddle.net \
    --to=rth@twiddle.net \
    --cc=cota@braap.org \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).