From: Richard Henderson <rth@twiddle.net>
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, cota@braap.org, alex.bennee@linaro.org
Subject: [Qemu-devel] [PATCH v8 27/37] target-i386: remove helper_lock()
Date: Mon, 24 Oct 2016 10:39:38 -0700
Message-ID: <1477330788-14996-28-git-send-email-rth@twiddle.net>
In-Reply-To: <1477330788-14996-1-git-send-email-rth@twiddle.net>

From: "Emilio G. Cota" <cota@braap.org>

It's been superseded by the atomic helpers.

The use of the atomic helpers provides a significant performance and scalability
improvement. Below are the results of running the atomic_add-bench microbenchmark with:
 $ x86_64-linux-user/qemu-x86_64 tests/atomic_add-bench -o 5000000 -r $r -n $n
where $n is the number of threads and $r is the allowed range for the additions.

The scenarios measured are:
- atomic: implements x86's ADDL with the atomic_add helper (i.e. this patchset)
- cmpxchg: implements x86's ADDL with a TCG loop using the cmpxchg helper
- master: before this patchset
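
As a rough host-side illustration of how the three scenarios differ, here
is a minimal sketch in plain C11. It is not QEMU code: add_atomic,
add_cmpxchg, add_locked and global_lock are hypothetical names, and the
real implementations are emitted by TCG rather than written like this.
In master the global lock is taken before any LOCK-prefixed instruction
and released after it, so the mutex variant below only approximates that
by wrapping a single add.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* All names here are hypothetical; none of them are QEMU functions. */

/* "atomic": one host atomic read-modify-write per guest LOCK ADDL. */
static uint32_t add_atomic(_Atomic uint32_t *p, uint32_t v)
{
    return atomic_fetch_add(p, v) + v;
}

/* "cmpxchg": retry a compare-and-swap until no other thread interfered. */
static uint32_t add_cmpxchg(_Atomic uint32_t *p, uint32_t v)
{
    uint32_t old = atomic_load(p);

    while (!atomic_compare_exchange_weak(p, &old, old + v)) {
        /* On failure, 'old' has been reloaded with the current value. */
    }
    return old + v;
}

/* "master": one process-wide mutex serializes every locked operation,
 * no matter which address it touches. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static uint32_t add_locked(uint32_t *p, uint32_t v)
{
    uint32_t r;

    pthread_mutex_lock(&global_lock);
    r = (*p += v);
    pthread_mutex_unlock(&global_lock);
    return r;
}
```

With a wide range the additions mostly hit different addresses, so the
first two variants rarely conflict, while the mutex variant still funnels
every thread through the same lock.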

Results are sorted in ascending range, i.e. descending degree of contention.
The Y axis is throughput in Mops/s. Tests are run on an AMD machine with 64
Opteron 6376 cores.

                atomic_add-bench: 5000000 ops/thread, [0,1] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     + atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 +Emaster +-N--+                                                      ++
     ||                                                                    |
     |++                                                                   |
     ||                                                                    |
  15 +++                                                                  ++
     |N|                                                                   |
     |+|                                                                   |
  10 ++|                                                                  ++
     |+|+                                                                  |
     | |    -+E+------        +++  ---+E+------+E+------+E+-----+E+------+E|
     |+E+E+- +++     +E+------+E+--                                        |
   5 ++|+                                                                 ++
     |+N+H+---                                 +++                         |
     ++++N+--+H++----+++   +  +++  --++H+------+H+------+H++----+H+---+--- |
   0 ++---------+-----H----+---H-----+----------+----------+----------+---H+
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,2] range

  25 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
     |cmpxchg +-H--+                                                       |
  20 ++master +-N--+                                                      ++
     |E|                                                                   |
     |++                                                                   |
     ||E                                                                   |
  15 ++|                                                                  ++
     |N||                                                                  |
     |+||                                   ---+E+------+E+-----+E+------+E|
  10 ++| |        ---+E+------+E+-----+E+---                    +++      +++
     ||H+E+--+E+--                                                         |
     |+++++                                                                |
     | ||                                                                  |
   5 ++|+H+--                                  +++                        ++
     |+N+    -                              ---+H+------+H+------          |
     +  +N+--+H++----+H+---+--+H+----++H+---    +          +    +H+---+--+H|
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

                atomic_add-bench: 5000000 ops/thread, [0,8] range

  40 ++---------+----------+---------+----------+----------+----------+---++
     ++atomic +-E--+       +         +          +          +          +    |
  35 +cmpxchg +-H--+                                                      ++
     | master +-N--+               ---+E+------+E+------+E+-----+E+------+E|
  30 ++|                   ---+E+--   +++                                 ++
     | |            -+E+---                                                |
  25 ++E        ---- +++                                                  ++
     |+++++ -+E+                                                           |
  20 +E+ E-- +++                                                          ++
     |H|+++                                                                |
     |+|                                       +H+-------                  |
  15 ++H+                                   ---+++      +H+------         ++
     |N++H+--                         +++---                    +H+------++|
  10 ++ +++  -       +++           ---+H+                       +++      +H+
     | |     +H+-----+H+------+H+--                                        |
   5 ++|                      +++                                         ++
     ++N+N+--+N++          +         +          +          +          +    |
   0 ++---------+----------+---------+----------+----------+----------+---++
     0          10         20        30         40         50         60
                                Number of threads

               atomic_add-bench: 5000000 ops/thread, [0,128] range

  160 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  140 +cmpxchg +-H--+                          +++      +++               ++
      | master +-N--+                           E--------E------+E+------++|
  120 ++                                      --|        |      +++       E+
      |                                     -- +++      +++              ++|
  100 ++                                   -                              ++
      |                                +++-                     +++      ++|
   80 ++                              -+E+    -+H+------+H+------H--------++
      |                           ----    ----                  +++       H|
      |            ---+E+-----+E+-  ---+H+                               ++|
   60 ++     +E+---   +++  ---+H+---                                      ++
      |    --+++   ---+H+--                                                |
   40 ++ +E+-+H+---                                                       ++
      |  +H+                                                               |
   20 +EE+                                                                ++
      +N+        +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

              atomic_add-bench: 5000000 ops/thread, [0,1024] range

  350 ++---------+---------+----------+---------+----------+----------+---++
      + atomic +-E--+      +          +         +          +          +    |
  300 +cmpxchg +-H--+                                                    +++
      | master +-N--+                                           +++       ||
      |                                                 +++      |    ----E|
  250 ++                                                 |   ----E----    ++
      |                                              ----E---    |    ---+H|
  200 ++                                      -+E+---   +++  ---+H+---    ++
      |                                   ----         -+H+--              |
      |                                +E+     +++ ---- +++                |
  150 ++                            ---+++  ---+H+-                       ++
      |                          ---  -+H+--                               |
  100 ++                   ---+E+ ---- +++                                ++
      |      +++   ---+E+-----+H+-                                         |
      |     -+E+------+H+--                                                |
   50 ++ +E+                                                              ++
      +EE+       +         +          +         +          +          +    |
    0 ++N-N---N--+---------+----------+---------+----------+----------+---++
      0          10        20         30        40         50         60
                                Number of threads

  hi-res: http://imgur.com/a/fMRmq

I stopped measuring master after 8 threads, because there is little
point in measuring the well-known performance collapse of a contended lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <1467054136-10430-21-git-send-email-cota@braap.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-i386/helper.h     |  2 --
 target-i386/mem_helper.c | 33 ---------------------------------
 target-i386/translate.c  | 15 ---------------
 3 files changed, 50 deletions(-)

diff --git a/target-i386/helper.h b/target-i386/helper.h
index 729d4b6..4e859eb 100644
--- a/target-i386/helper.h
+++ b/target-i386/helper.h
@@ -1,8 +1,6 @@
 DEF_HELPER_FLAGS_4(cc_compute_all, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 DEF_HELPER_FLAGS_4(cc_compute_c, TCG_CALL_NO_RWG_SE, tl, tl, tl, tl, int)
 
-DEF_HELPER_0(lock, void)
-DEF_HELPER_0(unlock, void)
 DEF_HELPER_3(write_eflags, void, env, tl, i32)
 DEF_HELPER_1(read_eflags, tl, env)
 DEF_HELPER_2(divb_AL, void, env, tl)
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index c4b5c5b..70f6766 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -25,39 +25,6 @@
 #include "qemu/int128.h"
 #include "tcg.h"
 
-/* broken thread support */
-
-#if defined(CONFIG_USER_ONLY)
-QemuMutex global_cpu_lock;
-
-void helper_lock(void)
-{
-    qemu_mutex_lock(&global_cpu_lock);
-}
-
-void helper_unlock(void)
-{
-    qemu_mutex_unlock(&global_cpu_lock);
-}
-
-void helper_lock_init(void)
-{
-    qemu_mutex_init(&global_cpu_lock);
-}
-#else
-void helper_lock(void)
-{
-}
-
-void helper_unlock(void)
-{
-}
-
-void helper_lock_init(void)
-{
-}
-#endif
-
 void helper_cmpxchg8b_unlocked(CPUX86State *env, target_ulong a0)
 {
     uintptr_t ra = GETPC();
diff --git a/target-i386/translate.c b/target-i386/translate.c
index cfa3956..927b366 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -4536,10 +4536,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     s->aflag = aflag;
     s->dflag = dflag;
 
-    /* lock generation */
-    if (prefixes & PREFIX_LOCK)
-        gen_helper_lock();
-
     /* now check op code */
  reswitch:
     switch(b) {
@@ -8211,20 +8207,11 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
     default:
         goto unknown_op;
     }
-    /* lock generation */
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
     return s->pc;
  illegal_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_illegal_opcode(s);
     return s->pc;
  unknown_op:
-    if (s->prefix & PREFIX_LOCK)
-        gen_helper_unlock();
-    /* XXX: ensure that no lock was generated */
     gen_unknown_opcode(env, s);
     return s->pc;
 }
@@ -8316,8 +8303,6 @@ void tcg_x86_init(void)
                                      offsetof(CPUX86State, bnd_regs[i].ub),
                                      bnd_regu_names[i]);
     }
-
-    helper_lock_init();
 }
 
 /* generate intermediate code for basic block 'tb'.  */
-- 
2.7.4