Re: [Qemu-devel] MTTCG status updates, benchmark results and KVM forum plans

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Emilio G. Cota" <cota@braap.org>
To: "Alex Bennée" <alex.bennee@linaro.org>
Cc: mttcg@listserver.greensocs.com, qemu-devel@nongnu.org,
	fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
	bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com,
	mark.burton@greensocs.com, pbonzini@redhat.com,
	jan.kiszka@siemens.com, serge.fdrv@gmail.com, rth@twiddle.net,
	peter.maydell@linaro.org, claudio.fontana@huawei.com,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Peter Crosthwaite <crosthwaite.peter@gmail.com>
Subject: Re: [Qemu-devel] MTTCG status updates, benchmark results and KVM forum plans
Date: Mon, 15 Aug 2016 11:46:26 -0400	[thread overview]
Message-ID: <20160815154626.GA8768@flamenco> (raw)
In-Reply-To: <87mvkeqph3.fsf@linaro.org>

On Mon, Aug 15, 2016 at 11:46:32 +0100, Alex Bennée wrote:
> As far as I'm aware the following work is still ongoing:
> 
> Emilo: cmpxchg atomics
> Alvise: LL/SC modelling

I've been tinkering with an experimental patch to do proper LL/SC. The idea
is to rely on hardware transactional memory, so that stores don't have
to be tracked. The trickiest thing is the fallback path, for which I'm
trying to (ab)use EXCP_ATOMIC to execute exclusively from the ldrex
all the way to the strex.

To test it, I'm using aarch64-linux-user running qht-bench compiled on
an aarch64 machine. I'm running on an Intel Skylake host (Skylake has
no known TSX bugs)

However, I'm finding issues that might not have to do with the
patch itself.

- On the latest MTTCG+cmpxchg tree (45c11751ed7 a.k.a.
  bennee/mttcg/base-patches-v4-with-cmpxchg-atomics-v2), QEMU loops
  forever without making progress in the instruction stream, even
  with taskset -c 0.
- On the cmpxchg tree (rth's atomic-2 branch [1]), it works more
  reliably, although tb_lock is held around tb_find_fast so parallelism isn't
  very high. Still, it sometimes triggers the assert below.
  - Applying the "remove tb_lock around hot path" patch makes it
    easier to trigger this assert in cpu-exec.c:650 (approx.):
            /* Assert that the compiler does not smash local variables. */
            g_assert(cpu == current_cpu)
    I've also seen triggered the assert immediately after that one, as well
    as the rcu_read_unlock depth assert.
  The asserts are usually triggered when all threads exit (by returning
  NULL) at roughly the same time.
  However, they cannot be triggered with taskset -c 0, which makes me
  suspect that somehow start_exclusive isn't working as intended.

Any tips would be appreciated! I'll reply with a patch that uses RTM,
the one below is fallback path all the way, and the best to reproduce
the above.

Thanks,

		Emilio

[1] https://github.com/rth7680/qemu/commits/atomic-2

>From ed6af6eb364e5a36e81d7cc8143c0e9783c50587 Mon Sep 17 00:00:00 2001
From: "Emilio G. Cota" <cota@braap.org>
Date: Mon, 15 Aug 2016 00:27:42 +0200
Subject: [PATCH] aarch64: use TSX for ldrex/strex (fallback path only)

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 linux-user/main.c          |  5 +++--
 target-arm/helper-a64.c    | 23 +++++++++++++++++++++++
 target-arm/helper-a64.h    |  4 ++++
 target-arm/translate-a64.c | 15 +++++++++------
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 9880505..6922faa 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -192,8 +192,9 @@ static void step_atomic(CPUState *cpu)
 
     /* Since we got here, we know that parallel_cpus must be true.  */
     parallel_cpus = false;
-    cpu_exec_step(cpu);
-    parallel_cpus = true;
+    while (!parallel_cpus) {
+        cpu_exec_step(cpu);
+    }
 
     end_exclusive();
 }
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 8ce518b..a97b631 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -579,3 +579,26 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
 
     return !success;
 }
+
+void HELPER(xbegin)(CPUARMState *env)
+{
+    uintptr_t ra = GETPC();
+
+    if (parallel_cpus) {
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
+    }
+}
+
+void HELPER(xend)(void)
+{
+    assert(!parallel_cpus);
+    parallel_cpus = true;
+}
+
+uint64_t HELPER(x_ok)(void)
+{
+    if (!parallel_cpus) {
+        return 1;
+    }
+    return 0;
+}
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index dd32000..e7ede43 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -48,3 +48,7 @@ DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+
+DEF_HELPER_1(xbegin, void, env)
+DEF_HELPER_0(x_ok, i64)
+DEF_HELPER_0(xend, void)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 450c359..cfcf440 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1760,6 +1760,8 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
     TCGv_i64 tmp = tcg_temp_new_i64();
     TCGMemOp be = s->be_data;
 
+    gen_helper_xbegin(cpu_env);
+
     g_assert(size <= 3);
     if (is_pair) {
         TCGv_i64 hitmp = tcg_temp_new_i64();
@@ -1825,6 +1827,9 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);
 
     tmp = tcg_temp_new_i64();
+    gen_helper_x_ok(tmp);
+    tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, fail_label);
+
     if (is_pair) {
         if (size == 2) {
             TCGv_i64 val = tcg_temp_new_i64();
@@ -1844,16 +1849,14 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
         }
     } else {
         TCGv_i64 val = cpu_reg(s, rt);
-        tcg_gen_atomic_cmpxchg_i64(tmp, addr, cpu_exclusive_val, val,
-                                   get_mem_index(s),
-                                   size | MO_ALIGN | s->be_data);
-        tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val);
+        tcg_gen_qemu_st_i64(val, addr, get_mem_index(s), s->be_data + size);
     }
 
     tcg_temp_free_i64(addr);
-
-    tcg_gen_mov_i64(cpu_reg(s, rd), tmp);
     tcg_temp_free_i64(tmp);
+
+    tcg_gen_movi_i64(cpu_reg(s, rd), 0);
+    gen_helper_xend();
     tcg_gen_br(done_label);
 
     gen_set_label(fail_label);
-- 
2.7.4

next prev parent reply	other threads:[~2016-08-15 15:47 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-15 10:46 [Qemu-devel] MTTCG status updates, benchmark results and KVM forum plans Alex Bennée
2016-08-15 11:00 ` Peter Maydell
2016-08-15 11:16   ` Alex Bennée
2016-08-15 15:46 ` Emilio G. Cota [this message]
2016-08-15 15:49   ` [Qemu-devel] [PATCH] aarch64: use TSX for ldrex/strex Emilio G. Cota
2016-08-17 17:22     ` Richard Henderson
2016-08-17 17:58       ` Emilio G. Cota
2016-08-17 18:18         ` Emilio G. Cota
2016-08-17 18:41         ` Richard Henderson
2016-08-18 15:38           ` Richard Henderson
2016-08-24 21:12             ` Emilio G. Cota
2016-08-24 22:17               ` [Qemu-devel] [PATCH 1/8] cpu list: convert to RCU QLIST Emilio G. Cota
2016-08-24 22:17                 ` [Qemu-devel] [PATCH 2/8] cpu-exec: remove tb_lock from hot path Emilio G. Cota
2016-08-24 22:17                 ` [Qemu-devel] [PATCH 3/8] rcu: add rcu_read_lock_held() Emilio G. Cota
2016-08-24 22:17                 ` [Qemu-devel] [PATCH 4/8] target-arm: helper fixup for paired atomics Emilio G. Cota
2016-08-24 22:18                 ` [Qemu-devel] [PATCH 5/8] linux-user: add stop-the-world to be called from CPU loop Emilio G. Cota
2016-08-24 22:18                 ` [Qemu-devel] [PATCH 6/8] htm: add header to abstract Hardware Transactional Memory intrinsics Emilio G. Cota
2016-08-24 22:18                 ` [Qemu-devel] [PATCH 7/8] htm: add powerpc64 intrinsics Emilio G. Cota
2016-08-24 22:18                 ` [Qemu-devel] [PATCH 8/8] target-arm/a64: use HTM with stop-the-world fall-back path Emilio G. Cota
2016-08-16 11:16   ` [Qemu-devel] MTTCG status updates, benchmark results and KVM forum plans Alex Bennée
2016-08-16 21:51     ` Emilio G. Cota

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:9880505 dfblob:6922faa dfblob:8ce518b dfblob:a97b631
dfblob:dd32000 dfblob:e7ede43 dfblob:450c359 dfblob:cfcf440 )
 OR (
bs:"aarch64: use TSX for ldrex/strex (fallback path only)" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160815154626.GA8768@flamenco \
    --to=cota@braap.org \
    --cc=a.rigo@virtualopensystems.com \
    --cc=alex.bennee@linaro.org \
    --cc=bobby.prani@gmail.com \
    --cc=claudio.fontana@huawei.com \
    --cc=crosthwaite.peter@gmail.com \
    --cc=dgilbert@redhat.com \
    --cc=fred.konrad@greensocs.com \
    --cc=jan.kiszka@siemens.com \
    --cc=mark.burton@greensocs.com \
    --cc=mttcg@listserver.greensocs.com \
    --cc=nikunj@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    --cc=serge.fdrv@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).