qemu-devel.nongnu.org archive mirror
From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <rth@twiddle.net>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
	mttcg@greensocs.com, qemu-devel@nongnu.org,
	fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
	bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com,
	mark.burton@greensocs.com, pbonzini@redhat.com,
	jan.kiszka@siemens.com, serge.fdrv@gmail.com,
	peter.maydell@linaro.org, claudio.fontana@huawei.com,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Peter Crosthwaite" <crosthwaite.peter@gmail.com>
Subject: Re: [Qemu-devel] [PATCH] aarch64: use TSX for ldrex/strex
Date: Wed, 17 Aug 2016 13:58:00 -0400
Message-ID: <20160817175800.GA5084@flamenco>
In-Reply-To: <17473d21-f53e-b46f-8882-4b54b6444bc4@twiddle.net>

On Wed, Aug 17, 2016 at 10:22:05 -0700, Richard Henderson wrote:
> On 08/15/2016 08:49 AM, Emilio G. Cota wrote:
> >+void HELPER(xbegin)(CPUARMState *env)
> >+{
> >+    uintptr_t ra = GETPC();
> >+    int status;
> >+    int retries = 100;
> >+
> >+ retry:
> >+    status = _xbegin();
> >+    if (status != _XBEGIN_STARTED) {
> >+        if (status && retries) {
> >+            retries--;
> >+            goto retry;
> >+        }
> >+        if (parallel_cpus) {
> >+            cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
> >+        }
> >+    }
> >+}
> >+
> >+void HELPER(xend)(void)
> >+{
> >+    if (_xtest()) {
> >+        _xend();
> >+    } else {
> >+        assert(!parallel_cpus);
> >+        parallel_cpus = true;
> >+    }
> >+}
> >+
> 
> Interesting idea.
> 
> FWIW, there are two other extant HTM implementations: ppc64 and s390x.  As I
> recall, the s390 (but not the ppc64) transactions do not roll back the fp
> registers.  Which suggests that we need special support within the TCG
> prologue.  Perhaps folding these operations into special TCG opcodes.

I'm not familiar with s390, but as long as the hardware implements 'strong atomicity'
["strong atomicity guarantees atomicity between transactions and non-transactional
code", see http://acg.cis.upenn.edu/papers/cal06_atomic_semantics.pdf ] then
this approach would work, in the sense that stores wouldn't have to
be instrumented.

Of course architecture issues like saving the fp registers as you mention for
s390 would have to be taken into account.

> I believe that power8 has HTM, and there's one of those in the gcc compile
> farm, so this should be relatively easy to try out.

Good point! I had forgotten about power8. So far my tests have been on a
4-core Skylake. I have an account on the gcc compile farm so I will make use
of it. The power8 machine in the farm has a lot of cores, so this is
pretty exciting.

> We increase the chances of success of the transaction if we minimize the
> amount of non-target code that's executed while the transaction is running.
> That suggests two things:
> 
> (1) that it would be doubly helpful to incorporate the transaction start
> directly into TCG code generation rather than as a helper and

This (and leaving the fallback path in a helper) is simple enough that even
I could do it :-)

> (2) that we should start a new TB upon encountering a load-exclusive, so
> that we maximize the chance of the store-exclusive being a part of the same
> TB and thus have *nothing* extra between the beginning and commit of the
> transaction.

I don't know how to do this. If it's easy to do, please let me know how
(for aarch64 at least, since that's the target I'm using).

I've run some more tests on the Intel machine, and noticed that failed
transactions are very common (up to a 50% abort rate for some SPEC workloads,
counting only aborts of the "retrying doesn't help" kind), so bringing that
down should definitely help.
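
To make the "retrying doesn't help" distinction concrete, here is a minimal
sketch of a retry policy keyed on the RTM abort status. It is not what the
patch above does (the patch retries on any non-zero status). The SK_* macros
are local copies of the _XBEGIN_STARTED / _XABORT_* bit values from
<immintrin.h>, redefined so the sketch builds without -mrtm or TSX hardware;
TxnAction and txn_next_action() are hypothetical names.

```c
#include <assert.h>

/* Local mirrors of the RTM status bits from <immintrin.h> (assumption:
 * redefined here only so the sketch compiles on machines without TSX). */
#define SK_XBEGIN_STARTED  (~0u)
#define SK_XABORT_EXPLICIT (1u << 0)  /* aborted via _xabort() */
#define SK_XABORT_RETRY    (1u << 1)  /* hardware hints a retry may succeed */
#define SK_XABORT_CONFLICT (1u << 2)  /* memory conflict with another thread */
#define SK_XABORT_CAPACITY (1u << 3)  /* transaction footprint too large */

/* Hypothetical: what the caller should do after an _xbegin() attempt. */
typedef enum { TXN_RUN, TXN_RETRY, TXN_FALLBACK } TxnAction;

/*
 * Retry only when the hardware sets the RETRY hint and the budget is not
 * exhausted. A status of 0 (abort with no reason reported) and capacity
 * aborts go straight to the fallback path.
 */
TxnAction txn_next_action(unsigned status, int retries_left)
{
    assert(retries_left >= 0);
    if (status == SK_XBEGIN_STARTED) {
        return TXN_RUN;
    }
    if ((status & SK_XABORT_RETRY) && retries_left > 0) {
        return TXN_RETRY;
    }
    return TXN_FALLBACK;
}
```

With a policy like this, counting how often TXN_FALLBACK is returned would
give the abort rate in the sense used above.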

Another thing I found out is that abusing tcg_exec_step (as the patch does
right now) for the fallback path is a bad idea: when there are many failed
transactions, performance drops dramatically (up to a 5x overall slowdown).
It turns out that all this overhead comes from re-translating the code between
ldrex/strex. Would it be possible to cache this step-by-step code? If not, then
an alternative would be to have a way to stop the world *without* leaving
the CPU loop for the calling thread. I'm more comfortable doing the latter
due to my glaring lack of TCG competence.
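
As a rough illustration of that second alternative, the sketch below shows a
generic stop-the-world scheme built on plain pthreads, where the requesting
thread keeps running while every other thread parks at a safe point. None of
this is QEMU code: StopWorld, the stw_*() functions and the demo are all
hypothetical, and a real implementation would have to choose safe points that
match the CPU loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* All names are hypothetical; this is a sketch, not QEMU's API. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int running;          /* threads still taking part */
    int parked;           /* threads blocked at a safe point */
    bool stop_requested;  /* true while one thread owns the world */
} StopWorld;

void stw_init(StopWorld *s, int nthreads)
{
    assert(nthreads > 0);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->running = nthreads;
    s->parked = 0;
    s->stop_requested = false;
}

/* Lock held: block for as long as another thread owns the world. */
static void park_if_stopped(StopWorld *s)
{
    if (s->stop_requested) {
        s->parked++;
        pthread_cond_broadcast(&s->cond);  /* let the owner recount */
        while (s->stop_requested) {
            pthread_cond_wait(&s->cond, &s->lock);
        }
        s->parked--;
    }
}

/* Called by every thread at points where it is safe to be paused. */
void stw_safe_point(StopWorld *s)
{
    pthread_mutex_lock(&s->lock);
    park_if_stopped(s);
    pthread_mutex_unlock(&s->lock);
}

/* Returns once all other running threads are parked; the caller continues. */
void stw_begin(StopWorld *s)
{
    pthread_mutex_lock(&s->lock);
    park_if_stopped(s);                    /* lost a race: wait for our turn */
    s->stop_requested = true;
    while (s->parked < s->running - 1) {
        pthread_cond_wait(&s->cond, &s->lock);
    }
    pthread_mutex_unlock(&s->lock);
}

/* Releases the parked threads. */
void stw_end(StopWorld *s)
{
    pthread_mutex_lock(&s->lock);
    s->stop_requested = false;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

/* A thread that is done must deregister so owners stop waiting for it. */
void stw_thread_exit(StopWorld *s)
{
    pthread_mutex_lock(&s->lock);
    s->running--;
    pthread_cond_broadcast(&s->cond);
    pthread_mutex_unlock(&s->lock);
}

enum { NTHREADS = 4, ITERS = 500 };
static StopWorld world;
static long plain_counter;  /* deliberately not atomic */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        stw_safe_point(&world);
        stw_begin(&world);
        plain_counter++;    /* exclusive: every other thread is parked */
        stw_end(&world);
    }
    stw_thread_exit(&world);
    return NULL;
}

/* Runs the demo and returns the final counter value. */
long stw_demo(void)
{
    pthread_t t[NTHREADS];
    plain_counter = 0;
    stw_init(&world, NTHREADS);
    for (int i = 0; i < NTHREADS; i++) {
        pthread_create(&t[i], NULL, worker, NULL);
    }
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(t[i], NULL);
    }
    return plain_counter;
}
```

The point of the design is that stw_begin() returns to its caller instead of
unwinding it, so the code between ldrex and strex would not need to be
re-translated; the cost is that every other thread must reach a safe point
before the exclusive section can start.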

Thanks,

		Emilio
