From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <rth@twiddle.net>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
mttcg@greensocs.com, qemu-devel@nongnu.org,
fred.konrad@greensocs.com, a.rigo@virtualopensystems.com,
bobby.prani@gmail.com, nikunj@linux.vnet.ibm.com,
mark.burton@greensocs.com, pbonzini@redhat.com,
jan.kiszka@siemens.com, serge.fdrv@gmail.com,
peter.maydell@linaro.org, claudio.fontana@huawei.com,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Peter Crosthwaite" <crosthwaite.peter@gmail.com>
Subject: Re: [Qemu-devel] [PATCH] aarch64: use TSX for ldrex/strex
Date: Wed, 24 Aug 2016 17:12:40 -0400
Message-ID: <20160824211240.GA26546@flamenco>
In-Reply-To: <5b81580e-0b6a-7b30-60a1-3c34548e7997@twiddle.net>
On Thu, Aug 18, 2016 at 08:38:47 -0700, Richard Henderson wrote:
> A couple of other notes, as I've thought about this some more.
Thanks for spending time on this.
I have a new patchset (will send as a reply to this e-mail in a few
minutes) that has good performance. Its main ideas:
- Use transactions that start on ldrex and finish on strex. On
an exception, end (instead of abort) the ongoing transaction,
if any. There's little point in aborting, since the subsequent
retries will end up in the same exception anyway. This means
the translation of the corresponding blocks might happen via
the fallback path. That's OK, given that subsequent executions
of the TBs will (likely) complete via HTM.
- For the fallback path, add a stop-the-world primitive that stops
all other CPUs, without requiring the calling CPU to exit the CPU loop.
Not breaking from the loop keeps the code simple--we can just
keep translating/executing normally, with the guarantee that
no other CPU can run until we're done.
- The fallback path of the transaction stops the world and then
continues execution (from ldrex) as the only running CPU.
- Only retry when the hardware hints that we may do so. This
ends up being rare (I can only get dozens of retries under
heavy contention, for instance with 'atomic_add-bench -r 1').
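Below is a minimal sketch of a stop-the-world primitive where the caller stays in its own loop. It uses plain pthreads rather than QEMU's linux-user CPU loop. All names (`stw_*`, `struct toy_vcpu`, `toy_vcpu_loop`) are hypothetical. The sketch assumes every vCPU thread is registered before the first stop and calls `stw_safe_point()` between TBs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global state; assumes all vCPU threads register up front. */
static struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int nr_threads;     /* registered vCPU threads */
    int nr_parked;      /* threads waiting at a safe point */
    bool requested;     /* some thread is running alone */
} stw = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, false };

static _Thread_local bool stw_is_owner;  /* true while we run alone */

void stw_register_thread(void)
{
    pthread_mutex_lock(&stw.lock);
    stw.nr_threads++;
    pthread_mutex_unlock(&stw.lock);
}

/* Caller holds stw.lock: count as parked until the owner resumes. */
static void stw_park_locked(void)
{
    stw.nr_parked++;
    pthread_cond_broadcast(&stw.cond);   /* owner rechecks its count */
    while (stw.requested) {
        pthread_cond_wait(&stw.cond, &stw.lock);
    }
    stw.nr_parked--;
}

/* Called by every vCPU between TBs. The owner just keeps executing. */
void stw_safe_point(void)
{
    if (stw_is_owner) {
        return;
    }
    pthread_mutex_lock(&stw.lock);
    if (stw.requested) {
        stw_park_locked();
    }
    pthread_mutex_unlock(&stw.lock);
}

/* Stop all other vCPUs without leaving the caller's CPU loop. */
void stw_stop_world(void)
{
    pthread_mutex_lock(&stw.lock);
    assert(!stw_is_owner);
    if (stw.requested) {            /* someone else got there first */
        stw_park_locked();
    }
    stw.requested = true;
    stw_is_owner = true;
    while (stw.nr_parked < stw.nr_threads - 1) {
        pthread_cond_wait(&stw.cond, &stw.lock);
    }
    pthread_mutex_unlock(&stw.lock);
}

void stw_resume_world(void)
{
    pthread_mutex_lock(&stw.lock);
    assert(stw_is_owner && stw.requested);
    stw.requested = false;
    stw_is_owner = false;
    pthread_cond_broadcast(&stw.cond);
    pthread_mutex_unlock(&stw.lock);
}

/* Toy vCPU for trying it out: each loop iteration stands in for one TB. */
struct toy_vcpu {
    atomic_long tbs;
    atomic_bool stop;
};

void *toy_vcpu_loop(void *opaque)
{
    struct toy_vcpu *v = opaque;

    while (!atomic_load(&v->stop)) {
        stw_safe_point();
        atomic_fetch_add(&v->tbs, 1);
    }
    return NULL;
}
```

A thread counts toward `nr_parked` only while it is waiting. So a second thread that calls `stw_stop_world()` during someone else's exclusive section simply parks like any other vCPU. It takes over once the world is resumed.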
Limitations: for now user-mode only, and I have paid no attention
to paired atomics. Also, I'm making no checks for unusual (undefined?)
guest code, such as stray ldrex/strex thrown in there.
Performance optimizations like you suggest (e.g. starting a TB
on ldrex, or using TCG ops for beginning/ending the transaction)
could be implemented, but at least on Intel TSX (the only one I've
tried so far[*]), the transaction buffer seems big enough to not
make these optimizations a necessity.
[*] I tried running HTM primitives on the gcc compile farm's Power8,
but I get an illegal instruction fault on tbegin. I've filed
an issue here to report it: https://gna.org/support/?3369
Some observations:
- The peak number of retries I see is for atomic_add-bench -r 1 -n 16
(on an 8-thread machine) at about 90 retries. So I set the limit
to 100.
- The lowest success rate I've seen is ~98%, again for atomic_add-bench
under high contention.
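The retry rule from the first list has two parts: retry only when the hardware hints it may help, and stop at the 100-retry cap. Together with the stop-the-world fallback, it could be structured roughly as follows. This is a sketch, not the patchset's code. The abort-status constants mirror Intel RTM's `_XBEGIN_STARTED` and `_XABORT_RETRY` from `<immintrin.h>`. By contrast, `struct htm_ops`, `run_guarded()` and the scripted mock are hypothetical. The HTM backend is passed in as function pointers so the logic can be exercised on machines without TSX.

```c
#include <assert.h>

/* Values mirror Intel RTM's _XBEGIN_STARTED and _XABORT_RETRY. */
#define HTM_STARTED      (~0u)
#define HTM_ABORT_RETRY  (1u << 1)
#define HTM_MAX_RETRIES  100      /* the cap chosen in the text */

/* Hypothetical HTM/fallback backend, passed in for testability. */
struct htm_ops {
    unsigned (*begin)(void *opaque);   /* HTM_STARTED or abort status */
    void (*end)(void *opaque);
    void (*stop_world)(void *opaque);
    void (*resume_world)(void *opaque);
    void *opaque;
};

enum htm_outcome { HTM_COMMITTED, HTM_FELL_BACK };

/*
 * Run the ldrex..strex region transactionally; if the abort status has no
 * retry hint, or the cap is reached, rerun it with the world stopped.
 * A real backend would call _xbegin() inline here; the indirection only
 * makes the sketch testable without TSX.
 */
enum htm_outcome run_guarded(const struct htm_ops *ops,
                             void (*region)(void *), void *arg,
                             int *retries_out)
{
    int retries = 0;

    assert(ops && region && retries_out);
    for (;;) {
        unsigned status = ops->begin(ops->opaque);

        if (status == HTM_STARTED) {
            region(arg);
            ops->end(ops->opaque);
            *retries_out = retries;
            return HTM_COMMITTED;
        }
        if (!(status & HTM_ABORT_RETRY) || retries == HTM_MAX_RETRIES) {
            break;
        }
        retries++;
    }
    ops->stop_world(ops->opaque);       /* we are now the only running CPU */
    region(arg);
    ops->resume_world(ops->opaque);
    *retries_out = retries;
    return HTM_FELL_BACK;
}

/* Scripted mock: abort `aborts_left` times with `abort_status`, then start. */
struct mock {
    int aborts_left;
    unsigned abort_status;
    int stw_calls;
};

static unsigned mock_begin(void *o)
{
    struct mock *m = o;

    if (m->aborts_left > 0) {
        m->aborts_left--;
        return m->abort_status;
    }
    return HTM_STARTED;
}
static void mock_end(void *o) { (void)o; }
static void mock_stop(void *o) { ((struct mock *)o)->stw_calls++; }
static void mock_resume(void *o) { (void)o; }
static void bump(void *arg) { (*(int *)arg)++; }
```

Consider an abort without the retry hint, such as a capacity abort, which Intel reports as bit 3. It goes straight to the fallback. An abort that keeps signalling "retry" gives up after `HTM_MAX_RETRIES` retries.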
Some numbers:
- atomic_add's performance is lower with HTM than with cmpxchg, although
under contention the two perform very similarly. The reason for the perf
gap is that xbegin/xend take more cycles than cmpxchg, especially
under little or no contention; this explains the large difference
for threads=1.
http://imgur.com/5kiT027
As a side note, contended transactions seem to scale worse than contended
cmpxchg when exploiting SMT. But anyway I wouldn't read much into
that.
- For more realistic workloads that gap goes away, as the relative impact
of cmpxchg or transaction delays is lower. For QHT, 1000 keys:
http://imgur.com/l6vcowu
And for SPEC (note that despite being single-threaded, SPEC executes
a lot of atomics, e.g. from mutexes and from forking):
http://imgur.com/W49YMhJ
Performance is essentially identical to that of cmpxchg, but of course
with HTM we get correct emulation.
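A simplified model of cmpxchg-based ldrex/strex shows why HTM gives correct emulation where cmpxchg does not. The model is an assumption about how the cmpxchg baseline works, not its actual code. In it, ldrex records the loaded value, and strex succeeds if memory still holds that value. `struct excl_state`, `emu_ldrex()` and `emu_strex()` are hypothetical names, and the sketch uses C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU state for cmpxchg-based LL/SC emulation. */
struct excl_state {
    _Atomic uint64_t *addr;   /* address "monitored" by the last ldrex */
    uint64_t val;             /* value observed by that ldrex */
};

uint64_t emu_ldrex(struct excl_state *s, _Atomic uint64_t *addr)
{
    s->addr = addr;
    s->val = atomic_load(addr);
    return s->val;
}

/* Returns 0 on success and 1 on failure, like the STREX status result. */
int emu_strex(struct excl_state *s, _Atomic uint64_t *addr, uint64_t newval)
{
    uint64_t expected = s->val;
    bool ok = s->addr == addr &&
              atomic_compare_exchange_strong(addr, &expected, newval);

    s->addr = NULL;           /* the emulated monitor is cleared either way */
    return ok ? 0 : 1;
}
```

The weakness is ABA. Suppose cpu0's `emu_ldrex()` reads 5, and another CPU then stores 7 and then 5 again. `emu_strex()` still returns 0. On real hardware, the intervening stores would clear the exclusive monitor and STREX would fail. A transaction spanning ldrex..strex sees those stores as a conflict and aborts, so the HTM path behaves like the hardware.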
Thanks for reading this far!
Emilio