Re: [Qemu-devel] [RFC v3 00/13] Slow-path for atomic instruction translation

From: Frederic Konrad <fred.konrad@greensocs.com>
To: Alvise Rigo <a.rigo@virtualopensystems.com>,
	qemu-devel@nongnu.org, mttcg@listserver.greensocs.com
Cc: alex.bennee@linaro.org, jani.kokkonen@huawei.com,
	tech@virtualopensystems.com, claudio.fontana@huawei.com,
	pbonzini@redhat.com
Subject: Re: [Qemu-devel] [RFC v3 00/13] Slow-path for atomic instruction translation
Date: Fri, 10 Jul 2015 10:39:30 +0200
Message-ID: <559F84C2.1090109@greensocs.com>
In-Reply-To: <1436516626-8322-1-git-send-email-a.rigo@virtualopensystems.com>

On 10/07/2015 10:23, Alvise Rigo wrote:
> This is the third iteration of the patch series; starting from PATCH 007
> there are the changes to move the whole work to multi-threading.
> Changes versus previous versions are at the bottom of this cover letter.
>
> This patch series provides an infrastructure for atomic
> instruction implementation in QEMU, paving the way for TCG multi-threading.
> The adopted design does not rely on host atomic
> instructions and is intended to propose a 'legacy' solution for
> translating guest atomic instructions.
>
> The underlying idea is to provide new TCG instructions that guarantee
> atomicity to some memory accesses or in general a way to define memory
> transactions. More specifically, a new pair of TCG instructions is
> implemented, qemu_ldlink_i32 and qemu_stcond_i32, that behave as
> LoadLink and StoreConditional primitives (only the 32-bit variant is
> implemented).  In order to achieve this, a new bitmap is added to the
> ram_list structure (always unique) which flags all memory pages that
> could not be accessed directly through the fast-path, due to previous
> exclusive operations. This new bitmap is coupled with a new TLB flag
> which forces the slow-path execution. Any store performed by another vCPU
> to the same (protected) address between a LoadLink and its StoreConditional
> will make that StoreConditional fail.
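To illustrate how a TLB flag can force the slow-path, here is a minimal sketch.
It assumes the flag lives in the low bits of the TLB entry's write address, so
that the fast-path comparison fails whenever the flag is set. The names
SKETCH_TLB_EXCL and sketch_fast_path_hit, the bit position and the page size
are all hypothetical and not taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, chosen only for this sketch. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK ((uint32_t)~0u << SKETCH_PAGE_BITS)
#define SKETCH_TLB_EXCL  (1u << 6)   /* assumed flag bit below the page bits */

/*
 * Fast-path check for a store of 'size' bytes (a power of two).
 * The guest address is masked down to its page, keeping the low
 * (size - 1) bits so that unaligned accesses also miss.
 * Any flag bit set in tlb_addr_write makes the comparison fail,
 * which sends the access to the slow-path.
 */
static bool sketch_fast_path_hit(uint32_t tlb_addr_write,
                                 uint32_t vaddr, unsigned size)
{
    return (vaddr & (SKETCH_PAGE_MASK | (size - 1))) == tlb_addr_write;
}
```

With a clean entry for page 0x4000, an aligned 4-byte store to 0x4010 hits the
fast-path; once SKETCH_TLB_EXCL is ORed into the entry, the same store misses
and has to go through the slow-path helper.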
>
> In theory, the provided implementation of TCG LoadLink/StoreConditional
> can be used to properly handle atomic instructions on any architecture.
>
> The new slow-path is implemented such that:
> - the LoadLink behaves as a normal load slow-path, except for clearing
>    the dirty flag in the bitmap. The TLB entries created from now on will
>    force the slow-path. To ensure this, we flush the TLB cache for the
>    other vCPUs. The vCPU also stores the accessed address in a private
>    variable, in order to make it visible to the other vCPUs
> - the StoreConditional behaves as a normal store slow-path, except for
>    checking whether other vCPUs have set the same exclusive address
>
> All those write accesses that are forced to follow the 'legacy'
> slow-path will set the accessed memory page to dirty.
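The slow-path behaviour described above can be sketched as a single-threaded
model. This is a simplification under stated assumptions, not the code from
the series: all names (sketch_ldlink, sketch_stcond, sketch_store, excl_addr,
page_dirty) are hypothetical, there is no locking or TLB flush, and the exact
StoreConditional failure condition (link lost or page dirty again) is my
assumption. As in the v3 change notes, a page only becomes dirty again when a
store targets the protected address itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes and names, for illustration only. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_NUM_PAGES 16
#define SKETCH_NUM_VCPUS 4
#define SKETCH_NO_EXCL   UINT32_MAX

static uint32_t guest_ram[(SKETCH_NUM_PAGES << SKETCH_PAGE_BITS) / 4];
/* One flag per page: true = dirty (fast-path allowed), false = exclusive. */
static bool page_dirty[SKETCH_NUM_PAGES];
/* Address each vCPU has linked, or SKETCH_NO_EXCL if none. */
static uint32_t excl_addr[SKETCH_NUM_VCPUS];

static void sketch_init(void)
{
    for (int i = 0; i < SKETCH_NUM_PAGES; i++) page_dirty[i] = true;
    for (int i = 0; i < SKETCH_NUM_VCPUS; i++) excl_addr[i] = SKETCH_NO_EXCL;
}

/* LoadLink: a normal load that also marks the page exclusive
 * and records the linked address for this vCPU. */
static uint32_t sketch_ldlink(int cpu, uint32_t addr)
{
    assert(addr % 4 == 0 && (addr >> SKETCH_PAGE_BITS) < SKETCH_NUM_PAGES);
    page_dirty[addr >> SKETCH_PAGE_BITS] = false;
    excl_addr[cpu] = addr;
    /* A real implementation would flush the other vCPUs' TLBs here. */
    return guest_ram[addr / 4];
}

/* Plain store through the slow-path: if it targets a protected
 * address, break the link(s) and set the page dirty again. */
static void sketch_store(uint32_t addr, uint32_t val)
{
    guest_ram[addr / 4] = val;
    for (int i = 0; i < SKETCH_NUM_VCPUS; i++) {
        if (excl_addr[i] == addr) {
            excl_addr[i] = SKETCH_NO_EXCL;
            page_dirty[addr >> SKETCH_PAGE_BITS] = true;
        }
    }
}

/* StoreConditional: returns 0 on success, 1 on failure
 * (the convention used by ARM strex). */
static int sketch_stcond(int cpu, uint32_t addr, uint32_t val)
{
    bool ok = excl_addr[cpu] == addr &&
              !page_dirty[addr >> SKETCH_PAGE_BITS];
    excl_addr[cpu] = SKETCH_NO_EXCL;
    if (!ok) {
        return 1;
    }
    guest_ram[addr / 4] = val;
    /* Other vCPUs linked to the same address lose their link. */
    for (int i = 0; i < SKETCH_NUM_VCPUS; i++) {
        if (excl_addr[i] == addr) {
            excl_addr[i] = SKETCH_NO_EXCL;
        }
    }
    return 0;
}
```

In this model, an uninterrupted sketch_ldlink/sketch_stcond pair succeeds,
while a sketch_store to the linked address in between makes the
StoreConditional fail. A store to a different address in the same page leaves
the link intact.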
>
> In this series only the ARM ldrex/strex instructions are implemented
> for ARM and i386 hosts.
> The code has been tested with bare-metal test cases and by booting Linux,
> using the latest mttcg QEMU branch available at
> http://git.greensocs.com/fkonrad/mttcg.git.
That is the multi_tcg_v6 branch at this time.

>
> * Performance considerations
> This implementation shows good results while booting a Linux kernel,
> where tons of flushes affect the overall performance. A complete ARM
> Linux boot, without any filesystem, takes 30% longer than with the
> mttcg implementation, but in exchange offers the infrastructure to
> handle atomic instructions on any architecture.
> Compared to the current TCG upstream, instead, it is 40% faster with four
> vCPUs and 2.1 times faster with 8 vCPUs.
> In addition, there is still margin to improve such performance, since at
> the moment the TLB is flushed quite often, probably more than required.
>
> On the other hand, the test case
> https://git.virtualopensystems.com/dev/tcg_baremetal_tests.git
> that heavily stresses the LL/SC mechanism but not so much the TLB-related
> part, performs up to 1.9 times faster with 8 cores and one million iterations
> compared with the mttcg implementation.
>
> Changes from v2:
> - the bitmap accessors are now atomic
> - a rendezvous between vCPUs and a simple callback support before executing
>    a TB have been added to handle the TLB flush support
Isn't that exactly what my async_safe_work is supposed to do?

> - the softmmu_template and softmmu_llsc_template have been adapted to work
>    on real multi-threading
>
> Changes from v1:
> - The ram bitmap is not reversed anymore, 1 = dirty, 0 = exclusive
> - The way the offset to access the bitmap is calculated has
>    been improved and fixed
> - A page to be set as dirty requires a vCPU to target the protected address
>    and not just an address in the page
> - Addressed comments from Richard Henderson to improve the logic in
>    softmmu_template.h and to simplify the methods generation through
>    softmmu_llsc_template.h
> - Added initial implementation of qemu_{ldlink,stcond}_i32 for tcg/i386
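For reference, atomic accessors for a bitmap using the "1 = dirty,
0 = exclusive" convention from the change notes above could look like the
sketch below. It uses C11 <stdatomic.h> rather than QEMU's own atomic helpers,
and the names (excl_bitmap, sketch_set_dirty, sketch_set_exclusive,
sketch_is_exclusive) and the bitmap size are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size and names, for illustration only. */
#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_NUM_PAGES     256

/* One bit per page: 1 = dirty, 0 = exclusive.
 * Static storage is zero-initialised, so every page starts exclusive here. */
static _Atomic unsigned long
    excl_bitmap[SKETCH_NUM_PAGES / SKETCH_BITS_PER_WORD + 1];

/* Atomically set the page's bit; returns true if it was already dirty. */
static bool sketch_set_dirty(size_t page)
{
    unsigned long mask = 1UL << (page % SKETCH_BITS_PER_WORD);
    unsigned long old =
        atomic_fetch_or(&excl_bitmap[page / SKETCH_BITS_PER_WORD], mask);
    return (old & mask) != 0;
}

/* Atomically clear the page's bit; returns true if it was dirty before. */
static bool sketch_set_exclusive(size_t page)
{
    unsigned long mask = 1UL << (page % SKETCH_BITS_PER_WORD);
    unsigned long old =
        atomic_fetch_and(&excl_bitmap[page / SKETCH_BITS_PER_WORD], ~mask);
    return (old & mask) != 0;
}

/* Atomically read the page's bit. */
static bool sketch_is_exclusive(size_t page)
{
    unsigned long mask = 1UL << (page % SKETCH_BITS_PER_WORD);
    unsigned long word =
        atomic_load(&excl_bitmap[page / SKETCH_BITS_PER_WORD]);
    return (word & mask) == 0;
}
```

Returning the previous state from the read-modify-write helpers lets a caller
tell whether it was the one that changed the bit, which is useful when several
vCPU threads race on the same page.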
>
> This work has been sponsored by Huawei Technologies Duesseldorf GmbH.
>
> Alvise Rigo (13):
>    exec: Add new exclusive bitmap to ram_list
>    cputlb: Add new TLB_EXCL flag
>    softmmu: Add helpers for a new slow-path
>    tcg-op: create new TCG qemu_ldlink and qemu_stcond instructions
>    target-arm: translate: implement qemu_ldlink and qemu_stcond ops
>    target-i386: translate: implement qemu_ldlink and qemu_stcond ops
>    ram_addr.h: Make exclusive bitmap accessors atomic
>    exec.c: introduce a simple rendezvous support
>    cpus.c: introduce simple callback support
>    Simple TLB flush wrap to use as exit callback
>    Introduce exit_flush_req and tcg_excl_access_lock
>    softmmu_llsc_template.h: move to multithreading
>    softmmu_template.h: move to multithreading
>
>   cpus.c                  |  39 ++++++++
>   cputlb.c                |  33 +++++-
>   exec.c                  |  46 +++++++++
>   include/exec/cpu-all.h  |   2 +
>   include/exec/cpu-defs.h |   8 ++
>   include/exec/memory.h   |   3 +-
>   include/exec/ram_addr.h |  22 ++++
>   include/qom/cpu.h       |  37 +++++++
>   softmmu_llsc_template.h | 184 ++++++++++++++++++++++++++++++++++
>   softmmu_template.h      | 261 +++++++++++++++++++++++++++++++++++-------------
>   target-arm/translate.c  |  87 +++++++++++++++-
>   tcg/arm/tcg-target.c    | 121 ++++++++++++++++------
>   tcg/i386/tcg-target.c   | 136 +++++++++++++++++++++----
>   tcg/tcg-be-ldst.h       |   1 +
>   tcg/tcg-op.c            |  23 +++++
>   tcg/tcg-op.h            |   3 +
>   tcg/tcg-opc.h           |   4 +
>   tcg/tcg.c               |   2 +
>   tcg/tcg.h               |  20 ++++
>   19 files changed, 910 insertions(+), 122 deletions(-)
>   create mode 100644 softmmu_llsc_template.h
>
