From: "Alex Bennée" <alex.bennee@linaro.org>
To: Alvise Rigo <a.rigo@virtualopensystems.com>
Cc: mttcg@listserver.greensocs.com, claudio.fontana@huawei.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com,
	jani.kokkonen@huawei.com, tech@virtualopensystems.com,
	rth@twiddle.net
Subject: Re: [Qemu-devel] [RFC v6 00/14] Slow-path for atomic instruction translation
Date: Thu, 17 Dec 2015 16:06:39 +0000
Message-ID: <87si31f4a8.fsf@linaro.org>
In-Reply-To: <1450082498-27109-1-git-send-email-a.rigo@virtualopensystems.com>


Alvise Rigo <a.rigo@virtualopensystems.com> writes:

> This is the sixth iteration of the patch series which applies to the
> upstream branch of QEMU (v2.5.0-rc3).
>
> Changes versus previous versions are at the bottom of this cover letter.
>
> The code is also available at the following repository:
> https://git.virtualopensystems.com/dev/qemu-mt.git
> branch:
> slowpath-for-atomic-v6-no-mttcg

I'm starting to look through this now. However, one problem that
immediately comes up is the aarch64 breakage. Because so much of the arm
and aarch64 code is intrinsically linked, the series breaks the other
targets.

You could fix this by ensuring that CONFIG_TCG_USE_LDST_EXCL doesn't get
passed to the aarch64 build (tricky, as aarch64-softmmu.mak includes
arm-softmmu.mak), or bite the bullet now and add the 64-bit helpers that
will be needed to convert the aarch64 exclusive equivalents.

>
> This patch series provides an infrastructure for atomic instruction
> implementation in QEMU, thus offering a 'legacy' solution for
> translating guest atomic instructions. Moreover, it can be considered as
> a first step toward a multi-thread TCG.
>
> The underlying idea is to provide new TCG helpers (sort of softmmu
> helpers) that guarantee atomicity to some memory accesses or in general
> a way to define memory transactions.
>
> More specifically, the new softmmu helpers behave as LoadLink and
> StoreConditional instructions, and are called from TCG code by means of
> target specific helpers. This work includes the implementation for all
> the ARM atomic instructions, see target-arm/op_helper.c.
>
> The implementation heavily uses the software TLB together with a new
> bitmap that has been added to the ram_list structure which flags, on a
> per-CPU basis, all the memory pages that are in the middle of a LoadLink
> (LL), StoreConditional (SC) operation.  Since all these pages can be
> accessed directly through the fast-path and alter a vCPU's linked value,
> the new bitmap has been coupled with a new TLB flag for the TLB virtual
> address which forces the slow-path execution for all the accesses to a
> page containing a linked address.
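
For readers following along, here is a minimal standalone sketch of the
bitmap idea described above. It is not code from the series: the sketch_*
names, the 4 KiB page size, the 16-page array and the flag value are all
assumptions made for illustration. It keeps one byte per page with one
bit per vCPU (1 = dirty, 0 = exclusive, as in the changelog) and reports
an EXCL-style flag whenever any vCPU has its bit cleared for the page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_BITS 12          /* assumed 4 KiB pages */
#define SKETCH_NR_PAGES  16          /* tiny address space for the sketch */
#define SKETCH_TLB_EXCL  (1u << 0)   /* hypothetical flag value */

/* One byte per page, one bit per vCPU: 1 = dirty, 0 = exclusive. */
static uint8_t sketch_bitmap[SKETCH_NR_PAGES];

/* Start with every page dirty (no exclusive access in flight). */
static void sketch_bitmap_reset(void)
{
    memset(sketch_bitmap, 0xff, sizeof(sketch_bitmap));
}

static unsigned sketch_page(uint64_t addr)
{
    uint64_t page = addr >> SKETCH_PAGE_BITS;
    assert(page < SKETCH_NR_PAGES);
    return (unsigned)page;
}

/* An LL by vCPU 'cpu' clears that vCPU's dirty bit for the page. */
static void sketch_set_exclusive(uint64_t addr, unsigned cpu)
{
    assert(cpu < 8);
    sketch_bitmap[sketch_page(addr)] &= (uint8_t)~(1u << cpu);
}

/* Restoring the page as non-exclusive sets the bit again. */
static void sketch_set_dirty(uint64_t addr, unsigned cpu)
{
    assert(cpu < 8);
    sketch_bitmap[sketch_page(addr)] |= (uint8_t)(1u << cpu);
}

/*
 * Flags for a TLB entry being generated for 'addr': if any vCPU has its
 * bit cleared, force the slow path for the whole page.
 */
static unsigned sketch_tlb_flags(uint64_t addr)
{
    return sketch_bitmap[sketch_page(addr)] != 0xff ? SKETCH_TLB_EXCL : 0;
}
```

The real series also has to flush the other vCPUs' TLBs so that stale
fast-path entries cannot bypass the flag; the sketch leaves that out.
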
>
> The new slow-path is implemented such that:
> - the LL behaves as a normal load slow-path, except for clearing the
>   dirty flag in the bitmap.  While generating a TLB entry, the cputlb.c
>   code checks whether at least one vCPU has the bit cleared in the
>   exclusive bitmap; in that case the TLB entry will have the EXCL
>   flag set, thus forcing the slow-path.  In order to ensure that all the
>   vCPUs will follow the slow-path for that page, we flush the TLB cache
>   of all the other vCPUs.
>
>   The LL will also set the linked address and size of the access in a
>   vCPU's private variable. After the corresponding SC, this address will
>   be set to a reset value.
>
> - the SC can fail, returning 1, or succeed, returning 0.  It must
>   always come after an LL and must access the same address 'linked' by
>   the previous LL, otherwise it will fail. If in the time window delimited
>   by a legit pair of LL/SC operations another write access happens to
>   the linked address, the SC will fail.
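
The LL/SC rules in the list above can be modelled in a few lines. This is
a single-threaded sketch under stated assumptions, not the helpers from
the series: the model_* names, the two-vCPU array, the 64-byte memory and
the reset marker are hypothetical, and for simplicity any ordinary store
breaks every overlapping link, including the writer's own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_RESET_ADDR UINT64_MAX  /* hypothetical "no link" marker */
#define MODEL_NR_CPUS    2
#define MODEL_MEM_SIZE   64

typedef struct {
    uint64_t excl_addr;  /* address linked by the last LL, or reset value */
    unsigned excl_size;  /* size in bytes of the linked access */
} ModelCPU;

static ModelCPU model_cpus[MODEL_NR_CPUS] = {
    { MODEL_RESET_ADDR, 0 }, { MODEL_RESET_ADDR, 0 },
};
static uint8_t model_mem[MODEL_MEM_SIZE];

/* Does [addr, addr + size) overlap the range linked by this vCPU? */
static int model_overlaps(const ModelCPU *c, uint64_t addr, unsigned size)
{
    return c->excl_addr != MODEL_RESET_ADDR &&
           addr < c->excl_addr + c->excl_size &&
           c->excl_addr < addr + size;
}

/* Ordinary store: breaks any link covering the written bytes. */
static void model_store32(uint64_t addr, uint32_t val)
{
    assert(addr + sizeof(val) <= MODEL_MEM_SIZE);
    for (int i = 0; i < MODEL_NR_CPUS; i++) {
        if (model_overlaps(&model_cpus[i], addr, sizeof(val))) {
            model_cpus[i].excl_addr = MODEL_RESET_ADDR;
        }
    }
    memcpy(&model_mem[addr], &val, sizeof(val));
}

/* LL: a normal load that also records the linked address and size. */
static uint32_t model_ll32(int cpu, uint64_t addr)
{
    uint32_t val;
    assert(addr + sizeof(val) <= MODEL_MEM_SIZE);
    model_cpus[cpu].excl_addr = addr;
    model_cpus[cpu].excl_size = sizeof(val);
    memcpy(&val, &model_mem[addr], sizeof(val));
    return val;
}

/* SC: returns 0 on success and 1 on failure, as in the cover letter. */
static int model_sc32(int cpu, uint64_t addr, uint32_t val)
{
    int fail = model_cpus[cpu].excl_addr != addr ||
               model_cpus[cpu].excl_size != sizeof(val);
    if (!fail) {
        model_store32(addr, val);  /* also breaks other vCPUs' links */
    }
    model_cpus[cpu].excl_addr = MODEL_RESET_ADDR;  /* consumed either way */
    return fail;
}
```

An SC with no preceding LL fails, an undisturbed LL/SC pair succeeds, and
a store to the linked address between the two makes the SC fail.
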
>
> In theory, the provided implementation of TCG LoadLink/StoreConditional
> can be used to properly handle atomic instructions on any architecture.
>
> The code has been tested with bare-metal test cases and by booting Linux.
>
> * Performance considerations
> The new slow-path adds some overhead to the translation of the ARM
> atomic instructions, since their emulation no longer happens only
> in the guest (by means of pure TCG generated code), but requires the
> execution of two helper functions. Despite this, the additional time
> required to boot an ARM Linux kernel on an i7 clocked at 2.5GHz is
> negligible.
> By contrast, in an LL/SC-bound test scenario, such as
> https://git.virtualopensystems.com/dev/tcg_baremetal_tests.git, this
> solution requires 30% (1 million iterations) and 70% (10 million
> iterations) of additional time for the test to complete.
>
> Changes from v5:
> - The exclusive memory region is now set through a CPUClass hook,
>   allowing any architecture to decide the memory area that will be
>   protected during a LL/SC operation [PATCH 3]
> - The runtime helpers dropped any target dependency and are now in a
>   common file [PATCH 5]
> - Improved the way we restore a guest page as non-exclusive [PATCH 9]
> - Included MMIO memory as a possible target of LL/SC
>   instructions. This also required somewhat simplifying the
>   helper_*_st_name helpers in softmmu_template.h [PATCH 8-14]
>
> Changes from v4:
> - Reworked the exclusive bitmap to be of fixed size (8 bits per address)
> - The slow-path is now TCG backend independent, no need to touch
>   tcg/* anymore as suggested by Aurelien Jarno.
>
> Changes from v3:
> - based on upstream QEMU
> - addressed comments from Alex Bennée
> - the slow path can be enabled by the user with:
>   ./configure --enable-tcg-ldst-excl only if the backend supports it
> - all the ARM ldex/stex instructions now make use of the slow path
> - added aarch64 TCG backend support
> - part of the code has been rewritten
>
> Changes from v2:
> - the bitmap accessors are now atomic
> - a rendezvous between vCPUs and a simple callback support before executing
>   a TB have been added to handle the TLB flush support
> - the softmmu_template and softmmu_llsc_template have been adapted to work
>   on real multi-threading
>
> Changes from v1:
> - The ram bitmap is not reversed anymore, 1 = dirty, 0 = exclusive
> - The way the offset to access the bitmap is calculated has
>   been improved and fixed
> - A page to be set as dirty requires a vCPU to target the protected address
>   and not just an address in the page
> - Addressed comments from Richard Henderson to improve the logic in
>   softmmu_template.h and to simplify the methods generation through
>   softmmu_llsc_template.h
> - Added initial implementation of qemu_{ldlink,stcond}_i32 for tcg/i386
>
> This work has been sponsored by Huawei Technologies Duesseldorf GmbH.
>
> Alvise Rigo (14):
>   exec.c: Add new exclusive bitmap to ram_list
>   softmmu: Add new TLB_EXCL flag
>   Add CPUClass hook to set exclusive range
>   softmmu: Add helpers for a new slowpath
>   tcg: Create new runtime helpers for excl accesses
>   configure: Use slow-path for atomic only when the softmmu is enabled
>   target-arm: translate: Use ld/st excl for atomic insns
>   target-arm: Add atomic_clear helper for CLREX insn
>   softmmu: Add history of excl accesses
>   softmmu: Simplify helper_*_st_name, wrap unaligned code
>   softmmu: Simplify helper_*_st_name, wrap MMIO code
>   softmmu: Simplify helper_*_st_name, wrap RAM code
>   softmmu: Include MMIO/invalid exclusive accesses
>   softmmu: Protect MMIO exclusive range
>
>  Makefile.target             |   2 +-
>  configure                   |   4 +
>  cputlb.c                    |  67 ++++++++-
>  exec.c                      |   8 +-
>  include/exec/cpu-all.h      |   8 ++
>  include/exec/cpu-defs.h     |   1 +
>  include/exec/helper-gen.h   |   1 +
>  include/exec/helper-proto.h |   1 +
>  include/exec/helper-tcg.h   |   1 +
>  include/exec/memory.h       |   4 +-
>  include/exec/ram_addr.h     |  76 ++++++++++
>  include/qom/cpu.h           |  21 +++
>  qom/cpu.c                   |   7 +
>  softmmu_llsc_template.h     | 144 +++++++++++++++++++
>  softmmu_template.h          | 338 +++++++++++++++++++++++++++++++++-----------
>  target-arm/helper.h         |   2 +
>  target-arm/op_helper.c      |   6 +
>  target-arm/translate.c      | 102 ++++++++++++-
>  tcg-llsc-helper.c           | 109 ++++++++++++++
>  tcg-llsc-helper.h           |  35 +++++
>  tcg/tcg-llsc-gen-helper.h   |  32 +++++
>  tcg/tcg.h                   |  31 ++++
>  22 files changed, 909 insertions(+), 91 deletions(-)
>  create mode 100644 softmmu_llsc_template.h
>  create mode 100644 tcg-llsc-helper.c
>  create mode 100644 tcg-llsc-helper.h
>  create mode 100644 tcg/tcg-llsc-gen-helper.h


--
Alex Bennée
