Re: Support for x86_64 on aarch64 emulation

qemu-devel.nongnu.org archive mirror

From: Richard Henderson <richard.henderson@linaro.org>
To: Redha Gouicem <gouicem@in.tum.de>, qemu-devel@nongnu.org
Cc: d.g.sprokholt@tudelft.nl
Subject: Re: Support for x86_64 on aarch64 emulation
Date: Fri, 8 Apr 2022 08:27:27 -0700
Message-ID: <0567b8bc-b327-9601-9acf-75711b77a1ef@linaro.org>
In-Reply-To: <648878dc-67c6-d919-c2a0-b7c6c5d613e2@in.tum.de>

On 4/8/22 05:21, Redha Gouicem wrote:
> We are working on support for x86_64 emulation on aarch64, mainly
> related to memory ordering issues. We first wanted to know what the
> community thinks about our proposal and how likely it is to be merged
> one day.
> 
> Note that we worked with qemu-user, so there may be issues in system
> mode that we missed.
> 
> # Problem
> 
> When generating the TCG instructions for memory accesses, fences are
> always inserted *before* the access, following this translation rule:
> 
>      x86   -->     TCG     -->    aarch64
>      -------------------------------------
>      RMOV  -->  Fm_ld; ld  -->  DMBLD; LDR
>      WMOV  -->  Fm_st; st  -->  DMBFF; STR
> 
> Here, Fm_ld is a fence that orders any preceding memory access with
> the subsequent load, and Fm_st is a fence that orders any preceding
> memory access with the subsequent store. This means that, in TCG, all
> memory accesses are ordered by fences, so no memory accesses can be
> re-ordered in TCG. This is a problem because it is *stricter than
> x86*. Consider a program that contains:
> 
>      WMOV; RMOV
> 
> x86 allows re-ordering independent store-load pairs, so the above pair
> can safely re-order on an x86 host. However, with QEMU's current
> translation, it becomes:
> 
>      DMBFF; STR; DMBLD; LDR
> 
> In this target aarch64 code, no re-ordering is possible. Hence, QEMU
> enforces a stronger model than x86. While that is correct, it harms
> performance.
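A minimal sketch of the current fence-before rule as a table-driven emitter, for illustration only (it is not QEMU code). `X86Access` and `map_fence_before` are hypothetical names, and `DMBLD`/`DMBFF` are the shorthand used above, presumably standing for `DMB ISHLD` and a full `DMB ISH`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical guest access kinds: x86 load (RMOV) and store (WMOV). */
typedef enum { RMOV, WMOV } X86Access;

/*
 * Current fence-before rule, using the shorthand from the text:
 *   RMOV -> Fm_ld; ld -> DMBLD; LDR
 *   WMOV -> Fm_st; st -> DMBFF; STR
 * Writes the AArch64 sequence for the whole program into buf,
 * joining the per-access sequences with "; ".
 */
static void map_fence_before(const X86Access *prog, size_t n,
                             char *buf, size_t cap)
{
    assert(buf != NULL && cap > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* The fence always comes before the access. */
        const char *seq = (prog[i] == RMOV) ? "DMBLD; LDR" : "DMBFF; STR";
        if (i > 0)
            strncat(buf, "; ", cap - strlen(buf) - 1);
        strncat(buf, seq, cap - strlen(buf) - 1);
    }
}
```

For the program `{ WMOV, RMOV }` it produces `DMBFF; STR; DMBLD; LDR`, the sequence shown above.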
> 
> # Solution
> 
> We propose an alternative scheme, which we formally proved correct
> (paper under review):
> 
>      x86   -->      TCG    -->    aarch64
>      -------------------------------------
>      RMOV  -->  ld; Fld_m  -->  LDR; DMBLD
>      WMOV  -->  Fst_st; st -->  DMBST; STR
> 
> This new scheme precisely captures the observable behaviors of the
> input program (in x86's memory model), and the inserted fences
> preserve these behaviors in the resulting TCG and aarch64 programs
> (formally verified). Note that this scheme enforces fewer orderings
> than the previous (unnecessarily strong) mapping scheme, which
> benefits performance. We evaluated it on the PARSEC benchmarks and
> got up to 19.7% improvement (6.7% on average).
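The claim that the new mapping keeps exactly the x86 orderings can be spot-checked with a deliberately simplified model: a pair of accesses counts as ordered if some fence between them covers that pair's kind (load-load, load-store, store-store, store-load), and x86-TSO is taken to order every pair except a store followed by a later load. This sketch ignores dependencies and the other axioms a real formal model has. All names (`translate_proposed`, `ORD_*`, `proposed_matches_tso`) are hypothetical; the bit layout only mimics the spirit of QEMU's `TCG_MO_*` flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ordering bits, named A_B for "A before B in program order". */
enum { ORD_LD_LD = 1, ORD_ST_LD = 2, ORD_LD_ST = 4, ORD_ST_ST = 8 };

typedef enum { OP_LD, OP_ST, OP_FENCE } OpKind;
typedef struct { OpKind kind; unsigned mask; } Op; /* mask: fences only */

#define MAX_OPS 64

/* Proposed scheme: RMOV -> ld; Fld_m    WMOV -> Fst_st; st */
static size_t translate_proposed(const bool *is_store, size_t n, Op *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        assert(k + 2 <= MAX_OPS);
        if (is_store[i]) {
            out[k++] = (Op){ OP_FENCE, ORD_ST_ST };             /* Fst_st */
            out[k++] = (Op){ OP_ST, 0 };
        } else {
            out[k++] = (Op){ OP_LD, 0 };
            out[k++] = (Op){ OP_FENCE, ORD_LD_LD | ORD_LD_ST }; /* Fld_m */
        }
    }
    return k;
}

/* Ordering bit needed for an (earlier, later) pair of accesses. */
static unsigned pair_bit(OpKind a, OpKind b)
{
    if (a == OP_LD)
        return b == OP_LD ? ORD_LD_LD : ORD_LD_ST;
    return b == OP_LD ? ORD_ST_LD : ORD_ST_ST;
}

/* True if a fence strictly between ops[i] and ops[j] covers the pair. */
static bool ordered(const Op *ops, size_t i, size_t j)
{
    unsigned need = pair_bit(ops[i].kind, ops[j].kind);
    for (size_t f = i + 1; f < j; f++)
        if (ops[f].kind == OP_FENCE && (ops[f].mask & need))
            return true;
    return false;
}

/* Simplified x86-TSO: every pair stays ordered except store -> load. */
static bool tso_ordered(bool earlier_is_store, bool later_is_store)
{
    return !(earlier_is_store && !later_is_store);
}

/* Compare every access pair of the translated program with x86-TSO. */
static bool proposed_matches_tso(const bool *is_store, size_t n)
{
    Op ops[MAX_OPS];
    size_t pos[MAX_OPS / 2]; /* pos[i] = index of the i-th access in ops */
    size_t m, a = 0;

    assert(n <= MAX_OPS / 2);
    m = translate_proposed(is_store, n, ops);
    for (size_t k = 0; k < m; k++)
        if (ops[k].kind != OP_FENCE)
            pos[a++] = k;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (ordered(ops, pos[i], pos[j]) !=
                tso_ordered(is_store[i], is_store[j]))
                return false;
    return true;
}
```

For `WMOV; RMOV` the translated sequence is `Fst_st; st; ld; Fld_m`: no fence separates the store from the load, which is exactly the re-ordering x86 permits, while load-load, load-store and store-store pairs remain fenced.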
> 
> # Implementation Considerations
>
> Different (source and host) architectures may require different
> mapping schemes. Some schemes place fences before an instruction,
> while others place them after, so the implementation of fence
> placement should be flexible enough to allow either. Note, though,
> that write-read pairs are unordered in almost all architectures.
>
> We see two ways of doing this:
> - extracting the placement of the fence from the
>    tcg_gen_qemu_ld/st_i32/i64 functions, and having each architecture
>    explicitly generate the fence at the correct place;
> - adding two parameters to these functions specifying the strength of
>    the "before" and "after" fences. The function would then generate
>    both fences in the IR (one of them may be a NOP fence), which would
>    in turn be translated to host instructions.
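A rough sketch of the second option, in which the mapping scheme becomes data rather than code. Everything here is hypothetical (`FenceScheme`, `gen_guest_access`, the `ORD_*` bits and the tiny IR); it is not the real `tcg_gen_qemu_ld/st_*` interface. A zero mask plays the role of the NOP fence and is simply not emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Ordering bits, A_B = "A before B" (local values for this sketch). */
enum { ORD_LD_LD = 1, ORD_ST_LD = 2, ORD_LD_ST = 4, ORD_ST_ST = 8 };

typedef enum { ACC_LD, ACC_ST } AccKind;
typedef enum { IR_FENCE, IR_LD, IR_ST } IrKind;
typedef struct { IrKind kind; unsigned mask; } IrOp;

/* Fence strength before and after one access kind. */
typedef struct { unsigned before, after; } FencePair;
/* One scheme per (guest, host) combination. */
typedef struct { FencePair ld, st; } FenceScheme;

/* Current rule: Fm_ld / Fm_st before the access, nothing after. */
static const FenceScheme scheme_fence_before = {
    .ld = { ORD_LD_LD | ORD_ST_LD, 0 },
    .st = { ORD_LD_ST | ORD_ST_ST, 0 },
};

/* Proposed x86 rule: ld; Fld_m  and  Fst_st; st. */
static const FenceScheme scheme_x86_proposed = {
    .ld = { 0, ORD_LD_LD | ORD_LD_ST },
    .st = { ORD_ST_ST, 0 },
};

typedef struct { IrOp ops[64]; size_t len; } IrBuf;

static void emit(IrBuf *b, IrKind kind, unsigned mask)
{
    assert(b->len < sizeof b->ops / sizeof b->ops[0]);
    b->ops[b->len++] = (IrOp){ kind, mask };
}

/* Emit the "before" fence, the access, then the "after" fence,
 * skipping any fence whose mask is zero (the NOP fence). */
static void gen_guest_access(IrBuf *b, const FenceScheme *s, AccKind k)
{
    const FencePair *fp = (k == ACC_LD) ? &s->ld : &s->st;
    if (fp->before)
        emit(b, IR_FENCE, fp->before);
    emit(b, k == ACC_LD ? IR_LD : IR_ST, 0);
    if (fp->after)
        emit(b, IR_FENCE, fp->after);
}
```

A host backend would still lower each remaining mask to its cheapest barrier; on AArch64, for example, `LD_LD | LD_ST` fits `DMB ISHLD` and `ST_ST` fits `DMB ISHST`.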

This has been on my to-do list for quite some time.  My previous work was

https://patchew.org/QEMU/20210316220735.2048137-1-richard.henderson@linaro.org/

I have some further work (possibly not posted?  I can't find a reference) which attempted 
to strength-reduce the barriers and to use load-acquire/store-release insns when
alignment of the operation allows.  Unfortunately, for the interesting cases in question 
(x86 and s390x guests, with the strongest guest memory models), it was rare that we could 
prove the alignment was sufficient, so it was a fair amount of work being done for no gain.
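The alignment condition described above boils down to a per-access choice between acquire/release instructions and a plain access plus barrier. A minimal sketch, assuming LDAR/STLR require natural alignment (i.e. not relying on FEAT_LSE2) and that the translator tracks a provable alignment for each guest address; `choose_sequence`, `AccessSeq` and `known_align` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SEQ_ACQ_REL, SEQ_PLAIN_PLUS_DMB } AccessSeq;

/* True if x is a nonzero power of two. */
static bool is_pow2(unsigned x) { return x != 0 && (x & (x - 1)) == 0; }

/*
 * size:        access size in bytes (1, 2, 4 or 8).
 * known_align: alignment the translator can prove for the guest
 *              address (1 when nothing is known).
 * Acquire/release is chosen only when natural alignment is proven;
 * otherwise fall back to a plain LDR/STR with DMB barriers.
 */
static AccessSeq choose_sequence(unsigned size, unsigned known_align)
{
    assert(is_pow2(size) && size <= 8);
    assert(is_pow2(known_align));
    /* Byte accesses always qualify, since known_align >= 1. */
    return known_align >= size ? SEQ_ACQ_REL : SEQ_PLAIN_PLUS_DMB;
}
```

Since x86 code does not have to align its accesses, `known_align` will often stay at 1 for multi-byte accesses, so the fallback is taken.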


r~

Thread overview: 3+ messages
2022-04-08 12:21 Support for x86_64 on aarch64 emulation Redha Gouicem
2022-04-08 15:27 ` Richard Henderson [this message]
2022-04-14 12:24   ` Redha Gouicem
