From: "Alex Bennée" <alex.bennee@linaro.org>
To: Kirill Batuzov <batuzovk@ispras.ru>
Cc: qemu-devel@nongnu.org, Peter Maydell <peter.maydell@linaro.org>,
	Peter Crosthwaite <crosthwaite.peter@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH 00/18] Emulate guest vector operations with host vector operations
Date: Fri, 27 Jan 2017 14:55:39 +0000
Message-ID: <87r33o8sd0.fsf@linaro.org>
In-Reply-To: <1484644078-21312-1-git-send-email-batuzovk@ispras.ru>


Kirill Batuzov <batuzovk@ispras.ru> writes:

> The goal of this patch series is to set up an infrastructure to emulate
> guest vector operations using host vector operations. Preliminary
> experiments show that simply translating loads and stores increases
> performance of the x264 video codec by 10%. The performance of a
> gcc-vectorized for loop increased 2x.
>
> To be able to emulate guest vector operations using host vector operations,
> several things need to be done.

I see rth has already done a bunch of review so I'll pass on this cycle
but please feel free to add me to the CC list next iteration.

>
> 1. Corresponding vector types should be added to TCG. This series adds
> TCG_v128 and TCG_v64. I've made TCG_v64 a different type than TCG_i64
> because it usually needs to be allocated to different registers and
> supports different operations.
>
> 2. Load/store operations for these new types need to be implemented.
>
> 3. For a seamless transition from the current model to the new one we need
> to handle cases where memory occupied by a global variable can be accessed
> via a pointer to the CPUArchState structure. A very simple conservative alias
> analysis has been added to do this. The analysis tracks memory loads and
> stores that overlap with fields of CPUArchState and provides this
> information to the register allocator. The allocator then spills and
> reloads affected globals when needed.
>
> 4. Allow overlapping globals. For scalar registers this is a rare case, and
> overlapping registers can be handled as a single one (ah, al, ax, eax,
> rax). In ARM every Q-register consists of two D-registers, each consisting of
> two S-registers. Handling 4 S-registers as one because they are parts of
> the same Q-register is way too inefficient.
>
> 5. Add a new memory addressing mode to the MMU code for large accesses and
> create the needed helpers. Only 128-bit vectors have been handled for now.
>
> 6. Create TCG opcodes for vector operations. Only addition has been handled
> in this series. Each operation has a wrapper that checks whether the backend
> supports the corresponding operation. If it does, the vector opcode
> is generated; otherwise the operation is emulated with scalar
> operations. The emulation code is generated inline for performance reasons
> (there is a huge performance difference between inline generation
> and calling a helper). As a positive side effect this will eventually allow
> similar emulation code for vector instructions from different
> frontends to be merged into a target-independent implementation.
>
> 7. Use the new operations in the frontend (ARM was used in this series).
>
> 8. Support the new operations in the backend (x86_64 was used in this series).
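
For readers following items 3 and 4, here is a minimal sketch of the kind of
conservative overlap check they describe. It is not code from the series: the
Global struct, the function names, and the byte offsets are all hypothetical,
and the layout only mimics how one ARM Q-register covers two D-registers and
four S-registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a global backed by a field of the CPU state. */
typedef struct {
    const char *name;
    size_t offset;     /* byte offset inside the (hypothetical) state struct */
    size_t size;       /* size of the field in bytes */
    bool needs_sync;   /* set when a memory access may alias this global */
} Global;

/* Half-open ranges [off, off + size) overlap iff each starts before the other ends. */
static bool ranges_overlap(size_t a_off, size_t a_size,
                           size_t b_off, size_t b_size)
{
    return a_off < b_off + b_size && b_off < a_off + a_size;
}

/*
 * Conservatively mark every global whose backing memory overlaps the
 * accessed range. Returns the number of globals that were marked.
 */
static int mark_aliased_globals(Global *globals, size_t count,
                                size_t access_off, size_t access_size)
{
    int marked = 0;

    assert(globals != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (ranges_overlap(globals[i].offset, globals[i].size,
                           access_off, access_size)) {
            globals[i].needs_sync = true;
            marked++;
        }
    }
    return marked;
}
```

With Q0 at bytes 0..15, D0 at 0..7, D1 at 8..15 and S0 at 0..3, a 4-byte
access at offset 8 marks only Q0 and D1. This is the point of item 4: D0 and
S0 can stay in host registers instead of being treated as one unit with Q0.
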
>
> For experiments I used an ARM guest on an x86_64 host. I wanted a pair of
> different architectures that both have vector extensions. The ARM and
> x86_64 pair fits well.
>
> Kirill Batuzov (18):
>   tcg: add support for 128bit vector type
>   tcg: add support for 64bit vector type
>   tcg: add ld_v128, ld_v64, st_v128 and st_v64 opcodes
>   tcg: add simple alias analysis
>   tcg: use results of alias analysis in liveness analysis
>   tcg: allow globals to overlap
>   tcg: add vector addition operations
>   target/arm: support access to vector guest registers as globals
>   target/arm: use vector opcode to handle vadd.<size> instruction
>   tcg/i386: add support for vector opcodes
>   tcg/i386: support 64-bit vector operations
>   tcg/i386: support remaining vector addition operations
>   tcg: do not relay on exact values of MO_BSWAP or MO_SIGN in backend
>   tcg: introduce new TCGMemOp - MO_128
>   tcg: introduce qemu_ld_v128 and qemu_st_v128 opcodes
>   softmmu: create helpers for vector loads
>   tcg/i386: add support for qemu_ld_v128/qemu_st_v128 ops
>   target/arm: load two consecutive 64-bits vector regs as a 128-bit
>     vector reg
>
>  cputlb.c                     |   4 +
>  softmmu_template_vector.h    | 266 +++++++++++++++++++++++++++++++++++++++++++
>  target/arm/translate.c       |  89 ++++++++++++++-
>  tcg/aarch64/tcg-target.inc.c |   4 +-
>  tcg/arm/tcg-target.inc.c     |   4 +-
>  tcg/i386/tcg-target.h        |  35 +++++-
>  tcg/i386/tcg-target.inc.c    | 245 ++++++++++++++++++++++++++++++++++++---
>  tcg/mips/tcg-target.inc.c    |   4 +-
>  tcg/optimize.c               | 146 ++++++++++++++++++++++++
>  tcg/ppc/tcg-target.inc.c     |   4 +-
>  tcg/s390/tcg-target.inc.c    |   4 +-
>  tcg/sparc/tcg-target.inc.c   |  12 +-
>  tcg/tcg-op.c                 |  20 +++-
>  tcg/tcg-op.h                 | 262 ++++++++++++++++++++++++++++++++++++++++++
>  tcg/tcg-opc.h                |  34 ++++++
>  tcg/tcg.c                    | 146 ++++++++++++++++++++++++
>  tcg/tcg.h                    | 147 +++++++++++++++++++++++-
>  17 files changed, 1385 insertions(+), 41 deletions(-)
>  create mode 100644 softmmu_template_vector.h


--
Alex Bennée
