[Qemu-devel] [PULL v2] Queued TCG improvements

* [Qemu-devel] [PULL v2] Queued TCG improvements
@ 2015-08-18 14:59 Richard Henderson
  2015-08-18 23:23 ` Peter Maydell
  0 siblings, 1 reply; 5+ messages in thread
From: Richard Henderson @ 2015-08-18 14:59 UTC
  To: qemu-devel; +Cc: peter.maydell

This pull includes three independent patch sets, which were
all posted during the 2.4 freeze.

The first is algorithmic improvements to tcg/optimize, both
improving its runtime and its tracking of constants.

The second is improvements to the representation of 32<->64-bit
size changing operations.  Still to do here is investigate how
these might be best applied to each tcg host.

The third is improvements to how guest unaligned accesses are
implemented in softmmu mode, for the 4 supported host processors
that themselves implement unaligned accesses.
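
The backend code for this is not shown in the thread. As a rough sketch
only, assuming (this is not stated above) that a host which handles
unaligned accesses natively needs the slow path only when an access
straddles a page boundary, the check could look like the following.
PAGE_BITS and fits_in_one_page() are hypothetical names for
illustration, not QEMU identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4 KiB page size, chosen only for this sketch. */
#define PAGE_BITS 12
#define PAGE_SIZE ((uint64_t)1 << PAGE_BITS)
#define PAGE_MASK (~(PAGE_SIZE - 1))

/*
 * Return true when the access [addr, addr + size) lies within a single
 * page.  size must be at least 1.  An unaligned access that stays
 * inside one page passes; only a page-crossing access fails.
 */
static bool fits_in_one_page(uint64_t addr, unsigned size)
{
    uint64_t first = addr & PAGE_MASK;
    uint64_t last = (addr + size - 1) & PAGE_MASK;

    return first == last;
}
```

With these assumptions, a 4-byte access at 0x1001 is unaligned but stays
in one page, whereas a 4-byte access at 0x1ffd touches two pages and
would have to take the slow path.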

Change v1-v2:
  * Removed a patch that Aurelien self-nack'ed.  I guess I'd
    gotten the set of patches confused along the way.


r~


The following changes since commit 074a9925e1cfd659d5376dcaccd1436d3840e611:

  Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging (2015-08-14 16:52:34 +0100)

are available in the git repository at:

  git://github.com/rth7680/qemu.git tags/pull-tcg-20150818

for you to fetch changes up to 2e58c34d4c9c61f311b5468f05b0ad63b77645c1:

  tcg/aarch64: Use softmmu fast path for unaligned accesses (2015-08-18 07:50:19 -0700)

----------------------------------------------------------------
queued tcg patches

----------------------------------------------------------------
Aurelien Jarno (11):
      tcg/optimize: fix constant signedness
      tcg/optimize: optimize temps tracking
      tcg/optimize: add temp_is_const and temp_is_copy functions
      tcg/optimize: track const/copy status separately
      tcg/optimize: allow constant to have copies
      tcg: rename trunc_shr_i32 into trunc_shr_i64_i32
      tcg: don't abuse TCG type in tcg_gen_trunc_shr_i64_i32
      tcg: implement real ext_i32_i64 and extu_i32_i64 ops
      tcg/optimize: add optimizations for ext_i32_i64 and extu_i32_i64 ops
      tcg: update README about size changing ops
      tcg/i386: use softmmu fast path for unaligned accesses

Benjamin Herrenschmidt (1):
      tcg/ppc: Improve unaligned load/store handling on 64-bit backend

Richard Henderson (4):
      tcg: Split trunc_shr_i32 opcode into extr[lh]_i64_i32
      tcg: Remove tcg_gen_trunc_i64_i32
      tcg/s390: Use softmmu fast path for unaligned accesses
      tcg/aarch64: Use softmmu fast path for unaligned accesses
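
For readers unfamiliar with the size changing ops named in the commit
titles above, here is a plain C model of what they are understood to
compute: sign extension, zero extension, and extraction of the low or
high half of a 64-bit value. The model_* function names are
hypothetical and exist only in this sketch; tcg/README remains the
authoritative description of the ops.

```c
#include <stdint.h>

/* ext_i32_i64: sign-extend a 32-bit value to 64 bits. */
static int64_t model_ext_i32_i64(int32_t v)
{
    return (int64_t)v;
}

/* extu_i32_i64: zero-extend a 32-bit value to 64 bits. */
static uint64_t model_extu_i32_i64(uint32_t v)
{
    return (uint64_t)v;
}

/* extrl_i64_i32: extract the low 32 bits of a 64-bit value. */
static uint32_t model_extrl_i64_i32(uint64_t v)
{
    return (uint32_t)v;
}

/* extrh_i64_i32: extract the high 32 bits of a 64-bit value. */
static uint32_t model_extrh_i64_i32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```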

 target-alpha/translate.c      |   4 +-
 target-arm/translate-a64.c    |  60 +++++------
 target-arm/translate.c        |  46 ++++----
 target-cris/translate.c       |   4 +-
 target-m68k/translate.c       |   2 +-
 target-microblaze/translate.c |   8 +-
 target-mips/translate.c       |   4 +-
 target-openrisc/translate.c   |  22 ++--
 target-s390x/translate.c      |  30 +++---
 target-sh4/translate.c        |   4 +-
 target-sparc/translate.c      |  14 +--
 target-tricore/translate.c    |  32 +++---
 target-xtensa/translate.c     |   2 +-
 tcg/README                    |  32 ++++--
 tcg/aarch64/tcg-target.c      |  41 ++++---
 tcg/aarch64/tcg-target.h      |   3 +-
 tcg/i386/tcg-target.c         |  27 +++--
 tcg/i386/tcg-target.h         |   3 +-
 tcg/ia64/tcg-target.c         |   4 +
 tcg/ia64/tcg-target.h         |   3 +-
 tcg/optimize.c                | 246 +++++++++++++++++++++---------------------
 tcg/ppc/tcg-target.c          |  47 ++++++--
 tcg/ppc/tcg-target.h          |   3 +-
 tcg/s390/tcg-target.c         |  31 +++++-
 tcg/s390/tcg-target.h         |   3 +-
 tcg/sparc/tcg-target.c        |  22 ++--
 tcg/sparc/tcg-target.h        |   3 +-
 tcg/tcg-op.c                  |  48 ++++-----
 tcg/tcg-op.h                  |  12 +--
 tcg/tcg-opc.h                 |  10 +-
 tcg/tcg.h                     |   3 +-
 tcg/tci/tcg-target.c          |   4 +
 tcg/tci/tcg-target.h          |   3 +-
 tci.c                         |   6 +-
 34 files changed, 447 insertions(+), 339 deletions(-)

* Re: [Qemu-devel] [PULL v2] Queued TCG improvements
  2015-08-18 14:59 [Qemu-devel] [PULL v2] Queued TCG improvements Richard Henderson
@ 2015-08-18 23:23 ` Peter Maydell
  2015-08-19  6:15   ` Richard Henderson
  2015-08-19 15:49   ` Richard Henderson
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Maydell @ 2015-08-18 23:23 UTC
  To: Richard Henderson; +Cc: QEMU Developers

On 18 August 2015 at 15:59, Richard Henderson <rth@twiddle.net> wrote:
> This pull includes three independent patch sets, which were
> all posted during the 2.4 freeze.
>
> The first is algorithmic improvements to tcg/optimize, both
> improving its runtime and its tracking of constants.
>
> The second is improvements to the representation of 32<->64-bit
> size changing operations.  Still to do here is investigate how
> these might be best applied to each tcg host.
>
> The third is improvements to how guest unaligned accesses are
> implemented in softmmu mode, for the 4 supported host processors
> that themselves implement unaligned accesses.
>
> Change v1-v2:
>   * Removed a patch that Aurelien self-nack'ed.  I guess I'd
>     gotten the set of patches confused along the way.
>
>
> r~
>
>
> The following changes since commit 074a9925e1cfd659d5376dcaccd1436d3840e611:
>
>   Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging (2015-08-14 16:52:34 +0100)
>
> are available in the git repository at:
>
>   git://github.com/rth7680/qemu.git tags/pull-tcg-20150818
>
> for you to fetch changes up to 2e58c34d4c9c61f311b5468f05b0ad63b77645c1:
>
>   tcg/aarch64: Use softmmu fast path for unaligned accesses (2015-08-18 07:50:19 -0700)
>
> ----------------------------------------------------------------
> queued tcg patches
>

Hi. I'm afraid this fails 'make check' on 32-bit ARM for me:


QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64
QTEST_QEMU_IMG=qemu-img
MALLOC_PERTURB_=${MALLOC_PERTURB_:-$((RANDOM % 255 + 1))}
gtester -k --verbose -m=quick
tests/endianness-test tests/fdc-test tests/ide-test tests/ahci-test
tests/hd-geo-test tests/boot-order-test tests/bios-tables-test
tests/rtc-test tests/i440fx-test tests/fw_cfg-test tests/drive_del-test
tests/wdt_ib700-test tests/tco-test tests/e1000-test tests/rtl8139-test
tests/pcnet-test tests/eepro100-test tests/ne2000-test tests/nvme-test
tests/ac97-test tests/es1370-test tests/virtio-net-test
tests/virtio-balloon-test tests/virtio-blk-test tests/virtio-rng-test
tests/virtio-scsi-test tests/virtio-9p-test tests/virtio-serial-test
tests/virtio-console-test tests/tpci200-test tests/ipoctal232-test
tests/display-vga-test tests/intel-hda-test tests/vmxnet3-test
tests/pvpanic-test tests/i82801b11-test tests/ioh3420-test
tests/usb-hcd-ohci-test tests/usb-hcd-uhci-test tests/usb-hcd-ehci-test
tests/usb-hcd-xhci-test tests/pc-cpu-test tests/q35-test
tests/vhost-user-test tests/qom-test

[snip...]

TEST: tests/bios-tables-test... (pid=15865)
  /x86_64/acpi/piix4/tcg:                                              FAIL
GTester: last random seed: R02S16b73b222ae7e15b567b3f8c378584b0
(pid=15870)
  /x86_64/acpi/piix4/tcg/bridge:                                       FAIL
GTester: last random seed: R02S6234d11ab3f559ebcd1267cc71046b7f
(pid=15875)
  /x86_64/acpi/q35/tcg:                                                FAIL
GTester: last random seed: R02Sc765d52188a4d61bbb5a9294c9429e13
(pid=15880)
  /x86_64/acpi/q35/tcg/bridge:                                         FAIL
GTester: last random seed: R02S7c5238b347bb71adf72465b0653a793a
(pid=15885)
FAIL: tests/bios-tables-test

These are the tests which try to actually run guest code, so
usually this means that TCG on ARM has broken. Indeed:

# ./build/all/x86_64-softmmu/qemu-system-x86_64
Segmentation fault

(i386-softmmu doesn't segv, so probably it's a 64-bit-ops-on-32-bit
thing.)

thanks
-- PMM

* Re: [Qemu-devel] [PULL v2] Queued TCG improvements
  2015-08-18 23:23 ` Peter Maydell
@ 2015-08-19  6:15   ` Richard Henderson
  2015-08-19 15:49   ` Richard Henderson
  1 sibling, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2015-08-19  6:15 UTC
  To: Peter Maydell; +Cc: QEMU Developers

On 08/18/2015 04:23 PM, Peter Maydell wrote:
> Hi. I'm afraid this fails 'make check' on 32-bit ARM for me:
...
> (i386-softmmu doesn't segv, so probably it's a 64-bit-ops-on-32-bit
> thing.)

Sadly, this doesn't fail on a 32-bit x86 host.  I've started a build on an
ARM host, but it may be a while before I get results.


r~

* Re: [Qemu-devel] [PULL v2] Queued TCG improvements
  2015-08-18 23:23 ` Peter Maydell
  2015-08-19  6:15   ` Richard Henderson
@ 2015-08-19 15:49   ` Richard Henderson
  2015-09-10 19:31     ` Aurelien Jarno
  1 sibling, 1 reply; 5+ messages in thread
From: Richard Henderson @ 2015-08-19 15:49 UTC
  To: Aurelien Jarno; +Cc: Peter Maydell, QEMU Developers

On 08/18/2015 04:23 PM, Peter Maydell wrote:
> Hi. I'm afraid this fails 'make check' on 32-bit ARM for me:

Found it.  The problem is in the temps tracking patch, where we weren't
ignoring TCG_CALL_DUMMY_ARG (-1).  This isn't used on x86 of course, which is
why we didn't see this failure there.

The following fixes the problem.  I chose to split the initialization so that
non-call opcodes don't need to check for <dummy>.

Can I get an RB for squashing this into the original patch?


r~


diff --git a/tcg/optimize.c b/tcg/optimize.c
index 2693168..10795ec 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -597,17 +597,24 @@ void tcg_optimize(TCGContext *s)
         const TCGOpDef *def = &tcg_op_defs[opc];

         oi_next = op->next;
+
+        /* Count the arguments, and initialize the temps that are
+           going to be used */
         if (opc == INDEX_op_call) {
             nb_oargs = op->callo;
             nb_iargs = op->calli;
+            for (i = 0; i < nb_oargs + nb_iargs; i++) {
+                tmp = args[i];
+                if (tmp != TCG_CALL_DUMMY_ARG) {
+                    init_temp_info(tmp);
+                }
+            }
         } else {
             nb_oargs = def->nb_oargs;
             nb_iargs = def->nb_iargs;
-        }
-
-        /* Initialize the temps that are going to be used */
-        for (i = 0; i < nb_oargs + nb_iargs; i++) {
-            init_temp_info(args[i]);
+            for (i = 0; i < nb_oargs + nb_iargs; i++) {
+                init_temp_info(args[i]);
+            }
         }

         /* Do copy propagation */
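
To make the failure mode concrete, here is a minimal standalone sketch
of the same pattern. It is not the QEMU code: DUMMY_ARG, TempInfo,
init_temp() and init_call_temps() are hypothetical stand-ins, and the
only detail taken from this thread is that the dummy argument has the
value -1. Using -1 as an index into the per-temp table is out of
bounds, so the call path has to skip it.

```c
#include <assert.h>
#include <stdbool.h>

#define NB_TEMPS 8
#define DUMMY_ARG (-1)  /* hypothetical stand-in for TCG_CALL_DUMMY_ARG */

/* Hypothetical per-temp state, reduced to one flag for this sketch. */
typedef struct {
    bool initialized;
} TempInfo;

static TempInfo temps[NB_TEMPS];

/* Mark one temp as initialized; the index must be a real temp. */
static void init_temp(int idx)
{
    assert(idx >= 0 && idx < NB_TEMPS);  /* DUMMY_ARG would trip this */
    temps[idx].initialized = true;
}

/*
 * Initialize the temps named in a call's argument list, skipping the
 * dummy placeholder.  Returns the number of temps initialized.
 */
static int init_call_temps(const int *args, int nb_args)
{
    int count = 0;

    for (int i = 0; i < nb_args; i++) {
        int tmp = args[i];

        if (tmp != DUMMY_ARG) {
            init_temp(tmp);
            count++;
        }
    }
    return count;
}
```

Given the argument list { 2, DUMMY_ARG, 5 }, init_call_temps()
initializes temps 2 and 5 and returns 2. Without the check, the
assertion in init_temp() fires in this sketch; in a build with no such
assertion the write would land outside the array.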

* Re: [Qemu-devel] [PULL v2] Queued TCG improvements
  2015-08-19 15:49   ` Richard Henderson
@ 2015-09-10 19:31     ` Aurelien Jarno
  0 siblings, 0 replies; 5+ messages in thread
From: Aurelien Jarno @ 2015-09-10 19:31 UTC
  To: Richard Henderson; +Cc: Peter Maydell, QEMU Developers

On 2015-08-19 08:49, Richard Henderson wrote:
> On 08/18/2015 04:23 PM, Peter Maydell wrote:
> > Hi. I'm afraid this fails 'make check' on 32-bit ARM for me:
> 
> Found it.  The problem is in the temps tracking patch, where we weren't
> ignoring TCG_CALL_DUMMY_ARG (-1).  This isn't used on x86 of course, which is
> why we didn't see this failure there.
> 
> The following fixes the problem.  I chose to split the initialization so that
> non-call opcodes don't need to check for <dummy>.
> 
> Can I get an RB for squashing this into the original patch?

Sorry for answering so late while it's already merged. This is indeed
correct. Thanks for fixing that.

> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index 2693168..10795ec 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -597,17 +597,24 @@ void tcg_optimize(TCGContext *s)
>          const TCGOpDef *def = &tcg_op_defs[opc];
> 
>          oi_next = op->next;
> +
> +        /* Count the arguments, and initialize the temps that are
> +           going to be used */
>          if (opc == INDEX_op_call) {
>              nb_oargs = op->callo;
>              nb_iargs = op->calli;
> +            for (i = 0; i < nb_oargs + nb_iargs; i++) {
> +                tmp = args[i];
> +                if (tmp != TCG_CALL_DUMMY_ARG) {
> +                    init_temp_info(tmp);
> +                }
> +            }
>          } else {
>              nb_oargs = def->nb_oargs;
>              nb_iargs = def->nb_iargs;
> -        }
> -
> -        /* Initialize the temps that are going to be used */
> -        for (i = 0; i < nb_oargs + nb_iargs; i++) {
> -            init_temp_info(args[i]);
> +            for (i = 0; i < nb_oargs + nb_iargs; i++) {
> +                init_temp_info(args[i]);
> +            }
>          }
> 
>          /* Do copy propagation */
> 
> 

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net
