All of lore.kernel.org
 help / color / mirror / Atom feed
From: Aurelien Jarno <aurelien@aurel32.net>
To: Ulrich Hecht <uli@suse.de>
Cc: riku.voipio@iki.fi, qemu-devel@nongnu.org, agraf@suse.de
Subject: Re: [Qemu-devel] [PATCH 1/9] TCG "sync" op
Date: Fri, 16 Oct 2009 19:29:08 +0200	[thread overview]
Message-ID: <20091016172908.GG4127@hall.aurel32.net> (raw)
In-Reply-To: <200910161837.31850.uli@suse.de>

On Fri, Oct 16, 2009 at 06:37:31PM +0200, Ulrich Hecht wrote:
> On Friday 16 October 2009, Aurelien Jarno wrote:
> > IMHO, the correct way to do it is to use the following code, assuming
> > you want to use 64-bit TCG regs to hold 32-bit values (that's
> > something that is not really clear in your next patch):
> >
> > - for register load:
> > | static TCGv load_reg(int reg)
> > | {
> > |    TCGv r = tcg_temp_new_i64();
> > |    tcg_gen_ext32u_i64(r, tcgregs[reg]);
> > |    return r;
> > | }
> > |
> > | static void store_reg32(int reg, TCGv v)
> > | {
> > |    tcg_gen_ext32u_i64(v, v); /* may be optional */
> > |    tcg_gen_andi_i64(tcgregs[reg], tcgregs[reg],
> > | 0xffffffff00000000ULL); tcg_gen_or_i64(tcgregs[reg], tcgregs[reg],
> > | v);
> > | }
> 
> This is _extremely_ detrimental to performance. The point of the sync op 
> is that in most cases it's a nop because registers are usually used with 
> the same bitness again and again. The sign extension/masking stuff is 

I don't really understand how it can be simply be a nop, given it calls
temp_save. It means if a register is used twice in a BB, it is fetch
again from memory.

> done every time a register is accessed as 32 bits, which is the most 
> common case. Compare the translation of the following sequence of 
> instructions:
> 
> IN: _dl_aux_init
> 0x0000000080044ff6:  lhi        %r4,0
> 0x0000000080044ffa:  lhi        %r5,0
> 0x0000000080044ffe:  lhi        %r0,0

This example is a bit biased, as registers are only saved, and never
reused. Let's comment on it though.

> with sync:
> 
> OP:
>  mov_i32 loc0,global_cc
>  movi_i32 tmp1,$0x0
>  sync_i64 R4
>  mov_i32 r4,tmp1
>  movi_i32 tmp1,$0x0
>  sync_i64 R5
>  mov_i32 r5,tmp1
>  movi_i32 tmp1,$0x0
>  sync_i64 R0
>  mov_i32 r0,tmp1
>  mov_i32 global_cc,loc0
>  movi_i64 tmp2,$0x80045002
>  st_i64 tmp2,env,$0x158
>  exit_tb $0x0
> 
> OUT: [size=61]
> 0x6019a030:  mov    0x160(%r14),%ebp
> 0x6019a037:  mov    %rbp,%rbx
> 0x6019a03a:  mov    $0x80045002,%r12d
> 0x6019a040:  mov    %r12,0x158(%r14)
> 0x6019a047:  mov    %ebp,0xd1a0(%r14)
> 0x6019a04e:  mov    %ebx,0x160(%r14)
> 0x6019a055:  xor    %ebp,%ebp
> 0x6019a057:  mov    %ebp,(%r14)
> 0x6019a05a:  xor    %ebp,%ebp
> 0x6019a05c:  mov    %ebp,0x20(%r14)
> 0x6019a060:  xor    %ebp,%ebp
> 0x6019a062:  mov    %ebp,0x28(%r14)
> 0x6019a066:  xor    %eax,%eax
> 0x6019a068:  jmpq   0x621dc8ce
> 
> 
> with sign extension:
> 
> OP:
>  mov_i32 loc0,global_cc
>  movi_i32 tmp1,$0x0
>  ext32u_i64 tmp1,tmp1
>  movi_i64 tmp2,$0xffffffff00000000
>  and_i64 R4,R4,tmp2
>  or_i64 R4,R4,tmp1
>  movi_i32 tmp1,$0x0
>  ext32u_i64 tmp1,tmp1
>  movi_i64 tmp2,$0xffffffff00000000
>  and_i64 R5,R5,tmp2
>  or_i64 R5,R5,tmp1
>  movi_i32 tmp1,$0x0
>  ext32u_i64 tmp1,tmp1
>  movi_i64 tmp2,$0xffffffff00000000
>  and_i64 R0,R0,tmp2
>  or_i64 R0,R0,tmp1
>  mov_i32 global_cc,loc0
>  movi_i64 tmp2,$0x80045002
>  st_i64 tmp2,env,$0x158
>  exit_tb $0x0
> 
> OUT: [size=126]
> 0x6019af10:  mov    0x160(%r14),%ebp
> 0x6019af17:  xor    %ebx,%ebx
> 0x6019af19:  mov    %ebx,%ebx
> 0x6019af1b:  mov    0x20(%r14),%r12
> 0x6019af1f:  mov    $0xffffffff00000000,%r13
> 0x6019af29:  and    %r13,%r12
> 0x6019af2c:  or     %rbx,%r12
> 0x6019af2f:  xor    %ebx,%ebx
> 0x6019af31:  mov    %ebx,%ebx
> 0x6019af33:  mov    0x28(%r14),%r13
> 0x6019af37:  mov    $0xffffffff00000000,%r15
> 0x6019af41:  and    %r15,%r13
> 0x6019af44:  or     %rbx,%r13
> 0x6019af47:  xor    %ebx,%ebx
> 0x6019af49:  mov    %ebx,%ebx
> 0x6019af4b:  mov    (%r14),%r15
> 0x6019af4e:  mov    $0xffffffff00000000,%r10
> 0x6019af58:  and    %r10,%r15
> 0x6019af5b:  or     %rbx,%r15
> 0x6019af5e:  mov    %rbp,%rbx
> 0x6019af61:  mov    $0x80045002,%r10d
> 0x6019af67:  mov    %r10,0x158(%r14)
> 0x6019af6e:  mov    %ebp,0xd1a0(%r14)
> 0x6019af75:  mov    %ebx,0x160(%r14)
> 0x6019af7c:  mov    %r15,(%r14)
> 0x6019af7f:  mov    %r12,0x20(%r14)
> 0x6019af83:  mov    %r13,0x28(%r14)
> 0x6019af87:  xor    %eax,%eax
> 0x6019af89:  jmpq   0x621dd78e
> 
> Its more than twice the size and has ten memory accesses instead of 
> seven.
> 

There is clearly a huge impact for saving the globals, that I didn't
expect. I still believe a sync op as implemented in your patch is in
opposite direction to TCG's philosophy, probably does not work with
fixed registers, and I don't understand what is the gain compared to 
the use of tcg_gen_ld/st. Maybe you can post a dump of such a version 
so that we can see the benefit?

I think instead of a sync op, we should think of a way to use only the
low part of a global variable, maybe by adding new ops. That can also 
help to improve the concat_i32_i64 op on 64-bit hosts. Does someone has
any idea?

Aurelien

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

  reply	other threads:[~2009-10-16 17:29 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-16 12:38 [Qemu-devel] [PATCH 0/9] S/390 support updated Ulrich Hecht
2009-10-16 12:38 ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Ulrich Hecht
2009-10-16 12:38   ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Ulrich Hecht
2009-10-16 12:38     ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Ulrich Hecht
2009-10-16 12:38       ` [Qemu-devel] [PATCH 4/9] S/390 host support for TCG Ulrich Hecht
2009-10-16 12:38         ` [Qemu-devel] [PATCH 5/9] linux-user: S/390 64-bit (s390x) support Ulrich Hecht
2009-10-16 12:38           ` [Qemu-devel] [PATCH 6/9] linux-user: don't do locking in single-threaded processes Ulrich Hecht
2009-10-16 12:38             ` [Qemu-devel] [PATCH 7/9] linux-user: dup3, fallocate syscalls Ulrich Hecht
2009-10-16 12:38               ` [Qemu-devel] [PATCH 8/9] linux-user: define a couple of syscalls for non-uid16 targets Ulrich Hecht
2009-10-16 12:38                 ` [Qemu-devel] [PATCH 9/9] linux-user: getpriority errno fix Ulrich Hecht
2009-10-17 10:44       ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Aurelien Jarno
2009-10-17 10:42     ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Aurelien Jarno
2009-10-19 17:17       ` Ulrich Hecht
2009-10-22 21:28         ` Aurelien Jarno
2009-11-02 15:16           ` Ulrich Hecht
2009-11-02 18:42             ` Aurelien Jarno
2009-11-02 19:03               ` Laurent Desnogues
2009-11-09 16:55                 ` Ulrich Hecht
2009-11-10 16:02                   ` Aurelien Jarno
2009-11-11 10:46                     ` Ulrich Hecht
2009-10-16 15:52   ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Aurelien Jarno
2009-10-16 16:37     ` Ulrich Hecht
2009-10-16 17:29       ` Aurelien Jarno [this message]
2009-10-19 17:17         ` Ulrich Hecht
2009-10-17  8:59     ` Edgar E. Iglesias
2009-10-19 17:17       ` Ulrich Hecht

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20091016172908.GG4127@hall.aurel32.net \
    --to=aurelien@aurel32.net \
    --cc=agraf@suse.de \
    --cc=qemu-devel@nongnu.org \
    --cc=riku.voipio@iki.fi \
    --cc=uli@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.