[Qemu-devel] x86: cvtsi2s{s,d} etc. array access

* [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
@ 2012-05-14 21:05 Blue Swirl
  2012-05-15 17:08 ` Blue Swirl
  2012-05-15 17:27 ` Peter Maydell
  0 siblings, 2 replies; 4+ messages in thread
From: Blue Swirl @ 2012-05-14 21:05 UTC
  To: qemu-devel

Hi,

While working on the AREG0 patches, I noticed strange code in
target-i386/translate.c.

We have this table of function pointers:
static void *sse_op_table3[4 * 3] = {
    gen_helper_cvtsi2ss,
    gen_helper_cvtsi2sd,
    X86_64_ONLY(gen_helper_cvtsq2ss),
    X86_64_ONLY(gen_helper_cvtsq2sd),

    gen_helper_cvttss2si,
    gen_helper_cvttsd2si,
    X86_64_ONLY(gen_helper_cvttss2sq),
    X86_64_ONLY(gen_helper_cvttsd2sq),

    gen_helper_cvtss2si,
    gen_helper_cvtsd2si,
    X86_64_ONLY(gen_helper_cvtss2sq),
    X86_64_ONLY(gen_helper_cvtsd2sq),
};

It's accessed like this (line 3537):
            sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];

b >> 8 can only be 0 or 1. I don't see how this can work: won't
the array index become negative for s->dflag != 2?

The other access is as follows (line 3594):
            sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
                                    (b & 1) * 4];

This looks better because of + 4 but I think some array values are not
accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).

* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
  2012-05-14 21:05 [Qemu-devel] x86: cvtsi2s{s,d} etc. array access Blue Swirl
@ 2012-05-15 17:08 ` Blue Swirl
  2012-05-15 17:27 ` Peter Maydell
  1 sibling, 0 replies; 4+ messages in thread
From: Blue Swirl @ 2012-05-15 17:08 UTC
  To: qemu-devel

On Mon, May 14, 2012 at 9:05 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> Hi,
>
> While working on the AREG0 patches, I noticed strange code in
> target-i386/translate.c.
>
> We have this table of function pointers:
> static void *sse_op_table3[4 * 3] = {
>    gen_helper_cvtsi2ss,
>    gen_helper_cvtsi2sd,
>    X86_64_ONLY(gen_helper_cvtsq2ss),
>    X86_64_ONLY(gen_helper_cvtsq2sd),
>
>    gen_helper_cvttss2si,
>    gen_helper_cvttsd2si,
>    X86_64_ONLY(gen_helper_cvttss2sq),
>    X86_64_ONLY(gen_helper_cvttsd2sq),
>
>    gen_helper_cvtss2si,
>    gen_helper_cvtsd2si,
>    X86_64_ONLY(gen_helper_cvtss2sq),
>    X86_64_ONLY(gen_helper_cvtsd2sq),
> };
>
> It's accessed like this (line 3537):
>            sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>
> b >> 8 can be only either 1 or 0. I don't see how this can work, won't
> the array index become negative for s->dflag != 2?
>
> The other access is as follows (line 3594):
>            sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
>                                    (b & 1) * 4];
>
> This looks better because of + 4 but I think some array values are not
> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).

I still don't understand the arithmetic, but it looks like the correct
helpers are called:

$ cat cvtsi2ss.c
int main(void)
{
    asm("cvtsi2ss %eax, %xmm0;");
    asm("cvtsi2sd %eax, %xmm0;");
#ifdef __amd64__
    asm("cvtsi2ss %rax, %xmm0;");
    asm("cvtsi2sd %rax, %xmm0;");
#endif

    asm("cvttss2si %xmm0, %eax;");
    asm("cvttsd2si %xmm0, %eax;");
#ifdef __amd64__
    asm("cvttss2si %xmm0, %rax;");
    asm("cvttsd2si %xmm0, %rax;");
#endif

    asm("cvtss2si %xmm0, %eax;");
    asm("cvtsd2si %xmm0, %eax;");
#ifdef __amd64__
    asm("cvtss2si %xmm0, %rax;");
    asm("cvtsd2si %xmm0, %rax;");
#endif

    return 0;
}

$ gcc -o cvtsi2ss cvtsi2ss.c
$ gcc -m32 -o cvtsi2ss.i386 cvtsi2ss.c

$ qemu-x86_64 -d in_asm,op_opt ./cvtsi2ss
IN: main
0x0000000000400494:  push   %rbp
0x0000000000400495:  mov    %rsp,%rbp
0x0000000000400498:  cvtsi2ss %eax,%xmm0
0x000000000040049c:  cvtsi2sd %eax,%xmm0
0x00000000004004a0:  cvtsi2ssq %rax,%xmm0
0x00000000004004a5:  cvtsi2sdq %rax,%xmm0
0x00000000004004aa:  cvttss2si %xmm0,%eax
0x00000000004004ae:  cvttsd2si %xmm0,%eax
0x00000000004004b2:  cvttss2siq %xmm0,%rax
0x00000000004004b7:  cvttsd2siq %xmm0,%rax
0x00000000004004bc:  cvtss2si %xmm0,%eax
0x00000000004004c0:  cvtsd2si %xmm0,%eax
0x00000000004004c4:  cvtss2siq %xmm0,%rax
0x00000000004004c9:  cvtsd2siq %xmm0,%rax
0x00000000004004ce:  mov    $0x0,%eax
0x00000000004004d3:  leaveq
0x00000000004004d4:  retq

OP after liveness analysis:
 mov_i64 tmp0,rbp
 mov_i64 tmp2,rsp
 movi_i64 tmp12,$0xfffffffffffffff8
 add_i64 tmp2,tmp2,tmp12
 qemu_st64 tmp0,tmp2,$0xffffffffffffffff
 mov_i64 rsp,tmp2
 mov_i64 tmp0,rsp
 mov_i64 rbp,tmp0
 mov_i64 tmp0,rax
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 mov_i32 tmp6,tmp0
 movi_i64 tmp12,$cvtsi2ss
 call tmp12,$0x0,$0,tmp10,tmp6
 mov_i64 tmp0,rax
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 mov_i32 tmp6,tmp0
 movi_i64 tmp12,$cvtsi2sd
 call tmp12,$0x0,$0,tmp10,tmp6
 mov_i64 tmp0,rax
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtsq2ss
 call tmp12,$0x0,$0,tmp10,tmp0
 mov_i64 tmp0,rax
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtsq2sd
 call tmp12,$0x0,$0,tmp10,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvttss2si
 call tmp12,$0x0,$1,tmp6,tmp10
 ext32u_i64 tmp0,tmp6
 ext32u_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvttsd2si
 call tmp12,$0x0,$1,tmp6,tmp10
 ext32u_i64 tmp0,tmp6
 ext32u_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvttss2sq
 call tmp12,$0x0,$1,tmp0,tmp10
 mov_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvttsd2sq
 call tmp12,$0x0,$1,tmp0,tmp10
 mov_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtss2si
 call tmp12,$0x0,$1,tmp6,tmp10
 ext32u_i64 tmp0,tmp6
 ext32u_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtsd2si
 call tmp12,$0x0,$1,tmp6,tmp10
 ext32u_i64 tmp0,tmp6
 ext32u_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtss2sq
 call tmp12,$0x0,$1,tmp0,tmp10
 mov_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtsd2sq

$ qemu-i386 -d in_asm,op_opt ./cvtsi2ss.i386
$ grep -B3 -A29 cvtsi2ss /tmp/qemu.log
IN: main
0x08048394:  push   %ebp
0x08048395:  mov    %esp,%ebp
0x08048397:  cvtsi2ss %eax,%xmm0
0x0804839b:  cvtsi2sd %eax,%xmm0
0x0804839f:  cvttss2si %xmm0,%eax
0x080483a3:  cvttsd2si %xmm0,%eax
0x080483a7:  cvtss2si %xmm0,%eax
0x080483ab:  cvtsd2si %xmm0,%eax
0x080483af:  mov    $0x0,%eax
0x080483b4:  pop    %ebp
0x080483b5:  ret

OP after liveness analysis:
 mov_i32 tmp0,ebp
 mov_i32 tmp2,esp
 movi_i32 tmp12,$0xfffffffc
 add_i32 tmp2,tmp2,tmp12
 qemu_st32 tmp0,tmp2,$0xffffffffffffffff
 mov_i32 esp,tmp2
 mov_i32 tmp0,esp
 mov_i32 ebp,tmp0
 mov_i32 tmp0,eax
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 mov_i32 tmp6,tmp0
 movi_i64 tmp13,$cvtsi2ss
 call tmp13,$0x0,$0,tmp10,tmp6
 mov_i32 tmp0,eax
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 mov_i32 tmp6,tmp0
 movi_i64 tmp13,$cvtsi2sd
 call tmp13,$0x0,$0,tmp10,tmp6
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 movi_i64 tmp13,$cvttss2si
 call tmp13,$0x0,$1,tmp6,tmp10
 nopn $0x2,$0x2
 mov_i32 eax,tmp6
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 movi_i64 tmp13,$cvttsd2si
 call tmp13,$0x0,$1,tmp6,tmp10
 nopn $0x2,$0x2
 mov_i32 eax,tmp6
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 movi_i64 tmp13,$cvtss2si
 call tmp13,$0x0,$1,tmp6,tmp10
 nopn $0x2,$0x2
 mov_i32 eax,tmp6
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 movi_i64 tmp13,$cvtsd2si
 call tmp13,$0x0,$1,tmp6,tmp10

* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
  2012-05-14 21:05 [Qemu-devel] x86: cvtsi2s{s,d} etc. array access Blue Swirl
  2012-05-15 17:08 ` Blue Swirl
@ 2012-05-15 17:27 ` Peter Maydell
  2012-05-15 17:41   ` Blue Swirl
  1 sibling, 1 reply; 4+ messages in thread
From: Peter Maydell @ 2012-05-15 17:27 UTC
  To: Blue Swirl; +Cc: qemu-devel

On 14 May 2012 22:05, Blue Swirl <blauwirbel@gmail.com> wrote:
> While working on the AREG0 patches, I noticed strange code in
> target-i386/translate.c.

> It's accessed like this (line 3537):
>            sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>
> b >> 8 can be only either 1 or 0.

I don't think this is true. At this point in the code we're inside
a "switch (b)" so we know that b is either 0x22a (cvtsi2ss) or
0x32a (cvtsi2sd). So "((b >> 8) - 2)" is 0 for cvtsi2ss and 1
for cvtsi2sd, giving us the lsbit of the array index, with
(s->dflag == 2) providing the next bit, so we end up with
indexes 0,1,2,3 in this table for these two insns in their
doubleword and quadword forms.

You could rewrite "((b >> 8) - 2)" as "((b >> 8) & 1)".

> The other access is as follows (line 3594):
>            sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
>                                    (b & 1) * 4];
>
> This looks better because of + 4 but I think some array values are not
> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).

Here we know b is 0x22c (cvttss2si), 0x32c (cvttsd2si), 0x22d (cvtss2si)
or 0x32d (cvtsd2si). ((b >> 8) - 2) distinguishes the 0x2XX and 0x3XX,
and (b & 1) the 0xXXc from 0xXXd. So the index is made up of (lsbit to
msbit) "0x2XX or 0x3XX?", "double or quad?", "0xXXC or 0xXXD?", and then
we add a constant offset of 4 because the entries start after the
4 entries for the cases we looked at earlier.

I think you could actually split sse_op_table3 into two separate
tables, one for each of these cases, which would be slightly
clearer IMHO.

-- PMM

* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
  2012-05-15 17:27 ` Peter Maydell
@ 2012-05-15 17:41   ` Blue Swirl
  0 siblings, 0 replies; 4+ messages in thread
From: Blue Swirl @ 2012-05-15 17:41 UTC
  To: Peter Maydell; +Cc: qemu-devel

On Tue, May 15, 2012 at 5:27 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 14 May 2012 22:05, Blue Swirl <blauwirbel@gmail.com> wrote:
>> While working on the AREG0 patches, I noticed strange code in
>> target-i386/translate.c.
>
>> It's accessed like this (line 3537):
>>            sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>>
>> b >> 8 can be only either 1 or 0.
>
> I don't think this is true. At this point in the code we're inside
> a "switch (b)" so we know that b is either 0x22a (cvtsi2ss) or
> 0x32a (cvtsi2sd). So "((b >> 8) - 2)" is 0 for cvtsi2ss and 1
> for cvtsi2sd, giving us the lsbit of the array index, with
> (s->dflag == 2) providing the next bit, so we end up with
> indexes 0,1,2,3 in this table for these two insns in their
> doubleword and quadword forms.

OK, I misread the start of the function pretty badly.

>
> You could rewrite "((b >> 8) - 2)" as "((b >> 8) & 1)".
>
>> The other access is as follows (line 3594):
>>            sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
>>                                    (b & 1) * 4];
>>
>> This looks better because of + 4 but I think some array values are not
>> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).
>
> Here we know b is 0x22c (cvttss2si), 0x32c (cvttsd2si), 0x22d (cvtss2si)
> or 0x32d (cvtsd2si). ((b >> 8) - 2) distinguishes the 0x2XX and 0x3XX,
> and (b & 1) the 0xXXc from 0xXXd. So the index is made up of (lsbit to
> msbit) "0x2XX or 0x3XX?", "double or quad?", "0xXXC or 0xXXD?", and then
> we add a constant offset of 4 because the entries start after the
> 4 entries for the cases we looked at earlier.
>
> I think you could actually split sse_op_table3 into two separate
> tables, one for each of these cases, which would be slightly
> clearer IMHO.

Yes, IMHO this is ugly, and the void pointers give no type safety.
There could also be an inner switch, similar to how cvttps2pi is
handled nearby.

>
> -- PMM
