* [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
@ 2012-05-14 21:05 Blue Swirl
2012-05-15 17:08 ` Blue Swirl
2012-05-15 17:27 ` Peter Maydell
0 siblings, 2 replies; 4+ messages in thread
From: Blue Swirl @ 2012-05-14 21:05 UTC
To: qemu-devel
Hi,
While working on the AREG0 patches, I noticed strange code in
target-i386/translate.c.
We have this table of function pointers:
static void *sse_op_table3[4 * 3] = {
    gen_helper_cvtsi2ss,
    gen_helper_cvtsi2sd,
    X86_64_ONLY(gen_helper_cvtsq2ss),
    X86_64_ONLY(gen_helper_cvtsq2sd),

    gen_helper_cvttss2si,
    gen_helper_cvttsd2si,
    X86_64_ONLY(gen_helper_cvttss2sq),
    X86_64_ONLY(gen_helper_cvttsd2sq),

    gen_helper_cvtss2si,
    gen_helper_cvtsd2si,
    X86_64_ONLY(gen_helper_cvtss2sq),
    X86_64_ONLY(gen_helper_cvtsd2sq),
};
It's accessed like this (line 3537):
sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
b >> 8 can be only either 1 or 0. I don't see how this can work; won't
the array index become negative when s->dflag != 2?
The other access is as follows (line 3594):
sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
(b & 1) * 4];
This looks better because of the + 4, but I think some array entries are not
accessible (max. index 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).
* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
From: Blue Swirl @ 2012-05-15 17:08 UTC
To: qemu-devel
On Mon, May 14, 2012 at 9:05 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> Hi,
>
> While working on the AREG0 patches, I noticed strange code in
> target-i386/translate.c.
>
> We have this table of function pointers:
> static void *sse_op_table3[4 * 3] = {
> gen_helper_cvtsi2ss,
> gen_helper_cvtsi2sd,
> X86_64_ONLY(gen_helper_cvtsq2ss),
> X86_64_ONLY(gen_helper_cvtsq2sd),
>
> gen_helper_cvttss2si,
> gen_helper_cvttsd2si,
> X86_64_ONLY(gen_helper_cvttss2sq),
> X86_64_ONLY(gen_helper_cvttsd2sq),
>
> gen_helper_cvtss2si,
> gen_helper_cvtsd2si,
> X86_64_ONLY(gen_helper_cvtss2sq),
> X86_64_ONLY(gen_helper_cvtsd2sq),
> };
>
> It's accessed like this (line 3537):
> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>
> b >> 8 can be only either 1 or 0. I don't see how this can work, won't
> the array index become negative for s->dflag != 2?
>
> The other access is as follows (line 3594):
> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
> (b & 1) * 4];
>
> This looks better because of + 4 but I think some array values are not
> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).
I still don't understand the arithmetic, but it looks like the correct
helpers are called:
$ cat cvtsi2ss.c
int main(void)
{
    asm("cvtsi2ss %eax, %xmm0;");
    asm("cvtsi2sd %eax, %xmm0;");
#ifdef __amd64__
    asm("cvtsi2ss %rax, %xmm0;");
    asm("cvtsi2sd %rax, %xmm0;");
#endif
    asm("cvttss2si %xmm0, %eax;");
    asm("cvttsd2si %xmm0, %eax;");
#ifdef __amd64__
    asm("cvttss2si %xmm0, %rax;");
    asm("cvttsd2si %xmm0, %rax;");
#endif
    asm("cvtss2si %xmm0, %eax;");
    asm("cvtsd2si %xmm0, %eax;");
#ifdef __amd64__
    asm("cvtss2si %xmm0, %rax;");
    asm("cvtsd2si %xmm0, %rax;");
#endif
    return 0;
}
$ gcc -o cvtsi2ss cvtsi2ss.c
$ gcc -m32 -o cvtsi2ss.i386 cvtsi2ss.c
$ qemu-x86_64 -d in_asm,op_opt ./cvtsi2ss
IN: main
0x0000000000400494: push %rbp
0x0000000000400495: mov %rsp,%rbp
0x0000000000400498: cvtsi2ss %eax,%xmm0
0x000000000040049c: cvtsi2sd %eax,%xmm0
0x00000000004004a0: cvtsi2ssq %rax,%xmm0
0x00000000004004a5: cvtsi2sdq %rax,%xmm0
0x00000000004004aa: cvttss2si %xmm0,%eax
0x00000000004004ae: cvttsd2si %xmm0,%eax
0x00000000004004b2: cvttss2siq %xmm0,%rax
0x00000000004004b7: cvttsd2siq %xmm0,%rax
0x00000000004004bc: cvtss2si %xmm0,%eax
0x00000000004004c0: cvtsd2si %xmm0,%eax
0x00000000004004c4: cvtss2siq %xmm0,%rax
0x00000000004004c9: cvtsd2siq %xmm0,%rax
0x00000000004004ce: mov $0x0,%eax
0x00000000004004d3: leaveq
0x00000000004004d4: retq
OP after liveness analysis:
mov_i64 tmp0,rbp
mov_i64 tmp2,rsp
movi_i64 tmp12,$0xfffffffffffffff8
add_i64 tmp2,tmp2,tmp12
qemu_st64 tmp0,tmp2,$0xffffffffffffffff
mov_i64 rsp,tmp2
mov_i64 tmp0,rsp
mov_i64 rbp,tmp0
mov_i64 tmp0,rax
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
mov_i32 tmp6,tmp0
movi_i64 tmp12,$cvtsi2ss
call tmp12,$0x0,$0,tmp10,tmp6
mov_i64 tmp0,rax
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
mov_i32 tmp6,tmp0
movi_i64 tmp12,$cvtsi2sd
call tmp12,$0x0,$0,tmp10,tmp6
mov_i64 tmp0,rax
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvtsq2ss
call tmp12,$0x0,$0,tmp10,tmp0
mov_i64 tmp0,rax
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvtsq2sd
call tmp12,$0x0,$0,tmp10,tmp0
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvttss2si
call tmp12,$0x0,$1,tmp6,tmp10
ext32u_i64 tmp0,tmp6
ext32u_i64 rax,tmp0
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvttsd2si
call tmp12,$0x0,$1,tmp6,tmp10
ext32u_i64 tmp0,tmp6
ext32u_i64 rax,tmp0
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvttss2sq
call tmp12,$0x0,$1,tmp0,tmp10
mov_i64 rax,tmp0
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvttsd2sq
call tmp12,$0x0,$1,tmp0,tmp10
mov_i64 rax,tmp0
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvtss2si
call tmp12,$0x0,$1,tmp6,tmp10
ext32u_i64 tmp0,tmp6
ext32u_i64 rax,tmp0
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvtsd2si
call tmp12,$0x0,$1,tmp6,tmp10
ext32u_i64 tmp0,tmp6
ext32u_i64 rax,tmp0
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvtss2sq
call tmp12,$0x0,$1,tmp0,tmp10
mov_i64 rax,tmp0
movi_i64 tmp12,$0x2a8
add_i64 tmp10,env,tmp12
movi_i64 tmp12,$cvtsd2sq
$ qemu-i386 -d in_asm,op_opt ./cvtsi2ss.i386
$ grep -B3 -A29 cvtsi2ss /tmp/qemu.log
IN: main
0x08048394: push %ebp
0x08048395: mov %esp,%ebp
0x08048397: cvtsi2ss %eax,%xmm0
0x0804839b: cvtsi2sd %eax,%xmm0
0x0804839f: cvttss2si %xmm0,%eax
0x080483a3: cvttsd2si %xmm0,%eax
0x080483a7: cvtss2si %xmm0,%eax
0x080483ab: cvtsd2si %xmm0,%eax
0x080483af: mov $0x0,%eax
0x080483b4: pop %ebp
0x080483b5: ret
OP after liveness analysis:
mov_i32 tmp0,ebp
mov_i32 tmp2,esp
movi_i32 tmp12,$0xfffffffc
add_i32 tmp2,tmp2,tmp12
qemu_st32 tmp0,tmp2,$0xffffffffffffffff
mov_i32 esp,tmp2
mov_i32 tmp0,esp
mov_i32 ebp,tmp0
mov_i32 tmp0,eax
movi_i64 tmp13,$0x1d8
add_i64 tmp10,env,tmp13
mov_i32 tmp6,tmp0
movi_i64 tmp13,$cvtsi2ss
call tmp13,$0x0,$0,tmp10,tmp6
mov_i32 tmp0,eax
movi_i64 tmp13,$0x1d8
add_i64 tmp10,env,tmp13
mov_i32 tmp6,tmp0
movi_i64 tmp13,$cvtsi2sd
call tmp13,$0x0,$0,tmp10,tmp6
movi_i64 tmp13,$0x1d8
add_i64 tmp10,env,tmp13
movi_i64 tmp13,$cvttss2si
call tmp13,$0x0,$1,tmp6,tmp10
nopn $0x2,$0x2
mov_i32 eax,tmp6
movi_i64 tmp13,$0x1d8
add_i64 tmp10,env,tmp13
movi_i64 tmp13,$cvttsd2si
call tmp13,$0x0,$1,tmp6,tmp10
nopn $0x2,$0x2
mov_i32 eax,tmp6
movi_i64 tmp13,$0x1d8
add_i64 tmp10,env,tmp13
movi_i64 tmp13,$cvtss2si
call tmp13,$0x0,$1,tmp6,tmp10
nopn $0x2,$0x2
mov_i32 eax,tmp6
movi_i64 tmp13,$0x1d8
add_i64 tmp10,env,tmp13
movi_i64 tmp13,$cvtsd2si
call tmp13,$0x0,$1,tmp6,tmp10
* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
From: Peter Maydell @ 2012-05-15 17:27 UTC
To: Blue Swirl; +Cc: qemu-devel
On 14 May 2012 22:05, Blue Swirl <blauwirbel@gmail.com> wrote:
> While working on the AREG0 patches, I noticed strange code in
> target-i386/translate.c.
> It's accessed like this (line 3537):
> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>
> b >> 8 can be only either 1 or 0.
I don't think this is true. At this point in the code we're inside
a "switch (b)" so we know that b is either 0x22a (cvtsi2ss) or
0x32a (cvtsi2sd). So "((b >> 8) - 2)" is 0 for cvtsi2ss and 1
for cvtsi2sd, giving us the lsbit of the array index, with
(s->dflag == 2) providing the next bit, so we end up with
indexes 0,1,2,3 in this table for these two insns in their
doubleword and quadword forms.
You could rewrite "((b >> 8) - 2)" as "((b >> 8) & 1)".
> The other access is as follows (line 3594):
> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
> (b & 1) * 4];
>
> This looks better because of + 4 but I think some array values are not
> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).
Here we know b is 0x22c (cvttss2si), 0x32c (cvttsd2si), 0x22d (cvtss2si)
or 0x32d (cvtsd2si). ((b >> 8) - 2) distinguishes 0x2XX from 0x3XX,
and (b & 1) distinguishes 0xXXc from 0xXXd. So the index is made up of
(lsbit to msbit) "0x2XX or 0x3XX?", "double or quad?", "0xXXc or 0xXXd?",
and then we add a constant offset of 4 because these entries start after
the 4 entries for the cases we looked at earlier.
I think you could actually split sse_op_table3 into two separate
tables, one for each of these cases, which would be slightly
clearer IMHO.
-- PMM
* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
From: Blue Swirl @ 2012-05-15 17:41 UTC
To: Peter Maydell; +Cc: qemu-devel
On Tue, May 15, 2012 at 5:27 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 14 May 2012 22:05, Blue Swirl <blauwirbel@gmail.com> wrote:
>> While working on the AREG0 patches, I noticed strange code in
>> target-i386/translate.c.
>
>> It's accessed like this (line 3537):
>> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>>
>> b >> 8 can be only either 1 or 0.
>
> I don't think this is true. At this point in the code we're inside
> a "switch (b)" so we know that b is either 0x22a (cvtsi2ss) or
> 0x32a (cvtsi2sd). So "((b >> 8) - 2)" is 0 for cvtsi2ss and 1
> for cvtsi2sd, giving us the lsbit of the array index, with
> (s->dflag == 2) providing the next bit, so we end up with
> indexes 0,1,2,3 in this table for these two insns in their
> doubleword and quadword forms.
OK, I misread the start of the function pretty badly.
>
> You could rewrite "((b >> 8) - 2)" as "((b >> 8) & 1)".
>
>> The other access is as follows (line 3594):
>> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
>> (b & 1) * 4];
>>
>> This looks better because of + 4 but I think some array values are not
>> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).
>
> Here we know b is 0x22c (cvttss2si) 0x32c (cvttsd2si) 0x22d (cvtss2si)
> or 0x32d (cvtsd2si). ((b >> 8) - 2) distinguishes the 0x2XX and 0x3XX,
> and (b & 1) the 0xXXc from 0xXXd. So the index is made up of (lsbit to
> msbit) "0x2XX or 0x3XX?", "double or quad?", "0xXXC or 0xXXD?", and then
> we add a constant offset of 4 because the entries start after the
> 4 entries for the cases we looked at earlier.
>
> I think you could actually split sse_op_table3 into two separate
> tables, one for each of these cases, which would be slightly
> clearer IMHO.
Yes, this is IMHO ugly, and there is no type safety because of the void
pointers. There could also be an inner switch, like how cvttps2pi is
handled nearby.
>
> -- PMM