* [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
From: Blue Swirl @ 2012-05-14 21:05 UTC
To: qemu-devel

Hi,

While working on the AREG0 patches, I noticed strange code in
target-i386/translate.c.

We have this table of function pointers:
static void *sse_op_table3[4 * 3] = {
    gen_helper_cvtsi2ss,
    gen_helper_cvtsi2sd,
    X86_64_ONLY(gen_helper_cvtsq2ss),
    X86_64_ONLY(gen_helper_cvtsq2sd),

    gen_helper_cvttss2si,
    gen_helper_cvttsd2si,
    X86_64_ONLY(gen_helper_cvttss2sq),
    X86_64_ONLY(gen_helper_cvttsd2sq),

    gen_helper_cvtss2si,
    gen_helper_cvtsd2si,
    X86_64_ONLY(gen_helper_cvtss2sq),
    X86_64_ONLY(gen_helper_cvtsd2sq),
};

It's accessed like this (line 3537):
    sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];

b >> 8 can be only either 1 or 0. I don't see how this can work, won't
the array index become negative for s->dflag != 2?

The other access is as follows (line 3594):
    sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
                            (b & 1) * 4];

This looks better because of + 4 but I think some array values are not
accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).
* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
From: Blue Swirl @ 2012-05-15 17:08 UTC
To: qemu-devel

On Mon, May 14, 2012 at 9:05 PM, Blue Swirl <blauwirbel@gmail.com> wrote:
> Hi,
>
> While working on the AREG0 patches, I noticed strange code in
> target-i386/translate.c.
>
> We have this table of function pointers:
> static void *sse_op_table3[4 * 3] = {
>     gen_helper_cvtsi2ss,
>     gen_helper_cvtsi2sd,
>     X86_64_ONLY(gen_helper_cvtsq2ss),
>     X86_64_ONLY(gen_helper_cvtsq2sd),
>
>     gen_helper_cvttss2si,
>     gen_helper_cvttsd2si,
>     X86_64_ONLY(gen_helper_cvttss2sq),
>     X86_64_ONLY(gen_helper_cvttsd2sq),
>
>     gen_helper_cvtss2si,
>     gen_helper_cvtsd2si,
>     X86_64_ONLY(gen_helper_cvtss2sq),
>     X86_64_ONLY(gen_helper_cvtsd2sq),
> };
>
> It's accessed like this (line 3537):
> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>
> b >> 8 can be only either 1 or 0. I don't see how this can work, won't
> the array index become negative for s->dflag != 2?
>
> The other access is as follows (line 3594):
> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
> (b & 1) * 4];
>
> This looks better because of + 4 but I think some array values are not
> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).
I still don't understand the arithmetic, but it looks like the correct
helpers are called:

$ cat cvtsi2ss.c
int main(void)
{
    asm("cvtsi2ss %eax, %xmm0;");
    asm("cvtsi2sd %eax, %xmm0;");
#ifdef __amd64__
    asm("cvtsi2ss %rax, %xmm0;");
    asm("cvtsi2sd %rax, %xmm0;");
#endif

    asm("cvttss2si %xmm0, %eax;");
    asm("cvttsd2si %xmm0, %eax;");
#ifdef __amd64__
    asm("cvttss2si %xmm0, %rax;");
    asm("cvttsd2si %xmm0, %rax;");
#endif

    asm("cvtss2si %xmm0, %eax;");
    asm("cvtsd2si %xmm0, %eax;");
#ifdef __amd64__
    asm("cvtss2si %xmm0, %rax;");
    asm("cvtsd2si %xmm0, %rax;");
#endif
    return 0;
}
$ gcc -o cvtsi2ss cvtsi2ss.c
$ gcc -m32 -o cvtsi2ss.i386 cvtsi2ss.c
$ qemu-x86_64 -d in_asm,op_opt ./cvtsi2ss
IN: main
0x0000000000400494: push %rbp
0x0000000000400495: mov %rsp,%rbp
0x0000000000400498: cvtsi2ss %eax,%xmm0
0x000000000040049c: cvtsi2sd %eax,%xmm0
0x00000000004004a0: cvtsi2ssq %rax,%xmm0
0x00000000004004a5: cvtsi2sdq %rax,%xmm0
0x00000000004004aa: cvttss2si %xmm0,%eax
0x00000000004004ae: cvttsd2si %xmm0,%eax
0x00000000004004b2: cvttss2siq %xmm0,%rax
0x00000000004004b7: cvttsd2siq %xmm0,%rax
0x00000000004004bc: cvtss2si %xmm0,%eax
0x00000000004004c0: cvtsd2si %xmm0,%eax
0x00000000004004c4: cvtss2siq %xmm0,%rax
0x00000000004004c9: cvtsd2siq %xmm0,%rax
0x00000000004004ce: mov $0x0,%eax
0x00000000004004d3: leaveq
0x00000000004004d4: retq

OP after liveness analysis:
 mov_i64 tmp0,rbp
 mov_i64 tmp2,rsp
 movi_i64 tmp12,$0xfffffffffffffff8
 add_i64 tmp2,tmp2,tmp12
 qemu_st64 tmp0,tmp2,$0xffffffffffffffff
 mov_i64 rsp,tmp2
 mov_i64 tmp0,rsp
 mov_i64 rbp,tmp0
 mov_i64 tmp0,rax
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 mov_i32 tmp6,tmp0
 movi_i64 tmp12,$cvtsi2ss
 call tmp12,$0x0,$0,tmp10,tmp6
 mov_i64 tmp0,rax
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 mov_i32 tmp6,tmp0
 movi_i64 tmp12,$cvtsi2sd
 call tmp12,$0x0,$0,tmp10,tmp6
 mov_i64 tmp0,rax
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtsq2ss
 call tmp12,$0x0,$0,tmp10,tmp0
 mov_i64 tmp0,rax
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtsq2sd
 call tmp12,$0x0,$0,tmp10,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvttss2si
 call tmp12,$0x0,$1,tmp6,tmp10
 ext32u_i64 tmp0,tmp6
 ext32u_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvttsd2si
 call tmp12,$0x0,$1,tmp6,tmp10
 ext32u_i64 tmp0,tmp6
 ext32u_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvttss2sq
 call tmp12,$0x0,$1,tmp0,tmp10
 mov_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvttsd2sq
 call tmp12,$0x0,$1,tmp0,tmp10
 mov_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtss2si
 call tmp12,$0x0,$1,tmp6,tmp10
 ext32u_i64 tmp0,tmp6
 ext32u_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtsd2si
 call tmp12,$0x0,$1,tmp6,tmp10
 ext32u_i64 tmp0,tmp6
 ext32u_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtss2sq
 call tmp12,$0x0,$1,tmp0,tmp10
 mov_i64 rax,tmp0
 movi_i64 tmp12,$0x2a8
 add_i64 tmp10,env,tmp12
 movi_i64 tmp12,$cvtsd2sq

$ qemu-i386 -d in_asm,op_opt ./cvtsi2ss.i386
$ grep -B3 -A29 cvtsi2ss /tmp/qemu.log
IN: main
0x08048394: push %ebp
0x08048395: mov %esp,%ebp
0x08048397: cvtsi2ss %eax,%xmm0
0x0804839b: cvtsi2sd %eax,%xmm0
0x0804839f: cvttss2si %xmm0,%eax
0x080483a3: cvttsd2si %xmm0,%eax
0x080483a7: cvtss2si %xmm0,%eax
0x080483ab: cvtsd2si %xmm0,%eax
0x080483af: mov $0x0,%eax
0x080483b4: pop %ebp
0x080483b5: ret

OP after liveness analysis:
 mov_i32 tmp0,ebp
 mov_i32 tmp2,esp
 movi_i32 tmp12,$0xfffffffc
 add_i32 tmp2,tmp2,tmp12
 qemu_st32 tmp0,tmp2,$0xffffffffffffffff
 mov_i32 esp,tmp2
 mov_i32 tmp0,esp
 mov_i32 ebp,tmp0
 mov_i32 tmp0,eax
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 mov_i32 tmp6,tmp0
 movi_i64 tmp13,$cvtsi2ss
 call tmp13,$0x0,$0,tmp10,tmp6
 mov_i32 tmp0,eax
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 mov_i32 tmp6,tmp0
 movi_i64 tmp13,$cvtsi2sd
 call tmp13,$0x0,$0,tmp10,tmp6
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 movi_i64 tmp13,$cvttss2si
 call tmp13,$0x0,$1,tmp6,tmp10
 nopn $0x2,$0x2
 mov_i32 eax,tmp6
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 movi_i64 tmp13,$cvttsd2si
 call tmp13,$0x0,$1,tmp6,tmp10
 nopn $0x2,$0x2
 mov_i32 eax,tmp6
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 movi_i64 tmp13,$cvtss2si
 call tmp13,$0x0,$1,tmp6,tmp10
 nopn $0x2,$0x2
 mov_i32 eax,tmp6
 movi_i64 tmp13,$0x1d8
 add_i64 tmp10,env,tmp13
 movi_i64 tmp13,$cvtsd2si
 call tmp13,$0x0,$1,tmp6,tmp10
* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
From: Peter Maydell @ 2012-05-15 17:27 UTC
To: Blue Swirl; +Cc: qemu-devel

On 14 May 2012 22:05, Blue Swirl <blauwirbel@gmail.com> wrote:
> While working on the AREG0 patches, I noticed strange code in
> target-i386/translate.c.

> It's accessed like this (line 3537):
> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>
> b >> 8 can be only either 1 or 0.

I don't think this is true. At this point in the code we're inside
a "switch (b)" so we know that b is either 0x22a (cvtsi2ss) or
0x32a (cvtsi2sd). So "((b >> 8) - 2)" is 0 for cvtsi2ss and 1
for cvtsi2sd, giving us the lsbit of the array index, with
(s->dflag == 2) providing the next bit, so we end up with
indexes 0,1,2,3 in this table for these two insns in their
doubleword and quadword forms.

You could rewrite "((b >> 8) - 2)" as "((b >> 8) & 1)".

> The other access is as follows (line 3594):
> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
> (b & 1) * 4];
>
> This looks better because of + 4 but I think some array values are not
> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).

Here we know b is 0x22c (cvttss2si) 0x32c (cvttsd2si) 0x22d (cvtss2si)
or 0x32d (cvtsd2si). ((b >> 8) - 2) distinguishes the 0x2XX and 0x3XX,
and (b & 1) the 0xXXc from 0xXXd. So the index is made up of (lsbit to
msbit) "0x2XX or 0x3XX?", "double or quad?", "0xXXC or 0xXXD?", and then
we add a constant offset of 4 because the entries start after the
4 entries for the cases we looked at earlier.

I think you could actually split sse_op_table3 into two separate
tables, one for each of these cases, which would be slightly
clearer IMHO.

-- PMM
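The index arithmetic described above can be checked in isolation. The following is a minimal sketch, not the translate.c code: strings stand in for the gen_helper_* pointers, and the names index_int_to_fp, index_fp_to_int, helper_name and count_reachable_slots are made up for illustration. It assumes dflag is 2 for 64-bit operand size and 1 for 32-bit, and that b only takes the opcode values listed in the reply.

```c
#include <assert.h>

/* Same order as sse_op_table3 in the thread; the strings are
 * illustrative stand-ins for the gen_helper_* function pointers. */
static const char *const sse_op_table3_names[4 * 3] = {
    "cvtsi2ss",  "cvtsi2sd",  "cvtsq2ss",  "cvtsq2sd",
    "cvttss2si", "cvttsd2si", "cvttss2sq", "cvttsd2sq",
    "cvtss2si",  "cvtsd2si",  "cvtss2sq",  "cvtsd2sq",
};

/* First access: b is 0x22a (cvtsi2ss) or 0x32a (cvtsi2sd),
 * so (b >> 8) is 2 or 3 and the subtraction yields 0 or 1. */
int index_int_to_fp(int b, int dflag)
{
    return (dflag == 2) * 2 + ((b >> 8) - 2);
}

/* Second access: b is 0x22c, 0x32c, 0x22d or 0x32d.
 * (b & 1) selects the truncating (0xXXc) or rounding (0xXXd) group. */
int index_fp_to_int(int b, int dflag)
{
    return (dflag == 2) * 2 + ((b >> 8) - 2) + 4 + (b & 1) * 4;
}

/* Bounds-checked lookup of the helper name for a computed index. */
const char *helper_name(int index)
{
    assert(index >= 0 && index < 4 * 3);
    return sse_op_table3_names[index];
}

/* Counts how many distinct table slots the two accesses can reach
 * over all listed opcodes and both operand sizes. */
int count_reachable_slots(void)
{
    static const int int_to_fp_ops[] = { 0x22a, 0x32a };
    static const int fp_to_int_ops[] = { 0x22c, 0x32c, 0x22d, 0x32d };
    int seen[4 * 3] = { 0 };
    int count = 0;

    for (int wide = 0; wide <= 1; wide++) {
        int dflag = wide ? 2 : 1;   /* assumed encoding: 2 means 64-bit */
        for (int i = 0; i < 2; i++) {
            seen[index_int_to_fp(int_to_fp_ops[i], dflag)] = 1;
        }
        for (int i = 0; i < 4; i++) {
            seen[index_fp_to_int(fp_to_int_ops[i], dflag)] = 1;
        }
    }
    for (int i = 0; i < 4 * 3; i++) {
        count += seen[i];
    }
    return count;
}
```

Under these assumptions all 12 slots are reached and no index is negative, because b >> 8 is 2 or 3 at both access sites rather than 0 or 1.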
* Re: [Qemu-devel] x86: cvtsi2s{s,d} etc. array access
From: Blue Swirl @ 2012-05-15 17:41 UTC
To: Peter Maydell; +Cc: qemu-devel

On Tue, May 15, 2012 at 5:27 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 14 May 2012 22:05, Blue Swirl <blauwirbel@gmail.com> wrote:
>> While working on the AREG0 patches, I noticed strange code in
>> target-i386/translate.c.
>
>> It's accessed like this (line 3537):
>> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2)];
>>
>> b >> 8 can be only either 1 or 0.
>
> I don't think this is true. At this point in the code we're inside
> a "switch (b)" so we know that b is either 0x22a (cvtsi2ss) or
> 0x32a (cvtsi2sd). So "((b >> 8) - 2)" is 0 for cvtsi2ss and 1
> for cvtsi2sd, giving us the lsbit of the array index, with
> (s->dflag == 2) providing the next bit, so we end up with
> indexes 0,1,2,3 in this table for these two insns in their
> doubleword and quadword forms.

OK, I misread the start of the function pretty badly.

>
> You could rewrite "((b >> 8) - 2)" as "((b >> 8) & 1)".
>
>> The other access is as follows (line 3594):
>> sse_op2 = sse_op_table3[(s->dflag == 2) * 2 + ((b >> 8) - 2) + 4 +
>> (b & 1) * 4];
>>
>> This looks better because of + 4 but I think some array values are not
>> accessible (max. 1 * 2 + (1 - 2) + 4 + 1 * 4 == 9).
>
> Here we know b is 0x22c (cvttss2si) 0x32c (cvttsd2si) 0x22d (cvtss2si)
> or 0x32d (cvtsd2si). ((b >> 8) - 2) distinguishes the 0x2XX and 0x3XX,
> and (b & 1) the 0xXXc from 0xXXd. So the index is made up of (lsbit to
> msbit) "0x2XX or 0x3XX?", "double or quad?", "0xXXC or 0xXXD?", and then
> we add a constant offset of 4 because the entries start after the
> 4 entries for the cases we looked at earlier.
>
> I think you could actually split sse_op_table3 into two separate
> tables, one for each of these cases, which would be slightly
> clearer IMHO.

Yes, this is IMHO ugly and there is no type safety due to void
pointers. There could be also an inner switch, like how cvttps2pi is
handled nearby.

>
> -- PMM
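The split suggested above, together with the type-safety concern, might look roughly like the following. This is a sketch under loud assumptions: FpRegStub, the stub_* functions, the two function-pointer typedefs and the lookup_* functions are hypothetical stand-ins, since the real gen_helper_* functions emit TCG ops and have different signatures. It only illustrates replacing one void * table with two typed tables indexed by the same three bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register stand-in; not a QEMU type. */
typedef struct {
    double value;
} FpRegStub;

/* One distinct pointer type per table instead of void *. */
typedef void (*IntToFpStub)(FpRegStub *dst, int64_t src);
typedef int64_t (*FpToIntStub)(const FpRegStub *src);

/* Stub bodies are placeholders; only the signatures matter here. */
#define INT_TO_FP_STUB(name) \
    static void name(FpRegStub *dst, int64_t src) { dst->value = (double)src; }
#define FP_TO_INT_STUB(name) \
    static int64_t name(const FpRegStub *src) { return (int64_t)src->value; }

INT_TO_FP_STUB(stub_cvtsi2ss)
INT_TO_FP_STUB(stub_cvtsi2sd)
INT_TO_FP_STUB(stub_cvtsq2ss)
INT_TO_FP_STUB(stub_cvtsq2sd)

FP_TO_INT_STUB(stub_cvttss2si)
FP_TO_INT_STUB(stub_cvttsd2si)
FP_TO_INT_STUB(stub_cvttss2sq)
FP_TO_INT_STUB(stub_cvttsd2sq)
FP_TO_INT_STUB(stub_cvtss2si)
FP_TO_INT_STUB(stub_cvtsd2si)
FP_TO_INT_STUB(stub_cvtss2sq)
FP_TO_INT_STUB(stub_cvtsd2sq)

/* Indexed by [dflag == 2][(b >> 8) & 1]; b is 0x22a or 0x32a. */
static const IntToFpStub int_to_fp_table[2][2] = {
    { stub_cvtsi2ss, stub_cvtsi2sd },
    { stub_cvtsq2ss, stub_cvtsq2sd },
};

/* Indexed by [b & 1][dflag == 2][(b >> 8) & 1];
 * b is 0x22c, 0x32c, 0x22d or 0x32d. */
static const FpToIntStub fp_to_int_table[2][2][2] = {
    { { stub_cvttss2si, stub_cvttsd2si }, { stub_cvttss2sq, stub_cvttsd2sq } },
    { { stub_cvtss2si,  stub_cvtsd2si  }, { stub_cvtss2sq,  stub_cvtsd2sq  } },
};

IntToFpStub lookup_int_to_fp(int b, int dflag)
{
    IntToFpStub fn = int_to_fp_table[dflag == 2][(b >> 8) & 1];
    assert(fn != NULL);   /* a 32-bit-only build could leave 64-bit slots empty */
    return fn;
}

FpToIntStub lookup_fp_to_int(int b, int dflag)
{
    FpToIntStub fn = fp_to_int_table[b & 1][dflag == 2][(b >> 8) & 1];
    assert(fn != NULL);
    return fn;
}
```

With typed tables, putting a float-to-int helper into the int-to-float table produces a compiler diagnostic rather than a silent void * conversion, and the constant offset of 4 disappears because each table starts at zero. In a 32-bit-only build the 64-bit slots would presumably be NULL, as with X86_64_ONLY in the original table, which is what the asserts are meant to catch.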