* [Qemu-devel] x86 tcg problem
@ 2008-07-29 3:32 Vince Weaver
2008-07-29 8:46 ` Laurent Desnogues
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Vince Weaver @ 2008-07-29 3:32 UTC (permalink / raw)
To: qemu-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 2337 bytes --]
Hello
I've spent a day now trying to figure out why bzip2 compress/decompress
doesn't work when using sparc32plus-linux-user on x86.
I've tracked the problem to the Zero flag being improperly set (attached
is a small exe/src that reproduces the problem.. it reports "Greater"
on real hardware, "Less Than" on qemu current).
The issue seems to be a misordering of an x86 sub instruction. I tried to
track this down in the tcg code but I quickly got lost.
The code does this for a compare (on sparc the compare turns into a
subtract with result as the [ignores] zero reg):
mov_i32 cc_src_0,g4_0 ;
mov_i32 cc_src_1,g4_1 ; load g4 (0xaae60)
mov_i32 cc_src2_0,g3_0 ;
mov_i32 cc_src2_1,g3_1 ; load g3 (0)
sub2_i32 cc_dst_0,cc_dst_1,cc_src2_0,cc_src2_1,cc_src_0,cc_src_1
; result = 0xaafe0-0
movi_i32 psr,$0x0 ; clear psr
mov_i32 tmp42,cc_dst_0 ; get cc_dst_0
movi_i32 tmp43,$0x0 ;
movi_i32 tmp44,$0x0 ;
movi_i32 tmp45,$0x0 ; zero extends
brcond2_i32 tmp42,tmp43,tmp44,tmp45,$0x1,$0x0 ; if not zero, skip
movi_i32 tmp19,$0x400000 ; else set zero flag
which converts into x86:
0xb80da04d: sub %ecx,%eax ; %ecx = g4-g3
0xb80da04f: sbb %ebx,%edx
0xb80da051: mov %eax,0x6c(%ebp) ; saving g3, not the result (ecx)!
0xb80da054: mov %edx,0x70(%ebp) ;
0xb80da057: xor %edx,%edx
0xb80da059: xor %ecx,%ecx ; clearing our result for use as psr
; result is lost!
; the later test for zero is done
; against g3 instead, which
; sets the zero flag when it
... ; shouldn't
0xb80da06f: test %eax,%eax
0xb80da071: jne 0xb80da091 ; skip if not zero
..
0xb80da07f: mov 0x8c(%ebp),%eax ; load psr
0xb80da085: or $0x400000,%eax ; set zero flag
So unless there's some weird AT&T/intel ordering thing that is confusing
me (please let me know if I am missing something), TCG is getting confused
about which argument of the subtract is the result. I'm not sure how to
fix this though...
Thanks for any help
Vince
[-- Attachment #2: Type: TEXT/PLAIN, Size: 1471 bytes --]
! + Syscalls have number in %g1, options in %o0,%o1,...
! Result returned in %o0
! Linux syscall is called by "ta 0x10"
.equ SYSCALL_EXIT,1
.equ SYSCALL_WRITE,4
.equ STDOUT,1
.globl _start
_start:
set data_region,%i0
set 0xae60,%g4
ld [ %i0 + 0x4 ], %g3
inc %g3
cmp %g4,%g3
ble less_equal
nop
ba greater
nop
less_equal:
set lessequal_string,%o1
ba write_stdout
nop
greater:
set greater_string,%o1
#================================
# WRITE_STDOUT
#================================
# %o1 has string
write_stdout:
set SYSCALL_WRITE,%g1 ! Write syscall in %g1
set STDOUT,%o0 ! 1 in %o0 (stdout)
set 0,%o2 ! 0 (count) in %o2
str_loop1:
ldub [%o1+%o2],%l0 ! load byte
cmp %l0,%g0 ! compare against zero
bnz str_loop1 ! if not nul, repeat
# BRANCH DELAY SLOT
inc %o2 ! increment count
dec %o2 ! correct count
ta 0x10 ! run the syscall
exit:
mov 0,%o0 ! exit value
mov SYSCALL_EXIT,%g1 ! put the exit syscall number in g1
ta 0x10 ! and exit
!===========================================================================
.data
!===========================================================================
data_region: .int -1,-1
greater_string: .ascii "Greater\n\0"
lessequal_string: .ascii "Less Equal\n\0"
[-- Attachment #3: Type: APPLICATION/octet-stream, Size: 985 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] x86 tcg problem
2008-07-29 3:32 [Qemu-devel] x86 tcg problem Vince Weaver
@ 2008-07-29 8:46 ` Laurent Desnogues
2008-07-29 13:43 ` Vince Weaver
2008-07-29 18:24 ` malc
2008-07-29 17:18 ` Blue Swirl
2008-07-29 17:51 ` [Qemu-devel] x86 tcg problem Blue Swirl
2 siblings, 2 replies; 15+ messages in thread
From: Laurent Desnogues @ 2008-07-29 8:46 UTC (permalink / raw)
To: qemu-devel
On Tue, Jul 29, 2008 at 5:32 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> I've spent a day now trying to figure out why bzip2 compress/decompress
> doesn't work when using sparc32plus-linux-user on x86.
>
> I've tracked the problem to the Zero flag being improperly set (attached is
> a small exe/src that reproduces the problem.. it reports "Greater"
> on real hardware, "Less Than" on qemu current).
It looks like it works with revision 4963:
./sparc32plus-linux-user/qemu-sparc32plus cmp_test
Less Equal
I guess it's malc commits 4961-2 that fixed it :-)
Laurent
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] x86 tcg problem
2008-07-29 8:46 ` Laurent Desnogues
@ 2008-07-29 13:43 ` Vince Weaver
2008-07-29 18:24 ` malc
1 sibling, 0 replies; 15+ messages in thread
From: Vince Weaver @ 2008-07-29 13:43 UTC (permalink / raw)
To: qemu-devel
On Tue, 29 Jul 2008, Laurent Desnogues wrote:
> On Tue, Jul 29, 2008 at 5:32 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>>
>> I've spent a day now trying to figure out why bzip2 compress/decompress
>> doesn't work when using sparc32plus-linux-user on x86.
>>
>> I've tracked the problem to the Zero flag being improperly set (attached is
>> a small exe/src that reproduces the problem.. it reports "Greater"
>> on real hardware, "Less Than" on qemu current).
>
> It looks like it works with revision 4963:
>
> ./sparc32plus-linux-user/qemu-sparc32plus cmp_test
> Less Equal
>
> I guess it's malc commits 4961-2 that fixed it :-)
I made a slight mistake in that post, "Less Equal" is the wrong condition,
not "Less Than". It still doesn't work, even with 4963. Real hardware
returns this:
(valor is a Niagara system)
valor:~/research/sparc_qemu% ./cmp_test
Greater
Vince
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] x86 tcg problem
2008-07-29 3:32 [Qemu-devel] x86 tcg problem Vince Weaver
2008-07-29 8:46 ` Laurent Desnogues
@ 2008-07-29 17:18 ` Blue Swirl
2008-08-05 20:20 ` [Qemu-devel] another SPARC issue Vince Weaver
2008-07-29 17:51 ` [Qemu-devel] x86 tcg problem Blue Swirl
2 siblings, 1 reply; 15+ messages in thread
From: Blue Swirl @ 2008-07-29 17:18 UTC (permalink / raw)
To: qemu-devel
On 7/29/08, Vince Weaver <vince@csl.cornell.edu> wrote:
> Hello
>
> I've spent a day now trying to figure out why bzip2 compress/decompress
> doesn't work when using sparc32plus-linux-user on x86.
>
> I've tracked the problem to the Zero flag being improperly set (attached is
> a small exe/src that reproduces the problem.. it reports "Greater"
> on real hardware, "Less Than" on qemu current).
>
> The issue seems to be a misordering of an x86 sub instruction. I tried to
> track this down in the tcg code but I quickly got lost.
>
> The code does this for a compare (on sparc the compare turns into a
> subtract with result as the [ignores] zero reg):
>
> mov_i32 cc_src_0,g4_0 ;
> mov_i32 cc_src_1,g4_1 ; load g4 (0xaae60)
> mov_i32 cc_src2_0,g3_0 ;
> mov_i32 cc_src2_1,g3_1 ; load g3 (0)
> sub2_i32
> cc_dst_0,cc_dst_1,cc_src2_0,cc_src2_1,cc_src_0,cc_src_1
> ; result = 0xaafe0-0
> movi_i32 psr,$0x0 ; clear psr
> mov_i32 tmp42,cc_dst_0 ; get cc_dst_0
> movi_i32 tmp43,$0x0 ;
> movi_i32 tmp44,$0x0 ;
> movi_i32 tmp45,$0x0 ; zero extends
> brcond2_i32 tmp42,tmp43,tmp44,tmp45,$0x1,$0x0 ; if not
> zero, skip
> movi_i32 tmp19,$0x400000 ; else set zero flag
>
>
>
> which converts into x86:
> 0xb80da04d: sub %ecx,%eax ; %ecx = g4-g3
> 0xb80da04f: sbb %ebx,%edx
> 0xb80da051: mov %eax,0x6c(%ebp) ; saving g3, not the result (ecx)!
> 0xb80da054: mov %edx,0x70(%ebp) ;
> 0xb80da057: xor %edx,%edx
> 0xb80da059: xor %ecx,%ecx ; clearing our result for use as
> psr
> ; result is lost!
> ; the later test for zero is done
> ; against g3 instead, which
> ; sets the zero flag when it
> ... ; shouldn't
> 0xb80da06f: test %eax,%eax
> 0xb80da071: jne 0xb80da091 ; skip if not zero
> ..
> 0xb80da07f: mov 0x8c(%ebp),%eax ; load psr
> 0xb80da085: or $0x400000,%eax ; set zero flag
>
>
> So unless there's some weird AT&T/intel ordering thing that is confusing me
> (please let me know if I am missing something), TCG is getting confused
> about which argument of the subtract is the result. I'm not sure how to fix
> this though...
Thank you for the analysis! IIRC sub %ecx, %eax is in C:
eax -= ecx;
Still, I can reproduce this, and also amd64 is not correct:
---- 0x1008c
mov_i64 cc_src,g4
mov_i64 cc_src2,g3
sub_i64 cc_dst,cc_src,cc_src2
movi_i32 psr,$0x0
movi_i64 tmp22,$0xffffffff
and_i64 tmp21,cc_dst,tmp22
movi_i64 tmp22,$0x0
brcond_i64 tmp21,tmp22,$0x1,$0x0
0x601c287b: mov 0x20(%r14),%rcx
0x601c287f: mov %rdx,%r8
0x601c2882: mov %rcx,%r9
0x601c2885: sub %r8,%r9
0x601c2888: mov %r9,%rax
0x601c288b: and $0xffffffff,%eax
0x601c2891: mov %rsi,0x10a58(%r14)
0x601c2898: mov %rdi,0x10a60(%r14)
0x601c289f: mov %rcx,0x60(%r14)
0x601c28a3: mov %r8,0x68(%r14)
0x601c28a7: mov %r9,0x70(%r14)
0x601c28ab: xor %edi,%edi
0x601c28ad: mov %edi,0x90(%r14)
0x601c28b4: mov %rdx,0x18(%r14)
0x601c28b8: test %rax,%rax
0x601c28bb: jne 0x601c28d5
Though gen_op_sub_cc C flag generation part looks suspicious.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] x86 tcg problem
2008-07-29 3:32 [Qemu-devel] x86 tcg problem Vince Weaver
2008-07-29 8:46 ` Laurent Desnogues
2008-07-29 17:18 ` Blue Swirl
@ 2008-07-29 17:51 ` Blue Swirl
2 siblings, 0 replies; 15+ messages in thread
From: Blue Swirl @ 2008-07-29 17:51 UTC (permalink / raw)
To: qemu-devel
On 7/29/08, Vince Weaver <vince@csl.cornell.edu> wrote:
> Hello
>
> I've spent a day now trying to figure out why bzip2 compress/decompress
> doesn't work when using sparc32plus-linux-user on x86.
>
> I've tracked the problem to the Zero flag being improperly set (attached is
> a small exe/src that reproduces the problem.. it reports "Greater"
> on real hardware, "Less Than" on qemu current).
>
> The issue seems to be a misordering of an x86 sub instruction. I tried to
> track this down in the tcg code but I quickly got lost.
>
> The code does this for a compare (on sparc the compare turns into a
> subtract with result as the [ignores] zero reg):
>
> mov_i32 cc_src_0,g4_0 ;
> mov_i32 cc_src_1,g4_1 ; load g4 (0xaae60)
> mov_i32 cc_src2_0,g3_0 ;
> mov_i32 cc_src2_1,g3_1 ; load g3 (0)
> sub2_i32
> cc_dst_0,cc_dst_1,cc_src2_0,cc_src2_1,cc_src_0,cc_src_1
> ; result = 0xaafe0-0
> movi_i32 psr,$0x0 ; clear psr
> mov_i32 tmp42,cc_dst_0 ; get cc_dst_0
> movi_i32 tmp43,$0x0 ;
> movi_i32 tmp44,$0x0 ;
> movi_i32 tmp45,$0x0 ; zero extends
> brcond2_i32 tmp42,tmp43,tmp44,tmp45,$0x1,$0x0 ; if not
> zero, skip
> movi_i32 tmp19,$0x400000 ; else set zero flag
>
>
>
> which converts into x86:
> 0xb80da04d: sub %ecx,%eax ; %ecx = g4-g3
> 0xb80da04f: sbb %ebx,%edx
> 0xb80da051: mov %eax,0x6c(%ebp) ; saving g3, not the result (ecx)!
> 0xb80da054: mov %edx,0x70(%ebp) ;
> 0xb80da057: xor %edx,%edx
> 0xb80da059: xor %ecx,%ecx ; clearing our result for use as
> psr
> ; result is lost!
> ; the later test for zero is done
> ; against g3 instead, which
> ; sets the zero flag when it
> ... ; shouldn't
> 0xb80da06f: test %eax,%eax
> 0xb80da071: jne 0xb80da091 ; skip if not zero
> ..
> 0xb80da07f: mov 0x8c(%ebp),%eax ; load psr
> 0xb80da085: or $0x400000,%eax ; set zero flag
>
>
> So unless there's some weird AT&T/intel ordering thing that is confusing me
> (please let me know if I am missing something), TCG is getting confused
> about which argument of the subtract is the result. I'm not sure how to fix
> this though...
Now I found the problem, (1 << 31) could be negative on 64 bit
variable. I'm testing a fix. It looks like this also solves some of
the Sparc64 boot problems, like Aurora 2.0 which complained about CRC
error.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] x86 tcg problem
2008-07-29 8:46 ` Laurent Desnogues
2008-07-29 13:43 ` Vince Weaver
@ 2008-07-29 18:24 ` malc
2008-07-29 18:33 ` Laurent Desnogues
1 sibling, 1 reply; 15+ messages in thread
From: malc @ 2008-07-29 18:24 UTC (permalink / raw)
To: qemu-devel
On Tue, 29 Jul 2008, Laurent Desnogues wrote:
> On Tue, Jul 29, 2008 at 5:32 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>>
>> I've spent a day now trying to figure out why bzip2 compress/decompress
>> doesn't work when using sparc32plus-linux-user on x86.
>>
>> I've tracked the problem to the Zero flag being improperly set (attached is
>> a small exe/src that reproduces the problem.. it reports "Greater"
>> on real hardware, "Less Than" on qemu current).
>
> It looks like it works with revision 4963:
>
> ./sparc32plus-linux-user/qemu-sparc32plus cmp_test
> Less Equal
>
> I guess it's malc commits 4961-2 that fixed it :-)
THat would have been a miracle since those do not touch x86 parts at all
:)
--
mailto:av1474@comtv.ru
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] x86 tcg problem
2008-07-29 18:24 ` malc
@ 2008-07-29 18:33 ` Laurent Desnogues
0 siblings, 0 replies; 15+ messages in thread
From: Laurent Desnogues @ 2008-07-29 18:33 UTC (permalink / raw)
To: qemu-devel
On Tue, Jul 29, 2008 at 8:24 PM, malc <av1474@comtv.ru> wrote:
>> I guess it's malc commits 4961-2 that fixed it :-)
>
> THat would have been a miracle since those do not touch x86 parts at all
> :)
Well, hum, is exhaustion enough of an excuse for stupidity? :)
Laurent
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] another SPARC issue
2008-07-29 17:18 ` Blue Swirl
@ 2008-08-05 20:20 ` Vince Weaver
2008-08-05 20:36 ` Laurent Desnogues
0 siblings, 1 reply; 15+ messages in thread
From: Vince Weaver @ 2008-08-05 20:20 UTC (permalink / raw)
To: qemu-devel; +Cc: Blue Swirl
Hello
so I'm still trying to get bzip to work under sparc32plus-linux-user
This time the problem is with the "ldda" instruction, which is loading off
of the block-read %asi (I am learning way more than I want to about
obscure parts of the SPARC architecture).
The code involved is doing an optimized memory copy, so it's using the
floating point unit to accelerate this.
The instruction is
ldda [ %o1 + 0x40 ] %asi, %f16
but under Qemu for some reason 0x80 is added to %o1 (instead of 0x40).
Indeed if you look at the generated TCG (see below), the 0x40 is being
added in twice. I was digging around in translate.c but can't seem to see
why this is happning.
Any help would be appreciated...
Thanks
Vince
IN: __memcpy_large
0x000000000002b73c: ldda [ %o1 + 0x40 ] %asi, %f16
OP after la:
ld_i32 tmp11,regwptr,$0x8
ld_i32 tmp12,regwptr,$0xc
movi_i32 tmp15,$0x40
movi_i32 tmp16,$0x0
add2_i32 loc9,loc10,tmp11,tmp12,tmp15,tmp16
movi_i32 pc_0,$0x2b73c
movi_i32 pc_1,$0x0
movi_i32 npc_0,$0x2b740
movi_i32 npc_1,$0x0
movi_i32 tmp15,$0x40
movi_i32 tmp16,$0x0
add2_i32 loc9,loc10,loc9,loc10,tmp15,tmp16
ld_i32 tmp17,env,$0xd9c8
movi_i32 tmp18,$0x8
movi_i32 tmp19,$0x10
movi_i32 tmp20,$helper_ldf_asi
call tmp20,$0x0,$0,loc9,loc10,tmp17,tmp18,tmp19
movi_i32 pc_0,$0x2b740
movi_i32 pc_1,$0x0
movi_i32 npc_0,$0x2b744
movi_i32 npc_1,$0x0
movi_i32 tmp17,$helper_debug
call tmp17,$0x0,$0
exit_tb $0x0
end
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] another SPARC issue
2008-08-05 20:20 ` [Qemu-devel] another SPARC issue Vince Weaver
@ 2008-08-05 20:36 ` Laurent Desnogues
2008-08-06 2:33 ` Vince Weaver
0 siblings, 1 reply; 15+ messages in thread
From: Laurent Desnogues @ 2008-08-05 20:36 UTC (permalink / raw)
To: qemu-devel
On Tue, Aug 5, 2008 at 10:20 PM, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> The instruction is
> ldda [ %o1 + 0x40 ] %asi, %f16
>
> but under Qemu for some reason 0x80 is added to %o1 (instead of 0x40).
>
> Indeed if you look at the generated TCG (see below), the 0x40 is being added
> in twice. I was digging around in translate.c but can't seem to see why
> this is happning.
>
> Any help would be appreciated...
Since I am not afraid to be ridiculous again, here is my take:
- in disas_sparc_insn, line 4199:
} else if (IS_IMM) { /* immediate */
rs2 = GET_FIELDs(insn, 19, 31);
tcg_gen_addi_tl(cpu_addr, cpu_src1, (int)rs2);
- line 4307:
case 0x13: /* load double word alternate */
save_state(dc, cpu_cond);
gen_ldda_asi(cpu_val, cpu_addr, insn, rd);
static inline void gen_ldda_asi(TCGv hi, TCGv addr, int insn, int rd)
{
TCGv r_asi, r_rd;
r_asi = gen_get_asi(insn, addr);
r_rd = tcg_const_i32(rd);
tcg_gen_helper_0_3(helper_ldda_asi, addr, r_asi, r_rd);
tcg_temp_free(r_rd);
tcg_temp_free(r_asi);
}
static inline TCGv gen_get_asi(int insn, TCGv r_addr)
{
int asi, offset;
TCGv r_asi;
if (IS_IMM) {
r_asi = tcg_temp_new(TCG_TYPE_I32);
offset = GET_FIELD(insn, 25, 31);
tcg_gen_addi_tl(r_addr, r_addr, offset);
tcg_gen_ld_i32(r_asi, cpu_env, offsetof(CPUSPARCState, asi));
} else {
asi = GET_FIELD(insn, 19, 26);
r_asi = tcg_const_i32(asi);
}
return r_asi;
}
I am not sure I followed the right path, but it seems that indeed
for asi ld/st the constant gets added twice: once in the disas
function and once in gen_get_asi.
Laurent
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] another SPARC issue
2008-08-05 20:36 ` Laurent Desnogues
@ 2008-08-06 2:33 ` Vince Weaver
2008-08-06 8:17 ` Laurent Desnogues
2008-08-06 15:29 ` Blue Swirl
0 siblings, 2 replies; 15+ messages in thread
From: Vince Weaver @ 2008-08-06 2:33 UTC (permalink / raw)
To: qemu-devel
> I am not sure I followed the right path, but it seems that indeed
> for asi ld/st the constant gets added twice: once in the disas
> function and once in gen_get_asi.
I agree.
The following patch to gen_get_asi() seems to fix things for me, the
bzip executable I am using gets much further along with this patch
applied (still not working 100% though).
>From what I can tell from the sparcv9 manual this patch is correct, but I
might be missing something.
--- translate.c.org 2008-08-05 22:30:15.000000000 -0400
+++ translate.c 2008-08-05 22:30:26.000000000 -0400
@@ -1641,13 +1641,11 @@
#ifdef TARGET_SPARC64
static inline TCGv gen_get_asi(int insn, TCGv r_addr)
{
- int asi, offset;
+ int asi;
TCGv r_asi;
if (IS_IMM) {
r_asi = tcg_temp_new(TCG_TYPE_I32);
- offset = GET_FIELD(insn, 25, 31);
- tcg_gen_addi_tl(r_addr, r_addr, offset);
tcg_gen_ld_i32(r_asi, cpu_env, offsetof(CPUSPARCState, asi));
} else {
asi = GET_FIELD(insn, 19, 26);
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] another SPARC issue
2008-08-06 2:33 ` Vince Weaver
@ 2008-08-06 8:17 ` Laurent Desnogues
2008-08-06 15:29 ` Blue Swirl
1 sibling, 0 replies; 15+ messages in thread
From: Laurent Desnogues @ 2008-08-06 8:17 UTC (permalink / raw)
To: qemu-devel
On Wed, Aug 6, 2008 at 4:33 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> From what I can tell from the sparcv9 manual this patch is correct, but I
> might be missing something.
>
> --- translate.c.org 2008-08-05 22:30:15.000000000 -0400
> +++ translate.c 2008-08-05 22:30:26.000000000 -0400
> @@ -1641,13 +1641,11 @@
> #ifdef TARGET_SPARC64
> static inline TCGv gen_get_asi(int insn, TCGv r_addr)
> {
> - int asi, offset;
> + int asi;
> TCGv r_asi;
>
> if (IS_IMM) {
> r_asi = tcg_temp_new(TCG_TYPE_I32);
> - offset = GET_FIELD(insn, 25, 31);
> - tcg_gen_addi_tl(r_addr, r_addr, offset);
> tcg_gen_ld_i32(r_asi, cpu_env, offsetof(CPUSPARCState, asi));
> } else {
> asi = GET_FIELD(insn, 19, 26);
My understanding of v9 manual is the same as yours. Your patch looks
correct.
Laurent
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] another SPARC issue
2008-08-06 2:33 ` Vince Weaver
2008-08-06 8:17 ` Laurent Desnogues
@ 2008-08-06 15:29 ` Blue Swirl
2008-08-06 19:31 ` [Qemu-devel] one more " Vince Weaver
1 sibling, 1 reply; 15+ messages in thread
From: Blue Swirl @ 2008-08-06 15:29 UTC (permalink / raw)
To: qemu-devel
On 8/6/08, Vince Weaver <vince@csl.cornell.edu> wrote:
>
>
> > I am not sure I followed the right path, but it seems that indeed
> > for asi ld/st the constant gets added twice: once in the disas
> > function and once in gen_get_asi.
> >
>
> I agree.
>
> The following patch to gen_get_asi() seems to fix things for me, the
> bzip executable I am using gets much further along with this patch applied
> (still not working 100% though).
>
> From what I can tell from the sparcv9 manual this patch is correct, but I
> might be missing something.
Thanks, applied.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] one more SPARC issue
2008-08-06 15:29 ` Blue Swirl
@ 2008-08-06 19:31 ` Vince Weaver
2008-08-06 19:45 ` Julian Seward
2008-08-06 19:55 ` Blue Swirl
0 siblings, 2 replies; 15+ messages in thread
From: Vince Weaver @ 2008-08-06 19:31 UTC (permalink / raw)
To: qemu-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1343 bytes --]
I found one last SPARC issue, this time with the falign() function.
This one was a pain to track down because gdb won't let you show the %gsr
register, or any of the floating point registers above %f32.
The falign code does this:
tmp = (*((uint64_t *)&DT0)) << ((env->gsr & 7) * 8);
tmp |= (*((uint64_t *)&DT1)) >> (64 - (env->gsr & 7) * 8);
But in the case where %gsr is zero, the second case turns into a
shift left of 64, which on many architectures turns into a no-op (rather
than clearning the result to zero).
So in this case, the output of the falign was rs1 *or'd* with rs2, rather
than just plain rs1.
I've included a patch below that fixes things for me, and I've also
attached a test case that shows the bug.
With this, I can finally run the spec 2006 bzip2 benchmark to completion
under sparc32plus-linux-user
Vince
--- op_helper.c.orig 2008-07-29 22:38:34.000000000 -0400
+++ op_helper.c 2008-08-06 15:24:51.000000000 -0400
@@ -234,7 +234,12 @@
uint64_t tmp;
tmp = (*((uint64_t *)&DT0)) << ((env->gsr & 7) * 8);
- tmp |= (*((uint64_t *)&DT1)) >> (64 - (env->gsr & 7) * 8);
+
+ /* on many architectures a shift of 64 does nothing */
+ if ( (env->gsr & 7) !=0) {
+ tmp |= (*((uint64_t *)&DT1)) >> (64 - (env->gsr & 7) * 8);
+ }
+
*((uint64_t *)&DT0) = tmp;
}
[-- Attachment #2: Type: TEXT/PLAIN, Size: 2360 bytes --]
! compile with
! as -xarch=v8plusa -o falign_test.o falign_test.s ;
! ld -o falign_test falign_test.o
! + Syscalls have number in %g1, options in %o0,%o1,...
! Result returned in %o0
! Linux syscall is called by "ta 0x10"
.equ SYSCALL_EXIT,1
.equ SYSCALL_WRITE,4
.equ STDOUT,1
.globl _start
_start:
!
! print the source!
!
set source,%o1
call write_stdout
nop
!
! copy the string
!
set source,%o1
set destination,%o0
alignaddr %o1, %g0, %o3
membar #StoreStore|#LoadStore|#StoreLoad
wr %g0, 0xf0, %asi ! set block-copy asi
nop
ldda [ %o1 ] %asi, %f0
nop
faligndata %f0, %f2, %f16
faligndata %f2, %f4, %f18
faligndata %f4, %f6, %f20
faligndata %f6, %f8, %f22
faligndata %f8, %f10, %f24
faligndata %f10, %f12, %f26
faligndata %f12, %f14, %f28
faligndata %f14, %f16, %f30
after:
stda %f16, [ %o0 ] %asi
nop
nop
!
! print the copy
!
set destination,%o1
call write_stdout
nop
ba exit
nop
#================================
# WRITE_STDOUT
#================================
# %o1 has string
write_stdout:
set SYSCALL_WRITE,%g1 ! Write syscall in %g1
set STDOUT,%o0 ! 1 in %o0 (stdout)
set 0,%o2 ! 0 (count) in %o2
str_loop1:
ldub [%o1+%o2],%l0 ! load byte
cmp %l0,%g0 ! compare against zero
bnz str_loop1 ! if not nul, repeat
# BRANCH DELAY SLOT
inc %o2 ! increment count
dec %o2 ! correct count
ta 0x10 ! run the syscall
retl ! return
nop
exit:
mov 0,%o0 ! exit value
mov SYSCALL_EXIT,%g1 ! put the exit syscall number in g1
ta 0x10 ! and exit
!===========================================================================
.data
!===========================================================================
.balign 64 !(needs to be on 64-byte boundary)
! 1 2 3 4
! 0123456789012345678901234567890123456789012345678
source: .ascii "The quick brown fox jumped over "
.ascii "the lazy dog! Need more fill!\n\0"
.lcomm destination,64
[-- Attachment #3: Type: APPLICATION/octet-stream, Size: 1099 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] one more SPARC issue
2008-08-06 19:31 ` [Qemu-devel] one more " Vince Weaver
@ 2008-08-06 19:45 ` Julian Seward
2008-08-06 19:55 ` Blue Swirl
1 sibling, 0 replies; 15+ messages in thread
From: Julian Seward @ 2008-08-06 19:45 UTC (permalink / raw)
To: qemu-devel
On Wednesday 06 August 2008 21:31, Vince Weaver wrote:
> + /* on many architectures a shift of 64 does nothing */
Not so much "does nothing" as "has undefined semantics in C";
hence all bets are off for shifts >= the word size.
J
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] one more SPARC issue
2008-08-06 19:31 ` [Qemu-devel] one more " Vince Weaver
2008-08-06 19:45 ` Julian Seward
@ 2008-08-06 19:55 ` Blue Swirl
1 sibling, 0 replies; 15+ messages in thread
From: Blue Swirl @ 2008-08-06 19:55 UTC (permalink / raw)
To: qemu-devel
On 8/6/08, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> I found one last SPARC issue, this time with the falign() function.
>
> This one was a pain to track down because gdb won't let you show the %gsr
> register, or any of the floating point registers above %f32.
>
> The falign code does this:
>
> tmp = (*((uint64_t *)&DT0)) << ((env->gsr & 7) * 8);
> tmp |= (*((uint64_t *)&DT1)) >> (64 - (env->gsr & 7) * 8);
>
> But in the case where %gsr is zero, the second case turns into a
> shift left of 64, which on many architectures turns into a no-op (rather
> than clearning the result to zero).
>
> So in this case, the output of the falign was rs1 *or'd* with rs2, rather
> than just plain rs1.
>
> I've included a patch below that fixes things for me, and I've also
> attached a test case that shows the bug.
>
> With this, I can finally run the spec 2006 bzip2 benchmark to completion
> under sparc32plus-linux-user
Thanks, applied.
There are still other bugs:
dd if=/dev/zero bs=3334 count=1 | ./qemu-sparc64 -L . -d in_asm,op_opt
../bzip2.sparc64 | bzcat | hexdump -C
1+0 records in
1+0 records out
3334 bytes (3.3 kB) copied, 3.4e-05 seconds, 98.1 MB/s
bzip2.sparc64: Caught a SIGSEGV or SIGBUS whilst compressing.
Possible causes are (most likely first):
(1) This computer has unreliable memory or cache hardware
(a surprisingly common problem; try a different machine.)
(2) A bug in the compiler used to create this executable
(unlikely, if you didn't compile bzip2 yourself.)
(3) A real bug in bzip2 -- I hope this should never be the case.
The user's manual, Section 4.3, has more info on (1) and (2).
If you suspect this is a bug in bzip2, or are unsure about (1)
or (2), feel free to report it to me at: jseward@bzip.org.
Section 4.3 of the user's manual describes the info a useful
bug report should have. If the manual is available on your
system, please try and read it before mailing me. If you don't
have the manual or can't be bothered to read it, mail me anyway.
Input file = (stdin), output file = (stdout)
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000d00 00 00 00 00 00 00 |......|
00000d06
With bs=3333 it works as expected.
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2008-08-06 19:55 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-29 3:32 [Qemu-devel] x86 tcg problem Vince Weaver
2008-07-29 8:46 ` Laurent Desnogues
2008-07-29 13:43 ` Vince Weaver
2008-07-29 18:24 ` malc
2008-07-29 18:33 ` Laurent Desnogues
2008-07-29 17:18 ` Blue Swirl
2008-08-05 20:20 ` [Qemu-devel] another SPARC issue Vince Weaver
2008-08-05 20:36 ` Laurent Desnogues
2008-08-06 2:33 ` Vince Weaver
2008-08-06 8:17 ` Laurent Desnogues
2008-08-06 15:29 ` Blue Swirl
2008-08-06 19:31 ` [Qemu-devel] one more " Vince Weaver
2008-08-06 19:45 ` Julian Seward
2008-08-06 19:55 ` Blue Swirl
2008-07-29 17:51 ` [Qemu-devel] x86 tcg problem Blue Swirl
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).