[Qemu-devel] x86 tcg problem

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] x86 tcg problem
@ 2008-07-29  3:32 Vince Weaver
  2008-07-29  8:46 ` Laurent Desnogues
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Vince Weaver @ 2008-07-29  3:32 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2337 bytes --]

Hello

I've spent a day now trying to figure out why bzip2 compress/decompress 
doesn't work when using sparc32plus-linux-user on x86.

I've tracked the problem to the Zero flag being improperly set (attached 
is a small exe/src that reproduces the problem.. it reports "Greater"
on real hardware, "Less Than" on qemu current).

The issue seems to be a misordering of an x86 sub instruction.  I tried to 
track this down in the tcg code but I quickly got lost.

The code does this for a compare (on sparc the compare turns into a 
subtract with result as the [ignores] zero reg):

  mov_i32 cc_src_0,g4_0                          ;
  mov_i32 cc_src_1,g4_1                          ; load g4  (0xaae60)
  mov_i32 cc_src2_0,g3_0                         ;
  mov_i32 cc_src2_1,g3_1                         ; load g3  (0)
  sub2_i32 cc_dst_0,cc_dst_1,cc_src2_0,cc_src2_1,cc_src_0,cc_src_1
                                                 ; result = 0xaafe0-0
  movi_i32 psr,$0x0                              ; clear psr
  mov_i32 tmp42,cc_dst_0                         ; get cc_dst_0
  movi_i32 tmp43,$0x0                            ;
  movi_i32 tmp44,$0x0                            ;
  movi_i32 tmp45,$0x0                            ; zero extends
  brcond2_i32 tmp42,tmp43,tmp44,tmp45,$0x1,$0x0  ; if not zero, skip
  movi_i32 tmp19,$0x400000                       ; else set zero flag



which converts into x86:
  0xb80da04d:  sub    %ecx,%eax          ; %ecx = g4-g3
  0xb80da04f:  sbb    %ebx,%edx
  0xb80da051:  mov    %eax,0x6c(%ebp)    ; saving g3, not the result (ecx)!
  0xb80da054:  mov    %edx,0x70(%ebp)	;
  0xb80da057:  xor    %edx,%edx
  0xb80da059:  xor    %ecx,%ecx		; clearing our result for use as psr
 					; result is lost!
 					; the later test for zero is done
 					; against g3 instead, which
 					; sets the zero flag when it
  ...					; shouldn't
  0xb80da06f:  test   %eax,%eax
  0xb80da071:  jne    0xb80da091		; skip if not zero
  ..
  0xb80da07f:  mov    0x8c(%ebp),%eax	; load psr
  0xb80da085:  or     $0x400000,%eax	; set zero flag


So unless there's some weird AT&T/intel ordering thing that is confusing 
me (please let me know if I am missing something), TCG is getting confused 
about which argument of the subtract is the result.  I'm not sure how to 
fix this though...

Thanks for any help

Vince

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1471 bytes --]

!     + Syscalls have number in %g1, options in %o0,%o1,...
!	Result returned in %o0
!	Linux syscall is called by "ta 0x10"

.equ SYSCALL_EXIT,1	
.equ SYSCALL_WRITE,4

.equ STDOUT,1

	.globl _start
_start:
       set	data_region,%i0
       set	0xae60,%g4
       ld	[ %i0 + 0x4 ], %g3
       inc	%g3
       
       cmp	%g4,%g3
       ble	less_equal
       nop
       ba	greater
       nop
       
less_equal:
       set	lessequal_string,%o1
       ba	write_stdout
       nop

greater:
       set	greater_string,%o1
       
	#================================
	# WRITE_STDOUT
	#================================
	# %o1 has string

write_stdout:

	set	SYSCALL_WRITE,%g1	! Write syscall in %g1
	set	STDOUT,%o0		! 1 in %o0 (stdout)
	set	0,%o2			! 0 (count) in %o2

str_loop1:
	ldub	[%o1+%o2],%l0		! load byte
	cmp	%l0,%g0			! compare against zero
	bnz	str_loop1		! if not nul, repeat
	# BRANCH DELAY SLOT
	inc	%o2			! increment count

	dec	%o2			! correct count	
	ta	0x10			! run the syscall

exit:		
        mov	0,%o0			! exit value
        mov	SYSCALL_EXIT,%g1        ! put the exit syscall number in g1
        ta      0x10			! and exit




!===========================================================================
.data
!===========================================================================

data_region:		.int -1,-1
greater_string:		.ascii  "Greater\n\0"
lessequal_string:	.ascii  "Less Equal\n\0"

[-- Attachment #3: Type: APPLICATION/octet-stream, Size: 985 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] x86 tcg problem
  2008-07-29  3:32 [Qemu-devel] x86 tcg problem Vince Weaver
@ 2008-07-29  8:46 ` Laurent Desnogues
  2008-07-29 13:43   ` Vince Weaver
  2008-07-29 18:24   ` malc
  2008-07-29 17:18 ` Blue Swirl
  2008-07-29 17:51 ` [Qemu-devel] x86 tcg problem Blue Swirl
  2 siblings, 2 replies; 15+ messages in thread
From: Laurent Desnogues @ 2008-07-29  8:46 UTC (permalink / raw)
  To: qemu-devel

On Tue, Jul 29, 2008 at 5:32 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> I've spent a day now trying to figure out why bzip2 compress/decompress
> doesn't work when using sparc32plus-linux-user on x86.
>
> I've tracked the problem to the Zero flag being improperly set (attached is
> a small exe/src that reproduces the problem.. it reports "Greater"
> on real hardware, "Less Than" on qemu current).

It looks like it works with revision 4963:

./sparc32plus-linux-user/qemu-sparc32plus cmp_test
Less Equal

I guess it's malc commits 4961-2 that fixed it :-)


Laurent

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] x86 tcg problem
  2008-07-29  8:46 ` Laurent Desnogues
@ 2008-07-29 13:43   ` Vince Weaver
  2008-07-29 18:24   ` malc
  1 sibling, 0 replies; 15+ messages in thread
From: Vince Weaver @ 2008-07-29 13:43 UTC (permalink / raw)
  To: qemu-devel

On Tue, 29 Jul 2008, Laurent Desnogues wrote:

> On Tue, Jul 29, 2008 at 5:32 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>>
>> I've spent a day now trying to figure out why bzip2 compress/decompress
>> doesn't work when using sparc32plus-linux-user on x86.
>>
>> I've tracked the problem to the Zero flag being improperly set (attached is
>> a small exe/src that reproduces the problem.. it reports "Greater"
>> on real hardware, "Less Than" on qemu current).
>
> It looks like it works with revision 4963:
>
> ./sparc32plus-linux-user/qemu-sparc32plus cmp_test
> Less Equal
>
> I guess it's malc commits 4961-2 that fixed it :-)

I made a slight mistake in that post, "Less Equal" is the wrong condition, 
not "Less Than".  It still doesn't work, even with 4963.  Real hardware 
returns this:

(valor is a Niagara system)

valor:~/research/sparc_qemu% ./cmp_test
Greater


Vince

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] x86 tcg problem
  2008-07-29  3:32 [Qemu-devel] x86 tcg problem Vince Weaver
  2008-07-29  8:46 ` Laurent Desnogues
@ 2008-07-29 17:18 ` Blue Swirl
  2008-08-05 20:20   ` [Qemu-devel] another SPARC issue Vince Weaver
  2008-07-29 17:51 ` [Qemu-devel] x86 tcg problem Blue Swirl
  2 siblings, 1 reply; 15+ messages in thread
From: Blue Swirl @ 2008-07-29 17:18 UTC (permalink / raw)
  To: qemu-devel

On 7/29/08, Vince Weaver <vince@csl.cornell.edu> wrote:
> Hello
>
>  I've spent a day now trying to figure out why bzip2 compress/decompress
> doesn't work when using sparc32plus-linux-user on x86.
>
>  I've tracked the problem to the Zero flag being improperly set (attached is
> a small exe/src that reproduces the problem.. it reports "Greater"
>  on real hardware, "Less Than" on qemu current).
>
>  The issue seems to be a misordering of an x86 sub instruction.  I tried to
> track this down in the tcg code but I quickly got lost.
>
>  The code does this for a compare (on sparc the compare turns into a
> subtract with result as the [ignores] zero reg):
>
>   mov_i32 cc_src_0,g4_0                          ;
>   mov_i32 cc_src_1,g4_1                          ; load g4  (0xaae60)
>   mov_i32 cc_src2_0,g3_0                         ;
>   mov_i32 cc_src2_1,g3_1                         ; load g3  (0)
>   sub2_i32
> cc_dst_0,cc_dst_1,cc_src2_0,cc_src2_1,cc_src_0,cc_src_1
>                                                 ; result = 0xaafe0-0
>   movi_i32 psr,$0x0                              ; clear psr
>   mov_i32 tmp42,cc_dst_0                         ; get cc_dst_0
>   movi_i32 tmp43,$0x0                            ;
>   movi_i32 tmp44,$0x0                            ;
>   movi_i32 tmp45,$0x0                            ; zero extends
>   brcond2_i32 tmp42,tmp43,tmp44,tmp45,$0x1,$0x0  ; if not
> zero, skip
>   movi_i32 tmp19,$0x400000                       ; else set zero flag
>
>
>
>  which converts into x86:
>   0xb80da04d:  sub    %ecx,%eax          ; %ecx = g4-g3
>   0xb80da04f:  sbb    %ebx,%edx
>   0xb80da051:  mov    %eax,0x6c(%ebp)    ; saving g3, not the result (ecx)!
>   0xb80da054:  mov    %edx,0x70(%ebp)    ;
>   0xb80da057:  xor    %edx,%edx
>   0xb80da059:  xor    %ecx,%ecx          ; clearing our result for use as
> psr
>                                         ; result is lost!
>                                         ; the later test for zero is done
>                                         ; against g3 instead, which
>                                         ; sets the zero flag when it
>   ...                                    ; shouldn't
>   0xb80da06f:  test   %eax,%eax
>   0xb80da071:  jne    0xb80da091         ; skip if not zero
>   ..
>   0xb80da07f:  mov    0x8c(%ebp),%eax    ; load psr
>   0xb80da085:  or     $0x400000,%eax     ; set zero flag
>
>
>  So unless there's some weird AT&T/intel ordering thing that is confusing me
> (please let me know if I am missing something), TCG is getting confused
> about which argument of the subtract is the result.  I'm not sure how to fix
> this though...

Thank you for the analysis! IIRC sub %ecx, %eax is in C:
eax -= ecx;

Still, I can reproduce this, and also amd64 is not correct:
 ---- 0x1008c
 mov_i64 cc_src,g4
 mov_i64 cc_src2,g3
 sub_i64 cc_dst,cc_src,cc_src2
 movi_i32 psr,$0x0
 movi_i64 tmp22,$0xffffffff
 and_i64 tmp21,cc_dst,tmp22
 movi_i64 tmp22,$0x0
 brcond_i64 tmp21,tmp22,$0x1,$0x0

0x601c287b:  mov    0x20(%r14),%rcx
0x601c287f:  mov    %rdx,%r8
0x601c2882:  mov    %rcx,%r9
0x601c2885:  sub    %r8,%r9
0x601c2888:  mov    %r9,%rax
0x601c288b:  and    $0xffffffff,%eax
0x601c2891:  mov    %rsi,0x10a58(%r14)
0x601c2898:  mov    %rdi,0x10a60(%r14)
0x601c289f:  mov    %rcx,0x60(%r14)
0x601c28a3:  mov    %r8,0x68(%r14)
0x601c28a7:  mov    %r9,0x70(%r14)
0x601c28ab:  xor    %edi,%edi
0x601c28ad:  mov    %edi,0x90(%r14)
0x601c28b4:  mov    %rdx,0x18(%r14)
0x601c28b8:  test   %rax,%rax
0x601c28bb:  jne    0x601c28d5

Though gen_op_sub_cc C flag generation part looks suspicious.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] x86 tcg problem
  2008-07-29  3:32 [Qemu-devel] x86 tcg problem Vince Weaver
  2008-07-29  8:46 ` Laurent Desnogues
  2008-07-29 17:18 ` Blue Swirl
@ 2008-07-29 17:51 ` Blue Swirl
  2 siblings, 0 replies; 15+ messages in thread
From: Blue Swirl @ 2008-07-29 17:51 UTC (permalink / raw)
  To: qemu-devel

On 7/29/08, Vince Weaver <vince@csl.cornell.edu> wrote:
> Hello
>
>  I've spent a day now trying to figure out why bzip2 compress/decompress
> doesn't work when using sparc32plus-linux-user on x86.
>
>  I've tracked the problem to the Zero flag being improperly set (attached is
> a small exe/src that reproduces the problem.. it reports "Greater"
>  on real hardware, "Less Than" on qemu current).
>
>  The issue seems to be a misordering of an x86 sub instruction.  I tried to
> track this down in the tcg code but I quickly got lost.
>
>  The code does this for a compare (on sparc the compare turns into a
> subtract with result as the [ignores] zero reg):
>
>   mov_i32 cc_src_0,g4_0                          ;
>   mov_i32 cc_src_1,g4_1                          ; load g4  (0xaae60)
>   mov_i32 cc_src2_0,g3_0                         ;
>   mov_i32 cc_src2_1,g3_1                         ; load g3  (0)
>   sub2_i32
> cc_dst_0,cc_dst_1,cc_src2_0,cc_src2_1,cc_src_0,cc_src_1
>                                                 ; result = 0xaafe0-0
>   movi_i32 psr,$0x0                              ; clear psr
>   mov_i32 tmp42,cc_dst_0                         ; get cc_dst_0
>   movi_i32 tmp43,$0x0                            ;
>   movi_i32 tmp44,$0x0                            ;
>   movi_i32 tmp45,$0x0                            ; zero extends
>   brcond2_i32 tmp42,tmp43,tmp44,tmp45,$0x1,$0x0  ; if not
> zero, skip
>   movi_i32 tmp19,$0x400000                       ; else set zero flag
>
>
>
>  which converts into x86:
>   0xb80da04d:  sub    %ecx,%eax          ; %ecx = g4-g3
>   0xb80da04f:  sbb    %ebx,%edx
>   0xb80da051:  mov    %eax,0x6c(%ebp)    ; saving g3, not the result (ecx)!
>   0xb80da054:  mov    %edx,0x70(%ebp)    ;
>   0xb80da057:  xor    %edx,%edx
>   0xb80da059:  xor    %ecx,%ecx          ; clearing our result for use as
> psr
>                                         ; result is lost!
>                                         ; the later test for zero is done
>                                         ; against g3 instead, which
>                                         ; sets the zero flag when it
>   ...                                    ; shouldn't
>   0xb80da06f:  test   %eax,%eax
>   0xb80da071:  jne    0xb80da091         ; skip if not zero
>   ..
>   0xb80da07f:  mov    0x8c(%ebp),%eax    ; load psr
>   0xb80da085:  or     $0x400000,%eax     ; set zero flag
>
>
>  So unless there's some weird AT&T/intel ordering thing that is confusing me
> (please let me know if I am missing something), TCG is getting confused
> about which argument of the subtract is the result.  I'm not sure how to fix
> this though...

Now I found the problem, (1 << 31) could be negative on 64 bit
variable. I'm testing a fix. It looks like this also solves some of
the Sparc64 boot problems, like Aurora 2.0 which complained about CRC
error.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] x86 tcg problem
  2008-07-29  8:46 ` Laurent Desnogues
  2008-07-29 13:43   ` Vince Weaver
@ 2008-07-29 18:24   ` malc
  2008-07-29 18:33     ` Laurent Desnogues
  1 sibling, 1 reply; 15+ messages in thread
From: malc @ 2008-07-29 18:24 UTC (permalink / raw)
  To: qemu-devel

On Tue, 29 Jul 2008, Laurent Desnogues wrote:

> On Tue, Jul 29, 2008 at 5:32 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>>
>> I've spent a day now trying to figure out why bzip2 compress/decompress
>> doesn't work when using sparc32plus-linux-user on x86.
>>
>> I've tracked the problem to the Zero flag being improperly set (attached is
>> a small exe/src that reproduces the problem.. it reports "Greater"
>> on real hardware, "Less Than" on qemu current).
>
> It looks like it works with revision 4963:
>
> ./sparc32plus-linux-user/qemu-sparc32plus cmp_test
> Less Equal
>
> I guess it's malc commits 4961-2 that fixed it :-)

THat would have been a miracle since those do not touch x86 parts at all
:)

-- 
mailto:av1474@comtv.ru

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] x86 tcg problem
  2008-07-29 18:24   ` malc
@ 2008-07-29 18:33     ` Laurent Desnogues
  0 siblings, 0 replies; 15+ messages in thread
From: Laurent Desnogues @ 2008-07-29 18:33 UTC (permalink / raw)
  To: qemu-devel

On Tue, Jul 29, 2008 at 8:24 PM, malc <av1474@comtv.ru> wrote:
>> I guess it's malc commits 4961-2 that fixed it :-)
>
> THat would have been a miracle since those do not touch x86 parts at all
> :)

Well, hum, is exhaustion enough of an excuse for stupidity? :)


Laurent

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] another SPARC issue
  2008-07-29 17:18 ` Blue Swirl
@ 2008-08-05 20:20   ` Vince Weaver
  2008-08-05 20:36     ` Laurent Desnogues
  0 siblings, 1 reply; 15+ messages in thread
From: Vince Weaver @ 2008-08-05 20:20 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

Hello

so I'm still trying to get bzip to work under sparc32plus-linux-user

This time the problem is with the "ldda" instruction, which is loading off
of the block-read %asi (I am learning way more than I want to about 
obscure parts of the SPARC architecture).

The code involved is doing an optimized memory copy, so it's using the 
floating point unit to accelerate this.

The instruction is
    ldda  [ %o1 + 0x40 ] %asi, %f16

but under Qemu for some reason 0x80 is added to %o1 (instead of 0x40).

Indeed if you look at the generated TCG (see below), the 0x40 is being 
added in twice.  I was digging around in translate.c but can't seem to see 
why this is happning.

Any help would be appreciated...

Thanks

Vince

IN: __memcpy_large
0x000000000002b73c:  ldda  [ %o1 + 0x40 ] %asi, %f16

OP after la:
  ld_i32 tmp11,regwptr,$0x8
  ld_i32 tmp12,regwptr,$0xc
  movi_i32 tmp15,$0x40
  movi_i32 tmp16,$0x0
  add2_i32 loc9,loc10,tmp11,tmp12,tmp15,tmp16
  movi_i32 pc_0,$0x2b73c
  movi_i32 pc_1,$0x0
  movi_i32 npc_0,$0x2b740
  movi_i32 npc_1,$0x0
  movi_i32 tmp15,$0x40
  movi_i32 tmp16,$0x0
  add2_i32 loc9,loc10,loc9,loc10,tmp15,tmp16
  ld_i32 tmp17,env,$0xd9c8
  movi_i32 tmp18,$0x8
  movi_i32 tmp19,$0x10
  movi_i32 tmp20,$helper_ldf_asi
  call tmp20,$0x0,$0,loc9,loc10,tmp17,tmp18,tmp19
  movi_i32 pc_0,$0x2b740
  movi_i32 pc_1,$0x0
  movi_i32 npc_0,$0x2b744
  movi_i32 npc_1,$0x0
  movi_i32 tmp17,$helper_debug
  call tmp17,$0x0,$0
  exit_tb $0x0
  end

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] another SPARC issue
  2008-08-05 20:20   ` [Qemu-devel] another SPARC issue Vince Weaver
@ 2008-08-05 20:36     ` Laurent Desnogues
  2008-08-06  2:33       ` Vince Weaver
  0 siblings, 1 reply; 15+ messages in thread
From: Laurent Desnogues @ 2008-08-05 20:36 UTC (permalink / raw)
  To: qemu-devel

On Tue, Aug 5, 2008 at 10:20 PM, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> The instruction is
>   ldda  [ %o1 + 0x40 ] %asi, %f16
>
> but under Qemu for some reason 0x80 is added to %o1 (instead of 0x40).
>
> Indeed if you look at the generated TCG (see below), the 0x40 is being added
> in twice.  I was digging around in translate.c but can't seem to see why
> this is happning.
>
> Any help would be appreciated...

Since I am not afraid to be ridiculous again, here is my take:

- in disas_sparc_insn, line 4199:
            } else if (IS_IMM) {     /* immediate */
                rs2 = GET_FIELDs(insn, 19, 31);
                tcg_gen_addi_tl(cpu_addr, cpu_src1, (int)rs2);

- line 4307:
                case 0x13:      /* load double word alternate */
                    save_state(dc, cpu_cond);
                    gen_ldda_asi(cpu_val, cpu_addr, insn, rd);

static inline void gen_ldda_asi(TCGv hi, TCGv addr, int insn, int rd)
{
    TCGv r_asi, r_rd;

    r_asi = gen_get_asi(insn, addr);
    r_rd = tcg_const_i32(rd);
    tcg_gen_helper_0_3(helper_ldda_asi, addr, r_asi, r_rd);
    tcg_temp_free(r_rd);
    tcg_temp_free(r_asi);
}

static inline TCGv gen_get_asi(int insn, TCGv r_addr)
{
    int asi, offset;
    TCGv r_asi;

    if (IS_IMM) {
        r_asi = tcg_temp_new(TCG_TYPE_I32);
        offset = GET_FIELD(insn, 25, 31);
        tcg_gen_addi_tl(r_addr, r_addr, offset);
        tcg_gen_ld_i32(r_asi, cpu_env, offsetof(CPUSPARCState, asi));
    } else {
        asi = GET_FIELD(insn, 19, 26);
        r_asi = tcg_const_i32(asi);
    }
    return r_asi;
}

I am not sure I followed the right path, but it seems that indeed
for asi ld/st the constant gets added twice:  once in the disas
function and once in gen_get_asi.


Laurent

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] another SPARC issue
  2008-08-05 20:36     ` Laurent Desnogues
@ 2008-08-06  2:33       ` Vince Weaver
  2008-08-06  8:17         ` Laurent Desnogues
  2008-08-06 15:29         ` Blue Swirl
  0 siblings, 2 replies; 15+ messages in thread
From: Vince Weaver @ 2008-08-06  2:33 UTC (permalink / raw)
  To: qemu-devel


> I am not sure I followed the right path, but it seems that indeed
> for asi ld/st the constant gets added twice:  once in the disas
> function and once in gen_get_asi.

I agree.

The following patch to gen_get_asi() seems to fix things for me, the
bzip executable I am using gets much further along with this patch 
applied (still not working 100% though).

>From what I can tell from the sparcv9 manual this patch is correct, but I 
might be missing something.

--- translate.c.org	2008-08-05 22:30:15.000000000 -0400
+++ translate.c	2008-08-05 22:30:26.000000000 -0400
@@ -1641,13 +1641,11 @@
  #ifdef TARGET_SPARC64
  static inline TCGv gen_get_asi(int insn, TCGv r_addr)
  {
-    int asi, offset;
+    int asi;
      TCGv r_asi;

      if (IS_IMM) {
          r_asi = tcg_temp_new(TCG_TYPE_I32);
-        offset = GET_FIELD(insn, 25, 31);
-        tcg_gen_addi_tl(r_addr, r_addr, offset);
          tcg_gen_ld_i32(r_asi, cpu_env, offsetof(CPUSPARCState, asi));
      } else {
          asi = GET_FIELD(insn, 19, 26);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] another SPARC issue
  2008-08-06  2:33       ` Vince Weaver
@ 2008-08-06  8:17         ` Laurent Desnogues
  2008-08-06 15:29         ` Blue Swirl
  1 sibling, 0 replies; 15+ messages in thread
From: Laurent Desnogues @ 2008-08-06  8:17 UTC (permalink / raw)
  To: qemu-devel

On Wed, Aug 6, 2008 at 4:33 AM, Vince Weaver <vince@csl.cornell.edu> wrote:
>
> From what I can tell from the sparcv9 manual this patch is correct, but I
> might be missing something.
>
> --- translate.c.org     2008-08-05 22:30:15.000000000 -0400
> +++ translate.c 2008-08-05 22:30:26.000000000 -0400
> @@ -1641,13 +1641,11 @@
>  #ifdef TARGET_SPARC64
>  static inline TCGv gen_get_asi(int insn, TCGv r_addr)
>  {
> -    int asi, offset;
> +    int asi;
>     TCGv r_asi;
>
>     if (IS_IMM) {
>         r_asi = tcg_temp_new(TCG_TYPE_I32);
> -        offset = GET_FIELD(insn, 25, 31);
> -        tcg_gen_addi_tl(r_addr, r_addr, offset);
>         tcg_gen_ld_i32(r_asi, cpu_env, offsetof(CPUSPARCState, asi));
>     } else {
>         asi = GET_FIELD(insn, 19, 26);

My understanding of v9 manual is the same as yours.  Your patch looks
correct.


Laurent

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] another SPARC issue
  2008-08-06  2:33       ` Vince Weaver
  2008-08-06  8:17         ` Laurent Desnogues
@ 2008-08-06 15:29         ` Blue Swirl
  2008-08-06 19:31           ` [Qemu-devel] one more " Vince Weaver
  1 sibling, 1 reply; 15+ messages in thread
From: Blue Swirl @ 2008-08-06 15:29 UTC (permalink / raw)
  To: qemu-devel

On 8/6/08, Vince Weaver <vince@csl.cornell.edu> wrote:
>
>
> > I am not sure I followed the right path, but it seems that indeed
> > for asi ld/st the constant gets added twice:  once in the disas
> > function and once in gen_get_asi.
> >
>
>  I agree.
>
>  The following patch to gen_get_asi() seems to fix things for me, the
>  bzip executable I am using gets much further along with this patch applied
> (still not working 100% though).
>
>  From what I can tell from the sparcv9 manual this patch is correct, but I
> might be missing something.

Thanks, applied.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] one more SPARC issue
  2008-08-06 15:29         ` Blue Swirl
@ 2008-08-06 19:31           ` Vince Weaver
  2008-08-06 19:45             ` Julian Seward
  2008-08-06 19:55             ` Blue Swirl
  0 siblings, 2 replies; 15+ messages in thread
From: Vince Weaver @ 2008-08-06 19:31 UTC (permalink / raw)
  To: qemu-devel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1343 bytes --]


I found one last SPARC issue, this time with the falign() function.

This one was a pain to track down because gdb won't let you show the %gsr 
register, or any of the floating point registers above %f32.

The falign code does this:

      tmp = (*((uint64_t *)&DT0)) << ((env->gsr & 7) * 8);
      tmp |= (*((uint64_t *)&DT1)) >> (64 - (env->gsr & 7) * 8);

But in the case where %gsr is zero, the second case turns into a
shift left of 64, which on many architectures turns into a no-op (rather 
than clearning the result to zero).

So in this case, the output of the falign was rs1 *or'd* with rs2, rather 
than just plain rs1.

I've included a patch below that fixes things for me, and I've also 
attached a test case that shows the bug.

With this, I can finally run the spec 2006 bzip2 benchmark to completion 
under sparc32plus-linux-user

Vince

--- op_helper.c.orig	2008-07-29 22:38:34.000000000 -0400
+++ op_helper.c	2008-08-06 15:24:51.000000000 -0400
@@ -234,7 +234,12 @@
      uint64_t tmp;

      tmp = (*((uint64_t *)&DT0)) << ((env->gsr & 7) * 8);
-    tmp |= (*((uint64_t *)&DT1)) >> (64 - (env->gsr & 7) * 8);
+ 
+       /* on many architectures a shift of 64 does nothing */
+    if ( (env->gsr & 7) !=0) {
+       tmp |= (*((uint64_t *)&DT1)) >> (64 - (env->gsr & 7) * 8);
+    }
+
      *((uint64_t *)&DT0) = tmp;
  }




[-- Attachment #2: Type: TEXT/PLAIN, Size: 2360 bytes --]

! compile with
! as -xarch=v8plusa -o falign_test.o falign_test.s ; 
! ld -o falign_test falign_test.o

!     + Syscalls have number in %g1, options in %o0,%o1,...
!	Result returned in %o0
!	Linux syscall is called by "ta 0x10"

.equ SYSCALL_EXIT,1	
.equ SYSCALL_WRITE,4

.equ STDOUT,1

	.globl _start
_start:


!  
! print the source!
!

       set	source,%o1
       call	write_stdout
       nop

!
! copy the string
!

      set	source,%o1      
      set	destination,%o0


      alignaddr  %o1, %g0, %o3
 
      membar  #StoreStore|#LoadStore|#StoreLoad
      wr  %g0, 0xf0, %asi		       		! set block-copy asi
      nop
      
      ldda  [ %o1 ] %asi, %f0
      nop
      
      faligndata  %f0, %f2, %f16
      faligndata  %f2, %f4, %f18
      faligndata  %f4, %f6, %f20
      faligndata  %f6, %f8, %f22
      faligndata  %f8, %f10, %f24
      faligndata  %f10, %f12, %f26
      faligndata  %f12, %f14, %f28
      faligndata  %f14, %f16, %f30
after:      
      stda  %f16, [ %o0 ] %asi

      nop
      nop
       
      

!
! print the copy
!
       
      set	destination,%o1
      call	write_stdout
      nop


      ba 	exit
      nop


	#================================
	# WRITE_STDOUT
	#================================
	# %o1 has string

write_stdout:

	set	SYSCALL_WRITE,%g1	! Write syscall in %g1
	set	STDOUT,%o0		! 1 in %o0 (stdout)
	set	0,%o2			! 0 (count) in %o2

str_loop1:
	ldub	[%o1+%o2],%l0		! load byte
	cmp	%l0,%g0			! compare against zero
	bnz	str_loop1		! if not nul, repeat
	# BRANCH DELAY SLOT
	inc	%o2			! increment count

	dec	%o2			! correct count	
	ta	0x10			! run the syscall
	retl				! return
	nop

exit:		
        mov	0,%o0			! exit value
        mov	SYSCALL_EXIT,%g1        ! put the exit syscall number in g1
        ta      0x10			! and exit




!===========================================================================
.data
!===========================================================================

.balign 64    !(needs to be on 64-byte boundary)

              !           1         2         3         4
              ! 0123456789012345678901234567890123456789012345678
source: .ascii "The quick brown fox jumped over "
        .ascii  "the lazy dog! Need more fill!\n\0"

.lcomm destination,64

[-- Attachment #3: Type: APPLICATION/octet-stream, Size: 1099 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] one more SPARC issue
  2008-08-06 19:31           ` [Qemu-devel] one more " Vince Weaver
@ 2008-08-06 19:45             ` Julian Seward
  2008-08-06 19:55             ` Blue Swirl
  1 sibling, 0 replies; 15+ messages in thread
From: Julian Seward @ 2008-08-06 19:45 UTC (permalink / raw)
  To: qemu-devel

On Wednesday 06 August 2008 21:31, Vince Weaver wrote:
> +       /* on many architectures a shift of 64 does nothing */

Not so much "does nothing" as "has undefined semantics in C";
hence all bets are off for shifts >= the word size.

J

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] one more SPARC issue
  2008-08-06 19:31           ` [Qemu-devel] one more " Vince Weaver
  2008-08-06 19:45             ` Julian Seward
@ 2008-08-06 19:55             ` Blue Swirl
  1 sibling, 0 replies; 15+ messages in thread
From: Blue Swirl @ 2008-08-06 19:55 UTC (permalink / raw)
  To: qemu-devel

On 8/6/08, Vince Weaver <vince@csl.cornell.edu> wrote:
>
>  I found one last SPARC issue, this time with the falign() function.
>
>  This one was a pain to track down because gdb won't let you show the %gsr
> register, or any of the floating point registers above %f32.
>
>  The falign code does this:
>
>      tmp = (*((uint64_t *)&DT0)) << ((env->gsr & 7) * 8);
>      tmp |= (*((uint64_t *)&DT1)) >> (64 - (env->gsr & 7) * 8);
>
>  But in the case where %gsr is zero, the second case turns into a
>  shift left of 64, which on many architectures turns into a no-op (rather
> than clearning the result to zero).
>
>  So in this case, the output of the falign was rs1 *or'd* with rs2, rather
> than just plain rs1.
>
>  I've included a patch below that fixes things for me, and I've also
> attached a test case that shows the bug.
>
>  With this, I can finally run the spec 2006 bzip2 benchmark to completion
> under sparc32plus-linux-user

Thanks, applied.

There are still other bugs:
dd if=/dev/zero bs=3334 count=1 | ./qemu-sparc64 -L . -d in_asm,op_opt
../bzip2.sparc64 | bzcat | hexdump -C
1+0 records in
1+0 records out
3334 bytes (3.3 kB) copied, 3.4e-05 seconds, 98.1 MB/s

bzip2.sparc64: Caught a SIGSEGV or SIGBUS whilst compressing.

   Possible causes are (most likely first):
   (1) This computer has unreliable memory or cache hardware
       (a surprisingly common problem; try a different machine.)
   (2) A bug in the compiler used to create this executable
       (unlikely, if you didn't compile bzip2 yourself.)
   (3) A real bug in bzip2 -- I hope this should never be the case.
   The user's manual, Section 4.3, has more info on (1) and (2).

   If you suspect this is a bug in bzip2, or are unsure about (1)
   or (2), feel free to report it to me at: jseward@bzip.org.
   Section 4.3 of the user's manual describes the info a useful
   bug report should have.  If the manual is available on your
   system, please try and read it before mailing me.  If you don't
   have the manual or can't be bothered to read it, mail me anyway.

        Input file = (stdin), output file = (stdout)
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000d00  00 00 00 00 00 00                                 |......|
00000d06

With bs=3333 it works as expected.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2008-08-06 19:55 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-29  3:32 [Qemu-devel] x86 tcg problem Vince Weaver
2008-07-29  8:46 ` Laurent Desnogues
2008-07-29 13:43   ` Vince Weaver
2008-07-29 18:24   ` malc
2008-07-29 18:33     ` Laurent Desnogues
2008-07-29 17:18 ` Blue Swirl
2008-08-05 20:20   ` [Qemu-devel] another SPARC issue Vince Weaver
2008-08-05 20:36     ` Laurent Desnogues
2008-08-06  2:33       ` Vince Weaver
2008-08-06  8:17         ` Laurent Desnogues
2008-08-06 15:29         ` Blue Swirl
2008-08-06 19:31           ` [Qemu-devel] one more " Vince Weaver
2008-08-06 19:45             ` Julian Seward
2008-08-06 19:55             ` Blue Swirl
2008-07-29 17:51 ` [Qemu-devel] x86 tcg problem Blue Swirl

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).