From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KNsqC-0004Gc-Ox
	for qemu-devel@nongnu.org; Tue, 29 Jul 2008 13:18:28 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KNsqC-0004FL-AB
	for qemu-devel@nongnu.org; Tue, 29 Jul 2008 13:18:28 -0400
Received: from [199.232.76.173] (port=33857 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KNsqC-0004F7-0b
	for qemu-devel@nongnu.org; Tue, 29 Jul 2008 13:18:28 -0400
Received: from yx-out-1718.google.com ([74.125.44.157]:47133)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <blauwirbel@gmail.com>) id 1KNsqB-0004zn-JR
	for qemu-devel@nongnu.org; Tue, 29 Jul 2008 13:18:27 -0400
Received: by yx-out-1718.google.com with SMTP id 3so705317yxi.82
	for <qemu-devel@nongnu.org>; Tue, 29 Jul 2008 10:18:26 -0700 (PDT)
Message-ID: <f43fc5580807291018n7f47d010l97276f704b5afb52@mail.gmail.com>
Date: Tue, 29 Jul 2008 20:18:25 +0300
From: "Blue Swirl" <blauwirbel@gmail.com>
Subject: Re: [Qemu-devel] x86 tcg problem
In-Reply-To: <20080728225136.C26546@stanley.csl.cornell.edu>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
References: <20080728225136.C26546@stanley.csl.cornell.edu>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

On 7/29/08, Vince Weaver <vince@csl.cornell.edu> wrote:
> Hello
>
>  I've spent a day now trying to figure out why bzip2 compress/decompress
> doesn't work when using sparc32plus-linux-user on x86.
>
>  I've tracked the problem to the Zero flag being improperly set (attached is
> a small exe/src that reproduces the problem.. it reports "Greater"
>  on real hardware, "Less Than" on qemu current).
>
>  The issue seems to be a misordering of an x86 sub instruction.  I tried to
> track this down in the tcg code but I quickly got lost.
>
>  The code does this for a compare (on sparc the compare turns into a
> subtract whose result goes to the zero reg, i.e. is ignored):
>
>   mov_i32 cc_src_0,g4_0                          ;
>   mov_i32 cc_src_1,g4_1                          ; load g4  (0xaae60)
>   mov_i32 cc_src2_0,g3_0                         ;
>   mov_i32 cc_src2_1,g3_1                         ; load g3  (0)
>   sub2_i32 cc_dst_0,cc_dst_1,cc_src2_0,cc_src2_1,cc_src_0,cc_src_1
>                                                 ; result = 0xaafe0-0
>   movi_i32 psr,$0x0                              ; clear psr
>   mov_i32 tmp42,cc_dst_0                         ; get cc_dst_0
>   movi_i32 tmp43,$0x0                            ;
>   movi_i32 tmp44,$0x0                            ;
>   movi_i32 tmp45,$0x0                            ; zero extends
>   brcond2_i32 tmp42,tmp43,tmp44,tmp45,$0x1,$0x0  ; if not zero, skip
>   movi_i32 tmp19,$0x400000                       ; else set zero flag
>
>
>
>  which converts into x86:
>   0xb80da04d:  sub    %ecx,%eax          ; %ecx = g4-g3
>   0xb80da04f:  sbb    %ebx,%edx
>   0xb80da051:  mov    %eax,0x6c(%ebp)    ; saving g3, not the result (ecx)!
>   0xb80da054:  mov    %edx,0x70(%ebp)    ;
>   0xb80da057:  xor    %edx,%edx
>   0xb80da059:  xor    %ecx,%ecx          ; clearing our result for use as psr
>                                         ; result is lost!
>                                         ; the later test for zero is done
>                                         ; against g3 instead, which
>                                         ; sets the zero flag when it
>   ...                                    ; shouldn't
>   0xb80da06f:  test   %eax,%eax
>   0xb80da071:  jne    0xb80da091         ; skip if not zero
>   ..
>   0xb80da07f:  mov    0x8c(%ebp),%eax    ; load psr
>   0xb80da085:  or     $0x400000,%eax     ; set zero flag
>
>
>  So unless there's some weird AT&T/intel ordering thing that is confusing me
> (please let me know if I am missing something), TCG is getting confused
> about which argument of the subtract is the result.  I'm not sure how to fix
> this though...

Thank you for the analysis! IIRC AT&T syntax puts the destination
operand last, so sub %ecx,%eax is in C:
eax -= ecx;

Still, I can reproduce this, and the amd64 output is not correct either:
 ---- 0x1008c
 mov_i64 cc_src,g4
 mov_i64 cc_src2,g3
 sub_i64 cc_dst,cc_src,cc_src2
 movi_i32 psr,$0x0
 movi_i64 tmp22,$0xffffffff
 and_i64 tmp21,cc_dst,tmp22
 movi_i64 tmp22,$0x0
 brcond_i64 tmp21,tmp22,$0x1,$0x0

0x601c287b:  mov    0x20(%r14),%rcx
0x601c287f:  mov    %rdx,%r8
0x601c2882:  mov    %rcx,%r9
0x601c2885:  sub    %r8,%r9
0x601c2888:  mov    %r9,%rax
0x601c288b:  and    $0xffffffff,%eax
0x601c2891:  mov    %rsi,0x10a58(%r14)
0x601c2898:  mov    %rdi,0x10a60(%r14)
0x601c289f:  mov    %rcx,0x60(%r14)
0x601c28a3:  mov    %r8,0x68(%r14)
0x601c28a7:  mov    %r9,0x70(%r14)
0x601c28ab:  xor    %edi,%edi
0x601c28ad:  mov    %edi,0x90(%r14)
0x601c28b4:  mov    %rdx,0x18(%r14)
0x601c28b8:  test   %rax,%rax
0x601c28bb:  jne    0x601c28d5

Though the C flag generation part of gen_op_sub_cc looks suspicious.