From: "Blue Swirl"
Date: Tue, 29 Jul 2008 20:18:25 +0300
Subject: Re: [Qemu-devel] x86 tcg problem
In-Reply-To: <20080728225136.C26546@stanley.csl.cornell.edu>
To: qemu-devel@nongnu.org

On 7/29/08, Vince Weaver wrote:
> Hello
>
> I've spent a day now trying to figure out why bzip2 compress/decompress
> doesn't work when using sparc32plus-linux-user on x86.
>
> I've tracked the problem to the Zero flag being improperly set (attached is
> a small exe/src that reproduces the problem.. it reports "Greater"
> on real hardware, "Less Than" on qemu current).
>
> The issue seems to be a misordering of an x86 sub instruction. I tried to
> track this down in the tcg code but I quickly got lost.
>
> The code does this for a compare (on sparc the compare turns into a
> subtract with result as the [ignored] zero reg):
>
>   mov_i32 cc_src_0,g4_0                ;
>   mov_i32 cc_src_1,g4_1                ; load g4 (0xaae60)
>   mov_i32 cc_src2_0,g3_0               ;
>   mov_i32 cc_src2_1,g3_1               ; load g3 (0)
>   sub2_i32 cc_dst_0,cc_dst_1,cc_src2_0,cc_src2_1,cc_src_0,cc_src_1
>                                        ; result = 0xaafe0-0
>   movi_i32 psr,$0x0                    ; clear psr
>   mov_i32 tmp42,cc_dst_0               ; get cc_dst_0
>   movi_i32 tmp43,$0x0                  ;
>   movi_i32 tmp44,$0x0                  ;
>   movi_i32 tmp45,$0x0                  ; zero extends
>   brcond2_i32 tmp42,tmp43,tmp44,tmp45,$0x1,$0x0 ; if not zero, skip
>   movi_i32 tmp19,$0x400000             ; else set zero flag
>
> which converts into x86:
>
>   0xb80da04d: sub %ecx,%eax        ; %ecx = g4-g3
>   0xb80da04f: sbb %ebx,%edx
>   0xb80da051: mov %eax,0x6c(%ebp)  ; saving g3, not the result (ecx)!
>   0xb80da054: mov %edx,0x70(%ebp)  ;
>   0xb80da057: xor %edx,%edx
>   0xb80da059: xor %ecx,%ecx        ; clearing our result for use as psr
>                                    ; result is lost!
>                                    ; the later test for zero is done
>                                    ; against g3 instead, which
>                                    ; sets the zero flag when it
>   ...                              ; shouldn't
>   0xb80da06f: test %eax,%eax
>   0xb80da071: jne 0xb80da091       ; skip if not zero
>   ..
>   0xb80da07f: mov 0x8c(%ebp),%eax  ; load psr
>   0xb80da085: or $0x400000,%eax    ; set zero flag
>
> So unless there's some weird AT&T/intel ordering thing that is confusing me
> (please let me know if I am missing something), TCG is getting confused
> about which argument of the subtract is the result. I'm not sure how to fix
> this though...

Thank you for the analysis!
IIRC, sub %ecx,%eax in AT&T syntax is eax -= ecx; in C, so %eax (stored
to 0x6c(%ebp)) already holds the difference. Still, I can reproduce this,
and amd64 is not correct either:
----
0x1008c
 mov_i64 cc_src,g4
 mov_i64 cc_src2,g3
 sub_i64 cc_dst,cc_src,cc_src2
 movi_i32 psr,$0x0
 movi_i64 tmp22,$0xffffffff
 and_i64 tmp21,cc_dst,tmp22
 movi_i64 tmp22,$0x0
 brcond_i64 tmp21,tmp22,$0x1,$0x0

0x601c287b:  mov    0x20(%r14),%rcx
0x601c287f:  mov    %rdx,%r8
0x601c2882:  mov    %rcx,%r9
0x601c2885:  sub    %r8,%r9
0x601c2888:  mov    %r9,%rax
0x601c288b:  and    $0xffffffff,%eax
0x601c2891:  mov    %rsi,0x10a58(%r14)
0x601c2898:  mov    %rdi,0x10a60(%r14)
0x601c289f:  mov    %rcx,0x60(%r14)
0x601c28a3:  mov    %r8,0x68(%r14)
0x601c28a7:  mov    %r9,0x70(%r14)
0x601c28ab:  xor    %edi,%edi
0x601c28ad:  mov    %edi,0x90(%r14)
0x601c28b4:  mov    %rdx,0x18(%r14)
0x601c28b8:  test   %rax,%rax
0x601c28bb:  jne    0x601c28d5

Though the C (carry) flag generation part of gen_op_sub_cc looks
suspicious.