From: Artyom Tarasenko
Date: Sun, 10 Apr 2011 20:35:55 +0200
Subject: Re: [Qemu-devel] tcg/tcg.c:1892: tcg fatal error
To: Blue Swirl
Cc: peter.maydell@linaro.org, qemu-devel, Aurelien Jarno
References: <20110410132415.GA6719@volta.aurel32.net>
List-Id: qemu-devel.nongnu.org

On Sun, Apr 10, 2011 at 7:57 PM, Blue Swirl wrote:
> On Sun, Apr 10, 2011 at 8:48 PM, Artyom Tarasenko wrote:
>> On Sun, Apr 10, 2011 at 4:44 PM, Blue Swirl wrote:
>>> On Sun, Apr 10, 2011 at 5:09 PM, Artyom Tarasenko wrote:
>>>> On Sun, Apr 10, 2011 at 3:24 PM, Aurelien Jarno wrote:
>>>>> On Sun, Apr 10, 2011 at 02:29:59PM +0200, Artyom Tarasenko wrote:
>>>>>> Trying to boot some proprietary OS I get qemu-system-sparc64 crash with a
>>>>>>
>>>>>> tcg/tcg.c:1892: tcg fatal error
>>>>>>
>>>>>> error message.
>>>>>>
>>>>>> It looks like it can be a platform independent bug though, because
>>>>>> when a '-singlestep' option IS present, qemu doesn't crash and seems
>>>>>> to translate the code properly.
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0  0x00000032c2e327f5 in raise () from /lib64/libc.so.6
>>>>>> #1  0x00000032c2e33fd5 in abort () from /lib64/libc.so.6
>>>>>> #2  0x000000000051933d in tcg_reg_alloc_call (s=,
>>>>>> def=0x89d340, opc=INDEX_op_call, args=0x10acc98, dead_iargs=3) at
>>>>>> qemu/tcg/tcg.c:1892
>>>>>> #3  0x000000000051a557 in tcg_gen_code_common (s=0x10b8940,
>>>>>> gen_code_buf=0x40338b60 "I\213n@H\213] 3\355I\211\256\220") at
>>>>>> qemu/tcg/tcg.c:2099
>>>>>> #4  tcg_gen_code (s=0x10b8940, gen_code_buf=0x40338b60 "I\213n@H\213]
>>>>>> 3\355I\211\256\220") at qemu/tcg/tcg.c:2142
>>>>>> #5  0x00000000004d38f1 in cpu_sparc_gen_code (env=0x10cce10,
>>>>>> tb=0x7fffe91bc218, gen_code_size_ptr=0x7fffffffd9b4) at
>>>>>> qemu/translate-all.c:93
>>>>>> #6  0x00000000004d1fd7 in tb_gen_code (env=0x10cce10, pc=18868776,
>>>>>> cs_base=18868780, flags=15, cflags=0) at qemu/exec.c:989
>>>>>> #7  0x00000000004d4029 in tb_find_slow (env1=) at
>>>>>> qemu/cpu-exec.c:167
>>>>>> #8  tb_find_fast (env1=) at cpu-exec.c:194
>>>>>> #9  cpu_sparc_exec (env1=) at qemu/cpu-exec.c:556
>>>>>> #10 0x0000000000408868 in tcg_cpu_exec () at qemu/cpus.c:1066
>>>>>> #11 cpu_exec_all () at qemu/cpus.c:1102
>>>>>> #12 0x000000000053c756 in main_loop (argc=,
>>>>>> argv=, envp=) at
>>>>>> qemu/vl.c:1430
>>>>>>
>>>>>> I inspected ts->val_type causing the abort() case and it turned out to be 0.
>>>>>>
>>>>>> The last lines of qemu.log (without -singlestep)
>>>>>> IN:
>>>>>> 0x00000000011fe9f0:  rdpr  %pstate, %g1
>>>>>> 0x00000000011fe9f4:  wrpr  %g1, 2, %pstate
>>>>>> --------------
>>>>>> IN:
>>>>>> 0x00000000011fe9f8:  ldub  [ %o0 ], %o1
>>>>>> 0x00000000011fe9fc:  mov  %o1, %o2
>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>
>>>>>> Search PC...
>>>>>> Search PC...
>>>>>> Search PC...
>>>>>> Search PC...
>>>>>> Search PC...
>>>>>> Search PC...
>>>>>> --------------
>>>>>> IN:
>>>>>> 0x00000000011fe9f8:  ldub  [ %o0 ], %o1
>>>>>> 0x00000000011fe9fc:  mov  %o1, %o2
>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>
>>>>>> 110521: Data Access MMU Miss (v=0068) pc=00000000011fe9f8
>>>>>> npc=00000000011fe9fc SP=000000000180ae41
>>>>>> pc: 00000000011fe9f8  npc: 00000000011fe9fc
>>>>>>
>>>>>> IN:
>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>> --------------
>>>>>> IN:
>>>>>> 0x00000000011fea10:  brz,pn   %o2, 0x11fe9f8
>>>>>> 0x00000000011fea14:  mov  %o2, %o4
>>>>>> --------------
>>>>>> IN:
>>>>>> 0x00000000011fea18:  rdpr  %tick, %o5
>>>>>> 0x00000000011fea1c:  cmp  %o2, %o4
>>>>>> 0x00000000011fea20:  be  %icc, 0x11fea18
>>>>>> 0x00000000011fea24:  ldub  [ %o0 ], %o4
>>>>>> --------------
>>>>>> IN:
>>>>>> 0x00000000011fea28:  brz,pn   %o4, 0x11fe9f4
>>>>>> 0x00000000011fea2c:  wrpr  %g0, %g1, %pstate
>>>>>>
>>>>>>
>>>>>> The crash is 100% reproducible and happens always on the same place,
>>>>>> so it's probably a pure TCG issue, not related on getting the
>>>>>> external/timer interrupts.
>>>>>>
>>>>>> Do you need any additional info?
>>>>>>
>>>>>
>>>>> What would be interesting would be to get the corresponding TCG code
>>>>> from qemu.log (-d op,op_opt).
>>>>
>>>>
>>>> OP:
>>>>  ---- 0x11fea28
>>>>  ld_i64 tmp6,regwptr,$0x20
>>>>  movi_i64 cond,$0x0
>>>>  movi_i64 tmp8,$0x0
>>>>  brcond_i64 tmp6,tmp8,ne,$0x0
>>>>  movi_i64 cond,$0x1
>>>>  set_label $0x0
>>>>
>>>>  ---- 0x11fea2c
>>>>  movi_i64 tmp7,$0x0
>>>>  xor_i64 tmp0,tmp7,g1
>>>>  movi_i64 pc,$0x11fea2c
>>>>  movi_i64 tmp8,$compute_psr
>>>>  call tmp8,$0x0,$0
>>>>  movi_i64 tmp8,$0x0
>>>>  brcond_i64 cond,tmp8,eq,$0x1
>>>>  movi_i64 npc,$0x11fe9f4
>>>>  br $0x2
>>>>  set_label $0x1
>>>>  movi_i64 npc,$0x11fea30
>>>>  set_label $0x2
>>>>  movi_i64 tmp8,$wrpstate
>>>>  call tmp8,$0x0,$0,tmp0
>>>>  mov_i64 pc,npc
>>>>  movi_i64 tmp8,$0x4
>>>>  add_i64 npc,npc,tmp8
>>>>  exit_tb $0x0
>>>>
>>>> OP after liveness analysis:
>>>>  ---- 0x11fea28
>>>>  ld_i64 tmp6,regwptr,$0x20
>>>>  movi_i64 cond,$0x0
>>>>  movi_i64 tmp8,$0x0
>>>>  brcond_i64 tmp6,tmp8,ne,$0x0
>>>>  movi_i64 cond,$0x1
>>>>  set_label $0x0
>>>>
>>>>  ---- 0x11fea2c
>>>>  nopn $0x2,$0x2
>>>>  nopn $0x3,$0x68,$0x3
>>>>  movi_i64 pc,$0x11fea2c
>>>>  movi_i64 tmp8,$compute_psr
>>>>  call tmp8,$0x0,$0
>>>>  movi_i64 tmp8,$0x0
>>>>  brcond_i64 cond,tmp8,eq,$0x1
>>>>  movi_i64 npc,$0x11fe9f4
>>>>  br $0x2
>>>>  set_label $0x1
>>>>  movi_i64 npc,$0x11fea30
>>>>  set_label $0x2
>>>>  movi_i64 tmp8,$wrpstate
>>>>  call tmp8,$0x0,$0,tmp0
>>>>  mov_i64 pc,npc
>>>>  movi_i64 tmp8,$0x4
>>>>  add_i64 npc,npc,tmp8
>>>>  exit_tb $0x0
>>>>  end
>>>>
>>>> Does it mean the last block is processed correctly and the crash
>>>> happens on the next instruction which doesn't make it to the log?
>>>> The next instruction would be a
>>>>
>>>> 0x00000000011fea30:  retl
>>>>
>>>> Since it's a branch instruction I guess this would also be a tcg block boundary.
>>>
>>> Because abort() was called from tcg_reg_alloc_call, I'd say 'retl'
>>> (synthetic op for 'jmpl %o7 + 8, %g0') was the problem.
>>
>> Any idea why? retl is not a rare instruction...
>
> Sorry, calls are generated for helpers, so it's not 'jmpl' but the
> call to wrpstate helper.

And why doesn't it happen in singlestep mode? I tried commenting out
cpu_check_irqs(env);
in helper_wrpstate, but it made no difference.

The only suspicious thing left is register bank switching. Is it safe
to switch register banks in a helper function? Shouldn't we end the
translation block before that?

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/