From: Artyom Tarasenko
Date: Sun, 10 Apr 2011 22:00:13 +0200
Subject: Re: [Qemu-devel] tcg/tcg.c:1892: tcg fatal error
To: Igor Kovalenko
Cc: Blue Swirl, peter.maydell@linaro.org, qemu-devel, Aurelien Jarno

On Sun, Apr 10, 2011 at 9:41 PM, Igor Kovalenko wrote:
> On Sun, Apr 10, 2011 at 11:37 PM, Artyom Tarasenko wrote:
>> On Sun, Apr 10, 2011 at 8:52 PM, Igor Kovalenko wrote:
>>> On Sun, Apr 10, 2011 at 10:35 PM, Artyom Tarasenko wrote:
>>>> On Sun, Apr 10, 2011 at 7:57 PM, Blue Swirl wrote:
>>>>> On Sun, Apr 10, 2011 at 8:48 PM, Artyom Tarasenko wrote:
>>>>>> On Sun, Apr 10, 2011 at 4:44 PM, Blue Swirl wrote:
>>>>>>> On Sun, Apr 10, 2011 at 5:09 PM, Artyom Tarasenko wrote:
>>>>>>>> On Sun, Apr 10, 2011 at 3:24 PM, Aurelien Jarno wrote:
>>>>>>>>> On Sun, Apr 10, 2011 at 02:29:59PM +0200, Artyom Tarasenko wrote:
>>>>>>>>>> Trying to boot some proprietary OS I get qemu-system-sparc64 crash with a
>>>>>>>>>>
>>>>>>>>>> tcg/tcg.c:1892: tcg fatal error
>>>>>>>>>>
>>>>>>>>>> error message.
>>>>>>>>>>
>>>>>>>>>> It looks like it can be a platform independent bug though, because
>>>>>>>>>> when a '-singlestep' option IS present, qemu doesn't crash and seems
>>>>>>>>>> to translate the code properly.
>>>>>>>>>>
>>>>>>>>>> (gdb) bt
>>>>>>>>>> #0  0x00000032c2e327f5 in raise () from /lib64/libc.so.6
>>>>>>>>>> #1  0x00000032c2e33fd5 in abort () from /lib64/libc.so.6
>>>>>>>>>> #2  0x000000000051933d in tcg_reg_alloc_call (s=,
>>>>>>>>>> def=0x89d340, opc=INDEX_op_call, args=0x10acc98, dead_iargs=3) at
>>>>>>>>>> qemu/tcg/tcg.c:1892
>>>>>>>>>> #3  0x000000000051a557 in tcg_gen_code_common (s=0x10b8940,
>>>>>>>>>> gen_code_buf=0x40338b60 "I\213n@H\213] 3\355I\211\256\220") at
>>>>>>>>>> qemu/tcg/tcg.c:2099
>>>>>>>>>> #4  tcg_gen_code (s=0x10b8940, gen_code_buf=0x40338b60 "I\213n@H\213]
>>>>>>>>>> 3\355I\211\256\220") at qemu/tcg/tcg.c:2142
>>>>>>>>>> #5  0x00000000004d38f1 in cpu_sparc_gen_code (env=0x10cce10,
>>>>>>>>>> tb=0x7fffe91bc218, gen_code_size_ptr=0x7fffffffd9b4) at
>>>>>>>>>> qemu/translate-all.c:93
>>>>>>>>>> #6  0x00000000004d1fd7 in tb_gen_code (env=0x10cce10, pc=18868776,
>>>>>>>>>> cs_base=18868780, flags=15, cflags=0) at qemu/exec.c:989
>>>>>>>>>> #7  0x00000000004d4029 in tb_find_slow (env1=) at
>>>>>>>>>> qemu/cpu-exec.c:167
>>>>>>>>>> #8  tb_find_fast (env1=) at cpu-exec.c:194
>>>>>>>>>> #9  cpu_sparc_exec (env1=) at qemu/cpu-exec.c:556
>>>>>>>>>> #10 0x0000000000408868 in tcg_cpu_exec () at qemu/cpus.c:1066
>>>>>>>>>> #11 cpu_exec_all () at qemu/cpus.c:1102
>>>>>>>>>> #12 0x000000000053c756 in main_loop (argc=,
>>>>>>>>>> argv=, envp=) at
>>>>>>>>>> qemu/vl.c:1430
>>>>>>>>>>
>>>>>>>>>> I inspected ts->val_type causing the abort() case and it turned out to be 0.
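Some context on why val_type == 0 is fatal here: when TCG emits a helper call, each argument temp must currently live in a host register, in its memory slot, or be a known constant, and QEMU's tcg_abort() is what prints the "file:line: tcg fatal error" message before calling abort(). The sketch below is a simplified, hypothetical model of that per-argument check (place_call_arg is an invented name, not QEMU code). It assumes the TEMP_VAL_* numbering that QEMU's tcg.h used at the time, where TEMP_VAL_DEAD is 0; under that assumption, a value of 0 means the argument temp held no value at all when the call was generated.

```c
#include <assert.h>

/* Temp value states; numbering assumed to follow QEMU's tcg.h of that
 * era, where TEMP_VAL_DEAD is 0. */
enum {
    TEMP_VAL_DEAD  = 0, /* the temp holds no value */
    TEMP_VAL_REG   = 1, /* the value is in a host register */
    TEMP_VAL_MEM   = 2, /* the value is in the temp's memory slot */
    TEMP_VAL_CONST = 3, /* the value is a known constant */
};

/* Hypothetical model of the per-argument check in tcg_reg_alloc_call():
 * returns 0 if the argument can be placed, or -1 where the real
 * allocator gives up with tcg_abort(). */
static int place_call_arg(int val_type)
{
    assert(val_type >= TEMP_VAL_DEAD && val_type <= TEMP_VAL_CONST);
    switch (val_type) {
    case TEMP_VAL_REG:   /* copy the register into the argument slot */
    case TEMP_VAL_MEM:   /* load the value from memory */
    case TEMP_VAL_CONST: /* materialize the constant */
        return 0;
    default:             /* TEMP_VAL_DEAD: there is nothing to pass */
        return -1;
    }
}
```

In this model only the dead state has no way to produce a value, so the question becomes which op was expected to define the argument temp and why it did not.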
>>>>>>>>>>
>>>>>>>>>> The last lines of qemu.log (without -singlestep)
>>>>>>>>>> IN:
>>>>>>>>>> 0x00000000011fe9f0:  rdpr  %pstate, %g1
>>>>>>>>>> 0x00000000011fe9f4:  wrpr  %g1, 2, %pstate
>>>>>>>>>> --------------
>>>>>>>>>> IN:
>>>>>>>>>> 0x00000000011fe9f8:  ldub  [ %o0 ], %o1
>>>>>>>>>> 0x00000000011fe9fc:  mov  %o1, %o2
>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>>
>>>>>>>>>> Search PC...
>>>>>>>>>> Search PC...
>>>>>>>>>> Search PC...
>>>>>>>>>> Search PC...
>>>>>>>>>> Search PC...
>>>>>>>>>> Search PC...
>>>>>>>>>> --------------
>>>>>>>>>> IN:
>>>>>>>>>> 0x00000000011fe9f8:  ldub  [ %o0 ], %o1
>>>>>>>>>> 0x00000000011fe9fc:  mov  %o1, %o2
>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>>
>>>>>>>>>> 110521: Data Access MMU Miss (v=0068) pc=00000000011fe9f8
>>>>>>>>>> npc=00000000011fe9fc SP=000000000180ae41
>>>>>>>>>> pc: 00000000011fe9f8  npc: 00000000011fe9fc
>>>>>>>>>>
>>>>>>>>>> IN:
>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>> --------------
>>>>>>>>>> IN:
>>>>>>>>>> 0x00000000011fea10:  brz,pn   %o2, 0x11fe9f8
>>>>>>>>>> 0x00000000011fea14:  mov  %o2, %o4
>>>>>>>>>> --------------
>>>>>>>>>> IN:
>>>>>>>>>> 0x00000000011fea18:  rdpr  %tick, %o5
>>>>>>>>>> 0x00000000011fea1c:  cmp  %o2, %o4
>>>>>>>>>> 0x00000000011fea20:  be  %icc, 0x11fea18
>>>>>>>>>> 0x00000000011fea24:  ldub  [ %o0 ], %o4
>>>>>>>>>> --------------
>>>>>>>>>> IN:
>>>>>>>>>> 0x00000000011fea28:  brz,pn   %o4, 0x11fe9f4
>>>>>>>>>> 0x00000000011fea2c:  wrpr  %g0, %g1, %pstate
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The crash is 100% reproducible and always happens at the same place,
>>>>>>>>>> so it's probably a pure TCG issue, not related to getting the
>>>>>>>>>> external/timer interrupts.
>>>>>>>>>>
>>>>>>>>>> Do you need any additional info?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What would be interesting would be to get the corresponding TCG code
>>>>>>>>> from qemu.log (-d op,op_opt).
>>>>>>>>
>>>>>>>>
>>>>>>>> OP:
>>>>>>>>  ---- 0x11fea28
>>>>>>>>  ld_i64 tmp6,regwptr,$0x20
>>>>>>>>  movi_i64 cond,$0x0
>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>  brcond_i64 tmp6,tmp8,ne,$0x0
>>>>>>>>  movi_i64 cond,$0x1
>>>>>>>>  set_label $0x0
>>>>>>>>
>>>>>>>>  ---- 0x11fea2c
>>>>>>>>  movi_i64 tmp7,$0x0
>>>>>>>>  xor_i64 tmp0,tmp7,g1
>>>>>>>>  movi_i64 pc,$0x11fea2c
>>>>>>>>  movi_i64 tmp8,$compute_psr
>>>>>>>>  call tmp8,$0x0,$0
>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>  brcond_i64 cond,tmp8,eq,$0x1
>>>>>>>>  movi_i64 npc,$0x11fe9f4
>>>>>>>>  br $0x2
>>>>>>>>  set_label $0x1
>>>>>>>>  movi_i64 npc,$0x11fea30
>>>>>>>>  set_label $0x2
>>>>>>>>  movi_i64 tmp8,$wrpstate
>>>>>>>>  call tmp8,$0x0,$0,tmp0
>>>>>>>>  mov_i64 pc,npc
>>>>>>>>  movi_i64 tmp8,$0x4
>>>>>>>>  add_i64 npc,npc,tmp8
>>>>>>>>  exit_tb $0x0
>>>>>>>>
>>>>>>>> OP after liveness analysis:
>>>>>>>>  ---- 0x11fea28
>>>>>>>>  ld_i64 tmp6,regwptr,$0x20
>>>>>>>>  movi_i64 cond,$0x0
>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>  brcond_i64 tmp6,tmp8,ne,$0x0
>>>>>>>>  movi_i64 cond,$0x1
>>>>>>>>  set_label $0x0
>>>>>>>>
>>>>>>>>  ---- 0x11fea2c
>>>>>>>>  nopn $0x2,$0x2
>>>>>>>>  nopn $0x3,$0x68,$0x3
>>>>>>>>  movi_i64 pc,$0x11fea2c
>>>>>>>>  movi_i64 tmp8,$compute_psr
>>>>>>>>  call tmp8,$0x0,$0
>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>  brcond_i64 cond,tmp8,eq,$0x1
>>>>>>>>  movi_i64 npc,$0x11fe9f4
>>>>>>>>  br $0x2
>>>>>>>>  set_label $0x1
>>>>>>>>  movi_i64 npc,$0x11fea30
>>>>>>>>  set_label $0x2
>>>>>>>>  movi_i64 tmp8,$wrpstate
>>>>>>>>  call tmp8,$0x0,$0,tmp0
>>>>>>>>  mov_i64 pc,npc
>>>>>>>>  movi_i64 tmp8,$0x4
>>>>>>>>  add_i64 npc,npc,tmp8
>>>>>>>>  exit_tb $0x0
>>>>>>>>  end
>>>>>>>>
>>>>>>>> Does it mean the last block is processed correctly and the crash
>>>>>>>> happens on the next instruction which doesn't make it to the log?
>>>>>>>> The next instruction would be a
>>>>>>>>
>>>>>>>> 0x00000000011fea30:  retl
>>>>>>>>
>>>>>>>> Since it's a branch instruction I guess this would also be a tcg block boundary.
>>>>>>>
>>>>>>> Because abort() was called from tcg_reg_alloc_call, I'd say 'retl'
>>>>>>> (synthetic op for 'jmpl %o7 + 8, %g0') was the problem.
>>>>>>
>>>>>> Any idea why? retl is not a rare instruction...
>>>>>
>>>>> Sorry, calls are generated for helpers, so it's not 'jmpl' but the
>>>>> call to wrpstate helper.
>>>>
>>>> And why doesn't it happen in singlestep mode?
>>>> I tried to comment out
>>>> cpu_check_irqs(env);
>>>> in the helper_wrpstate but it made no difference. The only suspicious
>>>> thing left is register bank switching. Is it safe to switch register
>>>> banks in the helper function? Shouldn't we end the translation block
>>>> before?
>>>
>>> Not sure if I have seen a write to pstate in a delay slot, but switching
>>> globals with PS_AG appears to be safe.
>>> Do you know which bits are changed in the pstate?
>>
>> Hard to say. With a breakpoint set qemu doesn't crash.
>> The breakpoint shows the change from 0x14->0x16.
>> So the only difference is that interrupts are getting enabled. No
>> register bank change.
>> (And now also no cpu_check_irqs(env) call, because I commented it out.)
>>
>> But given there was a Data Access MMU Miss, I would expect there must
>> have been a PS_MG switch.
>>
>> Also the breakpoint makes TCG cut the translation block before the wrpr:
>>
>> IN:
>> 0x00000000011fea18:  rdpr  %tick, %o5
>> 0x00000000011fea1c:  cmp  %o2, %o4
>> 0x00000000011fea20:  be  %icc, 0x11fea18
>> 0x00000000011fea24:  ldub  [ %o0 ], %o4
>> --------------
>> IN:
>> 0x00000000011fea28:  brz,pn   %o4, 0x11fe9f4
>> --------------
>> IN:
>> 0x00000000011fea2c:  wrpr  %g0, %g1, %pstate
>> --------------
>> IN:
>> 0x00000000011fea30:  retl
>> --------------
>> IN:
>> 0x00000000011fea30:  retl
>> 0x00000000011fea34:  sub  %o5, %o3, %o0
>>
>
> You can try enabling DEBUG_PSTATE to see which bits are changed.

I put an additional DPRINTF in the helper and it doesn't get executed
at 11fea2c. Only at 11fe9f4 (0x16->0x14).

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/
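One possible reading of the "OP after liveness analysis" dump, not confirmed anywhere in this thread: the delay-slot wrpr computes its operand into tmp0 (xor_i64 tmp0,tmp7,g1), but the brcond/br/set_label sequence that selects npc for the preceding brz,pn is emitted between that xor and the call to wrpstate. QEMU's tcg/README describes ordinary (non-local) temporaries as live only within a basic block, so a backward liveness pass would treat tmp0 as dead at those labels, turn the xor (and then the movi feeding it) into nopn, and leave the wrpstate call reading a temp that holds no value, which would fit ts->val_type being 0. The C sketch below is a hypothetical, heavily simplified model of such a pass; the Op struct, the T_* temp ids and the functions build_wrpr_ops(), liveness() and find_dead_read() are invented for illustration and are not QEMU code. It assumes pc, npc, g1 and cond are TCG globals and tmp0, tmp7 and tmp8 are ordinary temporaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical temp ids: globals first (live at every block end),
 * then ordinary temporaries (dead at every block end). */
enum { T_PC, T_NPC, T_G1, T_COND, T_TMP0, T_TMP7, T_TMP8, NB_TEMPS };
#define NB_GLOBALS T_TMP0 /* pc, npc, g1, cond */
#define NO_ARG (-1)

typedef struct {
    const char *name;  /* op text from the log, for readability only */
    int out;           /* output temp, or NO_ARG */
    int in[2];         /* input temps, or NO_ARG */
    bool side_effects; /* helper calls: never removed */
    bool bb_end;       /* brcond, br, set_label, exit_tb */
    bool removed;      /* set by liveness(): op became a nop */
} Op;

/* At a basic-block end, globals are live and ordinary temps are dead. */
static void la_bb_end(bool dead[NB_TEMPS])
{
    for (int t = 0; t < NB_TEMPS; t++)
        dead[t] = (t >= NB_GLOBALS);
}

/* Backward pass: drop side-effect-free ops whose output is never read. */
static void liveness(Op *ops, int n)
{
    bool dead[NB_TEMPS];
    la_bb_end(dead); /* the end of the TB acts like a block end */
    for (int i = n - 1; i >= 0; i--) {
        Op *op = &ops[i];
        if (op->out != NO_ARG && !op->side_effects && dead[op->out]) {
            op->removed = true;
            continue;
        }
        if (op->out != NO_ARG)
            dead[op->out] = true; /* written here: dead above this op */
        if (op->bb_end) {
            la_bb_end(dead);
        } else if (op->side_effects) {
            for (int t = 0; t < NB_GLOBALS; t++)
                dead[t] = false; /* a helper may read any global */
        }
        for (int k = 0; k < 2; k++)
            if (op->in[k] != NO_ARG)
                dead[op->in[k]] = false; /* read here: live above */
    }
}

/* The ops logged for 0x11fea2c, reduced to their temp operands.
 * ops must have room for at least 18 entries. */
static int build_wrpr_ops(Op *ops)
{
    static const Op seq[] = {
        { "movi tmp7,$0x0",       T_TMP7, { NO_ARG, NO_ARG }, false, false, false },
        { "xor tmp0,tmp7,g1",     T_TMP0, { T_TMP7, T_G1   }, false, false, false },
        { "movi pc,$0x11fea2c",   T_PC,   { NO_ARG, NO_ARG }, false, false, false },
        { "movi tmp8,$compute_psr", T_TMP8, { NO_ARG, NO_ARG }, false, false, false },
        { "call compute_psr",     NO_ARG, { T_TMP8, NO_ARG }, true,  false, false },
        { "movi tmp8,$0x0",       T_TMP8, { NO_ARG, NO_ARG }, false, false, false },
        { "brcond cond,tmp8",     NO_ARG, { T_COND, T_TMP8 }, false, true,  false },
        { "movi npc,$0x11fe9f4",  T_NPC,  { NO_ARG, NO_ARG }, false, false, false },
        { "br $0x2",              NO_ARG, { NO_ARG, NO_ARG }, false, true,  false },
        { "set_label $0x1",       NO_ARG, { NO_ARG, NO_ARG }, false, true,  false },
        { "movi npc,$0x11fea30",  T_NPC,  { NO_ARG, NO_ARG }, false, false, false },
        { "set_label $0x2",       NO_ARG, { NO_ARG, NO_ARG }, false, true,  false },
        { "movi tmp8,$wrpstate",  T_TMP8, { NO_ARG, NO_ARG }, false, false, false },
        { "call wrpstate,tmp0",   NO_ARG, { T_TMP8, T_TMP0 }, true,  false, false },
        { "mov pc,npc",           T_PC,   { T_NPC,  NO_ARG }, false, false, false },
        { "movi tmp8,$0x4",       T_TMP8, { NO_ARG, NO_ARG }, false, false, false },
        { "add npc,npc,tmp8",     T_NPC,  { T_NPC,  T_TMP8 }, false, false, false },
        { "exit_tb $0x0",         NO_ARG, { NO_ARG, NO_ARG }, false, true,  false },
    };
    memcpy(ops, seq, sizeof seq);
    return (int)(sizeof seq / sizeof seq[0]);
}

/* Index of the first kept op that reads t before any kept op defines it,
 * or -1 if there is none. */
static int find_dead_read(const Op *ops, int n, int t)
{
    assert(t >= 0 && t < NB_TEMPS);
    bool defined = false;
    for (int i = 0; i < n; i++) {
        if (ops[i].removed)
            continue;
        if (!defined && (ops[i].in[0] == t || ops[i].in[1] == t))
            return i;
        if (ops[i].out == t)
            defined = true;
    }
    return -1;
}
```

Run on the op sequence logged for 0x11fea2c, liveness() removes exactly ops 0 and 1, matching the two nopn lines in the dump, and find_dead_read(ops, n, T_TMP0) then returns 13, the index of the wrpstate call. If this reading is right, it would also be consistent with the -singlestep and breakpoint runs not crashing, since there the wrpr gets a translation block of its own (as in the breakpoint log above), presumably without a branch between the xor and the call.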