From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:56933) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Q9Sga-0003Ql-1h for qemu-devel@nongnu.org; Mon, 11 Apr 2011 21:46:41 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Q9LIw-0002Xw-6u for qemu-devel@nongnu.org; Mon, 11 Apr 2011 13:53:40 -0400
Received: from mail-qy0-f173.google.com ([209.85.216.173]:40775) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Q9LIw-0002Wy-3Z for qemu-devel@nongnu.org; Mon, 11 Apr 2011 13:53:38 -0400
Received: by qyk36 with SMTP id 36so1559184qyk.4 for ; Mon, 11 Apr 2011 10:53:37 -0700 (PDT)
MIME-Version: 1.0
References: <20110410132415.GA6719@volta.aurel32.net>
From: Artyom Tarasenko
Date: Mon, 11 Apr 2011 19:53:17 +0200
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] tcg/tcg.c:1892: tcg fatal error
To: Igor Kovalenko
Cc: Blue Swirl, peter.maydell@linaro.org, qemu-devel, Aurelien Jarno

On Mon, Apr 11, 2011 at 5:16 AM, Igor Kovalenko wrote:
> On Mon, Apr 11, 2011 at 12:00 AM, Artyom Tarasenko wrote:
>> On Sun, Apr 10, 2011 at 9:41 PM, Igor Kovalenko wrote:
>>> On Sun, Apr 10, 2011 at 11:37 PM, Artyom Tarasenko wrote:
>>>> On Sun, Apr 10, 2011 at 8:52 PM, Igor Kovalenko wrote:
>>>>> On Sun, Apr 10, 2011 at 10:35 PM, Artyom Tarasenko wrote:
>>>>>> On Sun, Apr 10, 2011 at 7:57 PM, Blue Swirl wrote:
>>>>>>> On Sun, Apr 10, 2011 at 8:48 PM, Artyom Tarasenko wrote:
>>>>>>>> On Sun, Apr 10, 2011 at 4:44 PM, Blue Swirl wrote:
>>>>>>>>> On Sun, Apr 10, 2011 at 5:09 PM, Artyom Tarasenko wrote:
>>>>>>>>>> On Sun, Apr 10, 2011 at 3:24 PM, Aurelien Jarno wrote:
>>>>>>>>>>> On Sun, Apr 10, 2011 at 02:29:59PM +0200, Artyom Tarasenko wrote:
>>>>>>>>>>>> Trying to boot some proprietary OS I get
>>>>>>>>>>>> qemu-system-sparc64 crash with a
>>>>>>>>>>>>
>>>>>>>>>>>> tcg/tcg.c:1892: tcg fatal error
>>>>>>>>>>>>
>>>>>>>>>>>> error message.
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like it can be a platform independent bug though, because
>>>>>>>>>>>> when a '-singlestep' option IS present, qemu doesn't crash and seems
>>>>>>>>>>>> to translate the code properly.
>>>>>>>>>>>>
>>>>>>>>>>>> (gdb) bt
>>>>>>>>>>>> #0  0x00000032c2e327f5 in raise () from /lib64/libc.so.6
>>>>>>>>>>>> #1  0x00000032c2e33fd5 in abort () from /lib64/libc.so.6
>>>>>>>>>>>> #2  0x000000000051933d in tcg_reg_alloc_call (s=,
>>>>>>>>>>>> def=0x89d340, opc=INDEX_op_call, args=0x10acc98, dead_iargs=3) at
>>>>>>>>>>>> qemu/tcg/tcg.c:1892
>>>>>>>>>>>> #3  0x000000000051a557 in tcg_gen_code_common (s=0x10b8940,
>>>>>>>>>>>> gen_code_buf=0x40338b60 "I\213n@H\213] 3\355I\211\256\220") at
>>>>>>>>>>>> qemu/tcg/tcg.c:2099
>>>>>>>>>>>> #4  tcg_gen_code (s=0x10b8940, gen_code_buf=0x40338b60 "I\213n@H\213]
>>>>>>>>>>>> 3\355I\211\256\220") at qemu/tcg/tcg.c:2142
>>>>>>>>>>>> #5  0x00000000004d38f1 in cpu_sparc_gen_code (env=0x10cce10,
>>>>>>>>>>>> tb=0x7fffe91bc218, gen_code_size_ptr=0x7fffffffd9b4) at
>>>>>>>>>>>> qemu/translate-all.c:93
>>>>>>>>>>>> #6  0x00000000004d1fd7 in tb_gen_code (env=0x10cce10, pc=18868776,
>>>>>>>>>>>> cs_base=18868780, flags=15, cflags=0) at qemu/exec.c:989
>>>>>>>>>>>> #7  0x00000000004d4029 in tb_find_slow (env1=) at
>>>>>>>>>>>> qemu/cpu-exec.c:167
>>>>>>>>>>>> #8  tb_find_fast (env1=) at cpu-exec.c:194
>>>>>>>>>>>> #9  cpu_sparc_exec (env1=) at qemu/cpu-exec.c:556
>>>>>>>>>>>> #10 0x0000000000408868 in tcg_cpu_exec () at qemu/cpus.c:1066
>>>>>>>>>>>> #11 cpu_exec_all () at qemu/cpus.c:1102
>>>>>>>>>>>> #12 0x000000000053c756 in main_loop (argc=,
>>>>>>>>>>>> argv=, envp=) at
>>>>>>>>>>>> qemu/vl.c:1430
>>>>>>>>>>>>
>>>>>>>>>>>> I inspected ts->val_type causing the abort() case and it turned out to
>>>>>>>>>>>> be 0.
>>>>>>>>>>>>
>>>>>>>>>>>> The last lines of qemu.log (without -singlestep)
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fe9f0:  rdpr  %pstate, %g1
>>>>>>>>>>>> 0x00000000011fe9f4:  wrpr  %g1, 2, %pstate
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fe9f8:  ldub  [ %o0 ], %o1
>>>>>>>>>>>> 0x00000000011fe9fc:  mov  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>>>>
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> Search PC...
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fe9f8:  ldub  [ %o0 ], %o1
>>>>>>>>>>>> 0x00000000011fe9fc:  mov  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>>>>
>>>>>>>>>>>> 110521: Data Access MMU Miss (v=0068) pc=00000000011fe9f8
>>>>>>>>>>>> npc=00000000011fe9fc SP=000000000180ae41
>>>>>>>>>>>> pc: 00000000011fe9f8  npc: 00000000011fe9fc
>>>>>>>>>>>>
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fea00:  rdpr  %tick, %o3
>>>>>>>>>>>> 0x00000000011fea04:  cmp  %o1, %o2
>>>>>>>>>>>> 0x00000000011fea08:  be  %icc, 0x11fea00
>>>>>>>>>>>> 0x00000000011fea0c:  ldub  [ %o0 ], %o2
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fea10:  brz,pn   %o2, 0x11fe9f8
>>>>>>>>>>>> 0x00000000011fea14:  mov  %o2, %o4
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fea18:  rdpr  %tick, %o5
>>>>>>>>>>>> 0x00000000011fea1c:  cmp  %o2, %o4
>>>>>>>>>>>> 0x00000000011fea20:  be  %icc, 0x11fea18
>>>>>>>>>>>> 0x00000000011fea24:  ldub  [ %o0 ], %o4
>>>>>>>>>>>> --------------
>>>>>>>>>>>> IN:
>>>>>>>>>>>> 0x00000000011fea28:  brz,pn   %o4, 0x11fe9f4
>>>>>>>>>>>> 0x00000000011fea2c:  wrpr  %g0, %g1, %pstate
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> The crash is 100% reproducible and happens always on the same place,
>>>>>>>>>>>> so it's probably a pure TCG issue, not related on getting the
>>>>>>>>>>>> external/timer interrupts.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you need any additional info?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> What would be interesting would be to get the corresponding TCG code
>>>>>>>>>>> from qemu.log (-d op,op_opt).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> OP:
>>>>>>>>>>  ---- 0x11fea28
>>>>>>>>>>  ld_i64 tmp6,regwptr,$0x20
>>>>>>>>>>  movi_i64 cond,$0x0
>>>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>>>  brcond_i64 tmp6,tmp8,ne,$0x0
>>>>>>>>>>  movi_i64 cond,$0x1
>>>>>>>>>>  set_label $0x0
>>>>>>>>>>
>>>>>>>>>>  ---- 0x11fea2c
>>>>>>>>>>  movi_i64 tmp7,$0x0
>>>>>>>>>>  xor_i64 tmp0,tmp7,g1
>>>>>>>>>>  movi_i64 pc,$0x11fea2c
>>>>>>>>>>  movi_i64 tmp8,$compute_psr
>>>>>>>>>>  call tmp8,$0x0,$0
>>>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>>>  brcond_i64 cond,tmp8,eq,$0x1
>>>>>>>>>>  movi_i64 npc,$0x11fe9f4
>>>>>>>>>>  br $0x2
>>>>>>>>>>  set_label $0x1
>>>>>>>>>>  movi_i64 npc,$0x11fea30
>>>>>>>>>>  set_label $0x2
>>>>>>>>>>  movi_i64 tmp8,$wrpstate
>>>>>>>>>>  call tmp8,$0x0,$0,tmp0
>>>>>>>>>>  mov_i64 pc,npc
>>>>>>>>>>  movi_i64 tmp8,$0x4
>>>>>>>>>>  add_i64 npc,npc,tmp8
>>>>>>>>>>  exit_tb $0x0
>>>>>>>>>>
>>>>>>>>>> OP after liveness analysis:
>>>>>>>>>>  ---- 0x11fea28
>>>>>>>>>>  ld_i64 tmp6,regwptr,$0x20
>>>>>>>>>>  movi_i64 cond,$0x0
>>>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>>>  brcond_i64 tmp6,tmp8,ne,$0x0
>>>>>>>>>>  movi_i64 cond,$0x1
>>>>>>>>>>  set_label $0x0
>>>>>>>>>>
>>>>>>>>>>  ---- 0x11fea2c
>>>>>>>>>>  nopn $0x2,$0x2
>>>>>>>>>>  nopn $0x3,$0x68,$0x3
>>>>>>>>>>  movi_i64 pc,$0x11fea2c
>>>>>>>>>>  movi_i64 tmp8,$compute_psr
>>>>>>>>>>  call tmp8,$0x0,$0
>>>>>>>>>>  movi_i64 tmp8,$0x0
>>>>>>>>>>  brcond_i64 cond,tmp8,eq,$0x1
>>>>>>>>>>  movi_i64 npc,$0x11fe9f4
>>>>>>>>>>  br $0x2
>>>>>>>>>>  set_label $0x1
>>>>>>>>>>  movi_i64 npc,$0x11fea30
>>>>>>>>>>  set_label $0x2
>>>>>>>>>>  movi_i64 tmp8,$wrpstate
>>>>>>>>>>  call tmp8,$0x0,$0,tmp0
>>>>>>>>>>  mov_i64 pc,npc
>>>>>>>>>>  movi_i64 tmp8,$0x4
>>>>>>>>>>  add_i64 npc,npc,tmp8
>>>>>>>>>>  exit_tb $0x0
>>>>>>>>>>  end
>>>>>>>>>>
>>>>>>>>>> Does it mean the last block is processed correctly and the crash
>>>>>>>>>> happens on the next instruction which doesn't make it to the log?
>>>>>>>>>> The next instruction would be a
>>>>>>>>>>
>>>>>>>>>> 0x00000000011fea30:  retl
>>>>>>>>>>
>>>>>>>>>> Since it's a branch instruction I guess this would also be a tcg block boundary.
>>>>>>>>>
>>>>>>>>> Because abort() was called from tcg_reg_alloc_call, I'd say 'retl'
>>>>>>>>> (synthetic op for 'jmpl %o8 + 8, %g0') was the problem.
>>>>>>>>
>>>>>>>> Any idea why? retl is not a rare instruction...
>>>>>>>
>>>>>>> Sorry, calls are generated for helpers, so it's not 'jmpl' but the
>>>>>>> call to wrpstate helper.
>>>>>>
>>>>>> And why it doesn't happen in a singlestep mode?
>>>>>> I tried to comment out
>>>>>> cpu_check_irqs(env);
>>>>>> in the helper_wrpstate but it made no difference. The only suspicious
>>>>>> thing left is register bank switching. Is it safe to switch register
>>>>>> banks in the helper function? Shouldn't we end the translation block
>>>>>> before?
>>>>>
>>>>> Not sure if I have seen write to pstate in delay slot, but switching
>>>>> globals with PS_AG appears to be safe.
>>>>> Do you know which bits are changed in the pstate?
>>>>
>>>> Hard to say. With a breakpoint set qemu doesn't crash.
>>>> The breakpoint shows the change from 0x14->0x16.
>>>> So the only difference is that interrupts are getting enabled. No
>>>> register bank change.
>>>> (And now also no cpu_check_irqs(env) call, because I commented it out.)
>>>>
>>>> But given there was a Data Access MMU Miss, I would expect there must
>>>> have been a PS_MG switch.
>>>>
>>>> Also the breakpoint makes tcg cut the translation block before the wrpr:
>>>>
>>>> IN:
>>>> 0x00000000011fea18:  rdpr  %tick, %o5
>>>> 0x00000000011fea1c:  cmp  %o2, %o4
>>>> 0x00000000011fea20:  be  %icc, 0x11fea18
>>>> 0x00000000011fea24:  ldub  [ %o0 ], %o4
>>>> --------------
>>>> IN:
>>>> 0x00000000011fea28:  brz,pn   %o4, 0x11fe9f4
>>>> --------------
>>>> IN:
>>>> 0x00000000011fea2c:  wrpr  %g0, %g1, %pstate
>>>> --------------
>>>> IN:
>>>> 0x00000000011fea30:  retl
>>>> --------------
>>>> IN:
>>>> 0x00000000011fea30:  retl
>>>> 0x00000000011fea34:  sub  %o5, %o3, %o0
>>>>
>>>
>>> You can try enabling DEBUG_PSTATE to see which bits are changed.
>>
>> I put an additional DPRINTF in the helper and it doesn't get executed
>> at 11fea2c. Only at 11fe9f4 (0x16->0x14).
>
> In such cases I would run with -d in_asm,int to have more data to
> compare two runs.
> May the patch attached help a bit to add verbose pstate output.

Can do it, but first I'd like to understand what we are looking for.
How does the main loop work in this case? Is it something like the following?

translate {brz,pn ; wrpr} -> optimize -> execute -> translate {retl ; ...} -> optimize -> execute.

The error in the subject is a tcg error, so it happens in one of the
two translate/optimize phases drawn above, right? So why are we looking
at the wrpr helper code?

> Do you have public test case?
> It is possible to code this delay slot write test but real issue may
> be corruption elsewhere.

Do you assume ts->val_type gets corrupted? But then the corruption must
happen before the wrpr helper call, or actually before the translation
of the {brz,pn ; wrpr} block, no?

-- 
Regards,
Artyom Tarasenko

solaris/sparc under qemu blog: http://tyom.blogspot.com/
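
To make the translate -> execute flow asked about above concrete, here is a toy model in Python. It is only an illustration of the reasoning, not QEMU code: the function names, the `bad_temp` flag and the `tick_read_stub` helper are all made up, and only the block addresses and the `compute_psr`/`wrpstate` helper names come from the logs in this thread. It assumes that translation merely emits calls to helpers and that a failure during translation stops the block before any of its helpers run.

```python
# Toy model of a translate-then-execute loop. NOT QEMU code:
# every name below is hypothetical and exists only for illustration.

class TranslationError(Exception):
    """Stands in for the abort() hit while generating host code."""


def translate(block, log):
    # Translation only *emits* calls to helpers; it never runs them.
    log.append(("translate", block["pc"]))
    if block.get("bad_temp"):
        # Assumed failure mode: a temporary is in an unexpected state.
        raise TranslationError(hex(block["pc"]))
    return block["helpers"]


def run(blocks, log):
    cache = {}  # translated blocks, keyed by guest pc
    for block in blocks:
        pc = block["pc"]
        if pc not in cache:
            cache[pc] = translate(block, log)
        # Helpers run only after translation has succeeded.
        for helper in cache[pc]:
            log.append(("execute", helper))


def demo():
    log = []
    blocks = [
        {"pc": 0x11FEA18, "helpers": ["tick_read_stub"]},
        {"pc": 0x11FEA28, "helpers": ["compute_psr", "wrpstate"],
         "bad_temp": True},
    ]
    try:
        run(blocks, log)
    except TranslationError:
        log.append(("abort", 0x11FEA28))
    return log


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

In this model the log never contains an ("execute", "wrpstate") entry for the failing block, which is the point of the question: if the real flow is similar, the helper body cannot have run for the block whose translation aborted.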