From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:50623)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1UIr5l-0002NO-U0 for qemu-devel@nongnu.org;
    Thu, 21 Mar 2013 21:48:32 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1UIr5g-0004pr-KO for qemu-devel@nongnu.org;
    Thu, 21 Mar 2013 21:48:25 -0400
Received: from mailout2.samsung.com ([203.254.224.25]:43444)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1UIr5g-0004pY-2c for qemu-devel@nongnu.org;
    Thu, 21 Mar 2013 21:48:20 -0400
Received: from epcpsbgm2.samsung.com (epcpsbgm2 [203.254.230.27])
    by mailout2.samsung.com (Oracle Communications Messaging Server
    7u4-24.01(7.0.4.24.0) 64bit (built Nov 17 2011)) with ESMTP
    id <0MK100NRLH090ZG0@mailout2.samsung.com> for qemu-devel@nongnu.org;
    Fri, 22 Mar 2013 10:48:16 +0900 (KST)
Date: Fri, 22 Mar 2013 10:48:22 +0900
From: Yeongkyoon Lee
In-reply-to: <20130321221153.GA11625@ohm.aurel32.net>
Message-id: <514BB866.20809@samsung.com>
MIME-version: 1.0
Content-type: text/plain; charset=ISO-8859-15; format=flowed
Content-transfer-encoding: QUOTED-PRINTABLE
References: <51293E4A.1040100@weilnetz.de>
    <20130304163731.GA23040@ohm.aurel32.net>
    <20130305141806.GA5757@ohm.aurel32.net>
    <5136A45B.1060000@samsung.com>
    <20130306061017.GH23040@ohm.aurel32.net>
    <20130317222747.GB4769@ohm.aurel32.net>
    <514AB10C.8070408@samsung.com>
    <20130321221153.GA11625@ohm.aurel32.net>
Subject: Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-system-mipsel)
To: Aurélien Jarno
Cc: Blue Swirl, Stefan Weil, qemu-devel, Richard Henderson

On 03/22/2013 07:11 AM, Aurélien Jarno wrote:
> On Thu, Mar 21, 2013 at 04:04:44PM +0900, Yeongkyoon Lee wrote:
>> On 03/18/2013 07:27 AM, Aurélien Jarno wrote:
>>> On Wed, Mar 06, 2013 at 07:10:17AM +0100, Aurélien Jarno wrote:
>>>> On
Wed, Mar 06, 2013 at 11:05:15AM +0900, Yeongkyoon Lee wrote:
>>>>> On 03/05/2013 11:18 PM, Aurélien Jarno wrote:
>>>>>> On Mon, Mar 04, 2013 at 05:37:31PM +0100, Aurélien Jarno wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Sat, Feb 23, 2013 at 11:10:18PM +0100, Stefan Weil wrote:
>>>>>>>> This assertion occurred with latest git master:
>>>>>>>>
>>>>>>>> qemu-system-mipsel: /src/qemu/tcg/tcg-op.h:2589:
>>>>>>>> tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask & (1 << idx))
>>>>>>>> == 0' failed.
>>>>>>>> Aborted
>>>>>>>>
>>>>>>>> QEMU was built with --enable-debug and running a Debian MIPS Lenny (NFS
>>>>>>>> root).
>>>>>>>> The assertion happened when running "apt-get update" in the guest.
>>>>>>>>
>>>>>>> Is it reproducible, or more or less random? Have you Cc:ed
>>>>>>> Richard because it's related to the latest patches?
>>>>>>>
>>>>>>> On my side I am experiencing random segfaults in various guests (at
>>>>>>> least PowerPC, MIPS, SH4 and ARM). I have found a way to bisect it, even
>>>>>>> if it takes quite long (building Perl + the testsuite). Currently I know
>>>>>>> that 1.3 is affected, while 1.2 is not.
>>>>>>>
>>>>>> I have found that the issue comes from the following commits, which
>>>>>> unfortunately are not bisectable one by one (though it won't change the
>>>>>> results a lot):
>>>>>>
>>>>>> commit b76f0d8c2e3eac94bc7fd90a510cb7426b2a2699
>>>>>> Author: Yeongkyoon Lee
>>>>>> Date: Wed Oct 31 16:04:25 2012 +0900
>>>>>>
>>>>>>     tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
>>>>>>
>>>>>>     Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
>>>>>>     cases at the end of a block after generating the other IRs.
>>>>>>     Currently, this optimization supports only i386 and x86_64 hosts.
>>>>>>
>>>>>>     Signed-off-by: Yeongkyoon Lee
>>>>>>     Signed-off-by: Blue Swirl
>>>>>>
>>>>>> commit fdbb84d1332ae0827d60f1a2ca03c7d5678c6edd
>>>>>> Author: Yeongkyoon Lee
>>>>>> Date: Wed Oct 31 16:04:24 2012 +0900
>>>>>>
>>>>>>     tcg: Add extended GETPC mechanism for MMU helpers with ldst optimization
>>>>>>
>>>>>>     Add GETPC_EXT which is used by MMU helpers to selectively calculate the code
>>>>>>     address of accessing guest memory when called from a qemu_ld/st optimized code
>>>>>>     or a C function. Currently, it supports only i386 and x86-64 hosts.
>>>>>>
>>>>>>     Signed-off-by: Yeongkyoon Lee
>>>>>>     Signed-off-by: Blue Swirl
>>>>>>
>>>>>> commit 32761257c0b9fa7ee04d2871a6e48a41f119c469
>>>>>> Author: Yeongkyoon Lee
>>>>>> Date: Wed Oct 31 16:04:23 2012 +0900
>>>>>>
>>>>>>     configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
>>>>>>
>>>>>>     Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization only when
>>>>>>     a host is i386 or x86_64.
>>>>>>
>>>>>>     Signed-off-by: Yeongkyoon Lee
>>>>>>     Signed-off-by: Blue Swirl
>>>>>>
>>>>>> I will try to understand why.
>>>>>>
>>>>>>
>>>>> Hi Aurélien,
>>>>> Do you mean that those random segfaults occurred only when
>>>>> configured with "--enable-debug"?
>>>>> Although I cannot see at a glance how my commits affect a debug build,
>>>>> I'll double-check.
>>>>> Thanks.
>>>> The problem is there even without configuring QEMU with --enable-debug.
>>>> It just doesn't happen very often, and only very randomly. The only way to
>>>> reproduce it each time is to launch a big task in the guest (for me
>>>> building Perl) and see if it completes or not. It can take up to one
>>>> hour until it happens.
>>>>
>>>> I should point out that the segfault is on the guest side.
>>>>
>>>> I have tried to look at your patches, and so far I haven't found the
>>>> issue. It seems the first two patches are fine, i.e. I have verified the
>>>> return address is always correctly computed.
>>>>
>>> I still haven't found the issue, but on the other hand I can't find any
>>> problem in your code, after reading it dozens of times. I also tried to
>>> modify it as little as possible while issuing the slow path back inside
>>> the TB, and that fixes the problem. So it really looks to be due to
>>> the slow path being at the end of the TB, and not to a bug in the code
>>> generating it. After adding various checks, I am also convinced the
>>> address computed in GETPC_EXT() is always correct. I have to say I am
>>> running out of ideas.
>>>
>>> One way to reproduce the issue more easily is to reduce the size of the
>>> generated code buffer, for example by setting it to 512kB for both
>>> MIN_CODE_GEN_BUFFER_SIZE and MAX_CODE_GEN_BUFFER_SIZE in
>>> translate-all.c. That way booting an ARM guest triggers plenty of
>>> segmentation faults or other strange issues with your patch but not
>>> without.
>>>
>>> OTOH increasing this size makes the issue almost disappear even when
>>> building perl including the testsuite (for that it has to be at least
>>> 512MB).
>>>
>> Although I have not managed to reproduce the problem, I've found a
>> suspicious piece of code in the boundary check for generated code
>> (is_tcg_gen_code() in translate-all.c).
>>
>> The code should be changed as follows.
>> Before:
>>     return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>>             tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
>>                     tcg_ctx.code_gen_buffer_max_size));
>> After:
>>     return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>>             tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
>>                     tcg_ctx.code_gen_buffer_size));
>>
>> The reason is that the current check can miss generated code located in
>> the last "(TCG_MAX_OP_SIZE * OPC_BUF_SIZE)" bytes of the buffer.
>> See code_gen_alloc() in translate-all.c:
>>     tcg_ctx.code_gen_buffer_max_size = tcg_ctx.code_gen_buffer_size
>>         - (TCG_MAX_OP_SIZE * OPC_BUF_SIZE)
>>
> Very good catch! Thanks.
> This fixes the issue I observed.
>
> To give more details, code_gen_buffer_max_size is the threshold at which
> all TBs are cleared before code generation continues. This means that it
> can be exceeded by anything from a few bytes up to (TCG_MAX_OP_SIZE *
> OPC_BUF_SIZE) bytes, which corresponds to the maximum size of a
> generated TB.
>
> Could you please send a proper patch to fix that? I think it should also
> be fixed in the next 1.3.x and 1.4.x releases (1.2.x releases are not
> affected), so please Cc: qemu-stable (even if the patch will have to be
> slightly tweaked).
>
Sure, I'll send the patch. Thanks.
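
For readers following the thread, the boundary-check problem discussed above can be sketched in isolation. This is only a minimal illustration, not QEMU code: the struct CodeBufSketch and the helpers code_buf_init(), in_gen_code_before() and in_gen_code_after() are hypothetical names, and the slack parameter stands in for (TCG_MAX_OP_SIZE * OPC_BUF_SIZE). It assumes, as described in the thread, that the flush threshold sits one slack region below the end of the allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the three TCG context fields named in the thread. */
typedef struct {
    uint8_t *code_gen_buffer;         /* start of the generated-code buffer */
    size_t   code_gen_buffer_size;    /* size of the whole allocation */
    size_t   code_gen_buffer_max_size;/* flush threshold: size minus slack */
} CodeBufSketch;

/* Mirrors the relationship quoted from code_gen_alloc():
 * max_size = size - slack, where slack models the largest possible TB. */
static inline void code_buf_init(CodeBufSketch *s, uint8_t *buf,
                                 size_t size, size_t slack)
{
    assert(size > slack);
    s->code_gen_buffer = buf;
    s->code_gen_buffer_size = size;
    s->code_gen_buffer_max_size = size - slack;
}

/* "Before" check: the upper bound is the flush threshold, so code emitted
 * into the slack region is not recognised as generated code. */
static inline bool in_gen_code_before(const CodeBufSketch *s, uintptr_t tc_ptr)
{
    return tc_ptr >= (uintptr_t)s->code_gen_buffer &&
           tc_ptr < (uintptr_t)(s->code_gen_buffer +
                                s->code_gen_buffer_max_size);
}

/* "After" check: the upper bound is the end of the allocation, so the
 * slack region is covered as well. */
static inline bool in_gen_code_after(const CodeBufSketch *s, uintptr_t tc_ptr)
{
    return tc_ptr >= (uintptr_t)s->code_gen_buffer &&
           tc_ptr < (uintptr_t)(s->code_gen_buffer +
                                s->code_gen_buffer_size);
}
```

With a buffer of 1024 bytes and a slack of 256 bytes, an address at offset 800 lies past the threshold but still inside the allocation, so the two helpers disagree about it. That is the situation in which a return address belonging to the last TB before a flush would be misclassified.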