Date: Thu, 21 Mar 2013 23:11:53 +0100
From: Aurélien Jarno
Message-ID: <20130321221153.GA11625@ohm.aurel32.net>
In-Reply-To: <514AB10C.8070408@samsung.com>
References: <51293E4A.1040100@weilnetz.de> <20130304163731.GA23040@ohm.aurel32.net>
 <20130305141806.GA5757@ohm.aurel32.net> <5136A45B.1060000@samsung.com>
 <20130306061017.GH23040@ohm.aurel32.net> <20130317222747.GB4769@ohm.aurel32.net>
 <514AB10C.8070408@samsung.com>
Subject: Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-system-mipsel)
To: Yeongkyoon Lee
Cc: Blue Swirl, Stefan Weil, qemu-devel, Richard Henderson

On Thu, Mar 21, 2013 at 04:04:44PM +0900, Yeongkyoon Lee wrote:
> On 03/18/2013 07:27 AM, Aurélien Jarno wrote:
> >On Wed, Mar 06, 2013 at 07:10:17AM +0100, Aurélien Jarno wrote:
> >>On Wed, Mar 06, 2013 at 11:05:15AM +0900, Yeongkyoon Lee wrote:
> >>>On 03/05/2013 11:18 PM, Aurélien Jarno wrote:
> >>>>On Mon, Mar 04, 2013 at 05:37:31PM +0100, Aurélien Jarno wrote:
> >>>>>Hi,
> >>>>>
> >>>>>On Sat, Feb 23, 2013 at 11:10:18PM +0100, Stefan Weil wrote:
> >>>>>>This assertion occurred with latest git master:
> >>>>>>
> >>>>>>qemu-system-mipsel: /src/qemu/tcg/tcg-op.h:2589:
> >>>>>> tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask & (1 << idx))
> >>>>>>== 0' failed.
> >>>>>>Aborted
> >>>>>>
> >>>>>>QEMU was built with --enable-debug and running a Debian MIPS Lenny (NFS
> >>>>>>root).
> >>>>>>The assertion happened when running "apt-get update" in the guest.
> >>>>>>
> >>>>>Is it something reproducible or more or less random? Have you Cc:ed
> >>>>>Richard because it's related to the latest patches?
> >>>>>
> >>>>>On my side I am experiencing random segfaults in various guests (at
> >>>>>least PowerPC, MIPS, SH4 and ARM). I have found a way to bisect it, even
> >>>>>if it is quite long (building Perl + the testsuite). Currently I know
> >>>>>that 1.3 is affected, while 1.2 is not.
> >>>>>
> >>>>I have found that the issue comes from the following commits, which
> >>>>unfortunately are not bisectable one by one (though it won't change the
> >>>>results a lot):
> >>>>
> >>>>  commit b76f0d8c2e3eac94bc7fd90a510cb7426b2a2699
> >>>>  Author: Yeongkyoon Lee
> >>>>  Date:   Wed Oct 31 16:04:25 2012 +0900
> >>>>
> >>>>      tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
> >>>>
> >>>>      Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
> >>>>      cases at the end of a block after generating the other IRs.
> >>>>      Currently, this optimization supports only i386 and x86_64 hosts.
> >>>>
> >>>>      Signed-off-by: Yeongkyoon Lee
> >>>>      Signed-off-by: Blue Swirl
> >>>>
> >>>>  commit fdbb84d1332ae0827d60f1a2ca03c7d5678c6edd
> >>>>  Author: Yeongkyoon Lee
> >>>>  Date:   Wed Oct 31 16:04:24 2012 +0900
> >>>>
> >>>>      tcg: Add extended GETPC mechanism for MMU helpers with ldst optimization
> >>>>
> >>>>      Add GETPC_EXT which is used by MMU helpers to selectively calculate the code
> >>>>      address of accessing guest memory when called from a qemu_ld/st optimized code
> >>>>      or a C function. Currently, it supports only i386 and x86-64 hosts.
> >>>>      Signed-off-by: Yeongkyoon Lee
> >>>>      Signed-off-by: Blue Swirl
> >>>>
> >>>>  commit 32761257c0b9fa7ee04d2871a6e48a41f119c469
> >>>>  Author: Yeongkyoon Lee
> >>>>  Date:   Wed Oct 31 16:04:23 2012 +0900
> >>>>
> >>>>      configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
> >>>>
> >>>>      Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization only when
> >>>>      a host is i386 or x86_64.
> >>>>
> >>>>      Signed-off-by: Yeongkyoon Lee
> >>>>      Signed-off-by: Blue Swirl
> >>>>
> >>>>I will try to understand why.
> >>>>
> >>>>
> >>>Hi Aurélien,
> >>>Do you mean that those random segfaults occurred only when
> >>>configured with "--enable-debug"?
> >>>Although I cannot see at a glance how my commits affect a debug-built
> >>>image, I'll double-check.
> >>>Thanks.
> >>The problem is there even without configuring QEMU with --enable-debug.
> >>It just doesn't happen very often, and very randomly. The only way to
> >>reproduce it each time is to launch a big task in the guest (for me
> >>building Perl) and see if it completes or not. It can take up to one
> >>hour until it happens.
> >>
> >>I should point out that the segfault is on the guest side.
> >>
> >>I have tried to look at your patches, and so far I haven't found the
> >>issue. It seems the first two patches are fine, i.e. I have verified the
> >>return address is always correctly computed.
> >>
> >I still haven't found the issue, but on the other hand I can't find any
> >problem in your code, after reading it dozens of times. I also tried to
> >modify it as little as possible while issuing the slow path back inside
> >the TB and it fixes the problem. So it really looks to be due to
> >the slow path being at the end of the TB, and not to a bug in the code
> >generating it. After adding various checks, I am also convinced the
> >address computed in GETPC_EXT() is always correct. I have to say I am
> >running out of ideas.
> >
> >One way to reproduce the issue more easily is to reduce the size of the
> >generated code buffer, for example by setting it to 512kB for both
> >MIN_CODE_GEN_BUFFER_SIZE and MAX_CODE_GEN_BUFFER_SIZE in
> >translate-all.c. That way booting an ARM guest triggers plenty of
> >segmentation faults or other strange issues with your patch but not
> >without.
> >
> >OTOH increasing this size makes the issue almost disappear even when
> >building perl including the testsuite (for that it has to be at least
> >512MB).
> >
>
> Although I've not succeeded in reproducing the problem, I've found a
> suspicious code stub about boundary-checking of generated code
> (is_tcg_gen_code() in translate-all.c).
>
> The code is supposed to be changed as follows.
> Before:
>     return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>             tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
>                     tcg_ctx.code_gen_buffer_max_size));
> After:
>     return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>             tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
>                     tcg_ctx.code_gen_buffer_size));
>
> The reason is that the current check can miss part of the generated
> code range, by up to "(TCG_MAX_OP_SIZE * OPC_BUF_SIZE)" bytes.
> See code_gen_alloc() in translate-all.c:
>     tcg_ctx.code_gen_buffer_max_size = tcg_ctx.code_gen_buffer_size
>         - (TCG_MAX_OP_SIZE * OPC_BUF_SIZE)
>

Very good catch! Thanks. This fixes the issue I observed.

To give more details, code_gen_buffer_max_size corresponds to the
threshold at which all TBs are cleared before code generation continues.
This means that it can be exceeded by anything from a few bytes up to
(TCG_MAX_OP_SIZE * OPC_BUF_SIZE) bytes, which corresponds to the maximum
size of a generated TB.

Could you please send a proper patch to fix that? I think it should also
be fixed in the next 1.3.x and 1.4.x releases (1.2.x releases are not
affected), so please Cc: qemu-stable (even if the patch will have to be
slightly tweaked).

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
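
P.S. To make the boundary problem concrete, here is a minimal standalone
sketch of the two checks. It is not the QEMU code: struct code_buf,
in_buf_old() and in_buf_new() are hypothetical stand-ins for tcg_ctx and
is_tcg_gen_code(), and the sizes in the usage note are made up. It only
illustrates that an address between the flush threshold and the real end
of the buffer is rejected by the old check and accepted by the fixed one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of tcg_ctx. */
struct code_buf {
    uint8_t *base;    /* start of the generated-code buffer */
    size_t size;      /* full size of the buffer */
    size_t max_size;  /* flush threshold: size minus room for one TB */
};

/* Old check: only accepts addresses below the flush threshold. */
static bool in_buf_old(const struct code_buf *b, uintptr_t p)
{
    return p >= (uintptr_t)b->base &&
           p < (uintptr_t)(b->base + b->max_size);
}

/* Fixed check: accepts addresses anywhere in the buffer. */
static bool in_buf_new(const struct code_buf *b, uintptr_t p)
{
    return p >= (uintptr_t)b->base &&
           p < (uintptr_t)(b->base + b->size);
}
```

With, say, a 1024-byte buffer and a 256-byte reserve (max_size = 768), an
address at offset 800 belongs to a TB that started just below the
threshold: in_buf_old() rejects it, in_buf_new() accepts it.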