From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:55659)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <yeongkyoon.lee@samsung.com>) id 1UIZYN-0005dp-Mk
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 03:04:48 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <yeongkyoon.lee@samsung.com>) id 1UIZYK-0004pL-Nf
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 03:04:47 -0400
Received: from mailout2.samsung.com ([203.254.224.25]:38875)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <yeongkyoon.lee@samsung.com>) id 1UIZYK-0004oa-62
	for qemu-devel@nongnu.org; Thu, 21 Mar 2013 03:04:44 -0400
Received: from epcpsbgm1.samsung.com (epcpsbgm1 [203.254.230.26])
	by mailout2.samsung.com
	(Oracle Communications Messaging Server 7u4-24.01(7.0.4.24.0) 64bit
	(built Nov
	17 2011)) with ESMTP id <0MK000FK10ZSIVQ0@mailout2.samsung.com> for
	qemu-devel@nongnu.org; Thu, 21 Mar 2013 16:04:40 +0900 (KST)
Received: from [172.21.111.108] ([182.198.1.3])
	by mmp1.samsung.com (Oracle Communications Messaging Server 7u4-24.01
	(7.0.4.24.0) 64bit (built Nov 17 2011))
	with ESMTPA id <0MK00016J0ZQHM70@mmp1.samsung.com> for
	qemu-devel@nongnu.org; Thu, 21 Mar 2013 16:04:40 +0900 (KST)
Date: Thu, 21 Mar 2013 16:04:44 +0900
From: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
In-reply-to: <20130317222747.GB4769@ohm.aurel32.net>
Message-id: <514AB10C.8070408@samsung.com>
MIME-version: 1.0
Content-type: text/plain; charset=ISO-8859-15; format=flowed
Content-transfer-encoding: QUOTED-PRINTABLE
References: <51293E4A.1040100@weilnetz.de>
	<20130304163731.GA23040@ohm.aurel32.net>
	<20130305141806.GA5757@ohm.aurel32.net> <5136A45B.1060000@samsung.com>
	<20130306061017.GH23040@ohm.aurel32.net>
	<20130317222747.GB4769@ohm.aurel32.net>
Subject: Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with
 qemu-system-mipsel)
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: =?ISO-8859-15?Q?Aur=E9lien_Jarno?= <aurelien@aurel32.net>
Cc: Blue Swirl <blauwirbel@gmail.com>, Stefan Weil <sw@weilnetz.de>, qemu-devel <qemu-devel@nongnu.org>, Richard Henderson <rth@twiddle.net>

On 03/18/2013 07:27 AM, Aur=E9lien Jarno wrote:
> On Wed, Mar 06, 2013 at 07:10:17AM +0100, Aur=E9lien Jarno wrote:
>> On Wed, Mar 06, 2013 at 11:05:15AM +0900, Yeongkyoon Lee wrote:
>>> On 03/05/2013 11:18 PM, Aur=E9lien Jarno wrote:
>>>> On Mon, Mar 04, 2013 at 05:37:31PM +0100, Aur=E9lien Jarno wrote=
:
>>>>> Hi,
>>>>>
>>>>> On Sat, Feb 23, 2013 at 11:10:18PM +0100, Stefan Weil wrote:
>>>>>> This assertion occured with latest git master:
>>>>>>
>>>>>> qemu-system-mipsel: /src/qemu/tcg/tcg-op.h:2589:
>>>>>>   tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask & (1=
 << idx))
>>>>>> =3D=3D 0' failed.
>>>>>> Aborted
>>>>>>
>>>>>> QEMU was built with --enable-debug and running a Debian MIPS L=
enny (NFS
>>>>>> root).
>>>>>> The assertion happened when running "apt-get update" in the gu=
est.
>>>>>>
>>>>> Is it something reproductible or more or less random? Have you =
Cc:ed
>>>>> Richard because it's related to the latest patches?
>>>>>
>>>>> On my side I am experiencing random segfaults in various guests=
 (at
>>>>> least PowerPC, MIPS, SH4 and ARM). I have found a way to bisect=
 it, even
>>>>> if it is quite long (building Perl + the testsuite). Currently =
I know
>>>>> that 1.3 is affected, while 1.2 is not.
>>>>>
>>>> I have found that the issue comes from the following commits, wh=
ich
>>>> unfortunately are not bisectable one by one (though it won't cha=
nge the
>>>> results a lot):
>>>>
>>>>      commit b76f0d8c2e3eac94bc7fd90a510cb7426b2a2699
>>>>      Author: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>      Date:   Wed Oct 31 16:04:25 2012 +0900
>>>>          tcg: Optimize qemu_ld/st by generating slow paths at th=
e end of a block
>>>>          Add optimized TCG qemu_ld/st generation which locates t=
he code of TLB miss
>>>>          cases at the end of a block after generating the other =
IRs.
>>>>          Currently, this optimization supports only i386 and x86=
_64 hosts.
>>>>          Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.c=
om>
>>>>          Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
>>>>      commit fdbb84d1332ae0827d60f1a2ca03c7d5678c6edd
>>>>      Author: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>      Date:   Wed Oct 31 16:04:24 2012 +0900
>>>>          tcg: Add extended GETPC mechanism for MMU helpers with =
ldst optimization
>>>>          Add GETPC_EXT which is used by MMU helpers to selective=
ly calculate the code
>>>>          address of accessing guest memory when called from a qe=
mu_ld/st optimized code
>>>>          or a C function. Currently, it supports only i386 and x=
86-64 hosts.
>>>>          Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.c=
om>
>>>>          Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
>>>>      commit 32761257c0b9fa7ee04d2871a6e48a41f119c469
>>>>      Author: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>      Date:   Wed Oct 31 16:04:23 2012 +0900
>>>>          configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qe=
mu_ld/st optimization
>>>>          Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st=
 optimization only when
>>>>          a host is i386 or x86_64.
>>>>          Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.c=
om>
>>>>          Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
>>>>
>>>> I will try to understand why.
>>>>
>>>>
>>> Hi Aur=E9lien,
>>> Do you mean that those random segfaults occurred only when
>>> configured with "--enable-debug"?
>>> Although I cannot see how my commits affect debug built image at =
a
>>> glance, I'll do double-check.
>>> Thanks.
>> The problem is there even without configuring QEMU with --enable-d=
ebug.
>> It justs doesn't happens very often, and very randomly. The only w=
ay to
>> reproduce it each time is to launch a big task in the guest (for m=
e
>> building Perl) and see if it completes or now. It can take up to o=
ne
>> hour until it happens.
>>
>> I should precise that the segfault is on the guest side.
>>
>> I have tried to look at your patches, and so far I haven't found t=
he
>> issue. It seems the two first patches are fine, ie I have verified=
 the
>> return address is always correctly computed.
>>
> I still haven't found the issue, but on the other hand I can't find=
 any
> problem in your code, after reading it dozen of times. I also tried=
 to
> modify it as less as possible while issuing the slow path back insi=
de
> the TB and it fixes the problem. So it really looks like to be due =
to
> the slow path being at the end of the TB, and not to a bug in the c=
ode
> generating it. After adding various checks, I am also convinced the
> address computed in GETPC_EXT() is always correct. I have to say I =
am
> running out of ideas.
>
> One way to reproduce the issue more easily is to reduce the size of=
 the
> generated code buffer, for example by setting it to 512kB for both
> MIN_CODE_GEN_BUFFER_SIZE and MAX_CODE_GEN_BUFFER_SIZE in
> translate-all.c. That way booting an ARM guest triggers plenty of
> segmentation faults or other strange issues with your patch but not
> without.
>
> OTOH increasing this size make the issue to almost disappear even w=
hen
> building perl including the testsuite (for that it has to be at lea=
st
> 512MB).
>

Although I have not managed to reproduce the problem, I have found a
suspicious piece of code in the bounds check for generated code
(is_tcg_gen_code() in translate-all.c).

I think the code should be changed as follows.
Before:
     return (tc_ptr >=3D (uintptr_t)tcg_ctx.code_gen_buffer &&
                 tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
                 tcg_ctx.code_gen_buffer_max_size));
After:
     return (tc_ptr >=3D (uintptr_t)tcg_ctx.code_gen_buffer &&
                 tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
                 tcg_ctx.code_gen_buffer_size));

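The reason is that the current check could miss generated code located
in the last "(TCG_MAX_OP_SIZE * OPC_BUF_SIZE)" bytes of the buffer,
because code_gen_buffer_max_size is smaller than code_gen_buffer_size
by that amount.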
See code_gen_alloc() in translate-all.c:
     tcg_ctx.code_gen_buffer_max_size =3D tcg_ctx.code_gen_buffer_siz=
e -=20
(TCG_MAX_OP_SIZE * OPC_BUF_SIZE)
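
To illustrate the point, here is a minimal standalone sketch in C. The
struct and function names are hypothetical and only model the two size
fields of tcg_ctx mentioned above; it is not the actual QEMU code, and
the reserved tail size is just a parameter here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of tcg_ctx. */
struct code_buf {
    uint8_t *base;    /* start of the code buffer */
    size_t size;      /* models code_gen_buffer_size */
    size_t max_size;  /* models code_gen_buffer_max_size */
};

/* max_size is the full size minus a reserved tail, as in code_gen_alloc(). */
static inline struct code_buf code_buf_init(uint8_t *base, size_t size,
                                            size_t reserved)
{
    assert(reserved <= size);
    struct code_buf b = { base, size, size - reserved };
    return b;
}

/* "Before": bounds check against max_size, so the tail is excluded. */
static inline bool in_buf_before(const struct code_buf *b, uintptr_t p)
{
    return p >= (uintptr_t)b->base &&
           p < (uintptr_t)(b->base + b->max_size);
}

/* "After": bounds check against the full buffer size. */
static inline bool in_buf_after(const struct code_buf *b, uintptr_t p)
{
    return p >= (uintptr_t)b->base &&
           p < (uintptr_t)(b->base + b->size);
}
```

With a 1024-byte buffer and a 256-byte reserved tail, a pointer at
offset 900 is rejected by in_buf_before() but accepted by
in_buf_after(). That is the kind of address I suspect the current check
misses.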

Aur=E9lien and Stefan,
Could you please test this and report the result?
I have not been able to reproduce the problem myself, even when
following Aur=E9lien's reproduction steps.