Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-system-mipsel)

From: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
To: "Aurélien Jarno" <aurelien@aurel32.net>
Cc: Blue Swirl <blauwirbel@gmail.com>, Stefan Weil <sw@weilnetz.de>,
	qemu-devel <qemu-devel@nongnu.org>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] TCG broken in system mode (was TCG assertion with qemu-system-mipsel)
Date: Fri, 22 Mar 2013 10:48:22 +0900
Message-ID: <514BB866.20809@samsung.com>
In-Reply-To: <20130321221153.GA11625@ohm.aurel32.net>

On 03/22/2013 07:11 AM, Aurélien Jarno wrote:
> On Thu, Mar 21, 2013 at 04:04:44PM +0900, Yeongkyoon Lee wrote:
>> On 03/18/2013 07:27 AM, Aurélien Jarno wrote:
>>> On Wed, Mar 06, 2013 at 07:10:17AM +0100, Aurélien Jarno wrote:
>>>> On Wed, Mar 06, 2013 at 11:05:15AM +0900, Yeongkyoon Lee wrote:
>>>>> On 03/05/2013 11:18 PM, Aurélien Jarno wrote:
>>>>>> On Mon, Mar 04, 2013 at 05:37:31PM +0100, Aurélien Jarno wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Sat, Feb 23, 2013 at 11:10:18PM +0100, Stefan Weil wrote:
>>>>>>>> This assertion occurred with latest git master:
>>>>>>>>
>>>>>>>> qemu-system-mipsel: /src/qemu/tcg/tcg-op.h:2589:
>>>>>>>>   tcg_gen_goto_tb: Assertion `(tcg_ctx.goto_tb_issue_mask & (1 << idx))
>>>>>>>> == 0' failed.
>>>>>>>> Aborted
>>>>>>>>
>>>>>>>> QEMU was built with --enable-debug and running a Debian MIPS Lenny (NFS
>>>>>>>> root).
>>>>>>>> The assertion happened when running "apt-get update" in the guest.
>>>>>>>>
>>>>>>> Is it something reproducible or more or less random? Have you Cc:ed
>>>>>>> Richard because it's related to the latest patches?
>>>>>>>
>>>>>>> On my side I am experiencing random segfaults in various guests (at
>>>>>>> least PowerPC, MIPS, SH4 and ARM). I have found a way to bisect it, even
>>>>>>> if it is quite long (building Perl + the testsuite). Currently I know
>>>>>>> that 1.3 is affected, while 1.2 is not.
>>>>>>>
>>>>>> I have found that the issue comes from the following commits, which
>>>>>> unfortunately are not bisectable one by one (though it won't change the
>>>>>> results a lot):
>>>>>>
>>>>>>      commit b76f0d8c2e3eac94bc7fd90a510cb7426b2a2699
>>>>>>      Author: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>>>      Date:   Wed Oct 31 16:04:25 2012 +0900
>>>>>>          tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
>>>>>>          Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
>>>>>>          cases at the end of a block after generating the other IRs.
>>>>>>          Currently, this optimization supports only i386 and x86_64 hosts.
>>>>>>          Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>>>          Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
>>>>>>      commit fdbb84d1332ae0827d60f1a2ca03c7d5678c6edd
>>>>>>      Author: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>>>      Date:   Wed Oct 31 16:04:24 2012 +0900
>>>>>>          tcg: Add extended GETPC mechanism for MMU helpers with ldst optimization
>>>>>>          Add GETPC_EXT which is used by MMU helpers to selectively calculate the code
>>>>>>          address of accessing guest memory when called from a qemu_ld/st optimized code
>>>>>>          or a C function. Currently, it supports only i386 and x86-64 hosts.
>>>>>>          Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>>>          Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
>>>>>>      commit 32761257c0b9fa7ee04d2871a6e48a41f119c469
>>>>>>      Author: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>>>      Date:   Wed Oct 31 16:04:23 2012 +0900
>>>>>>          configure: Add CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization
>>>>>>          Enable CONFIG_QEMU_LDST_OPTIMIZATION for TCG qemu_ld/st optimization only when
>>>>>>          a host is i386 or x86_64.
>>>>>>          Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
>>>>>>          Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
>>>>>>
>>>>>> I will try to understand why.
>>>>>>
>>>>>>
>>>>> Hi Aurélien,
>>>>> Do you mean that those random segfaults occurred only when
>>>>> configured with "--enable-debug"?
>>>>> Although I cannot see at a glance how my commits would affect a
>>>>> debug build, I'll double-check.
>>>>> Thanks.
>>>> The problem is there even without configuring QEMU with --enable-debug.
>>>> It just doesn't happen very often, and it is very random. The only way to
>>>> reproduce it each time is to launch a big task in the guest (for me
>>>> building Perl) and see if it completes or not. It can take up to one
>>>> hour until it happens.
>>>>
>>>> I should clarify that the segfault is on the guest side.
>>>>
>>>> I have tried to look at your patches, and so far I haven't found the
>>>> issue. It seems the first two patches are fine, i.e. I have verified the
>>>> return address is always correctly computed.
>>>>
>>> I still haven't found the issue, but on the other hand I can't find any
>>> problem in your code, after reading it dozens of times. I also tried to
>>> modify it as little as possible while emitting the slow path back inside
>>> the TB, and that fixes the problem. So it really looks to be due to
>>> the slow path being at the end of the TB, and not to a bug in the code
>>> generating it. After adding various checks, I am also convinced the
>>> address computed in GETPC_EXT() is always correct. I have to say I am
>>> running out of ideas.
>>>
>>> One way to reproduce the issue more easily is to reduce the size of the
>>> generated code buffer, for example by setting it to 512kB for both
>>> MIN_CODE_GEN_BUFFER_SIZE and MAX_CODE_GEN_BUFFER_SIZE in
>>> translate-all.c. That way booting an ARM guest triggers plenty of
>>> segmentation faults or other strange issues with your patch but not
>>> without.
>>>
>>> OTOH increasing this size makes the issue almost disappear, even when
>>> building perl including the testsuite (for that it has to be at least
>>> 512MB).
>>>
>> Although I have not managed to reproduce the problem, I have found a
>> suspicious piece of code in the bounds check for generated code
>> (is_tcg_gen_code() in translate-all.c).
>>
>> The code should be changed as follows.
>> Before:
>>      return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>>                  tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
>>                  tcg_ctx.code_gen_buffer_max_size));
>> After:
>>      return (tc_ptr >= (uintptr_t)tcg_ctx.code_gen_buffer &&
>>                  tc_ptr < (uintptr_t)(tcg_ctx.code_gen_buffer +
>>                  tcg_ctx.code_gen_buffer_size));
>>
>> The reason is that the current check can miss generated code located in
>> the last "(TCG_MAX_OP_SIZE * OPC_BUF_SIZE)" bytes of the buffer.
>> See code_gen_alloc() in translate-all.c:
>>      tcg_ctx.code_gen_buffer_max_size = tcg_ctx.code_gen_buffer_size -
>>          (TCG_MAX_OP_SIZE * OPC_BUF_SIZE);
>>
> Very good catch! Thanks. This fixes the issue I observed.
>
> To give more details, code_gen_buffer_max_size is the threshold at which
> all TBs are cleared before code generation continues. This means that it
> can be exceeded by a few bytes and by up to (TCG_MAX_OP_SIZE *
> OPC_BUF_SIZE) bytes, which corresponds to the maximum size of a
> generated TB.
>
> Could you please send a proper patch to fix that? I think it should also
> be fixed in the next 1.3.x and 1.4.x releases (1.2.x releases are not
> affected), so please Cc: qemu-stable (even if the patch will have to be
> slightly tweaked).
>
Sure, I'll send the patch.
Thanks.
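
For readers following the thread, the bounds-check problem discussed above
can be modelled in isolation. The sketch below is not QEMU code: the
struct, the function names (model_ctx_init, in_gen_code_old,
in_gen_code_new, model_emit_tb) and the two MODEL_* constants are
assumptions chosen for illustration, and the real sizes differ. It only
mirrors the relationship described in the thread: the flush threshold sits
one maximum-TB-size below the end of the buffer, and it is tested before a
TB is emitted, so a TB may end past it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real QEMU values are different. */
#define MODEL_BUFFER_BYTES 65536u
#define MODEL_MAX_TB_BYTES 4096u /* role of TCG_MAX_OP_SIZE * OPC_BUF_SIZE */

struct model_ctx {
    uint8_t *code_gen_buffer;
    size_t code_gen_buffer_size;     /* whole buffer */
    size_t code_gen_buffer_max_size; /* flush threshold */
};

static uint8_t model_buffer[MODEL_BUFFER_BYTES];

/* Set up the threshold the way the thread describes code_gen_alloc(). */
static struct model_ctx model_ctx_init(void)
{
    struct model_ctx c;

    assert(MODEL_BUFFER_BYTES > MODEL_MAX_TB_BYTES);
    c.code_gen_buffer = model_buffer;
    c.code_gen_buffer_size = MODEL_BUFFER_BYTES;
    c.code_gen_buffer_max_size = MODEL_BUFFER_BYTES - MODEL_MAX_TB_BYTES;
    return c;
}

/* "Before" check: upper bound is the flush threshold. */
static int in_gen_code_old(const struct model_ctx *c, uintptr_t tc_ptr)
{
    uintptr_t start = (uintptr_t)c->code_gen_buffer;

    return tc_ptr >= start && tc_ptr < start + c->code_gen_buffer_max_size;
}

/* "After" check: upper bound is the real end of the buffer. */
static int in_gen_code_new(const struct model_ctx *c, uintptr_t tc_ptr)
{
    uintptr_t start = (uintptr_t)c->code_gen_buffer;

    return tc_ptr >= start && tc_ptr < start + c->code_gen_buffer_size;
}

/*
 * Model of emitting one TB: the threshold is tested before the TB is
 * generated, so the TB itself may extend beyond it. Returns the offset
 * at which the TB starts and advances *used past it.
 */
static size_t model_emit_tb(const struct model_ctx *c, size_t *used,
                            size_t tb_bytes)
{
    size_t start;

    assert(tb_bytes <= MODEL_MAX_TB_BYTES);
    if (*used >= c->code_gen_buffer_max_size) {
        *used = 0; /* models clearing all TBs */
    }
    start = *used;
    *used += tb_bytes;
    return start;
}
```

In this model, a TB that starts 8 bytes below the threshold and is 100
bytes long ends 92 bytes past it. An address in that tail is rejected by
in_gen_code_old() and accepted by in_gen_code_new(). Whether this maps onto
the observed guest segfaults exactly as in QEMU is not something the sketch
can show.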
