From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=50338 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PcIkW-00050a-8R
	for qemu-devel@nongnu.org; Mon, 10 Jan 2011 09:29:33 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1PcIkU-00084U-Rz
	for qemu-devel@nongnu.org; Mon, 10 Jan 2011 09:29:32 -0500
Received: from cantor2.suse.de ([195.135.220.15]:51688 helo=mx2.suse.de)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <agraf@suse.de>) id 1PcIkU-000849-Hk
	for qemu-devel@nongnu.org; Mon, 10 Jan 2011 09:29:30 -0500
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: text/plain; charset=us-ascii
From: Alexander Graf <agraf@suse.de>
In-Reply-To: <20110110142317.GG17026@hall.aurel32.net>
Date: Mon, 10 Jan 2011 15:29:28 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <5DB773F2-511F-4810-A7DF-37986BA1F2DB@suse.de>
References: <20110106221242.GG21099@hall.aurel32.net>
	<ADE7D325-3612-4BD9-A88E-7B88E68449E1@suse.de>
	<20110110140039.GE17026@hall.aurel32.net>
	<ECA28BDF-185E-4B1B-9527-1EFD9EE0E771@suse.de>
	<20110110141524.GF17026@hall.aurel32.net>
	<BA0C76F5-91E3-472A-AC84-B65E0BBD4CCE@suse.de>
	<20110110142317.GG17026@hall.aurel32.net>
Subject: [Qemu-devel] Re: tcg/{ppc, s390,
	sparc}: branch target and code retranslation
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Aurelien Jarno <aurelien@aurel32.net>
Cc: Blue Swirl <blauwirbel@gmail.com>, qemu-devel@nongnu.org


On 10.01.2011, at 15:23, Aurelien Jarno wrote:

> On Mon, Jan 10, 2011 at 03:20:40PM +0100, Alexander Graf wrote:
>>=20
>> On 10.01.2011, at 15:15, Aurelien Jarno wrote:
>>=20
>>> On Mon, Jan 10, 2011 at 03:07:52PM +0100, Alexander Graf wrote:
>>>>=20
>>>> On 10.01.2011, at 15:00, Aurelien Jarno wrote:
>>>>=20
>>>>> On Mon, Jan 10, 2011 at 01:13:25PM +0100, Alexander Graf wrote:
>>>>>>=20
>>>>>> On 06.01.2011, at 23:12, Aurelien Jarno wrote:
>>>>>>=20
>>>>>>> Hi,
>>>>>>>=20
>>>>>>> I have just sent a tcg/arm patch concerning code retranslation. =
You
>>>>>>> might want to look at the description (copied below), as from a =
first
>>>>>>> glance ppc, s390 and sparc TCG targets might be affected. If you =
see
>>>>>>> guest kernel panics, some segmentation fault of qemu or in the =
guest,
>>>>>>> strange behaviors, that happen randomly and that looks difficult =
to
>>>>>>> debug it might be the issue.
>>>>>>>=20
>>>>>>> Aurelien
>>>>>>>=20
>>>>>>>=20
>>>>>>> | QEMU uses code retranslation to restore the CPU state when an =
exception
>>>>>>> | happens. For it to work the retranslation must not modify the =
generated
>>>>>>> | code. This is what is currently implemented in ARM TCG.
>>>>>>> |
>>>>>>> | However on CPU that don't have icache/dcache/memory =
synchronised like
>>>>>>> | ARM, this requirement is stronger and code retranslation must =
not modify
>>>>>>> | the generated code "atomically", as the cache line might be =
flushed
>>>>>>> | at any moment (interrupt, exception, task switching), even if =
not
>>>>>>> | triggered by QEMU. The probability for this to happen is very =
low, and
>>>>>>> | depends on cache size and associativiy, machine load, =
interrupts, so the
>>>>>>> | symptoms are might happen randomly.
>>>>>>> |
>>>>>>> | This requirement is currently not followed in tcg/arm, for the
>>>>>>> | load/store code, which basically has the following structure:
>>>>>>> |   1) tlb access code is written
>>>>>>> |   2) conditional fast path code is written
>>>>>>> |   3) branch is written with a temporary target
>>>>>>> |   4) slow path code is written
>>>>>>> |   5) branch target is updated
>>>>>>> | The cache lines corresponding to the retranslated code is not =
flushed
>>>>>>> | after code retranslation as the generated code is supposed to =
be the
>>>>>>> | same. However if the cache line corresponding to the branch =
instruction
>>>>>>> | is flushed between step 3 and 5, and is not flushed again =
before the
>>>>>>> | code is executed again, the branch target is wrong. In the =
guest, the
>>>>>>> | symptoms are MMU page fault at a random addresses, which leads =
to
>>>>>>> | kernel page fault or segmentation faults.
>>>>>>=20
>>>>>> I don't see the problem. If you have separate icache from dcache, =
the code in question doesn't get executed during the rewrite, so all =
should be fine. If you have both combined, the data write should =
automatically modify the cache line, so all is great too.
>>>>>=20
>>>>> As far as I understand, this is what happens to the branch target
>>>>> instruction, or at least one possibility:
>>>>>=20
>>>>> operation                                 | icache | dcache | =
mem/L2 |
>>>>> =
------------------------------------------+--------+--------+--------+
>>>>> tlb access code is written                | absent | absent |   ok =
  |
>>>>> conditional fast path code is written     | absent | absent |   ok =
  |
>>>>> branch is written with a temporary target | absent | wrong  |   ok =
  |
>>>>> cache line is flushed to memory           | absent | absent |  =
wrong |
>>>>> slow path code is written                 | absent | absent |  =
wrong |
>>>>> branch target is updated                  | absent |   ok   |  =
wrong |
>>>>> TB is re-executed                         | wrong  |   ok   |  =
wrong |
>>>>>=20
>>>>> Note that the issue is real, the patch really fixes the issue on =
ARM and
>>>>> MIPS. It's not impossible that I don't fully understand it given I =
can't
>>>>> analyse the cache values at a given time. However, when QEMU =
crashes=20
>>>>> because of that, we have seen that the executed code doesn't match =
what
>>>>> GDB says, and changing the temporary branch value also changes the =
way=20
>>>>> QEMU or the guest crash.
>>>>>=20
>>>>> The problem doesn't happens very often though (for sure less than =
1=20
>>>>> every few millions). The other way to fix it is to issue a cache =
flush
>>>>> after a code retranslation, however, it is something very costly =
on some
>>>>> architectures.
>>>>=20
>>>> I'd guess it only happens when code is overwritten. Only then =
icache can still be stale in that code region. Can't we just invalidate =
the icache on every tb flush? That should also fix the issue I'd guess.
>>>=20
>>> That's a solution that works (tested). However making sure that the
>>> retranslation doesn't change the code looks better.
>>=20
>> Yeah, I agree. Temporary retranslation really shouldn't modify =
existing code. We really do need an icache flush on tb flushes still =
though as that's a separate issue? Or are these already in?
>>=20
>=20
> We already have an icache flush after the first translation. We don't
> have any when the TB is flushed, but I don't think it is something
> necessary. The new TB that will replace it will issue an icache flush
> after it has been translated.

We don't need the flush after the translation if we just flush the whole =
block on invalidate. That should be a speedup, no? Flushing the full =
thing once is probably faster than lots of small flushes on every TB =
write.

Also, if there's a flush after the translation, I'm even more puzzled =
how the race you describe above can occur.


Alex