From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:41629)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <benh@kernel.crashing.org>) id 1bRIjr-000051-VW
	for qemu-devel@nongnu.org; Sun, 24 Jul 2016 08:42:36 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <benh@kernel.crashing.org>) id 1bRIjn-00081o-Od
	for qemu-devel@nongnu.org; Sun, 24 Jul 2016 08:42:34 -0400
Received: from gate.crashing.org ([63.228.1.57]:47338)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <benh@kernel.crashing.org>) id 1bRIjn-00081k-FT
	for qemu-devel@nongnu.org; Sun, 24 Jul 2016 08:42:31 -0400
Message-ID: <1469364141.8568.251.camel@kernel.crashing.org>
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Sun, 24 Jul 2016 22:42:21 +1000
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Subject: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <rth@twiddle.net>, Christian Borntraeger <borntraeger@de.ibm.com>

Hi !

I need help from TCG experts here. I was chasing down a bug causing
some stuff to crash when using vector ops with a ppc32 guest on x86,
but pulling that string led to a whole mess that *may* be affecting a
pile of architectures unless I'm misunderstanding something...

So basically what happens is that some instruction emulation helpers,
like in my case stvebx (target-ppc/mem_helper.c), call
cpu_{st,ld}x_data. Let's say cpu_stb_data() for the sake of the
argument.

That is equivalent to calling cpu_stb_data_ra() with a "0" retaddr.

However, if that faults, when tlb_fill() eventually gets called, what I
observe is not 0 in "retaddr" but ... -2.

The reason, as far as I understand, is that cpu_stb_data_ra() calls
helper_ret_stb_mmu() which does:

    retaddr -= GETPC_ADJ;

(GETPC_ADJ is 2, so a 0 retaddr ends up as -2)
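To make the arithmetic concrete, here is a minimal, self-contained model of the call chain above. It is not QEMU code: the `_sketch` functions are hypothetical stand-ins for cpu_stb_data(), cpu_stb_data_ra(), helper_ret_stb_mmu() and tlb_fill(), the store's address and value are omitted, and GETPC_ADJ is assumed to be 2. Because retaddr is a uintptr_t, subtracting 2 from 0 wraps around to UINTPTR_MAX - 1, which is what shows up as -2.

```c
#include <stdint.h>

/* Assumption: GETPC_ADJ is 2, as described above. */
#define GETPC_ADJ 2

/* Hypothetical: records the retaddr that tlb_fill() would be handed. */
uintptr_t last_tlb_fill_retaddr;

/* Hypothetical stand-in for a target's tlb_fill(). */
void tlb_fill_sketch(uintptr_t retaddr)
{
    last_tlb_fill_retaddr = retaddr;
}

/* Models helper_ret_stb_mmu(): adjust retaddr, then take a TLB miss.
 * Address and value arguments are omitted for brevity. */
void helper_ret_stb_mmu_sketch(uintptr_t retaddr)
{
    retaddr -= GETPC_ADJ;      /* unsigned: 0 - 2 wraps to UINTPTR_MAX - 1 */
    tlb_fill_sketch(retaddr);  /* assume the access misses the TLB */
}

/* Models cpu_stb_data_ra(): forwards the caller-supplied retaddr. */
void cpu_stb_data_ra_sketch(uintptr_t retaddr)
{
    helper_ret_stb_mmu_sketch(retaddr);
}

/* Models cpu_stb_data(): the variant without a retaddr passes 0. */
void cpu_stb_data_sketch(void)
{
    cpu_stb_data_ra_sketch(0);
}
```

After calling cpu_stb_data_sketch(), last_tlb_fill_retaddr holds (uintptr_t)-2, a non-zero value.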

Now a whole pile of tlb_fill() implementations (in fact all of them one
way or another) do:

        if (likely(retaddr)) {
            /* now we have a real cpu fault */
            cpu_restore_state(cs, retaddr);
        }

So that test doesn't catch it: -2 is non-zero, so cpu_restore_state()
gets called with a bogus retaddr. The result I observe is that the fault
gets reported for the wrong instruction (the wrong PC).
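The guard's failure mode reduces to a few lines. This is a sketch, not QEMU code: adjusted_null_retaddr() and tlb_fill_would_restore() are hypothetical names, GETPC_ADJ is assumed to be 2, and likely() is dropped because it only hints branch prediction and does not change the result. A plain 0 skips cpu_restore_state() as intended, but the adjusted value is non-zero and takes the "real cpu fault" path with a host address that presumably points into no translated code at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: GETPC_ADJ is 2, as described above. */
#define GETPC_ADJ 2

/* Hypothetical: the retaddr a cpu_stb_data() call ends up delivering,
 * i.e. 0 after the helper_ret_stb_mmu()-style adjustment. */
uintptr_t adjusted_null_retaddr(void)
{
    uintptr_t retaddr = 0;
    retaddr -= GETPC_ADJ;
    return retaddr;
}

/* Hypothetical reduction of the tlb_fill() guard quoted above: returns true
 * when cpu_restore_state() would be called. likely() is omitted because it
 * is only a branch-prediction hint and does not change the outcome. */
bool tlb_fill_would_restore(uintptr_t retaddr)
{
    return retaddr != 0;
}
```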

Out of curiosity I did try changing the above test to also check
against -2, but the fault was still mis-reported. So something deeper
is going on than what I've figured out so far...

What *did* work was to copy what x86 does for some of its helpers:
change my helper_stvebx() to not use cpu_stb_data() at all, but instead
call cpu_stb_data_ra(...., GETPC()).
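Below is a minimal sketch of that change, assuming GETPC() expands to ((uintptr_t)__builtin_return_address(0)); treat that definition as an assumption rather than a quote of the tree. helper_stvebx_sketch() and cpu_stb_data_ra_sketch() are hypothetical stand-ins, not the real target-ppc code, and the store's address and value are omitted.

```c
#include <stdint.h>

/* Assumption: GETPC() is built on this GCC/Clang builtin. */
#define GETPC() ((uintptr_t)__builtin_return_address(0))

/* Hypothetical: records the retaddr the store path was given. */
uintptr_t seen_retaddr;

/* Hypothetical stand-in for cpu_stb_data_ra(). */
void cpu_stb_data_ra_sketch(uintptr_t retaddr)
{
    seen_retaddr = retaddr;
}

/* Hypothetical helper in the style of helper_stvebx(). noinline keeps
 * GETPC() meaning "the address this helper returns to". */
__attribute__((noinline)) void helper_stvebx_sketch(void)
{
    /* Before: cpu_stb_data() forwarded a retaddr of 0.
     * After: pass the helper's own return address down explicitly. */
    cpu_stb_data_ra_sketch(GETPC());
}
```

GETPC() is evaluated in the helper itself and passed down because __builtin_return_address(0) only returns the caller's address for the function it appears in; evaluated deeper in the store path, it would yield an address inside the helper instead.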

That fixed the specific problem I was chasing.

However, there are a ton of other helpers, in powerpc, s390 and other
archs, that call cpu_stb_data() the same way we do, so I really wonder
what's going on here.

Some advice would be very much appreciated ;-)

Cheers,
Ben.