From: Benjamin Herrenschmidt
Date: Sun, 24 Jul 2016 22:42:21 +1000
Subject: [Qemu-devel] TCG problem with cpu_{st,ld}x_data ?
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini, Richard Henderson, Christian Borntraeger
Message-ID: <1469364141.8568.251.camel@kernel.crashing.org>

Hi !

I need help from TCG experts here.

I was chasing down a bug causing some stuff to crash when using vector
ops with a ppc32 guest on x86, but pulling that string led to a whole
mess that *may* be affecting a pile of architectures unless I'm
misunderstanding something...

So basically what happens is that some instruction emulation helpers,
like in my case stvebx (target-ppc/mem_helper.c), are doing calls to
cpu_{st,ld}x_data. Let's say cpu_stb_data() for the sake of the
argument.

That is equivalent to calling cpu_stb_data_ra() with a "0" retaddr.

However, if that faults, when tlb_fill() eventually gets called, what I
observe is not 0 in "retaddr" but ... -2.

The reason, as far as I understand, is that cpu_stb_data_ra() calls
helper_ret_stb_mmu() which does:

    retaddr -= GETPC_ADJ;

(which gives -2 for a retaddr of 0)

Now a whole pile of tlb_fill() implementations (in fact all of them one
way or another) do:

    if (likely(retaddr)) {
        /* now we have a real cpu fault */
        cpu_restore_state(cs, retaddr);
    }

So that test no longer catches the retaddr == 0 case.

The result that I observe is that the fault gets reported for the wrong
instruction (the wrong PC).

Now I did try changing the above test out of curiosity to also check
against -2, but the end result still mis-reported the fault. So
something goes deeper than what I have figured out so far...

What *did* work was to copy what x86 does, which is to change my
helper_stvebx() to not use cpu_stb_data at all, but instead use
cpu_stb_data_ra(...., GETPC()), which mimics what x86 does for some of
its helpers.

That fixed the specific problem I was chasing. However, there are a ton
of other helpers, in powerpc, s390 and other archs, calling
cpu_stb_data() the same way we do, so I really wonder what's going on
here.

Some advice would be very much appreciated ;-)

Cheers,
Ben.
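
For illustration, the arithmetic described above can be reproduced outside QEMU. The following is a minimal standalone sketch, not QEMU code: it assumes GETPC_ADJ is 2 (as the observed -2 suggests), and the function names adjust_retaddr() and fill_would_restore_state() are hypothetical stand-ins for the subtraction in helper_ret_stb_mmu() and the "if (retaddr)" test in tlb_fill().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the adjustment is 2, matching the -2 seen in tlb_fill(). */
#define GETPC_ADJ 2

/* Hypothetical stand-in for the unconditional adjustment done before
 * the fault path sees the return address. */
uintptr_t adjust_retaddr(uintptr_t retaddr)
{
    retaddr -= GETPC_ADJ;   /* 0 wraps around to (uintptr_t)-2 */
    return retaddr;
}

/* Hypothetical stand-in for the "if (likely(retaddr))" test: returns 1
 * when the fault path would try to restore state from retaddr. */
int fill_would_restore_state(uintptr_t retaddr)
{
    return retaddr != 0;
}

/* Sanity check of the two cases discussed in the message. */
void self_check(void)
{
    /* An unadjusted 0 means "no host return address": no restore. */
    assert(fill_would_restore_state(0) == 0);
    /* After the adjustment, the same 0 looks like a real address. */
    assert(fill_would_restore_state(adjust_retaddr(0)) == 1);
}
```

The sketch only shows why a zero retaddr stops being recognised once it has been adjusted. It does not explain why also testing against -2 still mis-reported the fault.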