From: Benjamin Herrenschmidt
Date: Fri, 29 Jul 2016 19:00:00 +1000
Subject: Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl
To: Richard Henderson, qemu-ppc@nongnu.org
Cc: qemu-devel@nongnu.org, david@gibson.dropbear.id.au
Message-ID: <1469782800.5978.284.camel@kernel.crashing.org>
In-Reply-To: <76968d9e-1b45-538e-a400-9069eb72f42e@twiddle.net>
References: <1469571686-7284-1-git-send-email-benh@kernel.crashing.org> <1469571686-7284-28-git-send-email-benh@kernel.crashing.org> <76968d9e-1b45-538e-a400-9069eb72f42e@twiddle.net>

On Fri, 2016-07-29 at 06:19 +0530, Richard Henderson wrote:
> (1) The helper, since it writes to registers controlled by tcg, must be
> described to clobber all registers.  Which will noticeably increase memory
> traffic to ENV.  For instance, you won't be able to hold the guest register
> holding the address in a host register across the call.

So after fixing my test setup, I did indeed observe a small performance
loss using the helper in qemu-user. It might still win us something in
softmmu by avoiding the extra translations, but I will leave that aside,
as I mentioned separately.
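To make the cost in the quoted point concrete, here is a toy model of one guest register (say, the one holding the address) cached in a host register across a helper call. It is not TCG's register allocator: ToyGlobal, ToyCallFlags and toy_call_cost are made-up names, and the model assumes the host register is callee-saved, so only the helper's declared effect on guest state matters. It simply counts the env loads and stores the call forces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model (not QEMU code) of one TCG global, e.g. a guest
 * GPR, that may be cached in a callee-saved host register. */
typedef struct {
    bool in_host_reg;  /* value is currently cached in a host register */
    bool dirty;        /* cached value is newer than the copy in env */
} ToyGlobal;

/* What the helper is declared to do with guest state held in env. */
typedef struct {
    bool may_read_globals;
    bool may_write_globals;
} ToyCallFlags;

/* Returns how many env loads/stores one helper call costs for this global. */
static int toy_call_cost(ToyGlobal *g, ToyCallFlags f, bool used_after_call)
{
    int mem_ops = 0;

    /* Invariant of the model: a global can only be dirty while cached. */
    assert(!g->dirty || g->in_host_reg);

    /* Before the call: if the helper may touch the global in env, env must
     * hold the current value, so a dirty cached copy is written back. */
    if ((f.may_read_globals || f.may_write_globals) && g->dirty) {
        mem_ops++;              /* store: host register -> env */
        g->dirty = false;
    }

    /* After the call: a helper that may write globals can leave the cached
     * copy stale, so the model drops it (the "clobber" in the quote). */
    if (f.may_write_globals) {
        g->in_host_reg = false;
    }

    /* Next use: reload from env if the value is no longer cached. */
    if (used_after_call && !g->in_host_reg) {
        mem_ops++;              /* load: env -> host register */
        g->in_host_reg = true;
    }
    return mem_ops;
}
```

In this model, a dirty cached register costs a store plus a reload (2 operations) when the helper is declared as possibly writing globals, but only the store (1) when it is declared read-only; avoiding that reload is what a "no write globals" flag such as TCG_CALL_NO_WG is aimed at.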
Now, out of curiosity, I tried this:

--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -22,12 +22,12 @@ DEF_HELPER_1(check_tlb_flush, void, env)
 #endif
 
 DEF_HELPER_3(lmw, void, env, tl, i32)
-DEF_HELPER_3(stmw, void, env, tl, i32)
+DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
 DEF_HELPER_4(lsw, void, env, tl, i32, i32)
 DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
-DEF_HELPER_4(stsw, void, env, tl, i32, i32)
-DEF_HELPER_3(dcbz, void, env, tl, i32)
-DEF_HELPER_2(icbi, void, env, tl)
+DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
+DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, i32)
+DEF_HELPER_FLAGS_2(icbi, TCG_CALL_NO_WG, void, env, tl)
 DEF_HELPER_5(lscbx, tl, env, tl, i32, i32, i32)
 
 #if defined(TARGET_PPC64)

If my understanding is right, the above is correct, as none of these
instructions will write to the env, though they can read from it and/or
generate faults.

Sadly, I haven't observed any performance improvement from it in the few
micro-benchmarks I cooked up.

Cheers,
Ben.
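As an illustration of the "reads env, never writes it" argument for stmw, here is a toy version of the store-multiple-word operation: it stores the low 32 bits of GPRs rS through r31 to consecutive words in guest memory. ToyPPCState, ToyGuestMem and toy_helper_stmw are hypothetical stand-ins, not QEMU's CPUPPCState or helper_stmw; the sketch assumes big-endian word order, and where a real helper could raise a guest fault on a bad address, this one just asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins (not QEMU types) for guest CPU state and memory. */
typedef struct {
    uint64_t gpr[32];
} ToyPPCState;

typedef struct {
    uint8_t bytes[256];
} ToyGuestMem;

/* Store the low 32 bits of GPRs first_reg..31 as big-endian words starting
 * at addr. env is const: the helper only reads guest registers and writes
 * guest memory, which is the property a "no write globals" flag relies on. */
static void toy_helper_stmw(const ToyPPCState *env, ToyGuestMem *mem,
                            size_t addr, unsigned first_reg)
{
    assert(first_reg < 32);
    /* A real helper could fault here; the toy just checks the bounds. */
    assert(addr + 4u * (32u - first_reg) <= sizeof(mem->bytes));

    for (unsigned reg = first_reg; reg < 32; reg++) {
        uint32_t w = (uint32_t)env->gpr[reg];   /* low word of the GPR */
        mem->bytes[addr++] = (uint8_t)(w >> 24);
        mem->bytes[addr++] = (uint8_t)(w >> 16);
        mem->bytes[addr++] = (uint8_t)(w >> 8);
        mem->bytes[addr++] = (uint8_t)w;
    }
}
```

The const qualifier lets the compiler check the no-write property for this toy; the flag in helper.h, by contrast, is only a promise from the helper's author that TCG takes on trust.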