In-Reply-To: <4DC1A3F9.9030000@twiddle.net>
References: <1304470768-16924-1-git-send-email-jcmvbkbc@gmail.com>
 <1304470768-16924-12-git-send-email-jcmvbkbc@gmail.com>
 <4DC17BE1.6020005@twiddle.net>
 <4DC1A3F9.9030000@twiddle.net>
Date: Thu, 5 May 2011 12:40:22 +0400
From: Max Filippov
Subject: Re: [Qemu-devel] [RFC 12/28] target-xtensa: implement shifts (ST1 and RST1 groups)
To: Richard Henderson
Cc: qemu-devel@nongnu.org

>> To track immediate values written to SAR? You mean that there may be
>> some performance difference of fixed size shift vs indirect shift and
>> TCG is able to tell them apart?
>
> Well, not really fixed vs indirect, but if you know that the value
> in the SAR register is in the right range, you can avoid using a
> 64-bit shift.
>
> For instance,
>
>        SSL     ar2
>        SLL     ar0, ar1
>
> could be implemented with
>
>        tcg_gen_sll_i32(ar0, ar1, ar2);
>
> assuming we have enough context.
>
> Let us decompose the SAR register into two parts, storing both the
> true value, and 32-value.
>
>    struct DisasContext {
>        // Current Stuff
>        // ...
>
>        // When valid, holds 32-SAR.
>        TCGv sar_m32;
>        bool sar_m32_alloc;
>        bool sar_m32_valid;
>        bool sar_5bit;
>    };
>
> At the beginning of the TB:
>
>        TCGV_UNUSED_I32(dc->sar_m32);
>        dc->sar_m32_alloc = false;
>        dc->sar_m32_valid = false;
>        dc->sar_5bit = false;
>
>
> static void gen_set_sra_m32(DisasContext *dc, TCGv val)
> {
>    if (!dc->sar_m32_alloc) {
>        dc->sar_m32_alloc = true;
>        dc->sar_m32 = tcg_temp_local_new_i32();
>    }
>    dc->sar_m32_valid = true;
>
>    /* Clear 5 bit because the SAR value could be 32.  */
>    dc->sar_5bit = false;
>
>    tcg_gen_movi_i32(cpu_SR[SAR], 32);
>    tcg_gen_sub_i32(cpu_SR[SAR], cpu_SR[SAR], val);
>    tcg_gen_mov_i32(dc->sar_m32, val);
> }
>
> static void gen_set_sra(DisasContext *dc, TCGv val, bool is_5bit)
> {
>    if (dc->sar_m32_alloc && dc->sar_m32_valid) {
>        tcg_gen_discard_i32(dc->sar_m32);
>    }
>    dc->sar_m32_valid = false;
>    dc->sar_5bit = is_5bit;
>
>    tcg_gen_mov_i32(cpu_SR[SAR], val);
> }
>
>        /* SSL */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
>        gen_set_sra_m32(dc, tmp);
>        break;
>
>        /* SSR */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
>        gen_set_sra(dc, tmp, true);
>        break;
>
>        /* WSR.SAR */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 63);
>        gen_set_sra(dc, tmp, false);
>        break;
>
>        /* SSAI */
>        tcg_gen_movi_i32(tmp, constant);
>        gen_set_sra(dc, tmp, true);
>        break;
>
>        /* SLL */
>        if (dc->sar_m32_valid) {
>            tcg_gen_sll_i32(cpu_R[AR], cpu_R[AS], dc->sar_m32);
>        } else {
>            /* your existing 64-bit shift emulation.  */
>        }
>        break;
>
>        /* SRL */
>        if (dc->sar_5bit) {
>            tcg_gen_srl_i32(cpu_R[AR], cpu_R[AS], cpu_SR[SAR]);
>        } else {
>            /* your existing 64-bit shift emulation.  */
>        }
>
>
> A couple of points: The use of the local temp avoids problems with
> intervening insns that might generate branch opcodes.  For the
> simplest cases, as with the case at the start of the message, we
> ought to be able to propagate the values into the TCG shift insn
> directly.
>
> Does that make sense?

Yes it does. Thanks for the good explanation.
I tried to keep it all as simple as possible to have a working prototype
quickly. Now that it works, optimizations should be no problem.

Thanks.
-- 
Max
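
For reference, the arithmetic behind the suggestion can be checked
outside QEMU. The following is a minimal standalone sketch in plain C,
not TCG code. All function names are hypothetical, and it assumes the
usual reading of the shifts: SLL takes the upper half of a 64-bit
funnel shift by SAR, and SRL is a 64-bit right shift by SAR. It shows
that when SAR was set by SSL (so SAR = 32 - n with n in 0..31), or
when SAR is known to fit in 5 bits, a plain 32-bit shift gives the same
result as the 64-bit emulation.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit emulation of SLL (assumed semantics): shift the 64-bit value
 * (as:0) right by SAR and keep the low 32 bits. */
static uint32_t sll_ref(uint32_t as, uint32_t sar)
{
    return (uint32_t)(((uint64_t)as << 32) >> (sar & 63));
}

/* 64-bit emulation of SRL (assumed semantics): zero-extend the operand,
 * shift right by SAR, keep the low 32 bits. */
static uint32_t srl_ref(uint32_t at, uint32_t sar)
{
    return (uint32_t)((uint64_t)at >> (sar & 63));
}

/* Fast path for SLL when the original SSL amount n (= 32 - SAR) is known. */
static uint32_t sll_fast(uint32_t as, uint32_t n)
{
    assert(n < 32);   /* SSL masks its operand with 31 */
    return as << n;
}

/* Fast path for SRL when SAR is known to fit in 5 bits. */
static uint32_t srl_fast(uint32_t at, uint32_t sar)
{
    assert(sar < 32); /* SSR and SSAI only produce 0..31 */
    return at >> sar;
}

/* Returns 1 if both fast paths match the 64-bit emulation for every
 * shift amount in 0..31 applied to x, and 0 otherwise. */
static int shifts_agree(uint32_t x)
{
    for (uint32_t n = 0; n < 32; n++) {
        if (sll_ref(x, 32 - n) != sll_fast(x, n)) {
            return 0;
        }
        if (srl_ref(x, n) != srl_fast(x, n)) {
            return 0;
        }
    }
    return 1;
}
```

Calling shifts_agree() on a few operand values is enough to exercise
both paths. The case SAR = 32 is the one that makes the sar_5bit flag
necessary: the 64-bit SRL emulation yields 0 there, while a 32-bit
shift by 32 is not well defined in C.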