From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <4DC1A3F9.9030000@twiddle.net>
References: <1304470768-16924-1-git-send-email-jcmvbkbc@gmail.com>
	<1304470768-16924-12-git-send-email-jcmvbkbc@gmail.com>
	<4DC17BE1.6020005@twiddle.net>
	<BANLkTinP-NG0+x8Usn-7eRChy8PaR-sfBg@mail.gmail.com>
	<4DC1A3F9.9030000@twiddle.net>
Date: Thu, 5 May 2011 12:40:22 +0400
Message-ID: <BANLkTim5=0Bi7uuAGPFmsvFh0mB5w9V0jA@mail.gmail.com>
From: Max Filippov <jcmvbkbc@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [RFC 12/28] target-xtensa: implement shifts (ST1
 and RST1 groups)
List-Id: <qemu-devel.nongnu.org>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-devel@nongnu.org

>> To track immediate values written to SAR? You mean that there may be
>> some performance difference of fixed size shift vs indirect shift and
>> TCG is able to tell them apart?
>
> Well, not really fixed vs indirect, but if you know that the value
> in the SAR register is in the right range, you can avoid using a
> 64-bit shift.
>
> For instance,
>
>        SSL     ar2
>        SLL     ar0, ar1
>
> could be implemented with
>
>        tcg_gen_shl_i32(ar0, ar1, ar2);
>
> assuming we have enough context.
>
> Let us decompose the SAR register into two parts, storing both the
> true value and 32 minus that value.
>
>    struct DisasContext {
>        // Current Stuff
>        // ...
>
>        // When valid, holds 32-SAR.
>        TCGv sar_m32;
>        bool sar_m32_alloc;
>        bool sar_m32_valid;
>        bool sar_5bit;
>    };
>
> At the beginning of the TB:
>
>        TCGV_UNUSED_I32(dc->sar_m32);
>        dc->sar_m32_alloc = false;
>        dc->sar_m32_valid = false;
>        dc->sar_5bit = false;
>
>
>
> static void gen_set_sra_m32(DisasContext *dc, TCGv val)
> {
>    if (!dc->sar_m32_alloc) {
>        dc->sar_m32_alloc = true;
>        dc->sar_m32 = tcg_temp_local_new_i32();
>    }
>    dc->sar_m32_valid = true;
>
>    /* Clear 5 bit because the SAR value could be 32.  */
>    dc->sar_5bit = false;
>
>    tcg_gen_movi_i32(cpu_SR[SAR], 32);
>    tcg_gen_sub_i32(cpu_SR[SAR], cpu_SR[SAR], val);
>    tcg_gen_mov_i32(dc->sar_m32, val);
> }
>
> static void gen_set_sra(DisasContext *dc, TCGv val, bool is_5bit)
> {
>    if (dc->sar_m32_alloc && dc->sar_m32_valid) {
>        tcg_gen_discard_i32(dc->sar_m32);
>    }
>    dc->sar_m32_valid = false;
>    dc->sar_5bit = is_5bit;
>
>    tcg_gen_mov_i32(cpu_SR[SAR], val);
> }
>
>        /* SSL */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
>        gen_set_sra_m32(dc, tmp);
>        break;
>
>        /* SSR */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 31);
>        gen_set_sra(dc, tmp, true);
>        break;
>
>        /* WSR.SAR */
>        tcg_gen_andi_i32(tmp, cpu_R[AS], 63);
>        gen_set_sra(dc, tmp, false);
>        break;
>
>        /* SSAI */
>        tcg_gen_movi_i32(tmp, constant);
>        gen_set_sra(dc, tmp, true);
>        break;
>
>        /* SLL */
>        if (dc->sar_m32_valid) {
>            tcg_gen_shl_i32(cpu_R[AR], cpu_R[AS], dc->sar_m32);
>        } else {
>            /* your existing 64-bit shift emulation.  */
>        }
>        break;
>
>        /* SRL */
>        if (dc->sar_5bit) {
>            tcg_gen_shr_i32(cpu_R[AR], cpu_R[AS], cpu_SR[SAR]);
>        } else {
>            /* your existing 64-bit shift emulation.  */
>        }
>
>
> A couple of points: The use of the local temp avoids problems with
> intervening insns that might generate branch opcodes.  For the
> simplest cases, as with the case at the start of the message, we
> ought to be able to propagate the values into the TCG shift insn
> directly.
>
> Does that make sense?

Yes it does. Thanks for the good explanation.
I tried to keep it all as simple as possible to have a working
prototype quickly. Now that it works, optimizations should be no
problem.
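
For reference, a minimal plain-C sketch of why the 32-bit fast path
gives the same result as the 64-bit emulation. This is not QEMU or TCG
code: every function name below is made up for illustration, and the
reference semantics are my reading of the ISA (SSL sets SAR to
32 - as[4:0], SSR sets it to as[4:0], SLL/SRL take a 32-bit window out
of a 64-bit value).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of SLL: the source sits in the upper half
 * of a 64-bit value, which is then shifted right by SAR (0..63). */
static uint32_t sll_ref(uint32_t value, uint32_t sar)
{
    return (uint32_t)(((uint64_t)value << 32) >> (sar & 63));
}

/* Hypothetical reference model of SRL: zero-extend, shift right by SAR. */
static uint32_t srl_ref(uint32_t value, uint32_t sar)
{
    return (uint32_t)((uint64_t)value >> (sar & 63));
}

/* SAR as set by SSL (range 1..32) and by SSR (range 0..31). */
static uint32_t ssl(uint32_t as) { return 32 - (as & 31); }
static uint32_t ssr(uint32_t as) { return as & 31; }

/* Fast paths: plain 32-bit shifts, valid only for amounts in 0..31. */
static uint32_t sll_fast(uint32_t value, uint32_t sar_m32)
{
    return value << sar_m32;
}

static uint32_t srl_fast(uint32_t value, uint32_t sar)
{
    return value >> sar;
}

/* Compare fast path and reference for every register value 0..63. */
static void check_shift_fast_paths(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x80000000u, 0xdeadbeefu, 0xffffffffu
    };
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        for (uint32_t as = 0; as < 64; as++) {
            uint32_t sar = ssl(as);
            /* After SSL, 32 - SAR is as[4:0], so it is always in 0..31. */
            assert(sll_fast(samples[i], 32 - sar) == sll_ref(samples[i], sar));
            sar = ssr(as);
            /* After SSR, SAR itself is in 0..31. */
            assert(srl_fast(samples[i], sar) == srl_ref(samples[i], sar));
        }
    }
}
```

Calling check_shift_fast_paths() from a small main() exercises both
paths. After WSR.SAR the value may be anything in 0..63, so neither
precondition holds and the 64-bit path is still needed there.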

Thanks.
-- Max