From: Richard Henderson
Date: Fri, 18 Sep 2015 19:34:32 -0700
To: gang.chen.5i5j@gmail.com, peter.maydell@linaro.org
Cc: qemu-devel@nongnu.org, xili_gchen_5257@hotmail.com
Subject: Re: [Qemu-devel] [PATCH] target-tilegx: Implement v*add and v*sub instructions
Message-ID: <55FCC9B8.1030606@twiddle.net>
In-Reply-To: <1442621006-4231-1-git-send-email-gang.chen.5i5j@gmail.com>

On 09/18/2015 05:03 PM, gang.chen.5i5j@gmail.com wrote:
> +uint64_t helper_v1add(uint64_t a, uint64_t b)
> +{
> +    uint64_t r = 0;
> +    int i;
> +
> +    for (i = 0; i < 64; i += 8) {
> +        int64_t ae = (int8_t)(a >> i);
> +        int64_t be = (int8_t)(b >> i);
> +        r |= ((ae + be) & 0xff) << i;
> +    }
> +    return r;
> +}
> +
> +uint64_t helper_v2add(uint64_t a, uint64_t b)
> +{
> +    uint64_t r = 0;
> +    int i;
> +
> +    for (i = 0; i < 64; i += 16) {
> +        int64_t ae = (int16_t)(a >> i);
> +        int64_t be = (int16_t)(b >> i);
> +        r |= ((ae + be) & 0xffff) << i;
> +    }
> +    return r;
> +}

There's a trick for this that's more efficient for 4 or more elements per
vector (i.e. good for v2 and v1, but not v4):

  a + b = ((a & 0x7f7f7f7f) + (b & 0x7f7f7f7f)) ^ ((a ^ b) & 0x80808080)
  a - b = ((a | 0x80808080) - (b & 0x7f7f7f7f)) ^ ((a ^ ~b) & 0x80808080)

> +uint64_t helper_v4add(uint64_t a, uint64_t b)
> +{
> +    uint64_t r = 0;
> +    int i;
> +
> +    for (i = 0; i < 64; i += 32) {
> +        int64_t ae = (int32_t)(a >> i);
> +        int64_t be = (int32_t)(b >> i);
> +        r |= ((ae + be) & 0xffffffff) << i;
> +    }
> +    return r;
> +}

I should have mentioned this in the previous patch... I think it would
probably be best to open-code all, or most of, the v4 operations.
Something like

static void gen_v4op(TCGv d64, TCGv a64, TCGv b64,
                     void (*generate)(TCGv_i32, TCGv_i32, TCGv_i32))
{
    TCGv_i32 al = tcg_temp_new_i32();
    TCGv_i32 ah = tcg_temp_new_i32();
    TCGv_i32 bl = tcg_temp_new_i32();
    TCGv_i32 bh = tcg_temp_new_i32();

    tcg_gen_extr_i64_i32(al, ah, a64);
    tcg_gen_extr_i64_i32(bl, bh, b64);
    generate(al, al, bl);
    generate(ah, ah, bh);
    tcg_gen_concat_i32_i64(d64, al, ah);

    tcg_temp_free_i32(al);
    tcg_temp_free_i32(ah);
    tcg_temp_free_i32(bl);
    tcg_temp_free_i32(bh);
}

>      case OE_RRR(V4ADD, 0, X0):
>      case OE_RRR(V4ADD, 0, X1):
> -        return TILEGX_EXCP_OPCODE_UNIMPLEMENTED;
> +        gen_helper_v4add(tdest, tsrca, tsrcb);

And then

  gen_v4op(tdest, tsrca, tsrcb, tcg_gen_add_i32);


r~
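
(Illustrative aside, not part of the original mail or the patch: a minimal
standalone sketch of how the masking trick above extends to 64-bit operands
for the byte-wise case. The v1add/v1sub names and the mask macros are
hypothetical, chosen only for this example.)

  /* Byte-wise add/sub without a per-element loop, using the 0x7f/0x80
     masking trick.  Each byte wraps independently; no carries or
     borrows cross byte boundaries. */
  #include <stdint.h>

  #define V1_MSB 0x8080808080808080ULL   /* high bit of each byte */
  #define V1_LO7 0x7f7f7f7f7f7f7f7fULL   /* low 7 bits of each byte */

  static uint64_t v1add(uint64_t a, uint64_t b)
  {
      /* Sum the low 7 bits of each byte (no cross-byte carry possible),
         then fix up bit 7 of each byte with an xor. */
      return ((a & V1_LO7) + (b & V1_LO7)) ^ ((a ^ b) & V1_MSB);
  }

  static uint64_t v1sub(uint64_t a, uint64_t b)
  {
      /* Force bit 7 of each byte of a so no borrow crosses bytes,
         then fix up bit 7 of each byte with an xor. */
      return ((a | V1_MSB) - (b & V1_LO7)) ^ ((a ^ ~b) & V1_MSB);
  }

For example, v1add(0x01ff, 0x0101) yields 0x0200: the low byte wraps to
0x00 without carrying into the next byte, whereas a plain 64-bit add would
give 0x0300.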