From: Richard Henderson
Date: Fri, 18 Sep 2015 19:34:32 -0700
To: gang.chen.5i5j@gmail.com, peter.maydell@linaro.org
Cc: qemu-devel@nongnu.org, xili_gchen_5257@hotmail.com
Subject: Re: [Qemu-devel] [PATCH] target-tilegx: Implement v*add and v*sub instructions
Message-ID: <55FCC9B8.1030606@twiddle.net>
In-Reply-To: <1442621006-4231-1-git-send-email-gang.chen.5i5j@gmail.com>

On 09/18/2015 05:03 PM, gang.chen.5i5j@gmail.com wrote:
> +uint64_t helper_v1add(uint64_t a, uint64_t b)
> +{
> +    uint64_t r = 0;
> +    int i;
> +
> +    for (i = 0; i < 64; i += 8) {
> +        int64_t ae = (int8_t)(a >> i);
> +        int64_t be = (int8_t)(b >> i);
> +        r |= ((ae + be) & 0xff) << i;
> +    }
> +    return r;
> +}
> +
> +uint64_t helper_v2add(uint64_t a, uint64_t b)
> +{
> +    uint64_t r = 0;
> +    int i;
> +
> +    for (i = 0; i < 64; i += 16) {
> +        int64_t ae = (int16_t)(a >> i);
> +        int64_t be = (int16_t)(b >> i);
> +        r |= ((ae + be) & 0xffff) << i;
> +    }
> +    return r;
> +}

There's a trick for this that's more efficient for 4 or more elements per
vector (i.e. good for v2 and v1, but not v4):

  a + b = ((a & 0x7f7f7f7f) + (b & 0x7f7f7f7f)) ^ ((a ^ b) & 0x80808080)
  a - b = ((a | 0x80808080) - (b & 0x7f7f7f7f)) ^ ((a ^ ~b) & 0x80808080)

> +uint64_t helper_v4add(uint64_t a, uint64_t b)
> +{
> +    uint64_t r = 0;
> +    int i;
> +
> +    for (i = 0; i < 64; i += 32) {
> +        int64_t ae = (int32_t)(a >> i);
> +        int64_t be = (int32_t)(b >> i);
> +        r |= ((ae + be) & 0xffffffff) << i;
> +    }
> +    return r;
> +}

I should have mentioned this in the previous patch... I think it would
probably be best to open-code all, or most of, the v4 operations.
Something like

static void gen_v4op(TCGv d64, TCGv a64, TCGv b64,
                     void (*generate)(TCGv_i32, TCGv_i32, TCGv_i32))
{
    TCGv_i32 al = tcg_temp_new_i32();
    TCGv_i32 ah = tcg_temp_new_i32();
    TCGv_i32 bl = tcg_temp_new_i32();
    TCGv_i32 bh = tcg_temp_new_i32();

    tcg_gen_extr_i64_i32(al, ah, a64);
    tcg_gen_extr_i64_i32(bl, bh, b64);
    generate(al, al, bl);
    generate(ah, ah, bh);
    tcg_gen_concat_i32_i64(d64, al, ah);

    tcg_temp_free_i32(al);
    tcg_temp_free_i32(ah);
    tcg_temp_free_i32(bl);
    tcg_temp_free_i32(bh);
}

>      case OE_RRR(V4ADD, 0, X0):
>      case OE_RRR(V4ADD, 0, X1):
> -        return TILEGX_EXCP_OPCODE_UNIMPLEMENTED;
> +        gen_helper_v4add(tdest, tsrca, tsrcb);

And then

  gen_v4op(tdest, tsrca, tsrcb, tcg_gen_add_i32);


r~
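
(Illustrative aside, not part of the original mail or the patch: a minimal
standalone sketch of how the masking trick above extends to 64-bit operands
for the byte-wise case. The v1add/v1sub names and the mask macros are
hypothetical, chosen only for this example.)

  /* Byte-wise add/sub without a per-element loop, using the 0x7f/0x80
     masking trick.  Each byte wraps independently; no carries or
     borrows cross byte boundaries. */
  #include <stdint.h>

  #define V1_MSB 0x8080808080808080ULL   /* high bit of each byte */
  #define V1_LO7 0x7f7f7f7f7f7f7f7fULL   /* low 7 bits of each byte */

  static uint64_t v1add(uint64_t a, uint64_t b)
  {
      /* Sum the low 7 bits of each byte (no cross-byte carry possible),
         then fix up bit 7 of each byte with an xor. */
      return ((a & V1_LO7) + (b & V1_LO7)) ^ ((a ^ b) & V1_MSB);
  }

  static uint64_t v1sub(uint64_t a, uint64_t b)
  {
      /* Force bit 7 of each byte of a so no borrow crosses bytes,
         then fix up bit 7 of each byte with an xor. */
      return ((a | V1_MSB) - (b & V1_LO7)) ^ ((a ^ ~b) & V1_MSB);
  }

For example, v1add(0x01ff, 0x0101) yields 0x0200: the low byte wraps to
0x00 without carrying into the next byte, whereas a plain 64-bit add would
give 0x0300.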