From: Richard Henderson
Date: Mon, 04 Aug 2014 08:35:03 -1000
Subject: Re: [Qemu-devel] [PATCH v3 06/15] target-tricore: Add instructions of SRC opcode format
To: Bastian Koppelmann, qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org
Message-ID: <53DFD257.3060905@twiddle.net>
In-Reply-To: <1407173932-969-7-git-send-email-kbastian@mail.uni-paderborn.de>

On 08/04/2014 07:38 AM, Bastian Koppelmann wrote:
> +static inline void gen_calc_psw_sv_i32(TCGv ret, TCGv arg)
> +{
> +    tcg_gen_xor_tl(ret, ret, arg);
> +}

Not exclusive or, inclusive or.  And there's really no need for a helper
for this.

> +static inline void gen_calc_psw_av_i32(TCGv ret, TCGv arg)
> +{
> +    TCGv temp = tcg_temp_new();
> +    tcg_gen_muli_tl(temp, arg, 2);

Strength reduce to tcg_gen_add_tl(temp, arg, arg).

> +    tcg_gen_xor_tl(temp, arg, temp);
> +    tcg_gen_andi_tl(ret, temp, 0x80000000);

No need for the andi if you do as I suggested and only consider the high
bit when reading the value from PSW_AV.

> +static inline void gen_calc_psw_sav_i32(TCGv ret, TCGv arg)
> +{
> +    tcg_gen_xor_tl(ret, ret, arg);
> +}

Again, inclusive or.

> +static inline void gen_add_i32(TCGv ret, TCGv r1, TCGv r2)

I strongly suggest that you name this something else, because you've gone
and confused yourself: this only applies to adds in the data registers.

> +    TCGv t0 = tcg_temp_new_i32();
> +    /* Addition and set V/SV bits */
> +    tcg_gen_movi_tl(t0, 0);
> +    tcg_gen_add2_tl(ret, cpu_PSW_V, r1, t0, r2, t0);

This computation is not overflow, but carry.  As I said, see e.g. the ARM
port where we properly compute overflow as

  R = A + B
  VF = (R ^ A) & ~(A ^ B)

i.e.

  tcg_gen_xor_tl(VF, R, A)
  tcg_gen_xor_tl(tmp, A, B)
  tcg_gen_andc_tl(VF, VF, tmp)

considering only the most significant bit as the overflow.
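For reference, the same computation as a plain C model (a hypothetical
helper, not part of the patch, just to sanity-check the formula):

  #include <stdint.h>
  #include <stdio.h>

  /* Signed-overflow flag for a + b, keeping only the msb,
     exactly as in the formula above.  */
  static uint32_t add_overflow(uint32_t a, uint32_t b)
  {
      uint32_t r = a + b;
      return (r ^ a) & ~(a ^ b) & 0x80000000u;
  }

  int main(void)
  {
      printf("%08x\n", add_overflow(0x7fffffffu, 1u)); /* 80000000: overflow */
      printf("%08x\n", add_overflow(1u, 1u));          /* 00000000: no overflow */
      return 0;
  }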
> +#define OP_COND(insn)                                                    \
> +static inline void gen_cond_##insn(int cond, TCGv r1, TCGv r2, TCGv r3,  \
> +                                   TCGv r4)                              \
> +{                                                                        \
> +    TCGv temp = tcg_temp_new();                                          \
> +    TCGv temp2 = tcg_temp_new();                                         \
> +    TCGv t0 = tcg_const_i32(0);                                          \
> +                                                                         \
> +    tcg_gen_##insn ## 2_tl(temp, temp2, r1, t0, r2, t0);                 \
> +    tcg_gen_movcond_tl(cond, r3, r4, t0, temp, r3);                      \
> +    /* Set PSW_V conditional */                                          \
> +    tcg_gen_movcond_tl(cond, cpu_PSW_V, r4, t0, temp2, cpu_PSW_V);       \
> +    /* Set PSW_SV conditional */                                         \
> +    gen_calc_psw_sv_i32(temp2, cpu_PSW_SV);                              \
> +    tcg_gen_movcond_tl(cond, cpu_PSW_SV, r4, t0, temp2, cpu_PSW_SV);     \
> +    /* calc AV bit */                                                    \
> +    gen_calc_psw_av_i32(temp2, temp);                                    \
> +    tcg_gen_movcond_tl(cond, cpu_PSW_AV, r4, t0, temp2, cpu_PSW_AV);     \
> +    /* calc SAV bit */                                                   \
> +    gen_calc_psw_sav_i32(temp2, cpu_PSW_SAV);                            \
> +    tcg_gen_movcond_tl(cond, cpu_PSW_SAV, r4, t0, temp2, cpu_PSW_SAV);   \
> +                                                                         \
> +    tcg_temp_free(t0);                                                   \
> +    tcg_temp_free(temp);                                                 \
> +    tcg_temp_free(temp2);                                                \
> +}                                                                        \
> +                                                                         \
> +static inline void gen_condi_##insn(int cond, TCGv r1, int32_t r2,       \
> +                                    TCGv r3, TCGv r4)                    \
> +{                                                                        \
> +    TCGv temp = tcg_const_i32(r2);                                       \
> +    gen_cond_##insn(cond, r1, temp, r3, r4);                             \
> +    tcg_temp_free(temp);                                                 \
> +}
> +
> +OP_COND(add)
> +OP_COND(sub)

BTW, this macro substitution isn't going to work well as is, since there
are different overflow computations for addition and subtraction.

> +static void gen_shaci(TCGv ret, TCGv r1, int32_t shift_count)
> +{
> +    uint32_t msk, msk_start;
> +    TCGv_i64 temp = tcg_temp_new_i64();
> +    TCGv_i64 result = tcg_temp_new_i64();
> +    TCGv_i64 t_0 = tcg_const_i64(0);
> +    TCGv_i64 t_1 = tcg_const_i64(1);
> +    TCGv_i64 t_max = tcg_const_i64(0x7FFFFFFF);
> +    TCGv_i64 t_min = tcg_const_i64(-(0x80000000L));
> +
> +    if (shift_count == 0) {
> +        /* Clear PSW.C */
> +        tcg_gen_movi_tl(cpu_PSW_C, 0);
> +        tcg_gen_mov_tl(ret, r1);
> +    } else if (shift_count > 0) {
> +        tcg_gen_ext_i32_i64(temp, r1);
> +        tcg_gen_shli_i64(result, temp, shift_count);
> +        /* calc carry */
> +        msk_start = 32 - shift_count;
> +        msk = ((1 << shift_count) - 1) << msk_start;
> +        tcg_gen_andi_tl(cpu_PSW_C, r1, msk);

You don't need a 64-bit shift here.

> +    } else {
> +        tcg_gen_ext_i32_i64(temp, r1);
> +        tcg_gen_sari_i64(result, temp, -(shift_count));
> +        /* calc carry */
> +        msk = (1 << (shift_count - 1)) - 1;
> +        tcg_gen_andi_tl(cpu_PSW_C, r1, msk);
> +    }

Likewise, although that does mean you need to handle the special case
of -32.

> +    /* calc v/sv bits only if shift happened and write back 64bit result*/
> +    if (shift_count != 0) {
> +        /* v/sv */
> +        tcg_gen_movcond_i64(TCG_COND_GT, temp, result, t_max, t_1, t_0);
> +        tcg_gen_movcond_i64(TCG_COND_LT, temp, result, t_min, t_1, temp);
> +        tcg_gen_trunc_i64_i32(cpu_PSW_V, temp);
> +
> +        gen_calc_psw_sv_i32(cpu_PSW_SV, cpu_PSW_V);
> +        /* write back result */
> +        tcg_gen_trunc_i64_i32(ret, result);
> +    }

Note that right shifts can't overflow, since the magnitude always
reduces.

I suppose using the 64-bit shift result is a reasonable way to compute
left shift overflow on a 64-bit host.  It seems like there's an easier
way to compute this that would be better for 32-bit hosts though...

One way is to adjust the comparisons to apply prior to the shift.
That is,

  R >= 0x7fff_ffff
    = (A << C) >= 0x7fff_ffff
    = A >= (0x7fff_ffff >> C)

Also, using 2 setcond and 1 or is more efficient than 2 movcond with
those constants on most hosts.

> +    case OPC1_16_SRC_ADD_A:
> +        gen_addi_i32(cpu_gpr_a[r1], cpu_gpr_a[r1], const4);
> +        break;

No PSW computation here.


r~
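To make the setcond suggestion above concrete, here's an untested sketch
of the left-shift V computation done entirely in 32 bits (names as in
the patch; it assumes 0 < shift_count < 32, an arithmetic >> for the
negative bound, and the overflow-in-msb convention for PSW_V discussed
earlier):

  TCGv t0 = tcg_temp_new();
  TCGv t1 = tcg_temp_new();

  /* (r1 << C) overflows iff r1 > (INT32_MAX >> C)
     or r1 < (INT32_MIN >> C), checked before shifting.  */
  tcg_gen_setcondi_tl(TCG_COND_GT, t0, r1, INT32_MAX >> shift_count);
  tcg_gen_setcondi_tl(TCG_COND_LT, t1, r1, INT32_MIN >> shift_count);
  tcg_gen_or_tl(cpu_PSW_V, t0, t1);
  /* Move the 0/1 result into the msb of PSW_V.  */
  tcg_gen_shli_tl(cpu_PSW_V, cpu_PSW_V, 31);

  /* The shift itself no longer needs 64 bits.  */
  tcg_gen_shli_tl(ret, r1, shift_count);

  tcg_temp_free(t0);
  tcg_temp_free(t1);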