From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1MzvrG-00024J-J3 for qemu-devel@nongnu.org; Mon, 19 Oct 2009 13:17:22 -0400 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1MzvrC-00021X-0i for qemu-devel@nongnu.org; Mon, 19 Oct 2009 13:17:21 -0400 Received: from [199.232.76.173] (port=43294 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1MzvrB-00021U-PE for qemu-devel@nongnu.org; Mon, 19 Oct 2009 13:17:17 -0400 Received: from cantor2.suse.de ([195.135.220.15]:49892 helo=mx2.suse.de) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1MzvrB-0000HI-00 for qemu-devel@nongnu.org; Mon, 19 Oct 2009 13:17:17 -0400 From: Ulrich Hecht Subject: Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Date: Mon, 19 Oct 2009 19:17:18 +0200 References: <1255696735-21396-1-git-send-email-uli@suse.de> <1255696735-21396-3-git-send-email-uli@suse.de> <20091017104256.GA27224@hall.aurel32.net> In-Reply-To: <20091017104256.GA27224@hall.aurel32.net> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-15" Content-Disposition: inline Message-Id: <200910191917.18972.uli@suse.de> Content-Transfer-Encoding: quoted-printable List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Aurelien Jarno Cc: riku.voipio@iki.fi, qemu-devel@nongnu.org, agraf@suse.de On Saturday 17 October 2009, Aurelien Jarno wrote: > On Fri, Oct 16, 2009 at 02:38:48PM +0200, Ulrich Hecht wrote: > First of all a few general comments. Note that I know very few things > about S390/S390X, so I may have dumb comments/questions. Also as the > patch is very long, I probably have missed things, we should probably > iterate with new versions of the patches. > > Is it possible given the current implementation to emulate S390 in > addition to S390X? Not as it stands. The S/390 instruction set is a subset of zArch=20 (64-bit), but currently only 64-bit addressing is implemented (because=20 that is the only mode used by 64-bit Linux binaries). > If yes how different would be a S390 only target?=20 It would need support for 31-bit addressing. 24-bit, too, if you want to=20 run the really old stuff. > If they won't be too different, it probably worth using _tl type > registers and call it target-s390. A bit the way it is done with > i386/x86_64, ppc/ppc64, mips/mips64 or sparc/sparc64. If it's going to be implemented, it will certainly share most code with=20 the 64-bit target, so having a common directory would most likely be a=20 good idea. There are some cases where _tl types might be appropriate, but generally=20 everything that is shared between S/390 and zArch is 32-bit, and=20 instructions present in both architectures also have the same operand=20 size: L, for instance, is a 32-bit load to a 32-bit register on both=20 architectures. > Secondly there seems to be a lot of mix between 32-bit and 64-bit > TCG registers. They should not be mixed. _i32 ops apply to TCGv_i32 > variables, _i64 ops apply to TCGv_i64 variables, and _tl ops to TGCv > variables. TCGv/_tl types can map either to _i32 or _i64 depending on > your target. You should try to build the target with > --enable-debug-tcg Er, reading the code and observing the behavior of the (AMD64) code=20 generator, it seemed to me that neither frontends nor backends care one=20 bit about that. I'll look into it, though. (I won't comment on those=20 below, except in cases where the solution or, indeed, the problem is not=20 clear to me.) > It would be nice to split all the disassembling part in a separate > combined with the related makefile changes. This way it can be applied > separately. Can do. > > // switch (info->mach) > > // { > > // case bfd_mach_s390_31: > > - current_arch_mask =3D 1 << S390_OPCODE_ESA; > > +// current_arch_mask =3D 1 << S390_OPCODE_ESA; > > // break; > > // case bfd_mach_s390_64: > > -// current_arch_mask =3D 1 << S390_OPCODE_ZARCH; > > + current_arch_mask =3D 1 << S390_OPCODE_ZARCH; > > // break; > > // default: > > // abort (); > > While I understand the second part, Why is it necessary to comment the > bfd_mach_s390_31 case? It's not necessary, but current_arch_mask can only hold one value... > > +typedef union FPReg { > > + struct { > > +#ifdef WORDS_BIGENDIAN > > + float32 e; > > + int32_t __pad; > > +#else > > + int32_t __pad; > > + float32 e; > > +#endif > > + }; > > + float64 d; > > + uint64_t i; > > +} FPReg; > > WORDS_BIGENDIAN is wrong here. It should probably be > HOST_WORDS_BIGENDIAN. Indeed. > Also it may be a better idea to=20 > reuse CPU_FloatU and CPU_DoubleU here. Yes. Didn't know about those. > > +CPUS390XState *cpu_s390x_init(const char *cpu_model) > > This function would probably be better in translate.c, so that > s390x_translate_init() can be declared static. I'll check that. > Please use /* */ for comments (See CODING_STYLE). OK. > > + /* FIXME: reset vector? */ > > Yes, reset vector, MMU mode, default register values, ... Works perfectly fine without. ;) Also, the question remains what kind of reset cpu_reset() is supposed to=20 be. For instance, a CPU Reset or Initial CPU Reset on S/390 leaves the=20 general-purpose registers unchanged, a Clear Reset or Power-On Reset=20 sets them to zero. See the table on p. 4-50 in the zArchitecture=20 Principles of Operation (POP),=20 http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/A2278324.pdf,=20 for the gory details. > > +//#define DEBUG_HELPER > > +#ifdef DEBUG_HELPER > > +#define HELPER_LOG(x...) qemu_log(x) > > +#else > > +#define HELPER_LOG(x...) > > +#endif > > Small comment for the rest of the file. While I understand HELPER_LOG > is very handy while developing, I think some calls to it can be > dropped in really simple functions. Yes, a bit of cleanup will be in order. > > + for (i =3D 0; i <=3D l; i++) { > > + x =3D ldub(dest + i) & ldub(src + i); > > + if (x) cc =3D 1; > > coding style Maybe you are referring to "Every indented statement is braced; even if=20 the block contains just one statement.", but I figured that that does=20 not apply here because there is no indentation... > > + if (v < 0) return 1; > > + else if (v > 0) return 2; > > coding style Same here. > The last 6 helpers can probably be coded easily in TCG (they are very > similar to some PowerPC code), though merging this patch with the > helpers is not a problem at all. Again, that would be detrimental to performance. I have experimented with= =20 that and found that TCG is well capable of optimizing out a pure helper=20 call, but not a complex piece of code with branches. Condition codes are=20 not used most times, and the overhead of calling a helper in the cases=20 they are is far smaller than doing all the condition code stuff that is=20 not used. Also reduces code size. > All the branch part would really gain to be coded in TCG, as it will > allow TB chaining. I vaguely remember trying that as well, but I don't know if I gave up on=20 it because it was slower, or because I couldn't get it to work... > > + } > > + else if (r > d) { > > coding style OK, this one is valid. :) > __uint128_t is probably not supported on all hosts/GCC versions. Definitely does not work on 32-bit hosts. > mulu64() should be used instead. Excellent, didn't know about that either. > Same here, __uint128_t should not be used, though I don't know what > should be used instead. Hmmmm... > > +/* unsigned string compare (c is string terminator) */ > > +uint32_t HELPER(clst)(uint32_t c, uint32_t r1, uint32_t r2) > > +{ > > + uint64_t s1 =3D env->regs[r1]; > > + uint64_t s2 =3D env->regs[r2]; > > + uint8_t v1, v2; > > + uint32_t cc; > > + c =3D c & 0xff; > > +#ifdef CONFIG_USER_ONLY > > + if (!c) { > > + HELPER_LOG("%s: comparing '%s' and '%s'\n", > > + __FUNCTION__, (char*)s1, (char*)s2); > > + } > > +#endif > > Why CONFIG_USER_ONLY ? s1 and s2 are target addresses and not valid on the host system in the=20 SOFTMMU case, so this would segfault. > > +/* union used for splitting/joining 128-bit floats to/from 64-bit > > FP regs */ +typedef union { > > + struct { > > +#ifdef WORDS_BIGENDIAN > > + uint64_t h; > > + uint64_t l; > > +#else > > + uint64_t l; > > + uint64_t h; > > +#endif > > + }; > > + float128 x; > > +} FP128; > > WORDS_BIGENDIAN is wrong here. CPU_QuadU can probably be used instead. Agree. > > + if (float32_is_neg(v2)) { > > + v1 =3D float32_abs(v2); > > + } > > + else { > > + v1 =3D v2; > > + } > > I don't see the point of such a test here. Now that you mention it, I don't really see it either. > > + v2.i =3D ldl(a2); > > The value should be passed directly instead of loaded here, as ldl is > is wrong depending on the MMU mode. Yup. > > +#ifdef WORDS_BIGENDIAN > > This is wrong. It should probably be HOST_WORDS_BIGENDIAN. I concur. > > +static TCGv load_reg(int reg) > > +{ > > + TCGv r =3D tcg_temp_new_i64(); > > +#ifdef TCGREGS > > + sync_reg32(reg); > > + tcg_gen_mov_i64(r, tcgregs[reg]); > > + return r; > > +#else > > + tcg_gen_ld_i64(r, cpu_env, offsetof(CPUState, regs[reg])); > > + return r; > > +#endif > > +} > > I don't really like implicit TCGv temp allocation. In other targets it > often has caused missing tcg_temp_free(). I did in fact run into problems with this and was thus forced to make=20 sure that no tcg_temp_free()s are missing... :) > It should probably be=20 > rewritten as load_reg(TCGv t, int reg). Later. This is the kind of change that you never get right on the first=20 try, so I'd like to do that after everything went upstream. > For all register load/store, as already explained in the TCG sync op > patch, I am in favor of using the ld/st version (TCGREGS not defined). It doesn't perform. > > +static void gen_illegal_opcode(DisasContext *s) > > +{ > > + TCGv tmp =3D tcg_temp_new_i64(); > > + tcg_gen_movi_i64(tmp, 42); > > tcg_const_i64 could be used instead. Yes. Using the actual value for a specification exception instead of 42=20 might be a good idea as well. :) > > > + gen_helper_exception(tmp); > > Missing tcg_temp_free_i64(tmp); OK, so I missed _that_ one. Fine. Be like that if you want to. Illegal=20 instructions are not that frequent anyway... > > + gen_helper_cmp_s32(cc, v1, tcg_const_i32(v2)); > > The TCG passed to the helper should be freed. And that one... > > + gen_helper_cmp_u32(cc, v1, tcg_const_i32(v2)); > > Same. And that one... :( > > + gen_helper_cmp_s64(cc, v1, tcg_const_i64(v2)); > > Same And that one... :(( > > + gen_helper_cmp_u64(cc, v1, tcg_const_i64(v2)); > > Same And that one... :((( > The branches should be handled using brcond and goto_tb and exit_tb > and not with helpers, to allow TB chaining. I'll look into that, but I guess it's not entirely trivial. > > + TCGv tmp =3D 0, tmp2 =3D 0, tmp3 =3D 0; > > This is wrong. 0 maps to global 0 (env in your case). -1 should be > used instead, or even better the TCGV_UNUSED macro. OK. > > + case 0x12: /* LT R1,D2(X2,B2) [RXY] */ > > + tmp2 =3D tcg_temp_new_i32(); > > Wrong TCGv type. > > > + tcg_gen_qemu_ld32s(tmp2, tmp, 1); > > tcg_gen_qemu_ld32s loads a 32 bit value in the default size register, > that is 64-bit here. Actually, this a purely 32-bit insn, so it should be tcg_gen_qemu_ld32()=20 in the first place. > > + tcg_gen_qemu_ld32s(tmp2, tmp, 1); > > + tcg_gen_ext32s_i64(tmp2, tmp2); > > The sign extension is already done by qemu_ld32s Just to be on the safe side. :) > > + switch (op) { > > + case 0x8: case 0x18: gen_set_cc_add64(tmp, tmp2, tmp3); > > break; + case 0xa: case 0x1a: gen_helper_set_cc_addu64(cc, > > tmp, tmp2, tmp3); break; + default: tcg_abort(); > > tcg_abort() is wrong here, it is for internal TCG use. Also all the > real CPU it probably launch an illegal instruction exception, it does > not power off the machin. Actually, if default is reached here there's an error in the translator=20 (it should only get here in the explicit cases), so I'd consider=20 tcg_abort() appropriate. > > + case 0x1e: /* LRV R1,D2(X2,B2) [RXY] */ > > + tmp2 =3D tcg_temp_new_i32(); > > Wrong TCGv type; > > > + tcg_gen_qemu_ld32u(tmp2, tmp, 1); > > + tcg_gen_bswap32_i32(tmp2, tmp2); > > Wrong TCGv type; > > > + store_reg(r1, tmp2); This is more wrong than you think, because this is a 32-bit insn, so it=20 shouldn't overwrite the top 32 bits. Rarely used, I guess that's how it=20 could slip through. > > + case 0x3e: /* STRV R1,D2(X2,B2) [RXY] */ > > + tmp2 =3D load_reg32(r1); > > + tcg_gen_bswap32_i32(tmp2, tmp2); > > + tcg_gen_qemu_st32(tmp2, tmp, 1); > > Wrong TCGv type. What's wrong here? Everything's 32-bit. > > + case 0x57: /* XY R1,D2(X2,B2) [RXY] */ > > + tmp2 =3D load_reg32(r1); > > + tmp3 =3D tcg_temp_new_i32(); > > + tcg_gen_qemu_ld32u(tmp3, tmp, 1); > > + tcg_gen_xor_i32(tmp, tmp2, tmp3); > > + store_reg32(r1, tmp); > > + set_cc_nz_u32(tmp); > > Wrong TCGv type. No need to zero-extend anyway, it's a 32-bit insn. > > + case 0x76: /* LB R1,D2(X2,B2) [RXY] */ > > + case 0x77: /* LGB R1,D2(X2,B2) [RXY] */ > > + tmp2 =3D tcg_temp_new_i64(); > > + tcg_gen_qemu_ld8s(tmp2, tmp, 1); > > + switch (op) { > > + case 0x76: > > + tcg_gen_ext8s_i32(tmp2, tmp2); > > Wrong TCGv type. > > > + store_reg32(r1, tmp2); > > + break; > > + case 0x77: > > + tcg_gen_ext8s_i64(tmp2, tmp2); > > Wrong TCGv type. The way I see it, there are only 32-bit and 64-bit types, so it can only=20 be wrong in one case, right? > > + case 0x78: /* LHY R1,D2(X2,B2) [RXY] */ > > + tmp2 =3D tcg_temp_new_i32(); > > Wrong TCGv type. > > > + tcg_gen_qemu_ld16s(tmp2, tmp, 1); > > + tcg_gen_ext16s_i32(tmp2, tmp2); > > + store_reg32(r1, tmp2); > > + break; Replacing that tcg_gen_qemu_ld16s() with tcg_gen_qemu_ld16() should do=20 it, right? > > + case 0x86: /* MLG R1,D2(X2,B2) [RXY] */ > > + tmp2 =3D tcg_temp_new_i64(); > > + tcg_gen_qemu_ld64(tmp2, tmp, 1); > > + tmp =3D tcg_const_i32(r1); > > Wrong TCGv type. > > > + gen_helper_mlg(tmp, tmp2); Huh? It's a register number. There are only 16 GP registers. Fits a=20 32-bit value, last time I checked. The helper also expects a 32-bit=20 value. > > + if (tmp2) tcg_temp_free(tmp2); > > + if (tmp3) tcg_temp_free(tmp3); > > Comparison on TCGv type is not allowed. Not even to TCGV_UNUSED? > > +static void disas_ed(DisasContext *s, int op, int r1, int x2, int > > b2, int d2, int r1b) +{ > > + TCGv tmp, tmp2, tmp3 =3D 0; > > tmp should be declared as TGV_i32 here, so that the types are correct > for the whole function. Also tmp3 should not be initialized to 0. OK. > > + gen_helper_madb(tmp3, tmp2, tmp); > > tmp3 should be freed after the helper. Yup. > > + LOG_DISAS("disas_c0: op 0x%x r1 %d i2 %d\n", op, r1, i2); > > + uint64_t target =3D s->pc + i2 * 2; > > + /* FIXME: huh? */ target &=3D 0xffffffff; > > That should be fixed before a merge. Tricky, because I have not yet understood why it is necessary. According=20 to the POP, the address calculation in 64-bit addressing mode should be=20 64 bits. In practice, on a real machine running a real Linux kernel, it=20 wraps around at 32 bits, and userland code actually relies on that. No=20 clue what I could be missing here... > > +static inline uint64_t ld_code2(uint64_t pc) { return > > (uint64_t)lduw_code(pc); } > > Coding style. OK. > > +#if 1 > > + cc =3D tcg_temp_local_new_i32(); > > + tcg_gen_mov_i32(cc, global_cc); > > +#else > > + cc =3D global_cc; > > +#endif > > Why this? Because it didn't work otherwise. I don't claim to understand the=20 mysterious ways of the TCG to its full extent... > > + } while (!dc.is_jmp && gen_opc_ptr < gen_opc_end && dc.pc < > > next_page_start && num_insns < max_insns); > > The also translation should be stopped if in singlestep mode > (singlestep variable). OK. Phew, that was a big one. :) Thanks for reviewing. I'll fix what makes sense to me and send a new one=20 when I'm done. CU Uli --=20 SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG N=FCrnberg)