From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1NXiTC-0003Tl-DX for qemu-devel@nongnu.org; Wed, 20 Jan 2010 16:52:10 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1NXiTA-0003Rz-HE for qemu-devel@nongnu.org; Wed, 20 Jan 2010 16:52:09 -0500 Received: from [199.232.76.173] (port=59213 helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1NXiTA-0003Rc-93 for qemu-devel@nongnu.org; Wed, 20 Jan 2010 16:52:08 -0500 Received: from are.twiddle.net ([75.149.56.221]:41730) by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from ) id 1NXiT8-0001Pe-KP for qemu-devel@nongnu.org; Wed, 20 Jan 2010 16:52:07 -0500 Message-ID: <4B577515.3050102@twiddle.net> Date: Wed, 20 Jan 2010 13:26:45 -0800 From: Richard Henderson MIME-Version: 1.0 Subject: Re: [Qemu-devel] [PATCH] Porting TCG to alpha platform References: <682404.21141.qm@web15904.mail.cnb.yahoo.com> In-Reply-To: <682404.21141.qm@web15904.mail.cnb.yahoo.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: identifier scorpio Cc: qemu-devel@nongnu.org On 01/20/2010 09:19 AM, identifier scorpio wrote: > Below i'll append my newly generated patch against stable-0.10, > in case it is mangled, i also put it in the attachment. For the record, the inline portion was again mangled; the attachment is fine. > in qemu_ld/st, we must use $16,$17,$18 as temporaries, > because pass them as argument to helper functions such as qemu_ld/st_helpers[]. Ah, yes, I forgot about that. >> With I/J constraints you don't need this special casing. > > I'm not very familiar with I/J constraints and i'll study them later. I = uint8_t, to be used with the second arithmetic input. J = 0, to be used with the first arithmetic input (i.e. $31). >>> + tcg_out_inst2(s, opc^4, >> TMP_REG1, 1); >>> + /* record relocation infor */ >>> + tcg_out_reloc(s, >> s->code_ptr, R_ALPHA_REFQUAD, label_index, 0); >>> + s->code_ptr += 4; >> >> Bug: You've applied the relocation to the wrong >> instruction. >> Bug: What's with the "opc^4"? >> > > what did you mean that i "applied the relocation to the wrong > instruction", couldn't i apply relocation to INDEX_op_brcond_i32 operation? With tcg_out_inst2 you emit the branch instruction, which calls tcg_out32, which increments s->code_ptr. Next you call tcg_out_reloc with the updated s->code_ptr, which means that the relocation gets applied to the instruction *after* the branch. Finally, you increment s->code_ptr *again*, which means that the instruction after the branch is in fact completely uninitialized. > and opc^4 here is used to toggle between OP_BLBC(opcode 0x38) and OP_BLBS(opcode 0x3c), ugly code :) It does beg the question of why you're reversing the sense of the jump at all. Just because the branch is forward doesn't mean its condition should be changed. I think that's definitely a bug. The sense of the jump should be exactly the same, never mind the direction of the jump. Oh... I see what you're doing here -- you're generating the entire branch instruction in patch_reloc, and you're generating branch over branch here. That's both confusing and unnecessary. We should do static void patch_reloc(uint8_t *x_ptr, int type, tcg_target_long value, tcg_target_long addend) { uint32_t *code_ptr = (uint32_t *)x; uint32_t insn = *code_ptr; value += addend; switch (type) { case R_ALPHA_BRADDR: value -= (long)x_ptr + 4; if ((value & 3) || value < -0x400000 || value >= 0x400000) { tcg_abort(); } *code_ptr = insn | INSN_DISP21(val >> 2); break; default: tcg_abort(); } } so as to apply the branch address relocation to the existing insn. Which lets you write static void tcg_out_br(TCGContext *s, int opc, int ra, int label_index) { TCGLabel *l = &s->labels[label_index]; tcg_target_long value; if (l->has_value) { value = l->u.value; value -= (long)s->code_ptr + 4; if ((value & 3) || value < -0x400000 || value >= 0x400000) { tcg_abort(); } value >>= 2; } else { tcg_out_reloc(s, s->code_ptr, R_ALPHA_BRADDR, label_index, 0); value = 0; } tcg_out_fmt_br(s, opc, ra, value); } static void tcg_out_brcond(TCGContext *s, ...) { // Emit comparison into TMP_REG1. opc = (cond & 1) ? INSN_BLBC : INSN_BLBS; tcg_out_br(s, opc, TMP_REG1, label_index); } Isn't that much clearer? > + tcg_out_mov(s, r1, addr_reg); > + tcg_out_mov(s, r0, addr_reg); > + > +#if TARGET_LONG_BITS == 32 > + /* if VM is of 32-bit arch, clear higher 32-bit of addr */ > + tcg_out_fmt_opi(s, INSN_ZAPNOT, r0, 0x0f, r0); > + tcg_out_fmt_opi(s, INSN_ZAPNOT, r1, 0x0f, r1); > +#endif I suggest creating a static inline tcg_out_mov_addr(TCGContext *s, int rd, int rs) { if (TARGET_LONG_BITS == 32) { tcg_out_fmt_opi(s, INSN_ZAPNOT, rs, 0x0f, rd); } else { tcg_out_mov(s, rd, rs); } } and using that throughout qemu_ld/st. That will save some redundant moves you are creating there, as well as removing some conditional compilation. Indeed, I would suggest replacing all of the conditional compilation vs TARGET_LONG_BITS with normal if's. ... Of course, in this particular case, that zapnot is redundant with the INSN_AND that follows, for both R0 and R1. > + tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, (tcg_target_long)qemu_ld_helpers[s_bits]); > + tcg_out_push(s, addr_reg); > + //tcg_out_push(s, TCG_REG_26); > + //tcg_out_push(s, TCG_REG_15); > + tcg_out_mov(s, TCG_REG_27, TMP_REG1); > + tcg_out_fmt_jmp(s, INSN_CALL, TCG_REG_26, TMP_REG1, 0); > + //tcg_out_pop(s, TCG_REG_15); > + //tcg_out_pop(s, TCG_REG_26); > + tcg_out_pop(s, addr_reg); You need not save and restore ADDR_REG; it is not used after the call. Also, you can load the address into $27 directly and not use the temp. > + *(uint32_t *)label1_ptr = (uint32_t) \ > + ( INSN_BNE | ( TMP_REG1 << 21 ) | ( val & 0x1fffff)); Frankly, I don't really think it's worth being so convoluted with the branches in here. I know that's the way that the i386 port does it in qemu_ld/st, but I think we should rather pattern it after the i386 brcond2. I.e. use gen_new_label to create a new label for use within the routine, use a normal call into tcg_out_br to generate the branch, and use tcg_out_label to place the label at the proper place and resolve the forward branch. It may be be a teeny bit less efficient, but it's far clearer. > + tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, \ > + offsetof(CPUTLBEntry, addend) - offsetof(CPUTLBEntry, addr_read)); > + tcg_out_fmt_opr(s, INSN_ADDQ, r1, TMP_REG1, r1); > + tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, r1, 0); You may want to use tcg_out_ld(s, TCG_TYPE_I64, TMP_REG1, r1, offsetof ...); for clarity, and to reuse the tcg_out_op_long improvements. > +#else > + r0 = addr_reg; > +#endif // endif defined(CONFIG_SOFTMMU) Missing GUEST_BASE handling, though that won't help your winnt... >>> + case INDEX_op_sar_i32: >>> + tcg_out_inst4i(s, >> OP_SHIFT, args[1], 32, FUNC_SLL, args[1]); >>> + tcg_out_inst4i(s, >> OP_SHIFT, args[1], 32, FUNC_SRA, args[1]); >> >> That last shift can be combined with the requested shift >> via addition. For constant input, this saves an insn; for >> register input, the addition can be done in parallel with >> the first shift. > > i changed to use "addl r, 0, r" here. Even better. > I think when qemu met x86 divide instructions, it will call helper > functions to simulate them, must i define div_i32/divu_i32/...? If you want to emulate anything other than x86, yes. r~