From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NXiTC-0003Tl-DX
	for qemu-devel@nongnu.org; Wed, 20 Jan 2010 16:52:10 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1NXiTA-0003Rz-HE
	for qemu-devel@nongnu.org; Wed, 20 Jan 2010 16:52:09 -0500
Received: from [199.232.76.173] (port=59213 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NXiTA-0003Rc-93
	for qemu-devel@nongnu.org; Wed, 20 Jan 2010 16:52:08 -0500
Received: from are.twiddle.net ([75.149.56.221]:41730)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <rth@twiddle.net>) id 1NXiT8-0001Pe-KP
	for qemu-devel@nongnu.org; Wed, 20 Jan 2010 16:52:07 -0500
Message-ID: <4B577515.3050102@twiddle.net>
Date: Wed, 20 Jan 2010 13:26:45 -0800
From: Richard Henderson <rth@twiddle.net>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] Porting TCG to alpha platform
References: <682404.21141.qm@web15904.mail.cnb.yahoo.com>
In-Reply-To: <682404.21141.qm@web15904.mail.cnb.yahoo.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: identifier scorpio <cidentifier@yahoo.com.cn>
Cc: qemu-devel@nongnu.org

On 01/20/2010 09:19 AM, identifier scorpio wrote:
> Below i'll append my newly generated patch against stable-0.10,
> in case it is mangled, i also put it in the attachment.

For the record, the inline portion was again mangled; the attachment is 
fine.

> in qemu_ld/st, we must use $16,$17,$18 as temporaries,
> because pass them as argument to helper functions such as qemu_ld/st_helpers[].

Ah, yes, I forgot about that.

>> With I/J constraints you don't need this special casing.
>
> I'm not very familiar with I/J constraints and i'll study them later.

I = uint8_t, to be used with the second arithmetic input.
J = 0, to be used with the first arithmetic input (i.e. $31).

>>> +        tcg_out_inst2(s, opc^4,
>> TMP_REG1, 1);
>>> +    /* record relocation infor */
>>> +        tcg_out_reloc(s,
>> s->code_ptr, R_ALPHA_REFQUAD, label_index, 0);
>>> +        s->code_ptr += 4;
>>
>> Bug: You've applied the relocation to the wrong
>> instruction.
>> Bug: What's with the "opc^4"?
>>
>
> what did you mean that i "applied the relocation to the wrong
> instruction", couldn't i apply relocation to INDEX_op_brcond_i32 operation?

With tcg_out_inst2 you emit the branch instruction, which calls 
tcg_out32, which increments s->code_ptr.  Next you call tcg_out_reloc 
with the updated s->code_ptr, which means that the relocation gets 
applied to the instruction *after* the branch.  Finally, you increment 
s->code_ptr *again*, which means that the instruction after the branch 
is in fact completely uninitialized.

> and opc^4 here is used to toggle between OP_BLBC(opcode 0x38) and OP_BLBS(opcode 0x3c), ugly code :)

It does beg the question of why you're reversing the sense of the jump 
at all.  Just because the branch is forward doesn't mean its condition 
should be changed.  I think that's definitely a bug.  The sense of the 
jump should be exactly the same, never mind the direction of the jump.

Oh... I see what you're doing here -- you're generating the entire 
branch instruction in patch_reloc, and you're generating branch over 
branch here.  That's both confusing and unnecessary.

We should do

static void patch_reloc(uint8_t *x_ptr, int type,
                         tcg_target_long value,
                         tcg_target_long addend)
{
     uint32_t *code_ptr = (uint32_t *)x;
     uint32_t insn = *code_ptr;

     value += addend;
     switch (type) {
     case R_ALPHA_BRADDR:
         value -= (long)x_ptr + 4;
         if ((value & 3) || value < -0x400000 || value >= 0x400000) {
             tcg_abort();
         }
         *code_ptr = insn | INSN_DISP21(val >> 2);
         break;

     default:
         tcg_abort();
     }
}

so as to apply the branch address relocation to the existing insn.
Which lets you write

static void tcg_out_br(TCGContext *s, int opc, int ra, int label_index)
{
     TCGLabel *l = &s->labels[label_index];
     tcg_target_long value;

     if (l->has_value) {
         value = l->u.value;
         value -= (long)s->code_ptr + 4;
         if ((value & 3) || value < -0x400000 || value >= 0x400000) {
             tcg_abort();
         }
         value >>= 2;
     } else {
         tcg_out_reloc(s, s->code_ptr, R_ALPHA_BRADDR, label_index, 0);
         value = 0;
     }
     tcg_out_fmt_br(s, opc, ra, value);
}

static void tcg_out_brcond(TCGContext *s, ...)
{
     // Emit comparison into TMP_REG1.

     opc = (cond & 1) ? INSN_BLBC : INSN_BLBS;
     tcg_out_br(s, opc, TMP_REG1, label_index);
}

Isn't that much clearer?

> +    tcg_out_mov(s, r1, addr_reg);
> +    tcg_out_mov(s, r0, addr_reg);
> +
> +#if TARGET_LONG_BITS == 32
> +    /* if VM is of 32-bit arch, clear higher 32-bit of addr */
> +    tcg_out_fmt_opi(s, INSN_ZAPNOT, r0, 0x0f, r0);
> +    tcg_out_fmt_opi(s, INSN_ZAPNOT, r1, 0x0f, r1);
> +#endif

I suggest creating a

static inline tcg_out_mov_addr(TCGContext *s, int rd, int rs)
{
     if (TARGET_LONG_BITS == 32) {
         tcg_out_fmt_opi(s, INSN_ZAPNOT, rs, 0x0f, rd);
     } else {
         tcg_out_mov(s, rd, rs);
     }
}

and using that throughout qemu_ld/st.  That will save some redundant 
moves you are creating there, as well as removing some conditional 
compilation.  Indeed, I would suggest replacing all of the conditional 
compilation vs TARGET_LONG_BITS with normal if's.

... Of course, in this particular case, that zapnot is redundant with 
the INSN_AND that follows, for both R0 and R1.

> +    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, (tcg_target_long)qemu_ld_helpers[s_bits]);
> +    tcg_out_push(s, addr_reg);
> +    //tcg_out_push(s, TCG_REG_26);
> +    //tcg_out_push(s, TCG_REG_15);
> +    tcg_out_mov(s, TCG_REG_27, TMP_REG1);
> +    tcg_out_fmt_jmp(s, INSN_CALL, TCG_REG_26, TMP_REG1, 0);
> +    //tcg_out_pop(s, TCG_REG_15);
> +    //tcg_out_pop(s, TCG_REG_26);
> +    tcg_out_pop(s, addr_reg);

You need not save and restore ADDR_REG; it is not used after the call.
Also, you can load the address into $27 directly and not use the temp.

> +    *(uint32_t *)label1_ptr = (uint32_t) \
> +        ( INSN_BNE | ( TMP_REG1 << 21 ) | ( val & 0x1fffff));

Frankly, I don't really think it's worth being so convoluted with the 
branches in here.  I know that's the way that the i386 port does it in 
qemu_ld/st, but I think we should rather pattern it after the i386 brcond2.

I.e. use gen_new_label to create a new label for use within the routine, 
use a normal call into tcg_out_br to generate the branch, and use 
tcg_out_label to place the label at the proper place and resolve the 
forward branch.

It may be be a teeny bit less efficient, but it's far clearer.

> +    tcg_out_movi(s, TCG_TYPE_I64, TMP_REG1, \
> +	offsetof(CPUTLBEntry, addend) - offsetof(CPUTLBEntry, addr_read));
> +    tcg_out_fmt_opr(s, INSN_ADDQ, r1, TMP_REG1, r1);
> +    tcg_out_fmt_mem(s, INSN_LDQ, TMP_REG1, r1, 0);

You may want to use

   tcg_out_ld(s, TCG_TYPE_I64, TMP_REG1, r1, offsetof ...);

for clarity, and to reuse the tcg_out_op_long improvements.

> +#else
> +    r0 = addr_reg;
> +#endif // endif defined(CONFIG_SOFTMMU)

Missing GUEST_BASE handling, though that won't help your winnt...

>>> +    case INDEX_op_sar_i32:
>>> +        tcg_out_inst4i(s,
>> OP_SHIFT, args[1], 32, FUNC_SLL, args[1]);
>>> +        tcg_out_inst4i(s,
>> OP_SHIFT, args[1], 32, FUNC_SRA, args[1]);
>>
>> That last shift can be combined with the requested shift
>> via addition. For constant input, this saves an insn; for
>> register input, the addition can be done in parallel with
>> the first shift.
>
> i changed to use "addl r, 0, r" here.

Even better.

> I think when qemu met x86 divide instructions, it will call helper
> functions to simulate them, must i define div_i32/divu_i32/...?

If you want to emulate anything other than x86, yes.


r~