From: Claudio Fontana <claudio.fontana@huawei.com>
To: Richard Henderson <rth@twiddle.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
qemu-devel@nongnu.org, Peter Maydell <peter.maydell@linaro.org>
Subject: Re: [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64
Date: Tue, 14 May 2013 16:05:43 +0200 [thread overview]
Message-ID: <519244B7.4090207@huawei.com> (raw)
In-Reply-To: <519143B8.1030109@twiddle.net>
On 13.05.2013 21:49, Richard Henderson wrote:
> On 05/13/2013 06:33 AM, Claudio Fontana wrote:
>> +enum aarch64_cond_code {
>> + COND_EQ = 0x0,
>> + COND_NE = 0x1,
>> + COND_CS = 0x2, /* Unsigned greater or equal */
>> + COND_HS = 0x2, /* ALIAS greater or equal */
>
> Clearer to define aliases as COND_HS = COND_CS.
I agree, will change.
>> +static inline void tcg_out_movi64(TCGContext *s, int rd, uint64_t value)
>> +{
>> + uint32_t half, base, movk = 0, shift = 0;
>> + if (!value) {
>> + tcg_out_movr(s, 1, rd, TCG_REG_XZR);
>> + return;
>> + }
>> + /* construct halfwords of the immediate with MOVZ with LSL */
>> + /* using MOVZ 0x52800000 | extended reg.. */
>> + base = 0xd2800000;
>> +
>> + while (value) {
>> + half = value & 0xffff;
>> + if (half) {
>> + /* Op can be MOVZ or MOVK */
>> + tcg_out32(s, base | movk | shift | half << 5 | rd);
>> + if (!movk)
>> + movk = 0x20000000; /* morph next MOVZs into MOVKs */
>> + }
>> + value >>= 16;
>> + shift += 0x00200000;
>
> You'll almost certainly want to try ADP+ADD before decomposing into 3-4 mov[zk]
> instructions.
I don't know the ADP instruction; but this movi can be improved.
Right now it needs 1 to 4 mov[zk] instructions, depending on the value:
it depends on how many 0000h 16 bit holes there are in the 64bit value.
I was thinking a successive patch to experiment with this.
>
>> +void aarch64_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
>> +{
>> + tcg_target_long target, offset;
>> + target = (tcg_target_long)addr;
>> + offset = (target - (tcg_target_long)jmp_addr) / 4;
>> +
>> + if (offset <= -0x02000000 || offset >= 0x02000000) {
>> + /* out of 26bit range */
>> + tcg_abort();
>> + }
>
> See MAX_CODE_GEN_BUFFER_SIZE in translate-all.c. Set this value to 128MB and
> then all cross-TB branches will be in range, and the abort won't trigger.
That's great, I missed that. I will add a change to that end in translate-all.c .
>
>> +static inline void tcg_out_goto_label_cond(TCGContext *s, TCGCond c, int label_index)
>> +{
>> + tcg_target_long offset;
>> + /* backward conditional jump never seems to happen in practice,
>> + so just always use the branch trampoline */
>> + c = tcg_invert_cond(c);
>> + offset = 2; /* skip current instr and the next */
>> + tcg_out_goto_cond(s, c, offset);
>> + tcg_out_goto_label(s, label_index); /* emit 26bit jump */
>> +}
>
> Conditional branch range is +-1MB. You'll never see a TB that large. You
> don't need to emit a branch-across-branch.
Is there maybe a way to do it right even in the corner case where we have
a huge list of hundreds of thousands of instructions without jumps and then a conditional jump?
Are we _guaranteed_ to never see that large a TB with some kind of define,
similarly to MAX_CODE_GEN_BUFFER_SIZE?
I know, it's a corner case that would only trigger in a very strange program, but
I would propose to add this topic to the TODO list for successive patches.
>
>> + /* Should generate something like the following:
>> + * shr x8, addr_reg, #TARGET_PAGE_BITS
>> + * and x0, x8, #(CPU_TLB_SIZE - 1) @ Assumption: CPU_TLB_BITS <= 8
>> + * add x0, env, x0 lsl #CPU_TLB_ENTRY_BITS
>> + */
>> +# if CPU_TLB_BITS > 8
>> +# error "CPU_TLB_BITS too large"
>> +# endif
>
> I wonder if using UBFM to extract the TLB bits and BFM with XZR to clear the
> middle bits wouldn't be better, as you wouldn't be restricted on the size of
> CPU_TLB_BITS. AFAICS it would be the same number of instructions.
Hmm..
>
>> + case INDEX_op_mov_i64: ext = 1;
>> + case INDEX_op_mov_i32:
>> + tcg_out_movr(s, ext, args[0], args[1]);
>> + break;
>
> See how the i386 backend uses macros to reduce the typing with these sorts of
> paired opcodes.
I saw the glue thing, and I try to stay away from that kind of preprocessor use,
as it makes it more difficult for newcomers to dig in, since it breaks gid / grepping
for symbols among other things.
I used instead the editor and regexps to generate the list.
Maybe this could be patched up later if that seems to be the consensus?
>
>> + case INDEX_op_rotl_i64: ext = 1;
>> + case INDEX_op_rotl_i32: /* same as rotate right by (32 - m) */
>> + if (const_args[2]) /* ROR / EXTR Wd, Wm, Wm, 32 - m */
>> + tcg_out_rotl(s, ext, args[0], args[1], args[2]);
>> + else { /* no RSB in aarch64 unfortunately. */
>> + /* XXX UNTESTED */
>> + tcg_out_movi32(s, ext, TCG_REG_X8, ext ? 64 : 32);
>
> But A64 does have shift counts that truncate to the width of the operation.
> Which means that the high bits may contain garbage, which means that you can
> compute this merely as ROR = -ROL, ignoring the 32/64.
I see.
>
>> + case INDEX_op_setcond_i64: ext = 1;
>> + case INDEX_op_setcond_i32:
>> + tcg_out_movi32(s, ext, TCG_REG_X8, 0x01);
>> + tcg_out_cmp(s, ext, args[1], args[2]);
>> + tcg_out_csel(s, ext, args[0], TCG_REG_X8, TCG_REG_XZR,
>> + tcg_cond_to_aarch64_cond[args[3]]);
>
> See CSINC Wd,Wzr,Wzr,cond. No need for the initial movi.
Yes, that was mentioned also by Peter, will change.
>
>> + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
>> + tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);
>
> Only half of your registers are marked available.
Oops, will fix. Thanks,
--
Claudio Fontana
next prev parent reply other threads:[~2013-05-14 14:06 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-14 15:57 [Qemu-devel] QEMU aarch64 TCG target Claudio Fontana
2013-03-14 16:16 ` Peter Maydell
2013-05-06 12:56 ` [Qemu-devel] QEMU aarch64 TCG target - testing question about x86-64 Claudio Fontana
2013-05-06 13:27 ` Paolo Bonzini
2013-05-13 13:22 ` [Qemu-devel] [PATCH 0/3] ARM aarch64 TCG target Claudio Fontana
2013-05-13 13:28 ` [Qemu-devel] [PATCH 1/3] configure: permit compilation on arm aarch64 Claudio Fontana
2013-05-13 18:29 ` Peter Maydell
2013-05-14 8:19 ` Claudio Fontana
2013-05-13 13:31 ` [Qemu-devel] [PATCH 2/3] include/elf.h: add aarch64 ELF machine and relocs Claudio Fontana
2013-05-13 18:34 ` Peter Maydell
2013-05-14 8:24 ` Claudio Fontana
2013-05-13 13:33 ` [Qemu-devel] [PATCH 3/3] tcg/aarch64: implement new TCG target for aarch64 Claudio Fontana
2013-05-13 18:28 ` Peter Maydell
2013-05-14 12:01 ` Claudio Fontana
2013-05-14 12:25 ` Peter Maydell
2013-05-14 15:19 ` Richard Henderson
2013-05-16 14:39 ` Claudio Fontana
2013-05-14 12:41 ` Laurent Desnogues
2013-05-13 19:49 ` Richard Henderson
2013-05-14 14:05 ` Claudio Fontana [this message]
2013-05-14 15:16 ` Richard Henderson
2013-05-14 16:26 ` Richard Henderson
2013-05-06 13:42 ` [Qemu-devel] QEMU aarch64 TCG target - testing question about x86-64 Peter Maydell
2013-05-23 8:09 ` [Qemu-devel] [PATCH 0/4] ARM aarch64 TCG target VERSION 2 Claudio Fontana
2013-05-23 8:14 ` [Qemu-devel] [PATCH 1/4] include/elf.h: add aarch64 ELF machine and relocs Claudio Fontana
2013-05-23 13:18 ` Peter Maydell
2013-05-28 8:09 ` Laurent Desnogues
2013-05-23 8:18 ` [Qemu-devel] [PATCH 2/4] tcg/aarch64: implement new TCG target for aarch64 Claudio Fontana
2013-05-23 16:29 ` Richard Henderson
2013-05-24 8:53 ` Claudio Fontana
2013-05-24 17:02 ` Richard Henderson
2013-05-24 17:08 ` Peter Maydell
2013-05-24 17:17 ` Richard Henderson
2013-05-24 17:28 ` Peter Maydell
2013-05-24 17:54 ` Richard Henderson
2013-05-27 11:43 ` Claudio Fontana
2013-05-27 18:47 ` Richard Henderson
2013-05-27 21:14 ` [Qemu-devel] [PATCH 3/3] " Laurent Desnogues
2013-05-28 13:01 ` Claudio Fontana
2013-05-28 13:09 ` Laurent Desnogues
2013-05-28 7:17 ` [Qemu-devel] [PATCH 2/4] " Claudio Fontana
2013-05-28 14:52 ` Richard Henderson
2013-05-23 16:39 ` Peter Maydell
2013-05-24 8:51 ` Claudio Fontana
2013-05-27 9:10 ` Claudio Fontana
2013-05-27 10:40 ` Peter Maydell
2013-05-27 17:05 ` Richard Henderson
2013-05-27 9:47 ` Laurent Desnogues
2013-05-27 10:13 ` Claudio Fontana
2013-05-27 10:28 ` Laurent Desnogues
2013-05-28 13:14 ` Laurent Desnogues
2013-05-28 14:37 ` Claudio Fontana
2013-05-23 8:19 ` [Qemu-devel] [PATCH 3/4] configure: permit compilation on arm aarch64 Claudio Fontana
2013-05-23 13:24 ` Peter Maydell
2013-05-23 8:22 ` [Qemu-devel] [PATCH 4/4] tcg/aarch64: more ops in preparation of tlb lookup Claudio Fontana
2013-05-23 12:37 ` [Qemu-devel] [PATCH 0/4] ARM aarch64 TCG target VERSION 2 Andreas Färber
2013-05-23 12:50 ` Peter Maydell
2013-05-23 12:53 ` Andreas Färber
2013-05-23 13:03 ` Peter Maydell
2013-05-23 13:27 ` Claudio Fontana
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=519244B7.4090207@huawei.com \
--to=claudio.fontana@huawei.com \
--cc=pbonzini@redhat.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.