From: Aurelien Jarno <aurelien@aurel32.net>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port.
Date: Fri, 9 Apr 2010 01:01:49 +0200 [thread overview]
Message-ID: <20100408230149.GC2220@volta.aurel32.net> (raw)
In-Reply-To: <4BBE0529.7000404@twiddle.net>
On Thu, Apr 08, 2010 at 09:32:41AM -0700, Richard Henderson wrote:
> On 04/08/2010 02:56 AM, Aurelien Jarno wrote:
> > I have applied the patch. I have some comments though, it would be nice
> > if you can address them with additional patches.
>
> Sure.
>
> >> +static void tcg_out_ori(TCGContext *s, int ret, int arg, tcg_target_ulong m)
> >> +{
> >> + if (m == 0) {
> >> + tcg_out_mov(s, ret, arg);
> >> + } else if (m == -1) {
> >> + tcg_out_movi(s, TCG_TYPE_I32, ret, -1);
> >
> > Those cases are already eliminated in tcg/tcg-op.h. This code looks
> > redundant.
>
> The cases eliminated in tcg-op.h are with immediate constants.
> There is no generic code in tcg.c to eliminate these cases
> after constant propagation. However, I can remove them with...
Ok, fine.
> >> + } else {
> >> + tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m);
> >> + tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_OR);
> >
> > Do we really want a movi here? It would be better to leave the tcg code
> > load the constant itself, so that if the same constant is used twice, it
> > is only loaded once.
>
> I've never caught TCG properly re-using constants, but I take your
> point -- there's no reason why tcg.c can't be improved, and this
> port would miss out on that improvement. I'll invent a constraint
> that matches or_mask_p.
>
> >> +static void tcg_out_andi(TCGContext *s, int ret, int arg, tcg_target_ulong m)
> ...
> >> + tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R1, m);
> >> + tcg_out_arith(s, ret, arg, TCG_REG_R1, INSN_AND);
> >
> > Same.
>
> ANDI slightly different case. This function is used by tcg_out_tlb_read
> with constants that may or may not satisfy and_mask_p. I think it's cleaner
> to handle the arbitrary case here, rather than open code the same test in
> the tlb read function.
>
> I will of course add a constraint to match and_mask_p, for ANDs that
> originate within the opcode stream.
Ok, fine.
> >> + tcg_out_reloc(s, s->code_ptr, R_PARISC_PCREL17F, label_index, 0);
> >> + tcg_out32(s, op);
> >
> > This breaks partial retranslation. The bits corresponding to the offset
> > should be preserved.
>
> I don't recall ever hearing about re-translation. Can you point me
> at the bits that do it, so I can figure out what's going on? This
> sounds like something that ought to be documented properly...
>
> I rather assumed that the "addend" parameter to patch_reloc would
> hold whatever is really needed to be preserved. What else is that
> field for, anyway?
The problem is that in case of an exception, the code is retranslated to
get the address (on the guest side) of the exception. The retranslation
is done using the same buffer, and stops as soon as the address is
found.
It means that the branch instruction is rewritten with a new address,
why the relocation is not retranslated. In short the jump address is
then pointing to the wrong address, which causes either an endless loop
or a crash.
This is something visible in system mode, usually it starts to appear
when the guest switches to userland.
To prevent that, the code should change only the bits defining the
jump instruction, leaving the others defining the address unchanged.
> >> - if (opc == 3)
> >> - data_reg2 = *args++;
> >> - else
> >> - data_reg2 = 0; /* suppress warning */
> >> + data_reg2 = (opc == 3 ? *args++ : TCG_REG_R0);
> >
> > I am not sure TCG_REG_R0 is really correct here, and I find it confusing.
> > While it's value is zero, the assignment there is just to make GCC
> > happy, it won't be used after
>
> Correct. I don't see what else I can really do though. I think it's
> worse to mix types: integer-as-register-number (i.e. *args) and
> integer-as-filler (i.e. 0). Better to at least have them be the same
> type as it clarifies that *args must be a register number.
>
> Perhaps just a comment here?
I think the old code was actually pretty fine, that is the '0' value,
plus a comment. I don't really see why it was necessary to change this
code.
> >> #if defined(CONFIG_SOFTMMU)
> >> - tcg_out_mov(s, r1, addr_reg);
> >> + lab1 = gen_new_label();
> >> + lab2 = gen_new_label();
> >
> > Do you really want to use label here? load/store are the most common
> > instructions, I am not really sure of the resulting performance.
>
> I think the code is *so* much more readable re-using the usual branch
> and relocate code. I'd almost rather spend the time speeding up the
> use of temporary labels than uglifying the code here.
>
> >> + /* These three correspond exactly to the fallback implementation.
> >> + But by including them we reduce the number of TCG ops that
> >> + need to be generated, and these opcodes are fairly common. */
> >
> > Are you sure it really makes a difference?
>
> Not quantifiably, but the reasoning is sound. I can remove them if you insist.
My point is on one side you seems to look for performance, while on the
others (labels just above), you don't really care about performance.
> >> + { INDEX_op_add_i32, { "r", "rZ", "ri" } },
> >> + { INDEX_op_sub_i32, { "r", "rI", "ri" } },
> >> + { INDEX_op_and_i32, { "r", "rZ", "ri" } },
> >> + { INDEX_op_or_i32, { "r", "rZ", "ri" } },
> >
> > Already commented for "and" and "or", but the same apply for add and
> > sub. Do we really need a "i" contraints here if the constant is going
> > to be loaded with a movi.
>
> ADD and SUB are not going to use movi. They will use one or both of
> ADDIL (21-bit constant << 11) and LDO (14-bit constant). As a pair
> these insns can perform a full 32-bit constant addition.
>
> I suppose technically there's a subset of 32-bit constants that could
> benefit from generic code loading constants into registers. The only
> valid output register for ADDIL is R1. So at the moment for
>
> R3 = R4 + 0x10000;
>
> we generate
>
> addil 0x10000, r4, r1
> copy r1, r3
>
> where we could equivalently generate
>
> ldil 0x10000, r5
> add r4, r5, r3
>
> However I don't think this is worth worrying about in the short term.
Ok.
> >> + if (GUEST_BASE != 0) {
> >> + tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE);
> >> + }
> >
> > The final GUEST_BASE value is computed after the prologue has been
> > generated. The value is modified in two cases:
> > - The user specify a non-aligned base address.
> > - /proc/sys/vm/mmap_min_addr is different than 0, which is now the
> > in default configuration for more than one year.
> >
> > When it happens, the guest crashes almost immediately.
>
> To be fair, mmap_min_addr only affects GUEST_BASE if the executable
> image we've loaded overlaps. Which is uncommon, but certainly possible.
When I found the problem on the tcg ia64, it was crashing for all
binaries I tried.
> Hmm. I wonder which is better: one extra instruction needed per qemu_ld
> vs having one more call-saved register available. At the moment we don't
> even come close to using all of the call-saved registers, and it would be
> easy enough to have the prologue read the actual guest_base variable rather
> than embed the constant.
>
The other option is to reorganize the order in which the prologue is
generated and the guest base value computed. The work is probably more
important though.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
next prev parent reply other threads:[~2010-04-08 23:02 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-03-23 22:33 [Qemu-devel] [PATCH 0/2] tcg-hppa finish, v3 Richard Henderson
2010-03-12 14:58 ` [Qemu-devel] [PATCH 1/2] tcg-hppa: Compute is_write in cpu_signal_handler Richard Henderson
2010-03-23 22:06 ` [Qemu-devel] [PATCH 2/2] tcg-hppa: Finish the port Richard Henderson
2010-04-07 11:45 ` [Qemu-devel] [PATCH 0/2] tcg-hppa finish, v3 Richard Henderson
2010-04-07 11:56 ` Aurelien Jarno
2010-04-07 11:56 ` [Qemu-devel] [PATCH 2/4] tcg-hppa: Finish the port Richard Henderson
2010-04-08 9:56 ` Aurelien Jarno
2010-04-08 16:32 ` Richard Henderson
2010-04-08 21:48 ` Richard Henderson
2010-04-08 23:01 ` Aurelien Jarno [this message]
2010-04-07 14:46 ` [Qemu-devel] [PATCH 3/4] tcg-hppa: Fix in/out register overlap in add2/sub2 Richard Henderson
2010-04-08 9:57 ` Aurelien Jarno
2010-04-07 23:24 ` [Qemu-devel] [PATCH 4/4] tcg-hppa: Don't try to calls to non-constant addresses Richard Henderson
2010-04-08 9:58 ` Aurelien Jarno
2010-04-07 23:29 ` [Qemu-devel] [PATCH 0/4] tcg-hppa finish, v4 Richard Henderson
2010-04-08 9:36 ` Aurelien Jarno
2010-04-07 23:42 ` [Qemu-devel] [PATCH 1/4] tcg-hppa: Compute is_write in cpu_signal_handler Richard Henderson
2010-04-08 9:36 ` Aurelien Jarno
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100408230149.GC2220@volta.aurel32.net \
--to=aurelien@aurel32.net \
--cc=qemu-devel@nongnu.org \
--cc=rth@twiddle.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.