* [Qemu-devel] [PATCH 0/2] tcg/aarch64 fixes
2016-12-07 18:07 ` [Qemu-devel] [PATCH 1/2] tcg/aarch64: Fix addsub2 for 0+C Richard Henderson
2016-12-07 18:07 ` [Qemu-devel] [PATCH 2/2] tcg/aarch64: Fix tcg_out_movi Richard Henderson
From: Richard Henderson @ 2016-12-07 18:07 UTC
To: qemu-devel; +Cc: alex.bennee, peter.maydell
Both of these bugs were identified by Alex and his risu testsuite, and
diagnosed by Peter. I believe both were originally my fault.
The risu testsuite, as posted, now passes with an aarch64 host.
r~
Richard Henderson (2):
tcg/aarch64: Fix addsub2 for 0+C
tcg/aarch64: Fix tcg_out_movi
tcg/aarch64/tcg-target.inc.c | 66 ++++++++++++++++++++++----------------------
1 file changed, 33 insertions(+), 33 deletions(-)
--
2.9.3
* [Qemu-devel] [PATCH 1/2] tcg/aarch64: Fix addsub2 for 0+C
From: Richard Henderson @ 2016-12-07 18:07 UTC
To: qemu-devel; +Cc: alex.bennee, peter.maydell
When al == xzr, we cannot use addi/subi because that encodes xsp.
Force a zero into the temp register for that (rare) case.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/aarch64/tcg-target.inc.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1939d35..6c68681 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -965,6 +965,15 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
insn = I3401_SUBSI;
bl = -bl;
}
+ if (unlikely(al == TCG_REG_XZR)) {
+ /* ??? We want to allow al to be zero for the benefit of
+ negation via subtraction. However, that leaves open the
+ possibility of adding 0+const in the low part, and the
+ immediate add instructions encode XSP not XZR. Don't try
+ anything more elaborate here than loading another zero. */
+ al = TCG_REG_TMP;
+ tcg_out_movi(s, ext, al, 0);
+ }
tcg_out_insn_3401(s, insn, ext, rl, al, bl);
} else {
tcg_out_insn_3502(s, sub ? I3502_SUBS : I3502_ADDS, ext, rl, al, bl);
--
2.9.3
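
Background on the encoding quirk this patch works around: in the A64
add/sub-immediate instruction class, register number 31 in the Rn field
names XSP rather than XZR, so there is no way to encode the zero register
as the source of an addi/subi. A minimal standalone sketch (a hypothetical
encoder written for illustration, not QEMU's actual tcg_out_insn
machinery) of how the wrong register gets addressed:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical A64 ADD-immediate (64-bit) encoder, illustration only.
   Bits [9:5] hold Rn, and the value 31 in this instruction class
   selects XSP, not XZR. */
static uint32_t a64_addi64(unsigned rd, unsigned rn, unsigned imm12)
{
    return 0x91000000u               /* ADD Xd, Xn|SP, #imm12 */
         | (imm12 & 0xfff) << 10
         | (rn & 31) << 5
         | (rd & 31);
}

int main(void)
{
    /* Intended "x0 = xzr + 1"; the CPU would read XSP instead. */
    printf("0x%08x means 'add x0, sp, #1', not 'add x0, xzr, #1'\n",
           a64_addi64(0, 31, 1));
    return 0;
}

Hence the fix above: materialize zero in TCG_REG_TMP, an ordinary
register, and use that as the low addend instead of XZR.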
* [Qemu-devel] [PATCH 2/2] tcg/aarch64: Fix tcg_out_movi
From: Richard Henderson @ 2016-12-07 18:07 UTC
To: qemu-devel; +Cc: alex.bennee, peter.maydell
There were some patterns, like 0x0000_ffff_ffff_00ff, for which we
would choose to begin a multi-insn sequence with MOVN, but would then
fail to set the 0x0000 lane back to zero from the 0xffff that MOVN
deposited there.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/aarch64/tcg-target.inc.c | 57 +++++++++++++++++++-------------------------
1 file changed, 24 insertions(+), 33 deletions(-)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 6c68681..2d7cc35 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -581,11 +581,9 @@ static void tcg_out_logicali(TCGContext *s, AArch64Insn insn, TCGType ext,
static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
tcg_target_long value)
{
- AArch64Insn insn;
int i, wantinv, shift;
tcg_target_long svalue = value;
tcg_target_long ivalue = ~value;
- tcg_target_long imask;
/* For 32-bit values, discard potential garbage in value. For 64-bit
values within [2**31, 2**32-1], we can create smaller sequences by
@@ -631,42 +629,35 @@ static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg rd,
/* Would it take fewer insns to begin with MOVN? For the value and its
inverse, count the number of 16-bit lanes that are 0. */
- for (i = wantinv = imask = 0; i < 64; i += 16) {
+ for (i = wantinv = 0; i < 64; i += 16) {
tcg_target_long mask = 0xffffull << i;
- if ((value & mask) == 0) {
- wantinv -= 1;
- }
- if ((ivalue & mask) == 0) {
- wantinv += 1;
- imask |= mask;
- }
+ wantinv -= ((value & mask) == 0);
+ wantinv += ((ivalue & mask) == 0);
}
- /* If we had more 0xffff than 0x0000, invert VALUE and use MOVN. */
- insn = I3405_MOVZ;
- if (wantinv > 0) {
- value = ivalue;
- insn = I3405_MOVN;
- }
-
- /* Find the lowest lane that is not 0x0000. */
- shift = ctz64(value) & (63 & -16);
- tcg_out_insn_3405(s, insn, type, rd, value >> shift, shift);
-
- if (wantinv > 0) {
- /* Re-invert the value, so MOVK sees non-inverted bits. */
- value = ~value;
- /* Clear out all the 0xffff lanes. */
- value ^= imask;
- }
- /* Clear out the lane that we just set. */
- value &= ~(0xffffUL << shift);
-
- /* Iterate until all lanes have been set, and thus cleared from VALUE. */
- while (value) {
+ if (wantinv <= 0) {
+ /* Find the lowest lane that is not 0x0000. */
shift = ctz64(value) & (63 & -16);
- tcg_out_insn(s, 3405, MOVK, type, rd, value >> shift, shift);
+ tcg_out_insn(s, 3405, MOVZ, type, rd, value >> shift, shift);
+ /* Clear out the lane that we just set. */
value &= ~(0xffffUL << shift);
+ /* Iterate until all non-zero lanes have been processed. */
+ while (value) {
+ shift = ctz64(value) & (63 & -16);
+ tcg_out_insn(s, 3405, MOVK, type, rd, value >> shift, shift);
+ value &= ~(0xffffUL << shift);
+ }
+ } else {
+ /* Like above, but with the inverted value and MOVN to start. */
+ shift = ctz64(ivalue) & (63 & -16);
+ tcg_out_insn(s, 3405, MOVN, type, rd, ivalue >> shift, shift);
+ ivalue &= ~(0xffffUL << shift);
+ while (ivalue) {
+ shift = ctz64(ivalue) & (63 & -16);
+ /* Provide MOVK with the non-inverted value. */
+ tcg_out_insn(s, 3405, MOVK, type, rd, ~(ivalue >> shift), shift);
+ ivalue &= ~(0xffffUL << shift);
+ }
}
}
--
2.9.3
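
To see the fixed selection logic end to end on the example from the
commit message, here is a standalone C model of the lane walk (same
structure as the patched code, but printing the insn sequence instead of
emitting it; the function name and the choice of x0 are illustrative
only, and the GCC/Clang builtin __builtin_ctzll stands in for QEMU's
ctz64):

#include <stdint.h>
#include <stdio.h>

/* Standalone model of the fixed tcg_out_movi lane walk: pick MOVZ or
   MOVN for the first insn, then patch each remaining non-matching
   16-bit lane with MOVK, always feeding MOVK the non-inverted bits. */
static void movi_sequence(uint64_t value)
{
    uint64_t ivalue = ~value;
    int zeros = 0, ones = 0, shift;

    if (value == 0) {                 /* single-insn special cases */
        printf("movz x0, #0\n");
        return;
    }
    if (ivalue == 0) {
        printf("movn x0, #0\n");
        return;
    }
    for (int i = 0; i < 64; i += 16) {
        zeros += ((value  >> i) & 0xffff) == 0;   /* 0x0000 lanes */
        ones  += ((ivalue >> i) & 0xffff) == 0;   /* 0xffff lanes */
    }
    if (ones <= zeros) {
        /* MOVZ path: start from the lowest non-zero lane. */
        shift = __builtin_ctzll(value) & -16;
        printf("movz x0, #0x%04x, lsl #%d\n",
               (unsigned)(value >> shift) & 0xffff, shift);
        value &= ~(0xffffull << shift);
        while (value) {
            shift = __builtin_ctzll(value) & -16;
            printf("movk x0, #0x%04x, lsl #%d\n",
                   (unsigned)(value >> shift) & 0xffff, shift);
            value &= ~(0xffffull << shift);
        }
    } else {
        /* MOVN path: walk the inverse, but give MOVK the true bits. */
        shift = __builtin_ctzll(ivalue) & -16;
        printf("movn x0, #0x%04x, lsl #%d\n",
               (unsigned)(ivalue >> shift) & 0xffff, shift);
        ivalue &= ~(0xffffull << shift);
        while (ivalue) {
            shift = __builtin_ctzll(ivalue) & -16;
            printf("movk x0, #0x%04x, lsl #%d\n",
                   (unsigned)~(ivalue >> shift) & 0xffff, shift);
            ivalue &= ~(0xffffull << shift);
        }
    }
}

int main(void)
{
    movi_sequence(0x0000ffffffff00ffull);  /* pattern from the commit log */
    return 0;
}

For 0x0000_ffff_ffff_00ff this prints "movn x0, #0xff00, lsl #0" followed
by "movk x0, #0x0000, lsl #48"; that final MOVK, restoring the top lane
to zero after MOVN filled it with 0xffff, is exactly the insn the old
code dropped. (The real function also has earlier single-insn and
ORR-with-logical-immediate special cases that this sketch omits.)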
* Re: [Qemu-devel] [PATCH 1/2] tcg/aarch64: Fix addsub2 for 0+C
From: Peter Maydell @ 2016-12-07 18:10 UTC
To: Richard Henderson; +Cc: QEMU Developers, Alex Bennée
On 7 December 2016 at 18:07, Richard Henderson <rth@twiddle.net> wrote:
> When al == xzr, we cannot use addi/subi because that encodes xsp.
> Force a zero into the temp register for that (rare) case.
Incidentally I was slightly surprised that the optimisation
pass didn't turn "add2 rlo, rhi, 0, 0, 0, 0" into moves of 0
into rlo and rhi. Constant shifts of xzr in the guest don't
seem worth spending much effort on optimising though :-)
thanks
-- PMM
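
For readers less familiar with TCG internals: add2 produces a double-word
sum, so with every input a constant zero both outputs are trivially the
constant zero, which is the fold being described. A plain-C model of the
opcode's semantics (just the meaning, not TCG source):

#include <stdint.h>

/* Plain-C model of TCG's add2 semantics: (rh:rl) = (ah:al) + (bh:bl),
   where each variable holds one host-word half of a double-word value. */
static void add2(uint64_t *rl, uint64_t *rh,
                 uint64_t al, uint64_t ah,
                 uint64_t bl, uint64_t bh)
{
    uint64_t lo = al + bl;
    *rh = ah + bh + (lo < al);    /* carry out of the low half */
    *rl = lo;
}

With al = ah = bl = bh = 0 this reduces to rl = rh = 0, i.e. the two
moves of zero mentioned above.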
* Re: [Qemu-devel] [PATCH 1/2] tcg/aarch64: Fix addsub2 for 0+C
From: Richard Henderson @ 2016-12-07 18:15 UTC
To: Peter Maydell; +Cc: QEMU Developers, Alex Bennée
On 12/07/2016 10:10 AM, Peter Maydell wrote:
> On 7 December 2016 at 18:07, Richard Henderson <rth@twiddle.net> wrote:
>> When al == xzr, we cannot use addi/subi because that encodes xsp.
>> Force a zero into the temp register for that (rare) case.
>
> Incidentally I was slightly surprised that the optimisation
> pass didn't turn "add2 rlo, rhi, 0, 0, 0, 0" into moves of 0
> into rlo and rhi. Constant shifts of xzr in the guest don't
> seem worth spending much effort on optimising though :-)
Until this last release cycle, we couldn't insert opcodes into the instruction
stream during optimization. Thus we were very restricted in what we could do
when wanting to assign simpler values to two different outputs.
I'll bet addition of 0 + non-constant comes up semi-regularly for 32-bit hosts
and 64-bit guests. I'll see about better simplification of double-word
arithmetic for the next cycle.
r~