From: Blue Swirl <blauwirbel@gmail.com>
To: Igor Kovalenko <igor.v.kovalenko@gmail.com>
Cc: qemu-devel <qemu-devel@nongnu.org>
Subject: [Qemu-devel] Re: sparc64 lazy conditional codes evaluation
Date: Mon, 3 May 2010 22:24:32 +0300
Message-ID: <r2pf43fc5581005031224o2b430a8bo5c5605734ea64b52@mail.gmail.com>
In-Reply-To: <q2rb2fa41d61005030017vacbfb9dcq65a905ff38bc1ef9@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 2990 bytes --]
On 5/3/10, Igor Kovalenko <igor.v.kovalenko@gmail.com> wrote:
> Hi!
>
> There is an issue with lazy conditional code evaluation where
> we return from a trap handler with mismatched condition codes.
>
> I can occasionally reproduce it here by dragging the qemu window while
> the machine is working through silo initialization. I use the gentoo
> minimal cd install-sparc64-minimal-20100322.iso, but I think anything
> that boots with silo would show the same. Once in a while it reports a
> crc error, is unable to open the cd partition, or fails to decompress
> the image.
I think I've also seen this.
> The failing pattern appears to require a compare insn, possibly
> followed by a few instructions which do not touch the condition codes,
> then a conditional branch insn. If we happen to trap while processing
> the conditional branch insn, so that it is restarted after the return
> from the trap, then once in a while the condition codes are calculated
> incorrectly.
>
> I cannot point to the exact cause, but it appears that after a trap
> return we may have a CC_OP and CC_SRC* mismatch somewhere, since
> adding more condition-code evaluation flushes over the code helps.
>
> We have already tried flushing more frequently, and it is still not
> complete, so the question is how to finally do this once and do it
> right :)
>
> Obviously I do not fully get the design of the lazy evaluation, but
> the following list appears to be a good start. The plan is to prepare
> a change for qemu and find a way to test it.
>
> 1. Since SPARC* is a RISC CPU, it does not seem profitable to use
> DisasContext->cc_op to predict whether flag evaluation can be skipped
> because of an overriding insn. Instead we can drop cc_op from the
> disassembler context and simplify the code to only use the cc_op
> from env.
Not currently, but in the future we may use dc->cc_op to do even
lazier flags computation. For example, the sequence 'cmp x, y; bne
target' could be made much more efficient by changing the branch to do
the comparison itself. Here's an old unfinished patch that does some
of this (0002-Branch-optimization-BROKEN.patch, attached below).
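The idea in miniature, as a sketch rather than the attached patch
(gen_be_lazy is a made-up name, and here l1 is the branch-taken label,
whereas the patch uses inverted logic; the TCG globals and helpers are
the existing ones from target-sparc/translate.c):

static void gen_be_lazy(DisasContext *dc, int l1)
{
    if (dc->cc_op == CC_OP_SUB) {
        /* cmp x, y; be target  =>  branch directly on the saved
           operands, never materializing the PSR flags */
        tcg_gen_brcond_tl(TCG_COND_EQ, cpu_cc_src, cpu_cc_src2, l1);
    } else {
        /* fallback: evaluate all flags, then test the Z bit */
        gen_helper_compute_psr();
        dc->cc_op = CC_OP_FLAGS;
        gen_mov_reg_Z(cpu_tmp0, cpu_psr);
        tcg_gen_brcondi_tl(TCG_COND_NE, cpu_tmp0, 0, l1);
    }
}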
> Another point is that we always write to env->cc_op when translating
> *cc insns. This should solve any issue with the dc->cc_op prediction
> going out of sync with env->cc_op and cpu_cc_src*.
I think this is what is happening now.
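For reference, this is the pattern in question as it appears in
disas_sparc_insn for a *cc insn (the same shape shows up for addx/subx
in the first patch below): the runtime cc_op and the translator's
prediction are written together.

gen_op_add_cc(cpu_dst, cpu_src1, cpu_src2);  /* save operands/result */
tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADD);      /* runtime env->cc_op */
dc->cc_op = CC_OP_ADD;                       /* translator's prediction */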
> 2. We must flush the lazy evaluation back to CC_OP_FLAGS in a few cases:
> a. when a condition code is required by an insn (like addc, a cond
>    branch, etc.)
>    - here we can optimize by evaluating only the specific bits (carry?)
>    - not sure if this works in case we have two flag-consuming insns,
>      where the first needs carry and the other needs the rest of the
>      flags
Here's another patch that optimizes C flag handling; it doesn't pass
my tests, though.
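The call sites the patch converts all want just the carry, so the
helper behind gen_helper_compute_C_icc can derive the icc C bit alone
from the lazily saved operands. The helper body is not part of the
attached translate.c diff, so this is a guess at the intent, assuming
helper_compute_psr() leaves the evaluated icc bits in env->psr:

target_ulong helper_compute_C_icc(void)
{
    switch (env->cc_op) {
    case CC_OP_ADD:
        /* a 32-bit add carried out iff the result is below an operand */
        return (uint32_t)env->cc_dst < (uint32_t)env->cc_src;
    case CC_OP_SUB:
        /* a 32-bit subtract borrowed iff src1 < src2 (unsigned) */
        return (uint32_t)env->cc_src < (uint32_t)env->cc_src2;
    default:
        /* no cheap formula for this cc_op (e.g. ADDX/SUBX with a
           carry-in): fall back to a full flush */
        helper_compute_psr();
        return (env->psr & PSR_CARRY) != 0;
    }
}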
> b. when CCR is read by rdccr (helper_rdccr)
>    - we have to compute all flags
> c. when a trap occurs and we prepare the trap-level context (saving
>    pstate)
>    - we have to compute all flags
> d. when control goes out of the tcg runtime (so the gdbstub reads
>    correct values from env)
>    - we have to compute all flags
Fully agree.
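For (c) and (d) the natural place for that flush is save_state(),
which translate.c already calls before anything that can trap or leave
the runtime. A sketch of the discipline, assuming only the compute_psr
part is new:

static inline void save_state(DisasContext *dc, TCGv cond)
{
    tcg_gen_movi_tl(cpu_pc, dc->pc);
    save_npc(dc, cond);
    if (dc->cc_op != CC_OP_FLAGS) {
        /* collapse the lazy state: N, Z, V, C back into env */
        gen_helper_compute_psr();
        dc->cc_op = CC_OP_FLAGS;
    }
}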
[-- Attachment #2: 0001-Convert-C-flag-input-BROKEN.patch --]
[-- Type: text/x-diff, Size: 5475 bytes --]
From b0863c213ce487e9c1034674668d1b64a43b7266 Mon Sep 17 00:00:00 2001
From: Blue Swirl <blauwirbel@gmail.com>
Date: Mon, 3 May 2010 19:11:37 +0000
Subject: [PATCH] Convert C flag input BROKEN
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
---
target-sparc/translate.c | 24 ++++++++----------------
1 files changed, 8 insertions(+), 16 deletions(-)
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index be2a116..94c343d 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -336,7 +336,7 @@ static inline void gen_op_addxi_cc(TCGv dst, TCGv src1, target_long src2)
{
tcg_gen_mov_tl(cpu_cc_src, src1);
tcg_gen_movi_tl(cpu_cc_src2, src2);
- gen_mov_reg_C(cpu_tmp0, cpu_psr);
+ gen_helper_compute_C_icc(cpu_tmp0);
tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
tcg_gen_addi_tl(cpu_cc_dst, cpu_cc_dst, src2);
tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -346,7 +346,7 @@ static inline void gen_op_addx_cc(TCGv dst, TCGv src1, TCGv src2)
{
tcg_gen_mov_tl(cpu_cc_src, src1);
tcg_gen_mov_tl(cpu_cc_src2, src2);
- gen_mov_reg_C(cpu_tmp0, cpu_psr);
+ gen_helper_compute_C_icc(cpu_tmp0);
tcg_gen_add_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
tcg_gen_add_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -419,7 +419,7 @@ static inline void gen_op_subxi_cc(TCGv dst, TCGv src1, target_long src2)
{
tcg_gen_mov_tl(cpu_cc_src, src1);
tcg_gen_movi_tl(cpu_cc_src2, src2);
- gen_mov_reg_C(cpu_tmp0, cpu_psr);
+ gen_helper_compute_C_icc(cpu_tmp0);
tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
tcg_gen_subi_tl(cpu_cc_dst, cpu_cc_dst, src2);
tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -429,7 +429,7 @@ static inline void gen_op_subx_cc(TCGv dst, TCGv src1, TCGv src2)
{
tcg_gen_mov_tl(cpu_cc_src, src1);
tcg_gen_mov_tl(cpu_cc_src2, src2);
- gen_mov_reg_C(cpu_tmp0, cpu_psr);
+ gen_helper_compute_C_icc(cpu_tmp0);
tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_src, cpu_tmp0);
tcg_gen_sub_tl(cpu_cc_dst, cpu_cc_dst, cpu_cc_src2);
tcg_gen_mov_tl(dst, cpu_cc_dst);
@@ -2953,25 +2953,21 @@ static void disas_sparc_insn(DisasContext * dc)
if (IS_IMM) {
simm = GET_FIELDs(insn, 19, 31);
if (xop & 0x10) {
- gen_helper_compute_psr();
gen_op_addxi_cc(cpu_dst, cpu_src1, simm);
tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
dc->cc_op = CC_OP_ADDX;
} else {
- gen_helper_compute_psr();
- gen_mov_reg_C(cpu_tmp0, cpu_psr);
+ gen_helper_compute_C_icc(cpu_tmp0);
tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
}
} else {
if (xop & 0x10) {
- gen_helper_compute_psr();
gen_op_addx_cc(cpu_dst, cpu_src1, cpu_src2);
tcg_gen_movi_i32(cpu_cc_op, CC_OP_ADDX);
dc->cc_op = CC_OP_ADDX;
} else {
- gen_helper_compute_psr();
- gen_mov_reg_C(cpu_tmp0, cpu_psr);
+ gen_helper_compute_C_icc(cpu_tmp0);
tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
tcg_gen_add_tl(cpu_dst, cpu_src1, cpu_tmp0);
}
@@ -3009,25 +3005,21 @@ static void disas_sparc_insn(DisasContext * dc)
if (IS_IMM) {
simm = GET_FIELDs(insn, 19, 31);
if (xop & 0x10) {
- gen_helper_compute_psr();
gen_op_subxi_cc(cpu_dst, cpu_src1, simm);
tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
dc->cc_op = CC_OP_SUBX;
} else {
- gen_helper_compute_psr();
- gen_mov_reg_C(cpu_tmp0, cpu_psr);
+ gen_helper_compute_C_icc(cpu_tmp0);
tcg_gen_addi_tl(cpu_tmp0, cpu_tmp0, simm);
tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
}
} else {
if (xop & 0x10) {
- gen_helper_compute_psr();
gen_op_subx_cc(cpu_dst, cpu_src1, cpu_src2);
tcg_gen_movi_i32(cpu_cc_op, CC_OP_SUBX);
dc->cc_op = CC_OP_SUBX;
} else {
- gen_helper_compute_psr();
- gen_mov_reg_C(cpu_tmp0, cpu_psr);
+ gen_helper_compute_C_icc(cpu_tmp0);
tcg_gen_add_tl(cpu_tmp0, cpu_src2, cpu_tmp0);
tcg_gen_sub_tl(cpu_dst, cpu_src1, cpu_tmp0);
}
--
1.5.6.5
[-- Attachment #3: 0002-Branch-optimization-BROKEN.patch --]
[-- Type: text/x-diff, Size: 4402 bytes --]
From 93cce43be043ca25770165b8c06546eafc320716 Mon Sep 17 00:00:00 2001
From: Blue Swirl <blauwirbel@gmail.com>
Date: Mon, 3 May 2010 19:21:59 +0000
Subject: [PATCH] Branch optimization BROKEN
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
---
target-sparc/translate.c | 108 +++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 106 insertions(+), 2 deletions(-)
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 94c343d..57bda12 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -1115,6 +1115,104 @@ static inline void gen_cond_reg(TCGv r_dst, int cond, TCGv r_src)
}
#endif
+// Inverted logic
+static const int gen_tcg_cond[16] = {
+ -1,
+ TCG_COND_NE,
+ TCG_COND_GT,
+ TCG_COND_GE,
+ TCG_COND_GTU,
+ TCG_COND_GEU,
+ -1,
+ -1,
+ -1,
+ TCG_COND_EQ,
+ TCG_COND_LE,
+ TCG_COND_LT,
+ TCG_COND_LEU,
+ TCG_COND_LTU,
+ -1,
+ -1,
+};
+
+/* generate a conditional jump to label 'l1' according to jump opcode
+   value 'cond'. In the fast case, T0 is guaranteed not to be used. */
+static inline void gen_brcond(DisasContext *dc, int cond, int l1, int cc, TCGv r_cond)
+{
+ //printf("gen_brcond: cc_op %d\n", dc->cc_op);
+ switch (dc->cc_op) {
+ /* we optimize the cmp/br case */
+ case CC_OP_SUB:
+ // Inverted logic
+ switch (cond) {
+ case 0x0: // n
+ tcg_gen_br(l1);
+ break;
+ case 0x1: // e
+ if (cc == 1) {
+ tcg_gen_brcondi_i64(TCG_COND_NE, cpu_cc_dst, 0, l1);
+ } else {
+ tcg_gen_brcondi_i32(TCG_COND_NE, cpu_cc_dst, 0, l1);
+ }
+ break;
+ case 0x2: // le
+ case 0x3: // l
+ case 0x4: // leu
+ case 0x5: // cs/lu
+ case 0xa: // g
+ case 0xb: // ge
+ case 0xc: // gu
+ case 0xd: // cc/geu
+ if (cc == 1) {
+ tcg_gen_brcondi_i64(gen_tcg_cond[cond], cpu_cc_src, cpu_cc_src2, l1);
+ } else {
+ tcg_gen_brcondi_i32(gen_tcg_cond[cond], cpu_cc_src, cpu_cc_src2, l1);
+ }
+ break;
+ case 0x6: // neg
+ if (cc == 1) {
+ tcg_gen_brcondi_i64(TCG_COND_GE, cpu_cc_dst, 0, l1);
+ } else {
+ tcg_gen_brcondi_i32(TCG_COND_GE, cpu_cc_dst, 0, l1);
+ }
+ break;
+ case 0x7: // vs
+ gen_helper_compute_psr();
+ dc->cc_op = CC_OP_FLAGS;
+ gen_op_eval_bvs(cpu_cc_dst, cpu_cc_src);
+ break;
+ case 0x8: // a
+ // nop
+ break;
+ case 0x9: // ne
+ if (cc == 1) {
+ tcg_gen_brcondi_i64(TCG_COND_EQ, cpu_cc_dst, 0, l1);
+ } else {
+ tcg_gen_brcondi_i32(TCG_COND_EQ, cpu_cc_dst, 0, l1);
+ }
+ break;
+ case 0xe: // pos
+ if (cc == 1) {
+ tcg_gen_brcondi_i64(TCG_COND_LT, cpu_cc_dst, 0, l1);
+ } else {
+ tcg_gen_brcondi_i32(TCG_COND_LT, cpu_cc_dst, 0, l1);
+ }
+ break;
+ case 0xf: // vc
+ gen_helper_compute_psr();
+ dc->cc_op = CC_OP_FLAGS;
+ gen_op_eval_bvc(cpu_cc_dst, cpu_cc_src);
+ break;
+ }
+ break;
+ case CC_OP_FLAGS:
+ default:
+ gen_cond(r_cond, cc, cond, dc);
+ tcg_gen_brcondi_tl(TCG_COND_EQ, r_cond, 0, l1);
+ break;
+ }
+}
+
/* XXX: potentially incorrect if dynamic npc */
static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc,
TCGv r_cond)
@@ -1143,11 +1241,17 @@ static void do_branch(DisasContext *dc, int32_t offset, uint32_t insn, int cc,
}
} else {
flush_cond(dc, r_cond);
- gen_cond(r_cond, cc, cond, dc);
if (a) {
- gen_branch_a(dc, target, dc->npc, r_cond);
+ int l1 = gen_new_label();
+
+ gen_brcond(dc, cond, l1, cc, r_cond);
+ gen_goto_tb(dc, 0, dc->npc, target);
+
+ gen_set_label(l1);
+ gen_goto_tb(dc, 1, dc->npc + 4, dc->npc + 8);
dc->is_br = 1;
} else {
+ gen_cond(r_cond, cc, cond, dc);
dc->pc = dc->npc;
dc->jump_pc[0] = target;
dc->jump_pc[1] = dc->npc + 4;
--
1.5.6.5