[Qemu-devel] [PATCH 0/9] S/390 support updated

* [Qemu-devel] [PATCH 0/9] S/390 support updated
@ 2009-10-16 12:38 Ulrich Hecht
  2009-10-16 12:38 ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

S/390 support updated again. People are breathing down my neck to get this
upstream, so I would really appreciate it if someone could commit this or tell
me what's wrong with it. Preferably, commit the parts that are not wrong
in the meantime, so I don't have to juggle 300k of patches every time...

CU
Uli

Ulrich Hecht (9):
  TCG "sync" op
  S/390 CPU emulation
  S/390 host/target build system support
  S/390 host support for TCG
  linux-user: S/390 64-bit (s390x) support
  linux-user: don't do locking in single-threaded processes
  linux-user: dup3, fallocate syscalls
  linux-user: define a couple of syscalls for non-uid16 targets
  linux-user: getpriority errno fix

 configure                            |   58 +-
 cpu-defs.h                           |    8 +
 cpu-exec.c                           |   16 +-
 default-configs/s390x-linux-user.mak |    1 +
 disas.c                              |    3 +
 dyngen-exec.h                        |    2 +-
 linux-user/elfload.c                 |   18 +
 linux-user/main.c                    |   82 ++
 linux-user/s390x/syscall.h           |   25 +
 linux-user/s390x/syscall_nr.h        |  348 +++++
 linux-user/s390x/target_signal.h     |   26 +
 linux-user/s390x/termbits.h          |  283 ++++
 linux-user/signal.c                  |  314 +++++
 linux-user/syscall.c                 |  156 ++-
 linux-user/syscall_defs.h            |   56 +-
 qemu-binfmt-conf.sh                  |    5 +-
 s390-dis.c                           |    4 +-
 s390x.ld                             |  194 +++
 target-s390x/cpu.h                   |  132 ++
 target-s390x/exec.h                  |   51 +
 target-s390x/helper.c                |   81 ++
 target-s390x/helpers.h               |  128 ++
 target-s390x/op_helper.c             | 1719 +++++++++++++++++++++++
 target-s390x/translate.c             | 2479 ++++++++++++++++++++++++++++++++++
 tcg/s390/tcg-target.c                | 1145 ++++++++++++++++
 tcg/s390/tcg-target.h                |   76 +
 tcg/tcg-op.h                         |   12 +
 tcg/tcg-opc.h                        |    2 +
 tcg/tcg.c                            |    6 +
 29 files changed, 7390 insertions(+), 40 deletions(-)
 create mode 100644 default-configs/s390x-linux-user.mak
 create mode 100644 linux-user/s390x/syscall.h
 create mode 100644 linux-user/s390x/syscall_nr.h
 create mode 100644 linux-user/s390x/target_signal.h
 create mode 100644 linux-user/s390x/termbits.h
 create mode 100644 s390x.ld
 create mode 100644 target-s390x/cpu.h
 create mode 100644 target-s390x/exec.h
 create mode 100644 target-s390x/helper.c
 create mode 100644 target-s390x/helpers.h
 create mode 100644 target-s390x/op_helper.c
 create mode 100644 target-s390x/translate.c
 create mode 100644 tcg/s390/tcg-target.c
 create mode 100644 tcg/s390/tcg-target.h


* [Qemu-devel] [PATCH 1/9] TCG "sync" op
  2009-10-16 12:38 [Qemu-devel] [PATCH 0/9] S/390 support updated Ulrich Hecht
@ 2009-10-16 12:38 ` Ulrich Hecht
  2009-10-16 12:38   ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Ulrich Hecht
  2009-10-16 15:52   ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Aurelien Jarno
  0 siblings, 2 replies; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

The sync op saves a TCG variable back to memory, which makes it safe to access
the same memory location through different TCG variables. This comes in handy
when emulating CPU registers that can be used as either 32-bit or 64-bit
registers, because TCG knows nothing about aliases. See the s390x target for
an example.

Fixed sync_i64 build failure on 32-bit targets.

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 tcg/tcg-op.h  |   12 ++++++++++++
 tcg/tcg-opc.h |    2 ++
 tcg/tcg.c     |    6 ++++++
 3 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index faf2e8b..c1b4710 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -316,6 +316,18 @@ static inline void tcg_gen_br(int label)
     tcg_gen_op1i(INDEX_op_br, label);
 }
 
+static inline void tcg_gen_sync_i32(TCGv_i32 arg)
+{
+    tcg_gen_op1_i32(INDEX_op_sync_i32, arg);
+}
+
+#if TCG_TARGET_REG_BITS == 64
+static inline void tcg_gen_sync_i64(TCGv_i64 arg)
+{
+    tcg_gen_op1_i64(INDEX_op_sync_i64, arg);
+}
+#endif
+
 static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg)
 {
     if (!TCGV_EQUAL_I32(ret, arg))
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index b7f3fd7..5dcdeba 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -40,6 +40,7 @@ DEF2(call, 0, 1, 2, TCG_OPF_SIDE_EFFECTS) /* variable number of parameters */
 DEF2(jmp, 0, 1, 0, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 DEF2(br, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
 
+DEF2(sync_i32, 0, 1, 0, 0)
 DEF2(mov_i32, 1, 1, 0, 0)
 DEF2(movi_i32, 1, 0, 1, 0)
 /* load/store */
@@ -109,6 +110,7 @@ DEF2(neg_i32, 1, 1, 0, 0)
 #endif
 
 #if TCG_TARGET_REG_BITS == 64
+DEF2(sync_i64, 0, 1, 0, 0)
 DEF2(mov_i64, 1, 1, 0, 0)
 DEF2(movi_i64, 1, 0, 1, 0)
 /* load/store */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3c0e296..8eb60f8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1930,6 +1930,12 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
         //        dump_regs(s);
 #endif
         switch(opc) {
+        case INDEX_op_sync_i32:
+#if TCG_TARGET_REG_BITS == 64
+        case INDEX_op_sync_i64:
+#endif
+            temp_save(s, args[0], s->reserved_regs);
+            break;
         case INDEX_op_mov_i32:
 #if TCG_TARGET_REG_BITS == 64
         case INDEX_op_mov_i64:
-- 
1.6.2.1
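
The aliasing problem described in the commit message of patch 1/9 can be
illustrated with a small standalone model. This is only a sketch under
assumptions: View, view_get, view_set, view_sync and
read_low_half_after_write are hypothetical names invented for the
illustration, and the model does not use TCG's real data structures. It shows
a 64-bit and a 32-bit cached view of one guest register, and why the 32-bit
view can return a stale value unless the cached copies are written back and
reloaded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not TCG code: a cached view of a guest register
   that lives in memory. Two views of the same slot alias each other. */
typedef struct {
    uint64_t *slot;   /* backing guest register in memory */
    int bits;         /* 32: low half only, 64: whole register */
    uint64_t cached;  /* value held in a (pretend) host register */
    int valid;        /* cached copy is usable */
    int dirty;        /* cached copy is newer than memory */
} View;

/* Read through the view, loading from memory only if nothing is cached. */
static uint64_t view_get(View *v)
{
    assert(v->bits == 32 || v->bits == 64);
    if (!v->valid) {
        v->cached = (v->bits == 32) ? (uint32_t)*v->slot : *v->slot;
        v->valid = 1;
        v->dirty = 0;
    }
    return v->cached;
}

/* Write through the view; memory is updated lazily. */
static void view_set(View *v, uint64_t val)
{
    v->cached = (v->bits == 32) ? (uint32_t)val : val;
    v->valid = 1;
    v->dirty = 1;
}

/* Model of a "sync": write a dirty copy back and forget the cached value,
   so that the next access goes to memory again. */
static void view_sync(View *v)
{
    if (v->valid && v->dirty) {
        if (v->bits == 32)
            *v->slot = (*v->slot & ~0xffffffffULL) | (uint32_t)v->cached;
        else
            *v->slot = v->cached;
    }
    v->valid = 0;
    v->dirty = 0;
}

/* Write the register through the 64-bit view, then read its low half
   through the 32-bit view, with or without syncing in between. */
static uint32_t read_low_half_after_write(int use_sync)
{
    uint64_t reg = 0x1111111122222222ULL;
    View r64 = { &reg, 64, 0, 0, 0 };
    View r32 = { &reg, 32, 0, 0, 0 };

    (void)view_get(&r32);                    /* r32 now caches the old low half */
    view_set(&r64, 0xaaaaaaaabbbbbbbbULL);   /* new value exists only in r64 */
    if (use_sync) {
        view_sync(&r64);                     /* write the new value to memory */
        view_sync(&r32);                     /* drop the stale cached copy */
    }
    return (uint32_t)view_get(&r32);
}
```

In this model, read_low_half_after_write(0) returns the old low half because
the 32-bit view never sees the pending 64-bit write, while
read_low_half_after_write(1) returns the low half of the new value.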


* [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-10-16 12:38 ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Ulrich Hecht
@ 2009-10-16 12:38   ` Ulrich Hecht
  2009-10-16 12:38     ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Ulrich Hecht
  2009-10-17 10:42     ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Aurelien Jarno
  2009-10-16 15:52   ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Aurelien Jarno
  1 sibling, 2 replies; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

Currently only emulates userspace with 64-bit addressing, but it does that
quite well.

Replaced always_inline with inline.

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 cpu-exec.c               |    2 +
 disas.c                  |    3 +
 s390-dis.c               |    4 +-
 target-s390x/cpu.h       |  132 +++
 target-s390x/exec.h      |   51 +
 target-s390x/helper.c    |   81 ++
 target-s390x/helpers.h   |  128 +++
 target-s390x/op_helper.c | 1719 ++++++++++++++++++++++++++++++++
 target-s390x/translate.c | 2479 ++++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 4597 insertions(+), 2 deletions(-)
 create mode 100644 target-s390x/cpu.h
 create mode 100644 target-s390x/exec.h
 create mode 100644 target-s390x/helper.c
 create mode 100644 target-s390x/helpers.h
 create mode 100644 target-s390x/op_helper.c
 create mode 100644 target-s390x/translate.c

diff --git a/cpu-exec.c b/cpu-exec.c
index 8aa92c7..6b3391c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -249,6 +249,7 @@ int cpu_exec(CPUState *env1)
 #elif defined(TARGET_MIPS)
 #elif defined(TARGET_SH4)
 #elif defined(TARGET_CRIS)
+#elif defined(TARGET_S390X)
     /* XXXXX */
 #else
 #error unsupported target CPU
@@ -673,6 +674,7 @@ int cpu_exec(CPUState *env1)
 #elif defined(TARGET_SH4)
 #elif defined(TARGET_ALPHA)
 #elif defined(TARGET_CRIS)
+#elif defined(TARGET_S390X)
     /* XXXXX */
 #else
 #error unsupported target CPU
diff --git a/disas.c b/disas.c
index ce342bc..14c8901 100644
--- a/disas.c
+++ b/disas.c
@@ -195,6 +195,9 @@ void target_disas(FILE *out, target_ulong code, target_ulong size, int flags)
 #elif defined(TARGET_CRIS)
     disasm_info.mach = bfd_mach_cris_v32;
     print_insn = print_insn_crisv32;
+#elif defined(TARGET_S390X)
+    disasm_info.mach = bfd_mach_s390_64;
+    print_insn = print_insn_s390;
 #elif defined(TARGET_MICROBLAZE)
     disasm_info.mach = bfd_arch_microblaze;
     print_insn = print_insn_microblaze;
diff --git a/s390-dis.c b/s390-dis.c
index 86dd84f..9a73a57 100644
--- a/s390-dis.c
+++ b/s390-dis.c
@@ -191,10 +191,10 @@ init_disasm (struct disassemble_info *info)
 //  switch (info->mach)
 //    {
 //    case bfd_mach_s390_31:
-      current_arch_mask = 1 << S390_OPCODE_ESA;
+//      current_arch_mask = 1 << S390_OPCODE_ESA;
 //      break;
 //    case bfd_mach_s390_64:
-//      current_arch_mask = 1 << S390_OPCODE_ZARCH;
+      current_arch_mask = 1 << S390_OPCODE_ZARCH;
 //      break;
 //    default:
 //      abort ();
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
new file mode 100644
index 0000000..93b09cd
--- /dev/null
+++ b/target-s390x/cpu.h
@@ -0,0 +1,132 @@
+/*
+ * S/390 virtual CPU header
+ *
+ *  Copyright (c) 2009 Ulrich Hecht
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
+ */
+#ifndef CPU_S390X_H
+#define CPU_S390X_H
+
+#define TARGET_LONG_BITS 64
+
+#define ELF_MACHINE	EM_S390
+
+#define CPUState struct CPUS390XState
+
+#include "cpu-defs.h"
+
+#include "softfloat.h"
+
+#define NB_MMU_MODES 2 // guess
+#define MMU_USER_IDX 0 // guess
+
+typedef union FPReg {
+    struct {
+#ifdef WORDS_BIGENDIAN
+        float32 e;
+        int32_t __pad;
+#else
+        int32_t __pad;
+        float32 e;
+#endif
+    };
+    float64 d;
+    uint64_t i;
+} FPReg;
+
+typedef struct CPUS390XState {
+    uint64_t regs[16];	/* GP registers */
+    
+    uint32_t aregs[16];	/* access registers */
+    
+    uint32_t fpc;	/* floating-point control register */
+    FPReg fregs[16]; /* FP registers */
+    float_status fpu_status; /* passed to softfloat lib */
+    
+    struct {
+        uint64_t mask;
+        uint64_t addr;
+    } psw;
+    
+    int cc; /* condition code (0-3) */
+    
+    uint64_t __excp_addr;
+    
+    CPU_COMMON
+} CPUS390XState;
+
+#if defined(CONFIG_USER_ONLY)
+static inline void cpu_clone_regs(CPUState *env, target_ulong newsp)
+{
+    if (newsp)
+        env->regs[15] = newsp;
+    env->regs[0] = 0;
+}
+#endif
+
+CPUS390XState *cpu_s390x_init(const char *cpu_model);
+void s390x_translate_init(void);
+int cpu_s390x_exec(CPUS390XState *s);
+void cpu_s390x_close(CPUS390XState *s);
+void do_interrupt (CPUState *env);
+
+/* you can call this signal handler from your SIGBUS and SIGSEGV
+   signal handlers to inform the virtual CPU of exceptions. non zero
+   is returned if the signal was handled by the virtual CPU.  */
+int cpu_s390x_signal_handler(int host_signum, void *pinfo,
+                           void *puc);
+int cpu_s390x_handle_mmu_fault (CPUS390XState *env, target_ulong address, int rw,
+                              int mmu_idx, int is_softmmu);
+#define cpu_handle_mmu_fault cpu_s390x_handle_mmu_fault
+
+void cpu_lock(void);
+void cpu_unlock(void);
+
+static inline void cpu_set_tls(CPUS390XState *env, target_ulong newtls)
+{
+    env->aregs[0] = newtls >> 32;
+    env->aregs[1] = newtls & 0xffffffffULL;
+}
+
+#define TARGET_PAGE_BITS 12 // guess
+
+#define cpu_init cpu_s390x_init
+#define cpu_exec cpu_s390x_exec
+#define cpu_gen_code cpu_s390x_gen_code
+#define cpu_signal_handler cpu_s390x_signal_handler
+//#define cpu_list s390x_cpu_list
+
+#include "cpu-all.h"
+#include "exec-all.h"
+
+#define EXCP_OPEX 1 /* operation exception (sigill) */
+#define EXCP_SVC 2 /* supervisor call (syscall) */
+#define EXCP_ADDR 5 /* addressing exception */
+#define EXCP_EXECUTE_SVC 0xff00000 /* supervisor call via execute insn */
+
+static inline void cpu_pc_from_tb(CPUState *env, TranslationBlock* tb)
+{
+    env->psw.addr = tb->pc;
+}
+
+static inline void cpu_get_tb_cpu_state(CPUState* env, target_ulong *pc,
+                                        target_ulong *cs_base, int *flags)
+{
+    *pc = env->psw.addr;
+    *cs_base = 0;
+    *flags = env->psw.mask; // guess
+}
+#endif
diff --git a/target-s390x/exec.h b/target-s390x/exec.h
new file mode 100644
index 0000000..5198359
--- /dev/null
+++ b/target-s390x/exec.h
@@ -0,0 +1,51 @@
+/*
+ *  S/390 execution defines
+ *
+ *  Copyright (c) 2009 Ulrich Hecht
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
+ */
+
+#include "dyngen-exec.h"
+
+register struct CPUS390XState *env asm(AREG0);
+
+#include "cpu.h"
+#include "exec-all.h"
+
+static inline int cpu_has_work(CPUState *env)
+{
+    return env->interrupt_request & CPU_INTERRUPT_HARD; // guess
+}
+
+static inline void regs_to_env(void)
+{
+}
+
+static inline void env_to_regs(void)
+{
+}
+
+static inline int cpu_halted(CPUState *env)
+{
+    if (!env->halted) {
+       return 0;
+    }
+    if (cpu_has_work(env)) {
+        env->halted = 0;
+        return 0;
+    }
+    return EXCP_HALTED;
+}
diff --git a/target-s390x/helper.c b/target-s390x/helper.c
new file mode 100644
index 0000000..5407c62
--- /dev/null
+++ b/target-s390x/helper.c
@@ -0,0 +1,81 @@
+/*
+ *  S/390 helpers
+ *
+ *  Copyright (c) 2009 Ulrich Hecht
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "cpu.h"
+#include "exec-all.h"
+#include "gdbstub.h"
+#include "qemu-common.h"
+
+CPUS390XState *cpu_s390x_init(const char *cpu_model)
+{
+    CPUS390XState *env;
+    static int inited = 0;
+    
+    env = qemu_mallocz(sizeof(CPUS390XState));
+    cpu_exec_init(env);
+    if (!inited) {
+        inited = 1;
+        s390x_translate_init();
+    }
+    
+    env->cpu_model_str = cpu_model;
+    cpu_reset(env);
+    qemu_init_vcpu(env);
+    return env;
+}
+
+#if defined(CONFIG_USER_ONLY)
+
+void do_interrupt (CPUState *env)
+{
+    env->exception_index = -1;
+}
+
+int cpu_s390x_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
+                              int mmu_idx, int is_softmmu)
+{
+    //fprintf(stderr,"%s: address 0x%lx rw %d mmu_idx %d is_softmmu %d\n", __FUNCTION__, address, rw, mmu_idx, is_softmmu);
+    env->exception_index = EXCP_ADDR;
+    env->__excp_addr = address; /* FIXME: find out how this works on a real machine */
+    return 1;
+}
+
+target_phys_addr_t cpu_get_phys_page_debug(CPUState *env, target_ulong addr)
+{
+    return addr;
+}
+
+#endif /* CONFIG_USER_ONLY */
+
+void cpu_reset(CPUS390XState *env)
+{
+    if (qemu_loglevel_mask(CPU_LOG_RESET)) {
+        qemu_log("CPU Reset (CPU %d)\n", env->cpu_index);
+        log_cpu_state(env, 0);
+    }
+    
+    memset(env, 0, offsetof(CPUS390XState, breakpoints));
+    /* FIXME: reset vector? */
+    tlb_flush(env, 1);
+}
diff --git a/target-s390x/helpers.h b/target-s390x/helpers.h
new file mode 100644
index 0000000..0ba2086
--- /dev/null
+++ b/target-s390x/helpers.h
@@ -0,0 +1,128 @@
+#include "def-helper.h"
+
+DEF_HELPER_1(exception, void, i32)
+DEF_HELPER_4(nc, i32, i32, i32, i32, i32)
+DEF_HELPER_4(oc, i32, i32, i32, i32, i32)
+DEF_HELPER_4(xc, i32, i32, i32, i32, i32)
+DEF_HELPER_4(mvc, void, i32, i32, i32, i32)
+DEF_HELPER_4(clc, i32, i32, i32, i32, i32)
+DEF_HELPER_4(lmg, void, i32, i32, i32, s32)
+DEF_HELPER_4(stmg, void, i32, i32, i32, s32)
+DEF_HELPER_FLAGS_1(set_cc_s32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32)
+DEF_HELPER_FLAGS_1(set_cc_s64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64)
+DEF_HELPER_FLAGS_1(set_cc_comp_s32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32)
+DEF_HELPER_FLAGS_1(set_cc_comp_s64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64)
+DEF_HELPER_FLAGS_1(set_cc_nz_u32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32)
+DEF_HELPER_FLAGS_1(set_cc_nz_u64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64)
+DEF_HELPER_FLAGS_2(set_cc_icm, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32)
+DEF_HELPER_4(brc, void, i32, i32, i64, s32)
+DEF_HELPER_3(brctg, void, i64, i64, s32)
+DEF_HELPER_3(brct, void, i32, i64, s32)
+DEF_HELPER_4(brcl, void, i32, i32, i64, s64)
+DEF_HELPER_4(bcr, void, i32, i32, i64, i64)
+DEF_HELPER_4(bc, void, i32, i32, i64, i64)
+DEF_HELPER_FLAGS_2(cmp_u64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i64)
+DEF_HELPER_FLAGS_2(cmp_u32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32)
+DEF_HELPER_FLAGS_2(cmp_s32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32, s32)
+DEF_HELPER_FLAGS_2(cmp_s64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64, s64)
+DEF_HELPER_3(clm, i32, i32, i32, i64)
+DEF_HELPER_3(stcm, void, i32, i32, i64)
+DEF_HELPER_2(mlg, void, i32, i64)
+DEF_HELPER_2(dlg, void, i32, i64)
+DEF_HELPER_FLAGS_3(set_cc_add64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64, s64, s64)
+DEF_HELPER_FLAGS_3(set_cc_addu64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i64, i64)
+DEF_HELPER_FLAGS_3(set_cc_add32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32, s32, s32)
+DEF_HELPER_FLAGS_3(set_cc_addu32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32, i32)
+DEF_HELPER_FLAGS_3(set_cc_sub64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64, s64, s64)
+DEF_HELPER_FLAGS_3(set_cc_subu64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i64, i64)
+DEF_HELPER_FLAGS_3(set_cc_sub32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32, s32, s32)
+DEF_HELPER_FLAGS_3(set_cc_subu32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32, i32)
+DEF_HELPER_3(srst, i32, i32, i32, i32)
+DEF_HELPER_3(clst, i32, i32, i32, i32)
+DEF_HELPER_3(mvst, i32, i32, i32, i32)
+DEF_HELPER_3(csg, i32, i32, i64, i32)
+DEF_HELPER_3(cdsg, i32, i32, i64, i32)
+DEF_HELPER_3(cs, i32, i32, i64, i32)
+DEF_HELPER_4(ex, i32, i32, i64, i64, i64)
+DEF_HELPER_FLAGS_2(tm, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32)
+DEF_HELPER_FLAGS_2(tmxx, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i32)
+DEF_HELPER_2(abs_i32, i32, i32, s32)
+DEF_HELPER_2(nabs_i32, i32, i32, s32)
+DEF_HELPER_2(abs_i64, i32, i32, s64)
+DEF_HELPER_2(nabs_i64, i32, i32, s64)
+DEF_HELPER_3(stcmh, i32, i32, i64, i32)
+DEF_HELPER_3(icmh, i32, i32, i64, i32)
+DEF_HELPER_2(ipm, void, i32, i32)
+DEF_HELPER_3(addc_u32, i32, i32, i32, i32)
+DEF_HELPER_FLAGS_3(set_cc_addc_u64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i64, i64)
+DEF_HELPER_3(stam, void, i32, i64, i32)
+DEF_HELPER_3(mvcle, i32, i32, i64, i32)
+DEF_HELPER_3(clcle, i32, i32, i64, i32)
+DEF_HELPER_4(slb, i32, i32, i32, i32, i32)
+DEF_HELPER_4(slbg, i32, i32, i32, i64, i64)
+DEF_HELPER_2(cefbr, void, i32, s32)
+DEF_HELPER_2(cdfbr, void, i32, s32)
+DEF_HELPER_2(cxfbr, void, i32, s32)
+DEF_HELPER_2(cegbr, void, i32, s64)
+DEF_HELPER_2(cdgbr, void, i32, s64)
+DEF_HELPER_2(cxgbr, void, i32, s64)
+DEF_HELPER_2(adbr, i32, i32, i32)
+DEF_HELPER_2(aebr, i32, i32, i32)
+DEF_HELPER_2(sebr, i32, i32, i32)
+DEF_HELPER_2(sdbr, i32, i32, i32)
+DEF_HELPER_2(debr, void, i32, i32)
+DEF_HELPER_2(dxbr, void, i32, i32)
+DEF_HELPER_2(mdbr, void, i32, i32)
+DEF_HELPER_2(mxbr, void, i32, i32)
+DEF_HELPER_2(ldebr, void, i32, i32)
+DEF_HELPER_2(ldxbr, void, i32, i32)
+DEF_HELPER_2(lxdbr, void, i32, i32)
+DEF_HELPER_2(ledbr, void, i32, i32)
+DEF_HELPER_2(lexbr, void, i32, i32)
+DEF_HELPER_2(lpebr, i32, i32, i32)
+DEF_HELPER_2(lpdbr, i32, i32, i32)
+DEF_HELPER_2(lpxbr, i32, i32, i32)
+DEF_HELPER_2(ltebr, i32, i32, i32)
+DEF_HELPER_2(ltdbr, i32, i32, i32)
+DEF_HELPER_2(ltxbr, i32, i32, i32)
+DEF_HELPER_2(lcebr, i32, i32, i32)
+DEF_HELPER_2(lcdbr, i32, i32, i32)
+DEF_HELPER_2(lcxbr, i32, i32, i32)
+DEF_HELPER_2(ceb, i32, i32, i64)
+DEF_HELPER_2(aeb, i32, i32, i64)
+DEF_HELPER_2(deb, void, i32, i64)
+DEF_HELPER_2(meeb, void, i32, i64)
+DEF_HELPER_2(cdb, i32, i32, i64)
+DEF_HELPER_2(adb, i32, i32, i64)
+DEF_HELPER_2(seb, i32, i32, i64)
+DEF_HELPER_2(sdb, i32, i32, i64)
+DEF_HELPER_2(mdb, void, i32, i64)
+DEF_HELPER_2(ddb, void, i32, i64)
+DEF_HELPER_FLAGS_2(cebr, TCG_CALL_PURE, i32, i32, i32)
+DEF_HELPER_FLAGS_2(cdbr, TCG_CALL_PURE, i32, i32, i32)
+DEF_HELPER_FLAGS_2(cxbr, TCG_CALL_PURE, i32, i32, i32)
+DEF_HELPER_3(cgebr, i32, i32, i32, i32)
+DEF_HELPER_3(cgdbr, i32, i32, i32, i32)
+DEF_HELPER_3(cgxbr, i32, i32, i32, i32)
+DEF_HELPER_1(lzer, void, i32)
+DEF_HELPER_1(lzdr, void, i32)
+DEF_HELPER_1(lzxr, void, i32)
+DEF_HELPER_3(cfebr, i32, i32, i32, i32)
+DEF_HELPER_3(cfdbr, i32, i32, i32, i32)
+DEF_HELPER_3(cfxbr, i32, i32, i32, i32)
+DEF_HELPER_2(axbr, i32, i32, i32)
+DEF_HELPER_2(sxbr, i32, i32, i32)
+DEF_HELPER_2(meebr, void, i32, i32)
+DEF_HELPER_2(ddbr, void, i32, i32)
+DEF_HELPER_3(madb, void, i32, i64, i32)
+DEF_HELPER_3(maebr, void, i32, i32, i32)
+DEF_HELPER_3(madbr, void, i32, i32, i32)
+DEF_HELPER_3(msdbr, void, i32, i32, i32)
+DEF_HELPER_2(lxdb, void, i32, i64)
+DEF_HELPER_FLAGS_2(tceb, TCG_CALL_PURE, i32, i32, i64)
+DEF_HELPER_FLAGS_2(tcdb, TCG_CALL_PURE, i32, i32, i64)
+DEF_HELPER_FLAGS_2(tcxb, TCG_CALL_PURE, i32, i32, i64)
+DEF_HELPER_2(flogr, i32, i32, i64)
+DEF_HELPER_2(sqdbr, void, i32, i32)
+
+#include "def-helper.h"
diff --git a/target-s390x/op_helper.c b/target-s390x/op_helper.c
new file mode 100644
index 0000000..5de4d08
--- /dev/null
+++ b/target-s390x/op_helper.c
@@ -0,0 +1,1719 @@
+/*
+ *  S/390 helper routines
+ *
+ *  Copyright (c) 2009 Ulrich Hecht
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
+ */
+
+#include "exec.h"
+#include "helpers.h"
+#include <string.h>
+
+//#define DEBUG_HELPER
+#ifdef DEBUG_HELPER
+#define HELPER_LOG(x...) qemu_log(x)
+#else
+#define HELPER_LOG(x...)
+#endif
+
+/* raise an exception */
+void HELPER(exception)(uint32_t excp)
+{
+    HELPER_LOG("%s: exception %d\n", __FUNCTION__, excp);
+    env->exception_index = excp;
+    cpu_loop_exit();
+}
+
+/* and on array */
+uint32_t HELPER(nc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
+{
+    uint64_t dest = env->regs[b >> 4] + d1;
+    uint64_t src = env->regs[b & 0xf] + d2;
+    int i;
+    unsigned char x;
+    uint32_t cc = 0;
+    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
+    for (i = 0; i <= l; i++) {
+        x = ldub(dest + i) & ldub(src + i);
+        if (x) cc = 1;
+        stb(dest + i, x);
+    }
+    return cc;
+}
+
+/* xor on array */
+uint32_t HELPER(xc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
+{
+    uint64_t dest = env->regs[b >> 4] + d1;
+    uint64_t src = env->regs[b & 0xf] + d2;
+    int i;
+    unsigned char x;
+    uint32_t cc = 0;
+    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
+    for (i = 0; i <= l; i++) {
+        x = ldub(dest + i) ^ ldub(src + i);
+        if (x) cc = 1;
+        stb(dest + i, x);
+    }
+    return cc;
+}
+
+/* or on array */
+uint32_t HELPER(oc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
+{
+    uint64_t dest = env->regs[b >> 4] + d1;
+    uint64_t src = env->regs[b & 0xf] + d2;
+    int i;
+    unsigned char x;
+    uint32_t cc = 0;
+    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
+    for (i = 0; i <= l; i++) {
+        x = ldub(dest + i) | ldub(src + i);
+        if (x) cc = 1;
+        stb(dest + i, x);
+    }
+    return cc;
+}
+
+/* memcopy */
+void HELPER(mvc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
+{
+    uint64_t dest = env->regs[b >> 4] + d1;
+    uint64_t src = env->regs[b & 0xf] + d2;
+    int i;
+    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
+    for (i = 0; i <= l; i++) {
+        stb(dest + i, ldub(src + i));
+    }
+}
+
+/* compare unsigned byte arrays */
+uint32_t HELPER(clc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
+{
+    uint64_t s1 = env->regs[b >> 4] + d1;
+    uint64_t s2 = env->regs[b & 0xf] + d2;
+    int i;
+    unsigned char x,y;
+    uint32_t cc;
+    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
+    for (i = 0; i <= l; i++) {
+        x = ldub(s1 + i);
+        y = ldub(s2 + i);
+        HELPER_LOG("%02x (%c)/%02x (%c) ", x, x, y, y);
+        if (x < y) {
+            cc = 1;
+            goto done;
+        }
+        else if (x > y) {
+            cc = 2;
+            goto done;
+        }
+    }
+    cc = 0;
+done:
+    HELPER_LOG("\n");
+    return cc;
+}
+
+/* load multiple 64-bit registers from memory */
+void HELPER(lmg)(uint32_t r1, uint32_t r3, uint32_t b2, int d2)
+{
+    uint64_t src = env->regs[b2] + d2;
+    for (;;) {
+        env->regs[r1] = ldq(src);
+        src += 8;
+        if (r1 == r3) break;
+        r1 = (r1 + 1) & 15;
+    }
+}
+
+/* store multiple 64-bit registers to memory */
+void HELPER(stmg)(uint32_t r1, uint32_t r3, uint32_t b2, int d2)
+{
+    uint64_t dest = env->regs[b2] + d2;
+    HELPER_LOG("%s: r1 %d r3 %d\n", __FUNCTION__, r1, r3);
+    for (;;) {
+        HELPER_LOG("storing r%d in 0x%lx\n", r1, dest);
+        stq(dest, env->regs[r1]);
+        dest += 8;
+        if (r1 == r3) break;
+        r1 = (r1 + 1) & 15;
+    }
+}
+
+/* set condition code for signed 32-bit arithmetic */
+uint32_t HELPER(set_cc_s32)(int32_t v)
+{
+    if (v < 0) return 1;
+    else if (v > 0) return 2;
+    else return 0;
+}
+
+/* set condition code for signed 64-bit arithmetic */
+uint32_t HELPER(set_cc_s64)(int64_t v)
+{
+    if (v < 0) return 1;
+    else if (v > 0) return 2;
+    else return 0;
+}
+
+/* set condition code for signed 32-bit two's complement */
+uint32_t HELPER(set_cc_comp_s32)(int32_t v)
+{
+    if ((uint32_t)v == 0x80000000UL) return 3;
+    else if (v < 0) return 1;
+    else if (v > 0) return 2;
+    else return 0;
+}
+
+/* set condition code for signed 64-bit two's complement */
+uint32_t HELPER(set_cc_comp_s64)(int64_t v)
+{
+    if ((uint64_t)v == 0x8000000000000000ULL) return 3;
+    else if (v < 0) return 1;
+    else if (v > 0) return 2;
+    else return 0;
+}
+
+/* set nonzero/zero condition code for 32-bit logical op */
+uint32_t HELPER(set_cc_nz_u32)(uint32_t v)
+{
+    if (v) return 1;
+    else return 0;
+}
+
+/* set nonzero/zero condition code for 64-bit logical op */
+uint32_t HELPER(set_cc_nz_u64)(uint64_t v)
+{
+    if (v) return 1;
+    else return 0;
+}
+
+/* set condition code for insert character under mask insn */
+uint32_t HELPER(set_cc_icm)(uint32_t mask, uint32_t val)
+{
+    HELPER_LOG("%s: mask 0x%x val %d\n", __FUNCTION__, mask, val);
+    uint32_t cc;
+    if (!val || !mask) cc = 0;
+    else {
+        while (mask != 1) {
+            mask >>= 1;
+            val >>= 8;
+        }
+        if (val & 0x80) cc = 1;
+        else cc = 2;
+    }
+    return cc;
+}
+
+/* relative conditional branch */
+void HELPER(brc)(uint32_t cc, uint32_t mask, uint64_t pc, int32_t offset)
+{
+    if ( mask & ( 1 << (3 - cc) ) ) {
+        env->psw.addr = pc + offset;
+    }
+    else {
+        env->psw.addr = pc + 4;
+    }
+}
+
+/* branch relative on 64-bit count (condition is computed inline, this only
+   does the branch) */
+void HELPER(brctg)(uint64_t flag, uint64_t pc, int32_t offset)
+{
+    if (flag) {
+        env->psw.addr = pc + offset;
+    }
+    else {
+        env->psw.addr = pc + 4;
+    }
+    HELPER_LOG("%s: pc 0x%lx flag %ld psw.addr 0x%lx\n", __FUNCTION__, pc, flag,
+             env->psw.addr);
+}
+
+/* branch relative on 32-bit count (condition is computed inline, this only
+   does the branch) */
+void HELPER(brct)(uint32_t flag, uint64_t pc, int32_t offset)
+{
+    if (flag) {
+        env->psw.addr = pc + offset;
+    }
+    else {
+        env->psw.addr = pc + 4;
+    }
+    HELPER_LOG("%s: pc 0x%lx flag %d psw.addr 0x%lx\n", __FUNCTION__, pc, flag,
+             env->psw.addr);
+}
+
+/* relative conditional branch with long displacement */
+void HELPER(brcl)(uint32_t cc, uint32_t mask, uint64_t pc, int64_t offset)
+{
+    if ( mask & ( 1 << (3 - cc) ) ) {
+        env->psw.addr = pc + offset;
+    }
+    else {
+        env->psw.addr = pc + 6;
+    }
+    HELPER_LOG("%s: pc 0x%lx psw.addr 0x%lx\n", __FUNCTION__, pc, env->psw.addr);
+}
+
+/* conditional branch to register (register content is passed as target) */
+void HELPER(bcr)(uint32_t cc, uint32_t mask, uint64_t target, uint64_t pc)
+{
+    if ( mask & ( 1 << (3 - cc) ) ) {
+        env->psw.addr = target;
+    }
+    else {
+        env->psw.addr = pc + 2;
+    }
+}
+
+/* conditional branch to address (address is passed as target) */
+void HELPER(bc)(uint32_t cc, uint32_t mask, uint64_t target, uint64_t pc)
+{
+    if ( mask & ( 1 << (3 - cc) ) ) {
+        env->psw.addr = target;
+    }
+    else {
+        env->psw.addr = pc + 4;
+    }
+    HELPER_LOG("%s: pc 0x%lx psw.addr 0x%lx r2 0x%lx r5 0x%lx\n", __FUNCTION__,
+             pc, env->psw.addr, env->regs[2], env->regs[5]);
+}
+
+/* 64-bit unsigned comparison */
+uint32_t HELPER(cmp_u64)(uint64_t o1, uint64_t o2)
+{
+    if (o1 < o2) return 1;
+    else if (o1 > o2) return 2;
+    else return 0;
+}
+
+/* 32-bit unsigned comparison */
+uint32_t HELPER(cmp_u32)(uint32_t o1, uint32_t o2)
+{
+    HELPER_LOG("%s: o1 0x%x o2 0x%x\n", __FUNCTION__, o1, o2);
+    if (o1 < o2) return 1;
+    else if (o1 > o2) return 2;
+    else return 0;
+}
+
+/* 64-bit signed comparison */
+uint32_t HELPER(cmp_s64)(int64_t o1, int64_t o2)
+{
+    HELPER_LOG("%s: o1 %ld o2 %ld\n", __FUNCTION__, o1, o2);
+    if (o1 < o2) return 1;
+    else if (o1 > o2) return 2;
+    else return 0;
+}
+
+/* 32-bit signed comparison */
+uint32_t HELPER(cmp_s32)(int32_t o1, int32_t o2)
+{
+    if (o1 < o2) return 1;
+    else if (o1 > o2) return 2;
+    else return 0;
+}
+
+/* compare logical under mask */
+uint32_t HELPER(clm)(uint32_t r1, uint32_t mask, uint64_t addr)
+{
+    uint8_t r,d;
+    uint32_t cc;
+    HELPER_LOG("%s: r1 0x%x mask 0x%x addr 0x%lx\n",__FUNCTION__,r1,mask,addr);
+    cc = 0;
+    while (mask) {
+        if (mask & 8) {
+            d = ldub(addr);
+            r = (r1 & 0xff000000UL) >> 24;
+            HELPER_LOG("mask 0x%x %02x/%02x (0x%lx) ", mask, r, d, addr);
+            if (r < d) {
+                cc = 1;
+                break;
+            }
+            else if (r > d) {
+                cc = 2;
+                break;
+            }
+            addr++;
+        }
+        mask = (mask << 1) & 0xf;
+        r1 <<= 8;
+    }
+    HELPER_LOG("\n");
+    return cc;
+}
+
+/* store character under mask */
+void HELPER(stcm)(uint32_t r1, uint32_t mask, uint64_t addr)
+{
+    uint8_t r;
+    HELPER_LOG("%s: r1 0x%x mask 0x%x addr 0x%lx\n",__FUNCTION__,r1,mask,addr);
+    while (mask) {
+        if (mask & 8) {
+            r = (r1 & 0xff000000UL) >> 24;
+            stb(addr, r);
+            HELPER_LOG("mask 0x%x %02x (0x%lx) ", mask, r, addr);
+            addr++;
+        }
+        mask = (mask << 1) & 0xf;
+        r1 <<= 8;
+    }
+    HELPER_LOG("\n");
+}
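+
+/* Illustrative sketch, not part of the original patch: clm and stcm walk the
+   4-bit mask from its leftmost bit while shifting r1 left one byte per step,
+   so mask bit 8 pairs with the most significant byte of r1.  The function
+   below is hypothetical, writes to a plain buffer instead of guest memory,
+   and is compiled out. */
+#if 0
+static int stcm_example(uint32_t r1, uint32_t mask, uint8_t *out)
+{
+    int n = 0;
+    while (mask) {
+        if (mask & 8) {
+            out[n++] = (r1 & 0xff000000UL) >> 24;
+        }
+        mask = (mask << 1) & 0xf;
+        r1 <<= 8;
+    }
+    return n;
+}
+/* stcm_example(0x11223344, 0xa, buf) returns 2 with buf = { 0x11, 0x33 }:
+   mask 1010 selects the first and third byte of r1. */
+#endif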
+
+/* 64/64 -> 128 unsigned multiplication */
+void HELPER(mlg)(uint32_t r1, uint64_t v2)
+{
+    __uint128_t res = (__uint128_t)env->regs[r1 + 1];
+    res *= (__uint128_t)v2;
+    env->regs[r1] = (uint64_t)(res >> 64);
+    env->regs[r1 + 1] = (uint64_t)res;
+}
+
+/* 128 -> 64/64 unsigned division */
+void HELPER(dlg)(uint32_t r1, uint64_t v2)
+{
+    __uint128_t dividend = (((__uint128_t)env->regs[r1]) << 64) |
+                           (env->regs[r1+1]);
+    uint64_t divisor = v2;
+    __uint128_t quotient = dividend / divisor;
+    env->regs[r1+1] = quotient;
+    __uint128_t remainder = dividend % divisor;
+    env->regs[r1] = remainder;
+    HELPER_LOG("%s: dividend 0x%016lx%016lx divisor 0x%lx quotient 0x%lx rem 0x%lx\n",
+               __FUNCTION__, (uint64_t)(dividend >> 64), (uint64_t)dividend, divisor, (uint64_t)quotient,
+               (uint64_t)remainder);
+}
+
+/* set condition code for 64-bit signed addition */
+uint32_t HELPER(set_cc_add64)(int64_t a1, int64_t a2, int64_t ar)
+{
+    if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) {
+        return 3; /* overflow */
+    }
+    else {
+        if (ar < 0) return 1;
+        else if (ar > 0) return 2;
+        else return 0;
+    }
+}
+
+/* set condition code for 64-bit unsigned addition */
+uint32_t HELPER(set_cc_addu64)(uint64_t a1, uint64_t a2, uint64_t ar)
+{
+    if (ar == 0) {
+        if (a1) return 2;
+        else return 0;
+    }
+    else {
+        if (ar < a1 || ar < a2) {
+          return 3;
+        }
+        else {
+          return 1;
+        }
+    }
+}
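+
+/* Worked example for the unsigned-add condition codes (illustrative note,
+   not part of the original patch).  A carry shows up as a result that
+   wrapped below one of the operands:
+     a1 = 0,                  a2 = 0, ar = 0 -> cc 0 (zero, no carry)
+     a1 = 1,                  a2 = 2, ar = 3 -> cc 1 (nonzero, no carry)
+     a1 = 0xffffffffffffffff, a2 = 1, ar = 0 -> cc 2 (zero, carry)
+     a1 = 0xffffffffffffffff, a2 = 2, ar = 1 -> cc 3 (nonzero, carry) */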
+
+/* set condition code for 32-bit signed addition */
+uint32_t HELPER(set_cc_add32)(int32_t a1, int32_t a2, int32_t ar)
+{
+    if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) {
+        return 3; /* overflow */
+    }
+    else {
+        if (ar < 0) return 1;
+        else if (ar > 0) return 2;
+        else return 0;
+    }
+}
+
+/* set condition code for 32-bit unsigned addition */
+uint32_t HELPER(set_cc_addu32)(uint32_t a1, uint32_t a2, uint32_t ar)
+{
+    if (ar == 0) {
+        if (a1) return 2;
+        else return 0;
+    }
+    else {
+        if (ar < a1 || ar < a2) {
+          return 3;
+        }
+        else {
+          return 1;
+        }
+    }
+}
+
+/* set condition code for 64-bit signed subtraction */
+uint32_t HELPER(set_cc_sub64)(int64_t s1, int64_t s2, int64_t sr)
+{
+    if ((s1 > 0 && s2 < 0 && sr < 0) || (s1 < 0 && s2 > 0 && sr > 0)) {
+        return 3; /* overflow */
+    }
+    else {
+        if (sr < 0) return 1;
+        else if (sr > 0) return 2;
+        else return 0;
+    }
+}
+
+/* set condition code for 32-bit signed subtraction */
+uint32_t HELPER(set_cc_sub32)(int32_t s1, int32_t s2, int32_t sr)
+{
+    if ((s1 > 0 && s2 < 0 && sr < 0) || (s1 < 0 && s2 > 0 && sr > 0)) {
+        return 3; /* overflow */
+    }
+    else {
+        if (sr < 0) return 1;
+        else if (sr > 0) return 2;
+        else return 0;
+    }
+}
+
+/* set condition code for 32-bit unsigned subtraction */
+uint32_t HELPER(set_cc_subu32)(uint32_t s1, uint32_t s2, uint32_t sr)
+{
+    if (sr == 0) return 2;
+    else {
+        if (s2 > s1) return 1;
+        else return 3;
+    }
+}
+
+/* set condition code for 64-bit unsigned subtraction */
+uint32_t HELPER(set_cc_subu64)(uint64_t s1, uint64_t s2, uint64_t sr)
+{
+    if (sr == 0) return 2;
+    else {
+        if (s2 > s1) return 1;
+        else return 3;
+    }
+}
+
+/* search string (c is byte to search, r2 is string, r1 end of string) */
+uint32_t HELPER(srst)(uint32_t c, uint32_t r1, uint32_t r2)
+{
+    HELPER_LOG("%s: c %d *r1 0x%lx *r2 0x%lx\n", __FUNCTION__, c, env->regs[r1],
+             env->regs[r2]);
+    uint64_t i;
+    uint32_t cc;
+    for (i = env->regs[r2]; i != env->regs[r1]; i++) {
+        if (ldub(i) == c) {
+            env->regs[r1] = i;
+            cc = 1;
+            return cc;
+        }
+    }
+    cc = 2;
+    return cc;
+}
+
+/* unsigned string compare (c is string terminator) */
+uint32_t HELPER(clst)(uint32_t c, uint32_t r1, uint32_t r2)
+{
+    uint64_t s1 = env->regs[r1];
+    uint64_t s2 = env->regs[r2];
+    uint8_t v1, v2;
+    uint32_t cc;
+    c = c & 0xff;
+#ifdef CONFIG_USER_ONLY
+    if (!c) {
+        HELPER_LOG("%s: comparing '%s' and '%s'\n",
+                   __FUNCTION__, (char*)s1, (char*)s2);
+    }
+#endif
+    for (;;) {
+        v1 = ldub(s1);
+        v2 = ldub(s2);
+        if (v1 == c || v2 == c) break;
+        if (v1 != v2) break;
+        s1++; s2++;
+    }
+    
+    if (v1 == v2) cc = 0;
+    else {
+        if (v1 < v2) cc = 1;
+        else cc = 2;
+        env->regs[r1] = s1;
+        env->regs[r2] = s2;
+    }
+    return cc;
+}
+
+/* string copy (c is string terminator) */
+uint32_t HELPER(mvst)(uint32_t c, uint32_t r1, uint32_t r2)
+{
+    uint64_t dest = env->regs[r1];
+    uint64_t src = env->regs[r2];
+    uint8_t v;
+    c = c & 0xff;
+#ifdef CONFIG_USER_ONLY
+    if (!c) {
+        HELPER_LOG("%s: copying '%s' to 0x%lx\n", __FUNCTION__, (char*)src, dest);
+    }
+#endif
+    for (;;) {
+        v = ldub(src);
+        stb(dest, v);
+        if (v == c) break;
+        src++; dest++;
+    }
+    env->regs[r1] = dest;
+    return 1;
+}
+
+/* compare and swap 64-bit */
+uint32_t HELPER(csg)(uint32_t r1, uint64_t a2, uint32_t r3)
+{
+    /* FIXME: locking? */
+    uint32_t cc;
+    uint64_t v2 = ldq(a2);
+    if (env->regs[r1] == v2) {
+        cc = 0;
+        stq(a2, env->regs[r3]);
+    }
+    else {
+        cc = 1;
+        env->regs[r1] = v2;
+    }
+    return cc;
+}
+
+/* compare double and swap 64-bit */
+uint32_t HELPER(cdsg)(uint32_t r1, uint64_t a2, uint32_t r3)
+{
+    /* FIXME: locking? */
+    uint32_t cc;
+    __uint128_t v2 = (((__uint128_t)ldq(a2)) << 64) | (__uint128_t)ldq(a2 + 8);
+    __uint128_t v1 = (((__uint128_t)env->regs[r1]) << 64) | (__uint128_t)env->regs[r1 + 1];
+    if (v1 == v2) {
+        cc = 0;
+        stq(a2, env->regs[r3]);
+        stq(a2 + 8, env->regs[r3 + 1]);
+    }
+    else {
+        cc = 1;
+        env->regs[r1] = v2 >> 64;
+        env->regs[r1 + 1] = v2 & 0xffffffffffffffffULL;
+    }
+    return cc;
+}
+
+/* compare and swap 32-bit */
+uint32_t HELPER(cs)(uint32_t r1, uint64_t a2, uint32_t r3)
+{
+    /* FIXME: locking? */
+    uint32_t cc;
+    HELPER_LOG("%s: r1 %d a2 0x%lx r3 %d\n", __FUNCTION__, r1, a2, r3);
+    uint32_t v2 = ldl(a2);
+    if (((uint32_t)env->regs[r1]) == v2) {
+        cc = 0;
+        stl(a2, (uint32_t)env->regs[r3]);
+    }
+    else {
+        cc = 1;
+        env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | v2;
+    }
+    return cc;
+}
+
+/* execute instruction
+   this instruction executes an insn modified with the contents of r1
+   it does not change the executed instruction in memory
+   it does not change the program counter
+   in other words: tricky...
+   currently implemented by interpreting the cases it is most commonly used in
+ */
+uint32_t HELPER(ex)(uint32_t cc, uint64_t v1, uint64_t addr, uint64_t ret)
+{
+    uint16_t insn = lduw(addr);
+    HELPER_LOG("%s: v1 0x%lx addr 0x%lx insn 0x%x\n", __FUNCTION__, v1, addr,
+             insn);
+    if ((insn & 0xf0ff) == 0xd000) {
+        uint32_t l, insn2, b, d1, d2;
+        l = v1 & 0xff;
+        insn2 = ldl(addr + 2);
+        b = (((insn2 >> 28) & 0xf) << 4) | ((insn2 >> 12) & 0xf);
+        d1 = (insn2 >> 16) & 0xfff;
+        d2 = insn2 & 0xfff;
+        switch (insn & 0xf00) {
+        case 0x200: helper_mvc(l, b, d1, d2); return cc; break;
+        case 0x500: return helper_clc(l, b, d1, d2); break;
+        case 0x700: return helper_xc(l, b, d1, d2); break;
+        default: helper_exception(23); break;
+        }
+    }
+    else if ((insn & 0xff00) == 0x0a00) {	/* supervisor call */
+        HELPER_LOG("%s: svc %ld via execute\n", __FUNCTION__, (insn|v1) & 0xff);
+        env->psw.addr = ret;
+        helper_exception(EXCP_EXECUTE_SVC + ((insn | v1) & 0xff));
+    }
+    else {
+        helper_exception(23);
+    }
+    return cc;
+}
+
+/* set condition code for test under mask */
+uint32_t HELPER(tm)(uint32_t val, uint32_t mask)
+{
+    HELPER_LOG("%s: val 0x%x mask 0x%x\n", __FUNCTION__, val, mask);
+    uint16_t r = val & mask;
+    if (r == 0) return 0;
+    else if (r == mask) return 3;
+    else return 1;
+}
+
+/* set condition code for test under mask, 16-bit mask variant
+   (in the mixed case the leftmost selected bit picks cc 1 or 2) */
+uint32_t HELPER(tmxx)(uint64_t val, uint32_t mask)
+{
+    uint16_t r = val & mask;
+    HELPER_LOG("%s: val 0x%lx mask 0x%x r 0x%x\n", __FUNCTION__, val, mask, r);
+    if (r == 0) return 0;
+    else if (r == mask) return 3;
+    else {
+        while (!(mask & 0x8000)) {
+            mask <<= 1;
+            val <<= 1;
+        }
+        if (val & 0x8000) return 2;
+        else return 1;
+    }
+}
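+
+/* Worked example for tmxx (illustrative note, not part of the original
+   patch), using mask 0x00c0:
+     val = 0x0000 -> cc 0 (all selected bits zero)
+     val = 0x00c0 -> cc 3 (all selected bits one)
+     val = 0x0040 -> cc 1 (mixed, leftmost selected bit zero)
+     val = 0x0080 -> cc 2 (mixed, leftmost selected bit one)
+   In the mixed case both values are shifted left until the leftmost mask
+   bit reaches 0x8000, here eight times, and that bit of val is tested. */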
+
+/* absolute value 32-bit */
+uint32_t HELPER(abs_i32)(uint32_t reg, int32_t val)
+{
+    uint32_t cc;
+    /* compare as 32-bit: on LP64 hosts val would sign-extend and never match */
+    if ((uint32_t)val == 0x80000000U) cc = 3;
+    else if (val) cc = 1;
+    else cc = 0;
+
+    if (val < 0) {
+        env->regs[reg] = -val;
+    }
+    else {
+        env->regs[reg] = val;
+    }
+    return cc;
+}
+
+/* negative absolute value 32-bit */
+uint32_t HELPER(nabs_i32)(uint32_t reg, int32_t val)
+{
+    uint32_t cc;
+    if (val) cc = 1;
+    else cc = 0;
+    
+    if (val < 0) {
+        /* cast keeps a negative val from sign-extending into the upper half */
+        env->regs[reg] = (env->regs[reg] & 0xffffffff00000000ULL) | (uint32_t)val;
+    }
+    else {
+        env->regs[reg] = (env->regs[reg] & 0xffffffff00000000ULL) | ((uint32_t)-val);
+    }
+    return cc;
+}
+
+/* absolute value 64-bit */
+uint32_t HELPER(abs_i64)(uint32_t reg, int64_t val)
+{
+    uint32_t cc;
+    if (val == 0x8000000000000000ULL) cc = 3;
+    else if (val) cc = 1;
+    else cc = 0;
+    
+    if (val < 0) {
+        env->regs[reg] = -val;
+    }
+    else {
+        env->regs[reg] = val;
+    }
+    return cc;
+}
+
+/* negative absolute value 64-bit */
+uint32_t HELPER(nabs_i64)(uint32_t reg, int64_t val)
+{
+    uint32_t cc;
+    if (val) cc = 1;
+    else cc = 0;
+
+    if (val < 0) {
+        env->regs[reg] = val;
+    }
+    else {
+        env->regs[reg] = -val;
+    }
+    return cc;
+}
+
+/* add with carry 32-bit unsigned */
+uint32_t HELPER(addc_u32)(uint32_t cc, uint32_t r1, uint32_t v2)
+{
+    uint32_t res;
+    uint32_t v1 = env->regs[r1] & 0xffffffffUL;
+    res = v1 + v2;
+    if (cc & 2) res++;
+
+    if (res == 0) {
+        if (v1) cc = 2;
+        else cc = 0;
+    }
+    else {
+        if (res < v1 || res < v2)
+          cc = 3;
+        else
+          cc = 1;
+    }
+    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | res;
+    return cc;
+}
+
+/* CC for add with carry 64-bit unsigned (same logic as set_cc_addu64) */
+uint32_t HELPER(set_cc_addc_u64)(uint64_t v1, uint64_t v2, uint64_t res)
+{
+    uint32_t cc;
+    if (res == 0) {
+        if (v1) cc = 2;
+        else cc = 0;
+    }
+    else {
+        if (res < v1 || res < v2) {
+          cc = 3;
+        }
+        else {
+          cc = 1;
+        }
+    }
+    return cc;
+}
+
+/* store character under mask high
+   operates on the upper half of r1 */
+uint32_t HELPER(stcmh)(uint32_t r1, uint64_t address, uint32_t mask)
+{
+    int pos = 56; /* top of the upper half of r1 */
+    
+    while (mask) {
+        if (mask & 8) {
+            stb(address, (env->regs[r1] >> pos) & 0xff);
+            address++;
+        }
+        mask = (mask << 1) & 0xf;
+        pos -= 8;
+    }
+    return 0;
+}
+
+/* insert character under mask high
+   same as icm, but operates on the upper half of r1 */
+uint32_t HELPER(icmh)(uint32_t r1, uint64_t address, uint32_t mask)
+{
+    int pos = 56; /* top of the upper half of r1 */
+    uint64_t rmask = 0xff00000000000000ULL;
+    uint8_t val = 0;
+    int ccd = 0;
+    uint32_t cc;
+    
+    cc = 0;
+    
+    while (mask) {
+        if (mask & 8) {
+            env->regs[r1] &= ~rmask;
+            val = ldub(address);
+            if ((val & 0x80) && !ccd) cc = 1;
+            ccd = 1;
+            if (val && cc == 0) cc = 2;
+            env->regs[r1] |= (uint64_t)val << pos;
+            address++;
+        }
+        mask = (mask << 1) & 0xf;
+        pos -= 8;
+        rmask >>= 8;
+    }
+    return cc;
+}
+
+/* insert psw mask and condition code into r1 */
+void HELPER(ipm)(uint32_t cc, uint32_t r1)
+{
+    uint64_t r = env->regs[r1];
+    r &= 0xffffffff00ffffffULL;
+    /* cc goes to bits 28-29, the program mask to bits 24-27 */
+    r |= (cc << 28) | ( ((env->psw.mask >> 40) & 0xf) << 24 );
+    env->regs[r1] = r;
+    HELPER_LOG("%s: cc %d psw.mask 0x%lx r1 0x%lx\n", __FUNCTION__, cc, env->psw.mask, r);
+}
+
+/* store access registers r1 to r3 in memory at a2 */
+void HELPER(stam)(uint32_t r1, uint64_t a2, uint32_t r3)
+{
+    int i;
+    for (i = r1; i != ((r3 + 1) & 15); i = (i + 1) & 15) {
+        stl(a2, env->aregs[i]);
+        a2 += 4;
+    }
+}
+
+/* move long extended
+   another memcopy insn with more bells and whistles */
+uint32_t HELPER(mvcle)(uint32_t r1, uint64_t a2, uint32_t r3)
+{
+    uint64_t destlen = env->regs[r1 + 1];
+    uint64_t dest = env->regs[r1];
+    uint64_t srclen = env->regs[r3 + 1];
+    uint64_t src = env->regs[r3];
+    uint8_t pad = a2 & 0xff;
+    uint8_t v;
+    uint32_t cc;
+    if (destlen == srclen) cc = 0;
+    else if (destlen < srclen) cc = 1;
+    else cc = 2;
+    if (srclen > destlen) srclen = destlen;
+    for(;destlen && srclen;src++,dest++,destlen--,srclen--) {
+        v = ldub(src);
+        stb(dest, v);
+    }
+    for(;destlen;dest++,destlen--) {
+        stb(dest, pad);
+    }
+    env->regs[r1 + 1] = destlen;
+    env->regs[r3 + 1] -= src - env->regs[r3]; /* can't use srclen here,
+                                                 we trunc'ed it */
+    env->regs[r1] = dest;
+    env->regs[r3] = src;
+    
+    return cc;
+}
+
+/* compare logical long extended
+   memcompare insn with padding */
+uint32_t HELPER(clcle)(uint32_t r1, uint64_t a2, uint32_t r3)
+{
+    uint64_t destlen = env->regs[r1 + 1];
+    uint64_t dest = env->regs[r1];
+    uint64_t srclen = env->regs[r3 + 1];
+    uint64_t src = env->regs[r3];
+    uint8_t pad = a2 & 0xff;
+    uint8_t v1 = 0,v2 = 0;
+    uint32_t cc = 0;
+    if (!(destlen || srclen)) return cc;
+    if (srclen > destlen) srclen = destlen;
+    for(;destlen || srclen;src++,dest++,destlen--,srclen--) {
+        if (srclen) v1 = ldub(src);
+        else v1 = pad;
+        if (destlen) v2 = ldub(dest);
+        else v2 = pad;
+        if (v1 != v2) break;
+    }
+
+    env->regs[r1 + 1] = destlen;
+    env->regs[r3 + 1] -= src - env->regs[r3]; /* can't use srclen here,
+                                                 we trunc'ed it */
+    env->regs[r1] = dest;
+    env->regs[r3] = src;
+    
+    if (v1 < v2) cc = 1;
+    else if (v1 > v2) cc = 2;
+    
+    return cc;
+}
+
+/* subtract unsigned v2 from v1 with borrow (32-bit) */
+uint32_t HELPER(slb)(uint32_t cc, uint32_t r1, uint32_t v1, uint32_t v2)
+{
+    uint32_t res = v1 + (~v2) + (cc >> 1);
+    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | res;
+    if (cc & 2) { /* borrow */
+        if (v1) return 1;
+        else return 0;
+    }
+    else {
+        if (v1) return 3;
+        else return 2;
+    }
+}
+
+/* subtract unsigned v2 from v1 with borrow (64-bit) */
+uint32_t HELPER(slbg)(uint32_t cc, uint32_t r1, uint64_t v1, uint64_t v2)
+{
+    uint64_t res = v1 + (~v2) + (cc >> 1);
+    env->regs[r1] = res;
+    if (cc & 2) { /* borrow */
+        if (v1) return 1;
+        else return 0;
+    }
+    else {
+        if (v1) return 3;
+        else return 2;
+    }
+}
+
+/* union used for splitting/joining 128-bit floats to/from 64-bit FP regs */
+typedef union {
+    struct {
+#ifdef WORDS_BIGENDIAN
+        uint64_t h;
+        uint64_t l;
+#else
+        uint64_t l;
+        uint64_t h;
+#endif
+    };
+    float128 x;
+} FP128;
+
+/* condition codes for binary FP ops */
+static uint32_t set_cc_f32(float32 v1, float32 v2)
+{
+    if (float32_is_nan(v1) || float32_is_nan(v2)) return 3;
+    else if (float32_eq(v1, v2, &env->fpu_status)) return 0;
+    else if (float32_lt(v1, v2, &env->fpu_status)) return 1;
+    else return 2;
+}
+
+static uint32_t set_cc_f64(float64 v1, float64 v2)
+{
+    if (float64_is_nan(v1) || float64_is_nan(v2)) return 3;
+    else if (float64_eq(v1, v2, &env->fpu_status)) return 0;
+    else if (float64_lt(v1, v2, &env->fpu_status)) return 1;
+    else return 2;
+}
+
+/* condition codes for unary FP ops */
+static uint32_t set_cc_nz_f32(float32 v)
+{
+    if (float32_is_nan(v)) return 3;
+    else if (float32_is_zero(v)) return 0;
+    else if (float32_is_neg(v)) return 1;
+    else return 2;
+}
+
+static uint32_t set_cc_nz_f64(float64 v)
+{
+    if (float64_is_nan(v)) return 3;
+    else if (float64_is_zero(v)) return 0;
+    else if (float64_is_neg(v)) return 1;
+    else return 2;
+}
+
+static uint32_t set_cc_nz_f128(float128 v)
+{
+    if (float128_is_nan(v)) return 3;
+    else if (float128_is_zero(v)) return 0;
+    else if (float128_is_neg(v)) return 1;
+    else return 2;
+}
+
+/* convert 32-bit int to 64-bit float */
+void HELPER(cdfbr)(uint32_t f1, int32_t v2)
+{
+    HELPER_LOG("%s: converting %d to f%d\n", __FUNCTION__, v2, f1);
+    env->fregs[f1].d = int32_to_float64(v2, &env->fpu_status);
+}
+
+/* convert 32-bit int to 128-bit float */
+void HELPER(cxfbr)(uint32_t f1, int32_t v2)
+{
+    FP128 v1;
+    v1.x = int32_to_float128(v2, &env->fpu_status);
+    env->fregs[f1].i = v1.h;
+    env->fregs[f1 + 2].i = v1.l;
+}
+
+/* convert 64-bit int to 32-bit float */
+void HELPER(cegbr)(uint32_t f1, int64_t v2)
+{
+    HELPER_LOG("%s: converting %ld to f%d\n", __FUNCTION__, v2, f1);
+    env->fregs[f1].e = int64_to_float32(v2, &env->fpu_status);
+}
+
+/* convert 64-bit int to 64-bit float */
+void HELPER(cdgbr)(uint32_t f1, int64_t v2)
+{
+    HELPER_LOG("%s: converting %ld to f%d\n", __FUNCTION__, v2, f1);
+    env->fregs[f1].d = int64_to_float64(v2, &env->fpu_status);
+}
+
+/* convert 64-bit int to 128-bit float */
+void HELPER(cxgbr)(uint32_t f1, int64_t v2)
+{
+    FP128 x1;
+    x1.x = int64_to_float128(v2, &env->fpu_status);
+    HELPER_LOG("%s: converted %ld to 0x%lx and 0x%lx\n", __FUNCTION__, v2, x1.h, x1.l);
+    env->fregs[f1].i = x1.h;
+    env->fregs[f1 + 2].i = x1.l;
+}
+
+/* convert 32-bit int to 32-bit float */
+void HELPER(cefbr)(uint32_t f1, int32_t v2)
+{
+    env->fregs[f1].e = int32_to_float32(v2, &env->fpu_status);
+    HELPER_LOG("%s: converting %d to 0x%d in f%d\n", __FUNCTION__, v2, env->fregs[f1].e, f1);
+}
+
+/* 32-bit FP addition RR */
+uint32_t HELPER(aebr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].e = float32_add(env->fregs[f1].e, env->fregs[f2].e, &env->fpu_status);
+    HELPER_LOG("%s: adding 0x%d resulting in 0x%d in f%d\n", __FUNCTION__, env->fregs[f2].e, env->fregs[f1].e, f1);
+    return set_cc_nz_f32(env->fregs[f1].e);
+}
+
+/* 64-bit FP addition RR */
+uint32_t HELPER(adbr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].d = float64_add(env->fregs[f1].d, env->fregs[f2].d, &env->fpu_status);
+    HELPER_LOG("%s: adding 0x%ld resulting in 0x%ld in f%d\n", __FUNCTION__, env->fregs[f2].d, env->fregs[f1].d, f1);
+    return set_cc_nz_f64(env->fregs[f1].d);
+}
+
+/* 32-bit FP subtraction RR */
+uint32_t HELPER(sebr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].e = float32_sub(env->fregs[f1].e, env->fregs[f2].e, &env->fpu_status);
+    HELPER_LOG("%s: subtracting 0x%d resulting in 0x%d in f%d\n", __FUNCTION__, env->fregs[f2].e, env->fregs[f1].e, f1);
+    return set_cc_nz_f32(env->fregs[f1].e);
+}
+
+/* 64-bit FP subtraction RR */
+uint32_t HELPER(sdbr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].d = float64_sub(env->fregs[f1].d, env->fregs[f2].d, &env->fpu_status);
+    HELPER_LOG("%s: subtracting 0x%ld resulting in 0x%ld in f%d\n", __FUNCTION__, env->fregs[f2].d, env->fregs[f1].d, f1);
+    return set_cc_nz_f64(env->fregs[f1].d);
+}
+
+/* 32-bit FP division RR */
+void HELPER(debr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].e = float32_div(env->fregs[f1].e, env->fregs[f2].e, &env->fpu_status);
+}
+
+/* 128-bit FP division RR */
+void HELPER(dxbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 v1;
+    v1.h = env->fregs[f1].i;
+    v1.l = env->fregs[f1 + 2].i;
+    FP128 v2;
+    v2.h = env->fregs[f2].i;
+    v2.l = env->fregs[f2 + 2].i;
+    FP128 res;
+    res.x = float128_div(v1.x, v2.x, &env->fpu_status);
+    env->fregs[f1].i = res.h;
+    env->fregs[f1 + 2].i = res.l;
+}
+
+/* 64-bit FP multiplication RR */
+void HELPER(mdbr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].d = float64_mul(env->fregs[f1].d, env->fregs[f2].d, &env->fpu_status);
+}
+
+/* 128-bit FP multiplication RR */
+void HELPER(mxbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 v1;
+    v1.h = env->fregs[f1].i;
+    v1.l = env->fregs[f1 + 2].i;
+    FP128 v2;
+    v2.h = env->fregs[f2].i;
+    v2.l = env->fregs[f2 + 2].i;
+    FP128 res;
+    res.x = float128_mul(v1.x, v2.x, &env->fpu_status);
+    //HELPER_LOG("%s: 0x%ld * 0x%ld = 0x%ld\n", __FUNCTION__, v1.x, v2.x, res.x);
+    env->fregs[f1].i = res.h;
+    env->fregs[f1 + 2].i = res.l;
+}
+
+/* convert 32-bit float to 64-bit float */
+void HELPER(ldebr)(uint32_t r1, uint32_t r2)
+{
+    env->fregs[r1].d = float32_to_float64(env->fregs[r2].e, &env->fpu_status);
+}
+
+/* convert 128-bit float to 64-bit float */
+void HELPER(ldxbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 x2;
+    x2.h = env->fregs[f2].i;
+    x2.l = env->fregs[f2 + 2].i;
+    //HELPER_LOG("%s: converted %llf ", __FUNCTION__, x2.x);
+    env->fregs[f1].d = float128_to_float64(x2.x, &env->fpu_status);
+    HELPER_LOG("%s: to 0x%ld\n", __FUNCTION__, env->fregs[f1].d);
+}
+
+/* convert 64-bit float to 128-bit float */
+void HELPER(lxdbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 res;
+    res.x = float64_to_float128(env->fregs[f2].d, &env->fpu_status);
+    env->fregs[f1].i = res.h;
+    env->fregs[f1 + 2].i = res.l;
+}
+
+/* convert 64-bit float to 32-bit float */
+void HELPER(ledbr)(uint32_t f1, uint32_t f2)
+{
+    float64 d2 = env->fregs[f2].d;
+    env->fregs[f1].e = float64_to_float32(d2, &env->fpu_status);
+}
+
+/* convert 128-bit float to 32-bit float */
+void HELPER(lexbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 x2;
+    x2.h = env->fregs[f2].i;
+    x2.l = env->fregs[f2 + 2].i;
+    //HELPER_LOG("%s: converted %llf ", __FUNCTION__, x2.x);
+    env->fregs[f1].e = float128_to_float32(x2.x, &env->fpu_status);
+    HELPER_LOG("%s: to 0x%d\n", __FUNCTION__, env->fregs[f1].e);
+}
+
+/* absolute value of 32-bit float */
+uint32_t HELPER(lpebr)(uint32_t f1, uint32_t f2)
+{
+    float32 v1;
+    float32 v2 = env->fregs[f2].e;
+    if (float32_is_neg(v2)) {
+        v1 = float32_abs(v2);
+    }
+    else {
+        v1 = v2;
+    }
+    env->fregs[f1].e = v1;
+    return set_cc_nz_f32(v1);
+}
+
+/* absolute value of 64-bit float */
+uint32_t HELPER(lpdbr)(uint32_t f1, uint32_t f2)
+{
+    float64 v1;
+    float64 v2 = env->fregs[f2].d;
+    if (float64_is_neg(v2)) {
+        v1 = float64_abs(v2);
+    }
+    else {
+        v1 = v2;
+    }
+    env->fregs[f1].d = v1;
+    return set_cc_nz_f64(v1);
+}
+
+/* absolute value of 128-bit float */
+uint32_t HELPER(lpxbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 v1;
+    FP128 v2;
+    v2.h = env->fregs[f2].i;
+    v2.l = env->fregs[f2 + 2].i;
+    if (float128_is_neg(v2.x)) {
+        v1.x = float128_abs(v2.x);
+    }
+    else {
+        v1 = v2;
+    }
+    env->fregs[f1].i = v1.h;
+    env->fregs[f1 + 2].i = v1.l;
+    return set_cc_nz_f128(v1.x);
+}
+
+/* load and test 64-bit float */
+uint32_t HELPER(ltdbr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].d = env->fregs[f2].d;
+    return set_cc_nz_f64(env->fregs[f1].d);
+}
+
+/* load and test 32-bit float */
+uint32_t HELPER(ltebr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].e = env->fregs[f2].e;
+    return set_cc_nz_f32(env->fregs[f1].e);
+}
+
+/* load and test 128-bit float */
+uint32_t HELPER(ltxbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 x;
+    x.h = env->fregs[f2].i;
+    x.l = env->fregs[f2 + 2].i;
+    env->fregs[f1].i = x.h;
+    env->fregs[f1 + 2].i = x.l;
+    return set_cc_nz_f128(x.x);
+}
+
+/* load complement (negate) 32-bit float */
+uint32_t HELPER(lcebr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].e = float32_sub(float32_zero, env->fregs[f2].e, &env->fpu_status);
+    return set_cc_nz_f32(env->fregs[f1].e);
+}
+
+/* load complement (negate) 64-bit float */
+uint32_t HELPER(lcdbr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].d = float64_sub(float64_zero, env->fregs[f2].d, &env->fpu_status);
+    return set_cc_nz_f64(env->fregs[f1].d);
+}
+
+/* load complement (negate) 128-bit float */
+uint32_t HELPER(lcxbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 x1, x2;
+    x2.h = env->fregs[f2].i;
+    x2.l = env->fregs[f2 + 2].i;
+    x1.x = float128_sub(float64_to_float128(float64_zero, &env->fpu_status), x2.x, &env->fpu_status);
+    env->fregs[f1].i = x1.h;
+    env->fregs[f1 + 2].i = x1.l;
+    return set_cc_nz_f128(x1.x);
+}
+
+/* 32-bit FP compare RM */
+uint32_t HELPER(ceb)(uint32_t f1, uint64_t a2)
+{
+    float32 v1 = env->fregs[f1].e;
+    union {
+        float32 e;
+        uint32_t i;
+    } v2;
+    v2.i = ldl(a2);
+    HELPER_LOG("%s: comparing 0x%d from f%d and 0x%d\n", __FUNCTION__, v1, f1, v2.e);
+    return set_cc_f32(v1, v2.e);
+}
+
+/* 32-bit FP addition RM */
+uint32_t HELPER(aeb)(uint32_t f1, uint64_t a2)
+{
+    float32 v1 = env->fregs[f1].e;
+    union {
+        float32 e;
+        uint32_t i;
+    } v2;
+    v2.i = ldl(a2);
+    HELPER_LOG("%s: adding 0x%d from f%d and 0x%d\n", __FUNCTION__, v1, f1, v2.e);
+    env->fregs[f1].e = float32_add(v1, v2.e, &env->fpu_status);
+    return set_cc_nz_f32(env->fregs[f1].e);
+}
+
+/* 32-bit FP division RM */
+void HELPER(deb)(uint32_t f1, uint64_t a2)
+{
+    float32 v1 = env->fregs[f1].e;
+    union {
+        float32 e;
+        uint32_t i;
+    } v2;
+    v2.i = ldl(a2);
+    HELPER_LOG("%s: dividing 0x%d from f%d by 0x%d\n", __FUNCTION__, v1, f1, v2.e);
+    env->fregs[f1].e = float32_div(v1, v2.e, &env->fpu_status);
+}
+
+/* 32-bit FP multiplication RM */
+void HELPER(meeb)(uint32_t f1, uint64_t a2)
+{
+    float32 v1 = env->fregs[f1].e;
+    union {
+        float32 e;
+        uint32_t i;
+    } v2;
+    v2.i = ldl(a2);
+    HELPER_LOG("%s: multiplying 0x%d from f%d and 0x%d\n", __FUNCTION__, v1, f1, v2.e);
+    env->fregs[f1].e = float32_mul(v1, v2.e, &env->fpu_status);
+}
+
+/* 32-bit FP compare RR */
+uint32_t HELPER(cebr)(uint32_t f1, uint32_t f2)
+{
+    float32 v1 = env->fregs[f1].e;
+    float32 v2 = env->fregs[f2].e;
+    HELPER_LOG("%s: comparing 0x%d from f%d and 0x%d\n", __FUNCTION__, v1, f1, v2);
+    return set_cc_f32(v1, v2);
+}
+
+/* 64-bit FP compare RR */
+uint32_t HELPER(cdbr)(uint32_t f1, uint32_t f2)
+{
+    float64 v1 = env->fregs[f1].d;
+    float64 v2 = env->fregs[f2].d;
+    HELPER_LOG("%s: comparing 0x%ld from f%d and 0x%ld\n", __FUNCTION__, v1, f1, v2);
+    return set_cc_f64(v1, v2);
+}
+
+/* 128-bit FP compare RR */
+uint32_t HELPER(cxbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 v1;
+    v1.h = env->fregs[f1].i;
+    v1.l = env->fregs[f1 + 2].i;
+    FP128 v2;
+    v2.h = env->fregs[f2].i;
+    v2.l = env->fregs[f2 + 2].i;
+    //HELPER_LOG("%s: comparing %llf from f%d and %llf\n", __FUNCTION__, v1.x, f1, v2.x);
+    if (float128_is_nan(v1.x) || float128_is_nan(v2.x)) return 3;
+    else if (float128_eq(v1.x, v2.x, &env->fpu_status)) return 0;
+    else if (float128_lt(v1.x, v2.x, &env->fpu_status)) return 1;
+    else return 2;
+}
+
+/* 64-bit FP compare RM */
+uint32_t HELPER(cdb)(uint32_t f1, uint64_t a2)
+{
+    float64 v1 = env->fregs[f1].d;
+    union {
+        float64 d;
+        uint64_t i;
+    } v2;
+    v2.i = ldq(a2);
+    HELPER_LOG("%s: comparing 0x%ld from f%d and 0x%lx\n", __FUNCTION__, v1, f1, v2.d);
+    return set_cc_f64(v1, v2.d);
+}
+
+/* 64-bit FP addition RM */
+uint32_t HELPER(adb)(uint32_t f1, uint64_t a2)
+{
+    float64 v1 = env->fregs[f1].d;
+    union {
+        float64 d;
+        uint64_t i;
+    } v2;
+    v2.i = ldq(a2);
+    HELPER_LOG("%s: adding 0x%lx from f%d and 0x%lx\n", __FUNCTION__, v1, f1, v2.d);
+    env->fregs[f1].d = v1 = float64_add(v1, v2.d, &env->fpu_status);
+    return set_cc_nz_f64(v1);
+}
+
+/* 32-bit FP subtraction RM */
+uint32_t HELPER(seb)(uint32_t f1, uint64_t a2)
+{
+    float32 v1 = env->fregs[f1].e;
+    union {
+        float32 e;
+        uint32_t i;
+    } v2;
+    v2.i = ldl(a2);
+    env->fregs[f1].e = v1 = float32_sub(v1, v2.e, &env->fpu_status);
+    return set_cc_nz_f32(v1);
+}
+
+/* 64-bit FP subtraction RM */
+uint32_t HELPER(sdb)(uint32_t f1, uint64_t a2)
+{
+    float64 v1 = env->fregs[f1].d;
+    union {
+        float64 d;
+        uint64_t i;
+    } v2;
+    v2.i = ldq(a2);
+    env->fregs[f1].d = v1 = float64_sub(v1, v2.d, &env->fpu_status);
+    return set_cc_nz_f64(v1);
+}
+
+/* 64-bit FP multiplication RM */
+void HELPER(mdb)(uint32_t f1, uint64_t a2)
+{
+    float64 v1 = env->fregs[f1].d;
+    union {
+        float64 d;
+        uint64_t i;
+    } v2;
+    v2.i = ldq(a2);
+    HELPER_LOG("%s: multiplying 0x%lx from f%d and 0x%ld\n", __FUNCTION__, v1, f1, v2.d);
+    env->fregs[f1].d = float64_mul(v1, v2.d, &env->fpu_status);
+}
+
+/* 64-bit FP division RM */
+void HELPER(ddb)(uint32_t f1, uint64_t a2)
+{
+    float64 v1 = env->fregs[f1].d;
+    union {
+        float64 d;
+        uint64_t i;
+    } v2;
+    v2.i = ldq(a2);
+    HELPER_LOG("%s: dividing 0x%lx from f%d by 0x%ld\n", __FUNCTION__, v1, f1, v2.d);
+    env->fregs[f1].d = float64_div(v1, v2.d, &env->fpu_status);
+}
+
+static void set_round_mode(int m3)
+{
+    switch (m3) {
+    case 0: break; /* current mode */
+    case 1: /* biased round to nearest */
+    case 4: /* round to nearest */
+        set_float_rounding_mode(float_round_nearest_even, &env->fpu_status);
+        break;
+    case 5: /* round to zero */
+        set_float_rounding_mode(float_round_to_zero, &env->fpu_status);
+        break;
+    case 6: /* round to +inf */
+        set_float_rounding_mode(float_round_up, &env->fpu_status);
+        break;
+    case 7: /* round to -inf */
+        set_float_rounding_mode(float_round_down, &env->fpu_status);
+        break;
+    }
+}
+
+/* convert 32-bit float to 64-bit int */
+uint32_t HELPER(cgebr)(uint32_t r1, uint32_t f2, uint32_t m3)
+{
+    float32 v2 = env->fregs[f2].e;
+    set_round_mode(m3);
+    env->regs[r1] = float32_to_int64(v2, &env->fpu_status);
+    return set_cc_nz_f32(v2);
+}
+
+/* convert 64-bit float to 64-bit int */
+uint32_t HELPER(cgdbr)(uint32_t r1, uint32_t f2, uint32_t m3)
+{
+    float64 v2 = env->fregs[f2].d;
+    set_round_mode(m3);
+    env->regs[r1] = float64_to_int64(v2, &env->fpu_status);
+    return set_cc_nz_f64(v2);
+}
+
+/* convert 128-bit float to 64-bit int */
+uint32_t HELPER(cgxbr)(uint32_t r1, uint32_t f2, uint32_t m3)
+{
+    FP128 v2;
+    v2.h = env->fregs[f2].i;
+    v2.l = env->fregs[f2 + 2].i;
+    set_round_mode(m3);
+    env->regs[r1] = float128_to_int64(v2.x, &env->fpu_status);
+    if (float128_is_nan(v2.x)) return 3;
+    else if (float128_is_zero(v2.x)) return 0;
+    else if (float128_is_neg(v2.x)) return 1;
+    else return 2;
+}
+
+/* convert 32-bit float to 32-bit int */
+uint32_t HELPER(cfebr)(uint32_t r1, uint32_t f2, uint32_t m3)
+{
+    float32 v2 = env->fregs[f2].e;
+    set_round_mode(m3);
+    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | float32_to_int32(v2, &env->fpu_status);
+    return set_cc_nz_f32(v2);
+}
+
+/* convert 64-bit float to 32-bit int */
+uint32_t HELPER(cfdbr)(uint32_t r1, uint32_t f2, uint32_t m3)
+{
+    float64 v2 = env->fregs[f2].d;
+    set_round_mode(m3);
+    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | float64_to_int32(v2, &env->fpu_status);
+    return set_cc_nz_f64(v2);
+}
+
+/* convert 128-bit float to 32-bit int */
+uint32_t HELPER(cfxbr)(uint32_t r1, uint32_t f2, uint32_t m3)
+{
+    FP128 v2;
+    v2.h = env->fregs[f2].i;
+    v2.l = env->fregs[f2 + 2].i;
+    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | float128_to_int32(v2.x, &env->fpu_status);
+    return set_cc_nz_f128(v2.x);
+}
+
+/* load 32-bit FP zero */
+void HELPER(lzer)(uint32_t f1)
+{
+    env->fregs[f1].e = float32_zero;
+}
+
+/* load 64-bit FP zero */
+void HELPER(lzdr)(uint32_t f1)
+{
+    env->fregs[f1].d = float64_zero;
+}
+
+/* load 128-bit FP zero */
+void HELPER(lzxr)(uint32_t f1)
+{
+    FP128 x;
+    x.x = float64_to_float128(float64_zero, &env->fpu_status);
+    env->fregs[f1].i = x.h;
+    env->fregs[f1 + 2].i = x.l;
+}
+
+/* 128-bit FP subtraction RR */
+uint32_t HELPER(sxbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 v1;
+    v1.h = env->fregs[f1].i;
+    v1.l = env->fregs[f1 + 2].i;
+    FP128 v2;
+    v2.h = env->fregs[f2].i;
+    v2.l = env->fregs[f2 + 2].i;
+    FP128 res;
+    res.x = float128_sub(v1.x, v2.x, &env->fpu_status);
+    env->fregs[f1].i = res.h;
+    env->fregs[f1 + 2].i = res.l;
+    return set_cc_nz_f128(res.x);
+}
+
+/* 128-bit FP addition RR */
+uint32_t HELPER(axbr)(uint32_t f1, uint32_t f2)
+{
+    FP128 v1;
+    v1.h = env->fregs[f1].i;
+    v1.l = env->fregs[f1 + 2].i;
+    FP128 v2;
+    v2.h = env->fregs[f2].i;
+    v2.l = env->fregs[f2 + 2].i;
+    FP128 res;
+    res.x = float128_add(v1.x, v2.x, &env->fpu_status);
+    env->fregs[f1].i = res.h;
+    env->fregs[f1 + 2].i = res.l;
+    return set_cc_nz_f128(res.x);
+}
+
+/* 32-bit FP multiplication RR */
+void HELPER(meebr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].e = float32_mul(env->fregs[f1].e, env->fregs[f2].e, &env->fpu_status);
+}
+
+/* 64-bit FP division RR */
+void HELPER(ddbr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].d = float64_div(env->fregs[f1].d, env->fregs[f2].d, &env->fpu_status);
+}
+
+/* 64-bit FP multiply and add RM */
+void HELPER(madb)(uint32_t f1, uint64_t a2, uint32_t f3)
+{
+    HELPER_LOG("%s: f1 %d a2 0x%lx f3 %d\n", __FUNCTION__, f1, a2, f3);
+    union {
+        float64 d;
+        uint64_t i;
+    } v2;
+    v2.i = ldq(a2);
+    env->fregs[f1].d = float64_add(env->fregs[f1].d, float64_mul(v2.d, env->fregs[f3].d, &env->fpu_status), &env->fpu_status);
+}
+
+/* 64-bit FP multiply and add RR */
+void HELPER(madbr)(uint32_t f1, uint32_t f3, uint32_t f2)
+{
+    HELPER_LOG("%s: f1 %d f2 %d f3 %d\n", __FUNCTION__, f1, f2, f3);
+    env->fregs[f1].d = float64_add(float64_mul(env->fregs[f2].d, env->fregs[f3].d, &env->fpu_status), env->fregs[f1].d, &env->fpu_status);
+}
+
+/* 64-bit FP multiply and subtract RR */
+void HELPER(msdbr)(uint32_t f1, uint32_t f3, uint32_t f2)
+{
+    HELPER_LOG("%s: f1 %d f2 %d f3 %d\n", __FUNCTION__, f1, f2, f3);
+    env->fregs[f1].d = float64_sub(float64_mul(env->fregs[f2].d, env->fregs[f3].d, &env->fpu_status), env->fregs[f1].d, &env->fpu_status);
+}
+
+/* 32-bit FP multiply and add RR */
+void HELPER(maebr)(uint32_t f1, uint32_t f3, uint32_t f2)
+{
+    env->fregs[f1].e = float32_add(env->fregs[f1].e, float32_mul(env->fregs[f2].e, env->fregs[f3].e, &env->fpu_status), &env->fpu_status);
+}
+
+/* convert 64-bit float to 128-bit float */
+void HELPER(lxdb)(uint32_t f1, uint64_t a2)
+{
+    union {
+        float64 d;
+        uint64_t i;
+    } v2;
+    v2.i = ldq(a2);
+    FP128 v1;
+    v1.x = float64_to_float128(v2.d, &env->fpu_status);
+    env->fregs[f1].i = v1.h;
+    env->fregs[f1 + 2].i = v1.l;
+}
+
+/* test data class 32-bit */
+uint32_t HELPER(tceb)(uint32_t f1, uint64_t m2)
+{
+    float32 v1 = env->fregs[f1].e;
+    int neg = float32_is_neg(v1);
+    uint32_t cc = 0;
+    HELPER_LOG("%s: v1 0x%x m2 0x%lx neg %d\n", __FUNCTION__, v1, m2, neg);
+    if (float32_is_zero(v1) && (m2 & (1 << (11-neg)))) cc = 1;
+    else if (float32_is_infinity(v1) && (m2 & (1 << (5-neg)))) cc = 1;
+    else if (float32_is_nan(v1) && (m2 & (1 << (3-neg)))) cc = 1;
+    else if (float32_is_signaling_nan(v1) && (m2 & (1 << (1-neg)))) cc = 1;
+    else /* assume normalized number */ if (m2 & (1 << (9-neg))) cc = 1;
+    /* FIXME: denormalized? */
+    return cc;
+}
+
+/* test data class 64-bit */
+uint32_t HELPER(tcdb)(uint32_t f1, uint64_t m2)
+{
+    float64 v1 = env->fregs[f1].d;
+    int neg = float64_is_neg(v1);
+    uint32_t cc = 0;
+    HELPER_LOG("%s: v1 0x%lx m2 0x%lx neg %d\n", __FUNCTION__, v1, m2, neg);
+    if (float64_is_zero(v1) && (m2 & (1 << (11-neg)))) cc = 1;
+    else if (float64_is_infinity(v1) && (m2 & (1 << (5-neg)))) cc = 1;
+    else if (float64_is_nan(v1) && (m2 & (1 << (3-neg)))) cc = 1;
+    else if (float64_is_signaling_nan(v1) && (m2 & (1 << (1-neg)))) cc = 1;
+    else /* assume normalized number */ if (m2 & (1 << (9-neg))) cc = 1;
+    /* FIXME: denormalized? */
+    return cc;
+}
+
+/* test data class 128-bit */
+uint32_t HELPER(tcxb)(uint32_t f1, uint64_t m2)
+{
+    FP128 v1;
+    uint32_t cc = 0;
+    v1.h = env->fregs[f1].i;
+    v1.l = env->fregs[f1 + 2].i;
+
+    int neg = float128_is_neg(v1.x);
+    if (float128_is_zero(v1.x) && (m2 & (1 << (11-neg)))) cc = 1;
+    else if (float128_is_infinity(v1.x) && (m2 & (1 << (5-neg)))) cc = 1;
+    else if (float128_is_nan(v1.x) && (m2 & (1 << (3-neg)))) cc = 1;
+    else if (float128_is_signaling_nan(v1.x) && (m2 & (1 << (1-neg)))) cc = 1;
+    else /* assume normalized number */ if (m2 & (1 << (9-neg))) cc = 1;
+    /* FIXME: denormalized? */
+    return cc;
+}
+
+/* find leftmost one */
+uint32_t HELPER(flogr)(uint32_t r1, uint64_t v2)
+{
+    uint64_t res = 0;
+    uint64_t ov2 = v2;
+    while (!(v2 & 0x8000000000000000ULL) && v2) {
+        v2 <<= 1;
+        res++;
+    }
+    if (!v2) {
+        env->regs[r1] = 64;
+        env->regs[r1 + 1] = 0;
+        return 0;
+    }
+    else {
+        env->regs[r1] = res;
+        env->regs[r1 + 1] = ov2 & ~(0x8000000000000000ULL >> res);
+        return 2;
+    }
+}
+
+/* square root 64-bit RR */
+void HELPER(sqdbr)(uint32_t f1, uint32_t f2)
+{
+    env->fregs[f1].d = float64_sqrt(env->fregs[f2].d, &env->fpu_status);
+}
diff --git a/target-s390x/translate.c b/target-s390x/translate.c
new file mode 100644
index 0000000..a1948bf
--- /dev/null
+++ b/target-s390x/translate.c
@@ -0,0 +1,2479 @@
+/*
+ *  S/390 translation
+ *
+ *  Copyright (c) 2009 Ulrich Hecht
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
+ */
+#include <stdarg.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+#include <inttypes.h>
+
+#define S390X_DEBUG_DISAS
+#ifdef S390X_DEBUG_DISAS
+#  define LOG_DISAS(...) qemu_log(__VA_ARGS__)
+#else
+#  define LOG_DISAS(...) do { } while (0)
+#endif
+
+#include "cpu.h"
+#include "exec-all.h"
+#include "disas.h"
+#include "tcg-op.h"
+#include "qemu-log.h"
+
+/* global register indexes */
+static TCGv_ptr cpu_env;
+
+#include "gen-icount.h"
+#include "helpers.h"
+#define GEN_HELPER 1
+#include "helpers.h"
+
+typedef struct DisasContext DisasContext;
+struct DisasContext {
+    uint64_t pc;
+    int is_jmp;
+    CPUS390XState *env;
+};
+
+#define DISAS_EXCP 4
+#define DISAS_SVC 5
+
+void cpu_dump_state(CPUState *env, FILE *f,
+                    int (*cpu_fprintf)(FILE *f, const char *fmt, ...),
+                    int flags)
+{
+    int i;
+    for (i = 0; i < 16; i++) {
+        cpu_fprintf(f, "R%02d=%016lx", i, env->regs[i]);
+        if ((i % 4) == 3) cpu_fprintf(f, "\n");
+        else cpu_fprintf(f, " ");
+    }
+    for (i = 0; i < 16; i++) {
+        cpu_fprintf(f, "F%02d=%016lx", i, env->fregs[i].i);
+        if ((i % 4) == 3) cpu_fprintf(f, "\n");
+        else cpu_fprintf(f, " ");
+    }
+    cpu_fprintf(f, "PSW=mask %016lx addr %016lx cc %02x\n", env->psw.mask, env->psw.addr, env->cc);
+}
+
+#define TCGREGS
+
+static TCGv global_cc;
+#ifdef TCGREGS
+/* registers stored in TCG variables enhance performance */
+static TCGv tcgregs[16];
+static TCGv tcgregs32[16];
+#endif
+static TCGv cc;
+static TCGv psw_addr;
+
+void s390x_translate_init(void)
+{
+    cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
+    global_cc = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUState, cc), "global_cc");
+#ifdef TCGREGS
+    int i;
+    char rn[4];
+    for (i = 0; i < 16; i++) {
+        sprintf(rn, "R%d", i);
+        tcgregs[i] = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUState, regs[i]), strdup(rn));
+        sprintf(rn, "r%d", i);
+        tcgregs32[i] = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUState, regs[i])
+#ifdef WORDS_BIGENDIAN
+                                                                                     + 4
+#endif
+                                                                                        , strdup(rn));
+    }
+#endif
+    psw_addr = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUState, psw.addr), "psw_addr");
+}
+
+#ifdef TCGREGS
+static inline void sync_reg64(int reg)
+{
+    tcg_gen_sync_i64(tcgregs[reg]);
+}
+static inline void sync_reg32(int reg)
+{
+    tcg_gen_sync_i32(tcgregs32[reg]);
+}
+#endif
+
+static TCGv load_reg(int reg)
+{
+    TCGv r = tcg_temp_new_i64();
+#ifdef TCGREGS
+    sync_reg32(reg);
+    tcg_gen_mov_i64(r, tcgregs[reg]);
+    return r;
+#else
+    tcg_gen_ld_i64(r, cpu_env, offsetof(CPUState, regs[reg]));
+    return r;
+#endif
+}
+
+static TCGv load_freg(int reg)
+{
+    TCGv r = tcg_temp_new_i64();
+    tcg_gen_ld_i64(r, cpu_env, offsetof(CPUState, fregs[reg].d));
+    return r;
+}
+
+static TCGv load_freg32(int reg)
+{
+    TCGv r = tcg_temp_new_i32();
+    tcg_gen_ld_i32(r, cpu_env, offsetof(CPUState, fregs[reg].e));
+    return r;
+}
+
+static void load_reg32_var(TCGv r, int reg)
+{
+#ifdef TCGREGS
+    sync_reg64(reg);
+    tcg_gen_mov_i32(r, tcgregs32[reg]);
+#else
+#ifdef WORDS_BIGENDIAN
+    tcg_gen_ld_i32(r, cpu_env, offsetof(CPUState, regs[reg]) + 4);
+#else
+    tcg_gen_ld_i32(r, cpu_env, offsetof(CPUState, regs[reg]));
+#endif
+#endif
+}
+
+static TCGv load_reg32(int reg)
+{
+    TCGv r = tcg_temp_new_i32();
+    load_reg32_var(r, reg);
+    return r;
+}
+
+static void store_reg(int reg, TCGv v)
+{
+#ifdef TCGREGS
+    sync_reg32(reg);
+    tcg_gen_mov_i64(tcgregs[reg], v);
+#else
+    tcg_gen_st_i64(v, cpu_env, offsetof(CPUState, regs[reg]));
+#endif
+}
+
+static void store_freg(int reg, TCGv v)
+{
+    tcg_gen_st_i64(v, cpu_env, offsetof(CPUState, fregs[reg].d));
+}
+
+static void store_reg32(int reg, TCGv v)
+{
+#ifdef TCGREGS
+    sync_reg64(reg);
+    tcg_gen_mov_i32(tcgregs32[reg], v);
+#else
+#ifdef WORDS_BIGENDIAN
+    tcg_gen_st_i32(v, cpu_env, offsetof(CPUState, regs[reg]) + 4);
+#else
+    tcg_gen_st_i32(v, cpu_env, offsetof(CPUState, regs[reg]));
+#endif
+#endif
+}
+
+static void store_reg8(int reg, TCGv v)
+{
+#ifdef TCGREGS
+    TCGv tmp = tcg_temp_new_i32();
+    sync_reg64(reg);
+    tcg_gen_andi_i32(tmp, tcgregs32[reg], 0xffffff00UL);
+    tcg_gen_or_i32(tcgregs32[reg], tmp, v);
+    tcg_temp_free(tmp);
+#else
+#ifdef WORDS_BIGENDIAN
+    tcg_gen_st8_i32(v, cpu_env, offsetof(CPUState, regs[reg]) + 7);
+#else
+    tcg_gen_st8_i32(v, cpu_env, offsetof(CPUState, regs[reg]));
+#endif
+#endif
+}
+
+static void store_freg32(int reg, TCGv v)
+{
+    tcg_gen_st_i32(v, cpu_env, offsetof(CPUState, fregs[reg].e));
+}
+
+static void gen_illegal_opcode(DisasContext *s)
+{
+    TCGv tmp = tcg_temp_new_i64();
+    tcg_gen_movi_i64(tmp, 42);
+    gen_helper_exception(tmp);
+    s->is_jmp = DISAS_EXCP;
+}
+
+#define DEBUGINSN LOG_DISAS("insn: 0x%lx\n", insn);
+
+static TCGv get_address(int x2, int b2, int d2)
+{
+    TCGv tmp = 0,tmp2 = 0;
+    if (d2) tmp = tcg_const_i64(d2);
+    if (x2) {
+        if (d2) {
+            tmp2 = load_reg(x2);
+            tcg_gen_add_i64(tmp, tmp, tmp2);
+            tcg_temp_free(tmp2);
+        }
+        else {
+            tmp = load_reg(x2);
+        }
+    }
+    if (b2) {
+        if (d2 || x2) {
+            tmp2 = load_reg(b2);
+            tcg_gen_add_i64(tmp, tmp, tmp2);
+            tcg_temp_free(tmp2);
+        }
+        else {
+            tmp = load_reg(b2);
+        }
+    }
+
+    if (!(d2 || x2 || b2)) tmp = tcg_const_i64(0);
+
+    return tmp;
+}
+
+static inline void set_cc_nz_u32(TCGv val)
+{
+    gen_helper_set_cc_nz_u32(cc, val);
+}
+
+static inline void set_cc_nz_u64(TCGv val)
+{
+    gen_helper_set_cc_nz_u64(cc, val);
+}
+
+static inline void set_cc_s32(TCGv val)
+{
+    gen_helper_set_cc_s32(cc, val);
+}
+
+static inline void set_cc_s64(TCGv val)
+{
+    gen_helper_set_cc_s64(cc, val);
+}
+
+static inline void cmp_s32(TCGv v1, TCGv v2)
+{
+    gen_helper_cmp_s32(cc, v1, v2);
+}
+
+static inline void cmp_u32(TCGv v1, TCGv v2)
+{
+    gen_helper_cmp_u32(cc, v1, v2);
+}
+
+/* this is a hysterical raisin */
+static inline void cmp_s32c(TCGv v1, int32_t v2)
+{
+    gen_helper_cmp_s32(cc, v1, tcg_const_i32(v2));
+}
+static inline void cmp_u32c(TCGv v1, uint32_t v2)
+{
+    gen_helper_cmp_u32(cc, v1, tcg_const_i32(v2));
+}
+
+
+static inline void cmp_s64(TCGv v1, TCGv v2)
+{
+    gen_helper_cmp_s64(cc, v1, v2);
+}
+
+static inline void cmp_u64(TCGv v1, TCGv v2)
+{
+    gen_helper_cmp_u64(cc, v1, v2);
+}
+
+/* see cmp_[su]32c() */
+static inline void cmp_s64c(TCGv v1, int64_t v2)
+{
+    gen_helper_cmp_s64(cc, v1, tcg_const_i64(v2));
+}
+static inline void cmp_u64c(TCGv v1, uint64_t v2)
+{
+    gen_helper_cmp_u64(cc, v1, tcg_const_i64(v2));
+}
+
+static void gen_bcr(uint32_t mask, int tr, uint64_t offset)
+{
+    TCGv target;
+    if (mask == 0xf) { /* unconditional */
+        target = load_reg(tr);
+        tcg_gen_mov_i64(psw_addr, target);
+    }
+    else {
+        gen_helper_bcr(cc, tcg_const_i32(mask), (target = load_reg(tr)), tcg_const_i64(offset));
+    }
+    tcg_temp_free(target);
+}
+
+static void gen_brc(uint32_t mask, uint64_t pc, int32_t offset)
+{
+    if (mask == 0xf) { /* unconditional */
+        tcg_gen_movi_i64(psw_addr, pc + offset);
+    }
+    else {
+        gen_helper_brc(cc, tcg_const_i32(mask), tcg_const_i64(pc), tcg_const_i32(offset));
+    }
+}
+
+static void gen_set_cc_add64(TCGv v1, TCGv v2, TCGv vr)
+{
+    gen_helper_set_cc_add64(cc, v1, v2, vr);
+}
+
+static void disas_e3(DisasContext* s, int op, int r1, int x2, int b2, int d2)
+{
+    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
+
+    LOG_DISAS("disas_e3: op 0x%x r1 %d x2 %d b2 %d d2 %d\n", op, r1, x2, b2, d2);
+    tmp = get_address(x2, b2, d2);
+    switch (op) {
+    case 0x2: /* LTG R1,D2(X2,B2) [RXY] */
+    case 0x4: /* LG R1,D2(X2,B2) [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        store_reg(r1, tmp2);
+        if (op == 0x2) set_cc_s64(tmp2);
+        break;
+    case 0x12: /* LT R1,D2(X2,B2) [RXY] */
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32s(tmp2, tmp, 1);
+        store_reg32(r1, tmp2);
+        set_cc_s32(tmp2);
+        break;
+    case 0xc: /* MSG      R1,D2(X2,B2)     [RXY] */
+    case 0x1c: /* MSGF     R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        if (op == 0xc) {
+            tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        }
+        else {
+            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
+            tcg_gen_ext32s_i64(tmp2, tmp2);
+        }
+        tmp = load_reg(r1);
+        tcg_gen_mul_i64(tmp, tmp, tmp2);
+        store_reg(r1, tmp);
+        break;
+    case 0xd: /* DSG      R1,D2(X2,B2)     [RXY] */
+    case 0x1d: /* DSGF      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        if (op == 0x1d) {
+            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
+            tcg_gen_ext32s_i64(tmp2, tmp2);
+        }
+        else {
+            tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        }
+        tmp = load_reg(r1 + 1);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_div_i64(tmp3, tmp, tmp2);
+        store_reg(r1 + 1, tmp3);
+        tcg_gen_rem_i64(tmp3, tmp, tmp2);
+        store_reg(r1, tmp3);
+        break;
+    case 0x8: /* AG      R1,D2(X2,B2)     [RXY] */
+    case 0xa: /* ALG      R1,D2(X2,B2)     [RXY] */
+    case 0x18: /* AGF       R1,D2(X2,B2)     [RXY] */
+    case 0x1a: /* ALGF      R1,D2(X2,B2)     [RXY] */
+        if (op == 0x1a) {
+            tmp2 = tcg_temp_new_i64();
+            tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+            tcg_gen_ext32u_i64(tmp2, tmp2);
+        }
+        else if (op == 0x18) {
+            tmp2 = tcg_temp_new_i64();
+            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
+            tcg_gen_ext32s_i64(tmp2, tmp2);
+        }
+        else {
+            tmp2 = tcg_temp_new_i64();
+            tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        }
+        tmp = load_reg(r1);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_add_i64(tmp3, tmp, tmp2);
+        store_reg(r1, tmp3);
+        switch (op) {
+        case 0x8: case 0x18: gen_set_cc_add64(tmp, tmp2, tmp3); break;
+        case 0xa: case 0x1a: gen_helper_set_cc_addu64(cc, tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0x9: /* SG      R1,D2(X2,B2)     [RXY] */
+    case 0xb: /* SLG      R1,D2(X2,B2)     [RXY] */
+    case 0x19: /* SGF      R1,D2(X2,B2)     [RXY] */
+    case 0x1b: /* SLGF     R1,D2(X2,B2)     [RXY] */
+        if (op == 0x19) {
+            tmp2 = tcg_temp_new_i64();
+            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
+            tcg_gen_ext32s_i64(tmp2, tmp2);
+        }
+        else if (op == 0x1b) {
+            tmp2 = tcg_temp_new_i64();
+            tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+            tcg_gen_ext32u_i64(tmp2, tmp2);
+        }
+        else {
+            tmp2 = tcg_temp_new_i64();
+            tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        }
+        tmp = load_reg(r1);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_sub_i64(tmp3, tmp, tmp2);
+        store_reg(r1, tmp3);
+        switch (op) {
+        case 0x9: case 0x19: gen_helper_set_cc_sub64(cc, tmp, tmp2, tmp3); break;
+        case 0xb: case 0x1b: gen_helper_set_cc_subu64(cc, tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0x14: /* LGF      R1,D2(X2,B2)     [RXY] */
+    case 0x16: /* LLGF      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        switch (op) {
+        case 0x14: tcg_gen_ext32s_i64(tmp2, tmp2); break;
+        case 0x16: tcg_gen_ext32u_i64(tmp2, tmp2); break;
+        default: tcg_abort();
+        }
+        store_reg(r1, tmp2);
+        break;
+    case 0x15: /* LGH     R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
+        tcg_gen_ext16s_i64(tmp2, tmp2);
+        store_reg(r1, tmp2);
+        break;
+    case 0x17: /* LLGT      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        tcg_gen_ext32u_i64(tmp2, tmp2);
+        tcg_gen_andi_i64(tmp2, tmp2, 0x7fffffffULL);
+        store_reg(r1, tmp2);
+        break;
+    case 0x1e: /* LRV R1,D2(X2,B2) [RXY] */
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        tcg_gen_bswap32_i32(tmp2, tmp2);
+        store_reg32(r1, tmp2);
+        break;
+    case 0x20: /* CG      R1,D2(X2,B2)     [RXY] */
+    case 0x21: /* CLG      R1,D2(X2,B2) */
+    case 0x30: /* CGF       R1,D2(X2,B2)     [RXY] */
+    case 0x31: /* CLGF      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        switch (op) {
+        case 0x20:
+        case 0x21:
+            tcg_gen_qemu_ld64(tmp2, tmp, 1);
+            break;
+        case 0x30:
+            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
+            tcg_gen_ext32s_i64(tmp2, tmp2);
+            break;
+        case 0x31:
+            tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+            tcg_gen_ext32u_i64(tmp2, tmp2);
+            break;
+        default:
+            tcg_abort();
+        }
+        tmp = load_reg(r1);
+        switch (op) {
+        case 0x20: case 0x30: cmp_s64(tmp, tmp2); break;
+        case 0x21: case 0x31: cmp_u64(tmp, tmp2); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0x24: /* STG R1,D2(X2,B2) [RXY] */
+        tmp2 = load_reg(r1);
+        tcg_gen_qemu_st64(tmp2, tmp, 1);
+        break;
+    case 0x3e: /* STRV R1,D2(X2,B2) [RXY] */
+        tmp2 = load_reg32(r1);
+        tcg_gen_bswap32_i32(tmp2, tmp2);
+        tcg_gen_qemu_st32(tmp2, tmp, 1);
+        break;
+    case 0x50: /* STY  R1,D2(X2,B2) [RXY] */
+        tmp2 = load_reg32(r1);
+        tcg_gen_qemu_st32(tmp2, tmp, 1);
+        break;
+    case 0x57: /* XY R1,D2(X2,B2) [RXY] */
+        tmp2 = load_reg32(r1);
+        tmp3 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp3, tmp, 1);
+        tcg_gen_xor_i32(tmp, tmp2, tmp3);
+        store_reg32(r1, tmp);
+        set_cc_nz_u32(tmp);
+        break;
+    case 0x58: /* LY R1,D2(X2,B2) [RXY] */
+        tmp3 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp3, tmp, 1);
+        store_reg32(r1, tmp3);
+        break;
+    case 0x5a: /* AY R1,D2(X2,B2) [RXY] */
+    case 0x5b: /* SY R1,D2(X2,B2) [RXY] */
+        tmp2 = load_reg32(r1);
+        tmp3 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32s(tmp3, tmp, 1);
+        switch (op) {
+        case 0x5a: tcg_gen_add_i32(tmp, tmp2, tmp3); break;
+        case 0x5b: tcg_gen_sub_i32(tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        store_reg32(r1, tmp);
+        switch (op) {
+        case 0x5a: gen_helper_set_cc_add32(cc, tmp2, tmp3, tmp); break;
+        case 0x5b: gen_helper_set_cc_sub32(cc, tmp2, tmp3, tmp); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0x71: /* LAY R1,D2(X2,B2) [RXY] */
+        store_reg(r1, tmp);
+        break;
+    case 0x72: /* STCY R1,D2(X2,B2) [RXY] */
+        tmp2 = load_reg32(r1);
+        tcg_gen_qemu_st8(tmp2, tmp, 1);
+        break;
+    case 0x73: /* ICY R1,D2(X2,B2) [RXY] */
+        tmp3 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld8u(tmp3, tmp, 1);
+        store_reg8(r1, tmp3);
+        break;
+    case 0x76: /* LB R1,D2(X2,B2) [RXY] */
+    case 0x77: /* LGB R1,D2(X2,B2) [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld8s(tmp2, tmp, 1);
+        switch (op) {
+        case 0x76:
+            tcg_gen_ext8s_i32(tmp2, tmp2);
+            store_reg32(r1, tmp2);
+            break;
+        case 0x77:
+            tcg_gen_ext8s_i64(tmp2, tmp2);
+            store_reg(r1, tmp2);
+            break;
+        default: tcg_abort();
+        }
+        break;
+    case 0x78: /* LHY R1,D2(X2,B2) [RXY] */
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
+        tcg_gen_ext16s_i32(tmp2, tmp2);
+        store_reg32(r1, tmp2);
+        break;
+    case 0x80: /* NG      R1,D2(X2,B2)     [RXY] */
+    case 0x81: /* OG      R1,D2(X2,B2)     [RXY] */
+    case 0x82: /* XG      R1,D2(X2,B2)     [RXY] */
+        tmp2 = load_reg(r1);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld64(tmp3, tmp, 1);
+        switch (op) {
+        case 0x80: tcg_gen_and_i64(tmp, tmp2, tmp3); break;
+        case 0x81: tcg_gen_or_i64(tmp, tmp2, tmp3); break;
+        case 0x82: tcg_gen_xor_i64(tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        store_reg(r1, tmp);
+        set_cc_nz_u64(tmp);
+        break;
+    case 0x86: /* MLG      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        tmp = tcg_const_i32(r1);
+        gen_helper_mlg(tmp, tmp2);
+        break;
+    case 0x87: /* DLG      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        tmp = tcg_const_i32(r1);
+        gen_helper_dlg(tmp, tmp2);
+        break;
+    case 0x88: /* ALCG      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        tmp = load_reg(r1);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_shri_i64(tmp3, cc, 1);
+        tcg_gen_andi_i64(tmp3, tmp3, 1);
+        tcg_gen_add_i64(tmp3, tmp2, tmp3);
+        tcg_gen_add_i64(tmp3, tmp, tmp3);
+        store_reg(r1, tmp3);
+        gen_helper_set_cc_addc_u64(cc, tmp, tmp2, tmp3);
+        break;
+    case 0x89: /* SLBG      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        tmp = load_reg(r1);
+        tmp3 = tcg_const_i32(r1);
+        gen_helper_slbg(cc, cc, tmp3, tmp, tmp2);
+        break;
+    case 0x90: /* LLGC      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
+        store_reg(r1, tmp2);
+        break;
+    case 0x91: /* LLGH      R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld16u(tmp2, tmp, 1);
+        store_reg(r1, tmp2);
+        break;
+    case 0x94: /* LLC     R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
+        store_reg32(r1, tmp2);
+        break;
+    case 0x95: /* LLH     R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld16u(tmp2, tmp, 1);
+        store_reg32(r1, tmp2);
+        break;
+    case 0x98: /* ALC     R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        tmp = tcg_const_i32(r1);
+        gen_helper_addc_u32(cc, cc, tmp, tmp2);
+        break;
+    case 0x99: /* SLB     R1,D2(X2,B2)     [RXY] */
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        tmp = load_reg32(r1);
+        tmp3 = tcg_const_i32(r1);
+        gen_helper_slb(cc, cc, tmp3, tmp, tmp2);
+        break;
+    default:
+        LOG_DISAS("illegal e3 operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static void disas_eb(DisasContext *s, int op, int r1, int r3, int b2, int d2)
+{
+    TCGv tmp = 0,tmp2 = 0,tmp3 = 0,tmp4 = 0;
+    int i;
+
+    LOG_DISAS("disas_eb: op 0x%x r1 %d r3 %d b2 %d d2 0x%x\n", op, r1, r3, b2, d2);
+    switch (op) {
+    case 0xc: /* SRLG     R1,R3,D2(B2)     [RSY] */
+    case 0xd: /* SLLG     R1,R3,D2(B2)     [RSY] */
+    case 0xa: /* SRAG     R1,R3,D2(B2)     [RSY] */
+    case 0x1c: /* RLLG     R1,R3,D2(B2)     [RSY] */
+        if (b2) {
+            tmp = get_address(0, b2, d2);
+            tcg_gen_andi_i64(tmp, tmp, 0x3f);
+        }
+        else tmp = tcg_const_i64(d2 & 0x3f);
+        tmp2 = load_reg(r3);
+        tmp3 = tcg_temp_new_i64();
+        switch (op) {
+        case 0xc: tcg_gen_shr_i64(tmp3, tmp2, tmp); break;
+        case 0xd: tcg_gen_shl_i64(tmp3, tmp2, tmp); break;
+        case 0xa: tcg_gen_sar_i64(tmp3, tmp2, tmp); break;
+        case 0x1c: tcg_gen_rotl_i64(tmp3, tmp2, tmp); break;
+        default: tcg_abort(); break;
+        }
+        store_reg(r1, tmp3);
+        if (op == 0xa) set_cc_s64(tmp3);
+        break;
+    case 0x1d: /* RLL    R1,R3,D2(B2)        [RSY] */
+        if (b2) {
+            tmp = get_address(0, b2, d2);
+            tcg_gen_andi_i64(tmp, tmp, 0x3f);
+        }
+        else tmp = tcg_const_i32(d2 & 0x3f);
+        tmp2 = load_reg32(r3);
+        tmp3 = tcg_temp_new_i32();
+        switch (op) {
+        case 0x1d: tcg_gen_rotl_i32(tmp3, tmp2, tmp); break;
+        default: tcg_abort(); break;
+        }
+        store_reg32(r1, tmp3);
+        break;
+    case 0x4: /* LMG     R1,R3,D2(B2)     [RSY] */
+    case 0x24: /* STMG    R1,R3,D2(B2)     [RSY] */
+        /* Apparently, unrolling lmg/stmg of any size gains performance -
+           even for very long ones... */
+        if (r3 > r1) {
+            tmp = get_address(0, b2, d2);
+            for (i = r1; i <= r3; i++) {
+                if (op == 0x4) {
+                    tmp2 = tcg_temp_new_i64();
+                    tcg_gen_qemu_ld64(tmp2, tmp, 1);
+                    store_reg(i, tmp2);
+                    /* At least one register is usually read after an lmg
+                       (br %rsomething), which is why freeing them is
+                       detrimental to performance */
+                }
+                else {
+                    tmp2 = load_reg(i);
+                    tcg_gen_qemu_st64(tmp2, tmp, 1);
+                    /* R15 is usually read after an stmg; other registers
+                       generally aren't and can be freed */
+                    if (i != 15) tcg_temp_free(tmp2);
+                }
+                tcg_gen_addi_i64(tmp, tmp, 8);
+            }
+            tmp2 = 0;
+        }
+        else {
+            tmp = tcg_const_i32(r1);
+            tmp2 = tcg_const_i32(r3);
+            tmp3 = tcg_const_i32(b2);
+            tmp4 = tcg_const_i32(d2);
+            if (op == 0x4) gen_helper_lmg(tmp, tmp2, tmp3, tmp4);
+            else gen_helper_stmg(tmp, tmp2, tmp3, tmp4);
+        }
+        break;
+    case 0x2c: /* STCMH R1,M3,D2(B2) [RSY] */
+        tmp2 = get_address(0, b2, d2);
+        tmp = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r3);
+        gen_helper_stcmh(cc, tmp, tmp2, tmp3);
+        break;
+    case 0x30: /* CSG     R1,R3,D2(B2)     [RSY] */
+        tmp2 = get_address(0, b2, d2);
+        tmp = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r3);
+        gen_helper_csg(cc, tmp, tmp2, tmp3);
+        break;
+    case 0x3e: /* CDSG R1,R3,D2(B2) [RSY] */
+        tmp2 = get_address(0, b2, d2);
+        tmp = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r3);
+        gen_helper_cdsg(cc, tmp, tmp2, tmp3);
+        break;
+    case 0x51: /* TMY D1(B1),I2 [SIY] */
+        tmp = get_address(0, b2, d2); /* SIY -> this is the 1st operand */
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
+        tmp = tcg_const_i32((r1 << 4) | r3);
+        gen_helper_tm(cc, tmp2, tmp);
+        break;
+    case 0x52: /* MVIY D1(B1),I2 [SIY] */
+        tmp2 = tcg_const_i32((r1 << 4) | r3);
+        tmp = get_address(0, b2, d2); /* SIY -> this is the destination */
+        tcg_gen_qemu_st8(tmp2, tmp, 1);
+        break;
+    case 0x55: /* CLIY D1(B1),I2 [SIY] */
+        tmp3 = get_address(0, b2, d2); /* SIY -> this is the 1st operand */
+        tmp = tcg_temp_new_i32();
+        tcg_gen_qemu_ld8u(tmp, tmp3, 1);
+        cmp_u32c(tmp, (r1 << 4) | r3);
+        break;
+    case 0x80: /* ICMH      R1,M3,D2(B2)     [RSY] */
+        tmp2 = get_address(0, b2, d2);
+        tmp = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r3);
+        gen_helper_icmh(cc, tmp, tmp2, tmp3);
+        break;
+    default:
+        LOG_DISAS("illegal eb operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+    if (tmp4) tcg_temp_free(tmp4);
+}
+
+static void disas_ed(DisasContext *s, int op, int r1, int x2, int b2, int d2, int r1b)
+{
+    TCGv tmp, tmp2, tmp3 = 0;
+    tmp2 = get_address(x2, b2, d2);
+    tmp = tcg_const_i32(r1);
+    switch (op) {
+    case 0x5: /* LXDB R1,D2(X2,B2) [RXE] */
+        gen_helper_lxdb(tmp, tmp2);
+        break;
+    case 0x9: /* CEB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_ceb(cc, tmp, tmp2);
+        break;
+    case 0xa: /* AEB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_aeb(cc, tmp, tmp2);
+        break;
+    case 0xb: /* SEB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_seb(cc, tmp, tmp2);
+        break;
+    case 0xd: /* DEB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_deb(tmp, tmp2);
+        break;
+    case 0x10: /* TCEB   R1,D2(X2,B2)       [RXE] */
+        gen_helper_tceb(cc, tmp, tmp2);
+        break;
+    case 0x11: /* TCDB   R1,D2(X2,B2)       [RXE] */
+        gen_helper_tcdb(cc, tmp, tmp2);
+        break;
+    case 0x12: /* TCXB   R1,D2(X2,B2)       [RXE] */
+        gen_helper_tcxb(cc, tmp, tmp2);
+        break;
+    case 0x17: /* MEEB   R1,D2(X2,B2)       [RXE] */
+        gen_helper_meeb(tmp, tmp2);
+        break;
+    case 0x19: /* CDB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_cdb(cc, tmp, tmp2);
+        break;
+    case 0x1a: /* ADB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_adb(cc, tmp, tmp2);
+        break;
+    case 0x1b: /* SDB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_sdb(cc, tmp, tmp2);
+        break;
+    case 0x1c: /* MDB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_mdb(tmp, tmp2);
+        break;
+    case 0x1d: /* DDB    R1,D2(X2,B2)       [RXE] */
+        gen_helper_ddb(tmp, tmp2);
+        break;
+    case 0x1e: /* MADB  R1,R3,D2(X2,B2) [RXF] */
+        /* for RXF insns, r1 is R3 and r1b is R1 */
+        tmp3 = tcg_const_i32(r1b);
+        gen_helper_madb(tmp3, tmp2, tmp);
+        break;
+    default:
+        LOG_DISAS("illegal ed operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    tcg_temp_free(tmp);
+    tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static void disas_a5(DisasContext *s, int op, int r1, int i2)
+{
+    TCGv tmp = 0, tmp2 = 0;
+    uint64_t vtmp;
+    LOG_DISAS("disas_a5: op 0x%x r1 %d i2 0x%x\n", op, r1, i2);
+    switch (op) {
+    case 0x0: /* IIHH     R1,I2     [RI] */
+    case 0x1: /* IIHL     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        vtmp = i2;
+        switch (op) {
+        case 0x0: tcg_gen_andi_i64(tmp, tmp, 0x0000ffffffffffffULL); vtmp <<= 48; break;
+        case 0x1: tcg_gen_andi_i64(tmp, tmp, 0xffff0000ffffffffULL); vtmp <<= 32; break;
+        default: tcg_abort();
+        }
+        tcg_gen_ori_i64(tmp, tmp, vtmp);
+        store_reg(r1, tmp);
+        break;
+    case 0x4: /* NIHH     R1,I2     [RI] */
+    case 0x8: /* OIHH     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        switch (op) {
+        case 0x4:
+            tmp2 = tcg_const_i64( (((uint64_t)i2) << 48) | 0x0000ffffffffffffULL);
+            tcg_gen_and_i64(tmp, tmp, tmp2);
+            break;
+        case 0x8:
+            tmp2 = tcg_const_i64(((uint64_t)i2) << 48);
+            tcg_gen_or_i64(tmp, tmp, tmp2);
+            break;
+        default: tcg_abort();
+        }
+        store_reg(r1, tmp);
+        tcg_gen_shri_i64(tmp2, tmp, 48);
+        tcg_gen_trunc_i64_i32(tmp2, tmp2);
+        set_cc_nz_u32(tmp2);
+        break;
+    case 0x5: /* NIHL     R1,I2     [RI] */
+    case 0x9: /* OIHL     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        switch (op) {
+        case 0x5:
+            tmp2 = tcg_const_i64( (((uint64_t)i2) << 32) | 0xffff0000ffffffffULL);
+            tcg_gen_and_i64(tmp, tmp, tmp2);
+            break;
+        case 0x9:
+            tmp2 = tcg_const_i64(((uint64_t)i2) << 32);
+            tcg_gen_or_i64(tmp, tmp, tmp2);
+            break;
+        default: tcg_abort();
+        }
+        store_reg(r1, tmp);
+        tcg_gen_shri_i64(tmp2, tmp, 32);
+        tcg_gen_trunc_i64_i32(tmp2, tmp2);
+        tcg_gen_andi_i32(tmp2, tmp2, 0xffff);
+        set_cc_nz_u32(tmp2);
+        break;
+    case 0x6: /* NILH     R1,I2     [RI] */
+    case 0xa: /* OILH     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        switch (op) {
+        case 0x6:
+            tmp2 = tcg_const_i64( (((uint64_t)i2) << 16) | 0xffffffff0000ffffULL);
+            tcg_gen_and_i64(tmp, tmp, tmp2);
+            break;
+        case 0xa:
+            tmp2 = tcg_const_i64(((uint64_t)i2) << 16);
+            tcg_gen_or_i64(tmp, tmp, tmp2);
+            break;
+        default: tcg_abort();
+        }
+        store_reg(r1, tmp);
+        tcg_gen_shri_i64(tmp2, tmp, 16);
+        tcg_gen_trunc_i64_i32(tmp2, tmp2);
+        tcg_gen_andi_i32(tmp2, tmp2, 0xffff);
+        set_cc_nz_u32(tmp2);
+        break;
+    case 0x7: /* NILL     R1,I2     [RI] */
+    case 0xb: /* OILL     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        switch (op) {
+        case 0x7:
+            tmp2 = tcg_const_i64(i2 | 0xffffffffffff0000ULL);
+            tcg_gen_and_i64(tmp, tmp, tmp2);
+            break;
+        case 0xb:
+            tmp2 = tcg_const_i64(i2);
+            tcg_gen_or_i64(tmp, tmp, tmp2);
+            break;
+        default: tcg_abort(); break;
+        }
+        store_reg(r1, tmp);
+        tcg_gen_trunc_i64_i32(tmp, tmp);
+        tcg_gen_andi_i32(tmp, tmp, 0xffff);
+        set_cc_nz_u32(tmp); /* signedness should not matter here */
+        break;
+    case 0xc: /* LLIHH     R1,I2     [RI] */
+        tmp = tcg_const_i64( ((uint64_t)i2) << 48 );
+        store_reg(r1, tmp);
+        break;
+    case 0xd: /* LLIHL     R1,I2     [RI] */
+        tmp = tcg_const_i64( ((uint64_t)i2) << 32 );
+        store_reg(r1, tmp);
+        break;
+    case 0xe: /* LLILH     R1,I2     [RI] */
+        tmp = tcg_const_i64( ((uint64_t)i2) << 16 );
+        store_reg(r1, tmp);
+        break;
+    case 0xf: /* LLILL     R1,I2     [RI] */
+        tmp = tcg_const_i64(i2);
+        store_reg(r1, tmp);
+        break;
+    default:
+        LOG_DISAS("illegal a5 operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+}
+
+static void disas_a7(DisasContext *s, int op, int r1, int i2)
+{
+    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
+    LOG_DISAS("disas_a7: op 0x%x r1 %d i2 0x%x\n", op, r1, i2);
+    switch (op) {
+    case 0x0: /* TMLH or TMH     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        tcg_gen_shri_i64(tmp, tmp, 16);
+        tmp2 = tcg_const_i32((uint16_t)i2);
+        gen_helper_tmxx(cc, tmp, tmp2);
+        break;
+    case 0x1: /* TMLL or TML     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        tmp2 = tcg_const_i32((uint16_t)i2);
+        gen_helper_tmxx(cc, tmp, tmp2);
+        break;
+    case 0x2: /* TMHH     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        tcg_gen_shri_i64(tmp, tmp, 48);
+        tmp2 = tcg_const_i32((uint16_t)i2);
+        gen_helper_tmxx(cc, tmp, tmp2);
+        break;
+    case 0x3: /* TMHL     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        tcg_gen_shri_i64(tmp, tmp, 32);
+        tmp2 = tcg_const_i32((uint16_t)i2);
+        gen_helper_tmxx(cc, tmp, tmp2);
+        break;
+    case 0x4: /* BRC     M1,I2     [RI] */
+        /* FIXME: optimize m1 == 0xf (unconditional) case */
+        gen_brc(r1, s->pc, i2 * 2);
+        s->is_jmp = DISAS_JUMP;
+        break;
+    case 0x5: /* BRAS     R1,I2     [RI] */
+        tmp = tcg_const_i64(s->pc + 4);
+        store_reg(r1, tmp);
+        tmp2 = tcg_const_i64(s->pc + i2 * 2); /* own temp, so the link address in tmp is still freed */
+        tcg_gen_st_i64(tmp2, cpu_env, offsetof(CPUState, psw.addr));
+        s->is_jmp = DISAS_JUMP;
+        break;
+    case 0x6: /* BRCT     R1,I2     [RI] */
+        tmp = load_reg32(r1);
+        tcg_gen_subi_i32(tmp, tmp, 1);
+        store_reg32(r1, tmp);
+        tmp2 = tcg_const_i64(s->pc);
+        tmp3 = tcg_const_i32(i2 * 2);
+        gen_helper_brct(tmp, tmp2, tmp3);
+        s->is_jmp = DISAS_JUMP;
+        break;
+    case 0x7: /* BRCTG     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        tcg_gen_subi_i64(tmp, tmp, 1);
+        store_reg(r1, tmp);
+        tmp2 = tcg_const_i64(s->pc);
+        tmp3 = tcg_const_i32(i2 * 2);
+        gen_helper_brctg(tmp, tmp2, tmp3);
+        s->is_jmp = DISAS_JUMP;
+        break;
+    case 0x8: /* LHI     R1,I2     [RI] */
+        tmp = tcg_const_i32(i2);
+        store_reg32(r1, tmp);
+        break;
+    case 0x9: /* LGHI     R1,I2     [RI] */
+        tmp = tcg_const_i64(i2);
+        store_reg(r1, tmp);
+        break;
+    case 0xa: /* AHI     R1,I2     [RI] */
+        tmp = load_reg32(r1);
+        tmp3 = tcg_temp_new_i32();
+        tcg_gen_addi_i32(tmp3, tmp, i2);
+        store_reg32(r1, tmp3);
+        tmp2 = tcg_const_i32(i2);
+        gen_helper_set_cc_add32(cc, tmp, tmp2, tmp3);
+        break;
+    case 0xb: /* AGHI     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_addi_i64(tmp3, tmp, i2);
+        store_reg(r1, tmp3);
+        tmp2 = tcg_const_i64(i2);
+        gen_set_cc_add64(tmp, tmp2, tmp3);
+        break;
+    case 0xc: /* MHI     R1,I2     [RI] */
+        tmp = load_reg32(r1);
+        tcg_gen_muli_i32(tmp, tmp, i2);
+        store_reg32(r1, tmp);
+        break;
+    case 0xd: /* MGHI     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        tcg_gen_muli_i64(tmp, tmp, i2);
+        store_reg(r1, tmp);
+        break;
+    case 0xe: /* CHI     R1,I2     [RI] */
+        tmp = load_reg32(r1);
+        cmp_s32c(tmp, i2);
+        break;
+    case 0xf: /* CGHI     R1,I2     [RI] */
+        tmp = load_reg(r1);
+        cmp_s64c(tmp, i2);
+        break;
+    default:
+        LOG_DISAS("illegal a7 operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static void disas_b2(DisasContext *s, int op, int r1, int r2)
+{
+    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
+    LOG_DISAS("disas_b2: op 0x%x r1 %d r2 %d\n", op, r1, r2);
+    switch (op) {
+    case 0x22: /* IPM    R1               [RRE] */
+        tmp = tcg_const_i32(r1);
+        gen_helper_ipm(cc, tmp);
+        break;
+    case 0x4e: /* SAR     R1,R2     [RRE] */
+        tmp = load_reg32(r2);
+        tcg_gen_st_i32(tmp, cpu_env, offsetof(CPUState, aregs[r1]));
+        break;
+    case 0x4f: /* EAR     R1,R2     [RRE] */
+        tmp = tcg_temp_new_i32();
+        tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUState, aregs[r2]));
+        store_reg32(r1, tmp);
+        break;
+    case 0x52: /* MSR     R1,R2     [RRE] */
+        tmp = load_reg32(r1);
+        tmp2 = load_reg32(r2);
+        tcg_gen_mul_i32(tmp, tmp, tmp2);
+        store_reg32(r1, tmp);
+        break;
+    case 0x55: /* MVST     R1,R2     [RRE] */
+        tmp = load_reg32(0);
+        tmp2 = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r2);
+        gen_helper_mvst(cc, tmp, tmp2, tmp3);
+        break;
+    case 0x5d: /* CLST     R1,R2     [RRE] */
+        tmp = load_reg32(0);
+        tmp2 = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r2);
+        gen_helper_clst(cc, tmp, tmp2, tmp3);
+        break;
+    case 0x5e: /* SRST     R1,R2     [RRE] */
+        tmp = load_reg32(0);
+        tmp2 = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r2);
+        gen_helper_srst(cc, tmp, tmp2, tmp3);
+        break;
+    default:
+        LOG_DISAS("illegal b2 operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static void disas_b3(DisasContext *s, int op, int m3, int r1, int r2)
+{
+    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
+    LOG_DISAS("disas_b3: op 0x%x m3 0x%x r1 %d r2 %d\n", op, m3, r1, r2);
+#define FP_HELPER(i) \
+    tmp = tcg_const_i32(r1); \
+    tmp2 = tcg_const_i32(r2); \
+    gen_helper_ ## i (tmp, tmp2);
+#define FP_HELPER_CC(i) \
+    tmp = tcg_const_i32(r1); \
+    tmp2 = tcg_const_i32(r2); \
+    gen_helper_ ## i (cc, tmp, tmp2);
+
+    switch (op) {
+    case 0x0: /* LPEBR       R1,R2             [RRE] */
+        FP_HELPER_CC(lpebr); break;
+    case 0x2: /* LTEBR       R1,R2             [RRE] */
+        FP_HELPER_CC(ltebr); break;
+    case 0x3: /* LCEBR       R1,R2             [RRE] */
+        FP_HELPER_CC(lcebr); break;
+    case 0x4: /* LDEBR       R1,R2             [RRE] */
+        FP_HELPER(ldebr); break;
+    case 0x5: /* LXDBR       R1,R2             [RRE] */
+        FP_HELPER(lxdbr); break;
+    case 0x9: /* CEBR        R1,R2             [RRE] */
+        FP_HELPER_CC(cebr); break;
+    case 0xa: /* AEBR        R1,R2             [RRE] */
+        FP_HELPER_CC(aebr); break;
+    case 0xb: /* SEBR        R1,R2             [RRE] */
+        FP_HELPER_CC(sebr); break;
+    case 0xd: /* DEBR        R1,R2             [RRE] */
+        FP_HELPER(debr); break;
+    case 0x10: /* LPDBR       R1,R2             [RRE] */
+        FP_HELPER_CC(lpdbr); break;
+    case 0x12: /* LTDBR       R1,R2             [RRE] */
+        FP_HELPER_CC(ltdbr); break;
+    case 0x13: /* LCDBR       R1,R2             [RRE] */
+        FP_HELPER_CC(lcdbr); break;
+    case 0x15: /* SQDBR       R1,R2             [RRE] */
+        FP_HELPER(sqdbr); break;
+    case 0x17: /* MEEBR       R1,R2             [RRE] */
+        FP_HELPER(meebr); break;
+    case 0x19: /* CDBR        R1,R2             [RRE] */
+        FP_HELPER_CC(cdbr); break;
+    case 0x1a: /* ADBR        R1,R2             [RRE] */
+        FP_HELPER_CC(adbr); break;
+    case 0x1b: /* SDBR        R1,R2             [RRE] */
+        FP_HELPER_CC(sdbr); break;
+    case 0x1c: /* MDBR        R1,R2             [RRE] */
+        FP_HELPER(mdbr); break;
+    case 0x1d: /* DDBR        R1,R2             [RRE] */
+        FP_HELPER(ddbr); break;
+    case 0xe: /* MAEBR  R1,R3,R2 [RRF] */
+    case 0x1e: /* MADBR R1,R3,R2 [RRF] */
+    case 0x1f: /* MSDBR R1,R3,R2 [RRF] */
+        /* for RRF insns, m3 is R1, r1 is R3, and r2 is R2 */
+        tmp = tcg_const_i32(m3);
+        tmp2 = tcg_const_i32(r2);
+        tmp3 = tcg_const_i32(r1);
+        switch (op) {
+        case 0xe: gen_helper_maebr(tmp, tmp3, tmp2); break;
+        case 0x1e: gen_helper_madbr(tmp, tmp3, tmp2); break;
+        case 0x1f: gen_helper_msdbr(tmp, tmp3, tmp2); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0x40: /* LPXBR       R1,R2             [RRE] */
+        FP_HELPER_CC(lpxbr); break;
+    case 0x42: /* LTXBR       R1,R2             [RRE] */
+        FP_HELPER_CC(ltxbr); break;
+    case 0x43: /* LCXBR       R1,R2             [RRE] */
+        FP_HELPER_CC(lcxbr); break;
+    case 0x44: /* LEDBR       R1,R2             [RRE] */
+        FP_HELPER(ledbr); break;
+    case 0x45: /* LDXBR       R1,R2             [RRE] */
+        FP_HELPER(ldxbr); break;
+    case 0x46: /* LEXBR       R1,R2             [RRE] */
+        FP_HELPER(lexbr); break;
+    case 0x49: /* CXBR        R1,R2             [RRE] */
+        FP_HELPER_CC(cxbr); break;
+    case 0x4a: /* AXBR        R1,R2             [RRE] */
+        FP_HELPER_CC(axbr); break;
+    case 0x4b: /* SXBR        R1,R2             [RRE] */
+        FP_HELPER_CC(sxbr); break;
+    case 0x4c: /* MXBR        R1,R2             [RRE] */
+        FP_HELPER(mxbr); break;
+    case 0x4d: /* DXBR        R1,R2             [RRE] */
+        FP_HELPER(dxbr); break;
+    case 0x65: /* LXR         R1,R2             [RRE] */
+        tmp = load_freg(r2);
+        store_freg(r1, tmp);
+        tmp2 = load_freg(r2 + 2); /* own temp, so the first half in tmp is still freed */
+        store_freg(r1 + 2, tmp2);
+        break;
+    case 0x74: /* LZER        R1                [RRE] */
+        tmp = tcg_const_i32(r1);
+        gen_helper_lzer(tmp);
+        break;
+    case 0x75: /* LZDR        R1                [RRE] */
+        tmp = tcg_const_i32(r1);
+        gen_helper_lzdr(tmp);
+        break;
+    case 0x76: /* LZXR        R1                [RRE] */
+        tmp = tcg_const_i32(r1);
+        gen_helper_lzxr(tmp);
+        break;
+    case 0x84: /* SFPC        R1                [RRE] */
+        tmp = load_reg32(r1);
+        tcg_gen_st_i32(tmp, cpu_env, offsetof(CPUState, fpc));
+        break;
+    case 0x8c: /* EFPC        R1                [RRE] */
+        tmp = tcg_temp_new_i32();
+        tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUState, fpc));
+        store_reg32(r1, tmp);
+        break;
+    case 0x94: /* CEFBR       R1,R2             [RRE] */
+    case 0x95: /* CDFBR       R1,R2             [RRE] */
+    case 0x96: /* CXFBR       R1,R2             [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = load_reg32(r2);
+        switch (op) {
+        case 0x94: gen_helper_cefbr(tmp, tmp2); break;
+        case 0x95: gen_helper_cdfbr(tmp, tmp2); break;
+        case 0x96: gen_helper_cxfbr(tmp, tmp2); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0x98: /* CFEBR       R1,R2             [RRE] */
+    case 0x99: /* CFDBR       R1,R2             [RRE] */
+    case 0x9a: /* CFXBR       R1,R2             [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = tcg_const_i32(r2);
+        tmp3 = tcg_const_i32(m3);
+        switch (op) {
+        case 0x98: gen_helper_cfebr(cc, tmp, tmp2, tmp3); break;
+        case 0x99: gen_helper_cfdbr(cc, tmp, tmp2, tmp3); break;
+        case 0x9a: gen_helper_cfxbr(cc, tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0xa4: /* CEGBR       R1,R2             [RRE] */
+    case 0xa5: /* CDGBR       R1,R2             [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = load_reg(r2);
+        switch (op) {
+        case 0xa4: gen_helper_cegbr(tmp, tmp2); break;
+        case 0xa5: gen_helper_cdgbr(tmp, tmp2); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0xa6: /* CXGBR       R1,R2             [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = load_reg(r2);
+        gen_helper_cxgbr(tmp, tmp2);
+        break;
+    case 0xa8: /* CGEBR       R1,R2             [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = tcg_const_i32(r2);
+        tmp3 = tcg_const_i32(m3);
+        gen_helper_cgebr(cc, tmp, tmp2, tmp3);
+        break;
+    case 0xa9: /* CGDBR       R1,R2             [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = tcg_const_i32(r2);
+        tmp3 = tcg_const_i32(m3);
+        gen_helper_cgdbr(cc, tmp, tmp2, tmp3);
+        break;
+    case 0xaa: /* CGXBR       R1,R2             [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = tcg_const_i32(r2);
+        tmp3 = tcg_const_i32(m3);
+        gen_helper_cgxbr(cc, tmp, tmp2, tmp3);
+        break;
+    default:
+        LOG_DISAS("illegal b3 operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static void disas_b9(DisasContext *s, int op, int r1, int r2)
+{
+    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
+    LOG_DISAS("disas_b9: op 0x%x r1 %d r2 %d\n", op, r1, r2);
+    switch (op) {
+    case 0: /* LPGR     R1,R2     [RRE] */
+    case 0x10: /* LPGFR R1,R2 [RRE] */
+        if (op == 0) {
+            tmp2 = load_reg(r2);
+        }
+        else {
+            tmp2 = load_reg32(r2);
+            tcg_gen_ext32s_i64(tmp2, tmp2);
+        }
+        tmp = tcg_const_i32(r1);
+        gen_helper_abs_i64(cc, tmp, tmp2);
+        break;
+    case 1: /* LNGR     R1,R2     [RRE] */
+        tmp2 = load_reg(r2);
+        tmp = tcg_const_i32(r1);
+        gen_helper_nabs_i64(cc, tmp, tmp2);
+        break;
+    case 2: /* LTGR R1,R2 [RRE] */
+        tmp = load_reg(r2);
+        if (r1 != r2) store_reg(r1, tmp);
+        set_cc_s64(tmp);
+        break;
+    case 3: /* LCGR     R1,R2     [RRE] */
+    case 0x13: /* LCGFR    R1,R2     [RRE] */
+        if (op == 0x13) {
+            tmp = load_reg32(r2);
+            tcg_gen_ext32s_i64(tmp, tmp);
+        }
+        else {
+            tmp = load_reg(r2);
+        }
+        tcg_gen_neg_i64(tmp, tmp);
+        store_reg(r1, tmp);
+        gen_helper_set_cc_comp_s64(cc, tmp);
+        break;
+    case 4: /* LGR R1,R2 [RRE] */
+        tmp = load_reg(r2);
+        store_reg(r1, tmp);
+        break;
+    case 0x6: /* LGBR R1,R2 [RRE] */
+        tmp2 = load_reg(r2);
+        tcg_gen_ext8s_i64(tmp2, tmp2);
+        store_reg(r1, tmp2);
+        break;
+    case 8: /* AGR     R1,R2     [RRE] */
+    case 0xa: /* ALGR     R1,R2     [RRE] */
+        tmp = load_reg(r1);
+        tmp2 = load_reg(r2);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_add_i64(tmp3, tmp, tmp2);
+        store_reg(r1, tmp3);
+        switch (op) {
+        case 0x8: gen_set_cc_add64(tmp, tmp2, tmp3); break;
+        case 0xa: gen_helper_set_cc_addu64(cc, tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        break;
+    case 9: /* SGR     R1,R2     [RRE] */
+    case 0xb: /* SLGR     R1,R2     [RRE] */
+    case 0x1b: /* SLGFR     R1,R2     [RRE] */
+    case 0x19: /* SGFR     R1,R2     [RRE] */
+        tmp = load_reg(r1);
+        switch (op) {
+        case 0x1b: case 0x19:
+            tmp2 = load_reg32(r2);
+            if (op == 0x19) tcg_gen_ext32s_i64(tmp2, tmp2);
+            else tcg_gen_ext32u_i64(tmp2, tmp2);
+            break;
+        default:
+            tmp2 = load_reg(r2);
+            break;
+        }
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_sub_i64(tmp3, tmp, tmp2);
+        store_reg(r1, tmp3);
+        switch (op) {
+        case 9: case 0x19: gen_helper_set_cc_sub64(cc, tmp, tmp2, tmp3); break;
+        case 0xb: case 0x1b: gen_helper_set_cc_subu64(cc, tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0xc: /* MSGR      R1,R2     [RRE] */
+    case 0x1c: /* MSGFR      R1,R2     [RRE] */
+        tmp = load_reg(r1);
+        tmp2 = load_reg(r2);
+        if (op == 0x1c) tcg_gen_ext32s_i64(tmp2, tmp2);
+        tcg_gen_mul_i64(tmp, tmp, tmp2);
+        store_reg(r1, tmp);
+        break;
+    case 0xd: /* DSGR      R1,R2     [RRE] */
+    case 0x1d: /* DSGFR      R1,R2     [RRE] */
+        tmp = load_reg(r1 + 1);
+        if (op == 0xd) {
+            tmp2 = load_reg(r2);
+        }
+        else {
+            tmp2 = load_reg32(r2);
+            tcg_gen_ext32s_i64(tmp2, tmp2);
+        }
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_div_i64(tmp3, tmp, tmp2);
+        store_reg(r1 + 1, tmp3);
+        tcg_gen_rem_i64(tmp3, tmp, tmp2);
+        store_reg(r1, tmp3);
+        break;
+    case 0x14: /* LGFR     R1,R2     [RRE] */
+        tmp = load_reg32(r2);
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_ext32s_i64(tmp2, tmp);
+        store_reg(r1, tmp2);
+        break;
+    case 0x16: /* LLGFR      R1,R2     [RRE] */
+        tmp = load_reg32(r2);
+        tcg_gen_ext32u_i64(tmp, tmp);
+        store_reg(r1, tmp);
+        break;
+    case 0x17: /* LLGTR      R1,R2     [RRE] */
+        tmp = load_reg32(r2);
+        tcg_gen_andi_i32(tmp, tmp, 0x7fffffffUL);
+        tcg_gen_ext32u_i64(tmp, tmp);
+        store_reg(r1, tmp);
+        break;
+    case 0x18: /* AGFR     R1,R2     [RRE] */
+    case 0x1a: /* ALGFR     R1,R2     [RRE] */
+        tmp2 = load_reg32(r2);
+        switch (op) {
+        case 0x18: tcg_gen_ext32s_i64(tmp2, tmp2); break;
+        case 0x1a: tcg_gen_ext32u_i64(tmp2, tmp2); break;
+        default: tcg_abort();
+        }
+        tmp = load_reg(r1);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_add_i64(tmp3, tmp, tmp2);
+        store_reg(r1, tmp3);
+        switch (op) {
+        case 0x18: gen_set_cc_add64(tmp, tmp2, tmp3); break;
+        case 0x1a: gen_helper_set_cc_addu64(cc, tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        break;
+    case 0x20: /* CGR     R1,R2     [RRE] */
+    case 0x30: /* CGFR     R1,R2     [RRE] */
+        tmp2 = load_reg(r2);
+        if (op == 0x30) tcg_gen_ext32s_i64(tmp2, tmp2);
+        tmp = load_reg(r1);
+        cmp_s64(tmp, tmp2);
+        break;
+    case 0x21: /* CLGR     R1,R2     [RRE] */
+    case 0x31: /* CLGFR    R1,R2     [RRE] */
+        tmp2 = load_reg(r2);
+        if (op == 0x31) tcg_gen_ext32u_i64(tmp2, tmp2);
+        tmp = load_reg(r1);
+        cmp_u64(tmp, tmp2);
+        break;
+    case 0x26: /* LBR R1,R2 [RRE] */
+        tmp2 = load_reg32(r2);
+        tcg_gen_ext8s_i32(tmp2, tmp2);
+        store_reg32(r1, tmp2);
+        break;
+    case 0x27: /* LHR R1,R2 [RRE] */
+        tmp2 = load_reg32(r2);
+        tcg_gen_ext16s_i32(tmp2, tmp2);
+        store_reg32(r1, tmp2);
+        break;
+    case 0x80: /* NGR R1,R2 [RRE] */
+    case 0x81: /* OGR R1,R2 [RRE] */
+    case 0x82: /* XGR R1,R2 [RRE] */
+        tmp = load_reg(r1);
+        tmp2 = load_reg(r2);
+        switch (op) {
+        case 0x80: tcg_gen_and_i64(tmp, tmp, tmp2); break;
+        case 0x81: tcg_gen_or_i64(tmp, tmp, tmp2); break;
+        case 0x82: tcg_gen_xor_i64(tmp, tmp, tmp2); break;
+        default: tcg_abort();
+        }
+        store_reg(r1, tmp);
+        set_cc_nz_u64(tmp);
+        break;
+    case 0x83: /* FLOGR R1,R2 [RRE] */
+        tmp2 = load_reg(r2);
+        tmp = tcg_const_i32(r1);
+        gen_helper_flogr(cc, tmp, tmp2);
+        break;
+    case 0x84: /* LLGCR R1,R2 [RRE] */
+        tmp = load_reg(r2);
+        tcg_gen_andi_i64(tmp, tmp, 0xff);
+        store_reg(r1, tmp);
+        break;
+    case 0x85: /* LLGHR R1,R2 [RRE] */
+        tmp = load_reg(r2);
+        tcg_gen_andi_i64(tmp, tmp, 0xffff);
+        store_reg(r1, tmp);
+        break;
+    case 0x87: /* DLGR      R1,R2     [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = load_reg(r2);
+        gen_helper_dlg(tmp, tmp2);
+        break;
+    case 0x88: /* ALCGR     R1,R2     [RRE] */
+        tmp = load_reg(r1);
+        tmp2 = load_reg(r2);
+        tmp3 = tcg_temp_new_i64();
+        tcg_gen_shri_i64(tmp3, cc, 1);
+        tcg_gen_andi_i64(tmp3, tmp3, 1);
+        tcg_gen_add_i64(tmp3, tmp2, tmp3);
+        tcg_gen_add_i64(tmp3, tmp, tmp3);
+        store_reg(r1, tmp3);
+        gen_helper_set_cc_addc_u64(cc, tmp, tmp2, tmp3);
+        break;
+    case 0x89: /* SLBGR   R1,R2     [RRE] */
+        tmp = load_reg(r1);
+        tmp2 = load_reg(r2);
+        tmp3 = tcg_const_i32(r1);
+        gen_helper_slbg(cc, cc, tmp3, tmp, tmp2);
+        break;
+    case 0x94: /* LLCR R1,R2 [RRE] */
+        tmp = load_reg32(r2);
+        tcg_gen_andi_i32(tmp, tmp, 0xff);
+        store_reg32(r1, tmp);
+        break;
+    case 0x95: /* LLHR R1,R2 [RRE] */
+        tmp = load_reg32(r2);
+        tcg_gen_andi_i32(tmp, tmp, 0xffff);
+        store_reg32(r1, tmp);
+        break;
+    case 0x98: /* ALCR    R1,R2     [RRE] */
+        tmp = tcg_const_i32(r1);
+        tmp2 = load_reg32(r2);
+        gen_helper_addc_u32(cc, cc, tmp, tmp2);
+        break;
+    case 0x99: /* SLBR    R1,R2     [RRE] */
+        tmp = load_reg32(r1);
+        tmp2 = load_reg32(r2);
+        tmp3 = tcg_const_i32(r1);
+        gen_helper_slb(cc, cc, tmp3, tmp, tmp2);
+        break;
+    default:
+        LOG_DISAS("illegal b9 operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static void disas_c0(DisasContext *s, int op, int r1, int i2)
+{
+    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
+    LOG_DISAS("disas_c0: op 0x%x r1 %d i2 %d\n", op, r1, i2);
+    uint64_t target = s->pc + (int64_t)i2 * 2; /* widen first: i2 * 2 can overflow int */
+    /* FIXME: why does the target need truncating to 32 bits? */ target &= 0xffffffff;
+    switch (op) {
+    case 0: /* LARL R1,I2 [RIL] */
+        tmp = tcg_const_i64(target);
+        store_reg(r1, tmp);
+        break;
+    case 0x1: /* LGFI R1,I2 [RIL] */
+        tmp = tcg_const_i64((int64_t)i2);
+        store_reg(r1, tmp);
+        break;
+    case 0x4: /* BRCL     M1,I2     [RIL] */
+        tmp = tcg_const_i32(r1); /* aka m1 */
+        tmp2 = tcg_const_i64(s->pc);
+        tmp3 = tcg_const_i64((int64_t)i2 * 2); /* widen first: i2 * 2 can overflow int */
+        gen_helper_brcl(cc, tmp, tmp2, tmp3);
+        s->is_jmp = DISAS_JUMP;
+        break;
+    case 0x5: /* BRASL R1,I2 [RIL] */
+        tmp = tcg_const_i64(s->pc + 6);
+        store_reg(r1, tmp);
+        tmp2 = tcg_const_i64(target); /* own temp, so the link address in tmp is still freed */
+        tcg_gen_st_i64(tmp2, cpu_env, offsetof(CPUState, psw.addr));
+        s->is_jmp = DISAS_JUMP;
+        break;
+    case 0x7: /* XILF R1,I2 [RIL] */
+    case 0xb: /* NILF R1,I2 [RIL] */
+    case 0xd: /* OILF R1,I2 [RIL] */
+        tmp = load_reg32(r1);
+        switch (op) {
+        case 0x7: tcg_gen_xori_i32(tmp, tmp, (uint32_t)i2); break;
+        case 0xb: tcg_gen_andi_i32(tmp, tmp, (uint32_t)i2); break;
+        case 0xd: tcg_gen_ori_i32(tmp, tmp, (uint32_t)i2); break;
+        default: tcg_abort();
+        }
+        store_reg32(r1, tmp);
+        set_cc_nz_u32(tmp); /* tmp is already 32-bit (load_reg32), no truncation needed */
+        break;
+    case 0x9: /* IILF R1,I2 [RIL] */
+        tmp = tcg_const_i32((uint32_t)i2);
+        store_reg32(r1, tmp);
+        break;
+    case 0xa: /* NIHF R1,I2 [RIL] */
+        tmp = load_reg(r1);
+        switch (op) {
+        case 0xa: tcg_gen_andi_i64(tmp, tmp, (((uint64_t)((uint32_t)i2)) << 32) | 0xffffffffULL); break;
+        default: tcg_abort();
+        }
+        store_reg(r1, tmp);
+        tcg_gen_shri_i64(tmp, tmp, 32);
+        tcg_gen_trunc_i64_i32(tmp, tmp);
+        set_cc_nz_u32(tmp);
+        break;
+    case 0xe: /* LLIHF R1,I2 [RIL] */
+        tmp = tcg_const_i64(((uint64_t)(uint32_t)i2) << 32);
+        store_reg(r1, tmp);
+        break;
+    case 0xf: /* LLILF R1,I2 [RIL] */
+        tmp = tcg_const_i64((uint32_t)i2);
+        store_reg(r1, tmp);
+        break;
+    default:
+        LOG_DISAS("illegal c0 operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static void disas_c2(DisasContext *s, int op, int r1, int i2)
+{
+    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
+    switch (op) {
+    case 0x4: /* SLGFI R1,I2 [RIL] */
+    case 0xa: /* ALGFI R1,I2 [RIL] */
+        tmp = load_reg(r1);
+        tmp2 = tcg_const_i64((uint64_t)(uint32_t)i2);
+        tmp3 = tcg_temp_new_i64();
+        switch (op) {
+        case 0x4:
+            tcg_gen_sub_i64(tmp3, tmp, tmp2);
+            gen_helper_set_cc_subu64(cc, tmp, tmp2, tmp3);
+            break;
+        case 0xa:
+            tcg_gen_add_i64(tmp3, tmp, tmp2);
+            gen_helper_set_cc_addu64(cc, tmp, tmp2, tmp3);
+            break;
+        default: tcg_abort();
+        }
+        store_reg(r1, tmp3);
+        break;
+    case 0x5: /* SLFI R1,I2 [RIL] */
+    case 0xb: /* ALFI R1,I2 [RIL] */
+        tmp = load_reg32(r1);
+        tmp2 = tcg_const_i32(i2);
+        tmp3 = tcg_temp_new_i32();
+        switch (op) {
+        case 0x5:
+            tcg_gen_sub_i32(tmp3, tmp, tmp2);
+            gen_helper_set_cc_subu32(cc, tmp, tmp2, tmp3);
+            break;
+        case 0xb:
+            tcg_gen_add_i32(tmp3, tmp, tmp2);
+            gen_helper_set_cc_addu32(cc, tmp, tmp2, tmp3);
+            break;
+        default: tcg_abort();
+        }
+        store_reg32(r1, tmp3);
+        break;
+    case 0xc: /* CGFI R1,I2 [RIL] */
+        tmp = load_reg(r1);
+        cmp_s64c(tmp, (int64_t)i2);
+        break;
+    case 0xe: /* CLGFI R1,I2 [RIL] */
+        tmp = load_reg(r1);
+        cmp_u64c(tmp, (uint64_t)(uint32_t)i2);
+        break;
+    case 0xd: /* CFI R1,I2 [RIL] */
+    case 0xf: /* CLFI R1,I2 [RIL] */
+        tmp = load_reg32(r1);
+        switch (op) {
+        case 0xd: cmp_s32c(tmp, i2); break;
+        case 0xf: cmp_u32c(tmp, i2); break;
+        default: tcg_abort();
+        }
+        break;
+    default:
+        LOG_DISAS("illegal c2 operation 0x%x\n", op);
+        gen_illegal_opcode(s);
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static inline uint64_t ld_code2(uint64_t pc) { return (uint64_t)lduw_code(pc); }
+static inline uint64_t ld_code4(uint64_t pc) { return (uint64_t)ldl_code(pc); }
+static inline uint64_t ld_code6(uint64_t pc)
+{
+    uint64_t opc;
+    opc = (uint64_t)lduw_code(pc) << 32;
+    opc |= (uint64_t)(unsigned int)ldl_code(pc+2);
+    return opc;
+}
+
+static void disas_s390_insn(CPUState *env, DisasContext *s)
+{
+    TCGv tmp = 0,tmp2 = 0,tmp3 = 0;
+    unsigned char opc;
+    uint64_t insn;
+    int op, r1, r2, r3, d1, d2, x2, b1, b2, i, i2, r1b;
+    TCGv vl, vd1, vd2, vb;
+    
+    opc = ldub_code(s->pc);
+    LOG_DISAS("opc 0x%x\n", opc);
+
+#define FETCH_DECODE_RR \
+    insn = ld_code2(s->pc); \
+    DEBUGINSN \
+    r1 = (insn >> 4) & 0xf; \
+    r2 = insn & 0xf;
+
+#define FETCH_DECODE_RX \
+    insn = ld_code4(s->pc); \
+    DEBUGINSN \
+    r1 = (insn >> 20) & 0xf; \
+    x2 = (insn >> 16) & 0xf; \
+    b2 = (insn >> 12) & 0xf; \
+    d2 = insn & 0xfff; \
+    tmp = get_address(x2, b2, d2);
+
+#define FETCH_DECODE_RS \
+    insn = ld_code4(s->pc); \
+    DEBUGINSN \
+    r1 = (insn >> 20) & 0xf; \
+    r3 = (insn >> 16) & 0xf; /* aka m3 */ \
+    b2 = (insn >> 12) & 0xf; \
+    d2 = insn & 0xfff;
+        
+#define FETCH_DECODE_SI \
+    insn = ld_code4(s->pc); \
+    i2 = (insn >> 16) & 0xff; \
+    b1 = (insn >> 12) & 0xf; \
+    d1 = insn & 0xfff; \
+    tmp = get_address(0, b1, d1);
+
+    switch (opc) {
+    case 0x7: /* BCR    M1,R2     [RR] */
+        FETCH_DECODE_RR
+        if (r2) {
+            gen_bcr(r1, r2, s->pc);
+            s->is_jmp = DISAS_JUMP;
+        }
+        else {
+            /* FIXME: "serialization and checkpoint-synchronization function"? */
+        }
+        s->pc += 2;
+        break;
+    case 0xa: /* SVC    I         [RR] */
+        insn = ld_code2(s->pc);
+        DEBUGINSN
+        i = insn & 0xff;
+        tmp = tcg_const_i64(s->pc);
+        tcg_gen_st_i64(tmp, cpu_env, offsetof(CPUState, psw.addr));
+        s->is_jmp = DISAS_SVC;
+        s->pc += 2;
+        break;
+    case 0xd: /* BASR   R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp = tcg_const_i64(s->pc + 2);
+        store_reg(r1, tmp);
+        if (r2) {
+            tmp2 = load_reg(r2);
+            tcg_gen_st_i64(tmp2, cpu_env, offsetof(CPUState, psw.addr));
+            s->is_jmp = DISAS_JUMP;
+        }
+        s->pc += 2;
+        break;
+    case 0x10: /* LPR    R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp2 = load_reg32(r2);
+        tmp = tcg_const_i32(r1);
+        gen_helper_abs_i32(cc, tmp, tmp2);
+        s->pc += 2;
+        break;
+    case 0x11: /* LNR    R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp2 = load_reg32(r2);
+        tmp = tcg_const_i32(r1);
+        gen_helper_nabs_i32(cc, tmp, tmp2);
+        s->pc += 2;
+        break;
+    case 0x12: /* LTR    R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp = load_reg32(r2);
+        if (r1 != r2) store_reg32(r1, tmp);
+        set_cc_s32(tmp);
+        s->pc += 2;
+        break;
+    case 0x13: /* LCR    R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp = load_reg32(r2);
+        tcg_gen_neg_i32(tmp, tmp);
+        store_reg32(r1, tmp);
+        gen_helper_set_cc_comp_s32(cc, tmp);
+        s->pc += 2;
+        break;
+    case 0x14: /* NR     R1,R2     [RR] */
+    case 0x16: /* OR     R1,R2     [RR] */
+    case 0x17: /* XR     R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp2 = load_reg32(r2);
+        tmp = load_reg32(r1);
+        switch (opc) {
+        case 0x14: tcg_gen_and_i32(tmp, tmp, tmp2); break;
+        case 0x16: tcg_gen_or_i32(tmp, tmp, tmp2); break;
+        case 0x17: tcg_gen_xor_i32(tmp, tmp, tmp2); break;
+        default: tcg_abort();
+        }
+        store_reg32(r1, tmp);
+        set_cc_nz_u32(tmp);
+        s->pc += 2;
+        break;
+    case 0x18: /* LR     R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp = load_reg32(r2);
+        store_reg32(r1, tmp);
+        s->pc += 2;
+        break;
+    case 0x15: /* CLR    R1,R2     [RR] */
+    case 0x19: /* CR     R1,R2     [RR] */ 
+        FETCH_DECODE_RR
+        tmp = load_reg32(r1);
+        tmp2 = load_reg32(r2);
+        switch (opc) {
+        case 0x15: cmp_u32(tmp, tmp2); break;
+        case 0x19: cmp_s32(tmp, tmp2); break;
+        default: tcg_abort();
+        }
+        s->pc += 2;
+        break;
+    case 0x1a: /* AR     R1,R2     [RR] */
+    case 0x1e: /* ALR    R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp = load_reg32(r1);
+        tmp2 = load_reg32(r2);
+        tmp3 = tcg_temp_new_i32();
+        tcg_gen_add_i32(tmp3, tmp, tmp2);
+        store_reg32(r1, tmp3);
+        switch (opc) {
+        case 0x1a: gen_helper_set_cc_add32(cc, tmp, tmp2, tmp3); break;
+        case 0x1e: gen_helper_set_cc_addu32(cc, tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        s->pc += 2;
+        break;
+    case 0x1b: /* SR     R1,R2     [RR] */
+    case 0x1f: /* SLR    R1,R2     [RR] */
+        FETCH_DECODE_RR
+        tmp = load_reg32(r1);
+        tmp2 = load_reg32(r2);
+        tmp3 = tcg_temp_new_i32();
+        tcg_gen_sub_i32(tmp3, tmp, tmp2);
+        store_reg32(r1, tmp3);
+        switch (opc) {
+        case 0x1b: gen_helper_set_cc_sub32(cc, tmp, tmp2, tmp3); break;
+        case 0x1f: gen_helper_set_cc_subu32(cc, tmp, tmp2, tmp3); break;
+        default: tcg_abort();
+        }
+        s->pc += 2;
+        break;
+    case 0x28: /* LDR    R1,R2               [RR] */
+        FETCH_DECODE_RR
+        tmp = load_freg(r2);
+        store_freg(r1, tmp);
+        s->pc += 2;
+        break;
+    case 0x38: /* LER    R1,R2               [RR] */
+        FETCH_DECODE_RR
+        tmp = load_freg32(r2);
+        store_freg32(r1, tmp);
+        s->pc += 2;
+        break;
+    case 0x40: /* STH    R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = load_reg32(r1);
+        tcg_gen_qemu_st16(tmp2, tmp, 1);
+        s->pc += 4;
+        break;
+    case 0x41: /* LA     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        store_reg(r1, tmp); /* FIXME: 31/24-bit addressing */
+        s->pc += 4;
+        break;
+    case 0x42: /* STC    R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = load_reg32(r1);
+        tcg_gen_qemu_st8(tmp2, tmp, 1);
+        s->pc += 4;
+        break;
+    case 0x43: /* IC     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
+        store_reg8(r1, tmp2);
+        s->pc += 4;
+        break;
+    case 0x44: /* EX     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = load_reg(r1);
+        tmp3 = tcg_const_i64(s->pc + 4);
+        gen_helper_ex(cc, cc, tmp2, tmp, tmp3);
+        s->pc += 4;
+        break;
+    case 0x47: /* BC     M1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        /* FIXME: optimize m1 == 0xf (unconditional) case */
+        tmp2 = tcg_const_i32(r1); /* aka m1 */
+        tmp3 = tcg_const_i64(s->pc);
+        gen_helper_bc(cc, tmp2, tmp, tmp3);
+        s->is_jmp = DISAS_JUMP;
+        s->pc += 4;
+        break;
+    case 0x48: /* LH     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
+        store_reg32(r1, tmp2);
+        s->pc += 4;
+        break;
+    case 0x49: /* CH     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
+        tmp = load_reg32(r1);
+        cmp_s32(tmp, tmp2);
+        s->pc += 4;
+        break;
+    case 0x4a: /* AH     R1,D2(X2,B2)     [RX] */
+    case 0x4b: /* SH     R1,D2(X2,B2)     [RX] */
+    case 0x4c: /* MH     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
+        tmp = load_reg32(r1);
+        tmp3 = tcg_temp_new_i32();
+        switch (opc) {
+        case 0x4a:
+            tcg_gen_add_i32(tmp3, tmp, tmp2);
+            gen_helper_set_cc_add32(cc, tmp, tmp2, tmp3);
+            break;
+        case 0x4b:
+            tcg_gen_sub_i32(tmp3, tmp, tmp2);
+            gen_helper_set_cc_sub32(cc, tmp, tmp2, tmp3);
+            break;
+        case 0x4c:
+            tcg_gen_mul_i32(tmp3, tmp, tmp2);
+            break;
+        default: tcg_abort();
+        }
+        store_reg32(r1, tmp3);
+        s->pc += 4;
+        break;
+    case 0x50: /* ST     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = load_reg32(r1);
+        tcg_gen_qemu_st32(tmp2, tmp, 1);
+        s->pc += 4;
+        break;
+    case 0x55: /* CL     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        tmp = load_reg32(r1);
+        cmp_u32(tmp, tmp2);
+        s->pc += 4;
+        break;
+    case 0x54: /* N      R1,D2(X2,B2)     [RX] */
+    case 0x56: /* O      R1,D2(X2,B2)     [RX] */
+    case 0x57: /* X      R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        tmp = load_reg32(r1);
+        switch (opc) {
+        case 0x54: tcg_gen_and_i32(tmp, tmp, tmp2); break;
+        case 0x56: tcg_gen_or_i32(tmp, tmp, tmp2); break;
+        case 0x57: tcg_gen_xor_i32(tmp, tmp, tmp2); break;
+        default: tcg_abort();
+        }
+        store_reg32(r1, tmp);
+        set_cc_nz_u32(tmp);
+        s->pc += 4;
+        break;
+    case 0x58: /* L      R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        store_reg32(r1, tmp2);
+        s->pc += 4;
+        break;
+    case 0x59: /* C      R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32s(tmp2, tmp, 1);
+        tmp = load_reg32(r1);
+        cmp_s32(tmp, tmp2);
+        s->pc += 4;
+        break;
+    case 0x5a: /* A      R1,D2(X2,B2)     [RX] */
+    case 0x5b: /* S      R1,D2(X2,B2)     [RX] */
+    case 0x5e: /* AL     R1,D2(X2,B2)     [RX] */
+    case 0x5f: /* SL     R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = load_reg32(r1);
+        tcg_gen_qemu_ld32s(tmp, tmp, 1);
+        tmp3 = tcg_temp_new_i32();
+        switch (opc) {
+        case 0x5a: case 0x5e: tcg_gen_add_i32(tmp3, tmp2, tmp); break;
+        case 0x5b: case 0x5f: tcg_gen_sub_i32(tmp3, tmp2, tmp); break;
+        default: tcg_abort();
+        }
+        store_reg32(r1, tmp3);
+        switch (opc) {
+        case 0x5a: gen_helper_set_cc_add32(cc, tmp2, tmp, tmp3); break;
+        case 0x5e: gen_helper_set_cc_addu32(cc, tmp2, tmp, tmp3); break;
+        case 0x5b: gen_helper_set_cc_sub32(cc, tmp2, tmp, tmp3); break;
+        case 0x5f: gen_helper_set_cc_subu32(cc, tmp2, tmp, tmp3); break;
+        default: tcg_abort();
+        }
+        s->pc += 4;
+        break;
+    case 0x60: /* STD    R1,D2(X2,B2)        [RX] */
+        FETCH_DECODE_RX
+        tmp2 = load_freg(r1);
+        tcg_gen_qemu_st64(tmp2, tmp, 1);
+        s->pc += 4;
+        break;
+    case 0x68: /* LD    R1,D2(X2,B2)        [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i64();
+        tcg_gen_qemu_ld64(tmp2, tmp, 1);
+        store_freg(r1, tmp2);
+        s->pc += 4;
+        break;
+    case 0x70: /* STE R1,D2(X2,B2) [RX] */
+        FETCH_DECODE_RX
+        tmp2 = load_freg32(r1);
+        tcg_gen_qemu_st32(tmp2, tmp, 1);
+        s->pc += 4;
+        break;
+    case 0x71: /* MS      R1,D2(X2,B2)     [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32s(tmp2, tmp, 1);
+        tmp = load_reg32(r1);
+        tcg_gen_mul_i32(tmp, tmp, tmp2);
+        store_reg32(r1, tmp);
+        s->pc += 4;
+        break;
+    case 0x78: /* LE     R1,D2(X2,B2)        [RX] */
+        FETCH_DECODE_RX
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+        store_freg32(r1, tmp2);
+        s->pc += 4;
+        break;
+    case 0x88: /* SRL    R1,D2(B2)        [RS] */
+    case 0x89: /* SLL    R1,D2(B2)        [RS] */
+    case 0x8a: /* SRA    R1,D2(B2)        [RS] */
+        FETCH_DECODE_RS
+        tmp = get_address(0, b2, d2);
+        tcg_gen_andi_i64(tmp, tmp, 0x3f);
+        tmp2 = load_reg32(r1);
+        switch (opc) {
+        case 0x88: tcg_gen_shr_i32(tmp2, tmp2, tmp); break;
+        case 0x89: tcg_gen_shl_i32(tmp2, tmp2, tmp); break;
+        case 0x8a: tcg_gen_sar_i32(tmp2, tmp2, tmp); break;
+        default: tcg_abort();
+        }
+        store_reg32(r1, tmp2);
+        if (opc == 0x8a) set_cc_s32(tmp2);
+        s->pc += 4;
+        break;
+    case 0x91: /* TM     D1(B1),I2        [SI] */
+        FETCH_DECODE_SI
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
+        tmp = tcg_const_i32(i2);
+        gen_helper_tm(cc, tmp2, tmp);
+        s->pc += 4;
+        break;
+    case 0x92: /* MVI    D1(B1),I2        [SI] */
+        FETCH_DECODE_SI
+        tmp2 = tcg_const_i32(i2);
+        tcg_gen_qemu_st8(tmp2, tmp, 1);
+        s->pc += 4;
+        break;
+    case 0x94: /* NI     D1(B1),I2        [SI] */
+    case 0x96: /* OI     D1(B1),I2        [SI] */
+    case 0x97: /* XI     D1(B1),I2        [SI] */
+        FETCH_DECODE_SI
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
+        switch (opc) {
+        case 0x94: tcg_gen_andi_i32(tmp2, tmp2, i2); break;
+        case 0x96: tcg_gen_ori_i32(tmp2, tmp2, i2); break;
+        case 0x97: tcg_gen_xori_i32(tmp2, tmp2, i2); break;
+        default: tcg_abort();
+        }
+        tcg_gen_qemu_st8(tmp2, tmp, 1);
+        set_cc_nz_u32(tmp2);
+        s->pc += 4;
+        break;
+    case 0x95: /* CLI    D1(B1),I2        [SI] */
+        FETCH_DECODE_SI
+        tmp2 = tcg_temp_new_i32();
+        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
+        cmp_u32c(tmp2, i2);
+        s->pc += 4;
+        break;
+    case 0x9b: /* STAM     R1,R3,D2(B2)     [RS] */
+        FETCH_DECODE_RS
+        tmp = tcg_const_i32(r1);
+        tmp2 = get_address(0, b2, d2);
+        tmp3 = tcg_const_i32(r3);
+        gen_helper_stam(tmp, tmp2, tmp3);
+        s->pc += 4;
+        break;
+    case 0xa5:
+        insn = ld_code4(s->pc);
+        r1 = (insn >> 20) & 0xf;
+        op = (insn >> 16) & 0xf;
+        i2 = insn & 0xffff;
+        disas_a5(s, op, r1, i2);
+        s->pc += 4;
+        break;
+    case 0xa7:
+        insn = ld_code4(s->pc);
+        r1 = (insn >> 20) & 0xf;
+        op = (insn >> 16) & 0xf;
+        i2 = (short)insn;
+        disas_a7(s, op, r1, i2);
+        s->pc += 4;
+        break;
+    case 0xa8: /* MVCLE   R1,R3,D2(B2)     [RS] */
+        FETCH_DECODE_RS
+        tmp = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r3);
+        tmp2 = get_address(0, b2, d2);
+        gen_helper_mvcle(cc, tmp, tmp2, tmp3);
+        s->pc += 4;
+        break;
+    case 0xa9: /* CLCLE   R1,R3,D2(B2)     [RS] */
+        FETCH_DECODE_RS
+        tmp = tcg_const_i32(r1);
+        tmp3 = tcg_const_i32(r3);
+        tmp2 = get_address(0, b2, d2);
+        gen_helper_clcle(cc, tmp, tmp2, tmp3);
+        s->pc += 4;
+        break;
+    case 0xb2:
+        insn = ld_code4(s->pc);
+        op = (insn >> 16) & 0xff;
+        switch (op) {
+        case 0x9c: /* STFPC    D2(B2) [S] */
+            d2 = insn & 0xfff;
+            b2 = (insn >> 12) & 0xf;
+            tmp = tcg_temp_new_i32();
+            tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUState, fpc));
+            tmp2 = get_address(0, b2, d2);
+            tcg_gen_qemu_st32(tmp, tmp2, 1);
+            break;
+        default:
+            r1 = (insn >> 4) & 0xf;
+            r2 = insn & 0xf;
+            disas_b2(s, op, r1, r2);
+            break;
+        }
+        s->pc += 4;
+        break;
+    case 0xb3:
+        insn = ld_code4(s->pc);
+        op = (insn >> 16) & 0xff;
+        r3 = (insn >> 12) & 0xf; /* aka m3 */
+        r1 = (insn >> 4) & 0xf;
+        r2 = insn & 0xf;
+        disas_b3(s, op, r3, r1, r2);
+        s->pc += 4;
+        break;
+    case 0xb9:
+        insn = ld_code4(s->pc);
+        r1 = (insn >> 4) & 0xf;
+        r2 = insn & 0xf;
+        op = (insn >> 16) & 0xff;
+        disas_b9(s, op, r1, r2);
+        s->pc += 4;
+        break;
+    case 0xba: /* CS     R1,R3,D2(B2)     [RS] */
+        FETCH_DECODE_RS
+        tmp = tcg_const_i32(r1);
+        tmp2 = get_address(0, b2, d2);
+        tmp3 = tcg_const_i32(r3);
+        gen_helper_cs(cc, tmp, tmp2, tmp3);
+        s->pc += 4;
+        break;
+    case 0xbd: /* CLM    R1,M3,D2(B2)     [RS] */
+        FETCH_DECODE_RS
+        tmp3 = get_address(0, b2, d2);
+        tmp2 = tcg_const_i32(r3); /* aka m3 */
+        tmp = load_reg32(r1);
+        gen_helper_clm(cc, tmp, tmp2, tmp3);
+        s->pc += 4;
+        break;
+    case 0xbe: /* STCM R1,M3,D2(B2) [RS] */
+        FETCH_DECODE_RS
+        tmp3 = get_address(0, b2, d2);
+        tmp2 = tcg_const_i32(r3); /* aka m3 */
+        tmp = load_reg32(r1);
+        gen_helper_stcm(tmp, tmp2, tmp3);
+        s->pc += 4;
+        break;
+    case 0xbf: /* ICM    R1,M3,D2(B2)     [RS] */
+        FETCH_DECODE_RS
+        if (r3 == 15) {	/* effectively a 32-bit load */
+            tmp = get_address(0, b2, d2);
+            tmp2 = tcg_temp_new_i32();
+            tcg_gen_qemu_ld32u(tmp2, tmp, 1);
+            store_reg32(r1, tmp2);
+            tmp = tcg_const_i32(r3);
+            gen_helper_set_cc_icm(cc, tmp, tmp2);
+        }
+        else if (r3) {
+            uint32_t mask = 0x00ffffffUL;
+            uint32_t shift = 24;
+            int m3 = r3;
+            tmp3 = load_reg32(r1);
+            tmp = get_address(0, b2, d2);
+            tmp2 = tcg_temp_new_i32();
+            while (m3) {
+                if (m3 & 8) {
+                    tcg_gen_qemu_ld8u(tmp2, tmp, 1);
+                    if (shift) tcg_gen_shli_i32(tmp2, tmp2, shift);
+                    tcg_gen_andi_i32(tmp3, tmp3, mask);
+                    tcg_gen_or_i32(tmp3, tmp3, tmp2);
+                    tcg_gen_addi_i64(tmp, tmp, 1);
+                }
+                m3 = (m3 << 1) & 0xf;
+                mask = (mask >> 8) | 0xff000000UL;
+                shift -= 8;
+            }
+            store_reg32(r1, tmp3);
+            tmp = tcg_const_i32(r3);
+            gen_helper_set_cc_icm(cc, tmp, tmp2);
+        }
+        else {
+            tmp = tcg_const_i32(0);
+            gen_helper_set_cc_icm(cc, tmp, tmp);	/* i.e. env->cc = 0 */
+        }
+        s->pc += 4;
+        break;
+    case 0xc0:
+    case 0xc2:
+        insn = ld_code6(s->pc);
+        r1 = (insn >> 36) & 0xf;
+        op = (insn >> 32) & 0xf;
+        i2 = (int)insn;
+        switch (opc) {
+        case 0xc0: disas_c0(s, op, r1, i2); break;
+        case 0xc2: disas_c2(s, op, r1, i2); break;
+        default: tcg_abort();
+        }
+        s->pc += 6;
+        break;
+    case 0xd2: /* MVC    D1(L,B1),D2(B2)         [SS] */
+    case 0xd4: /* NC     D1(L,B1),D2(B2)         [SS] */
+    case 0xd5: /* CLC    D1(L,B1),D2(B2)         [SS] */
+    case 0xd6: /* OC     D1(L,B1),D2(B2)         [SS] */
+    case 0xd7: /* XC     D1(L,B1),D2(B2)         [SS] */
+        insn = ld_code6(s->pc);
+        vl = tcg_const_i32((insn >> 32) & 0xff);
+        b1 = (insn >> 28) & 0xf;
+        vd1 = tcg_const_i32((insn >> 16) & 0xfff);
+        b2 = (insn >> 12) & 0xf;
+        vd2 = tcg_const_i32(insn & 0xfff);
+        vb = tcg_const_i32((b1 << 4) | b2);
+        switch (opc) {
+        case 0xd2: gen_helper_mvc(vl, vb, vd1, vd2); break;
+        case 0xd4: gen_helper_nc(cc, vl, vb, vd1, vd2); break;
+        case 0xd5: gen_helper_clc(cc, vl, vb, vd1, vd2); break;
+        case 0xd6: gen_helper_oc(cc, vl, vb, vd1, vd2); break;
+        case 0xd7: gen_helper_xc(cc, vl, vb, vd1, vd2); break;
+        default: tcg_abort(); break;
+        }
+        s->pc += 6;
+        break;
+    case 0xe3:
+        insn = ld_code6(s->pc);
+        DEBUGINSN
+        d2 = (  (int) ( (((insn >> 16) & 0xfff) | ((insn << 4) & 0xff000)) << 12 )  ) >> 12;
+        disas_e3(s, /* op */ insn & 0xff, /* r1 */ (insn >> 36) & 0xf, /* x2 */ (insn >> 32) & 0xf, /* b2 */ (insn >> 28) & 0xf, d2 );
+        s->pc += 6;
+        break;
+    case 0xeb:
+        insn = ld_code6(s->pc);
+        DEBUGINSN
+        op = insn & 0xff;
+        r1 = (insn >> 36) & 0xf;
+        r3 = (insn >> 32) & 0xf;
+        b2 = (insn >> 28) & 0xf;
+        d2 = (  (int) ( (((insn >> 16) & 0xfff) | ((insn << 4) & 0xff000)) << 12 )  ) >> 12;
+        disas_eb(s, op, r1, r3, b2, d2);
+        s->pc += 6;
+        break;
+    case 0xed:
+        insn = ld_code6(s->pc);
+        DEBUGINSN
+        op = insn & 0xff;
+        r1 = (insn >> 36) & 0xf;
+        x2 = (insn >> 32) & 0xf;
+        b2 = (insn >> 28) & 0xf;
+        d2 = (short)((insn >> 16) & 0xfff);
+        r1b = (insn >> 12) & 0xf;
+        disas_ed(s, op, r1, x2, b2, d2, r1b);
+        s->pc += 6;
+        break;
+    default:
+        LOG_DISAS("unimplemented opcode 0x%x\n", opc);
+        gen_illegal_opcode(s);
+        s->pc += 6;
+        break;
+    }
+    if (tmp) tcg_temp_free(tmp);
+    if (tmp2) tcg_temp_free(tmp2);
+    if (tmp3) tcg_temp_free(tmp3);
+}
+
+static inline void gen_intermediate_code_internal (CPUState *env,
+                                                          TranslationBlock *tb,
+                                                          int search_pc)
+{
+    DisasContext dc;
+    target_ulong pc_start;
+    uint64_t next_page_start;
+    uint16_t *gen_opc_end;
+    int j, lj = -1;
+    int num_insns, max_insns;
+    
+    pc_start = tb->pc;
+    
+    dc.pc = tb->pc;
+    dc.env = env;
+    dc.pc = pc_start;
+    dc.is_jmp = DISAS_NEXT;
+    
+    gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
+    
+    next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+    
+    num_insns = 0;
+    max_insns = tb->cflags & CF_COUNT_MASK;
+    if (max_insns == 0)
+        max_insns = CF_COUNT_MASK;
+
+    gen_icount_start();
+#if 1
+    cc = tcg_temp_local_new_i32();
+    tcg_gen_mov_i32(cc, global_cc);
+#else
+    cc = global_cc;
+#endif
+    do {
+        if (search_pc) {
+            j = gen_opc_ptr - gen_opc_buf;
+            if (lj < j) {
+                lj++;
+                while (lj < j)
+                    gen_opc_instr_start[lj++] = 0;
+            }
+            gen_opc_pc[lj] = dc.pc;
+            gen_opc_instr_start[lj] = 1;
+            gen_opc_icount[lj] = num_insns;
+        }
+        if (num_insns + 1 == max_insns && (tb->cflags & CF_LAST_IO))
+            gen_io_start();
+#if defined S390X_DEBUG_DISAS
+        LOG_DISAS("pc " TARGET_FMT_lx "\n",
+                  dc.pc);
+#endif
+        disas_s390_insn(env, &dc);
+        
+        num_insns++;
+    } while (!dc.is_jmp && gen_opc_ptr < gen_opc_end && dc.pc < next_page_start && num_insns < max_insns);
+    tcg_gen_mov_i32(global_cc, cc);
+    tcg_temp_free(cc);
+    
+    if (!dc.is_jmp) {
+        tcg_gen_st_i64(tcg_const_i64(dc.pc), cpu_env, offsetof(CPUState, psw.addr));
+    }
+    
+    if (dc.is_jmp == DISAS_SVC) {
+        tcg_gen_st_i64(tcg_const_i64(dc.pc), cpu_env, offsetof(CPUState, psw.addr));
+        TCGv tmp = tcg_const_i32(EXCP_SVC);
+        gen_helper_exception(tmp);
+    }
+
+    if (tb->cflags & CF_LAST_IO)
+        gen_io_end();
+    /* Generate the return instruction */
+    tcg_gen_exit_tb(0);
+    gen_icount_end(tb, num_insns);
+    *gen_opc_ptr = INDEX_op_end;
+    if (search_pc) {
+        j = gen_opc_ptr - gen_opc_buf;
+        lj++;
+        while (lj <= j)
+            gen_opc_instr_start[lj++] = 0;
+    } else {
+        tb->size = dc.pc - pc_start;
+        tb->icount = num_insns;
+    }
+#if defined S390X_DEBUG_DISAS
+    log_cpu_state_mask(CPU_LOG_TB_CPU, env, 0);
+    if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
+        qemu_log("IN: %s\n", lookup_symbol(pc_start));
+        log_target_disas(pc_start, dc.pc - pc_start, 1);
+        qemu_log("\n");
+    }
+#endif
+}
+
+void gen_intermediate_code (CPUState *env, struct TranslationBlock *tb)
+{
+    gen_intermediate_code_internal(env, tb, 0);
+}
+
+void gen_intermediate_code_pc (CPUState *env, struct TranslationBlock *tb)
+{
+    gen_intermediate_code_internal(env, tb, 1);
+}
+
+void gen_pc_load(CPUState *env, TranslationBlock *tb,
+                unsigned long searched_pc, int pc_pos, void *puc)
+{
+    env->psw.addr = gen_opc_pc[pc_pos];
+}
-- 
1.6.2.1
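
A note on the field decoding in disas_s390_insn() above: the FETCH_DECODE_RX
macro and the 0xe3/0xeb cases extract register numbers and displacements from
the raw instruction word with shifts and masks. The sketch below restates that
arithmetic as standalone C. The names rx_fields, decode_rx and decode_disp20
are hypothetical and do not appear in the patch, and the sign extension is
written as an explicit subtraction instead of the patch's shift-left and
shift-right pair, which should give the same result.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a 4-byte RX instruction:
   opcode(8) r1(4) x2(4) b2(4) d2(12), most significant bits first. */
struct rx_fields {
    int r1, x2, b2, d2;
};

/* Same shifts and masks as FETCH_DECODE_RX, minus the address generation. */
static struct rx_fields decode_rx(uint32_t insn)
{
    struct rx_fields f;
    f.r1 = (insn >> 20) & 0xf;
    f.x2 = (insn >> 16) & 0xf;
    f.b2 = (insn >> 12) & 0xf;
    f.d2 = insn & 0xfff;        /* unsigned 12-bit displacement */
    return f;
}

/* Signed 20-bit displacement of a 6-byte instruction held right-aligned in
   a uint64_t (as ld_code6() returns it): the low 12 bits (DL2) sit in bits
   16..27 and the high 8 bits (DH2) in bits 8..15. */
static int32_t decode_disp20(uint64_t insn)
{
    int32_t d = (int32_t)(((insn >> 16) & 0xfff) | ((insn << 4) & 0xff000));
    if (d & 0x80000) {
        d -= 0x100000;          /* sign-extend from bit 19 */
    }
    return d;
}
```

For instance, decode_rx(0x5810f004), which encodes "l %r1,4(%r15)", yields
r1=1, x2=0, b2=15, d2=4, and decode_disp20() on the bytes e3 10 ff f8 ff 04
("lg %r1,-8(%r15)") yields -8.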


* [Qemu-devel] [PATCH 3/9] S/390 host/target build system support
  2009-10-16 12:38   ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Ulrich Hecht
@ 2009-10-16 12:38     ` Ulrich Hecht
  2009-10-16 12:38       ` [Qemu-devel] [PATCH 4/9] S/390 host support for TCG Ulrich Hecht
  2009-10-17 10:44       ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Aurelien Jarno
  2009-10-17 10:42     ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Aurelien Jarno
  1 sibling, 2 replies; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

Changes to configure and makefiles for S/390 host and target support,
fixed as suggested by Juan Quintela.

Adapted to the most recent changes in the build system.

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 configure                            |   22 ++++++++++++++++------
 default-configs/s390x-linux-user.mak |    1 +
 2 files changed, 17 insertions(+), 6 deletions(-)
 create mode 100644 default-configs/s390x-linux-user.mak

diff --git a/configure b/configure
index ca6d45c..64be51f 100755
--- a/configure
+++ b/configure
@@ -157,9 +157,12 @@ case "$cpu" in
   parisc|parisc64)
     cpu="hppa"
   ;;
-  s390*)
+  s390)
     cpu="s390"
   ;;
+  s390x)
+    cpu="s390x"
+  ;;
   sparc|sun4[cdmuv])
     cpu="sparc"
   ;;
@@ -790,6 +793,7 @@ sh4eb-linux-user \
 sparc-linux-user \
 sparc64-linux-user \
 sparc32plus-linux-user \
+s390x-linux-user \
 "
     fi
 # the following are Darwin specific
@@ -855,7 +859,7 @@ fi
 # host long bits test
 hostlongbits="32"
 case "$cpu" in
-  x86_64|alpha|ia64|sparc64|ppc64)
+  x86_64|alpha|ia64|sparc64|ppc64|s390x)
     hostlongbits=64
   ;;
 esac
@@ -1819,7 +1823,7 @@ echo >> $config_host_mak
 echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"" >> $config_host_mak
 
 case "$cpu" in
-  i386|x86_64|alpha|cris|hppa|ia64|m68k|microblaze|mips|mips64|ppc|ppc64|s390|sparc|sparc64)
+  i386|x86_64|alpha|cris|hppa|ia64|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64)
     ARCH=$cpu
   ;;
   armv4b|armv4l)
@@ -2090,7 +2094,7 @@ target_arch2=`echo $target | cut -d '-' -f 1`
 target_bigendian="no"
 
 case "$target_arch2" in
-  armeb|m68k|microblaze|mips|mipsn32|mips64|ppc|ppcemb|ppc64|ppc64abi32|sh4eb|sparc|sparc64|sparc32plus)
+  armeb|m68k|microblaze|mips|mipsn32|mips64|ppc|ppcemb|ppc64|ppc64abi32|s390x|sh4eb|sparc|sparc64|sparc32plus)
   target_bigendian=yes
   ;;
 esac
@@ -2250,6 +2254,10 @@ case "$target_arch2" in
     echo "TARGET_ABI32=y" >> $config_target_mak
     target_phys_bits=64
   ;;
+  s390x)
+    target_nptl="yes"
+    target_phys_bits=64
+  ;;
   *)
     echo "Unsupported target CPU"
     exit 1
@@ -2318,7 +2326,7 @@ if test ! -z "$gdb_xml_files" ; then
 fi
 
 case "$target_arch2" in
-  arm|armeb|m68k|microblaze|mips|mipsel|mipsn32|mipsn32el|mips64|mips64el|ppc|ppc64|ppc64abi32|ppcemb|sparc|sparc64|sparc32plus)
+  arm|armeb|m68k|microblaze|mips|mipsel|mipsn32|mipsn32el|mips64|mips64el|ppc|ppc64|ppc64abi32|ppcemb|s390x|sparc|sparc64|sparc32plus)
     echo "CONFIG_SOFTFLOAT=y" >> $config_target_mak
     ;;
   *)
@@ -2351,6 +2359,8 @@ ldflags=""
 
 if test "$ARCH" = "sparc64" ; then
   cflags="-I\$(SRC_PATH)/tcg/sparc $cflags"
+elif test "$ARCH" = "s390x" ; then
+  cflags="-I\$(SRC_PATH)/tcg/s390 $cflags"
 else
   cflags="-I\$(SRC_PATH)/tcg/\$(ARCH) $cflags"
 fi
@@ -2386,7 +2396,7 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
   ppc*)
     echo "CONFIG_PPC_DIS=y"  >> $config_target_mak
   ;;
-  s390)
+  s390*)
     echo "CONFIG_S390_DIS=y"  >> $config_target_mak
   ;;
   sh4)
diff --git a/default-configs/s390x-linux-user.mak b/default-configs/s390x-linux-user.mak
new file mode 100644
index 0000000..a243c99
--- /dev/null
+++ b/default-configs/s390x-linux-user.mak
@@ -0,0 +1 @@
+# Default configuration for s390x-linux-user
-- 
1.6.2.1


* [Qemu-devel] [PATCH 4/9] S/390 host support for TCG
  2009-10-16 12:38     ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Ulrich Hecht
@ 2009-10-16 12:38       ` Ulrich Hecht
  2009-10-16 12:38         ` [Qemu-devel] [PATCH 5/9] linux-user: S/390 64-bit (s390x) support Ulrich Hecht
  2009-10-17 10:44       ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Aurelien Jarno
  1 sibling, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

S/390 TCG code generator, as posted before.

Improvements since the last posting:
- don't use R0 (often means "zero", not "register zero")
- optimized add_i32 immediate
- formatted for better compliance with the QEMU coding style
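
For readers unfamiliar with the R0 remark: in S/390 address generation, a
base or index field of 0 means "no register", so general register 0 never
contributes to an effective address. That is presumably why the code
generator keeps R0 out of register allocation. A minimal sketch of the rule,
using a hypothetical effective_address() helper that is not part of this
patch:

```c
#include <stdint.h>

/* Effective address for an operand written D(X,B). A field value of 0 for
   X or B selects no register at all, whatever general register 0 holds. */
static uint64_t effective_address(const uint64_t regs[16],
                                  int x, int b, int d)
{
    uint64_t addr = (uint64_t)(int64_t)d;

    if (x != 0) {
        addr += regs[x];
    }
    if (b != 0) {
        addr += regs[b];
    }
    return addr;
}
```

With regs[0] holding an arbitrary value, effective_address(regs, 0, 0, 8)
is still 8, so a value that the register allocator placed in R0 could not
be used as a base or index.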

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 dyngen-exec.h         |    2 +-
 linux-user/syscall.c  |    2 +-
 s390x.ld              |  194 +++++++++
 tcg/s390/tcg-target.c | 1145 +++++++++++++++++++++++++++++++++++++++++++++++++
 tcg/s390/tcg-target.h |   76 ++++
 5 files changed, 1417 insertions(+), 2 deletions(-)
 create mode 100644 s390x.ld
 create mode 100644 tcg/s390/tcg-target.c
 create mode 100644 tcg/s390/tcg-target.h

diff --git a/dyngen-exec.h b/dyngen-exec.h
index 86e61c3..0353f36 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -117,7 +117,7 @@ extern int printf(const char *, ...);
 
 /* The return address may point to the start of the next instruction.
    Subtracting one gets us the call instruction itself.  */
-#if defined(__s390__)
+#if defined(__s390__) && !defined(__s390x__)
 # define GETPC() ((void*)(((unsigned long)__builtin_return_address(0) & 0x7fffffffUL) - 1))
 #elif defined(__arm__)
 /* Thumb return addresses have the low bit set, so we need to subtract two.
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index bf06d14..45ccef9 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -183,7 +183,7 @@ static type name (type1 arg1,type2 arg2,type3 arg3,type4 arg4,type5 arg5,	\
 #define __NR_sys_inotify_add_watch __NR_inotify_add_watch
 #define __NR_sys_inotify_rm_watch __NR_inotify_rm_watch
 
-#if defined(__alpha__) || defined (__ia64__) || defined(__x86_64__)
+#if defined(__alpha__) || defined (__ia64__) || defined(__x86_64__) || defined(__s390x__)
 #define __NR__llseek __NR_lseek
 #endif
 
diff --git a/s390x.ld b/s390x.ld
new file mode 100644
index 0000000..7d1f2b7
--- /dev/null
+++ b/s390x.ld
@@ -0,0 +1,194 @@
+/* Default linker script, for normal executables */
+OUTPUT_FORMAT("elf64-s390", "elf64-s390",
+	      "elf64-s390")
+OUTPUT_ARCH(s390:64-bit)
+ENTRY(_start)
+SEARCH_DIR("/usr/s390x-suse-linux/lib64"); SEARCH_DIR("/usr/local/lib64"); SEARCH_DIR("/lib64"); SEARCH_DIR("/usr/lib64"); SEARCH_DIR("/usr/s390x-suse-linux/lib"); SEARCH_DIR("/usr/lib64"); SEARCH_DIR("/usr/local/lib"); SEARCH_DIR("/lib"); SEARCH_DIR("/usr/lib");
+SECTIONS
+{
+  /* Read-only sections, merged into text segment: */
+  PROVIDE (__executable_start = 0x60000000); . = 0x60000000 + SIZEOF_HEADERS;
+  .interp         : { *(.interp) }
+  .note.gnu.build-id : { *(.note.gnu.build-id) }
+  .hash           : { *(.hash) }
+  .gnu.hash       : { *(.gnu.hash) }
+  .dynsym         : { *(.dynsym) }
+  .dynstr         : { *(.dynstr) }
+  .gnu.version    : { *(.gnu.version) }
+  .gnu.version_d  : { *(.gnu.version_d) }
+  .gnu.version_r  : { *(.gnu.version_r) }
+  .rel.init       : { *(.rel.init) }
+  .rela.init      : { *(.rela.init) }
+  .rel.text       : { *(.rel.text .rel.text.* .rel.gnu.linkonce.t.*) }
+  .rela.text      : { *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*) }
+  .rel.fini       : { *(.rel.fini) }
+  .rela.fini      : { *(.rela.fini) }
+  .rel.rodata     : { *(.rel.rodata .rel.rodata.* .rel.gnu.linkonce.r.*) }
+  .rela.rodata    : { *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*) }
+  .rel.data.rel.ro   : { *(.rel.data.rel.ro* .rel.gnu.linkonce.d.rel.ro.*) }
+  .rela.data.rel.ro   : { *(.rela.data.rel.ro* .rela.gnu.linkonce.d.rel.ro.*) }
+  .rel.data       : { *(.rel.data .rel.data.* .rel.gnu.linkonce.d.*) }
+  .rela.data      : { *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*) }
+  .rel.tdata	  : { *(.rel.tdata .rel.tdata.* .rel.gnu.linkonce.td.*) }
+  .rela.tdata	  : { *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*) }
+  .rel.tbss	  : { *(.rel.tbss .rel.tbss.* .rel.gnu.linkonce.tb.*) }
+  .rela.tbss	  : { *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*) }
+  .rel.ctors      : { *(.rel.ctors) }
+  .rela.ctors     : { *(.rela.ctors) }
+  .rel.dtors      : { *(.rel.dtors) }
+  .rela.dtors     : { *(.rela.dtors) }
+  .rel.got        : { *(.rel.got) }
+  .rela.got       : { *(.rela.got) }
+  .rel.bss        : { *(.rel.bss .rel.bss.* .rel.gnu.linkonce.b.*) }
+  .rela.bss       : { *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*) }
+  .rel.plt        : { *(.rel.plt) }
+  .rela.plt       : { *(.rela.plt) }
+  .init           :
+  {
+    KEEP (*(.init))
+  } =0x07070707
+  .plt            : { *(.plt) }
+  .text           :
+  {
+    *(.text .stub .text.* .gnu.linkonce.t.*)
+    /* .gnu.warning sections are handled specially by elf32.em.  */
+    *(.gnu.warning)
+  } =0x07070707
+  .fini           :
+  {
+    KEEP (*(.fini))
+  } =0x07070707
+  PROVIDE (__etext = .);
+  PROVIDE (_etext = .);
+  PROVIDE (etext = .);
+  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
+  .rodata1        : { *(.rodata1) }
+  .eh_frame_hdr : { *(.eh_frame_hdr) }
+  .eh_frame       : ONLY_IF_RO { KEEP (*(.eh_frame)) }
+  .gcc_except_table   : ONLY_IF_RO { *(.gcc_except_table .gcc_except_table.*) }
+  /* Adjust the address for the data segment.  We want to adjust up to
+     the same address within the page on the next page up.  */
+  . = ALIGN (CONSTANT (MAXPAGESIZE)) - ((CONSTANT (MAXPAGESIZE) - .) & (CONSTANT (MAXPAGESIZE) - 1)); . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
+  /* Exception handling  */
+  .eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) }
+  .gcc_except_table   : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
+  /* Thread Local Storage sections  */
+  .tdata	  : { *(.tdata .tdata.* .gnu.linkonce.td.*) }
+  .tbss		  : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
+  .preinit_array     :
+  {
+    PROVIDE_HIDDEN (__preinit_array_start = .);
+    KEEP (*(.preinit_array))
+    PROVIDE_HIDDEN (__preinit_array_end = .);
+  }
+  .init_array     :
+  {
+     PROVIDE_HIDDEN (__init_array_start = .);
+     KEEP (*(SORT(.init_array.*)))
+     KEEP (*(.init_array))
+     PROVIDE_HIDDEN (__init_array_end = .);
+  }
+  .fini_array     :
+  {
+    PROVIDE_HIDDEN (__fini_array_start = .);
+    KEEP (*(.fini_array))
+    KEEP (*(SORT(.fini_array.*)))
+    PROVIDE_HIDDEN (__fini_array_end = .);
+  }
+  .ctors          :
+  {
+    /* gcc uses crtbegin.o to find the start of
+       the constructors, so we make sure it is
+       first.  Because this is a wildcard, it
+       doesn't matter if the user does not
+       actually link against crtbegin.o; the
+       linker won't look for a file to match a
+       wildcard.  The wildcard also means that it
+       doesn't matter which directory crtbegin.o
+       is in.  */
+    KEEP (*crtbegin.o(.ctors))
+    KEEP (*crtbegin?.o(.ctors))
+    /* We don't want to include the .ctor section from
+       the crtend.o file until after the sorted ctors.
+       The .ctor section from the crtend file contains the
+       end of ctors marker and it must be last */
+    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
+    KEEP (*(SORT(.ctors.*)))
+    KEEP (*(.ctors))
+  }
+  .dtors          :
+  {
+    KEEP (*crtbegin.o(.dtors))
+    KEEP (*crtbegin?.o(.dtors))
+    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
+    KEEP (*(SORT(.dtors.*)))
+    KEEP (*(.dtors))
+  }
+  .jcr            : { KEEP (*(.jcr)) }
+  .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*) *(.data.rel.ro* .gnu.linkonce.d.rel.ro.*) }
+  .dynamic        : { *(.dynamic) }
+  . = DATA_SEGMENT_RELRO_END (0, .);
+  .got            : { *(.got.plt) *(.got) }
+  .data           :
+  {
+    *(.data .data.* .gnu.linkonce.d.*)
+    SORT(CONSTRUCTORS)
+  }
+  .data1          : { *(.data1) }
+  _edata = .; PROVIDE (edata = .);
+  __bss_start = .;
+  .bss            :
+  {
+   *(.dynbss)
+   *(.bss .bss.* .gnu.linkonce.b.*)
+   *(COMMON)
+   /* Align here to ensure that the .bss section occupies space up to
+      _end.  Align after .bss to ensure correct alignment even if the
+      .bss section disappears because there are no input sections.
+      FIXME: Why do we need it? When there is no .bss section, we don't
+      pad the .data section.  */
+   . = ALIGN(. != 0 ? 64 / 8 : 1);
+  }
+  . = ALIGN(64 / 8);
+  . = ALIGN(64 / 8);
+  _end = .; PROVIDE (end = .);
+  . = DATA_SEGMENT_END (.);
+  /* Stabs debugging sections.  */
+  .stab          0 : { *(.stab) }
+  .stabstr       0 : { *(.stabstr) }
+  .stab.excl     0 : { *(.stab.excl) }
+  .stab.exclstr  0 : { *(.stab.exclstr) }
+  .stab.index    0 : { *(.stab.index) }
+  .stab.indexstr 0 : { *(.stab.indexstr) }
+  .comment       0 : { *(.comment) }
+  /* DWARF debug sections.
+     Symbols in the DWARF debugging sections are relative to the beginning
+     of the section so we begin them at 0.  */
+  /* DWARF 1 */
+  .debug          0 : { *(.debug) }
+  .line           0 : { *(.line) }
+  /* GNU DWARF 1 extensions */
+  .debug_srcinfo  0 : { *(.debug_srcinfo) }
+  .debug_sfnames  0 : { *(.debug_sfnames) }
+  /* DWARF 1.1 and DWARF 2 */
+  .debug_aranges  0 : { *(.debug_aranges) }
+  .debug_pubnames 0 : { *(.debug_pubnames) }
+  /* DWARF 2 */
+  .debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
+  .debug_abbrev   0 : { *(.debug_abbrev) }
+  .debug_line     0 : { *(.debug_line) }
+  .debug_frame    0 : { *(.debug_frame) }
+  .debug_str      0 : { *(.debug_str) }
+  .debug_loc      0 : { *(.debug_loc) }
+  .debug_macinfo  0 : { *(.debug_macinfo) }
+  /* SGI/MIPS DWARF 2 extensions */
+  .debug_weaknames 0 : { *(.debug_weaknames) }
+  .debug_funcnames 0 : { *(.debug_funcnames) }
+  .debug_typenames 0 : { *(.debug_typenames) }
+  .debug_varnames  0 : { *(.debug_varnames) }
+  /* DWARF 3 */
+  .debug_pubtypes 0 : { *(.debug_pubtypes) }
+  .debug_ranges   0 : { *(.debug_ranges) }
+  .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) }
+  /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) }
+}
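
A note on the tcg/s390/tcg-target.c diff that follows: its tcg_out_e3() helper
emits RXY-format instructions, where a signed 20-bit displacement is split into
a 12-bit low part (DL2) and an 8-bit high part (DH2). The sketch below shows
that layout in isolation. The names rxy_encode and rxy_disp are hypothetical
and not part of the patch, and masking DH2 to 8 bits is a choice made in this
sketch so that a negative displacement cannot spill sign bits into the base
register field.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the patch): build the 48-bit image of an
   E3-prefixed RXY instruction with no index register (X2 = 0). */
static uint64_t rxy_encode(unsigned op, unsigned r1, unsigned b2, int32_t disp)
{
    uint32_t d = (uint32_t)disp;                       /* two's-complement view */
    uint64_t hi = 0xe300u | (r1 << 4);                 /* opcode byte, R1, X2 */
    uint64_t lo = ((uint64_t)b2 << 28)                 /* base register B2 */
                | ((uint64_t)(d & 0xfff) << 16)        /* DL2: low 12 bits */
                | ((uint64_t)((d >> 12) & 0xff) << 8)  /* DH2: high 8 bits */
                | op;                                  /* second opcode byte */
    return (hi << 32) | lo;
}

/* Hypothetical helper: recover the signed 20-bit displacement again. */
static int32_t rxy_disp(uint64_t insn)
{
    uint32_t dl = (uint32_t)(insn >> 16) & 0xfff;
    uint32_t dh = (uint32_t)(insn >> 8) & 0xff;
    int32_t d = (int32_t)((dh << 12) | dl);
    return (d ^ 0x80000) - 0x80000;                    /* sign-extend bit 19 */
}
```

With op 0x04 (the value the patch defines as E3_LG), r1 = 2, b2 = 13 and a zero
displacement, rxy_encode() yields the bytes e3 20 d0 00 00 04.
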
diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
new file mode 100644
index 0000000..0b285cd
--- /dev/null
+++ b/tcg/s390/tcg-target.c
@@ -0,0 +1,1145 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2009 Ulrich Hecht <uli@suse.de>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef NDEBUG
+static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
+    "%r0",
+    "%r1",
+    "%r2",
+    "%r3",
+    "%r4",
+    "%r5",
+    "%r6",
+    "%r7",
+    "%r8",
+    "%r9",
+    "%r10",
+    "%r11",
+    "%r12",
+    "%r13",
+    "%r14",
+    "%r15"
+};
+#endif
+
+static const int tcg_target_reg_alloc_order[] = {
+    TCG_REG_R6,
+    TCG_REG_R7,
+    TCG_REG_R8,
+    TCG_REG_R9,
+    TCG_REG_R10,
+    TCG_REG_R11,
+    TCG_REG_R12,
+    TCG_REG_R13,
+    TCG_REG_R14,
+    /* TCG_REG_R0: many insns can't be used with R0, so we had better avoid it for now */
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+    TCG_REG_R4,
+    TCG_REG_R5,
+};
+
+static const int tcg_target_call_iarg_regs[4] = {
+    TCG_REG_R2, TCG_REG_R3, TCG_REG_R4, TCG_REG_R5
+};
+static const int tcg_target_call_oarg_regs[2] = {
+    TCG_REG_R2, TCG_REG_R3
+};
+
+static void patch_reloc(uint8_t *code_ptr, int type,
+                tcg_target_long value, tcg_target_long addend)
+{
+    switch (type) {
+    case R_390_PC32DBL:
+        *(uint32_t*)code_ptr = (value - ((tcg_target_long)code_ptr + addend)) >> 1;
+        break;
+    default:
+        tcg_abort();
+        break;
+    }
+}
+
+/* maximum number of registers used for input function arguments */
+static inline int tcg_target_get_call_iarg_regs_count(int flags)
+{
+    return 4;
+}
+
+#define TCG_CT_CONST_S16 0x100
+#define TCG_CT_CONST_U12 0x200
+
+/* parse target specific constraints */
+static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
+{
+    const char *ct_str;
+    
+    ct->ct |= TCG_CT_REG;
+    tcg_regset_set32(ct->u.regs, 0, 0xffff);
+    ct_str = *pct_str;
+    switch (ct_str[0]) {
+    case 'L':                   /* qemu_ld constraint */
+        tcg_regset_reset_reg (ct->u.regs, TCG_REG_R2);
+#ifdef CONFIG_SOFTMMU
+        tcg_regset_reset_reg (ct->u.regs, TCG_REG_R3);
+#endif
+        break;
+    case 'S':                   /* qemu_st constraint */
+        tcg_regset_reset_reg (ct->u.regs, TCG_REG_R2);
+#ifdef CONFIG_SOFTMMU
+        tcg_regset_reset_reg (ct->u.regs, TCG_REG_R3);
+        tcg_regset_reset_reg (ct->u.regs, TCG_REG_R4);
+#endif
+        break;
+    case 'R':			/* not R0 */
+        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R0);
+        break;
+    case 'I':
+        ct->ct &= ~TCG_CT_REG;
+        ct->ct |= TCG_CT_CONST_S16;
+        break;
+    default:
+        break;
+    }
+    ct_str++;
+    *pct_str = ct_str;
+
+    return 0;
+}
+
+/* Test if a constant matches the constraint. */
+static inline int tcg_target_const_match(tcg_target_long val,
+                const TCGArgConstraint *arg_ct)
+{
+    int ct;
+    ct = arg_ct->ct;
+    if (ct & TCG_CT_CONST)
+        return 1;
+    if ((ct & TCG_CT_CONST_S16) && val == (int16_t)val)
+        return 1;
+    if ((ct & TCG_CT_CONST_U12) && val == (val & 0xfff))
+        return 1;
+
+    return 0;
+}
+
+#ifdef CONFIG_SOFTMMU
+
+#include "../../softmmu_defs.h"
+
+static void *qemu_ld_helpers[4] = {
+    __ldb_mmu,
+    __ldw_mmu,
+    __ldl_mmu,
+    __ldq_mmu,
+};
+
+static void *qemu_st_helpers[4] = {
+    __stb_mmu,
+    __stw_mmu,
+    __stl_mmu,
+    __stq_mmu,
+};
+#endif
+
+static uint8_t *tb_ret_addr;
+
+/* signed/unsigned is handled by using COMPARE and COMPARE LOGICAL,
+   respectively */
+static const uint8_t tcg_cond_to_s390_cond[10] = {
+    [TCG_COND_EQ] = 8,
+    [TCG_COND_LT] = 4,
+    [TCG_COND_LTU] = 4,
+    [TCG_COND_LE] = 8 | 4,
+    [TCG_COND_LEU] = 8 | 4,
+    [TCG_COND_GT] = 2,
+    [TCG_COND_GTU] = 2,
+    [TCG_COND_GE] = 8 | 2,
+    [TCG_COND_GEU] = 8 | 2,
+    [TCG_COND_NE] = 4 | 2 | 1,
+};
+
+/* emit load/store (and then some) instructions (E3 prefix) */
+static inline void tcg_out_e3(TCGContext* s, int op, int r1, int r2, int disp)
+{
+    tcg_out16(s, 0xe300 | (r1 << 4));
+    tcg_out32(s, op | (r2 << 28) | ((disp & 0xfff) << 16) | (((disp >> 12) & 0xff) << 8));
+}
+#define E3_LG	  0x04
+#define E3_LRVG	  0x0f
+#define E3_LGF	  0x14
+#define E3_LGH	  0x15
+#define E3_LLGF   0x16
+#define E3_LRV	  0x1e
+#define E3_LRVH   0x1f
+#define E3_CG	  0x20
+#define E3_STG    0x24
+#define E3_STRVG  0x2f
+#define E3_STRV   0x3e
+#define E3_STRVH  0x3f
+#define E3_STHY   0x70
+#define E3_STCY   0x72
+#define E3_LGB	  0x77
+#define E3_LLGC   0x90
+#define E3_LLGH   0x91
+
+/* emit 64-bit register/register insns (B9 prefix) */
+static inline void tcg_out_b9(TCGContext* s, int op, int r1, int r2)
+{
+    tcg_out32(s, 0xb9000000 | (op << 16) | (r1 << 4) | r2);
+}
+#define B9_LGR   0x04
+#define B9_AGR   0x08
+#define B9_SGR   0x09
+#define B9_MSGR  0x0c
+#define B9_LGFR  0x14
+#define B9_LLGFR 0x16
+#define B9_CGR   0x20
+#define B9_CLGR  0x21
+#define B9_NGR   0x80
+#define B9_OGR	 0x81
+#define B9_XGR   0x82
+#define B9_DLGR  0x87
+#define B9_DLR   0x97
+
+/* emit (mostly) 32-bit register/register insns */
+static inline void tcg_out_rr(TCGContext* s, int op, int r1, int r2)
+{
+    tcg_out16(s, (op << 8) | (r1 << 4) | r2);
+}
+#define RR_BASR 0x0d
+#define RR_NR   0x14
+#define RR_CLR	0x15
+#define RR_OR	0x16
+#define RR_XR   0x17
+#define RR_LR   0x18
+#define RR_CR	0x19
+#define RR_AR	0x1a
+#define RR_SR	0x1b
+
+static inline void tcg_out_a7(TCGContext *s, int op, int r1, int16_t i2)
+{
+    tcg_out32(s, 0xa7000000UL | (r1 << 20) | (op << 16) | ((uint16_t)i2));
+}
+#define A7_AHI 0xa
+#define A7_AHGI 0xb
+
+/* emit 64-bit shifts (EB prefix) */
+static inline void tcg_out_sh64(TCGContext* s, int op, int r0, int r1, int r2, int imm)
+{
+    tcg_out16(s, 0xeb00 | (r0 << 4) | r1);
+    tcg_out32(s, op | (r2 << 28) | ((imm & 0xfff) << 16) | ((imm >> 12) << 8));
+}
+#define SH64_REG_NONE 0 /* use immediate only (not R0!) */
+#define SH64_SRAG 0x0a
+#define SH64_SRLG 0x0c
+#define SH64_SLLG 0x0d
+
+/* emit 32-bit shifts */
+static inline void tcg_out_sh32(TCGContext* s, int op, int r0, int r1, int imm)
+{
+    tcg_out32(s, 0x80000000 | (op << 24) | (r0 << 20) | (r1 << 12) | imm);
+}
+#define SH32_REG_NONE 0 /* use immediate only (not R0!) */
+#define SH32_SRL 0x8
+#define SH32_SLL 0x9
+#define SH32_SRA 0xa
+
+/* branch to relative address (long) */
+static inline void tcg_out_brasl(TCGContext* s, int r, tcg_target_long raddr)
+{
+    tcg_out16(s, 0xc005 | (r << 4));
+    tcg_out32(s, raddr >> 1);
+}
+
+/* store 8/16/32 bits */
+static inline void tcg_out_store(TCGContext* s, int op, int r0, int r1, int off)
+{
+    tcg_out32(s, (op << 24) | (r0 << 20) | (r1 << 12) | off);
+}
+#define ST_STH 0x40
+#define ST_STC 0x42
+#define ST_ST  0x50
+
+/* load a register with an immediate value */
+static inline void tcg_out_movi(TCGContext *s, TCGType type,
+                int ret, tcg_target_long arg)
+{
+    //fprintf(stderr,"tcg_out_movi ret 0x%x arg 0x%lx\n",ret,arg);
+    if (arg >= -0x8000 && arg < 0x8000) { /* signed immediate load */
+        /* lghi %rret, arg */
+        tcg_out32(s, 0xa7090000 | (ret << 20) | (arg & 0xffff));
+    }
+    else if (!(arg & 0xffffffffffff0000UL)) {
+        /* llill %rret, arg */
+        tcg_out32(s, 0xa50f0000 | (ret << 20) | arg);
+    }
+    else if (!(arg & 0xffffffff00000000UL) || type == TCG_TYPE_I32) {
+        /* llill %rret, arg */
+        tcg_out32(s, 0xa50f0000 | (ret << 20) | (arg & 0xffff));
+        /* iilh %rret, arg */
+        tcg_out32(s, 0xa5020000 | (ret << 20) | ((arg & 0xffffffff) >> 16));
+    }
+    else {
+        /* branch over constant and store its address in R13 */
+        tcg_out_brasl(s, TCG_REG_R13, 14);
+        /* 64-bit constant */
+        tcg_out32(s,arg >> 32);
+        tcg_out32(s,arg);
+        /* load constant to ret */
+        tcg_out_e3(s, E3_LG, ret, TCG_REG_R13, 0);
+    }
+}
+
+/* load data without address translation or endianness conversion */
+static inline void tcg_out_ld(TCGContext *s, TCGType type, int arg,
+                int arg1, tcg_target_long arg2)
+{
+    int op;
+    //fprintf(stderr,"tcg_out_ld type %d arg %d arg1 %d arg2 %ld\n",type,arg,arg1,arg2);
+    
+    if (type == TCG_TYPE_I32) op = E3_LLGF;	/* 32-bit zero-extended */
+    else op = E3_LG;				/* 64-bit */
+    
+    if (arg2 < -0x80000 || arg2 > 0x7ffff) {
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, arg2);	/* load the displacement */
+        tcg_out_b9(s, B9_AGR, TCG_REG_R13, arg1);		/* add the address */
+        tcg_out_e3(s, op, arg, TCG_REG_R13, 0);			/* load the data */
+    }
+    else {
+        tcg_out_e3(s, op, arg, arg1, arg2);		/* load the data */
+    }
+}
+
+/* load data with address translation (if applicable) and endianness conversion */
+static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* args, int opc)
+{
+    int addr_reg, data_reg, mem_index, s_bits;
+#if defined(CONFIG_SOFTMMU)
+    uint16_t *label1_ptr, *label2_ptr;
+#endif
+        
+    data_reg = *args++;
+    addr_reg = *args++;
+    mem_index = *args;
+    
+    s_bits = opc & 3;
+    
+    int arg0 = TCG_REG_R2;
+#ifdef CONFIG_SOFTMMU
+    int arg1 = TCG_REG_R3;
+#endif
+    
+    /* fprintf(stderr,"tcg_out_qemu_ld opc %d data_reg %d addr_reg %d mem_index %d s_bits %d\n",
+            opc, data_reg, addr_reg, mem_index, s_bits); */
+    
+#ifdef CONFIG_SOFTMMU
+    tcg_out_b9(s, B9_LGR, arg1, addr_reg);
+    tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+    
+    tcg_out_sh64(s, SH64_SRLG, arg1, addr_reg, SH64_REG_NONE, TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+    
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+    tcg_out_b9(s, B9_NGR, arg0, TCG_REG_R13);
+    
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
+    tcg_out_b9(s, B9_NGR, arg1, TCG_REG_R13);
+
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, offsetof(CPUState, tlb_table[mem_index][0].addr_read));
+    tcg_out_b9(s, B9_AGR, arg1, TCG_REG_R13);
+    
+    tcg_out_b9(s, B9_AGR, arg1, TCG_AREG0);
+    
+    tcg_out_e3(s, E3_CG, arg0, arg1, 0);
+    
+    label1_ptr = (uint16_t*)s->code_ptr;
+    tcg_out32(s, 0xa7840000); /* je label1 (offset will be patched in later) */
+    
+    /* call load helper */
+#if TARGET_LONG_BITS == 32
+    tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
+#else
+    tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+#endif
+    tcg_out_movi(s, TCG_TYPE_I32, arg1, mem_index);
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, (tcg_target_ulong)qemu_ld_helpers[s_bits]);
+    tcg_out_rr(s, RR_BASR, TCG_REG_R14, TCG_REG_R13);
+
+    /* sign extension */
+    switch (opc) {
+        case 0 | 4:
+            tcg_out_sh64(s, SH64_SLLG, data_reg, arg0, SH64_REG_NONE, 56);
+            tcg_out_sh64(s, SH64_SRAG, data_reg, data_reg, SH64_REG_NONE, 56);
+            break;
+        case 1 | 4:
+            tcg_out_sh64(s, SH64_SLLG, data_reg, arg0, SH64_REG_NONE, 48);
+            tcg_out_sh64(s, SH64_SRAG, data_reg, data_reg, SH64_REG_NONE, 48);
+            break;
+        case 2 | 4:
+            tcg_out_b9(s, B9_LGFR, data_reg, arg0);
+            break;
+        case 0: case 1: case 2: case 3: default:
+            /* unsigned -> just copy */
+            tcg_out_b9(s, B9_LGR, data_reg, arg0);
+            break;
+    }
+    
+    /* jump to label2 (end) */
+    label2_ptr = (uint16_t*)s->code_ptr;
+    tcg_out32(s, 0xa7d50000); /* bras %r13, label2 */
+    
+    /* this is label1, patch branch */
+    *(label1_ptr + 1) = ((unsigned long)s->code_ptr - (unsigned long)label1_ptr) >> 1;
+    
+    tcg_out_e3(s, E3_LG, arg1, arg1, offsetof(CPUTLBEntry, addend) - offsetof(CPUTLBEntry, addr_read));
+
+#if TARGET_LONG_BITS == 32
+    /* zero upper 32 bits */
+    tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
+#else
+    /* just copy */
+    tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+#endif
+    tcg_out_b9(s, B9_AGR, arg0, arg1);
+
+#else /* CONFIG_SOFTMMU */
+    /* user mode, no address translation required */
+    arg0 = addr_reg;
+#endif
+
+    switch (opc) {
+        case 0:	/* unsigned byte */
+            tcg_out_e3(s, E3_LLGC, data_reg, arg0, 0);
+            break;
+        case 0 | 4: /* signed byte */
+            tcg_out_e3(s, E3_LGB, data_reg, arg0, 0);
+            break;
+        case 1:	/* unsigned short */
+#ifdef TARGET_WORDS_BIGENDIAN
+            tcg_out_e3(s, E3_LLGH, data_reg, arg0, 0);
+#else
+            /* swapped unsigned halfword load with upper bits zeroed */
+            tcg_out_e3(s, E3_LRVH, data_reg, arg0, 0);
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, 0xffffL);
+            tcg_out_b9(s, B9_NGR, data_reg, TCG_REG_R13);
+#endif
+            break;
+        case 1 | 4: /* signed short */
+#ifdef TARGET_WORDS_BIGENDIAN
+            tcg_out_e3(s, E3_LGH, data_reg, arg0, 0);
+#else
+            /* swapped sign-extended halfword load */
+            tcg_out_e3(s, E3_LRVH, data_reg, arg0, 0);
+            tcg_out_sh64(s, SH64_SLLG, data_reg, data_reg, SH64_REG_NONE, 48);
+            tcg_out_sh64(s, SH64_SRAG, data_reg, data_reg, SH64_REG_NONE, 48);
+#endif
+            break;
+        case 2: /* unsigned int */
+#ifdef TARGET_WORDS_BIGENDIAN
+            tcg_out_e3(s, E3_LLGF, data_reg, arg0, 0);
+#else
+            /* swapped unsigned int load with upper bits zeroed */
+            tcg_out_e3(s, E3_LRV, data_reg, arg0, 0);
+            tcg_out_b9(s, B9_LLGFR, data_reg, data_reg);
+#endif
+            break;
+        case 2 | 4: /* signed int */
+#ifdef TARGET_WORDS_BIGENDIAN
+            tcg_out_e3(s, E3_LGF, data_reg, arg0, 0);
+#else
+            /* swapped sign-extended int load */
+            tcg_out_e3(s, E3_LRV, data_reg, arg0, 0);
+            tcg_out_b9(s, B9_LGFR, data_reg, data_reg);
+#endif
+            break;
+        case 3: /* long (64 bit) */
+#ifdef TARGET_WORDS_BIGENDIAN
+            tcg_out_e3(s, E3_LG, data_reg, arg0, 0);
+#else
+            tcg_out_e3(s, E3_LRVG, data_reg, arg0, 0);
+#endif
+            break;
+        default:
+            tcg_abort();
+    }
+    
+#ifdef CONFIG_SOFTMMU
+    /* this is label2, patch branch */
+    *(label2_ptr + 1) = ((unsigned long)s->code_ptr - (unsigned long)label2_ptr) >> 1;
+#endif
+}
+
+static void tcg_out_qemu_st(TCGContext* s, const TCGArg* args, int opc)
+{
+    int addr_reg, data_reg, mem_index, s_bits;
+#if defined(CONFIG_SOFTMMU)
+    uint16_t *label1_ptr, *label2_ptr;
+#endif
+        
+    data_reg = *args++;
+    addr_reg = *args++;
+    mem_index = *args;
+    
+    s_bits = opc;
+    
+    int arg0 = TCG_REG_R2;
+#ifdef CONFIG_SOFTMMU
+    int arg1 = TCG_REG_R3;
+    int arg2 = TCG_REG_R4;
+#endif
+    
+    /* fprintf(stderr,"tcg_out_qemu_st opc %d data_reg %d addr_reg %d mem_index %d s_bits %d\n",
+            opc, data_reg, addr_reg, mem_index, s_bits); */
+    
+#ifdef CONFIG_SOFTMMU
+#if TARGET_LONG_BITS == 32
+    tcg_out_b9(s, B9_LLGFR, arg1, addr_reg);
+    tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
+#else
+    tcg_out_b9(s, B9_LGR, arg1, addr_reg);
+    tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+#endif
+    
+    tcg_out_sh64(s, SH64_SRLG, arg1, addr_reg, SH64_REG_NONE, TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+    
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+    tcg_out_b9(s, B9_NGR, arg0, TCG_REG_R13);
+    
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
+    tcg_out_b9(s, B9_NGR, arg1, TCG_REG_R13);
+
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, offsetof(CPUState, tlb_table[mem_index][0].addr_write));
+    tcg_out_b9(s, B9_AGR, arg1, TCG_REG_R13);
+    
+    tcg_out_b9(s, B9_AGR, arg1, TCG_AREG0);
+    
+    tcg_out_e3(s, E3_CG, arg0, arg1, 0);
+    
+#if TARGET_LONG_BITS == 32
+    tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
+#else
+    tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+#endif
+
+    /* jump to label1 */
+    label1_ptr = (uint16_t*)s->code_ptr;
+    tcg_out32(s, 0xa7840000); /* je label1 */
+    
+    /* call store helper */
+    tcg_out_b9(s, B9_LGR, arg1, data_reg);
+    tcg_out_movi(s, TCG_TYPE_I32, arg2, mem_index);
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, (tcg_target_ulong)qemu_st_helpers[s_bits]);
+    tcg_out_rr(s, RR_BASR, TCG_REG_R14, TCG_REG_R13);
+
+    /* jump to label2 (end) */
+    label2_ptr = (uint16_t*)s->code_ptr;
+    tcg_out32(s, 0xa7d50000); /* bras %r13, label2 */
+    
+    /* this is label1, patch branch */
+    *(label1_ptr + 1) = ((unsigned long)s->code_ptr - (unsigned long)label1_ptr) >> 1;
+    
+    tcg_out_e3(s, E3_LG, arg1, arg1, offsetof(CPUTLBEntry, addend) - offsetof(CPUTLBEntry, addr_write));
+    
+#if TARGET_LONG_BITS == 32
+    /* zero upper 32 bits */
+    tcg_out_b9(s, B9_LLGFR, arg0, addr_reg);
+#else
+    /* just copy */
+    tcg_out_b9(s, B9_LGR, arg0, addr_reg);
+#endif
+    tcg_out_b9(s, B9_AGR, arg0, arg1);
+
+#else /* CONFIG_SOFTMMU */
+    /* user mode, no address translation required */
+    arg0 = addr_reg;
+#endif
+
+    switch (opc) {
+    case 0:
+        tcg_out_store(s, ST_STC, data_reg, arg0, 0);
+        break;
+    case 1:
+#ifdef TARGET_WORDS_BIGENDIAN
+        tcg_out_store(s, ST_STH, data_reg, arg0, 0);
+#else
+        tcg_out_e3(s, E3_STRVH, data_reg, arg0, 0);
+#endif
+        break;
+    case 2:
+#ifdef TARGET_WORDS_BIGENDIAN
+        tcg_out_store(s, ST_ST, data_reg, arg0, 0);
+#else
+        tcg_out_e3(s, E3_STRV, data_reg, arg0, 0);
+#endif
+        break;
+    case 3:
+#ifdef TARGET_WORDS_BIGENDIAN
+        tcg_out_e3(s, E3_STG, data_reg, arg0, 0);
+#else
+        tcg_out_e3(s, E3_STRVG, data_reg, arg0, 0);
+#endif
+        break;
+    default:
+        tcg_abort();
+    }
+    
+#ifdef CONFIG_SOFTMMU
+    /* this is label2, patch branch */
+    *(label2_ptr + 1) = ((unsigned long)s->code_ptr - (unsigned long)label2_ptr) >> 1;
+#endif
+}
+
+static inline void tcg_out_st(TCGContext *s, TCGType type, int arg,
+                              int arg1, tcg_target_long arg2)
+{
+    //fprintf(stderr,"tcg_out_st arg 0x%x arg1 0x%x arg2 0x%lx\n",arg,arg1,arg2);
+    if (type == TCG_TYPE_I32) {
+        if (((long)arg2) < 0 || ((long)arg2) > 0xfff) {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, arg2);
+            tcg_out_b9(s, B9_AGR, TCG_REG_R13, arg1);
+            tcg_out_store(s, ST_ST, arg, TCG_REG_R13, 0);
+        }
+        else tcg_out_store(s, ST_ST, arg, arg1, arg2);
+    }
+    else {
+        if (((long)arg2) < -0x80000 || ((long)arg2) > 0x7ffff) tcg_abort();
+        tcg_out_e3(s, E3_STG, arg, arg1, arg2);
+    }
+}
+
+static inline void tcg_out_op(TCGContext *s, int opc,
+                const TCGArg *args, const int *const_args)
+{
+    TCGLabel* l;
+    int op;
+    int op2;
+    //fprintf(stderr,"0x%x\n", INDEX_op_divu_i32);
+    switch (opc) {
+    case INDEX_op_exit_tb:
+        //fprintf(stderr,"op 0x%x exit_tb 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, args[0]);	/* return value */
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, (unsigned long)tb_ret_addr);
+        tcg_out16(s,0x7fd); /* br %r13 */
+        break;
+    case INDEX_op_goto_tb:
+        //fprintf(stderr,"op 0x%x goto_tb 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (s->tb_jmp_offset) {
+            tcg_abort();
+        }
+        else {
+            tcg_target_long off = ((tcg_target_long)(s->tb_next + args[0]) - (tcg_target_long)s->code_ptr) >> 1;
+            if (off > -0x80000000L && off < 0x7fffffffL) { /* load address relative to PC */
+                /* larl %r13, off */
+                tcg_out16(s,0xc0d0); tcg_out32(s,off);
+            }
+            else { /* too far for larl */
+                tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, (tcg_target_long)(s->tb_next + args[0]));
+            }
+            tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R13, TCG_REG_R13, 0);   /* load address stored at s->tb_next + args[0] */
+            tcg_out_rr(s, RR_BASR, TCG_REG_R13, TCG_REG_R13);		/* and go there */
+        }
+        s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
+        break;
+    case INDEX_op_call:
+        //fprintf(stderr,"op 0x%x call 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (const_args[0]) {
+            tcg_target_long off = (args[0] - (tcg_target_long)s->code_ptr + 4) >> 1; /* FIXME: + 4? Where did that come from? */
+            if (off > -0x80000000 && off < 0x7fffffff) { /* relative call */
+                tcg_out_brasl(s, TCG_REG_R14, off << 1);
+                tcg_abort(); // untested
+            }
+            else { /* too far for a relative call, load full address */
+                tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, args[0]);
+                tcg_out_rr(s, RR_BASR, TCG_REG_R14, TCG_REG_R13);
+            }
+        }
+        else {	/* call function in register args[0] */
+            tcg_out_rr(s, RR_BASR, TCG_REG_R14, args[0]);
+        }
+        break;
+    case INDEX_op_jmp:
+        fprintf(stderr,"op 0x%x jmp 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        tcg_abort();
+        break;
+    case INDEX_op_ld8u_i32:
+    case INDEX_op_ld8u_i64:
+        //fprintf(stderr,"op 0x%x ld8u_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if ((long)args[2] > -0x80000 && (long)args[2] < 0x7ffff) {
+            tcg_out_e3(s, E3_LLGC, args[0], args[1], args[2]);
+        }
+        else {	/* displacement too large, have to calculate address manually */
+            tcg_abort();
+        }
+        break;
+    case INDEX_op_ld8s_i32:
+        fprintf(stderr,"op 0x%x ld8s_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        tcg_abort();
+        break;
+    case INDEX_op_ld16u_i32:
+        fprintf(stderr,"op 0x%x ld16u_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if ((long)args[2] > -0x80000 && (long)args[2] < 0x7ffff) {
+            tcg_out_e3(s, E3_LLGH, args[0], args[1], args[2]);
+        }
+        else {	/* displacement too large, have to calculate address manually */
+            tcg_abort();
+        }
+        break;
+    case INDEX_op_ld16s_i32:
+        fprintf(stderr,"op 0x%x ld16s_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        tcg_abort();
+        break;
+    case INDEX_op_ld_i32:
+    case INDEX_op_ld32u_i64:
+        tcg_out_ld(s, TCG_TYPE_I32, args[0], args[1], args[2]);
+        break;
+    case INDEX_op_ld32s_i64:
+        if (args[2] < -0x80000 || args[2] > 0x7ffff) {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, args[2]);	/* load the displacement */
+            tcg_out_b9(s, B9_AGR, TCG_REG_R13, args[1]);		/* add the address */
+            tcg_out_e3(s, E3_LGF, args[0], TCG_REG_R13, 0);		/* load the data (sign-extended) */
+        }
+        else {
+            tcg_out_e3(s, E3_LGF, args[0], args[1], args[2]);		/* load the data (sign-extended) */
+        }
+        break;
+    case INDEX_op_ld_i64:
+        tcg_out_ld(s, TCG_TYPE_I64, args[0], args[1], args[2]);
+        break;
+    case INDEX_op_st8_i32:
+    case INDEX_op_st8_i64:
+        //fprintf(stderr,"op 0x%x st8_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (((long)args[2]) >= 0 && ((long)args[2]) < 0x1000)
+            tcg_out_store(s, ST_STC, args[0], args[1], args[2]);
+        else if (((long)args[2]) >= -0x80000 && ((long)args[2]) < 0x80000) {
+            tcg_out_e3(s, E3_STCY, args[0], args[1], args[2]); /* FIXME: requires long displacement facility */
+            tcg_abort(); // untested
+        }
+        else tcg_abort();
+        break;
+    case INDEX_op_st16_i32:
+    case INDEX_op_st16_i64:
+        //fprintf(stderr,"op 0x%x st16_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (((long)args[2]) >= -0x800 && ((long)args[2]) < 0x800)
+            tcg_out_store(s, ST_STH, args[0], args[1], args[2]);
+        else if (((long)args[2]) >= -0x80000 && ((long)args[2]) < 0x80000) {
+            tcg_out_e3(s, E3_STHY, args[0], args[1], args[2]);	/* FIXME: requires long displacement facility */
+            tcg_abort(); // untested
+        }
+        else tcg_abort();
+        break;
+    case INDEX_op_st_i32:
+    case INDEX_op_st32_i64:
+        tcg_out_st(s, TCG_TYPE_I32, args[0], args[1], args[2]);
+        break;
+    case INDEX_op_st_i64:
+        tcg_out_st(s, TCG_TYPE_I64, args[0], args[1], args[2]);
+        break;
+    case INDEX_op_mov_i32:
+        fprintf(stderr,"op 0x%x mov_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        tcg_abort();
+        break;
+    case INDEX_op_movi_i32:
+        fprintf(stderr,"op 0x%x movi_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        tcg_abort();
+        break;
+    case INDEX_op_add_i32:
+        if (const_args[2]) {
+            if (args[0] == args[1]) tcg_out_a7(s, A7_AHI, args[1], args[2]);
+            else {
+                tcg_out_rr(s, RR_LR, args[0], args[1]);
+                tcg_out_a7(s, A7_AHI, args[0], args[2]);
+            }
+        }
+        else if (args[0] == args[1]) {
+            tcg_out_rr(s, RR_AR, args[1], args[2]);
+        }
+        else if (args[0] == args[2]) {
+            tcg_out_rr(s, RR_AR, args[0], args[1]);
+        }
+        else {
+            tcg_out_rr(s, RR_LR, args[0], args[1]);
+            tcg_out_rr(s, RR_AR, args[0], args[2]);
+        }
+        break;
+    case INDEX_op_sub_i32:
+        //fprintf(stderr,"op 0x%x sub_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (args[0] == args[1]) {
+            tcg_out_rr(s, RR_SR, args[1], args[2]); /* sr %ra0/1, %ra2 */
+        }
+        else if (args[0] == args[2]) {
+            tcg_out_rr(s, RR_LR, TCG_REG_R13, args[2]); /* lr %r13, %ra0/2 */
+            tcg_out_rr(s, RR_LR, args[0], args[1]); /* lr %ra0/2, %ra1 */
+            tcg_out_rr(s, RR_SR, args[0], TCG_REG_R13); /* sr %ra0/2, %r13 */
+        }
+        else {
+            tcg_out_rr(s, RR_LR, args[0], args[1]);	/* lr %ra0, %ra1 */
+            tcg_out_rr(s, RR_SR, args[0], args[2]);	/* sr %ra0, %ra2 */
+        }
+        break;
+
+    case INDEX_op_sub_i64:
+        //fprintf(stderr,"op 0x%x sub_i64 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (args[0] == args[1]) {
+            tcg_out_b9(s, B9_SGR, args[1], args[2]); /* sgr %ra0/1, %ra2 */
+        }
+        else if (args[0] == args[2]) {
+            tcg_out_b9(s, B9_LGR, TCG_REG_R13, args[2]); /* lgr %r13, %ra0/2 */
+            tcg_out_b9(s, B9_LGR, args[0], args[1]); /* lgr %ra0/2, %ra1 */
+            tcg_out_b9(s, B9_SGR, args[0], TCG_REG_R13); /* sgr %ra0/2, %r13 */
+        }
+        else {
+            tcg_out_b9(s, B9_LGR, args[0], args[1]);	/* lgr %ra0, %ra1 */
+            tcg_out_b9(s, B9_SGR, args[0], args[2]);	/* sgr %ra0, %ra2 */
+        }
+        break;
+    case INDEX_op_add_i64:
+        //fprintf(stderr,"op 0x%x add_i64 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (args[0] == args[1]) {
+            tcg_out_b9(s, B9_AGR, args[1], args[2]);
+        }
+        else if (args[0] == args[2]) {
+            tcg_out_b9(s, B9_AGR, args[0], args[1]);
+        }
+        else {
+            tcg_out_b9(s, B9_LGR, args[0], args[1]);
+            tcg_out_b9(s, B9_AGR, args[0], args[2]);
+        }
+        break;
+    
+    case INDEX_op_and_i32:
+        op = RR_NR;
+do_logic_i32:
+        if (args[0] == args[1]) {
+            tcg_out_rr(s, op, args[1], args[2]); /* nr/or/xr %ra0/1, %ra2 */
+        }
+        else if (args[0] == args[2]) {
+            tcg_out_rr(s, op, args[0], args[1]); /* nr/or/xr %ra0/2, %ra1 */
+        }
+        else {
+            tcg_out_rr(s, RR_LR, args[0], args[1]); /* lr %ra0, %ra1 */
+            tcg_out_rr(s, op, args[0], args[2]); /* nr/or/xr %ra0, %ra2 */
+        }
+        break;
+    case INDEX_op_or_i32: op = RR_OR; goto do_logic_i32;
+    case INDEX_op_xor_i32: op = RR_XR; goto do_logic_i32;
+
+    case INDEX_op_and_i64:
+        //fprintf(stderr,"op 0x%x and_i64 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        op = B9_NGR;
+do_logic_i64:
+        if (args[0] == args[1]) {
+            tcg_out_b9(s, op, args[0], args[2]);
+        }
+        else if (args[0] == args[2]) {
+            tcg_out_b9(s, op, args[0], args[1]);
+        }
+        else {
+            tcg_out_b9(s, B9_LGR, args[0], args[1]);
+            tcg_out_b9(s, op, args[0], args[2]);
+        }
+        break;
+    case INDEX_op_or_i64: op = B9_OGR; goto do_logic_i64;
+    case INDEX_op_xor_i64: op = B9_XGR; goto do_logic_i64;
+    
+    case INDEX_op_neg_i32:
+        //fprintf(stderr,"op 0x%x neg_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        /* FIXME: optimize args[0] != args[1] case */
+        tcg_out_rr(s, RR_LR, 13, args[1]);
+        tcg_out32(s, 0xa7090000 | (args[0] << 20)); /* lghi %ra0, 0 */
+        tcg_out_rr(s, RR_SR, args[0], 13);
+        break;
+    case INDEX_op_neg_i64:
+        //fprintf(stderr,"op 0x%x neg_i64 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        /* FIXME: optimize args[0] != args[1] case */
+        tcg_out_b9(s, B9_LGR, 13, args[1]);
+        tcg_out32(s, 0xa7090000 | (args[0] << 20)); /* lghi %ra0, 0 */
+        tcg_out_b9(s, B9_SGR, args[0], 13);
+        break;
+
+    case INDEX_op_mul_i32:
+        //fprintf(stderr,"op 0x%x mul_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (args[0] == args[1])
+          tcg_out32(s, 0xb2520000 | (args[0] << 4) | args[2]); /* msr %ra0/1, %ra2 */
+        else if (args[0] == args[2])
+          tcg_out32(s, 0xb2520000 | (args[0] << 4) | args[1]); /* msr %ra0/2, %ra1 */
+        else {
+          tcg_out_rr(s, RR_LR, args[0], args[1]);
+          tcg_out32(s, 0xb2520000 | (args[0] << 4) | args[2]); /* msr %ra0, %ra2 */
+        }
+        break;
+    case INDEX_op_mul_i64:
+        //fprintf(stderr,"op 0x%x mul_i64 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        if (args[0] == args[1]) {
+            tcg_out_b9(s, B9_MSGR, args[0], args[2]);
+        }
+        else if (args[0] == args[2]) {
+            tcg_out_b9(s, B9_MSGR, args[0], args[1]);
+        }
+        else tcg_abort();
+        break;
+
+    case INDEX_op_divu_i32:
+    case INDEX_op_remu_i32:
+        //fprintf(stderr,"op 0x%x div/remu_i32 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R12, 0);
+        tcg_out_rr(s, RR_LR, TCG_REG_R13, args[1]);
+        tcg_out_b9(s, B9_DLR, TCG_REG_R12, args[2]);
+        if (opc == INDEX_op_divu_i32)
+          tcg_out_rr(s, RR_LR, args[0], TCG_REG_R13);	/* quotient */
+        else
+          tcg_out_rr(s, RR_LR, args[0], TCG_REG_R12);	/* remainder */
+        break;
+        
+    case INDEX_op_shl_i32:
+        op = SH32_SLL; op2 = SH64_SLLG;
+do_shift32:
+        if (const_args[2]) {
+            if (args[0] == args[1]) {
+                tcg_out_sh32(s, op, args[0], SH32_REG_NONE, args[2]);
+            }
+            else {
+                tcg_out_rr(s, RR_LR, args[0], args[1]);
+                tcg_out_sh32(s, op, args[0], SH32_REG_NONE, args[2]);
+            }
+        }
+        else {
+            if (args[0] == args[1]) {
+                tcg_out_sh32(s, op, args[0], args[2], 0);
+            }
+            else
+                tcg_out_sh64(s, op2, args[0], args[1], args[2], 0);
+        }
+        break;
+    case INDEX_op_shr_i32: op = SH32_SRL; op2 = SH64_SRLG; goto do_shift32;
+    case INDEX_op_sar_i32: op = SH32_SRA; op2 = SH64_SRAG; goto do_shift32;
+
+    case INDEX_op_shl_i64:
+        op = SH64_SLLG;
+do_shift64:
+        if (const_args[2]) {
+            tcg_out_sh64(s, op, args[0], args[1], SH64_REG_NONE, args[2]);
+        }
+        else {
+            tcg_out_sh64(s, op, args[0], args[1], args[2], 0);
+        }
+        break;
+    case INDEX_op_shr_i64: op = SH64_SRLG; goto do_shift64;
+    case INDEX_op_sar_i64: op = SH64_SRAG; goto do_shift64;
+
+    case INDEX_op_br:
+        //fprintf(stderr,"op 0x%x br 0x%lx 0x%lx 0x%lx\n",opc,args[0],args[1],args[2]);
+        l = &s->labels[args[0]];
+        if (l->has_value) {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, l->u.value);
+        }
+        else {
+            /* larl %r13, ... */
+            tcg_out16(s, 0xc0d0);
+            tcg_out_reloc(s, s->code_ptr, R_390_PC32DBL, args[0], -2);
+            s->code_ptr += 4;
+        }
+        tcg_out_rr(s, RR_BASR, TCG_REG_R13, TCG_REG_R13);
+        break;
+    case INDEX_op_brcond_i64:
+        //fprintf(stderr,"op 0x%x brcond_i64 0x%lx 0x%lx (c %d) 0x%lx\n",opc,args[0],args[1],const_args[1],args[2]);
+        if (args[2] > TCG_COND_GT) { /* unsigned */
+          tcg_out_b9(s, B9_CLGR, args[0], args[1]); /* clgr %ra0, %ra1 */
+        }
+        else { /* signed */
+          tcg_out_b9(s, B9_CGR, args[0], args[1]); /* cgr %ra0, %ra1 */
+        }
+        goto do_brcond;
+    case INDEX_op_brcond_i32:
+        //fprintf(stderr,"op 0x%x brcond_i32 0x%lx 0x%lx (c %d) 0x%lx\n",opc,args[0],args[1],const_args[1],args[2]);
+        if (args[2] > TCG_COND_GT) { /* unsigned */
+          tcg_out_rr(s, RR_CLR, args[0], args[1]); /* clr %ra0, %ra1 */
+        }
+        else { /* signed */
+          tcg_out_rr(s, RR_CR, args[0], args[1]); /* cr %ra0, %ra1 */
+        }
+do_brcond:
+        l = &s->labels[args[3]];
+        if (l->has_value) {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R13, l->u.value);
+        }
+        else {
+            /* larl %r13, ... */
+            tcg_out16(s, 0xc0d0);
+            tcg_out_reloc(s, s->code_ptr, R_390_PC32DBL, args[3], -2);
+            s->code_ptr += 4;
+        }
+        tcg_out16(s, 0x070d | (tcg_cond_to_s390_cond[args[2]] << 4)); /* bcr cond,%r13 */
+        break;
+
+    case INDEX_op_qemu_ld8u: tcg_out_qemu_ld(s, args, 0); break;
+    case INDEX_op_qemu_ld8s: tcg_out_qemu_ld(s, args, 0 | 4); break;
+    case INDEX_op_qemu_ld16u: tcg_out_qemu_ld(s, args, 1); break;
+    case INDEX_op_qemu_ld16s: tcg_out_qemu_ld(s, args, 1 | 4); break;
+    case INDEX_op_qemu_ld32u: tcg_out_qemu_ld(s, args, 2); break;
+    case INDEX_op_qemu_ld32s: tcg_out_qemu_ld(s, args, 2 | 4); break;
+    case INDEX_op_qemu_ld64: tcg_out_qemu_ld(s, args, 3); break;
+    case INDEX_op_qemu_st8: tcg_out_qemu_st(s, args, 0); break;
+    case INDEX_op_qemu_st16: tcg_out_qemu_st(s, args, 1); break;
+    case INDEX_op_qemu_st32: tcg_out_qemu_st(s, args, 2); break;
+    case INDEX_op_qemu_st64: tcg_out_qemu_st(s, args, 3); break;
+
+    default:
+        fprintf(stderr,"unimplemented opc 0x%x\n",opc);
+        tcg_abort();
+    }
+}
+
+static const TCGTargetOpDef s390_op_defs[] = {
+    { INDEX_op_exit_tb, { } },
+    { INDEX_op_goto_tb, { } },
+    { INDEX_op_call, { "ri" } },
+    { INDEX_op_jmp, { "ri" } },
+    { INDEX_op_br, { } },
+
+    { INDEX_op_mov_i32, { "r", "r" } },
+    { INDEX_op_movi_i32, { "r" } },
+
+    { INDEX_op_ld8u_i32, { "r", "r" } },
+    { INDEX_op_ld8s_i32, { "r", "r" } },
+    { INDEX_op_ld16u_i32, { "r", "r" } },
+    { INDEX_op_ld16s_i32, { "r", "r" } },
+    { INDEX_op_ld_i32, { "r", "r" } },
+    { INDEX_op_st8_i32, { "r", "r" } },
+    { INDEX_op_st16_i32, { "r", "r" } },
+    { INDEX_op_st_i32, { "r", "r" } },
+
+    { INDEX_op_add_i32, { "r", "r", "rI" } },
+    { INDEX_op_sub_i32, { "r", "r", "r" } },
+    { INDEX_op_mul_i32, { "r", "r", "r" } },
+
+    { INDEX_op_div_i32, { "r", "r", "r" } },
+    { INDEX_op_divu_i32, { "r", "r", "r" } },
+    { INDEX_op_rem_i32, { "r", "r", "r" } },
+    { INDEX_op_remu_i32, { "r", "r", "r" } },
+
+    { INDEX_op_and_i32, { "r", "r", "r" } },
+    { INDEX_op_or_i32, { "r", "r", "r" } },
+    { INDEX_op_xor_i32, { "r", "r", "r" } },
+    { INDEX_op_neg_i32, { "r", "r" } },
+
+    { INDEX_op_shl_i32, { "r", "r", "Ri" } },
+    { INDEX_op_shr_i32, { "r", "r", "Ri" } },
+    { INDEX_op_sar_i32, { "r", "r", "Ri" } },
+
+    { INDEX_op_brcond_i32, { "r", "r" } },
+
+    { INDEX_op_qemu_ld8u, { "r", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L" } },
+    { INDEX_op_qemu_ld32u, { "r", "L" } },
+    { INDEX_op_qemu_ld32s, { "r", "L" } },
+
+    { INDEX_op_qemu_st8, { "S", "S" } },
+    { INDEX_op_qemu_st16, { "S", "S" } },
+    { INDEX_op_qemu_st32, { "S", "S" } },
+
+#if defined(__s390x__)
+    { INDEX_op_mov_i64, { "r", "r" } },
+    { INDEX_op_movi_i64, { "r" } },
+
+    { INDEX_op_ld8u_i64, { "r", "r" } },
+    { INDEX_op_ld8s_i64, { "r", "r" } },
+    { INDEX_op_ld16u_i64, { "r", "r" } },
+    { INDEX_op_ld16s_i64, { "r", "r" } },
+    { INDEX_op_ld32u_i64, { "r", "r" } },
+    { INDEX_op_ld32s_i64, { "r", "r" } },
+    { INDEX_op_ld_i64, { "r", "r" } },
+
+    { INDEX_op_st8_i64, { "r", "r" } },
+    { INDEX_op_st16_i64, { "r", "r" } },
+    { INDEX_op_st32_i64, { "r", "r" } },
+    { INDEX_op_st_i64, { "r", "r" } },
+
+    { INDEX_op_qemu_ld64, { "L", "L" } },
+    { INDEX_op_qemu_st64, { "S", "S" } },
+
+    { INDEX_op_add_i64, { "r", "r", "r" } },
+    { INDEX_op_mul_i64, { "r", "r", "r" } },
+    { INDEX_op_sub_i64, { "r", "r", "r" } },
+
+    { INDEX_op_and_i64, { "r", "r", "r" } },
+    { INDEX_op_or_i64, { "r", "r", "r" } },
+    { INDEX_op_xor_i64, { "r", "r", "r" } },
+    { INDEX_op_neg_i64, { "r", "r" } },
+
+    { INDEX_op_shl_i64, { "r", "r", "Ri" } },
+    { INDEX_op_shr_i64, { "r", "r", "Ri" } },
+    { INDEX_op_sar_i64, { "r", "r", "Ri" } },
+
+    { INDEX_op_brcond_i64, { "r", "r" } },
+#endif
+
+    { -1 },
+};
+
+void tcg_target_init(TCGContext *s)
+{
+    /* fail safe */
+    if ((1 << CPU_TLB_ENTRY_BITS) != sizeof(CPUTLBEntry))
+        tcg_abort();
+
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I32], 0, 0xffff);
+    tcg_regset_set32(tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffff);
+    tcg_regset_set32(tcg_target_call_clobber_regs, 0,
+                     (1 << TCG_REG_R0) |
+                     (1 << TCG_REG_R1) |
+                     (1 << TCG_REG_R2) |
+                     (1 << TCG_REG_R3) |
+                     (1 << TCG_REG_R4) |
+                     (1 << TCG_REG_R5) |
+                     (1 << TCG_REG_R14)); /* link register */
+    
+    tcg_regset_clear(s->reserved_regs);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_R13); /* frequently used as a temporary */
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_R12); /* another temporary */
+
+    tcg_add_target_add_op_defs(s390_op_defs);
+}
+
+void tcg_target_qemu_prologue(TCGContext *s)
+{
+    tcg_out16(s,0xeb6f);tcg_out32(s,0xf0300024);	/* stmg %r6,%r15,48(%r15) (save registers) */
+    tcg_out32(s, 0xa7fbff60);				/* aghi %r15,-160 (stack frame) */
+    tcg_out16(s,0x7f2);					/* br %r2 (go to TB) */
+    tb_ret_addr = s->code_ptr;
+    tcg_out16(s,0xeb6f);tcg_out32(s, 0xf0d00004);	/* lmg %r6,%r15,208(%r15) (restore registers) */
+    tcg_out16(s,0x7fe);					/* br %r14 (return) */
+}
+
+
+static inline void tcg_out_mov(TCGContext *s, int ret, int arg)
+{
+    tcg_out_b9(s, B9_LGR, ret, arg);
+}
+
+static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
+{
+    tcg_abort();
+}
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
new file mode 100644
index 0000000..fcb28d1
--- /dev/null
+++ b/tcg/s390/tcg-target.h
@@ -0,0 +1,76 @@
+/*
+ * Tiny Code Generator for QEMU
+ *
+ * Copyright (c) 2009 Ulrich Hecht <uli@suse.de>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#define TCG_TARGET_S390 1
+
+#define TCG_TARGET_REG_BITS 64
+#define TCG_TARGET_WORDS_BIGENDIAN
+#define TCG_TARGET_HAS_div_i32
+#undef TCG_TARGET_HAS_div_i64
+#undef TCG_TARGET_HAS_bswap_i32
+#define TCG_TARGET_HAS_neg_i32
+#define TCG_TARGET_HAS_neg_i64
+#undef TCG_TARGET_STACK_GROWSUP
+
+enum {
+    TCG_REG_R0 = 0,
+    TCG_REG_R1,
+    TCG_REG_R2,
+    TCG_REG_R3,
+    TCG_REG_R4,
+    TCG_REG_R5,
+    TCG_REG_R6,
+    TCG_REG_R7,
+    TCG_REG_R8,
+    TCG_REG_R9,
+    TCG_REG_R10,
+    TCG_REG_R11,
+    TCG_REG_R12,
+    TCG_REG_R13,
+    TCG_REG_R14,
+    TCG_REG_R15
+};
+#define TCG_TARGET_NB_REGS 16
+
+/* used for function call generation */
+#define TCG_REG_CALL_STACK		TCG_REG_R15
+#define TCG_TARGET_STACK_ALIGN		8
+#define TCG_TARGET_CALL_STACK_OFFSET	0
+
+enum {
+    /* Note: must be synced with dyngen-exec.h */
+    TCG_AREG0 = TCG_REG_R10,
+    TCG_AREG1 = TCG_REG_R7,
+    TCG_AREG2 = TCG_REG_R8,
+    TCG_AREG3 = TCG_REG_R9,
+};
+
+static inline void flush_icache_range(unsigned long start, unsigned long stop)
+{
+#if QEMU_GNUC_PREREQ(4, 1)
+    void __clear_cache(char *beg, char *end);
+    __clear_cache((char *) start, (char *) stop);
+#else
+#error not implemented
+#endif
+}
-- 
1.6.2.1
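
Note on the operand handling in the sub/add/logic cases above: the S/390
register-register instructions used here are two-address (the first operand
is both source and destination), while TCG ops are three-address. A minimal
sketch of that lowering for the non-commutative subtract case follows. The
helper names (op_mov, op_sub, lower_sub) and the plain register array are
hypothetical stand-ins for illustration only; they are not part of the patch.

```c
#include <stdint.h>

/* The patch reserves %r13 as a temporary; the index is mirrored here. */
enum { SCRATCH = 13 };

/* Two-address primitives modelled on a plain register array. */
static void op_mov(int64_t *r, int d, int s) { r[d] = r[s]; }   /* like lgr */
static void op_sub(int64_t *r, int d, int s) { r[d] -= r[s]; }  /* like sgr */

/* Lower the three-address operation d = a - b onto two-address ops. */
static void lower_sub(int64_t *r, int d, int a, int b)
{
    if (d == a) {
        /* Destination already holds the minuend. */
        op_sub(r, d, b);
    } else if (d == b) {
        /* Destination aliases the subtrahend: save it before overwriting. */
        op_mov(r, SCRATCH, b);
        op_mov(r, d, a);
        op_sub(r, d, SCRATCH);
    } else {
        /* No aliasing: copy the minuend, then subtract in place. */
        op_mov(r, d, a);
        op_sub(r, d, b);
    }
}
```

For commutative ops (add, and, or, xor) the d == b case needs no scratch
register because the operands can simply be swapped, which is what the
add and logic cases in the patch do.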

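Note on the displacement checks in the load/store cases above: TCG passes
offsets as unsigned machine words, so a range test only means something after
the value has been reinterpreted as signed. The sketch below shows one way to
classify an offset, assuming the z/Architecture encodings (RX-format
instructions carry a 12-bit unsigned displacement, RXY-format instructions a
20-bit signed one). The enum and function names are hypothetical and do not
appear in the patch.

```c
/* Hypothetical classification of a memory-operand displacement. */
enum disp_class {
    DISP_SHORT,   /* fits the 12-bit unsigned field of RX-format insns */
    DISP_LONG,    /* fits the 20-bit signed field of RXY-format insns */
    DISP_MANUAL   /* out of range: address must be computed in a register */
};

static enum disp_class classify_disp(unsigned long raw)
{
    /* Reinterpret as signed first; comparing the unsigned value against a
     * negative constant would promote the constant to a huge unsigned one. */
    long disp = (long)raw;

    if (disp >= 0 && disp < 0x1000)
        return DISP_SHORT;
    if (disp >= -0x80000 && disp < 0x80000)
        return DISP_LONG;
    return DISP_MANUAL;
}
```

With this split, an offset of 0xfff still fits the short form, -8 needs the
long-displacement form, and 0x80000 has to go through a temporary register.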
* [Qemu-devel] [PATCH 5/9] linux-user: S/390 64-bit (s390x) support
  2009-10-16 12:38       ` [Qemu-devel] [PATCH 4/9] S/390 host support for TCG Ulrich Hecht
@ 2009-10-16 12:38         ` Ulrich Hecht
  2009-10-16 12:38           ` [Qemu-devel] [PATCH 6/9] linux-user: don't do locking in single-threaded processes Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

Add linux-user support for running 64-bit S/390 (s390x) Linux binaries.

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 linux-user/elfload.c             |   18 ++
 linux-user/main.c                |   82 +++++++++
 linux-user/s390x/syscall.h       |   25 +++
 linux-user/s390x/syscall_nr.h    |  348 ++++++++++++++++++++++++++++++++++++++
 linux-user/s390x/target_signal.h |   26 +++
 linux-user/s390x/termbits.h      |  283 +++++++++++++++++++++++++++++++
 linux-user/signal.c              |  314 ++++++++++++++++++++++++++++++++++
 linux-user/syscall.c             |   16 ++-
 linux-user/syscall_defs.h        |   56 ++++++-
 qemu-binfmt-conf.sh              |    5 +-
 10 files changed, 1166 insertions(+), 7 deletions(-)
 create mode 100644 linux-user/s390x/syscall.h
 create mode 100644 linux-user/s390x/syscall_nr.h
 create mode 100644 linux-user/s390x/target_signal.h
 create mode 100644 linux-user/s390x/termbits.h
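
Note on the EXCP_SVC path in cpu_loop() below: SVC is a two-byte instruction
(opcode 0x0a followed by an 8-bit immediate), and the PSW address already
points past it when the exception is taken. The system call number is the
immediate, or the contents of %r1 when the immediate is zero, which is how
numbers above 255 are passed. A minimal sketch of that decoding over a flat
byte buffer follows; the function name and the buffer-based memory model are
assumptions for illustration, not the patch's ldub() accessor.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the system call number for an SVC that has
 * just executed. 'mem' models guest memory as a flat byte array, 'psw_addr'
 * is the address of the instruction following the SVC, and 'r1' is the
 * guest's general register 1.
 */
static uint64_t svc_syscall_nr(const uint8_t *mem, uint64_t psw_addr,
                               uint64_t r1)
{
    uint8_t imm = mem[psw_addr - 1];  /* second byte of the SVC insn */

    return imm ? imm : r1;            /* svc 0: number is taken from %r1 */
}
```

For example, the bytes 0x0a 0x04 decode to system call 4 (write), while
0x0a 0x00 with %r1 = 288 decodes to 288 (openat in the table added by this
patch).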

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 62a3f2a..90e9268 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -679,6 +679,24 @@ static inline void init_thread(struct target_pt_regs *regs, struct image_info *i
 
 #endif /* TARGET_ALPHA */
 
+#ifdef TARGET_S390X
+
+#define ELF_START_MMAP (0x20000000000ULL)
+
+#define elf_check_arch(x) ( (x) == ELF_ARCH )
+
+#define ELF_CLASS	ELFCLASS64
+#define ELF_DATA	ELFDATA2MSB
+#define ELF_ARCH	EM_S390
+
+static inline void init_thread(struct target_pt_regs *regs, struct image_info *infop)
+{
+    regs->psw.addr = infop->entry;
+    regs->gprs[15] = infop->start_stack;
+}
+
+#endif /* TARGET_S390X */
+
 #ifndef ELF_PLATFORM
 #define ELF_PLATFORM (NULL)
 #endif
diff --git a/linux-user/main.c b/linux-user/main.c
index 81a1ada..e2a8ca9 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -2351,6 +2351,79 @@ void cpu_loop (CPUState *env)
 }
 #endif /* TARGET_ALPHA */
 
+#ifdef TARGET_S390X
+void cpu_loop(CPUS390XState *env)
+{
+    int trapnr;
+    target_siginfo_t info;
+
+    while (1) {
+        trapnr = cpu_s390x_exec (env);
+        
+        if ((trapnr & 0xffff0000) == EXCP_EXECUTE_SVC) {
+            int n = trapnr & 0xffff;
+            env->regs[2] = do_syscall(env, n,
+                       env->regs[2],
+                       env->regs[3],
+                       env->regs[4],
+                       env->regs[5],
+                       env->regs[6],
+                       env->regs[7]);
+        }
+        else switch (trapnr) {
+        case EXCP_INTERRUPT:
+            /* just indicate that signals should be handled asap */
+            break;
+        case EXCP_DEBUG:
+            {
+                int sig;
+
+                sig = gdb_handlesig (env, TARGET_SIGTRAP);
+                if (sig) {
+                    info.si_signo = sig;
+                    info.si_errno = 0;
+                    info.si_code = TARGET_TRAP_BRKPT;
+                    queue_signal(env, info.si_signo, &info);
+                }
+            }
+            break;
+        case EXCP_SVC:
+            {
+                int n = ldub(env->psw.addr - 1);
+                if (!n) n = env->regs[1];	/* syscalls > 255 */
+                env->regs[2] = do_syscall(env, n,
+                           env->regs[2],
+                           env->regs[3],
+                           env->regs[4],
+                           env->regs[5],
+                           env->regs[6],
+                           env->regs[7]);
+            }
+            break;
+        case EXCP_ADDR:
+            {
+                info.si_signo = SIGSEGV;
+                info.si_errno = 0;
+                /* XXX: check env->error_code */
+                info.si_code = TARGET_SEGV_MAPERR;
+                info._sifields._sigfault._addr = env->__excp_addr;
+                queue_signal(env, info.si_signo, &info);
+            }
+            break;
+        default:
+            fprintf(stderr, "Unhandled trap: 0x%x\n", trapnr);
+            if (trapnr == 42) { /* unimplemented insn */
+                fprintf(stderr,"insn 0x%08x%04x\n", ldl(env->psw.addr), lduw(env->psw.addr + 4));
+            }
+            cpu_dump_state(env, stderr, fprintf, 0);
+            exit (1);
+        }
+        process_pending_signals (env);
+    }
+}
+
+#endif /* TARGET_S390X */
+
 static void usage(void)
 {
     printf("qemu-" TARGET_ARCH " version " QEMU_VERSION QEMU_PKGVERSION ", Copyright (c) 2003-2008 Fabrice Bellard\n"
@@ -2987,6 +3060,15 @@ int main(int argc, char **argv, char **envp)
 	    env->regs[15] = regs->acr;	    
 	    env->pc = regs->erp;
     }
+#elif defined(TARGET_S390X)
+    {
+            int i;
+            for (i = 0; i < 16; i++) {
+                env->regs[i] = regs->gprs[i];
+            }
+            env->psw.mask = regs->psw.mask;
+            env->psw.addr = regs->psw.addr;
+    }
 #else
 #error unsupported target CPU
 #endif
diff --git a/linux-user/s390x/syscall.h b/linux-user/s390x/syscall.h
new file mode 100644
index 0000000..a3812a8
--- /dev/null
+++ b/linux-user/s390x/syscall.h
@@ -0,0 +1,25 @@
+/* this typedef defines what a Program Status Word looks like */
+typedef struct
+{
+    abi_ulong mask;
+    abi_ulong addr;
+} __attribute__ ((aligned(8))) target_psw_t;
+
+/*
+ * The pt_regs struct defines the way the registers are stored on
+ * the stack during a system call.
+ */
+
+#define TARGET_NUM_GPRS        16
+
+struct target_pt_regs
+{
+    abi_ulong args[1];
+    target_psw_t psw;
+    abi_ulong gprs[TARGET_NUM_GPRS];
+    abi_ulong orig_gpr2;
+    unsigned short ilc;
+    unsigned short trap;
+};
+
+#define UNAME_MACHINE "s390x"
diff --git a/linux-user/s390x/syscall_nr.h b/linux-user/s390x/syscall_nr.h
new file mode 100644
index 0000000..4a60b9a
--- /dev/null
+++ b/linux-user/s390x/syscall_nr.h
@@ -0,0 +1,348 @@
+/*
+ * This file contains the system call numbers.
+ */
+
+#define TARGET_NR_exit                 1
+#define TARGET_NR_fork                 2
+#define TARGET_NR_read                 3
+#define TARGET_NR_write                4
+#define TARGET_NR_open                 5
+#define TARGET_NR_close                6
+#define TARGET_NR_restart_syscall	  7
+#define TARGET_NR_creat                8
+#define TARGET_NR_link                 9
+#define TARGET_NR_unlink              10
+#define TARGET_NR_execve              11
+#define TARGET_NR_chdir               12
+#define TARGET_NR_mknod               14
+#define TARGET_NR_chmod               15
+#define TARGET_NR_lseek               19
+#define TARGET_NR_getpid              20
+#define TARGET_NR_mount               21
+#define TARGET_NR_umount              22
+#define TARGET_NR_ptrace              26
+#define TARGET_NR_alarm               27
+#define TARGET_NR_pause               29
+#define TARGET_NR_utime               30
+#define TARGET_NR_access              33
+#define TARGET_NR_nice                34
+#define TARGET_NR_sync                36
+#define TARGET_NR_kill                37
+#define TARGET_NR_rename              38
+#define TARGET_NR_mkdir               39
+#define TARGET_NR_rmdir               40
+#define TARGET_NR_dup                 41
+#define TARGET_NR_pipe                42
+#define TARGET_NR_times               43
+#define TARGET_NR_brk                 45
+#define TARGET_NR_signal              48
+#define TARGET_NR_acct                51
+#define TARGET_NR_umount2             52
+#define TARGET_NR_ioctl               54
+#define TARGET_NR_fcntl               55
+#define TARGET_NR_setpgid             57
+#define TARGET_NR_umask               60
+#define TARGET_NR_chroot              61
+#define TARGET_NR_ustat               62
+#define TARGET_NR_dup2                63
+#define TARGET_NR_getppid             64
+#define TARGET_NR_getpgrp             65
+#define TARGET_NR_setsid              66
+#define TARGET_NR_sigaction           67
+#define TARGET_NR_sigsuspend          72
+#define TARGET_NR_sigpending          73
+#define TARGET_NR_sethostname         74
+#define TARGET_NR_setrlimit           75
+#define TARGET_NR_getrusage           77
+#define TARGET_NR_gettimeofday        78
+#define TARGET_NR_settimeofday        79
+#define TARGET_NR_symlink             83
+#define TARGET_NR_readlink            85
+#define TARGET_NR_uselib              86
+#define TARGET_NR_swapon              87
+#define TARGET_NR_reboot              88
+#define TARGET_NR_readdir             89
+#define TARGET_NR_mmap                90
+#define TARGET_NR_munmap              91
+#define TARGET_NR_truncate            92
+#define TARGET_NR_ftruncate           93
+#define TARGET_NR_fchmod              94
+#define TARGET_NR_getpriority         96
+#define TARGET_NR_setpriority         97
+#define TARGET_NR_statfs              99
+#define TARGET_NR_fstatfs            100
+#define TARGET_NR_socketcall         102
+#define TARGET_NR_syslog             103
+#define TARGET_NR_setitimer          104
+#define TARGET_NR_getitimer          105
+#define TARGET_NR_stat               106
+#define TARGET_NR_lstat              107
+#define TARGET_NR_fstat              108
+#define TARGET_NR_lookup_dcookie     110
+#define TARGET_NR_vhangup            111
+#define TARGET_NR_idle               112
+#define TARGET_NR_wait4              114
+#define TARGET_NR_swapoff            115
+#define TARGET_NR_sysinfo            116
+#define TARGET_NR_ipc                117
+#define TARGET_NR_fsync              118
+#define TARGET_NR_sigreturn          119
+#define TARGET_NR_clone              120
+#define TARGET_NR_setdomainname      121
+#define TARGET_NR_uname              122
+#define TARGET_NR_adjtimex           124
+#define TARGET_NR_mprotect           125
+#define TARGET_NR_sigprocmask        126
+#define TARGET_NR_create_module      127
+#define TARGET_NR_init_module        128
+#define TARGET_NR_delete_module      129
+#define TARGET_NR_get_kernel_syms    130
+#define TARGET_NR_quotactl           131
+#define TARGET_NR_getpgid            132
+#define TARGET_NR_fchdir             133
+#define TARGET_NR_bdflush            134
+#define TARGET_NR_sysfs              135
+#define TARGET_NR_personality        136
+#define TARGET_NR_afs_syscall        137 /* Syscall for Andrew File System */
+#define TARGET_NR_getdents           141
+#define TARGET_NR_flock              143
+#define TARGET_NR_msync              144
+#define TARGET_NR_readv              145
+#define TARGET_NR_writev             146
+#define TARGET_NR_getsid             147
+#define TARGET_NR_fdatasync          148
+#define TARGET_NR__sysctl            149
+#define TARGET_NR_mlock              150
+#define TARGET_NR_munlock            151
+#define TARGET_NR_mlockall           152
+#define TARGET_NR_munlockall         153
+#define TARGET_NR_sched_setparam             154
+#define TARGET_NR_sched_getparam             155
+#define TARGET_NR_sched_setscheduler         156
+#define TARGET_NR_sched_getscheduler         157
+#define TARGET_NR_sched_yield                158
+#define TARGET_NR_sched_get_priority_max     159
+#define TARGET_NR_sched_get_priority_min     160
+#define TARGET_NR_sched_rr_get_interval      161
+#define TARGET_NR_nanosleep          162
+#define TARGET_NR_mremap             163
+#define TARGET_NR_query_module       167
+#define TARGET_NR_poll               168
+#define TARGET_NR_nfsservctl         169
+#define TARGET_NR_prctl              172
+#define TARGET_NR_rt_sigreturn       173
+#define TARGET_NR_rt_sigaction       174
+#define TARGET_NR_rt_sigprocmask     175
+#define TARGET_NR_rt_sigpending      176
+#define TARGET_NR_rt_sigtimedwait    177
+#define TARGET_NR_rt_sigqueueinfo    178
+#define TARGET_NR_rt_sigsuspend      179
+#define TARGET_NR_pread64            180
+#define TARGET_NR_pwrite64           181
+#define TARGET_NR_getcwd             183
+#define TARGET_NR_capget             184
+#define TARGET_NR_capset             185
+#define TARGET_NR_sigaltstack        186
+#define TARGET_NR_sendfile           187
+#define TARGET_NR_getpmsg		188
+#define TARGET_NR_putpmsg		189
+#define TARGET_NR_vfork		190
+#define TARGET_NR_pivot_root         217
+#define TARGET_NR_mincore            218
+#define TARGET_NR_madvise            219
+#define TARGET_NR_getdents64		220
+#define TARGET_NR_readahead		222
+#define TARGET_NR_setxattr		224
+#define TARGET_NR_lsetxattr		225
+#define TARGET_NR_fsetxattr		226
+#define TARGET_NR_getxattr		227
+#define TARGET_NR_lgetxattr		228
+#define TARGET_NR_fgetxattr		229
+#define TARGET_NR_listxattr		230
+#define TARGET_NR_llistxattr		231
+#define TARGET_NR_flistxattr		232
+#define TARGET_NR_removexattr	233
+#define TARGET_NR_lremovexattr	234
+#define TARGET_NR_fremovexattr	235
+#define TARGET_NR_gettid		236
+#define TARGET_NR_tkill		237
+#define TARGET_NR_futex		238
+#define TARGET_NR_sched_setaffinity	239
+#define TARGET_NR_sched_getaffinity	240
+#define TARGET_NR_tgkill		241
+/* Number 242 is reserved for tux */
+#define TARGET_NR_io_setup		243
+#define TARGET_NR_io_destroy		244
+#define TARGET_NR_io_getevents	245
+#define TARGET_NR_io_submit		246
+#define TARGET_NR_io_cancel		247
+#define TARGET_NR_exit_group		248
+#define TARGET_NR_epoll_create	249
+#define TARGET_NR_epoll_ctl		250
+#define TARGET_NR_epoll_wait		251
+#define TARGET_NR_set_tid_address	252
+#define TARGET_NR_fadvise64		253
+#define TARGET_NR_timer_create	254
+#define TARGET_NR_timer_settime	(TARGET_NR_timer_create+1)
+#define TARGET_NR_timer_gettime	(TARGET_NR_timer_create+2)
+#define TARGET_NR_timer_getoverrun	(TARGET_NR_timer_create+3)
+#define TARGET_NR_timer_delete	(TARGET_NR_timer_create+4)
+#define TARGET_NR_clock_settime	(TARGET_NR_timer_create+5)
+#define TARGET_NR_clock_gettime	(TARGET_NR_timer_create+6)
+#define TARGET_NR_clock_getres	(TARGET_NR_timer_create+7)
+#define TARGET_NR_clock_nanosleep	(TARGET_NR_timer_create+8)
+/* Number 263 is reserved for vserver */
+#define TARGET_NR_statfs64		265
+#define TARGET_NR_fstatfs64		266
+#define TARGET_NR_remap_file_pages	267
+/* Number 268 is reserved for new sys_mbind */
+/* Number 269 is reserved for new sys_get_mempolicy */
+/* Number 270 is reserved for new sys_set_mempolicy */
+#define TARGET_NR_mq_open		271
+#define TARGET_NR_mq_unlink		272
+#define TARGET_NR_mq_timedsend	273
+#define TARGET_NR_mq_timedreceive	274
+#define TARGET_NR_mq_notify		275
+#define TARGET_NR_mq_getsetattr	276
+#define TARGET_NR_kexec_load		277
+#define TARGET_NR_add_key		278
+#define TARGET_NR_request_key	279
+#define TARGET_NR_keyctl		280
+#define TARGET_NR_waitid		281
+#define TARGET_NR_ioprio_set		282
+#define TARGET_NR_ioprio_get		283
+#define TARGET_NR_inotify_init	284
+#define TARGET_NR_inotify_add_watch	285
+#define TARGET_NR_inotify_rm_watch	286
+/* Number 287 is reserved for new sys_migrate_pages */
+#define TARGET_NR_openat		288
+#define TARGET_NR_mkdirat		289
+#define TARGET_NR_mknodat		290
+#define TARGET_NR_fchownat		291
+#define TARGET_NR_futimesat		292
+#define TARGET_NR_unlinkat		294
+#define TARGET_NR_renameat		295
+#define TARGET_NR_linkat		296
+#define TARGET_NR_symlinkat		297
+#define TARGET_NR_readlinkat		298
+#define TARGET_NR_fchmodat		299
+#define TARGET_NR_faccessat		300
+#define TARGET_NR_pselect6		301
+#define TARGET_NR_ppoll		302
+#define TARGET_NR_unshare		303
+#define TARGET_NR_set_robust_list	304
+#define TARGET_NR_get_robust_list	305
+#define TARGET_NR_splice		306
+#define TARGET_NR_sync_file_range	307
+#define TARGET_NR_tee		308
+#define TARGET_NR_vmsplice		309
+/* Number 310 is reserved for new sys_move_pages */
+#define TARGET_NR_getcpu		311
+#define TARGET_NR_epoll_pwait	312
+#define TARGET_NR_utimes		313
+#define TARGET_NR_fallocate		314
+#define TARGET_NR_utimensat		315
+#define TARGET_NR_signalfd		316
+#define TARGET_NR_timerfd		317
+#define TARGET_NR_eventfd		318
+#define TARGET_NR_timerfd_create	319
+#define TARGET_NR_timerfd_settime	320
+#define TARGET_NR_timerfd_gettime	321
+#define TARGET_NR_signalfd4		322
+#define TARGET_NR_eventfd2		323
+#define TARGET_NR_inotify_init1	324
+#define TARGET_NR_pipe2		325
+#define TARGET_NR_dup3		326
+#define TARGET_NR_epoll_create1	327
+#define NR_syscalls 328
+
+/*
+ * Some system calls are not present on 64 bit, and some have a
+ * different name although they do the same thing (e.g. TARGET_NR_chown32
+ * on 31 bit is TARGET_NR_chown on 64 bit).
+ */
+#ifndef TARGET_S390X
+
+#define TARGET_NR_time		 13
+#define TARGET_NR_lchown		 16
+#define TARGET_NR_setuid		 23
+#define TARGET_NR_getuid		 24
+#define TARGET_NR_stime		 25
+#define TARGET_NR_setgid		 46
+#define TARGET_NR_getgid		 47
+#define TARGET_NR_geteuid		 49
+#define TARGET_NR_getegid		 50
+#define TARGET_NR_setreuid		 70
+#define TARGET_NR_setregid		 71
+#define TARGET_NR_getrlimit		 76
+#define TARGET_NR_getgroups		 80
+#define TARGET_NR_setgroups		 81
+#define TARGET_NR_fchown		 95
+#define TARGET_NR_ioperm		101
+#define TARGET_NR_setfsuid		138
+#define TARGET_NR_setfsgid		139
+#define TARGET_NR__llseek		140
+#define TARGET_NR__newselect 	142
+#define TARGET_NR_setresuid		164
+#define TARGET_NR_getresuid		165
+#define TARGET_NR_setresgid		170
+#define TARGET_NR_getresgid		171
+#define TARGET_NR_chown		182
+#define TARGET_NR_ugetrlimit		191	/* SuS compliant getrlimit */
+#define TARGET_NR_mmap2		192
+#define TARGET_NR_truncate64		193
+#define TARGET_NR_ftruncate64	194
+#define TARGET_NR_stat64		195
+#define TARGET_NR_lstat64		196
+#define TARGET_NR_fstat64		197
+#define TARGET_NR_lchown32		198
+#define TARGET_NR_getuid32		199
+#define TARGET_NR_getgid32		200
+#define TARGET_NR_geteuid32		201
+#define TARGET_NR_getegid32		202
+#define TARGET_NR_setreuid32		203
+#define TARGET_NR_setregid32		204
+#define TARGET_NR_getgroups32	205
+#define TARGET_NR_setgroups32	206
+#define TARGET_NR_fchown32		207
+#define TARGET_NR_setresuid32	208
+#define TARGET_NR_getresuid32	209
+#define TARGET_NR_setresgid32	210
+#define TARGET_NR_getresgid32	211
+#define TARGET_NR_chown32		212
+#define TARGET_NR_setuid32		213
+#define TARGET_NR_setgid32		214
+#define TARGET_NR_setfsuid32		215
+#define TARGET_NR_setfsgid32		216
+#define TARGET_NR_fcntl64		221
+#define TARGET_NR_sendfile64		223
+#define TARGET_NR_fadvise64_64	264
+#define TARGET_NR_fstatat64		293
+
+#else
+
+#define TARGET_NR_select		142
+#define TARGET_NR_getrlimit		191	/* SuS compliant getrlimit */
+#define TARGET_NR_lchown  		198
+#define TARGET_NR_getuid  		199
+#define TARGET_NR_getgid  		200
+#define TARGET_NR_geteuid  		201
+#define TARGET_NR_getegid  		202
+#define TARGET_NR_setreuid  		203
+#define TARGET_NR_setregid  		204
+#define TARGET_NR_getgroups  	205
+#define TARGET_NR_setgroups  	206
+#define TARGET_NR_fchown  		207
+#define TARGET_NR_setresuid  	208
+#define TARGET_NR_getresuid  	209
+#define TARGET_NR_setresgid  	210
+#define TARGET_NR_getresgid  	211
+#define TARGET_NR_chown  		212
+#define TARGET_NR_setuid  		213
+#define TARGET_NR_setgid  		214
+#define TARGET_NR_setfsuid  		215
+#define TARGET_NR_setfsgid  		216
+#define TARGET_NR_newfstatat		293
+
+#endif
+
diff --git a/linux-user/s390x/target_signal.h b/linux-user/s390x/target_signal.h
new file mode 100644
index 0000000..b4816b0
--- /dev/null
+++ b/linux-user/s390x/target_signal.h
@@ -0,0 +1,26 @@
+#ifndef TARGET_SIGNAL_H
+#define TARGET_SIGNAL_H
+
+#include "cpu.h"
+
+typedef struct target_sigaltstack {
+    abi_ulong ss_sp;
+    int ss_flags;
+    abi_ulong ss_size;
+} target_stack_t;
+
+/*
+ * sigaltstack controls
+ */
+#define TARGET_SS_ONSTACK      1
+#define TARGET_SS_DISABLE      2
+
+#define TARGET_MINSIGSTKSZ     2048
+#define TARGET_SIGSTKSZ        8192
+
+static inline abi_ulong get_sp_from_cpustate(CPUS390XState *state)
+{
+   return state->regs[15];
+}
+
+#endif /* TARGET_SIGNAL_H */
diff --git a/linux-user/s390x/termbits.h b/linux-user/s390x/termbits.h
new file mode 100644
index 0000000..2a78a05
--- /dev/null
+++ b/linux-user/s390x/termbits.h
@@ -0,0 +1,283 @@
+/*
+ *  include/asm-s390/termbits.h
+ *
+ *  S390 version
+ *
+ *  Derived from "include/asm-i386/termbits.h"
+ */
+
+#define TARGET_NCCS 19
+struct target_termios {
+    unsigned int c_iflag;		/* input mode flags */
+    unsigned int c_oflag;		/* output mode flags */
+    unsigned int c_cflag;		/* control mode flags */
+    unsigned int c_lflag;		/* local mode flags */
+    unsigned char c_line;			/* line discipline */
+    unsigned char c_cc[TARGET_NCCS];		/* control characters */
+};
+
+struct target_termios2 {
+    unsigned int c_iflag;		/* input mode flags */
+    unsigned int c_oflag;		/* output mode flags */
+    unsigned int c_cflag;		/* control mode flags */
+    unsigned int c_lflag;		/* local mode flags */
+    unsigned char c_line;			/* line discipline */
+    unsigned char c_cc[TARGET_NCCS];		/* control characters */
+    unsigned int c_ispeed;		/* input speed */
+    unsigned int c_ospeed;		/* output speed */
+};
+
+struct target_ktermios {
+    unsigned int c_iflag;		/* input mode flags */
+    unsigned int c_oflag;		/* output mode flags */
+    unsigned int c_cflag;		/* control mode flags */
+    unsigned int c_lflag;		/* local mode flags */
+    unsigned char c_line;			/* line discipline */
+    unsigned char c_cc[TARGET_NCCS];		/* control characters */
+    unsigned int c_ispeed;		/* input speed */
+    unsigned int c_ospeed;		/* output speed */
+};
+
+/* c_cc characters */
+#define TARGET_VINTR 0
+#define TARGET_VQUIT 1
+#define TARGET_VERASE 2
+#define TARGET_VKILL 3
+#define TARGET_VEOF 4
+#define TARGET_VTIME 5
+#define TARGET_VMIN 6
+#define TARGET_VSWTC 7
+#define TARGET_VSTART 8
+#define TARGET_VSTOP 9
+#define TARGET_VSUSP 10
+#define TARGET_VEOL 11
+#define TARGET_VREPRINT 12
+#define TARGET_VDISCARD 13
+#define TARGET_VWERASE 14
+#define TARGET_VLNEXT 15
+#define TARGET_VEOL2 16
+
+/* c_iflag bits */
+#define TARGET_IGNBRK	0000001
+#define TARGET_BRKINT	0000002
+#define TARGET_IGNPAR	0000004
+#define TARGET_PARMRK	0000010
+#define TARGET_INPCK	0000020
+#define TARGET_ISTRIP	0000040
+#define TARGET_INLCR	0000100
+#define TARGET_IGNCR	0000200
+#define TARGET_ICRNL	0000400
+#define TARGET_IUCLC	0001000
+#define TARGET_IXON	0002000
+#define TARGET_IXANY	0004000
+#define TARGET_IXOFF	0010000
+#define TARGET_IMAXBEL	0020000
+#define TARGET_IUTF8	0040000
+
+/* c_oflag bits */
+#define TARGET_OPOST	0000001
+#define TARGET_OLCUC	0000002
+#define TARGET_ONLCR	0000004
+#define TARGET_OCRNL	0000010
+#define TARGET_ONOCR	0000020
+#define TARGET_ONLRET	0000040
+#define TARGET_OFILL	0000100
+#define TARGET_OFDEL	0000200
+#define TARGET_NLDLY	0000400
+#define TARGET_NL0	0000000
+#define TARGET_NL1	0000400
+#define TARGET_CRDLY	0003000
+#define TARGET_CR0	0000000
+#define TARGET_CR1	0001000
+#define TARGET_CR2	0002000
+#define TARGET_CR3	0003000
+#define TARGET_TABDLY	0014000
+#define TARGET_TAB0	0000000
+#define TARGET_TAB1	0004000
+#define TARGET_TAB2	0010000
+#define TARGET_TAB3	0014000
+#define TARGET_XTABS	0014000
+#define TARGET_BSDLY	0020000
+#define TARGET_BS0	0000000
+#define TARGET_BS1	0020000
+#define TARGET_VTDLY	0040000
+#define TARGET_VT0	0000000
+#define TARGET_VT1	0040000
+#define TARGET_FFDLY	0100000
+#define TARGET_FF0	0000000
+#define TARGET_FF1	0100000
+
+/* c_cflag bit meaning */
+#define TARGET_CBAUD	0010017
+#define TARGET_B0	0000000		/* hang up */
+#define TARGET_B50	0000001
+#define TARGET_B75	0000002
+#define TARGET_B110	0000003
+#define TARGET_B134	0000004
+#define TARGET_B150	0000005
+#define TARGET_B200	0000006
+#define TARGET_B300	0000007
+#define TARGET_B600	0000010
+#define TARGET_B1200	0000011
+#define TARGET_B1800	0000012
+#define TARGET_B2400	0000013
+#define TARGET_B4800	0000014
+#define TARGET_B9600	0000015
+#define TARGET_B19200	0000016
+#define TARGET_B38400	0000017
+#define TARGET_EXTA TARGET_B19200
+#define TARGET_EXTB TARGET_B38400
+#define TARGET_CSIZE	0000060
+#define TARGET_CS5	0000000
+#define TARGET_CS6	0000020
+#define TARGET_CS7	0000040
+#define TARGET_CS8	0000060
+#define TARGET_CSTOPB	0000100
+#define TARGET_CREAD	0000200
+#define TARGET_PARENB	0000400
+#define TARGET_PARODD	0001000
+#define TARGET_HUPCL	0002000
+#define TARGET_CLOCAL	0004000
+#define TARGET_CBAUDEX 0010000
+#define TARGET_BOTHER  0010000
+#define TARGET_B57600  0010001
+#define TARGET_B115200 0010002
+#define TARGET_B230400 0010003
+#define TARGET_B460800 0010004
+#define TARGET_B500000 0010005
+#define TARGET_B576000 0010006
+#define TARGET_B921600 0010007
+#define TARGET_B1000000 0010010
+#define TARGET_B1152000 0010011
+#define TARGET_B1500000 0010012
+#define TARGET_B2000000 0010013
+#define TARGET_B2500000 0010014
+#define TARGET_B3000000 0010015
+#define TARGET_B3500000 0010016
+#define TARGET_B4000000 0010017
+#define TARGET_CIBAUD	  002003600000	/* input baud rate */
+#define TARGET_CMSPAR	  010000000000		/* mark or space (stick) parity */
+#define TARGET_CRTSCTS	  020000000000		/* flow control */
+
+#define TARGET_IBSHIFT	  16		/* Shift from CBAUD to CIBAUD */
+
+/* c_lflag bits */
+#define TARGET_ISIG	0000001
+#define TARGET_ICANON	0000002
+#define TARGET_XCASE	0000004
+#define TARGET_ECHO	0000010
+#define TARGET_ECHOE	0000020
+#define TARGET_ECHOK	0000040
+#define TARGET_ECHONL	0000100
+#define TARGET_NOFLSH	0000200
+#define TARGET_TOSTOP	0000400
+#define TARGET_ECHOCTL	0001000
+#define TARGET_ECHOPRT	0002000
+#define TARGET_ECHOKE	0004000
+#define TARGET_FLUSHO	0010000
+#define TARGET_PENDIN	0040000
+#define TARGET_IEXTEN	0100000
+
+/* tcflow() and TCXONC use these */
+#define	TARGET_TCOOFF		0
+#define	TARGET_TCOON		1
+#define	TARGET_TCIOFF		2
+#define	TARGET_TCION		3
+
+/* tcflush() and TCFLSH use these */
+#define	TARGET_TCIFLUSH	0
+#define	TARGET_TCOFLUSH	1
+#define	TARGET_TCIOFLUSH	2
+
+/* tcsetattr uses these */
+#define	TARGET_TCSANOW		0
+#define	TARGET_TCSADRAIN	1
+#define	TARGET_TCSAFLUSH	2
+
+/*
+ *  include/asm-s390/ioctls.h
+ *
+ *  S390 version
+ *
+ *  Derived from "include/asm-i386/ioctls.h"
+ */
+
+/* 0x54 is just a magic number to make these relatively unique ('T') */
+
+#define TARGET_TCGETS		0x5401
+#define TARGET_TCSETS		0x5402
+#define TARGET_TCSETSW		0x5403
+#define TARGET_TCSETSF		0x5404
+#define TARGET_TCGETA		0x5405
+#define TARGET_TCSETA		0x5406
+#define TARGET_TCSETAW		0x5407
+#define TARGET_TCSETAF		0x5408
+#define TARGET_TCSBRK		0x5409
+#define TARGET_TCXONC		0x540A
+#define TARGET_TCFLSH		0x540B
+#define TARGET_TIOCEXCL	0x540C
+#define TARGET_TIOCNXCL	0x540D
+#define TARGET_TIOCSCTTY	0x540E
+#define TARGET_TIOCGPGRP	0x540F
+#define TARGET_TIOCSPGRP	0x5410
+#define TARGET_TIOCOUTQ	0x5411
+#define TARGET_TIOCSTI		0x5412
+#define TARGET_TIOCGWINSZ	0x5413
+#define TARGET_TIOCSWINSZ	0x5414
+#define TARGET_TIOCMGET	0x5415
+#define TARGET_TIOCMBIS	0x5416
+#define TARGET_TIOCMBIC	0x5417
+#define TARGET_TIOCMSET	0x5418
+#define TARGET_TIOCGSOFTCAR	0x5419
+#define TARGET_TIOCSSOFTCAR	0x541A
+#define TARGET_FIONREAD	0x541B
+#define TARGET_TIOCINQ		TARGET_FIONREAD
+#define TARGET_TIOCLINUX	0x541C
+#define TARGET_TIOCCONS	0x541D
+#define TARGET_TIOCGSERIAL	0x541E
+#define TARGET_TIOCSSERIAL	0x541F
+#define TARGET_TIOCPKT		0x5420
+#define TARGET_FIONBIO		0x5421
+#define TARGET_TIOCNOTTY	0x5422
+#define TARGET_TIOCSETD	0x5423
+#define TARGET_TIOCGETD	0x5424
+#define TARGET_TCSBRKP		0x5425	/* Needed for POSIX tcsendbreak() */
+#define TARGET_TIOCSBRK	0x5427  /* BSD compatibility */
+#define TARGET_TIOCCBRK	0x5428  /* BSD compatibility */
+#define TARGET_TIOCGSID	0x5429  /* Return the session ID of FD */
+#define TARGET_TCGETS2		TARGET_IOR('T',0x2A, struct target_termios2)
+#define TARGET_TCSETS2		TARGET_IOW('T',0x2B, struct target_termios2)
+#define TARGET_TCSETSW2	TARGET_IOW('T',0x2C, struct target_termios2)
+#define TARGET_TCSETSF2	TARGET_IOW('T',0x2D, struct target_termios2)
+#define TARGET_TIOCGPTN	TARGET_IOR('T',0x30, unsigned int) /* Get Pty Number (of pty-mux device) */
+#define TARGET_TIOCSPTLCK	TARGET_IOW('T',0x31, int)  /* Lock/unlock Pty */
+#define TARGET_TIOCGDEV	TARGET_IOR('T',0x32, unsigned int) /* Get real dev no below /dev/console */
+
+#define TARGET_FIONCLEX	0x5450  /* these numbers need to be adjusted. */
+#define TARGET_FIOCLEX		0x5451
+#define TARGET_FIOASYNC	0x5452
+#define TARGET_TIOCSERCONFIG	0x5453
+#define TARGET_TIOCSERGWILD	0x5454
+#define TARGET_TIOCSERSWILD	0x5455
+#define TARGET_TIOCGLCKTRMIOS	0x5456
+#define TARGET_TIOCSLCKTRMIOS	0x5457
+#define TARGET_TIOCSERGSTRUCT	0x5458 /* For debugging only */
+#define TARGET_TIOCSERGETLSR   0x5459 /* Get line status register */
+#define TARGET_TIOCSERGETMULTI 0x545A /* Get multiport config  */
+#define TARGET_TIOCSERSETMULTI 0x545B /* Set multiport config */
+
+#define TARGET_TIOCMIWAIT	0x545C	/* wait for a change on serial input line(s) */
+#define TARGET_TIOCGICOUNT	0x545D	/* read serial port inline interrupt counts */
+#define TARGET_FIOQSIZE	0x545E
+
+/* Used for packet mode */
+#define TARGET_TIOCPKT_DATA		 0
+#define TARGET_TIOCPKT_FLUSHREAD	 1
+#define TARGET_TIOCPKT_FLUSHWRITE	 2
+#define TARGET_TIOCPKT_STOP		 4
+#define TARGET_TIOCPKT_START		 8
+#define TARGET_TIOCPKT_NOSTOP		16
+#define TARGET_TIOCPKT_DOSTOP		32
+
+#define TARGET_TIOCSER_TEMT    0x01	/* Transmitter physically empty */
+
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 2df17aa..e320e8e 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -3432,6 +3432,320 @@ long do_rt_sigreturn(CPUState *env)
     return -TARGET_ENOSYS;
 }
 
+#elif defined(TARGET_S390X)
+
+#define __NUM_GPRS 16
+#define __NUM_FPRS 16
+#define __NUM_ACRS 16
+
+#define S390_SYSCALL_SIZE   2
+#define __SIGNAL_FRAMESIZE      160 /* FIXME: 31-bit mode -> 96 */
+
+#define _SIGCONTEXT_NSIG        64
+#define _SIGCONTEXT_NSIG_BPW    64 /* FIXME: 31-bit mode -> 32 */
+#define _SIGCONTEXT_NSIG_WORDS  (_SIGCONTEXT_NSIG / _SIGCONTEXT_NSIG_BPW)
+#define _SIGMASK_COPY_SIZE    (sizeof(unsigned long)*_SIGCONTEXT_NSIG_WORDS)
+#define PSW_ADDR_AMODE            0x0000000000000000UL /* 0x80000000UL for 31-bit */
+#define S390_SYSCALL_OPCODE ((uint16_t)0x0a00)
+
+typedef struct
+{
+    target_psw_t psw;
+    target_ulong gprs[__NUM_GPRS];
+    unsigned int  acrs[__NUM_ACRS];
+} target_s390_regs_common;
+
+typedef struct
+{
+    unsigned int fpc;
+    double   fprs[__NUM_FPRS];
+} target_s390_fp_regs;
+
+typedef struct
+{
+    target_s390_regs_common regs;
+    target_s390_fp_regs     fpregs;
+} target_sigregs;
+
+struct target_sigcontext
+{
+    target_ulong   oldmask[_SIGCONTEXT_NSIG_WORDS];
+    target_sigregs        *sregs;
+};
+
+typedef struct
+{
+    uint8_t callee_used_stack[__SIGNAL_FRAMESIZE];
+    struct target_sigcontext sc;
+    target_sigregs sregs;
+    int signo;
+    uint8_t retcode[S390_SYSCALL_SIZE];
+} sigframe;
+
+struct target_ucontext {
+    target_ulong      uc_flags;
+    struct target_ucontext  *uc_link;
+    target_stack_t           uc_stack;
+    target_sigregs          uc_mcontext;
+    target_sigset_t   uc_sigmask;   /* mask last for extensibility */
+};
+
+typedef struct
+{
+    uint8_t callee_used_stack[__SIGNAL_FRAMESIZE];
+    uint8_t retcode[S390_SYSCALL_SIZE];
+    struct target_siginfo info;
+    struct target_ucontext uc;
+} rt_sigframe;
+
+static inline abi_ulong
+get_sigframe(struct target_sigaction *ka, CPUState *env, size_t frame_size)
+{
+    abi_ulong sp;
+
+    /* Default to using normal stack */
+    sp = env->regs[15];
+
+    /* This is the X/Open sanctioned signal stack switching.  */
+    if (ka->sa_flags & TARGET_SA_ONSTACK) {
+        if (! sas_ss_flags(sp))
+            sp = target_sigaltstack_used.ss_sp + target_sigaltstack_used.ss_size;
+    }
+
+    /* This is the legacy signal stack switching. */
+    else if (/* FIXME !user_mode(regs) */ 0 &&
+             !(ka->sa_flags & TARGET_SA_RESTORER) &&
+             ka->sa_restorer) {
+        sp = (abi_ulong) ka->sa_restorer;
+    }
+
+    return (sp - frame_size) & -8ul;
+}
+
+static void save_sigregs(CPUState *env, target_sigregs *sregs)
+{
+    int i;
+    //save_access_regs(current->thread.acrs); FIXME
+
+    /* Copy a 'clean' PSW mask to the user to avoid leaking
+       information about whether PER is currently on.  */
+    __put_user(env->psw.mask, &sregs->regs.psw.mask);
+    __put_user(env->psw.addr, &sregs->regs.psw.addr);
+    for (i = 0; i < 16; i++)
+        __put_user(env->regs[i], &sregs->regs.gprs[i]);
+    for (i = 0; i < 16; i++)
+        __put_user(env->aregs[i], &sregs->regs.acrs[i]);
+    /*
+     * We have to store the fp registers to current->thread.fp_regs
+     * to merge them with the emulated registers.
+     */
+    //save_fp_regs(&current->thread.fp_regs); FIXME
+    for (i = 0; i < 16; i++)
+        __put_user(env->fregs[i].i, &sregs->fpregs.fprs[i]);
+}
+
+static void setup_frame(int sig, struct target_sigaction *ka,
+			target_sigset_t *set, CPUState *env)
+{
+    sigframe *frame;
+    abi_ulong frame_addr;
+
+    frame_addr = get_sigframe(ka, env, sizeof *frame);
+    qemu_log("%s: frame_addr 0x%lx\n", __FUNCTION__, frame_addr);
+    if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0))
+            goto give_sigsegv;
+
+    qemu_log("%s: 1\n", __FUNCTION__);
+    if (__put_user(set->sig[0], &frame->sc.oldmask[0]))
+              goto give_sigsegv;
+
+    save_sigregs(env, &frame->sregs);
+
+    __put_user((abi_ulong)&frame->sregs, (abi_ulong *)&frame->sc.sregs);
+
+    /* Set up to return from userspace.  If provided, use a stub
+       already in userspace.  */
+    if (ka->sa_flags & TARGET_SA_RESTORER) {
+            env->regs[14] = (unsigned long)
+                    ka->sa_restorer | PSW_ADDR_AMODE;
+    } else {
+            env->regs[14] = (unsigned long)
+                    frame->retcode | PSW_ADDR_AMODE;
+            if (__put_user(S390_SYSCALL_OPCODE | TARGET_NR_sigreturn,
+                           (uint16_t *)(frame->retcode)))
+                    goto give_sigsegv;
+    }
+
+    /* Set up backchain. */
+    if (__put_user(env->regs[15], (abi_ulong *) frame))
+            goto give_sigsegv;
+
+    /* Set up registers for signal handler */
+    env->regs[15] = (target_ulong) frame;
+    env->psw.addr = (target_ulong) ka->_sa_handler | PSW_ADDR_AMODE;
+
+    env->regs[2] = sig; //map_signal(sig);
+    env->regs[3] = (target_ulong) &frame->sc;
+
+    /* We forgot to include these in the sigcontext.
+       To avoid breaking binary compatibility, they are passed as args. */
+    env->regs[4] = 0; // FIXME: no clue... current->thread.trap_no;
+    env->regs[5] = 0; // FIXME: no clue... current->thread.prot_addr;
+
+    /* Place signal number on stack to allow backtrace from handler.  */
+    if (__put_user(env->regs[2], (int *) &frame->signo))
+            goto give_sigsegv;
+    unlock_user_struct(frame, frame_addr, 1);
+    return;
+
+give_sigsegv:
+    qemu_log("%s: give_sigsegv\n", __FUNCTION__);
+    unlock_user_struct(frame, frame_addr, 1);
+    force_sig(TARGET_SIGSEGV);
+}
+
+static void setup_rt_frame(int sig, struct target_sigaction *ka,
+                        target_siginfo_t *info,
+			target_sigset_t *set, CPUState *env)
+{
+    int i;
+    rt_sigframe *frame;
+    abi_ulong frame_addr;
+
+    frame_addr = get_sigframe(ka, env, sizeof *frame);
+    qemu_log("%s: frame_addr 0x%lx\n", __FUNCTION__, frame_addr);
+    if (!lock_user_struct(VERIFY_WRITE, frame, frame_addr, 0))
+            goto give_sigsegv;
+
+    qemu_log("%s: 1\n", __FUNCTION__);
+    if (copy_siginfo_to_user(&frame->info, info))
+              goto give_sigsegv;
+
+    /* Create the ucontext.  */
+    __put_user(0, &frame->uc.uc_flags);
+    __put_user((abi_ulong)NULL, (abi_ulong*)&frame->uc.uc_link);
+    __put_user(target_sigaltstack_used.ss_sp, &frame->uc.uc_stack.ss_sp);
+    __put_user(sas_ss_flags(get_sp_from_cpustate(env)),
+                      &frame->uc.uc_stack.ss_flags);
+    __put_user(target_sigaltstack_used.ss_size, &frame->uc.uc_stack.ss_size);
+    save_sigregs(env, &frame->uc.uc_mcontext);
+    for(i = 0; i < TARGET_NSIG_WORDS; i++) {
+        __put_user((abi_ulong)set->sig[i], (abi_ulong*)&frame->uc.uc_sigmask.sig[i]);
+    }
+
+    /* Set up to return from userspace.  If provided, use a stub
+       already in userspace.  */
+    if (ka->sa_flags & TARGET_SA_RESTORER) {
+            env->regs[14] = (unsigned long)
+                    ka->sa_restorer | PSW_ADDR_AMODE;
+    } else {
+            env->regs[14] = (unsigned long)
+                    frame->retcode | PSW_ADDR_AMODE;
+            if (__put_user(S390_SYSCALL_OPCODE | TARGET_NR_rt_sigreturn,
+                           (uint16_t *)(frame->retcode)))
+                    goto give_sigsegv;
+    }
+
+    /* Set up backchain. */
+    if (__put_user(env->regs[15], (abi_ulong *) frame))
+            goto give_sigsegv;
+
+    /* Set up registers for signal handler */
+    env->regs[15] = (target_ulong) frame;
+    env->psw.addr = (target_ulong) ka->_sa_handler | PSW_ADDR_AMODE;
+
+    env->regs[2] = sig; //map_signal(sig);
+    env->regs[3] = (target_ulong) &frame->info;
+    env->regs[4] = (target_ulong) &frame->uc;
+    return;
+
+give_sigsegv:
+    qemu_log("%s: give_sigsegv\n", __FUNCTION__);
+    unlock_user_struct(frame, frame_addr, 1);
+    force_sig(TARGET_SIGSEGV);
+}
+
+static int
+restore_sigregs(CPUState *env, target_sigregs *sc)
+{
+    int err = 0;
+    int i;
+
+    for (i = 0; i < 16; i++) {
+        err |= __get_user(env->regs[i], &sc->regs.gprs[i]);
+    }
+
+    err |= __get_user(env->psw.mask, &sc->regs.psw.mask);
+    qemu_log("%s: sc->regs.psw.addr 0x%lx env->psw.addr 0x%lx\n", __FUNCTION__, sc->regs.psw.addr, env->psw.addr);
+    err |= __get_user(env->psw.addr, &sc->regs.psw.addr);
+    /* FIXME: 31-bit -> | PSW_ADDR_AMODE */
+
+    for (i = 0; i < 16; i++) {
+        err |= __get_user(env->aregs[i], &sc->regs.acrs[i]);
+    }
+    for (i = 0; i < 16; i++) {
+        err |= __get_user(env->fregs[i].i, &sc->fpregs.fprs[i]);
+    }
+
+    return err;
+}
+
+long do_sigreturn(CPUState *env)
+{
+    sigframe *frame;
+    abi_ulong frame_addr = env->regs[15];
+    qemu_log("%s: frame_addr 0x%lx\n", __FUNCTION__, frame_addr);
+    target_sigset_t target_set;
+    sigset_t set;
+
+    if (!lock_user_struct(VERIFY_READ, frame, frame_addr, 1))
+            goto badframe;
+    if (__get_user(target_set.sig[0], &frame->sc.oldmask[0]))
+            goto badframe;
+
+    target_to_host_sigset_internal(&set, &target_set);
+    sigprocmask(SIG_SETMASK, &set, NULL);	/* ~_BLOCKABLE? */
+
+    if (restore_sigregs(env, &frame->sregs))
+            goto badframe;
+
+    unlock_user_struct(frame, frame_addr, 0);
+    return env->regs[2];
+
+badframe:
+    unlock_user_struct(frame, frame_addr, 0);
+    force_sig(TARGET_SIGSEGV);
+    return 0;
+}
+
+long do_rt_sigreturn(CPUState *env)
+{
+    rt_sigframe *frame;
+    abi_ulong frame_addr = env->regs[15];
+    qemu_log("%s: frame_addr 0x%lx\n", __FUNCTION__, frame_addr);
+    sigset_t set;
+
+    if (!lock_user_struct(VERIFY_READ, frame, frame_addr, 1))
+            goto badframe;
+    target_to_host_sigset(&set, &frame->uc.uc_sigmask);
+
+    sigprocmask(SIG_SETMASK, &set, NULL);	/* ~_BLOCKABLE? */
+
+    if (restore_sigregs(env, &frame->uc.uc_mcontext))
+            goto badframe;
+
+    if (do_sigaltstack(frame_addr + offsetof(rt_sigframe, uc.uc_stack), 0,
+                       get_sp_from_cpustate(env)) == -EFAULT)
+            goto badframe;
+    unlock_user_struct(frame, frame_addr, 0);
+    return env->regs[2];
+
+badframe:
+    unlock_user_struct(frame, frame_addr, 0);
+    force_sig(TARGET_SIGSEGV);
+    return 0;
+}
+
 #elif defined(TARGET_PPC) && !defined(TARGET_PPC64)
 
 /* FIXME: Many of the structures are defined for both PPC and PPC64, but
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 45ccef9..617e031 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5102,7 +5102,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
             ret = get_errno(settimeofday(&tv, NULL));
         }
         break;
-#ifdef TARGET_NR_select
+#if defined(TARGET_NR_select) && !defined(TARGET_S390X) && !defined(TARGET_S390)
     case TARGET_NR_select:
         {
             struct target_sel_arg_struct *sel;
@@ -5209,7 +5209,9 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
 #endif
 #ifdef TARGET_NR_mmap
     case TARGET_NR_mmap:
-#if (defined(TARGET_I386) && defined(TARGET_ABI32)) || defined(TARGET_ARM) || defined(TARGET_M68K) || defined(TARGET_CRIS) || defined(TARGET_MICROBLAZE)
+#if (defined(TARGET_I386) && defined(TARGET_ABI32)) || defined(TARGET_ARM) || \
+    defined(TARGET_M68K) || defined(TARGET_CRIS) || defined(TARGET_MICROBLAZE) \
+    || defined(TARGET_S390X)
         {
             abi_ulong *v;
             abi_ulong v1, v2, v3, v4, v5, v6;
@@ -5694,6 +5696,8 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         ret = get_errno(do_fork(cpu_env, arg1, arg2, arg3, arg5, arg4));
 #elif defined(TARGET_CRIS)
         ret = get_errno(do_fork(cpu_env, arg2, arg1, arg3, arg4, arg5));
+#elif defined(TARGET_S390X)
+        ret = get_errno(do_fork(cpu_env, arg2, arg1, arg3, arg5, arg4));
 #else
         ret = get_errno(do_fork(cpu_env, arg1, arg2, arg3, arg4, arg5));
 #endif
@@ -5898,8 +5902,12 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         }
         break;
 #endif /* TARGET_NR_getdents64 */
-#ifdef TARGET_NR__newselect
+#if defined(TARGET_NR__newselect) || defined(TARGET_S390X)
+#ifdef TARGET_S390X
+    case TARGET_NR_select:
+#else
     case TARGET_NR__newselect:
+#endif
         ret = do_select(arg1, arg2, arg3, arg4, arg5);
         break;
 #endif
@@ -6124,7 +6132,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
     case TARGET_NR_sigaltstack:
 #if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_MIPS) || \
     defined(TARGET_SPARC) || defined(TARGET_PPC) || defined(TARGET_ALPHA) || \
-    defined(TARGET_M68K)
+    defined(TARGET_M68K) || defined(TARGET_S390X)
         ret = do_sigaltstack(arg1, arg2, get_sp_from_cpustate((CPUState *)cpu_env));
         break;
 #else
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index c018165..2771fae 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -55,7 +55,7 @@
 #endif
 
 #if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SH4) \
-    || defined(TARGET_M68K) || defined(TARGET_CRIS)
+    || defined(TARGET_M68K) || defined(TARGET_CRIS) || defined(TARGET_S390X)
 
 #define TARGET_IOC_SIZEBITS	14
 #define TARGET_IOC_DIRBITS	2
@@ -315,7 +315,10 @@ struct target_sigaction;
 int do_sigaction(int sig, const struct target_sigaction *act,
                  struct target_sigaction *oact);
 
-#if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SPARC) || defined(TARGET_PPC) || defined(TARGET_MIPS) || defined (TARGET_SH4) || defined(TARGET_M68K) || defined(TARGET_ALPHA) || defined(TARGET_CRIS) || defined(TARGET_MICROBLAZE)
+#if defined(TARGET_I386) || defined(TARGET_ARM) || defined(TARGET_SPARC) || \
+    defined(TARGET_PPC) || defined(TARGET_MIPS) || defined (TARGET_SH4) || \
+    defined(TARGET_M68K) || defined(TARGET_ALPHA) || defined(TARGET_CRIS) || \
+    defined(TARGET_MICROBLAZE) || defined(TARGET_S390X)
 
 #if defined(TARGET_SPARC)
 #define TARGET_SA_NOCLDSTOP    8u
@@ -1617,6 +1620,27 @@ struct target_stat {
 
   	abi_long	__unused[3];
 };
+#elif defined(TARGET_S390X)
+struct target_stat {
+    abi_ulong  st_dev;
+    abi_ulong  st_ino;
+    abi_ulong  st_nlink;
+    unsigned int   st_mode;
+    unsigned int   st_uid;
+    unsigned int   st_gid;
+    unsigned int   __pad1;
+    abi_ulong  st_rdev;
+    abi_ulong  st_size;
+    abi_ulong  target_st_atime;
+    abi_ulong  target_st_atime_nsec;
+    abi_ulong  target_st_mtime;
+    abi_ulong  target_st_mtime_nsec;
+    abi_ulong  target_st_ctime;
+    abi_ulong  target_st_ctime_nsec;
+    abi_ulong  st_blksize;
+    abi_long       st_blocks;
+    abi_ulong  __unused[3];
+};
 #else
 #error unsupported CPU
 #endif
@@ -1703,6 +1727,34 @@ struct target_statfs64 {
 	abi_long f_frsize;
 	abi_long f_spare[5];
 };
+#elif defined(TARGET_S390X)
+struct target_statfs {
+    int32_t  f_type;
+    int32_t  f_bsize;
+    abi_long f_blocks;
+    abi_long f_bfree;
+    abi_long f_bavail;
+    abi_long f_files;
+    abi_long f_ffree;
+    kernel_fsid_t f_fsid;
+    int32_t  f_namelen;
+    int32_t  f_frsize;
+    int32_t  f_spare[5];
+};
+
+struct target_statfs64 {
+    int32_t  f_type;
+    int32_t  f_bsize;
+    abi_long f_blocks;
+    abi_long f_bfree;
+    abi_long f_bavail;
+    abi_long f_files;
+    abi_long f_ffree;
+    kernel_fsid_t f_fsid;
+    int32_t  f_namelen;
+    int32_t  f_frsize;
+    int32_t  f_spare[5];
+};
 #else
 struct target_statfs {
 	uint32_t f_type;
diff --git a/qemu-binfmt-conf.sh b/qemu-binfmt-conf.sh
index 941f0cf..5678771 100644
--- a/qemu-binfmt-conf.sh
+++ b/qemu-binfmt-conf.sh
@@ -1,5 +1,5 @@
 #!/bin/sh
-# enable automatic i386/ARM/M68K/MIPS/SPARC/PPC program execution by the kernel
+# enable automatic i386/ARM/M68K/MIPS/SPARC/PPC/s390 program execution by the kernel
 
 # load the binfmt_misc module
 if [ ! -d /proc/sys/fs/binfmt_misc ]; then
@@ -57,3 +57,6 @@ if [ $cpu != "mips" ] ; then
     echo   ':mips64:M::\x7fELF\x02\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x08:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/usr/local/bin/qemu-mips64:' > /proc/sys/fs/binfmt_misc/register
     echo   ':mips64el:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x08\x00:\xff\xff\xff\xff\xff\xff\xff\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/local/bin/qemu-mips64el:' > /proc/sys/fs/binfmt_misc/register
 fi
+if [ $cpu != "s390x" ] ; then
+    echo   ':s390x:M::\x7fELF\x02\x02\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x16:\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff:/usr/local/bin/qemu-s390x:' > /proc/sys/fs/binfmt_misc/register
+fi
-- 
1.6.2.1

* [Qemu-devel] [PATCH 6/9] linux-user: don't do locking in single-threaded processes
  2009-10-16 12:38         ` [Qemu-devel] [PATCH 5/9] linux-user: S/390 64-bit (s390x) support Ulrich Hecht
@ 2009-10-16 12:38           ` Ulrich Hecht
  2009-10-16 12:38             ` [Qemu-devel] [PATCH 7/9] linux-user: dup3, fallocate syscalls Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

Skips setting the tb_lock if a process doesn't have more than one thread,
which is usually the case. Results in about 20% performance gain (measured
with the s390x target, but the effect should be similar with other targets).
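
The idea can be sketched in isolation as follows. This is only an illustration: the demo_* types and functions are hypothetical stand-ins, not QEMU's spinlock or CPUState, and the constant 42 merely takes the place of the real TB lookup.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the spinlock and CPU state; not QEMU types. */
typedef struct { int held; unsigned acquisitions; } demo_lock_t;
typedef struct { uint32_t multithreaded; } demo_cpu_t;

static void demo_lock(demo_lock_t *l)   { l->held = 1; l->acquisitions++; }
static void demo_unlock(demo_lock_t *l) { l->held = 0; }

/* Look up a translation block, taking the lock only when the process
 * has been marked as multithreaded. The flag is sampled once into a
 * local so that the lock and unlock decisions always agree. */
static int demo_find_tb(demo_cpu_t *cpu, demo_lock_t *tb_lock)
{
    uint32_t multithreaded = cpu->multithreaded;
    int tb;

    if (multithreaded)
        demo_lock(tb_lock);
    tb = 42; /* placeholder for the real lookup */
    if (multithreaded)
        demo_unlock(tb_lock);
    return tb;
}
```

In this sketch a process that never sets the flag never touches the lock at all, which is where the saving would come from.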

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 cpu-defs.h           |    8 ++++++++
 cpu-exec.c           |   14 ++++++++++++--
 linux-user/syscall.c |    1 +
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/cpu-defs.h b/cpu-defs.h
index 95068b5..c50c59e 100644
--- a/cpu-defs.h
+++ b/cpu-defs.h
@@ -135,6 +135,13 @@ typedef struct CPUWatchpoint {
 } CPUWatchpoint;
 
 #define CPU_TEMP_BUF_NLONGS 128
+
+#ifdef CONFIG_USER_ONLY
+#define MULTITHREAD uint32_t multithreaded;
+#else
+#define MULTITHREAD
+#endif
+
 #define CPU_COMMON                                                      \
     struct TranslationBlock *current_tb; /* currently executing TB  */  \
     /* soft mmu support */                                              \
@@ -149,6 +156,7 @@ typedef struct CPUWatchpoint {
     uint32_t stop;   /* Stop request */                                 \
     uint32_t stopped; /* Artificially stopped */                        \
     uint32_t interrupt_request;                                         \
+    MULTITHREAD /* needs locking when accessing TBs */                  \
     volatile sig_atomic_t exit_request;                                 \
     /* The meaning of the MMU modes is defined in the target code. */   \
     CPUTLBEntry tlb_table[NB_MMU_MODES][CPU_TLB_SIZE];                  \
diff --git a/cpu-exec.c b/cpu-exec.c
index 6b3391c..3fe2725 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -219,6 +219,9 @@ int cpu_exec(CPUState *env1)
     TranslationBlock *tb;
     uint8_t *tc_ptr;
     unsigned long next_tb;
+#ifdef CONFIG_USER_ONLY
+    uint32_t multithreaded;
+#endif
 
     if (cpu_halted(env1) == EXCP_HALTED)
         return EXCP_HALTED;
@@ -576,7 +579,11 @@ int cpu_exec(CPUState *env1)
 #endif
                 }
 #endif
-                spin_lock(&tb_lock);
+#ifdef CONFIG_USER_ONLY
+                multithreaded = env->multithreaded;
+                if (multithreaded)
+#endif
+                    spin_lock(&tb_lock);
                 tb = tb_find_fast();
                 /* Note: we do it here to avoid a gcc bug on Mac OS X when
                    doing it in tb_find_slow */
@@ -600,7 +607,10 @@ int cpu_exec(CPUState *env1)
                     tb_add_jump((TranslationBlock *)(next_tb & ~3), next_tb & 3, tb);
                 }
                 }
-                spin_unlock(&tb_lock);
+#ifdef CONFIG_USER_ONLY
+                if (multithreaded)
+#endif
+                    spin_unlock(&tb_lock);
                 env->current_tb = tb;
 
                 /* cpu_interrupt might be called while translating the
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 617e031..f2a53d5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -3549,6 +3549,7 @@ static int do_fork(CPUState *env, unsigned int flags, abi_ulong newsp,
         ts = qemu_mallocz(sizeof(TaskState) + NEW_STACK_SIZE);
         init_task_state(ts);
         new_stack = ts->stack;
+        env->multithreaded = 1;
         /* we create a new CPU instance. */
         new_env = cpu_copy(env);
         /* Init regs that differ from the parent.  */
-- 
1.6.2.1

* [Qemu-devel] [PATCH 7/9] linux-user: dup3, fallocate syscalls
  2009-10-16 12:38           ` [Qemu-devel] [PATCH 6/9] linux-user: don't do locking in single-threaded processes Ulrich Hecht
@ 2009-10-16 12:38             ` Ulrich Hecht
  2009-10-16 12:38               ` [Qemu-devel] [PATCH 8/9] linux-user: define a couple of syscalls for non-uid16 targets Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

implementations of dup3 and fallocate that are good enough to fool LTP

dup3 check, fallocate check fixed
use compile_prog

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 configure            |   36 ++++++++++++++++++++++++++++++++++++
 linux-user/syscall.c |   10 ++++++++++
 2 files changed, 46 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 64be51f..30ecd8f 100755
--- a/configure
+++ b/configure
@@ -1573,6 +1573,36 @@ if compile_prog "" "" ; then
   eventfd=yes
 fi
 
+# check for fallocate
+fallocate=no
+cat > $TMPC << EOF
+#include <fcntl.h>
+
+int main(void)
+{
+    fallocate(0, 0, 0, 0);
+    return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  fallocate=yes
+fi
+
+# check for dup3
+dup3=no
+cat > $TMPC << EOF
+#include <unistd.h>
+
+int main(void)
+{
+    dup3(0, 0, 0);
+    return 0;
+}
+EOF
+if compile_prog "" "" ; then
+  dup3=yes
+fi
+
 # Check if tools are available to build documentation.
 if test "$docs" != "no" ; then
   if test -x "`which texi2html 2>/dev/null`" -a \
@@ -1954,6 +1984,12 @@ fi
 if test "$eventfd" = "yes" ; then
   echo "CONFIG_EVENTFD=y" >> $config_host_mak
 fi
+if test "$fallocate" = "yes" ; then
+  echo "CONFIG_FALLOCATE=y" >> $config_host_mak
+fi
+if test "$dup3" = "yes" ; then
+  echo "CONFIG_DUP3=y" >> $config_host_mak
+fi
 if test "$inotify" = "yes" ; then
   echo "CONFIG_INOTIFY=y" >> $config_host_mak
 fi
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index f2a53d5..4991154 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -4747,6 +4747,11 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
     case TARGET_NR_dup2:
         ret = get_errno(dup2(arg1, arg2));
         break;
+#if defined(TARGET_NR_dup3) && defined(CONFIG_DUP3)
+    case TARGET_NR_dup3:
+        ret = get_errno(dup3(arg1, arg2, arg3));
+        break;
+#endif
 #ifdef TARGET_NR_getppid /* not on alpha */
     case TARGET_NR_getppid:
         ret = get_errno(getppid());
@@ -7022,6 +7027,11 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         break;
 #endif
 #endif /* CONFIG_EVENTFD  */
+#if defined(CONFIG_FALLOCATE) && defined(TARGET_NR_fallocate)
+    case TARGET_NR_fallocate:
+        ret = get_errno(fallocate(arg1, arg2, arg3, arg4));
+        break;
+#endif
     default:
     unimplemented:
         gemu_log("qemu: Unsupported syscall: %d\n", num);
-- 
1.6.2.1

* [Qemu-devel] [PATCH 8/9] linux-user: define a couple of syscalls for non-uid16 targets
  2009-10-16 12:38             ` [Qemu-devel] [PATCH 7/9] linux-user: dup3, fallocate syscalls Ulrich Hecht
@ 2009-10-16 12:38               ` Ulrich Hecht
  2009-10-16 12:38                 ` [Qemu-devel] [PATCH 9/9] linux-user: getpriority errno fix Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

Quite a number of syscalls are only defined on systems with USE_UID16
defined; this patch defines them on other systems as well.

Fixes a large number of uid/gid-related testcases on the s390x target
(and most likely on other targets as well)
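
The preprocessor pattern used throughout the diff can be shown on its own. This is a minimal sketch with made-up DEMO_* names and syscall numbers; it assumes a target that has no *32 variant and does not define the uid16 macro, so the plain syscall number is routed to the same handler.

```c
/* Hypothetical syscall number for a target without the *32 variants. */
#define DEMO_NR_getuid 24
/* #define DEMO_NR_getuid32 199 */  /* would exist on a uid16 target */
/* #define DEMO_USE_UID16 */        /* not defined for this target   */

static long demo_dispatch(int num)
{
    switch (num) {
#if defined(DEMO_NR_getuid32) || (defined(DEMO_NR_getuid) && !defined(DEMO_USE_UID16))
#if defined(DEMO_NR_getuid32)
    case DEMO_NR_getuid32:
#else
    case DEMO_NR_getuid:
#endif
        return 1000; /* placeholder for the real getuid() call */
#endif
    default:
        return -1;   /* unsupported syscall */
    }
}
```

Exactly one case label is compiled in per handler, so the handler body does not have to be duplicated.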

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 linux-user/syscall.c |  125 ++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 105 insertions(+), 20 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4991154..da6f2e1 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -307,7 +307,7 @@ static int sys_fchmodat(int dirfd, const char *pathname, mode_t mode)
   return (fchmodat(dirfd, pathname, mode, 0));
 }
 #endif
-#if defined(TARGET_NR_fchownat) && defined(USE_UID16)
+#if defined(TARGET_NR_fchownat)
 static int sys_fchownat(int dirfd, const char *pathname, uid_t owner,
     gid_t group, int flags)
 {
@@ -416,7 +416,7 @@ _syscall3(int,sys_faccessat,int,dirfd,const char *,pathname,int,mode)
 #if defined(TARGET_NR_fchmodat) && defined(__NR_fchmodat)
 _syscall3(int,sys_fchmodat,int,dirfd,const char *,pathname, mode_t,mode)
 #endif
-#if defined(TARGET_NR_fchownat) && defined(__NR_fchownat) && defined(USE_UID16)
+#if defined(TARGET_NR_fchownat) && defined(__NR_fchownat)
 _syscall5(int,sys_fchownat,int,dirfd,const char *,pathname,
           uid_t,owner,gid_t,group,int,flags)
 #endif
@@ -6371,18 +6371,35 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
     case TARGET_NR_setfsgid:
         ret = get_errno(setfsgid(arg1));
         break;
+#else /* USE_UID16 */
+#if defined(TARGET_NR_fchownat) && defined(__NR_fchownat)
+    case TARGET_NR_fchownat:
+        if (!(p = lock_user_string(arg2))) 
+            goto efault;
+        ret = get_errno(sys_fchownat(arg1, p, arg3, arg4, arg5));
+        unlock_user(p, arg2, 0);
+        break;
+#endif
 #endif /* USE_UID16 */
 
-#ifdef TARGET_NR_lchown32
+#if defined(TARGET_NR_lchown32) || !defined(USE_UID16)
+#if defined(TARGET_NR_lchown32)
     case TARGET_NR_lchown32:
+#else
+    case TARGET_NR_lchown:
+#endif
         if (!(p = lock_user_string(arg1)))
             goto efault;
         ret = get_errno(lchown(p, arg2, arg3));
         unlock_user(p, arg1, 0);
         break;
 #endif
-#ifdef TARGET_NR_getuid32
+#if defined(TARGET_NR_getuid32) || (defined(TARGET_NR_getuid) && !defined(USE_UID16))
+#if defined(TARGET_NR_getuid32)
     case TARGET_NR_getuid32:
+#else
+    case TARGET_NR_getuid:
+#endif
         ret = get_errno(getuid());
         break;
 #endif
@@ -6410,33 +6427,57 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         break;
 #endif
 
-#ifdef TARGET_NR_getgid32
+#if defined(TARGET_NR_getgid32) || (defined(TARGET_NR_getgid) && !defined(USE_UID16))
+#if defined(TARGET_NR_getgid32)
     case TARGET_NR_getgid32:
+#else
+    case TARGET_NR_getgid:
+#endif
         ret = get_errno(getgid());
         break;
 #endif
-#ifdef TARGET_NR_geteuid32
+#if defined(TARGET_NR_geteuid32) || (defined(TARGET_NR_geteuid) && !defined(USE_UID16))
+#if defined(TARGET_NR_geteuid32)
     case TARGET_NR_geteuid32:
+#else
+    case TARGET_NR_geteuid:
+#endif
         ret = get_errno(geteuid());
         break;
 #endif
-#ifdef TARGET_NR_getegid32
+#if defined(TARGET_NR_getegid32) || (defined(TARGET_NR_getegid) && !defined(USE_UID16))
+#if defined(TARGET_NR_getegid32)
     case TARGET_NR_getegid32:
+#else
+    case TARGET_NR_getegid:
+#endif
         ret = get_errno(getegid());
         break;
 #endif
-#ifdef TARGET_NR_setreuid32
+#if defined(TARGET_NR_setreuid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setreuid32)
     case TARGET_NR_setreuid32:
+#else
+    case TARGET_NR_setreuid:
+#endif
         ret = get_errno(setreuid(arg1, arg2));
         break;
 #endif
-#ifdef TARGET_NR_setregid32
+#if defined(TARGET_NR_setregid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setregid32)
     case TARGET_NR_setregid32:
+#else
+    case TARGET_NR_setregid:
+#endif
         ret = get_errno(setregid(arg1, arg2));
         break;
 #endif
-#ifdef TARGET_NR_getgroups32
+#if defined(TARGET_NR_getgroups32) || !defined(USE_UID16)
+#if defined(TARGET_NR_getgroups32)
     case TARGET_NR_getgroups32:
+#else
+    case TARGET_NR_getgroups:
+#endif
         {
             int gidsetsize = arg1;
             uint32_t *target_grouplist;
@@ -6460,8 +6501,12 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         }
         break;
 #endif
-#ifdef TARGET_NR_setgroups32
+#if defined(TARGET_NR_setgroups32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setgroups32)
     case TARGET_NR_setgroups32:
+#else
+    case TARGET_NR_setgroups:
+#endif
         {
             int gidsetsize = arg1;
             uint32_t *target_grouplist;
@@ -6481,18 +6526,30 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         }
         break;
 #endif
-#ifdef TARGET_NR_fchown32
+#if defined(TARGET_NR_fchown32) || !defined(USE_UID16)
+#if defined(TARGET_NR_fchown32)
     case TARGET_NR_fchown32:
+#else
+    case TARGET_NR_fchown:
+#endif
         ret = get_errno(fchown(arg1, arg2, arg3));
         break;
 #endif
-#ifdef TARGET_NR_setresuid32
+#if defined(TARGET_NR_setresuid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setresuid32)
     case TARGET_NR_setresuid32:
+#else
+    case TARGET_NR_setresuid:
+#endif
         ret = get_errno(setresuid(arg1, arg2, arg3));
         break;
 #endif
-#ifdef TARGET_NR_getresuid32
+#if defined(TARGET_NR_getresuid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_getresuid32)
     case TARGET_NR_getresuid32:
+#else
+    case TARGET_NR_getresuid:
+#endif
         {
             uid_t ruid, euid, suid;
             ret = get_errno(getresuid(&ruid, &euid, &suid));
@@ -6505,13 +6562,21 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         }
         break;
 #endif
-#ifdef TARGET_NR_setresgid32
+#if defined(TARGET_NR_setresgid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setresgid32)
     case TARGET_NR_setresgid32:
+#else
+    case TARGET_NR_setresgid:
+#endif
         ret = get_errno(setresgid(arg1, arg2, arg3));
         break;
 #endif
+#if defined(TARGET_NR_getresgid32) || !defined(USE_UID16)
 #ifdef TARGET_NR_getresgid32
     case TARGET_NR_getresgid32:
+#else
+    case TARGET_NR_getresgid:
+#endif
         {
             gid_t rgid, egid, sgid;
             ret = get_errno(getresgid(&rgid, &egid, &sgid));
@@ -6524,31 +6589,51 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         }
         break;
 #endif
-#ifdef TARGET_NR_chown32
+#if defined(TARGET_NR_chown32) || !defined(USE_UID16)
+#if defined(TARGET_NR_chown32)
     case TARGET_NR_chown32:
+#else
+    case TARGET_NR_chown:
+#endif
         if (!(p = lock_user_string(arg1)))
             goto efault;
         ret = get_errno(chown(p, arg2, arg3));
         unlock_user(p, arg1, 0);
         break;
 #endif
-#ifdef TARGET_NR_setuid32
+#if defined(TARGET_NR_setuid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setuid32)
     case TARGET_NR_setuid32:
+#else
+    case TARGET_NR_setuid:
+#endif
         ret = get_errno(setuid(arg1));
         break;
 #endif
-#ifdef TARGET_NR_setgid32
+#if defined(TARGET_NR_setgid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setgid32)
     case TARGET_NR_setgid32:
+#else
+    case TARGET_NR_setgid:
+#endif
         ret = get_errno(setgid(arg1));
         break;
 #endif
-#ifdef TARGET_NR_setfsuid32
+#if defined(TARGET_NR_setfsuid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setfsuid32)
     case TARGET_NR_setfsuid32:
+#else
+    case TARGET_NR_setfsuid:
+#endif
         ret = get_errno(setfsuid(arg1));
         break;
 #endif
-#ifdef TARGET_NR_setfsgid32
+#if defined(TARGET_NR_setfsgid32) || !defined(USE_UID16)
+#if defined(TARGET_NR_setfsgid32)
     case TARGET_NR_setfsgid32:
+#else
+    case TARGET_NR_setfsgid:
+#endif
         ret = get_errno(setfsgid(arg1));
         break;
 #endif
-- 
1.6.2.1

* [Qemu-devel] [PATCH 9/9] linux-user: getpriority errno fix
  2009-10-16 12:38               ` [Qemu-devel] [PATCH 8/9] linux-user: define a couple of syscalls for non-uid16 targets Ulrich Hecht
@ 2009-10-16 12:38                 ` Ulrich Hecht
  0 siblings, 0 replies; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 12:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: riku.voipio, agraf

getpriority returned wrong errno; fixes LTP test getpriority02.
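
For illustration, the conversion that get_errno() performs can be sketched as below. The demo_* functions are hypothetical; the real helper in linux-user also translates host errno values to target values, which is left out here.

```c
#include <errno.h>

/* Hypothetical stand-in for a raw syscall wrapper: returns -1 and sets
 * errno on failure, otherwise a non-negative result. */
static long demo_raw_getpriority(int which, int who)
{
    (void)who;
    if (which < 0 || which > 2) {
        errno = EINVAL;
        return -1;
    }
    return 20; /* placeholder result */
}

/* Convert the "-1 plus errno" convention into the "negative errno"
 * convention that the syscall dispatcher hands back to the guest. */
static long demo_get_errno(long ret)
{
    return ret == -1 ? -(long)errno : ret;
}
```

Without the wrapper, a failing call would hand a bare -1 back to the guest instead of the negated error code.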

Signed-off-by: Ulrich Hecht <uli@suse.de>
---
 linux-user/syscall.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index da6f2e1..455c3fd 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -5314,7 +5314,7 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
         /* libc does special remapping of the return value of
          * sys_getpriority() so it's just easiest to call
          * sys_getpriority() directly rather than through libc. */
-        ret = sys_getpriority(arg1, arg2);
+        ret = get_errno(sys_getpriority(arg1, arg2));
         break;
     case TARGET_NR_setpriority:
         ret = get_errno(setpriority(arg1, arg2, arg3));
-- 
1.6.2.1

* Re: [Qemu-devel] [PATCH 1/9] TCG "sync" op
  2009-10-16 12:38 ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Ulrich Hecht
  2009-10-16 12:38   ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Ulrich Hecht
@ 2009-10-16 15:52   ` Aurelien Jarno
  2009-10-16 16:37     ` Ulrich Hecht
  2009-10-17  8:59     ` Edgar E. Iglesias
  1 sibling, 2 replies; 26+ messages in thread
From: Aurelien Jarno @ 2009-10-16 15:52 UTC (permalink / raw)
  To: Ulrich Hecht; +Cc: riku.voipio, qemu-devel, agraf

On Fri, Oct 16, 2009 at 02:38:47PM +0200, Ulrich Hecht wrote:
> sync allows concurrent accesses to locations in memory through different TCG
> variables. This comes in handy when you are emulating CPU registers that can
> be used as either 32 or 64 bit, as TCG doesn't know anything about aliases.
> See the s390x target for an example.
> 
> Fixed sync_i64 build failure on 32-bit targets.

I don't really see the point of such a new op, especially the way it is
used in the S390 target.

If a global is "synced" before each load/store, it will be loaded from or
stored to memory each time it is used. This is exactly what tcg_gen_ld/st
do, except that they need only one op. The benefit of globals in TCG is to
hold them in host registers as long as possible and avoid costly memory
loads/stores. tcg_gen_ld/st would probably be even more efficient, as
it is one op instead of two, and also because mapping more globals
means more time spent in the code looping over all globals.

IMHO, the correct way to do it is to use the following code, assuming 
you want to use 64-bit TCG regs to hold 32-bit values (that's something
that is not really clear in your next patch):

- for register load:

| static TCGv load_reg(int reg)
| {
|    TCGv r = tcg_temp_new_i64();
|    tcg_gen_ext32u_i64(r, tcgregs[reg]);
|    return r;
| }
|
| static void store_reg32(int reg, TCGv v)
| {
|    tcg_gen_ext32u_i64(v, v); /* may be optional */
|    tcg_gen_andi_i64(tcgregs[reg], tcgregs[reg], 0xffffffff00000000ULL);
|    tcg_gen_or_i64(tcgregs[reg], tcgregs[reg], v);
| }

If you want to do the same using 32-bit TCG regs:

| static TCGv_i32 load_reg(int reg)
| {
|    TCGv_i32 r = tcg_temp_new_i32();
|    tcg_gen_trunc_i64_i32(r, tcgregs[reg]); /* i64 -> i32: take the low half */
|    return r;
| }
|
| static void store_reg32(int reg, TCGv_i32 v)
| {
|    TCGv tmp = tcg_temp_new();
|    tcg_gen_extu_i32_i64(tmp, v);
|    tcg_gen_andi_i64(tcgregs[reg], tcgregs[reg], 0xffffffff00000000ULL);
|    tcg_gen_or_i64(tcgregs[reg], tcgregs[reg], tmp);
|    tcg_temp_free(tmp);
| }
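
In plain C, the arithmetic these helpers are meant to generate looks roughly like this. It is a sketch on ordinary integers with hypothetical demo_* names, not TCG code.

```c
#include <stdint.h>

/* Read the low 32 bits of a 64-bit register value, zero-extended. */
static uint64_t demo_load_reg32(uint64_t reg)
{
    return reg & 0xffffffffULL;
}

/* Replace the low 32 bits of reg with v, keeping the high 32 bits. */
static uint64_t demo_store_reg32(uint64_t reg, uint32_t v)
{
    return (reg & 0xffffffff00000000ULL) | (uint64_t)v;
}
```

For example, storing 0xdeadbeef into a register holding 0x1122334455667788 leaves 0x11223344deadbeef.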

Regards,
Aurelien

> Signed-off-by: Ulrich Hecht <uli@suse.de>
> ---
>  tcg/tcg-op.h  |   12 ++++++++++++
>  tcg/tcg-opc.h |    2 ++
>  tcg/tcg.c     |    6 ++++++
>  3 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index faf2e8b..c1b4710 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -316,6 +316,18 @@ static inline void tcg_gen_br(int label)
>      tcg_gen_op1i(INDEX_op_br, label);
>  }
>  
> +static inline void tcg_gen_sync_i32(TCGv_i32 arg)
> +{
> +    tcg_gen_op1_i32(INDEX_op_sync_i32, arg);
> +}
> +
> +#if TCG_TARGET_REG_BITS == 64
> +static inline void tcg_gen_sync_i64(TCGv_i64 arg)
> +{
> +    tcg_gen_op1_i64(INDEX_op_sync_i64, arg);
> +}
> +#endif
> +
>  static inline void tcg_gen_mov_i32(TCGv_i32 ret, TCGv_i32 arg)
>  {
>      if (!TCGV_EQUAL_I32(ret, arg))
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index b7f3fd7..5dcdeba 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -40,6 +40,7 @@ DEF2(call, 0, 1, 2, TCG_OPF_SIDE_EFFECTS) /* variable number of parameters */
>  DEF2(jmp, 0, 1, 0, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
>  DEF2(br, 0, 0, 1, TCG_OPF_BB_END | TCG_OPF_SIDE_EFFECTS)
>  
> +DEF2(sync_i32, 0, 1, 0, 0)
>  DEF2(mov_i32, 1, 1, 0, 0)
>  DEF2(movi_i32, 1, 0, 1, 0)
>  /* load/store */
> @@ -109,6 +110,7 @@ DEF2(neg_i32, 1, 1, 0, 0)
>  #endif
>  
>  #if TCG_TARGET_REG_BITS == 64
> +DEF2(sync_i64, 0, 1, 0, 0)
>  DEF2(mov_i64, 1, 1, 0, 0)
>  DEF2(movi_i64, 1, 0, 1, 0)
>  /* load/store */
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 3c0e296..8eb60f8 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1930,6 +1930,12 @@ static inline int tcg_gen_code_common(TCGContext *s, uint8_t *gen_code_buf,
>          //        dump_regs(s);
>  #endif
>          switch(opc) {
> +        case INDEX_op_sync_i32:
> +#if TCG_TARGET_REG_BITS == 64
> +        case INDEX_op_sync_i64:
> +#endif
> +            temp_save(s, args[0], s->reserved_regs);
> +            break;
>          case INDEX_op_mov_i32:
>  #if TCG_TARGET_REG_BITS == 64
>          case INDEX_op_mov_i64:
> -- 
> 1.6.2.1
> 
> 
> 
> 

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH 1/9] TCG "sync" op
  2009-10-16 15:52   ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Aurelien Jarno
@ 2009-10-16 16:37     ` Ulrich Hecht
  2009-10-16 17:29       ` Aurelien Jarno
  2009-10-17  8:59     ` Edgar E. Iglesias
  1 sibling, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-16 16:37 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: riku.voipio, qemu-devel, agraf

On Friday 16 October 2009, Aurelien Jarno wrote:
> IMHO, the correct way to do it is to use the following code, assuming
> you want to use 64-bit TCG regs to hold 32-bit values (that's
> something that is not really clear in your next patch):
>
> - for register load:
> | static TCGv load_reg(int reg)
> | {
> |    TCGv r = tcg_temp_new_i64();
> |    tcg_gen_ext32u_i64(r, tcgregs[reg]);
> |    return r;
> | }
> |
> | static void store_reg32(int reg, TCGv v)
> | {
> |    tcg_gen_ext32u_i64(v, v); /* may be optional */
> |    tcg_gen_andi_i64(tcgregs[reg], tcgregs[reg],
> | 0xffffffff00000000ULL); tcg_gen_or_i64(tcgregs[reg], tcgregs[reg],
> | v);
> | }

This is _extremely_ detrimental to performance. The point of the sync op 
is that in most cases it's a nop because registers are usually used with 
the same bitness again and again. The sign extension/masking stuff is 
done every time a register is accessed as 32 bits, which is the most 
common case. Compare the translation of the following sequence of 
instructions:

IN: _dl_aux_init
0x0000000080044ff6:  lhi        %r4,0
0x0000000080044ffa:  lhi        %r5,0
0x0000000080044ffe:  lhi        %r0,0

with sync:

OP:
 mov_i32 loc0,global_cc
 movi_i32 tmp1,$0x0
 sync_i64 R4
 mov_i32 r4,tmp1
 movi_i32 tmp1,$0x0
 sync_i64 R5
 mov_i32 r5,tmp1
 movi_i32 tmp1,$0x0
 sync_i64 R0
 mov_i32 r0,tmp1
 mov_i32 global_cc,loc0
 movi_i64 tmp2,$0x80045002
 st_i64 tmp2,env,$0x158
 exit_tb $0x0

OUT: [size=61]
0x6019a030:  mov    0x160(%r14),%ebp
0x6019a037:  mov    %rbp,%rbx
0x6019a03a:  mov    $0x80045002,%r12d
0x6019a040:  mov    %r12,0x158(%r14)
0x6019a047:  mov    %ebp,0xd1a0(%r14)
0x6019a04e:  mov    %ebx,0x160(%r14)
0x6019a055:  xor    %ebp,%ebp
0x6019a057:  mov    %ebp,(%r14)
0x6019a05a:  xor    %ebp,%ebp
0x6019a05c:  mov    %ebp,0x20(%r14)
0x6019a060:  xor    %ebp,%ebp
0x6019a062:  mov    %ebp,0x28(%r14)
0x6019a066:  xor    %eax,%eax
0x6019a068:  jmpq   0x621dc8ce


with sign extension:

OP:
 mov_i32 loc0,global_cc
 movi_i32 tmp1,$0x0
 ext32u_i64 tmp1,tmp1
 movi_i64 tmp2,$0xffffffff00000000
 and_i64 R4,R4,tmp2
 or_i64 R4,R4,tmp1
 movi_i32 tmp1,$0x0
 ext32u_i64 tmp1,tmp1
 movi_i64 tmp2,$0xffffffff00000000
 and_i64 R5,R5,tmp2
 or_i64 R5,R5,tmp1
 movi_i32 tmp1,$0x0
 ext32u_i64 tmp1,tmp1
 movi_i64 tmp2,$0xffffffff00000000
 and_i64 R0,R0,tmp2
 or_i64 R0,R0,tmp1
 mov_i32 global_cc,loc0
 movi_i64 tmp2,$0x80045002
 st_i64 tmp2,env,$0x158
 exit_tb $0x0

OUT: [size=126]
0x6019af10:  mov    0x160(%r14),%ebp
0x6019af17:  xor    %ebx,%ebx
0x6019af19:  mov    %ebx,%ebx
0x6019af1b:  mov    0x20(%r14),%r12
0x6019af1f:  mov    $0xffffffff00000000,%r13
0x6019af29:  and    %r13,%r12
0x6019af2c:  or     %rbx,%r12
0x6019af2f:  xor    %ebx,%ebx
0x6019af31:  mov    %ebx,%ebx
0x6019af33:  mov    0x28(%r14),%r13
0x6019af37:  mov    $0xffffffff00000000,%r15
0x6019af41:  and    %r15,%r13
0x6019af44:  or     %rbx,%r13
0x6019af47:  xor    %ebx,%ebx
0x6019af49:  mov    %ebx,%ebx
0x6019af4b:  mov    (%r14),%r15
0x6019af4e:  mov    $0xffffffff00000000,%r10
0x6019af58:  and    %r10,%r15
0x6019af5b:  or     %rbx,%r15
0x6019af5e:  mov    %rbp,%rbx
0x6019af61:  mov    $0x80045002,%r10d
0x6019af67:  mov    %r10,0x158(%r14)
0x6019af6e:  mov    %ebp,0xd1a0(%r14)
0x6019af75:  mov    %ebx,0x160(%r14)
0x6019af7c:  mov    %r15,(%r14)
0x6019af7f:  mov    %r12,0x20(%r14)
0x6019af83:  mov    %r13,0x28(%r14)
0x6019af87:  xor    %eax,%eax
0x6019af89:  jmpq   0x621dd78e

It's more than twice the size and has ten memory accesses instead of 
seven.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: [Qemu-devel] [PATCH 1/9] TCG "sync" op
  2009-10-16 16:37     ` Ulrich Hecht
@ 2009-10-16 17:29       ` Aurelien Jarno
  2009-10-19 17:17         ` Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Aurelien Jarno @ 2009-10-16 17:29 UTC (permalink / raw)
  To: Ulrich Hecht; +Cc: riku.voipio, qemu-devel, agraf

On Fri, Oct 16, 2009 at 06:37:31PM +0200, Ulrich Hecht wrote:
> On Friday 16 October 2009, Aurelien Jarno wrote:
> > IMHO, the correct way to do it is to use the following code, assuming
> > you want to use 64-bit TCG regs to hold 32-bit values (that's
> > something that is not really clear in your next patch):
> >
> > - for register load:
> > | static TCGv load_reg(int reg)
> > | {
> > |    TCGv r = tcg_temp_new_i64();
> > |    tcg_gen_ext32u_i64(r, tcgregs[reg]);
> > |    return r;
> > | }
> > |
> > | static void store_reg32(int reg, TCGv v)
> > | {
> > |    tcg_gen_ext32u_i64(v, v); /* may be optional */
> > |    tcg_gen_andi_i64(tcgregs[reg], tcgregs[reg],
> > | 0xffffffff00000000ULL); tcg_gen_or_i64(tcgregs[reg], tcgregs[reg],
> > | v);
> > | }
> 
> This is _extremely_ detrimental to performance. The point of the sync op 
> is that in most cases it's a nop because registers are usually used with 
> the same bitness again and again. The sign extension/masking stuff is 

I don't really understand how it can simply be a nop, given that it calls
temp_save. It means that if a register is used twice in a BB, it is fetched
again from memory.

> done every time a register is accessed as 32 bits, which is the most 
> common case. Compare the translation of the following sequence of 
> instructions:
> 
> IN: _dl_aux_init
> 0x0000000080044ff6:  lhi        %r4,0
> 0x0000000080044ffa:  lhi        %r5,0
> 0x0000000080044ffe:  lhi        %r0,0

This example is a bit biased, as registers are only saved, and never
reused. Let's comment on it though.

> with sync:
> 
> OP:
>  mov_i32 loc0,global_cc
>  movi_i32 tmp1,$0x0
>  sync_i64 R4
>  mov_i32 r4,tmp1
>  movi_i32 tmp1,$0x0
>  sync_i64 R5
>  mov_i32 r5,tmp1
>  movi_i32 tmp1,$0x0
>  sync_i64 R0
>  mov_i32 r0,tmp1
>  mov_i32 global_cc,loc0
>  movi_i64 tmp2,$0x80045002
>  st_i64 tmp2,env,$0x158
>  exit_tb $0x0
> 
> OUT: [size=61]
> 0x6019a030:  mov    0x160(%r14),%ebp
> 0x6019a037:  mov    %rbp,%rbx
> 0x6019a03a:  mov    $0x80045002,%r12d
> 0x6019a040:  mov    %r12,0x158(%r14)
> 0x6019a047:  mov    %ebp,0xd1a0(%r14)
> 0x6019a04e:  mov    %ebx,0x160(%r14)
> 0x6019a055:  xor    %ebp,%ebp
> 0x6019a057:  mov    %ebp,(%r14)
> 0x6019a05a:  xor    %ebp,%ebp
> 0x6019a05c:  mov    %ebp,0x20(%r14)
> 0x6019a060:  xor    %ebp,%ebp
> 0x6019a062:  mov    %ebp,0x28(%r14)
> 0x6019a066:  xor    %eax,%eax
> 0x6019a068:  jmpq   0x621dc8ce
> 
> 
> with sign extension:
> 
> OP:
>  mov_i32 loc0,global_cc
>  movi_i32 tmp1,$0x0
>  ext32u_i64 tmp1,tmp1
>  movi_i64 tmp2,$0xffffffff00000000
>  and_i64 R4,R4,tmp2
>  or_i64 R4,R4,tmp1
>  movi_i32 tmp1,$0x0
>  ext32u_i64 tmp1,tmp1
>  movi_i64 tmp2,$0xffffffff00000000
>  and_i64 R5,R5,tmp2
>  or_i64 R5,R5,tmp1
>  movi_i32 tmp1,$0x0
>  ext32u_i64 tmp1,tmp1
>  movi_i64 tmp2,$0xffffffff00000000
>  and_i64 R0,R0,tmp2
>  or_i64 R0,R0,tmp1
>  mov_i32 global_cc,loc0
>  movi_i64 tmp2,$0x80045002
>  st_i64 tmp2,env,$0x158
>  exit_tb $0x0
> 
> OUT: [size=126]
> 0x6019af10:  mov    0x160(%r14),%ebp
> 0x6019af17:  xor    %ebx,%ebx
> 0x6019af19:  mov    %ebx,%ebx
> 0x6019af1b:  mov    0x20(%r14),%r12
> 0x6019af1f:  mov    $0xffffffff00000000,%r13
> 0x6019af29:  and    %r13,%r12
> 0x6019af2c:  or     %rbx,%r12
> 0x6019af2f:  xor    %ebx,%ebx
> 0x6019af31:  mov    %ebx,%ebx
> 0x6019af33:  mov    0x28(%r14),%r13
> 0x6019af37:  mov    $0xffffffff00000000,%r15
> 0x6019af41:  and    %r15,%r13
> 0x6019af44:  or     %rbx,%r13
> 0x6019af47:  xor    %ebx,%ebx
> 0x6019af49:  mov    %ebx,%ebx
> 0x6019af4b:  mov    (%r14),%r15
> 0x6019af4e:  mov    $0xffffffff00000000,%r10
> 0x6019af58:  and    %r10,%r15
> 0x6019af5b:  or     %rbx,%r15
> 0x6019af5e:  mov    %rbp,%rbx
> 0x6019af61:  mov    $0x80045002,%r10d
> 0x6019af67:  mov    %r10,0x158(%r14)
> 0x6019af6e:  mov    %ebp,0xd1a0(%r14)
> 0x6019af75:  mov    %ebx,0x160(%r14)
> 0x6019af7c:  mov    %r15,(%r14)
> 0x6019af7f:  mov    %r12,0x20(%r14)
> 0x6019af83:  mov    %r13,0x28(%r14)
> 0x6019af87:  xor    %eax,%eax
> 0x6019af89:  jmpq   0x621dd78e
> 
> It's more than twice the size and has ten memory accesses instead of 
> seven.
> 

There is clearly a huge impact from saving the globals, which I didn't
expect. I still believe a sync op as implemented in your patch goes
against TCG's philosophy and probably does not work with fixed
registers, and I don't understand what the gain is compared to using
tcg_gen_ld/st. Maybe you can post a dump of such a version so that we
can see the benefit?

I think instead of a sync op, we should think of a way to use only the
low part of a global variable, maybe by adding new ops. That could also
help to improve the concat_i32_i64 op on 64-bit hosts. Does anyone have
any ideas?

Aurelien

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH 1/9] TCG "sync" op
  2009-10-16 15:52   ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Aurelien Jarno
  2009-10-16 16:37     ` Ulrich Hecht
@ 2009-10-17  8:59     ` Edgar E. Iglesias
  2009-10-19 17:17       ` Ulrich Hecht
  1 sibling, 1 reply; 26+ messages in thread
From: Edgar E. Iglesias @ 2009-10-17  8:59 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: riku.voipio, qemu-devel, agraf

On Fri, Oct 16, 2009 at 05:52:21PM +0200, Aurelien Jarno wrote:
> On Fri, Oct 16, 2009 at 02:38:47PM +0200, Ulrich Hecht wrote:
> > sync allows concurrent accesses to locations in memory through different TCG
> > variables. This comes in handy when you are emulating CPU registers that can
> > be used as either 32 or 64 bit, as TCG doesn't know anything about aliases.
> > See the s390x target for an example.
> > 
> > Fixed sync_i64 build failure on 32-bit targets.
> 
> I don't really see the point of such a new op, especially the way it is
> used in the S390 target.

Hi,

I looked at the s390 patches and was also unsure about this sync op.
I'm not convinced it's bad, but my first feeling was, as Aurelien points
out, that the translator should take care of it. Not sure though, it
would be nice to hear what other people think.

Another thing I noticed was the large amount of helpers. Without looking
at the details my feeling was that you could probably do more at
translation time. That kind of optimization can be done incrementally
with follow-up patches though.

Other than that the s390 series looks OK and should IMO get committed.

Nice work!

Cheers


* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-10-16 12:38   ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Ulrich Hecht
  2009-10-16 12:38     ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Ulrich Hecht
@ 2009-10-17 10:42     ` Aurelien Jarno
  2009-10-19 17:17       ` Ulrich Hecht
  1 sibling, 1 reply; 26+ messages in thread
From: Aurelien Jarno @ 2009-10-17 10:42 UTC (permalink / raw)
  To: Ulrich Hecht; +Cc: riku.voipio, qemu-devel, agraf

On Fri, Oct 16, 2009 at 02:38:48PM +0200, Ulrich Hecht wrote:
> Currently only does userspace with 64-bit addressing, but it's quite good
> at that.
> 
> replaced always_inline with inline
> 
> Signed-off-by: Ulrich Hecht <uli@suse.de>
> ---
>  cpu-exec.c               |    2 +
>  disas.c                  |    3 +
>  s390-dis.c               |    4 +-
>  target-s390x/cpu.h       |  132 +++
>  target-s390x/exec.h      |   51 +
>  target-s390x/helper.c    |   81 ++
>  target-s390x/helpers.h   |  128 +++
>  target-s390x/op_helper.c | 1719 ++++++++++++++++++++++++++++++++
>  target-s390x/translate.c | 2479 ++++++++++++++++++++++++++++++++++++++++++++++
>  9 files changed, 4597 insertions(+), 2 deletions(-)
>  create mode 100644 target-s390x/cpu.h
>  create mode 100644 target-s390x/exec.h
>  create mode 100644 target-s390x/helper.c
>  create mode 100644 target-s390x/helpers.h
>  create mode 100644 target-s390x/op_helper.c
>  create mode 100644 target-s390x/translate.c

Thanks for this nice patch. s390 is one of the major targets still
missing, along with ia64.

First of all, a few general comments. Note that I know very little
about S390/S390X, so I may have dumb comments/questions. Also, as the
patch is very long, I have probably missed things; we should probably
iterate with new versions of the patches.

Is it possible, given the current implementation, to emulate S390 in
addition to S390X? If yes, how different would an S390-only target be?
If they would not be too different, it is probably worth using _tl type
registers and calling it target-s390, a bit like it is done with
i386/x86_64, ppc/ppc64, mips/mips64 or sparc/sparc64.

Secondly, there seems to be a lot of mixing between 32-bit and 64-bit
TCG registers. They should not be mixed. _i32 ops apply to TCGv_i32
variables, _i64 ops apply to TCGv_i64 variables, and _tl ops to TCGv
variables. TCGv/_tl types can map either to _i32 or _i64 depending on
your target. You should try to build the target with --enable-debug-tcg.
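
As a rough illustration of why this matters: if the 32-bit and 64-bit
variable handles are distinct C types, the compiler rejects a mismatched
op. As far as I understand, that is the kind of check --enable-debug-tcg
turns on. The sketch below uses invented names (SketchV32,
sketch_add_i32, ...) and is not the real TCG API.

```c
/* Hypothetical handle types: same layout, but distinct to the compiler. */
typedef struct { int idx; } SketchV32;
typedef struct { int idx; } SketchV64;

static int sketch_ops_emitted;

/* A 32-bit op only accepts 32-bit handles. */
static void sketch_add_i32(SketchV32 ret, SketchV32 a, SketchV32 b)
{
    (void)ret;
    (void)a;
    (void)b;
    sketch_ops_emitted++;
}

/* Crossing widths requires an explicit conversion op. */
static void sketch_extu_i32_i64(SketchV64 ret, SketchV32 a)
{
    (void)ret;
    (void)a;
    sketch_ops_emitted++;
}

static int sketch_demo(void)
{
    SketchV32 a = { 0 }, b = { 1 }, t = { 2 };
    SketchV64 wide = { 3 };

    sketch_ops_emitted = 0;
    sketch_add_i32(t, a, b);
    sketch_extu_i32_i64(wide, t);
    /* sketch_add_i32(wide, a, b); would not compile: wide is a SketchV64 */
    return sketch_ops_emitted;
}
```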

Otherwise please find my comments inline.

> diff --git a/cpu-exec.c b/cpu-exec.c
> index 8aa92c7..6b3391c 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -249,6 +249,7 @@ int cpu_exec(CPUState *env1)
>  #elif defined(TARGET_MIPS)
>  #elif defined(TARGET_SH4)
>  #elif defined(TARGET_CRIS)
> +#elif defined(TARGET_S390X)
>      /* XXXXX */
>  #else
>  #error unsupported target CPU
> @@ -673,6 +674,7 @@ int cpu_exec(CPUState *env1)
>  #elif defined(TARGET_SH4)
>  #elif defined(TARGET_ALPHA)
>  #elif defined(TARGET_CRIS)
> +#elif defined(TARGET_S390X)
>      /* XXXXX */
>  #else
>  #error unsupported target CPU
> diff --git a/disas.c b/disas.c
> index ce342bc..14c8901 100644
> --- a/disas.c
> +++ b/disas.c
> @@ -195,6 +195,9 @@ void target_disas(FILE *out, target_ulong code, target_ulong size, int flags)
>  #elif defined(TARGET_CRIS)
>      disasm_info.mach = bfd_mach_cris_v32;
>      print_insn = print_insn_crisv32;
> +#elif defined(TARGET_S390X)
> +    disasm_info.mach = bfd_mach_s390_64;
> +    print_insn = print_insn_s390;
>  #elif defined(TARGET_MICROBLAZE)
>      disasm_info.mach = bfd_arch_microblaze;
>      print_insn = print_insn_microblaze;

It would be nice to split all the disassembling part into a separate
patch, combined with the related makefile changes. This way it can be
applied separately.

> diff --git a/s390-dis.c b/s390-dis.c
> index 86dd84f..9a73a57 100644
> --- a/s390-dis.c
> +++ b/s390-dis.c
> @@ -191,10 +191,10 @@ init_disasm (struct disassemble_info *info)
>  //  switch (info->mach)
>  //    {
>  //    case bfd_mach_s390_31:
> -      current_arch_mask = 1 << S390_OPCODE_ESA;
> +//      current_arch_mask = 1 << S390_OPCODE_ESA;
>  //      break;
>  //    case bfd_mach_s390_64:
> -//      current_arch_mask = 1 << S390_OPCODE_ZARCH;
> +      current_arch_mask = 1 << S390_OPCODE_ZARCH;
>  //      break;
>  //    default:
>  //      abort ();

While I understand the second part, why is it necessary to comment out
the bfd_mach_s390_31 case?

> diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
> new file mode 100644
> index 0000000..93b09cd
> --- /dev/null
> +++ b/target-s390x/cpu.h
> @@ -0,0 +1,132 @@
> +/*
> + * S/390 virtual CPU header
> + *
> + *  Copyright (c) 2009 Ulrich Hecht
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
> + */
> +#ifndef CPU_S390X_H
> +#define CPU_S390X_H
> +
> +#define TARGET_LONG_BITS 64
> +
> +#define ELF_MACHINE	EM_S390
> +
> +#define CPUState struct CPUS390XState
> +
> +#include "cpu-defs.h"
> +
> +#include "softfloat.h"
> +
> +#define NB_MMU_MODES 2 // guess
> +#define MMU_USER_IDX 0 // guess
> +
> +typedef union FPReg {
> +    struct {
> +#ifdef WORDS_BIGENDIAN
> +        float32 e;
> +        int32_t __pad;
> +#else
> +        int32_t __pad;
> +        float32 e;
> +#endif
> +    };
> +    float64 d;
> +    uint64_t i;
> +} FPReg;

WORDS_BIGENDIAN is wrong here. It should probably be 
HOST_WORDS_BIGENDIAN. Also it may be a better idea to
reuse CPU_FloatU and CPU_DoubleU here.
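
To show what the endianness test is protecting, here is a standalone
sketch of the same layout idea: the 32-bit member has to overlay the
high half of the 64-bit value whatever the host byte order is. It
detects the host order with the GCC/Clang __BYTE_ORDER__ macro instead
of QEMU's configure-generated define, and all Sketch* names are
invented for this example.

```c
#include <stdint.h>

#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#define SKETCH_HOST_BIG_ENDIAN 1
#endif

typedef union SketchFPReg {
    struct {
#ifdef SKETCH_HOST_BIG_ENDIAN
        uint32_t e;     /* high half comes first in memory */
        uint32_t pad;
#else
        uint32_t pad;   /* low half comes first in memory */
        uint32_t e;
#endif
    } s;
    uint64_t i;
} SketchFPReg;

/* Return the bits of the 32-bit member for a given 64-bit register image. */
static uint32_t sketch_fpreg_short(uint64_t image)
{
    SketchFPReg r;

    r.i = image;
    return r.s.e;
}
```

If the wrong macro is tested, the #else branch is always taken, and on a
big-endian host the 32-bit member silently lands on the low half.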

> +typedef struct CPUS390XState {
> +    uint64_t regs[16];	/* GP registers */
> +    
> +    uint32_t aregs[16];	/* access registers */
> +    
> +    uint32_t fpc;	/* floating-point control register */
> +    FPReg fregs[16]; /* FP registers */
> +    float_status fpu_status; /* passed to softfloat lib */
> +    
> +    struct {
> +        uint64_t mask;
> +        uint64_t addr;
> +    } psw;
> +    
> +    int cc; /* condition code (0-3) */
> +    
> +    uint64_t __excp_addr;
> +    
> +    CPU_COMMON
> +} CPUS390XState;
> +
> +#if defined(CONFIG_USER_ONLY)
> +static inline void cpu_clone_regs(CPUState *env, target_ulong newsp)
> +{
> +    if (newsp)
> +        env->regs[15] = newsp;
> +    env->regs[0] = 0;
> +}
> +#endif
> +
> +CPUS390XState *cpu_s390x_init(const char *cpu_model);
> +void s390x_translate_init(void);
> +int cpu_s390x_exec(CPUS390XState *s);
> +void cpu_s390x_close(CPUS390XState *s);
> +void do_interrupt (CPUState *env);
> +
> +/* you can call this signal handler from your SIGBUS and SIGSEGV
> +   signal handlers to inform the virtual CPU of exceptions. non zero
> +   is returned if the signal was handled by the virtual CPU.  */
> +int cpu_s390x_signal_handler(int host_signum, void *pinfo,
> +                           void *puc);
> +int cpu_s390x_handle_mmu_fault (CPUS390XState *env, target_ulong address, int rw,
> +                              int mmu_idx, int is_softmuu);
> +#define cpu_handle_mmu_fault cpu_s390x_handle_mmu_fault
> +
> +void cpu_lock(void);
> +void cpu_unlock(void);
> +
> +static inline void cpu_set_tls(CPUS390XState *env, target_ulong newtls)
> +{
> +    env->aregs[0] = newtls >> 32;
> +    env->aregs[1] = newtls & 0xffffffffULL;
> +}
> +
> +#define TARGET_PAGE_BITS 12 // guess
> +
> +#define cpu_init cpu_s390x_init
> +#define cpu_exec cpu_s390x_exec
> +#define cpu_gen_code cpu_s390x_gen_code
> +#define cpu_signal_handler cpu_s390x_signal_handler
> +//#define cpu_list s390x_cpu_list
> +
> +#include "cpu-all.h"
> +#include "exec-all.h"
> +
> +#define EXCP_OPEX 1 /* operation exception (sigill) */
> +#define EXCP_SVC 2 /* supervisor call (syscall) */
> +#define EXCP_ADDR 5 /* addressing exception */
> +#define EXCP_EXECUTE_SVC 0xff00000 /* supervisor call via execute insn */
> +
> +static inline void cpu_pc_from_tb(CPUState *env, TranslationBlock* tb)
> +{
> +    env->psw.addr = tb->pc;
> +}
> +
> +static inline void cpu_get_tb_cpu_state(CPUState* env, target_ulong *pc,
> +                                        target_ulong *cs_base, int *flags)
> +{
> +    *pc = env->psw.addr;
> +    *cs_base = 0;
> +    *flags = env->psw.mask; // guess
> +}
> +#endif
> diff --git a/target-s390x/exec.h b/target-s390x/exec.h
> new file mode 100644
> index 0000000..5198359
> --- /dev/null
> +++ b/target-s390x/exec.h
> @@ -0,0 +1,51 @@
> +/*
> + *  S/390 execution defines
> + *
> + *  Copyright (c) 2009 Ulrich Hecht
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
> + */
> +
> +#include "dyngen-exec.h"
> +
> +register struct CPUS390XState *env asm(AREG0);
> +
> +#include "cpu.h"
> +#include "exec-all.h"
> +
> +static inline int cpu_has_work(CPUState *env)
> +{
> +    return env->interrupt_request & CPU_INTERRUPT_HARD; // guess
> +}
> +
> +static inline void regs_to_env(void)
> +{
> +}
> +
> +static inline void env_to_regs(void)
> +{
> +}
> +
> +static inline int cpu_halted(CPUState *env)
> +{
> +    if (!env->halted) {
> +       return 0;
> +    }
> +    if (cpu_has_work(env)) {
> +        env->halted = 0;
> +        return 0;
> +    }
> +    return EXCP_HALTED;
> +}
> diff --git a/target-s390x/helper.c b/target-s390x/helper.c
> new file mode 100644
> index 0000000..5407c62
> --- /dev/null
> +++ b/target-s390x/helper.c
> @@ -0,0 +1,81 @@
> +/*
> + *  S/390 helpers
> + *
> + *  Copyright (c) 2009 Ulrich Hecht
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
> + */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include "cpu.h"
> +#include "exec-all.h"
> +#include "gdbstub.h"
> +#include "qemu-common.h"
> +
> +CPUS390XState *cpu_s390x_init(const char *cpu_model)
> +{
> +    CPUS390XState *env;
> +    static int inited = 0;
> +    
> +    env = qemu_mallocz(sizeof(CPUS390XState));
> +    cpu_exec_init(env);
> +    if (!inited) {
> +        inited = 1;
> +        s390x_translate_init();
> +    }
> +    
> +    env->cpu_model_str = cpu_model;
> +    cpu_reset(env);
> +    qemu_init_vcpu(env);
> +    return env;
> +}

This function would probably be better in translate.c, so that
s390x_translate_init() can be declared static.

> +#if defined(CONFIG_USER_ONLY)
> +
> +void do_interrupt (CPUState *env)
> +{
> +    env->exception_index = -1;
> +}
> +
> +int cpu_s390x_handle_mmu_fault (CPUState *env, target_ulong address, int rw,
> +                              int mmu_idx, int is_softmmu)
> +{
> +    //fprintf(stderr,"%s: address 0x%lx rw %d mmu_idx %d is_softmmu %d\n", __FUNCTION__, address, rw, mmu_idx, is_softmmu);

Please use /* */ for comments (See CODING_STYLE).

> +    env->exception_index = EXCP_ADDR;
> +    env->__excp_addr = address; /* FIXME: find out how this works on a real machine */
> +    return 1;
> +}
> +
> +target_phys_addr_t cpu_get_phys_page_debug(CPUState *env, target_ulong addr)
> +{
> +    return addr;
> +}
> +
> +#endif /* CONFIG_USER_ONLY */
> +
> +void cpu_reset(CPUS390XState *env)
> +{
> +    if (qemu_loglevel_mask(CPU_LOG_RESET)) {
> +        qemu_log("CPU Reset (CPU %d)\n", env->cpu_index);
> +        log_cpu_state(env, 0);
> +    }
> +    
> +    memset(env, 0, offsetof(CPUS390XState, breakpoints));
> +    /* FIXME: reset vector? */
> +    tlb_flush(env, 1);
> +}

Yes, reset vector, MMU mode, default register values, ...

> diff --git a/target-s390x/helpers.h b/target-s390x/helpers.h
> new file mode 100644
> index 0000000..0ba2086
> --- /dev/null
> +++ b/target-s390x/helpers.h
> @@ -0,0 +1,128 @@
> +#include "def-helper.h"
> +
> +DEF_HELPER_1(exception, void, i32)
> +DEF_HELPER_4(nc, i32, i32, i32, i32, i32)
> +DEF_HELPER_4(oc, i32, i32, i32, i32, i32)
> +DEF_HELPER_4(xc, i32, i32, i32, i32, i32)
> +DEF_HELPER_4(mvc, void, i32, i32, i32, i32)
> +DEF_HELPER_4(clc, i32, i32, i32, i32, i32)
> +DEF_HELPER_4(lmg, void, i32, i32, i32, s32)
> +DEF_HELPER_4(stmg, void, i32, i32, i32, s32)
> +DEF_HELPER_FLAGS_1(set_cc_s32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32)
> +DEF_HELPER_FLAGS_1(set_cc_s64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64)
> +DEF_HELPER_FLAGS_1(set_cc_comp_s32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32)
> +DEF_HELPER_FLAGS_1(set_cc_comp_s64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64)
> +DEF_HELPER_FLAGS_1(set_cc_nz_u32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32)
> +DEF_HELPER_FLAGS_1(set_cc_nz_u64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64)
> +DEF_HELPER_FLAGS_2(set_cc_icm, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32)
> +DEF_HELPER_4(brc, void, i32, i32, i64, s32)
> +DEF_HELPER_3(brctg, void, i64, i64, s32)
> +DEF_HELPER_3(brct, void, i32, i64, s32)
> +DEF_HELPER_4(brcl, void, i32, i32, i64, s64)
> +DEF_HELPER_4(bcr, void, i32, i32, i64, i64)
> +DEF_HELPER_4(bc, void, i32, i32, i64, i64)
> +DEF_HELPER_FLAGS_2(cmp_u64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i64)
> +DEF_HELPER_FLAGS_2(cmp_u32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32)
> +DEF_HELPER_FLAGS_2(cmp_s32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32, s32)
> +DEF_HELPER_FLAGS_2(cmp_s64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64, s64)
> +DEF_HELPER_3(clm, i32, i32, i32, i64)
> +DEF_HELPER_3(stcm, void, i32, i32, i64)
> +DEF_HELPER_2(mlg, void, i32, i64)
> +DEF_HELPER_2(dlg, void, i32, i64)
> +DEF_HELPER_FLAGS_3(set_cc_add64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64, s64, s64)
> +DEF_HELPER_FLAGS_3(set_cc_addu64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(set_cc_add32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32, s32, s32)
> +DEF_HELPER_FLAGS_3(set_cc_addu32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32, i32)
> +DEF_HELPER_FLAGS_3(set_cc_sub64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64, s64, s64)
> +DEF_HELPER_FLAGS_3(set_cc_subu64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i64, i64)
> +DEF_HELPER_FLAGS_3(set_cc_sub32, TCG_CALL_PURE|TCG_CALL_CONST, i32, s32, s32, s32)
> +DEF_HELPER_FLAGS_3(set_cc_subu32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32, i32)
> +DEF_HELPER_3(srst, i32, i32, i32, i32)
> +DEF_HELPER_3(clst, i32, i32, i32, i32)
> +DEF_HELPER_3(mvst, i32, i32, i32, i32)
> +DEF_HELPER_3(csg, i32, i32, i64, i32)
> +DEF_HELPER_3(cdsg, i32, i32, i64, i32)
> +DEF_HELPER_3(cs, i32, i32, i64, i32)
> +DEF_HELPER_4(ex, i32, i32, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(tm, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32)
> +DEF_HELPER_FLAGS_2(tmxx, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i32)
> +DEF_HELPER_2(abs_i32, i32, i32, s32)
> +DEF_HELPER_2(nabs_i32, i32, i32, s32)
> +DEF_HELPER_2(abs_i64, i32, i32, s64)
> +DEF_HELPER_2(nabs_i64, i32, i32, s64)
> +DEF_HELPER_3(stcmh, i32, i32, i64, i32)
> +DEF_HELPER_3(icmh, i32, i32, i64, i32)
> +DEF_HELPER_2(ipm, void, i32, i32)
> +DEF_HELPER_3(addc_u32, i32, i32, i32, i32)
> +DEF_HELPER_FLAGS_3(set_cc_addc_u64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64, i64, i64)
> +DEF_HELPER_3(stam, void, i32, i64, i32)
> +DEF_HELPER_3(mvcle, i32, i32, i64, i32)
> +DEF_HELPER_3(clcle, i32, i32, i64, i32)
> +DEF_HELPER_4(slb, i32, i32, i32, i32, i32)
> +DEF_HELPER_4(slbg, i32, i32, i32, i64, i64)
> +DEF_HELPER_2(cefbr, void, i32, s32)
> +DEF_HELPER_2(cdfbr, void, i32, s32)
> +DEF_HELPER_2(cxfbr, void, i32, s32)
> +DEF_HELPER_2(cegbr, void, i32, s64)
> +DEF_HELPER_2(cdgbr, void, i32, s64)
> +DEF_HELPER_2(cxgbr, void, i32, s64)
> +DEF_HELPER_2(adbr, i32, i32, i32)
> +DEF_HELPER_2(aebr, i32, i32, i32)
> +DEF_HELPER_2(sebr, i32, i32, i32)
> +DEF_HELPER_2(sdbr, i32, i32, i32)
> +DEF_HELPER_2(debr, void, i32, i32)
> +DEF_HELPER_2(dxbr, void, i32, i32)
> +DEF_HELPER_2(mdbr, void, i32, i32)
> +DEF_HELPER_2(mxbr, void, i32, i32)
> +DEF_HELPER_2(ldebr, void, i32, i32)
> +DEF_HELPER_2(ldxbr, void, i32, i32)
> +DEF_HELPER_2(lxdbr, void, i32, i32)
> +DEF_HELPER_2(ledbr, void, i32, i32)
> +DEF_HELPER_2(lexbr, void, i32, i32)
> +DEF_HELPER_2(lpebr, i32, i32, i32)
> +DEF_HELPER_2(lpdbr, i32, i32, i32)
> +DEF_HELPER_2(lpxbr, i32, i32, i32)
> +DEF_HELPER_2(ltebr, i32, i32, i32)
> +DEF_HELPER_2(ltdbr, i32, i32, i32)
> +DEF_HELPER_2(ltxbr, i32, i32, i32)
> +DEF_HELPER_2(lcebr, i32, i32, i32)
> +DEF_HELPER_2(lcdbr, i32, i32, i32)
> +DEF_HELPER_2(lcxbr, i32, i32, i32)
> +DEF_HELPER_2(ceb, i32, i32, i64)
> +DEF_HELPER_2(aeb, i32, i32, i64)
> +DEF_HELPER_2(deb, void, i32, i64)
> +DEF_HELPER_2(meeb, void, i32, i64)
> +DEF_HELPER_2(cdb, i32, i32, i64)
> +DEF_HELPER_2(adb, i32, i32, i64)
> +DEF_HELPER_2(seb, i32, i32, i64)
> +DEF_HELPER_2(sdb, i32, i32, i64)
> +DEF_HELPER_2(mdb, void, i32, i64)
> +DEF_HELPER_2(ddb, void, i32, i64)
> +DEF_HELPER_FLAGS_2(cebr, TCG_CALL_PURE, i32, i32, i32)
> +DEF_HELPER_FLAGS_2(cdbr, TCG_CALL_PURE, i32, i32, i32)
> +DEF_HELPER_FLAGS_2(cxbr, TCG_CALL_PURE, i32, i32, i32)
> +DEF_HELPER_3(cgebr, i32, i32, i32, i32)
> +DEF_HELPER_3(cgdbr, i32, i32, i32, i32)
> +DEF_HELPER_3(cgxbr, i32, i32, i32, i32)
> +DEF_HELPER_1(lzer, void, i32)
> +DEF_HELPER_1(lzdr, void, i32)
> +DEF_HELPER_1(lzxr, void, i32)
> +DEF_HELPER_3(cfebr, i32, i32, i32, i32)
> +DEF_HELPER_3(cfdbr, i32, i32, i32, i32)
> +DEF_HELPER_3(cfxbr, i32, i32, i32, i32)
> +DEF_HELPER_2(axbr, i32, i32, i32)
> +DEF_HELPER_2(sxbr, i32, i32, i32)
> +DEF_HELPER_2(meebr, void, i32, i32)
> +DEF_HELPER_2(ddbr, void, i32, i32)
> +DEF_HELPER_3(madb, void, i32, i64, i32)
> +DEF_HELPER_3(maebr, void, i32, i32, i32)
> +DEF_HELPER_3(madbr, void, i32, i32, i32)
> +DEF_HELPER_3(msdbr, void, i32, i32, i32)
> +DEF_HELPER_2(lxdb, void, i32, i64)
> +DEF_HELPER_FLAGS_2(tceb, TCG_CALL_PURE, i32, i32, i64)
> +DEF_HELPER_FLAGS_2(tcdb, TCG_CALL_PURE, i32, i32, i64)
> +DEF_HELPER_FLAGS_2(tcxb, TCG_CALL_PURE, i32, i32, i64)
> +DEF_HELPER_2(flogr, i32, i32, i64)
> +DEF_HELPER_2(sqdbr, void, i32, i32)
> +
> +#include "def-helper.h"
> diff --git a/target-s390x/op_helper.c b/target-s390x/op_helper.c
> new file mode 100644
> index 0000000..5de4d08
> --- /dev/null
> +++ b/target-s390x/op_helper.c
> @@ -0,0 +1,1719 @@
> +/*
> + *  S/390 helper routines
> + *
> + *  Copyright (c) 2009 Ulrich Hecht
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
> + */
> +
> +#include "exec.h"
> +#include "helpers.h"
> +#include <string.h>
> +
> +//#define DEBUG_HELPER
> +#ifdef DEBUG_HELPER
> +#define HELPER_LOG(x...) qemu_log(x)
> +#else
> +#define HELPER_LOG(x...)
> +#endif

Small comment for the rest of the file. While I understand HELPER_LOG is
very handy while developing, I think some calls to it can be dropped in
really simple functions. Keeping it in more complex functions is not a
problem though.

> +/* raise an exception */
> +void HELPER(exception)(uint32_t excp)
> +{
> +    HELPER_LOG("%s: exception %d\n", __FUNCTION__, excp);
> +    env->exception_index = excp;
> +    cpu_loop_exit();
> +}
> +
> +/* and on array */
> +uint32_t HELPER(nc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
> +{
> +    uint64_t dest = env->regs[b >> 4] + d1;
> +    uint64_t src = env->regs[b & 0xf] + d2;
> +    int i;
> +    unsigned char x;
> +    uint32_t cc = 0;
> +    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
> +    for (i = 0; i <= l; i++) {
> +        x = ldub(dest + i) & ldub(src + i);
> +        if (x) cc = 1;

coding style
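
For reference, this is the brace style meant here and in the similar
remarks below: every if/else body gets braces, and "else" goes on the
same line as the closing brace. The sketch works on plain byte arrays
with an ordinary length instead of ldub/stb and the length-minus-one
encoding, and sketch_and_bytes is an invented name.

```c
#include <stddef.h>
#include <stdint.h>

/* AND src into dest; return 1 if any result byte is non-zero, else 0. */
static uint32_t sketch_and_bytes(uint8_t *dest, const uint8_t *src, size_t len)
{
    uint32_t cc = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        uint8_t x = dest[i] & src[i];

        if (x) {
            cc = 1;
        }
        dest[i] = x;
    }
    return cc;
}
```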

> +        stb(dest + i, x);
> +    }
> +    return cc;
> +}
> +
> +/* xor on array */
> +uint32_t HELPER(xc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
> +{
> +    uint64_t dest = env->regs[b >> 4] + d1;
> +    uint64_t src = env->regs[b & 0xf] + d2;
> +    int i;
> +    unsigned char x;
> +    uint32_t cc = 0;
> +    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
> +    for (i = 0; i <= l; i++) {
> +        x = ldub(dest + i) ^ ldub(src + i);
> +        if (x) cc = 1;

coding style

> +        stb(dest + i, x);
> +    }
> +    return cc;
> +}
> +
> +/* or on array */
> +uint32_t HELPER(oc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
> +{
> +    uint64_t dest = env->regs[b >> 4] + d1;
> +    uint64_t src = env->regs[b & 0xf] + d2;
> +    int i;
> +    unsigned char x;
> +    uint32_t cc = 0;
> +    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
> +    for (i = 0; i <= l; i++) {
> +        x = ldub(dest + i) | ldub(src + i);
> +        if (x) cc = 1;

coding style

> +        stb(dest + i, x);
> +    }
> +    return cc;
> +}
> +
> +/* memcopy */
> +void HELPER(mvc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
> +{
> +    uint64_t dest = env->regs[b >> 4] + d1;
> +    uint64_t src = env->regs[b & 0xf] + d2;
> +    int i;
> +    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
> +    for (i = 0; i <= l; i++) {
> +        stb(dest + i, ldub(src + i));
> +    }
> +}
> +
> +/* compare unsigned byte arrays */
> +uint32_t HELPER(clc)(uint32_t l, uint32_t b, uint32_t d1, uint32_t d2)
> +{
> +    uint64_t s1 = env->regs[b >> 4] + d1;
> +    uint64_t s2 = env->regs[b & 0xf] + d2;
> +    int i;
> +    unsigned char x,y;
> +    uint32_t cc;
> +    HELPER_LOG("%s l %d b 0x%x d1 %d d2 %d\n", __FUNCTION__, l, b, d1, d2);
> +    for (i = 0; i <= l; i++) {
> +        x = ldub(s1 + i);
> +        y = ldub(s2 + i);
> +        HELPER_LOG("%02x (%c)/%02x (%c) ", x, x, y, y);
> +        if (x < y) {
> +            cc = 1;
> +            goto done;
> +        }
> +        else if (x > y) {

coding style

> +            cc = 2;
> +            goto done;
> +        }
> +    }
> +    cc = 0;
> +done:
> +    HELPER_LOG("\n");
> +    return cc;
> +}
> +
> +/* load multiple 64-bit registers from memory */
> +void HELPER(lmg)(uint32_t r1, uint32_t r3, uint32_t b2, int d2)
> +{
> +    uint64_t src = env->regs[b2] + d2;
> +    for (;;) {
> +        env->regs[r1] = ldq(src);
> +        src += 8;
> +        if (r1 == r3) break;

coding style

> +        r1 = (r1 + 1) & 15;
> +    }
> +}
> +
> +/* store multiple 64-bit registers to memory */
> +void HELPER(stmg)(uint32_t r1, uint32_t r3, uint32_t b2, int d2)
> +{
> +    uint64_t dest = env->regs[b2] + d2;
> +    HELPER_LOG("%s: r1 %d r3 %d\n", __FUNCTION__, r1, r3);
> +    for (;;) {
> +        HELPER_LOG("storing r%d in 0x%lx\n", r1, dest);
> +        stq(dest, env->regs[r1]);
> +        dest += 8;
> +        if (r1 == r3) break;

coding style

> +        r1 = (r1 + 1) & 15;
> +    }
> +}
> +
> +/* set condition code for signed 32-bit arithmetics */
> +uint32_t HELPER(set_cc_s32)(int32_t v)
> +{
> +    if (v < 0) return 1;
> +    else if (v > 0) return 2;

coding style

> +    else return 0;
> +}
> +
> +/* set condition code for signed 64-bit arithmetics */
> +uint32_t HELPER(set_cc_s64)(int64_t v)
> +{
> +    if (v < 0) return 1;
> +    else if (v > 0) return 2;

coding style

> +    else return 0;
> +}
> +
> +/* set condition code for signed 32-bit two's complement */
> +uint32_t HELPER(set_cc_comp_s32)(int32_t v)
> +{
> +    if ((uint32_t)v == 0x80000000UL) return 3;
> +    else if (v < 0) return 1;
> +    else if (v > 0) return 2;
> +    else return 0;

coding style

> +}
> +
> +/* set condition code for signed 64-bit two's complement */
> +uint32_t HELPER(set_cc_comp_s64)(int64_t v)
> +{
> +    if ((uint64_t)v == 0x8000000000000000ULL) return 3;
> +    else if (v < 0) return 1;
> +    else if (v > 0) return 2;
> +    else return 0;

coding style

> +}
> +
> +/* set negative/zero condition code for 32-bit logical op */
> +uint32_t HELPER(set_cc_nz_u32)(uint32_t v)
> +{
> +    if (v) return 1;
> +    else return 0;

coding style

> +}
> +
> +/* set negative/zero condition code for 64-bit logical op */
> +uint32_t HELPER(set_cc_nz_u64)(uint64_t v)
> +{
> +    if (v) return 1;
> +    else return 0;

coding style

> +}

The last 6 helpers can probably be coded easily in TCG (they are very
similar to some PowerPC code), though merging this patch with the
helpers is not a problem at all.
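
To illustrate why these are easy to inline: each of them reduces to one
or two comparisons against a constant, so the condition code can be
built from comparison results without a call. The C below only shows
that arithmetic; how it maps onto actual TCG ops is left open, and the
sketch_* names are invented.

```c
#include <stdint.h>

/* cc for a signed result: 0 if zero, 1 if negative, 2 if positive. */
static uint32_t sketch_cc_s64(int64_t v)
{
    return (uint32_t)(v < 0) + 2u * (uint32_t)(v > 0);
}

/* cc for a two's complement result: 3 for the most negative value. */
static uint32_t sketch_cc_comp_s64(int64_t v)
{
    if ((uint64_t)v == 0x8000000000000000ULL) {
        return 3;
    }
    return sketch_cc_s64(v);
}

/* cc for a logical result: 0 if zero, 1 otherwise. */
static uint32_t sketch_cc_nz_u64(uint64_t v)
{
    return v != 0;
}
```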

> +/* set condition code for insert character under mask insn */
> +uint32_t HELPER(set_cc_icm)(uint32_t mask, uint32_t val)
> +{
> +    HELPER_LOG("%s: mask 0x%x val %d\n", __FUNCTION__, mask, val);
> +    uint32_t cc;
> +    if (!val || !mask) cc = 0;
> +    else {
> +        while (mask != 1) {
> +            mask >>= 1;
> +            val >>= 8;
> +        }
> +        if (val & 0x80) cc = 1;
> +        else cc = 2;
> +    }
> +    return cc;

coding style

> +}
> +
> +/* relative conditional branch */
> +void HELPER(brc)(uint32_t cc, uint32_t mask, uint64_t pc, int32_t offset)
> +{
> +    if ( mask & ( 1 << (3 - cc) ) ) {
> +        env->psw.addr = pc + offset;
> +    }
> +    else {
> +        env->psw.addr = pc + 4;
> +    }
> +}
> +
> +/* branch relative on 64-bit count (condition is computed inline, this only
> +   does the branch */
> +void HELPER(brctg)(uint64_t flag, uint64_t pc, int32_t offset)
> +{
> +    if (flag) {
> +        env->psw.addr = pc + offset;
> +    }
> +    else {
> +        env->psw.addr = pc + 4;
> +    }
> +    HELPER_LOG("%s: pc 0x%lx flag %ld psw.addr 0x%lx\n", __FUNCTION__, pc, flag,
> +             env->psw.addr);
> +}
> +
> +/* branch relative on 32-bit count (condition is computed inline, this only
> +   does the branch */
> +void HELPER(brct)(uint32_t flag, uint64_t pc, int32_t offset)
> +{
> +    if (flag) {
> +        env->psw.addr = pc + offset;
> +    }
> +    else {
> +        env->psw.addr = pc + 4;
> +    }
> +    HELPER_LOG("%s: pc 0x%lx flag %d psw.addr 0x%lx\n", __FUNCTION__, pc, flag,
> +             env->psw.addr);
> +}
> +
> +/* relative conditional branch with long displacement */
> +void HELPER(brcl)(uint32_t cc, uint32_t mask, uint64_t pc, int64_t offset)
> +{
> +    if ( mask & ( 1 << (3 - cc) ) ) {
> +        env->psw.addr = pc + offset;
> +    }
> +    else {
> +        env->psw.addr = pc + 6;
> +    }
> +    HELPER_LOG("%s: pc 0x%lx psw.addr 0x%lx\n", __FUNCTION__, pc, env->psw.addr);
> +}
> +
> +/* conditional branch to register (register content is passed as target) */
> +void HELPER(bcr)(uint32_t cc, uint32_t mask, uint64_t target, uint64_t pc)
> +{
> +    if ( mask & ( 1 << (3 - cc) ) ) {
> +        env->psw.addr = target;
> +    }
> +    else {
> +        env->psw.addr = pc + 2;
> +    }
> +}
> +
> +/* conditional branch to address (address is passed as target) */
> +void HELPER(bc)(uint32_t cc, uint32_t mask, uint64_t target, uint64_t pc)
> +{
> +    if ( mask & ( 1 << (3 - cc) ) ) {
> +        env->psw.addr = target;
> +    }
> +    else {
> +        env->psw.addr = pc + 4;
> +    }
> +    HELPER_LOG("%s: pc 0x%lx psw.addr 0x%lx r2 0x%lx r5 0x%lx\n", __FUNCTION__,
> +             pc, env->psw.addr, env->regs[2], env->regs[5]);
> +}

The whole branch part would really benefit from being coded in TCG, as
that would allow TB chaining.
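
One thing that falls out of doing it in the translator, assuming the
mask is available there as a constant: masks 0 and 15 need no runtime
test at all, and only the remaining ones need a real conditional branch
on cc. A small sketch of that classification, using the same bit test
as the helpers above (enum and function names are invented):

```c
#include <stdint.h>

enum sketch_branch_kind {
    SKETCH_BRANCH_NEVER,
    SKETCH_BRANCH_ALWAYS,
    SKETCH_BRANCH_CONDITIONAL
};

/* Decide at translation time what kind of branch a mask describes. */
static enum sketch_branch_kind sketch_classify_mask(uint32_t mask)
{
    mask &= 0xf;
    if (mask == 0) {
        return SKETCH_BRANCH_NEVER;
    } else if (mask == 0xf) {
        return SKETCH_BRANCH_ALWAYS;
    }
    return SKETCH_BRANCH_CONDITIONAL;
}

/* Runtime test for the conditional case; cc must be in the range 0-3. */
static int sketch_branch_taken(uint32_t mask, uint32_t cc)
{
    return (mask >> (3 - cc)) & 1;
}
```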

> +/* 64-bit unsigned comparison */
> +uint32_t HELPER(cmp_u64)(uint64_t o1, uint64_t o2)
> +{
> +    if (o1 < o2) return 1;
> +    else if (o1 > o2) return 2;
> +    else return 0;

coding style

> +}
> +
> +/* 32-bit unsigned comparison */
> +uint32_t HELPER(cmp_u32)(uint32_t o1, uint32_t o2)
> +{
> +    HELPER_LOG("%s: o1 0x%x o2 0x%x\n", __FUNCTION__, o1, o2);
> +    if (o1 < o2) return 1;
> +    else if (o1 > o2) return 2;
> +    else return 0;
> +}
> +
> +/* 64-bit signed comparison */
> +uint32_t HELPER(cmp_s64)(int64_t o1, int64_t o2)
> +{
> +    HELPER_LOG("%s: o1 %ld o2 %ld\n", __FUNCTION__, o1, o2);
> +    if (o1 < o2) return 1;
> +    else if (o1 > o2) return 2;
> +    else return 0;

coding style

> +}
> +
> +/* 32-bit signed comparison */
> +uint32_t HELPER(cmp_s32)(int32_t o1, int32_t o2)
> +{
> +    if (o1 < o2) return 1;
> +    else if (o1 > o2) return 2;
> +    else return 0;

coding style

> +}

Same remarks as for previous comparisons, this can be done in TCG.

> +/* compare logical under mask */
> +uint32_t HELPER(clm)(uint32_t r1, uint32_t mask, uint64_t addr)
> +{
> +    uint8_t r,d;
> +    uint32_t cc;
> +    HELPER_LOG("%s: r1 0x%x mask 0x%x addr 0x%lx\n",__FUNCTION__,r1,mask,addr);
> +    cc = 0;
> +    while (mask) {
> +        if (mask & 8) {
> +            d = ldub(addr);
> +            r = (r1 & 0xff000000UL) >> 24;
> +            HELPER_LOG("mask 0x%x %02x/%02x (0x%lx) ", mask, r, d, addr);
> +            if (r < d) {
> +                cc = 1;
> +                break;
> +            }
> +            else if (r > d) {

coding style

> +                cc = 2;
> +                break;
> +            }
> +            addr++;
> +        }
> +        mask = (mask << 1) & 0xf;
> +        r1 <<= 8;
> +    }
> +    HELPER_LOG("\n");
> +    return cc;
> +}
> +
> +/* store character under mask */
> +void HELPER(stcm)(uint32_t r1, uint32_t mask, uint64_t addr)
> +{
> +    uint8_t r;
> +    HELPER_LOG("%s: r1 0x%x mask 0x%x addr 0x%lx\n",__FUNCTION__,r1,mask,addr);
> +    while (mask) {
> +        if (mask & 8) {
> +            r = (r1 & 0xff000000UL) >> 24;
> +            stb(addr, r);
> +            HELPER_LOG("mask 0x%x %02x (0x%lx) ", mask, r, addr);
> +            addr++;
> +        }
> +        mask = (mask << 1) & 0xf;
> +        r1 <<= 8;
> +    }
> +    HELPER_LOG("\n");
> +}
> +
> +/* 64/64 -> 128 unsigned multiplication */
> +void HELPER(mlg)(uint32_t r1, uint64_t v2)
> +{
> +    __uint128_t res = (__uint128_t)env->regs[r1 + 1];
> +    res *= (__uint128_t)v2;
> +    env->regs[r1] = (uint64_t)(res >> 64);
> +    env->regs[r1 + 1] = (uint64_t)res;
> +}

__uint128_t is probably not supported on all hosts/GCC versions.
mulu64() should be used instead.
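
For reference, a minimal sketch of how a 64x64 -> 128 unsigned multiply can
be done with 64-bit arithmetic only. The function name is hypothetical, and
the low/high output pointer pair is my assumption about what a
mulu64()-style routine looks like, so please check the real prototype.

```c
#include <stdint.h>

/* Hypothetical stand-in for a mulu64()-style routine: computes the
   128-bit product a * b and stores its low and high 64-bit halves. */
static void mul_u64_to_u128(uint64_t *plow, uint64_t *phigh,
                            uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four 32x32 -> 64 partial products. */
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Terms contributing to bits 32..63; the sum fits in 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *plow = (mid << 32) | (uint32_t)p0;
    *phigh = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

With such a routine the mlg helper would store the high half in regs[r1] and
the low half in regs[r1 + 1] without needing a 128-bit type.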


> +/* 128 -> 64/64 unsigned division */
> +void HELPER(dlg)(uint32_t r1, uint64_t v2)
> +{
> +    __uint128_t dividend = (((__uint128_t)env->regs[r1]) << 64) | 
> +                           (env->regs[r1+1]);
> +    uint64_t divisor = v2;
> +    __uint128_t quotient = dividend / divisor;
> +    env->regs[r1+1] = quotient;
> +    __uint128_t remainder = dividend % divisor;
> +    env->regs[r1] = remainder;
> +    HELPER_LOG("%s: dividend 0x%016lx%016lx divisor 0x%lx quotient 0x%lx rem 0x%lx\n",
> +               __FUNCTION__, (uint64_t)(dividend >> 64), (uint64_t)dividend, divisor, (uint64_t)quotient,
> +               (uint64_t)remainder);
> +}

Same here, __uint128_t should not be used, though I don't know what
should be used instead.
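
One possible direction, as a sketch only: a bit-by-bit shift-and-subtract
division of a 128-bit dividend (given as two 64-bit halves) by a 64-bit
divisor. The function name and the -1 error return are assumptions of this
example. It is slow, but portable.

```c
#include <stdint.h>

/* Hypothetical routine: divides the 128-bit value hi:lo by divisor.
   Returns 0 on success, or -1 if the divisor is zero or the quotient
   would not fit in 64 bits (hi >= divisor). */
static int div_u128_by_u64(uint64_t hi, uint64_t lo, uint64_t divisor,
                           uint64_t *quotient, uint64_t *remainder)
{
    uint64_t q = 0;
    uint64_t r = hi;   /* running remainder, always < divisor */
    int i;

    if (divisor == 0 || hi >= divisor) {
        return -1;
    }
    for (i = 0; i < 64; i++) {
        /* Shift the next dividend bit into the remainder; remember
           the bit that falls off the top. */
        uint64_t carry = r >> 63;
        r = (r << 1) | (lo >> 63);
        lo <<= 1;
        q <<= 1;
        if (carry || r >= divisor) {
            r -= divisor;   /* wraps correctly when carry is set */
            q |= 1;
        }
    }
    *quotient = q;
    *remainder = r;
    return 0;
}
```

The caller would still have to decide what to do in the error case, which
the quoted dlg helper does not handle either.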

> +/* set condition code for 64-bit signed addition */
> +uint32_t HELPER(set_cc_add64)(int64_t a1, int64_t a2, int64_t ar)
> +{
> +    if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) {
> +        return 3; /* overflow */
> +    }
> +    else {
> +        if (ar < 0) return 1;
> +        else if (ar > 0) return 2;
> +        else return 0;
> +    }

coding style

> +}
> +
> +/* set condition code for 64-bit unsigned addition */
> +uint32_t HELPER(set_cc_addu64)(uint64_t a1, uint64_t a2, uint64_t ar)
> +{
> +    if (ar == 0) {
> +        if (a1) return 2;
> +        else return 0;
> +    }
> +    else {
> +        if (ar < a1 || ar < a2) {
> +          return 3;
> +        }
> +        else {
> +          return 1;
> +        }
> +    }

coding style

> +}
> +
> +/* set condition code for 32-bit signed addition */
> +uint32_t HELPER(set_cc_add32)(int32_t a1, int32_t a2, int32_t ar)
> +{
> +    if ((a1 > 0 && a2 > 0 && ar < 0) || (a1 < 0 && a2 < 0 && ar > 0)) {
> +        return 3; /* overflow */
> +    }
> +    else {
coding style
> +        if (ar < 0) return 1;
> +        else if (ar > 0) return 2;
> +        else return 0;
> +    }
> +}
> +
> +/* set condition code for 32-bit unsigned addition */
> +uint32_t HELPER(set_cc_addu32)(uint32_t a1, uint32_t a2, uint32_t ar)
> +{
> +    if (ar == 0) {
> +        if (a1) return 2;
> +        else return 0;
> +    }
> +    else {
coding style
> +        if (ar < a1 || ar < a2) {
> +          return 3;
> +        }
> +        else {
> +          return 1;
> +        }
> +    }
> +}
> +
> +/* set condition code for 64-bit signed subtraction */
> +uint32_t HELPER(set_cc_sub64)(int64_t s1, int64_t s2, int64_t sr)
> +{
> +    if ((s1 > 0 && s2 < 0 && sr < 0) || (s1 < 0 && s2 > 0 && sr > 0)) {
> +        return 3; /* overflow */
> +    }
> +    else {
coding style
> +        if (sr < 0) return 1;
> +        else if (sr > 0) return 2;
> +        else return 0;
> +    }
> +}
> +
> +/* set condition code for 32-bit signed subtraction */
> +uint32_t HELPER(set_cc_sub32)(int32_t s1, int32_t s2, int32_t sr)
> +{
> +    if ((s1 > 0 && s2 < 0 && sr < 0) || (s1 < 0 && s2 > 0 && sr > 0)) {
> +        return 3; /* overflow */
> +    }
> +    else {
> +        if (sr < 0) return 1;
> +        else if (sr > 0) return 2;
> +        else return 0;
> +    }
> +}
> +
> +/* set condition code for 32-bit unsigned subtraction */
> +uint32_t HELPER(set_cc_subu32)(uint32_t s1, uint32_t s2, uint32_t sr)
> +{
> +    if (sr == 0) return 2;
> +    else {
coding style
> +        if (s2 > s1) return 1;
> +        else return 3;
> +    }
> +}
> +
> +/* set condition code for 64-bit unsigned subtraction */
> +uint32_t HELPER(set_cc_subu64)(uint64_t s1, uint64_t s2, uint64_t sr)
> +{
> +    if (sr == 0) return 2;
> +    else {
coding style
> +        if (s2 > s1) return 1;
> +        else return 3;
> +    }
> +}
> +
> +/* search string (c is byte to search, r2 is string, r1 end of string) */
> +uint32_t HELPER(srst)(uint32_t c, uint32_t r1, uint32_t r2)
> +{
> +    HELPER_LOG("%s: c %d *r1 0x%lx *r2 0x%lx\n", __FUNCTION__, c, env->regs[r1],
> +             env->regs[r2]);
> +    uint64_t i;
> +    uint32_t cc;
> +    for (i = env->regs[r2]; i != env->regs[r1]; i++) {
> +        if (ldub(i) == c) {
> +            env->regs[r1] = i;
> +            cc = 1;
> +            return cc;
> +        }
> +    }
> +    cc = 2;
> +    return cc;
> +}
> +
> +/* unsigned string compare (c is string terminator) */
> +uint32_t HELPER(clst)(uint32_t c, uint32_t r1, uint32_t r2)
> +{
> +    uint64_t s1 = env->regs[r1];
> +    uint64_t s2 = env->regs[r2];
> +    uint8_t v1, v2;
> +    uint32_t cc;
> +    c = c & 0xff;
> +#ifdef CONFIG_USER_ONLY
> +    if (!c) {
> +        HELPER_LOG("%s: comparing '%s' and '%s'\n",
> +                   __FUNCTION__, (char*)s1, (char*)s2);
> +    }
> +#endif

Why CONFIG_USER_ONLY?

> +    for (;;) {
> +        v1 = ldub(s1);
> +        v2 = ldub(s2);
> +        if (v1 == c || v2 == c) break;
> +        if (v1 != v2) break;
> +        s1++; s2++;
> +    }
coding style
> +    
> +    if (v1 == v2) cc = 0;
> +    else {
> +        if (v1 < v2) cc = 1;
> +        else cc = 2;
> +        env->regs[r1] = s1;
> +        env->regs[r2] = s2;
> +    }
> +    return cc;
> +}
> +
> +/* string copy (c is string terminator) */
> +uint32_t HELPER(mvst)(uint32_t c, uint32_t r1, uint32_t r2)
> +{
> +    uint64_t dest = env->regs[r1];
> +    uint64_t src = env->regs[r2];
> +    uint8_t v;
> +    c = c & 0xff;
> +#ifdef CONFIG_USER_ONLY
> +    if (!c) {
> +        HELPER_LOG("%s: copying '%s' to 0x%lx\n", __FUNCTION__, (char*)src, dest);
> +    }
> +#endif

Same.

> +    for (;;) {
> +        v = ldub(src);
> +        stb(dest, v);
> +        if (v == c) break;
coding style
> +        src++; dest++;
> +    }
> +    env->regs[r1] = dest;
> +    return 1;
> +}
> +
> +/* compare and swap 64-bit */
> +uint32_t HELPER(csg)(uint32_t r1, uint64_t a2, uint32_t r3)
> +{
> +    /* FIXME: locking? */
> +    uint32_t cc;
> +    uint64_t v2 = ldq(a2);
> +    if (env->regs[r1] == v2) {
> +        cc = 0;
> +        stq(a2, env->regs[r3]);
> +    }
> +    else {
> +        cc = 1;
> +        env->regs[r1] = v2;
> +    }
> +    return cc;
coding style
> +}
> +
> +/* compare double and swap 64-bit */
> +uint32_t HELPER(cdsg)(uint32_t r1, uint64_t a2, uint32_t r3)
> +{
> +    /* FIXME: locking? */
> +    uint32_t cc;
> +    __uint128_t v2 = (((__uint128_t)ldq(a2)) << 64) | (__uint128_t)ldq(a2 + 8);
> +    __uint128_t v1 = (((__uint128_t)env->regs[r1]) << 64) | (__uint128_t)env->regs[r1 + 1];
> +    if (v1 == v2) {
> +        cc = 0;
> +        stq(a2, env->regs[r3]);
> +        stq(a2 + 8, env->regs[r3 + 1]);
> +    }
> +    else {
coding style
> +        cc = 1;
> +        env->regs[r1] = v2 >> 64;
> +        env->regs[r1 + 1] = v2 & 0xffffffffffffffffULL;
> +    }
> +    return cc;
> +}
> +
> +/* compare and swap 32-bit */
> +uint32_t HELPER(cs)(uint32_t r1, uint64_t a2, uint32_t r3)
> +{
> +    /* FIXME: locking? */
> +    uint32_t cc;
> +    HELPER_LOG("%s: r1 %d a2 0x%lx r3 %d\n", __FUNCTION__, r1, a2, r3);
> +    uint32_t v2 = ldl(a2);
> +    if (((uint32_t)env->regs[r1]) == v2) {
> +        cc = 0;
> +        stl(a2, (uint32_t)env->regs[r3]);
> +    }
coding style
> +    else {
> +        cc = 1;
> +        env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | v2;
> +    }
> +    return cc;
> +}
> +
> +/* execute instruction
> +   this instruction executes an insn modified with the contents of r1
> +   it does not change the executed instruction in memory
> +   it does not change the program counter
> +   in other words: tricky...
> +   currently implemented by interpreting the cases it is most commonly used in
> + */
> +uint32_t HELPER(ex)(uint32_t cc, uint64_t v1, uint64_t addr, uint64_t ret)
> +{
> +    uint16_t insn = lduw(addr);
> +    HELPER_LOG("%s: v1 0x%lx addr 0x%lx insn 0x%x\n", __FUNCTION__, v1, addr,
> +             insn);
> +    if ((insn & 0xf0ff) == 0xd000) {
> +        uint32_t l, insn2, b, d1, d2;
> +        l = v1 & 0xff;
> +        insn2 = ldl(addr + 2);
> +        b = (((insn2 >> 28) & 0xf) << 4) | ((insn2 >> 12) & 0xf);
> +        d1 = (insn2 >> 16) & 0xfff;
> +        d2 = insn2 & 0xfff;
> +        switch (insn & 0xf00) {
> +        case 0x200: helper_mvc(l, b, d1, d2); return cc; break;
> +        case 0x500: return helper_clc(l, b, d1, d2); break;
> +        case 0x700: return helper_xc(l, b, d1, d2); break;
> +        default: helper_exception(23); break;
> +        }
> +    }
> +    else if ((insn & 0xff00) == 0x0a00) {	/* supervisor call */
> +        HELPER_LOG("%s: svc %ld via execute\n", __FUNCTION__, (insn|v1) & 0xff);
> +        env->psw.addr = ret;
> +        helper_exception(EXCP_EXECUTE_SVC + ((insn | v1) & 0xff));
> +    }
> +    else {
> +        helper_exception(23);
> +    }
> +    return cc;
> +}

This looks a bit ugly, but I currently have nothing better to offer.

> +/* set condition code for test under mask */
> +uint32_t HELPER(tm)(uint32_t val, uint32_t mask)
> +{
> +    HELPER_LOG("%s: val 0x%x mask 0x%x\n", __FUNCTION__, val, mask);
> +    uint16_t r = val & mask;
> +    if (r == 0) return 0;
> +    else if (r == mask) return 3;
> +    else return 1;
coding style
> +}
> +
> +/* set condition code for test under mask */
> +uint32_t HELPER(tmxx)(uint64_t val, uint32_t mask)
> +{
> +    uint16_t r = val & mask;
> +    HELPER_LOG("%s: val 0x%lx mask 0x%x r 0x%x\n", __FUNCTION__, val, mask, r);
> +    if (r == 0) return 0;
> +    else if (r == mask) return 3;
> +    else {
> +        while (!(mask & 0x8000)) {
> +            mask <<= 1;
> +            val <<= 1;
> +        }
> +        if (val & 0x8000) return 2;
> +        else return 1;
> +    }
coding style
> +}
> +
> +/* absolute value 32-bit */
> +uint32_t HELPER(abs_i32)(uint32_t reg, int32_t val)
> +{
> +    uint32_t cc;
> +    if (val == 0x80000000UL) cc = 3;
> +    else if (val) cc = 1;
> +    else cc = 0;
> +
> +    if (val < 0) {
> +        env->regs[reg] = -val;
> +    }
> +    else {
> +        env->regs[reg] = val;
> +    }
> +    return cc;
coding style
> +}
> +
> +/* negative absolute value 32-bit */
> +uint32_t HELPER(nabs_i32)(uint32_t reg, int32_t val)
> +{
> +    uint32_t cc;
> +    if (val) cc = 1;
> +    else cc = 0;
> +    
> +    if (val < 0) {
> +        env->regs[reg] = (env->regs[reg] & 0xffffffff00000000ULL) | val;
> +    }
> +    else {
> +        env->regs[reg] = (env->regs[reg] & 0xffffffff00000000ULL) | ((uint32_t)-val);
> +    }
> +    return cc;
coding style
> +}
> +
> +/* absolute value 64-bit */
> +uint32_t HELPER(abs_i64)(uint32_t reg, int64_t val)
> +{
> +    uint32_t cc;
> +    if (val == 0x8000000000000000ULL) cc = 3;
> +    else if (val) cc = 1;
> +    else cc = 0;
> +    
> +    if (val < 0) {
> +        env->regs[reg] = -val;
> +    }
> +    else {
> +        env->regs[reg] = val;
> +    }
> +    return cc;
> +}
> +
> +/* negative absolute value 64-bit */
> +uint32_t HELPER(nabs_i64)(uint32_t reg, int64_t val)
> +{
> +    uint32_t cc;
> +    if (val) cc = 1;
> +    else cc = 0;
> +
> +    if (val < 0) {
> +        env->regs[reg] = val;
> +    }
> +    else {
> +        env->regs[reg] = -val;
> +    }
> +    return cc;
coding style
> +}
> +
> +/* add with carry 32-bit unsigned */
> +uint32_t HELPER(addc_u32)(uint32_t cc, uint32_t r1, uint32_t v2)
> +{
> +    uint32_t res;
> +    uint32_t v1 = env->regs[r1] & 0xffffffffUL;
> +    res = v1 + v2;
> +    if (cc & 2) res++;
> +
> +    if (res == 0) {
> +        if (v1) cc = 2;
> +        else cc = 0;
> +    }
> +    else {
> +        if (res < v1 || res < v2)
> +          cc = 3;
> +        else
> +          cc = 1;
> +    }
> +    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | res;
> +    return cc;
coding style
> +}
> +
> +/* CC for add with carry 64-bit unsigned (isn't this a duplicate of some other CC function?) */
> +uint32_t HELPER(set_cc_addc_u64)(uint64_t v1, uint64_t v2, uint64_t res)
> +{
> +    uint32_t cc;
> +    if (res == 0) {
> +        if (v1) cc = 2;
> +        else cc = 0;
> +    }
> +    else {
> +        if (res < v1 || res < v2) {
> +          cc = 3;
> +        }
> +        else {
> +          cc = 1;
> +        }
> +    }
> +    return cc;
> +}
> +
> +/* store character under mask high
> +   operates on the upper half of r1 */
> +uint32_t HELPER(stcmh)(uint32_t r1, uint64_t address, uint32_t mask)
> +{
> +    int pos = 56; /* top of the upper half of r1 */
> +    
> +    while (mask) {
> +        if (mask & 8) {
> +            stb(address, (env->regs[r1] >> pos) & 0xff);
> +            address++;
> +        }
> +        mask = (mask << 1) & 0xf;
> +        pos -= 8;
> +    }
> +    return 0;
> +}
> +
> +/* insert character under mask high
> +   same as icm, but operates on the upper half of r1 */
> +uint32_t HELPER(icmh)(uint32_t r1, uint64_t address, uint32_t mask)
> +{
> +    int pos = 56; /* top of the upper half of r1 */
> +    uint64_t rmask = 0xff00000000000000ULL;
> +    uint8_t val = 0;
> +    int ccd = 0;
> +    uint32_t cc;
> +    
> +    cc = 0;
> +    
> +    while (mask) {
> +        if (mask & 8) {
> +            env->regs[r1] &= ~rmask;
> +            val = ldub(address);
> +            if ((val & 0x80) && !ccd) cc = 1;
> +            ccd = 1;
> +            if (val && cc == 0) cc = 2;
> +            env->regs[r1] |= (uint64_t)val << pos;
> +            address++;
> +        }
> +        mask = (mask << 1) & 0xf;
> +        pos -= 8;
> +        rmask >>= 8;
> +    }
> +    return cc;
coding style
> +}
> +
> +/* insert psw mask and condition code into r1 */
> +void HELPER(ipm)(uint32_t cc, uint32_t r1)
> +{
> +    uint64_t r = env->regs[r1];
> +    r &= 0xffffffff00ffffffULL;
> +    r |= (cc << 28) | ( (env->psw.mask >> 40) & 0xf );
> +    env->regs[r1] = r;
> +    HELPER_LOG("%s: cc %d psw.mask 0x%lx r1 0x%lx\n", __FUNCTION__, cc, env->psw.mask, r);
> +}
> +
> +/* store access registers r1 to r3 in memory at a2 */
> +void HELPER(stam)(uint32_t r1, uint64_t a2, uint32_t r3)
> +{
> +    int i;
> +    for (i = r1; i != ((r3 + 1) & 15); i = (i + 1) & 15) {
> +        stl(a2, env->aregs[i]);
> +        a2 += 4;
> +    }
> +}
> +
> +/* move long extended
> +   another memcopy insn with more bells and whistles */
> +uint32_t HELPER(mvcle)(uint32_t r1, uint64_t a2, uint32_t r3)
> +{
> +    uint64_t destlen = env->regs[r1 + 1];
> +    uint64_t dest = env->regs[r1];
> +    uint64_t srclen = env->regs[r3 + 1];
> +    uint64_t src = env->regs[r3];
> +    uint8_t pad = a2 & 0xff;
> +    uint8_t v;
> +    uint32_t cc;
> +    if (destlen == srclen) cc = 0;
> +    else if (destlen < srclen) cc = 1;
> +    else cc = 2;
> +    if (srclen > destlen) srclen = destlen;
coding style
> +    for(;destlen && srclen;src++,dest++,destlen--,srclen--) {
> +        v = ldub(src);
> +        stb(dest, v);
> +    }
> +    for(;destlen;dest++,destlen--) {
> +        stb(dest, pad);
> +    }
> +    env->regs[r1 + 1] = destlen;
> +    env->regs[r3 + 1] -= src - env->regs[r3]; /* can't use srclen here,
> +                                                 we trunc'ed it */
> +    env->regs[r1] = dest;
> +    env->regs[r3] = src;
> +    
> +    return cc;
> +}
> +
> +/* compare logical long extended
> +   memcompare insn with padding */
> +uint32_t HELPER(clcle)(uint32_t r1, uint64_t a2, uint32_t r3)
> +{
> +    uint64_t destlen = env->regs[r1 + 1];
> +    uint64_t dest = env->regs[r1];
> +    uint64_t srclen = env->regs[r3 + 1];
> +    uint64_t src = env->regs[r3];
> +    uint8_t pad = a2 & 0xff;
> +    uint8_t v1 = 0,v2 = 0;
> +    uint32_t cc = 0;
> +    if (!(destlen || srclen)) return cc;
> +    if (srclen > destlen) srclen = destlen;
> +    for(;destlen || srclen;src++,dest++,destlen--,srclen--) {
> +        if (srclen) v1 = ldub(src);
> +        else v1 = pad;
> +        if (destlen) v2 = ldub(dest);
> +        else v2 = pad;
> +        if (v1 != v2) break;
> +    }
> +
> +    env->regs[r1 + 1] = destlen;
> +    env->regs[r3 + 1] -= src - env->regs[r3]; /* can't use srclen here,
> +                                                 we trunc'ed it */
> +    env->regs[r1] = dest;
> +    env->regs[r3] = src;
> +    
> +    if (v1 < v2) cc = 1;
> +    else if (v1 > v2) cc = 2;
> +    
> +    return cc;
coding style
> +}
> +
> +/* subtract unsigned v2 from v1 with borrow */
> +uint32_t HELPER(slb)(uint32_t cc, uint32_t r1, uint32_t v1, uint32_t v2)
> +{
> +    uint32_t res = v1 + (~v2) + (cc >> 1);
> +    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | res;
> +    if (cc & 2) { /* borrow */
> +        if (v1) return 1;
> +        else return 0;
> +    }
> +    else {
> +        if (v1) return 3;
> +        else return 2;
> +    }
coding style
> +}
> +
> +/* subtract unsigned v2 from v1 with borrow */
> +uint32_t HELPER(slbg)(uint32_t cc, uint32_t r1, uint64_t v1, uint64_t v2)
> +{
> +    uint64_t res = v1 + (~v2) + (cc >> 1);
> +    env->regs[r1] = res;
> +    if (cc & 2) { /* borrow */
> +        if (v1) return 1;
> +        else return 0;
> +    }
> +    else {
> +        if (v1) return 3;
> +        else return 2;
> +    }
coding style
> +}
> +
> +/* union used for splitting/joining 128-bit floats to/from 64-bit FP regs */
> +typedef union {
> +    struct {
> +#ifdef WORDS_BIGENDIAN
> +        uint64_t h;
> +        uint64_t l;
> +#else
> +        uint64_t l;
> +        uint64_t h;
> +#endif
> +    };
> +    float128 x;
> +} FP128;

WORDS_BIGENDIAN is wrong here. CPU_QuadU can probably be used instead.
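
The underlying idea, sketched with a made-up type: if the two halves are
reached by member name, the code that splits and joins the value does not
depend on the host byte order and needs no #ifdef of its own. quad_bits,
quad_join and quad_split are hypothetical names, not existing QEMU
definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit float type whose halves are
   reachable by name; the member order in memory does not matter. */
typedef struct {
    uint64_t high;
    uint64_t low;
} quad_bits;

/* Join two 64-bit register images into one 128-bit value. */
static quad_bits quad_join(uint64_t h, uint64_t l)
{
    quad_bits q;

    q.high = h;
    q.low = l;
    return q;
}

/* Split a 128-bit value back into its two register images. */
static void quad_split(quad_bits q, uint64_t *h, uint64_t *l)
{
    *h = q.high;
    *l = q.low;
}
```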

> +/* condition codes for binary FP ops */
> +static uint32_t set_cc_f32(float32 v1, float32 v2)
> +{
> +    if (float32_is_nan(v1) || float32_is_nan(v2)) return 3;
> +    else if (float32_eq(v1, v2, &env->fpu_status)) return 0;
> +    else if (float32_lt(v1, v2, &env->fpu_status)) return 1;
> +    else return 2;
coding style
> +}
> +
> +static uint32_t set_cc_f64(float64 v1, float64 v2)
> +{
> +    if (float64_is_nan(v1) || float64_is_nan(v2)) return 3;
> +    else if (float64_eq(v1, v2, &env->fpu_status)) return 0;
> +    else if (float64_lt(v1, v2, &env->fpu_status)) return 1;
> +    else return 2;
coding style
> +}
> +
> +/* condition codes for unary FP ops */
> +static uint32_t set_cc_nz_f32(float32 v)
> +{
> +    if (float32_is_nan(v)) return 3;
> +    else if (float32_is_zero(v)) return 0;
> +    else if (float32_is_neg(v)) return 1;
> +    else return 2;
coding style
> +}
> +
> +static uint32_t set_cc_nz_f64(float64 v)
> +{
> +    if (float64_is_nan(v)) return 3;
> +    else if (float64_is_zero(v)) return 0;
> +    else if (float64_is_neg(v)) return 1;
> +    else return 2;
coding style
> +}
> +
> +static uint32_t set_cc_nz_f128(float128 v)
> +{
> +    if (float128_is_nan(v)) return 3;
> +    else if (float128_is_zero(v)) return 0;
> +    else if (float128_is_neg(v)) return 1;
> +    else return 2;
coding style
> +}
> +
> +/* convert 32-bit int to 64-bit float */
> +void HELPER(cdfbr)(uint32_t f1, int32_t v2)
> +{
> +    HELPER_LOG("%s: converting %d to f%d\n", __FUNCTION__, v2, f1);
> +    env->fregs[f1].d = int32_to_float64(v2, &env->fpu_status);
> +}
> +
> +/* convert 32-bit int to 128-bit float */
> +void HELPER(cxfbr)(uint32_t f1, int32_t v2)
> +{
> +    FP128 v1;
> +    v1.x = int32_to_float128(v2, &env->fpu_status);
> +    env->fregs[f1].i = v1.h;
> +    env->fregs[f1 + 2].i = v1.l;
> +}
> +
> +/* convert 64-bit int to 32-bit float */
> +void HELPER(cegbr)(uint32_t f1, int64_t v2)
> +{
> +    HELPER_LOG("%s: converting %ld to f%d\n", __FUNCTION__, v2, f1);
> +    env->fregs[f1].e = int64_to_float32(v2, &env->fpu_status);
> +}
> +
> +/* convert 64-bit int to 64-bit float */
> +void HELPER(cdgbr)(uint32_t f1, int64_t v2)
> +{
> +    HELPER_LOG("%s: converting %ld to f%d\n", __FUNCTION__, v2, f1);
> +    env->fregs[f1].d = int64_to_float64(v2, &env->fpu_status);
> +}
> +
> +/* convert 64-bit int to 128-bit float */
> +void HELPER(cxgbr)(uint32_t f1, int64_t v2)
> +{
> +    FP128 x1;
> +    x1.x = int64_to_float128(v2, &env->fpu_status);
> +    HELPER_LOG("%s: converted %ld to 0x%lx and 0x%lx\n", __FUNCTION__, v2, x1.h, x1.l);
> +    env->fregs[f1].i = x1.h;
> +    env->fregs[f1 + 2].i = x1.l;
> +}
> +
> +/* convert 32-bit int to 32-bit float */
> +void HELPER(cefbr)(uint32_t f1, int32_t v2)
> +{
> +    env->fregs[f1].e = int32_to_float32(v2, &env->fpu_status);
> +    HELPER_LOG("%s: converting %d to 0x%d in f%d\n", __FUNCTION__, v2, env->fregs[f1].e, f1);
> +}
> +
> +/* 32-bit FP addition RR */
> +uint32_t HELPER(aebr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].e = float32_add(env->fregs[f1].e, env->fregs[f2].e, &env->fpu_status);
> +    HELPER_LOG("%s: adding 0x%d resulting in 0x%d in f%d\n", __FUNCTION__, env->fregs[f2].e, env->fregs[f1].e, f1);
> +    return set_cc_nz_f32(env->fregs[f1].e);
> +}
> +
> +/* 64-bit FP addition RR */
> +uint32_t HELPER(adbr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].d = float64_add(env->fregs[f1].d, env->fregs[f2].d, &env->fpu_status);
> +    HELPER_LOG("%s: adding 0x%ld resulting in 0x%ld in f%d\n", __FUNCTION__, env->fregs[f2].d, env->fregs[f1].d, f1);
> +    return set_cc_nz_f64(env->fregs[f1].d);
> +}
> +
> +/* 32-bit FP subtraction RR */
> +uint32_t HELPER(sebr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].e = float32_sub(env->fregs[f1].e, env->fregs[f2].e, &env->fpu_status);
> +    HELPER_LOG("%s: adding 0x%d resulting in 0x%d in f%d\n", __FUNCTION__, env->fregs[f2].e, env->fregs[f1].e, f1);
> +    return set_cc_nz_f32(env->fregs[f1].e);
> +}
> +
> +/* 64-bit FP subtraction RR */
> +uint32_t HELPER(sdbr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].d = float64_sub(env->fregs[f1].d, env->fregs[f2].d, &env->fpu_status);
> +    HELPER_LOG("%s: subtracting 0x%ld resulting in 0x%ld in f%d\n", __FUNCTION__, env->fregs[f2].d, env->fregs[f1].d, f1);
> +    return set_cc_nz_f64(env->fregs[f1].d);
> +}
> +
> +/* 32-bit FP division RR */
> +void HELPER(debr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].e = float32_div(env->fregs[f1].e, env->fregs[f2].e, &env->fpu_status);
> +}
> +
> +/* 128-bit FP division RR */
> +void HELPER(dxbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 v1;
> +    v1.h = env->fregs[f1].i;
> +    v1.l = env->fregs[f1 + 2].i;
> +    FP128 v2;
> +    v2.h = env->fregs[f2].i;
> +    v2.l = env->fregs[f2 + 2].i;
> +    FP128 res;
> +    res.x = float128_div(v1.x, v2.x, &env->fpu_status);
> +    env->fregs[f1].i = res.h;
> +    env->fregs[f1 + 2].i = res.l;
> +}
> +
> +/* 64-bit FP multiplication RR */
> +void HELPER(mdbr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].d = float64_mul(env->fregs[f1].d, env->fregs[f2].d, &env->fpu_status);
> +}
> +
> +/* 128-bit FP multiplication RR */
> +void HELPER(mxbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 v1;
> +    v1.h = env->fregs[f1].i;
> +    v1.l = env->fregs[f1 + 2].i;
> +    FP128 v2;
> +    v2.h = env->fregs[f2].i;
> +    v2.l = env->fregs[f2 + 2].i;
> +    FP128 res;
> +    res.x = float128_mul(v1.x, v2.x, &env->fpu_status);
> +    //HELPER_LOG("%s: 0x%ld * 0x%ld = 0x%ld\n", __FUNCTION__, v1.x, v2.x, res.x);
> +    env->fregs[f1].i = res.h;
> +    env->fregs[f1 + 2].i = res.l;
> +}
> +
> +/* convert 32-bit float to 64-bit float */
> +void HELPER(ldebr)(uint32_t r1, uint32_t r2)
> +{
> +    env->fregs[r1].d = float32_to_float64(env->fregs[r2].e, &env->fpu_status);
> +}
> +
> +/* convert 128-bit float to 64-bit float */
> +void HELPER(ldxbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 x2;
> +    x2.h = env->fregs[f2].i;
> +    x2.l = env->fregs[f2 + 2].i;
> +    //HELPER_LOG("%s: converted %llf ", __FUNCTION__, x2.x);
> +    env->fregs[f1].d = float128_to_float64(x2.x, &env->fpu_status);
> +    HELPER_LOG("%s: to 0x%ld\n", __FUNCTION__, env->fregs[f1].d);
> +}
> +
> +/* convert 64-bit float to 128-bit float */
> +void HELPER(lxdbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 res;
> +    res.x = float64_to_float128(env->fregs[f2].d, &env->fpu_status);
> +    env->fregs[f1].i = res.h;
> +    env->fregs[f1 + 2].i = res.l;
> +}
> +
> +/* convert 64-bit float to 32-bit float */
> +void HELPER(ledbr)(uint32_t f1, uint32_t f2)
> +{
> +    float64 d2 = env->fregs[f2].d;
> +    env->fregs[f1].e = float64_to_float32(d2, &env->fpu_status);
> +}
> +
> +/* convert 128-bit float to 32-bit float */
> +void HELPER(lexbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 x2;
> +    x2.h = env->fregs[f2].i;
> +    x2.l = env->fregs[f2 + 2].i;
> +    //HELPER_LOG("%s: converted %llf ", __FUNCTION__, x2.x);
> +    env->fregs[f1].e = float128_to_float32(x2.x, &env->fpu_status);
> +    HELPER_LOG("%s: to 0x%d\n", __FUNCTION__, env->fregs[f1].e);
> +}
> +
> +/* absolute value of 32-bit float */
> +uint32_t HELPER(lpebr)(uint32_t f1, uint32_t f2)
> +{
> +    float32 v1;
> +    float32 v2 = env->fregs[f2].d;
> +    if (float32_is_neg(v2)) {
> +        v1 = float32_abs(v2);
> +    }
> +    else {
> +        v1 = v2;
> +    }

I don't see the point of this test: float32_abs() could be called
unconditionally.
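
To spell this out, assuming the absolute value amounts to clearing the sign
bit of the IEEE 754 bit pattern (which I have not re-checked in softfloat):
the operation leaves non-negative inputs unchanged, so nothing is gained by
testing the sign first. f32_abs_bits is a hypothetical name used only here.

```c
#include <stdint.h>

/* Hypothetical helper working on the raw IEEE 754 single-precision
   bit pattern: clearing bit 31 yields the absolute value. */
static uint32_t f32_abs_bits(uint32_t bits)
{
    return bits & 0x7fffffffu;
}
```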

> +    env->fregs[f1].d = v1;
> +    return set_cc_nz_f32(v1);
> +}
> +
> +/* absolute value of 64-bit float */
> +uint32_t HELPER(lpdbr)(uint32_t f1, uint32_t f2)
> +{
> +    float64 v1;
> +    float64 v2 = env->fregs[f2].d;
> +    if (float64_is_neg(v2)) {
> +        v1 = float64_abs(v2);
> +    }
> +    else {
> +        v1 = v2;
> +    }

Same.

> +    env->fregs[f1].d = v1;
> +    return set_cc_nz_f64(v1);
> +}
> +
> +/* absolute value of 128-bit float */
> +uint32_t HELPER(lpxbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 v1;
> +    FP128 v2;
> +    v2.h = env->fregs[f2].i;
> +    v2.l = env->fregs[f2 + 2].i;
> +    if (float128_is_neg(v2.x)) {
> +        v1.x = float128_abs(v2.x);
> +    }
> +    else {
> +        v1 = v2;
> +    }

Same.

> +    env->fregs[f1].i = v1.h;
> +    env->fregs[f1 + 2].i = v1.l;
> +    return set_cc_nz_f128(v1.x);
> +}
> +
> +/* load and test 64-bit float */
> +uint32_t HELPER(ltdbr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].d = env->fregs[f2].d;
> +    return set_cc_nz_f64(env->fregs[f1].d);
> +}
> +
> +/* load and test 32-bit float */
> +uint32_t HELPER(ltebr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].e = env->fregs[f2].e;
> +    return set_cc_nz_f32(env->fregs[f1].e);
> +}
> +
> +/* load and test 128-bit float */
> +uint32_t HELPER(ltxbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 x;
> +    x.h = env->fregs[f2].i;
> +    x.l = env->fregs[f2 + 2].i;
> +    env->fregs[f1].i = x.h;
> +    env->fregs[f1 + 2].i = x.l;
> +    return set_cc_nz_f128(x.x);
> +}
> +
> +/* negative absolute of 32-bit float */
> +uint32_t HELPER(lcebr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].e = float32_sub(float32_zero, env->fregs[f2].e, &env->fpu_status);
> +    return set_cc_nz_f32(env->fregs[f1].e);
> +}
> +
> +/* negative absolute of 64-bit float */
> +uint32_t HELPER(lcdbr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].d = float64_sub(float64_zero, env->fregs[f2].d, &env->fpu_status);
> +    return set_cc_nz_f64(env->fregs[f1].d);
> +}
> +
> +/* convert 64-bit float to 128-bit float */
> +uint32_t HELPER(lcxbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 x1, x2;
> +    x2.h = env->fregs[f2].i;
> +    x2.l = env->fregs[f2 + 2].i;
> +    x1.x = float128_sub(float64_to_float128(float64_zero, &env->fpu_status), x2.x, &env->fpu_status);
> +    env->fregs[f1].i = x1.h;
> +    env->fregs[f1 + 2].i = x1.l;
> +    return set_cc_nz_f128(x1.x);
> +}
> +
> +/* 32-bit FP compare RM */
> +uint32_t HELPER(ceb)(uint32_t f1, uint64_t a2)
> +{
> +    float32 v1 = env->fregs[f1].e;
> +    union {
> +        float32 e;
> +        uint32_t i;
> +    } v2;

CPU_FloatU should be used instead.

> +    v2.i = ldl(a2);

The value should be passed in directly instead of being loaded here, as ldl
is wrong depending on the MMU mode.
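
A sketch of the shape I have in mind: the helper receives the already loaded
32-bit value and never touches guest memory itself. The name cc_cmp_f32_bits
is hypothetical, and the sketch uses host floats (assumed to be 32-bit
IEEE 754) purely to stay self-contained, whereas the patch uses softfloat.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: both operands arrive as raw 32-bit values,
   so no memory access (and no MMU mode) is involved here. */
static uint32_t cc_cmp_f32_bits(uint32_t bits1, uint32_t bits2)
{
    float v1, v2;

    memcpy(&v1, &bits1, sizeof(v1));
    memcpy(&v2, &bits2, sizeof(v2));

    /* NaN is the only value that compares unequal to itself. */
    if (v1 != v1 || v2 != v2) {
        return 3;
    } else if (v1 == v2) {
        return 0;
    } else if (v1 < v2) {
        return 1;
    } else {
        return 2;
    }
}
```

The load would then be emitted by the translator, which knows the current
MMU mode, and only the result handed to the helper.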

> +    HELPER_LOG("%s: comparing 0x%d from f%d and 0x%d\n", __FUNCTION__, v1, f1, v2.e);
> +    return set_cc_f32(v1, v2.e);
> +}
> +
> +/* 32-bit FP addition RM */
> +uint32_t HELPER(aeb)(uint32_t f1, uint64_t a2)
> +{
> +    float32 v1 = env->fregs[f1].e;
> +    union {
> +        float32 e;
> +        uint32_t i;
> +    } v2;

same

> +    v2.i = ldl(a2);
same

> +    HELPER_LOG("%s: adding 0x%d from f%d and 0x%d\n", __FUNCTION__, v1, f1, v2.e);
> +    env->fregs[f1].e = float32_add(v1, v2.e, &env->fpu_status);
> +    return set_cc_nz_f32(env->fregs[f1].e);
> +}
> +
> +/* 32-bit FP division RM */
> +void HELPER(deb)(uint32_t f1, uint64_t a2)
> +{
> +    float32 v1 = env->fregs[f1].e;
> +    union {
> +        float32 e;
> +        uint32_t i;
> +    } v2;
same

> +    v2.i = ldl(a2);
same

> +    HELPER_LOG("%s: dividing 0x%d from f%d by 0x%d\n", __FUNCTION__, v1, f1, v2.e);
> +    env->fregs[f1].e = float32_div(v1, v2.e, &env->fpu_status);
> +}
> +
> +/* 32-bit FP multiplication RM */
> +void HELPER(meeb)(uint32_t f1, uint64_t a2)
> +{
> +    float32 v1 = env->fregs[f1].e;
> +    union {
> +        float32 e;
> +        uint32_t i;
> +    } v2;
same

> +    v2.i = ldl(a2);
same

> +    HELPER_LOG("%s: multiplying 0x%d from f%d and 0x%d\n", __FUNCTION__, v1, f1, v2.e);
> +    env->fregs[f1].e = float32_mul(v1, v2.e, &env->fpu_status);
> +}
> +
> +/* 32-bit FP compare RR */
> +uint32_t HELPER(cebr)(uint32_t f1, uint32_t f2)
> +{
> +    float32 v1 = env->fregs[f1].e;
> +    float32 v2 = env->fregs[f2].e;;
> +    HELPER_LOG("%s: comparing 0x%d from f%d and 0x%d\n", __FUNCTION__, v1, f1, v2);
> +    return set_cc_f32(v1, v2);
> +}
> +
> +/* 64-bit FP compare RR */
> +uint32_t HELPER(cdbr)(uint32_t f1, uint32_t f2)
> +{
> +    float64 v1 = env->fregs[f1].d;
> +    float64 v2 = env->fregs[f2].d;;
> +    HELPER_LOG("%s: comparing 0x%ld from f%d and 0x%ld\n", __FUNCTION__, v1, f1, v2);
> +    return set_cc_f64(v1, v2);
> +}
> +
> +/* 128-bit FP compare RR */
> +uint32_t HELPER(cxbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 v1;
> +    v1.h = env->fregs[f1].i;
> +    v1.l = env->fregs[f1 + 2].i;
> +    FP128 v2;
> +    v2.h = env->fregs[f2].i;
> +    v2.l = env->fregs[f2 + 2].i;
> +    //HELPER_LOG("%s: comparing %llf from f%d and %llf\n", __FUNCTION__, v1.x, f1, v2.x);
> +    if (float128_is_nan(v1.x) || float128_is_nan(v2.x)) return 3;
> +    else if (float128_eq(v1.x, v2.x, &env->fpu_status)) return 0;
> +    else if (float128_lt(v1.x, v2.x, &env->fpu_status)) return 1;
> +    else return 2;
coding style
> +}
> +
> +/* 64-bit FP compare RM */
> +uint32_t HELPER(cdb)(uint32_t f1, uint64_t a2)
> +{
> +    float64 v1 = env->fregs[f1].d;
> +    union {
> +        float64 d;
> +        uint64_t i;
> +    } v2;
same

> +    v2.i = ldq(a2);
same

> +    HELPER_LOG("%s: comparing 0x%ld from f%d and 0x%lx\n", __FUNCTION__, v1, f1, v2.d);
> +    return set_cc_f64(v1, v2.d);
> +}
> +
> +/* 64-bit FP addition RM */
> +uint32_t HELPER(adb)(uint32_t f1, uint64_t a2)
> +{
> +    float64 v1 = env->fregs[f1].d;
> +    union {
> +        float64 d;
> +        uint64_t i;
> +    } v2;
same

> +    v2.i = ldq(a2);
same

> +    HELPER_LOG("%s: adding 0x%lx from f%d and 0x%lx\n", __FUNCTION__, v1, f1, v2.d);
> +    env->fregs[f1].d = v1 = float64_add(v1, v2.d, &env->fpu_status);
> +    return set_cc_nz_f64(v1);
> +}
> +
> +/* 32-bit FP subtraction RM */
> +uint32_t HELPER(seb)(uint32_t f1, uint64_t a2)
> +{
> +    float32 v1 = env->fregs[f1].e;
> +    union {
> +        float32 e;
> +        uint32_t i;
> +    } v2;
same

> +    v2.i = ldl(a2);
same

> +    env->fregs[f1].e = v1 = float32_sub(v1, v2.e, &env->fpu_status);
> +    return set_cc_nz_f32(v1);
> +}
> +
> +/* 64-bit FP subtraction RM */
> +uint32_t HELPER(sdb)(uint32_t f1, uint64_t a2)
> +{
> +    float64 v1 = env->fregs[f1].d;
> +    union {
> +        float64 d;
> +        uint64_t i;
> +    } v2;
same

> +    v2.i = ldq(a2);
same

> +    env->fregs[f1].d = v1 = float64_sub(v1, v2.d, &env->fpu_status);
> +    return set_cc_nz_f64(v1);
> +}
> +
> +/* 64-bit FP multiplication RM */
> +void HELPER(mdb)(uint32_t f1, uint64_t a2)
> +{
> +    float64 v1 = env->fregs[f1].d;
> +    union {
> +        float64 d;
> +        uint64_t i;
> +    } v2;
same

> +    v2.i = ldq(a2);
same

> +    HELPER_LOG("%s: multiplying 0x%lx from f%d and 0x%ld\n", __FUNCTION__, v1, f1, v2.d);
> +    env->fregs[f1].d = float64_mul(v1, v2.d, &env->fpu_status);
> +}
> +
> +/* 64-bit FP division RM */
> +void HELPER(ddb)(uint32_t f1, uint64_t a2)
> +{
> +    float64 v1 = env->fregs[f1].d;
> +    union {
> +        float64 d;
> +        uint64_t i;
> +    } v2;
same

> +    v2.i = ldq(a2);
same

> +    HELPER_LOG("%s: dividing 0x%lx from f%d by 0x%ld\n", __FUNCTION__, v1, f1, v2.d);
> +    env->fregs[f1].d = float64_div(v1, v2.d, &env->fpu_status);
> +}
> +
> +static void set_round_mode(int m3)
> +{
> +    switch (m3) {
> +    case 0: break; /* current mode */
> +    case 1: /* biased round no nearest */
> +    case 4: /* round to nearest */
> +        set_float_rounding_mode(float_round_nearest_even, &env->fpu_status);
> +        break;
> +    case 5: /* round to zero */
> +        set_float_rounding_mode(float_round_to_zero, &env->fpu_status);
> +        break;
> +    case 6: /* round to +inf */
> +        set_float_rounding_mode(float_round_up, &env->fpu_status);
> +        break;
> +    case 7: /* round to -inf */
> +        set_float_rounding_mode(float_round_down, &env->fpu_status);
> +        break;
> +    }
> +}
> +
> +/* convert 32-bit float to 64-bit int */
> +uint32_t HELPER(cgebr)(uint32_t r1, uint32_t f2, uint32_t m3)
> +{
> +    float32 v2 = env->fregs[f2].e;
> +    set_round_mode(m3);
> +    env->regs[r1] = float32_to_int64(v2, &env->fpu_status);
> +    return set_cc_nz_f32(v2);
> +}
> +
> +/* convert 64-bit float to 64-bit int */
> +uint32_t HELPER(cgdbr)(uint32_t r1, uint32_t f2, uint32_t m3)
> +{
> +    float64 v2 = env->fregs[f2].d;
> +    set_round_mode(m3);
> +    env->regs[r1] = float64_to_int64(v2, &env->fpu_status);
> +    return set_cc_nz_f64(v2);
> +}
> +
> +/* convert 128-bit float to 64-bit int */
> +uint32_t HELPER(cgxbr)(uint32_t r1, uint32_t f2, uint32_t m3)
> +{
> +    FP128 v2;
> +    v2.h = env->fregs[f2].i;
> +    v2.l = env->fregs[f2 + 2].i;
> +    set_round_mode(m3);
> +    env->regs[r1] = float128_to_int64(v2.x, &env->fpu_status);
> +    if (float128_is_nan(v2.x)) return 3;
> +    else if (float128_is_zero(v2.x)) return 0;
> +    else if (float128_is_neg(v2.x)) return 1;
> +    else return 2;
coding style
> +}
> +
> +/* convert 32-bit float to 32-bit int */
> +uint32_t HELPER(cfebr)(uint32_t r1, uint32_t f2, uint32_t m3)
> +{
> +    float32 v2 = env->fregs[f2].e;
> +    set_round_mode(m3);
> +    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | float32_to_int32(v2, &env->fpu_status);
> +    return set_cc_nz_f32(v2);
> +}
> +
> +/* convert 64-bit float to 32-bit int */
> +uint32_t HELPER(cfdbr)(uint32_t r1, uint32_t f2, uint32_t m3)
> +{
> +    float64 v2 = env->fregs[f2].d;
> +    set_round_mode(m3);
> +    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | float64_to_int32(v2, &env->fpu_status);
> +    return set_cc_nz_f64(v2);
> +}
> +
> +/* convert 128-bit float to 32-bit int */
> +uint32_t HELPER(cfxbr)(uint32_t r1, uint32_t f2, uint32_t m3)
> +{
> +    FP128 v2;
> +    v2.h = env->fregs[f2].i;
> +    v2.l = env->fregs[f2 + 2].i;
> +    env->regs[r1] = (env->regs[r1] & 0xffffffff00000000ULL) | float128_to_int32(v2.x, &env->fpu_status);
> +    return set_cc_nz_f128(v2.x);
> +}
> +
> +/* load 32-bit FP zero */
> +void HELPER(lzer)(uint32_t f1)
> +{
> +    env->fregs[f1].e = float32_zero;
> +}
> +
> +/* load 64-bit FP zero */
> +void HELPER(lzdr)(uint32_t f1)
> +{
> +    env->fregs[f1].d = float64_zero;
> +}
>
> +/* load 128-bit FP zero */
> +void HELPER(lzxr)(uint32_t f1)
> +{
> +    FP128 x;
> +    x.x = float64_to_float128(float64_zero, &env->fpu_status);
> +    env->fregs[f1].i = x.h;
> +    env->fregs[f1 + 1].i = x.l;
> +}
> +
> +/* 128-bit FP subtraction RR */
> +uint32_t HELPER(sxbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 v1;
> +    v1.h = env->fregs[f1].i;
> +    v1.l = env->fregs[f1 + 2].i;
> +    FP128 v2;
> +    v2.h = env->fregs[f2].i;
> +    v2.l = env->fregs[f2 + 2].i;
> +    FP128 res;
> +    res.x = float128_sub(v1.x, v2.x, &env->fpu_status);
> +    env->fregs[f1].i = res.h;
> +    env->fregs[f1 + 2].i = res.l;
> +    return set_cc_nz_f128(res.x);
> +}
> +
> +/* 128-bit FP addition RR */
> +uint32_t HELPER(axbr)(uint32_t f1, uint32_t f2)
> +{
> +    FP128 v1;
> +    v1.h = env->fregs[f1].i;
> +    v1.l = env->fregs[f1 + 2].i;
> +    FP128 v2;
> +    v2.h = env->fregs[f2].i;
> +    v2.l = env->fregs[f2 + 2].i;
> +    FP128 res;
> +    res.x = float128_add(v1.x, v2.x, &env->fpu_status);
> +    env->fregs[f1].i = res.h;
> +    env->fregs[f1 + 2].i = res.l;
> +    return set_cc_nz_f128(res.x);
> +}
> +
> +/* 32-bit FP multiplication RR */
> +void HELPER(meebr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].e = float32_mul(env->fregs[f1].e, env->fregs[f2].e, &env->fpu_status);
> +}
> +
> +/* 64-bit FP division RR */
> +void HELPER(ddbr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].d = float64_div(env->fregs[f1].d, env->fregs[f2].d, &env->fpu_status);
> +}
> +
> +/* 64-bit FP multiply and add RM */
> +void HELPER(madb)(uint32_t f1, uint64_t a2, uint32_t f3)
> +{
> +    HELPER_LOG("%s: f1 %d a2 0x%lx f3 %d\n", __FUNCTION__, f1, a2, f3);
> +    union {
> +        float64 d;
> +        uint64_t i;
> +    } v2;
> +    v2.i = ldq(a2);
> +    env->fregs[f1].d = float64_add(env->fregs[f1].d, float64_mul(v2.d, env->fregs[f3].d, &env->fpu_status), &env->fpu_status);
> +}
> +
> +/* 64-bit FP multiply and add RR */
> +void HELPER(madbr)(uint32_t f1, uint32_t f3, uint32_t f2)
> +{
> +    HELPER_LOG("%s: f1 %d f2 %d f3 %d\n", __FUNCTION__, f1, f2, f3);
> +    env->fregs[f1].d = float64_add(float64_mul(env->fregs[f2].d, env->fregs[f3].d, &env->fpu_status), env->fregs[f1].d, &env->fpu_status);
> +}
> +
> +/* 64-bit FP multiply and subtract RR */
> +void HELPER(msdbr)(uint32_t f1, uint32_t f3, uint32_t f2)
> +{
> +    HELPER_LOG("%s: f1 %d f2 %d f3 %d\n", __FUNCTION__, f1, f2, f3);
> +    env->fregs[f1].d = float64_sub(float64_mul(env->fregs[f2].d, env->fregs[f3].d, &env->fpu_status), env->fregs[f1].d, &env->fpu_status);
> +}
> +
> +/* 32-bit FP multiply and add RR */
> +void HELPER(maebr)(uint32_t f1, uint32_t f3, uint32_t f2)
> +{
> +    env->fregs[f1].e = float32_add(env->fregs[f1].e, float32_mul(env->fregs[f2].e, env->fregs[f3].e, &env->fpu_status), &env->fpu_status);
> +}
> +
> +/* convert 64-bit float to 128-bit float */
> +void HELPER(lxdb)(uint32_t f1, uint64_t a2)
> +{
> +    union {
> +        float64 d;
> +        uint64_t i;
> +    } v2;
same

> +    v2.i = ldq(a2);
same

> +    FP128 v1;
> +    v1.x = float64_to_float128(v2.d, &env->fpu_status);
> +    env->fregs[f1].i = v1.h;
> +    env->fregs[f1 + 2].i = v1.l;
> +}
> +
> +/* test data class 32-bit */
> +uint32_t HELPER(tceb)(uint32_t f1, uint64_t m2)
> +{
> +    float32 v1 = env->fregs[f1].e;
> +    int neg = float32_is_neg(v1);
> +    uint32_t cc = 0;
> +    HELPER_LOG("%s: v1 0x%lx m2 0x%lx neg %d\n", __FUNCTION__, v1, m2, neg);
> +    if (float32_is_zero(v1) && (m2 & (1 << (11-neg)))) cc = 1;
> +    else if (float32_is_infinity(v1) && (m2 & (1 << (5-neg)))) cc = 1;
> +    else if (float32_is_nan(v1) && (m2 & (1 << (3-neg)))) cc = 1;
> +    else if (float32_is_signaling_nan(v1) && (m2 & (1 << (1-neg)))) cc = 1;
> +    else /* assume normalized number */ if (m2 & (1 << (9-neg))) cc = 1;
> +    /* FIXME: denormalized? */
> +    return cc;
coding style
> +}
> +
> +/* test data class 64-bit */
> +uint32_t HELPER(tcdb)(uint32_t f1, uint64_t m2)
> +{
> +    float64 v1 = env->fregs[f1].d;
> +    int neg = float64_is_neg(v1);
> +    uint32_t cc = 0;
> +    HELPER_LOG("%s: v1 0x%lx m2 0x%lx neg %d\n", __FUNCTION__, v1, m2, neg);
> +    if (float64_is_zero(v1) && (m2 & (1 << (11-neg)))) cc = 1;
> +    else if (float64_is_infinity(v1) && (m2 & (1 << (5-neg)))) cc = 1;
> +    else if (float64_is_nan(v1) && (m2 & (1 << (3-neg)))) cc = 1;
> +    else if (float64_is_signaling_nan(v1) && (m2 & (1 << (1-neg)))) cc = 1;
> +    else /* assume normalized number */ if (m2 & (1 << (9-neg))) cc = 1;
> +    /* FIXME: denormalized? */
> +    return cc;
coding style
> +}
> +
> +/* test data class 128-bit */
> +uint32_t HELPER(tcxb)(uint32_t f1, uint64_t m2)
> +{
> +    FP128 v1;
> +    uint32_t cc = 0;
> +    v1.h = env->fregs[f1].i;
> +    v1.l = env->fregs[f1 + 2].i;
> +    
> +    int neg = float128_is_neg(v1.x);
> +    if (float128_is_zero(v1.x) && (m2 & (1 << (11-neg)))) cc = 1;
> +    else if (float128_is_infinity(v1.x) && (m2 & (1 << (5-neg)))) cc = 1;
> +    else if (float128_is_nan(v1.x) && (m2 & (1 << (3-neg)))) cc = 1;
> +    else if (float128_is_signaling_nan(v1.x) && (m2 & (1 << (1-neg)))) cc = 1;
> +    else /* assume normalized number */ if (m2 & (1 << (9-neg))) cc = 1;
> +    /* FIXME: denormalized? */
> +    return cc;
coding style
> +}
> +
> +/* find leftmost one */
> +uint32_t HELPER(flogr)(uint32_t r1, uint64_t v2)
> +{
> +    uint64_t res = 0;
> +    uint64_t ov2 = v2;
> +    while (!(v2 & 0x8000000000000000ULL) && v2) {
> +        v2 <<= 1;
> +        res++;
> +    }
> +    if (!v2) {
> +        env->regs[r1] = 64;
> +        env->regs[r1 + 1] = 0;
> +        return 0;
> +    }
> +    else {
> +        env->regs[r1] = res;
> +        env->regs[r1 + 1] = ov2 & ~(0x8000000000000000ULL >> res);
> +        return 2;
> +    }
> +}
> +
> +/* square root 64-bit RR */
> +void HELPER(sqdbr)(uint32_t f1, uint32_t f2)
> +{
> +    env->fregs[f1].d = float64_sqrt(env->fregs[f2].d, &env->fpu_status);
> +}
> diff --git a/target-s390x/translate.c b/target-s390x/translate.c
> new file mode 100644
> index 0000000..a1948bf
> --- /dev/null
> +++ b/target-s390x/translate.c
> @@ -0,0 +1,2479 @@
> +/*
> + *  S/390 translation
> + *
> + *  Copyright (c) 2009 Ulrich Hecht
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston MA  02110-1301 USA
> + */
> +#include <stdarg.h>
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <inttypes.h>
> +
> +#define S390X_DEBUG_DISAS
> +#ifdef S390X_DEBUG_DISAS
> +#  define LOG_DISAS(...) qemu_log(__VA_ARGS__)
> +#else
> +#  define LOG_DISAS(...) do { } while (0)
> +#endif
> +
> +#include "cpu.h"
> +#include "exec-all.h"
> +#include "disas.h"
> +#include "tcg-op.h"
> +#include "qemu-log.h"
> +
> +/* global register indexes */
> +static TCGv_ptr cpu_env;
> +
> +#include "gen-icount.h"
> +#include "helpers.h"
> +#define GEN_HELPER 1
> +#include "helpers.h"
> +
> +typedef struct DisasContext DisasContext;
> +struct DisasContext {
> +    uint64_t pc;
> +    int is_jmp;
> +    CPUS390XState *env;
> +};
> +
> +#define DISAS_EXCP 4
> +#define DISAS_SVC 5
> +
> +void cpu_dump_state(CPUState *env, FILE *f,
> +                    int (*cpu_fprintf)(FILE *f, const char *fmt, ...),
> +                    int flags)
> +{
> +    int i;
> +    for (i = 0; i < 16; i++) {
> +        cpu_fprintf(f, "R%02d=%016lx", i, env->regs[i]);
> +        if ((i % 4) == 3) cpu_fprintf(f, "\n");
coding style
> +        else cpu_fprintf(f, " ");
> +    }
> +    for (i = 0; i < 16; i++) {
> +        cpu_fprintf(f, "F%02d=%016lx", i, env->fregs[i]);
> +        if ((i % 4) == 3) cpu_fprintf(f, "\n");
coding style
> +        else cpu_fprintf(f, " ");
> +    }
> +    cpu_fprintf(f, "PSW=mask %016lx addr %016lx cc %02x\n", env->psw.mask, env->psw.addr, env->cc);
> +}
> +
> +#define TCGREGS
> +
> +static TCGv global_cc;
> +#ifdef TCGREGS
> +/* registers stored in TCG variables enhance performance */
> +static TCGv tcgregs[16];
> +static TCGv tcgregs32[16];

These variables hold 32-bit TCG values; they should be of type TCGv_i32.
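
To illustrate why the distinct type matters, here is a standalone sketch.
It uses hypothetical mock types (Mock_i32, Mock_i64), not the real TCG
definitions, and only assumes that handles can be wrapped in separate
struct types so that the compiler rejects a 64-bit handle where a 32-bit
one is expected:

```c
#include <assert.h>

/* Hypothetical stand-ins, not the real TCG definitions: wrapping the index
   in distinct struct types makes 32-bit and 64-bit handles incompatible. */
typedef struct { int idx; } Mock_i32;
typedef struct { int idx; } Mock_i64;

static Mock_i32 mock_new_i32(int idx) { Mock_i32 h = { idx }; return h; }
static Mock_i64 mock_new_i64(int idx) { Mock_i64 h = { idx }; return h; }

/* Accepts only 32-bit handles; passing a Mock_i64 here fails to compile. */
static int mock_width_i32(Mock_i32 h)
{
    assert(h.idx >= 0);
    return 32;
}

/* Accepts only 64-bit handles. */
static int mock_width_i64(Mock_i64 h)
{
    assert(h.idx >= 0);
    return 64;
}
```

With a single catch-all handle type, mixing the two widths compiles
silently, which is how the type errors flagged in this review slip in.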

> +#endif
> +static TCGv cc;
> +static TCGv psw_addr;
> +
> +void s390x_translate_init(void)
> +{
> +    cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
> +    global_cc = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUState, cc), "global_cc");
> +#ifdef TCGREGS
> +    int i;
> +    char rn[4];
> +    for (i = 0; i < 16; i++) {
> +        sprintf(rn, "R%d", i);
> +        tcgregs[i] = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUState, regs[i]), strdup(rn));
> +        sprintf(rn, "r%d", i);
> +        tcgregs32[i] = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUState, regs[i])
> +#ifdef WORDS_BIGENDIAN

This is wrong. It should probably be HOST_WORDS_BIGENDIAN.
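
The "+ 4" depends on the byte order of the host, because the offset is
applied to a 64-bit field in host memory. A standalone sketch of that
dependency follows; low32_offset() and read_low32() are hypothetical
names for illustration and are not part of QEMU:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of the low 32 bits of a 64-bit value in host memory:
   0 on a little-endian host, 4 on a big-endian host. */
static size_t low32_offset(void)
{
    uint64_t probe = 0x1122334455667788ULL;
    uint32_t first_half;

    memcpy(&first_half, &probe, sizeof(first_half));
    return first_half == 0x55667788U ? 0 : 4;
}

/* Reads the low 32 bits of reg through a 32-bit access at that offset. */
static uint32_t read_low32(uint64_t reg)
{
    uint32_t half;

    memcpy(&half, (const unsigned char *)&reg + low32_offset(), sizeof(half));
    assert(half == (uint32_t)reg);
    return half;
}
```

The compile-time macro has to answer the same question about the host,
not about the emulated target.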

> +                                                                                     + 4
> +#endif
> +                                                                                        , strdup(rn));
> +    }
> +#endif
> +    psw_addr = tcg_global_mem_new_i64(TCG_AREG0, offsetof(CPUState, psw.addr), "psw_addr");
> +}
> +
> +#ifdef TCGREGS
> +static inline void sync_reg64(int reg)
> +{
> +    tcg_gen_sync_i64(tcgregs[reg]);
> +}
> +static inline void sync_reg32(int reg)
> +{
> +    tcg_gen_sync_i32(tcgregs32[reg]);
> +}
> +#endif
> +
> +static TCGv load_reg(int reg)
> +{
> +    TCGv r = tcg_temp_new_i64();
> +#ifdef TCGREGS
> +    sync_reg32(reg);
> +    tcg_gen_mov_i64(r, tcgregs[reg]);
> +    return r;
> +#else
> +    tcg_gen_ld_i64(r, cpu_env, offsetof(CPUState, regs[reg]));
> +    return r;
> +#endif
> +}

I don't really like implicit TCGv temp allocation. In other targets it
often has caused missing tcg_temp_free(). It should probably be
rewritten as load_reg(TCGv t, int reg).
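
A rough standalone sketch of the difference, using mock types and a
counter of live temps. Nothing below is the real TCG API; MockTemp and
the mock_* and translate_* functions are hypothetical:

```c
#include <assert.h>

typedef int MockTemp;          /* mock handle, not the real TCGv */
static int live_temps;         /* number of temps currently allocated */

static MockTemp mock_temp_new(void)
{
    return live_temps++;
}

static void mock_temp_free(MockTemp t)
{
    (void)t;
    assert(live_temps > 0);
    live_temps--;
}

/* Patch style: the allocation is hidden, so the caller can forget the free. */
static MockTemp load_reg_implicit(int reg)
{
    (void)reg;
    return mock_temp_new();
}

/* Suggested style: the caller allocates t, so alloc and free sit together. */
static void load_reg_explicit(MockTemp t, int reg)
{
    (void)t;
    (void)reg;                 /* a real version would emit the load into t */
}

/* Returns the number of temps still live after one translated insn. */
static int translate_explicit(void)
{
    MockTemp t = mock_temp_new();

    load_reg_explicit(t, 1);
    mock_temp_free(t);
    return live_temps;
}

/* Same, but the temp returned by the loader is dropped and never freed. */
static int translate_implicit_leaky(void)
{
    load_reg_implicit(1);
    return live_temps;
}
```

In the explicit variant the new/free pair is visible in one function,
which makes a missing free easy to spot in review.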


> +static TCGv load_freg(int reg)
> +{
> +    TCGv r = tcg_temp_new_i64();
> +    tcg_gen_ld_i64(r, cpu_env, offsetof(CPUState, fregs[reg].d));
> +    return r;
> +}
> +
> +static TCGv load_freg32(int reg)
> +{
> +    TCGv r = tcg_temp_new_i32();
> +    tcg_gen_ld_i32(r, cpu_env, offsetof(CPUState, fregs[reg].e));
> +    return r;
> +}

Should be of type TCGv_i32.

> +static void load_reg32_var(TCGv r, int reg)
> +{
> +#ifdef TCGREGS
> +    sync_reg64(reg);
> +    tcg_gen_mov_i32(r, tcgregs32[reg]);
> +#else
> +#ifdef WORDS_BIGENDIAN

HOST_WORDS_BIGENDIAN ?

> +    tcg_gen_ld_i32(r, cpu_env, offsetof(CPUState, regs[reg]) + 4);
> +#else
> +    tcg_gen_ld_i32(r, cpu_env, offsetof(CPUState, regs[reg]));
> +#endif
> +#endif
> +}

Should be of type TCGv_i32.

> +
> +static TCGv load_reg32(int reg)
> +{
> +    TCGv r = tcg_temp_new_i32();
> +    load_reg32_var(r, reg);
> +    return r;
> +}

Should be of type TCGv_i32.

> +
> +static void store_reg(int reg, TCGv v)
> +{
> +#ifdef TCGREGS
> +    sync_reg32(reg);
> +    tcg_gen_mov_i64(tcgregs[reg], v);
> +#else
> +    tcg_gen_st_i64(v, cpu_env, offsetof(CPUState, regs[reg]));
> +#endif
> +}
> +
> +static void store_freg(int reg, TCGv v)
> +{
> +    tcg_gen_st_i64(v, cpu_env, offsetof(CPUState, fregs[reg].d));
> +}
> +
> +static void store_reg32(int reg, TCGv v)
> +{
> +#ifdef TCGREGS
> +    sync_reg64(reg);
> +    tcg_gen_mov_i32(tcgregs32[reg], v);
> +#else
> +#ifdef WORDS_BIGENDIAN

HOST_WORDS_BIGENDIAN ?

> +    tcg_gen_st_i32(v, cpu_env, offsetof(CPUState, regs[reg]) + 4);
> +#else
> +    tcg_gen_st_i32(v, cpu_env, offsetof(CPUState, regs[reg]));
> +#endif
> +#endif

This should use TCGv_i32.

> +}
> +
> +static void store_reg8(int reg, TCGv v)
> +{
> +#ifdef TCGREGS
> +    TCGv tmp = tcg_temp_new_i32();
> +    sync_reg64(reg);
> +    tcg_gen_andi_i32(tmp, tcgregs32[reg], 0xffffff00UL);
> +    tcg_gen_or_i32(tcgregs32[reg], tmp, v);
> +    tcg_temp_free(tmp);
> +#else
> +#ifdef WORDS_BIGENDIAN

HOST_WORDS_BIGENDIAN ?

> +    tcg_gen_st8_i32(v, cpu_env, offsetof(CPUState, regs[reg]) + 7);
> +#else
> +    tcg_gen_st8_i32(v, cpu_env, offsetof(CPUState, regs[reg]));
> +#endif
> +#endif
> +}

This should use TCGv_i32.

> +
> +static void store_freg32(int reg, TCGv v)
> +{
> +    tcg_gen_st_i32(v, cpu_env, offsetof(CPUState, fregs[reg].e));
> +}

This should use TCGv_i32.


For all register load/store, as already explained in the TCG sync op
patch, I am in favor of using the ld/st version (TCGREGS not defined).

> +
> +static void gen_illegal_opcode(DisasContext *s)
> +{
> +    TCGv tmp = tcg_temp_new_i64();
> +    tcg_gen_movi_i64(tmp, 42);
tcg_const_i64 could be used instead.
> +    gen_helper_exception(tmp);

Missing tcg_temp_free_i64(tmp);

> +    s->is_jmp = DISAS_EXCP;
> +}
> +
> +#define DEBUGINSN LOG_DISAS("insn: 0x%lx\n", insn);
> +
> +static TCGv get_address(int x2, int b2, int d2)
> +{
> +    TCGv tmp = 0,tmp2 = 0;
> +    if (d2) tmp = tcg_const_i64(d2);
> +    if (x2) {
> +        if (d2) {
> +            tmp2 = load_reg(x2);
> +            tcg_gen_add_i64(tmp, tmp, tmp2);
> +            tcg_temp_free(tmp2);
> +        }
> +        else {
> +            tmp = load_reg(x2);
> +        }
> +    }
> +    if (b2) {
> +        if (d2 || x2) {
> +            tmp2 = load_reg(b2);
> +            tcg_gen_add_i64(tmp, tmp, tmp2);
> +            tcg_temp_free(tmp2);
> +        }
> +        else {
> +            tmp = load_reg(b2);
> +        }
> +    }
> +    
> +    if (!(d2 || x2 || b2)) tmp = tcg_const_i64(0);
> +    
> +    return tmp;
coding style
> +}
> +
> +static inline void set_cc_nz_u32(TCGv val)
> +{
> +    gen_helper_set_cc_nz_u32(cc, val);
> +}
> +
> +static inline void set_cc_nz_u64(TCGv val)
> +{
> +    gen_helper_set_cc_nz_u64(cc, val);
> +}
> +
> +static inline void set_cc_s32(TCGv val)
> +{
> +    gen_helper_set_cc_s32(cc, val);
> +}
> +
> +static inline void set_cc_s64(TCGv val)
> +{
> +    gen_helper_set_cc_s64(cc, val);
> +}
> +
> +static inline void cmp_s32(TCGv v1, TCGv v2)
> +{
> +    gen_helper_cmp_s32(cc, v1, v2);
> +}
> +
> +static inline void cmp_u32(TCGv v1, TCGv v2)
> +{
> +    gen_helper_cmp_u32(cc, v1, v2);
> +}
> +
> +/* this is a hysterical raisin */
> +static inline void cmp_s32c(TCGv v1, int32_t v2)
> +{
> +    gen_helper_cmp_s32(cc, v1, tcg_const_i32(v2));

The TCG temporary passed to the helper should be freed.

> +}
> +static inline void cmp_u32c(TCGv v1, uint32_t v2)
> +{
> +    gen_helper_cmp_u32(cc, v1, tcg_const_i32(v2));

Same.

> +}
> +
> +
> +static inline void cmp_s64(TCGv v1, TCGv v2)
> +{
> +    gen_helper_cmp_s64(cc, v1, v2);
> +}
> +
> +static inline void cmp_u64(TCGv v1, TCGv v2)
> +{
> +    gen_helper_cmp_u64(cc, v1, v2);
> +}
> +
> +/* see cmp_[su]32c() */
> +static inline void cmp_s64c(TCGv v1, int64_t v2)
> +{
> +    gen_helper_cmp_s64(cc, v1, tcg_const_i64(v2));

Same

> +}
> +static inline void cmp_u64c(TCGv v1, uint64_t v2)
> +{
> +    gen_helper_cmp_u64(cc, v1, tcg_const_i64(v2));

Same

> +}
> +
> +static void gen_bcr(uint32_t mask, int tr, uint64_t offset)
> +{
> +    TCGv target;
> +    if (mask == 0xf) {	/* unconditional */
> +      target = load_reg(tr);
> +      tcg_gen_mov_i64(psw_addr, target);
> +    }
> +    else {
> +      gen_helper_bcr(cc, tcg_const_i32(mask), (target = load_reg(tr)), tcg_const_i64(offset));
> +    }
> +    tcg_temp_free(target);
> +}
> +
> +static void gen_brc(uint32_t mask, uint64_t pc, int32_t offset)
> +{
> +    if (mask == 0xf) {	/* unconditional */
> +      tcg_gen_movi_i64(psw_addr, pc + offset);
> +    }
> +    else {
> +      gen_helper_brc(cc, tcg_const_i32(mask), tcg_const_i64(pc), tcg_const_i32(offset));
> +    }
> +}

The branches should be handled using brcond, goto_tb and exit_tb rather
than helpers, to allow TB chaining.

> +static void gen_set_cc_add64(TCGv v1, TCGv v2, TCGv vr)
> +{
> +    gen_helper_set_cc_add64(cc, v1, v2, vr);
> +}
> +
> +static void disas_e3(DisasContext* s, int op, int r1, int x2, int b2, int d2)
> +{
> +    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;

This is wrong. 0 maps to global 0 (env in your case). -1 should be used
instead, or even better the TCGV_UNUSED macro.
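
A minimal standalone sketch of the problem, assuming (as described above)
that a handle is an index and that index 0 is a valid global. MockHandle,
MOCK_UNUSED and the table contents are hypothetical and only mirror the
order in which this patch creates its globals:

```c
#include <assert.h>

typedef int MockHandle;        /* mock: index into a table of TCG values */
#define MOCK_UNUSED (-1)       /* sentinel that can never be a valid index */

/* Index 0 is a real global, so 0 cannot mean "no value". */
static const char *const mock_globals[] = { "env", "global_cc" };

static int mock_is_unused(MockHandle h)
{
    return h == MOCK_UNUSED;
}

static const char *mock_name(MockHandle h)
{
    if (mock_is_unused(h)) {
        return "(unused)";
    }
    assert(h >= 0 && h < 2);
    return mock_globals[h];
}
```

A handle initialized to 0 silently aliases the first global, while the
sentinel can be tested for before use.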

> +    
> +    LOG_DISAS("disas_e3: op 0x%x r1 %d x2 %d b2 %d d2 %d\n", op, r1, x2, b2, d2);
> +    tmp = get_address(x2, b2, d2);
> +    switch (op) {
> +    case 0x2: /* LTG R1,D2(X2,B2) [RXY] */
> +    case 0x4: /* lg r1,d2(x2,b2) */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        store_reg(r1, tmp2);
> +        if (op == 0x2) set_cc_s64(tmp2);
coding style
> +        break;
> +    case 0x12: /* LT R1,D2(X2,B2) [RXY] */
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32s(tmp2, tmp, 1);

tcg_gen_qemu_ld32s loads a 32-bit value into a register of the default
size, which is 64-bit here.

> +        store_reg32(r1, tmp2);
> +        set_cc_s32(tmp2);
> +        break;
> +    case 0xc: /* MSG      R1,D2(X2,B2)     [RXY] */
> +    case 0x1c: /* MSGF     R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        if (op == 0xc) {
> +            tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        }
> +        else {
> +            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> +            tcg_gen_ext32s_i64(tmp2, tmp2);

The sign extension is already done by qemu_ld32s
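
In plain C terms, extending an already sign-extended value changes
nothing. The sketch below models both steps on host integers; load32s()
and ext32s() are hypothetical stand-ins, not the TCG ops themselves:

```c
#include <assert.h>
#include <stdint.h>

/* Models a sign-extending 32-bit load into a 64-bit register.
   The uint32_t -> int32_t conversion is implementation-defined in C,
   but wraps on the usual two's complement hosts. */
static int64_t load32s(uint32_t mem)
{
    return (int64_t)(int32_t)mem;
}

/* Models ext32s on a 64-bit value: sign-extend the low 32 bits. */
static int64_t ext32s(int64_t v)
{
    return (int64_t)(int32_t)(uint32_t)v;
}

/* Returns 1 if the extra extension after the load is a no-op for mem. */
static int ext_is_redundant(uint32_t mem)
{
    int64_t loaded = load32s(mem);

    assert(loaded >= INT32_MIN && loaded <= INT32_MAX);
    return ext32s(loaded) == loaded;
}
```

The same reasoning applies to ld32u followed by ext32u.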

> +        }
> +        tmp = load_reg(r1);
> +        tcg_gen_mul_i64(tmp, tmp, tmp2);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0xd: /* DSG      R1,D2(X2,B2)     [RXY] */
> +    case 0x1d: /* DSGF      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        if (op == 0x1d) {
> +            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> +            tcg_gen_ext32s_i64(tmp2, tmp2);

The sign extension is already done by qemu_ld32s

> +        }
> +        else {
> +            tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        }
> +        tmp = load_reg(r1 + 1);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_div_i64(tmp3, tmp, tmp2);
> +        store_reg(r1 + 1, tmp3);
> +        tcg_gen_rem_i64(tmp3, tmp, tmp2);
> +        store_reg(r1, tmp3);
> +        break;
> +    case 0x8: /* AG      R1,D2(X2,B2)     [RXY] */
> +    case 0xa: /* ALG      R1,D2(X2,B2)     [RXY] */
> +    case 0x18: /* AGF       R1,D2(X2,B2)     [RXY] */
> +    case 0x1a: /* ALGF      R1,D2(X2,B2)     [RXY] */
> +        if (op == 0x1a) {
> +            tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +            tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +            tcg_gen_ext32u_i64(tmp2, tmp2);

The zero extension is already done by qemu_ld32u

> +        }
> +        else if (op == 0x18) {
> +            tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> +            tcg_gen_ext32s_i64(tmp2, tmp2);

The sign extension is already done by qemu_ld32s

> +        }
> +        else {
coding style
> +            tmp2 = tcg_temp_new_i64();
> +            tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        }
> +        tmp = load_reg(r1);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_add_i64(tmp3, tmp, tmp2);
> +        store_reg(r1, tmp3);
> +        switch (op) {
> +        case 0x8: case 0x18: gen_set_cc_add64(tmp, tmp2, tmp3); break;
> +        case 0xa: case 0x1a: gen_helper_set_cc_addu64(cc, tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

tcg_abort() is wrong here; it is for internal TCG use. Also, a real CPU
would probably raise an illegal instruction exception in this case; it
does not power off the machine.
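
Something along these lines, shown as a standalone mock rather than real
translator code. MockCtx, the mock_* functions and the two example
opcodes are hypothetical; the value 4 only mirrors DISAS_EXCP from this
patch:

```c
#include <assert.h>
#include <stddef.h>

enum { MOCK_DISAS_NEXT = 0, MOCK_DISAS_EXCP = 4 };

struct MockCtx {
    int is_jmp;                /* how the translation block ends */
    int pending_exception;     /* set when the guest must take a fault */
};

/* Guest-visible fault: record it and stop translating, do not abort. */
static void mock_illegal_opcode(struct MockCtx *s)
{
    s->pending_exception = 1;
    s->is_jmp = MOCK_DISAS_EXCP;
}

static void mock_disas(struct MockCtx *s, int op)
{
    assert(s != NULL);
    switch (op) {
    case 0x5a:                 /* example opcode: add */
    case 0x5b:                 /* example opcode: sub */
        s->is_jmp = MOCK_DISAS_NEXT;
        break;
    default:                   /* unknown opcode: the guest's problem */
        mock_illegal_opcode(s);
        break;
    }
}

/* Returns 1 if decoding op ends in an illegal-opcode exception. */
static int mock_decode_is_illegal(int op)
{
    struct MockCtx s = { MOCK_DISAS_NEXT, 0 };

    mock_disas(&s, op);
    return s.pending_exception && s.is_jmp == MOCK_DISAS_EXCP;
}
```

The emulator keeps running and the guest sees the exception, which is
what gen_illegal_opcode() in this patch already does for the outer
switch.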

> +        }
> +        break;
> +    case 0x9: /* SG      R1,D2(X2,B2)     [RXY] */
> +    case 0xb: /* SLG      R1,D2(X2,B2)     [RXY] */
> +    case 0x19: /* SGF      R1,D2(X2,B2)     [RXY] */
> +    case 0x1b: /* SLGF     R1,D2(X2,B2)     [RXY] */
> +        if (op == 0x19) {
> +            tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> +            tcg_gen_ext32s_i64(tmp2, tmp2);

The sign extension is already done by qemu_ld32s

> +        }
> +        else if (op == 0x1b) {
> +            tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +            tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +            tcg_gen_ext32u_i64(tmp2, tmp2);

The zero extension is already done by qemu_ld32u

> +        }
> +        else {
> +            tmp2 = tcg_temp_new_i64();
> +            tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        }
> +        tmp = load_reg(r1);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_sub_i64(tmp3, tmp, tmp2);
> +        store_reg(r1, tmp3);
> +        switch (op) {
> +        case 0x9: case 0x19: gen_helper_set_cc_sub64(cc, tmp, tmp2, tmp3); break;
> +        case 0xb: case 0x1b: gen_helper_set_cc_subu64(cc, tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0x14: /* LGF      R1,D2(X2,B2)     [RXY] */
> +    case 0x16: /* LLGF      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        switch (op) {
> +        case 0x14: tcg_gen_ext32s_i64(tmp2, tmp2); break;
> +        case 0x16: tcg_gen_ext32u_i64(tmp2, tmp2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp2);
> +        break;
> +    case 0x15: /* LGH     R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
> +        tcg_gen_ext16s_i64(tmp2, tmp2);
> +        store_reg(r1, tmp2);
> +        break;
> +    case 0x17: /* LLGT      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        tcg_gen_ext32u_i64(tmp2, tmp2);

The zero extension is already done by qemu_ld32u

> +        tcg_gen_andi_i64(tmp2, tmp2, 0x7fffffffULL);
> +        store_reg(r1, tmp2);
> +        break;
> +    case 0x1e: /* LRV R1,D2(X2,B2) [RXY] */
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        tcg_gen_bswap32_i32(tmp2, tmp2);
Wrong TCGv type.

> +        store_reg(r1, tmp2);
> +        break;
> +    case 0x20: /* CG      R1,D2(X2,B2)     [RXY] */
> +    case 0x21: /* CLG      R1,D2(X2,B2) */
> +    case 0x30: /* CGF       R1,D2(X2,B2)     [RXY] */
> +    case 0x31: /* CLGF      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        switch (op) {
> +        case 0x20:
> +        case 0x21:
> +            tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +            break;
> +        case 0x30:
> +            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> +            tcg_gen_ext32s_i64(tmp2, tmp2);

The sign extension is already done by qemu_ld32s

> +            break;
> +        case 0x31:
> +            tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +            tcg_gen_ext32u_i64(tmp2, tmp2);

The zero extension is already done by qemu_ld32u

> +            break;
> +        default:
> +            tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        tmp = load_reg(r1);
> +        switch (op) {
> +        case 0x20: case 0x30: cmp_s64(tmp, tmp2); break;
> +        case 0x21: case 0x31: cmp_u64(tmp, tmp2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0x24: /* stg r1, d2(x2,b2) */
> +        tmp2 = load_reg(r1);
> +        tcg_gen_qemu_st64(tmp2, tmp, 1);
> +        break;
> +    case 0x3e: /* STRV R1,D2(X2,B2) [RXY] */
> +        tmp2 = load_reg32(r1);
> +        tcg_gen_bswap32_i32(tmp2, tmp2);
> +        tcg_gen_qemu_st32(tmp2, tmp, 1);

Wrong TCGv type.

> +        break;
> +    case 0x50: /* STY  R1,D2(X2,B2) [RXY] */
> +        tmp2 = load_reg32(r1);
> +        tcg_gen_qemu_st32(tmp2, tmp, 1);

Wrong TCGv type.

> +        break;
> +    case 0x57: /* XY R1,D2(X2,B2) [RXY] */
> +        tmp2 = load_reg32(r1);
> +        tmp3 = tcg_temp_new_i32();
> +        tcg_gen_qemu_ld32u(tmp3, tmp, 1);
> +        tcg_gen_xor_i32(tmp, tmp2, tmp3);
> +        store_reg32(r1, tmp);
> +        set_cc_nz_u32(tmp);

Wrong TCGv type.

> +        break;
> +    case 0x58: /* LY R1,D2(X2,B2) [RXY] */
> +        tmp3 = tcg_temp_new_i32();
> +        tcg_gen_qemu_ld32u(tmp3, tmp, 1);
> +        store_reg32(r1, tmp3);

Wrong TCGv type.

> +        break;
> +    case 0x5a: /* AY R1,D2(X2,B2) [RXY] */
> +    case 0x5b: /* SY R1,D2(X2,B2) [RXY] */
> +        tmp2 = load_reg32(r1);
> +        tmp3 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32s(tmp3, tmp, 1);
> +        switch (op) {
> +        case 0x5a: tcg_gen_add_i32(tmp, tmp2, tmp3); break;
> +        case 0x5b: tcg_gen_sub_i32(tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp);
> +        switch (op) {
> +        case 0x5a: gen_helper_set_cc_add32(cc, tmp2, tmp3, tmp); break;
> +        case 0x5b: gen_helper_set_cc_sub32(cc, tmp2, tmp3, tmp); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0x71: /* LAY R1,D2(X2,B2) [RXY] */
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x72: /* STCY R1,D2(X2,B2) [RXY] */
> +        tmp2 = load_reg32(r1);
> +        tcg_gen_qemu_st8(tmp2, tmp, 1);
> +        break;
> +    case 0x73: /* ICY R1,D2(X2,B2) [RXY] */
> +        tmp3 = tcg_temp_new_i32();
Wrong TCGv type.
> +        tcg_gen_qemu_ld8u(tmp3, tmp, 1);
> +        store_reg8(r1, tmp3);
> +        break; 
> +    case 0x76: /* LB R1,D2(X2,B2) [RXY] */
> +    case 0x77: /* LGB R1,D2(X2,B2) [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld8s(tmp2, tmp, 1);
> +        switch (op) {
> +        case 0x76:
> +            tcg_gen_ext8s_i32(tmp2, tmp2);
Wrong TCGv type.
> +            store_reg32(r1, tmp2);
> +            break;
> +        case 0x77:
> +            tcg_gen_ext8s_i64(tmp2, tmp2);
Wrong TCGv type.
> +            store_reg(r1, tmp2);
> +            break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0x78: /* LHY R1,D2(X2,B2) [RXY] */
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
> +        tcg_gen_ext16s_i32(tmp2, tmp2);
> +        store_reg32(r1, tmp2);
> +        break;
> +    case 0x80: /* NG      R1,D2(X2,B2)     [RXY] */
> +    case 0x81: /* OG      R1,D2(X2,B2)     [RXY] */
> +    case 0x82: /* XG      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = load_reg(r1);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld64(tmp3, tmp, 1);
> +        switch (op) {
> +        case 0x80: tcg_gen_and_i64(tmp, tmp2, tmp3); break;
> +        case 0x81: tcg_gen_or_i64(tmp, tmp2, tmp3); break;
> +        case 0x82: tcg_gen_xor_i64(tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp);
> +        set_cc_nz_u64(tmp);
> +        break;
> +    case 0x86: /* MLG      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        tmp = tcg_const_i32(r1);

Wrong TCGv type.

> +        gen_helper_mlg(tmp, tmp2);
> +        break;
> +    case 0x87: /* DLG      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        tmp = tcg_const_i32(r1);

Wrong TCGv type.

> +        gen_helper_dlg(tmp, tmp2);
> +        break;
> +    case 0x88: /* ALCG      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        tmp = load_reg(r1);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_shri_i64(tmp3, cc, 1);
> +        tcg_gen_andi_i64(tmp3, tmp3, 1);
> +        tcg_gen_add_i64(tmp3, tmp2, tmp3);;
> +        tcg_gen_add_i64(tmp3, tmp, tmp3);
> +        store_reg(r1, tmp3);
> +        gen_helper_set_cc_addc_u64(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0x89: /* SLBG      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        tmp = load_reg(r1);
> +        tmp3 = tcg_const_i32(r1);

Wrong TCGv type.

> +        gen_helper_slbg(cc, cc, tmp3, tmp, tmp2);
> +        break;
> +    case 0x90: /* LLGC      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
> +        store_reg(r1, tmp2);
> +        break;
> +    case 0x91: /* LLGH      R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld16u(tmp2, tmp, 1);
> +        store_reg(r1, tmp2);
> +        break;
> +    case 0x94: /* LLC     R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
> +        store_reg32(r1, tmp2);
> +        break;
> +    case 0x95: /* LLH     R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i32();
> +        tcg_gen_qemu_ld16u(tmp2, tmp, 1);
> +        store_reg32(r1, tmp2);
> +        break;
> +    case 0x98: /* ALC     R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        tmp = tcg_const_i32(r1);
> +        gen_helper_addc_u32(cc, cc, tmp, tmp2);
> +        break;
> +    case 0x99: /* SLB     R1,D2(X2,B2)     [RXY] */
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        tmp = load_reg32(r1);
> +        tmp3 = tcg_const_i32(r1);
> +        gen_helper_slb(cc, cc, tmp3, tmp, tmp2);
> +        break;
> +    default:
> +        LOG_DISAS("illegal e3 operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);

Comparison on TCGv type is not allowed.
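
A minimal standalone sketch of the problem, assuming the handle is an opaque struct rather than a plain integer (as it is when TCG type checking is enabled). The names temp_handle, HANDLE_UNUSED, handle_is_unused and free_if_used are hypothetical and only illustrate the idea; they are not QEMU's API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opaque temp handle; not QEMU's actual definition. */
typedef struct { int idx; } temp_handle;

/* Index 0 can be a live handle, so "unused" gets its own sentinel. */
#define HANDLE_UNUSED ((temp_handle){ -1 })

static bool handle_is_unused(temp_handle h)
{
    return h.idx < 0;
}

/* Returns 1 if the handle was in use (and would be released), else 0.
   Note that "if (h)" would not even compile for a struct type. */
static int free_if_used(temp_handle h)
{
    if (handle_is_unused(h)) {
        return 0;
    }
    return 1;
}

void handle_demo(void)
{
    temp_handle unused = HANDLE_UNUSED;
    temp_handle first = { 0 };

    assert(free_if_used(unused) == 0);
    assert(free_if_used(first) == 1);
}
```

In the tree, the TCGV_UNUSED_* macros look like the intended way to mark a temp as not allocated, though simply freeing each temp in the case arm that created it avoids the question entirely.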

> +}
> +
> +static void disas_eb(DisasContext *s, int op, int r1, int r3, int b2, int d2)
> +{
> +    TCGv tmp = 0,tmp2 = 0,tmp3 = 0,tmp4 = 0;

Same comment as in the previous function. 0 maps to a global.
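
To spell out why 0 is a bad "no temp" marker, here is a small sketch assuming globals and temps share one index space and globals are registered first. The allocator and the "env" name are hypothetical stand-ins, not the real TCG code:

```c
#include <assert.h>
#include <string.h>

#define MAX_TEMPS 8

static const char *temp_names[MAX_TEMPS];
static int nb_temps;

/* Hypothetical allocator: hands out consecutive indices starting at 0. */
static int new_temp(const char *name)
{
    assert(nb_temps < MAX_TEMPS);
    temp_names[nb_temps] = name;
    return nb_temps++;
}

void zero_is_a_global_demo(void)
{
    int env = new_temp("env");  /* first global registered gets index 0 */
    int tmp = 0;                /* meant as "nothing allocated" */

    /* The "empty" value is indistinguishable from the first global. */
    assert(tmp == env);
    assert(strcmp(temp_names[tmp], "env") == 0);
}
```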

> +    int i;
> +    
> +    LOG_DISAS("disas_eb: op 0x%x r1 %d r3 %d b2 %d d2 0x%x\n", op, r1, r3, b2, d2);
> +    switch (op) {
> +    case 0xc: /* SRLG     R1,R3,D2(B2)     [RSY] */
> +    case 0xd: /* SLLG     R1,R3,D2(B2)     [RSY] */
> +    case 0xa: /* SRAG     R1,R3,D2(B2)     [RSY] */
> +    case 0x1c: /* RLLG     R1,R3,D2(B2)     [RSY] */
> +        if (b2) {
> +            tmp = get_address(0, b2, d2);
> +            tcg_gen_andi_i64(tmp, tmp, 0x3f);
> +        }

Coding style: the else goes on the same line as the closing brace, and its
body needs braces too.

> +        else tmp = tcg_const_i32(d2 & 0x3f);

Wrong TCGv type.

> +        tmp2 = load_reg(r3);
> +        tmp3 = tcg_temp_new_i64();
> +        switch (op) {
> +        case 0xc: tcg_gen_shr_i64(tmp3, tmp2, tmp); break;
> +        case 0xd: tcg_gen_shl_i64(tmp3, tmp2, tmp); break;
> +        case 0xa: tcg_gen_sar_i64(tmp3, tmp2, tmp); break;
> +        case 0x1c: tcg_gen_rotl_i64(tmp3, tmp2, tmp); break;
> +        default: tcg_abort(); break;

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp3);
> +        if (op == 0xa) set_cc_s64(tmp3);
> +        break;
> +    case 0x1d: /* RLL    R1,R3,D2(B2)        [RSY] */
> +        if (b2) {
> +            tmp = get_address(0, b2, d2);
> +            tcg_gen_andi_i64(tmp, tmp, 0x3f);
> +        }
> +        else tmp = tcg_const_i32(d2 & 0x3f);
> +        tmp2 = load_reg32(r3);
> +        tmp3 = tcg_temp_new_i32();

Wrong TCGv type.

> +        switch (op) {
> +        case 0x1d: tcg_gen_rotl_i32(tmp3, tmp2, tmp); break;
> +        default: tcg_abort(); break;

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp3);
> +        break;
> +    case 0x4: /* LMG     R1,R3,D2(B2)     [RSY] */
> +    case 0x24: /* stmg */
> +        /* Apparently, unrolling lmg/stmg of any size gains performance -
> +           even for very long ones... */
> +        if (r3 > r1) {
> +            tmp = get_address(0, b2, d2);
> +            for (i = r1; i <= r3; i++) {
> +                if (op == 0x4) {
> +                    tmp2 = tcg_temp_new_i64();
> +                    tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +                    store_reg(i, tmp2);
> +                    /* At least one register is usually read after an lmg
> +                       (br %rsomething), which is why freeing them is
> +                       detrimental to performance */
> +                }
> +                else {
> +                    tmp2 = load_reg(i);
> +                    tcg_gen_qemu_st64(tmp2, tmp, 1);
> +                    /* R15 is usually read after an stmg; other registers
> +                       generally aren't and can be free'd */
> +                    if (i != 15) tcg_temp_free(tmp2);
> +                }
> +                tcg_gen_addi_i64(tmp, tmp, 8);
> +            }
> +            tmp2 = 0;
> +        }
> +        else {
> +            tmp = tcg_const_i32(r1);
> +            tmp2 = tcg_const_i32(r3);
> +            tmp3 = tcg_const_i32(b2);
> +            tmp4 = tcg_const_i32(d2);

Wrong TCGv type.

> +            if (op == 0x4) gen_helper_lmg(tmp, tmp2, tmp3, tmp4);
> +            else gen_helper_stmg(tmp, tmp2, tmp3, tmp4);
> +        }
> +        break;
> +    case 0x2c: /* STCMH R1,M3,D2(B2) [RSY] */
> +        tmp2 = get_address(0, b2, d2);
> +        tmp = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r3);

Wrong TCGv type.

> +        gen_helper_stcmh(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0x30: /* CSG     R1,R3,D2(B2)     [RSY] */
> +        tmp2 = get_address(0, b2, d2);
> +        tmp = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r3);

Wrong TCGv type.

> +        gen_helper_csg(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0x3e: /* CDSG R1,R3,D2(B2) [RSY] */
> +        tmp2 = get_address(0, b2, d2);
> +        tmp = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r3);

Wrong TCGv type.

> +        gen_helper_cdsg(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0x51: /* TMY D1(B1),I2 [SIY] */
> +        tmp = get_address(0, b2, d2); /* SIY -> this is the destination */
> +        tmp2 = tcg_temp_new_i32();
> +        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
> +        tmp = tcg_const_i32((r1 << 4) | r3);

Wrong TCGv type.

> +        gen_helper_tm(cc, tmp2, tmp);
> +        break;
> +    case 0x52: /* MVIY D1(B1),I2 [SIY] */
> +        tmp2 = tcg_const_i32((r1 << 4) | r3);

Wrong TCGv type.

> +        tmp = get_address(0, b2, d2); /* SIY -> this is the destination */
> +        tcg_gen_qemu_st8(tmp2, tmp, 1);
> +        break;
> +    case 0x55: /* CLIY D1(B1),I2 [SIY] */
> +        tmp3 = get_address(0, b2, d2); /* SIY -> this is the 1st operand */
> +        tmp = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld8u(tmp, tmp3, 1);
> +        cmp_u32c(tmp, (r1 << 4) | r3);
> +        break;
> +    case 0x80: /* ICMH      R1,M3,D2(B2)     [RSY] */
> +        tmp2 = get_address(0, b2, d2);
> +        tmp = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r3);

Wrong TCGv type.

> +        gen_helper_icmh(cc, tmp, tmp2, tmp3);
> +        break;
> +    default:
> +        LOG_DISAS("illegal eb operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);
> +    if (tmp4) tcg_temp_free(tmp4);

Comparison on TCGv type is not allowed.

> +}
> +
> +static void disas_ed(DisasContext *s, int op, int r1, int x2, int b2, int d2, int r1b)
> +{
> +    TCGv tmp, tmp2, tmp3 = 0;

tmp should be declared as TCGv_i32 here, so that the types are correct
for the whole function. Also, tmp3 should not be initialized to 0.
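
A sketch of what distinct handle types buy you, assuming the 32-bit and 64-bit handles are separate struct types. temp32, temp64, TEMP_BITS and use_as_reg_number are hypothetical names for illustration only; the sketch needs a C11 compiler for _Generic:

```c
#include <assert.h>

/* Hypothetical stand-ins for 32-bit and 64-bit temp handles. */
typedef struct { int idx; } temp32;
typedef struct { int idx; } temp64;

/* Compile-time width lookup; any other type is rejected by the compiler. */
#define TEMP_BITS(t) _Generic((t), temp32: 32, temp64: 64)

/* Accepts only a 32-bit handle; passing a temp64 is a compile error. */
static int use_as_reg_number(temp32 t)
{
    return t.idx;
}

void typed_temp_demo(void)
{
    temp32 r1 = { 3 };
    temp64 addr = { 7 };

    assert(TEMP_BITS(r1) == 32);
    assert(TEMP_BITS(addr) == 64);
    assert(use_as_reg_number(r1) == 3);
}
```

With a single untyped declaration for every temp in the function, this kind of mismatch goes unnoticed until the checked build is used.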

> +    tmp2 = get_address(x2, b2, d2);
> +    tmp = tcg_const_i32(r1);
> +    switch (op) {
> +    case 0x5: /* LXDB R1,D2(X2,B2) [RXE] */
> +        gen_helper_lxdb(tmp, tmp2);
> +        break;
> +    case 0x9: /* CEB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_ceb(cc, tmp, tmp2);
> +        break;
> +    case 0xa: /* AEB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_aeb(cc, tmp, tmp2);
> +        break;
> +    case 0xb: /* SEB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_seb(cc, tmp, tmp2);
> +        break;
> +    case 0xd: /* DEB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_deb(tmp, tmp2);
> +        break;
> +    case 0x10: /* TCEB   R1,D2(X2,B2)       [RXE] */
> +        gen_helper_tceb(cc, tmp, tmp2);
> +        break;
> +    case 0x11: /* TCDB   R1,D2(X2,B2)       [RXE] */
> +        gen_helper_tcdb(cc, tmp, tmp2);
> +        break;
> +    case 0x12: /* TCXB   R1,D2(X2,B2)       [RXE] */
> +        gen_helper_tcxb(cc, tmp, tmp2);
> +        break;
> +    case 0x17: /* MEEB   R1,D2(X2,B2)       [RXE] */
> +        gen_helper_meeb(tmp, tmp2);
> +        break;
> +    case 0x19: /* CDB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_cdb(cc, tmp, tmp2);
> +        break;
> +    case 0x1a: /* ADB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_adb(cc, tmp, tmp2);
> +        break;
> +    case 0x1b: /* SDB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_sdb(cc, tmp, tmp2);
> +        break;
> +    case 0x1c: /* MDB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_mdb(tmp, tmp2);
> +        break;
> +    case 0x1d: /* DDB    R1,D2(X2,B2)       [RXE] */
> +        gen_helper_ddb(tmp, tmp2);
> +        break;
> +    case 0x1e: /* MADB  R1,R3,D2(X2,B2) [RXF] */
> +        /* for RXF insns, r1 is R3 and r1b is R1 */
> +        tmp3 = tcg_const_i32(r1b);
> +        gen_helper_madb(tmp3, tmp2, tmp);

tmp3 should be freed after the helper.
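
The pattern, reduced to a standalone sketch with a hypothetical counting allocator (temp_new, temp_free and live_temps are made up for the example): a temp created inside one case arm is released in that same arm, so nothing is left allocated at the end of the function.

```c
#include <assert.h>

static int live_temps;   /* number of temps currently allocated */

static int temp_new(void)
{
    return live_temps++;
}

static void temp_free(int t)
{
    (void)t;
    live_temps--;
}

/* Models one case arm: allocate, use, and free before leaving. */
int emit_three_operand_op(void)
{
    int tmp3 = temp_new();
    /* the helper call that consumes tmp3 would be emitted here */
    temp_free(tmp3);
    return live_temps;
}

void leak_demo(void)
{
    assert(emit_three_operand_op() == 0);
}
```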

> +        break;
> +    default:
> +        LOG_DISAS("illegal ed operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    tcg_temp_free(tmp);
> +    tcg_temp_free(tmp2);
> +}
> +
> +static void disas_a5(DisasContext *s, int op, int r1, int i2)
> +{
> +    TCGv tmp = 0,tmp2 = 0;

Same: the variables should not be initialized to 0.

> +    uint64_t vtmp;
> +    LOG_DISAS("disas_a5: op 0x%x r1 %d i2 0x%x\n", op, r1, i2);
> +    switch (op) {
> +    case 0x0: /* IIHH     R1,I2     [RI] */
> +    case 0x1: /* IIHL     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        vtmp = i2;
> +        switch (op) {
> +        case 0x0: tcg_gen_andi_i64(tmp, tmp, 0x0000ffffffffffffULL); vtmp <<= 48; break;
> +        case 0x1: tcg_gen_andi_i64(tmp, tmp, 0xffff0000ffffffffULL); vtmp <<= 32; break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        tcg_gen_ori_i64(tmp, tmp, vtmp);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x4: /* NIHH     R1,I2     [RI] */
> +    case 0x8: /* OIHH     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        switch (op) {
> +        case 0x4:
> +            tmp2 = tcg_const_i64( (((uint64_t)i2) << 48) | 0x0000ffffffffffffULL);
> +            tcg_gen_and_i64(tmp, tmp, tmp2);
> +            break;
> +        case 0x8:
> +            tmp2 = tcg_const_i64(((uint64_t)i2) << 48);
> +            tcg_gen_or_i64(tmp, tmp, tmp2);
> +            break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp);
> +        tcg_gen_shri_i64(tmp2, tmp, 48);
> +        tcg_gen_trunc_i64_i32(tmp2, tmp2);

Wrong TCGv type.
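
For reference, this is the computation of the NIHH arm written on plain host integers, as a sketch only (the function name nihh and the field out-parameter are hypothetical). The 64-bit result and the 16-bit field used for the condition code are separate values of different widths, which is why the translator wants a separate 32-bit temp for the truncated value instead of reusing the 64-bit one:

```c
#include <assert.h>
#include <stdint.h>

/* AND the top 16 bits of reg with i2; report that field separately. */
static uint64_t nihh(uint64_t reg, uint16_t i2, uint32_t *field)
{
    uint64_t mask = ((uint64_t)i2 << 48) | 0x0000ffffffffffffULL;
    uint64_t res = reg & mask;

    *field = (uint32_t)(res >> 48);   /* 32-bit value used for the cc */
    return res;
}

void nihh_demo(void)
{
    uint32_t field = 0;
    uint64_t res = nihh(0xffff000000000001ULL, 0x00f0, &field);

    assert(res == 0x00f0000000000001ULL);
    assert(field == 0xf0);
}
```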

> +        set_cc_nz_u32(tmp2);
> +        break;
> +    case 0x5: /* NIHL     R1,I2     [RI] */
> +    case 0x9: /* OIHL     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        switch (op) {
> +        case 0x5:
> +            tmp2 = tcg_const_i64( (((uint64_t)i2) << 32) | 0xffff0000ffffffffULL);
> +            tcg_gen_and_i64(tmp, tmp, tmp2);
> +            break;
> +        case 0x9:
> +            tmp2 = tcg_const_i64(((uint64_t)i2) << 32);
> +            tcg_gen_or_i64(tmp, tmp, tmp2);
> +            break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp);
> +        tcg_gen_shri_i64(tmp2, tmp, 32);
> +        tcg_gen_trunc_i64_i32(tmp2, tmp2);

Wrong TCGv type.

> +        tcg_gen_andi_i32(tmp2, tmp2, 0xffff);
> +        set_cc_nz_u32(tmp2);
> +        break;
> +    case 0x6: /* NILH     R1,I2     [RI] */
> +    case 0xa: /* OILH     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        switch (op) {
> +        case 0x6:
> +            tmp2 = tcg_const_i64( (((uint64_t)i2) << 16) | 0xffffffff0000ffffULL);
> +            tcg_gen_and_i64(tmp, tmp, tmp2);
> +            break;
> +        case 0xa:
> +            tmp2 = tcg_const_i64(((uint64_t)i2) << 16);
> +            tcg_gen_or_i64(tmp, tmp, tmp2);
> +            break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp);
> +        tcg_gen_shri_i64(tmp2, tmp, 16);
> +        tcg_gen_trunc_i64_i32(tmp2, tmp2);
> +        tcg_gen_andi_i32(tmp2, tmp2, 0xffff);

Wrong TCGv type.

> +        set_cc_nz_u32(tmp2);
> +        break;
> +    case 0x7: /* NILL     R1,I2     [RI] */
> +    case 0xb: /* OILL     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        switch (op) {
> +        case 0x7:
> +            tmp2 = tcg_const_i64(i2 | 0xffffffffffff0000ULL);
> +            tcg_gen_and_i64(tmp, tmp, tmp2);
> +            break;
> +        case 0xb: 
> +            tmp2 = tcg_const_i64(i2);
> +            tcg_gen_or_i64(tmp, tmp, tmp2);
> +            break;
> +        default: tcg_abort(); break;
> +        }
> +        store_reg(r1, tmp);
> +        tcg_gen_trunc_i64_i32(tmp, tmp);
> +        tcg_gen_andi_i32(tmp, tmp, 0xffff);

Wrong TCGv type.

> +        set_cc_nz_u32(tmp);	/* signedness should not matter here */
> +        break;
> +    case 0xc: /* LLIHH     R1,I2     [RI] */
> +        tmp = tcg_const_i64( ((uint64_t)i2) << 48 );
> +        store_reg(r1, tmp);
> +        break;
> +    case 0xd: /* LLIHL     R1,I2     [RI] */
> +        tmp = tcg_const_i64( ((uint64_t)i2) << 32 );
> +        store_reg(r1, tmp);
> +        break;
> +    case 0xe: /* LLILH     R1,I2     [RI] */
> +        tmp = tcg_const_i64( ((uint64_t)i2) << 16 );
> +        store_reg(r1, tmp);
> +        break;
> +    case 0xf: /* LLILL     R1,I2     [RI] */
> +        tmp = tcg_const_i64(i2);
> +        store_reg(r1, tmp);
> +        break;
> +    default:
> +        LOG_DISAS("illegal a5 operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);

Comparison on TCGv type is not allowed.

> +}
> +
> +static void disas_a7(DisasContext *s, int op, int r1, int i2)
> +{
> +    TCGv tmp = 0,tmp2 = 0,tmp3 = 0;

Same: the variables should not be initialized to 0.

> +    LOG_DISAS("disas_a7: op 0x%x r1 %d i2 0x%x\n", op, r1, i2);
> +    switch (op) {
> +    case 0x0: /* TMLH or TMH     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        tcg_gen_shri_i64(tmp, tmp, 16);
> +        tmp2 = tcg_const_i32((uint16_t)i2);

Wrong TCGv type.

> +        gen_helper_tmxx(cc, tmp, tmp2);
> +        break;
> +    case 0x1: /* TMLL or TML     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        tmp2 = tcg_const_i32((uint16_t)i2);

Wrong TCGv type.

> +        gen_helper_tmxx(cc, tmp, tmp2);
> +        break;
> +    case 0x2: /* TMHH     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        tcg_gen_shri_i64(tmp, tmp, 48);
> +        tmp2 = tcg_const_i32((uint16_t)i2);

Wrong TCGv type.

> +        gen_helper_tmxx(cc, tmp, tmp2);
> +        break;
> +    case 0x3: /* TMHL     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        tcg_gen_shri_i64(tmp, tmp, 32);
> +        tmp2 = tcg_const_i32((uint16_t)i2);

Wrong TCGv type.

> +        gen_helper_tmxx(cc, tmp, tmp2);
> +        break;
> +    case 0x4: /* brc m1, i2 */
> +        /* FIXME: optimize m1 == 0xf (unconditional) case */
> +        gen_brc(r1, s->pc, i2 * 2);
> +        s->is_jmp = DISAS_JUMP;
> +        break;
> +    case 0x5: /* BRAS     R1,I2     [RI] */
> +        tmp = tcg_const_i64(s->pc + 4);
> +        store_reg(r1, tmp);
> +        tmp = tcg_const_i64(s->pc + i2 * 2);
> +        tcg_gen_st_i64(tmp, cpu_env, offsetof(CPUState, psw.addr));
> +        s->is_jmp = DISAS_JUMP;
> +        break;
> +    case 0x6: /* BRCT     R1,I2     [RI] */
> +        tmp = load_reg32(r1);
> +        tcg_gen_subi_i32(tmp, tmp, 1);
> +        store_reg32(r1, tmp);
> +        tmp2 = tcg_const_i64(s->pc);
> +        tmp3 = tcg_const_i32(i2 * 2);

Wrong TCGv type.

> +        gen_helper_brct(tmp, tmp2, tmp3);
> +        s->is_jmp = DISAS_JUMP;
> +        break;
> +    case 0x7: /* BRCTG     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        tcg_gen_subi_i64(tmp, tmp, 1);
> +        store_reg(r1, tmp);
> +        tmp2 = tcg_const_i64(s->pc);
> +        tmp3 = tcg_const_i32(i2 * 2);

Wrong TCGv type.

> +        gen_helper_brctg(tmp, tmp2, tmp3);
> +        s->is_jmp = DISAS_JUMP;
> +        break;
> +    case 0x8: /* lhi r1, i2 */
> +        tmp = tcg_const_i32(i2);

Wrong TCGv type.

> +        store_reg32(r1, tmp);
> +        break;
> +    case 0x9: /* lghi r1, i2 */
> +        tmp = tcg_const_i64(i2);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0xa: /* AHI     R1,I2     [RI] */
> +        tmp = load_reg32(r1);
> +        tmp3 = tcg_temp_new_i32();
> +        tcg_gen_addi_i32(tmp3, tmp, i2);
> +        store_reg32(r1, tmp3);
> +        tmp2 = tcg_const_i32(i2);

Wrong TCGv type.

> +        gen_helper_set_cc_add32(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0xb: /* aghi r1, i2 */
> +        tmp = load_reg(r1);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_addi_i64(tmp3, tmp, i2);
> +        store_reg(r1, tmp3);
> +        tmp2 = tcg_const_i64(i2);
> +        gen_set_cc_add64(tmp, tmp2, tmp3);
> +        break;
> +    case 0xc: /* MHI     R1,I2     [RI] */
> +        tmp = load_reg32(r1);
> +        tcg_gen_muli_i32(tmp, tmp, i2);

Wrong TCGv type.

> +        store_reg32(r1, tmp);
> +        break;
> +    case 0xd: /* MGHI     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        tcg_gen_muli_i64(tmp, tmp, i2);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0xe: /* CHI     R1,I2     [RI] */
> +        tmp = load_reg32(r1);
> +        cmp_s32c(tmp, i2);
> +        break;
> +    case 0xf: /* CGHI     R1,I2     [RI] */
> +        tmp = load_reg(r1);
> +        cmp_s64c(tmp, i2);
> +        break;
> +    default:
> +        LOG_DISAS("illegal a7 operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);

Comparison on TCGv type is not allowed.

> +}
> +
> +static void disas_b2(DisasContext *s, int op, int r1, int r2)
> +{
> +    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;

All three should be of type TCGv_i32 and should not be initialized to 0.

> +    LOG_DISAS("disas_b2: op 0x%x r1 %d r2 %d\n", op, r1, r2);
> +    switch (op) {
> +    case 0x22: /* IPM    R1               [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        gen_helper_ipm(cc, tmp);
> +        break;
> +    case 0x4e: /* SAR     R1,R2     [RRE] */
> +        tmp = load_reg32(r2);
> +        tcg_gen_st_i32(tmp, cpu_env, offsetof(CPUState, aregs[r1]));
> +        break;
> +    case 0x4f: /* EAR     R1,R2     [RRE] */
> +        tmp = tcg_temp_new_i32();
> +        tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUState, aregs[r2]));
> +        store_reg32(r1, tmp);
> +        break;
> +    case 0x52: /* MSR     R1,R2     [RRE] */
> +        tmp = load_reg32(r1);
> +        tmp2 = load_reg32(r2);
> +        tcg_gen_mul_i32(tmp, tmp, tmp2);
> +        store_reg32(r1, tmp);
> +        break;
> +    case 0x55: /* MVST     R1,R2     [RRE] */
> +        tmp = load_reg32(0);
> +        tmp2 = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r2);
> +        gen_helper_mvst(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0x5d: /* CLST     R1,R2     [RRE] */
> +        tmp = load_reg32(0);
> +        tmp2 = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r2);
> +        gen_helper_clst(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0x5e: /* SRST     R1,R2     [RRE] */
> +        tmp = load_reg32(0);
> +        tmp2 = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r2);
> +        gen_helper_srst(cc, tmp, tmp2, tmp3);
> +        break;
> +    default:
> +        LOG_DISAS("illegal b2 operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);

Comparison on TCGv type is not allowed.

> +}
> +
> +static void disas_b3(DisasContext *s, int op, int m3, int r1, int r2)
> +{
> +    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;

All three should be of type TCGv_i32 and should not be initialized to 0.

> +    LOG_DISAS("disas_b3: op 0x%x m3 0x%x r1 %d r2 %d\n", op, m3, r1, r2);
> +#define FP_HELPER(i) \
> +    tmp = tcg_const_i32(r1); \
> +    tmp2 = tcg_const_i32(r2); \
> +    gen_helper_ ## i (tmp, tmp2);
> +#define FP_HELPER_CC(i) \
> +    tmp = tcg_const_i32(r1); \
> +    tmp2 = tcg_const_i32(r2); \
> +    gen_helper_ ## i (cc, tmp, tmp2);
> +
> +    switch (op) {
> +    case 0x0: /* LPEBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(lpebr); break;
> +    case 0x2: /* LTEBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(ltebr); break;
> +    case 0x3: /* LCEBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(lcebr); break;
> +    case 0x4: /* LDEBR       R1,R2             [RRE] */
> +        FP_HELPER(ldebr); break;
> +    case 0x5: /* LXDBR       R1,R2             [RRE] */
> +        FP_HELPER(lxdbr); break;
> +    case 0x9: /* CEBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(cebr); break;
> +    case 0xa: /* AEBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(aebr); break;
> +    case 0xb: /* SEBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(sebr); break;
> +    case 0xd: /* DEBR        R1,R2             [RRE] */
> +        FP_HELPER(debr); break;
> +    case 0x10: /* LPDBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(lpdbr); break;
> +    case 0x12: /* LTDBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(ltdbr); break;
> +    case 0x13: /* LCDBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(lcdbr); break;
> +    case 0x15: /* SQBDR       R1,R2             [RRE] */
> +        FP_HELPER(sqdbr); break;
> +    case 0x17: /* MEEBR       R1,R2             [RRE] */
> +        FP_HELPER(meebr); break;
> +    case 0x19: /* CDBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(cdbr); break;
> +    case 0x1a: /* ADBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(adbr); break;
> +    case 0x1b: /* SDBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(sdbr); break;
> +    case 0x1c: /* MDBR        R1,R2             [RRE] */
> +        FP_HELPER(mdbr); break;
> +    case 0x1d: /* DDBR        R1,R2             [RRE] */
> +        FP_HELPER(ddbr); break;
> +    case 0xe: /* MAEBR  R1,R3,R2 [RRF] */
> +    case 0x1e: /* MADBR R1,R3,R2 [RRF] */
> +    case 0x1f: /* MSDBR R1,R3,R2 [RRF] */
> +        /* for RRF insns, m3 is R1, r1 is R3, and r2 is R2 */
> +        tmp = tcg_const_i32(m3);
> +        tmp2 = tcg_const_i32(r2);
> +        tmp3 = tcg_const_i32(r1);
> +        switch (op) {
> +        case 0xe: gen_helper_maebr(tmp, tmp3, tmp2); break;
> +        case 0x1e: gen_helper_madbr(tmp, tmp3, tmp2); break;
> +        case 0x1f: gen_helper_msdbr(tmp, tmp3, tmp2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0x40: /* LPXBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(lpxbr); break;
> +    case 0x42: /* LTXBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(ltxbr); break;
> +    case 0x43: /* LCXBR       R1,R2             [RRE] */
> +        FP_HELPER_CC(lcxbr); break;
> +    case 0x44: /* LEDBR       R1,R2             [RRE] */
> +        FP_HELPER(ledbr); break;
> +    case 0x45: /* LDXBR       R1,R2             [RRE] */
> +        FP_HELPER(ldxbr); break;
> +    case 0x46: /* LEXBR       R1,R2             [RRE] */
> +        FP_HELPER(lexbr); break;
> +    case 0x49: /* CXBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(cxbr); break;
> +    case 0x4a: /* AXBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(axbr); break;
> +    case 0x4b: /* SXBR        R1,R2             [RRE] */
> +        FP_HELPER_CC(sxbr); break;
> +    case 0x4c: /* MXBR        R1,R2             [RRE] */
> +        FP_HELPER(mxbr); break;
> +    case 0x4d: /* DXBR        R1,R2             [RRE] */
> +        FP_HELPER(dxbr); break;
> +    case 0x65: /* LXR         R1,R2             [RRE] */
> +        tmp = load_freg(r2);
> +        store_freg(r1, tmp);
> +        tmp = load_freg(r2 + 2);
> +        store_freg(r1 + 2, tmp);
> +        break;
> +    case 0x74: /* LZER        R1                [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        gen_helper_lzer(tmp);
> +        break;
> +    case 0x75: /* LZDR        R1                [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        gen_helper_lzdr(tmp);
> +        break;
> +    case 0x76: /* LZXR        R1                [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        gen_helper_lzxr(tmp);
> +        break;
> +    case 0x84: /* SFPC        R1                [RRE] */
> +        tmp = load_reg32(r1);
> +        tcg_gen_st_i32(tmp, cpu_env, offsetof(CPUState, fpc));
> +        break;
> +    case 0x8c: /* EFPC        R1                [RRE] */
> +        tmp = tcg_temp_new_i32();
> +        tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUState, fpc));
> +        store_reg32(r1, tmp);
> +        break;
> +    case 0x94: /* CEFBR       R1,R2             [RRE] */
> +    case 0x95: /* CDFBR       R1,R2             [RRE] */
> +    case 0x96: /* CXFBR       R1,R2             [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = load_reg32(r2);
> +        switch (op) {
> +        case 0x94: gen_helper_cefbr(tmp, tmp2); break;
> +        case 0x95: gen_helper_cdfbr(tmp, tmp2); break;
> +        case 0x96: gen_helper_cxfbr(tmp, tmp2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0x98: /* CFEBR       R1,R2             [RRE] */
> +    case 0x99: /* CFDBR	      R1,R2             [RRE] */
> +    case 0x9a: /* CFXBR       R1,R2             [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = tcg_const_i32(r2);
> +        tmp3 = tcg_const_i32(m3);
> +        switch (op) {
> +        case 0x98: gen_helper_cfebr(cc, tmp, tmp2, tmp3); break;
> +        case 0x99: gen_helper_cfdbr(cc, tmp, tmp2, tmp3); break;
> +        case 0x9a: gen_helper_cfxbr(cc, tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0xa4: /* CEGBR       R1,R2             [RRE] */
> +    case 0xa5: /* CDGBR       R1,R2             [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = load_reg(r2);
> +        switch (op) {
> +        case 0xa4: gen_helper_cegbr(tmp, tmp2); break;
> +        case 0xa5: gen_helper_cdgbr(tmp, tmp2); break;
> +        default: tcg_abort();
> +        }
> +        break;
> +    case 0xa6: /* CXGBR       R1,R2             [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = load_reg(r2);
> +        gen_helper_cxgbr(tmp, tmp2);
> +        break;
> +    case 0xa8: /* CGEBR       R1,R2             [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = tcg_const_i32(r2);
> +        tmp3 = tcg_const_i32(m3);
> +        gen_helper_cgebr(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0xa9: /* CGDBR       R1,R2             [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = tcg_const_i32(r2);
> +        tmp3 = tcg_const_i32(m3);
> +        gen_helper_cgdbr(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0xaa: /* CGXBR       R1,R2             [RRE] */
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = tcg_const_i32(r2);
> +        tmp3 = tcg_const_i32(m3);
> +        gen_helper_cgxbr(cc, tmp, tmp2, tmp3);
> +        break;
> +    default:
> +        LOG_DISAS("illegal b3 operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);

Comparison on TCGv type is not allowed.

> +}
> +
> +static void disas_b9(DisasContext *s, int op, int r1, int r2)
> +{
> +    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;

The variables should not be initialized to 0.

> +    LOG_DISAS("disas_b9: op 0x%x r1 %d r2 %d\n", op, r1, r2);
> +    switch (op) {
> +    case 0: /* LPGR     R1,R2     [RRE] */
> +    case 0x10: /* LPGFR R1,R2 [RRE] */
> +        if (op == 0) {
> +            tmp2 = load_reg(r2);
> +        }
> +        else {
> +            tmp2 = load_reg32(r2);
> +            tcg_gen_ext32s_i64(tmp2, tmp2);
> +        }
> +        tmp = tcg_const_i32(r1);

Wrong TCG type.

> +        gen_helper_abs_i64(cc, tmp, tmp2);
> +        break;
> +    case 1: /* LNGR     R1,R2     [RRE] */
> +        tmp2 = load_reg(r2);
> +        tmp = tcg_const_i32(r1);

Wrong TCG type.

> +        gen_helper_nabs_i64(cc, tmp, tmp2);
> +        break;
> +    case 2: /* LTGR R1,R2 [RRE] */
> +        tmp = load_reg(r2);
> +        if (r1 != r2) store_reg(r1, tmp);

Coding style: the body of the if needs braces and a line of its own.
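
A sketch of the braced form, assuming the usual QEMU CODING_STYLE rule that every conditional body gets braces. The function is a standalone stand-in operating on a plain array, not translator code:

```c
#include <assert.h>

/* Copies regs[r2] into regs[r1] unless they are the same register. */
static int copy_if_different(int *regs, int r1, int r2)
{
    if (r1 != r2) {
        regs[r1] = regs[r2];
    }
    return regs[r1];
}

void style_demo(void)
{
    int regs[2] = { 1, 2 };

    assert(copy_if_different(regs, 0, 1) == 2);
    assert(copy_if_different(regs, 1, 1) == 2);
}
```
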
> +        set_cc_s64(tmp);
> +        break;
> +    case 3: /* LCGR     R1,R2     [RRE] */
> +    case 0x13: /* LCGFR    R1,R2     [RRE] */
> +        if (op == 0x13) {
> +            tmp = load_reg32(r2);

Wrong TCG type.

> +            tcg_gen_ext32s_i64(tmp, tmp);
> +        }
> +        else {
> +            tmp = load_reg(r2);
> +        }
> +        tcg_gen_neg_i64(tmp, tmp);
> +        store_reg(r1, tmp);
> +        gen_helper_set_cc_comp_s64(cc, tmp);
> +        break;
> +    case 4: /* LGR R1,R2 [RRE] */
> +        tmp = load_reg(r2);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x6: /* LGBR R1,R2 [RRE] */
> +        tmp2 = load_reg(r2);
> +        tcg_gen_ext8s_i64(tmp2, tmp2);
> +        store_reg(r1, tmp2);
> +        break;
> +    case 8: /* AGR     R1,R2     [RRE] */
> +    case 0xa: /* ALGR     R1,R2     [RRE] */
> +        tmp = load_reg(r1);
> +        tmp2 = load_reg(r2);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_add_i64(tmp3, tmp, tmp2);
> +        store_reg(r1, tmp3);
> +        switch (op) {
> +        case 0x8: gen_set_cc_add64(tmp, tmp2, tmp3); break;
> +        case 0xa: gen_helper_set_cc_addu64(cc, tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 9: /* SGR     R1,R2     [RRE] */
> +    case 0xb: /* SLGR     R1,R2     [RRE] */
> +    case 0x1b: /* SLGFR     R1,R2     [RRE] */
> +    case 0x19: /* SGFR     R1,R2     [RRE] */
> +        tmp = load_reg(r1);
> +        switch (op) {
> +        case 0x1b: case 0x19:
> +            tmp2 = load_reg32(r2);
> +            if (op == 0x19) tcg_gen_ext32s_i64(tmp2, tmp2);
> +            else tcg_gen_ext32u_i64(tmp2, tmp2);

Coding style: both branches need braces and lines of their own.

> +            break;
> +        default:
> +            tmp2 = load_reg(r2);
> +            break;
> +        }
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_sub_i64(tmp3, tmp, tmp2);
> +        store_reg(r1, tmp3);
> +        switch (op) {
> +        case 9: case 0x19: gen_helper_set_cc_sub64(cc, tmp,tmp2,tmp3); break;
> +        case 0xb: case 0x1b: gen_helper_set_cc_subu64(cc, tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0xc: /* MSGR      R1,R2     [RRE] */
> +    case 0x1c: /* MSGFR      R1,R2     [RRE] */
> +        tmp = load_reg(r1);
> +        tmp2 = load_reg(r2);
> +        if (op == 0x1c) tcg_gen_ext32s_i64(tmp2, tmp2);
> +        tcg_gen_mul_i64(tmp, tmp, tmp2);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0xd: /* DSGR      R1,R2     [RRE] */
> +    case 0x1d: /* DSGFR      R1,R2     [RRE] */
> +        tmp = load_reg(r1 + 1);
> +        if (op == 0xd) {
> +            tmp2 = load_reg(r2);
> +        }
> +        else {
> +            tmp2 = load_reg32(r2);
> +            tcg_gen_ext32s_i64(tmp2, tmp2);
> +        }
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_div_i64(tmp3, tmp, tmp2);
> +        store_reg(r1 + 1, tmp3);
> +        tcg_gen_rem_i64(tmp3, tmp, tmp2);
> +        store_reg(r1, tmp3);
> +        break;
> +    case 0x14: /* LGFR     R1,R2     [RRE] */
> +        tmp = load_reg32(r2);
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_ext32s_i64(tmp2, tmp);
> +        store_reg(r1, tmp2);
> +        break;
> +    case 0x16: /* LLGFR      R1,R2     [RRE] */
> +        tmp = load_reg32(r2);
> +        tcg_gen_ext32u_i64(tmp, tmp);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x17: /* LLGTR      R1,R2     [RRE] */
> +        tmp = load_reg32(r2);
> +        tcg_gen_andi_i32(tmp, tmp, 0x7fffffffUL);
Wrong TCG type.
> +        tcg_gen_ext32u_i64(tmp, tmp);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x18: /* AGFR     R1,R2     [RRE] */
> +    case 0x1a: /* ALGFR     R1,R2     [RRE] */
> +        tmp2 = load_reg32(r2);
> +        switch (op) {
> +        case 0x18: tcg_gen_ext32s_i64(tmp2, tmp2); break;
> +        case 0x1a: tcg_gen_ext32u_i64(tmp2, tmp2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        tmp = load_reg(r1);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_add_i64(tmp3, tmp, tmp2);
> +        store_reg(r1, tmp3);
> +        switch (op) {
> +        case 0x18: gen_set_cc_add64(tmp, tmp2, tmp3); break;
> +        case 0x1a: gen_helper_set_cc_addu64(cc, tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    case 0x20: /* CGR     R1,R2     [RRE] */
> +    case 0x30: /* CGFR     R1,R2     [RRE] */
> +        tmp2 = load_reg(r2);
> +        if (op == 0x30) tcg_gen_ext32s_i64(tmp2, tmp2);
> +        tmp = load_reg(r1);
> +        cmp_s64(tmp, tmp2);
> +        break;
> +    case 0x21: /* CLGR     R1,R2     [RRE] */
> +    case 0x31: /* CLGFR    R1,R2     [RRE] */
> +        tmp2 = load_reg(r2);
> +        if (op == 0x31) tcg_gen_ext32u_i64(tmp2, tmp2);
> +        tmp = load_reg(r1);
> +        cmp_u64(tmp, tmp2);
> +        break;
> +    case 0x26: /* LBR R1,R2 [RRE] */
> +        tmp2 = load_reg32(r2);
> +        tcg_gen_ext8s_i32(tmp2, tmp2);
Wrong TCG type.
> +        store_reg32(r1, tmp2);
> +        break;
> +    case 0x27: /* LHR R1,R2 [RRE] */
> +        tmp2 = load_reg32(r2);
> +        tcg_gen_ext16s_i32(tmp2, tmp2);
> +        store_reg32(r1, tmp2);
> +        break;
> +    case 0x80: /* NGR R1,R2 [RRE] */
> +    case 0x81: /* OGR R1,R2 [RRE] */
> +    case 0x82: /* XGR R1,R2 [RRE] */
> +        tmp = load_reg(r1);
> +        tmp2 = load_reg(r2);
> +        switch (op) {
> +        case 0x80: tcg_gen_and_i64(tmp, tmp, tmp2); break;
> +        case 0x81: tcg_gen_or_i64(tmp, tmp, tmp2); break;
> +        case 0x82: tcg_gen_xor_i64(tmp, tmp, tmp2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp);
> +        set_cc_nz_u64(tmp);
> +        break;
> +    case 0x83: /* FLOGR R1,R2 [RRE] */
> +        tmp2 = load_reg(r2);
> +        tmp = tcg_const_i32(r1);
Wrong TCG type.
> +        gen_helper_flogr(cc, tmp, tmp2);
> +        break;
> +    case 0x84: /* LLGCR R1,R2 [RRE] */
> +        tmp = load_reg(r2);
> +        tcg_gen_andi_i64(tmp, tmp, 0xff);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x85: /* LLGHR R1,R2 [RRE] */
> +        tmp = load_reg(r2);
> +        tcg_gen_andi_i64(tmp, tmp, 0xffff);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x87: /* DLGR      R1,R2     [RRE] */
> +        tmp = tcg_const_i32(r1);
Wrong TCG type.
> +        tmp2 = load_reg(r2);
> +        gen_helper_dlg(tmp, tmp2);
> +        break;
> +    case 0x88: /* ALCGR     R1,R2     [RRE] */
> +        tmp = load_reg(r1);
> +        tmp2 = load_reg(r2);
> +        tmp3 = tcg_temp_new_i64();
> +        tcg_gen_shri_i64(tmp3, cc, 1);
> +        tcg_gen_andi_i64(tmp3, tmp3, 1);
> +        tcg_gen_add_i64(tmp3, tmp2, tmp3);
> +        tcg_gen_add_i64(tmp3, tmp, tmp3);
> +        store_reg(r1, tmp3);
> +        gen_helper_set_cc_addc_u64(cc, tmp, tmp2, tmp3);
> +        break;
> +    case 0x89: /* SLBGR   R1,R2     [RRE] */
> +        tmp = load_reg(r1);
> +        tmp2 = load_reg(r2);
> +        tmp3 = tcg_const_i32(r1);
Wrong TCG type.
> +        gen_helper_slbg(cc, cc, tmp3, tmp, tmp2);
> +        break;
> +    case 0x94: /* LLCR R1,R2 [RRE] */
> +        tmp = load_reg32(r2);
> +        tcg_gen_andi_i32(tmp, tmp, 0xff);
> +        store_reg32(r1, tmp);
> +        break;
> +    case 0x95: /* LLHR R1,R2 [RRE] */
> +        tmp = load_reg32(r2);
> +        tcg_gen_andi_i32(tmp, tmp, 0xffff);
Wrong TCG type.
> +        store_reg32(r1, tmp);
> +        break;
> +    case 0x98: /* ALCR    R1,R2     [RRE] */
> +        tmp = tcg_const_i32(r1);
Wrong TCG type.
> +        tmp2 = load_reg32(r2);
> +        gen_helper_addc_u32(cc, cc, tmp, tmp2);
> +        break;
> +    case 0x99: /* SLBR    R1,R2     [RRE] */
> +        tmp = load_reg32(r1);
> +        tmp2 = load_reg32(r2);
> +        tmp3 = tcg_const_i32(r1);
Wrong TCG type.
> +        gen_helper_slb(cc, cc, tmp3, tmp, tmp2);
> +        break;
> +    default:
> +        LOG_DISAS("illegal b9 operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);

Comparison on TCGv type is not allowed.
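
The reason, roughly: a temporary handle is meant to be opaque, and depending on build options it may not even be a scalar, so `if (tmp)` is not guaranteed to compile or to mean "was allocated". The usual pattern is an explicit "unused" marker plus a predicate. The sketch below is self-contained and uses hypothetical names (`Handle`, `handle_unused`, `handle_free_if_used`); if memory serves, tcg/tcg.h offers a TCGV_UNUSED family of macros for the same purpose, but check the header rather than rely on this note.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opaque handle; callers must not inspect idx directly. */
typedef struct {
    int idx;
} Handle;

#define HANDLE_UNUSED_IDX (-1)

static int live_handles; /* number of currently allocated handles */

/* Explicit "not allocated" marker instead of initializing to 0. */
static Handle handle_unused(void)
{
    Handle h = { HANDLE_UNUSED_IDX };
    return h;
}

static bool handle_is_unused(Handle h)
{
    return h.idx == HANDLE_UNUSED_IDX;
}

static Handle handle_new(void)
{
    Handle h = { live_handles++ };
    return h;
}

static void handle_free(Handle h)
{
    assert(!handle_is_unused(h));
    live_handles--;
}

/* Epilogue helper: replaces "if (tmp) free(tmp)". */
static void handle_free_if_used(Handle h)
{
    if (!handle_is_unused(h)) {
        handle_free(h);
    }
}
```

A function would then start with `Handle tmp = handle_unused();` and end with `handle_free_if_used(tmp);`.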

> +}
> +
> +static void disas_c0(DisasContext *s, int op, int r1, int i2)
> +{
> +    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;

The variables should not be initialized to 0.

> +    LOG_DISAS("disas_c0: op 0x%x r1 %d i2 %d\n", op, r1, i2);
> +    uint64_t target = s->pc + i2 * 2;
> +    /* FIXME: huh? */ target &= 0xffffffff;

That should be fixed before a merge.
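
Whatever the mask is working around, the offset arithmetic itself can be done in 64 bits so that no truncation is needed. Note that `i2 * 2` is evaluated in int and can overflow for large immediates. A minimal sketch, assuming i2 is the signed 32-bit halfword count from the RIL format; `ril_target` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Relative target for RIL-format branches: pc + 2 * i2, computed in
   64 bits. The multiply is done on int64_t so it cannot overflow, and
   the final add wraps modulo 2^64 like an address calculation. */
static uint64_t ril_target(uint64_t pc, int32_t i2)
{
    int64_t offset = (int64_t)i2 * 2;

    return pc + (uint64_t)offset;
}
```

For example, a pc of 0x1000 with i2 = -2 gives 0xffc, and addresses above 4 GiB are left intact.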

> +    switch (op) {
> +    case 0: /* larl r1, i2 */
> +        tmp = tcg_const_i64(target);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x1: /* LGFI R1,I2 [RIL] */
> +        tmp = tcg_const_i64((int64_t)i2);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0x4: /* BRCL     M1,I2     [RIL] */
> +        tmp = tcg_const_i32(r1); /* aka m1 */
Wrong TCG type.
> +        tmp2 = tcg_const_i64(s->pc);
> +        tmp3 = tcg_const_i64(i2 * 2);
> +        gen_helper_brcl(cc, tmp, tmp2, tmp3);
> +        s->is_jmp = DISAS_JUMP;
> +        break;
> +    case 0x5: /* brasl r1, i2 */
> +        tmp = tcg_const_i64(s->pc + 6);
> +        store_reg(r1, tmp);
> +        tmp = tcg_const_i64(target);
> +        tcg_gen_st_i64(tmp, cpu_env, offsetof(CPUState, psw.addr));
> +        s->is_jmp = DISAS_JUMP;
> +        break;
> +    case 0x7: /* XILF R1,I2 [RIL] */
> +    case 0xb: /* NILF R1,I2 [RIL] */
> +    case 0xd: /* OILF R1,I2 [RIL] */
> +        tmp = load_reg32(r1);
> +        switch (op) {
> +        case 0x7: tcg_gen_xori_i32(tmp, tmp, (uint32_t)i2); break;
> +        case 0xb: tcg_gen_andi_i32(tmp, tmp, (uint32_t)i2); break;
> +        case 0xd: tcg_gen_ori_i32(tmp, tmp, (uint32_t)i2); break;
Wrong TCG type.
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp);
> +        tcg_gen_trunc_i64_i32(tmp, tmp);
Wrong TCG type.
> +        set_cc_nz_u32(tmp);
> +        break;
> +    case 0x9: /* IILF R1,I2 [RIL] */
> +        tmp = tcg_const_i32((uint32_t)i2);
Wrong TCG type.
> +        store_reg32(r1, tmp);
> +        break;
> +    case 0xa: /* NIHF R1,I2 [RIL] */
> +        tmp = load_reg(r1);
> +        switch (op) {
> +        case 0xa: tcg_gen_andi_i64(tmp, tmp, (((uint64_t)((uint32_t)i2)) << 32) | 0xffffffffULL); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp);
> +        tcg_gen_shr_i64(tmp, tmp, 32);
> +        tcg_gen_trunc_i64_i32(tmp, tmp);
Wrong TCG type.
> +        set_cc_nz_u32(tmp);
> +        break;
> +    case 0xe: /* LLIHF R1,I2 [RIL] */
> +        tmp = tcg_const_i64(((uint64_t)(uint32_t)i2) << 32);
> +        store_reg(r1, tmp);
> +        break;
> +    case 0xf: /* LLILF R1,I2 [RIL] */
> +        tmp = tcg_const_i64((uint32_t)i2);
> +        store_reg(r1, tmp);
> +        break;
> +    default:
> +        LOG_DISAS("illegal c0 operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);

Comparison on TCGv type is not allowed.

> +}
> +
> +static void disas_c2(DisasContext *s, int op, int r1, int i2)
> +{
> +    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
The variables should not be initialized to 0.

> +    switch (op) {
> +    case 0x4: /* SLGFI R1,I2 [RIL] */
> +    case 0xa: /* ALGFI R1,I2 [RIL] */
> +        tmp = load_reg(r1);
> +        tmp2 = tcg_const_i64((uint64_t)(uint32_t)i2);
> +        tmp3 = tcg_temp_new_i64();
> +        switch (op) {
> +        case 0x4:
> +            tcg_gen_sub_i64(tmp3, tmp, tmp2);
> +            gen_helper_set_cc_subu64(cc, tmp, tmp2, tmp3);
> +            break;
> +        case 0xa:
> +            tcg_gen_add_i64(tmp3, tmp, tmp2);
> +            gen_helper_set_cc_addu64(cc, tmp, tmp2, tmp3);
> +            break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg(r1, tmp3);
> +        break;
> +    case 0x5: /* SLFI R1,I2 [RIL] */
> +    case 0xb: /* ALFI R1,I2 [RIL] */
> +        tmp = load_reg32(r1);
> +        tmp2 = tcg_const_i32(i2);
> +        tmp3 = tcg_temp_new_i32();
Wrong TCG type.
> +        switch (op) {
> +        case 0x5:
> +            tcg_gen_sub_i32(tmp3, tmp, tmp2);
Wrong TCG type.
> +            gen_helper_set_cc_subu32(cc, tmp, tmp2, tmp3);
> +            break;
> +        case 0xb:
> +            tcg_gen_add_i32(tmp3, tmp, tmp2);
Wrong TCG type.
> +            gen_helper_set_cc_addu32(cc, tmp, tmp2, tmp3);
> +            break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp3);
> +        break;
> +    case 0xc: /* CGFI R1,I2 [RIL] */
> +        tmp = load_reg(r1);
> +        cmp_s64c(tmp, (int64_t)i2);
> +        break;
> +    case 0xe: /* CLGFI R1,I2 [RIL] */
> +        tmp = load_reg(r1);
> +        cmp_u64c(tmp, (uint64_t)(uint32_t)i2);
> +        break;
> +    case 0xd: /* CFI R1,I2 [RIL] */
> +    case 0xf: /* CLFI R1,I2 [RIL] */
> +        tmp = load_reg32(r1);
> +        switch (op) {
> +        case 0xd: cmp_s32c(tmp, i2); break;
> +        case 0xf: cmp_u32c(tmp, i2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        break;
> +    default:
> +        LOG_DISAS("illegal c2 operation 0x%x\n", op);
> +        gen_illegal_opcode(s);
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);

Comparison on TCGv type is not allowed.

> +}
> +
> +static inline uint64_t ld_code2(uint64_t pc) { return (uint64_t)lduw_code(pc); }

Coding style.

> +static inline uint64_t ld_code4(uint64_t pc) { return (uint64_t)ldl_code(pc); }

Coding style.

> +static inline uint64_t ld_code6(uint64_t pc)
> +{
> +    uint64_t opc;
> +    opc = (uint64_t)lduw_code(pc) << 32;
> +    opc |= (uint64_t)(unsigned int)ldl_code(pc+2);
> +    return opc;
> +}
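
For reference, the composition done by ld_code6() and the field extraction used later for the 0xc0/0xc2 opcodes can be modelled on a plain byte buffer. This is only a sketch: `read_be16`, `read_be32`, `read_insn6`, `RilFields` and `decode_ril` are hypothetical names, and the readers stand in for the lduw_code()/ldl_code() accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical big-endian readers over a byte buffer. */
static uint32_t read_be16(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Same layout as ld_code6(): first halfword in bits 47..32,
   the remaining four bytes in bits 31..0. */
static uint64_t read_insn6(const uint8_t *p)
{
    uint64_t opc;

    assert(p != NULL);
    opc = (uint64_t)read_be16(p) << 32;
    opc |= (uint64_t)read_be32(p + 2);
    return opc;
}

typedef struct {
    int opc;    /* first opcode byte */
    int r1;     /* register (or mask) field */
    int op;     /* second opcode nibble */
    int32_t i2; /* signed 32-bit immediate */
} RilFields;

static RilFields decode_ril(uint64_t insn)
{
    RilFields f;

    f.opc = (int)((insn >> 40) & 0xff);
    f.r1 = (int)((insn >> 36) & 0xf);
    f.op = (int)((insn >> 32) & 0xf);
    /* Conversion to int32_t assumes the usual two's-complement wrap. */
    f.i2 = (int32_t)(uint32_t)insn;
    return f;
}
```

Decoding the bytes c0 15 ff ff ff fe this way yields opcode 0xc0, r1 = 1, op = 5 and i2 = -2.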
> +
> +static void disas_s390_insn(CPUState *env, DisasContext *s)
> +{
> +    TCGv tmp = 0,tmp2 = 0,tmp3 = 0;

The variables should not be initialized to 0.

> +    unsigned char opc;
> +    uint64_t insn;
> +    int op, r1, r2, r3, d1, d2, x2, b1, b2, i, i2, r1b;
> +    TCGv vl, vd1, vd2, vb;
> +    
> +    opc = ldub_code(s->pc);
> +    LOG_DISAS("opc 0x%x\n", opc);
> +
> +#define FETCH_DECODE_RR \
> +    insn = ld_code2(s->pc); \
> +    DEBUGINSN \
> +    r1 = (insn >> 4) & 0xf; \
> +    r2 = insn & 0xf;
> +
> +#define FETCH_DECODE_RX \
> +    insn = ld_code4(s->pc); \
> +    DEBUGINSN \
> +    r1 = (insn >> 20) & 0xf; \
> +    x2 = (insn >> 16) & 0xf; \
> +    b2 = (insn >> 12) & 0xf; \
> +    d2 = insn & 0xfff; \
> +    tmp = get_address(x2, b2, d2);
> +
> +#define FETCH_DECODE_RS \
> +    insn = ld_code4(s->pc); \
> +    DEBUGINSN \
> +    r1 = (insn >> 20) & 0xf; \
> +    r3 = (insn >> 16) & 0xf; /* aka m3 */ \
> +    b2 = (insn >> 12) & 0xf; \
> +    d2 = insn & 0xfff;
> +        
> +#define FETCH_DECODE_SI \
> +    insn = ld_code4(s->pc); \
> +    i2 = (insn >> 16) & 0xff; \
> +    b1 = (insn >> 12) & 0xf; \
> +    d1 = insn & 0xfff; \
> +    tmp = get_address(0, b1, d1);
> +
> +    switch (opc) {
> +    case 0x7: /* BCR    M1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        if (r2) {
> +            gen_bcr(r1, r2, s->pc);
> +            s->is_jmp = DISAS_JUMP;
> +        }
> +        else {
> +            /* FIXME: "serialization and checkpoint-synchronization function"? */
> +        }
> +        s->pc += 2;
> +        break;
> +    case 0xa: /* SVC    I         [RR] */
> +        insn = ld_code2(s->pc);
> +        DEBUGINSN
> +        i = insn & 0xff;
> +        tmp = tcg_const_i64(s->pc);
> +        tcg_gen_st_i64(tmp, cpu_env, offsetof(CPUState, psw.addr));
> +        s->is_jmp = DISAS_SVC;
> +        s->pc += 2;
> +        break;
> +    case 0xd: /* BASR   R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp = tcg_const_i64(s->pc + 2);
> +        store_reg(r1, tmp);
> +        if (r2) {
> +            tmp2 = load_reg(r2);
> +            tcg_gen_st_i64(tmp2, cpu_env, offsetof(CPUState, psw.addr));
> +            s->is_jmp = DISAS_JUMP;
> +        }
> +        s->pc += 2;
> +        break;
> +    case 0x10: /* LPR    R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp2 = load_reg32(r2);
> +        tmp = tcg_const_i32(r1);
> +        gen_helper_abs_i32(cc, tmp, tmp2);

Wrong TCGv type.

> +        s->pc += 2;
> +        break;
> +    case 0x11: /* LNR    R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp2 = load_reg32(r2);
> +        tmp = tcg_const_i32(r1);
> +        gen_helper_nabs_i32(cc, tmp, tmp2);

Wrong TCGv type.

> +        s->pc += 2;
> +        break;
> +    case 0x12: /* LTR    R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp = load_reg32(r2);
> +        if (r1 != r2) store_reg32(r1, tmp);
> +        set_cc_s32(tmp);
> +        s->pc += 2;
> +        break;
> +    case 0x13: /* LCR    R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp = load_reg32(r2);
> +        tcg_gen_neg_i32(tmp, tmp);
> +        store_reg32(r1, tmp);
> +        gen_helper_set_cc_comp_s32(cc, tmp);

Wrong TCGv type.

> +        s->pc += 2;
> +        break;
> +    case 0x14: /* NR     R1,R2     [RR] */
> +    case 0x16: /* OR     R1,R2     [RR] */
> +    case 0x17: /* XR     R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp2 = load_reg32(r2);
> +        tmp = load_reg32(r1);
> +        switch (opc) {
> +        case 0x14: tcg_gen_and_i32(tmp, tmp, tmp2); break;
> +        case 0x16: tcg_gen_or_i32(tmp, tmp, tmp2); break;
> +        case 0x17: tcg_gen_xor_i32(tmp, tmp, tmp2); break;

Wrong TCGv type.

> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp);
> +        set_cc_nz_u32(tmp);
> +        s->pc += 2;
> +        break;
> +    case 0x18: /* LR     R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp = load_reg32(r2);
> +        store_reg32(r1, tmp);
> +        s->pc += 2;
> +        break;
> +    case 0x15: /* CLR    R1,R2     [RR] */
> +    case 0x19: /* CR     R1,R2     [RR] */ 
> +        FETCH_DECODE_RR
> +        tmp = load_reg32(r1);
> +        tmp2 = load_reg32(r2);
> +        switch (opc) {
> +        case 0x15: cmp_u32(tmp, tmp2); break;
> +        case 0x19: cmp_s32(tmp, tmp2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        s->pc += 2;
> +        break;
> +    case 0x1a: /* AR     R1,R2     [RR] */
> +    case 0x1e: /* ALR    R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp = load_reg32(r1);
> +        tmp2 = load_reg32(r2);
> +        tmp3 = tcg_temp_new_i32();
> +        tcg_gen_add_i32(tmp3, tmp, tmp2);
> +        store_reg32(r1, tmp3);

Wrong TCGv type.

> +        switch (opc) {
> +        case 0x1a: gen_helper_set_cc_add32(cc, tmp, tmp2, tmp3); break;
> +        case 0x1e: gen_helper_set_cc_addu32(cc, tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        s->pc += 2;
> +        break;
> +    case 0x1b: /* SR     R1,R2     [RR] */
> +    case 0x1f: /* SLR    R1,R2     [RR] */
> +        FETCH_DECODE_RR
> +        tmp = load_reg32(r1);
> +        tmp2 = load_reg32(r2);
> +        tmp3 = tcg_temp_new_i32();
> +        tcg_gen_sub_i32(tmp3, tmp, tmp2);
> +        store_reg32(r1, tmp3);

Wrong TCGv type.

> +        switch (opc) {
> +        case 0x1b: gen_helper_set_cc_sub32(cc, tmp, tmp2, tmp3); break;
> +        case 0x1f: gen_helper_set_cc_subu32(cc, tmp, tmp2, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        s->pc += 2;
> +        break;
> +    case 0x28: /* LDR    R1,R2               [RR] */
> +        FETCH_DECODE_RR
> +        tmp = load_freg(r2);
> +        store_freg(r1, tmp);
> +        s->pc += 2;
> +        break;
> +    case 0x38: /* LER    R1,R2               [RR] */
> +        FETCH_DECODE_RR
> +        tmp = load_freg32(r2);
> +        store_freg32(r1, tmp);
> +        s->pc += 2;
> +        break;
> +    case 0x40: /* STH    R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = load_reg32(r1);
> +        tcg_gen_qemu_st16(tmp2, tmp, 1);
> +        s->pc += 4;
> +        break;
> +    case 0x41:	/* la */
> +        FETCH_DECODE_RX
> +        store_reg(r1, tmp); /* FIXME: 31/24-bit addressing */
> +        s->pc += 4;
> +        break;
> +    case 0x42: /* STC    R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = load_reg32(r1);
> +        tcg_gen_qemu_st8(tmp2, tmp, 1);
> +        s->pc += 4;
> +        break;
> +    case 0x43: /* IC     R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
> +        store_reg8(r1, tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x44: /* EX     R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = load_reg(r1);
> +        tmp3 = tcg_const_i64(s->pc + 4);
> +        gen_helper_ex(cc, cc, tmp2, tmp, tmp3);
> +        s->pc += 4;
> +        break;
> +    case 0x47: /* BC     M1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        /* FIXME: optimize m1 == 0xf (unconditional) case */
> +        tmp2 = tcg_const_i32(r1); /* aka m1 */

Wrong TCGv type.

> +        tmp3 = tcg_const_i64(s->pc);
> +        gen_helper_bc(cc, tmp2, tmp, tmp3);
> +        s->is_jmp = DISAS_JUMP;
> +        s->pc += 4;
> +        break;
> +    case 0x48: /* LH     R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
> +        store_reg32(r1, tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x49: /* CH     R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
> +        tmp = load_reg32(r1);
> +        cmp_s32(tmp, tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x4a: /* AH     R1,D2(X2,B2)     [RX] */
> +    case 0x4b: /* SH     R1,D2(X2,B2)     [RX] */
> +    case 0x4c: /* MH     R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
> +        tmp = load_reg32(r1);
> +        tmp3 = tcg_temp_new_i32();
> +        switch (opc) {
> +        case 0x4a:
> +            tcg_gen_add_i32(tmp3, tmp, tmp2);
> +            gen_helper_set_cc_add32(cc, tmp, tmp2, tmp3);
> +            break;
> +        case 0x4b:
> +            tcg_gen_sub_i32(tmp3, tmp, tmp2);
> +            gen_helper_set_cc_sub32(cc, tmp, tmp2, tmp3);
> +            break;
> +        case 0x4c:
> +            tcg_gen_mul_i32(tmp3, tmp, tmp2);
> +            break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp3);
> +        s->pc += 4;
> +        break;
> +    case 0x50: /* st r1, d2(x2, b2) */
> +        FETCH_DECODE_RX
> +        tmp2 = load_reg32(r1);
> +        tcg_gen_qemu_st32(tmp2, tmp, 1);
> +        s->pc += 4;
> +        break;
> +    case 0x55: /* CL     R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        tmp = load_reg32(r1);
> +        cmp_u32(tmp, tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x54: /* N      R1,D2(X2,B2)     [RX] */
> +    case 0x56: /* O      R1,D2(X2,B2)     [RX] */
> +    case 0x57: /* X      R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX

Wrong TCGv type.

> +        tmp2 = tcg_temp_new_i32();
> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        tmp = load_reg32(r1);
> +        switch (opc) {
> +        case 0x54: tcg_gen_and_i32(tmp, tmp, tmp2); break;
> +        case 0x56: tcg_gen_or_i32(tmp, tmp, tmp2); break;
> +        case 0x57: tcg_gen_xor_i32(tmp, tmp, tmp2); break;

Wrong TCGv type.

> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp);
> +        set_cc_nz_u32(tmp);
> +        s->pc += 4;
> +        break;
> +    case 0x58: /* l r1, d2(x2, b2) */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        store_reg32(r1, tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x59: /* C      R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> +        tmp = load_reg32(r1);
> +        cmp_s32(tmp, tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x5a: /* A      R1,D2(X2,B2)     [RX] */
> +    case 0x5b: /* S      R1,D2(X2,B2)     [RX] */
> +    case 0x5e: /* AL     R1,D2(X2,B2)     [RX] */
> +    case 0x5f: /* SL     R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = load_reg32(r1);
> +        tcg_gen_qemu_ld32s(tmp, tmp, 1);
> +        tmp3 = tcg_temp_new_i32();

Wrong TCGv type.

> +        switch (opc) {
> +        case 0x5a: case 0x5e: tcg_gen_add_i32(tmp3, tmp2, tmp); break;
> +        case 0x5b: case 0x5f: tcg_gen_sub_i32(tmp3, tmp2, tmp); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp3);
> +        switch (opc) {
> +        case 0x5a: gen_helper_set_cc_add32(cc, tmp2, tmp, tmp3); break;
> +        case 0x5e: gen_helper_set_cc_addu32(cc, tmp2, tmp, tmp3); break;
> +        case 0x5b: gen_helper_set_cc_sub32(cc, tmp2, tmp, tmp3); break;
> +        case 0x5f: gen_helper_set_cc_subu32(cc, tmp2, tmp, tmp3); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        s->pc += 4;
> +        break;
> +    case 0x60: /* STD    R1,D2(X2,B2)        [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = load_freg(r1);
> +        tcg_gen_qemu_st64(tmp2, tmp, 1);
> +        s->pc += 4;
> +        break;
> +    case 0x68: /* LD    R1,D2(X2,B2)        [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i64();
> +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> +        store_freg(r1, tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x70: /* STE R1,D2(X2,B2) [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = load_freg32(r1);
> +        tcg_gen_qemu_st32(tmp2, tmp, 1);
> +        s->pc += 4;
> +        break;
> +    case 0x71: /* MS      R1,D2(X2,B2)     [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();

Wrong TCGv type.

> +        tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> +        tmp = load_reg(r1);
> +        tcg_gen_mul_i32(tmp, tmp, tmp2);
> +        store_reg(r1, tmp);
> +        s->pc += 4;
> +        break;
> +    case 0x78: /* LE     R1,D2(X2,B2)        [RX] */
> +        FETCH_DECODE_RX
> +        tmp2 = tcg_temp_new_i32();
> +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +        store_freg32(r1, tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x88: /* SRL    R1,D2(B2)        [RS] */
> +    case 0x89: /* SLL    R1,D2(B2)        [RS] */
> +    case 0x8a: /* SRA    R1,D2(B2)        [RS] */
> +        FETCH_DECODE_RS
> +        tmp = get_address(0, b2, d2);
> +        tcg_gen_andi_i64(tmp, tmp, 0x3f);
> +        tmp2 = load_reg32(r1);
> +        switch (opc) {
> +        case 0x88: tcg_gen_shr_i32(tmp2, tmp2, tmp); break;
> +        case 0x89: tcg_gen_shl_i32(tmp2, tmp2, tmp); break;
> +        case 0x8a: tcg_gen_sar_i32(tmp2, tmp2, tmp); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        store_reg32(r1, tmp2);
> +        if (opc == 0x8a) set_cc_s32(tmp2);
Coding style.
> +        s->pc += 4;
> +        break;
> +    case 0x91: /* TM     D1(B1),I2        [SI] */
> +        FETCH_DECODE_SI
> +        tmp2 = tcg_temp_new_i32();
> +        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
> +        tmp = tcg_const_i32(i2);
Wrong TCGv type.
> +        gen_helper_tm(cc, tmp2, tmp);
> +        s->pc += 4;
> +        break;
> +    case 0x92: /* MVI    D1(B1),I2        [SI] */
> +        FETCH_DECODE_SI
> +        tmp2 = tcg_const_i32(i2);
> +        tcg_gen_qemu_st8(tmp2, tmp, 1);
> +        s->pc += 4;
> +        break;
> +    case 0x94: /* NI     D1(B1),I2        [SI] */
> +    case 0x96: /* OI     D1(B1),I2        [SI] */
> +    case 0x97: /* XI     D1(B1),I2        [SI] */
> +        FETCH_DECODE_SI
> +        tmp2 = tcg_temp_new_i32();
Wrong TCGv type.
> +        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
> +        switch (opc) {
> +        case 0x94: tcg_gen_andi_i32(tmp2, tmp2, i2); break;
> +        case 0x96: tcg_gen_ori_i32(tmp2, tmp2, i2); break;
> +        case 0x97: tcg_gen_xori_i32(tmp2, tmp2, i2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        tcg_gen_qemu_st8(tmp2, tmp, 1);
> +        set_cc_nz_u32(tmp2);
> +        s->pc += 4;
> +        break;
> +    case 0x95: /* CLI    D1(B1),I2        [SI] */
> +        FETCH_DECODE_SI
> +        tmp2 = tcg_temp_new_i32();
Wrong TCGv type.
> +        tcg_gen_qemu_ld8u(tmp2, tmp, 1);
> +        cmp_u32c(tmp2, i2);
> +        s->pc += 4;
> +        break;
> +    case 0x9b: /* STAM     R1,R3,D2(B2)     [RS] */
> +        FETCH_DECODE_RS
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = get_address(0, b2, d2);
> +        tmp3 = tcg_const_i32(r3);
Wrong TCGv type.
> +        gen_helper_stam(tmp, tmp2, tmp3);
> +        s->pc += 4;
> +        break;
> +    case 0xa5:
> +        insn = ld_code4(s->pc);
> +        r1 = (insn >> 20) & 0xf;
> +        op = (insn >> 16) & 0xf;
> +        i2 = insn & 0xffff;
> +        disas_a5(s, op, r1, i2);
> +        s->pc += 4;
> +        break;
> +    case 0xa7:
> +        insn = ld_code4(s->pc);
> +        r1 = (insn >> 20) & 0xf;
> +        op = (insn >> 16) & 0xf;
> +        i2 = (short)insn;
> +        disas_a7(s, op, r1, i2);
> +        s->pc += 4;
> +        break;
> +    case 0xa8: /* MVCLE   R1,R3,D2(B2)     [RS] */
> +        FETCH_DECODE_RS
> +        tmp = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r3);
Wrong TCGv type.
> +        tmp2 = get_address(0, b2, d2);
> +        gen_helper_mvcle(cc, tmp, tmp2, tmp3);
> +        s->pc += 4;
> +        break;
> +    case 0xa9: /* CLCLE   R1,R3,D2(B2)     [RS] */
> +        FETCH_DECODE_RS
> +        tmp = tcg_const_i32(r1);
> +        tmp3 = tcg_const_i32(r3);
> +        tmp2 = get_address(0, b2, d2);
> +        gen_helper_clcle(cc, tmp, tmp2, tmp3);
> +        s->pc += 4;
> +        break;
> +    case 0xb2:
> +        insn = ld_code4(s->pc);
> +        op = (insn >> 16) & 0xff;
> +        switch (op) {
> +        case 0x9c: /* STFPC    D2(B2) [S] */
> +            d2 = insn & 0xfff;
> +            b2 = (insn >> 12) & 0xf;
> +            tmp = tcg_temp_new_i32();
> +            tcg_gen_ld_i32(tmp, cpu_env, offsetof(CPUState, fpc));
Wrong TCGv type.
> +            tmp2 = get_address(0, b2, d2);
> +            tcg_gen_qemu_st32(tmp, tmp2, 1);
> +            break;
> +        default:
> +            r1 = (insn >> 4) & 0xf;
> +            r2 = insn & 0xf;
> +            disas_b2(s, op, r1, r2);
> +            break;
> +        }
> +        s->pc += 4;
> +        break;
> +    case 0xb3:
> +        insn = ld_code4(s->pc);
> +        op = (insn >> 16) & 0xff;
> +        r3 = (insn >> 12) & 0xf; /* aka m3 */
> +        r1 = (insn >> 4) & 0xf;
> +        r2 = insn & 0xf;
> +        disas_b3(s, op, r3, r1, r2);
> +        s->pc += 4;
> +        break;
> +    case 0xb9:
> +        insn = ld_code4(s->pc);
> +        r1 = (insn >> 4) & 0xf;
> +        r2 = insn & 0xf;
> +        op = (insn >> 16) & 0xff;
> +        disas_b9(s, op, r1, r2);
> +        s->pc += 4;
> +        break;
> +    case 0xba: /* CS     R1,R3,D2(B2)     [RS] */
> +        FETCH_DECODE_RS
> +        tmp = tcg_const_i32(r1);
> +        tmp2 = get_address(0, b2, d2);
> +        tmp3 = tcg_const_i32(r3);
Wrong TCGv type.
> +        gen_helper_cs(cc, tmp, tmp2, tmp3);
> +        s->pc += 4;
> +        break;
> +    case 0xbd: /* CLM    R1,M3,D2(B2)     [RS] */
> +        FETCH_DECODE_RS
> +        tmp3 = get_address(0, b2, d2);
> +        tmp2 = tcg_const_i32(r3); /* aka m3 */
Wrong TCGv type.
> +        tmp = load_reg32(r1);
> +        gen_helper_clm(cc, tmp, tmp2, tmp3);
> +        s->pc += 4;
> +        break;
> +    case 0xbe: /* STCM R1,M3,D2(B2) [RS] */
> +        FETCH_DECODE_RS
> +        tmp3 = get_address(0, b2, d2);
> +        tmp2 = tcg_const_i32(r3); /* aka m3 */
Wrong TCGv type.
> +        tmp = load_reg32(r1);
> +        gen_helper_stcm(tmp, tmp2, tmp3);
> +        s->pc += 4;
> +        break;
> +    case 0xbf: /* ICM    R1,M3,D2(B2)     [RS] */
> +        FETCH_DECODE_RS
> +        if (r3 == 15) {	/* effectively a 32-bit load */
> +            tmp = get_address(0, b2, d2);
> +            tmp2 = tcg_temp_new_i32();
> +            tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> +            store_reg32(r1, tmp2);
> +            tmp = tcg_const_i32(r3);
Wrong TCGv type.
> +            gen_helper_set_cc_icm(cc, tmp, tmp2);
> +        }
> +        else if (r3) {
> +            uint32_t mask = 0x00ffffffUL;
> +            uint32_t shift = 24;
> +            int m3 = r3;
> +            tmp3 = load_reg32(r1);
> +            tmp = get_address(0, b2, d2);
> +            tmp2 = tcg_temp_new_i32();
> +            while (m3) {
> +                if (m3 & 8) {
> +                    tcg_gen_qemu_ld8u(tmp2, tmp, 1);
> +                    if (shift) tcg_gen_shli_i32(tmp2, tmp2, shift);
> +                    tcg_gen_andi_i32(tmp3, tmp3, mask);
> +                    tcg_gen_or_i32(tmp3, tmp3, tmp2);
> +                    tcg_gen_addi_i64(tmp, tmp, 1);
> +                }
> +                m3 = (m3 << 1) & 0xf;
> +                mask = (mask >> 8) | 0xff000000UL;
> +                shift -= 8;
> +            }
> +            store_reg32(r1, tmp3);
> +            tmp = tcg_const_i32(r3);
Wrong TCGv type.
> +            gen_helper_set_cc_icm(cc, tmp, tmp2);
> +        }
> +        else {
> +            tmp = tcg_const_i32(0);
Wrong TCGv type.
> +            gen_helper_set_cc_icm(cc, tmp, tmp);	/* i.e. env->cc = 0 */
> +        }
> +        s->pc += 4;
> +        break;
> +    case 0xc0:
> +    case 0xc2:
> +        insn = ld_code6(s->pc);
> +        r1 = (insn >> 36) & 0xf;
> +        op = (insn >> 32) & 0xf;
> +        i2 = (int)insn;
> +        switch (opc) {
> +        case 0xc0: disas_c0(s, op, r1, i2); break;
> +        case 0xc2: disas_c2(s, op, r1, i2); break;
> +        default: tcg_abort();

Same comment as previous tcg_abort() one.

> +        }
> +        s->pc += 6;
> +        break;
> +    case 0xd2: /* mvc d1(l, b1), d2(b2) */
> +    case 0xd4: /* NC     D1(L,B1),D2(B2)         [SS] */
> +    case 0xd5: /* CLC    D1(L,B1),D2(B2)         [SS] */
> +    case 0xd6: /* OC     D1(L,B1),D2(B2)         [SS] */
> +    case 0xd7: /* xc d1(l, b1), d2(b2) */
> +        insn = ld_code6(s->pc);
> +        vl = tcg_const_i32((insn >> 32) & 0xff);
> +        b1 = (insn >> 28) & 0xf;
> +        vd1 = tcg_const_i32((insn >> 16) & 0xfff);
> +        b2 = (insn >> 12) & 0xf;
> +        vd2 = tcg_const_i32(insn & 0xfff);
> +        vb = tcg_const_i32((b1 << 4) | b2);

Wrong TCGv type.

> +        switch (opc) {
> +        case 0xd2: gen_helper_mvc(vl, vb, vd1, vd2); break;
> +        case 0xd4: gen_helper_nc(cc, vl, vb, vd1, vd2); break;
> +        case 0xd5: gen_helper_clc(cc, vl, vb, vd1, vd2); break;
> +        case 0xd6: gen_helper_oc(cc, vl, vb, vd1, vd2); break;
> +        case 0xd7: gen_helper_xc(cc, vl, vb, vd1, vd2); break;
> +        default: tcg_abort(); break;

Same comment as previous tcg_abort() one.

> +        }
> +        s->pc += 6;
> +        break;
> +    case 0xe3:
> +        insn = ld_code6(s->pc);
> +        DEBUGINSN
> +        d2 = (  (int) ( (((insn >> 16) & 0xfff) | ((insn << 4) & 0xff000)) << 12 )  ) >> 12;
> +        disas_e3(s, /* op */ insn & 0xff, /* r1 */ (insn >> 36) & 0xf, /* x2 */ (insn >> 32) & 0xf, /* b2 */ (insn >> 28) & 0xf, d2 );
> +        s->pc += 6;
> +        break;
> +    case 0xeb:
> +        insn = ld_code6(s->pc);
> +        DEBUGINSN
> +        op = insn & 0xff;
> +        r1 = (insn >> 36) & 0xf;
> +        r3 = (insn >> 32) & 0xf;
> +        b2 = (insn >> 28) & 0xf;
> +        d2 = (  (int) ( (((insn >> 16) & 0xfff) | ((insn << 4) & 0xff000)) << 12 )  ) >> 12;
> +        disas_eb(s, op, r1, r3, b2, d2);
> +        s->pc += 6;
> +        break;
> +    case 0xed:
> +        insn = ld_code6(s->pc);
> +        DEBUGINSN
> +        op = insn & 0xff;
> +        r1 = (insn >> 36) & 0xf;
> +        x2 = (insn >> 32) & 0xf;
> +        b2 = (insn >> 28) & 0xf;
> +        d2 = (short)((insn >> 16) & 0xfff);
> +        r1b = (insn >> 12) & 0xf;
> +        disas_ed(s, op, r1, x2, b2, d2, r1b);
> +        s->pc += 6;
> +        break;
> +    default:
> +        LOG_DISAS("unimplemented opcode 0x%x\n", opc);
> +        gen_illegal_opcode(s);
> +        s->pc += 6;
> +        break;
> +    }
> +    if (tmp) tcg_temp_free(tmp);
> +    if (tmp2) tcg_temp_free(tmp2);
> +    if (tmp3) tcg_temp_free(tmp3);

Comparison on TCGv type is not allowed.

> +}
> +
> +static inline void gen_intermediate_code_internal (CPUState *env,
> +                                                          TranslationBlock *tb,
> +                                                          int search_pc)
> +{
> +    DisasContext dc;
> +    target_ulong pc_start;
> +    uint64_t next_page_start;
> +    uint16_t *gen_opc_end;
> +    int j, lj = -1;
> +    int num_insns, max_insns;
> +    
> +    pc_start = tb->pc;
> +    
> +    dc.pc = tb->pc;
> +    dc.env = env;
> +    dc.pc = pc_start;
> +    dc.is_jmp = DISAS_NEXT;
> +    
> +    gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
> +    
> +    next_page_start = (pc_start & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
> +    
> +    num_insns = 0;
> +    max_insns = tb->cflags & CF_COUNT_MASK;
> +    if (max_insns == 0)
> +        max_insns = CF_COUNT_MASK;
> +
> +    gen_icount_start();
> +#if 1
> +    cc = tcg_temp_local_new_i32();
> +    tcg_gen_mov_i32(cc, global_cc);
> +#else
> +    cc = global_cc;
> +#endif

Why this?

> +    do {
> +        if (search_pc) {
> +            j = gen_opc_ptr - gen_opc_buf;
> +            if (lj < j) {
> +                lj++;
> +                while (lj < j)
> +                    gen_opc_instr_start[lj++] = 0;
> +            }
> +            gen_opc_pc[lj] = dc.pc;
> +            gen_opc_instr_start[lj] = 1;
> +            gen_opc_icount[lj] = num_insns;
> +        }
> +        if (num_insns + 1 == max_insns && (tb->cflags & CF_LAST_IO))
> +            gen_io_start();
> +#if defined S390X_DEBUG_DISAS
> +        LOG_DISAS("pc " TARGET_FMT_lx "\n",
> +                  dc.pc);
> +#endif
> +        disas_s390_insn(env, &dc);
> +        
> +        num_insns++;
> +    } while (!dc.is_jmp && gen_opc_ptr < gen_opc_end && dc.pc < next_page_start && num_insns < max_insns);

The translation should also be stopped in singlestep mode (singlestep
variable).

> +    tcg_gen_mov_i32(global_cc, cc);
> +    tcg_temp_free(cc);
> +    
> +    if (!dc.is_jmp) {
> +        tcg_gen_st_i64(tcg_const_i64(dc.pc), cpu_env, offsetof(CPUState, psw.addr));
> +    }
> +    
> +    if (dc.is_jmp == DISAS_SVC) {
> +        tcg_gen_st_i64(tcg_const_i64(dc.pc), cpu_env, offsetof(CPUState, psw.addr));
> +        TCGv tmp = tcg_const_i32(EXCP_SVC);
> +        gen_helper_exception(tmp);
> +    }
> +
> +    if (tb->cflags & CF_LAST_IO)
> +        gen_io_end();
> +    /* Generate the return instruction */
> +    tcg_gen_exit_tb(0);
> +    gen_icount_end(tb, num_insns);
> +    *gen_opc_ptr = INDEX_op_end;
> +    if (search_pc) {
> +        j = gen_opc_ptr - gen_opc_buf;
> +        lj++;
> +        while (lj <= j)
> +            gen_opc_instr_start[lj++] = 0;
> +    } else {
> +        tb->size = dc.pc - pc_start;
> +        tb->icount = num_insns;
> +    }
> +#if defined S390X_DEBUG_DISAS
> +    log_cpu_state_mask(CPU_LOG_TB_CPU, env, 0);
> +    if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
> +        qemu_log("IN: %s\n", lookup_symbol(pc_start));
> +        log_target_disas(pc_start, dc.pc - pc_start, 1);
> +        qemu_log("\n");
> +    }
> +#endif
> +}
> +
> +void gen_intermediate_code (CPUState *env, struct TranslationBlock *tb)
> +{
> +    gen_intermediate_code_internal(env, tb, 0);
> +}
> +
> +void gen_intermediate_code_pc (CPUState *env, struct TranslationBlock *tb)
> +{
> +    gen_intermediate_code_internal(env, tb, 1);
> +}
> +
> +void gen_pc_load(CPUState *env, TranslationBlock *tb,
> +                unsigned long searched_pc, int pc_pos, void *puc)
> +{
> +    env->psw.addr = gen_opc_pc[pc_pos];
> +}
> -- 
> 1.6.2.1
> 
> 
> 
> 

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH 3/9] S/390 host/target build system support
  2009-10-16 12:38     ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Ulrich Hecht
  2009-10-16 12:38       ` [Qemu-devel] [PATCH 4/9] S/390 host support for TCG Ulrich Hecht
@ 2009-10-17 10:44       ` Aurelien Jarno
  1 sibling, 0 replies; 26+ messages in thread
From: Aurelien Jarno @ 2009-10-17 10:44 UTC (permalink / raw)
  To: Ulrich Hecht; +Cc: riku.voipio, qemu-devel, agraf

On Fri, Oct 16, 2009 at 02:38:49PM +0200, Ulrich Hecht wrote:
> changes to configure and makefiles for S/390 host and target support,
> fixed as suggested by Juan Quintela
> 
> adapted to most recent changes in build system

Basically looks ok, but it would be worth splitting it into a host part
and a target part, so that the two resulting patches can be applied
separately.

> Signed-off-by: Ulrich Hecht <uli@suse.de>
> ---
>  configure                            |   22 ++++++++++++++++------
>  default-configs/s390x-linux-user.mak |    1 +
>  2 files changed, 17 insertions(+), 6 deletions(-)
>  create mode 100644 default-configs/s390x-linux-user.mak
> 
> diff --git a/configure b/configure
> index ca6d45c..64be51f 100755
> --- a/configure
> +++ b/configure
> @@ -157,9 +157,12 @@ case "$cpu" in
>    parisc|parisc64)
>      cpu="hppa"
>    ;;
> -  s390*)
> +  s390)
>      cpu="s390"
>    ;;
> +  s390x)
> +    cpu="s390x"
> +  ;;
>    sparc|sun4[cdmuv])
>      cpu="sparc"
>    ;;
> @@ -790,6 +793,7 @@ sh4eb-linux-user \
>  sparc-linux-user \
>  sparc64-linux-user \
>  sparc32plus-linux-user \
> +s390x-linux-user \
>  "
>      fi
>  # the following are Darwin specific
> @@ -855,7 +859,7 @@ fi
>  # host long bits test
>  hostlongbits="32"
>  case "$cpu" in
> -  x86_64|alpha|ia64|sparc64|ppc64)
> +  x86_64|alpha|ia64|sparc64|ppc64|s390x)
>      hostlongbits=64
>    ;;
>  esac
> @@ -1819,7 +1823,7 @@ echo >> $config_host_mak
>  echo "CONFIG_QEMU_SHAREDIR=\"$prefix$datasuffix\"" >> $config_host_mak
>  
>  case "$cpu" in
> -  i386|x86_64|alpha|cris|hppa|ia64|m68k|microblaze|mips|mips64|ppc|ppc64|s390|sparc|sparc64)
> +  i386|x86_64|alpha|cris|hppa|ia64|m68k|microblaze|mips|mips64|ppc|ppc64|s390|s390x|sparc|sparc64)
>      ARCH=$cpu
>    ;;
>    armv4b|armv4l)
> @@ -2090,7 +2094,7 @@ target_arch2=`echo $target | cut -d '-' -f 1`
>  target_bigendian="no"
>  
>  case "$target_arch2" in
> -  armeb|m68k|microblaze|mips|mipsn32|mips64|ppc|ppcemb|ppc64|ppc64abi32|sh4eb|sparc|sparc64|sparc32plus)
> +  armeb|m68k|microblaze|mips|mipsn32|mips64|ppc|ppcemb|ppc64|ppc64abi32|s390x|sh4eb|sparc|sparc64|sparc32plus)
>    target_bigendian=yes
>    ;;
>  esac
> @@ -2250,6 +2254,10 @@ case "$target_arch2" in
>      echo "TARGET_ABI32=y" >> $config_target_mak
>      target_phys_bits=64
>    ;;
> +  s390x)
> +    target_nptl="yes"
> +    target_phys_bits=64
> +  ;;
>    *)
>      echo "Unsupported target CPU"
>      exit 1
> @@ -2318,7 +2326,7 @@ if test ! -z "$gdb_xml_files" ; then
>  fi
>  
>  case "$target_arch2" in
> -  arm|armeb|m68k|microblaze|mips|mipsel|mipsn32|mipsn32el|mips64|mips64el|ppc|ppc64|ppc64abi32|ppcemb|sparc|sparc64|sparc32plus)
> +  arm|armeb|m68k|microblaze|mips|mipsel|mipsn32|mipsn32el|mips64|mips64el|ppc|ppc64|ppc64abi32|ppcemb|s390x|sparc|sparc64|sparc32plus)
>      echo "CONFIG_SOFTFLOAT=y" >> $config_target_mak
>      ;;
>    *)
> @@ -2351,6 +2359,8 @@ ldflags=""
>  
>  if test "$ARCH" = "sparc64" ; then
>    cflags="-I\$(SRC_PATH)/tcg/sparc $cflags"
> +elif test "$ARCH" = "s390x" ; then
> +  cflags="-I\$(SRC_PATH)/tcg/s390 $cflags"
>  else
>    cflags="-I\$(SRC_PATH)/tcg/\$(ARCH) $cflags"
>  fi
> @@ -2386,7 +2396,7 @@ for i in $ARCH $TARGET_BASE_ARCH ; do
>    ppc*)
>      echo "CONFIG_PPC_DIS=y"  >> $config_target_mak
>    ;;
> -  s390)
> +  s390*)
>      echo "CONFIG_S390_DIS=y"  >> $config_target_mak
>    ;;
>    sh4)
> diff --git a/default-configs/s390x-linux-user.mak b/default-configs/s390x-linux-user.mak
> new file mode 100644
> index 0000000..a243c99
> --- /dev/null
> +++ b/default-configs/s390x-linux-user.mak
> @@ -0,0 +1 @@
> +# Default configuration for s390x-linux-user
> -- 
> 1.6.2.1
> 
> 
> 
> 

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-10-17 10:42     ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Aurelien Jarno
@ 2009-10-19 17:17       ` Ulrich Hecht
  2009-10-22 21:28         ` Aurelien Jarno
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-19 17:17 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: riku.voipio, qemu-devel, agraf

On Saturday 17 October 2009, Aurelien Jarno wrote:
> On Fri, Oct 16, 2009 at 02:38:48PM +0200, Ulrich Hecht wrote:
> First of all a few general comments. Note that I know very few things
> about S390/S390X, so I may have dumb comments/questions. Also as the
> patch is very long, I probably have missed things, we should probably
> iterate with new versions of the patches.
>
> Is it possible given the current implementation to emulate S390 in
> addition to S390X?

Not as it stands. The S/390 instruction set is a subset of zArch 
(64-bit), but currently only 64-bit addressing is implemented (because 
that is the only mode used by 64-bit Linux binaries).

> If yes how different would be a S390 only target? 

It would need support for 31-bit addressing. 24-bit, too, if you want to 
run the really old stuff.
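
To illustrate what the extra addressing modes would mean for address
generation, here is a minimal sketch in plain C. The names addr_mode and
wrap_address are made up for this mail and do not appear in the patch; a
real implementation would presumably apply the wrap in the translator.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only (not from the patch). */
enum addr_mode { AMODE_24, AMODE_31, AMODE_64 };

/* Truncate an effective address according to the addressing mode. */
static uint64_t wrap_address(uint64_t addr, enum addr_mode mode)
{
    switch (mode) {
    case AMODE_24:
        return addr & 0x00ffffffULL;   /* keep the low 24 bits */
    case AMODE_31:
        return addr & 0x7fffffffULL;   /* keep the low 31 bits */
    default:
        return addr;                   /* 64-bit mode: no truncation */
    }
}
```

Only the AMODE_64 behavior corresponds to what is implemented right now.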

> If they won't be too different, it is probably worth using _tl type
> registers and call it target-s390. A bit the way it is done with
> i386/x86_64, ppc/ppc64, mips/mips64 or sparc/sparc64.

If it's going to be implemented, it will certainly share most code with 
the 64-bit target, so having a common directory would most likely be a 
good idea.

There are some cases where _tl types might be appropriate, but generally 
everything that is shared between S/390 and zArch is 32-bit, and 
instructions present in both architectures also have the same operand 
size: L, for instance, is a 32-bit load to a 32-bit register on both 
architectures.

> Secondly there seems to be a lot of mix between 32-bit and 64-bit
> TCG registers. They should not be mixed. _i32 ops apply to TCGv_i32
> variables, _i64 ops apply to TCGv_i64 variables, and _tl ops to TCGv
> variables. TCGv/_tl types can map either to _i32 or _i64 depending on
> your target. You should try to build the target with
> --enable-debug-tcg

Er, reading the code and observing the behavior of the (AMD64) code 
generator, it seemed to me that neither frontends nor backends care one 
bit about that. I'll look into it, though. (I won't comment on those 
below, except in cases where the solution or, indeed, the problem is not 
clear to me.)

> It would be nice to split all the disassembling part into a separate
> patch, combined with the related makefile changes. This way it can be
> applied separately.

Can do.

> >  //  switch (info->mach)
> >  //    {
> >  //    case bfd_mach_s390_31:
> > -      current_arch_mask = 1 << S390_OPCODE_ESA;
> > +//      current_arch_mask = 1 << S390_OPCODE_ESA;
> >  //      break;
> >  //    case bfd_mach_s390_64:
> > -//      current_arch_mask = 1 << S390_OPCODE_ZARCH;
> > +      current_arch_mask = 1 << S390_OPCODE_ZARCH;
> >  //      break;
> >  //    default:
> >  //      abort ();
>
> While I understand the second part, Why is it necessary to comment the
> bfd_mach_s390_31 case?

It's not necessary, but current_arch_mask can only hold one value...

> > +typedef union FPReg {
> > +    struct {
> > +#ifdef WORDS_BIGENDIAN
> > +        float32 e;
> > +        int32_t __pad;
> > +#else
> > +        int32_t __pad;
> > +        float32 e;
> > +#endif
> > +    };
> > +    float64 d;
> > +    uint64_t i;
> > +} FPReg;
>
> WORDS_BIGENDIAN is wrong here. It should probably be
> HOST_WORDS_BIGENDIAN.

Indeed.

> Also it may be a better idea to 
> reuse CPU_FloatU and CPU_DoubleU here.

Yes. Didn't know about those.

> > +CPUS390XState *cpu_s390x_init(const char *cpu_model)
>
> This function would probably be better in translate.c, so that
> s390x_translate_init() can be declared static.

I'll check that.

> Please use /* */ for comments (See CODING_STYLE).

OK.

> > +    /* FIXME: reset vector? */
>
> Yes, reset vector, MMU mode, default register values, ...

Works perfectly fine without. ;)

Also, the question remains what kind of reset cpu_reset() is supposed to 
be. For instance, a CPU Reset or Initial CPU Reset on S/390 leaves the 
general-purpose registers unchanged, a Clear Reset or Power-On Reset 
sets them to zero. See the table on p. 4-50 in the zArchitecture 
Principles of Operation (POP), 
http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/A2278324.pdf, 
for the gory details.

> > +//#define DEBUG_HELPER
> > +#ifdef DEBUG_HELPER
> > +#define HELPER_LOG(x...) qemu_log(x)
> > +#else
> > +#define HELPER_LOG(x...)
> > +#endif
>
> Small comment for the rest of the file. While I understand HELPER_LOG
> is very handy while developing, I think some calls to it can be
> dropped in really simple functions.

Yes, a bit of cleanup will be in order.

> > +    for (i = 0; i <= l; i++) {
> > +        x = ldub(dest + i) & ldub(src + i);
> > +        if (x) cc = 1;
>
> coding style

Maybe you are referring to "Every indented statement is braced; even if 
the block contains just one statement.", but I figured that that does 
not apply here because there is no indentation...

> > +    if (v < 0) return 1;
> > +    else if (v > 0) return 2;
>
> coding style

Same here.

> The last 6 helpers can probably be coded easily in TCG (they are very
> similar to some PowerPC code), though merging this patch with the
> helpers is not a problem at all.

Again, that would be detrimental to performance. I have experimented with 
that and found that TCG is well capable of optimizing out a pure helper 
call, but not a complex piece of code with branches. Condition codes are 
not used most times, and the overhead of calling a helper in the cases 
they are is far smaller than doing all the condition code stuff that is 
not used. Also reduces code size.

> All the branch part would really gain to be coded in TCG, as it will
> allow TB chaining.

I vaguely remember trying that as well, but I don't know if I gave up on 
it because it was slower, or because I couldn't get it to work...

> > +            }
> > +            else if (r > d) {
>
> coding style

OK, this one is valid. :)

> __uint128_t is probably not supported on all hosts/GCC versions.

Definitely does not work on 32-bit hosts.

> mulu64() should be used instead.

Excellent, didn't know about that either.
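
For reference, the general technique for a 64x64->128-bit multiply
without __uint128_t is to work on 32-bit halves. The following is only a
sketch of that technique; it is not QEMU's mulu64(), and the name
mul64x64 is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: 64x64 -> 128-bit unsigned multiply using only
 * 64-bit arithmetic. The result is returned as two 64-bit halves. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four 32x32 -> 64-bit partial products. */
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Sum of the middle column; at most three 32-bit values, so it
     * cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

This works on 32-bit hosts as well, since it only needs uint64_t.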

> Same here, __uint128_t should not be used, though I don't know what
> should be used instead.

Hmmmm...

> > +/* unsigned string compare (c is string terminator) */
> > +uint32_t HELPER(clst)(uint32_t c, uint32_t r1, uint32_t r2)
> > +{
> > +    uint64_t s1 = env->regs[r1];
> > +    uint64_t s2 = env->regs[r2];
> > +    uint8_t v1, v2;
> > +    uint32_t cc;
> > +    c = c & 0xff;
> > +#ifdef CONFIG_USER_ONLY
> > +    if (!c) {
> > +        HELPER_LOG("%s: comparing '%s' and '%s'\n",
> > +                   __FUNCTION__, (char*)s1, (char*)s2);
> > +    }
> > +#endif
>
> Why CONFIG_USER_ONLY ?

s1 and s2 are target addresses and not valid on the host system in the 
SOFTMMU case, so this would segfault.

> > +/* union used for splitting/joining 128-bit floats to/from 64-bit
> > FP regs */ +typedef union {
> > +    struct {
> > +#ifdef WORDS_BIGENDIAN
> > +        uint64_t h;
> > +        uint64_t l;
> > +#else
> > +        uint64_t l;
> > +        uint64_t h;
> > +#endif
> > +    };
> > +    float128 x;
> > +} FP128;
>
> WORDS_BIGENDIAN is wrong here. CPU_QuadU can probably be used instead.

Agree.

> > +    if (float32_is_neg(v2)) {
> > +        v1 = float32_abs(v2);
> > +    }
> > +    else {
> > +        v1 = v2;
> > +    }
>
> I don't see the point of such a test here.

Now that you mention it, I don't really see it either.

> > +    v2.i = ldl(a2);
>
> The value should be passed directly instead of loaded here, as ldl
> is wrong depending on the MMU mode.

Yup.

> > +#ifdef WORDS_BIGENDIAN
>
> This is wrong. It should probably be HOST_WORDS_BIGENDIAN.

I concur.

> > +static TCGv load_reg(int reg)
> > +{
> > +    TCGv r = tcg_temp_new_i64();
> > +#ifdef TCGREGS
> > +    sync_reg32(reg);
> > +    tcg_gen_mov_i64(r, tcgregs[reg]);
> > +    return r;
> > +#else
> > +    tcg_gen_ld_i64(r, cpu_env, offsetof(CPUState, regs[reg]));
> > +    return r;
> > +#endif
> > +}
>
> I don't really like implicit TCGv temp allocation. In other targets it
> often has caused missing tcg_temp_free().

I did in fact run into problems with this and was thus forced to make 
sure that no tcg_temp_free()s are missing... :)

> It should probably be 
> rewritten as load_reg(TCGv t, int reg).

Later. This is the kind of change that you never get right on the first 
try, so I'd like to do that after everything went upstream.

> For all register load/store, as already explained in the TCG sync op
> patch, I am in favor of using the ld/st version (TCGREGS not defined).

It doesn't perform.

> > +static void gen_illegal_opcode(DisasContext *s)
> > +{
> > +    TCGv tmp = tcg_temp_new_i64();
> > +    tcg_gen_movi_i64(tmp, 42);
>
> tcg_const_i64 could be used instead.

Yes. Using the actual value for a specification exception instead of 42 
might be a good idea as well. :)

>
> > +    gen_helper_exception(tmp);
>
> Missing tcg_temp_free_i64(tmp);

OK, so I missed _that_ one. Fine. Be like that if you want to. Illegal 
instructions are not that frequent anyway...

> > +    gen_helper_cmp_s32(cc, v1, tcg_const_i32(v2));
>
> The TCG passed to the helper should be freed.

And that one...

> > +    gen_helper_cmp_u32(cc, v1, tcg_const_i32(v2));
>
> Same.

And that one... :(

> > +    gen_helper_cmp_s64(cc, v1, tcg_const_i64(v2));
>
> Same

And that one... :((

> > +    gen_helper_cmp_u64(cc, v1, tcg_const_i64(v2));
>
> Same

And that one... :(((

> The branches should be handled using brcond and goto_tb and exit_tb
> and not with helpers, to allow TB chaining.

I'll look into that, but I guess it's not entirely trivial.

> > +    TCGv tmp = 0, tmp2 = 0, tmp3 = 0;
>
> This is wrong. 0 maps to global 0 (env in your case). -1 should be
> used instead, or even better the TCGV_UNUSED macro.

OK.

> > +    case 0x12: /* LT R1,D2(X2,B2) [RXY] */
> > +        tmp2 = tcg_temp_new_i32();
>
> Wrong TCGv type.
>
> > +        tcg_gen_qemu_ld32s(tmp2, tmp, 1);
>
> tcg_gen_qemu_ld32s loads a 32 bit value in the default size register,
> that is 64-bit here.

Actually, this is a purely 32-bit insn, so it should be tcg_gen_qemu_ld32() 
in the first place.

> > +            tcg_gen_qemu_ld32s(tmp2, tmp, 1);
> > +            tcg_gen_ext32s_i64(tmp2, tmp2);
>
> The sign extension is already done by qemu_ld32s

Just to be on the safe side. :)

> > +        switch (op) {
> > +        case 0x8: case 0x18: gen_set_cc_add64(tmp, tmp2, tmp3);
> > break; +        case 0xa: case 0x1a: gen_helper_set_cc_addu64(cc,
> > tmp, tmp2, tmp3); break; +        default: tcg_abort();
>
> tcg_abort() is wrong here, it is for internal TCG use. Also, a real
> CPU would probably raise an illegal instruction exception; it does
> not power off the machine.

Actually, if default is reached here there's an error in the translator 
(it should only get here in the explicit cases), so I'd consider 
tcg_abort() appropriate.

> > +    case 0x1e: /* LRV R1,D2(X2,B2) [RXY] */
> > +        tmp2 = tcg_temp_new_i32();
>
> Wrong TCGv type;
>
> > +        tcg_gen_qemu_ld32u(tmp2, tmp, 1);
> > +        tcg_gen_bswap32_i32(tmp2, tmp2);
>
> Wrong TCGv type;
>
> > +        store_reg(r1, tmp2);

This is more wrong than you think, because this is a 32-bit insn, so it 
shouldn't overwrite the top 32 bits. Rarely used, I guess that's how it 
could slip through.
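
In plain C terms, the intended register update looks roughly like the
sketch below. This is only a model of the semantics, not the TCG code
from the patch; bswap32_sketch and write_low32 are names invented for
this mail.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32_sketch(uint32_t v)
{
    return ((v >> 24) & 0x000000ffU) |
           ((v >> 8)  & 0x0000ff00U) |
           ((v << 8)  & 0x00ff0000U) |
           (v << 24);
}

/* Replace the low 32 bits of a 64-bit register, keeping the top half. */
static uint64_t write_low32(uint64_t reg, uint32_t val)
{
    return (reg & 0xffffffff00000000ULL) | val;
}
```

So an LRV-like update would be reg = write_low32(reg, bswap32_sketch(mem)),
whereas a full 64-bit store clobbers the top half.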

> > +    case 0x3e: /* STRV R1,D2(X2,B2) [RXY] */
> > +        tmp2 = load_reg32(r1);
> > +        tcg_gen_bswap32_i32(tmp2, tmp2);
> > +        tcg_gen_qemu_st32(tmp2, tmp, 1);
>
> Wrong TCGv type.

What's wrong here? Everything's 32-bit.

> > +    case 0x57: /* XY R1,D2(X2,B2) [RXY] */
> > +        tmp2 = load_reg32(r1);
> > +        tmp3 = tcg_temp_new_i32();
> > +        tcg_gen_qemu_ld32u(tmp3, tmp, 1);
> > +        tcg_gen_xor_i32(tmp, tmp2, tmp3);
> > +        store_reg32(r1, tmp);
> > +        set_cc_nz_u32(tmp);
>
> Wrong TCGv type.

No need to zero-extend anyway, it's a 32-bit insn.

> > +    case 0x76: /* LB R1,D2(X2,B2) [RXY] */
> > +    case 0x77: /* LGB R1,D2(X2,B2) [RXY] */
> > +        tmp2 = tcg_temp_new_i64();
> > +        tcg_gen_qemu_ld8s(tmp2, tmp, 1);
> > +        switch (op) {
> > +        case 0x76:
> > +            tcg_gen_ext8s_i32(tmp2, tmp2);
>
> Wrong TCGv type.
>
> > +            store_reg32(r1, tmp2);
> > +            break;
> > +        case 0x77:
> > +            tcg_gen_ext8s_i64(tmp2, tmp2);
>
> Wrong TCGv type.

The way I see it, there are only 32-bit and 64-bit types, so it can only 
be wrong in one case, right?

> > +    case 0x78: /* LHY R1,D2(X2,B2) [RXY] */
> > +        tmp2 = tcg_temp_new_i32();
>
> Wrong TCGv type.
>
> > +        tcg_gen_qemu_ld16s(tmp2, tmp, 1);
> > +        tcg_gen_ext16s_i32(tmp2, tmp2);
> > +        store_reg32(r1, tmp2);
> > +        break;

Replacing that tcg_gen_qemu_ld16s() with tcg_gen_qemu_ld16() should do 
it, right?

> > +    case 0x86: /* MLG      R1,D2(X2,B2)     [RXY] */
> > +        tmp2 = tcg_temp_new_i64();
> > +        tcg_gen_qemu_ld64(tmp2, tmp, 1);
> > +        tmp = tcg_const_i32(r1);
>
> Wrong TCGv type.
>
> > +        gen_helper_mlg(tmp, tmp2);

Huh? It's a register number. There are only 16 GP registers. Fits a 
32-bit value, last time I checked. The helper also expects a 32-bit 
value.

> > +    if (tmp2) tcg_temp_free(tmp2);
> > +    if (tmp3) tcg_temp_free(tmp3);
>
> Comparison on TCGv type is not allowed.

Not even to TCGV_UNUSED?

> > +static void disas_ed(DisasContext *s, int op, int r1, int x2, int
> > b2, int d2, int r1b) +{
> > +    TCGv tmp, tmp2, tmp3 = 0;
>
> tmp should be declared as TCGv_i32 here, so that the types are correct
> for the whole function. Also tmp3 should not be initialized to 0.

OK.

> > +        gen_helper_madb(tmp3, tmp2, tmp);
>
> tmp3 should be freed after the helper.

Yup.

> > +    LOG_DISAS("disas_c0: op 0x%x r1 %d i2 %d\n", op, r1, i2);
> > +    uint64_t target = s->pc + i2 * 2;
> > +    /* FIXME: huh? */ target &= 0xffffffff;
>
> That should be fixed before a merge.

Tricky, because I have not yet understood why it is necessary. According 
to the POP, the address calculation in 64-bit addressing mode should be 
64 bits. In practice, on a real machine running a real Linux kernel, it 
wraps around at 32 bits, and userland code actually relies on that. No 
clue what I could be missing here...
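
To make the difference concrete, here are both calculations side by side
as a plain C sketch. The function names are made up, and this only shows
the arithmetic; it does not explain why the wraparound is observed.

```c
#include <stdint.h>

/* Relative target as a full 64-bit calculation: pc + 2 * i2, with i2
 * being a signed halfword count. */
static uint64_t rel_target64(uint64_t pc, int32_t i2)
{
    return pc + (uint64_t)((int64_t)i2 * 2);
}

/* Same calculation, but wrapped at 32 bits like the FIXME line does. */
static uint64_t rel_target32(uint64_t pc, int32_t i2)
{
    return rel_target64(pc, i2) & 0xffffffffULL;
}
```

The two only differ when the 64-bit sum leaves the low 4 GB, for example
with a backward offset that goes below address zero.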

> > +static inline uint64_t ld_code2(uint64_t pc) { return
> > (uint64_t)lduw_code(pc); }
>
> Coding style.

OK.

> > +#if 1
> > +    cc = tcg_temp_local_new_i32();
> > +    tcg_gen_mov_i32(cc, global_cc);
> > +#else
> > +    cc = global_cc;
> > +#endif
>
> Why this?

Because it didn't work otherwise. I don't claim to understand the 
mysterious ways of the TCG to its full extent...

> > +    } while (!dc.is_jmp && gen_opc_ptr < gen_opc_end && dc.pc <
> > next_page_start && num_insns < max_insns);
>
> The translation should also be stopped in singlestep mode
> (singlestep variable).

OK.

Phew, that was a big one. :)

Thanks for reviewing. I'll fix what makes sense to me and send a new one 
when I'm done.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: [Qemu-devel] [PATCH 1/9] TCG "sync" op
  2009-10-16 17:29       ` Aurelien Jarno
@ 2009-10-19 17:17         ` Ulrich Hecht
  0 siblings, 0 replies; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-19 17:17 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: riku.voipio, qemu-devel, agraf

On Friday 16 October 2009, Aurelien Jarno wrote:
> This example is a bit biased, as registers are only saved, and never
> reused. Let's comment on it though.

Yeah, well, I searched from the top for the first case where it makes a 
difference. If it's of any help, I can upload a complete dump of both 
versions somewhere.

> and I don't understand what is the gain compared to 
> the use of tcg_gen_ld/st.

There are two sets of TCG values, tcgregs (which would arguably be better 
called tcgregs64) and tcgregs32. When doing a 32-bit access, tcgregs(64) 
is synced, which is a nop if tcgregs(64) hasn't been touched. When doing 
a 64-bit access, tcgregs32 is synced, which is a nop if tcgregs32 hasn't 
been touched. In practice, 32-bit accesses followed by 64-bit accesses 
and vice versa are very rare, so in most cases, sync is a nop. 
tcg_gen_ld/st is never a nop. That's the benefit.
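
The idea can be modelled in plain C as below. This is a conceptual
sketch only, not the TCG op from the patch: the struct, the dirty flags
and the copies counter are all invented here to show when a sync does
real work and when it is a nop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one guest register with two aliased views. */
struct reg_views {
    uint64_t v64;      /* 64-bit view (tcgregs in the description) */
    uint32_t v32;      /* 32-bit view (tcgregs32 in the description) */
    bool dirty64;      /* v64 written since v32 was last updated */
    bool dirty32;      /* v32 written since v64 was last updated */
    unsigned copies;   /* counts syncs that actually moved data */
};

/* Done before a 32-bit access: bring v32 up to date. */
static void sync_from64(struct reg_views *r)
{
    if (!r->dirty64) {
        return;        /* 64-bit view untouched: nop */
    }
    r->v32 = (uint32_t)r->v64;
    r->dirty64 = false;
    r->copies++;
}

/* Done before a 64-bit access: merge v32 into the low half of v64. */
static void sync_from32(struct reg_views *r)
{
    if (!r->dirty32) {
        return;        /* 32-bit view untouched: nop */
    }
    r->v64 = (r->v64 & 0xffffffff00000000ULL) | r->v32;
    r->dirty32 = false;
    r->copies++;
}

static void write32(struct reg_views *r, uint32_t val)
{
    sync_from64(r);
    r->v32 = val;
    r->dirty32 = true;
}

static void write64(struct reg_views *r, uint64_t val)
{
    sync_from32(r);
    r->v64 = val;
    r->dirty64 = true;
}

static uint64_t read64(struct reg_views *r)
{
    sync_from32(r);
    return r->v64;
}
```

A run of accesses of the same width never increments copies; only a
switch between 32-bit and 64-bit accesses does.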

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: [Qemu-devel] [PATCH 1/9] TCG "sync" op
  2009-10-17  8:59     ` Edgar E. Iglesias
@ 2009-10-19 17:17       ` Ulrich Hecht
  0 siblings, 0 replies; 26+ messages in thread
From: Ulrich Hecht @ 2009-10-19 17:17 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: riku.voipio, qemu-devel, aurelien, agraf

On Saturday 17 October 2009, Edgar E. Iglesias wrote:
> I looked at the s390 patches and was also unsure about this sync op.
> I'm not convinced it's bad but my first feeling was as Aurelien points
> out that the translator should take care of it.

Indeed. I would have expected it to, in fact. But it doesn't. The sync op 
is the simplest and quickest way to get what I want that I could come up 
with. I'd be perfectly happy if TCG could handle aliases on its own, but 
doing a lot of ALU operations on every register access is not an option.

> Another thing I noticed was the large amount of helpers. Without
> looking at the details my feeling was that you could probably do more
> at translation time.

My experience is that helper functions have an undeservedly bad image. A 
pure const helper call is very easy to optimize away for TCG. Random bit 
shifting, comparing and branching isn't.

> Nice work!

Thank you.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-10-19 17:17       ` Ulrich Hecht
@ 2009-10-22 21:28         ` Aurelien Jarno
  2009-11-02 15:16           ` Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Aurelien Jarno @ 2009-10-22 21:28 UTC (permalink / raw)
  To: Ulrich Hecht; +Cc: riku.voipio, qemu-devel, agraf

On Mon, Oct 19, 2009 at 07:17:18PM +0200, Ulrich Hecht wrote:
> On Saturday 17 October 2009, Aurelien Jarno wrote:
> > On Fri, Oct 16, 2009 at 02:38:48PM +0200, Ulrich Hecht wrote:
> > First of all a few general comments. Note that I know very few things
> > about S390/S390X, so I may have dumb comments/questions. Also as the
> > patch is very long, I probably have missed things, we should probably
> > iterate with new versions of the patches.
> >
> > Is it possible given the current implementation to emulate S390 in
> > addition to S390X?
> 
> Not as it stands. The S/390 instruction set is a subset of zArch 
> (64-bit), but currently only 64-bit addressing is implemented (because 
> that is the only mode used by 64-bit Linux binaries).
> 
> > If yes how different would be a S390 only target? 
> 
> It would need support for 31-bit addressing. 24-bit, too, if you want to 
> run the really old stuff.
> 
> > If they won't be too different, it is probably worth using _tl type
> > registers and call it target-s390. A bit the way it is done with
> > i386/x86_64, ppc/ppc64, mips/mips64 or sparc/sparc64.
> 
> If it's going to be implemented, it will certainly share most code with 
> the 64-bit target, so having a common directory would most likely be a 
> good idea.
> 
> There are some cases where _tl types might be appropriate, but generally 
> everything that is shared between S/390 and zArch is 32-bit, and 
> instructions present in both architectures also have the same operand 
> size: L, for instance, is a 32-bit load to a 32-bit register on both 
> architectures.

Ok, so I think we should go for putting the code in the target-s390 
directory. As no code is shared for now, we can just ignore the _tl
stuff.

> > Secondly there seems to be a lot of mix between 32-bit and 64-bit
> > TCG registers. They should not be mixed. _i32 ops apply to TCGv_i32
> > variables, _i64 ops apply to TCGv_i64 variables, and _tl ops to TCGv
> > variables. TCGv/_tl types can map either to _i32 or _i64 depending on
> > your target. You should try to build the target with
> > --enable-debug-tcg
> 
> Er, reading the code and observing the behavior of the (AMD64) code 
> generator, it seemed to me that neither frontends nor backends care one 
> bit about that. I'll look into it, though. (I won't comment on those 
> below, except in cases where the solution or, indeed, the problem is not 
> clear to me.)

It is not strictly needed on AMD64, because a move between 32-bit and
64-bit registers also zero-extends the value. It is definitely needed on
other hosts (and most probably on s390x, as I understand it).
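
To illustrate the point, here is a minimal sketch in plain C (not QEMU
code; all function names are made up for this example). It models a
64-bit host register receiving a 32-bit value. The first function
behaves like a 32-bit move on x86-64, which clears the upper half. The
second models a hypothetical host that leaves the upper half alone, in
which case stale bits survive unless an explicit extension is emitted.

```c
#include <stdint.h>

/* Model of a 32-bit move on x86-64: writing the low 32 bits of a
   64-bit register clears the upper 32 bits. */
static uint64_t mov32_zero_extend(uint64_t old_reg, uint32_t val)
{
    (void)old_reg;              /* previous contents are discarded */
    return (uint64_t)val;
}

/* Hypothetical host on which a 32-bit move leaves the upper half of
   the register untouched. */
static uint64_t mov32_keep_upper(uint64_t old_reg, uint32_t val)
{
    return (old_reg & 0xffffffff00000000ULL) | val;
}

/* What an explicit 32-to-64-bit zero extension has to do. */
static uint64_t ext32u(uint64_t reg)
{
    return reg & 0xffffffffULL;
}
```

On the second kind of host, code that treats a 32-bit result as a
64-bit value only sees the expected number after something like
ext32u() has been applied, which is why the variable types need to
match the ops.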

> > It would be nice to split all the disassembling part into a separate
> > patch, combined with the related makefile changes. This way it can be
> > applied separately.
> 
> Can do.
> 
> > >  //  switch (info->mach)
> > >  //    {
> > >  //    case bfd_mach_s390_31:
> > > -      current_arch_mask = 1 << S390_OPCODE_ESA;
> > > +//      current_arch_mask = 1 << S390_OPCODE_ESA;
> > >  //      break;
> > >  //    case bfd_mach_s390_64:
> > > -//      current_arch_mask = 1 << S390_OPCODE_ZARCH;
> > > +      current_arch_mask = 1 << S390_OPCODE_ZARCH;
> > >  //      break;
> > >  //    default:
> > >  //      abort ();
> >
> > While I understand the second part, Why is it necessary to comment the
> > bfd_mach_s390_31 case?
> 
> It's not necessary, but current_arch_mask can only hold one value...

I'll answer in the new version of the patch.

> 
> > > +    /* FIXME: reset vector? */
> >
> > Yes, reset vector, MMU mode, default register values, ...
> 
> Works perfectly fine without. ;)
> 
> Also, the question remains what kind of reset cpu_reset() is supposed to 
> be. For instance, a CPU Reset or Initial CPU Reset on S/390 leaves the 
> general-purpose registers unchanged, a Clear Reset or Power-On Reset 
> sets them to zero. See the table on p. 4-50 in the zArchitecture 
> Principles of Operation (POP), 
> http://publibfp.boulder.ibm.com/cgi-bin/bookmgr/download/A2278324.pdf, 
> for the gory details.

I don't know which is best; a CPU Reset is probably enough. Just
think of it as the code that runs when someone presses the reset
button.
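
A rough sketch of the distinction, in plain C with a made-up and
heavily reduced state struct (this is not the CPUS390XState from the
patch). The only behaviour taken from this thread is that a CPU Reset
leaves the general-purpose registers unchanged while a Clear Reset
zeroes them. The pending_exception field is a placeholder for whatever
else a reset clears and is not taken from the POP.

```c
#include <stdint.h>
#include <string.h>

#define NUM_GPRS 16

/* Hypothetical reduced CPU state, for illustration only. */
typedef struct {
    uint64_t gprs[NUM_GPRS];
    int pending_exception;      /* placeholder field, an assumption */
} ToyCPUState;

typedef enum { TOY_CPU_RESET, TOY_CLEAR_RESET } ToyResetType;

static void toy_cpu_reset(ToyCPUState *env, ToyResetType type)
{
    /* Cleared by both reset types in this sketch. */
    env->pending_exception = 0;

    /* Only the Clear Reset zeroes the general-purpose registers. */
    if (type == TOY_CLEAR_RESET) {
        memset(env->gprs, 0, sizeof(env->gprs));
    }
}
```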

> > > +    for (i = 0; i <= l; i++) {
> > > +        x = ldub(dest + i) & ldub(src + i);
> > > +        if (x) cc = 1;
> >
> > coding style
> 
> Maybe you are referring to "Every indented statement is braced; even if 
> the block contains just one statement.", but I figured that that does 
> not apply here because there is no indentation...

You are basically trying to work around the coding style. The idea behind
it is to write:
if (x) {
    cc = 1;
}

> > All the branch part would really gain to be coded in TCG, as it will
> > allow TB chaining.
> 
> I vaguely remember trying that as well, but I don't know if I gave up on 
> it because it was slower, or because I couldn't get it to work...

Probably the second. Changing the instruction pointer in the helper
instead of using the proper goto_tb TCG op prevents TB chaining, and
therefore has a huge impact on performance.

It's something not difficult to implement, and that I would definitely
want to see in the patch before getting it merged.
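
To make the cost concrete, here is a toy model in plain C, not QEMU
code. Every name in it is invented for the example. Real chaining
patches jumps in the generated code; this sketch only counts how often
the slow "find the next block" path is taken with and without a direct
link between blocks.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_TBS 4

typedef struct ToyTB {
    uint64_t pc;            /* guest address this block starts at */
    uint64_t next_pc;       /* guest address the block branches to */
    struct ToyTB *chained;  /* direct link to the next block, if patched */
} ToyTB;

/* Slow path: find the block for a guest pc, counting every lookup. */
static ToyTB *toy_tb_find(ToyTB *tbs, uint64_t pc, unsigned *lookups)
{
    (*lookups)++;
    for (size_t i = 0; i < TOY_NUM_TBS; i++) {
        if (tbs[i].pc == pc) {
            return &tbs[i];
        }
    }
    return NULL;
}

/* Execute up to `steps` blocks starting at `pc` and return the number
   of lookups that were needed. */
static unsigned toy_run(ToyTB *tbs, uint64_t pc, unsigned steps, int chaining)
{
    unsigned lookups = 0;
    ToyTB *tb = toy_tb_find(tbs, pc, &lookups);

    while (tb != NULL && steps-- > 0) {
        ToyTB *next = tb->chained;
        if (next == NULL) {
            next = toy_tb_find(tbs, tb->next_pc, &lookups);
            if (chaining) {
                tb->chained = next;  /* remember the successor */
            }
        }
        tb = next;
    }
    return lookups;
}
```

With four blocks forming a loop and 1000 block executions, the
unchained run performs 1001 lookups and the chained run 5. How much
that saves in practice depends on how expensive the lookup is compared
to the work done inside each block.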

> > It should probably be 
> > rewritten as load_reg(TCGv t, int reg).
> 
> Later. This is the kind of change that you never get right on the first 
> try, so I'd like to do that after everything went upstream.

It also means a huge patch that is difficult to review when it is done
after the initial merge, as the changes need to be checked in context.

 
> > > +        tmp = tcg_const_i32(r1);
> >
> > Wrong TCGv type.
> >
> > > +        gen_helper_mlg(tmp, tmp2);
> 
> Huh? It's a register number. There are only 16 GP registers. Fits a 
> 32-bit value, last time I checked. The helper also expects a 32-bit 
> value.

Then a TCGv_i32 variable should be used. TCGv is 64-bit, as the whole
target is 64-bit.

> > > +    if (tmp2) tcg_temp_free(tmp2);
> > > +    if (tmp3) tcg_temp_free(tmp3);
> >
> > Comparison on TCGv type is not allowed.
> 
> Not even to TCGV_UNUSED?

TCGV_UNUSED() is a macro to set the value, not to retrieve it.

> > > +    LOG_DISAS("disas_c0: op 0x%x r1 %d i2 %d\n", op, r1, i2);
> > > +    uint64_t target = s->pc + i2 * 2;
> > > +    /* FIXME: huh? */ target &= 0xffffffff;
> >
> > That should be fixed before a merge.
> 
> Tricky, because I have not yet understood why it is necessary. According 
> to the POP, the address calculation in 64-bit addressing mode should be 
> 64 bits. In practice, on a real machine running a real Linux kernel, it 
> wraps around at 32 bits, and userland code actually relies on that. No 
> clue what I could be missing here...

Does it mean that userspace is actually limited to 32-bit addressing?
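
For reference, the arithmetic in question can be reproduced in
isolation. This is a plain C sketch with made-up function names. It
only shows what the mask in the FIXME line does to the computed target,
not why real hardware or the kernel would behave that way.

```c
#include <stdint.h>

/* Relative branch target computed with full 64-bit arithmetic: the
   signed halfword offset i2 is doubled and added to pc. */
static uint64_t toy_target64(uint64_t pc, int32_t i2)
{
    return pc + (uint64_t)((int64_t)i2 * 2);
}

/* Same computation followed by the 32-bit wrap from the FIXME line. */
static uint64_t toy_target_wrap32(uint64_t pc, int32_t i2)
{
    return toy_target64(pc, i2) & 0xffffffffULL;
}
```

A forward offset that stays below 4 GiB gives the same result either
way. A backward offset larger than pc yields 0xfffffffffffff000 in the
first function and 0xfffff000 in the second for pc = 0x1000 and
i2 = -0x1000, so the two only differ when the sum leaves the low
4 GiB.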

> > > +#if 1
> > > +    cc = tcg_temp_local_new_i32();
> > > +    tcg_gen_mov_i32(cc, global_cc);
> > > +#else
> > > +    cc = global_cc;
> > > +#endif
> >
> > Why this?
> 
> Because it didn't work otherwise. I don't claim to understand the 
> mysterious ways of the TCG to its full extent...

What do you mean by didn't work? TCG error? Wrong emulation? Something
else?

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-10-22 21:28         ` Aurelien Jarno
@ 2009-11-02 15:16           ` Ulrich Hecht
  2009-11-02 18:42             ` Aurelien Jarno
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-11-02 15:16 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel, agraf

[-- Attachment #1: Type: text/plain, Size: 896 bytes --]

On Thursday 22 October 2009, Aurelien Jarno wrote:
> Probably the second. Changing the instruction pointer in the helper
> instead of using the proper goto_tb TCG op prevents TB chaining, and
> therefore has a huge impact on performance.
>
> It's something not difficult to implement, and that I would definitely
> want to see in the patch before getting it merged.

OK, I implemented it, and the surprising result is that performance  does 
not get any better; in fact it even suffers a little bit. (My standard 
quick test, the polarssl test suite, shows about a 2% performance impact 
when profiled with cachegrind).

Could there be anything I overlooked? I modelled my implementation after 
those in the existing targets. (See the attached patch that goes on top 
of my other S/390 patches.)

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

[-- Attachment #2: s390-goto_tb.patch --]
[-- Type: text/x-diff, Size: 5423 bytes --]

diff --git a/target-s390/helpers.h b/target-s390/helpers.h
index 0d16760..6009312 100644
--- a/target-s390/helpers.h
+++ b/target-s390/helpers.h
@@ -15,7 +15,6 @@ DEF_HELPER_FLAGS_1(set_cc_comp_s64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64)
 DEF_HELPER_FLAGS_1(set_cc_nz_u32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32)
 DEF_HELPER_FLAGS_1(set_cc_nz_u64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64)
 DEF_HELPER_FLAGS_2(set_cc_icm, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32)
-DEF_HELPER_4(brc, void, i32, i32, i64, s32)
 DEF_HELPER_3(brctg, void, i64, i64, s32)
 DEF_HELPER_3(brct, void, i32, i64, s32)
 DEF_HELPER_4(brcl, void, i32, i32, i64, s64)
diff --git a/target-s390/op_helper.c b/target-s390/op_helper.c
index 637d22f..f7f52ba 100644
--- a/target-s390/op_helper.c
+++ b/target-s390/op_helper.c
@@ -218,17 +218,6 @@ uint32_t HELPER(set_cc_icm)(uint32_t mask, uint32_t val)
     return cc;
 }
 
-/* relative conditional branch */
-void HELPER(brc)(uint32_t cc, uint32_t mask, uint64_t pc, int32_t offset)
-{
-    if ( mask & ( 1 << (3 - cc) ) ) {
-        env->psw.addr = pc + offset;
-    }
-    else {
-        env->psw.addr = pc + 4;
-    }
-}
-
 /* branch relative on 64-bit count (condition is computed inline, this only
    does the branch */
 void HELPER(brctg)(uint64_t flag, uint64_t pc, int32_t offset)
diff --git a/target-s390/translate.c b/target-s390/translate.c
index 9ffa7bd..5a7cfe7 100644
--- a/target-s390/translate.c
+++ b/target-s390/translate.c
@@ -49,6 +49,7 @@ struct DisasContext {
     uint64_t pc;
     int is_jmp;
     CPUS390XState *env;
+    struct TranslationBlock *tb;
 };
 
 #define DISAS_EXCP 4
@@ -359,23 +360,55 @@ static void gen_bcr(uint32_t mask, int tr, uint64_t offset)
     tcg_temp_free(target);
 }
 
-static void gen_brc(uint32_t mask, uint64_t pc, int32_t offset)
+static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong pc)
 {
-    TCGv p;
-    TCGv_i32 m, o;
+    TranslationBlock *tb;
+
+    tb = s->tb;
+    /* NOTE: we handle the case where the TB spans two pages here */
+    if ((pc & TARGET_PAGE_MASK) == (tb->pc & TARGET_PAGE_MASK) ||
+        (pc & TARGET_PAGE_MASK) == ((s->pc - 1) & TARGET_PAGE_MASK))  {
+        /* jump to same page: we can use a direct jump */
+        tcg_gen_mov_i32(global_cc, cc);
+        tcg_gen_goto_tb(tb_num);
+        tcg_gen_movi_i64(psw_addr, pc);
+        tcg_gen_exit_tb((long)tb + tb_num);
+    } else {
+        /* jump to another page: currently not optimized */
+        tcg_gen_movi_i64(psw_addr, pc);
+        tcg_gen_mov_i32(global_cc, cc);
+        tcg_gen_exit_tb(0);
+    }
+}
+
+static void gen_brc(uint32_t mask, DisasContext *s, int32_t offset)
+{
+    TCGv_i32 r;
+    TCGv_i32 tmp, tmp2;
+    int skip;
     
     if (mask == 0xf) {	/* unconditional */
-      tcg_gen_movi_i64(psw_addr, pc + offset);
+      //tcg_gen_movi_i64(psw_addr, s->pc + offset);
+      gen_goto_tb(s, 0, s->pc + offset);
     }
     else {
-      m = tcg_const_i32(mask);
-      p = tcg_const_i64(pc);
-      o = tcg_const_i32(offset);
-      gen_helper_brc(cc, m, p, o);
-      tcg_temp_free(m);
-      tcg_temp_free(p);
-      tcg_temp_free(o);
+      tmp = tcg_const_i32(3);
+      tcg_gen_sub_i32(tmp, tmp, cc);	/* 3 - cc */
+      tmp2 = tcg_const_i32(1);
+      tcg_gen_shl_i32(tmp2, tmp2, tmp);	/* 1 << (3 - cc) */
+      r = tcg_const_i32(mask);
+      tcg_gen_and_i32(r, r, tmp2);	/* mask & (1 << (3 - cc)) */
+      tcg_temp_free(tmp);
+      tcg_temp_free(tmp2);
+      skip = gen_new_label();
+      tcg_gen_brcondi_i32(TCG_COND_EQ, r, 0, skip);
+      gen_goto_tb(s, 0, s->pc + offset);
+      gen_set_label(skip);
+      gen_goto_tb(s, 1, s->pc + 4);
+      tcg_gen_mov_i32(global_cc, cc);
+      tcg_temp_free(r);
     }
+    s->is_jmp = DISAS_TB_JUMP;
 }
 
 static void gen_set_cc_add64(TCGv v1, TCGv v2, TCGv vr)
@@ -1143,9 +1176,7 @@ static void disas_a7(DisasContext *s, int op, int r1, int i2)
         tcg_temp_free(tmp2);
         break;
     case 0x4: /* brc m1, i2 */
-        /* FIXME: optimize m1 == 0xf (unconditional) case */
-        gen_brc(r1, s->pc, i2 * 2);
-        s->is_jmp = DISAS_JUMP;
+        gen_brc(r1, s, i2 * 2);
         return;
     case 0x5: /* BRAS     R1,I2     [RI] */
         tmp = tcg_const_i64(s->pc + 4);
@@ -2739,6 +2770,7 @@ static inline void gen_intermediate_code_internal (CPUState *env,
     dc.env = env;
     dc.pc = pc_start;
     dc.is_jmp = DISAS_NEXT;
+    dc.tb = tb;
     
     gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
     
@@ -2778,8 +2810,11 @@ static inline void gen_intermediate_code_internal (CPUState *env,
         num_insns++;
     } while (!dc.is_jmp && gen_opc_ptr < gen_opc_end && dc.pc < next_page_start
              && num_insns < max_insns && !env->singlestep_enabled);
-    tcg_gen_mov_i32(global_cc, cc);
-    tcg_temp_free(cc);
+
+    if (dc.is_jmp != DISAS_TB_JUMP) {
+        tcg_gen_mov_i32(global_cc, cc);
+        tcg_temp_free(cc);
+    }
     
     if (!dc.is_jmp) {
         tcg_gen_st_i64(tcg_const_i64(dc.pc), cpu_env, offsetof(CPUState, psw.addr));
@@ -2794,7 +2829,9 @@ static inline void gen_intermediate_code_internal (CPUState *env,
     if (tb->cflags & CF_LAST_IO)
         gen_io_end();
     /* Generate the return instruction */
-    tcg_gen_exit_tb(0);
+    if (dc.is_jmp != DISAS_TB_JUMP) {
+        tcg_gen_exit_tb(0);
+    }
     gen_icount_end(tb, num_insns);
     *gen_opc_ptr = INDEX_op_end;
     if (search_pc) {


* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-11-02 15:16           ` Ulrich Hecht
@ 2009-11-02 18:42             ` Aurelien Jarno
  2009-11-02 19:03               ` Laurent Desnogues
  0 siblings, 1 reply; 26+ messages in thread
From: Aurelien Jarno @ 2009-11-02 18:42 UTC (permalink / raw)
  To: Ulrich Hecht; +Cc: qemu-devel, agraf

On Mon, Nov 02, 2009 at 05:16:44PM +0200, Ulrich Hecht wrote:
> On Thursday 22 October 2009, Aurelien Jarno wrote:
> > Probably the second. Changing the instruction pointer in the helper
> > instead of using the proper goto_tb TCG op prevents TB chaining, and
> > therefore has a huge impact on performance.
> >
> > It's something not difficult to implement, and that I would definitely
> > want to see in the patch before getting it merged.
> 
> OK, I implemented it, and the surprising result is that performance  does 
> not get any better; in fact it even suffers a little bit. (My standard 
> quick test, the polarssl test suite, shows about a 2% performance impact 
> when profiled with cachegrind).

That looks really strange, as TB chaining clearly reduces the number of
instructions to execute, by not having to look up the TB after each
branch. Also, using a brcond instead of a helper should change nothing, as
it is located at the end of the TB, where all the globals must be saved
anyway.
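
For readers following along, the condition that both the removed helper
and the new brcond sequence evaluate is the expression
mask & (1 << (3 - cc)) from the patch. A standalone C version, with a
made-up function name, looks like this:

```c
#include <stdint.h>

/* Returns nonzero when a BRC with the 4-bit `mask` is taken for
   condition code `cc` (0..3). Bit 3 of the mask selects cc 0 and
   bit 0 selects cc 3, as in the expression used by the patch. */
static int toy_brc_taken(uint32_t mask, uint32_t cc)
{
    return (mask & (1u << (3 - cc))) != 0;
}
```

A mask of 0xf is taken for every condition code, which is why the patch
treats it as the unconditional case and skips the test entirely.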

Also, a recent bug found on the ARM host with regard to TB chaining has
shown that it can give a noticeable speed gain.

On what host are you doing your benchmarks?

> Could there be anything I overlooked? I modelled my implementation after 
> those in the existing targets. (See the attached patch that goes on top 
> of my other S/390 patches.)
> 

Your patch looks good overall. I have minor comments (see below),
but nothing that should improve the speed noticeably.

> -- 
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)

> diff --git a/target-s390/helpers.h b/target-s390/helpers.h
> index 0d16760..6009312 100644
> --- a/target-s390/helpers.h
> +++ b/target-s390/helpers.h
> @@ -15,7 +15,6 @@ DEF_HELPER_FLAGS_1(set_cc_comp_s64, TCG_CALL_PURE|TCG_CALL_CONST, i32, s64)
>  DEF_HELPER_FLAGS_1(set_cc_nz_u32, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32)
>  DEF_HELPER_FLAGS_1(set_cc_nz_u64, TCG_CALL_PURE|TCG_CALL_CONST, i32, i64)
>  DEF_HELPER_FLAGS_2(set_cc_icm, TCG_CALL_PURE|TCG_CALL_CONST, i32, i32, i32)
> -DEF_HELPER_4(brc, void, i32, i32, i64, s32)
>  DEF_HELPER_3(brctg, void, i64, i64, s32)
>  DEF_HELPER_3(brct, void, i32, i64, s32)
>  DEF_HELPER_4(brcl, void, i32, i32, i64, s64)
> diff --git a/target-s390/op_helper.c b/target-s390/op_helper.c
> index 637d22f..f7f52ba 100644
> --- a/target-s390/op_helper.c
> +++ b/target-s390/op_helper.c
> @@ -218,17 +218,6 @@ uint32_t HELPER(set_cc_icm)(uint32_t mask, uint32_t val)
>      return cc;
>  }
>  
> -/* relative conditional branch */
> -void HELPER(brc)(uint32_t cc, uint32_t mask, uint64_t pc, int32_t offset)
> -{
> -    if ( mask & ( 1 << (3 - cc) ) ) {
> -        env->psw.addr = pc + offset;
> -    }
> -    else {
> -        env->psw.addr = pc + 4;
> -    }
> -}
> -
>  /* branch relative on 64-bit count (condition is computed inline, this only
>     does the branch */
>  void HELPER(brctg)(uint64_t flag, uint64_t pc, int32_t offset)
> diff --git a/target-s390/translate.c b/target-s390/translate.c
> index 9ffa7bd..5a7cfe7 100644
> --- a/target-s390/translate.c
> +++ b/target-s390/translate.c
> @@ -49,6 +49,7 @@ struct DisasContext {
>      uint64_t pc;
>      int is_jmp;
>      CPUS390XState *env;
> +    struct TranslationBlock *tb;
>  };
>  
>  #define DISAS_EXCP 4
> @@ -359,23 +360,55 @@ static void gen_bcr(uint32_t mask, int tr, uint64_t offset)
>      tcg_temp_free(target);
>  }
>  
> -static void gen_brc(uint32_t mask, uint64_t pc, int32_t offset)
> +static inline void gen_goto_tb(DisasContext *s, int tb_num, target_ulong pc)
>  {
> -    TCGv p;
> -    TCGv_i32 m, o;
> +    TranslationBlock *tb;
> +
> +    tb = s->tb;
> +    /* NOTE: we handle the case where the TB spans two pages here */
> +    if ((pc & TARGET_PAGE_MASK) == (tb->pc & TARGET_PAGE_MASK) ||
> +        (pc & TARGET_PAGE_MASK) == ((s->pc - 1) & TARGET_PAGE_MASK))  {

I have difficulty figuring out why the second comparison is needed. I
know it comes from target-i386, but on the other hand it is not present
in other targets.
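
To look at the check in isolation, here is a standalone C sketch with
made-up names and an assumed 4 KiB page size. It mirrors the two
comparisons from gen_goto_tb() above: the destination is compared
against the page where the TB starts and against the page holding the
last byte translated so far.

```c
#include <stdint.h>

#define TOY_PAGE_BITS 12    /* assumption: 4 KiB pages */
#define TOY_PAGE_MASK (~(((uint64_t)1 << TOY_PAGE_BITS) - 1))

static int toy_same_page(uint64_t a, uint64_t b)
{
    return (a & TOY_PAGE_MASK) == (b & TOY_PAGE_MASK);
}

/* Mirrors the condition in the patch: a direct jump is used if `dest`
   lies on the page where the TB starts (tb_pc) or on the page that
   contains the byte just before the current pc (cur_pc - 1). */
static int toy_can_chain(uint64_t tb_pc, uint64_t cur_pc, uint64_t dest)
{
    return toy_same_page(dest, tb_pc) || toy_same_page(dest, cur_pc - 1);
}
```

In this model the second comparison only changes the result when the TB
started on one page and the current pc has moved onto the next one,
which matches the NOTE comment in the patch.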

> +        /* jump to same page: we can use a direct jump */
> +        tcg_gen_mov_i32(global_cc, cc);
> +        tcg_gen_goto_tb(tb_num);
> +        tcg_gen_movi_i64(psw_addr, pc);
> +        tcg_gen_exit_tb((long)tb + tb_num);
> +    } else {
> +        /* jump to another page: currently not optimized */
> +        tcg_gen_movi_i64(psw_addr, pc);
> +        tcg_gen_mov_i32(global_cc, cc);
> +        tcg_gen_exit_tb(0);
> +    }
> +}
> +
> +static void gen_brc(uint32_t mask, DisasContext *s, int32_t offset)
> +{
> +    TCGv_i32 r;
> +    TCGv_i32 tmp, tmp2;
> +    int skip;
>      
>      if (mask == 0xf) {	/* unconditional */
> -      tcg_gen_movi_i64(psw_addr, pc + offset);
> +      //tcg_gen_movi_i64(psw_addr, s->pc + offset);
> +      gen_goto_tb(s, 0, s->pc + offset);
>      }
>      else {
> -      m = tcg_const_i32(mask);
> -      p = tcg_const_i64(pc);
> -      o = tcg_const_i32(offset);
> -      gen_helper_brc(cc, m, p, o);
> -      tcg_temp_free(m);
> -      tcg_temp_free(p);
> -      tcg_temp_free(o);
> +      tmp = tcg_const_i32(3);
> +      tcg_gen_sub_i32(tmp, tmp, cc);	/* 3 - cc */
> +      tmp2 = tcg_const_i32(1);
> +      tcg_gen_shl_i32(tmp2, tmp2, tmp);	/* 1 << (3 - cc) */
> +      r = tcg_const_i32(mask);
> +      tcg_gen_and_i32(r, r, tmp2);	/* mask & (1 << (3 - cc)) */
> +      tcg_temp_free(tmp);
> +      tcg_temp_free(tmp2);
> +      skip = gen_new_label();
> +      tcg_gen_brcondi_i32(TCG_COND_EQ, r, 0, skip);
> +      gen_goto_tb(s, 0, s->pc + offset);
> +      gen_set_label(skip);
> +      gen_goto_tb(s, 1, s->pc + 4);
> +      tcg_gen_mov_i32(global_cc, cc);

This is probably not needed as this is already done in gen_goto_tb().

> +      tcg_temp_free(r);
>      }
> +    s->is_jmp = DISAS_TB_JUMP;
>  }
>  
>  static void gen_set_cc_add64(TCGv v1, TCGv v2, TCGv vr)
> @@ -1143,9 +1176,7 @@ static void disas_a7(DisasContext *s, int op, int r1, int i2)
>          tcg_temp_free(tmp2);
>          break;
>      case 0x4: /* brc m1, i2 */
> -        /* FIXME: optimize m1 == 0xf (unconditional) case */
> -        gen_brc(r1, s->pc, i2 * 2);
> -        s->is_jmp = DISAS_JUMP;
> +        gen_brc(r1, s, i2 * 2);
>          return;
>      case 0x5: /* BRAS     R1,I2     [RI] */
>          tmp = tcg_const_i64(s->pc + 4);
> @@ -2739,6 +2770,7 @@ static inline void gen_intermediate_code_internal (CPUState *env,
>      dc.env = env;
>      dc.pc = pc_start;
>      dc.is_jmp = DISAS_NEXT;
> +    dc.tb = tb;
>      
>      gen_opc_end = gen_opc_buf + OPC_MAX_SIZE;
>      
> @@ -2778,8 +2810,11 @@ static inline void gen_intermediate_code_internal (CPUState *env,
>          num_insns++;
>      } while (!dc.is_jmp && gen_opc_ptr < gen_opc_end && dc.pc < next_page_start
>               && num_insns < max_insns && !env->singlestep_enabled);
> -    tcg_gen_mov_i32(global_cc, cc);
> -    tcg_temp_free(cc);
> +
> +    if (dc.is_jmp != DISAS_TB_JUMP) {
> +        tcg_gen_mov_i32(global_cc, cc);
> +        tcg_temp_free(cc);
> +    }
>      
>      if (!dc.is_jmp) {
>          tcg_gen_st_i64(tcg_const_i64(dc.pc), cpu_env, offsetof(CPUState, psw.addr));
> @@ -2794,7 +2829,9 @@ static inline void gen_intermediate_code_internal (CPUState *env,
>      if (tb->cflags & CF_LAST_IO)
>          gen_io_end();
>      /* Generate the return instruction */
> -    tcg_gen_exit_tb(0);
> +    if (dc.is_jmp != DISAS_TB_JUMP) {
> +        tcg_gen_exit_tb(0);
> +    }
>      gen_icount_end(tb, num_insns);
>      *gen_opc_ptr = INDEX_op_end;
>      if (search_pc) {


-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-11-02 18:42             ` Aurelien Jarno
@ 2009-11-02 19:03               ` Laurent Desnogues
  2009-11-09 16:55                 ` Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Laurent Desnogues @ 2009-11-02 19:03 UTC (permalink / raw)
  To: Aurelien Jarno, Ulrich Hecht; +Cc: qemu-devel, agraf

On Mon, Nov 2, 2009 at 7:42 PM, Aurelien Jarno <aurelien@aurel32.net> wrote:
> On Mon, Nov 02, 2009 at 05:16:44PM +0200, Ulrich Hecht wrote:
>> On Thursday 22 October 2009, Aurelien Jarno wrote:
>> > Probably the second. Changing the instruction pointer in the helper
>> > instead of using the proper goto_tb TCG op prevents TB chaining, and
>> > therefore has a huge impact on performance.
>> >
>> > It's something not difficult to implement, and that I would definitely
>> > want to see in the patch before getting it merged.
>>
>> OK, I implemented it, and the surprising result is that performance  does
>> not get any better; in fact it even suffers a little bit. (My standard
>> quick test, the polarssl test suite, shows about a 2% performance impact
>> when profiled with cachegrind).
>
> That looks really strange, as TB chaining clearly reduces the number of
> instructions to execute, by not having to look up the TB after each
> branch. Also, using a brcond instead of a helper should change nothing, as
> it is located at the end of the TB, where all the globals must be saved
> anyway.
>
> Also, a recent bug found on the ARM host with regard to TB chaining has
> shown that it can give a noticeable speed gain.

That indeed looks strange:  fixing the TB chaining on ARM
made nbench i386 three times faster.  Note the gain was
less for FP parts of the benchmark due to the use of
helpers.

Ulrich,

out of curiosity could you post your tb_set_jmp_target1
function?  The only thing I can think of at the moment that
could make the code slower is that the program you ran
was not reusing blocks and/or cache flushing in
tb_set_jmp_target1 is overkill.


Laurent


* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-11-02 19:03               ` Laurent Desnogues
@ 2009-11-09 16:55                 ` Ulrich Hecht
  2009-11-10 16:02                   ` Aurelien Jarno
  0 siblings, 1 reply; 26+ messages in thread
From: Ulrich Hecht @ 2009-11-09 16:55 UTC (permalink / raw)
  To: Laurent Desnogues; +Cc: qemu-devel, Aurelien Jarno, agraf

On Monday 02 November 2009, Laurent Desnogues wrote:
> That indeed looks strange:  fixing the TB chaining on ARM
> made nbench i386 three times faster.  Note the gain was
> less for FP parts of the benchmark due to the use of
> helpers.
>
> out of curiosity could you post your tb_set_jmp_target1
> function?

I'm on an AMD64 host, so it's the same code as in mainline.

> The only thing I can think of at the moment that 
> could make the code slower is that the program you ran
> was not reusing blocks and/or cache flushing in
> tb_set_jmp_target1 is overkill.

There is no cache flushing in the AMD64 tb_set_jmp_target1() function, 
and the polarssl test suite is by nature rather repetitive.

I did some experiments, and it seems disabling the TB chaining (by 
emptying tb_set_jmp_target()) does not have any impact on performance at 
all on AMD64. I tested it with several CPU-intensive programs (md5sum 
and the like) with AMD64 on AMD64 userspace emulation (qemu-x86_64), and 
the difference in performance with TB chaining and without is hardly 
measurable. The chaining is performed as advertised if enabled, I 
checked that, but it does not seem to help performance.

How is this possible? Could this be related to cache size? I suspect the 
Phenom 9500 of mine is better equipped in that area than the average ARM 
controller.

And does the TB chaining actually work on AMD64 at all? I checked by 
adding some debug output, and it seems to patch the jumps correctly, but 
maybe somebody can verify that.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-11-09 16:55                 ` Ulrich Hecht
@ 2009-11-10 16:02                   ` Aurelien Jarno
  2009-11-11 10:46                     ` Ulrich Hecht
  0 siblings, 1 reply; 26+ messages in thread
From: Aurelien Jarno @ 2009-11-10 16:02 UTC (permalink / raw)
  To: Ulrich Hecht; +Cc: Laurent Desnogues, qemu-devel, agraf

On Mon, Nov 09, 2009 at 06:55:23PM +0200, Ulrich Hecht wrote:
> On Monday 02 November 2009, Laurent Desnogues wrote:
> > That indeed looks strange:  fixing the TB chaining on ARM
> > made nbench i386 three times faster.  Note the gain was
> > less for FP parts of the benchmark due to the use of
> > helpers.
> >
> > out of curiosity could you post your tb_set_jmp_target1
> > function?
> 
> I'm on an AMD64 host, so it's the same code as in mainline.
> 
> > The only thing I can think of at the moment that 
> > could make the code slower is that the program you ran
> > was not reusing blocks and/or cache flushing in
> > tb_set_jmp_target1 is overkill.
> 
> There is no cache flushing in the AMD64 tb_set_jmp_target1() function, 
> and the polarssl test suite is by nature rather repetitive.
> 
> I did some experiments, and it seems disabling the TB chaining (by 
> emptying tb_set_jmp_target()) does not have any impact on performance at 
> all on AMD64. I tested it with several CPU-intensive programs (md5sum 
> and the like) with AMD64 on AMD64 userspace emulation (qemu-x86_64), and 
> the difference in performance with TB chaining and without is hardly 
> measurable. The chaining is performed as advertised if enabled, I 
> checked that, but it does not seem to help performance.

I have tested it by removing the whole block around tb_add_jump in
cpu-exec.c. I see a slowdown of about 2.5x in the boot time of an
x86_64 image.

> How is this possible? Could this be related to cache size? I suspect the 
> Phenom 9500 of mine is better equipped in that area than the average ARM 
> controller.

For me it's on a Core 2 Duo T7200, so I doubt it is related to cache
size.

> And does the TB chaining actually work on AMD64 at all? I checked by 
> adding some debug output, and it seems to patch the jumps correctly, but 
> maybe somebody can verify that.
> 

Given the gain in speed I have, I guess it works.

-- 
Aurelien Jarno	                        GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH 2/9] S/390 CPU emulation
  2009-11-10 16:02                   ` Aurelien Jarno
@ 2009-11-11 10:46                     ` Ulrich Hecht
  0 siblings, 0 replies; 26+ messages in thread
From: Ulrich Hecht @ 2009-11-11 10:46 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Laurent Desnogues, qemu-devel, agraf

On Tuesday 10 November 2009, Aurelien Jarno wrote:
> I have tested it by removing the whole block around tb_add_jump in
> cpu-exec.c. I see a slowdown of about 2.5x in the boot time of an
> x86_64 image.

I just tried it with qemu-system-x86_64, and with that I can observe a
noticeable performance gain using TB chaining as well. Maybe it's simply
a lot more effective in system emulation than in userspace emulation.

Anyway, having implemented it for BRC already, I might as well leave it 
in for future generations to enjoy.

CU
Uli

-- 
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)


Thread overview: 26+ messages
2009-10-16 12:38 [Qemu-devel] [PATCH 0/9] S/390 support updated Ulrich Hecht
2009-10-16 12:38 ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Ulrich Hecht
2009-10-16 12:38   ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Ulrich Hecht
2009-10-16 12:38     ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Ulrich Hecht
2009-10-16 12:38       ` [Qemu-devel] [PATCH 4/9] S/390 host support for TCG Ulrich Hecht
2009-10-16 12:38         ` [Qemu-devel] [PATCH 5/9] linux-user: S/390 64-bit (s390x) support Ulrich Hecht
2009-10-16 12:38           ` [Qemu-devel] [PATCH 6/9] linux-user: don't do locking in single-threaded processes Ulrich Hecht
2009-10-16 12:38             ` [Qemu-devel] [PATCH 7/9] linux-user: dup3, fallocate syscalls Ulrich Hecht
2009-10-16 12:38               ` [Qemu-devel] [PATCH 8/9] linux-user: define a couple of syscalls for non-uid16 targets Ulrich Hecht
2009-10-16 12:38                 ` [Qemu-devel] [PATCH 9/9] linux-user: getpriority errno fix Ulrich Hecht
2009-10-17 10:44       ` [Qemu-devel] [PATCH 3/9] S/390 host/target build system support Aurelien Jarno
2009-10-17 10:42     ` [Qemu-devel] [PATCH 2/9] S/390 CPU emulation Aurelien Jarno
2009-10-19 17:17       ` Ulrich Hecht
2009-10-22 21:28         ` Aurelien Jarno
2009-11-02 15:16           ` Ulrich Hecht
2009-11-02 18:42             ` Aurelien Jarno
2009-11-02 19:03               ` Laurent Desnogues
2009-11-09 16:55                 ` Ulrich Hecht
2009-11-10 16:02                   ` Aurelien Jarno
2009-11-11 10:46                     ` Ulrich Hecht
2009-10-16 15:52   ` [Qemu-devel] [PATCH 1/9] TCG "sync" op Aurelien Jarno
2009-10-16 16:37     ` Ulrich Hecht
2009-10-16 17:29       ` Aurelien Jarno
2009-10-19 17:17         ` Ulrich Hecht
2009-10-17  8:59     ` Edgar E. Iglesias
2009-10-19 17:17       ` Ulrich Hecht
