* [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
@ 2008-11-12 22:10 Anthony Liguori
  2008-11-12 22:48 ` Fabrice Bellard
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Anthony Liguori @ 2008-11-12 22:10 UTC (permalink / raw)
  To: qemu-devel
  Cc: Carsten Otte, Anthony Liguori, kvm-devel, Hollis Blanchard,
	Paul Brook

Unlike kqemu, KVM does not use TCG at all when accelerating QEMU.  Having TCG
present is not a problem when using KVM on x86: x86 already has TCG host and
target support, and it's quite convenient to be able to enable or disable KVM
and compare it against TCG when debugging.

KVM also supports architectures that lack TCG host and target support, such
as ia64, s390, and PPC[1].  For these architectures, TCG is an obstacle to
upstream inclusion.

TCG is pretty well isolated in QEMU, so building these targets without TCG
should be easy enough.  This breaks down in exec.c, though: it contains a lot
of TCG-specific code, but also a lot of code that KVM needs.

This patch moves the non-TCG-specific bits of exec.c into a separate file,
exec-all.c.  This makes it relatively easy to build QEMU without TCG support.
More patches will come to complete this work, but the exec.c bits are probably
95% of what is needed.
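
As an illustration, a TCG-less target would initialize through the new
entry point instead of cpu_exec_init_all().  A rough sketch (the
CONFIG_NO_TCG define is hypothetical, shown only to illustrate the shape):

    #ifdef CONFIG_NO_TCG
        cpu_noexec_init_all();          /* page + io memory init, no TCG */
    #else
        cpu_exec_init_all(tb_size);     /* existing path, also inits TCG */
    #endif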

The remaining bits are some general cleanups where layering has been violated,
and the introduction of a new -kvm subtarget, similar to -softmmu or
-linux-user.  This target will not have TCG support and will support only KVM.
However, before going down that path, I wanted to see if anyone objected to
this bit of the cleanup.
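
For example, such a subtarget's Makefile.target would presumably link
exec-all.o while dropping the TCG objects; roughly (hypothetical, not
part of this patch):

    # object list for a TCG-free -kvm subtarget
    LIBOBJS=exec-all.o host-utils.o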

Any objections?

[1] The particular PPC embedded processor/board combination that KVM supports is
not supported by TCG at this time.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>

diff --git a/Makefile.target b/Makefile.target
index 031ab45..8fe4cce 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -191,7 +191,7 @@ all: $(PROGS)
 #########################################################
 # cpu emulator library
 LIBOBJS=exec.o kqemu.o translate-all.o cpu-exec.o\
-        translate.o host-utils.o
+        translate.o host-utils.o exec-all.o
 ifdef CONFIG_DYNGEN_OP
 exec.o: dyngen-opc.h
 LIBOBJS+=op.o
diff --git a/cpu-all.h b/cpu-all.h
index cdd79bc..f9a50b2 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -730,6 +730,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags);
 int page_check_range(target_ulong start, target_ulong len, int flags);
 
 void cpu_exec_init_all(unsigned long tb_size);
+void cpu_noexec_init_all(void);
 CPUState *cpu_copy(CPUState *env);
 
 void cpu_dump_state(CPUState *env, FILE *f,
diff --git a/exec-all.c b/exec-all.c
new file mode 100644
index 0000000..50eaa9a
--- /dev/null
+++ b/exec-all.c
@@ -0,0 +1,1504 @@
+/*
+ *  virtual page mapping and translated block handling
+ *
+ *  Copyright (c) 2003 Fabrice Bellard
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+#include "config.h"
+#ifdef _WIN32
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+#else
+#include <sys/types.h>
+#include <sys/mman.h>
+#endif
+#include <stdlib.h>
+#include <stdio.h>
+#include <stdarg.h>
+#include <string.h>
+#include <errno.h>
+#include <unistd.h>
+#include <inttypes.h>
+
+#include "cpu.h"
+#include "exec-all.h"
+#include "qemu-common.h"
+#include "tcg.h"
+#include "hw/hw.h"
+#include "osdep.h"
+#include "kvm.h"
+#if defined(CONFIG_USER_ONLY)
+#include <qemu.h>
+#endif
+
+#if !defined(CONFIG_USER_ONLY)
+ram_addr_t phys_ram_size;
+int phys_ram_fd;
+uint8_t *phys_ram_base;
+uint8_t *phys_ram_dirty;
+static int in_migration;
+static ram_addr_t phys_ram_alloc_offset = 0;
+#endif
+
+CPUState *first_cpu;
+/* current CPU in the current thread. It is only valid inside
+   cpu_exec() */
+CPUState *cpu_single_env;
+/* 0 = Do not count executed instructions.
+   1 = Precise instruction counting.
+   2 = Adaptive rate instruction counting.  */
+int use_icount = 0;
+/* Current instruction counter.  While executing translated code this may
+   include some instructions that have not yet been executed.  */
+int64_t qemu_icount;
+
+typedef struct PhysPageDesc {
+    /* offset in host memory of the page + io_index in the low bits */
+    ram_addr_t phys_offset;
+} PhysPageDesc;
+
+unsigned long qemu_real_host_page_size;
+unsigned long qemu_host_page_bits;
+unsigned long qemu_host_page_size;
+unsigned long qemu_host_page_mask;
+
+static PhysPageDesc **l1_phys_map;
+
+#if !defined(CONFIG_USER_ONLY)
+static void io_mem_init(void);
+
+/* io memory support */
+CPUWriteMemoryFunc *io_mem_write[IO_MEM_NB_ENTRIES][4];
+CPUReadMemoryFunc *io_mem_read[IO_MEM_NB_ENTRIES][4];
+void *io_mem_opaque[IO_MEM_NB_ENTRIES];
+static int io_mem_nb;
+#endif
+
+/* log support */
+static const char *logfilename = "/tmp/qemu.log";
+FILE *logfile;
+int loglevel;
+static int log_append = 0;
+#define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
+typedef struct subpage_t {
+    target_phys_addr_t base;
+    CPUReadMemoryFunc **mem_read[TARGET_PAGE_SIZE][4];
+    CPUWriteMemoryFunc **mem_write[TARGET_PAGE_SIZE][4];
+    void *opaque[TARGET_PAGE_SIZE][2][4];
+} subpage_t;
+
+static void page_init(void)
+{
+    /* NOTE: we can always suppose that qemu_host_page_size >=
+       TARGET_PAGE_SIZE */
+#ifdef _WIN32
+    {
+        SYSTEM_INFO system_info;
+
+        GetSystemInfo(&system_info);
+        qemu_real_host_page_size = system_info.dwPageSize;
+    }
+#else
+    qemu_real_host_page_size = getpagesize();
+#endif
+    if (qemu_host_page_size == 0)
+        qemu_host_page_size = qemu_real_host_page_size;
+    if (qemu_host_page_size < TARGET_PAGE_SIZE)
+        qemu_host_page_size = TARGET_PAGE_SIZE;
+    qemu_host_page_bits = 0;
+    while ((1 << qemu_host_page_bits) < qemu_host_page_size)
+        qemu_host_page_bits++;
+    qemu_host_page_mask = ~(qemu_host_page_size - 1);
+    l1_phys_map = qemu_vmalloc(L1_SIZE * sizeof(void *));
+    memset(l1_phys_map, 0, L1_SIZE * sizeof(void *));
+
+#if !defined(_WIN32) && defined(CONFIG_USER_ONLY)
+    {
+        long long startaddr, endaddr;
+        FILE *f;
+        int n;
+
+        mmap_lock();
+        last_brk = (unsigned long)sbrk(0);
+        f = fopen("/proc/self/maps", "r");
+        if (f) {
+            do {
+                n = fscanf (f, "%llx-%llx %*[^\n]\n", &startaddr, &endaddr);
+                if (n == 2) {
+                    startaddr = MIN(startaddr,
+                                    (1ULL << TARGET_PHYS_ADDR_SPACE_BITS) - 1);
+                    endaddr = MIN(endaddr,
+                                    (1ULL << TARGET_PHYS_ADDR_SPACE_BITS) - 1);
+                    page_set_flags(startaddr & TARGET_PAGE_MASK,
+                                   TARGET_PAGE_ALIGN(endaddr),
+                                   PAGE_RESERVED); 
+                }
+            } while (!feof(f));
+            fclose(f);
+        }
+        mmap_unlock();
+    }
+#endif
+}
+
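+/* Look up (and optionally allocate) the PhysPageDesc for a physical page
+   index.  l1_phys_map is a one- or two-level table: targets with more
+   than 32 physical address bits get an extra level of indirection. */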
+static PhysPageDesc *phys_page_find_alloc(target_phys_addr_t index, int alloc)
+{
+    void **lp, **p;
+    PhysPageDesc *pd;
+
+    p = (void **)l1_phys_map;
+#if TARGET_PHYS_ADDR_SPACE_BITS > 32
+
+#if TARGET_PHYS_ADDR_SPACE_BITS > (32 + L1_BITS)
+#error unsupported TARGET_PHYS_ADDR_SPACE_BITS
+#endif
+    lp = p + ((index >> (L1_BITS + L2_BITS)) & (L1_SIZE - 1));
+    p = *lp;
+    if (!p) {
+        /* allocate if not found */
+        if (!alloc)
+            return NULL;
+        p = qemu_vmalloc(sizeof(void *) * L1_SIZE);
+        memset(p, 0, sizeof(void *) * L1_SIZE);
+        *lp = p;
+    }
+#endif
+    lp = p + ((index >> L2_BITS) & (L1_SIZE - 1));
+    pd = *lp;
+    if (!pd) {
+        int i;
+        /* allocate if not found */
+        if (!alloc)
+            return NULL;
+        pd = qemu_vmalloc(sizeof(PhysPageDesc) * L2_SIZE);
+        *lp = pd;
+        for (i = 0; i < L2_SIZE; i++)
+          pd[i].phys_offset = IO_MEM_UNASSIGNED;
+    }
+    return ((PhysPageDesc *)pd) + (index & (L2_SIZE - 1));
+}
+
+static inline PhysPageDesc *phys_page_find(target_phys_addr_t index)
+{
+    return phys_page_find_alloc(index, 0);
+}
+
+void cpu_noexec_init_all(void)
+{
+    page_init();
+#if !defined(CONFIG_USER_ONLY)
+    io_mem_init();
+#endif
+}
+
+#if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
+
+#define CPU_COMMON_SAVE_VERSION 1
+
+static void cpu_common_save(QEMUFile *f, void *opaque)
+{
+    CPUState *env = opaque;
+
+    qemu_put_be32s(f, &env->halted);
+    qemu_put_be32s(f, &env->interrupt_request);
+}
+
+static int cpu_common_load(QEMUFile *f, void *opaque, int version_id)
+{
+    CPUState *env = opaque;
+
+    if (version_id != CPU_COMMON_SAVE_VERSION)
+        return -EINVAL;
+
+    qemu_get_be32s(f, &env->halted);
+    qemu_get_be32s(f, &env->interrupt_request);
+    tlb_flush(env, 1);
+
+    return 0;
+}
+#endif
+
+void cpu_exec_init(CPUState *env)
+{
+    CPUState **penv;
+    int cpu_index;
+
+    env->next_cpu = NULL;
+    penv = &first_cpu;
+    cpu_index = 0;
+    while (*penv != NULL) {
+        penv = (CPUState **)&(*penv)->next_cpu;
+        cpu_index++;
+    }
+    env->cpu_index = cpu_index;
+    env->nb_watchpoints = 0;
+    *penv = env;
+#if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
+    register_savevm("cpu_common", cpu_index, CPU_COMMON_SAVE_VERSION,
+                    cpu_common_save, cpu_common_load, env);
+    register_savevm("cpu", cpu_index, CPU_SAVE_VERSION,
+                    cpu_save, cpu_load, env);
+#endif
+}
+
+#if defined(TARGET_HAS_ICE)
+static void breakpoint_invalidate(CPUState *env, target_ulong pc)
+{
+    target_phys_addr_t addr;
+    target_ulong pd;
+    ram_addr_t ram_addr;
+    PhysPageDesc *p;
+
+    addr = cpu_get_phys_page_debug(env, pc);
+    p = phys_page_find(addr >> TARGET_PAGE_BITS);
+    if (!p) {
+        pd = IO_MEM_UNASSIGNED;
+    } else {
+        pd = p->phys_offset;
+    }
+    ram_addr = (pd & TARGET_PAGE_MASK) | (pc & ~TARGET_PAGE_MASK);
+    tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
+}
+#endif
+
+/* Add a watchpoint.  */
+int cpu_watchpoint_insert(CPUState *env, target_ulong addr, int type)
+{
+    int i;
+
+    for (i = 0; i < env->nb_watchpoints; i++) {
+        if (addr == env->watchpoint[i].vaddr)
+            return 0;
+    }
+    if (env->nb_watchpoints >= MAX_WATCHPOINTS)
+        return -1;
+
+    i = env->nb_watchpoints++;
+    env->watchpoint[i].vaddr = addr;
+    env->watchpoint[i].type = type;
+    tlb_flush_page(env, addr);
+    /* FIXME: This flush is needed because of the hack to make memory ops
+       terminate the TB.  It can be removed once the proper IO trap and
+       re-execute bits are in.  */
+    tb_flush(env);
+    return i;
+}
+
+/* Remove a watchpoint.  */
+int cpu_watchpoint_remove(CPUState *env, target_ulong addr)
+{
+    int i;
+
+    for (i = 0; i < env->nb_watchpoints; i++) {
+        if (addr == env->watchpoint[i].vaddr) {
+            env->nb_watchpoints--;
+            env->watchpoint[i] = env->watchpoint[env->nb_watchpoints];
+            tlb_flush_page(env, addr);
+            return 0;
+        }
+    }
+    return -1;
+}
+
+/* Remove all watchpoints. */
+void cpu_watchpoint_remove_all(CPUState *env)
+{
+    int i;
+
+    for (i = 0; i < env->nb_watchpoints; i++) {
+        tlb_flush_page(env, env->watchpoint[i].vaddr);
+    }
+    env->nb_watchpoints = 0;
+}
+
+/* add a breakpoint. EXCP_DEBUG is returned by the CPU loop if a
+   breakpoint is reached */
+int cpu_breakpoint_insert(CPUState *env, target_ulong pc)
+{
+#if defined(TARGET_HAS_ICE)
+    int i;
+
+    for(i = 0; i < env->nb_breakpoints; i++) {
+        if (env->breakpoints[i] == pc)
+            return 0;
+    }
+
+    if (env->nb_breakpoints >= MAX_BREAKPOINTS)
+        return -1;
+    env->breakpoints[env->nb_breakpoints++] = pc;
+
+    breakpoint_invalidate(env, pc);
+    return 0;
+#else
+    return -1;
+#endif
+}
+
+/* remove all breakpoints */
+void cpu_breakpoint_remove_all(CPUState *env)
+{
+#if defined(TARGET_HAS_ICE)
+    int i;
+    for(i = 0; i < env->nb_breakpoints; i++) {
+        breakpoint_invalidate(env, env->breakpoints[i]);
+    }
+    env->nb_breakpoints = 0;
+#endif
+}
+
+/* remove a breakpoint */
+int cpu_breakpoint_remove(CPUState *env, target_ulong pc)
+{
+#if defined(TARGET_HAS_ICE)
+    int i;
+    for(i = 0; i < env->nb_breakpoints; i++) {
+        if (env->breakpoints[i] == pc)
+            goto found;
+    }
+    return -1;
+ found:
+    env->nb_breakpoints--;
+    if (i < env->nb_breakpoints)
+      env->breakpoints[i] = env->breakpoints[env->nb_breakpoints];
+
+    breakpoint_invalidate(env, pc);
+    return 0;
+#else
+    return -1;
+#endif
+}
+
+/* enable or disable single step mode. EXCP_DEBUG is returned by the
+   CPU loop after each instruction */
+void cpu_single_step(CPUState *env, int enabled)
+{
+#if defined(TARGET_HAS_ICE)
+    if (env->singlestep_enabled != enabled) {
+        env->singlestep_enabled = enabled;
+        /* must flush all the translated code to avoid inconsistencies */
+        /* XXX: only flush what is necessary */
+        tb_flush(env);
+    }
+#endif
+}
+
+/* enable or disable low-level logging */
+void cpu_set_log(int log_flags)
+{
+    loglevel = log_flags;
+    if (loglevel && !logfile) {
+        logfile = fopen(logfilename, log_append ? "a" : "w");
+        if (!logfile) {
+            perror(logfilename);
+            _exit(1);
+        }
+#if !defined(CONFIG_SOFTMMU)
+        /* must avoid mmap() usage of glibc by setting a buffer "by hand" */
+        {
+            static char logfile_buf[4096];
+            setvbuf(logfile, logfile_buf, _IOLBF, sizeof(logfile_buf));
+        }
+#else
+        setvbuf(logfile, NULL, _IOLBF, 0);
+#endif
+        log_append = 1;
+    }
+    if (!loglevel && logfile) {
+        fclose(logfile);
+        logfile = NULL;
+    }
+}
+
+void cpu_set_log_filename(const char *filename)
+{
+    logfilename = strdup(filename);
+    if (logfile) {
+        fclose(logfile);
+        logfile = NULL;
+    }
+    cpu_set_log(loglevel);
+}
+
+void cpu_reset_interrupt(CPUState *env, int mask)
+{
+    env->interrupt_request &= ~mask;
+}
+
+const CPULogItem cpu_log_items[] = {
+    { CPU_LOG_TB_OUT_ASM, "out_asm",
+      "show generated host assembly code for each compiled TB" },
+    { CPU_LOG_TB_IN_ASM, "in_asm",
+      "show target assembly code for each compiled TB" },
+    { CPU_LOG_TB_OP, "op",
+      "show micro ops for each compiled TB" },
+    { CPU_LOG_TB_OP_OPT, "op_opt",
+      "show micro ops "
+#ifdef TARGET_I386
+      "before eflags optimization and "
+#endif
+      "after liveness analysis" },
+    { CPU_LOG_INT, "int",
+      "show interrupts/exceptions in short format" },
+    { CPU_LOG_EXEC, "exec",
+      "show trace before each executed TB (lots of logs)" },
+    { CPU_LOG_TB_CPU, "cpu",
+      "show CPU state before block translation" },
+#ifdef TARGET_I386
+    { CPU_LOG_PCALL, "pcall",
+      "show protected mode far calls/returns/exceptions" },
+#endif
+#ifdef DEBUG_IOPORT
+    { CPU_LOG_IOPORT, "ioport",
+      "show all i/o ports accesses" },
+#endif
+    { 0, NULL, NULL },
+};
+
+static int cmp1(const char *s1, int n, const char *s2)
+{
+    if (strlen(s2) != n)
+        return 0;
+    return memcmp(s1, s2, n) == 0;
+}
+
+/* takes a comma-separated list of log masks. Returns 0 on error. */
+int cpu_str_to_log_mask(const char *str)
+{
+    const CPULogItem *item;
+    int mask;
+    const char *p, *p1;
+
+    p = str;
+    mask = 0;
+    for(;;) {
+        p1 = strchr(p, ',');
+        if (!p1)
+            p1 = p + strlen(p);
+        if (cmp1(p, p1 - p, "all")) {
+            for (item = cpu_log_items; item->mask != 0; item++) {
+                mask |= item->mask;
+            }
+        } else {
+            for (item = cpu_log_items; item->mask != 0; item++) {
+                if (cmp1(p, p1 - p, item->name))
+                    goto found;
+            }
+            return 0;
+        }
+    found:
+        mask |= item->mask;
+        if (*p1 != ',')
+            break;
+        p = p1 + 1;
+    }
+    return mask;
+}
+
+void cpu_abort(CPUState *env, const char *fmt, ...)
+{
+    va_list ap;
+    va_list ap2;
+
+    va_start(ap, fmt);
+    va_copy(ap2, ap);
+    fprintf(stderr, "qemu: fatal: ");
+    vfprintf(stderr, fmt, ap);
+    fprintf(stderr, "\n");
+#ifdef TARGET_I386
+    cpu_dump_state(env, stderr, fprintf, X86_DUMP_FPU | X86_DUMP_CCOP);
+#else
+    cpu_dump_state(env, stderr, fprintf, 0);
+#endif
+    if (logfile) {
+        fprintf(logfile, "qemu: fatal: ");
+        vfprintf(logfile, fmt, ap2);
+        fprintf(logfile, "\n");
+#ifdef TARGET_I386
+        cpu_dump_state(env, logfile, fprintf, X86_DUMP_FPU | X86_DUMP_CCOP);
+#else
+        cpu_dump_state(env, logfile, fprintf, 0);
+#endif
+        fflush(logfile);
+        fclose(logfile);
+    }
+    va_end(ap2);
+    va_end(ap);
+    abort();
+}
+
+CPUState *cpu_copy(CPUState *env)
+{
+    CPUState *new_env = cpu_init(env->cpu_model_str);
+    /* preserve chaining and index */
+    CPUState *next_cpu = new_env->next_cpu;
+    int cpu_index = new_env->cpu_index;
+    memcpy(new_env, env, sizeof(CPUState));
+    new_env->next_cpu = next_cpu;
+    new_env->cpu_index = cpu_index;
+    return new_env;
+}
+
+#if !defined(CONFIG_USER_ONLY)
+void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
+                                     int dirty_flags)
+{
+    CPUState *env;
+    unsigned long length;
+    int i, mask, len;
+    uint8_t *p;
+
+    start &= TARGET_PAGE_MASK;
+    end = TARGET_PAGE_ALIGN(end);
+
+    length = end - start;
+    if (length == 0)
+        return;
+    len = length >> TARGET_PAGE_BITS;
+#ifdef USE_KQEMU
+    /* XXX: should not depend on cpu context */
+    env = first_cpu;
+    if (env->kqemu_enabled) {
+        ram_addr_t addr;
+        addr = start;
+        for(i = 0; i < len; i++) {
+            kqemu_set_notdirty(env, addr);
+            addr += TARGET_PAGE_SIZE;
+        }
+    }
+#endif
+    mask = ~dirty_flags;
+    p = phys_ram_dirty + (start >> TARGET_PAGE_BITS);
+    for(i = 0; i < len; i++)
+        p[i] &= mask;
+
+    tlb_reset_dirty(start, length);
+}
+
+int cpu_physical_memory_set_dirty_tracking(int enable)
+{
+    in_migration = enable;
+    return 0;
+}
+
+int cpu_physical_memory_get_dirty_tracking(void)
+{
+    return in_migration;
+}
+
+static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
+                             ram_addr_t memory);
+static void *subpage_init (target_phys_addr_t base, ram_addr_t *phys,
+                           ram_addr_t orig_memory);
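+/* For the page containing 'addr', compute the first and last byte offsets
+   (start_addr2/end_addr2) covered by [start_addr, start_addr + orig_size)
+   and set need_subpage when the region covers only part of the page. */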
+#define CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr, end_addr2, \
+                      need_subpage)                                     \
+    do {                                                                \
+        if (addr > start_addr)                                          \
+            start_addr2 = 0;                                            \
+        else {                                                          \
+            start_addr2 = start_addr & ~TARGET_PAGE_MASK;               \
+            if (start_addr2 > 0)                                        \
+                need_subpage = 1;                                       \
+        }                                                               \
+                                                                        \
+        if ((start_addr + orig_size) - addr >= TARGET_PAGE_SIZE)        \
+            end_addr2 = TARGET_PAGE_SIZE - 1;                           \
+        else {                                                          \
+            end_addr2 = (start_addr + orig_size - 1) & ~TARGET_PAGE_MASK; \
+            if (end_addr2 < TARGET_PAGE_SIZE - 1)                       \
+                need_subpage = 1;                                       \
+        }                                                               \
+    } while (0)
+
+/* register physical memory. 'size' must be a multiple of the target
+   page size. If (phys_offset & ~TARGET_PAGE_MASK) != 0, then it is an
+   io memory page */
+void cpu_register_physical_memory(target_phys_addr_t start_addr,
+                                  ram_addr_t size,
+                                  ram_addr_t phys_offset)
+{
+    target_phys_addr_t addr, end_addr;
+    PhysPageDesc *p;
+    CPUState *env;
+    ram_addr_t orig_size = size;
+    void *subpage;
+
+#ifdef USE_KQEMU
+    /* XXX: should not depend on cpu context */
+    env = first_cpu;
+    if (env->kqemu_enabled) {
+        kqemu_set_phys_mem(start_addr, size, phys_offset);
+    }
+#endif
+    if (kvm_enabled())
+        kvm_set_phys_mem(start_addr, size, phys_offset);
+
+    size = (size + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK;
+    end_addr = start_addr + (target_phys_addr_t)size;
+    for(addr = start_addr; addr != end_addr; addr += TARGET_PAGE_SIZE) {
+        p = phys_page_find(addr >> TARGET_PAGE_BITS);
+        if (p && p->phys_offset != IO_MEM_UNASSIGNED) {
+            ram_addr_t orig_memory = p->phys_offset;
+            target_phys_addr_t start_addr2, end_addr2;
+            int need_subpage = 0;
+
+            CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr, end_addr2,
+                          need_subpage);
+            if (need_subpage || phys_offset & IO_MEM_SUBWIDTH) {
+                if (!(orig_memory & IO_MEM_SUBPAGE)) {
+                    subpage = subpage_init((addr & TARGET_PAGE_MASK),
+                                           &p->phys_offset, orig_memory);
+                } else {
+                    subpage = io_mem_opaque[(orig_memory & ~TARGET_PAGE_MASK)
+                                            >> IO_MEM_SHIFT];
+                }
+                subpage_register(subpage, start_addr2, end_addr2, phys_offset);
+            } else {
+                p->phys_offset = phys_offset;
+                if ((phys_offset & ~TARGET_PAGE_MASK) <= IO_MEM_ROM ||
+                    (phys_offset & IO_MEM_ROMD))
+                    phys_offset += TARGET_PAGE_SIZE;
+            }
+        } else {
+            p = phys_page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
+            p->phys_offset = phys_offset;
+            if ((phys_offset & ~TARGET_PAGE_MASK) <= IO_MEM_ROM ||
+                (phys_offset & IO_MEM_ROMD))
+                phys_offset += TARGET_PAGE_SIZE;
+            else {
+                target_phys_addr_t start_addr2, end_addr2;
+                int need_subpage = 0;
+
+                CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr,
+                              end_addr2, need_subpage);
+
+                if (need_subpage || phys_offset & IO_MEM_SUBWIDTH) {
+                    subpage = subpage_init((addr & TARGET_PAGE_MASK),
+                                           &p->phys_offset, IO_MEM_UNASSIGNED);
+                    subpage_register(subpage, start_addr2, end_addr2,
+                                     phys_offset);
+                }
+            }
+        }
+    }
+
+    /* since each CPU stores ram addresses in its TLB cache, we must
+       reset the modified entries */
+    /* XXX: slow ! */
+    for(env = first_cpu; env != NULL; env = env->next_cpu) {
+        tlb_flush(env, 1);
+    }
+}
+
+/* XXX: temporary until new memory mapping API */
+ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr)
+{
+    PhysPageDesc *p;
+
+    p = phys_page_find(addr >> TARGET_PAGE_BITS);
+    if (!p)
+        return IO_MEM_UNASSIGNED;
+    return p->phys_offset;
+}
+
+/* XXX: better than nothing */
+ram_addr_t qemu_ram_alloc(ram_addr_t size)
+{
+    ram_addr_t addr;
+    if ((phys_ram_alloc_offset + size) > phys_ram_size) {
+        fprintf(stderr, "Not enough memory (requested_size = %" PRIu64 ", max memory = %" PRIu64 ")\n",
+                (uint64_t)size, (uint64_t)phys_ram_size);
+        abort();
+    }
+    addr = phys_ram_alloc_offset;
+    phys_ram_alloc_offset = TARGET_PAGE_ALIGN(phys_ram_alloc_offset + size);
+    return addr;
+}
+
+void qemu_ram_free(ram_addr_t addr)
+{
+}
+
+static uint32_t unassigned_mem_readb(void *opaque, target_phys_addr_t addr)
+{
+#ifdef DEBUG_UNASSIGNED
+    printf("Unassigned mem read " TARGET_FMT_plx "\n", addr);
+#endif
+#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
+    do_unassigned_access(addr, 0, 0, 0, 1);
+#endif
+    return 0;
+}
+
+static uint32_t unassigned_mem_readw(void *opaque, target_phys_addr_t addr)
+{
+#ifdef DEBUG_UNASSIGNED
+    printf("Unassigned mem read " TARGET_FMT_plx "\n", addr);
+#endif
+#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
+    do_unassigned_access(addr, 0, 0, 0, 2);
+#endif
+    return 0;
+}
+
+static uint32_t unassigned_mem_readl(void *opaque, target_phys_addr_t addr)
+{
+#ifdef DEBUG_UNASSIGNED
+    printf("Unassigned mem read " TARGET_FMT_plx "\n", addr);
+#endif
+#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
+    do_unassigned_access(addr, 0, 0, 0, 4);
+#endif
+    return 0;
+}
+
+static void unassigned_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+#ifdef DEBUG_UNASSIGNED
+    printf("Unassigned mem write " TARGET_FMT_plx " = 0x%x\n", addr, val);
+#endif
+#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
+    do_unassigned_access(addr, 1, 0, 0, 1);
+#endif
+}
+
+static void unassigned_mem_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+#ifdef DEBUG_UNASSIGNED
+    printf("Unassigned mem write " TARGET_FMT_plx " = 0x%x\n", addr, val);
+#endif
+#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
+    do_unassigned_access(addr, 1, 0, 0, 2);
+#endif
+}
+
+static void unassigned_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
+{
+#ifdef DEBUG_UNASSIGNED
+    printf("Unassigned mem write " TARGET_FMT_plx " = 0x%x\n", addr, val);
+#endif
+#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
+    do_unassigned_access(addr, 1, 0, 0, 4);
+#endif
+}
+
+static CPUReadMemoryFunc *unassigned_mem_read[3] = {
+    unassigned_mem_readb,
+    unassigned_mem_readw,
+    unassigned_mem_readl,
+};
+
+static CPUWriteMemoryFunc *unassigned_mem_write[3] = {
+    unassigned_mem_writeb,
+    unassigned_mem_writew,
+    unassigned_mem_writel,
+};
+
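+/* The notdirty handlers intercept writes to RAM pages that still contain
+   translated code: they invalidate the affected TBs, set the dirty bits,
+   and restore the fast RAM path once no translated code remains. */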
+static void notdirty_mem_writeb(void *opaque, target_phys_addr_t ram_addr,
+                                uint32_t val)
+{
+    int dirty_flags;
+    dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
+    if (!(dirty_flags & CODE_DIRTY_FLAG)) {
+#if !defined(CONFIG_USER_ONLY)
+        tb_invalidate_phys_page_fast(ram_addr, 1);
+        dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
+#endif
+    }
+    stb_p(phys_ram_base + ram_addr, val);
+#ifdef USE_KQEMU
+    if (cpu_single_env->kqemu_enabled &&
+        (dirty_flags & KQEMU_MODIFY_PAGE_MASK) != KQEMU_MODIFY_PAGE_MASK)
+        kqemu_modify_page(cpu_single_env, ram_addr);
+#endif
+    dirty_flags |= (0xff & ~CODE_DIRTY_FLAG);
+    phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS] = dirty_flags;
+    /* we remove the notdirty callback only if the code has been
+       flushed */
+    if (dirty_flags == 0xff)
+        tlb_set_dirty(cpu_single_env, cpu_single_env->mem_io_vaddr);
+}
+
+static void notdirty_mem_writew(void *opaque, target_phys_addr_t ram_addr,
+                                uint32_t val)
+{
+    int dirty_flags;
+    dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
+    if (!(dirty_flags & CODE_DIRTY_FLAG)) {
+#if !defined(CONFIG_USER_ONLY)
+        tb_invalidate_phys_page_fast(ram_addr, 2);
+        dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
+#endif
+    }
+    stw_p(phys_ram_base + ram_addr, val);
+#ifdef USE_KQEMU
+    if (cpu_single_env->kqemu_enabled &&
+        (dirty_flags & KQEMU_MODIFY_PAGE_MASK) != KQEMU_MODIFY_PAGE_MASK)
+        kqemu_modify_page(cpu_single_env, ram_addr);
+#endif
+    dirty_flags |= (0xff & ~CODE_DIRTY_FLAG);
+    phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS] = dirty_flags;
+    /* we remove the notdirty callback only if the code has been
+       flushed */
+    if (dirty_flags == 0xff)
+        tlb_set_dirty(cpu_single_env, cpu_single_env->mem_io_vaddr);
+}
+
+static void notdirty_mem_writel(void *opaque, target_phys_addr_t ram_addr,
+                                uint32_t val)
+{
+    int dirty_flags;
+    dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
+    if (!(dirty_flags & CODE_DIRTY_FLAG)) {
+#if !defined(CONFIG_USER_ONLY)
+        tb_invalidate_phys_page_fast(ram_addr, 4);
+        dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
+#endif
+    }
+    stl_p(phys_ram_base + ram_addr, val);
+#ifdef USE_KQEMU
+    if (cpu_single_env->kqemu_enabled &&
+        (dirty_flags & KQEMU_MODIFY_PAGE_MASK) != KQEMU_MODIFY_PAGE_MASK)
+        kqemu_modify_page(cpu_single_env, ram_addr);
+#endif
+    dirty_flags |= (0xff & ~CODE_DIRTY_FLAG);
+    phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS] = dirty_flags;
+    /* we remove the notdirty callback only if the code has been
+       flushed */
+    if (dirty_flags == 0xff)
+        tlb_set_dirty(cpu_single_env, cpu_single_env->mem_io_vaddr);
+}
+
+static CPUReadMemoryFunc *error_mem_read[3] = {
+    NULL, /* never used */
+    NULL, /* never used */
+    NULL, /* never used */
+};
+
+static CPUWriteMemoryFunc *notdirty_mem_write[3] = {
+    notdirty_mem_writeb,
+    notdirty_mem_writew,
+    notdirty_mem_writel,
+};
+
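+/* Subpages dispatch accesses at sub-page granularity: every byte offset
+   within the page has its own read/write handler slots, filled in from
+   the io_mem tables by subpage_register(). */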
+static inline uint32_t subpage_readlen (subpage_t *mmio, target_phys_addr_t addr,
+                                 unsigned int len)
+{
+    uint32_t ret;
+    unsigned int idx;
+
+    idx = SUBPAGE_IDX(addr - mmio->base);
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: subpage %p len %d addr " TARGET_FMT_plx " idx %d\n", __func__,
+           mmio, len, addr, idx);
+#endif
+    ret = (**mmio->mem_read[idx][len])(mmio->opaque[idx][0][len], addr);
+
+    return ret;
+}
+
+static inline void subpage_writelen (subpage_t *mmio, target_phys_addr_t addr,
+                              uint32_t value, unsigned int len)
+{
+    unsigned int idx;
+
+    idx = SUBPAGE_IDX(addr - mmio->base);
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: subpage %p len %d addr " TARGET_FMT_plx " idx %d value %08x\n", __func__,
+           mmio, len, addr, idx, value);
+#endif
+    (**mmio->mem_write[idx][len])(mmio->opaque[idx][1][len], addr, value);
+}
+
+static uint32_t subpage_readb (void *opaque, target_phys_addr_t addr)
+{
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: addr " TARGET_FMT_plx "\n", __func__, addr);
+#endif
+
+    return subpage_readlen(opaque, addr, 0);
+}
+
+static void subpage_writeb (void *opaque, target_phys_addr_t addr,
+                            uint32_t value)
+{
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: addr " TARGET_FMT_plx " val %08x\n", __func__, addr, value);
+#endif
+    subpage_writelen(opaque, addr, value, 0);
+}
+
+static uint32_t subpage_readw (void *opaque, target_phys_addr_t addr)
+{
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: addr " TARGET_FMT_plx "\n", __func__, addr);
+#endif
+
+    return subpage_readlen(opaque, addr, 1);
+}
+
+static void subpage_writew (void *opaque, target_phys_addr_t addr,
+                            uint32_t value)
+{
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: addr " TARGET_FMT_plx " val %08x\n", __func__, addr, value);
+#endif
+    subpage_writelen(opaque, addr, value, 1);
+}
+
+static uint32_t subpage_readl (void *opaque, target_phys_addr_t addr)
+{
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: addr " TARGET_FMT_plx "\n", __func__, addr);
+#endif
+
+    return subpage_readlen(opaque, addr, 2);
+}
+
+static void subpage_writel (void *opaque,
+                         target_phys_addr_t addr, uint32_t value)
+{
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: addr " TARGET_FMT_plx " val %08x\n", __func__, addr, value);
+#endif
+    subpage_writelen(opaque, addr, value, 2);
+}
+
+static CPUReadMemoryFunc *subpage_read[] = {
+    &subpage_readb,
+    &subpage_readw,
+    &subpage_readl,
+};
+
+static CPUWriteMemoryFunc *subpage_write[] = {
+    &subpage_writeb,
+    &subpage_writew,
+    &subpage_writel,
+};
+
+static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
+                             ram_addr_t memory)
+{
+    int idx, eidx;
+    unsigned int i;
+
+    if (start >= TARGET_PAGE_SIZE || end >= TARGET_PAGE_SIZE)
+        return -1;
+    idx = SUBPAGE_IDX(start);
+    eidx = SUBPAGE_IDX(end);
+#if defined(DEBUG_SUBPAGE)
+    printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %d\n", __func__,
+           mmio, start, end, idx, eidx, memory);
+#endif
+    memory >>= IO_MEM_SHIFT;
+    for (; idx <= eidx; idx++) {
+        for (i = 0; i < 4; i++) {
+            if (io_mem_read[memory][i]) {
+                mmio->mem_read[idx][i] = &io_mem_read[memory][i];
+                mmio->opaque[idx][0][i] = io_mem_opaque[memory];
+            }
+            if (io_mem_write[memory][i]) {
+                mmio->mem_write[idx][i] = &io_mem_write[memory][i];
+                mmio->opaque[idx][1][i] = io_mem_opaque[memory];
+            }
+        }
+    }
+
+    return 0;
+}
+
+static void *subpage_init (target_phys_addr_t base, ram_addr_t *phys,
+                           ram_addr_t orig_memory)
+{
+    subpage_t *mmio;
+    int subpage_memory;
+
+    mmio = qemu_mallocz(sizeof(subpage_t));
+    if (mmio != NULL) {
+        mmio->base = base;
+        subpage_memory = cpu_register_io_memory(0, subpage_read, subpage_write, mmio);
+#if defined(DEBUG_SUBPAGE)
+        printf("%s: %p base " TARGET_FMT_plx " len %08x %d\n", __func__,
+               mmio, base, TARGET_PAGE_SIZE, subpage_memory);
+#endif
+        *phys = subpage_memory | IO_MEM_SUBPAGE;
+        subpage_register(mmio, 0, TARGET_PAGE_SIZE - 1, orig_memory);
+    }
+
+    return mmio;
+}
+
+static void io_mem_init(void)
+{
+    cpu_register_io_memory(IO_MEM_ROM >> IO_MEM_SHIFT, error_mem_read, unassigned_mem_write, NULL);
+    cpu_register_io_memory(IO_MEM_UNASSIGNED >> IO_MEM_SHIFT, unassigned_mem_read, unassigned_mem_write, NULL);
+    cpu_register_io_memory(IO_MEM_NOTDIRTY >> IO_MEM_SHIFT, error_mem_read, notdirty_mem_write, NULL);
+    io_mem_nb = 5;
+
+    /* alloc dirty bits array */
+    phys_ram_dirty = qemu_vmalloc(phys_ram_size >> TARGET_PAGE_BITS);
+    memset(phys_ram_dirty, 0xff, phys_ram_size >> TARGET_PAGE_BITS);
+}
+
+/* mem_read and mem_write are arrays of functions containing the
+   function to access byte (index 0), word (index 1) and dword (index
+   2). Functions can be omitted with a NULL function pointer. The
+   registered functions may be modified dynamically later.
+   If io_index is non-zero, the corresponding io zone is
+   modified. If it is zero, a new io zone is allocated. The return
+   value can be used with cpu_register_physical_memory(). Returns -1
+   on error. */
+int cpu_register_io_memory(int io_index,
+                           CPUReadMemoryFunc **mem_read,
+                           CPUWriteMemoryFunc **mem_write,
+                           void *opaque)
+{
+    int i, subwidth = 0;
+
+    if (io_index <= 0) {
+        if (io_mem_nb >= IO_MEM_NB_ENTRIES)
+            return -1;
+        io_index = io_mem_nb++;
+    } else {
+        if (io_index >= IO_MEM_NB_ENTRIES)
+            return -1;
+    }
+
+    for(i = 0;i < 3; i++) {
+        if (!mem_read[i] || !mem_write[i])
+            subwidth = IO_MEM_SUBWIDTH;
+        io_mem_read[io_index][i] = mem_read[i];
+        io_mem_write[io_index][i] = mem_write[i];
+    }
+    io_mem_opaque[io_index] = opaque;
+    return (io_index << IO_MEM_SHIFT) | subwidth;
+}
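+/* Typical use (illustrative only; the names below are not part of this
+ * patch):
+ *
+ *     int idx = cpu_register_io_memory(0, my_read, my_write, opaque);
+ *     cpu_register_physical_memory(base, size, idx);
+ *
+ * where my_read/my_write are 3-entry arrays of byte/word/dword handlers. */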
+
+CPUWriteMemoryFunc **cpu_get_io_memory_write(int io_index)
+{
+    return io_mem_write[io_index >> IO_MEM_SHIFT];
+}
+
+CPUReadMemoryFunc **cpu_get_io_memory_read(int io_index)
+{
+    return io_mem_read[io_index >> IO_MEM_SHIFT];
+}
+
+#endif /* !defined(CONFIG_USER_ONLY) */
+
+/* physical memory access (slow version, mainly for debug) */
+#if defined(CONFIG_USER_ONLY)
+void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
+                            int len, int is_write)
+{
+    int l, flags;
+    target_ulong page;
+    void * p;
+
+    while (len > 0) {
+        page = addr & TARGET_PAGE_MASK;
+        l = (page + TARGET_PAGE_SIZE) - addr;
+        if (l > len)
+            l = len;
+        flags = page_get_flags(page);
+        if (!(flags & PAGE_VALID))
+            return;
+        if (is_write) {
+            if (!(flags & PAGE_WRITE))
+                return;
+            /* XXX: this code should not depend on lock_user */
+            if (!(p = lock_user(VERIFY_WRITE, addr, l, 0)))
+                /* FIXME - should this return an error rather than just fail? */
+                return;
+            memcpy(p, buf, l);
+            unlock_user(p, addr, l);
+        } else {
+            if (!(flags & PAGE_READ))
+                return;
+            /* XXX: this code should not depend on lock_user */
+            if (!(p = lock_user(VERIFY_READ, addr, l, 1)))
+                /* FIXME - should this return an error rather than just fail? */
+                return;
+            memcpy(buf, p, l);
+            unlock_user(p, addr, 0);
+        }
+        len -= l;
+        buf += l;
+        addr += l;
+    }
+}
+
+#else
+void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
+                            int len, int is_write)
+{
+    int l, io_index;
+    uint8_t *ptr;
+    uint32_t val;
+    target_phys_addr_t page;
+    unsigned long pd;
+    PhysPageDesc *p;
+
+    while (len > 0) {
+        page = addr & TARGET_PAGE_MASK;
+        l = (page + TARGET_PAGE_SIZE) - addr;
+        if (l > len)
+            l = len;
+        p = phys_page_find(page >> TARGET_PAGE_BITS);
+        if (!p) {
+            pd = IO_MEM_UNASSIGNED;
+        } else {
+            pd = p->phys_offset;
+        }
+
+        if (is_write) {
+            if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
+                io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+                /* XXX: could force cpu_single_env to NULL to avoid
+                   potential bugs */
+                if (l >= 4 && ((addr & 3) == 0)) {
+                    /* 32 bit write access */
+                    val = ldl_p(buf);
+                    io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val);
+                    l = 4;
+                } else if (l >= 2 && ((addr & 1) == 0)) {
+                    /* 16 bit write access */
+                    val = lduw_p(buf);
+                    io_mem_write[io_index][1](io_mem_opaque[io_index], addr, val);
+                    l = 2;
+                } else {
+                    /* 8 bit write access */
+                    val = ldub_p(buf);
+                    io_mem_write[io_index][0](io_mem_opaque[io_index], addr, val);
+                    l = 1;
+                }
+            } else {
+                unsigned long addr1;
+                addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
+                /* RAM case */
+                ptr = phys_ram_base + addr1;
+                memcpy(ptr, buf, l);
+                if (!cpu_physical_memory_is_dirty(addr1)) {
+                    /* invalidate code */
+                    tb_invalidate_phys_page_range(addr1, addr1 + l, 0);
+                    /* set dirty bit */
+                    phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
+                        (0xff & ~CODE_DIRTY_FLAG);
+                }
+            }
+        } else {
+            if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
+                !(pd & IO_MEM_ROMD)) {
+                /* I/O case */
+                io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+                if (l >= 4 && ((addr & 3) == 0)) {
+                    /* 32 bit read access */
+                    val = io_mem_read[io_index][2](io_mem_opaque[io_index], addr);
+                    stl_p(buf, val);
+                    l = 4;
+                } else if (l >= 2 && ((addr & 1) == 0)) {
+                    /* 16 bit read access */
+                    val = io_mem_read[io_index][1](io_mem_opaque[io_index], addr);
+                    stw_p(buf, val);
+                    l = 2;
+                } else {
+                    /* 8 bit read access */
+                    val = io_mem_read[io_index][0](io_mem_opaque[io_index], addr);
+                    stb_p(buf, val);
+                    l = 1;
+                }
+            } else {
+                /* RAM case */
+                ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
+                    (addr & ~TARGET_PAGE_MASK);
+                memcpy(buf, ptr, l);
+            }
+        }
+        len -= l;
+        buf += l;
+        addr += l;
+    }
+}
+
+/* used for ROM loading : can write in RAM and ROM */
+void cpu_physical_memory_write_rom(target_phys_addr_t addr,
+                                   const uint8_t *buf, int len)
+{
+    int l;
+    uint8_t *ptr;
+    target_phys_addr_t page;
+    unsigned long pd;
+    PhysPageDesc *p;
+
+    while (len > 0) {
+        page = addr & TARGET_PAGE_MASK;
+        l = (page + TARGET_PAGE_SIZE) - addr;
+        if (l > len)
+            l = len;
+        p = phys_page_find(page >> TARGET_PAGE_BITS);
+        if (!p) {
+            pd = IO_MEM_UNASSIGNED;
+        } else {
+            pd = p->phys_offset;
+        }
+
+        if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM &&
+            (pd & ~TARGET_PAGE_MASK) != IO_MEM_ROM &&
+            !(pd & IO_MEM_ROMD)) {
+            /* do nothing */
+        } else {
+            unsigned long addr1;
+            addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
+            /* ROM/RAM case */
+            ptr = phys_ram_base + addr1;
+            memcpy(ptr, buf, l);
+        }
+        len -= l;
+        buf += l;
+        addr += l;
+    }
+}
+
+
+/* warning: addr must be aligned */
+uint32_t ldl_phys(target_phys_addr_t addr)
+{
+    int io_index;
+    uint8_t *ptr;
+    uint32_t val;
+    unsigned long pd;
+    PhysPageDesc *p;
+
+    p = phys_page_find(addr >> TARGET_PAGE_BITS);
+    if (!p) {
+        pd = IO_MEM_UNASSIGNED;
+    } else {
+        pd = p->phys_offset;
+    }
+
+    if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
+        !(pd & IO_MEM_ROMD)) {
+        /* I/O case */
+        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+        val = io_mem_read[io_index][2](io_mem_opaque[io_index], addr);
+    } else {
+        /* RAM case */
+        ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
+            (addr & ~TARGET_PAGE_MASK);
+        val = ldl_p(ptr);
+    }
+    return val;
+}
+
+/* warning: addr must be aligned */
+uint64_t ldq_phys(target_phys_addr_t addr)
+{
+    int io_index;
+    uint8_t *ptr;
+    uint64_t val;
+    unsigned long pd;
+    PhysPageDesc *p;
+
+    p = phys_page_find(addr >> TARGET_PAGE_BITS);
+    if (!p) {
+        pd = IO_MEM_UNASSIGNED;
+    } else {
+        pd = p->phys_offset;
+    }
+
+    if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
+        !(pd & IO_MEM_ROMD)) {
+        /* I/O case */
+        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+#ifdef TARGET_WORDS_BIGENDIAN
+        val = (uint64_t)io_mem_read[io_index][2](io_mem_opaque[io_index], addr) << 32;
+        val |= io_mem_read[io_index][2](io_mem_opaque[io_index], addr + 4);
+#else
+        val = io_mem_read[io_index][2](io_mem_opaque[io_index], addr);
+        val |= (uint64_t)io_mem_read[io_index][2](io_mem_opaque[io_index], addr + 4) << 32;
+#endif
+    } else {
+        /* RAM case */
+        ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
+            (addr & ~TARGET_PAGE_MASK);
+        val = ldq_p(ptr);
+    }
+    return val;
+}
+
+/* XXX: optimize */
+uint32_t ldub_phys(target_phys_addr_t addr)
+{
+    uint8_t val;
+    cpu_physical_memory_read(addr, &val, 1);
+    return val;
+}
+
+/* XXX: optimize */
+uint32_t lduw_phys(target_phys_addr_t addr)
+{
+    uint16_t val;
+    cpu_physical_memory_read(addr, (uint8_t *)&val, 2);
+    return tswap16(val);
+}
+
+/* warning: addr must be aligned. The ram page is not marked as dirty
+   and the code inside is not invalidated. It is useful if the dirty
+   bits are used to track modified PTEs */
+void stl_phys_notdirty(target_phys_addr_t addr, uint32_t val)
+{
+    int io_index;
+    uint8_t *ptr;
+    unsigned long pd;
+    PhysPageDesc *p;
+
+    p = phys_page_find(addr >> TARGET_PAGE_BITS);
+    if (!p) {
+        pd = IO_MEM_UNASSIGNED;
+    } else {
+        pd = p->phys_offset;
+    }
+
+    if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
+        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+        io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val);
+    } else {
+        unsigned long addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
+        ptr = phys_ram_base + addr1;
+        stl_p(ptr, val);
+
+        if (unlikely(in_migration)) {
+            if (!cpu_physical_memory_is_dirty(addr1)) {
+                /* invalidate code */
+                tb_invalidate_phys_page_range(addr1, addr1 + 4, 0);
+                /* set dirty bit */
+                phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
+                    (0xff & ~CODE_DIRTY_FLAG);
+            }
+        }
+    }
+}
+
+void stq_phys_notdirty(target_phys_addr_t addr, uint64_t val)
+{
+    int io_index;
+    uint8_t *ptr;
+    unsigned long pd;
+    PhysPageDesc *p;
+
+    p = phys_page_find(addr >> TARGET_PAGE_BITS);
+    if (!p) {
+        pd = IO_MEM_UNASSIGNED;
+    } else {
+        pd = p->phys_offset;
+    }
+
+    if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
+        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+#ifdef TARGET_WORDS_BIGENDIAN
+        io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val >> 32);
+        io_mem_write[io_index][2](io_mem_opaque[io_index], addr + 4, val);
+#else
+        io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val);
+        io_mem_write[io_index][2](io_mem_opaque[io_index], addr + 4, val >> 32);
+#endif
+    } else {
+        ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
+            (addr & ~TARGET_PAGE_MASK);
+        stq_p(ptr, val);
+    }
+}
+
+/* warning: addr must be aligned */
+void stl_phys(target_phys_addr_t addr, uint32_t val)
+{
+    int io_index;
+    uint8_t *ptr;
+    unsigned long pd;
+    PhysPageDesc *p;
+
+    p = phys_page_find(addr >> TARGET_PAGE_BITS);
+    if (!p) {
+        pd = IO_MEM_UNASSIGNED;
+    } else {
+        pd = p->phys_offset;
+    }
+
+    if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
+        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
+        io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val);
+    } else {
+        unsigned long addr1;
+        addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
+        /* RAM case */
+        ptr = phys_ram_base + addr1;
+        stl_p(ptr, val);
+        if (!cpu_physical_memory_is_dirty(addr1)) {
+            /* invalidate code */
+            tb_invalidate_phys_page_range(addr1, addr1 + 4, 0);
+            /* set dirty bit */
+            phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
+                (0xff & ~CODE_DIRTY_FLAG);
+        }
+    }
+}
+
+/* XXX: optimize */
+void stb_phys(target_phys_addr_t addr, uint32_t val)
+{
+    uint8_t v = val;
+    cpu_physical_memory_write(addr, &v, 1);
+}
+
+/* XXX: optimize */
+void stw_phys(target_phys_addr_t addr, uint32_t val)
+{
+    uint16_t v = tswap16(val);
+    cpu_physical_memory_write(addr, (const uint8_t *)&v, 2);
+}
+
+/* XXX: optimize */
+void stq_phys(target_phys_addr_t addr, uint64_t val)
+{
+    val = tswap64(val);
+    cpu_physical_memory_write(addr, (const uint8_t *)&val, 8);
+}
+
+#endif
+
+/* virtual memory access for debug */
+int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
+                        uint8_t *buf, int len, int is_write)
+{
+    int l;
+    target_phys_addr_t phys_addr;
+    target_ulong page;
+
+    while (len > 0) {
+        page = addr & TARGET_PAGE_MASK;
+        phys_addr = cpu_get_phys_page_debug(env, page);
+        /* if no physical page mapped, return an error */
+        if (phys_addr == -1)
+            return -1;
+        l = (page + TARGET_PAGE_SIZE) - addr;
+        if (l > len)
+            l = len;
+        cpu_physical_memory_rw(phys_addr + (addr & ~TARGET_PAGE_MASK),
+                               buf, l, is_write);
+        len -= l;
+        buf += l;
+        addr += l;
+    }
+    return 0;
+}
+
diff --git a/exec-all.h b/exec-all.h
index e3da98a..c965bb0 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -23,6 +23,38 @@
 /* allow to see translation results - the slowdown should be negligible, so we leave it */
 #define DEBUG_DISAS
 
+#if defined(TARGET_SPARC64)
+#define TARGET_PHYS_ADDR_SPACE_BITS 41
+#elif defined(TARGET_SPARC)
+#define TARGET_PHYS_ADDR_SPACE_BITS 36
+#elif defined(TARGET_ALPHA)
+#define TARGET_PHYS_ADDR_SPACE_BITS 42
+#define TARGET_VIRT_ADDR_SPACE_BITS 42
+#elif defined(TARGET_PPC64)
+#define TARGET_PHYS_ADDR_SPACE_BITS 42
+#elif defined(TARGET_X86_64) && !defined(USE_KQEMU)
+#define TARGET_PHYS_ADDR_SPACE_BITS 42
+#elif defined(TARGET_I386) && !defined(USE_KQEMU)
+#define TARGET_PHYS_ADDR_SPACE_BITS 36
+#else
+/* Note: for compatibility with kqemu, we use 32 bits for x86_64 */
+#define TARGET_PHYS_ADDR_SPACE_BITS 32
+#endif
+
+#define L2_BITS 10
+#if defined(CONFIG_USER_ONLY) && defined(TARGET_VIRT_ADDR_SPACE_BITS)
+/* XXX: this is a temporary hack for alpha target.
+ *      In the future, this is to be replaced by a multi-level table
+ *      to actually be able to handle the complete 64 bits address space.
+ */
+#define L1_BITS (TARGET_VIRT_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
+#else
+#define L1_BITS (32 - L2_BITS - TARGET_PAGE_BITS)
+#endif
+
+#define L1_SIZE (1 << L1_BITS)
+#define L2_SIZE (1 << L2_BITS)
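+/* e.g. with 4 KiB target pages (TARGET_PAGE_BITS == 12) and no
+   TARGET_VIRT_ADDR_SPACE_BITS override, this is a 10/10/12 split:
+   L1_SIZE == L2_SIZE == 1024. */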
+
 /* is_jmp field values */
 #define DISAS_NEXT    0 /* next instruction can be analyzed */
 #define DISAS_JUMP    1 /* only pc was modified dynamically */
@@ -86,6 +118,7 @@ int page_unprotect(target_ulong address, unsigned long pc, void *puc);
 void tb_invalidate_phys_page_range(target_phys_addr_t start, target_phys_addr_t end,
                                    int is_cpu_write_access);
 void tb_invalidate_page_range(target_ulong start, target_ulong end);
+void tb_invalidate_phys_page_fast(target_phys_addr_t start, int len);
 void tlb_flush_page(CPUState *env, target_ulong addr);
 void tlb_flush(CPUState *env, int flush_global);
 int tlb_set_page_exec(CPUState *env, target_ulong vaddr,
@@ -99,6 +132,8 @@ static inline int tlb_set_page(CPUState *env1, target_ulong vaddr,
         prot |= PAGE_EXEC;
     return tlb_set_page_exec(env1, vaddr, paddr, prot, mmu_idx, is_softmmu);
 }
+void tlb_set_dirty(CPUState *env, target_ulong vaddr);
+void tlb_reset_dirty(ram_addr_t start, unsigned long length);
 
 #define CODE_GEN_ALIGN           16 /* must be >= of the size of a icache line */
 
diff --git a/exec.c b/exec.c
index 1edc737..b13c511 100644
--- a/exec.c
+++ b/exec.c
@@ -66,24 +66,6 @@
 #define MMAP_AREA_START        0x00000000
 #define MMAP_AREA_END          0xa8000000
 
-#if defined(TARGET_SPARC64)
-#define TARGET_PHYS_ADDR_SPACE_BITS 41
-#elif defined(TARGET_SPARC)
-#define TARGET_PHYS_ADDR_SPACE_BITS 36
-#elif defined(TARGET_ALPHA)
-#define TARGET_PHYS_ADDR_SPACE_BITS 42
-#define TARGET_VIRT_ADDR_SPACE_BITS 42
-#elif defined(TARGET_PPC64)
-#define TARGET_PHYS_ADDR_SPACE_BITS 42
-#elif defined(TARGET_X86_64) && !defined(USE_KQEMU)
-#define TARGET_PHYS_ADDR_SPACE_BITS 42
-#elif defined(TARGET_I386) && !defined(USE_KQEMU)
-#define TARGET_PHYS_ADDR_SPACE_BITS 36
-#else
-/* Note: for compatibility with kqemu, we use 32 bits for x86_64 */
-#define TARGET_PHYS_ADDR_SPACE_BITS 32
-#endif
-
 static TranslationBlock *tbs;
 int code_gen_max_blocks;
 TranslationBlock *tb_phys_hash[CODE_GEN_PHYS_HASH_SIZE];
@@ -91,6 +73,10 @@ static int nb_tbs;
 /* any access to the tbs or the page table must use this lock */
 spinlock_t tb_lock = SPIN_LOCK_UNLOCKED;
 
+#if !defined(CONFIG_USER_ONLY)
+static int io_mem_watch;
+#endif
+
 #if defined(__arm__) || defined(__sparc_v9__)
 /* The prologue must be reachable with a direct jump. ARM and Sparc64
  have limited branch ranges (possibly also PPC) so place it in a
@@ -110,27 +96,6 @@ static unsigned long code_gen_buffer_size;
 static unsigned long code_gen_buffer_max_size;
 uint8_t *code_gen_ptr;
 
-#if !defined(CONFIG_USER_ONLY)
-ram_addr_t phys_ram_size;
-int phys_ram_fd;
-uint8_t *phys_ram_base;
-uint8_t *phys_ram_dirty;
-static int in_migration;
-static ram_addr_t phys_ram_alloc_offset = 0;
-#endif
-
-CPUState *first_cpu;
-/* current CPU in the current thread. It is only valid inside
-   cpu_exec() */
-CPUState *cpu_single_env;
-/* 0 = Do not count executed instructions.
-   1 = Precise instruction counting.
-   2 = Adaptive rate instruction counting.  */
-int use_icount = 0;
-/* Current instruction counter.  While executing translated code this may
-   include some instructions that have not yet been executed.  */
-int64_t qemu_icount;
-
 typedef struct PageDesc {
     /* list of TBs intersecting this ram page */
     TranslationBlock *first_tb;
@@ -143,64 +108,14 @@ typedef struct PageDesc {
 #endif
 } PageDesc;
 
-typedef struct PhysPageDesc {
-    /* offset in host memory of the page + io_index in the low bits */
-    ram_addr_t phys_offset;
-} PhysPageDesc;
-
-#define L2_BITS 10
-#if defined(CONFIG_USER_ONLY) && defined(TARGET_VIRT_ADDR_SPACE_BITS)
-/* XXX: this is a temporary hack for alpha target.
- *      In the future, this is to be replaced by a multi-level table
- *      to actually be able to handle the complete 64 bits address space.
- */
-#define L1_BITS (TARGET_VIRT_ADDR_SPACE_BITS - L2_BITS - TARGET_PAGE_BITS)
-#else
-#define L1_BITS (32 - L2_BITS - TARGET_PAGE_BITS)
-#endif
-
-#define L1_SIZE (1 << L1_BITS)
-#define L2_SIZE (1 << L2_BITS)
-
-unsigned long qemu_real_host_page_size;
-unsigned long qemu_host_page_bits;
-unsigned long qemu_host_page_size;
-unsigned long qemu_host_page_mask;
-
 /* XXX: for system emulation, it could just be an array */
 static PageDesc *l1_map[L1_SIZE];
-static PhysPageDesc **l1_phys_map;
-
-#if !defined(CONFIG_USER_ONLY)
-static void io_mem_init(void);
-
-/* io memory support */
-CPUWriteMemoryFunc *io_mem_write[IO_MEM_NB_ENTRIES][4];
-CPUReadMemoryFunc *io_mem_read[IO_MEM_NB_ENTRIES][4];
-void *io_mem_opaque[IO_MEM_NB_ENTRIES];
-static int io_mem_nb;
-static int io_mem_watch;
-#endif
-
-/* log support */
-static const char *logfilename = "/tmp/qemu.log";
-FILE *logfile;
-int loglevel;
-static int log_append = 0;
 
 /* statistics */
 static int tlb_flush_count;
 static int tb_flush_count;
 static int tb_phys_invalidate_count;
 
-#define SUBPAGE_IDX(addr) ((addr) & ~TARGET_PAGE_MASK)
-typedef struct subpage_t {
-    target_phys_addr_t base;
-    CPUReadMemoryFunc **mem_read[TARGET_PAGE_SIZE][4];
-    CPUWriteMemoryFunc **mem_write[TARGET_PAGE_SIZE][4];
-    void *opaque[TARGET_PAGE_SIZE][2][4];
-} subpage_t;
-
 #ifdef _WIN32
 static void map_exec(void *addr, long size)
 {
@@ -227,60 +142,6 @@ static void map_exec(void *addr, long size)
 }
 #endif
 
-static void page_init(void)
-{
-    /* NOTE: we can always suppose that qemu_host_page_size >=
-       TARGET_PAGE_SIZE */
-#ifdef _WIN32
-    {
-        SYSTEM_INFO system_info;
-
-        GetSystemInfo(&system_info);
-        qemu_real_host_page_size = system_info.dwPageSize;
-    }
-#else
-    qemu_real_host_page_size = getpagesize();
-#endif
-    if (qemu_host_page_size == 0)
-        qemu_host_page_size = qemu_real_host_page_size;
-    if (qemu_host_page_size < TARGET_PAGE_SIZE)
-        qemu_host_page_size = TARGET_PAGE_SIZE;
-    qemu_host_page_bits = 0;
-    while ((1 << qemu_host_page_bits) < qemu_host_page_size)
-        qemu_host_page_bits++;
-    qemu_host_page_mask = ~(qemu_host_page_size - 1);
-    l1_phys_map = qemu_vmalloc(L1_SIZE * sizeof(void *));
-    memset(l1_phys_map, 0, L1_SIZE * sizeof(void *));
-
-#if !defined(_WIN32) && defined(CONFIG_USER_ONLY)
-    {
-        long long startaddr, endaddr;
-        FILE *f;
-        int n;
-
-        mmap_lock();
-        last_brk = (unsigned long)sbrk(0);
-        f = fopen("/proc/self/maps", "r");
-        if (f) {
-            do {
-                n = fscanf (f, "%llx-%llx %*[^\n]\n", &startaddr, &endaddr);
-                if (n == 2) {
-                    startaddr = MIN(startaddr,
-                                    (1ULL << TARGET_PHYS_ADDR_SPACE_BITS) - 1);
-                    endaddr = MIN(endaddr,
-                                    (1ULL << TARGET_PHYS_ADDR_SPACE_BITS) - 1);
-                    page_set_flags(startaddr & TARGET_PAGE_MASK,
-                                   TARGET_PAGE_ALIGN(endaddr),
-                                   PAGE_RESERVED); 
-                }
-            } while (!feof(f));
-            fclose(f);
-        }
-        mmap_unlock();
-    }
-#endif
-}
-
 static inline PageDesc **page_l1_map(target_ulong index)
 {
 #if TARGET_LONG_BITS > 32
@@ -336,48 +197,6 @@ static inline PageDesc *page_find(target_ulong index)
     return p + (index & (L2_SIZE - 1));
 }
 
-static PhysPageDesc *phys_page_find_alloc(target_phys_addr_t index, int alloc)
-{
-    void **lp, **p;
-    PhysPageDesc *pd;
-
-    p = (void **)l1_phys_map;
-#if TARGET_PHYS_ADDR_SPACE_BITS > 32
-
-#if TARGET_PHYS_ADDR_SPACE_BITS > (32 + L1_BITS)
-#error unsupported TARGET_PHYS_ADDR_SPACE_BITS
-#endif
-    lp = p + ((index >> (L1_BITS + L2_BITS)) & (L1_SIZE - 1));
-    p = *lp;
-    if (!p) {
-        /* allocate if not found */
-        if (!alloc)
-            return NULL;
-        p = qemu_vmalloc(sizeof(void *) * L1_SIZE);
-        memset(p, 0, sizeof(void *) * L1_SIZE);
-        *lp = p;
-    }
-#endif
-    lp = p + ((index >> L2_BITS) & (L1_SIZE - 1));
-    pd = *lp;
-    if (!pd) {
-        int i;
-        /* allocate if not found */
-        if (!alloc)
-            return NULL;
-        pd = qemu_vmalloc(sizeof(PhysPageDesc) * L2_SIZE);
-        *lp = pd;
-        for (i = 0; i < L2_SIZE; i++)
-          pd[i].phys_offset = IO_MEM_UNASSIGNED;
-    }
-    return ((PhysPageDesc *)pd) + (index & (L2_SIZE - 1));
-}
-
-static inline PhysPageDesc *phys_page_find(target_phys_addr_t index)
-{
-    return phys_page_find_alloc(index, 0);
-}
-
 #if !defined(CONFIG_USER_ONLY)
 static void tlb_protect_code(ram_addr_t ram_addr);
 static void tlb_unprotect_code_phys(CPUState *env, ram_addr_t ram_addr,
@@ -483,67 +302,96 @@ static void code_gen_alloc(unsigned long tb_size)
     tbs = qemu_malloc(code_gen_max_blocks * sizeof(TranslationBlock));
 }
 
-/* Must be called before using the QEMU cpus. 'tb_size' is the size
-   (in bytes) allocated to the translation buffer. Zero means default
-   size. */
-void cpu_exec_init_all(unsigned long tb_size)
-{
-    cpu_gen_init();
-    code_gen_alloc(tb_size);
-    code_gen_ptr = code_gen_buffer;
-    page_init();
 #if !defined(CONFIG_USER_ONLY)
-    io_mem_init();
-#endif
+/* Generate a debug exception if a watchpoint has been hit.  */
+static void check_watchpoint(int offset, int flags)
+{
+    CPUState *env = cpu_single_env;
+    target_ulong vaddr;
+    int i;
+
+    vaddr = (env->mem_io_vaddr & TARGET_PAGE_MASK) + offset;
+    for (i = 0; i < env->nb_watchpoints; i++) {
+        if (vaddr == env->watchpoint[i].vaddr
+                && (env->watchpoint[i].type & flags)) {
+            env->watchpoint_hit = i + 1;
+            cpu_interrupt(env, CPU_INTERRUPT_DEBUG);
+            break;
+        }
+    }
+}
+
+/* Watchpoint access routines.  Watchpoints are inserted using TLB tricks,
+   so these check for a hit then pass through to the normal out-of-line
+   phys routines.  */
+static uint32_t watch_mem_readb(void *opaque, target_phys_addr_t addr)
+{
+    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_READ);
+    return ldub_phys(addr);
 }
 
-#if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
+static uint32_t watch_mem_readw(void *opaque, target_phys_addr_t addr)
+{
+    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_READ);
+    return lduw_phys(addr);
+}
 
-#define CPU_COMMON_SAVE_VERSION 1
+static uint32_t watch_mem_readl(void *opaque, target_phys_addr_t addr)
+{
+    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_READ);
+    return ldl_phys(addr);
+}
 
-static void cpu_common_save(QEMUFile *f, void *opaque)
+static void watch_mem_writeb(void *opaque, target_phys_addr_t addr,
+                             uint32_t val)
 {
-    CPUState *env = opaque;
+    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_WRITE);
+    stb_phys(addr, val);
+}
 
-    qemu_put_be32s(f, &env->halted);
-    qemu_put_be32s(f, &env->interrupt_request);
+static void watch_mem_writew(void *opaque, target_phys_addr_t addr,
+                             uint32_t val)
+{
+    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_WRITE);
+    stw_phys(addr, val);
 }
 
-static int cpu_common_load(QEMUFile *f, void *opaque, int version_id)
+static void watch_mem_writel(void *opaque, target_phys_addr_t addr,
+                             uint32_t val)
 {
-    CPUState *env = opaque;
+    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_WRITE);
+    stl_phys(addr, val);
+}
 
-    if (version_id != CPU_COMMON_SAVE_VERSION)
-        return -EINVAL;
+static CPUReadMemoryFunc *watch_mem_read[3] = {
+    watch_mem_readb,
+    watch_mem_readw,
+    watch_mem_readl,
+};
 
-    qemu_get_be32s(f, &env->halted);
-    qemu_get_be32s(f, &env->interrupt_request);
-    tlb_flush(env, 1);
+static CPUWriteMemoryFunc *watch_mem_write[3] = {
+    watch_mem_writeb,
+    watch_mem_writew,
+    watch_mem_writel,
+};
 
-    return 0;
+static void io_mem_watch_init(void)
+{
+    io_mem_watch = cpu_register_io_memory(0, watch_mem_read,
+                                          watch_mem_write, NULL);
 }
 #endif
 
-void cpu_exec_init(CPUState *env)
+/* Must be called before using the QEMU cpus. 'tb_size' is the size
+   (in bytes) allocated to the translation buffer. Zero means default
+   size. */
+void cpu_exec_init_all(unsigned long tb_size)
 {
-    CPUState **penv;
-    int cpu_index;
-
-    env->next_cpu = NULL;
-    penv = &first_cpu;
-    cpu_index = 0;
-    while (*penv != NULL) {
-        penv = (CPUState **)&(*penv)->next_cpu;
-        cpu_index++;
-    }
-    env->cpu_index = cpu_index;
-    env->nb_watchpoints = 0;
-    *penv = env;
-#if defined(CPU_SAVE_VERSION) && !defined(CONFIG_USER_ONLY)
-    register_savevm("cpu_common", cpu_index, CPU_COMMON_SAVE_VERSION,
-                    cpu_common_save, cpu_common_load, env);
-    register_savevm("cpu", cpu_index, CPU_SAVE_VERSION,
-                    cpu_save, cpu_load, env);
+    cpu_gen_init();
+    code_gen_alloc(tb_size);
+    code_gen_ptr = code_gen_buffer;
+#if !defined(CONFIG_USER_ONLY)
+    io_mem_watch_init();
 #endif
 }
 
@@ -995,7 +843,7 @@ void tb_invalidate_phys_page_range(target_phys_addr_t start, target_phys_addr_t
 }
 
 /* len must be <= 8 and start must be a multiple of len */
-static inline void tb_invalidate_phys_page_fast(target_phys_addr_t start, int len)
+void tb_invalidate_phys_page_fast(target_phys_addr_t start, int len)
 {
     PageDesc *p;
     int offset, b;
@@ -1290,182 +1138,6 @@ static void tb_reset_jump_recursive(TranslationBlock *tb)
     tb_reset_jump_recursive2(tb, 1);
 }
 
-#if defined(TARGET_HAS_ICE)
-static void breakpoint_invalidate(CPUState *env, target_ulong pc)
-{
-    target_phys_addr_t addr;
-    target_ulong pd;
-    ram_addr_t ram_addr;
-    PhysPageDesc *p;
-
-    addr = cpu_get_phys_page_debug(env, pc);
-    p = phys_page_find(addr >> TARGET_PAGE_BITS);
-    if (!p) {
-        pd = IO_MEM_UNASSIGNED;
-    } else {
-        pd = p->phys_offset;
-    }
-    ram_addr = (pd & TARGET_PAGE_MASK) | (pc & ~TARGET_PAGE_MASK);
-    tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
-}
-#endif
-
-/* Add a watchpoint.  */
-int cpu_watchpoint_insert(CPUState *env, target_ulong addr, int type)
-{
-    int i;
-
-    for (i = 0; i < env->nb_watchpoints; i++) {
-        if (addr == env->watchpoint[i].vaddr)
-            return 0;
-    }
-    if (env->nb_watchpoints >= MAX_WATCHPOINTS)
-        return -1;
-
-    i = env->nb_watchpoints++;
-    env->watchpoint[i].vaddr = addr;
-    env->watchpoint[i].type = type;
-    tlb_flush_page(env, addr);
-    /* FIXME: This flush is needed because of the hack to make memory ops
-       terminate the TB.  It can be removed once the proper IO trap and
-       re-execute bits are in.  */
-    tb_flush(env);
-    return i;
-}
-
-/* Remove a watchpoint.  */
-int cpu_watchpoint_remove(CPUState *env, target_ulong addr)
-{
-    int i;
-
-    for (i = 0; i < env->nb_watchpoints; i++) {
-        if (addr == env->watchpoint[i].vaddr) {
-            env->nb_watchpoints--;
-            env->watchpoint[i] = env->watchpoint[env->nb_watchpoints];
-            tlb_flush_page(env, addr);
-            return 0;
-        }
-    }
-    return -1;
-}
-
-/* Remove all watchpoints. */
-void cpu_watchpoint_remove_all(CPUState *env) {
-    int i;
-
-    for (i = 0; i < env->nb_watchpoints; i++) {
-        tlb_flush_page(env, env->watchpoint[i].vaddr);
-    }
-    env->nb_watchpoints = 0;
-}
-
-/* add a breakpoint. EXCP_DEBUG is returned by the CPU loop if a
-   breakpoint is reached */
-int cpu_breakpoint_insert(CPUState *env, target_ulong pc)
-{
-#if defined(TARGET_HAS_ICE)
-    int i;
-
-    for(i = 0; i < env->nb_breakpoints; i++) {
-        if (env->breakpoints[i] == pc)
-            return 0;
-    }
-
-    if (env->nb_breakpoints >= MAX_BREAKPOINTS)
-        return -1;
-    env->breakpoints[env->nb_breakpoints++] = pc;
-
-    breakpoint_invalidate(env, pc);
-    return 0;
-#else
-    return -1;
-#endif
-}
-
-/* remove all breakpoints */
-void cpu_breakpoint_remove_all(CPUState *env) {
-#if defined(TARGET_HAS_ICE)
-    int i;
-    for(i = 0; i < env->nb_breakpoints; i++) {
-        breakpoint_invalidate(env, env->breakpoints[i]);
-    }
-    env->nb_breakpoints = 0;
-#endif
-}
-
-/* remove a breakpoint */
-int cpu_breakpoint_remove(CPUState *env, target_ulong pc)
-{
-#if defined(TARGET_HAS_ICE)
-    int i;
-    for(i = 0; i < env->nb_breakpoints; i++) {
-        if (env->breakpoints[i] == pc)
-            goto found;
-    }
-    return -1;
- found:
-    env->nb_breakpoints--;
-    if (i < env->nb_breakpoints)
-      env->breakpoints[i] = env->breakpoints[env->nb_breakpoints];
-
-    breakpoint_invalidate(env, pc);
-    return 0;
-#else
-    return -1;
-#endif
-}
-
-/* enable or disable single step mode. EXCP_DEBUG is returned by the
-   CPU loop after each instruction */
-void cpu_single_step(CPUState *env, int enabled)
-{
-#if defined(TARGET_HAS_ICE)
-    if (env->singlestep_enabled != enabled) {
-        env->singlestep_enabled = enabled;
-        /* must flush all the translated code to avoid inconsistancies */
-        /* XXX: only flush what is necessary */
-        tb_flush(env);
-    }
-#endif
-}
-
-/* enable or disable low levels log */
-void cpu_set_log(int log_flags)
-{
-    loglevel = log_flags;
-    if (loglevel && !logfile) {
-        logfile = fopen(logfilename, log_append ? "a" : "w");
-        if (!logfile) {
-            perror(logfilename);
-            _exit(1);
-        }
-#if !defined(CONFIG_SOFTMMU)
-        /* must avoid mmap() usage of glibc by setting a buffer "by hand" */
-        {
-            static char logfile_buf[4096];
-            setvbuf(logfile, logfile_buf, _IOLBF, sizeof(logfile_buf));
-        }
-#else
-        setvbuf(logfile, NULL, _IOLBF, 0);
-#endif
-        log_append = 1;
-    }
-    if (!loglevel && logfile) {
-        fclose(logfile);
-        logfile = NULL;
-    }
-}
-
-void cpu_set_log_filename(const char *filename)
-{
-    logfilename = strdup(filename);
-    if (logfile) {
-        fclose(logfile);
-        logfile = NULL;
-    }
-    cpu_set_log(loglevel);
-}
-
 /* mask must never be zero, except for A20 change call */
 void cpu_interrupt(CPUState *env, int mask)
 {
@@ -1508,125 +1180,6 @@ void cpu_interrupt(CPUState *env, int mask)
 #endif
 }
 
-void cpu_reset_interrupt(CPUState *env, int mask)
-{
-    env->interrupt_request &= ~mask;
-}
-
-const CPULogItem cpu_log_items[] = {
-    { CPU_LOG_TB_OUT_ASM, "out_asm",
-      "show generated host assembly code for each compiled TB" },
-    { CPU_LOG_TB_IN_ASM, "in_asm",
-      "show target assembly code for each compiled TB" },
-    { CPU_LOG_TB_OP, "op",
-      "show micro ops for each compiled TB" },
-    { CPU_LOG_TB_OP_OPT, "op_opt",
-      "show micro ops "
-#ifdef TARGET_I386
-      "before eflags optimization and "
-#endif
-      "after liveness analysis" },
-    { CPU_LOG_INT, "int",
-      "show interrupts/exceptions in short format" },
-    { CPU_LOG_EXEC, "exec",
-      "show trace before each executed TB (lots of logs)" },
-    { CPU_LOG_TB_CPU, "cpu",
-      "show CPU state before block translation" },
-#ifdef TARGET_I386
-    { CPU_LOG_PCALL, "pcall",
-      "show protected mode far calls/returns/exceptions" },
-#endif
-#ifdef DEBUG_IOPORT
-    { CPU_LOG_IOPORT, "ioport",
-      "show all i/o ports accesses" },
-#endif
-    { 0, NULL, NULL },
-};
-
-static int cmp1(const char *s1, int n, const char *s2)
-{
-    if (strlen(s2) != n)
-        return 0;
-    return memcmp(s1, s2, n) == 0;
-}
-
-/* takes a comma separated list of log masks. Return 0 if error. */
-int cpu_str_to_log_mask(const char *str)
-{
-    const CPULogItem *item;
-    int mask;
-    const char *p, *p1;
-
-    p = str;
-    mask = 0;
-    for(;;) {
-        p1 = strchr(p, ',');
-        if (!p1)
-            p1 = p + strlen(p);
-	if(cmp1(p,p1-p,"all")) {
-		for(item = cpu_log_items; item->mask != 0; item++) {
-			mask |= item->mask;
-		}
-	} else {
-        for(item = cpu_log_items; item->mask != 0; item++) {
-            if (cmp1(p, p1 - p, item->name))
-                goto found;
-        }
-        return 0;
-	}
-    found:
-        mask |= item->mask;
-        if (*p1 != ',')
-            break;
-        p = p1 + 1;
-    }
-    return mask;
-}
-
-void cpu_abort(CPUState *env, const char *fmt, ...)
-{
-    va_list ap;
-    va_list ap2;
-
-    va_start(ap, fmt);
-    va_copy(ap2, ap);
-    fprintf(stderr, "qemu: fatal: ");
-    vfprintf(stderr, fmt, ap);
-    fprintf(stderr, "\n");
-#ifdef TARGET_I386
-    cpu_dump_state(env, stderr, fprintf, X86_DUMP_FPU | X86_DUMP_CCOP);
-#else
-    cpu_dump_state(env, stderr, fprintf, 0);
-#endif
-    if (logfile) {
-        fprintf(logfile, "qemu: fatal: ");
-        vfprintf(logfile, fmt, ap2);
-        fprintf(logfile, "\n");
-#ifdef TARGET_I386
-        cpu_dump_state(env, logfile, fprintf, X86_DUMP_FPU | X86_DUMP_CCOP);
-#else
-        cpu_dump_state(env, logfile, fprintf, 0);
-#endif
-        fflush(logfile);
-        fclose(logfile);
-    }
-    va_end(ap2);
-    va_end(ap);
-    abort();
-}
-
-CPUState *cpu_copy(CPUState *env)
-{
-    CPUState *new_env = cpu_init(env->cpu_model_str);
-    /* preserve chaining and index */
-    CPUState *next_cpu = new_env->next_cpu;
-    int cpu_index = new_env->cpu_index;
-    memcpy(new_env, env, sizeof(CPUState));
-    new_env->next_cpu = next_cpu;
-    new_env->cpu_index = cpu_index;
-    return new_env;
-}
-
 #if !defined(CONFIG_USER_ONLY)
 
 static inline void tlb_flush_jmp_cache(CPUState *env, target_ulong addr)
@@ -1760,68 +1313,6 @@ static inline void tlb_reset_dirty_range(CPUTLBEntry *tlb_entry,
     }
 }
 
-void cpu_physical_memory_reset_dirty(ram_addr_t start, ram_addr_t end,
-                                     int dirty_flags)
-{
-    CPUState *env;
-    unsigned long length, start1;
-    int i, mask, len;
-    uint8_t *p;
-
-    start &= TARGET_PAGE_MASK;
-    end = TARGET_PAGE_ALIGN(end);
-
-    length = end - start;
-    if (length == 0)
-        return;
-    len = length >> TARGET_PAGE_BITS;
-#ifdef USE_KQEMU
-    /* XXX: should not depend on cpu context */
-    env = first_cpu;
-    if (env->kqemu_enabled) {
-        ram_addr_t addr;
-        addr = start;
-        for(i = 0; i < len; i++) {
-            kqemu_set_notdirty(env, addr);
-            addr += TARGET_PAGE_SIZE;
-        }
-    }
-#endif
-    mask = ~dirty_flags;
-    p = phys_ram_dirty + (start >> TARGET_PAGE_BITS);
-    for(i = 0; i < len; i++)
-        p[i] &= mask;
-
-    /* we modify the TLB cache so that the dirty bit will be set again
-       when accessing the range */
-    start1 = start + (unsigned long)phys_ram_base;
-    for(env = first_cpu; env != NULL; env = env->next_cpu) {
-        for(i = 0; i < CPU_TLB_SIZE; i++)
-            tlb_reset_dirty_range(&env->tlb_table[0][i], start1, length);
-        for(i = 0; i < CPU_TLB_SIZE; i++)
-            tlb_reset_dirty_range(&env->tlb_table[1][i], start1, length);
-#if (NB_MMU_MODES >= 3)
-        for(i = 0; i < CPU_TLB_SIZE; i++)
-            tlb_reset_dirty_range(&env->tlb_table[2][i], start1, length);
-#if (NB_MMU_MODES == 4)
-        for(i = 0; i < CPU_TLB_SIZE; i++)
-            tlb_reset_dirty_range(&env->tlb_table[3][i], start1, length);
-#endif
-#endif
-    }
-}
-
-int cpu_physical_memory_set_dirty_tracking(int enable)
-{
-    in_migration = enable;
-    return 0;
-}
-
-int cpu_physical_memory_get_dirty_tracking(void)
-{
-    return in_migration;
-}
-
 static inline void tlb_update_dirty(CPUTLBEntry *tlb_entry)
 {
     ram_addr_t ram_addr;
@@ -1861,7 +1352,7 @@ static inline void tlb_set_dirty1(CPUTLBEntry *tlb_entry, target_ulong vaddr)
 
 /* update the TLB corresponding to virtual page vaddr
    so that it is no longer dirty */
-static inline void tlb_set_dirty(CPUState *env, target_ulong vaddr)
+void tlb_set_dirty(CPUState *env, target_ulong vaddr)
 {
     int i;
 
@@ -1877,6 +1368,31 @@ static inline void tlb_set_dirty(CPUState *env, target_ulong vaddr)
 #endif
 }
 
+void tlb_reset_dirty(ram_addr_t start, unsigned long length)
+{
+    CPUState *env;
+    unsigned long start1;
+    int i;
+
+    /* we modify the TLB cache so that the dirty bit will be set again
+       when accessing the range */
+    start1 = start + (unsigned long)phys_ram_base;
+    for(env = first_cpu; env != NULL; env = env->next_cpu) {
+        for(i = 0; i < CPU_TLB_SIZE; i++)
+            tlb_reset_dirty_range(&env->tlb_table[0][i], start1, length);
+        for(i = 0; i < CPU_TLB_SIZE; i++)
+            tlb_reset_dirty_range(&env->tlb_table[1][i], start1, length);
+#if (NB_MMU_MODES >= 3)
+        for(i = 0; i < CPU_TLB_SIZE; i++)
+            tlb_reset_dirty_range(&env->tlb_table[2][i], start1, length);
+#if (NB_MMU_MODES == 4)
+        for(i = 0; i < CPU_TLB_SIZE; i++)
+            tlb_reset_dirty_range(&env->tlb_table[3][i], start1, length);
+#endif
+#endif
+    }
+}
+
 /* add a new TLB entry. At most one entry for a given virtual address
    is permitted. Return 0 if OK or 2 if the page could not be mapped
    (can only happen in non SOFTMMU mode for I/O pages or pages
@@ -1885,8 +1401,7 @@ int tlb_set_page_exec(CPUState *env, target_ulong vaddr,
                       target_phys_addr_t paddr, int prot,
                       int mmu_idx, int is_softmmu)
 {
-    PhysPageDesc *p;
-    unsigned long pd;
+    ram_addr_t pd;
     unsigned int index;
     target_ulong address;
     target_ulong code_address;
@@ -1896,12 +1411,7 @@ int tlb_set_page_exec(CPUState *env, target_ulong vaddr,
     int i;
     target_phys_addr_t iotlb;
 
-    p = phys_page_find(paddr >> TARGET_PAGE_BITS);
-    if (!p) {
-        pd = IO_MEM_UNASSIGNED;
-    } else {
-        pd = p->phys_offset;
-    }
+    pd = cpu_get_physical_page_desc(paddr);
 #if defined(DEBUG_TLB)
     printf("tlb_set_page: vaddr=" TARGET_FMT_lx " paddr=0x%08x prot=%x idx=%d smmu=%d pd=0x%08lx\n",
            vaddr, (int)paddr, prot, mmu_idx, is_softmmu, pd);
@@ -2161,993 +1671,11 @@ int page_unprotect(target_ulong address, unsigned long pc, void *puc)
     return 0;
 }
 
-static inline void tlb_set_dirty(CPUState *env,
-                                 unsigned long addr, target_ulong vaddr)
+void tlb_set_dirty(CPUState *env, target_ulong vaddr)
 {
 }
 #endif /* defined(CONFIG_USER_ONLY) */
 
-#if !defined(CONFIG_USER_ONLY)
-static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
-                             ram_addr_t memory);
-static void *subpage_init (target_phys_addr_t base, ram_addr_t *phys,
-                           ram_addr_t orig_memory);
-#define CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr, end_addr2, \
-                      need_subpage)                                     \
-    do {                                                                \
-        if (addr > start_addr)                                          \
-            start_addr2 = 0;                                            \
-        else {                                                          \
-            start_addr2 = start_addr & ~TARGET_PAGE_MASK;               \
-            if (start_addr2 > 0)                                        \
-                need_subpage = 1;                                       \
-        }                                                               \
-                                                                        \
-        if ((start_addr + orig_size) - addr >= TARGET_PAGE_SIZE)        \
-            end_addr2 = TARGET_PAGE_SIZE - 1;                           \
-        else {                                                          \
-            end_addr2 = (start_addr + orig_size - 1) & ~TARGET_PAGE_MASK; \
-            if (end_addr2 < TARGET_PAGE_SIZE - 1)                       \
-                need_subpage = 1;                                       \
-        }                                                               \
-    } while (0)
-
-/* register physical memory. 'size' must be a multiple of the target
-   page size. If (phys_offset & ~TARGET_PAGE_MASK) != 0, then it is an
-   io memory page */
-void cpu_register_physical_memory(target_phys_addr_t start_addr,
-                                  ram_addr_t size,
-                                  ram_addr_t phys_offset)
-{
-    target_phys_addr_t addr, end_addr;
-    PhysPageDesc *p;
-    CPUState *env;
-    ram_addr_t orig_size = size;
-    void *subpage;
-
-#ifdef USE_KQEMU
-    /* XXX: should not depend on cpu context */
-    env = first_cpu;
-    if (env->kqemu_enabled) {
-        kqemu_set_phys_mem(start_addr, size, phys_offset);
-    }
-#endif
-    if (kvm_enabled())
-        kvm_set_phys_mem(start_addr, size, phys_offset);
-
-    size = (size + TARGET_PAGE_SIZE - 1) & TARGET_PAGE_MASK;
-    end_addr = start_addr + (target_phys_addr_t)size;
-    for(addr = start_addr; addr != end_addr; addr += TARGET_PAGE_SIZE) {
-        p = phys_page_find(addr >> TARGET_PAGE_BITS);
-        if (p && p->phys_offset != IO_MEM_UNASSIGNED) {
-            ram_addr_t orig_memory = p->phys_offset;
-            target_phys_addr_t start_addr2, end_addr2;
-            int need_subpage = 0;
-
-            CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr, end_addr2,
-                          need_subpage);
-            if (need_subpage || phys_offset & IO_MEM_SUBWIDTH) {
-                if (!(orig_memory & IO_MEM_SUBPAGE)) {
-                    subpage = subpage_init((addr & TARGET_PAGE_MASK),
-                                           &p->phys_offset, orig_memory);
-                } else {
-                    subpage = io_mem_opaque[(orig_memory & ~TARGET_PAGE_MASK)
-                                            >> IO_MEM_SHIFT];
-                }
-                subpage_register(subpage, start_addr2, end_addr2, phys_offset);
-            } else {
-                p->phys_offset = phys_offset;
-                if ((phys_offset & ~TARGET_PAGE_MASK) <= IO_MEM_ROM ||
-                    (phys_offset & IO_MEM_ROMD))
-                    phys_offset += TARGET_PAGE_SIZE;
-            }
-        } else {
-            p = phys_page_find_alloc(addr >> TARGET_PAGE_BITS, 1);
-            p->phys_offset = phys_offset;
-            if ((phys_offset & ~TARGET_PAGE_MASK) <= IO_MEM_ROM ||
-                (phys_offset & IO_MEM_ROMD))
-                phys_offset += TARGET_PAGE_SIZE;
-            else {
-                target_phys_addr_t start_addr2, end_addr2;
-                int need_subpage = 0;
-
-                CHECK_SUBPAGE(addr, start_addr, start_addr2, end_addr,
-                              end_addr2, need_subpage);
-
-                if (need_subpage || phys_offset & IO_MEM_SUBWIDTH) {
-                    subpage = subpage_init((addr & TARGET_PAGE_MASK),
-                                           &p->phys_offset, IO_MEM_UNASSIGNED);
-                    subpage_register(subpage, start_addr2, end_addr2,
-                                     phys_offset);
-                }
-            }
-        }
-    }
-
-    /* since each CPU stores ram addresses in its TLB cache, we must
-       reset the modified entries */
-    /* XXX: slow ! */
-    for(env = first_cpu; env != NULL; env = env->next_cpu) {
-        tlb_flush(env, 1);
-    }
-}
-
-/* XXX: temporary until new memory mapping API */
-ram_addr_t cpu_get_physical_page_desc(target_phys_addr_t addr)
-{
-    PhysPageDesc *p;
-
-    p = phys_page_find(addr >> TARGET_PAGE_BITS);
-    if (!p)
-        return IO_MEM_UNASSIGNED;
-    return p->phys_offset;
-}
-
-/* XXX: better than nothing */
-ram_addr_t qemu_ram_alloc(ram_addr_t size)
-{
-    ram_addr_t addr;
-    if ((phys_ram_alloc_offset + size) > phys_ram_size) {
-        fprintf(stderr, "Not enough memory (requested_size = %" PRIu64 ", max memory = %" PRIu64 ")\n",
-                (uint64_t)size, (uint64_t)phys_ram_size);
-        abort();
-    }
-    addr = phys_ram_alloc_offset;
-    phys_ram_alloc_offset = TARGET_PAGE_ALIGN(phys_ram_alloc_offset + size);
-    return addr;
-}
-
-void qemu_ram_free(ram_addr_t addr)
-{
-}
-
-static uint32_t unassigned_mem_readb(void *opaque, target_phys_addr_t addr)
-{
-#ifdef DEBUG_UNASSIGNED
-    printf("Unassigned mem read " TARGET_FMT_plx "\n", addr);
-#endif
-#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
-    do_unassigned_access(addr, 0, 0, 0, 1);
-#endif
-    return 0;
-}
-
-static uint32_t unassigned_mem_readw(void *opaque, target_phys_addr_t addr)
-{
-#ifdef DEBUG_UNASSIGNED
-    printf("Unassigned mem read " TARGET_FMT_plx "\n", addr);
-#endif
-#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
-    do_unassigned_access(addr, 0, 0, 0, 2);
-#endif
-    return 0;
-}
-
-static uint32_t unassigned_mem_readl(void *opaque, target_phys_addr_t addr)
-{
-#ifdef DEBUG_UNASSIGNED
-    printf("Unassigned mem read " TARGET_FMT_plx "\n", addr);
-#endif
-#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
-    do_unassigned_access(addr, 0, 0, 0, 4);
-#endif
-    return 0;
-}
-
-static void unassigned_mem_writeb(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-#ifdef DEBUG_UNASSIGNED
-    printf("Unassigned mem write " TARGET_FMT_plx " = 0x%x\n", addr, val);
-#endif
-#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
-    do_unassigned_access(addr, 1, 0, 0, 1);
-#endif
-}
-
-static void unassigned_mem_writew(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-#ifdef DEBUG_UNASSIGNED
-    printf("Unassigned mem write " TARGET_FMT_plx " = 0x%x\n", addr, val);
-#endif
-#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
-    do_unassigned_access(addr, 1, 0, 0, 2);
-#endif
-}
-
-static void unassigned_mem_writel(void *opaque, target_phys_addr_t addr, uint32_t val)
-{
-#ifdef DEBUG_UNASSIGNED
-    printf("Unassigned mem write " TARGET_FMT_plx " = 0x%x\n", addr, val);
-#endif
-#if defined(TARGET_SPARC) || defined(TARGET_CRIS)
-    do_unassigned_access(addr, 1, 0, 0, 4);
-#endif
-}
-
-static CPUReadMemoryFunc *unassigned_mem_read[3] = {
-    unassigned_mem_readb,
-    unassigned_mem_readw,
-    unassigned_mem_readl,
-};
-
-static CPUWriteMemoryFunc *unassigned_mem_write[3] = {
-    unassigned_mem_writeb,
-    unassigned_mem_writew,
-    unassigned_mem_writel,
-};
-
-static void notdirty_mem_writeb(void *opaque, target_phys_addr_t ram_addr,
-                                uint32_t val)
-{
-    int dirty_flags;
-    dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
-    if (!(dirty_flags & CODE_DIRTY_FLAG)) {
-#if !defined(CONFIG_USER_ONLY)
-        tb_invalidate_phys_page_fast(ram_addr, 1);
-        dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
-#endif
-    }
-    stb_p(phys_ram_base + ram_addr, val);
-#ifdef USE_KQEMU
-    if (cpu_single_env->kqemu_enabled &&
-        (dirty_flags & KQEMU_MODIFY_PAGE_MASK) != KQEMU_MODIFY_PAGE_MASK)
-        kqemu_modify_page(cpu_single_env, ram_addr);
-#endif
-    dirty_flags |= (0xff & ~CODE_DIRTY_FLAG);
-    phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS] = dirty_flags;
-    /* we remove the notdirty callback only if the code has been
-       flushed */
-    if (dirty_flags == 0xff)
-        tlb_set_dirty(cpu_single_env, cpu_single_env->mem_io_vaddr);
-}
-
-static void notdirty_mem_writew(void *opaque, target_phys_addr_t ram_addr,
-                                uint32_t val)
-{
-    int dirty_flags;
-    dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
-    if (!(dirty_flags & CODE_DIRTY_FLAG)) {
-#if !defined(CONFIG_USER_ONLY)
-        tb_invalidate_phys_page_fast(ram_addr, 2);
-        dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
-#endif
-    }
-    stw_p(phys_ram_base + ram_addr, val);
-#ifdef USE_KQEMU
-    if (cpu_single_env->kqemu_enabled &&
-        (dirty_flags & KQEMU_MODIFY_PAGE_MASK) != KQEMU_MODIFY_PAGE_MASK)
-        kqemu_modify_page(cpu_single_env, ram_addr);
-#endif
-    dirty_flags |= (0xff & ~CODE_DIRTY_FLAG);
-    phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS] = dirty_flags;
-    /* we remove the notdirty callback only if the code has been
-       flushed */
-    if (dirty_flags == 0xff)
-        tlb_set_dirty(cpu_single_env, cpu_single_env->mem_io_vaddr);
-}
-
-static void notdirty_mem_writel(void *opaque, target_phys_addr_t ram_addr,
-                                uint32_t val)
-{
-    int dirty_flags;
-    dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
-    if (!(dirty_flags & CODE_DIRTY_FLAG)) {
-#if !defined(CONFIG_USER_ONLY)
-        tb_invalidate_phys_page_fast(ram_addr, 4);
-        dirty_flags = phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS];
-#endif
-    }
-    stl_p(phys_ram_base + ram_addr, val);
-#ifdef USE_KQEMU
-    if (cpu_single_env->kqemu_enabled &&
-        (dirty_flags & KQEMU_MODIFY_PAGE_MASK) != KQEMU_MODIFY_PAGE_MASK)
-        kqemu_modify_page(cpu_single_env, ram_addr);
-#endif
-    dirty_flags |= (0xff & ~CODE_DIRTY_FLAG);
-    phys_ram_dirty[ram_addr >> TARGET_PAGE_BITS] = dirty_flags;
-    /* we remove the notdirty callback only if the code has been
-       flushed */
-    if (dirty_flags == 0xff)
-        tlb_set_dirty(cpu_single_env, cpu_single_env->mem_io_vaddr);
-}
-
-static CPUReadMemoryFunc *error_mem_read[3] = {
-    NULL, /* never used */
-    NULL, /* never used */
-    NULL, /* never used */
-};
-
-static CPUWriteMemoryFunc *notdirty_mem_write[3] = {
-    notdirty_mem_writeb,
-    notdirty_mem_writew,
-    notdirty_mem_writel,
-};
-
-/* Generate a debug exception if a watchpoint has been hit.  */
-static void check_watchpoint(int offset, int flags)
-{
-    CPUState *env = cpu_single_env;
-    target_ulong vaddr;
-    int i;
-
-    vaddr = (env->mem_io_vaddr & TARGET_PAGE_MASK) + offset;
-    for (i = 0; i < env->nb_watchpoints; i++) {
-        if (vaddr == env->watchpoint[i].vaddr
-                && (env->watchpoint[i].type & flags)) {
-            env->watchpoint_hit = i + 1;
-            cpu_interrupt(env, CPU_INTERRUPT_DEBUG);
-            break;
-        }
-    }
-}
-
-/* Watchpoint access routines.  Watchpoints are inserted using TLB tricks,
-   so these check for a hit then pass through to the normal out-of-line
-   phys routines.  */
-static uint32_t watch_mem_readb(void *opaque, target_phys_addr_t addr)
-{
-    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_READ);
-    return ldub_phys(addr);
-}
-
-static uint32_t watch_mem_readw(void *opaque, target_phys_addr_t addr)
-{
-    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_READ);
-    return lduw_phys(addr);
-}
-
-static uint32_t watch_mem_readl(void *opaque, target_phys_addr_t addr)
-{
-    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_READ);
-    return ldl_phys(addr);
-}
-
-static void watch_mem_writeb(void *opaque, target_phys_addr_t addr,
-                             uint32_t val)
-{
-    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_WRITE);
-    stb_phys(addr, val);
-}
-
-static void watch_mem_writew(void *opaque, target_phys_addr_t addr,
-                             uint32_t val)
-{
-    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_WRITE);
-    stw_phys(addr, val);
-}
-
-static void watch_mem_writel(void *opaque, target_phys_addr_t addr,
-                             uint32_t val)
-{
-    check_watchpoint(addr & ~TARGET_PAGE_MASK, PAGE_WRITE);
-    stl_phys(addr, val);
-}
-
-static CPUReadMemoryFunc *watch_mem_read[3] = {
-    watch_mem_readb,
-    watch_mem_readw,
-    watch_mem_readl,
-};
-
-static CPUWriteMemoryFunc *watch_mem_write[3] = {
-    watch_mem_writeb,
-    watch_mem_writew,
-    watch_mem_writel,
-};
-
-static inline uint32_t subpage_readlen (subpage_t *mmio, target_phys_addr_t addr,
-                                 unsigned int len)
-{
-    uint32_t ret;
-    unsigned int idx;
-
-    idx = SUBPAGE_IDX(addr - mmio->base);
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: subpage %p len %d addr " TARGET_FMT_plx " idx %d\n", __func__,
-           mmio, len, addr, idx);
-#endif
-    ret = (**mmio->mem_read[idx][len])(mmio->opaque[idx][0][len], addr);
-
-    return ret;
-}
-
-static inline void subpage_writelen (subpage_t *mmio, target_phys_addr_t addr,
-                              uint32_t value, unsigned int len)
-{
-    unsigned int idx;
-
-    idx = SUBPAGE_IDX(addr - mmio->base);
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: subpage %p len %d addr " TARGET_FMT_plx " idx %d value %08x\n", __func__,
-           mmio, len, addr, idx, value);
-#endif
-    (**mmio->mem_write[idx][len])(mmio->opaque[idx][1][len], addr, value);
-}
-
-static uint32_t subpage_readb (void *opaque, target_phys_addr_t addr)
-{
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: addr " TARGET_FMT_plx "\n", __func__, addr);
-#endif
-
-    return subpage_readlen(opaque, addr, 0);
-}
-
-static void subpage_writeb (void *opaque, target_phys_addr_t addr,
-                            uint32_t value)
-{
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: addr " TARGET_FMT_plx " val %08x\n", __func__, addr, value);
-#endif
-    subpage_writelen(opaque, addr, value, 0);
-}
-
-static uint32_t subpage_readw (void *opaque, target_phys_addr_t addr)
-{
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: addr " TARGET_FMT_plx "\n", __func__, addr);
-#endif
-
-    return subpage_readlen(opaque, addr, 1);
-}
-
-static void subpage_writew (void *opaque, target_phys_addr_t addr,
-                            uint32_t value)
-{
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: addr " TARGET_FMT_plx " val %08x\n", __func__, addr, value);
-#endif
-    subpage_writelen(opaque, addr, value, 1);
-}
-
-static uint32_t subpage_readl (void *opaque, target_phys_addr_t addr)
-{
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: addr " TARGET_FMT_plx "\n", __func__, addr);
-#endif
-
-    return subpage_readlen(opaque, addr, 2);
-}
-
-static void subpage_writel (void *opaque,
-                         target_phys_addr_t addr, uint32_t value)
-{
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: addr " TARGET_FMT_plx " val %08x\n", __func__, addr, value);
-#endif
-    subpage_writelen(opaque, addr, value, 2);
-}
-
-static CPUReadMemoryFunc *subpage_read[] = {
-    &subpage_readb,
-    &subpage_readw,
-    &subpage_readl,
-};
-
-static CPUWriteMemoryFunc *subpage_write[] = {
-    &subpage_writeb,
-    &subpage_writew,
-    &subpage_writel,
-};
-
-static int subpage_register (subpage_t *mmio, uint32_t start, uint32_t end,
-                             ram_addr_t memory)
-{
-    int idx, eidx;
-    unsigned int i;
-
-    if (start >= TARGET_PAGE_SIZE || end >= TARGET_PAGE_SIZE)
-        return -1;
-    idx = SUBPAGE_IDX(start);
-    eidx = SUBPAGE_IDX(end);
-#if defined(DEBUG_SUBPAGE)
-    printf("%s: %p start %08x end %08x idx %08x eidx %08x mem %d\n", __func__,
-           mmio, start, end, idx, eidx, memory);
-#endif
-    memory >>= IO_MEM_SHIFT;
-    for (; idx <= eidx; idx++) {
-        for (i = 0; i < 4; i++) {
-            if (io_mem_read[memory][i]) {
-                mmio->mem_read[idx][i] = &io_mem_read[memory][i];
-                mmio->opaque[idx][0][i] = io_mem_opaque[memory];
-            }
-            if (io_mem_write[memory][i]) {
-                mmio->mem_write[idx][i] = &io_mem_write[memory][i];
-                mmio->opaque[idx][1][i] = io_mem_opaque[memory];
-            }
-        }
-    }
-
-    return 0;
-}
-
-static void *subpage_init (target_phys_addr_t base, ram_addr_t *phys,
-                           ram_addr_t orig_memory)
-{
-    subpage_t *mmio;
-    int subpage_memory;
-
-    mmio = qemu_mallocz(sizeof(subpage_t));
-    if (mmio != NULL) {
-        mmio->base = base;
-        subpage_memory = cpu_register_io_memory(0, subpage_read, subpage_write, mmio);
-#if defined(DEBUG_SUBPAGE)
-        printf("%s: %p base " TARGET_FMT_plx " len %08x %d\n", __func__,
-               mmio, base, TARGET_PAGE_SIZE, subpage_memory);
-#endif
-        *phys = subpage_memory | IO_MEM_SUBPAGE;
-        subpage_register(mmio, 0, TARGET_PAGE_SIZE - 1, orig_memory);
-    }
-
-    return mmio;
-}
-
-static void io_mem_init(void)
-{
-    cpu_register_io_memory(IO_MEM_ROM >> IO_MEM_SHIFT, error_mem_read, unassigned_mem_write, NULL);
-    cpu_register_io_memory(IO_MEM_UNASSIGNED >> IO_MEM_SHIFT, unassigned_mem_read, unassigned_mem_write, NULL);
-    cpu_register_io_memory(IO_MEM_NOTDIRTY >> IO_MEM_SHIFT, error_mem_read, notdirty_mem_write, NULL);
-    io_mem_nb = 5;
-
-    io_mem_watch = cpu_register_io_memory(0, watch_mem_read,
-                                          watch_mem_write, NULL);
-    /* alloc dirty bits array */
-    phys_ram_dirty = qemu_vmalloc(phys_ram_size >> TARGET_PAGE_BITS);
-    memset(phys_ram_dirty, 0xff, phys_ram_size >> TARGET_PAGE_BITS);
-}
-
-/* mem_read and mem_write are arrays of functions containing the
-   function to access byte (index 0), word (index 1) and dword (index
-   2). Functions can be omitted with a NULL function pointer. The
-   registered functions may be modified dynamically later.
-   If io_index is non zero, the corresponding io zone is
-   modified. If it is zero, a new io zone is allocated. The return
-   value can be used with cpu_register_physical_memory(). (-1) is
-   returned if error. */
-int cpu_register_io_memory(int io_index,
-                           CPUReadMemoryFunc **mem_read,
-                           CPUWriteMemoryFunc **mem_write,
-                           void *opaque)
-{
-    int i, subwidth = 0;
-
-    if (io_index <= 0) {
-        if (io_mem_nb >= IO_MEM_NB_ENTRIES)
-            return -1;
-        io_index = io_mem_nb++;
-    } else {
-        if (io_index >= IO_MEM_NB_ENTRIES)
-            return -1;
-    }
-
-    for(i = 0;i < 3; i++) {
-        if (!mem_read[i] || !mem_write[i])
-            subwidth = IO_MEM_SUBWIDTH;
-        io_mem_read[io_index][i] = mem_read[i];
-        io_mem_write[io_index][i] = mem_write[i];
-    }
-    io_mem_opaque[io_index] = opaque;
-    return (io_index << IO_MEM_SHIFT) | subwidth;
-}
-
-CPUWriteMemoryFunc **cpu_get_io_memory_write(int io_index)
-{
-    return io_mem_write[io_index >> IO_MEM_SHIFT];
-}
-
-CPUReadMemoryFunc **cpu_get_io_memory_read(int io_index)
-{
-    return io_mem_read[io_index >> IO_MEM_SHIFT];
-}
-
-#endif /* !defined(CONFIG_USER_ONLY) */
-
-/* physical memory access (slow version, mainly for debug) */
-#if defined(CONFIG_USER_ONLY)
-void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
-                            int len, int is_write)
-{
-    int l, flags;
-    target_ulong page;
-    void * p;
-
-    while (len > 0) {
-        page = addr & TARGET_PAGE_MASK;
-        l = (page + TARGET_PAGE_SIZE) - addr;
-        if (l > len)
-            l = len;
-        flags = page_get_flags(page);
-        if (!(flags & PAGE_VALID))
-            return;
-        if (is_write) {
-            if (!(flags & PAGE_WRITE))
-                return;
-            /* XXX: this code should not depend on lock_user */
-            if (!(p = lock_user(VERIFY_WRITE, addr, l, 0)))
-                /* FIXME - should this return an error rather than just fail? */
-                return;
-            memcpy(p, buf, l);
-            unlock_user(p, addr, l);
-        } else {
-            if (!(flags & PAGE_READ))
-                return;
-            /* XXX: this code should not depend on lock_user */
-            if (!(p = lock_user(VERIFY_READ, addr, l, 1)))
-                /* FIXME - should this return an error rather than just fail? */
-                return;
-            memcpy(buf, p, l);
-            unlock_user(p, addr, 0);
-        }
-        len -= l;
-        buf += l;
-        addr += l;
-    }
-}
-
-#else
-void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
-                            int len, int is_write)
-{
-    int l, io_index;
-    uint8_t *ptr;
-    uint32_t val;
-    target_phys_addr_t page;
-    unsigned long pd;
-    PhysPageDesc *p;
-
-    while (len > 0) {
-        page = addr & TARGET_PAGE_MASK;
-        l = (page + TARGET_PAGE_SIZE) - addr;
-        if (l > len)
-            l = len;
-        p = phys_page_find(page >> TARGET_PAGE_BITS);
-        if (!p) {
-            pd = IO_MEM_UNASSIGNED;
-        } else {
-            pd = p->phys_offset;
-        }
-
-        if (is_write) {
-            if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
-                io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
-                /* XXX: could force cpu_single_env to NULL to avoid
-                   potential bugs */
-                if (l >= 4 && ((addr & 3) == 0)) {
-                    /* 32 bit write access */
-                    val = ldl_p(buf);
-                    io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val);
-                    l = 4;
-                } else if (l >= 2 && ((addr & 1) == 0)) {
-                    /* 16 bit write access */
-                    val = lduw_p(buf);
-                    io_mem_write[io_index][1](io_mem_opaque[io_index], addr, val);
-                    l = 2;
-                } else {
-                    /* 8 bit write access */
-                    val = ldub_p(buf);
-                    io_mem_write[io_index][0](io_mem_opaque[io_index], addr, val);
-                    l = 1;
-                }
-            } else {
-                unsigned long addr1;
-                addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
-                /* RAM case */
-                ptr = phys_ram_base + addr1;
-                memcpy(ptr, buf, l);
-                if (!cpu_physical_memory_is_dirty(addr1)) {
-                    /* invalidate code */
-                    tb_invalidate_phys_page_range(addr1, addr1 + l, 0);
-                    /* set dirty bit */
-                    phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
-                        (0xff & ~CODE_DIRTY_FLAG);
-                }
-            }
-        } else {
-            if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
-                !(pd & IO_MEM_ROMD)) {
-                /* I/O case */
-                io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
-                if (l >= 4 && ((addr & 3) == 0)) {
-                    /* 32 bit read access */
-                    val = io_mem_read[io_index][2](io_mem_opaque[io_index], addr);
-                    stl_p(buf, val);
-                    l = 4;
-                } else if (l >= 2 && ((addr & 1) == 0)) {
-                    /* 16 bit read access */
-                    val = io_mem_read[io_index][1](io_mem_opaque[io_index], addr);
-                    stw_p(buf, val);
-                    l = 2;
-                } else {
-                    /* 8 bit read access */
-                    val = io_mem_read[io_index][0](io_mem_opaque[io_index], addr);
-                    stb_p(buf, val);
-                    l = 1;
-                }
-            } else {
-                /* RAM case */
-                ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
-                    (addr & ~TARGET_PAGE_MASK);
-                memcpy(buf, ptr, l);
-            }
-        }
-        len -= l;
-        buf += l;
-        addr += l;
-    }
-}
-
-/* used for ROM loading : can write in RAM and ROM */
-void cpu_physical_memory_write_rom(target_phys_addr_t addr,
-                                   const uint8_t *buf, int len)
-{
-    int l;
-    uint8_t *ptr;
-    target_phys_addr_t page;
-    unsigned long pd;
-    PhysPageDesc *p;
-
-    while (len > 0) {
-        page = addr & TARGET_PAGE_MASK;
-        l = (page + TARGET_PAGE_SIZE) - addr;
-        if (l > len)
-            l = len;
-        p = phys_page_find(page >> TARGET_PAGE_BITS);
-        if (!p) {
-            pd = IO_MEM_UNASSIGNED;
-        } else {
-            pd = p->phys_offset;
-        }
-
-        if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM &&
-            (pd & ~TARGET_PAGE_MASK) != IO_MEM_ROM &&
-            !(pd & IO_MEM_ROMD)) {
-            /* do nothing */
-        } else {
-            unsigned long addr1;
-            addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
-            /* ROM/RAM case */
-            ptr = phys_ram_base + addr1;
-            memcpy(ptr, buf, l);
-        }
-        len -= l;
-        buf += l;
-        addr += l;
-    }
-}
-
-
-/* warning: addr must be aligned */
-uint32_t ldl_phys(target_phys_addr_t addr)
-{
-    int io_index;
-    uint8_t *ptr;
-    uint32_t val;
-    unsigned long pd;
-    PhysPageDesc *p;
-
-    p = phys_page_find(addr >> TARGET_PAGE_BITS);
-    if (!p) {
-        pd = IO_MEM_UNASSIGNED;
-    } else {
-        pd = p->phys_offset;
-    }
-
-    if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
-        !(pd & IO_MEM_ROMD)) {
-        /* I/O case */
-        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
-        val = io_mem_read[io_index][2](io_mem_opaque[io_index], addr);
-    } else {
-        /* RAM case */
-        ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
-            (addr & ~TARGET_PAGE_MASK);
-        val = ldl_p(ptr);
-    }
-    return val;
-}
-
-/* warning: addr must be aligned */
-uint64_t ldq_phys(target_phys_addr_t addr)
-{
-    int io_index;
-    uint8_t *ptr;
-    uint64_t val;
-    unsigned long pd;
-    PhysPageDesc *p;
-
-    p = phys_page_find(addr >> TARGET_PAGE_BITS);
-    if (!p) {
-        pd = IO_MEM_UNASSIGNED;
-    } else {
-        pd = p->phys_offset;
-    }
-
-    if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM &&
-        !(pd & IO_MEM_ROMD)) {
-        /* I/O case */
-        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
-#ifdef TARGET_WORDS_BIGENDIAN
-        val = (uint64_t)io_mem_read[io_index][2](io_mem_opaque[io_index], addr) << 32;
-        val |= io_mem_read[io_index][2](io_mem_opaque[io_index], addr + 4);
-#else
-        val = io_mem_read[io_index][2](io_mem_opaque[io_index], addr);
-        val |= (uint64_t)io_mem_read[io_index][2](io_mem_opaque[io_index], addr + 4) << 32;
-#endif
-    } else {
-        /* RAM case */
-        ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
-            (addr & ~TARGET_PAGE_MASK);
-        val = ldq_p(ptr);
-    }
-    return val;
-}
-
-/* XXX: optimize */
-uint32_t ldub_phys(target_phys_addr_t addr)
-{
-    uint8_t val;
-    cpu_physical_memory_read(addr, &val, 1);
-    return val;
-}
-
-/* XXX: optimize */
-uint32_t lduw_phys(target_phys_addr_t addr)
-{
-    uint16_t val;
-    cpu_physical_memory_read(addr, (uint8_t *)&val, 2);
-    return tswap16(val);
-}
-
-/* warning: addr must be aligned. The ram page is not masked as dirty
-   and the code inside is not invalidated. It is useful if the dirty
-   bits are used to track modified PTEs */
-void stl_phys_notdirty(target_phys_addr_t addr, uint32_t val)
-{
-    int io_index;
-    uint8_t *ptr;
-    unsigned long pd;
-    PhysPageDesc *p;
-
-    p = phys_page_find(addr >> TARGET_PAGE_BITS);
-    if (!p) {
-        pd = IO_MEM_UNASSIGNED;
-    } else {
-        pd = p->phys_offset;
-    }
-
-    if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
-        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
-        io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val);
-    } else {
-        unsigned long addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
-        ptr = phys_ram_base + addr1;
-        stl_p(ptr, val);
-
-        if (unlikely(in_migration)) {
-            if (!cpu_physical_memory_is_dirty(addr1)) {
-                /* invalidate code */
-                tb_invalidate_phys_page_range(addr1, addr1 + 4, 0);
-                /* set dirty bit */
-                phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
-                    (0xff & ~CODE_DIRTY_FLAG);
-            }
-        }
-    }
-}
-
-void stq_phys_notdirty(target_phys_addr_t addr, uint64_t val)
-{
-    int io_index;
-    uint8_t *ptr;
-    unsigned long pd;
-    PhysPageDesc *p;
-
-    p = phys_page_find(addr >> TARGET_PAGE_BITS);
-    if (!p) {
-        pd = IO_MEM_UNASSIGNED;
-    } else {
-        pd = p->phys_offset;
-    }
-
-    if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
-        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
-#ifdef TARGET_WORDS_BIGENDIAN
-        io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val >> 32);
-        io_mem_write[io_index][2](io_mem_opaque[io_index], addr + 4, val);
-#else
-        io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val);
-        io_mem_write[io_index][2](io_mem_opaque[io_index], addr + 4, val >> 32);
-#endif
-    } else {
-        ptr = phys_ram_base + (pd & TARGET_PAGE_MASK) +
-            (addr & ~TARGET_PAGE_MASK);
-        stq_p(ptr, val);
-    }
-}
-
-/* warning: addr must be aligned */
-void stl_phys(target_phys_addr_t addr, uint32_t val)
-{
-    int io_index;
-    uint8_t *ptr;
-    unsigned long pd;
-    PhysPageDesc *p;
-
-    p = phys_page_find(addr >> TARGET_PAGE_BITS);
-    if (!p) {
-        pd = IO_MEM_UNASSIGNED;
-    } else {
-        pd = p->phys_offset;
-    }
-
-    if ((pd & ~TARGET_PAGE_MASK) != IO_MEM_RAM) {
-        io_index = (pd >> IO_MEM_SHIFT) & (IO_MEM_NB_ENTRIES - 1);
-        io_mem_write[io_index][2](io_mem_opaque[io_index], addr, val);
-    } else {
-        unsigned long addr1;
-        addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
-        /* RAM case */
-        ptr = phys_ram_base + addr1;
-        stl_p(ptr, val);
-        if (!cpu_physical_memory_is_dirty(addr1)) {
-            /* invalidate code */
-            tb_invalidate_phys_page_range(addr1, addr1 + 4, 0);
-            /* set dirty bit */
-            phys_ram_dirty[addr1 >> TARGET_PAGE_BITS] |=
-                (0xff & ~CODE_DIRTY_FLAG);
-        }
-    }
-}
-
-/* XXX: optimize */
-void stb_phys(target_phys_addr_t addr, uint32_t val)
-{
-    uint8_t v = val;
-    cpu_physical_memory_write(addr, &v, 1);
-}
-
-/* XXX: optimize */
-void stw_phys(target_phys_addr_t addr, uint32_t val)
-{
-    uint16_t v = tswap16(val);
-    cpu_physical_memory_write(addr, (const uint8_t *)&v, 2);
-}
-
-/* XXX: optimize */
-void stq_phys(target_phys_addr_t addr, uint64_t val)
-{
-    val = tswap64(val);
-    cpu_physical_memory_write(addr, (const uint8_t *)&val, 8);
-}
-
-#endif
-
-/* virtual memory access for debug */
-int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
-                        uint8_t *buf, int len, int is_write)
-{
-    int l;
-    target_phys_addr_t phys_addr;
-    target_ulong page;
-
-    while (len > 0) {
-        page = addr & TARGET_PAGE_MASK;
-        phys_addr = cpu_get_phys_page_debug(env, page);
-        /* if no physical page mapped, return an error */
-        if (phys_addr == -1)
-            return -1;
-        l = (page + TARGET_PAGE_SIZE) - addr;
-        if (l > len)
-            l = len;
-        cpu_physical_memory_rw(phys_addr + (addr & ~TARGET_PAGE_MASK),
-                               buf, l, is_write);
-        len -= l;
-        buf += l;
-        addr += l;
-    }
-    return 0;
-}
-
 /* in deterministic execution mode, instructions doing device I/Os
    must be at the end of the TB */
 void cpu_io_recompile(CPUState *env, void *retaddr)
diff --git a/vl.c b/vl.c
index 7bcffd3..0e8dc50 100644
--- a/vl.c
+++ b/vl.c
@@ -6381,6 +6381,7 @@ int main(int argc, char **argv)
 
     /* init the dynamic translator */
     cpu_exec_init_all(tb_size * 1024 * 1024);
+    cpu_noexec_init_all();
 
     bdrv_init();
 

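The definition of cpu_noexec_init_all() lives in the new exec-all.c
rather than in the exec.c hunks quoted above.  A rough sketch, assuming
it simply performs the non-TCG initialization that this patch removes
from cpu_exec_init_all(), would look something like:

void cpu_noexec_init_all(void)
{
    page_init();      /* host page geometry and l1_phys_map, moved out above */
#if !defined(CONFIG_USER_ONLY)
    io_mem_init();    /* ROM/unassigned/notdirty handlers and dirty bitmap */
#endif
}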

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-12 22:10 [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c Anthony Liguori
@ 2008-11-12 22:48 ` Fabrice Bellard
  2008-11-12 22:53   ` Anthony Liguori
  2008-11-13 13:51 ` andrzej zaborowski
  2008-11-14  4:03 ` Jamie Lokier
  2 siblings, 1 reply; 17+ messages in thread
From: Fabrice Bellard @ 2008-11-12 22:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	Paul Brook

Anthony Liguori wrote:
> Unlike kqemu, KVM does not use TCG at all when accelerating QEMU.  Having TCG
> present is not a problem when using KVM on x86.  x86 already has TCG host and
> target support and it's quite convenient to be able to disable/enable KVM and
> compare it to TCG when debugging.
> 
> KVM also supports architectures that do not have TCG host and target support
> such as ia64, s390, and PPC[1].  For these architectures, TCG is an inhibitor
> for upstream inclusion.
> 
> TCG is pretty well isolated in QEMU so building these targets without TCG
> should be easy enough.  This breaks down in exec.c though.  There is a lot of
> TCG specific code in exec.c, but also a lot of code that KVM needs.
> 
> This patch moves the non-TCG specific bits of exec.c into a separate file,
> exec-all.c.  This makes it relatively easy to build QEMU without TCG support.
> More patches will come to complete this work but the exec.c bits are probably
> 95% of what is needed.
> 
> The remaining bits are some general cleanups where layering has been violated
> and the introduction of a new -kvm subtarget, similar to -softmmu or
>  -linux-user.  This target will not have TCG support and only support KVM.
> However, before going down that path, I wanted to see if anyone objected to this
> bit of the cleanup.
> 
> Any objections?

I suggest going even further: there should be a way in QEMU to define
CPUs that do not rely on the dynamic translator, and this choice should
be possible at runtime (i.e. not with a bunch of #ifdefs, as you might
otherwise end up doing it). This way you could not only plug in KVM
CPUs without having the equivalent TCG ones, but also CPUs from other
sources (e.g. malc's x86 interpreter, or the cycle-accurate PTLsim x86
emulator).

Fabrice.
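
A hypothetical sketch of the runtime-selectable backend interface
Fabrice is describing; neither the type nor the variables below exist
in QEMU at this point, and every name is illustrative:

typedef struct CPUExecBackend {
    const char *name;                  /* "tcg", "kvm", "ptlsim", ... */
    int  (*init)(CPUState *env);       /* backend-specific setup */
    int  (*exec)(CPUState *env);       /* run guest code until an exit event */
    void (*interrupt)(CPUState *env, int mask);
} CPUExecBackend;

/* chosen once at startup, e.g. from a command line option */
static const CPUExecBackend *exec_backend;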


* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-12 22:48 ` Fabrice Bellard
@ 2008-11-12 22:53   ` Anthony Liguori
  0 siblings, 0 replies; 17+ messages in thread
From: Anthony Liguori @ 2008-11-12 22:53 UTC (permalink / raw)
  To: Fabrice Bellard
  Cc: Carsten Otte, Paul Brook, qemu-devel, kvm-devel, Hollis Blanchard

Fabrice Bellard wrote:
> I suggest going even further: there should be a way in QEMU to define
> CPUs that do not rely on the dynamic translator, and this choice should
> be possible at runtime (i.e. not with a bunch of #ifdefs, as you might
> otherwise end up doing it). This way you could not only plug in KVM
> CPUs without having the equivalent TCG ones, but also CPUs from other
> sources (e.g. malc's x86 interpreter, or the cycle-accurate PTLsim x86
> emulator).
>   

Today, we already do this for KVM without any #ifdefs (it's what's in
SVN right now).  We keep both the TCG and the KVM state in CPUState,
and then just run the appropriate cpu_exec() loop depending on the CPU
type.
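
A minimal sketch of that runtime dispatch (kvm_enabled() is the real
predicate, used elsewhere in this very patch; kvm_cpu_exec() stands in
for whatever the KVM run loop is actually called):

static int run_cpu(CPUState *env)
{
    /* both TCG and KVM state live in CPUState, so the choice of
       execution loop is made at runtime, with no #ifdefs */
    if (kvm_enabled())
        return kvm_cpu_exec(env);   /* KVM vcpu run loop, name assumed */
    return cpu_exec(env);           /* the existing TCG execution loop */
}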

We could go a step further and split out the core x86 CPU state from
CPUX86State, and then introduce a CPUTCGState and a CPUKVMState that
both include CPUX86State, but that seems like a lot of churn for
little gain (KVM adds only two more fields to CPUX86State).
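
Concretely, the split being considered (and rejected as churn) would
look roughly like this; the two outer types are the hypothetical ones
named above, and the fields are purely illustrative:

typedef struct CPUTCGState {
    CPUX86State cpu;        /* shared architectural state */
    /* TCG-only state would follow: TB bookkeeping, icount, ... */
} CPUTCGState;

typedef struct CPUKVMState {
    CPUX86State cpu;        /* shared architectural state */
    int kvm_fd;             /* illustrative stand-ins for the two KVM fields */
    int kvm_vcpu_fd;
} CPUKVMState;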

What I'm trying to do with this patch is make it possible to get rid
of the TCG code altogether for targets that only support KVM and not
TCG (ia64, s390, etc.).

I'll eventually get rid of it not with #ifdefs, but simply by not
compiling in the TCG code (cpu-exec.c, exec.c) and instead compiling
in a kvm-exec.c or something like that.
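
Such a kvm-exec.c would boil down to a KVM-only cpu_exec() with no
translation machinery behind it.  A rough sketch, not an existing
file, with kvm_cpu_exec() again standing in for the real KVM run
routine:

/* kvm-exec.c (hypothetical): cpu_exec() without any TCG */
int cpu_exec(CPUState *env)
{
    for (;;) {
        int ret = kvm_cpu_exec(env);  /* enter the guest via KVM */
        if (ret == EXCP_HLT || ret == EXCP_INTERRUPT || ret == EXCP_DEBUG)
            return ret;  /* hand back to the main loop for I/O and timers */
    }
}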

Regards,

Anthony Liguori

> Fabrice.
>
>   


* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-12 22:10 [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c Anthony Liguori
  2008-11-12 22:48 ` Fabrice Bellard
@ 2008-11-13 13:51 ` andrzej zaborowski
  2008-11-13 16:18   ` Anthony Liguori
  2008-11-14  4:03 ` Jamie Lokier
  2 siblings, 1 reply; 17+ messages in thread
From: andrzej zaborowski @ 2008-11-13 13:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	Paul Brook

2008/11/12 Anthony Liguori <aliguori@us.ibm.com>:
> Unlike kqemu, KVM does not use TCG at all when accelerating QEMU.  Having TCG
> present is not a problem when using KVM on x86.  x86 already has TCG host and
> target support and it's quite convenient to be able to disable/enable KVM and
> compare it to TCG when debugging.
>
> KVM also supports architectures that do not have TCG host and target support
> such as ia64, s390, and PPC[1].  For these architectures, TCG is an inhibitor
> for upstream inclusion.
>
> TCG is pretty well isolated in QEMU so building these targets without TCG
> should be easy enough.  This breaks down in exec.c though.  There is a lot of
> TCG specific code in exec.c, but also a lot of code that KVM needs.
>
> This patch moves the non-TCG specific bits of exec.c into a separate file,
> exec-all.c.  This makes it relatively easy to build QEMU without TCG support.
> More patches will come to complete this work but the exec.c bits are probably
> 95% of what is needed.
>
> The remaining bits are some general cleanups where layering has been violated
> and the introduction of a new -kvm subtarget, similar to -softmmu or
>  -linux-user.

Is this going a bit in the opposite direction to where QEMUAccel is
going?  What Fabrice suggests seems to be like QEMUAccel, with TCG
treated as another accelerator.

BTW, it would be great if, before merging a change like this, you
could review/merge the patches submitted to the list that might touch
the same area, so as not to break them (such as Jan Kiszka's
single-stepping/watchpoint fixes).

Cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-13 13:51 ` andrzej zaborowski
@ 2008-11-13 16:18   ` Anthony Liguori
  2008-11-14  3:12     ` andrzej zaborowski
  0 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2008-11-13 16:18 UTC (permalink / raw)
  To: andrzej zaborowski
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

andrzej zaborowski wrote:
> Is this going a bit in the opposite direction to where QEMUAccel is
> going?  What Fabrice suggests seems to be like QEMUAccel, with TCG
> treated as another accelerator.
>   

QEMUAccel is a bit orthogonal to what I'm talking about.  There is 
already KVM support in QEMU today and I'm merely looking to restructure 
existing code so that I can build a version of QEMU that has no TCG 
support, only KVM support.  TCG is too intimately woven into QEMU right 
now.  You could think of this perhaps as a precursor to making TCG more 
of an "accelerator" than it is today.

But wrt QEMUAccel and KVM, there are 5 places in QEMU where there is KVM 
specific code.

One is cpu-exec.c to invoke the kvm exec routine instead of TCG.  kqemu 
has something similar.  Unfortunately, kqemu relies on some state that's 
only available in cpu-exec.c so we can't make this a single function 
pointer invocation without major surgery on cpu-exec.

One is vl.c to initialize KVM support.  kqemu doesn't need this.

One is exec.c, to hook cpu_register_physical_memory.  kqemu does this 
too so it could conceivably be a hook.

Another one is monitor.c to implement 'info kvm'.  Not really a place 
for a hook.  Ideally we could register the monitor callback from 
kvm-all.c when we initialize KVM.

Finally, there is a hook in hw/acpi.c to disable SMM support when using 
KVM.  This is KVM specific because KVM doesn't support SMM.  kqemu uses 
TCG to run SMM code.

Since there is only one shared hook ATM, I don't think something like 
QEMUAccel is all that useful for KVM.  On the other hand, there are 42 
places that are kqemu specific.  I think kqemu could be refactored to 
eliminate most of these.

kqemu relies on TCG so you can't really decouple them from each other.

> BTW, it would be great if, before merging a change like this, you
> could review/merge the patches submitted to the list that might touch
> the same area, so as not to break them (such as Jan Kiszka's
> single-stepping/watchpoint fixes).
>   

Yeah, I will make sure to.

Regards,

Anthony Liguori

> Cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-13 16:18   ` Anthony Liguori
@ 2008-11-14  3:12     ` andrzej zaborowski
  2008-11-14  3:18       ` Anthony Liguori
  0 siblings, 1 reply; 17+ messages in thread
From: andrzej zaborowski @ 2008-11-14  3:12 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

2008/11/13 Anthony Liguori <anthony@codemonkey.ws>:
> andrzej zaborowski wrote:
>> Is this going a bit in the opposite direction to where QEMUAccel is
>> going?  What Fabrice suggests seems to be like QEMUAccel, with TCG
>> treated as another accelerator.
>>
>
> QEMUAccel is a bit orthogonal to what I'm talking about.  There is already
> KVM support in QEMU today and I'm merely looking to restructure existing
> code so that I can build a version of QEMU that has no TCG support, only KVM
> support.  TCG is too intimately woven into QEMU right now.  You could think
> of this perhaps as a precursor to making TCG more of an "accelerator" than
> it is today.

Ah, I agree with your patch; I was only commenting on the idea of
*-kvm targets.  I see something like QEMUAccel as a way to turn on
and off the cpu emulators (TCG, kvm, kqemu).  Of course, kqemu depends
on a fallback emulator - currently TCG - but I guess it would be
possible to run kqemu with kvm as the fallback and not compile in TCG
(even if that's not a very useful configuration).

>
> But wrt QEMUAccel and KVM, there are 5 places in QEMU where there is KVM
> specific code.
>
> One is cpu-exec.c to invoke the kvm exec routine instead of TCG.  kqemu has
> something similar.  Unfortunately, kqemu relies on some state that's only
> available in cpu-exec.c so we can't make this a single function pointer
> invocation without major surgery on cpu-exec.
>
> One is vl.c to initialize KVM support.  kqemu doesn't need this.
>
> One is exec.c, to hook cpu_register_physical_memory.  kqemu does this too so
> it could conceivably be a hook.
>
> Another one is monitor.c to implement 'info kvm'.  Not really a place for a
> hook.  Ideally we could register the monitor callback from kvm-all.c when we
> initialize KVM.
>
> Finally, there is a hook in hw/acpi.c to disable SMM support when using KVM.
>  This is KVM specific because KVM doesn't support SMM.  kqemu uses TCG to
> run SMM code.
>
> Since there is only one shared hook ATM, I don't think something like
> QEMUAccel is all that useful for KVM.  On the other hand, there are 42
> places that are kqemu specific.  I think kqemu could be refactored to
> eliminate most of these.
>
> kqemu relies on TCG so you can't really decouple them from each other.
>
>> BTW, it would be great if, before merging a change like this, you
>> could review/merge the patches submitted to the list that might touch
>> the same area, so as not to break them (such as Jan Kiszka's
>> single-stepping/watchpoint fixes).
>>
>
> Yeah, I will make sure to.

Many thanks for that.

Regards

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14  3:12     ` andrzej zaborowski
@ 2008-11-14  3:18       ` Anthony Liguori
  2008-11-14 13:45         ` andrzej zaborowski
  0 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2008-11-14  3:18 UTC (permalink / raw)
  To: andrzej zaborowski
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

andrzej zaborowski wrote:
> 2008/11/13 Anthony Liguori <anthony@codemonkey.ws>:
>   
>> andrzej zaborowski wrote:
>>     
>>> Is this going a bit in the opposite direction to where QEMUAccel is
>>> going?  What Fabrice suggests seems to be like QEMUAccel, with TCG
>>> treated as another accelerator.
>>>
>>>       
>> QEMUAccel is a bit orthogonal to what I'm talking about.  There is already
>> KVM support in QEMU today and I'm merely looking to restructure existing
>> code so that I can build a version of QEMU that has no TCG support, only KVM
>> support.  TCG is too intimately woven into QEMU right now.  You could think
>> of this perhaps as a precursor to making TCG more of an "accelerator" than
>> it is today.
>>     
>
> Ah, I agree with your patch; I was only commenting on the idea of
> *-kvm targets.  I see something like QEMUAccel as a way to turn on
> and off the cpu emulators (TCG, kvm, kqemu).

The issue is not disabling TCG at runtime.  That's easy enough.  The 
issue is that TCG doesn't exist (and probably never will) for certain 
architectures like ia64 and s390.  Being forced to build with TCG 
support makes QEMU + KVM impossible on these platforms even though both 
platforms support KVM.

The idea behind a -kvm target is to be able to use QEMU + KVM on these 
architectures in a clean way.  We could also build qemu-system-s390 and 
just exclude TCG, but from a naming perspective it makes sense to call 
it qemu-kvm, because there can only be a single KVM executable for any 
given platform.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-12 22:10 [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c Anthony Liguori
  2008-11-12 22:48 ` Fabrice Bellard
  2008-11-13 13:51 ` andrzej zaborowski
@ 2008-11-14  4:03 ` Jamie Lokier
  2008-11-14  9:58   ` Avi Kivity
                     ` (2 more replies)
  2 siblings, 3 replies; 17+ messages in thread
From: Jamie Lokier @ 2008-11-14  4:03 UTC (permalink / raw)
  To: qemu-devel
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	Paul Brook

Anthony Liguori wrote:
> Unlike kqemu, KVM does not use TCG at all when accelerating QEMU.  Having TCG
> present is not a problem when using KVM on x86.  x86 already has TCG host and
> target support and it's quite convenient to be able to disable/enable KVM and
> compare it to TCG when debugging.

I agree with removing/isolating the dependency on TCG, and there are good
reasons for it.

But does the fact KVM doesn't use TCG prevent KVM from running some
x86 modes correctly?  E.g. I gather 16-bit code is run by KVM using
VM86 mode, which is not exactly correct.  It would be nice to have KVM
acceleration but also complete and correct emulation, by switching to
TCG for those modes.

Also, an earlier thread pointed out that loops doing a lot of MMIO are
_slower_ with KVM than without - this manifested as very slow VGA
output for some guests.  Having KVM pass control to TCG for short runs
of guest instructions which do MMIO, or other instructions which need
to be emulated, would accelerate KVM in this respect.

-- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14  4:03 ` Jamie Lokier
@ 2008-11-14  9:58   ` Avi Kivity
  2008-11-14 13:23     ` Jamie Lokier
  2008-11-14 13:58   ` Anthony Liguori
  2008-11-14 14:07   ` Anthony Liguori
  2 siblings, 1 reply; 17+ messages in thread
From: Avi Kivity @ 2008-11-14  9:58 UTC (permalink / raw)
  To: qemu-devel
  Cc: Carsten Otte, Anthony Liguori, kvm-devel, Hollis Blanchard,
	Paul Brook

Jamie Lokier wrote:
> But does the fact KVM doesn't use TCG prevent KVM from running some
> x86 modes correctly?  E.g. I gather 16-bit code is run by KVM using
> VM86 mode, which is not exactly correct.  It would be nice to have KVM
> acceleration but also complete and correct emulation, by switching to
> TCG for those modes.
>
>   

There is work in progress to make 16-bit emulation fully accurate.

> Also, an earlier thread pointed out that loops doing a lot of MMIO are
> _slower_ with KVM than without - this manifested as very slow VGA
> output for some guests.  Having KVM pass control to TCG for short runs
> of guest instructions which do MMIO, or other instructions which need
> to be emulated, would accelerate KVM in this respect.
>   

Since TCG is not smp-safe, this is very problematic for smp guests.  You 
would have to stop virtualization on all vcpus and start tcg on all of 
them.  Performance would plummet.

There are ways of mitigating the high mmio cost with kvm.  For 
framebuffers, one can allow kvm direct access.  For other mmio, there's 
the 'coalesced mmio' support which allows mmio to be batched when this 
does not affect emulation accuracy and latency.
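
For a feel of why coalescing is cheap: the kernel appends the batched 
writes to a ring shared with userspace, and QEMU replays them before 
handling the next exit.  A sketch of the drain loop -- the structure 
names follow <linux/kvm.h>, other details are illustrative:

#include <linux/kvm.h>

static void drain_coalesced_mmio(struct kvm_coalesced_mmio_ring *ring,
                                 unsigned int ring_size)
{
    while (ring->first != ring->last) {
        struct kvm_coalesced_mmio *e = &ring->coalesced_mmio[ring->first];
        /* replay the batched write into QEMU's memory/device dispatch */
        cpu_physical_memory_write(e->phys_addr, e->data, e->len);
        ring->first = (ring->first + 1) % ring_size;
    }
}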

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14  9:58   ` Avi Kivity
@ 2008-11-14 13:23     ` Jamie Lokier
  2008-11-16 13:07       ` Avi Kivity
  0 siblings, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-11-14 13:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	Paul Brook

Avi Kivity wrote:
> Jamie Lokier wrote:
> >But does the fact KVM doesn't use TCG prevent KVM from running some
> >x86 modes correctly?  E.g. I gather 16-bit code is run by KVM using
> >VM86 mode, which is not exactly correct.  It would be nice to have KVM
> >acceleration but also complete and correct emulation, by switching to
> >TCG for those modes.
> 
> There is work in progress to make 16-bit emulation fully accurate.

Ooh!  I want my Windows 95 to run in KVM :-)
I'm curious, how is this planned to work?

I'm having trouble thinking of how to do it without software emulation
at some stage.

> >Also, an earlier thread pointed out that loops doing a lot of MMIO are
> >_slower_ with KVM than without - this manifested as very slow VGA
> >output for some guests.  Having KVM pass control to TCG for short runs
> >of guest instructions which do MMIO, or other instructions which need
> >to be emulated, would accelerate KVM in this respect.

(I think VMware does something like this, btw).

> Since TCG is not smp-safe, this is very problematic for smp guests.  You 
> would have to stop virtualization on all vcpus and start tcg on all of 
> them.  Performance would plummet.

On the other hand, when running on a KVM-capable host/guest
architecture combination, it is definitely possible to make TCG
smp-safe, because every guest atomic instruction has a corresponding
host one.  It's practically a 1:1 instruction mapping on x86, which
doesn't have many atomic instructions.  (Maybe harder on other archs.)
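
To illustrate the point: on a same-architecture host, a helper could
back a guest atomic with the equivalent host atomic (here via a GCC
builtin).  A sketch of the idea, not existing QEMU code:

#include <stdint.h>

static uint32_t helper_atomic_cmpxchg32(uint32_t *host_addr,
                                        uint32_t old_val, uint32_t new_val)
{
    /* compiles down to a single 'lock cmpxchg' on an x86 host */
    return __sync_val_compare_and_swap(host_addr, old_val, new_val);
}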

> There are ways of mitigating the high mmio cost with kvm.  For 
> framebuffers, one can allow kvm direct access.  For other mmio, there's 
> the 'coalesced mmio' support which allows mmio to be batched when this 
> does not affect emulation accuracy and latency.

Don't you still have to trap for each MMIO in order to collect the
batch, except for REP instructions?  It's the traps which are expensive.

Fortunately modern hardware tends to use DMA for data intensive
things, and MMIO just to trigger DMA, and initialisation.

-- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14  3:18       ` Anthony Liguori
@ 2008-11-14 13:45         ` andrzej zaborowski
  0 siblings, 0 replies; 17+ messages in thread
From: andrzej zaborowski @ 2008-11-14 13:45 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

2008/11/14 Anthony Liguori <anthony@codemonkey.ws>:
> The issue is not disabling TCG at runtime.  That's easy enough.  The issue
> is that TCG doesn't exist (and probably never will) for certain
> architectures like ia64 and s390.  Being forced to build with TCG support
> makes QEMU + KVM impossible on these platforms even though both platforms
> support KVM.

I mean either compile-time or run-time: assuming that each QEMUAccel
implementation is a bunch of files + a struct with pointers in the
common code, it should make turning on/off each emulator easy.
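
For illustration, such a struct of pointers might look like the sketch
below; the names are hypothetical, since nothing called QEMUAccel
exists in the tree in this form yet:

typedef struct QEMUAccel {
    const char *name;
    int  (*init)(void);               /* probe/initialize the accelerator */
    int  (*cpu_exec)(CPUState *env);  /* run one vcpu until an exit */
    void (*register_physical_memory)(target_phys_addr_t start,
                                     ram_addr_t size,
                                     ram_addr_t phys_offset);
} QEMUAccel;

Each emulator (TCG, kvm, kqemu) would provide one instance; configure
would pick which ones get compiled in, and a runtime switch would pick
which one runs.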

Cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14  4:03 ` Jamie Lokier
  2008-11-14  9:58   ` Avi Kivity
@ 2008-11-14 13:58   ` Anthony Liguori
  2008-11-14 14:07   ` Anthony Liguori
  2 siblings, 0 replies; 17+ messages in thread
From: Anthony Liguori @ 2008-11-14 13:58 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

Jamie Lokier wrote:
> Anthony Liguori wrote:
>   
>> Unlike kqemu, KVM does not use TCG at all when accelerating QEMU.  Having TCG
>> present is not a problem when using KVM on x86.  x86 already has TCG host and
>> target support and it's quite convenient to be able to disable/enable KVM and
>> compare it to TCG when debugging.
>>     
>
> I agree with removing/isolating the dependency on TCG, and there are good
> reasons for it.
>
> But does the fact KVM doesn't use TCG prevent KVM from running some
> x86 modes correctly?  E.g. I gather 16-bit code is run by KVM using
> VM86 mode, which is not exactly correct.  It would be nice to have KVM
> acceleration but also complete and correct emulation, by switching to
> TCG for those modes.
>   

That's just a limitation of Intel VT.  AMD SVM runs 16-bit code 
natively.  We're slowly improving our in-kernel emulator so eventually 
we'll be able to emulate 16-bit mode in the kernel.

Running 16-bit code in TCG is something that has been considered.

> Also, an earlier thread pointed out that loops doing a lot of MMIO are
> _slower_ with KVM than without - this manifested as very slow VGA
> output for some guests.  Having KVM pass control to TCG for short runs
> of guest instructions which do MMIO, or other instructions which need
> to be emulated, would accelerate KVM in this respect.
>   

It falls apart for SMP guests.  TCG does not preserve atomicity of 
memory instructions, so you could never have an SMP VCPU running on bare 
metal while TCG is running.  There is a rather large initial cost for 
building the TBs too, so in practice there are few areas that benefit 
from this sort of handoff.

The VGA optimization actually addresses this problem in a much nicer 
way.  KVM also supports MMIO batching, which we'll eventually merge and 
which covers the remaining cases pretty well.

Regards,

Anthony Liguori

> -- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14  4:03 ` Jamie Lokier
  2008-11-14  9:58   ` Avi Kivity
  2008-11-14 13:58   ` Anthony Liguori
@ 2008-11-14 14:07   ` Anthony Liguori
  2008-11-14 23:13     ` Jamie Lokier
  2 siblings, 1 reply; 17+ messages in thread
From: Anthony Liguori @ 2008-11-14 14:07 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

Jamie Lokier wrote:
> Also, an earlier thread pointed out that loops doing a lot of MMIO are
> _slower_ with KVM than without - this manifested as very slow VGA
> output for some guests.  Having KVM pass control to TCG for short runs
> of guest instructions which do MMIO, or other instructions which need
> to be emulated, would accelerate KVM in this respect.
>   

Note, the devil is in the details here.

An MMIO exit to userspace typically costs around 6k cycles.  On the 
other hand, a TB translation tends to average closer to 300k cycles, 
oftentimes reaching much higher.  This was with dyngen, so TCG may be 
more or less expensive.

An in-kernel MMIO exit on the other hand will cost around 3k cycles.  
MMIO coalescing is pretty efficient because it effectively reduces the 
cost of an exit by half.

To make up the cost of TCG translation for just one TB, you need to have 
a tight loop of at least 50 iterations.  We can handle rep instructions 
with a single exit in KVM so this needs to be an actual MMIO loop, not a 
rep loop.

If you also consider all the potential locking issues with SMP guests, I 
think it's pretty likely that there are few cases where dropping to TCG 
is going to be a net performance win.

Regards,

Anthony Liguori

> -- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14 14:07   ` Anthony Liguori
@ 2008-11-14 23:13     ` Jamie Lokier
  2008-11-14 23:20       ` Anthony Liguori
  0 siblings, 1 reply; 17+ messages in thread
From: Jamie Lokier @ 2008-11-14 23:13 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

Anthony Liguori wrote:
> Jamie Lokier wrote:
> >Also, an earlier thread pointed out that loops doing a lot of MMIO are
> >_slower_ with KVM than without - this manifested as very slow VGA
> >output for some guests.  Having KVM pass control to TCG for short runs
> >of guest instructions which do MMIO, or other instructions which need
> >to be emulated, would accelerate KVM in this respect.
...
> An MMIO exit to userspace typically costs around 6k cycles.  On the 
> other hand, a TB translation tends to average closer to 300k cycles, 
> oftentimes reaching much higher.  This was with dyngen, so TCG may be 
> more or less expensive.
> 
> An in-kernel MMIO exit on the other hand will cost around 3k cycles.
...
> To make up the cost of TCG translation for just one TB, you need to have 
> a tight loop of at least 50 iterations.

Firstly:

That doesn't make sense: why would you do an expensive TCG translation
every time you hit the same code?  After the first encounter, if the
code page hasn't been modified, it should be a TB cache lookup to
already translated code.

I'm guessing the cost of TB cache lookup is much closer to 3k than
300k cycles, maybe even lower...

Secondly:

In these cases, you can use a special fast translation (when it's not
cached) which just copies the instructions 1:1 from the guest, simply
converting the special instructions (MMIO, anything else needing it)
to helper calls.  That's possible because you know the host
architecture is compatible with the guest, as it's running KVM.

> If you also consider all the potential locking issues with SMP guests, I 
> think it's pretty likely that there are few cases where dropping to TCG 
> is going to be a net performance win.

VMware claimed otherwise when Intel first brought out CPU support for
virtualisation.

SMP works fine if you map guest instructions 1:1 to host instructions
with helper calls for special cases.  Even atomics, load-locked
sequences and complex weak memory ordering things would behave
correctly.

Oops, I believe I just argued for keeping the TB cache and code
translation but not using TCG :-)

-- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14 23:13     ` Jamie Lokier
@ 2008-11-14 23:20       ` Anthony Liguori
  0 siblings, 0 replies; 17+ messages in thread
From: Anthony Liguori @ 2008-11-14 23:20 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Carsten Otte, Paul Brook, qemu-devel, kvm-devel, Hollis Blanchard

Jamie Lokier wrote:
> Firstly:
>
> That doesn't make sense: why would you do an expensive TCG translation
> every time you hit the same code?  After the first encounter, if the
> code page hasn't been modified, it should be a TB cache lookup to
> already translated code.
>   

Except that once you run under KVM again, you lose all dirty information 
and you have to invalidate all TBs.
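
As a sketch of that consequence (tb_flush() is QEMU's existing
flush-everything entry point; the wrapper name is made up):

static void switch_from_kvm_to_tcg(CPUState *env)
{
    /* guest memory may have changed under KVM without any dirty
     * tracking, so no translated block can be trusted */
    tb_flush(env);
}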

FWIW, a few years ago, we implemented this concept with QEMU and Xen.  
That's where my data is coming from.

> I'm guessing the cost of TB cache lookup is much closer to 3k than
> 300k cycles, maybe even lower...
>   

You're guessing and it doesn't matter anyway because the TB cache has to 
be invalidated.

> Secondly:
>
> In these cases, you can use a special fast translation (when it's not
> cached) which just copies the instructions 1:1 from the guest, simply
> converting the special instructions (MMIO, anything else needing it)
> to helper calls.  That's possible because you know the host
> architecture is compatible with the guest, as it's running KVM.
>   

You can't copy 1:1 because the instructions aren't 1:1.  Only trivial 
instructions that manipulate registers remain the same, but even then 
you have to do register renaming, and on x86 this probably means you'll 
have to spill some registers because there are so few.  Any memory 
reference (mov, push, pop, etc.) must be translated to a different 
instruction sequence, because you don't have a virtual address that can 
be accessed directly, so you need a hook to simulate a TLB miss.

You can preserve atomicity if you try hard enough, but it certainly 
isn't a 1:1 translation in softmmu mode.
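
For reference, the kind of hook involved: a softmmu-style guest load 
with an inline TLB check, simplified from QEMU's softmmu templates 
(slow_ldl_mmu is a stand-in name for the real slow path):

static uint32_t guest_ldl(CPUState *env, target_ulong addr)
{
    int mmu_idx = cpu_mmu_index(env);
    int idx = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
    CPUTLBEntry *e = &env->tlb_table[mmu_idx][idx];

    if ((addr & TARGET_PAGE_MASK) == (e->addr_read & TARGET_PAGE_MASK)) {
        /* TLB hit: read through the host address recorded in the entry */
        return *(uint32_t *)(uintptr_t)(addr + e->addend);
    }
    /* TLB miss: page-table walk, permission checks, MMIO dispatch */
    return slow_ldl_mmu(env, addr);
}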

>> If you also consider all the potential locking issues with SMP guests, I 
>> think it's pretty likely that there are few cases where dropping to TCG 
>> is going to be a net performance win.
>>     
>
> VMware claimed otherwise when Intel first brought out CPU support for
> virtualisation.
>   

That's just not true.  The paper that you're most likely referencing was 
much more nuanced than that, and the hardware has improved dramatically 
since then.

> SMP works fine if you map guest instructions 1:1 to host instructions
> with helper calls for special cases.  Even atomics, load-locked
> sequences and complex weak memory ordering things would behave
> correctly.
>   

You can't translate 1:1 so your argument falls apart.

Regards,

Anthony Liguori

> Oops, I believe I just argued for keeping the TB cache and code
> translation but not using TCG :-)
>
> -- Jamie
>   

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-14 13:23     ` Jamie Lokier
@ 2008-11-16 13:07       ` Avi Kivity
  2008-11-17  3:57         ` Jamie Lokier
  0 siblings, 1 reply; 17+ messages in thread
From: Avi Kivity @ 2008-11-16 13:07 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

Jamie Lokier wrote:
> Avi Kivity wrote:
>   
>> Jamie Lokier wrote:
>>     
>>> But does the fact KVM doesn't use TCG prevent KVM from running some
>>> x86 modes correctly?  E.g. I gather 16-bit code is run by KVM using
>>> VM86 mode, which is not exactly correct.  It would be nice to have KVM
>>> acceleration but also complete and correct emulation, by switching to
>>> TCG for those modes.
>>>       
>> There is work in progress to make 16-bit emulation fully accurate.
>>     
>
> Ooh!  I want my Windows 95 to run in KVM :-)
> I'm curious, how is this planned to work?
>
> I'm having trouble thinking of how to do it without software emulation
> at some stage.
>
>   

By emulating all instructions that can't be virtualized.

>> Since TCG is not smp-safe, this is very problematic for smp guests.  You 
>> would have to stop virtualization on all vcpus and start tcg on all of 
>> them.  Performance would plummet.
>>     
>
> On the other hand, when running on a KVM-capable host/guest
> architecture combination, it is definitely possible to make TCG
> smp-safe, because every guest atomic instruction has a corresponding
> host one.  It's practically a 1:1 instruction mapping on x86, which
> doesn't have many atomic instructions.  (Maybe harder on other archs.)
>
>   

Maybe.  It's simpler to fix kvm not to require this.  I don't want kvm 
to be tied to qemu; when userspace tells kvm to run a vcpu, it means run 
the vcpu; not "run the vcpu unless there are some instructions you can't 
run for some undocumented reason".

>> There are ways of mitigating the high mmio cost with kvm.  For 
>> framebuffers, one can allow kvm direct access.  For other mmio, there's 
>> the 'coalesced mmio' support which allows mmio to be batched when this 
>> does not affect emulation accuracy and latency.
>>     
>
> Don't you still have to trap for each MMIO in order to collect the
> batch, except for REP instructions?  It's the traps which are expensive.
>
> Fortunately modern hardware tends to use DMA for data intensive
> things, and MMIO just to trigger DMA, and initialisation.
>   

In practice things work fine.  16-color modes are slow, but only very 
old software was designed to work with them, and that software expected 
the hardware to be slow.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH][RFC] Split non-TCG bits out of exec.c
  2008-11-16 13:07       ` Avi Kivity
@ 2008-11-17  3:57         ` Jamie Lokier
  0 siblings, 0 replies; 17+ messages in thread
From: Jamie Lokier @ 2008-11-17  3:57 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Carsten Otte, Anthony Liguori, Hollis Blanchard, kvm-devel,
	qemu-devel, Paul Brook

Avi Kivity wrote:
> >>>But does the fact KVM doesn't use TCG prevent KVM from running some
> >>>x86 modes correctly?  E.g. I gather 16-bit code is run by KVM using
> >>>VM86 mode, which is not exactly correct.  It would be nice to have KVM
> >>>acceleration but also complete and correct emulation, by switching to
> >>>TCG for those modes.
> >>>      
> >>There is work in progress to make 16-bit emulation fully accurate.
> >
> >Ooh!  I want my Windows 95 to run in KVM :-)
> >I'm curious, how is this planned to work?
> >
> >I'm having trouble thinking of how to do it without software emulation
> >at some stage.
> 
> By emulating all instructions that can't be virtualized.

Ah, I see (after much reading)... the idea is to finish the software
emulator for real-mode instructions in the kernel, to include floating
point and 32-bit instructions, and then to stop using VM86 altogether
when emulating real mode.  VM86 might still be used to virtualize VM86 :-)

Fortunately the set of instructions in real-mode is small (by x86
standards!), and listed in Intel's system architecture manual:
"Instructions Supported in Real-Address Mode", plus x87 instructions
and a few quasi-undocumented ones.  Other instructions (MMX, SSE,
etc.) cannot run in real mode, so a complete real-mode emulator is
reasonably small.

I was under the impression real-mode emulation needed to cover most of
the x86 instruction set, which is large, but this is not required.

Great!

I'm looking forward to running Windows 95 and 3.11 under it :-)

-- Jamie

^ permalink raw reply	[flat|nested] 17+ messages in thread
