* [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
@ 2011-12-09 7:57 Wen Congyang
2011-12-09 8:06 ` [Qemu-devel] [RFC][PATCH 1/5 v2] Add API to create memory mapping list Wen Congyang
` (6 more replies)
0 siblings, 7 replies; 16+ messages in thread
From: Wen Congyang @ 2011-12-09 7:57 UTC
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke
Hi, all
'virsh dump' cannot work when a host PCI device is assigned to the guest. We have
discussed this issue here:
http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
We have decided to introduce a new monitor command, dump, to dump the guest's
memory. The core file's format can be ELF.
Note:
1. The guest should be x86 or x86_64. Other architectures are not supported.
2. If you use an old gdb, it may crash. I use gdb-7.3.1, and it does not crash.
3. If the guest OS is in the second (kdump) kernel, gdb may not work well, but crash
can work by specifying '--machdep phys_addr=xxx' on the command line. The
reason is that the second kernel updates the page table, and we cannot
get the page table of the first kernel.
4. If the guest OS is 32 bit and the memory size is larger than 4G, the vmcore
is in elf64 format. You should use a gdb built with --enable-64-bit-bfd.
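For illustration, a typical session looks like this (the vmcore path and the
vmlinux name are examples only; 'file:' writes to a path, while 'fd:' reuses a
file descriptor previously passed to the monitor with getfd):

    (qemu) dump file:/tmp/guest-vmcore

and then, on the host:

    $ crash vmlinux /tmp/guest-vmcore
    $ gdb vmlinux /tmp/guest-vmcore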
Changes from v1 to v2:
1. Fix the virtual addresses in the vmcore.
Wen Congyang (5):
Add API to create memory mapping list
Add API to check whether a physical address is I/O address
target-i386: implement cpu_get_memory_mapping()
Add API to get memory mapping
introduce a new monitor command 'dump' to dump guest's memory
Makefile.target | 9 +-
cpu-all.h | 10 +
cpu-common.h | 1 +
dump.c | 722 ++++++++++++++++++++++++++++++++++++++++++++++++++
dump.h | 6 +
exec.c | 20 ++
hmp-commands.hx | 16 ++
memory_mapping.c | 183 +++++++++++++
memory_mapping.h | 30 ++
monitor.c | 3 +
qmp-commands.hx | 24 ++
target-i386/helper.c | 239 +++++++++++++++++
12 files changed, 1259 insertions(+), 4 deletions(-)
create mode 100644 dump.c
create mode 100644 dump.h
create mode 100644 memory_mapping.c
create mode 100644 memory_mapping.h
* [Qemu-devel] [RFC][PATCH 1/5 v2] Add API to create memory mapping list
2011-12-09 7:57 [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest Wen Congyang
@ 2011-12-09 8:06 ` Wen Congyang
2011-12-13 13:03 ` Jan Kiszka
2011-12-09 8:07 ` [Qemu-devel] [RFC][PATCH 2/5 v2] Add API to check whether a physical address is I/O address Wen Congyang
` (5 subsequent siblings)
6 siblings, 1 reply; 16+ messages in thread
From: Wen Congyang @ 2011-12-09 8:06 UTC
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke
The memory mapping list stores virtual-to-physical address mappings.
The following patch will use this information to create PT_LOAD segments in the vmcore.
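As a sketch of the intended call sequence (this mirrors what the dump code in
patch 5 does; phys_addr, virt_addr and length are placeholders):

    MemoryMappingList list;

    list.num = 0;
    QTAILQ_INIT(&list.head);

    /* merge each region into an existing mapping where possible,
     * otherwise insert it sorted by physical address */
    add_to_memory_mapping(&list, phys_addr, virt_addr, length);

    /* ... walk the sorted list, e.g. to emit one PT_LOAD per entry ... */

    free_memory_mapping_list(&list);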
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
Makefile.target | 1 +
memory_mapping.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
memory_mapping.h | 29 ++++++++++++
3 files changed, 160 insertions(+), 0 deletions(-)
create mode 100644 memory_mapping.c
create mode 100644 memory_mapping.h
diff --git a/Makefile.target b/Makefile.target
index a111521..778f514 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -205,6 +205,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/virtio-9p-device.o
obj-$(CONFIG_KVM) += kvm.o kvm-all.o
obj-$(CONFIG_NO_KVM) += kvm-stub.o
obj-y += memory.o
+obj-y += memory_mapping.o
LIBS+=-lz
QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
diff --git a/memory_mapping.c b/memory_mapping.c
new file mode 100644
index 0000000..d83b7d7
--- /dev/null
+++ b/memory_mapping.c
@@ -0,0 +1,130 @@
+/*
+ * QEMU memory mapping
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ * Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "cpu.h"
+#include "cpu-all.h"
+#include "memory_mapping.h"
+
+static MemoryMapping *last_mapping;
+
+static void create_new_memory_mapping(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length)
+{
+ MemoryMapping *memory_mapping, *p;
+
+ memory_mapping = g_malloc(sizeof(MemoryMapping));
+ memory_mapping->phys_addr = phys_addr;
+ memory_mapping->virt_addr = virt_addr;
+ memory_mapping->length = length;
+ last_mapping = memory_mapping;
+ list->num++;
+ QTAILQ_FOREACH(p, &list->head, next) {
+ if (p->phys_addr >= memory_mapping->phys_addr) {
+ QTAILQ_INSERT_BEFORE(p, memory_mapping, next);
+ return;
+ }
+ }
+ QTAILQ_INSERT_TAIL(&list->head, memory_mapping, next);
+ return;
+}
+
+void create_new_memory_mapping_head(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length)
+{
+ MemoryMapping *memory_mapping;
+
+ memory_mapping = g_malloc(sizeof(MemoryMapping));
+ memory_mapping->phys_addr = phys_addr;
+ memory_mapping->virt_addr = virt_addr;
+ memory_mapping->length = length;
+ last_mapping = memory_mapping;
+ list->num++;
+ QTAILQ_INSERT_HEAD(&list->head, memory_mapping, next);
+ return;
+}
+
+void add_to_memory_mapping(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length)
+{
+ MemoryMapping *memory_mapping;
+
+ if (QTAILQ_EMPTY(&list->head)) {
+ create_new_memory_mapping(list, phys_addr, virt_addr, length);
+ return;
+ }
+
+ if (last_mapping) {
+ if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
+ (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
+ last_mapping->length += length;
+ return;
+ }
+ }
+
+ QTAILQ_FOREACH(memory_mapping, &list->head, next) {
+ last_mapping = memory_mapping;
+ if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
+ (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
+ last_mapping->length += length;
+ return;
+ }
+
+ if (!(phys_addr >= (last_mapping->phys_addr)) ||
+ !(phys_addr < (last_mapping->phys_addr + last_mapping->length))) {
+ /* last_mapping does not contain this region */
+ continue;
+ }
+ if (!(virt_addr >= (last_mapping->virt_addr)) ||
+ !(virt_addr < (last_mapping->virt_addr + last_mapping->length))) {
+ /* last_mapping does not contain this region */
+ continue;
+ }
+ if ((virt_addr - last_mapping->virt_addr) !=
+ (phys_addr - last_mapping->phys_addr)) {
+ /*
+ * last_mapping contains this region, but we should create another
+ * mapping region.
+ */
+ break;
+ }
+
+ /* merge this region into last_mapping */
+ if ((virt_addr + length) >
+ (last_mapping->virt_addr + last_mapping->length)) {
+ last_mapping->length = virt_addr + length - last_mapping->virt_addr;
+ }
+ return;
+ }
+
+ /* this region cannot be merged into any existing memory mapping. */
+ create_new_memory_mapping(list, phys_addr, virt_addr, length);
+ return;
+}
+
+void free_memory_mapping_list(MemoryMappingList *list)
+{
+ MemoryMapping *p, *q;
+
+ QTAILQ_FOREACH_SAFE(p, &list->head, next, q) {
+ QTAILQ_REMOVE(&list->head, p, next);
+ g_free(p);
+ }
+
+ list->num = 0;
+}
diff --git a/memory_mapping.h b/memory_mapping.h
new file mode 100644
index 0000000..871591d
--- /dev/null
+++ b/memory_mapping.h
@@ -0,0 +1,29 @@
+#ifndef MEMORY_MAPPING_H
+#define MEMORY_MAPPING_H
+
+#include "qemu-queue.h"
+
+typedef struct MemoryMapping {
+ target_phys_addr_t phys_addr;
+ target_ulong virt_addr;
+ ram_addr_t length;
+ QTAILQ_ENTRY(MemoryMapping) next;
+} MemoryMapping;
+
+typedef struct MemoryMappingList {
+ unsigned int num;
+ QTAILQ_HEAD(, MemoryMapping) head;
+} MemoryMappingList;
+
+void create_new_memory_mapping_head(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length);
+void add_to_memory_mapping(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length);
+
+void free_memory_mapping_list(MemoryMappingList *list);
+
+#endif
--
1.7.1
* [Qemu-devel] [RFC][PATCH 2/5 v2] Add API to check whether a physical address is I/O address
2011-12-09 7:57 [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest Wen Congyang
2011-12-09 8:06 ` [Qemu-devel] [RFC][PATCH 1/5 v2] Add API to create memory mapping list Wen Congyang
@ 2011-12-09 8:07 ` Wen Congyang
2011-12-09 8:08 ` [Qemu-devel] [RFC][PATCH 3/5 v2] target-i386: implement cpu_get_memory_mapping() Wen Congyang
` (4 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Wen Congyang @ 2011-12-09 8:07 UTC
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke
This API reports whether a physical address is an I/O (MMIO) address, i.e. not
backed by guest RAM. It will be used in the following patch to skip such
regions while walking the guest's page tables.
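A minimal sketch of the call pattern (this is how the page table walkers in
the next patch use it):

    if (is_io_addr(start_paddr)) {
        /* MMIO page: no RAM backs it, so there is nothing to dump */
        continue;
    }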
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
cpu-common.h | 1 +
exec.c | 20 ++++++++++++++++++++
2 files changed, 21 insertions(+), 0 deletions(-)
diff --git a/cpu-common.h b/cpu-common.h
index 7c9cef8..abcd1a6 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -123,6 +123,7 @@ struct CPUPhysMemoryClient {
QLIST_ENTRY(CPUPhysMemoryClient) list;
};
+bool is_io_addr(target_phys_addr_t phys_addr);
void cpu_register_phys_memory_client(CPUPhysMemoryClient *);
void cpu_unregister_phys_memory_client(CPUPhysMemoryClient *);
diff --git a/exec.c b/exec.c
index 6b92198..0a14cdd 100644
--- a/exec.c
+++ b/exec.c
@@ -4786,3 +4786,23 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
#undef env
#endif
+
+bool is_io_addr(target_phys_addr_t phys_addr)
+{
+ ram_addr_t pd;
+ PhysPageDesc *p;
+
+ p = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
+ if (!p) {
+ pd = IO_MEM_UNASSIGNED;
+ } else {
+ pd = p->phys_offset;
+ }
+
+ if ((pd & ~TARGET_PAGE_MASK) > IO_MEM_ROM && !(pd & IO_MEM_ROMD)) {
+ /* I/O region */
+ return true;
+ }
+
+ return false;
+}
--
1.7.1
* [Qemu-devel] [RFC][PATCH 3/5 v2] target-i386: implement cpu_get_memory_mapping()
2011-12-09 7:57 [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest Wen Congyang
2011-12-09 8:06 ` [Qemu-devel] [RFC][PATCH 1/5 v2] Add API to create memory mapping list Wen Congyang
2011-12-09 8:07 ` [Qemu-devel] [RFC][PATCH 2/5 v2] Add API to check whether a physical address is I/O address Wen Congyang
@ 2011-12-09 8:08 ` Wen Congyang
2011-12-09 8:09 ` [Qemu-devel] [RFC][PATCH 4/5 v2] Add API to get memory mapping Wen Congyang
` (3 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Wen Congyang @ 2011-12-09 8:08 UTC
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke
Walk the CPU's page tables and collect all virtual-to-physical address
mappings. Then add these mappings to the memory mapping list.
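For reference, the walkers below reassemble the virtual address from the
table indexes; in the IA-32e (4-level) case the 48-bit virtual address is
split as follows (this only summarizes the shifts already used in
walk_pml4e/walk_pdpe/walk_pde/walk_pte):

    /*
     * bits 47-39: PML4 index -> line_addr |= i << 39, plus 0xffff << 48
     *                           to sign-extend to a canonical address
     * bits 38-30: PDPT index -> line_addr |= i << 30
     * bits 29-21: PD index   -> line_addr |= i << 21
     * bits 20-12: PT index   -> start_vaddr = line_addr | (i << 12)
     * bits 11-0:  page offset
     */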
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
cpu-all.h | 10 ++
target-i386/helper.c | 239 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 249 insertions(+), 0 deletions(-)
diff --git a/cpu-all.h b/cpu-all.h
index 7246a67..2f1013f 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -22,6 +22,7 @@
#include "qemu-common.h"
#include "qemu-tls.h"
#include "cpu-common.h"
+#include "memory_mapping.h"
/* some important defines:
*
@@ -584,4 +585,13 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
uint8_t *buf, int len, int is_write);
+#if defined(TARGET_I386)
+void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);
+#else
+static inline void cpu_get_memory_mapping(MemoryMappingList *list,
+ CPUState *env)
+{
+}
+#endif
+
#endif /* CPU_ALL_H */
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 2586aff..c33747b 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1290,3 +1290,242 @@ void do_cpu_sipi(CPUState *env)
{
}
#endif
+
+/* PAE Paging or IA-32e Paging */
+static void walk_pte(MemoryMappingList *list, target_phys_addr_t pte_start_addr,
+ int32_t a20_mask, target_ulong start_line_addr)
+{
+ target_phys_addr_t pte_addr, start_paddr;
+ uint64_t pte;
+ target_ulong start_vaddr;
+ int i;
+
+ for (i = 0; i < 512; i++) {
+ pte_addr = (pte_start_addr + i * 8) & a20_mask;
+ pte = ldq_phys(pte_addr);
+ if (!(pte & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ start_paddr = (pte & ~0xfff) & ~(0x1ULL << 63);
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+
+ start_vaddr = start_line_addr | ((i & 0x1ff) << 12);
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 12);
+ }
+}
+
+/* 32-bit Paging */
+static void walk_pte2(MemoryMappingList *list,
+ target_phys_addr_t pte_start_addr, int32_t a20_mask,
+ target_ulong start_line_addr)
+{
+ target_phys_addr_t pte_addr, start_paddr;
+ uint32_t pte;
+ target_ulong start_vaddr;
+ int i;
+
+ for (i = 0; i < 1024; i++) {
+ pte_addr = (pte_start_addr + i * 4) & a20_mask;
+ pte = ldl_phys(pte_addr);
+ if (!(pte & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ start_paddr = pte & ~0xfff;
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+
+ start_vaddr = start_line_addr | ((i & 0x3ff) << 12);
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 12);
+ }
+}
+
+/* PAE Paging or IA-32e Paging */
+static void walk_pde(MemoryMappingList *list, target_phys_addr_t pde_start_addr,
+ int32_t a20_mask, target_ulong start_line_addr)
+{
+ target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
+ uint64_t pde;
+ target_ulong line_addr, start_vaddr;
+ int i;
+
+ for (i = 0; i < 512; i++) {
+ pde_addr = (pde_start_addr + i * 8) & a20_mask;
+ pde = ldq_phys(pde_addr);
+ if (!(pde & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = start_line_addr | ((i & 0x1ff) << 21);
+ if (pde & PG_PSE_MASK) {
+ /* 2 MB page */
+ start_paddr = (pde & ~0x1fffff) & ~(0x1ULL << 63);
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+ start_vaddr = line_addr;
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 21);
+ continue;
+ }
+
+ pte_start_addr = (pde & ~0xfff) & a20_mask;
+ walk_pte(list, pte_start_addr, a20_mask, line_addr);
+ }
+}
+
+/* 32-bit Paging */
+static void walk_pde2(MemoryMappingList *list,
+ target_phys_addr_t pde_start_addr, int32_t a20_mask,
+ bool pse)
+{
+ target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
+ uint32_t pde;
+ target_ulong line_addr, start_vaddr;
+ int i;
+
+ for (i = 0; i < 1024; i++) {
+ pde_addr = (pde_start_addr + i * 4) & a20_mask;
+ pde = ldl_phys(pde_addr);
+ if (!(pde & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = (((unsigned int)i & 0x3ff) << 22);
+ if ((pde & PG_PSE_MASK) && pse) {
+ /* 4 MB page */
+ start_paddr = (pde & ~0x3fffff) | ((pde & 0x1fe000) << 19);
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+ start_vaddr = line_addr;
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 22);
+ continue;
+ }
+
+ pte_start_addr = (pde & ~0xfff) & a20_mask;
+ walk_pte2(list, pte_start_addr, a20_mask, line_addr);
+ }
+}
+
+/* PAE Paging */
+static void walk_pdpe2(MemoryMappingList *list,
+ target_phys_addr_t pdpe_start_addr, int32_t a20_mask)
+{
+ target_phys_addr_t pdpe_addr, pde_start_addr;
+ uint64_t pdpe;
+ target_ulong line_addr;
+ int i;
+
+ for (i = 0; i < 4; i++) {
+ pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask;
+ pdpe = ldq_phys(pdpe_addr);
+ if (!(pdpe & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = (((unsigned int)i & 0x3) << 30);
+ pde_start_addr = (pdpe & ~0xfff) & a20_mask;
+ walk_pde(list, pde_start_addr, a20_mask, line_addr);
+ }
+}
+
+#ifdef TARGET_X86_64
+/* IA-32e Paging */
+static void walk_pdpe(MemoryMappingList *list,
+ target_phys_addr_t pdpe_start_addr, int32_t a20_mask,
+ target_ulong start_line_addr)
+{
+ target_phys_addr_t pdpe_addr, pde_start_addr, start_paddr;
+ uint64_t pdpe;
+ target_ulong line_addr, start_vaddr;
+ int i;
+
+ for (i = 0; i < 512; i++) {
+ pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask;
+ pdpe = ldq_phys(pdpe_addr);
+ if (!(pdpe & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = start_line_addr | ((i & 0x1ffULL) << 30);
+ if (pdpe & PG_PSE_MASK) {
+ /* 1 GB page */
+ start_paddr = (pdpe & ~0x3fffffff) & ~(0x1ULL << 63);
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+ start_vaddr = line_addr;
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 30);
+ continue;
+ }
+
+ pde_start_addr = (pdpe & ~0xfff) & a20_mask;
+ walk_pde(list, pde_start_addr, a20_mask, line_addr);
+ }
+}
+
+/* IA-32e Paging */
+static void walk_pml4e(MemoryMappingList *list,
+ target_phys_addr_t pml4e_start_addr, int32_t a20_mask)
+{
+ target_phys_addr_t pml4e_addr, pdpe_start_addr;
+ uint64_t pml4e;
+ target_ulong line_addr;
+ int i;
+
+ for (i = 0; i < 512; i++) {
+ pml4e_addr = (pml4e_start_addr + i * 8) & a20_mask;
+ pml4e = ldq_phys(pml4e_addr);
+ if (!(pml4e & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = ((i & 0x1ffULL) << 39) | (0xffffULL << 48);
+ pdpe_start_addr = (pml4e & ~0xfff) & a20_mask;
+ walk_pdpe(list, pdpe_start_addr, a20_mask, line_addr);
+ }
+}
+#endif
+
+void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
+{
+ if (env->cr[4] & CR4_PAE_MASK) {
+#ifdef TARGET_X86_64
+ if (env->hflags & HF_LMA_MASK) {
+ target_phys_addr_t pml4e_addr;
+
+ pml4e_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
+ walk_pml4e(list, pml4e_addr, env->a20_mask);
+ } else
+#endif
+ {
+ target_phys_addr_t pdpe_addr;
+
+ pdpe_addr = (env->cr[3] & ~0x1f) & env->a20_mask;
+ walk_pdpe2(list, pdpe_addr, env->a20_mask);
+ }
+ } else {
+ target_phys_addr_t pde_addr;
+ bool pse;
+
+ pde_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
+ pse = !!(env->cr[4] & CR4_PSE_MASK);
+ walk_pde2(list, pde_addr, env->a20_mask, pse);
+ }
+}
--
1.7.1
* [Qemu-devel] [RFC][PATCH 4/5 v2] Add API to get memory mapping
2011-12-09 7:57 [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest Wen Congyang
` (2 preceding siblings ...)
2011-12-09 8:08 ` [Qemu-devel] [RFC][PATCH 3/5 v2] target-i386: implement cpu_get_memory_mapping() Wen Congyang
@ 2011-12-09 8:09 ` Wen Congyang
2011-12-09 8:09 ` [Qemu-devel] [RFC][PATCH 5/5 v2] introduce a new monitor command 'dump' to dump guest's memory Wen Congyang
` (2 subsequent siblings)
6 siblings, 0 replies; 16+ messages in thread
From: Wen Congyang @ 2011-12-09 8:09 UTC
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke
Add an API to get all virtual-to-physical address mappings.
If a physical address has no corresponding virtual address, its virtual
address is set to 0 in the list.
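A hypothetical illustration of the intended gap filling (the addresses are
invented for the example):

    /*
     * Suppose a RAM block covers [0x0, 0x800000) but the page table walk
     * only produced a mapping for [0x100000, 0x200000). The uncovered
     * ranges [0x0, 0x100000) and [0x200000, 0x800000) then get list
     * entries with virt_addr == 0, so they are still described by
     * PT_LOAD headers in the vmcore.
     */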
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
memory_mapping.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++++++
memory_mapping.h | 1 +
2 files changed, 54 insertions(+), 0 deletions(-)
diff --git a/memory_mapping.c b/memory_mapping.c
index d83b7d7..6c1778c 100644
--- a/memory_mapping.c
+++ b/memory_mapping.c
@@ -128,3 +128,56 @@ void free_memory_mapping_list(MemoryMappingList *list)
list->num = 0;
}
+
+void get_memory_mapping(MemoryMappingList *list)
+{
+ CPUState *env;
+ MemoryMapping *memory_mapping;
+ RAMBlock *block;
+ ram_addr_t offset, length;
+
+ for (env = first_cpu; env != NULL; env = env->next_cpu) {
+ cpu_get_memory_mapping(list, env);
+ }
+
+ /* some memory may not be mapped; add it to the memory mapping list */
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ offset = block->offset;
+ length = block->length;
+
+ QTAILQ_FOREACH(memory_mapping, &list->head, next) {
+ if (memory_mapping->phys_addr >= (offset + length)) {
+ /*
+ * memory_mapping list does not contain the region
+ * [offset, offset+length)
+ */
+ create_new_memory_mapping(list, offset, 0, length);
+ break;
+ }
+
+ if ((memory_mapping->phys_addr + memory_mapping->length) <=
+ offset) {
+ continue;
+ }
+
+ if (memory_mapping->phys_addr > offset) {
+ /*
+ * memory_mapping list does not contain the region
+ * [offset, memory_mapping->phys_addr)
+ */
+ create_new_memory_mapping(list, offset, 0,
+ memory_mapping->phys_addr - offset);
+ }
+
+ if ((offset + length) <=
+ (memory_mapping->phys_addr + memory_mapping->length)) {
+ break;
+ }
+ length -= memory_mapping->phys_addr + memory_mapping->length -
+ offset;
+ offset = memory_mapping->phys_addr + memory_mapping->length;
+ }
+ }
+
+ return;
+}
diff --git a/memory_mapping.h b/memory_mapping.h
index 871591d..9a876b7 100644
--- a/memory_mapping.h
+++ b/memory_mapping.h
@@ -25,5 +25,6 @@ void add_to_memory_mapping(MemoryMappingList *list,
ram_addr_t length);
void free_memory_mapping_list(MemoryMappingList *list);
+void get_memory_mapping(MemoryMappingList *list);
#endif
--
1.7.1
* [Qemu-devel] [RFC][PATCH 5/5 v2] introduce a new monitor command 'dump' to dump guest's memory
2011-12-09 7:57 [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest Wen Congyang
` (3 preceding siblings ...)
2011-12-09 8:09 ` [Qemu-devel] [RFC][PATCH 4/5 v2] Add API to get memory mapping Wen Congyang
@ 2011-12-09 8:09 ` Wen Congyang
2011-12-13 3:12 ` [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest HATAYAMA Daisuke
2011-12-13 12:55 ` Jan Kiszka
6 siblings, 0 replies; 16+ messages in thread
From: Wen Congyang @ 2011-12-09 8:09 UTC
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke
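The new command stops the guest and writes an ELF vmcore of its memory. For
reference, the file that create_vmcore() emits is laid out as sketched below
(write order as implemented; not a file format requirement):

    +------------------------+  offset 0
    | ELF header             |
    +------------------------+
    | program header table   |  one PT_NOTE + one PT_LOAD per mapping
    +------------------------+
    | NT_PRSTATUS notes      |  one per vCPU
    +------------------------+
    | guest RAM, block by    |  the PT_LOAD p_offset values computed by
    | block                  |  get_offset() point into this area
    +------------------------+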
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
Makefile.target | 8 +-
dump.c | 722 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
dump.h | 6 +
hmp-commands.hx | 16 ++
monitor.c | 3 +
qmp-commands.hx | 24 ++
6 files changed, 775 insertions(+), 4 deletions(-)
create mode 100644 dump.c
create mode 100644 dump.h
diff --git a/Makefile.target b/Makefile.target
index 778f514..49164d3 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -118,7 +118,7 @@ $(call set-vpath, $(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR
QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) -I$(SRC_PATH)/linux-user
obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
elfload.o linuxload.o uaccess.o gdbstub.o cpu-uname.o \
- user-exec.o $(oslib-obj-y)
+ user-exec.o $(oslib-obj-y) dump.o
obj-$(TARGET_HAS_BFLT) += flatload.o
@@ -156,7 +156,7 @@ LDFLAGS+=-Wl,-segaddr,__STD_PROG_ZONE,0x1000 -image_base 0x0e000000
LIBS+=-lmx
obj-y = main.o commpage.o machload.o mmap.o signal.o syscall.o thunk.o \
- gdbstub.o user-exec.o
+ gdbstub.o user-exec.o dump.o
obj-i386-y += ioport-user.o
@@ -178,7 +178,7 @@ $(call set-vpath, $(SRC_PATH)/bsd-user)
QEMU_CFLAGS+=-I$(SRC_PATH)/bsd-user -I$(SRC_PATH)/bsd-user/$(TARGET_ARCH)
obj-y = main.o bsdload.o elfload.o mmap.o signal.o strace.o syscall.o \
- gdbstub.o uaccess.o user-exec.o
+ gdbstub.o uaccess.o user-exec.o dump.o
obj-i386-y += ioport-user.o
@@ -194,7 +194,7 @@ endif #CONFIG_BSD_USER
# System emulator target
ifdef CONFIG_SOFTMMU
-obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
+obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o dump.o
# virtio has to be here due to weird dependency between PCI and virtio-net.
# need to fix this properly
obj-$(CONFIG_NO_PCI) += pci-stub.o
diff --git a/dump.c b/dump.c
new file mode 100644
index 0000000..b9af75c
--- /dev/null
+++ b/dump.c
@@ -0,0 +1,722 @@
+/*
+ * QEMU dump
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ * Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include <unistd.h>
+#include <elf.h>
+#include <sys/procfs.h>
+#include "cpu.h"
+#include "cpu-all.h"
+#include "targphys.h"
+#include "monitor.h"
+#include "kvm.h"
+#include "dump.h"
+#include "sysemu.h"
+#include "bswap.h"
+#include "memory_mapping.h"
+
+static inline int cpuid(CPUState *env)
+{
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USE_NPTL)
+ return env->host_tid;
+#else
+ return env->cpu_index + 1;
+#endif
+}
+
+#if defined(TARGET_I386)
+
+#ifdef TARGET_X86_64
+typedef struct {
+ target_ulong r15, r14, r13, r12, rbp, rbx, r11, r10;
+ target_ulong r9, r8, rax, rcx, rdx, rsi, rdi, orig_rax;
+ target_ulong rip, cs, eflags;
+ target_ulong rsp, ss;
+ target_ulong fs_base, gs_base;
+ target_ulong ds, es, fs, gs;
+} x86_64_user_regs_struct;
+
+static int x86_64_write_elf64_note(Monitor *mon, int fd, CPUState *env,
+ target_phys_addr_t *offset)
+{
+ x86_64_user_regs_struct regs;
+ Elf64_Nhdr *note;
+ char *buf;
+ int descsz, note_size, name_size = 5;
+ const char *name = "CORE";
+ int id = cpuid(env);
+ int ret;
+
+ regs.r15 = env->regs[15];
+ regs.r14 = env->regs[14];
+ regs.r13 = env->regs[13];
+ regs.r12 = env->regs[12];
+ regs.r11 = env->regs[11];
+ regs.r10 = env->regs[10];
+ regs.r9 = env->regs[9];
+ regs.r8 = env->regs[8];
+ regs.rbp = env->regs[R_EBP];
+ regs.rsp = env->regs[R_ESP];
+ regs.rdi = env->regs[R_EDI];
+ regs.rsi = env->regs[R_ESI];
+ regs.rdx = env->regs[R_EDX];
+ regs.rcx = env->regs[R_ECX];
+ regs.rbx = env->regs[R_EBX];
+ regs.rax = env->regs[R_EAX];
+ regs.rip = env->eip;
+ regs.eflags = env->eflags;
+
+ regs.orig_rax = 0; /* FIXME */
+ regs.cs = env->segs[R_CS].selector;
+ regs.ss = env->segs[R_SS].selector;
+ regs.fs_base = env->segs[R_FS].base;
+ regs.gs_base = env->segs[R_GS].base;
+ regs.ds = env->segs[R_DS].selector;
+ regs.es = env->segs[R_ES].selector;
+ regs.fs = env->segs[R_FS].selector;
+ regs.gs = env->segs[R_GS].selector;
+
+ descsz = 336; /* sizeof(prstatus_t) is 336 on x86_64 box */
+ note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
+ (descsz + 3) / 4) * 4;
+ note = g_malloc(note_size);
+
+ memset(note, 0, note_size);
+ note->n_namesz = cpu_to_le32(name_size);
+ note->n_descsz = cpu_to_le32(descsz);
+ note->n_type = cpu_to_le32(NT_PRSTATUS);
+ buf = (char *)note;
+ buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
+ memcpy(buf, name, name_size);
+ buf += ((name_size + 3) / 4) * 4;
+ memcpy(buf + 32, &id, 4); /* pr_pid */
+ buf += descsz - sizeof(x86_64_user_regs_struct)-sizeof(target_ulong);
+ memcpy(buf, &regs, sizeof(x86_64_user_regs_struct));
+
+ lseek(fd, *offset, SEEK_SET);
+ ret = write(fd, note, note_size);
+ g_free(note);
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write elf prstatus.\n");
+ return -1;
+ }
+
+ *offset += note_size;
+
+ return 0;
+}
+#endif
+
+/* This function is copied from crash */
+static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
+{
+ int i;
+ target_ulong kernel_base = -1;
+ target_ulong last, mask;
+
+ for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
+ mask = ~((1LL << i) - 1);
+ *base_vaddr = env->idt.base & mask;
+ if (*base_vaddr == last) {
+ continue;
+ }
+
+ kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
+ last = *base_vaddr;
+ }
+
+ return kernel_base;
+}
+
+typedef struct {
+ uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
+ unsigned short ds, __ds, es, __es;
+ unsigned short fs, __fs, gs, __gs;
+ uint32_t orig_eax, eip;
+ unsigned short cs, __cs;
+ uint32_t eflags, esp;
+ unsigned short ss, __ss;
+} x86_user_regs_struct;
+
+static int x86_write_elf64_note(Monitor *mon, int fd, CPUState *env,
+ target_phys_addr_t *offset)
+{
+ x86_user_regs_struct regs;
+ Elf64_Nhdr *note;
+ char *buf;
+ int descsz, note_size, name_size = 5;
+ const char *name = "CORE";
+ int id = cpuid(env);
+ int ret;
+
+ regs.ebp = env->regs[R_EBP] & 0xffffffff;
+ regs.esp = env->regs[R_ESP] & 0xffffffff;
+ regs.edi = env->regs[R_EDI] & 0xffffffff;
+ regs.esi = env->regs[R_ESI] & 0xffffffff;
+ regs.edx = env->regs[R_EDX] & 0xffffffff;
+ regs.ecx = env->regs[R_ECX] & 0xffffffff;
+ regs.ebx = env->regs[R_EBX] & 0xffffffff;
+ regs.eax = env->regs[R_EAX] & 0xffffffff;
+ regs.eip = env->eip & 0xffffffff;
+ regs.eflags = env->eflags & 0xffffffff;
+
+ regs.cs = env->segs[R_CS].selector;
+ regs.__cs = 0;
+ regs.ss = env->segs[R_SS].selector;
+ regs.__ss = 0;
+ regs.ds = env->segs[R_DS].selector;
+ regs.__ds = 0;
+ regs.es = env->segs[R_ES].selector;
+ regs.__es = 0;
+ regs.fs = env->segs[R_FS].selector;
+ regs.__fs = 0;
+ regs.gs = env->segs[R_GS].selector;
+ regs.__gs = 0;
+
+ descsz = 144; /* sizeof(prstatus_t) is 144 on x86 box */
+ note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
+ (descsz + 3) / 4) * 4;
+ note = g_malloc(note_size);
+
+ memset(note, 0, note_size);
+ note->n_namesz = cpu_to_le32(name_size);
+ note->n_descsz = cpu_to_le32(descsz);
+ note->n_type = cpu_to_le32(NT_PRSTATUS);
+ buf = (char *)note;
+ buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
+ memcpy(buf, name, name_size);
+ buf += ((name_size + 3) / 4) * 4;
+ memcpy(buf + 24, &id, 4); /* pr_pid */
+ buf += descsz - sizeof(x86_user_regs_struct)-4;
+ memcpy(buf, &regs, sizeof(x86_user_regs_struct));
+
+ lseek(fd, *offset, SEEK_SET);
+ ret = write(fd, note, note_size);
+ g_free(note);
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write elf prstatus.\n");
+ return -1;
+ }
+
+ *offset += note_size;
+
+ return 0;
+}
+
+static int x86_write_elf32_note(Monitor *mon, int fd, CPUState *env,
+ target_phys_addr_t *offset)
+{
+ x86_user_regs_struct regs;
+ Elf32_Nhdr *note;
+ char *buf;
+ int descsz, note_size, name_size = 5;
+ const char *name = "CORE";
+ int id = cpuid(env);
+ int ret;
+
+ regs.ebp = env->regs[R_EBP] & 0xffffffff;
+ regs.esp = env->regs[R_ESP] & 0xffffffff;
+ regs.edi = env->regs[R_EDI] & 0xffffffff;
+ regs.esi = env->regs[R_ESI] & 0xffffffff;
+ regs.edx = env->regs[R_EDX] & 0xffffffff;
+ regs.ecx = env->regs[R_ECX] & 0xffffffff;
+ regs.ebx = env->regs[R_EBX] & 0xffffffff;
+ regs.eax = env->regs[R_EAX] & 0xffffffff;
+ regs.eip = env->eip & 0xffffffff;
+ regs.eflags = env->eflags & 0xffffffff;
+
+ regs.cs = env->segs[R_CS].selector;
+ regs.__cs = 0;
+ regs.ss = env->segs[R_SS].selector;
+ regs.__ss = 0;
+ regs.ds = env->segs[R_DS].selector;
+ regs.__ds = 0;
+ regs.es = env->segs[R_ES].selector;
+ regs.__es = 0;
+ regs.fs = env->segs[R_FS].selector;
+ regs.__fs = 0;
+ regs.gs = env->segs[R_GS].selector;
+ regs.__gs = 0;
+
+ descsz = 144; /* sizeof(prstatus_t) is 144 on x86 box */
+ note_size = ((sizeof(Elf32_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
+ (descsz + 3) / 4) * 4;
+ note = g_malloc(note_size);
+
+ memset(note, 0, note_size);
+ note->n_namesz = cpu_to_le32(name_size);
+ note->n_descsz = cpu_to_le32(descsz);
+ note->n_type = cpu_to_le32(NT_PRSTATUS);
+ buf = (char *)note;
+ buf += ((sizeof(Elf32_Nhdr) + 3) / 4) * 4;
+ memcpy(buf, name, name_size);
+ buf += ((name_size + 3) / 4) * 4;
+ memcpy(buf + 24, &id, 4); /* pr_pid */
+ buf += descsz - sizeof(x86_user_regs_struct)-4;
+ memcpy(buf, &regs, sizeof(x86_user_regs_struct));
+
+ lseek(fd, *offset, SEEK_SET);
+ ret = write(fd, note, note_size);
+ g_free(note);
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write elf prstatus.\n");
+ return -1;
+ }
+
+ *offset += note_size;
+
+ return 0;
+}
+#endif
+
+static int write_elf64_header(Monitor *mon, int fd, int phdr_num, int machine,
+ int endian)
+{
+ Elf64_Ehdr elf_header;
+ int ret;
+
+ memset(&elf_header, 0, sizeof(Elf64_Ehdr));
+ memcpy(&elf_header, ELFMAG, 4);
+ elf_header.e_ident[EI_CLASS] = ELFCLASS64;
+ elf_header.e_ident[EI_DATA] = endian;
+ elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+ elf_header.e_type = cpu_to_le16(ET_CORE);
+ elf_header.e_machine = cpu_to_le16(machine);
+ elf_header.e_version = cpu_to_le32(EV_CURRENT);
+ elf_header.e_ehsize = cpu_to_le16(sizeof(elf_header));
+ elf_header.e_phoff = cpu_to_le64(sizeof(Elf64_Ehdr));
+ elf_header.e_phentsize = cpu_to_le16(sizeof(Elf64_Phdr));
+ elf_header.e_phnum = cpu_to_le16(phdr_num);
+
+ lseek(fd, 0, SEEK_SET);
+ ret = write(fd, &elf_header, sizeof(elf_header));
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write elf header.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf32_header(Monitor *mon, int fd, int phdr_num, int machine,
+ int endian)
+{
+ Elf32_Ehdr elf_header;
+ int ret;
+
+ memset(&elf_header, 0, sizeof(Elf32_Ehdr));
+ memcpy(&elf_header, ELFMAG, 4);
+ elf_header.e_ident[EI_CLASS] = ELFCLASS32;
+ elf_header.e_ident[EI_DATA] = endian;
+ elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+ elf_header.e_type = cpu_to_le16(ET_CORE);
+ elf_header.e_machine = cpu_to_le16(machine);
+ elf_header.e_version = cpu_to_le32(EV_CURRENT);
+ elf_header.e_ehsize = cpu_to_le16(sizeof(elf_header));
+ elf_header.e_phoff = cpu_to_le32(sizeof(Elf32_Ehdr));
+ elf_header.e_phentsize = cpu_to_le16(sizeof(Elf32_Phdr));
+ elf_header.e_phnum = cpu_to_le16(phdr_num);
+
+ lseek(fd, 0, SEEK_SET);
+ ret = write(fd, &elf_header, sizeof(elf_header));
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write elf header.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf64_load(Monitor *mon, int fd, MemoryMapping *memory_mapping,
+ int phdr_index, target_phys_addr_t offset)
+{
+ Elf64_Phdr phdr;
+ off_t phdr_offset;
+ int ret;
+
+ memset(&phdr, 0, sizeof(Elf64_Phdr));
+ phdr.p_type = cpu_to_le32(PT_LOAD);
+ phdr.p_offset = cpu_to_le64(offset);
+ phdr.p_paddr = cpu_to_le64(memory_mapping->phys_addr);
+ if (offset == -1) {
+ phdr.p_filesz = 0;
+ } else {
+ phdr.p_filesz = cpu_to_le64(memory_mapping->length);
+ }
+ phdr.p_memsz = cpu_to_le64(memory_mapping->length);
+ phdr.p_vaddr = cpu_to_le64(memory_mapping->virt_addr);
+
+ phdr_offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*phdr_index;
+ lseek(fd, phdr_offset, SEEK_SET);
+ ret = write(fd, &phdr, sizeof(Elf64_Phdr));
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write program header table.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf32_load(Monitor *mon, int fd, MemoryMapping *memory_mapping,
+ int phdr_index, target_phys_addr_t offset)
+{
+ Elf32_Phdr phdr;
+ off_t phdr_offset;
+ int ret;
+
+ memset(&phdr, 0, sizeof(Elf32_Phdr));
+ phdr.p_type = cpu_to_le32(PT_LOAD);
+ phdr.p_offset = cpu_to_le32(offset);
+ phdr.p_paddr = cpu_to_le32(memory_mapping->phys_addr);
+ if (offset == -1) {
+ phdr.p_filesz = 0;
+ } else {
+ phdr.p_filesz = cpu_to_le32(memory_mapping->length);
+ }
+ phdr.p_memsz = cpu_to_le32(memory_mapping->length);
+ phdr.p_vaddr = cpu_to_le32(memory_mapping->virt_addr);
+
+ phdr_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*phdr_index;
+ lseek(fd, phdr_offset, SEEK_SET);
+ ret = write(fd, &phdr, sizeof(Elf32_Phdr));
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write program header table.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf64_notes(Monitor *mon, int fd, int phdr_index,
+ target_phys_addr_t *offset, bool lma)
+{
+ CPUState *env;
+ int ret;
+ target_phys_addr_t begin = *offset;
+ Elf64_Phdr phdr;
+ off_t phdr_offset;
+
+ for (env = first_cpu; env != NULL; env = env->next_cpu) {
+#if defined(TARGET_I386)
+#ifdef TARGET_X86_64
+ if (lma) {
+ ret = x86_64_write_elf64_note(mon, fd, env, offset);
+ } else {
+#endif
+ ret = x86_write_elf64_note(mon, fd, env, offset);
+#ifdef TARGET_X86_64
+ }
+#endif
+#else
+ ret = -1; /* Not supported */
+#endif
+
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write elf notes.\n");
+ return -1;
+ }
+ }
+
+ memset(&phdr, 0, sizeof(Elf64_Phdr));
+ phdr.p_type = cpu_to_le32(PT_NOTE);
+ phdr.p_offset = cpu_to_le64(begin);
+ phdr.p_paddr = 0;
+ phdr.p_filesz = cpu_to_le64(*offset - begin);
+ phdr.p_memsz = cpu_to_le64(*offset - begin);
+ phdr.p_vaddr = 0;
+
+ phdr_offset = sizeof(Elf64_Ehdr);
+ lseek(fd, phdr_offset, SEEK_SET);
+ ret = write(fd, &phdr, sizeof(Elf64_Phdr));
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write program header table.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf32_notes(Monitor *mon, int fd, int phdr_index,
+ target_phys_addr_t *offset)
+{
+ CPUState *env;
+ int ret;
+ target_phys_addr_t begin = *offset;
+ Elf32_Phdr phdr;
+ off_t phdr_offset;
+
+ for (env = first_cpu; env != NULL; env = env->next_cpu) {
+#if defined(TARGET_I386)
+ ret = x86_write_elf32_note(mon, fd, env, offset);
+#else
+ ret = -1; /* Not supported */
+#endif
+
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write elf notes.\n");
+ return -1;
+ }
+ }
+
+ memset(&phdr, 0, sizeof(Elf32_Phdr));
+ phdr.p_type = cpu_to_le32(PT_NOTE);
+ phdr.p_offset = cpu_to_le32(begin);
+ phdr.p_paddr = 0;
+ phdr.p_filesz = cpu_to_le32(*offset - begin);
+ phdr.p_memsz = cpu_to_le32(*offset - begin);
+ phdr.p_vaddr = 0;
+
+ phdr_offset = sizeof(Elf32_Ehdr);
+ lseek(fd, phdr_offset, SEEK_SET);
+ ret = write(fd, &phdr, sizeof(Elf32_Phdr));
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to write program header table.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_data(Monitor *mon, int fd, void *buf, int length,
+ target_phys_addr_t *offset)
+{
+ int ret;
+
+ lseek(fd, *offset, SEEK_SET);
+ ret = write(fd, buf, length);
+ if (ret < 0) {
+ monitor_printf(mon, "dump: failed to save memory.\n");
+ return -1;
+ }
+
+ *offset += length;
+ return 0;
+}
+
+/* write the memory to the vmcore, one page per I/O */
+static int write_memory(Monitor *mon, int fd, RAMBlock *block,
+ target_phys_addr_t *offset)
+{
+ int i, ret;
+
+ for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
+ ret = write_data(mon, fd, block->host + i * TARGET_PAGE_SIZE,
+ TARGET_PAGE_SIZE, offset);
+ if (ret < 0) {
+ return -1;
+ }
+ }
+
+ if ((block->length % TARGET_PAGE_SIZE) != 0) {
+ ret = write_data(mon, fd, block->host + i * TARGET_PAGE_SIZE,
+ block->length % TARGET_PAGE_SIZE, offset);
+ if (ret < 0) {
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+/* get the memory's offset in the vmcore */
+static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
+ target_phys_addr_t memory_offset)
+{
+ RAMBlock *block;
+ target_phys_addr_t offset = memory_offset;
+
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ if (phys_addr >= block->offset &&
+ phys_addr < block->offset + block->length) {
+ return phys_addr - block->offset + offset;
+ }
+ offset += block->length;
+ }
+
+ return -1;
+}
+
+static int create_vmcore(Monitor *mon, int fd)
+{
+ CPUState *env;
+ target_phys_addr_t kernel_base = -1;
+ target_ulong base_vaddr;
+ target_phys_addr_t offset, memory_offset;
+ int phdr_num, phdr_index;
+ RAMBlock *block;
+ bool lma = false;
+ int ret;
+ int type, machine, endian;
+ MemoryMappingList list;
+ MemoryMapping *memory_mapping;
+
+ for (env = first_cpu; env != NULL; env = env->next_cpu) {
+ cpu_synchronize_state(env);
+ }
+
+ list.num = 0;
+ QTAILQ_INIT(&list.head);
+ get_memory_mapping(&list);
+
+#if defined(TARGET_I386)
+
+#ifdef TARGET_X86_64
+ lma = !!(first_cpu->hflags & HF_LMA_MASK);
+#endif
+
+ kernel_base = get_phys_base_addr(first_cpu, &base_vaddr);
+ if (kernel_base == -1) {
+ monitor_printf(mon, "dump: can not get phys_base\n");
+ goto error;
+ }
+#endif
+
+#if defined(TARGET_I386)
+ if (lma) {
+ machine = EM_X86_64;
+ } else {
+ machine = EM_386;
+ }
+ endian = ELFDATA2LSB;
+#else
+ monitor_printf(mon, "dump: unsupported target.\n");
+ goto error;
+#endif
+
+ if (sizeof(ram_addr_t) == 4) {
+ type = 0; /* use elf32 */
+#if defined(TARGET_I386)
+ } else if (!lma) {
+ type = 0; /* the guest os is not in IA-32e mode */
+#endif
+ } else {
+ type = 1; /* use elf64 */
+ }
+
+ phdr_num = 1; /* PT_NOTE */
+#if defined(TARGET_I386)
+#ifdef TARGET_X86_64
+ if (lma) {
+ create_new_memory_mapping_head(&list, kernel_base, base_vaddr,
+ TARGET_PAGE_SIZE);
+ }
+#endif
+
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ if (!lma && (block->offset + block->length > UINT_MAX)) {
+ type = 1; /* The memory size is greater than 4G */
+ break;
+ }
+ }
+#endif
+
+ /* the type of e_phnum is uint16_t, so we should avoid overflow */
+ if (list.num > (1 << 16) - 2) {
+ phdr_num = (1 << 16) - 1;
+ } else {
+ phdr_num += list.num;
+ }
+
+ /* write elf header to vmcore */
+ if (type == 1) {
+ ret = write_elf64_header(mon, fd, phdr_num, machine, endian);
+ } else {
+ ret = write_elf32_header(mon, fd, phdr_num, machine, endian);
+ }
+ if (ret < 0) {
+ goto error;
+ }
+
+ /* write elf notes to vmcore */
+ phdr_index = 0;
+ if (type == 1) {
+ offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*phdr_num;
+ ret = write_elf64_notes(mon, fd, phdr_index++, &offset, lma);
+ } else {
+ offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*phdr_num;
+ ret = write_elf32_notes(mon, fd, phdr_index++, &offset);
+ }
+
+ if (ret < 0) {
+ goto error;
+ }
+
+ memory_offset = offset;
+ /* write all memory to vmcore */
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ ret = write_memory(mon, fd, block, &offset);
+ if (ret < 0) {
+ goto error;
+ }
+ }
+
+ /* write PT_LOAD program header to vmcore */
+ QTAILQ_FOREACH(memory_mapping, &list.head, next) {
+ offset = get_offset(memory_mapping->phys_addr, memory_offset);
+ if (type == 1) {
+ ret = write_elf64_load(mon, fd, memory_mapping, phdr_index++,
+ offset);
+ } else {
+ ret = write_elf32_load(mon, fd, memory_mapping, phdr_index++,
+ offset);
+ }
+ if (ret < 0) {
+ goto error;
+ }
+ }
+
+ free_memory_mapping_list(&list);
+ return 0;
+
+error:
+ free_memory_mapping_list(&list);
+ return -1;
+}
+
+int do_dump(Monitor *mon, const QDict *qdict, QObject **ret_data)
+{
+ const char *file = qdict_get_str(qdict, "file");
+ const char *p;
+ int fd = -1;
+
+#if !defined(WIN32)
+ if (strstart(file, "fd:", &p)) {
+ fd = monitor_get_fd(mon, p);
+ if (fd == -1) {
+ monitor_printf(mon, "dump: invalid file descriptor"
+ " identifier\n");
+ return -1;
+ }
+ }
+#endif
+
+ if (strstart(file, "file:", &p)) {
+ fd = open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY);
+ if (fd < 0) {
+ monitor_printf(mon, "dump: failed to open %s\n", p);
+ return -1;
+ }
+ }
+
+ if (fd == -1) {
+ monitor_printf(mon, "unknown dump protocol: %s\n", file);
+ return -1;
+ }
+
+ vm_stop(RUN_STATE_PAUSED);
+ if (create_vmcore(mon, fd) < 0) {
+ return -1;
+ }
+
+ return 0;
+}
diff --git a/dump.h b/dump.h
new file mode 100644
index 0000000..c91fa2c
--- /dev/null
+++ b/dump.h
@@ -0,0 +1,6 @@
+#ifndef DUMP_H
+#define DUMP_H
+
+int do_dump(Monitor *mon, const QDict *qdict, QObject **ret_data);
+
+#endif
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 79a9195..83df152 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -772,6 +772,22 @@ Migrate to @var{uri} (using -d to not wait for completion).
ETEXI
{
+ .name = "dump",
+ .args_type = "file:s",
+ .params = "file",
+ .help = "dump to file",
+ .user_print = monitor_user_noop,
+ .mhandler.cmd_new = do_dump,
+ },
+
+
+STEXI
+@item dump @var{file}
+@findex dump
+Dump guest memory to @var{file}. @var{file} is either @code{file:path} or @code{fd:name}.
+ETEXI
+
+ {
.name = "migrate_cancel",
.args_type = "",
.params = "",
diff --git a/monitor.c b/monitor.c
index 1be222e..9f26c3d 100644
--- a/monitor.c
+++ b/monitor.c
@@ -73,6 +73,9 @@
#endif
#include "hw/lm32_pic.h"
+/* for dump */
+#include "dump.h"
+
//#define DEBUG
//#define DEBUG_COMPLETION
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 94da2a8..4ea50e0 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -485,6 +485,30 @@ Notes:
EQMP
{
+ .name = "dump",
+ .args_type = "file:s",
+ .params = "file",
+ .help = "dump to file",
+ .user_print = monitor_user_noop,
+ .mhandler.cmd_new = do_dump,
+ },
+
+SQMP
+dump
+-------
+
+Dump guest memory to file.
+
+Arguments:
+
+- "file": destination, either "fd:name" or "file:path" (json-string)
+
+Example:
+
+-> { "execute": "dump", "arguments": { "file": "fd:dump" } }
+<- { "return": {} }
+
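+A 'file:' destination is also accepted; the path below is only an example:
+
+-> { "execute": "dump", "arguments": { "file": "file:/tmp/vmcore" } }
+<- { "return": {} }
+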
+EQMP
+
+ {
.name = "migrate_cancel",
.args_type = "",
.params = "",
--
1.7.1
* Re: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
2011-12-09 7:57 [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest Wen Congyang
` (4 preceding siblings ...)
2011-12-09 8:09 ` [Qemu-devel] [RFC][PATCH 5/5 v2] introduce a new monitor command 'dump' to dump guest's memory Wen Congyang
@ 2011-12-13 3:12 ` HATAYAMA Daisuke
2011-12-13 3:35 ` Wen Congyang
2011-12-13 12:55 ` Jan Kiszka
6 siblings, 1 reply; 16+ messages in thread
From: HATAYAMA Daisuke @ 2011-12-13 3:12 UTC
To: wency; +Cc: jan.kiszka, anderson, qemu-devel
Hello Wen,
From: Wen Congyang <wency@cn.fujitsu.com>
Subject: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
Date: Fri, 09 Dec 2011 15:57:26 +0800
> Hi, all
>
> 'virsh dump' cannot work when a host PCI device is assigned to the guest. We have
> discussed this issue here:
> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>
> We have decided to introduce a new monitor command, dump, to dump the guest's
> memory. The core file's format can be ELF.
>
> Note:
> 1. The guest should be x86 or x86_64. Other architectures are not supported.
> 2. If you use an old gdb, it may crash. I use gdb-7.3.1, and it does not crash.
> 3. If the guest OS is in the second (kdump) kernel, gdb may not work well, but crash
> can work by specifying '--machdep phys_addr=xxx' on the command line. The
> reason is that the second kernel updates the page table, and we cannot
> get the page table of the first kernel.
I guess the current implementation still breaks vmalloc'ed areas whose
page tables were originally located in the first 640kB, right? If you
want to handle this correctly, you need to identify the position of the
backup region and read the 1st kernel's page tables from there.
But that needs debugging information of the guest kernel, and I don't think
it is a good idea for qemu to depend on such guest-specific information.
On the other hand, I have a basic question. Can this command be used for
creating a live dump, or a crash dump only?
Thanks.
HATAYAMA, Daisuke
* Re: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
2011-12-13 3:12 ` [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest HATAYAMA Daisuke
@ 2011-12-13 3:35 ` Wen Congyang
2011-12-13 6:01 ` HATAYAMA Daisuke
0 siblings, 1 reply; 16+ messages in thread
From: Wen Congyang @ 2011-12-13 3:35 UTC
To: HATAYAMA Daisuke; +Cc: jan.kiszka, anderson, qemu-devel
Hi, hatayama-san
At 12/13/2011 11:12 AM, HATAYAMA Daisuke wrote:
> Hello Wen,
>
> From: Wen Congyang <wency@cn.fujitsu.com>
> Subject: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
> Date: Fri, 09 Dec 2011 15:57:26 +0800
>
>> Hi, all
>>
>> 'virsh dump' cannot work when a host PCI device is assigned to the guest. We have
>> discussed this issue here:
>> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>>
>> We have decided to introduce a new monitor command, dump, to dump the guest's
>> memory. The core file's format can be ELF.
>>
>> Note:
>> 1. The guest should be x86 or x86_64. Other architectures are not supported.
>> 2. If you use an old gdb, it may crash. I use gdb-7.3.1, and it does not crash.
>> 3. If the guest OS is in the second (kdump) kernel, gdb may not work well, but crash
>> can work by specifying '--machdep phys_addr=xxx' on the command line. The
>> reason is that the second kernel updates the page table, and we cannot
>> get the page table of the first kernel.
>
> I guess the current implementation still breaks vmalloc'ed areas whose
> page tables were originally located in the first 640kB, right? If you
> want to handle this correctly, you need to identify the position of the
> backup region and read the 1st kernel's page tables from there.
I do not know anything about the vmalloc'ed area. Can you explain it in
more detail?
>
> But that needs debugging information of the guest kernel, and I don't think
> it is a good idea for qemu to depend on such guest-specific information.
>
> On the other hand, I have a basic question. Can this command be used for
> creating a live dump, or a crash dump only?
Do you mean dumping the guest's memory while it is running (without stopping the guest)?
If so, this command cannot be used for creating a live dump.
Thanks
Wen Congyang
>
> Thanks.
> HATAYAMA, Daisuke
>
>
* Re: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
2011-12-13 3:35 ` Wen Congyang
@ 2011-12-13 6:01 ` HATAYAMA Daisuke
2011-12-13 9:20 ` Wen Congyang
0 siblings, 1 reply; 16+ messages in thread
From: HATAYAMA Daisuke @ 2011-12-13 6:01 UTC
To: wency; +Cc: jan.kiszka, anderson, qemu-devel
From: Wen Congyang <wency@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
Date: Tue, 13 Dec 2011 11:35:53 +0800
> Hi, hatayama-san
>
>> At 12/13/2011 11:12 AM, HATAYAMA Daisuke wrote:
>> Hello Wen,
>>
>> From: Wen Congyang <wency@cn.fujitsu.com>
>> Subject: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
>> Date: Fri, 09 Dec 2011 15:57:26 +0800
>>
>>> Hi, all
>>>
>>> 'virsh dump' cannot work when a host PCI device is assigned to the guest. We have
>>> discussed this issue here:
>>> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>>>
>>> We have decided to introduce a new monitor command, dump, to dump the guest's
>>> memory. The core file's format can be ELF.
>>>
>>> Note:
>>> 1. The guest should be x86 or x86_64. Other architectures are not supported.
>>> 2. If you use an old gdb, it may crash. I use gdb-7.3.1, and it does not crash.
>>> 3. If the guest OS is in the second (kdump) kernel, gdb may not work well, but crash
>>> can work by specifying '--machdep phys_addr=xxx' on the command line. The
>>> reason is that the second kernel updates the page table, and we cannot
>>> get the page table of the first kernel.
>>
>> I guess the current implementation still breaks vmalloc'ed areas whose
>> page tables were originally located in the first 640kB, right? If you
>> want to handle this correctly, you need to identify the position of the
>> backup region and read the 1st kernel's page tables from there.
>
> I do not know anything about the vmalloc'ed area. Can you explain it in
> more detail?
>
It's a memory area that is not straight-mapped (identity-mapped), so reading
the area requires looking up the guest machine's page tables. If I understand
correctly, your current implementation translates the vmalloc'ed area so that
the generated vmcore is linearly mapped w.r.t. virtual address, which is what
lets gdb work.
kdump saves the first 640kB of physical memory into the backup region. I
guess that for some vmcores created by the current implementation, gdb and
crash cannot correctly see vmalloc'ed memory areas whose page tables were
placed in that 640kB region. For example, try the mod sub-command: kernel
modules are allocated in the vmalloc'ed area.
I have developed a very similar logic for sadump. Look at sadump.c in
crash. The logic itself is very simple, but debugging information is
necessary. Documentation/kdump/kdump.txt and the following paper
explain the backup region mechanism very well, and the implementation
around there remains the same now.
http://lse.sourceforge.net/kdump/documentation/ols2oo5-kdump-paper.pdf
On the other hand, have you written a patch for crash to read this
vmcore? I expect it's possible with a little fix to the kcore code.
>
> Do you mean dumping the guest's memory while it is running (without stopping the guest)?
> If so, this command cannot be used for creating a live dump.
>
I mean a dump that keeps the machine running, as you say.
Do you have a plan for live dump?
Thanks.
HATAYAMA, Daisuke
* Re: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
2011-12-13 6:01 ` HATAYAMA Daisuke
@ 2011-12-13 9:20 ` Wen Congyang
2011-12-15 1:30 ` HATAYAMA Daisuke
0 siblings, 1 reply; 16+ messages in thread
From: Wen Congyang @ 2011-12-13 9:20 UTC
To: HATAYAMA Daisuke; +Cc: jan.kiszka, anderson, qemu-devel
At 12/13/2011 02:01 PM, HATAYAMA Daisuke wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> Subject: Re: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
> Date: Tue, 13 Dec 2011 11:35:53 +0800
>
>> Hi, hatayama-san
>>
>> At 12/13/2011 11:12 AM, HATAYAMA Daisuke wrote:
>>> Hello Wen,
>>>
>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>> Subject: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
>>> Date: Fri, 09 Dec 2011 15:57:26 +0800
>>>
>>>> Hi, all
>>>>
>>>> 'virsh dump' cannot work when a host PCI device is assigned to the guest. We have
>>>> discussed this issue here:
>>>> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>>>>
>>>> We have decided to introduce a new monitor command, dump, to dump the guest's
>>>> memory. The core file's format can be ELF.
>>>>
>>>> Note:
>>>> 1. The guest should be x86 or x86_64. Other architectures are not supported.
>>>> 2. If you use an old gdb, it may crash. I use gdb-7.3.1, and it does not crash.
>>>> 3. If the guest OS is in the second (kdump) kernel, gdb may not work well, but crash
>>>> can work by specifying '--machdep phys_addr=xxx' on the command line. The
>>>> reason is that the second kernel updates the page table, and we cannot
>>>> get the page table of the first kernel.
>>>
>>> I guess the current implementation still breaks vmalloc'ed areas whose
>>> page tables were originally located in the first 640kB, right? If you
>>> want to handle this correctly, you need to identify the position of the
>>> backup region and read the 1st kernel's page tables from there.
>>
>> I do not know anything about the vmalloc'ed area. Can you explain it in
>> more detail?
>>
>
> It's a memory area that is not straight-mapped (identity-mapped), so reading
> the area requires looking up the guest machine's page tables. If I understand
> correctly, your current implementation translates the vmalloc'ed area so that
> the generated vmcore is linearly mapped w.r.t. virtual address, which is what
> lets gdb work.
Do you mean that the page tables for the vmalloc'ed area are stored in the
first 640KB, and may be overwritten by the second kernel (after this region
has been backed up)?
>
> kdump saves the first 640kB of physical memory into the backup region. I
> guess that for some vmcores created by the current implementation, gdb and
> crash cannot correctly see vmalloc'ed memory areas whose page tables were
Hmm, IIRC, crash does not use the CPU's page tables. gdb uses the information
in the PT_LOAD headers to read memory areas.
> placed in that 640kB region. For example, try the mod sub-command: kernel
> modules are allocated in the vmalloc'ed area.
>
> I have developed a very similar logic for sadump. Look at sadump.c in
> crash. The logic itself is very simple, but debugging information is
> necessary. Documentation/kdump/kdump.txt and the following paper
> explain the backup region mechanism very well, and the implementation
> around there remains the same now.
Hmm, we cannot use debugging information on the qemu side.
>
> http://lse.sourceforge.net/kdump/documentation/ols2oo5-kdump-paper.pdf
>
> On the other hand, have you written a patch for crash to read this
> vmcore? I expect it's possible with a little fix to the kcore code.
crash can read this vmcore without any change.
Thanks
Wen Congyang.
>
>>
>> Do you mean dumping the guest's memory while it is running (without stopping the guest)?
>> If so, this command cannot be used for creating a live dump.
>>
>
> I mean a dump that keeps the machine running, as you say.
> Do you have a plan for live dump?
>
> Thanks.
> HATAYAMA, Daisuke
>
>
* Re: [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest
2011-12-09 7:57 [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest Wen Congyang
` (5 preceding siblings ...)
2011-12-13 3:12 ` [Qemu-devel] [RFC][PATCH 0/5 v2] dump memory when host pci device is used by guest HATAYAMA Daisuke
@ 2011-12-13 12:55 ` Jan Kiszka
2011-12-14 2:43 ` Wen Congyang
6 siblings, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2011-12-13 12:55 UTC
To: Wen Congyang; +Cc: HATAYAMA Daisuke, Dave Anderson, qemu-devel
On 2011-12-09 08:57, Wen Congyang wrote:
> Hi, all
>
> 'virsh dump' cannot work when a host PCI device is assigned to the guest. We have
> discussed this issue here:
> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>
> We have decided to introduce a new monitor command, dump, to dump the guest's
> memory. The core file's format can be ELF.
>
> Note:
> 1. The guest should be x86 or x86_64. Other architectures are not supported.
> 2. If you use an old gdb, it may crash. I use gdb-7.3.1, and it does not crash.
> 3. If the guest OS is in the second (kdump) kernel, gdb may not work well, but crash
> can work by specifying '--machdep phys_addr=xxx' on the command line. The
> reason is that the second kernel updates the page table, and we cannot
> get the page table of the first kernel.
> 4. If the guest OS is 32 bit and the memory size is larger than 4G, the vmcore
> is in elf64 format. You should use a gdb built with --enable-64-bit-bfd.
>
> Changes from v1 to v2:
> 1. Fix the virtual addresses in the vmcore.
>
> Wen Congyang (5):
> Add API to create memory mapping list
> Add API to check whether a physical address is I/O address
> target-i386: implement cpu_get_memory_mapping()
> Add API to get memory mapping
> introduce a new monitor command 'dump' to dump guest's memory
>
> Makefile.target | 9 +-
> cpu-all.h | 10 +
> cpu-common.h | 1 +
> dump.c | 722 ++++++++++++++++++++++++++++++++++++++++++++++++++
> dump.h | 6 +
> exec.c | 20 ++
> hmp-commands.hx | 16 ++
> memory_mapping.c | 183 +++++++++++++
> memory_mapping.h | 30 ++
> monitor.c | 3 +
> qmp-commands.hx | 24 ++
> target-i386/helper.c | 239 +++++++++++++++++
> 12 files changed, 1259 insertions(+), 4 deletions(-)
> create mode 100644 dump.c
> create mode 100644 dump.h
> create mode 100644 memory_mapping.c
> create mode 100644 memory_mapping.h
A general remark regarding code organization: Please factor out the
target-specific bits and push them into target-*/dump.[ch] or whatever
file in that folder is appropriate. Ugly #ifdefs should be avoided in
generic code as far as possible.
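Something like the following split, sketched here around the
cpu_get_memory_mapping() hook from patch 3 (the file names and the exact
signature are only a suggestion):

/* dump.h (generic): one declaration shared by all targets */
void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);

/* target-i386/dump.c (hypothetical new file): the x86 page-table walk
 * lives here, so the generic code needs no #ifdef TARGET_I386 */
void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
{
    /* walk CR3/IA-32e page tables and call add_to_memory_mapping() */
}

/* target-<other>/dump.c: a stub until that target is supported */
void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
{
    /* fall back to a physical-address-only dump */
}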
Thanks,
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 1/5 v2] Add API to create memory mapping list
2011-12-09 8:06 ` [Qemu-devel] [RFC][PATCH 1/5 v2] Add API to create memory mapping list Wen Congyang
@ 2011-12-13 13:03 ` Jan Kiszka
2011-12-14 8:10 ` Wen Congyang
0 siblings, 1 reply; 16+ messages in thread
From: Jan Kiszka @ 2011-12-13 13:03 UTC (permalink / raw)
To: Wen Congyang; +Cc: HATAYAMA Daisuke, Dave Anderson, qemu-devel
On 2011-12-09 09:06, Wen Congyang wrote:
> The memory mapping list stores virtual address to physical address mappings.
> The following patch will use this information to create PT_LOAD entries in the vmcore.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> Makefile.target | 1 +
> memory_mapping.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> memory_mapping.h | 29 ++++++++++++
> 3 files changed, 160 insertions(+), 0 deletions(-)
> create mode 100644 memory_mapping.c
> create mode 100644 memory_mapping.h
>
> diff --git a/Makefile.target b/Makefile.target
> index a111521..778f514 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -205,6 +205,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/virtio-9p-device.o
> obj-$(CONFIG_KVM) += kvm.o kvm-all.o
> obj-$(CONFIG_NO_KVM) += kvm-stub.o
> obj-y += memory.o
> +obj-y += memory_mapping.o
> LIBS+=-lz
>
> QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
> diff --git a/memory_mapping.c b/memory_mapping.c
> new file mode 100644
> index 0000000..d83b7d7
> --- /dev/null
> +++ b/memory_mapping.c
> @@ -0,0 +1,130 @@
> +/*
> + * QEMU memory mapping
> + *
> + * Copyright Fujitsu, Corp. 2011
> + *
> + * Authors:
> + * Wen Congyang <wency@cn.fujitsu.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "cpu.h"
> +#include "cpu-all.h"
> +#include "memory_mapping.h"
> +
> +static MemoryMapping *last_mapping;
> +
> +static void create_new_memory_mapping(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length)
> +{
> + MemoryMapping *memory_mapping, *p;
> +
> + memory_mapping = g_malloc(sizeof(MemoryMapping));
> + memory_mapping->phys_addr = phys_addr;
> + memory_mapping->virt_addr = virt_addr;
> + memory_mapping->length = length;
> + last_mapping = memory_mapping;
> + list->num++;
> + QTAILQ_FOREACH(p, &list->head, next) {
> + if (p->phys_addr >= memory_mapping->phys_addr) {
> + QTAILQ_INSERT_BEFORE(p, memory_mapping, next);
> + return;
> + }
> + }
> + QTAILQ_INSERT_TAIL(&list->head, memory_mapping, next);
> + return;
> +}
> +
> +void create_new_memory_mapping_head(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length)
> +{
> + MemoryMapping *memory_mapping;
> +
> + memory_mapping = g_malloc(sizeof(MemoryMapping));
> + memory_mapping->phys_addr = phys_addr;
> + memory_mapping->virt_addr = virt_addr;
> + memory_mapping->length = length;
> + last_mapping = memory_mapping;
> + list->num++;
> + QTAILQ_INSERT_HEAD(&list->head, memory_mapping, next);
> + return;
> +}
Isn't create_new_memory_mapping_head just a special case of
create_new_memory_mapping? And can't add_to_memory_mapping be used or
extended so that create_new_memory_mapping_head becomes obsolete?
Documenting the API would help at least me understand the different
semantics.
> +
> +void add_to_memory_mapping(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length)
> +{
> + MemoryMapping *memory_mapping;
> +
> + if (QTAILQ_EMPTY(&list->head)) {
> + create_new_memory_mapping(list, phys_addr, virt_addr, length);
> + return;
> + }
> +
> + if (last_mapping) {
> + if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
> + (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
> + last_mapping->length += length;
> + return;
> + }
> + }
> +
> + QTAILQ_FOREACH(memory_mapping, &list->head, next) {
> + last_mapping = memory_mapping;
> + if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
> + (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
> + last_mapping->length += length;
> + return;
> + }
> +
> + if (!(phys_addr >= (last_mapping->phys_addr)) ||
> + !(phys_addr < (last_mapping->phys_addr + last_mapping->length))) {
> + /* last_mapping does not contain this region */
> + continue;
> + }
> + if (!(virt_addr >= (last_mapping->virt_addr)) ||
> + !(virt_addr < (last_mapping->virt_addr + last_mapping->length))) {
> + /* last_mapping does not contain this region */
> + continue;
> + }
> + if ((virt_addr - last_mapping->virt_addr) !=
> + (phys_addr - last_mapping->phys_addr)) {
> + /*
> + * last_mapping contains this region, but we should create another
> + * mapping region.
> + */
> + break;
> + }
> +
> + /* merge this region into last_mapping */
> + if ((virt_addr + length) >
> + (last_mapping->virt_addr + last_mapping->length)) {
> + last_mapping->length = virt_addr + length - last_mapping->virt_addr;
> + }
> + return;
> + }
> +
> + /* this region can not be merged into any existed memory mapping. */
> + create_new_memory_mapping(list, phys_addr, virt_addr, length);
> + return;
> +}
> +
> +void free_memory_mapping_list(MemoryMappingList *list)
> +{
> + MemoryMapping *p, *q;
> +
> + QTAILQ_FOREACH_SAFE(p, &list->head, next, q) {
> + QTAILQ_REMOVE(&list->head, p, next);
> + g_free(p);
Can't last_mapping still point to this object?
> + }
> +
> + list->num = 0;
> +}
> diff --git a/memory_mapping.h b/memory_mapping.h
> new file mode 100644
> index 0000000..871591d
> --- /dev/null
> +++ b/memory_mapping.h
> @@ -0,0 +1,29 @@
> +#ifndef MEMORY_MAPPING_H
> +#define MEMORY_MAPPING_H
> +
> +#include "qemu-queue.h"
> +
> +typedef struct MemoryMapping {
> + target_phys_addr_t phys_addr;
> + target_ulong virt_addr;
> + ram_addr_t length;
> + QTAILQ_ENTRY(MemoryMapping) next;
> +} MemoryMapping;
> +
> +typedef struct MemoryMappingList {
> + unsigned int num;
> + QTAILQ_HEAD(, MemoryMapping) head;
> +} MemoryMappingList;
> +
> +void create_new_memory_mapping_head(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length);
> +void add_to_memory_mapping(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length);
> +
> +void free_memory_mapping_list(MemoryMappingList *list);
> +
> +#endif
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
2011-12-13 12:55 ` Jan Kiszka
@ 2011-12-14 2:43 ` Wen Congyang
0 siblings, 0 replies; 16+ messages in thread
From: Wen Congyang @ 2011-12-14 2:43 UTC (permalink / raw)
To: Jan Kiszka; +Cc: HATAYAMA Daisuke, Dave Anderson, qemu-devel
At 12/13/2011 08:55 PM, Jan Kiszka wrote:
> On 2011-12-09 08:57, Wen Congyang wrote:
>> Hi, all
>>
>> 'virsh dump' can not work when host pci device is used by guest. We have
>> discussed this issue here:
>> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>>
>> We have determined to introduce a new command dump to dump memory. The core
>> file's format can be elf.
>>
>> Note:
>> 1. The guest should be x86 or x86_64. The other arch is not supported.
>> 2. If you use old gdb, gdb may crash. I use gdb-7.3.1, and it does not crash.
>> 3. If the OS is in the second kernel, gdb may not work well, and crash can
>> work by specifying '--machdep phys_addr=xxx' in the command line. The
>> reason is that the second kernel will update the page table, and we can
>> not get the page table for the first kernel.
>> 4. If the guest OS is 32 bit and the memory size is larger than 4G, the vmcore
>> is elf64 format. You should use the gdb which is built with --enable-64-bit-bfd.
>>
>> Changes from v1 to v2:
>> 1. fix virt addr in the vmcore.
>>
>> Wen Congyang (5):
>> Add API to create memory mapping list
>> Add API to check whether a physical address is I/O address
>> target-i386: implement cpu_get_memory_mapping()
>> Add API to get memory mapping
>> introduce a new monitor command 'dump' to dump guest's memory
>>
>> Makefile.target | 9 +-
>> cpu-all.h | 10 +
>> cpu-common.h | 1 +
>> dump.c | 722 ++++++++++++++++++++++++++++++++++++++++++++++++++
>> dump.h | 6 +
>> exec.c | 20 ++
>> hmp-commands.hx | 16 ++
>> memory_mapping.c | 183 +++++++++++++
>> memory_mapping.h | 30 ++
>> monitor.c | 3 +
>> qmp-commands.hx | 24 ++
>> target-i386/helper.c | 239 +++++++++++++++++
>> 12 files changed, 1259 insertions(+), 4 deletions(-)
>> create mode 100644 dump.c
>> create mode 100644 dump.h
>> create mode 100644 memory_mapping.c
>> create mode 100644 memory_mapping.h
>
> A general remark regarding code organization: Please factor out the
> target-specific bits and push them into target-*/dump.[ch] or whatever
> file in that folder is appropriate. Ugly #ifdefs should be avoided in
> generic code as far as possible.
OK. I will fix it. Thanks for pointing it out.
Thanks
Wen Congyang
>
> Thanks,
> Jan
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 1/5 v2] Add API to create memory mapping list
2011-12-13 13:03 ` Jan Kiszka
@ 2011-12-14 8:10 ` Wen Congyang
0 siblings, 0 replies; 16+ messages in thread
From: Wen Congyang @ 2011-12-14 8:10 UTC (permalink / raw)
To: Jan Kiszka; +Cc: HATAYAMA Daisuke, Dave Anderson, qemu-devel
At 12/13/2011 09:03 PM, Jan Kiszka wrote:
> On 2011-12-09 09:06, Wen Congyang wrote:
>> The memory mapping list stores virtual address to physical address mappings.
>> The following patch will use this information to create PT_LOAD entries in the vmcore.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> Makefile.target | 1 +
>> memory_mapping.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> memory_mapping.h | 29 ++++++++++++
>> 3 files changed, 160 insertions(+), 0 deletions(-)
>> create mode 100644 memory_mapping.c
>> create mode 100644 memory_mapping.h
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index a111521..778f514 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -205,6 +205,7 @@ obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/virtio-9p-device.o
>> obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>> obj-$(CONFIG_NO_KVM) += kvm-stub.o
>> obj-y += memory.o
>> +obj-y += memory_mapping.o
>> LIBS+=-lz
>>
>> QEMU_CFLAGS += $(VNC_TLS_CFLAGS)
>> diff --git a/memory_mapping.c b/memory_mapping.c
>> new file mode 100644
>> index 0000000..d83b7d7
>> --- /dev/null
>> +++ b/memory_mapping.c
>> @@ -0,0 +1,130 @@
>> +/*
>> + * QEMU memory mapping
>> + *
>> + * Copyright Fujitsu, Corp. 2011
>> + *
>> + * Authors:
>> + * Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>> + * the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "cpu.h"
>> +#include "cpu-all.h"
>> +#include "memory_mapping.h"
>> +
>> +static MemoryMapping *last_mapping;
>> +
>> +static void create_new_memory_mapping(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length)
>> +{
>> + MemoryMapping *memory_mapping, *p;
>> +
>> + memory_mapping = g_malloc(sizeof(MemoryMapping));
>> + memory_mapping->phys_addr = phys_addr;
>> + memory_mapping->virt_addr = virt_addr;
>> + memory_mapping->length = length;
>> + last_mapping = memory_mapping;
>> + list->num++;
>> + QTAILQ_FOREACH(p, &list->head, next) {
>> + if (p->phys_addr >= memory_mapping->phys_addr) {
>> + QTAILQ_INSERT_BEFORE(p, memory_mapping, next);
>> + return;
>> + }
>> + }
>> + QTAILQ_INSERT_TAIL(&list->head, memory_mapping, next);
>> + return;
>> +}
>> +
>> +void create_new_memory_mapping_head(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length)
>> +{
>> + MemoryMapping *memory_mapping;
>> +
>> + memory_mapping = g_malloc(sizeof(MemoryMapping));
>> + memory_mapping->phys_addr = phys_addr;
>> + memory_mapping->virt_addr = virt_addr;
>> + memory_mapping->length = length;
>> + last_mapping = memory_mapping;
>> + list->num++;
>> + QTAILQ_INSERT_HEAD(&list->head, memory_mapping, next);
>> + return;
>> +}
>
> Isn't create_new_memory_mapping_head just a special case of
> create_new_memory_mapping? And can't add_to_memory_mapping be used or
> extended so that create_new_memory_mapping_head becomes obsolete?
> Documenting the API would help at least me understand the different
> semantics.
The memory mapping list is sorted by physical address. If the number of
memory mappings is greater than 2^16-2 (the ELF field that holds the program
header table entry count is only 16 bits wide), we must drop some mappings...
I drop memory mappings according to physical address, which is why the list
is sorted. But crash will use the first PT_LOAD to calculate phys_offset, so
I added the API create_new_memory_mapping_head() to insert a specified memory
mapping at the head of the list.
Do you have any better idea?
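For reference, the 2^16-2 cap comes straight from the ELF header layout; a
minimal sketch using the standard elf.h definitions:

#include <elf.h>
#include <stdio.h>

/* e_phnum, the program header entry count, is an Elf64_Half (16 bits).
 * PN_XNUM (0xffff) is reserved as an escape value that moves the real
 * count into section header 0 -- an extension typical vmcore consumers
 * do not handle. That leaves at most 0xfffe = 2^16 - 2 usable entries,
 * hence the need to drop mappings. */
int main(void)
{
    printf("e_phnum is %zu bytes, PN_XNUM = 0x%x, usable max = %u\n",
           sizeof(((Elf64_Ehdr *)0)->e_phnum), PN_XNUM,
           (unsigned)(PN_XNUM - 1));
    return 0;
}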
>
>> +
>> +void add_to_memory_mapping(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length)
>> +{
>> + MemoryMapping *memory_mapping;
>> +
>> + if (QTAILQ_EMPTY(&list->head)) {
>> + create_new_memory_mapping(list, phys_addr, virt_addr, length);
>> + return;
>> + }
>> +
>> + if (last_mapping) {
>> + if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
>> + (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
>> + last_mapping->length += length;
>> + return;
>> + }
>> + }
>> +
>> + QTAILQ_FOREACH(memory_mapping, &list->head, next) {
>> + last_mapping = memory_mapping;
>> + if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
>> + (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
>> + last_mapping->length += length;
>> + return;
>> + }
>> +
>> + if (!(phys_addr >= (last_mapping->phys_addr)) ||
>> + !(phys_addr < (last_mapping->phys_addr + last_mapping->length))) {
>> + /* last_mapping does not contain this region */
>> + continue;
>> + }
>> + if (!(virt_addr >= (last_mapping->virt_addr)) ||
>> + !(virt_addr < (last_mapping->virt_addr + last_mapping->length))) {
>> + /* last_mapping does not contain this region */
>> + continue;
>> + }
>> + if ((virt_addr - last_mapping->virt_addr) !=
>> + (phys_addr - last_mapping->phys_addr)) {
>> + /*
>> + * last_mapping contains this region, but we should create another
>> + * mapping region.
>> + */
>> + break;
>> + }
>> +
>> + /* merge this region into last_mapping */
>> + if ((virt_addr + length) >
>> + (last_mapping->virt_addr + last_mapping->length)) {
>> + last_mapping->length = virt_addr + length - last_mapping->virt_addr;
>> + }
>> + return;
>> + }
>> +
>> + /* this region can not be merged into any existed memory mapping. */
>> + create_new_memory_mapping(list, phys_addr, virt_addr, length);
>> + return;
>> +}
>> +
>> +void free_memory_mapping_list(MemoryMappingList *list)
>> +{
>> + MemoryMapping *p, *q;
>> +
>> + QTAILQ_FOREACH_SAFE(p, &list->head, next, q) {
>> + QTAILQ_REMOVE(&list->head, p, next);
>> + g_free(p);
>
> Can't last_mapping still point to this object?
Yes, I will fix it (set last_mapping to NULL in get_memory_mapping()).
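One possible shape of the fix, sketched against the posted code (as said
above, the actual patch may reset it in the get-memory-mapping path instead):

void free_memory_mapping_list(MemoryMappingList *list)
{
    MemoryMapping *p, *q;

    QTAILQ_FOREACH_SAFE(p, &list->head, next, q) {
        QTAILQ_REMOVE(&list->head, p, next);
        g_free(p);
    }

    list->num = 0;
    last_mapping = NULL; /* it may point into the list we just freed */
}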
Thanks for your comment
Wen Congyang
>
>> + }
>> +
>> + list->num = 0;
>> +}
>> diff --git a/memory_mapping.h b/memory_mapping.h
>> new file mode 100644
>> index 0000000..871591d
>> --- /dev/null
>> +++ b/memory_mapping.h
>> @@ -0,0 +1,29 @@
>> +#ifndef MEMORY_MAPPING_H
>> +#define MEMORY_MAPPING_H
>> +
>> +#include "qemu-queue.h"
>> +
>> +typedef struct MemoryMapping {
>> + target_phys_addr_t phys_addr;
>> + target_ulong virt_addr;
>> + ram_addr_t length;
>> + QTAILQ_ENTRY(MemoryMapping) next;
>> +} MemoryMapping;
>> +
>> +typedef struct MemoryMappingList {
>> + unsigned int num;
>> + QTAILQ_HEAD(, MemoryMapping) head;
>> +} MemoryMappingList;
>> +
>> +void create_new_memory_mapping_head(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length);
>> +void add_to_memory_mapping(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length);
>> +
>> +void free_memory_mapping_list(MemoryMappingList *list);
>> +
>> +#endif
>
> Jan
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
2011-12-13 9:20 ` Wen Congyang
@ 2011-12-15 1:30 ` HATAYAMA Daisuke
2011-12-15 8:57 ` Wen Congyang
0 siblings, 1 reply; 16+ messages in thread
From: HATAYAMA Daisuke @ 2011-12-15 1:30 UTC (permalink / raw)
To: wency; +Cc: jan.kiszka, anderson, qemu-devel
From: Wen Congyang <wency@cn.fujitsu.com>
Subject: Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
Date: Tue, 13 Dec 2011 17:20:24 +0800
> At 12/13/2011 02:01 PM, HATAYAMA Daisuke wrote:
>> From: Wen Congyang <wency@cn.fujitsu.com>
>> Subject: Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
>> Date: Tue, 13 Dec 2011 11:35:53 +0800
>>
>>> Hi, hatayama-san
>>>
>>> At 12/13/2011 11:12 AM, HATAYAMA Daisuke wrote:
>>>> Hello Wen,
>>>>
>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>> Subject: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
>>>> Date: Fri, 09 Dec 2011 15:57:26 +0800
>>>>
>>>>> Hi, all
>>>>>
>>>>> 'virsh dump' can not work when host pci device is used by guest. We have
>>>>> discussed this issue here:
>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>>>>>
>>>>> We have determined to introduce a new command dump to dump memory. The core
>>>>> file's format can be elf.
>>>>>
>>>>> Note:
>>>>> 1. The guest should be x86 or x86_64. The other arch is not supported.
>>>>> 2. If you use old gdb, gdb may crash. I use gdb-7.3.1, and it does not crash.
>>>>> 3. If the OS is in the second kernel, gdb may not work well, and crash can
>>>>> work by specifying '--machdep phys_addr=xxx' in the command line. The
>>>>> reason is that the second kernel will update the page table, and we can
>>>>> not get the page table for the first kernel.
>>>>
>>>> I guess the current implementation still breaks the vmalloc'ed area that
>>>> needs page tables originally located in the first 640kB, right? If you
>>>> want to do this in a correct way, you need to identify the position of the
>>>> backup region and get the data of the 1st kernel's page tables.
>>>
>>> I do not know anything about the vmalloc'ed area. Can you explain it in
>>> more detail?
>>>
>>
>> It's a memory area that is not straight-mapped. To read that area, it's
>> necessary to look up the guest machine's page tables. If I understand
>> correctly, your current implementation translates the vmalloc'ed area so
>> that the generated vmcore is linearly mapped w.r.t. virtual address for
>> gdb to work.
>
> Do you mean the page table for the vmalloc'ed area is stored in the first 640KB,
> and it may be overwritten by the second kernel (this region has been backed up)?
>
This might be wrong... I have tried locally to confirm this, but have not
managed to yet.
I did confirm that at least the pglist_data (NODE_DATA) can be within the
first 640kB:
crash> log
<cut>
No NUMA configuration found
Faking a node at 0000000000000000-000000007f800000
Bootmem setup node 0 0000000000000000-000000007f800000
NODE_DATA [0000000000011000 - 0000000000044fff] <-- this
bootmap [0000000000045000 - 0000000000054eff] pages 10
(7 early reservations) ==> bootmem [0000000000 - 007f800000]
#0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
#1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
And I once had a vmcore created after entering the 2nd kernel in which
I could not see module data using the mod sub-command; that was resolved by
redirecting reads of those addresses to the corresponding backup region.
I guess that because crash uses the page table in memory, this breaks
paging badly.
I want to look into this more, but I do not have such a vmcore now because
I lost them accidentally... I tried several times yesterday to reproduce
this, but did not succeed. The vmcore above is one of those attempts.
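The redirection mentioned above amounts to roughly the following; this is a
sketch with hypothetical names -- in real code the three values have to be
recovered from the dump's debugging information:

/* Reads that fall into the low 640kB window are served from kdump's
 * backup region instead of the (reused) original pages. */
static unsigned long long backup_src_start; /* typically 0 */
static unsigned long long backup_src_size;  /* typically 640 * 1024 */
static unsigned long long backup_offset;    /* where the copy lives */

static unsigned long long remap_paddr(unsigned long long paddr)
{
    if (paddr >= backup_src_start &&
        paddr < backup_src_start + backup_src_size) {
        return backup_offset + (paddr - backup_src_start);
    }
    return paddr;
}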
>>
>> kdump saves the first 640kB of physical memory into the backup region. I
>> guess, for some vmcores created by the current implementation, gdb and
>> crash cannot see the vmalloc'ed memory area that needs page tables
>
> Hmm, IIRC, crash does not use the CPU's page table. gdb uses the information in
> the PT_LOAD entries to read a memory area.
>
I was confused about this. Your dump command uses the CPU's page table.
So on the qemu side you can read the page table over the whole physical
address space, right? If so, the contents themselves are not broken, I think.
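(For illustration, such a walk on the qemu side would look roughly like the
sketch below: 4KB pages only, with large pages, NX bits and host endianness
ignored. The point is that every level is fetched with
cpu_physical_memory_read(), so the walk sees guest-physical memory directly,
independent of what the currently running 2nd kernel has mapped.)

static int walk_ia32e(CPUState *env, uint64_t vaddr, uint64_t *paddr)
{
    uint64_t table = env->cr[3] & ~0xfffULL;
    int shift;

    /* PML4 (39), PDPT (30), PD (21), PT (12) */
    for (shift = 39; shift >= 12; shift -= 9) {
        uint64_t entry;
        uint64_t idx = (vaddr >> shift) & 0x1ff;

        cpu_physical_memory_read(table + idx * 8, (uint8_t *)&entry, 8);
        if (!(entry & 1)) {
            return -1;              /* present bit clear */
        }
        table = entry & 0xffffffffff000ULL;
    }
    *paddr = table | (vaddr & 0xfff);
    return 0;
}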
>> placed at the 640kB region correctly. For example, try to use the mod
>> sub-command. Kernel modules are allocated on the vmalloc'ed area.
>>
>> I have developed a very similar logic for sadump. Look at sadump.c in
>> crash. The logic itself is very simple, but debugging information is
>> necessary. Documentation/kdump/kdump.txt and the following paper
>> explain the backup region mechanism very well, and the implementation
>> around there remains the same now.
>
>> Hmm, we cannot use debugging information on the qemu side.
>
How about re-reading them later in crash? Users want to see the 1st
kernel rather than the 2nd kernel.
To do that, crash must be able to distinguish this dump format.
Which function in crash reads the vmcores created by this command:
the kcore one, or netdump?
Thanks.
HATAYAMA, Daisuke
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
2011-12-15 1:30 ` HATAYAMA Daisuke
@ 2011-12-15 8:57 ` Wen Congyang
0 siblings, 0 replies; 16+ messages in thread
From: Wen Congyang @ 2011-12-15 8:57 UTC (permalink / raw)
To: HATAYAMA Daisuke; +Cc: jan.kiszka, anderson, qemu-devel
At 12/15/2011 09:30 AM, HATAYAMA Daisuke wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> Subject: Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
> Date: Tue, 13 Dec 2011 17:20:24 +0800
>
>> At 12/13/2011 02:01 PM, HATAYAMA Daisuke wrote:
>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>> Subject: Re: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
>>> Date: Tue, 13 Dec 2011 11:35:53 +0800
>>>
>>>> Hi, hatayama-san
>>>>
>>>> At 12/13/2011 11:12 AM, HATAYAMA Daisuke wrote:
>>>>> Hello Wen,
>>>>>
>>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>>> Subject: [Qemu-devel] [RFC][PATCT 0/5 v2] dump memory when host pci device is used by guest
>>>>> Date: Fri, 09 Dec 2011 15:57:26 +0800
>>>>>
>>>>>> Hi, all
>>>>>>
>>>>>> 'virsh dump' can not work when host pci device is used by guest. We have
>>>>>> discussed this issue here:
>>>>>> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>>>>>>
>>>>>> We have determined to introduce a new command dump to dump memory. The core
>>>>>> file's format can be elf.
>>>>>>
>>>>>> Note:
>>>>>> 1. The guest should be x86 or x86_64. The other arch is not supported.
>>>>>> 2. If you use old gdb, gdb may crash. I use gdb-7.3.1, and it does not crash.
>>>>>> 3. If the OS is in the second kernel, gdb may not work well, and crash can
>>>>>> work by specifying '--machdep phys_addr=xxx' in the command line. The
>>>>>> reason is that the second kernel will update the page table, and we can
>>>>>> not get the page table for the first kernel.
>>>>>
>>>>> I guess the current implementation still breaks the vmalloc'ed area that
>>>>> needs page tables originally located in the first 640kB, right? If you
>>>>> want to do this in a correct way, you need to identify the position of the
>>>>> backup region and get the data of the 1st kernel's page tables.
>>>>
>>>> I do not know anything about the vmalloc'ed area. Can you explain it in
>>>> more detail?
>>>>
>>>
>>> It's a memory area that is not straight-mapped. To read that area, it's
>>> necessary to look up the guest machine's page tables. If I understand
>>> correctly, your current implementation translates the vmalloc'ed area so
>>> that the generated vmcore is linearly mapped w.r.t. virtual address for
>>> gdb to work.
>>
>> Do you mean the page table for the vmalloc'ed area is stored in the first 640KB,
>> and it may be overwritten by the second kernel (this region has been backed up)?
>>
>
> This might be wrong... I have tried locally to confirm this, but have not
> managed to yet.
>
> I did confirm that at least the pglist_data (NODE_DATA) can be within the first 640kB:
>
> crash> log
> <cut>
> No NUMA configuration found
> Faking a node at 0000000000000000-000000007f800000
> Bootmem setup node 0 0000000000000000-000000007f800000
> NODE_DATA [0000000000011000 - 0000000000044fff] <-- this
Only kernels built with CONFIG_NUMA have this. That config is enabled only
on RHEL x86_64, and I do not have such an environment on hand now.
> bootmap [0000000000045000 - 0000000000054eff] pages 10
> (7 early reservations) ==> bootmem [0000000000 - 007f800000]
> #0 [0000000000 - 0000001000] BIOS data page ==> [0000000000 - 0000001000]
> #1 [0000006000 - 0000008000] TRAMPOLINE ==> [0000006000 - 0000008000]
>
> And I once had a vmcore created after entering the 2nd kernel in which
> I could not see module data using the mod sub-command; that was resolved by
> redirecting reads of those addresses to the corresponding backup region.
>
> I guess that because crash uses the page table in memory, this breaks
> paging badly.
>
> I want to look into this more, but I do not have such a vmcore now because
> I lost them accidentally... I tried several times yesterday to reproduce
> this, but did not succeed. The vmcore above is one of those attempts.
>
>>>
>>> kdump saves the first 640kB of physical memory into the backup region. I
>>> guess, for some vmcores created by the current implementation, gdb and
>>> crash cannot see the vmalloc'ed memory area that needs page tables
>>
>> Hmm, IIRC, crash does not use the CPU's page table. gdb uses the information in
>> the PT_LOAD entries to read a memory area.
>>
>
> I was confused about this. Your dump command uses the CPU's page table.
>
> So on the qemu side you can read the page table over the whole physical
> address space, right? If so, the contents themselves are not broken, I think.
>
>>> placed at the 640kB region correctly. For example, try to use the mod
>>> sub-command. Kernel modules are allocated on the vmalloc'ed area.
>>>
>>> I have developed a very similar logic for sadump. Look at sadump.c in
>>> crash. The logic itself is very simple, but debugging information is
>>> necessary. Documentation/kdump/kdump.txt and the following paper
>>> explain the backup region mechanism very well, and the implementation
>>> around there remains the same now.
>>
>> Hmm, we cannot use debugging information on the qemu side.
>>
>
> How about re-reading them later in crash? Users want to see the 1st
> kernel rather than the 2nd kernel.
An easy way to see the 1st kernel is to specify '--machdep phys_base=xxx' on
the crash command line.
Thanks
Wen Congyang
>
> To do that, crash must be able to distinguish this dump format.
> Which function in crash reads the vmcores created by this command:
> the kcore one, or netdump?
>
> Thanks.
> HATAYAMA, Daisuke
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread