* [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism
@ 2012-02-09 3:16 Wen Congyang
2012-02-09 3:19 ` [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor() Wen Congyang
` (16 more replies)
0 siblings, 17 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:16 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Hi, all
'virsh dump' cannot work when a host PCI device is assigned to the guest. We
have discussed this issue here:
http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
We have decided to introduce a new command, dump, to dump guest memory. The
core file's format can be ELF.
Note:
1. The guest must be x86 or x86_64; other architectures are not supported.
2. Old versions of gdb may crash on the vmcore. I use gdb-7.3.1, and it does
not crash.
3. If the guest OS is running in the second (kdump) kernel, gdb may not work
well, but crash can work if '--machdep phys_addr=xxx' is specified on the
command line. The reason is that the second kernel updates the page table,
so we cannot obtain the first kernel's page table.
4. If the guest OS is 32-bit and the memory size is larger than 4 GB, the
vmcore is in ELF64 format. You should use a gdb built with --enable-64-bit-bfd.
5. This patchset is based on the upstream tree, plus one patch that is still
in Luiz Capitulino's tree, because this patchset uses the qemu_get_fd() API.
Changes from v5 to v6:
1. allow user to dump a fraction of the memory
2. fix some bugs
Changes from v4 to v5:
1. convert the new command dump to QAPI
Changes from v3 to v4:
1. support running the dump asynchronously
2. add API to cancel dumping and query dumping progress
3. add API to control dumping speed
4. automatically cancel dumping when the user resumes the VM; the status is then set to failed
Changes from v2 to v3:
1. address Jan Kiszka's comment
Changes from v1 to v2:
1. fix virt addr in the vmcore.
Wen Congyang (16):
monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor()
Add API to create memory mapping list
Add API to check whether a physical address is I/O address
target-i386: implement cpu_get_memory_mapping()
Add API to get memory mapping
target-i386: Add API to write elf notes to core file
target-i386: Add API to add extra memory mapping
target-i386: add API to get dump info
introduce a new monitor command 'dump' to dump guest's memory
run dump at the background
support detached dump
support to cancel the current dumping
support to set dumping speed
support to query dumping status
auto cancel dumping after vm state is changed to run
allow user to dump a fraction of the memory
Makefile.target | 11 +-
cpu-all.h | 18 +
cpu-common.h | 2 +
dump.c | 885 +++++++++++++++++++++++++++++++++++++++++++++++
dump.h | 13 +
exec.c | 16 +
hmp-commands.hx | 49 +++
hmp.c | 49 +++
hmp.h | 4 +
memory_mapping.c | 222 ++++++++++++
memory_mapping.h | 41 +++
monitor.c | 37 ++
monitor.h | 2 +
qapi-schema.json | 72 ++++
qmp-commands.hx | 119 +++++++
target-i386/arch-dump.c | 574 ++++++++++++++++++++++++++++++
vl.c | 5 +-
17 files changed, 2112 insertions(+), 7 deletions(-)
create mode 100644 dump.c
create mode 100644 dump.h
create mode 100644 memory_mapping.c
create mode 100644 memory_mapping.h
create mode 100644 target-i386/arch-dump.c
^ permalink raw reply [flat|nested] 68+ messages in thread
* [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor()
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
@ 2012-02-09 3:19 ` Wen Congyang
2012-02-14 16:19 ` Jan Kiszka
2012-02-09 3:20 ` [Qemu-devel] [RFC][PATCH 02/16 v6] Add API to create memory mapping list Wen Congyang
` (15 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:19 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Synchronous commands need these two APIs to suspend/resume the monitor.
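The pattern the two wrappers follow can be shown with a self-contained sketch. Everything here is a simplified stand-in, not QEMU's real code: Monitor carries only a suspend counter, cur_mon mimics the current-monitor pointer, and the error value is a bare -1 rather than -ENOTTY.

```c
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Monitor: only the suspend
 * counter matters for this sketch. */
typedef struct Monitor {
    int suspend_cnt; /* > 0 means input processing is paused */
} Monitor;

static Monitor *cur_mon; /* mimics QEMU's current-monitor pointer */

/* Suspend the current monitor if one exists; -1 (no monitor) otherwise. */
int qemu_suspend_monitor_sketch(void)
{
    if (cur_mon == NULL) {
        return -1;
    }
    cur_mon->suspend_cnt++;
    return 0;
}

/* Resume the current monitor, if any; a no-op otherwise. */
void qemu_resume_monitor_sketch(void)
{
    if (cur_mon && cur_mon->suspend_cnt > 0) {
        cur_mon->suspend_cnt--;
    }
}

/* Input is read only while the monitor is not suspended. */
int monitor_can_read_sketch(const Monitor *mon)
{
    return mon->suspend_cnt == 0;
}
```

A synchronous command would call the suspend wrapper before doing its blocking work and the resume wrapper afterwards, without needing a Monitor pointer of its own.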
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
monitor.c | 27 +++++++++++++++++++++++++++
monitor.h | 2 ++
2 files changed, 29 insertions(+), 0 deletions(-)
diff --git a/monitor.c b/monitor.c
index 11639b1..7e72739 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4442,6 +4442,26 @@ static void monitor_command_cb(Monitor *mon, const char *cmdline, void *opaque)
monitor_resume(mon);
}
+int qemu_suspend_monitor(const char *fmt, ...)
+{
+ int ret;
+
+ if (cur_mon) {
+ ret = monitor_suspend(cur_mon);
+ } else {
+ ret = -ENOTTY;
+ }
+
+ if (ret < 0 && fmt) {
+ va_list ap;
+ va_start(ap, fmt);
+ monitor_vprintf(cur_mon, fmt, ap);
+ va_end(ap);
+ }
+
+ return ret;
+}
+
int monitor_suspend(Monitor *mon)
{
if (!mon->rs)
@@ -4450,6 +4470,13 @@ int monitor_suspend(Monitor *mon)
return 0;
}
+void qemu_resume_monitor(void)
+{
+ if (cur_mon) {
+ monitor_resume(cur_mon);
+ }
+}
+
void monitor_resume(Monitor *mon)
{
if (!mon->rs)
diff --git a/monitor.h b/monitor.h
index 58109af..60a1e17 100644
--- a/monitor.h
+++ b/monitor.h
@@ -46,7 +46,9 @@ int monitor_cur_is_qmp(void);
void monitor_protocol_event(MonitorEvent event, QObject *data);
void monitor_init(CharDriverState *chr, int flags);
+int qemu_suspend_monitor(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
int monitor_suspend(Monitor *mon);
+void qemu_resume_monitor(void);
void monitor_resume(Monitor *mon);
int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
--
1.7.1
* [Qemu-devel] [RFC][PATCH 02/16 v6] Add API to create memory mapping list
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
2012-02-09 3:19 ` [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor() Wen Congyang
@ 2012-02-09 3:20 ` Wen Congyang
2012-02-14 16:39 ` Jan Kiszka
2012-02-09 3:21 ` [Qemu-devel] [RFC][PATCH 03/16 v6] Add API to check whether a physical address is I/O address Wen Congyang
` (14 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:20 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
The memory mapping list stores virtual-to-physical address mappings. A
following patch will use this information to create PT_LOAD segments in the
vmcore.
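The insert-or-merge behavior of this list can be sketched stand-alone. Range and range_add below are illustrative stand-ins for MemoryMapping and the QTAILQ-based list, not the patch's actual code; a plain singly linked list keeps the example dependency-free.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for MemoryMapping: a contiguous virt->phys range. */
typedef struct Range {
    uint64_t phys_addr, virt_addr, length;
    struct Range *next;
} Range;

/* Insert keeping the list sorted by phys_addr, merging with a
 * contiguous predecessor when both addresses line up. */
Range *range_add(Range *head, uint64_t phys, uint64_t virt, uint64_t len)
{
    Range **p = &head;

    while (*p && (*p)->phys_addr < phys) {
        Range *r = *p;
        if (phys == r->phys_addr + r->length &&
            virt == r->virt_addr + r->length) {
            r->length += len; /* contiguous: extend instead of inserting */
            return head;
        }
        p = &r->next;
    }

    /* no merge possible: insert a new node at the sorted position */
    Range *n = malloc(sizeof(*n));
    n->phys_addr = phys;
    n->virt_addr = virt;
    n->length = len;
    n->next = *p;
    *p = n;
    return head;
}
```

Keeping the list sorted by phys_addr is what later lets the dump code emit PT_LOAD headers in ascending physical order.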
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
Makefile.target | 1 +
memory_mapping.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
memory_mapping.h | 38 ++++++++++++++++
3 files changed, 169 insertions(+), 0 deletions(-)
create mode 100644 memory_mapping.c
create mode 100644 memory_mapping.h
diff --git a/Makefile.target b/Makefile.target
index 68481a3..e35e464 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -200,6 +200,7 @@ obj-$(CONFIG_KVM) += kvm.o kvm-all.o
obj-$(CONFIG_NO_KVM) += kvm-stub.o
obj-$(CONFIG_VGA) += vga.o
obj-y += memory.o savevm.o
+obj-y += memory_mapping.o
LIBS+=-lz
obj-i386-$(CONFIG_KVM) += hyperv.o
diff --git a/memory_mapping.c b/memory_mapping.c
new file mode 100644
index 0000000..d83b7d7
--- /dev/null
+++ b/memory_mapping.c
@@ -0,0 +1,130 @@
+/*
+ * QEMU memory mapping
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ * Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "cpu.h"
+#include "cpu-all.h"
+#include "memory_mapping.h"
+
+static MemoryMapping *last_mapping;
+
+static void create_new_memory_mapping(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length)
+{
+ MemoryMapping *memory_mapping, *p;
+
+ memory_mapping = g_malloc(sizeof(MemoryMapping));
+ memory_mapping->phys_addr = phys_addr;
+ memory_mapping->virt_addr = virt_addr;
+ memory_mapping->length = length;
+ last_mapping = memory_mapping;
+ list->num++;
+ QTAILQ_FOREACH(p, &list->head, next) {
+ if (p->phys_addr >= memory_mapping->phys_addr) {
+ QTAILQ_INSERT_BEFORE(p, memory_mapping, next);
+ return;
+ }
+ }
+ QTAILQ_INSERT_TAIL(&list->head, memory_mapping, next);
+ return;
+}
+
+void create_new_memory_mapping_head(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length)
+{
+ MemoryMapping *memory_mapping;
+
+ memory_mapping = g_malloc(sizeof(MemoryMapping));
+ memory_mapping->phys_addr = phys_addr;
+ memory_mapping->virt_addr = virt_addr;
+ memory_mapping->length = length;
+ last_mapping = memory_mapping;
+ list->num++;
+ QTAILQ_INSERT_HEAD(&list->head, memory_mapping, next);
+ return;
+}
+
+void add_to_memory_mapping(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length)
+{
+ MemoryMapping *memory_mapping;
+
+ if (QTAILQ_EMPTY(&list->head)) {
+ create_new_memory_mapping(list, phys_addr, virt_addr, length);
+ return;
+ }
+
+ if (last_mapping) {
+ if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
+ (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
+ last_mapping->length += length;
+ return;
+ }
+ }
+
+ QTAILQ_FOREACH(memory_mapping, &list->head, next) {
+ last_mapping = memory_mapping;
+ if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
+ (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
+ last_mapping->length += length;
+ return;
+ }
+
+ if (!(phys_addr >= (last_mapping->phys_addr)) ||
+ !(phys_addr < (last_mapping->phys_addr + last_mapping->length))) {
+ /* last_mapping does not contain this region */
+ continue;
+ }
+ if (!(virt_addr >= (last_mapping->virt_addr)) ||
+ !(virt_addr < (last_mapping->virt_addr + last_mapping->length))) {
+ /* last_mapping does not contain this region */
+ continue;
+ }
+ if ((virt_addr - last_mapping->virt_addr) !=
+ (phys_addr - last_mapping->phys_addr)) {
+ /*
+ * last_mapping contains this region, but we should create another
+ * mapping region.
+ */
+ break;
+ }
+
+ /* merge this region into last_mapping */
+ if ((virt_addr + length) >
+ (last_mapping->virt_addr + last_mapping->length)) {
+ last_mapping->length = virt_addr + length - last_mapping->virt_addr;
+ }
+ return;
+ }
+
+ /* this region can not be merged into any existed memory mapping. */
+ create_new_memory_mapping(list, phys_addr, virt_addr, length);
+ return;
+}
+
+void free_memory_mapping_list(MemoryMappingList *list)
+{
+ MemoryMapping *p, *q;
+
+ QTAILQ_FOREACH_SAFE(p, &list->head, next, q) {
+ QTAILQ_REMOVE(&list->head, p, next);
+ g_free(p);
+ }
+
+ list->num = 0;
+}
diff --git a/memory_mapping.h b/memory_mapping.h
new file mode 100644
index 0000000..a4b1532
--- /dev/null
+++ b/memory_mapping.h
@@ -0,0 +1,38 @@
+#ifndef MEMORY_MAPPING_H
+#define MEMORY_MAPPING_H
+
+#include "qemu-queue.h"
+
+typedef struct MemoryMapping {
+ target_phys_addr_t phys_addr;
+ target_ulong virt_addr;
+ ram_addr_t length;
+ QTAILQ_ENTRY(MemoryMapping) next;
+} MemoryMapping;
+
+typedef struct MemoryMappingList {
+ unsigned int num;
+ QTAILQ_HEAD(, MemoryMapping) head;
+} MemoryMappingList;
+
+/*
+ * crash requires that certain memory mappings be at the head of the list,
+ * which leaves the list unsorted. The caller must therefore add these
+ * special mappings only after all normal memory mappings have been added.
+ */
+void create_new_memory_mapping_head(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length);
+/*
+ * add or merge the memory region into the memory mapping's list. The list is
+ * sorted by phys_addr.
+ */
+void add_to_memory_mapping(MemoryMappingList *list,
+ target_phys_addr_t phys_addr,
+ target_phys_addr_t virt_addr,
+ ram_addr_t length);
+
+void free_memory_mapping_list(MemoryMappingList *list);
+
+#endif
--
1.7.1
* [Qemu-devel] [RFC][PATCH 03/16 v6] Add API to check whether a physical address is I/O address
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
2012-02-09 3:19 ` [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor() Wen Congyang
2012-02-09 3:20 ` [Qemu-devel] [RFC][PATCH 02/16 v6] Add API to create memory mapping list Wen Congyang
@ 2012-02-09 3:21 ` Wen Congyang
2012-02-14 16:52 ` Jan Kiszka
2012-02-09 3:21 ` [Qemu-devel] [RFC][PATCH 04/16 v6] target-i386: implement cpu_get_memory_mapping() Wen Congyang
` (13 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:21 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
This API will be used in the following patch.
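Conceptually, is_io_addr() asks whether a physical address is backed by RAM/ROM; anything else is an I/O region that must be skipped while dumping. A simplified model of that check, where RamRange and the lookup table are hypothetical replacements for phys_page_find()/is_ram_rom_romd():

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a RAM-backed physical range. */
typedef struct {
    uint64_t base, size;
} RamRange;

/* True when phys_addr falls in none of the RAM ranges, i.e. it is
 * an I/O region that a memory dump must not read through ldl/ldq. */
bool is_io_addr_sketch(const RamRange *ram, int n, uint64_t phys_addr)
{
    for (int i = 0; i < n; i++) {
        if (phys_addr >= ram[i].base &&
            phys_addr < ram[i].base + ram[i].size) {
            return false; /* backed by RAM: safe to dump */
        }
    }
    return true; /* no RAM backing: skip it when dumping */
}
```

The real implementation consults QEMU's physical page descriptors instead of a flat table, but the classification it produces is the same.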
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
cpu-common.h | 2 ++
exec.c | 16 ++++++++++++++++
2 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/cpu-common.h b/cpu-common.h
index a40c57d..d047137 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -71,6 +71,8 @@ void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque));
void cpu_unregister_map_client(void *cookie);
+bool is_io_addr(target_phys_addr_t phys_addr);
+
/* Coalesced MMIO regions are areas where write operations can be reordered.
* This usually implies that write operations are side-effect free. This allows
* batching which can make a major impact on performance when using
diff --git a/exec.c b/exec.c
index b81677a..edc5684 100644
--- a/exec.c
+++ b/exec.c
@@ -4435,3 +4435,19 @@ bool virtio_is_big_endian(void)
#undef env
#endif
+
+bool is_io_addr(target_phys_addr_t phys_addr)
+{
+ ram_addr_t pd;
+ PhysPageDesc p;
+
+ p = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
+ pd = p.phys_offset;
+
+ if (!is_ram_rom_romd(pd)) {
+ /* I/O region */
+ return true;
+ }
+
+ return false;
+}
--
1.7.1
* [Qemu-devel] [RFC][PATCH 04/16 v6] target-i386: implement cpu_get_memory_mapping()
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (2 preceding siblings ...)
2012-02-09 3:21 ` [Qemu-devel] [RFC][PATCH 03/16 v6] Add API to check whether a physical address is I/O address Wen Congyang
@ 2012-02-09 3:21 ` Wen Congyang
2012-02-14 17:07 ` Jan Kiszka
2012-02-09 3:22 ` [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping Wen Congyang
` (12 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:21 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Walk the CPU's page tables and collect all virtual-to-physical address
mappings, then add these mappings to the memory mapping list.
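The bit arithmetic the walkers rely on can be checked in isolation. This is a minimal sketch of IA-32e (4-level) paging index extraction; the helper names are illustrative, not part of the patch. A virtual address decomposes into four 9-bit table indices plus a 12-bit page offset, and walk_pml4e() rebuilds canonical kernel addresses by forcing the upper 16 bits to 1 (the `0xffffULL << 48` term).

```c
#include <stdint.h>

/* IA-32e paging: each level indexes 512 (2^9) entries. */
static inline unsigned pml4e_index(uint64_t va) { return (va >> 39) & 0x1ff; }
static inline unsigned pdpe_index(uint64_t va)  { return (va >> 30) & 0x1ff; }
static inline unsigned pde_index(uint64_t va)   { return (va >> 21) & 0x1ff; }
static inline unsigned pte_index(uint64_t va)   { return (va >> 12) & 0x1ff; }

/* Reconstruct the kernel-half line address for a PML4 slot, as
 * walk_pml4e() does: index in bits 39..47, upper 16 bits set. */
static inline uint64_t pml4e_line_addr(unsigned i)
{
    return ((uint64_t)(i & 0x1ff) << 39) | (0xffffULL << 48);
}
```

The lower-level walkers OR each table index into this line address at its bit position (30, 21, 12), which is how every present PTE ends up tagged with a full virtual address.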
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
Makefile.target | 2 +-
cpu-all.h | 7 ++
target-i386/arch-dump.c | 254 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 262 insertions(+), 1 deletions(-)
create mode 100644 target-i386/arch-dump.c
diff --git a/Makefile.target b/Makefile.target
index e35e464..d6e5684 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -75,7 +75,7 @@ libobj-$(CONFIG_TCG_INTERPRETER) += tci.o
libobj-y += fpu/softfloat.o
libobj-y += op_helper.o helper.o
ifeq ($(TARGET_BASE_ARCH), i386)
-libobj-y += cpuid.o
+libobj-y += cpuid.o arch-dump.o
endif
libobj-$(TARGET_SPARC64) += vis_helper.o
libobj-$(CONFIG_NEED_MMU) += mmu.o
diff --git a/cpu-all.h b/cpu-all.h
index e2c3c49..4cd7fbb 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -22,6 +22,7 @@
#include "qemu-common.h"
#include "qemu-tls.h"
#include "cpu-common.h"
+#include "memory_mapping.h"
/* some important defines:
*
@@ -523,4 +524,10 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
uint8_t *buf, int len, int is_write);
+#if defined(TARGET_I386)
+void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);
+#else
+#define cpu_get_memory_mapping(list, env)
+#endif
+
#endif /* CPU_ALL_H */
diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
new file mode 100644
index 0000000..2e921c7
--- /dev/null
+++ b/target-i386/arch-dump.c
@@ -0,0 +1,254 @@
+/*
+ * i386 dump
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ * Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "cpu.h"
+#include "cpu-all.h"
+
+/* PAE Paging or IA-32e Paging */
+static void walk_pte(MemoryMappingList *list, target_phys_addr_t pte_start_addr,
+ int32_t a20_mask, target_ulong start_line_addr)
+{
+ target_phys_addr_t pte_addr, start_paddr;
+ uint64_t pte;
+ target_ulong start_vaddr;
+ int i;
+
+ for (i = 0; i < 512; i++) {
+ pte_addr = (pte_start_addr + i * 8) & a20_mask;
+ pte = ldq_phys(pte_addr);
+ if (!(pte & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ start_paddr = (pte & ~0xfff) & ~(0x1ULL << 63);
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+
+ start_vaddr = start_line_addr | ((i & 0x1ff) << 12);
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 12);
+ }
+}
+
+/* 32-bit Paging */
+static void walk_pte2(MemoryMappingList *list,
+ target_phys_addr_t pte_start_addr, int32_t a20_mask,
+ target_ulong start_line_addr)
+{
+ target_phys_addr_t pte_addr, start_paddr;
+ uint32_t pte;
+ target_ulong start_vaddr;
+ int i;
+
+ for (i = 0; i < 1024; i++) {
+ pte_addr = (pte_start_addr + i * 4) & a20_mask;
+ pte = ldl_phys(pte_addr);
+ if (!(pte & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ start_paddr = pte & ~0xfff;
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+
+ start_vaddr = start_line_addr | ((i & 0x3ff) << 12);
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 12);
+ }
+}
+
+/* PAE Paging or IA-32e Paging */
+static void walk_pde(MemoryMappingList *list, target_phys_addr_t pde_start_addr,
+ int32_t a20_mask, target_ulong start_line_addr)
+{
+ target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
+ uint64_t pde;
+ target_ulong line_addr, start_vaddr;
+ int i;
+
+ for (i = 0; i < 512; i++) {
+ pde_addr = (pde_start_addr + i * 8) & a20_mask;
+ pde = ldq_phys(pde_addr);
+ if (!(pde & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = start_line_addr | ((i & 0x1ff) << 21);
+ if (pde & PG_PSE_MASK) {
+ /* 2 MB page */
+ start_paddr = (pde & ~0x1fffff) & ~(0x1ULL << 63);
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+ start_vaddr = line_addr;
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 21);
+ continue;
+ }
+
+ pte_start_addr = (pde & ~0xfff) & a20_mask;
+ walk_pte(list, pte_start_addr, a20_mask, line_addr);
+ }
+}
+
+/* 32-bit Paging */
+static void walk_pde2(MemoryMappingList *list,
+ target_phys_addr_t pde_start_addr, int32_t a20_mask,
+ bool pse)
+{
+ target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
+ uint32_t pde;
+ target_ulong line_addr, start_vaddr;
+ int i;
+
+ for (i = 0; i < 1024; i++) {
+ pde_addr = (pde_start_addr + i * 4) & a20_mask;
+ pde = ldl_phys(pde_addr);
+ if (!(pde & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = (((unsigned int)i & 0x3ff) << 22);
+ if ((pde & PG_PSE_MASK) && pse) {
+ /* 4 MB page */
+ start_paddr = (pde & ~0x3fffff) | ((pde & 0x1fe000) << 19);
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+ start_vaddr = line_addr;
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 22);
+ continue;
+ }
+
+ pte_start_addr = (pde & ~0xfff) & a20_mask;
+ walk_pte2(list, pte_start_addr, a20_mask, line_addr);
+ }
+}
+
+/* PAE Paging */
+static void walk_pdpe2(MemoryMappingList *list,
+ target_phys_addr_t pdpe_start_addr, int32_t a20_mask)
+{
+ target_phys_addr_t pdpe_addr, pde_start_addr;
+ uint64_t pdpe;
+ target_ulong line_addr;
+ int i;
+
+ for (i = 0; i < 4; i++) {
+ pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask;
+ pdpe = ldq_phys(pdpe_addr);
+ if (!(pdpe & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = (((unsigned int)i & 0x3) << 30);
+ pde_start_addr = (pdpe & ~0xfff) & a20_mask;
+ walk_pde(list, pde_start_addr, a20_mask, line_addr);
+ }
+}
+
+#ifdef TARGET_X86_64
+/* IA-32e Paging */
+static void walk_pdpe(MemoryMappingList *list,
+ target_phys_addr_t pdpe_start_addr, int32_t a20_mask,
+ target_ulong start_line_addr)
+{
+ target_phys_addr_t pdpe_addr, pde_start_addr, start_paddr;
+ uint64_t pdpe;
+ target_ulong line_addr, start_vaddr;
+ int i;
+
+ for (i = 0; i < 512; i++) {
+ pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask;
+ pdpe = ldq_phys(pdpe_addr);
+ if (!(pdpe & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = start_line_addr | ((i & 0x1ffULL) << 30);
+ if (pdpe & PG_PSE_MASK) {
+ /* 1 GB page */
+ start_paddr = (pdpe & ~0x3fffffff) & ~(0x1ULL << 63);
+ if (is_io_addr(start_paddr)) {
+ /* I/O region */
+ continue;
+ }
+ start_vaddr = line_addr;
+ add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 30);
+ continue;
+ }
+
+ pde_start_addr = (pdpe & ~0xfff) & a20_mask;
+ walk_pde(list, pde_start_addr, a20_mask, line_addr);
+ }
+}
+
+/* IA-32e Paging */
+static void walk_pml4e(MemoryMappingList *list,
+ target_phys_addr_t pml4e_start_addr, int32_t a20_mask)
+{
+ target_phys_addr_t pml4e_addr, pdpe_start_addr;
+ uint64_t pml4e;
+ target_ulong line_addr;
+ int i;
+
+ for (i = 0; i < 512; i++) {
+ pml4e_addr = (pml4e_start_addr + i * 8) & a20_mask;
+ pml4e = ldq_phys(pml4e_addr);
+ if (!(pml4e & PG_PRESENT_MASK)) {
+ /* not present */
+ continue;
+ }
+
+ line_addr = ((i & 0x1ffULL) << 39) | (0xffffULL << 48);
+ pdpe_start_addr = (pml4e & ~0xfff) & a20_mask;
+ walk_pdpe(list, pdpe_start_addr, a20_mask, line_addr);
+ }
+}
+#endif
+
+void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
+{
+ if (env->cr[4] & CR4_PAE_MASK) {
+#ifdef TARGET_X86_64
+ if (env->hflags & HF_LMA_MASK) {
+ target_phys_addr_t pml4e_addr;
+
+ pml4e_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
+ walk_pml4e(list, pml4e_addr, env->a20_mask);
+ } else
+#endif
+ {
+ target_phys_addr_t pdpe_addr;
+
+ pdpe_addr = (env->cr[3] & ~0x1f) & env->a20_mask;
+ walk_pdpe2(list, pdpe_addr, env->a20_mask);
+ }
+ } else {
+ target_phys_addr_t pde_addr;
+ bool pse;
+
+ pde_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
+ pse = !!(env->cr[4] & CR4_PSE_MASK);
+ walk_pde2(list, pde_addr, env->a20_mask, pse);
+ }
+}
--
1.7.1
* [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (3 preceding siblings ...)
2012-02-09 3:21 ` [Qemu-devel] [RFC][PATCH 04/16 v6] target-i386: implement cpu_get_memory_mapping() Wen Congyang
@ 2012-02-09 3:22 ` Wen Congyang
2012-02-14 17:21 ` Jan Kiszka
2012-02-09 3:24 ` [Qemu-devel] [RFC][PATCH 06/16 v6] target-i386: Add API to write elf notes to core file Wen Congyang
` (11 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:22 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Add an API to get all virtual-to-physical address mappings.
If a physical address has no virtual address mapped to it, its virtual
address is recorded as 0.
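The hole-filling pass this patch adds can be sketched independently: given a RAM block and the already-collected mappings sorted by physical address, find the sub-ranges no mapping covers (these get virt_addr 0 in the real code). Seg and find_holes are illustrative names, not the patch's API.

```c
#include <stdint.h>

/* A physical sub-range: stand-in for one MemoryMapping's phys span. */
typedef struct {
    uint64_t start, len;
} Seg;

/* Walk the sorted mapped[] array and record the parts of the block
 * [offset, offset + length) that none of them cover. Returns the
 * number of holes written (at most max_holes). */
int find_holes(const Seg *mapped, int n, uint64_t offset, uint64_t length,
               Seg *holes, int max_holes)
{
    int count = 0;
    uint64_t end = offset + length;

    for (int i = 0; i < n && offset < end; i++) {
        uint64_t ms = mapped[i].start;
        uint64_t me = mapped[i].start + mapped[i].len;

        if (me <= offset) {
            continue; /* mapping entirely below the block */
        }
        if (ms > offset && count < max_holes) {
            /* gap between current position and this mapping */
            uint64_t hole_end = ms < end ? ms : end;
            holes[count++] = (Seg){ offset, hole_end - offset };
        }
        if (me >= end) {
            return count; /* block fully consumed */
        }
        offset = me > offset ? me : offset; /* skip past the mapping */
    }
    if (offset < end && count < max_holes) {
        holes[count++] = (Seg){ offset, end - offset }; /* trailing gap */
    }
    return count;
}
```

The patch performs the same scan in-place over the QTAILQ while iterating ram_list.blocks, creating a zero-virt mapping for each hole instead of collecting them.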
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
memory_mapping.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
memory_mapping.h | 1 +
2 files changed, 66 insertions(+), 0 deletions(-)
diff --git a/memory_mapping.c b/memory_mapping.c
index d83b7d7..fc0ddee 100644
--- a/memory_mapping.c
+++ b/memory_mapping.c
@@ -128,3 +128,68 @@ void free_memory_mapping_list(MemoryMappingList *list)
list->num = 0;
}
+
+void get_memory_mapping(MemoryMappingList *list)
+{
+ CPUState *env;
+ MemoryMapping *memory_mapping;
+ RAMBlock *block;
+ ram_addr_t offset, length;
+
+ last_mapping = NULL;
+
+ for (env = first_cpu; env != NULL; env = env->next_cpu) {
+ cpu_get_memory_mapping(list, env);
+ }
+
+ /* some memory may not be mapped; add it to the memory mapping list */
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ offset = block->offset;
+ length = block->length;
+
+ QTAILQ_FOREACH(memory_mapping, &list->head, next) {
+ if (memory_mapping->phys_addr >= (offset + length)) {
+ /*
+ * memory_mapping's list does not contain the region
+ * [offset, offset+length)
+ */
+ create_new_memory_mapping(list, offset, 0, length);
+ length = 0;
+ break;
+ }
+
+ if ((memory_mapping->phys_addr + memory_mapping->length) <=
+ offset) {
+ continue;
+ }
+
+ if (memory_mapping->phys_addr > offset) {
+ /*
+ * memory_mapping's list does not contain the region
+ * [offset, memory_mapping->phys_addr)
+ */
+ create_new_memory_mapping(list, offset, 0,
+ memory_mapping->phys_addr - offset);
+ }
+
+ if ((offset + length) <=
+ (memory_mapping->phys_addr + memory_mapping->length)) {
+ length = 0;
+ break;
+ }
+ length -= memory_mapping->phys_addr + memory_mapping->length -
+ offset;
+ offset = memory_mapping->phys_addr + memory_mapping->length;
+ }
+
+ if (length > 0) {
+ /*
+ * memory_mapping's list does not contain the region
+ * [offset, offset + length)
+ */
+ create_new_memory_mapping(list, offset, 0, length);
+ }
+ }
+
+ return;
+}
diff --git a/memory_mapping.h b/memory_mapping.h
index a4b1532..679f9ef 100644
--- a/memory_mapping.h
+++ b/memory_mapping.h
@@ -34,5 +34,6 @@ void add_to_memory_mapping(MemoryMappingList *list,
ram_addr_t length);
void free_memory_mapping_list(MemoryMappingList *list);
+void get_memory_mapping(MemoryMappingList *list);
#endif
--
1.7.1
* [Qemu-devel] [RFC][PATCH 06/16 v6] target-i386: Add API to write elf notes to core file
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (4 preceding siblings ...)
2012-02-09 3:22 ` [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping Wen Congyang
@ 2012-02-09 3:24 ` Wen Congyang
2012-02-14 17:31 ` Jan Kiszka
2012-02-09 3:24 ` [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping Wen Congyang
` (10 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:24 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
The core file contains the registers' values. These APIs write the registers
to the core file, and they will be called in a following patch.
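The note layout arithmetic used repeatedly below (header, then 4-byte-aligned name, then 4-byte-aligned descriptor) can be verified in isolation. NoteHdr is a local stand-in for Elf64_Nhdr/Elf32_Nhdr so the sketch needs nothing from <elf.h>; the sizes 336 and 144 are the prstatus_t sizes the patch hardcodes for x86_64 and x86.

```c
#include <stdint.h>

/* Local model of the ELF note header: both Elf32_Nhdr and
 * Elf64_Nhdr are three 32-bit words, so one struct covers both. */
typedef struct {
    uint32_t n_namesz, n_descsz, n_type;
} NoteHdr;

/* Round n up to the next multiple of 4, as the ELF note format
 * requires for both the name and the descriptor. */
static inline int align4(int n)
{
    return ((n + 3) / 4) * 4;
}

/* Total on-disk size of one note: padded header + padded name +
 * padded descriptor; mirrors the note_size expression in the patch. */
int elf_note_size(int name_size, int descsz)
{
    return align4((int)sizeof(NoteHdr)) + align4(name_size) + align4(descsz);
}
```

With name "CORE" (5 bytes including the NUL), this yields 12 + 8 + descsz, matching the buffer the patch allocates before writing the NT_PRSTATUS note.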
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
cpu-all.h | 6 +
target-i386/arch-dump.c | 243 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 249 insertions(+), 0 deletions(-)
diff --git a/cpu-all.h b/cpu-all.h
index 4cd7fbb..efb5ba3 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -526,8 +526,14 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
#if defined(TARGET_I386)
void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);
+int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
+ target_phys_addr_t *offset);
+int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
+ target_phys_addr_t *offset);
#else
#define cpu_get_memory_mapping(list, env)
+#define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
+#define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
#endif
#endif /* CPU_ALL_H */
diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
index 2e921c7..4c0ff77 100644
--- a/target-i386/arch-dump.c
+++ b/target-i386/arch-dump.c
@@ -11,8 +11,11 @@
*
*/
+#include <elf.h>
+
#include "cpu.h"
#include "cpu-all.h"
+#include "monitor.h"
/* PAE Paging or IA-32e Paging */
static void walk_pte(MemoryMappingList *list, target_phys_addr_t pte_start_addr,
@@ -252,3 +255,243 @@ void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
walk_pde2(list, pde_addr, env->a20_mask, pse);
}
}
+
+#ifdef TARGET_X86_64
+typedef struct {
+ target_ulong r15, r14, r13, r12, rbp, rbx, r11, r10;
+ target_ulong r9, r8, rax, rcx, rdx, rsi, rdi, orig_rax;
+ target_ulong rip, cs, eflags;
+ target_ulong rsp, ss;
+ target_ulong fs_base, gs_base;
+ target_ulong ds, es, fs, gs;
+} x86_64_user_regs_struct;
+
+static int x86_64_write_elf64_note(int fd, CPUState *env, int id,
+ target_phys_addr_t *offset)
+{
+ x86_64_user_regs_struct regs;
+ Elf64_Nhdr *note;
+ char *buf;
+ int descsz, note_size, name_size = 5;
+ const char *name = "CORE";
+ int ret;
+
+ regs.r15 = env->regs[15];
+ regs.r14 = env->regs[14];
+ regs.r13 = env->regs[13];
+ regs.r12 = env->regs[12];
+ regs.r11 = env->regs[11];
+ regs.r10 = env->regs[10];
+ regs.r9 = env->regs[9];
+ regs.r8 = env->regs[8];
+ regs.rbp = env->regs[R_EBP];
+ regs.rsp = env->regs[R_ESP];
+ regs.rdi = env->regs[R_EDI];
+ regs.rsi = env->regs[R_ESI];
+ regs.rdx = env->regs[R_EDX];
+ regs.rcx = env->regs[R_ECX];
+ regs.rbx = env->regs[R_EBX];
+ regs.rax = env->regs[R_EAX];
+ regs.rip = env->eip;
+ regs.eflags = env->eflags;
+
+ regs.orig_rax = 0; /* FIXME */
+ regs.cs = env->segs[R_CS].selector;
+ regs.ss = env->segs[R_SS].selector;
+ regs.fs_base = env->segs[R_FS].base;
+ regs.gs_base = env->segs[R_GS].base;
+ regs.ds = env->segs[R_DS].selector;
+ regs.es = env->segs[R_ES].selector;
+ regs.fs = env->segs[R_FS].selector;
+ regs.gs = env->segs[R_GS].selector;
+
+ descsz = 336; /* sizeof(prstatus_t) is 336 on x86_64 box */
+ note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
+ (descsz + 3) / 4) * 4;
+ note = g_malloc(note_size);
+
+ memset(note, 0, note_size);
+ note->n_namesz = cpu_to_le32(name_size);
+ note->n_descsz = cpu_to_le32(descsz);
+ note->n_type = cpu_to_le32(NT_PRSTATUS);
+ buf = (char *)note;
+ buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
+ memcpy(buf, name, name_size);
+ buf += ((name_size + 3) / 4) * 4;
+ memcpy(buf + 32, &id, 4); /* pr_pid */
+ buf += descsz - sizeof(x86_64_user_regs_struct)-sizeof(target_ulong);
+ memcpy(buf, &regs, sizeof(x86_64_user_regs_struct));
+
+ lseek(fd, *offset, SEEK_SET);
+ ret = write(fd, note, note_size);
+ g_free(note);
+ if (ret < 0) {
+ return -1;
+ }
+
+ *offset += note_size;
+
+ return 0;
+}
+#endif
+
+typedef struct {
+ uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
+ unsigned short ds, __ds, es, __es;
+ unsigned short fs, __fs, gs, __gs;
+ uint32_t orig_eax, eip;
+ unsigned short cs, __cs;
+ uint32_t eflags, esp;
+ unsigned short ss, __ss;
+} x86_user_regs_struct;
+
+static int x86_write_elf64_note(int fd, CPUState *env, int id,
+ target_phys_addr_t *offset)
+{
+ x86_user_regs_struct regs;
+ Elf64_Nhdr *note;
+ char *buf;
+ int descsz, note_size, name_size = 5;
+ const char *name = "CORE";
+ int ret;
+
+ regs.ebp = env->regs[R_EBP] & 0xffffffff;
+ regs.esp = env->regs[R_ESP] & 0xffffffff;
+ regs.edi = env->regs[R_EDI] & 0xffffffff;
+ regs.esi = env->regs[R_ESI] & 0xffffffff;
+ regs.edx = env->regs[R_EDX] & 0xffffffff;
+ regs.ecx = env->regs[R_ECX] & 0xffffffff;
+ regs.ebx = env->regs[R_EBX] & 0xffffffff;
+ regs.eax = env->regs[R_EAX] & 0xffffffff;
+ regs.eip = env->eip & 0xffffffff;
+ regs.eflags = env->eflags & 0xffffffff;
+
+ regs.cs = env->segs[R_CS].selector;
+ regs.__cs = 0;
+ regs.ss = env->segs[R_SS].selector;
+ regs.__ss = 0;
+ regs.ds = env->segs[R_DS].selector;
+ regs.__ds = 0;
+ regs.es = env->segs[R_ES].selector;
+ regs.__es = 0;
+ regs.fs = env->segs[R_FS].selector;
+ regs.__fs = 0;
+ regs.gs = env->segs[R_GS].selector;
+ regs.__gs = 0;
+
+ descsz = 144; /* sizeof(prstatus_t) is 144 on x86 box */
+ note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
+ (descsz + 3) / 4) * 4;
+ note = g_malloc(note_size);
+
+ memset(note, 0, note_size);
+ note->n_namesz = cpu_to_le32(name_size);
+ note->n_descsz = cpu_to_le32(descsz);
+ note->n_type = cpu_to_le32(NT_PRSTATUS);
+ buf = (char *)note;
+ buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
+ memcpy(buf, name, name_size);
+ buf += ((name_size + 3) / 4) * 4;
+ memcpy(buf + 24, &id, 4); /* pr_pid */
+ buf += descsz - sizeof(x86_user_regs_struct) - 4;
+ memcpy(buf, &regs, sizeof(x86_user_regs_struct));
+
+ lseek(fd, *offset, SEEK_SET);
+ ret = write(fd, note, note_size);
+ g_free(note);
+ if (ret < 0) {
+ return -1;
+ }
+
+ *offset += note_size;
+
+ return 0;
+}
+
+static int x86_write_elf32_note(int fd, CPUState *env, int id,
+ target_phys_addr_t *offset)
+{
+ x86_user_regs_struct regs;
+ Elf32_Nhdr *note;
+ char *buf;
+ int descsz, note_size, name_size = 5;
+ const char *name = "CORE";
+ int ret;
+
+ regs.ebp = env->regs[R_EBP] & 0xffffffff;
+ regs.esp = env->regs[R_ESP] & 0xffffffff;
+ regs.edi = env->regs[R_EDI] & 0xffffffff;
+ regs.esi = env->regs[R_ESI] & 0xffffffff;
+ regs.edx = env->regs[R_EDX] & 0xffffffff;
+ regs.ecx = env->regs[R_ECX] & 0xffffffff;
+ regs.ebx = env->regs[R_EBX] & 0xffffffff;
+ regs.eax = env->regs[R_EAX] & 0xffffffff;
+ regs.eip = env->eip & 0xffffffff;
+ regs.eflags = env->eflags & 0xffffffff;
+
+ regs.cs = env->segs[R_CS].selector;
+ regs.__cs = 0;
+ regs.ss = env->segs[R_SS].selector;
+ regs.__ss = 0;
+ regs.ds = env->segs[R_DS].selector;
+ regs.__ds = 0;
+ regs.es = env->segs[R_ES].selector;
+ regs.__es = 0;
+ regs.fs = env->segs[R_FS].selector;
+ regs.__fs = 0;
+ regs.gs = env->segs[R_GS].selector;
+ regs.__gs = 0;
+
+ descsz = 144; /* sizeof(prstatus_t) is 144 on x86 box */
+ note_size = ((sizeof(Elf32_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
+ (descsz + 3) / 4) * 4;
+ note = g_malloc(note_size);
+
+ memset(note, 0, note_size);
+ note->n_namesz = cpu_to_le32(name_size);
+ note->n_descsz = cpu_to_le32(descsz);
+ note->n_type = cpu_to_le32(NT_PRSTATUS);
+ buf = (char *)note;
+ buf += ((sizeof(Elf32_Nhdr) + 3) / 4) * 4;
+ memcpy(buf, name, name_size);
+ buf += ((name_size + 3) / 4) * 4;
+ memcpy(buf + 24, &id, 4); /* pr_pid */
+ buf += descsz - sizeof(x86_user_regs_struct) - 4;
+ memcpy(buf, &regs, sizeof(x86_user_regs_struct));
+
+ lseek(fd, *offset, SEEK_SET);
+ ret = write(fd, note, note_size);
+ g_free(note);
+ if (ret < 0) {
+ return -1;
+ }
+
+ *offset += note_size;
+
+ return 0;
+}
+
+int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
+ target_phys_addr_t *offset)
+{
+ int ret;
+#ifdef TARGET_X86_64
+ bool lma = !!(first_cpu->hflags & HF_LMA_MASK);
+
+ if (lma) {
+ ret = x86_64_write_elf64_note(fd, env, cpuid, offset);
+ } else {
+#endif
+ ret = x86_write_elf64_note(fd, env, cpuid, offset);
+#ifdef TARGET_X86_64
+ }
+#endif
+
+ return ret;
+}
+
+int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
+ target_phys_addr_t *offset)
+{
+ return x86_write_elf32_note(fd, env, cpuid, offset);
+}
--
1.7.1
* [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (5 preceding siblings ...)
2012-02-09 3:24 ` [Qemu-devel] [RFC][PATCH 06/16 v6] target-i386: Add API to write elf notes to core file Wen Congyang
@ 2012-02-09 3:24 ` Wen Congyang
2012-02-14 17:35 ` Jan Kiszka
2012-02-09 3:26 ` [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info Wen Congyang
` (9 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:24 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
The crash utility needs an extra memory mapping to determine phys_base.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
cpu-all.h | 2 ++
target-i386/arch-dump.c | 43 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 0 deletions(-)
diff --git a/cpu-all.h b/cpu-all.h
index efb5ba3..290c43a 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -530,10 +530,12 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
target_phys_addr_t *offset);
int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
target_phys_addr_t *offset);
+int cpu_add_extra_memory_mapping(MemoryMappingList *list);
#else
#define cpu_get_memory_mapping(list, env)
#define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
#define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
+#define cpu_add_extra_memory_mapping(list) ({ 0; })
#endif
#endif /* CPU_ALL_H */
diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
index 4c0ff77..d96f6ae 100644
--- a/target-i386/arch-dump.c
+++ b/target-i386/arch-dump.c
@@ -495,3 +495,46 @@ int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
{
return x86_write_elf32_note(fd, env, cpuid, offset);
}
+
+/* This function is copied from crash */
+static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
+{
+ int i;
+ target_ulong kernel_base = -1;
+ target_ulong last, mask;
+
+ for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
+ mask = ~((1LL << i) - 1);
+ *base_vaddr = env->idt.base & mask;
+ if (*base_vaddr == last) {
+ continue;
+ }
+
+ kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
+ last = *base_vaddr;
+ }
+
+ return kernel_base;
+}
+
+int cpu_add_extra_memory_mapping(MemoryMappingList *list)
+{
+#ifdef TARGET_X86_64
+ target_phys_addr_t kernel_base = -1;
+ target_ulong base_vaddr;
+ bool lma = !!(first_cpu->hflags & HF_LMA_MASK);
+
+ if (!lma) {
+ return 0;
+ }
+
+ kernel_base = get_phys_base_addr(first_cpu, &base_vaddr);
+ if (kernel_base == -1) {
+ return -1;
+ }
+
+ create_new_memory_mapping_head(list, kernel_base, base_vaddr,
+ TARGET_PAGE_SIZE);
+#endif
+ return 0;
+}
--
1.7.1
* [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (6 preceding siblings ...)
2012-02-09 3:24 ` [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping Wen Congyang
@ 2012-02-09 3:26 ` Wen Congyang
2012-02-14 17:39 ` Jan Kiszka
2012-02-15 9:12 ` Peter Maydell
2012-02-09 3:28 ` [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory Wen Congyang
` (8 subsequent siblings)
16 siblings, 2 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:26 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
The dump info contains the endianness, ELF class and architecture. The next
patch will use this information to create the vmcore.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
cpu-all.h | 3 +++
dump.h | 10 ++++++++++
target-i386/arch-dump.c | 34 ++++++++++++++++++++++++++++++++++
3 files changed, 47 insertions(+), 0 deletions(-)
create mode 100644 dump.h
diff --git a/cpu-all.h b/cpu-all.h
index 290c43a..268d1f6 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -23,6 +23,7 @@
#include "qemu-tls.h"
#include "cpu-common.h"
#include "memory_mapping.h"
+#include "dump.h"
/* some important defines:
*
@@ -531,11 +532,13 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
target_phys_addr_t *offset);
int cpu_add_extra_memory_mapping(MemoryMappingList *list);
+int cpu_get_dump_info(ArchDumpInfo *info);
#else
#define cpu_get_memory_mapping(list, env)
#define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
#define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
#define cpu_add_extra_memory_mapping(list) ({ 0; })
+#define cpu_get_dump_info(info) ({ -1; })
#endif
#endif /* CPU_ALL_H */
diff --git a/dump.h b/dump.h
new file mode 100644
index 0000000..a36468b
--- /dev/null
+++ b/dump.h
@@ -0,0 +1,10 @@
+#ifndef DUMP_H
+#define DUMP_H
+
+typedef struct ArchDumpInfo {
+ int d_machine; /* Architecture */
+ int d_endian; /* ELFDATA2LSB or ELFDATA2MSB */
+ int d_class; /* ELFCLASS32 or ELFCLASS64 */
+} ArchDumpInfo;
+
+#endif
diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
index d96f6ae..92a53bc 100644
--- a/target-i386/arch-dump.c
+++ b/target-i386/arch-dump.c
@@ -15,6 +15,7 @@
#include "cpu.h"
#include "cpu-all.h"
+#include "dump.h"
#include "monitor.h"
/* PAE Paging or IA-32e Paging */
@@ -538,3 +539,36 @@ int cpu_add_extra_memory_mapping(MemoryMappingList *list)
#endif
return 0;
}
+
+int cpu_get_dump_info(ArchDumpInfo *info)
+{
+ bool lma = false;
+ RAMBlock *block;
+
+#ifdef TARGET_X86_64
+ lma = !!(first_cpu->hflags & HF_LMA_MASK);
+#endif
+
+ if (lma) {
+ info->d_machine = EM_X86_64;
+ } else {
+ info->d_machine = EM_386;
+ }
+ info->d_endian = ELFDATA2LSB;
+
+ if (lma) {
+ info->d_class = ELFCLASS64;
+ } else {
+ info->d_class = ELFCLASS32;
+ }
+
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ if (!lma && (block->offset + block->length > UINT_MAX)) {
+ /* The memory size is greater than 4G, so the vmcore must be elf64 */
+ info->d_class = ELFCLASS64;
+ break;
+ }
+ }
+
+ return 0;
+}
--
1.7.1
* [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (7 preceding siblings ...)
2012-02-09 3:26 ` [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info Wen Congyang
@ 2012-02-09 3:28 ` Wen Congyang
2012-02-14 17:59 ` Jan Kiszka
2012-02-09 3:28 ` [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background Wen Congyang
` (7 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:28 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
Makefile.target | 8 +-
dump.c | 590 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
dump.h | 3 +
hmp-commands.hx | 16 ++
hmp.c | 9 +
hmp.h | 1 +
monitor.c | 3 +
qapi-schema.json | 13 ++
qmp-commands.hx | 26 +++
9 files changed, 665 insertions(+), 4 deletions(-)
create mode 100644 dump.c
diff --git a/Makefile.target b/Makefile.target
index d6e5684..f39ce2f 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -112,7 +112,7 @@ $(call set-vpath, $(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR
QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) -I$(SRC_PATH)/linux-user
obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
elfload.o linuxload.o uaccess.o gdbstub.o cpu-uname.o \
- user-exec.o $(oslib-obj-y)
+ user-exec.o $(oslib-obj-y) dump.o
obj-$(TARGET_HAS_BFLT) += flatload.o
@@ -150,7 +150,7 @@ LDFLAGS+=-Wl,-segaddr,__STD_PROG_ZONE,0x1000 -image_base 0x0e000000
LIBS+=-lmx
obj-y = main.o commpage.o machload.o mmap.o signal.o syscall.o thunk.o \
- gdbstub.o user-exec.o
+ gdbstub.o user-exec.o dump.o
obj-i386-y += ioport-user.o
@@ -172,7 +172,7 @@ $(call set-vpath, $(SRC_PATH)/bsd-user)
QEMU_CFLAGS+=-I$(SRC_PATH)/bsd-user -I$(SRC_PATH)/bsd-user/$(TARGET_ARCH)
obj-y = main.o bsdload.o elfload.o mmap.o signal.o strace.o syscall.o \
- gdbstub.o uaccess.o user-exec.o
+ gdbstub.o uaccess.o user-exec.o dump.o
obj-i386-y += ioport-user.o
@@ -188,7 +188,7 @@ endif #CONFIG_BSD_USER
# System emulator target
ifdef CONFIG_SOFTMMU
-obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
+obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o dump.o
# virtio has to be here due to weird dependency between PCI and virtio-net.
# need to fix this properly
obj-$(CONFIG_NO_PCI) += pci-stub.o
diff --git a/dump.c b/dump.c
new file mode 100644
index 0000000..a0e8b86
--- /dev/null
+++ b/dump.c
@@ -0,0 +1,590 @@
+/*
+ * QEMU dump
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ * Wen Congyang <wency@cn.fujitsu.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include <unistd.h>
+#include <elf.h>
+#include <sys/procfs.h>
+#include <glib.h>
+#include "cpu.h"
+#include "cpu-all.h"
+#include "targphys.h"
+#include "monitor.h"
+#include "kvm.h"
+#include "dump.h"
+#include "sysemu.h"
+#include "bswap.h"
+#include "memory_mapping.h"
+#include "error.h"
+#include "qmp-commands.h"
+
+#define CPU_CONVERT_TO_TARGET16(val) \
+({ \
+ uint16_t _val = (val); \
+ if (endian == ELFDATA2LSB) { \
+ _val = cpu_to_le16(_val); \
+ } else {\
+ _val = cpu_to_be16(_val); \
+ } \
+ _val; \
+})
+
+#define CPU_CONVERT_TO_TARGET32(val) \
+({ \
+ uint32_t _val = (val); \
+ if (endian == ELFDATA2LSB) { \
+ _val = cpu_to_le32(_val); \
+ } else {\
+ _val = cpu_to_be32(_val); \
+ } \
+ _val; \
+})
+
+#define CPU_CONVERT_TO_TARGET64(val) \
+({ \
+ uint64_t _val = (val); \
+ if (endian == ELFDATA2LSB) { \
+ _val = cpu_to_le64(_val); \
+ } else {\
+ _val = cpu_to_be64(_val); \
+ } \
+ _val; \
+})
+
+enum {
+ DUMP_STATE_ERROR,
+ DUMP_STATE_SETUP,
+ DUMP_STATE_CANCELLED,
+ DUMP_STATE_ACTIVE,
+ DUMP_STATE_COMPLETED,
+};
+
+typedef struct DumpState {
+ ArchDumpInfo dump_info;
+ MemoryMappingList list;
+ int phdr_num;
+ int state;
+ char *error;
+ int fd;
+ target_phys_addr_t memory_offset;
+} DumpState;
+
+static DumpState *dump_get_current(void)
+{
+ static DumpState current_dump = {
+ .state = DUMP_STATE_SETUP,
+ };
+
+ return &current_dump;
+}
+
+static int dump_cleanup(DumpState *s)
+{
+ int ret = 0;
+
+ free_memory_mapping_list(&s->list);
+ if (s->fd != -1) {
+ close(s->fd);
+ s->fd = -1;
+ }
+
+ return ret;
+}
+
+static void dump_error(DumpState *s, const char *reason)
+{
+ s->state = DUMP_STATE_ERROR;
+ s->error = g_strdup(reason);
+ dump_cleanup(s);
+}
+
+static inline int cpuid(CPUState *env)
+{
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USE_NPTL)
+ return env->host_tid;
+#else
+ return env->cpu_index + 1;
+#endif
+}
+
+static int write_elf64_header(DumpState *s)
+{
+ Elf64_Ehdr elf_header;
+ int ret;
+ int endian = s->dump_info.d_endian;
+
+ memset(&elf_header, 0, sizeof(Elf64_Ehdr));
+ memcpy(&elf_header, ELFMAG, 4);
+ elf_header.e_ident[EI_CLASS] = ELFCLASS64;
+ elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
+ elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+ elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
+ elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
+ elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
+ elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
+ elf_header.e_phoff = CPU_CONVERT_TO_TARGET64(sizeof(Elf64_Ehdr));
+ elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf64_Phdr));
+ elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
+
+ lseek(s->fd, 0, SEEK_SET);
+ ret = write(s->fd, &elf_header, sizeof(elf_header));
+ if (ret < 0) {
+ dump_error(s, "dump: failed to write elf header.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf32_header(DumpState *s)
+{
+ Elf32_Ehdr elf_header;
+ int ret;
+ int endian = s->dump_info.d_endian;
+
+ memset(&elf_header, 0, sizeof(Elf32_Ehdr));
+ memcpy(&elf_header, ELFMAG, 4);
+ elf_header.e_ident[EI_CLASS] = ELFCLASS32;
+ elf_header.e_ident[EI_DATA] = endian;
+ elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+ elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
+ elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
+ elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
+ elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
+ elf_header.e_phoff = CPU_CONVERT_TO_TARGET32(sizeof(Elf32_Ehdr));
+ elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf32_Phdr));
+ elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
+
+ lseek(s->fd, 0, SEEK_SET);
+ ret = write(s->fd, &elf_header, sizeof(elf_header));
+ if (ret < 0) {
+ dump_error(s, "dump: failed to write elf header.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
+ int phdr_index, target_phys_addr_t offset)
+{
+ Elf64_Phdr phdr;
+ off_t phdr_offset;
+ int ret;
+ int endian = s->dump_info.d_endian;
+
+ memset(&phdr, 0, sizeof(Elf64_Phdr));
+ phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
+ phdr.p_offset = CPU_CONVERT_TO_TARGET64(offset);
+ phdr.p_paddr = CPU_CONVERT_TO_TARGET64(memory_mapping->phys_addr);
+ if (offset == -1) {
+ phdr.p_filesz = 0;
+ } else {
+ phdr.p_filesz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
+ }
+ phdr.p_memsz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
+ phdr.p_vaddr = CPU_CONVERT_TO_TARGET64(memory_mapping->virt_addr);
+
+ phdr_offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*phdr_index;
+ lseek(s->fd, phdr_offset, SEEK_SET);
+ ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
+ if (ret < 0) {
+ dump_error(s, "dump: failed to write program header table.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
+ int phdr_index, target_phys_addr_t offset)
+{
+ Elf32_Phdr phdr;
+ off_t phdr_offset;
+ int ret;
+ int endian = s->dump_info.d_endian;
+
+ memset(&phdr, 0, sizeof(Elf32_Phdr));
+ phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
+ phdr.p_offset = CPU_CONVERT_TO_TARGET32(offset);
+ phdr.p_paddr = CPU_CONVERT_TO_TARGET32(memory_mapping->phys_addr);
+ if (offset == -1) {
+ phdr.p_filesz = 0;
+ } else {
+ phdr.p_filesz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
+ }
+ phdr.p_memsz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
+ phdr.p_vaddr = CPU_CONVERT_TO_TARGET32(memory_mapping->virt_addr);
+
+ phdr_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*phdr_index;
+ lseek(s->fd, phdr_offset, SEEK_SET);
+ ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
+ if (ret < 0) {
+ dump_error(s, "dump: failed to write program header table.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf64_notes(DumpState *s, int phdr_index,
+ target_phys_addr_t *offset)
+{
+ CPUState *env;
+ int ret;
+ target_phys_addr_t begin = *offset;
+ Elf64_Phdr phdr;
+ off_t phdr_offset;
+ int id;
+ int endian = s->dump_info.d_endian;
+
+ for (env = first_cpu; env != NULL; env = env->next_cpu) {
+ id = cpuid(env);
+ ret = cpu_write_elf64_note(s->fd, env, id, offset);
+ if (ret < 0) {
+ dump_error(s, "dump: failed to write elf notes.\n");
+ return -1;
+ }
+ }
+
+ memset(&phdr, 0, sizeof(Elf64_Phdr));
+ phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
+ phdr.p_offset = CPU_CONVERT_TO_TARGET64(begin);
+ phdr.p_paddr = 0;
+ phdr.p_filesz = CPU_CONVERT_TO_TARGET64(*offset - begin);
+ phdr.p_memsz = CPU_CONVERT_TO_TARGET64(*offset - begin);
+ phdr.p_vaddr = 0;
+
+ phdr_offset = sizeof(Elf64_Ehdr);
+ lseek(s->fd, phdr_offset, SEEK_SET);
+ ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
+ if (ret < 0) {
+ dump_error(s, "dump: failed to write program header table.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_elf32_notes(DumpState *s, int phdr_index,
+ target_phys_addr_t *offset)
+{
+ CPUState *env;
+ int ret;
+ target_phys_addr_t begin = *offset;
+ Elf32_Phdr phdr;
+ off_t phdr_offset;
+ int id;
+ int endian = s->dump_info.d_endian;
+
+ for (env = first_cpu; env != NULL; env = env->next_cpu) {
+ id = cpuid(env);
+ ret = cpu_write_elf32_note(s->fd, env, id, offset);
+ if (ret < 0) {
+ dump_error(s, "dump: failed to write elf notes.\n");
+ return -1;
+ }
+ }
+
+ memset(&phdr, 0, sizeof(Elf32_Phdr));
+ phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
+ phdr.p_offset = CPU_CONVERT_TO_TARGET32(begin);
+ phdr.p_paddr = 0;
+ phdr.p_filesz = CPU_CONVERT_TO_TARGET32(*offset - begin);
+ phdr.p_memsz = CPU_CONVERT_TO_TARGET32(*offset - begin);
+ phdr.p_vaddr = 0;
+
+ phdr_offset = sizeof(Elf32_Ehdr);
+ lseek(s->fd, phdr_offset, SEEK_SET);
+ ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
+ if (ret < 0) {
+ dump_error(s, "dump: failed to write program header table.\n");
+ return -1;
+ }
+
+ return 0;
+}
+
+static int write_data(DumpState *s, void *buf, int length,
+ target_phys_addr_t *offset)
+{
+ int ret;
+
+ lseek(s->fd, *offset, SEEK_SET);
+ ret = write(s->fd, buf, length);
+ if (ret < 0) {
+ dump_error(s, "dump: failed to save memory.\n");
+ return -1;
+ }
+
+ *offset += length;
+ return 0;
+}
+
+/* write the memory to the vmcore, one page per I/O */
+static int write_memory(DumpState *s, RAMBlock *block,
+ target_phys_addr_t *offset)
+{
+ int i, ret;
+
+ for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
+ ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
+ TARGET_PAGE_SIZE, offset);
+ if (ret < 0) {
+ return -1;
+ }
+ }
+
+ if ((block->length % TARGET_PAGE_SIZE) != 0) {
+ ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
+ block->length % TARGET_PAGE_SIZE, offset);
+ if (ret < 0) {
+ return -1;
+ }
+ }
+
+ return 0;
+}
+
+/* get the memory's offset in the vmcore */
+static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
+ target_phys_addr_t memory_offset)
+{
+ RAMBlock *block;
+ target_phys_addr_t offset = memory_offset;
+
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ if (phys_addr >= block->offset &&
+ phys_addr < block->offset + block->length) {
+ return phys_addr - block->offset + offset;
+ }
+ offset += block->length;
+ }
+
+ return -1;
+}
+
+static DumpState *dump_init(int fd, Error **errp)
+{
+ CPUState *env;
+ DumpState *s = dump_get_current();
+ int ret;
+
+ vm_stop(RUN_STATE_PAUSED);
+ s->state = DUMP_STATE_SETUP;
+ if (s->error) {
+ g_free(s->error);
+ s->error = NULL;
+ }
+ s->fd = fd;
+
+ /*
+ * get dump info: endian, class and architecture.
+ * If the target architecture is not supported, cpu_get_dump_info() will
+ * return -1.
+ *
+ * if we use kvm, we should synchronize the registers before we get the
+ * dump info.
+ */
+ for (env = first_cpu; env != NULL; env = env->next_cpu) {
+ cpu_synchronize_state(env);
+ }
+ ret = cpu_get_dump_info(&s->dump_info);
+ if (ret < 0) {
+ error_set(errp, QERR_UNSUPPORTED);
+ return NULL;
+ }
+
+ /* get memory mapping */
+ s->list.num = 0;
+ QTAILQ_INIT(&s->list.head);
+ get_memory_mapping(&s->list);
+
+ /* crash needs extra memory mapping to determine phys_base. */
+ ret = cpu_add_extra_memory_mapping(&s->list);
+ if (ret < 0) {
+ error_set(errp, QERR_UNDEFINED_ERROR);
+ return NULL;
+ }
+
+ /*
+ * calculate phdr_num
+ *
+ * the type of e_phnum is uint16_t, so we should avoid overflow
+ */
+ s->phdr_num = 1; /* PT_NOTE */
+ if (s->list.num > (1 << 16) - 2) {
+ s->phdr_num = (1 << 16) - 1;
+ } else {
+ s->phdr_num += s->list.num;
+ }
+
+ return s;
+}
+
+/* write elf header, PT_NOTE and elf note to vmcore. */
+static int dump_begin(DumpState *s)
+{
+ target_phys_addr_t offset;
+ int ret;
+
+ s->state = DUMP_STATE_ACTIVE;
+
+ /*
+ * the vmcore's format is:
+ * --------------
+ * | elf header |
+ * --------------
+ * | PT_NOTE |
+ * --------------
+ * | PT_LOAD |
+ * --------------
+ * | ...... |
+ * --------------
+ * | PT_LOAD |
+ * --------------
+ * | elf note |
+ * --------------
+ * | memory |
+ * --------------
+ *
+ * we only know where the memory is saved after we write the elf notes
+ * into the vmcore.
+ */
+
+ /* write elf header to vmcore */
+ if (s->dump_info.d_class == ELFCLASS64) {
+ ret = write_elf64_header(s);
+ } else {
+ ret = write_elf32_header(s);
+ }
+ if (ret < 0) {
+ return -1;
+ }
+
+ /* write elf notes to vmcore */
+ if (s->dump_info.d_class == ELFCLASS64) {
+ offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*s->phdr_num;
+ ret = write_elf64_notes(s, 0, &offset);
+ } else {
+ offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*s->phdr_num;
+ ret = write_elf32_notes(s, 0, &offset);
+ }
+
+ if (ret < 0) {
+ return -1;
+ }
+
+ s->memory_offset = offset;
+ return 0;
+}
+
+/* write PT_LOAD to vmcore */
+static int dump_completed(DumpState *s)
+{
+ target_phys_addr_t offset;
+ MemoryMapping *memory_mapping;
+ int phdr_index = 1, ret;
+
+ QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
+ offset = get_offset(memory_mapping->phys_addr, s->memory_offset);
+ if (s->dump_info.d_class == ELFCLASS64) {
+ ret = write_elf64_load(s, memory_mapping, phdr_index++, offset);
+ } else {
+ ret = write_elf32_load(s, memory_mapping, phdr_index++, offset);
+ }
+ if (ret < 0) {
+ return -1;
+ }
+ }
+
+ s->state = DUMP_STATE_COMPLETED;
+ dump_cleanup(s);
+ return 0;
+}
+
+/* write all memory to vmcore */
+static int dump_iterate(DumpState *s)
+{
+ RAMBlock *block;
+ target_phys_addr_t offset = s->memory_offset;
+ int ret;
+
+ /* write all memory to vmcore */
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ ret = write_memory(s, block, &offset);
+ if (ret < 0) {
+ return -1;
+ }
+ }
+
+ return dump_completed(s);
+}
+
+static int create_vmcore(DumpState *s)
+{
+ int ret;
+
+ ret = dump_begin(s);
+ if (ret < 0) {
+ return -1;
+ }
+
+ ret = dump_iterate(s);
+ if (ret < 0) {
+ return -1;
+ }
+
+ return 0;
+}
+
+void qmp_dump(const char *file, Error **errp)
+{
+ const char *p;
+ int fd = -1;
+ DumpState *s;
+
+#if !defined(WIN32)
+ if (strstart(file, "fd:", &p)) {
+ fd = qemu_get_fd(p);
+ if (fd == -1) {
+ error_set(errp, QERR_FD_NOT_FOUND, p);
+ return;
+ }
+ }
+#endif
+
+ if (strstart(file, "file:", &p)) {
+ fd = open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR);
+ if (fd < 0) {
+ error_set(errp, QERR_OPEN_FILE_FAILED, p);
+ return;
+ }
+ }
+
+ if (fd == -1) {
+ error_set(errp, QERR_INVALID_PARAMETER, "file");
+ return;
+ }
+
+ s = dump_init(fd, errp);
+ if (!s) {
+ return;
+ }
+
+ if (create_vmcore(s) < 0) {
+ error_set(errp, QERR_IO_ERROR);
+ }
+
+ return;
+}
diff --git a/dump.h b/dump.h
index a36468b..b413d18 100644
--- a/dump.h
+++ b/dump.h
@@ -1,6 +1,9 @@
#ifndef DUMP_H
#define DUMP_H
+#include "qdict.h"
+#include "error.h"
+
typedef struct ArchDumpInfo {
int d_machine; /* Architecture */
int d_endian; /* ELFDATA2LSB or ELFDATA2MSB */
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 573b823..6cfb678 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -867,6 +867,22 @@ new parameters (if specified) once the vm migration finished successfully.
ETEXI
{
+ .name = "dump",
+ .args_type = "file:s",
+ .params = "file",
+ .help = "dump to file",
+ .user_print = monitor_user_noop,
+ .mhandler.cmd = hmp_dump,
+ },
+
+STEXI
+@item dump @var{file}
+@findex dump
+Dump to @var{file}.
+ETEXI
+
+ {
.name = "snapshot_blkdev",
.args_type = "device:B,snapshot-file:s?,format:s?",
.params = "device [new-image-file] [format]",
diff --git a/hmp.c b/hmp.c
index 8ff8c94..1a69857 100644
--- a/hmp.c
+++ b/hmp.c
@@ -851,3 +851,12 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, &error);
}
+
+void hmp_dump(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *file = qdict_get_str(qdict, "file");
+
+ qmp_dump(file, &errp);
+ hmp_handle_error(mon, &errp);
+}
diff --git a/hmp.h b/hmp.h
index 18eecbd..66984c5 100644
--- a/hmp.h
+++ b/hmp.h
@@ -58,5 +58,6 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
void hmp_block_stream(Monitor *mon, const QDict *qdict);
void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
+void hmp_dump(Monitor *mon, const QDict *qdict);
#endif
diff --git a/monitor.c b/monitor.c
index 7e72739..18e1ac7 100644
--- a/monitor.c
+++ b/monitor.c
@@ -73,6 +73,9 @@
#endif
#include "hw/lm32_pic.h"
+/* for dump */
+#include "dump.h"
+
//#define DEBUG
//#define DEBUG_COMPLETION
diff --git a/qapi-schema.json b/qapi-schema.json
index d02ee86..1013ae6 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1582,3 +1582,16 @@
{ 'command': 'qom-list-types',
'data': { '*implements': 'str', '*abstract': 'bool' },
'returns': [ 'ObjectTypeInfo' ] }
+
+##
+# @dump
+#
+# Dump guest's memory to vmcore.
+#
+# @file: the filename or file descriptor of the vmcore.
+#
+# Returns: nothing on success
+#
+# Since: 1.1
+##
+{ 'command': 'dump', 'data': { 'file': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index b5e2ab8..52d3d3b 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -566,6 +566,32 @@ Example:
EQMP
{
+ .name = "dump",
+ .args_type = "file:s",
+ .params = "file",
+ .help = "dump to file",
+ .user_print = monitor_user_noop,
+ .mhandler.cmd_new = qmp_marshal_input_dump,
+ },
+
+SQMP
+dump
+----
+
+Dump to file.
+
+Arguments:
+
+- "file": Destination file (json-string)
+
+Example:
+
+-> { "execute": "dump", "arguments": { "file": "fd:dump" } }
+<- { "return": {} }
+
+EQMP
+
+ {
.name = "netdev_add",
.args_type = "netdev:O",
.params = "[user|tap|socket],id=str[,prop=value][,...]",
--
1.7.1
* [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (8 preceding siblings ...)
2012-02-09 3:28 ` [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory Wen Congyang
@ 2012-02-09 3:28 ` Wen Congyang
2012-02-14 18:05 ` Jan Kiszka
2012-02-09 3:29 ` [Qemu-devel] [RFC][PATCH 11/16 v6] support detached dump Wen Congyang
` (6 subsequent siblings)
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:28 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
The new monitor command dump may take a long time to finish, so we need to
run it in the background.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
dump.c | 155 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
1 files changed, 136 insertions(+), 19 deletions(-)
diff --git a/dump.c b/dump.c
index a0e8b86..cb33495 100644
--- a/dump.c
+++ b/dump.c
@@ -77,12 +77,20 @@ typedef struct DumpState {
char *error;
int fd;
target_phys_addr_t memory_offset;
+ int64_t bandwidth;
+ RAMBlock *block;
+ ram_addr_t start;
+ target_phys_addr_t offset;
+ QEMUTimer *timer;
} DumpState;
+#define DEFAULT_THROTTLE (32 << 20) /* Default dump speed throttling */
+
static DumpState *dump_get_current(void)
{
static DumpState current_dump = {
.state = DUMP_STATE_SETUP,
+ .bandwidth = DEFAULT_THROTTLE,
};
return &current_dump;
@@ -93,11 +101,19 @@ static int dump_cleanup(DumpState *s)
int ret = 0;
free_memory_mapping_list(&s->list);
+
if (s->fd != -1) {
close(s->fd);
s->fd = -1;
}
+ if (s->timer) {
+ qemu_del_timer(s->timer);
+ qemu_free_timer(s->timer);
+ }
+
+ qemu_resume_monitor();
+
return ret;
}
@@ -332,25 +348,40 @@ static int write_data(DumpState *s, void *buf, int length,
}
/* write the memory to the vmcore, one page per I/O */
-static int write_memory(DumpState *s, RAMBlock *block,
- target_phys_addr_t *offset)
+static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start,
+ target_phys_addr_t *offset, int64_t *size,
+ int64_t deadline)
{
int i, ret;
+ int64_t writen_size = 0;
+ int64_t time;
- for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
- ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
+ for (i = 0; i < *size / TARGET_PAGE_SIZE; i++) {
+ ret = write_data(s, block->host + start + i * TARGET_PAGE_SIZE,
TARGET_PAGE_SIZE, offset);
if (ret < 0) {
return -1;
}
+ writen_size += TARGET_PAGE_SIZE;
+ time = qemu_get_clock_ms(rt_clock);
+ if (time >= deadline) {
+ /* time out */
+ *size = writen_size;
+ return 1;
+ }
}
- if ((block->length % TARGET_PAGE_SIZE) != 0) {
- ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
- block->length % TARGET_PAGE_SIZE, offset);
+ if ((*size % TARGET_PAGE_SIZE) != 0) {
+ ret = write_data(s, block->host + start + i * TARGET_PAGE_SIZE,
+ *size % TARGET_PAGE_SIZE, offset);
if (ret < 0) {
return -1;
}
+ time = qemu_get_clock_ms(rt_clock);
+ if (time >= deadline) {
+ /* time out */
+ return 1;
+ }
}
return 0;
@@ -379,6 +410,7 @@ static DumpState *dump_init(int fd, Error **errp)
CPUState *env;
DumpState *s = dump_get_current();
int ret;
+ const char *msg = NULL;
vm_stop(RUN_STATE_PAUSED);
s->state = DUMP_STATE_SETUP;
@@ -387,6 +419,9 @@ static DumpState *dump_init(int fd, Error **errp)
s->error = NULL;
}
s->fd = fd;
+ s->block = QLIST_FIRST(&ram_list.blocks);
+ s->start = 0;
+ s->timer = NULL;
/*
* get dump info: endian, class and architecture.
@@ -429,6 +464,9 @@ static DumpState *dump_init(int fd, Error **errp)
s->phdr_num += s->list.num;
}
+ msg = "terminal does not allow synchronous dumping, continuing detached\n";
+ qemu_suspend_monitor("%s", msg);
+
return s;
}
@@ -486,6 +524,7 @@ static int dump_begin(DumpState *s)
}
s->memory_offset = offset;
+ s->offset = offset;
return 0;
}
@@ -513,38 +552,116 @@ static int dump_completed(DumpState *s)
return 0;
}
-/* write all memory to vmcore */
-static int dump_iterate(DumpState *s)
+/*
+ * write memory to vmcore.
+ *
+ * this function has three return values:
+ * -1 : an error occurred
+ * 0 : not finished yet, the caller has to call it again
+ * 1 : finished, the caller can move to the completion phase
+ */
+static int dump_iterate(DumpState *s, int64_t deadline)
{
- RAMBlock *block;
- target_phys_addr_t offset = s->memory_offset;
+ RAMBlock *block = s->block;
+ target_phys_addr_t offset = s->offset;
+ int64_t size, remain, writen_size;
+ int64_t total = s->bandwidth / 10;
int ret;
- /* write all memory to vmcore */
- QLIST_FOREACH(block, &ram_list.blocks, next) {
- ret = write_memory(s, block, &offset);
+ if ((block->length - s->start) >= total) {
+ size = total;
+ } else {
+ size = block->length - s->start;
+ }
+
+ ret = write_memory(s, block, s->start, &offset, &size, deadline);
+ if (ret < 0) {
+ return -1;
+ }
+
+ if (size == total || ret == 1) {
+ if ((size + s->start) == block->length) {
+ s->block = QLIST_NEXT(block, next);
+ s->start = 0;
+ } else {
+ s->start += size;
+ }
+ goto end;
+ }
+
+ while (size < total) {
+ block = QLIST_NEXT(block, next);
+ if (!block) {
+ /* we have finished */
+ return 1;
+ }
+
+ remain = total - size;
+ if (remain >= block->length) {
+ writen_size = block->length;
+ } else {
+ writen_size = remain;
+ }
+ ret = write_memory(s, block, 0, &offset, &writen_size, deadline);
if (ret < 0) {
return -1;
+ } else if (ret == 1) {
+ break;
}
+ size += writen_size;
+ }
+ if (writen_size == block->length) {
+ s->block = QLIST_NEXT(block, next);
+ s->start = 0;
+ } else {
+ s->block = block;
+ s->start = writen_size;
+ }
+
+end:
+ s->offset = offset;
+ if (!s->block) {
+ /* we have finished */
+ return 1;
}
- return dump_completed(s);
+ return 0;
}
-static int create_vmcore(DumpState *s)
+static void dump_rate_tick(void *opaque)
{
+ DumpState *s = opaque;
+ int64_t begin, end;
int ret;
- ret = dump_begin(s);
+ begin = qemu_get_clock_ms(rt_clock);
+ ret = dump_iterate(s, begin + 100);
if (ret < 0) {
- return -1;
+ return;
+ } else if (ret == 1) {
+ dump_completed(s);
+ return;
}
+ end = qemu_get_clock_ms(rt_clock);
+ if (end - begin >= 100) {
+ qemu_mod_timer(s->timer, end + 10);
+ } else {
+ qemu_mod_timer(s->timer, begin + 100);
+ }
+}
- ret = dump_iterate(s);
+static int create_vmcore(DumpState *s)
+{
+ int ret;
+
+ ret = dump_begin(s);
if (ret < 0) {
return -1;
}
+ s->timer = qemu_new_timer_ms(rt_clock, dump_rate_tick, s);
+ qemu_mod_timer(s->timer, qemu_get_clock_ms(rt_clock) + 100);
+
return 0;
}
--
1.7.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
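The dump_iterate()/dump_rate_tick() pair above persists the dump position (s->block, s->start) between timer ticks, so each tick writes at most bandwidth/10 bytes and then yields. A stand-alone sketch of that resumable-iteration pattern, using hypothetical names and no QEMU dependencies:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for the position kept in DumpState. */
typedef struct {
    size_t cur;      /* index of the block being written */
    int64_t start;   /* bytes of that block already written */
} IterState;

/*
 * Write up to 'budget' bytes, resuming from 'st'.  Mirrors the return
 * convention above: 1 when everything has been written, 0 when the
 * caller must come back on the next timer tick.
 */
static int iterate(IterState *st, const int64_t *block_len, size_t nblocks,
                   int64_t budget, int64_t *written)
{
    *written = 0;
    while (st->cur < nblocks && *written < budget) {
        int64_t remain = block_len[st->cur] - st->start;
        int64_t chunk = budget - *written;
        if (chunk > remain) {
            chunk = remain;          /* finish this block first */
        }
        /* a real implementation would call write_data() here */
        *written += chunk;
        st->start += chunk;
        if (st->start == block_len[st->cur]) {
            st->cur++;               /* move on to the next block */
            st->start = 0;
        }
    }
    return st->cur == nblocks;
}
```

With blocks of 5 and 3 bytes and a 4-byte per-tick budget, the first call stops at (block 0, offset 4) and returns 0; the second call drains the rest and returns 1.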
* [Qemu-devel] [RFC][PATCH 11/16 v6] support detached dump
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (9 preceding siblings ...)
2012-02-09 3:28 ` [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background Wen Congyang
@ 2012-02-09 3:29 ` Wen Congyang
2012-02-09 3:30 ` [Qemu-devel] [RFC][PATCH 12/16 v6] support to cancel the current dumping Wen Congyang
` (5 subsequent siblings)
16 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:29 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Let the user choose whether to block other monitor commands while dumping.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
dump.c | 12 ++++++++----
hmp-commands.hx | 8 ++++----
hmp.c | 3 ++-
qapi-schema.json | 3 ++-
qmp-commands.hx | 7 ++++---
5 files changed, 20 insertions(+), 13 deletions(-)
diff --git a/dump.c b/dump.c
index cb33495..0f5fcb6 100644
--- a/dump.c
+++ b/dump.c
@@ -76,6 +76,7 @@ typedef struct DumpState {
int state;
char *error;
int fd;
+ bool detach;
target_phys_addr_t memory_offset;
int64_t bandwidth;
RAMBlock *block;
@@ -405,7 +406,7 @@ static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
return -1;
}
-static DumpState *dump_init(int fd, Error **errp)
+static DumpState *dump_init(int fd, bool detach, Error **errp)
{
CPUState *env;
DumpState *s = dump_get_current();
@@ -422,6 +423,7 @@ static DumpState *dump_init(int fd, Error **errp)
s->block = QLIST_FIRST(&ram_list.blocks);
s->start = 0;
s->timer = NULL;
+ s->detach = detach;
/*
* get dump info: endian, class and architecture.
@@ -465,7 +467,9 @@ static DumpState *dump_init(int fd, Error **errp)
}
msg = "terminal does not allow synchronous dumping, continuing detached\n";
- qemu_suspend_monitor("%s", msg);
+ if (!detach && qemu_suspend_monitor("%s", msg) != 0) {
+ s->detach = true;
+ }
return s;
}
@@ -665,7 +669,7 @@ static int create_vmcore(DumpState *s)
return 0;
}
-void qmp_dump(const char *file, Error **errp)
+void qmp_dump(bool detach, const char *file, Error **errp)
{
const char *p;
int fd = -1;
@@ -694,7 +698,7 @@ void qmp_dump(const char *file, Error **errp)
return;
}
- s = dump_init(fd, errp);
+ s = dump_init(fd, detach, errp);
if (!s) {
return;
}
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 6cfb678..ed3544c 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -868,18 +868,18 @@ ETEXI
{
.name = "dump",
- .args_type = "file:s",
+ .args_type = "detach:-d,file:s",
.params = "file",
- .help = "dump to file",
+ .help = "dump to file (using -d to not wait for completion)",
.user_print = monitor_user_noop,
.mhandler.cmd = hmp_dump,
},
STEXI
-@item dump @var{file}
+@item dump [-d] @var{file}
@findex dump
-Dump to @var{file}.
+Dump to @var{file} (using -d to not wait for completion).
ETEXI
{
diff --git a/hmp.c b/hmp.c
index 1a69857..7e08332 100644
--- a/hmp.c
+++ b/hmp.c
@@ -855,8 +855,9 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
void hmp_dump(Monitor *mon, const QDict *qdict)
{
Error *errp = NULL;
+ bool detach = qdict_get_try_bool(qdict, "detach", 0);
const char *file = qdict_get_str(qdict, "file");
- qmp_dump(file, &errp);
+ qmp_dump(detach, file, &errp);
hmp_handle_error(mon, &errp);
}
diff --git a/qapi-schema.json b/qapi-schema.json
index 1013ae6..d39cb41 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1588,10 +1588,11 @@
#
# Dump guest's memory to vmcore.
#
+# @detach: whether to do the dump in the background (do not block the monitor).
# @file: the filename or file descriptor of the vmcore.
#
# Returns: nothing on success
#
# Since: 1.1
##
-{ 'command': 'dump', 'data': { 'file': 'str' } }
+{ 'command': 'dump', 'data': { 'detach': 'bool', 'file': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 52d3d3b..b0aa22e 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -567,9 +567,9 @@ EQMP
{
.name = "dump",
- .args_type = "file:s",
+ .args_type = "detach:-d,file:s",
.params = "file",
- .help = "dump to file",
+ .help = "dump to file (using -d to not wait for completion)",
.user_print = monitor_user_noop,
.mhandler.cmd_new = qmp_marshal_input_dump,
},
@@ -582,7 +582,8 @@ Dump to file.
Arguments:
-- "file": Destination file (json-string)
+- "detach": do the dump in the background (json-bool, optional)
+- "file": Destination file (json-string)
Example:
--
1.7.1
* [Qemu-devel] [RFC][PATCH 12/16 v6] support to cancel the current dumping
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (10 preceding siblings ...)
2012-02-09 3:29 ` [Qemu-devel] [RFC][PATCH 11/16 v6] support detached dump Wen Congyang
@ 2012-02-09 3:30 ` Wen Congyang
2012-02-09 3:32 ` [Qemu-devel] [RFC][PATCH 13/16 v6] support to set dumping speed Wen Congyang
` (4 subsequent siblings)
16 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:30 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Add API to allow the user to cancel the current dumping.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
dump.c | 13 +++++++++++++
hmp-commands.hx | 14 ++++++++++++++
hmp.c | 5 +++++
hmp.h | 1 +
qapi-schema.json | 13 +++++++++++++
qmp-commands.hx | 21 +++++++++++++++++++++
6 files changed, 67 insertions(+), 0 deletions(-)
diff --git a/dump.c b/dump.c
index 0f5fcb6..cea4c8c 100644
--- a/dump.c
+++ b/dump.c
@@ -709,3 +709,16 @@ void qmp_dump(bool detach, const char *file, Error **errp)
return;
}
+
+void qmp_dump_cancel(Error **errp)
+{
+ DumpState *s = dump_get_current();
+
+ if (s->state != DUMP_STATE_ACTIVE) {
+ return;
+ }
+
+ s->state = DUMP_STATE_CANCELLED;
+ dump_cleanup(s);
+ return;
+}
diff --git a/hmp-commands.hx b/hmp-commands.hx
index ed3544c..d0f3485 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -883,6 +883,20 @@ Dump to @var{file} (using -d to not wait for completion).
ETEXI
{
+ .name = "dump_cancel",
+ .args_type = "",
+ .params = "",
+ .help = "cancel the current VM dumping",
+ .mhandler.cmd = hmp_dump_cancel,
+ },
+
+STEXI
+@item dump_cancel
+@findex dump_cancel
+Cancel the current VM dumping.
+ETEXI
+
+ {
.name = "snapshot_blkdev",
.args_type = "device:B,snapshot-file:s?,format:s?",
.params = "device [new-image-file] [format]",
diff --git a/hmp.c b/hmp.c
index 7e08332..3865032 100644
--- a/hmp.c
+++ b/hmp.c
@@ -861,3 +861,8 @@ void hmp_dump(Monitor *mon, const QDict *qdict)
qmp_dump(detach, file, &errp);
hmp_handle_error(mon, &errp);
}
+
+void hmp_dump_cancel(Monitor *mon, const QDict *qdict)
+{
+ qmp_dump_cancel(NULL);
+}
diff --git a/hmp.h b/hmp.h
index 66984c5..c712f63 100644
--- a/hmp.h
+++ b/hmp.h
@@ -59,5 +59,6 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict);
void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
void hmp_dump(Monitor *mon, const QDict *qdict);
+void hmp_dump_cancel(Monitor *mon, const QDict *qdict);
#endif
diff --git a/qapi-schema.json b/qapi-schema.json
index d39cb41..e5fd056 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1596,3 +1596,16 @@
# Since: 1.1
##
{ 'command': 'dump', 'data': { 'detach': 'bool', 'file': 'str' } }
+
+##
+# @dump_cancel
+#
+# Cancel the currently executing dump process.
+#
+# Returns: nothing on success
+#
+# Notes: This command succeeds even if there is no dumping process running.
+#
+# Since: 1.1
+##
+{ 'command': 'dump_cancel' }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index b0aa22e..c09ca86 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -593,6 +593,27 @@ Example:
EQMP
{
+ .name = "dump_cancel",
+ .args_type = "",
+ .mhandler.cmd_new = qmp_marshal_input_dump_cancel,
+ },
+
+SQMP
+dump_cancel
+-----------
+
+Cancel the current dumping.
+
+Arguments: None.
+
+Example:
+
+-> { "execute": "dump_cancel" }
+<- { "return": {} }
+
+EQMP
+
+ {
.name = "netdev_add",
.args_type = "netdev:O",
.params = "[user|tap|socket],id=str[,prop=value][,...]",
--
1.7.1
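As the schema note says, dump_cancel succeeds even when nothing is running: only an active dump transitions to 'cancelled'; every other state is left untouched, so the command is idempotent. A minimal sketch of that transition, with hypothetical names rather than the actual QEMU types:

```c
/* Hypothetical mirror of the dump states used in the patch. */
enum DumpRunState {
    DUMP_SETUP, DUMP_ACTIVE, DUMP_COMPLETED, DUMP_ERROR, DUMP_CANCELLED
};

/* Cancel is a no-op unless a dump is actually in flight. */
static void dump_cancel(enum DumpRunState *state)
{
    if (*state != DUMP_ACTIVE) {
        return;                   /* already finished, failed, or cancelled */
    }
    *state = DUMP_CANCELLED;
    /* a real implementation would run dump_cleanup() here */
}
```

Calling it twice, or on a completed dump, changes nothing; that is why the QMP command can always return success.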
* [Qemu-devel] [RFC][PATCH 13/16 v6] support to set dumping speed
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (11 preceding siblings ...)
2012-02-09 3:30 ` [Qemu-devel] [RFC][PATCH 12/16 v6] support to cancel the current dumping Wen Congyang
@ 2012-02-09 3:32 ` Wen Congyang
2012-02-09 3:32 ` [Qemu-devel] [RFC][PATCH 14/16 v6] support to query dumping status Wen Congyang
` (3 subsequent siblings)
16 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:32 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Add API to allow the user to control the dumping speed.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
dump.c | 12 ++++++++++++
hmp-commands.hx | 15 +++++++++++++++
hmp.c | 6 ++++++
hmp.h | 1 +
qapi-schema.json | 15 +++++++++++++++
qmp-commands.hx | 22 ++++++++++++++++++++++
6 files changed, 71 insertions(+), 0 deletions(-)
diff --git a/dump.c b/dump.c
index cea4c8c..9f0aceb 100644
--- a/dump.c
+++ b/dump.c
@@ -86,6 +86,7 @@ typedef struct DumpState {
} DumpState;
#define DEFAULT_THROTTLE (32 << 20) /* Default dump speed throttling */
+#define MIN_THROTTLE (1 << 10) /* Minimum dump speed */
static DumpState *dump_get_current(void)
{
@@ -722,3 +723,14 @@ void qmp_dump_cancel(Error **errp)
dump_cleanup(s);
return;
}
+
+void qmp_dump_set_speed(int64_t value, Error **errp)
+{
+ DumpState *s = dump_get_current();
+
+ if (value < MIN_THROTTLE) {
+ value = MIN_THROTTLE;
+ }
+ s->bandwidth = value;
+ return;
+}
diff --git a/hmp-commands.hx b/hmp-commands.hx
index d0f3485..407bd06 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -897,6 +897,21 @@ Cancel the current VM dumping.
ETEXI
{
+ .name = "dump_set_speed",
+ .args_type = "value:o",
+ .params = "value",
+ .help = "set maximum speed (in bytes per second) for dumping. "
+ "Defaults to MB if no size suffix is specified, i.e. B/K/M/G/T",
+ .mhandler.cmd = hmp_dump_set_speed,
+ },
+
+STEXI
+@item dump_set_speed @var{value}
+@findex dump_set_speed
+Set maximum speed to @var{value} (in bytes per second) for dumping.
+ETEXI
+
+ {
.name = "snapshot_blkdev",
.args_type = "device:B,snapshot-file:s?,format:s?",
.params = "device [new-image-file] [format]",
diff --git a/hmp.c b/hmp.c
index 3865032..de70690 100644
--- a/hmp.c
+++ b/hmp.c
@@ -866,3 +866,9 @@ void hmp_dump_cancel(Monitor *mon, const QDict *qdict)
{
qmp_dump_cancel(NULL);
}
+
+void hmp_dump_set_speed(Monitor *mon, const QDict *qdict)
+{
+ int64_t value = qdict_get_int(qdict, "value");
+ qmp_dump_set_speed(value, NULL);
+}
diff --git a/hmp.h b/hmp.h
index c712f63..8de4aff 100644
--- a/hmp.h
+++ b/hmp.h
@@ -60,5 +60,6 @@ void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
void hmp_dump(Monitor *mon, const QDict *qdict);
void hmp_dump_cancel(Monitor *mon, const QDict *qdict);
+void hmp_dump_set_speed(Monitor *mon, const QDict *qdict);
#endif
diff --git a/qapi-schema.json b/qapi-schema.json
index e5fd056..472eac0 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1609,3 +1609,18 @@
# Since: 1.1
##
{ 'command': 'dump_cancel' }
+
+##
+# @dump_set_speed
+#
+# Set maximum speed for dumping.
+#
+# @value: maximum speed, in bytes per second.
+#
+# Returns: nothing on success
+#
+# Notes: A value less than 1024 will be automatically rounded up to 1024.
+#
+# Since: 1.1
+##
+{ 'command': 'dump_set_speed', 'data': { 'value': 'int' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index c09ca86..73cef06 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -614,6 +614,28 @@ Example:
EQMP
{
+ .name = "dump_set_speed",
+ .args_type = "value:o",
+ .mhandler.cmd_new = qmp_marshal_input_dump_set_speed,
+ },
+
+SQMP
+dump_set_speed
+--------------
+
+Set maximum speed for dumping.
+
+Arguments:
+
+- "value": maximum speed, in bytes per second (json-int)
+
+Example:
+
+-> { "execute": "dump_set_speed", "arguments": { "value": 1024 } }
+<- { "return": {} }
+
+EQMP
+
+ {
.name = "netdev_add",
.args_type = "netdev:O",
.params = "[user|tap|socket],id=str[,prop=value][,...]",
--
1.7.1
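dump_set_speed clamps the requested bandwidth to MIN_THROTTLE, and the 100 ms dump timer then turns that bandwidth into a per-tick byte budget of bandwidth/10. Both calculations sketched together (constants copied from the patch, helper names hypothetical):

```c
#include <stdint.h>

#define MIN_THROTTLE (1 << 10)   /* minimum dump speed, from the patch */

/* Values below 1024 bytes/s are rounded up, as the schema note says. */
static int64_t clamp_speed(int64_t value)
{
    return value < MIN_THROTTLE ? MIN_THROTTLE : value;
}

/* One 100 ms timer tick may write at most a tenth of the bandwidth. */
static int64_t tick_budget(int64_t bandwidth)
{
    return bandwidth / 10;
}
```

At the default throttle of 32 MB/s this works out to roughly 3.2 MB written per tick.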
* [Qemu-devel] [RFC][PATCH 14/16 v6] support to query dumping status
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (12 preceding siblings ...)
2012-02-09 3:32 ` [Qemu-devel] [RFC][PATCH 13/16 v6] support to set dumping speed Wen Congyang
@ 2012-02-09 3:32 ` Wen Congyang
2012-02-09 3:33 ` [Qemu-devel] [RFC][PATCH 15/16 v6] auto cancel dumping after vm state is changed to run Wen Congyang
` (2 subsequent siblings)
16 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:32 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Add API to allow the user to query dumping status.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
dump.c | 32 ++++++++++++++++++++++++++++++++
hmp-commands.hx | 2 ++
hmp.c | 17 +++++++++++++++++
hmp.h | 1 +
monitor.c | 7 +++++++
qapi-schema.json | 26 ++++++++++++++++++++++++++
qmp-commands.hx | 47 +++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 132 insertions(+), 0 deletions(-)
diff --git a/dump.c b/dump.c
index 9f0aceb..a921b76 100644
--- a/dump.c
+++ b/dump.c
@@ -734,3 +734,35 @@ void qmp_dump_set_speed(int64_t value, Error **errp)
s->bandwidth = value;
return;
}
+
+DumpInfo *qmp_query_dump(Error **errp)
+{
+ DumpInfo *info = g_malloc0(sizeof(*info));
+ DumpState *s = dump_get_current();
+
+ switch (s->state) {
+ case DUMP_STATE_SETUP:
+ /* no dump has happened yet */
+ break;
+ case DUMP_STATE_ACTIVE:
+ info->has_status = true;
+ info->status = g_strdup("active");
+ break;
+ case DUMP_STATE_COMPLETED:
+ info->has_status = true;
+ info->status = g_strdup("completed");
+ break;
+ case DUMP_STATE_ERROR:
+ info->has_status = true;
+ info->status = g_strdup("failed");
+ info->has_error = true;
+ info->error = g_strdup(s->error);
+ break;
+ case DUMP_STATE_CANCELLED:
+ info->has_status = true;
+ info->status = g_strdup("cancelled");
+ break;
+ }
+
+ return info;
+}
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 407bd06..a026905 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1431,6 +1431,8 @@ show device tree
show qdev device model list
@item info roms
show roms
+@item info dump
+show dumping status
@end table
ETEXI
diff --git a/hmp.c b/hmp.c
index de70690..b36921c 100644
--- a/hmp.c
+++ b/hmp.c
@@ -872,3 +872,20 @@ void hmp_dump_set_speed(Monitor *mon, const QDict *qdict)
int64_t value = qdict_get_int(qdict, "value");
qmp_dump_set_speed(value, NULL);
}
+
+void hmp_info_dump(Monitor *mon)
+{
+ DumpInfo *info;
+
+ info = qmp_query_dump(NULL);
+
+ if (info->has_status) {
+ monitor_printf(mon, "Dumping status: %s\n", info->status);
+ }
+
+ if (info->has_error) {
+ monitor_printf(mon, "Dumping failed reason: %s\n", info->error);
+ }
+
+ qapi_free_DumpInfo(info);
+}
diff --git a/hmp.h b/hmp.h
index 8de4aff..c54ae6e 100644
--- a/hmp.h
+++ b/hmp.h
@@ -61,5 +61,6 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
void hmp_dump(Monitor *mon, const QDict *qdict);
void hmp_dump_cancel(Monitor *mon, const QDict *qdict);
void hmp_dump_set_speed(Monitor *mon, const QDict *qdict);
+void hmp_info_dump(Monitor *mon);
#endif
diff --git a/monitor.c b/monitor.c
index 18e1ac7..f6fe4fd 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2592,6 +2592,13 @@ static mon_cmd_t info_cmds[] = {
.mhandler.info = do_trace_print_events,
},
{
+ .name = "dump",
+ .args_type = "",
+ .params = "",
+ .help = "show dumping status",
+ .mhandler.info = hmp_info_dump,
+ },
+ {
.name = NULL,
},
};
diff --git a/qapi-schema.json b/qapi-schema.json
index 472eac0..27d5199 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1624,3 +1624,29 @@
# Since: 1.1
##
{ 'command': 'dump_set_speed', 'data': { 'value': 'int' } }
+
+##
+# @DumpInfo
+#
+# Information about the current dump process.
+#
+# @status: #optional string describing the current dump status.
+# As of 1.1 this can be 'active', 'completed', 'failed' or
+# 'cancelled'. If this field is not returned, no dump process
+# has been initiated.
+#
+# @error: #optional error message, present when @status is 'failed'
+#
+# Since: 1.1
+##
+{ 'type': 'DumpInfo',
+ 'data': { '*status': 'str', '*error': 'str' } }
+
+##
+# @query-dump
+#
+# Returns information about current dumping process.
+#
+# Returns: @DumpInfo
+#
+# Since: 1.1
+##
+{ 'command': 'query-dump', 'returns': 'DumpInfo' }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 73cef06..1656eea 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -2043,6 +2043,53 @@ EQMP
},
SQMP
+query-dump
+----------
+
+Dumping status.
+
+Return a json-object.
+
+The main json-object contains the following:
+
+- "status": dump status (json-string)
+ - Possible values: "active", "completed", "failed", "cancelled"
+
+Examples:
+
+1. Before the first dump
+
+-> { "execute": "query-dump" }
+<- { "return": {} }
+
+2. Dump is done and has succeeded
+
+-> { "execute": "query-dump" }
+<- { "return": { "status": "completed" } }
+
+3. Dump is done and has failed
+
+-> { "execute": "query-dump" }
+<- { "return": { "status": "failed" } }
+
+4. Dump is being performed:
+
+-> { "execute": "query-dump" }
+<- {
+ "return":{
+ "status":"active",
+ }
+ }
+
+EQMP
+
+ {
+ .name = "query-dump",
+ .args_type = "",
+ .mhandler.cmd_new = qmp_marshal_input_query_dump,
+ },
+
+SQMP
query-balloon
-------------
--
1.7.1
* [Qemu-devel] [RFC][PATCH 15/16 v6] auto cancel dumping after vm state is changed to run
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (13 preceding siblings ...)
2012-02-09 3:32 ` [Qemu-devel] [RFC][PATCH 14/16 v6] support to query dumping status Wen Congyang
@ 2012-02-09 3:33 ` Wen Congyang
2012-02-09 3:34 ` [Qemu-devel] [RFC][PATCH 16/16 v6] allow user to dump a fraction of the memory Wen Congyang
2012-02-13 1:45 ` [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
16 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:33 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
The dump command does not support dumping while the VM is running. If the user
resumes the VM, we should automatically cancel the dump and set its status to failed.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
dump.c | 19 +++++++++++++++++++
vl.c | 5 +++--
2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/dump.c b/dump.c
index a921b76..6322d1a 100644
--- a/dump.c
+++ b/dump.c
@@ -83,6 +83,7 @@ typedef struct DumpState {
ram_addr_t start;
target_phys_addr_t offset;
QEMUTimer *timer;
+ VMChangeStateEntry *handler;
} DumpState;
#define DEFAULT_THROTTLE (32 << 20) /* Default dump speed throttling */
@@ -114,6 +115,11 @@ static int dump_cleanup(DumpState *s)
qemu_free_timer(s->timer);
}
+ if (s->handler) {
+ qemu_del_vm_change_state_handler(s->handler);
+ s->handler = NULL;
+ }
+
qemu_resume_monitor();
return ret;
@@ -670,6 +676,17 @@ static int create_vmcore(DumpState *s)
return 0;
}
+static void dump_vm_state_change(void *opaque, int running, RunState state)
+{
+ DumpState *s = opaque;
+
+ if (running) {
+ qmp_dump_cancel(NULL);
+ s->state = DUMP_STATE_ERROR;
+ s->error = g_strdup("vm state changed to running\n");
+ }
+}
+
void qmp_dump(bool detach, const char *file, Error **errp)
{
const char *p;
@@ -704,6 +721,8 @@ void qmp_dump(bool detach, const char *file, Error **errp)
return;
}
+ s->handler = qemu_add_vm_change_state_handler(dump_vm_state_change, s);
+
if (create_vmcore(s) < 0) {
error_set(errp, QERR_IO_ERROR);
}
diff --git a/vl.c b/vl.c
index 2d464cf..863e91c 100644
--- a/vl.c
+++ b/vl.c
@@ -1248,11 +1248,12 @@ void qemu_del_vm_change_state_handler(VMChangeStateEntry *e)
void vm_state_notify(int running, RunState state)
{
- VMChangeStateEntry *e;
+ VMChangeStateEntry *e, *next;
trace_vm_state_notify(running, state);
- for (e = vm_change_state_head.lh_first; e; e = e->entries.le_next) {
+ /* e->cb() may remove itself */
+ QLIST_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
e->cb(e->opaque, running, state);
}
}
--
1.7.1
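The vl.c hunk above switches vm_state_notify() to QLIST_FOREACH_SAFE because dump_vm_state_change() may delete its own handler (via qmp_dump_cancel and dump_cleanup) while the handler list is being walked. The pattern, sketched on a plain singly linked list with hypothetical names: the next pointer is saved before the callback runs, so unlinking the current node cannot derail the traversal.

```c
#include <stddef.h>

typedef struct Node {
    struct Node *next;
    int remove_self;   /* the callback asks to unlink this node */
    int visited;
} Node;

/* Visit every node, even if a node removes itself during its visit. */
static void notify_all(Node **head)
{
    Node **prev = head;
    Node *e = *head;
    while (e != NULL) {
        Node *next = e->next;   /* saved before e may be unlinked */
        e->visited = 1;         /* stands in for e->cb(...) */
        if (e->remove_self) {
            *prev = next;       /* unlink e, like the handler deleting itself */
        } else {
            prev = &e->next;
        }
        e = next;
    }
}
```

Without the saved next pointer, reading e->next after the callback would dereference a node that is no longer on the list.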
* [Qemu-devel] [RFC][PATCH 16/16 v6] allow user to dump a fraction of the memory
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (14 preceding siblings ...)
2012-02-09 3:33 ` [Qemu-devel] [RFC][PATCH 15/16 v6] auto cancel dumping after vm state is changed to run Wen Congyang
@ 2012-02-09 3:34 ` Wen Congyang
2012-02-14 18:27 ` Jan Kiszka
2012-02-13 1:45 ` [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
16 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-09 3:34 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
---
dump.c | 206 ++++++++++++++++++++++++++++++++++++++++--------------
hmp-commands.hx | 6 +-
hmp.c | 13 +++-
memory_mapping.c | 27 +++++++
memory_mapping.h | 2 +
qapi-schema.json | 6 ++-
qmp-commands.hx | 4 +-
7 files changed, 205 insertions(+), 59 deletions(-)
diff --git a/dump.c b/dump.c
index 6322d1a..60a5180 100644
--- a/dump.c
+++ b/dump.c
@@ -77,6 +77,9 @@ typedef struct DumpState {
char *error;
int fd;
bool detach;
+ bool has_filter;
+ int64_t begin;
+ int64_t length;
target_phys_addr_t memory_offset;
int64_t bandwidth;
RAMBlock *block;
@@ -397,23 +400,82 @@ static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start,
/* get the memory's offset in the vmcore */
static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
- target_phys_addr_t memory_offset)
+ DumpState *s)
{
RAMBlock *block;
- target_phys_addr_t offset = memory_offset;
+ target_phys_addr_t offset = s->memory_offset;
+ int64_t size_in_block, start;
+
+ if (s->has_filter) {
+ if (phys_addr < s->begin || phys_addr >= s->begin + s->length) {
+ return -1;
+ }
+ }
QLIST_FOREACH(block, &ram_list.blocks, next) {
- if (phys_addr >= block->offset &&
- phys_addr < block->offset + block->length) {
- return phys_addr - block->offset + offset;
+ if (s->has_filter) {
+ if (block->offset >= s->begin + s->length ||
+ block->offset + block->length <= s->begin) {
+ /* This block is out of the range */
+ continue;
+ }
+
+ if (s->begin <= block->offset) {
+ start = block->offset;
+ } else {
+ start = s->begin;
+ }
+
+ size_in_block = block->length - (start - block->offset);
+ if (s->begin + s->length < block->offset + block->length) {
+ size_in_block -= block->offset + block->length -
+ (s->begin + s->length);
+ }
+ } else {
+ start = block->offset;
+ size_in_block = block->length;
}
- offset += block->length;
+
+ if (phys_addr >= start && phys_addr < start + size_in_block) {
+ return phys_addr - start + offset;
+ }
+
+ offset += size_in_block;
}
return -1;
}
-static DumpState *dump_init(int fd, bool detach, Error **errp)
+static ram_addr_t get_start_block(DumpState *s)
+{
+ RAMBlock *block;
+
+ if (!s->has_filter) {
+ s->block = QLIST_FIRST(&ram_list.blocks);
+ return 0;
+ }
+
+ QLIST_FOREACH(block, &ram_list.blocks, next) {
+ if (block->offset >= s->begin + s->length ||
+ block->offset + block->length <= s->begin) {
+ /* This block is out of the range */
+ continue;
+ }
+
+ s->block = block;
+ if (s->begin > block->offset) {
+ s->start = s->begin - block->offset;
+ } else {
+ s->start = 0;
+ }
+ return s->start;
+ }
+
+ return -1;
+}
+
+static DumpState *dump_init(int fd, bool detach, bool has_filter, int64_t begin,
+ int64_t length, Error **errp)
{
CPUState *env;
DumpState *s = dump_get_current();
@@ -427,10 +489,17 @@ static DumpState *dump_init(int fd, bool detach, Error **errp)
s->error = NULL;
}
s->fd = fd;
- s->block = QLIST_FIRST(&ram_list.blocks);
- s->start = 0;
s->timer = NULL;
s->detach = detach;
+ s->has_filter = has_filter;
+ s->begin = begin;
+ s->length = length;
+
+ s->start = get_start_block(s);
+ if (s->start == -1) {
+ error_set(errp, QERR_INVALID_PARAMETER, "begin");
+ return NULL;
+ }
/*
* get dump info: endian, class and architecture.
@@ -454,6 +523,10 @@ static DumpState *dump_init(int fd, bool detach, Error **errp)
QTAILQ_INIT(&s->list.head);
get_memory_mapping(&s->list);
+ if (s->has_filter) {
+ filter_memory_mapping(&s->list, s->begin, s->length);
+ }
+
/* crash needs extra memory mapping to determine phys_base. */
ret = cpu_add_extra_memory_mapping(&s->list);
if (ret < 0) {
@@ -547,7 +620,7 @@ static int dump_completed(DumpState *s)
int phdr_index = 1, ret;
QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
- offset = get_offset(memory_mapping->phys_addr, s->memory_offset);
+ offset = get_offset(memory_mapping->phys_addr, s);
if (s->dump_info.d_class == ELFCLASS64) {
ret = write_elf64_load(s, memory_mapping, phdr_index++, offset);
} else {
@@ -563,6 +636,33 @@ static int dump_completed(DumpState *s)
return 0;
}
+static int get_next_block(DumpState *s, RAMBlock *block)
+{
+ while (1) {
+ block = QLIST_NEXT(block, next);
+ if (!block) {
+ /* no more block */
+ return 1;
+ }
+
+ s->start = 0;
+ s->block = block;
+ if (s->has_filter) {
+ if (block->offset >= s->begin + s->length ||
+ block->offset + block->length <= s->begin) {
+ /* This block is out of the range */
+ continue;
+ }
+
+ if (s->begin > block->offset) {
+ s->start = s->begin - block->offset;
+ }
+ }
+
+ return 0;
+ }
+}
+
/*
* write memory to vmcore.
*
@@ -573,47 +673,39 @@ static int dump_completed(DumpState *s)
*/
static int dump_iterate(DumpState *s, int64_t deadline)
{
- RAMBlock *block = s->block;
+ RAMBlock *block;
target_phys_addr_t offset = s->offset;
- int64_t size, remain, writen_size;
+ int64_t size, writen_size, size_in_block;
int64_t total = s->bandwidth / 10;
int ret;
+ bool first = true;
- if ((block->length - s->start) >= total) {
- size = total;
- } else {
- size = block->length - s->start;
- }
-
- ret = write_memory(s, block, s->start, &offset, &size, deadline);
- if (ret < 0) {
- return -1;
- }
-
- if (size == total || ret == 1) {
- if ((size + s->start) == block->length) {
- s->block = QLIST_NEXT(block, next);
- s->start = 0;
+ size = 0;
+ while (size < total) {
+ if (first) {
+ first = false;
} else {
- s->start += size;
+ ret = get_next_block(s, block);
+ if (ret == 1) {
+ /* we have finished */
+ return 1;
+ }
}
- goto end;
- }
- while (size < total) {
- block = QLIST_NEXT(block, next);
- if (!block) {
- /* we have finished */
- return 1;
+ block = s->block;
+ writen_size = total - size;
+ size_in_block = block->length - s->start;
+ if (s->has_filter &&
+ s->begin + s->length < block->offset + block->length) {
+ size_in_block -= block->offset + block->length -
+ (s->begin + s->length);
}
- remain = total - size;
- if (remain >= block->length) {
- writen_size = block->length;
- } else {
- writen_size = remain;
+ if (writen_size >= size_in_block) {
+ writen_size = size_in_block;
}
- ret = write_memory(s, block, 0, &offset, &writen_size, deadline);
+
+ ret = write_memory(s, block, s->start, &offset, &writen_size, deadline);
if (ret < 0) {
return -1;
} else if (ret == 1) {
@@ -621,21 +713,17 @@ static int dump_iterate(DumpState *s, int64_t deadline)
}
size += writen_size;
}
- if (writen_size == block->length) {
- s->block = QLIST_NEXT(block, next);
- s->start = 0;
+ if (writen_size == size_in_block) {
+ ret = get_next_block(s, block);
+ if (ret == 1) {
+ /* we have finished */
+ return 1;
+ }
} else {
- s->block = block;
- s->start = writen_size;
+ s->start += writen_size;
}
-end:
s->offset = offset;
- if (!s->block) {
- /* we have finished */
- return 1;
- }
-
return 0;
}
@@ -687,12 +775,22 @@ static void dump_vm_state_change(void *opaque, int running, RunState state)
}
}
-void qmp_dump(bool detach, const char *file, Error **errp)
+void qmp_dump(bool detach, const char *file, bool has_begin, int64_t begin,
+ bool has_length, int64_t length, Error **errp)
{
const char *p;
int fd = -1;
DumpState *s;
+ if (has_begin && !has_length) {
+ error_set(errp, QERR_MISSING_PARAMETER, "length");
+ return;
+ }
+ if (!has_begin && has_length) {
+ error_set(errp, QERR_MISSING_PARAMETER, "begin");
+ return;
+ }
+
#if !defined(WIN32)
if (strstart(file, "fd:", &p)) {
fd = qemu_get_fd(p);
@@ -716,7 +814,7 @@ void qmp_dump(bool detach, const char *file, Error **errp)
return;
}
- s = dump_init(fd, detach, errp);
+ s = dump_init(fd, detach, has_begin, begin, length, errp);
if (!s) {
return;
}
diff --git a/hmp-commands.hx b/hmp-commands.hx
index a026905..388b9ac 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -868,9 +868,11 @@ ETEXI
{
.name = "dump",
- .args_type = "detach:-d,file:s",
+ .args_type = "detach:-d,file:s,begin:i?,length:i?",
.params = "file",
- .help = "dump to file (using -d to not wait for completion)",
+ .help = "dump to file (using -d to not wait for completion)"
+ "\n\t\t\t begin(optional): the starting physical address"
+ "\n\t\t\t length(optional): the memory size, in bytes",
.user_print = monitor_user_noop,
.mhandler.cmd = hmp_dump,
},
diff --git a/hmp.c b/hmp.c
index b36921c..e0c5c62 100644
--- a/hmp.c
+++ b/hmp.c
@@ -857,8 +857,19 @@ void hmp_dump(Monitor *mon, const QDict *qdict)
Error *errp = NULL;
bool detach = qdict_get_try_bool(qdict, "detach", 0);
const char *file = qdict_get_str(qdict, "file");
+ bool has_begin = qdict_haskey(qdict, "begin");
+ bool has_length = qdict_haskey(qdict, "length");
+ int64_t begin = 0;
+ int64_t length = 0;
- qmp_dump(detach, file, &errp);
+ if (has_begin) {
+ begin = qdict_get_int(qdict, "begin");
+ }
+ if (has_length) {
+ length = qdict_get_int(qdict, "length");
+ }
+
+ qmp_dump(detach, file, has_begin, begin, has_length, length, &errp);
hmp_handle_error(mon, &errp);
}
diff --git a/memory_mapping.c b/memory_mapping.c
index fc0ddee..e2fcf2e 100644
--- a/memory_mapping.c
+++ b/memory_mapping.c
@@ -193,3 +193,30 @@ void get_memory_mapping(MemoryMappingList *list)
return;
}
+
+void filter_memory_mapping(MemoryMappingList *list, int64_t begin,
+ int64_t length)
+{
+ MemoryMapping *cur, *next;
+
+ QTAILQ_FOREACH_SAFE(cur, &list->head, next, next) {
+ if (cur->phys_addr >= begin + length ||
+ cur->phys_addr + cur->length <= begin) {
+ QTAILQ_REMOVE(&list->head, cur, next);
+ list->num--;
+ continue;
+ }
+
+ if (cur->phys_addr < begin) {
+ cur->length -= begin - cur->phys_addr;
+ if (cur->virt_addr) {
+ cur->virt_addr += begin - cur->phys_addr;
+ }
+ cur->phys_addr = begin;
+ }
+
+ if (cur->phys_addr + cur->length > begin + length) {
+ cur->length -= cur->phys_addr + cur->length - begin - length;
+ }
+ }
+}
diff --git a/memory_mapping.h b/memory_mapping.h
index 679f9ef..c7bb4fa 100644
--- a/memory_mapping.h
+++ b/memory_mapping.h
@@ -35,5 +35,7 @@ void add_to_memory_mapping(MemoryMappingList *list,
void free_memory_mapping_list(MemoryMappingList *list);
void get_memory_mapping(MemoryMappingList *list);
+void filter_memory_mapping(MemoryMappingList *list, int64_t begin,
+ int64_t length);
#endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 27d5199..aee9efa 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1590,12 +1590,16 @@
#
# @detach: detached dumping.
# @file: the filename or file descriptor of the vmcore.
+# @begin: if specified, the starting physical address.
+# @length: if specified, the memory size, in bytes.
#
# Returns: nothing on success
#
# Since: 1.1
##
-{ 'command': 'dump', 'data': { 'detach': 'bool', 'file': 'str' } }
+{ 'command': 'dump',
+ 'data': { 'detach': 'bool', 'file': 'str', '*begin': 'int',
+ '*length': 'int' } }
##
# @dump_cancel
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1656eea..0c72536 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -567,7 +567,7 @@ EQMP
{
.name = "dump",
- .args_type = "detach:-d,file:s",
+ .args_type = "detach:-d,file:s,begin:i?,length:i?",
.params = "file",
.help = "dump to file (using -d to not wait for completion)",
.user_print = monitor_user_noop,
@@ -584,6 +584,8 @@ Arguments:
- "detach": detached dumping (json-bool, optional)
- "file": Destination file (json-string)
+- "begin": the starting physical address (json-int)
+- "length": the memory size, in bytes (json-int)
Example:
--
1.7.1
^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
` (15 preceding siblings ...)
2012-02-09 3:34 ` [Qemu-devel] [RFC][PATCH 16/16 v6] allow user to dump a fraction of the memory Wen Congyang
@ 2012-02-13 1:45 ` Wen Congyang
16 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-13 1:45 UTC (permalink / raw)
To: qemu-devel, Jan Kiszka, Dave Anderson, HATAYAMA Daisuke,
Luiz Capitulino, Eric Blake
At 02/09/2012 11:16 AM, Wen Congyang Wrote:
> Hi, all
>
> 'virsh dump' can not work when host pci device is used by guest. We have
> discussed this issue here:
> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>
> We have determined to introduce a new command dump to dump memory. The core
> file's format can be elf.
Hi, Jan Kiszka
At 01/10/2012 09:30 PM, Luiz Capitulino Wrote:
> Btw, I'd like to have an ack from Jan for the general approach of this
> command.
>
Do you agree with the general approach of this command?
Thanks
Wen Congyang
>
> Note:
> 1. The guest should be x86 or x86_64. The other arch is not supported.
> 2. If you use old gdb, gdb may crash. I use gdb-7.3.1, and it does not crash.
> 3. If the OS is in the second kernel, gdb may not work well, and crash can
> work by specifying '--machdep phys_addr=xxx' in the command line. The
> reason is that the second kernel will update the page table, and we can
> not get the page table for the first kernel.
> 4. If the guest OS is 32 bit and the memory size is larger than 4G, the vmcore
> is elf64 format. You should use the gdb which is built with --enable-64-bit-bfd.
> 5. This patchset is based on the upstream tree, and apply one patch that is still
> in Luiz Capitulino's tree, because I use the API qemu_get_fd() in this patchset.
>
> Changes from v5 to v6:
> 1. allow user to dump a fraction of the memory
> 2. fix some bugs
>
> Changes from v4 to v5:
> 1. convert the new command dump to QAPI
>
> Changes from v3 to v4:
> 1. support it to run asynchronously
> 2. add API to cancel dumping and query dumping progress
> 3. add API to control dumping speed
> 4. auto cancel dumping when the user resumes vm, and the status is failed.
>
> Changes from v2 to v3:
> 1. address Jan Kiszka's comment
>
> Changes from v1 to v2:
> 1. fix virt addr in the vmcore.
>
> Wen Congyang (16):
> monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor()
> Add API to create memory mapping list
> Add API to check whether a physical address is I/O address
> target-i386: implement cpu_get_memory_mapping()
> Add API to get memory mapping
> target-i386: Add API to write elf notes to core file
> target-i386: Add API to add extra memory mapping
> target-i386: add API to get dump info
> introduce a new monitor command 'dump' to dump guest's memory
> run dump at the background
> support detached dump
> support to cancel the current dumping
> support to set dumping speed
> support to query dumping status
> auto cancel dumping after vm state is changed to run
> allow user to dump a fraction of the memory
>
> Makefile.target | 11 +-
> cpu-all.h | 18 +
> cpu-common.h | 2 +
> dump.c | 885 +++++++++++++++++++++++++++++++++++++++++++++++
> dump.h | 13 +
> exec.c | 16 +
> hmp-commands.hx | 49 +++
> hmp.c | 49 +++
> hmp.h | 4 +
> memory_mapping.c | 222 ++++++++++++
> memory_mapping.h | 41 +++
> monitor.c | 37 ++
> monitor.h | 2 +
> qapi-schema.json | 72 ++++
> qmp-commands.hx | 119 +++++++
> target-i386/arch-dump.c | 574 ++++++++++++++++++++++++++++++
> vl.c | 5 +-
> 17 files changed, 2112 insertions(+), 7 deletions(-)
> create mode 100644 dump.c
> create mode 100644 dump.h
> create mode 100644 memory_mapping.c
> create mode 100644 memory_mapping.h
> create mode 100644 target-i386/arch-dump.c
>
>
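As a concrete illustration of the optional range arguments this series adds, a partial dump restricted to the first 4 GB could be requested over QMP roughly as follows (the fd name and range values are made up for the example):

```json
{ "execute": "dump",
  "arguments": { "detach": true,
                 "file": "fd:dumpfile",
                 "begin": 0,
                 "length": 4294967296 } }
```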
* Re: [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor()
2012-02-09 3:19 ` [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor() Wen Congyang
@ 2012-02-14 16:19 ` Jan Kiszka
2012-02-15 2:54 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 16:19 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:19, Wen Congyang wrote:
> Sync command needs these two APIs to suspend/resume monitor.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> monitor.c | 27 +++++++++++++++++++++++++++
> monitor.h | 2 ++
> 2 files changed, 29 insertions(+), 0 deletions(-)
>
> diff --git a/monitor.c b/monitor.c
> index 11639b1..7e72739 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4442,6 +4442,26 @@ static void monitor_command_cb(Monitor *mon, const char *cmdline, void *opaque)
> monitor_resume(mon);
> }
>
> +int qemu_suspend_monitor(const char *fmt, ...)
> +{
> + int ret;
> +
> + if (cur_mon) {
> + ret = monitor_suspend(cur_mon);
> + } else {
> + ret = -ENOTTY;
> + }
> +
> + if (ret < 0 && fmt) {
> + va_list ap;
> + va_start(ap, fmt);
> + monitor_vprintf(cur_mon, fmt, ap);
> + va_end(ap);
> + }
> +
> + return ret;
> +}
> +
> int monitor_suspend(Monitor *mon)
> {
> if (!mon->rs)
> @@ -4450,6 +4470,13 @@ int monitor_suspend(Monitor *mon)
> return 0;
> }
>
> +void qemu_resume_monitor(void)
> +{
> + if (cur_mon) {
> + monitor_resume(cur_mon);
> + }
> +}
> +
> void monitor_resume(Monitor *mon)
> {
> if (!mon->rs)
> diff --git a/monitor.h b/monitor.h
> index 58109af..60a1e17 100644
> --- a/monitor.h
> +++ b/monitor.h
> @@ -46,7 +46,9 @@ int monitor_cur_is_qmp(void);
> void monitor_protocol_event(MonitorEvent event, QObject *data);
> void monitor_init(CharDriverState *chr, int flags);
>
> +int qemu_suspend_monitor(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
> int monitor_suspend(Monitor *mon);
> +void qemu_resume_monitor(void);
> void monitor_resume(Monitor *mon);
>
> int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
I don't see any added value in this API, specifically as it is built on
top of cur_mon. Just use the existing services like the migration code
does. If you properly pass down the monitor reference from the command
to the suspend and store what monitor you suspended, all should be fine.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 02/16 v6] Add API to create memory mapping list
2012-02-09 3:20 ` [Qemu-devel] [RFC][PATCH 02/16 v6] Add API to create memory mapping list Wen Congyang
@ 2012-02-14 16:39 ` Jan Kiszka
2012-02-15 3:00 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 16:39 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:20, Wen Congyang wrote:
> The memory mapping list stores virtual address and physical address mapping.
> The folloing patch will use this information to create PT_LOAD in the vmcore.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> Makefile.target | 1 +
> memory_mapping.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> memory_mapping.h | 38 ++++++++++++++++
> 3 files changed, 169 insertions(+), 0 deletions(-)
> create mode 100644 memory_mapping.c
> create mode 100644 memory_mapping.h
>
> diff --git a/Makefile.target b/Makefile.target
> index 68481a3..e35e464 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -200,6 +200,7 @@ obj-$(CONFIG_KVM) += kvm.o kvm-all.o
> obj-$(CONFIG_NO_KVM) += kvm-stub.o
> obj-$(CONFIG_VGA) += vga.o
> obj-y += memory.o savevm.o
> +obj-y += memory_mapping.o
> LIBS+=-lz
>
> obj-i386-$(CONFIG_KVM) += hyperv.o
> diff --git a/memory_mapping.c b/memory_mapping.c
> new file mode 100644
> index 0000000..d83b7d7
> --- /dev/null
> +++ b/memory_mapping.c
> @@ -0,0 +1,130 @@
> +/*
> + * QEMU memory mapping
> + *
> + * Copyright Fujitsu, Corp. 2011
> + *
> + * Authors:
> + * Wen Congyang <wency@cn.fujitsu.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "cpu.h"
> +#include "cpu-all.h"
> +#include "memory_mapping.h"
> +
> +static MemoryMapping *last_mapping;
> +
> +static void create_new_memory_mapping(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length)
> +{
> + MemoryMapping *memory_mapping, *p;
> +
> + memory_mapping = g_malloc(sizeof(MemoryMapping));
> + memory_mapping->phys_addr = phys_addr;
> + memory_mapping->virt_addr = virt_addr;
> + memory_mapping->length = length;
> + last_mapping = memory_mapping;
> + list->num++;
> + QTAILQ_FOREACH(p, &list->head, next) {
> + if (p->phys_addr >= memory_mapping->phys_addr) {
> + QTAILQ_INSERT_BEFORE(p, memory_mapping, next);
> + return;
> + }
> + }
> + QTAILQ_INSERT_TAIL(&list->head, memory_mapping, next);
> + return;
> +}
> +
> +void create_new_memory_mapping_head(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length)
> +{
> + MemoryMapping *memory_mapping;
> +
> + memory_mapping = g_malloc(sizeof(MemoryMapping));
> + memory_mapping->phys_addr = phys_addr;
> + memory_mapping->virt_addr = virt_addr;
> + memory_mapping->length = length;
> + last_mapping = memory_mapping;
> + list->num++;
> + QTAILQ_INSERT_HEAD(&list->head, memory_mapping, next);
> + return;
> +}
> +
> +void add_to_memory_mapping(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length)
> +{
> + MemoryMapping *memory_mapping;
> +
> + if (QTAILQ_EMPTY(&list->head)) {
> + create_new_memory_mapping(list, phys_addr, virt_addr, length);
> + return;
> + }
> +
> + if (last_mapping) {
> + if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
> + (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
> + last_mapping->length += length;
> + return;
> + }
> + }
> +
> + QTAILQ_FOREACH(memory_mapping, &list->head, next) {
> + last_mapping = memory_mapping;
> + if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
> + (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
> + last_mapping->length += length;
> + return;
> + }
> +
> + if (!(phys_addr >= (last_mapping->phys_addr)) ||
> + !(phys_addr < (last_mapping->phys_addr + last_mapping->length))) {
> + /* last_mapping does not contain this region */
> + continue;
> + }
> + if (!(virt_addr >= (last_mapping->virt_addr)) ||
> + !(virt_addr < (last_mapping->virt_addr + last_mapping->length))) {
> + /* last_mapping does not contain this region */
> + continue;
> + }
> + if ((virt_addr - last_mapping->virt_addr) !=
> + (phys_addr - last_mapping->phys_addr)) {
> + /*
> + * last_mapping contains this region, but we should create another
> + * mapping region.
> + */
> + break;
> + }
> +
> + /* merge this region into last_mapping */
> + if ((virt_addr + length) >
> + (last_mapping->virt_addr + last_mapping->length)) {
> + last_mapping->length = virt_addr + length - last_mapping->virt_addr;
> + }
> + return;
> + }
> +
> + /* this region cannot be merged into any existing memory mapping. */
> + create_new_memory_mapping(list, phys_addr, virt_addr, length);
> + return;
> +}
> +
> +void free_memory_mapping_list(MemoryMappingList *list)
> +{
> + MemoryMapping *p, *q;
> +
> + QTAILQ_FOREACH_SAFE(p, &list->head, next, q) {
> + QTAILQ_REMOVE(&list->head, p, next);
> + g_free(p);
> + }
> +
> + list->num = 0;
> +}
> diff --git a/memory_mapping.h b/memory_mapping.h
> new file mode 100644
> index 0000000..a4b1532
> --- /dev/null
> +++ b/memory_mapping.h
> @@ -0,0 +1,38 @@
> +#ifndef MEMORY_MAPPING_H
> +#define MEMORY_MAPPING_H
> +
> +#include "qemu-queue.h"
> +
> +typedef struct MemoryMapping {
> + target_phys_addr_t phys_addr;
> + target_ulong virt_addr;
> + ram_addr_t length;
> + QTAILQ_ENTRY(MemoryMapping) next;
> +} MemoryMapping;
> +
> +typedef struct MemoryMappingList {
> + unsigned int num;
This field looks unused by this series. Unless I miss something, you
probably want to drop it.
> + QTAILQ_HEAD(, MemoryMapping) head;
> +} MemoryMappingList;
> +
> +/*
> + * crash requires some memory mappings to be at the head of the list, which
> + * leaves the list unsorted. The caller therefore must add such a special
> + * memory mapping only after adding all normal memory mappings to the list.
> + */
> +void create_new_memory_mapping_head(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length);
> +/*
> + * add or merge the memory region into the memory mapping's list. The list is
> + * sorted by phys_addr.
> + */
> +void add_to_memory_mapping(MemoryMappingList *list,
> + target_phys_addr_t phys_addr,
> + target_phys_addr_t virt_addr,
> + ram_addr_t length);
> +
> +void free_memory_mapping_list(MemoryMappingList *list);
> +
> +#endif
This API is a bit hard to understand and use. I would suggest:
memory_mapping_list_add_sorted(MemoryMappingList *list, ...);
memory_mapping_list_add_head(MemoryMappingList *list, ...);
memory_mapping_list_free(MemoryMappingList *list);
memory_mapping_list_add_head should set a flag in the MemoryMapping
appended to the list or let the MemoryMappingList point to the first
sorted entry. That way, the adding order becomes irrelevant.
Moreover, you are lacking some
memory_mapping_list_init(MemoryMappingList *list). Cleaner than
open-coding this.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 03/16 v6] Add API to check whether a physical address is I/O address
2012-02-09 3:21 ` [Qemu-devel] [RFC][PATCH 03/16 v6] Add API to check whether a physical address is I/O address Wen Congyang
@ 2012-02-14 16:52 ` Jan Kiszka
2012-02-15 3:03 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 16:52 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:21, Wen Congyang wrote:
> This API will be used in the following patch.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> cpu-common.h | 2 ++
> exec.c | 16 ++++++++++++++++
> 2 files changed, 18 insertions(+), 0 deletions(-)
>
> diff --git a/cpu-common.h b/cpu-common.h
> index a40c57d..d047137 100644
> --- a/cpu-common.h
> +++ b/cpu-common.h
> @@ -71,6 +71,8 @@ void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
> void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque));
> void cpu_unregister_map_client(void *cookie);
>
> +bool is_io_addr(target_phys_addr_t phys_addr);
Something like cpu_physical_memory_is_io would be more consistent with
other APIs around.
> +
> /* Coalesced MMIO regions are areas where write operations can be reordered.
> * This usually implies that write operations are side-effect free. This allows
> * batching which can make a major impact on performance when using
> diff --git a/exec.c b/exec.c
> index b81677a..edc5684 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -4435,3 +4435,19 @@ bool virtio_is_big_endian(void)
> #undef env
>
> #endif
> +
> +bool is_io_addr(target_phys_addr_t phys_addr)
> +{
> + ram_addr_t pd;
> + PhysPageDesc p;
> +
> + p = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
> + pd = p.phys_offset;
> +
> + if (!is_ram_rom_romd(pd)) {
return !is_ram_rom_romd(pd); ?
> + /* I/O region */
> + return true;
> + }
> +
> + return false;
> +}
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 04/16 v6] target-i386: implement cpu_get_memory_mapping()
2012-02-09 3:21 ` [Qemu-devel] [RFC][PATCH 04/16 v6] target-i386: implement cpu_get_memory_mapping() Wen Congyang
@ 2012-02-14 17:07 ` Jan Kiszka
2012-02-15 3:05 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 17:07 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:21, Wen Congyang wrote:
> Walk cpu's page table and collect all virtual address and physical address mapping.
> Then, add these mapping into memory mapping list.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> Makefile.target | 2 +-
> cpu-all.h | 7 ++
> target-i386/arch-dump.c | 254 +++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 262 insertions(+), 1 deletions(-)
> create mode 100644 target-i386/arch-dump.c
>
> diff --git a/Makefile.target b/Makefile.target
> index e35e464..d6e5684 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -75,7 +75,7 @@ libobj-$(CONFIG_TCG_INTERPRETER) += tci.o
> libobj-y += fpu/softfloat.o
> libobj-y += op_helper.o helper.o
> ifeq ($(TARGET_BASE_ARCH), i386)
> -libobj-y += cpuid.o
> +libobj-y += cpuid.o arch-dump.o
> endif
> libobj-$(TARGET_SPARC64) += vis_helper.o
> libobj-$(CONFIG_NEED_MMU) += mmu.o
> diff --git a/cpu-all.h b/cpu-all.h
> index e2c3c49..4cd7fbb 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -22,6 +22,7 @@
> #include "qemu-common.h"
> #include "qemu-tls.h"
> #include "cpu-common.h"
> +#include "memory_mapping.h"
>
> /* some important defines:
> *
> @@ -523,4 +524,10 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
> int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
> uint8_t *buf, int len, int is_write);
>
> +#if defined(TARGET_I386)
Instead of collecting archs here, you could introduce some
HAVE_GET_MEMORY_MAPPING and let the targets that support that define it.
> +void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);
> +#else
> +#define cpu_get_memory_mapping(list, env)
Better return an error from cpu_get_memory_mapping (and use static
inline) so that the caller can find out and report that dumping is not
supported for the current target.
> +#endif
> +
> #endif /* CPU_ALL_H */
> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
> new file mode 100644
> index 0000000..2e921c7
> --- /dev/null
> +++ b/target-i386/arch-dump.c
> @@ -0,0 +1,254 @@
> +/*
> + * i386 dump
> + *
> + * Copyright Fujitsu, Corp. 2011
> + *
> + * Authors:
> + * Wen Congyang <wency@cn.fujitsu.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "cpu.h"
> +#include "cpu-all.h"
> +
> +/* PAE Paging or IA-32e Paging */
> +static void walk_pte(MemoryMappingList *list, target_phys_addr_t pte_start_addr,
> + int32_t a20_mask, target_ulong start_line_addr)
> +{
> + target_phys_addr_t pte_addr, start_paddr;
> + uint64_t pte;
> + target_ulong start_vaddr;
> + int i;
> +
> + for (i = 0; i < 512; i++) {
> + pte_addr = (pte_start_addr + i * 8) & a20_mask;
> + pte = ldq_phys(pte_addr);
> + if (!(pte & PG_PRESENT_MASK)) {
> + /* not present */
> + continue;
> + }
> +
> + start_paddr = (pte & ~0xfff) & ~(0x1ULL << 63);
> + if (is_io_addr(start_paddr)) {
> + /* I/O region */
> + continue;
> + }
> +
> + start_vaddr = start_line_addr | ((i & 0x1fff) << 12);
> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 12);
> + }
> +}
> +
> +/* 32-bit Paging */
> +static void walk_pte2(MemoryMappingList *list,
> + target_phys_addr_t pte_start_addr, int32_t a20_mask,
> + target_ulong start_line_addr)
> +{
> + target_phys_addr_t pte_addr, start_paddr;
> + uint32_t pte;
> + target_ulong start_vaddr;
> + int i;
> +
> + for (i = 0; i < 1024; i++) {
> + pte_addr = (pte_start_addr + i * 4) & a20_mask;
> + pte = ldl_phys(pte_addr);
> + if (!(pte & PG_PRESENT_MASK)) {
> + /* not present */
> + continue;
> + }
> +
> + start_paddr = pte & ~0xfff;
> + if (is_io_addr(start_paddr)) {
> + /* I/O region */
> + continue;
> + }
> +
> + start_vaddr = start_line_addr | ((i & 0x3ff) << 12);
> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 12);
> + }
> +}
> +
> +/* PAE Paging or IA-32e Paging */
> +static void walk_pde(MemoryMappingList *list, target_phys_addr_t pde_start_addr,
> + int32_t a20_mask, target_ulong start_line_addr)
> +{
> + target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
> + uint64_t pde;
> + target_ulong line_addr, start_vaddr;
> + int i;
> +
> + for (i = 0; i < 512; i++) {
> + pde_addr = (pde_start_addr + i * 8) & a20_mask;
> + pde = ldq_phys(pde_addr);
> + if (!(pde & PG_PRESENT_MASK)) {
> + /* not present */
> + continue;
> + }
> +
> + line_addr = start_line_addr | ((i & 0x1ff) << 21);
> + if (pde & PG_PSE_MASK) {
> + /* 2 MB page */
> + start_paddr = (pde & ~0x1fffff) & ~(0x1ULL << 63);
> + if (is_io_addr(start_paddr)) {
> + /* I/O region */
> + continue;
> + }
> + start_vaddr = line_addr;
> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 21);
> + continue;
> + }
> +
> + pte_start_addr = (pde & ~0xfff) & a20_mask;
> + walk_pte(list, pte_start_addr, a20_mask, line_addr);
> + }
> +}
> +
> +/* 32-bit Paging */
> +static void walk_pde2(MemoryMappingList *list,
> + target_phys_addr_t pde_start_addr, int32_t a20_mask,
> + bool pse)
> +{
> + target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
> + uint32_t pde;
> + target_ulong line_addr, start_vaddr;
> + int i;
> +
> + for (i = 0; i < 1024; i++) {
> + pde_addr = (pde_start_addr + i * 4) & a20_mask;
> + pde = ldl_phys(pde_addr);
> + if (!(pde & PG_PRESENT_MASK)) {
> + /* not present */
> + continue;
> + }
> +
> + line_addr = (((unsigned int)i & 0x3ff) << 22);
> + if ((pde & PG_PSE_MASK) && pse) {
> + /* 4 MB page */
> + start_paddr = (pde & ~0x3fffff) | ((pde & 0x1fe000) << 19);
> + if (is_io_addr(start_paddr)) {
> + /* I/O region */
> + continue;
> + }
> + start_vaddr = line_addr;
> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 22);
> + continue;
> + }
> +
> + pte_start_addr = (pde & ~0xfff) & a20_mask;
> + walk_pte2(list, pte_start_addr, a20_mask, line_addr);
> + }
> +}
> +
> +/* PAE Paging */
> +static void walk_pdpe2(MemoryMappingList *list,
> + target_phys_addr_t pdpe_start_addr, int32_t a20_mask)
> +{
> + target_phys_addr_t pdpe_addr, pde_start_addr;
> + uint64_t pdpe;
> + target_ulong line_addr;
> + int i;
> +
> + for (i = 0; i < 4; i++) {
> + pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask;
> + pdpe = ldq_phys(pdpe_addr);
> + if (!(pdpe & PG_PRESENT_MASK)) {
> + /* not present */
> + continue;
> + }
> +
> + line_addr = (((unsigned int)i & 0x3) << 30);
> + pde_start_addr = (pdpe & ~0xfff) & a20_mask;
> + walk_pde(list, pde_start_addr, a20_mask, line_addr);
> + }
> +}
> +
> +#ifdef TARGET_X86_64
> +/* IA-32e Paging */
> +static void walk_pdpe(MemoryMappingList *list,
> + target_phys_addr_t pdpe_start_addr, int32_t a20_mask,
> + target_ulong start_line_addr)
> +{
> + target_phys_addr_t pdpe_addr, pde_start_addr, start_paddr;
> + uint64_t pdpe;
> + target_ulong line_addr, start_vaddr;
> + int i;
> +
> + for (i = 0; i < 512; i++) {
> + pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask;
> + pdpe = ldq_phys(pdpe_addr);
> + if (!(pdpe & PG_PRESENT_MASK)) {
> + /* not present */
> + continue;
> + }
> +
> + line_addr = start_line_addr | ((i & 0x1ffULL) << 30);
> + if (pdpe & PG_PSE_MASK) {
> + /* 1 GB page */
> + start_paddr = (pdpe & ~0x3fffffff) & ~(0x1ULL << 63);
> + if (is_io_addr(start_paddr)) {
> + /* I/O region */
> + continue;
> + }
> + start_vaddr = line_addr;
> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 30);
> + continue;
> + }
> +
> + pde_start_addr = (pdpe & ~0xfff) & a20_mask;
> + walk_pde(list, pde_start_addr, a20_mask, line_addr);
> + }
> +}
> +
> +/* IA-32e Paging */
> +static void walk_pml4e(MemoryMappingList *list,
> + target_phys_addr_t pml4e_start_addr, int32_t a20_mask)
> +{
> + target_phys_addr_t pml4e_addr, pdpe_start_addr;
> + uint64_t pml4e;
> + target_ulong line_addr;
> + int i;
> +
> + for (i = 0; i < 512; i++) {
> + pml4e_addr = (pml4e_start_addr + i * 8) & a20_mask;
> + pml4e = ldq_phys(pml4e_addr);
> + if (!(pml4e & PG_PRESENT_MASK)) {
> + /* not present */
> + continue;
> + }
> +
> + line_addr = ((i & 0x1ffULL) << 39) | (0xffffULL << 48);
> + pdpe_start_addr = (pml4e & ~0xfff) & a20_mask;
> + walk_pdpe(list, pdpe_start_addr, a20_mask, line_addr);
> + }
> +}
> +#endif
> +
> +void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
> +{
> + if (env->cr[4] & CR4_PAE_MASK) {
> +#ifdef TARGET_X86_64
> + if (env->hflags & HF_LMA_MASK) {
> + target_phys_addr_t pml4e_addr;
> +
> + pml4e_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
> + walk_pml4e(list, pml4e_addr, env->a20_mask);
> + } else
> +#endif
> + {
> + target_phys_addr_t pdpe_addr;
> +
> + pdpe_addr = (env->cr[3] & ~0x1f) & env->a20_mask;
> + walk_pdpe2(list, pdpe_addr, env->a20_mask);
> + }
> + } else {
> + target_phys_addr_t pde_addr;
> + bool pse;
> +
> + pde_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
> + pse = !!(env->cr[4] & CR4_PSE_MASK);
> + walk_pde2(list, pde_addr, env->a20_mask, pse);
> + }
> +}
I haven't checked all paging details, but it looks good otherwise.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping
2012-02-09 3:22 ` [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping Wen Congyang
@ 2012-02-14 17:21 ` Jan Kiszka
2012-02-15 4:07 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 17:21 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:22, Wen Congyang wrote:
> Add API to get all virtual address and physical address mapping.
> If there is no virtual address for some physical address, the virtual
> address is 0.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> memory_mapping.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> memory_mapping.h | 1 +
> 2 files changed, 66 insertions(+), 0 deletions(-)
>
> diff --git a/memory_mapping.c b/memory_mapping.c
> index d83b7d7..fc0ddee 100644
> --- a/memory_mapping.c
> +++ b/memory_mapping.c
> @@ -128,3 +128,68 @@ void free_memory_mapping_list(MemoryMappingList *list)
>
> list->num = 0;
> }
> +
> +void get_memory_mapping(MemoryMappingList *list)
> +{
> + CPUState *env;
> + MemoryMapping *memory_mapping;
> + RAMBlock *block;
> + ram_addr_t offset, length;
> +
> + last_mapping = NULL;
> +
> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
> + cpu_get_memory_mapping(list, env);
Hmm, is the CPU number recorded along with the mappings? I mean, how
could crash tell them apart afterward if they are contradictory? This
way, they are just thrown in the same bucket, correct?
Even if crash or gdb aren't prepared for cpu/thread-specific mappings,
could we already record that information for later use? Or would it
break compatibility with current versions?
> + }
> +
> + /* some memory may be not mapped, add them into memory mapping's list */
> + QLIST_FOREACH(block, &ram_list.blocks, next) {
> + offset = block->offset;
> + length = block->length;
> +
> + QTAILQ_FOREACH(memory_mapping, &list->head, next) {
> + if (memory_mapping->phys_addr >= (offset + length)) {
> + /*
> + * memory_mapping's list does not contain the region
> + * [offset, offset+length)
> + */
> + create_new_memory_mapping(list, offset, 0, length);
> + length = 0;
> + break;
> + }
> +
> + if ((memory_mapping->phys_addr + memory_mapping->length) <=
> + offset) {
> + continue;
> + }
> +
> + if (memory_mapping->phys_addr > offset) {
> + /*
> + * memory_mapping's list does not contain the region
> + * [offset, memory_mapping->phys_addr)
> + */
> + create_new_memory_mapping(list, offset, 0,
> + memory_mapping->phys_addr - offset);
> + }
> +
> + if ((offset + length) <=
> + (memory_mapping->phys_addr + memory_mapping->length)) {
> + length = 0;
> + break;
> + }
> + length -= memory_mapping->phys_addr + memory_mapping->length -
> + offset;
> + offset = memory_mapping->phys_addr + memory_mapping->length;
> + }
> +
> + if (length > 0) {
> + /*
> + * memory_mapping's list does not contain the region
> + * [offset, offset + length)
> + */
> + create_new_memory_mapping(list, offset, 0, length);
> + }
> + }
> +
> + return;
Please avoid redundant returns.
> +}
> diff --git a/memory_mapping.h b/memory_mapping.h
> index a4b1532..679f9ef 100644
> --- a/memory_mapping.h
> +++ b/memory_mapping.h
> @@ -34,5 +34,6 @@ void add_to_memory_mapping(MemoryMappingList *list,
> ram_addr_t length);
>
> void free_memory_mapping_list(MemoryMappingList *list);
> +void get_memory_mapping(MemoryMappingList *list);
>
> #endif
Maybe [qemu_]get_guest_memory_mapping. Just get_memory_mapping sounds a
bit too generic to me. Could be any mapping.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 06/16 v6] target-i386: Add API to write elf notes to core file
2012-02-09 3:24 ` [Qemu-devel] [RFC][PATCH 06/16 v6] target-i386: Add API to write elf notes to core file Wen Congyang
@ 2012-02-14 17:31 ` Jan Kiszka
2012-02-15 3:16 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 17:31 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:24, Wen Congyang wrote:
> The core file contains the registers' values. These APIs write the
> registers to the core file; they will be called in the following patch.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> cpu-all.h | 6 +
> target-i386/arch-dump.c | 243 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 249 insertions(+), 0 deletions(-)
>
> diff --git a/cpu-all.h b/cpu-all.h
> index 4cd7fbb..efb5ba3 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -526,8 +526,14 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
>
> #if defined(TARGET_I386)
> void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);
> +int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
> + target_phys_addr_t *offset);
> +int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
> + target_phys_addr_t *offset);
Again, some HAVE_XXX guard would be nicer. Maybe put the whole block
under HAVE_GUEST_CORE_DUMP or so.
Is writing to a file descriptor generic enough? What if we want to dump
via QMP, letting the receiver side decide where to write the data?
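One way to decouple the dump code from a concrete fd, as suggested here, is a write callback chosen by the caller. A hypothetical sketch (the callback type, `emit_note_name`, and the membuf sink are illustrative names, not part of the patch):

```c
#include <string.h>
#include <stddef.h>

/* Hypothetical: the dump core takes a write callback instead of an fd,
 * so the sink (file, QMP-fed channel, ...) is the caller's choice. */
typedef int (*dump_write_fn)(void *opaque, const void *buf, size_t len);

int emit_note_name(dump_write_fn write_cb, void *opaque)
{
    const char name[8] = "CORE";        /* "CORE\0" padded to 8 bytes */
    return write_cb(opaque, name, sizeof(name));
}

/* A sink that appends into a memory buffer, e.g. for later streaming. */
struct membuf { char data[64]; size_t used; };

int membuf_write(void *opaque, const void *buf, size_t len)
{
    struct membuf *mb = opaque;
    if (mb->used + len > sizeof(mb->data)) {
        return -1;
    }
    memcpy(mb->data + mb->used, buf, len);
    mb->used += len;
    return 0;
}
```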
> #else
> #define cpu_get_memory_mapping(list, env)
> +#define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
> +#define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
> #endif
>
> #endif /* CPU_ALL_H */
> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
> index 2e921c7..4c0ff77 100644
> --- a/target-i386/arch-dump.c
> +++ b/target-i386/arch-dump.c
> @@ -11,8 +11,11 @@
> *
> */
>
> +#include <elf.h>
Does this create a new dependency and break non-Linux hosts? Can you
pull the required bits into qemu's elf.h then?
> +
> #include "cpu.h"
> #include "cpu-all.h"
> +#include "monitor.h"
>
> /* PAE Paging or IA-32e Paging */
> static void walk_pte(MemoryMappingList *list, target_phys_addr_t pte_start_addr,
> @@ -252,3 +255,243 @@ void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
> walk_pde2(list, pde_addr, env->a20_mask, pse);
> }
> }
> +
> +#ifdef TARGET_X86_64
> +typedef struct {
> + target_ulong r15, r14, r13, r12, rbp, rbx, r11, r10;
> + target_ulong r9, r8, rax, rcx, rdx, rsi, rdi, orig_rax;
> + target_ulong rip, cs, eflags;
> + target_ulong rsp, ss;
> + target_ulong fs_base, gs_base;
> + target_ulong ds, es, fs, gs;
> +} x86_64_user_regs_struct;
> +
> +static int x86_64_write_elf64_note(int fd, CPUState *env, int id,
> + target_phys_addr_t *offset)
> +{
> + x86_64_user_regs_struct regs;
> + Elf64_Nhdr *note;
> + char *buf;
> + int descsz, note_size, name_size = 5;
> + const char *name = "CORE";
> + int ret;
> +
> + regs.r15 = env->regs[15];
> + regs.r14 = env->regs[14];
> + regs.r13 = env->regs[13];
> + regs.r12 = env->regs[12];
> + regs.r11 = env->regs[11];
> + regs.r10 = env->regs[10];
> + regs.r9 = env->regs[9];
> + regs.r8 = env->regs[8];
> + regs.rbp = env->regs[R_EBP];
> + regs.rsp = env->regs[R_ESP];
> + regs.rdi = env->regs[R_EDI];
> + regs.rsi = env->regs[R_ESI];
> + regs.rdx = env->regs[R_EDX];
> + regs.rcx = env->regs[R_ECX];
> + regs.rbx = env->regs[R_EBX];
> + regs.rax = env->regs[R_EAX];
> + regs.rip = env->eip;
> + regs.eflags = env->eflags;
> +
> + regs.orig_rax = 0; /* FIXME */
> + regs.cs = env->segs[R_CS].selector;
> + regs.ss = env->segs[R_SS].selector;
> + regs.fs_base = env->segs[R_FS].base;
> + regs.gs_base = env->segs[R_GS].base;
> + regs.ds = env->segs[R_DS].selector;
> + regs.es = env->segs[R_ES].selector;
> + regs.fs = env->segs[R_FS].selector;
> + regs.gs = env->segs[R_GS].selector;
> +
> + descsz = 336; /* sizeof(prstatus_t) is 336 on x86_64 box */
> + note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
> + (descsz + 3) / 4) * 4;
> + note = g_malloc(note_size);
> +
> + memset(note, 0, note_size);
> + note->n_namesz = cpu_to_le32(name_size);
> + note->n_descsz = cpu_to_le32(descsz);
> + note->n_type = cpu_to_le32(NT_PRSTATUS);
> + buf = (char *)note;
> + buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
> + memcpy(buf, name, name_size);
> + buf += ((name_size + 3) / 4) * 4;
> + memcpy(buf + 32, &id, 4); /* pr_pid */
> + buf += descsz - sizeof(x86_64_user_regs_struct)-sizeof(target_ulong);
> + memcpy(buf, ®s, sizeof(x86_64_user_regs_struct));
> +
> + lseek(fd, *offset, SEEK_SET);
> + ret = write(fd, note, note_size);
> + g_free(note);
> + if (ret < 0) {
> + return -1;
> + }
> +
> + *offset += note_size;
> +
> + return 0;
> +}
> +#endif
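The `note_size` computation in the function above is the standard ELF note layout: header, name, and descriptor are each rounded up to a 4-byte boundary. A self-contained restatement of that arithmetic:

```c
/* Round n up to the next multiple of 4, as ELF note padding requires. */
int align4(int n)
{
    return ((n + 3) / 4) * 4;
}

/* Total bytes for one note: padded header + padded name + padded desc. */
int elf_note_size(int hdr_size, int name_size, int descsz)
{
    return align4(hdr_size) + align4(name_size) + align4(descsz);
}
```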
> +
> +typedef struct {
> + uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
> + unsigned short ds, __ds, es, __es;
> + unsigned short fs, __fs, gs, __gs;
> + uint32_t orig_eax, eip;
> + unsigned short cs, __cs;
> + uint32_t eflags, esp;
> + unsigned short ss, __ss;
> +} x86_user_regs_struct;
> +
> +static int x86_write_elf64_note(int fd, CPUState *env, int id,
> + target_phys_addr_t *offset)
> +{
> + x86_user_regs_struct regs;
> + Elf64_Nhdr *note;
> + char *buf;
> + int descsz, note_size, name_size = 5;
> + const char *name = "CORE";
> + int ret;
> +
> + regs.ebp = env->regs[R_EBP] & 0xffffffff;
> + regs.esp = env->regs[R_ESP] & 0xffffffff;
> + regs.edi = env->regs[R_EDI] & 0xffffffff;
> + regs.esi = env->regs[R_ESI] & 0xffffffff;
> + regs.edx = env->regs[R_EDX] & 0xffffffff;
> + regs.ecx = env->regs[R_ECX] & 0xffffffff;
> + regs.ebx = env->regs[R_EBX] & 0xffffffff;
> + regs.eax = env->regs[R_EAX] & 0xffffffff;
> + regs.eip = env->eip & 0xffffffff;
> + regs.eflags = env->eflags & 0xffffffff;
> +
> + regs.cs = env->segs[R_CS].selector;
> + regs.__cs = 0;
> + regs.ss = env->segs[R_SS].selector;
> + regs.__ss = 0;
> + regs.ds = env->segs[R_DS].selector;
> + regs.__ds = 0;
> + regs.es = env->segs[R_ES].selector;
> + regs.__es = 0;
> + regs.fs = env->segs[R_FS].selector;
> + regs.__fs = 0;
> + regs.gs = env->segs[R_GS].selector;
> + regs.__gs = 0;
> +
> + descsz = 144; /* sizeof(prstatus_t) is 144 on x86 box */
> + note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
> + (descsz + 3) / 4) * 4;
> + note = g_malloc(note_size);
> +
> + memset(note, 0, note_size);
> + note->n_namesz = cpu_to_le32(name_size);
> + note->n_descsz = cpu_to_le32(descsz);
> + note->n_type = cpu_to_le32(NT_PRSTATUS);
> + buf = (char *)note;
> + buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
> + memcpy(buf, name, name_size);
> + buf += ((name_size + 3) / 4) * 4;
> + memcpy(buf + 24, &id, 4); /* pr_pid */
> + buf += descsz - sizeof(x86_user_regs_struct)-4;
> + memcpy(buf, ®s, sizeof(x86_user_regs_struct));
> +
> + lseek(fd, *offset, SEEK_SET);
> + ret = write(fd, note, note_size);
> + g_free(note);
> + if (ret < 0) {
> + return -1;
> + }
> +
> + *offset += note_size;
> +
> + return 0;
> +}
> +
> +static int x86_write_elf32_note(int fd, CPUState *env, int id,
> + target_phys_addr_t *offset)
> +{
> + x86_user_regs_struct regs;
> + Elf32_Nhdr *note;
> + char *buf;
> + int descsz, note_size, name_size = 5;
> + const char *name = "CORE";
> + int ret;
> +
> + regs.ebp = env->regs[R_EBP] & 0xffffffff;
> + regs.esp = env->regs[R_ESP] & 0xffffffff;
> + regs.edi = env->regs[R_EDI] & 0xffffffff;
> + regs.esi = env->regs[R_ESI] & 0xffffffff;
> + regs.edx = env->regs[R_EDX] & 0xffffffff;
> + regs.ecx = env->regs[R_ECX] & 0xffffffff;
> + regs.ebx = env->regs[R_EBX] & 0xffffffff;
> + regs.eax = env->regs[R_EAX] & 0xffffffff;
> + regs.eip = env->eip & 0xffffffff;
> + regs.eflags = env->eflags & 0xffffffff;
> +
> + regs.cs = env->segs[R_CS].selector;
> + regs.__cs = 0;
> + regs.ss = env->segs[R_SS].selector;
> + regs.__ss = 0;
> + regs.ds = env->segs[R_DS].selector;
> + regs.__ds = 0;
> + regs.es = env->segs[R_ES].selector;
> + regs.__es = 0;
> + regs.fs = env->segs[R_FS].selector;
> + regs.__fs = 0;
> + regs.gs = env->segs[R_GS].selector;
> + regs.__gs = 0;
> +
> + descsz = 144; /* sizeof(prstatus_t) is 144 on x86 box */
> + note_size = ((sizeof(Elf32_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
> + (descsz + 3) / 4) * 4;
> + note = g_malloc(note_size);
> +
> + memset(note, 0, note_size);
> + note->n_namesz = cpu_to_le32(name_size);
> + note->n_descsz = cpu_to_le32(descsz);
> + note->n_type = cpu_to_le32(NT_PRSTATUS);
> + buf = (char *)note;
> + buf += ((sizeof(Elf32_Nhdr) + 3) / 4) * 4;
> + memcpy(buf, name, name_size);
> + buf += ((name_size + 3) / 4) * 4;
> + memcpy(buf + 24, &id, 4); /* pr_pid */
> + buf += descsz - sizeof(x86_user_regs_struct)-4;
> + memcpy(buf, ®s, sizeof(x86_user_regs_struct));
> +
> + lseek(fd, *offset, SEEK_SET);
> + ret = write(fd, note, note_size);
> + g_free(note);
> + if (ret < 0) {
> + return -1;
> + }
> +
> + *offset += note_size;
> +
> + return 0;
> +}
> +
> +int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
> + target_phys_addr_t *offset)
> +{
> + int ret;
> +#ifdef TARGET_X86_64
> + bool lma = !!(first_cpu->hflags & HF_LMA_MASK);
> +
> + if (lma) {
> + ret = x86_64_write_elf64_note(fd, env, cpuid, offset);
> + } else {
> +#endif
> + ret = x86_write_elf64_note(fd, env, cpuid, offset);
> +#ifdef TARGET_X86_64
> + }
> +#endif
> +
> + return ret;
> +}
> +
> +int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
> + target_phys_addr_t *offset)
> +{
> + return x86_write_elf32_note(fd, env, cpuid, offset);
> +}
Minor nit: I think this wrapping is not needed, just fold
x86_write_elf32_note into this function.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
2012-02-09 3:24 ` [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping Wen Congyang
@ 2012-02-14 17:35 ` Jan Kiszka
2012-02-15 5:19 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 17:35 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:24, Wen Congyang wrote:
> Crash needs extra memory mapping to determine phys_base.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> cpu-all.h | 2 ++
> target-i386/arch-dump.c | 43 +++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 45 insertions(+), 0 deletions(-)
>
> diff --git a/cpu-all.h b/cpu-all.h
> index efb5ba3..290c43a 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -530,10 +530,12 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
> target_phys_addr_t *offset);
> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
> target_phys_addr_t *offset);
> +int cpu_add_extra_memory_mapping(MemoryMappingList *list);
> #else
> #define cpu_get_memory_mapping(list, env)
> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
> +#define cpu_add_extra_memory_mapping(list) ({ 0; })
> #endif
>
> #endif /* CPU_ALL_H */
> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
> index 4c0ff77..d96f6ae 100644
> --- a/target-i386/arch-dump.c
> +++ b/target-i386/arch-dump.c
> @@ -495,3 +495,46 @@ int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
> {
> return x86_write_elf32_note(fd, env, cpuid, offset);
> }
> +
> +/* This function is copied from crash */
And what does it do there and here? I suppose it is Linux-specific -
which kernel versions? This should be documented and encoded in the
function name.
> +static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
> +{
> + int i;
> + target_ulong kernel_base = -1;
> + target_ulong last, mask;
> +
> + for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
> + mask = ~((1LL << i) - 1);
> + *base_vaddr = env->idt.base & mask;
> + if (*base_vaddr == last) {
> + continue;
> + }
> +
> + kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
> + last = *base_vaddr;
> + }
> +
> + return kernel_base;
> +}
> +
> +int cpu_add_extra_memory_mapping(MemoryMappingList *list)
Again, what does "extra" mean? Probably guest-specific, no?
> +{
> +#ifdef TARGET_X86_64
> + target_phys_addr_t kernel_base = -1;
> + target_ulong base_vaddr;
> + bool lma = !!(first_cpu->hflags & HF_LMA_MASK);
> +
> + if (!lma) {
> + return 0;
> + }
> +
> + kernel_base = get_phys_base_addr(first_cpu, &base_vaddr);
> + if (kernel_base == -1) {
> + return -1;
> + }
> +
> + create_new_memory_mapping_head(list, kernel_base, base_vaddr,
> + TARGET_PAGE_SIZE);
> +#endif
> + return 0;
> +}
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info
2012-02-09 3:26 ` [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info Wen Congyang
@ 2012-02-14 17:39 ` Jan Kiszka
2012-02-15 3:30 ` Wen Congyang
2012-02-15 9:12 ` Peter Maydell
1 sibling, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 17:39 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:26, Wen Congyang wrote:
> Dump info contains: endianness, class and architecture. The next
> patch will use this information to create the vmcore.
>
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> cpu-all.h | 3 +++
> dump.h | 10 ++++++++++
> target-i386/arch-dump.c | 34 ++++++++++++++++++++++++++++++++++
> 3 files changed, 47 insertions(+), 0 deletions(-)
> create mode 100644 dump.h
>
> diff --git a/cpu-all.h b/cpu-all.h
> index 290c43a..268d1f6 100644
> --- a/cpu-all.h
> +++ b/cpu-all.h
> @@ -23,6 +23,7 @@
> #include "qemu-tls.h"
> #include "cpu-common.h"
> #include "memory_mapping.h"
> +#include "dump.h"
>
> /* some important defines:
> *
> @@ -531,11 +532,13 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
> target_phys_addr_t *offset);
> int cpu_add_extra_memory_mapping(MemoryMappingList *list);
> +int cpu_get_dump_info(ArchDumpInfo *info);
> #else
> #define cpu_get_memory_mapping(list, env)
> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
> #define cpu_add_extra_memory_mapping(list) ({ 0; })
> +#define cpu_get_dump_info(info) ({ -1; })
Please use static inlines where possible (applies to earlier patches as
well).
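A sketch of the suggested change: the macro stubs become static inlines, which keep argument type checking on targets without dump support. The type stand-ins below exist only to make the sketch self-contained; they are not the real QEMU definitions.

```c
/* Stand-ins for the QEMU types, only so this sketch compiles alone. */
typedef struct CPUState CPUState;
typedef struct ArchDumpInfo ArchDumpInfo;
typedef unsigned long target_phys_addr_t;

/* Instead of: #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
 * a static inline stub still type-checks its arguments: */
static inline int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
                                       target_phys_addr_t *offset)
{
    return -1;
}

static inline int cpu_get_dump_info(ArchDumpInfo *info)
{
    return -1;
}
```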
> #endif
>
> #endif /* CPU_ALL_H */
> diff --git a/dump.h b/dump.h
> new file mode 100644
> index 0000000..a36468b
> --- /dev/null
> +++ b/dump.h
> @@ -0,0 +1,10 @@
License header missing.
> +#ifndef DUMP_H
> +#define DUMP_H
> +
> +typedef struct ArchDumpInfo {
> + int d_machine; /* Architecture */
> + int d_endian; /* ELFDATA2LSB or ELFDATA2MSB */
> + int d_class; /* ELFCLASS32 or ELFCLASS64 */
> +} ArchDumpInfo;
> +
> +#endif
> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
> index d96f6ae..92a53bc 100644
> --- a/target-i386/arch-dump.c
> +++ b/target-i386/arch-dump.c
> @@ -15,6 +15,7 @@
>
> #include "cpu.h"
> #include "cpu-all.h"
> +#include "dump.h"
> #include "monitor.h"
>
> /* PAE Paging or IA-32e Paging */
> @@ -538,3 +539,36 @@ int cpu_add_extra_memory_mapping(MemoryMappingList *list)
> #endif
> return 0;
> }
> +
> +int cpu_get_dump_info(ArchDumpInfo *info)
> +{
> + bool lma = false;
> + RAMBlock *block;
> +
> +#ifdef TARGET_X86_64
> + lma = !!(first_cpu->hflags & HF_LMA_MASK);
> +#endif
> +
> + if (lma) {
> + info->d_machine = EM_X86_64;
> + } else {
> + info->d_machine = EM_386;
> + }
> + info->d_endian = ELFDATA2LSB;
> +
> + if (lma) {
> + info->d_class = ELFCLASS64;
> + } else {
> + info->d_class = ELFCLASS32;
> + }
> +
> + QLIST_FOREACH(block, &ram_list.blocks, next) {
> + if (!lma && (block->offset + block->length > UINT_MAX)) {
> + /* The memory size is greater than 4G */
> + info->d_class = ELFCLASS32;
Is that correct, or did you rather mean ELFCLASS64?
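For reference, the logic this question points at (and which note 4 in the cover letter, about elf64 vmcores for >4G 32-bit guests, supports) would pick ELFCLASS64 in that branch. A hedged sketch of the presumed intent, with local constants so it stands alone:

```c
#include <stdint.h>

#define ELFCLASS32 1
#define ELFCLASS64 2

/* Presumed intent: RAM beyond 4 GiB does not fit ELFCLASS32 program
 * headers, so a non-long-mode guest with more than 4G of RAM still
 * needs an ELFCLASS64 core (matching note 4 in the cover letter). */
int pick_elf_class(int long_mode, uint64_t ram_top)
{
    if (long_mode || ram_top > UINT32_MAX) {
        return ELFCLASS64;
    }
    return ELFCLASS32;
}
```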
> + break;
> + }
> + }
> +
> + return 0;
> +}
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-09 3:28 ` [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory Wen Congyang
@ 2012-02-14 17:59 ` Jan Kiszka
2012-02-15 3:44 ` Wen Congyang
2012-02-17 8:52 ` Wen Congyang
0 siblings, 2 replies; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 17:59 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:28, Wen Congyang wrote:
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> ---
> Makefile.target | 8 +-
> dump.c | 590 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> dump.h | 3 +
> hmp-commands.hx | 16 ++
> hmp.c | 9 +
> hmp.h | 1 +
> monitor.c | 3 +
> qapi-schema.json | 13 ++
> qmp-commands.hx | 26 +++
> 9 files changed, 665 insertions(+), 4 deletions(-)
> create mode 100644 dump.c
>
> diff --git a/Makefile.target b/Makefile.target
> index d6e5684..f39ce2f 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -112,7 +112,7 @@ $(call set-vpath, $(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR
> QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) -I$(SRC_PATH)/linux-user
> obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
> elfload.o linuxload.o uaccess.o gdbstub.o cpu-uname.o \
> - user-exec.o $(oslib-obj-y)
> + user-exec.o $(oslib-obj-y) dump.o
>
> obj-$(TARGET_HAS_BFLT) += flatload.o
>
> @@ -150,7 +150,7 @@ LDFLAGS+=-Wl,-segaddr,__STD_PROG_ZONE,0x1000 -image_base 0x0e000000
> LIBS+=-lmx
>
> obj-y = main.o commpage.o machload.o mmap.o signal.o syscall.o thunk.o \
> - gdbstub.o user-exec.o
> + gdbstub.o user-exec.o dump.o
>
> obj-i386-y += ioport-user.o
>
> @@ -172,7 +172,7 @@ $(call set-vpath, $(SRC_PATH)/bsd-user)
> QEMU_CFLAGS+=-I$(SRC_PATH)/bsd-user -I$(SRC_PATH)/bsd-user/$(TARGET_ARCH)
>
> obj-y = main.o bsdload.o elfload.o mmap.o signal.o strace.o syscall.o \
> - gdbstub.o uaccess.o user-exec.o
> + gdbstub.o uaccess.o user-exec.o dump.o
>
> obj-i386-y += ioport-user.o
>
> @@ -188,7 +188,7 @@ endif #CONFIG_BSD_USER
> # System emulator target
> ifdef CONFIG_SOFTMMU
>
> -obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
> +obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o dump.o
> # virtio has to be here due to weird dependency between PCI and virtio-net.
> # need to fix this properly
> obj-$(CONFIG_NO_PCI) += pci-stub.o
> diff --git a/dump.c b/dump.c
> new file mode 100644
> index 0000000..a0e8b86
> --- /dev/null
> +++ b/dump.c
> @@ -0,0 +1,590 @@
> +/*
> + * QEMU dump
> + *
> + * Copyright Fujitsu, Corp. 2011
> + *
> + * Authors:
> + * Wen Congyang <wency@cn.fujitsu.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include <unistd.h>
> +#include <elf.h>
> +#include <sys/procfs.h>
> +#include <glib.h>
> +#include "cpu.h"
> +#include "cpu-all.h"
> +#include "targphys.h"
> +#include "monitor.h"
> +#include "kvm.h"
> +#include "dump.h"
> +#include "sysemu.h"
> +#include "bswap.h"
> +#include "memory_mapping.h"
> +#include "error.h"
> +#include "qmp-commands.h"
> +
> +#define CPU_CONVERT_TO_TARGET16(val) \
> +({ \
> + uint16_t _val = (val); \
> + if (endian == ELFDATA2LSB) { \
> + _val = cpu_to_le16(_val); \
> + } else {\
> + _val = cpu_to_be16(_val); \
> + } \
> + _val; \
> +})
> +
> +#define CPU_CONVERT_TO_TARGET32(val) \
> +({ \
> + uint32_t _val = (val); \
> + if (endian == ELFDATA2LSB) { \
> + _val = cpu_to_le32(_val); \
> + } else {\
> + _val = cpu_to_be32(_val); \
> + } \
> + _val; \
> +})
> +
> +#define CPU_CONVERT_TO_TARGET64(val) \
> +({ \
> + uint64_t _val = (val); \
> + if (endian == ELFDATA2LSB) { \
> + _val = cpu_to_le64(_val); \
> + } else {\
> + _val = cpu_to_be64(_val); \
> + } \
> + _val; \
> +})
static inline functions, please.
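A sketch of what the inline replacement for CPU_CONVERT_TO_TARGET16 could look like. The runtime endianness probe here is only to keep the example self-contained; QEMU's cpu_to_le16/cpu_to_be16 handle this portably.

```c
#include <stdint.h>

#define ELFDATA2LSB 1
#define ELFDATA2MSB 2

uint16_t bswap16_val(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Convert a host value to the dump's endianness: swap only when the
 * host byte order differs from the requested target order. */
uint16_t convert_to_target16(uint16_t val, int endian)
{
    const union { uint16_t u; uint8_t b[2]; } probe = { 1 };
    int host_is_le = (probe.b[0] == 1);

    if ((endian == ELFDATA2LSB) == host_is_le) {
        return val;                 /* host order already matches */
    }
    return bswap16_val(val);
}
```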
> +
> +enum {
> + DUMP_STATE_ERROR,
> + DUMP_STATE_SETUP,
> + DUMP_STATE_CANCELLED,
> + DUMP_STATE_ACTIVE,
> + DUMP_STATE_COMPLETED,
> +};
> +
> +typedef struct DumpState {
> + ArchDumpInfo dump_info;
> + MemoryMappingList list;
> + int phdr_num;
> + int state;
> + char *error;
> + int fd;
> + target_phys_addr_t memory_offset;
> +} DumpState;
> +
> +static DumpState *dump_get_current(void)
> +{
> + static DumpState current_dump = {
> + .state = DUMP_STATE_SETUP,
> + };
> +
> + return ¤t_dump;
> +}
> +
> +static int dump_cleanup(DumpState *s)
> +{
> + int ret = 0;
> +
> + free_memory_mapping_list(&s->list);
> + if (s->fd != -1) {
> + close(s->fd);
> + s->fd = -1;
> + }
> +
> + return ret;
> +}
> +
> +static void dump_error(DumpState *s, const char *reason)
> +{
> + s->state = DUMP_STATE_ERROR;
> + s->error = g_strdup(reason);
> + dump_cleanup(s);
> +}
> +
> +static inline int cpuid(CPUState *env)
> +{
> +#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USE_NPTL)
> + return env->host_tid;
Curious: does this command already work with user-mode guests?
> +#else
> + return env->cpu_index + 1;
> +#endif
> +}
There is gdb_id in gdbstub. It should be made generally available and
reused here.
> +
> +static int write_elf64_header(DumpState *s)
> +{
> + Elf64_Ehdr elf_header;
> + int ret;
> + int endian = s->dump_info.d_endian;
> +
> + memset(&elf_header, 0, sizeof(Elf64_Ehdr));
> + memcpy(&elf_header, ELFMAG, 4);
> + elf_header.e_ident[EI_CLASS] = ELFCLASS64;
> + elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
> + elf_header.e_ident[EI_VERSION] = EV_CURRENT;
> + elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
> + elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
> + elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
> + elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
> + elf_header.e_phoff = CPU_CONVERT_TO_TARGET64(sizeof(Elf64_Ehdr));
> + elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf64_Phdr));
> + elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
> +
> + lseek(s->fd, 0, SEEK_SET);
> + ret = write(s->fd, &elf_header, sizeof(elf_header));
> + if (ret < 0) {
> + dump_error(s, "dump: failed to write elf header.\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int write_elf32_header(DumpState *s)
> +{
> + Elf32_Ehdr elf_header;
> + int ret;
> + int endian = s->dump_info.d_endian;
> +
> + memset(&elf_header, 0, sizeof(Elf32_Ehdr));
> + memcpy(&elf_header, ELFMAG, 4);
> + elf_header.e_ident[EI_CLASS] = ELFCLASS32;
> + elf_header.e_ident[EI_DATA] = endian;
> + elf_header.e_ident[EI_VERSION] = EV_CURRENT;
> + elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
> + elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
> + elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
> + elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
> + elf_header.e_phoff = CPU_CONVERT_TO_TARGET32(sizeof(Elf32_Ehdr));
> + elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf32_Phdr));
> + elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
> +
> + lseek(s->fd, 0, SEEK_SET);
> + ret = write(s->fd, &elf_header, sizeof(elf_header));
> + if (ret < 0) {
> + dump_error(s, "dump: failed to write elf header.\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
> + int phdr_index, target_phys_addr_t offset)
> +{
> + Elf64_Phdr phdr;
> + off_t phdr_offset;
> + int ret;
> + int endian = s->dump_info.d_endian;
> +
> + memset(&phdr, 0, sizeof(Elf64_Phdr));
> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
> + phdr.p_offset = CPU_CONVERT_TO_TARGET64(offset);
> + phdr.p_paddr = CPU_CONVERT_TO_TARGET64(memory_mapping->phys_addr);
> + if (offset == -1) {
> + phdr.p_filesz = 0;
> + } else {
> + phdr.p_filesz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
> + }
> + phdr.p_memsz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
> + phdr.p_vaddr = CPU_CONVERT_TO_TARGET64(memory_mapping->virt_addr);
> +
> + phdr_offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*phdr_index;
> + lseek(s->fd, phdr_offset, SEEK_SET);
> + ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
> + if (ret < 0) {
> + dump_error(s, "dump: failed to write program header table.\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
> + int phdr_index, target_phys_addr_t offset)
> +{
> + Elf32_Phdr phdr;
> + off_t phdr_offset;
> + int ret;
> + int endian = s->dump_info.d_endian;
> +
> + memset(&phdr, 0, sizeof(Elf32_Phdr));
> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
> + phdr.p_offset = CPU_CONVERT_TO_TARGET32(offset);
> + phdr.p_paddr = CPU_CONVERT_TO_TARGET32(memory_mapping->phys_addr);
> + if (offset == -1) {
> + phdr.p_filesz = 0;
> + } else {
> + phdr.p_filesz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
> + }
> + phdr.p_memsz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
> + phdr.p_vaddr = CPU_CONVERT_TO_TARGET32(memory_mapping->virt_addr);
> +
> + phdr_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*phdr_index;
> + lseek(s->fd, phdr_offset, SEEK_SET);
> + ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
> + if (ret < 0) {
> + dump_error(s, "dump: failed to write program header table.\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int write_elf64_notes(DumpState *s, int phdr_index,
> + target_phys_addr_t *offset)
> +{
> + CPUState *env;
> + int ret;
> + target_phys_addr_t begin = *offset;
> + Elf64_Phdr phdr;
> + off_t phdr_offset;
> + int id;
> + int endian = s->dump_info.d_endian;
> +
> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
> + id = cpuid(env);
> + ret = cpu_write_elf64_note(s->fd, env, id, offset);
> + if (ret < 0) {
> + dump_error(s, "dump: failed to write elf notes.\n");
> + return -1;
> + }
> + }
> +
> + memset(&phdr, 0, sizeof(Elf64_Phdr));
> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
> + phdr.p_offset = CPU_CONVERT_TO_TARGET64(begin);
> + phdr.p_paddr = 0;
> + phdr.p_filesz = CPU_CONVERT_TO_TARGET64(*offset - begin);
> + phdr.p_memsz = CPU_CONVERT_TO_TARGET64(*offset - begin);
> + phdr.p_vaddr = 0;
> +
> + phdr_offset = sizeof(Elf64_Ehdr);
> + lseek(s->fd, phdr_offset, SEEK_SET);
> + ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
> + if (ret < 0) {
> + dump_error(s, "dump: failed to write program header table.\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int write_elf32_notes(DumpState *s, int phdr_index,
> + target_phys_addr_t *offset)
> +{
> + CPUState *env;
> + int ret;
> + target_phys_addr_t begin = *offset;
> + Elf32_Phdr phdr;
> + off_t phdr_offset;
> + int id;
> + int endian = s->dump_info.d_endian;
> +
> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
> + id = cpuid(env);
> + ret = cpu_write_elf32_note(s->fd, env, id, offset);
> + if (ret < 0) {
> + dump_error(s, "dump: failed to write elf notes.\n");
> + return -1;
> + }
> + }
> +
> + memset(&phdr, 0, sizeof(Elf32_Phdr));
> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
> + phdr.p_offset = CPU_CONVERT_TO_TARGET32(begin);
> + phdr.p_paddr = 0;
> + phdr.p_filesz = CPU_CONVERT_TO_TARGET32(*offset - begin);
> + phdr.p_memsz = CPU_CONVERT_TO_TARGET32(*offset - begin);
> + phdr.p_vaddr = 0;
> +
> + phdr_offset = sizeof(Elf32_Ehdr);
> + lseek(s->fd, phdr_offset, SEEK_SET);
> + ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
> + if (ret < 0) {
> + dump_error(s, "dump: failed to write program header table.\n");
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +static int write_data(DumpState *s, void *buf, int length,
> + target_phys_addr_t *offset)
> +{
> + int ret;
> +
> + lseek(s->fd, *offset, SEEK_SET);
> + ret = write(s->fd, buf, length);
> + if (ret < 0) {
> + dump_error(s, "dump: failed to save memory.\n");
> + return -1;
> + }
> +
> + *offset += length;
> + return 0;
> +}
> +
> +/* write the memory to vmcore. 1 page per I/O. */
> +static int write_memory(DumpState *s, RAMBlock *block,
> + target_phys_addr_t *offset)
> +{
> + int i, ret;
> +
> + for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
> + ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
> + TARGET_PAGE_SIZE, offset);
> + if (ret < 0) {
> + return -1;
> + }
> + }
> +
> + if ((block->length % TARGET_PAGE_SIZE) != 0) {
> + ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
> + block->length % TARGET_PAGE_SIZE, offset);
> + if (ret < 0) {
> + return -1;
> + }
> + }
> +
> + return 0;
> +}
> +
> +/* get the memory's offset in the vmcore */
> +static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
> + target_phys_addr_t memory_offset)
> +{
> + RAMBlock *block;
> + target_phys_addr_t offset = memory_offset;
> +
> + QLIST_FOREACH(block, &ram_list.blocks, next) {
> + if (phys_addr >= block->offset &&
> + phys_addr < block->offset + block->length) {
> + return phys_addr - block->offset + offset;
> + }
> + offset += block->length;
> + }
> +
> + return -1;
> +}
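The lookup above relies on RAM blocks being written back to back after `memory_offset`. A self-contained model of that walk (mock block type and names are illustrative, not QEMU's):

```c
#include <stdint.h>

/* Model of get_offset(): a guest-physical address maps to the file
 * position of its block plus the offset inside that block. */
struct ram_block_model { uint64_t offset; uint64_t length; };

uint64_t model_get_offset(const struct ram_block_model *blocks, int n,
                          uint64_t phys_addr, uint64_t memory_offset)
{
    uint64_t off = memory_offset;
    int i;

    for (i = 0; i < n; i++) {
        if (phys_addr >= blocks[i].offset &&
            phys_addr < blocks[i].offset + blocks[i].length) {
            return phys_addr - blocks[i].offset + off;
        }
        off += blocks[i].length;    /* skip this block's file span */
    }
    return (uint64_t)-1;            /* address not backed by RAM */
}
```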
> +
> +static DumpState *dump_init(int fd, Error **errp)
> +{
> + CPUState *env;
> + DumpState *s = dump_get_current();
> + int ret;
> +
> + vm_stop(RUN_STATE_PAUSED);
I would save the current vm state first and restore it when finished.
> + s->state = DUMP_STATE_SETUP;
> + if (s->error) {
> + g_free(s->error);
> + s->error = NULL;
> + }
> + s->fd = fd;
> +
> + /*
> + * get dump info: endian, class and architecture.
> + * If the target architecture is not supported, cpu_get_dump_info() will
> + * return -1.
> + *
> + * if we use kvm, we should synchronize the register before we get dump
> + * info.
> + */
> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
> + cpu_synchronize_state(env);
> + }
> + ret = cpu_get_dump_info(&s->dump_info);
> + if (ret < 0) {
> + error_set(errp, QERR_UNSUPPORTED);
> + return NULL;
> + }
> +
> + /* get memory mapping */
> + s->list.num = 0;
> + QTAILQ_INIT(&s->list.head);
> + get_memory_mapping(&s->list);
> +
> + /* crash needs extra memory mapping to determine phys_base. */
> + ret = cpu_add_extra_memory_mapping(&s->list);
> + if (ret < 0) {
> + error_set(errp, QERR_UNDEFINED_ERROR);
> + return NULL;
> + }
> +
> + /*
> + * calculate phdr_num
> + *
> + * the type of phdr->num is uint16_t, so we should avoid overflow
> + */
> + s->phdr_num = 1; /* PT_NOTE */
> + if (s->list.num > (1 << 16) - 2) {
> + s->phdr_num = (1 << 16) - 1;
> + } else {
> + s->phdr_num += s->list.num;
> + }
> +
> + return s;
> +}
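The phdr_num clamp in dump_init guards the 16-bit e_phnum field: one PT_NOTE plus one PT_LOAD per mapping, capped at 0xffff. Restated on its own (the helper name is illustrative):

```c
#include <stdint.h>

/* e_phnum is a uint16_t: one PT_NOTE plus one PT_LOAD per mapping,
 * clamped so the count cannot overflow 16 bits. In full ELF, 0xffff
 * (PN_XNUM) means the real count lives in section 0's sh_info. */
uint16_t clamp_phdr_num(uint64_t num_mappings)
{
    if (num_mappings > (uint64_t)(1 << 16) - 2) {
        return 0xffff;
    }
    return (uint16_t)(1 + num_mappings);
}
```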
> +
> +/* write elf header, PT_NOTE and elf note to vmcore. */
> +static int dump_begin(DumpState *s)
> +{
> + target_phys_addr_t offset;
> + int ret;
> +
> + s->state = DUMP_STATE_ACTIVE;
> +
> + /*
> + * the vmcore's format is:
> + * --------------
> + * | elf header |
> + * --------------
> + * | PT_NOTE |
> + * --------------
> + * | PT_LOAD |
> + * --------------
> + * | ...... |
> + * --------------
> + * | PT_LOAD |
> + * --------------
> + * | elf note |
> + * --------------
> + * | memory |
> + * --------------
> + *
> + * we only know where the memory is saved after we write elf note into
> + * vmcore.
> + */
> +
> + /* write elf header to vmcore */
> + if (s->dump_info.d_class == ELFCLASS64) {
> + ret = write_elf64_header(s);
> + } else {
> + ret = write_elf32_header(s);
> + }
> + if (ret < 0) {
> + return -1;
> + }
> +
> + /* write elf notes to vmcore */
> + if (s->dump_info.d_class == ELFCLASS64) {
> + offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*s->phdr_num;
> + ret = write_elf64_notes(s, 0, &offset);
> + } else {
> + offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*s->phdr_num;
> + ret = write_elf32_notes(s, 0, &offset);
> + }
> +
> + if (ret < 0) {
> + return -1;
> + }
> +
> + s->memory_offset = offset;
> + return 0;
> +}
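To illustrate the offset arithmetic above: the notes start right after the ELF
header and the program header table, so the initial offset is fixed once
phdr_num is known. A minimal sketch (assuming the structure sizes from Linux
<elf.h>; function names are illustrative, not the patch's):

```c
#include <elf.h>
#include <stdint.h>

/* Where dump_begin() starts writing the ELF notes: past the ELF header
 * and the program header table. */
static uint64_t elf64_note_offset(uint16_t phdr_num)
{
    return sizeof(Elf64_Ehdr) + (uint64_t)sizeof(Elf64_Phdr) * phdr_num;
}

static uint64_t elf32_note_offset(uint16_t phdr_num)
{
    return sizeof(Elf32_Ehdr) + (uint64_t)sizeof(Elf32_Phdr) * phdr_num;
}
```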
> +
> +/* write PT_LOAD to vmcore */
> +static int dump_completed(DumpState *s)
> +{
> + target_phys_addr_t offset;
> + MemoryMapping *memory_mapping;
> + int phdr_index = 1, ret;
> +
> + QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
> + offset = get_offset(memory_mapping->phys_addr, s->memory_offset);
> + if (s->dump_info.d_class == ELFCLASS64) {
> + ret = write_elf64_load(s, memory_mapping, phdr_index++, offset);
> + } else {
> + ret = write_elf32_load(s, memory_mapping, phdr_index++, offset);
> + }
> + if (ret < 0) {
> + return -1;
> + }
> + }
> +
> + s->state = DUMP_STATE_COMPLETED;
> + dump_cleanup(s);
> + return 0;
> +}
> +
> +/* write all memory to vmcore */
> +static int dump_iterate(DumpState *s)
> +{
> + RAMBlock *block;
> + target_phys_addr_t offset = s->memory_offset;
> + int ret;
> +
> + /* write all memory to vmcore */
> + QLIST_FOREACH(block, &ram_list.blocks, next) {
> + ret = write_memory(s, block, &offset);
> + if (ret < 0) {
> + return -1;
> + }
> + }
> +
> + return dump_completed(s);
> +}
> +
> +static int create_vmcore(DumpState *s)
> +{
> + int ret;
> +
> + ret = dump_begin(s);
> + if (ret < 0) {
> + return -1;
> + }
> +
> + ret = dump_iterate(s);
> + if (ret < 0) {
> + return -1;
> + }
> +
> + return 0;
> +}
> +
> +void qmp_dump(const char *file, Error **errp)
> +{
> + const char *p;
> + int fd = -1;
> + DumpState *s;
> +
> +#if !defined(WIN32)
> + if (strstart(file, "fd:", &p)) {
> + fd = qemu_get_fd(p);
> + if (fd == -1) {
> + error_set(errp, QERR_FD_NOT_FOUND, p);
> + return;
> + }
> + }
> +#endif
> +
> + if (strstart(file, "file:", &p)) {
> + fd = open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR);
> + if (fd < 0) {
> + error_set(errp, QERR_OPEN_FILE_FAILED, p);
> + return;
> + }
> + }
> +
> + if (fd == -1) {
> + error_set(errp, QERR_INVALID_PARAMETER, "file");
> + return;
> + }
> +
> + s = dump_init(fd, errp);
> + if (!s) {
> + return;
> + }
> +
> + if (create_vmcore(s) < 0) {
> + error_set(errp, QERR_IO_ERROR);
> + }
> +
> + return;
> +}
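The "fd:" / "file:" dispatch above can be sketched standalone (str_prefix() is
a local stand-in for QEMU's strstart(); classify_dump_target() is illustrative
and not part of the patch):

```c
#include <string.h>
#include <stdbool.h>

/* Local stand-in for QEMU's strstart(): on match, *rest points past the
 * prefix. */
static bool str_prefix(const char *str, const char *prefix, const char **rest)
{
    size_t len = strlen(prefix);

    if (strncmp(str, prefix, len) != 0) {
        return false;
    }
    if (rest) {
        *rest = str + len;
    }
    return true;
}

/* Returns 0 for an fd name, 1 for a file path, -1 for an invalid argument,
 * mirroring the branches in qmp_dump(). */
static int classify_dump_target(const char *file, const char **arg)
{
    if (str_prefix(file, "fd:", arg)) {
        return 0;
    }
    if (str_prefix(file, "file:", arg)) {
        return 1;
    }
    return -1;
}
```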
> diff --git a/dump.h b/dump.h
> index a36468b..b413d18 100644
> --- a/dump.h
> +++ b/dump.h
> @@ -1,6 +1,9 @@
> #ifndef DUMP_H
> #define DUMP_H
>
> +#include "qdict.h"
> +#include "error.h"
> +
This looks stray. Nothing is added to this header that requires those
includes.
> typedef struct ArchDumpInfo {
> int d_machine; /* Architecture */
> int d_endian; /* ELFDATA2LSB or ELFDATA2MSB */
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index 573b823..6cfb678 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -867,6 +867,22 @@ new parameters (if specified) once the vm migration finished successfully.
> ETEXI
>
> {
> + .name = "dump",
> + .args_type = "file:s",
> + .params = "file",
> + .help = "dump to file",
> + .user_print = monitor_user_noop,
> + .mhandler.cmd = hmp_dump,
> + },
> +
> +
> +STEXI
> +@item dump @var{file}
> +@findex dump
> +Dump to @var{file}.
That's way too brief! :) It should state the format, mention potential
architecture limitations, and explain that the output can be processed
with crash or gdb.
> +ETEXI
> +
> + {
> .name = "snapshot_blkdev",
> .args_type = "device:B,snapshot-file:s?,format:s?",
> .params = "device [new-image-file] [format]",
> diff --git a/hmp.c b/hmp.c
> index 8ff8c94..1a69857 100644
> --- a/hmp.c
> +++ b/hmp.c
> @@ -851,3 +851,12 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
>
> hmp_handle_error(mon, &error);
> }
> +
> +void hmp_dump(Monitor *mon, const QDict *qdict)
> +{
> + Error *errp = NULL;
> + const char *file = qdict_get_str(qdict, "file");
> +
> + qmp_dump(file, &errp);
> + hmp_handle_error(mon, &errp);
> +}
> diff --git a/hmp.h b/hmp.h
> index 18eecbd..66984c5 100644
> --- a/hmp.h
> +++ b/hmp.h
> @@ -58,5 +58,6 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
> void hmp_block_stream(Monitor *mon, const QDict *qdict);
> void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
> void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
> +void hmp_dump(Monitor *mon, const QDict *qdict);
>
> #endif
> diff --git a/monitor.c b/monitor.c
> index 7e72739..18e1ac7 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -73,6 +73,9 @@
> #endif
> #include "hw/lm32_pic.h"
>
> +/* for dump */
> +#include "dump.h"
> +
> //#define DEBUG
> //#define DEBUG_COMPLETION
>
> diff --git a/qapi-schema.json b/qapi-schema.json
> index d02ee86..1013ae6 100644
> --- a/qapi-schema.json
> +++ b/qapi-schema.json
> @@ -1582,3 +1582,16 @@
> { 'command': 'qom-list-types',
> 'data': { '*implements': 'str', '*abstract': 'bool' },
> 'returns': [ 'ObjectTypeInfo' ] }
> +
> +##
> +# @dump
> +#
> +# Dump guest's memory to vmcore.
> +#
> +# @file: the filename or file descriptor of the vmcore.
> +#
> +# Returns: nothing on success
> +#
> +# Since: 1.1
> +##
> +{ 'command': 'dump', 'data': { 'file': 'str' } }
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index b5e2ab8..52d3d3b 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -566,6 +566,32 @@ Example:
> EQMP
>
> {
> + .name = "dump",
> + .args_type = "file:s",
> + .params = "file",
> + .help = "dump to file",
> + .user_print = monitor_user_noop,
> + .mhandler.cmd_new = qmp_marshal_input_dump,
> + },
> +
> +SQMP
> +dump
> +
> +
> +Dump to file.
> +
> +Arguments:
> +
> +- "file": Destination file (json-string)
The code looks like it supports both file names and file descriptors,
no? Same for HMP.
> +
> +Example:
> +
> +-> { "execute": "dump", "arguments": { "file": "fd:dump" } }
> +<- { "return": {} }
> +
> +EQMP
> +
> + {
> .name = "netdev_add",
> .args_type = "netdev:O",
> .params = "[user|tap|socket],id=str[,prop=value][,...]",
> --
> 1.7.1
>
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-09 3:28 ` [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background Wen Congyang
@ 2012-02-14 18:05 ` Jan Kiszka
2012-02-14 18:27 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 18:05 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:28, Wen Congyang wrote:
> The new monitor command dump may take a long time to finish. So we need to
> run it in the background.
How does it work? Like live migration, i.e. you retransmit (overwrite)
already written but then dirtied pages? Hmm... no.
What does background mean then? What is the use case? What if the user
decides to resume the vm?
Jan
* Re: [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-14 18:05 ` Jan Kiszka
@ 2012-02-14 18:27 ` Jan Kiszka
2012-02-15 3:47 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 18:27 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-14 19:05, Jan Kiszka wrote:
> On 2012-02-09 04:28, Wen Congyang wrote:
>> The new monitor command dump may take a long time to finish. So we need to
>> run it in the background.
>
> How does it work? Like live migration, i.e. you retransmit (overwrite)
> already written but then dirtied pages? Hmm... no.
>
> What does background mean then? What is the use case? What if the user
> decides to resume the vm?
OK, that is addressed in patch 15! I would suggest merging it into this
patch. It makes sense to handle that case gracefully right from the
beginning.
OK, now I have another question: What is the point of rate-limiting
the dump? The guest is not running, thus not competing for bandwidth.
Jan
* Re: [Qemu-devel] [RFC][PATCH 16/16 v6] allow user to dump a fraction of the memory
2012-02-09 3:34 ` [Qemu-devel] [RFC][PATCH 16/16 v6] allow user to dump a fraction of the memory Wen Congyang
@ 2012-02-14 18:27 ` Jan Kiszka
0 siblings, 0 replies; 68+ messages in thread
From: Jan Kiszka @ 2012-02-14 18:27 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-09 04:34, Wen Congyang wrote:
> diff --git a/hmp-commands.hx b/hmp-commands.hx
> index a026905..388b9ac 100644
> --- a/hmp-commands.hx
> +++ b/hmp-commands.hx
> @@ -868,9 +868,11 @@ ETEXI
>
> {
> .name = "dump",
> - .args_type = "detach:-d,file:s",
> + .args_type = "detach:-d,file:s,begin:i?,length:i?",
> .params = "file",
You forgot to update params.
> - .help = "dump to file (using -d to not wait for completion)",
> + .help = "dump to file (using -d to not wait for completion)"
> + "\n\t\t\t begin(optional): the starting physical address"
> + "\n\t\t\t length(optional): the memory size, in bytes",
Is it [begin [length]] or [begin length]? If you specify params, you
don't need to state optional here.
Same for QMP.
I'm short on time, thus didn't look at code in patches >= 10.
Jan
* Re: [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor()
2012-02-14 16:19 ` Jan Kiszka
@ 2012-02-15 2:54 ` Wen Congyang
2012-02-15 8:51 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 2:54 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 12:19 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:19, Wen Congyang wrote:
>> Sync command needs these two APIs to suspend/resume monitor.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> monitor.c | 27 +++++++++++++++++++++++++++
>> monitor.h | 2 ++
>> 2 files changed, 29 insertions(+), 0 deletions(-)
>>
>> diff --git a/monitor.c b/monitor.c
>> index 11639b1..7e72739 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -4442,6 +4442,26 @@ static void monitor_command_cb(Monitor *mon, const char *cmdline, void *opaque)
>> monitor_resume(mon);
>> }
>>
>> +int qemu_suspend_monitor(const char *fmt, ...)
>> +{
>> + int ret;
>> +
>> + if (cur_mon) {
>> + ret = monitor_suspend(cur_mon);
>> + } else {
>> + ret = -ENOTTY;
>> + }
>> +
>> + if (ret < 0 && fmt) {
>> + va_list ap;
>> + va_start(ap, fmt);
>> + monitor_vprintf(cur_mon, fmt, ap);
>> + va_end(ap);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> int monitor_suspend(Monitor *mon)
>> {
>> if (!mon->rs)
>> @@ -4450,6 +4470,13 @@ int monitor_suspend(Monitor *mon)
>> return 0;
>> }
>>
>> +void qemu_resume_monitor(void)
>> +{
>> + if (cur_mon) {
>> + monitor_resume(cur_mon);
>> + }
>> +}
>> +
>> void monitor_resume(Monitor *mon)
>> {
>> if (!mon->rs)
>> diff --git a/monitor.h b/monitor.h
>> index 58109af..60a1e17 100644
>> --- a/monitor.h
>> +++ b/monitor.h
>> @@ -46,7 +46,9 @@ int monitor_cur_is_qmp(void);
>> void monitor_protocol_event(MonitorEvent event, QObject *data);
>> void monitor_init(CharDriverState *chr, int flags);
>>
>> +int qemu_suspend_monitor(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
>> int monitor_suspend(Monitor *mon);
>> +void qemu_resume_monitor(void);
>> void monitor_resume(Monitor *mon);
>>
>> int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
>
> I don't see any added value in this API, specifically as it is built on
> top of cur_mon. Just use the existing services like the migration code
> does. If you properly pass down the monitor reference from the command
> to the suspend and store what monitor you suspended, all should be fine.
This API is like qemu_get_fd() which is not merged into upstream qemu.
I need this API because I cannot use monitor in qapi command.
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 02/16 v6] Add API to create memory mapping list
2012-02-14 16:39 ` Jan Kiszka
@ 2012-02-15 3:00 ` Wen Congyang
0 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 3:00 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 12:39 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:20, Wen Congyang wrote:
>> The memory mapping list stores virtual-to-physical address mappings.
>> The following patch will use this information to create PT_LOAD entries in the vmcore.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> Makefile.target | 1 +
>> memory_mapping.c | 130 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> memory_mapping.h | 38 ++++++++++++++++
>> 3 files changed, 169 insertions(+), 0 deletions(-)
>> create mode 100644 memory_mapping.c
>> create mode 100644 memory_mapping.h
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index 68481a3..e35e464 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -200,6 +200,7 @@ obj-$(CONFIG_KVM) += kvm.o kvm-all.o
>> obj-$(CONFIG_NO_KVM) += kvm-stub.o
>> obj-$(CONFIG_VGA) += vga.o
>> obj-y += memory.o savevm.o
>> +obj-y += memory_mapping.o
>> LIBS+=-lz
>>
>> obj-i386-$(CONFIG_KVM) += hyperv.o
>> diff --git a/memory_mapping.c b/memory_mapping.c
>> new file mode 100644
>> index 0000000..d83b7d7
>> --- /dev/null
>> +++ b/memory_mapping.c
>> @@ -0,0 +1,130 @@
>> +/*
>> + * QEMU memory mapping
>> + *
>> + * Copyright Fujitsu, Corp. 2011
>> + *
>> + * Authors:
>> + * Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>> + * the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "cpu.h"
>> +#include "cpu-all.h"
>> +#include "memory_mapping.h"
>> +
>> +static MemoryMapping *last_mapping;
>> +
>> +static void create_new_memory_mapping(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length)
>> +{
>> + MemoryMapping *memory_mapping, *p;
>> +
>> + memory_mapping = g_malloc(sizeof(MemoryMapping));
>> + memory_mapping->phys_addr = phys_addr;
>> + memory_mapping->virt_addr = virt_addr;
>> + memory_mapping->length = length;
>> + last_mapping = memory_mapping;
>> + list->num++;
>> + QTAILQ_FOREACH(p, &list->head, next) {
>> + if (p->phys_addr >= memory_mapping->phys_addr) {
>> + QTAILQ_INSERT_BEFORE(p, memory_mapping, next);
>> + return;
>> + }
>> + }
>> + QTAILQ_INSERT_TAIL(&list->head, memory_mapping, next);
>> + return;
>> +}
>> +
>> +void create_new_memory_mapping_head(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length)
>> +{
>> + MemoryMapping *memory_mapping;
>> +
>> + memory_mapping = g_malloc(sizeof(MemoryMapping));
>> + memory_mapping->phys_addr = phys_addr;
>> + memory_mapping->virt_addr = virt_addr;
>> + memory_mapping->length = length;
>> + last_mapping = memory_mapping;
>> + list->num++;
>> + QTAILQ_INSERT_HEAD(&list->head, memory_mapping, next);
>> + return;
>> +}
>> +
>> +void add_to_memory_mapping(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length)
>> +{
>> + MemoryMapping *memory_mapping;
>> +
>> + if (QTAILQ_EMPTY(&list->head)) {
>> + create_new_memory_mapping(list, phys_addr, virt_addr, length);
>> + return;
>> + }
>> +
>> + if (last_mapping) {
>> + if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
>> + (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
>> + last_mapping->length += length;
>> + return;
>> + }
>> + }
>> +
>> + QTAILQ_FOREACH(memory_mapping, &list->head, next) {
>> + last_mapping = memory_mapping;
>> + if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
>> + (virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
>> + last_mapping->length += length;
>> + return;
>> + }
>> +
>> + if (!(phys_addr >= (last_mapping->phys_addr)) ||
>> + !(phys_addr < (last_mapping->phys_addr + last_mapping->length))) {
>> + /* last_mapping does not contain this region */
>> + continue;
>> + }
>> + if (!(virt_addr >= (last_mapping->virt_addr)) ||
>> + !(virt_addr < (last_mapping->virt_addr + last_mapping->length))) {
>> + /* last_mapping does not contain this region */
>> + continue;
>> + }
>> + if ((virt_addr - last_mapping->virt_addr) !=
>> + (phys_addr - last_mapping->phys_addr)) {
>> + /*
>> + * last_mapping contains this region, but we should create another
>> + * mapping region.
>> + */
>> + break;
>> + }
>> +
>> + /* merge this region into last_mapping */
>> + if ((virt_addr + length) >
>> + (last_mapping->virt_addr + last_mapping->length)) {
>> + last_mapping->length = virt_addr + length - last_mapping->virt_addr;
>> + }
>> + return;
>> + }
>> +
>> + /* this region cannot be merged into any existing memory mapping. */
>> + create_new_memory_mapping(list, phys_addr, virt_addr, length);
>> + return;
>> +}
>> +
>> +void free_memory_mapping_list(MemoryMappingList *list)
>> +{
>> + MemoryMapping *p, *q;
>> +
>> + QTAILQ_FOREACH_SAFE(p, &list->head, next, q) {
>> + QTAILQ_REMOVE(&list->head, p, next);
>> + g_free(p);
>> + }
>> +
>> + list->num = 0;
>> +}
>> diff --git a/memory_mapping.h b/memory_mapping.h
>> new file mode 100644
>> index 0000000..a4b1532
>> --- /dev/null
>> +++ b/memory_mapping.h
>> @@ -0,0 +1,38 @@
>> +#ifndef MEMORY_MAPPING_H
>> +#define MEMORY_MAPPING_H
>> +
>> +#include "qemu-queue.h"
>> +
>> +typedef struct MemoryMapping {
>> + target_phys_addr_t phys_addr;
>> + target_ulong virt_addr;
>> + ram_addr_t length;
>> + QTAILQ_ENTRY(MemoryMapping) next;
>> +} MemoryMapping;
>> +
>> +typedef struct MemoryMappingList {
>> + unsigned int num;
>
> This field looks unused by this series. Unless I miss something, you
> probably want to drop it.
It is used in patch 09/16. I need it to calculate the number of PT_LOAD entries.
>
>> + QTAILQ_HEAD(, MemoryMapping) head;
>> +} MemoryMappingList;
>> +
>> +/*
>> + * crash requires some memory mappings to be at the head of the list, which
>> + * leaves the list unsorted. So the caller must add these special memory
>> + * mappings after adding all the normal memory mappings to the list.
>> + */
>> +void create_new_memory_mapping_head(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length);
>> +/*
>> + * add or merge the memory region into the memory mapping's list. The list is
>> + * sorted by phys_addr.
>> + */
>> +void add_to_memory_mapping(MemoryMappingList *list,
>> + target_phys_addr_t phys_addr,
>> + target_phys_addr_t virt_addr,
>> + ram_addr_t length);
>> +
>> +void free_memory_mapping_list(MemoryMappingList *list);
>> +
>> +#endif
>
> A bit hard to understand and use the API. I would suggest:
Sorry for confusing you. I will change the API's name.
>
> memory_mapping_list_add_sorted(MemoryMappingList *list, ...);
> memory_mapping_list_add_head(MemoryMappingList *list, ...);
> memory_mapping_list_free(MemoryMappingList *list);
>
> memory_mapping_list_add_head should set a flag in the MemoryMapping
> appended to the list or let the MemoryMappingList point to the first
> sorted entry. That way, the adding order becomes irrelevant.
Agree with it.
>
> Moreover, you are lacking some
> memory_mapping_list_init(MemoryMappingList *list). Cleaner than
> open-coding this.
A useful API, and I will add it.
Thanks
Wen Congyang
>
> Jan
>
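To make the merge rule being discussed concrete: a new region extends an
existing mapping only when it is contiguous in both the physical and the
virtual address space. A minimal sketch (types and names are illustrative,
loosely following the memory_mapping_list_* renaming suggested above):

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t phys_addr;
    uint64_t virt_addr;
    uint64_t length;
} Mapping;

/* Extend m by the new region if it is adjacent in both address spaces;
 * otherwise the caller would have to insert a new list entry. */
static bool mapping_try_merge(Mapping *m, uint64_t phys_addr,
                              uint64_t virt_addr, uint64_t length)
{
    if (phys_addr == m->phys_addr + m->length &&
        virt_addr == m->virt_addr + m->length) {
        m->length += length;
        return true;
    }
    return false;
}
```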
* Re: [Qemu-devel] [RFC][PATCH 03/16 v6] Add API to check whether a physical address is I/O address
2012-02-14 16:52 ` Jan Kiszka
@ 2012-02-15 3:03 ` Wen Congyang
0 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 3:03 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 12:52 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:21, Wen Congyang wrote:
>> This API will be used in the following patch.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> cpu-common.h | 2 ++
>> exec.c | 16 ++++++++++++++++
>> 2 files changed, 18 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpu-common.h b/cpu-common.h
>> index a40c57d..d047137 100644
>> --- a/cpu-common.h
>> +++ b/cpu-common.h
>> @@ -71,6 +71,8 @@ void cpu_physical_memory_unmap(void *buffer, target_phys_addr_t len,
>> void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque));
>> void cpu_unregister_map_client(void *cookie);
>>
>> +bool is_io_addr(target_phys_addr_t phys_addr);
>
> Something like cpu_physical_memory_is_io would be more consistent with
> other APIs around.
Do you mean renaming the API? If so, I will change it.
>
>> +
>> /* Coalesced MMIO regions are areas where write operations can be reordered.
>> * This usually implies that write operations are side-effect free. This allows
>> * batching which can make a major impact on performance when using
>> diff --git a/exec.c b/exec.c
>> index b81677a..edc5684 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -4435,3 +4435,19 @@ bool virtio_is_big_endian(void)
>> #undef env
>>
>> #endif
>> +
>> +bool is_io_addr(target_phys_addr_t phys_addr)
>> +{
>> + ram_addr_t pd;
>> + PhysPageDesc p;
>> +
>> + p = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
>> + pd = p.phys_offset;
>> +
>> + if (!is_ram_rom_romd(pd)) {
>
> return !is_ram_rom_romd(pd); ?
Yes, I will change the code.
Thanks
Wen Congyang
>
>> + /* I/O region */
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 04/16 v6] target-i386: implement cpu_get_memory_mapping()
2012-02-14 17:07 ` Jan Kiszka
@ 2012-02-15 3:05 ` Wen Congyang
0 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 3:05 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 01:07 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:21, Wen Congyang wrote:
>> Walk cpu's page table and collect all virtual address and physical address mapping.
>> Then, add these mapping into memory mapping list.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> Makefile.target | 2 +-
>> cpu-all.h | 7 ++
>> target-i386/arch-dump.c | 254 +++++++++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 262 insertions(+), 1 deletions(-)
>> create mode 100644 target-i386/arch-dump.c
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index e35e464..d6e5684 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -75,7 +75,7 @@ libobj-$(CONFIG_TCG_INTERPRETER) += tci.o
>> libobj-y += fpu/softfloat.o
>> libobj-y += op_helper.o helper.o
>> ifeq ($(TARGET_BASE_ARCH), i386)
>> -libobj-y += cpuid.o
>> +libobj-y += cpuid.o arch-dump.o
>> endif
>> libobj-$(TARGET_SPARC64) += vis_helper.o
>> libobj-$(CONFIG_NEED_MMU) += mmu.o
>> diff --git a/cpu-all.h b/cpu-all.h
>> index e2c3c49..4cd7fbb 100644
>> --- a/cpu-all.h
>> +++ b/cpu-all.h
>> @@ -22,6 +22,7 @@
>> #include "qemu-common.h"
>> #include "qemu-tls.h"
>> #include "cpu-common.h"
>> +#include "memory_mapping.h"
>>
>> /* some important defines:
>> *
>> @@ -523,4 +524,10 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
>> int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
>> uint8_t *buf, int len, int is_write);
>>
>> +#if defined(TARGET_I386)
>
> Instead of collecting archs here, you could introduce some
> HAVE_GET_MEMORY_MAPPING and let the targets that support that define it.
OK
>
>> +void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);
>> +#else
>> +#define cpu_get_memory_mapping(list, env)
>
> Better return an error from cpu_get_memory_mapping (and use static
> inline) so that the caller can find out and report that dumping is not
> supported for the current target.
OK, I will fix it.
Thanks
Wen Congyang
>
>> +#endif
>> +
>> #endif /* CPU_ALL_H */
>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>> new file mode 100644
>> index 0000000..2e921c7
>> --- /dev/null
>> +++ b/target-i386/arch-dump.c
>> @@ -0,0 +1,254 @@
>> +/*
>> + * i386 dump
>> + *
>> + * Copyright Fujitsu, Corp. 2011
>> + *
>> + * Authors:
>> + * Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>> + * the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "cpu.h"
>> +#include "cpu-all.h"
>> +
>> +/* PAE Paging or IA-32e Paging */
>> +static void walk_pte(MemoryMappingList *list, target_phys_addr_t pte_start_addr,
>> + int32_t a20_mask, target_ulong start_line_addr)
>> +{
>> + target_phys_addr_t pte_addr, start_paddr;
>> + uint64_t pte;
>> + target_ulong start_vaddr;
>> + int i;
>> +
>> + for (i = 0; i < 512; i++) {
>> + pte_addr = (pte_start_addr + i * 8) & a20_mask;
>> + pte = ldq_phys(pte_addr);
>> + if (!(pte & PG_PRESENT_MASK)) {
>> + /* not present */
>> + continue;
>> + }
>> +
>> + start_paddr = (pte & ~0xfff) & ~(0x1ULL << 63);
>> + if (is_io_addr(start_paddr)) {
>> + /* I/O region */
>> + continue;
>> + }
>> +
>> + start_vaddr = start_line_addr | ((i & 0x1fff) << 12);
>> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 12);
>> + }
>> +}
>> +
>> +/* 32-bit Paging */
>> +static void walk_pte2(MemoryMappingList *list,
>> + target_phys_addr_t pte_start_addr, int32_t a20_mask,
>> + target_ulong start_line_addr)
>> +{
>> + target_phys_addr_t pte_addr, start_paddr;
>> + uint32_t pte;
>> + target_ulong start_vaddr;
>> + int i;
>> +
>> + for (i = 0; i < 1024; i++) {
>> + pte_addr = (pte_start_addr + i * 4) & a20_mask;
>> + pte = ldl_phys(pte_addr);
>> + if (!(pte & PG_PRESENT_MASK)) {
>> + /* not present */
>> + continue;
>> + }
>> +
>> + start_paddr = pte & ~0xfff;
>> + if (is_io_addr(start_paddr)) {
>> + /* I/O region */
>> + continue;
>> + }
>> +
>> + start_vaddr = start_line_addr | ((i & 0x3ff) << 12);
>> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 12);
>> + }
>> +}
>> +
>> +/* PAE Paging or IA-32e Paging */
>> +static void walk_pde(MemoryMappingList *list, target_phys_addr_t pde_start_addr,
>> + int32_t a20_mask, target_ulong start_line_addr)
>> +{
>> + target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
>> + uint64_t pde;
>> + target_ulong line_addr, start_vaddr;
>> + int i;
>> +
>> + for (i = 0; i < 512; i++) {
>> + pde_addr = (pde_start_addr + i * 8) & a20_mask;
>> + pde = ldq_phys(pde_addr);
>> + if (!(pde & PG_PRESENT_MASK)) {
>> + /* not present */
>> + continue;
>> + }
>> +
>> + line_addr = start_line_addr | ((i & 0x1ff) << 21);
>> + if (pde & PG_PSE_MASK) {
>> + /* 2 MB page */
>> + start_paddr = (pde & ~0x1fffff) & ~(0x1ULL << 63);
>> + if (is_io_addr(start_paddr)) {
>> + /* I/O region */
>> + continue;
>> + }
>> + start_vaddr = line_addr;
>> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 21);
>> + continue;
>> + }
>> +
>> + pte_start_addr = (pde & ~0xfff) & a20_mask;
>> + walk_pte(list, pte_start_addr, a20_mask, line_addr);
>> + }
>> +}
>> +
>> +/* 32-bit Paging */
>> +static void walk_pde2(MemoryMappingList *list,
>> + target_phys_addr_t pde_start_addr, int32_t a20_mask,
>> + bool pse)
>> +{
>> + target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
>> + uint32_t pde;
>> + target_ulong line_addr, start_vaddr;
>> + int i;
>> +
>> + for (i = 0; i < 1024; i++) {
>> + pde_addr = (pde_start_addr + i * 4) & a20_mask;
>> + pde = ldl_phys(pde_addr);
>> + if (!(pde & PG_PRESENT_MASK)) {
>> + /* not present */
>> + continue;
>> + }
>> +
>> + line_addr = (((unsigned int)i & 0x3ff) << 22);
>> + if ((pde & PG_PSE_MASK) && pse) {
>> + /* 4 MB page */
>> + start_paddr = (pde & ~0x3fffff) | ((pde & 0x1fe000) << 19);
>> + if (is_io_addr(start_paddr)) {
>> + /* I/O region */
>> + continue;
>> + }
>> + start_vaddr = line_addr;
>> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 22);
>> + continue;
>> + }
>> +
>> + pte_start_addr = (pde & ~0xfff) & a20_mask;
>> + walk_pte2(list, pte_start_addr, a20_mask, line_addr);
>> + }
>> +}
>> +
>> +/* PAE Paging */
>> +static void walk_pdpe2(MemoryMappingList *list,
>> + target_phys_addr_t pdpe_start_addr, int32_t a20_mask)
>> +{
>> + target_phys_addr_t pdpe_addr, pde_start_addr;
>> + uint64_t pdpe;
>> + target_ulong line_addr;
>> + int i;
>> +
>> + for (i = 0; i < 4; i++) {
>> + pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask;
>> + pdpe = ldq_phys(pdpe_addr);
>> + if (!(pdpe & PG_PRESENT_MASK)) {
>> + /* not present */
>> + continue;
>> + }
>> +
>> + line_addr = (((unsigned int)i & 0x3) << 30);
>> + pde_start_addr = (pdpe & ~0xfff) & a20_mask;
>> + walk_pde(list, pde_start_addr, a20_mask, line_addr);
>> + }
>> +}
>> +
>> +#ifdef TARGET_X86_64
>> +/* IA-32e Paging */
>> +static void walk_pdpe(MemoryMappingList *list,
>> + target_phys_addr_t pdpe_start_addr, int32_t a20_mask,
>> + target_ulong start_line_addr)
>> +{
>> + target_phys_addr_t pdpe_addr, pde_start_addr, start_paddr;
>> + uint64_t pdpe;
>> + target_ulong line_addr, start_vaddr;
>> + int i;
>> +
>> + for (i = 0; i < 512; i++) {
>> + pdpe_addr = (pdpe_start_addr + i * 8) & a20_mask;
>> + pdpe = ldq_phys(pdpe_addr);
>> + if (!(pdpe & PG_PRESENT_MASK)) {
>> + /* not present */
>> + continue;
>> + }
>> +
>> + line_addr = start_line_addr | ((i & 0x1ffULL) << 30);
>> + if (pdpe & PG_PSE_MASK) {
>> + /* 1 GB page */
>> + start_paddr = (pdpe & ~0x3fffffff) & ~(0x1ULL << 63);
>> + if (is_io_addr(start_paddr)) {
>> + /* I/O region */
>> + continue;
>> + }
>> + start_vaddr = line_addr;
>> + add_to_memory_mapping(list, start_paddr, start_vaddr, 1 << 30);
>> + continue;
>> + }
>> +
>> + pde_start_addr = (pdpe & ~0xfff) & a20_mask;
>> + walk_pde(list, pde_start_addr, a20_mask, line_addr);
>> + }
>> +}
>> +
>> +/* IA-32e Paging */
>> +static void walk_pml4e(MemoryMappingList *list,
>> + target_phys_addr_t pml4e_start_addr, int32_t a20_mask)
>> +{
>> + target_phys_addr_t pml4e_addr, pdpe_start_addr;
>> + uint64_t pml4e;
>> + target_ulong line_addr;
>> + int i;
>> +
>> + for (i = 0; i < 512; i++) {
>> + pml4e_addr = (pml4e_start_addr + i * 8) & a20_mask;
>> + pml4e = ldq_phys(pml4e_addr);
>> + if (!(pml4e & PG_PRESENT_MASK)) {
>> + /* not present */
>> + continue;
>> + }
>> +
>> + line_addr = ((i & 0x1ffULL) << 39) | (0xffffULL << 48);
>> + pdpe_start_addr = (pml4e & ~0xfff) & a20_mask;
>> + walk_pdpe(list, pdpe_start_addr, a20_mask, line_addr);
>> + }
>> +}
>> +#endif
>> +
>> +void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
>> +{
>> + if (env->cr[4] & CR4_PAE_MASK) {
>> +#ifdef TARGET_X86_64
>> + if (env->hflags & HF_LMA_MASK) {
>> + target_phys_addr_t pml4e_addr;
>> +
>> + pml4e_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
>> + walk_pml4e(list, pml4e_addr, env->a20_mask);
>> + } else
>> +#endif
>> + {
>> + target_phys_addr_t pdpe_addr;
>> +
>> + pdpe_addr = (env->cr[3] & ~0x1f) & env->a20_mask;
>> + walk_pdpe2(list, pdpe_addr, env->a20_mask);
>> + }
>> + } else {
>> + target_phys_addr_t pde_addr;
>> + bool pse;
>> +
>> + pde_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
>> + pse = !!(env->cr[4] & CR4_PSE_MASK);
>> + walk_pde2(list, pde_addr, env->a20_mask, pse);
>> + }
>> +}
>
> I haven't checked all paging details, but it looks good otherwise.
>
> Jan
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 06/16 v6] target-i386: Add API to write elf notes to core file
2012-02-14 17:31 ` Jan Kiszka
@ 2012-02-15 3:16 ` Wen Congyang
0 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 3:16 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 01:31 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:24, Wen Congyang wrote:
>> The core file contains the registers' values. These APIs write the registers
>> to the core file, and they will be called in the following patch.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> cpu-all.h | 6 +
>> target-i386/arch-dump.c | 243 +++++++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 249 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpu-all.h b/cpu-all.h
>> index 4cd7fbb..efb5ba3 100644
>> --- a/cpu-all.h
>> +++ b/cpu-all.h
>> @@ -526,8 +526,14 @@ int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
>>
>> #if defined(TARGET_I386)
>> void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);
>> +int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>> + target_phys_addr_t *offset);
>> +int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>> + target_phys_addr_t *offset);
>
> Again, some HAVE_XXX would be nicer. Maybe you put the whole block under
> HAVE_GUEST_CORE_DUMP or so.
OK
>
> Is writing to file descriptor generic enough? What if we want to dump
> via QMP, letting the receiver side decide about where to write it?
Currently, only writing to a file descriptor is supported. If we want to
support some other target, I will modify the API so that other targets
can be supported easily.
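Jan's point about a QMP-based receiver could be addressed by routing all writes through a callback instead of a raw fd. A minimal sketch (editor's illustration only; DumpWriteFn, MemSink, and buf_write are hypothetical names, not from the patch):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical writer abstraction: the dump code calls this instead of
 * write(fd, ...), so a QMP receiver could supply its own backend. */
typedef int (*DumpWriteFn)(void *opaque, const void *buf, size_t len);

typedef struct {
    char data[256];
    size_t used;
} MemSink;

/* Example backend: append into an in-memory buffer. */
static int buf_write(void *opaque, const void *buf, size_t len)
{
    MemSink *s = opaque;
    if (s->used + len > sizeof(s->data)) {
        return -1;
    }
    memcpy(s->data + s->used, buf, len);
    s->used += len;
    return 0;
}

/* The dump code would hold fn/opaque in DumpState and call this. */
static int emit(DumpWriteFn fn, void *opaque, const void *buf, size_t len)
{
    return fn(opaque, buf, len);
}
```

This keeps the file-descriptor path as just one backend while leaving room for other transports later.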
>
>> #else
>> #define cpu_get_memory_mapping(list, env)
>> +#define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
>> +#define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
>> #endif
>>
>> #endif /* CPU_ALL_H */
>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>> index 2e921c7..4c0ff77 100644
>> --- a/target-i386/arch-dump.c
>> +++ b/target-i386/arch-dump.c
>> @@ -11,8 +11,11 @@
>> *
>> */
>>
>> +#include <elf.h>
>
> Does this create a new dependency and break non-Linux hosts? Can you
> pull the required bits into qemu's elf.h then?
OK.
>
>> +
>> #include "cpu.h"
>> #include "cpu-all.h"
>> +#include "monitor.h"
>>
>> /* PAE Paging or IA-32e Paging */
>> static void walk_pte(MemoryMappingList *list, target_phys_addr_t pte_start_addr,
>> @@ -252,3 +255,243 @@ void cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
>> walk_pde2(list, pde_addr, env->a20_mask, pse);
>> }
>> }
>> +
>> +#ifdef TARGET_X86_64
>> +typedef struct {
>> + target_ulong r15, r14, r13, r12, rbp, rbx, r11, r10;
>> + target_ulong r9, r8, rax, rcx, rdx, rsi, rdi, orig_rax;
>> + target_ulong rip, cs, eflags;
>> + target_ulong rsp, ss;
>> + target_ulong fs_base, gs_base;
>> + target_ulong ds, es, fs, gs;
>> +} x86_64_user_regs_struct;
>> +
>> +static int x86_64_write_elf64_note(int fd, CPUState *env, int id,
>> + target_phys_addr_t *offset)
>> +{
>> + x86_64_user_regs_struct regs;
>> + Elf64_Nhdr *note;
>> + char *buf;
>> + int descsz, note_size, name_size = 5;
>> + const char *name = "CORE";
>> + int ret;
>> +
>> + regs.r15 = env->regs[15];
>> + regs.r14 = env->regs[14];
>> + regs.r13 = env->regs[13];
>> + regs.r12 = env->regs[12];
>> + regs.r11 = env->regs[11];
>> + regs.r10 = env->regs[10];
>> + regs.r9 = env->regs[9];
>> + regs.r8 = env->regs[8];
>> + regs.rbp = env->regs[R_EBP];
>> + regs.rsp = env->regs[R_ESP];
>> + regs.rdi = env->regs[R_EDI];
>> + regs.rsi = env->regs[R_ESI];
>> + regs.rdx = env->regs[R_EDX];
>> + regs.rcx = env->regs[R_ECX];
>> + regs.rbx = env->regs[R_EBX];
>> + regs.rax = env->regs[R_EAX];
>> + regs.rip = env->eip;
>> + regs.eflags = env->eflags;
>> +
>> + regs.orig_rax = 0; /* FIXME */
>> + regs.cs = env->segs[R_CS].selector;
>> + regs.ss = env->segs[R_SS].selector;
>> + regs.fs_base = env->segs[R_FS].base;
>> + regs.gs_base = env->segs[R_GS].base;
>> + regs.ds = env->segs[R_DS].selector;
>> + regs.es = env->segs[R_ES].selector;
>> + regs.fs = env->segs[R_FS].selector;
>> + regs.gs = env->segs[R_GS].selector;
>> +
>> + descsz = 336; /* sizeof(prstatus_t) is 336 on x86_64 box */
>> + note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
>> + (descsz + 3) / 4) * 4;
>> + note = g_malloc(note_size);
>> +
>> + memset(note, 0, note_size);
>> + note->n_namesz = cpu_to_le32(name_size);
>> + note->n_descsz = cpu_to_le32(descsz);
>> + note->n_type = cpu_to_le32(NT_PRSTATUS);
>> + buf = (char *)note;
>> + buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
>> + memcpy(buf, name, name_size);
>> + buf += ((name_size + 3) / 4) * 4;
>> + memcpy(buf + 32, &id, 4); /* pr_pid */
>> + buf += descsz - sizeof(x86_64_user_regs_struct)-sizeof(target_ulong);
>> + memcpy(buf, &regs, sizeof(x86_64_user_regs_struct));
>> +
>> + lseek(fd, *offset, SEEK_SET);
>> + ret = write(fd, note, note_size);
>> + g_free(note);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + *offset += note_size;
>> +
>> + return 0;
>> +}
>> +#endif
>> +
>> +typedef struct {
>> + uint32_t ebx, ecx, edx, esi, edi, ebp, eax;
>> + unsigned short ds, __ds, es, __es;
>> + unsigned short fs, __fs, gs, __gs;
>> + uint32_t orig_eax, eip;
>> + unsigned short cs, __cs;
>> + uint32_t eflags, esp;
>> + unsigned short ss, __ss;
>> +} x86_user_regs_struct;
>> +
>> +static int x86_write_elf64_note(int fd, CPUState *env, int id,
>> + target_phys_addr_t *offset)
>> +{
>> + x86_user_regs_struct regs;
>> + Elf64_Nhdr *note;
>> + char *buf;
>> + int descsz, note_size, name_size = 5;
>> + const char *name = "CORE";
>> + int ret;
>> +
>> + regs.ebp = env->regs[R_EBP] & 0xffffffff;
>> + regs.esp = env->regs[R_ESP] & 0xffffffff;
>> + regs.edi = env->regs[R_EDI] & 0xffffffff;
>> + regs.esi = env->regs[R_ESI] & 0xffffffff;
>> + regs.edx = env->regs[R_EDX] & 0xffffffff;
>> + regs.ecx = env->regs[R_ECX] & 0xffffffff;
>> + regs.ebx = env->regs[R_EBX] & 0xffffffff;
>> + regs.eax = env->regs[R_EAX] & 0xffffffff;
>> + regs.eip = env->eip & 0xffffffff;
>> + regs.eflags = env->eflags & 0xffffffff;
>> +
>> + regs.cs = env->segs[R_CS].selector;
>> + regs.__cs = 0;
>> + regs.ss = env->segs[R_SS].selector;
>> + regs.__ss = 0;
>> + regs.ds = env->segs[R_DS].selector;
>> + regs.__ds = 0;
>> + regs.es = env->segs[R_ES].selector;
>> + regs.__es = 0;
>> + regs.fs = env->segs[R_FS].selector;
>> + regs.__fs = 0;
>> + regs.gs = env->segs[R_GS].selector;
>> + regs.__gs = 0;
>> +
>> + descsz = 144; /* sizeof(prstatus_t) is 144 on x86 box */
>> + note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
>> + (descsz + 3) / 4) * 4;
>> + note = g_malloc(note_size);
>> +
>> + memset(note, 0, note_size);
>> + note->n_namesz = cpu_to_le32(name_size);
>> + note->n_descsz = cpu_to_le32(descsz);
>> + note->n_type = cpu_to_le32(NT_PRSTATUS);
>> + buf = (char *)note;
>> + buf += ((sizeof(Elf64_Nhdr) + 3) / 4) * 4;
>> + memcpy(buf, name, name_size);
>> + buf += ((name_size + 3) / 4) * 4;
>> + memcpy(buf + 24, &id, 4); /* pr_pid */
>> + buf += descsz - sizeof(x86_user_regs_struct)-4;
>> + memcpy(buf, &regs, sizeof(x86_user_regs_struct));
>> +
>> + lseek(fd, *offset, SEEK_SET);
>> + ret = write(fd, note, note_size);
>> + g_free(note);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + *offset += note_size;
>> +
>> + return 0;
>> +}
>> +
>> +static int x86_write_elf32_note(int fd, CPUState *env, int id,
>> + target_phys_addr_t *offset)
>> +{
>> + x86_user_regs_struct regs;
>> + Elf32_Nhdr *note;
>> + char *buf;
>> + int descsz, note_size, name_size = 5;
>> + const char *name = "CORE";
>> + int ret;
>> +
>> + regs.ebp = env->regs[R_EBP] & 0xffffffff;
>> + regs.esp = env->regs[R_ESP] & 0xffffffff;
>> + regs.edi = env->regs[R_EDI] & 0xffffffff;
>> + regs.esi = env->regs[R_ESI] & 0xffffffff;
>> + regs.edx = env->regs[R_EDX] & 0xffffffff;
>> + regs.ecx = env->regs[R_ECX] & 0xffffffff;
>> + regs.ebx = env->regs[R_EBX] & 0xffffffff;
>> + regs.eax = env->regs[R_EAX] & 0xffffffff;
>> + regs.eip = env->eip & 0xffffffff;
>> + regs.eflags = env->eflags & 0xffffffff;
>> +
>> + regs.cs = env->segs[R_CS].selector;
>> + regs.__cs = 0;
>> + regs.ss = env->segs[R_SS].selector;
>> + regs.__ss = 0;
>> + regs.ds = env->segs[R_DS].selector;
>> + regs.__ds = 0;
>> + regs.es = env->segs[R_ES].selector;
>> + regs.__es = 0;
>> + regs.fs = env->segs[R_FS].selector;
>> + regs.__fs = 0;
>> + regs.gs = env->segs[R_GS].selector;
>> + regs.__gs = 0;
>> +
>> + descsz = 144; /* sizeof(prstatus_t) is 144 on x86 box */
>> + note_size = ((sizeof(Elf32_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
>> + (descsz + 3) / 4) * 4;
>> + note = g_malloc(note_size);
>> +
>> + memset(note, 0, note_size);
>> + note->n_namesz = cpu_to_le32(name_size);
>> + note->n_descsz = cpu_to_le32(descsz);
>> + note->n_type = cpu_to_le32(NT_PRSTATUS);
>> + buf = (char *)note;
>> + buf += ((sizeof(Elf32_Nhdr) + 3) / 4) * 4;
>> + memcpy(buf, name, name_size);
>> + buf += ((name_size + 3) / 4) * 4;
>> + memcpy(buf + 24, &id, 4); /* pr_pid */
>> + buf += descsz - sizeof(x86_user_regs_struct)-4;
>> + memcpy(buf, &regs, sizeof(x86_user_regs_struct));
>> +
>> + lseek(fd, *offset, SEEK_SET);
>> + ret = write(fd, note, note_size);
>> + g_free(note);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + *offset += note_size;
>> +
>> + return 0;
>> +}
>> +
>> +int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>> + target_phys_addr_t *offset)
>> +{
>> + int ret;
>> +#ifdef TARGET_X86_64
>> + bool lma = !!(first_cpu->hflags & HF_LMA_MASK);
>> +
>> + if (lma) {
>> + ret = x86_64_write_elf64_note(fd, env, cpuid, offset);
>> + } else {
>> +#endif
>> + ret = x86_write_elf64_note(fd, env, cpuid, offset);
>> +#ifdef TARGET_X86_64
>> + }
>> +#endif
>> +
>> + return ret;
>> +}
>> +
>> +int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>> + target_phys_addr_t *offset)
>> +{
>> + return x86_write_elf32_note(fd, env, cpuid, offset);
>> +}
>
> Minor nit: I think this wrapping is not needed, just fold
> x86_write_elf32_note into this function.
OK
Thanks
Wen Congyang
>
> Jan
>
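For reference, the note-size arithmetic used throughout this patch rounds each note section (header, name, desc) up to a 4-byte multiple. A standalone sketch of that calculation (editor's illustration; the 12-byte header size stands in for sizeof(Elf64_Nhdr)):

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of 4, as the patch does with
 * ((x + 3) / 4) * 4 for each note section. */
static uint32_t round4(uint32_t x)
{
    return ((x + 3) / 4) * 4;
}

/* Total note size: aligned header + aligned name + aligned desc.
 * hdr_size would be sizeof(Elf64_Nhdr) in the real code. */
static uint32_t note_size(uint32_t hdr_size, uint32_t name_size,
                          uint32_t descsz)
{
    return round4(hdr_size) + round4(name_size) + round4(descsz);
}
```

With name "CORE" (name_size 5, rounded to 8) and a 336-byte x86_64 prstatus, this matches the sizes hardcoded in x86_64_write_elf64_note().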
* Re: [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info
2012-02-14 17:39 ` Jan Kiszka
@ 2012-02-15 3:30 ` Wen Congyang
2012-02-15 9:05 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 3:30 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 01:39 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:26, Wen Congyang wrote:
>> Dump info contains: endian, class and architecture. The next
>> patch will use this information to create the vmcore.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> cpu-all.h | 3 +++
>> dump.h | 10 ++++++++++
>> target-i386/arch-dump.c | 34 ++++++++++++++++++++++++++++++++++
>> 3 files changed, 47 insertions(+), 0 deletions(-)
>> create mode 100644 dump.h
>>
>> diff --git a/cpu-all.h b/cpu-all.h
>> index 290c43a..268d1f6 100644
>> --- a/cpu-all.h
>> +++ b/cpu-all.h
>> @@ -23,6 +23,7 @@
>> #include "qemu-tls.h"
>> #include "cpu-common.h"
>> #include "memory_mapping.h"
>> +#include "dump.h"
>>
>> /* some important defines:
>> *
>> @@ -531,11 +532,13 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>> target_phys_addr_t *offset);
>> int cpu_add_extra_memory_mapping(MemoryMappingList *list);
>> +int cpu_get_dump_info(ArchDumpInfo *info);
>> #else
>> #define cpu_get_memory_mapping(list, env)
>> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
>> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
>> #define cpu_add_extra_memory_mapping(list) ({ 0; })
>> +#define cpu_get_dump_info(info) ({ -1; })
>
> Please use static inlines where possible (applies to earlier patches as
> well).
OK
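A typed stub along the lines Jan requests might look like this (editor's sketch; the struct here is a simplified stand-in for the real ArchDumpInfo in cpu-all.h):

```c
#include <assert.h>

/* Simplified stand-in for the ArchDumpInfo from dump.h. */
typedef struct ArchDumpInfo {
    int d_machine;
    int d_endian;
    int d_class;
} ArchDumpInfo;

/* Non-dump targets get a typed static inline stub instead of
 * #define cpu_get_dump_info(info) ({ -1; }), so callers are
 * type-checked even when dumping is unsupported. */
static inline int cpu_get_dump_info(ArchDumpInfo *info)
{
    return -1;
}
```

Unlike the statement-expression macro, this form also avoids a GCC extension, so it stays portable to non-GCC compilers.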
>
>> #endif
>>
>> #endif /* CPU_ALL_H */
>> diff --git a/dump.h b/dump.h
>> new file mode 100644
>> index 0000000..a36468b
>> --- /dev/null
>> +++ b/dump.h
>> @@ -0,0 +1,10 @@
>
> License header missing.
There is no license header in the other header files, either.
>
>> +#ifndef DUMP_H
>> +#define DUMP_H
>> +
>> +typedef struct ArchDumpInfo {
>> + int d_machine; /* Architecture */
>> + int d_endian; /* ELFDATA2LSB or ELFDATA2MSB */
>> + int d_class; /* ELFCLASS32 or ELFCLASS64 */
>> +} ArchDumpInfo;
>> +
>> +#endif
>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>> index d96f6ae..92a53bc 100644
>> --- a/target-i386/arch-dump.c
>> +++ b/target-i386/arch-dump.c
>> @@ -15,6 +15,7 @@
>>
>> #include "cpu.h"
>> #include "cpu-all.h"
>> +#include "dump.h"
>> #include "monitor.h"
>>
>> /* PAE Paging or IA-32e Paging */
>> @@ -538,3 +539,36 @@ int cpu_add_extra_memory_mapping(MemoryMappingList *list)
>> #endif
>> return 0;
>> }
>> +
>> +int cpu_get_dump_info(ArchDumpInfo *info)
>> +{
>> + bool lma = false;
>> + RAMBlock *block;
>> +
>> +#ifdef TARGET_X86_64
>> + lma = !!(first_cpu->hflags & HF_LMA_MASK);
>> +#endif
>> +
>> + if (lma) {
>> + info->d_machine = EM_X86_64;
>> + } else {
>> + info->d_machine = EM_386;
>> + }
>> + info->d_endian = ELFDATA2LSB;
>> +
>> + if (lma) {
>> + info->d_class = ELFCLASS64;
>> + } else {
>> + info->d_class = ELFCLASS32;
>> + }
>> +
>> + QLIST_FOREACH(block, &ram_list.blocks, next) {
>> + if (!lma && (block->offset + block->length > UINT_MAX)) {
>> + /* The memory size is greater than 4G */
>> + info->d_class = ELFCLASS32;
>
> Is that correct, or did you rather mean ELFCLASS64?
Yes, it should be ELFCLASS64.
Thanks
Wen Congyang
>
>> + break;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>
> Jan
>
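With the ELFCLASS64 fix applied, the class choice reduces to: ELF64 when the guest is in long mode, or when a 32-bit guest's RAM extends above 4 GB. A host-independent sketch (editor's illustration; the MY_ELFCLASS* constants and plain parameters replace the qemu types and RAMBlock walk):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MY_ELFCLASS32 1
#define MY_ELFCLASS64 2

/* lma: guest is in IA-32e long mode.
 * max_ram_end: highest guest RAM address + 1 (end of the last block). */
static int pick_elf_class(bool lma, uint64_t max_ram_end)
{
    if (lma) {
        return MY_ELFCLASS64;
    }
    /* 32-bit guest with >4G of RAM still needs ELF64 (the fixed branch). */
    if (max_ram_end > UINT32_MAX) {
        return MY_ELFCLASS64;
    }
    return MY_ELFCLASS32;
}
```

This also matches note 4 in the cover letter: a 32-bit guest with more than 4G of memory produces an elf64 vmcore.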
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-14 17:59 ` Jan Kiszka
@ 2012-02-15 3:44 ` Wen Congyang
2012-02-17 8:52 ` Wen Congyang
1 sibling, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 3:44 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 01:59 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:28, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> Makefile.target | 8 +-
>> dump.c | 590 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> dump.h | 3 +
>> hmp-commands.hx | 16 ++
>> hmp.c | 9 +
>> hmp.h | 1 +
>> monitor.c | 3 +
>> qapi-schema.json | 13 ++
>> qmp-commands.hx | 26 +++
>> 9 files changed, 665 insertions(+), 4 deletions(-)
>> create mode 100644 dump.c
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index d6e5684..f39ce2f 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -112,7 +112,7 @@ $(call set-vpath, $(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR
>> QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) -I$(SRC_PATH)/linux-user
>> obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
>> elfload.o linuxload.o uaccess.o gdbstub.o cpu-uname.o \
>> - user-exec.o $(oslib-obj-y)
>> + user-exec.o $(oslib-obj-y) dump.o
>>
>> obj-$(TARGET_HAS_BFLT) += flatload.o
>>
>> @@ -150,7 +150,7 @@ LDFLAGS+=-Wl,-segaddr,__STD_PROG_ZONE,0x1000 -image_base 0x0e000000
>> LIBS+=-lmx
>>
>> obj-y = main.o commpage.o machload.o mmap.o signal.o syscall.o thunk.o \
>> - gdbstub.o user-exec.o
>> + gdbstub.o user-exec.o dump.o
>>
>> obj-i386-y += ioport-user.o
>>
>> @@ -172,7 +172,7 @@ $(call set-vpath, $(SRC_PATH)/bsd-user)
>> QEMU_CFLAGS+=-I$(SRC_PATH)/bsd-user -I$(SRC_PATH)/bsd-user/$(TARGET_ARCH)
>>
>> obj-y = main.o bsdload.o elfload.o mmap.o signal.o strace.o syscall.o \
>> - gdbstub.o uaccess.o user-exec.o
>> + gdbstub.o uaccess.o user-exec.o dump.o
>>
>> obj-i386-y += ioport-user.o
>>
>> @@ -188,7 +188,7 @@ endif #CONFIG_BSD_USER
>> # System emulator target
>> ifdef CONFIG_SOFTMMU
>>
>> -obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
>> +obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o dump.o
>> # virtio has to be here due to weird dependency between PCI and virtio-net.
>> # need to fix this properly
>> obj-$(CONFIG_NO_PCI) += pci-stub.o
>> diff --git a/dump.c b/dump.c
>> new file mode 100644
>> index 0000000..a0e8b86
>> --- /dev/null
>> +++ b/dump.c
>> @@ -0,0 +1,590 @@
>> +/*
>> + * QEMU dump
>> + *
>> + * Copyright Fujitsu, Corp. 2011
>> + *
>> + * Authors:
>> + * Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>> + * the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include <unistd.h>
>> +#include <elf.h>
>> +#include <sys/procfs.h>
>> +#include <glib.h>
>> +#include "cpu.h"
>> +#include "cpu-all.h"
>> +#include "targphys.h"
>> +#include "monitor.h"
>> +#include "kvm.h"
>> +#include "dump.h"
>> +#include "sysemu.h"
>> +#include "bswap.h"
>> +#include "memory_mapping.h"
>> +#include "error.h"
>> +#include "qmp-commands.h"
>> +
>> +#define CPU_CONVERT_TO_TARGET16(val) \
>> +({ \
>> + uint16_t _val = (val); \
>> + if (endian == ELFDATA2LSB) { \
>> + _val = cpu_to_le16(_val); \
>> + } else {\
>> + _val = cpu_to_be16(_val); \
>> + } \
>> + _val; \
>> +})
>> +
>> +#define CPU_CONVERT_TO_TARGET32(val) \
>> +({ \
>> + uint32_t _val = (val); \
>> + if (endian == ELFDATA2LSB) { \
>> + _val = cpu_to_le32(_val); \
>> + } else {\
>> + _val = cpu_to_be32(_val); \
>> + } \
>> + _val; \
>> +})
>> +
>> +#define CPU_CONVERT_TO_TARGET64(val) \
>> +({ \
>> + uint64_t _val = (val); \
>> + if (endian == ELFDATA2LSB) { \
>> + _val = cpu_to_le64(_val); \
>> + } else {\
>> + _val = cpu_to_be64(_val); \
>> + } \
>> + _val; \
>> +})
>
> static inline functions, please.
OK
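A static-inline replacement for the 16-bit converter could look like this (editor's sketch; it serializes to an explicit byte order rather than calling qemu's cpu_to_le16/cpu_to_be16, so the example is independent of host endianness):

```c
#include <assert.h>
#include <stdint.h>

#define MY_ELFDATA2LSB 1
#define MY_ELFDATA2MSB 2

/* Sketch of a static-inline replacement for CPU_CONVERT_TO_TARGET16:
 * store val into out[] in the byte order the vmcore header requires. */
static inline void store_target16(uint8_t out[2], uint16_t val, int endian)
{
    if (endian == MY_ELFDATA2LSB) {
        out[0] = val & 0xff;
        out[1] = val >> 8;
    } else {
        out[0] = val >> 8;
        out[1] = val & 0xff;
    }
}
```

The 32- and 64-bit variants follow the same pattern, and the endian argument replaces the local `endian` variable the macros implicitly capture.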
>
>> +
>> +enum {
>> + DUMP_STATE_ERROR,
>> + DUMP_STATE_SETUP,
>> + DUMP_STATE_CANCELLED,
>> + DUMP_STATE_ACTIVE,
>> + DUMP_STATE_COMPLETED,
>> +};
>> +
>> +typedef struct DumpState {
>> + ArchDumpInfo dump_info;
>> + MemoryMappingList list;
>> + int phdr_num;
>> + int state;
>> + char *error;
>> + int fd;
>> + target_phys_addr_t memory_offset;
>> +} DumpState;
>> +
>> +static DumpState *dump_get_current(void)
>> +{
>> + static DumpState current_dump = {
>> + .state = DUMP_STATE_SETUP,
>> + };
>> +
>> + return &current_dump;
>> +}
>> +
>> +static int dump_cleanup(DumpState *s)
>> +{
>> + int ret = 0;
>> +
>> + free_memory_mapping_list(&s->list);
>> + if (s->fd != -1) {
>> + close(s->fd);
>> + s->fd = -1;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static void dump_error(DumpState *s, const char *reason)
>> +{
>> + s->state = DUMP_STATE_ERROR;
>> + s->error = g_strdup(reason);
>> + dump_cleanup(s);
>> +}
>> +
>> +static inline int cpuid(CPUState *env)
>> +{
>> +#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USE_NPTL)
>> + return env->host_tid;
>
> Curious: Does this command already work with user mode guest?
I think the answer is no. I will change it.
>
>> +#else
>> + return env->cpu_index + 1;
>> +#endif
>> +}
>
> There is gdb_id in gdbstub. It should be made generally avialable and
> reused here.
OK
>
>> +
>> +static int write_elf64_header(DumpState *s)
>> +{
>> + Elf64_Ehdr elf_header;
>> + int ret;
>> + int endian = s->dump_info.d_endian;
>> +
>> + memset(&elf_header, 0, sizeof(Elf64_Ehdr));
>> + memcpy(&elf_header, ELFMAG, 4);
>> + elf_header.e_ident[EI_CLASS] = ELFCLASS64;
>> + elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
>> + elf_header.e_ident[EI_VERSION] = EV_CURRENT;
>> + elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
>> + elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
>> + elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
>> + elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
>> + elf_header.e_phoff = CPU_CONVERT_TO_TARGET64(sizeof(Elf64_Ehdr));
>> + elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf64_Phdr));
>> + elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
>> +
>> + lseek(s->fd, 0, SEEK_SET);
>> + ret = write(s->fd, &elf_header, sizeof(elf_header));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write elf header.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf32_header(DumpState *s)
>> +{
>> + Elf32_Ehdr elf_header;
>> + int ret;
>> + int endian = s->dump_info.d_endian;
>> +
>> + memset(&elf_header, 0, sizeof(Elf32_Ehdr));
>> + memcpy(&elf_header, ELFMAG, 4);
>> + elf_header.e_ident[EI_CLASS] = ELFCLASS32;
>> + elf_header.e_ident[EI_DATA] = endian;
>> + elf_header.e_ident[EI_VERSION] = EV_CURRENT;
>> + elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
>> + elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
>> + elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
>> + elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
>> + elf_header.e_phoff = CPU_CONVERT_TO_TARGET32(sizeof(Elf32_Ehdr));
>> + elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf32_Phdr));
>> + elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
>> +
>> + lseek(s->fd, 0, SEEK_SET);
>> + ret = write(s->fd, &elf_header, sizeof(elf_header));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write elf header.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
>> + int phdr_index, target_phys_addr_t offset)
>> +{
>> + Elf64_Phdr phdr;
>> + off_t phdr_offset;
>> + int ret;
>> + int endian = s->dump_info.d_endian;
>> +
>> + memset(&phdr, 0, sizeof(Elf64_Phdr));
>> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
>> + phdr.p_offset = CPU_CONVERT_TO_TARGET64(offset);
>> + phdr.p_paddr = CPU_CONVERT_TO_TARGET64(memory_mapping->phys_addr);
>> + if (offset == -1) {
>> + phdr.p_filesz = 0;
>> + } else {
>> + phdr.p_filesz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
>> + }
>> + phdr.p_memsz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
>> + phdr.p_vaddr = CPU_CONVERT_TO_TARGET64(memory_mapping->virt_addr);
>> +
>> + phdr_offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*phdr_index;
>> + lseek(s->fd, phdr_offset, SEEK_SET);
>> + ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write program header table.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
>> + int phdr_index, target_phys_addr_t offset)
>> +{
>> + Elf32_Phdr phdr;
>> + off_t phdr_offset;
>> + int ret;
>> + int endian = s->dump_info.d_endian;
>> +
>> + memset(&phdr, 0, sizeof(Elf32_Phdr));
>> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
>> + phdr.p_offset = CPU_CONVERT_TO_TARGET32(offset);
>> + phdr.p_paddr = CPU_CONVERT_TO_TARGET32(memory_mapping->phys_addr);
>> + if (offset == -1) {
>> + phdr.p_filesz = 0;
>> + } else {
>> + phdr.p_filesz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
>> + }
>> + phdr.p_memsz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
>> + phdr.p_vaddr = CPU_CONVERT_TO_TARGET32(memory_mapping->virt_addr);
>> +
>> + phdr_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*phdr_index;
>> + lseek(s->fd, phdr_offset, SEEK_SET);
>> + ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write program header table.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf64_notes(DumpState *s, int phdr_index,
>> + target_phys_addr_t *offset)
>> +{
>> + CPUState *env;
>> + int ret;
>> + target_phys_addr_t begin = *offset;
>> + Elf64_Phdr phdr;
>> + off_t phdr_offset;
>> + int id;
>> + int endian = s->dump_info.d_endian;
>> +
>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> + id = cpuid(env);
>> + ret = cpu_write_elf64_note(s->fd, env, id, offset);
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write elf notes.\n");
>> + return -1;
>> + }
>> + }
>> +
>> + memset(&phdr, 0, sizeof(Elf64_Phdr));
>> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
>> + phdr.p_offset = CPU_CONVERT_TO_TARGET64(begin);
>> + phdr.p_paddr = 0;
>> + phdr.p_filesz = CPU_CONVERT_TO_TARGET64(*offset - begin);
>> + phdr.p_memsz = CPU_CONVERT_TO_TARGET64(*offset - begin);
>> + phdr.p_vaddr = 0;
>> +
>> + phdr_offset = sizeof(Elf64_Ehdr);
>> + lseek(s->fd, phdr_offset, SEEK_SET);
>> + ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write program header table.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf32_notes(DumpState *s, int phdr_index,
>> + target_phys_addr_t *offset)
>> +{
>> + CPUState *env;
>> + int ret;
>> + target_phys_addr_t begin = *offset;
>> + Elf32_Phdr phdr;
>> + off_t phdr_offset;
>> + int id;
>> + int endian = s->dump_info.d_endian;
>> +
>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> + id = cpuid(env);
>> + ret = cpu_write_elf32_note(s->fd, env, id, offset);
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write elf notes.\n");
>> + return -1;
>> + }
>> + }
>> +
>> + memset(&phdr, 0, sizeof(Elf32_Phdr));
>> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
>> + phdr.p_offset = CPU_CONVERT_TO_TARGET32(begin);
>> + phdr.p_paddr = 0;
>> + phdr.p_filesz = CPU_CONVERT_TO_TARGET32(*offset - begin);
>> + phdr.p_memsz = CPU_CONVERT_TO_TARGET32(*offset - begin);
>> + phdr.p_vaddr = 0;
>> +
>> + phdr_offset = sizeof(Elf32_Ehdr);
>> + lseek(s->fd, phdr_offset, SEEK_SET);
>> + ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write program header table.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_data(DumpState *s, void *buf, int length,
>> + target_phys_addr_t *offset)
>> +{
>> + int ret;
>> +
>> + lseek(s->fd, *offset, SEEK_SET);
>> + ret = write(s->fd, buf, length);
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to save memory.\n");
>> + return -1;
>> + }
>> +
>> + *offset += length;
>> + return 0;
>> +}
>> +
>> +/* write the memory to vmcore. 1 page per I/O. */
>> +static int write_memory(DumpState *s, RAMBlock *block,
>> + target_phys_addr_t *offset)
>> +{
>> + int i, ret;
>> +
>> + for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
>> + ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
>> + TARGET_PAGE_SIZE, offset);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> + }
>> +
>> + if ((block->length % TARGET_PAGE_SIZE) != 0) {
>> + ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
>> + block->length % TARGET_PAGE_SIZE, offset);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* get the memory's offset in the vmcore */
>> +static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
>> + target_phys_addr_t memory_offset)
>> +{
>> + RAMBlock *block;
>> + target_phys_addr_t offset = memory_offset;
>> +
>> + QLIST_FOREACH(block, &ram_list.blocks, next) {
>> + if (phys_addr >= block->offset &&
>> + phys_addr < block->offset + block->length) {
>> + return phys_addr - block->offset + offset;
>> + }
>> + offset += block->length;
>> + }
>> +
>> + return -1;
>> +}
>> +
>> +static DumpState *dump_init(int fd, Error **errp)
>> +{
>> + CPUState *env;
>> + DumpState *s = dump_get_current();
>> + int ret;
>> +
>> + vm_stop(RUN_STATE_PAUSED);
>
> I would save the current vm state first and restore it when finished.
OK, I will do it.
>
>> + s->state = DUMP_STATE_SETUP;
>> + if (s->error) {
>> + g_free(s->error);
>> + s->error = NULL;
>> + }
>> + s->fd = fd;
>> +
>> + /*
>> + * get dump info: endian, class and architecture.
>> + * If the target architecture is not supported, cpu_get_dump_info() will
>> + * return -1.
>> + *
>> + * if we use kvm, we should synchronize the register before we get dump
>> + * info.
>> + */
>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> + cpu_synchronize_state(env);
>> + }
>> + ret = cpu_get_dump_info(&s->dump_info);
>> + if (ret < 0) {
>> + error_set(errp, QERR_UNSUPPORTED);
>> + return NULL;
>> + }
>> +
>> + /* get memory mapping */
>> + s->list.num = 0;
>> + QTAILQ_INIT(&s->list.head);
>> + get_memory_mapping(&s->list);
>> +
>> + /* crash needs extra memory mapping to determine phys_base. */
>> + ret = cpu_add_extra_memory_mapping(&s->list);
>> + if (ret < 0) {
>> + error_set(errp, QERR_UNDEFINED_ERROR);
>> + return NULL;
>> + }
>> +
>> + /*
>> + * calculate phdr_num
>> + *
>> + * the type of phdr->num is uint16_t, so we should avoid overflow
>> + */
>> + s->phdr_num = 1; /* PT_NOTE */
>> + if (s->list.num > (1 << 16) - 2) {
>> + s->phdr_num = (1 << 16) - 1;
>> + } else {
>> + s->phdr_num += s->list.num;
>> + }
>> +
>> + return s;
>> +}
>> +
>> +/* write elf header, PT_NOTE and elf note to vmcore. */
>> +static int dump_begin(DumpState *s)
>> +{
>> + target_phys_addr_t offset;
>> + int ret;
>> +
>> + s->state = DUMP_STATE_ACTIVE;
>> +
>> + /*
>> + * the vmcore's format is:
>> + * --------------
>> + * | elf header |
>> + * --------------
>> + * | PT_NOTE |
>> + * --------------
>> + * | PT_LOAD |
>> + * --------------
>> + * | ...... |
>> + * --------------
>> + * | PT_LOAD |
>> + * --------------
>> + * | elf note |
>> + * --------------
>> + * | memory |
>> + * --------------
>> + *
>> + * we only know where the memory is saved after we write elf note into
>> + * vmcore.
>> + */
>> +
>> + /* write elf header to vmcore */
>> + if (s->dump_info.d_class == ELFCLASS64) {
>> + ret = write_elf64_header(s);
>> + } else {
>> + ret = write_elf32_header(s);
>> + }
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + /* write elf notes to vmcore */
>> + if (s->dump_info.d_class == ELFCLASS64) {
>> + offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*s->phdr_num;
>> + ret = write_elf64_notes(s, 0, &offset);
>> + } else {
>> + offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*s->phdr_num;
>> + ret = write_elf32_notes(s, 0, &offset);
>> + }
>> +
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + s->memory_offset = offset;
>> + return 0;
>> +}
>> +
>> +/* write PT_LOAD to vmcore */
>> +static int dump_completed(DumpState *s)
>> +{
>> + target_phys_addr_t offset;
>> + MemoryMapping *memory_mapping;
>> + int phdr_index = 1, ret;
>> +
>> + QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
>> + offset = get_offset(memory_mapping->phys_addr, s->memory_offset);
>> + if (s->dump_info.d_class == ELFCLASS64) {
>> + ret = write_elf64_load(s, memory_mapping, phdr_index++, offset);
>> + } else {
>> + ret = write_elf32_load(s, memory_mapping, phdr_index++, offset);
>> + }
>> + if (ret < 0) {
>> + return -1;
>> + }
>> + }
>> +
>> + s->state = DUMP_STATE_COMPLETED;
>> + dump_cleanup(s);
>> + return 0;
>> +}
>> +
>> +/* write all memory to vmcore */
>> +static int dump_iterate(DumpState *s)
>> +{
>> + RAMBlock *block;
>> + target_phys_addr_t offset = s->memory_offset;
>> + int ret;
>> +
>> + /* write all memory to vmcore */
>> + QLIST_FOREACH(block, &ram_list.blocks, next) {
>> + ret = write_memory(s, block, &offset);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> + }
>> +
>> + return dump_completed(s);
>> +}
>> +
>> +static int create_vmcore(DumpState *s)
>> +{
>> + int ret;
>> +
>> + ret = dump_begin(s);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + ret = dump_iterate(s);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +void qmp_dump(const char *file, Error **errp)
>> +{
>> + const char *p;
>> + int fd = -1;
>> + DumpState *s;
>> +
>> +#if !defined(WIN32)
>> + if (strstart(file, "fd:", &p)) {
>> + fd = qemu_get_fd(p);
>> + if (fd == -1) {
>> + error_set(errp, QERR_FD_NOT_FOUND, p);
>> + return;
>> + }
>> + }
>> +#endif
>> +
>> + if (strstart(file, "file:", &p)) {
>> + fd = open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR);
>> + if (fd < 0) {
>> + error_set(errp, QERR_OPEN_FILE_FAILED, p);
>> + return;
>> + }
>> + }
>> +
>> + if (fd == -1) {
>> + error_set(errp, QERR_INVALID_PARAMETER, "file");
>> + return;
>> + }
>> +
>> + s = dump_init(fd, errp);
>> + if (!s) {
>> + return;
>> + }
>> +
>> + if (create_vmcore(s) < 0) {
>> + error_set(errp, QERR_IO_ERROR);
>> + }
>> +
>> + return;
>> +}
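The offset bookkeeping in the dump code above follows the usual core-file layout: ELF header, then program headers, then notes, then the memory pages. A minimal sketch of that arithmetic, assuming ELF64 and contiguous placement of guest RAM after the headers (the helper names and the `note_size` parameter are invented for the sketch, not taken from the patch):

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* File layout: ELF header | phdr_num program headers | notes | memory.
 * Mirrors what s->memory_offset holds in the patch; note_size is an
 * assumption for the example. */
static uint64_t vmcore_memory_offset(uint16_t phdr_num, uint64_t note_size)
{
    return sizeof(Elf64_Ehdr)
           + (uint64_t)sizeof(Elf64_Phdr) * phdr_num
           + note_size;
}

/* A PT_LOAD covering guest-physical [paddr, paddr + len) then starts at
 * memory_offset + paddr in the file, assuming RAM is written out
 * contiguously from physical address 0 (what get_offset() appears to
 * compute for contiguous RAM). */
static uint64_t vmcore_load_offset(uint64_t memory_offset, uint64_t paddr)
{
    return memory_offset + paddr;
}
```

On glibc, sizeof(Elf64_Ehdr) is 64 and sizeof(Elf64_Phdr) is 56, so three program headers plus a 0x100-byte note area place the first memory byte at file offset 488.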
>> diff --git a/dump.h b/dump.h
>> index a36468b..b413d18 100644
>> --- a/dump.h
>> +++ b/dump.h
>> @@ -1,6 +1,9 @@
>> #ifndef DUMP_H
>> #define DUMP_H
>>
>> +#include "qdict.h"
>> +#include "error.h"
>> +
>
> This looks stray. Nothing is added to this header which require those
> includes.
Yes, I forgot to remove it when updating the patch. I will remove them.
>
>> typedef struct ArchDumpInfo {
>> int d_machine; /* Architecture */
>> int d_endian; /* ELFDATA2LSB or ELFDATA2MSB */
>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>> index 573b823..6cfb678 100644
>> --- a/hmp-commands.hx
>> +++ b/hmp-commands.hx
>> @@ -867,6 +867,22 @@ new parameters (if specified) once the vm migration finished successfully.
>> ETEXI
>>
>> {
>> + .name = "dump",
>> + .args_type = "file:s",
>> + .params = "file",
>> + .help = "dump to file",
>> + .user_print = monitor_user_noop,
>> + .mhandler.cmd = hmp_dump,
>> + },
>> +
>> +
>> +STEXI
>> +@item dump @var{file}
>> +@findex dump
>> +Dump to @var{file}.
>
> That's way too brief! :) It should state the format, mention potential
> architecture limitations, and explain that the output can be processed
> with crash or gdb.
OK.
>
>> +ETEXI
>> +
>> + {
>> .name = "snapshot_blkdev",
>> .args_type = "device:B,snapshot-file:s?,format:s?",
>> .params = "device [new-image-file] [format]",
>> diff --git a/hmp.c b/hmp.c
>> index 8ff8c94..1a69857 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -851,3 +851,12 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
>>
>> hmp_handle_error(mon, &error);
>> }
>> +
>> +void hmp_dump(Monitor *mon, const QDict *qdict)
>> +{
>> + Error *errp = NULL;
>> + const char *file = qdict_get_str(qdict, "file");
>> +
>> + qmp_dump(file, &errp);
>> + hmp_handle_error(mon, &errp);
>> +}
>> diff --git a/hmp.h b/hmp.h
>> index 18eecbd..66984c5 100644
>> --- a/hmp.h
>> +++ b/hmp.h
>> @@ -58,5 +58,6 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
>> void hmp_block_stream(Monitor *mon, const QDict *qdict);
>> void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
>> void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
>> +void hmp_dump(Monitor *mon, const QDict *qdict);
>>
>> #endif
>> diff --git a/monitor.c b/monitor.c
>> index 7e72739..18e1ac7 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -73,6 +73,9 @@
>> #endif
>> #include "hw/lm32_pic.h"
>>
>> +/* for dump */
>> +#include "dump.h"
>> +
>> //#define DEBUG
>> //#define DEBUG_COMPLETION
>>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index d02ee86..1013ae6 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -1582,3 +1582,16 @@
>> { 'command': 'qom-list-types',
>> 'data': { '*implements': 'str', '*abstract': 'bool' },
>> 'returns': [ 'ObjectTypeInfo' ] }
>> +
>> +##
>> +# @dump
>> +#
>> +# Dump guest's memory to vmcore.
>> +#
>> +# @file: the filename or file descriptor of the vmcore.
>> +#
>> +# Returns: nothing on success
>> +#
>> +# Since: 1.1
>> +##
>> +{ 'command': 'dump', 'data': { 'file': 'str' } }
>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index b5e2ab8..52d3d3b 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -566,6 +566,32 @@ Example:
>> EQMP
>>
>> {
>> + .name = "dump",
>> + .args_type = "file:s",
>> + .params = "file",
>> + .help = "dump to file",
>> + .user_print = monitor_user_noop,
>> + .mhandler.cmd_new = qmp_marshal_input_dump,
>> + },
>> +
>> +SQMP
>> +dump
>> +
>> +
>> +Dump to file.
>> +
>> +Arguments:
>> +
>> +- "file": Destination file (json-string)
>
> The code looks like it supports both file names and file descriptors,
> no? Same for HMP.
Yes. I will update the description.
Thanks
Wen Congyang
>
>> +
>> +Example:
>> +
>> +-> { "execute": "dump", "arguments": { "file": "fd:dump" } }
>> +<- { "return": {} }
>> +
>> +EQMP
>> +
>> + {
>> .name = "netdev_add",
>> .args_type = "netdev:O",
>> .params = "[user|tap|socket],id=str[,prop=value][,...]",
>> --
>> 1.7.1
>>
>
> Jan
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-14 18:27 ` Jan Kiszka
@ 2012-02-15 3:47 ` Wen Congyang
2012-02-15 9:07 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 3:47 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
> On 2012-02-14 19:05, Jan Kiszka wrote:
>> On 2012-02-09 04:28, Wen Congyang wrote:
>>> The new monitor command dump may take a long time to finish, so we need to
>>> run it in the background.
>>
>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>> already written but then dirtied pages? Hmm... no.
>>
>> What does background mean then? What is the use case? What if the user
>> decides to resume the vm?
>
> OK, that is addressed in patch 15! I would suggest merging it into this
> patch. It makes sense to handle that case gracefully right from the
> beginning.
OK, I will merge it.
>
> OK, now I have some other question: What is the point of rate-limiting
> the dump? The guest is not running, thus not competing for bandwidth.
I use bandwidth to try to control the writing speed. If we write the vmcore
to disk at a high speed, it may affect some other applications which use
the same disk too.
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping
2012-02-14 17:21 ` Jan Kiszka
@ 2012-02-15 4:07 ` Wen Congyang
2012-02-15 9:17 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 4:07 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 01:21 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:22, Wen Congyang wrote:
>> Add API to get all virtual address and physical address mapping.
>> If there is no virtual address for some physical address, the virtual
>> address is 0.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> memory_mapping.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> memory_mapping.h | 1 +
>> 2 files changed, 66 insertions(+), 0 deletions(-)
>>
>> diff --git a/memory_mapping.c b/memory_mapping.c
>> index d83b7d7..fc0ddee 100644
>> --- a/memory_mapping.c
>> +++ b/memory_mapping.c
>> @@ -128,3 +128,68 @@ void free_memory_mapping_list(MemoryMappingList *list)
>>
>> list->num = 0;
>> }
>> +
>> +void get_memory_mapping(MemoryMappingList *list)
>> +{
>> + CPUState *env;
>> + MemoryMapping *memory_mapping;
>> + RAMBlock *block;
>> + ram_addr_t offset, length;
>> +
>> + last_mapping = NULL;
>> +
>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> + cpu_get_memory_mapping(list, env);
>
> Hmm, is the CPU number recorded along with the mappings? I mean, how
> could crash tell them apart afterward if they are contradictory? This
> way, they are just thrown in the same bucket, correct?
>
> Even if crash or gdb aren't prepared for cpu/thread-specific mappings,
> could we already record that information for later use? Or would it
> break compatibility with current versions?
crash does not need this information. It only needs the physical address
stored in PT_LOAD.
gdb needs the virtual address and physical address stored in PT_LOAD.
If the address is in the kernel space, the virtual address and physical
address mapping should be the same. I collect the mapping information
from all vcpus, because the OS may enter the second kernel. In this case,
IIRC (according to my test result, but I don't remember clearly), gdb's bt
can output the backtrace in the first kernel if the OS does not use the
first vcpu to do kdump; otherwise gdb's bt can output the backtrace in
the second kernel.
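Concretely, the mapping-to-PT_LOAD translation Wen describes can be sketched like this (a simplified stand-in for the patch's write_elf64_load(), with invented names; gdb reads p_vaddr, while crash only needs p_paddr):

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Fill one PT_LOAD for a guest memory mapping.  p_vaddr is 0 when the
 * region has no virtual mapping in the guest page table, as the patch
 * description states; crash ignores p_vaddr, gdb relies on it. */
static void fill_pt_load(Elf64_Phdr *phdr, uint64_t vaddr, uint64_t paddr,
                         uint64_t length, uint64_t file_offset)
{
    memset(phdr, 0, sizeof(*phdr));
    phdr->p_type   = PT_LOAD;
    phdr->p_offset = file_offset;
    phdr->p_vaddr  = vaddr;   /* 0 if unmapped in the guest */
    phdr->p_paddr  = paddr;
    phdr->p_filesz = length;
    phdr->p_memsz  = length;
}
```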
>
>> + }
>> +
>> + /* some memory may be not mapped, add them into memory mapping's list */
>> + QLIST_FOREACH(block, &ram_list.blocks, next) {
>> + offset = block->offset;
>> + length = block->length;
>> +
>> + QTAILQ_FOREACH(memory_mapping, &list->head, next) {
>> + if (memory_mapping->phys_addr >= (offset + length)) {
>> + /*
>> + * memory_mapping's list does not contain the region
>> + * [offset, offset+length)
>> + */
>> + create_new_memory_mapping(list, offset, 0, length);
>> + length = 0;
>> + break;
>> + }
>> +
>> + if ((memory_mapping->phys_addr + memory_mapping->length) <=
>> + offset) {
>> + continue;
>> + }
>> +
>> + if (memory_mapping->phys_addr > offset) {
>> + /*
>> + * memory_mapping's list does not contain the region
>> + * [offset, memory_mapping->phys_addr)
>> + */
>> + create_new_memory_mapping(list, offset, 0,
>> + memory_mapping->phys_addr - offset);
>> + }
>> +
>> + if ((offset + length) <=
>> + (memory_mapping->phys_addr + memory_mapping->length)) {
>> + length = 0;
>> + break;
>> + }
>> + length -= memory_mapping->phys_addr + memory_mapping->length -
>> + offset;
>> + offset = memory_mapping->phys_addr + memory_mapping->length;
>> + }
>> +
>> + if (length > 0) {
>> + /*
>> + * memory_mapping's list does not contain the region
>> + * [offset, memory_mapping->phys_addr)
>> + */
>> + create_new_memory_mapping(list, offset, 0, length);
>> + }
>> + }
>> +
>> + return;
>
> Please avoid redundant returns.
OK
>
>> +}
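The QTAILQ walk above is an interval hole-filling pass: any guest-physical range inside a RAM block that no mapping covers gets an entry with vaddr 0. The same logic on plain arrays (names invented for the sketch; mappings are assumed sorted by physical address, as the patch's list is):

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of the hole-filling loop: given mappings sorted by
 * physical address, emit an entry for every gap inside the RAM block
 * [offset, offset + length).  Arrays replace the QTAILQ; out must have
 * room for n + 1 entries. */
struct map { uint64_t phys, len; };

static int fill_holes(const struct map *maps, int n,
                      uint64_t offset, uint64_t length,
                      struct map *out)
{
    int holes = 0;
    uint64_t end = offset + length;

    for (int i = 0; i < n && offset < end; i++) {
        if (maps[i].phys + maps[i].len <= offset) {
            continue;                       /* mapping before the block */
        }
        if (maps[i].phys >= end) {
            break;                          /* mapping after the block */
        }
        if (maps[i].phys > offset) {        /* gap before this mapping */
            out[holes++] = (struct map){ offset, maps[i].phys - offset };
        }
        offset = maps[i].phys + maps[i].len;
    }
    if (offset < end) {                     /* trailing gap */
        out[holes++] = (struct map){ offset, end - offset };
    }
    return holes;
}
```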
>> diff --git a/memory_mapping.h b/memory_mapping.h
>> index a4b1532..679f9ef 100644
>> --- a/memory_mapping.h
>> +++ b/memory_mapping.h
>> @@ -34,5 +34,6 @@ void add_to_memory_mapping(MemoryMappingList *list,
>> ram_addr_t length);
>>
>> void free_memory_mapping_list(MemoryMappingList *list);
>> +void get_memory_mapping(MemoryMappingList *list);
>>
>> #endif
>
> Maybe [qemu_]get_guest_memory_mapping. Just get_memory_mapping sounds a
> bit to generic to me. Could be any mapping.
OK, I will change the API's name
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
2012-02-14 17:35 ` Jan Kiszka
@ 2012-02-15 5:19 ` Wen Congyang
2012-02-15 9:21 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 5:19 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 01:35 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:24, Wen Congyang wrote:
>> Crash needs extra memory mapping to determine phys_base.
>>
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> cpu-all.h | 2 ++
>> target-i386/arch-dump.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 45 insertions(+), 0 deletions(-)
>>
>> diff --git a/cpu-all.h b/cpu-all.h
>> index efb5ba3..290c43a 100644
>> --- a/cpu-all.h
>> +++ b/cpu-all.h
>> @@ -530,10 +530,12 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>> target_phys_addr_t *offset);
>> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>> target_phys_addr_t *offset);
>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list);
>> #else
>> #define cpu_get_memory_mapping(list, env)
>> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
>> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
>> +#define cpu_add_extra_memory_mapping(list) ({ 0; })
>> #endif
>>
>> #endif /* CPU_ALL_H */
>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>> index 4c0ff77..d96f6ae 100644
>> --- a/target-i386/arch-dump.c
>> +++ b/target-i386/arch-dump.c
>> @@ -495,3 +495,46 @@ int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>> {
>> return x86_write_elf32_note(fd, env, cpuid, offset);
>> }
>> +
>> +/* This function is copied from crash */
>
> And what does it do there and here? I suppose it is Linux-specific - any
> version? This should be documented and encoded in the function name.
>
>> +static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
>> +{
>> + int i;
>> + target_ulong kernel_base = -1;
>> + target_ulong last, mask;
>> +
>> + for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
>> + mask = ~((1LL << i) - 1);
>> + *base_vaddr = env->idt.base & mask;
>> + if (*base_vaddr == last) {
>> + continue;
>> + }
>> +
>> + kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
>> + last = *base_vaddr;
>> + }
>> +
>> + return kernel_base;
>> +}
>> +
>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list)
>
> Again, what does "extra" mean? Probably guest-specific, no?
crash will calculate phys_base from the virtual and physical addresses
stored in the PT_LOAD.
If the vmcore is generated by 'virsh dump' (which uses migration to implement
dumping), crash calculates phys_base from idt.base. The function
get_phys_base_addr() calculates it in the same way.
I think crash may work without this. I will verify it.
Thanks
Wen Congyang
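The phys_base relation Wen mentions can be written down explicitly. A sketch, assuming an x86_64 Linux guest (the __START_KERNEL_map constant is a property of the guest kernel, not of QEMU or this patch):

```c
#include <assert.h>
#include <stdint.h>

/* crash can derive the kernel's physical load base from any kernel-text
 * PT_LOAD that carries both addresses:
 *     phys_base = p_paddr - (p_vaddr - __START_KERNEL_map)
 * 0xffffffff80000000 is x86_64 Linux's kernel mapping base; using it
 * here is a guest-OS assumption made for this sketch. */
#define START_KERNEL_MAP 0xffffffff80000000ULL

static uint64_t phys_base_from_pt_load(uint64_t p_vaddr, uint64_t p_paddr)
{
    return p_paddr - (p_vaddr - START_KERNEL_MAP);
}
```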
>
>> +{
>> +#ifdef TARGET_X86_64
>> + target_phys_addr_t kernel_base = -1;
>> + target_ulong base_vaddr;
>> + bool lma = !!(first_cpu->hflags & HF_LMA_MASK);
>> +
>> + if (!lma) {
>> + return 0;
>> + }
>> +
>> + kernel_base = get_phys_base_addr(first_cpu, &base_vaddr);
>> + if (kernel_base == -1) {
>> + return -1;
>> + }
>> +
>> + create_new_memory_mapping_head(list, kernel_base, base_vaddr,
>> + TARGET_PAGE_SIZE);
>> +#endif
>> + return 0;
>> +}
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor()
2012-02-15 2:54 ` Wen Congyang
@ 2012-02-15 8:51 ` Jan Kiszka
2012-02-15 13:01 ` Luiz Capitulino
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 8:51 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-15 03:54, Wen Congyang wrote:
> At 02/15/2012 12:19 AM, Jan Kiszka Wrote:
>> On 2012-02-09 04:19, Wen Congyang wrote:
>>> Sync command needs these two APIs to suspend/resume monitor.
>>>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> ---
>>> monitor.c | 27 +++++++++++++++++++++++++++
>>> monitor.h | 2 ++
>>> 2 files changed, 29 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/monitor.c b/monitor.c
>>> index 11639b1..7e72739 100644
>>> --- a/monitor.c
>>> +++ b/monitor.c
>>> @@ -4442,6 +4442,26 @@ static void monitor_command_cb(Monitor *mon, const char *cmdline, void *opaque)
>>> monitor_resume(mon);
>>> }
>>>
>>> +int qemu_suspend_monitor(const char *fmt, ...)
>>> +{
>>> + int ret;
>>> +
>>> + if (cur_mon) {
>>> + ret = monitor_suspend(cur_mon);
>>> + } else {
>>> + ret = -ENOTTY;
>>> + }
>>> +
>>> + if (ret < 0 && fmt) {
>>> + va_list ap;
>>> + va_start(ap, fmt);
>>> + monitor_vprintf(cur_mon, fmt, ap);
>>> + va_end(ap);
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> int monitor_suspend(Monitor *mon)
>>> {
>>> if (!mon->rs)
>>> @@ -4450,6 +4470,13 @@ int monitor_suspend(Monitor *mon)
>>> return 0;
>>> }
>>>
>>> +void qemu_resume_monitor(void)
>>> +{
>>> + if (cur_mon) {
>>> + monitor_resume(cur_mon);
>>> + }
>>> +}
>>> +
>>> void monitor_resume(Monitor *mon)
>>> {
>>> if (!mon->rs)
>>> diff --git a/monitor.h b/monitor.h
>>> index 58109af..60a1e17 100644
>>> --- a/monitor.h
>>> +++ b/monitor.h
>>> @@ -46,7 +46,9 @@ int monitor_cur_is_qmp(void);
>>> void monitor_protocol_event(MonitorEvent event, QObject *data);
>>> void monitor_init(CharDriverState *chr, int flags);
>>>
>>> +int qemu_suspend_monitor(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
>>> int monitor_suspend(Monitor *mon);
>>> +void qemu_resume_monitor(void);
>>> void monitor_resume(Monitor *mon);
>>>
>>> int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
>>
>> I don't see any added value in this API, specifically as it is built on
>> top of cur_mon. Just use the existing services like the migration code
>> does. If you properly pass down the monitor reference from the command
>> to the suspend and store what monitor you suspended, all should be fine.
>
> This API is like qemu_get_fd(), which is not merged into upstream qemu yet.
> I need this API because I cannot access the monitor from a QAPI command.
OK, then I need to comment on that approach. QMP looks flawed here.
Either you have a need for a Monitor object (or a generic HMP/QMP
context), then you also have a handle. Or your don't, then you do not
need monitor suspend/resume or get_fd as well.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info
2012-02-15 3:30 ` Wen Congyang
@ 2012-02-15 9:05 ` Jan Kiszka
2012-02-15 9:10 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 9:05 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-15 04:30, Wen Congyang wrote:
>>> diff --git a/dump.h b/dump.h
>>> new file mode 100644
>>> index 0000000..a36468b
>>> --- /dev/null
>>> +++ b/dump.h
>>> @@ -0,0 +1,10 @@
>>
>> License header missing.
>
> There is no license in other header files.
But those are preexisting files, no need to repeat the mistake for new
one. Please fix.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-15 3:47 ` Wen Congyang
@ 2012-02-15 9:07 ` Jan Kiszka
2012-02-15 9:22 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 9:07 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-15 04:47, Wen Congyang wrote:
> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>> The new monitor command dump may take a long time to finish, so we need to
>>>> run it in the background.
>>>
>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>> already written but then dirtied pages? Hmm... no.
>>>
>>> What does background mean then? What is the use case? What if the user
>>> decides to resume the vm?
>>
>> OK, that is addressed in patch 15! I would suggest merging it into this
>> patch. It makes sense to handle that case gracefully right from the
>> beginning.
>
> OK, I will merge it.
>
>>
>> OK, now I have some other question: What is the point of rate-limiting
>> the dump? The guest is not running, thus not competing for bandwidth.
>
> I use bandwidth to try to control the writing speed. If we write the vmcore
>> to disk at a high speed, it may affect some other applications which use
> the same disk too.
Just like the guest of that particular VM can do. I don't think we need
this level of control here, it will be provided (if required) at a
different level, affecting the whole QEMU process. Removing the vmcore
bandwidth control will simplify code and user interface.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info
2012-02-15 9:05 ` Jan Kiszka
@ 2012-02-15 9:10 ` Wen Congyang
0 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 9:10 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 05:05 PM, Jan Kiszka Wrote:
> On 2012-02-15 04:30, Wen Congyang wrote:
>>>> diff --git a/dump.h b/dump.h
>>>> new file mode 100644
>>>> index 0000000..a36468b
>>>> --- /dev/null
>>>> +++ b/dump.h
>>>> @@ -0,0 +1,10 @@
>>>
>>> License header missing.
>>
>> There is no license in other header files.
>
> But those are preexisting files, no need to repeat the mistake for new
> one. Please fix.
OK, I will fix it.
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info
2012-02-09 3:26 ` [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info Wen Congyang
2012-02-14 17:39 ` Jan Kiszka
@ 2012-02-15 9:12 ` Peter Maydell
2012-02-15 9:19 ` Wen Congyang
1 sibling, 1 reply; 68+ messages in thread
From: Peter Maydell @ 2012-02-15 9:12 UTC (permalink / raw)
To: Wen Congyang
Cc: Jan Kiszka, qemu-devel, Luiz Capitulino, HATAYAMA Daisuke,
Dave Anderson, Eric Blake
On 9 February 2012 03:26, Wen Congyang <wency@cn.fujitsu.com> wrote:
> +int cpu_get_dump_info(ArchDumpInfo *info)
> +{
> + bool lma = false;
> + RAMBlock *block;
> +
> +#ifdef TARGET_X86_64
> + lma = !!(first_cpu->hflags & HF_LMA_MASK);
> +#endif
> +
> + if (lma) {
> + info->d_machine = EM_X86_64;
> + } else {
> + info->d_machine = EM_386;
> + }
> + info->d_endian = ELFDATA2LSB;
> +
> + if (lma) {
> + info->d_class = ELFCLASS64;
> + } else {
> + info->d_class = ELFCLASS32;
> + }
> +
> + QLIST_FOREACH(block, &ram_list.blocks, next) {
> + if (!lma && (block->offset + block->length > UINT_MAX)) {
> + /* The memory size is greater than 4G */
> + info->d_class = ELFCLASS32;
> + break;
> + }
> + }
I think it would be cleaner to have a single
if (lma) {
stuff;
} else {
stuff;
}
rather than checking it three times, especially for
the loop, where if lma is true we'll walk the ram_list
without ever doing anything.
-- PMM
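A sketch of the consolidation Peter suggests, with the ram_list walk collapsed to a boolean flag for brevity (the struct mirrors ArchDumpInfo from the patch). One caveat: the posted loop assigns ELFCLASS32 in the >4G branch, but note 4 of the cover letter says a 32-bit guest with more than 4G of RAM gets an elf64 vmcore, so the sketch uses ELFCLASS64 there:

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>

/* Mirrors ArchDumpInfo from the patch. */
struct arch_dump_info { int d_machine; int d_endian; int d_class; };

/* Single if/else on lma; the >4G scan only matters for 32-bit guests,
 * so it never runs (here: is never consulted) when lma is set. */
static void get_dump_info_sketch(struct arch_dump_info *info, bool lma,
                                 bool ram_above_4g)
{
    info->d_endian = ELFDATA2LSB;
    if (lma) {
        info->d_machine = EM_X86_64;
        info->d_class = ELFCLASS64;
    } else {
        info->d_machine = EM_386;
        /* 32-bit guest with >4G RAM still needs 64-bit file offsets */
        info->d_class = ram_above_4g ? ELFCLASS64 : ELFCLASS32;
    }
}
```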
* Re: [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping
2012-02-15 4:07 ` Wen Congyang
@ 2012-02-15 9:17 ` Jan Kiszka
2012-02-15 9:41 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 9:17 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-15 05:07, Wen Congyang wrote:
> At 02/15/2012 01:21 AM, Jan Kiszka Wrote:
>> On 2012-02-09 04:22, Wen Congyang wrote:
>>> Add API to get all virtual address and physical address mapping.
>>> If there is no virtual address for some physical address, the virtual
>>> address is 0.
>>>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> ---
>>> memory_mapping.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> memory_mapping.h | 1 +
>>> 2 files changed, 66 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/memory_mapping.c b/memory_mapping.c
>>> index d83b7d7..fc0ddee 100644
>>> --- a/memory_mapping.c
>>> +++ b/memory_mapping.c
>>> @@ -128,3 +128,68 @@ void free_memory_mapping_list(MemoryMappingList *list)
>>>
>>> list->num = 0;
>>> }
>>> +
>>> +void get_memory_mapping(MemoryMappingList *list)
>>> +{
>>> + CPUState *env;
>>> + MemoryMapping *memory_mapping;
>>> + RAMBlock *block;
>>> + ram_addr_t offset, length;
>>> +
>>> + last_mapping = NULL;
>>> +
>>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>> + cpu_get_memory_mapping(list, env);
>>
>> Hmm, is the CPU number recorded along with the mappings? I mean, how
>> could crash tell them apart afterward if they are contradictory? This
>> way, they are just thrown in the same bucket, correct?
>>
>> Even if crash or gdb aren't prepared for cpu/thread-specific mappings,
>> could we already record that information for later use? Or would it
>> break compatibility with current versions?
>
> crash does not need this information. It only needs the physical address
> stored in PT_LOAD.
So crash does not support viewing memory through the eyes of different
CPUs? OK.
>
> gdb needs the virtual address and physical address stored in PT_LOAD.
>
> If the address is in the kernel space, the virtual address and physical
> address mapping should be the same. I collect the mapping information
> from all vcpus, because the OS may enter the second kernel. In this case,
>> IIRC (according to my test result, but I don't remember clearly), gdb's bt
> can output the backtrace in the first kernel if the OS does not use the
> first vcpu to do kdump. otherwise gdb's bt can output the backtrace in
> the second kernel.
gdb could only make proper use of the additional mappings if they are
not contradictory (which can easily happen with user space processes) or
the cpu context is additionally provided so that views can be switched
via the "thread N" command. So far, QEMU's gdbstub does this for gdb
when it requests some memory over the remote connection. I bet gdb
requires some extension to exploit such information offline from a core
file, but I'm also sure that this will come as the importance of gdb for
system level debugging will rise.
Therefore my question: is there room to encode the mapping relation to a
CPU/thread context?
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info
2012-02-15 9:12 ` Peter Maydell
@ 2012-02-15 9:19 ` Wen Congyang
0 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 9:19 UTC (permalink / raw)
To: Peter Maydell
Cc: Jan Kiszka, qemu-devel, Luiz Capitulino, HATAYAMA Daisuke,
Dave Anderson, Eric Blake
At 02/15/2012 05:12 PM, Peter Maydell Wrote:
> On 9 February 2012 03:26, Wen Congyang <wency@cn.fujitsu.com> wrote:
>> +int cpu_get_dump_info(ArchDumpInfo *info)
>> +{
>> + bool lma = false;
>> + RAMBlock *block;
>> +
>> +#ifdef TARGET_X86_64
>> + lma = !!(first_cpu->hflags & HF_LMA_MASK);
>> +#endif
>> +
>> + if (lma) {
>> + info->d_machine = EM_X86_64;
>> + } else {
>> + info->d_machine = EM_386;
>> + }
>> + info->d_endian = ELFDATA2LSB;
>> +
>> + if (lma) {
>> + info->d_class = ELFCLASS64;
>> + } else {
>> + info->d_class = ELFCLASS32;
>> + }
>> +
>> + QLIST_FOREACH(block, &ram_list.blocks, next) {
>> + if (!lma && (block->offset + block->length > UINT_MAX)) {
>> + /* The memory size is greater than 4G */
>> + info->d_class = ELFCLASS32;
>> + break;
>> + }
>> + }
>
> I think it would be cleaner to have a single
> if (lma) {
> stuff;
> } else {
> stuff;
> }
>
> rather than checking it three times, especially for
> the loop, where if lma is true we'll walk the ram_list
> without ever doing anything.
Nice. I will change it.
Thanks
Wen Congyang
>
> -- PMM
>
* Re: [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
2012-02-15 5:19 ` Wen Congyang
@ 2012-02-15 9:21 ` Jan Kiszka
2012-02-15 9:44 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 9:21 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-15 06:19, Wen Congyang wrote:
> At 02/15/2012 01:35 AM, Jan Kiszka Wrote:
>> On 2012-02-09 04:24, Wen Congyang wrote:
>>> Crash needs extra memory mapping to determine phys_base.
>>>
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>> ---
>>> cpu-all.h | 2 ++
>>> target-i386/arch-dump.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 45 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/cpu-all.h b/cpu-all.h
>>> index efb5ba3..290c43a 100644
>>> --- a/cpu-all.h
>>> +++ b/cpu-all.h
>>> @@ -530,10 +530,12 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>>> target_phys_addr_t *offset);
>>> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>> target_phys_addr_t *offset);
>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list);
>>> #else
>>> #define cpu_get_memory_mapping(list, env)
>>> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
>>> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
>>> +#define cpu_add_extra_memory_mapping(list) ({ 0; })
>>> #endif
>>>
>>> #endif /* CPU_ALL_H */
>>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>>> index 4c0ff77..d96f6ae 100644
>>> --- a/target-i386/arch-dump.c
>>> +++ b/target-i386/arch-dump.c
>>> @@ -495,3 +495,46 @@ int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>> {
>>> return x86_write_elf32_note(fd, env, cpuid, offset);
>>> }
>>> +
>>> +/* This function is copied from crash */
>>
>> And what does it do there and here? I suppose it is Linux-specific - any
>> version? This should be documented and encoded in the function name.
>>
>>> +static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
>>> +{
>>> + int i;
>>> + target_ulong kernel_base = -1;
>>> + target_ulong last, mask;
>>> +
>>> + for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
>>> + mask = ~((1LL << i) - 1);
>>> + *base_vaddr = env->idt.base & mask;
>>> + if (*base_vaddr == last) {
>>> + continue;
>>> + }
>>> +
>>> + kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
>>> + last = *base_vaddr;
>>> + }
>>> +
>>> + return kernel_base;
>>> +}
>>> +
>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list)
>>
>> Again, what does "extra" mean? Probably guest-specific, no?
>
> crash will calculate the phys_base according to the virtual address and physical
> address stored in the PT_LOAD.
Crash is a Linux-only tool, dump must not be restricted to that guest -
but could contain transparent extensions of the file format if needed.
>
> If the vmcore is generated by 'virsh dump'(use migration to implement dumping),
> crash calculates the phys_base according to idt.base. The function get_phys_base_addr()
> uses the same way to calculate the phys_base.
Hmm, where are those special registers (idt, gdt, tr etc.) stored in the
vmcore file, BTW?
>
> I think crash may work without this. I will verify it.
Does gdb require this?
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-15 9:22 ` Wen Congyang
@ 2012-02-15 9:21 ` Jan Kiszka
2012-02-15 9:35 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 9:21 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-15 10:22, Wen Congyang wrote:
> At 02/15/2012 05:07 PM, Jan Kiszka Wrote:
>> On 2012-02-15 04:47, Wen Congyang wrote:
>>> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>>>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>>>> The new monitor command dump may take a long time to finish, so we need to
>>>>>> run it in the background.
>>>>>
>>>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>>>> already written but then dirtied pages? Hmm... no.
>>>>>
>>>>> What does background mean then? What is the use case? What if the user
>>>>> decides to resume the vm?
>>>>
>>>> OK, that is addressed in patch 15! I would suggest merging it into this
>>>> patch. It makes sense to handle that case gracefully right from the
>>>> beginning.
>>>
>>> OK, I will merge it.
>>>
>>>>
>>>> OK, now I have some other question: What is the point of rate-limiting
>>>> the dump? The guest is not running, thus not competing for bandwidth.
>>>
>>> I use bandwidth to try to control the writing speed. If we write the vmcore
>>> to disk at a high speed, it may affect some other applications which use
>>> the same disk too.
>>
>> Just like the guest of that particular VM can do. I don't think we need
>> this level of control here, it will be provided (if required) at a
>> different level, affecting the whole QEMU process. Removing the vmcore
>> bandwidth control will simplify code and user interface.
>
> OK. I will implement it like this:
> 1. write 100ms
> 2. sleep 100ms(allow qemu to do the other things)
> 3. goto 1
Why? Just write as fast as possible.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-15 9:07 ` Jan Kiszka
@ 2012-02-15 9:22 ` Wen Congyang
2012-02-15 9:21 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 9:22 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 05:07 PM, Jan Kiszka Wrote:
> On 2012-02-15 04:47, Wen Congyang wrote:
>> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>>> The new monitor command dump may take a long time to finish, so we need to run it
>>>>> in the background.
>>>>
>>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>>> already written but then dirtied pages? Hmm... no.
>>>>
>>>> What does background mean then? What is the use case? What if the user
>>>> decides to resume the vm?
>>>
>>> OK, that is addressed in patch 15! I would suggest merging it into this
>>> patch. It makes sense to handle that case gracefully right from the
>>> beginning.
>>
>> OK, I will merge it.
>>
>>>
>>> OK, now I have some other question: What is the point of rate-limiting
>>> the dump? The guest is not running, thus not competing for bandwidth.
>>
>> I use bandwidth to try to control the writing speed. If we write the vmcore
>> to disk at high speed, it may affect some other applications which use
>> the same disk too.
>
> Just like the guest of that particular VM can do. I don't think we need
> this level of control here, it will be provided (if required) at a
> different level, affecting the whole QEMU process. Removing the vmcore
> bandwidth control will simplify code and user interface.
OK. I will implement it like this:
1. write for 100ms
2. sleep for 100ms (allow qemu to do other things)
3. goto 1
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-15 9:21 ` Jan Kiszka
@ 2012-02-15 9:35 ` Wen Congyang
2012-02-15 10:16 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 9:35 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 05:21 PM, Jan Kiszka Wrote:
> On 2012-02-15 10:22, Wen Congyang wrote:
>> At 02/15/2012 05:07 PM, Jan Kiszka Wrote:
>>> On 2012-02-15 04:47, Wen Congyang wrote:
>>>> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>>>>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>>>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>>>>> The new monitor command dump may take a long time to finish, so we need to run it
>>>>>>> in the background.
>>>>>>
>>>>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>>>>> already written but then dirtied pages? Hmm... no.
>>>>>>
>>>>>> What does background mean then? What is the use case? What if the user
>>>>>> decides to resume the vm?
>>>>>
>>>>> OK, that is addressed in patch 15! I would suggest merging it into this
>>>>> patch. It makes sense to handle that case gracefully right from the
>>>>> beginning.
>>>>
>>>> OK, I will merge it.
>>>>
>>>>>
>>>>> OK, now I have some other question: What is the point of rate-limiting
>>>>> the dump? The guest is not running, thus not competing for bandwidth.
>>>>
>>>> I use bandwidth to try to control the writing speed. If we write the vmcore
>>>> to disk at high speed, it may affect some other applications which use
>>>> the same disk too.
>>>
>>> Just like the guest of that particular VM can do. I don't think we need
>>> this level of control here, it will be provided (if required) at a
>>> different level, affecting the whole QEMU process. Removing the vmcore
>>> bandwidth control will simplify code and user interface.
>>
>> OK. I will implement it like this:
>> 1. write for 100ms
>> 2. sleep for 100ms (allow qemu to do other things)
>> 3. goto 1
>
> Why? Just write as fast as possible.
If the memory is too big, the command will take too long.
Eric said:
It sounds like it is long-running, which
means it probably needs to be asynchronous, as well as issue an event
upon completion, so that other monitor commands can be issued in the
meantime.
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping
2012-02-15 9:17 ` Jan Kiszka
@ 2012-02-15 9:41 ` Wen Congyang
2012-02-15 9:47 ` HATAYAMA Daisuke
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 9:41 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 05:17 PM, Jan Kiszka Wrote:
> On 2012-02-15 05:07, Wen Congyang wrote:
>> At 02/15/2012 01:21 AM, Jan Kiszka Wrote:
>>> On 2012-02-09 04:22, Wen Congyang wrote:
>>>> Add API to get all virtual address and physical address mapping.
>>>> If there is no virtual address for some physical address, the virtual
>>>> address is 0.
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> ---
>>>> memory_mapping.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> memory_mapping.h | 1 +
>>>> 2 files changed, 66 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/memory_mapping.c b/memory_mapping.c
>>>> index d83b7d7..fc0ddee 100644
>>>> --- a/memory_mapping.c
>>>> +++ b/memory_mapping.c
>>>> @@ -128,3 +128,68 @@ void free_memory_mapping_list(MemoryMappingList *list)
>>>>
>>>> list->num = 0;
>>>> }
>>>> +
>>>> +void get_memory_mapping(MemoryMappingList *list)
>>>> +{
>>>> + CPUState *env;
>>>> + MemoryMapping *memory_mapping;
>>>> + RAMBlock *block;
>>>> + ram_addr_t offset, length;
>>>> +
>>>> + last_mapping = NULL;
>>>> +
>>>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>>> + cpu_get_memory_mapping(list, env);
>>>
>>> Hmm, is the CPU number recorded along with the mappings? I mean, how
>>> could crash tell them apart afterward if they are contradictory? This
>>> way, they are just thrown in the same bucket, correct?
>>>
>>> Even if crash or gdb aren't prepared for cpu/thread-specific mappings,
>>> could we already record that information for later use? Or would it
>>> break compatibility with current versions?
>>
>> crash does not need this information. It only needs the physical address
>> stored in PT_LOAD.
>
> So crash does not support viewing memory through the eyes of different
> CPUs? OK.
>
>>
>> gdb needs the virtual address and physical address stored in PT_LOAD.
>>
>> If the address is in the kernel space, the virtual address and physical
>> address mapping should be the same. I collect the mapping information
>> from all vcpus, because the OS may enter the second kernel. In this case,
>> IIRC (according to my test result, but I don't remember clearly), gdb's bt
>> can output the backtrace in the first kernel if the OS does not use the
>> first vcpu to do kdump. Otherwise, gdb's bt can output the backtrace in
>> the second kernel.
>
> gdb could only make proper use of the additional mappings if they are
> not contradictory (which can easily happen with user space processes) or
> the cpu context is additionally provided so that views can be switched
> via the "thread N" command. So far, QEMU's gdbstub does this for gdb
> when it requests some memory over the remote connection. I bet gdb
> requires some extension to exploit such information offline from a core
> file, but I'm also sure that this will come as the importance of gdb for
> system level debugging will rise.
>
> Therefore my question: is there room to encode the mapping relation to a
> CPU/thread context?
I do not know. But I think the answer is no, because there is no field
in struct Elf32_Phdr/Elf64_Phdr to store the CPU/thread id.
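For reference, this is how a dump writer fills a PT_LOAD program header, using the standard `<elf.h>` definitions (the helper name `make_load_phdr` is made up for illustration). The structure carries only a type, flags, offsets, addresses, and sizes, which is exactly the point made above: there is no member that could hold a CPU or thread id.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: fill a PT_LOAD header for one guest RAM range.
 * crash only consumes p_paddr; gdb wants p_vaddr as well (0 if no
 * virtual mapping is known).  Note there is no CPU/thread field. */
static Elf64_Phdr make_load_phdr(uint64_t file_off, uint64_t vaddr,
                                 uint64_t paddr, uint64_t size)
{
    Elf64_Phdr ph = {0};

    ph.p_type   = PT_LOAD;
    ph.p_offset = file_off;   /* where the data sits in the core file */
    ph.p_vaddr  = vaddr;      /* guest virtual address, 0 if unmapped */
    ph.p_paddr  = paddr;      /* guest physical address */
    ph.p_filesz = size;
    ph.p_memsz  = size;
    return ph;
}
```

Per-CPU context therefore has to travel elsewhere in the file, not in the program headers themselves.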
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
2012-02-15 9:21 ` Jan Kiszka
@ 2012-02-15 9:44 ` Wen Congyang
2012-02-15 10:21 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-15 9:44 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 05:21 PM, Jan Kiszka Wrote:
> On 2012-02-15 06:19, Wen Congyang wrote:
>> At 02/15/2012 01:35 AM, Jan Kiszka Wrote:
>>> On 2012-02-09 04:24, Wen Congyang wrote:
>>>> Crash needs extra memory mapping to determine phys_base.
>>>>
>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>> ---
>>>> cpu-all.h | 2 ++
>>>> target-i386/arch-dump.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 45 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/cpu-all.h b/cpu-all.h
>>>> index efb5ba3..290c43a 100644
>>>> --- a/cpu-all.h
>>>> +++ b/cpu-all.h
>>>> @@ -530,10 +530,12 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>>>> target_phys_addr_t *offset);
>>>> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>>> target_phys_addr_t *offset);
>>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list);
>>>> #else
>>>> #define cpu_get_memory_mapping(list, env)
>>>> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
>>>> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
>>>> +#define cpu_add_extra_memory_mapping(list) ({ 0; })
>>>> #endif
>>>>
>>>> #endif /* CPU_ALL_H */
>>>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>>>> index 4c0ff77..d96f6ae 100644
>>>> --- a/target-i386/arch-dump.c
>>>> +++ b/target-i386/arch-dump.c
>>>> @@ -495,3 +495,46 @@ int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>>> {
>>>> return x86_write_elf32_note(fd, env, cpuid, offset);
>>>> }
>>>> +
>>>> +/* This function is copied from crash */
>>>
>>> And what does it do there and here? I suppose it is Linux-specific - any
>>> version? This should be documented and encoded in the function name.
>>>
>>>> +static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
>>>> +{
>>>> + int i;
>>>> + target_ulong kernel_base = -1;
>>>> + target_ulong last, mask;
>>>> +
>>>> + for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
>>>> + mask = ~((1LL << i) - 1);
>>>> + *base_vaddr = env->idt.base & mask;
>>>> + if (*base_vaddr == last) {
>>>> + continue;
>>>> + }
>>>> +
>>>> + kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
>>>> + last = *base_vaddr;
>>>> + }
>>>> +
>>>> + return kernel_base;
>>>> +}
>>>> +
>>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list)
>>>
>>> Again, what does "extra" mean? Probably guest-specific, no?
>>
>> crash will calculate the phys_base according to the virtual address and physical
>> address stored in the PT_LOAD.
>
> Crash is a Linux-only tool, dump must not be restricted to that guest -
> but could contain transparent extensions of the file format if needed.
>
>>
>> If the vmcore is generated by 'virsh dump'(use migration to implement dumping),
>> crash calculates the phys_base according to idt.base. The function get_phys_base_addr()
>> uses the same way to calculate the phys_base.
>
> Hmm, where are those special registers (idt, gdt, tr etc.) stored in the
> vmcore file, BTW?
'virsh dump' uses migration to implement dumping now, so the vmcore has all
registers.
>
>>
>> I think crash may work without this. I will verify it.
I want to modify crash to make it work without this. I am discussing it
with Dave Anderson in the crash community now.
>
> Does gdb require this?
IIRC, the answer is no.
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping
2012-02-15 9:41 ` Wen Congyang
@ 2012-02-15 9:47 ` HATAYAMA Daisuke
2012-02-15 10:19 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: HATAYAMA Daisuke @ 2012-02-15 9:47 UTC (permalink / raw)
To: wency; +Cc: jan.kiszka, anderson, qemu-devel, eblake, lcapitulino
From: Wen Congyang <wency@cn.fujitsu.com>
Subject: Re: [RFC][PATCH 05/16 v6] Add API to get memory mapping
Date: Wed, 15 Feb 2012 17:41:15 +0800
> At 02/15/2012 05:17 PM, Jan Kiszka Wrote:
>> On 2012-02-15 05:07, Wen Congyang wrote:
>>> At 02/15/2012 01:21 AM, Jan Kiszka Wrote:
>>>> On 2012-02-09 04:22, Wen Congyang wrote:
>>>>> Add API to get all virtual address and physical address mapping.
>>>>> If there is no virtual address for some physical address, the virtual
>>>>> address is 0.
>>>>>
>>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>> ---
>>>>> memory_mapping.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> memory_mapping.h | 1 +
>>>>> 2 files changed, 66 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/memory_mapping.c b/memory_mapping.c
>>>>> index d83b7d7..fc0ddee 100644
>>>>> --- a/memory_mapping.c
>>>>> +++ b/memory_mapping.c
>>>>> @@ -128,3 +128,68 @@ void free_memory_mapping_list(MemoryMappingList *list)
>>>>>
>>>>> list->num = 0;
>>>>> }
>>>>> +
>>>>> +void get_memory_mapping(MemoryMappingList *list)
>>>>> +{
>>>>> + CPUState *env;
>>>>> + MemoryMapping *memory_mapping;
>>>>> + RAMBlock *block;
>>>>> + ram_addr_t offset, length;
>>>>> +
>>>>> + last_mapping = NULL;
>>>>> +
>>>>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>>>> + cpu_get_memory_mapping(list, env);
>>>>
>>>> Hmm, is the CPU number recorded along with the mappings? I mean, how
>>>> could crash tell them apart afterward if they are contradictory? This
>>>> way, they are just thrown in the same bucket, correct?
>>>>
>>>> Even if crash or gdb aren't prepared for cpu/thread-specific mappings,
>>>> could we already record that information for later use? Or would it
>>>> break compatibility with current versions?
>>>
>>> crash does not need this information. It only needs the physical address
>>> stored in PT_LOAD.
>>
>> So crash does not support viewing memory through the eyes of different
>> CPUs? OK.
>>
>>>
>>> gdb needs the virtual address and physical address stored in PT_LOAD.
>>>
>>> If the address is in the kernel space, the virtual address and physical
>>> address mapping should be the same. I collect the mapping information
>>> from all vcpus, because the OS may enter the second kernel. In this case,
>>> IIRC (according to my test result, but I don't remember clearly), gdb's bt
>>> can output the backtrace in the first kernel if the OS does not use the
>>> first vcpu to do kdump. Otherwise, gdb's bt can output the backtrace in
>>> the second kernel.
>>
>> gdb could only make proper use of the additional mappings if they are
>> not contradictory (which can easily happen with user space processes) or
>> the cpu context is additionally provided so that views can be switched
>> via the "thread N" command. So far, QEMU's gdbstub does this for gdb
>> when it requests some memory over the remote connection. I bet gdb
>> requires some extension to exploit such information offline from a core
>> file, but I'm also sure that this will come as the importance of gdb for
>> system level debugging will rise.
>>
>> Therefore my question: is there room to encode the mapping relation to a
>> CPU/thread context?
>
> I do not know. But I think the answer is no, because there is no field
> in struct Elf32_Phdr/Elf64_Phdr to store the CPU/thread id.
>
See the NT_PRSTATUS note, from which gdb knows which CPU is related to
which thread.
For a vmcore generated by kdump, the NT_PRSTATUS notes are contained in the
order corresponding to the online cpus.
If crash reads a vmcore generated by this command just as one generated by
kdump, without considering this, crash might misinterpret each CPU's
information, because the qemu dump generates notes for all possible CPUs.
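The note mechanism referred to here looks roughly like this, sketched with the standard `<elf.h>` types (the `note_size` helper is made up for illustration). Each NT_PRSTATUS note in the PT_NOTE segment carries one CPU's register set, and consumers such as gdb and crash match notes to CPUs by their order in the segment, which is why emitting notes for all possible rather than online CPUs can confuse crash.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* An ELF note is a 12-byte Elf64_Nhdr (n_namesz, n_descsz, n_type)
 * followed by the name and the descriptor, each padded to a 4-byte
 * boundary.  For NT_PRSTATUS the name is "CORE" and the descriptor
 * holds one CPU's register state. */
static size_t note_size(size_t namesz, size_t descsz)
{
    /* hypothetical helper: total on-disk size of one note entry */
    return sizeof(Elf64_Nhdr)
           + ((namesz + 3) & ~(size_t)3)
           + ((descsz + 3) & ~(size_t)3);
}
```

The descriptor size is architecture-specific; the test value below is just an example, not a claim about any particular kernel's prstatus layout.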
Thanks.
HATAYAMA, Daisuke
> Thanks
> Wen Congyang
>
>>
>> Jan
>>
>
>
* Re: [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background
2012-02-15 9:35 ` Wen Congyang
@ 2012-02-15 10:16 ` Jan Kiszka
0 siblings, 0 replies; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 10:16 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-15 10:35, Wen Congyang wrote:
> At 02/15/2012 05:21 PM, Jan Kiszka Wrote:
>> On 2012-02-15 10:22, Wen Congyang wrote:
>>> At 02/15/2012 05:07 PM, Jan Kiszka Wrote:
>>>> On 2012-02-15 04:47, Wen Congyang wrote:
>>>>> At 02/15/2012 02:27 AM, Jan Kiszka Wrote:
>>>>>> On 2012-02-14 19:05, Jan Kiszka wrote:
>>>>>>> On 2012-02-09 04:28, Wen Congyang wrote:
>>>>>>>> The new monitor command dump may take a long time to finish, so we need to run it
>>>>>>>> in the background.
>>>>>>>
>>>>>>> How does it work? Like live migration, i.e. you retransmit (overwrite)
>>>>>>> already written but then dirtied pages? Hmm... no.
>>>>>>>
>>>>>>> What does background mean then? What is the use case? What if the user
>>>>>>> decides to resume the vm?
>>>>>>
>>>>>> OK, that is addressed in patch 15! I would suggest merging it into this
>>>>>> patch. It makes sense to handle that case gracefully right from the
>>>>>> beginning.
>>>>>
>>>>> OK, I will merge it.
>>>>>
>>>>>>
>>>>>> OK, now I have some other question: What is the point of rate-limiting
>>>>>> the dump? The guest is not running, thus not competing for bandwidth.
>>>>>
>>>>> I use bandwidth to try to control the writing speed. If we write the vmcore
>>>>> to disk at high speed, it may affect some other applications which use
>>>>> the same disk too.
>>>>
>>>> Just like the guest of that particular VM can do. I don't think we need
>>>> this level of control here, it will be provided (if required) at a
>>>> different level, affecting the whole QEMU process. Removing the vmcore
>>>> bandwidth control will simplify code and user interface.
>>>
>>> OK. I will implement it like this:
>>> 1. write for 100ms
>>> 2. sleep for 100ms (allow qemu to do other things)
>>> 3. goto 1
>>
>> Why? Just write as fast as possible.
>
> If the memory is too big, the command will take too long.
> Eric said:
> It sounds like it is long-running, which
> means it probably needs to be asynchronous, as well as issue an event
> upon completion, so that other monitor commands can be issued in the
> meantime.
Asynchronous doesn't mean throttled. It means not waiting for
potentially long-running I/O in the context of the monitor, but becoming
interactive again.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping
2012-02-15 9:47 ` HATAYAMA Daisuke
@ 2012-02-15 10:19 ` Jan Kiszka
0 siblings, 0 replies; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 10:19 UTC (permalink / raw)
To: HATAYAMA Daisuke
Cc: eblake@redhat.com, anderson@redhat.com, qemu-devel@nongnu.org,
lcapitulino@redhat.com
On 2012-02-15 10:47, HATAYAMA Daisuke wrote:
> From: Wen Congyang <wency@cn.fujitsu.com>
> Subject: Re: [RFC][PATCH 05/16 v6] Add API to get memory mapping
> Date: Wed, 15 Feb 2012 17:41:15 +0800
>
>> At 02/15/2012 05:17 PM, Jan Kiszka Wrote:
>>> On 2012-02-15 05:07, Wen Congyang wrote:
>>>> At 02/15/2012 01:21 AM, Jan Kiszka Wrote:
>>>>> On 2012-02-09 04:22, Wen Congyang wrote:
>>>>>> Add API to get all virtual address and physical address mapping.
>>>>>> If there is no virtual address for some physical address, the virtual
>>>>>> address is 0.
>>>>>>
>>>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>>> ---
>>>>>> memory_mapping.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>> memory_mapping.h | 1 +
>>>>>> 2 files changed, 66 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/memory_mapping.c b/memory_mapping.c
>>>>>> index d83b7d7..fc0ddee 100644
>>>>>> --- a/memory_mapping.c
>>>>>> +++ b/memory_mapping.c
>>>>>> @@ -128,3 +128,68 @@ void free_memory_mapping_list(MemoryMappingList *list)
>>>>>>
>>>>>> list->num = 0;
>>>>>> }
>>>>>> +
>>>>>> +void get_memory_mapping(MemoryMappingList *list)
>>>>>> +{
>>>>>> + CPUState *env;
>>>>>> + MemoryMapping *memory_mapping;
>>>>>> + RAMBlock *block;
>>>>>> + ram_addr_t offset, length;
>>>>>> +
>>>>>> + last_mapping = NULL;
>>>>>> +
>>>>>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>>>>> + cpu_get_memory_mapping(list, env);
>>>>>
>>>>> Hmm, is the CPU number recorded along with the mappings? I mean, how
>>>>> could crash tell them apart afterward if they are contradictory? This
>>>>> way, they are just thrown in the same bucket, correct?
>>>>>
>>>>> Even if crash or gdb aren't prepared for cpu/thread-specific mappings,
>>>>> could we already record that information for later use? Or would it
>>>>> break compatibility with current versions?
>>>>
>>>> crash does not need this information. It only needs the physical address
>>>> stored in PT_LOAD.
>>>
>>> So crash does not support viewing memory through the eyes of different
>>> CPUs? OK.
>>>
>>>>
>>>> gdb needs the virtual address and physical address stored in PT_LOAD.
>>>>
>>>> If the address is in the kernel space, the virtual address and physical
>>>> address mapping should be the same. I collect the mapping information
>>>> from all vcpus, because the OS may enter the second kernel. In this case,
>>>> IIRC (according to my test result, but I don't remember clearly), gdb's bt
>>>> can output the backtrace in the first kernel if the OS does not use the
>>>> first vcpu to do kdump. Otherwise, gdb's bt can output the backtrace in
>>>> the second kernel.
>>>
>>> gdb could only make proper use of the additional mappings if they are
>>> not contradictory (which can easily happen with user space processes) or
>>> the cpu context is additionally provided so that views can be switched
>>> via the "thread N" command. So far, QEMU's gdbstub does this for gdb
>>> when it requests some memory over the remote connection. I bet gdb
>>> requires some extension to exploit such information offline from a core
>>> file, but I'm also sure that this will come as the importance of gdb for
>>> system level debugging will rise.
>>>
>>> Therefore my question: is there room to encode the mapping relation to a
>>> CPU/thread context?
>>
>> I do not know. But I think the answer is no, because there is no field
>> in struct Elf32_Phdr/Elf64_Phdr to store the CPU/thread id.
>>
>
> See the NT_PRSTATUS note, from which gdb knows which CPU is related to
> which thread.
>
> For a vmcore generated by kdump, the NT_PRSTATUS notes are contained in the
> order corresponding to the online cpus.
>
> If crash reads a vmcore generated by this command just as one generated by
> kdump, without considering this, crash might misinterpret each CPU's
> information, because the qemu dump generates notes for all possible CPUs.
If that note makes most sense to encode the mapping context, let's use
it and fix crash to be prepared for it.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
2012-02-15 9:44 ` Wen Congyang
@ 2012-02-15 10:21 ` Jan Kiszka
2012-02-17 9:32 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-15 10:21 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-15 10:44, Wen Congyang wrote:
> At 02/15/2012 05:21 PM, Jan Kiszka Wrote:
>> On 2012-02-15 06:19, Wen Congyang wrote:
>>> At 02/15/2012 01:35 AM, Jan Kiszka Wrote:
>>>> On 2012-02-09 04:24, Wen Congyang wrote:
>>>>> Crash needs extra memory mapping to determine phys_base.
>>>>>
>>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>> ---
>>>>> cpu-all.h | 2 ++
>>>>> target-i386/arch-dump.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>>> 2 files changed, 45 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/cpu-all.h b/cpu-all.h
>>>>> index efb5ba3..290c43a 100644
>>>>> --- a/cpu-all.h
>>>>> +++ b/cpu-all.h
>>>>> @@ -530,10 +530,12 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>>>>> target_phys_addr_t *offset);
>>>>> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>>>> target_phys_addr_t *offset);
>>>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list);
>>>>> #else
>>>>> #define cpu_get_memory_mapping(list, env)
>>>>> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
>>>>> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
>>>>> +#define cpu_add_extra_memory_mapping(list) ({ 0; })
>>>>> #endif
>>>>>
>>>>> #endif /* CPU_ALL_H */
>>>>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>>>>> index 4c0ff77..d96f6ae 100644
>>>>> --- a/target-i386/arch-dump.c
>>>>> +++ b/target-i386/arch-dump.c
>>>>> @@ -495,3 +495,46 @@ int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>>>> {
>>>>> return x86_write_elf32_note(fd, env, cpuid, offset);
>>>>> }
>>>>> +
>>>>> +/* This function is copied from crash */
>>>>
>>>> And what does it do there and here? I suppose it is Linux-specific - any
>>>> version? This should be documented and encoded in the function name.
>>>>
>>>>> +static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
>>>>> +{
>>>>> + int i;
>>>>> + target_ulong kernel_base = -1;
>>>>> + target_ulong last, mask;
>>>>> +
>>>>> + for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
>>>>> + mask = ~((1LL << i) - 1);
>>>>> + *base_vaddr = env->idt.base & mask;
>>>>> + if (*base_vaddr == last) {
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> + kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
>>>>> + last = *base_vaddr;
>>>>> + }
>>>>> +
>>>>> + return kernel_base;
>>>>> +}
>>>>> +
>>>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list)
>>>>
>>>> Again, what does "extra" mean? Probably guest-specific, no?
>>>
>>> crash will calculate the phys_base according to the virtual address and physical
>>> address stored in the PT_LOAD.
>>
>> Crash is a Linux-only tool, dump must not be restricted to that guest -
>> but could contain transparent extensions of the file format if needed.
>>
>>>
>>> If the vmcore is generated by 'virsh dump'(use migration to implement dumping),
>>> crash calculates the phys_base according to idt.base. The function get_phys_base_addr()
>>> uses the same way to calculate the phys_base.
>>
>> Hmm, where are those special registers (idt, gdt, tr etc.) stored in the
>> vmcore file, BTW?
>
> 'virsh dump' uses migration to implement dumping now, so the vmcore has all
> registers.
This is about the new format. And there we are lacking those special
registers. At some point, gdb will understand and need them to do proper
system-level debugging. I don't know the format structure here: can we
add sections to the core file in a way that consumers that don't know
them simply ignore them?
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor()
2012-02-15 8:51 ` Jan Kiszka
@ 2012-02-15 13:01 ` Luiz Capitulino
2012-02-16 1:35 ` Wen Congyang
0 siblings, 1 reply; 68+ messages in thread
From: Luiz Capitulino @ 2012-02-15 13:01 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel
On Wed, 15 Feb 2012 09:51:04 +0100
Jan Kiszka <jan.kiszka@siemens.com> wrote:
> On 2012-02-15 03:54, Wen Congyang wrote:
> > At 02/15/2012 12:19 AM, Jan Kiszka Wrote:
> >> On 2012-02-09 04:19, Wen Congyang wrote:
> >>> Sync command needs these two APIs to suspend/resume monitor.
> >>>
> >>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> >>> ---
> >>> monitor.c | 27 +++++++++++++++++++++++++++
> >>> monitor.h | 2 ++
> >>> 2 files changed, 29 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/monitor.c b/monitor.c
> >>> index 11639b1..7e72739 100644
> >>> --- a/monitor.c
> >>> +++ b/monitor.c
> >>> @@ -4442,6 +4442,26 @@ static void monitor_command_cb(Monitor *mon, const char *cmdline, void *opaque)
> >>> monitor_resume(mon);
> >>> }
> >>>
> >>> +int qemu_suspend_monitor(const char *fmt, ...)
> >>> +{
> >>> + int ret;
> >>> +
> >>> + if (cur_mon) {
> >>> + ret = monitor_suspend(cur_mon);
> >>> + } else {
> >>> + ret = -ENOTTY;
> >>> + }
> >>> +
> >>> + if (ret < 0 && fmt) {
> >>> + va_list ap;
> >>> + va_start(ap, fmt);
> >>> + monitor_vprintf(cur_mon, fmt, ap);
> >>> + va_end(ap);
> >>> + }
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>> int monitor_suspend(Monitor *mon)
> >>> {
> >>> if (!mon->rs)
> >>> @@ -4450,6 +4470,13 @@ int monitor_suspend(Monitor *mon)
> >>> return 0;
> >>> }
> >>>
> >>> +void qemu_resume_monitor(void)
> >>> +{
> >>> + if (cur_mon) {
> >>> + monitor_resume(cur_mon);
> >>> + }
> >>> +}
> >>> +
> >>> void monitor_resume(Monitor *mon)
> >>> {
> >>> if (!mon->rs)
> >>> diff --git a/monitor.h b/monitor.h
> >>> index 58109af..60a1e17 100644
> >>> --- a/monitor.h
> >>> +++ b/monitor.h
> >>> @@ -46,7 +46,9 @@ int monitor_cur_is_qmp(void);
> >>> void monitor_protocol_event(MonitorEvent event, QObject *data);
> >>> void monitor_init(CharDriverState *chr, int flags);
> >>>
> >>> +int qemu_suspend_monitor(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
> >>> int monitor_suspend(Monitor *mon);
> >>> +void qemu_resume_monitor(void);
> >>> void monitor_resume(Monitor *mon);
> >>>
> >>> int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
> >>
> >> I don't see any added value in this API, specifically as it is built on
> >> top of cur_mon. Just use the existing services like the migration code
> >> does. If you properly pass down the monitor reference from the command
> >> to the suspend and store what monitor you suspended, all should be fine.
> >
> > This API is like qemu_get_fd() which is not merged into upstream qemu.
> > I need this API because I cannot use monitor in qapi command.
>
> OK, then I need to comment on that approach. QMP looks flawed here.
> Either you have a need for a Monitor object (or a generic HMP/QMP
> context), then you also have a handle. Or you don't, then you do not
> need monitor suspend/resume or get_fd as well.
The getfd one is explained in the other thread, but suspend/resume should
be done from HMP only.
PS: Haven't reviewed this series yet.
* Re: [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor()
2012-02-15 13:01 ` Luiz Capitulino
@ 2012-02-16 1:35 ` Wen Congyang
0 siblings, 0 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-16 1:35 UTC (permalink / raw)
To: Luiz Capitulino
Cc: Jan Kiszka, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Eric Blake
At 02/15/2012 09:01 PM, Luiz Capitulino Wrote:
> On Wed, 15 Feb 2012 09:51:04 +0100
> Jan Kiszka <jan.kiszka@siemens.com> wrote:
>
>> On 2012-02-15 03:54, Wen Congyang wrote:
>>> At 02/15/2012 12:19 AM, Jan Kiszka Wrote:
>>>> On 2012-02-09 04:19, Wen Congyang wrote:
>>>>> Sync command needs these two APIs to suspend/resume monitor.
>>>>>
>>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>> ---
>>>>> monitor.c | 27 +++++++++++++++++++++++++++
>>>>> monitor.h | 2 ++
>>>>> 2 files changed, 29 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/monitor.c b/monitor.c
>>>>> index 11639b1..7e72739 100644
>>>>> --- a/monitor.c
>>>>> +++ b/monitor.c
>>>>> @@ -4442,6 +4442,26 @@ static void monitor_command_cb(Monitor *mon, const char *cmdline, void *opaque)
>>>>> monitor_resume(mon);
>>>>> }
>>>>>
>>>>> +int qemu_suspend_monitor(const char *fmt, ...)
>>>>> +{
>>>>> + int ret;
>>>>> +
>>>>> + if (cur_mon) {
>>>>> + ret = monitor_suspend(cur_mon);
>>>>> + } else {
>>>>> + ret = -ENOTTY;
>>>>> + }
>>>>> +
>>>>> + if (ret < 0 && fmt) {
>>>>> + va_list ap;
>>>>> + va_start(ap, fmt);
>>>>> + monitor_vprintf(cur_mon, fmt, ap);
>>>>> + va_end(ap);
>>>>> + }
>>>>> +
>>>>> + return ret;
>>>>> +}
>>>>> +
>>>>> int monitor_suspend(Monitor *mon)
>>>>> {
>>>>> if (!mon->rs)
>>>>> @@ -4450,6 +4470,13 @@ int monitor_suspend(Monitor *mon)
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +void qemu_resume_monitor(void)
>>>>> +{
>>>>> + if (cur_mon) {
>>>>> + monitor_resume(cur_mon);
>>>>> + }
>>>>> +}
>>>>> +
>>>>> void monitor_resume(Monitor *mon)
>>>>> {
>>>>> if (!mon->rs)
>>>>> diff --git a/monitor.h b/monitor.h
>>>>> index 58109af..60a1e17 100644
>>>>> --- a/monitor.h
>>>>> +++ b/monitor.h
>>>>> @@ -46,7 +46,9 @@ int monitor_cur_is_qmp(void);
>>>>> void monitor_protocol_event(MonitorEvent event, QObject *data);
>>>>> void monitor_init(CharDriverState *chr, int flags);
>>>>>
>>>>> +int qemu_suspend_monitor(const char *fmt, ...) GCC_FMT_ATTR(1, 2);
>>>>> int monitor_suspend(Monitor *mon);
>>>>> +void qemu_resume_monitor(void);
>>>>> void monitor_resume(Monitor *mon);
>>>>>
>>>>> int monitor_read_bdrv_key_start(Monitor *mon, BlockDriverState *bs,
>>>>
>>>> I don't see any added value in this API, specifically as it is built on
>>>> top of cur_mon. Just use the existing services like the migration code
>>>> does. If you properly pass down the monitor reference from the command
>>>> to the suspend and store what monitor you suspended, all should be fine.
>>>
>>> This API is like qemu_get_fd() which is not merged into upstream qemu.
>>> I need this API because I cannot use monitor in qapi command.
>>
>> OK, then I need to comment on that approach. QMP looks flawed here.
>> Either you have a need for a Monitor object (or a generic HMP/QMP
>> context), then you also have a handle. Or you don't, and then you do not
>> need monitor suspend/resume or get_fd as well.
>
> The getfd one is explained in the other thread, but suspend/resume should
> be done from HMP only.
I have read the latest migration code. I will change my code to match it and
remove these two APIs.
Please ignore this patch.
Thanks
Wen Congyang
>
> PS: Haven't reviewed this series yet.
>
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-14 17:59 ` Jan Kiszka
2012-02-15 3:44 ` Wen Congyang
@ 2012-02-17 8:52 ` Wen Congyang
2012-02-17 9:26 ` Jan Kiszka
2012-02-17 16:32 ` Eric Blake
1 sibling, 2 replies; 68+ messages in thread
From: Wen Congyang @ 2012-02-17 8:52 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 01:59 AM, Jan Kiszka Wrote:
> On 2012-02-09 04:28, Wen Congyang wrote:
>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>> ---
>> Makefile.target | 8 +-
>> dump.c | 590 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> dump.h | 3 +
>> hmp-commands.hx | 16 ++
>> hmp.c | 9 +
>> hmp.h | 1 +
>> monitor.c | 3 +
>> qapi-schema.json | 13 ++
>> qmp-commands.hx | 26 +++
>> 9 files changed, 665 insertions(+), 4 deletions(-)
>> create mode 100644 dump.c
>>
>> diff --git a/Makefile.target b/Makefile.target
>> index d6e5684..f39ce2f 100644
>> --- a/Makefile.target
>> +++ b/Makefile.target
>> @@ -112,7 +112,7 @@ $(call set-vpath, $(SRC_PATH)/linux-user:$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR
>> QEMU_CFLAGS+=-I$(SRC_PATH)/linux-user/$(TARGET_ABI_DIR) -I$(SRC_PATH)/linux-user
>> obj-y = main.o syscall.o strace.o mmap.o signal.o thunk.o \
>> elfload.o linuxload.o uaccess.o gdbstub.o cpu-uname.o \
>> - user-exec.o $(oslib-obj-y)
>> + user-exec.o $(oslib-obj-y) dump.o
>>
>> obj-$(TARGET_HAS_BFLT) += flatload.o
>>
>> @@ -150,7 +150,7 @@ LDFLAGS+=-Wl,-segaddr,__STD_PROG_ZONE,0x1000 -image_base 0x0e000000
>> LIBS+=-lmx
>>
>> obj-y = main.o commpage.o machload.o mmap.o signal.o syscall.o thunk.o \
>> - gdbstub.o user-exec.o
>> + gdbstub.o user-exec.o dump.o
>>
>> obj-i386-y += ioport-user.o
>>
>> @@ -172,7 +172,7 @@ $(call set-vpath, $(SRC_PATH)/bsd-user)
>> QEMU_CFLAGS+=-I$(SRC_PATH)/bsd-user -I$(SRC_PATH)/bsd-user/$(TARGET_ARCH)
>>
>> obj-y = main.o bsdload.o elfload.o mmap.o signal.o strace.o syscall.o \
>> - gdbstub.o uaccess.o user-exec.o
>> + gdbstub.o uaccess.o user-exec.o dump.o
>>
>> obj-i386-y += ioport-user.o
>>
>> @@ -188,7 +188,7 @@ endif #CONFIG_BSD_USER
>> # System emulator target
>> ifdef CONFIG_SOFTMMU
>>
>> -obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o
>> +obj-y = arch_init.o cpus.o monitor.o machine.o gdbstub.o balloon.o ioport.o dump.o
>> # virtio has to be here due to weird dependency between PCI and virtio-net.
>> # need to fix this properly
>> obj-$(CONFIG_NO_PCI) += pci-stub.o
>> diff --git a/dump.c b/dump.c
>> new file mode 100644
>> index 0000000..a0e8b86
>> --- /dev/null
>> +++ b/dump.c
>> @@ -0,0 +1,590 @@
>> +/*
>> + * QEMU dump
>> + *
>> + * Copyright Fujitsu, Corp. 2011
>> + *
>> + * Authors:
>> + * Wen Congyang <wency@cn.fujitsu.com>
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2. See
>> + * the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include <unistd.h>
>> +#include <elf.h>
>> +#include <sys/procfs.h>
>> +#include <glib.h>
>> +#include "cpu.h"
>> +#include "cpu-all.h"
>> +#include "targphys.h"
>> +#include "monitor.h"
>> +#include "kvm.h"
>> +#include "dump.h"
>> +#include "sysemu.h"
>> +#include "bswap.h"
>> +#include "memory_mapping.h"
>> +#include "error.h"
>> +#include "qmp-commands.h"
>> +
>> +#define CPU_CONVERT_TO_TARGET16(val) \
>> +({ \
>> + uint16_t _val = (val); \
>> + if (endian == ELFDATA2LSB) { \
>> + _val = cpu_to_le16(_val); \
>> + } else {\
>> + _val = cpu_to_be16(_val); \
>> + } \
>> + _val; \
>> +})
>> +
>> +#define CPU_CONVERT_TO_TARGET32(val) \
>> +({ \
>> + uint32_t _val = (val); \
>> + if (endian == ELFDATA2LSB) { \
>> + _val = cpu_to_le32(_val); \
>> + } else {\
>> + _val = cpu_to_be32(_val); \
>> + } \
>> + _val; \
>> +})
>> +
>> +#define CPU_CONVERT_TO_TARGET64(val) \
>> +({ \
>> + uint64_t _val = (val); \
>> + if (endian == ELFDATA2LSB) { \
>> + _val = cpu_to_le64(_val); \
>> + } else {\
>> + _val = cpu_to_be64(_val); \
>> + } \
>> + _val; \
>> +})
>
> static inline functions, please.
>
>> +
>> +enum {
>> + DUMP_STATE_ERROR,
>> + DUMP_STATE_SETUP,
>> + DUMP_STATE_CANCELLED,
>> + DUMP_STATE_ACTIVE,
>> + DUMP_STATE_COMPLETED,
>> +};
>> +
>> +typedef struct DumpState {
>> + ArchDumpInfo dump_info;
>> + MemoryMappingList list;
>> + int phdr_num;
>> + int state;
>> + char *error;
>> + int fd;
>> + target_phys_addr_t memory_offset;
>> +} DumpState;
>> +
>> +static DumpState *dump_get_current(void)
>> +{
>> + static DumpState current_dump = {
>> + .state = DUMP_STATE_SETUP,
>> + };
>> +
>> + return &current_dump;
>> +}
>> +
>> +static int dump_cleanup(DumpState *s)
>> +{
>> + int ret = 0;
>> +
>> + free_memory_mapping_list(&s->list);
>> + if (s->fd != -1) {
>> + close(s->fd);
>> + s->fd = -1;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static void dump_error(DumpState *s, const char *reason)
>> +{
>> + s->state = DUMP_STATE_ERROR;
>> + s->error = g_strdup(reason);
>> + dump_cleanup(s);
>> +}
>> +
>> +static inline int cpuid(CPUState *env)
>> +{
>> +#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USE_NPTL)
>> + return env->host_tid;
>
> Curious: Does this command already work with user mode guest?
>
>> +#else
>> + return env->cpu_index + 1;
>> +#endif
>> +}
>
> There is gdb_id in gdbstub. It should be made generally available and
> reused here.
>
>> +
>> +static int write_elf64_header(DumpState *s)
>> +{
>> + Elf64_Ehdr elf_header;
>> + int ret;
>> + int endian = s->dump_info.d_endian;
>> +
>> + memset(&elf_header, 0, sizeof(Elf64_Ehdr));
>> + memcpy(&elf_header, ELFMAG, 4);
>> + elf_header.e_ident[EI_CLASS] = ELFCLASS64;
>> + elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
>> + elf_header.e_ident[EI_VERSION] = EV_CURRENT;
>> + elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
>> + elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
>> + elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
>> + elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
>> + elf_header.e_phoff = CPU_CONVERT_TO_TARGET64(sizeof(Elf64_Ehdr));
>> + elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf64_Phdr));
>> + elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
>> +
>> + lseek(s->fd, 0, SEEK_SET);
>> + ret = write(s->fd, &elf_header, sizeof(elf_header));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write elf header.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf32_header(DumpState *s)
>> +{
>> + Elf32_Ehdr elf_header;
>> + int ret;
>> + int endian = s->dump_info.d_endian;
>> +
>> + memset(&elf_header, 0, sizeof(Elf32_Ehdr));
>> + memcpy(&elf_header, ELFMAG, 4);
>> + elf_header.e_ident[EI_CLASS] = ELFCLASS32;
>> + elf_header.e_ident[EI_DATA] = endian;
>> + elf_header.e_ident[EI_VERSION] = EV_CURRENT;
>> + elf_header.e_type = CPU_CONVERT_TO_TARGET16(ET_CORE);
>> + elf_header.e_machine = CPU_CONVERT_TO_TARGET16(s->dump_info.d_machine);
>> + elf_header.e_version = CPU_CONVERT_TO_TARGET32(EV_CURRENT);
>> + elf_header.e_ehsize = CPU_CONVERT_TO_TARGET16(sizeof(elf_header));
>> + elf_header.e_phoff = CPU_CONVERT_TO_TARGET32(sizeof(Elf32_Ehdr));
>> + elf_header.e_phentsize = CPU_CONVERT_TO_TARGET16(sizeof(Elf32_Phdr));
>> + elf_header.e_phnum = CPU_CONVERT_TO_TARGET16(s->phdr_num);
>> +
>> + lseek(s->fd, 0, SEEK_SET);
>> + ret = write(s->fd, &elf_header, sizeof(elf_header));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write elf header.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf64_load(DumpState *s, MemoryMapping *memory_mapping,
>> + int phdr_index, target_phys_addr_t offset)
>> +{
>> + Elf64_Phdr phdr;
>> + off_t phdr_offset;
>> + int ret;
>> + int endian = s->dump_info.d_endian;
>> +
>> + memset(&phdr, 0, sizeof(Elf64_Phdr));
>> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
>> + phdr.p_offset = CPU_CONVERT_TO_TARGET64(offset);
>> + phdr.p_paddr = CPU_CONVERT_TO_TARGET64(memory_mapping->phys_addr);
>> + if (offset == -1) {
>> + phdr.p_filesz = 0;
>> + } else {
>> + phdr.p_filesz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
>> + }
>> + phdr.p_memsz = CPU_CONVERT_TO_TARGET64(memory_mapping->length);
>> + phdr.p_vaddr = CPU_CONVERT_TO_TARGET64(memory_mapping->virt_addr);
>> +
>> + phdr_offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*phdr_index;
>> + lseek(s->fd, phdr_offset, SEEK_SET);
>> + ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write program header table.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf32_load(DumpState *s, MemoryMapping *memory_mapping,
>> + int phdr_index, target_phys_addr_t offset)
>> +{
>> + Elf32_Phdr phdr;
>> + off_t phdr_offset;
>> + int ret;
>> + int endian = s->dump_info.d_endian;
>> +
>> + memset(&phdr, 0, sizeof(Elf32_Phdr));
>> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_LOAD);
>> + phdr.p_offset = CPU_CONVERT_TO_TARGET32(offset);
>> + phdr.p_paddr = CPU_CONVERT_TO_TARGET32(memory_mapping->phys_addr);
>> + if (offset == -1) {
>> + phdr.p_filesz = 0;
>> + } else {
>> + phdr.p_filesz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
>> + }
>> + phdr.p_memsz = CPU_CONVERT_TO_TARGET32(memory_mapping->length);
>> + phdr.p_vaddr = CPU_CONVERT_TO_TARGET32(memory_mapping->virt_addr);
>> +
>> + phdr_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*phdr_index;
>> + lseek(s->fd, phdr_offset, SEEK_SET);
>> + ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write program header table.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf64_notes(DumpState *s, int phdr_index,
>> + target_phys_addr_t *offset)
>> +{
>> + CPUState *env;
>> + int ret;
>> + target_phys_addr_t begin = *offset;
>> + Elf64_Phdr phdr;
>> + off_t phdr_offset;
>> + int id;
>> + int endian = s->dump_info.d_endian;
>> +
>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> + id = cpuid(env);
>> + ret = cpu_write_elf64_note(s->fd, env, id, offset);
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write elf notes.\n");
>> + return -1;
>> + }
>> + }
>> +
>> + memset(&phdr, 0, sizeof(Elf64_Phdr));
>> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
>> + phdr.p_offset = CPU_CONVERT_TO_TARGET64(begin);
>> + phdr.p_paddr = 0;
>> + phdr.p_filesz = CPU_CONVERT_TO_TARGET64(*offset - begin);
>> + phdr.p_memsz = CPU_CONVERT_TO_TARGET64(*offset - begin);
>> + phdr.p_vaddr = 0;
>> +
>> + phdr_offset = sizeof(Elf64_Ehdr);
>> + lseek(s->fd, phdr_offset, SEEK_SET);
>> + ret = write(s->fd, &phdr, sizeof(Elf64_Phdr));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write program header table.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_elf32_notes(DumpState *s, int phdr_index,
>> + target_phys_addr_t *offset)
>> +{
>> + CPUState *env;
>> + int ret;
>> + target_phys_addr_t begin = *offset;
>> + Elf32_Phdr phdr;
>> + off_t phdr_offset;
>> + int id;
>> + int endian = s->dump_info.d_endian;
>> +
>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> + id = cpuid(env);
>> + ret = cpu_write_elf32_note(s->fd, env, id, offset);
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write elf notes.\n");
>> + return -1;
>> + }
>> + }
>> +
>> + memset(&phdr, 0, sizeof(Elf32_Phdr));
>> + phdr.p_type = CPU_CONVERT_TO_TARGET32(PT_NOTE);
>> + phdr.p_offset = CPU_CONVERT_TO_TARGET32(begin);
>> + phdr.p_paddr = 0;
>> + phdr.p_filesz = CPU_CONVERT_TO_TARGET32(*offset - begin);
>> + phdr.p_memsz = CPU_CONVERT_TO_TARGET32(*offset - begin);
>> + phdr.p_vaddr = 0;
>> +
>> + phdr_offset = sizeof(Elf32_Ehdr);
>> + lseek(s->fd, phdr_offset, SEEK_SET);
>> + ret = write(s->fd, &phdr, sizeof(Elf32_Phdr));
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to write program header table.\n");
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int write_data(DumpState *s, void *buf, int length,
>> + target_phys_addr_t *offset)
>> +{
>> + int ret;
>> +
>> + lseek(s->fd, *offset, SEEK_SET);
>> + ret = write(s->fd, buf, length);
>> + if (ret < 0) {
>> + dump_error(s, "dump: failed to save memory.\n");
>> + return -1;
>> + }
>> +
>> + *offset += length;
>> + return 0;
>> +}
>> +
>> +/* write the memory to vmcore. 1 page per I/O. */
>> +static int write_memory(DumpState *s, RAMBlock *block,
>> + target_phys_addr_t *offset)
>> +{
>> + int i, ret;
>> +
>> + for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
>> + ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
>> + TARGET_PAGE_SIZE, offset);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> + }
>> +
>> + if ((block->length % TARGET_PAGE_SIZE) != 0) {
>> + ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
>> + block->length % TARGET_PAGE_SIZE, offset);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/* get the memory's offset in the vmcore */
>> +static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
>> + target_phys_addr_t memory_offset)
>> +{
>> + RAMBlock *block;
>> + target_phys_addr_t offset = memory_offset;
>> +
>> + QLIST_FOREACH(block, &ram_list.blocks, next) {
>> + if (phys_addr >= block->offset &&
>> + phys_addr < block->offset + block->length) {
>> + return phys_addr - block->offset + offset;
>> + }
>> + offset += block->length;
>> + }
>> +
>> + return -1;
>> +}
>> +
>> +static DumpState *dump_init(int fd, Error **errp)
>> +{
>> + CPUState *env;
>> + DumpState *s = dump_get_current();
>> + int ret;
>> +
>> + vm_stop(RUN_STATE_PAUSED);
>
> I would save the current vm state first and restore it when finished.
There is no API to get the current vm state. If you want this feature, I will
add an API to get it.
Thanks
Wen Congyang
>
>> + s->state = DUMP_STATE_SETUP;
>> + if (s->error) {
>> + g_free(s->error);
>> + s->error = NULL;
>> + }
>> + s->fd = fd;
>> +
>> + /*
>> + * get dump info: endian, class and architecture.
>> + * If the target architecture is not supported, cpu_get_dump_info() will
>> + * return -1.
>> + *
>> + * if we use kvm, we should synchronize the register before we get dump
>> + * info.
>> + */
>> + for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> + cpu_synchronize_state(env);
>> + }
>> + ret = cpu_get_dump_info(&s->dump_info);
>> + if (ret < 0) {
>> + error_set(errp, QERR_UNSUPPORTED);
>> + return NULL;
>> + }
>> +
>> + /* get memory mapping */
>> + s->list.num = 0;
>> + QTAILQ_INIT(&s->list.head);
>> + get_memory_mapping(&s->list);
>> +
>> + /* crash needs extra memory mapping to determine phys_base. */
>> + ret = cpu_add_extra_memory_mapping(&s->list);
>> + if (ret < 0) {
>> + error_set(errp, QERR_UNDEFINED_ERROR);
>> + return NULL;
>> + }
>> +
>> + /*
>> + * calculate phdr_num
>> + *
>> + * the type of phdr->num is uint16_t, so we should avoid overflow
>> + */
>> + s->phdr_num = 1; /* PT_NOTE */
>> + if (s->list.num > (1 << 16) - 2) {
>> + s->phdr_num = (1 << 16) - 1;
>> + } else {
>> + s->phdr_num += s->list.num;
>> + }
>> +
>> + return s;
>> +}
>> +
>> +/* write elf header, PT_NOTE and elf note to vmcore. */
>> +static int dump_begin(DumpState *s)
>> +{
>> + target_phys_addr_t offset;
>> + int ret;
>> +
>> + s->state = DUMP_STATE_ACTIVE;
>> +
>> + /*
>> + * the vmcore's format is:
>> + * --------------
>> + * | elf header |
>> + * --------------
>> + * | PT_NOTE |
>> + * --------------
>> + * | PT_LOAD |
>> + * --------------
>> + * | ...... |
>> + * --------------
>> + * | PT_LOAD |
>> + * --------------
>> + * | elf note |
>> + * --------------
>> + * | memory |
>> + * --------------
>> + *
>> + * we only know where the memory is saved after we write elf note into
>> + * vmcore.
>> + */
>> +
>> + /* write elf header to vmcore */
>> + if (s->dump_info.d_class == ELFCLASS64) {
>> + ret = write_elf64_header(s);
>> + } else {
>> + ret = write_elf32_header(s);
>> + }
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + /* write elf notes to vmcore */
>> + if (s->dump_info.d_class == ELFCLASS64) {
>> + offset = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr)*s->phdr_num;
>> + ret = write_elf64_notes(s, 0, &offset);
>> + } else {
>> + offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr)*s->phdr_num;
>> + ret = write_elf32_notes(s, 0, &offset);
>> + }
>> +
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + s->memory_offset = offset;
>> + return 0;
>> +}
>> +
>> +/* write PT_LOAD to vmcore */
>> +static int dump_completed(DumpState *s)
>> +{
>> + target_phys_addr_t offset;
>> + MemoryMapping *memory_mapping;
>> + int phdr_index = 1, ret;
>> +
>> + QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
>> + offset = get_offset(memory_mapping->phys_addr, s->memory_offset);
>> + if (s->dump_info.d_class == ELFCLASS64) {
>> + ret = write_elf64_load(s, memory_mapping, phdr_index++, offset);
>> + } else {
>> + ret = write_elf32_load(s, memory_mapping, phdr_index++, offset);
>> + }
>> + if (ret < 0) {
>> + return -1;
>> + }
>> + }
>> +
>> + s->state = DUMP_STATE_COMPLETED;
>> + dump_cleanup(s);
>> + return 0;
>> +}
>> +
>> +/* write all memory to vmcore */
>> +static int dump_iterate(DumpState *s)
>> +{
>> + RAMBlock *block;
>> + target_phys_addr_t offset = s->memory_offset;
>> + int ret;
>> +
>> + /* write all memory to vmcore */
>> + QLIST_FOREACH(block, &ram_list.blocks, next) {
>> + ret = write_memory(s, block, &offset);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> + }
>> +
>> + return dump_completed(s);
>> +}
>> +
>> +static int create_vmcore(DumpState *s)
>> +{
>> + int ret;
>> +
>> + ret = dump_begin(s);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + ret = dump_iterate(s);
>> + if (ret < 0) {
>> + return -1;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +void qmp_dump(const char *file, Error **errp)
>> +{
>> + const char *p;
>> + int fd = -1;
>> + DumpState *s;
>> +
>> +#if !defined(WIN32)
>> + if (strstart(file, "fd:", &p)) {
>> + fd = qemu_get_fd(p);
>> + if (fd == -1) {
>> + error_set(errp, QERR_FD_NOT_FOUND, p);
>> + return;
>> + }
>> + }
>> +#endif
>> +
>> + if (strstart(file, "file:", &p)) {
>> + fd = open(p, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, S_IRUSR);
>> + if (fd < 0) {
>> + error_set(errp, QERR_OPEN_FILE_FAILED, p);
>> + return;
>> + }
>> + }
>> +
>> + if (fd == -1) {
>> + error_set(errp, QERR_INVALID_PARAMETER, "file");
>> + return;
>> + }
>> +
>> + s = dump_init(fd, errp);
>> + if (!s) {
>> + return;
>> + }
>> +
>> + if (create_vmcore(s) < 0) {
>> + error_set(errp, QERR_IO_ERROR);
>> + }
>> +
>> + return;
>> +}
>> diff --git a/dump.h b/dump.h
>> index a36468b..b413d18 100644
>> --- a/dump.h
>> +++ b/dump.h
>> @@ -1,6 +1,9 @@
>> #ifndef DUMP_H
>> #define DUMP_H
>>
>> +#include "qdict.h"
>> +#include "error.h"
>> +
>
> This looks stray. Nothing is added to this header which require those
> includes.
>
>> typedef struct ArchDumpInfo {
>> int d_machine; /* Architecture */
>> int d_endian; /* ELFDATA2LSB or ELFDATA2MSB */
>> diff --git a/hmp-commands.hx b/hmp-commands.hx
>> index 573b823..6cfb678 100644
>> --- a/hmp-commands.hx
>> +++ b/hmp-commands.hx
>> @@ -867,6 +867,22 @@ new parameters (if specified) once the vm migration finished successfully.
>> ETEXI
>>
>> {
>> + .name = "dump",
>> + .args_type = "file:s",
>> + .params = "file",
>> + .help = "dump to file",
>> + .user_print = monitor_user_noop,
>> + .mhandler.cmd = hmp_dump,
>> + },
>> +
>> +
>> +STEXI
>> +@item dump @var{file}
>> +@findex dump
>> +Dump to @var{file}.
>
> That's way too brief! :) It should state the format, mention potential
> architecture limitations, and explain that the output can be processed
> with crash or gdb.
>
>> +ETEXI
>> +
>> + {
>> .name = "snapshot_blkdev",
>> .args_type = "device:B,snapshot-file:s?,format:s?",
>> .params = "device [new-image-file] [format]",
>> diff --git a/hmp.c b/hmp.c
>> index 8ff8c94..1a69857 100644
>> --- a/hmp.c
>> +++ b/hmp.c
>> @@ -851,3 +851,12 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
>>
>> hmp_handle_error(mon, &error);
>> }
>> +
>> +void hmp_dump(Monitor *mon, const QDict *qdict)
>> +{
>> + Error *errp = NULL;
>> + const char *file = qdict_get_str(qdict, "file");
>> +
>> + qmp_dump(file, &errp);
>> + hmp_handle_error(mon, &errp);
>> +}
>> diff --git a/hmp.h b/hmp.h
>> index 18eecbd..66984c5 100644
>> --- a/hmp.h
>> +++ b/hmp.h
>> @@ -58,5 +58,6 @@ void hmp_block_set_io_throttle(Monitor *mon, const QDict *qdict);
>> void hmp_block_stream(Monitor *mon, const QDict *qdict);
>> void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
>> void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
>> +void hmp_dump(Monitor *mon, const QDict *qdict);
>>
>> #endif
>> diff --git a/monitor.c b/monitor.c
>> index 7e72739..18e1ac7 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -73,6 +73,9 @@
>> #endif
>> #include "hw/lm32_pic.h"
>>
>> +/* for dump */
>> +#include "dump.h"
>> +
>> //#define DEBUG
>> //#define DEBUG_COMPLETION
>>
>> diff --git a/qapi-schema.json b/qapi-schema.json
>> index d02ee86..1013ae6 100644
>> --- a/qapi-schema.json
>> +++ b/qapi-schema.json
>> @@ -1582,3 +1582,16 @@
>> { 'command': 'qom-list-types',
>> 'data': { '*implements': 'str', '*abstract': 'bool' },
>> 'returns': [ 'ObjectTypeInfo' ] }
>> +
>> +##
>> +# @dump
>> +#
>> +# Dump guest's memory to vmcore.
>> +#
>> +# @file: the filename or file descriptor of the vmcore.
>> +#
>> +# Returns: nothing on success
>> +#
>> +# Since: 1.1
>> +##
>> +{ 'command': 'dump', 'data': { 'file': 'str' } }
>> diff --git a/qmp-commands.hx b/qmp-commands.hx
>> index b5e2ab8..52d3d3b 100644
>> --- a/qmp-commands.hx
>> +++ b/qmp-commands.hx
>> @@ -566,6 +566,32 @@ Example:
>> EQMP
>>
>> {
>> + .name = "dump",
>> + .args_type = "file:s",
>> + .params = "file",
>> + .help = "dump to file",
>> + .user_print = monitor_user_noop,
>> + .mhandler.cmd_new = qmp_marshal_input_dump,
>> + },
>> +
>> +SQMP
>> +dump
>> +
>> +
>> +Dump to file.
>> +
>> +Arguments:
>> +
>> +- "file": Destination file (json-string)
>
> The code looks like it supports both file names and file descriptors,
> no? Same for HMP.
>
>> +
>> +Example:
>> +
>> +-> { "execute": "dump", "arguments": { "file": "fd:dump" } }
>> +<- { "return": {} }
>> +
>> +EQMP
>> +
>> + {
>> .name = "netdev_add",
>> .args_type = "netdev:O",
>> .params = "[user|tap|socket],id=str[,prop=value][,...]",
>> --
>> 1.7.1
>>
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-17 8:52 ` Wen Congyang
@ 2012-02-17 9:26 ` Jan Kiszka
2012-02-17 9:35 ` Wen Congyang
2012-02-17 16:32 ` Eric Blake
1 sibling, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-17 9:26 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-17 09:52, Wen Congyang wrote:
>>> +static DumpState *dump_init(int fd, Error **errp)
>>> +{
>>> + CPUState *env;
>>> + DumpState *s = dump_get_current();
>>> + int ret;
>>> +
>>> + vm_stop(RUN_STATE_PAUSED);
>>
>> I would save the current vm state first and restore it when finished.
>
> There is no API to get current vm state. If you want this feature, I will
> add API to get it.
You are looking for runstate_is_running().
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
2012-02-15 10:21 ` Jan Kiszka
@ 2012-02-17 9:32 ` Wen Congyang
2012-02-17 11:34 ` HATAYAMA Daisuke
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-17 9:32 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/15/2012 06:21 PM, Jan Kiszka Wrote:
> On 2012-02-15 10:44, Wen Congyang wrote:
>> At 02/15/2012 05:21 PM, Jan Kiszka Wrote:
>>> On 2012-02-15 06:19, Wen Congyang wrote:
>>>> At 02/15/2012 01:35 AM, Jan Kiszka Wrote:
>>>>> On 2012-02-09 04:24, Wen Congyang wrote:
>>>>>> Crash needs extra memory mapping to determine phys_base.
>>>>>>
>>>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>>> ---
>>>>>> cpu-all.h | 2 ++
>>>>>> target-i386/arch-dump.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>>>> 2 files changed, 45 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/cpu-all.h b/cpu-all.h
>>>>>> index efb5ba3..290c43a 100644
>>>>>> --- a/cpu-all.h
>>>>>> +++ b/cpu-all.h
>>>>>> @@ -530,10 +530,12 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>>>>>> target_phys_addr_t *offset);
>>>>>> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>>>>> target_phys_addr_t *offset);
>>>>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list);
>>>>>> #else
>>>>>> #define cpu_get_memory_mapping(list, env)
>>>>>> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
>>>>>> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
>>>>>> +#define cpu_add_extra_memory_mapping(list) ({ 0; })
>>>>>> #endif
>>>>>>
>>>>>> #endif /* CPU_ALL_H */
>>>>>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>>>>>> index 4c0ff77..d96f6ae 100644
>>>>>> --- a/target-i386/arch-dump.c
>>>>>> +++ b/target-i386/arch-dump.c
>>>>>> @@ -495,3 +495,46 @@ int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>>>>> {
>>>>>> return x86_write_elf32_note(fd, env, cpuid, offset);
>>>>>> }
>>>>>> +
>>>>>> +/* This function is copied from crash */
>>>>>
>>>>> And what does it do there and here? I suppose it is Linux-specific - any
>>>>> version? This should be documented and encoded in the function name.
>>>>>
>>>>>> +static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
>>>>>> +{
>>>>>> + int i;
>>>>>> + target_ulong kernel_base = -1;
>>>>>> + target_ulong last, mask;
>>>>>> +
>>>>>> + for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
>>>>>> + mask = ~((1LL << i) - 1);
>>>>>> + *base_vaddr = env->idt.base & mask;
>>>>>> + if (*base_vaddr == last) {
>>>>>> + continue;
>>>>>> + }
>>>>>> +
>>>>>> + kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
>>>>>> + last = *base_vaddr;
>>>>>> + }
>>>>>> +
>>>>>> + return kernel_base;
>>>>>> +}
>>>>>> +
>>>>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list)
>>>>>
>>>>> Again, what does "extra" mean? Probably guest-specific, no?
>>>>
>>>> crash will calculate the phys_base according to the virtual address and physical
>>>> address stored in the PT_LOAD.
>>>
>>> Crash is a Linux-only tool, dump must not be restricted to that guest -
>>> but could contain transparent extensions of the file format if needed.
>>>
>>>>
>>>> If the vmcore is generated by 'virsh dump' (which uses migration to implement dumping),
>>>> crash calculates the phys_base according to idt.base. The function get_phys_base_addr()
>>>> uses the same method to calculate the phys_base.
>>>
>>> Hmm, where are those special registers (idt, gdt, tr etc.) stored in the
>>> vmcore file, BTW?
>>
>> 'virsh dump' uses migration to implement dumping now. So the vmcore has all
>> registers.
>
> This is about the new format. And there we are lacking those special
Yes, this file can be processed with crash, but gdb cannot process such a file.
> registers. At some point, gdb will understand and need them to do proper
> system-level debugging. I don't know the format structure here: can we
> add sections to the core file in a way that consumers that don't know
> them simply ignore them?
I have not found such a section yet. If there is one, I think it would be
better to store all of the cpus' registers in the core file.
I am trying to make the core file processable by both crash and gdb, but crash
still does not work well sometimes.
I think we can add an option to let the user choose whether to store the memory
mapping in the core file, because crash does not need such a mapping. If
the p_vaddr in all PT_LOAD segments is 0, crash knows the file was generated
by qemu dump and uses another way to calculate phys_base.
If you agree with it, please ignore this patch.
Thanks
Wen Congyang
>
> Jan
>
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-17 9:35 ` Wen Congyang
@ 2012-02-17 9:35 ` Jan Kiszka
0 siblings, 0 replies; 68+ messages in thread
From: Jan Kiszka @ 2012-02-17 9:35 UTC (permalink / raw)
To: Wen Congyang
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 2012-02-17 10:35, Wen Congyang wrote:
> At 02/17/2012 05:26 PM, Jan Kiszka Wrote:
>> On 2012-02-17 09:52, Wen Congyang wrote:
>>>>> +static DumpState *dump_init(int fd, Error **errp)
>>>>> +{
>>>>> + CPUState *env;
>>>>> + DumpState *s = dump_get_current();
>>>>> + int ret;
>>>>> +
>>>>> + vm_stop(RUN_STATE_PAUSED);
>>>>
>>>> I would save the current vm state first and restore it when finished.
>>>
>>> There is no API to get current vm state. If you want this feature, I will
>>> add API to get it.
>>
>> You are looking for runstate_is_running().
>
> Yes. vm_stop() stops the vcpus only when runstate_is_running(). So I think
> you need to resume all vcpus after dumping is finished.
Yes, but _only_ if runstate_is_running() was true before calling
vm_stop. That is my point.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-17 9:26 ` Jan Kiszka
@ 2012-02-17 9:35 ` Wen Congyang
2012-02-17 9:35 ` Jan Kiszka
0 siblings, 1 reply; 68+ messages in thread
From: Wen Congyang @ 2012-02-17 9:35 UTC (permalink / raw)
To: Jan Kiszka
Cc: Eric Blake, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
At 02/17/2012 05:26 PM, Jan Kiszka Wrote:
> On 2012-02-17 09:52, Wen Congyang wrote:
>>>> +static DumpState *dump_init(int fd, Error **errp)
>>>> +{
>>>> + CPUState *env;
>>>> + DumpState *s = dump_get_current();
>>>> + int ret;
>>>> +
>>>> + vm_stop(RUN_STATE_PAUSED);
>>>
>>> I would save the current vm state first and restore it when finished.
>>
>> There is no API to get the current vm state. If you want this feature, I will
>> add an API to get it.
>
> You are looking for runstate_is_running().
Yes. vm_stop() stops the vcpus only when runstate_is_running(). So I think
you need to resume all vcpus after dumping is finished.
Thanks
Wen Congyang
>
> Jan
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
2012-02-17 9:32 ` Wen Congyang
@ 2012-02-17 11:34 ` HATAYAMA Daisuke
0 siblings, 0 replies; 68+ messages in thread
From: HATAYAMA Daisuke @ 2012-02-17 11:34 UTC (permalink / raw)
To: wency; +Cc: jan.kiszka, anderson, qemu-devel, eblake, lcapitulino
From: Wen Congyang <wency@cn.fujitsu.com>
Subject: Re: [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping
Date: Fri, 17 Feb 2012 17:32:56 +0800
> At 02/15/2012 06:21 PM, Jan Kiszka Wrote:
>> On 2012-02-15 10:44, Wen Congyang wrote:
>>> At 02/15/2012 05:21 PM, Jan Kiszka Wrote:
>>>> On 2012-02-15 06:19, Wen Congyang wrote:
>>>>> At 02/15/2012 01:35 AM, Jan Kiszka Wrote:
>>>>>> On 2012-02-09 04:24, Wen Congyang wrote:
>>>>>>> Crash needs extra memory mapping to determine phys_base.
>>>>>>>
>>>>>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
>>>>>>> ---
>>>>>>> cpu-all.h | 2 ++
>>>>>>> target-i386/arch-dump.c | 43 +++++++++++++++++++++++++++++++++++++++++++
>>>>>>> 2 files changed, 45 insertions(+), 0 deletions(-)
>>>>>>>
>>>>>>> diff --git a/cpu-all.h b/cpu-all.h
>>>>>>> index efb5ba3..290c43a 100644
>>>>>>> --- a/cpu-all.h
>>>>>>> +++ b/cpu-all.h
>>>>>>> @@ -530,10 +530,12 @@ int cpu_write_elf64_note(int fd, CPUState *env, int cpuid,
>>>>>>> target_phys_addr_t *offset);
>>>>>>> int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>>>>>> target_phys_addr_t *offset);
>>>>>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list);
>>>>>>> #else
>>>>>>> #define cpu_get_memory_mapping(list, env)
>>>>>>> #define cpu_write_elf64_note(fd, env, cpuid, offset) ({ -1; })
>>>>>>> #define cpu_write_elf32_note(fd, env, cpuid, offset) ({ -1; })
>>>>>>> +#define cpu_add_extra_memory_mapping(list) ({ 0; })
>>>>>>> #endif
>>>>>>>
>>>>>>> #endif /* CPU_ALL_H */
>>>>>>> diff --git a/target-i386/arch-dump.c b/target-i386/arch-dump.c
>>>>>>> index 4c0ff77..d96f6ae 100644
>>>>>>> --- a/target-i386/arch-dump.c
>>>>>>> +++ b/target-i386/arch-dump.c
>>>>>>> @@ -495,3 +495,46 @@ int cpu_write_elf32_note(int fd, CPUState *env, int cpuid,
>>>>>>> {
>>>>>>> return x86_write_elf32_note(fd, env, cpuid, offset);
>>>>>>> }
>>>>>>> +
>>>>>>> +/* This function is copied from crash */
>>>>>>
>>>>>> And what does it do there and here? I suppose it is Linux-specific - any
>>>>>> version? This should be documented and encoded in the function name.
>>>>>>
>>>>>>> +static target_ulong get_phys_base_addr(CPUState *env, target_ulong *base_vaddr)
>>>>>>> +{
>>>>>>> + int i;
>>>>>>> + target_ulong kernel_base = -1;
>>>>>>> + target_ulong last, mask;
>>>>>>> +
>>>>>>> + for (i = 30, last = -1; (kernel_base == -1) && (i >= 20); i--) {
>>>>>>> + mask = ~((1LL << i) - 1);
>>>>>>> + *base_vaddr = env->idt.base & mask;
>>>>>>> + if (*base_vaddr == last) {
>>>>>>> + continue;
>>>>>>> + }
>>>>>>> +
>>>>>>> + kernel_base = cpu_get_phys_page_debug(env, *base_vaddr);
>>>>>>> + last = *base_vaddr;
>>>>>>> + }
>>>>>>> +
>>>>>>> + return kernel_base;
>>>>>>> +}
>>>>>>> +
>>>>>>> +int cpu_add_extra_memory_mapping(MemoryMappingList *list)
>>>>>>
>>>>>> Again, what does "extra" mean? Probably guest-specific, no?
>>>>>
>>>>> crash will calculate phys_base from the virtual and physical
>>>>> addresses stored in the PT_LOAD segments.
>>>>
>>>> Crash is a Linux-only tool; dump must not be restricted to that kind of guest,
>>>> but could contain transparent extensions of the file format if needed.
>>>>
>>>>>
>>>>> If the vmcore is generated by 'virsh dump' (which uses migration to implement
>>>>> dumping), crash calculates phys_base from idt.base. The function
>>>>> get_phys_base_addr() calculates phys_base in the same way.
>>>>
>>>> Hmm, where are those special registers (idt, gdt, tr etc.) stored in the
>>>> vmcore file, BTW?
>>>
>>> 'virsh dump' uses migration to implement dumping now. So the vmcore has all
>>> registers.
>>
>> This is about the new format. And there we are lacking those special
>
> Yes, this file can be processed with crash. gdb cannot process such a file.
>
>> registers. At some point, gdb will understand and need them to do proper
>> system-level debugging. I don't know the format structure here: can we
>> add sections to the core file in a way that consumers that don't know
>> them simply ignore them?
>
> I have not found such a section yet. If there is one, I think it would be
> better to store all CPUs' registers in the core file.
>
> I am trying to make the core file processable by both crash and gdb, but crash
> still does not work well sometimes.
>
> I think we can add an option to let the user choose whether to store the memory
> mapping in the core file, because crash does not need such a mapping. If
> the p_vaddr in all PT_LOAD segments is 0, crash knows the file was generated
> by qemu dump and uses another way to calculate phys_base.
>
If you store CPU registers in the core file, it is better to check whether
that information is actually contained in the core file.
Thanks.
HATAYAMA, Daisuke
> If you agree with it, please ignore this patch.
>
> Thanks
> Wen Congyang
>
>>
>> Jan
>>
>
>
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-17 8:52 ` Wen Congyang
2012-02-17 9:26 ` Jan Kiszka
@ 2012-02-17 16:32 ` Eric Blake
2012-02-17 16:51 ` Jan Kiszka
1 sibling, 1 reply; 68+ messages in thread
From: Eric Blake @ 2012-02-17 16:32 UTC (permalink / raw)
To: Wen Congyang
Cc: Jan Kiszka, HATAYAMA Daisuke, Dave Anderson, qemu-devel,
Luiz Capitulino
On 02/17/2012 01:52 AM, Wen Congyang wrote:
> At 02/15/2012 01:59 AM, Jan Kiszka Wrote:
>> On 2012-02-09 04:28, Wen Congyang wrote:
>>> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
<snip several kilobytes>
>>> +static DumpState *dump_init(int fd, Error **errp)
>>> +{
>>> + CPUState *env;
>>> + DumpState *s = dump_get_current();
>>> + int ret;
>>> +
>>> + vm_stop(RUN_STATE_PAUSED);
>>
>> I would save the current vm state first and restore it when finished.
>
> There is no API to get the current vm state. If you want this feature, I will
> add an API to get it.
>
> Thanks
> Wen Congyang
<snip several kilobytes>
Maybe it's just me, and you can ignore me if I'm speaking out of turn
for expressing my views on list netiquette, but...
I get frustrated by lengthy messages that are heavily re-quoted versions
of earlier messages, with only a little new content embedded in the
middle where I have to hunt for it. There's nothing wrong with using
the Delete key to trim replies down to relevant portions, which reduces
the bandwidth of the list engine as well as reduces the time spent in
reviewing email exchanges.
/me returns back to lurk mode, but with one additional observation:
There are other APIs where qemu has ended up pausing the domain and not
restoring things back to running when done, and where libvirt has had to
track existing state prior to starting actions in order to manually fix
things after the fact (see libvirt's qemudDomainCoreDump as a wrapper
around migration to file, for an example). If we do things right in
this new DumpState API, we may want to decide to fix other monitor
commands to use the same mechanism (it won't offload any of the burden
from libvirt, which must still correctly interact with older qemu, but
would make life nicer for clients that can assume the saner semantics).
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-17 16:32 ` Eric Blake
@ 2012-02-17 16:51 ` Jan Kiszka
2012-02-17 17:05 ` Eric Blake
0 siblings, 1 reply; 68+ messages in thread
From: Jan Kiszka @ 2012-02-17 16:51 UTC (permalink / raw)
To: Eric Blake; +Cc: HATAYAMA Daisuke, Dave Anderson, qemu-devel, Luiz Capitulino
On 2012-02-17 17:32, Eric Blake wrote:
> There are other APIs where qemu has ended up pausing the domain and not
> restoring things back to running when done, and where libvirt has had to
> track existing state prior to starting actions in order to manually fix
> things after the fact (see libvirt's qemudDomainCoreDump as a wrapper
> around migration to file, for an example). If we do things right in
> this new DumpState API, we may want to decide to fix other monitor
> commands to use the same mechanism (it won't offload any of the burden
> from libvirt, which must still correctly interact with older qemu, but
> would make life nicer for clients that can assume the saner semantics).
I think there is no need for a new API. Everything you need is there:
check the current state, and prevent transitions or invoke handlers on
unexpected transitions. If other commands do not make use of this, they
should probably be fixed.
What command or series of commands do you have in mind?
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory
2012-02-17 16:51 ` Jan Kiszka
@ 2012-02-17 17:05 ` Eric Blake
0 siblings, 0 replies; 68+ messages in thread
From: Eric Blake @ 2012-02-17 17:05 UTC (permalink / raw)
To: Jan Kiszka; +Cc: HATAYAMA Daisuke, Dave Anderson, qemu-devel, Luiz Capitulino
On 02/17/2012 09:51 AM, Jan Kiszka wrote:
> On 2012-02-17 17:32, Eric Blake wrote:
>> There are other APIs where qemu has ended up pausing the domain and not
>> restoring things back to running when done, and where libvirt has had to
>> track existing state prior to starting actions in order to manually fix
>> things after the fact (see libvirt's qemudDomainCoreDump as a wrapper
>> around migration to file, for an example). If we do things right in
>> this new DumpState API, we may want to decide to fix other monitor
>> commands to use the same mechanism (it won't offload any of the burden
>> from libvirt, which must still correctly interact with older qemu, but
>> would make life nicer for clients that can assume the saner semantics).
>
> I think there is no need for a new API. Everything you need is there:
> check current state, prevent transitions or invoked handlers on
> unexpected transitions. If other commands do not make use of this, they
> should probably be fixed.
>
> What command or series of commands do you have in mind?
Right now, libvirt pauses qemu itself at least before issuing 'migrate'
to file, before issuing 'savevm', and before issuing
'blockdev-snapshot-sync' [1]. In particular, this comment in the
libvirt code surrounding the 'savevm' call is interesting:
if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
/* savevm monitor command pauses the domain emitting an event which
* confuses libvirt since it's not notified when qemu resumes the
* domain. Thus we stop and start CPUs ourselves.
*/
I'm not sure if the situation has improved since that comment was first
written, but it looks like a case where if libvirt were to let qemu do
the pause and resume as part of the single monitor command, instead of
libvirt breaking things into multiple monitor commands to track state
itself, then enough weird stuff happened at least with older versions of
qemu to make libvirt unhappy.
[1] Note - the fact that libvirt must pause around
'blockdev-snapshot-sync' is due to an orthogonal issue of snapshotting
more than one disk as an atomic operation; my understanding is that Jeff
Cody is working on a patch series to add a new monitor command
'blockdev-group-snapshot-sync' that would let libvirt delegate the pause
and resume to qemu instead, but that's a topic for a different thread.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
^ permalink raw reply [flat|nested] 68+ messages in thread
end of thread, other threads:[~2012-02-17 17:05 UTC | newest]
Thread overview: 68+ messages
-- links below jump to the message on this page --
2012-02-09 3:16 [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang
2012-02-09 3:19 ` [Qemu-devel] [RFC][PATCH 01/16 v6] monitor: introduce qemu_suspend_monitor()/qemu_resume_monitor() Wen Congyang
2012-02-14 16:19 ` Jan Kiszka
2012-02-15 2:54 ` Wen Congyang
2012-02-15 8:51 ` Jan Kiszka
2012-02-15 13:01 ` Luiz Capitulino
2012-02-16 1:35 ` Wen Congyang
2012-02-09 3:20 ` [Qemu-devel] [RFC][PATCH 02/16 v6] Add API to create memory mapping list Wen Congyang
2012-02-14 16:39 ` Jan Kiszka
2012-02-15 3:00 ` Wen Congyang
2012-02-09 3:21 ` [Qemu-devel] [RFC][PATCH 03/16 v6] Add API to check whether a physical address is I/O address Wen Congyang
2012-02-14 16:52 ` Jan Kiszka
2012-02-15 3:03 ` Wen Congyang
2012-02-09 3:21 ` [Qemu-devel] [RFC][PATCH 04/16 v6] target-i386: implement cpu_get_memory_mapping() Wen Congyang
2012-02-14 17:07 ` Jan Kiszka
2012-02-15 3:05 ` Wen Congyang
2012-02-09 3:22 ` [Qemu-devel] [RFC][PATCH 05/16 v6] Add API to get memory mapping Wen Congyang
2012-02-14 17:21 ` Jan Kiszka
2012-02-15 4:07 ` Wen Congyang
2012-02-15 9:17 ` Jan Kiszka
2012-02-15 9:41 ` Wen Congyang
2012-02-15 9:47 ` HATAYAMA Daisuke
2012-02-15 10:19 ` Jan Kiszka
2012-02-09 3:24 ` [Qemu-devel] [RFC][PATCH 06/16 v6] target-i386: Add API to write elf notes to core file Wen Congyang
2012-02-14 17:31 ` Jan Kiszka
2012-02-15 3:16 ` Wen Congyang
2012-02-09 3:24 ` [Qemu-devel] [RFC][PATCH 07/16 v6] target-i386: Add API to add extra memory mapping Wen Congyang
2012-02-14 17:35 ` Jan Kiszka
2012-02-15 5:19 ` Wen Congyang
2012-02-15 9:21 ` Jan Kiszka
2012-02-15 9:44 ` Wen Congyang
2012-02-15 10:21 ` Jan Kiszka
2012-02-17 9:32 ` Wen Congyang
2012-02-17 11:34 ` HATAYAMA Daisuke
2012-02-09 3:26 ` [Qemu-devel] [RFC][PATCH 08/16 v6] target-i386: add API to get dump info Wen Congyang
2012-02-14 17:39 ` Jan Kiszka
2012-02-15 3:30 ` Wen Congyang
2012-02-15 9:05 ` Jan Kiszka
2012-02-15 9:10 ` Wen Congyang
2012-02-15 9:12 ` Peter Maydell
2012-02-15 9:19 ` Wen Congyang
2012-02-09 3:28 ` [Qemu-devel] [RFC][PATCH 09/16 v6] introduce a new monitor command 'dump' to dump guest's memory Wen Congyang
2012-02-14 17:59 ` Jan Kiszka
2012-02-15 3:44 ` Wen Congyang
2012-02-17 8:52 ` Wen Congyang
2012-02-17 9:26 ` Jan Kiszka
2012-02-17 9:35 ` Wen Congyang
2012-02-17 9:35 ` Jan Kiszka
2012-02-17 16:32 ` Eric Blake
2012-02-17 16:51 ` Jan Kiszka
2012-02-17 17:05 ` Eric Blake
2012-02-09 3:28 ` [Qemu-devel] [RFC][PATCH 10/16 v6] run dump at the background Wen Congyang
2012-02-14 18:05 ` Jan Kiszka
2012-02-14 18:27 ` Jan Kiszka
2012-02-15 3:47 ` Wen Congyang
2012-02-15 9:07 ` Jan Kiszka
2012-02-15 9:22 ` Wen Congyang
2012-02-15 9:21 ` Jan Kiszka
2012-02-15 9:35 ` Wen Congyang
2012-02-15 10:16 ` Jan Kiszka
2012-02-09 3:29 ` [Qemu-devel] [RFC][PATCH 11/16 v6] support detached dump Wen Congyang
2012-02-09 3:30 ` [Qemu-devel] [RFC][PATCH 12/16 v6] support to cancel the current dumping Wen Congyang
2012-02-09 3:32 ` [Qemu-devel] [RFC][PATCH 13/16 v6] support to set dumping speed Wen Congyang
2012-02-09 3:32 ` [Qemu-devel] [RFC][PATCH 14/16 v6] support to query dumping status Wen Congyang
2012-02-09 3:33 ` [Qemu-devel] [RFC][PATCH 15/16 v6] auto cancel dumping after vm state is changed to run Wen Congyang
2012-02-09 3:34 ` [Qemu-devel] [RFC][PATCH 16/16 v6] allow user to dump a fraction of the memory Wen Congyang
2012-02-14 18:27 ` Jan Kiszka
2012-02-13 1:45 ` [Qemu-devel] [RFC][PATCH 00/16 v6] introducing a new, dedicated memory dump mechanism Wen Congyang