From mboxrd@z Thu Jan 1 00:00:00 1970
From: Anthony Liguori
Date: Wed, 24 Feb 2010 13:10:12 -0600
Message-Id: <1267038612-21581-1-git-send-email-aliguori@us.ibm.com>
Subject: [Qemu-devel] [PATCH] pc: madvise(MADV_DONTNEED) memory on reset
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org
Cc: Anthony Liguori

If you compare the RSS of a freshly booted guest and the same guest after a
reboot, it's very likely the freshly booted guest will have an RSS that is
much lower than that of the rebooted guest, because the previous run of the
guest faulted in all available memory.

This patch addresses this issue by using madvise() during reset.  It only
resets RAM areas, which means it has to be done in the machine.  I've only
done this for the x86 target because I'm fairly confident that this is
allowed architecturally on x86, although I'm not sure this is universally
true.

This does not appear to have an observable cost with a large memory guest
and I can't really think of any downsides.

Reported-by: Karl Rister
Signed-off-by: Anthony Liguori
---
 hw/pc.c |   40 ++++++++++++++++++++++++++++++++++++++++
 1 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 4f6a522..10446ba 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -45,6 +45,11 @@
 #include "loader.h"
 #include "elf.h"
 #include "multiboot.h"
+#include "kvm.h"
+
+#ifndef _WIN32
+#include <sys/mman.h>
+#endif
 
 /* output Bochs bios info messages */
 //#define DEBUG_BIOS
@@ -63,10 +68,19 @@
 
 #define MAX_IDE_BUS 2
 
+#define MAX_MEMORY_ENTRIES 10
+
+typedef struct MemoryEntry {
+    ram_addr_t addr;
+    ram_addr_t size;
+} MemoryEntry;
+
 static FDCtrl *floppy_controller;
 static RTCState *rtc_state;
 static PITState *pit;
 static PCII440FXState *i440fx_state;
+static int num_memory_entries;
+static MemoryEntry memory_entries[MAX_MEMORY_ENTRIES];
 
 #define E820_NR_ENTRIES 16
 
@@ -782,6 +796,27 @@ static CPUState *pc_new_cpu(const char *cpu_model)
     return env;
 }
 
+static void add_mem_entry(ram_addr_t addr, ram_addr_t size)
+{
+    memory_entries[num_memory_entries].addr = addr;
+    memory_entries[num_memory_entries].size = size;
+    num_memory_entries++;
+}
+
+static void pc_reset_ram(void *opaque)
+{
+    int i;
+
+    for (i = 0; i < num_memory_entries; i++) {
+#ifndef _WIN32
+        if (!kvm_enabled() || kvm_has_sync_mmu()) {
+            madvise(qemu_get_ram_ptr(memory_entries[i].addr),
+                    memory_entries[i].size, MADV_DONTNEED);
+        }
+#endif
+    }
+}
+
 /* PC hardware initialisation */
 static void pc_init1(ram_addr_t ram_size,
                      const char *boot_device,
@@ -835,6 +870,7 @@ static void pc_init1(ram_addr_t ram_size,
     /* allocate RAM */
     ram_addr = qemu_ram_alloc(0xa0000);
     cpu_register_physical_memory(0, 0xa0000, ram_addr);
+    add_mem_entry(ram_addr, 0xa0000);
 
     /* Allocate, even though we won't register, so we don't break the
      * phys_ram_base + PA assumption. This range includes vga (0xa0000 - 0xc0000),
@@ -845,6 +881,7 @@ static void pc_init1(ram_addr_t ram_size,
     cpu_register_physical_memory(0x100000,
                  below_4g_mem_size - 0x100000,
                  ram_addr);
+    add_mem_entry(ram_addr, below_4g_mem_size - 0x100000);
 
     /* above 4giga memory allocation */
     if (above_4g_mem_size > 0) {
@@ -855,6 +892,7 @@ static void pc_init1(ram_addr_t ram_size,
         cpu_register_physical_memory(0x100000000ULL,
                                      above_4g_mem_size,
                                      ram_addr);
+        add_mem_entry(ram_addr, above_4g_mem_size);
 #endif
     }
 
@@ -1050,6 +1088,8 @@ static void pc_init1(ram_addr_t ram_size,
             pci_create_simple(pci_bus, -1, "lsi53c895a");
         }
     }
+
+    qemu_register_reset(pc_reset_ram, NULL);
 }
 
 static void pc_init_pci(ram_addr_t ram_size,
-- 
1.6.5.2