From mboxrd@z Thu Jan 1 00:00:00 1970
From: john cooper
Subject: Re: Resend: patch: qemu + hugetlbfs..
Date: Thu, 15 Jan 2009 21:19:55 -0500
Message-ID: <496FEECB.2060208@third-harmonic.com>
References: <4873E400.4000409@third-harmonic.com>
 <4873F395.6030209@codemonkey.ws>
 <4874051A.8000802@third-harmonic.com>
 <48740F86.3050306@codemonkey.ws>
 <20080709170301.GA11439@dmt.cnet>
 <4874F156.2010708@codemonkey.ws>
 <48763B86.6060402@third-harmonic.com>
 <48764DAF.6060502@codemonkey.ws>
 <48766E03.4090901@third-harmonic.com>
 <48767558.50301@codemonkey.ws>
 <48767B20.20806@third-harmonic.com>
 <4876815E.3010109@codemonkey.ws>
 <48B33AAD.8000508@third-harmonic.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------030203070206050802070704"
Cc: Marcelo Tosatti , john.cooper@redhat.com, avi@redhat.com
To: kvm@vger.kernel.org, aliguori@us.ibm.com
Return-path:
Received: from dpc691978010.direcpc.com ([69.19.78.10]:39000 "EHLO anvil.third-harmonic.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755763AbZAPCfL (ORCPT ); Thu, 15 Jan 2009 21:35:11 -0500
In-Reply-To: <48B33AAD.8000508@third-harmonic.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

This is a multi-part message in MIME format.
--------------030203070206050802070704
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

This trivial patch never did manage to find its way in.
Marcelo called it to my attention earlier in the week.
I've tweaked it to apply to kvm-83 and the resulting
patch is attached. I've left the prior e-mail discussion
below for reference.

-john

john cooper wrote:
> This patch from over a month ago doesn't seem to have
> made it into kvm-73 and may have been lost in the
> shuffle. Attached is essentially the same patch but
> as applied to kvm-73, and validated relative to that
> version.
>
> In a nutshell the intention here is to allow
> preallocation of guest huge page backed memory at
> qemu initialization time to avoid a quirk in the
> kernel's huge page accounting allowing overcommit
> of huge pages. Failure of the kernel to resolve a
> guest fault to overcommitted huge page memory during
> runtime results in sigkill termination of the guest.
> This patch provides the option of avoiding such
> behavior at the cost of up-front preallocation of
> physical huge pages backing the guest.
>
> -john
>
>
> Anthony Liguori wrote:
>> john cooper wrote:
>>> Anthony Liguori wrote:
>>>> john cooper wrote:
>>>>> As it currently exists alloc_hpage_mem() is tied to
>>>>> the notion of huge page allocation as it will reference
>>>>> gethugepagesize() irrespective of *mem_path. So even
>>>>> in the case of tmpfs backed files, if the host kernel
>>>>> has been configured with CONFIG_HUGETLBFS we will wind
>>>>> up doing allocations of /dev/shm mapped files at
>>>>> /proc/meminfo:Hugepagesize granularity.
>>>>
>>>> Which is fine. It just means we round -m values up to even numbers.
>>>
>>> Well, yes it will round the allocation. But from a
>>> minimally sufficient 4KB boundary to that of 4MB/2MB
>>> relative to a 32/64 bit x86 host which is excessive.
>>>
>>>>> Probably not what was intended but probably not too
>>>>> much of a concern as "-mem-path /dev/shm" is likely
>>>>> only used in debug of this flag and associated logic.
>>>>> I don't see it currently being worth the trouble to
>>>>> correct from a squeaky clean POV, and doing so may
>>>>> drag in far more than the header file we've just
>>>>> booted above to deal with this architecture/config
>>>>> dependency.
>>>>
>>>> Renaming a function to a name that's less accurate seems bad to me.
>>>> I don't mean to be pedantic, but it seems like a strange thing to
>>>> do. I prefer it the way it was before.
>>>
>>> I don't see any harm reverting the name.
>>> But I do believe it is largely cosmetic as given the above,
>>> the current code does require some work to make it
>>> independent of huge page assumptions. Update attached.
>>>
>>> -john
>>
>> Looks good to me.
>>
>> Acked-by: Anthony Liguori
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>

-- 
john.cooper@third-harmonic.com

--------------030203070206050802070704
Content-Type: text/plain; name="prealloc.diff-090115"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="prealloc.diff-090115"

 kernel/x86/Kbuild |    4 ++--
 qemu/vl.c         |   27 ++++++++++++++++++++-------
 2 files changed, 22 insertions(+), 9 deletions(-)
=================================================================
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -237,6 +237,7 @@ int semihosting_enabled = 0;
 int time_drift_fix = 0;
 unsigned int kvm_shadow_memory = 0;
 const char *mem_path = NULL;
+int mem_prealloc = 1; /* force preallocation of physical target memory */
 int hpagesize = 0;
 const char *cpu_vendor_string;
 #ifdef TARGET_ARM
@@ -4116,7 +4117,10 @@ static void help(int exitcode)
 #endif
     "-tdf inject timer interrupts that got lost\n"
     "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
-    "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+    "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+    " enables allocation of guest memory with huge pages\n"
+    "-mem-prealloc toggles preallocation of -mem-path backed physical memory\n"
+    " at startup. 
Default is enabled.\n"
     "-option-rom rom load a file, rom, into the option ROM space\n"
 #ifdef TARGET_SPARC
     "-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -4246,6 +4250,7 @@ enum {
     QEMU_OPTION_tdf,
     QEMU_OPTION_kvm_shadow_memory,
     QEMU_OPTION_mempath,
+    QEMU_OPTION_mem_prealloc
 };
 
 typedef struct QEMUOption {
@@ -4381,6 +4386,7 @@ static const QEMUOption qemu_options[] =
     { "icount", HAS_ARG, QEMU_OPTION_icount },
     { "incoming", HAS_ARG, QEMU_OPTION_incoming },
     { "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+    { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
     { NULL },
 };
 
@@ -4662,7 +4668,7 @@ void *alloc_mem_area(size_t memory, unsi
 {
     char *filename;
     void *area;
-    int fd;
+    int fd, flags;
 
     if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
         return NULL;
@@ -4690,13 +4696,17 @@ void *alloc_mem_area(size_t memory, unsi
      */
     ftruncate(fd, memory);
 
-    area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+    /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+     * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+     * to sidestep this quirk.
+     */
+    flags = mem_prealloc ? 
MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+    area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
     if (area == MAP_FAILED) {
-        perror("mmap");
-        close(fd);
-        return NULL;
+        perror("alloc_mem_area: can't mmap hugetlbfs pages");
+        close(fd);
+        return (NULL);
     }
-
     *len = memory;
     return area;
 }
@@ -5377,6 +5387,9 @@ int main(int argc, char **argv, char **e
             case QEMU_OPTION_mempath:
                 mem_path = optarg;
                 break;
+            case QEMU_OPTION_mem_prealloc:
+                mem_prealloc = !mem_prealloc;
+                break;
             case QEMU_OPTION_name:
                 qemu_name = optarg;
                 break;
=================================================================
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -9,8 +9,8 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_e
 ifeq ($(EXT_CONFIG_KVM_TRACE),y)
 kvm-objs += kvm_trace.o
 endif
-ifeq ($(CONFIG_DMAR),y)
-kvm-objs += vtd.o
+ifeq ($(CONFIG_IOMMU_API),y)
+kvm-objs += iommu.o
 endif
 kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
 kvm-amd-objs := svm.o ../external-module-compat.o

--------------030203070206050802070704--