From mboxrd@z Thu Jan 1 00:00:00 1970 From: Anthony Liguori Subject: Re: patch: qemu + hugetlbfs.. Date: Thu, 10 Jul 2008 12:58:07 -0500 Message-ID: <48764DAF.6060502@codemonkey.ws> References: <4873E400.4000409@third-harmonic.com> <4873F395.6030209@codemonkey.ws> <4874051A.8000802@third-harmonic.com> <48740F86.3050306@codemonkey.ws> <20080709170301.GA11439@dmt.cnet> <4874F156.2010708@codemonkey.ws> <48763B86.6060402@third-harmonic.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Marcelo Tosatti , kvm@vger.kernel.org, john.cooper@redhat.com To: john cooper Return-path: Received: from yx-out-2324.google.com ([74.125.44.28]:34629 "EHLO yx-out-2324.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753608AbYGJR7E (ORCPT ); Thu, 10 Jul 2008 13:59:04 -0400 Received: by yx-out-2324.google.com with SMTP id 8so1154411yxm.1 for ; Thu, 10 Jul 2008 10:59:03 -0700 (PDT) In-Reply-To: <48763B86.6060402@third-harmonic.com> Sender: kvm-owner@vger.kernel.org List-ID: john cooper wrote: > Anthony Liguori wrote: >> Marcelo Tosatti wrote: >>> This is Linux's behaviour for all filesystems. There is no error >>> checking on MAP_POPULATE's attempt to prefault pages. >>> >> >> I thought that mmap() will fail with hugetlbfs if enough large pages >> can't be reserved? > > This appears to be the case for MAP_SHARED. For MAP_PRIVATE > the accounting is side-stepped, the result of which was the > best-effort prealloc/populate behavior we had see seen. > > Marcelo and I batted this around some yesterday. And there > doesn't appear to be any concern using MAP_SHARED for the > phys_ram map as potential access to the backing file is > still regulated by filesystem permissions. > > Doing so dispenses with the syscall/useraddr hack to fault > in pages since mmap() fails as expected in the case > sufficient pages are unavailable immediately to back > MAP_SHARED pages. We also decided to defer the map-fallback > logic. This is essentially behavior we'd like to see > provided by kvm, ideally on a dynamic (guest fault) basis > and was technically beyond the issue we were trying to > target. > > Attached is the resulting patch which also addresses the > other concerns which arose. > > -john > > --- a/qemu/vl.c > +++ b/qemu/vl.c > @@ -45,6 +45,7 @@ > #include > #include > #include > +#include > I don't think this is necessary anymore. Depending on a Linux headers breaks the QEMU build on other unices so it's a bad thing. > #include > #include > > @@ -234,6 +235,7 @@ int autostart = 1; > int time_drift_fix = 0; > unsigned int kvm_shadow_memory = 0; > const char *mem_path = NULL; > +int mem_prealloc = 1; /* force preallocation of physical target memory */ > int hpagesize = 0; > const char *cpu_vendor_string; > #ifdef TARGET_ARM > @@ -7809,7 +7811,10 @@ static void help(int exitcode) > #endif > "-tdf inject timer interrupts that got lost\n" > "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n" > - "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n" > + "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n" > + " enables allocation of guest memory with huge pages\n" > + "-mem-prealloc toggles preallocation of huge page physical memory at\n" > + " startup when -mem-path is specified. Default is enabled.\n" > This really isn't huge page specific. It should work equally well with tmpfs, no? > "-option-rom rom load a file, rom, into the option ROM space\n" > #ifdef TARGET_SPARC > "-prom-env variable=value set OpenBIOS nvram variables\n" > @@ -7932,6 +7937,7 @@ enum { > QEMU_OPTION_tdf, > QEMU_OPTION_kvm_shadow_memory, > QEMU_OPTION_mempath, > + QEMU_OPTION_mem_prealloc > }; > > typedef struct QEMUOption { > @@ -8059,6 +8065,7 @@ const QEMUOption qemu_options[] = { > { "startdate", HAS_ARG, QEMU_OPTION_startdate }, > { "tb-size", HAS_ARG, QEMU_OPTION_tb_size }, > { "mem-path", HAS_ARG, QEMU_OPTION_mempath }, > + { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc }, > { NULL }, > }; > > @@ -8276,11 +8283,13 @@ static int gethugepagesize(void) > return hugepagesize; > } > > -void *alloc_mem_area(unsigned long memory, const char *path) > +/* attempt to allocate memory mmap'ed to mem_path > + */ > +void *alloc_hpage_mem(unsigned long memory, const char *path) > { > char *filename; > void *area; > - int fd; > + int fd, flags; > > if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1) > return NULL; > @@ -8308,26 +8317,24 @@ void *alloc_mem_area(unsigned long memor > */ > ftruncate(fd, memory); > > - area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0); > - if (area == MAP_FAILED) { > - perror("mmap"); > - close(fd); > - return NULL; > - } > - > - return area; > + /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case > + * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED > + * to sidestep this quirk. > + */ > + flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE; > + area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0); > + if (area != MAP_FAILED) > + return (area); > + perror("alloc_hpage_mem: can't mmap hugetlbfs pages"); > + close(fd); > + return (NULL); > } > > -void *qemu_alloc_physram(unsigned long memory) > +/* allocate guest memory as requested > + */ > +void *qemu_alloc_physram(unsigned long size) > { > - void *area = NULL; > - > - if (mem_path) > - area = alloc_mem_area(memory, mem_path); > - if (!area) > - area = qemu_vmalloc(memory); > - > - return area; > + return (mem_path ? alloc_hpage_mem(size, mem_path) : qemu_vmalloc(size)); > } > I'd prefer if you didn't convert from a if/else to conditional. hpage is a misnomer too as we aren't actually dependent on huge pages (this code should work equally well for tmpfs). Regards, Anthony Liguori > int main(int argc, char **argv) > @@ -8962,6 +8969,9 @@ int main(int argc, char **argv) > case QEMU_OPTION_mempath: > mem_path = optarg; > break; > + case QEMU_OPTION_mem_prealloc: > + mem_prealloc = !mem_prealloc; > + break; > case QEMU_OPTION_name: > qemu_name = optarg; > break; >