From: Anthony Liguori <anthony@codemonkey.ws>
To: john cooper <john.cooper@third-harmonic.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
kvm@vger.kernel.org, john.cooper@redhat.com
Subject: Re: patch: qemu + hugetlbfs..
Date: Thu, 10 Jul 2008 12:58:07 -0500 [thread overview]
Message-ID: <48764DAF.6060502@codemonkey.ws> (raw)
In-Reply-To: <48763B86.6060402@third-harmonic.com>
john cooper wrote:
> Anthony Liguori wrote:
>> Marcelo Tosatti wrote:
>>> This is Linux's behaviour for all filesystems. There is no error
>>> checking on MAP_POPULATE's attempt to prefault pages.
>>>
>>
>> I thought that mmap() will fail with hugetlbfs if enough large pages
>> can't be reserved?
>
> This appears to be the case for MAP_SHARED. For MAP_PRIVATE
> the accounting is side-stepped, the result of which was the
> best-effort prealloc/populate behavior we had see seen.
>
> Marcelo and I batted this around some yesterday. And there
> doesn't appear to be any concern using MAP_SHARED for the
> phys_ram map as potential access to the backing file is
> still regulated by filesystem permissions.
>
> Doing so dispenses with the syscall/useraddr hack to fault
> in pages since mmap() fails as expected in the case
> sufficient pages are unavailable immediately to back
> MAP_SHARED pages. We also decided to defer the map-fallback
> logic. This is essentially behavior we'd like to see
> provided by kvm, ideally on a dynamic (guest fault) basis
> and was technically beyond the issue we were trying to
> target.
>
> Attached is the resulting patch which also addresses the
> other concerns which arose.
>
> -john
>
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -45,6 +45,7 @@
> #include <signal.h>
> #include <time.h>
> #include <errno.h>
> +#include <asm/param.h>
>
I don't think this is necessary anymore. Depending on a Linux headers
breaks the QEMU build on other unices so it's a bad thing.
> #include <sys/time.h>
> #include <zlib.h>
>
> @@ -234,6 +235,7 @@ int autostart = 1;
> int time_drift_fix = 0;
> unsigned int kvm_shadow_memory = 0;
> const char *mem_path = NULL;
> +int mem_prealloc = 1; /* force preallocation of physical target memory */
> int hpagesize = 0;
> const char *cpu_vendor_string;
> #ifdef TARGET_ARM
> @@ -7809,7 +7811,10 @@ static void help(int exitcode)
> #endif
> "-tdf inject timer interrupts that got lost\n"
> "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
> - "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
> + "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
> + " enables allocation of guest memory with huge pages\n"
> + "-mem-prealloc toggles preallocation of huge page physical memory at\n"
> + " startup when -mem-path is specified. Default is enabled.\n"
>
This really isn't huge page specific. It should work equally well with
tmpfs, no?
> "-option-rom rom load a file, rom, into the option ROM space\n"
> #ifdef TARGET_SPARC
> "-prom-env variable=value set OpenBIOS nvram variables\n"
> @@ -7932,6 +7937,7 @@ enum {
> QEMU_OPTION_tdf,
> QEMU_OPTION_kvm_shadow_memory,
> QEMU_OPTION_mempath,
> + QEMU_OPTION_mem_prealloc
> };
>
> typedef struct QEMUOption {
> @@ -8059,6 +8065,7 @@ const QEMUOption qemu_options[] = {
> { "startdate", HAS_ARG, QEMU_OPTION_startdate },
> { "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
> { "mem-path", HAS_ARG, QEMU_OPTION_mempath },
> + { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
> { NULL },
> };
>
> @@ -8276,11 +8283,13 @@ static int gethugepagesize(void)
> return hugepagesize;
> }
>
> -void *alloc_mem_area(unsigned long memory, const char *path)
> +/* attempt to allocate memory mmap'ed to mem_path
> + */
> +void *alloc_hpage_mem(unsigned long memory, const char *path)
> {
> char *filename;
> void *area;
> - int fd;
> + int fd, flags;
>
> if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
> return NULL;
> @@ -8308,26 +8317,24 @@ void *alloc_mem_area(unsigned long memor
> */
> ftruncate(fd, memory);
>
> - area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> - if (area == MAP_FAILED) {
> - perror("mmap");
> - close(fd);
> - return NULL;
> - }
> -
> - return area;
> + /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
> + * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
> + * to sidestep this quirk.
> + */
> + flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
> + area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
> + if (area != MAP_FAILED)
> + return (area);
> + perror("alloc_hpage_mem: can't mmap hugetlbfs pages");
> + close(fd);
> + return (NULL);
> }
>
> -void *qemu_alloc_physram(unsigned long memory)
> +/* allocate guest memory as requested
> + */
> +void *qemu_alloc_physram(unsigned long size)
> {
> - void *area = NULL;
> -
> - if (mem_path)
> - area = alloc_mem_area(memory, mem_path);
> - if (!area)
> - area = qemu_vmalloc(memory);
> -
> - return area;
> + return (mem_path ? alloc_hpage_mem(size, mem_path) : qemu_vmalloc(size));
> }
>
I'd prefer if you didn't convert from a if/else to conditional. hpage
is a misnomer too as we aren't actually dependent on huge pages (this
code should work equally well for tmpfs).
Regards,
Anthony Liguori
> int main(int argc, char **argv)
> @@ -8962,6 +8969,9 @@ int main(int argc, char **argv)
> case QEMU_OPTION_mempath:
> mem_path = optarg;
> break;
> + case QEMU_OPTION_mem_prealloc:
> + mem_prealloc = !mem_prealloc;
> + break;
> case QEMU_OPTION_name:
> qemu_name = optarg;
> break;
>
next prev parent reply other threads:[~2008-07-10 17:59 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-07-08 22:02 patch: qemu + hugetlbfs john cooper
2008-07-08 23:09 ` Anthony Liguori
2008-07-09 0:23 ` john cooper
2008-07-09 1:08 ` Anthony Liguori
2008-07-09 17:03 ` Marcelo Tosatti
2008-07-09 17:11 ` Anthony Liguori
2008-07-10 16:40 ` john cooper
2008-07-10 17:58 ` Anthony Liguori [this message]
2008-07-10 20:16 ` john cooper
2008-07-10 20:47 ` Anthony Liguori
2008-07-10 21:12 ` john cooper
2008-07-10 21:38 ` Anthony Liguori
2008-08-25 23:05 ` Resend: " john cooper
2008-08-26 8:11 ` Avi Kivity
2008-08-27 4:13 ` john cooper
2009-01-16 2:19 ` john cooper
2009-01-20 10:29 ` Avi Kivity
2009-01-23 21:21 ` john cooper
2009-02-05 15:42 ` Avi Kivity
2009-02-05 16:12 ` Marcelo Tosatti
2009-02-05 16:15 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48764DAF.6060502@codemonkey.ws \
--to=anthony@codemonkey.ws \
--cc=john.cooper@redhat.com \
--cc=john.cooper@third-harmonic.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox