From: john cooper <john.cooper@third-harmonic.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
kvm@vger.kernel.org, john.cooper@redhat.com
Subject: Re: patch: qemu + hugetlbfs..
Date: Thu, 10 Jul 2008 12:40:38 -0400 [thread overview]
Message-ID: <48763B86.6060402@third-harmonic.com> (raw)
In-Reply-To: <4874F156.2010708@codemonkey.ws>
[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]
Anthony Liguori wrote:
> Marcelo Tosatti wrote:
>> This is Linux's behaviour for all filesystems. There is no error
>> checking on MAP_POPULATE's attempt to prefault pages.
>>
>
> I thought that mmap() will fail with hugetlbfs if enough large pages
> can't be reserved?
This appears to be the case for MAP_SHARED. For MAP_PRIVATE
the accounting is side-stepped, the result of which was the
best-effort prealloc/populate behavior we had see seen.
Marcelo and I batted this around some yesterday. And there
doesn't appear to be any concern using MAP_SHARED for the
phys_ram map as potential access to the backing file is
still regulated by filesystem permissions.
Doing so dispenses with the syscall/useraddr hack to fault
in pages since mmap() fails as expected in the case
sufficient pages are unavailable immediately to back
MAP_SHARED pages. We also decided to defer the map-fallback
logic. This is essentially behavior we'd like to see
provided by kvm, ideally on a dynamic (guest fault) basis
and was technically beyond the issue we were trying to
target.
Attached is the resulting patch which also addresses the
other concerns which arose.
-john
--
john.cooper@third-harmonic.com
[-- Attachment #2: prealloc.diff-080710 --]
[-- Type: text/plain, Size: 3678 bytes --]
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -45,6 +45,7 @@
#include <signal.h>
#include <time.h>
#include <errno.h>
+#include <asm/param.h>
#include <sys/time.h>
#include <zlib.h>
@@ -234,6 +235,7 @@ int autostart = 1;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+int mem_prealloc = 1; /* force preallocation of physical target memory */
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -7809,7 +7811,10 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+ "-mem-prealloc toggles preallocation of huge page physical memory at\n"
+ " startup when -mem-path is specified. Default is enabled.\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -7932,6 +7937,7 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_mem_prealloc
};
typedef struct QEMUOption {
@@ -8059,6 +8065,7 @@ const QEMUOption qemu_options[] = {
{ "startdate", HAS_ARG, QEMU_OPTION_startdate },
{ "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
{ NULL },
};
@@ -8276,11 +8283,13 @@ static int gethugepagesize(void)
return hugepagesize;
}
-void *alloc_mem_area(unsigned long memory, const char *path)
+/* attempt to allocate memory mmap'ed to mem_path
+ */
+void *alloc_hpage_mem(unsigned long memory, const char *path)
{
char *filename;
void *area;
- int fd;
+ int fd, flags;
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -8308,26 +8317,24 @@ void *alloc_mem_area(unsigned long memor
*/
ftruncate(fd, memory);
- area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
- if (area == MAP_FAILED) {
- perror("mmap");
- close(fd);
- return NULL;
- }
-
- return area;
+ /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+ flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+ area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
+ if (area != MAP_FAILED)
+ return (area);
+ perror("alloc_hpage_mem: can't mmap hugetlbfs pages");
+ close(fd);
+ return (NULL);
}
-void *qemu_alloc_physram(unsigned long memory)
+/* allocate guest memory as requested
+ */
+void *qemu_alloc_physram(unsigned long size)
{
- void *area = NULL;
-
- if (mem_path)
- area = alloc_mem_area(memory, mem_path);
- if (!area)
- area = qemu_vmalloc(memory);
-
- return area;
+ return (mem_path ? alloc_hpage_mem(size, mem_path) : qemu_vmalloc(size));
}
int main(int argc, char **argv)
@@ -8962,6 +8969,9 @@ int main(int argc, char **argv)
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = !mem_prealloc;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
next prev parent reply other threads:[~2008-07-10 16:53 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-07-08 22:02 patch: qemu + hugetlbfs john cooper
2008-07-08 23:09 ` Anthony Liguori
2008-07-09 0:23 ` john cooper
2008-07-09 1:08 ` Anthony Liguori
2008-07-09 17:03 ` Marcelo Tosatti
2008-07-09 17:11 ` Anthony Liguori
2008-07-10 16:40 ` john cooper [this message]
2008-07-10 17:58 ` Anthony Liguori
2008-07-10 20:16 ` john cooper
2008-07-10 20:47 ` Anthony Liguori
2008-07-10 21:12 ` john cooper
2008-07-10 21:38 ` Anthony Liguori
2008-08-25 23:05 ` Resend: " john cooper
2008-08-26 8:11 ` Avi Kivity
2008-08-27 4:13 ` john cooper
2009-01-16 2:19 ` john cooper
2009-01-20 10:29 ` Avi Kivity
2009-01-23 21:21 ` john cooper
2009-02-05 15:42 ` Avi Kivity
2009-02-05 16:12 ` Marcelo Tosatti
2009-02-05 16:15 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=48763B86.6060402@third-harmonic.com \
--to=john.cooper@third-harmonic.com \
--cc=anthony@codemonkey.ws \
--cc=john.cooper@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=mtosatti@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.