* patch: qemu + hugetlbfs..
@ 2008-07-08 22:02 john cooper
2008-07-08 23:09 ` Anthony Liguori
0 siblings, 1 reply; 21+ messages in thread
From: john cooper @ 2008-07-08 22:02 UTC (permalink / raw)
To: kvm; +Cc: mtosatti, john.cooper
[-- Attachment #1: Type: text/plain, Size: 1771 bytes --]
Attached is a patch to qemu which allows preallocation of huge
pages at startup time. Also in cases where sufficient huge
pages are not available, the user may request to selectively fall
back to 4K pages for those portions of phys_mem which couldn't be
backed by huge pages. This patch is relative to kvm-70.
The motivation for this arose from odd behavior seen in qemu
when access to huge page backed phys_mem failed during startup
(eg: loading the bios), and during runtime where a guest will
terminate via signal if a free hpage isn't available to satisfy
a guest page fault.
Two flags are introduced:
-mem-prealloc [0|1]
Attempt to fault-in all hpages and if all can be allocated
proceed as normal. If all hpages can't be resolved and
-mem-fallback behavior wasn't is requested, qemu will error
exit. This behavior is enabled by default when -mem-path
is specified.
-mem-fallback [0|1]
In the case some hpages can't be allocated, mapping a
series of 4K pages will be attempted in place of them.
In the case zero hpages were allocated a diagnostic to
this effect will be printed (flagging scenarios such
as /proc/sys/vm/nr_hugepages failing to be setup) and
the normal non-hpage 4K allocation will be attempted.
This behavior by default is disabled. Recognition of
-mem-fallback will only occur when -mem-path is
specified and -mem-prealloc behavior is enabled.
This is believed to be an interim bandaid until such support
is available from within the kernel+kvm, ideally allowing
fallback to 4K pages on a dynamic basis and ultimately allowing
huge pages faults to reclaim huge pages from other users in the
system.
-john
Signed-off-by: john cooper <john.cooper <at> redhat.com>
[-- Attachment #2: prealloc.diff --]
[-- Type: text/x-patch, Size: 6852 bytes --]
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -45,6 +45,7 @@
#include <signal.h>
#include <time.h>
#include <errno.h>
+#include <asm/param.h>
#include <sys/time.h>
#include <zlib.h>
@@ -160,6 +161,15 @@ int inet_aton(const char *cp, struct in_
/* XXX: use a two level table to limit memory usage */
#define MAX_IOPORTS 65536
+/* assertion checking
+ */
+#define ASSERT(c) if (!(c)) _assert(#c, __FILE__, __LINE__)
+
+static inline void _assert(char *text, char *file, int line) {
+ fprintf(stderr, "ASSERTION FAILED: [%s] %s:%d\n", text, file, line);
+ kill(getpid(), 12); /* dump core */
+}
+
const char *bios_dir = CONFIG_QEMU_SHAREDIR;
const char *bios_name = NULL;
void *ioport_opaque[MAX_IOPORTS];
@@ -234,6 +244,8 @@ int autostart = 1;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+int mem_prealloc = {1}; /* force preallocation of physical target memory */
+int mem_fallback = {0}; /* allow alloc from alternate page size */
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -7809,7 +7821,12 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+ "-mem-prealloc forces preallocation of huge page physical memory\n"
+ " at startup when -mem-path is specified. Default 1 (on)\n"
+ "-mem-fallback enables fallback to conventional pages as needed\n"
+ " when -mem-prealloc is enabled. Default: 0 (off)\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -7932,6 +7949,8 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_mem_prealloc,
+ QEMU_OPTION_mem_fallback,
};
typedef struct QEMUOption {
@@ -8059,6 +8078,8 @@ const QEMUOption qemu_options[] = {
{ "startdate", HAS_ARG, QEMU_OPTION_startdate },
{ "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "mem-prealloc", HAS_ARG, QEMU_OPTION_mem_prealloc },
+ { "mem-fallback", HAS_ARG, QEMU_OPTION_mem_fallback },
{ NULL },
};
@@ -8276,11 +8297,36 @@ static int gethugepagesize(void)
return hugepagesize;
}
-void *alloc_mem_area(unsigned long memory, const char *path)
+/* fault in associated page, fail syscall when free page is unavailable
+ */
+static inline int fault_in(void *a)
+{
+ return (gettimeofday(a, NULL) < 0 ? errno != EFAULT : 1);
+}
+
+/* we failed to fault in hpage *a, fall back to conventional page mapping
+ */
+int remap_hpage(void *a, int sz)
+{
+ ASSERT(!(sz & (EXEC_PAGESIZE - 1)));
+ if (munmap(a, sz) < 0)
+ perror("remap_hpage: munmap");
+ else if (mmap(a, sz, PROT_READ|PROT_WRITE,
+ MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0) == MAP_FAILED)
+ perror("remap_hpage: mmap");
+ else
+ return (1);
+ return (0);
+}
+
+/* attempt to allocate memory mmap'ed to mem_path
+ */
+void *alloc_hpage_mem(unsigned long memory, const char *path)
{
char *filename;
- void *area;
- int fd;
+ void *area, *a;
+ int fd, fail;
+ long i;
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -8308,27 +8354,80 @@ void *alloc_mem_area(unsigned long memor
*/
ftruncate(fd, memory);
+ fail = 0;
area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
- if (area == MAP_FAILED) {
- perror("mmap");
- close(fd);
- return NULL;
- }
+ if (area == MAP_FAILED)
+ ++fail, perror("alloc_hpage_mem: hugetlbfs mmap");
- return area;
+ /* success of mmap() doesn't guarantee phys pages are present in the
+ * map. if requested we attempt to fault in all associated huge pages * and where unable if requested, selectively revert back to a mapping
+ * of conventional sized pages
+ */
+ ASSERT(!((unsigned long)area & (hpagesize - 1)));
+ if (mem_prealloc)
+ for (a = area, i = memory / hpagesize; !fail && 0 < i--; a += hpagesize)
+ if (fault_in(a))
+ ; /* hpage is mapped */
+ else if (!mem_fallback)
+ ++fail;
+ else if (!remap_hpage(a, hpagesize))
+ ++fail;
+ if (!fail)
+ return (area);
+ if (area != MAP_FAILED)
+ munmap(area, memory);
+ close(fd);
+ return (NULL);
}
+/* allocate from hpage mem if requested and to the extent possible, else
+ * revert to conventional page allocation.
+ */
void *qemu_alloc_physram(unsigned long memory)
{
void *area = NULL;
- if (mem_path)
- area = alloc_mem_area(memory, mem_path);
- if (!area)
- area = qemu_vmalloc(memory);
+ if (mem_path && !(area = alloc_hpage_mem(memory, mem_path)))
+ fprintf(stderr, "warning: couldn't alloc memory as requested from %s\n",
+ mem_path);
+ return (!area && mem_fallback ? qemu_vmalloc(memory) : area);
+}
- return area;
+/* match string *s in pattern *p. *p is of the form: [<str> | [<str> '|']*]
+ * and <str> is a string to be matched which may be empty.
+ * Return pointer to tail of matched *p for success, NULL otherwise.
+ */
+char const *altmatch(const char *p, const char *s)
+{
+ const char *bp, *q;
+
+ for (bp = p, q = s; ; )
+ if ((!*p || *p == '|') && !*q)
+ return (q == s && bp != p ? NULL : p);
+ else if (!*p)
+ return (0);
+ else if (*p == *q)
+ ++p, ++q;
+ else { /* failed, try next field */
+ for (q = s; *p && *p != '|'; ++p)
+ ;
+ if (*p == '|')
+ bp = ++p;
+ }
+}
+
+int bool_arg(const char *optarg, const char *flag_name)
+{
+ if (!optarg)
+ ;
+ else if (altmatch("y|yes|1", optarg))
+ return (1);
+ else if (altmatch("n|no|0", optarg))
+ return (0);
+ fprintf(stderr, "usage: %s [0|1]\n", flag_name);
+ exit(1);
}
+
int main(int argc, char **argv)
{
@@ -8962,6 +9061,12 @@ int main(int argc, char **argv)
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = bool_arg(optarg, "mem_prealloc");
+ break;
+ case QEMU_OPTION_mem_fallback:
+ mem_fallback = bool_arg(optarg, "mem_fallback");
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-08 22:02 patch: qemu + hugetlbfs john cooper
@ 2008-07-08 23:09 ` Anthony Liguori
2008-07-09 0:23 ` john cooper
0 siblings, 1 reply; 21+ messages in thread
From: Anthony Liguori @ 2008-07-08 23:09 UTC (permalink / raw)
To: john cooper; +Cc: kvm, mtosatti, john.cooper
john cooper wrote:
> Attached is a patch to qemu which allows preallocation of huge
> pages at startup time. Also in cases where sufficient huge
> pages are not available, the user may request to selectively fall
> back to 4K pages for those portions of phys_mem which couldn't be
> backed by huge pages. This patch is relative to kvm-70.
>
> The motivation for this arose from odd behavior seen in qemu
> when access to huge page backed phys_mem failed during startup
> (eg: loading the bios), and during runtime where a guest will
> terminate via signal if a free hpage isn't available to satisfy
> a guest page fault.
>
> Two flags are introduced:
>
> -mem-prealloc [0|1]
>
> Attempt to fault-in all hpages and if all can be allocated
> proceed as normal. If all hpages can't be resolved and
> -mem-fallback behavior wasn't is requested, qemu will error
> exit. This behavior is enabled by default when -mem-path
> is specified.
>
> -mem-fallback [0|1]
>
> In the case some hpages can't be allocated, mapping a
> series of 4K pages will be attempted in place of them.
> In the case zero hpages were allocated a diagnostic to
> this effect will be printed (flagging scenarios such
> as /proc/sys/vm/nr_hugepages failing to be setup) and
> the normal non-hpage 4K allocation will be attempted.
> This behavior by default is disabled. Recognition of
> -mem-fallback will only occur when -mem-path is
> specified and -mem-prealloc behavior is enabled.
>
> This is believed to be an interim bandaid until such support
> is available from within the kernel+kvm, ideally allowing
> fallback to 4K pages on a dynamic basis and ultimately allowing
> huge pages faults to reclaim huge pages from other users in the
> system.
>
> -john
>
>
> Signed-off-by: john cooper <john.cooper <at> redhat.com>
>
>
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -45,6 +45,7 @@
> #include <signal.h>
> #include <time.h>
> #include <errno.h>
> +#include <asm/param.h>
> #include <sys/time.h>
> #include <zlib.h>
>
> @@ -160,6 +161,15 @@ int inet_aton(const char *cp, struct in_
> /* XXX: use a two level table to limit memory usage */
> #define MAX_IOPORTS 65536
>
> +/* assertion checking
> + */
> +#define ASSERT(c) if (!(c)) _assert(#c, __FILE__, __LINE__)
> +
> +static inline void _assert(char *text, char *file, int line) {
> + fprintf(stderr, "ASSERTION FAILED: [%s] %s:%d\n", text, file, line);
> + kill(getpid(), 12); /* dump core */
> +}
> +
>
Is it really necessary to add this as part of this patch?
> const char *bios_dir = CONFIG_QEMU_SHAREDIR;
> const char *bios_name = NULL;
> void *ioport_opaque[MAX_IOPORTS];
> @@ -234,6 +244,8 @@ int autostart = 1;
> int time_drift_fix = 0;
> unsigned int kvm_shadow_memory = 0;
> const char *mem_path = NULL;
> +int mem_prealloc = {1}; /* force preallocation of physical target memory */
> +int mem_fallback = {0}; /* allow alloc from alternate page size */
>
This is a bizarre way to initialize an integer. I had no idea you could
even do this for a scalar. Just initialize to 1 and 0.
> int hpagesize = 0;
> const char *cpu_vendor_string;
> #ifdef TARGET_ARM
> @@ -7809,7 +7821,12 @@ static void help(int exitcode)
> #endif
> "-tdf inject timer interrupts that got lost\n"
> "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
> - "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
> + "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
> + " enables allocation of guest memory with huge pages\n"
> + "-mem-prealloc forces preallocation of huge page physical memory\n"
> + " at startup when -mem-path is specified. Default 1 (on)\n"
> + "-mem-fallback enables fallback to conventional pages as needed\n"
> + " when -mem-prealloc is enabled. Default: 0 (off)\n"
> "-option-rom rom load a file, rom, into the option ROM space\n"
> #ifdef TARGET_SPARC
> "-prom-env variable=value set OpenBIOS nvram variables\n"
> @@ -7932,6 +7949,8 @@ enum {
> QEMU_OPTION_tdf,
> QEMU_OPTION_kvm_shadow_memory,
> QEMU_OPTION_mempath,
> + QEMU_OPTION_mem_prealloc,
> + QEMU_OPTION_mem_fallback,
> };
>
> typedef struct QEMUOption {
> @@ -8059,6 +8078,8 @@ const QEMUOption qemu_options[] = {
> { "startdate", HAS_ARG, QEMU_OPTION_startdate },
> { "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
> { "mem-path", HAS_ARG, QEMU_OPTION_mempath },
> + { "mem-prealloc", HAS_ARG, QEMU_OPTION_mem_prealloc },
> + { "mem-fallback", HAS_ARG, QEMU_OPTION_mem_fallback },
> { NULL },
> };
>
> @@ -8276,11 +8297,36 @@ static int gethugepagesize(void)
> return hugepagesize;
> }
>
> -void *alloc_mem_area(unsigned long memory, const char *path)
> +/* fault in associated page, fail syscall when free page is unavailable
> + */
> +static inline int fault_in(void *a)
> +{
> + return (gettimeofday(a, NULL) < 0 ? errno != EFAULT : 1);
> +}
>
I don't like this very much. Why not just MAP_POPULATE?
> +/* we failed to fault in hpage *a, fall back to conventional page mapping
> + */
> +int remap_hpage(void *a, int sz)
> +{
> + ASSERT(!(sz & (EXEC_PAGESIZE - 1)));
> + if (munmap(a, sz) < 0)
> + perror("remap_hpage: munmap");
> + else if (mmap(a, sz, PROT_READ|PROT_WRITE,
> + MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0) == MAP_FAILED)
> + perror("remap_hpage: mmap");
> + else
> + return (1);
> + return (0);
> +}
>
I think this would be simplified with MAP_POPULATE since you can fail in
large chunks of memory instead of potentially having a highly fragmented
set of VMAs.
> +/* attempt to allocate memory mmap'ed to mem_path
> + */
> +void *alloc_hpage_mem(unsigned long memory, const char *path)
> {
> char *filename;
> - void *area;
> - int fd;
> + void *area, *a;
> + int fd, fail;
> + long i;
>
> if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
> return NULL;
> @@ -8308,27 +8354,80 @@ void *alloc_mem_area(unsigned long memor
> */
> ftruncate(fd, memory);
>
> + fail = 0;
> area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> - if (area == MAP_FAILED) {
> - perror("mmap");
> - close(fd);
> - return NULL;
> - }
> + if (area == MAP_FAILED)
> + ++fail, perror("alloc_hpage_mem: hugetlbfs mmap");
>
> - return area;
> + /* success of mmap() doesn't guarantee phys pages are present in the
> + * map. if requested we attempt to fault in all associated huge pages * and where unable if requested, selectively revert back to a mapping
> + * of conventional sized pages
> + */
> + ASSERT(!((unsigned long)area & (hpagesize - 1)));
> + if (mem_prealloc)
> + for (a = area, i = memory / hpagesize; !fail && 0 < i--; a += hpagesize)
> + if (fault_in(a))
> + ; /* hpage is mapped */
> + else if (!mem_fallback)
> + ++fail;
> + else if (!remap_hpage(a, hpagesize))
> + ++fail;
> + if (!fail)
> + return (area);
> + if (area != MAP_FAILED)
> + munmap(area, memory);
> + close(fd);
> + return (NULL);
> }
>
> +/* allocate from hpage mem if requested and to the extent possible, else
> + * revert to conventional page allocation.
> + */
> void *qemu_alloc_physram(unsigned long memory)
> {
> void *area = NULL;
>
> - if (mem_path)
> - area = alloc_mem_area(memory, mem_path);
> - if (!area)
> - area = qemu_vmalloc(memory);
> + if (mem_path && !(area = alloc_hpage_mem(memory, mem_path)))
> + fprintf(stderr, "warning: couldn't alloc memory as requested from %s\n",
> + mem_path);
> + return (!area && mem_fallback ? qemu_vmalloc(memory) : area);
> +}
>
> - return area;
> +/* match string *s in pattern *p. *p is of the form: [<str> | [<str> '|']*]
> + * and <str> is a string to be matched which may be empty.
> + * Return pointer to tail of matched *p for success, NULL otherwise.
> + */
> +char const *altmatch(const char *p, const char *s)
> +{
> + const char *bp, *q;
> +
> + for (bp = p, q = s; ; )
> + if ((!*p || *p == '|') && !*q)
> + return (q == s && bp != p ? NULL : p);
> + else if (!*p)
> + return (0);
> + else if (*p == *q)
> + ++p, ++q;
> + else { /* failed, try next field */
> + for (q = s; *p && *p != '|'; ++p)
> + ;
> + if (*p == '|')
> + bp = ++p;
> + }
> +}
> +
> +int bool_arg(const char *optarg, const char *flag_name)
> +{
> + if (!optarg)
> + ;
> + else if (altmatch("y|yes|1", optarg))
> + return (1);
> + else if (altmatch("n|no|0", optarg))
> + return (0);
> + fprintf(stderr, "usage: %s [0|1]\n", flag_name);
> + exit(1);
> }
> +
>
This is introducing too much infrastructure. Just follow the
conventions in the rest of the code. No need to make the options take a
boolean argument. The options themselves are boolean arguments.
Regards,
Anthony Liguori
> int main(int argc, char **argv)
> {
> @@ -8962,6 +9061,12 @@ int main(int argc, char **argv)
> case QEMU_OPTION_mempath:
> mem_path = optarg;
> break;
> + case QEMU_OPTION_mem_prealloc:
> + mem_prealloc = bool_arg(optarg, "mem_prealloc");
> + break;
> + case QEMU_OPTION_mem_fallback:
> + mem_fallback = bool_arg(optarg, "mem_fallback");
> + break;
> case QEMU_OPTION_name:
> qemu_name = optarg;
> break;
>
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-08 23:09 ` Anthony Liguori
@ 2008-07-09 0:23 ` john cooper
2008-07-09 1:08 ` Anthony Liguori
0 siblings, 1 reply; 21+ messages in thread
From: john cooper @ 2008-07-09 0:23 UTC (permalink / raw)
To: Anthony Liguori; +Cc: kvm, mtosatti, john.cooper, john cooper
Anthony Liguori wrote:
>> +/* assertion checking
>> + */
>> +#define ASSERT(c) if (!(c)) _assert(#c, __FILE__, __LINE__)
>> +
>> +static inline void _assert(char *text, char *file, int line) {
>> + fprintf(stderr, "ASSERTION FAILED: [%s] %s:%d\n", text, file, line);
>> + kill(getpid(), 12); /* dump core */
>> +}
>> +
>>
>
> Is it really necessary to add this as part of this patch?
No, certainly not strictly necessary.
>> +int mem_fallback = {0}; /* allow alloc from alternate page size */
>>
> This is a bizarre way to initialize an integer. I had no idea you could
> even do this for a scalar. Just initialize to 1 and 0.
That's historically been valid syntax since pre-K&R. But certainly
can be adapted to the context style if it raises concern.
>> -void *alloc_mem_area(unsigned long memory, const char *path)
>> +/* fault in associated page, fail syscall when free page is unavailable
>> + */ +static inline int fault_in(void *a)
>> +{
>> + return (gettimeofday(a, NULL) < 0 ? errno != EFAULT : 1);
>> +}
>>
>
> I don't like this very much. Why not just MAP_POPULATE?
I like it even less. MAP_POPULATE does not fault in physical
hpages to the map. Again this was a qemu-side interim bandaid.
>> +/* we failed to fault in hpage *a, fall back to conventional page
>> mapping
>> + */
>> +int remap_hpage(void *a, int sz)
>> +{
>> + ASSERT(!(sz & (EXEC_PAGESIZE - 1)));
>> + if (munmap(a, sz) < 0)
>> + perror("remap_hpage: munmap");
>> + else if (mmap(a, sz, PROT_READ|PROT_WRITE,
>> + MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0) == MAP_FAILED)
>> + perror("remap_hpage: mmap");
>> + else
>> + return (1);
>> + return (0);
>> +}
>>
>
> I think this would be simplified with MAP_POPULATE since you can fail in
> large chunks of memory instead of potentially having a highly fragmented
> set of VMAs.
Here for 4K pages we only need to setup the map. If we later
fault on a physically absent 4K page we'll wait if a page isn't
immediately available. Rather in the case of a hpage being
unavailable, we'll terminate. Note at this point we've effectively
locked onto whatever hpages we've been able to map as they can't
be reclaimed from us until we exit.
Also it appears tab formatting in this patch may need to be
scrubbed some as the mailers seem to be jostling the whitespace.
>> +int bool_arg(const char *optarg, const char *flag_name)
>> +{
>> + if (!optarg)
>> + ;
>> + else if (altmatch("y|yes|1", optarg))
>> + return (1);
>> + else if (altmatch("n|no|0", optarg))
>> + return (0);
>> + fprintf(stderr, "usage: %s [0|1]\n", flag_name);
>> + exit(1);
>> }
>> +
>
> This is introducing too much infrastructure. Just follow the
> conventions in the rest of the code. No need to make the options take a
> boolean argument. The options themselves are boolean arguments.
That's how it originally existed. However with the defaults
different for the two flags it seemed a bit clunky to have
one recall what the initial disposition of the option was and
to toggle its behavior if that wasn't the intention.
But admittedly I don't have a strong affinity either way. I
was only concerned with usability and clarity of the flags.
-john
--
john.cooper@third-harmonic.com
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-09 0:23 ` john cooper
@ 2008-07-09 1:08 ` Anthony Liguori
2008-07-09 17:03 ` Marcelo Tosatti
0 siblings, 1 reply; 21+ messages in thread
From: Anthony Liguori @ 2008-07-09 1:08 UTC (permalink / raw)
To: john cooper; +Cc: kvm, mtosatti, john.cooper
john cooper wrote:
> I like it even less. MAP_POPULATE does not fault in physical
> hpages to the map. Again this was a qemu-side interim bandaid.
Really? That would seem like a bug in hugetlbfs to me.
>>> +/* we failed to fault in hpage *a, fall back to conventional page
>>> mapping
>>> + */
>>> +int remap_hpage(void *a, int sz)
>>> +{
>>> + ASSERT(!(sz & (EXEC_PAGESIZE - 1)));
>>> + if (munmap(a, sz) < 0)
>>> + perror("remap_hpage: munmap");
>>> + else if (mmap(a, sz, PROT_READ|PROT_WRITE,
>>> + MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0) == MAP_FAILED)
>>> + perror("remap_hpage: mmap");
>>> + else
>>> + return (1);
>>> + return (0);
>>> +}
>>>
>>
>> I think this would be simplified with MAP_POPULATE since you can fail
>> in large chunks of memory instead of potentially having a highly
>> fragmented set of VMAs.
>
> Here for 4K pages we only need to setup the map. If we later
> fault on a physically absent 4K page we'll wait if a page isn't
> immediately available. Rather in the case of a hpage being
> unavailable, we'll terminate. Note at this point we've effectively
> locked onto whatever hpages we've been able to map as they can't
> be reclaimed from us until we exit.
Right now. Once we drop references to the large pages, there's nothing
preventing them from being reclaimed in the future. That's what I'm
concerned about.
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-09 1:08 ` Anthony Liguori
@ 2008-07-09 17:03 ` Marcelo Tosatti
2008-07-09 17:11 ` Anthony Liguori
0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2008-07-09 17:03 UTC (permalink / raw)
To: Anthony Liguori; +Cc: john cooper, kvm, john.cooper
On Tue, Jul 08, 2008 at 08:08:22PM -0500, Anthony Liguori wrote:
> john cooper wrote:
>> I like it even less. MAP_POPULATE does not fault in physical
>> hpages to the map. Again this was a qemu-side interim bandaid.
>
> Really? That would seem like a bug in hugetlbfs to me.
This is Linux's behaviour for all filesystems. There is no error
checking on MAP_POPULATE's attempt to prefault pages.
>>>> +/* we failed to fault in hpage *a, fall back to conventional page
>>>> mapping
>>>> + */
>>>> +int remap_hpage(void *a, int sz)
>>>> +{
>>>> + ASSERT(!(sz & (EXEC_PAGESIZE - 1)));
>>>> + if (munmap(a, sz) < 0)
>>>> + perror("remap_hpage: munmap");
>>>> + else if (mmap(a, sz, PROT_READ|PROT_WRITE,
>>>> + MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0) == MAP_FAILED)
>>>> + perror("remap_hpage: mmap");
>>>> + else
>>>> + return (1);
>>>> + return (0);
>>>> +}
>>>>
>>>
>>> I think this would be simplified with MAP_POPULATE since you can fail
>>> in large chunks of memory instead of potentially having a highly
>>> fragmented set of VMAs.
>>
>> Here for 4K pages we only need to setup the map. If we later
>> fault on a physically absent 4K page we'll wait if a page isn't
>> immediately available. Rather in the case of a hpage being
>> unavailable, we'll terminate. Note at this point we've effectively
>> locked onto whatever hpages we've been able to map as they can't
>> be reclaimed from us until we exit.
>
> Right now. Once we drop references to the large pages, there's nothing
> preventing them from being reclaimed in the future. That's what I'm
> concerned about.
This is just a temporary workaround until a better solution is in place,
as John mentioned.
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-09 17:03 ` Marcelo Tosatti
@ 2008-07-09 17:11 ` Anthony Liguori
2008-07-10 16:40 ` john cooper
0 siblings, 1 reply; 21+ messages in thread
From: Anthony Liguori @ 2008-07-09 17:11 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: john cooper, kvm, john.cooper
Marcelo Tosatti wrote:
> On Tue, Jul 08, 2008 at 08:08:22PM -0500, Anthony Liguori wrote:
>
>> john cooper wrote:
>>
>>> I like it even less. MAP_POPULATE does not fault in physical
>>> hpages to the map. Again this was a qemu-side interim bandaid.
>>>
>> Really? That would seem like a bug in hugetlbfs to me.
>>
>
> This is Linux's behaviour for all filesystems. There is no error
> checking on MAP_POPULATE's attempt to prefault pages.
>
I thought that mmap() will fail with hugetlbfs if enough large pages
can't be reserved?
Regards,
Anthony Liguori
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: patch: qemu + hugetlbfs..
2008-07-09 17:11 ` Anthony Liguori
@ 2008-07-10 16:40 ` john cooper
2008-07-10 17:58 ` Anthony Liguori
0 siblings, 1 reply; 21+ messages in thread
From: john cooper @ 2008-07-10 16:40 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Marcelo Tosatti, kvm, john.cooper
[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]
Anthony Liguori wrote:
> Marcelo Tosatti wrote:
>> This is Linux's behaviour for all filesystems. There is no error
>> checking on MAP_POPULATE's attempt to prefault pages.
>>
>
> I thought that mmap() will fail with hugetlbfs if enough large pages
> can't be reserved?
This appears to be the case for MAP_SHARED. For MAP_PRIVATE
the accounting is side-stepped, the result of which was the
best-effort prealloc/populate behavior we had see seen.
Marcelo and I batted this around some yesterday. And there
doesn't appear to be any concern using MAP_SHARED for the
phys_ram map as potential access to the backing file is
still regulated by filesystem permissions.
Doing so dispenses with the syscall/useraddr hack to fault
in pages since mmap() fails as expected in the case
sufficient pages are unavailable immediately to back
MAP_SHARED pages. We also decided to defer the map-fallback
logic. This is essentially behavior we'd like to see
provided by kvm, ideally on a dynamic (guest fault) basis
and was technically beyond the issue we were trying to
target.
Attached is the resulting patch which also addresses the
other concerns which arose.
-john
--
john.cooper@third-harmonic.com
[-- Attachment #2: prealloc.diff-080710 --]
[-- Type: text/plain, Size: 3678 bytes --]
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -45,6 +45,7 @@
#include <signal.h>
#include <time.h>
#include <errno.h>
+#include <asm/param.h>
#include <sys/time.h>
#include <zlib.h>
@@ -234,6 +235,7 @@ int autostart = 1;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+int mem_prealloc = 1; /* force preallocation of physical target memory */
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -7809,7 +7811,10 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+ "-mem-prealloc toggles preallocation of huge page physical memory at\n"
+ " startup when -mem-path is specified. Default is enabled.\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -7932,6 +7937,7 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_mem_prealloc
};
typedef struct QEMUOption {
@@ -8059,6 +8065,7 @@ const QEMUOption qemu_options[] = {
{ "startdate", HAS_ARG, QEMU_OPTION_startdate },
{ "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
{ NULL },
};
@@ -8276,11 +8283,13 @@ static int gethugepagesize(void)
return hugepagesize;
}
-void *alloc_mem_area(unsigned long memory, const char *path)
+/* attempt to allocate memory mmap'ed to mem_path
+ */
+void *alloc_hpage_mem(unsigned long memory, const char *path)
{
char *filename;
void *area;
- int fd;
+ int fd, flags;
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -8308,26 +8317,24 @@ void *alloc_mem_area(unsigned long memor
*/
ftruncate(fd, memory);
- area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
- if (area == MAP_FAILED) {
- perror("mmap");
- close(fd);
- return NULL;
- }
-
- return area;
+ /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+ flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+ area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
+ if (area != MAP_FAILED)
+ return (area);
+ perror("alloc_hpage_mem: can't mmap hugetlbfs pages");
+ close(fd);
+ return (NULL);
}
-void *qemu_alloc_physram(unsigned long memory)
+/* allocate guest memory as requested
+ */
+void *qemu_alloc_physram(unsigned long size)
{
- void *area = NULL;
-
- if (mem_path)
- area = alloc_mem_area(memory, mem_path);
- if (!area)
- area = qemu_vmalloc(memory);
-
- return area;
+ return (mem_path ? alloc_hpage_mem(size, mem_path) : qemu_vmalloc(size));
}
int main(int argc, char **argv)
@@ -8962,6 +8969,9 @@ int main(int argc, char **argv)
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = !mem_prealloc;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-10 16:40 ` john cooper
@ 2008-07-10 17:58 ` Anthony Liguori
2008-07-10 20:16 ` john cooper
0 siblings, 1 reply; 21+ messages in thread
From: Anthony Liguori @ 2008-07-10 17:58 UTC (permalink / raw)
To: john cooper; +Cc: Marcelo Tosatti, kvm, john.cooper
john cooper wrote:
> Anthony Liguori wrote:
>> Marcelo Tosatti wrote:
>>> This is Linux's behaviour for all filesystems. There is no error
>>> checking on MAP_POPULATE's attempt to prefault pages.
>>>
>>
>> I thought that mmap() will fail with hugetlbfs if enough large pages
>> can't be reserved?
>
> This appears to be the case for MAP_SHARED. For MAP_PRIVATE
> the accounting is side-stepped, the result of which was the
> best-effort prealloc/populate behavior we had see seen.
>
> Marcelo and I batted this around some yesterday. And there
> doesn't appear to be any concern using MAP_SHARED for the
> phys_ram map as potential access to the backing file is
> still regulated by filesystem permissions.
>
> Doing so dispenses with the syscall/useraddr hack to fault
> in pages since mmap() fails as expected in the case
> sufficient pages are unavailable immediately to back
> MAP_SHARED pages. We also decided to defer the map-fallback
> logic. This is essentially behavior we'd like to see
> provided by kvm, ideally on a dynamic (guest fault) basis
> and was technically beyond the issue we were trying to
> target.
>
> Attached is the resulting patch which also addresses the
> other concerns which arose.
>
> -john
>
> --- a/qemu/vl.c
> +++ b/qemu/vl.c
> @@ -45,6 +45,7 @@
> #include <signal.h>
> #include <time.h>
> #include <errno.h>
> +#include <asm/param.h>
>
I don't think this is necessary anymore. Depending on a Linux headers
breaks the QEMU build on other unices so it's a bad thing.
> #include <sys/time.h>
> #include <zlib.h>
>
> @@ -234,6 +235,7 @@ int autostart = 1;
> int time_drift_fix = 0;
> unsigned int kvm_shadow_memory = 0;
> const char *mem_path = NULL;
> +int mem_prealloc = 1; /* force preallocation of physical target memory */
> int hpagesize = 0;
> const char *cpu_vendor_string;
> #ifdef TARGET_ARM
> @@ -7809,7 +7811,10 @@ static void help(int exitcode)
> #endif
> "-tdf inject timer interrupts that got lost\n"
> "-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
> - "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
> + "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
> + " enables allocation of guest memory with huge pages\n"
> + "-mem-prealloc toggles preallocation of huge page physical memory at\n"
> + " startup when -mem-path is specified. Default is enabled.\n"
>
This really isn't huge page specific. It should work equally well with
tmpfs, no?
> "-option-rom rom load a file, rom, into the option ROM space\n"
> #ifdef TARGET_SPARC
> "-prom-env variable=value set OpenBIOS nvram variables\n"
> @@ -7932,6 +7937,7 @@ enum {
> QEMU_OPTION_tdf,
> QEMU_OPTION_kvm_shadow_memory,
> QEMU_OPTION_mempath,
> + QEMU_OPTION_mem_prealloc
> };
>
> typedef struct QEMUOption {
> @@ -8059,6 +8065,7 @@ const QEMUOption qemu_options[] = {
> { "startdate", HAS_ARG, QEMU_OPTION_startdate },
> { "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
> { "mem-path", HAS_ARG, QEMU_OPTION_mempath },
> + { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
> { NULL },
> };
>
> @@ -8276,11 +8283,13 @@ static int gethugepagesize(void)
> return hugepagesize;
> }
>
> -void *alloc_mem_area(unsigned long memory, const char *path)
> +/* attempt to allocate memory mmap'ed to mem_path
> + */
> +void *alloc_hpage_mem(unsigned long memory, const char *path)
> {
> char *filename;
> void *area;
> - int fd;
> + int fd, flags;
>
> if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
> return NULL;
> @@ -8308,26 +8317,24 @@ void *alloc_mem_area(unsigned long memor
> */
> ftruncate(fd, memory);
>
> - area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
> - if (area == MAP_FAILED) {
> - perror("mmap");
> - close(fd);
> - return NULL;
> - }
> -
> - return area;
> + /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
> + * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
> + * to sidestep this quirk.
> + */
> + flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
> + area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
> + if (area != MAP_FAILED)
> + return (area);
> + perror("alloc_hpage_mem: can't mmap hugetlbfs pages");
> + close(fd);
> + return (NULL);
> }
>
> -void *qemu_alloc_physram(unsigned long memory)
> +/* allocate guest memory as requested
> + */
> +void *qemu_alloc_physram(unsigned long size)
> {
> - void *area = NULL;
> -
> - if (mem_path)
> - area = alloc_mem_area(memory, mem_path);
> - if (!area)
> - area = qemu_vmalloc(memory);
> -
> - return area;
> + return (mem_path ? alloc_hpage_mem(size, mem_path) : qemu_vmalloc(size));
> }
>
I'd prefer if you didn't convert from a if/else to conditional. hpage
is a misnomer too as we aren't actually dependent on huge pages (this
code should work equally well for tmpfs).
Regards,
Anthony Liguori
> int main(int argc, char **argv)
> @@ -8962,6 +8969,9 @@ int main(int argc, char **argv)
> case QEMU_OPTION_mempath:
> mem_path = optarg;
> break;
> + case QEMU_OPTION_mem_prealloc:
> + mem_prealloc = !mem_prealloc;
> + break;
> case QEMU_OPTION_name:
> qemu_name = optarg;
> break;
>
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-10 17:58 ` Anthony Liguori
@ 2008-07-10 20:16 ` john cooper
2008-07-10 20:47 ` Anthony Liguori
0 siblings, 1 reply; 21+ messages in thread
From: john cooper @ 2008-07-10 20:16 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Marcelo Tosatti, kvm, john.cooper
[-- Attachment #1: Type: text/plain, Size: 1305 bytes --]
Anthony Liguori wrote:
>> +#include <asm/param.h>
>>
>
> I don't think this is necessary anymore. Depending on a Linux headers
> breaks the QEMU build on other unices so it's a bad thing.
It is no longer required, but see below.
> hpage is a misnomer too as we aren't actually dependent on huge pages (this
> code should work equally well for tmpfs).
As it currently exists alloc_hpage_mem() is tied to
the notion of huge page allocation as it will reference
gethugepagesize() irrespective of *mem_path. So even
in the case of tmpfs backed files, if the host kernel
has been configured with CONFIG_HUGETLBFS we will wind
up doing allocations of /dev/shm mapped files at
/proc/meminfo:Hugepagesize granularity. Otherwise if
HUGETLBFS is not configured gethugepagesize() returns
zero and alloc_hpage_mem() itself will not perform the
allocation.
Probably not what was intended but probably not too
much of a concern as "-mem-path /dev/shm" is likely
only used in debug of this flag and associated logic.
I don't see it currently being worth the trouble to
correct from a squeaky clean POV, and doing so may
drag in far more than the header file we've just
booted above to deal with this architecture/config
dependency.
An updated patch is attached.
-john
--
john.cooper@third-harmonic.com
[-- Attachment #2: prealloc.diff-08071001 --]
[-- Type: text/plain, Size: 3517 bytes --]
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -234,6 +234,7 @@ int autostart = 1;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+int mem_prealloc = 1; /* force preallocation of physical target memory */
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -7809,7 +7810,10 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+ "-mem-prealloc toggles preallocation of -mem-path backed physical memory\n"
+ " at startup. Default is enabled.\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -7932,6 +7936,7 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_mem_prealloc
};
typedef struct QEMUOption {
@@ -8059,6 +8064,7 @@ const QEMUOption qemu_options[] = {
{ "startdate", HAS_ARG, QEMU_OPTION_startdate },
{ "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
{ NULL },
};
@@ -8276,11 +8282,13 @@ static int gethugepagesize(void)
return hugepagesize;
}
-void *alloc_mem_area(unsigned long memory, const char *path)
+/* attempt to allocate memory mmap'ed to mem_path
+ */
+void *alloc_hpage_mem(unsigned long memory, const char *path)
{
char *filename;
void *area;
- int fd;
+ int fd, flags;
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -8308,26 +8316,27 @@ void *alloc_mem_area(unsigned long memor
*/
ftruncate(fd, memory);
- area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
- if (area == MAP_FAILED) {
- perror("mmap");
- close(fd);
- return NULL;
- }
-
- return area;
+ /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+ flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+ area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
+ if (area != MAP_FAILED)
+ return (area);
+ perror("alloc_hpage_mem: can't mmap hugetlbfs pages");
+ close(fd);
+ return (NULL);
}
-void *qemu_alloc_physram(unsigned long memory)
+/* allocate guest memory as requested
+ */
+void *qemu_alloc_physram(unsigned long size)
{
- void *area = NULL;
-
if (mem_path)
- area = alloc_mem_area(memory, mem_path);
- if (!area)
- area = qemu_vmalloc(memory);
-
- return area;
+ return (alloc_hpage_mem(size, mem_path));
+ else
+ return (qemu_vmalloc(size));
}
int main(int argc, char **argv)
@@ -8962,6 +8971,9 @@ int main(int argc, char **argv)
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = !mem_prealloc;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-10 20:16 ` john cooper
@ 2008-07-10 20:47 ` Anthony Liguori
2008-07-10 21:12 ` john cooper
0 siblings, 1 reply; 21+ messages in thread
From: Anthony Liguori @ 2008-07-10 20:47 UTC (permalink / raw)
To: john cooper; +Cc: Marcelo Tosatti, kvm, john.cooper
john cooper wrote:
> Anthony Liguori wrote:
>
>>> +#include <asm/param.h>
>>>
>>
>> I don't think this is necessary anymore. Depending on a Linux
>> headers breaks the QEMU build on other unices so it's a bad thing.
>
> It is no longer required, but see below.
>
>> hpage is a misnomer too as we aren't actually dependent on huge pages
>> (this code should work equally well for tmpfs).
>
> As it currently exists alloc_hpage_mem() is tied to
> the notion of huge page allocation as it will reference
> gethugepagesize() irrespective of *mem_path. So even
> in the case of tmpfs backed files, if the host kernel
> has been configured with CONFIG_HUGETLBFS we will wind
> up doing allocations of /dev/shm mapped files at
> /proc/meminfo:Hugepagesize granularity.
Which is fine. It just means we round -m values up to even numbers.
> Otherwise if
> HUGETLBFS is not configured gethugepagesize() returns
> zero and alloc_hpage_mem() itself will not perform the
> allocation.
That sounds like a bug.
>
> Probably not what was intended but probably not too
> much of a concern as "-mem-path /dev/shm" is likely
> only used in debug of this flag and associated logic.
> I don't see it currently being worth the trouble to
> correct from a squeaky clean POV, and doing so may
> drag in far more than the header file we've just
> booted above to deal with this architecture/config
> dependency.
Renaming a function to a name that's less accurate seems bad to me. I
don't mean to be pedantic, but it seems like a strange thing to do. I
prefer it the way it was before.
Regards,
Anthony Liguori
> An updated patch is attached.
>
> -john
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: patch: qemu + hugetlbfs..
2008-07-10 20:47 ` Anthony Liguori
@ 2008-07-10 21:12 ` john cooper
2008-07-10 21:38 ` Anthony Liguori
0 siblings, 1 reply; 21+ messages in thread
From: john cooper @ 2008-07-10 21:12 UTC (permalink / raw)
To: Anthony Liguori; +Cc: Marcelo Tosatti, kvm, john.cooper, john cooper
[-- Attachment #1: Type: text/plain, Size: 1495 bytes --]
Anthony Liguori wrote:
> john cooper wrote:
>> As it currently exists alloc_hpage_mem() is tied to
>> the notion of huge page allocation as it will reference
>> gethugepagesize() irrespective of *mem_path. So even
>> in the case of tmpfs backed files, if the host kernel
>> has been configured with CONFIG_HUGETLBFS we will wind
>> up doing allocations of /dev/shm mapped files at
>> /proc/meminfo:Hugepagesize granularity.
>
> Which is fine. It just means we round -m values up to even numbers.
Well, yes it will round the allocation. But from a
minimally sufficient 4KB boundary to that of 4MB/2MB
relative to a 32/64 bit x86 host which is excessive.
>> Probably not what was intended but probably not too
>> much of a concern as "-mem-path /dev/shm" is likely
>> only used in debug of this flag and associated logic.
>> I don't see it currently being worth the trouble to
>> correct from a squeaky clean POV, and doing so may
>> drag in far more than the header file we've just
>> booted above to deal with this architecture/config
>> dependency.
>
> Renaming a function to a name that's less accurate seems bad to me. I
> don't mean to be pedantic, but it seems like a strange thing to do. I
> prefer it the way it was before.
I don't see any harm reverting the name. But I do
believe it is largely cosmetic as given the above,
the current code does require some work to make it
independent of huge page assumptions. Update attached.
-john
--
john.cooper@third-harmonic.com
[-- Attachment #2: prealloc.diff-08071002 --]
[-- Type: text/plain, Size: 3452 bytes --]
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -234,6 +234,7 @@ int autostart = 1;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+int mem_prealloc = 1; /* force preallocation of physical target memory */
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -7809,7 +7810,10 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+ "-mem-prealloc toggles preallocation of -mem-path backed physical memory\n"
+ " at startup. Default is enabled.\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -7932,6 +7936,7 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_mem_prealloc
};
typedef struct QEMUOption {
@@ -8059,6 +8064,7 @@ const QEMUOption qemu_options[] = {
{ "startdate", HAS_ARG, QEMU_OPTION_startdate },
{ "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
{ NULL },
};
@@ -8276,11 +8282,13 @@ static int gethugepagesize(void)
return hugepagesize;
}
+/* attempt to allocate memory mmap'ed to mem_path
+ */
void *alloc_mem_area(unsigned long memory, const char *path)
{
char *filename;
void *area;
- int fd;
+ int fd, flags;
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -8308,26 +8316,27 @@ void *alloc_mem_area(unsigned long memor
*/
ftruncate(fd, memory);
- area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
- if (area == MAP_FAILED) {
- perror("mmap");
- close(fd);
- return NULL;
- }
-
- return area;
+ /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+ flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+ area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
+ if (area != MAP_FAILED)
+ return (area);
+ perror("alloc_mem_area: can't mmap hugetlbfs pages");
+ close(fd);
+ return (NULL);
}
-void *qemu_alloc_physram(unsigned long memory)
+/* allocate guest memory as requested
+ */
+void *qemu_alloc_physram(unsigned long size)
{
- void *area = NULL;
-
if (mem_path)
- area = alloc_mem_area(memory, mem_path);
- if (!area)
- area = qemu_vmalloc(memory);
-
- return area;
+ return (alloc_mem_area(size, mem_path));
+ else
+ return (qemu_vmalloc(size));
}
int main(int argc, char **argv)
@@ -8962,6 +8971,9 @@ int main(int argc, char **argv)
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = !mem_prealloc;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: patch: qemu + hugetlbfs..
2008-07-10 21:12 ` john cooper
@ 2008-07-10 21:38 ` Anthony Liguori
2008-08-25 23:05 ` Resend: " john cooper
0 siblings, 1 reply; 21+ messages in thread
From: Anthony Liguori @ 2008-07-10 21:38 UTC (permalink / raw)
To: john cooper; +Cc: Marcelo Tosatti, kvm, john.cooper
john cooper wrote:
> Anthony Liguori wrote:
>> john cooper wrote:
>>> As it currently exists alloc_hpage_mem() is tied to
>>> the notion of huge page allocation as it will reference
>>> gethugepagesize() irrespective of *mem_path. So even
>>> in the case of tmpfs backed files, if the host kernel
>>> has been configured with CONFIG_HUGETLBFS we will wind
>>> up doing allocations of /dev/shm mapped files at
>>> /proc/meminfo:Hugepagesize granularity.
>>
>> Which is fine. It just means we round -m values up to even numbers.
>
> Well, yes it will round the allocation. But from a
> minimally sufficient 4KB boundary to that of 4MB/2MB
> relative to a 32/64 bit x86 host which is excessive.
>
>>> Probably not what was intended but probably not too
>>> much of a concern as "-mem-path /dev/shm" is likely
>>> only used in debug of this flag and associated logic.
>>> I don't see it currently being worth the trouble to
>>> correct from a squeaky clean POV, and doing so may
>>> drag in far more than the header file we've just
>>> booted above to deal with this architecture/config
>>> dependency.
>>
>> Renaming a function to a name that's less accurate seems bad to me.
>> I don't mean to be pedantic, but it seems like a strange thing to
>> do. I prefer it the way it was before.
>
> I don't see any harm reverting the name. But I do
> believe it is largely cosmetic as given the above,
> the current code does require some work to make it
> independent of huge page assumptions. Update attached.
>
> -john
Looks good to me.
Acked-by: Anthony Liguori <aliguori@us.ibm.com>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Resend: patch: qemu + hugetlbfs..
2008-07-10 21:38 ` Anthony Liguori
@ 2008-08-25 23:05 ` john cooper
2008-08-26 8:11 ` Avi Kivity
2009-01-16 2:19 ` john cooper
0 siblings, 2 replies; 21+ messages in thread
From: john cooper @ 2008-08-25 23:05 UTC (permalink / raw)
To: kvm; +Cc: Anthony Liguori, Marcelo Tosatti, john.cooper, avi
[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]
This patch from over a month ago doesn't seem to have
made it into kvm-73 and may have been lost in the
shuffle. Attached is essentially the same patch but
as applied to kvm-73, and validated relative to that
version.
In a nutshell the intention here is to allow
preallocation of guest huge page backed memory at
qemu initialization time to avoid a quirk in the
kernel's huge page accounting allowing overcommit
of huge pages. Failure of the kernel to resolve a
guest fault to overcommitted huge page memory during
runtime results in sigkill termination of the guest.
This patch provides the option of avoiding such
behavior at the cost of up-front preallocation of
physical huge pages backing the guest.
-john
Anthony Liguori wrote:
> john cooper wrote:
>> Anthony Liguori wrote:
>>> john cooper wrote:
>>>> As it currently exists alloc_hpage_mem() is tied to
>>>> the notion of huge page allocation as it will reference
>>>> gethugepagesize() irrespective of *mem_path. So even
>>>> in the case of tmpfs backed files, if the host kernel
>>>> has been configured with CONFIG_HUGETLBFS we will wind
>>>> up doing allocations of /dev/shm mapped files at
>>>> /proc/meminfo:Hugepagesize granularity.
>>>
>>> Which is fine. It just means we round -m values up to even numbers.
>>
>> Well, yes it will round the allocation. But from a
>> minimally sufficient 4KB boundary to that of 4MB/2MB
>> relative to a 32/64 bit x86 host which is excessive.
>>
>>>> Probably not what was intended but probably not too
>>>> much of a concern as "-mem-path /dev/shm" is likely
>>>> only used in debug of this flag and associated logic.
>>>> I don't see it currently being worth the trouble to
>>>> correct from a squeaky clean POV, and doing so may
>>>> drag in far more than the header file we've just
>>>> booted above to deal with this architecture/config
>>>> dependency.
>>>
>>> Renaming a function to a name that's less accurate seems bad to me.
>>> I don't mean to be pedantic, but it seems like a strange thing to
>>> do. I prefer it the way it was before.
>>
>> I don't see any harm reverting the name. But I do
>> believe it is largely cosmetic as given the above,
>> the current code does require some work to make it
>> independent of huge page assumptions. Update attached.
>>
>> -john
>
> Looks good to me.
>
> Acked-by: Anthony Liguori <aliguori@us.ibm.com>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
john.cooper@third-harmonic.com
[-- Attachment #2: prealloc.diff-080825 --]
[-- Type: text/plain, Size: 3631 bytes --]
vl.c | 48 ++++++++++++++++++++++++++++++------------------
1 file changed, 30 insertions(+), 18 deletions(-)
=================================================================
--- ./qemu/vl.c.PAORG
+++ ./qemu/vl.c
@@ -239,6 +239,7 @@ int autostart = 1;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+int mem_prealloc = 1; /* force preallocation of physical target memory */
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -8423,7 +8424,10 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+ "-mem-prealloc toggles preallocation of -mem-path backed physical memory\n"
+ " at startup. Default is enabled.\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -8546,6 +8550,7 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_mem_prealloc
};
typedef struct QEMUOption {
@@ -8671,6 +8676,7 @@ const QEMUOption qemu_options[] = {
{ "tb-size", HAS_ARG, QEMU_OPTION_tb_size },
{ "icount", HAS_ARG, QEMU_OPTION_icount },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
{ NULL },
};
@@ -8890,11 +8896,13 @@ static int gethugepagesize(void)
return hugepagesize;
}
+/* attempt to allocate memory mmap'ed to mem_path
+ */
void *alloc_mem_area(unsigned long memory, const char *path)
{
char *filename;
void *area;
- int fd;
+ int fd, flags;
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -8922,26 +8930,27 @@ void *alloc_mem_area(unsigned long memor
*/
ftruncate(fd, memory);
- area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
- if (area == MAP_FAILED) {
- perror("mmap");
- close(fd);
- return NULL;
- }
-
- return area;
+ /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+ flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+ area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
+ if (area != MAP_FAILED)
+ return (area);
+ perror("alloc_mem_area: can't mmap hugetlbfs pages");
+ close(fd);
+ return (NULL);
}
-void *qemu_alloc_physram(unsigned long memory)
+/* allocate guest memory as requested
+ */
+void *qemu_alloc_physram(unsigned long size)
{
- void *area = NULL;
-
if (mem_path)
- area = alloc_mem_area(memory, mem_path);
- if (!area)
- area = qemu_vmalloc(memory);
-
- return area;
+ return (alloc_mem_area(size, mem_path));
+ else
+ return (qemu_vmalloc(size));
}
int main(int argc, char **argv)
@@ -9546,6 +9555,9 @@ int main(int argc, char **argv)
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = !mem_prealloc;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: Resend: patch: qemu + hugetlbfs..
2008-08-25 23:05 ` Resend: " john cooper
@ 2008-08-26 8:11 ` Avi Kivity
2008-08-27 4:13 ` john cooper
2009-01-16 2:19 ` john cooper
1 sibling, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2008-08-26 8:11 UTC (permalink / raw)
To: john cooper; +Cc: kvm, Anthony Liguori, Marcelo Tosatti, john.cooper
john cooper wrote:
> This patch from over a month ago doesn't seem to have
> made it into kvm-73 and may have been lost in the
> shuffle. Attached is essentially the same patch but
> as applied to kvm-73, and validated relative to that
> version.
>
I must have missed it. Thanks for persisting.
> In a nutshell the intention here is to allow
> preallocation of guest huge page backed memory at
> qemu initialization time to avoid a quirk in the
> kernel's huge page accounting allowing overcommit
> of huge pages. Failure of the kernel to resolve a
> guest fault to overcommitted huge page memory during
> runtime results in sigkill termination of the guest.
> This patch provides the option of avoiding such
> behavior at the cost of up-front preallocation of
> physical huge pages backing the guest.
>
What is the motivation for providing an option to disable this? If we
can detect mem-path is backed by huge pages somehow, I think we can
prefault the memory unconditionally.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Resend: patch: qemu + hugetlbfs..
2008-08-26 8:11 ` Avi Kivity
@ 2008-08-27 4:13 ` john cooper
0 siblings, 0 replies; 21+ messages in thread
From: john cooper @ 2008-08-27 4:13 UTC (permalink / raw)
To: Avi Kivity
Cc: john cooper, kvm, Anthony Liguori, Marcelo Tosatti, john.cooper
Avi Kivity wrote:
> john cooper wrote:
>> This patch from over a month ago doesn't seem to have
>> made it into kvm-73 and may have been lost in the
>> shuffle. Attached is essentially the same patch but
>> as applied to kvm-73, and validated relative to that
>> version.
>>
>
> What is the motivation for providing an option to disable this? If we
> can detect mem-path is backed by huge pages somehow, I think we can
> prefault the memory unconditionally.
>
Pre-allocation of the entire huge page backed guest
memory avoids the nondeterministic termination but
admittedly is overly pessimistic. As this patch does
so by default when -mem-path is specified, allowing
for disable of pre-allocation simply reverts this
change to prior behavior for use cases more tolerant
to it as well as for debug purposes.
The real fix arguably hinges on huge pages having
more general virtual memory behavior. But that
appears to be a much longer term prospect.
-john
--
john.cooper@redhat.com
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Resend: patch: qemu + hugetlbfs..
2008-08-25 23:05 ` Resend: " john cooper
2008-08-26 8:11 ` Avi Kivity
@ 2009-01-16 2:19 ` john cooper
2009-01-20 10:29 ` Avi Kivity
1 sibling, 1 reply; 21+ messages in thread
From: john cooper @ 2009-01-16 2:19 UTC (permalink / raw)
To: kvm, aliguori; +Cc: Marcelo Tosatti, john.cooper, avi
[-- Attachment #1: Type: text/plain, Size: 2975 bytes --]
This trivial patch never did manage to find its way
in. Marcelo called it to my attention earlier in
the week. I've tweaked it to apply to kvm-83 and
the resulting patch is attached. I've left the
prior e-mail discussion below for reference.
-john
john cooper wrote:
> This patch from over a month ago doesn't seem to have
> made it into kvm-73 and may have been lost in the
> shuffle. Attached is essentially the same patch but
> as applied to kvm-73, and validated relative to that
> version.
>
> In a nutshell the intention here is to allow
> preallocation of guest huge page backed memory at
> qemu initialization time to avoid a quirk in the
> kernel's huge page accounting allowing overcommit
> of huge pages. Failure of the kernel to resolve a
> guest fault to overcommitted huge page memory during
> runtime results in sigkill termination of the guest.
> This patch provides the option of avoiding such
> behavior at the cost of up-front preallocation of
> physical huge pages backing the guest.
>
> -john
>
>
> Anthony Liguori wrote:
>> john cooper wrote:
>>> Anthony Liguori wrote:
>>>> john cooper wrote:
>>>>> As it currently exists alloc_hpage_mem() is tied to
>>>>> the notion of huge page allocation as it will reference
>>>>> gethugepagesize() irrespective of *mem_path. So even
>>>>> in the case of tmpfs backed files, if the host kernel
>>>>> has been configured with CONFIG_HUGETLBFS we will wind
>>>>> up doing allocations of /dev/shm mapped files at
>>>>> /proc/meminfo:Hugepagesize granularity.
>>>>
>>>> Which is fine. It just means we round -m values up to even numbers.
>>>
>>> Well, yes it will round the allocation. But from a
>>> minimally sufficient 4KB boundary to that of 4MB/2MB
>>> relative to a 32/64 bit x86 host which is excessive.
>>>
>>>>> Probably not what was intended but probably not too
>>>>> much of a concern as "-mem-path /dev/shm" is likely
>>>>> only used in debug of this flag and associated logic.
>>>>> I don't see it currently being worth the trouble to
>>>>> correct from a squeaky clean POV, and doing so may
>>>>> drag in far more than the header file we've just
>>>>> booted above to deal with this architecture/config
>>>>> dependency.
>>>>
>>>> Renaming a function to a name that's less accurate seems bad to me.
>>>> I don't mean to be pedantic, but it seems like a strange thing to
>>>> do. I prefer it the way it was before.
>>>
>>> I don't see any harm reverting the name. But I do
>>> believe it is largely cosmetic as given the above,
>>> the current code does require some work to make it
>>> independent of huge page assumptions. Update attached.
>>>
>>> -john
>>
>> Looks good to me.
>>
>> Acked-by: Anthony Liguori <aliguori@us.ibm.com>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe kvm" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
--
john.cooper@third-harmonic.com
[-- Attachment #2: prealloc.diff-090115 --]
[-- Type: text/plain, Size: 3552 bytes --]
kernel/x86/Kbuild | 4 ++--
qemu/vl.c | 27 ++++++++++++++++++++-------
2 files changed, 22 insertions(+), 9 deletions(-)
=================================================================
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -237,6 +237,7 @@ int semihosting_enabled = 0;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+int mem_prealloc = 1; /* force preallocation of physical target memory */
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -4116,7 +4117,10 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+ "-mem-prealloc toggles preallocation of -mem-path backed physical memory\n"
+ " at startup. Default is enabled.\n"
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -4246,6 +4250,7 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+ QEMU_OPTION_mem_prealloc
};
typedef struct QEMUOption {
@@ -4381,6 +4386,7 @@ static const QEMUOption qemu_options[] =
{ "icount", HAS_ARG, QEMU_OPTION_icount },
{ "incoming", HAS_ARG, QEMU_OPTION_incoming },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+ { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
{ NULL },
};
@@ -4662,7 +4668,7 @@ void *alloc_mem_area(size_t memory, unsi
{
char *filename;
void *area;
- int fd;
+ int fd, flags;
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -4690,13 +4696,17 @@ void *alloc_mem_area(size_t memory, unsi
*/
ftruncate(fd, memory);
- area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+ /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+ flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+ area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
if (area == MAP_FAILED) {
- perror("mmap");
- close(fd);
- return NULL;
+ perror("alloc_mem_area: can't mmap hugetlbfs pages");
+ close(fd);
+ return (NULL);
}
-
*len = memory;
return area;
}
@@ -5377,6 +5387,9 @@ int main(int argc, char **argv, char **e
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = !mem_prealloc;
+ break;
case QEMU_OPTION_name:
qemu_name = optarg;
break;
=================================================================
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -9,8 +9,8 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_e
ifeq ($(EXT_CONFIG_KVM_TRACE),y)
kvm-objs += kvm_trace.o
endif
-ifeq ($(CONFIG_DMAR),y)
-kvm-objs += vtd.o
+ifeq ($(CONFIG_IOMMU_API),y)
+kvm-objs += iommu.o
endif
kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
kvm-amd-objs := svm.o ../external-module-compat.o
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: Resend: patch: qemu + hugetlbfs..
2009-01-16 2:19 ` john cooper
@ 2009-01-20 10:29 ` Avi Kivity
2009-01-23 21:21 ` john cooper
0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2009-01-20 10:29 UTC (permalink / raw)
To: john cooper; +Cc: kvm, aliguori, Marcelo Tosatti, john.cooper
john cooper wrote:
> This trivial patch never did manage to find its way
> in. Marcelo called it to my attention earlier in
> the week. I've tweaked it to apply to kvm-83 and
> the resulting patch is attached. I've left the
> prior e-mail discussion below for reference.
>
Sorry for missing this (copying me helps). Please resubmit with a
signoff. Also, please protect with #ifdef MAP_POPULATE to ease merging
into upstream eventually.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Resend: patch: qemu + hugetlbfs..
2009-01-20 10:29 ` Avi Kivity
@ 2009-01-23 21:21 ` john cooper
2009-02-05 15:42 ` Avi Kivity
0 siblings, 1 reply; 21+ messages in thread
From: john cooper @ 2009-01-23 21:21 UTC (permalink / raw)
To: Avi Kivity; +Cc: john cooper, kvm, aliguori, Marcelo Tosatti, john.cooper
[-- Attachment #1: Type: text/plain, Size: 589 bytes --]
Avi Kivity wrote:
> john cooper wrote:
>> This trivial patch never did manage to find its way
>> in. Marcelo called it to my attention earlier in
>> the week. I've tweaked it to apply to kvm-83 and
>> the resulting patch is attached. I've left the
>> prior e-mail discussion below for reference.
>>
>
> Sorry for missing this (copying me helps). Please resubmit with a
> signoff.Also, please protect with #ifdef MAP_POPULATE to ease merging
> into upstream eventually.
Updated patch attached.
-john
Signed-off-by: john cooper <john.cooper@redhat.com>
--
john.cooper@redhat.com
[-- Attachment #2: prealloc.diff-090123 --]
[-- Type: text/plain, Size: 3734 bytes --]
kernel/x86/Kbuild | 4 ++--
qemu/vl.c | 38 ++++++++++++++++++++++++++++++++++----
2 files changed, 36 insertions(+), 6 deletions(-)
=================================================================
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -237,6 +237,9 @@ int semihosting_enabled = 0;
int time_drift_fix = 0;
unsigned int kvm_shadow_memory = 0;
const char *mem_path = NULL;
+#ifdef MAP_POPULATE
+int mem_prealloc = 1; /* force preallocation of physical target memory */
+#endif
int hpagesize = 0;
const char *cpu_vendor_string;
#ifdef TARGET_ARM
@@ -4116,7 +4119,12 @@ static void help(int exitcode)
#endif
"-tdf inject timer interrupts that got lost\n"
"-kvm-shadow-memory megs set the amount of shadow pages to be allocated\n"
- "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also enables allocation of guest memory with huge pages\n"
+ "-mem-path set the path to hugetlbfs/tmpfs mounted directory, also\n"
+ " enables allocation of guest memory with huge pages\n"
+#ifdef MAP_POPULATE
+ "-mem-prealloc toggles preallocation of -mem-path backed physical memory\n"
+ " at startup. Default is enabled.\n"
+#endif
"-option-rom rom load a file, rom, into the option ROM space\n"
#ifdef TARGET_SPARC
"-prom-env variable=value set OpenBIOS nvram variables\n"
@@ -4246,6 +4254,9 @@ enum {
QEMU_OPTION_tdf,
QEMU_OPTION_kvm_shadow_memory,
QEMU_OPTION_mempath,
+#ifdef MAP_POPULATE
+ QEMU_OPTION_mem_prealloc,
+#endif
};
typedef struct QEMUOption {
@@ -4381,6 +4392,9 @@ static const QEMUOption qemu_options[] =
{ "icount", HAS_ARG, QEMU_OPTION_icount },
{ "incoming", HAS_ARG, QEMU_OPTION_incoming },
{ "mem-path", HAS_ARG, QEMU_OPTION_mempath },
+#ifdef MAP_POPULATE
+ { "mem-prealloc", 0, QEMU_OPTION_mem_prealloc },
+#endif
{ NULL },
};
@@ -4663,6 +4677,9 @@ void *alloc_mem_area(size_t memory, unsi
char *filename;
void *area;
int fd;
+#ifdef MAP_POPULATE
+ int flags;
+#endif
if (asprintf(&filename, "%s/kvm.XXXXXX", path) == -1)
return NULL;
@@ -4690,13 +4707,21 @@ void *alloc_mem_area(size_t memory, unsi
*/
ftruncate(fd, memory);
+#ifdef MAP_POPULATE
+ /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
+ * MAP_PRIVATE is requested. For mem_prealloc we mmap as MAP_SHARED
+ * to sidestep this quirk.
+ */
+ flags = mem_prealloc ? MAP_POPULATE|MAP_SHARED : MAP_PRIVATE;
+ area = mmap(0, memory, PROT_READ|PROT_WRITE, flags, fd, 0);
+#else
area = mmap(0, memory, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
+#endif
if (area == MAP_FAILED) {
- perror("mmap");
+ perror("alloc_mem_area: can't mmap hugetlbfs pages");
close(fd);
- return NULL;
+ return (NULL);
}
-
*len = memory;
return area;
}
@@ -5377,6 +5402,11 @@ int main(int argc, char **argv, char **e
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+#ifdef MAP_POPULATE
+ case QEMU_OPTION_mem_prealloc:
+ mem_prealloc = !mem_prealloc;
+ break;
+#endif
case QEMU_OPTION_name:
qemu_name = optarg;
break;
=================================================================
--- a/kernel/x86/Kbuild
+++ b/kernel/x86/Kbuild
@@ -9,8 +9,8 @@ kvm-objs := kvm_main.o x86.o mmu.o x86_e
ifeq ($(EXT_CONFIG_KVM_TRACE),y)
kvm-objs += kvm_trace.o
endif
-ifeq ($(CONFIG_DMAR),y)
-kvm-objs += vtd.o
+ifeq ($(CONFIG_IOMMU_API),y)
+kvm-objs += iommu.o
endif
kvm-intel-objs := vmx.o vmx-debug.o ../external-module-compat.o
kvm-amd-objs := svm.o ../external-module-compat.o
^ permalink raw reply [flat|nested] 21+ messages in thread* Re: Resend: patch: qemu + hugetlbfs..
2009-01-23 21:21 ` john cooper
@ 2009-02-05 15:42 ` Avi Kivity
2009-02-05 16:12 ` Marcelo Tosatti
0 siblings, 1 reply; 21+ messages in thread
From: Avi Kivity @ 2009-02-05 15:42 UTC (permalink / raw)
To: john cooper; +Cc: john cooper, kvm, aliguori, Marcelo Tosatti
john cooper wrote:
> Avi Kivity wrote:
>> john cooper wrote:
>>> This trivial patch never did manage to find its way
>>> in. Marcelo called it to my attention earlier in
>>> the week. I've tweaked it to apply to kvm-83 and
>>> the resulting patch is attached. I've left the
>>> prior e-mail discussion below for reference.
>>>
>>
>> Sorry for missing this (copying me helps). Please resubmit with a
>> signoff.Also, please protect with #ifdef MAP_POPULATE to ease merging
>> into upstream eventually.
> Updated patch attached.
>
Sorry, still rejects horribly. What did you generate this against?
The kernel/ part seems unrelated.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Resend: patch: qemu + hugetlbfs..
2009-02-05 15:42 ` Avi Kivity
@ 2009-02-05 16:12 ` Marcelo Tosatti
2009-02-05 16:15 ` Avi Kivity
0 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2009-02-05 16:12 UTC (permalink / raw)
To: Avi Kivity; +Cc: john cooper, john cooper, kvm, aliguori
On Thu, Feb 05, 2009 at 05:42:34PM +0200, Avi Kivity wrote:
> john cooper wrote:
>> Avi Kivity wrote:
>>> john cooper wrote:
>>>> This trivial patch never did manage to find its way
>>>> in. Marcelo called it to my attention earlier in
>>>> the week. I've tweaked it to apply to kvm-83 and
>>>> the resulting patch is attached. I've left the
>>>> prior e-mail discussion below for reference.
>>>>
>>>
>>> Sorry for missing this (copying me helps). Please resubmit with a
>>> signoff.Also, please protect with #ifdef MAP_POPULATE to ease merging
>>> into upstream eventually.
>> Updated patch attached.
>>
>
> Sorry, still rejects horribly. What did you generate this against?
>
> The kernel/ part seems unrelated.
This was merged through the kvm-devel branch (unless you dropped it).
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Resend: patch: qemu + hugetlbfs..
2009-02-05 16:12 ` Marcelo Tosatti
@ 2009-02-05 16:15 ` Avi Kivity
0 siblings, 0 replies; 21+ messages in thread
From: Avi Kivity @ 2009-02-05 16:15 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: john cooper, john cooper, kvm, aliguori
Marcelo Tosatti wrote:
>>
>> Sorry, still rejects horribly. What did you generate this against?
>>
>> The kernel/ part seems unrelated.
>>
>
> This was merged through the kvm-devel branch (unless you dropped it).
>
Right, so that's why it rejected.
I even saw it in the log when I tried to find why it rejected... and
ignored it.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2009-02-05 16:17 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-08 22:02 patch: qemu + hugetlbfs john cooper
2008-07-08 23:09 ` Anthony Liguori
2008-07-09 0:23 ` john cooper
2008-07-09 1:08 ` Anthony Liguori
2008-07-09 17:03 ` Marcelo Tosatti
2008-07-09 17:11 ` Anthony Liguori
2008-07-10 16:40 ` john cooper
2008-07-10 17:58 ` Anthony Liguori
2008-07-10 20:16 ` john cooper
2008-07-10 20:47 ` Anthony Liguori
2008-07-10 21:12 ` john cooper
2008-07-10 21:38 ` Anthony Liguori
2008-08-25 23:05 ` Resend: " john cooper
2008-08-26 8:11 ` Avi Kivity
2008-08-27 4:13 ` john cooper
2009-01-16 2:19 ` john cooper
2009-01-20 10:29 ` Avi Kivity
2009-01-23 21:21 ` john cooper
2009-02-05 15:42 ` Avi Kivity
2009-02-05 16:12 ` Marcelo Tosatti
2009-02-05 16:15 ` Avi Kivity
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox