* [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
@ 2012-07-02 18:06 Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 1/6] file_ram_alloc(): coding style fixes Eduardo Habkost
` (6 more replies)
0 siblings, 7 replies; 12+ messages in thread
From: Eduardo Habkost @ 2012-07-02 18:06 UTC (permalink / raw)
To: qemu-devel
Cc: Bill Gray, Andrea Arcangeli, Andre Przywara, kvm, Bharata B Rao
Resending series, after fixing some coding style issues. Does anybody has any
feedback about this proposal?
Changes v1 -> v2:
- Coding style fixes
Original cover letter:
I was investigating if there are any mechanisms that allow manually pinning of
guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, and
noticed that -mem-path could be used for that, except that it currently removes
any files it creates (using mkstemp()) immediately, not allowing numactl to be
used on the backing files, as a result. This patches add a -keep-mem-path-files
option to make QEMU create the files inside -mem-path with more predictable
names, and not remove them after creation.
Some previous discussions about the subject, for reference:
- Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
- Message-ID: <4C7D7C2A.7000205@codemonkey.ws>
http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
A more recent thread can be found at:
- Message-ID: <20111029184502.GH11038@in.ibm.com>
http://article.gmane.org/gmane.comp.emulators.qemu/123001
Note that this is just a mechanism to facilitate manual static binding using
numactl on hugetlbfs later, for optimization. This may be especially useful for
single large multi-node guests use-cases (and, of course, has to be used with
care).
I don't know if it is a good idea to use the memory range names as a publicly-
visible interface. Another option may be to use a single file instead, and mmap
different regions inside the same file for each memory region. I an open to
comments and suggestions.
Example (untested) usage to bind manually each half of the RAM of a guest to a
different NUMA node:
$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
-mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
$ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
$ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
Eduardo Habkost (6):
file_ram_alloc(): coding style fixes
file_ram_alloc(): use g_strdup_printf() instead of asprintf()
vl.c: change mem_prealloc to bool (v2)
file_ram_alloc: change length argument to size_t (v2)
file_ram_alloc(): extract temporary-file creation code to separate
function (v2)
add -keep-mem-path-files option (v2)
cpu-all.h | 3 ++-
exec.c | 68 +++++++++++++++++++++++++++++++++++++++++++------------
qemu-options.hx | 12 ++++++++++
vl.c | 9 ++++++--
4 files changed, 75 insertions(+), 17 deletions(-)
--
1.7.10.4
^ permalink raw reply [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 1/6] file_ram_alloc(): coding style fixes
2012-07-02 18:06 [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Eduardo Habkost
@ 2012-07-02 18:06 ` Eduardo Habkost
2012-07-03 19:16 ` Blue Swirl
2012-07-02 18:06 ` [Qemu-devel] [PATCH 2/6] file_ram_alloc(): use g_strdup_printf() instead of asprintf() Eduardo Habkost
` (5 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Eduardo Habkost @ 2012-07-02 18:06 UTC (permalink / raw)
To: qemu-devel
Cc: Andrea Arcangeli, Andre Przywara, kvm, Blue Swirl, Bharata B Rao,
Bill Gray
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
exec.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/exec.c b/exec.c
index 8244d54..c8bfd27 100644
--- a/exec.c
+++ b/exec.c
@@ -2392,7 +2392,7 @@ static void *file_ram_alloc(RAMBlock *block,
unlink(filename);
free(filename);
- memory = (memory+hpagesize-1) & ~(hpagesize-1);
+ memory = (memory + hpagesize - 1) & ~(hpagesize - 1);
/*
* ftruncate is not supported by hugetlbfs in older
@@ -2400,8 +2400,9 @@ static void *file_ram_alloc(RAMBlock *block,
* If anything goes wrong with it under other filesystems,
* mmap will fail.
*/
- if (ftruncate(fd, memory))
+ if (ftruncate(fd, memory)) {
perror("ftruncate");
+ }
#ifdef MAP_POPULATE
/* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
--
1.7.10.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 2/6] file_ram_alloc(): use g_strdup_printf() instead of asprintf()
2012-07-02 18:06 [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 1/6] file_ram_alloc(): coding style fixes Eduardo Habkost
@ 2012-07-02 18:06 ` Eduardo Habkost
2012-07-03 19:16 ` Blue Swirl
2012-07-02 18:06 ` [Qemu-devel] [PATCH 3/6] vl.c: change mem_prealloc to bool (v2) Eduardo Habkost
` (4 subsequent siblings)
6 siblings, 1 reply; 12+ messages in thread
From: Eduardo Habkost @ 2012-07-02 18:06 UTC (permalink / raw)
To: qemu-devel
Cc: Andrea Arcangeli, Andre Przywara, kvm, Blue Swirl, Bharata B Rao,
Bill Gray
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
exec.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/exec.c b/exec.c
index c8bfd27..d856325 100644
--- a/exec.c
+++ b/exec.c
@@ -24,6 +24,9 @@
#include <sys/mman.h>
#endif
+#include <glib.h>
+#include <glib/gprintf.h>
+
#include "qemu-common.h"
#include "cpu.h"
#include "tcg.h"
@@ -2357,7 +2360,7 @@ static void *file_ram_alloc(RAMBlock *block,
ram_addr_t memory,
const char *path)
{
- char *filename;
+ gchar *filename;
void *area;
int fd;
#ifdef MAP_POPULATE
@@ -2379,18 +2382,15 @@ static void *file_ram_alloc(RAMBlock *block,
return NULL;
}
- if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
- return NULL;
- }
-
+ filename = g_strdup_printf("%s/qemu_back_mem.XXXXXX", path);
fd = mkstemp(filename);
if (fd < 0) {
perror("unable to create backing store for hugepages");
- free(filename);
+ g_free(filename);
return NULL;
}
unlink(filename);
- free(filename);
+ g_free(filename);
memory = (memory + hpagesize - 1) & ~(hpagesize - 1);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 3/6] vl.c: change mem_prealloc to bool (v2)
2012-07-02 18:06 [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 1/6] file_ram_alloc(): coding style fixes Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 2/6] file_ram_alloc(): use g_strdup_printf() instead of asprintf() Eduardo Habkost
@ 2012-07-02 18:06 ` Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 4/6] file_ram_alloc: change length argument to size_t (v2) Eduardo Habkost
` (3 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Eduardo Habkost @ 2012-07-02 18:06 UTC (permalink / raw)
To: qemu-devel
Cc: Bill Gray, Andrea Arcangeli, Andre Przywara, kvm, Bharata B Rao
Changes v1 -> v2:
- Do not initialize variable to false, to make checkpatch.pl happy
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
cpu-all.h | 2 +-
vl.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/cpu-all.h b/cpu-all.h
index 9dc249a..2beed5a 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -490,7 +490,7 @@ typedef struct RAMList {
extern RAMList ram_list;
extern const char *mem_path;
-extern int mem_prealloc;
+extern bool mem_prealloc;
/* Flags stored in the low bits of the TLB virtual address. These are
defined so that fast path ram access is all zeros. */
diff --git a/vl.c b/vl.c
index 1329c30..5c80189 100644
--- a/vl.c
+++ b/vl.c
@@ -178,7 +178,7 @@ const char* keyboard_layout = NULL;
ram_addr_t ram_size;
const char *mem_path = NULL;
#ifdef MAP_POPULATE
-int mem_prealloc = 0; /* force preallocation of physical target memory */
+bool mem_prealloc; /* force preallocation of physical target memory */
#endif
int nb_nics;
NICInfo nd_table[MAX_NICS];
@@ -2673,7 +2673,7 @@ int main(int argc, char **argv, char **envp)
break;
#ifdef MAP_POPULATE
case QEMU_OPTION_mem_prealloc:
- mem_prealloc = 1;
+ mem_prealloc = true;
break;
#endif
case QEMU_OPTION_d:
--
1.7.10.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 4/6] file_ram_alloc: change length argument to size_t (v2)
2012-07-02 18:06 [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Eduardo Habkost
` (2 preceding siblings ...)
2012-07-02 18:06 ` [Qemu-devel] [PATCH 3/6] vl.c: change mem_prealloc to bool (v2) Eduardo Habkost
@ 2012-07-02 18:06 ` Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 5/6] file_ram_alloc(): extract temporary-file creation code to separate function (v2) Eduardo Habkost
` (2 subsequent siblings)
6 siblings, 0 replies; 12+ messages in thread
From: Eduardo Habkost @ 2012-07-02 18:06 UTC (permalink / raw)
To: qemu-devel
Cc: Bill Gray, Andrea Arcangeli, Andre Przywara, kvm, Bharata B Rao
While we are at it, rename it to "length", as "memory" doesn't mean
anything.
Changes v1 -> v2:
- Rebase after coding style changes
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
exec.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/exec.c b/exec.c
index d856325..1e98244 100644
--- a/exec.c
+++ b/exec.c
@@ -2357,7 +2357,7 @@ static long gethugepagesize(const char *path)
}
static void *file_ram_alloc(RAMBlock *block,
- ram_addr_t memory,
+ size_t length,
const char *path)
{
gchar *filename;
@@ -2373,7 +2373,7 @@ static void *file_ram_alloc(RAMBlock *block,
return NULL;
}
- if (memory < hpagesize) {
+ if (length < hpagesize) {
return NULL;
}
@@ -2392,7 +2392,7 @@ static void *file_ram_alloc(RAMBlock *block,
unlink(filename);
g_free(filename);
- memory = (memory + hpagesize - 1) & ~(hpagesize - 1);
+ length = (length + hpagesize - 1) & ~(hpagesize - 1);
/*
* ftruncate is not supported by hugetlbfs in older
@@ -2400,7 +2400,7 @@ static void *file_ram_alloc(RAMBlock *block,
* If anything goes wrong with it under other filesystems,
* mmap will fail.
*/
- if (ftruncate(fd, memory)) {
+ if (ftruncate(fd, length)) {
perror("ftruncate");
}
@@ -2410,9 +2410,9 @@ static void *file_ram_alloc(RAMBlock *block,
* to sidestep this quirk.
*/
flags = mem_prealloc ? MAP_POPULATE | MAP_SHARED : MAP_PRIVATE;
- area = mmap(0, memory, PROT_READ | PROT_WRITE, flags, fd, 0);
+ area = mmap(0, length, PROT_READ | PROT_WRITE, flags, fd, 0);
#else
- area = mmap(0, memory, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+ area = mmap(0, length, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
#endif
if (area == MAP_FAILED) {
perror("file_ram_alloc: can't mmap RAM pages");
--
1.7.10.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [PATCH 5/6] file_ram_alloc(): extract temporary-file creation code to separate function (v2)
2012-07-02 18:06 [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Eduardo Habkost
` (3 preceding siblings ...)
2012-07-02 18:06 ` [Qemu-devel] [PATCH 4/6] file_ram_alloc: change length argument to size_t (v2) Eduardo Habkost
@ 2012-07-02 18:06 ` Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [RFC PATCH 6/6] add -keep-mem-path-files option (v2) Eduardo Habkost
2012-07-02 18:56 ` [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Daniel P. Berrange
6 siblings, 0 replies; 12+ messages in thread
From: Eduardo Habkost @ 2012-07-02 18:06 UTC (permalink / raw)
To: qemu-devel
Cc: Bill Gray, Andrea Arcangeli, Andre Przywara, kvm, Bharata B Rao
Changes v1 -> v2:
- Fix trailing space issue
- Rebase against new code using g_strdup_printf()
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
exec.c | 29 ++++++++++++++++++++++-------
1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/exec.c b/exec.c
index 1e98244..456ac73 100644
--- a/exec.c
+++ b/exec.c
@@ -2356,11 +2356,31 @@ static long gethugepagesize(const char *path)
return fs.f_bsize;
}
+/* Return FD to temporary file inside directory at 'path',
+ * truncated to size 'length'
+ */
+static int get_temp_fd(const char *path)
+{
+ int fd;
+ gchar *filename;
+
+ filename = g_strdup_printf("%s/qemu_back_mem.XXXXXX", path);
+ fd = mkstemp(filename);
+ if (fd < 0) {
+ perror("unable to create backing store for hugepages");
+ g_free(filename);
+ return -1;
+ }
+ unlink(filename);
+ g_free(filename);
+
+ return fd;
+}
+
static void *file_ram_alloc(RAMBlock *block,
size_t length,
const char *path)
{
- gchar *filename;
void *area;
int fd;
#ifdef MAP_POPULATE
@@ -2382,15 +2402,10 @@ static void *file_ram_alloc(RAMBlock *block,
return NULL;
}
- filename = g_strdup_printf("%s/qemu_back_mem.XXXXXX", path);
- fd = mkstemp(filename);
+ fd = get_temp_fd(path);
if (fd < 0) {
- perror("unable to create backing store for hugepages");
- g_free(filename);
return NULL;
}
- unlink(filename);
- g_free(filename);
length = (length + hpagesize - 1) & ~(hpagesize - 1);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [Qemu-devel] [RFC PATCH 6/6] add -keep-mem-path-files option (v2)
2012-07-02 18:06 [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Eduardo Habkost
` (4 preceding siblings ...)
2012-07-02 18:06 ` [Qemu-devel] [PATCH 5/6] file_ram_alloc(): extract temporary-file creation code to separate function (v2) Eduardo Habkost
@ 2012-07-02 18:06 ` Eduardo Habkost
2012-07-02 18:56 ` [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Daniel P. Berrange
6 siblings, 0 replies; 12+ messages in thread
From: Eduardo Habkost @ 2012-07-02 18:06 UTC (permalink / raw)
To: qemu-devel
Cc: Bill Gray, Andrea Arcangeli, Andre Przywara, kvm, Bharata B Rao
This make QEMU create files inside the -mem-path directory using
more predictable names, and not remove them afterwards.
This allow (for example) users or management layers to use numactl
later, to set NUMA policy for the guest RAM.
Changes v1 -> v2:
- Fix trailing space issue
- Coding style changes
- Do not initialize static variable to false
- Use g_strdup_printf() instead of asprintf()
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
cpu-all.h | 1 +
exec.c | 26 +++++++++++++++++++++++++-
qemu-options.hx | 12 ++++++++++++
vl.c | 5 +++++
4 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/cpu-all.h b/cpu-all.h
index 2beed5a..acd59ff 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -490,6 +490,7 @@ typedef struct RAMList {
extern RAMList ram_list;
extern const char *mem_path;
+extern bool keep_mem_path_files;
extern bool mem_prealloc;
/* Flags stored in the low bits of the TLB virtual address. These are
diff --git a/exec.c b/exec.c
index 456ac73..2525a65 100644
--- a/exec.c
+++ b/exec.c
@@ -2377,6 +2377,25 @@ static int get_temp_fd(const char *path)
return fd;
}
+/* Return FD to RAM block file, using the memory region name as filename
+ */
+static int open_ramblock_file(RAMBlock *block, const char *path)
+{
+ int fd;
+ gchar *filename;
+
+ filename = g_strdup_printf("%s/%s", path, block->mr->name);
+ fd = open(filename, O_RDWR|O_CREAT);
+ if (fd < 0) {
+ perror("unable to open backing store for hugepages");
+ g_free(filename);
+ return -1;
+ }
+ g_free(filename);
+
+ return fd;
+}
+
static void *file_ram_alloc(RAMBlock *block,
size_t length,
const char *path)
@@ -2402,7 +2421,12 @@ static void *file_ram_alloc(RAMBlock *block,
return NULL;
}
- fd = get_temp_fd(path);
+ if (keep_mem_path_files) {
+ fd = open_ramblock_file(block, path);
+ } else {
+ fd = get_temp_fd(path);
+ }
+
if (fd < 0) {
return NULL;
}
diff --git a/qemu-options.hx b/qemu-options.hx
index 8b66264..f2eb237 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -399,6 +399,18 @@ STEXI
Allocate guest RAM from a temporarily created file in @var{path}.
ETEXI
+DEF("keep-mem-path-files", HAS_ARG, QEMU_OPTION_keep_mempath_files,
+ "-keep-mem-path-files Keep files created in -mem-path\n", QEMU_ARCH_ALL)
+STEXI
+@item -keep-mem-path-files
+Create the files for -mem-path using the memory region names, and don't remove
+them afterwards.
+
+This allows further fine-tuning of NUMA policy for memory regions using
+numactl.
+ETEXI
+
+
#ifdef MAP_POPULATE
DEF("mem-prealloc", 0, QEMU_OPTION_mem_prealloc,
"-mem-prealloc preallocate guest memory (use with -mem-path)\n",
diff --git a/vl.c b/vl.c
index 5c80189..9f18d53 100644
--- a/vl.c
+++ b/vl.c
@@ -177,6 +177,8 @@ int display_remote = 0;
const char* keyboard_layout = NULL;
ram_addr_t ram_size;
const char *mem_path = NULL;
+bool keep_mem_path_files; /* Keep files created at mem_path.
+ * use memory region names as filename */
#ifdef MAP_POPULATE
bool mem_prealloc; /* force preallocation of physical target memory */
#endif
@@ -2671,6 +2673,9 @@ int main(int argc, char **argv, char **envp)
case QEMU_OPTION_mempath:
mem_path = optarg;
break;
+ case QEMU_OPTION_keep_mempath_files:
+ keep_mem_path_files = true;
+ break;
#ifdef MAP_POPULATE
case QEMU_OPTION_mem_prealloc:
mem_prealloc = true;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
2012-07-02 18:06 [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Eduardo Habkost
` (5 preceding siblings ...)
2012-07-02 18:06 ` [Qemu-devel] [RFC PATCH 6/6] add -keep-mem-path-files option (v2) Eduardo Habkost
@ 2012-07-02 18:56 ` Daniel P. Berrange
2012-07-02 19:54 ` Eduardo Habkost
6 siblings, 1 reply; 12+ messages in thread
From: Daniel P. Berrange @ 2012-07-02 18:56 UTC (permalink / raw)
To: Eduardo Habkost
Cc: Andrea Arcangeli, Andre Przywara, kvm, qemu-devel, Bharata B Rao,
Bill Gray
On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> Resending series, after fixing some coding style issues. Does anybody has any
> feedback about this proposal?
>
> Changes v1 -> v2:
> - Coding style fixes
>
> Original cover letter:
>
> I was investigating if there are any mechanisms that allow manually pinning of
> guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, and
> noticed that -mem-path could be used for that, except that it currently removes
> any files it creates (using mkstemp()) immediately, not allowing numactl to be
> used on the backing files, as a result. This patches add a -keep-mem-path-files
> option to make QEMU create the files inside -mem-path with more predictable
> names, and not remove them after creation.
>
> Some previous discussions about the subject, for reference:
> - Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
> - Message-ID: <4C7D7C2A.7000205@codemonkey.ws>
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
>
> A more recent thread can be found at:
> - Message-ID: <20111029184502.GH11038@in.ibm.com>
> http://article.gmane.org/gmane.comp.emulators.qemu/123001
>
> Note that this is just a mechanism to facilitate manual static binding using
> numactl on hugetlbfs later, for optimization. This may be especially useful for
> single large multi-node guests use-cases (and, of course, has to be used with
> care).
>
> I don't know if it is a good idea to use the memory range names as a publicly-
> visible interface. Another option may be to use a single file instead, and mmap
> different regions inside the same file for each memory region. I an open to
> comments and suggestions.
>
> Example (untested) usage to bind manually each half of the RAM of a guest to a
> different NUMA node:
>
> $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
> $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
> $ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
I'd suggest that instead of making the memory file name into a
public ABI QEMU needs to maintain, QEMU could expose the info
via a monitor command. eg
$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
-mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
-monitor stdio
(qemu) info mem-nodes
node0: file=/proc/self/fd/3, offset=0G, length=1G
node1: file=/proc/self/fd/3, offset=1G, length=1G
This example takes advantage of the fact that with Linux, you can
still access a deleted file via /proc/self/fd/NNN, which AFAICT,
would avoid the need for a --keep-mem-path-files.
By returning info via a monitor command you also avoid hardcoding
the use of 1 single file for all of memory. You also avoid hardcoding
the fact that QEMU stores the nodes in contiguous order inside the
node. eg QEMU could easily return data like this
$ qemu-system-x86_64 [...] -m 2048 -smp 4 \
-numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
-mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
-monitor stdio
(qemu) info mem-nodes
node0: file=/proc/self/fd/3, offset=0G, length=1G
node1: file=/proc/self/fd/4, offset=0G, length=1G
or more ingeneous options
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
2012-07-02 18:56 ` [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Daniel P. Berrange
@ 2012-07-02 19:54 ` Eduardo Habkost
2012-07-03 9:03 ` Daniel P. Berrange
0 siblings, 1 reply; 12+ messages in thread
From: Eduardo Habkost @ 2012-07-02 19:54 UTC (permalink / raw)
To: Daniel P. Berrange
Cc: Andrea Arcangeli, Andre Przywara, kvm, qemu-devel, Bharata B Rao,
Bill Gray
On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
> On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> > Resending series, after fixing some coding style issues. Does anybody has any
> > feedback about this proposal?
> >
> > Changes v1 -> v2:
> > - Coding style fixes
> >
> > Original cover letter:
> >
> > I was investigating if there are any mechanisms that allow manually pinning of
> > guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, and
> > noticed that -mem-path could be used for that, except that it currently removes
> > any files it creates (using mkstemp()) immediately, not allowing numactl to be
> > used on the backing files, as a result. This patches add a -keep-mem-path-files
> > option to make QEMU create the files inside -mem-path with more predictable
> > names, and not remove them after creation.
> >
> > Some previous discussions about the subject, for reference:
> > - Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
> > - Message-ID: <4C7D7C2A.7000205@codemonkey.ws>
> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
> >
> > A more recent thread can be found at:
> > - Message-ID: <20111029184502.GH11038@in.ibm.com>
> > http://article.gmane.org/gmane.comp.emulators.qemu/123001
> >
> > Note that this is just a mechanism to facilitate manual static binding using
> > numactl on hugetlbfs later, for optimization. This may be especially useful for
> > single large multi-node guests use-cases (and, of course, has to be used with
> > care).
> >
> > I don't know if it is a good idea to use the memory range names as a publicly-
> > visible interface. Another option may be to use a single file instead, and mmap
> > different regions inside the same file for each memory region. I an open to
> > comments and suggestions.
> >
> > Example (untested) usage to bind manually each half of the RAM of a guest to a
> > different NUMA node:
> >
> > $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> > -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> > -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
> > $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
> > $ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
>
> I'd suggest that instead of making the memory file name into a
> public ABI QEMU needs to maintain, QEMU could expose the info
> via a monitor command. eg
>
> $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> -monitor stdio
> (qemu) info mem-nodes
> node0: file=/proc/self/fd/3, offset=0G, length=1G
> node1: file=/proc/self/fd/3, offset=1G, length=1G
>
> This example takes advantage of the fact that with Linux, you can
> still access a deleted file via /proc/self/fd/NNN, which AFAICT,
> would avoid the need for a --keep-mem-path-files.
I like the suggestion.
But other processes still need to be able to open those files if we want
to do anything useful with them. In this case, I guess it's better to
let QEMU itself build a "/proc/<getpid()>/fd/<fd>" string instead of
using "/proc/self" and forcing the client to find out what's the right
PID?
Anyway, even if we want to avoid file-descriptor and /proc tricks, we
can still use the interface you suggest. Then we wouldn't need to have
any filename assumptions: the filenames could be completly random, as
they would be reported using the new monitor command.
>
> By returning info via a monitor command you also avoid hardcoding
> the use of 1 single file for all of memory. You also avoid hardcoding
> the fact that QEMU stores the nodes in contiguous order inside the
> node. eg QEMU could easily return data like this
>
>
> $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> -monitor stdio
> (qemu) info mem-nodes
> node0: file=/proc/self/fd/3, offset=0G, length=1G
> node1: file=/proc/self/fd/4, offset=0G, length=1G
>
> or more ingeneous options
Sounds good.
--
Eduardo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2)
2012-07-02 19:54 ` Eduardo Habkost
@ 2012-07-03 9:03 ` Daniel P. Berrange
0 siblings, 0 replies; 12+ messages in thread
From: Daniel P. Berrange @ 2012-07-03 9:03 UTC (permalink / raw)
To: Eduardo Habkost
Cc: Andrea Arcangeli, Andre Przywara, kvm, qemu-devel, Bharata B Rao,
Bill Gray
On Mon, Jul 02, 2012 at 04:54:03PM -0300, Eduardo Habkost wrote:
> On Mon, Jul 02, 2012 at 07:56:58PM +0100, Daniel P. Berrange wrote:
> > On Mon, Jul 02, 2012 at 03:06:32PM -0300, Eduardo Habkost wrote:
> > > Resending series, after fixing some coding style issues. Does anybody has any
> > > feedback about this proposal?
> > >
> > > Changes v1 -> v2:
> > > - Coding style fixes
> > >
> > > Original cover letter:
> > >
> > > I was investigating if there are any mechanisms that allow manually pinning of
> > > guest RAM to specific host NUMA nodes, in the case of multi-node KVM guests, and
> > > noticed that -mem-path could be used for that, except that it currently removes
> > > any files it creates (using mkstemp()) immediately, not allowing numactl to be
> > > used on the backing files, as a result. This patches add a -keep-mem-path-files
> > > option to make QEMU create the files inside -mem-path with more predictable
> > > names, and not remove them after creation.
> > >
> > > Some previous discussions about the subject, for reference:
> > > - Message-ID: <1281534738-8310-1-git-send-email-andre.przywara@amd.com>
> > > http://article.gmane.org/gmane.comp.emulators.kvm.devel/57684
> > > - Message-ID: <4C7D7C2A.7000205@codemonkey.ws>
> > > http://article.gmane.org/gmane.comp.emulators.kvm.devel/58835
> > >
> > > A more recent thread can be found at:
> > > - Message-ID: <20111029184502.GH11038@in.ibm.com>
> > > http://article.gmane.org/gmane.comp.emulators.qemu/123001
> > >
> > > Note that this is just a mechanism to facilitate manual static binding using
> > > numactl on hugetlbfs later, for optimization. This may be especially useful for
> > > single large multi-node guests use-cases (and, of course, has to be used with
> > > care).
> > >
> > > I don't know if it is a good idea to use the memory range names as a publicly-
> > > visible interface. Another option may be to use a single file instead, and mmap
> > > different regions inside the same file for each memory region. I an open to
> > > comments and suggestions.
> > >
> > > Example (untested) usage to bind manually each half of the RAM of a guest to a
> > > different NUMA node:
> > >
> > > $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> > > -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> > > -mem-prealloc -keep-mem-path-files -mem-path /mnt/hugetlbfs/FOO
> > > $ numactl --offset=1G --length=1G --membind=1 --file /mnt/hugetlbfs/FOO/pc.ram
> > > $ numactl --offset=0 --length=1G --membind=2 --file /mnt/hugetlbfs/FOO/pc.ram
> >
> > I'd suggest that instead of making the memory file name into a
> > public ABI QEMU needs to maintain, QEMU could expose the info
> > via a monitor command. eg
> >
> > $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> > -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> > -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> > -monitor stdio
> > (qemu) info mem-nodes
> > node0: file=/proc/self/fd/3, offset=0G, length=1G
> > node1: file=/proc/self/fd/3, offset=1G, length=1G
> >
> > This example takes advantage of the fact that with Linux, you can
> > still access a deleted file via /proc/self/fd/NNN, which AFAICT,
> > would avoid the need for a --keep-mem-path-files.
>
> I like the suggestion.
>
> But other processes still need to be able to open those files if we want
> to do anything useful with them. In this case, I guess it's better to
> let QEMU itself build a "/proc/<getpid()>/fd/<fd>" string instead of
> using "/proc/self" and forcing the client to find out what's the right
> PID?
>
> Anyway, even if we want to avoid file-descriptor and /proc tricks, we
> can still use the interface you suggest. Then we wouldn't need to have
> any filename assumptions: the filenames could be completly random, as
> they would be reported using the new monitor command.
Opps, yes of course. I did intend that client apps could use the
files, so I should have used /proc/$PID and not /proc/self
>
> >
> > By returning info via a monitor command you also avoid hardcoding
> > the use of 1 single file for all of memory. You also avoid hardcoding
> > the fact that QEMU stores the nodes in contiguous order inside the
> > node. eg QEMU could easily return data like this
> >
> >
> > $ qemu-system-x86_64 [...] -m 2048 -smp 4 \
> > -numa node,cpus=0-1,mem=1024 -numa node,cpus=2-3,mem=1024 \
> > -mem-prealloc -mem-path /mnt/hugetlbfs/FOO \
> > -monitor stdio
> > (qemu) info mem-nodes
> > node0: file=/proc/self/fd/3, offset=0G, length=1G
> > node1: file=/proc/self/fd/4, offset=0G, length=1G
> >
> > or more ingeneous options
>
> Sounds good.
>
> --
> Eduardo
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 1/6] file_ram_alloc(): coding style fixes
2012-07-02 18:06 ` [Qemu-devel] [PATCH 1/6] file_ram_alloc(): coding style fixes Eduardo Habkost
@ 2012-07-03 19:16 ` Blue Swirl
0 siblings, 0 replies; 12+ messages in thread
From: Blue Swirl @ 2012-07-03 19:16 UTC (permalink / raw)
To: Eduardo Habkost
Cc: Andrea Arcangeli, Andre Przywara, kvm, qemu-devel, Bharata B Rao,
Bill Gray
On Mon, Jul 2, 2012 at 6:06 PM, Eduardo Habkost <ehabkost@redhat.com> wrote:
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
> ---
> exec.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 8244d54..c8bfd27 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2392,7 +2392,7 @@ static void *file_ram_alloc(RAMBlock *block,
> unlink(filename);
> free(filename);
>
> - memory = (memory+hpagesize-1) & ~(hpagesize-1);
> + memory = (memory + hpagesize - 1) & ~(hpagesize - 1);
>
> /*
> * ftruncate is not supported by hugetlbfs in older
> @@ -2400,8 +2400,9 @@ static void *file_ram_alloc(RAMBlock *block,
> * If anything goes wrong with it under other filesystems,
> * mmap will fail.
> */
> - if (ftruncate(fd, memory))
> + if (ftruncate(fd, memory)) {
> perror("ftruncate");
> + }
>
> #ifdef MAP_POPULATE
> /* NB: MAP_POPULATE won't exhaustively alloc all phys pages in the case
> --
> 1.7.10.4
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [Qemu-devel] [PATCH 2/6] file_ram_alloc(): use g_strdup_printf() instead of asprintf()
2012-07-02 18:06 ` [Qemu-devel] [PATCH 2/6] file_ram_alloc(): use g_strdup_printf() instead of asprintf() Eduardo Habkost
@ 2012-07-03 19:16 ` Blue Swirl
0 siblings, 0 replies; 12+ messages in thread
From: Blue Swirl @ 2012-07-03 19:16 UTC (permalink / raw)
To: Eduardo Habkost
Cc: Andrea Arcangeli, Andre Przywara, kvm, qemu-devel, Bharata B Rao,
Bill Gray
On Mon, Jul 2, 2012 at 6:06 PM, Eduardo Habkost <ehabkost@redhat.com> wrote:
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
Acked-by: Blue Swirl <blauwirbel@gmail.com>
> ---
> exec.c | 14 +++++++-------
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index c8bfd27..d856325 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -24,6 +24,9 @@
> #include <sys/mman.h>
> #endif
>
> +#include <glib.h>
> +#include <glib/gprintf.h>
> +
> #include "qemu-common.h"
> #include "cpu.h"
> #include "tcg.h"
> @@ -2357,7 +2360,7 @@ static void *file_ram_alloc(RAMBlock *block,
> ram_addr_t memory,
> const char *path)
> {
> - char *filename;
> + gchar *filename;
> void *area;
> int fd;
> #ifdef MAP_POPULATE
> @@ -2379,18 +2382,15 @@ static void *file_ram_alloc(RAMBlock *block,
> return NULL;
> }
>
> - if (asprintf(&filename, "%s/qemu_back_mem.XXXXXX", path) == -1) {
> - return NULL;
> - }
> -
> + filename = g_strdup_printf("%s/qemu_back_mem.XXXXXX", path);
> fd = mkstemp(filename);
> if (fd < 0) {
> perror("unable to create backing store for hugepages");
> - free(filename);
> + g_free(filename);
> return NULL;
> }
> unlink(filename);
> - free(filename);
> + g_free(filename);
>
> memory = (memory + hpagesize - 1) & ~(hpagesize - 1);
>
> --
> 1.7.10.4
>
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2012-07-03 19:17 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-02 18:06 [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 1/6] file_ram_alloc(): coding style fixes Eduardo Habkost
2012-07-03 19:16 ` Blue Swirl
2012-07-02 18:06 ` [Qemu-devel] [PATCH 2/6] file_ram_alloc(): use g_strdup_printf() instead of asprintf() Eduardo Habkost
2012-07-03 19:16 ` Blue Swirl
2012-07-02 18:06 ` [Qemu-devel] [PATCH 3/6] vl.c: change mem_prealloc to bool (v2) Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 4/6] file_ram_alloc: change length argument to size_t (v2) Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [PATCH 5/6] file_ram_alloc(): extract temporary-file creation code to separate function (v2) Eduardo Habkost
2012-07-02 18:06 ` [Qemu-devel] [RFC PATCH 6/6] add -keep-mem-path-files option (v2) Eduardo Habkost
2012-07-02 18:56 ` [Qemu-devel] [RFC PATCH 0/6] option to not remove files inside -mem-path dir (v2) Daniel P. Berrange
2012-07-02 19:54 ` Eduardo Habkost
2012-07-03 9:03 ` Daniel P. Berrange
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).