* [Qemu-devel] [PATCH 0/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64
From: Murilo Opsfelder Araujo @ 2019-01-30 23:36 UTC
To: qemu-devel, qemu-ppc
Cc: Murilo Opsfelder Araujo, Cao jin, David Gibson, Fabiano Rosas,
Greg Kurz, Michael S . Tsirkin, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, mopsfelder
The first patch unfolds parts of qemu_ram_mmap() to make it clearer,
with no change in the function's behaviour.
The second one fixes the alignment of the length given to munmap() for
hugetlbfs-backed mappings on ppc64.
I am pretty sure there is room for improvement, so I would love to
hear your feedback.
Thank you!
Murilo Opsfelder Araujo (2):
mmap-alloc: unfold qemu_ram_mmap()
mmap-alloc: fix hugetlbfs misaligned length in ppc64
exec.c | 4 +--
include/qemu/mmap-alloc.h | 2 +-
util/mmap-alloc.c | 73 ++++++++++++++++++++++++++-------------
util/oslib-posix.c | 2 +-
4 files changed, 53 insertions(+), 28 deletions(-)
--
2.20.1
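A minimal arithmetic sketch of the misalignment this series addresses, using the numbers from the test report later in this thread (a 256M memory backend on 16M huge pages) and assuming a 64K getpagesize(), a common configuration for ppc64 Linux kernels. The helper names are hypothetical and are not part of QEMU:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* munmap(2) NOTES: for a Huge TLB mapping, addr and length must both be
 * a multiple of the underlying huge page size. Only the length is
 * checked here. */
bool hugetlb_munmap_len_ok(size_t len, size_t hugepagesize)
{
    assert(hugepagesize != 0);
    return len % hugepagesize == 0;
}

/* Hypothetical helper: the length qemu_ram_munmap() hands to munmap(),
 * i.e. the RAM block plus one guard page of guard_pagesize bytes. */
size_t ram_munmap_len(size_t size, size_t guard_pagesize)
{
    return size + guard_pagesize;
}
```

With size = 256M, ram_munmap_len(size, 64K) yields 256M + 64K, which is not a multiple of 16M, so munmap() rejects it. ram_munmap_len(size, 16M) yields 272M, which is exactly 17 huge pages.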
* [Qemu-devel] [PATCH 1/2] mmap-alloc: unfold qemu_ram_mmap()
From: Murilo Opsfelder Araujo @ 2019-01-30 23:36 UTC
To: qemu-devel, qemu-ppc
Cc: Murilo Opsfelder Araujo, Cao jin, David Gibson, Fabiano Rosas,
Greg Kurz, Michael S . Tsirkin, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, mopsfelder
Unfold parts of qemu_ram_mmap() for readability: move declarations to
the top and keep architecture-specific code in the ifdef-else blocks.
No change in the function's behaviour.
Give ptr and ptr1 meaningful names:
ptr -> guardptr : pointer to the PROT_NONE guard region
ptr1 -> ptr : pointer to the mapped memory returned to caller
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
util/mmap-alloc.c | 53 ++++++++++++++++++++++++++++++-----------------
1 file changed, 34 insertions(+), 19 deletions(-)
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index fd329eccd8..f71ea038c8 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -77,11 +77,19 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
{
+ int flags;
+ int guardfd;
+ size_t offset;
+ size_t total;
+ void *guardptr;
+ void *ptr;
+
/*
* Note: this always allocates at least one extra page of virtual address
* space, even if size is already aligned.
*/
- size_t total = size + align;
+ total = size + align;
+
#if defined(__powerpc64__) && defined(__linux__)
/* On ppc64 mappings in the same segment (aka slice) must share the same
* page size. Since we will be re-allocating part of this segment
@@ -91,16 +99,22 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
* We do this unless we are using the system page size, in which case
* anonymous memory is OK.
*/
- int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
- int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
- void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
+ flags = MAP_PRIVATE;
+ if (fd == -1 || qemu_fd_getpagesize(fd) == getpagesize()) {
+ guardfd = -1;
+ flags |= MAP_ANONYMOUS;
+ } else {
+ guardfd = fd;
+ flags |= MAP_NORESERVE;
+ }
#else
- void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+ guardfd = -1;
+ flags = MAP_PRIVATE | MAP_ANONYMOUS;
#endif
- size_t offset;
- void *ptr1;
- if (ptr == MAP_FAILED) {
+ guardptr = mmap(0, total, PROT_NONE, flags, guardfd, 0);
+
+ if (guardptr == MAP_FAILED) {
return MAP_FAILED;
}
@@ -108,19 +122,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
/* Always align to host page size */
assert(align >= getpagesize());
- offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
- ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
- MAP_FIXED |
- (fd == -1 ? MAP_ANONYMOUS : 0) |
- (shared ? MAP_SHARED : MAP_PRIVATE),
- fd, 0);
- if (ptr1 == MAP_FAILED) {
- munmap(ptr, total);
+ flags = MAP_FIXED;
+ flags |= fd == -1 ? MAP_ANONYMOUS : 0;
+ flags |= shared ? MAP_SHARED : MAP_PRIVATE;
+ offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
+
+ ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE, flags, fd, 0);
+
+ if (ptr == MAP_FAILED) {
+ munmap(guardptr, total);
return MAP_FAILED;
}
if (offset > 0) {
- munmap(ptr, offset);
+ munmap(guardptr, offset);
}
/*
@@ -129,10 +144,10 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
*/
total -= offset;
if (total > size + getpagesize()) {
- munmap(ptr1 + size + getpagesize(), total - size - getpagesize());
+ munmap(ptr + size + getpagesize(), total - size - getpagesize());
}
- return ptr1;
+ return ptr;
}
void qemu_ram_munmap(void *ptr, size_t size)
--
2.20.1
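The reserve-then-overlay pattern that qemu_ram_mmap() implements can be tried outside QEMU. The following is a simplified sketch, assuming anonymous memory only (no fd and no ppc64 special case) and a size that is a multiple of getpagesize(). reserve_aligned() and release_aligned() are hypothetical names, and QEMU_ALIGN_UP is replaced by an equivalent mask expression:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS with glibc under strict -std */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical, simplified analogue of qemu_ram_mmap() for anonymous
 * memory: returns 'size' usable bytes aligned to 'align', followed by
 * one PROT_NONE guard page, or MAP_FAILED. */
void *reserve_aligned(size_t size, size_t align)
{
    size_t pagesize = (size_t)getpagesize();
    size_t total = size + align; /* slack so the start can slide to 'align' */
    size_t offset;
    char *guardptr;
    char *ptr;

    assert(align >= pagesize && (align & (align - 1)) == 0);
    assert(size % pagesize == 0);

    /* 1. Reserve address space only: no access, no backing memory. */
    guardptr = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guardptr == MAP_FAILED) {
        return MAP_FAILED;
    }

    /* 2. Overlay usable memory at the first 'align'-aligned address. */
    offset = (((uintptr_t)guardptr + align - 1) & ~(uintptr_t)(align - 1))
             - (uintptr_t)guardptr;
    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
               MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        munmap(guardptr, total);
        return MAP_FAILED;
    }

    /* 3. Give back the unused head of the reservation... */
    if (offset > 0) {
        munmap(guardptr, offset);
    }

    /* 4. ...and everything after a single PROT_NONE guard page. */
    total -= offset;
    if (total > size + pagesize) {
        munmap(ptr + size + pagesize, total - size - pagesize);
    }
    return ptr;
}

/* Counterpart of qemu_ram_munmap(): unmap the RAM plus its guard page. */
void release_aligned(void *ptr, size_t size)
{
    if (ptr) {
        munmap(ptr, size + (size_t)getpagesize());
    }
}
```

Because the reservation is one alignment unit larger than the request and the host page size is at most 'align', at least one PROT_NONE page always survives step 4, which is why the unmap side can add a fixed guard length. That fixed length is exactly what breaks once the guard comes from a hugetlbfs file.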
* [Qemu-devel] [PATCH 2/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64
From: Murilo Opsfelder Araujo @ 2019-01-30 23:36 UTC
To: qemu-devel, qemu-ppc
Cc: Murilo Opsfelder Araujo, Cao jin, David Gibson, Fabiano Rosas,
Greg Kurz, Michael S . Tsirkin, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, mopsfelder
Commit 7197fb4058bcb68986bae2bb2c04d6370f3e7218 ("util/mmap-alloc:
fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64.
However, we still need to consider the underlying huge page size
during munmap() because it requires that both address and length be a
multiple of the underlying huge page size for Huge TLB mappings.
Quoting the "Huge page (Huge TLB) mappings" paragraph in the NOTES
section of the munmap(2) manual page:
"For munmap(), addr and length must both be a multiple of the
underlying huge page size."
On ppc64, the munmap() in qemu_ram_munmap() fails for Huge TLB
mappings: the guard region is mapped from the same hugetlbfs file, so
its granularity is the underlying huge page size, not the native
system page size returned by getpagesize(). The length
size + getpagesize() is therefore not a multiple of the huge page size.
As a result, huge pages are not released back to the pool after a
hugetlbfs file-backed memory device is hot-unplugged.
This patch fixes qemu_ram_mmap() and qemu_ram_munmap() to use the
underlying page size on ppc64.
After this patch, memory hot-unplug releases huge pages back to the
pool.
Fixes: 7197fb4058bcb68986bae2bb2c04d6370f3e7218
Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
---
exec.c | 4 ++--
include/qemu/mmap-alloc.h | 2 +-
util/mmap-alloc.c | 22 ++++++++++++++++------
util/oslib-posix.c | 2 +-
4 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/exec.c b/exec.c
index da3e635f91..0db6d8bf34 100644
--- a/exec.c
+++ b/exec.c
@@ -1871,7 +1871,7 @@ static void *file_ram_alloc(RAMBlock *block,
if (mem_prealloc) {
os_mem_prealloc(fd, area, memory, smp_cpus, errp);
if (errp && *errp) {
- qemu_ram_munmap(area, memory);
+ qemu_ram_munmap(fd, area, memory);
return NULL;
}
}
@@ -2392,7 +2392,7 @@ static void reclaim_ramblock(RAMBlock *block)
xen_invalidate_map_cache_entry(block->host);
#ifndef _WIN32
} else if (block->fd >= 0) {
- qemu_ram_munmap(block->host, block->max_length);
+ qemu_ram_munmap(block->fd, block->host, block->max_length);
close(block->fd);
#endif
} else {
diff --git a/include/qemu/mmap-alloc.h b/include/qemu/mmap-alloc.h
index 50385e3f81..ef04f0ed5b 100644
--- a/include/qemu/mmap-alloc.h
+++ b/include/qemu/mmap-alloc.h
@@ -9,6 +9,6 @@ size_t qemu_mempath_getpagesize(const char *mem_path);
void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared);
-void qemu_ram_munmap(void *ptr, size_t size);
+void qemu_ram_munmap(int fd, void *ptr, size_t size);
#endif
diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
index f71ea038c8..8565885420 100644
--- a/util/mmap-alloc.c
+++ b/util/mmap-alloc.c
@@ -80,6 +80,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
int flags;
int guardfd;
size_t offset;
+ size_t pagesize;
size_t total;
void *guardptr;
void *ptr;
@@ -100,7 +101,8 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
* anonymous memory is OK.
*/
flags = MAP_PRIVATE;
- if (fd == -1 || qemu_fd_getpagesize(fd) == getpagesize()) {
+ pagesize = qemu_fd_getpagesize(fd);
+ if (fd == -1 || pagesize == getpagesize()) {
guardfd = -1;
flags |= MAP_ANONYMOUS;
} else {
@@ -109,6 +111,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
}
#else
guardfd = -1;
+ pagesize = getpagesize();
flags = MAP_PRIVATE | MAP_ANONYMOUS;
#endif
@@ -120,7 +123,7 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
assert(is_power_of_2(align));
/* Always align to host page size */
- assert(align >= getpagesize());
+ assert(align >= pagesize);
flags = MAP_FIXED;
flags |= fd == -1 ? MAP_ANONYMOUS : 0;
@@ -143,17 +146,24 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
* a guard page guarding against potential buffer overflows.
*/
total -= offset;
- if (total > size + getpagesize()) {
- munmap(ptr + size + getpagesize(), total - size - getpagesize());
+ if (total > size + pagesize) {
+ munmap(ptr + size + pagesize, total - size - pagesize);
}
return ptr;
}
-void qemu_ram_munmap(void *ptr, size_t size)
+void qemu_ram_munmap(int fd, void *ptr, size_t size)
{
+ size_t pagesize;
+
if (ptr) {
/* Unmap both the RAM block and the guard page */
- munmap(ptr, size + getpagesize());
+#if defined(__powerpc64__) && defined(__linux__)
+ pagesize = qemu_fd_getpagesize(fd);
+#else
+ pagesize = getpagesize();
+#endif
+ munmap(ptr, size + pagesize);
}
}
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 4ce1ba9ca4..37c5854b9c 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -226,7 +226,7 @@ void qemu_vfree(void *ptr)
void qemu_anon_ram_free(void *ptr, size_t size)
{
trace_qemu_anon_ram_free(ptr, size);
- qemu_ram_munmap(ptr, size);
+ qemu_ram_munmap(-1, ptr, size);
}
void qemu_set_block(int fd)
--
2.20.1
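Patch 2 relies on qemu_fd_getpagesize() to learn the page size backing fd. That function is not shown in this thread; the sketch below assumes it behaves roughly as follows on Linux: a hugetlbfs file reports its huge page size in the f_bsize field of fstatfs(2), and any other descriptor (or fd == -1) falls back to getpagesize(). backing_pagesize() and ram_unmap_length() are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/vfs.h> /* fstatfs(), struct statfs (Linux) */
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6 /* same value as in <linux/magic.h> */
#endif

/* Hypothetical helper: page size backing 'fd', or the base page size
 * for anonymous memory (fd == -1) and non-hugetlbfs files. */
size_t backing_pagesize(int fd)
{
    struct statfs fs;
    int ret;

    if (fd != -1) {
        do {
            ret = fstatfs(fd, &fs);
        } while (ret != 0 && errno == EINTR);

        /* Compare as 32 bits: the magic is a 32-bit value. */
        if (ret == 0 && (unsigned int)fs.f_type == HUGETLBFS_MAGIC) {
            return (size_t)fs.f_bsize; /* huge page size of the mount */
        }
    }
    return (size_t)getpagesize();
}

/* Hypothetical helper: the length patch 2 passes to munmap() on ppc64,
 * i.e. the RAM block plus one guard page of the backing page size. */
size_t ram_unmap_length(int fd, size_t size)
{
    return size + backing_pagesize(fd);
}
```

Passing -1, as qemu_anon_ram_free() does after this patch, keeps the old size + getpagesize() behaviour, so only file-backed hugetlbfs blocks see a different unmap length.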
* Re: [Qemu-devel] [PATCH 1/2] mmap-alloc: unfold qemu_ram_mmap()
From: Greg Kurz @ 2019-01-31 9:49 UTC
To: Murilo Opsfelder Araujo
Cc: qemu-devel, qemu-ppc, Cao jin, David Gibson, Fabiano Rosas,
Michael S . Tsirkin, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, mopsfelder
On Wed, 30 Jan 2019 21:36:04 -0200
Murilo Opsfelder Araujo <muriloo@linux.ibm.com> wrote:
> Unfold parts of qemu_ram_mmap() for readability: move declarations to
> the top and keep architecture-specific code in the ifdef-else blocks.
> No change in the function's behaviour.
>
> Give ptr and ptr1 meaningful names:
> ptr -> guardptr : pointer to the PROT_NONE guard region
> ptr1 -> ptr : pointer to the mapped memory returned to caller
>
> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> ---
Reviewed-by: Greg Kurz <groug@kaod.org>
* Re: [Qemu-devel] [PATCH 2/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64
From: Greg Kurz @ 2019-01-31 9:58 UTC
To: Murilo Opsfelder Araujo
Cc: qemu-devel, qemu-ppc, Cao jin, David Gibson, Fabiano Rosas,
Michael S . Tsirkin, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, mopsfelder
On Wed, 30 Jan 2019 21:36:05 -0200
Murilo Opsfelder Araujo <muriloo@linux.ibm.com> wrote:
> Commit 7197fb4058bcb68986bae2bb2c04d6370f3e7218 ("util/mmap-alloc:
> fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64.
>
> However, we still need to consider the underlying huge page size
> during munmap() because it requires that both address and length be a
> multiple of the underlying huge page size for Huge TLB mappings.
> Quoting the "Huge page (Huge TLB) mappings" paragraph in the NOTES
> section of the munmap(2) manual page:
>
> "For munmap(), addr and length must both be a multiple of the
> underlying huge page size."
>
> On ppc64, the munmap() in qemu_ram_munmap() fails for Huge TLB
> mappings: the guard region is mapped from the same hugetlbfs file, so
> its granularity is the underlying huge page size, not the native
> system page size returned by getpagesize(). The length
> size + getpagesize() is therefore not a multiple of the huge page size.
>
> As a result, huge pages are not released back to the pool after a
> hugetlbfs file-backed memory device is hot-unplugged.
>
> This patch fixes qemu_ram_mmap() and qemu_ram_munmap() to use the
> underlying page size on ppc64.
>
> After this patch, memory hot-unplug releases huge pages back to the
> pool.
>
> Fixes: 7197fb4058bcb68986bae2bb2c04d6370f3e7218
> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
> ---
LGTM
Reviewed-by: Greg Kurz <groug@kaod.org>
* Re: [Qemu-devel] [PATCH 2/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64
From: Balamuruhan S @ 2019-02-01 13:43 UTC
To: Murilo Opsfelder Araujo; +Cc: qemu-devel
On Wed, Jan 30, 2019 at 09:36:05PM -0200, Murilo Opsfelder Araujo wrote:
> Commit 7197fb4058bcb68986bae2bb2c04d6370f3e7218 ("util/mmap-alloc:
> fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64.
>
> However, we still need to consider the underlying huge page size
> during munmap() because it requires that both address and length be a
> multiple of the underlying huge page size for Huge TLB mappings.
> Quoting the "Huge page (Huge TLB) mappings" paragraph in the NOTES
> section of the munmap(2) manual page:
>
> "For munmap(), addr and length must both be a multiple of the
> underlying huge page size."
>
> On ppc64, the munmap() in qemu_ram_munmap() fails for Huge TLB
> mappings: the guard region is mapped from the same hugetlbfs file, so
> its granularity is the underlying huge page size, not the native
> system page size returned by getpagesize(). The length
> size + getpagesize() is therefore not a multiple of the huge page size.
>
> As a result, huge pages are not released back to the pool after a
> hugetlbfs file-backed memory device is hot-unplugged.
>
> This patch fixes qemu_ram_mmap() and qemu_ram_munmap() to use the
> underlying page size on ppc64.
>
> After this patch, memory hot-unplug releases huge pages back to the
> pool.
>
> Fixes: 7197fb4058bcb68986bae2bb2c04d6370f3e7218
> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Tested-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
I tested the patches on POWER8:
Distro: Fedora
Host Kernel: 5.0.0-rc3-gdbaaa7d30 (upstream)
Guest Kernel: 5.0.0-rc3-gdbaaa7d30 (upstream)
Qemu: 3.1.50 (v3.1.0-1313-gbd08d7ec4d-dirty) (upstream)
+ mmap-alloc: unfold qemu_ram_mmap()
+ mmap-alloc: fix hugetlbfs misaligned length in ppc64
1. object_add and object_del work:
Allocated 256M of 16M huge pages on the host:
# echo 16 > /proc/sys/vm/nr_hugepages
# cat /proc/meminfo | grep Huge
AnonHugePages: 737280 kB
ShmemHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 16
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
Hugetlb: 262144 kB
(qemu) object_add memory-backend-file,id=mem1,size=256M,mem-path=/dev/hugepages
# cat /proc/meminfo | grep Huge
AnonHugePages: 737280 kB
ShmemHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 16
HugePages_Rsvd: 16
HugePages_Surp: 0
Hugepagesize: 16384 kB
Hugetlb: 262144 kB
(qemu) object_del mem1
# cat /proc/meminfo | grep Huge
AnonHugePages: 671744 kB
ShmemHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 16
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
Hugetlb: 262144 kB
2. A complete hotplug/hot-unplug cycle works:
Hotplug memory:
(qemu) object_add memory-backend-file,id=mem1,size=256M,mem-path=/dev/hugepages
On Host:
# cat /proc/meminfo | grep Huge
AnonHugePages: 737280 kB
ShmemHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 16
HugePages_Rsvd: 16
HugePages_Surp: 0
Hugepagesize: 16384 kB
Hugetlb: 262144 kB
(qemu) device_add pc-dimm,id=dimm1,memdev=mem1
# cat /proc/meminfo | grep Huge
AnonHugePages: 737280 kB
ShmemHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
Hugetlb: 262144 kB
Inside guest:
[ 27.380848] pseries-hotplug-mem: Attempting to hot-add 1 LMB(s) at index 80000010
[ 27.420225] lpar: Attempting to resize HPT to shift 23
[ 27.619202] lpar: Hash collision while resizing HPT
[ 27.620113] Unable to resize hash page table to target order 23: -28
[ 27.748767] pseries-hotplug-mem: Memory at 100000000 (drc index 80000010) was hot-added
Hotunplug memory:
(qemu) device_del dimm1
# cat /proc/meminfo | grep Huge
AnonHugePages: 737280 kB
ShmemHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
Hugetlb: 262144 kB
(qemu) object_del mem1
# cat /proc/meminfo | grep Huge
AnonHugePages: 737280 kB
ShmemHugePages: 0 kB
HugePages_Total: 16
HugePages_Free: 16
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 16384 kB
Hugetlb: 262144 kB
Inside guest:
[ 29.640670] pseries-hotplug-mem: Attempting to hot-remove 1 LMB(s) at 80000010
[ 29.710276] Offlined Pages 4096
[ 29.733824] lpar: Attempting to resize HPT to shift 22
[ 29.880381] lpar: Hash collision while resizing HPT
[ 29.881521] Unable to resize hash page table to target order 22: -28
[ 29.900542] pseries-hotplug-mem: Memory at 100000000 (drc index 80000010) was hot-removed
Thanks Murilo,
-- Bala
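The report above checks the pool state by reading /proc/meminfo by hand. A small parser can make that check scriptable; the sketch below works on /proc/meminfo-formatted text, and meminfo_get() and hugepages_committed() are hypothetical names. It counts huge pages as committed when they are either allocated (HugePages_Total minus HugePages_Free) or reserved but not yet faulted in (HugePages_Rsvd), because the kernel keeps reserved pages in the free count:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up "key:" at the start of a line in
 * /proc/meminfo-formatted text and parse the integer after it. */
bool meminfo_get(const char *text, const char *key, long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            return sscanf(line + klen + 1, "%ld", out) == 1;
        }
        line = strchr(line, '\n');
        if (line != NULL) {
            line++; /* move to the start of the next line */
        }
    }
    return false;
}

/* Hypothetical helper: huge pages allocated or reserved, or -1 if a
 * field is missing. Reserved pages are still counted in HugePages_Free. */
long hugepages_committed(const char *text)
{
    long total, free_pages, rsvd;

    if (!meminfo_get(text, "HugePages_Total", &total) ||
        !meminfo_get(text, "HugePages_Free", &free_pages) ||
        !meminfo_get(text, "HugePages_Rsvd", &rsvd)) {
        return -1;
    }
    return total - free_pages + rsvd;
}
```

Applied to the snapshots in the report, this gives 16 after object_add, 16 after device_add and after device_del, and 0 after object_del, which is the release back to the pool that the patch restores. Reading the file into a buffer with fopen()/fread() before calling these helpers is left to the caller.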
* Re: [Qemu-devel] [PATCH 1/2] mmap-alloc: unfold qemu_ram_mmap()
From: Balamuruhan S @ 2019-02-01 13:44 UTC
To: Murilo Opsfelder Araujo; +Cc: qemu-devel
On Wed, Jan 30, 2019 at 09:36:04PM -0200, Murilo Opsfelder Araujo wrote:
> Unfold parts of qemu_ram_mmap() for readability: move declarations to
> the top and keep architecture-specific code in the ifdef-else blocks.
> No change in the function's behaviour.
>
> Give ptr and ptr1 meaningful names:
> ptr -> guardptr : pointer to the PROT_NONE guard region
> ptr1 -> ptr : pointer to the mapped memory returned to caller
>
> Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com>
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Tested-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> ---
> util/mmap-alloc.c | 53 ++++++++++++++++++++++++++++++-----------------
> 1 file changed, 34 insertions(+), 19 deletions(-)
>
> diff --git a/util/mmap-alloc.c b/util/mmap-alloc.c
> index fd329eccd8..f71ea038c8 100644
> --- a/util/mmap-alloc.c
> +++ b/util/mmap-alloc.c
> @@ -77,11 +77,19 @@ size_t qemu_mempath_getpagesize(const char *mem_path)
>
> void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> {
> + int flags;
> + int guardfd;
> + size_t offset;
> + size_t total;
> + void *guardptr;
> + void *ptr;
> +
> /*
> * Note: this always allocates at least one extra page of virtual address
> * space, even if size is already aligned.
> */
> - size_t total = size + align;
> + total = size + align;
> +
> #if defined(__powerpc64__) && defined(__linux__)
> /* On ppc64 mappings in the same segment (aka slice) must share the same
> * page size. Since we will be re-allocating part of this segment
> @@ -91,16 +99,22 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> * We do this unless we are using the system page size, in which case
> * anonymous memory is OK.
> */
> - int anonfd = fd == -1 || qemu_fd_getpagesize(fd) == getpagesize() ? -1 : fd;
> - int flags = anonfd == -1 ? MAP_ANONYMOUS : MAP_NORESERVE;
> - void *ptr = mmap(0, total, PROT_NONE, flags | MAP_PRIVATE, anonfd, 0);
> + flags = MAP_PRIVATE;
> + if (fd == -1 || qemu_fd_getpagesize(fd) == getpagesize()) {
> + guardfd = -1;
> + flags |= MAP_ANONYMOUS;
> + } else {
> + guardfd = fd;
> + flags |= MAP_NORESERVE;
> + }
> #else
> - void *ptr = mmap(0, total, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> + guardfd = -1;
> + flags = MAP_PRIVATE | MAP_ANONYMOUS;
> #endif
> - size_t offset;
> - void *ptr1;
>
> - if (ptr == MAP_FAILED) {
> + guardptr = mmap(0, total, PROT_NONE, flags, guardfd, 0);
> +
> + if (guardptr == MAP_FAILED) {
> return MAP_FAILED;
> }
>
> @@ -108,19 +122,20 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> /* Always align to host page size */
> assert(align >= getpagesize());
>
> - offset = QEMU_ALIGN_UP((uintptr_t)ptr, align) - (uintptr_t)ptr;
> - ptr1 = mmap(ptr + offset, size, PROT_READ | PROT_WRITE,
> - MAP_FIXED |
> - (fd == -1 ? MAP_ANONYMOUS : 0) |
> - (shared ? MAP_SHARED : MAP_PRIVATE),
> - fd, 0);
> - if (ptr1 == MAP_FAILED) {
> - munmap(ptr, total);
> + flags = MAP_FIXED;
> + flags |= fd == -1 ? MAP_ANONYMOUS : 0;
> + flags |= shared ? MAP_SHARED : MAP_PRIVATE;
> + offset = QEMU_ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
> +
> + ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE, flags, fd, 0);
> +
> + if (ptr == MAP_FAILED) {
> + munmap(guardptr, total);
> return MAP_FAILED;
> }
>
> if (offset > 0) {
> - munmap(ptr, offset);
> + munmap(guardptr, offset);
> }
>
> /*
> @@ -129,10 +144,10 @@ void *qemu_ram_mmap(int fd, size_t size, size_t align, bool shared)
> */
> total -= offset;
> if (total > size + getpagesize()) {
> - munmap(ptr1 + size + getpagesize(), total - size - getpagesize());
> + munmap(ptr + size + getpagesize(), total - size - getpagesize());
> }
>
> - return ptr1;
> + return ptr;
> }
>
> void qemu_ram_munmap(void *ptr, size_t size)
> --
> 2.20.1
>
>
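For readers following the renamed variables (guardptr, ptr), here is a minimal standalone sketch of the reserve-then-overlay technique the patched qemu_ram_mmap() uses. It reserves size + align bytes of PROT_NONE address space, places the aligned read/write mapping inside that reservation with MAP_FIXED, unmaps the misaligned head, and keeps one guard page after the usable area. This is not QEMU code. It assumes anonymous memory only (no fd, no hugetlbfs, no ppc64 branch) and Linux/glibc MAP_ANONYMOUS. The names ram_mmap_sketch(), ram_munmap_sketch() and ALIGN_UP() are hypothetical stand-ins; ALIGN_UP() does the same round-up arithmetic as QEMU_ALIGN_UP().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS with glibc under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round n up to a multiple of m (same arithmetic as QEMU_ALIGN_UP). */
#define ALIGN_UP(n, m) ((((n) + (m) - 1) / (m)) * (m))

/* Hypothetical sketch, not QEMU's qemu_ram_mmap(): anonymous memory only. */
static void *ram_mmap_sketch(size_t size, size_t align)
{
    size_t pagesize = (size_t)getpagesize();
    size_t total = size + align; /* always at least one extra page of VA */
    size_t offset;
    char *guardptr;
    void *ptr;

    /* Assumptions of this sketch: size and align are page multiples. */
    assert(align >= pagesize && align % pagesize == 0);
    assert(size > 0 && size % pagesize == 0);

    /* 1. Reserve address space only; nothing in it is accessible yet. */
    guardptr = mmap(NULL, total, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guardptr == MAP_FAILED) {
        return MAP_FAILED;
    }

    /* 2. Overlay the usable area at the first align-aligned address. */
    offset = ALIGN_UP((uintptr_t)guardptr, align) - (uintptr_t)guardptr;
    ptr = mmap(guardptr + offset, size, PROT_READ | PROT_WRITE,
               MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        munmap(guardptr, total);
        return MAP_FAILED;
    }

    /* 3. Drop the unused, misaligned head of the reservation. */
    if (offset > 0) {
        munmap(guardptr, offset);
    }

    /* 4. Keep one PROT_NONE guard page after the area; drop the rest. */
    total -= offset;
    if (total > size + pagesize) {
        munmap((char *)ptr + size + pagesize, total - size - pagesize);
    }
    return ptr;
}

/* Release the usable area together with its trailing guard page. */
static void ram_munmap_sketch(void *ptr, size_t size)
{
    munmap(ptr, size + (size_t)getpagesize());
}
```

A caller checks the result against MAP_FAILED, uses the first size bytes, and releases the area with ram_munmap_sketch(), which also unmaps the trailing guard page. The page-multiple asserts are assumptions of this sketch. Because guardptr is page-aligned and align is a page multiple, they ensure that every address and length passed to munmap() is page-aligned.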
* Re: [Qemu-devel] [PATCH 0/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64
2019-01-30 23:36 [Qemu-devel] [PATCH 0/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64 Murilo Opsfelder Araujo
2019-01-30 23:36 ` [Qemu-devel] [PATCH 1/2] mmap-alloc: unfold qemu_ram_mmap() Murilo Opsfelder Araujo
2019-01-30 23:36 ` [Qemu-devel] [PATCH 2/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64 Murilo Opsfelder Araujo
@ 2019-02-04 0:08 ` David Gibson
2019-02-04 14:27 ` Murilo Opsfelder Araujo
2 siblings, 1 reply; 9+ messages in thread
From: David Gibson @ 2019-02-04 0:08 UTC (permalink / raw)
To: Murilo Opsfelder Araujo
Cc: qemu-devel, qemu-ppc, Cao jin, Fabiano Rosas, Greg Kurz,
Michael S . Tsirkin, Paolo Bonzini, Peter Crosthwaite,
Richard Henderson, mopsfelder
On Wed, Jan 30, 2019 at 09:36:03PM -0200, Murilo Opsfelder Araujo wrote:
> The first patch unfolds parts of qemu_ram_mmap() to make it clearer.
> No changes in the function behaviour.
>
> The second one fixes the alignment of the length given to munmap().
>
> I am pretty sure there is room for improvement, so I would love to
> hear your feedback.
>
> Thank you!
Applied to ppc-for-4.0.
>
> Murilo Opsfelder Araujo (2):
> mmap-alloc: unfold qemu_ram_mmap()
> mmap-alloc: fix hugetlbfs misaligned length in ppc64
>
> exec.c | 4 +--
> include/qemu/mmap-alloc.h | 2 +-
> util/mmap-alloc.c | 73 ++++++++++++++++++++++++++-------------
> util/oslib-posix.c | 2 +-
> 4 files changed, 53 insertions(+), 28 deletions(-)
>
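The cover letter says the second patch fixes the alignment of the length given to munmap(). The arithmetic behind that can be illustrated with the trim step from the patch 1 diff. On ppc64, when fd is a file whose page size differs from getpagesize() (for example, on hugetlbfs), the diff maps the PROT_NONE reservation through guardfd = fd, so the reservation uses the larger page size. The trim, however, still keeps a guard of only getpagesize() bytes. The sketch below uses example values only: a 64 KiB system page, 16 MiB huge pages, size = 256 MiB, align = 32 MiB and offset = 0. tail_len() and is_multiple() are hypothetical helpers. Sizing the guard by the backing page size is shown only as one way to keep the length aligned; it is not a copy of patch 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: bytes handed to the final munmap() by the trim
 * step in qemu_ram_mmap(), i.e. (size + align - offset) - size - guard,
 * where 'guard' is the guard kept after the usable area.
 * Returns 0 when nothing would be unmapped.
 */
static size_t tail_len(size_t size, size_t align, size_t offset, size_t guard)
{
    assert(offset < align);
    size_t total = size + align - offset;
    return total > size + guard ? total - size - guard : 0;
}

/* Hypothetical helper: is 'len' a whole number of 'pagesize' pages? */
static bool is_multiple(size_t len, size_t pagesize)
{
    return len % pagesize == 0;
}
```

With these example values, a 64 KiB guard leaves a tail of 32 MiB - 64 KiB = 33,488,896 bytes, which is not a whole number of 16 MiB pages. A 16 MiB guard leaves exactly 16 MiB.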
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] [PATCH 0/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64
2019-02-04 0:08 ` [Qemu-devel] [PATCH 0/2] " David Gibson
@ 2019-02-04 14:27 ` Murilo Opsfelder Araujo
0 siblings, 0 replies; 9+ messages in thread
From: Murilo Opsfelder Araujo @ 2019-02-04 14:27 UTC (permalink / raw)
To: David Gibson
Cc: Peter Crosthwaite, Michael S . Tsirkin, Fabiano Rosas, Greg Kurz,
qemu-devel, Cao jin, qemu-ppc, mopsfelder, Paolo Bonzini,
Richard Henderson, Balamuruhan S
On Mon, Feb 04, 2019 at 11:08:05AM +1100, David Gibson wrote:
> On Wed, Jan 30, 2019 at 09:36:03PM -0200, Murilo Opsfelder Araujo wrote:
> > The first patch unfolds parts of qemu_ram_mmap() to make it clearer.
> > No changes in the function behaviour.
> >
> > The second one fixes the alignment of the length given to munmap().
> >
> > I am pretty sure there is room for improvement, so I would love to
> > hear your feedback.
> >
> > Thank you!
>
> Applied to ppc-for-4.0.
Thank you all for reviewing and testing it.
I really appreciate it!
--
Murilo
end of thread
Thread overview: 9+ messages
2019-01-30 23:36 [Qemu-devel] [PATCH 0/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64 Murilo Opsfelder Araujo
2019-01-30 23:36 ` [Qemu-devel] [PATCH 1/2] mmap-alloc: unfold qemu_ram_mmap() Murilo Opsfelder Araujo
2019-01-31 9:49 ` Greg Kurz
2019-02-01 13:44 ` Balamuruhan S
2019-01-30 23:36 ` [Qemu-devel] [PATCH 2/2] mmap-alloc: fix hugetlbfs misaligned length in ppc64 Murilo Opsfelder Araujo
2019-01-31 9:58 ` Greg Kurz
2019-02-01 13:43 ` Balamuruhan S
2019-02-04 0:08 ` [Qemu-devel] [PATCH 0/2] " David Gibson
2019-02-04 14:27 ` Murilo Opsfelder Araujo