From: Jitendra Kolhe <jitendra.kolhe@hpe.com>
To: qemu-devel@nongnu.org
Cc: pbonzini@redhat.com, peter.maydell@linaro.org, armbru@redhat.com,
	kwolf@redhat.com, eblake@redhat.com, mohan_parthasarathy@hpe.com,
	renganathan.meenakshisundaram@hpe.com, jitendra.kolhe@hpe.com
Subject: [Qemu-devel] [PATCH RFC] mem-prealloc: Reduce large guest start-up and migration time.
Date: Thu,  5 Jan 2017 12:54:02 +0530
Message-ID: <1483601042-6435-1-git-send-email-jitendra.kolhe@hpe.com>

Using the "-mem-prealloc" option for a very large guest leads to huge
guest start-up and migration times. This is because with
"-mem-prealloc", qemu touches every guest page up front (creating the
address translations) to make sure the pages are available at runtime.
virsh/libvirt seems to use the "-mem-prealloc" option by default when
the guest is configured to use huge pages. This patch maps all guest
pages in parallel by spawning multiple threads. Since the problem is
most prominent for large guests, the patch limits the change to guests
with at least 64GB of memory. For now the change is limited to the
QEMU library functions on POSIX-compliant hosts only, as we are not
sure whether the problem exists on win32. Below are some stats
collected with "-mem-prealloc" for guests configured to use huge pages.

------------------------------------------------------------------------
Idle Guest      | Start-up time | Migration time
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - single threaded (existing code)
------------------------------------------------------------------------
64 Core - 4TB   | 54m11.796s    | 75m43.843s
64 Core - 1TB   | 8m56.576s     | 14m29.049s
64 Core - 256GB | 2m11.245s     | 3m26.598s
------------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 8 threads
------------------------------------------------------------------------
64 Core - 4TB   | 5m1.027s      | 34m10.565s
64 Core - 1TB   | 1m10.366s     | 8m28.188s
64 Core - 256GB | 0m19.040s     | 2m10.148s
-----------------------------------------------------------------------
Guest stats with 2M HugePage usage - map guest pages using 16 threads
-----------------------------------------------------------------------
64 Core - 4TB   | 1m58.970s     | 31m43.400s
64 Core - 1TB   | 0m39.885s     | 7m55.289s
64 Core - 256GB | 0m11.960s     | 2m0.135s
-----------------------------------------------------------------------
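
The numbers above were collected on hugepage-backed guests. An
invocation along the following lines reproduces the setup (illustrative
only; the exact command lines used for these runs are not given in this
mail):

    qemu-system-x86_64 -smp 64 -m 1024G \
        -mem-path /dev/hugepages -mem-prealloc ...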

Signed-off-by: Jitendra Kolhe <jitendra.kolhe@hpe.com>
---
 util/oslib-posix.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 66 insertions(+), 3 deletions(-)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index f631464..a8bd7c2 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -55,6 +55,14 @@
 #include "qemu/error-report.h"
 #endif
 
+#define PAGE_TOUCH_THREAD_COUNT 8
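+/* Describes one thread's chunk of the area to be touched. */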
+typedef struct {
+    char *addr;
+    uint64_t numpages;
+    uint64_t hpagesize;
+} PageRange;
+
 int qemu_get_thread_id(void)
 {
 #if defined(__linux__)
@@ -323,6 +331,56 @@ static void sigbus_handler(int signal)
     siglongjmp(sigjump, 1);
 }
 
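+/* Write one byte in every page of the range, forcing the kernel to
+ * populate the backing pages up front. */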
+static void *do_touch_pages(void *arg)
+{
+    PageRange *range = (PageRange *)arg;
+    char *start_addr = range->addr;
+    uint64_t numpages = range->numpages;
+    uint64_t hpagesize = range->hpagesize;
+    uint64_t i = 0;
+
+    for (i = 0; i < numpages; i++) {
+        memset(start_addr + (hpagesize * i), 0, 1);
+    }
+    qemu_thread_exit(NULL);
+
+    return NULL;
+}
+
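+/* Split the area across PAGE_TOUCH_THREAD_COUNT threads; the calling
+ * thread touches the last chunk (and any remainder pages) itself. */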
+static int touch_all_pages(char *area, size_t hpagesize, size_t numpages)
+{
+    QemuThread page_threads[PAGE_TOUCH_THREAD_COUNT];
+    PageRange page_range[PAGE_TOUCH_THREAD_COUNT];
+    uint64_t numpage_per_thread, size_per_thread;
+    int i = 0, tcount = 0;
+
+    numpage_per_thread = (numpages / PAGE_TOUCH_THREAD_COUNT);
+    size_per_thread = (hpagesize * numpage_per_thread);
+    for (i = 0; i < (PAGE_TOUCH_THREAD_COUNT - 1); i++) {
+        page_range[i].addr = area;
+        page_range[i].numpages = numpage_per_thread;
+        page_range[i].hpagesize = hpagesize;
+
+        qemu_thread_create(page_threads + i, "touch_pages",
+                           do_touch_pages, (page_range + i),
+                           QEMU_THREAD_JOINABLE);
+        tcount++;
+        area += size_per_thread;
+        numpages -= numpage_per_thread;
+    }
+    for (i = 0; i < numpages; i++) {
+        memset(area + (hpagesize * i), 0, 1);
+    }
+    for (i = 0; i < tcount; i++) {
+        qemu_thread_join(page_threads + i);
+    }
+    return 0;
+}
+
 void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
 {
     int ret;
@@ -353,9 +411,14 @@ void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
         size_t hpagesize = qemu_fd_getpagesize(fd);
         size_t numpages = DIV_ROUND_UP(memory, hpagesize);
 
-        /* MAP_POPULATE silently ignores failures */
-        for (i = 0; i < numpages; i++) {
-            memset(area + (hpagesize * i), 0, 1);
+        /* touch pages in parallel for memory >= 64GB */
+        if (memory < (1ULL << 36)) {
+            /* MAP_POPULATE silently ignores failures */
+            for (i = 0; i < numpages; i++) {
+                memset(area + (hpagesize * i), 0, 1);
+            }
+        } else {
+            touch_all_pages(area, hpagesize, numpages);
         }
     }
 
-- 
1.8.3.1
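
For reference, the partitioning scheme of touch_all_pages() can be
sketched outside the QEMU tree with plain pthreads. This is
illustrative only: the QemuThread wrappers, error reporting and the
64GB gate from the patch are omitted, and the names (TOUCH_THREADS,
touch_range, touch_all) are made up for the sketch.

/* Standalone sketch of the touch_all_pages() partitioning scheme,
 * using plain pthreads instead of QEMU's QemuThread wrappers. */
#include <pthread.h>
#include <stdint.h>

#define TOUCH_THREADS 8

typedef struct {
    char *addr;        /* start of this thread's chunk */
    uint64_t numpages; /* number of pages to touch in the chunk */
    uint64_t pagesize; /* backing (huge)page size */
} PageRange;

static void *touch_range(void *arg)
{
    PageRange *r = arg;

    /* Writing one byte per page is enough to make the kernel
     * populate the backing page. */
    for (uint64_t i = 0; i < r->numpages; i++) {
        r->addr[i * r->pagesize] = 0;
    }
    return NULL;
}

static void touch_all(char *area, uint64_t pagesize, uint64_t numpages)
{
    pthread_t tid[TOUCH_THREADS];
    PageRange range[TOUCH_THREADS];
    uint64_t per_thread = numpages / TOUCH_THREADS;
    int spawned = 0;

    /* Hand equal chunks to TOUCH_THREADS - 1 worker threads. */
    for (int i = 0; i < TOUCH_THREADS - 1; i++) {
        range[i] = (PageRange){ area, per_thread, pagesize };
        if (pthread_create(&tid[i], NULL, touch_range, &range[i]) != 0) {
            break; /* the loop below picks up whatever is left */
        }
        spawned++;
        area += per_thread * pagesize;
        numpages -= per_thread;
    }
    /* Touch the last chunk, plus any remainder from the division,
     * in the calling thread. */
    for (uint64_t i = 0; i < numpages; i++) {
        area[i * pagesize] = 0;
    }
    for (int i = 0; i < spawned; i++) {
        pthread_join(tid[i], NULL);
    }
}

As in the patch, the calling thread keeps the final chunk, so the
remainder of the numpages / TOUCH_THREADS division needs no extra
bookkeeping.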

Thread overview: 10+ messages
2017-01-05  7:24 Jitendra Kolhe [this message]
2017-01-27 12:53 ` [Qemu-devel] [PATCH RFC] mem-prealloc: Reduce large guest start-up and migration time Juan Quintela
2017-01-27 13:06   ` Paolo Bonzini
2017-01-30  8:19   ` Jitendra Kolhe
2017-01-27 13:03 ` Dr. David Alan Gilbert
2017-01-30  8:32   ` Jitendra Kolhe
2017-02-07  7:44     ` Jitendra Kolhe
2017-01-27 13:26 ` Daniel P. Berrange
2017-02-02  9:35   ` Jitendra Kolhe
2017-02-03 18:59     ` Paolo Bonzini
