qemu-devel.nongnu.org archive mirror
From: Michael Tokarev <mjt@tls.msk.ru>
To: Avi Kivity <avi@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] qemu and transparent huge pages
Date: Wed, 15 Aug 2012 19:03:24 +0400
Message-ID: <502BBA3C.70506@msgid.tls.msk.ru>
In-Reply-To: <502BB1A6.7030407@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 512 bytes --]

On 15.08.2012 18:26, Avi Kivity wrote:
> On 08/15/2012 05:22 PM, Michael Tokarev wrote:
> 
>>>
>>> Please provide extra info, like the setting of
>>> /sys/kernel/mm/transparent_hugepage/enabled.
>>
>> That was it - sort of.  Default value here is enabled=madvise.
>> When setting it to always the effect finally started appearing,
>> so it is actually working.
>>
>> But can't qemu set MADV_HUGEPAGE flag too, so it works automatically?
> 
> It can and should.

Something like the attached patch?

Thanks,

/mjt
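
For reference, the setting discussed above can also be checked from a program. The sysfs file lists all modes on one line and puts the active one in square brackets, for example "always [madvise] never". What follows is only a minimal sketch: the function names thp_parse_mode() and thp_read_mode() are made up for illustration and are not part of qemu.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the bracketed (active) mode from a line such as
 * "always [madvise] never" into out.
 * Returns 0 on success, -1 if no "[mode]" is found or out is too small.
 * Hypothetical helper, not a qemu function. */
int thp_parse_mode(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    if (close == NULL) {
        return -1;
    }

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > outlen) {
        return -1;
    }
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the active THP mode from sysfs.
 * Returns 0 on success, -1 if the file cannot be read (for example on a
 * kernel built without transparent hugepages) or cannot be parsed. */
int thp_read_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL) {
        return -1;
    }
    char *ok = fgets(line, sizeof(line), f);
    fclose(f);
    return ok ? thp_parse_mode(line, out, outlen) : -1;
}
```

With enabled=madvise, thp_read_mode() would fill the buffer with "madvise", which is the case where a process only gets huge pages for regions it has marked with madvise(MADV_HUGEPAGE).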

[-- Attachment #2: 0001-mark-large-vmalloc-areas-as-MADV_HUGEPAGE-and-allow-.patch --]
[-- Type: text/x-patch, Size: 3044 bytes --]

From 705b3efb8c0cf06cbf087204fc61863c2bbb9e27 Mon Sep 17 00:00:00 2001
From: Michael Tokarev <mjt@tls.msk.ru>
Date: Wed, 15 Aug 2012 18:55:16 +0400
Subject: [PATCH] mark large vmalloc areas as MADV_HUGEPAGE and allow
 hugepages on i386

A followup to commit 36b586284e678d.

On Linux only (which supports transparent hugepages), explicitly mark
large vmalloced areas with madvise(MADV_HUGEPAGE).  The patch changes
the previous logic a bit to allow inserting the call to madvise(), but
keeps the allocation logic equivalent (and saves one call to
getpagesize() per allocation).

The patch also adds #include <sys/mman.h> to the Linux-specific part,
to get the MADV_HUGEPAGE definition.

While at it, enable transparent hugepages (alignment and the new
explicit marking with madvise()) for 32-bit x86 too - it makes good
sense for, say, 32-bit userspace on a 64-bit kernel.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
---
 oslib-posix.c |   35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/oslib-posix.c b/oslib-posix.c
index dbeb627..ab32d6d 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -35,19 +35,23 @@
 extern int daemon(int, int);
 #endif
 
-#if defined(__linux__) && defined(__x86_64__)
+#ifdef __linux__
+# include <sys/mman.h>
+
+# if defined(__x86_64__) || defined(__i386__)
    /* Use 2 MiB alignment so transparent hugepages can be used by KVM.
       Valgrind does not support alignments larger than 1 MiB,
       therefore we need special code which handles running on Valgrind. */
-#  define QEMU_VMALLOC_ALIGN (512 * 4096)
+#  define QEMU_VMALLOC_ALIGN_HUGE (512 * 4096)
 #  define CONFIG_VALGRIND
-#elif defined(__linux__) && defined(__s390x__)
+# elif defined(__s390x__)
    /* Use 1 MiB (segment size) alignment so gmap can be used by KVM. */
-#  define QEMU_VMALLOC_ALIGN (256 * 4096)
-#else
-#  define QEMU_VMALLOC_ALIGN getpagesize()
+#  define QEMU_VMALLOC_ALIGN_HUGE (256 * 4096)
+# endif
 #endif
 
+#define QEMU_VMALLOC_ALIGN getpagesize()
+
 #include "config-host.h"
 #include "sysemu.h"
 #include "trace.h"
@@ -114,7 +118,6 @@ void *qemu_memalign(size_t alignment, size_t size)
 void *qemu_vmalloc(size_t size)
 {
     void *ptr;
-    size_t align = QEMU_VMALLOC_ALIGN;
 
 #if defined(CONFIG_VALGRIND)
     if (running_on_valgrind < 0) {
@@ -125,10 +128,22 @@ void *qemu_vmalloc(size_t size)
     }
 #endif
 
-    if (size < align || running_on_valgrind) {
-        align = getpagesize();
+#ifdef QEMU_VMALLOC_ALIGN_HUGE
+    /* try to allocate as huge pages if supported and large enough */
+    if (size >= QEMU_VMALLOC_ALIGN_HUGE && !running_on_valgrind) {
+        ptr = qemu_memalign(QEMU_VMALLOC_ALIGN_HUGE, size);
+#ifdef MADV_HUGEPAGE
+#error
+        qemu_madvise(ptr, size, MADV_HUGEPAGE);
+#endif
     }
-    ptr = qemu_memalign(align, size);
+    else
+#endif
+    {
+        /* if unsupported or small, allocate pagesize-aligned */
+        ptr = qemu_memalign(QEMU_VMALLOC_ALIGN, size);
+    }
+
     trace_qemu_vmalloc(size, ptr);
     return ptr;
 }
-- 
1.7.10.4
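
The idea of the patch can be tried outside qemu with a small standalone sketch. It is an illustration under assumptions, not the qemu code: posix_memalign() and plain madvise() stand in for qemu_memalign() and qemu_madvise(), the Valgrind check is left out, and the names HUGE_ALIGN, pick_align() and alloc_maybe_huge() are invented here. Note that the hunk as posted still has an `#error` line directly under `#ifdef MADV_HUGEPAGE`, which would stop the build wherever the flag is defined; it looks like a debugging leftover, and the sketch omits it.

```c
#define _DEFAULT_SOURCE   /* expose MADV_HUGEPAGE in <sys/mman.h> on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* 2 MiB, the same value the patch uses for x86 (512 * 4096). */
#define HUGE_ALIGN ((size_t)512 * 4096)

/* Alignment chosen for a request of `size` bytes: huge-page alignment
 * for large requests, the normal page size otherwise. */
size_t pick_align(size_t size, size_t page_size)
{
    return size >= HUGE_ALIGN ? HUGE_ALIGN : page_size;
}

/* Allocate `size` bytes; large requests are 2 MiB aligned and marked
 * with MADV_HUGEPAGE when the flag is available at build time.
 * Returns NULL on allocation failure. Release with free(). */
void *alloc_maybe_huge(size_t size)
{
    void *ptr = NULL;
    size_t align = pick_align(size, (size_t)sysconf(_SC_PAGESIZE));

    if (posix_memalign(&ptr, align, size) != 0) {
        return NULL;
    }
#ifdef MADV_HUGEPAGE
    if (align == HUGE_ALIGN) {
        /* Advisory only: a failure here is not fatal, so it is ignored. */
        (void)madvise(ptr, size, MADV_HUGEPAGE);
    }
#endif
    return ptr;
}
```

Calling alloc_maybe_huge() with, say, 8 MiB should return a 2 MiB aligned block; whether the kernel then actually backs it with huge pages still depends on the transparent_hugepage settings and on available memory.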


