Re: [patch 00/35] Transparent Hugepage support #13

linux-mm.kvack.org archive mirror

From: Andrea Arcangeli <aarcange@redhat.com>
To: linux-mm@kvack.org, akpm@linux-foundation.org
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	Izik Eidus <ieidus@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mel@csn.ul.ie>, Dave Hansen <dave@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Chris Wright <chrisw@sous-sol.org>,
	bpicco@redhat.com,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [patch 00/35] Transparent Hugepage support #13
Date: Thu, 11 Mar 2010 01:55:36 +0100
Message-ID: <20100311005536.GA5677@random.random>
In-Reply-To: <20100309193901.207868642@redhat.com>

Hello,

I ran a very basic benchmark to confirm that this is a significant
improvement for KVM.

qemu-kvm requires this patch to ensure that (gfn & pfn) &
(hpage_size-1) is zero (otherwise hugepages cannot be allocated).

---------
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

diff --git a/exec.c b/exec.c
index 9bcb4de..b5a44ad 100644
--- a/exec.c
+++ b/exec.c
@@ -2647,11 +2647,18 @@ ram_addr_t qemu_ram_alloc(ram_addr_t size)
                                 PROT_EXEC|PROT_READ|PROT_WRITE,
                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
 #else
+#if TARGET_PAGE_BITS == TARGET_HPAGE_BITS
         new_block->host = qemu_vmalloc(size);
+#else
+	new_block->host = qemu_memalign(1 << TARGET_HPAGE_BITS, size);
+#endif
 #endif
 #ifdef MADV_MERGEABLE
         madvise(new_block->host, size, MADV_MERGEABLE);
 #endif
+#ifdef MADV_HUGEPAGE
+        madvise(new_block->host, size, MADV_HUGEPAGE);
+#endif
     }
     new_block->offset = last_ram_offset;
     new_block->length = size;
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index b64bd02..664655d 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -891,6 +891,7 @@ uint64_t cpu_get_tsc(CPUX86State *env);
 #define X86_DUMP_CCOP 0x0002 /* dump qemu flag cache */
 
 #define TARGET_PAGE_BITS 12
+#define TARGET_HPAGE_BITS (TARGET_PAGE_BITS+9)
 
 #define cpu_init cpu_x86_init
 #define cpu_exec cpu_x86_exec
---------
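The userland side of the patch boils down to two steps: allocate guest
RAM on a 2MB boundary (TARGET_HPAGE_BITS = 12 + 9 = 21 bits on x86) and
hint the kernel with MADV_HUGEPAGE. Below is a minimal standalone sketch
of the same idea, assuming a Linux host. posix_memalign() stands in for
qemu's qemu_memalign(), and alloc_guest_ram() and same_hpage_offset()
are hypothetical helpers, not qemu or kernel functions. The alignment
check is written as the "same offset inside a hugepage" test of the kind
KVM applies per memory slot; treat it as an illustration of the
requirement, not as the exact check in this patch set.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* x86: 4KB base pages (12 bits) and 2MB hugepages (12 + 9 bits),
 * mirroring TARGET_PAGE_BITS / TARGET_HPAGE_BITS from the patch. */
#define PAGE_BITS  12
#define HPAGE_BITS (PAGE_BITS + 9)
#define HPAGE_SIZE ((size_t)1 << HPAGE_BITS)

/* Hypothetical helper: allocate guest RAM on a hugepage boundary and,
 * where the headers define it, ask the kernel to back it with THP.
 * Returns NULL on allocation failure. */
static void *alloc_guest_ram(size_t size)
{
    void *p = NULL;

    if (posix_memalign(&p, HPAGE_SIZE, size) != 0)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Only a hint: failure (e.g. a kernel without THP) is not fatal. */
    (void)madvise(p, size, MADV_HUGEPAGE);
#endif
    return p;
}

/* Hypothetical helper: true when a guest frame number and the host
 * frame backing it sit at the same offset inside a 2MB hugepage, so a
 * single large mapping can cover both. */
static bool same_hpage_offset(uint64_t gfn, uint64_t host_pfn)
{
    const uint64_t pages_per_hpage = (uint64_t)1 << (HPAGE_BITS - PAGE_BITS);

    /* The mask trick below only works for a power-of-two page count. */
    assert((pages_per_hpage & (pages_per_hpage - 1)) == 0);
    return ((gfn ^ host_pfn) & (pages_per_hpage - 1)) == 0;
}
```

Aligning the allocation is what makes the offsets line up: when the RAM
block starts on a 2MB boundary, guest physical address 0 falls at offset
0 of a host hugepage, and every later guest frame keeps the same offset.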

I also made a one-line change to the kvm patch to use PageTransCompound
instead of PageHead (the former is also compiled away in 32-bit kvm
builds).

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -470,6 +470,15 @@ static int host_mapping_level(struct kvm
 
 	page_size = kvm_host_page_size(kvm, gfn);
 
+	/* check for transparent hugepages */
+	if (page_size == PAGE_SIZE) {
+		struct page *page = gfn_to_page(kvm, gfn);
+
+		if (!is_error_page(page) && PageTransCompound(page))
+			page_size = KVM_HPAGE_SIZE(2);
+		kvm_release_page_clean(page);
+	}
+
 	for (i = PT_PAGE_TABLE_LEVEL;
 	     i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
 		if (page_size >= KVM_HPAGE_SIZE(i))
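To show what the hunk changes, here is a userspace model of the level
selection in host_mapping_level(). It is a sketch under stated
assumptions: the macros are redefined locally rather than taken from
kernel headers, KVM_NR_PAGE_SIZES is assumed to be 2 (4KB and 2MB, as on
x86 at the time), the loop body past the context lines shown above
(`ret = i; else break;`) is assumed, and mapping_level() with its
trans_compound flag is a hypothetical stand-in for the gfn_to_page() +
PageTransCompound() lookup.

```c
#include <assert.h>

/* Local stand-ins for the kernel macros (assumed values for x86). */
#define PAGE_SHIFT           12
#define PAGE_SIZE            (1UL << PAGE_SHIFT)
#define PT_PAGE_TABLE_LEVEL  1
#define KVM_NR_PAGE_SIZES    2
#define KVM_HPAGE_SHIFT(l)   (PAGE_SHIFT + ((l) - 1) * 9)
#define KVM_HPAGE_SIZE(l)    (1UL << KVM_HPAGE_SHIFT(l))

/* page_size: size of the host mapping (e.g. reported for hugetlbfs);
 * trans_compound: nonzero if the backing page is part of a THP. */
static int mapping_level(unsigned long page_size, int trans_compound)
{
    int i, ret = PT_PAGE_TABLE_LEVEL;

    /* The patch's addition: a 4KB mapping whose page is really part of
     * a transparent hugepage is treated as a 2MB mapping. */
    if (page_size == PAGE_SIZE && trans_compound)
        page_size = KVM_HPAGE_SIZE(2);

    /* Pick the largest level whose page size fits in the host mapping. */
    for (i = PT_PAGE_TABLE_LEVEL;
         i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
        if (page_size >= KVM_HPAGE_SIZE(i))
            ret = i;
        else
            break;
    }
    return ret;
}
```

Without the THP check, an anonymous 2MB page would still be reported as
PAGE_SIZE and the loop would stop at level 1, so the shadow/NPT tables
would map it with 4KB entries.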



This is a kernel build in a 2.6.31 guest, on a 2.6.34-rc1 host. KVM
was run with "-drive cache=on,if=virtio,boot=on -smp 4 -m 2g -vnc :0"
(the host has 4G of RAM). The CPU is a Phenom (not II) with NPT (4
cores, 1 die). All reads are served from the host cache, and the CPU
overhead of the I/O is reduced thanks to virtio. The workload is just
"make clean >/dev/null; time make -j20 >/dev/null". Results were copied
by hand because I was logged in through VNC.

real 4m12.498s
user 14m28.106s
sys 1m26.721s

real 4m12.000s
user 14m27.850s
sys 1m25.729s

After the benchmark:

grep Anon /proc/meminfo 
AnonPages:        121300 kB
AnonHugePages:   1007616 kB
cat /debugfs/kvm/largepages 
2296

1.6G free in the guest and 1.5G free in the host.
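The AnonHugePages figure converts directly into a hugepage count:
1007616 kB / 2048 kB = 492 anonymous 2MB pages. A small parsing sketch
for pulling such a field out of /proc/meminfo text follows;
meminfo_kb() is a hypothetical helper, and it works on a buffer so the
caller decides how to read the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the kB value of one /proc/meminfo field
 * (e.g. "AnonHugePages") found in text, or -1 if it is missing. */
static long meminfo_kb(const char *text, const char *field)
{
    size_t len = strlen(field);
    const char *line = text;

    while (line && *line) {
        long kb;

        /* Match "Field:" exactly, then let %ld skip the padding. */
        if (strncmp(line, field, len) == 0 && line[len] == ':' &&
            sscanf(line + len + 1, "%ld", &kb) == 1)
            return kb;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}
```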

Then on host:

# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/khugepaged/enabled
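To confirm a setting took effect, the file can be read back: sysfs
lists the selectable modes with the active one in brackets. The sketch
below assumes the "always madvise [never]" layout used by current
kernels for this file (the option list in this patch revision may
differ), and thp_active_mode() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the bracketed (active) mode from the
 * contents of a THP sysfs "enabled" file into out.
 * Returns 0 on success, -1 if no bracketed mode is found or it does
 * not fit in outlen bytes (including the terminating NUL). */
static int thp_active_mode(const char *contents, char *out, size_t outlen)
{
    const char *open = strchr(contents, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t n;

    if (!open || !close)
        return -1;
    n = (size_t)(close - open - 1);
    if (n + 1 > outlen)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}
```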

Then I restarted the VM and re-ran the same workload:

real 4m25.040s
user 15m4.665s
sys 1m50.519s

real 4m29.653s
user 15m8.637s
sys 1m49.631s

(The guest kernel was not so recent and had no transparent hugepage
support, because according to /proc/meminfo gcc normally won't take
advantage of hugepages anyway; so I made the comparison with a distro
guest kernel built with the usual .config I use in kvm guests.)

So the guest compiles the kernel 6% faster with hugepages, and the
results are trivially reproducible and stable enough (especially with
hugepages enabled; without them the time varies from 4m24s to 4m30s, as
I tried a few more times without hugepages in NPT when userland wasn't
patched yet...).
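The 6% figure can be reproduced from the two pairs of "real" times
above by averaging each pair and taking the ratio. A minimal sketch;
the helper names are hypothetical and the speedup definition
(t_slow / t_fast - 1) is one common convention, assumed here.

```c
#include <assert.h>

/* Convert a time(1) "real" value given as minutes + seconds. */
static double to_seconds(int minutes, double seconds)
{
    return minutes * 60.0 + seconds;
}

/* Mean of two runs. */
static double mean2(double a, double b)
{
    return (a + b) / 2.0;
}

/* Percent speedup of the faster configuration over the slower one:
 * (t_slow / t_fast - 1) * 100. */
static double speedup_pct(double t_fast, double t_slow)
{
    assert(t_fast > 0.0);
    return (t_slow / t_fast - 1.0) * 100.0;
}
```

With hugepages the mean is 252.249s and without them 267.3465s, which
gives a speedup of just under 6%. Measured the other way, as the
reduction in wall-clock time, the same numbers give about 5.6%.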

Below is another test that takes advantage of hugepages in the guest
too, running the same 2.6.34-rc1 kernel with transparent hugepage
support in both host and guest. (This really shows the power of the
KVM design: we boost the hypervisor and get a double boost for guest
applications.)
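One way to see why hugepages help on both sides is the usual
back-of-the-envelope cost model for a two-dimensional (NPT/EPT) page
walk, stated here as a simplified assumption that ignores TLB hits and
page-walk caches. With an n-level guest page table and an m-level host
(nested) page table, each of the n guest page-table pointers plus the
final guest physical address needs an m-step host walk, on top of the n
guest page-table reads. A 2MB mapping removes one level on the side
that uses it:

```latex
\begin{aligned}
\mathrm{refs}(n, m) &= \underbrace{(n+1)\,m}_{\text{host walks}}
                     + \underbrace{n}_{\text{guest PTE reads}}
                     = nm + n + m = (n+1)(m+1) - 1 \\
\mathrm{refs}(4, 4) &= 24, \qquad
\mathrm{refs}(4, 3) = \mathrm{refs}(3, 4) = 19, \qquad
\mathrm{refs}(3, 3) = 15
\end{aligned}
```

Under this model, hugepages in either the host or the guest shorten a
worst-case walk from 24 to 19 memory references, and hugepages in both
shorten it to 15.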

Workload: time dd if=/dev/zero of=/dev/null bs=128M count=100

Host hugepage no guest: 3.898
Host hugepage guest hugepage: 3.966 (-1.17%)
Host no hugepage no guest: 4.088 (-4.87%)
Host hugepage guest no hugepage: 4.312 (-10.1%)
Host no hugepage guest hugepage: 4.388 (-12.5%)
Host no hugepage guest no hugepage: 4.425 (-13.5%)

Workload: time dd if=/dev/zero of=/dev/null bs=4M count=1000

Host hugepage no guest: 1.207
Host hugepage guest hugepage: 1.245 (-3.14%)
Host no hugepage no guest: 1.261 (-4.47%)
Host hugepage guest no hugepage: 1.323 (-9.61%)
Host no hugepage guest hugepage: 1.371 (-13.5%)
Host no hugepage guest no hugepage: 1.398 (-15.8%)

I have no local EPT system to test on, so I may run these later over
VPN on some large EPT system (and surely there are better benchmarks
than a silly dd... but this is a start, and it shows that even basic
workloads get the boost).

The above is basically "home workstation/laptop" coverage. I (partly)
intentionally ran these on a system with a ~$100 CPU and a ~$50
motherboard, to show the absolute worst case and to be sure that 100%
of home end users (running KVM) will get a measurable advantage from
this effort.

On huge systems, the percentage boost is of course expected to be much
bigger than in the home-workstation test above.


