Date: Thu, 11 Mar 2010 17:05:05 +0100
From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support
Message-ID: <20100311160505.GG5677@random.random>
References: <20100311151427.GE5677@random.random> <4B9911B0.5000302@redhat.com>
In-Reply-To: <4B9911B0.5000302@redhat.com>
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: qemu-devel@nongnu.org

On Thu, Mar 11, 2010 at 05:52:16PM +0200, Avi Kivity wrote:
> That is a little wasteful. How about a hint to mmap() requesting proper
> alignment (MAP_HPAGE_ALIGN)?

So you suggest adding a new kernel feature to mmap? I'm not sure it's worth it, considering it would also increase the number of vmas, because it would have to leave a hole.
Wasting 2M-4k of virtual memory is likely cheaper than having one more vma in the rbtree for every page fault. So I think it's better to just malloc and adjust ourselves to the next aligned offset, which is done in userland by qemu_memalign I think. What we could ask the kernel is the HPAGE_SIZE. Also, thinking a bit more about it, what we really care about is the HOST_HPAGE_SIZE. That said, I doubt it makes a lot of difference for kvm, and this only changes the kvm path. I'm open to suggestions of where to get the HPAGE_SIZE from and how to call it...

> Failing that, modify qemu_memalign() to trim excess memory.
>
> Come to think of it, posix_memalign() needs to do that (but doesn't).

It's hard to tell because of the amount of #ifdefs in .c files, but it seems to be using posix_memalign. If we don't touch these additional pages allocated and there's no transparent hugepage support in the kernel, you won't waste any more memory, and fewer vmas will be generated this way than with a kernel option to reduce the virtual memory waste. Basically the idea is to waste virtual memory to avoid wasting cpu. In short, we should make sure it only wastes virtual memory...