From: Paul Brook
Subject: Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support
Date: Fri, 12 Mar 2010 11:36:33 +0000
Message-Id: <201003121136.33916.paul@codesourcery.com>
In-Reply-To: <20100311184926.GJ5677@random.random>
To: qemu-devel@nongnu.org
Cc: Andrea Arcangeli, Avi Kivity

> On Thu, Mar 11, 2010 at 05:55:10PM +0000, Paul Brook wrote:
> > sysconf(_SC_HUGEPAGESIZE); would seem to be the obvious answer.
>
> There's not just one hugepage size

We only have one madvise flag...

> and that thing doesn't exist yet, plus it'd require mangling glibc
> too. If it existed I could use it, but I think this is better:
>
> $ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
> 2097152

Is "pmd" x86 specific?

> > If the allocation size is not a multiple of the preferred alignment,
> > then you probably lose either way, and we shouldn't be requesting
> > increased alignment.
>
> That's probably a good idea. Also note, if we were to allocate the
> 0-640k and 1m-end regions separately, for NPT to work we'd need to
> start the second block misaligned, at a 1m address. So maybe I should
> move the alignment out of qemu_ram_alloc and have it in the caller?

I think the only viable solution if you care about EPT/NPT is to not do
that. With your current code the 1m-end region will be misaligned - your
code allocates it on a 2M boundary. I suspect you actually want
(base % 2M) == 1M. Aligning on a 1M boundary will only DTRT half the
time.

> > I wouldn't be surprised if putting the start of guest ram on a large
> > TLB entry was a win. Your guest kernel often lives there!
>
> Yep, that's easy to handle with the hpage_pmd_size ;).

But that's only going to happen if you align the allocation.

> > Assuming we're allocating in large chunks, I doubt an extra hugepage
> > worth of VMA is a big issue.
> >
> > Either way I'd argue that this isn't something qemu should have to
> > care about, and is actually a bug in posix_memalign.
>
> Hmm, the last is a weird claim, considering posix_memalign gets an
> explicit alignment parameter and surely can't choose what alignment to
> use. We can argue about the kernel side having to align automatically,
> but again, if it did that, it'd generate unnecessary vma holes which we
> don't want.
It can't choose what alignment to use, but it can (should?) choose how
to achieve that alignment.

Paul
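
As a side note on the sysfs probe Andrea shows above: a minimal C sketch
of reading /sys/kernel/mm/transparent_hugepage/hpage_pmd_size, where the
function name and the _SC_PAGESIZE fallback are illustrative assumptions
rather than anything from the patch under discussion:

    /* Sketch: read the transparent hugepage (PMD) size from sysfs, as
     * suggested in the thread above. Falls back to the normal page
     * size on kernels without THP. Names are illustrative, not QEMU
     * code. */
    #include <stdio.h>
    #include <unistd.h>

    static long transparent_hugepage_size(void)
    {
        const char *path =
            "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size";
        FILE *f = fopen(path, "r");
        long size = 0;

        if (f) {
            if (fscanf(f, "%ld", &size) != 1) {
                size = 0;
            }
            fclose(f);
        }
        return size > 0 ? size : sysconf(_SC_PAGESIZE);
    }

On a kernel with Andrea's patch this would return 2097152 (2M), matching
the cat output quoted in the thread.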
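
On the closing point - that an allocator can choose how to achieve a
requested alignment - the usual approach is to over-allocate with mmap
and unmap the unused head and tail, handing the slack back to the kernel
instead of leaving the vma holes mentioned above. The same trick also
covers the misaligned (base % 2M) == 1M placement discussed earlier. A
hedged sketch, assuming size, align and offset are all page-size
multiples and offset < align; mmap_aligned is an invented name, not QEMU
or kernel code:

    /* Sketch: allocate size bytes such that (base % align) == offset,
     * by over-allocating and trimming. Assumes size, align and offset
     * are multiples of the page size and offset < align. */
    #define _GNU_SOURCE
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    static void *mmap_aligned(size_t size, size_t align, size_t offset)
    {
        size_t total = size + align;
        uint8_t *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        size_t pre, post;
        uint8_t *base;

        if (p == MAP_FAILED) {
            return NULL;
        }
        /* Distance to the first address >= p that is congruent to
         * offset modulo align. */
        pre = (offset + align - (uintptr_t)p % align) % align;
        base = p + pre;
        post = total - pre - size;

        /* Return the unused head and tail to the kernel so no
         * permanent vma hole is left behind. */
        if (pre) {
            munmap(p, pre);
        }
        if (post) {
            munmap(base + size, post);
        }
        return base;
    }

For the 1m-end region above, something like
mmap_aligned(ram_size - 0x100000, 2 * 1024 * 1024, 0x100000) would put
guest physical address 2M on a host 2M boundary, which is exactly the
(base % 2M) == 1M condition Paul derives.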