From: Paul Brook
Subject: Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support
Date: Fri, 12 Mar 2010 11:36:33 +0000
Message-Id: <201003121136.33916.paul@codesourcery.com>
In-Reply-To: <20100311184926.GJ5677@random.random>
To: qemu-devel@nongnu.org
Cc: Andrea Arcangeli, Avi Kivity

> On Thu, Mar 11, 2010 at 05:55:10PM +0000, Paul Brook wrote:
> > sysconf(_SC_HUGEPAGESIZE); would seem to be the obvious answer.
>
> There's not just one hugepage size

We only have one madvise flag...

> and that thing doesn't exist yet, plus it'd require mangling glibc
> too. If it existed I could use it, but I think this is better:
>
> $ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
> 2097152

Is "pmd" x86 specific?

> > If the allocation size is not a multiple of the preferred alignment,
> > then you probably lose either way, and we shouldn't be requesting
> > increased alignment.
>
> That's probably a good idea. Also note, if we were to allocate the
> 0-640k and 1m-end regions separately, for NPT to work we'd need to
> start the second block misaligned, at a 1m address. So maybe I should
> move the alignment out of qemu_ram_alloc and have it in the caller?

I think the only viable solution if you care about EPT/NPT is to not do
that. With your current code the 1m-end region will be misaligned - your
code allocates it on a 2M boundary. I suspect you actually want
(base % 2M) == 1M. Aligning on a 1M boundary will only DTRT half the
time.

> > I wouldn't be surprised if putting the start of guest ram on a large
> > TLB entry was a win. Your guest kernel often lives there!
>
> Yep, that's easy to handle with the hpage_pmd_size ;).

But that's only going to happen if you align the allocation.

> > Assuming we're allocating in large chunks, I doubt an extra hugepage
> > worth of VMA is a big issue.
> >
> > Either way I'd argue that this isn't something qemu should have to
> > care about, and is actually a bug in posix_memalign.
>
> Hmm, the last is a weird claim, considering posix_memalign gets an
> explicit alignment parameter and surely can't choose what alignment to
> use. We can argue about the kernel side having to align automatically,
> but again, if it did that, it'd generate unnecessary vma holes which we
> don't want.
It can't choose what alignment to use, but it can (should?) choose how
to achieve that alignment.

Paul
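
As a side note on the sysfs probe Andrea shows above: a minimal C sketch
of reading /sys/kernel/mm/transparent_hugepage/hpage_pmd_size, where the
function name and the _SC_PAGESIZE fallback are illustrative assumptions
rather than anything from the patch under discussion:

    /* Sketch: read the transparent hugepage (PMD) size from sysfs, as
     * suggested in the thread above. Falls back to the normal page
     * size on kernels without THP. Names are illustrative, not QEMU
     * code. */
    #include <stdio.h>
    #include <unistd.h>

    static long transparent_hugepage_size(void)
    {
        const char *path =
            "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size";
        FILE *f = fopen(path, "r");
        long size = 0;

        if (f) {
            if (fscanf(f, "%ld", &size) != 1) {
                size = 0;
            }
            fclose(f);
        }
        return size > 0 ? size : sysconf(_SC_PAGESIZE);
    }

On a kernel with Andrea's patch this would return 2097152 (2M), matching
the cat output quoted in the thread.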
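
On the closing point - that an allocator can choose how to achieve a
requested alignment - the usual approach is to over-allocate with mmap
and unmap the unused head and tail, handing the slack back to the kernel
instead of leaving the vma holes mentioned above. The same trick also
covers the misaligned (base % 2M) == 1M placement discussed earlier. A
hedged sketch, assuming size, align and offset are all page-size
multiples and offset < align; mmap_aligned is an invented name, not QEMU
or kernel code:

    /* Sketch: allocate size bytes such that (base % align) == offset,
     * by over-allocating and trimming. Assumes size, align and offset
     * are multiples of the page size and offset < align. */
    #define _GNU_SOURCE
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    static void *mmap_aligned(size_t size, size_t align, size_t offset)
    {
        size_t total = size + align;
        uint8_t *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        size_t pre, post;
        uint8_t *base;

        if (p == MAP_FAILED) {
            return NULL;
        }
        /* Distance to the first address >= p that is congruent to
         * offset modulo align. */
        pre = (offset + align - (uintptr_t)p % align) % align;
        base = p + pre;
        post = total - pre - size;

        /* Return the unused head and tail to the kernel so no
         * permanent vma hole is left behind. */
        if (pre) {
            munmap(p, pre);
        }
        if (post) {
            munmap(base + size, post);
        }
        return base;
    }

For the 1m-end region above, something like
mmap_aligned(ram_size - 0x100000, 2 * 1024 * 1024, 0x100000) would put
guest physical address 2M on a host 2M boundary, which is exactly the
(base % 2M) == 1M condition Paul derives.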