From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1NpmbX-0004Ra-4R
	for qemu-devel@nongnu.org; Thu, 11 Mar 2010 12:55:27 -0500
Received: from [199.232.76.173] (port=37805 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1NpmbW-0004R3-DU
	for qemu-devel@nongnu.org; Thu, 11 Mar 2010 12:55:26 -0500
Received: from Debian-exim by monty-python.gnu.org with spam-scanned
	(Exim 4.60) (envelope-from ) id 1NpmbV-0000y8-DO
	for qemu-devel@nongnu.org; Thu, 11 Mar 2010 12:55:26 -0500
Received: from mx20.gnu.org ([199.232.41.8]:40409)
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from ) id 1NpmbV-0000xx-59
	for qemu-devel@nongnu.org; Thu, 11 Mar 2010 12:55:25 -0500
Received: from mail.codesourcery.com ([38.113.113.100])
	by mx20.gnu.org with esmtp (Exim 4.60) (envelope-from )
	id 1NpmbT-0000P2-BP
	for qemu-devel@nongnu.org; Thu, 11 Mar 2010 12:55:23 -0500
From: Paul Brook
Subject: Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support
Date: Thu, 11 Mar 2010 17:55:10 +0000
References: <20100311151427.GE5677@random.random>
	<201003111628.04566.paul@codesourcery.com>
	<20100311164642.GI5677@random.random>
In-Reply-To: <20100311164642.GI5677@random.random>
MIME-Version: 1.0
Content-Type: Text/Plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <201003111755.10914.paul@codesourcery.com>
List-Id: qemu-devel.nongnu.org
To: Andrea Arcangeli
Cc: qemu-devel@nongnu.org, Avi Kivity

> On Thu, Mar 11, 2010 at 04:28:04PM +0000, Paul Brook wrote:
> > > > +    /*
> > > > +     * Align on HPAGE_SIZE so "(gfn ^ pfn)&
> > > > +     * (HPAGE_SIZE-1) == 0" to allow KVM to take advantage
> > > > +     * of hugepages with NPT/EPT.
> > > > +     */
> > > > +    new_block->host = qemu_memalign(1<< TARGET_HPAGE_BITS, size);
> >
> > This should not be target dependent. i.e. it should be the host page
> > size.
>
> Yep I noticed. I'm not aware of an official way to get that
> information out of the kernel (hugepagesize in /proc/meminfo is
> dependent on hugetlbfs which in turn is not a dependency for
> transparent hugepage support) but hey I can add it myself to
> /sys/kernel/mm/transparent_hugepage/hugepage_size !

sysconf(_SC_HUGEPAGESIZE); would seem to be the obvious answer.

> > > That is a little wasteful. How about a hint to mmap() requesting
> > > proper alignment (MAP_HPAGE_ALIGN)?
> >
> > I'd kinda hope that we wouldn't need to. i.e. the host kernel is smart
> > enough to automatically align large allocations anyway.
>
> Kernel won't do that, and the main reason is to avoid creating more
> vmas, it's more efficient to waste virtual space and have userland
> allocate more than needed, than ask the kernel alignment and force it
> to create more vmas because of holes generated out of it. virtual
> memory costs nothing.

Huh. That seems unfortunate :-(

> Also khugepaged can later zero out the pte_none regions to create a
> full segment all backed by hugepages, however if we do that khugepaged
> will eat into the free memory space. At the moment I kept khugepaged a
> zero-memory-footprint thing. But I'm currently adding an option called
> collapse_unmapped to allow khugepaged to collapse unmapped pages too
> so if there are only 2/3 pages in the region before the memalign, they
> also can be mapped by a large tlb to allow qemu run faster.

I don't really understand what you're getting at here. Surely a naturally
aligned block is always going to be easier to defragment than a misaligned
block.

If the allocation size is not a multiple of the preferred alignment, then you
probably lose either way, and we shouldn't be requesting increased alignment.

> > This is probably a useful optimization regardless of KVM.
>
> HPAGE alignment is only useful with KVM because it can only payoff
> with EPT/NPT, transparent hugepage already works fine without that
> (but ok it'd be a microoptimization for the first and last few pages
> in the whole vma). This is why I made it conditional to
> kvm_enabled(). I can remove the kvm_enabled() check if you worry about
> the first and last pages in the huge anon vma.

I wouldn't be surprised if putting the start of guest ram on a large TLB entry
was a win. Your guest kernel often lives there!

> OTOH the madvise(MADV_HUGEPAGE) is surely good idea for qemu too. KVM
> normally runs on 64bit hosts, so it's no big deal if we waste 1M of
> virtual memory here and there but I thought on qemu you preferred not
> to have alignment and have the first few and last few pages in a vma
> not backed by large tlb. Ideally we should also align on hpage size if
> sizeof(long) = 8. Not sure what's the recommended way to code that
> though and it'll make it a bit more complex for little good.

Assuming we're allocating in large chunks, I doubt an extra hugepage worth of
VMA is a big issue.

Either way I'd argue that this isn't something qemu should have to care about,
and is actually a bug in posix_memalign.

Paul
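
For illustration only, here is a rough sketch of the allocation policy argued
for above: request hugepage alignment only when the size is a whole multiple
of the hugepage size, and otherwise fall back to ordinary page alignment. It
is not the patch under discussion. The names alloc_guest_ram,
is_hpage_aligned and ASSUMED_HPAGE_SIZE are made up for this example, and the
2 MiB value is an assumption (typical for x86-64); the hugepage size is passed
in as a parameter because the thread has not settled how to query it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: 2 MiB host hugepages. Hypothetical constant, not a real API. */
#define ASSUMED_HPAGE_SIZE ((size_t)2 << 20)

/* Return nonzero if p starts on a hugepage boundary (hpage: power of two). */
static int is_hpage_aligned(const void *p, size_t hpage)
{
    assert(hpage != 0 && (hpage & (hpage - 1)) == 0);
    return ((uintptr_t)p & (hpage - 1)) == 0;
}

/*
 * Hypothetical helper: allocate `size` bytes of guest RAM.
 * Hugepage alignment is requested only when `size` is a multiple of
 * `hpage`; otherwise the block is merely page aligned.
 */
static void *alloc_guest_ram(size_t size, size_t hpage)
{
    size_t align = (size_t)sysconf(_SC_PAGESIZE);
    void *p = NULL;

    if (size >= hpage && size % hpage == 0)
        align = hpage;

    if (posix_memalign(&p, align, size) != 0)
        return NULL;

#ifdef MADV_HUGEPAGE
    /* Advisory only: a failure (e.g. no THP in the kernel) is ignored. */
    if (align == hpage)
        (void)madvise(p, size, MADV_HUGEPAGE);
#endif
    return p;
}
```

A caller would use alloc_guest_ram(ram_size, ASSUMED_HPAGE_SIZE) and release
the block with free(). Whether this logic belongs in qemu or inside
posix_memalign itself is exactly the open question above.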