From: Paul Brook <paul@codesourcery.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support
Date: Thu, 11 Mar 2010 17:55:10 +0000
Message-ID: <201003111755.10914.paul@codesourcery.com>
In-Reply-To: <20100311164642.GI5677@random.random>
> On Thu, Mar 11, 2010 at 04:28:04PM +0000, Paul Brook wrote:
> > > > + /*
> > > > + * Align on HPAGE_SIZE so "(gfn ^ pfn)&
> > > > + * (HPAGE_SIZE-1) == 0" to allow KVM to take advantage
> > > > + * of hugepages with NPT/EPT.
> > > > + */
> > > > + new_block->host = qemu_memalign(1<< TARGET_HPAGE_BITS, size);
> >
> > This should not be target dependent. i.e. it should be the host page
> > size.
>
> Yep I noticed. I'm not aware of an official way to get that
> information out of the kernel (hugepagesize in /proc/meminfo is
> dependent on hugetlbfs which in turn is not a dependency for
> transparent hugepage support) but hey I can add it myself to
> /sys/kernel/mm/transparent_hugepage/hugepage_size !
sysconf(_SC_HUGEPAGESIZE); would seem to be the obvious answer.
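A minimal sketch of querying the host huge page size at run time, for illustration only. It does not rely on _SC_HUGEPAGESIZE, since that sysconf name may not be defined by the C library. Instead it reads a sysfs file; the path used here (hpage_pmd_size) is an assumption about what the running kernel exposes and differs from the hugepage_size name floated above. The helper name host_hugepage_size() is hypothetical. When the file is missing or unparsable, the sketch falls back to the base page size.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Assumed path: only present on kernels built with transparent hugepages. */
#define THP_PMD_SIZE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"

/* Hypothetical helper: returns the host huge page size in bytes,
 * or the base page size when the kernel does not report one. */
static size_t host_hugepage_size(void)
{
    long base = sysconf(_SC_PAGESIZE);
    size_t fallback = base > 0 ? (size_t)base : 4096;
    unsigned long long value = 0;
    FILE *f;

    /* Page sizes are powers of two; the alignment math depends on it. */
    assert((fallback & (fallback - 1)) == 0);

    f = fopen(THP_PMD_SIZE_PATH, "r");
    if (!f) {
        return fallback;
    }
    if (fscanf(f, "%llu", &value) != 1) {
        value = 0;
    }
    fclose(f);

    /* Reject zero or non-power-of-two values rather than trusting them. */
    if (value == 0 || (value & (value - 1)) != 0) {
        return fallback;
    }
    return (size_t)value;
}
```

The result depends on the host, so callers should treat it as a hint and keep working when only the base page size comes back.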
> > > That is a little wasteful. How about a hint to mmap() requesting
> > > proper alignment (MAP_HPAGE_ALIGN)?
> >
> > I'd kinda hope that we wouldn't need to. i.e. the host kernel is smart
> > enough to automatically align large allocations anyway.
>
> Kernel won't do that, and the main reason is to avoid creating more
> vmas, it's more efficient to waste virtual space and have userland
> allocate more than needed, than ask the kernel alignment and force it
> to create more vmas because of holes generated out of it. virtual
> memory costs nothing.
Huh. That seems unfortunate :-(
> Also khugepaged can later zero out the pte_none regions to create a
> full segment all backed by hugepages, however if we do that khugepaged
> will eat into the free memory space. At the moment I kept khugepaged a
> zero-memory-footprint thing. But I'm currently adding an option called
> collapse_unmapped to allow khugepaged to collapse unmapped pages too
> so if there are only 2/3 pages in the region before the memalign, they
> also can be mapped by a large tlb to allow qemu run faster.
I don't really understand what you're getting at here. Surely a naturally
aligned block is always going to be easier to defragment than a misaligned
block.
If the allocation size is not a multiple of the preferred alignment, then you
probably lose either way, and we shouldn't be requesting increased alignment.
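The rule in the previous paragraph, together with the condition quoted in the patch comment, can be sketched as follows. Both helper names are hypothetical, and the second function takes byte addresses rather than frame numbers, which is an assumption about how the "(gfn ^ pfn) & (HPAGE_SIZE-1) == 0" comment is meant to be read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: request huge page alignment only when the block is a
 * whole number of huge pages; otherwise keep the normal page alignment. */
static size_t pick_alignment(size_t size, size_t page_size, size_t hpage_size)
{
    assert(hpage_size != 0 && (hpage_size & (hpage_size - 1)) == 0);

    if (size >= hpage_size && (size & (hpage_size - 1)) == 0) {
        return hpage_size;
    }
    return page_size;
}

/* Hypothetical helper: true when a guest physical address and the host
 * virtual address backing it share the same offset within a huge page. */
static int hugepage_congruent(uint64_t guest_addr, uint64_t host_addr,
                              uint64_t hpage_size)
{
    return ((guest_addr ^ host_addr) & (hpage_size - 1)) == 0;
}
```

With a 2 MiB huge page, a 4 MiB block would get 2 MiB alignment under this rule, while a 3 MiB block would keep the base page alignment.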
> > This is probably a useful optimization regardless of KVM.
>
> HPAGE alignment is only useful with KVM because it can only payoff
> with EPT/NPT, transparent hugepage already works fine without that
> (but ok it'd be a microoptimization for the first and last few pages
> in the whole vma). This is why I made it conditional to
> kvm_enabled(). I can remove the kvm_enabled() check if you worry about
> the first and last pages in the huge anon vma.
I wouldn't be surprised if putting the start of guest ram on a large TLB entry
was a win. Your guest kernel often lives there!
> OTOH the madvise(MADV_HUGEPAGE) is surely good idea for qemu too. KVM
> normally runs on 64bit hosts, so it's no big deal if we waste 1M of
> virtual memory here and there but I thought on qemu you preferred not
> to have alignment and have the first few and last few pages in a vma
> not backed by large tlb. Ideally we should also align on hpage size if
> sizeof(long) = 8. Not sure what's the recommended way to code that
> though and it'll make it a bit more complex for little good.
Assuming we're allocating in large chunks, I doubt an extra hugepage's worth of
VMA is a big issue.
Either way I'd argue that this isn't something qemu should have to care about,
and is actually a bug in posix_memalign.
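As a rough sketch of what the caller ends up doing when the allocator does not pick the alignment itself: allocate with posix_memalign() at the huge page size, then pass the madvise(MADV_HUGEPAGE) hint mentioned above. The helper name alloc_guest_ram() is hypothetical and this is not the code from the patch. The hint is compiled in only where the headers define MADV_HUGEPAGE, and a failure of the hint is ignored because it is advisory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Hypothetical helper: allocate `size` bytes aligned to `hpage_size` and
 * hint that the range may be backed by transparent huge pages. */
static void *alloc_guest_ram(size_t size, size_t hpage_size)
{
    void *ptr = NULL;

    /* posix_memalign() needs a power-of-two multiple of sizeof(void *). */
    assert(hpage_size >= sizeof(void *));
    assert((hpage_size & (hpage_size - 1)) == 0);

    if (posix_memalign(&ptr, hpage_size, size) != 0) {
        return NULL;
    }

#ifdef MADV_HUGEPAGE
    /* Advisory only: the kernel may refuse, and the block still works. */
    (void)madvise(ptr, size, MADV_HUGEPAGE);
#endif

    return ptr;
}
```

The returned block is released with free(), as for any posix_memalign() allocation.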
Paul