Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support

From: Paul Brook <paul@codesourcery.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support
Date: Thu, 11 Mar 2010 17:55:10 +0000
Message-ID: <201003111755.10914.paul@codesourcery.com>
In-Reply-To: <20100311164642.GI5677@random.random>

> On Thu, Mar 11, 2010 at 04:28:04PM +0000, Paul Brook wrote:
> > > > +		/*
> > > > +		 * Align on HPAGE_SIZE so "(gfn ^ pfn)&
> > > > +		 * (HPAGE_SIZE-1) == 0" to allow KVM to take advantage
> > > > +		 * of hugepages with NPT/EPT.
> > > > +		 */
> > > > +		new_block->host = qemu_memalign(1<<  TARGET_HPAGE_BITS, size);
> >
> > This should not be target dependent. i.e. it should be the host page
> > size.
> 
> Yep I noticed. I'm not aware of an official way to get that
> information out of the kernel (hugepagesize in /proc/meminfo is
> dependent on hugetlbfs which in turn is not a dependency for
> transparent hugepage support) but hey I can add it myself to
> /sys/kernel/mm/transparent_hugepage/hugepage_size !

sysconf(_SC_HUGEPAGESIZE); would seem to be the obvious answer.
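
A minimal sketch of such a lookup, for illustration only. POSIX does not
define _SC_HUGEPAGESIZE and, to the best of this editor's knowledge, glibc
does not provide it either, so the sketch uses it only when the macro exists
and otherwise falls back to the Hugepagesize line of /proc/meminfo (which, as
the quoted text notes, depends on hugetlbfs). The helper names
parse_hugepagesize() and host_hugepage_size() are hypothetical, not QEMU
functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: find "Hugepagesize: N kB" in /proc/meminfo-style text
 * and return the size in bytes, or 0 if the field is absent. */
static size_t parse_hugepagesize(const char *meminfo)
{
    const char *key = "Hugepagesize:";
    const char *p;
    unsigned long kb;

    assert(meminfo != NULL);
    p = strstr(meminfo, key);
    if (p == NULL || sscanf(p + strlen(key), "%lu", &kb) != 1) {
        return 0;
    }
    return (size_t)kb * 1024;
}

/* Hypothetical helper: best-effort host huge page size, 0 if unknown. */
static size_t host_hugepage_size(void)
{
    char buf[8192];
    size_t n;
    FILE *f;

#ifdef _SC_HUGEPAGESIZE /* assumption: usually NOT defined, see lead-in */
    {
        long v = sysconf(_SC_HUGEPAGESIZE);
        if (v > 0) {
            return (size_t)v;
        }
    }
#endif
    f = fopen("/proc/meminfo", "r");
    if (f == NULL) {
        return 0;
    }
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_hugepagesize(buf);
}
```

A caller would treat a return value of 0 as "unknown" and fall back to the
base page size from sysconf(_SC_PAGESIZE).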
 
> > > That is a little wasteful.  How about a hint to mmap() requesting
> > > proper alignment (MAP_HPAGE_ALIGN)?
> >
> > I'd kinda hope that we wouldn't need to. i.e. the host kernel is smart
> > enough to automatically align large allocations anyway.
> 
> Kernel won't do that, and the main reason is to avoid creating more
> vmas, it's more efficient to waste virtual space and have userland
> allocate more than needed, than ask the kernel alignment and force it
> to create more vmas because of holes generated out of it. virtual
> memory costs nothing.

Huh. That seems unfortunate :-(

> Also khugepaged can later zero out the pte_none regions to create a
> full segment all backed by hugepages, however if we do that khugepaged
> will eat into the free memory space. At the moment I kept khugepaged a
> zero-memory-footprint thing. But I'm currently adding an option called
> collapse_unmapped to allow khugepaged to collapse unmapped pages too
> so if there are only 2/3 pages in the region before the memalign, they
> also can be mapped by a large tlb to allow qemu run faster.

I don't really understand what you're getting at here. Surely a naturally 
aligned block is always going to be easier to defragment than a misaligned 
block.

If the allocation size is not a multiple of the preferred alignment, then you 
probably lose either way, and we shouldn't be requesting increased alignment.
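
A sketch of that policy, for illustration: request huge-page alignment only
when the block size is a whole number of huge pages, and ordinary page
alignment otherwise. choose_alignment() and alloc_ram_block() are
hypothetical names, and the page sizes are passed in by the caller rather
than queried here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical policy helper: pick the alignment for a block of `size`
 * bytes. A `hpage_size` of 0 means the huge page size is unknown. */
static size_t choose_alignment(size_t size, size_t page_size,
                               size_t hpage_size)
{
    assert(page_size != 0);
    if (hpage_size != 0 && size != 0 && size % hpage_size == 0) {
        return hpage_size;
    }
    return page_size;
}

/* Hypothetical allocator: posix_memalign() with the chosen alignment.
 * Both page sizes are assumed to be powers of two. */
static void *alloc_ram_block(size_t size, size_t page_size,
                             size_t hpage_size)
{
    void *p = NULL;
    size_t align = choose_alignment(size, page_size, hpage_size);

    if (posix_memalign(&p, align, size) != 0) {
        return NULL;
    }
    return p;
}
```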

> > This is probably a useful optimization regardless of KVM.
> 
> HPAGE alignment is only useful with KVM because it can only payoff
> with EPT/NPT, transparent hugepage already works fine without that
> (but ok it'd be a microoptimization for the first and last few pages
> in the whole vma). This is why I made it conditional to
> kvm_enabled(). I can remove the kvm_enabled() check if you worry about
> the first and last pages in the huge anon vma.

I wouldn't be surprised if putting the start of guest ram on a large TLB entry 
was a win. Your guest kernel often lives there!

> OTOH the madvise(MADV_HUGEPAGE) is surely good idea for qemu too. KVM
> normally runs on 64bit hosts, so it's no big deal if we waste 1M of
> virtual memory here and there but I thought on qemu you preferred not
> to have alignment and have the first few and last few pages in a vma
> not backed by large tlb. Ideally we should also align on hpage size if
> sizeof(long) = 8. Not sure what's the recommended way to code that
> though and it'll make it a bit more complex for little good.

Assuming we're allocating in large chunks, I doubt an extra hugepage worth of 
VMA is a big issue.

Either way I'd argue that this isn't something qemu should have to care about, 
and is actually a bug in posix_memalign.

Paul
