From: Paul Brook <paul@codesourcery.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support
Date: Thu, 11 Mar 2010 17:55:10 +0000
Message-ID: <201003111755.10914.paul@codesourcery.com>
In-Reply-To: <20100311164642.GI5677@random.random>

> On Thu, Mar 11, 2010 at 04:28:04PM +0000, Paul Brook wrote:
> > > > +		/*
> > > > +		 * Align on HPAGE_SIZE so "(gfn ^ pfn) &
> > > > +		 * (HPAGE_SIZE-1) == 0" to allow KVM to take advantage
> > > > +		 * of hugepages with NPT/EPT.
> > > > +		 */
> > > > +		new_block->host = qemu_memalign(1 << TARGET_HPAGE_BITS, size);
> >
> > This should not be target dependent. i.e. it should be the host page
> > size.
> 
> Yep, I noticed. I'm not aware of an official way to get that
> information out of the kernel (Hugepagesize in /proc/meminfo depends
> on hugetlbfs, which in turn is not a dependency of transparent
> hugepage support), but hey, I can add it myself to
> /sys/kernel/mm/transparent_hugepage/hugepage_size!

sysconf(_SC_HUGEPAGESIZE) would seem to be the obvious answer.
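A minimal C sketch of probing the host huge page size, assuming a Linux host. It calls sysconf(_SC_HUGEPAGESIZE) only if the C library actually defines that name, and otherwise falls back to parsing the "Hugepagesize:" line of /proc/meminfo. As noted above, that line reflects the hugetlbfs default size, so treating it as the transparent hugepage size is an assumption, not a guarantee. parse_hugepagesize() and host_hugepage_size() are hypothetical helpers, not QEMU functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Parse a "Hugepagesize:  2048 kB" line out of /proc/meminfo-style text.
 * Returns the size in bytes, or -1 if the line is missing or malformed. */
static long parse_hugepagesize(const char *meminfo)
{
    assert(meminfo != NULL);
    const char *line = strstr(meminfo, "Hugepagesize:");
    long kb;

    if (line == NULL || sscanf(line, "Hugepagesize: %ld kB", &kb) != 1 || kb <= 0)
        return -1;
    return kb * 1024;
}

/* Host huge page size in bytes, or -1 if it cannot be determined. */
static long host_hugepage_size(void)
{
#ifdef _SC_HUGEPAGESIZE
    /* Only compiled if the C library happens to define this sysconf name. */
    long sz = sysconf(_SC_HUGEPAGESIZE);
    if (sz > 0)
        return sz;
#endif
    /* Fallback (assumption): the hugetlbfs default size reported in
     * /proc/meminfo is taken to match the THP size. */
    char line[256];
    long result = -1;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        result = parse_hugepagesize(line);
        if (result > 0)
            break;
    }
    fclose(f);
    return result;
}
```

Returning -1 instead of guessing a default (such as 2 MiB) lets the caller fall back to ordinary page alignment when the size is unknown.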
 
> > > That is a little wasteful.  How about a hint to mmap() requesting
> > > proper alignment (MAP_HPAGE_ALIGN)?
> >
> > I'd kinda hope that we wouldn't need to, i.e. that the host kernel is
> > smart enough to automatically align large allocations anyway.
> 
> The kernel won't do that, mainly to avoid creating more vmas. It's more
> efficient to waste virtual space and have userland allocate more than
> needed than to ask the kernel for alignment and force it to create more
> vmas because of the holes that alignment generates. Virtual memory
> costs nothing.

Huh. That seems unfortunate :-(
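The userland over-allocation Andrea describes can be sketched in C as follows, assuming Linux mmap() with MAP_ANONYMOUS: map size + align bytes, then round the start address up to the alignment. The slack is deliberately left mapped, so the region stays a single vma and untouched anonymous pages cost only virtual address space. struct aligned_region, alloc_aligned() and free_aligned() are hypothetical names, not QEMU APIs.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical: an over-allocated mapping and the aligned block inside it. */
struct aligned_region {
    void  *base;     /* what mmap() returned; this is what munmap() needs */
    size_t mapped;   /* total bytes mapped, slack included */
    void  *aligned;  /* first address aligned to the requested boundary */
};

/* Round x up to a multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(uintptr_t)(align - 1);
}

/* Map size + align bytes so an aligned block of size bytes always fits.
 * Head and tail slack stay mapped (virtual space only, one vma).
 * Returns 0 on success, -1 on failure. */
static int alloc_aligned(struct aligned_region *r, size_t size, size_t align)
{
    if (size > SIZE_MAX - align)
        return -1;

    size_t total = size + align;
    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    r->base = p;
    r->mapped = total;
    r->aligned = (void *)align_up((uintptr_t)p, align);
    return 0;
}

static int free_aligned(struct aligned_region *r)
{
    return munmap(r->base, r->mapped);
}
```

Because mmap() already returns page-aligned addresses, size + align - page_size would suffice; size + align is used here only to keep the arithmetic obvious.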

> Also, khugepaged can later zero out the pte_none regions to create a
> full segment backed entirely by hugepages; however, if we do that,
> khugepaged will eat into free memory. At the moment I've kept khugepaged
> a zero-memory-footprint thing. But I'm currently adding an option called
> collapse_unmapped to allow khugepaged to collapse unmapped pages too, so
> if there are only 2/3 pages in the region before the memalign, they can
> also be mapped by a large TLB entry to let qemu run faster.

I don't really understand what you're getting at here. Surely a naturally 
aligned block is always going to be easier to defragment than a misaligned 
block.

If the allocation size is not a multiple of the preferred alignment, then you 
probably lose either way, and we shouldn't be requesting increased alignment.
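The rule in the previous paragraph amounts to a small policy check. A hypothetical helper, assuming the huge page size and base page size are already known:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: request huge-page alignment only when the block is
 * a whole number of huge pages; otherwise use the base page size. */
static size_t choose_alignment(size_t size, size_t hpage, size_t page)
{
    assert(hpage != 0 && page != 0);
    if (size >= hpage && size % hpage == 0)
        return hpage;
    return page;
}
```

For example, with 2 MiB huge pages a 4 MiB block gets 2 MiB alignment, while a 4 MiB + 4 KiB block falls back to the base page size.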

> > This is probably a useful optimization regardless of KVM.
> 
> HPAGE alignment is only useful with KVM because it can only pay off
> with EPT/NPT; transparent hugepage support already works fine without it
> (OK, it'd be a micro-optimization for the first and last few pages
> of the whole vma). This is why I made it conditional on
> kvm_enabled(). I can remove the kvm_enabled() check if you're worried
> about the first and last pages in the huge anon vma.

I wouldn't be surprised if putting the start of guest ram on a large TLB entry 
was a win. Your guest kernel often lives there!
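The condition in the patch comment, "(gfn ^ pfn) & (HPAGE_SIZE-1) == 0", and the point about the first and last pages can both be illustrated with address arithmetic. This sketch works on byte addresses rather than frame numbers (with frame numbers the mask would be expressed in pages, not bytes); hpage_congruent() and hpage_coverage() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* A guest huge page can be backed by one host huge TLB entry only if the
 * guest-physical and host-virtual addresses have the same offset within a
 * huge page. hpage must be a power of two. */
static int hpage_congruent(uint64_t gpa, uint64_t hva, uint64_t hpage)
{
    assert(hpage != 0 && (hpage & (hpage - 1)) == 0);
    return ((gpa ^ hva) & (hpage - 1)) == 0;
}

/* Bytes of [start, start + len) covered by whole, naturally aligned huge
 * pages; the head and tail outside that range can only use small pages. */
static uint64_t hpage_coverage(uint64_t start, uint64_t len, uint64_t hpage)
{
    assert(hpage != 0 && (hpage & (hpage - 1)) == 0);
    uint64_t first = (start + hpage - 1) & ~(hpage - 1); /* round up */
    uint64_t last  = (start + len) & ~(hpage - 1);       /* round down */
    return last > first ? last - first : 0;
}
```

With 2 MiB huge pages, an 8 MiB region starting on a 2 MiB boundary is fully coverable, while the same region starting 4 KiB later yields only 6 MiB of whole huge pages; the remaining head and tail fall back to small pages.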

> OTOH, madvise(MADV_HUGEPAGE) is surely a good idea for qemu too. KVM
> normally runs on 64-bit hosts, so it's no big deal if we waste 1M of
> virtual memory here and there, but I thought that for qemu you preferred
> not to have alignment and to leave the first and last few pages of a vma
> not backed by large TLB entries. Ideally we should also align on hpage
> size if sizeof(long) == 8. I'm not sure what the recommended way to code
> that is, though, and it'll make things a bit more complex for little gain.

Assuming we're allocating in large chunks, I doubt an extra hugepage worth of 
VMA is a big issue.

Either way I'd argue that this isn't something qemu should have to care about, 
and is actually a bug in posix_memalign.
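For comparison, the posix_memalign() route with the 64-bit condition Andrea mentions might look like the sketch below, assuming a Linux host with glibc. Huge-page alignment is requested only when sizeof(long) == 8, and madvise(MADV_HUGEPAGE) is issued as a best-effort hint, compiled in only if the headers define it. alloc_guest_ram() is a hypothetical name, not QEMU's qemu_memalign().

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: allocate guest RAM, huge-page aligned on 64-bit hosts only,
 * and hint the kernel to back it with transparent hugepages. */
static void *alloc_guest_ram(size_t size, size_t hpage)
{
    assert(hpage != 0 && (hpage & (hpage - 1)) == 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    /* On LP64 hosts sizeof(long) == 8 and wasted virtual space is cheap. */
    size_t align = (sizeof(long) == 8) ? hpage : page;
    void *p = NULL;

    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    assert(((uintptr_t)p % align) == 0);
#ifdef MADV_HUGEPAGE
    /* Advisory only: fails harmlessly (e.g. EINVAL) if THP is unavailable. */
    (void)madvise(p, size, MADV_HUGEPAGE);
#endif
    return p;
}
```

Memory obtained this way is released with free(). Ignoring the madvise() result keeps the allocation working on kernels without transparent hugepage support.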

Paul

Thread overview: 23+ messages
2010-03-11 15:14 [Qemu-devel] [PATCH QEMU] transparent hugepage support Andrea Arcangeli
2010-03-11 15:52 ` Avi Kivity
2010-03-11 16:05   ` Andrea Arcangeli
2010-03-13  8:28     ` Avi Kivity
2010-03-13 17:47       ` Andrea Arcangeli
2010-03-11 16:28   ` Paul Brook
2010-03-11 16:46     ` Andrea Arcangeli
2010-03-11 17:55       ` Paul Brook [this message]
2010-03-11 18:49         ` Andrea Arcangeli
2010-03-12 11:36           ` Paul Brook
2010-03-12 14:52             ` Andrea Arcangeli
2010-03-12 16:04               ` Paul Brook
2010-03-12 16:17                 ` Andrea Arcangeli
2010-03-12 16:24                   ` Paul Brook
2010-03-12 16:57                     ` Andrea Arcangeli
2010-03-12 17:10                       ` Paul Brook
2010-03-12 17:41                         ` Andrea Arcangeli
2010-03-12 18:17                           ` Paul Brook
2010-03-12 18:36                             ` Andrea Arcangeli
2010-03-12 18:41                               ` Paul Brook
2010-03-12 18:51                                 ` Andrea Arcangeli
2010-03-12 22:40                                   ` Jamie Lokier
2010-03-12 16:10               ` Paul Brook
