Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support

From: Andrea Arcangeli <aarcange@redhat.com>
To: Paul Brook <paul@codesourcery.com>
Cc: qemu-devel@nongnu.org, Avi Kivity <avi@redhat.com>
Subject: Re: [Qemu-devel] [PATCH QEMU] transparent hugepage support
Date: Thu, 11 Mar 2010 19:49:26 +0100
Message-ID: <20100311184926.GJ5677@random.random>
In-Reply-To: <201003111755.10914.paul@codesourcery.com>

On Thu, Mar 11, 2010 at 05:55:10PM +0000, Paul Brook wrote:
> sysconf(_SC_HUGEPAGESIZE); would seem to be the obvious answer.

There's not just one hugepage size, and _SC_HUGEPAGESIZE doesn't exist
yet; adding it would require changing glibc too. If it existed I could
use it, but I think this is better:

$ cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size 
2097152

Ok? If this file doesn't exist we won't align. If it does, we align on
qemu too, not only on kvm, because of the concern below about the first
and last bytes.
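
A minimal sketch of how the lookup could be done in C. The sysfs path is
the one shown above; the helper name read_hpage_size, the "return 0
means don't align" convention and the power-of-two check are my
assumptions for illustration, not the actual patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Path quoted in this message; absent on kernels without the control. */
#define HPAGE_PMD_SIZE_PATH \
    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"

/*
 * Hypothetical helper: read a hugepage size (in bytes) from 'path'.
 * Returns 0 if the file is missing or does not hold a power-of-two
 * value, which the caller treats as "do not align".
 */
static size_t read_hpage_size(const char *path)
{
    unsigned long long value = 0;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f) {
        return 0;               /* no sysfs control: keep default alignment */
    }
    if (fscanf(f, "%llu", &value) != 1) {
        value = 0;              /* malformed content */
    }
    fclose(f);

    /* An alignment must be a non-zero power of two. */
    if (value == 0 || (value & (value - 1)) != 0) {
        return 0;
    }
    return (size_t)value;
}
```

A caller would pass HPAGE_PMD_SIZE_PATH and fall back to its default
alignment whenever the result is 0.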

> > Also khugepaged can later zero out the pte_none regions to create a
> > full segment all backed by hugepages, however if we do that khugepaged
> > will eat into the free memory space. At the moment I kept khugepaged a
> > zero-memory-footprint thing. But I'm currently adding an option called
> > collapse_unmapped to allow khugepaged to collapse unmapped pages too
> > so if there are only 2/3 pages in the region before the memalign, they
> > also can be mapped by a large tlb to allow qemu run faster.
> 
> I don't really understand what you're getting at here. Surely a naturally 
> aligned block is always going to be easier to defragment than a misaligned 
> block.

Basically what I was saying is about touching subpages 0 and 1 of a
hugepage: then posix_memalign extends the vma and nobody is ever going
to touch pages 2-511 because those are the wasted virtual
addresses. Before, khugepaged couldn't allocate a hugepage for only
pages 0 and 1 because the vma stopped there, but after the vma is
extended it can. So previously I wasn't mapping this range with a
hugepage, but now I'm mapping it with a hugepage too. A sysfs
control will select the max number of unmapped subpages for the
collapse to happen. With just 1 subpage mapped in the hugepage virtual
range, it won't make sense to use a large tlb entry and waste 511 pages
of ram.
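
To make the arithmetic concrete, here is a rough sketch of the kind of
check such a control implies. It assumes 4096-byte base pages (so a
2097152-byte hugepage has 512 subpages); the function names and the
max_unmapped parameter are hypothetical, not the kernel's actual code or
the name of the sysfs knob:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of base pages ("subpages") covered by one hugepage. */
static size_t subpages_per_hugepage(size_t hpage_size, size_t page_size)
{
    assert(page_size != 0 && hpage_size % page_size == 0);
    return hpage_size / page_size;
}

/*
 * Hypothetical policy: collapse a hugepage-sized virtual range only if
 * at least one subpage is mapped and the number of unmapped subpages
 * (memory that would be newly consumed) does not exceed max_unmapped.
 */
static bool worth_collapsing(size_t mapped, size_t total, size_t max_unmapped)
{
    assert(mapped <= total);
    return mapped > 0 && (total - mapped) <= max_unmapped;
}
```

With 1 subpage mapped out of 512, the collapse would consume 511 extra
pages, so any threshold below 511 rejects it; a range with only 2
subpages mapped needs a threshold of at least 510.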

> If the allocation size is not a multiple of the preferred alignment, then you 
> probably lose either way, and we shouldn't be requesting increased alignment.

That's probably a good idea. Also note, if we were to allocate the
0-640k and 1m-end ranges separately, for NPT to work we'd need to start
the second block misaligned at a 1m address. So maybe I should move the
alignment out of qemu_ram_alloc and have it in the caller?
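
A sketch of what the caller-side decision could look like, following the
"only align if the size is a multiple of the preferred alignment"
suggestion. choose_alignment and alloc_guest_ram are hypothetical names,
and hpage_size is assumed to come from the sysfs file (0 if it doesn't
exist); this is not the qemu_ram_alloc code:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical policy: request hugepage alignment only when a hugepage
 * size is known and the allocation is a whole multiple of it; otherwise
 * keep the default alignment.
 */
static size_t choose_alignment(size_t size, size_t hpage_size,
                               size_t default_align)
{
    assert(default_align != 0);
    if (hpage_size != 0 && size >= hpage_size && size % hpage_size == 0) {
        return hpage_size;
    }
    return default_align;
}

/*
 * Hypothetical caller-side allocator: the caller picks the alignment,
 * posix_memalign only honours it. Returns NULL on failure.
 */
static void *alloc_guest_ram(size_t size, size_t hpage_size,
                             size_t default_align)
{
    void *ptr = NULL;
    size_t align = choose_alignment(size, hpage_size, default_align);

    /* align must be a power of two and a multiple of sizeof(void *). */
    if (posix_memalign(&ptr, align, size) != 0) {
        return NULL;
    }
    return ptr;
}
```

A block that has to start at a 1m address would not go through the
hugepage branch at all, which is one argument for keeping the decision
in the caller.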

> I wouldn't be surprised if putting the start of guest ram on a large TLB entry 
> was a win. Your guest kernel often lives there!

Yep, that's easy to handle with the hpage_pmd_size ;).

> Assuming we're allocating in large chunks, I doubt an extra hugepage worth of 
> VMA is a big issue.
> 
> Either way I'd argue that this isn't something qemu should have to care about, 
> and is actually a bug in posix_memalign.

Hmm, the last is a weird claim considering posix_memalign gets an
explicit alignment parameter and surely can't choose what alignment to
use. We can argue about the kernel side having to align automatically,
but again if it did that, it'd generate unnecessary vma holes, which we
don't want.

I think it's quite simple: just use my new sysfs control, and if it
exists always use that alignment instead of the default. We only have to
decide whether to align inside or outside of qemu_ram_alloc.
