RFC: Superpage/hugepage performance improvement

xen-devel.lists.xenproject.org archive mirror

* RFC:  Superpage/hugepage performance improvement
From: Dave McCracken @ 2010-04-05 17:52 UTC
  To: Keir Fraser, Jeremy Fitzhardinge; +Cc: Xen Developers List

In our testing we found that the superpage/hugepage mapping code is seriously 
bogged down by the need to maintain the reference count on each of the 
underlying pages every time a hugepage is mapped.  I came up with a fix where a 
guest can call into the hypervisor to mark a set of pages as a superpage, thus 
locking that set of pages to be read/write data pages until the corresponding 
unmark call is made.  To make this work I added two mmuext ops, one to mark
a superpage and one to unmark it.  This change makes a huge performance 
difference in the hugepage mapping (on the order of 50 times faster).
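The following is a minimal, self-contained C model of where the per-mapping work goes, not the actual patch (which is not included in this thread). It assumes 2 MB superpages built from 512 contiguous 4 KB frames; `struct frame` and `map_superpage()` are hypothetical simplifications of Xen's per-page metadata. It only counts typecount updates per mapping and does not attempt to reproduce the reported ~50x speedup, which depends on the rest of the mapping path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: a 2 MB superpage is 512 contiguous 4 KB frames (x86-64). */
#define PAGES_PER_SUPERPAGE 512

/* Hypothetical, heavily simplified stand-in for per-frame metadata. */
struct frame {
    unsigned long type_count; /* writable typecount */
    bool superpage_marked;    /* only meaningful on the leading frame */
};

/* Map one superpage and return how many typecount updates it cost.
 * 'head' points at the leading frame of a 512-frame run. */
static size_t map_superpage(struct frame *head)
{
    if (head->superpage_marked) {
        /* Marked: every frame already holds a writable type reference
         * taken at mark time, so only the leading frame is touched. */
        head->type_count++;
        return 1;
    }
    /* Unmarked: every underlying frame is referenced individually. */
    for (size_t i = 0; i < PAGES_PER_SUPERPAGE; i++)
        head[i].type_count++;
    return PAGES_PER_SUPERPAGE;
}
```

Per mapping, the unmarked path does 512 updates and the marked path does one; the cost of referencing every frame is paid once at mark time instead.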

On the Linux side, the hugepages are marked at the time they are put into the 
hugepage pool, and unmarked when they are taken out of the pool.  In practice
this happens very infrequently.
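A sketch of the guest-side lifecycle described above, assuming a simple array-backed pool. `hyp_mark_superpage()`, `hyp_unmark_superpage()`, `struct huge_pool`, `pool_add()` and `pool_remove()` are all hypothetical; the stubs only count calls. The point is that mark/unmark happen only when a hugepage enters or leaves the pool, never on each application mmap or fork.

```c
#include <stddef.h>

#define POOL_MAX 16

static unsigned long mark_calls, unmark_calls;

/* Hypothetical stubs for the two proposed mmuext ops. A real guest would
 * presumably issue them via HYPERVISOR_mmuext_op() once command numbers
 * are assigned; here they just count calls and always succeed. */
static int hyp_mark_superpage(unsigned long head_pfn)
{
    (void)head_pfn;
    mark_calls++;
    return 0;
}

static int hyp_unmark_superpage(unsigned long head_pfn)
{
    (void)head_pfn;
    unmark_calls++;
    return 0;
}

/* Hypothetical hugepage pool, tracked by the pfn of each leading frame. */
struct huge_pool {
    unsigned long head_pfn[POOL_MAX];
    size_t count;
};

/* Mark a hugepage as it enters the pool; don't pool it if marking fails. */
static int pool_add(struct huge_pool *p, unsigned long head_pfn)
{
    if (p->count == POOL_MAX)
        return -1;
    if (hyp_mark_superpage(head_pfn) != 0)
        return -1;
    p->head_pfn[p->count++] = head_pfn;
    return 0;
}

/* Unmark a hugepage as it leaves the pool (e.g. when the pool shrinks). */
static int pool_remove(struct huge_pool *p, unsigned long *head_pfn)
{
    if (p->count == 0)
        return -1;
    unsigned long pfn = p->head_pfn[p->count - 1];
    if (hyp_unmark_superpage(pfn) != 0)
        return -1;
    p->count--;
    *head_pfn = pfn;
    return 0;
}
```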

Does this mechanism sound reasonable to you all?  If so, I'd like to make sure 
the numbers we use for the new mmuext ops are reserved upstream so we won't 
have to change them in the future.

I will port the actual patch forward to mainline shortly and send it off, but I 
wanted to get an early indication of how you feel about the design.

Thanks,
Dave McCracken
Oracle Corp.


* Re: RFC:  Superpage/hugepage performance improvement
From: Tim Deegan @ 2010-04-06  9:29 UTC
  To: Dave McCracken; +Cc: Jeremy Fitzhardinge, Xen Developers List, Keir Fraser

Hi,

At 18:52 +0100 on 05 Apr (1270493549), Dave McCracken wrote:
> In our testing we found that the superpage/hugepage mapping code is
> seriously bogged down by the need to maintain the reference count on
> each of the underlying pages every time a hugepage is mapped.  I came
> up with a fix where a guest can call into the hypervisor to mark a set
> of pages as a superpage, thus locking that set of pages to be
> read/write data pages until the corresponding unmark call is made.

Hmm.  That sounds OK, as long as we haven't ended up with a way for a
guest to manipulate Xen's typecounts (either by double-freeing and
underflowing them or by leaving typecounts non-zero on domain
destruction).  How does it work internally?  Does it take a typecount
on each page and keep a separate flag/refcount per superpage so the
guest can't double-free?

How is it synchronized with PTE changes?  e.g. how do we make sure that
all the superpage PTEs that map an area of memory are gone before
the guest can unmark the memory?

And I guess it's up to the guest to make sure that no pagetables,
descriptor tables, &c end up in that memory.

> To make this work I added two mmuext ops, one to mark a superpage and
> one to unmark it.  This change makes a huge performance difference in
> the hugepage mapping (on the order of 50 times faster).

Plus, presumably, some noticeable difference on a macro benchmark. (I
expect that's the case but I've been wrong before.)

Cheers,

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)


* Re: RFC:  Superpage/hugepage performance improvement
From: Keir Fraser @ 2010-04-06  9:40 UTC
  To: Tim Deegan, Dave McCracken; +Cc: Jeremy Fitzhardinge, Xen Developers List

On 06/04/2010 10:29, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:

>> To make this work I added two mmuext ops, one to mark a superpage and
>> one to unmark it.  This change makes a huge performance difference in
>> the hugepage mapping (on the order of 50 times faster).
> 
> Plus, presumably, some noticeable difference on a macro benchmark. (I
> expect that's the case but I've been wrong before.)

It's a very good question, since we might expect hugetlbfs users to
construct long-lived mappings.

 -- Keir


* Re: RFC:  Superpage/hugepage performance improvement
From: Dave McCracken @ 2010-04-06 14:12 UTC
  To: Tim Deegan; +Cc: Jeremy Fitzhardinge, Xen Developers List, Keir Fraser

On Tuesday 06 April 2010, Tim Deegan wrote:
> At 18:52 +0100 on 05 Apr (1270493549), Dave McCracken wrote:
> > In our testing we found that the superpage/hugepage mapping code is
> > seriously bogged down by the need to maintain the reference count on
> > each of the underlying pages every time a hugepage is mapped.  I came
> > up with a fix where a guest can call into the hypervisor to mark a set
> > of pages as a superpage, thus locking that set of pages to be
> > read/write data pages until the corresponding unmark call is made.
> 
> Hmm.  That sounds OK, as long as we haven't ended up with a way for a
> guest to manipulate Xen's typecounts (either by double-freeing and
> underflowing them or by leaving typecounts non-zero on domain
> destruction).  How does it work internally?  Does it take a typecount
> on each page and keep a separate flag/refcount per superpage so the
> guest can't double-free?

I set a flag on the leading page of the superpage and increment the writable
typecount of each of the underlying pages.  Then, when the superpage is
subsequently mapped, I only increment the typecount of the leading page.

Setting and unsetting the superpage flag cannot be nested.  Attempts to
double-free will be rejected.
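A minimal model of the mark/unmark rules just described: a flag on the leading frame, one writable type reference per underlying frame, no nesting, and double-free rejected. The names and the `struct frame` layout are hypothetical. A real implementation would also have to fail if any frame currently has a non-writable type; that check is omitted here, and the model does not attempt to answer the open question about superpage PTEs that still exist at unmark time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 512 x 4 KB frames per 2 MB superpage. */
#define PAGES_PER_SUPERPAGE 512

/* Hypothetical, simplified per-frame metadata. */
struct frame {
    unsigned long type_count; /* writable typecount */
    bool superpage_marked;    /* only meaningful on the leading frame */
};

/* Mark: refuse if already marked (no nesting); otherwise take one writable
 * type reference on every frame and flag the leading frame. */
static int mark_superpage(struct frame *head)
{
    if (head->superpage_marked)
        return -1;
    for (size_t i = 0; i < PAGES_PER_SUPERPAGE; i++)
        head[i].type_count++;
    head->superpage_marked = true;
    return 0;
}

/* Unmark: refuse if not marked (double-free); otherwise drop exactly the
 * references taken by mark_superpage() and clear the flag. */
static int unmark_superpage(struct frame *head)
{
    if (!head->superpage_marked)
        return -1;
    for (size_t i = 0; i < PAGES_PER_SUPERPAGE; i++)
        head[i].type_count--;
    head->superpage_marked = false;
    return 0;
}
```

Because the flag, not a counter, records the marked state, a guest cannot underflow the per-frame typecounts by calling unmark repeatedly.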

> How is it synchronized with PTE changes?  e.g. how do we make sure that
> all the superpage PTEs that map an area of memory are gone before
> the guest can unmark the memory?

Hmm... I don't have a bulletproof solution to that one yet.  It's complicated
by the fact that on Linux, at least, a free page may have a writable typecount
of either 0 or 1, depending on whether it is a highmem page.

> And I guess it's up to the guest to make sure that no pagetables,
> descriptor tables, &c end up in that memory.

Yes, that's correct.  This mechanism is fundamentally incompatible with using
the pages for anything other than writable data.  Any attempt to use them as
page tables or descriptor tables should fail.
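Why such attempts fail can be sketched with a very simplified model of Xen's page-type rule: roughly, a frame's type can only change while its type count is zero, so a frame holding a writable reference from the superpage mark cannot also be validated as a page table or descriptor table. The enum, `get_type()` and `put_type()` below are hypothetical simplifications, not Xen's actual `get_page_type()` logic.

```c
#include <stdbool.h>

/* Hypothetical, simplified frame types. */
enum frame_type { TYPE_NONE, TYPE_WRITABLE, TYPE_PAGETABLE, TYPE_DESCRIPTOR };

struct frame {
    enum frame_type type;
    unsigned long type_count;
};

/* Take a type reference. The type may only change while the count is
 * zero; otherwise the requested type must match the current one. */
static bool get_type(struct frame *f, enum frame_type want)
{
    if (f->type_count == 0)
        f->type = want;
    else if (f->type != want)
        return false;
    f->type_count++;
    return true;
}

/* Drop a type reference (guarded against underflow in this model). */
static void put_type(struct frame *f)
{
    if (f->type_count > 0)
        f->type_count--;
}
```

While the writable reference taken at mark time is held, requests for a page-table or descriptor-table type are refused; only after unmark drops the count to zero could the frame change type.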

> > To make this work I added two mmuext ops, one to mark a superpage and
> > one to unmark it.  This change makes a huge performance difference in
> > the hugepage mapping (on the order of 50 times faster).
> 
> Plus, presumably, some noticeable difference on a macro benchmark. (I
> expect that's the case but I've been wrong before.)

The original reason I came up with this mechanism was that one of our real 
applications would not run due to the poor performance.  They have something 
like a 64G region fully populated with hugepages.  This region has to be 
mapped in the child on every fork.  It was so slow that their operations were
timing out.
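Rough arithmetic for that workload, assuming "64G" means 64 GiB, 4 KiB base frames and 2 MiB superpages (x86-64). Counting one typecount update per 4 KiB frame without marking, and one per superpage with marking, each fork would need 16,777,216 updates versus 32,768, a factor of 512. These counts cover only the typecount bookkeeping, not the rest of the cost of populating the child's page tables. The helper names are hypothetical.

```c
#include <stdint.h>

#define KIB 1024ULL
#define MIB (1024ULL * KIB)
#define GIB (1024ULL * MIB)

/* Unmarked: one typecount update per base frame in the region. */
static uint64_t updates_unmarked(uint64_t region_bytes, uint64_t base_page_bytes)
{
    return region_bytes / base_page_bytes;
}

/* Marked: one typecount update per superpage (leading frame only). */
static uint64_t updates_marked(uint64_t region_bytes, uint64_t superpage_bytes)
{
    return region_bytes / superpage_bytes;
}
```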

Dave McCracken
Oracle Corp.

