linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Nick Piggin <npiggin@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>, Avi Kivity <avi@redhat.com>,
	Mike Galbraith <efault@gmx.de>,
	Jason Garrett-Glaser <darkshikari@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Pekka Enberg <penberg@cs.helsinki.fi>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
	Adam Litke <agl@us.ibm.com>, Izik Eidus <ieidus@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mel@csn.ul.ie>,
	Dave Hansen <dave@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Mike Travis <travis@sgi.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Chris Wright <chrisw@sous-sol.org>,
	bpicco@redhat.com,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [PATCH 00 of 41] Transparent Hugepage Support #17
Date: Mon, 12 Apr 2010 10:06:26 +0200	[thread overview]
Message-ID: <20100412080626.GG5656@random.random> (raw)
In-Reply-To: <20100412072144.GS5683@laptop>

On Mon, Apr 12, 2010 at 05:21:44PM +1000, Nick Piggin wrote:
> On Mon, Apr 12, 2010 at 09:08:11AM +0200, Andrea Arcangeli wrote:
> > On Mon, Apr 12, 2010 at 04:09:31PM +1000, Nick Piggin wrote:
> > > One problem is that you need to keep a lot more memory free in order
> > > for it to be reasonably effective. Another thing is that the problem
> > > of fragmentation breakdown is not just a one-shot event that fills
> > > memory with pinned objects. It is a slow degredation.
> > 
> > set_recommended_min_free_kbytes seems to not be in function of ram
> > size, 60MB aren't such a big deal.
> > 
> > > Especially when you use something like SLUB as the memory allocator
> > > which requires higher order allocations for objects which are pinned
> > > in kernel memory.
> > > 
> > > Just running a few minutes of testing with a kernel compile in the
> > > background does not show the full picture. You really need a box that
> > > has been up for days running a proper workload before you are likely
> > > to see any breakdown.
> > > 
> > > I'm sure it's horrible for planning if the RDBMS or VM boxes gradually
> > > get slower after X days of uptime. It's better to have consistent
> > > performance really, for anything except pure benchmark setups.
> > 
> > All data I provided is very real, in addition to building a ton of
> > packages and running emerge on /usr/portage I've been running all my
> > real loads. Only problem I only run it for 1 day and half, but the
> > load I kept it under was significant (surely a lot bigger inode/dentry
> > load that any hypervisor usage would ever generate).
> 
> OK, but as a solution for some kind of very specific and highly
> optimized application already like RDBMS, HPC, hypervisor or JVM,
> they could just be using hugepages themselves, couldn't they?
>
> It seems more interesting as a more general speedup for applications
> that can't afford such optimizations? (eg. the common case for
> most people)

The reality is that very few are using hugetlbfs. I guess maybe 0.1%
of KVM instances on phenom/nahlem chips are running on hugetlbfs for
example (hugetlbfs boot reservation doesn't fit the cloud where you
need all ram available in hugetlbfs and you still need 100% of unused
ram as host pagecache for VDI), despite it would provide a >=6% boosts
to all VM no matter what's running on the guest. Same goes for the
JVM, maybe 0.1% of those runs on hugetlbfs. The commercial DBMS are
the exception and they're probably closer to 99% running on hugetlbfs
(and they've to keep using hugetlbfs until we move transparent
hugepages in tmpfs). But as

So there's a ton of wasted energy in my view. Like Ingo said, the
faster they make the chips and the cheaper the RAM becomes, the more
wasted energy as result of not using hugetlbfs. There's always more
difference between cache sizes and ram sizes and also more difference
between cache speeds and ram speeds. I don't see this trend ending and
I'm not sure what is the better CPU that will make hugetlbfs worthless
and unselectable at kernel configure time on x86 arch (if you build
without generic).

And I don't think it's feasible to ship a distro where 99% of apps
that can benefit from hugepages are running with
LD_PRELOAD=libhugetlbfs.so. It has to be transparent if we want to
stop the waste.

The main reason I've always been skeptical about transparent hugepages
before I started working on this is the mess they generate on the
whole kernel. So my priority of course has been to keep it self
contained as much as possible. It kept spilling over and over until I
managed to confine it to anonymous pages and fix whole mm/.c files
with just a one liner (even the hugepage aware implementation that
Johannes did still takes advantage of split_huge_page_pmd if the
mprotect start/end isn't 2M naturally aligned, just to show how
complex it would be to do it all at once). This will allow us to reach
a solid base, and then later move to tmpfs and maybe later to
pagecache and swapcache too. Pretending the whole kernel to become
hugepage aware at once is a total mess, gup would need to return only
head pages for example and breaking hundred of drivers in just that
change. The compound_lock can be removed after you fix all those
hundred of drivers and subsystems using gup... No big deal to remove
it later, kind of you're removing the big kernel lock these days after
14 years of when it has been introduced.

Plus I did all I could to try to keep it as black and white as
possible. I think other OS are more gray in their approaches, my
priority has been to pay for RAM anywhere I could if you set
enabled=always, and to decrease as much as I could any risk of
performance regressions in any workload. These days we can afford to
lose 1G without much worry if it speedup the workload 8%, so I think
the other designs are better for old hardware RAM constrainted and not
very actual. On embedded with my patchset one should set
enabled=madvise. Ingo suggested a per-process tweak to enable it
selectively on certain apps, that is feasible too in the future (so
people won't be forced to modify binaries to add madvise if they can't
leave enabled=always).

> Yes we do have the option to reserve pages and as far as I know it
> should work, although I can't remember whether it deals with mlock.

I think that is the right route to take for who needs the
math-guarantees, and for many products it won't even be noticeable to
enforce the math guarantee. It's kind of overcommit, somebody prefers
the = 2 version and maybe they don't even notice it allows them to
allocate less memory. Others prefers to be able to allocate ram
without accounting for the unused virtual regions despite the bigger
chance to run into the oom killer (and I'm in the latter camp for both
overcommit sysctl and kernelcore= ;).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2010-04-12  8:07 UTC|newest]

Thread overview: 205+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-04-02  0:41 [PATCH 00 of 41] Transparent Hugepage Support #17 Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 01 of 41] define MADV_HUGEPAGE Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 02 of 41] compound_lock Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 03 of 41] alter compound get_page/put_page Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 04 of 41] update futex compound knowledge Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 05 of 41] fix bad_page to show the real reason the page is bad Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 06 of 41] clear compound mapping Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 07 of 41] add native_set_pmd_at Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 08 of 41] add pmd paravirt ops Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 09 of 41] no paravirt version of pmd ops Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 10 of 41] export maybe_mkwrite Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 11 of 41] comment reminder in destroy_compound_page Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 12 of 41] config_transparent_hugepage Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 13 of 41] special pmd_trans_* functions Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 14 of 41] add pmd mangling generic functions Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 15 of 41] add pmd mangling functions to x86 Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 16 of 41] bail out gup_fast on splitting pmd Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 17 of 41] pte alloc trans splitting Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 18 of 41] add pmd mmu_notifier helpers Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 19 of 41] clear page compound Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 20 of 41] add pmd_huge_pte to mm_struct Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 21 of 41] split_huge_page_mm/vma Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 22 of 41] split_huge_page paging Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 23 of 41] clear_copy_huge_page Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 24 of 41] kvm mmu transparent hugepage support Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 25 of 41] _GFP_NO_KSWAPD Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 26 of 41] don't alloc harder for gfp nomemalloc even if nowait Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 27 of 41] transparent hugepage core Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 28 of 41] verify pmd_trans_huge isn't leaking Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 29 of 41] madvise(MADV_HUGEPAGE) Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 30 of 41] pmd_trans_huge migrate bugcheck Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 31 of 41] memcg compound Andrea Arcangeli
2010-04-02  0:41 ` [PATCH 32 of 41] memcg huge memory Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 33 of 41] transparent hugepage vmstat Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 34 of 41] khugepaged Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 35 of 41] skip transhuge pages in ksm for now Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 36 of 41] remove PG_buddy Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 37 of 41] add x86 32bit support Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 38 of 41] mincore transparent hugepage support Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 39 of 41] add pmd_modify Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 40 of 41] mprotect: pass vma down to page table walkers Andrea Arcangeli
2010-04-02  0:42 ` [PATCH 41 of 41] mprotect: transparent huge page support Andrea Arcangeli
2010-04-05 19:09 ` [PATCH 00 of 41] Transparent Hugepage Support #17 Andrew Morton
2010-04-05 19:36   ` Ingo Molnar
2010-04-05 20:26     ` Pekka Enberg
2010-04-05 20:32       ` Linus Torvalds
2010-04-05 20:46         ` Pekka Enberg
2010-04-05 20:58           ` Linus Torvalds
2010-04-05 21:54             ` Ingo Molnar
2010-04-05 23:21             ` Andrea Arcangeli
2010-04-06  0:26               ` Linus Torvalds
2010-04-06  1:08                 ` [RFD] " Linus Torvalds
2010-04-06  1:26                   ` Andrea Arcangeli
2010-04-06  1:35                   ` Linus Torvalds
2010-04-06  1:13                 ` Andrea Arcangeli
2010-04-06  1:38                   ` Linus Torvalds
2010-04-06  2:23                     ` Linus Torvalds
2010-04-06  5:25                       ` Nick Piggin
2010-04-06  9:08                       ` Ingo Molnar
2010-04-06  9:13                         ` Ingo Molnar
2010-04-10 18:47                         ` Andrea Arcangeli
2010-04-10 19:02                           ` Ingo Molnar
2010-04-10 19:22                             ` Avi Kivity
2010-04-10 19:47                               ` Ingo Molnar
2010-04-10 20:00                                 ` Andrea Arcangeli
2010-04-10 20:10                                   ` Andrea Arcangeli
2010-04-10 20:21                                   ` Jason Garrett-Glaser
2010-04-10 20:24                                 ` Avi Kivity
2010-04-10 20:42                                   ` Avi Kivity
2010-04-10 20:47                                     ` Andrea Arcangeli
2010-04-10 21:00                                       ` Avi Kivity
2010-04-10 21:47                                         ` Andrea Arcangeli
2010-04-11  1:05                                         ` Andrea Arcangeli
2010-04-11 11:24                                           ` Ingo Molnar
2010-04-11 11:33                                             ` Avi Kivity
2010-04-11 12:11                                               ` Ingo Molnar
2010-04-25 19:27                                           ` Andrea Arcangeli
2010-04-26 18:01                                             ` Andrea Arcangeli
2010-04-30  9:55                                               ` Ingo Molnar
2010-04-30 15:19                                                 ` Andrea Arcangeli
2010-05-02 12:17                                                   ` Ingo Molnar
2010-04-10 20:49                                     ` Jason Garrett-Glaser
2010-04-10 20:53                                       ` Avi Kivity
2010-04-10 20:58                                         ` Jason Garrett-Glaser
2010-04-11  9:29                                         ` Avi Kivity
2010-04-11  9:37                                           ` Jason Garrett-Glaser
2010-04-11  9:40                                             ` Avi Kivity
2010-04-11 10:22                                               ` Jason Garrett-Glaser
2010-04-11 11:00                                               ` Ingo Molnar
2010-04-11 11:19                                                 ` Avi Kivity
2010-04-11 11:30                                                   ` Jason Garrett-Glaser
2010-04-11 11:52                                                   ` hugepages will matter more in the future Ingo Molnar
2010-04-11 12:01                                                     ` Avi Kivity
2010-04-11 12:35                                                       ` Ingo Molnar
2010-04-11 15:22                                                     ` Linus Torvalds
2010-04-11 15:43                                                       ` Avi Kivity
2010-04-11 15:52                                                         ` Linus Torvalds
2010-04-11 16:04                                                           ` Avi Kivity
2010-04-12  7:45                                                             ` Ingo Molnar
2010-04-12  8:14                                                               ` Nick Piggin
2010-04-12  8:22                                                                 ` Ingo Molnar
2010-04-12  8:34                                                                   ` Nick Piggin
2010-04-12  8:47                                                                     ` Avi Kivity
2010-04-12  8:45                                                                 ` Andrea Arcangeli
2010-04-11 19:35                                                           ` Andrea Arcangeli
2010-04-12 16:20                                                           ` Rik van Riel
2010-04-12 16:40                                                             ` Linus Torvalds
2010-04-12 16:56                                                               ` Linus Torvalds
2010-04-12 17:06                                                                 ` Randy Dunlap
2010-04-12 17:36                                                               ` Andrea Arcangeli
2010-04-12 17:46                                                                 ` Rik van Riel
2010-04-11 19:40                                                       ` Andrea Arcangeli
2010-04-12 15:41                                                         ` Linus Torvalds
2010-04-12 11:22                                                     ` Arjan van de Ven
2010-04-12 11:29                                                       ` Avi Kivity
2010-04-17 15:12                                                         ` Arjan van de Ven
2010-04-17 18:18                                                           ` Avi Kivity
2010-04-17 19:05                                                             ` Arjan van de Ven
2010-04-17 19:05                                                               ` Avi Kivity
2010-04-17 19:18                                                                 ` Arjan van de Ven
2010-04-17 19:20                                                                   ` Avi Kivity
2010-04-12 13:30                                                       ` Andrea Arcangeli
2010-04-12 13:33                                                         ` Avi Kivity
2010-04-12 13:39                                                           ` Andrea Arcangeli
2010-04-12 13:53                                                             ` Avi Kivity
2010-04-13 11:38                                                         ` Ingo Molnar
2010-04-13 13:17                                                           ` Andrea Arcangeli
2010-04-11 10:46                                   ` [PATCH 00 of 41] Transparent Hugepage Support #17 Ingo Molnar
2010-04-11 10:49                                     ` Ingo Molnar
2010-04-11 11:30                                     ` Avi Kivity
2010-04-11 12:08                                       ` Ingo Molnar
2010-04-11 12:24                                         ` Avi Kivity
2010-04-11 12:46                                           ` Ingo Molnar
2010-04-12  6:09                                         ` Nick Piggin
2010-04-12  6:18                                           ` Pekka Enberg
2010-04-12  6:48                                             ` Nick Piggin
2010-04-12 14:29                                             ` Christoph Lameter
2010-04-12 16:06                                               ` Nick Piggin
2010-04-12  6:36                                           ` Avi Kivity
2010-04-12  6:55                                             ` Ingo Molnar
2010-04-12  7:15                                             ` Nick Piggin
2010-04-12  7:45                                               ` Avi Kivity
2010-04-12  8:28                                                 ` Nick Piggin
2010-04-12  9:01                                                   ` Andrea Arcangeli
2010-04-12  9:03                                                   ` Avi Kivity
2010-04-12  9:26                                                     ` Nick Piggin
2010-04-12  9:39                                                       ` Andrea Arcangeli
2010-04-12 10:02                                                       ` Avi Kivity
2010-04-12 10:08                                                         ` Andrea Arcangeli
2010-04-12 10:10                                                           ` Avi Kivity
2010-04-12 10:23                                                             ` Andrea Arcangeli
2010-04-12 10:37                                                         ` Nick Piggin
2010-04-12 10:59                                                           ` Avi Kivity
2010-04-12 12:23                                                             ` Avi Kivity
2010-04-12 13:25                                                             ` Andrea Arcangeli
2010-04-13  0:38                                                         ` Andrew Morton
2010-04-13  6:18                                                           ` Neil Brown
2010-04-13 13:31                                                             ` Andrea Arcangeli
2010-04-13 13:40                                                               ` Mel Gorman
2010-04-13 13:44                                                                 ` Andrea Arcangeli
2010-04-13 13:55                                                                   ` Mel Gorman
2010-04-13 14:03                                                                     ` Andrea Arcangeli
2010-04-12  7:51                                               ` Ingo Molnar
2010-04-12  7:18                                             ` Andrea Arcangeli
2010-04-12  6:49                                           ` Ingo Molnar
2010-04-12  7:35                                             ` Andrea Arcangeli
2010-04-12  7:08                                           ` Andrea Arcangeli
2010-04-12  7:21                                             ` Nick Piggin
2010-04-12  7:50                                               ` Avi Kivity
2010-04-12  8:07                                                 ` Ingo Molnar
2010-04-12  8:21                                                   ` Andrea Arcangeli
2010-04-12 10:27                                                   ` Mel Gorman
2010-04-12  8:18                                                 ` Andrea Arcangeli
2010-04-12  8:06                                               ` Andrea Arcangeli [this message]
2010-04-12 10:44                                                 ` Mel Gorman
2010-04-12 11:12                                                   ` Avi Kivity
2010-04-12 13:17                                                   ` Andrea Arcangeli
2010-04-12 14:24                           ` Christoph Lameter
2010-04-12 14:49                             ` Avi Kivity
2010-04-06  9:55                       ` Avi Kivity
2010-04-06  9:57                         ` Avi Kivity
2010-04-06 11:55                         ` Avi Kivity
2010-04-06 13:10                           ` Nick Piggin
2010-04-06 13:22                             ` Avi Kivity
2010-04-06 13:45                               ` Nick Piggin
2010-04-06 13:57                                 ` Avi Kivity
2010-04-06 16:50                                 ` Andrea Arcangeli
2010-04-06 17:31                                   ` Avi Kivity
2010-04-06 18:00                                     ` Christoph Lameter
2010-04-06 18:04                                       ` Avi Kivity
2010-04-06 18:47                                 ` Avi Kivity
2010-04-06 14:44                             ` Rik van Riel
2010-04-06 16:43                             ` Andrea Arcangeli
2010-04-06  9:30               ` Mel Gorman
2010-04-06 10:32                 ` Theodore Tso
2010-04-06 11:16                   ` Mel Gorman
2010-04-06 13:13                     ` Theodore Tso
2010-04-06 14:55                       ` Mel Gorman
2010-04-06 16:46                       ` Andrea Arcangeli
2010-04-05 21:01         ` Chris Mason
2010-04-05 21:18           ` Avi Kivity
2010-04-05 21:33             ` Linus Torvalds
2010-04-05 22:33               ` Chris Mason
2010-04-06  8:30             ` Mel Gorman
2010-04-06 11:35               ` Chris Mason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100412080626.GG5656@random.random \
    --to=aarcange@redhat.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=avi@redhat.com \
    --cc=balbir@linux.vnet.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=bpicco@redhat.com \
    --cc=chrisw@sous-sol.org \
    --cc=cl@linux-foundation.org \
    --cc=darkshikari@gmail.com \
    --cc=dave@linux.vnet.ibm.com \
    --cc=efault@gmx.de \
    --cc=hannes@cmpxchg.org \
    --cc=hugh.dickins@tiscali.co.uk \
    --cc=ieidus@redhat.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-mm@kvack.org \
    --cc=mel@csn.ul.ie \
    --cc=mingo@elte.hu \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=nishimura@mxp.nes.nec.co.jp \
    --cc=npiggin@suse.de \
    --cc=penberg@cs.helsinki.fi \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=torvalds@linux-foundation.org \
    --cc=travis@sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).