From: Andrea Arcangeli <aarcange@redhat.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Dave Hansen <dave@linux.vnet.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
Chris Wright <chrisw@sous-sol.org>,
bpicco@redhat.com,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Arnd Bergmann <arnd@arndb.de>,
"Michael S. Tsirkin" <mst@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Subject: Re: [PATCH 00 of 41] Transparent Hugepage Support #16
Date: Wed, 31 Mar 2010 18:41:47 +0200
Message-ID: <20100331164147.GN5825@random.random>
In-Reply-To: <alpine.DEB.2.00.1003311102580.17603@router.home>
On Wed, Mar 31, 2010 at 11:24:02AM -0500, Christoph Lameter wrote:
> On Wed, 31 Mar 2010, Andrea Arcangeli wrote:
>
> > > I'm sorry if you answered someone already.
> >
> > The generic archs without pmd approach can't mix hugepages and regular
> > pages in the same vma, so they can't provide graceful fallback and
> > never fail an allocation despite there being plenty of memory free, which
> > is one critical fundamental point in the design (and later collapse
> > those with khugepaged which also can run memory compaction
> > asynchronously in the background and not synchronously during page
> > fault which would be entirely worthless for short lived allocations).
>
> Large pages would be more independent from the page table structure with
> the approach that I outlined earlier since you would not have to do these
> sync tricks.

I was talking about memory compaction. collapse_huge_page will still
be needed forever, regardless of whether split_huge_page exists or not.

> > About the HPAGE_PMD_ prefix it's not only HPAGE_ like I did initially,
> > in case we later decide to split/collapse 1G pages too but frankly I
> > think by the time memory size doubles 512 times across the board (to
> > make 1G pages a not totally wasted effort to implement in the
> > transparent hugepage support) we'd better move the PAGE_SIZE to 2M and
> > stick to the HPAGE_PMD_ again.
>
> There are applications that have benefited for years already from 1G page
> sizes (available on IA64 f.e.). So why wait?

Because the difficulty of finding free hugepages increases
exponentially with the order of the allocation. Plus, increasing MAX_ORDER
so much would slow everything down for no gain, because we would still
fail to get 1G pages freed. The cost of compacting 1G pages is also 512
times bigger than with regular pages. It's not feasible right now with
current memory sizes; I just said it's probably better to move to
PAGE_SIZE 2M instead of extending to 1G pages in a kernel whose
PAGE_SIZE is 4k.

Last but not least, it can be done, but considering I'm abruptly
failing to merge 35 patches (and surely your comments aren't helping
in that direction...), it'd be counter-productive to make the core
even more complex with support for 1G pages immediately. In any case
the 1G support should be done at the very end of the patchset, not in
the core, or merging would be even harder: everything would become more
complex all over the place, requiring changes in two places instead of
just one all over the VM for every pagetable walk, and the split_huge_page
internals would become more complex too. Doing it incrementally also
allows the 1G support to be bisectable later.

In short, I think it makes zero sense to do it now, and I think it makes
no sense until memory sizes increase 512 times. In any case I
agreed to call it HPAGE_PMD_ and not HPAGE_ for a reason, so
discussing it now, or pointing at the lack of immediate
monolithic, non-bisectable 1G support, isn't a good reason for going
against my current patchset, and we can defer this impractical 1G
support until after the useful 2M support is merged. In fact I think the
preferred way to do it (if we ever add it) is to make 2M handling
native first and then convert split_huge_page to be the "compatibility
fallback code" from 1G to 2M. Otherwise at times split_huge_page would
be forced to run a loop of 262144 iterations, which might become noticeable.