From: Andrea Arcangeli <aarcange@redhat.com>
To: Christoph Lameter <cl@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Dave Hansen <dave@linux.vnet.ibm.com>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Chris Wright <chrisw@sous-sol.org>,
bpicco@redhat.com,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
Balbir Singh <balbir@linux.vnet.ibm.com>,
Arnd Bergmann <arnd@arndb.de>,
"Michael S. Tsirkin" <mst@redhat.com>,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH 00 of 34] Transparent Hugepage support #14
Date: Mon, 22 Mar 2010 18:15:53 +0100
Message-ID: <20100322171553.GS29874@random.random>
In-Reply-To: <alpine.DEB.2.00.1003221139300.17230@router.home>

On Mon, Mar 22, 2010 at 11:46:01AM -0500, Christoph Lameter wrote:
> On Mon, 22 Mar 2010, Johannes Weiner wrote:
>
> > > entries while walking the page tables! Go incrementally use what
> > > is there.
> >
> > That only works if you merely read the tables. If the VMA gets broken
> > up in the middle of a huge page, you definitely have to map ptes again.
>
> Yes then follow the established system for remapping stuff.

I followed exactly what __pte_alloc already does. As long as a pmd
can't go away once it's established, __pte_alloc is the only place
that needs that locking. Now, obviously, more code will have to use
that _same_ locking, because it's no longer true that a pmd can't go
away after it's established _if_ it's huge.

I also made sure a pmd can't become huge without holding mmap_sem and
the anon_vma rmap lock, so that we can check locklessly whether a pmd
is huge, and if it's not, we just take the legacy 4k paths. That is how
it guarantees there is no measurable slowdown unless you actively use
2M pages, and if you do, you get a huge boost (like 50% faster) that
outweighs any overhead introduced by having to take the
page_table_lock in the new pmd_huge paths.

It's just pmd_huge -> page_table_lock, not pmd_huge -> call pmd_offset
lockless and take the PT lock.

page_table_lock for pmd_huge acts in the _exact_ same way as the PT
lock does for the non-pmd_huge path.
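
The locking rule above can be sketched as a small userspace model. This
is only an illustration under stated assumptions, not code from the
patch series: every toy_* name is hypothetical, a single atomic integer
stands in for one pmd entry, and an atomic_flag spinlock stands in for
both mm->page_table_lock and the PT lock. The recheck of the pmd after
taking page_table_lock is an assumption about how a huge pmd that went
away in the meantime would be noticed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace model. None of these types or helpers are kernel APIs. */
enum toy_pmd_state { TOY_PMD_PTE_TABLE, TOY_PMD_HUGE };

struct toy_spinlock { atomic_flag held; };

struct toy_mm {
    struct toy_spinlock page_table_lock; /* models mm->page_table_lock */
    struct toy_spinlock pt_lock;         /* models the PT lock of the 4k path */
    _Atomic int pmd;                     /* one pmd entry (enum toy_pmd_state) */
    int huge_ops;                        /* operations done on the huge path */
    int small_ops;                       /* operations done on the 4k path */
};

static void toy_spin_lock(struct toy_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void toy_spin_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static void toy_mm_init(struct toy_mm *mm, enum toy_pmd_state initial)
{
    atomic_flag_clear(&mm->page_table_lock.held);
    atomic_flag_clear(&mm->pt_lock.held);
    atomic_init(&mm->pmd, initial);
    mm->huge_ops = 0;
    mm->small_ops = 0;
}

/* Returns true if the huge path handled the access. */
static bool toy_touch(struct toy_mm *mm)
{
    /* Lockless check: non-huge pmds never pay for page_table_lock. */
    if (atomic_load(&mm->pmd) == TOY_PMD_HUGE) {
        toy_spin_lock(&mm->page_table_lock);
        /* Assumption: recheck under the lock, the pmd may have been split. */
        if (atomic_load(&mm->pmd) == TOY_PMD_HUGE) {
            mm->huge_ops++;
            toy_spin_unlock(&mm->page_table_lock);
            return true;
        }
        toy_spin_unlock(&mm->page_table_lock);
    }
    /* Legacy 4k path, serialized by the PT lock as before. */
    toy_spin_lock(&mm->pt_lock);
    mm->small_ops++;
    toy_spin_unlock(&mm->pt_lock);
    return false;
}

/* Models a huge pmd going away: only done under page_table_lock. */
static void toy_split(struct toy_mm *mm)
{
    toy_spin_lock(&mm->page_table_lock);
    if (atomic_load(&mm->pmd) == TOY_PMD_HUGE)
        atomic_store(&mm->pmd, TOY_PMD_PTE_TABLE);
    toy_spin_unlock(&mm->page_table_lock);
}
```

In this model, calling toy_touch() on an mm initialized with
TOY_PMD_HUGE takes page_table_lock and counts a huge operation. After
toy_split(), the same call skips page_table_lock entirely and goes
through the PT lock, which is the "no slowdown unless you use 2M pages"
property described above.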
> It results in a volatility in the page table entries that requires new
> synchronization procedures. It also increases the difficulty in
> establishing a reliable state of the pages / page tables for
> operations since there is potentially on-the-fly atomic conversion
> wizardry going on.

Again: split_huge_page has nothing to do with the pte or pmd locking.
That is especially obvious given that your proposed alternate design
would still use a form of split_huge_page, just one that can fail if
the page is under gup (which would make it practically unusable
anywhere but swap, and even in swap it would lead to potential
livelocks in unsolvable oom, as it's not just slow, infrequent I/O
that calls gup).

> You do not need to do this all at once. Again the huge page subsystem has
> been around for years and we have established mechanisms to move/remap.
> There is nothing hindering us from implementing huge page -> regular page
> conversion using the known methods or also implementing explicit huge page
> support in more portions of the kernel.

Indeed, hugetlbfs also takes page_table_lock around every pmd
manipulation. But personally I'd prefer you focus on __pte_alloc, as
transparent hugepage has to mirror the pte locking exactly to be
clean, regardless of hugetlbfs (in this case hugetlbfs happens to use
the same locking as __pte_alloc and the transparent hugepage code in
huge_memory.c).