From: Dave Hansen <dave@linux.vnet.ibm.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm@kvack.org, Marcelo Tosatti <mtosatti@redhat.com>,
Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
Izik Eidus <ieidus@redhat.com>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
Mel Gorman <mel@csn.ul.ie>, Andi Kleen <andi@firstfloor.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Christoph Lameter <cl@linux-foundation.org>,
Chris Wright <chrisw@sous-sol.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 03 of 30] alter compound get_page/put_page
Date: Thu, 21 Jan 2010 09:35:46 -0800 [thread overview]
Message-ID: <1264095346.32717.34452.camel@nimitz> (raw)
In-Reply-To: <2c68e94d31d8c675a5e2.1264054827@v2.random>
On Thu, 2010-01-21 at 07:20 +0100, Andrea Arcangeli wrote:
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> Alter compound get_page/put_page to keep references on subpages too, in order
> to allow __split_huge_page_refcount to split an hugepage even while subpages
> have been pinned by one of the get_user_pages() variants.
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>
> diff --git a/arch/powerpc/mm/gup.c b/arch/powerpc/mm/gup.c
> --- a/arch/powerpc/mm/gup.c
> +++ b/arch/powerpc/mm/gup.c
> @@ -43,6 +43,14 @@ static noinline int gup_pte_range(pmd_t
> page = pte_page(pte);
> if (!page_cache_get_speculative(page))
> return 0;
> + if (PageTail(page)) {
> + /*
> + * __split_huge_page_refcount() cannot run
> + * from under us.
> + */
> + VM_BUG_ON(atomic_read(&page->_count) < 0);
> + atomic_inc(&page->_count);
> + }
> if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> put_page(page);
> return 0;
> diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
> --- a/arch/x86/mm/gup.c
> +++ b/arch/x86/mm/gup.c
> @@ -128,6 +128,14 @@ static noinline int gup_huge_pmd(pmd_t p
> do {
> VM_BUG_ON(compound_head(page) != head);
> pages[*nr] = page;
> + if (PageTail(page)) {
> + /*
> + * __split_huge_page_refcount() cannot run
> + * from under us.
> + */
> + VM_BUG_ON(atomic_read(&page->_count) < 0);
> + atomic_inc(&page->_count);
> + }
Christoph kinda has a point here. The gup code is going to be a pretty
hot path for some people, and this does add a bunch of atomics that some
people will have no need for.
It's also a decent place to put a helper function anyway.
void pin_huge_page_tail(struct page *page)
{
/*
* This ensures that a __split_huge_page_refcount()
* running underneath us cannot
*/
VM_BUG_ON(atomic_read(&page->_count) < 0);
atomic_inc(&page->_count);
}
It'll keep us from putting the same comment in too many arches, I guess
> static inline void get_page(struct page *page)
> {
> - page = compound_head(page);
> - VM_BUG_ON(atomic_read(&page->_count) == 0);
> + VM_BUG_ON(atomic_read(&page->_count) < !PageTail(page));
Hmm.
if
> atomic_inc(&page->_count);
> + if (unlikely(PageTail(page))) {
> + VM_BUG_ON(atomic_read(&page->first_page->_count) <= 0);
> + atomic_inc(&page->first_page->_count);
> + /* __split_huge_page_refcount can't run under get_page */
> + VM_BUG_ON(!PageTail(page));
> + }
> }
Are you hoping to catch a race in progress with the second VM_BUG_ON()
here? Maybe the comment should say, "detect race with
__split_huge_page_refcount".
> static inline struct page *virt_to_head_page(const void *x)
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -409,7 +409,8 @@ static inline void __ClearPageTail(struc
> 1 << PG_private | 1 << PG_private_2 | \
> 1 << PG_buddy | 1 << PG_writeback | 1 << PG_reserved | \
> 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
> - 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON)
> + 1 << PG_unevictable | __PG_MLOCKED | __PG_HWPOISON | \
> + 1 << PG_compound_lock)
Nit: should probably go in the last patch.
> /*
> * Flags checked when a page is prepped for return by the page allocator.
> diff --git a/mm/swap.c b/mm/swap.c
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -55,17 +55,80 @@ static void __page_cache_release(struct
> del_page_from_lru(zone, page);
> spin_unlock_irqrestore(&zone->lru_lock, flags);
> }
> +}
> +
> +static void __put_single_page(struct page *page)
> +{
> + __page_cache_release(page);
> free_hot_page(page);
> }
>
> +static void __put_compound_page(struct page *page)
> +{
> + compound_page_dtor *dtor;
> +
> + __page_cache_release(page);
> + dtor = get_compound_page_dtor(page);
> + (*dtor)(page);
> +}
> +
> static void put_compound_page(struct page *page)
> {
> - page = compound_head(page);
> - if (put_page_testzero(page)) {
> - compound_page_dtor *dtor;
> -
> - dtor = get_compound_page_dtor(page);
> - (*dtor)(page);
> + if (unlikely(PageTail(page))) {
> + /* __split_huge_page_refcount can run under us */
> + struct page *page_head = page->first_page;
> + smp_rmb();
> + if (likely(PageTail(page) && get_page_unless_zero(page_head))) {
> + if (unlikely(!PageHead(page_head))) {
> + /* PageHead is cleared after PageTail */
> + smp_rmb();
> + VM_BUG_ON(PageTail(page));
> + goto out_put_head;
> + }
> + /*
> + * Only run compound_lock on a valid PageHead,
> + * after having it pinned with
> + * get_page_unless_zero() above.
> + */
> + smp_mb();
> + /* page_head wasn't a dangling pointer */
> + compound_lock(page_head);
> + if (unlikely(!PageTail(page))) {
> + /* __split_huge_page_refcount run before us */
> + compound_unlock(page_head);
> + out_put_head:
> + put_page(page_head);
> + out_put_single:
> + if (put_page_testzero(page))
> + __put_single_page(page);
> + return;
> + }
> + VM_BUG_ON(page_head != page->first_page);
> + /*
> + * We can release the refcount taken by
> + * get_page_unless_zero now that
> + * split_huge_page_refcount is blocked on the
> + * compound_lock.
> + */
> + if (put_page_testzero(page_head))
> + VM_BUG_ON(1);
> + /* __split_huge_page_refcount will wait now */
> + VM_BUG_ON(atomic_read(&page->_count) <= 0);
> + atomic_dec(&page->_count);
> + VM_BUG_ON(atomic_read(&page_head->_count) <= 0);
> + if (put_page_testzero(page_head))
> + __put_compound_page(page_head);
> + else
> + compound_unlock(page_head);
> + return;
> + } else
> + /* page_head is a dangling pointer */
> + goto out_put_single;
> + } else if (put_page_testzero(page)) {
> + if (PageHead(page))
> + __put_compound_page(page);
> + else
> + __put_single_page(page);
> }
> }
That looks functional to me, although the code is pretty darn dense. :)
But, I'm not sure there's a better way to do it.
-- Dave
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2010-01-21 17:36 UTC|newest]
Thread overview: 79+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-01-21 6:20 [PATCH 00 of 30] Transparent Hugepage support #3 Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 01 of 30] define MADV_HUGEPAGE Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 02 of 30] compound_lock Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 03 of 30] alter compound get_page/put_page Andrea Arcangeli
2010-01-21 17:35 ` Dave Hansen [this message]
2010-01-23 17:39 ` Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 04 of 30] clear compound mapping Andrea Arcangeli
2010-01-21 17:43 ` Dave Hansen
2010-01-23 17:55 ` Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 05 of 30] add native_set_pmd_at Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 06 of 30] add pmd paravirt ops Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 07 of 30] no paravirt version of pmd ops Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 08 of 30] export maybe_mkwrite Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 09 of 30] comment reminder in destroy_compound_page Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 10 of 30] config_transparent_hugepage Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 11 of 30] add pmd mangling functions to x86 Andrea Arcangeli
2010-01-21 17:47 ` Dave Hansen
2010-01-21 19:14 ` Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 12 of 30] add pmd mangling generic functions Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 13 of 30] special pmd_trans_* functions Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 14 of 30] bail out gup_fast on splitting pmd Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 15 of 30] pte alloc trans splitting Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 16 of 30] add pmd mmu_notifier helpers Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 17 of 30] clear page compound Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 18 of 30] add pmd_huge_pte to mm_struct Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 19 of 30] ensure mapcount is taken on head pages Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 20 of 30] split_huge_page_mm/vma Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 21 of 30] split_huge_page paging Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 22 of 30] pmd_trans_huge migrate bugcheck Andrea Arcangeli
2010-01-21 20:40 ` Christoph Lameter
2010-01-21 23:01 ` Andrea Arcangeli
2010-01-21 23:17 ` Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 23 of 30] clear_copy_huge_page Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 24 of 30] kvm mmu transparent hugepage support Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 25 of 30] transparent hugepage core Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 26 of 30] madvise(MADV_HUGEPAGE) Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 27 of 30] memcg compound Andrea Arcangeli
2010-01-21 7:07 ` KAMEZAWA Hiroyuki
2010-01-21 15:44 ` Andrea Arcangeli
2010-01-21 23:55 ` KAMEZAWA Hiroyuki
2010-01-21 6:20 ` [PATCH 28 of 30] memcg huge memory Andrea Arcangeli
2010-01-21 7:16 ` KAMEZAWA Hiroyuki
2010-01-21 16:08 ` Andrea Arcangeli
2010-01-22 0:13 ` KAMEZAWA Hiroyuki
2010-01-27 11:27 ` Balbir Singh
2010-01-28 0:50 ` Daisuke Nishimura
2010-01-28 11:39 ` Andrea Arcangeli
2010-01-28 12:23 ` Balbir Singh
2010-01-28 12:36 ` Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 29 of 30] transparent hugepage vmstat Andrea Arcangeli
2010-01-21 6:20 ` [PATCH 30 of 30] khugepaged Andrea Arcangeli
2010-01-22 14:46 ` [PATCH 00 of 30] Transparent Hugepage support #3 Christoph Lameter
2010-01-22 15:19 ` Andrea Arcangeli
2010-01-22 16:51 ` Christoph Lameter
2010-01-23 17:58 ` Andrea Arcangeli
2010-01-25 21:50 ` Christoph Lameter
2010-01-25 22:46 ` Andrea Arcangeli
2010-01-26 15:47 ` Christoph Lameter
2010-01-26 16:11 ` Andrea Arcangeli
2010-01-26 16:30 ` Christoph Lameter
2010-01-26 16:45 ` Andrea Arcangeli
2010-01-26 18:23 ` Christoph Lameter
2010-01-26 17:09 ` Avi Kivity
2010-01-26 0:52 ` Rik van Riel
2010-01-26 6:53 ` Gleb Natapov
2010-01-26 12:35 ` Andrea Arcangeli
2010-01-26 15:55 ` Christoph Lameter
2010-01-26 16:19 ` Andrea Arcangeli
2010-01-26 15:54 ` Christoph Lameter
2010-01-26 16:16 ` Andrea Arcangeli
2010-01-26 16:24 ` Andi Kleen
2010-01-26 16:37 ` Christoph Lameter
2010-01-26 16:42 ` Mel Gorman
2010-01-26 16:52 ` Andrea Arcangeli
2010-01-26 17:26 ` Mel Gorman
2010-01-26 19:46 ` Andrea Arcangeli
2010-01-26 23:07 ` Rik van Riel
2010-01-27 18:33 ` Christoph Lameter
2010-01-26 11:24 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1264095346.32717.34452.camel@nimitz \
--to=dave@linux.vnet.ibm.com \
--cc=aarcange@redhat.com \
--cc=agl@us.ibm.com \
--cc=akpm@linux-foundation.org \
--cc=andi@firstfloor.org \
--cc=avi@redhat.com \
--cc=benh@kernel.crashing.org \
--cc=chrisw@sous-sol.org \
--cc=cl@linux-foundation.org \
--cc=hugh.dickins@tiscali.co.uk \
--cc=ieidus@redhat.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=mingo@elte.hu \
--cc=mtosatti@redhat.com \
--cc=npiggin@suse.de \
--cc=riel@redhat.com \
--cc=travis@sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.