From: aarcange@redhat.com
To: linux-mm@kvack.org
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	Adam Litke <agl@us.ibm.com>, Avi Kivity <avi@redhat.com>,
	Izik Eidus <ieidus@redhat.com>,
	Hugh Dickins <hugh.dickins@tiscali.co.uk>,
	Nick Piggin <npiggin@suse.de>, Rik van Riel <riel@redhat.com>,
	Mel Gorman <mel@csn.ul.ie>, Dave Hansen <dave@linux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Ingo Molnar <mingo@elte.hu>, Mike Travis <travis@sgi.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Chris Wright <chrisw@sous-sol.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	bpicco@redhat.com,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Balbir Singh <balbir@linux.vnet.ibm.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Andrea Arcangeli <aarcange@redhat.com>
Subject: [patch 04/36] update futex compound knowledge
Date: Sun, 21 Feb 2010 15:10:13 +0100
Message-ID: <20100221141753.297910660@redhat.com>
In-Reply-To: 20100221141009.581909647@redhat.com

[-- Attachment #1: compound_futex --]
[-- Type: text/plain, Size: 4536 bytes --]

From: Andrea Arcangeli <aarcange@redhat.com>

Futex code is smarter than most other gup_fast O_DIRECT code and knows about
the compound internals. However, doing a put_page(head_page) will now no longer
release the pin on the tail page taken by gup-fast, leading to all sorts of
refcounting bugchecks. Getting a stable head_page is a little tricky.
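
As a rough illustration of the refcounting problem described above, here is a
toy userspace model (not kernel code). It assumes, as a simplification, that a
pin taken on a tail page is accounted both on the tail and on its head, so
dropping a reference on the head alone leaves the tail pin in place. The
struct toy_page type and every toy_* helper are hypothetical names made up for
this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace stand-in for struct page; all names are hypothetical. */
struct toy_page {
	int count;                   /* pins held on this page */
	bool tail;                   /* models PageTail() */
	struct toy_page *first_page; /* head pointer, meaningful for tails */
};

static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->tail ? page->first_page : page;
}

/* Assumed model: a tail pin is accounted on the tail and on its head. */
static void toy_get_page(struct toy_page *page)
{
	if (page->tail)
		page->first_page->count++;
	page->count++;
}

static void toy_put_page(struct toy_page *page)
{
	if (page->tail)
		page->first_page->count--;
	page->count--;
}

/*
 * Replays the old futex pattern: pin the tail the way gup-fast would,
 * then release through the head. Returns the pins left on the tail.
 */
static int toy_old_futex_pattern(struct toy_page *tail)
{
	toy_get_page(tail);                    /* pin taken on the tail */
	toy_put_page(toy_compound_head(tail)); /* old code: put_page(head) */
	return tail->count;
}
```

In this model the old pattern returns with one pin still held on the tail,
which is the kind of imbalance the refcounting bugchecks would catch.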

page_head = page is there because if this is not a tail page it's also the
page_head. compound_head is called only if this is a tail page; otherwise
it's guaranteed unnecessary. And if it's a tail page, compound_head has to run
atomically inside the irq-disabled section that covers __get_user_pages_fast,
before irqs are re-enabled. Otherwise ->first_page won't be a stable pointer.

Disabling irqs before __get_user_pages_fast and re-enabling them after running
compound_head is needed because if __get_user_pages_fast returns 1, it means
the huge pmd is established and cannot go away from under us.
pmdp_splitting_flush_notify in __split_huge_page_splitting will have to wait
for local_irq_enable before the IPI delivery can return. This means
__split_huge_page_refcount can't be running from under us, and in turn when we
run compound_head(page) we're not reading a dangling pointer from
tailpage->first_page. Then, once we have a stable head page and have pinned it,
it is always safe to call compound_lock, and after taking the compound lock on
the head page we can finally re-check whether the page returned by gup-fast is
still a tail page. If it is, we're set and we didn't need to split the hugepage
in order to take a futex on it.
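
The final re-check can be sketched in isolation as a toy userspace model. This
is only an illustration of the control flow, not the kernel implementation:
struct model_page, the compound_locked flag and model_lock_and_recheck() are
hypothetical names, the lock is modelled as a plain boolean, and the caller is
assumed to have already pinned both the page and its head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_result { MODEL_OK, MODEL_RETRY };

/* Toy userspace stand-in for struct page; all names are hypothetical. */
struct model_page {
	int count;                     /* pins held on this page */
	bool tail;                     /* models PageTail() */
	bool compound_locked;          /* models the compound lock on a head */
	struct model_page *first_page; /* head pointer, meaningful for tails */
};

/*
 * Both page and page_head are already pinned by the caller. On MODEL_OK
 * with page != page_head the compound lock stays held; on MODEL_RETRY
 * the lock is released and both pins have been dropped.
 */
static enum model_result model_lock_and_recheck(struct model_page *page,
						struct model_page *page_head)
{
	if (page_head == page)
		return MODEL_OK; /* not a tail page, nothing to re-check */

	page_head->compound_locked = true;
	if (!page->tail) {
		/* The hugepage was split after the head was pinned. */
		page_head->compound_locked = false;
		page_head->count--;
		page->count--;
		return MODEL_RETRY;
	}
	return MODEL_OK;
}
```

A MODEL_RETRY result corresponds to the "goto again" path: the pins are
dropped and the lookup starts over, now against a regular page.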

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
---

diff --git a/kernel/futex.c b/kernel/futex.c
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -218,7 +218,7 @@ get_futex_key(u32 __user *uaddr, int fsh
 {
 	unsigned long address = (unsigned long)uaddr;
 	struct mm_struct *mm = current->mm;
-	struct page *page;
+	struct page *page, *page_head;
 	int err;
 
 	/*
@@ -250,10 +250,53 @@ again:
 	if (err < 0)
 		return err;
 
-	page = compound_head(page);
-	lock_page(page);
-	if (!page->mapping) {
-		unlock_page(page);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	page_head = page;
+	if (unlikely(PageTail(page))) {
+		put_page(page);
+		/* serialize against __split_huge_page_splitting() */
+		local_irq_disable();
+		if (likely(__get_user_pages_fast(address, 1, 1, &page) == 1)) {
+			page_head = compound_head(page);
+			/*
+			 * page_head is valid pointer but we must pin
+			 * it before taking the PG_lock and/or
+			 * PG_compound_lock. The moment we re-enable
+			 * irqs __split_huge_page_splitting() can
+			 * return and the head page can be freed from
+			 * under us. We can't take the PG_lock and/or
+			 * PG_compound_lock on a page that could be
+			 * freed from under us.
+			 */
+			if (page != page_head)
+				get_page(page_head);
+			local_irq_enable();
+		} else {
+			local_irq_enable();
+			goto again;
+		}
+	}
+#else
+	page_head = compound_head(page);
+	if (page != page_head)
+		get_page(page_head);
+#endif
+
+	lock_page(page_head);
+	if (unlikely(page_head != page)) {
+		compound_lock(page_head);
+		if (unlikely(!PageTail(page))) {
+			compound_unlock(page_head);
+			unlock_page(page_head);
+			put_page(page_head);
+			put_page(page);
+			goto again;
+		}
+	}
+	if (!page_head->mapping) {
+		unlock_page(page_head);
+		if (page_head != page)
+			put_page(page_head);
 		put_page(page);
 		goto again;
 	}
@@ -265,19 +308,25 @@ again:
 	 * it's a read-only handle, it's expected that futexes attach to
 	 * the object not the particular process.
 	 */
-	if (PageAnon(page)) {
+	if (PageAnon(page_head)) {
 		key->both.offset |= FUT_OFF_MMSHARED; /* ref taken on mm */
 		key->private.mm = mm;
 		key->private.address = address;
 	} else {
 		key->both.offset |= FUT_OFF_INODE; /* inode-based key */
-		key->shared.inode = page->mapping->host;
-		key->shared.pgoff = page->index;
+		key->shared.inode = page_head->mapping->host;
+		key->shared.pgoff = page_head->index;
 	}
 
 	get_futex_key_refs(key);
 
-	unlock_page(page);
+	unlock_page(page_head);
+	if (page != page_head) {
+		VM_BUG_ON(!PageTail(page));
+		/* releasing compound_lock after page_lock won't matter */
+		compound_unlock(page_head);
+		put_page(page_head);
+	}
 	put_page(page);
 	return 0;
 }


