All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	Darren Hart <dvhart@linux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>
Subject: Re: [PATCH v2 1/1] mm: fix the theoretical compound_lock() vs prep_new_page() race
Date: Thu, 9 Jan 2014 15:04:47 +0100	[thread overview]
Message-ID: <20140109140447.GA25391@redhat.com> (raw)
In-Reply-To: <20140109112736.GR27046@suse.de>

On 01/09, Mel Gorman wrote:
>
> On Wed, Jan 08, 2014 at 08:04:43PM +0100, Oleg Nesterov wrote:
> >
> > Yes.
> >
> > But suppose that CPU B completes split_huge_page_to_list/munmap/etc
> > and frees this head page.
> >
>
> Where did the reference taken by get_huge_page_multiple go?

In __split_huge_page_refcount(), see below.

> CPU A
> static noinline int gup_huge_pmd(pmd_t pmd, unsigned long addr,
>                 unsigned long end, int write, struct page **pages, int *nr)
> {
> 	....
> 	do {
> 		...
> 		if (PageTail(page))
> 			/* Increment page->_mapcount */
> 			get_huge_page_tail(page);
> 		...
> 		refs++;
> 	} while (...)
> 	get_head_page_multiple(head, refs);
> }
>
> CPU A in get_futex_key has taken multiple references to the head page,
> one for every base page on the huge page

Not sure I understand "multiple references to the head page" above...
I mean, in this particular case case nr == 1.

IOW, If gup returns a tail page, this page_tail has 1 in ->_mapcount
(to simplify) and its ->first_page gets the additional 1 in ->_count.

> static void __split_huge_page_refcount(struct page *page,
>                                        struct list_head *list)
> {
> 	...
> 	spin_lock_irq(&zone->lru_lock);
> 	compound_lock(page);
>
> 	for_every_tail_page() {
> 		/* This picks up refcounts from GUP get_huge_page_tail */
> 		tail_count += page_mapcount(page_tail);
>
> 		/* Propogate all mapcounts to the "real" refcount in the tail page */
> 		atomic_add(page_mapcount(head) + page_mapcount(tail), tail->_count)
>
> 		.... flag reinits with barriers ...
> 	}
> 	atomic_sub(tail_count, headpage->_count);
> 	...
> 	unlock stuff
> }
>
> The refcounts on page->_mapcount taken while the page was huge is
> propogated to the tail pages so it's still pinned in place.

Yes. But at the same time atomic_sub(tail_count, headpage->_count)
above reverts the result of get_head_page_multiple(head) done by
gup() above.

IOW, after __split_huge_page_refcount() page_tail no longer pins its
former page_head.

> > >        takes reference unless zero
> >
> > suppose this page_head was reallocated and get_page_unless_zero()
> > succeds right after set_page_refcounted(),
> >
>
> You're right. The head page can still be freed and reallocated as a *smaller*
> compound page but futex.c is doing the reference count on the tail page
> that should have an elevated count even after the split

Yes, page_tail can't go away, the reference was moved to page_tail->_count.

> #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>         page_head = page;
>         if (unlikely(PageTail(page))) {
>                 put_page(page);
>
>
> so I'm still not seeing how a tail page racing with a split ends up with
> mayhem.

But get/put(page_tail) plays with page_head which can be freed/reallocated,
it does compound_lock(page_head).

> I could also still be stuck in a "la la la, everything is fine" mode.

More likely it is me who tries to deny the fact I missed something ;)



But let me try again. Lets ignore PageSlab/PageHeadHuge. put_compound_page()
is complicated, but roughly it does:

	CPU 0						CPU 1

	if (!PageTail(page_tail))
		return;

	page_head = page_tail->first_page;
	
							unmap this vma, free everything.
							page_tail can't go away, its
							->_count was incremented by
							__split_huge_page_refcount().
							

							alloc_pages(GFP_COMP, 1) reallocates
							page_head, prep_new_page() does
							set_page_refcounted(page_head),
							

	// succeeds after set_page_refcounted()
	get_page_unless_zero(page_head);

	compound_lock(page_head)			prep_compound_page(page_head);

Now, both compound_lock() and prep_compound_page() play with the same
page_head->flags, but __SetPageHead(page_head) is non-atomic.

OK. Even if I am right, we can probably make another fix.
put_compound_page() and __get_page_tail() can do yet another PageTail()
check _before_ compound_lock().

Although personally I'd prefer this patch. And if we change get/put
I think it would be  better to do this on top of

	"[PATCH -mm 6/7] mm: thp: introduce get_lock_thp_head()"
	http://marc.info/?l=linux-kernel&m=138739438800899

Oleg.


  reply	other threads:[~2014-01-09 14:04 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-10 15:47 process 'stuck' at exit Dave Jones
2013-12-10 18:40 ` Linus Torvalds
2013-12-10 19:18   ` Thomas Gleixner
2013-12-10 19:55     ` Linus Torvalds
2013-12-10 20:27       ` Dave Jones
2013-12-10 20:34         ` Thomas Gleixner
2013-12-10 20:55           ` Dave Jones
2013-12-10 21:25             ` Darren Hart
2013-12-10 21:28               ` Thomas Gleixner
2013-12-10 21:39                 ` Steven Rostedt
2013-12-10 20:33       ` Thomas Gleixner
2013-12-10 20:43         ` Linus Torvalds
2013-12-10 21:17           ` Thomas Gleixner
2013-12-10 20:35       ` Oleg Nesterov
2013-12-10 20:49         ` Dave Jones
2013-12-10 21:06           ` Darren Hart
2013-12-10 21:12             ` Dave Jones
2013-12-10 21:18             ` Linus Torvalds
2013-12-10 21:24               ` Linus Torvalds
2013-12-10 21:32               ` Dave Jones
2013-12-10 21:49                 ` Linus Torvalds
2013-12-10 21:56                   ` Dave Jones
2013-12-10 21:59                     ` Linus Torvalds
2013-12-10 22:07                       ` Dave Jones
2013-12-11 12:45                   ` Ingo Molnar
2013-12-10 21:34           ` Oleg Nesterov
2013-12-10 21:41             ` Dave Jones
2013-12-10 21:57               ` Linus Torvalds
2013-12-10 22:02                 ` Dave Jones
2013-12-10 22:09                 ` Oleg Nesterov
2013-12-10 22:19                 ` Linus Torvalds
2013-12-10 22:33                   ` Linus Torvalds
2013-12-10 22:38                     ` Darren Hart
2013-12-10 22:57                     ` Thomas Gleixner
2013-12-10 23:05                       ` Linus Torvalds
2013-12-10 23:28                         ` Thomas Gleixner
2013-12-10 23:31                         ` Al Viro
2013-12-11 17:08                     ` Oleg Nesterov
2013-12-11 17:18                       ` Thomas Gleixner
2013-12-11 17:56                         ` Oleg Nesterov
2013-12-11 19:18                           ` PATCH? introduce get_compound_page (Was: process 'stuck' at exit) Oleg Nesterov
2013-12-13 15:10                             ` Andrea Arcangeli
2013-12-13 16:22                               ` Oleg Nesterov
2013-12-13 17:34                                 ` Andrea Arcangeli
2013-12-16 18:36                                   ` Oleg Nesterov
2013-12-16 20:19                                     ` Andrea Arcangeli
2013-12-16 20:46                                       ` Oleg Nesterov
2013-12-17 16:53                                         ` Oleg Nesterov
2013-12-17 18:06                                           ` Andrea Arcangeli
2013-12-18 19:19                                             ` [PATCH -mm 0/7] (Was: introduce get_compound_page) Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 1/7] mm: thp: introduce __put_nontail_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 2/7] mm: thp: change __get_page_tail() to use __put_nontail_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 3/7] mm: change release_pages() to use put_page() rather than put_compound_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 4/7] mm: thp: turn put_compound_page() into __put_page_tail() Oleg Nesterov
2013-12-18 19:36                                                 ` Peter Zijlstra
2013-12-18 19:50                                                   ` Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 5/7] mm: thp: reorganize out_put_single code in __put_page_tail() Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 6/7] mm: thp: introduce get_lock_thp_head() Oleg Nesterov
2013-12-18 21:37                                                 ` Linus Torvalds
2013-12-19 16:29                                                   ` Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 7/7] mm: thp: introduce compound_head_put_tail(), change get_futex_key() to use it Oleg Nesterov
2013-12-18 19:28                                                 ` Peter Zijlstra
2013-12-18 19:40                                                   ` Oleg Nesterov
2013-12-19 19:08                                       ` [PATCH 0/1] mm: fix the theoretical compound_lock() vs prep_new_page() race Oleg Nesterov
2013-12-19 19:09                                         ` [PATCH 1/1] " Oleg Nesterov
2013-12-23 11:43                                           ` Andrea Arcangeli
2014-01-03 19:55                                             ` [PATCH v2 0/1] " Oleg Nesterov
2014-01-03 19:55                                               ` [PATCH v2 1/1] " Oleg Nesterov
2014-01-03 21:00                                                 ` Andrew Morton
2014-01-04 16:43                                                   ` Oleg Nesterov
2014-01-08 11:54                                                     ` Mel Gorman
2014-01-08 13:14                                                       ` Mel Gorman
2014-01-08 16:13                                                       ` Oleg Nesterov
2014-01-08 18:02                                                         ` Mel Gorman
2014-01-08 19:04                                                           ` Oleg Nesterov
2014-01-09 11:27                                                             ` Mel Gorman
2014-01-09 14:04                                                               ` Oleg Nesterov [this message]
2014-01-09 18:52                                                                 ` Andrea Arcangeli
2014-01-09 19:43                                                                   ` Oleg Nesterov
2014-01-09 21:36                                                                     ` Andrea Arcangeli
2014-01-10 16:12                                                                       ` Oleg Nesterov
2014-01-10 16:50                                                                         ` Peter Zijlstra
2014-01-10 16:12                                                                 ` Mel Gorman
2013-12-20 14:19                                         ` [PATCH 0/1] " Martin Schwidefsky
2013-12-16 20:19                                   ` [PATCH 0/2] mm: thp: get_huge_page_tail() cleanups Oleg Nesterov
2013-12-16 20:19                                     ` [PATCH -mm 1/2] mm: thp: __get_page_tail_foll() can use get_huge_page_tail() Oleg Nesterov
2013-12-16 20:19                                     ` [PATCH -mm 2/2] mm: thp: turn compound_head() into BUG_ON(!PageTail) in get_huge_page_tail() Oleg Nesterov
2013-12-16 20:27                                     ` [PATCH 0/2] mm: thp: get_huge_page_tail() cleanups Andrea Arcangeli
2013-12-10 22:42                   ` process 'stuck' at exit Thomas Gleixner
2013-12-10 22:48                     ` Linus Torvalds
2013-12-10 22:58                       ` Darren Hart
2013-12-10 23:01                         ` Dave Jones
2013-12-10 23:00                       ` Dave Jones
2013-12-11  0:05                         ` Steven Rostedt
2013-12-11  0:23                           ` Dave Jones
2013-12-11  0:55                             ` Dave Jones
2013-12-14 20:17                               ` Oleg Nesterov
2013-12-11  4:09                       ` Dave Jones
2013-12-12  4:26                       ` Dave Jones
2013-12-12  5:29                         ` Darren Hart
2013-12-10 22:51                     ` Darren Hart
2013-12-10 23:24                     ` Al Viro
2013-12-11 16:31                     ` Mel Gorman
2013-12-11 16:38                       ` Thomas Gleixner
2013-12-11 17:57                         ` Mel Gorman
2013-12-12 19:00                 ` Andrea Arcangeli
2013-12-12 19:03                   ` Linus Torvalds
2013-12-10 22:09               ` Steven Rostedt
2013-12-10 22:16                 ` Dave Jones
2013-12-10 22:21                   ` Steven Rostedt
2013-12-10 22:27                     ` Dave Jones
2013-12-11  1:02     ` Mel Gorman
2013-12-10 20:57   ` Darren Hart
2013-12-10 21:09     ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140109140447.GA25391@redhat.com \
    --to=oleg@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=dvhart@linux.intel.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=peterz@infradead.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.