All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	Darren Hart <dvhart@linux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>
Subject: Re: [PATCH v2 1/1] mm: fix the theoretical compound_lock() vs prep_new_page() race
Date: Wed, 8 Jan 2014 18:02:02 +0000	[thread overview]
Message-ID: <20140108180202.GL27046@suse.de> (raw)
In-Reply-To: <20140108161338.GA10434@redhat.com>

On Wed, Jan 08, 2014 at 05:13:38PM +0100, Oleg Nesterov wrote:
> On 01/08, Mel Gorman wrote:
> >
> > On Sat, Jan 04, 2014 at 05:43:47PM +0100, Oleg Nesterov wrote:
> > >
> > > 	get/put_page(thp_tail) paths do get_page_unless_zero(page_head) +
> > > 	compound_lock(). In theory this page_head can be already freed and
> > > 	reallocated as alloc_pages(__GFP_COMP, smaller_order). In this case
> > > 	get_page_unless_zero() can succeed right after set_page_refcounted(),
> > > 	and compound_lock() can race with the non-atomic __SetPageHead() in
> > > 	prep_compound_page().
> > >
> > This patch is putting a write barrier in the page allocator fast path and
> > that is going to be a leading cause of Sad Face. We already have seen
> > large regressions before when write barriers were introduced to the page
> > allocator paths for cpusets.  Sticking it under CONFIG_TRANSPARENT_HUGEPAGE
> > does not really address the issue.
> 
> As you already mentioned in another email, smp_wmb() is mostly nop. On
> x86_64 at least.

Which sometimes means that it'll just take longer for someone to find it
and bitch about it.

> Although perhaps it would be nice to have
> 
> 	static inline void atomic_store_release(atomic_t *v, int i)
> 	{
> 		smp_store_release(&v->counter, i);
> 	}
> 
> > > Yes, but thp can access this page_head via stale pointer, tail->first_page,
> > > if it races with split_huge_page_refcount().
> >
> > To justify the introduction of a performance regression we need to be 100%
> > sure this race actually exists
> 
> See below. But let me remind that I never looked at this code before,
> I can be easily wrong.
> 
> > and not just theoretical.
> 
> It is theoretical anyway, I guess.
> 
> > For futex, the THP page (and the tail) must have been discovered via
> > the page tables in which case the page tables are temporarily preventing
> > the page being freed to the allocator.
> 
> Yes. But, for example, get_futex_key() does
> 
> 	if (unlikely(PageTail(page))) {
> 		put_page(page);
> 
> why this put_page() can't race with _split? If nothing else, another thread
> can unmap the part of this vma.
> 

The race is not prevented but that does not mean it matters. Basic
scenario where a split starts after the PageTail check but before the
put_page in get_futex_key

CPU A
get_futex_key
  -> fast gup, page table removing prevents parallel unmap and free
    -> gup_huge_pmd (arch/x86/mm/gup.c at least)
      -> get_huge_page_tail (increment page tail _map_count)
      -> get_huge_page_multiple (increment ref on head page)
  -> Check PageTail
					CPU B
					split_huge_page_to_list
					  -> split_huge_page_refcount
					     spin_lock_irq(lru_lock)
					     compound_lock
  -> put_page(tail_page)
    ->put_compound_page
       looks up head page
       takes reference unless zero
       compound_lock (block)
					     complete split
					     compound_unlock
       check PageTail

This put_page blocks on the compound lock, finds the page is no longer a
PageTail as the split barriers correctly and backs out gracefully. So sure,
splits can race but the case is cared for.

The parallel unmap is prevented by get_huge_page_multiple in the gup path
and held in place until put_page_compound frees it later.

I still don't see the case where a page gets freed to the page allocator
that causes weird problems here. Unfortunately, I also recognise I have
tunnel vision because subconsciously I don't *want* to see a problem here
that justifies adding a write barrier. Andrea may stomp all over this
reasoning in which case we'll get a good comment for the smp_wmb :/

-- 
Mel Gorman
SUSE Labs

  reply	other threads:[~2014-01-08 18:02 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-10 15:47 process 'stuck' at exit Dave Jones
2013-12-10 18:40 ` Linus Torvalds
2013-12-10 19:18   ` Thomas Gleixner
2013-12-10 19:55     ` Linus Torvalds
2013-12-10 20:27       ` Dave Jones
2013-12-10 20:34         ` Thomas Gleixner
2013-12-10 20:55           ` Dave Jones
2013-12-10 21:25             ` Darren Hart
2013-12-10 21:28               ` Thomas Gleixner
2013-12-10 21:39                 ` Steven Rostedt
2013-12-10 20:33       ` Thomas Gleixner
2013-12-10 20:43         ` Linus Torvalds
2013-12-10 21:17           ` Thomas Gleixner
2013-12-10 20:35       ` Oleg Nesterov
2013-12-10 20:49         ` Dave Jones
2013-12-10 21:06           ` Darren Hart
2013-12-10 21:12             ` Dave Jones
2013-12-10 21:18             ` Linus Torvalds
2013-12-10 21:24               ` Linus Torvalds
2013-12-10 21:32               ` Dave Jones
2013-12-10 21:49                 ` Linus Torvalds
2013-12-10 21:56                   ` Dave Jones
2013-12-10 21:59                     ` Linus Torvalds
2013-12-10 22:07                       ` Dave Jones
2013-12-11 12:45                   ` Ingo Molnar
2013-12-10 21:34           ` Oleg Nesterov
2013-12-10 21:41             ` Dave Jones
2013-12-10 21:57               ` Linus Torvalds
2013-12-10 22:02                 ` Dave Jones
2013-12-10 22:09                 ` Oleg Nesterov
2013-12-10 22:19                 ` Linus Torvalds
2013-12-10 22:33                   ` Linus Torvalds
2013-12-10 22:38                     ` Darren Hart
2013-12-10 22:57                     ` Thomas Gleixner
2013-12-10 23:05                       ` Linus Torvalds
2013-12-10 23:28                         ` Thomas Gleixner
2013-12-10 23:31                         ` Al Viro
2013-12-11 17:08                     ` Oleg Nesterov
2013-12-11 17:18                       ` Thomas Gleixner
2013-12-11 17:56                         ` Oleg Nesterov
2013-12-11 19:18                           ` PATCH? introduce get_compound_page (Was: process 'stuck' at exit) Oleg Nesterov
2013-12-13 15:10                             ` Andrea Arcangeli
2013-12-13 16:22                               ` Oleg Nesterov
2013-12-13 17:34                                 ` Andrea Arcangeli
2013-12-16 18:36                                   ` Oleg Nesterov
2013-12-16 20:19                                     ` Andrea Arcangeli
2013-12-16 20:46                                       ` Oleg Nesterov
2013-12-17 16:53                                         ` Oleg Nesterov
2013-12-17 18:06                                           ` Andrea Arcangeli
2013-12-18 19:19                                             ` [PATCH -mm 0/7] (Was: introduce get_compound_page) Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 1/7] mm: thp: introduce __put_nontail_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 2/7] mm: thp: change __get_page_tail() to use __put_nontail_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 3/7] mm: change release_pages() to use put_page() rather than put_compound_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 4/7] mm: thp: turn put_compound_page() into __put_page_tail() Oleg Nesterov
2013-12-18 19:36                                                 ` Peter Zijlstra
2013-12-18 19:50                                                   ` Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 5/7] mm: thp: reorganize out_put_single code in __put_page_tail() Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 6/7] mm: thp: introduce get_lock_thp_head() Oleg Nesterov
2013-12-18 21:37                                                 ` Linus Torvalds
2013-12-19 16:29                                                   ` Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 7/7] mm: thp: introduce compound_head_put_tail(), change get_futex_key() to use it Oleg Nesterov
2013-12-18 19:28                                                 ` Peter Zijlstra
2013-12-18 19:40                                                   ` Oleg Nesterov
2013-12-19 19:08                                       ` [PATCH 0/1] mm: fix the theoretical compound_lock() vs prep_new_page() race Oleg Nesterov
2013-12-19 19:09                                         ` [PATCH 1/1] " Oleg Nesterov
2013-12-23 11:43                                           ` Andrea Arcangeli
2014-01-03 19:55                                             ` [PATCH v2 0/1] " Oleg Nesterov
2014-01-03 19:55                                               ` [PATCH v2 1/1] " Oleg Nesterov
2014-01-03 21:00                                                 ` Andrew Morton
2014-01-04 16:43                                                   ` Oleg Nesterov
2014-01-08 11:54                                                     ` Mel Gorman
2014-01-08 13:14                                                       ` Mel Gorman
2014-01-08 16:13                                                       ` Oleg Nesterov
2014-01-08 18:02                                                         ` Mel Gorman [this message]
2014-01-08 19:04                                                           ` Oleg Nesterov
2014-01-09 11:27                                                             ` Mel Gorman
2014-01-09 14:04                                                               ` Oleg Nesterov
2014-01-09 18:52                                                                 ` Andrea Arcangeli
2014-01-09 19:43                                                                   ` Oleg Nesterov
2014-01-09 21:36                                                                     ` Andrea Arcangeli
2014-01-10 16:12                                                                       ` Oleg Nesterov
2014-01-10 16:50                                                                         ` Peter Zijlstra
2014-01-10 16:12                                                                 ` Mel Gorman
2013-12-20 14:19                                         ` [PATCH 0/1] " Martin Schwidefsky
2013-12-16 20:19                                   ` [PATCH 0/2] mm: thp: get_huge_page_tail() cleanups Oleg Nesterov
2013-12-16 20:19                                     ` [PATCH -mm 1/2] mm: thp: __get_page_tail_foll() can use get_huge_page_tail() Oleg Nesterov
2013-12-16 20:19                                     ` [PATCH -mm 2/2] mm: thp: turn compound_head() into BUG_ON(!PageTail) in get_huge_page_tail() Oleg Nesterov
2013-12-16 20:27                                     ` [PATCH 0/2] mm: thp: get_huge_page_tail() cleanups Andrea Arcangeli
2013-12-10 22:42                   ` process 'stuck' at exit Thomas Gleixner
2013-12-10 22:48                     ` Linus Torvalds
2013-12-10 22:58                       ` Darren Hart
2013-12-10 23:01                         ` Dave Jones
2013-12-10 23:00                       ` Dave Jones
2013-12-11  0:05                         ` Steven Rostedt
2013-12-11  0:23                           ` Dave Jones
2013-12-11  0:55                             ` Dave Jones
2013-12-14 20:17                               ` Oleg Nesterov
2013-12-11  4:09                       ` Dave Jones
2013-12-12  4:26                       ` Dave Jones
2013-12-12  5:29                         ` Darren Hart
2013-12-10 22:51                     ` Darren Hart
2013-12-10 23:24                     ` Al Viro
2013-12-11 16:31                     ` Mel Gorman
2013-12-11 16:38                       ` Thomas Gleixner
2013-12-11 17:57                         ` Mel Gorman
2013-12-12 19:00                 ` Andrea Arcangeli
2013-12-12 19:03                   ` Linus Torvalds
2013-12-10 22:09               ` Steven Rostedt
2013-12-10 22:16                 ` Dave Jones
2013-12-10 22:21                   ` Steven Rostedt
2013-12-10 22:27                     ` Dave Jones
2013-12-11  1:02     ` Mel Gorman
2013-12-10 20:57   ` Darren Hart
2013-12-10 21:09     ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140108180202.GL27046@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=davej@redhat.com \
    --cc=dvhart@linux.intel.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=schwidefsky@de.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.