linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Dave Jones <davej@redhat.com>,
	Darren Hart <dvhart@linux.intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>
Subject: Re: PATCH? introduce get_compound_page (Was: process 'stuck' at exit)
Date: Mon, 16 Dec 2013 21:19:52 +0100	[thread overview]
Message-ID: <20131216201952.GE21218@redhat.com> (raw)
In-Reply-To: <20131216183618.GA28252@redhat.com>

On Mon, Dec 16, 2013 at 07:36:18PM +0100, Oleg Nesterov wrote:
> On 12/13, Andrea Arcangeli wrote:
> >
> > On Fri, Dec 13, 2013 at 05:22:40PM +0100, Oleg Nesterov wrote:
> > >
> > > I'll try to make v2 based on -mm and your suggestions.
> >
> > Ok great!
> 
> Yes, it would be great, but I need your help again ;)
> 
> 
> Let me quote the pseudo-code you sent me:
> 
> 	put_compound_tail(page) {
> 		page_head = compound_trans_head(page);
> 		if (!__compound_tail_refcounted(page_head)) {
> 			...
> 			return ...;
> 		}
> 
> 		flags = compound_lock_irqsave(page_head);
> 		...
> 
> Sure, put_compound_tail() should be the simplified version of
> put_compound_page() which doesn't dec page_head->_count, this is clear.
> 
> But afaics, compound_lock_irqsave() above looks unsafe without
> get_page_unless_zero(page_head) ? If we race with _split, page_head
> can be freed and compound_lock() can race with, say, free_pages_check()
> which plays with page->flags ?
> 
> So it seems that put_compound_tail() should also do get/put(head) like
> put_compound_page() does, and this probably means we should factor out
> the common code somehow.
> 

Yes it was supposed to do the get_page_unless_zero before the
compound_lock.

> Or I missed something?
> 
> 
> 
> OTOH, I can't really understand
> 
> 	if (likely(page != page_head && get_page_unless_zero(page_head)))
> 
> in __get_page_tail() and put_compound_page().
> 
> First of all, is it really possible that page == compound_trans_head(page)?

It is possible if it become a regular page because split_huge_page
run in between.

compound_trans_head guarantees us it got us to a safe head page. And
the idea was to do get_page_unless_zero only on head pages and never
on tails to be safer/simpler. Same goes for the compound_lock that
follows still on the page_head where "page != page_head".

If you like, you can try to optimize away the check and reuse the
PageTail branch inside the compound_lock but I'm not sure if it's
worth it.

> We already verified that PG_tail was set. Of course this bit can be cleared
> and ->first_page can be a dangling pointer but it can never be changed to
> point to this page? (in fact, afaics it can be changed at all as long as we
> have a reference, but this doesn't matter).

If PG_tail gets cleared compound_trans_head returns "page".

> And compound_lock_irqsave() looks racy even after get_page_unless_zero().
> 
> For example, suppose that page_head was already freed and then re-allocated
> as (say) alloc_pages(__GFP_COMP, 1). get_page_unless_zero() can succeed right
> after prep_new_page() does set_page_refcounted(). Now, can't compound_lock()
> race with the non-atomic prep_compound_page()->__SetPageHead() ?

Yes. We need to change to:

	if (order && (gfp_flags & __GFP_COMP))
		prep_compound_page(page, order);
	smp_wmb();
	/* as the compound_lock can be taken after it's refcounted */
	set_page_refcounted(page);

__SetPageHead uses bts asm insn so literally only a "lock" prefix is
missing in a assembly instruction. So the race window is incredibly
small, but it must be fixed indeed. This also puts set_page_refcounted
as the last action of buffered_rmqueue so there shouldn't be any other
issues like this left in the page allocation code.

Can you reorder set_page_refcount in your v2?

I wonder if arch_alloc_page needs refcount 1, it sets the page as
stable on s390. the other way around is to move prep_compound_page
before set_page_refcounted (though I think if we can, keeping the
refcounted at the very last with a comment is preferable).

> Finally. basepage_index(page) after put_page(page) in get_futex_key() looks
> confusing imho. I think this is correct, we already checked PageAnon() so it
> can't be a thp page. But probably this needs a comment and __basepage_index()
> should do BUG_ON(!PageHuge()).

This looks a bug if you apply the patches to add THP in pagecache, and
BUG_ON(!PageHuge) would also break THP in pagecache later (though
safer than silently broken like now).

It'd safer to read the basepage_index in a local variable just before
doing the put_compund_tail but I agree it's not going to make a
difference right now.

But the whole idea of working with the head of the THP despite the
address points to a tail, looks flakey if there is a split (THP in
pagecache). I mean chances are reading basepage_index(page) before
releasing the tail pin is not enough.

The Anon path looks sane for all cases, as it doesn't pretend to
return any information on the actual mapping and it limits itself to
do the PageAnon check on the actual head that has PageAnon set, which
is still valid thing to do even after a split, and in fact required if
a split didn't take place yet as the tail "page" isn't anon but just a
tail.

  reply	other threads:[~2013-12-16 20:20 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-12-10 15:47 process 'stuck' at exit Dave Jones
2013-12-10 18:40 ` Linus Torvalds
2013-12-10 19:18   ` Thomas Gleixner
2013-12-10 19:55     ` Linus Torvalds
2013-12-10 20:27       ` Dave Jones
2013-12-10 20:34         ` Thomas Gleixner
2013-12-10 20:55           ` Dave Jones
2013-12-10 21:25             ` Darren Hart
2013-12-10 21:28               ` Thomas Gleixner
2013-12-10 21:39                 ` Steven Rostedt
2013-12-10 20:33       ` Thomas Gleixner
2013-12-10 20:43         ` Linus Torvalds
2013-12-10 21:17           ` Thomas Gleixner
2013-12-10 20:35       ` Oleg Nesterov
2013-12-10 20:49         ` Dave Jones
2013-12-10 21:06           ` Darren Hart
2013-12-10 21:12             ` Dave Jones
2013-12-10 21:18             ` Linus Torvalds
2013-12-10 21:24               ` Linus Torvalds
2013-12-10 21:32               ` Dave Jones
2013-12-10 21:49                 ` Linus Torvalds
2013-12-10 21:56                   ` Dave Jones
2013-12-10 21:59                     ` Linus Torvalds
2013-12-10 22:07                       ` Dave Jones
2013-12-11 12:45                   ` Ingo Molnar
2013-12-10 21:34           ` Oleg Nesterov
2013-12-10 21:41             ` Dave Jones
2013-12-10 21:57               ` Linus Torvalds
2013-12-10 22:02                 ` Dave Jones
2013-12-10 22:09                 ` Oleg Nesterov
2013-12-10 22:19                 ` Linus Torvalds
2013-12-10 22:33                   ` Linus Torvalds
2013-12-10 22:38                     ` Darren Hart
2013-12-10 22:57                     ` Thomas Gleixner
2013-12-10 23:05                       ` Linus Torvalds
2013-12-10 23:28                         ` Thomas Gleixner
2013-12-10 23:31                         ` Al Viro
2013-12-11 17:08                     ` Oleg Nesterov
2013-12-11 17:18                       ` Thomas Gleixner
2013-12-11 17:56                         ` Oleg Nesterov
2013-12-11 19:18                           ` PATCH? introduce get_compound_page (Was: process 'stuck' at exit) Oleg Nesterov
2013-12-13 15:10                             ` Andrea Arcangeli
2013-12-13 16:22                               ` Oleg Nesterov
2013-12-13 17:34                                 ` Andrea Arcangeli
2013-12-16 18:36                                   ` Oleg Nesterov
2013-12-16 20:19                                     ` Andrea Arcangeli [this message]
2013-12-16 20:46                                       ` Oleg Nesterov
2013-12-17 16:53                                         ` Oleg Nesterov
2013-12-17 18:06                                           ` Andrea Arcangeli
2013-12-18 19:19                                             ` [PATCH -mm 0/7] (Was: introduce get_compound_page) Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 1/7] mm: thp: introduce __put_nontail_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 2/7] mm: thp: change __get_page_tail() to use __put_nontail_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 3/7] mm: change release_pages() to use put_page() rather than put_compound_page() Oleg Nesterov
2013-12-18 19:19                                               ` [PATCH -mm 4/7] mm: thp: turn put_compound_page() into __put_page_tail() Oleg Nesterov
2013-12-18 19:36                                                 ` Peter Zijlstra
2013-12-18 19:50                                                   ` Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 5/7] mm: thp: reorganize out_put_single code in __put_page_tail() Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 6/7] mm: thp: introduce get_lock_thp_head() Oleg Nesterov
2013-12-18 21:37                                                 ` Linus Torvalds
2013-12-19 16:29                                                   ` Oleg Nesterov
2013-12-18 19:20                                               ` [PATCH -mm 7/7] mm: thp: introduce compound_head_put_tail(), change get_futex_key() to use it Oleg Nesterov
2013-12-18 19:28                                                 ` Peter Zijlstra
2013-12-18 19:40                                                   ` Oleg Nesterov
2013-12-19 19:08                                       ` [PATCH 0/1] mm: fix the theoretical compound_lock() vs prep_new_page() race Oleg Nesterov
2013-12-19 19:09                                         ` [PATCH 1/1] " Oleg Nesterov
2013-12-23 11:43                                           ` Andrea Arcangeli
2014-01-03 19:55                                             ` [PATCH v2 0/1] " Oleg Nesterov
2014-01-03 19:55                                               ` [PATCH v2 1/1] " Oleg Nesterov
2014-01-03 21:00                                                 ` Andrew Morton
2014-01-04 16:43                                                   ` Oleg Nesterov
2014-01-08 11:54                                                     ` Mel Gorman
2014-01-08 13:14                                                       ` Mel Gorman
2014-01-08 16:13                                                       ` Oleg Nesterov
2014-01-08 18:02                                                         ` Mel Gorman
2014-01-08 19:04                                                           ` Oleg Nesterov
2014-01-09 11:27                                                             ` Mel Gorman
2014-01-09 14:04                                                               ` Oleg Nesterov
2014-01-09 18:52                                                                 ` Andrea Arcangeli
2014-01-09 19:43                                                                   ` Oleg Nesterov
2014-01-09 21:36                                                                     ` Andrea Arcangeli
2014-01-10 16:12                                                                       ` Oleg Nesterov
2014-01-10 16:50                                                                         ` Peter Zijlstra
2014-01-10 16:12                                                                 ` Mel Gorman
2013-12-20 14:19                                         ` [PATCH 0/1] " Martin Schwidefsky
2013-12-16 20:19                                   ` [PATCH 0/2] mm: thp: get_huge_page_tail() cleanups Oleg Nesterov
2013-12-16 20:19                                     ` [PATCH -mm 1/2] mm: thp: __get_page_tail_foll() can use get_huge_page_tail() Oleg Nesterov
2013-12-16 20:19                                     ` [PATCH -mm 2/2] mm: thp: turn compound_head() into BUG_ON(!PageTail) in get_huge_page_tail() Oleg Nesterov
2013-12-16 20:27                                     ` [PATCH 0/2] mm: thp: get_huge_page_tail() cleanups Andrea Arcangeli
2013-12-10 22:42                   ` process 'stuck' at exit Thomas Gleixner
2013-12-10 22:48                     ` Linus Torvalds
2013-12-10 22:58                       ` Darren Hart
2013-12-10 23:01                         ` Dave Jones
2013-12-10 23:00                       ` Dave Jones
2013-12-11  0:05                         ` Steven Rostedt
2013-12-11  0:23                           ` Dave Jones
2013-12-11  0:55                             ` Dave Jones
2013-12-14 20:17                               ` Oleg Nesterov
2013-12-11  4:09                       ` Dave Jones
2013-12-12  4:26                       ` Dave Jones
2013-12-12  5:29                         ` Darren Hart
2013-12-10 22:51                     ` Darren Hart
2013-12-10 23:24                     ` Al Viro
2013-12-11 16:31                     ` Mel Gorman
2013-12-11 16:38                       ` Thomas Gleixner
2013-12-11 17:57                         ` Mel Gorman
2013-12-12 19:00                 ` Andrea Arcangeli
2013-12-12 19:03                   ` Linus Torvalds
2013-12-10 22:09               ` Steven Rostedt
2013-12-10 22:16                 ` Dave Jones
2013-12-10 22:21                   ` Steven Rostedt
2013-12-10 22:27                     ` Dave Jones
2013-12-11  1:02     ` Mel Gorman
2013-12-10 20:57   ` Darren Hart
2013-12-10 21:09     ` Dave Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131216201952.GE21218@redhat.com \
    --to=aarcange@redhat.com \
    --cc=davej@redhat.com \
    --cc=dvhart@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).