From: David Gibson <david@gibson.dropbear.id.au>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Michal Hocko <mhocko@suse.cz>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hugh Dickins <hughd@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Hillf Danton <dhillf@gmail.com>
Subject: Re: [PATCH 17/18] mm, hugetlb: retry if we fail to allocate a hugepage with use_reserve
Date: Fri, 9 Aug 2013 10:02:31 +1000
Message-ID: <20130809000231.GB2904@voom.fritz.box>
In-Reply-To: <20130807091832.GD32449@lge.com>

On Wed, Aug 07, 2013 at 06:18:32PM +0900, Joonsoo Kim wrote:
> On Tue, Aug 06, 2013 at 06:38:49PM -0700, Davidlohr Bueso wrote:
> > On Wed, 2013-08-07 at 11:03 +1000, David Gibson wrote:
> > > On Tue, Aug 06, 2013 at 05:18:44PM -0700, Davidlohr Bueso wrote:
> > > > On Mon, 2013-08-05 at 16:36 +0900, Joonsoo Kim wrote:
> > > > > > Any mapping that doesn't use the reserved pool, not just
> > > > > > MAP_NORESERVE.  For example, if a process makes a MAP_PRIVATE mapping,
> > > > > > then fork()s then the mapping is instantiated in the child, that will
> > > > > > not draw from the reserved pool.
> > > > > > 
> > > > > > > Should we ensure them to allocate the last hugepage?
> > > > > > > They map a region with MAP_NORESERVE, so don't assume that their requests
> > > > > > > always succeed.
> > > > > > 
> > > > > > If the pages are available, people get cranky if it fails for no
> > > > > > apparent reason, MAP_NORESERVE or not.  They get especially cranky if
> > > > > > it sometimes fails and sometimes doesn't due to a race condition.
> > > > > 
> > > > > Hello,
> > > > > 
> > > > > > Hmm... Okay. I will try to implement another way to protect against the race condition.
> > > > > > Maybe it is best to use a table mutex :)
> > > > > > Anyway, please give me some time, guys.
> > > > 
> > > > So another option is to take the mutex table patchset for now as it
> > > > *does* improve things a great deal, then, when ready, get rid of the
> > > > > instantiation lock altogether.
> > > 
> > > We still don't have a solid proposal for doing that. Joonsoo Kim's
> > > patchset misses cases (non reserved mappings).  I'm also not certain
> > > there aren't a few edge cases which can lead to even reserved mappings
> > > failing, and if that happens the patches will lead to livelock.
> > > 
> > 
> > Exactly, which is why I suggest minimizing the lock contention until we
> > do have such a proposal.
> 
> Okay. My proposal is not complete and may need much more time.
> And I'm not sure that my *retry* approach can eventually cover all
> the race situations, currently.

Yes.  The difficulty with retrying is knowing when it's safe to do
so.  If you don't retry enough, you get SIGBUS when you should be able
to allocate; if you retry too much, you freeze up trying to find a
page that isn't there.

I once attempted an approach involving an atomic counter of the number
of "in flight" hugepages, only retrying when it's nonzero.  Working
out a safe ordering for all the updates to get all the cases right
made my brain melt though, and I never got it working.
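
As a rough illustration of that retry rule only, here is a toy
user-space model in C.  Every name in it (toy_pool, toy_take,
toy_fault and so on) is hypothetical and does not exist in
mm/hugetlb.c.  It shows the decision "retry only while something is
in flight" when the calls are made one after another; it makes no
attempt to solve the ordering problem described above, so it should
not be read as a working concurrent design.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy pool; not a kernel data structure. */
struct toy_pool {
	atomic_int free_pages;	/* pages available to allocate */
	atomic_int in_flight;	/* pages taken but not yet committed or given back */
};

enum fault_result { FAULT_OK, FAULT_RETRY, FAULT_SIGBUS };

/* Take one page and mark it in flight; false if the pool is empty. */
static bool toy_take(struct toy_pool *p)
{
	int n = atomic_load(&p->free_pages);

	while (n > 0) {
		/* On failure, n is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->free_pages, &n, n - 1)) {
			atomic_fetch_add(&p->in_flight, 1);
			return true;
		}
	}
	return false;
}

/* The fault that took a page gave up on it and returns it to the pool. */
static void toy_give_back(struct toy_pool *p)
{
	assert(atomic_load(&p->in_flight) > 0);
	atomic_fetch_add(&p->free_pages, 1);
	atomic_fetch_sub(&p->in_flight, 1);
}

/* The fault that took a page has installed it for good. */
static void toy_commit(struct toy_pool *p)
{
	assert(atomic_load(&p->in_flight) > 0);
	atomic_fetch_sub(&p->in_flight, 1);
}

/*
 * One allocation attempt.  If the pool is empty, retry only while
 * some page is still in flight, because it might come back.
 */
static enum fault_result toy_fault(struct toy_pool *p)
{
	if (toy_take(p))
		return FAULT_OK;
	return atomic_load(&p->in_flight) > 0 ? FAULT_RETRY : FAULT_SIGBUS;
}
```

With one free page, a first toy_fault() succeeds, a second one sees
an empty pool with a page in flight and reports FAULT_RETRY, and only
once nothing is in flight does an empty pool mean FAULT_SIGBUS.  The
window between the two atomic updates in toy_take() and
toy_give_back() is exactly where a concurrent observer could draw the
wrong conclusion.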

> If you have to hurry, I don't have a strong objection to your patches,
> but, IMHO, we should go slow, because it is not just a trivial change.
> Hugetlb code is too subtle, so it is hard to confirm its solidness.
> Following is the race problem that I found with those patches.
> 
> I assume that nr_free_hugepage is 2.
> 
> 1. The parent process maps a region one hugepage in size with MAP_PRIVATE.
> 2. The parent process writes something to this region, so a fault occurs.
> 3. The fault is handled.
> 4. fork
> 5. The parent process writes something to this hugepage, so a COW fault occurs.
> 6. While the parent allocates a new page and does copy_user_huge_page()
> 	in the fault handler, the child process writes something to this hugepage,
> 	so a COW fault occurs. This access is not protected by the table mutex,
> 	because the mm is different.
> 7. The child process dies, because there is no free hugepage.
> 
> If there were no race, the child process would not die,
> because all we need is 2 hugepages, one for the parent
> and the other for the child.

Ouch, good catch.  Unlike the existing form of the race, I doubt this
one has been encountered in the wild, but it shows how subtle this is.
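
The interleaving in the quoted steps can be replayed with a small
single-threaded model in C.  This is only a sketch: toy_state,
cow_begin, cow_end and toy_child_survives are made-up names, and the
model assumes that a COW fault on a page with a single remaining
mapper reuses that page in place instead of allocating a copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; not a kernel structure. */
struct toy_state {
	int free_pages;		/* free hugepages in the pool */
	int orig_mappers;	/* processes still mapping the original page */
};

/*
 * Start of a COW fault.  Returns false if a copy is needed but no
 * page is free.  *copied reports whether a new page was taken.
 */
static bool cow_begin(struct toy_state *s, bool *copied)
{
	*copied = false;
	if (s->orig_mappers == 1)	/* sole mapper: reuse in place */
		return true;
	if (s->free_pages == 0)
		return false;
	s->free_pages--;
	*copied = true;
	return true;
}

/* End of a COW fault: a process that copied stops mapping the original. */
static void cow_end(struct toy_state *s, bool copied)
{
	if (copied)
		s->orig_mappers--;
}

/* Replays steps 1 to 7; returns true if the child's write succeeds. */
static bool toy_child_survives(bool child_faults_during_parent_copy)
{
	struct toy_state s = { .free_pages = 2, .orig_mappers = 0 };
	bool parent_copied, child_copied, ok;

	s.free_pages--;			/* steps 1-3: parent faults in the page */
	s.orig_mappers = 1;
	s.orig_mappers = 2;		/* step 4: fork, child shares the page */

	ok = cow_begin(&s, &parent_copied);	/* step 5: parent COW fault */
	assert(ok);

	if (child_faults_during_parent_copy)
		/* step 6: child faults before the parent has finished */
		return cow_begin(&s, &child_copied);

	cow_end(&s, parent_copied);		/* parent finishes first */
	return cow_begin(&s, &child_copied);
}
```

In this model toy_child_survives(false) is true, because the child
ends up as the sole mapper of the original page, while
toy_child_survives(true) is false, because the child needs a third
page that does not exist.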

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
