Re: [PATCH] hugepage: allow parallelization of the hugepage fault path

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	Mel Gorman <mgorman@suse.de>,
	Konstantin Khlebnikov <khlebnikov@openvz.org>,
	Michal Hocko <mhocko@suse.cz>,
	"AneeshKumarK.V" <aneesh.kumar@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hillf Danton <dhillf@gmail.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Eric B Munson <emunson@mgebm.net>,
	Anton Blanchard <anton@samba.org>
Subject: Re: [PATCH] hugepage: allow parallelization of the hugepage fault path
Date: Mon, 22 Jul 2013 09:59:41 +0900	[thread overview]
Message-ID: <20130722005940.GA7506@lge.com> (raw)
In-Reply-To: <1374269055.9305.19.camel@buesod1.americas.hpqcorp.net>

On Fri, Jul 19, 2013 at 02:24:15PM -0700, Davidlohr Bueso wrote:
> On Fri, 2013-07-19 at 17:14 +1000, David Gibson wrote:
> > On Thu, Jul 18, 2013 at 05:42:35PM +0900, Joonsoo Kim wrote:
> > > On Wed, Jul 17, 2013 at 12:50:25PM -0700, Davidlohr Bueso wrote:
> > > > From: David Gibson <david@gibson.dropbear.id.au>
> > > > 
> > > > At present, the page fault path for hugepages is serialized by a
> > > > single mutex. This is used to avoid spurious out-of-memory conditions
> > > > when the hugepage pool is fully utilized (two processes or threads can
> > > > race to instantiate the same mapping with the last hugepage from the
> > > > pool, the race loser returning VM_FAULT_OOM).  This problem is
> > > > specific to hugepages, because it is normal to want to use every
> > > > single hugepage in the system - with normal pages we simply assume
> > > > there will always be a few spare pages which can be used temporarily
> > > > until the race is resolved.
> > > > 
> > > > Unfortunately this serialization also means that clearing of hugepages
> > > > cannot be parallelized across multiple CPUs, which can lead to very
> > > > long process startup times when using large numbers of hugepages.
> > > > 
> > > > This patch improves the situation by replacing the single mutex with a
> > > > table of mutexes, selected based on a hash, which allows us to know
> > > > which page in the file we're instantiating. For shared mappings, the
> > > > hash key is selected based on the address space and file offset being faulted.
> > > > Similarly, for private mappings, the mm and virtual address are used.
> > > > 
> > > 
> > > Hello.
> > > 
> > > With this table mutex, we cannot protect region tracking structure.
> > > See below comment.
> > > 
> > > /*
> > >  * Region tracking -- allows tracking of reservations and instantiated pages
> > >  *                    across the pages in a mapping.
> > >  *
> > >  * The region data structures are protected by a combination of the mmap_sem
> > >  * and the hugetlb_instantion_mutex.  To access or modify a region the caller
> > >  * must either hold the mmap_sem for write, or the mmap_sem for read and
> > >  * the hugetlb_instantiation mutex:
> > >  *
> > >  *      down_write(&mm->mmap_sem);
> > >  * or
> > >  *      down_read(&mm->mmap_sem);
> > >  *      mutex_lock(&hugetlb_instantiation_mutex);
> > >  */
> > 
> > Ugh.  Who the hell added that.  I guess you'll need to split of
> > another mutex for that purpose, afaict there should be no interaction
> > with the actual, intended purpose of the instantiation mutex.
> 
> This was added in commit 84afd99b. One way to go would be to add a
> spinlock to protect changes to the regions - however reading the
> changelog, and based on David's previous explanation for the
> instantiation mutex, I don't see why it was added. In fact several
> places modify regions without holding the instantiation mutex, ie:
> hugetlb_reserve_pages()
> 
> Am I missing something here?

hugetlb_reserve_pages() is called with down_write(mmap_sem),
so fault flow which require down_read(mmap_sem) cannot interfere
to change the region.

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
To: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: David Gibson <david@gibson.dropbear.id.au>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>, Rik van Riel <riel@redhat.com>,
	Michel Lespinasse <walken@google.com>,
	Mel Gorman <mgorman@suse.de>,
	Konstantin Khlebnikov <khlebnikov@openvz.org>,
	Michal Hocko <mhocko@suse.cz>,
	"AneeshKumarK.V" <aneesh.kumar@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hillf Danton <dhillf@gmail.com>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	Eric B Munson <emunson@mgebm.net>,
	Anton Blanchard <anton@samba.org>
Subject: Re: [PATCH] hugepage: allow parallelization of the hugepage fault path
Date: Mon, 22 Jul 2013 09:59:41 +0900	[thread overview]
Message-ID: <20130722005940.GA7506@lge.com> (raw)
In-Reply-To: <1374269055.9305.19.camel@buesod1.americas.hpqcorp.net>

On Fri, Jul 19, 2013 at 02:24:15PM -0700, Davidlohr Bueso wrote:
> On Fri, 2013-07-19 at 17:14 +1000, David Gibson wrote:
> > On Thu, Jul 18, 2013 at 05:42:35PM +0900, Joonsoo Kim wrote:
> > > On Wed, Jul 17, 2013 at 12:50:25PM -0700, Davidlohr Bueso wrote:
> > > > From: David Gibson <david@gibson.dropbear.id.au>
> > > > 
> > > > At present, the page fault path for hugepages is serialized by a
> > > > single mutex. This is used to avoid spurious out-of-memory conditions
> > > > when the hugepage pool is fully utilized (two processes or threads can
> > > > race to instantiate the same mapping with the last hugepage from the
> > > > pool, the race loser returning VM_FAULT_OOM).  This problem is
> > > > specific to hugepages, because it is normal to want to use every
> > > > single hugepage in the system - with normal pages we simply assume
> > > > there will always be a few spare pages which can be used temporarily
> > > > until the race is resolved.
> > > > 
> > > > Unfortunately this serialization also means that clearing of hugepages
> > > > cannot be parallelized across multiple CPUs, which can lead to very
> > > > long process startup times when using large numbers of hugepages.
> > > > 
> > > > This patch improves the situation by replacing the single mutex with a
> > > > table of mutexes, selected based on a hash, which allows us to know
> > > > which page in the file we're instantiating. For shared mappings, the
> > > > hash key is selected based on the address space and file offset being faulted.
> > > > Similarly, for private mappings, the mm and virtual address are used.
> > > > 
> > > 
> > > Hello.
> > > 
> > > With this table mutex, we cannot protect region tracking structure.
> > > See below comment.
> > > 
> > > /*
> > >  * Region tracking -- allows tracking of reservations and instantiated pages
> > >  *                    across the pages in a mapping.
> > >  *
> > >  * The region data structures are protected by a combination of the mmap_sem
> > >  * and the hugetlb_instantion_mutex.  To access or modify a region the caller
> > >  * must either hold the mmap_sem for write, or the mmap_sem for read and
> > >  * the hugetlb_instantiation mutex:
> > >  *
> > >  *      down_write(&mm->mmap_sem);
> > >  * or
> > >  *      down_read(&mm->mmap_sem);
> > >  *      mutex_lock(&hugetlb_instantiation_mutex);
> > >  */
> > 
> > Ugh.  Who the hell added that.  I guess you'll need to split of
> > another mutex for that purpose, afaict there should be no interaction
> > with the actual, intended purpose of the instantiation mutex.
> 
> This was added in commit 84afd99b. One way to go would be to add a
> spinlock to protect changes to the regions - however reading the
> changelog, and based on David's previous explanation for the
> instantiation mutex, I don't see why it was added. In fact several
> places modify regions without holding the instantiation mutex, ie:
> hugetlb_reserve_pages()
> 
> Am I missing something here?

hugetlb_reserve_pages() is called with down_write(mmap_sem),
so fault flow which require down_read(mmap_sem) cannot interfere
to change the region.

Thanks.

next prev parent reply	other threads:[~2013-07-22  0:59 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-07-12 23:28 [PATCH] mm/hugetlb: per-vma instantiation mutexes Davidlohr Bueso
2013-07-12 23:28 ` Davidlohr Bueso
2013-07-13  0:54 ` Hugh Dickins
2013-07-13  0:54   ` Hugh Dickins
2013-07-15  3:16   ` Davidlohr Bueso
2013-07-15  3:16     ` Davidlohr Bueso
2013-07-15  7:24     ` David Gibson
2013-07-15 23:08       ` Andrew Morton
2013-07-15 23:08         ` Andrew Morton
2013-07-16  0:12         ` Davidlohr Bueso
2013-07-16  0:12           ` Davidlohr Bueso
2013-07-16  8:00           ` David Gibson
2013-07-17 19:50         ` [PATCH] hugepage: allow parallelization of the hugepage fault path Davidlohr Bueso
2013-07-17 19:50           ` Davidlohr Bueso
2013-07-18  8:42           ` Joonsoo Kim
2013-07-18  8:42             ` Joonsoo Kim
2013-07-19  7:14             ` David Gibson
2013-07-19 21:24               ` Davidlohr Bueso
2013-07-19 21:24                 ` Davidlohr Bueso
2013-07-22  0:59                 ` Joonsoo Kim [this message]
2013-07-22  0:59                   ` Joonsoo Kim
2013-07-18  9:07           ` Joonsoo Kim
2013-07-18  9:07             ` Joonsoo Kim
2013-07-19  0:19             ` Davidlohr Bueso
2013-07-19  0:19               ` Davidlohr Bueso
2013-07-19  0:35               ` Davidlohr Bueso
2013-07-19  0:35                 ` Davidlohr Bueso
2013-07-23  7:04             ` Hush Bensen
2013-07-23  7:04               ` Hush Bensen
2013-07-23  6:55           ` Hush Bensen
2013-07-23  6:55             ` Hush Bensen
2013-07-16  1:51       ` [PATCH] mm/hugetlb: per-vma instantiation mutexes Rik van Riel
2013-07-16  1:51         ` Rik van Riel
2013-07-16  5:34         ` Joonsoo Kim
2013-07-16  5:34           ` Joonsoo Kim
2013-07-16 10:01           ` David Gibson
2013-07-18  6:50             ` Joonsoo Kim
2013-07-18  6:50               ` Joonsoo Kim
2013-07-16  8:20         ` David Gibson
2013-07-15  4:18 ` Konstantin Khlebnikov
2013-07-15  4:18   ` Konstantin Khlebnikov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130722005940.GA7506@lge.com \
    --to=iamjoonsoo.kim@lge.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=anton@samba.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=davidlohr.bueso@hp.com \
    --cc=dhillf@gmail.com \
    --cc=emunson@mgebm.net \
    --cc=hughd@google.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=khlebnikov@openvz.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@suse.cz \
    --cc=riel@redhat.com \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.