From: Konstantin Khlebnikov <khlebnikov@openvz.org>
To: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Rik van Riel <riel@redhat.com>,
Michel Lespinasse <walken@google.com>,
Mel Gorman <mgorman@suse.de>, Michal Hocko <mhocko@suse.cz>,
"AneeshKumarK.V" <aneesh.kumar@linux.vnet.ibm.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Hillf Danton <dhillf@gmail.com>, Hugh Dickins <hughd@google.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm/hugetlb: per-vma instantiation mutexes
Date: Mon, 15 Jul 2013 08:18:33 +0400
Message-ID: <51E37819.8020207@openvz.org>
In-Reply-To: <1373671681.2448.10.camel@buesod1.americas.hpqcorp.net>

This seems incorrect. The hugetlb_instantiation_mutex protects the chains of
struct file_region hanging off inode->i_mapping->private_list (VM_MAYSHARE) or
vma_resv_map(vma)->regions (!VM_MAYSHARE). These chains can obviously be shared
between several vmas, so a per-vma lock cannot protect them.
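To see the sharing concretely: two MAP_SHARED mmap() calls on the same hugetlbfs
file create two distinct vmas, yet faults through either one walk the same
file_region chain on the single shared inode. A minimal userspace sketch of that
situation (a hypothetical demo, not from the patch; assumes a hugetlbfs mount at
/dev/hugepages and 2 MiB huge pages):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL * 1024 * 1024)	/* assumption: 2 MiB huge pages */

int main(void)
{
	int fd = open("/dev/hugepages/demo", O_CREAT | O_RDWR, 0600);
	if (fd < 0) { perror("open"); return 1; }
	if (ftruncate(fd, HPAGE_SIZE) < 0) { perror("ftruncate"); return 1; }

	/* Two separate mmap() calls -> two vm_area_structs. */
	char *a = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	char *b = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (a == MAP_FAILED || b == MAP_FAILED) { perror("mmap"); return 1; }

	/* Both faults hit the same file offset, so both update the
	 * reservation regions hanging off the one shared inode. */
	a[0] = 1;	/* fault via vma A */
	b[0] = 2;	/* fault via vma B, same underlying huge page */
	printf("a[0]=%d b[0]=%d\n", a[0], b[0]);	/* prints 2 2 */

	munmap(a, HPAGE_SIZE);
	munmap(b, HPAGE_SIZE);
	close(fd);
	unlink("/dev/hugepages/demo");
	return 0;
}

With a per-vma mutex, concurrent faults through vma A and vma B would each take
only their own lock while modifying that one shared chain.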
Davidlohr Bueso wrote:
> The hugetlb_instantiation_mutex serializes hugepage allocation and instantiation
> in the page directory entry. It was found that this mutex can become quite contended
> during the early phases of large database workloads that make use of huge pages - for
> instance, startup and initial runs. One clear example is a 1.5 GB Oracle database, where
> lockstat reports that this mutex can be one of the top 5 most contended locks in the
> kernel during the first few minutes:
>
> hugetlb_instantiation_mutex: 10678 10678
> ---------------------------
> hugetlb_instantiation_mutex 10678 [<ffffffff8115e14e>] hugetlb_fault+0x9e/0x340
> ---------------------------
> hugetlb_instantiation_mutex 10678 [<ffffffff8115e14e>] hugetlb_fault+0x9e/0x340
>
> contentions: 10678
> acquisitions: 99476
> waittime-total: 76888911.01 us
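(For reference, figures like the above come from the kernel's lock statistics;
a minimal sketch for pulling the relevant lines out of /proc/lock_stat on a
CONFIG_LOCK_STAT=y kernel - a hypothetical helper, not part of the patch:

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/lock_stat", "r");
	char line[512];

	if (!f) { perror("fopen /proc/lock_stat"); return 1; }
	while (fgets(line, sizeof(line), f))
		if (strstr(line, "hugetlb_instantiation_mutex"))
			fputs(line, stdout);
	fclose(f);
	return 0;
}

The counters can be reset beforehand by writing 0 to /proc/lock_stat.)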
>
> Instead of serializing every hugetlb fault, we can allow concurrent faults on pages
> in different vmas. The per-vma mutex is initialized when the vma is created. Back to
> the example above, we now get much less contention:
>
> &vma->hugetlb_instantiation_mutex: 1 1
> ---------------------------------
> &vma->hugetlb_instantiation_mutex 1 [<ffffffff8115e216>] hugetlb_fault+0xa6/0x350
> ---------------------------------
> &vma->hugetlb_instantiation_mutex 1 [<ffffffff8115e216>] hugetlb_fault+0xa6/0x350
>
> contentions: 1
> acquisitions: 108092
> waittime-total: 621.24 us
>
> Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
> ---
> include/linux/mm_types.h | 3 +++
> mm/hugetlb.c | 12 +++++-------
> mm/mmap.c | 3 +++
> 3 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index fb425aa..b45fd87 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -289,6 +289,9 @@ struct vm_area_struct {
> #ifdef CONFIG_NUMA
> struct mempolicy *vm_policy; /* NUMA policy for the VMA */
> #endif
> +#ifdef CONFIG_HUGETLB_PAGE
> + struct mutex hugetlb_instantiation_mutex;
> +#endif
> };
>
> struct core_thread {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 83aff0a..12e665b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -137,12 +137,12 @@ static inline struct hugepage_subpool *subpool_vma(struct vm_area_struct *vma)
> * The region data structures are protected by a combination of the mmap_sem
> * and the hugetlb_instantiation_mutex. To access or modify a region the caller
> * must either hold the mmap_sem for write, or the mmap_sem for read and
> - * the hugetlb_instantiation mutex:
> + * the vma's hugetlb_instantiation mutex:
> *
> * down_write(&mm->mmap_sem);
> * or
> * down_read(&mm->mmap_sem);
> - * mutex_lock(&hugetlb_instantiation_mutex);
> + * mutex_lock(&vma->hugetlb_instantiation_mutex);
> */
> struct file_region {
> struct list_head link;
> @@ -2547,7 +2547,7 @@ static int unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
>
> /*
> * Hugetlb_cow() should be called with page lock of the original hugepage held.
> - * Called with hugetlb_instantiation_mutex held and pte_page locked so we
> + * Called with the vma's hugetlb_instantiation_mutex held and pte_page locked so we
> * cannot race with other handlers or page migration.
> * Keep the pte_same checks anyway to make transition from the mutex easier.
> */
> @@ -2847,7 +2847,6 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> int ret;
> struct page *page = NULL;
> struct page *pagecache_page = NULL;
> - static DEFINE_MUTEX(hugetlb_instantiation_mutex);
> struct hstate *h = hstate_vma(vma);
>
> address &= huge_page_mask(h);
> @@ -2872,7 +2871,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> * get spurious allocation failures if two CPUs race to instantiate
> * the same page in the page cache.
> */
> - mutex_lock(&hugetlb_instantiation_mutex);
> + mutex_lock(&vma->hugetlb_instantiation_mutex);
> entry = huge_ptep_get(ptep);
> if (huge_pte_none(entry)) {
> ret = hugetlb_no_page(mm, vma, address, ptep, flags);
> @@ -2943,8 +2942,7 @@ out_page_table_lock:
> put_page(page);
>
> out_mutex:
> - mutex_unlock(&hugetlb_instantiation_mutex);
> -
> + mutex_unlock(&vma->hugetlb_instantiation_mutex);
> return ret;
> }
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index fbad7b0..8f0b034 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1543,6 +1543,9 @@ munmap_back:
> vma->vm_page_prot = vm_get_page_prot(vm_flags);
> vma->vm_pgoff = pgoff;
> INIT_LIST_HEAD(&vma->anon_vma_chain);
> +#ifdef CONFIG_HUGETLB_PAGE
> + mutex_init(&vma->hugetlb_instantiation_mutex);
> +#endif
>
> error = -EINVAL; /* when rejecting VM_GROWSDOWN|VM_GROWSUP */
>