Re: [PATCH v2 19/20] mm, hugetlb: retry if failed to allocate and there is concurrent user


From: David Gibson <david@gibson.dropbear.id.au>
To: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Michal Hocko <mhocko@suse.cz>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Hugh Dickins <hughd@google.com>,
	Davidlohr Bueso <davidlohr.bueso@hp.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Joonsoo Kim <js1304@gmail.com>,
	Wanpeng Li <liwanp@linux.vnet.ibm.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Hillf Danton <dhillf@gmail.com>
Subject: Re: [PATCH v2 19/20] mm, hugetlb: retry if failed to allocate and there is concurrent user
Date: Thu, 5 Sep 2013 11:15:53 +1000
Message-ID: <20130905011553.GA10158@voom.redhat.com>
In-Reply-To: <1376040398-11212-20-git-send-email-iamjoonsoo.kim@lge.com>


On Fri, Aug 09, 2013 at 06:26:37PM +0900, Joonsoo Kim wrote:
> If parallel faults occur, we can fail to allocate a hugepage,
> because many threads dequeue a hugepage to handle a fault on the same address.
> This causes a reserved-pool shortage for a short while, and this can cause
> a faulting thread that should be able to get a hugepage to receive a SIGBUS.
> 
> To solve this problem, we already have a nice solution, that is,
> the hugetlb_instantiation_mutex. This blocks other threads from entering
> the fault handler. This solves the problem cleanly, but it introduces
> performance degradation, because it serializes all fault handling.
> 
> Now, I try to remove the hugetlb_instantiation_mutex to get rid of this
> performance degradation. To achieve that, we should first ensure that
> no one gets a SIGBUS if there are enough hugepages.
> 
> For this purpose, if we fail to allocate a new hugepage when there is a
> concurrent user, we just return 0 instead of VM_FAULT_SIGBUS. With this,
> these threads defer getting a SIGBUS signal until there is no
> concurrent user, and so we can ensure that no one gets a SIGBUS if there
> are enough hugepages.
> 
> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
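
To check my understanding of the scheme in the changelog, here is a
single-threaded toy model of it. Everything prefixed toy_ is hypothetical
and only mirrors the patch's nr_dequeue_users idea; locking and the real
fault return codes are left out. It replays the race described above:
thread A takes the last free page, thread B fails, sees a concurrent user
and retries instead of taking SIGBUS.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy hstate: field names mirror the patch, nothing else is real. */
struct toy_hstate {
	unsigned long free_huge_pages;
	unsigned long nr_dequeue_users;
};

/* Hypothetical outcomes; the real code returns 0 or VM_FAULT_SIGBUS. */
enum toy_fault { TOY_FAULT_OK, TOY_FAULT_RETRY, TOY_FAULT_SIGBUS };

/* Take a page from the pool; on success the caller becomes a "dequeue user". */
static bool toy_dequeue(struct toy_hstate *h)
{
	if (!h->free_huge_pages)
		return false;
	h->free_huge_pages--;
	h->nr_dequeue_users++;
	return true;
}

/* Called once the dequeued page is mapped, in the page cache, or freed. */
static void toy_commit(struct toy_hstate *h)
{
	assert(h->nr_dequeue_users > 0);
	h->nr_dequeue_users--;
}

/* Allocation failed: retry while someone else still holds a dequeued page. */
static enum toy_fault toy_on_failure(const struct toy_hstate *h)
{
	return h->nr_dequeue_users ? TOY_FAULT_RETRY : TOY_FAULT_SIGBUS;
}
```

With one free page, A's toy_dequeue() succeeds, B's fails and
toy_on_failure() says retry; only after A commits does a further failure
turn into SIGBUS.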
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index e29e28f..981c539 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -242,6 +242,7 @@ struct hstate {
>  	int next_nid_to_free;
>  	unsigned int order;
>  	unsigned long mask;
> +	unsigned long nr_dequeue_users;
>  	unsigned long max_huge_pages;
>  	unsigned long nr_huge_pages;
>  	unsigned long free_huge_pages;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 8743e5c..0501fe5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -561,6 +561,7 @@ retry_cpuset:
>  		if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask)) {
>  			page = dequeue_huge_page_node(h, zone_to_nid(zone));
>  			if (page) {
> +				h->nr_dequeue_users++;

So, nr_dequeue_users doesn't seem to be incremented in the
alloc_huge_page_node() path.  I'm not sure exactly where that's used,
so I'm not sure if it's a problem.
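
If it does turn out to matter, one way to make sure no dequeue site can
skip the increment is to funnel them all through a single accounted
helper. A rough sketch of that shape, with hypothetical toy_* names
standing in for dequeue_huge_page_node() and friends:

```c
#include <assert.h>
#include <stddef.h>

struct toy_page { int id; };

struct toy_hstate {
	struct toy_page *free_list[4];
	unsigned long nr_free;
	unsigned long nr_dequeue_users;
};

/* Raw freelist pop; deliberately never touches the counter. */
static struct toy_page *toy_dequeue_raw(struct toy_hstate *h)
{
	assert(h->nr_free <= sizeof h->free_list / sizeof h->free_list[0]);
	if (!h->nr_free)
		return NULL;
	return h->free_list[--h->nr_free];
}

/*
 * Single accounted entry point: the fault path and any node-specific
 * path would both call this, so neither can forget the increment.
 */
static struct toy_page *toy_dequeue_accounted(struct toy_hstate *h)
{
	struct toy_page *page = toy_dequeue_raw(h);

	if (page)
		h->nr_dequeue_users++;
	return page;
}
```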

>  				if (!use_reserve)
>  					break;
>  
> @@ -577,6 +578,16 @@ retry_cpuset:
>  	return page;
>  }
>  
> +static void commit_dequeued_huge_page(struct hstate *h, bool do_dequeue)
> +{
> +	if (!do_dequeue)
> +		return;

Seems like it would be easier to do this test in the callers, but I
doubt it matters much.
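
Roughly what I mean, as a toy sketch (hypothetical names; hugetlb_lock is
elided because the sketch is single-threaded). Keeping the helper
unconditional also lets it assert its own precondition:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for h->nr_dequeue_users; no lock in this sketch. */
static unsigned long toy_nr_dequeue_users;

/* Unconditional helper: calling it means a dequeued page is held. */
static void toy_commit_dequeued_huge_page(void)
{
	assert(toy_nr_dequeue_users > 0);
	toy_nr_dequeue_users--;
}

/* The do_dequeue test lives in the caller instead of the helper. */
static void toy_finish_fault(bool do_dequeue)
{
	if (do_dequeue)
		toy_commit_dequeued_huge_page();
}
```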

> +	spin_lock(&hugetlb_lock);
> +	h->nr_dequeue_users--;
> +	spin_unlock(&hugetlb_lock);
> +}
> +
>  static void update_and_free_page(struct hstate *h, struct page *page)
>  {
>  	int i;
> @@ -1110,7 +1121,9 @@ static void vma_commit_reservation(struct hstate *h,
>  }
>  
>  static struct page *alloc_huge_page(struct vm_area_struct *vma,
> -				    unsigned long addr, int use_reserve)
> +				    unsigned long addr, int use_reserve,
> +				    unsigned long *nr_dequeue_users,
> +				    bool *do_dequeue)
>  {
>  	struct hugepage_subpool *spool = subpool_vma(vma);
>  	struct hstate *h = hstate_vma(vma);
> @@ -1138,8 +1151,11 @@ static struct page *alloc_huge_page(struct vm_area_struct *vma,
>  		return ERR_PTR(-ENOSPC);
>  	}
>  	spin_lock(&hugetlb_lock);
> +	*do_dequeue = true;
>  	page = dequeue_huge_page_vma(h, vma, addr, use_reserve);
>  	if (!page) {
> +		*nr_dequeue_users = h->nr_dequeue_users;

So, the nr_dequeue_users parameter is only set if !page here.
It's not obvious to me that the callers only use it in that case.
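
One possible alternative, only a sketch with hypothetical toy_* names:
return the page and both pieces of state together, so every field has a
defined value on every return path and callers can't pick up a stale
out-parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page { int id; };

struct toy_hstate {
	struct toy_page *free_page;     /* at most one free page, for brevity */
	unsigned long nr_dequeue_users;
};

/* Hypothetical replacement for the two out-parameters. */
struct toy_alloc_result {
	struct toy_page *page;          /* NULL on failure */
	unsigned long nr_dequeue_users; /* snapshot, meaningful when page == NULL */
	bool dequeued;                  /* page came off the free list */
};

static struct toy_alloc_result toy_alloc(struct toy_hstate *h)
{
	/* Every field starts defined, whichever path returns. */
	struct toy_alloc_result res = { .page = NULL, .nr_dequeue_users = 0,
					.dequeued = false };

	if (h->free_page) {
		res.page = h->free_page;
		res.dequeued = true;
		h->free_page = NULL;
		h->nr_dequeue_users++;
		return res;
	}
	/* In the real code this snapshot would be taken under hugetlb_lock. */
	res.nr_dequeue_users = h->nr_dequeue_users;
	return res;
}
```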

> +		*do_dequeue = false;
>  		spin_unlock(&hugetlb_lock);
>  		page = alloc_buddy_huge_page(h, NUMA_NO_NODE);
>  		if (!page) {

I think the counter also needs to be incremented in the case where we
call alloc_buddy_huge_page() from alloc_huge_page().  Even though it's
new, it gets added to the hugepage pool at this point and could still
be a contended page for the last allocation, unless I'm missing
something.
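
To illustrate the case I'm worried about, a toy sketch (all toy_* names
and the surplus_available field are hypothetical): the pool is empty, one
more page can come from the buddy allocator, and two threads fault at
once. If the buddy page is counted like a dequeued one, the loser sees a
concurrent user and can retry rather than take SIGBUS.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_hstate {
	unsigned long free_huge_pages;
	unsigned long surplus_available; /* pages the buddy allocator could still supply */
	unsigned long nr_dequeue_users;
};

/* Returns true if a page was obtained; *counted tells the caller to commit later. */
static bool toy_alloc_huge_page(struct toy_hstate *h, bool *counted)
{
	*counted = false;
	if (h->free_huge_pages) {
		h->free_huge_pages--;           /* dequeued from the pool */
	} else if (h->surplus_available) {
		h->surplus_available--;         /* fresh buddy page: count it too */
	} else {
		return false;
	}
	h->nr_dequeue_users++;
	*counted = true;
	return true;
}
```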

> @@ -1894,6 +1910,7 @@ void __init hugetlb_add_hstate(unsigned order)
>  	h->mask = ~((1ULL << (order + PAGE_SHIFT)) - 1);
>  	h->nr_huge_pages = 0;
>  	h->free_huge_pages = 0;
> +	h->nr_dequeue_users = 0;
>  	for (i = 0; i < MAX_NUMNODES; ++i)
>  		INIT_LIST_HEAD(&h->hugepage_freelists[i]);
>  	INIT_LIST_HEAD(&h->hugepage_activelist);
> @@ -2500,6 +2517,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
>  	int outside_reserve = 0;
>  	long chg;
>  	bool use_reserve = false;
> +	unsigned long nr_dequeue_users = 0;
> +	bool do_dequeue = false;
>  	int ret = 0;
>  	unsigned long mmun_start;	/* For mmu_notifiers */
>  	unsigned long mmun_end;		/* For mmu_notifiers */
> @@ -2551,11 +2570,17 @@ retry_avoidcopy:
>  		use_reserve = !chg;
>  	}
>  
> -	new_page = alloc_huge_page(vma, address, use_reserve);
> +	new_page = alloc_huge_page(vma, address, use_reserve,
> +						&nr_dequeue_users, &do_dequeue);
>  
>  	if (IS_ERR(new_page)) {
>  		page_cache_release(old_page);
>  
> +		if (nr_dequeue_users) {
> +			ret = 0;
> +			goto out_lock;
> +		}
> +
>  		/*
>  		 * If a process owning a MAP_PRIVATE mapping fails to COW,
>  		 * it is due to references held by a child and an insufficient
> @@ -2580,6 +2605,9 @@ retry_avoidcopy:
>  			WARN_ON_ONCE(1);
>  		}
>  
> +		if (use_reserve)
> +			WARN_ON_ONCE(1);
> +
>  		ret = VM_FAULT_SIGBUS;
>  		goto out_lock;
>  	}
> @@ -2614,6 +2642,7 @@ retry_avoidcopy:
>  	page_cache_release(new_page);
>  out_old_page:
>  	page_cache_release(old_page);
> +	commit_dequeued_huge_page(h, do_dequeue);
>  out_lock:
>  	/* Caller expects lock to be held */
>  	spin_lock(&mm->page_table_lock);
> @@ -2666,6 +2695,8 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  	pte_t new_pte;
>  	long chg;
>  	bool use_reserve;
> +	unsigned long nr_dequeue_users = 0;
> +	bool do_dequeue = false;
>  
>  	/*
>  	 * Currently, we are forced to kill the process in the event the
> @@ -2699,9 +2730,17 @@ retry:
>  		}
>  		use_reserve = !chg;
>  
> -		page = alloc_huge_page(vma, address, use_reserve);
> +		page = alloc_huge_page(vma, address, use_reserve,
> +					&nr_dequeue_users, &do_dequeue);
>  		if (IS_ERR(page)) {
> -			ret = VM_FAULT_SIGBUS;
> +			if (nr_dequeue_users)
> +				ret = 0;
> +			else {
> +				if (use_reserve)
> +					WARN_ON_ONCE(1);
> +
> +				ret = VM_FAULT_SIGBUS;
> +			}
>  			goto out;
>  		}
>  		clear_huge_page(page, address, pages_per_huge_page(h));
> @@ -2714,22 +2753,24 @@ retry:
>  			err = add_to_page_cache(page, mapping, idx, GFP_KERNEL);
>  			if (err) {
>  				put_page(page);
> +				commit_dequeued_huge_page(h, do_dequeue);
>  				if (err == -EEXIST)
>  					goto retry;
>  				goto out;
>  			}
>  			ClearPagePrivate(page);
> +			commit_dequeued_huge_page(h, do_dequeue);
>  
>  			spin_lock(&inode->i_lock);
>  			inode->i_blocks += blocks_per_huge_page(h);
>  			spin_unlock(&inode->i_lock);
>  		} else {
>  			lock_page(page);
> +			anon_rmap = 1;
>  			if (unlikely(anon_vma_prepare(vma))) {
>  				ret = VM_FAULT_OOM;
>  				goto backout_unlocked;
>  			}
> -			anon_rmap = 1;
>  		}
>  	} else {
>  		/*
> @@ -2783,6 +2824,8 @@ retry:
>  	spin_unlock(&mm->page_table_lock);
>  	unlock_page(page);
>  out:
> +	if (anon_rmap)
> +		commit_dequeued_huge_page(h, do_dequeue);
>  	return ret;
>  
>  backout:

Otherwise I think it looks good.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


