stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Mike Kravetz <mike.kravetz@oracle.com>,
	Paul Cassella <cassella@cray.com>,
	Michal Hocko <mhocko@kernel.org>,
	Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.4 07/48] mm/hugetlb.c: fix reservation race when freeing surplus pages
Date: Wed, 18 Jan 2017 11:46:16 +0100	[thread overview]
Message-ID: <20170118104625.864063693@linuxfoundation.org> (raw)
In-Reply-To: <20170118104625.550018627@linuxfoundation.org>

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Mike Kravetz <mike.kravetz@oracle.com>

commit e5bbc8a6c992901058bc09e2ce01d16c111ff047 upstream.

return_unused_surplus_pages() decrements the global reservation count,
and frees any unused surplus pages that were backing the reservation.

Commit 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in
return_unused_surplus_pages()") added a call to cond_resched_lock in the
loop freeing the pages.

As a result, the hugetlb_lock could be dropped, and someone else could
use the pages that will be freed in subsequent iterations of the loop.
This could result in inconsistent global hugetlb page state, application
api failures (such as mmap) failures or application crashes.

When dropping the lock in return_unused_surplus_pages, make sure that
the global reservation count (resv_huge_pages) remains sufficiently
large to prevent someone else from claiming pages about to be freed.

Analyzed by Paul Cassella.

Fixes: 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()")
Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Paul Cassella <cassella@cray.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/hugetlb.c |   37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1723,23 +1723,32 @@ free:
 }
 
 /*
- * When releasing a hugetlb pool reservation, any surplus pages that were
- * allocated to satisfy the reservation must be explicitly freed if they were
- * never used.
- * Called with hugetlb_lock held.
+ * This routine has two main purposes:
+ * 1) Decrement the reservation count (resv_huge_pages) by the value passed
+ *    in unused_resv_pages.  This corresponds to the prior adjustments made
+ *    to the associated reservation map.
+ * 2) Free any unused surplus pages that may have been allocated to satisfy
+ *    the reservation.  As many as unused_resv_pages may be freed.
+ *
+ * Called with hugetlb_lock held.  However, the lock could be dropped (and
+ * reacquired) during calls to cond_resched_lock.  Whenever dropping the lock,
+ * we must make sure nobody else can claim pages we are in the process of
+ * freeing.  Do this by ensuring resv_huge_page always is greater than the
+ * number of huge pages we plan to free when dropping the lock.
  */
 static void return_unused_surplus_pages(struct hstate *h,
 					unsigned long unused_resv_pages)
 {
 	unsigned long nr_pages;
 
-	/* Uncommit the reservation */
-	h->resv_huge_pages -= unused_resv_pages;
-
 	/* Cannot return gigantic pages currently */
 	if (hstate_is_gigantic(h))
-		return;
+		goto out;
 
+	/*
+	 * Part (or even all) of the reservation could have been backed
+	 * by pre-allocated pages. Only free surplus pages.
+	 */
 	nr_pages = min(unused_resv_pages, h->surplus_huge_pages);
 
 	/*
@@ -1749,12 +1758,22 @@ static void return_unused_surplus_pages(
 	 * when the nodes with surplus pages have no free pages.
 	 * free_pool_huge_page() will balance the the freed pages across the
 	 * on-line nodes with memory and will handle the hstate accounting.
+	 *
+	 * Note that we decrement resv_huge_pages as we free the pages.  If
+	 * we drop the lock, resv_huge_pages will still be sufficiently large
+	 * to cover subsequent pages we may free.
 	 */
 	while (nr_pages--) {
+		h->resv_huge_pages--;
+		unused_resv_pages--;
 		if (!free_pool_huge_page(h, &node_states[N_MEMORY], 1))
-			break;
+			goto out;
 		cond_resched_lock(&hugetlb_lock);
 	}
+
+out:
+	/* Fully uncommit the reservation */
+	h->resv_huge_pages -= unused_resv_pages;
 }
 
 



  parent reply	other threads:[~2017-01-18 10:51 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20170118104957epcas3p3c8bb456f6ed6bf7171f9b645196aafc7@epcas3p3.samsung.com>
2017-01-18 10:46 ` [PATCH 4.4 00/48] 4.4.44-stable review Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 01/48] Input: xpad - use correct product id for x360w controllers Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 02/48] Input: i8042 - add Pegatron touchpad to noloop table Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 03/48] selftests: do not require bash to run netsocktests testcase Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 04/48] selftests: do not require bash for the generated test Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 05/48] mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} Greg Kroah-Hartman
2017-02-09 15:26     ` Ben Hutchings
2017-02-10  5:00       ` Dan Williams
2017-01-18 10:46   ` [PATCH 4.4 06/48] ocfs2: fix crash caused by stale lvb with fsdlm plugin Greg Kroah-Hartman
2017-01-18 10:46   ` Greg Kroah-Hartman [this message]
2017-01-18 10:46   ` [PATCH 4.4 08/48] KVM: x86: fix emulation of "MOV SS, null selector" Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 10/48] jump_labels: API for flushing deferred jump label updates Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 11/48] KVM: x86: flush pending lapic jump label updates on module unload Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 15/48] KVM: x86: Introduce segmented_write_std Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 16/48] nl80211: fix sched scan netlink socket owner destruction Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 17/48] USB: serial: kl5kusb105: fix line-state error handling Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 18/48] USB: serial: ch341: fix initial modem-control state Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 19/48] USB: serial: ch341: fix open error handling Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 20/48] USB: serial: ch341: fix control-message " Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 21/48] USB: serial: ch341: fix open and resume after B0 Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 22/48] Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 23/48] i2c: print correct device invalid address Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 24/48] i2c: fix kernel memory disclosure in dev interface Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 25/48] xhci: fix deadlock at host remove by running watchdog correctly Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 27/48] mnt: Protect the mountpoint hashtable with mount_lock Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 28/48] tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 29/48] sysrq: attach sysrq handler correctly for 32-bit kernel Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 30/48] sysctl: Drop reference added by grab_header in proc_sys_readdir Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 31/48] drm/radeon: drop verde dpm quirks Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 32/48] USB: serial: ch341: fix resume after reset Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 33/48] USB: serial: ch341: fix modem-control and B0 handling Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 34/48] x86/cpu: Fix bootup crashes by sanitizing the argument of the clearcpuid= command-line option Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 35/48] btrfs: fix locking when we put back a delayed ref thats too new Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 36/48] btrfs: fix error handling when run_delayed_extent_op fails Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 37/48] pinctrl: meson: fix gpio request disabling other modes Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 38/48] pNFS: Fix race in pnfs_wait_on_layoutreturn Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 39/48] NFS: Fix a performance regression in readdir Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 40/48] NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 41/48] cpufreq: powernv: Disable preemption while checking CPU throttling state Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 42/48] block: cfq_cpd_alloc() should use @gfp Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 43/48] ACPI / APEI: Fix NMI notification handling Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 44/48] blk-mq: Always schedule hctx->next_cpu Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 45/48] bus: vexpress-config: fix device reference leak Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 46/48] powerpc/ibmebus: Fix further device reference leaks Greg Kroah-Hartman
2017-01-18 10:46   ` [PATCH 4.4 47/48] powerpc/ibmebus: Fix device reference leaks in sysfs interface Greg Kroah-Hartman
2017-01-18 18:45   ` [PATCH 4.4 00/48] 4.4.44-stable review Guenter Roeck
2017-01-19 18:02   ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170118104625.864063693@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=cassella@cray.com \
    --cc=hillf.zj@alibaba-inc.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=m.mizuma@jp.fujitsu.com \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).