From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Mike Kravetz <mike.kravetz@oracle.com>,
Jan Stancek <jstancek@redhat.com>,
"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
Hillf Danton <hillf.zj@alibaba-inc.com>,
Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
Michal Hocko <mhocko@suse.com>,
"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.8 09/92] mm/hugetlb: fix huge page reservation leak in private mapping error paths
Date: Thu, 17 Nov 2016 11:31:42 +0100
Message-ID: <20161117103224.598220864@linuxfoundation.org>
In-Reply-To: <20161117103224.218007793@linuxfoundation.org>
4.8-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mike Kravetz <mike.kravetz@oracle.com>
commit 96b96a96ddee4ba08ce4aeb8a558a3271fd4a7a7 upstream.
Error paths in hugetlb_cow() and hugetlb_no_page() may free a newly
allocated huge page.
If a reservation was associated with the huge page, alloc_huge_page()
consumed the reservation while allocating. When the newly allocated
page is freed in free_huge_page(), it will increment the global
reservation count. However, the reservation entry in the reserve map
will remain.
This is not an issue for shared mappings as the entry in the reserve map
indicates a reservation exists. But, an entry in a private mapping
reserve map indicates the reservation was consumed and no longer exists.
This results in an inconsistency between the reserve map and the global
reservation count. This 'leaks' a reserved huge page.
Create a new routine restore_reserve_on_error() to restore the reserve
entry in these specific error paths. This routine makes use of a new
function vma_add_reservation() which will add a reserve entry for a
specific address/page.
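
As a rough illustration of the accounting described above, the following
user-space sketch models one private mapping. It is not kernel code: the
toy_* names, the boolean array standing in for the reserve map, and the
fixed NPAGES size are all assumptions made for the example. Only the
bookkeeping follows the description: allocation consumes a reservation
and marks the page, freeing a marked page increments the global count,
and the restore step removes the "consumed" entry from the private map.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4

/* Toy state for one private mapping (hypothetical, not kernel types). */
struct toy_state {
	long resv_count;        /* stands in for the global reservation count */
	bool consumed[NPAGES];  /* private reserve map: entry set == consumed */
};

struct toy_page {
	bool has_private;       /* stands in for PagePrivate on the new page */
};

/* Allocation consumes the reservation for idx, if one still exists. */
static struct toy_page toy_alloc(struct toy_state *s, int idx)
{
	struct toy_page p = { .has_private = false };

	assert(idx >= 0 && idx < NPAGES);
	if (!s->consumed[idx]) {
		s->resv_count--;
		s->consumed[idx] = true;
		p.has_private = true;
	}
	return p;
}

/* Freeing a marked page gives the reservation back to the global count. */
static void toy_free(struct toy_state *s, const struct toy_page *p)
{
	if (p->has_private)
		s->resv_count++;
}

/* Error-path fixup: drop the "consumed" entry so the map matches the count. */
static void toy_restore_on_error(struct toy_state *s, int idx,
				 const struct toy_page *p)
{
	if (p->has_private)
		s->consumed[idx] = false;
}

/* The map and the counter agree when every unconsumed slot is counted once. */
static bool toy_consistent(const struct toy_state *s)
{
	long unconsumed = 0;

	for (int i = 0; i < NPAGES; i++)
		if (!s->consumed[i])
			unconsumed++;
	return unconsumed == s->resv_count;
}

/* Allocate page 0, hit an error, free it; optionally restore the map first. */
static bool toy_error_path(bool restore)
{
	struct toy_state s = { .resv_count = NPAGES };
	struct toy_page p = toy_alloc(&s, 0);

	if (restore)
		toy_restore_on_error(&s, 0, &p);
	toy_free(&s, &p);
	return toy_consistent(&s);
}
```

Calling toy_error_path(false) models the error path before this change:
the count is incremented again while the map still records the
reservation as consumed, so the two disagree. toy_error_path(true)
models the same path with the restore step applied before the free.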
In general, these error paths were rarely (if ever) taken on most
architectures. However, powerpc contained arch-specific code that
resulted in an extra fault and execution of these error paths on all
private mappings.
Fixes: 67961f9db8c4 ("mm/hugetlb: fix huge page reserve accounting for private mappings")
Link: http://lkml.kernel.org/r/1476933077-23091-2-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill A . Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/hugetlb.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1826,11 +1826,17 @@ static void return_unused_surplus_pages(
* is not the case is if a reserve map was changed between calls. It
* is the responsibility of the caller to notice the difference and
* take appropriate action.
+ *
+ * vma_add_reservation is used in error paths where a reservation must
+ * be restored when a newly allocated huge page must be freed. It is
+ * to be called after calling vma_needs_reservation to determine if a
+ * reservation exists.
*/
enum vma_resv_mode {
VMA_NEEDS_RESV,
VMA_COMMIT_RESV,
VMA_END_RESV,
+ VMA_ADD_RESV,
};
static long __vma_reservation_common(struct hstate *h,
struct vm_area_struct *vma, unsigned long addr,
@@ -1856,6 +1862,14 @@ static long __vma_reservation_common(str
region_abort(resv, idx, idx + 1);
ret = 0;
break;
+ case VMA_ADD_RESV:
+ if (vma->vm_flags & VM_MAYSHARE)
+ ret = region_add(resv, idx, idx + 1);
+ else {
+ region_abort(resv, idx, idx + 1);
+ ret = region_del(resv, idx, idx + 1);
+ }
+ break;
default:
BUG();
}
@@ -1903,6 +1917,56 @@ static void vma_end_reservation(struct h
(void)__vma_reservation_common(h, vma, addr, VMA_END_RESV);
}
+static long vma_add_reservation(struct hstate *h,
+ struct vm_area_struct *vma, unsigned long addr)
+{
+ return __vma_reservation_common(h, vma, addr, VMA_ADD_RESV);
+}
+
+/*
+ * This routine is called to restore a reservation on error paths. In the
+ * specific error paths, a huge page was allocated (via alloc_huge_page)
+ * and is about to be freed. If a reservation for the page existed,
+ * alloc_huge_page would have consumed the reservation and set PagePrivate
+ * in the newly allocated page. When the page is freed via free_huge_page,
+ * the global reservation count will be incremented if PagePrivate is set.
+ * However, free_huge_page can not adjust the reserve map. Adjust the
+ * reserve map here to be consistent with global reserve count adjustments
+ * to be made by free_huge_page.
+ */
+static void restore_reserve_on_error(struct hstate *h,
+ struct vm_area_struct *vma, unsigned long address,
+ struct page *page)
+{
+ if (unlikely(PagePrivate(page))) {
+ long rc = vma_needs_reservation(h, vma, address);
+
+ if (unlikely(rc < 0)) {
+ /*
+ * Rare out of memory condition in reserve map
+ * manipulation. Clear PagePrivate so that
+ * global reserve count will not be incremented
+ * by free_huge_page. This will make it appear
+ * as though the reservation for this page was
+ * consumed. This may prevent the task from
+ * faulting in the page at a later time. This
+ * is better than inconsistent global huge page
+ * accounting of reserve counts.
+ */
+ ClearPagePrivate(page);
+ } else if (rc) {
+ rc = vma_add_reservation(h, vma, address);
+ if (unlikely(rc < 0))
+ /*
+ * See above comment about rare out of
+ * memory condition.
+ */
+ ClearPagePrivate(page);
+ } else
+ vma_end_reservation(h, vma, address);
+ }
+}
+
struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)
{
@@ -3498,6 +3562,7 @@ retry_avoidcopy:
spin_unlock(ptl);
mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
out_release_all:
+ restore_reserve_on_error(h, vma, address, new_page);
put_page(new_page);
out_release_old:
put_page(old_page);
@@ -3680,6 +3745,7 @@ backout:
spin_unlock(ptl);
backout_unlocked:
unlock_page(page);
+ restore_reserve_on_error(h, vma, address, page);
put_page(page);
goto out;
}