public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Mike Kravetz <mike.kravetz@oracle.com>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Michal Hocko <mhocko@kernel.org>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>,
	linux-mm@kvack.org
Subject: [PATCH AUTOSEL 4.4 04/56] hugetlbfs: on restore reserve error path retain subpool reservation
Date: Sat,  1 Jun 2019 09:25:08 -0400	[thread overview]
Message-ID: <20190601132600.27427-4-sashal@kernel.org> (raw)
In-Reply-To: <20190601132600.27427-1-sashal@kernel.org>

From: Mike Kravetz <mike.kravetz@oracle.com>

[ Upstream commit 0919e1b69ab459e06df45d3ba6658d281962db80 ]

When a huge page is allocated, PagePrivate() is set if the allocation
consumed a reservation.  When freeing a huge page, PagePrivate is checked.
If set, it indicates the reservation should be restored.  PagePrivate
being set at free huge page time mostly happens on error paths.

When huge page reservations are created, a check is made to determine if
the mapping is associated with an explicitly mounted filesystem.  If so,
pages are also reserved within the filesystem.  The default action when
freeing a huge page is to decrement the usage count in any associated
explicitly mounted filesystem.  However, if the reservation is to be
restored the reservation/use count within the filesystem should not be
decrementd.  Otherwise, a subsequent page allocation and free for the same
mapping location will cause the file filesystem usage to go 'negative'.

Filesystem                         Size  Used Avail Use% Mounted on
nodev                              4.0G -4.0M  4.1G    - /opt/hugepool

To fix, when freeing a huge page do not adjust filesystem usage if
PagePrivate() is set to indicate the reservation should be restored.

I did not cc stable as the problem has been around since reserves were
added to hugetlbfs and nobody has noticed.

Link: http://lkml.kernel.org/r/20190328234704.27083-2-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/hugetlb.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 324b2953e57e9..0357ad53af368 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1221,12 +1221,23 @@ void free_huge_page(struct page *page)
 	ClearPagePrivate(page);
 
 	/*
-	 * A return code of zero implies that the subpool will be under its
-	 * minimum size if the reservation is not restored after page is free.
-	 * Therefore, force restore_reserve operation.
+	 * If PagePrivate() was set on page, page allocation consumed a
+	 * reservation.  If the page was associated with a subpool, there
+	 * would have been a page reserved in the subpool before allocation
+	 * via hugepage_subpool_get_pages().  Since we are 'restoring' the
+	 * reservtion, do not call hugepage_subpool_put_pages() as this will
+	 * remove the reserved page from the subpool.
 	 */
-	if (hugepage_subpool_put_pages(spool, 1) == 0)
-		restore_reserve = true;
+	if (!restore_reserve) {
+		/*
+		 * A return code of zero implies that the subpool will be
+		 * under its minimum size if the reservation is not restored
+		 * after page is free.  Therefore, force restore_reserve
+		 * operation.
+		 */
+		if (hugepage_subpool_put_pages(spool, 1) == 0)
+			restore_reserve = true;
+	}
 
 	spin_lock(&hugetlb_lock);
 	clear_page_huge_active(page);
-- 
2.20.1


  parent reply	other threads:[~2019-06-01 13:30 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-01 13:25 [PATCH AUTOSEL 4.4 01/56] fs/fat/file.c: issue flush after the writeback of FAT Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 02/56] sysctl: return -EINVAL if val violates minmax Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 03/56] ipc: prevent lockup on alloc_msg and free_msg Sasha Levin
2019-06-01 13:25 ` Sasha Levin [this message]
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 05/56] mm/cma.c: fix crash on CMA allocation if bitmap allocation fails Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 06/56] mm/cma_debug.c: fix the break condition in cma_maxchunk_get() Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 07/56] kernel/sys.c: prctl: fix false positive in validate_prctl_map() Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 08/56] mfd: intel-lpss: Set the device in reset state when init Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 09/56] mfd: twl6040: Fix device init errors for ACCCTL register Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 10/56] perf/x86/intel: Allow PEBS multi-entry in watermark mode Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 11/56] drm/bridge: adv7511: Fix low refresh rate selection Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 12/56] NFS4: Fix v4.0 client state corruption when mount Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 13/56] ntp: Allow TAI-UTC offset to be set to zero Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 14/56] f2fs: fix to avoid panic in do_recover_data() Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 15/56] f2fs: fix to do sanity check on valid block count of segment Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 16/56] tracing: Fix partial reading of trace event's id file Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 17/56] uml: fix a boot splat wrt use of cpu_all_mask Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 18/56] mips: Make sure dt memory regions are valid Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 19/56] iommu/vt-d: Set intel_iommu_gfx_mapped correctly Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 20/56] ALSA: hda - Register irq handler after the chip initialization Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 21/56] nvmem: core: fix read buffer in place Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 22/56] stm class: Fix channel free in stm output free path Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 23/56] fuse: honor RLIMIT_FSIZE in fuse_file_fallocate Sasha Levin
2019-06-05 20:24   ` Liu Bo
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 24/56] fuse: require /dev/fuse reads to have enough buffer capacity Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 25/56] fuse: retrieve: cap requested size to negotiated max_write Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 26/56] nfsd: allow fh_want_write to be called twice Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 27/56] PCI: Mark Atheros AR9462 to avoid bus reset Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 28/56] media: ov6650: Fix sensor possibly not detected on probe Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 29/56] x86/PCI: Fix PCI IRQ routing table memory leak Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 30/56] platform/chrome: cros_ec_proto: check for NULL transfer function Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 31/56] soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 32/56] clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288 Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 33/56] fbdev: fix WARNING in __alloc_pages_nodemask bug Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 34/56] iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114 Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 35/56] ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 36/56] ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" " Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 37/56] ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG " Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 38/56] md: add mddev->pers to avoid potential NULL pointer dereference Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 39/56] PCI: rpadlpar: Fix leaked device_node references in add/remove paths Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 40/56] PCI: rcar: Fix a potential NULL pointer dereference Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 41/56] fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 42/56] fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 43/56] video: hgafb: fix potential NULL pointer dereference Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 44/56] video: imsttfb: fix potential NULL pointer dereferences Sasha Levin
2019-06-01 16:19   ` Greg Kroah-Hartman
2019-06-01 23:53     ` Finn Thain
2019-06-02 14:11     ` Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 45/56] fbdev: sm712fb: fix brightness control on reboot, don't set SR30 Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 46/56] fbdev: fix divide error in fb_var_to_videomode Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 47/56] fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 48/56] fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 49/56] PCI: xilinx: Check for __get_free_pages() failure Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 50/56] tty: pty: Fix race condition between release_one_tty and pty_write Sasha Levin
2019-06-01 16:17   ` Greg Kroah-Hartman
2019-06-01 16:18     ` Greg Kroah-Hartman
2019-06-11 16:25       ` Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 51/56] gpio: gpio-omap: add check for off wake capable gpios Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 52/56] dmaengine: idma64: Use actual device for DMA transfers Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 53/56] pwm: tiehrpwm: Update shadow register for disabling PWMs Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 54/56] ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa Sasha Levin
2019-06-01 13:25 ` [PATCH AUTOSEL 4.4 55/56] pwm: Fix deadlock warning when removing PWM device Sasha Levin
2019-06-01 13:26 ` [PATCH AUTOSEL 4.4 56/56] ARM: exynos: Fix undefined instruction during Exynos5422 resume Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190601132600.27427-4-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=n-horiguchi@ah.jp.nec.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox