From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Denis Lisov <dennis.lissov@gmail.com>,
Qian Cai <cai@lca.pw>, Hugh Dickins <hughd@google.com>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH 4.19 19/49] mm/khugepaged: fix filemap page_to_pgoff(page) != offset
Date: Mon, 12 Oct 2020 15:27:05 +0200 [thread overview]
Message-ID: <20201012132630.341352860@linuxfoundation.org> (raw)
In-Reply-To: <20201012132629.469542486@linuxfoundation.org>
From: Hugh Dickins <hughd@google.com>
commit 033b5d77551167f8c24ca862ce83d3e0745f9245 upstream.
There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.
Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA is not set: and he has analyzed it down to how khugepaged
without NUMA reuses the same huge page after collapse_file() failed
(whereas NUMA targets its allocation to the respective node each time).
And most of us were usually testing with CONFIG_NUMA=y kernels.
collapse_file(old start)
new_page = khugepaged_alloc_page(hpage)
__SetPageLocked(new_page)
new_page->index = start // hpage->index=old offset
new_page->mapping = mapping
xas_store(&xas, new_page)
filemap_fault
page = find_get_page(mapping, offset)
// if offset falls inside hpage then
// compound_head(page) == hpage
lock_page_maybe_drop_mmap()
__lock_page(page)
// collapse fails
xas_store(&xas, old page)
new_page->mapping = NULL
unlock_page(new_page)
collapse_file(new start)
new_page = khugepaged_alloc_page(hpage)
__SetPageLocked(new_page)
new_page->index = start // hpage->index=new offset
new_page->mapping = mapping // mapping becomes valid again
// since compound_head(page) == hpage
// page_to_pgoff(page) got changed
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
An initial patch replaced __SetPageLocked() by lock_page(), which did
fix the race which Suren illustrates above. But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.
It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.
Instead, non-NUMA khugepaged_prealloc_page() release the old page
if anyone else has a reference to it (1% of cases when I tested).
Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.
Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
mm/khugepaged.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -820,6 +820,18 @@ static struct page *khugepaged_alloc_hug
static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
{
+ /*
+ * If the hpage allocated earlier was briefly exposed in page cache
+ * before collapse_file() failed, it is possible that racing lookups
+ * have not yet completed, and would then be unpleasantly surprised by
+ * finding the hpage reused for the same mapping at a different offset.
+ * Just release the previous allocation if there is any danger of that.
+ */
+ if (*hpage && page_count(*hpage) > 1) {
+ put_page(*hpage);
+ *hpage = NULL;
+ }
+
if (!*hpage)
*hpage = khugepaged_alloc_hugepage(wait);
next prev parent reply other threads:[~2020-10-12 13:39 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-12 13:26 [PATCH 4.19 00/49] 4.19.151-rc1 review Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 01/49] fbdev, newport_con: Move FONT_EXTRA_WORDS macros into linux/font.h Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 02/49] Fonts: Support FONT_EXTRA_WORDS macros for built-in fonts Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 03/49] fbcon: Fix global-out-of-bounds read in fbcon_get_font() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 04/49] Revert "ravb: Fixed to be able to unload modules" Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 05/49] net: wireless: nl80211: fix out-of-bounds access in nl80211_del_key() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 06/49] drm/nouveau/mem: guard against NULL pointer access in mem_del Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 07/49] usermodehelper: reset umask to default before executing user process Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 08/49] platform/x86: intel-vbtn: Fix SW_TABLET_MODE always reporting 1 on the HP Pavilion 11 x360 Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 09/49] platform/x86: thinkpad_acpi: initialize tp_nvram_state variable Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 10/49] platform/x86: intel-vbtn: Switch to an allow-list for SW_TABLET_MODE reporting Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 11/49] platform/x86: thinkpad_acpi: re-initialize ACPI buffer size when reuse Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 12/49] driver core: Fix probe_count imbalance in really_probe() Greg Kroah-Hartman
2020-10-12 13:26 ` [PATCH 4.19 13/49] perf top: Fix stdio interface input handling with glibc 2.28+ Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 14/49] i2c: i801: Exclude device from suspend direct complete optimization Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 15/49] mtd: rawnand: sunxi: Fix the probe error path Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 16/49] arm64: dts: stratix10: add status to qspi dts node Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 17/49] nvme-core: put ctrl ref when module ref get fail Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 18/49] macsec: avoid use-after-free in macsec_handle_frame() Greg Kroah-Hartman
2020-10-12 13:27 ` Greg Kroah-Hartman [this message]
2020-10-12 13:27 ` [PATCH 4.19 20/49] xfrmi: drop ignore_df check before updating pmtu Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 21/49] cifs: Fix incomplete memory allocation on setxattr path Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 22/49] i2c: meson: fix clock setting overwrite Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 23/49] i2c: meson: fixup rate calculation with filter delay Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 24/49] i2c: owl: Clear NACK and BUS error bits Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 25/49] sctp: fix sctp_auth_init_hmacs() error path Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 26/49] team: set dev->needed_headroom in team_setup_by_port() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 27/49] net: team: fix memory leak in __team_options_register Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 28/49] openvswitch: handle DNAT tuple collision Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 29/49] drm/amdgpu: prevent double kfree ttm->sg Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 30/49] xfrm: clone XFRMA_SET_MARK in xfrm_do_migrate Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 31/49] xfrm: clone XFRMA_REPLAY_ESN_VAL " Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 32/49] xfrm: clone XFRMA_SEC_CTX " Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 33/49] xfrm: clone whole liftime_cur structure " Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 34/49] net: stmmac: removed enabling eee in EEE set callback Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 35/49] platform/x86: fix kconfig dependency warning for FUJITSU_LAPTOP Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 36/49] xfrm: Use correct address family in xfrm_state_find Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 37/49] bonding: set dev->needed_headroom in bond_setup_by_slave() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 38/49] mdio: fix mdio-thunder.c dependency & build error Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 39/49] net: usb: ax88179_178a: fix missing stop entry in driver_info Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 40/49] net/mlx5e: Fix VLAN cleanup flow Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 41/49] net/mlx5e: Fix VLAN create flow Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 42/49] rxrpc: Fix rxkad token xdr encoding Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 43/49] rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read() Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 44/49] rxrpc: Fix some missing _bh annotations on locking conn->state_lock Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 45/49] rxrpc: Fix server keyring leak Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 46/49] perf: Fix task_function_call() error handling Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 47/49] mmc: core: dont set limits.discard_granularity as 0 Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 48/49] mm: khugepaged: recalculate min_free_kbytes after memory hotplug as expected by khugepaged Greg Kroah-Hartman
2020-10-12 13:27 ` [PATCH 4.19 49/49] net: usb: rtl8150: set random MAC address when set_ethernet_addr() fails Greg Kroah-Hartman
2020-10-13 6:06 ` [PATCH 4.19 00/49] 4.19.151-rc1 review Naresh Kamboju
2020-10-13 16:40 ` Guenter Roeck
2020-10-14 8:34 ` Greg Kroah-Hartman
2020-10-13 18:11 ` Pavel Machek
2020-10-14 8:32 ` Greg Kroah-Hartman
2020-10-14 1:27 ` Shuah Khan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201012132630.341352860@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=cai@lca.pw \
--cc=dennis.lissov@gmail.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=surenb@google.com \
--cc=torvalds@linux-foundation.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox