From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Denis Lisov <dennis.lissov@gmail.com>,
	Qian Cai <cai@lca.pw>, Hugh Dickins <hughd@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH 5.4 34/85] mm/khugepaged: fix filemap page_to_pgoff(page) != offset
Date: Mon, 12 Oct 2020 15:26:57 +0200
Message-ID: <20201012132634.499402561@linuxfoundation.org>
In-Reply-To: <20201012132632.846779148@linuxfoundation.org>

From: Hugh Dickins <hughd@google.com>

commit 033b5d77551167f8c24ca862ce83d3e0745f9245 upstream.

There have been elusive reports of filemap_fault() hitting its
VM_BUG_ON_PAGE(page_to_pgoff(page) != offset, page) on kernels built
with CONFIG_READ_ONLY_THP_FOR_FS=y.

Suren has hit it on a kernel with CONFIG_READ_ONLY_THP_FOR_FS=y and
CONFIG_NUMA not set, and he has traced it to how khugepaged without NUMA
reuses the same huge page after collapse_file() has failed (whereas NUMA
targets its allocation to the respective node each time).  And most of
us were usually testing with CONFIG_NUMA=y kernels.

collapse_file(old start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=old offset
  new_page->mapping = mapping
  xas_store(&xas, new_page)

                          filemap_fault
                            page = find_get_page(mapping, offset)
                            // if offset falls inside hpage then
                            // compound_head(page) == hpage
                            lock_page_maybe_drop_mmap()
                              __lock_page(page)

  // collapse fails
  xas_store(&xas, old page)
  new_page->mapping = NULL
  unlock_page(new_page)

collapse_file(new start)
  new_page = khugepaged_alloc_page(hpage)
  __SetPageLocked(new_page)
  new_page->index = start // hpage->index=new offset
  new_page->mapping = mapping // mapping becomes valid again

                            // since compound_head(page) == hpage
                            // page_to_pgoff(page) got changed
                            VM_BUG_ON_PAGE(page_to_pgoff(page) != offset)
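
The interleaving above can be replayed in a single-threaded userspace
toy model.  This is only a sketch: struct toy_page and the toy_* helpers
are hypothetical stand-ins, not kernel APIs, and all locking is left out.
It shows why a lookup that recorded its subpage position against the old
start computes a different page_to_pgoff() once the same hpage is reused
at a new start in the same mapping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the head page of a huge page (not the kernel struct). */
struct toy_page {
	const void *mapping;	/* NULL while not exposed in page cache */
	unsigned long index;	/* file offset of the first subpage */
	int refcount;
};

/* What a racing lookup holds on to: the head page and a subpage position. */
struct toy_lookup {
	struct toy_page *head;
	unsigned long subpage;
};

static char toy_mapping;	/* dummy object whose address acts as the mapping */

/* Models collapse_file() exposing hpage in page cache at 'start'. */
static void toy_collapse_begin(struct toy_page *hpage, unsigned long start)
{
	hpage->index = start;
	hpage->mapping = &toy_mapping;
}

/* Models the failure path: the page is withdrawn, but not freed. */
static void toy_collapse_fail(struct toy_page *hpage)
{
	hpage->mapping = NULL;
}

/* Models find_get_page(): takes a reference and remembers the subpage. */
static struct toy_lookup toy_find_get_page(struct toy_page *hpage,
					   unsigned long offset)
{
	struct toy_lookup l = { hpage, offset - hpage->index };

	hpage->refcount++;
	return l;
}

/* Models page_to_pgoff() on a subpage: head index plus subpage position. */
static unsigned long toy_page_to_pgoff(struct toy_lookup l)
{
	return l.head->index + l.subpage;
}

/*
 * Replays: collapse at old_start, racing lookup at offset, collapse fails,
 * same hpage reused at new_start.  Returns true if the delayed recheck
 * would see a valid mapping but a mismatched offset (the VM_BUG_ON case).
 */
static bool toy_replay_race(unsigned long old_start, unsigned long new_start,
			    unsigned long offset)
{
	struct toy_page hpage = { NULL, 0, 1 };
	struct toy_lookup l;

	toy_collapse_begin(&hpage, old_start);
	l = toy_find_get_page(&hpage, offset);
	toy_collapse_fail(&hpage);
	toy_collapse_begin(&hpage, new_start);

	return hpage.mapping == &toy_mapping && toy_page_to_pgoff(l) != offset;
}
```

In this model toy_replay_race(0, 512, 5) reports a mismatch, while
reusing the page at the same start, toy_replay_race(512, 512, 517),
does not.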

An initial patch replaced __SetPageLocked() with lock_page(), which did
fix the race that Suren illustrates above.  But testing showed that it's
not good enough: if the racing task's __lock_page() gets delayed long
after its find_get_page(), then it may follow collapse_file(new start)'s
successful final unlock_page(), and crash on the same VM_BUG_ON_PAGE.

It could be fixed by relaxing filemap_fault()'s VM_BUG_ON_PAGE to a
check and retry (as is done for mapping), with similar relaxations in
find_lock_entry() and pagecache_get_page(): but it's not obvious what
else might get caught out; and khugepaged non-NUMA appears to be unique
in exposing a page to page cache, then revoking, without going through
a full cycle of freeing before reuse.
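
For illustration only, the check-and-retry alternative described above
(the one this patch does not take) might look roughly like the following
userspace sketch.  The retry_* names are hypothetical, pages are modelled
as small pages with a plain index, and the sequence of lookup results is
scripted by the caller rather than read from a real page cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page: just the two identity fields a lookup rechecks. */
struct retry_page {
	const void *mapping;
	unsigned long index;
};

/* Recheck after "locking": both mapping and offset must still match. */
static bool retry_page_still_valid(const struct retry_page *page,
				   const void *mapping, unsigned long offset)
{
	return page->mapping == mapping && page->index == offset;
}

/*
 * Walks a scripted sequence of lookup results, standing in for repeated
 * find_get_page() calls.  A stale result is skipped and the lookup is
 * retried, instead of treating the mismatch as a bug.
 */
static struct retry_page *retry_lookup(struct retry_page *const *results,
				       size_t n, const void *mapping,
				       unsigned long offset, size_t *attempts)
{
	assert(attempts != NULL);

	for (size_t i = 0; i < n; i++) {
		*attempts = i + 1;
		if (results[i] &&
		    retry_page_still_valid(results[i], mapping, offset))
			return results[i];
	}
	return NULL;	/* gave up: every scripted result was stale */
}
```

The sketch only covers one lookup path; as noted above, the difficulty
with this approach is knowing which other paths would need the same
relaxation.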

Instead, non-NUMA khugepaged_prealloc_page() now releases the old page
if anyone else holds a reference to it (1% of cases when I tested).
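
The release policy can be tried out in a small userspace model.  This is
a sketch under stated assumptions: struct pre_page and the pre_* helpers
are hypothetical stand-ins for struct page, page_count() and put_page(),
and the model only tracks a reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page carrying only a reference count. */
struct pre_page {
	int refcount;
};

/* Stand-in for put_page(): drops the caller's reference only. */
static void pre_put_page(struct pre_page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/*
 * If someone else still holds a reference to the cached huge page, give
 * up our reference and forget the page, so that the next allocation
 * starts from a fresh page.  Returns true if the page was released.
 */
static bool pre_release_if_shared(struct pre_page **hpage)
{
	if (*hpage && (*hpage)->refcount > 1) {
		pre_put_page(*hpage);
		*hpage = NULL;
		return true;
	}
	return false;
}
```

A page with a single reference is kept for reuse; a page with two or
more is handed back, and the remaining holder frees it when it drops
the last reference.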

Although never reported on huge tmpfs, I believe its find_lock_entry()
has been at similar risk; but huge tmpfs does not rely on khugepaged
for its normal working nearly so much as READ_ONLY_THP_FOR_FS does.

Reported-by: Denis Lisov <dennis.lissov@gmail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206569
Link: https://lore.kernel.org/linux-mm/?q=20200219144635.3b7417145de19b65f258c943%40linux-foundation.org
Reported-by: Qian Cai <cai@lca.pw>
Link: https://lore.kernel.org/linux-xfs/?q=20200616013309.GB815%40lca.pw
Reported-and-analyzed-by: Suren Baghdasaryan <surenb@google.com>
Fixes: 87c460a0bded ("mm/khugepaged: collapse_shmem() without freezing new_page")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v4.9+
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/khugepaged.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -832,6 +832,18 @@ static struct page *khugepaged_alloc_hug
 
 static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 {
+	/*
+	 * If the hpage allocated earlier was briefly exposed in page cache
+	 * before collapse_file() failed, it is possible that racing lookups
+	 * have not yet completed, and would then be unpleasantly surprised by
+	 * finding the hpage reused for the same mapping at a different offset.
+	 * Just release the previous allocation if there is any danger of that.
+	 */
+	if (*hpage && page_count(*hpage) > 1) {
+		put_page(*hpage);
+		*hpage = NULL;
+	}
+
 	if (!*hpage)
 		*hpage = khugepaged_alloc_hugepage(wait);
 


