All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	patches@lists.linux.dev,
	syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com,
	Qian Cai <cai@lca.pw>, Hugh Dickins <hughd@google.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 5.4 58/60] mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
Date: Mon, 26 Jun 2023 20:12:37 +0200	[thread overview]
Message-ID: <20230626180741.934136653@linuxfoundation.org> (raw)
In-Reply-To: <20230626180739.558575012@linuxfoundation.org>

From: Hugh Dickins <hughd@google.com>

commit 073861ed77b6b957c3c8d54a11dc503f7d986ceb upstream.

Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.

The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.

https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.org/
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).

Then on crashing a second time, realized there's a stronger reason against
that approach.  If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()?  And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon?  What
would that look like?

It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).

Matthew Wilcox explaining this to himself:
 "page is allocated, added to page cache, dirtied, writeback starts,

  --- thread A ---
  filesystem calls end_page_writeback()
        test_clear_page_writeback()
  --- context switch to thread B ---
  truncate_inode_pages_range() finds the page, it doesn't have writeback set,
  we delete it from the page cache.  Page gets reallocated, dirtied, writeback
  starts again.  Then we call write_cache_pages(), see
  PageWriteback() set, call wait_on_page_writeback()
  --- context switch back to thread A ---
  wake_up_page(page, PG_writeback);
  ... thread B is woken, but because the wakeup was for the old use of
  the page, PageWriteback is still set.

  Devious"

And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.

I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().

Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared?  I think not,
because each waiter does hold a reference on the page.  This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it).  The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).

Reported-by: syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com
Reported-by: Qian Cai <cai@lca.pw>
Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org # v5.8+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 mm/filemap.c        |    8 ++++++++
 mm/page-writeback.c |    6 ------
 2 files changed, 8 insertions(+), 6 deletions(-)

--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1388,11 +1388,19 @@ void end_page_writeback(struct page *pag
 		rotate_reclaimable_page(page);
 	}
 
+	/*
+	 * Writeback does not hold a page reference of its own, relying
+	 * on truncation to wait for the clearing of PG_writeback.
+	 * But here we must make sure that the page is not freed and
+	 * reused before the wake_up_page().
+	 */
+	get_page(page);
 	if (!test_clear_page_writeback(page))
 		BUG();
 
 	smp_mb__after_atomic();
 	wake_up_page(page, PG_writeback);
+	put_page(page);
 }
 EXPORT_SYMBOL(end_page_writeback);
 
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2746,12 +2746,6 @@ int test_clear_page_writeback(struct pag
 	} else {
 		ret = TestClearPageWriteback(page);
 	}
-	/*
-	 * NOTE: Page might be free now! Writeback doesn't hold a page
-	 * reference on its own, it relies on truncation to wait for
-	 * the clearing of PG_writeback. The below can only access
-	 * page state that is static across allocation cycles.
-	 */
 	if (ret) {
 		dec_lruvec_state(lruvec, NR_WRITEBACK);
 		dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);



  parent reply	other threads:[~2023-06-26 18:37 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-26 18:11 [PATCH 5.4 00/60] 5.4.249-rc1 review Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 01/60] nilfs2: reject devices with insufficient block count Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 02/60] mm: rewrite wait_on_page_bit_common() logic Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 03/60] list: add "list_del_init_careful()" to go with "list_empty_careful()" Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 04/60] epoll: ep_autoremove_wake_function should use list_del_init_careful Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 05/60] tracing: Add tracing_reset_all_online_cpus_unlocked() function Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 06/60] x86/purgatory: remove PGO flags Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 07/60] tick/common: Align tick period during sched_timer setup Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 08/60] media: dvbdev: Fix memleak in dvb_register_device Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 09/60] media: dvbdev: fix error logic at dvb_register_device() Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 10/60] media: dvb-core: Fix use-after-free due to race " Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 11/60] nilfs2: fix buffer corruption due to concurrent device reads Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 12/60] Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 13/60] PCI: hv: Fix a race condition bug in hv_pci_query_relations() Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 14/60] cgroup: Do not corrupt task iteration when rebinding subsystem Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 15/60] mmc: meson-gx: remove redundant mmc_request_done() call from irq context Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 16/60] ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 17/60] writeback: fix dereferencing NULL mapping->host on writeback_page_template Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 18/60] nilfs2: prevent general protection fault in nilfs_clear_dirty_page() Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 19/60] cifs: Clean up DFS referral cache Greg Kroah-Hartman
2023-06-26 18:11 ` [PATCH 5.4 20/60] cifs: Get rid of kstrdup_const()d paths Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 21/60] cifs: Introduce helpers for finding TCP connection Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 22/60] cifs: Merge is_path_valid() into get_normalized_path() Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 23/60] cifs: Fix potential deadlock when updating vol in cifs_reconnect() Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 24/60] x86/mm: Avoid using set_pgd() outside of real PGD pages Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 25/60] rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer() Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 26/60] ieee802154: hwsim: Fix possible memory leaks Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 27/60] xfrm: Linearize the skb after offloading if needed Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 28/60] net: qca_spi: Avoid high load if QCA7000 is not available Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 29/60] mmc: mtk-sd: fix deferred probing Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 30/60] mmc: mvsdio: convert to devm_platform_ioremap_resource Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 31/60] mmc: mvsdio: fix deferred probing Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 32/60] mmc: omap: " Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 33/60] mmc: omap_hsmmc: " Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 34/60] mmc: sdhci-acpi: " Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 35/60] mmc: sh_mmcif: " Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 36/60] mmc: usdhi60rol0: " Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 37/60] ipvs: align inner_mac_header for encapsulation Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 38/60] net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switch Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 39/60] be2net: Extend xmit workaround to BE3 chip Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 40/60] netfilter: nf_tables: disallow element updates of bound anonymous sets Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 41/60] netfilter: nfnetlink_osf: fix module autoload Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 42/60] Revert "net: phy: dp83867: perform soft reset and retain established link" Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 43/60] sch_netem: acquire qdisc lock in netem_change() Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 44/60] scsi: target: iscsi: Prevent login threads from racing between each other Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 45/60] HID: wacom: Add error check to wacom_parse_and_register() Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 46/60] arm64: Add missing Set/Way CMO encodings Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 47/60] media: cec: core: dont set last_initiator if tx in progress Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 48/60] nfcsim.c: Fix error checking for debugfs_create_dir Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 49/60] usb: gadget: udc: fix NULL dereference in remove() Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 50/60] s390/cio: unregister device when the only path is gone Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 51/60] ASoC: nau8824: Add quirk to active-high jack-detect Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 52/60] ARM: dts: Fix erroneous ADS touchscreen polarities Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 53/60] drm/exynos: vidi: fix a wrong error return Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 54/60] drm/exynos: fix race condition UAF in exynos_g2d_exec_ioctl Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 55/60] drm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 56/60] x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 57/60] i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle Greg Kroah-Hartman
2023-06-26 18:12 ` Greg Kroah-Hartman [this message]
2023-06-26 18:12 ` [PATCH 5.4 59/60] mm: make wait_on_page_writeback() wait for multiple pending writebacks Greg Kroah-Hartman
2023-06-26 18:12 ` [PATCH 5.4 60/60] xfs: verify buffer contents when we skip log replay Greg Kroah-Hartman
2023-06-27  9:04 ` [PATCH 5.4 00/60] 5.4.249-rc1 review Jon Hunter
2023-06-27 14:15 ` Harshit Mogalapalli
2023-06-27 20:10 ` Chris Paterson
2023-06-27 21:35 ` Guenter Roeck
2023-06-28  7:03 ` Naresh Kamboju

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230626180741.934136653@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=cai@lca.pw \
    --cc=hughd@google.com \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+3622cea378100f45d59f@syzkaller.appspotmail.com \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.