From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Michal Hocko <mhocko@suse.com>,
Nicholas Piggin <npiggin@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 02/50] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
Date: Tue, 4 Dec 2018 11:49:57 +0100 [thread overview]
Message-ID: <20181204103714.604521350@linuxfoundation.org> (raw)
In-Reply-To: <20181204103714.485546262@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
commit 605ca5ede7643a01f4c4a15913f9714ac297f8a6 upstream.
THP split makes non-atomic change of tail page flags. This is almost ok
because tail pages are locked and isolated but this breaks recent
changes in page locking: non-atomic operation could clear bit
PG_waiters.
As a result concurrent sequence get_page_unless_zero() -> lock_page()
might block forever. Especially if this page was truncated later.
Fix is trivial: clone flags before unfreezing page reference counter.
This race exists since commit 62906027091f ("mm: add PageWaiters
indicating tasks are waiting for a page bit") while unsave unfreeze
itself was added in commit 8df651c7059e ("thp: cleanup
split_huge_page()").
clear_compound_head() also must be called before unfreezing page
reference because after successful get_page_unless_zero() might follow
put_page() which needs correct compound_head().
And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which
is made especially for that and has semantic of smp_store_release().
Link: http://lkml.kernel.org/r/151844393341.210639.13162088407980624477.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/huge_memory.c | 36 +++++++++++++++---------------------
1 file changed, 15 insertions(+), 21 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 583ad61cc2f1..c14aec110e90 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1876,26 +1876,13 @@ static void __split_huge_page_tail(struct page *head, int tail,
struct page *page_tail = head + tail;
VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
- VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail);
/*
- * tail_page->_refcount is zero and not changing from under us. But
- * get_page_unless_zero() may be running from under us on the
- * tail_page. If we used atomic_set() below instead of atomic_inc() or
- * atomic_add(), we would then run atomic_set() concurrently with
- * get_page_unless_zero(), and atomic_set() is implemented in C not
- * using locked ops. spin_unlock on x86 sometime uses locked ops
- * because of PPro errata 66, 92, so unless somebody can guarantee
- * atomic_set() here would be safe on all archs (and not only on x86),
- * it's safer to use atomic_inc()/atomic_add().
+ * Clone page flags before unfreezing refcount.
+ *
+ * After successful get_page_unless_zero() might follow flags change,
+ * for exmaple lock_page() which set PG_waiters.
*/
- if (PageAnon(head)) {
- page_ref_inc(page_tail);
- } else {
- /* Additional pin to radix tree */
- page_ref_add(page_tail, 2);
- }
-
page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
page_tail->flags |= (head->flags &
((1L << PG_referenced) |
@@ -1907,14 +1894,21 @@ static void __split_huge_page_tail(struct page *head, int tail,
(1L << PG_unevictable) |
(1L << PG_dirty)));
- /*
- * After clearing PageTail the gup refcount can be released.
- * Page flags also must be visible before we make the page non-compound.
- */
+ /* Page flags must be visible before we make the page non-compound. */
smp_wmb();
+ /*
+ * Clear PageTail before unfreezing page refcount.
+ *
+ * After successful get_page_unless_zero() might follow put_page()
+ * which needs correct compound_head().
+ */
clear_compound_head(page_tail);
+ /* Finally unfreeze refcount. Additional reference from page cache. */
+ page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
+ PageSwapCache(head)));
+
if (page_is_young(head))
set_page_young(page_tail);
if (page_is_idle(head))
--
2.17.1
next prev parent reply other threads:[~2018-12-04 11:12 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-04 10:49 [PATCH 4.9 00/50] 4.9.143-stable review Greg Kroah-Hartman
2018-12-04 10:49 ` [PATCH 4.9 01/50] mm/huge_memory: rename freeze_page() to unmap_page() Greg Kroah-Hartman
2018-12-04 10:49 ` Greg Kroah-Hartman [this message]
2018-12-04 10:49 ` [PATCH 4.9 03/50] mm/huge_memory: splitting set mapping+index before unfreeze Greg Kroah-Hartman
2018-12-04 10:49 ` [PATCH 4.9 04/50] mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 05/50] mm/khugepaged: collapse_shmem() stop if punched or truncated Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 06/50] shmem: shmem_charge: verify max_block is not exceeded before inode update Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 07/50] shmem: introduce shmem_inode_acct_block Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 08/50] mm/khugepaged: fix crashes due to misaccounted holes Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 09/50] mm/khugepaged: collapse_shmem() remember to clear holes Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 10/50] mm/khugepaged: minor reorderings in collapse_shmem() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 11/50] mm/khugepaged: collapse_shmem() without freezing new_page Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 12/50] mm/khugepaged: collapse_shmem() do not crash on Compound Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 13/50] media: em28xx: Fix use-after-free when disconnecting Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 14/50] Revert "wlcore: Add missing PM call for wlcore_cmd_wait_for_event_or_timeout()" Greg Kroah-Hartman
2018-12-04 10:50 ` Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 15/50] net: skb_scrub_packet(): Scrub offload_fwd_mark Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 16/50] rapidio/rionet: do not free skb before reading its length Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 17/50] s390/qeth: fix length check in SNMP processing Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 18/50] usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2 Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 19/50] kvm: mmu: Fix race in emulated page table writes Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 20/50] kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 21/50] KVM: X86: Fix scan ioapic use-before-initialization Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 22/50] xtensa: enable coprocessors that are being flushed Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 23/50] xtensa: fix coprocessor context offset definitions Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 24/50] Btrfs: ensure path name is null terminated at btrfs_control_ioctl Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 25/50] perf/x86/intel: Move branch tracing setup to the Intel-specific source file Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 26/50] perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 27/50] fs: fix lost error code in dio_complete Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 28/50] ALSA: wss: Fix invalid snd_free_pages() at error path Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 29/50] ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 30/50] ALSA: control: Fix race between adding and removing a user element Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 31/50] ALSA: sparc: Fix invalid snd_free_pages() at error path Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 32/50] ext2: fix potential use after free Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 33/50] dmaengine: at_hdmac: fix memory leak in at_dma_xlate() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 34/50] dmaengine: at_hdmac: fix module unloading Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 35/50] btrfs: release metadata before running delayed refs Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 36/50] USB: usb-storage: Add new IDs to ums-realtek Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 37/50] usb: core: quirks: add RESET_RESUME quirk for Cherry G230 Stream series Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 38/50] Revert "usb: dwc3: gadget: skip Set/Clear Halt when invalid" Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 39/50] iio:st_magn: Fix enable device after trigger Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 40/50] mm: use swp_offset as key in shmem_replace_page() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 41/50] Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 42/50] misc: mic/scif: fix copy-paste error in scif_create_remote_lookup Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 43/50] efi/libstub: arm: support building with clang Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 44/50] ARM: 8766/1: drop no-thumb-interwork in EABI mode Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 45/50] ARM: 8767/1: add support for building ARM kernel with clang Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 46/50] bus: arm-cci: remove unnecessary unreachable() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 47/50] ARM: trusted_foundations: do not use naked function Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 48/50] workqueue: avoid clang warning Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 49/50] efi/libstub: Make file I/O chunking x86-specific Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 50/50] kbuild: Set KBUILD_CFLAGS before incl. arch Makefile Greg Kroah-Hartman
2018-12-04 15:52 ` [PATCH 4.9 00/50] 4.9.143-stable review kernelci.org bot
2018-12-04 21:40 ` Guenter Roeck
2018-12-05 5:12 ` Naresh Kamboju
2018-12-05 9:30 ` Jon Hunter
2018-12-05 9:30 ` Jon Hunter
2018-12-05 23:54 ` shuah
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181204103714.604521350@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=khlebnikov@yandex-team.ru \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mhocko@suse.com \
--cc=npiggin@gmail.com \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.