From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org,
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Michal Hocko <mhocko@suse.com>,
Nicholas Piggin <npiggin@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 02/50] mm/huge_memory.c: reorder operations in __split_huge_page_tail()
Date: Tue, 4 Dec 2018 11:49:57 +0100
Message-ID: <20181204103714.604521350@linuxfoundation.org>
In-Reply-To: <20181204103714.485546262@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
commit 605ca5ede7643a01f4c4a15913f9714ac297f8a6 upstream.
THP split makes a non-atomic change to tail page flags. This is almost OK
because tail pages are locked and isolated, but it breaks recent
changes in page locking: the non-atomic operation could clear the
PG_waiters bit.

As a result, a concurrent get_page_unless_zero() -> lock_page() sequence
might block forever, especially if the page is truncated later.

The fix is trivial: clone the flags before unfreezing the page reference
counter.

This race has existed since commit 62906027091f ("mm: add PageWaiters
indicating tasks are waiting for a page bit"), while the unsafe unfreeze
itself was added in commit 8df651c7059e ("thp: cleanup
split_huge_page()").

clear_compound_head() must also be called before unfreezing the page
reference, because a successful get_page_unless_zero() might be followed
by put_page(), which needs a correct compound_head().

Also replace page_ref_inc()/page_ref_add() with page_ref_unfreeze(), which
is made specifically for this purpose and has the semantics of
smp_store_release().
Link: http://lkml.kernel.org/r/151844393341.210639.13162088407980624477.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
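The ordering argument in the changelog can be sketched in userspace C11.
This is a single-threaded, scripted interleaving and not kernel code: struct
sk_page, the SK_PG_* values and all sk_* helpers are hypothetical stand-ins
for struct page, the page flags, get_page_unless_zero(), lock_page() and
page_ref_unfreeze(). The non-atomic flag update is split into a load and a
store so that the "concurrent" task can be run inside the window.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits; the values are illustrative only. */
#define SK_PG_WAITERS (1UL << 1)
#define SK_PG_DIRTY   (1UL << 2)

struct sk_page {
	atomic_ulong flags;
	atomic_int refcount;	/* 0 means the page is frozen */
};

/* Models get_page_unless_zero(): pin the page only if it is not frozen. */
static bool sk_get_unless_zero(struct sk_page *p)
{
	int old = atomic_load_explicit(&p->refcount, memory_order_relaxed);

	while (old != 0) {
		if (atomic_compare_exchange_weak_explicit(&p->refcount, &old,
				old + 1, memory_order_acquire,
				memory_order_relaxed))
			return true;
	}
	return false;
}

/* Models page_ref_unfreeze(): publish the count with release ordering. */
static void sk_unfreeze(struct sk_page *p, int count)
{
	atomic_store_explicit(&p->refcount, count, memory_order_release);
}

/* Concurrent task: pin the page, then atomically set the waiters bit. */
static bool sk_waiter_step(struct sk_page *tail)
{
	if (!sk_get_unless_zero(tail))
		return false;	/* still frozen: flags are left untouched */
	atomic_fetch_or(&tail->flags, SK_PG_WAITERS);
	return true;
}

/* Mirrors the count passed to page_ref_unfreeze() in the patch. */
int sk_tail_refs(bool anon, bool swapcache)
{
	return 1 + (!anon || swapcache);
}

/* Runs one scripted interleaving; true if the waiters bit survives. */
bool sk_waiters_bit_survives(bool fixed_order)
{
	struct sk_page head, tail;
	unsigned long tmp;
	bool pinned;

	atomic_init(&head.flags, SK_PG_DIRTY);
	atomic_init(&head.refcount, 0);
	atomic_init(&tail.flags, 0UL);
	atomic_init(&tail.refcount, 0);

	if (!fixed_order)
		sk_unfreeze(&tail, 1);	/* old order: refcount goes first */

	/* Non-atomic read-modify-write of the tail flags, split in two. */
	tmp = atomic_load_explicit(&tail.flags, memory_order_relaxed);
	tmp |= atomic_load_explicit(&head.flags, memory_order_relaxed) &
	       SK_PG_DIRTY;
	pinned = sk_waiter_step(&tail);	/* runs inside the window */
	atomic_store_explicit(&tail.flags, tmp, memory_order_relaxed);

	if (fixed_order) {
		sk_unfreeze(&tail, 1);	/* new order: refcount goes last */
		pinned = sk_waiter_step(&tail);
	}

	return pinned && (atomic_load(&tail.flags) & SK_PG_WAITERS);
}
```

With the old order the waiter pins the page inside the window and its bit is
overwritten by the later store; with the new order the waiter cannot pin the
page until the flags are final, so its bit is kept.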
mm/huge_memory.c | 36 +++++++++++++++---------------------
1 file changed, 15 insertions(+), 21 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 583ad61cc2f1..c14aec110e90 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1876,26 +1876,13 @@ static void __split_huge_page_tail(struct page *head, int tail,
struct page *page_tail = head + tail;
VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail);
- VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail);
/*
- * tail_page->_refcount is zero and not changing from under us. But
- * get_page_unless_zero() may be running from under us on the
- * tail_page. If we used atomic_set() below instead of atomic_inc() or
- * atomic_add(), we would then run atomic_set() concurrently with
- * get_page_unless_zero(), and atomic_set() is implemented in C not
- * using locked ops. spin_unlock on x86 sometime uses locked ops
- * because of PPro errata 66, 92, so unless somebody can guarantee
- * atomic_set() here would be safe on all archs (and not only on x86),
- * it's safer to use atomic_inc()/atomic_add().
+ * Clone page flags before unfreezing refcount.
+ *
+ * After successful get_page_unless_zero() might follow flags change,
+ * for exmaple lock_page() which set PG_waiters.
*/
- if (PageAnon(head)) {
- page_ref_inc(page_tail);
- } else {
- /* Additional pin to radix tree */
- page_ref_add(page_tail, 2);
- }
-
page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
page_tail->flags |= (head->flags &
((1L << PG_referenced) |
@@ -1907,14 +1894,21 @@ static void __split_huge_page_tail(struct page *head, int tail,
(1L << PG_unevictable) |
(1L << PG_dirty)));
- /*
- * After clearing PageTail the gup refcount can be released.
- * Page flags also must be visible before we make the page non-compound.
- */
+ /* Page flags must be visible before we make the page non-compound. */
smp_wmb();
+ /*
+ * Clear PageTail before unfreezing page refcount.
+ *
+ * After successful get_page_unless_zero() might follow put_page()
+ * which needs correct compound_head().
+ */
clear_compound_head(page_tail);
+ /* Finally unfreeze refcount. Additional reference from page cache. */
+ page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) ||
+ PageSwapCache(head)));
+
if (page_is_young(head))
set_page_young(page_tail);
if (page_is_idle(head))
--
2.17.1