From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Hugh Dickins <hughd@google.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Jerome Glisse <jglisse@redhat.com>,
Konstantin Khlebnikov <khlebnikov@yandex-team.ru>,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 4.9 11/50] mm/khugepaged: collapse_shmem() without freezing new_page
Date: Tue, 4 Dec 2018 11:50:06 +0100 [thread overview]
Message-ID: <20181204103715.096057919@linuxfoundation.org> (raw)
In-Reply-To: <20181204103714.485546262@linuxfoundation.org>
4.9-stable review patch. If anyone has any objections, please let me know.
------------------
commit 87c460a0bded56195b5eb497d44709777ef7b415 upstream.
khugepaged's collapse_shmem() does almost all of its work, to assemble
the huge new_page from 512 scattered old pages, with the new_page's
refcount frozen to 0 (and refcounts of all old pages so far also frozen
to 0). Including shmem_getpage() to read in any which were out on swap,
memory reclaim if necessary to allocate their intermediate pages, and
copying over all the data from old to new.
Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting
to spin while khugepaged is rescheduled.
One can get a little further under load by hacking around elsewhere; but
fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.
The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong. So simply
eliminate the freezing of the new_page.
Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are
all unfrozen on success or failure: just as before. They could be
unfrozen sooner, but cause no problem once no longer visible to
find_get_entry(), filemap_map_pages() and other speculative lookups.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/khugepaged.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 47b83030fc53..b87bd43993bd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1286,7 +1286,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
* collapse_shmem - collapse small tmpfs/shmem pages into huge one.
*
* Basic scheme is simple, details are more complex:
- * - allocate and freeze a new huge page;
+ * - allocate and lock a new huge page;
* - scan over radix tree replacing old pages the new one
* + swap in pages if necessary;
* + fill in gaps;
@@ -1294,11 +1294,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
* - if replacing succeed:
* + copy data over;
* + free old pages;
- * + unfreeze huge page;
+ * + unlock huge page;
* - if replacing failed;
* + put all pages back and unfreeze them;
* + restore gaps in the radix-tree;
- * + free huge page;
+ * + unlock and free huge page;
*/
static void collapse_shmem(struct mm_struct *mm,
struct address_space *mapping, pgoff_t start,
@@ -1336,13 +1336,11 @@ static void collapse_shmem(struct mm_struct *mm,
__SetPageSwapBacked(new_page);
new_page->index = start;
new_page->mapping = mapping;
- BUG_ON(!page_ref_freeze(new_page, 1));
/*
- * At this point the new_page is 'frozen' (page_count() is zero), locked
- * and not up-to-date. It's safe to insert it into radix tree, because
- * nobody would be able to map it or use it in other way until we
- * unfreeze it.
+ * At this point the new_page is locked and not up-to-date.
+ * It's safe to insert it into the page cache, because nobody would
+ * be able to map it or use it in another way until we unlock it.
*/
index = start;
@@ -1520,9 +1518,8 @@ static void collapse_shmem(struct mm_struct *mm,
index++;
}
- /* Everything is ready, let's unfreeze the new_page */
SetPageUptodate(new_page);
- page_ref_unfreeze(new_page, HPAGE_PMD_NR);
+ page_ref_add(new_page, HPAGE_PMD_NR - 1);
set_page_dirty(new_page);
mem_cgroup_commit_charge(new_page, memcg, false, true);
lru_cache_add_anon(new_page);
@@ -1570,8 +1567,6 @@ static void collapse_shmem(struct mm_struct *mm,
VM_BUG_ON(nr_none);
spin_unlock_irq(&mapping->tree_lock);
- /* Unfreeze new_page, caller would take care about freeing it */
- page_ref_unfreeze(new_page, 1);
mem_cgroup_cancel_charge(new_page, memcg, true);
new_page->mapping = NULL;
}
--
2.17.1
next prev parent reply other threads:[~2018-12-04 10:50 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-04 10:49 [PATCH 4.9 00/50] 4.9.143-stable review Greg Kroah-Hartman
2018-12-04 10:49 ` [PATCH 4.9 01/50] mm/huge_memory: rename freeze_page() to unmap_page() Greg Kroah-Hartman
2018-12-04 10:49 ` [PATCH 4.9 02/50] mm/huge_memory.c: reorder operations in __split_huge_page_tail() Greg Kroah-Hartman
2018-12-04 10:49 ` [PATCH 4.9 03/50] mm/huge_memory: splitting set mapping+index before unfreeze Greg Kroah-Hartman
2018-12-04 10:49 ` [PATCH 4.9 04/50] mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 05/50] mm/khugepaged: collapse_shmem() stop if punched or truncated Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 06/50] shmem: shmem_charge: verify max_block is not exceeded before inode update Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 07/50] shmem: introduce shmem_inode_acct_block Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 08/50] mm/khugepaged: fix crashes due to misaccounted holes Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 09/50] mm/khugepaged: collapse_shmem() remember to clear holes Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 10/50] mm/khugepaged: minor reorderings in collapse_shmem() Greg Kroah-Hartman
2018-12-04 10:50 ` Greg Kroah-Hartman [this message]
2018-12-04 10:50 ` [PATCH 4.9 12/50] mm/khugepaged: collapse_shmem() do not crash on Compound Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 13/50] media: em28xx: Fix use-after-free when disconnecting Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 14/50] Revert "wlcore: Add missing PM call for wlcore_cmd_wait_for_event_or_timeout()" Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 15/50] net: skb_scrub_packet(): Scrub offload_fwd_mark Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 16/50] rapidio/rionet: do not free skb before reading its length Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 17/50] s390/qeth: fix length check in SNMP processing Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 18/50] usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2 Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 19/50] kvm: mmu: Fix race in emulated page table writes Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 20/50] kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 21/50] KVM: X86: Fix scan ioapic use-before-initialization Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 22/50] xtensa: enable coprocessors that are being flushed Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 23/50] xtensa: fix coprocessor context offset definitions Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 24/50] Btrfs: ensure path name is null terminated at btrfs_control_ioctl Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 25/50] perf/x86/intel: Move branch tracing setup to the Intel-specific source file Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 26/50] perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 27/50] fs: fix lost error code in dio_complete Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 28/50] ALSA: wss: Fix invalid snd_free_pages() at error path Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 29/50] ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 30/50] ALSA: control: Fix race between adding and removing a user element Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 31/50] ALSA: sparc: Fix invalid snd_free_pages() at error path Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 32/50] ext2: fix potential use after free Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 33/50] dmaengine: at_hdmac: fix memory leak in at_dma_xlate() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 34/50] dmaengine: at_hdmac: fix module unloading Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 35/50] btrfs: release metadata before running delayed refs Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 36/50] USB: usb-storage: Add new IDs to ums-realtek Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 37/50] usb: core: quirks: add RESET_RESUME quirk for Cherry G230 Stream series Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 38/50] Revert "usb: dwc3: gadget: skip Set/Clear Halt when invalid" Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 39/50] iio:st_magn: Fix enable device after trigger Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 40/50] mm: use swp_offset as key in shmem_replace_page() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 41/50] Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 42/50] misc: mic/scif: fix copy-paste error in scif_create_remote_lookup Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 43/50] efi/libstub: arm: support building with clang Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 44/50] ARM: 8766/1: drop no-thumb-interwork in EABI mode Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 45/50] ARM: 8767/1: add support for building ARM kernel with clang Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 46/50] bus: arm-cci: remove unnecessary unreachable() Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 47/50] ARM: trusted_foundations: do not use naked function Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 48/50] workqueue: avoid clang warning Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 49/50] efi/libstub: Make file I/O chunking x86-specific Greg Kroah-Hartman
2018-12-04 10:50 ` [PATCH 4.9 50/50] kbuild: Set KBUILD_CFLAGS before incl. arch Makefile Greg Kroah-Hartman
2018-12-04 15:52 ` [PATCH 4.9 00/50] 4.9.143-stable review kernelci.org bot
2018-12-04 21:40 ` Guenter Roeck
2018-12-05 5:12 ` Naresh Kamboju
2018-12-05 9:30 ` Jon Hunter
2018-12-05 23:54 ` shuah
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181204103715.096057919@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=jglisse@redhat.com \
--cc=khlebnikov@yandex-team.ru \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).