From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Yang Shi <yang.shi@linux.alibaba.com>,
Gang Deng <gavin.dg@linux.alibaba.com>,
Hugh Dickins <hughd@google.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.9 13/65] mm: thp: handle page cache THP correctly in PageTransCompoundMap
Date: Mon, 11 Nov 2019 19:28:13 +0100
Message-ID: <20191111181343.632952893@linuxfoundation.org>
In-Reply-To: <20191111181331.917659011@linuxfoundation.org>
From: Yang Shi <yang.shi@linux.alibaba.com>
commit 169226f7e0d275c1879551f37484ef6683579a5c upstream.
We have a use case that uses tmpfs as the QEMU memory backend, and we
would like to take advantage of THP as well. But our tests show that the
EPT is not PMD mapped even though the underlying THPs are PMD mapped on
the host. The number shown by /sys/kernel/debug/kvm/largepages is much
lower than the number of PMD mapped shmem pages, as shown below:
7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 579584 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
12
And some benchmarks do worse than with anonymous THPs.
By digging into the code we figured out that commit 127393fbe597 ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks if
there is a single PTE mapping on the page for anonymous THP when setting
up the EPT mapping. But the _mapcount < 0 check doesn't work for page
cache THP, since every subpage of a page cache THP gets its _mapcount
inc'ed once it is PMD mapped, so PageTransCompoundMap() always returns
false for page cache THP. This prevents KVM from setting up a PMD mapped
EPT entry.
So we need to handle page cache THP correctly. However, when a page cache
THP's PMD gets split, the kernel just removes the mapping instead of
setting up PTE mappings as it does for anonymous THP. Before KVM calls
get_user_pages(), the subpages may get PTE mapped even though the page is
still a THP, since the page cache THP may be mapped by other processes in
the meantime.
So check the subpage's _mapcount against the compound mapcount to tell
whether the THP is PTE mapped or not. Although this may report some false
negatives (PTE mapped by other processes), it does not look trivial to
make this accurate.
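The counting argument above can be sketched as a small userspace model.
This is only an illustration, not kernel code: struct toy_thp and the
toy_*() helpers are hypothetical names, a single subpage counter stands in
for all subpages, and the counters are assumed to start at -1 the way
_mapcount and compound_mapcount do.

```c
#include <stdbool.h>

/* Toy model of one THP; hypothetical, not a kernel structure. */
struct toy_thp {
	bool anon;             /* anonymous THP vs. page cache (file) THP */
	int compound_mapcount; /* biased by -1: -1 means no PMD mapping */
	int sub_mapcount;      /* biased _mapcount of one subpage */
};

/* Model a PMD mapping being added. */
static void toy_map_pmd(struct toy_thp *t)
{
	t->compound_mapcount++;
	if (!t->anon)
		t->sub_mapcount++; /* file THP: subpage _mapcount is bumped too */
}

/* Model a PTE mapping of the subpage being added. */
static void toy_map_pte(struct toy_thp *t)
{
	t->sub_mapcount++;
}

/* The old check: only meaningful for anonymous THP. */
static bool toy_old_check(const struct toy_thp *t)
{
	return t->sub_mapcount < 0;
}

/* The new check: compare against the compound mapcount for file THP. */
static bool toy_new_check(const struct toy_thp *t)
{
	if (t->anon)
		return t->sub_mapcount < 0;
	return t->sub_mapcount == t->compound_mapcount;
}
```

In this model a file THP with one PMD mapping fails the old check but
passes the new one, and an extra PTE mapping from any process makes the
new check fail, which is the false negative mentioned above.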
With this fix, /sys/kernel/debug/kvm/largepages shows that a reasonable
number of pages are PMD mapped by EPT, as shown below:
7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 557056 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
271
And the benchmarks perform the same as with anonymous THPs.
[yang.shi@linux.alibaba.com: v4]
Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: dd78fedde4b9 ("rmap: support file thp")
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reported-by: Gang Deng <gavin.dg@linux.alibaba.com>
Tested-by: Gang Deng <gavin.dg@linux.alibaba.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
include/linux/mm.h | 5 -----
include/linux/mm_types.h | 5 +++++
include/linux/page-flags.h | 20 ++++++++++++++++++--
3 files changed, 23 insertions(+), 7 deletions(-)
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -504,11 +504,6 @@ static inline int is_vmalloc_or_module_a
extern void kvfree(const void *addr);
-static inline atomic_t *compound_mapcount_ptr(struct page *page)
-{
- return &page[1].compound_mapcount;
-}
-
static inline int compound_mapcount(struct page *page)
{
VM_BUG_ON_PAGE(!PageCompound(page), page);
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -262,6 +262,11 @@ struct page_frag_cache {
typedef unsigned long vm_flags_t;
+static inline atomic_t *compound_mapcount_ptr(struct page *page)
+{
+ return &page[1].compound_mapcount;
+}
+
/*
* A region containing a mapping of a non-memory backed file under NOMMU
* conditions. These are held in a global tree and are pinned by the VMAs that
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -545,12 +545,28 @@ static inline int PageTransCompound(stru
*
* Unlike PageTransCompound, this is safe to be called only while
* split_huge_pmd() cannot run from under us, like if protected by the
- * MMU notifier, otherwise it may result in page->_mapcount < 0 false
+ * MMU notifier, otherwise it may result in page->_mapcount check false
* positives.
+ *
+ * We have to treat page cache THP differently since every subpage of it
+ * would get _mapcount inc'ed once it is PMD mapped. But, it may be PTE
+ * mapped in the current process so comparing subpage's _mapcount to
+ * compound_mapcount to filter out PTE mapped case.
*/
static inline int PageTransCompoundMap(struct page *page)
{
- return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0;
+ struct page *head;
+
+ if (!PageTransCompound(page))
+ return 0;
+
+ if (PageAnon(page))
+ return atomic_read(&page->_mapcount) < 0;
+
+ head = compound_head(page);
+ /* File THP is PMD mapped and not PTE mapped */
+ return atomic_read(&page->_mapcount) ==
+ atomic_read(compound_mapcount_ptr(head));
}
/*