From: Dave Hansen <dave.hansen@linux.intel.com>
To: linux-kernel@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
mhocko@suse.com, jannh@google.com, vbabka@suse.cz,
minchan@kernel.org, dancol@google.com, joel@joelfernandes.org,
akpm@linux-foundation.org
Subject: [PATCH 1/2] mm/madvise: help MADV_PAGEOUT to find swap cache pages
Date: Mon, 23 Mar 2020 16:41:49 -0700
Message-ID: <20200323234149.9FE95081@viggo.jf.intel.com>
In-Reply-To: <20200323234147.558EBA81@viggo.jf.intel.com>
From: Dave Hansen <dave.hansen@linux.intel.com>
tl;dr: MADV_PAGEOUT ignores unmapped swap cache pages. Enable
MADV_PAGEOUT to find and reclaim swap cache.
The long story:
While looking into another issue, I wrote a simple test with two
processes: a parent and a fork()'d child. The parent reads a
memory buffer that it shares with the child through fork(), and
the child calls madvise(MADV_PAGEOUT) on the same buffer.
The first call to MADV_PAGEOUT does what is expected: it pages
the memory out and causes faults in the parent. Later calls,
however, cause no further faults in the parent. MADV_PAGEOUT
only works once! This was a surprise.
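A rough sketch of such a test might look like the following. It
is not the original test program: the helper names
(read_and_count_faults, pageout_rounds), the pipe-based
handshake, and the use of getrusage() to count faults are all
assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* Linux uapi value, for older libc headers */
#endif

/* Hypothetical helper: read one byte per page, return the faults taken. */
static long read_and_count_faults(const volatile char *buf, size_t len)
{
	struct rusage before, after;
	long page = sysconf(_SC_PAGESIZE);
	size_t off;

	if (page <= 0 || getrusage(RUSAGE_SELF, &before))
		return -1;
	for (off = 0; off < len; off += (size_t)page)
		(void)buf[off];		/* volatile read touches the page */
	if (getrusage(RUSAGE_SELF, &after))
		return -1;
	return (after.ru_majflt - before.ru_majflt) +
	       (after.ru_minflt - before.ru_minflt);
}

/*
 * Hypothetical driver: fork a child that calls MADV_PAGEOUT on 'buf'
 * once per round and never touches it, while the parent re-reads the
 * buffer after each call. faults[i] gets the parent's fault count for
 * round i. Returns 0 on success, -1 on setup or handshake failure.
 */
static int pageout_rounds(char *buf, size_t len, int rounds, long *faults)
{
	int to_child[2], to_parent[2], i, ret = 0;
	char token = 0;
	pid_t pid;

	if (pipe(to_child) || pipe(to_parent))
		return -1;
	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		close(to_child[1]);
		close(to_parent[0]);
		while (read(to_child[0], &token, 1) == 1) {
			/* 'E' reports a failed madvise(); ignored by this sketch */
			token = madvise(buf, len, MADV_PAGEOUT) ? 'E' : 'K';
			if (write(to_parent[1], &token, 1) != 1)
				break;
		}
		_exit(0);
	}
	close(to_child[0]);
	close(to_parent[1]);
	for (i = 0; i < rounds; i++) {
		if (write(to_child[1], &token, 1) != 1 ||
		    read(to_parent[0], &token, 1) != 1) {
			ret = -1;
			break;
		}
		faults[i] = read_and_count_faults(buf, len);
	}
	close(to_child[1]);	/* EOF ends the child's loop */
	close(to_parent[0]);
	waitpid(pid, NULL, 0);
	return ret;
}
```

Called on a populated anonymous mapping with swap enabled, the
behavior described above would presumably show up as a nonzero
faults[0] followed by zeros in the later rounds.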
The PTEs in the shared buffer start out pte_present()==1 in
both parent and child. The first MADV_PAGEOUT operation replaces
those with pte_present()==0 swap PTEs. The parent process
quickly faults and recreates pte_present()==1. However, the
child process (the one calling MADV_PAGEOUT) never touches the
memory and has retained the non-present swap PTEs.
The same situation can also arise in a single process when some
of its data has been placed in the swap cache but the memory has
not yet been reclaimed.
The MADV_PAGEOUT code has a pte_present()==0 check, so it
ignores any pte_present()==0 pages. This essentially makes
unmapped swap cache immune from MADV_PAGEOUT, which is not
very friendly behavior.
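The sequence above can be illustrated with a small userspace toy
model. This is only a sketch: the enum, the struct and the
model_* functions are invented here and do not correspond to
kernel data structures. It tracks one page and the PTE state in
parent and child, and counts parent faults with and without the
pte_present()==0 skip.

```c
#include <stdbool.h>

/* Simplified per-process PTE states; not the kernel's real encoding. */
enum pte_state { PTE_NONE, PTE_PRESENT, PTE_SWAP };

/* One page shared by a parent and child after fork() (toy model). */
struct shared_page {
	enum pte_state parent;
	enum pte_state child;
	bool in_memory;		/* mapped and/or sitting in the swap cache */
};

/*
 * Child calls MADV_PAGEOUT. 'find_swap_cache' selects the behavior:
 * false models the pte_present()==0 check, true models looking the
 * page up in the swap cache. Returns true if the page was reclaimed.
 */
static bool model_pageout(struct shared_page *p, bool find_swap_cache)
{
	if (p->child == PTE_NONE)
		return false;
	if (p->child == PTE_SWAP && !find_swap_cache)
		return false;		/* non-present PTE is skipped */
	if (!p->in_memory)
		return false;		/* nothing left to reclaim */

	/* Reclaim unmaps the page everywhere and frees the memory. */
	if (p->parent == PTE_PRESENT)
		p->parent = PTE_SWAP;
	if (p->child == PTE_PRESENT)
		p->child = PTE_SWAP;
	p->in_memory = false;
	return true;
}

/* Parent reads the buffer. Returns true if the read had to fault. */
static bool model_parent_read(struct shared_page *p)
{
	if (p->parent == PTE_PRESENT)
		return false;
	p->parent = PTE_PRESENT;	/* swap-in: page is back in memory */
	p->in_memory = true;
	return true;
}

/* Count parent faults over 'rounds' of pageout-then-read. */
static int model_count_faults(bool find_swap_cache, int rounds)
{
	struct shared_page p = { PTE_PRESENT, PTE_PRESENT, true };
	int faults = 0, i;

	for (i = 0; i < rounds; i++) {
		model_pageout(&p, find_swap_cache);
		if (model_parent_read(&p))
			faults++;
	}
	return faults;
}
```

In this model, model_count_faults(false, 3) yields 1 fault (the
"works once" behavior), while model_count_faults(true, 3) yields
3, one per round.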
Enable MADV_PAGEOUT to find and reclaim swap cache. Because
swap cache is not pinned by holding the PTE lock, a reference
must be held until the page is isolated, where a second
reference is obtained.
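The reference rule can be sketched in userspace as follows. The
toy_* names and the struct are hypothetical stand-ins, not kernel
APIs; the point is only the ordering: the lookup reference is
held across isolation and dropped afterwards, once isolation has
taken its own.

```c
#include <stddef.h>

/* Toy stand-in for 'struct page'; only what the sketch needs. */
struct toy_page {
	int refcount;
	int isolated;	/* non-zero once on the reclaim list */
};

static void toy_get_page(struct toy_page *page) { page->refcount++; }
static void toy_put_page(struct toy_page *page) { page->refcount--; }

/* Lookup returns the page with a reference held, or NULL. */
static struct toy_page *toy_lookup(struct toy_page *cached)
{
	if (cached)
		toy_get_page(cached);
	return cached;
}

/* Isolation takes its own reference, owned by the reclaim list. */
static void toy_isolate(struct toy_page *page)
{
	toy_get_page(page);
	page->isolated = 1;
}

/* One loop iteration: look up, isolate, then drop the lookup reference. */
static int toy_pageout_one(struct toy_page *cached)
{
	struct toy_page *page = toy_lookup(cached);

	if (!page)
		return 0;
	toy_isolate(page);
	toy_put_page(page);	/* the list's reference keeps the page alive */
	return 1;
}
```

Starting from a refcount of 1 (the swap cache's own reference in
this model), one call leaves the count at 2: the original
reference plus the one taken at isolation.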
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jann Horn <jannh@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
b/mm/madvise.c | 68 +++++++++++++++++++++++++++++++++++++++++++++++----------
1 file changed, 57 insertions(+), 11 deletions(-)
diff -puN mm/madvise.c~madv-pageout-find-swap-cache mm/madvise.c
--- a/mm/madvise.c~madv-pageout-find-swap-cache 2020-03-23 16:30:48.505385896 -0700
+++ b/mm/madvise.c 2020-03-23 16:30:48.509385896 -0700
@@ -250,6 +250,52 @@ static void force_shm_swapin_readahead(s
 #endif /* CONFIG_SWAP */

 /*
+ * Given a PTE, find the corresponding 'struct page'
+ * and acquire a reference. Also handles non-present
+ * swap PTEs.
+ *
+ * Returns NULL when there is no page to reclaim.
+ */
+static struct page *pte_get_reclaim_page(struct vm_area_struct *vma,
+					 unsigned long addr, pte_t ptent)
+{
+	swp_entry_t entry;
+	struct page *page;
+
+	/* Totally empty PTE: */
+	if (pte_none(ptent))
+		return NULL;
+
+	/* Handle present or PROT_NONE ptes: */
+	if (!is_swap_pte(ptent)) {
+		page = vm_normal_page(vma, addr, ptent);
+		if (page)
+			get_page(page);
+		return page;
+	}
+
+	/*
+	 * 'ptent' is now definitely a (non-present) swap
+	 * PTE in this process. Go look for additional
+	 * references to the swap cache.
+	 */
+
+	/*
+	 * Is it one of the "swap PTEs" that's not really
+	 * swap? Do not try to reclaim those.
+	 */
+	entry = pte_to_swp_entry(ptent);
+	if (non_swap_entry(entry))
+		return NULL;
+
+	/*
+	 * The PTE was a true swap entry. The page may be in
+	 * the swap cache.
+	 */
+	return lookup_swap_cache(entry, vma, addr);
+}
+
+/*
  * Schedule all required I/O operations. Do not wait for completion.
  */
 static long madvise_willneed(struct vm_area_struct *vma,
@@ -398,13 +444,8 @@ regular_page:
 	for (; addr < end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;

-		if (pte_none(ptent))
-			continue;
-
-		if (!pte_present(ptent))
-			continue;
-
-		page = vm_normal_page(vma, addr, ptent);
+		/* 'page' can be mapped, in the swap cache or both */
+		page = pte_get_reclaim_page(vma, addr, ptent);
 		if (!page)
 			continue;

@@ -413,9 +454,10 @@ regular_page:
 		 * are sure it's worth. Split it if we are only owner.
 		 */
 		if (PageTransCompound(page)) {
-			if (page_mapcount(page) != 1)
+			if (page_mapcount(page) != 1) {
+				put_page(page);
 				break;
-			get_page(page);
+			}
 			if (!trylock_page(page)) {
 				put_page(page);
 				break;
@@ -436,12 +478,14 @@ regular_page:
 		}

 		/* Do not interfere with other mappings of this page */
-		if (page_mapcount(page) != 1)
+		if (page_mapcount(page) != 1) {
+			put_page(page);
 			continue;
+		}

 		VM_BUG_ON_PAGE(PageTransCompound(page), page);

-		if (pte_young(ptent)) {
+		if (!is_swap_pte(ptent) && pte_young(ptent)) {
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			ptent = pte_mkold(ptent);
@@ -466,6 +510,8 @@ regular_page:
 			}
 		} else
 			deactivate_page(page);
+		/* drop ref acquired in pte_get_reclaim_page() */
+		put_page(page);
 	}

 	arch_leave_lazy_mmu_mode();
_
Thread overview: 7+ messages
2020-03-23 23:41 [PATCH 0/2] mm/madvise: teach MADV_PAGEOUT about swap cache Dave Hansen
2020-03-23 23:41 ` Dave Hansen [this message]
2020-03-26 6:24 ` [PATCH 1/2] mm/madvise: help MADV_PAGEOUT to find swap cache pages Minchan Kim
2020-03-23 23:41 ` [PATCH 2/2] mm/madvise: skip MADV_PAGEOUT on shared " Dave Hansen
2020-03-26 6:28 ` Minchan Kim
2020-03-26 23:00 ` Dave Hansen
2020-03-27 6:42 ` Minchan Kim