From: Mel Gorman <mgorman@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Minchan Kim <minchan@kernel.org>,
Dave Hansen <dave.hansen@intel.com>,
Andi Kleen <andi@firstfloor.org>, H Peter Anvin <hpa@zytor.com>,
Ingo Molnar <mingo@kernel.org>, Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>, Mel Gorman <mgorman@suse.de>
Subject: [PATCH 3/3] mm: Defer flush of writable TLB entries
Date: Mon, 8 Jun 2015 13:50:54 +0100
Message-ID: <1433767854-24408-4-git-send-email-mgorman@suse.de>
In-Reply-To: <1433767854-24408-1-git-send-email-mgorman@suse.de>
If a PTE is dirty when it is unmapped then it was writable recently.
Because TLB flushing is deferred, it's best to assume a writable TLB
entry still exists. With that assumption, the TLB must be flushed before
any IO starts or the page is freed, to avoid lost writes or data
corruption. This patch defers the flush of potentially writable TLB
entries for as long as possible.
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
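Note for readers: the batching rule described in the changelog can be
sketched as a small userspace model. This is only an illustration and
not the kernel code. The model_* names, MODEL_BATCH_SIZE and the
flushes counter are hypothetical, the cpumask and the real TLB flush
are left out, and the batch is passed explicitly instead of being read
from current->tlb_ubc. The early return on an empty batch is a choice
made for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical size; stands in for BATCH_TLBFLUSH_SIZE. */
#define MODEL_BATCH_SIZE 4

/* Userspace stand-in for struct tlbflush_unmap_batch (no cpumask). */
struct model_batch {
	unsigned long nr_pages;
	bool writable;			/* a dirty PTE was batched */
	unsigned long pfns[MODEL_BATCH_SIZE];
	unsigned long flushes;		/* model-only: counts flushes */
};

/* Models try_to_unmap_flush(): flush everything pending and reset. */
static void model_flush(struct model_batch *b)
{
	if (b->nr_pages == 0)
		return;
	b->flushes++;			/* the real code flushes TLBs here */
	b->nr_pages = 0;
	b->writable = false;
}

/* Models try_to_unmap_flush_dirty(): flush only if a write can race. */
static void model_flush_dirty(struct model_batch *b)
{
	if (b->writable)
		model_flush(b);
}

/* Models set_tlb_ubc_flush_pending(): queue a pfn, remember dirtiness. */
static void model_set_pending(struct model_batch *b, unsigned long pfn,
			      bool writable)
{
	assert(b->nr_pages < MODEL_BATCH_SIZE);
	b->pfns[b->nr_pages] = pfn;
	b->nr_pages++;

	/* A dirty PTE means a writable TLB entry may still exist. */
	if (writable)
		b->writable = true;

	/* A full batch is flushed immediately, as in the patch. */
	if (b->nr_pages == MODEL_BATCH_SIZE)
		model_flush(b);
}
```

In this model a caller about to start IO would call model_flush_dirty()
first. A batch holding only clean pages is left pending, while a batch
holding at least one dirty page is flushed before the write goes out.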
include/linux/sched.h | 1 +
mm/internal.h | 4 ++++
mm/rmap.c | 28 +++++++++++++++++++++-------
mm/vmscan.c | 7 ++++++-
4 files changed, 32 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 57ff61b16565..827d9b123bd5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1296,6 +1296,7 @@ enum perf_event_task_context {
struct tlbflush_unmap_batch {
struct cpumask cpumask;
unsigned long nr_pages;
+ bool writable;
unsigned long pfns[BATCH_TLBFLUSH_SIZE];
};
diff --git a/mm/internal.h b/mm/internal.h
index 8cbb68ccc731..ecf47a01420d 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -438,10 +438,14 @@ struct tlbflush_unmap_batch;
#ifdef CONFIG_ARCH_SUPPORTS_LOCAL_TLB_PFN_FLUSH
void try_to_unmap_flush(void);
+void try_to_unmap_flush_dirty(void);
#else
static inline void try_to_unmap_flush(void)
{
}
+static inline void try_to_unmap_flush_dirty(void)
+{
+}
#endif /* CONFIG_ARCH_SUPPORTS_LOCAL_TLB_PFN_FLUSH */
#endif /* __MM_INTERNAL_H */
diff --git a/mm/rmap.c b/mm/rmap.c
index a8dbba62398a..3c8ebacfe6ef 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -621,11 +621,21 @@ void try_to_unmap_flush(void)
}
cpumask_clear(&tlb_ubc->cpumask);
tlb_ubc->nr_pages = 0;
+ tlb_ubc->writable = false;
preempt_enable();
}
+/* Flush iff there are potentially writable TLB entries that can race with IO */
+void try_to_unmap_flush_dirty(void)
+{
+ struct tlbflush_unmap_batch *tlb_ubc = current->tlb_ubc;
+
+ if (tlb_ubc && tlb_ubc->writable)
+ try_to_unmap_flush();
+}
+
static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
- struct page *page)
+ struct page *page, bool writable)
{
struct tlbflush_unmap_batch *tlb_ubc = current->tlb_ubc;
@@ -633,6 +643,14 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
tlb_ubc->pfns[tlb_ubc->nr_pages] = page_to_pfn(page);
tlb_ubc->nr_pages++;
+ /*
+ * If the PTE was dirty then it's best to assume it's writable. The
+ * caller must use try_to_unmap_flush_dirty() or try_to_unmap_flush()
+ * before any IO is initiated on the page.
+ */
+ if (writable)
+ tlb_ubc->writable = true;
+
if (tlb_ubc->nr_pages == BATCH_TLBFLUSH_SIZE)
try_to_unmap_flush();
}
@@ -657,7 +675,7 @@ static bool should_defer_flush(struct mm_struct *mm, enum ttu_flags flags)
}
#else
static void set_tlb_ubc_flush_pending(struct mm_struct *mm,
- struct page *page)
+ struct page *page, bool writable)
{
}
@@ -1309,11 +1327,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
*/
pteval = ptep_get_and_clear(mm, address, pte);
- /* Potentially writable TLBs must be flushed before IO */
- if (pte_dirty(pteval))
- flush_tlb_page(vma, address);
- else
- set_tlb_ubc_flush_pending(mm, page);
+ set_tlb_ubc_flush_pending(mm, page, pte_dirty(pteval));
} else {
pteval = ptep_clear_flush(vma, address, pte);
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5121742ccb87..0055224c52d4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1065,7 +1065,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
if (!sc->may_writepage)
goto keep_locked;
- /* Page is dirty, try to write it out here */
+ /*
+ * Page is dirty. Flush the TLB if a writable entry
+ * potentially exists to avoid CPU writes after IO
+ * starts, then write the page out here.
+ */
+ try_to_unmap_flush_dirty();
switch (pageout(page, mapping, sc)) {
case PAGE_KEEP:
goto keep_locked;
--
2.3.5