From: Minchan Kim <minchan@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <namit@vmware.com>,
linux-mm@kvack.org, nadav.amit@gmail.com,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
Ingo Molnar <mingo@redhat.com>,
Russell King <linux@armlinux.org.uk>,
Tony Luck <tony.luck@intel.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Jeff Dike <jdike@addtoit.com>,
linux-arch@vger.kernel.org
Subject: Re: [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Date: Mon, 14 Aug 2017 10:26:17 +0900 [thread overview]
Message-ID: <20170814012617.GB25427@bbox> (raw)
In-Reply-To: <20170811133020.zozuuhbw72lzolj5@hirez.programming.kicks-ass.net>
Hi Peter,
On Fri, Aug 11, 2017 at 03:30:20PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 05:08:17PM -0700, Nadav Amit wrote:
> > void tlb_finish_mmu(struct mmu_gather *tlb,
> > unsigned long start, unsigned long end)
> > {
> > - arch_tlb_finish_mmu(tlb, start, end);
> > + /*
> > + * If there are parallel threads are doing PTE changes on same range
> > + * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
> > + * flush by batching, a thread has stable TLB entry can fail to flush
> > + * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
> > + * forcefully if we detect parallel PTE batching threads.
> > + */
> > + bool force = mm_tlb_flush_nested(tlb->mm);
> > +
> > + arch_tlb_finish_mmu(tlb, start, end, force);
> > }
>
> I don't understand the comment nor the ordering. What guarantees we see
> the increment if we need to?
How about this about commenting part?
From 05f06fd6aba14447a9ca2df8b810fbcf9a58e14b Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 14 Aug 2017 10:16:56 +0900
Subject: [PATCH] mm: add describable comment for TLB batch race
[1] is a rather subtle/complicated bug so that it's hard to
understand it with limited code comment.
This patch adds a sequence diagaram to explain the problem
more easily, I hope.
[1] 99baac21e458, mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/memory.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index bcbe56f52163..f571b0eb9816 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -413,12 +413,37 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
void tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end)
{
+
+
/*
* If there are parallel threads are doing PTE changes on same range
* under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
* flush by batching, a thread has stable TLB entry can fail to flush
* the TLB by observing pte_none|!pte_dirty, for example so flush TLB
* forcefully if we detect parallel PTE batching threads.
+ *
+ * Example: MADV_DONTNEED stale TLB problem on same range
+ *
+ * CPU 0 CPU 1
+ * *a = 1;
+ * MADV_DONTNEED
+ * MADV_DONTNEED tlb_gather_mmu
+ * tlb_gather_mmu
+ * down_read(mmap_sem) down_read(mmap_sem)
+ * pte_lock
+ * pte_get_and_clear
+ * tlb_remove_tlb_entry
+ * pte_unlock
+ * pte_lock
+ * found out the pte is none
+ * pte_unlock
+ * tlb_finish_mmu doesn't flush
+ *
+ * Access the address with stale TLB
+ * *a = 2;ie, success without segfault
+ * tlb_finish_mmu flush on range
+ * but it is too late.
+ *
*/
bool force = mm_tlb_flush_nested(tlb->mm);
--
2.7.4
WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <namit@vmware.com>,
linux-mm@kvack.org, nadav.amit@gmail.com,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
Ingo Molnar <mingo@redhat.com>,
Russell King <linux@armlinux.org.uk>,
Tony Luck <tony.luck@intel.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Jeff Dike <jdike@addtoit.com>,
linux-arch@vger.kernel.org
Subject: Re: [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Date: Mon, 14 Aug 2017 10:26:17 +0900 [thread overview]
Message-ID: <20170814012617.GB25427@bbox> (raw)
Message-ID: <20170814012617.38ip7KUv-4nQr2LlmL3_9PbWJviNcqGaNk-iZ_7Oc5s@z> (raw)
In-Reply-To: <20170811133020.zozuuhbw72lzolj5@hirez.programming.kicks-ass.net>
Hi Peter,
On Fri, Aug 11, 2017 at 03:30:20PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 05:08:17PM -0700, Nadav Amit wrote:
> > void tlb_finish_mmu(struct mmu_gather *tlb,
> > unsigned long start, unsigned long end)
> > {
> > - arch_tlb_finish_mmu(tlb, start, end);
> > + /*
> > + * If there are parallel threads are doing PTE changes on same range
> > + * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
> > + * flush by batching, a thread has stable TLB entry can fail to flush
> > + * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
> > + * forcefully if we detect parallel PTE batching threads.
> > + */
> > + bool force = mm_tlb_flush_nested(tlb->mm);
> > +
> > + arch_tlb_finish_mmu(tlb, start, end, force);
> > }
>
> I don't understand the comment nor the ordering. What guarantees we see
> the increment if we need to?
How about this about commenting part?
WARNING: multiple messages have this Message-ID (diff)
From: Minchan Kim <minchan@kernel.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <namit@vmware.com>,
linux-mm@kvack.org, nadav.amit@gmail.com,
linux-kernel@vger.kernel.org, akpm@linux-foundation.org,
Ingo Molnar <mingo@redhat.com>,
Russell King <linux@armlinux.org.uk>,
Tony Luck <tony.luck@intel.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
"David S. Miller" <davem@davemloft.net>,
Heiko Carstens <heiko.carstens@de.ibm.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Jeff Dike <jdike@addtoit.com>,
linux-arch@vger.kernel.org
Subject: Re: [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Date: Mon, 14 Aug 2017 10:26:17 +0900 [thread overview]
Message-ID: <20170814012617.GB25427@bbox> (raw)
In-Reply-To: <20170811133020.zozuuhbw72lzolj5@hirez.programming.kicks-ass.net>
Hi Peter,
On Fri, Aug 11, 2017 at 03:30:20PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 05:08:17PM -0700, Nadav Amit wrote:
> > void tlb_finish_mmu(struct mmu_gather *tlb,
> > unsigned long start, unsigned long end)
> > {
> > - arch_tlb_finish_mmu(tlb, start, end);
> > + /*
> > + * If there are parallel threads are doing PTE changes on same range
> > + * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
> > + * flush by batching, a thread has stable TLB entry can fail to flush
> > + * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
> > + * forcefully if we detect parallel PTE batching threads.
> > + */
> > + bool force = mm_tlb_flush_nested(tlb->mm);
> > +
> > + arch_tlb_finish_mmu(tlb, start, end, force);
> > }
>
> I don't understand the comment nor the ordering. What guarantees we see
> the increment if we need to?
How about this about commenting part?
>From 05f06fd6aba14447a9ca2df8b810fbcf9a58e14b Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 14 Aug 2017 10:16:56 +0900
Subject: [PATCH] mm: add describable comment for TLB batch race
[1] is a rather subtle/complicated bug so that it's hard to
understand it with limited code comment.
This patch adds a sequence diagaram to explain the problem
more easily, I hope.
[1] 99baac21e458, mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
mm/memory.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/mm/memory.c b/mm/memory.c
index bcbe56f52163..f571b0eb9816 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -413,12 +413,37 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
void tlb_finish_mmu(struct mmu_gather *tlb,
unsigned long start, unsigned long end)
{
+
+
/*
* If there are parallel threads are doing PTE changes on same range
* under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
* flush by batching, a thread has stable TLB entry can fail to flush
* the TLB by observing pte_none|!pte_dirty, for example so flush TLB
* forcefully if we detect parallel PTE batching threads.
+ *
+ * Example: MADV_DONTNEED stale TLB problem on same range
+ *
+ * CPU 0 CPU 1
+ * *a = 1;
+ * MADV_DONTNEED
+ * MADV_DONTNEED tlb_gather_mmu
+ * tlb_gather_mmu
+ * down_read(mmap_sem) down_read(mmap_sem)
+ * pte_lock
+ * pte_get_and_clear
+ * tlb_remove_tlb_entry
+ * pte_unlock
+ * pte_lock
+ * found out the pte is none
+ * pte_unlock
+ * tlb_finish_mmu doesn't flush
+ *
+ * Access the address with stale TLB
+ * *a = 2;ie, success without segfault
+ * tlb_finish_mmu flush on range
+ * but it is too late.
+ *
*/
bool force = mm_tlb_flush_nested(tlb->mm);
--
2.7.4
next prev parent reply other threads:[~2017-08-14 1:26 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-02 0:08 [PATCH v6 0/7] fixes of TLB batching races Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-02 0:08 ` [PATCH v6 1/7] mm: migrate: prevent racy access to tlb_flush_pending Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-02 0:08 ` [PATCH v6 2/7] mm: migrate: fix barriers around tlb_flush_pending Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-02 0:08 ` [PATCH v6 3/7] Revert "mm: numa: defer TLB flush for THP migration as long as possible" Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-11 10:50 ` Peter Zijlstra
2017-08-11 10:50 ` Peter Zijlstra
2017-08-02 0:08 ` [PATCH v6 4/7] mm: refactoring TLB gathering API Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-11 9:23 ` Peter Zijlstra
2017-08-11 9:23 ` Peter Zijlstra
2017-08-11 17:12 ` Nadav Amit
2017-08-11 17:12 ` Nadav Amit
2017-08-14 0:49 ` Minchan Kim
2017-08-14 0:49 ` Minchan Kim
2017-08-02 0:08 ` [PATCH v6 5/7] mm: make tlb_flush_pending global Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-02 14:28 ` kbuild test robot
2017-08-02 14:28 ` kbuild test robot
2017-08-02 23:23 ` Minchan Kim
2017-08-02 23:23 ` Minchan Kim
2017-08-02 23:27 ` Andrew Morton
2017-08-02 23:27 ` Andrew Morton
2017-08-02 23:34 ` Minchan Kim
2017-08-02 23:34 ` Minchan Kim
2017-08-03 16:40 ` kbuild test robot
2017-08-03 16:40 ` kbuild test robot
2017-08-02 0:08 ` [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-08 1:19 ` [lkp-robot] [mm] 7674270022: will-it-scale.per_process_ops -19.3% regression kernel test robot
2017-08-08 1:19 ` kernel test robot
2017-08-08 1:19 ` kernel test robot
2017-08-08 1:19 ` kernel test robot
2017-08-08 2:28 ` Minchan Kim
2017-08-08 2:28 ` Minchan Kim
2017-08-08 2:28 ` Minchan Kim
2017-08-08 4:23 ` Nadav Amit
2017-08-08 4:23 ` Nadav Amit
2017-08-08 4:23 ` Nadav Amit
2017-08-08 5:51 ` Nadav Amit
2017-08-08 5:51 ` Nadav Amit
2017-08-08 5:51 ` Nadav Amit
2017-08-08 8:08 ` Minchan Kim
2017-08-08 8:08 ` Minchan Kim
2017-08-08 8:08 ` Minchan Kim
2017-08-08 8:08 ` Minchan Kim
2017-08-08 8:08 ` Minchan Kim
2017-08-08 8:16 ` Nadav Amit
2017-08-08 8:16 ` Nadav Amit
2017-08-09 1:25 ` Ye Xiaolong
2017-08-09 1:25 ` Ye Xiaolong
2017-08-09 1:25 ` Ye Xiaolong
2017-08-09 1:25 ` Ye Xiaolong
2017-08-09 2:59 ` Ye Xiaolong
2017-08-09 2:59 ` Ye Xiaolong
2017-08-09 2:59 ` Ye Xiaolong
2017-08-09 2:59 ` Ye Xiaolong
2017-08-09 2:59 ` Ye Xiaolong
2017-08-10 4:13 ` Minchan Kim
2017-08-10 4:13 ` Minchan Kim
2017-08-10 4:13 ` Minchan Kim
2017-08-10 4:14 ` Nadav Amit
2017-08-10 4:14 ` Nadav Amit
2017-08-10 4:14 ` Nadav Amit
2017-08-10 4:20 ` Minchan Kim
2017-08-10 4:20 ` Minchan Kim
2017-08-10 4:20 ` Minchan Kim
2017-08-11 13:30 ` [PATCH v6 6/7] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem Peter Zijlstra
2017-08-11 13:30 ` Peter Zijlstra
2017-08-13 6:14 ` Nadav Amit
2017-08-13 12:08 ` Peter Zijlstra
2017-08-13 12:08 ` Peter Zijlstra
2017-08-13 12:08 ` Peter Zijlstra
2017-08-14 1:26 ` Minchan Kim [this message]
2017-08-14 1:26 ` Minchan Kim
2017-08-14 1:26 ` Minchan Kim
2017-08-02 0:08 ` [PATCH v6 7/7] mm: fix KSM data corruption Nadav Amit
2017-08-02 0:08 ` Nadav Amit
2017-08-02 23:26 ` [PATCH v6 0/7] fixes of TLB batching races Minchan Kim
2017-08-02 23:26 ` Minchan Kim
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170814012617.GB25427@bbox \
--to=minchan@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=heiko.carstens@de.ibm.com \
--cc=jdike@addtoit.com \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux@armlinux.org.uk \
--cc=mingo@redhat.com \
--cc=nadav.amit@gmail.com \
--cc=namit@vmware.com \
--cc=peterz@infradead.org \
--cc=schwidefsky@de.ibm.com \
--cc=tony.luck@intel.com \
--cc=ysato@users.sourceforge.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.