Re: TLB and PTE coherency during munmap

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Peter Zijlstra <peterz@infradead.org>
To: Max Filippov <jcmvbkbc@gmail.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	Ralf Baechle <ralf@linux-mips.org>,
	Chris Zankel <chris@zankel.net>,
	Marc Gauthier <Marc.Gauthier@tensilica.com>,
	linux-xtensa@linux-xtensa.org, Hugh Dickins <hughd@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: TLB and PTE coherency during munmap
Date: Mon, 3 Jun 2013 11:05:01 +0200	[thread overview]
Message-ID: <20130603090501.GI5910@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <CAMo8BfJtkEtf9RKsGRnOnZ5zbJQz5tW4HeDfydFq_ZnrFr8opw@mail.gmail.com>

On Fri, May 31, 2013 at 08:09:17AM +0400, Max Filippov wrote:
> Hi Peter,
> 
> On Wed, May 29, 2013 at 9:51 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > What about something like this?
> 
> With that patch I still get mtest05 firing my TLB/PTE incoherency check
> in the UP PREEMPT_VOLUNTARY configuration. This happens after
> zap_pte_range completion in the end of unmap_region because of
> rescheduling called in the following call chain:

OK, so there two options; completely kill off fast-mode or something like the
below where we add magic to the scheduler :/

I'm aware people might object to something like the below -- but since its a
possibility I thought we ought to at least mention it.

For those new to the thread; the problem is that since the introduction of
preemptible mmu_gather the traditional UP fast-mode is broken. Fast-mode is
where we free the pages first and flush TLBs later. This is not a problem if
there's no concurrency, but obviously if you can preempt there now is.

I think I prefer completely killing off fast-mode esp. since UP seems to go the
way of the Dodo and it does away with an exception in the mmu_gather code.

Anyway; opinions? Linus, Thomas, Ingo?

---
nclude/asm-generic/tlb.h | 19 +++++++++++++++----
 include/linux/sched.h     |  1 +
 kernel/sched/core.c       |  9 +++++++++
 mm/memory.c               |  3 +++
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b1b1fa6..8d84154 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -116,15 +116,26 @@ struct mmu_gather {
 
 static inline int tlb_fast_mode(struct mmu_gather *tlb)
 {
+#ifdef CONFIG_PREEMPT
+	/*
+	 * We don't want to add to the schedule fast path for preemptible
+	 * kernels; disable fast mode unconditionally.
+	 */
+	return 0;
+#endif
+
 #ifdef CONFIG_SMP
+	/*
+	 * We can only use fast mode if there's a single CPU online;
+	 * otherwise SMP might trip over stale TLB entries.
+	 */
 	return tlb->fast_mode;
-#else
+#endif
+
 	/*
-	 * For UP we don't need to worry about TLB flush
-	 * and page free order so much..
+	 * Non-preempt UP can do fast mode unconditionally.
 	 */
 	return 1;
-#endif
 }
 
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 178a8d9..3dc6930 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1416,6 +1416,7 @@ struct task_struct {
 	unsigned int	sequential_io;
 	unsigned int	sequential_io_avg;
 #endif
+	struct mmu_gather *tlb;
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 36f85be..6829b78 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2374,6 +2374,15 @@ static void __sched __schedule(void)
 	struct rq *rq;
 	int cpu;
 
+#ifndef CONFIG_PREEMPT
+	/*
+	 * We always force batched mmu_gather for preemptible kernels in order
+	 * to minimize scheduling delays. See tlb_fast_mode().
+	 */
+	if (current->tlb && tlb_fast_mode(current->tlb))
+		tlb_flush_mmu(current->tlb);
+#endif
+
 need_resched:
 	preempt_disable();
 	cpu = smp_processor_id();
diff --git a/mm/memory.c b/mm/memory.c
index d7d54a1..8925578 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -230,6 +230,7 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm)
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;
 #endif
+	current->tlb = tlb;
 }
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
@@ -274,6 +275,8 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long e
 		free_pages((unsigned long)batch, 0);
 	}
 	tlb->local.next = NULL;
+
+	current->tlb = NULL;
 }
 
 /* __tlb_remove_page

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)

From: Peter Zijlstra <peterz@infradead.org>
To: Max Filippov <jcmvbkbc@gmail.com>
Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-arch@vger.kernel.org, linux-mm@kvack.org,
	Ralf Baechle <ralf@linux-mips.org>,
	Chris Zankel <chris@zankel.net>,
	Marc Gauthier <Marc.Gauthier@tensilica.com>,
	linux-xtensa@linux-xtensa.org, Hugh Dickins <hughd@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Ingo Molnar <mingo@kernel.org>
Subject: Re: TLB and PTE coherency during munmap
Date: Mon, 3 Jun 2013 11:05:01 +0200	[thread overview]
Message-ID: <20130603090501.GI5910@twins.programming.kicks-ass.net> (raw)
Message-ID: <20130603090501.0NbCoh2qDf3xucYQjhWtb7WqoXsK172LSvv5LyJbdcU@z> (raw)
In-Reply-To: <CAMo8BfJtkEtf9RKsGRnOnZ5zbJQz5tW4HeDfydFq_ZnrFr8opw@mail.gmail.com>

On Fri, May 31, 2013 at 08:09:17AM +0400, Max Filippov wrote:
> Hi Peter,
> 
> On Wed, May 29, 2013 at 9:51 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > What about something like this?
> 
> With that patch I still get mtest05 firing my TLB/PTE incoherency check
> in the UP PREEMPT_VOLUNTARY configuration. This happens after
> zap_pte_range completion in the end of unmap_region because of
> rescheduling called in the following call chain:

OK, so there two options; completely kill off fast-mode or something like the
below where we add magic to the scheduler :/

I'm aware people might object to something like the below -- but since its a
possibility I thought we ought to at least mention it.

For those new to the thread; the problem is that since the introduction of
preemptible mmu_gather the traditional UP fast-mode is broken. Fast-mode is
where we free the pages first and flush TLBs later. This is not a problem if
there's no concurrency, but obviously if you can preempt there now is.

I think I prefer completely killing off fast-mode esp. since UP seems to go the
way of the Dodo and it does away with an exception in the mmu_gather code.

Anyway; opinions? Linus, Thomas, Ingo?

---
nclude/asm-generic/tlb.h | 19 +++++++++++++++----
 include/linux/sched.h     |  1 +
 kernel/sched/core.c       |  9 +++++++++
 mm/memory.c               |  3 +++
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b1b1fa6..8d84154 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -116,15 +116,26 @@ struct mmu_gather {
 
 static inline int tlb_fast_mode(struct mmu_gather *tlb)
 {
+#ifdef CONFIG_PREEMPT
+	/*
+	 * We don't want to add to the schedule fast path for preemptible
+	 * kernels; disable fast mode unconditionally.
+	 */
+	return 0;
+#endif
+
 #ifdef CONFIG_SMP
+	/*
+	 * We can only use fast mode if there's a single CPU online;
+	 * otherwise SMP might trip over stale TLB entries.
+	 */
 	return tlb->fast_mode;
-#else
+#endif
+
 	/*
-	 * For UP we don't need to worry about TLB flush
-	 * and page free order so much..
+	 * Non-preempt UP can do fast mode unconditionally.
 	 */
 	return 1;
-#endif
 }
 
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 178a8d9..3dc6930 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1416,6 +1416,7 @@ struct task_struct {
 	unsigned int	sequential_io;
 	unsigned int	sequential_io_avg;
 #endif
+	struct mmu_gather *tlb;
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 36f85be..6829b78 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2374,6 +2374,15 @@ static void __sched __schedule(void)
 	struct rq *rq;
 	int cpu;
 
+#ifndef CONFIG_PREEMPT
+	/*
+	 * We always force batched mmu_gather for preemptible kernels in order
+	 * to minimize scheduling delays. See tlb_fast_mode().
+	 */
+	if (current->tlb && tlb_fast_mode(current->tlb))
+		tlb_flush_mmu(current->tlb);
+#endif
+
 need_resched:
 	preempt_disable();
 	cpu = smp_processor_id();
diff --git a/mm/memory.c b/mm/memory.c
index d7d54a1..8925578 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -230,6 +230,7 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm)
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;
 #endif
+	current->tlb = tlb;
 }
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
@@ -274,6 +275,8 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long e
 		free_pages((unsigned long)batch, 0);
 	}
 	tlb->local.next = NULL;
+
+	current->tlb = NULL;
 }
 
 /* __tlb_remove_page

next prev parent reply	other threads:[~2013-06-03  9:05 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-26  2:42 TLB and PTE coherency during munmap Max Filippov
2013-05-26  2:50 ` Max Filippov
2013-05-26  2:50   ` Max Filippov
2013-05-28  7:10   ` Max Filippov
2013-05-28  7:10     ` Max Filippov
2013-05-29 12:27     ` Peter Zijlstra
2013-05-29 12:27       ` Peter Zijlstra
2013-05-29 12:42       ` Vineet Gupta
2013-05-29 12:42         ` Vineet Gupta
2013-05-29 12:47         ` Peter Zijlstra
2013-05-29 12:47           ` Peter Zijlstra
2013-05-29 17:51         ` Peter Zijlstra
2013-05-29 17:51           ` Peter Zijlstra
2013-05-29 22:04           ` Catalin Marinas
2013-05-29 22:04             ` Catalin Marinas
2013-05-30  6:48             ` Peter Zijlstra
2013-05-30  6:48               ` Peter Zijlstra
2013-05-30  5:04           ` Vineet Gupta
2013-05-30  5:04             ` Vineet Gupta
2013-05-30  6:56             ` Peter Zijlstra
2013-05-30  6:56               ` Peter Zijlstra
2013-05-30  7:00               ` Vineet Gupta
2013-05-30  7:00                 ` Vineet Gupta
2013-05-30 11:03                 ` Peter Zijlstra
2013-05-30 11:03                   ` Peter Zijlstra
2013-05-31  4:09           ` Max Filippov
2013-05-31  4:09             ` Max Filippov
2013-05-31  7:55             ` Peter Zijlstra
2013-05-31  7:55               ` Peter Zijlstra
2013-06-03  9:05             ` Peter Zijlstra [this message]
2013-06-03  9:05               ` Peter Zijlstra
2013-06-03  9:16               ` Ingo Molnar
2013-06-03  9:16                 ` Ingo Molnar
2013-06-03 10:01                 ` Catalin Marinas
2013-06-03 10:01                   ` Catalin Marinas
2013-06-03 10:04                   ` Peter Zijlstra
2013-06-03 10:04                     ` Peter Zijlstra
2013-06-03 10:09                     ` Catalin Marinas
2013-06-03 10:09                       ` Catalin Marinas
2013-06-04  9:52               ` Peter Zijlstra
2013-06-04  9:52                 ` Peter Zijlstra
2013-06-05  0:05                 ` Linus Torvalds
2013-06-05  0:05                   ` Linus Torvalds
2013-06-05 10:26                   ` [PATCH] arch, mm: Remove tlb_fast_mode() Peter Zijlstra
2013-06-05 10:26                     ` Peter Zijlstra
2013-05-31  1:40       ` TLB and PTE coherency during munmap Max Filippov
2013-05-31  1:40         ` Max Filippov
2013-05-28 14:34   ` Konrad Rzeszutek Wilk
2013-05-28 14:34     ` Konrad Rzeszutek Wilk
2013-05-29  3:23     ` Max Filippov
2013-05-29  3:23       ` Max Filippov
2013-05-28 15:16   ` Michal Hocko
2013-05-28 15:16     ` Michal Hocko
2013-05-28 15:23     ` Catalin Marinas
2013-05-28 15:23       ` Catalin Marinas
2013-05-28 14:35 ` Catalin Marinas
2013-05-29  4:15   ` Max Filippov
2013-05-29  4:15     ` Max Filippov
2013-05-29 10:15     ` Catalin Marinas
2013-05-29 10:15       ` Catalin Marinas
2013-05-31  1:26       ` Max Filippov
2013-05-31  1:26         ` Max Filippov
2013-05-31  9:06         ` Catalin Marinas
2013-05-31  9:06           ` Catalin Marinas
2013-06-03  9:16         ` Max Filippov
2013-06-03  9:16           ` Max Filippov
2013-05-29 11:53   ` Vineet Gupta
2013-05-29 12:00   ` Vineet Gupta
2013-05-29 12:00     ` Vineet Gupta
  -- strict thread matches above, loose matches on Subject: below --
2013-06-07  2:21 George Spelvin

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:b1b1fa6 dfblob:8d84154 dfblob:178a8d9 dfblob:3dc6930
dfblob:36f85be dfblob:6829b78 dfblob:d7d54a1 dfblob:8925578
dfblob:b1b1fa6 dfblob:8d84154 dfblob:178a8d9 dfblob:3dc6930
dfblob:36f85be dfblob:6829b78 dfblob:d7d54a1 dfblob:8925578 )
 OR (
bs:"Re: TLB and PTE coherency during munmap" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130603090501.GI5910@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=Marc.Gauthier@tensilica.com \
    --cc=Vineet.Gupta1@synopsys.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris@zankel.net \
    --cc=hughd@google.com \
    --cc=jcmvbkbc@gmail.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xtensa@linux-xtensa.org \
    --cc=mingo@kernel.org \
    --cc=ralf@linux-mips.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.