All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Arnd Bergmann <arnd@arndb.de>,
	linux-arch@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-mm@kvack.org, Anton Blanchard <anton@ozlabs.org>
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
Date: Wed, 2 Dec 2020 12:17:31 +0100	[thread overview]
Message-ID: <20201202111731.GA2414@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20201128160141.1003903-7-npiggin@gmail.com>

On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
> +static void shoot_lazy_tlbs(struct mm_struct *mm)
> +{
> +	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
> +		/*
> +		 * IPI overheads have not found to be expensive, but they could
> +		 * be reduced in a number of possible ways, for example (in
> +		 * roughly increasing order of complexity):
> +		 * - A batch of mms requiring IPIs could be gathered and freed
> +		 *   at once.
> +		 * - CPUs could store their active mm somewhere that can be
> +		 *   remotely checked without a lock, to filter out
> +		 *   false-positives in the cpumask.
> +		 * - After mm_users or mm_count reaches zero, switching away
> +		 *   from the mm could clear mm_cpumask to reduce some IPIs
> +		 *   (some batching or delaying would help).
> +		 * - A delayed freeing and RCU-like quiescing sequence based on
> +		 *   mm switching to avoid IPIs completely.
> +		 */
> +		on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
> +		if (IS_ENABLED(CONFIG_DEBUG_VM))
> +			on_each_cpu(do_check_lazy_tlb, (void *)mm, 1);

So the obvious 'improvement' here would be something like:

	for_each_online_cpu(cpu) {
		p = rcu_dereference(cpu_rq(cpu)->curr;
		if (p->active_mm != mm)
			continue;
		__cpumask_set_cpu(cpu, tmpmask);
	}
	on_each_cpu_mask(tmpmask, ...);

The remote CPU will never switch _to_ @mm, on account of it being quite
dead, but it is quite prone to false negatives.

Consider that __schedule() sets rq->curr *before* context_switch(), this
means we'll see next->active_mm, even though prev->active_mm might still
be our @mm.

Now, because we'll be removing the atomic ops from context_switch()'s
active_mm swizzling, I think we can change this to something like the
below. The hope being that the cost of the new barrier can be offset by
the loss of the atomics.

Hmm ?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41404afb7f4c..2597c5c0ccb0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4509,7 +4509,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	if (!next->mm) {                                // to kernel
 		enter_lazy_tlb(prev->active_mm, next);
 
-		next->active_mm = prev->active_mm;
 		if (prev->mm)                           // from user
 			mmgrab(prev->active_mm);
 		else
@@ -4524,6 +4523,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		 * case 'prev->active_mm == next->mm' through
 		 * finish_task_switch()'s mmdrop().
 		 */
+		next->active_mm = next->mm;
 		switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
 		if (!prev->mm) {                        // from kernel
@@ -5713,11 +5713,9 @@ static void __sched notrace __schedule(bool preempt)
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
-		/*
-		 * RCU users of rcu_dereference(rq->curr) may not see
-		 * changes to task_struct made by pick_next_task().
-		 */
-		RCU_INIT_POINTER(rq->curr, next);
+
+		next->active_mm = prev->active_mm;
+		rcu_assign_pointer(rq->curr, next);
 		/*
 		 * The membarrier system call requires each architecture
 		 * to have a full memory barrier after updating

WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: linux-arch@vger.kernel.org, Arnd Bergmann <arnd@arndb.de>,
	x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
Date: Wed, 2 Dec 2020 12:17:31 +0100	[thread overview]
Message-ID: <20201202111731.GA2414@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <20201128160141.1003903-7-npiggin@gmail.com>

On Sun, Nov 29, 2020 at 02:01:39AM +1000, Nicholas Piggin wrote:
> +static void shoot_lazy_tlbs(struct mm_struct *mm)
> +{
> +	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
> +		/*
> +		 * IPI overheads have not found to be expensive, but they could
> +		 * be reduced in a number of possible ways, for example (in
> +		 * roughly increasing order of complexity):
> +		 * - A batch of mms requiring IPIs could be gathered and freed
> +		 *   at once.
> +		 * - CPUs could store their active mm somewhere that can be
> +		 *   remotely checked without a lock, to filter out
> +		 *   false-positives in the cpumask.
> +		 * - After mm_users or mm_count reaches zero, switching away
> +		 *   from the mm could clear mm_cpumask to reduce some IPIs
> +		 *   (some batching or delaying would help).
> +		 * - A delayed freeing and RCU-like quiescing sequence based on
> +		 *   mm switching to avoid IPIs completely.
> +		 */
> +		on_each_cpu_mask(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
> +		if (IS_ENABLED(CONFIG_DEBUG_VM))
> +			on_each_cpu(do_check_lazy_tlb, (void *)mm, 1);

So the obvious 'improvement' here would be something like:

	for_each_online_cpu(cpu) {
		p = rcu_dereference(cpu_rq(cpu)->curr;
		if (p->active_mm != mm)
			continue;
		__cpumask_set_cpu(cpu, tmpmask);
	}
	on_each_cpu_mask(tmpmask, ...);

The remote CPU will never switch _to_ @mm, on account of it being quite
dead, but it is quite prone to false negatives.

Consider that __schedule() sets rq->curr *before* context_switch(), this
means we'll see next->active_mm, even though prev->active_mm might still
be our @mm.

Now, because we'll be removing the atomic ops from context_switch()'s
active_mm swizzling, I think we can change this to something like the
below. The hope being that the cost of the new barrier can be offset by
the loss of the atomics.

Hmm ?

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 41404afb7f4c..2597c5c0ccb0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4509,7 +4509,6 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	if (!next->mm) {                                // to kernel
 		enter_lazy_tlb(prev->active_mm, next);
 
-		next->active_mm = prev->active_mm;
 		if (prev->mm)                           // from user
 			mmgrab(prev->active_mm);
 		else
@@ -4524,6 +4523,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		 * case 'prev->active_mm == next->mm' through
 		 * finish_task_switch()'s mmdrop().
 		 */
+		next->active_mm = next->mm;
 		switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
 		if (!prev->mm) {                        // from kernel
@@ -5713,11 +5713,9 @@ static void __sched notrace __schedule(bool preempt)
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
-		/*
-		 * RCU users of rcu_dereference(rq->curr) may not see
-		 * changes to task_struct made by pick_next_task().
-		 */
-		RCU_INIT_POINTER(rq->curr, next);
+
+		next->active_mm = prev->active_mm;
+		rcu_assign_pointer(rq->curr, next);
 		/*
 		 * The membarrier system call requires each architecture
 		 * to have a full memory barrier after updating

  parent reply	other threads:[~2020-12-02 11:18 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-28 16:01 [PATCH 0/8] shoot lazy tlbs Nicholas Piggin
2020-11-28 16:01 ` Nicholas Piggin
2020-11-28 16:01 ` [PATCH 1/8] lazy tlb: introduce exit_lazy_tlb Nicholas Piggin
2020-11-28 16:01   ` Nicholas Piggin
2020-11-29  0:38   ` Andy Lutomirski
2020-11-29  0:38     ` Andy Lutomirski
2020-12-02  2:49     ` Nicholas Piggin
2020-12-02  2:49       ` Nicholas Piggin
2020-11-28 16:01 ` [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode Nicholas Piggin
2020-11-28 16:01   ` Nicholas Piggin
2020-11-28 17:55   ` Andy Lutomirski
2020-11-28 17:55     ` Andy Lutomirski
2020-12-02  2:49     ` Nicholas Piggin
2020-12-02  2:49       ` Nicholas Piggin
2020-12-03  5:09       ` Andy Lutomirski
2020-12-03  5:09         ` Andy Lutomirski
2020-12-05  8:00         ` Nicholas Piggin
2020-12-05  8:00           ` Nicholas Piggin
2020-12-05 16:11           ` Andy Lutomirski
2020-12-05 16:11             ` Andy Lutomirski
2020-12-05 23:14             ` Nicholas Piggin
2020-12-05 23:14               ` Nicholas Piggin
2020-12-06  0:36               ` Andy Lutomirski
2020-12-06  0:36                 ` Andy Lutomirski
2020-12-06  3:59                 ` Nicholas Piggin
2020-12-06  3:59                   ` Nicholas Piggin
2020-12-11  0:11                   ` Andy Lutomirski
2020-12-11  0:11                     ` Andy Lutomirski
2020-12-14  4:07                     ` Nicholas Piggin
2020-12-14  4:07                       ` Nicholas Piggin
2020-12-14  5:53                       ` Nicholas Piggin
2020-12-14  5:53                         ` Nicholas Piggin
2020-11-30 14:57   ` Mathieu Desnoyers
2020-11-30 14:57     ` Mathieu Desnoyers
2020-11-28 16:01 ` [PATCH 3/8] x86: remove ARCH_HAS_SYNC_CORE_BEFORE_USERMODE Nicholas Piggin
2020-11-28 16:01   ` Nicholas Piggin
2020-11-28 16:01 ` [PATCH 4/8] lazy tlb: introduce lazy mm refcount helper functions Nicholas Piggin
2020-11-28 16:01   ` Nicholas Piggin
2020-11-28 16:01 ` [PATCH 5/8] lazy tlb: allow lazy tlb mm switching to be configurable Nicholas Piggin
2020-11-28 16:01   ` Nicholas Piggin
2020-11-29  0:36   ` Andy Lutomirski
2020-11-29  0:36     ` Andy Lutomirski
2020-12-02  2:49     ` Nicholas Piggin
2020-12-02  2:49       ` Nicholas Piggin
2020-11-28 16:01 ` [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option Nicholas Piggin
2020-11-28 16:01   ` Nicholas Piggin
2020-11-29  3:54   ` Andy Lutomirski
2020-11-29  3:54     ` Andy Lutomirski
2020-11-29 20:16     ` Andy Lutomirski
2020-11-29 20:16       ` Andy Lutomirski
2020-11-30  9:25       ` Peter Zijlstra
2020-11-30  9:25         ` Peter Zijlstra
2020-11-30 18:31       ` Andy Lutomirski
2020-11-30 18:31         ` Andy Lutomirski
2020-12-01 21:27         ` Will Deacon
2020-12-01 21:27           ` Will Deacon
2020-12-01 21:50           ` Andy Lutomirski
2020-12-01 21:50             ` Andy Lutomirski
2020-12-01 23:04             ` Will Deacon
2020-12-01 23:04               ` Will Deacon
2020-12-02  3:47         ` Nicholas Piggin
2020-12-02  3:47           ` Nicholas Piggin
2020-12-03  5:05           ` Andy Lutomirski
2020-12-03  5:05             ` Andy Lutomirski
2020-12-03 17:03         ` Alexander Gordeev
2020-12-03 17:03           ` Alexander Gordeev
2020-12-03 17:14           ` Andy Lutomirski
2020-12-03 17:14             ` Andy Lutomirski
2020-12-03 18:33             ` Alexander Gordeev
2020-12-03 18:33               ` Alexander Gordeev
2020-11-30  9:26     ` Peter Zijlstra
2020-11-30  9:26       ` Peter Zijlstra
2020-11-30  9:30     ` Peter Zijlstra
2020-11-30  9:30       ` Peter Zijlstra
2020-11-30  9:34       ` Peter Zijlstra
2020-11-30  9:34         ` Peter Zijlstra
2020-12-02  3:09     ` Nicholas Piggin
2020-12-02  3:09       ` Nicholas Piggin
2020-12-02 11:17   ` Peter Zijlstra [this message]
2020-12-02 11:17     ` Peter Zijlstra
2020-12-02 12:45     ` Peter Zijlstra
2020-12-02 12:45       ` Peter Zijlstra
2020-12-02 14:19   ` Peter Zijlstra
2020-12-02 14:19     ` Peter Zijlstra
2020-12-02 14:38     ` Andy Lutomirski
2020-12-02 14:38       ` Andy Lutomirski
2020-12-02 16:29       ` Peter Zijlstra
2020-12-02 16:29         ` Peter Zijlstra
2020-11-28 16:01 ` [PATCH 7/8] powerpc: use lazy mm refcount helper functions Nicholas Piggin
2020-11-28 16:01   ` Nicholas Piggin
2020-11-28 16:01 ` [PATCH 8/8] powerpc/64s: enable MMU_LAZY_TLB_SHOOTDOWN Nicholas Piggin
2020-11-28 16:01   ` Nicholas Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201202111731.GA2414@hirez.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=anton@ozlabs.org \
    --cc=arnd@arndb.de \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=npiggin@gmail.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.