From: Ingo Molnar <mingo@kernel.org>
To: Rik van Riel <riel@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
serebrin@google.com, peterz@infradead.org,
linux-kernel@vger.kernel.org, luto@kernel.org, bp@suse.de,
mgorman@suse.de, tglx@linutronix.de,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH RFC v3] x86,mm,sched: make lazy TLB mode even lazier
Date: Sat, 27 Aug 2016 10:03:16 +0200 [thread overview]
Message-ID: <20160827080316.GA11325@gmail.com> (raw)
In-Reply-To: <20160825170133.0a783ae8@riellap.home.surriel.com>
* Rik van Riel <riel@redhat.com> wrote:
> On Thu, 25 Aug 2016 12:42:15 -0700
> "H. Peter Anvin" <hpa@zytor.com> wrote:
>
> > Why grabbing a lock instead of cmpxchg?
>
> ... and some more cleanups later, this might actually be
> good to merge, assuming it works for Benjamin :)
>
> ---8<---
LGTM in principle (it's a pretty clever trick!), just some minor stylistic nits:
>
> Subject: x86,mm,sched: make lazy TLB mode even lazier
>
> Lazy TLB mode can result in an idle CPU being woken up for a TLB
> flush, when all it really needed to do was flush %cr3 before the
> next context switch.
s/%cr3/CR3
> This is mostly fine on bare metal, though sub-optimal from a power
> saving point of view, and deeper C states could make TLB flushes
> take a little longer than desired.
s/C state/C-state/
> On virtual machines, the pain can be much worse, especially if a
> currently non-running VCPU is woken up for a TLB invalidation
> IPI, on a CPU that is busy running another task. It could take
> a while before that IPI is handled, leading to performance issues.
So I was going to suggest:
s/VCPU/vCPU
But then, out of curiosity, I ran:
triton:~/tip> for N in $(git log v3.0.. | grep -i vcpu | sed 's/[^a-zA-Z\d\s]/ /g'); do echo $N; done | grep -iw vcpu | sort | uniq -c | sort -n
3 Vcpu
125 vCPU
527 VCPU
1611 vcpu
So never mind, I guess we missed that boat by a few years already ;-)
>
> This patch deals with the issue by introducing a third tlb state,
> TLBSTATE_FLUSH, which causes %cr3 to be flushed at the next
> context switch.
s/tlb state/TLB state
>
> A CPU that transitions from TLBSTATE_LAZY to TLBSTATE_OK during
> the attempted transition to TLBSTATE_FLUSH will get a TLB flush
> IPI, just like a CPU that was in TLBSTATE_OK to begin with.
>
> Nothing is done for a CPU that is already in TLBSTATE_FLUSH mode.
>
> This patch is totally untested, because I am at a conference right
> now, and Benjamin has the test case :)
>
> Benjamin, does this help your issue?
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Reported-by: Benjamin Serebrin <serebrin@google.com>
> ---
> arch/x86/include/asm/tlbflush.h | 1 +
> arch/x86/mm/tlb.c | 47 ++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 45 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 4e5be94e079a..5ae8e4b174f8 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -310,6 +310,7 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
>
> #define TLBSTATE_OK 1
> #define TLBSTATE_LAZY 2
> +#define TLBSTATE_FLUSH 3
>
> static inline void reset_lazy_tlbstate(void)
> {
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 5643fd0b1a7d..4352db65a129 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -140,10 +140,12 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> }
> #ifdef CONFIG_SMP
> else {
> + int oldstate = this_cpu_read(cpu_tlbstate.state);
> this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
Please separate local variable definition blocks from statements by an extra
newline.
> BUG_ON(this_cpu_read(cpu_tlbstate.active_mm) != next);
>
> - if (!cpumask_test_cpu(cpu, mm_cpumask(next))) {
> + if (oldstate == TLBSTATE_FLUSH ||
> + !cpumask_test_cpu(cpu, mm_cpumask(next))) {
> /*
> * On established mms, the mm_cpumask is only changed
> * from irq context, from ptep_clear_flush() while in
> @@ -242,11 +244,42 @@ static void flush_tlb_func(void *info)
>
> }
>
> +/*
> + * Determine whether a CPU's TLB needs to be flushed now, or whether the
> + * flush can be delayed until the next context switch, by changing the
> + * tlbstate from TLBSTATE_LAZY to TLBSTATE_FLUSH.
> + */
> +static bool lazy_tlb_can_skip_flush(int cpu) {
> + int *tlbstate = &per_cpu(cpu_tlbstate.state, cpu);
> + int old;
Yeah, so this is not how the standard function definition looks like.
> + /* A task on the CPU is actively using the mm. Flush the TLB. */
> + if (*tlbstate == TLBSTATE_OK)
> + return false;
> +
> + /* The TLB will be flushed on the next context switch. */
> + if (*tlbstate == TLBSTATE_FLUSH)
> + return true;
> +
> + /*
> + * The CPU is in TLBSTATE_LAZY, which could context switch back
> + * to TLBSTATE_OK, re-using the TLB state without a TLB flush.
> + * In that case, a TLB flush IPI needs to be sent.
> + *
> + * Otherwise, the TLB state is now TLBSTATE_FLUSH, and the
> + * TLB flush IPI can be skipped.
> + */
> + old = cmpxchg(tlbstate, TLBSTATE_LAZY, TLBSTATE_FLUSH);
> +
> + return old != TLBSTATE_OK;
> +}
> +
> void native_flush_tlb_others(const struct cpumask *cpumask,
> struct mm_struct *mm, unsigned long start,
> unsigned long end)
> {
> struct flush_tlb_info info;
> + unsigned int cpu;
>
> if (end == 0)
> end = start + PAGE_SIZE;
> @@ -262,8 +295,6 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
> (end - start) >> PAGE_SHIFT);
>
> if (is_uv_system()) {
> - unsigned int cpu;
> -
> cpu = smp_processor_id();
> cpumask = uv_flush_tlb_others(cpumask, mm, start, end, cpu);
> if (cpumask)
> @@ -271,6 +302,16 @@ void native_flush_tlb_others(const struct cpumask *cpumask,
> &info, 1);
> return;
> }
> +
> + /*
> + * Instead of sending IPIs to CPUs in lazy TLB mode, move that
> + * CPUs TLB state to TLBSTATE_FLUSH, causing the TLB to be flushed
> + * at the next context switch.
s/CPUs/CPU's
> + */
> + for_each_cpu(cpu, cpumask)
> + if (lazy_tlb_can_skip_flush(cpu))
> + cpumask_clear_cpu(cpu, (struct cpumask *)cpumask);
Please remove the 'const' from the cpumask type definition instead of this ugly
cast!
Plus this needs one more pair of curly braces and it's perfect! ;-)
I'd also like to wait for the Tested-by from Benjamin as well before we can
proceeed.
Thanks,
Ingo
next prev parent reply other threads:[~2016-08-27 8:03 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-08-25 19:04 [PATCH RFC UGLY] x86,mm,sched: make lazy TLB mode even lazier Rik van Riel
2016-08-25 19:42 ` H. Peter Anvin
2016-08-25 19:52 ` Rik van Riel
2016-08-25 20:15 ` [PATCH RFC v2] " Rik van Riel
2016-08-25 21:01 ` [PATCH RFC v3] " Rik van Riel
2016-08-27 8:03 ` Ingo Molnar [this message]
2016-08-27 23:02 ` Linus Torvalds
2016-08-30 19:53 ` [PATCH RFC v4] " Rik van Riel
2016-08-30 21:09 ` [PATCH RFC v5] " Rik van Riel
2016-08-31 16:27 ` [PATCH RFC v6] " Rik van Riel
2016-09-08 6:56 ` Ingo Molnar
2016-09-09 0:09 ` Benjamin Serebrin
2016-09-09 4:39 ` Andy Lutomirski
2016-09-09 7:44 ` Peter Zijlstra
2017-04-25 3:30 ` Andy Lutomirski
2016-08-29 15:24 ` [PATCH RFC v3] " Rik van Riel
2016-08-29 16:08 ` [PATCH RFC UGLY] " Rik van Riel
2016-08-28 8:11 ` Andy Lutomirski
2016-08-29 14:54 ` Rik van Riel
2016-08-29 23:55 ` Andy Lutomirski
2016-08-30 1:14 ` H. Peter Anvin
2016-08-30 18:23 ` Andy Lutomirski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160827080316.GA11325@gmail.com \
--to=mingo@kernel.org \
--cc=bp@suse.de \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mgorman@suse.de \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=serebrin@google.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox