linux-mm.kvack.org archive mirror
From: Rik van Riel <riel@redhat.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: X86 ML <x86@kernel.org>, Borislav Petkov <bpetkov@suse.de>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Nadav Amit <nadav.amit@gmail.com>,
	Andrew Banman <abanman@sgi.com>, Mike Travis <travis@sgi.com>,
	Dimitri Sivanich <sivanich@sgi.com>,
	Juergen Gross <jgross@suse.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>
Subject: Re: [RFC 05/11] x86/mm: Rework lazy TLB mode and TLB freshness tracking
Date: Tue, 06 Jun 2017 23:33:25 -0400	[thread overview]
Message-ID: <1496806405.29205.131.camel@redhat.com> (raw)
In-Reply-To: <CALCETrVX73+vHJMVYaddygEFj42oc3ShoUrXOm_s6CBwEP1peA@mail.gmail.com>

On Tue, 2017-06-06 at 14:34 -0700, Andy Lutomirski wrote:
> On Tue, Jun 6, 2017 at 12:11 PM, Rik van Riel <riel@redhat.com>
> wrote:
> > On Mon, 2017-06-05 at 15:36 -0700, Andy Lutomirski wrote:
> > 
> > > +++ b/arch/x86/include/asm/mmu_context.h
> > > @@ -122,8 +122,10 @@ static inline void switch_ldt(struct mm_struct *prev, struct mm_struct *next)
> > > 
> > >  static inline void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
> > >  {
> > > -	if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
> > > -		this_cpu_write(cpu_tlbstate.state, TLBSTATE_LAZY);
> > > +	int cpu = smp_processor_id();
> > > +
> > > +	if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
> > > +		cpumask_clear_cpu(cpu, mm_cpumask(mm));
> > >  }
> > 
> > This is an atomic write to a shared cacheline,
> > every time a CPU goes idle.
> > 
> > I am not sure you really want to do this, since
> > there are some workloads out there that have a
> > crazy number of threads, which go idle hundreds,
> > or even thousands of times a second, on dozens
> > of CPUs at a time. *cough*Java*cough*
> 
> It seems to me that the set of workloads on which this patch will
> hurt performance is rather limited.  We'd need an mm with a lot of
> threads, probably spread among a lot of nodes, that is constantly
> going idle and non-idle on multiple CPUs on the same node, where
> there's nothing else happening on those CPUs.

I am assuming the SPECjbb2015 benchmark is representative
of how some actual (albeit crazy) Java workloads behave.

> > Keeping track of the state in a CPU-local variable,
> > written with a non-atomic write, would be much more
> > CPU cache friendly here.
> 
> We could, but then handling remote flushes becomes more complicated.

I already wrote that code. It's not that hard.

> My inclination would be to keep the patch as is and, if this is
> actually a problem, think about solving it more generally.  The real
> issue is that we need a way to reasonably efficiently find the set of
> CPUs for which a given mm is currently loaded and non-lazy.  A simple
> improvement would be to split up mm_cpumask so that we'd have one
> cache line per node.  (And we'd presumably allow several mms to share
> the same pile of memory.)  Or we could go all out and use percpu
> state only and iterate over all online CPUs when flushing (ick!).
> Or something in between.

Reading per cpu state is relatively cheap. Writing is
more expensive, but that only needs to be done at TLB
flush time, and is much cheaper than sending an IPI.

Tasks going idle and waking back up seems to be a much
more common operation than doing a TLB flush. Having the
idle path being the more expensive one makes little sense
to me, but I may be overlooking something.


Thread overview: 31+ messages
2017-06-05 22:36 [RFC 00/11] PCID and improved laziness Andy Lutomirski
2017-06-05 22:36 ` [RFC 01/11] x86/ldt: Simplify LDT switching logic Andy Lutomirski
2017-06-05 22:40   ` Linus Torvalds
2017-06-05 22:44     ` Andy Lutomirski
2017-06-05 22:51     ` Linus Torvalds
2017-06-05 22:36 ` [RFC 02/11] x86/mm: Remove reset_lazy_tlbstate() Andy Lutomirski
2017-06-05 22:36 ` [RFC 03/11] x86/mm: Give each mm TLB flush generation a unique ID Andy Lutomirski
2017-06-05 22:36 ` [RFC 04/11] x86/mm: Track the TLB's tlb_gen and update the flushing algorithm Andy Lutomirski
2017-06-06  5:03   ` Nadav Amit
2017-06-06 22:45     ` Andy Lutomirski
2017-06-05 22:36 ` [RFC 05/11] x86/mm: Rework lazy TLB mode and TLB freshness tracking Andy Lutomirski
2017-06-06  1:39   ` Nadav Amit
2017-06-06 21:23     ` Andy Lutomirski
2017-06-06 19:11   ` Rik van Riel
2017-06-06 21:34     ` Andy Lutomirski
2017-06-07  3:33       ` Rik van Riel [this message]
2017-06-07  4:54         ` Andy Lutomirski
2017-06-07  5:11           ` Andy Lutomirski
2017-06-05 22:36 ` [RFC 06/11] x86/mm: Stop calling leave_mm() in idle code Andy Lutomirski
2017-06-05 22:36 ` [RFC 07/11] x86/mm: Disable PCID on 32-bit kernels Andy Lutomirski
2017-06-05 22:36 ` [RFC 08/11] x86/mm: Add nopcid to turn off PCID Andy Lutomirski
2017-06-06  3:22   ` Andi Kleen
2017-06-14  4:52     ` Andy Lutomirski
2017-06-14  9:51       ` Borislav Petkov
2017-06-05 22:36 ` [RFC 09/11] x86/mm: Teach CR3 readers about PCID Andy Lutomirski
2017-06-05 22:36 ` [RFC 10/11] x86/mm: Enable CR4.PCIDE on supported systems Andy Lutomirski
2017-06-06 21:31   ` Boris Ostrovsky
2017-06-06 21:35     ` Andy Lutomirski
2017-06-06 21:48       ` Boris Ostrovsky
2017-06-06 21:54         ` Andy Lutomirski
2017-06-05 22:36 ` [RFC 11/11] x86/mm: Try to preserve old TLB entries using PCID Andy Lutomirski
