All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will.deacon@arm.com>
To: Andy Lutomirski <luto@kernel.org>
Cc: Rik van Riel <riel@surriel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Juergen Gross <jgross@suse.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	linux-arch <linux-arch@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-s390@vger.kernel.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	LKML <linux-kernel@vger.kernel.org>, X86 ML <x86@kernel.org>,
	Mike Galbraith <efault@gmx.de>, kernel-team <kernel-team@fb.com>,
	Ingo Molnar <mingo@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>
Subject: Re: [PATCH 4/7] x86,tlb: make lazy TLB mode lazier
Date: Tue, 24 Jul 2018 17:33:30 +0100	[thread overview]
Message-ID: <20180724163330.GC25888@arm.com> (raw)
In-Reply-To: <CALCETrXLMsSBChDvrms-omwYV4LHT30GenDjbnD-+LTg55yPow@mail.gmail.com>

Hi Andy,

Sorry, I missed the arm64 question at the end of this...

On Thu, Jul 19, 2018 at 10:04:09AM -0700, Andy Lutomirski wrote:
> On Thu, Jul 19, 2018 at 9:45 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> > [I added PeterZ and Vitaly -- can you see any way in which this would
> > break something obscure?  I don't.]
> >
> > On Thu, Jul 19, 2018 at 7:14 AM, Rik van Riel <riel@surriel.com> wrote:
> >> I guess we can skip both switch_ldt and load_mm_cr4 if real_prev equals
> >> next?
> >
> > Yes, AFAICS.
> >
> >>
> >> On to the lazy TLB mm_struct refcounting stuff :)
> >>
> >>>
> >>> Which refcount?  mm_users shouldn’t be hot, so I assume you’re talking about
> >>> mm_count. My suggestion is to get rid of mm_count instead of trying to
> >>> optimize it.
> >>
> >>
> >> Do you have any suggestions on how? :)
> >>
> >> The TLB shootdown sent at __exit_mm time does not get rid of the
> >> kernelthread->active_mm
> >> pointer pointing at the mm that is exiting.
> >>
> >
> > Ah, but that's conceptually very easy to fix.  Add a #define like
> > ARCH_NO_TASK_ACTIVE_MM.  Then just get rid of active_mm if that
> > #define is set.  After some grepping, there are very few users.  The
> > only nontrivial ones are the ones in kernel/ and mm/mmu_context.c that
> > are involved in the rather complicated dance of refcounting active_mm.
> > If that field goes away, it doesn't need to be refcounted.  Instead, I
> > think the refcounting can get replaced with something like:
> >
> > /*
> >  * Release any arch-internal references to mm.  Only called when
> > mm_users is zero
> >  * and all tasks using mm have either been switch_mm()'d away or have had
> >  * enter_lazy_tlb() called.
> >  */
> > extern void arch_shoot_down_dead_mm(struct mm_struct *mm);
> >
> > which the kernel calls in __mmput() after tearing down all the page
> > tables.  The body can be something like:
> >
> > if (WARN_ON(cpumask_any_but(mm_cpumask(...), ...)) {
> >   /* send an IPI.  Maybe just call tlb_flush_remove_tables() */
> > }
> >
> > (You'll also have to fix up the highly questionable users in
> > arch/x86/platform/efi/efi_64.c, but that's easy.)
> >
> > Does all that make sense?  Basically, as I understand it, the
> > expensive atomic ops you're seeing are all pointless because they're
> > enabling an optimization that hasn't actually worked for a long time,
> > if ever.
> 
> Hmm.  Xen PV has a big hack in xen_exit_mmap(), which is called from
> arch_exit_mmap(), I think.  It's a heavier weight version of more or
> less the same thing that arch_shoot_down_dead_mm() would be, except
> that it happens before exit_mmap().  But maybe Xen actually has the
> right idea.  In other words, rather doing the big pagetable free in
> exit_mmap() while there may still be other CPUs pointing at the page
> tables, the other order might make more sense.  So maybe, if
> ARCH_NO_TASK_ACTIVE_MM is set, arch_exit_mmap() should be responsible
> for getting rid of all secret arch references to the mm.
> 
> Hmm.  ARCH_FREE_UNUSED_MM_IMMEDIATELY might be a better name.
> 
> I added some more arch maintainers.  The idea here is that, on x86 at
> least, task->active_mm and all its refcounting is pure overhead.  When
> a process exits, __mmput() gets called, but the core kernel has a
> longstanding "optimization" in which other tasks (kernel threads and
> idle tasks) may have ->active_mm pointing at this mm.  This is nasty,
> complicated, and hurts performance on large systems, since it requires
> extra atomic operations whenever a CPU switches between real users
> threads and idle/kernel threads.
> 
> It's also almost completely worthless on x86 at least, since __mmput()
> frees pagetables, and that operation *already* forces a remote TLB
> flush, so we might as well zap all the active_mm references at the
> same time.
> 
> But arm64 has real HW remote flushes.  Does arm64 actually benefit
> from the active_mm optimization?  What happens on arm64 when a process
> exits?  How about s390?  I suspect that x390 has rather larger systems
> than arm64, where the cost of the reference counting can be much
> higher.

IIRC, the TLB invalidation on task exit has the fullmm field set in the
mmu_gather structure, so we don't actually do any TLB invalidation at all.
Instead, we just don't re-allocate the ASID and invalidate the whole TLB
when we run out of ASIDs (they're 16-bit on most Armv8 CPUs).

Does that answer your question?

Will

  parent reply	other threads:[~2018-07-24 16:33 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 19:03 [PATCH v6 0/7] x86,tlb,mm: make lazy TLB mode even lazier Rik van Riel
2018-07-16 19:03 ` [PATCH 1/7] mm: allocate mm_cpumask dynamically based on nr_cpu_ids Rik van Riel
2018-07-17  9:33   ` [tip:x86/mm] mm: Allocate the mm_cpumask (mm->cpu_bitmap[]) " tip-bot for Rik van Riel
2018-08-04 22:28   ` [PATCH 1/7] mm: allocate mm_cpumask " Guenter Roeck
2018-07-16 19:03 ` [PATCH 2/7] x86,tlb: leave lazy TLB mode at page table free time Rik van Riel
2018-07-17  9:34   ` [tip:x86/mm] x86/mm/tlb: Leave " tip-bot for Rik van Riel
2018-07-17 11:46     ` Peter Zijlstra
2018-07-25  1:00       ` Anders Roxell
2018-08-16  1:54   ` [PATCH 2/7] x86,tlb: leave " Andy Lutomirski
2018-08-16  5:31     ` Rik van Riel
2018-07-16 19:03 ` [PATCH 3/7] x86,mm: restructure switch_mm_irqs_off Rik van Riel
2018-07-17  9:34   ` [tip:x86/mm] x86/mm/tlb: Restructure switch_mm_irqs_off() tip-bot for Rik van Riel
2018-10-09 14:58   ` tip-bot for Rik van Riel
2018-07-16 19:03 ` [PATCH 4/7] x86,tlb: make lazy TLB mode lazier Rik van Riel
2018-07-17  9:35   ` [tip:x86/mm] x86/mm/tlb: Make " tip-bot for Rik van Riel
2018-07-17 11:33     ` Peter Zijlstra
2018-07-18 15:33       ` Rik van Riel
2018-07-18 16:00         ` Peter Zijlstra
     [not found]           ` <081E558D-DB34-4A18-A35C-896BC47F6EBA@surriel.com>
2018-07-18 18:23             ` Peter Zijlstra
2018-07-18 18:51               ` Rik van Riel
2018-07-19  9:13                 ` Peter Zijlstra
2018-07-17 20:04   ` [PATCH 4/7] x86,tlb: make " Andy Lutomirski
     [not found]     ` <FF977B78-140F-4787-AA57-0EA934017D85@surriel.com>
2018-07-17 21:29       ` Andy Lutomirski
2018-07-17 22:05         ` Rik van Riel
2018-07-17 22:27           ` Andy Lutomirski
2018-07-18 20:58     ` Rik van Riel
2018-07-18 23:13       ` Andy Lutomirski
     [not found]         ` <B976CC13-D014-433A-83DE-F8DF9AB4F421@surriel.com>
2018-07-19 16:45           ` Andy Lutomirski
2018-07-19 17:04             ` Andy Lutomirski
2018-07-19 17:04               ` Andy Lutomirski
2018-07-20  4:57               ` Benjamin Herrenschmidt
2018-07-20  8:30               ` Peter Zijlstra
2018-07-23 12:26                 ` Rik van Riel
2018-07-24 16:33               ` Will Deacon [this message]
     [not found]             ` <CF849A07-B7CE-4DE9-8246-53AC5A53A705@surriel.com>
2018-07-19 17:18               ` Andy Lutomirski
2018-07-20  8:02             ` Vitaly Kuznetsov
2018-07-20  9:49               ` Peter Zijlstra
2018-07-20 10:18                 ` Vitaly Kuznetsov
2018-07-20  9:32             ` Peter Zijlstra
2018-07-20 11:04               ` Peter Zijlstra
2018-07-16 19:03 ` [PATCH 5/7] x86,tlb: only send page table free TLB flush to lazy TLB CPUs Rik van Riel
2018-07-17  9:35   ` [tip:x86/mm] x86/mm/tlb: Only " tip-bot for Rik van Riel
2018-07-17 11:39     ` Peter Zijlstra
     [not found]       ` <1F8BDD25-864D-4105-B872-2109AA417454@surriel.com>
     [not found]         ` <24AA4367-22A1-450E-8F6A-3CBF39518384@surriel.com>
2018-07-18 16:19           ` Peter Zijlstra
2018-07-16 19:03 ` [PATCH 6/7] x86,mm: always use lazy TLB mode Rik van Riel
2018-07-17  9:36   ` [tip:x86/mm] x86/mm/tlb: Always " tip-bot for Rik van Riel
2018-10-09 14:58   ` tip-bot for Rik van Riel
2018-07-16 19:03 ` [PATCH 7/7] x86,switch_mm: skip atomic operations for init_mm Rik van Riel
2018-07-17  9:36   ` [tip:x86/mm] x86/mm/tlb: Skip atomic operations for 'init_mm' in switch_mm_irqs_off() tip-bot for Rik van Riel
  -- strict thread matches above, loose matches on Subject: below --
2018-07-10 14:28 [PATCH v5 0/7] x86,tlb,mm: make lazy TLB mode even lazier Rik van Riel
2018-07-10 14:28 ` [PATCH 4/7] x86,tlb: make lazy TLB mode lazier Rik van Riel
2018-07-06 21:56 [PATCH v4 0/7] x86,tlb,mm: make lazy TLB mode even lazier Rik van Riel
2018-07-06 21:56 ` [PATCH 4/7] x86,tlb: make lazy TLB mode lazier Rik van Riel
2018-06-29 14:29 [PATCH v3 0/7] x86,tlb,mm: make lazy TLB mode even lazier Rik van Riel
2018-06-29 14:29 ` [PATCH 4/7] x86,tlb: make lazy TLB mode lazier Rik van Riel
2018-06-29 17:05   ` Dave Hansen
2018-06-29 17:29     ` Rik van Riel
2018-06-20 19:56 [PATCH 0/7] x86,tlb,mm: make lazy TLB mode even lazier Rik van Riel
2018-06-20 19:56 ` [PATCH 4/7] x86,tlb: make lazy TLB mode lazier Rik van Riel
2018-06-22 15:04   ` Andy Lutomirski
2018-06-22 15:15     ` Rik van Riel
2018-06-22 15:34       ` Andy Lutomirski
2018-06-22 17:05   ` Dave Hansen
2018-06-22 17:16     ` Rik van Riel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180724163330.GC25888@arm.com \
    --to=will.deacon@arm.com \
    --cc=benh@kernel.crashing.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=catalin.marinas@arm.com \
    --cc=dave.hansen@intel.com \
    --cc=efault@gmx.de \
    --cc=jgross@suse.com \
    --cc=kernel-team@fb.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=vkuznets@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.