From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@elte.hu>,
akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
linux-arch@vger.kernel.org, linux-mm@kvack.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Miller <davem@davemloft.net>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Mel Gorman <mel@csn.ul.ie>, Nick Piggin <npiggin@kernel.dk>,
Russell King <rmk@arm.linux.org.uk>,
Chris Metcalf <cmetcalf@tilera.com>,
Martin Schwidefsky <schwidefsky@de.ibm.com>
Subject: Re: [RFC][PATCH 2/6] mm: Change flush_tlb_range() to take an mm_struct
Date: Wed, 02 Mar 2011 22:40:27 +0100 [thread overview]
Message-ID: <1299102027.1310.39.camel@laptop> (raw)
In-Reply-To: <AANLkTimhWKhHojZ-9XZGSh3OzfPhvo__Dib9VfeMWoBQ@mail.gmail.com>
On Wed, 2011-03-02 at 11:19 -0800, Linus Torvalds wrote:
> On Wed, Mar 2, 2011 at 9:59 AM, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> > In order to be able to properly support architecture that want/need to
> > support TLB range invalidation, we need to change the
> > flush_tlb_range() argument from a vm_area_struct to an mm_struct
> > because the range might very well extend past one VMA, or not have a
> > VMA at all.
>
> I really don't think this is right. The whole "drop the icache
> information" thing is a total anti-optimization, since for some
> architectures, the icache flush is the _big_ deal.
Right, so Tile has the I-cache flush from flush_tlb_range(), I'm not
sure if that's the right thing to do, Documentation/cachetlb.txt seems
to suggest doing it from update_mmu_cache() like things.
However, I really don't know, and would happily be explained how these
things are supposed to work. Also:
> Possibly much
> bigger than the TLB flush itself. Doing an icache flush was much more
> expensive than the TLB flush on alpha, for example (the tlb had ASI's
> etc, the icache did not).
Right, but the problem remains that we do page-table teardown without
having a vma.
Now we can re-introduce I/D variants again by assuming D-only and using
tlb_start_vma() to set a I-too bit on VM_EXEC. (this assumes the vm_args
range is non-executable -- which it had better be).
How about I do something like:
enum {
TLB_FLUSH_I = 1,
TLB_FLUSH_D = 2,
TLB_FLUSH_PAGE = 4,
TLB_FLUSH_HPAGE = 8,
};
void flush_tlb_range(struct mm_struct *mm, unsigned long start,
unsigned long end, unsigned int flags);
And we then do:
tlb_gather_mmu(struct mmu_gather *tlb, ...)
{
...
tlb->flush_type = TLB_FLUSH_D | TLB_FLUSH_PAGE;
}
tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
{
if (!tlb->fullmm)
flush_cache_range(vma, vma->vm_start, vma->vm_end);
if (vma->vm_flags & VM_EXEC)
tlb->flush_type |= TLB_FLUSH_I;
if (vma->vm_flags & VM_HUGEPAGE)
tlb->flush_type |= TLB_FLUSH_HPAGE;
}
tlb_flush_mmu(struct mmu_gather *tlb)
{
if (!tlb->fullmm && tlb->need_flush) {
flush_tlb_range(tlb->mm, tlb->start, tlb->end, tlb->flush_type);
tlb->start = TASK_SIZE;
tlb->end = 0;
}
...
}
> > There are various reasons that we need to flush TLBs _after_ freeing
> > the page-tables themselves. For some architectures (x86 among others)
> > this serializes against (both hardware and software) page table
> > walkers like gup_fast().
>
> This part of the changelog also makes no sense what-so-ever. It's
> actively wrong.
>
> On x86, we absolutely *must* do the TLB flush _before_ we release the
> page tables. So your commentary is actively wrong and misleading.
>
> The order has to be:
> - clear the page table entry, queue the page to be free'd
> - flush the TLB
> - free the page (and page tables)
>
> and nothing else is correct, afaik. So the changelog is pure and utter
> garbage. I didn't look at what the patch actually changed.
OK, so I use the wrong terms, I meant page-table tear-down, where we
remove the pte page pointer from the pmd, remove the pmd page from the
pud etc.
We then flush the TLBs and only then actually free the pages. I think
the confusion stems from the fact that we call tear-down free_pgtables()
The point was that we need to TLB flush _after_ tear-down (before actual
free), not before tear-down. The problem is that currently we either end
up doing too many TLB flushes or one too few.
next prev parent reply other threads:[~2011-03-02 21:40 UTC|newest]
Thread overview: 64+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-02 17:59 [RFC][PATCH 0/6] mm: Unify TLB gather implementations Peter Zijlstra
2011-03-02 17:59 ` Peter Zijlstra
2011-03-02 17:59 ` [RFC][PATCH 1/6] mm: Optimize fullmm TLB flushing Peter Zijlstra
2011-03-02 17:59 ` Peter Zijlstra
2011-03-02 17:59 ` [RFC][PATCH 2/6] mm: Change flush_tlb_range() to take an mm_struct Peter Zijlstra
2011-03-02 17:59 ` Peter Zijlstra
2011-03-02 19:19 ` Linus Torvalds
2011-03-02 20:58 ` Rik van Riel
2011-03-02 20:58 ` Rik van Riel
2011-03-02 21:40 ` Peter Zijlstra [this message]
2011-03-02 21:40 ` Peter Zijlstra
2011-03-02 21:47 ` David Miller
2011-03-03 17:22 ` Chris Metcalf
2011-03-03 17:22 ` Chris Metcalf
2011-03-03 18:45 ` David Miller
2011-03-03 18:56 ` Chris Metcalf
2011-03-03 18:56 ` Chris Metcalf
2011-03-10 18:05 ` [PATCH] arch/tile: optimize icache flush Chris Metcalf
2011-03-10 18:05 ` Chris Metcalf
2011-03-10 23:19 ` Rik van Riel
2011-03-10 23:19 ` Rik van Riel
2011-03-02 17:59 ` [RFC][PATCH 3/6] mm: Provide generic range tracking and flushing Peter Zijlstra
2011-03-02 17:59 ` Peter Zijlstra
2011-03-02 17:59 ` [RFC][PATCH 4/6] arm, mm: Convert arm to generic tlb Peter Zijlstra
2011-03-02 17:59 ` Peter Zijlstra
2011-03-09 15:16 ` Catalin Marinas
2011-03-09 15:19 ` Peter Zijlstra
2011-03-09 15:19 ` Peter Zijlstra
2011-03-09 15:36 ` Catalin Marinas
2011-03-09 15:39 ` Peter Zijlstra
2011-03-09 15:39 ` Peter Zijlstra
2011-03-09 15:48 ` Peter Zijlstra
2011-03-09 16:34 ` Catalin Marinas
2012-05-17 3:05 ` Paul Mundt
2012-05-17 3:05 ` Paul Mundt
2012-05-17 9:30 ` Catalin Marinas
2012-05-17 9:39 ` Catalin Marinas
2012-05-17 9:51 ` Russell King
2012-05-17 9:51 ` Russell King
2012-05-17 11:28 ` Peter Zijlstra
2012-05-17 11:28 ` Peter Zijlstra
2012-05-17 12:14 ` Catalin Marinas
2012-05-17 16:00 ` Catalin Marinas
2012-05-17 16:24 ` Peter Zijlstra
2012-05-17 16:24 ` Peter Zijlstra
2012-05-17 16:33 ` Peter Zijlstra
2012-05-17 16:33 ` Peter Zijlstra
2012-05-17 16:44 ` Peter Zijlstra
2012-05-17 16:44 ` Peter Zijlstra
2012-05-17 16:59 ` Peter Zijlstra
2012-05-17 16:59 ` Peter Zijlstra
2012-05-17 17:01 ` Catalin Marinas
2012-05-17 17:01 ` Catalin Marinas
2012-05-17 17:11 ` Peter Zijlstra
2012-05-17 17:11 ` Peter Zijlstra
2012-05-21 7:47 ` Martin Schwidefsky
2012-05-21 7:47 ` Martin Schwidefsky
2012-05-17 17:22 ` Russell King
2012-05-17 17:22 ` Russell King
2012-05-17 18:31 ` Catalin Marinas
2011-03-02 17:59 ` [RFC][PATCH 5/6] ia64, mm: Convert ia64 " Peter Zijlstra
2011-03-02 17:59 ` Peter Zijlstra
2011-03-02 17:59 ` [RFC][PATCH 6/6] sh, mm: Convert sh " Peter Zijlstra
2011-03-02 17:59 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1299102027.1310.39.camel@laptop \
--to=a.p.zijlstra@chello.nl \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=benh@kernel.crashing.org \
--cc=cmetcalf@tilera.com \
--cc=davem@davemloft.net \
--cc=hugh.dickins@tiscali.co.uk \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=mingo@elte.hu \
--cc=npiggin@kernel.dk \
--cc=riel@redhat.com \
--cc=rmk@arm.linux.org.uk \
--cc=schwidefsky@de.ibm.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).