From: Russell King <rmk@arm.linux.org.uk>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Avi Kivity <avi@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@elte.hu>,
akpm@linux-foundation.org,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-mm@kvack.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Miller <davem@davemloft.net>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Mel Gorman <mel@csn.ul.ie>, Nick Piggin <npiggin@kernel.dk>,
Paul McKenney <paulmck@linux.vnet.ibm.com>,
Yanmin Zhang <yanmin_zhang@linux.intel.com>,
"Luck,Tony" <tony.luck@intel.com>,
PaulMundt <lethal@linux-sh.org>,
Chris Metcalf <cmetcalf@tilera.com>
Subject: Re: [PATCH 06/17] arm: mmu_gather rework
Date: Mon, 28 Feb 2011 12:28:04 +0000 [thread overview]
Message-ID: <20110228122803.GC492@flint.arm.linux.org.uk> (raw)
In-Reply-To: <1298895612.2428.10621.camel@twins>
On Mon, Feb 28, 2011 at 01:20:12PM +0100, Peter Zijlstra wrote:
> On Mon, 2011-02-28 at 11:59 +0000, Russell King wrote:
> > On Mon, Feb 28, 2011 at 12:44:47PM +0100, Peter Zijlstra wrote:
> > > Right, so the normal case is:
> > >
> > > unmap_region()
> > > tlb_gather_mmu()
> >
> > The fullmm argument is important here as it specifies the mode.
>
> well, unmap_region always has that 0, I've mentioned the fullmm mode
> separately below, its in many way the easiest case to deal with.
>
> > tlb_gather_mmu(, 0)
> >
> > > unmap_vmas()
> > > for (; vma; vma = vma->vm_next)
> > > unmao_page_range()
> > > tlb_start_vma() -> flush cache range
> > > zap_*_range()
> > > ptep_get_and_clear_full() -> batch/track external tlbs
> > > tlb_remove_tlb_entry() -> batch/track external tlbs
> > > tlb_remove_page() -> track range/batch page
> > > tlb_end_vma() -> flush tlb range
> >
> > tlb_finish_mmu() -> nothing
> >
> > >
> > > [ for architectures that have hardware page table walkers
> > > concurrent faults can still load the page tables ]
> > >
> > > free_pgtables()
> >
> > tlb_gather_mmu(, 1)
> >
> > > while (vma)
> > > unlink_*_vma()
> > > free_*_range()
> > > *_free_tlb()
> > > tlb_finish_mmu()
> >
> > tlb_finish_mmu() -> flush tlb mm
> >
> > >
> > > free vmas
> >
> > So this is all fine. Note that we *don't* use the range stuff here.
> >
> > > Now, if we want to track ranges _and_ have hardware page table walkers
> > > (ARM seems to be one such), we must flush TLBs at tlb_end_vma() because
> > > flush_tlb_range() requires a vma pointer (ARM and TILE actually use more
> > > than ->vm_mm), and on tlb_finish_mmu() issue a full mm wide invalidate
> > > because the hardware walker could have re-populated the cache after
> > > clearing the PTEs but before freeing the page tables.
> >
> > No. The hardware walker won't re-populate the TLB after the page table
>
> Never said it would repopulate the TLB, just said it could repopulate
> your cache thing and that it might still walk the page tables.
>
> > entries have been cleared - where would it get this information from if
> > not from the page tables?
> >
> > > What ARM does is it retains the last vma pointer and tracks
> > > pte_free_tlb() range and uses that in tlb_finish_mmu(), which is a tad
> > > hacky.
> >
> > It may be hacky but then the TLB shootdown interface is hacky too. We
> > don't keep the vma around to re-use after tlb_end_vma() - if you think
> > that then you misunderstand what's going on. The vma pointer is kept
> > around as a cheap way of allowing tlb_finish_mmu() to distinguish
> > between the unmap_region() mode and the shift_arg_pages() mode.
>
> Well, you most certainly use it in the unmap_region() case above.
> tlb_end_vma() will do a flush_tlb_range(), but then your
> __pte_free_tlb() will also track range and the tlb_finish_mmu() will
> then again issue a flush_tlb_range() using the last vma pointer.
Can you point out where pte_free_tlb() is used with unmap_region()?
> unmap_region()'s last tlb_start_vma(), with __pte_free_tlb() tracking
> range will then get tlb_finish_mmu() to issue a second
> flush_tlb_range().
I don't think it will because afaics pte_free_tlb() is never called in
the unmap_region() case.
> > No. That's stupid. Consider the case where you have to loop one page
> > at a time over the range (we do on ARM.) If we ended up with your
> > suggestion above, that means we could potentially have to loop 4K at a
> > time over 3GB of address space. That's idiotic when we have an
> > instruction which can flush the entire TLB for a particular thread.
>
> *blink* so you've implemented flush_tlb_range() as an iteration of
> single page invalidates?
Yes, because flush_tlb_range() is used at most over one VMA, which
typically will not be in the GB range, but a few MB at most.
> Anyway, I don't see how that's related to the I-TLB thing?
It's all related because I don't think you understand what's going on
here properly yet, and as such are getting rather mixed up and confused
about when flush_tlb_range() is called. As such, the whole
does-it-take-vma-or-mm argument is irrelevant, and therefore so is
the I-TLB stuff.
I put to you that pte_free_tlb() is not called in unmap_vmas(), and
as such the double-tlb-invalidate you talk about can't happen.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
WARNING: multiple messages have this Message-ID (diff)
From: Russell King <rmk@arm.linux.org.uk>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
Avi Kivity <avi@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
Rik van Riel <riel@redhat.com>, Ingo Molnar <mingo@elte.hu>,
akpm@linux-foundation.org,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
linux-mm@kvack.org,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
David Miller <davem@davemloft.net>,
Hugh Dickins <hugh.dickins@tiscali.co.uk>,
Mel Gorman <mel@csn.ul.ie>, Nick Piggin <npiggin@kernel.dk>,
Paul McKenney <paulmck@linux.vnet.ibm.com>,
Yanmin Zhang <yanmin_zhang@linux.intel.com>,
"Luck,Tony" <tony.luck@intel.com>,
PaulMundt <lethal@linux-sh.org>,
Chris Metcalf <cmetcalf@tilera.com>
Subject: Re: [PATCH 06/17] arm: mmu_gather rework
Date: Mon, 28 Feb 2011 12:28:04 +0000 [thread overview]
Message-ID: <20110228122803.GC492@flint.arm.linux.org.uk> (raw)
In-Reply-To: <1298895612.2428.10621.camel@twins>
On Mon, Feb 28, 2011 at 01:20:12PM +0100, Peter Zijlstra wrote:
> On Mon, 2011-02-28 at 11:59 +0000, Russell King wrote:
> > On Mon, Feb 28, 2011 at 12:44:47PM +0100, Peter Zijlstra wrote:
> > > Right, so the normal case is:
> > >
> > > unmap_region()
> > > tlb_gather_mmu()
> >
> > The fullmm argument is important here as it specifies the mode.
>
> well, unmap_region always has that 0, I've mentioned the fullmm mode
> separately below, its in many way the easiest case to deal with.
>
> > tlb_gather_mmu(, 0)
> >
> > > unmap_vmas()
> > > for (; vma; vma = vma->vm_next)
> > > unmao_page_range()
> > > tlb_start_vma() -> flush cache range
> > > zap_*_range()
> > > ptep_get_and_clear_full() -> batch/track external tlbs
> > > tlb_remove_tlb_entry() -> batch/track external tlbs
> > > tlb_remove_page() -> track range/batch page
> > > tlb_end_vma() -> flush tlb range
> >
> > tlb_finish_mmu() -> nothing
> >
> > >
> > > [ for architectures that have hardware page table walkers
> > > concurrent faults can still load the page tables ]
> > >
> > > free_pgtables()
> >
> > tlb_gather_mmu(, 1)
> >
> > > while (vma)
> > > unlink_*_vma()
> > > free_*_range()
> > > *_free_tlb()
> > > tlb_finish_mmu()
> >
> > tlb_finish_mmu() -> flush tlb mm
> >
> > >
> > > free vmas
> >
> > So this is all fine. Note that we *don't* use the range stuff here.
> >
> > > Now, if we want to track ranges _and_ have hardware page table walkers
> > > (ARM seems to be one such), we must flush TLBs at tlb_end_vma() because
> > > flush_tlb_range() requires a vma pointer (ARM and TILE actually use more
> > > than ->vm_mm), and on tlb_finish_mmu() issue a full mm wide invalidate
> > > because the hardware walker could have re-populated the cache after
> > > clearing the PTEs but before freeing the page tables.
> >
> > No. The hardware walker won't re-populate the TLB after the page table
>
> Never said it would repopulate the TLB, just said it could repopulate
> your cache thing and that it might still walk the page tables.
>
> > entries have been cleared - where would it get this information from if
> > not from the page tables?
> >
> > > What ARM does is it retains the last vma pointer and tracks
> > > pte_free_tlb() range and uses that in tlb_finish_mmu(), which is a tad
> > > hacky.
> >
> > It may be hacky but then the TLB shootdown interface is hacky too. We
> > don't keep the vma around to re-use after tlb_end_vma() - if you think
> > that then you misunderstand what's going on. The vma pointer is kept
> > around as a cheap way of allowing tlb_finish_mmu() to distinguish
> > between the unmap_region() mode and the shift_arg_pages() mode.
>
> Well, you most certainly use it in the unmap_region() case above.
> tlb_end_vma() will do a flush_tlb_range(), but then your
> __pte_free_tlb() will also track range and the tlb_finish_mmu() will
> then again issue a flush_tlb_range() using the last vma pointer.
Can you point out where pte_free_tlb() is used with unmap_region()?
> unmap_region()'s last tlb_start_vma(), with __pte_free_tlb() tracking
> range will then get tlb_finish_mmu() to issue a second
> flush_tlb_range().
I don't think it will because afaics pte_free_tlb() is never called in
the unmap_region() case.
> > No. That's stupid. Consider the case where you have to loop one page
> > at a time over the range (we do on ARM.) If we ended up with your
> > suggestion above, that means we could potentially have to loop 4K at a
> > time over 3GB of address space. That's idiotic when we have an
> > instruction which can flush the entire TLB for a particular thread.
>
> *blink* so you've implemented flush_tlb_range() as an iteration of
> single page invalidates?
Yes, because flush_tlb_range() is used at most over one VMA, which
typically will not be in the GB range, but a few MB at most.
> Anyway, I don't see how that's related to the I-TLB thing?
It's all related because I don't think you understand what's going on
here properly yet, and as such are getting rather mixed up and confused
about when flush_tlb_range() is called. As such, the whole
does-it-take-vma-or-mm argument is irrelevant, and therefore so is
the I-TLB stuff.
I put to you that pte_free_tlb() is not called in unmap_vmas(), and
as such the double-tlb-invalidate you talk about can't happen.
--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-02-28 12:28 UTC|newest]
Thread overview: 119+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-17 16:23 [PATCH 00/17] mm: mmu_gather rework Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 01/17] tile: Fix __pte_free_tlb Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 02/17] mm: mmu_gather rework Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-03-10 15:50 ` Mel Gorman
2011-03-10 15:50 ` Mel Gorman
2011-03-16 18:55 ` Peter Zijlstra
2011-03-16 18:55 ` Peter Zijlstra
2011-03-16 20:15 ` Geert Uytterhoeven
2011-03-16 20:15 ` Geert Uytterhoeven
2011-03-16 21:08 ` Peter Zijlstra
2011-03-16 21:08 ` Peter Zijlstra
2011-03-21 8:47 ` Avi Kivity
2011-03-21 8:47 ` Avi Kivity
2011-04-01 12:07 ` Peter Zijlstra
2011-04-01 12:07 ` Peter Zijlstra
2011-04-01 16:13 ` Linus Torvalds
2011-04-01 16:13 ` Linus Torvalds
2011-04-02 0:07 ` David Miller
2011-04-02 0:07 ` David Miller
2011-02-17 16:23 ` [PATCH 03/17] powerpc: " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 04/17] sparc: " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 05/17] s390: " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 06/17] arm: " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-24 16:34 ` Peter Zijlstra
2011-02-24 16:34 ` Peter Zijlstra
2011-02-25 18:04 ` Peter Zijlstra
2011-02-25 18:04 ` Peter Zijlstra
2011-02-25 19:45 ` Peter Zijlstra
2011-02-25 19:45 ` Peter Zijlstra
2011-02-25 19:59 ` Hugh Dickins
2011-02-25 19:59 ` Hugh Dickins
2011-02-25 21:51 ` Russell King
2011-02-25 21:51 ` Russell King
2011-02-28 11:44 ` Peter Zijlstra
2011-02-28 11:44 ` Peter Zijlstra
2011-02-28 11:59 ` Russell King
2011-02-28 11:59 ` Russell King
2011-02-28 12:06 ` Russell King
2011-02-28 12:06 ` Russell King
2011-02-28 12:25 ` Peter Zijlstra
2011-02-28 12:25 ` Peter Zijlstra
2011-02-28 12:06 ` Russell King
2011-02-28 12:06 ` Russell King
2011-02-28 12:20 ` Peter Zijlstra
2011-02-28 12:20 ` Peter Zijlstra
2011-02-28 12:28 ` Russell King [this message]
2011-02-28 12:28 ` Russell King
2011-02-28 12:49 ` Peter Zijlstra
2011-02-28 12:49 ` Peter Zijlstra
2011-02-28 12:50 ` Russell King
2011-02-28 12:50 ` Russell King
2011-02-28 13:03 ` Peter Zijlstra
2011-02-28 13:03 ` Peter Zijlstra
2011-02-28 14:18 ` Peter Zijlstra
2011-02-28 14:18 ` Peter Zijlstra
2011-02-28 14:57 ` Russell King
2011-02-28 14:57 ` Russell King
2011-02-28 15:05 ` Peter Zijlstra
2011-02-28 15:05 ` Peter Zijlstra
2011-02-28 15:15 ` Russell King
2011-02-28 15:15 ` Russell King
2011-03-01 22:05 ` Chris Metcalf
2011-03-01 22:05 ` Chris Metcalf
2011-03-01 22:05 ` Chris Metcalf
2011-03-02 10:54 ` Peter Zijlstra
2011-03-02 10:54 ` Peter Zijlstra
2011-03-02 10:54 ` Peter Zijlstra
2011-03-02 10:54 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 07/17] sh: " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 08/17] um: " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 09/17] ia64: " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 10/17] mm: Now that all old mmu_gather code is gone, remove the storage Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 11/17] mm, powerpc: Move the RCU page-table freeing into generic code Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 12/17] s390: use generic RCP page-table freeing Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 13/17] mm: Extended batches for generic mmu_gather Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 14/17] mm: Provide generic range tracking and flushing Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 15/17] arm, mm: Convert arm to generic tlb Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 16/17] ia64, mm: Convert ia64 " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` [PATCH 17/17] sh, mm: Convert sh " Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 16:23 ` Peter Zijlstra
2011-02-17 17:36 ` [PATCH 00/17] mm: mmu_gather rework Peter Zijlstra
2011-02-17 17:36 ` Peter Zijlstra
2011-02-17 17:42 ` Peter Zijlstra
2011-02-17 17:42 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110228122803.GC492@flint.arm.linux.org.uk \
--to=rmk@arm.linux.org.uk \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=avi@redhat.com \
--cc=benh@kernel.crashing.org \
--cc=cmetcalf@tilera.com \
--cc=davem@davemloft.net \
--cc=hugh.dickins@tiscali.co.uk \
--cc=lethal@linux-sh.org \
--cc=linux-arch@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mel@csn.ul.ie \
--cc=mingo@elte.hu \
--cc=npiggin@kernel.dk \
--cc=paulmck@linux.vnet.ibm.com \
--cc=peterz@infradead.org \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=torvalds@linux-foundation.org \
--cc=yanmin_zhang@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.