* RE: ia64 mmu_gather question
2007-08-02 3:40 ia64 mmu_gather question Benjamin Herrenschmidt
@ 2007-08-02 5:38 ` Luck, Tony
2007-08-02 7:09 ` Benjamin Herrenschmidt
` (8 subsequent siblings)
9 siblings, 0 replies; 11+ messages in thread
From: Luck, Tony @ 2007-08-02 5:38 UTC (permalink / raw)
To: linux-ia64
Answers to select questions (that I can do off the top of my head).
> In addition, however, I do have a question. I haven't tracked every bit
> of MM code in ia64 land yet and I'm wondering how are the page table
> translations faulted in ? via a SW miss handler ? Or some HW handler ?
The page table translations are inserted in s/w (by the VHPT miss handler
in arch/ia64/kernel/ivt.S). Essentially the first TLB miss in a PMD range
will end up here and will insert both the page mapping that we actually
want, plus the mapping for the page table (so that a subsequent TLB miss
on this address, or on another address in the PMD range, can be serviced
by the h/w VHPT walker for as long as the page table mapping survives in
the TLB).
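A minimal C sketch of that double insertion, as I understand it; this is a toy software model, not the real ivt.S assembly, and all names here (`tlb_insert`, `vhpt_miss`, the slot count) are invented for illustration:

```c
#include <stdint.h>
#include <assert.h>

/* Toy model of the miss path described above: the handler inserts both
 * the translation we faulted on and a translation for the PTE page
 * itself, so the h/w VHPT walker can service later misses in the same
 * PMD range. Illustrative only. */
#define TLB_SLOTS 8

struct tlb_entry { uint64_t vaddr, paddr; int valid; };

static struct tlb_entry tlb[TLB_SLOTS];
static int next_slot;

static void tlb_insert(uint64_t vaddr, uint64_t paddr)
{
    tlb[next_slot] = (struct tlb_entry){ vaddr, paddr, 1 };
    next_slot = (next_slot + 1) % TLB_SLOTS;
}

static int tlb_lookup(uint64_t vaddr, uint64_t *paddr)
{
    for (int i = 0; i < TLB_SLOTS; i++)
        if (tlb[i].valid && tlb[i].vaddr == vaddr) {
            *paddr = tlb[i].paddr;
            return 1;
        }
    return 0;
}

/* First miss in a PMD range: insert the data mapping we actually want,
 * plus the mapping of the page table page that covers it. */
static void vhpt_miss(uint64_t fault_va, uint64_t page_pa,
                      uint64_t pte_page_va, uint64_t pte_page_pa)
{
    tlb_insert(fault_va, page_pa);          /* mapping we want          */
    tlb_insert(pte_page_va, pte_page_pa);   /* mapping of the PTE page  */
}
```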
> Is there some locking ?
No locking. But we do have race detection. After we chase the PGD>PUD>PMD>PTE
pointers we insert the TLB entry. Then we retrace the pointer chain and
make sure that the pte we find is still the same. If it isn't, then we
purge the entry we just inserted and go for a full page fault.
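In C terms, the lock-free pattern looks roughly like the sketch below; `itc_insert`/`ptc_purge` stand in for the TLB insert/purge operations and `walk()` for chasing the PGD>PUD>PMD>PTE chain, so these names are illustrative only:

```c
#include <stdint.h>
#include <assert.h>

typedef uint64_t pte_t;

/* Stand-ins for the TLB insert (itc) and purge (ptc) operations. */
static pte_t inserted_pte;
static int   tlb_valid;

static void itc_insert(pte_t p) { inserted_pte = p; tlb_valid = 1; }
static void ptc_purge(void)     { tlb_valid = 0; }

/* Chase the pointers, insert optimistically, retrace, purge on a
 * mismatch. Returns 1 if the fast path stands, 0 if we fall back to a
 * full page fault. walk() re-chases PGD>PUD>PMD>PTE on every call. */
static int vhpt_miss_fastpath(pte_t (*walk)(void))
{
    pte_t pte = walk();          /* first chase of the pointer chain */
    itc_insert(pte);             /* optimistic TLB insert            */
    if (walk() != pte) {         /* retrace: is the pte still the same? */
        ptc_purge();             /* no: undo and take a full fault   */
        return 0;
    }
    return 1;
}

/* Demo walkers: one stable, one whose pte changes between walks,
 * simulating a racing update on another CPU. */
static pte_t stable_walk(void) { return 0x1000; }
static pte_t racy_val = 0x1000;
static pte_t racy_walk(void)   { pte_t v = racy_val; racy_val += 0x1000; return v; }
```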
Time to tell bed-time stories to my daughter. More tomorrow (if someone
else doesn't fill in the rest of the answers before I get back to this).
-Tony
* RE: ia64 mmu_gather question
From: Benjamin Herrenschmidt @ 2007-08-02 7:09 UTC (permalink / raw)
To: linux-ia64
On Wed, 2007-08-01 at 22:38 -0700, Luck, Tony wrote:
> No locking. But we do have race detection. After we chase the
> PGD>PUD>PMD>PTE pointers we insert the TLB entry. Then we retrace the
> pointer chain and make sure that the pte we find is still the same. If
> it isn't, then we purge the entry we just inserted and go for a full
> page fault.
>
> Time to tell bed-time stories to my daughter. More tomorrow (if someone
> else doesn't fill in the rest of the answers before I get back to this).
Ok, that's what I think I understood from the asm. However, what
prevents the very unlikely race where you insert a stale pgtable
mapping entry, and before you backtrack and remove it, another
CPU accesses the stale PTE ?
I'm tempted, while working on the mmu_gather, to add a generic
mechanism for putting the page table quicklist pages in there too.
Though that would only help for archs that 1) have the problem
(typically those with SW-loaded TLBs in one way or another) and 2) use
an IPI for SMP flush (not ppc64, for example).
Cheers,
Ben.
* RE: ia64 mmu_gather question
From: Benjamin Herrenschmidt @ 2007-08-02 7:57 UTC (permalink / raw)
To: linux-ia64
On Thu, 2007-08-02 at 17:10 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2007-08-01 at 22:38 -0700, Luck, Tony wrote:
> > No locking. But we do have race detection. After we chase the
> > PGD>PUD>PMD>PTE pointers we insert the TLB entry. Then we retrace
> > the pointer chain and make sure that the pte we find is still the
> > same. If it isn't, then we purge the entry we just inserted and go
> > for a full page fault.
> >
> > Time to tell bed-time stories to my daughter. More tomorrow (if
> > someone else doesn't fill in the rest of the answers before I get
> > back to this).
>
> Ok, that's what I think I understood from the asm. However, what
> prevents the very unlikely race where you insert a stale pgtable
> mapping entry, and before you backtrack and remove it, another
> CPU accesses the stale PTE ?
I was thinking about a case where the TLB is shared (SMT) between Linux
logical CPUs (threads), but ia64 is not SMT, right? Thus the TLB is
split, and the "other" CPU can't see the stale translation... should be
all right then.
Cheers,
Ben.
* RE: ia64 mmu_gather question
From: Luck, Tony @ 2007-08-02 17:16 UTC (permalink / raw)
To: linux-ia64
> I was thinking about a case where the TLB is shared (SMT) between linux
> logical CPUs (threads) but ia64 is not SMT right ? Thus the TLB is
> split ,and the "other" CPU can't see the stale translation... should be
> allright then.
Montecito is SMT, and the threads do share the TLB resources in that
there are a fixed number of TLB TC slots that are dynamically shared
between threads. But entries in the TLB have their virtual addresses tagged
with a thread identifier, so an entry inserted by one thread cannot
be used by another thread.
See sections 2.4.3.1.1 and 2.4.3.1.2 in "Dual-Core Update to the Intel®
Itanium® 2 Processor Reference Manual"
http://download.intel.com/design/Itanium2/manuals/30806501.pdf
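The tag-matching behavior can be modeled in a few lines of C; the slot count and names below are invented and only the described behavior (a shared TC pool, entries usable only by the inserting thread) is taken from the text above:

```c
#include <stdint.h>
#include <assert.h>

/* Toy model of Montecito-style TC sharing: both threads allocate from
 * one pool of TC slots, but each entry carries the inserting thread's
 * id and only matches lookups from that thread. Illustrative only;
 * these are not real hardware field names. */
#define TC_SLOTS 4

struct tc_entry { uint64_t vaddr, paddr; int thread; int valid; };

static struct tc_entry tc[TC_SLOTS];
static int victim;

static void tc_insert(int thread, uint64_t va, uint64_t pa)
{
    tc[victim] = (struct tc_entry){ va, pa, thread, 1 };
    victim = (victim + 1) % TC_SLOTS;   /* replacement pool is shared */
}

static int tc_lookup(int thread, uint64_t va, uint64_t *pa)
{
    for (int i = 0; i < TC_SLOTS; i++)
        if (tc[i].valid && tc[i].thread == thread && tc[i].vaddr == va) {
            *pa = tc[i].paddr;
            return 1;
        }
    return 0;   /* the other thread's entries never match */
}
```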
-Tony
* RE: ia64 mmu_gather question
From: Benjamin Herrenschmidt @ 2007-08-02 21:46 UTC (permalink / raw)
To: linux-ia64
On Thu, 2007-08-02 at 10:16 -0700, Luck, Tony wrote:
>
> Montecito is SMT, and the threads do share the TLB resources in that
> there are a fixed number of TLB TC slots that are dynamically shared
> between threads. But entries in the TLB have their virtual addresses
> tagged
> with a thread identifier, so an entry inserted by one thread cannot
> be used by another thread.
All right... that's a bit of a waste of TLB space though :-)
So I suspect at this stage that the race isn't affecting you. However,
it looks to me like you put the burden on the fairly hot TLB miss path
rather than on the much less hot invalidation path itself...
Cheers,
BenH
* RE: ia64 mmu_gather question
From: Luck, Tony @ 2007-08-02 21:56 UTC (permalink / raw)
To: linux-ia64
> So I suspect at this stage that the race isn't affecting you. However,
> it looks to me that you put the burden on the fairly hot TLB miss path
> rather than on the much less hot invalidation path itself...
Suggestions on how to move the burden gratefully received. I don't
think that it is all that bad though. The re-read of the PGD>PUD>PMD>PTE
chain should all hit in the L1-D cache, which has single-cycle latency.
-Tony
* RE: ia64 mmu_gather question
From: Benjamin Herrenschmidt @ 2007-08-02 22:00 UTC (permalink / raw)
To: linux-ia64
On Thu, 2007-08-02 at 14:56 -0700, Luck, Tony wrote:
> > So I suspect at this stage that the race isn't affecting you. However,
> > it looks to me that you put the burden on the fairly hot TLB miss path
> > rather than on the much less hot invalidation path itself...
>
> Suggestions on how to do move the burden gratefully received. I don't
> think that it is all that bad though. The re-read of the PGD>PUD>PMD>PTE
> should all hit in the L1-D cache, which has single cycle latency.
All right. I'll look into it. Other archs have a similar issue and don't
currently fix it. The easy fix for archs that use IPIs for TLB flushes
is to batch the freeing of page table pages. That's made a bit harder
by the quicklists, but I may just end up adding support for those to the
mmu_gather.
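A rough sketch of the batching idea: stash the page-table pages in the gather structure and only free them once the TLB flush (an IPI round on the affected archs) has completed, so no CPU can still be walking them. Names and sizes below are invented for the sketch, not the 2007 kernel API:

```c
#include <stddef.h>
#include <assert.h>

/* Illustrative-only model of deferring page-table page freeing to
 * the mmu_gather, as discussed above. */
#define MAX_GATHER_PAGES 16

struct mmu_gather {
    void *pt_pages[MAX_GATHER_PAGES];  /* deferred page-table pages  */
    int   nr;
    int   flushed;                     /* TLB flush / IPI completed? */
};

/* Instead of freeing a page-table page immediately, queue it. */
static void tlb_gather_pt_page(struct mmu_gather *tlb, void *page)
{
    tlb->pt_pages[tlb->nr++] = page;
}

/* Flush the TLB (the IPI would happen here), then release the batch;
 * returns how many pages were freed. */
static int tlb_finish(struct mmu_gather *tlb)
{
    tlb->flushed = 1;       /* flush_tlb_range()/IPI goes here       */
    int freed = tlb->nr;
    tlb->nr = 0;            /* only now is it safe to free the pages */
    return freed;
}
```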
Ben.
* Re: ia64 mmu_gather question
From: David Mosberger-Tang @ 2007-08-02 22:07 UTC (permalink / raw)
To: linux-ia64
Do keep in mind that there is a hardware walker, too! ;-)
--david
On 8/2/07, Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
> On Thu, 2007-08-02 at 14:56 -0700, Luck, Tony wrote:
> > > So I suspect at this stage that the race isn't affecting you. However,
> > > it looks to me that you put the burden on the fairly hot TLB miss path
> > > rather than on the much less hot invalidation path itself...
> >
> > Suggestions on how to do move the burden gratefully received. I don't
> > think that it is all that bad though. The re-read of the PGD>PUD>PMD>PTE
> > should all hit in the L1-D cache, which has single cycle latency.
>
> Allright. I'll look into it. Other archs have a similar issues and don't
> currently fix it. The easy fix for archs that have IPIs for TLB flushes
> is to batch the freeing of page tables pages. That's made a bit harder
> by the quicklist but I may just end up adding support for those to the
> mmu_gather.
>
> Ben.
--
Mosberger Consulting LLC, http://www.mosberger-consulting.com/
* Re: ia64 mmu_gather question
From: Benjamin Herrenschmidt @ 2007-08-02 22:32 UTC (permalink / raw)
To: linux-ia64
On Thu, 2007-08-02 at 16:07 -0600, David Mosberger-Tang wrote:
> Do keep in mind that there is a hardware walker, too! ;-)
Yup, I know. Your TLB miss rate gets lower as you fault in the page
table pages, though it's still higher than a full tree walker. How big
is the TLB btw ?
Ben.
* Re: ia64 mmu_gather question
From: Tony Luck @ 2007-08-03 3:22 UTC (permalink / raw)
To: linux-ia64
> Yup, I know. Your TLB miss rate gets lower as you fault in the page
> table pages, though it's still higher than a full tree walker. How big
> is the TLB btw ?
The # of TLB entries is model specific ... most Itanium CPUs so far have
been around 128 entries, give or take a few.
-Tony