* SW TLB MMU rework and SMP issues
From: Kumar Gala @ 2008-07-15 21:58 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list
Ben,
I've been giving some thought to the new software-managed TLBs and the
SMP issues they raise. I was wondering if you had any insights on how we
should deal with the following issues:
* TLB invalidates -- we need to ensure we don't have multiple tlbsyncs
on the bus at the same time. For e500/fsl I'm thinking we will move to
IPI-based invalidate broadcast and do the invalidates locally
(http://patchwork.ozlabs.org/linuxppc/patch?id=19657).
* 64-bit PTEs and reader vs. writer hazards. How do we ensure that the
TLB miss handler samples a consistent view of the PTE? pte_update()
seems OK since it only updates the flag word. However, set_pte_at()
seems like it could be problematic.
- k
* Re: SW TLB MMU rework and SMP issues
From: Benjamin Herrenschmidt @ 2008-07-16 2:07 UTC
To: Kumar Gala; +Cc: linuxppc-dev list
On Tue, 2008-07-15 at 16:58 -0500, Kumar Gala wrote:
> Ben,
>
> I've been giving some thought to the new software managed TLBs and SMP
> issues. I was wondering if you had any insights on how we should deal
> with the following issues:
As discussed on IRC (might interest others...)
> * tlb invalidates -- need to ensure we don't have multiple tlbsync's
> on the bus. I'm thinking for e500/fsl we will move to IPI based
> invalidate broadcast and do invalidates locally
> (http://patchwork.ozlabs.org/linuxppc/patch?id=19657)
Well, you can just wrap all your invalidations in a spinlock. The
"trick", of course, is to avoid doing a lock/unlock and an IPI for every
PTE on full-mm invalidates such as page table teardown or fork. One way
to do it is with some batching, though that isn't trivial.
Without hardware support for invalidate-all or invalidate-by-PID, what
you can maybe do is a manual invalidate by PID with a tlbre/tlbwe loop.
Weigh the worst case of walking your entire TLB against small processes
that carry only a handful of PTEs...
You can use the batch interface to 'count' things on page table teardown
and decide, based on a threshold of invalidated PTEs, which approach is
more likely to be useful, but you can't really use the batch interface
for fork.
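A rough C model of the lock-plus-threshold idea, under loudly stated assumptions: the TLB is simulated as a hypothetical array (`tlb`, `TLB_ENTRIES`), the per-PID walk stands in for the tlbre/tlbwe loop, and `FLUSH_THRESHOLD` is an arbitrary made-up cut-over. None of these names come from the kernel; the sketch only shows taking the lock once per batch and choosing the per-page or per-PID path from the batch size.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES     64      /* hypothetical TLB size */
#define FLUSH_THRESHOLD 8       /* hypothetical cut-over point, would need tuning */

struct tlb_entry {              /* stand-in for what tlbre would read back */
    bool valid;
    uint32_t pid;
    uint32_t epn;               /* effective page number */
};

static struct tlb_entry tlb[TLB_ENTRIES];
static atomic_flag tlb_lock = ATOMIC_FLAG_INIT;

static void tlb_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&tlb_lock, memory_order_acquire))
        ;                       /* spin until the holder releases it */
}

static void tlb_lock_release(void)
{
    atomic_flag_clear_explicit(&tlb_lock, memory_order_release);
}

/* One per-page invalidate. On hardware this would be a single
 * invalidate instruction; here it is modelled as a search. */
static void tlb_invalidate_page(uint32_t pid, uint32_t epn)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].pid == pid && tlb[i].epn == epn)
            tlb[i].valid = false;
}

/* Invalidate by PID: models walking every entry with tlbre/tlbwe. */
static void tlb_invalidate_pid(uint32_t pid)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].pid == pid)
            tlb[i].valid = false;
}

/* Flush a batch of pages collected during teardown, taking the lock
 * once for the whole batch. Returns true if the PID walk was used. */
static bool tlb_flush_batch(uint32_t pid, const uint32_t *epns, size_t n)
{
    bool whole_pid = n > FLUSH_THRESHOLD;

    tlb_lock_acquire();
    if (whole_pid) {
        tlb_invalidate_pid(pid);
    } else {
        for (size_t i = 0; i < n; i++)
            tlb_invalidate_page(pid, epns[i]);
    }
    tlb_lock_release();
    return whole_pid;
}
```

The threshold trades a fixed-cost walk of the whole TLB against a cost that grows with the number of pages, which is the worst-case comparison described above.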
> * 64-bit PTEs and reader vs writer hazards. How do we ensure that the
> TLB miss handler samples a consistent view of the pte. pte_updates
> seem ok since we only update the flag word. However set_pte_at seems
> like it could be problematic.
eieio on the writer and a data dependency on the reader. segher
suggested a nice way to do it on the reader side: subf the value from
the pointer and then do an lwzx using that value as an offset, so the
address of the second load depends on the result of the first.
i.e. something like this, with r3 containing the PTE pointer:
    lwz     r10,4(r3)
    subf    r4,r10,r3     <-- or subf r3,r10,r3 if clobbering r3 is safe
    lwzx    r11,r10,r4    <-- in which case use r3 here too
That ensures the top half is loaded after the bottom half, which is what
you want if set_pte_at() does its stores this way:
    stw     r11,0(r3)     <-- write top half first
    eieio                 <-- maintain ordering to the coherency domain
    stw     r10,4(r3)     <-- write bottom half last
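The same writer/reader pairing sketched with C11 atomics, assuming a hypothetical `struct pte64` whose `hi` word sits at offset 0 and `lo` word at offset 4 (big-endian layout). A release store on the bottom half stands in for the eieio, and an acquire load stands in for the lwz/subf/lwzx address dependency; C11's `memory_order_consume` is the closer analogue, but compilers generally treat it as acquire anyway. This illustrates the ordering only; it is not the kernel's code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit PTE on a 32-bit big-endian CPU. */
struct pte64 {
    _Atomic uint32_t hi;        /* top half, offset 0 */
    _Atomic uint32_t lo;        /* bottom half, offset 4, holds the flags */
};

/* Writer, mirroring stw / eieio / stw: top half first, then the bottom
 * half with release ordering so the new top half is visible to anyone
 * who observes the new bottom half. */
static void pte64_set(struct pte64 *p, uint64_t val)
{
    atomic_store_explicit(&p->hi, (uint32_t)(val >> 32), memory_order_relaxed);
    atomic_store_explicit(&p->lo, (uint32_t)val, memory_order_release);
}

/* Reader: bottom half first with acquire ordering (standing in for the
 * address dependency), then the top half. */
static uint64_t pte64_get(struct pte64 *p)
{
    uint32_t lo = atomic_load_explicit(&p->lo, memory_order_acquire);
    uint32_t hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
    return ((uint64_t)hi << 32) | lo;
}
```

The pairing only guarantees that a reader which observes the new bottom half also observes the new top half. A reader that still sees the old bottom half may combine it with either top half, so the scheme relies on the old bottom half being harmless, for example not marked present.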
In fact, while you're at it, you can interleave the reader side with the
test of the present bit. Assuming _PAGE_PRESENT is in the low bits and
you can clobber r3, you get something like:
    lwz     r10,4(r3)
    <-- can't do much here unless you can do unrelated things -->
    andi.   r0,r10,_PAGE_PRESENT
    subf    r3,r10,r3
    beq     page_fault
    lwzx    r11,r10,r3
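A C sketch of the interleaved read path above, reusing a hypothetical split-PTE layout; `PAGE_PRESENT` is a made-up value, since the real `_PAGE_PRESENT` bit is platform specific. The point is that the top half is only loaded once the present test on the bottom half has passed, with a -1 return standing in for the branch to page_fault.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_PRESENT 0x1u       /* hypothetical bit, not the real _PAGE_PRESENT */

struct pte64 {
    _Atomic uint32_t hi;        /* top half, offset 0 */
    _Atomic uint32_t lo;        /* bottom half, offset 4, holds the flags */
};

/* Miss-handler read path: load the bottom half, test the present bit,
 * and only then load the top half. Returns 0 and fills *out on success,
 * or -1 where the assembly would branch to page_fault. */
static int pte64_lookup(struct pte64 *p, uint64_t *out)
{
    uint32_t lo = atomic_load_explicit(&p->lo, memory_order_acquire);
    if (!(lo & PAGE_PRESENT))
        return -1;              /* corresponds to: beq page_fault */
    uint32_t hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
    *out = ((uint64_t)hi << 32) | lo;
    return 0;
}
```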
Cheers,
Ben.
* Re: SW TLB MMU rework and SMP issues (pte read/write)
From: Kumar Gala @ 2008-07-16 20:57 UTC
To: benh; +Cc: linuxppc-dev list
>> * 64-bit PTEs and reader vs writer hazards. [...]
>
> eieio on the writer and a data dependency on the reader. [...]
This makes sense. I think we need to order the stores in set_pte_at()
regardless of CONFIG_SMP. Also, I think we should change pte_clear() to
use pte_update() so we only clear the low-order flag bits. I'll send a
patch for review shortly.
- k
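A sketch of what a pte_clear() built on a pte_update()-style helper could look like for a split 64-bit PTE. `pte64_update()` and `pte64_clear()` are hypothetical names, the layout is the same assumed big-endian `hi`/`lo` pair, and the compare-and-swap loop plays the role of a lwarx/stwcx. retry sequence; only the bottom (flag) word is modified.

```c
#include <stdatomic.h>
#include <stdint.h>

struct pte64 {
    _Atomic uint32_t hi;        /* top half: upper physical address bits */
    _Atomic uint32_t lo;        /* bottom half: flag bits */
};

/* Hypothetical pte_update() analogue: atomically clear the bits in clr
 * and set the bits in set, touching only the bottom word. Returns the
 * old 64-bit value. */
static uint64_t pte64_update(struct pte64 *p, uint32_t clr, uint32_t set)
{
    uint32_t old = atomic_load_explicit(&p->lo, memory_order_relaxed);
    uint32_t upd;

    do {
        upd = (old & ~clr) | set;   /* recomputed if the CAS fails */
    } while (!atomic_compare_exchange_weak_explicit(&p->lo, &old, upd,
                                                     memory_order_acq_rel,
                                                     memory_order_relaxed));
    uint32_t hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
    return ((uint64_t)hi << 32) | old;
}

/* pte_clear() expressed through the update helper: wipe the bottom word
 * (dropping the present bit) and leave the top half untouched. */
static uint64_t pte64_clear(struct pte64 *p)
{
    return pte64_update(p, ~0u, 0);
}
```

One consequence of clearing only the bottom word is that a cleared entry can keep a non-zero top half, so a PTE page whose entries have all been cleared is no longer guaranteed to be all zeroes.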
* Re: SW TLB MMU rework and SMP issues (pte read/write)
From: Kumar Gala @ 2008-07-16 21:15 UTC
To: Kumar Gala; +Cc: linuxppc-dev list
On Jul 16, 2008, at 3:57 PM, Kumar Gala wrote:
> This makes sense. I think we need to order the stores in set_pte_at
> regardless of CONFIG_SMP.
Ok, so I realized that we don't actually need to order the stores,
since the sequential programming model will ensure the right thing
happens.
- k
* Re: SW TLB MMU rework and SMP issues (pte read/write)
From: Benjamin Herrenschmidt @ 2008-07-16 21:41 UTC
To: Kumar Gala; +Cc: linuxppc-dev list
On Wed, 2008-07-16 at 15:57 -0500, Kumar Gala wrote:
> This makes sense. I think we need to order the stores in set_pte_at
> regardless of CONFIG_SMP.
Nah, that shouldn't be necessary.
> Also, I think we should change pte_clear to
> use pte_update() so we only clear the low-order flag bits. Patch will
> be sent shortly for review.
Well... at least at one point we did rely on a PTE page whose PTEs have
all been cleared being entirely blank. I don't know if that's still the
case; I need to look.
Ben.
* Re: SW TLB MMU rework and SMP issues (pte read/write)
From: Kumar Gala @ 2008-07-16 22:12 UTC
To: benh; +Cc: linuxppc-dev list
On Jul 16, 2008, at 4:41 PM, Benjamin Herrenschmidt wrote:
> On Wed, 2008-07-16 at 15:57 -0500, Kumar Gala wrote:
>> This makes sense. I think we need to order the stores in set_pte_at
>> regardless of CONFIG_SMP.
>
> Nah, that shouldn't be necessary.
Yeah, I finally came to that realization.
>> Also, I think we should change pte_clear to
>> use pte_update() so we only clear the low-order flag bits. Patch
>> will be sent shortly for review.
>
> Well... at one point at least we did rely on a PTE page with all PTEs
> cleared to be blank. I don't know if that's still the case, I need to
> look.
It doesn't look like we do anything special; we just call free_pages or
__free_pages in arch/powerpc/mm/pgtable_32.c.
- k