linuxppc-dev.lists.ozlabs.org archive mirror
* SW TLB MMU rework and SMP issues
@ 2008-07-15 21:58 Kumar Gala
  2008-07-16  2:07 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 6+ messages in thread
From: Kumar Gala @ 2008-07-15 21:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list

Ben,

I've been giving some thought to the new software managed TLBs and SMP  
issues.  I was wondering if you had any insights on how we should deal  
with the following issues:

* tlb invalidates -- need to ensure we don't have multiple tlbsync's  
on the bus.  I'm thinking for e500/fsl we will move to IPI based  
invalidate broadcast and do invalidates locally
(http://patchwork.ozlabs.org/linuxppc/patch?id=19657)

* 64-bit PTEs and reader vs writer hazards.  How do we ensure that the  
TLB miss handler samples a consistent view of the pte.  pte_updates  
seem ok since we only update the flag word.  However set_pte_at seems  
like it could be problematic.

- k


* Re: SW TLB MMU rework and SMP issues
  2008-07-15 21:58 SW TLB MMU rework and SMP issues Kumar Gala
@ 2008-07-16  2:07 ` Benjamin Herrenschmidt
  2008-07-16 20:57   ` SW TLB MMU rework and SMP issues (pte read/write) Kumar Gala
  0 siblings, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-16  2:07 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev list

On Tue, 2008-07-15 at 16:58 -0500, Kumar Gala wrote:
> Ben,
> 
> I've been giving some thought to the new software managed TLBs and SMP  
> issues.  I was wondering if you had any insights on how we should deal  
> with the following issues:

As discussed on IRC (might interest others...)

> * tlb invalidates -- need to ensure we don't have multiple tlbsync's  
> on the bus.  I'm thinking for e500/fsl we will move to IPI based  
> invalidate broadcast and do invalidates locally
> (http://patchwork.ozlabs.org/linuxppc/patch?id=19657 )

Well, you can just have all your invalidations wrapped in a spinlock.
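
A rough userspace model of that idea in C, for illustration only. The
lock, the counters and flush_tlb_page_serialized() are hypothetical
names, and the body of the critical section is a stand-in for the real
invalidate + tlbsync sequence:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical global lock serializing broadcast invalidates. */
static atomic_flag tlb_bcast_lock = ATOMIC_FLAG_INIT;

static int in_flight;                /* broadcasts currently on the "bus" */
static unsigned long nr_broadcasts;  /* stand-in for the real work done */

static void tlb_lock(void)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&tlb_bcast_lock,
						 memory_order_acquire))
		;
}

static void tlb_unlock(void)
{
	atomic_flag_clear_explicit(&tlb_bcast_lock, memory_order_release);
}

/* Model of one broadcast invalidate for virtual address va. */
static void flush_tlb_page_serialized(unsigned long va)
{
	tlb_lock();
	in_flight++;
	assert(in_flight == 1);  /* never two tlbsyncs at the same time */
	(void)va;                /* real code: invalidate va, then tlbsync */
	nr_broadcasts++;
	in_flight--;
	tlb_unlock();
}
```

The point is only that the whole invalidate-and-sync sequence sits inside
one critical section, so two CPUs can never have it in flight together.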

The "trick", of course, is for full-mm invalidates such as page table
teardown or fork, to avoid doing a lock/unlock and an IPI for every PTE.
One way to do it is some batching, though it isn't trivial.

Without support for TLB invalidate all or by PID, what you can do maybe
is to manually do an invalidate by PID with a tlbre/tlbwe loop. Check
the worst case scenario of walking your entire TLB vs. small processes
that carry only a handful of PTEs....
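
As a sketch of what such a loop amounts to, here is a C model that
treats the TLB as a plain array. The entry layout, TLB_ENTRIES and
tlb_invalidate_pid() are assumptions made for the example; on real
hardware the read and write of each entry would be tlbre and tlbwe:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 64	/* assumed TLB size, not from any specific core */

/* Hypothetical, simplified view of one TLB entry. */
struct tlb_entry {
	bool     valid;
	uint8_t  pid;
	uint32_t epn;	/* effective page number */
};

/* Walk every entry and invalidate those belonging to pid.
 * Returns the number of entries invalidated. */
static size_t tlb_invalidate_pid(struct tlb_entry *tlb, uint8_t pid)
{
	size_t n = 0;

	assert(tlb != NULL);
	for (size_t i = 0; i < TLB_ENTRIES; i++) {
		/* "tlbre": look at entry i */
		if (tlb[i].valid && tlb[i].pid == pid) {
			/* "tlbwe": write it back as invalid */
			tlb[i].valid = false;
			n++;
		}
	}
	return n;
}
```

The cost is always TLB_ENTRIES iterations, whatever the size of the
process, which is the worst case to weigh against per-page invalidates.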

You can use the batch interface to 'count' things on page table teardown
and decide based on a threshold of invalidated PTEs what approach is
more likely to be useful, but can't really use the batch interface for
fork. 
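
A minimal sketch of that counting, assuming a made-up batch structure
and threshold (struct tlb_batch, FLUSH_THRESHOLD and the helpers are
hypothetical, and the right threshold would need measuring). It covers
page table teardown only, not fork:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TLB_ENTRIES     64			/* assumed TLB size */
#define FLUSH_THRESHOLD (TLB_ENTRIES / 2)	/* assumed cut-over point */

/* Hypothetical per-teardown batch state. */
struct tlb_batch {
	size_t nr_invalidated;	/* PTEs invalidated so far */
};

/* Called once for each PTE torn down. */
static void batch_note_pte(struct tlb_batch *b)
{
	assert(b != NULL);
	b->nr_invalidated++;
}

/* true:  walk the whole TLB once and invalidate by PID
 * false: invalidate the touched pages one by one */
static bool batch_wants_full_walk(const struct tlb_batch *b)
{
	assert(b != NULL);
	return b->nr_invalidated > FLUSH_THRESHOLD;
}
```

A small process stays on the per-page path, while a large teardown pays
for a single walk of the TLB instead of one lock and IPI per PTE.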

> * 64-bit PTEs and reader vs writer hazards.  How do we ensure that the  
> TLB miss handler samples a consistent view of the pte.  pte_updates  
> seem ok since we only update the flag word.  However set_pte_at seems  
> like it could be problematic.

eieio on the writer and a data dependency on the reader. Segher
suggested a nice way to do it on the reader side: do a subf of the
value from the pointer, then a lwzx using that value as an offset.

i.e. something like this, with r3 containing the PTE pointer:

	lwz	r10,4(r3)
	subf	r4,r10,r3  <-- you can use r3,r10,r3 if clobber is safe
	lwzx	r11,r10,r4 <-- in which case you use r3 here too

That ensures that the top half is loaded after the bottom half, which
is what you want if you do the set_pte_at() that way:

	stw	r11,0(r3)  <-- write top half first
	eieio	           <-- maintain order to coherency domain
	stw	r10,4(r3)  <-- write bottom half last

In fact, in the reader case, while at it, you can interleave that with
the testing of the present bit. Assuming _PAGE_PRESENT is in the low
bits and you can clobber r3, you get something like:

	lwz	r10,4(r3)
	<-- can't do much here unless you can do unrelated things -->
	andi.	r0,r10,_PAGE_PRESENT
	subf	r3,r10,r3
	beq	page_fault
	lwzx	r11,r10,r3

Cheers,
Ben.


* Re: SW TLB MMU rework and SMP issues (pte read/write)
  2008-07-16  2:07 ` Benjamin Herrenschmidt
@ 2008-07-16 20:57   ` Kumar Gala
  2008-07-16 21:15     ` Kumar Gala
  2008-07-16 21:41     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 6+ messages in thread
From: Kumar Gala @ 2008-07-16 20:57 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev list

>> * 64-bit PTEs and reader vs writer hazards.  How do we ensure that  
>> the
>> TLB miss handler samples a consistent view of the pte.  pte_updates
>> seem ok since we only update the flag word.  However set_pte_at seems
>> like it could be problematic.
>
> eieio on the writer and a data dependency on the reader. segher
> suggested a nice way to do it on the reader side, by doing a subf of  
> the
> value from the pointer and then a lwzx using that value as an offset.
>
> ie. something like that, with r3 containing the PTE pointer:
>
> 	lwz	r10,4(r3)
> 	subf	r4,r10,r3  <-- you can use r3,r10,r3 if clobber is safe
> 	lwzx	r11,r10,r4 <-- in which case you use r3 here too
>
> That ensures that the top half is loaded after the bottom half, which
> is what you want if you do the set_pte_at() that way:
>
> 	stw	r11,0(r3)  <-- write top half first
> 	eieio	           <-- maintain order to coherency domain
> 	stw	r10,4(r3)  <-- write bottom half last
>
> In fact, in the reader case, while at it, you can interleave that with
> the testing of the present bit. Assuming _PAGE_PRESENT is in the low
> bits and you can clobber r3, you get something like:
>
> 	lwz	r10,4(r3)
> 	<-- can't do much here unless you can do unrelated things -->
> 	andi.	r0,r10,_PAGE_PRESENT
> 	subf	r3,r10,r3
> 	beq	page_fault
> 	lwzx	r11,r10,r3

This makes sense.  I think we need to order the stores in set_pte_at  
regardless of CONFIG_SMP.  Also, I think we should change pte_clear to  
use pte_update() so we only clear the low-order flag bits.  Patch will  
be sent shortly for review.

- k


* Re: SW TLB MMU rework and SMP issues (pte read/write)
  2008-07-16 20:57   ` SW TLB MMU rework and SMP issues (pte read/write) Kumar Gala
@ 2008-07-16 21:15     ` Kumar Gala
  2008-07-16 21:41     ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 6+ messages in thread
From: Kumar Gala @ 2008-07-16 21:15 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev list


On Jul 16, 2008, at 3:57 PM, Kumar Gala wrote:

>>> * 64-bit PTEs and reader vs writer hazards.  How do we ensure that  
>>> the
>>> TLB miss handler samples a consistent view of the pte.  pte_updates
>>> seem ok since we only update the flag word.  However set_pte_at  
>>> seems
>>> like it could be problematic.
>>
>> eieio on the writer and a data dependency on the reader. segher
>> suggested a nice way to do it on the reader side, by doing a subf  
>> of the
>> value from the pointer and then a lwzx using that value as an offset.
>>
>> ie. something like that, with r3 containing the PTE pointer:
>>
>> 	lwz	r10,4(r3)
>> 	subf	r4,r10,r3  <-- you can use r3,r10,r3 if clobber is safe
>> 	lwzx	r11,r10,r4 <-- in which case you use r3 here too
>>
>> That ensures that the top half is loaded after the bottom half, which
>> is what you want if you do the set_pte_at() that way:
>>
>> 	stw	r11,0(r3)  <-- write top half first
>> 	eieio	           <-- maintain order to coherency domain
>> 	stw	r10,4(r3)  <-- write bottom half last
>>
>> In fact, in the reader case, while at it, you can interleave that  
>> with
>> the testing of the present bit. Assuming _PAGE_PRESENT is in the low
>> bits and you can clobber r3, you get something like:
>>
>> 	lwz	r10,4(r3)
>> 	<-- can't do much here unless you can do unrelated things -->
>> 	andi.	r0,r10,_PAGE_PRESENT
>> 	subf	r3,r10,r3
>> 	beq	page_fault
>> 	lwzx	r11,r10,r3
>
> This makes sense.  I think we need to order the stores in set_pte_at  
> regardless of CONFIG_SMP.

OK, so I realized that we don't actually need to order the stores,
since the sequential programming model will ensure the right thing
happens.

- k


* Re: SW TLB MMU rework and SMP issues (pte read/write)
  2008-07-16 20:57   ` SW TLB MMU rework and SMP issues (pte read/write) Kumar Gala
  2008-07-16 21:15     ` Kumar Gala
@ 2008-07-16 21:41     ` Benjamin Herrenschmidt
  2008-07-16 22:12       ` Kumar Gala
  1 sibling, 1 reply; 6+ messages in thread
From: Benjamin Herrenschmidt @ 2008-07-16 21:41 UTC (permalink / raw)
  To: Kumar Gala; +Cc: linuxppc-dev list

On Wed, 2008-07-16 at 15:57 -0500, Kumar Gala wrote:
> This makes sense.  I think we need to order the stores in set_pte_at  
> regardless of CONFIG_SMP. 

Nah, that shouldn't be necessary.

>  Also, I think we should change pte_clear to  
> use pte_update() so we only clear the low-order flag bits.  Patch will  
> be sent shortly for review.

Well... at one point at least we did rely on a PTE page with all PTEs
cleared to be blank. I don't know if that's still the case, I need to
look.

Ben.


* Re: SW TLB MMU rework and SMP issues (pte read/write)
  2008-07-16 21:41     ` Benjamin Herrenschmidt
@ 2008-07-16 22:12       ` Kumar Gala
  0 siblings, 0 replies; 6+ messages in thread
From: Kumar Gala @ 2008-07-16 22:12 UTC (permalink / raw)
  To: benh; +Cc: linuxppc-dev list


On Jul 16, 2008, at 4:41 PM, Benjamin Herrenschmidt wrote:

> On Wed, 2008-07-16 at 15:57 -0500, Kumar Gala wrote:
>> This makes sense.  I think we need to order the stores in set_pte_at
>> regardless of CONFIG_SMP.
>
> Nah, that shouldn't be necessary.

Yeah I finally came to that realization.

>> Also, I think we should change pte_clear to
>> use pte_update() so we only clear the low-order flag bits.  Patch  
>> will
>> be sent shortly for review.
>
> Well... at one point at least we did rely on a PTE page with all PTEs
> cleared to be blank. I don't know if that's still the case, I need to
> look.

Doesn't look like we do anything special, we just call free_pages or  
__free_pages in arch/powerpc/mm/pgtable_32.c.

- k

