* SW TLB MMU rework and SMP issues
From: Kumar Gala @ 2008-07-15 21:58 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list

Ben,

I've been giving some thought to the new software managed TLBs and SMP
issues. I was wondering if you had any insights on how we should deal
with the following issues:

* tlb invalidates -- need to ensure we don't have multiple tlbsync's
  on the bus. I'm thinking for e500/fsl we will move to IPI based
  invalidate broadcast and do invalidates locally
  (http://patchwork.ozlabs.org/linuxppc/patch?id=19657 )

* 64-bit PTEs and reader vs writer hazards. How do we ensure that the
  TLB miss handler samples a consistent view of the pte? pte_updates
  seem ok since we only update the flag word. However set_pte_at seems
  like it could be problematic.

- k
* Re: SW TLB MMU rework and SMP issues
From: Benjamin Herrenschmidt @ 2008-07-16 2:07 UTC
To: Kumar Gala; +Cc: linuxppc-dev list

On Tue, 2008-07-15 at 16:58 -0500, Kumar Gala wrote:
> Ben,
>
> I've been giving some thought to the new software managed TLBs and SMP
> issues. I was wondering if you had any insights on how we should deal
> with the following issues:

As discussed on IRC (might interest others...)

> * tlb invalidates -- need to ensure we don't have multiple tlbsync's
> on the bus. I'm thinking for e500/fsl we will move to IPI based
> invalidate broadcast and do invalidates locally
> (http://patchwork.ozlabs.org/linuxppc/patch?id=19657 )

Well, you can just have all your invalidations wrapped in a spinlock.
The "trick", of course, is for full-mm invalidates such as page table
teardown or fork: you want to avoid doing a lock/unlock & IPI for every
PTE. One way to do it is some batching, though it isn't trivial.

Without support for TLB invalidate all or invalidate by PID, what you
can maybe do is a manual invalidate by PID with a tlbre/tlbwe loop.
Check the worst-case scenario of walking your entire TLB vs. small
processes that carry only a handful of PTEs....

You can use the batch interface to 'count' things on page table
teardown and decide, based on a threshold of invalidated PTEs, which
approach is more likely to be useful, but you can't really use the
batch interface for fork.

> * 64-bit PTEs and reader vs writer hazards. How do we ensure that the
> TLB miss handler samples a consistent view of the pte? pte_updates
> seem ok since we only update the flag word. However set_pte_at seems
> like it could be problematic.

eieio on the writer and a data dependency on the reader.
segher suggested a nice way to do it on the reader side, by doing a subf of
the value from the pointer and then a lwzx using that value as an offset.

ie. something like that, with r3 containing the PTE pointer:

        lwz     r10,4(r3)
        subf    r4,r10,r3       <-- you can use r3,r10,r3 if clobber is safe
        lwzx    r11,r10,r4      <-- in which case you use r3 here too

That ensures that the top half is loaded after the bottom half, which
is what you want if you do the set_pte_at() that way:

        stw     r11,0(r3)       <-- write top half first
        eieio                   <-- maintain order to coherency domain
        stw     r10,4(r3)       <-- write bottom half last

In fact, in the reader case, while at it, you can interleave that with
the testing of the present bit. Assuming _PAGE_PRESENT is in the low
bits and you can clobber r3, you get something like:

        lwz     r10,4(r3)
        <-- can't do much here unless you can do unrelated things -->
        andi.   r0,r10,_PAGE_PRESENT
        subf    r3,r10,r3
        beq     page_fault
        lwzx    r11,r10,r3

Cheers,
Ben.
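The first half of Ben's reply (serialize all invalidations under one lock, batch them for full-mm operations, and use a count to choose between per-page invalidates and a by-PID sweep of the whole TLB) can be sketched as a small user-space C simulation. This is not kernel code and not the e500 implementation: the TLB is a plain array, the sweep merely stands in for a tlbre/tlbwe loop, and `struct flush_batch`, `batch_add()`, `batch_flush()`, `TLB_ENTRIES` and `SWEEP_THRESHOLD` are all hypothetical names with made-up values.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; real e500 TLB geometry differs. */
#define TLB_ENTRIES     64
#define SWEEP_THRESHOLD 16  /* hypothetical cut-over between strategies */

/* Simulated TLB entry, standing in for what tlbre/tlbwe read and write. */
struct tlb_entry {
    bool      valid;
    uint32_t  pid;
    uintptr_t epn;  /* effective page number */
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* One global spinlock: only one invalidation sequence (one tlbsync) at a time. */
static atomic_flag tlb_lock = ATOMIC_FLAG_INIT;

static void tlb_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&tlb_lock, memory_order_acquire))
        ; /* spin */
}

static void tlb_lock_release(void)
{
    atomic_flag_clear_explicit(&tlb_lock, memory_order_release);
}

/* Pages queued during a full-mm operation such as page table teardown. */
struct flush_batch {
    uint32_t  pid;
    size_t    count;                  /* total pages queued, may exceed capacity */
    uintptr_t epns[SWEEP_THRESHOLD];  /* only the first SWEEP_THRESHOLD are kept */
};

enum flush_strategy { FLUSH_PER_PAGE, FLUSH_SWEEP_PID };

static void batch_add(struct flush_batch *b, uintptr_t epn)
{
    if (b->count < SWEEP_THRESHOLD)
        b->epns[b->count] = epn;
    b->count++;  /* keep counting so the flush can pick a strategy */
}

/* Flush the whole batch under a single lock/unlock instead of one per PTE. */
static enum flush_strategy batch_flush(struct flush_batch *b)
{
    enum flush_strategy s;

    tlb_lock_acquire();
    if (b->count > SWEEP_THRESHOLD) {
        /* Many pages: walk every entry once and drop those for this PID,
         * mimicking a manual invalidate-by-PID tlbre/tlbwe loop. */
        for (size_t i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].pid == b->pid)
                tlb[i].valid = false;
        s = FLUSH_SWEEP_PID;
    } else {
        /* Few pages: invalidate each queued page individually (the array
         * search here only simulates a per-address invalidate). */
        for (size_t n = 0; n < b->count; n++)
            for (size_t i = 0; i < TLB_ENTRIES; i++)
                if (tlb[i].valid && tlb[i].pid == b->pid &&
                    tlb[i].epn == b->epns[n])
                    tlb[i].valid = false;
        s = FLUSH_PER_PAGE;
    }
    tlb_lock_release();

    b->count = 0;
    return s;
}
```

The threshold trades a fixed-cost walk of every TLB entry against per-page invalidates whose cost grows with the number of pages; as Ben says, the right cut-over depends on the worst case of walking the entire TLB versus small processes with only a handful of PTEs. It also only helps where a batch exists, which is why fork is harder.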
* Re: SW TLB MMU rework and SMP issues (pte read/write)
From: Kumar Gala @ 2008-07-16 20:57 UTC
To: benh; +Cc: linuxppc-dev list

>> * 64-bit PTEs and reader vs writer hazards. How do we ensure that the
>> TLB miss handler samples a consistent view of the pte? pte_updates
>> seem ok since we only update the flag word. However set_pte_at seems
>> like it could be problematic.
>
> eieio on the writer and a data dependency on the reader. segher
> suggested a nice way to do it on the reader side, by doing a subf of
> the value from the pointer and then a lwzx using that value as an offset.
>
> ie. something like that, with r3 containing the PTE pointer:
>
>         lwz     r10,4(r3)
>         subf    r4,r10,r3       <-- you can use r3,r10,r3 if clobber is safe
>         lwzx    r11,r10,r4      <-- in which case you use r3 here too
>
> That ensures that the top half is loaded after the bottom half, which
> is what you want if you do the set_pte_at() that way:
>
>         stw     r11,0(r3)       <-- write top half first
>         eieio                   <-- maintain order to coherency domain
>         stw     r10,4(r3)       <-- write bottom half last
>
> In fact, in the reader case, while at it, you can interleave that with
> the testing of the present bit. Assuming _PAGE_PRESENT is in the low
> bits and you can clobber r3, you get something like:
>
>         lwz     r10,4(r3)
>         <-- can't do much here unless you can do unrelated things -->
>         andi.   r0,r10,_PAGE_PRESENT
>         subf    r3,r10,r3
>         beq     page_fault
>         lwzx    r11,r10,r3

This makes sense. I think we need to order the stores in set_pte_at
regardless of CONFIG_SMP. Also, I think we should change pte_clear to
use pte_update() so we only clear the low-order flag bits. Patch will
be sent shortly for review.

- k
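The ordering scheme quoted above works because `subf` computes r3 - r10 and `lwzx` then adds r10 back, so the second load still reads offset 0 (the top half) but its address now depends on the value returned by the first load, which forces it to happen afterwards. C has no portable way to express that address-dependency trick, so the sketch below swaps in C11 release/acquire atomics as an analogue: a release store of the bottom half plays the role of `eieio` between the two `stw`s, and an acquire load of the bottom half plays the role of the dependent `lwzx`. `struct pte64`, `set_pte()`, `read_pte()` and `PTE_PRESENT` are hypothetical names. Like Ben's sequence, it assumes the slot being written is not present beforehand, so a reader that sees the old bottom half takes the fault path instead of pairing it with a new top half.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical present bit, kept in the low (flag) word. */
#define PTE_PRESENT 0x1u

/* 64-bit PTE split as on 32-bit big-endian: top half at offset 0,
 * bottom half (flags) at offset 4. */
struct pte64 {
    _Atomic uint32_t hi;  /* top half */
    _Atomic uint32_t lo;  /* bottom half, holds PTE_PRESENT */
};

/* Writer: top half first, then publish the bottom half.
 * Assumes the slot is not present while it is being written.
 * The release store stands in for eieio between the two stw's. */
static void set_pte(struct pte64 *p, uint64_t val)
{
    atomic_store_explicit(&p->hi, (uint32_t)(val >> 32), memory_order_relaxed);
    atomic_store_explicit(&p->lo, (uint32_t)val, memory_order_release);
}

/* Reader: bottom half first, test the present bit, then the top half.
 * The acquire load stands in for the lwz/subf/lwzx address dependency. */
static bool read_pte(struct pte64 *p, uint64_t *out)
{
    uint32_t lo = atomic_load_explicit(&p->lo, memory_order_acquire);
    if (!(lo & PTE_PRESENT))
        return false;  /* corresponds to the "beq page_fault" path */
    uint32_t hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
    *out = ((uint64_t)hi << 32) | lo;
    return true;
}
```

On PowerPC, a compiler is likely to implement these acquire/release operations with lwsync/isync-style barriers, which cost more than eieio plus a data dependency; that is why the hand-written sequence is attractive inside a TLB miss handler.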
* Re: SW TLB MMU rework and SMP issues (pte read/write)
From: Kumar Gala @ 2008-07-16 21:15 UTC
To: Kumar Gala; +Cc: linuxppc-dev list

On Jul 16, 2008, at 3:57 PM, Kumar Gala wrote:

>>> * 64-bit PTEs and reader vs writer hazards. How do we ensure that the
>>> TLB miss handler samples a consistent view of the pte? pte_updates
>>> seem ok since we only update the flag word. However set_pte_at seems
>>> like it could be problematic.
>>
>> eieio on the writer and a data dependency on the reader. segher
>> suggested a nice way to do it on the reader side, by doing a subf of
>> the value from the pointer and then a lwzx using that value as an offset.
>>
>> ie. something like that, with r3 containing the PTE pointer:
>>
>>         lwz     r10,4(r3)
>>         subf    r4,r10,r3       <-- you can use r3,r10,r3 if clobber is safe
>>         lwzx    r11,r10,r4      <-- in which case you use r3 here too
>>
>> That ensures that the top half is loaded after the bottom half, which
>> is what you want if you do the set_pte_at() that way:
>>
>>         stw     r11,0(r3)       <-- write top half first
>>         eieio                   <-- maintain order to coherency domain
>>         stw     r10,4(r3)       <-- write bottom half last
>>
>> In fact, in the reader case, while at it, you can interleave that with
>> the testing of the present bit. Assuming _PAGE_PRESENT is in the low
>> bits and you can clobber r3, you get something like:
>>
>>         lwz     r10,4(r3)
>>         <-- can't do much here unless you can do unrelated things -->
>>         andi.   r0,r10,_PAGE_PRESENT
>>         subf    r3,r10,r3
>>         beq     page_fault
>>         lwzx    r11,r10,r3
>
> This makes sense. I think we need to order the stores in set_pte_at
> regardless of CONFIG_SMP.

Ok, so I realized that we don't actually need to order the stores,
since the sequential programming model will ensure the right thing
happens.
- k
* Re: SW TLB MMU rework and SMP issues (pte read/write)
From: Benjamin Herrenschmidt @ 2008-07-16 21:41 UTC
To: Kumar Gala; +Cc: linuxppc-dev list

On Wed, 2008-07-16 at 15:57 -0500, Kumar Gala wrote:
> This makes sense. I think we need to order the stores in set_pte_at
> regardless of CONFIG_SMP.

Nah, that shouldn't be necessary.

> Also, I think we should change pte_clear to
> use pte_update() so we only clear the low-order flag bits. Patch will
> be sent shortly for review.

Well... at one point at least we did rely on a PTE page with all PTEs
cleared to be blank. I don't know if that's still the case; I need to
look.

Ben.
* Re: SW TLB MMU rework and SMP issues (pte read/write)
From: Kumar Gala @ 2008-07-16 22:12 UTC
To: benh; +Cc: linuxppc-dev list

On Jul 16, 2008, at 4:41 PM, Benjamin Herrenschmidt wrote:

> On Wed, 2008-07-16 at 15:57 -0500, Kumar Gala wrote:
>> This makes sense. I think we need to order the stores in set_pte_at
>> regardless of CONFIG_SMP.
>
> Nah, that shouldn't be necessary.

Yeah, I finally came to that realization.

>> Also, I think we should change pte_clear to
>> use pte_update() so we only clear the low-order flag bits. Patch will
>> be sent shortly for review.
>
> Well... at one point at least we did rely on a PTE page with all PTEs
> cleared to be blank. I don't know if that's still the case; I need to
> look.

Doesn't look like we do anything special; we just call free_pages or
__free_pages in arch/powerpc/mm/pgtable_32.c.

- k
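Kumar's proposal to route pte_clear through pte_update(), and Ben's question about PTE pages no longer being all-zero, can be illustrated with a short C sketch. `pte_update_lo()` and `pte_clear_lo()` are hypothetical names, not the kernel's functions, and the compare-and-swap loop is a portable stand-in for a lwarx/stwcx. sequence on the low (flag) word.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Split 64-bit PTE: top half (upper address bits) and bottom half (flags). */
struct pte64 {
    _Atomic uint32_t hi;
    _Atomic uint32_t lo;
};

/* Hypothetical pte_update(): atomically clear then set bits in the low
 * word only and return the old low word. The top half is never touched. */
static uint32_t pte_update_lo(struct pte64 *p, uint32_t clr, uint32_t set)
{
    uint32_t old = atomic_load_explicit(&p->lo, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(&p->lo, &old,
                                                  (old & ~clr) | set,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;  /* a failed CAS refreshed 'old'; retry with the new value */
    return old;
}

/* pte_clear() built on pte_update(): zeroes only the flag word. */
static void pte_clear_lo(struct pte64 *p)
{
    pte_update_lo(p, ~0u, 0);
}
```

After such a clear, the top half still holds the old bits, so a page of cleared PTEs is not necessarily zero-filled. That is why it matters whether anything relies on freed PTE pages being blank; per Kumar's check, the 32-bit code simply hands them back with free_pages or __free_pages.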
end of thread

Thread overview: 6+ messages
2008-07-15 21:58 SW TLB MMU rework and SMP issues Kumar Gala
2008-07-16  2:07 ` Benjamin Herrenschmidt
2008-07-16 20:57   ` SW TLB MMU rework and SMP issues (pte read/write) Kumar Gala
2008-07-16 21:15     ` Kumar Gala
2008-07-16 21:41     ` Benjamin Herrenschmidt
2008-07-16 22:12       ` Kumar Gala