* kernel mapping
  From: Dan Malek @ 2001-01-15 23:13 UTC
  To: linuxppc-dev

How come we don't use iopa() and friends for all kernel mapping
information?  It is only defined for CONFIG_APUS, but is the right
thing to use on 8xx and 4xx, and probably all processors.  The
virt_to_bus/bus_to_virt contain the quickie arithmetic hack with
KERNELBASE, but that isn't the right thing to do for any kmalloc()
or vmalloc() space or if you don't have BAT mapping.

I am considering making these functions more generic, removing the
#ifdefs, and implementing "simulated" BAT mapping for processors
like the 8xx and 4xx that don't have BATs (not for 2.4, of course :-).

Why shouldn't I do this?

-- Dan

--
I like MMUs because I don't have a real life.
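For reference, the two translation paths being contrasted here look roughly
as follows.  This is only a sketch: the walker names follow the 2.4-era
32-bit ppc tree, not the exact source.

#include <asm/pgtable.h>
#include <asm/page.h>

/* The quickie arithmetic hack: only correct for the linear mapping of
 * RAM at KERNELBASE, not for vmalloc()/ioremap() addresses.
 */
#define arith_virt_to_bus(va)	((unsigned long)(va) - KERNELBASE)

/* An iopa()-style translation: walk the kernel page tables, so any
 * mapped kernel virtual address gives back its physical address.
 */
unsigned long iopa_sketch(unsigned long addr)
{
	pmd_t *pd;
	pte_t *pg;
	unsigned long pa = 0;

	pd = pmd_offset(pgd_offset_k(addr), addr);
	if (!pmd_none(*pd)) {
		pg = pte_offset(pd, addr);
		pa = (pte_val(*pg) & PAGE_MASK) | (addr & ~PAGE_MASK);
	}
	return pa;
}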
* Re: kernel mapping
  From: Frank Rowand @ 2001-01-16 3:07 UTC
  To: Dan Malek; +Cc: linuxppc-dev

Dan Malek wrote:
>
> How come we don't use iopa() and friends for all kernel mapping
> information?  It is only defined for CONFIG_APUS, but is the right
> thing to use on 8xx and 4xx, and probably all processors.  The
> virt_to_bus/bus_to_virt contain the quickie arithmetic hack with
> KERNELBASE, but that isn't the right thing to do for any kmalloc()
> or vmalloc() space or if you don't have BAT mapping.
>
> I am considering making these functions more generic, removing the
> #ifdefs, and implementing "simulated" BAT mapping for processors
> like the 8xx and 4xx that don't have BATs (not for 2.4, of course :-).
>
> Why shouldn't I do this?
>
> -- Dan

For the 405 I had to use iopa() for virt_to_bus() because there are
cases where I create a virtual address for IO buffers that is
uncached, and that virtual address is not (physical address + KERNELBASE).

I also have the beginnings of simulated BAT mapping for the 405
(not quite there, but part way).

-Frank
--
Frank Rowand <frank_rowand@mvista.com>
MontaVista Software, Inc
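A minimal sketch of the situation Frank describes, written against the
generic ioremap() interface; the helper name and the assumption that iopa()
is built in on this configuration are purely illustrative.

#include <asm/io.h>
#include <asm/pgtable.h>

/* Hypothetical helper: map a device buffer uncached and report its bus
 * address.  The uncached virtual address lives in ioremap/vmalloc space,
 * so (va - KERNELBASE) would be wrong; a page-table walk such as iopa()
 * is what yields the real physical/bus address.
 */
static unsigned long map_uncached_buffer(unsigned long phys,
					 unsigned long size, void **va)
{
	*va = ioremap(phys, size);	/* uncached mapping, not the linear one */
	if (*va == NULL)
		return 0;
	return iopa((unsigned long)*va);
}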
* Re: kernel mapping
  From: Dan Malek @ 2001-01-16 3:55 UTC
  To: frowand; +Cc: linuxppc-dev

Frank Rowand wrote:

> I also have the beginnings of simulated BAT mapping for the 405
> (not quite there, but part way).

I know.  I'm currently re-writing it all to be generic.  All of the
4xx specific pinned entry stuff is gone.  I've been thinking about
this too long for the 8xx and now have a reason to do it.  Some of
the logic is in the iopa and mm_ptov functions now, which will also
just work fine on the 6xx/7xx/7xxx.

-- Dan
* Re: kernel mapping
  From: Ralph Blach @ 2001-01-16 11:37 UTC
  To: frowand; +Cc: linuxppc-dev

Why do we need simulated BAT registers?

Chip

Frank Rowand wrote:
>
> Dan Malek wrote:
> >
> > How come we don't use iopa() and friends for all kernel mapping
> > information?  It is only defined for CONFIG_APUS, but is the right
> > thing to use on 8xx and 4xx, and probably all processors.  The
> > virt_to_bus/bus_to_virt contain the quickie arithmetic hack with
> > KERNELBASE, but that isn't the right thing to do for any kmalloc()
> > or vmalloc() space or if you don't have BAT mapping.
> >
> > I am considering making these functions more generic, removing the
> > #ifdefs, and implementing "simulated" BAT mapping for processors
> > like the 8xx and 4xx that don't have BATs (not for 2.4, of course :-).
> >
> > Why shouldn't I do this?
> >
> > -- Dan
>
> For the 405 I had to use iopa() for virt_to_bus() because there are
> cases where I create a virtual address for IO buffers that is
> uncached, and that virtual address is not (physical address + KERNELBASE).
> I also have the beginnings of simulated BAT mapping for the 405
> (not quite there, but part way).
>
> -Frank
> --
> Frank Rowand <frank_rowand@mvista.com>
> MontaVista Software, Inc
* Re: kernel mapping
  From: Dan Malek @ 2001-01-16 16:50 UTC
  To: Ralph Blach; +Cc: frowand, linuxppc-dev

Ralph Blach wrote:
>
> Why do we need simulated BAT registers?

To improve performance.  Right now, on the 4xx there is the
concept of "pinned" TLB entries to reduce/eliminate TLB misses
on large mapped areas (like kernel text/data or I/O).  The 8xx
does this in some custom applications as well.  These are just
hacks that are headed down a disastrous maintenance path that
need to be stopped now for a more generic solution.

I have been experimenting with many different methods of using
the "large" page table sizes through the generic memory management
methods that already exist in the kernel.  I believe I can wrap
the concept of the pinned TLB entries into the same logic as BAT
register management on the bigger processors.  Hence, I call them
simulated BAT registers... the semantics aren't quite the same.

The BAT registers are a really good thing, and although the large
page size TLB entries are more flexible, they require more software
overhead.  I would like to make some generic Linux MM modifications
to help us support variable page sizes, but I suspect that will
never happen.

-- Dan
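For readers without the 6xx books handy: a BAT pair maps one large,
naturally aligned block through a pair of special-purpose registers, so that
block never takes a TLB miss.  A rough sketch of mapping the first 256MB of
RAM at KERNELBASE follows; the SPR names are as spelled in the ppc tree of
that era, the field encodings are recalled from the 32-bit PowerPC books,
and the kernel really does this in early boot assembly (head.S), not in C.

#include <asm/processor.h>	/* mtspr(), SPRN_DBAT0U/L, SPRN_IBAT0U/L */
#include <asm/page.h>		/* KERNELBASE */

#define BAT_BL_256M	(0x7ff << 2)	/* block length field for 256MB */
#define BAT_VS		0x2		/* valid for supervisor accesses */
#define BAT_PP_RW	0x2		/* read/write, WIMG = 0 (cached) */

static void map_lowmem_with_bat0(void)
{
	mtspr(SPRN_DBAT0U, KERNELBASE | BAT_BL_256M | BAT_VS);
	mtspr(SPRN_DBAT0L, 0x00000000 | BAT_PP_RW);	/* physical 0, cached, RW */
	mtspr(SPRN_IBAT0U, KERNELBASE | BAT_BL_256M | BAT_VS);
	mtspr(SPRN_IBAT0L, 0x00000000 | BAT_PP_RW);
}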
* Re: kernel mapping
  From: Ralph Blach @ 2001-01-16 17:10 UTC
  To: Dan Malek; +Cc: frowand, linuxppc-dev

Dan,

Thanks for the info.  I agree that pinned TLBs could be a maintenance
headache, with each 4xx/8xx chip requiring a different set of pinned
TLBs.

Chip

Dan Malek wrote:
>
> Ralph Blach wrote:
> >
> > Why do we need simulated BAT registers?
>
> To improve performance.  Right now, on the 4xx there is the
> concept of "pinned" TLB entries to reduce/eliminate TLB misses
> on large mapped areas (like kernel text/data or I/O).  The 8xx
> does this in some custom applications as well.  These are just
> hacks that are headed down a disastrous maintenance path that
> need to be stopped now for a more generic solution.
>
> I have been experimenting with many different methods of using
> the "large" page table sizes through the generic memory management
> methods that already exist in the kernel.  I believe I can wrap
> the concept of the pinned TLB entries into the same logic as BAT
> register management on the bigger processors.  Hence, I call them
> simulated BAT registers... the semantics aren't quite the same.
>
> The BAT registers are a really good thing, and although the large
> page size TLB entries are more flexible, they require more software
> overhead.  I would like to make some generic Linux MM modifications
> to help us support variable page sizes, but I suspect that will
> never happen.
>
> -- Dan
* Re: kernel mapping
  From: David Edelsohn @ 2001-01-16 17:47 UTC
  To: Dan Malek; +Cc: Ralph Blach, frowand, linuxppc-dev

>>>>> Dan Malek writes:

Dan> I have been experimenting with many different methods of using
Dan> the "large" page table sizes through the generic memory management
Dan> methods that already exist in the kernel.  I believe I can wrap
Dan> the concept of the pinned TLB entries into the same logic as BAT
Dan> register management on the bigger processors.  Hence, I call them
Dan> simulated BAT registers... the semantics aren't quite the same.

Note that forthcoming 64-bit PowerPC chips from IBM utilize
multiple page sizes and no longer provide BAT registers.  "BAT register
management on the bigger processors" is a misnomer.

David
* Re: kernel mapping
  From: Dan Malek @ 2001-01-16 21:57 UTC
  To: David Edelsohn; +Cc: Ralph Blach, frowand, linuxppc-dev

David Edelsohn wrote:

> Note that forthcoming 64-bit PowerPC chips from IBM utilize
> multiple page sizes and no longer provide BAT registers.  "BAT register
> management on the bigger processors" is a misnomer.

No problem... I'm preparing... Thanks for the info.

-- Dan

--
I like MMUs because I don't have a real life.
* Re: kernel mapping
  From: Gabriel Paubert @ 2001-01-17 10:51 UTC
  To: David Edelsohn; +Cc: linuxppc-dev

On Tue, 16 Jan 2001, David Edelsohn wrote:

> Note that forthcoming 64-bit PowerPC chips from IBM utilize
> multiple page sizes and no longer provide BAT registers.  "BAT register
> management on the bigger processors" is a misnomer.

How is it implemented?  Is there any documentation available on the web?

Regards,
Gabriel.
* Re: kernel mapping
  From: David Edelsohn @ 2001-01-17 17:45 UTC
  To: Gabriel Paubert; +Cc: linuxppc-dev

>>>>> Gabriel Paubert writes:

Gabriel> How is it implemented?  Is there any documentation available
Gabriel> on the web?

See Paul DeMone's write-up at the Real World Technologies website and
the papers from Microprocessor Forum 1999 and 2000, Microprocessor
Report 1999, and discussions in comp.arch on USENET.

David
* Re: kernel mapping
  From: Frank Rowand @ 2001-01-16 19:56 UTC
  To: Dan Malek; +Cc: Ralph Blach, frowand, linuxppc-dev

Dan Malek wrote:
>
> Ralph Blach wrote:
> >
> > Why do we need simulated BAT registers?
>
> To improve performance.  Right now, on the 4xx there is the
> concept of "pinned" TLB entries to reduce/eliminate TLB misses
> on large mapped areas (like kernel text/data or I/O).  The 8xx
> does this in some custom applications as well.  These are just
> hacks that are headed down a disastrous maintenance path that
> need to be stopped now for a more generic solution.

At the moment, the 405 processors _require_ kernel memory to be
pinned because the tlb miss handlers use virtual addresses.  When I
started the 405 port I planned to move the TLB handlers into assembly
running in real mode.  Then when I started seeing info about the 440
I backed away from that plan because the 440 always runs with the MMU
enabled.  I'm still thinking about the 440...

I pinned some IO ranges as a convenience when I was first porting
to the 405gp but plan to remove those pins.  Though I'm somewhat
tempted to leave a pin in place for the on-chip ethernet device if
performance measurements show a significant gain.

> I have been experimenting with many different methods of using
> the "large" page table sizes through the generic memory management
> methods that already exist in the kernel.  I believe I can wrap
> the concept of the pinned TLB entries into the same logic as BAT
> register management on the bigger processors.  Hence, I call them
> simulated BAT registers... the semantics aren't quite the same.

I think that's a good idea.  If you do so, please provide a way to
force an entry to be locked in the tlb.

> The BAT registers are a really good thing, and although the large
> page size TLB entries are more flexible, they require more software
> overhead.  I would like to make some generic Linux MM modifications
> to help us support variable page sizes, but I suspect that will
> never happen.
>
> -- Dan

I've toyed with the variable page sizes idea too, and it just hasn't
moved up high enough on my priority list.  I'm not sure I'm quite as
pessimistic as you about whether it will ever happen because several
other architectures support variable page sizes (including pa-risc
and (I think) IA-64).

-Frank
--
Frank Rowand <frank_rowand@mvista.com>
MontaVista Software, Inc
* Re: kernel mapping
  From: Dan Malek @ 2001-01-16 22:13 UTC
  To: frowand; +Cc: Ralph Blach, linuxppc-dev

Frank Rowand wrote:

> At the moment, the 405 processors _require_ kernel memory to be
> pinned because the tlb miss handlers use virtual addresses.

I changed that, too.  It works like the other processors, in
particular, the 8xx.

> ......... Then when I started seeing info
> about the 440 I backed away from that plan because the 440 always
> runs with the MMU enabled.  I'm still thinking about the 440...

No way... Dammit, can't you IBM guys follow your own rules :-).

> I pinned some IO ranges as a convenience when I was first porting
> to the 405gp but plan to remove those pins.

Those are actually performance advantages, and I am doing that
on some 8xx applications.  The difference now is we don't have
to actually allocate specific "pinned" entries, the large mapping
will just happen as part of the TLB reload.

> I think that's a good idea.  If you do so, please provide a way to
> force an entry to be locked in the tlb.

Nope.  I don't want to do that.  Then you have to make processor
specific trade offs, or incur high management overhead like the
405 does now.  For example, some of the processors allow a fixed
number of locked entries, but you have to trade off what you will
put there against losing TLB entries.  Or, you do like the 405
does and create a "software" locking, losing the use of some
very functional TLB management instructions.

By not locking entries and using large page table entries you don't
need to have processor unique configurations that are cumbersome
or unworkable on lesser featured processors.  You also let the
system operation find the best distribution of TLB entries.  Yes,
there is a clearly visible latency concern with loading TLBs, but
considering the amount of context we are switching these days a
single large page TLB miss is insignificant.

> I've toyed with the variable page sizes idea too, and it just hasn't
> moved up high enough on my priority list.  I'm not sure I'm quite as
> pessimistic as you about whether it will ever happen

It's going to happen with the 405 merge.  It has to because I have
already screwed up and coded myself into a corner, and I want the
same features on the 8xx already as well.

-- Dan

--
I like MMUs because I don't have a real life.
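The "large mapping as part of the TLB reload" idea can be pictured as a
small table consulted by the software reload path before the normal PTE
walk.  Everything below (the structure, its names, the table size) is
invented purely for illustration and is not Dan's actual code.

/* A made-up "simulated BAT" descriptor: one entry covers a large,
 * naturally aligned region with a single attribute set.
 */
struct sim_bat {
	unsigned long	vstart;		/* first virtual address covered */
	unsigned long	vend;		/* one past the last address     */
	unsigned long	pstart;		/* physical base of the region   */
	unsigned long	flags;		/* cache/guard attributes        */
};

#define NR_SIM_BATS	4
static struct sim_bat sim_bats[NR_SIM_BATS];

/* Called from the TLB reload path.  If the faulting address falls
 * inside a simulated BAT, return its physical translation so the
 * handler can write one large-page TLB entry; otherwise return 0 and
 * fall back to the ordinary page table walk.  Nothing is ever locked
 * in the TLB -- the large entry is simply reloaded on demand.
 */
static unsigned long sim_bat_translate(unsigned long ea)
{
	int i;

	for (i = 0; i < NR_SIM_BATS; i++) {
		struct sim_bat *b = &sim_bats[i];

		if (ea >= b->vstart && ea < b->vend)
			return b->pstart + (ea - b->vstart);
	}
	return 0;
}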
* Re: kernel mapping
  From: Frank Rowand @ 2001-01-17 0:04 UTC
  To: Dan Malek; +Cc: frowand, linuxppc-dev

Dan Malek wrote:
>
> Frank Rowand wrote:

I have only a small amount of performance instrumentation and
measurements.  Some of what I have to say is based on observation,
inference, and conjecture...

> > I pinned some IO ranges as a convenience when I was first porting
> > to the 405gp but plan to remove those pins.
>
> Those are actually performance advantages, and I am doing that
> on some 8xx applications.  The difference now is we don't have
> to actually allocate specific "pinned" entries, the large mapping
> will just happen as part of the TLB reload.

The IO ranges that I pinned were all just a 4k page (except the 64k
"page" for PCI IO space, which shouldn't be accessed much except for
PCI device initialization).  So the only performance advantage I
gained was avoiding TLB misses, not from large pages.

> > I think that's a good idea.  If you do so, please provide a way to
> > force an entry to be locked in the tlb.
>
> Nope.  I don't want to do that.  Then you have to make processor
> specific trade offs, or incur high management overhead like the
> 405 does now.  For example, some of the processors allow a fixed
> number of locked entries, but you have to trade off what you will
> put there against losing TLB entries.  Or, you do like the 405
> does and create a "software" locking, losing the use of some
> very functional TLB management instructions.

The 405 core (and thus the many processors based on it) has a 64 entry
tlb.  While debugging via a JTAG debugger I have observed that the
tlb very quickly gets filled with entries for the current context.
It is extremely rare to see entries for a different context left over.
From this, I infer that the tlb is not large enough to hold a working
set.  (If I was still working as a performance geek, I would find this
an interesting area to instrument.)

Locking a few kernel entries in the tlb means that the majority of the
kernel's working set _is_ in the tlb at all times.  Here is a simple
measurement of tlb misses (running a simple load of copying nfs mounted
files around, etc):

  dtlb misses:  34679326   <--- data tlb
  itlb misses:  33075725   <--- instruction tlb
  d + i misses: 67755051

  ktlb misses:    233683   <--- kernel addresses
  utlb misses:  67521368   <--- user space addresses
  k + u misses: 67755051

If you want to repeat the measurement with other workloads, just
cat /proc/ppc_htab in my kernel to get the above data.

For the 405, the only tlb management instruction I sacrificed was the
tlbia (invalidate the entire tlb) that I would have used for
PPC4xx_tlb_flush_all(), which is used by flush_tlb_all(), which is
only called from:

  ppc_htab_write()
  mmu_context_overflow()
  vmfree_area_pages()
  vmalloc_area_pages()
  flush_all_zero_pkmaps()

Which doesn't seem to be much of a sacrifice for a large gain.

> By not locking entries and using large page table entries you don't
> need to have processor unique configurations that are cumbersome
> or unworkable on lesser featured processors.  You also let the
> system operation find the best distribution of TLB entries.  Yes,

The tlb is not large enough to accumulate a working set in the tlb,
so system operation never finds the best distribution of tlb entries.
> there is a clearly visible latency concern with loading TLBs, but
> considering the amount of context we are switching these days a
> single large page TLB miss is insignificant.

It will be nice to have large page TLB implemented.

-Frank
--
Frank Rowand <frank_rowand@mvista.com>
MontaVista Software, Inc
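For scale, the figures Frank quotes work out as follows: with the kernel
ranges pinned, kernel-address reloads account for 233683 / 67755051, or
roughly 0.3% of all TLB misses in that run, which is consistent with his
point that the locked entries keep the kernel's working set resident.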
* Re: kernel mapping
  From: Dan Malek @ 2001-01-17 7:02 UTC
  To: frowand; +Cc: linuxppc-dev

Frank Rowand wrote:

> The 405 core (and thus the many processors based on it) has a 64 entry
> tlb.  While debugging via a JTAG debugger I have observed that the
> tlb very quickly gets filled with entries for the current context.

Well, that's because we force it to do that.  Programs that have a
huge working set and run for an extended period will fill the TLB.
Programs with small working sets will not if we actually use the
contexts properly, but we don't.  The way contexts are used today,
it is effectively flushing the TLB on every switch.  We do the same
thing with VSIDs on the "bigger" processors, and this isn't right
either.

The Linux VM properly manages memory contexts, and we should extend
this into the PowerPC specific software.  I have an LRU context
algorithm for the 8xx, and I want to extend this into the 4xx and
even the other processors.  The idea is right, but I need something
that will scale beyond the small 16 contexts of the 8xx.  I just
don't have that yet.

-- Dan

--
I like MMUs because I don't have a real life.
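One way to picture the LRU scheme Dan alludes to (all names and details
below are invented for illustration; this is not his 8xx code): keep a
small pool of hardware context numbers, and when they run out, steal the
least recently used one and invalidate only that context's TLB entries,
instead of flushing everything on every overflow.

#define NR_HW_CONTEXTS	16	/* the 8xx has 16 hardware context numbers */

static unsigned long ctx_stamp[NR_HW_CONTEXTS];	/* last time each was used */

/* Hypothetical helper: invalidate only the TLB entries tagged with one
 * context number, rather than the whole TLB.
 */
extern void flush_tlb_context(unsigned long ctx);

/* When every hardware context is in use, evict the least recently used
 * one and hand its number to the incoming address space.  The previous
 * owner simply re-allocates a context the next time it runs.
 */
static unsigned long steal_lru_context(unsigned long now)
{
	unsigned long victim = 0, i;

	for (i = 1; i < NR_HW_CONTEXTS; i++)
		if (ctx_stamp[i] < ctx_stamp[victim])
			victim = i;

	flush_tlb_context(victim);
	ctx_stamp[victim] = now;
	return victim;
}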