* 405 TLB miss reduction
@ 2003-12-10 14:43 Wolfgang Grandegger
2003-12-10 16:03 ` Matt Porter
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Wolfgang Grandegger @ 2003-12-10 14:43 UTC (permalink / raw)
To: linuxppc-embedded
Hello,
we are suffering from TLB misses on a 405GP processor, eating up to
10% of the CPU power when running our (rather big) application. We
can regain a few percent by using the kernel option CONFIG_PIN_TLB
but we are thinking about further kernel modifications to reduce
TLB misses. What comes to my mind is:
- using a kernel PAGE_SIZE of 8KB (instead of 4KB).
- using large-page TLB entries.
Has anybody already investigated the effort or benefit of such
changes or knows about other (simple) measures (apart from
replacing the hardware)?
TIA.
Wolfgang.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
* Re: 405 TLB miss reduction
  2003-12-10 14:43 405 TLB miss reduction Wolfgang Grandegger
@ 2003-12-10 16:03 ` Matt Porter
  2003-12-11  9:06   ` Wolfgang Grandegger
  2003-12-10 17:08 ` Dan Malek
  2003-12-11 16:44 ` Jon Masters
  2 siblings, 1 reply; 14+ messages in thread
From: Matt Porter @ 2003-12-10 16:03 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: linuxppc-embedded

On Wed, Dec 10, 2003 at 03:43:41PM +0100, Wolfgang Grandegger wrote:
> Hello,
>
> we are suffering from TLB misses on a 405GP processor, eating up to
> 10% of the CPU power when running our (rather big) application. We
> can regain a few percent by using the kernel option CONFIG_PIN_TLB
> but we are thinking about further kernel modifications to reduce
> TLB misses. What comes to my mind is:
>
> - using a kernel PAGE_SIZE of 8KB (instead of 4KB).
> - using large-page TLB entries.
>
> Has anybody already investigated the effort or benefit of such
> changes or knows about other (simple) measures (apart from
> replacing the hardware)?

David Gibson and Paul M. implemented large TLB kernel lowmem support
in 2.5/2.6 for 405. It allows large TLB entries to be loaded on kernel
lowmem TLB misses. This is better than CONFIG_PIN_TLB since it works
for all of your kernel lowmem system memory rather than the fixed
amount of memory that CONFIG_PIN_TLB covers.

I've been thinking about enabling a variant of Andi Kleen's patch to
allow modules to be loaded into kernel lowmem space instead of vmalloc
space (to avoid the performance penalty of modular drivers). This
takes advantage of the large kernel lowmem 405 support above; on 440,
all kernel lowmem is in a pinned TLB for architectural reasons.

I've also been thinking about dynamically using large TLB/PTE mappings
for ioremap on 405/440.

In 2.6, there is hugetlb userspace infrastructure that could be
enabled for the large page sizes on 4xx.

Allowing a compile-time choice of default page size would also be
useful.

Basically, all of these cases can provide a performance advantage
depending on your embedded application...it all depends on what your
application is doing.

-Matt
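[Editorial note: the lowmem miss-handler policy Matt describes can be modeled in C. This is a hypothetical rendering added to the archive; the real 405 handler is assembly in the kernel's head_4xx.S, and the constants here (KERNELBASE at 0xC0000000, 256 MB of lowmem, 16 MB large pages) are assumptions for illustration.]

```c
#include <stdint.h>

#define KERNELBASE 0xC0000000UL                  /* assumed */
#define LOWMEM_TOP (KERNELBASE + 0x10000000UL)   /* assume 256 MB lowmem */
#define SIZE_16M   0x01000000UL
#define SIZE_4K    0x00001000UL

struct tlb_entry { uintptr_t base; unsigned long size; };

/* Model of the decision only: a miss on kernel lowmem gets one large
 * 16 MB entry covering the faulting address; everything else falls
 * back to a normal 4 KB PTE lookup. */
static struct tlb_entry pick_entry(uintptr_t addr)
{
    struct tlb_entry e;

    if (addr >= KERNELBASE && addr < LOWMEM_TOP) {
        e.base = addr & ~(SIZE_16M - 1);  /* 16 MB aligned */
        e.size = SIZE_16M;
    } else {
        e.base = addr & ~(SIZE_4K - 1);
        e.size = SIZE_4K;
    }
    return e;
}
```

One 16 MB entry then serves every subsequent kernel access within that region, which is why this beats pinning a fixed window as CONFIG_PIN_TLB does.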
* Re: 405 TLB miss reduction
  2003-12-10 16:03 ` Matt Porter
@ 2003-12-11  9:06   ` Wolfgang Grandegger
  2003-12-11 15:46     ` Dan Malek
  2003-12-11 17:45     ` Matt Porter
  0 siblings, 2 replies; 14+ messages in thread
From: Wolfgang Grandegger @ 2003-12-11 9:06 UTC (permalink / raw)
To: Matt Porter; +Cc: linuxppc-embedded

On 12/10/2003 05:03 PM Matt Porter wrote:
> On Wed, Dec 10, 2003 at 03:43:41PM +0100, Wolfgang Grandegger wrote:
>> Hello,
>>
>> we are suffering from TLB misses on a 405GP processor, eating up to
>> 10% of the CPU power when running our (rather big) application. We
>> can regain a few percent by using the kernel option CONFIG_PIN_TLB
>> but we are thinking about further kernel modifications to reduce
>> TLB misses. What comes to my mind is:
>>
>> - using a kernel PAGE_SIZE of 8KB (instead of 4KB).
>> - using large-page TLB entries.
>>
>> Has anybody already investigated the effort or benefit of such
>> changes or knows about other (simple) measures (apart from
>> replacing the hardware)?
>
> David Gibson and Paul M. implemented large TLB kernel lowmem
> support in 2.5/2.6 for 405. It allows large TLB entries
> to be loaded on kernel lowmem TLB misses. This is better than
> CONFIG_PIN_TLB since it works for all of your kernel lowmem
> system memory rather than the fixed amount of memory that
> CONFIG_PIN_TLB covers.

Ah, I will have a look at 2.5/2.6. Is there a backport for 2.4?

> I've been thinking about enabling a variant of Andi Kleen's patch
> to allow modules to be loaded into kernel lowmem space instead of
> vmalloc space (to avoid the performance penalty of modular drivers).
> This takes advantage of the large kernel lowmem 405 support above
> and on 440 all kernel lowmem is in a pinned TLB for architectural
> reasons.

Is this patch available somewhere? It would be interesting to measure
the improvement for our application.

> I've also been thinking about dynamically using large TLB/PTE mappings
> for ioremap on 405/440.

OK, I expect not so much benefit from this measure, but it depends on
the application, of course.

> In 2.6, there is hugetlb userspace infrastructure that could be enabled
> for the large page sizes on 4xx.

But this sounds more promising. Same question as above: is there a
backport for 2.4?

> Allowing a compile time choice of default page size would also be useful.

Increasing the page size from 4 to 8 KB should, in theory, halve the
page misses (if no large TLB pages are used). Unfortunately, increasing
the page size seems not straightforward, as it is used statically in
various places, and maybe glibc needs to be rebuilt as well.

> Basically, all of these cases can provide a performance advantage
> depending on your embedded application...it all depends on what your
> application is doing.

Of course, and tweaking the kernel for a dedicated application might
not be worth the effort. Anyhow, I now have a better idea what else
can be done. Thanks.

Wolfgang.
* Re: 405 TLB miss reduction
  2003-12-11  9:06 ` Wolfgang Grandegger
@ 2003-12-11 15:46   ` Dan Malek
  1 sibling, 0 replies; 14+ messages in thread
From: Dan Malek @ 2003-12-11 15:46 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: Matt Porter, linuxppc-embedded

Wolfgang Grandegger wrote:

> Increasing the page size from 4 to 8 kB should, in theory, halve the
> page misses (if no large TLB pages are used).

It depends entirely on locality of reference. Without doing any kind
of execution analysis (which isn't the proper engineering practice),
you could assume it would help instruction pages and have little
effect on data pages.

> ... Unfortunately, increasing the page size seems not
> straightforward, as it is used statically in various places, and
> maybe glibc needs to be rebuilt as well.

The MIPS port uses various (but static) page sizes depending upon the
requirements of the processor core. IIRC, their glibc can handle this
at run time. Maybe Drow can add some comments here. In any case, there
are already kernel and user reference ports we should leverage if we
intend to go down this path.

Thanks.

-- Dan
* Re: 405 TLB miss reduction
  2003-12-11  9:06 ` Wolfgang Grandegger
  2003-12-11 15:46 ` Dan Malek
@ 2003-12-11 17:45 ` Matt Porter
  2003-12-12  9:50   ` Wolfgang Grandegger
  2003-12-15 11:26   ` Joakim Tjernlund
  1 sibling, 2 replies; 14+ messages in thread
From: Matt Porter @ 2003-12-11 17:45 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: Matt Porter, linuxppc-embedded

On Thu, Dec 11, 2003 at 10:06:32AM +0100, Wolfgang Grandegger wrote:
> On 12/10/2003 05:03 PM Matt Porter wrote:
>> David Gibson and Paul M. implemented large TLB kernel lowmem
>> support in 2.5/2.6 for 405. It allows large TLB entries
>> to be loaded on kernel lowmem TLB misses. This is better than
>> CONFIG_PIN_TLB since it works for all of your kernel lowmem
>> system memory rather than the fixed amount of memory that
>> CONFIG_PIN_TLB covers.
>
> Ah, I will have a look at 2.5/2.6. Is there a backport for 2.4?

No.

>> I've been thinking about enabling a variant of Andi Kleen's patch
>> to allow modules to be loaded into kernel lowmem space instead of
>> vmalloc space (to avoid the performance penalty of modular drivers).
>> This takes advantage of the large kernel lowmem 405 support above
>> and on 440 all kernel lowmem is in a pinned TLB for architectural
>> reasons.
>
> Is this patch available somewhere? It would be interesting to measure
> the improvement for our application.

Google is your friend.
http://seclists.org/lists/linux-kernel/2002/Oct/6522.html
IIRC, there's a later version with some minor differences.

>> I've also been thinking about dynamically using large TLB/PTE mappings
>> for ioremap on 405/440.
>
> OK, I expect not so much benefit from this measure, but it depends on
> the application, of course.

Yes, I've seen a lot of apps with huge shared memory areas across PCI
that can benefit from this...they used BATs on classic PPCs.

>> In 2.6, there is hugetlb userspace infrastructure that could be enabled
>> for the large page sizes on 4xx.
>
> But this sounds more promising. Same question as above: is there a
> backport for 2.4?

No.

>> Allowing a compile time choice of default page size would also be useful.
>
> Increasing the page size from 4 to 8 kB should, in theory, halve the
> page misses (if no large TLB pages are used). Unfortunately,
> increasing the page size seems not straightforward, as it is used
> statically in various places, and maybe glibc needs to be rebuilt as
> well.

Possibly. As Dan mentions, there are other arches already doing this
type of thing. I know ia64 does, and it sounds like MIPS is another.

>> Basically, all of these cases can provide a performance advantage
>> depending on your embedded application...it all depends on what your
>> application is doing.
>
> Of course, and tweaking the kernel for a dedicated application might
> not be worth the effort. Anyhow, I now have a better idea what else
> can be done.

When I used to do apps work, we were very performance sensitive
(depends on your project, of course) and we were very willing to make
kernel tweaks (proprietary RTOS) to meet our requirements. It all
depends on your requirements, constraints, budget, etc. :)

-Matt
* Re: 405 TLB miss reduction
  2003-12-11 17:45 ` Matt Porter
@ 2003-12-12  9:50   ` Wolfgang Grandegger
  1 sibling, 0 replies; 14+ messages in thread
From: Wolfgang Grandegger @ 2003-12-12 9:50 UTC (permalink / raw)
To: Matt Porter; +Cc: linuxppc-embedded

On 12/11/2003 06:45 PM Matt Porter wrote:
> On Thu, Dec 11, 2003 at 10:06:32AM +0100, Wolfgang Grandegger wrote:
>> Ah, I will have a look at 2.5/2.6. Is there a backport for 2.4?
>
> No.
>
>> Is this patch available somewhere? It would be interesting to
>> measure the improvement for our application.
>
> Google is your friend.
> http://seclists.org/lists/linux-kernel/2002/Oct/6522.html
> IIRC, there's a later version with some minor differences.
>
>> OK, I expect not so much benefit from this measure, but it depends
>> on the application, of course.
>
> Yes, I've seen a lot of apps with huge shared memory areas across PCI
> that can benefit from this...they used BATs on classic PPCs.
>
>> But this sounds more promising. Same question as above: is there a
>> backport for 2.4?
>
> No.
>
>> Increasing the page size from 4 to 8 kB should, in theory, halve the
>> page misses (if no large TLB pages are used). Unfortunately,
>> increasing the page size seems not straightforward, as it is used
>> statically in various places, and maybe glibc needs to be rebuilt
>> as well.
>
> Possibly. As Dan mentions, there are other arches already doing this
> type of thing. I know ia64 does, and it sounds like MIPS is another.
>
>> Of course, and tweaking the kernel for a dedicated application might
>> not be worth the effort. Anyhow, I now have a better idea what else
>> can be done.
>
> When I used to do apps work, we were very performance sensitive
> (depends on your project, of course) and we were very willing to make
> kernel tweaks (proprietary RTOS) to meet our requirements. It all
> depends on your requirements, constraints, budget, etc. :)

Well, time and money are usually scarce resources :-(. Anyhow, this
thread showed me that it might be worth tweaking the kernel, and that
there are already various implementations which could be followed
after a more detailed analysis of the TLB misses.

Thank you and Dan very much for the valuable input.

Wolfgang.
* RE: 405 TLB miss reduction
  2003-12-11 17:45 ` Matt Porter
  2003-12-12  9:50 ` Wolfgang Grandegger
@ 2003-12-15 11:26 ` Joakim Tjernlund
  1 sibling, 0 replies; 14+ messages in thread
From: Joakim Tjernlund @ 2003-12-15 11:26 UTC (permalink / raw)
To: 'Matt Porter', 'Wolfgang Grandegger'; +Cc: linuxppc-embedded

>>> I've also been thinking about dynamically using large TLB/PTE
>>> mappings for ioremap on 405/440.
>>
>> OK, I expect not so much benefit from this measure, but it depends
>> on the application, of course.
>
> Yes, I've seen a lot of apps with huge shared memory areas across PCI
> that can benefit from this...they used BATs on classic PPCs.

Hmm, I wonder if this would be useful for systems using JFFS2/MTD?
JFFS2/MTD usually ioremaps the underlying FLASH memory, which can be
many MB.

Jocke
* Re: 405 TLB miss reduction
  2003-12-10 14:43 405 TLB miss reduction Wolfgang Grandegger
  2003-12-10 16:03 ` Matt Porter
@ 2003-12-10 17:08 ` Dan Malek
  2003-12-11 10:37   ` Wolfgang Grandegger
  2003-12-11 16:44 ` Jon Masters
  2 siblings, 1 reply; 14+ messages in thread
From: Dan Malek @ 2003-12-10 17:08 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: linuxppc-embedded

Wolfgang Grandegger wrote:

> ....We
> can regain a few percent by using the kernel option CONFIG_PIN_TLB
> but we are thinking about further kernel modifications to reduce
> TLB misses. What comes to my mind is:

If you have a large application, I doubt any kernel modification will
gain much. It's the application causing the huge amounts of TLB
misses; you probably need to evaluate changes that will reduce that.

It's always easy to pick on the kernel and make some changes because
it is a very static and well-behaved application. It seems your
biggest performance increase would come from the analysis of the
application and some redesign to improve its use of system resources.

Thanks.

-- Dan
* Re: 405 TLB miss reduction
  2003-12-10 17:08 ` Dan Malek
@ 2003-12-11 10:37   ` Wolfgang Grandegger
  2003-12-11 16:48     ` Jon Masters
  0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Grandegger @ 2003-12-11 10:37 UTC (permalink / raw)
To: Dan Malek; +Cc: linuxppc-embedded

On 12/10/2003 06:08 PM Dan Malek wrote:
> Wolfgang Grandegger wrote:
>
>> ....We
>> can regain a few percent by using the kernel option CONFIG_PIN_TLB
>> but we are thinking about further kernel modifications to reduce
>> TLB misses. What comes to my mind is:
>
> If you have a large application, I doubt any kernel modification
> will gain much. It's the application causing the huge amounts
> of TLB misses; you probably need to evaluate changes that will
> reduce that.
>
> It's always easy to pick on the kernel and make some changes
> because it is a very static and well-behaved application. It
> seems your biggest performance increase would come from the
> analysis of the application and some redesign to improve its
> use of system resources.

We were surprised that CONFIG_PIN_TLB was able to reduce the page miss
rate by approx. 40% already. We are also working on the
optimization/tuning of our application, and likely there we can gain
more than by further squeezing the TLB management of the Linux kernel,
I agree.

Thanks.

Wolfgang.
* Re: 405 TLB miss reduction
  2003-12-11 10:37 ` Wolfgang Grandegger
@ 2003-12-11 16:48   ` Jon Masters
  2003-12-11 16:56     ` Wolfgang Grandegger
  0 siblings, 1 reply; 14+ messages in thread
From: Jon Masters @ 2003-12-11 16:48 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: Dan Malek, linuxppc-embedded

Wolfgang Grandegger wrote:

| We were surprised that CONFIG_PIN_TLB was able to reduce the page
| miss rate by approx. 40% already. We are also working on the
| optimization/tuning of our application, and likely there we can gain
| more than by further squeezing the TLB management of the Linux
| kernel, I agree.

I noticed a possible improvement in performance, but I have not yet
done a set of tests to back this up. I saw the huge TLB stuff in
2.6.0-test11 and nearly fell off my chair - can someone here provide
some reference to read on this so I can be more useful talking about
it?

Jon.
* Re: 405 TLB miss reduction
  2003-12-11 16:48 ` Jon Masters
@ 2003-12-11 16:56   ` Wolfgang Grandegger
  2003-12-11 17:06     ` Jon Masters
  0 siblings, 1 reply; 14+ messages in thread
From: Wolfgang Grandegger @ 2003-12-11 16:56 UTC (permalink / raw)
To: Jon Masters; +Cc: Dan Malek, linuxppc-embedded

On 12/11/2003 05:48 PM Jon Masters wrote:
> I noticed a possible improvement in performance, but I have not yet
> done a set of tests to back this up. I saw the huge TLB stuff in
> 2.6.0-test11 and nearly fell off my chair - can someone here provide
> some reference to read on this so I can be more useful talking about
> it?

I found "Documentation/vm/hugetlbpage.txt" in linuxppc-2.5 from BK.

Wolfgang.
* Re: 405 TLB miss reduction
  2003-12-11 16:56 ` Wolfgang Grandegger
@ 2003-12-11 17:06   ` Jon Masters
  2003-12-11 17:36     ` Matt Porter
  0 siblings, 1 reply; 14+ messages in thread
From: Jon Masters @ 2003-12-11 17:06 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: Dan Malek, linuxppc-embedded

Wolfgang Grandegger wrote:

| I found "Documentation/vm/hugetlbpage.txt" in linuxppc-2.5 from BK.

Yeah, I have that too, but I want to know whether there is a resource
I am missing where this stuff is getting discussed and sorted.

Jon.
* Re: 405 TLB miss reduction
  2003-12-11 17:06 ` Jon Masters
@ 2003-12-11 17:36   ` Matt Porter
  0 siblings, 0 replies; 14+ messages in thread
From: Matt Porter @ 2003-12-11 17:36 UTC (permalink / raw)
To: Jon Masters; +Cc: Wolfgang Grandegger, Dan Malek, linuxppc-embedded

On Thu, Dec 11, 2003 at 05:06:25PM +0000, Jon Masters wrote:
> Wolfgang Grandegger wrote:
>
> | I found "Documentation/vm/hugetlbpage.txt" in linuxppc-2.5 from BK.
>
> Yeah, I have that too, but I want to know whether there is a resource
> I am missing where this stuff is getting discussed and sorted.

In the past, there's been lots of discussion about hugetlb support on
lkml. Google for it.

-Matt
* Re: 405 TLB miss reduction
  2003-12-10 14:43 405 TLB miss reduction Wolfgang Grandegger
  2003-12-10 16:03 ` Matt Porter
  2003-12-10 17:08 ` Dan Malek
@ 2003-12-11 16:44 ` Jon Masters
  2 siblings, 0 replies; 14+ messages in thread
From: Jon Masters @ 2003-12-11 16:44 UTC (permalink / raw)
To: Wolfgang Grandegger; +Cc: linuxppc-embedded

Wolfgang Grandegger wrote:

| Has anybody already investigated the effort or benefit of such
| changes or knows about other (simple) measures (apart from
| replacing the hardware)?

I have been thinking about doing something slightly more intelligent
with TLB miss handling and selecting the next entry to write. At the
moment I am doing stuff like setting up PTXdist (which does indeed
rule, by the way, once you get the right configuration); however, I
could do with something for the holiday period, so please keep me
updated, and I will try to fit bits in when I get time with a board to
do it. There is an EPPC405 which I am hoping to borrow for a while
because it is not particularly being used, but I will see what happens
about it.

Cheers,

Jon.

P.S. I am going to be at FOSDEM and welcome the idea of meeting other
PPC 405 or Virtex-II Pro people, or indeed anyone from this list.