From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <3CF4E842.3070207@embeddededge.com>
Date: Wed, 29 May 2002 10:40:02 -0400
From: Dan Malek
MIME-Version: 1.0
To: David Gibson
Cc: linuxppc-embedded@lists.linuxppc.org, Paul Mackerras
Subject: Re: LMBench and CONFIG_PIN_TLB
References: <20020529030838.GZ16537@zax>
Content-Type: text/plain; charset=us-ascii; format=flowed
Sender: owner-linuxppc-embedded@lists.linuxppc.org
List-Id:

David Gibson wrote:

> I did some LMBench runs to observe the effect of CONFIG_PIN_TLB.

I implemented the TLB pinning for two reasons.  One, politics, since
everyone "just knows it is significantly better," and two, to alleviate
the exception path return problem of taking a TLB miss after loading
SRR0/1 (a miss at that point would clobber SRR0/SRR1 before the rfi
completes, so the return path must not take a miss).

> .... the difference varies between
> nothing (lost in the noise) to around 15% (fork proc).  The only
> measurement where no pinned entries might be argued to win is
> LMbench's main memory latency measurement.  The difference is < 0.1%
> and may just be chance fluctuation.

It has been my experience over the last 20 years that, in general,
applications that show high TLB miss activity are making inefficient
use of all system resources and aren't likely to be doing any useful
work.  Why aren't we measuring cache efficiency?  Why aren't we
profiling the kernel to see where code changes will really make a
difference?  Why aren't we measuring TLB performance on all processors?

If you want to improve TLB performance, get a processor with larger
TLBs or better hardware support.  Pinning TLB entries simply reduces
the resource availability.  When I'm running a real application, doing
real work in a real product, I don't want these resources allocated to
something else that is seldom used.  There are lots of other TLB
management implementations that could really improve performance; they
just don't fit well into the current Linux/PowerPC design.

I have seen exactly one application where TLB pinning actually improved
the performance of the system.  It was a real-time system, based on
Linux using an MPC8xx, where the maximum event response latency had to
be guaranteed.  With the proper locking of pages and TLB pins this
could be done.  It didn't improve the performance of the application,
but it did ensure the system operated properly.

> The difference between 1 and 2 pinned entries is very small.
> There are a few cases where 1 might be better (but it might just be
> random noise) and a very few where 2 might be better than one.  On the
> basis of that there seems little point in pinning 2 entries.

What kind of scientific analysis is this?  Run controlled tests, post
the results, explain the variances, and make them repeatable by others
(a sketch of the kind of test I mean follows at the end of this note).
Is there any consistency to the results?

> ..... Unless someone can come up with a
> real life workload which works poorly with pinned TLBs, I see little
> point in keeping the option - pinned TLBs should always be on (pinning
> 1 entry).

Where is your data that supports this?  Where is your "real life
workload" that actually supports what you want to do?  From my
perspective, your data shows we shouldn't do it.  A "real life
workload" is not a fork proc test, but rather the main memory latency
test, where your tests showed it was better not to pin entries, yet you
can't explain the "fluctuation."  I contend the difference is due to
the fact that you have reduced the TLB resources, increasing the number
of TLB misses for an application that is trying to do real work.

I suggest you heed the quote you always attach to your messages.
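To make that concrete, here is a minimal sketch of the kind of
controlled, repeatable measurement I mean: a plain C loop that touches
one word per page across a working set, so that once the set exceeds
the processor's TLB reach nearly every access is a miss.  The page and
iteration counts (NPAGES, ITERS) are illustrative assumptions, not
tuned values, and this is not LMBench code:

#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define NPAGES 4096   /* assumed working set; choose to exceed the TLB */
#define ITERS  100    /* assumed repeat count to smooth out noise */

int main(void)
{
    long psize = sysconf(_SC_PAGESIZE);
    volatile char *buf = malloc((size_t)NPAGES * psize);
    struct timespec t0, t1;
    double ns;
    long i, j;

    if (buf == NULL)
        return 1;

    /* Touch every page up front so we time TLB misses, not page faults. */
    for (i = 0; i < NPAGES; i++)
        buf[i * psize] = 1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (j = 0; j < ITERS; j++)
        for (i = 0; i < NPAGES; i++)
            buf[i * psize]++;   /* one access per page */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per page-stride access\n",
           ns / ((double)NPAGES * ITERS));

    free((void *)buf);
    return 0;
}

Run it with the working set sized below and above the TLB reach, with
and without entries pinned, and the per-access cost difference shows up
directly.  That is a controlled result others can repeat.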
This isn't a simple solution that is suitable for all applications.
It's one option among many that needs to be tuned to meet the
requirements of an application.

Thanks.

	-- Dan

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/