From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S266490AbUHUPWc (ORCPT ); Sat, 21 Aug 2004 11:22:32 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S266686AbUHUPWc (ORCPT ); Sat, 21 Aug 2004 11:22:32 -0400 Received: from holomorphy.com ([207.189.100.168]:10192 "EHLO holomorphy.com") by vger.kernel.org with ESMTP id S266490AbUHUPW0 (ORCPT ); Sat, 21 Aug 2004 11:22:26 -0400 Date: Sat, 21 Aug 2004 08:22:18 -0700 From: William Lee Irwin III To: Anton Blanchard Cc: Andrew Morton , Jesse Barnes , linux-kernel@vger.kernel.org Subject: Re: 2.6.8.1-mm3 Message-ID: <20040821152218.GZ11200@holomorphy.com> Mail-Followup-To: William Lee Irwin III , Anton Blanchard , Andrew Morton , Jesse Barnes , linux-kernel@vger.kernel.org References: <20040820031919.413d0a95.akpm@osdl.org> <200408201144.49522.jbarnes@engr.sgi.com> <200408201257.42064.jbarnes@engr.sgi.com> <20040820115541.3e68c5be.akpm@osdl.org> <20040820200248.GJ11200@holomorphy.com> <20040820233126.GJ1945@krispykreme> <20040821000343.GR11200@holomorphy.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20040821000343.GR11200@holomorphy.com> User-Agent: Mutt/1.5.6+20040722i Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org At some point in the past, I wrote: >>> Parallel compilation is an extremely poor benchmark in general, as the >>> workload is incapable of being effectively scaled to system sizes, the >>> linking phase is inherently unparallelizable and the compilation phase >>> too parallelizable to actually stress anything. There is also precisely >>> zero relevance the benchmark has to anything real users would do. The manner in which the load is too parallelizable is that literally the only shared data accessed are the directories and process creation related structures. By and large the dcache is useless to beat on as it's long been known what needs to happen there: the hashtable has to die in favor of a data structure that interoperates properly with lockless synchronization and with some remote hint of cache locality. Real workloads run by real users perform nontrivial communication between processes, not wait4(), access more devices than merely disk, and furthermore, don't fork() and exit() all day in preference to doing real work. On Fri, Aug 20, 2004 at 05:03:43PM -0700, William Lee Irwin III wrote: > Kernel hacking is not an end in itself, regardless of the fact there > are some, such as myself, who use computers for no other purpose. A > real user generally has some purpose to their activity beyond working > on the software or hardware they are "using". e.g. various real users > use their systems for entertainment: playing games, music, and movies. > Others may use their systems to make money somehow, e.g. archiving > information about customers so they can look up what they've bought > and paid for or have yet to pay for. > Regardless of the social issue, the rather serious technical deficits > of compilation of any software as a benchmark are showstopping issues. > Frankly, even the issues I've dredged up are nowhere near comprehensive. > There are further issues such as that stable (i.e. not varying across > the benchmarks being done on various systems at various times) versions > of the software being compiled and the toolchain being used to compile > it are lacking as components of any "kernel compile benchmarking suite" > and worse still the variance in target architecture of the toolchain > also defeats any attempt at meaningful benchmarking. To be useful at all, benchmarks have to be useful to evaluate different machines as well. For instance, to evaluate the scalability of the kernel to different sizes of machines, different machines must be comparable. Likewise, to evaluate how well utilizing a particular hardware feature of an architecture improves kernel performance relative to an architecture without the hardware feature, different machines must be compared. The relative performance of the machines before the kernel feature is utilized must be compared to the relative performance of the machines after the feature is utilized. Proper benchmarks are furthermore explicitly used to evaluate hardware. Individual users posting results from their own frozen-at-gcc-versions- of-their-choice userspace are worthless for this. If there were to be an attempt at a proper kernel and system benchmark using kernel compiles, which is unlikely ever to happen, one would do the following: (a) Bundle a toolchain and all supporting userspace required for it into the benchmark so that the gcc, make, etc. are identical for all users of the benchmark. (b) Run O(num_cpus_online()) kernel compiles in parallel instead of a single make -j so the workload can be sized appropriately for the system. The fact is that 32x+ systems can't even be kept loaded by the "benchmark" because there is just not enough work to distribute, so this has to be done. (c) Measure throughput in terms of kernel compiles per minute, and explicitly measure the variance during the runs. (d) This still doesn't fix the fact that there are no nontrivial shared resources amongst the processes. It still doesn't benchmark anything useful, as it's not modeled on any real end user workload and not targeted at any specific kernel functionality. It will merely produce self-consistent results with these fixes. i.e. the methodology now used for "kernel compile benchmarks" is poor and all of the "results" obtained from it are highly questionable and should be ignored. On Fri, Aug 20, 2004 at 05:03:43PM -0700, William Lee Irwin III wrote: > If you're truly concerned about compilation speed, userspace is going > to be the most productive area to work on anyway, as the vast majority > of time during compilation is spent in userspace. AIUI the userspace > algorithms in gcc are not particularly cognizant of cache locality and > in various instances have suboptimal time and space behavior, so it's > not as if there isn't work to be done there. Improving the compactness > and cache locality of data structures is important in userspace also, > and most (perhaps all) userspace programs are grossly ignorant of this. > FWIW, there are notable kernel hackers known to use very downrev gcc > versions due to regressions in compilation speed in subsequent versions, > so there are already large known differences in compilation speed that > can be obtained just by choosing a different compiler version. The point here is what to do if you are literally trying to improve compilation speed. If you are trying to benchmark the kernel, you should do a vaguely realistic simulation of something a user might do to stress the kernel or a real microbenchmark instead of repeating mistakes with poor methodology. The results are so bad they have to be thrown away after every post, as the relative results' baselines are effectively untraceable. It's also needlessly complex. It does too many different things at once, and to no useful effect, as it's not a meaningful macrobenchmark either, and so its results are even misleading. As it has been used it is logically impossible for it to have properly motivated any improvement of the kernel, or ever to do so in the future. If you care about fork(), then use a fork() microbenchmark; every meaningful improvement of fork() has been measured by such. If you care about the parallelism of the vfs, then use a vfs microbenchmark; every meaningful improvement of the vfs has been measured by such. Kernel compiles should not be used as benchmarks as they are now, and their results should not be taken into consideration as performance metrics. I highly encourage those concerned about performance to use other benchmarks, e.g. reaim and the like, which are multiuser simulations to measure interactive response and the like, or another properly-constructed benchmark targeted at their performance concerns. I encourage readers of kernel compile performance results to disregard them. The only real point of interest regarding this kind of affair on supercomputers is as a stress test to verify that the kernel doesn't livelock or deadlock due to the extreme performance characteristics of such large systems, and I encourage other readers likewise to limit their interest and involvement in this thread to stress testing results. -- wli