From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kurt Fitzner
Subject: Re: [parisc-linux] B132L outperforms C160 - 64-bit userland needed?
Date: Tue, 16 Aug 2005 21:43:51 -0600
Message-ID: <4302B277.40706@excelcia.org>
References: <200508170132.j7H1W4Sq027309@hiauly1.hia.nrc.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
To: parisc-linux@lists.parisc-linux.org
In-reply-to: <200508170132.j7H1W4Sq027309@hiauly1.hia.nrc.ca>
List-Id: parisc-linux developers list
Errors-To: parisc-linux-bounces@lists.parisc-linux.org

John David Anglin wrote:
>>>>When that produced even worse results, I tried -march=2.0 vs 1.1 and
>>>>-mschedule=8000 vs 7300 separately.  Each one alone slows down the
>>>>benchmark and the effect is additive.  It seems that in Linux, right
>>>>now at least, compiling with -march=2.0 or -mschedule=8000 is a Bad Thing.
>
> In theory, -mschedule=8000 should only be used on machines with PA 2.0
> processors (i.e., not the B132L).

I must not have been clear.  Wanting as level a playing field as
possible for benchmarking, I first used identical binaries on both the
PA 1.1 and the PA 2.0 machine.  These binaries were built with
-march=1.1/-mschedule=7300.  When I first noticed that the B132L was
outperforming the C160, I recompiled the C160 binaries with
-march=2.0/-mschedule=8000, thinking that the 1.1/7300 binaries might be
what was causing the poor results on the C160.  The recompiled 2.0/8000
binaries had even /poorer/ results: a consistent reduction in
performance of about two percent compared with 1.1/7300.

> ...It is tweaked to the number of execution
> units, etc, in the PA 2.0 processor.  How much difference this makes in the
> real world is not clear.  I haven't seen any numbers.  As far as the
> models themselves, they haven't changed since they were added by Jeff
> Law somewhere around GCC 3.0.
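Since the differences being discussed are only one or two percent, a
single timing run can easily be swamped by noise.  One way to get such
numbers on the same CPU and OS is to build the same source twice, once
with -march=1.1 -mschedule=7300 and once with -march=2.0
-mschedule=8000, and time each binary several times.  The Python sketch
below is only an illustration: the binary names ./bench-1.1-7300 and
./bench-2.0-8000 and the source file bench.c are hypothetical, and it
assumes that taking the fastest of several runs is a fair summary
because system noise only ever adds time.

```python
import subprocess
import time
from typing import Callable, List, Sequence


def time_runs(run: Callable[[], None], repeats: int = 5) -> List[float]:
    """Call `run` `repeats` times and return each wall-clock duration in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        run()
        samples.append(time.perf_counter() - start)
    return samples


def run_binary(path: str, args: Sequence[str] = ()) -> Callable[[], None]:
    """Wrap a benchmark executable in a callable; raise if it exits non-zero."""
    def _run() -> None:
        subprocess.run([path, *args], check=True, stdout=subprocess.DEVNULL)
    return _run


def percent_slower(candidate: float, baseline: float) -> float:
    """Percentage by which `candidate` takes longer than `baseline`."""
    return (candidate / baseline - 1.0) * 100.0


# Hypothetical usage: the same bench.c built once per flag set, e.g.
#   gcc -O2 -march=1.1 -mschedule=7300 -o bench-1.1-7300 bench.c
#   gcc -O2 -march=2.0 -mschedule=8000 -o bench-2.0-8000 bench.c
# base = min(time_runs(run_binary("./bench-1.1-7300")))
# cand = min(time_runs(run_binary("./bench-2.0-8000")))
# print(f"2.0/8000 vs 1.1/7300: {percent_slower(cand, base):+.1f}%")
```

Using the minimum is one design choice; if run-to-run variance is high,
the median of a larger number of runs is a reasonable alternative.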
> It would be interesting to see how they
> compare on the same cpu, same os, etc.

In Linux 2.6.10-pa11 and 2.6.8.1-pa11 with a 32-bit kernel on a C160
(PA-8000 CPU):

* -march=1.1 produces code that performs approximately one percent
  faster than -march=2.0
* -mschedule=7300 produces code that is approximately one percent faster
  than -mschedule=8000
* The two above are additive: 1.1/7300 is two percent faster than
  2.0/8000
* Code compiled as 1.1/7300 and run on a C160 whose kernel is configured
  for the PA7300LC processor type (rather than PA8000) gets another ~2%
  speed boost, which is also additive.

So, to be completely clear, where the baseline is a C160 running Linux
2.6.8.1-pa11 configured for the PA8000 CPU, with the user binary
compiled with -march=2.0 and -mschedule=8000:

- User binary compiled with -march=1.1 = +1% performance
- User binary compiled with -mschedule=7300 = +1% performance
- Kernel configured as PA7300LC = +2% performance

> As far PA 2.0 versus 1.1, the main differences affecting 32-bit code
> are some new branch instructions.  There are also some new FP instructions
> but these are somewhat compromised by linker bugs.  In non floating-point
> code, I would expect the PA 2.0 features to make their presence felt
> in code with large functions.

Is there anything in PA 2.0 that you would expect to cause poorer
performance, in any circumstance, compared with PA 1.1?

> There have been a lot of optimization improvements in GCC since 3.3.
> It would be useful to see how effective they are in real applications
> and in benchmark performance.  As far as the PA backend goes, there
> haven't been any major performance improvements added since 3.3.  The
> changes mainly are bug fixes.

So, what I'm hearing is that I might expect better code across the
board from a post-3.3 compiler, but that what I'm seeing with 1.1 vs 2.0
and 7300 vs 8000 is unlikely to change?

> 64-bit code isn't going to make your apps run faster.
> There is more overhead in data accesses in 64-bit code (i.e., they go
> through the DLT) than in 32-bit apps.  Also, a lot more sign extensions
> are needed.  In terms of a GCC build, the difference is about 15-20%.
> The 64-bit tools are less mature.  So generally, you only want to use
> 64-bit apps when they can benefit from the larger address space.

I'm not strong on the PA architecture; I assumed that on a 64-bit
machine the data bus would be 64 bits wide.  I would therefore have
thought that 64-bit apps on such a machine would run at least as fast as
32-bit ones, and be superior in some areas, such as when they have to
deal with 64-bit values like file offsets.  If less bit-width is better,
shouldn't we all be going back to the 6502? :)

In any case, if what I'm seeing isn't due to 32 vs 64 bit, then I am
completely baffled.  Why would a C160, with a 20% faster clock, eight
times the cache, and a design that should be superior in every sense,
run slower than a B132L?  I'm not trying to blame my machine's poor
performance on anyone, but I must admit that, having read as much as I
can find on the C160 vs the B132L, I can find no explanation in the
hardware.

Kurt.

_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
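Anglin's remark that 64-bit code needs a lot more sign extensions can be
illustrated at the bit level.  When a 32-bit signed value, such as an
int loop index, is widened into a 64-bit register, bit 31 has to be
copied into bits 32 through 63, which the compiler may have to do with
an explicit instruction; simply zero-filling the upper half would turn
-5 into 4294967291.  The Python sketch below only models the two
widening rules on plain integers to show why that extra work exists; it
is not a model of any particular PA-RISC instruction or of GCC's code
generation.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def zero_extend_32_to_64(x: int) -> int:
    """Widen a 32-bit pattern by filling the upper 32 bits with zeros."""
    return x & MASK32


def sign_extend_32_to_64(x: int) -> int:
    """Widen a 32-bit pattern by copying bit 31 into bits 32..63."""
    x &= MASK32
    if x & 0x8000_0000:
        x |= MASK64 ^ MASK32  # set the upper 32 bits
    return x


def as_signed64(x: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x & (1 << 63) else x


minus_five = -5 & MASK32  # 0xFFFFFFFB, the 32-bit encoding of -5
print(hex(sign_extend_32_to_64(minus_five)), as_signed64(sign_extend_32_to_64(minus_five)))
# prints: 0xfffffffffffffffb -5
print(hex(zero_extend_32_to_64(minus_five)), as_signed64(zero_extend_32_to_64(minus_five)))
# prints: 0xfffffffb 4294967291
```

Only the sign-extended form preserves the value, which is why 32-bit
signed quantities in 64-bit code cannot simply be reused as-is.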