From: Duraid Madina
Date: Mon, 12 Jul 2004 11:12:21 +0000
Subject: Re: Consistency problem on IPF
Message-Id: <40F27215.2030408@octopus.com.au>
In-Reply-To: <40F2562C.10208@inria.fr>
To: linux-ia64@vger.kernel.org

For the record, on HP-UX 11.23 the standard deviation is down around 0.1s
for N=500 and N=512. For N=1024, it's basically zero. This is on a 2-way
1.5 GHz system with 6 MB of L3 cache.

Linux has a way to go yet...

Duraid

Erich Focht wrote:
> Hi Marc,
>
> Usually, if you can solve the problem at user level, it is unlikely
> that someone will provide a solution in the kernel. To my knowledge
> there is no knob for optimizing memory layout in 2.6, and results on
> 2.4 will be similar. What platform do you use? If it's NUMA, there are
> more answers...
>
> At user level you could:
> - try an optimized matrix-matrix multiply (the BLAS3 DGEMM function
>   from the Intel MKL, Math Kernel Library?). I'd expect that to be
>   coded so that the impact of your data layout is reduced.
> - try using pages from hugetlbfs and keep all the data in one page.
>
> Regards,
> Erich
>
> On Monday 12 July 2004 11:13, Marc Gonzalez-Sigler wrote:
>
>>Hello,
>>
>>Several weeks ago, I wrote a naive matrix-matrix multiply program.
>>
>>#include <stdio.h>
>>#define N 512   /* the runs below use N=512 and N=500 */
>>
>>int main(void)
>>{
>>    static double A[N][N], B[N][N], C[N][N];
>>    int i, j, k;
>>    double sum = 0.0;
>>
>>    /* Initialize A and B (placeholder values; only timing matters) */
>>    for (i = 0; i < N; ++i)
>>        for (j = 0; j < N; ++j) {
>>            A[i][j] = 1.0;
>>            B[i][j] = 2.0;
>>        }
>>
>>    /* Main loop */
>>    for (i = 0; i < N; ++i)
>>        for (j = 0; j < N; ++j)
>>            for (k = 0; k < N; ++k)
>>                C[i][j] += A[i][k] * B[k][j];
>>
>>    /* Print the sum of all elements of C */
>>    for (i = 0; i < N; ++i)
>>        for (j = 0; j < N; ++j)
>>            sum += C[i][j];
>>    printf("%f\n", sum);
>>    return 0;
>>}
>>
>>The system:
>>
>>$ cat /proc/cpuinfo
>>processor  : 0
>>vendor     : GenuineIntel
>>arch       : IA-64
>>family     : Itanium 2
>>model      : 1
>>revision   : 5
>>archrev    : 0
>>features   : branchlong
>>cpu number : 0
>>cpu regs   : 4
>>cpu MHz    : 1300.000000
>>itc MHz    : 1300.000000
>>BogoMIPS   : 1946.15
>>
>>processor  : 1
>>[same as processor 0]
>>
>>$ uname -a
>>Linux c64 2.6.6 #2 SMP Thu Jun 10 18:03:20 CEST 2004 ia64 GNU/Linux
>>
>>I started with N=512, which was a bad idea. I ran the same program 100
>>times on an otherwise idle system and saw very different execution
>>times. I tried pinning the program to a single CPU, but the results
>>were similar.
>>
>>N=512 (times in seconds)
>>MIN    = 1.190000
>>MAX    = 11.470000
>>MEAN   = 4.686900
>>MEDIAN = 1.390000
>>STDDEV = 4.181866
>>
>>OK, N=512 was probably a pathological case. Let us try N=500.
>>
>>N=500 (times in seconds)
>>MIN    = 0.670000
>>MAX    = 1.770000
>>MEAN   = 1.013100
>>MEDIAN = 0.670000
>>STDDEV = 0.466653
>>
>>Better, but still quite inconsistent...
>>
>>The same experiment on a 3.0 GHz Northwood running 2.4.22:
>>
>>N=500 (times in seconds)
>>MEAN   = 1.375200
>>MEDIAN = 1.375000
>>STDDEV = 0.002825
>>
>>Tony Luck, an Intel engineer, told me on a different list that this
>>was a page-coloring issue. Would you agree? Is there a knob in Linux
>>2.6 to request a smarter physical page allocation policy? Do you think
>>I would get similar results if I used 2.4 instead of 2.6?
>>
>>Thanks to everybody for reading this far.