From mboxrd@z Thu Jan  1 00:00:00 1970
From: Duraid Madina <duraid@octopus.com.au>
Date: Mon, 12 Jul 2004 11:12:21 +0000
Subject: Re: Consistency problem on IPF
Message-Id: <40F27215.2030408@octopus.com.au>
List-Id: <linux-ia64.vger.kernel.org>
References: <40F2562C.10208@inria.fr>
In-Reply-To: <40F2562C.10208@inria.fr>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: linux-ia64@vger.kernel.org

For the record, on HP-UX 11.23 the standard deviation is down around
0.1s for N=500, 512. For N=1024, it's basically zero. This is on a 2-way
1.5G/6M system.

Linux has a way to go yet...

	Duraid

Erich Focht wrote:
> Hi Marc,
> 
> usually, if you can solve the problem at user level it is improbable
> that someone will provide a solution in the kernel. To my knowledge
> there is no knob for optimizing memory layout in 2.6 and 2.4 results
> will be similar. What platform do you use? If it's NUMA, there are more
> answers...
> 
> At user level you could:
> - try an optimized matrix-matrix multiply, e.g. the BLAS3 DGEMM function
> from the Intel MKL (Math Kernel Library). I'd expect that to be coded in
> such a way that the impact of your data layout is reduced/limited.
> - try using pages from hugetlbfs and keep all data in one page.
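
A rough illustration of the first suggestion. I don't know how MKL's
DGEMM is implemented, but optimized multiplies commonly block (tile) the
loops so that each tile is reused while it is still in cache. The sketch
below is plain C, not MKL code; matmul_blocked and TILE are names made up
for this example, and the tile size is an untuned guess.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge; real libraries tune this per cache level. */
#define TILE 32

/* C += A * B for n x n row-major matrices, one TILE x TILE block at a
 * time, so a block of B is reused before it is evicted. */
void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    assert(A && B && C);
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                /* Clamp block edges when n is not a multiple of TILE. */
                size_t i_end = ii + TILE < n ? ii + TILE : n;
                size_t k_end = kk + TILE < n ? kk + TILE : n;
                size_t j_end = jj + TILE < n ? jj + TILE : n;
                for (size_t i = ii; i < i_end; ++i)
                    for (size_t k = kk; k < k_end; ++k) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

Calling matmul_blocked(N, &A[0][0], &B[0][0], &C[0][0]) would stand in
for the triple loop in the original program.
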
> 
> Regards,
> Erich
> 
> On Monday 12 July 2004 11:13, Marc Gonzalez-Sigler wrote:
> 
>>Hello,
>>
>>Several weeks ago, I wrote a naive matrix-matrix multiply program.
>>
>>int main(void)
>>{
>>   static double A[N][N], B[N][N], C[N][N];
>>   int i, j, k;  /* loop indices */
>>
>>   /* Initialize A and B */
>>
>>   /* Main loop */
>>   for (i=0; i < N; ++i)
>>     for (j=0; j < N; ++j)
>>       for (k=0; k < N; ++k)
>>         C[i][j] += A[i][k]*B[k][j];
>>
>>   /* Print the sum of all elements of C */
>>}
>>
>>The system:
>>
>>$ cat /proc/cpuinfo
>>processor  : 0
>>vendor     : GenuineIntel
>>arch       : IA-64
>>family     : Itanium 2
>>model      : 1
>>revision   : 5
>>archrev    : 0
>>features   : branchlong
>>cpu number : 0
>>cpu regs   : 4
>>cpu MHz    : 1300.000000
>>itc MHz    : 1300.000000
>>BogoMIPS   : 1946.15
>>
>>processor  : 1
>>[same as processor 0]
>>
>>$ uname -a
>>Linux c64 2.6.6 #2 SMP Thu Jun 10 18:03:20 CEST 2004 ia64 GNU/Linux
>>
>>
>>I started with N=512, which was a bad idea. I ran the same program 100 
>>times on an empty system, and saw very different execution times. I 
>>tried to pin the program to a single CPU, but the results were similar.
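
The mail doesn't say how the pinning was done. One way to do it from
inside the program, assuming a current glibc on Linux, is
sched_setaffinity(). The helper names below (pin_to_cpu,
first_allowed_cpu, allowed_cpu_count) are made up for this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Restrict the calling thread to one CPU. Returns 0 on success, -1 on
 * error (errno is set by sched_setaffinity). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof set, &set);
}

/* Lowest-numbered CPU the caller may currently run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Number of CPUs in the caller's current affinity mask, or -1. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Calling pin_to_cpu(first_allowed_cpu()) at the top of main() keeps the
whole run on one processor. That removes migration as a variable, but
does nothing about which physical pages the arrays end up on.
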
>>
>>N=512
>>MIN    = 1.190000
>>MAX    = 11.470000
>>MEAN   = 4.686900
>>MEDIAN = 1.390000
>>STDDEV = 4.181866
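
For anyone wanting to reproduce summaries like this one, here is a small
sketch of the arithmetic. The mail doesn't say whether STDDEV is the
population or the sample figure; the code below assumes the sample form
(divide by n-1). It returns the variance so that it builds without libm;
sqrt() of that value is the STDDEV. struct run_stats and summarize are
names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct run_stats { double min, max, mean, median, variance; };

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summarize n run times. Sorts t[] in place. Needs n >= 2 because the
 * sample variance divides by n - 1 (an assumption, see above). */
void summarize(double *t, size_t n, struct run_stats *s)
{
    double sum = 0.0, ss = 0.0;

    assert(t && s && n >= 2);
    qsort(t, n, sizeof *t, cmp_double);

    for (size_t i = 0; i < n; ++i)
        sum += t[i];

    s->min = t[0];
    s->max = t[n - 1];
    s->mean = sum / (double)n;
    /* Middle element, or the average of the two middle elements. */
    s->median = (n % 2) ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);

    for (size_t i = 0; i < n; ++i) {
        double d = t[i] - s->mean;
        ss += d * d;
    }
    s->variance = ss / (double)(n - 1);
}
```

A median far below the mean, as in the numbers above, points to a
minority of very slow runs rather than uniform jitter.
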
>>
>>OK. N=512 was probably a pathological case. Let us try N=500.
>>
>>N=500
>>MIN    = 0.670000
>>MAX    = 1.770000
>>MEAN   = 1.013100
>>MEDIAN = 0.670000
>>STDDEV = 0.466653
>>
>>Better, but still quite inconsistent...
>>
>>The same experiment on a 3.0 GHz Northwood running 2.4.22:
>>
>>N=500
>>MEAN   = 1.375200
>>MEDIAN = 1.375000
>>STDDEV = 0.002825
>>
>>Tony Luck, an Intel engineer, told me on a different list this was a 
>>page-coloring issue. Would you agree? Is there a knob in Linux 2.6 to 
>>request a smarter physical page allocation policy? Do you think I would 
>>get similar results if I used 2.4 instead of 2.6?
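
Some arithmetic behind the power-of-two and page-coloring suspicion, as
a sketch. The inner loop reads B[k][j] with k varying, so successive
accesses are N*sizeof(double) bytes apart. distinct_sets() counts how
many cache sets such a walk touches, and page_colors() gives the number
of page colors for a cache. Both function names are made up, and the
cache parameters in the usage note are placeholders, not checked
Itanium 2 figures.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of distinct cache sets touched when reading one column of an
 * n x n row-major matrix, using offsets from the start of the array.
 * Returns 0 if the scratch allocation fails. */
size_t distinct_sets(size_t n, size_t elem_bytes,
                     size_t line_bytes, size_t num_sets)
{
    size_t stride = n * elem_bytes, count = 0;
    unsigned char *seen;

    assert(line_bytes > 0 && num_sets > 0);
    seen = calloc(num_sets, 1);
    if (!seen)
        return 0;
    for (size_t k = 0; k < n; ++k) {
        size_t set = (k * stride / line_bytes) % num_sets;
        if (!seen[set]) {
            seen[set] = 1;
            ++count;
        }
    }
    free(seen);
    return count;
}

/* Page colors: how many pages fit in one way of the cache (at least 1). */
size_t page_colors(size_t cache_bytes, size_t ways, size_t page_bytes)
{
    size_t way_bytes;

    assert(ways > 0 && page_bytes > 0);
    way_bytes = cache_bytes / ways;
    return way_bytes > page_bytes ? way_bytes / page_bytes : 1;
}
```

With placeholder values of 128-byte lines and 256 sets,
distinct_sets(512, 8, 128, 256) is 8, while N=500 spreads over more
sets. In a physically indexed cache the sets actually hit also depend on
which physical pages back the arrays, which is where page coloring, and
run-to-run variation, would come in.
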
>>
>>Thanks to everybody for reading this far.