[RFC 0/3] powerpc: memory copy routines tweaked for Cell


* [RFC 0/3] powerpc: memory copy routines tweaked for Cell
@ 2008-06-19  7:53 Mark Nelson
  2008-06-19 11:53 ` Arnd Bergmann
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Nelson @ 2008-06-19  7:53 UTC (permalink / raw)
  To: linuxppc-dev, cbe-oss-dev; +Cc: Gunnar von Boehn, Michael Ellerman

The following are new versions of copy_tofrom_user, memcpy and copy_4K_page
which have been written specifically for Cell. All the hard work for these
routines was done by Gunnar von Boehn - I used his new memcpy to create
copy_4K_page and just added the exception handling code to
copy_tofrom_user.

Using these new routines gives a big performance improvement for Cell
(tested on QS22 Cell blade):

Test            unpatched kernel    new routines
--------------------------------------------------
iperf           2.3 GBits/sec       5.8 GBits/sec
netperf (TCP)   2.5 GBits/sec       5.3 GBits/sec
netperf (UDP)   318 MBits/sec       351 MBits/sec

kernel used was 2.6.25.7

tests were run as follows (final result is mean of 4 runs):
numactl --cpunodebind=0 --membind=0 ./iperf -s
-> numactl --cpunodebind=0 --membind=0 ./iperf -c 127.0.0.1 -t 30 -l 64k

numactl --cpunodebind=0 --membind=0 ./netserver
-> numactl --cpunodebind=0 --membind=0 ./netperf -l 30 -H 127.0.0.1 -c -t UDP_STREAM -- -m 1024
-> numactl --cpunodebind=0 --membind=0 ./netperf -l 30 -H 127.0.0.1 -c -t TCP_STREAM -i 10,2 -I 99,5 -- -m 32768

The plan is to use Michael Ellerman's code patching work so that at runtime
if we're running on a Cell machine the new routines are called but otherwise
the existing memory copy routines are used.
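
As a rough illustration of the idea only (this is not Michael Ellerman's
code patching mechanism, which rewrites call sites rather than going
through a pointer), the choice between routines can be sketched with a
function pointer picked once at startup. The names cpu_kind, generic_copy,
cell_copy and select_copy are hypothetical, and cell_copy is just a
placeholder standing in for the tuned routine:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Common signature shared by all copy routines in this sketch. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical CPU classification; a real kernel would use CPU feature bits. */
enum cpu_kind { CPU_GENERIC, CPU_CELL };

/* Stand-in for the existing routine. */
static void *generic_copy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Placeholder for the Cell-tuned routine; a plain byte loop, not the real code. */
static void *cell_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}

/* Pick the routine once, based on the machine we are running on. */
static copy_fn select_copy(enum cpu_kind cpu)
{
    return cpu == CPU_CELL ? cell_copy : generic_copy;
}
```

A caller would store the result of select_copy() at boot and use it for
every copy. Code patching avoids the indirect call that this sketch pays
on each invocation.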

It would be good to get some more (fresh) eyes looking at this. Any and all
comments are welcome.

Thanks!
Mark


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19  7:53 [RFC 0/3] powerpc: memory copy routines tweaked for Cell Mark Nelson
@ 2008-06-19 11:53 ` Arnd Bergmann
  2008-06-19 12:02   ` Paul Mackerras
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Arnd Bergmann @ 2008-06-19 11:53 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Mark Nelson, Gunnar von Boehn, cbe-oss-dev, Michael Ellerman

On Thursday 19 June 2008, Mark Nelson wrote:
> The plan is to use Michael Ellerman's code patching work so that at runtime
> if we're running on a Cell machine the new routines are called but otherwise
> the existing memory copy routines are used.

Have you tried running this code on other platforms to see if it
actually performs worse on any of them? I would guess that the
older code also doesn't work too well on Power 5 and Power 6, so the
cell optimized version could give us a significant advantage as well,
albeit less than another CPU specific version.

	Arnd <><


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 11:53 ` Arnd Bergmann
@ 2008-06-19 12:02   ` Paul Mackerras
  2008-06-19 13:59     ` Arnd Bergmann
  2008-06-19 12:11   ` Gunnar von Boehn
  2008-06-19 23:49   ` Mark Nelson
  2 siblings, 1 reply; 11+ messages in thread
From: Paul Mackerras @ 2008-06-19 12:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Nelson, linuxppc-dev, Gunnar von Boehn, cbe-oss-dev,
	Michael Ellerman

Arnd Bergmann writes:

> Have you tried running this code on other platforms to see if it
> actually performs worse on any of them? I would guess that the
> older code also doesn't work too well on Power 5 and Power 6,

Why would you guess that?

Paul.


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 11:53 ` Arnd Bergmann
  2008-06-19 12:02   ` Paul Mackerras
@ 2008-06-19 12:11   ` Gunnar von Boehn
  2008-06-19 23:33     ` Paul Mackerras
  2008-06-19 23:49   ` Mark Nelson
  2 siblings, 1 reply; 11+ messages in thread
From: Gunnar von Boehn @ 2008-06-19 12:11 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Mark Nelson, linuxppc-dev, Michael Ellerman, cbe-oss-dev

Hi Arnd,

I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
On PPC-970 the CELL memcpy is faster than the current Linux routine.
This becomes clearly visible when you actually copy memory-to-memory
rather than working only within the second-level cache.


Kind regards

Gunnar von Boehn




                                                                           


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 12:02   ` Paul Mackerras
@ 2008-06-19 13:59     ` Arnd Bergmann
  2008-06-19 14:53       ` Olof Johansson
  0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2008-06-19 13:59 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Mark Nelson, Gunnar von Boehn, Paul Mackerras, cbe-oss-dev,
	Michael Ellerman

On Thursday 19 June 2008, Paul Mackerras wrote:
> Arnd Bergmann writes:
> 
> > Have you tried running this code on other platforms to see if it
> > actually performs worse on any of them? I would guess that the
> > older code also doesn't work too well on Power 5 and Power 6,
> 
> Why would you guess that?

I remembered that Gunnar had done some tests on other CPUs showing
that an earlier version of the code was better than the kernel
memcpy.
Also, I had tried to trace the history of the usercopy function
and found that it predates most of the CPUs in current use, so
I assume it has suffered from bitrot and nobody tried to do better
since the Power3 days. AFAICT, it hasn't seen any update since your
original Power4 version from 2002.

	Arnd <><


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 13:59     ` Arnd Bergmann
@ 2008-06-19 14:53       ` Olof Johansson
  2008-06-20  0:04         ` Mark Nelson
  0 siblings, 1 reply; 11+ messages in thread
From: Olof Johansson @ 2008-06-19 14:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Nelson, Gunnar von Boehn, linuxppc-dev, Paul Mackerras,
	Michael Ellerman, cbe-oss-dev

On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:

> I assume it has suffered from bitrot and nobody tried to do better
> since the Power3 days. AFAICT, it hasn't seen any update since your
> original Power4 version from 2002.

I've got an out-of-tree optimized version for pa6t as well that I  
haven't bothered posting yet.

The real pain with the usercopy code is all the exception cases. If  
anyone has made a test harness to make sure they're all right, please  
do post it for others to use as well...

-Olof


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 12:11   ` Gunnar von Boehn
@ 2008-06-19 23:33     ` Paul Mackerras
  2008-06-20 16:12       ` Gunnar von Boehn
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Mackerras @ 2008-06-19 23:33 UTC (permalink / raw)
  To: Gunnar von Boehn
  Cc: Mark Nelson, linuxppc-dev, Michael Ellerman, cbe-oss-dev,
	Arnd Bergmann

Gunnar von Boehn writes:

> I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
> On PPC-970 the CELL memcpy is faster than the current Linux routine.
> This becomes really visible when you really copy memory-to-memory and are
> not only working in the 2ndlevelcache.

Could you send some more details, like the actual copy speed you
measured and how you did the tests?

Thanks,
Paul.


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 11:53 ` Arnd Bergmann
  2008-06-19 12:02   ` Paul Mackerras
  2008-06-19 12:11   ` Gunnar von Boehn
@ 2008-06-19 23:49   ` Mark Nelson
  2008-06-21  0:12     ` Mark Nelson
  2 siblings, 1 reply; 11+ messages in thread
From: Mark Nelson @ 2008-06-19 23:49 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linuxppc-dev, Gunnar von Boehn, cbe-oss-dev, Michael Ellerman

On Thu, 19 Jun 2008 09:53:16 pm Arnd Bergmann wrote:
> On Thursday 19 June 2008, Mark Nelson wrote:
> > The plan is to use Michael Ellerman's code patching work so that at runtime
> > if we're running on a Cell machine the new routines are called but otherwise
> > the existing memory copy routines are used.
> 
> Have you tried running this code on other platforms to see if it
> actually performs worse on any of them? I would guess that the
> older code also doesn't work too well on Power 5 and Power 6, so the
> cell optimized version could give us a significant advantage as well,
> albeit less than another CPU specific version.
> 
> 	Arnd <><
> 

I did run the tests on Power 5 and Power 6, and on Power 5 with the
new routines, the iperf bandwidth increased to 7.9 GBits/sec up from
7.5 GBits/sec; but on Power 6 the bandwidth with the old routines
was 13.6 GBits/sec compared to 12.8 GBits/sec...

I also couldn't get the updated routines to boot on 970MP without
removing the dcbz instructions.

I'll investigate further and rerun the tests.

Thanks!

Mark


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 14:53       ` Olof Johansson
@ 2008-06-20  0:04         ` Mark Nelson
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Nelson @ 2008-06-20  0:04 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Gunnar von Boehn, Arnd Bergmann, Paul Mackerras, Olof Johansson,
	Michael Ellerman, cbe-oss-dev

On Fri, 20 Jun 2008 12:53:49 am Olof Johansson wrote:
> 
> On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:
> 
> > I assume it has suffered from bitrot and nobody tried to do better
> > since the Power3 days. AFAICT, it hasn't seen any update since your
> > original Power4 version from 2002.
> 
> I've got an out-of-tree optimized version for pa6t as well that I  
> haven't bothered posting yet.
> 
> The real pain with the usercopy code is all the exception cases. If  
> anyone has made a test harness to make sure they're all right, please  
> do post it for others to use as well...

I second that request - I verified (to the best that I could) with
pen and paper that the exception handling on this new version
is correct but it would be great to have a better way to test it.
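
One possible starting point for such a harness, sketched here in userspace
under several assumptions: it exercises an ordinary memcpy-style routine
rather than the kernel's copy_tofrom_user, so it cannot check the exception
fixups themselves, only that a routine never touches bytes outside the
requested range. Each buffer is placed flush against an inaccessible guard
page, so any overrun faults immediately. The names map_with_guard and
check_copy_edges are made up for this sketch:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Map two pages and make the second one inaccessible.
 * Returns the address of the guard page (end of the usable page), or NULL. */
static unsigned char *map_with_guard(size_t page)
{
    unsigned char *p = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mprotect(p + page, page, PROT_NONE) != 0) {
        munmap(p, 2 * page);
        return NULL;
    }
    return p + page;
}

/* Copy every length 0..max_len with both buffers ending at a guard page.
 * Returns the number of failed checks, or -1 if setup failed. */
static int check_copy_edges(copy_fn fn, size_t max_len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *src_end, *dst_end;
    int failures = 0;

    if (max_len >= page)
        return -1;
    src_end = map_with_guard(page);
    dst_end = map_with_guard(page);
    if (src_end && dst_end) {
        for (size_t len = 0; len <= max_len; len++) {
            unsigned char *src = src_end - len;
            unsigned char *dst = dst_end - len;

            for (size_t i = 0; i < len; i++)
                src[i] = (unsigned char)(i * 7 + len);
            memset(dst_end - page, 0xEE, page);   /* poison the destination page */
            fn(dst, src, len);                    /* an overrun hits the guard page */
            if (memcmp(dst, src, len) != 0)
                failures++;
            if (dst[-1] != 0xEE)                  /* byte just below dst must be untouched */
                failures++;
        }
    } else {
        failures = -1;
    }
    if (src_end)
        munmap(src_end - page, 2 * page);
    if (dst_end)
        munmap(dst_end - page, 2 * page);
    return failures;
}
```

Calling check_copy_edges(memcpy, 512) should return 0 for a correct
routine. Because both buffers end on a page boundary, source and
destination always share the same relative alignment here; a fuller
harness would also shift one of them to cover the misaligned cases.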

Mark


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 23:33     ` Paul Mackerras
@ 2008-06-20 16:12       ` Gunnar von Boehn
  0 siblings, 0 replies; 11+ messages in thread
From: Gunnar von Boehn @ 2008-06-20 16:12 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Mark Nelson, linuxppc-dev, Michael Ellerman, cbe-oss-dev,
	Arnd Bergmann

[-- Attachment #1: Type: text/plain, Size: 3644 bytes --]

Hi Paul,

I wrote a small benchmark tool that compares various memcpy routines under
a number of conditions.
The benchmark tests a number of different block sizes, from a few bytes to
16 MB.
Each test is repeated in a loop several times.

Three main types of copies are benchmarked.
a) Copies that are not cacheable. This is a real memory-to-memory copy.
To prevent the CPU from caching the data, the SRC and DST pointers are
moved during the loop iterations.
b) Copies where the CPU is allowed to cache the SRC. This is a
cache-to-memory copy.
To prevent the CPU from caching the destination, only the DST pointer is
moved during the loop iterations.
c) Copies where the CPU is allowed to cache both SRC and DST. This is a
cache-to-cache copy.
To allow the CPU to cache them, both pointers are constant during the loop
iterations.
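
The three modes above can be sketched roughly as follows. This is not the
source of the benchmark tool itself, only a minimal illustration assuming
a POSIX system with clock_gettime; the names copy_throughput and
next_offset and the move_src/move_dst flags are made up for the sketch:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Advance an offset by one block, wrapping so that
 * [off, off + block) always stays inside the arena. */
static size_t next_offset(size_t off, size_t block, size_t arena)
{
    off += block;
    return (off + block > arena) ? 0 : off;
}

/* Time `iters` copies of `block` bytes and return MB/sec.
 * move_src/move_dst select the mode:
 *   1,1 -> memory-to-memory (both pointers walk through the arena)
 *   0,1 -> cache-to-memory  (SRC stays put, DST walks)
 *   0,0 -> cache-to-cache   (both pointers constant) */
static double copy_throughput(copy_fn fn, unsigned char *dst,
                              const unsigned char *src, size_t arena,
                              size_t block, int move_src, int move_dst,
                              long iters)
{
    struct timespec t0, t1;
    size_t s = 0, d = 0;
    double secs;

    assert(block > 0 && block <= arena && iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++) {
        fn(dst + d, src + s, block);
        if (move_src)
            s = next_offset(s, block, arena);
        if (move_dst)
            d = next_offset(d, block, arena);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    secs = (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)block * (double)iters / 1e6 / secs : 0.0;
}
```

For the moving modes to defeat the cache, the arena has to be much larger
than the last-level cache. Different alignments can be tested by passing
src and dst pointers offset by a few bytes.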

The tests are repeated with different source/destination alignments to show
the performance difference between aligned and unaligned data.

I've tested and compared various copy routines on PS3, JS21, QS21 and
QS22.

Please find some results from the PS3 attached.

The tests clearly show that the old glibc and Linux memcpy routines have
about the same speed on CELL.
For aligned data:
Linux and glibc both reach around 1500 MB/sec.
The Linux routine has a special case for 4K copies and gets around 3200
MB/sec for them.
Our patch always gets a result between 5500 and 6000 MB/sec.

For unaligned data, both Linux and glibc score low at around 800 MB/sec.
Our patch gets around 2500 MB/sec here.


If you want, I can send you the source of my benchmark program.

Cheers
Gunnar

(See attached file: ps3_result_easy_toread.txt)




                                                                           

[-- Attachment #2: ps3_result_easy_toread.txt --]
[-- Type: text/plain, Size: 10556 bytes --]

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark memcpy performance v1.90
------------------------------------------------------------------------------------------------------------------------------------------------------------------
The Test will run some time please be patient.
Total memory required = 33.6 MB.
------------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory throughput Working on Arrays of 16.8 MB.
We are now comparing different memcpy routines
Results are in MB/sec. Higher value means faster.
The test will be repeated on different aligned data.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Memory-to-Memory
Alignment 0-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1563   1582   1601   1600   1599   1489   1320   1099    997    833    443    553    428    280    142     71 
linux 64          1544   1561   1535   1542   3274   1452   1291   1093   1006    839    508    555    451    296    149     75 
CELL memcpy       5869   6016   5454   5346   5607   4355   3523   2030   1648   1131    670    600    413    294    149     75 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Memory
Alignment 0-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1625   2225   6104   6569   7013   6778   5967   5777   5101   3928   2429   2224   1490   1121    564    284 
linux 64          1566   2110   5332   6372   3264   6394   5858   5286   4539   3574   2434   2142   1572   1135    574    288 
CELL memcpy       5683   7763  11002  10843  10306   9018   8352   6595   5805   4629   2572   2300   1595   1154    577    287 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Cache
Alignment 0-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1627   1982   6852  14928  14114  13128  11361   9291   8217   6470   5023   4808   3847   2882   1552   1014 
linux 64          1565   1878   5907  10565   4120   9874   9141   7993   7373   6334   4885   4344   4389   3619   2159   1227 
CELL memcpy       5652   7796  15277  18296  17374  16628  14332  11234  10468   9550   6982   8324   5456   4084   2703   1547 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Memory-to-Memory
Alignment 0-4092  16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1606   1620   1599   1602   1591   1510   1354   1103    992    823    497    470    426    278    140     72 
linux 64          1558   1576   1559   1550   1521   1427   1242   1013    900    745    500    454    450    295    150     75 
CELL memcpy       5991   6042   5907   5660   4794   3660   2687   1837   1451   1039    636    556    438    290    148     73 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Cache-to-Memory
Alignment 0-4092  16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy      1618   2128   4568   7531   8013   7122   6163   4845   3944   2860   2322   1737   1400   1109    559    282 
linux 64          1560   2038   4422   6604   6551   5659   4700   4273   3814   2765   2306   1667   1437   1127    567    286 
CELL memcpy       5628   7747  10750  10715  10038   8006   5955   5176   4431   3438   2431   1881   1343   1128    570    282 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Memory-to-Memory
Alignment 7-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       823    829    823    823    816    778    713    620    568    489    385    342    343    240    130     67 
linux 64           861    875    864    859    861    814    753    654    601    518    371    365    358    253    138     73 
CELL memcpy       2551   2543   2512   2531   2426   2132   1756   1240   1089    839    540    500    402    272    121     75 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 0-7     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       825    830    823    819    815    788    742    666    621    546    356    407    340    240    131     60 
linux 64           857    868    854    856    852    822    777    689    647    570    396    418    350    242    132     70 
CELL memcpy       2651   2626   2641   2540   2372   2071   1492    985    838    584    404    340    346    243    127     64 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 17-11   16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       823    829    823    823    816    771    711    614    564    483    379    339    345    240    125     69 
linux 64           853    867    853    851    851    803    739    638    584    499    395    351    351    239    130     69 
CELL memcpy       2557   2542   2507   2515   2387   1863   1487    998    829    620    435    377    390    241    138     71 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Cache-to-Memory
Alignment 7-0     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       824    962   1340   1317   1321   1288   1239   1160   1120   1048    909    870    840    694    445    281 
linux 64           860   1010   1404   1432   1435   1402   1352   1247   1215   1134    963   1006    880    763    434    279 
CELL memcpy       2519   2656   2681   2613   2566   2446   2284   2100   2013   1846   1631   1500   1343   1059    561    287 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 0-7     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       825    949   1268   1324   1319   1285   1244   1147   1107   1039    847    882    834    696    425    218 
linux 64           854   1007   1406   1419   1430   1391   1319   1209   1163   1082    938    923    822    695    371    192 
CELL memcpy       2598   2731   2729   2726   2558   2372   2126   1757   1633   1343   1057    921    850    739    374    242 

------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 17-11   16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B 
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy       824    946   1251   1321   1325   1291   1238   1143   1106   1038    898    868    861    689    364    193 
linux 64           857    999   1391   1426   1421   1379   1316   1201   1159   1053    868    886    835    661    422    251 
CELL memcpy       2519   2657   2641   2605   2537   2389   2211   1962   1843   1554   1460   1302   1181    772    375    278 

------------------------------------------------------------------------------------------------------------------------------------------------------------------


* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
  2008-06-19 23:49   ` Mark Nelson
@ 2008-06-21  0:12     ` Mark Nelson
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Nelson @ 2008-06-21  0:12 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Gunnar von Boehn, cbe-oss-dev, Arnd Bergmann, Michael Ellerman

On Fri, 20 Jun 2008 09:49:29 am Mark Nelson wrote:
> On Thu, 19 Jun 2008 09:53:16 pm Arnd Bergmann wrote:
> > On Thursday 19 June 2008, Mark Nelson wrote:
> > > The plan is to use Michael Ellerman's code patching work so that at runtime
> > > if we're running on a Cell machine the new routines are called but otherwise
> > > the existing memory copy routines are used.
> > 
> > Have you tried running this code on other platforms to see if it
> > actually performs worse on any of them? I would guess that the
> > older code also doesn't work too well on Power 5 and Power 6, so the
> > cell optimized version could give us a significant advantage as well,
> > albeit less than another CPU specific version.
> > 
> > 	Arnd <><
> > 
> 
> I did run the tests on Power 5 and Power 6, and on Power 5 with the
> new routines, the iperf bandwidth increased to 7.9 GBits/sec up from
> 7.5 GBits/sec; but on Power 6 the bandwidth with the old routines
> was 13.6 GBits/sec compared to 12.8 GBits/sec...

After running the tests again I get a similar result, where on Power 6
the new routine is slower than the old one.

I'll spend some time doing tests to see if we can come up with a
routine that works well on Power 6 too.

Mark
