* [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Mark Nelson @ 2008-06-19 7:53 UTC
To: linuxppc-dev, cbe-oss-dev; +Cc: Gunnar von Boehn, Michael Ellerman
The following are new versions of copy_tofrom_user, memcpy and copy_4K_page
which have been written specifically for Cell. All the hard work for these
routines was done by Gunnar von Boehn - I used his new memcpy to create
copy_4K_page and just added the exception handling code to
copy_tofrom_user.
Using these new routines gives a big performance improvement for Cell
(tested on a QS22 Cell blade):
Test             unpatched kernel    new routines
--------------------------------------------------
iperf            2.3 GBits/sec       5.8 GBits/sec
netperf (TCP)    2.5 GBits/sec       5.3 GBits/sec
netperf (UDP)    318 MBits/sec       351 MBits/sec
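Expressed as ratios of the new result over the unpatched result, the table above works out to roughly:

```latex
\begin{align*}
\text{iperf:}         &\quad 5.8 / 2.3 \approx 2.52 \\
\text{netperf (TCP):} &\quad 5.3 / 2.5 = 2.12 \\
\text{netperf (UDP):} &\quad 351 / 318 \approx 1.10
\end{align*}
```

So the TCP-based tests more than double, while UDP gains about 10%.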
The kernel used was 2.6.25.7, and the tests were run as follows (the final
result is the mean of 4 runs):
numactl --cpunodebind=0 --membind=0 ./iperf -s
-> numactl --cpunodebind=0 --membind=0 ./iperf -c 127.0.0.1 -t 30 -l 64k
numactl --cpunodebind=0 --membind=0 ./netserver
-> numactl --cpunodebind=0 --membind=0 ./netperf -l 30 -H 127.0.0.1 -c -t UDP_STREAM -- -m 1024
-> numactl --cpunodebind=0 --membind=0 ./netperf -l 30 -H 127.0.0.1 -c -t TCP_STREAM -i 10,2 -I 99,5 -- -m 32768
The plan is to use Michael Ellerman's code patching work so that at runtime
if we're running on a Cell machine the new routines are called but otherwise
the existing memory copy routines are used.
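To illustrate only the selection logic, here is a minimal C sketch that picks a routine once at boot. It is a stand-in under stated assumptions: the plan above is to patch the code itself with Michael Ellerman's code patching work, which avoids an indirect call, whereas this sketch uses a plain function pointer. `generic_memcpy`, `cell_memcpy`, `select_memcpy`, `copy_mem` and the `cpu_is_cell` flag are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

/* Placeholder for the existing routine (hypothetical name). */
static void *generic_memcpy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Placeholder for the Cell-tuned routine (hypothetical name);
 * a plain byte loop stands in for the real assembly. */
static void *cell_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

/* Everything starts on the existing routine ... */
static memcpy_fn active_memcpy = generic_memcpy;

/* ... and boot code switches once, after CPU detection.
 * `cpu_is_cell` is assumed to come from CPU identification code. */
static void select_memcpy(int cpu_is_cell)
{
    active_memcpy = cpu_is_cell ? cell_memcpy : generic_memcpy;
}

/* Callers always go through the selected routine. */
static void *copy_mem(void *dst, const void *src, size_t n)
{
    assert(active_memcpy != NULL);
    return active_memcpy(dst, src, n);
}
```

With code patching, the effect of `select_memcpy` would instead be baked into the instruction stream, so callers pay no indirect-call cost.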
It would be good to get some more (fresh) eyes looking at this. Any and all
comments are welcome.
Thanks!
Mark
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Arnd Bergmann @ 2008-06-19 11:53 UTC
To: linuxppc-dev; +Cc: Mark Nelson, Gunnar von Boehn, cbe-oss-dev, Michael Ellerman
On Thursday 19 June 2008, Mark Nelson wrote:
> The plan is to use Michael Ellerman's code patching work so that at runtime
> if we're running on a Cell machine the new routines are called but otherwise
> the existing memory copy routines are used.
Have you tried running this code on other platforms to see if it
actually performs worse on any of them? I would guess that the
older code also doesn't work too well on Power 5 and Power 6, so the
cell optimized version could give us a significant advantage as well,
albeit less than another CPU specific version.
Arnd <><
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Paul Mackerras @ 2008-06-19 12:02 UTC
To: Arnd Bergmann
Cc: Mark Nelson, linuxppc-dev, Gunnar von Boehn, cbe-oss-dev,
Michael Ellerman
Arnd Bergmann writes:
> Have you tried running this code on other platforms to see if it
> actually performs worse on any of them? I would guess that the
> older code also doesn't work too well on Power 5 and Power 6,
Why would you guess that?
Paul.
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Gunnar von Boehn @ 2008-06-19 12:11 UTC
To: Arnd Bergmann; +Cc: Mark Nelson, linuxppc-dev, Michael Ellerman, cbe-oss-dev
Hi Arnd,
I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
On PPC-970 the CELL memcpy is faster than the current Linux routine.
This becomes really visible when you actually copy memory-to-memory and are
not only working in the second-level cache.
Kind regards
Gunnar von Boehn
From: Arnd Bergmann <arnd@arndb.de>
Date: 19/06/2008 13:53
To: linuxppc-dev@ozlabs.org
cc: Mark Nelson <markn@au1.ibm.com>, cbe-oss-dev@ozlabs.org, Gunnar von Boehn/Germany/Contr/IBM@IBMDE, Michael Ellerman <ellerman@au1.ibm.com>
Subject: Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
On Thursday 19 June 2008, Mark Nelson wrote:
> The plan is to use Michael Ellerman's code patching work so that at runtime
> if we're running on a Cell machine the new routines are called but otherwise
> the existing memory copy routines are used.
Have you tried running this code on other platforms to see if it
actually performs worse on any of them? I would guess that the
older code also doesn't work too well on Power 5 and Power 6, so the
cell optimized version could give us a significant advantage as well,
albeit less than another CPU specific version.
Arnd <><
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Arnd Bergmann @ 2008-06-19 13:59 UTC
To: linuxppc-dev
Cc: Mark Nelson, Gunnar von Boehn, Paul Mackerras, cbe-oss-dev,
Michael Ellerman
On Thursday 19 June 2008, Paul Mackerras wrote:
> Arnd Bergmann writes:
>
> > Have you tried running this code on other platforms to see if it
> > actually performs worse on any of them? I would guess that the
> > older code also doesn't work too well on Power 5 and Power 6,
>
> Why would you guess that?
I remembered that Gunnar had done some tests on other CPUs showing
that an earlier version of the code was better than the kernel
memcpy.
Also, I had tried to trace the history of the usercopy function
and found that it predates most of the CPUs in current use, so
I assume it has suffered from bitrot and nobody tried to do better
since the Power3 days. AFAICT, it hasn't seen any update since your
original Power4 version from 2002.
Arnd <><
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Olof Johansson @ 2008-06-19 14:53 UTC
To: Arnd Bergmann
Cc: Mark Nelson, Gunnar von Boehn, linuxppc-dev, Paul Mackerras,
Michael Ellerman, cbe-oss-dev
On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:
> I assume it has suffered from bitrot and nobody tried to do better
> since the Power3 days. AFAICT, it hasn't seen any update since your
> original Power4 version from 2002.
I've got an out-of-tree optimized version for pa6t as well that I
haven't bothered posting yet.
The real pain with the usercopy code is all the exception cases. If
anyone has made a test harness to make sure they're all right, please
do post it for others to use as well...
-Olof
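A test harness of the kind requested here could enumerate every source misalignment, length and fault position, and compare the reported "bytes not copied" against the expected value. The C sketch below only simulates the fault as an index (`fault_at`) rather than taking a real exception, and it assumes the routine reports the uncopied count exactly to the byte; `copy_fn`, `ref_copy` and `check_copy` are hypothetical names. A real harness would have to drive the actual assembly against unmapped pages and exercise its exception fixups.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAXLEN 160   /* longest copy tried */
#define MAXOFF 8     /* source misalignments tried: 0..7 */

/* Hypothetical interface: copy n bytes and return how many were NOT
 * copied. Source bytes at index >= fault_at count as unmapped, which
 * simulates a fault part-way through the copy. */
typedef size_t (*copy_fn)(unsigned char *dst, const unsigned char *src,
                          size_t n, size_t fault_at);

/* Reference routine: byte-at-a-time copy that stops at the fault. */
static size_t ref_copy(unsigned char *dst, const unsigned char *src,
                       size_t n, size_t fault_at)
{
    size_t i;
    for (i = 0; i < n && i < fault_at; i++)
        dst[i] = src[i];
    return n - i;
}

/* Try every misalignment, length and fault position; count failures. */
static int check_copy(copy_fn fn)
{
    unsigned char src[MAXLEN + MAXOFF], dst[MAXLEN];
    int failures = 0;

    assert(fn != NULL);
    for (size_t i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 7 + 1);   /* arbitrary test pattern */

    for (size_t off = 0; off < MAXOFF; off++)
        for (size_t n = 0; n <= MAXLEN; n++)
            for (size_t fault = 0; fault <= n; fault++) {
                memset(dst, 0, sizeof dst);
                size_t left = fn(dst, src + off, n, fault);

                /* Exactly the bytes from the fault onwards are reported
                 * as not copied, and everything before it arrived. */
                if (left != n - fault || memcmp(dst, src + off, fault) != 0)
                    failures++;

                /* Nothing may be written beyond the requested length. */
                for (size_t i = n; i < MAXLEN; i++)
                    if (dst[i] != 0) {
                        failures++;
                        break;
                    }
            }
    return failures;
}
```

Swapping `ref_copy` for a routine with a wrong fixup (for example one that reports the uncopied count rounded to a whole cache line) should make `check_copy` return a non-zero count.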
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Paul Mackerras @ 2008-06-19 23:33 UTC
To: Gunnar von Boehn
Cc: Mark Nelson, linuxppc-dev, Michael Ellerman, cbe-oss-dev,
Arnd Bergmann
Gunnar von Boehn writes:
> I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
> On PPC-970 the CELL memcpy is faster than the current Linux routine.
> This becomes really visible when you actually copy memory-to-memory and are
> not only working in the second-level cache.
Could you send some more details, like the actual copy speed you
measured and how you did the tests?
Thanks,
Paul.
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Mark Nelson @ 2008-06-19 23:49 UTC
To: Arnd Bergmann
Cc: linuxppc-dev, Gunnar von Boehn, cbe-oss-dev, Michael Ellerman
On Thu, 19 Jun 2008 09:53:16 pm Arnd Bergmann wrote:
> On Thursday 19 June 2008, Mark Nelson wrote:
> > The plan is to use Michael Ellerman's code patching work so that at runtime
> > if we're running on a Cell machine the new routines are called but otherwise
> > the existing memory copy routines are used.
>
> Have you tried running this code on other platforms to see if it
> actually performs worse on any of them? I would guess that the
> older code also doesn't work too well on Power 5 and Power 6, so the
> cell optimized version could give us a significant advantage as well,
> albeit less than another CPU specific version.
>
> Arnd <><
>
I did run the tests on Power 5 and Power 6, and on Power 5 with the
new routines, the iperf bandwidth increased to 7.9 GBits/sec up from
7.5 GBits/sec; but on Power 6 the bandwidth with the old routines
was 13.6 GBits/sec compared to 12.8 GBits/sec...
I also couldn't get the updated routines to boot on 970MP without
removing the dcbz instructions.
I'll investigate further and rerun the tests.
Thanks!
Mark
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Mark Nelson @ 2008-06-20 0:04 UTC
To: linuxppc-dev
Cc: Gunnar von Boehn, Arnd Bergmann, Paul Mackerras, Olof Johansson,
Michael Ellerman, cbe-oss-dev
On Fri, 20 Jun 2008 12:53:49 am Olof Johansson wrote:
>
> On Jun 19, 2008, at 8:59 AM, Arnd Bergmann wrote:
>
> > I assume it has suffered from bitrot and nobody tried to do better
> > since the Power3 days. AFAICT, it hasn't seen any update since your
> > original Power4 version from 2002.
>
> I've got an out-of-tree optimized version for pa6t as well that I
> haven't bothered posting yet.
>
> The real pain with the usercopy code is all the exception cases. If
> anyone has made a test harness to make sure they're all right, please
> do post it for others to use as well...
I second that request - I verified (to the best that I could) with
pen and paper that the exception handling on this new version
is correct but it would be great to have a better way to test it.
Mark
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Gunnar von Boehn @ 2008-06-20 16:12 UTC
To: Paul Mackerras
Cc: Mark Nelson, linuxppc-dev, Michael Ellerman, cbe-oss-dev,
Arnd Bergmann
[-- Attachment #1: Type: text/plain, Size: 3644 bytes --]
Hi Paul,
I wrote a small benchmark tool that compares various memcpy routines under
a number of conditions.
The benchmark tests a range of block sizes, from a few bytes up to 16 MB.
Each test is repeated several times in a loop.
Three main types of copies are benchmarked:
a) Copies that are not cacheable. This is a real memory-to-memory copy.
To prevent the CPU from caching the data, both the SRC and DST pointers
are moved during the loop iterations.
b) Copies where the CPU is allowed to cache the SRC. This is a
cache-to-memory copy.
To prevent the CPU from caching the destination, the DST pointer is moved
during the loop iterations.
c) Copies where the CPU is allowed to cache both SRC and DST. This is a
cache-to-cache copy.
To allow the CPU to cache both, the pointers are kept constant during the
loop iterations.
The tests are repeated with different SRC/DST alignments to show the
performance difference between aligned and unaligned data.
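A minimal C sketch of the three cases, assuming cache behaviour is controlled purely by whether a pointer walks through a large arena or stays fixed. The names (`copy_mode`, `block_offset`, `bench`) and the 4 KB of slack per slot (so that offsets such as 0-4092 fit) are illustrative assumptions; this is not the benchmark tool itself, whose source was not posted.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

#define PAGE_BYTES 4096u

/* The three cases a), b) and c) (names are illustrative). */
enum copy_mode { MEM_TO_MEM, CACHE_TO_MEM, CACHE_TO_CACHE };

/* Start of the block used in iteration `iter`. A pointer that "moves"
 * steps through the whole arena, so with an arena much larger than the
 * caches each copy touches lines that were not used recently. */
static size_t block_offset(int moves, size_t iter, size_t stride, size_t arena)
{
    if (!moves)
        return 0;
    return (iter % (arena / stride)) * stride;
}

/* Copy `block` bytes `iters` times and return throughput in MB/sec
 * (1 MB = 10^6 bytes). src_align/dst_align give e.g. 7-0 or 0-4092. */
static double bench(enum copy_mode mode, size_t block, size_t iters,
                    size_t src_align, size_t dst_align,
                    unsigned char *src, unsigned char *dst, size_t arena)
{
    size_t stride = block + PAGE_BYTES;        /* room for offsets < 4 KB */
    int src_moves = (mode == MEM_TO_MEM);      /* only a) evicts SRC */
    int dst_moves = (mode != CACHE_TO_CACHE);  /* a) and b) evict DST */
    struct timespec t0, t1;

    assert(src_align < PAGE_BYTES && dst_align < PAGE_BYTES);
    assert(arena >= stride);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) {
        unsigned char *s = src + block_offset(src_moves, i, stride, arena) + src_align;
        unsigned char *d = dst + block_offset(dst_moves, i, stride, arena) + dst_align;
        memcpy(d, s, block);                   /* routine under test */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)block * (double)iters / secs / 1e6 : 0.0;
}
```

For case a) to really defeat the caches, the arena has to be much larger than the cache hierarchy; the 1 MB = 10^6 bytes convention is an assumption and may differ from the tool's.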
I've tested and compared various copy routines on PS3, JS21, QS21
and QS22.
Please find some results from the PS3 attached.
The tests clearly show that the old glibc and Linux memcpy routines have
about the same speed on Cell.
For aligned data:
Linux and glibc both reach around 1500 MB/sec.
The Linux routine has a special case for 4K copies and reaches around
3200 MB/sec for those.
Our patch gets roughly 5500-6000 MB/sec on large copies.
For unaligned data, both Linux and glibc score low, at around 800 MB/sec.
Our patch gets around 2500 MB/sec here.
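As a worked check against the attached PS3 table, the 16 MB memory-to-memory column gives these ratios of CELL memcpy over the glibc and "linux 64" routines:

```latex
\begin{align*}
\text{aligned 0-0, vs.\ glibc:}       &\quad 5869 / 1563 \approx 3.75 \\
\text{aligned 0-0, vs.\ linux 64:}    &\quad 5869 / 1544 \approx 3.80 \\
\text{misaligned 7-0, vs.\ glibc:}    &\quad 2551 / 823 \approx 3.10 \\
\text{misaligned 7-0, vs.\ linux 64:} &\quad 2551 / 861 \approx 2.96
\end{align*}
```

At this size the gain is therefore close to 4x on aligned data and about 3x on misaligned data.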
If you want, I can send you the source of my benchmark program.
Cheers
Gunnar
(See attached file: ps3_result_easy_toread.txt)
From: Paul Mackerras <paulus@samba.org>
Date: 20/06/2008 01:33
To: Gunnar von Boehn/Germany/Contr/IBM@IBMDE
cc: Arnd Bergmann <arnd@arndb.de>, Mark Nelson <markn@au1.ibm.com>, linuxppc-dev@ozlabs.org, Michael Ellerman <ellerman@au1.ibm.com>, cbe-oss-dev@ozlabs.org
Subject: Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
Gunnar von Boehn writes:
> I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
> On PPC-970 the CELL memcpy is faster than the current Linux routine.
> This becomes really visible when you actually copy memory-to-memory and are
> not only working in the second-level cache.
Could you send some more details, like the actual copy speed you
measured and how you did the tests?
Thanks,
Paul.
[-- Attachment #2: ps3_result_easy_toread.txt --]
[-- Type: text/plain, Size: 10556 bytes --]
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark memcpy performance v1.90
------------------------------------------------------------------------------------------------------------------------------------------------------------------
The Test will run some time please be patient.
Total memory required = 33.6 MB.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Memory throughput Working on Arrays of 16.8 MB.
We are now comparing different memcpy routines
Results are in MB/sec. Higher value means faster.
The test will be repeated on different aligned data.
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Memory-to-Memory
Alignment 0-0       16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        1563   1582   1601   1600   1599   1489   1320   1099    997    833    443    553    428    280    142     71
linux 64            1544   1561   1535   1542   3274   1452   1291   1093   1006    839    508    555    451    296    149     75
CELL memcpy         5869   6016   5454   5346   5607   4355   3523   2030   1648   1131    670    600    413    294    149     75
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Memory
Alignment 0-0       16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        1625   2225   6104   6569   7013   6778   5967   5777   5101   3928   2429   2224   1490   1121    564    284
linux 64            1566   2110   5332   6372   3264   6394   5858   5286   4539   3574   2434   2142   1572   1135    574    288
CELL memcpy         5683   7763  11002  10843  10306   9018   8352   6595   5805   4629   2572   2300   1595   1154    577    287
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Best Case - Copy Cache-to-Cache
Alignment 0-0       16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        1627   1982   6852  14928  14114  13128  11361   9291   8217   6470   5023   4808   3847   2882   1552   1014
linux 64            1565   1878   5907  10565   4120   9874   9141   7993   7373   6334   4885   4344   4389   3619   2159   1227
CELL memcpy         5652   7796  15277  18296  17374  16628  14332  11234  10468   9550   6982   8324   5456   4084   2703   1547
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Memory-to-Memory
Alignment 0-4092    16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        1606   1620   1599   1602   1591   1510   1354   1103    992    823    497    470    426    278    140     72
linux 64            1558   1576   1559   1550   1521   1427   1242   1013    900    745    500    454    450    295    150     75
CELL memcpy         5991   6042   5907   5660   4794   3660   2687   1837   1451   1039    636    556    438    290    148     73
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Good Case - aligned on 4 but crosses 4k page - Copy Cache-to-Memory
Alignment 0-4092    16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy        1618   2128   4568   7531   8013   7122   6163   4845   3944   2860   2322   1737   1400   1109    559    282
linux 64            1560   2038   4422   6604   6551   5659   4700   4273   3814   2765   2306   1667   1437   1127    567    286
CELL memcpy         5628   7747  10750  10715  10038   8006   5955   5176   4431   3438   2431   1881   1343   1128    570    282
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Memory-to-Memory
Alignment 7-0       16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy         823    829    823    823    816    778    713    620    568    489    385    342    343    240    130     67
linux 64             861    875    864    859    861    814    753    654    601    518    371    365    358    253    138     73
CELL memcpy         2551   2543   2512   2531   2426   2132   1756   1240   1089    839    540    500    402    272    121     75
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 0-7       16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy         825    830    823    819    815    788    742    666    621    546    356    407    340    240    131     60
linux 64             857    868    854    856    852    822    777    689    647    570    396    418    350    242    132     70
CELL memcpy         2651   2626   2641   2540   2372   2071   1492    985    838    584    404    340    346    243    127     64
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 17-11     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy         823    829    823    823    816    771    711    614    564    483    379    339    345    240    125     69
linux 64             853    867    853    851    851    803    739    638    584    499    395    351    351    239    130     69
CELL memcpy         2557   2542   2507   2515   2387   1863   1487    998    829    620    435    377    390    241    138     71
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Misaligned Cases - Copy Cache-to-Memory
Alignment 7-0       16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy         824    962   1340   1317   1321   1288   1239   1160   1120   1048    909    870    840    694    445    281
linux 64             860   1010   1404   1432   1435   1402   1352   1247   1215   1134    963   1006    880    763    434    279
CELL memcpy         2519   2656   2681   2613   2566   2446   2284   2100   2013   1846   1631   1500   1343   1059    561    287
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 0-7       16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy         825    949   1268   1324   1319   1285   1244   1147   1107   1039    847    882    834    696    425    218
linux 64             854   1007   1406   1419   1430   1391   1319   1209   1163   1082    938    923    822    695    371    192
CELL memcpy         2598   2731   2729   2726   2558   2372   2126   1757   1633   1343   1057    921    850    739    374    242
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Alignment 17-11     16MB  300KB   64KB   16KB    4KB    2KB    1KB   512B   384B   256B   150B   128B   100B    64B    32B    16B
------------------------------------------------------------------------------------------------------------------------------------------------------------------
glibc memcpy         824    946   1251   1321   1325   1291   1238   1143   1106   1038    898    868    861    689    364    193
linux 64             857    999   1391   1426   1421   1379   1316   1201   1159   1053    868    886    835    661    422    251
CELL memcpy         2519   2657   2641   2605   2537   2389   2211   1962   1843   1554   1460   1302   1181    772    375    278
------------------------------------------------------------------------------------------------------------------------------------------------------------------
* Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
From: Mark Nelson @ 2008-06-21 0:12 UTC
To: linuxppc-dev
Cc: Gunnar von Boehn, cbe-oss-dev, Arnd Bergmann, Michael Ellerman
On Fri, 20 Jun 2008 09:49:29 am Mark Nelson wrote:
> On Thu, 19 Jun 2008 09:53:16 pm Arnd Bergmann wrote:
> > On Thursday 19 June 2008, Mark Nelson wrote:
> > > The plan is to use Michael Ellerman's code patching work so that at runtime
> > > if we're running on a Cell machine the new routines are called but otherwise
> > > the existing memory copy routines are used.
> >
> > Have you tried running this code on other platforms to see if it
> > actually performs worse on any of them? I would guess that the
> > older code also doesn't work too well on Power 5 and Power 6, so the
> > cell optimized version could give us a significant advantage as well,
> > albeit less than another CPU specific version.
> >
> > Arnd <><
> >
>
> I did run the tests on Power 5 and Power 6, and on Power 5 with the
> new routines, the iperf bandwidth increased to 7.9 GBits/sec up from
> 7.5 GBits/sec; but on Power 6 the bandwidth with the old routines
> was 13.6 GBits/sec compared to 12.8 GBits/sec...
After running the tests again I get a similar result, where on Power 6
the new routine is slower than the old one.
I'll spend some time doing tests to see if we can come up with a
routine that works well on Power 6 too.
Mark
Thread overview: 11+ messages
2008-06-19 7:53 [RFC 0/3] powerpc: memory copy routines tweaked for Cell Mark Nelson
2008-06-19 11:53 ` Arnd Bergmann
2008-06-19 12:02 ` Paul Mackerras
2008-06-19 13:59 ` Arnd Bergmann
2008-06-19 14:53 ` Olof Johansson
2008-06-20 0:04 ` Mark Nelson
2008-06-19 12:11 ` Gunnar von Boehn
2008-06-19 23:33 ` Paul Mackerras
2008-06-20 16:12 ` Gunnar von Boehn
2008-06-19 23:49 ` Mark Nelson
2008-06-21 0:12 ` Mark Nelson