public inbox for linux-kernel@vger.kernel.org
* New csum and csum_copy routines - and a test/benchmark program
@ 2002-10-28 13:12 Denis Vlasenko
  2002-10-28  8:48 ` Roberto Nibali
  2002-10-28 15:12 ` J.A. Magallon
  0 siblings, 2 replies; 6+ messages in thread
From: Denis Vlasenko @ 2002-10-28 13:12 UTC (permalink / raw)
  To: Ingo Oeser, Momchil Velikov, Alan Cox; +Cc: linux-kernel

I took some time to develop a little test/benchmark program
for csum and csum_copy routines (used in networking).
It has grown to include the following features:

* Total buffer size #define-selectable, hence you can measure
  cache-hot and cache-cold performance.

* It does not simply checksum the entire buffer; you can do it in 'chunks'.
  Chunk size is a #define too. Chunk order is randomized for each run
  (so that prefetch from the previous chunk into the next cannot fool us).
  But you are guaranteed to walk the entire buffer.

* Buffer contents are randomized at each run. Csum correctness is checked.

* Buffer copy correctness verified for csum_copy.

* You can set random (up to a #defined value) start and end offset for each
  chunk. Gaps are poisoned before each csum_copy and verified afterwards.
  This has already caught two bugs.

* It benchmarks each routine by running it a #defined number of times
  and reporting the min/max cycles per kb taken.

* It is easy to add/remove C and asm test routines.

* Easily adaptable for SSE and MMX instruction sets.

* It can make coffee for you. ;)
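
To make the chunk-walking and gap-poisoning items above concrete, here is a
minimal sketch in portable C. It is not the code from the attached tarball:
csum_ref, csum_chunked, guarded_copy_ok, overrun_copy and the POISON value
are all hypothetical names chosen for illustration, the checksum is a plain
unfolded sum of little-endian 16-bit words, and the gap size is fixed rather
than random.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POISON 0xA5 /* hypothetical guard byte value */

/* Copy routine under test; memcpy has a compatible signature. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

/* Unfolded sum of little-endian 16-bit words, plus an odd tail byte. */
static uint64_t csum_ref(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint64_t)p[i] | ((uint64_t)p[i + 1] << 8);
    if (i < len)
        sum += p[i];
    return sum;
}

/* Fisher-Yates shuffle of idx[0..n-1] using rand(). */
static void shuffle(size_t *idx, size_t n)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = idx[i - 1];

        idx[i - 1] = idx[j];
        idx[j] = tmp;
    }
}

/*
 * Checksum buf in chunk-sized pieces visited in random order.
 * chunk must be even and nonzero so every piece starts on a word boundary.
 * Every chunk index appears exactly once, so the whole buffer is walked.
 */
static uint64_t csum_chunked(const unsigned char *buf, size_t len, size_t chunk)
{
    size_t n = (len + chunk - 1) / chunk;
    uint64_t sum = 0;
    size_t *order;

    if (len == 0)
        return 0;
    order = malloc(n * sizeof *order);
    if (!order)
        abort();
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    shuffle(order, n);
    for (size_t i = 0; i < n; i++) {
        size_t off = order[i] * chunk;
        size_t l = len - off < chunk ? len - off : chunk;

        sum += csum_ref(buf + off, l);
    }
    free(order);
    return sum;
}

/* Poison gap bytes on both sides of dst, copy, then verify gaps and payload. */
static int guarded_copy_ok(copy_fn copy, const unsigned char *src,
                           size_t len, size_t gap)
{
    unsigned char *dst = malloc(len + 2 * gap);
    int ok = 1;

    if (!dst)
        abort();
    memset(dst, POISON, len + 2 * gap);
    copy(dst + gap, src, len);
    for (size_t i = 0; i < gap; i++)
        if (dst[i] != POISON || dst[gap + len + i] != POISON)
            ok = 0;
    if (memcmp(dst + gap, src, len) != 0)
        ok = 0;
    free(dst);
    return ok;
}

/* Deliberately buggy copy (one byte too many); only use with gap >= 1. */
static void *overrun_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    ((unsigned char *)dst)[len] = 0;
    return dst;
}
```

Because the unfolded word sums simply add up, csum_chunked() should return
the same value as csum_ref() over the whole buffer for any even chunk size,
whatever order the chunks are visited in. guarded_copy_ok(memcpy, ...) is
expected to pass, while guarded_copy_ok(overrun_copy, ...) should fail
because the trailing gap no longer holds the poison value.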

I'm thinking about how to collect the 2-5 best routines and
make 'em compete at kernel init time for the right
to be used for blazing network performance, but have not
even started to code this. A similar approach can be taken
for page clear/copy and copy to/from user routines.
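
A rough userspace sketch of what such a competition could look like, written
here only to illustrate the idea (nothing like this has been coded yet, as
said above). Everything in it is an assumption: the csum_fn type, the two
toy candidates csum_simple and csum_unrolled, the candidates[] table, and
the use of clock() instead of a cycle counter. The winner is the routine
with the smallest minimum time over several runs, since max values mostly
reflect system interference.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint16_t (*csum_fn)(const unsigned char *buf, size_t len);

struct csum_candidate {
    const char *name;
    csum_fn fn;
};

/* Fold a wide sum down to 16 bits with end-around carry. */
static uint16_t fold16(uint64_t s)
{
    while (s >> 16)
        s = (s & 0xffff) + (s >> 16);
    return (uint16_t)s;
}

/* Read one little-endian 16-bit word. */
static unsigned word_at(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Candidate 1: one word per iteration. */
static uint16_t csum_simple(const unsigned char *p, size_t len)
{
    uint64_t s = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        s += word_at(p + i);
    if (i < len)
        s += p[i];
    return fold16(s);
}

/* Candidate 2: four words per iteration, then the same tail handling. */
static uint16_t csum_unrolled(const unsigned char *p, size_t len)
{
    uint64_t s = 0;
    size_t i = 0;

    for (; i + 7 < len; i += 8)
        s += (uint64_t)word_at(p + i) + word_at(p + i + 2) +
             word_at(p + i + 4) + word_at(p + i + 6);
    for (; i + 1 < len; i += 2)
        s += word_at(p + i);
    if (i < len)
        s += p[i];
    return fold16(s);
}

static const struct csum_candidate candidates[] = {
    { "simple",   csum_simple },
    { "unrolled", csum_unrolled },
};

/* Time each candidate `runs` times; return the index with the lowest minimum. */
static size_t pick_fastest(const struct csum_candidate *c, size_t n,
                           const unsigned char *buf, size_t len, int runs)
{
    size_t best = 0;
    clock_t best_time = 0;

    for (size_t i = 0; i < n; i++) {
        clock_t min_t = 0;

        for (int r = 0; r < runs; r++) {
            clock_t t0 = clock();
            volatile uint16_t sink = c[i].fn(buf, len); /* keep the call alive */
            clock_t dt = clock() - t0;

            (void)sink;
            if (r == 0 || dt < min_t)
                min_t = dt;
        }
        if (i == 0 || min_t < best_time) {
            best = i;
            best_time = min_t;
        }
    }
    return best;
}
```

A caller would store candidates[pick_fastest(...)].fn in a function pointer
once and use that pointer afterwards. clock() is coarse, so a real version
would want a cycle counter and a buffer large enough for the differences to
be measurable; candidates should also be cross-checked to return the same
sum before any of them is trusted.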

Electing the 'best' routine via lkml posts:
1. Is slow
2. Doesn't fit any given combination of CPU/mem/mobo
so do _not_ send your results to lkml unless you think
you found something interesting.

FYI, my latest results are below. kpf_XXX routines are newest'n'greatest.
I found out to my surprise that shortening the unrolled loop
on a Duron has a positive effect.

Coders with 'prefetchless' CPUs are encouraged to write up
their own versions of prefetch-like routines (you may use
mov [mem],reg as a prefetch in the hope that the CPU will reorder
instructions and happily csum older data while such a mov
is waiting for data to be fetched). But this needs testing.
That's what this program is for! :-)
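
For those who prefer to start from C rather than asm, the mov-as-prefetch
idea might be sketched roughly like this. It is an untested illustration,
not one of the routines in the tarball: csum_plain, csum_touch_ahead and the
LINE value are assumptions, and whether the extra load helps at all depends
on the CPU and on what the compiler does with the loop.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE 64 /* assumed cache line size in bytes */

/* Baseline: unfolded sum of little-endian 16-bit words plus odd tail byte. */
static uint64_t csum_plain(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint64_t)p[i] | ((uint64_t)p[i + 1] << 8);
    if (i < len)
        sum += p[i];
    return sum;
}

/*
 * Same sum, but once per assumed cache line a dummy load is issued
 * `ahead` bytes in front of the current position. The volatile access
 * forces a real load whose value is thrown away.
 */
static uint64_t csum_touch_ahead(const unsigned char *p, size_t len,
                                 size_t ahead)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        if (i % LINE == 0 && ahead < len - i)
            (void)*(const volatile unsigned char *)(p + i + ahead);
        sum += (uint64_t)p[i] | ((uint64_t)p[i + 1] << 8);
    }
    if (i < len)
        sum += p[i];
    return sum;
}
```

The touch never reads past the end of the buffer, and the result must match
csum_plain() for every length and every value of ahead, which makes the
variant easy to check before benchmarking it.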
--
vda

Duron 650
=========
Csum benchmark program
buffer size: 4 Mb
Each test tried 32 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
                     kernel_csum - took  2612 max, 1887 min cycles per kb. sum=0xfad28968
                     kernel_csum - took  2654 max, 1887 min cycles per kb. sum=0xfad28968
                     kernel_csum - took  2105 max, 1887 min cycles per kb. sum=0xfad28968
                    kernel2_csum - took  2636 max, 1925 min cycles per kb. sum=0xfad28968
                  kernelpii_csum - took 11879 max, 1735 min cycles per kb. sum=0xaeffd53b
                kernelpiipf_csum - took  2565 max, 1642 min cycles per kb. sum=0xaeffd53b
                        kpf_csum - took  1280 max, 1037 min cycles per kb. sum=0xaeffd53b
                        kpf_csum - took  1298 max, 1037 min cycles per kb. sum=0xaeffd53b
                        kpf_csum - took  1285 max, 1035 min cycles per kb. sum=0xaeffd53b
                        kpf_csum - took  1893 max, 1037 min cycles per kb. sum=0xaeffd53b
copy tests:
                     kernel_copy - took  5812 max, 4854 min cycles per kb. sum=0xfad28968
                     kernel_copy - took  5741 max, 4854 min cycles per kb. sum=0xfad28968
                     kernel_copy - took 17680 max, 4859 min cycles per kb. sum=0xfad28968
                  kernelpii_copy - took  7204 max, 6381 min cycles per kb. sum=0xe3bca07e
                kernelpiipf_copy - took  8429 max, 7477 min cycles per kb. sum=0xe3bca07e
                        kpf_copy - took 12806 max, 2471 min cycles per kb. sum=0xfad28968
                        kpf_copy - took  3181 max, 2470 min cycles per kb. sum=0xfad28968
                        kpf_copy - took  3327 max, 2471 min cycles per kb. sum=0xfad28968
                        kpf_copy - took 11967 max, 2471 min cycles per kb. sum=0xfad28968
Done

Celeron 1200
============
Csum benchmark program
buffer size: 4 Mb
Each test tried 32 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
                     kernel_csum - took  7368 max, 6833 min cycles per kb. sum=0x291132e0
                     kernel_csum - took  9038 max, 6845 min cycles per kb. sum=0x291132e0
                     kernel_csum - took  7112 max, 6836 min cycles per kb. sum=0x291132e0
                    kernel2_csum - took  7254 max, 6871 min cycles per kb. sum=0x291132e0
                  kernelpii_csum - took  4696 max, 4109 min cycles per kb. sum=0x484713aa
                kernelpiipf_csum - took  4715 max, 4271 min cycles per kb. sum=0x484713aa
                        kpf_csum - took  3295 max, 2780 min cycles per kb. sum=0x484713aa
                        kpf_csum - took  3091 max, 2793 min cycles per kb. sum=0x484713aa
                        kpf_csum - took 14580 max, 2833 min cycles per kb. sum=0x484713aa
                        kpf_csum - took  3292 max, 2833 min cycles per kb. sum=0x484713aa
copy tests:
                     kernel_copy - took 13927 max,13450 min cycles per kb. sum=0x291132e0
                     kernel_copy - took 14009 max,13406 min cycles per kb. sum=0x291132e0
                     kernel_copy - took 13957 max,13447 min cycles per kb. sum=0x291132e0
                  kernelpii_copy - took 15039 max,11335 min cycles per kb. sum=0x5474077d
                kernelpiipf_copy - took 14137 max,13059 min cycles per kb. sum=0x5474077d
                        kpf_copy - took  8226 max, 7857 min cycles per kb. sum=0x291132e0
                        kpf_copy - took 20698 max, 7886 min cycles per kb. sum=0x291132e0
                        kpf_copy - took  8504 max, 7897 min cycles per kb. sum=0x291132e0
                        kpf_copy - took  8245 max, 7893 min cycles per kb. sum=0x291132e0
Done

[-- Attachment #2: timing_csum_copy.3.tar.bz2 --]
[-- Type: application/x-bzip2, Size: 8713 bytes --]
