From: rwhron@earthlink.net
To: linux-kernel@vger.kernel.org
Subject: [BENCHMARK] 2.5.68 and 2.5.68-mm2
Date: Fri, 25 Apr 2003 19:09:39 -0400
Message-ID: <20030425230939.GA2281@rushmore>
There are a few benchmarks that have changed dramatically
between 2.5.68 and 2.5.68-mm2.
Machine is a quad P3 Xeon 700 MHz with 1M cache.
3.75 GB RAM.
RAID0 LUN
QLogic 2200 Fibre Channel
Some config differences: 2.5.68 has the standard QLogic driver.
2.5.68-mm2 has the new QLogic driver and the 2/2 GB memory split.
Only in 2.5.68
CONFIG_SCSI_QLOGIC_FC=y
CONFIG_SCSI_QLOGIC_FC_FIRMWARE=y
CONFIG_SCSI_QLOGIC_ISP=y
Only in 2.5.68-mm2
CONFIG_2GB=y
CONFIG_DEBUG_INFO=y
CONFIG_NR_SIBLINGS_0=y
CONFIG_SCSI_QLOGIC_ISP_NEW=y
CONFIG_SPINLINE=y
One recent change: -mm2 is 17-19% faster at tbench.
The logfiles don't indicate any errors. Wonder what helped?
tbench 192 processes    Average     High      Low
2.5.68-mm2               139.44   142.14   136.77  MB/sec
2.5.68                   118.78   132.41   111.45
tbench 64 processes     Average     High      Low
2.5.68-mm2               136.34   143.66   124.13  MB/sec
2.5.68                   114.30   116.88   111.33
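One rough way to check that the tbench gain is larger than run-to-run variation is to compare the Low of one kernel against the High of the other. The sketch below does that with the numbers from the tables above; the function and variable names are illustrative and not part of tbench.

```python
# (average, high, low) in MB/sec, copied from the tbench tables above.
tbench = {
    192: {"2.5.68-mm2": (139.44, 142.14, 136.77), "2.5.68": (118.78, 132.41, 111.45)},
    64: {"2.5.68-mm2": (136.34, 143.66, 124.13), "2.5.68": (114.30, 116.88, 111.33)},
}


def pct_gain(new, old):
    """Percent by which `new` exceeds `old` (bigger-is-better metric)."""
    return (new / old - 1.0) * 100.0


summary = {}
for procs, runs in tbench.items():
    mm2_avg, _mm2_high, mm2_low = runs["2.5.68-mm2"]
    base_avg, base_high, _base_low = runs["2.5.68"]
    gain = pct_gain(mm2_avg, base_avg)
    # Ranges are disjoint when the slowest -mm2 run beats the fastest 2.5.68 run.
    disjoint = mm2_low > base_high
    summary[procs] = (gain, disjoint)
    print(f"{procs} processes: -mm2 +{gain:.1f}%, ranges disjoint: {disjoint}")
```

Disjoint ranges suggest the difference is not just noise, though with only High and Low available this is a sanity check rather than a statistical test.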
The autoconf-2.53 make/make check is a fork test. 2.5.68
finishes in about 12% less time here (-mm2 takes about 14% longer).
kernel average min_time max_time
2.5.68 732.8 729 738 seconds
2.5.68-mm2 833.3 824 841
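Because this table reports elapsed time, the size of the gap depends on which kernel is taken as the baseline. A small sketch using the averages above (the helper names are made up for illustration):

```python
def pct_longer(slow, fast):
    """Percent extra elapsed time of `slow` relative to `fast`."""
    return (slow / fast - 1.0) * 100.0


def pct_less_time(fast, slow):
    """Percent of elapsed time saved by `fast` relative to `slow`."""
    return (1.0 - fast / slow) * 100.0


# Average build times in seconds from the table above.
base_avg = 732.8  # 2.5.68
mm2_avg = 833.3   # 2.5.68-mm2

longer = pct_longer(mm2_avg, base_avg)
saved = pct_less_time(base_avg, mm2_avg)
print(f"-mm2 takes {longer:.1f}% longer; 2.5.68 needs {saved:.1f}% less time")
```

Both numbers describe the same 100.5 second difference; they differ only in the denominator.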
On the AIM7 database test, -mm2 is about 18% faster and
uses about 15% more CPU time (Real and CPU are in seconds).
The new QLogic driver helps AIM7.
AIM7 dbase workload
kernel Tasks Jobs/Min Real CPU
2.5.68-mm2 32 559.8 339.6 164.0
2.5.68 32 477.1 398.4 150.9
2.5.68-mm2 64 714.1 532.4 312.3
2.5.68 64 608.3 625.0 272.4
2.5.68-mm2 96 785.6 725.9 458.8
2.5.68 96 664.7 857.8 393.9
2.5.68-mm2 128 832.1 913.8 640.0
2.5.68 128 702.3 1082.5 515.5
2.5.68-mm2 160 858.5 1107.0 712.2
2.5.68 160 726.7 1307.8 624.2
2.5.68-mm2 192 880.4 1295.4 871.1
2.5.68 192 745.7 1529.5 763.0
2.5.68-mm2 224 895.1 1486.5 1005.1
2.5.68 224 758.0 1755.3 868.4
2.5.68-mm2 256 907.8 1675.1 1144.5
2.5.68 256 767.5 1981.3 987.2
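To see how the "about 18%" and "about 15%" figures hold up across task counts, the per-row ratios can be computed directly from the table above. This is a sketch; the tuple layout and variable names are assumptions made for the example.

```python
# (tasks, jobs_per_min, real_s, cpu_s) copied from the AIM7 dbase table above.
mm2 = [
    (32, 559.8, 339.6, 164.0), (64, 714.1, 532.4, 312.3),
    (96, 785.6, 725.9, 458.8), (128, 832.1, 913.8, 640.0),
    (160, 858.5, 1107.0, 712.2), (192, 880.4, 1295.4, 871.1),
    (224, 895.1, 1486.5, 1005.1), (256, 907.8, 1675.1, 1144.5),
]
base = [
    (32, 477.1, 398.4, 150.9), (64, 608.3, 625.0, 272.4),
    (96, 664.7, 857.8, 393.9), (128, 702.3, 1082.5, 515.5),
    (160, 726.7, 1307.8, 624.2), (192, 745.7, 1529.5, 763.0),
    (224, 758.0, 1755.3, 868.4), (256, 767.5, 1981.3, 987.2),
]

rows = []
for (tasks, m_jobs, _m_real, m_cpu), (b_tasks, b_jobs, _b_real, b_cpu) in zip(mm2, base):
    assert tasks == b_tasks  # rows must be paired by task count
    jobs_gain = (m_jobs / b_jobs - 1.0) * 100.0  # extra throughput on -mm2
    cpu_extra = (m_cpu / b_cpu - 1.0) * 100.0    # extra CPU seconds on -mm2
    rows.append((tasks, jobs_gain, cpu_extra))
    print(f"{tasks:4d} tasks: Jobs/Min +{jobs_gain:.1f}%, CPU +{cpu_extra:.1f}%")
```

The throughput gain stays in a narrow band at every task count, while the extra CPU time varies more from row to row, from roughly 9% at 32 tasks to roughly 24% at 128 tasks.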
On the AIM7 shared test, -mm2 is 15-19% faster and
uses about 5% more CPU time.
AIM7 shared workload
kernel Tasks Jobs/Min Real CPU
2.5.68-mm2 64 2447.0 152.2 180.8
2.5.68 64 2110.4 176.5 170.0
2.5.68-mm2 128 2705.0 275.4 357.6
2.5.68 128 2276.9 327.2 337.2
2.5.68-mm2 192 2708.3 412.6 537.5
2.5.68 192 2265.4 493.3 506.8
2.5.68-mm2 256 2746.1 542.5 716.3
2.5.68 256 2304.7 646.5 677.5
2.5.68-mm2 320 2732.9 681.5 900.0
2.5.68 320 2296.3 811.0 849.4
L M B E N C H 2 . 0 S U M M A R Y
------------------------------------
The lmbench process latency results go along with the autoconf
build results.
Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
                   fork   execve  /bin/sh
kernel          process  process  process
-------------   -------  -------  -------
2.5.68              243      979     4401
2.5.68-mm2          502     1715     5200
The lmbench context switch tests have an interesting pattern.
With few processes and small sizes, 2.5.68 has lower latency.
2.5.68-mm2 turns the tables in the tests with many processes and large sizes.
Context switching with 0K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/0k 4proc/0k 8proc/0k 16proc/0k 32proc/0k 64proc/0k 96proc/0k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 1.32 2.63 2.38 2.41 2.42 2.87 3.79
2.5.68-mm2 6.80 6.97 6.74 6.59 6.43 5.94 6.17
Context switching with 4K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/4k 4proc/4k 8proc/4k 16proc/4k 32proc/4k 64proc/4k 96proc/4k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 1.81 3.53 3.79 4.26 4.62 6.06 8.30
2.5.68-mm2 6.91 7.13 7.29 7.57 7.72 7.38 7.91
Context switching with 8K - times in microseconds - smaller is better
---------------------------------------------------------------------
2proc/8k 4proc/8k 8proc/8k 16proc/8k 32proc/8k 64proc/8k 96proc/8k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 3.31 5.35 5.16 5.29 6.07 12.05 19.60
2.5.68-mm2 7.20 8.42 8.86 8.87 9.12 9.13 10.51
Context switching with 16K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/16k 4proc/16k 8proc/16k 16prc/16k 32prc/16k 64prc/16k 96prc/16k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 7.46 8.19 8.04 8.49 13.66 37.52 46.99
2.5.68-mm2 10.50 11.46 11.78 11.61 11.89 15.26 24.91
Context switching with 32K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/32k 4proc/32k 8proc/32k 16prc/32k 32prc/32k 64prc/32k 96prc/32k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 12.690 13.520 13.856 19.877 52.473 81.259 83.397
2.5.68-mm2 17.419 17.285 17.212 17.358 20.044 46.069 75.088
Context switching with 64K - times in microseconds - smaller is better
----------------------------------------------------------------------
2proc/64k 4proc/64k 8proc/64k 16prc/64k 32prc/64k 64prc/64k 96prc/64k
kernel ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
2.5.68 23.03 24.71 34.03 105.06 155.47 156.37 156.29
2.5.68-mm2 27.81 27.97 28.03 33.67 79.36 154.14 172.09
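The pattern in the context switch tables can be summarized by finding, for each size, the first process count at which -mm2 has the lower latency. A minimal sketch with the values copied from the tables above (the dictionary layout and function name are only illustrative):

```python
PROCS = [2, 4, 8, 16, 32, 64, 96]

# size in KB -> (2.5.68 latencies, 2.5.68-mm2 latencies), in microseconds.
ctx = {
    0: ([1.32, 2.63, 2.38, 2.41, 2.42, 2.87, 3.79],
        [6.80, 6.97, 6.74, 6.59, 6.43, 5.94, 6.17]),
    4: ([1.81, 3.53, 3.79, 4.26, 4.62, 6.06, 8.30],
        [6.91, 7.13, 7.29, 7.57, 7.72, 7.38, 7.91]),
    8: ([3.31, 5.35, 5.16, 5.29, 6.07, 12.05, 19.60],
        [7.20, 8.42, 8.86, 8.87, 9.12, 9.13, 10.51]),
    16: ([7.46, 8.19, 8.04, 8.49, 13.66, 37.52, 46.99],
         [10.50, 11.46, 11.78, 11.61, 11.89, 15.26, 24.91]),
    32: ([12.690, 13.520, 13.856, 19.877, 52.473, 81.259, 83.397],
         [17.419, 17.285, 17.212, 17.358, 20.044, 46.069, 75.088]),
    64: ([23.03, 24.71, 34.03, 105.06, 155.47, 156.37, 156.29],
         [27.81, 27.97, 28.03, 33.67, 79.36, 154.14, 172.09]),
}


def first_crossover(base, mm2):
    """Smallest process count where -mm2 latency is below 2.5.68, else None."""
    for procs, b, m in zip(PROCS, base, mm2):
        if m < b:
            return procs
    return None


crossover = {size: first_crossover(base, mm2) for size, (base, mm2) in ctx.items()}
for size, procs in crossover.items():
    where = "never ahead" if procs is None else f"first ahead at {procs} processes"
    print(f"{size}K: -mm2 {where}")
```

The crossover point halves each time the size doubles, from 96 processes at 4K down to 8 processes at 64K, and never occurs at 0K. Note this only reports the first crossover: at 64K with 96 processes, 2.5.68 is ahead again.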
2.5.68 has lower latency in the local communication tests.
*Local* Communication latencies in microseconds - smaller is better
-------------------------------------------------------------------
kernel Pipe AF/Unix UDP RPC/UDP TCP RPC/TCP
2.5.68 9.44 14.25 32.0856 60.1722 39.8264 73.7042
2.5.68-mm2 32.71 48.45 45.4747 65.2766 56.7022 79.7929
*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------
                                        File    Mmap   Bcopy   Bcopy  Memory  Memory
kernel          Pipe AF/Unix     TCP  reread  reread  (libc)  (hand)    read   write
2.5.68         511.3   546.9   174.0   296.5   363.9   170.3   172.0   364.9   211.9
2.5.68-mm2     493.2   278.0   167.2   289.2   347.8   160.9   163.1   348.1   199.3
*Local* More Communication bandwidths in MB/s - bigger is better
----------------------------------------------------------------
                File    Mmap Aligned Partial Partial Partial
OS              open    open   Bcopy   Bcopy    Mmap    Mmap
               close   close  (libc)  (hand)   write  rd/wrt    HTTP
2.5.68         299.0   286.0   167.8   182.5   212.2   212.7   10.10
2.5.68-mm2     291.9   277.5   159.7   172.4   201.2   200.5    9.82
Memory latencies in nanoseconds - smaller is better
---------------------------------------------------
kernel           Mhz    L1 $    L2 $  Main mem
2.5.68           698    4.35   13.06     165.3
2.5.68-mm2       698    4.33   13.00     173.1
tiobench-0.3.3
Unit information
================
File size = 8192 megabytes
Blk Size = 4096 bytes
Rate = megabytes per second
CPU% = percentage of CPU used during the test
Latency = milliseconds
Lat% = percent of requests that took longer than X seconds
CPU Eff = Rate divided by CPU% - throughput per cpu load
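As a worked example of the CPU Eff definition, the sketch below recomputes it for the one-thread sequential read rows. It assumes CPU% enters as a fraction (13.23% becomes 0.1323), which matches the reported values; the function name is made up for this example.

```python
def cpu_eff(rate_mb_s, cpu_pct):
    """Throughput per unit of CPU load: Rate / (CPU% / 100)."""
    return rate_mb_s / (cpu_pct / 100.0)


# One-thread sequential read rows on ext2: (Rate in MB/s, CPU%).
eff_base = cpu_eff(28.77, 13.23)  # 2.5.68
eff_mm2 = cpu_eff(28.77, 13.80)   # 2.5.68-mm2

print(f"2.5.68     CPU Eff ~ {eff_base:.0f}")
print(f"2.5.68-mm2 CPU Eff ~ {eff_mm2:.0f}")
```

Some reported Eff values differ by a unit from what the rounded Rate and CPU% columns give, presumably because they were calculated before rounding.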
One notable difference between -mm2 and 2.5.68 is the CPU% as
thread count goes up. -mm2 uses less CPU as thread count rises,
and 2.5.68 uses more. 2.5.68 keeps sequential read throughput
high as threads increase.
Sequential Reads ext2
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
------------- --- ------------------------------------------------------------
2.5.68 1 28.77 13.23% 0.405 592.14 0.00000 0.00000 217
2.5.68-mm2 1 28.77 13.80% 0.404 659.18 0.00000 0.00000 208
2.5.68 8 36.65 18.04% 2.542 945.37 0.00000 0.00000 203
2.5.68-mm2 8 23.96 11.15% 3.810 1219.85 0.00000 0.00000 215
2.5.68 16 30.56 14.94% 6.080 1224.19 0.00000 0.00000 204
2.5.68-mm2 16 20.19 9.39% 8.953 2456.76 0.00000 0.00000 215
2.5.68 32 27.74 13.84% 13.376 1498.48 0.00000 0.00000 200
2.5.68-mm2 32 20.15 9.50% 16.728 4424.53 0.00000 0.00000 212
2.5.68 64 28.47 14.54% 25.294 6204.46 0.00005 0.00000 196
2.5.68-mm2 64 19.54 9.40% 32.600 12986.20 0.04410 0.00000 208
2.5.68 128 29.87 14.99% 41.715 17752.22 0.10242 0.00000 199
2.5.68-mm2 128 19.28 9.21% 63.638 57459.95 1.27239 0.01006 209
2.5.68 256 34.10 16.88% 64.697 51122.80 1.16358 0.01163 202
2.5.68-mm2 256 18.84 8.96% 125.350 164470.88 1.43795 0.14148 210
Random Reads throughput on ext2 is a lot higher on 2.5.68. -mm2 has a bump in
latency as thread count gets very high.
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
------------- --- ------------------------------------------------------------
2.5.68 1 0.84 0.75% 14.003 120.98 0.00000 0.00000 111
2.5.68-mm2 1 0.95 0.88% 12.383 121.84 0.00000 0.00000 108
2.5.68 8 4.56 4.29% 19.193 122.64 0.00000 0.00000 106
2.5.68-mm2 8 0.96 0.85% 95.108 715.00 0.00000 0.00000 113
2.5.68 16 4.34 3.95% 40.724 212.21 0.00000 0.00000 110
2.5.68-mm2 16 0.99 0.80% 178.652 1203.69 0.00000 0.00000 123
2.5.68 32 3.28 3.40% 98.453 335.85 0.00000 0.00000 96
2.5.68-mm2 32 0.94 0.76% 357.853 2151.68 0.00000 0.00000 124
2.5.68 64 4.20 3.87% 137.963 647.04 0.00000 0.00000 108
2.5.68-mm2 64 0.91 0.79% 677.313 3973.72 0.00000 0.00000 115
2.5.68 128 4.18 4.03% 245.390 1693.66 0.00000 0.00000 104
2.5.68-mm2 128 0.90 0.76% 1275.112 7329.02 11.84476 0.00000 119
2.5.68 256 4.96 4.47% 285.231 6121.11 0.78125 0.00000 111
2.5.68-mm2 256 0.86 0.86% 2160.203 40955.72 32.13542 3.67187 99
For Sequential Writes on ext2, -mm2 has higher throughput and lower latency.
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
------------- --- ------------------------------------------------------------
2.5.68 1 55.43 41.59% 0.173 3228.31 0.00000 0.00000 133
2.5.68-mm2 1 57.78 43.13% 0.164 3055.50 0.00000 0.00000 134
2.5.68 8 30.83 30.28% 2.473 21372.39 0.05684 0.00000 102
2.5.68-mm2 8 32.13 33.00% 2.281 20425.81 0.05011 0.00000 97
2.5.68 16 29.02 30.14% 4.886 36841.82 0.08054 0.00024 96
2.5.68-mm2 16 30.26 32.67% 4.616 33532.37 0.07949 0.00020 93
2.5.68 32 26.93 32.35% 9.834 76337.91 0.10024 0.03682 83
2.5.68-mm2 32 28.08 33.27% 9.433 75278.98 0.09423 0.01369 84
2.5.68 64 25.72 33.33% 19.158 134891.94 0.14043 0.07386 77
2.5.68-mm2 64 28.50 36.25% 18.455 133508.81 0.11492 0.06619 79
2.5.68 128 25.85 34.97% 35.961 266123.63 0.22740 0.09542 74
2.5.68-mm2 128 28.69 37.41% 33.453 217356.72 0.21301 0.08387 77
2.5.68 256 29.80 43.31% 60.387 463540.28 0.43515 0.12388 69
2.5.68-mm2 256 29.84 43.63% 60.796 404468.07 0.54049 0.11292 68
-mm2 does better with random writes.
Random Writes ext2
Num Avg Maximum Lat% Lat% CPU
Kernel Thr Rate (CPU%) Latency Latency >2s >10s Eff
------------- --- ------------------------------------------------------------
2.5.68 1 2.86 2.73% 1.059 60.94 0.00000 0.00000 105
2.5.68-mm2 1 4.48 3.94% 0.077 22.02 0.00000 0.00000 114
2.5.68 8 3.73 4.39% 1.176 81.25 0.00000 0.00000 85
2.5.68-mm2 8 4.09 3.91% 1.984 488.24 0.00000 0.00000 104
2.5.68 16 3.69 4.21% 1.872 189.26 0.00000 0.00000 88
2.5.68-mm2 16 4.00 4.45% 3.510 969.07 0.00000 0.00000 90
2.5.68 32 3.71 4.89% 2.102 352.52 0.00000 0.00000 76
2.5.68-mm2 32 4.03 5.62% 4.660 1455.09 0.00000 0.00000 72
2.5.68 64 3.71 5.68% 2.266 701.86 0.00000 0.00000 65
2.5.68-mm2 64 4.26 7.39% 2.334 1483.77 0.00000 0.00000 58
2.5.68 128 3.79 6.87% 1.343 1042.66 0.00000 0.00000 55
2.5.68-mm2 128 4.35 8.14% 0.853 275.49 0.00000 0.00000 53
2.5.68 256 3.79 6.70% 0.304 79.07 0.00000 0.00000 57
2.5.68-mm2 256 4.36 8.87% 2.487 3519.76 0.00000 0.00000 49
bonnie++-1.02c random seek test on ext2 supports the tiobench random read
result: 2.5.68 does many more seeks per second.
                    -------- Sequential Output ---------  ----- Random ------
                    ----- Block -----  ---- Rewrite ----  ------ Seeks ------
Kernel        Size  MB/sec  %CPU  Eff  MB/sec  %CPU  Eff    /sec  %CPU    Eff
2.5.68        8192   68.62  53.3  129   15.92  17.0   94   502.5  3.00  16750
2.5.68-mm2    8192   71.61  57.0  126   17.52  19.0   92   203.9  1.00  20393
--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html