* I/O tests using elvtune to improve interactive performance
  [not found] <138.49c8e42.29247804@aol.com>
@ 2001-11-17  8:06 ` rwhron
  2001-11-19  7:09 ` Jens Axboe
  0 siblings, 1 reply; 4+ messages in thread
From: rwhron @ 2001-11-17  8:06 UTC (permalink / raw)
To: linux-kernel, ltp-list

Kernel: 2.4.15-pre5

Test: Run growfiles tests from the Linux Test Project that really hurt
interactive performance. Simultaneously run "ls -laR /". Change the
elevator read latency value with elvtune. Also run mp3blaster tests.

Summary: Smaller values for the I/O elevator read latency have a
significant positive impact on interactive performance, and throughput
is as good as or better than with the default value of 8192.

The idea for this came from Andrea Arcangeli's excellent doc at
http://tux.u-strasbg.fr/jl3/features-2.3-1.html . That page shows that
dbench throughput can be good with low values for read latency too.

My initial tests were just to run growfiles and issue commands that
were slow to respond in the past: things like "ls -l", "login",
"ps aux", etc. I didn't time these tests, but it was amazing what a
difference using elvtune to set read latency to 128 or 32 made. Each
growfiles test prints the number of iterations for a 120 second
interval, and I was happy to see that the number of iterations went up
while interactive performance was dramatically better.

Of course, running ls -l in big directories isn't exactly scientific,
so I tried to come up with something to measure interactive
performance. For these tests, ls -laR / runs at the same time as some
growfiles tests. I picked ls for a few reasons:

1) It's slow to respond when I/O is high.
2) It's easy to measure and repeat.
3) My disk has 5 partitions with lots of files spread across each
   partition, which requires some seeking on the disk.

ls -laR / is not ideal though; it isn't interactive. Total time for
the 4 growfiles tests is 8 minutes (120 seconds per test).
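For reference, the knob being tuned throughout this thread is the
per-device elevator read latency. A minimal sketch of the commands
involved, assuming the elvtune utility from util-linux and an IDE disk
at /dev/hda (the device path is an assumption; adjust for your system):

```shell
#!/bin/sh
# Sketch of the elevator tuning commands (assumes elvtune from
# util-linux and a disk at /dev/hda -- both assumptions; adjust
# for your own system, and run as root).
DEV=${DEV:-/dev/hda}
if command -v elvtune >/dev/null 2>&1; then
    elvtune "$DEV"          # print the current read/write latency
    elvtune -r 32 "$DEV"    # lower the read latency to 32
    # write latency is left at its default (16384) throughout
else
    echo "elvtune not found; would have run: elvtune -r 32 $DEV"
fi
DONE=1
```

The guard on `command -v` just makes the sketch safe to paste on a box
without util-linux installed.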
The ls command finished before the last growfiles test completed in
each run. I rebooted between each of these tests.

read_latency = 2
----------------
The ls was the slowest here, and none of the growfiles were the fastest.

ls -laR / > /var/tmp/ls-laR2
Elapsed (wall clock) time (h:mm:ss or m:ss): 7:40.52
Percent of CPU this job got: 4%

growfiles -b -e 1 -i 0 -L 120 -u -g 4090 -T 100 -t 408990 -l -C 10 -c 1000 -S 10 -f Lgf02_
13969 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -u -g 5000 -T 100 -t 499990 -l -C 10 -c 1000 -S 10 -f Lgf03_
12252 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -u -r 1-49600 -I r -u -i 0 -L 120 Lgfile1
48352 iterations to 1 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -w -u -r 10-5000 -I r -T 10 -l -S 2 -f Lgf04_
59807 iterations to 2 files. Hit time value of 120

read_latency = 32
-----------------
This value had 3 of the best results for growfiles. ls was 16% slower
than with the default read latency, but interactive performance was
great.

ls -laR / > /var/tmp/ls-laR32
Elapsed (wall clock) time (h:mm:ss or m:ss): 5:08.23
Percent of CPU this job got: 6%

growfiles -b -e 1 -i 0 -L 120 -u -g 4090 -T 100 -t 408990 -l -C 10 -c 1000 -S 10 -f Lgf02_
14181 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -u -g 5000 -T 100 -t 499990 -l -C 10 -c 1000 -S 10 -f Lgf03_
11691 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -u -r 1-49600 -I r -u -i 0 -L 120 Lgfile1
54768 iterations to 1 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -w -u -r 10-5000 -I r -T 10 -l -S 2 -f Lgf04_
68342 iterations to 2 files. Hit time value of 120

read_latency = 8192 (default)
-----------------------------

ls -laR / > /var/tmp/ls-laR8192
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:26.13
Percent of CPU this job got: 7%

growfiles -b -e 1 -i 0 -L 120 -u -g 4090 -T 100 -t 408990 -l -C 10 -c 1000 -S 10 -f Lgf02_
11085 iterations to 10 files.
Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -u -g 5000 -T 100 -t 499990 -l -C 10 -c 1000 -S 10 -f Lgf03_
13797 iterations to 10 files. Hit time value of 120

growfiles -b -e 1 -u -r 1-49600 -I r -u -i 0 -L 120 Lgfile1
53198 iterations to 1 files. Hit time value of 120

growfiles -b -e 1 -i 0 -L 120 -w -u -r 10-5000 -I r -T 10 -l -S 2 -f Lgf04_
63542 iterations to 2 files. Hit time value of 120

mtest01 and mmap001
-------------------
I also ran the mtest01 and mmap001 tests playing mp3blaster with
various elevator settings. These are the same tests I've run before.
Below is just the total time for the test and the percentage of the
run during which the mp3 played. read_latency = 16 was best here: the
test was fastest and had the highest mp3 playtime.

read_latency = 2
mtest01 - mp3 played 280 seconds of 316 second run. (88%)
mmap001 not run because changing elvtune didn't seem to affect this test.

read_latency = 16
mtest01 - mp3 played 280 seconds of 309 second run. (91%)
mmap001 - mp3 played 908 seconds of 908 second run.

read_latency = 64
mtest01 - mp3 played 280 seconds of 309 second run. (80%)
mmap001 - mp3 played 908 seconds of 908 second run.

read_latency = 8192
mtest01 - mp3 played 262 seconds of 314 second run. (83%)
mmap001 - mp3 played 901 seconds of 901 second run.

Hardware
--------
Athlon 1333
512 Mb RAM
(1) 40 Gb IDE hard drive with 5 partitions

It's exciting to see Linux have good interactive performance under
heavy disk load. Have fun!
-- 
Randy Hron
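The playtime percentages are just played seconds divided by total run
seconds. A quick way to recompute one of them from the raw numbers
(play_pct is a throwaway helper named here for illustration, not part
of any test suite):

```shell
# Recompute an mp3 playtime percentage from the raw seconds reported
# above. (play_pct is a hypothetical helper, defined only here.)
play_pct() {
    awk -v p="$1" -v t="$2" 'BEGIN { printf "%.0f\n", p / t * 100 }'
}
play_pct 262 314   # read_latency=8192 mtest01 run -> 83
```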
* Re: I/O tests using elvtune to improve interactive performance
  2001-11-17  8:06 ` I/O tests using elvtune to improve interactive performance rwhron
@ 2001-11-19  7:09 ` Jens Axboe
  2001-11-19 15:26 ` rwhron
  2001-11-20  7:32 ` rwhron
  0 siblings, 2 replies; 4+ messages in thread
From: Jens Axboe @ 2001-11-19  7:09 UTC (permalink / raw)
To: rwhron; +Cc: linux-kernel, ltp-list

On Sat, Nov 17 2001, rwhron@earthlink.net wrote:
> Kernel: 2.4.15-pre5
>
> Test: Run growfiles tests from Linux Test Project that really hurt
> interactive performance. Simultaneously run "ls -laR /".
> Change the elevator read latency value with elvtune.
> Also run mp3blaster tests.

Interesting tests, thanks. I wonder if you could be convinced to do
bonnie++ and dbench tests with the same read_latency values used?
Also, I'm assuming you kept write latency at its default of 16384?

-- 
Jens Axboe
* Re: I/O tests using elvtune to improve interactive performance
  2001-11-19  7:09 ` Jens Axboe
@ 2001-11-19 15:26 ` rwhron
  2001-11-20  7:32 ` rwhron
  1 sibling, 0 replies; 4+ messages in thread
From: rwhron @ 2001-11-19 15:26 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel, ltp-list

On Mon, Nov 19, 2001 at 08:09:22AM +0100, Jens Axboe wrote:
> > Test: Run growfiles tests from Linux Test Project that really hurt
> > interactive performance. Simultaneously run "ls -laR /".
> > Change the elevator read latency value with elvtune.
> > Also run mp3blaster tests.
>
> Interesting tests, thanks. I wonder if you could be convinced to do
> bonnie++ and dbench tests with the same read_latency values used? Also,
> I'm assuming you kept write latency at its default of 16384?

Thanks for the feedback. Write latency was 16384 for all tests. I'm
downloading dbench and bonnie++ now; I'll check them out.

I'm still not sure how to measure/quantify interactive performance.
My ideal test will have these components:

1) Simulate and measure user interactive response time.
2) Disk I/O patterns capable of making interactive performance slow.
3) Measurement of I/O throughput.
4) Note how changes with elvtune affect throughput and response time.
5) It's not too boring (i.e. type something, use a stopwatch).

It's the "measure interactive response time" part that I haven't got a
handle on yet. I'm looking at the SSBA benchmarks for something that
simulates users, but I don't know if it measures response time. I
could resort to a stopwatch to test interactive response, but
hopefully something better will come to mind.
-- 
Randy Hron
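One crude way to approximate component 1 without a stopwatch is a
shell loop that periodically times a metadata-heavy command and logs
how long each sample took; rising numbers while the I/O load runs
indicate degrading interactive response. A sketch under those
assumptions (the probe function, its parameters, and the directory
choice are made up for illustration, not an established benchmark):

```shell
#!/bin/sh
# Crude interactive-response probe: time a small metadata-heavy
# command at one-second intervals while a heavy I/O load runs
# elsewhere. (Entirely a sketch; the function name and parameters
# are invented for illustration.)
probe() {
    dir=$1      # directory to list on each sample
    samples=$2  # number of samples to take
    i=0
    while [ "$i" -lt "$samples" ]; do
        start=$(date +%s)
        ls -l "$dir" >/dev/null 2>&1
        end=$(date +%s)
        echo "sample $i: $((end - start))s"
        i=$((i + 1))
        sleep 1
    done
}
OUT=$(probe /etc 3)
echo "$OUT"
```

Second granularity from `date +%s` is coarse, but under the kind of
load described in this thread the stalls were tens of seconds, so it
would still show the effect.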
* Re: I/O tests using elvtune to improve interactive performance
  2001-11-19  7:09 ` Jens Axboe
  2001-11-19 15:26 ` rwhron
@ 2001-11-20  7:32 ` rwhron
  1 sibling, 0 replies; 4+ messages in thread
From: rwhron @ 2001-11-20  7:32 UTC (permalink / raw)
To: Jens Axboe; +Cc: linux-kernel

On Mon, Nov 19, 2001 at 08:09:22AM +0100, Jens Axboe wrote:
> Interesting tests, thanks. I wonder if you could be convinced to do
> bonnie++ and dbench tests with the same read_latency values used?

Jens,
I'm sure this isn't what you had in mind, but ... :)

Kernel: 2.4.15-pre6

Test: dbench 775 on 5 partitions. Time ls -l on big directories.
Test with read_latency at 8192 (default) and at 32.

Summary: With a load average of 775, console IRC clients perform
great. Lower read latency reduces throughput, but big directory
listings are faster.

This is really a crazy test, but it's a testament to the amazing work
of the kernel hackers. I was looking for the I/O load that makes
interactive response poor. There are a couple of growfiles tests in
the Linux Test Project that do that with a load average of less than 5.
dbench is different; the dbench load the kernel can handle is
remarkable.

Hardware:
1 Athlon 1333
1 GB RAM
1 GB swap
1 40 GB IDE disk

A reasonable test may be dbench 36 or 144, which return:

Throughput 90.636 MB/sec (NB=113.295 MB/sec 906.36 MBit/sec) 8 procs
Throughput 56.0331 MB/sec (NB=70.0413 MB/sec 560.331 MBit/sec) 36 procs
Throughput 25.7869 MB/sec (NB=32.2336 MB/sec 257.869 MBit/sec) 144 procs

Instead, I figured out roughly how many simultaneous dbench processes
would run with the amount of free disk space I have.

 8:01pm up 53 min, 12 users, load average: 779.12, 778.68, 737.68

I had 3 console irc sessions up. Occasionally there was a very slight
delay. "ls -l", on the other hand, was very slow on big directories;
timings are below.
Summary: read latency=8192 compared to read latency=32

dbench 50   10% more throughput
dbench 50    6% more throughput
dbench 75   22% more throughput
dbench 150  25% more throughput
dbench 450  24% more throughput
ls -l time  48% longer

ls times are interspersed with the dbench results in chronological
order.

read_latency = 8192
-------------------
/usr/share/man/man3  real 30m48.908s
# /home/dbench$ ./dbench 50 completes
Throughput 1.91472 MB/sec (NB=2.39339 MB/sec 19.1472 MBit/sec) 50 procs
# /usr/local/dbench$ ./dbench 50 completes
Throughput 1.84434 MB/sec (NB=2.30543 MB/sec 18.4434 MBit/sec) 50 procs
# /dbench$ ./dbench 75 completes
Throughput 2.50039 MB/sec (NB=3.12548 MB/sec 25.0039 MBit/sec) 75 procs
/usr/src/linux ls -laR  real 10m11.953s
# /usr/src/sources/d/dbench$ ./dbench 150 completes
Throughput 3.51881 MB/sec (NB=4.39852 MB/sec 35.1881 MBit/sec) 150 procs
/usr/X11R6/lib/X11/fonts/75dpi  real 28m22.315s
/usr/X11R6/lib/X11/fonts/100dpi  real 12m27.915s
# /opt/dbench$ ./dbench 450 completes
Throughput 4.64194 MB/sec (NB=5.80242 MB/sec 46.4194 MBit/sec) 450 procs

read_latency = 32
-----------------
/usr/share/man/man3  real 10m8.684s
# /home/dbench$ ./dbench 50 completes
Throughput 1.74518 MB/sec (NB=2.18147 MB/sec 17.4518 MBit/sec) 50 procs
# /usr/local/dbench$ ./dbench 50 completes
Throughput 1.73985 MB/sec (NB=2.17481 MB/sec 17.3985 MBit/sec) 50 procs
/usr/src/linux ls -laR  real 5m57.340s
# /dbench$ ./dbench 75 completes
Throughput 2.0441 MB/sec (NB=2.55513 MB/sec 20.441 MBit/sec) 75 procs
/usr/X11R6/lib/X11/fonts/75dpi  real 13m32.822s
# /usr/src/sources/d/dbench$ ./dbench 150 completes
Throughput 2.8047 MB/sec (NB=3.50587 MB/sec 28.047 MBit/sec) 150 procs
/usr/X11R6/lib/X11/fonts/100dpi  real 14m14.336s
# /opt/dbench$ ./dbench 450 completes
Throughput 3.74463 MB/sec (NB=4.68079 MB/sec 37.4463 MBit/sec) 450 procs

Filesystems (test not running)
------------------------------
Filesystem   Type      Size  Used  Avail  Use%  Mounted on
/dev/hda12   reiserfs  4.2G  1.2G   3.0G   27%  /
/dev/hda11   reiserfs   15G  3.9G
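As a sanity check, the dbench lines in the summary follow directly
from the raw MB/sec figures: each percentage is
(8192 result - 32 result) / 32 result. A quick recomputation (pct is
a throwaway helper defined here for illustration, not part of dbench):

```shell
# Recompute the summary percentages from the raw dbench MB/sec
# figures quoted above. (pct is a hypothetical helper, not a
# dbench tool.)
pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (a - b) / b * 100 }'
}
pct 1.91472 1.74518   # dbench 50, first run  -> 10
pct 1.84434 1.73985   # dbench 50, second run -> 6
pct 2.50039 2.0441    # dbench 75             -> 22
pct 3.51881 2.8047    # dbench 150            -> 25
pct 4.64194 3.74463   # dbench 450            -> 24
```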
 11G   26%  /opt
/dev/hda5    reiserfs   10G  5.6G   4.9G   53%  /usr/src
/dev/hda6    reiserfs  5.2G  3.4G   1.8G   64%  /home
/dev/hda8    reiserfs  2.1G  200M   1.8G   10%  /usr/local

Conclusion: Load average 775! The box is solid. IRC clients perform
great. Total throughput goes down as load goes up.

It may have made more sense to do a shorter test with fewer processes
and more values for read_latency, but it turned out this way.
Hopefully it's entertaining nonetheless. :)
-- 
Randy Hron