I/O throughput regression bug going from 2.6.6 to 2.6.7 or 2.6.8-rc3 There are several I/O throughput bugs that were introduced in 2.6.7, not related to any updates in drivers/md. The first reduces multi-threaded reads of a file on an ext3/RAID0 file from ~600MB/Sec to ~160MB/Sec on my opteron. The same result is seen on a i686 based system. Doing single threaded reads of the same ext3/RAID0 file shows a ~60MB/Sec reduction. The hardware is 2-Adaptec 39320A-R HBAs into 4 7-drive strings of 15KRPM Seagate U320 drives. The AIC79XX driver is V2.0.12 (the stock driver shows lower performance). The second throughput bug happens when the file being read is larger than physical memory, in this case 16GB of file, and 8GB of RAM. Reading the first 7GB of file runs at ~420MB/Sec (1-39320A and 14 drives). The next 9GB runs at 60MB/Sec or less. If I use fadvise64_64() to try to manage the file cache, the rate drops to under 10MB/Sec :-) Observations - kswapd goes nuts under 2.6.7, 2.6.8-rc3 when the file being read exceeds the physical memory size. System idle time (from top) is near zero for 2 threads reading under 2.6.6, and is 50% or better for 2.6.7 or 2.6.8-rc3. Otherwise I/O wait is the dominant state 89% to 95%. The opteron is capable of 955MB/Sec raw I/O off the 28 drive array using O_DIRECT on /dev/sda, /dev/sdb... 541MB/Sec raw I/O off the 14 drive array. The value of 2 threads, 11KB reads, and 448 RASize were close to optimal for 2.6.2 through 2.6.6 on the 14 drive system. fadvise64_64() is broken on i686 and x86_64. The 3rd parameter is being passed garbage off the stack. Patching the ioctl fixed that one. fadvise64_64() helps on the i686, but is very harmfull on the x86_64. The values passed via ioctl(BLKRASET | BLKFRASET) to get peak performance vary radically between 2.6.6, 2.6.7, and 2.6.8-rc3 for /dev/md0. The ioctl does a right shift by 3 bits before using the passed in value, so my RASize value is left shifted by 3 bits (X * 8) before being passed in. 4-controller, 28 drives, raid0, 2.6.6, 2 threads 2, 11, 448, 552.617 2, 11, 2736, 595.695 2, 11, 2275, 596.911 2, 11, 2253, 597.956 2, 11, 2321, 600.234 2, 11, 2164, 601.115 2-controller, 14 drives, raid0, 2.6.8-rc3, 2 threads 2, 11, 448, 81.543 2, 239, 448, 154.706 2, 239, 673, 161.070 2, 239, 124, 161.158 2, 239, 149, 161.209 2, 239, 128, 161.298 2, 239, 229, 161.400 2, 239, 548, 161.897 2-controller, 14 drives, raid0, 2.6.8-rc3, single thread 1, 11, 448, 329.419 1, 11, 935, 373.382 1, 11, 894, 373.518 1, 11, 1021, 377.442 1, 11, 1023, 387.952 1, 11, 1024, 418.985 2-controller, 14 drives, raid0, 2.6.6, 2 threads 2, 11, 471, 429.170 2, 11, 470, 430.252 2, 11, 493, 430.523 2, 11, 514, 430.795 2, 11, 448, 431.612 2-controller, 14 drives, raid0, 2.6.6, single thread 1, 7, 448, 328.047 1, 7, 681, 365.714 1, 7, 675, 366.107 1, 7, 186, 366.237 1, 7, 668, 367.882 1, 7, 662, 368.213 Hardware setup: dual cpu 2.0GHz opteron, Tyan S2885, 8GB ram, dual 39320A-R on different PCi-X busses. RedHat ES3.0-update 1. dual cpu 2.66GHZ Xeon w/hyperthtreading, SuperMicro X5DA8, 2GB RAM, dual 39320A-R (or AIC7902 on mobo), RedHat ES3.0-update 2. 14 Seagate 36GB 15K RPM U320 drives in one partition, 14 Fujitsu 73GB 15K RPM U320 drives in two 36GB partitions. (better have the right ucode for those fujitsu drives!) In two StorCase 14-bay Infostations.