* RE: Looking for the cause of poor I/O performance
@ 2004-12-03 11:30 TJ
2004-12-03 11:46 ` Erik Mouw
0 siblings, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-03 11:30 UTC (permalink / raw)
To: linux-raid
>Gigabit! Lucky :)
>
>I want a Gigabit switch for Christmas! And a few PCI cards too!
>
>Of course, with Gigabit, I would want/need a better Linux system too! With
>PCI-express and ... I better wake up and go to bed! :)
Bwahahaha!
I'm cheap. I use a crossover so I didn't have to spring for the switch. The
NICs are Intel 82540EM's. I got them for around $55 per. I didn't think that
was too bad for gigabit. Of course, these controllers may be complete trash,
I dunno.
TJ Harrell
^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: Looking for the cause of poor I/O performance
  2004-12-03 11:30 Looking for the cause of poor I/O performance TJ
@ 2004-12-03 11:46 ` Erik Mouw
  2004-12-03 15:09   ` TJ
  2004-12-03 16:32   ` David Greaves
  0 siblings, 2 replies; 36+ messages in thread
From: Erik Mouw @ 2004-12-03 11:46 UTC (permalink / raw)
To: TJ; +Cc: linux-raid

On Fri, Dec 03, 2004 at 06:30:51AM -0500, TJ wrote:
> I'm cheap. I use a crossover so I didn't have to spring for the switch. The
> NICs are Intel 82540EM's. I got them for around $55 per. I didn't think that
> was too bad for gigabit. Of course, these controllers may be complete trash,
> I dunno.

You won't do any better than fast ethernet when you're using a
crossover cable. Gigabit ethernet doesn't need crossover cables for
direct connections: it uses all four wire pairs in cat5 cable and will
automatically figure out if there's a direct connection and do the
right thing (all mandatory by the gigE standard, so every NIC will
support it). If you use a fast ethernet cross cable, the NICs will
autonegotiate to 100 Mbit/s full-duplex.

The Intel gigE NICs are very good: good hardware, good driver, good
support. Gigabit ethernet switches are becoming rather cheap: 200 EUR
buys you an 8 port switch.


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
* Re: Looking for the cause of poor I/O performance
  2004-12-03 11:46 ` Erik Mouw
@ 2004-12-03 15:09   ` TJ
  2004-12-03 16:25     ` Erik Mouw
  2004-12-03 16:32   ` David Greaves
  1 sibling, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-03 15:09 UTC (permalink / raw)
To: linux-raid; +Cc: Erik Mouw

> You won't do any better than fast ethernet when you're using a
> crossover cable. Gigabit ethernet doesn't need crossover cables for
> direct connections, it uses all four wire pairs in cat5 cable and will
> automatically figure out if there's a direct connection and do the
> right thing (all mandatory by the gigE standard, so every NIC will
> support it). If you use a fast ethernet cross cable, the NICs will
> autonegotiate to 100 Mbit/s full-duplex.

I did not know that auto-sensing was part of the Gigabit standard. I don't
understand why you would think that performance would be worse with a
crossover than a straight cable, though. I assure you, the link
autonegotiates to a gigabit connection. The card driver reports this, the
card's light indicator reports this, and my benchmarking of throughput has
proven it.

> The Intel gigE NICs are very good: good hardware, good driver, good
> support. Gigabit ethernet switches are becoming rather cheap: 200 EUR
> buys you an 8 port switch.

Yeah, I knew Intel made good NICs, and I knew they were Linux supported. I'm
only worried because this is the lowest-end model in the line. I wonder if
it offloads work to the CPU, causing lower throughput on a busy link, while
more expensive versions handle more work on the card. Also, I have read some
traffic suggesting that the e1000 driver is better tuned for light-duty
connections and could use some improvement under a heavy workload. If you
know of any documentation or mailing lists on the topic of tuning this, I'd
appreciate it.

TJ Harrell

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
  2004-12-03 15:09 ` TJ
@ 2004-12-03 16:25   ` Erik Mouw
  0 siblings, 0 replies; 36+ messages in thread
From: Erik Mouw @ 2004-12-03 16:25 UTC (permalink / raw)
To: TJ; +Cc: linux-raid

On Fri, Dec 03, 2004 at 10:09:23AM -0500, TJ wrote:
> I did not know that auto-sensing was part of the Gigabit standard. I don't
> understand why you would think that performance would be worse with a
> crossover than a straight cable, though. I assure you, the link
> autonegotiates to a gigabit connection. The card driver reports this, the
> card's light indicator reports this, and my benchmarking of throughput has
> proven it.

That means you have a crossover cable with two wire pairs crossed and
two wire pairs straight, and guess what: gigE automatically detects
badly wired cables (to a certain extent), corrects for them, and
negotiates to the correct speed: 1 Gbit/s. If you have a crossover
cable using only two crossed wire pairs and the other pairs not
connected, the link will negotiate to 100 Mbit/s.

> > The Intel gigE NICs are very good: good hardware, good driver, good
> > support. Gigabit ethernet switches are becoming rather cheap: 200 EUR
> > buys you an 8 port switch.
>
> Yeah, I knew Intel made good NICs, and I knew they were Linux supported.
> I'm only worried because this is the lowest-end model in the line. I
> wonder if it offloads work to the CPU, causing lower throughput on a busy
> link, while more expensive versions handle more work on the card.

We use the dual ported PCI-X server adapters in the file servers (dual
Athlon and dual Opteron), but to be honest I haven't seen a difference
in performance with the desktop adapters when we replaced them. It's
just that they're 64 bit wide and have two NICs on a single board (and
hence only use one PCI slot). The other machines (about 10 or so) have
the cheaper desktop adapters.
> Also, I have read some traffic that the e1000 driver is better tuned
> for light duty connections, and could use some improvement under a
> heavy workload. If you knew about any documentation, or mailing lists
> on the topic of tuning this, I'd appreciate it.

I can't comment on that. We push several gigabytes/day through the
cards and I haven't seen any real problems. We had performance problems
with NatSemi gigE NICs; Broadcom gigE NICs look like too much driver
hassle to me (judging from posts on linux-kernel).

Documentation can be found on http://sourceforge.net/projects/e1000 ,
the appropriate mailing list is the networking list: netdev@oss.sgi.com .


Erik

-- 
+-- Erik Mouw -- www.harddisk-recovery.com -- +31 70 370 12 90 --
| Lab address: Delftechpark 26, 2628 XH, Delft, The Netherlands
| Data lost? Stay calm and contact Harddisk-recovery.com

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
  2004-12-03 11:46 ` Erik Mouw
  2004-12-03 15:09   ` TJ
@ 2004-12-03 16:32   ` David Greaves
  2004-12-03 16:50     ` Guy
  1 sibling, 1 reply; 36+ messages in thread
From: David Greaves @ 2004-12-03 16:32 UTC (permalink / raw)
To: Erik Mouw; +Cc: TJ, linux-raid

I paid about £50 for a 5 port gig switch.

I have 3 e1000 cards (about £30 each) - they're relegated to doorstops
I'm afraid :( Despite months of trying they just won't work with my
consumer VIA/AMD systems (and Ganesh and gang have tried).

I'm now using even cheaper Marvell based SMC EZ1000s (£20ish) - I doubt
I'll get close to the throughput the e1000s could achieve - but I get 3
times more than fast ethernet (and about 10 times more than the e1000s)
which is worthwhile.

David

Erik Mouw wrote:

>On Fri, Dec 03, 2004 at 06:30:51AM -0500, TJ wrote:
>
>>I'm cheap. I use a crossover so I didn't have to spring for the switch. The
>>NICs are Intel 82540EM's. I got them for around $55 per. I didn't think that
>>was too bad for gigabit. Of course, these controllers may be complete trash,
>>I dunno.
>
>You won't do any better than fast ethernet when you're using a
>crossover cable. Gigabit ethernet doesn't need crossover cables for
>direct connections, it uses all four wire pairs in cat5 cable and will
>automatically figure out if there's a direct connection and do the
>right thing (all mandatory by the gigE standard, so every NIC will
>support it). If you use a fast ethernet cross cable, the NICs will
>autonegotiate to 100 Mbit/s full-duplex.
>
>The Intel gigE NICs are very good: good hardware, good driver, good
>support. Gigabit ethernet switches are becoming rather cheap: 200 EUR
>buys you an 8 port switch.
>
>Erik
>
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
  2004-12-03 16:32 ` David Greaves
@ 2004-12-03 16:50   ` Guy
  0 siblings, 0 replies; 36+ messages in thread
From: Guy @ 2004-12-03 16:50 UTC (permalink / raw)
To: 'David Greaves', 'Erik Mouw'; +Cc: 'TJ', linux-raid

Now I have network envy! I am feeling inadequate. :)
I guess I have some more XMAS ideas! :)

I have about 7+ computers in my house and 2 network printers. I currently
have a 24 port 10/100BaseT switch. A 5 port 100/1000BaseT switch would be
enough to make me happy. I would connect up to 4 computers to Gigabit, and
connect the Gigabit switch to the 100BaseT switch. Sweet!

I guess I better get a full time job!

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Greaves
Sent: Friday, December 03, 2004 11:33 AM
To: Erik Mouw
Cc: TJ; linux-raid@vger.kernel.org
Subject: Re: Looking for the cause of poor I/O performance

I paid about £50 for a 5port gig switch
I have 3 e1000 cards (about £30 each) - they're relegated to doorstops I'm
afraid :( Despite months of trying they just won't work with my consumer
VIA/AMD systems (and Ganesh and gang have tried)
I'm now using even cheaper Marvell based SMC EZ1000s (£20ish) - I doubt I'll
get close to the throughput the e1000s could achieve - but I get 3 times
more than fast ethernet (and about 10 times more than e1000s) which is
worthwhile.

David

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Looking for the cause of poor I/O performance
@ 2004-12-02 16:38 TJ
  2004-12-03  0:49 ` Mark Hahn
  2004-12-03  7:12 ` TJ
  0 siblings, 2 replies; 36+ messages in thread
From: TJ @ 2004-12-02 16:38 UTC (permalink / raw)
To: linux-raid

Hi,

I'm getting horrible performance on my samba server, and I am unsure of the
cause after reading, benchmarking, and tuning.

My server is a K6-500 with 43MB of RAM, standard x86 hardware. The OS is
Slackware 10.0 w/ 2.6.7 kernel; I've had similar problems with the 2.4.26
kernel. I've listed my partitions below, as well as the drive models. I
have a linear RAID array as a single element of a RAID 5 array. The RAID 5
array is the array containing the fs being served by samba. I'm sure having
one raid array built on another affects my I/O performance, as well as
having root, swap, and a slice of that array all on one drive; however, I
have taken this into account and still am unable to account for my
machine's poor performance. All drives are on their own IDE channel, no
master/slave combos, as suggested in the RAID howto.

To tune these drives, I use:
hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]

I have tried different values for -a. I use 128, because this corresponds
closely with the 64k stripe of the raid 5 array. I ran hdparm -Tt on each
individual drive as well as both of the raid arrays and included these
numbers below. The numbers I got were pretty low for modern drives.

In my dmesg, I'm seeing something strange. I think this is determined by
kernel internals, and I believe this number is controller dependent, so I'm
wondering if I have a controller issue here...

hda: max request size: 128KiB
hdc: max request size: 1024KiB
hdg: max request size: 64KiB
hdi: max request size: 128KiB
hdk: max request size: 1024KiB

I believe my hard drives are somehow not tuned properly due to the low
hdparm numbers, especially hda and hdc. This is causing the raid array to
perform poorly, in dbench and hdparm -tT. The fact that the two drives on
the same IDE controller, hda and hdc, are performing worse than the group
further indicates that there may be a controller problem. I may try
eliminating this controller and checking the results again.

Also, I know that VIA chipsets, such as this MVP3, are known for poor PCI
performance. I know that this is tweakable, and several programs exist for
tweaking BIOS registers within Windows. How might I test the PCI bus to see
if it is causing performance problems?

Does anyone have any ideas on how to better tune these drives for more
throughput?

My partitions are:
/dev/hda1 on /
/dev/hda2 is swap
/dev/hda3 is part of /dev/md0
/dev/hdi is part of /dev/md0
/dev/hdk is part of /dev/md0
/dev/md0 is a linear array. It is part of /dev/md1
/dev/hdg is part of /dev/md1
/dev/hdc is part of /dev/md1
/dev/md1 is a raid 5 array.

hda: WD 400JB 40GB
hdc: WD 2000JB 200GB
hdg: WD 2000JB 200GB
hdi: IBM 75GXP 120GB
hdk: WD 1200JB 120GB

Controllers:
hda-c: Onboard controller, VIA VT82C596B (rev 12)
hdd-g: Silicon Image SiI 680 (rev 1)
hdh-k: Promise PDC 20269 (rev 2)

The results from hdparm -tT for each individual drive and each raid array
are:

/dev/hda:
 Timing buffer-cache reads:   212 MB in  2.02 seconds = 105.17 MB/sec
 Timing buffered disk reads:   42 MB in  3.07 seconds =  13.67 MB/sec
/dev/hdc:
 Timing buffer-cache reads:   212 MB in  2.00 seconds = 105.80 MB/sec
 Timing buffered disk reads:   44 MB in  3.12 seconds =  14.10 MB/sec
/dev/hdg:
 Timing buffer-cache reads:   212 MB in  2.02 seconds = 105.12 MB/sec
 Timing buffered disk reads:   68 MB in  3.04 seconds =  22.38 MB/sec
/dev/hdi:
 Timing buffer-cache reads:   216 MB in  2.04 seconds = 106.05 MB/sec
 Timing buffered disk reads:   72 MB in  3.06 seconds =  23.53 MB/sec
/dev/hdk:
 Timing buffer-cache reads:   212 MB in  2.01 seconds = 105.33 MB/sec
 Timing buffered disk reads:   66 MB in  3.05 seconds =  21.66 MB/sec
/dev/md0:
 Timing buffer-cache reads:   212 MB in  2.01 seconds = 105.28 MB/sec
 Timing buffered disk reads:   70 MB in  3.07 seconds =  22.77 MB/sec
/dev/md1:
 Timing buffer-cache reads:   212 MB in  2.03 seconds = 104.35 MB/sec
 Timing buffered disk reads:   50 MB in  3.03 seconds =  16.51 MB/sec

The results from dbench 1 are:
Throughput 19.0968 MB/sec 1 procs

The results from tbench 1 are:
Throughput 4.41996 MB/sec 1 procs

I would appreciate any thoughts, leads, ideas, anything at all to point me
in a direction here.

Thanks,
TJ Harrell

^ permalink raw reply	[flat|nested] 36+ messages in thread
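[Editorial aside: the pattern in TJ's numbers is easier to see once they're
pulled out of the hdparm report. The following is not part of the thread; it
is a small illustrative Python sketch that parses hdparm-style output lines
and flags devices reading well below the pack, which here singles out the
two drives on the onboard VIA controller. The 75% threshold is an arbitrary
choice for the example.]

```python
import re

# hdparm -tT buffered-disk figures for the five drives, as posted above.
REPORT = """\
/dev/hda: Timing buffered disk reads: 42 MB in 3.07 seconds = 13.67 MB/sec
/dev/hdc: Timing buffered disk reads: 44 MB in 3.12 seconds = 14.10 MB/sec
/dev/hdg: Timing buffered disk reads: 68 MB in 3.04 seconds = 22.38 MB/sec
/dev/hdi: Timing buffered disk reads: 72 MB in 3.06 seconds = 23.53 MB/sec
/dev/hdk: Timing buffered disk reads: 66 MB in 3.05 seconds = 21.66 MB/sec
"""

def disk_rates(report):
    """Map each device to its buffered disk read rate in MB/s."""
    pat = re.compile(r"(/dev/\w+):.*?=\s*([\d.]+)\s*MB/sec")
    return {dev: float(mb) for dev, mb in pat.findall(report)}

rates = disk_rates(REPORT)
median = sorted(rates.values())[len(rates) // 2]

# Devices reading at less than 75% of the median stand out as suspect --
# here the two drives (hda, hdc) on the onboard VIA controller.
slow = {dev: r for dev, r in rates.items() if r < 0.75 * median}
```

Run against the figures above, `slow` contains exactly the two VIA-attached
drives, which is the same observation TJ makes by eye.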
* Re: Looking for the cause of poor I/O performance
  2004-12-02 16:38 TJ
@ 2004-12-03  0:49 ` Mark Hahn
  2004-12-03  3:54   ` Guy
                     ` (2 more replies)
  2004-12-03  7:12 ` TJ
  1 sibling, 3 replies; 36+ messages in thread
From: Mark Hahn @ 2004-12-03 0:49 UTC (permalink / raw)
To: TJ; +Cc: linux-raid

> My server is a K6-500 with 43MB of RAM, standard x86 hardware.

such a machine was good in its day, but that day was what, 5-7 years ago?
in practical terms, the machine probably has about 300 MB/s of memory
bandwidth (vs 3000 for a low-end server today). further, it was not
uncommon for chipsets to fail to cache then-large amounts of RAM (32M was
a common limit for caches configured writeback, for instance, that would
magically cache 64M if set to writethrough...)

> OS is Slackware 10.0 w/ 2.6.7 kernel. I've had similar problems with the
> 2.4.26 kernel.

with a modern kernel, manual hdparm tuning is unnecessary and probably
wrong.

> To tune these drives, I use:
> hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]

if you don't mess with the config via hdparm, what mode do they come up in?

> hda: WD 400JB 40GB
> hdc: WD 2000JB 200GB
> hdg: WD 2000JB 200GB
> hdi: IBM 75GXP 120GB
> hdk: WD 1200JB 120GB

iirc, the 75GXP has a noticeably lower density (and thus bandwidth).

> Controllers:
> hda-c: Onboard controller, VIA VT82C596B (rev 12)
> hdd-g: Silicon Image SiI 680 (rev 1)
> hdh-k: Promise PDC 20269 (rev 2)

> /dev/hda: Timing buffered disk reads: 42 MB in 3.07 seconds = 13.67 MB/sec
> /dev/hdc: Timing buffered disk reads: 44 MB in 3.12 seconds = 14.10 MB/sec

not that bad for such a horrible controller (and PCI, CPU, memory system)

> /dev/hdg: Timing buffered disk reads: 68 MB in 3.04 seconds = 22.38 MB/sec
> /dev/hdi: Timing buffered disk reads: 72 MB in 3.06 seconds = 23.53 MB/sec
> /dev/hdk: Timing buffered disk reads: 66 MB in 3.05 seconds = 21.66 MB/sec

fairly modern controllers help, but not much.

> /dev/md0: Timing buffered disk reads: 70 MB in 3.07 seconds = 22.77 MB/sec
> /dev/md1: Timing buffered disk reads: 50 MB in 3.03 seconds = 16.51 MB/sec

since the cpu/mem/chipset/bus are limiting factors, raid doesn't help.

> I would appreciate any thoughts, leads, ideas, anything at all to point
> me in a direction here.

keeping a K6 alive is noble and/or amusing, but it's just not reasonable
to expect it to keep up with modern disks. expecting it to run samba well
is not terribly reasonable. plug those disks into any entry-level machine
bought new (celeron, sempron) and you'll get whiplash. plug those disks
into a proper server (dual-opteron, few GB ram) and you'll never look
back. in fact, you'll start looking for a faster network.

regards, mark hahn.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
  2004-12-03  0:49 ` Mark Hahn
@ 2004-12-03  3:54   ` Guy
  2004-12-03  6:33     ` TJ
  2004-12-04 15:23     ` TJ
  2004-12-03  6:51   ` TJ
  2004-12-03 20:03   ` TJ
  2 siblings, 2 replies; 36+ messages in thread
From: Guy @ 2004-12-03 3:54 UTC (permalink / raw)
To: 'Mark Hahn', 'TJ'; +Cc: linux-raid

My linux system is a P3-500 with 2 CPUs and 512 Meg RAM. My system is much
faster than my network. I don't know how your K6-500 compares to my P3-500.
But RAM may be your issue. That amount of ram seems very low. Are you
swapping? What is your CPU load during the tests? If you are at 100%, then
you are CPU bound.

Your disk performance is faster than a 100BaseT network. So, your
performance may not be an issue.

My array gives about 60 MB/second.

# hdparm -tT /dev/md2
/dev/md2:
 Timing buffer-cache reads:   128 MB in  0.87 seconds = 147.13 MB/sec
 Timing buffered disk reads:   64 MB in  0.99 seconds =  64.65 MB/sec

# bonnie++ -d . -u 0:0
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
watkins-home     1G  3414  99 30899  66 20449  46  3599  99 77781  74 438.7   9
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   475  98 +++++ +++ 15634  88   501  99  1277  99  1977  98

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mark Hahn
Sent: Thursday, December 02, 2004 7:50 PM
To: TJ
Cc: linux-raid@vger.kernel.org
Subject: Re: Looking for the cause of poor I/O performance

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
  2004-12-03  3:54 ` Guy
@ 2004-12-03  6:33   ` TJ
  2004-12-03  7:38     ` Guy
  1 sibling, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-03 6:33 UTC (permalink / raw)
To: linux-raid; +Cc: Guy

> My linux system is a P3-500 with 2 CPUs and 512 Meg RAM. My system is much
> faster than my network. I don't know how your K6-500 compares to my
> P3-500. But RAM may be your issue. That amount of ram seems very low. Are
> you swapping? What is your CPU load during the tests? If you are at 100%,
> then you are CPU bound.

You've got a dual CPU setup, mine is only single. I'll bet you have a server
chipset too. Still, I have serious doubts that the CPU is at fault. My guess
would be that this could be a VIA chipset problem. The load averages while
running these tests are always well below 1.

I mistyped the amount of memory. I have approx 409 MB. I am not swapping.

> Your disk performance is faster than a 100BaseT network. So, your
> performance may not be an issue.

The network is gigabit, with a crossover to the client machine. I used ttcp
to verify that the link is capable of over 146 MB/sec.

TJ

^ permalink raw reply	[flat|nested] 36+ messages in thread
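[Editorial aside: ttcp measures raw TCP throughput between two endpoints.
As a rough illustration of the same idea, here is a minimal loopback sketch
in Python; it is a hypothetical stand-in, not ttcp itself, and over a real
gigabit crossover the sender and receiver would run on the two machines
rather than as threads in one process.]

```python
import socket
import threading
import time

def measure_tcp_throughput(total_mb=64, chunk=1 << 16):
    """Push total_mb megabytes through a loopback TCP connection and
    return the observed rate in MB/s (a rough ttcp-style figure)."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))   # let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]
    received = 0

    def sink():
        # Receiver: drain the connection and count the bytes.
        nonlocal received
        conn, _ = srv.accept()
        while True:
            buf = conn.recv(chunk)
            if not buf:
                break
            received += len(buf)
        conn.close()

    t = threading.Thread(target=sink)
    t.start()

    start = time.time()
    cli = socket.create_connection(("127.0.0.1", port))
    payload = b"\0" * chunk
    for _ in range((total_mb << 20) // chunk):
        cli.sendall(payload)
    cli.close()
    t.join()           # wait until everything has been received
    srv.close()
    return received / float(1 << 20) / (time.time() - start)
```

On loopback this mostly measures the kernel's memory-copy path rather than
any NIC, which is exactly why numbers above the wire rate (like the 146
MB/sec figure) deserve a second look at what was actually measured.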
* RE: Looking for the cause of poor I/O performance
  2004-12-03  6:33 ` TJ
@ 2004-12-03  7:38   ` Guy
  0 siblings, 0 replies; 36+ messages in thread
From: Guy @ 2004-12-03 7:38 UTC (permalink / raw)
To: 'TJ', linux-raid

Gigabit! Lucky :)

I want a Gigabit switch for Christmas! And a few PCI cards too!

Of course, with Gigabit, I would want/need a better Linux system too! With
PCI-express and ... I better wake up and go to bed! :)

Guy

-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of TJ
Sent: Friday, December 03, 2004 1:33 AM
To: linux-raid@vger.kernel.org
Cc: Guy
Subject: Re: Looking for the cause of poor I/O performance

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
  2004-12-03  3:54 ` Guy
  2004-12-03  6:33   ` TJ
@ 2004-12-04 15:23   ` TJ
  2004-12-04 17:59     ` Guy
  1 sibling, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-04 15:23 UTC (permalink / raw)
To: linux-raid; +Cc: Guy

On Thursday 02 December 2004 10:54 pm, Guy wrote:
> My linux system is a P3-500 with 2 CPUs and 512 Meg RAM. My system is much
> faster than my network. I don't know how your K6-500 compares to my
> P3-500.
> My array gives about 60 MB/second.

Now I'm extremely curious to know why your box does so much better than
mine. Does the bus run at 100? 133? I'm guessing it's SDRAM, not DDR. Also,
does it have a stock PCI bus, or something special?

TJ Harrell

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
  2004-12-04 15:23 ` TJ
@ 2004-12-04 17:59   ` Guy
  2004-12-04 23:51     ` Mark Hahn
  0 siblings, 1 reply; 36+ messages in thread
From: Guy @ 2004-12-04 17:59 UTC (permalink / raw)
To: 'TJ', linux-raid

My disks...
My system has 14 disk drives. At 65 Meg per second they are only doing
about 5 Meg per second each. 6 of the disks are on a 40 MB/second SCSI
bus, which limits my overall speed. During a re-sync I get about 6
Meg/second per disk.

My system...
2 CPUs help. It's a Dell. :) It is what they call a workstation. The
chipset is Intel 440BX (going from memory, so not 100% sure). In its day
it was a high end system. It has SD ram, 100 MHz system bus. All memory
slots are full with the same size DIMMs, so it can interleave if the
chipset supports that. The chipset has 3 PCI buses. Since my overall speed
is not exceeding the speed of 1 PCI bus, I don't think this helps me, but
maybe it does. Everything is SCSI, I don't know if that helps. My disks
are on 3 different SCSI busses, 2 Adaptec cards and 1 built-in Adaptec
chipset. This may help.

It is a Dell Precision Workstation 410.
http://support.dell.com/support/edocs/systems/deqkmt/specs.htm

If my system is so much faster because of the motherboard design, then
cool! I did not know motherboard design could make such a difference.

The test "hdparm -tT /dev/md2" used about 35% of both CPUs. The test is so
quick it is hard to be sure about the cpu load.

I have 17 disks overall, so I tried hdparm on all of my disks at the same
time. This uses 100% of my CPUs. I don't understand how this can report
such high speeds on my 6 disks on the slow SCSI bus.

Timing buffer-cache reads:
 128 MB in 11.18 seconds = 11.45 MB/sec
 128 MB in 11.03 seconds = 11.60 MB/sec
 128 MB in 10.97 seconds = 11.67 MB/sec
 128 MB in 10.91 seconds = 11.73 MB/sec
 128 MB in 11.43 seconds = 11.20 MB/sec
 128 MB in 11.37 seconds = 11.26 MB/sec
 128 MB in 11.35 seconds = 11.28 MB/sec
 128 MB in 11.37 seconds = 11.26 MB/sec
 128 MB in 11.45 seconds = 11.18 MB/sec
 128 MB in 11.97 seconds = 10.69 MB/sec
 128 MB in 11.78 seconds = 10.87 MB/sec
 128 MB in 11.99 seconds = 10.68 MB/sec
 128 MB in 12.26 seconds = 10.44 MB/sec
 128 MB in 12.18 seconds = 10.51 MB/sec
 128 MB in 11.84 seconds = 10.81 MB/sec
 128 MB in 11.84 seconds = 10.81 MB/sec
 128 MB in 12.43 seconds = 10.30 MB/sec

Timing buffered disk reads:
 64 MB in  9.42 seconds = 6.79 MB/sec
 64 MB in  9.62 seconds = 6.65 MB/sec
 64 MB in  9.95 seconds = 6.43 MB/sec
 64 MB in  9.71 seconds = 6.59 MB/sec
 64 MB in 10.17 seconds = 6.29 MB/sec
 64 MB in 11.00 seconds = 5.82 MB/sec
 64 MB in 11.45 seconds = 5.59 MB/sec
 64 MB in 10.81 seconds = 5.92 MB/sec
 64 MB in 11.20 seconds = 5.71 MB/sec
 64 MB in 11.57 seconds = 5.53 MB/sec
 64 MB in 10.89 seconds = 5.88 MB/sec
 64 MB in 11.73 seconds = 5.46 MB/sec
 64 MB in 11.27 seconds = 5.68 MB/sec
 64 MB in 11.20 seconds = 5.71 MB/sec
 64 MB in 12.18 seconds = 5.25 MB/sec
 64 MB in 11.41 seconds = 5.61 MB/sec
 64 MB in 11.91 seconds = 5.37 MB/sec

This is from a single disk:
 Timing buffer-cache reads:   128 MB in  0.87 seconds = 147.13 MB/sec
 Timing buffered disk reads:   64 MB in  3.51 seconds =  18.23 MB/sec

When I test a single disk, they all perform about the same.

A single disk "buffer-cache" performs better than any of my SCSI buses. I
have 2 at 80 Meg/sec and 1 at 40 Meg/sec. The speed exceeds the speed of
the PCI bus. Ok, I understand. I was thinking buffer-cache was the disk
drive's on-board cache, but buffer-cache is the Linux disk cache. I think!
Now I wonder why it is so slow! :)

Anyway, I hope I gave you too much information! :)

Guy

-----Original Message-----
From: TJ [mailto:systemloc@earthlink.net]
Sent: Saturday, December 04, 2004 10:24 AM
To: linux-raid@vger.kernel.org
Cc: Guy
Subject: Re: Looking for the cause of poor I/O performance

^ permalink raw reply	[flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance
  2004-12-04 17:59 ` Guy
@ 2004-12-04 23:51   ` Mark Hahn
  2004-12-05  1:00     ` Steven Ihde
  2004-12-06 17:48       ` Steven Ihde
  2004-12-05  2:16     ` Guy
  2004-12-05 15:17     ` TJ
  0 siblings, 3 replies; 36+ messages in thread
From: Mark Hahn @ 2004-12-04 23:51 UTC (permalink / raw)
To: Guy; +Cc: linux-raid

> Timing buffer-cache reads:
>  128 MB in 11.18 seconds = 11.45 MB/sec
...
> Timing buffered disk reads:
>  64 MB in 9.42 seconds = 6.79 MB/sec
...
> This is from a single disk:
>  Timing buffer-cache reads:   128 MB in  0.87 seconds = 147.13 MB/sec
>  Timing buffered disk reads:   64 MB in  3.51 seconds =  18.23 MB/sec

excellent! this is really a great example of how a machine's limited
internal bandwidth infringes on your raid performance. running hdparm -T
shows that your machine can manage about 150 MB/s when simply doing a
syscall, copying bytes to userspace, and returning. no involvement of any
IO device. this number is typically about half the user-visible dram
bandwidth as reported by the stream benchmark.

when you try to do parallel IO (either with a bunch of hdparm -t's or with
raid), each disk is desperately trying to write to dram at about 18 MB/s,
ignoring other bottlenecks. alas, we already know that your available dram
bandwidth is much lower than 14*18.

for comparison, a fairly crappy SiS 735-based k7 system with 64b-wide
PC2100 can deliver maybe 1.2 GB/s dram bandwidth. hdparm -T is about 500
MB/s, and would probably have trouble breaking 200 MB/s with raid0 even if
it had enough buses. an older server of mine is e7500-based, dual
xeon/2.4's, with 2xPC1600 ram. it sustains about 1.6 GB/s on Stream, about
500 MB/s on hdparm -T, and can sustain 250 MB/s through its 6-disk raid
without any problem. a newish server (dual-opteron, 2xPC2700) gives 1.4
GB/s under hdparm -T, and I expect it could hit 600 MB/s without much
trouble, if given 10-12 disks and pcix (or better) controllers...

regards, mark hahn.

^ permalink raw reply	[flat|nested] 36+ messages in thread
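[Editorial aside: Mark's back-of-envelope arithmetic can be made explicit.
Using the numbers quoted in the thread (a ~147 MB/s hdparm -T memory-copy
ceiling, ~18 MB/s per disk, 14 disks), the aggregate demand overshoots the
ceiling, and an even share of the ceiling per disk lands right where Guy's
parallel runs did, around 10-11 MB/sec each. A quick check in Python:]

```python
# Figures from Guy's hdparm runs, quoted in this thread.
memcpy_ceiling = 147.13   # MB/s: hdparm -T on one device (syscall + copy path)
per_disk = 18.23          # MB/s: buffered read from a single disk
disks = 14

# What 14 parallel readers would want vs. what the machine can copy.
aggregate_demand = disks * per_disk       # ~255 MB/s demanded
fair_share = memcpy_ceiling / disks       # ~10.5 MB/s available per disk

# Demand far exceeds the ceiling, so the parallel hdparm -T runs collapse
# to roughly fair_share each -- matching the observed 10.30-11.73 MB/sec.
print(round(aggregate_demand, 1), round(fair_share, 1))
```

The fair-share figure falling inside the observed 10.30-11.73 MB/sec band
is exactly the signature of a memory-bandwidth-bound system that Mark
describes.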
* Re: Looking for the cause of poor I/O performance
2004-12-04 23:51 ` Mark Hahn
@ 2004-12-05 1:00 ` Steven Ihde
2004-12-06 17:48 ` Steven Ihde
2004-12-05 2:16 ` Guy
2004-12-05 15:17 ` TJ
2 siblings, 1 reply; 36+ messages in thread
From: Steven Ihde @ 2004-12-05 1:00 UTC (permalink / raw)
To: Mark Hahn; +Cc: Guy, linux-raid

Well while we're on the subject ;-) I have a three-disk raid5 array. In summary, the raid5 performs slightly worse than any of the three disks alone. Memory bandwidth tested by hdparm seems more than adequate (1.6GB/sec). Shouldn't read-balancing give me some benefit here? Kernel is 2.6.8.

The system is an i865PE (I think) chipset with a 2.4GHz P4. I believe the memory bandwidth is more than adequate and that the disks are performing up to spec when tested alone (Seagate Barracudas, hda & hdc are 80GB PATA, sda is 120GB SATA):

/dev/hda:
 Timing cached reads: 3356 MB in 2.00 seconds = 1676.58 MB/sec
 Timing buffered disk reads: 122 MB in 3.03 seconds = 40.24 MB/sec
/dev/hdc:
 Timing cached reads: 3316 MB in 2.00 seconds = 1657.42 MB/sec
 Timing buffered disk reads: 122 MB in 3.02 seconds = 40.34 MB/sec
/dev/sda:
 Timing cached reads: 3344 MB in 2.00 seconds = 1673.09 MB/sec
 Timing buffered disk reads: 122 MB in 3.04 seconds = 40.19 MB/sec

Now, the raid5 array:

/dev/md1:
 Timing cached reads: 3408 MB in 2.00 seconds = 1704.26 MB/sec
 Timing buffered disk reads: 114 MB in 3.01 seconds = 37.83 MB/sec

Slightly worse! Bonnie++ gives me an even lower number, about 30.9 MB/sec for sequential input from the raid5.

hda and hdc are attached to the on-board PATA interfaces (one per channel, no slaves on either channel). sda is attached to the on-board SATA interface (the other on-board SATA is empty).
A possible clue is that when tested individually but in parallel, hda and hdc both halve their bandwidth:

/dev/hda:
 Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec
 Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec
/dev/hdc:
 Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec
 Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec
/dev/sda:
 Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec
 Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec

Could there be contention for some shared resource in the on-board PATA chipset between hda and hdc? Would moving one of them to a separate IDE controller on a PCI card help?

Am I unreasonable to think that I should be getting better than 37 MB/sec on raid5 read performance, given that each disk alone seems capable of 40 MB/sec?

Thanks,

Steve

^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance 2004-12-05 1:00 ` Steven Ihde @ 2004-12-06 17:48 ` Steven Ihde 2004-12-06 19:29 ` Guy 0 siblings, 1 reply; 36+ messages in thread From: Steven Ihde @ 2004-12-06 17:48 UTC (permalink / raw) To: linux-raid On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: [snip] > A possible clue is that when tested individually but in parallel, hda > and hdc both halve their bandwidth: > > /dev/hda: > Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec > Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec > /dev/hdc: > Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec > Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec > /dev/sda: > Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec > Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec > > Could there be contention for some shared resource in the on-board > PATA chipset between hda and hdc? Would moving one of them to a > separate IDE controller on a PCI card help? > > Am I unreasonable to think that I should be getting better than 37 > MB/sec on raid5 read performance, given that each disk alone seems > capable of 40 MB/sec? To answer my own question... I moved one of the PATA drives to a PCI PATA controller. This did enable me to move 40MB/sec simultaneously from all three drives. Guess there's some issue with the built-in PATA on the ICH5R southbridge. However, this didn't help raid5 performance -- it was still about 35-39MB/sec. I also have a raid1 array on the same physical disks, and observed the same thing there (same read performance as a single disk with hdparm -tT, about 40 MB/sec). So: 2.6.8 includes the raid1 read balancing fix which was mentioned previously on this list -- should this show up as substantially better hdparm -tT numbers for raid1 or is it more complicated than that? Does raid5 do read-balancing at all or am I just fantasizing? 
Thanks, Steve ^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance 2004-12-06 17:48 ` Steven Ihde @ 2004-12-06 19:29 ` Guy 2004-12-06 21:10 ` David Greaves 2004-12-06 21:16 ` Steven Ihde 0 siblings, 2 replies; 36+ messages in thread From: Guy @ 2004-12-06 19:29 UTC (permalink / raw) To: 'Steven Ihde', linux-raid RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. However, RAID5 does do read ahead, my speed is about 3.5 times as fast as a single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde Sent: Monday, December 06, 2004 12:49 PM To: linux-raid@vger.kernel.org Subject: Re: Looking for the cause of poor I/O performance On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: [snip] > A possible clue is that when tested individually but in parallel, hda > and hdc both halve their bandwidth: > > /dev/hda: > Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec > Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec > /dev/hdc: > Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec > Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec > /dev/sda: > Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec > Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec > > Could there be contention for some shared resource in the on-board > PATA chipset between hda and hdc? Would moving one of them to a > separate IDE controller on a PCI card help? > > Am I unreasonable to think that I should be getting better than 37 > MB/sec on raid5 read performance, given that each disk alone seems > capable of 40 MB/sec? To answer my own question... I moved one of the PATA drives to a PCI PATA controller. This did enable me to move 40MB/sec simultaneously from all three drives. Guess there's some issue with the built-in PATA on the ICH5R southbridge. 
However, this didn't help raid5 performance -- it was still about 35-39MB/sec. I also have a raid1 array on the same physical disks, and observed the same thing there (same read performance as a single disk with hdparm -tT, about 40 MB/sec). So: 2.6.8 includes the raid1 read balancing fix which was mentioned previously on this list -- should this show up as substantially better hdparm -tT numbers for raid1 or is it more complicated than that? Does raid5 do read-balancing at all or am I just fantasizing? Thanks, Steve - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance 2004-12-06 19:29 ` Guy @ 2004-12-06 21:10 ` David Greaves 2004-12-06 23:02 ` Guy 2004-12-06 21:16 ` Steven Ihde 1 sibling, 1 reply; 36+ messages in thread From: David Greaves @ 2004-12-06 21:10 UTC (permalink / raw) To: Guy; +Cc: 'Steven Ihde', linux-raid but aren't the next 'n' blocks of data on (about) n drives that can be read concurrently (if the read is big enough) Guy wrote: >RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. >However, RAID5 does do read ahead, my speed is about 3.5 times as fast as a >single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. > >Guy > >-----Original Message----- >From: linux-raid-owner@vger.kernel.org >[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde >Sent: Monday, December 06, 2004 12:49 PM >To: linux-raid@vger.kernel.org >Subject: Re: Looking for the cause of poor I/O performance > >On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: >[snip] > > >>A possible clue is that when tested individually but in parallel, hda >>and hdc both halve their bandwidth: >> >>/dev/hda: >> Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec >> Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec >>/dev/hdc: >> Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec >> Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec >>/dev/sda: >> Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec >> Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec >> >>Could there be contention for some shared resource in the on-board >>PATA chipset between hda and hdc? Would moving one of them to a >>separate IDE controller on a PCI card help? >> >>Am I unreasonable to think that I should be getting better than 37 >>MB/sec on raid5 read performance, given that each disk alone seems >>capable of 40 MB/sec? >> >> > >To answer my own question... I moved one of the PATA drives to a PCI >PATA controller. 
This did enable me to move 40MB/sec simultaneously >from all three drives. Guess there's some issue with the built-in >PATA on the ICH5R southbridge. > >However, this didn't help raid5 performance -- it was still about >35-39MB/sec. I also have a raid1 array on the same physical disks, >and observed the same thing there (same read performance as a single >disk with hdparm -tT, about 40 MB/sec). So: > >2.6.8 includes the raid1 read balancing fix which was mentioned >previously on this list -- should this show up as substantially better >hdparm -tT numbers for raid1 or is it more complicated than that? > >Does raid5 do read-balancing at all or am I just fantasizing? > >Thanks, > >Steve >- >To unsubscribe from this list: send the line "unsubscribe linux-raid" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > >- >To unsubscribe from this list: send the line "unsubscribe linux-raid" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > > > ^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance 2004-12-06 21:10 ` David Greaves @ 2004-12-06 23:02 ` Guy 2004-12-08 9:24 ` David Greaves 0 siblings, 1 reply; 36+ messages in thread From: Guy @ 2004-12-06 23:02 UTC (permalink / raw) To: 'David Greaves'; +Cc: 'Steven Ihde', linux-raid Yes. I did say it reads ahead! Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Greaves Sent: Monday, December 06, 2004 4:10 PM To: Guy Cc: 'Steven Ihde'; linux-raid@vger.kernel.org Subject: Re: Looking for the cause of poor I/O performance but aren't the next 'n' blocks of data on (about) n drives that can be read concurrently (if the read is big enough) Guy wrote: >RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. >However, RAID5 does do read ahead, my speed is about 3.5 times as fast as a >single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. > >Guy > >-----Original Message----- >From: linux-raid-owner@vger.kernel.org >[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde >Sent: Monday, December 06, 2004 12:49 PM >To: linux-raid@vger.kernel.org >Subject: Re: Looking for the cause of poor I/O performance > >On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: >[snip] > > >>A possible clue is that when tested individually but in parallel, hda >>and hdc both halve their bandwidth: >> >>/dev/hda: >> Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec >> Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec >>/dev/hdc: >> Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec >> Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec >>/dev/sda: >> Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec >> Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec >> >>Could there be contention for some shared resource in the on-board >>PATA chipset between hda and hdc? 
Would moving one of them to a >>separate IDE controller on a PCI card help? >> >>Am I unreasonable to think that I should be getting better than 37 >>MB/sec on raid5 read performance, given that each disk alone seems >>capable of 40 MB/sec? >> >> > >To answer my own question... I moved one of the PATA drives to a PCI >PATA controller. This did enable me to move 40MB/sec simultaneously >from all three drives. Guess there's some issue with the built-in >PATA on the ICH5R southbridge. > >However, this didn't help raid5 performance -- it was still about >35-39MB/sec. I also have a raid1 array on the same physical disks, >and observed the same thing there (same read performance as a single >disk with hdparm -tT, about 40 MB/sec). So: > >2.6.8 includes the raid1 read balancing fix which was mentioned >previously on this list -- should this show up as substantially better >hdparm -tT numbers for raid1 or is it more complicated than that? > >Does raid5 do read-balancing at all or am I just fantasizing? > >Thanks, > >Steve >- >To unsubscribe from this list: send the line "unsubscribe linux-raid" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > >- >To unsubscribe from this list: send the line "unsubscribe linux-raid" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > > > - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-06 23:02 ` Guy
@ 2004-12-08 9:24 ` David Greaves
2004-12-08 18:31 ` Guy
0 siblings, 1 reply; 36+ messages in thread
From: David Greaves @ 2004-12-08 9:24 UTC (permalink / raw)
To: Guy; +Cc: 'Steven Ihde', linux-raid

My understanding of 'readahead' is that when an application asks for 312 bytes of data, the buffering code will anticipate more data is required and will fill a buffer (4096 bytes). If we know that apps are really greedy and read *loads* of data then we set a large readahead which will cause the buffer code (?) to fill a further n buffers/kb according to the readahead setting. This will all be read sequentially and the performance boost is because the read heads on the drive get all the data in one 'hit' - no unneeded seeks, no rotational latency.

That's not the same as raid5 where when asked for 312 bytes of data, the buffering code will fill the 4k buffer and then will issue a readahead on the next n kb of data - which is spread over multiple disks, which read in parallel, not sequentially.

Yes, the readahead triggers this behaviour - but you say "RAID5 can't do read balancing." - which I thought it could through this mechanism.

It depends whether the original use of "read balancing" in this context means "selecting a drive to obtain the data from according to the drive's read queue" (no) or "distributing reads amongst the drives to obtain a throughput greater than that of one individual drive" (yes) (OK, the terminology is not quite exact but...)

do we agree? Or have I misunderstood something?

David

Guy wrote:
>Yes. I did say it reads ahead!
> >Guy > >-----Original Message----- >From: linux-raid-owner@vger.kernel.org >[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Greaves >Sent: Monday, December 06, 2004 4:10 PM >To: Guy >Cc: 'Steven Ihde'; linux-raid@vger.kernel.org >Subject: Re: Looking for the cause of poor I/O performance > >but aren't the next 'n' blocks of data on (about) n drives that can be >read concurrently (if the read is big enough) > >Guy wrote: > > > >>RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. >>However, RAID5 does do read ahead, my speed is about 3.5 times as fast as a >>single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. >> >>Guy >> >>-----Original Message----- >>From: linux-raid-owner@vger.kernel.org >>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde >>Sent: Monday, December 06, 2004 12:49 PM >>To: linux-raid@vger.kernel.org >>Subject: Re: Looking for the cause of poor I/O performance >> >>On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: >>[snip] >> >> >> >> >>>A possible clue is that when tested individually but in parallel, hda >>>and hdc both halve their bandwidth: >>> >>>/dev/hda: >>>Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec >>>Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec >>>/dev/hdc: >>>Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec >>>Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec >>>/dev/sda: >>>Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec >>>Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec >>> >>>Could there be contention for some shared resource in the on-board >>>PATA chipset between hda and hdc? Would moving one of them to a >>>separate IDE controller on a PCI card help? >>> >>>Am I unreasonable to think that I should be getting better than 37 >>>MB/sec on raid5 read performance, given that each disk alone seems >>>capable of 40 MB/sec? >>> >>> >>> >>> >>To answer my own question... 
I moved one of the PATA drives to a PCI >>PATA controller. This did enable me to move 40MB/sec simultaneously >> >> >>from all three drives. Guess there's some issue with the built-in > > >>PATA on the ICH5R southbridge. >> >>However, this didn't help raid5 performance -- it was still about >>35-39MB/sec. I also have a raid1 array on the same physical disks, >>and observed the same thing there (same read performance as a single >>disk with hdparm -tT, about 40 MB/sec). So: >> >>2.6.8 includes the raid1 read balancing fix which was mentioned >>previously on this list -- should this show up as substantially better >>hdparm -tT numbers for raid1 or is it more complicated than that? >> >>Does raid5 do read-balancing at all or am I just fantasizing? >> >>Thanks, >> >>Steve >>- >>To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >> >>- >>To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> >> > >- >To unsubscribe from this list: send the line "unsubscribe linux-raid" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > >- >To unsubscribe from this list: send the line "unsubscribe linux-raid" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > > > ^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance 2004-12-08 9:24 ` David Greaves @ 2004-12-08 18:31 ` Guy 2004-12-08 22:00 ` Steven Ihde 0 siblings, 1 reply; 36+ messages in thread From: Guy @ 2004-12-08 18:31 UTC (permalink / raw) To: 'David Greaves'; +Cc: 'Steven Ihde', linux-raid "read balancing" will help regardless of random or sequential disk access. It can double your performance (assuming 2 disks). "read ahead" only helps sequential access, it hurts random access. Yes, I understand "read balancing" to be balancing the IO over 2 or more disks, when only 1 disk is really needed. So, you need 2 or more copies of the data, as in RAID1. About read ahead... The physical disks read ahead. md does read ahead. Since the disks and md are doing read ahead, you should have more than 1 disk reading at the same time. The physical disks are not very smart about RAID5, when reading ahead, they will also read the parity data, which is wasted effort. With all of the above going on you should get more than 1 disk reading data at the same time. With RAID(0, 4, 5 and 6) no one can choose which disk(s) to read. You can't balance anything. You can only predict what data will be needed before it is requested. Read ahead does this for large files (sequential reads). I would not consider this to be "read balancing", just read ahead. Guy -----Original Message----- From: David Greaves [mailto:david@dgreaves.com] Sent: Wednesday, December 08, 2004 4:24 AM To: Guy Cc: 'Steven Ihde'; linux-raid@vger.kernel.org Subject: Re: Looking for the cause of poor I/O performance My understanding of 'readahead' is that when an application asks for 312 bytes of data, the buffering code will anticipate more data is required and will fill a buffer (4096 bytes). If we know that apps are really greedy and read *loads* of data then we set a large readahead which will cause the buffer code (?) to fill a further n buffers/kb according to the readahead setting. 
This will all be read sequentially and the performance boost is because the read heads on the drive get all the data in one 'hit' - no unneeded seeks, no rotational latency. That's not the same as raid5 where when asked for 312 bytes of data, the buffering code wil fill the 4k buffer and then will issue a readahead on the next n kb of data - which is spread over multiple disks, which read in parallel, not sequentially. Yes, the readahead triggers this behaviour - but you say "RAID5 can't do read balancing." - which I thought it could through this mechanism. It depends whether the original use of "read balancing" in this context means "selecting a drive to obtain the data from according to the drive's read queue" (no) or "distributing reads amongst the drives to obtain a throughput greater than that of one individual drive" (yes) (OK, the terminology is not quite exact but...) do we agree? Or have I misunderstood something? David Guy wrote: >Yes. I did say it reads ahead! > >Guy > >-----Original Message----- >From: linux-raid-owner@vger.kernel.org >[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Greaves >Sent: Monday, December 06, 2004 4:10 PM >To: Guy >Cc: 'Steven Ihde'; linux-raid@vger.kernel.org >Subject: Re: Looking for the cause of poor I/O performance > >but aren't the next 'n' blocks of data on (about) n drives that can be >read concurrently (if the read is big enough) > >Guy wrote: > > > >>RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. >>However, RAID5 does do read ahead, my speed is about 3.5 times as fast as a >>single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. 
>> >>Guy >> >>-----Original Message----- >>From: linux-raid-owner@vger.kernel.org >>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde >>Sent: Monday, December 06, 2004 12:49 PM >>To: linux-raid@vger.kernel.org >>Subject: Re: Looking for the cause of poor I/O performance >> >>On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: >>[snip] >> >> >> >> >>>A possible clue is that when tested individually but in parallel, hda >>>and hdc both halve their bandwidth: >>> >>>/dev/hda: >>>Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec >>>Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec >>>/dev/hdc: >>>Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec >>>Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec >>>/dev/sda: >>>Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec >>>Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec >>> >>>Could there be contention for some shared resource in the on-board >>>PATA chipset between hda and hdc? Would moving one of them to a >>>separate IDE controller on a PCI card help? >>> >>>Am I unreasonable to think that I should be getting better than 37 >>>MB/sec on raid5 read performance, given that each disk alone seems >>>capable of 40 MB/sec? >>> >>> >>> >>> >>To answer my own question... I moved one of the PATA drives to a PCI >>PATA controller. This did enable me to move 40MB/sec simultaneously >> >> >>from all three drives. Guess there's some issue with the built-in > > >>PATA on the ICH5R southbridge. >> >>However, this didn't help raid5 performance -- it was still about >>35-39MB/sec. I also have a raid1 array on the same physical disks, >>and observed the same thing there (same read performance as a single >>disk with hdparm -tT, about 40 MB/sec). 
So: >> >>2.6.8 includes the raid1 read balancing fix which was mentioned >>previously on this list -- should this show up as substantially better >>hdparm -tT numbers for raid1 or is it more complicated than that? >> >>Does raid5 do read-balancing at all or am I just fantasizing? >> >>Thanks, >> >>Steve >>- >>To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >> >>- >>To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>the body of a message to majordomo@vger.kernel.org >>More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> >> > >- >To unsubscribe from this list: send the line "unsubscribe linux-raid" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > >- >To unsubscribe from this list: send the line "unsubscribe linux-raid" in >the body of a message to majordomo@vger.kernel.org >More majordomo info at http://vger.kernel.org/majordomo-info.html > > > ^ permalink raw reply [flat|nested] 36+ messages in thread
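[Editorial aside: Guy's distinction can be made concrete. Consecutive raid5 data chunks rotate across the members, so a large enough read-ahead window keeps every drive busy even though no single piece of data exists on more than one drive. The sketch below assumes the left-symmetric layout (md's usual default); it is an illustration, not code from the thread.]

```python
# Map a data chunk number to the raid5 member holding it, assuming the
# left-symmetric layout: parity rotates down one disk per stripe, and
# data chunks fill the remaining slots starting after the parity disk.

def raid5_data_disk(chunk, n_disks):
    stripe = chunk // (n_disks - 1)             # which stripe the chunk lives in
    parity = (n_disks - 1) - (stripe % n_disks)  # parity disk for that stripe
    return (parity + 1 + chunk % (n_disks - 1)) % n_disks

# With 3 disks, a run of consecutive chunks cycles over every member --
# which is why read-ahead, not read balancing, is what spreads a big
# sequential read across the array.
print([raid5_data_disk(c, 3) for c in range(6)])   # [0, 1, 2, 0, 1, 2]
```

A read-ahead window spanning even two stripes already has all three drives reading in parallel, matching David's description.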
* Re: Looking for the cause of poor I/O performance
2004-12-08 18:31 ` Guy
@ 2004-12-08 22:00 ` Steven Ihde
2004-12-08 22:25 ` Guy
0 siblings, 1 reply; 36+ messages in thread
From: Steven Ihde @ 2004-12-08 22:00 UTC (permalink / raw)
To: Guy; +Cc: 'David Greaves', linux-raid

OK, between your discussion of read-ahead and Monday's post by Morten Olsen about /proc/sys/vm/max-readahead, I think I get it now. I'm using kernel 2.6 so /proc/sys/vm/max-readahead doesn't exist, but "blockdev --getra/--setra" seems to do the trick. By increasing readahead on my array device from 256 (the default) to 1024, I can achieve 80MB/sec sequential read throughput (where before I could get only 40MB/sec, same as a single disk). As you point out while it helps sequential reads it may hurt random reads, so I'll test a little more and see.

One other point -- apparently 2.6 allows one to set the read-ahead on a per-device basis (maybe 2.4 does too, I don't know). So would it make sense to set read-ahead on the disks low (or zero), and readahead on the MD device high? Perhaps this could allow us to avoid the overhead of reading unnecessary parity chunks. As the number of disks increases this would be less and less significant.

-Steve

On Wed, 08 Dec 2004 13:31:27 -0500, Guy wrote:
> "read balancing" will help regardless of random or sequential disk access.
> It can double your performance (assuming 2 disks).
>
> "read ahead" only helps sequential access, it hurts random access.
>
> Yes, I understand "read balancing" to be balancing the IO over 2 or more
> disks, when only 1 disk is really needed. So, you need 2 or more copies of
> the data, as in RAID1.
>
> About read ahead...
> The physical disks read ahead.
> md does read ahead.
> Since the disks and md are doing read ahead, you should have more than 1
> disk reading at the same time. The physical disks are not very smart about
> RAID5, when reading ahead, they will also read the parity data, which is
> wasted effort.
> > With all of the above going on you should get more than 1 disk reading data > at the same time. > > With RAID(0, 4, 5 and 6) no one can choose which disk(s) to read. You can't > balance anything. You can only predict what data will be needed before it > is requested. Read ahead does this for large files (sequential reads). I > would not consider this to be "read balancing", just read ahead. > > Guy > > -----Original Message----- > From: David Greaves [mailto:david@dgreaves.com] > Sent: Wednesday, December 08, 2004 4:24 AM > To: Guy > Cc: 'Steven Ihde'; linux-raid@vger.kernel.org > Subject: Re: Looking for the cause of poor I/O performance > > My understanding of 'readahead' is that when an application asks for 312 > bytes of data, the buffering code will anticipate more data is required > and will fill a buffer (4096 bytes). If we know that apps are really > greedy and read *loads* of data then we set a large readahead which will > cause the buffer code (?) to fill a further n buffers/kb according to > the readahead setting. This will all be read sequentially and the > performance boost is because the read heads on the drive get all the > data in one 'hit' - no unneeded seeks, no rotational latency. > > That's not the same as raid5 where when asked for 312 bytes of data, the > buffering code wil fill the 4k buffer and then will issue a readahead on > the next n kb of data - which is spread over multiple disks, which read > in parallel, not sequentially. > > Yes, the readahead triggers this behaviour - but you say "RAID5 can't do > read balancing." - which I thought it could through this mechanism. > > It depends whether the original use of "read balancing" in this context > means "selecting a drive to obtain the data from according to the > drive's read queue" (no) or "distributing reads amongst the drives to > obtain a throughput greater than that of one individual drive" (yes) > (OK, the terminology is not quite exact but...) > > do we agree? 
Or have I misunderstood something? > > David > > Guy wrote: > > >Yes. I did say it reads ahead! > > > >Guy > > > >-----Original Message----- > >From: linux-raid-owner@vger.kernel.org > >[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Greaves > >Sent: Monday, December 06, 2004 4:10 PM > >To: Guy > >Cc: 'Steven Ihde'; linux-raid@vger.kernel.org > >Subject: Re: Looking for the cause of poor I/O performance > > > >but aren't the next 'n' blocks of data on (about) n drives that can be > >read concurrently (if the read is big enough) > > > >Guy wrote: > > > > > > > >>RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. > >>However, RAID5 does do read ahead, my speed is about 3.5 times as fast as > a > >>single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. > >> > >>Guy > >> > >>-----Original Message----- > >>From: linux-raid-owner@vger.kernel.org > >>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde > >>Sent: Monday, December 06, 2004 12:49 PM > >>To: linux-raid@vger.kernel.org > >>Subject: Re: Looking for the cause of poor I/O performance > >> > >>On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: > >>[snip] > >> > >> > >> > >> > >>>A possible clue is that when tested individually but in parallel, hda > >>>and hdc both halve their bandwidth: > >>> > >>>/dev/hda: > >>>Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec > >>>Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec > >>>/dev/hdc: > >>>Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec > >>>Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec > >>>/dev/sda: > >>>Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec > >>>Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec > >>> > >>>Could there be contention for some shared resource in the on-board > >>>PATA chipset between hda and hdc? Would moving one of them to a > >>>separate IDE controller on a PCI card help? 
> >>> > >>>Am I unreasonable to think that I should be getting better than 37 > >>>MB/sec on raid5 read performance, given that each disk alone seems > >>>capable of 40 MB/sec? > >>> > >>> > >>> > >>> > >>To answer my own question... I moved one of the PATA drives to a PCI > >>PATA controller. This did enable me to move 40MB/sec simultaneously > >> > >> > >>from all three drives. Guess there's some issue with the built-in > > > > > >>PATA on the ICH5R southbridge. > >> > >>However, this didn't help raid5 performance -- it was still about > >>35-39MB/sec. I also have a raid1 array on the same physical disks, > >>and observed the same thing there (same read performance as a single > >>disk with hdparm -tT, about 40 MB/sec). So: > >> > >>2.6.8 includes the raid1 read balancing fix which was mentioned > >>previously on this list -- should this show up as substantially better > >>hdparm -tT numbers for raid1 or is it more complicated than that? > >> > >>Does raid5 do read-balancing at all or am I just fantasizing? > >> > >>Thanks, > >> > >>Steve ^ permalink raw reply [flat|nested] 36+ messages in thread
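[Editorial aside: Steven's 256-to-1024 jump lines up neatly with his array geometry. blockdev counts read-ahead in 512-byte sectors, and one rule of thumb (an assumption for illustration, not something this thread establishes) is to cover a few full stripes so every data disk has work queued:]

```python
# Convert "N full raid5 stripes of read-ahead" into the 512-byte sector
# count that blockdev --setra expects.  The few-stripes rule of thumb is
# an assumption here, not a recommendation from the thread.

SECTOR_BYTES = 512

def readahead_sectors(chunk_kb, n_disks, stripes):
    data_disks = n_disks - 1            # raid5: one chunk per stripe is parity
    stripe_kb = chunk_kb * data_disks
    return stripe_kb * stripes * 1024 // SECTOR_BYTES

# A 3-disk array with 64 KB chunks (a common md default): four stripes of
# read-ahead is exactly the 1024 sectors (512 KB) Steven settled on.
print(readahead_sectors(64, 3, 4))   # 1024
```

(The 64 KB chunk size is assumed; Steven never states his. The arithmetic still shows why the default of 256 sectors, only one stripe, reads no faster than a single disk.)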
* RE: Looking for the cause of poor I/O performance 2004-12-08 22:00 ` Steven Ihde @ 2004-12-08 22:25 ` Guy 2004-12-08 22:41 ` Guy 0 siblings, 1 reply; 36+ messages in thread From: Guy @ 2004-12-08 22:25 UTC (permalink / raw) To: 'Steven Ihde'; +Cc: 'David Greaves', linux-raid Good question! "One other point -- apparently 2.6 allows one to set the read-ahead on a per-device basis (maybe 2.4 does too, I don't know). So would it make sense to set read-ahead on the disks low (or zero), and read ahead on the MD device high? Perhaps this could allow us to avoid the overhead of reading unnecessary parity chunks. As the number of disks increases this would be less and less significant." I was wondering about this myself. I have read that other people have played with the numbers, but I can't:

# blockdev --getra /dev/md2
1024
# blockdev --setra 2048 /dev/md2
BLKRASET: Invalid argument
# blockdev --setra 1024 /dev/md2
BLKRASET: Invalid argument

I can change read ahead on each drive. I can set read ahead from 0 to 255 on my disks, but this seems to have no effect. My performance using "hdparm -t /dev/md2" stays about the same. Odd, I just tried other sizes with md2. I can change read ahead from 0 to 255 also. But it was 1024. With read ahead set to 0 on all of my disks and on md2, I still get the same performance. I guess the on-disk read-ahead cache does just fine. My kernel is 2.4.28. Guy -----Original Message----- From: Steven Ihde [mailto:x-linux-raid@hamachi.dyndns.org] Sent: Wednesday, December 08, 2004 5:00 PM To: Guy Cc: 'David Greaves'; linux-raid@vger.kernel.org Subject: Re: Looking for the cause of poor I/O performance OK, between your discussion of read-ahead and Monday's post by Morten Olsen about /proc/sys/vm/max-readahead, I think I get it now. I'm using kernel 2.6 so /proc/sys/vm/max-readahead doesn't exist, but "blockdev --getra/--setra" seems to do the trick. 
By increasing readahead on my array device from 256 (the default) to 1024, I can achieve 80MB/sec sequential read throughput (where before I could get only 40MB/sec, same as a single disk). As you point out, while it helps sequential reads it may hurt random reads, so I'll test a little more and see. One other point -- apparently 2.6 allows one to set the read-ahead on a per-device basis (maybe 2.4 does too, I don't know). So would it make sense to set read-ahead on the disks low (or zero), and readahead on the MD device high? Perhaps this could allow us to avoid the overhead of reading unnecessary parity chunks. As the number of disks increases this would be less and less significant. -Steve On Wed, 08 Dec 2004 13:31:27 -0500, Guy wrote: > "read balancing" will help regardless of random or sequential disk access. > It can double your performance (assuming 2 disks). > > "read ahead" only helps sequential access, it hurts random access. > > Yes, I understand "read balancing" to be balancing the IO over 2 or more > disks, when only 1 disk is really needed. So, you need 2 or more copies of > the data, as in RAID1. > > About read ahead... > The physical disks read ahead. > md does read ahead. > Since the disks and md are doing read ahead, you should have more than 1 > disk reading at the same time. The physical disks are not very smart about > RAID5, when reading ahead, they will also read the parity data, which is > wasted effort. > > With all of the above going on you should get more than 1 disk reading data > at the same time. > > With RAID(0, 4, 5 and 6) no one can choose which disk(s) to read. You can't > balance anything. You can only predict what data will be needed before it > is requested. Read ahead does this for large files (sequential reads). I > would not consider this to be "read balancing", just read ahead. 
> > Guy > > -----Original Message----- > From: David Greaves [mailto:david@dgreaves.com] > Sent: Wednesday, December 08, 2004 4:24 AM > To: Guy > Cc: 'Steven Ihde'; linux-raid@vger.kernel.org > Subject: Re: Looking for the cause of poor I/O performance > > My understanding of 'readahead' is that when an application asks for 312 > bytes of data, the buffering code will anticipate more data is required > and will fill a buffer (4096 bytes). If we know that apps are really > greedy and read *loads* of data then we set a large readahead which will > cause the buffer code (?) to fill a further n buffers/kb according to > the readahead setting. This will all be read sequentially and the > performance boost is because the read heads on the drive get all the > data in one 'hit' - no unneeded seeks, no rotational latency. > > That's not the same as raid5 where when asked for 312 bytes of data, the > buffering code wil fill the 4k buffer and then will issue a readahead on > the next n kb of data - which is spread over multiple disks, which read > in parallel, not sequentially. > > Yes, the readahead triggers this behaviour - but you say "RAID5 can't do > read balancing." - which I thought it could through this mechanism. > > It depends whether the original use of "read balancing" in this context > means "selecting a drive to obtain the data from according to the > drive's read queue" (no) or "distributing reads amongst the drives to > obtain a throughput greater than that of one individual drive" (yes) > (OK, the terminology is not quite exact but...) > > do we agree? Or have I misunderstood something? > > David > > Guy wrote: > > >Yes. I did say it reads ahead! 
> > > >Guy > > > >-----Original Message----- > >From: linux-raid-owner@vger.kernel.org > >[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Greaves > >Sent: Monday, December 06, 2004 4:10 PM > >To: Guy > >Cc: 'Steven Ihde'; linux-raid@vger.kernel.org > >Subject: Re: Looking for the cause of poor I/O performance > > > >but aren't the next 'n' blocks of data on (about) n drives that can be > >read concurrently (if the read is big enough) > > > >Guy wrote: > > > > > > > >>RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. > >>However, RAID5 does do read ahead, my speed is about 3.5 times as fast as > a > >>single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. > >> > >>Guy > >> > >>-----Original Message----- > >>From: linux-raid-owner@vger.kernel.org > >>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde > >>Sent: Monday, December 06, 2004 12:49 PM > >>To: linux-raid@vger.kernel.org > >>Subject: Re: Looking for the cause of poor I/O performance > >> > >>On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: > >>[snip] > >> > >> > >> > >> > >>>A possible clue is that when tested individually but in parallel, hda > >>>and hdc both halve their bandwidth: > >>> > >>>/dev/hda: > >>>Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec > >>>Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec > >>>/dev/hdc: > >>>Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec > >>>Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec > >>>/dev/sda: > >>>Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec > >>>Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec > >>> > >>>Could there be contention for some shared resource in the on-board > >>>PATA chipset between hda and hdc? Would moving one of them to a > >>>separate IDE controller on a PCI card help? 
> >>> > >>>Am I unreasonable to think that I should be getting better than 37 > >>>MB/sec on raid5 read performance, given that each disk alone seems > >>>capable of 40 MB/sec? > >>> > >>> > >>> > >>> > >>To answer my own question... I moved one of the PATA drives to a PCI > >>PATA controller. This did enable me to move 40MB/sec simultaneously > >> > >> > >>from all three drives. Guess there's some issue with the built-in > > > > > >>PATA on the ICH5R southbridge. > >> > >>However, this didn't help raid5 performance -- it was still about > >>35-39MB/sec. I also have a raid1 array on the same physical disks, > >>and observed the same thing there (same read performance as a single > >>disk with hdparm -tT, about 40 MB/sec). So: > >> > >>2.6.8 includes the raid1 read balancing fix which was mentioned > >>previously on this list -- should this show up as substantially better > >>hdparm -tT numbers for raid1 or is it more complicated than that? > >> > >>Does raid5 do read-balancing at all or am I just fantasizing? > >> > >>Thanks, > >> > >>Steve ^ permalink raw reply [flat|nested] 36+ messages in thread
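[A note on the `blockdev` values traded back and forth above: per blockdev(8), `--getra`/`--setra` count readahead in 512-byte sectors, so the figures in this thread translate to the following amounts of data; a quick sketch:]

```python
# blockdev --setra / --getra values are in 512-byte sectors
# (per blockdev(8)), so the numbers discussed in this thread mean:
SECTOR_BYTES = 512

def ra_kib(sectors):
    """Convert a blockdev readahead value (in sectors) to KiB."""
    return sectors * SECTOR_BYTES // 1024

# 256 is the 2.6 default mentioned later in the thread; 1024 is the
# value Guy reads back from /dev/md2; 2048 is the value he tried to set.
for sectors in (256, 1024, 2048):
    print(f"--setra {sectors:4d} -> {ra_kib(sectors):4d} KiB of readahead")
```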
* RE: Looking for the cause of poor I/O performance 2004-12-08 22:25 ` Guy @ 2004-12-08 22:41 ` Guy 2004-12-09 1:40 ` Steven Ihde 0 siblings, 1 reply; 36+ messages in thread From: Guy @ 2004-12-08 22:41 UTC (permalink / raw) To: 'Guy', 'Steven Ihde'; +Cc: 'David Greaves', linux-raid I also tried changing /proc/sys/vm/max-readahead. I tried the default of 31, 0 and 127. All gave me about the same performance. I started testing the speed with the dd command below. It completed in about 12.9 seconds. None of the read ahead changes seem to affect my speed. Everything is now set to 0, still 12.9 seconds. 12.9 seconds = about 79.38 MB/sec.

time dd if=/dev/md2 of=/dev/null bs=1024k count=1024

Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Guy Sent: Wednesday, December 08, 2004 5:25 PM To: 'Steven Ihde' Cc: 'David Greaves'; linux-raid@vger.kernel.org Subject: RE: Looking for the cause of poor I/O performance Good question! "One other point -- apparently 2.6 allows one to set the read-ahead on a per-device basis (maybe 2.4 does too, I don't know). So would it make sense to set read-ahead on the disks low (or zero), and read ahead on the MD device high? Perhaps this could allow us to avoid the overhead of reading unnecessary parity chunks. As the number of disks increases this would be less and less significant." I was wondering about this myself. I have read that other people have played with the numbers, but I can't:

# blockdev --getra /dev/md2
1024
# blockdev --setra 2048 /dev/md2
BLKRASET: Invalid argument
# blockdev --setra 1024 /dev/md2
BLKRASET: Invalid argument

I can change read ahead on each drive. I can set read ahead from 0 to 255 on my disks, but this seems to have no effect. My performance using "hdparm -t /dev/md2" stays about the same. Odd, I just tried other sizes with md2. I can change read ahead from 0 to 255 also. But it was 1024. 
With read ahead set to 0 on all of my disks and on md2, I still get the same performance. I guess on on-disk cache read ahead does just fine. My kernel is 2.4.28. Guy -----Original Message----- From: Steven Ihde [mailto:x-linux-raid@hamachi.dyndns.org] Sent: Wednesday, December 08, 2004 5:00 PM To: Guy Cc: 'David Greaves'; linux-raid@vger.kernel.org Subject: Re: Looking for the cause of poor I/O performance OK, between your discussion of read-ahead and Monday's post by Morten Olsen about /proc/sys/vm/max-readahead, I think I get it now. I'm using kernel 2.6 so /proc/sys/vm/max-readahead doesn't exist, but "blockdev --getra/--setra" seems to do the trick. By increasing readahead on my array device from 256 (the default) to 1024, I can achieve 80MB/sec sequential read throughput (where before I could get only 40MB/sec, same as a single disk). As you point out while it helps sequential reads it may hurt random reads, so I'll test a little more and see. One other point -- apparently 2.6 allows one to set the read-ahead on a per-device basis (maybe 2.4 does too, I don't know). So would it make sense to set read-ahead on the disks low (or zero), and readahead on the MD device high? Perhaps this could allow us to avoid the overhead of reading unecessary parity chunks. As the number of disks increases this would be less and less significant. -Steve On Wed, 08 Dec 2004 13:31:27 -0500, Guy wrote: > "read balancing" will help regardless of random or sequential disk access. > It can double your performance (assuming 2 disks). > > "read ahead" only helps sequential access, it hurts random access. > > Yes, I understand "read balancing" to be balancing the IO over 2 or more > disks, when only 1 disk is really needed. So, you need 2 or more copies of > the data, as in RAID1. > > About read ahead... > The physical disks read ahead. > md does read ahead. > Since the disks and md are doing read ahead, you should have more than 1 > disk reading at the same time. 
The physical disks are not very smart about > RAID5, when reading ahead, they will also read the parity data, which is > wasted effort. > > With all of the above going on you should get more than 1 disk reading data > at the same time. > > With RAID(0, 4, 5 and 6) no one can choose which disk(s) to read. You can't > balance anything. You can only predict what data will be needed before it > is requested. Read ahead does this for large files (sequential reads). I > would not consider this to be "read balancing", just read ahead. > > Guy > > -----Original Message----- > From: David Greaves [mailto:david@dgreaves.com] > Sent: Wednesday, December 08, 2004 4:24 AM > To: Guy > Cc: 'Steven Ihde'; linux-raid@vger.kernel.org > Subject: Re: Looking for the cause of poor I/O performance > > My understanding of 'readahead' is that when an application asks for 312 > bytes of data, the buffering code will anticipate more data is required > and will fill a buffer (4096 bytes). If we know that apps are really > greedy and read *loads* of data then we set a large readahead which will > cause the buffer code (?) to fill a further n buffers/kb according to > the readahead setting. This will all be read sequentially and the > performance boost is because the read heads on the drive get all the > data in one 'hit' - no unneeded seeks, no rotational latency. > > That's not the same as raid5 where when asked for 312 bytes of data, the > buffering code wil fill the 4k buffer and then will issue a readahead on > the next n kb of data - which is spread over multiple disks, which read > in parallel, not sequentially. > > Yes, the readahead triggers this behaviour - but you say "RAID5 can't do > read balancing." - which I thought it could through this mechanism. 
> > It depends whether the original use of "read balancing" in this context > means "selecting a drive to obtain the data from according to the > drive's read queue" (no) or "distributing reads amongst the drives to > obtain a throughput greater than that of one individual drive" (yes) > (OK, the terminology is not quite exact but...) > > do we agree? Or have I misunderstood something? > > David > > Guy wrote: > > >Yes. I did say it reads ahead! > > > >Guy > > > >-----Original Message----- > >From: linux-raid-owner@vger.kernel.org > >[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of David Greaves > >Sent: Monday, December 06, 2004 4:10 PM > >To: Guy > >Cc: 'Steven Ihde'; linux-raid@vger.kernel.org > >Subject: Re: Looking for the cause of poor I/O performance > > > >but aren't the next 'n' blocks of data on (about) n drives that can be > >read concurrently (if the read is big enough) > > > >Guy wrote: > > > > > > > >>RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. > >>However, RAID5 does do read ahead, my speed is about 3.5 times as fast as > a > >>single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. 
> >> > >>Guy > >> > >>-----Original Message----- > >>From: linux-raid-owner@vger.kernel.org > >>[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde > >>Sent: Monday, December 06, 2004 12:49 PM > >>To: linux-raid@vger.kernel.org > >>Subject: Re: Looking for the cause of poor I/O performance > >> > >>On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: > >>[snip] > >> > >> > >> > >> > >>>A possible clue is that when tested individually but in parallel, hda > >>>and hdc both halve their bandwidth: > >>> > >>>/dev/hda: > >>>Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec > >>>Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec > >>>/dev/hdc: > >>>Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec > >>>Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec > >>>/dev/sda: > >>>Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec > >>>Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec > >>> > >>>Could there be contention for some shared resource in the on-board > >>>PATA chipset between hda and hdc? Would moving one of them to a > >>>separate IDE controller on a PCI card help? > >>> > >>>Am I unreasonable to think that I should be getting better than 37 > >>>MB/sec on raid5 read performance, given that each disk alone seems > >>>capable of 40 MB/sec? > >>> > >>> > >>> > >>> > >>To answer my own question... I moved one of the PATA drives to a PCI > >>PATA controller. This did enable me to move 40MB/sec simultaneously > >> > >> > >>from all three drives. Guess there's some issue with the built-in > > > > > >>PATA on the ICH5R southbridge. > >> > >>However, this didn't help raid5 performance -- it was still about > >>35-39MB/sec. I also have a raid1 array on the same physical disks, > >>and observed the same thing there (same read performance as a single > >>disk with hdparm -tT, about 40 MB/sec). 
So: > >> > >>2.6.8 includes the raid1 read balancing fix which was mentioned > >>previously on this list -- should this show up as substantially better > >>hdparm -tT numbers for raid1 or is it more complicated than that? > >> > >>Does raid5 do read-balancing at all or am I just fantasizing? > >> > >>Thanks, > >> > >>Steve - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
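[Guy's throughput arithmetic above checks out: `dd bs=1024k count=1024` reads 1024 MiB, and dividing by his measured 12.9 seconds gives the figure he quotes. A quick sanity check:]

```python
# Re-derive the throughput Guy quotes from his dd timing:
# `dd if=/dev/md2 of=/dev/null bs=1024k count=1024` reads 1024 MiB.
mib_read = 1024
seconds = 12.9

rate = mib_read / seconds
print(f"{mib_read} MiB / {seconds} s = {rate:.2f} MB/sec")
```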
* Re: Looking for the cause of poor I/O performance 2004-12-08 22:41 ` Guy @ 2004-12-09 1:40 ` Steven Ihde 0 siblings, 0 replies; 36+ messages in thread From: Steven Ihde @ 2004-12-09 1:40 UTC (permalink / raw) To: Guy; +Cc: 'David Greaves', linux-raid On Wed, 08 Dec 2004 17:41:45 -0500, Guy wrote: > I also tried changing /proc/sys/vm/max-readahead. > I tried the default of 31, 0 and 127. All gave me about the same > performance. > > I started testing the speed with the dd command below. It completed in about > 12.9 seconds. None of the read ahead changes seem to affect my speed. > Everything is now set to 0, still 12.9 seconds. > 12.9 seconds = about 79.38 MB/sec. > > time dd if=/dev/md2 of=/dev/null bs=1024k count=1024 I'm running kernel 2.6.8; I found the readahead setting had a pretty dramatic effect. I set readahead for all the drives and their partitions to zero:

blockdev --setra 0 /dev/{hdc,hdg,sda,hdc5,hdg5,sda5}

I tested various readahead values for the array device by reading 1GB of data from the device using this procedure:

blockdev --flushbufs /dev/md1
blockdev --setra $readahead /dev/md1
dd if=/dev/md1 of=/dev/null bs=1024k count=1024

These are the results:

   RA    transfer rate (B/s)
 ---------------------------
    0:   15768513
  128:   33680867
  256:   42982770
  512:   59223248
 1024:   78590551
 2048:   81918844
 4096:   82386839

We seem to reach the point of diminishing returns at 1024 readahead, ~80MB/sec throughput. To recap, this is with three Seagate Barracuda drives, two of which are 80GB PATA, the other a 120GB SATA, in a RAID5 configuration. 256 was the default readahead value. The chunk size on my array is 32k. I don't know if that has an effect or not. -Steve ^ permalink raw reply [flat|nested] 36+ messages in thread
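[Steven's table, reduced to speedups over the 2.6 default of 256: the knee he describes is visible directly in the ratios. A short sketch using the measured values from his message:]

```python
# Steven's measured transfer rates (bytes/sec) by readahead setting,
# copied from the table above.
rates = {
    0:    15768513,
    128:  33680867,
    256:  42982770,
    512:  59223248,
    1024: 78590551,
    2048: 81918844,
    4096: 82386839,
}

default = rates[256]  # 256 sectors was the 2.6 default
for ra, rate in sorted(rates.items()):
    print(f"RA {ra:4d}: {rate / 2**20:6.1f} MiB/s  ({rate / default:.2f}x default)")

# The knee: 256 -> 1024 nearly doubles throughput, while 1024 -> 4096
# adds under 5%.
```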
* Re: Looking for the cause of poor I/O performance 2004-12-06 19:29 ` Guy 2004-12-06 21:10 ` David Greaves @ 2004-12-06 21:16 ` Steven Ihde 1 sibling, 0 replies; 36+ messages in thread From: Steven Ihde @ 2004-12-06 21:16 UTC (permalink / raw) To: Guy; +Cc: linux-raid Gotcha. Please excuse the loose use of terminology on my part. But now I'm more convinced than ever that I should be getting better performance than I am. I'm getting 40MB/sec from each disk individually, I've shown with hdparm that I can pull 40MB/sec from all three disks simultaneously, but still my raid5 read performance (in a three-disk array) is slightly less than 40MB/sec. Any guesses what the issue could be? Is there a switch for read-ahead? -Steve On Mon, 06 Dec 2004 14:29:40 -0500, Guy wrote: > RAID5 can't do read balancing. Any 1 piece of data is only on 1 drive. > However, RAID5 does do read ahead, my speed is about 3.5 times as fast as a > single disk. A single disk: 18 M/sec, my RAID5 array, 65 M/sec. > > Guy > > -----Original Message----- > From: linux-raid-owner@vger.kernel.org > [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Steven Ihde > Sent: Monday, December 06, 2004 12:49 PM > To: linux-raid@vger.kernel.org > Subject: Re: Looking for the cause of poor I/O performance > > On Sat, 04 Dec 2004 17:00:08 -0800, Steven Ihde wrote: > [snip] > > A possible clue is that when tested individually but in parallel, hda > > and hdc both halve their bandwidth: > > > > /dev/hda: > > Timing cached reads: 1552 MB in 2.00 seconds = 774.57 MB/sec > > Timing buffered disk reads: 68 MB in 3.07 seconds = 22.15 MB/sec > > /dev/hdc: > > Timing cached reads: 784 MB in 2.00 seconds = 391.86 MB/sec > > Timing buffered disk reads: 68 MB in 3.02 seconds = 22.54 MB/sec > > /dev/sda: > > Timing cached reads: 836 MB in 2.00 seconds = 417.65 MB/sec > > Timing buffered disk reads: 120 MB in 3.00 seconds = 39.94 MB/sec > > > > Could there be contention for some shared resource in the on-board > > PATA 
chipset between hda and hdc? Would moving one of them to a > > separate IDE controller on a PCI card help? > > > > Am I unreasonable to think that I should be getting better than 37 > > MB/sec on raid5 read performance, given that each disk alone seems > > capable of 40 MB/sec? > > To answer my own question... I moved one of the PATA drives to a PCI > PATA controller. This did enable me to move 40MB/sec simultaneously > from all three drives. Guess there's some issue with the built-in > PATA on the ICH5R southbridge. > > However, this didn't help raid5 performance -- it was still about > 35-39MB/sec. I also have a raid1 array on the same physical disks, > and observed the same thing there (same read performance as a single > disk with hdparm -tT, about 40 MB/sec). So: > > 2.6.8 includes the raid1 read balancing fix which was mentioned > previously on this list -- should this show up as substantially better > hdparm -tT numbers for raid1 or is it more complicated than that? > > Does raid5 do read-balancing at all or am I just fantasizing? > > Thanks, > > Steve > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > - > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
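[A back-of-the-envelope model for Steven's expectation, not anything from the thread itself: in an n-disk RAID5, each stripe carries (n-1) data chunks and one parity chunk, so a large sequential read can at best stream (n-1) disks' worth of data. This ignores bus contention and whether the disks can skip over parity chunks, but for his 3-disk array at ~40 MB/sec per disk it predicts ~80 MB/sec, which is the figure he later reaches once readahead is raised:]

```python
# Rough ceiling for RAID5 sequential reads: (n - 1) data chunks per
# stripe, so at best (n - 1) disks' worth of data bandwidth.
# This is a simplification; it ignores bus limits and parity-skip costs.
def raid5_read_ceiling(n_disks, per_disk_mb_s):
    return (n_disks - 1) * per_disk_mb_s

# Steven's 3-disk array, ~40 MB/sec per disk:
print(raid5_read_ceiling(3, 40), "MB/sec")
```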
* RE: Looking for the cause of poor I/O performance 2004-12-04 23:51 ` Mark Hahn 2004-12-05 1:00 ` Steven Ihde @ 2004-12-05 2:16 ` Guy 2004-12-05 15:14 ` TJ 2004-12-05 15:17 ` TJ 2 siblings, 1 reply; 36+ messages in thread From: Guy @ 2004-12-05 2:16 UTC (permalink / raw) To: 'Mark Hahn'; +Cc: linux-raid Ok, now I am confused. I have a second Dell Precision Workstation 410: System A: CPUs 2 X 500 MHz RAM 4 X 128 Meg SDRAM Bus 100 MHz Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec System B: CPUs 2 X 1000 MHz RAM 4 X 256 Meg Registered SDRAM Bus 100 MHz Timing buffer-cache reads: 524 MB in 2.00 seconds =262.00 MB/sec Why is system B almost twice as fast? Is registered RAM faster? I know the CPU speed is twice as fast, but the system bus is still 100 MHz. There are other differences I don't think would have an effect. Video cards, modem, SCSI cards, HW RAID card, USB mouse. Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mark Hahn Sent: Saturday, December 04, 2004 6:51 PM To: Guy Cc: linux-raid@vger.kernel.org Subject: RE: Looking for the cause of poor I/O performance > Timing buffer-cache reads: > 128 MB in 11.18 seconds = 11.45 MB/sec ... > > Timing buffered disk reads: > 64 MB in 9.42 seconds = 6.79 MB/sec ... > > This is from a single disk: > Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec > Timing buffered disk reads: 64 MB in 3.51 seconds = 18.23 MB/sec excellent! this is really a great example of how a machine's limited internal bandwidth infringes on your raid performance. running hdparm -T shows that your machine can manage about 150 MB/s when simply doing a syscall, copying bytes to userspace, and returning. no involvement of any IO device. this number is typically about half the user-visible dram bandwidth as reported by the stream benchmark. 
when you try to do parallel IO (either with a bunch of hdparm -t's or with raid), each disk is desperately trying to write to dram at about 18 MB/s, ignoring other bottlenecks. alas, we already know that your available dram bandwidth is much lower than 14*18. for comparison, a fairly crappy SiS 735-based k7 system with 64b-wide PC2100 can deliver maybe 1.2 GB/s dram bandwidth. hdparm -T is about 500 MB/s, and would probably have trouble breaking 200 MB/s with raid0 even if it had enough buses. an older server of mine is e7500-based, dual xeon/2.4's, with 2xPC1600 ram. it sustains about 1.6 GB/s on Stream, and about 500 MB/s hdparm -T, and can sustain 250 MB/s through its 6-disk raid without any problem. a newish server (dual-opteron, 2xPC2700) gives 1.4 GB/s under hdparm -T, and I expect it could hit 600 MB/s without much trouble, if given 10-12 disks and pcix (or better) controllers... regards, mark hahn. - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
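[Mark's "much lower than 14*18" argument in numbers, using the figures quoted earlier in the thread (the 14-disk array, ~18 MB/s single-disk reads, and the 147.13 MB/sec hdparm -T buffer-cache result):]

```python
# Mark's argument: 14 disks each delivering ~18 MB/s want to push
# ~252 MB/s into RAM, but hdparm -T shows the machine moving only
# ~147 MB/s through a plain pagecache-to-userspace copy.
n_disks = 14
per_disk_mb_s = 18      # single-disk buffered read, from the thread
copy_bw_mb_s = 147.13   # hdparm -T buffer-cache figure, from the thread

aggregate = n_disks * per_disk_mb_s
print(f"aggregate disk demand: {aggregate} MB/s "
      f"vs ~{copy_bw_mb_s} MB/s copy bandwidth")
```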
* Re: Looking for the cause of poor I/O performance 2004-12-05 2:16 ` Guy @ 2004-12-05 15:14 ` TJ 2004-12-06 21:39 ` Mark Hahn 0 siblings, 1 reply; 36+ messages in thread From: TJ @ 2004-12-05 15:14 UTC (permalink / raw) To: linux-raid On Saturday 04 December 2004 09:16 pm, Guy wrote: > Ok, now I am confused. > I have a second Dell Precision Workstation 410: > > System A: > CPUs 2 X 500 MHz > RAM 4 X 128 Meg SDRAM > Bus 100 MHz > Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec > > System B: > CPUs 2 X 1000 MHz > RAM 4 X 256 Meg Registered SDRAM > Bus 100 MHz > Timing buffer-cache reads: 524 MB in 2.00 seconds =262.00 MB/sec > > Why is system B almost twice as fast? > Is registered RAM faster? > I know the CPU speed is twice as fast, but the system bus is still 100 MHz. Memory interleaving, perhaps? Registered ram has higher latency. It's possible that the machine is made to do memory interleaving with ECC ram to boost performance.. TJ Harrell ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance 2004-12-05 15:14 ` TJ @ 2004-12-06 21:39 ` Mark Hahn 0 siblings, 0 replies; 36+ messages in thread From: Mark Hahn @ 2004-12-06 21:39 UTC (permalink / raw) To: TJ; +Cc: linux-raid > > System A: > > CPUs 2 X 500 MHz > > RAM 4 X 128 Meg SDRAM > > Bus 100 MHz > > Timing buffer-cache reads: 128 MB in 0.87 seconds =147.13 MB/sec > > > > System B: > > CPUs 2 X 1000 MHz > > RAM 4 X 256 Meg Registered SDRAM > > Bus 100 MHz > > Timing buffer-cache reads: 524 MB in 2.00 seconds =262.00 MB/sec > > > > Why is system B almost twice as fast? the buffer-cache is measuring systemcall overhead as well as speed of pagecache-to-user-buffer copying. both those are certainly influenced by the CPU speed - even by things like mmx. > > Is registered RAM faster? no, it's inherently slower (mostly latency, but since bursts are short, also in bandwidth.) > Memory interleaving, perhaps? Registered ram has higher latency. It's possible > that the machine is made to do memory interleaving with ECC ram to boost > performance.. sdram means that interleaving is basically irrelevant. interleaving was important when a single bank of ram couldn't sustain one transaction per cycle (EDO, FPM, etc). ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance 2004-12-04 23:51 ` Mark Hahn 2004-12-05 1:00 ` Steven Ihde 2004-12-05 2:16 ` Guy @ 2004-12-05 15:17 ` TJ 2004-12-06 21:34 ` Mark Hahn 2 siblings, 1 reply; 36+ messages in thread From: TJ @ 2004-12-05 15:17 UTC (permalink / raw) To: linux-raid > for comparison, a fairly crappy SiS 735-based k7 system with > 64b-wide PC2100 can deliver maybe 1.2 GB/s dram bandwidth. > hdparm -T is about 500 MB/s, and would probably have trouble > breaking 200 MB/s with raid0 even if it had enough buses. > > an older server of mine is e7500-based, dual xeon/2.4's, with > 2xPC1600 ram. it sustains about 1.6 GB/s on Stream, and about > 500 MB/s hdparm -T, and can sustain 250 MB/s through it's 6-disk > raid without any problem. hmmm.. In these cases, how does the throughput exceed the PCI bandwidth? Do these boards have multiple busses? PCI-X? TJ Harrell ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance 2004-12-05 15:17 ` TJ @ 2004-12-06 21:34 ` Mark Hahn 2004-12-06 23:06 ` Guy 0 siblings, 1 reply; 36+ messages in thread From: Mark Hahn @ 2004-12-06 21:34 UTC (permalink / raw) To: TJ; +Cc: linux-raid > > for comparison, a fairly crappy SiS 735-based k7 system with > > 64b-wide PC2100 can deliver maybe 1.2 GB/s dram bandwidth. > > hdparm -T is about 500 MB/s, and would probably have trouble > > breaking 200 MB/s with raid0 even if it had enough buses. > > > > an older server of mine is e7500-based, dual xeon/2.4's, with > > 2xPC1600 ram. it sustains about 1.6 GB/s on Stream, and about > > 500 MB/s hdparm -T, and can sustain 250 MB/s through it's 6-disk > > raid without any problem. > > hmmm.. In these cases, how does the throughput exceed the PCI bandwidth? Do > these boards have multiple busses? PCI-X? I didn't say they exceeded bus bandwidth ("if it had enough"). the latter does actually have multiple buses, some of which are pcix; that's pretty common among server boards. interestingly, even non-server desktop parts can exceed PCI bandwidth - I did a disk server a few years ago that used two chipset ATA ports and an add-in 4-port card to hit about 150 MB/s total (sustained). ^ permalink raw reply [flat|nested] 36+ messages in thread
* RE: Looking for the cause of poor I/O performance 2004-12-06 21:34 ` Mark Hahn @ 2004-12-06 23:06 ` Guy 0 siblings, 0 replies; 36+ messages in thread From: Guy @ 2004-12-06 23:06 UTC (permalink / raw) To: 'Mark Hahn', 'TJ'; +Cc: linux-raid Some systems may have 66 MHz PCI, or 64 bit. Or, just more than 1 PCI bus. My desktop system has 3 PCI buses, I think. Guy -----Original Message----- From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Mark Hahn Sent: Monday, December 06, 2004 4:35 PM To: TJ Cc: linux-raid@vger.kernel.org Subject: Re: Looking for the cause of poor I/O performance > > for comparison, a fairly crappy SiS 735-based k7 system with > > 64b-wide PC2100 can deliver maybe 1.2 GB/s dram bandwidth. > > hdparm -T is about 500 MB/s, and would probably have trouble > > breaking 200 MB/s with raid0 even if it had enough buses. > > > > an older server of mine is e7500-based, dual xeon/2.4's, with > > 2xPC1600 ram. it sustains about 1.6 GB/s on Stream, and about > > 500 MB/s hdparm -T, and can sustain 250 MB/s through it's 6-disk > > raid without any problem. > > hmmm.. In these cases, how does the throughput exceed the PCI bandwidth? Do > these boards have multiple busses? PCI-X? I didn't say they exceeded bus bandwidth ("if it had enough"). the latter does actually have multiple buses, some of which are pcix; that's pretty common among server boards. interestingly, even non-server desktop parts can exceed PCI bandwidth - I did a disk server a few years ago that used two chipset ATA ports and an add-in 4-port card to hit about 150 MB/s total (sustained). - To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 36+ messages in thread
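[For reference on the bus variants mentioned above, the theoretical peaks work out as clock times bus width; real sustained throughput is well below these (Mark's desktop example sustained ~150 MB/s only by spreading disks across two buses):]

```python
# Theoretical peak bandwidth for the PCI variants mentioned in the
# thread: clock (MHz) * width (bits) / 8 bits-per-byte.
def pci_peak_mb_s(clock_mhz, width_bits):
    return clock_mhz * width_bits / 8

buses = [
    ("PCI 32-bit/33 MHz",     33.33, 32),
    ("PCI 64-bit/33 MHz",     33.33, 64),
    ("PCI 64-bit/66 MHz",     66.66, 64),
    ("PCI-X 64-bit/133 MHz", 133.33, 64),
]
for name, mhz, bits in buses:
    print(f"{name}: ~{pci_peak_mb_s(mhz, bits):.0f} MB/s peak")
```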
* Re: Looking for the cause of poor I/O performance
2004-12-03  0:49 ` Mark Hahn
2004-12-03  3:54 ` Guy
@ 2004-12-03  6:51 ` TJ
2004-12-03 20:03 ` TJ
2 siblings, 0 replies; 36+ messages in thread
From: TJ @ 2004-12-03 6:51 UTC (permalink / raw)
To: linux-raid; +Cc: Mark Hahn

> such a machine was good in its day, but that day was what, 5-7 years ago?
> in practical terms, the machine probably has about 300 MB/s of memory
> bandwidth (vs 3000 for a low-end server today). further, it was not
> uncommon for chipsets to fail to cache then-large amounts of RAM (32M was a
> common limit for caches configured writeback, for instance, that would
> magically cache 64M if set to writethrough...)

You are clearly right that the memory bandwidth is lower than on a modern
machine. However, I still feel the disk I/O should be much better even with
this limitation. For a straight read operation like hdparm -tT, the 300 MB/s
memory bandwidth should allow for better performance than this. Guy's
numbers from an oldish P3 box back this up.

Additionally, unless you're talking about a box with 64-bit PCI, PCI-X, or
PCI Express, the PCI bus is going to be a severely limiting factor compared
to the memory bus. While the box could do more memory I/O, a disk-bound read
operation should be limited by the PCI bandwidth on either a new machine or
this one.

> with a modern kernel, manual hdparm tuning is unnecessary and probably
> wrong.

I understand why setting DMA and the like is probably unnecessary. For RAID
arrays, though, I would think that setting readahead and acoustic management
levels with hdparm, and setting kernel readahead parameters in the fs
settings, would be advantageous.

> > To tune these drives, I use:
> > hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]
>
> if you don't mess with the config via hdparm, what mode do they come up in?
> iirc, the 75GXP has a noticably lower density (and thus bandwidth).

Granted, so why on earth would it perform similarly with hdparm -tT? Even
more confusing, how could it best the newer WD 400JB?

> > /dev/hda: Timing buffered disk reads:  42 MB in 3.07 seconds = 13.67 MB/sec
> > /dev/hdc: Timing buffered disk reads:  44 MB in 3.12 seconds = 14.10 MB/sec
>
> not that bad for such a horrible controller (and PCI, CPU, memory system)

So you do think that the VIA controller is inferior?

> > /dev/md0: Timing buffered disk reads:  70 MB in 3.07 seconds = 22.77 MB/sec
> > /dev/md1: Timing buffered disk reads:  50 MB in 3.03 seconds = 16.51 MB/sec
>
> since the cpu/mem/chipset/bus are limiting factors, raid doesn't help.

Those low raid numbers do seem to suggest that, don't they?

> keeping a K6 alive is noble and/or amusing, but it's just not reasonable to
> expect it to keep up with modern disks. expecting it to run samba well is
> not terribly reasonable.
>
> plug those disks into any entry-level machine bought new (celeron, sempron)
> and you'll get whiplash. plug those disks into a proper server
> (dual-opteron, few GB ram) and you'll never look back. in fact,
> you'll start looking for a faster network.

I disagree, but I must admit that this is a possibility. My desktop machine
is an Athlon XP 1700+ with 512 MB of RAM on a 266 MHz DDR bus. It's easily a
class above the K6, with a much better memory subsystem. I could dump all
the drives and controllers onto it, run the same tests using the same kernel
and everything, and record the numbers. Do you feel this would prove or
disprove the idea that the box is just underpowered?

TJ Harrell

^ permalink raw reply	[flat|nested] 36+ messages in thread
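The hdparm figures quoted in this message make the point numerically: assuming md0 stripes across hda and hdc (as the thread's figures suggest), the array's rate is below even the sum of its two member disks, and both are far under the ~133 MB/s PCI ceiling. A small sketch using the quoted numbers (scaled by 100, since POSIX shell arithmetic is integer-only):

```shell
#!/bin/sh
# hdparm -t figures quoted in this thread, in MB/s x 100:
hda=1367   # /dev/hda: 13.67 MB/sec
hdc=1410   # /dev/hdc: 14.10 MB/sec
md0=2277   # /dev/md0: 22.77 MB/sec
sum=$(( hda + hdc ))
echo "sum of member disks: $(( sum / 100 )).$(( sum % 100 )) MB/s"
echo "md0 observed:        $(( md0 / 100 )).$(( md0 % 100 )) MB/s"
# md0 is *slower* than the two disks added together, and both totals sit
# far below the ~133 MB/s PCI peak -- the bus is not the limiting factor,
# which points at CPU/memory/chipset overhead instead.
```

If the PCI bus were the bottleneck, the array would plateau near the bus limit rather than below the sum of its members; falling short of that sum is the signature of a host-side bottleneck.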
* Re: Looking for the cause of poor I/O performance
2004-12-03  0:49 ` Mark Hahn
2004-12-03  3:54 ` Guy
2004-12-03  6:51 ` TJ
@ 2004-12-03 20:03 ` TJ
2004-12-04 22:59 ` Mark Hahn
2 siblings, 1 reply; 36+ messages in thread
From: TJ @ 2004-12-03 20:03 UTC (permalink / raw)
To: linux-raid

> > My server is a K6-500 with 43MB of RAM, standard x86 hardware. The
>
> such a machine was good in its day, but that day was what, 5-7 years ago?
> in practical terms, the machine probably has about 300 MB/s of memory
> bandwidth (vs 3000 for a low-end server today). further, it was not
> uncommon for chipsets to fail to cache then-large amounts of RAM (32M was a
> common limit for caches configured writeback, for instance, that would
> magically cache 64M if set to writethrough...)

Bah. As much as I had hoped to squeeze more out of this box, I think you
have a pretty solid point. I tried a quick hdparm -tT on all of the drives,
using the same OS on the newer Athlon box I described, and got much better
numbers. The output is included below.

> > OS is Slackware 10.0 w/ 2.6.7 kernel I've had similar problems with the
>
> with a modern kernel, manual hdparm tuning is unnecessary and probably
> wrong.
>
> > To tune these drives, I use:
> > hdparm -c3 -d1 -m16 -X68 -k1 -A1 -a128 -M128 -u1 /dev/hd[kigca]

I also checked what hdparm showed for default settings without modifying
them on boot, both on the Athlon and on the K6. On the Athlon, you're right:
performance suffered from my tuning. On the K6, I saw no noticeable
difference. It's interesting that on the Athlon, one controller was set to
-c1 by default, while on the K6 the same controller with the same drives is
set to -c0 by default by the exact same kernel. I'm at a loss for how this
is determined, and why.

> if you don't mess with the config via hdparm, what mode do they come up in?

I included this for both machines.

I think I will concentrate on tweaking the PCI settings to see if I can't
get a bit more out of that bus, and check for any noticeable improvement in
throughput. I'm hoping this may still be related to some sort of PCI latency
issue.

TJ Harrell

_______________________________________________________________________

Default settings on the Athlon:

75 GXP:
 Timing buffer-cache reads:   1124 MB in  2.01 seconds = 560.12 MB/sec
 Timing buffered disk reads:   108 MB in  3.02 seconds =  35.74 MB/sec
WD 1200:
 Timing buffer-cache reads:   1116 MB in  2.00 seconds = 556.69 MB/sec
 Timing buffered disk reads:   108 MB in  3.05 seconds =  35.43 MB/sec
WD 2000:
 Timing buffer-cache reads:   1092 MB in  2.01 seconds = 544.45 MB/sec
 Timing buffered disk reads:   106 MB in  3.00 seconds =  35.32 MB/sec
WD 400:
 Timing buffer-cache reads:   1084 MB in  2.01 seconds = 540.46 MB/sec
 Timing buffered disk reads:   122 MB in  3.02 seconds =  40.42 MB/sec
WD 2000:
 Timing buffer-cache reads:   1140 MB in  2.00 seconds = 569.52 MB/sec
 Timing buffered disk reads:   112 MB in  3.01 seconds =  37.19 MB/sec

Defaults on the K6:

/dev/hda:
 multcount    =  0 (off)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 65535/16/63, sectors = 78165360, start = 0
/dev/hdc:
 multcount    =  0 (off)
 IO_support   =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 24321/255/63, sectors = 390721968, start = 0
/dev/hdg:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 24321/255/63, sectors = 390721968, start = 0
/dev/hdi:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 16383/255/63, sectors = 234441648, start = 0
/dev/hdk:
 multcount    =  0 (off)
 IO_support   =  0 (default 16-bit)
 unmaskirq    =  0 (off)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 readonly     =  0 (off)
 readahead    = 256 (on)
 geometry     = 65535/16/63, sectors = 90069840, start = 0

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Looking for the cause of poor I/O performance
2004-12-03 20:03 ` TJ
@ 2004-12-04 22:59 ` Mark Hahn
0 siblings, 0 replies; 36+ messages in thread
From: Mark Hahn @ 2004-12-04 22:59 UTC (permalink / raw)
To: TJ; +Cc: linux-raid

> throughput. I'm hoping this may still be related to some sort of PCI latency
> issue.

you can use setpci to do this sort of tweaking. naturally, you probably
want to mount your filesystems RO when you do this, since accidents do
happen...

but in the abstract, it's quite odd to believe that you can persuade a
7-year-old machine to perform as well as a current one. sure, it sometimes
happens, but so very much has changed. as a completely random example, the
topology of bus-like links in a modern box is vastly less bottlenecked than
in the bad old days. for instance, it was completely normal back then for
the chipset's builtin IDE (and any add-ons) to sit on a single 32-bit/33 MHz
PCI segment. that segment was often unable to actually approach 133 MB/s
(80 was pretty good in those days). and vendors would often bend the rules
a little, such as adding a bit more delay in arbitration to permit more PCI
slots (since PCI, like any multidrop bus, always has a speed-vs-drops
tradeoff). nowadays, it's actually common to see unshared pcix slots
connected directly to a memory-controller hub, along with unshared
connections for *ATA, sound, even ethernet. not to mention the fact that
memory itself is 10x faster.

^ permalink raw reply	[flat|nested] 36+ messages in thread
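The setpci suggestion might look like the sketch below. The bus address 00:07.1 is hypothetical (a VIA chipset's IDE function often sits there, but locate the real one with lspci); poking PCI configuration registers on live hardware is risky, hence the advice above to mount filesystems read-only first.

```shell
#!/bin/sh
# Inspect and adjust the PCI latency timer on a device (sketch).
# 00:07.1 is an assumed address -- find the IDE function with lspci first.
DEV=00:07.1
lspci -s "$DEV"                      # confirm which device this is
setpci -s "$DEV" latency_timer       # read the current value (hex)
setpci -s "$DEV" latency_timer=40    # set to 0x40 = 64 PCI clocks
```

Larger latency-timer values let a bus master hold the bus for longer bursts (better throughput, worse latency for other devices); smaller values do the opposite. Settings made this way do not survive a reboot.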
* Re: Looking for the cause of poor I/O performance
2004-12-02 16:38 TJ
2004-12-03  0:49 ` Mark Hahn
@ 2004-12-03  7:12 ` TJ
1 sibling, 0 replies; 36+ messages in thread
From: TJ @ 2004-12-03 7:12 UTC (permalink / raw)
To: linux-raid

Here's some info on the VIA chipset problem.

This is the article that really cracked it. It seems there are problems with
the default timings and latency settings:
http://www.tecchannel.de/hardware/817/8.html

This guy, independent of VIA, came up with some chipset register tweaks to
improve performance of the PCI bus on the broken VIAs:
http://www.georgebreese.com/net/software/
http://www.georgebreese.com/net/software/readmes/vlatency_v020b21_readme.HTM
http://adsl.cutw.net/dlink-dsl200-via.html

The Linux kernel sources indicate a problem with VIA chipsets too, but the
author doesn't have it fixed to his satisfaction, obviously. From
linux-2.6.7/drivers/pci/quirks.c:

/*  The VIA VP2/VP3/MVP3 seem to have some 'features'. There may be a
    workaround but VIA don't answer queries. If you happen to have good
    contacts at VIA ask them for me please -- Alan

    This appears to be BIOS not version dependent. So presumably there is a
    chipset level fix */

I'm wondering if I could take the tweaks that George Breese has made and
implement them in Linux. If I could, how would I benchmark to check for
improvement? Bonnie? hdparm -tT? Some PCI benchmark?

TJ Harrell

^ permalink raw reply	[flat|nested] 36+ messages in thread
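For a quick before/after comparison without installing bonnie, a repeatable sequential read with dd is one option. A sketch (the default target is a scratch file so the script runs anywhere; for real measurements, point it at /dev/md0 or a raw disk read-only, and read more data than you have RAM so the page cache can't flatter the numbers):

```shell
#!/bin/sh
# Crude, repeatable sequential-read benchmark using dd.
# Pass a device or file as $1; defaults to a scratch file for safety.
TARGET=${1:-/tmp/ddbench.img}
# create a 64 MB scratch file if the target doesn't already exist
[ -e "$TARGET" ] || dd if=/dev/zero of="$TARGET" bs=1M count=64 2>/dev/null
# read it three times; the last line of dd's report shows the throughput
for run in 1 2 3; do
    dd if="$TARGET" of=/dev/null bs=1M 2>&1 | tail -n 1
done
```

Averaging a few runs smooths out scheduling noise; a before/after pair of averages is enough to tell whether a chipset register tweak moved sequential throughput at all.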
end of thread, other threads: [~2004-12-09 1:40 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-12-03 11:30 Looking for the cause of poor I/O performance TJ
2004-12-03 11:46 ` Erik Mouw
2004-12-03 15:09 ` TJ
2004-12-03 16:25 ` Erik Mouw
2004-12-03 16:32 ` David Greaves
2004-12-03 16:50 ` Guy
-- strict thread matches above, loose matches on Subject: below --
2004-12-02 16:38 TJ
2004-12-03  0:49 ` Mark Hahn
2004-12-03  3:54 ` Guy
2004-12-03  6:33 ` TJ
2004-12-03  7:38 ` Guy
2004-12-04 15:23 ` TJ
2004-12-04 17:59 ` Guy
2004-12-04 23:51 ` Mark Hahn
2004-12-05  1:00 ` Steven Ihde
2004-12-06 17:48 ` Steven Ihde
2004-12-06 19:29 ` Guy
2004-12-06 21:10 ` David Greaves
2004-12-06 23:02 ` Guy
2004-12-08  9:24 ` David Greaves
2004-12-08 18:31 ` Guy
2004-12-08 22:00 ` Steven Ihde
2004-12-08 22:25 ` Guy
2004-12-08 22:41 ` Guy
2004-12-09  1:40 ` Steven Ihde
2004-12-06 21:16 ` Steven Ihde
2004-12-05  2:16 ` Guy
2004-12-05 15:14 ` TJ
2004-12-06 21:39 ` Mark Hahn
2004-12-05 15:17 ` TJ
2004-12-06 21:34 ` Mark Hahn
2004-12-06 23:06 ` Guy
2004-12-03  6:51 ` TJ
2004-12-03 20:03 ` TJ
2004-12-04 22:59 ` Mark Hahn
2004-12-03  7:12 ` TJ